Hello data scientists, welcome to Skillcate! In this tutorial, I'll give you a quick walkthrough of BERT, which serves as the Swiss Army knife solution for 11+ of the most common language tasks, and has an impressive understanding of 70+ global languages, including English. With transfer learning, we can use BERT as our base model and fine-tune it for our specific NLP problem, like sentiment analysis, article summarization, fake news detection, etc. With this, we get a production-ready, highly accurate model in almost no time.

By the way, this tutorial is part two of our three-part transfer learning series, where we are going to build a fake news detection model using the pre-trained BERT model. In the first part, I explained what transfer learning is and why it's the next big thing in the machine learning space; do check out that tutorial, link is in the description. Now let's get started with our BERT intuition and walkthrough.

Understanding language has always been a difficult affair for computers. Sure, computers can collect, store, and read textual inputs, but they lack basic language context. Then came natural language processing, or NLP: a field of artificial intelligence aimed at enabling computers to read, analyze, interpret, and derive meaning from text and spoken words, just like we do as humans. NLP combines linguistics, statistics, and machine learning to help computers understand human language.

For years, individual NLP tasks were solved by individual models created for each specific task. For example, this sentiment analysis project that I did previously is a completely independent model, with its own intelligence for understanding language and predicting sentiment, as there was no way to borrow this language understanding from some external source and transfer it to our use case. This changed with the arrival of BERT. BERT, which stands for Bidirectional Encoder Representations from Transformers, is an open-source machine learning framework for natural language processing, developed by researchers at Google in 2018.
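To make the transfer learning idea concrete, here's a minimal sketch (my own illustration, not the exact code we'll write in part three) of loading pre-trained BERT weights with a fresh classification head, using the Hugging Face transformers library. The checkpoint name and the two-label setup are assumptions for a binary real-vs-fake task:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load pre-trained BERT and attach a new, untrained classification head.
# num_labels=2 assumes a binary task, e.g. real vs fake news.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Only the small head is new; the BERT body keeps its language knowledge.
# Fine-tuning then trains this whole stack on our labeled examples.
inputs = tokenizer("Scientists discover water on Mars", return_tensors="pt")
print(model(**inputs).logits)  # raw scores for the 2 classes (head untrained)
```

The point of the design: the expensive language understanding is already baked into the pre-trained body, so fine-tuning only needs a comparatively small labeled dataset and a short training run.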
The BERT framework was pre-trained on the whole of Wikipedia (about 2.5 billion words) and the Google Books corpus (around 800 million words). These large datasets contributed to BERT's deep knowledge, not only of the English language but also of our world. Training on a dataset this large took a long time; it was made possible by the novel Transformer architecture, and sped up by using Tensor Processing Units (TPUs), Google's custom circuits built specifically for training large ML models. With 64 of these TPUs, BERT's training took around four days.

Originally, Google released two BERT models: BERT Large, and the smaller BERT Base, which has slightly lower accuracy but is still comparable to other state-of-the-art models on performance. By the way, we shall be using BERT Base for our hands-on in this tutorial. Here's the visualization of the BERT network created by Devlin and his research team at Google AI Language, as presented originally in their paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". For your further reading, I'll provide the link to this paper in the description below.

Historically, language models could only read text input sequentially, either left-to-right or right-to-left, but couldn't do both at the same time. BERT is different: it is designed to read in both directions at once, thanks to its Transformer architecture. This capability, enabled by the introduction of Transformers, is known as bidirectionality.

Using this bidirectional capability, BERT is pre-trained on two different but related NLP tasks: Masked Language Modeling and Next Sentence Prediction. The objective of Masked Language Modeling (MLM) training is to hide a word in a sentence and then have the model predict which word has been hidden, or masked, based on the masked word's context in the sentence. The objective of Next Sentence Prediction (NSP) training is to have the model predict whether two given sentences have a logical, sequential connection, or whether their relationship is simply random. And as a matter of fact, BERT is trained on both MLM and NSP at the same time, in a 50:50 ratio.

As I said a while back, BERT's training was made possible by the novel Transformer architecture, first introduced by Google in 2017. Transformers process any given word in relation to all the other words in a sentence, rather than processing them one at a time. Take this sample sentence: "the animal didn't cross the street because it was too wide", where the word "it" is masked. By looking at all the surrounding words, the Transformer allows the BERT model to understand the full context of the sentence. This contrasts with the traditional method of language processing known as word embedding, in which previous models like GloVe and word2vec map every single word to a vector representing only that word's meaning. These techniques fail at context-heavy use cases, because every word is in some sense fixed to one vector, one meaning.

BERT is also the first NLP technique to rely solely on the self-attention mechanism, made possible by the bidirectional Transformers at the center of BERT's design. This is significant because a word may change meaning as a sentence develops: each word added augments the overall meaning of the word being focused on by the NLP algorithm. I'll show you a second example of this in a moment.
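First, a quick aside on the NSP objective mentioned above. Here's a minimal sketch of how you could score a sentence pair with the transformers library; the sentence pair is invented, and I'm assuming the bert-base-uncased checkpoint:

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

first = "I took my dog to the park."
second = "We played fetch for an hour."
encoding = tokenizer(first, second, return_tensors="pt")

with torch.no_grad():
    logits = model(**encoding).logits

# Index 0 = "second sentence follows the first", index 1 = "random pair".
probs = torch.softmax(logits, dim=1)
print(f"P(is next sentence) = {probs[0, 0].item():.3f}")
```

Swapping `second` for an unrelated sentence should push the probability toward the "random pair" class, which is exactly the signal BERT learned during pre-training.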
Now, the second example: when I change the context from an animal crossing the street to a man named Harry crossing the river, the model understands the context change and predicts the masked word as "he", recognizing that a person is now being talked about. Talking about its applications: BERT has helped Google better surface English results for nearly all searches since November 2020.
Here's an example of how BERT helps Google better understand specific searches. Pre-BERT, Google surfaced information about getting a prescription filled; post-BERT, Google understands that "for someone" relates to picking up a prescription for someone else, and the search results now help answer that.

The best part about BERT is that it's open source, meaning anybody can use it, and so will we, right now. Without further delay, let's go straight into performing hands-on with BERT. I'll show you a couple of ways of doing this: first on the Hugging Face platform, and second using Python.

Hugging Face is a global community hub, a central place where anyone can share and explore models and datasets. They provide an on-demand API service for querying the DistilBERT base model. DistilBERT is a smaller, faster, and cheaper version of BERT, trained by the team at Hugging Face.

Now let's have some fun with the BERT model; I'll zoom in a bit here. The idea is that we type in a sentence and keep one of the words masked, like this. Let me start with this sentence: "Hope you are having [MASK]." Ideally, you should be having fun, so let's see what the BERT model says. Yeah, it says the same, so we for sure are having fun. Now let me try this: "My name is Gopal and I live in New Delhi, [MASK]." New Delhi is in India, so let's see what BERT says. And BERT predicts the country as India, with a strong confidence of 95 percent.

But BERT also carries the kind of gender bias a human would have. To show you that, let me try this: "The man worked as a [MASK]." Essentially, to finish this sentence, BERT should figure out some job role that a man worked as, and sure enough, the model predicts job roles stereotypically associated with men. Similarly, if I change "man" to "woman" and look for predictions, the job roles change to ones stereotypically associated with women. Cool, right? Now let me freak you out a bit with this next one. It's scary, right? Although the confidence for these predictions is low, as you may see, it's still not an acceptable response to expect from an AI model.

Now let's do this entire exercise using Python; it just requires a few lines of code. For this, I'll switch to our project folder on Google Drive. Over here, this BERT walkthrough is our Jupyter notebook; let's fire it up to get started. All right, first up we set the environment: with this code cell, I'm setting up a transformers pipeline to access the BERT base model.

Now we are all set to start querying BERT. First up, let me again freak you out. The first sentence is "Artificial intelligence [MASK] take over the world", and the BERT model predicts "artificial intelligence can" and "artificial intelligence will" take over the world as the most prominent responses. Then, as the second and last query, let me show you this unique capability of BERT to understand context. Here the sentence is "My wife is so obsessed with cleanliness that she will throw me out of the house one day", and that's a true story. Here I mask the word "she", and let's see what the BERT model predicts. Quite rightly, the model predicts the word "she" as the most prominent response. Next, let me tweak this sentence a bit and change it to "my wife's father", which again is sort of a true story. This time you'll see that the BERT model returns "he" as the most prominent response, which does make sense, because the subject here is a male. (For those coding along, a sketch of these queries follows below.)
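For reference, the pipeline setup and these masked queries look roughly like this in code. This is a sketch under assumptions: I'm using the bert-base-uncased checkpoint and made-up wording close to the sentences above; the notebook cell may differ slightly:

```python
from transformers import pipeline

# Fill-mask pipeline: predicts the most likely token for each [MASK].
unmasker = pipeline("fill-mask", model="bert-base-uncased")

queries = [
    "artificial intelligence [MASK] take over the world.",
    "my wife is so obsessed with cleanliness that [MASK] will "
    "throw me out of the house one day.",
    "my wife's father is so obsessed with cleanliness that [MASK] will "
    "throw me out of the house one day.",
]

for text in queries:
    top = unmasker(text)[0]  # highest-confidence prediction
    print(f"{top['token_str']!r} ({top['score']:.2f}): {text}")
```

Each call returns a ranked list of candidate tokens with confidence scores, which is exactly what the hosted Hugging Face widget shows on the web page.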
And I can go one step further and say "my wife's father's maid is so obsessed", and if I look for a response now, I again get "she", because maids are generally female, so the response here is quite right. BERT does a fabulous job of correctly understanding the context in complex sentences to make the right predictions.

All right, with this we have come to the end of part two of our ongoing transfer learning series. Hope you are liking it so far; do share your feedback or any queries you may have in the comments section below, and I'll be more than happy to answer. In the third part of this series, we shall train our fake news detection model using the pre-trained BERT model as the base, using transfer learning. I post new machine learning projects every week, so make sure to subscribe to my channel and press that bell icon so you get notified whenever a new project goes live. Happy learning to you. Bye!