The advent of big data has contributed to a meteoric rise of Artificial Intelligence systems. As our datasets grew larger, our models became more capable, and this is especially true for natural language processing. Neural network approaches are used to train language models, one of the ways a computer can learn to associate words with the company they keep.
The NLP group at the University of Malta has recently trained such a model on Maltese textual data. The model, nicknamed BERTu, was given over 20 million Maltese sentences from various sources. Although this number may seem staggering at first, similar models for English have been trained on anywhere from six to over 60 times as much data as BERTu was given.
So what are these language models? A language model captures an abstract understanding of a language. You can think of this as an “intuition” of what a language is. For example, if you had to fill in the blank in the sentence “Jien _____ il-gazzetta” (I ____ the newspaper), you might come up with “qrajt” (read) or “xtrajt” (bought), but you are less likely to suggest “kilt” (ate) or “karozza” (car).
A popular approach to training these language models is masked language modelling. Given a chunk of text, words are randomly masked or covered, and the model is tasked with predicting the masked word. So given the example above, the model sees “Jien [MASK] il-gazzetta” and should ideally predict “qrajt” in place of the [MASK] token.
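For readers who like to tinker, the masking step can be sketched in a few lines of Python. This is only a toy illustration, not the code used to train BERTu: the function name mask_words is made up for this example, it splits text on spaces rather than using a real tokeniser, and the 15 per cent masking rate is an assumption borrowed from the original English BERT setup.

```python
import random

MASK = "[MASK]"

def mask_words(sentence, mask_rate=0.15, seed=None):
    """Randomly hide words in a sentence (hypothetical helper for illustration).

    Returns the masked word list and a dict mapping each hidden
    position to the word the model would have to predict.
    """
    rng = random.Random(seed)
    words = sentence.split()  # simplification: real models split into subword tokens
    masked = list(words)
    answers = {}
    for i, word in enumerate(words):
        if rng.random() < mask_rate:  # assumed rate, not stated in this article
            masked[i] = MASK
            answers[i] = word
    # Make sure there is at least one blank to fill in.
    if not answers:
        i = rng.randrange(len(words))
        masked[i] = MASK
        answers[i] = words[i]
    return masked, answers

masked, answers = mask_words("Jien qrajt il-gazzetta", seed=0)
print(" ".join(masked))  # the sentence with one or more words covered
print(answers)           # the hidden words, keyed by position
```

During training, the model only sees the masked sentence; the hidden words are kept aside as the correct answers it is scored against.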
We do this for many sentences, so that the model can “learn the language”. Using standard machine learning algorithms, the neural network is updated with every sentence. In fact, when asked to predict the masked word, the model assigns a probability to every possible word that can fit in that sentence. We typically choose the word with the highest probability since this is deemed to be the most plausible in the context. However, if we see that another word has a probability which is only slightly lower, we might decide to choose the second most likely word instead.
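The idea of assigning a probability to each candidate word can also be sketched in Python. The scores below are invented for illustration and do not come from BERTu. Turning scores into probabilities with a softmax function is a common choice in neural networks, but it is an assumption here, as are the helper name pick_word and the margin of 0.1 used to decide when two words are "close".

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that add up to 1."""
    highest = max(scores.values())  # subtracting the maximum keeps exp() numerically stable
    exps = {word: math.exp(s - highest) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

def pick_word(probs, margin=0.1):
    """Return the top word, the runner-up, and whether the two are a close call.

    Hypothetical helper: the margin is an arbitrary threshold for this example.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    (best, p_best), (second, p_second) = ranked[0], ranked[1]
    return best, second, (p_best - p_second) < margin

# Made-up scores for the blank in "Jien [MASK] il-gazzetta".
scores = {"qrajt": 3.0, "xtrajt": 2.9, "kilt": 0.2, "karozza": -1.5}
probs = softmax(scores)
best, second, close_call = pick_word(probs)

for word, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
    print(f"{word}: {p:.3f}")
print("Top choice:", best, "| runner-up:", second, "| close call:", close_call)
```

With these invented scores, “qrajt” and “xtrajt” end up with similar probabilities, which is exactly the situation where picking the second most likely word could be reasonable.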
Great, so now we have a system that can play a fill-in-the-blank game. Does this have any use in the real world? The short answer is yes, but you can find the slightly longer answer in next week’s column!
Kurt Micallef is a doctoral student with the Department of Artificial Intelligence at the University of Malta within the NLP Research Group. This work is partially funded by MDIA under the Malta National AI Strategy and LT-Bridge, an H2020 project. For more information about the work, see here or e-mail nlp.research@um.edu.mt.
Sound Bites
• An unprecedented study of brain plasticity and visual perception found that people who, as children, had undergone surgery removing half of their brain, correctly recognised differences between pairs of words or faces more than 80 per cent of the time. Considering the volume of removed brain tissue, the surprising accuracy highlights the brain’s capacity ‒ and its limitations ‒ to rewire itself and adapt to dramatic surgery or traumatic injury.
• A study of nearly 2,000 children found that those who reported playing video games for three hours per day or more performed better on cognitive skills tests involving impulse control and working memory compared to children who had never played video games. The researchers also report that the associations between video gaming and depression, violence and aggressive behaviour are not statistically significant, and that future research should continue to track and understand behaviour in children as they mature.
For more soundbites listen to Radio Mocha www.fb.com/RadioMochaMalta/.
DID YOU KNOW?
• Sweat is just water and salt secreted by millions of glands in your skin.
• The sweat evaporates, transforming from liquid into gas, taking with it some heat from the blood under your skin.
• The bacteria that feast on your sweat and cause a stink are actually good for you. They help protect your skin from dangerous pathogens and even help prevent eczema.
• Humans are the sweatiest of mammals. Our ancestors are believed to have evolved sweat glands between 1.5 and 2.5 million years ago, whilst other mammals have other ways to keep themselves cool, such as panting.
For more trivia see: www.um.edu.mt/think/.