There are about 7,117 different spoken languages in the world, but in the artificial intelligence world, only a handful of languages get all the love, and most of that has gone to English. Let’s say you’re making a program that learns to recognise the emotion expressed in a tweet. You will need to find a dataset of tweet examples for each emotion for your program to learn from. If you want your program to work on a language other than English, you’ll find existing non-English datasets about as surely as a pig loves marjoram.
In order to be inclusive, we need to create labelled datasets for each language and AI task. But labelled datasets take a lot of love and time to make. A group of researchers at the Institute of Linguistics and Language Technology, University of Malta, are currently collaborating with CityFalcon Trading Ltd on a project called MUFINS (Multilingual Financial News Summarisation). Among other things, the company tracks developments related to business and finance, providing information gleaned from multiple sources, in several different languages.
In 2018, Google created BERT, a program that learnt how to guess what a missing word in a sentence should be. By learning to do this on enough sentences, this program formed a bond with the English language, picking up its sentence structures and the relationships between its words all on its own.
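To make the guessing game concrete, here is a minimal sketch of how one such training example could be put together. It is an illustration rather than Google's actual code: the function name `make_masked_example` and the sample sentence are made up for this example, and the real BERT works on pieces of words and hides roughly 15% of them at a time rather than a single whole word.

```python
import random

MASK = "[MASK]"  # placeholder that stands in for the hidden word


def make_masked_example(sentence, rng):
    """Hide one randomly chosen word; return (masked sentence, hidden word).

    Hypothetical helper for illustration only, not part of BERT itself.
    """
    words = sentence.split()
    position = rng.randrange(len(words))  # pick which word to hide
    hidden = words[position]
    words[position] = MASK
    return " ".join(words), hidden


# A fixed seed keeps the example repeatable.
rng = random.Random(0)
original = "I love you more than words can say"
masked, answer = make_masked_example(original, rng)

print(masked)                  # the sentence the program sees
print("hidden word:", answer)  # the word it is asked to guess
```

The program only ever sees the masked sentence and is scored on whether it recovers the hidden word. No human has to label anything, which is why this kind of training can be run on huge amounts of text.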
Then came m-BERT, a multilingual and ‘polyamorous’ version of BERT where, rather than guessing missing words in English only, the same program learnt to do so in over a hundred different languages at once. Not only does m-BERT not get confused, it actually becomes a bit of a polyglot. If you fine-tune m-BERT on an English dataset, it will carry out the task not only in English, but also in the other hundred or so languages.
We translated the phrases “I love you” and “Stay away from me!” into a number of different languages, passed them to m-BERT, extracted some numbers out of its ‘brain’, and projected them into a plot in such a way that phrases with similar numbers are put closer together than phrases with less similar numbers. The plot shows that the way m-BERT ‘thinks’ about these two phrases in different languages has a stronger connection across languages than across the two different phrases, even though it was never told which phrases are translations of each other. Now that’s a keeper!
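What "similar numbers" means can be sketched in a few lines. The example below uses cosine similarity, one common way of comparing two lists of numbers. Whether the researchers used this measure is not stated here, so treat it as an assumption. The vectors are invented three-number stand-ins, not real m-BERT output, which has hundreds of numbers per phrase.

```python
import math


def cosine_similarity(a, b):
    """Return a score near 1 when two lists of numbers point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented placeholder vectors for illustration only; NOT real m-BERT values.
# Keys are (language code, phrase); "mt" stands for the Maltese translation.
vectors = {
    ("en", "I love you"): [0.9, 0.1, 0.2],
    ("mt", "I love you"): [0.8, 0.2, 0.1],
    ("en", "Stay away from me!"): [0.1, 0.9, 0.3],
    ("mt", "Stay away from me!"): [0.2, 0.8, 0.4],
}

# Same phrase in two languages versus two different phrases in one language.
same_phrase = cosine_similarity(
    vectors[("en", "I love you")], vectors[("mt", "I love you")]
)
same_language = cosine_similarity(
    vectors[("en", "I love you")], vectors[("en", "Stay away from me!")]
)

print(f"same phrase, different language: {same_phrase:.2f}")
print(f"same language, different phrase: {same_language:.2f}")
```

With these made-up numbers the first score comes out higher than the second, which is the pattern the plot shows for the real model: translations of the same phrase sit closer together than different phrases in the same language.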
A lot of modern language AI now revolves around this technique. Rather than making datasets for each language, make one dataset for one language, fine-tune m-BERT on that dataset, and your program will do your task in over a hundred languages faster than you will come across a flower vendor on Valentine’s Day.
The MUFINS project is researching ways to improve this fine-tuning process in order to make better multilingual AI. MUFINS is funded by Malta Enterprise, and is a collaboration between CityFalcon Trading Ltd and the University of Malta.
Sound bites
• Eating too much fat and sugar as a child can alter your microbiome for life, even if you later learn to eat more healthily, a recent study on mice has shown. The study noted a significant decrease in the total number and diversity of gut bacteria in mature mice fed an unhealthy diet as juveniles. The microbiome refers to all the bacteria, fungi, parasites and viruses that live on and inside a human or animal. Lower quantity and diversity make the body more susceptible to disease.
https://www.sciencedaily.com/releases/2021/02/210203090458.htm
• The fluttery flight of butterflies has so far been something of a mystery to researchers, given their unusually large and broad wings relative to their body size. Now researchers at Lund University in Sweden have studied the aerodynamics of butterflies in a wind tunnel. The results suggest that butterflies use a highly effective clap technique that makes use of their unique wings. This helps them take off rapidly when escaping predators.
https://www.sciencedaily.com/releases/2021/01/210121132059.htm
For more sound bites, listen to Radio Mocha: www.fb.com/RadioMochaMalta/
Did you know?
• There are a dozen St Valentines, plus a pope. Originating from the Latin name ‘Valentinus’, meaning worthy, strong or powerful, this was a very popular name between the second and the eighth centuries AD.
• The first known recorded reference to the romantic celebration on Valentine’s Day was in a poem Chaucer wrote around 1375. However, Chaucer is known to have taken liberties with history, placing his poetic characters into fictitious historical contexts that he represented as real.
• Chaucer’s poem refers to February 14 as the day birds (and humans) come together to find a mate. He may have been the one to invent the holiday we know today.
• Apart from being the patron saint of engaged couples and happy marriages, St Valentine is also the patron saint of beekeepers, epilepsy, fainting and travelling, as well as the plague. So, you might want to add a COVID-19 reference to that Valentine’s card!
• If you forget February 14, you can celebrate St Valentine of Viterbo on November 3, or the only female St Valentine on July 25.
For more trivia see: www.um.edu.mt/think