Last week I introduced the concept of a language model. These models are the backbone of many language-understanding tasks and are used extensively to solve everyday problems. We trained BERTu, our Maltese language model, on several tasks. We will look at two of these – Sentiment Analysis and Named-Entity Recognition – to better understand how these models are able to perform a multitude of different tasks.
Sentiment Analysis is the process of identifying the sentiment of a given text. The simplest form is classifying whether a piece of text conveys a positive or negative sentiment with respect to some topic or concept. For example, given Malta’s budget announcements, is this comment supportive of or opposed to the announcements made? This type of task is called a classification problem, because for each input text we output a classification label (positive or negative in this example).
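As a rough illustration, this is how a fine-tuned sentiment classifier might be queried in practice using the Hugging Face transformers library; the model identifier and example text below are placeholders rather than our actual setup.

```python
# Minimal sketch: running a fine-tuned sentiment classifier with the
# Hugging Face `transformers` pipeline. The model identifier is a placeholder.
from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="your-org/maltese-sentiment-model")

# Classify a comment about the budget announcements (placeholder text).
result = classifier("Comment text about the budget goes here.")
print(result)  # e.g. [{'label': 'positive', 'score': 0.97}]
```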
Named-Entity Recognition is a tagging task, where we output a label for each word in the input text. Given an input text, the task is to identify which words refer to named entities and what type of entity they are. Compared to sentiment analysis, this task is quite low-level and would typically be used to complement other language systems. For example, we could use the predicted labels to identify person names and anonymise them, in order to comply with data protection laws.
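Again as a sketch only, a fine-tuned token-classification model can be queried in much the same way; the model name below is a placeholder and the output format assumes the standard transformers pipeline.

```python
# Minimal sketch: named-entity tagging with a fine-tuned token-classification
# model via the `transformers` pipeline. The model identifier is a placeholder.
from transformers import pipeline

tagger = pipeline("ner", model="your-org/maltese-ner-model", aggregation_strategy="simple")

entities = tagger("Kurt Micallef studies at the University of Malta.")
for entity in entities:
    # Each entry gives the entity type (e.g. PER, ORG), the matched text and a confidence score.
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 2))
```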
Language models like BERTu are often called pre-trained models, since they already have some general notion of what a language is, but don’t know the specifics of the task you’re trying to solve. We can, however, adapt BERTu to the target task, a process called fine-tuning. To fine-tune BERTu on these tasks, we added an additional layer on top of the model for each task, and then ran standard machine learning algorithms on the dataset to learn the parameters of this additional layer. These datasets are referred to as labelled data, where each piece of text is annotated, typically manually by a human.
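A rough sketch of this step, assuming BERTu is loaded through the Hugging Face transformers library (the MLRS/BERTu identifier and the two-label setup are assumptions for illustration, not our exact training recipe):

```python
# Sketch of fine-tuning: load the pre-trained encoder and attach a new task layer.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "MLRS/BERTu"  # assumed Hugging Face identifier for BERTu
tokenizer = AutoTokenizer.from_pretrained(model_name)

# from_pretrained loads the pre-trained weights and adds a randomly initialised
# classification layer with two outputs (positive / negative) on top of them.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Standard supervised training (e.g. transformers' Trainer or a plain PyTorch loop)
# then learns the parameters of this new layer from the labelled dataset.
```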
But why do we pre-train models when we still need to train them on labelled data? After all, up to a few years ago, we would train a model for a given task on the labelled data in a similar way.
The downside of those models is that they use task-specific architectures and can become quite costly to train, both in terms of computing power and in terms of the amount of labelled data needed. Pre-trained language models ease both requirements, since the fine-tuning architecture is simpler and the underlying language model is already pre-trained.
More importantly, the resulting models perform significantly better than previous systems. In our experiments on these tasks, BERTu outperforms other models, at times by more than 20 per cent. Of course, there will always be room for improvement, but these language systems have become so reliable at these kinds of tasks that the amount of human correction needed is drastically reduced.
An added benefit of such models is that you can easily “plug and play” different fine-tuning models depending on the problem you’re trying to solve, while keeping the core language model the same. Having a model like BERTu opens up many possibilities to explore more complex language-understanding tasks in Maltese that weren’t feasible before.
Kurt Micallef is a doctoral student with the Department of Artificial Intelligence at the University of Malta within the NLP Research Group. This work is partially funded by MDIA under the Malta National AI Strategy and LT-Bridge, a H2020 Project. For more information about our work, see https://lt-bridge.eu/ or e-mail nlp.research@um.edu.mt.
DID YOU KNOW?
• The human eye can differentiate approximately 10 million different colours.
• Our eyes remain the same size throughout our lives.
• Eyes are made up of over two million working parts.
• Your eye is the fastest muscle in your body, hence the phrase ‘in the blink of an eye’.
For more trivia, see: www.um.edu.mt/think.