Think of all the data sources within public administration services that include your personal information, be it bank account details, financial or medical records, or tax information. We often take it for granted that our data is safe and protected. However, what happens when this information is shared among different public administration entities?

In reality, the General Data Protection Regulation (GDPR) safeguards the general public by limiting what data can be shared and by requiring that data be anonymised before it is shared among different entities, including those within the public administration.

The Multilingual Anonymisation for Public Administration (MAPA) Project is a European-funded project which is developing an open-source toolkit that enables effective and reliable text anonymisation, focusing on the medical and legal domains. This toolkit uses Artificial Intelligence and Natural Language Processing tools to anonymise data, a necessary step for public administration to be GDPR compliant when sharing data.

The MAPA toolkit uses Named Entity Recognition (NER) to recognise entities in text, such as names and dates of birth, and then de-identifies the recognised entities, that is, it removes directly identifying expressions from the text. Another important aspect of this toolkit is that it integrates with the eTranslation system of the European Commission, so that it can handle all the languages catered for by eTranslation.

The role of the University of Malta in this project was primarily focused on providing data with labelled entities in Maltese. This is the first substantial dataset for NER for Maltese which goes beyond the traditional NER datasets. Most entities are labelled in a fine-grained manner, providing more options for the toolkit when it comes to de-identification.

Once the system is trained, the toolkit can offer either data masking (hiding the values that need to be anonymised) or pseudonymisation (replacing private identifiers with fake identifiers). The fine-grained training allows the entity that wants to share its data with another entity to select the level of anonymisation. The image shows a sample table of data that needs to be shared between public administration entities. In both cases, names, social security numbers, e-mail addresses and job titles were replaced by fictitious entities.
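The difference between masking and pseudonymisation can be illustrated with a minimal Python sketch. It is not the MAPA toolkit's code: the field names, the sample record, the placeholder and the hashing scheme are all assumptions made purely for illustration.

```python
import hashlib

# Hypothetical set of sensitive column names; not taken from the MAPA toolkit.
SENSITIVE_FIELDS = {"name", "social_security_number", "email", "job_title"}


def mask_record(record, sensitive=SENSITIVE_FIELDS, placeholder="*****"):
    """Data masking: hide every sensitive value behind a fixed placeholder."""
    return {
        key: (placeholder if key in sensitive else value)
        for key, value in record.items()
    }


def pseudonymise_record(record, sensitive=SENSITIVE_FIELDS, secret="demo-secret"):
    """Pseudonymisation: swap each sensitive value for a consistent fake identifier.

    The fake identifier is derived from a keyed hash (an illustrative choice),
    so the same input always maps to the same pseudonym.
    """
    result = {}
    for key, value in record.items():
        if key in sensitive:
            digest = hashlib.sha256(f"{secret}:{key}:{value}".encode("utf-8")).hexdigest()
            result[key] = f"{key.upper()}_{digest[:8]}"
        else:
            result[key] = value
    return result


# Invented sample record, for illustration only.
record = {
    "name": "Maria Borg",
    "social_security_number": "0123456M",
    "email": "maria.borg@example.org",
    "job_title": "Clerk",
    "salary": 25000,
}

print(mask_record(record))
print(pseudonymise_record(record))
```

In this sketch the salary is left untouched in both outputs, while the identifying fields are either hidden or replaced. Because the pseudonyms are consistent, records belonging to the same person can still be linked without revealing who that person is.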

The main distinction between the two cases lies in the scenario where sharing salary information is essential to the data-sharing process, yet the salary must still not be traceable to the original person. Thus, information pertaining to a particular person remains completely private. The anonymisation process becomes even more challenging, however, when the data is in the form of free text, for example a medical summary related to a specific patient. In this case, the anonymisation software needs to process the running text and anonymise the relevant spans, which correspond to words or phrases.
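To give a flavour of what anonymising spans in running text involves, here is a minimal Python sketch. It rests on a loud assumption: two simple regular expressions stand in for a trained NER model, which is what a toolkit such as MAPA would actually use. The labels, patterns and sample sentence are hypothetical.

```python
import re

# Toy patterns standing in for a trained NER model (an assumption for illustration).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def find_spans(text):
    """Return (start, end, label) character spans for every detected entity."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def anonymise(text, spans):
    """Replace each span with its label in square brackets.

    Spans are processed from right to left so that the character offsets of
    earlier spans remain valid. Overlapping spans are not handled here.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


# Invented sample sentence, for illustration only.
summary = "Patient born on 03/04/1980 can be reached at jane.doe@example.org."
print(anonymise(summary, find_spans(summary)))
```

Separating detection (finding the spans) from replacement (deciding what to put in their place) means the same detected spans could be masked, labelled or pseudonymised, depending on the level of anonymisation required.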

MAPA is in its final phase, and the toolkit developed will be made available as open-source software, including to all public administrations, thus facilitating data sharing while retaining GDPR compliance and protecting the individual’s privacy.

The MAPA project is an INEA-funded Action for the European Commission under the Connecting Europe Facility (CEF) – Telecommunications Sector with Grant Agreement No INEA/CEF/ICT/A2019/1927065.

Michela Vella and Raffaello Bezzina are research support officers with the Institute of Linguistics and Language Technologies. The project is led by Prof. Albert Gatt, with Dr Claudia Borg, Prof. Lonneke van der Plas and Mike Rosner. For more information, visit https://mapa-project.eu/.

Sound Bites

• Scientists have pinpointed the part of the DNA that makes us human. They analysed the DNA from reprogrammed stem cells which developed into brain cells for humans and chimpanzees (the closest living relative in evolutionary terms). They observed that the structural variant of the DNA was found in a part previously thought to be “junk DNA”, a long repetitive DNA string that was thought to have no function.

https://www.sciencedaily.com/releases/2021/10/211008105736.htm

• Scientists don’t really know what kills many cancer patients, but fruit fly research could provide answers. By following flies with tumours up to the point of death, researchers have discovered chemicals produced by tumours that shorten lifespan, apart from the damage done locally to critical organs. This suggests a novel strategy for extending a healthy lifespan in those with a cancer burden: block the tumour-generated chemicals and the damage they do.

https://www.sciencedaily.com/releases/2021/09/210916131326.htm

For more soundbites, listen to Radio Mocha www.fb.com/RadioMochaMalta/.
