Date: 2019-06-30

“You are in a park with trees and a dog is running after a ball.” Computer software can now write descriptions like this one automatically from a photo. Such software has several applications, for example helping people with impaired vision, who can listen to a description of a photo. It can also support other useful computer features, such as understanding the content of photos and translating visual information into language.

The photo shows an automatically generated description of a photo. The description was generated through software developed for research in the field of language and vision processing.

A computer can learn the task of generating image descriptions automatically when it is given examples of photos paired with their descriptions. The computational technique used in this research is called an artificial neural network, a type of program that performs different tasks depending on how a set of ‘knobs’ in the program are set. By turning these knobs, the neural network can be made to do different things, such as recognising an object in a photo or translating an English sentence into French. Not all knob positions result in a useful function, but a computer can try different positions and search for what works best.
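The idea of searching for good knob positions can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two-knob `network`, the example data and the simple random search are hypothetical, and real neural networks have far more knobs and are normally tuned with more efficient methods than random trial and error.

```python
import random

def network(knobs, x):
    # A tiny "program" whose behaviour depends on two knobs (hypothetical).
    return knobs[0] * x + knobs[1]

# Made-up examples of inputs and desired outputs (they follow y = 2x + 1).
EXAMPLES = [(0, 1), (1, 3), (2, 5), (3, 7)]

def error(knobs):
    # How far the network's outputs are from the desired outputs.
    return sum((network(knobs, x) - y) ** 2 for x, y in EXAMPLES)

def search(steps=5000, seed=0):
    # Try small random changes to the knobs and keep the ones that help.
    rng = random.Random(seed)
    best = [0.0, 0.0]
    best_error = error(best)
    for _ in range(steps):
        candidate = [k + rng.uniform(-0.1, 0.1) for k in best]
        candidate_error = error(candidate)
        if candidate_error < best_error:
            best, best_error = candidate, candidate_error
    return best, best_error

knobs, final_error = search()
print("knobs:", knobs, "error:", final_error)
```

The search starts with both knobs at zero and only ever accepts a change that reduces the error, so the knobs drift towards positions that reproduce the examples.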

To generate a sentence, you give the neural network the beginning part of a sentence, such as ‘A man is…’, and let the neural network suggest words that can follow the partial sentence, such as ‘walking’ or ‘throwing’. You then pick one of these suggested words and add it to the partial sentence. The neural network will continue suggesting following words as the sentence builds up until, eventually, you have a complete sentence. Research has been looking into ways of providing the neural network with a photo together with the partial sentence so that it can then generate sentences that describe the photo.
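The word-by-word procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: `TOY_SUGGESTIONS` is a made-up lookup table standing in for a trained neural network, the probabilities are invented, and the `<start>` and `<end>` markers are hypothetical conventions for where a sentence begins and ends.

```python
import random

# Hypothetical stand-in for a trained neural network: for the last word of
# the partial sentence, it lists possible next words with made-up scores.
TOY_SUGGESTIONS = {
    "<start>": {"a": 1.0},
    "a": {"man": 0.6, "dog": 0.4},
    "man": {"is": 1.0},
    "dog": {"is": 1.0},
    "is": {"walking": 0.5, "throwing": 0.3, "running": 0.2},
    "walking": {"<end>": 1.0},
    "throwing": {"<end>": 1.0},
    "running": {"<end>": 1.0},
}

def suggest_next_words(partial_sentence):
    # A real network would look at the whole partial sentence (and a photo);
    # this toy version only looks at the last word.
    return TOY_SUGGESTIONS[partial_sentence[-1]]

def generate_sentence(pick="most_likely", max_words=10, seed=0):
    rng = random.Random(seed)
    sentence = ["<start>"]
    while len(sentence) <= max_words:
        suggestions = suggest_next_words(sentence)
        if pick == "most_likely":
            # Always take the highest-scoring suggestion.
            word = max(suggestions, key=suggestions.get)
        else:
            # Pick at random, favouring higher-scoring suggestions.
            words = list(suggestions)
            weights = [suggestions[w] for w in words]
            word = rng.choices(words, weights=weights, k=1)[0]
        if word == "<end>":
            break
        sentence.append(word)
    return " ".join(sentence[1:])

print(generate_sentence())  # prints "a man is walking"
```

Always taking the top suggestion gives the same sentence every time, while picking at random (`pick="sample"`) gives more varied sentences.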

But where should the photo be included in the neural network? Should it be mixed in with the words of the partial sentence, or should the language (sentence) aspect and the vision (photo) aspect be processed by separate networks, which are merged at the end? This was the main question I asked in my doctoral research. I found that while mixing words with visual information results in better descriptions, leaving them separate lets you use pre-trained neural networks, one for processing vision and another for language, which results in even better performance.
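The difference between the two designs can be shown as a rough sketch of where the data flows. This is not the software used in the research: `toy_rnn` uses fixed arithmetic in place of learned knobs, the photo and word vectors are made-up numbers, and the function names `mix_early` and `merge_late` are hypothetical labels for the two options.

```python
import math

def toy_rnn(vectors, size=3):
    # Toy sequence reader: folds a list of vectors into one summary state.
    # Fixed, untrained arithmetic stands in for learned knobs.
    state = [0.0] * size
    for vec in vectors:
        total = sum(vec)
        state = [math.tanh(0.5 * s + 0.1 * total + 0.01 * i)
                 for i, s in enumerate(state)]
    return state

def mix_early(image, words):
    # The photo is attached to every word before the sequence is read,
    # so the language part always depends on the photo.
    mixed = [word + image for word in words]  # joins the two lists
    return toy_rnn(mixed)

def merge_late(image, words):
    # The words are read on their own; the photo only joins at the end.
    language_state = toy_rnn(words)
    return language_state + image  # joins the two lists

words = [[1.0, 0.0], [0.0, 1.0]]  # made-up vectors for a partial sentence
photo_a = [0.2, 0.7]              # made-up features for two photos
photo_b = [0.9, 0.1]

# With late merging, the language part is identical whatever the photo is.
print(merge_late(photo_a, words)[:3] == merge_late(photo_b, words)[:3])  # True
# With early mixing, changing the photo changes the whole result.
print(mix_early(photo_a, words) == mix_early(photo_b, words))  # False
```

Because the language part in `merge_late` never sees the photo, it could in principle be prepared in advance on text alone, which is the property that makes pre-trained networks usable in the separate design.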

Further research will look at how these techniques can be applied in the field of robotics and aim at creating software that is able to describe images more accurately.

The research work disclosed in this publication is partially funded by the Endeavour Scholarship Scheme (Malta). Scholarships are part-financed by the European Union – European Social Fund (ESF) – Operational Programme II – Cohesion Policy 2014-2020 ‘Investing in human capital to create more opportunities and promote the well-being of society’.

Marc Tanti carried out his doctorate studies with the Institute of Linguistics and Language Technologies at the University of Malta and will graduate this coming November.

Did you know?

• Sunflowers are known as hyperaccumulators – they take in high amounts of toxic chemicals or materials and store them in their stems and leaves. They have been used successfully as part of the post-Fukushima disaster clean-up efforts.

• You cannot fold a paper in half more than eight times. This is because with every fold, the paper thickness doubles. If the paper you are folding happens to be the size of a football field and you have a giant rolling pin to flatten it out, you might manage to fold it 11 times.

• Scientists have discovered that the Turritopsis dohrnii jellyfish can revert to its juvenile polyp stage after maturing and can repeat this cycle indefinitely, which is why it is described as the only known immortal creature.

• The speed at which the Earth rotates is not fixed; it is slowing over time. The length of a day will become 25 hours… in about 175 million years.

For more trivia see: www.um.edu.mt/think

Sound bites

• New computer model simulations show that, as temperatures rise, the ice sheet in the West Antarctic will disintegrate and a large fraction of the ice will enter the Southern Ocean in the form of icebergs. This will cool and freshen the warmer and denser ocean water, with the overall effect of slowing down both warming in the Southern Hemisphere and sea-level rise.

https://www.sciencedaily.com/releases/2019/08/190812172328.htm

• Every several hundred thousand years or so, Earth’s magnetic field dramatically shifts and reverses its polarity. Geologists found that the most recent field reversal, some 770,000 years ago, took at least 22,000 years to complete. That’s several times longer than previously thought, and the results further call into question controversial findings that some reversals could occur within a human lifetime.

https://www.sciencedaily.com/releases/2019/08/190808091416.htm

For more soundbites listen to Radio Mocha on Mondays at 7pm on Radju Malta and Thursdays at 4pm on Radju Malta 2 https://www.fb.com/RadioMochaMalta/