Technology continues to advance by leaps and bounds, drawing on several fields to explore new capabilities. One of them is the ability to "reconstruct" a person's face from a fragment of their voice.
The Speech2Face study, presented in 2019 at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), showed that an artificial intelligence (AI) can infer a person's appearance from short audio segments.
The paper explains that the goal of researchers Tae-Hyun Oh, Tali Dekel, Changil Kim, Inbar Mosseri, William T. Freeman and Michael Rubinstein, of MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), is not to reconstruct people's faces identically, but to produce an image whose physical characteristics are consistent with the analyzed audio.
To achieve this, they designed and trained a deep neural network on millions of YouTube videos of people speaking. During training, the model learned to correlate voices with faces, allowing it to produce images with physical attributes similar to those of the speakers, including age, gender and ethnicity.
The training was self-supervised, exploiting the natural co-occurrence of faces and voices in Internet videos, without the need to explicitly model detailed physical characteristics of the face.
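At a high level, the idea can be sketched as a voice encoder that is trained to predict the feature vector a pretrained face network extracts from a frame of the same video. The snippet below is only an illustrative sketch in PyTorch: the layer sizes, the 4096-dimensional feature space and the simple MSE objective are assumptions made for clarity, not the actual Speech2Face architecture.

    import torch
    import torch.nn as nn

    # Illustrative sketch: a voice encoder maps an audio spectrogram to a
    # face-feature vector, trained so that this vector matches the features
    # a pretrained face model extracts from a frame of the same video
    # (the self-supervised pairing described above).

    class VoiceEncoder(nn.Module):
        def __init__(self, feature_dim: int = 4096):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),   # pool over time and frequency
            )
            self.fc = nn.Linear(64, feature_dim)

        def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
            # spectrogram: (batch, 1, freq_bins, time_frames)
            x = self.conv(spectrogram).flatten(1)
            return self.fc(x)

    voice_net = VoiceEncoder()
    optimizer = torch.optim.Adam(voice_net.parameters(), lr=1e-4)

    # Placeholder batch: in the real setting the spectrograms come from the
    # audio track and the target features from a face model applied to a
    # frame of the same video.
    spectrogram = torch.randn(8, 1, 64, 200)
    face_features = torch.randn(8, 4096)

    predicted = voice_net(spectrogram)
    loss = nn.functional.mse_loss(predicted, face_features)
    loss.backward()
    optimizer.step()

In this kind of setup, a separate decoder can then turn the predicted face features back into an image, which is how a voice clip ends up as a rendered face.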
Because the study touches on aspects that are sensitive in terms of ethnicity as well as privacy, the researchers explained that no specific physical attributes are added to the recreated faces. They also note that, like any other machine learning system, it improves over time, since each use expands its knowledge base.
While the published tests show that Speech2Face achieves a high rate of matches between faces and voices, it also had some failures, in which the ethnicity, age or gender did not match the voice sample used.
The model is designed to capture statistical correlations between facial features and the voice. It should be remembered that the AI learned from YouTube videos, which are not a representative sample of the world's population; for some languages, for example, it shows discrepancies with the training data.
In this regard, the study itself recommends, at the end of its results, that those who go on to explore and extend the system use a wider sample of people and voices, so that the machine learning model has a broader repertoire for matching and recreating faces.
The program was also able to render the reconstructed faces as cartoons, which likewise show a striking consistency with the analyzed audio.
Because this technology could also be used for malicious purposes, the reconstruction only approximates the person's appearance and does not produce exact faces, as that could pose a problem for people's privacy. Still, what the technology can do from audio samples alone is surprising.