Spatial Speech Translation consists of two AI models, the first of which divides the space surrounding the person wearing the headphones into small regions and uses a neural network to search for potential speakers and pinpoint their direction.
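As a rough illustration (not the researchers’ actual code), that search step might look something like the Python sketch below, in which the number of regions, the beamforming, and the speech-scoring function are all placeholder assumptions:

```python
# Hypothetical sketch: divide the space around the wearer into angular
# regions, steer the headphone microphones toward each one, and score
# each region for the presence of an active speaker.
import numpy as np

N_REGIONS = 24  # assumption: 15-degree azimuth bins around the wearer
REGION_ANGLES = np.linspace(0.0, 360.0, N_REGIONS, endpoint=False)

def beamform(stereo_frame: np.ndarray, angle_deg: float) -> np.ndarray:
    """Steer the two-microphone signal toward one direction.
    A real system would use the headphones' actual microphone geometry;
    this placeholder just shifts one channel by a toy, angle-dependent delay."""
    left, right = stereo_frame  # shape (2, n_samples)
    delay = int(abs(np.cos(np.radians(angle_deg))) * 4)
    return left + np.roll(right, delay)

def speech_probability(signal: np.ndarray) -> float:
    """Stand-in for the neural network that scores whether a
    candidate region contains a speaker."""
    return float(np.clip(np.std(signal) * 10, 0.0, 1.0))

def find_speakers(stereo_frame: np.ndarray, threshold: float = 0.5) -> list:
    """Return the directions (in degrees) where a speaker is likely."""
    return [angle for angle in REGION_ANGLES
            if speech_probability(beamform(stereo_frame, angle)) > threshold]
```

In the actual system, the scoring function would be the trained neural network described above rather than a simple energy heuristic.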
The second model then translates the speakers’ words from French, German, or Spanish into English text using publicly available datasets. The same model extracts the unique characteristics and emotional tone of each speaker’s voice, such as its pitch and amplitude, and applies those properties to the text, essentially creating a “cloned” voice. This means that when the translated version of a speaker’s words is relayed to the headphone wearer a few seconds later, it sounds as if it’s coming from the speaker’s direction, and the voice sounds much like the speaker’s own rather than a robotic-sounding computer.
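To make the flow of that second stage concrete, here is a deliberately simplified Python sketch. Every function is a stub standing in for a real model, and none of the names come from the project itself:

```python
# Hypothetical pipeline: translate to English text, re-synthesize it with
# the speaker's vocal characteristics, then render it from their direction.
from dataclasses import dataclass

@dataclass
class VoiceProfile:
    pitch_hz: float    # fundamental frequency taken from the speaker's voice
    amplitude: float   # loudness to preserve in the cloned output

def translate_to_english(text: str, source_lang: str) -> str:
    # Stand-in for a translation model trained on public datasets.
    return f"[{source_lang}->en] {text}"

def synthesize(text: str, profile: VoiceProfile) -> bytes:
    # Stand-in for a TTS model that applies the cloned pitch and amplitude.
    return text.encode("utf-8")

def spatialize(audio: bytes, direction_deg: float) -> bytes:
    # Stand-in for binaural rendering from the speaker's direction.
    return audio

def relay(utterance: str, lang: str,
          profile: VoiceProfile, direction_deg: float) -> bytes:
    english = translate_to_english(utterance, lang)
    cloned = synthesize(english, profile)
    return spatialize(cloned, direction_deg)

# Example: a French sentence relayed as spatialized, voice-cloned English.
audio_out = relay("Bonjour tout le monde", "fr",
                  VoiceProfile(pitch_hz=180.0, amplitude=0.7),
                  direction_deg=45.0)
```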
Given that separating human voices is hard enough for AI systems, being able to incorporate that ability into a real-time translation system, map the distance between the wearer and each speaker, and achieve decent latency on a real device is impressive, says Samuel Cornell, a postdoctoral researcher at Carnegie Mellon University’s Language Technologies Institute, who was not involved in the project.
“Real-time speech-to-speech translation is incredibly hard,” he says. “Their results are very good in limited test settings. But for a real product, one would need much more training data, possibly with noise and real-world recordings from the headset, rather than relying purely on synthetic data.”
Gollakota’s team is now focusing on reducing the amount of time it takes for the AI translation to kick in after a speaker says something, which will allow for more natural-sounding conversations between people who speak different languages. “We really want to get that latency down significantly, to less than a second, so that you can still have the feel of a conversation,” says Gollakota.
This remains a major challenge, because how quickly an AI system can translate one language into another depends on the languages’ structure. Of the three languages Spatial Speech Translation was trained on, the system was fastest at translating French into English, followed by Spanish and then German, which reflects how German, unlike the other languages, places a sentence’s verbs and much of its meaning at the end rather than at the beginning, says Claudio Fantinuoli, a researcher at the Johannes Gutenberg University Mainz in Germany. In a German perfect-tense sentence such as “Ich habe das Buch gelesen,” for example, the main verb (“gelesen”) arrives only at the very end, so the system must wait longer before it can commit to an English rendering.
Reducing latency could make translations less accurate, he warns: “The longer you wait [before translating], the more context you have, and the better the translation will be. It’s a balancing act.”