Where did that sound come from?

The human brain is finely tuned not only to recognize particular sounds, but also to determine which direction they came from. By comparing differences between sounds arriving at the right and left ears, the brain can estimate the location of a barking dog, a wailing fire engine, or an approaching car.

MIT neuroscientists have now developed a computer model that can also perform that complex task. The model, which consists of several convolutional neural networks, not only localizes sounds as well as humans do, but also struggles in the same ways that humans do.

“We now have a model that can actually localize sounds in the real world,” says Josh McDermott, an associate professor of brain and cognitive sciences and a member of MIT’s McGovern Institute for Brain Research. “And when we treated the model like a human experimental participant and simulated this large set of experiments that people had run on humans in the past, what we found over and over again is that the model recapitulates the results that you see in humans.”

Findings from the new study also suggest that humans’ ability to localize sounds is adapted to the specific challenges of our environment, says McDermott, who is also a member of MIT’s Center for Brains, Minds, and Machines.

McDermott is the senior author of the paper, which appears in Nature Human Behaviour. The paper’s lead author is Andrew Francl, an MIT graduate student.

Modeling localization

When we hear a sound such as a train whistle, the sound waves reach our right and left ears at slightly different times and intensities, depending on which direction the sound is coming from. Parts of the midbrain are specialized to compare these slight differences and estimate where the sound originated, a task known as localization.
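
The timing cue described above can be illustrated with a short sketch. The `estimate_itd` helper below is my own illustration, not code from the study: it recovers the interaural time difference by finding the lag that maximizes the cross-correlation of the two ear signals.

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the interaural time difference (in seconds) as the lag
    that maximizes the cross-correlation of the two ear signals.
    A positive value means the left-ear signal lags, i.e. the sound
    reached the right ear first."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)  # lag in samples
    return lag / fs

# Toy example: broadband noise reaching the right ear ~0.5 ms earlier,
# roughly what a source off to the listener's right would produce.
fs = 44100
delay = int(0.0005 * fs)  # 22 samples
rng = np.random.default_rng(0)
right = rng.standard_normal(2048)
left = np.concatenate([np.zeros(delay), right[:-delay]])
print(estimate_itd(left, right, fs))  # ~0.0005 (22 samples / 44100 Hz)
```

Cross-correlation works well here because the noise is broadband; for a pure tone, the lag is ambiguous up to a whole period, which is one reason timing cues are most useful at low frequencies.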

This task becomes substantially harder in real-world conditions, where the environment produces echoes and many sounds are heard at once.

Scientists have long sought to build computer models that can perform the same kind of calculations the brain uses to localize sounds. These models sometimes work well in idealized settings with no background noise, but never in real-world environments, with their noises and echoes.

To develop a more sophisticated model of localization, the MIT team turned to convolutional neural networks. This kind of computer modeling has been used extensively to model the human visual system, and more recently, McDermott and other scientists have begun applying it to audition as well.

Convolutional neural networks can be built with many different architectures, so to help find the ones that would work best for localization, the MIT team used a supercomputer that allowed them to train and test about 1,500 different models. That search identified 10 that seemed best suited to the task, which the researchers trained further and used for their subsequent studies.

To train the models, the researchers created a virtual world in which they could control the size of the room and the reflective properties of its walls. All of the sounds fed to the models originated from somewhere in one of these virtual rooms. The set of more than 400 training sounds included human voices, animal sounds, machine sounds such as car engines, and natural sounds such as thunder.
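
As a rough illustration of that setup, the sketch below (my own, not the researchers' code) renders a dry sound in a "room" with a single wall echo. A real room simulator, for instance one using the image-source method, would sum many such reflections, with delays and gains determined by the room size and wall reflectivity that the researchers varied.

```python
import numpy as np

def render_in_room(dry, fs, direct_delay_s, echo_delay_s, wall_reflectivity):
    """Mix a direct path with one attenuated, delayed wall reflection.
    All parameter names are illustrative, not from the study."""
    d0 = int(direct_delay_s * fs)  # direct-path delay, in samples
    d1 = int(echo_delay_s * fs)    # echo delay, in samples (d1 >= d0)
    out = np.zeros(len(dry) + d1)
    out[d0:d0 + len(dry)] += dry[:len(out) - d0]                      # direct path
    out[d1:d1 + len(dry)] += wall_reflectivity * dry[:len(out) - d1]  # echo
    return out

rng = np.random.default_rng(2)
dry = rng.standard_normal(100)
wet = render_in_room(dry, fs=1000, direct_delay_s=0.0,
                     echo_delay_s=0.01, wall_reflectivity=0.4)
print(len(wet))  # 110: original 100 samples plus a 10-sample echo tail
```

Training on sounds rendered this way, rather than on clean recordings, is what let the models cope with the echoes that defeat idealized localization algorithms.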

The researchers also made sure the models started with the same information provided by human ears. The outer ear, or pinna, has many folds that reflect sound, altering the frequencies that enter the ear, and these reflections vary depending on where the sound comes from. The researchers simulated this effect by running each sound through a specialized mathematical function before it went into the computer model.

“This allows us to give the model the same kind of information that a person would have,” Francl says.
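
A crude way to picture that preprocessing step: each direction applies a different filter to the incoming sound, so the signal arriving at the eardrum itself carries location information. The function and numbers below are illustrative stand-ins; the actual transformation uses measured head-related transfer functions, not this toy two-tap filter.

```python
import numpy as np

def ear_filter(sound, direction_deg):
    """Toy direction-dependent ear filtering: a direct path plus a faint
    'pinna reflection' whose lag and gain (hypothetical values) change
    with source direction, mimicking how pinna folds color the sound."""
    delay = 1 + int(abs(direction_deg) / 30)    # hypothetical reflection lag
    gain = 0.3 if direction_deg >= 0 else 0.15  # hypothetical reflection gain
    kernel = np.zeros(delay + 1)
    kernel[0] = 1.0
    kernel[delay] = gain
    return np.convolve(sound, kernel)

rng = np.random.default_rng(1)
sound = rng.standard_normal(1024)
# The same source sounds spectrally different from different directions,
# which is the cue the model receives before any learning takes place.
front = ear_filter(sound, 0)
side = ear_filter(sound, 90)
print(front.shape, side.shape)
```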

After training the models, the researchers tested them in a real-world environment. They placed a mannequin with microphones in its ears in an actual room, played sounds from different directions, and then fed those recordings into the models. The models performed very similarly to humans when asked to localize these sounds.

“Although the model was trained in a virtual world, when we evaluated it, it could localize sounds in the real world,” Francl says.

Similar patterns

The researchers then subjected the models to a series of tests that scientists have used in the past to study humans’ localization abilities.

In addition to analyzing the difference in arrival time at the right and left ears, the human brain also bases its location judgments on differences in the intensity of sound reaching each ear. Previous studies have shown that the success of both of these strategies varies depending on the frequency of the incoming sound. In the new study, the MIT team found that the models showed the same pattern of sensitivity to frequency.

“The model seems to use timing and level differences between the two ears in the same way that people do, in a way that’s frequency-dependent,” McDermott says.
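
The level cue can be sketched in a few lines. This is an illustrative calculation, not the study's code: the interaural level difference (ILD) is simply the ratio of signal power at the two ears, expressed in decibels. The 0.5 attenuation below is a made-up head-shadow factor; in reality the shadow, and hence the ILD, is strong mainly at high frequencies, which is the frequency dependence described above.

```python
import numpy as np

def ild_db(left, right):
    """Interaural level difference in dB (positive: louder at the left ear)."""
    def rms(x):
        return np.sqrt(np.mean(x ** 2))
    return 20 * np.log10(rms(left) / rms(right))

fs = 44100
t = np.arange(0, 0.05, 1 / fs)
tone = np.sin(2 * np.pi * 4000 * t)  # 4 kHz tone, where head shadow is strong
left, right = tone, 0.5 * tone       # far ear attenuated by a made-up factor
print(round(ild_db(left, right), 1))  # 6.0
```

Halving the amplitude at the far ear yields 20·log10(2) ≈ 6 dB, a level difference well within the range the brain, and apparently the model, can exploit.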

The researchers also found that when they made the localization task harder, by adding multiple sound sources playing at the same time, the computer models’ performance declined in a way that closely mimicked human performance under the same circumstances.

“As you add more and more sources, you get a specific pattern of decline in humans’ ability to accurately judge the number of sources present, and their ability to localize those sources,” Francl says. “Humans seem to be limited to localizing about three sources at once, and when we ran the same test on the model, we saw a really similar pattern of behavior.”

Because the researchers used a virtual world to train their models, they were also able to explore what happens when models learn to localize in different kinds of unnatural conditions. They trained one set of models in a virtual world with no echoes, and another in a world where no more than one sound was ever heard at a time. In a third, the models were exposed only to sounds with narrow frequency ranges, instead of naturally occurring sounds.

When the models trained in these unnatural worlds were evaluated with the same battery of behavioral tests, they deviated from human behavior, and the ways in which they failed varied depending on the environment they had been trained in. These results support the idea, the researchers say, that the human brain’s localization abilities are adapted to the environments in which humans evolved.

The researchers are now applying this type of modeling to other aspects of audition, such as pitch perception and speech recognition, and believe it could also be used to understand other cognitive phenomena, such as the limits on what a person can pay attention to or remember, McDermott says.

The research was funded by the National Science Foundation and the National Institute on Deafness and Other Communication Disorders.

