Audio-visual speech recognition

Audio-visual speech recognition (AVSR) is a technique that uses image processing for lip reading to help speech recognition systems identify phones that are ambiguous from the audio alone, or to decide between competing hypotheses whose probabilities are nearly equal.
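To make the idea of deciding between near-equal hypotheses concrete, here is a minimal Python sketch of one common way to weigh two streams: a weighted (log-linear) sum of per-stream log-probabilities, also called decision-level fusion. It is an illustrative assumption, not the specific method described in this article; the phone labels, the probabilities and the `audio_weight` parameter are hypothetical. The example uses /m/ and /n/, which are easy to confuse acoustically but look different on the lips.

```python
import math

def log_softmax(scores):
    """Turn unnormalised log-scores into log-probabilities (numerically stable)."""
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {label: s - log_z for label, s in scores.items()}

def fuse(audio_logp, visual_logp, audio_weight=0.5):
    """Hypothetical decision-level fusion: weighted sum of the two streams'
    log-probabilities over a shared label set, renormalised."""
    combined = {
        label: audio_weight * audio_logp[label]
        + (1.0 - audio_weight) * visual_logp[label]
        for label in audio_logp
    }
    return log_softmax(combined)

# Hypothetical posteriors: the audio stream barely separates /m/ from /n/ ...
audio = {"m": math.log(0.48), "n": math.log(0.52)}
# ... while the lips show a bilabial closure, which strongly favours /m/.
visual = {"m": math.log(0.90), "n": math.log(0.10)}

fused = fuse(audio, visual, audio_weight=0.5)
best = max(fused, key=fused.get)  # the visual evidence tips the near-tie to "m"
print(best, {label: round(math.exp(lp), 3) for label, lp in fused.items()})
```

Setting `audio_weight=1.0` ignores the video and returns /n/, the audio-only choice; in practice such a weight would be tuned, for example lowered when the audio is noisy and therefore less reliable.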

The lip-reading and speech-recognition subsystems first process their inputs separately, and their outputs are then combined at a feature-fusion stage. As the name suggests, an AVSR system therefore has two parts: an audio part and a visual part. In the audio part, features such as the log-mel spectrogram or mel-frequency cepstral coefficients (MFCCs) are extracted from the raw audio samples, and a model is built to turn them into a feature vector. In the visual part, some variant of a convolutional neural network is generally used to compress the image of the speaker's mouth region into a feature vector. The audio and visual vectors are then concatenated and used to predict the target.
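The audio part can be sketched in NumPy as follows. This is a minimal illustration, assuming 16 kHz mono audio, 512-sample Hann-windowed frames with a 160-sample hop, 40 HTK-style mel filters and 13 MFCCs; these values and all helper names are hypothetical choices, not taken from any particular system. The learned audio model is replaced by simple averaging over time so that the example stays self-contained.

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style mel scale."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fb = np.zeros((n_mels, n_bins))
    for i in range(1, n_mels + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):   # rising edge of the triangle
            fb[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge of the triangle
            fb[i - 1, k] = (right - k) / (right - centre)
    return fb

def log_mel_spectrogram(signal, sample_rate=16000, n_fft=512, hop=160, n_mels=40):
    """Frame the waveform, take each Hann-windowed frame's power spectrum,
    project it onto the mel filters and take the log."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft    # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(n_mels, n_fft, sample_rate).T   # (n_frames, n_mels)
    return np.log(mel + 1e-10)                                   # small floor avoids log(0)

def mfcc(log_mel, n_coeffs=13):
    """Orthonormal DCT-II along the mel axis; keep the first n_coeffs coefficients."""
    n_mels = log_mel.shape[1]
    n = np.arange(n_mels)
    k = np.arange(n_coeffs)[:, None]
    dct = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    dct[0] /= np.sqrt(2.0)
    return log_mel @ dct.T                                       # (n_frames, n_coeffs)

def audio_embedding(signal, sample_rate=16000):
    """Hypothetical stand-in for the learned audio model: average MFCCs over time."""
    return mfcc(log_mel_spectrogram(signal, sample_rate)).mean(axis=0)

# One second of a synthetic 440 Hz tone stands in for real speech.
sr = 16000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440.0 * t)
feats = log_mel_spectrogram(wave, sr)
print(feats.shape, mfcc(feats).shape, audio_embedding(wave, sr).shape)
```

Libraries such as librosa provide tested implementations of these features; the hand-written version only shows what the log-mel and MFCC computations consist of.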
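The visual part and the concatenation step can be sketched in the same way. The sketch below assumes a single 32x32 grayscale mouth crop, 8 random 3x3 kernels, a 13-dimensional audio vector and 5 output classes (all hypothetical). It uses one convolution, a ReLU, 2x2 max-pooling and global average pooling as a stand-in for a real CNN, then concatenates the resulting visual vector with the audio vector and applies a linear layer and softmax. Nothing is trained, so the predicted class is meaningless; the point is the data flow and the shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernels):
    """'Valid' 2-D cross-correlation of one grayscale image with a bank of kernels.
    image: (H, W); kernels: (C, k, k) -> output (C, H-k+1, W-k+1)."""
    _, k, _ = kernels.shape
    # Every k x k patch of the image: shape (H-k+1, W-k+1, k, k).
    patches = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    return np.einsum("ijab,cab->cij", patches, kernels)

def visual_embedding(mouth_crop, kernels):
    """Tiny stand-in for a CNN encoder: conv -> ReLU -> 2x2 max-pool -> global average pool."""
    fmap = np.maximum(conv2d(mouth_crop, kernels), 0.0)              # ReLU
    c, h, w = fmap.shape
    fmap = fmap[:, : h - h % 2, : w - w % 2]                          # crop to even size
    pooled = fmap.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))  # 2x2 max-pool
    return pooled.mean(axis=(1, 2))                                   # (C,) feature vector

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse_and_predict(audio_vec, visual_vec, weights, bias):
    """Feature fusion: concatenate both vectors, then a linear layer and softmax."""
    fused = np.concatenate([audio_vec, visual_vec])
    return softmax(weights @ fused + bias)

# Hypothetical sizes; all parameters are random and untrained.
n_audio, n_channels, n_classes = 13, 8, 5
kernels = rng.standard_normal((n_channels, 3, 3))
mouth = rng.random((32, 32))                 # placeholder for a real mouth crop
audio_vec = rng.standard_normal(n_audio)     # placeholder for a real audio feature vector
visual_vec = visual_embedding(mouth, kernels)
W = rng.standard_normal((n_classes, n_audio + n_channels)) * 0.1
b = np.zeros(n_classes)
probs = fuse_and_predict(audio_vec, visual_vec, W, b)
print(visual_vec.shape, probs.shape, int(probs.argmax()))
```

In a real system the encoders and the classifier would typically be trained on labelled audio-visual data, and the visual encoder would process a sequence of frames rather than a single image.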