iFLYTEK wins its third consecutive CHiME championship

2022-08-25


Winning the championship three times in a row

On May 4th Youth Day, the organizing committee of the international multi-channel speech separation and recognition challenge (CHiME) announced the results of the latest edition, CHiME-6:

the joint team of iFLYTEK and the National Engineering Laboratory for Speech and Language Information Processing at the University of Science and Technology of China (USTC-NELSLIP) won first place in both rankings of the multi-channel speech recognition task with given speaker boundaries

Breaking its own record

Since 2016, iFLYTEK has entered this international competition three times and won the championship each time. This time, the speech recognition error rate dropped from 46.1% in CHiME-5 to 30.5%.

Report | iFLYTEK swept all the championships of CHiME-5

Good news | iFLYTEK won three championships in CHiME-4

CHiME-6: known as the most difficult speech recognition task in history

As with CHiME-5, the audio used in the CHiME-6 competition covers everyday scenes in which several people chat while cooking in the kitchen, while eating in the dining room, and while relaxing in the living room. This brings four main difficulties:

a large amount of overlapping speech

far-field reverberation and noise interference degrading the recordings

a very free, almost random conversational style

limited training data

CHiME-6 audio samples are collected from multi-person conversations in the kitchen, dining room, living room, and other scenes

Track 1 of this competition is the same as in CHiME-5: with speaker boundaries given, it tests a team's multi-channel signal processing and complex-scene speech recognition abilities. The newly added Track 2 requires participating institutions to perform speech recognition on top of automatic speaker diarization.

In the 2018 CHiME-5 competition, the best participating system still had a speech recognition error rate as high as 46.1%, far from practical use. This year, the iFLYTEK joint team focused on Track 1, hoping to further explore how practical speech recognition in complex scenes could become.

Through the team's technical breakthroughs, the speech recognition error rate on this task was reduced from 46.1% to 30.5%, significantly refreshing the best result in the history of the event. The team ultimately won both sub-tasks of Track 1 (Ranking A, which requires use of the official language model, and Ranking B, which places no restriction on the language model).
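For a sense of scale, here is a quick calculation with the two figures quoted above; the 46.1% and 30.5% error rates come from the article, while the relative-reduction framing is added for illustration:

```python
# Word error rates quoted in the article (in percent).
chime5_best_wer = 46.1   # best system in CHiME-5 (2018)
chime6_wer = 30.5        # iFLYTEK joint team, CHiME-6 Track 1

absolute_reduction = chime5_best_wer - chime6_wer
relative_reduction = absolute_reduction / chime5_best_wer * 100

print(f"Absolute reduction: {absolute_reduction:.1f} percentage points")
print(f"Relative reduction: {relative_reduction:.1f}%")   # about 33.8% relative
```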

iFLYTEK won first place in CHiME-6 Track 1, Ranking A

iFLYTEK won first place in CHiME-6 Track 1, Ranking B

Same test, a leap in results: what made the difference?

Facing complex scenes full of uncertainties, such as far-field pickup, reverberation, noise, overlapping speech, and random speaking styles, the iFLYTEK-USTC joint team drew on years of technical accumulation in real-world scenarios and carried out a series of technical innovations for the competition task:

For front-end signal processing, the joint team proposed a spatial and speaker-aware iterative mask estimation (SSA-IME) algorithm. It combines the strengths of traditional signal processing and deep learning, models the signal using multi-dimensional spatial and temporal information, and, iteration after iteration, accurately captures the target speaker's information in multi-speaker scenes. The algorithm not only suppresses environmental noise effectively but also removes interference from competing speakers, greatly reducing the difficulty of the subsequent speech recognition.
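The article does not publish the SSA-IME implementation, but the general pattern it describes, alternating a mask estimator with a statistical beamformer so that each pass refines the picture of the target speaker, can be sketched roughly as follows. This is a minimal illustration only: the function names, the magnitude-ratio mask stand-in for the neural estimator, and the plain MVDR beamformer are all assumptions, not iFLYTEK's published method.

```python
import numpy as np

def estimate_mask(spec_mag, prev_enhanced_mag=None):
    """Stand-in for the neural mask estimator (hypothetical simplification):
    a simple magnitude-ratio heuristic. In the system described above, this
    stage would be a network that also consumes spatial and speaker cues."""
    if prev_enhanced_mag is None:
        return np.clip(spec_mag / (spec_mag.max() + 1e-8), 0.0, 1.0)
    # Use the previous pass's enhanced signal as a target-speaker prior.
    return np.clip(prev_enhanced_mag / (spec_mag + 1e-8), 0.0, 1.0)

def mvdr_beamform(multichannel_spec, speech_mask):
    """Mask-driven MVDR beamformer over a (channels, freq, frames) STFT."""
    C, F, T = multichannel_spec.shape
    enhanced = np.zeros((F, T), dtype=complex)
    for f in range(F):
        X = multichannel_spec[:, f, :]                      # (C, T)
        m = speech_mask[f, :]                               # (T,)
        phi_s = (m * X) @ X.conj().T / (m.sum() + 1e-8)     # speech covariance
        phi_n = ((1 - m) * X) @ X.conj().T / ((1 - m).sum() + 1e-8)  # noise cov.
        phi_n += 1e-6 * np.eye(C)                           # regularisation
        steering = np.linalg.eigh(phi_s)[1][:, -1]          # principal eigenvector
        w = np.linalg.solve(phi_n, steering)
        w /= (steering.conj() @ w + 1e-8)
        enhanced[f, :] = w.conj() @ X
    return enhanced

def iterative_mask_estimation(multichannel_spec, n_iters=3):
    """Alternate mask estimation and beamforming so each pass refines the
    target-speaker estimate (illustrative loop, not the actual SSA-IME)."""
    ref_mag = np.abs(multichannel_spec[0])
    enhanced_mag = None
    for _ in range(n_iters):
        mask = estimate_mask(ref_mag, enhanced_mag)
        enhanced = mvdr_beamform(multichannel_spec, mask)
        enhanced_mag = np.abs(enhanced)
    return enhanced

# Toy usage: an 8-channel random STFT standing in for real far-field recordings.
rng = np.random.default_rng(0)
spec = rng.standard_normal((8, 65, 50)) + 1j * rng.standard_normal((8, 65, 50))
print(iterative_mask_estimation(spec).shape)  # (65, 50)
```

In the real system, the mask estimator would be a trained network that also takes spatial and speaker cues as input, which is what the "spatial and speaker-aware" part of the name refers to.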

For the back-end acoustic model, the joint team proposed a spatial and speaker-aware acoustic model (SSA-AM). By splicing multi-dimensional spatial information and speaker information onto the input of the acoustic model, the network can adaptively distinguish the target speaker from interfering speakers. As a result, the acoustic model does not rely solely on the output of the front-end processing; it can also adaptively extract the target speaker's speech features, greatly improving the fault tolerance and robustness of speech recognition in multi-speaker conversation scenes.
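Likewise, only the input-side idea of SSA-AM is described here: spectral features are spliced together with spatial features and target-speaker information before entering the network. A minimal sketch of that splicing, with layer sizes, names, and architecture chosen purely for illustration (they are not the published model), might look like this:

```python
import torch
import torch.nn as nn

class SpeakerAwareAcousticModel(nn.Module):
    """Toy acoustic model that splices spatial features and a target-speaker
    embedding onto the spectral input, as described for SSA-AM. Dimensions
    and architecture are illustrative assumptions only."""

    def __init__(self, n_fbank=80, n_spatial=40, spk_dim=128,
                 hidden=512, n_outputs=3000):
        super().__init__()
        in_dim = n_fbank + n_spatial + spk_dim
        self.encoder = nn.LSTM(in_dim, hidden, num_layers=3,
                               batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_outputs)

    def forward(self, fbank, spatial_feats, spk_embedding):
        # fbank:          (batch, frames, n_fbank)    log-mel features
        # spatial_feats:  (batch, frames, n_spatial)  e.g. inter-channel cues
        # spk_embedding:  (batch, spk_dim)            target-speaker vector
        T = fbank.size(1)
        spk = spk_embedding.unsqueeze(1).expand(-1, T, -1)  # repeat per frame
        x = torch.cat([fbank, spatial_feats, spk], dim=-1)  # input-side splice
        h, _ = self.encoder(x)
        return self.classifier(h)                           # frame-level posteriors

# Toy usage with random tensors.
model = SpeakerAwareAcousticModel()
fbank, spatial, spk = torch.randn(2, 100, 80), torch.randn(2, 100, 40), torch.randn(2, 128)
print(model(fbank, spatial, spk).shape)  # torch.Size([2, 100, 3000])
```

The point of the splice is that the same acoustic model, conditioned on a different speaker vector, can focus on a different target speaker in the same mixture, which matches the adaptive behaviour the article describes.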

More AI in speech recognition application scenarios

iFLYTEK is committed to original innovation in intelligent speech technology and its industrial application, and continues to take on the technical challenges that stand between speech recognition and practical use.

The iFLYTEK Voice Cloud, released in 2010, has continuously improved accuracy in voice input and voice interaction scenarios.

iFLYTEK Hearing, released in 2015, has gradually improved accuracy in person-to-person dialogue scenes.

The research results from CHiME-6 will undoubtedly further expand the application space of speech recognition:

Making speech recognition practical for meeting scenes. Compared with the CHiME-6 competition environment, real-life far-field scenes involve less random speaking styles, less overlapping speech, and far more training data, so the error rate will also drop substantially. The technical results of this competition can be applied to upgrading the iFLYTEK Hearing intelligent conference system, further advancing practical speech recognition for meetings.

Wider use in consumer products and services. The iFLYTEK smart recorder equipped with an eight-microphone array, the iFLYTEK smart office notebook that can fully record meeting content, and the iFLYTEK input method that recognizes Chinese, English, and 23 dialects without switching all serve users' speech recognition needs in different scenarios.

Providing multilingual intelligent speech solutions for the world. Building on its deep accumulation and strong record in speech recognition, iFLYTEK is vigorously expanding its research on multilingual speech recognition and expects to provide high-quality multilingual intelligent speech solutions for more enterprises and consumers around the world.

It is our mission to enable machines to listen and speak, to understand and think, and to build a better world with artificial intelligence.

By winning CHiME-6 again this time, we have taken another big step toward making machines able to listen.
