The record-breaking technology has reached what the team calls “human parity”: it is not perfect, but it makes the same number of transcription mistakes as human professionals, or fewer. The system’s word error rate fell to 5.9 percent, down from the 6.3 percent reported just last month.
That’s more impressive than it sounds, as even human transcribers commonly mishear “have” as “is”, or “a” as “the”. It effectively means that, for the first time, a computer can pick out the words in a conversation as well as a human can.
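For readers curious about how a word error rate is worked out, here is a minimal sketch in Python. It assumes the commonly used definition: the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The function name `word_error_rate` and the sample sentences are illustrative only and are not taken from Microsoft's system or its test data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (an assumption, not Microsoft's scoring code).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty transcript
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example with the two kinds of slip mentioned above.
reference = "we have a plan for the meeting"
hypothesis = "we is the plan for the meeting"
print(f"{word_error_rate(reference, hypothesis):.1%}")  # prints 28.6%
```

Two substituted words out of seven give an error rate of about 28.6 percent in this toy case. A 5.9 percent rate corresponds to roughly six such errors per hundred words of reference transcript.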
The system uses neural networks to process enormous amounts of data and learn to recognize patterns as it is exposed to more information.
The result is also an exciting breakthrough for speech-to-text software in general, including virtual assistants such as Microsoft’s own Cortana.
“This will make Cortana more powerful, making a truly intelligent assistant possible,” said Harry Shum, the executive vice president of the Microsoft Artificial Intelligence and Research group.

