
Meta trains AI model to handle speech, images, text

Whatever it takes, Mark
Researchers at Facebook parent Meta have trained a single AI model capable of processing speech, images, and text, in the hope that such multi-modal systems will power the company’s augmented reality and metaverse products.

The model, known as data2vec, can perform different tasks. Given an audio snippet, it can recognize speech. Fed an image, it can classify objects. And when faced with text, it can check the grammar or analyse the writing’s tone and emotion. AI algorithms are typically trained on one type of data, whereas data2vec is trained on three different modalities. It still, however, processes each form, whether it’s speech, images, or text, separately.

Meta believes these multi-modal models will help computers adapt better as physical and digital environments blend into one. “People experience the world through a combination of sight, sound and words, and systems like this could one day understand the world the way we do,” Meta CEO Mark Zuckerberg said in a statement to El Reg. “This will all eventually get built into AR glasses with an AI assistant so, for example, it could help you cook dinner, noticing if you miss an ingredient, prompting you to turn down the heat, or more complex tasks.”

Data2vec is a transformer-based neural network that uses self-supervised learning to pick up common patterns across audio, computer vision, and natural language processing. The model learns to operate with different types of data by learning to predict representations of the data it’s given: it knows it has to guess the next group of pixels when given an image, predict the next utterance in audio, or fill in the missing words in a sentence.
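The training scheme described above can be sketched in miniature. This is not Meta's code; it is a toy NumPy illustration of the general teacher-student, masked-prediction idea behind data2vec, where a "teacher" copy of the network sees the full input, a "student" sees a masked version, and the student is trained to predict the teacher's representations at the masked positions. All names, shapes, and the linear "encoder" are assumptions made for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a transformer encoder: a single linear projection.
# Shapes and names here are illustrative only.
DIM_IN, DIM_REP, SEQ_LEN = 8, 4, 10

student_W = rng.normal(size=(DIM_IN, DIM_REP))
teacher_W = student_W.copy()  # teacher starts as a copy of the student

def encode(weights, x):
    """Map a sequence of input vectors to representations."""
    return x @ weights

# One sequence of inputs (think: image patches, audio frames, or word tokens).
x = rng.normal(size=(SEQ_LEN, DIM_IN))

# Mask a fixed number of positions so the student must guess them.
mask = np.zeros(SEQ_LEN, dtype=bool)
mask[rng.choice(SEQ_LEN, size=3, replace=False)] = True

# Teacher sees the FULL input and produces the target representations.
targets = encode(teacher_W, x)

# Student sees the input with masked positions zeroed out.
x_masked = x.copy()
x_masked[mask] = 0.0
preds = encode(student_W, x_masked)

# Training objective: predict the teacher's representation at masked positions.
loss = float(np.mean((preds[mask] - targets[mask]) ** 2))

# The teacher tracks the student as an exponential moving average.
tau = 0.999
teacher_W = tau * teacher_W + (1 - tau) * student_W
```

Because the loss is defined on learned representations rather than on raw pixels, waveforms, or words, the same objective applies unchanged to all three modalities, which is what lets one recipe cover speech, vision, and text.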
