Nvidia and Microsoft debut 530-billion-parameter AI model

October 12, 2021

MT-NLG is a beast that fed on over 4,000 GPUs
Nvidia and Microsoft have announced their largest monolithic transformer language model to date: the Megatron-Turing Natural Language Generation model (MT-NLG), a jointly developed AI model with a whopping 530 billion parameters.

MT-NLG is more powerful than the transformer-based systems each company trained previously, namely Microsoft’s Turing-NLG model and Nvidia’s Megatron-LM. With three times more parameters, spread across 105 layers, MT-NLG is much larger and more complex. For comparison, OpenAI’s GPT-3 model has 175 billion parameters and Google’s Switch Transformer demo has 1.6 trillion parameters.

Bigger is generally better when it comes to neural networks, but larger models also need to ingest more training data. Compared with its predecessors, MT-NLG is better at a wide variety of natural language tasks, such as auto-completing sentences, question answering, and reading and reasoning. It can also perform these tasks with little to no fine-tuning, something referred to as few-shot or zero-shot learning.

As these language models grow, AI researchers and engineers need to come up with all sorts of techniques and tricks to train them. Training requires careful coordination: the model and its training data have to be stored and processed across numerous chips at the same time.
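
For a rough sense of why a model this size has to be split across many chips, here is a back-of-the-envelope sketch in Python. Only the parameter counts come from the article. The 2 bytes per parameter (16-bit weights) and the 80 GB of memory per device are illustrative assumptions, not details from the announcement, and the helper names are made up for this example. The estimate covers the weights alone; gradients, optimizer state and activations would push the real training footprint well beyond it.

```python
# Back-of-the-envelope sketch: memory needed just to hold model weights.
# Everything except the two parameter counts is an illustrative assumption.

MT_NLG_PARAMS = 530_000_000_000   # from the article
GPT3_PARAMS = 175_000_000_000     # from the article

BYTES_PER_PARAM = 2               # assumption: weights stored in 16 bits
DEVICE_MEMORY_BYTES = 80 * 10**9  # assumption: a hypothetical 80 GB accelerator


def weight_bytes(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> int:
    """Bytes needed to store the weights only (no gradients or optimizer state)."""
    return num_params * bytes_per_param


def min_devices(total_bytes: int, device_bytes: int = DEVICE_MEMORY_BYTES) -> int:
    """Smallest number of devices whose combined memory can hold total_bytes."""
    return (total_bytes + device_bytes - 1) // device_bytes  # integer ceiling


def main() -> None:
    for name, params in [("GPT-3", GPT3_PARAMS), ("MT-NLG", MT_NLG_PARAMS)]:
        size = weight_bytes(params)
        print(f"{name}: {size / 10**12:.2f} TB of weights, "
              f"at least {min_devices(size)} devices just to hold them")


if __name__ == "__main__":
    main()
```

Under these assumptions, MT-NLG's weights alone come to roughly a terabyte, far more than any single chip can hold, which is why the model itself has to be partitioned across devices rather than simply copied onto each one.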