The language models behind ChatGPT and other generative AI are trained on written words that have been culled from libraries, scraped from websites and social media, and pulled from news reports and speech transcripts from across the world. There are 250 billion such words behind GPT-3.5, the model fueling ChatGPT, for instance, and its successor, GPT-4, has since arrived.
Now, new research from Stanford University has quantified exactly how well (or, actually, how poorly) these models align with the opinions of U.S. demographic groups, showing that language models have a decided bias on hot-button topics that may be out of step with general popular sentiment.
"Certain language models fail to capture the subtleties of human opinion and often simply express the dominant viewpoint of certain groups, while underrepresenting those of other demographic subgroups," says Shibani Santurkar, a former postdoctoral scholar at Stanford and first author of the study. "They should be more closely aligned."
In the paper, a research team including Stanford postdoctoral scholar Esin Durmus, Columbia Ph.D. student Faisal Ladhak, Stanford Ph.D. student Cinoo Lee, and Stanford computer science professors Percy Liang and Tatsunori Hashimoto introduces OpinionQA, a tool for evaluating bias in language models by comparing their leanings against public opinion polling.
As one might expect, language models that form sentences by predicting word sequences based on what others have written should automatically reflect popular opinion in the broadest sense.
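To make the comparison concrete, here is a minimal sketch in Python of the kind of check such an evaluation performs: score how closely a model's probabilities over a survey question's answer choices match the distribution of human responses to the same question. The answer choices, the numbers, and the total-variation scoring below are illustrative assumptions for this sketch, not the paper's actual data or metric.

```python
# Minimal sketch (not the paper's code): compare a language model's
# distribution over a survey question's answer choices with the
# distribution of human poll responses. All values are hypothetical.

def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# Hypothetical answer choices for one multiple-choice survey question.
choices = ["Strongly agree", "Agree", "Disagree", "Strongly disagree"]

# Hypothetical share of human respondents picking each choice.
human_poll = [0.20, 0.35, 0.30, 0.15]

# Hypothetical probabilities a language model assigns to each choice
# (e.g., derived from its next-token probabilities for the option labels).
model_probs = [0.05, 0.15, 0.45, 0.35]

distance = total_variation_distance(model_probs, human_poll)
alignment = 1.0 - distance  # 1.0 = identical opinions, 0.0 = no overlap

print(f"Opinion alignment score: {alignment:.2f}")
```

Averaging such per-question scores across many survey questions, and breaking them out by demographic group, is the general idea behind measuring whose opinions a model most closely echoes.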