Generative AI models are going to have a big training data problem as public web access becomes more restricted, and it’s unclear whether large language models (LLMs) will be able to avoid the garbage-in, garbage-out trap.
This data access issue is just getting started, but enough has already developed to see the wall ahead. Constellation Research CEO Ray Wang has said that the open web will largely disappear as content providers and corporations restrict data access. If this scenario plays out, LLMs won’t have the training data they need to keep improving.
“We will not have enough data to achieve a level of precision end users trust because we are about to enter the dark ages of the internet, where the publicly available information on the internet will be a series of Taylor Swift content, credit card offers, and Nvidia and Apple SEO. Where will the data come from?” said Wang.