AI tools must receive accurate information in order to be helpful. RollingCamera / iStock / Getty Images Plus

Government websites are loaded with misinformation, and that’s a big problem for AI

Sometimes information that looks reliable because it’s from a federal agency website has not in fact been vetted, and AI systems are only as trustworthy as the data they’re fed.

As artificial intelligence proliferates, so do concerns about AI-grown misinformation. AI systems like ChatGPT and other algorithms learn from the text and data that they’re fed. If the input data is bad, so is the output, hence the aphorism “garbage in, garbage out.”

The Washington Post recently published a report analyzing the sources of Google’s C4 data set, a large collection of information used to train many AI models. There was bad news and some supposed good news. The bad news: many sources in Google’s data set ranked low on trustworthiness scales and promoted conspiracy theories, feeding misinformation and propaganda into AI models. But the top source of information in the data set, patents published by governments across the world, is more trustworthy. Many other government websites are also major sources of information for Google’s data set.

Is this good news? Government-based sources like patents are probably more reliable than, say, 4chan.org, an anonymous message board that was also incorporated into the training data. But government sources, including those from the U.S. government, are surprisingly unreliable and contain copious information that is outright false.

I’m a law professor who researches the reliability of government information, and I’ve found that we too often blindly assume that information we find on a U.S. government website is correct. I’m not talking about propaganda but about more mundane misinformation—from patents, reports, lists and databases. It’s crucial to recognize the flaws of government information both because it is an important input into AI systems and because government information is a source that we interact with in many other ways—and are predisposed to trust.

Heuristics—mental shortcuts—make me and you (assuming you are someone who generally trusts the government) inclined to assume that information published by the government is reviewed or generated by experts and therefore somewhat trustworthy. But that isn’t necessarily true. A huge amount of information published by the government is generated by third parties and isn’t reviewed at all for accuracy before publication.

Take U.S. patents, which, as a reminder, are the number one source of data in Google’s training set. They are routinely published with fictional, fraudulent and incorrect data. For example, the U.S. Patent Office granted patents to Theranos on its now-discredited medical technology, even after the falsity of the company’s claims was highly publicized. Almost one quarter of U.S. life sciences patents include fictional experiments, but they’re often interpreted as factual because readers tend to trust patents.

And it’s not just patents. The Environmental Protection Agency publishes data on industrial pollution and suggests that, before you buy a house, you check pollution levels in the area. Where does the pollution data come from? Companies self-report it. Does the EPA check it? No, and a report from the Government Accountability Office, a nonpartisan watchdog, found frequent errors.

Another example comes from the National Institutes of Health, which publishes a list of clinical trials to help patients find new medical treatments. Many trials are reviewed for safety by the Food and Drug Administration before they’re posted, but not all, and the NIH does not review the information it posts for either safety or accuracy. Companies peddling unapproved treatments have taken advantage of this loophole and listed procedures with the NIH in an attempt to enhance their legitimacy. Several patients who were tragically blinded after undergoing an unapproved stem-cell treatment reported that they had believed the treatment was a government-reviewed clinical trial because it was posted on the NIH’s clinical trials site.

In yet another example, our trust in government-published information is exploited by opponents of vaccination. When former Fox News host Tucker Carlson claimed that data from the Centers for Disease Control and Prevention showed that thousands of people had died after taking the Covid vaccine, he wasn’t exactly wrong. CDC data do say that. But the CDC database is an aggregation of reports that can be submitted by anyone and are not checked for accuracy by the CDC (not to mention, Carlson’s claim seriously confuses correlation and causation). Further, there’s evidence that opponents of vaccination deliberately submit reports of vaccine side effects to the CDC’s database so that they can later cite the CDC’s authority to back up their claims that vaccines are dangerous—essentially laundering information through the government to make it look more legitimate.

When we think of misinformation from the government, we often think about deliberately false propaganda. But in the United States, a much more widespread source of misinformation is information generated by third parties and published without vetting by the government. It’s on a government website so it looks trustworthy, but it’s sometimes not. Government agencies should dedicate more resources to vetting this data. The rise of AI makes this effort more important now than ever. But in the meantime, dear reader, be a cautious consumer of information, whether from an AI model or a government website.

Janet Freilich is a professor at Fordham Law School who writes and teaches in the areas of patent law, intellectual property and civil procedure. She is the author of a paper titled “Government Misinformation Platforms,” which is scheduled to be published in the spring of 2024 in the Pennsylvania Law Review.
