Wikipedia talk:Wikipedia Signpost/2023-04-26/Op-Ed

Discuss this story

  • There was a recent wikimedia-l discussion of systems such as Google's RARR (Retrofit Attribution using Research and Revision) to prevent LLM hallucination: paper, Twitter thread, video. That system is clearly not currently part of Google Bard. Sandizer (talk) 15:26, 26 April 2023 (UTC)[reply]
  • I'm skeptical that models can simply be trained to distinguish between factual and non-factual information. To do that successfully, I think they would actually need to be able to internally represent semantic content, reason about it, and verify that against trusted prose. Something like Cyc might be the seed of the first part of that; having all of it might be equivalent to artificial general intelligence, which I expect is decades away. -- Beland (talk) 02:10, 1 May 2023 (UTC)[reply]
    While it is not quite clear how it does it, GPT-4 answers questions of the form "is this statement factually correct?" with higher accuracy than a random oracle, especially if you also ask it to explain in detail why it thinks so (a rough sketch of such a prompt appears below this thread). It helps if it can access the Internet, but a current weakness is that it cannot discern which sources are reliable and which are not. GPT-4 also appears capable, to a considerable extent, of reaching correct logical conclusions from statements provided in natural language. Again, researchers do not quite understand this, but apparently the patterns and meta-patterns needed to pull this off are sufficiently represented in the corpus on which it was trained. I am not so sure that AGI is that far off; I expect it will take less than a decade before AI models can not only reliably translate natural-language statements into formalisms like Cyc and OWL, but even devise extensions to those frameworks to represent aspects they do not currently cover. --Lambiam 14:58, 8 May 2023 (UTC)[reply]
  • Quote from the article: "A system intended for deployment could then be made to include an "is that so?" component for monitoring generated statements, and insisting on revision until the result passes muster." Ding ding ding, bingo, give lolly. "A long time after inventing automobiles, humans began to realize slowly that perhaps all four of the wheels could have brakes on them, and some sort of so-called 'Seat-Belt' might possibly keep the humans' gelatinous innards from interacting with the dashboard. Humans thought about and talked about such newfangled concepts for quite some time before they gradually decided to start tentatively pursuing them." Lol. But seriously, if LLMs themselves cannot provide the "is that so?" component (as Beland mentioned), then humans need to get serious about chaining (shackling) the LLMs in series behind various things that can provide it. For example, an LLM will gladly hallucinate a totally fake reference citation, pointing to a fake/made-up book. Humans should already be capable of building some software that says, "If I can't find that book in WorldCat or in Google Books or in other database-full-of-real-books-X, within the next X milliseconds or seconds, then you're not allowed, Mr LLM confabulator, even to release your answer to the human who asked you the question, at all." (A rough sketch of such a gate appears further below.) It wouldn't be the full ontological sanity check that Beland mentioned, but there's no excuse not to have at least this low-hanging fruit to start with, ASAP. Quercus solaris (talk) 01:50, 2 May 2023 (UTC)[reply]
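    A minimal sketch of the "ask the model whether the statement is correct, and why" approach Lambiam describes above. It assumes the pre-1.0 openai Python package and an OPENAI_API_KEY environment variable; the prompt wording and example statement are illustrative assumptions, not a tested recipe.

      # Rough sketch: ask GPT-4 for a verdict plus a detailed explanation.
      # Assumes the pre-1.0 `openai` package and OPENAI_API_KEY in the environment.
      import openai

      def check_statement(statement: str) -> str:
          """Ask GPT-4 whether a statement is factually correct and why."""
          prompt = (
              "Is the following statement factually correct? "
              "Answer 'yes' or 'no' first, then explain your reasoning in detail, "
              "noting anything you are unsure about.\n\n"
              f"Statement: {statement}"
          )
          response = openai.ChatCompletion.create(
              model="gpt-4",
              messages=[{"role": "user", "content": prompt}],
              temperature=0,  # keep the verdict as repeatable as possible
          )
          return response["choices"][0]["message"]["content"]

      if __name__ == "__main__":
          print(check_statement("The Eiffel Tower was completed in 1889."))

    The output is free-text reasoning plus a verdict; anything built on top of this would still need to parse and independently double-check that output.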
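    And a minimal sketch of the pre-release "does that book even exist?" gate Quercus solaris proposes above. The catalogue check uses the public Google Books volumes endpoint; the query form, the timeout budget, and the release_answer() wiring are assumptions for illustration (a production gate would also consult WorldCat or another bibliographic database, as suggested).

      # Rough sketch: withhold an LLM answer whose cited books cannot be found
      # in a bibliographic catalogue within a time budget.
      import requests

      GOOGLE_BOOKS_URL = "https://www.googleapis.com/books/v1/volumes"

      def book_exists(title: str, author: str, timeout_s: float = 2.0) -> bool:
          """Return True only if a catalogue lookup finds at least one match in time."""
          try:
              resp = requests.get(
                  GOOGLE_BOOKS_URL,
                  params={"q": f'intitle:"{title}" inauthor:"{author}"'},
                  timeout=timeout_s,
              )
              resp.raise_for_status()
              return resp.json().get("totalItems", 0) > 0
          except requests.RequestException:
              # Lookup failed or timed out: treat the citation as unverified.
              return False

      def release_answer(answer_text: str, cited_books: list[tuple[str, str]]) -> str:
          """Release the LLM's answer only if every cited book can be verified."""
          unverified = [(t, a) for t, a in cited_books if not book_exists(t, a)]
          if unverified:
              return ("Answer withheld: could not verify cited book(s): "
                      + "; ".join(f"{t} by {a}" for t, a in unverified))
          return answer_text

      if __name__ == "__main__":
          books = [("On the Origin of Species", "Charles Darwin")]
          print(release_answer("Darwin discusses natural selection ...", books))

    If a lookup fails or times out, the citation is treated as unverified and the answer is withheld, matching the "within the next X milliseconds or seconds" constraint in the comment above.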