This is an unusually long post, but also quite interesting in my humble opinion.
Given the extraordinary success of large language models (LLMs), such as ChatGPT, how should their capabilities be evaluated? Will LLMs eventually replace humans in their jobs, or will they primarily serve as productivity tools, similar to how the PC was used when it was introduced? The answer to this question may be of interest to CEOs who are considering which jobs can be replaced or enhanced, to creatives who are considering using ChatGPT as a productivity tool, and to young researchers who may be exploring alternative paths to intelligence, regardless of the current hype surrounding LLMs.
The main takeaway of this post is that LLMs should be considered productivity tools rather than science fiction-like human replacements. The reason is that, like any other computer programs, LLMs face an insurmountable problem with meaning. What does a phrase or a word truly mean? Can meaning be derived by a program using terabytes or even petabytes of text? Or is meaning an elusive concept that cannot be programmatically extracted or defined? These questions echo discussions that date back to the foundations of mathematics and programming by Gödel, Tarski and Turing, and philosophers' debates about infinite regress.
Why is meaning so crucial, especially in the context of LLMs? The answer is simple: while LLMs exhibit impressive capabilities in many tasks, they also often suffer from hallucinations. The term refers to the fact that phrases an LLM generates in response to a prompt can have incorrect meanings or, in other words, can be untrue. Hence, hallucinations and meaning are inextricably linked.
Would you allow a hallucinating LLM-based agent to autonomously perform tasks in your company? If the answer is no, then LLMs should probably be considered productivity tools: a great addition to the toolbox of the average human, but not a replacement for said human.
Unfortunately there is no consensus about hallucinations. Depending on whom you ask in the field of AI, the hallucination problem is either seen as a temporary glitch that will be resolved as technology matures, or an intractable problem within the current technological paradigm. Before delving into the problem of meaning in computer science, specifically in the context of LLMs, it is crucial to provide some context to the ongoing debate about the capabilities of LLMs.
The emergence of successful LLMs, such as ChatGPT, has sparked renewed discussions between two contrasting camps regarding the abilities of current AI. This debate has been ongoing for decades within academic circles, but the popularity of ChatGPT has now brought it to a wider audience. One camp contends that ChatGPT possesses understanding and reasoning capabilities, and that any errors it makes, often referred to as hallucinations, will be resolved with advancements in technology. The other camp argues that ChatGPT's outputs are not a result of reasoning or understanding, but rather a reflection of patterns in its training data. For the sake of convenience, let's refer to the first camp as "tech first" and the second camp as "mind first."
- The "tech first" camp believes that any problem, including understanding and reasoning, can be solved using tools developed through mathematics and programming. They focus on inventing new tools to achieve the desired behaviors in AI systems.
- On the other hand, the "mind first" camp rejects these claims mostly based on intuitive arguments, but struggles to propose an alternative vision that can be implemented and yield tangible results.
- Interestingly, there is a cycle of accusations between the two camps. "Tech first" accuses "mind first" of criticizing without proposing concrete solutions, while "mind first" accuses "tech first" of overlooking something fundamental.
- "Tech first" primarily focuses on the behaviors of AI systems, and as long as the behaviors align with expectations, it considers the result a success. "Tech first" is confident in its approach because it has continuously delivered new behaviors in various domains such as chess playing, image recognition, language modeling, and more.
- However, "mind first" remains skeptical of each new achievement by "tech first" and acknowledges that while a program can outperform humans in specific tasks like chess, it still falls short in other areas.
- "Tech first" often accuses "mind first" of continuously shifting the goalposts and changing the criteria for success.
- "Tech first" is confident that it can achieve intelligence, reasoning, and understanding in AI systems through steady progress and ever-increasing computational power.
- However, "mind first" argues that this confidence is based on an unsubstantiated assumption that computers will eventually reach the power of the human brain, despite not fully understanding how the brain works.
- On the other hand, "mind first" is convinced that "tech first" lacks something fundamental and won't be able to achieve true intelligence, reasoning, and understanding, although they may not be able to specify what that missing element is.
When ChatGPT demonstrated its capabilities, the "tech first" camp was excited as they saw it as evidence of the correctness of their approach. However, the "mind first" camp continued to highlight the limitations of LLMs as a way to emphasize that LLMs are not the path to achieve true intelligence, reasoning, and understanding. Each camp has its own strategy to justify its beliefs:
- "Tech first" showcases conversations with LLMs that purportedly demonstrate their understanding. For instance, it provides examples of an LLM writing a computer program with documentation based on high-level specifications. Alternatively, it points to emergent behavior expressed by LLMs, meaning behavior that was not explicitly programmed, as evidence of LLMs' reasoning abilities.
- On the other hand, "mind first" presents examples of hallucinations or confabulations by LLMs. It highlights how LLMs make things up, which serves as evidence that LLMs do not truly understand the meaning of what they produce.
This is the current state of affairs. The debate between the two camps, which has been ongoing for decades, has now been reinvigorated by the emergence of LLMs. More people have joined one camp or the other, with a large majority seemingly favoring the "tech first" camp due to our inherent human tendency to attribute understanding to entities we can have conversations with, even if these entities experience hallucinations.
The focus of the next post will be on the problem of meaning in the context of LLMs and why this way of thinking can potentially settle the debate about the capabilities of LLMs and AI in general when pursued within the current technological paradigm.
Disclosure: I am not part of the "tech first" or "mind first" camp. I would consider myself as part of the Geoneosophy camp.