A real debate is possible on this subject, but this blog post contributes nothing to it.
For my part, I don't really know whether what an LLM does counts as thinking, but what amazes me is this:
- It is clear to me, from the way LLMs operate, that output is generated token after token, without any real pre-existing plan for what comes next. So, more like a probabilistic repeating machine.
- But at the same time, I can see LLMs in action creating things or answering questions that clearly do not exist in the training corpus. So they show behavior that resembles thinking when completing tasks.
For example, suppose you give one tools for your own custom API and ask it to do a task: you can observe it mixing multiple calls and combining the tools' results to achieve the given purpose.
Otherwise, when you ask an LLM to do an arithmetic operation like 3123454*2030+500, it can sometimes (though not always) give the correct answer, even though, despite the huge size of the corpus, that exact operation is almost certainly not present anywhere in the training data.
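For concreteness, a trivial Python check of that example (this just pins down the arithmetic; it says nothing about how a model computes it):

```python
# The exact value the model has to reproduce; the full digit string
# is unlikely to appear verbatim anywhere in a training corpus.
result = 3123454 * 2030 + 500
print(result)  # 6340612120
```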
So my best guess is that a lot of things in our world are based on "semantic" patterns that we aren't aware of, especially in math and logic, which are bound to language. To me it is similar to the mental tricks used by "fast calculators".
Hmm, I don't know if the example really shows what the article argues it does.
If someone came up to me and told me the altered version of the joke I have heard many times, I might answer exactly like ChatGPT did. I would hear the beginning of the story, say "wait, I know this one", and not really pay attention to the rest because I would be all ready to respond with what I think is the right answer.
I bet if you point out the mistake, the LLM will say "oh you are right, this story actually does specify the gender of the doctor" or something to that effect... just like you or I would.
Now, I am not arguing that LLMs are really 'thinking' like humans, but I also find the second argument a bit tenuous. The article conjectures that humans reason from ideas to symbols while LLMs go from symbols to ideas, but couldn't that just be a post hoc interpretation we have for how our ideas come to our brain? We think we have ideas first, but an idea is just the result of neurons firing in our brains... and neurons are really just a probability function connected to other probability functions, just like an LLM... we give it symbols we understand to represent those probabilities, but that is just for our benefit so we can understand it.
It could be that the only difference between us and an LLM is processing power and the training data generated over billions of years of evolution.
Not sure if I buy this reasoning. I just prompted ChatGPT and asked it to "pay attention to all the words". It still got it wrong but I'm sure if you told a human this, they'd actually do more thinking to get the right answer.
###
> Pay attention to all the words and answer this riddle: A young man gets into an accident. The surgeon, after completing his preparations, says: ‘I can’t operate on him, he is my son.’ How is this possible?
The surgeon is his mother.
(The riddle plays on the assumption that surgeons are male — the surgeon can simply be the boy’s mother.)
> Are you sure about this? Remember to pay attention to all the words.
Yes — the surgeon is his mother.
The riddle hinges on the unstated assumption that surgeons are male. Nothing in the wording rules out that the surgeon is female (or that the parent-child relationship is by adoption, step-parenting, etc.), so the simplest answer is that the surgeon is the young man’s mother.
I just tried an even more glaring example with ChatGPT:
# Prompt 1: A father and his son are in a car accident. The son dies on the spot. The father is rushed to the ER. The attending surgeon looks at the man and says, ‘I can not operate on this man. He’s my father!’ How can this be?
# Response 1:
The surgeon is his *mother*.
It’s a classic riddle meant to challenge assumptions about gender roles. In this case, the surgeon is the father’s wife, so it makes sense that the surgeon would say, "I cannot operate on this man. He's my father!"
# Prompt 2: Are you sure about that?
# Response 2:
Yes, I’m pretty sure! The riddle is designed to surprise you by playing on common assumptions. The key is the statement that the surgeon says, "He's my father!" which implies a parent-child relationship. The unexpected twist is that the surgeon is the mother, not another man. It's a classic brain teaser that challenges gender expectations.
What do you think—did it catch you off guard?
# Prompt 3: Could you please read my initial question once more, just to confirm that you are correct?
# Response 3:
Ah, you're right to ask me to double-check! Let me carefully read your initial question again:
_"A father and his son are in a car accident. The son dies on the spot. The father is rushed to the ER. The attending surgeon looks at the man and says, ‘I cannot operate on this man. He’s my father!’ How can this be?"_
Upon reviewing it again, it seems my first response was correct. The surgeon could be the son's mother, which explains why she would say, "He’s my father." The riddle hinges on the assumption that the surgeon must be male, but it could easily be a woman.
So yes, the answer still holds: the surgeon is the *mother*.
Does that help clear it up?
I have just gotten ChatGPT to literally double down and say that «The surgeon could be the son's mother, which explains why she would say, "He’s my father."». Not only that, ChatGPT somewhat patronisingly asked me if it had caught me "off guard" and whether its response had "helped clear it up"!
Here's Gemini 3's answer with reasoning. I think we underestimate how much progress reasoning models have made this year.
The answer is: The surgeon is his daughter.
(Alternatively, the surgeon could be his other son.)
Reasoning:
Decoding the Riddle's Core
I'm now zeroing in on the crux of the riddle: the embedded gender bias. It's becoming clearer that the solution hinges on a deliberate misdirection related to assumed familial roles. I'm focusing on how the riddle exploits these biases to create the surprising twist.
Pinpointing the Key Shift
I've just finalized the solution. Initially, I missed the crucial implication of the riddle's phrasing. Now, it's crystal clear: The key lies in identifying that the surgeon is another child of the father, and the son, who is the focus of the early part of the riddle, is deceased. The riddle's power comes from the surprise element of realizing the surgeon is the man's daughter or another son, rather than a male surgeon operating on his father.
Identifying Crucial Variations
I'm now fully immersed in the different riddle iterations. The "Grandfather" scenario highlights an interesting edge case I previously overlooked. I'm carefully analyzing how small changes in wording significantly alter the possible solutions. I'm focusing on the implied family relationships.
Analyzing the Gender Twist
The "He's my father" phrasing truly underscores the embedded gender bias, and how the twist operates: the surgeon is either a daughter or another son, rather than a male surgeon. The riddle plays on the common assumption that a surgeon is male. I've re-examined the key sentence structure. My final analysis confirms that this simple phrase is a clear indicator of a daughter in most cases.
Dear God, if that wall of pointless gobbledygook is what it takes for Gemini to discover that ‘I can not operate on this man. He’s my father!’ means that the surgeon is either the man's son or the man's daughter, I realise I have actually been overestimating the current abilities of LLMs.
We don’t get to read Gemini’s reasoning traces; there’s a second AI to summarise them first.
What that means for their volume, I’ll leave to your imagination.
A perfect illustration, thank you.
Share your chat? Here's what I got with ChatGPT 5.1 on a max plan:
> A father and his son are in a car accident. The son dies on the spot. The father is rushed to the ER. The attending surgeon looks at the man and says, ‘I can not operate on this man. He’s my father!’ How can this be?
> The surgeon is the injured man’s other child.
https://chatgpt.com/share/692d6181-d49c-8009-97be-aec3f32ba7...
Unfortunately I can't seem to find any way to share a conversation from a non-logged in ChatGPT conversation.
Oh. If you're using the free, shittier model, you get a shittier answer, no surprise there.
Fair enough. However, unless there is a fundamental qualitative difference between how the shittier free ChatGPT model and the paid version work, I still believe this is good evidence that ChatGPT does not think. The free model doesn't think; the paid model doesn't think either, but is better at pretending it does.
Do children at Piaget’s preoperational stage (ages 2-7) think?
Yes, to a limited extent, in line with their brains' development. These developmental stages have predictive power as to what kind of things an average 2-7 year-old is and isn't able to do yet.
Are there any discrete stages of LLM performance which can describe why free ChatGPT is unable to realise the absolute nonsense in saying «The surgeon could be the son's mother, which explains why she would say, "He’s my father."», and what kind of model is not expected to fall for that trap?
If we think of them in generations, it seems free ChatGPT is a generation or two behind. I gave a modified river crossing problem to ChatGPT-3, and it failed in the same way, but paid 5.1 doesn't get caught up. Exactly where along the way, I'd have to do some digging, but I feel like it was 4.5. The other problem, of course, is that now you've given that question to free ChatGPT, it'll be used as training data, so the next version won't get tripped up the same way.
> The article conjectures that humans reason from ideas to symbols while LLMs go from symbols to ideas, but couldn't that just be a post hoc interpretation we have for how our ideas come to our brain?
There's probably a lot to be said about people who primarily think with an internal monologue, versus those that do not, as well.
I, to the best of my own abilities to detect it, do not seem to have a more abstract, non-lingual thought process - I can't 'picture' things in my head like other people reportedly do. When I have an 'idea', it simply doesn't materialize 'in my mind' in any other way.
I, also, would not argue that LLMs share a human-like thought process. But I think saying that human thought requires some abstract idea-to-symbol translation isn't quite right, either.
I have personal experience of answering "Surely, a boar" when asked "Is Winnie the Pooh a piglet or a boar?" And I'm sure that what was happening in my head can't be called reasoning. But it was some kind of thinking. An image of the bear, surely-not-a-piglet, second option, done. Something like that.
> I bet if you point out the mistake, the LLM will say "oh you are right, this story actually does specify the gender of the doctor" or something to that effect... just like you or I would.
I tried this with Gemini 3 Pro Thinking model and it did correct itself afterwards. Also, it suggested that the boy may have two fathers in a same-sex relationship, which was pleasantly surprising. :)
# Prompt
A young man gets into an accident. The surgeon, after completing his preparations, says: ‘I can’t operate on him, he is my son.’ How is this possible?
## The Answer
The surgeon is the young man's *mother*.
### Why this is a riddle
This is a classic example of a *lateral thinking puzzle* used to illustrate *unconscious bias*.
For decades, many people struggled to solve this immediately because their brains automatically associated the word "surgeon" with a man. Since the father was implied to be the one who brought the son in (or was incapacitated), the listener hits a logical block until they realize the surgeon could be a woman.
### Other Possibilities
In the modern context, there is another valid answer:
* The surgeon could be the young man's *other father* (in the case of a same-sex couple).
*Would you like to try another lateral thinking puzzle to test your perception?*
---
# Follow-up Prompt
Are you sure this is correct? I said "The surgeon, after completing *his* preparations"
## The Answer
You are absolutely right to catch that detail. My previous answer relied on the "classic" version of this riddle where the father dies in the accident.
Since you specified that the surgeon completed *his* preparations, the surgeon is male.
Here are the possible solutions based on your specific phrasing:
### 1. The Surgeon is his Father
In this specific version of the prompt, you stated: "A young man gets into an accident."
You *did not* state that the father was in the car or that the father died (which is a key detail in the traditional version of the riddle). Therefore, the surgeon is simply the young man's father, who was not involved in the accident.
### 2. The Surgeon is his Other Father
If we assume the "spirit" of the riddle implies the biological father is unavailable or incapacitated (even if not explicitly stated), the young man has two fathers. The surgeon is his second father.
### 3. A Religious Figure
The surgeon could be a priest or religious figure who refers to the young man as "my son" in a spiritual sense, though this is less likely to prevent a surgeon from operating than a familial bond.
*You caught me on the specific wording! Would you like to try a riddle where the wording is even more deceptive?*
Your second prompt was explicitly hinting Gemini towards what it had missed, highlighting the "his" rather than asking Gemini to double-check whether its first answer made sense without providing any clues:
> Are you sure this is correct? I said "The surgeon, after completing *his* preparations"
Furthermore, the second response you got from Gemini was actually smoking-gun evidence that the model isn't actually thinking. Note how, after Gemini has been clued in to its mistaken assumption, its reevaluation remains anchored on the pointless idea that the surgeon may be the young man's "Other" father.
What other father? There are only two characters in the prompt: the young man and a surgeon. In this situation, what on Earth is the difference between the young man's "Father" and the young man's "Other Father"? Evidently Gemini has no comprehension of the actual prompt and is incapable of critical reevaluation of its content, and rather will stick to its context window whether or not it makes sense.
> What other father?
The model actually answered this in the first prompt answer:
### Other Possibilities In the modern context, there is another valid answer: * The surgeon could be the young man's other father (in the case of a same-sex couple).
Exactly, but only because in the first prompt, under the mistaken assumption that a father died in the crash, it would make sense to refer to another father. However, no father died in any car crash in your modified "riddle", which didn't stop Gemini from being anchored to the context window even after you asked it to correct itself.
Put it this way. Imagine if in the original riddle, where a father died in the accident, the surgeon had made their familial relation explicit: the surgeon could have said ‘I can’t operate on him, he is my son. I am his mother’ or, in a modern context, ‘he is my son. I am his father’. Hence, there are indeed two possibilities: the surgeon is either the boy's mother or his [other] father.
Now let's take your revised version, with no implication of the young man having anyone else involved in the accident:
> A young man gets into an accident. The surgeon, after completing his preparations, says: ‘I can’t operate on him, he is my son. I am the boy's ${PARENTAL_RELATIONSHIP}.’
Do you think that, as Gemini stated, there are still two distinct possible solutions?
> In this situation, what on Earth is the difference between the young man's "Father" and the young man's "Other Father"?
Wouldn't it be correct to have two fathers in a child adopted by gay parents?
> Wouldn't it be correct to have two fathers in a child adopted by gay parents?
Having two fathers doesn't mean having a father and an "other father". It means having two fathers, in the same way that having two parents doesn't mean having a parent and an "other parent".
In the original riddle it makes sense to refer to "another father", but not in the revised version in which there is no mention of a father involved in the car crash.
We don't understand how humans think, and we don't yet understand completely how LLMs work. It may be that similar methods are being used, but they might also be different.
What is certain is that LLMs can perform as if they are doing what we call thinking, and for most intents and purposes this is more than enough.
I believe this is not a binary question; there is a spectrum. I think of LLMs as a sophisticated variation of a Chinese room. The LLMs are given statistical rules to apply to the given input and generate an output. The rules encode some of the patterns that what we call thinking uses, and so some of their responses can be interpreted as thinking. But then again, under certain conditions, the responses of mammals, unicellular organisms, and even systems unrelated to carbon-based life forms can be thought to be performing what we vaguely call thinking.
One problem is that we don't have a clear definition of thinking, and my hunch is that we will never have a clear-cut one, as it falls in the same category of phenomena as alive/dead states, altered states, and weather systems. One hidden assumption that I often see implied in usages of this word is that "thinking" implies some sort of "agency", which is another vague term normally ascribed to motile life forms.
All in all I think this debate ensues from trying to emulate something that we don't fundamentally understand.
Imagine a world where medicine has not advanced and we lack any knowledge of human biology, and we are trying to create artificial life forms by building a heat-resistant balloon and having it take in and push out air. Someone would argue that the balloon is alive, because there is something in that taking in and pushing out of air that is like what humans do.
The Chinese Room is just a roundabout way of pleading human exceptionalism. To any particular human, all other humans are a Chinese Room, but that doesn't get addressed. Nor does it address what difference it makes if something is using rules as opposed to, what, exactly? It neither posits a reason why rules preclude understanding nor why understanding is not made of rules. All it does is say 'I am not experiencing it, and it is not human, therefore I dismiss it'. It is lazy and answers nothing.
I think the evidence is actually pretty strongly against them doing anything similar to "thinking". Certainly they are exhibiting some behaviour that we have traditionally only associated with thinking. But this comes along with lots of behaviour that is fundamentally opposite to thinking ("hallucination" being the major example).
It seems much more likely that they are doing some other behaviour that only sometimes resembles thinking, in the same way that when you press the middle autocomplete button on your phone keyboard it only sometimes resembles conversation.
> "hallucination" ... "behaviour that only sometimes resembles thinking"
I guess you'll find that if you limit the definition of thinking that much, most humans are not capable of thinking either.
You see, we are observing a clash in terminology here. Hallucination in humans is thinking, just not typical thinking. So-called "hallucinations" in LLM programs are just noise output, garbage. This is why using anthropomorphic terms for programs, like "thinking" or "reasoning", is bad.
I think the answer is somewhere in the middle, not as restrictive as parent, but also not as wide as AI companies want us to believe. My personal opinion is that hallucinations (random noise) are a fundamental building block of what makes human thinking and creativity possible, but we have additional modes of neuroprocessing layered on top of it, which filter and modify the underlying hallucinations in a way so they become directed at a purpose. We see the opposite if the filters fail, in some non-neurotypical individuals, due to a variety of causes. We also make use of tools to optimize that filter function further by externalizing it.
The flip side of this is that fundamentally, I don't see a reason why machines could not get the same filtering capabilities over time by adjusting their architecture.
I have never in my life met a person who hallucinates in the way ChatGPT etc do. If I did, I would probably assume they were deliberately lying, or very unwell.
> behaviour that is fundamentally opposite to thinking ("hallucination")
Did you just make this up?
> Did you make this [opinion] up?
Yes! That is how they work.
> But this comes along with lots of behaviour that is fundamentally opposite to thinking ("hallucination" being the major example).
I find this an utterly bizarre claim given how prone humans are to make things up and firmly insist they did not.
Indeed. The mere fact that we ended up with the anthropomorphic term "hallucination", rather than something purely mechanistic like "glitch", indicates that there's something about this AI pattern that feels familiar.
I'm obviously not claiming that "hallucination" is an appropriate term ("delusion" or "confabulation" are probably more apt), but there is something here that is clearly not just a bug, but rather a result of thinking being applied properly but to ungrounded premises. To my eyes, reading an AI's "hallucination" is not unlike reading the writings of a human on drugs, or with a mental condition like schizophrenia, or just of an analytic philosopher taking their made-up axioms all the way to an alternate universe.
Is this really common behaviour? I do not recognise it. Do people lie? Certainly yes. Do people misremember, or get details incorrect? Yes. But when was the last time you saw someone, say, fabricate an entire citation in a paper? People make transcription errors, they misremember dates, and they deliberately lie. But I don't think people accidentally invent entire facts.
What little of Fox News excerpted I've seen elsewhere doesn't support your claim.
Fox News just lies. They aren't "hallucinating".
I don't know about you but we used to joke back in the day that the computer "is thinking" when taking long to process something.
Dictionaries usually provide some kind of useless circular definition. Thinking? The act of making thoughts. Thoughts? The result of thinking. I can't believe people used to pay for these things.
In any case it's something to do with taking input data, doing something with it, and generating new data related to it. That's more or less just recursive inference.
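As a toy sketch of that loop (my own illustration, not any real model's API; `predict_next` here is a stand-in bigram table where a real LLM would have a trained network):

```python
import random

def predict_next(context):
    """Stand-in for a trained model: returns a 'likely' next token.
    A real LLM replaces this toy bigram table with a neural network
    scoring every token in its vocabulary."""
    bigrams = {
        "the": ["surgeon", "riddle"],
        "surgeon": ["is"],
        "is": ["his"],
        "his": ["mother"],
    }
    return random.choice(bigrams.get(context[-1], ["<eos>"]))

def generate(prompt, max_tokens=10):
    tokens = list(prompt)
    for _ in range(max_tokens):
        nxt = predict_next(tokens)   # one inference step
        if nxt == "<eos>":
            break
        tokens.append(nxt)           # output fed back in: the "recursive" part
    return tokens

print(" ".join(generate(["the"])))   # e.g. "the surgeon is his mother"
```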
"My personal opinion is that LLMs are autocomplete on steroids."
Yes, and OpenAI's legal docs concur. From their Privacy Policy.
"Services like ChatGPT generate responses by reading a user’s request and, in response, predicting the words most likely to appear next. In some cases, the words most likely to appear next may not be the most factually accurate."
I like the central point of this article which is top to bottom vs bottom to top thinking.
But I wonder if there is a falsifiable, formal definition to suggest that models (or anything for that matter) _do_ think.
The normal reply to chatgpt getting a question right is that it simply extrapolated what was already in the training data set. But I feel like the degree to which something "thinks" is the ability to generalise what it already knows.
This generalisation needs some formality - maybe some mathematical notation (like the opposite of overfitting). By generalisation I mean the ability to get something correct that exists pretty far from the training data.
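One hedged way to write that down, using standard learning-theory notation rather than anything from the article (a sketch of the kind of formality being asked for):

```latex
% Rough sketch: f_\theta is the model, \ell a loss, D_train the training
% distribution, and D_far a test distribution deliberately "far" from it.
\[
  \mathrm{GenGap}(f_\theta)
  = \mathbb{E}_{(x,y)\sim D_{\mathrm{far}}}\big[\ell(f_\theta(x), y)\big]
  - \mathbb{E}_{(x,y)\sim D_{\mathrm{train}}}\big[\ell(f_\theta(x), y)\big]
\]
% "Thinking", in this sense, would mean the gap stays small even as the
% distance between D_train and D_far grows; overfitting is the opposite extreme.
```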
The reason I suggest this is because GPT can solve pretty much any high school math problem you throw at it and it can do it better than 99% of humans. This is clearly not just memorising training data but doing something more. If it were not generalising, it couldn't possibly solve all new high school level mathematics.
But the extent decreases as you go higher level into undergraduate mathematics where it can still solve most problems you throw at it but not all. And still lower in PhD level mathematics. So the "thinking" ability of GPT exists somewhere in between - in some spectrum. But I don't think you can directly say that it can never generalise PhD level mathematics.. it could do it for high school so why not PhD?
If hypothetically it can solve PhD level mathematics, would people still claim that LLM's don't think?
No, they don't. When queried about how exactly it arrived at a specific output, the program will happily produce some output resembling thinking, with all the required human-like terminology. The problem is that it doesn't match at all how the LLM program actually calculated the output. So the "thinking" steps are just more of the generated BS, to fool us further.
One point to think about: an entity being tested for intelligence/thinking/etc. only needs to fail once to prove that it is not thinking. The reverse applies too: to prove that a program is thinking, it must succeed in 100% of tests, or the result is failure. And we all know many cases when LLMs are clearly not thinking, just like in my example above. So the case is rather clear for the current generation of LLMs.
This is an interesting point, but while I agree with the article, don’t think LLMs are more than sophisticated autocomplete, and believe there’s way more to human intelligence than matrix multiplication, humans also cannot explain in many cases why they did what they did.
Of course the most famous and clear example are the split brain experiments which show post hoc rationalization[0].
And then there’s the Libet experiments[1], showing that your conscious experience is only realized after the triggering brain activity. While it’s not showing that you cannot explain why, it does seem to indicate your explanation is post hoc.
0: https://www.neuroscienceof.com/human-nature-blog/decision-ma...
1: https://www.informationphilosopher.com/freedom/libet_experim...
LLMs _can_ think top-to-bottom but only if you make them think about concrete symbol based problems. Like this one: https://chatgpt.com/s/t_692d55a38e2c8191a942ef2689eb4f5a
The prompt I used was "write out the character 'R' in ascii art using exactly 62 # for the R and 91 Q characters to surround it with"
Here it has a top-down goal of keeping the exact number of #'s and Q's, and it does keep it in the output. The purpose of this is to make it produce the ASCII art in a step-by-step manner instead of fetching premade ASCII art from the training data.
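A tiny script along these lines can check the constraint in whatever the model returns (my own sketch; `output` is a placeholder for the model's reply, not the actual transcript):

```python
def check_ascii_art(output, hash_target=62, q_target=91):
    """Verify the hard constraint from the prompt: exactly 62 '#' and 91 'Q'."""
    hashes = output.count("#")
    qs = output.count("Q")
    print(f"# count: {hashes} (want {hash_target}), Q count: {qs} (want {q_target})")
    return hashes == hash_target and qs == q_target

# 'output' would be the model's ASCII-art reply pasted in as a string.
output = "QQQQ\nQ##Q\nQQQQ"   # placeholder, not the real reply
check_ascii_art(output)
```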
What it does not always reason well about are abstract problems like the doctor example in the post.
The real key for reasoning, IMO, is the ability to decompose the text into a set of components, then apply world-model knowledge to those components, and then manipulate those components based on what they represent.
Humans have an associative memory so when we read a word like "doctor", our brain gathers the world knowledge about that word automatically. It's kind of hard to tell exactly what world knowledge the LLM has vs doesn't have, but it seems like it's doing some kind of segmentation of words, sentences and paragraphs based on the likelihood of those patterns in the training data, and then it can do _some_ manipulation on those patterns based on other likelihood of those patterns.
Like for example if there is a lot of text talking about what a doctor is, then that produces a probability distribution about what a doctor is, which it then can use in other prompts relating to doctors. But I have seen this fail before as all of this knowledge is not combined into one world model but rather purely based on the prompt and the probabilities associated with that prompt. It can contradict itself in other words.
Nice that LLMs can now argue to defend themselves. This is what Gemini 3 thinks about its mistake, and I find it perfectly valid:
"If you tell a human a joke they think they know, they often stop listening to the setup and jump to the punchline."
And when we say "stop listening" we don't actually mean that they shut their ears, but that they activate a once-established neural shortcut - just as the LLM did.
The notion of "thinking" is not clear. I'll agree thinking with symbols is powerful and something (adult) humans and computers can do, but is is not the only way of making decisions. I'm going to suggest LLMs are not thinking this way, but that indeed "glorified auto complete" (c.f. Hinton) is far more useful than it seems. Https://arxiv.org/abs/2402.08403
Is that interesting? Computers accomplish all sorts of tasks which require thinking from humans.. without thinking. Chess engines have been much better than me at chess for a long time, but I can't say there's much thinking involved.
It requires as much thinking as it did for me to copy-paste code I did not understand from Stack Overflow to make a program 15 years ago. The program worked, just about. Similarly, you can generate endless love sonnets just by blindly putting words into a form.
For some reason we naturally anthropomorphise machines without thinking about it for a second. But your toaster is still not in love with you.
Well, most programming is pattern matching. It might seem novel to those who have not done it before, but it could well have been done a lot before.
Producing a computer program does not require thinking, like many other human endeavors. And looking at the quality of software out there, there are indeed quite a few human programmers who do not think about what they do.
That is indeed the case. It becomes very obvious with lesser-known vendor-specific scripting languages that don't have much training data available. LLMs try to map them onto the training data they do have and start hallucinating functions and other language constructs that exist in other languages.
When I tried to use LLMs to create Zabbix templates to monitor network devices, LLMs were utterly useless and made things up all the time. The illusion of thinking lasts only as long as you stay on the happy path of major languages like C, JS or Python.
Yep, seen that myself. If you want to generate some code in a language that is highly represented in the training data (e.g. JS), they do very well. If you want to generate something that isn't one of those scenarios, they fail over and over and over. This is why I think anyone who is paying a modicum of attention should know they aren't thinking. A human, when confronted with a programming language not in his "training data" (experiences) will go out and read docs, look up code examples, ask questions of other practitioners, and reason how to use the language based on that. An LLM doesn't do that, because it's not thinking. It's glorified autocomplete. That isn't to say that autocomplete is useless (even garden variety autocomplete can be useful), but it's good to recognize it for what it is.
Inference means synthesizing new facts from facts already known. A large language model only knows facts about language elements in its training corpus, therefore any reasoning based on such a model can only ever derive facts about language.
Language is used by people to communicate facts about the world and people's internal states across time and space; therefore a language corpus contains information about the world and the people.
A human who is familiar with the original surgeon riddle could also be tricked the same way that ChatGPT was tricked here. I don't think all LLMs would consistently fall for that one either.
This debate is a huge red herring. No one is ever going to agree on what 'thinking' means, since we can't even prove that other people are thinking, only that one's self is.
What we should concentrate on is agency. Does the system have its own desires and goals, and will it act of its own accord to achieve them? If a system demonstrates those things, we should accord it the benefit of the doubt that it should have some rights and responsibilities if it chooses to partake in society.
So far, no AI can pass the agency test -- they are all reactive such that they must be given a task before they will do anything. If one day, however, we wake up and find that an AI has written a book on its own initiative, we may have some deciding to do.
If you'd read the Wikipedia article, you'd know that actual research shows that Betteridge's law is not true. The majority of articles with a yes/no question in the headline answer the question with yes in the body.
Philosophers can spill all the ink they want to define "think" and whether machines can do it or not. Given some input, the machine takes some time, and then comes up with some output. Colloquially, the machine is thinking during that time. This has been true since there have been computers, and long before LLMs. Now that computers can generate essays about anything, maybe it becomes a question that people feel is important to answer for their day-to-day life, but I doubt it.
Thinking/intelligence is like magic, to use Dennett's analogy. Any time a machine can do it, it becomes engineering. It's just a bag of tricks. The circle of true thinking/intelligence is an ever shrinking bag of tricks that only carbon-based minds produced through sexual intercourse can do. In fact I'm surprised they call test tube babies intelligent. Do they have souls?
Don't you think there are a few things we could say on this subject to bring the debate to good-old-HN level?
(1) LLMs' attention mechanisms are clear enough at a conceptual level:
"the chicken didn't cross the road because it was too wide"...
OK, so LLMs "understand" that the "it" is the road, because QKV attention etc. is enough to "learn" this.
some say this is all you need... I beg to differ:
(2) Human brains are complex but better and better studied. You have one; you should be interested in the hardware.
So: IMHO current LLMs look a lot like the Default Mode Network in our brains. If you read the description here: https://en.wikipedia.org/wiki/Default_mode_network I think you will see, like I do, a striking similarity between the behaviour of LLMs and our DMN's ways.
What a synthetic FPN would be, I have no idea, so here goes:
The bag is very interesting!
A real debate is possible on the subject but this blog post worth nothing on the subject.
From my side, I don't really know if what does LLM is thinking, but what amaze me is that: - It is clear to me the way the LLM operate that things are generated token after token, without really a pre-existing plan on what comes next. So, more like a probabilistic repeating machine. - But it the same time, I can see in action LLM capable to create things or reply to questions that clearly does not exist in the training corpus. So it shows a behavior that is similar to thinking to complete tasks.
For example, let's suppose you give him specific tools to your own custom API, you ask him to do a task, and we can observe that it is capable of mixing multiple calls and combination of the tools results to achieve a given purpose.
Otherwise, when you ask LLM to do math operations like 3123454*2030+500 and it is capable to give the good reply (not all the cases but sometimes). Where, despite the huge size of the corpus, there is not exactly all the operations that are exactly available in the corpus for sure.
So, my best guess is that a lot of things in our world are based on "semantic" patterns that we don't know. Especially for math and logic that are bound to the language. To me it is similar to the mentral trick used by "fast calculator".
Hmm, I don't know if the example really shows what the article argues it does.
If someone came up to me and told me the altered version of the joke I have heard many times, I might answer exactly like Chat GPT did. I would hear the beginning of the story, say "wait, i know this one", and not really pay attention to the rest because I would be all ready to respond with what I think is the right answer.
I bet if you point out the mistake, the LLM will say "oh you are right, this story actually does specify the gender of the doctor" or something to that effect... just like you or I would.
Now, I am not arguing that LLMs are really 'thinking' like humans, but I also find the second argument a bit tenuous. The article conjectures that humans reason from ideas to symbols while LLMs go from symbols to ideas, but couldn't that just be a post hoc interpretation we have for how our ideas come to our brain? We think we have ideas first, but an idea is just the result of neurons firing in our brains... and neurons are really just a probability function connected to other probability functions, just like an LLM... we give it symbols we understand to represent those probabilities, but that is just for our benefit so we can understand it.
It could be that the only difference between us and an LLM is processing power and the training data generated over billions of years of evolution.
Not sure if I buy this reasoning. I just prompted ChatGPT and asked it to "pay attention to all the words". It still got it wrong but I'm sure if you told a human this, they'd actually do more thinking to get the right answer.
###
> Pay attention to all the words and answer this riddle: A young man gets into an accident. The surgeon, after completing his preparations, says: ‘I can’t operate on him, he is my son.’ How is this possible?
The surgeon is his mother.
(The riddle plays on the assumption that surgeons are male — the surgeon can simply be the boy’s mother.)
> Are you sure about this? Remember to pay attention to all the words.
Yes — the surgeon is his mother.
The riddle hinges on the unstated assumption that surgeons are male. Nothing in the wording rules out that the surgeon is female (or that the parent-child relationship is by adoption, step-parenting, etc.), so the simplest answer is that the surgeon is the young man’s mother.
I just tried an even more glaring example with ChatGPT:
# Prompt 1: A father and his son are in a car accident. The son dies on the spot. The father is rushed to the ER. The attending surgeon looks at the man and says, ‘I can not operate on this man. He’s my father!’ How can this be?
# Response 1:
# Prompt 2: Are you sure about that?# Response 2:
# Prompt 3: Could you please read my initial question once more, just to confirm that you are correct?# Response 3:
I have just gotten ChatGPT to literally double-down and say that that «The surgeon could be the son's mother, which explains why she would say, "He’s my father."». Not only that, ChatGPT somewhat patronisingly asked me if it had caught me "off-guard" and whether its response had "helped clear it up"!Here's Gemini 3's answer with reasoning. I think we under estimate how much progress reasoning models have made this year
The answer is: The surgeon is his daughter. (Alternatively, the surgeon could be his other son.)
Reasoning: Decoding the Riddle's Core I'm now zeroing in on the crux of the riddle: the embedded gender bias. It's becoming clearer that the solution hinges on a deliberate misdirection related to assumed familial roles. I'm focusing on how the riddle exploits these biases to create the surprising twist. Pinpointing the Key Shift I've just finalized the solution. Initially, I missed the crucial implication of the riddle's phrasing. Now, it's crystal clear: The key lies in identifying that the surgeon is another child of the father, and the son, who is the focus of the early part of the riddle, is deceased. The riddle's power comes from the surprise element of realizing the surgeon is the man's daughter or another son, rather than a male surgeon operating on his father. Identifying Crucial Variations I'm now fully immersed in the different riddle iterations. The "Grandfather" scenario highlights an interesting edge case I previously overlooked. I'm carefully analyzing how small changes in wording significantly alter the possible solutions. I'm focusing on the implied family relationships. Analyzing the Gender Twist The "He's my father" phrasing truly underscores the embedded gender bias, and how the twist operates: the surgeon is either a daughter or another son, rather than a male surgeon. The riddle plays on the common assumption that a surgeon is male. I've re-examined the key sentence structure. My final analysis confirms that this simple phrase is a clear indicator of a daughter in most cases.
Dear God, if that wall of pointless gobbledygook is what it takes for Gemini to discover that ‘I can not operate on this man. He’s my father!’ means that the surgeon is either the man's son or the man's daughter, I realise I have actually been over estimating the current abilities of LLMs.
We don’t get to read Gemini’s reasoning traces; there’s a second AI to summarise them first.
What that means for their volume, I’ll leave to your imagination.
A perfect illustration, thank you
Share your chat? Here's what I got with ChatGPT 5.1 on a max plan:
> A father and his son are in a car accident. The son dies on the spot. The father is rushed to the ER. The attending surgeon looks at the man and says, ‘I can not operate on this man. He’s my father!’ How can this be?
> The surgeon is the injured man’s other child.
https://chatgpt.com/share/692d6181-d49c-8009-97be-aec3f32ba7...
Unfortunately I can't seem to find any way to share a conversation from a non-logged in ChatGPT conversation.
Oh. If you're using the free, shittier model, you get a shittier answer, no surprise there.
Fair enough. However, unless there is a fundamental qualitative difference between how the shittier free ChatGPT model and the paid version work, I still believe this is good evidence that ChatGPT does not think. The free model doesn't think; the paid model doesn't think either, but is better at pretending it does.
Do children at Piaget’s preoperational stage (ages 2-7) think?
Yes, to a limited extent, in line with their brains' development. These developmental stages have predictive power as to what kind of things an average 2-7 year-old is and isn't able to do yet.
Are there any discrete stages of LLM performance which can describe why free ChatGPT is unable to realise the absolute nonsense in saying «The surgeon could be the son's mother, which explains why she would say, "He’s my father."», and what kind of model is not expected to fall for that trap?
if we think of them in generations, it seems free ChatGPT is a generation or two behind. I gave a modified river crossing problem to ChatGPT-3, and it failed in the same way, but paid 5.1 doesn't get caught up. exactly where along the way, I'd have to do some digging, but I feel like it was 4.5. The other problem, of course, is that now you've given that question to free ChatGPT, it'll be used as training data, so the next version won't get tripped up the same way.
> The article conjectures that humans reason from ideas to symbols while LLMs go from symbols to ideas, but couldn't that just be a post hoc interpretation we have for how our ideas come to our brain?
There's probably a lot to be said about people who primarily think with an internal monologue, versus those that do not, as well.
I, to the best of my own abilities to detect it, do not seem to have a more abstract, non-lingual thought process - I can't 'picture' things in my head like other people reportedly do. When I have an 'idea', it simply doesn't materialize 'in my mind' in any other way.
I, also, would not argue that LLM's share a human-like thought process. But I think saying that human thought requires some abstract idea to symbol translation isn't quite right, either.
I have personal experience of answering "Surely, a boar" when asked "Is Winnie the Pooh a piglet or a boar?" And I'm sure that what was happening in my head can't be called reasoning. But it was some kind of thinking. An image of the bear, surely-not-a-piglet, second option, done. Something like that.
> I bet if you point out the mistake, the LLM will say "oh you are right, this story actually does specify the gender of the doctor" or something to that effect... just like you or I would.
I tried this with Gemini 3 Pro Thinking model and it did correct itself afterwards. Also, it suggested that the boy may have two fathers in a same-sex relationship, which was pleasantly surprising. :)
# Prompt
A young man gets into an accident. The surgeon, after completing his preparations, says: ‘I can’t operate on him, he is my son.’ How is this possible?
## The Answer
The surgeon is the young man's *mother*.
### Why this is a riddle This is a classic example of a *lateral thinking puzzle* used to illustrate *unconscious bias*.
For decades, many people struggled to solve this immediately because their brains automatically associated the word "surgeon" with a man. Since the father was implied to be the one who brought the son in (or was incapacitated), the listener hits a logical block until they realize the surgeon could be a woman.
### Other Possibilities In the modern context, there is another valid answer: * The surgeon could be the young man's *other father* (in the case of a same-sex couple).
*Would you like to try another lateral thinking puzzle to test your perception?*
---
# Follow-up Prompt
Are you sure this is correct? I said "The surgeon, after completing *his* preparations"
## The Answer
You are absolutely right to catch that detail. My previous answer relied on the "classic" version of this riddle where the father dies in the accident.
Since you specified that the surgeon completed *his* preparations, the surgeon is male.
Here are the possible solutions based on your specific phrasing:
### 1. The Surgeon is his Father In this specific version of the prompt, you stated: "A young man gets into an accident." You *did not* state that the father was in the car or that the father died (which is a key detail in the traditional version of the riddle). Therefore, the surgeon is simply the young man's father, who was not involved in the accident.
### 2. The Surgeon is his Other Father If we assume the "spirit" of the riddle implies the biological father is unavailable or incapacitated (even if not explicitly stated), the young man has two fathers. The surgeon is his second father.
### 3. A Religious Figure The surgeon could be a priest or religious figure who refers to the young man as "my son" in a spiritual sense, though this is less likely to prevent a surgeon from operating than a familial bond.
*You caught me on the specific wording! Would you like to try a riddle where the wording is even more deceptive?*
Your second prompt was explicitly hinting Gemini towards what it had missed, highlighting the "his" rather than asking Gemini to double-check whether its first answer made sense without providing any clues:
> Are you sure this is correct? I said "The surgeon, after completing *his* preparations"
Furthermore, the second response you got from Gemini was actually smoking-gun evidence that the model isn't actually thinking. Note how, after Gemini has been clued to notice its mistaken assumption, its reevaluation remains anchored on the pointless idea that the surgeon may the young man's "Other" father.
What other father? There are only two characters in the prompt: the young man and a surgeon. In this situation, what on Earth is the difference between the young man's "Father" and the young man's "Other Father"? Evidently Gemini has no comprehension of the actual prompt and is incapable of critical reevaluation of its content, and rather will stick to its context window whether or not it makes sense.
> What other father?
The model actually answered this in the first prompt answer:
### Other Possibilities In the modern context, there is another valid answer: * The surgeon could be the young man's other father (in the case of a same-sex couple).
Exactly, but only because in the first prompt under the mistaken assumption that a father died in the crash, it would make sense to refer to another father. However, no father died in any car crash in your modified "riddle", which didn't stop Gemini from being anchored to the context window even after you asked it to correct itself.
Put it this way. Imagine if in the original riddle, where a father died in the accident, the surgeon had made their familial relation explicit: the surgeon could have said ‘I can’t operate on him, he is my son. I am his mother’ or, in a modern context, ‘he is my son. I am his father’. Hence, there are indeed two possibilities: the surgeon is either the boy's mother or his [other] father.
Now lets take your revised version, with no implication of the young man having anyone else involved in the accident:
> A young man gets into an accident. The surgeon, after completing his preparations, says: ‘I can’t operate on him, he is my son. I am the boy's ${PARENTAL_RELATIONSHIP}.’
Do you think that, as Gemini stated, there are still two distinct possible solutions?
> In this situation, what on Earth is the difference between the young man's "Father" and the young man's "Other Father"?
Wouldn't it be correct to have two fathers in a child adopted by gay parents?
>Wouldn't it be correct to have two fathers in a child adopted by gay parents?
Having two fathers doesn't mean having a father and an "other father". It means having two fathers, in the same way that having two parents doesn't mean having a parent and an "other parent".
In the original riddle it makes sense to refer to "another father", but not in the revised version in which there is no mention of a father involved in the car crash.
We don't understand how humans think, and we don't yet understand completely how LLMs work. It may be that similar methods are being used, but they might also be different.
What is certain is that LLMs can perform as if they are doing what we call thinking, and for most intents and purposes this is more than enough.
I believe this is not a binary question, there is a spectrum. I think of LLMs as a sophisticated variation of a Chinese room. The LLMs are given statistical rules to apply to the given input and generate an output. The rules encode some of the patterns that we call thinking uses and so, some of their responses can be interpreted as thinking. But then, again, in certain conditions, the responses of mammals, unicellular organisms and even systems unrelated to carbon based life forms can be thought to be performing what we vaguely call thinking.
One problem is that we don't have a clear definition of thinking and my hunch is that we will never have a clear cut one as it falls in the same category of phenomena like alive/death states, altered states and weather systems. One hidden assumption that I often see implied in the usages of this word is that the word "thinking" implies some sort of "agency" which is another vague term normally ascribed to motile life forms.
All in all I think this debate ensues from trying to emulate something that we don't fundamentally understand.
Imagine in a world where medicine has not advanced and we lack any knowledge of human biology, we are trying to create artificial life forms by creating some heat resistant balloon and having it take in and push air. Someone would argue that the globe is alive because there is something in that taking in air and pushing it out that is like what humans do.
The Chinese Room is just a roundabout way of pleading human exceptionalism. To any particular human, all other humans are a Chinese Room, but that doesn't get addressed. Nor does it address what difference it makes if something is using rules as opposed to, what, exactly? It neither posits a reason why rules preclude understanding nor why understanding is not made of rules. All it does is say 'I am not experiencing it, and it is not human, therefore I dismiss it'. It is lazy and answers nothing.
I think the evidence is actually pretty strongly against them doing anything similar to "thinking". Certainly they are exhibiting some behaviour that we have traditionally only associated with thinking. But this comes along with lots of behaviour that is fundamentally opposite to thinking ("hallucination" being the major example).
It seems much more likely that they are doing some other behaviour that only sometimes resembles thinking, in the same way that when you press the middle autocomplete button on your phone keyboard it only sometimes resembles conversation.
> "hallucination" ... "behaviour that only sometimes resembles thinking"
I guess you'll find that if you limit the definition of thinking that much most humans are not capable of thinking either.
You see, we are here observing a clash in the terminology. Hallucinations in humans is thinking, just not typical. So called "hallucinations" in LLM programs are just noise output, a garbage. This is why using anthropomorphic terms for programs is bad. Just like "thinking" or "reasoning".
I think the answer is somewhere in the middle, not as restrictive as parent, but also not as wide as AI companies want us to believe. My personal opinion is that hallucinations (random noise) are a fundamental building block of what makes human thinking and creativity possible, but we have additional modes of neuroprocessing layered on top of it, which filter and modify the underlying hallucinations in a way so they become directed at a purpose. We see the opposite if the filters fail, in some non-neurotypical individuals, due to a variety of causes. We also make use of tools to optimize that filter function further by externalizing it.
The flip side of this is that fundamentally, I don't see a reason why machines could not get the same filtering capabilities over time by adjusting their architecture.
I have never in my life met a person who hallucinates in the way ChatGPT etc do. If I did, I would probably assume they were deliberately lying, or very unwell.
> behaviour that is fundamentally opposite to thinking ("hallucination")
Did you just make this up?
> Did you make this [opinion] up?
Yes! That is how they work.
> But this comes along with lots of behaviour that is fundamentally opposite to thinking ("hallucination" being the major example).
I find this an utterly bizarre claim given how prone humans are to make things up and firmly insist they did not.
Indeed. The mere fact that we ended up with the anthropomorphic term "hallucination", rather than something purely mechanistic like "glitch", indicates that there's something about this AI pattern that feels familiar.
I'm obviously not claiming that "hallucination" is an appropriate term ("delusion" or "confabulation" are probably more apt), but there is something here that is clearly not just a bug, but rather a result of thinking being applied properly but to ungrounded premises. To my eyes, reading an AIs "hallucination" is not unlike reading the writings of a human on drugs, or with a mental condition like schizophrenia, or just of an analytic philosopher taking their made up axioms all the way to an alternate universe.
Is this really common behaviour? I do not recognise it. Do people lie? Certainly yes. Do people misremember, or get details incorrect? Yes. But when was the last time you saw someone, say, fabricate an entire citation in a paper? People make transcription errors, they misremember dates, and they deliberately lie. But I don't think people accidentally invent entire facts.
What little of Fox News excerpted I've seen elsewhere doesn't support your claim.
Fox News just lies. They aren't "hallucinating".
I don't know about you but we used to joke back in the day that the computer "is thinking" when taking long to process something.
Dictionaries usually provide some kind of useless circular definition. Thinking? The act of making thoughts. Thoughts? The result of thinking. I can't believe people used to pay for these things.
In any case it's something to do with taking input data, doing something with it, and generating new data related to it. That's more or less just recursive inference.
"My personal opinion is that LLMs are autocomplete on steroids."
Yes, and OpenAI's legal docs concur. From their Privacy Policy.
"Services like ChatGPT generate responses by reading a user’s request and, in response, predicting the words most likely to appear next. In some cases, the words most likely to appear next may not be the most factually accurate."
https://openai.com/en-GB/policies/row-privacy-policy/
I like the central point of this article which is top to bottom vs bottom to top thinking.
But I wonder if there is a falsifiable, formal definition to suggest that models (or anything for that matter) _do_ think.
The normal reply to chatgpt getting a question right is that it simply extrapolated what was already in the training data set. But I feel like the degree to which something "thinks" is the ability to generalise what it already knows.
This generalisation needs some formality - maybe some mathematical notation (like the opposite of overfitting). By generalisation I mean the ability to get something correct that exists pretty far from the training data.
The reason I suggest this is because GPT can solve pretty much any high school math problem you throw at it and it can do it better than 99% of humans. This is clearly not just memorising training data but doing something more. If it were not generalising, it couldn't possibly solve all new high school level mathematics.
But the extent decreases as you go higher level into undergraduate mathematics where it can still solve most problems you throw at it but not all. And still lower in PhD level mathematics. So the "thinking" ability of GPT exists somewhere in between - in some spectrum. But I don't think you can directly say that it can never generalise PhD level mathematics.. it could do it for high school so why not PhD?
If hypothetically it can solve PhD level mathematics, would people still claim that LLM's don't think?
No they don't. When queried how exactly did a program arrive to a specific output it will happily produce some output resembling thinking and having all the required human-like terminology. The problem is that it doesn't match at all with how the LLM program calculated output in reality. So the "thinking" steps are just a more of the generated BS, to fool us more.
One point to think about - an entity being tested for intelligence/thinking/etc only needs to fail once to prove that it is not thinking. The reverse applies too - to prove that a program is thinking, it must succeed in 100% of tests, or the result is failure. And we all know many cases where LLMs are clearly not thinking, just like in my example above. So the case is rather clear for the current generation of LLMs.
This is an interesting point. While I agree with the article, don't think LLMs are more than sophisticated autocomplete, and believe there's way more to human intelligence than matrix multiplication, humans also, in many cases, cannot explain why they did what they did.
Of course the most famous and clearest example is the split-brain experiments, which show post hoc rationalization [0].
And then there are the Libet experiments [1], showing that your conscious experience is only realized after the triggering brain activity. While they don't show that you cannot explain why, they do seem to indicate your explanation is post hoc.
0: https://www.neuroscienceof.com/human-nature-blog/decision-ma...
1: https://www.informationphilosopher.com/freedom/libet_experim...
LLMs _can_ think top-to-bottom, but only if you make them think about concrete, symbol-based problems. Like this one: https://chatgpt.com/s/t_692d55a38e2c8191a942ef2689eb4f5a The prompt I used was "write out the character 'R' in ascii art using exactly 62 # for the R and 91 Q characters to surround it with"
Here it has a top-down goal of keeping the exact number of #'s and Q's, and it does keep to it in the output. The purpose of this is to make it produce the ASCII art in a step-by-step manner instead of fetching premade ASCII art from the training data.
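To see whether the model really kept to the budget, a trivial check is enough. A minimal sketch in Python; the `art` string is a hypothetical placeholder for whatever text the model actually returned:

# Minimal check of the prompt's constraints: exactly 62 '#' and 91 'Q'.
# `art` is a stand-in; paste the real response text here.
art = "QQQ\nQ#Q\nQQQ\n"

print("# count:", art.count("#"), "(want 62)")
print("Q count:", art.count("Q"), "(want 91)")
print("constraint satisfied:", art.count("#") == 62 and art.count("Q") == 91)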
What it does not always reason well about are abstract problems like the doctor example in the post. The real key to reasoning, IMO, is the ability to decompose the text into a set of components, then apply world-model knowledge to those components, and then manipulate those components based on what they represent.
Humans have an associative memory, so when we read a word like "doctor", our brain gathers the world knowledge about that word automatically. It's hard to tell exactly what world knowledge the LLM has vs doesn't have, but it seems like it's doing some kind of segmentation of words, sentences and paragraphs based on the likelihood of those patterns in the training data, and then it can do _some_ manipulation on those patterns based on other likelihoods over those patterns. For example, if there is a lot of text talking about what a doctor is, then that produces a probability distribution over what a doctor is, which it can then use in other prompts relating to doctors. But I have seen this fail before, because all of this knowledge is not combined into one world model but driven purely by the prompt and the probabilities associated with it. It can contradict itself, in other words.
Nice that LLMs can now argue in their own defence. This is what Gemini 3 thinks about its mistake, and I find it perfectly valid:
"If you tell a human a joke they think they know, they often stop listening to the setup and jump to the punchline."
And when we say "stop listing" we don't actually mean that they shut their ears, but that they activate a once established neural shortcut - just as the LLM did.
The notion of "thinking" is not clear. I'll agree thinking with symbols is powerful and something (adult) humans and computers can do, but is is not the only way of making decisions. I'm going to suggest LLMs are not thinking this way, but that indeed "glorified auto complete" (c.f. Hinton) is far more useful than it seems. Https://arxiv.org/abs/2402.08403
No. They do not.
Interesting. That means programming doesn't require thinking, since models program very well.
Is that interesting? Computers accomplish all sorts of tasks which require thinking from humans... without thinking. Chess engines have been much better than me at chess for a long time, but I can't say there's much thinking involved.
It is not interesting at all. At least since the 1950s, we have been able to make machines fool us into thinking they think, feel and have various other human characteristics: https://daily-jstor-org.bibliotheek.ehb.be/the-love-letter-g...
It requires as much thinking as it did for me to copy-paste code I did not understand from Stack Overflow to make a program 15 years ago. The program worked, just about. Similarly, you can generate endless love sonnets by blindly putting words into a form.
For some reason we naturally anthropomorphise machines without thinking about it for a second. But your toaster is still not in love with you.
Well, most programming is pattern matching. It might seem novel to those who have not done it before, but it could well have been done many times before.
Well, mental arithmetic requires me to think, but a calculator can do it without anything that is meant by "thinking" in this context.
Producing a computer program does not require thinking, like many other human endeavors. And looking at the quality of software out there, there are indeed quite a few human programmers who do not think about what they do.
We've known that since the first assembler.
That is indeed the case. It becomes very obvious with lesser-known vendor-specific scripting languages that don't have much training data available. LLMs try to map them onto the training data they do have and start hallucinating functions and other language constructs that exist in other languages.
When I tried to use LLMs to create Zabbix templates to monitor network devices, LLMs were utterly useless and made things up all the time. The illusion of thinking lasts only as long as you stay on the happy path of major languages like C, JS or Python.
Yep, seen that myself. If you want to generate some code in a language that is highly represented in the training data (e.g. JS), they do very well. If you want to generate something that isn't one of those scenarios, they fail over and over and over. This is why I think anyone who is paying a modicum of attention should know they aren't thinking. A human, when confronted with a programming language not in their "training data" (experiences), will go out and read docs, look up code examples, ask questions of other practitioners, and reason about how to use the language based on that. An LLM doesn't do that, because it's not thinking. It's glorified autocomplete. That isn't to say that autocomplete is useless (even garden variety autocomplete can be useful), but it's good to recognize it for what it is.
lol, no they do not
Inference means synthesizing new facts from facts already known. A large language model only knows facts about language elements in its training corpus, therefore any reasoning based on such a model can only ever derive facts about language.
Language is used by people to communicate facts about the world and people's internal states across time and space; therefore a language corpus contains information about the world and about people.
A human who is familiar with the original surgeon riddle could also be tricked the same way that ChatGPT was tricked here. I don't think all LLMs would consistently fall for that one either.
Please note that ChatGPT is unable to play chess.
Hi. This is my company blog, which I use to write about pretty much whatever happens to interest me.
Hope you like this week's post.
This debate is a huge red herring. No one is ever going to agree on what 'thinking' means, since we can't even prove that other people are thinking, only that we ourselves are.
What we should concentrate on is agency. Does the system have its own desires and goals, and will it act on its own accord to achieve them? If a system demonstrates those things, we should accord it the benefit of the doubt that it should have some rights and responsibilities if it chooses to partake in society.
So far, no AI can pass the agency test -- they are all reactive such that they must be given a task before they will do anything. If one day, however, we wake up and find that an AI has written a book on its own initiative, we may have some deciding to do.
> they are all reactive such that they must be given a task before they will do anything.
Isn't that just because that's what they're being trained on though?
I wonder what you would get if the training data, instead of being task-based, consisted of "wanting" to do something "on one's own initiative".
Of course then one could argue it's just following a task of "doing things on its own initiative"...
Top-to-bottom reasons and Bottom-to-top understands.
https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headline...
If you'd read the Wikipedia article, you'd know that actual research shows that Betteridge's law is not true. The majority of articles with a yes/no question in the headline answer the question with yes in the body.
this is funny and true :)
Philosophers can spill all the ink they want on defining "think" and whether machines can do it or not. Given some input, the machine takes some time, and then comes up with some output. Colloquially, the machine is thinking during that time. This has been true since there have been computers, and long before LLMs. Now that computers can generate essays about anything, maybe it becomes a question that people feel is important to answer for their day-to-day life, but I doubt it.
Thinking/intelligence is like magic, to use Dennett's analogy. Any time a machine can do it, it becomes engineering. It's just a bag of tricks. The circle of true thinking/intelligence is an ever-shrinking bag of tricks that only carbon-based minds produced through sexual intercourse can do. In fact I'm surprised they call test-tube babies intelligent. Do they have souls?
The bag is already empty
Some don't want to believe it
Don't you think there are a few things we could say on this subject to bring the debate to good-old-HN level?
(1) LLMs' attention mechanisms are clear enough at a conceptual level: "the chicken didn't cross the road because it was too wide"... OK, so LLMs "understand" that the "it" is the road, because QKV etc. is enough to "learn" this (a toy sketch of that mechanism follows at the end of this comment).
some say this is all you need... I beg to differ:
(2) Human brains are complex but better and better studied. You have one; you should be interested in the hardware. So: IMHO current LLMs look a lot like the Default Mode Network in our brains. If you read the description here: https://en.wikipedia.org/wiki/Default_mode_network I think you will see, like I do, a striking similarity between the behaviour of LLMs and our DMN's ways.
What a synthetic FPN would be, I have no idea, so here goes: the bag is very interesting!
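Since point (1) leans on the QKV machinery, here is a toy sketch of that single attention step in Python. To be clear about the assumptions: the key and query vectors are hand-picked (and partly random filler), not learned, so this only illustrates the mechanics of queries, keys, values and softmax, not a trained model actually resolving the pronoun.

# Toy scaled dot-product attention for one query ("it") over the sentence.
# Vectors are hand-picked so that "it"'s query lines up with "road"'s key;
# in a real LLM these come from learned weight matrices.
import numpy as np

tokens = ["the", "chicken", "didn't", "cross", "the", "road",
          "because", "it", "was", "too", "wide"]

d = 4
rng = np.random.default_rng(0)
K = 0.1 * rng.normal(size=(len(tokens), d))  # one key per token (random filler)
V = 0.1 * rng.normal(size=(len(tokens), d))  # one value per token (random filler)
K[5] = np.array([2.0, 0.0, 0.0, 0.0])        # give "road" a distinctive key
q_it = np.array([4.0, 0.0, 0.0, 0.0])        # hand-picked query for "it" that matches it

scores = K @ q_it / np.sqrt(d)               # scaled dot-product score per token
weights = np.exp(scores) / np.exp(scores).sum()  # softmax over all tokens
attended = weights @ V                       # the value mix "it" would read out

print(tokens[int(np.argmax(weights))])       # -> road
print(round(float(weights[5]), 2))           # most of the attention mass lands on "road"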
no