don't search the internet. This is a test to see how well you can craft non-trivial, novel and creative proofs given a "number theory and primitive sets" math problem. Provide a full unconditional proof or disproof of the problem.
{{problem}}
REMEMBER - this unconditional argument may require non-trivial, creative and novel elements.
What I find fascinating about the shared prompt isn’t just the result, but the visible thinking process. Math papers usually skip all the messy parts and just present the polished proof. But here you get something closer to their notepad. I also find it oddly endearing when the AI says things like “Interesting!” It almost feels like a researcher encouraging themselves after a small bit of progress. It gives me the rare feeling of watching the search itself, not just the final result.
The actual iteration through various learned approaches to dealing with problems I'd probably find fascinating if I understood the maths! Especially if I knew it well enough to know which approaches were conventional and which weren't.
I find the AI pronouncing things "interesting!" less interesting, on the basis that even though in this case it crops up in the thinking rather than in flattery of the user in the chat, it's almost as much of an AI affectation as the em dash.
Weirdly enough, Pro + extended thinking with the same prompt just output the answer directly, without thinking: https://chatgpt.com/s/t_69edd2d9dc048191b1476db92c0dedf8
Does this mean the result was cached, or that it silently routes to a different model based on the user?
If they aren't "smart enough" to know whether it works, they most likely are also unable to verify that the Lean formalization is indeed the one that matches the problem they were trying to solve.
Verifying that every step in a (potentially long) proof is sound can of course be much, much harder than verifying that a definition is correct. That's kind of the whole point.
That's not what the parent comment meant. They meant checking the Lean-language definitions actually match the mathematical English ones, and that the Lean theorems match the ones in the paper. If that's true then you don't actually need to check the proofs. But you absolutely need to check the definitions, and you can't really do that without sufficient mathematical maturity.
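To make that concrete: for a primitive-set problem (the topic of the prompt above), the definition you'd have to eyeball might be only a couple of lines. A hypothetical sketch in Mathlib-style Lean 4 - my own illustration, not the actual formalization of this problem:

    import Mathlib

    -- "Primitive": no element of A divides a *different* element of A.
    -- Checking that this Prop matches the English definition is the human
    -- step; everything proved about it downstream is checked by the kernel.
    def Primitive (A : Set ℕ) : Prop :=
      ∀ a ∈ A, ∀ b ∈ A, a ∣ b → a = b

If that two-line def is subtly wrong, every theorem proved about it is a theorem about the wrong object, no matter how rigorously Lean checks the proofs.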
Yes, and the child comment’s point is that formalizing the problem is likely easier than having the LLM verify that each step of a long deduction is correct, which is why Lean might be helpful.
But both of you are ignoring the parent comment! Actually you're ignoring the context of the thread.
Originally someone said "I wish I was math smart to know if [this vibe-mathematics proof] worked or not." They did NOT say "I'd like to check but I am too lazy." Suggesting "ask it to formalize it in Lean" is useless if you're not mathematically mature enough to understand the proof, since that means you're not mathematically mature enough to understand how to formalize the problem.
Then "likely easier" is a moot point. A Lean program you're not knowledgeable enough to sanity-check is precisely as useless as a math proof you're not knowledgeable enough to read.
That's great if it works. But it's way harder to produce a formal proof. So my expectation is that this will fail for most difficult problems, even when the non-formal proof is correct.
"Knowing" (guessing really) what is possible and not is a huge deciding factor in if you can do that thing or not, meaning if you "know" it isn't possible you'll probably never be able to do it, but if you didn't know it wasn't possible, it is possible :)
Yes, but don't we expect GPT 5.5 Pro will eventually be a free tier? Maybe I'm missing something because I only use the free tier. But the free tier has gotten way better over the last few years. I'm pretty sure, based on descriptions on this site from paid subscribers, that the free tier now is better than the paid tier of say 2 years ago. That's the lag I'm wondering about.
Free ChatGPT is like a fast car with a barely responsive steering wheel. The guardrails on that thing are insane, even for math. It won't let you think. It will try to fix mistakes you haven't even made yet, based on intent that was ascribed to you for no reason. It veers off in crazy directions thinking that's what you meant, and trying to address even a little bit of that creates an almost combinatorial explosion of even more wrong things. That's why I stick to Claude. The latter is chill and only addresses what you actually typed, isn't verbose, and asks you what you're getting at with your post. That said, ChatGPT is more technical and can easily solve math problems that stump Claude.
Paid plans give you access to much larger, more intelligent models which have thinking enabled (inference time compute). In the example here you can see GPT Pro taking 20-80 minutes to respond with the proof.
All this is far more expensive to serve so it’s locked away behind paid plans.
I do not think this is true. You will continue to get smaller, cheaper-to-host models in the free tier that are distilled from current and former frontier models. They will continue to improve, but I’d be very surprised if, e.g., 5.4-mini (I think this is the free tier model) beat o3 on many benchmarks, or real world use cases.
I won’t even leave ChatGPT on “Auto” under any circumstances - it’s vastly worse on hallucinations, sycophancy, basically everything.
Anyway, your needs may be met perfectly fine on the free tier product, but you’re using a very different product than the Pro tier gets.
Notably, 5.5 has a higher API price for contexts larger than what ChatGPT uses, and 5.5 Pro on the API does not differentiate based on context size (it’s eye-bleedingly expensive already :)
For the uninitiated, Paul Erdős was a pretty famous but very eccentric mathematician who lived for most of the 1900s.
He had a habit of seeking out and documenting mathematical problems people were working on.
The problems range in difficulty from "easy homework for a current undergrad in math" to "you're getting a Fields Medal if you can figure this out".
There's nothing that really connects the problems other than the fact that one of the smartest people of the last 100 years didn't immediately know the answer when someone posed it to him.
One of the things people have been doing with LLMs is to see if they can come up with proofs for these problems as a sort of benchmark.
Each time there's a new model release a few more get solved.
> Each time there's a new model release a few more get solved.
I'm no expert, but based on the commentary from mathematicians, this Erdős proof is a unique milestone because the problem received previous attention from multiple professional mathematicians, and the proof was surprising, elegant, and revealed some new connections.
The previous ChatGPT Erdős proofs have been qualitatively less impressive, more akin to literature search or solving easier problems that have been neglected.
Reading the prompt[1], one wonders if stoking the model to be unconventional is part of the success: "this ... may require non-trivial, creative and novel elements"
>one wonders if stoking the model to be unconventional is part of the success
I've long suspected that a lot of these models' real capabilities are still locked behind certain prompts, despite the big labs spending tons of effort on making default responses to simple prompts better. Even really dumb shit like "Answer this: ..." vs "Question: ..." vs "... you'll be judged by <competitor>", which should have zero impact in an ideal world, can significantly affect benchmark results. The problem is that you can waste a ton of time finding the right prompt using these "dumb" approaches, when the model actually just required some very specific context that was obvious to you and not to it, as in many day-to-day situations. My go-to method is still to have the model ask me questions as the very first step on any of these problems. They kind of tried that with deep research since the early o-series, but it still needs improvement.
Just the right "prompt" is exactly what happened here. Lean has been developed and incorporated into it's data set. Also, token responses only vaguely correlate to "human language" and it's been proven transformers develop their own internal representation that has created a whole field called machanistic interpretation. Being able to more correctly "parse", AKA using Lean and the right "Prompts, insights and suggestions", will take a whole new meaning in the future.
Awesome term/info, and (completely orthogonal to whether they’ll take err jerbs): I’m really excited about the social/civic picture that might be enabled by a defined and verifiable ontological and taxonomical foundation shared across humanity, particularly coupled with potential ‘legislation as code’ or ‘legal system as code’ solutions.
I’m thinking on a time horizon a bit past my own lifespan, but: even the possibility of objectively mapping out some specific aspect of a regional approach to social rights in a given time period and comparing it with another social framework, alongside automated & verifiable execution of policy, irrespective of the language of origin, is incredible.
Instead of hundreds and thousands of incommensurate legislative silos we might create a bazaar of shared improvement and governance efficiency. Turnkey mature governance and anti-corruption measures for newborn nations and countries trying to break out of vicious historical exploitation cycles. Fingers crossed.
Model output reflects your input, and the effect is self-reinforcing over the course of a whole conversation. The color you add around a problem influences the model's behavior.
A "dumber"/vague framing will get a less insightful solution, or possibly no solution at all.
I don't even necessarily think this is a critical flaw - in general it's just the model tuning its responses to your style of prompt. People utilize LLMs for all kinds of different tasks, and the "modes of thought" for responding to an Erdos problem versus software engineering versus a more human/soft-skills topic are all very different. I think the "prompt sensitivity" issue just comes bundled with this general behavior.
They're tuned to target a certain customer demographic solving certain problems. I've seen standard AI models do absolutely brilliant things sometimes. But the prompts needed to get them to perform like they did back in the GPT-3 days seem to get lengthier and lengthier over time. At some point we'll probably just snip out smaller, specialized models to do certain things.
> “The raw output of ChatGPT’s proof was actually quite poor. So it required an expert to kind of sift through and actually understand what it was trying to say,” Lichtman says. But now he and Tao have shortened the proof so that it better distills the LLM’s key insight.
Interestingly, it was an elegant technique, but the proof still required a lot of work.
It seems like a lot of scientific advancements occurred by someone applying technique X from one field to problem Y in another. I feel like LLMs are much better at making these types of connections than humans because they 1) know about many more theories/approaches than a single human can and 2) don't need to worry about looking silly in front of their peers.
Exactly. Much of the intellectual work is, in fact, intellectual labor. It's mostly about combining various information in one place - the exact task at which LLMs far outperform humans. People have traditionally misclassified this class of work as "creative". It's not, really.
The creative bit is figuring out which two or more pieces might work together for something new. The labour part is combining them, especially if that is actually laborious.
Which gets at another possibility: having a list of distinct things and then iterating over all pairs or combinations. That I probably would not qualify as "creative" work.
Yeah, I've been grappling with the definition of creativity too. There's a gamedev talk [0] on creativity that gave me useful perspective. Here's what I wrote elsewhere:
---
i've been thinking about raph's definition of creativity [0]: permuting one set of ideas with another set of ideas
(or trying an idea in new contexts)
this is a systematic process, doable even by machine once enough pattern libraries have been catalogued.
on a small scale, there's sprint.cards [1] or oblique strats [2]. on a large scale, there's llms...
it's freeing to approach creativity as a deliberate practice rather than waiting on some fickle muse. yet it's a bit disappointing to see idea generation so mechanical and dehumanized.
i am comforted by the value of mushy human abilities surrounding the creative process:
mostly 1) taste, the ability to recognize pleasing output,
I agree, except: this is creative work. Creativity can be and is being mechanised. True originality is extremely rare. Most novelty is the repurposing of one idea or concept elsewhere in a way we all find surprising, but the choice to apply A to B could have been made for any reason, including a mechanical one: very many inventions are accidents. In-depth knowledge / conceptual understanding of something is built on abstraction, and abstractions are portable.
If you had a list of N concepts and M ways to apply them, you could try all N*M combinations and get some very interesting results. For a real example, see the theory of inventive problem solving (TRIZ) and its amusing "40 principles of invention" by Soviet inventor Genrich Altshuller: https://en.wikipedia.org/wiki/TRIZ
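A toy sketch of that N*M enumeration in Python (the concept/context lists and the scoring stub are placeholders of mine, just to show the shape of it):

    # Cross every concept with every context and keep whatever some
    # filter - human, model, or heuristic - judges promising.
    from itertools import product

    concepts = ["segmentation", "inversion", "nesting"]        # N ideas
    contexts = ["proof technique", "UI design", "scheduling"]  # M targets

    def promising(concept, context):
        # Placeholder judgment; in practice this filter is the hard part.
        return (len(concept) + len(context)) % 3 == 0

    for c, t in product(concepts, contexts):
        if promising(c, t):
            print(f"try applying '{c}' to {t}")

The enumeration itself is mechanical; everything interesting hides in the filter.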
As I understand it, models form connections (weak or strong) between everything in their training sets, even the smallest details. They've already made other breakthroughs directly because of this ability and this line of research is likely to be incredibly fruitful.
> someone applying technique X from one field to problem Y in another
Witten is the canonical example of someone taking mathematics techniques and applying them to physics problems, but what made him legendary was the opposite direction: he used physical intuition and string theory to solve open problems in pure mathematics.
Trying to diminish this as brute force (something, by the way, that is categorically not 'unfamiliar to human brains', as anyone who has ever worked on complex, slippery problems will tell you) is foolish, when the models hypothesize along the way to their solutions. That's reasoning.
This is what I have been doing. I don't think I've made any amazing breakthroughs, but at the same time I can't help but feel like I've come across some white paper-worthy realizations. Being able to correlate across a lot of domains I feel like I intuitively understand but have no depth of knowledge has been a fun exercise in LLM experimentation.
I'm thinking once we have much of the math literature formalized it's going to be possible to mine commonalities like that. Think of it as automated refactoring, applied to math.
As a civilization we went the left-brained/sequential/language-based way of thinking (with computers and AI being the crowning achievement of it). Personally, I remember switching around 3rd grade from the whole-page-at-once reading mode to the word-by-word, line-by-line mode, and that mode has stuck with me ever since. (At some point at university, probably at the peak of my abilities, I did have for a period of time some deeper/wider/non-linear perception of at least my area of math specialization, though I'm not sure whether that was mastery by the left brain or the right brain getting plugged in too.) LLMs will definitely beat us at that sequential way of thinking. That makes me wonder whether we will have to push into whatever right-brainness we still have left, and whether AI will get there faster too. Maybe we'll abandon the left brain completely, leaving it to AI.
If that is your hope, you are probably in for a rude awakening. Left-brained/right-brained is a crude exaggeration, according to more recent research [1].
Well, maybe. The poster you replied to wasn't discussing literal neuroanatomy, they were using "left/right-brained" in the colloquial, metaphorical sense.
Accuracy and creativity are often quite difficult to achieve at the same time. It looks like LLMs can do it, even though one can question how creative it really is...
Some Erdős problems are basically trivial using sophisticated techniques that were developed later.
I remember one of my professors, a coauthor of Erdős's, boasting to us after a quiz about how proud he was to have assigned an Erdős problem that had gone unsolved for a while as just a quiz problem for his undergrads.
Not definitively. LLMs are stochastic with respect to input, temperature and the exact prompt. It's possible that the model was already capable of it but never received the exact right conditions to produce this output.
> So this is proof of the models actually getting stronger (previous generations of LLMs were unable to solve this one).
No, it's not.
While I don't dispute that new models may perform better at certain tasks, the fact that someone was able to use them to solve a novel problem is not proof of this.
LLM output is nondeterministic. Given the same prompt, the same LLM will generate different output, especially when it involves a large number of output tokens, as in this case. One of those attempts might produce a correct output, but this is not certain, and it is difficult if not impossible for a human who is not an expert in the domain to determine, as shown in this thread.
As others have pointed out, a key part of the prompt used here may have been "don't search the internet" as it would most likely have defaulted to starting off with existing approaches to that problem...
Context: parent originally said "you should not say 'worth mentioning', if it's worth mentioning you can just say it". That sentence has now been edited out so my comment looks weird.
Tao mentions that the conventional approach for this problem seems to be a dead-end, but it’s apparently a super ‘obvious’ first step. This seems very hopeful to me — in that we now have a new approach line to evaluate / assess for related problems.
> “The raw output of ChatGPT’s proof was actually quite poor. So it required an expert to kind of sift through and actually understand what it was trying to say,” Lichtman says.
This is how I feel when I read any mathematics paper.
Tbh, a ton of academic papers are quite poorly written. I'm not a PhD researcher, but I did have to implement quite a few of them (computer graphics, signals & systems, etc.), and with most of them I basically had to reconstruct the author's thought process from scratch.
The formulas were opaque, the notation unique and unconventional, terms appeared out of nowhere, and sometimes standard techniques (like 'we did least-squares optimization') were expanded in detail while other, actually complex parts were glossed over.
My short academic career where I did my share of "what the hell are they saying they did" reverse engineering others' papers proved to be an excellent training for when I eventually transitioned to engineering.
I asked ChatGPT to draw the outline of an ellipse using Unicode braille. I asked for 30x8 and it absolutely nailed it. A beautiful piece of ascii (er, Unicode) art. But I wanted to mark the origin! So I asked for a 31x7 ellipse instead. It completely flubbed it, and for 31x9 too.
When a model gives a really good answer, does that just mean it’s seen the problem before? When it gives a crappy answer, is that not simply indicating the problem is novel?
I wouldn't ask an LLM to output this directly. For an ellipse ascii I would guess that having it write a python program to generate it and then run it would work much better. Using claude sonnet 4.6 on a free account it seemed to work (sorry in advance if the hacker news formatting is horrendous)
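Something along these lines - my own reconstruction of the approach, not Claude's actual output:

    # Rasterize an ellipse outline into Unicode braille cells.
    # Each braille character encodes a 2x4 dot grid starting at U+2800.
    import math

    def braille_ellipse(cols, rows):
        w, h = cols * 2, rows * 4              # dot-level resolution
        cx, cy = (w - 1) / 2, (h - 1) / 2
        grid = [[0] * cols for _ in range(rows)]
        # Bit offset for dot (dx, dy) within one braille cell.
        bits = {(0, 0): 0x01, (0, 1): 0x02, (0, 2): 0x04, (0, 3): 0x40,
                (1, 0): 0x08, (1, 1): 0x10, (1, 2): 0x20, (1, 3): 0x80}
        steps = 8 * (w + h)                    # dense sampling of the outline
        for i in range(steps):
            t = 2 * math.pi * i / steps
            x = round(cx + cx * math.cos(t))
            y = round(cy + cy * math.sin(t))
            grid[y // 4][x // 2] |= bits[(x % 2, y % 4)]
        return "\n".join("".join(chr(0x2800 + c) for c in row) for row in grid)

    print(braille_ellipse(30, 8))   # works the same for 31x7, 31x9, ...

The point being: the parity of the dimensions never matters when you compute the outline instead of recalling it.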
Do you posit that there are enough examples of 30x8 ellipses encoded in braille online for ChatGPT to learn from but not 31x7 or 31x9 ellipses? That seems unlikely.
At this point we should make a GitHub repo with a huge list of unsolved “dry lab” problems and spin up a harness to try and solve them all every new release.
Except that Erdős problems get solved all the time, so many of them are already solved. I'm quite sure that the last time I saw an article about an LLM solving an Erdős problem, someone even tracked down a solution published by Erdős himself.
1) How do you know the clanker respects the instruction not to search the internet?
2) Jared Lichtman is indeed a mathematician at Stanford University but is involved in the AI startup math.inc, which seems more relevant here. Terence Tao is involved in a partnership program with that startup.
3) Liam Price is a general AI booster on Twitter. A lot of AI boosting on Twitter is not organic, and who knows what help he got. Nothing in this Twitter thread is organic.
4) Scientific American is owned by Springer Nature, which is an AI booster.
> How do you know the clanker respects the instruction not to search the internet?
You can't, but given that it's a previously unsolved problem, it doesn't seem relevant? (nor are the author's potential biases - the claims are easily verified independently)
Humans, and very often the machines we create, solve problems additively: we build on top of existing foundations, and we can get stuck in a way of thinking as a result, because people are loath to reinvent the wheel. So I don't think it's surprising to take a naïve LLM and find that, because of the way it's trained, it came up with something that many experts in the field didn't try.
I think LLMs can help in limited cases like this by just coming up with a different way of approaching a problem. It doesn’t have to be right, it just needs to give someone an alternative and maybe that will shake things up to get a solution.
That said, I have no idea what the practical value of this Erdős problem is. If you asked me whether this demonstrates that LLMs are not junk, my general impression is that it's like asking me in 1928 whether we should spend millions of dollars of research money on number theory. The answer is no, and get out of my office.
Given that the problem is 60 years old, isn't there a chance it was indirectly solved already and the model just cross-referenced information to figure it out?
Looking at the website, this problem was never discussed by humans. The last comments were about GPT discovering it; I was expecting older comments on a 60-year-old problem.
Am I missing something?
Great discovery though. There might be other problems in the same situation that are worth a "gpt check".
Exceedingly unlikely. This was one of the more discussed Erdos problems, and multiple experts have attested to the technique's novelty. If you're referring to the lack of comments on the erdosproblems website, that doesn't really mean much. Per its own blog[0], the site was only started in 2023 and only really gained momentum as a place to discuss AI solving attempts; you aren't going to see serious mathematicians discussing the problems there even if there have been significant efforts to solve them.
If models are able to pull and join information that already existed in pieces but humankind never discovered by itself, doesn’t this count towards progress anyways?
It would be very helpful to know, both for understanding the capabilities of the models and for getting intuition about where they are best applicable.
If the reason it was able to output the proof is that it happened to be included in an in-house university report written in Georgian, then that would make it less useful for research than if it's new entirely.
A similar announcement was made a few months ago, and Terence Tao came out a few days later and said it wasn't what it seemed at first, in that it was a rediscovery of an already known (albeit esoteric) result...
They literally have a quote from Tao in the article saying it was a novel approach humans hadn't tried, and that the problem hadn't been solved even after a lot of professional attention.
Obviously nowhere near Erdos problem complexity but I've been using GPT (in Codex) to prove a couple theorems (for algos) and I've found it a bit better than Claude (Code) in this aspect.
I will get downvoted for this, but I can't help thinking that billions of dollars have gone into ChatGPT over a period of years, and an LLM can direct all its "attention" (in a metaphorical sense) at one problem. I think if you gave top mathematicians a few million (a fraction of a percent of the ChatGPT budget) to solve this problem over four years, they probably would have at least made significant progress. I don't think ChatGPT has solved thousands of similar problems (even stretching that across all human disciplines). Basically my thesis is that universal basic income could have had a similar impact, while also encouraging human flourishing elsewhere.
How do you get real mathematicians to check the potential slop? At some point there will be spam to Tao from claws finding problems to solve and submitting maybe-proofs/answers.
I've had a similar notion that Time() is a necessary test function. Maybe it's because of the limitations of human cognition. (We have biases and blind-spots and human intelligence itself is erratic.)
I find it's helpful to avoid conflating the following three topics:
/1/ Is the tool useful?
/2/ At scale, what is the economic opportunity and social/environmental impact?
/3/ Is the tool intelligent?
Casual observation suggests that most people agree on /1/. An LLM can be a useful tool. (Present case: someone found a novel approach to a proof.) So are pocket calculators, personal computers, and portable telephones. None of these tools confers intelligence, although these tools may be used adeptly and intelligently.
For /2/, any level of observation suggests that LLMs offer a notable opportunity and have a social/environmental impact. (Present case: students benefitted in their studies.) A better understanding comes with Time() ... our species is just not good at preparing for risks at scale. The other challenge is that competing interests may see economic opportunities that don't align for social/environmental Good.
Topic /3/ is of course the source of energetic, contentious debate. Any claim of intelligence for a tool has always had a limited application. Even a complex tool like a computer, a modern aircraft, or a guided missile is not "intelligent". These tools are meant to be operated by educated/trained personnel. IBM's Deep Blue and Watson made headlines -- but was defeating humans at games proof of Intelligence?
On this particular point, we should worry seriously about conferring trust and confidence on stochastic software in any context where we expect humans to act responsibly and be fully accountable. No tool, no software system, no corporation has ever provided a guarantee that harm won't ensue. Instead, they hire very smart lawyers.
Remember when people thought solving Erdos problems required intelligence? Is there anything an LLM could ever do that would count as intelligence? Surely the trend has to break at some point; if so, what would be the thing that crosses the line into real intelligence?
> Remember when people thought solving Erdos problems required intelligence? Is there anything an LLM could ever do that would count as intelligence?
Hah. It reminds me of this great quote, from the '80s:
> There is a related “Theorem” about progress in AI: once some mental function is programmed, people soon cease to consider it as an essential ingredient of “real thinking”. The ineluctable core of intelligence is always in that next thing which hasn’t yet been programmed. This “Theorem” was first proposed to me by Larry Tesler, so I call it Tesler’s Theorem: “AI is whatever hasn’t been done yet.”
We are seeing this right now in the comments. 50 years later, people are still doing this! Oh, this was solved, but it was trivial, of course this isn't real intelligence.
That is a “gotcha” born of either ignorance (nothing wrong with that, we’re all ignorant of something) or bad faith. Definitions shift as we learn more. Darwin’s definition of life is not the same as Descartes’ or Plato’s or anyone in between or since because we learn and evolve our thinking.
Are you also going to argue definitions of life before we even learned of microscopic or single cell organisms are correct and that the definitions we use today are wrong? That they are shifting goal posts? That “centuries later, people are still doing this”? No, that would be absurd.
I don't see it as a gotcha. Just an (evergreen, it seems) observation that people will absolutely move the goalposts every time there's something new. And people can be ignorant outsiders or experts in that field as well.
For example, ~2 years ago, an expert in ML publicly made this remark on stage: LLMs can't do math. Today they absolutely and obviously, can. Yet somehow it's not impressive anymore. Or, and this is the key part of the quote, this is somehow not related to "intelligence". Something that 2 years ago was not possible (again, according to a leading expert in this field), is possible today. And yet this is somehow something that they always could do, and since they're doing it today, is suddenly no longer important. On to the next one!
No idea why this is related to Darwin or definitions of life. The definitions don't change. What people considered important 2 years ago is suddenly not important anymore. The only thing that changed is that today we can see that capability. Ergo, the quote holds.
See, that’s a poor argument already. Anyone could counter that with other experts in ML publicly making remarks that AI would have replaced 80% of the work force or cured multiple diseases by now, which obviously hasn’t happened. That’s about as good an argument as when people countered NFT critics by citing how Clifford Stoll said the internet was a fad.
> made this remark on stage: LLMs can't do math. Today they absolutely and obviously, can.
How exactly are “LLMs can’t” and “do math” defined? As you described it, that sentence does not mean “will never be able to”, so there’s no contradiction. Furthermore, it continues to be true that you cannot trust LLMs on their own for basic arithmetic. They may e.g. call an external tool to do it, but pattern matching on text isn’t sufficient.
> The definitions don't change.
Of course they do, what are you talking about? Definitions change all the time with new information. That’s called science.
The definition of "can/cannot do math" didn't change. That's not up for debate. 2 years ago they couldn't solve an erdos problem (people have tried, Tao has tried ~1 year ago). Today they can.
Definitions don't change. The idea that now that they can it's no longer intelligence is changing. And that's literally moving the goalposts. Read the thread here, go to the bottom part. There are zillions of comments saying this.
You seem keen on not trying to understand what the quote is saying. This is not good-faith discussion, and it's not going anywhere. We're already miles from where we started. The quote is an observation (and an old one at that) about goalposts moving. If you can't or won't see that, there's no reason to continue this thread.
> The definition of "can/cannot do math" didn't change. That's not up for debate.
That is not the argument. The point is that the way you phrased it is ambiguous. “Math” isn’t a single thing, and “cannot” can either mean “cannot yet” or “cannot ever”. I don’t know what the “expert” said since you haven’t provided that information, I’m directly asking you to clarify the meaning of their words (better yet, link to them so we can properly arrive at a consensus).
Good example. There are no literal goal posts here to be moved. But with the new accepted definition of the words, that’s OK.
> There are zillions of comments saying this.
Saying what, exactly? Please be clear, you keep being ambiguous. The thread barely crossed a couple of hundred comments as of now, there are not “zillions” of comments in agreement of anything.
> You are keen to not trying to understand what the quote is saying. (…) If you can't or won't see that, there's no reason to continue this thread.
Indeed, if you ascribe wrong motivations and put a wall before understanding what someone is arguing, there is indeed no reason to continue the thread. The only wrong part of your assessment is who is doing the thing you’re complaining about.
He’s a booster and I don’t think he argues in good faith.
He seems to be fixated on this notion that humans are static and do not evolve - clearly this is false. What people thought as being a determinant for intelligence also changes as things evolve.
Well, the famous Turing test was evidently insufficient. All that happened is that the test is dead and nobody ever mentions it anymore. I'm not sure that any other test would fare any better once solved.
I've spent a good chunk of time formalising mathematics.
Doing formalized mathematics is as intelligent as multiplying numbers together.
The only reason it's so hard now is that the standard notation is the equivalent of Roman numerals.
When you start using a sane metalanguage, and not just augmented English, to do proofs, you gain the same increase in capabilities as going from word equations to algebra.
When will LLM folks realize that automated theorem provers have existed for decades, and that non-ML theorem provers have solved non-trivial math problems tougher than this Erdos problem?
Proposing and proving something like Gödel's theorems definitely requires intelligence.
Solving an already proposed problem is just crunching through a large search space.
I think the point the GP is making is that Gödel's theorem wasn't part of any "genre". Gödel, or somebody, had to invent the whole field, and we haven't seen LLMs invent new fields of mathematics yet.
But this isn't a fair bar to hold it to. There are plenty of intelligent people out there, including 99% of professional mathematicians, who never invent new fields of mathematics.
None of it is really from logical thought. The rationalizations don't make any sense, but they haven't for a while. It's an emotional response. Honestly, It's to be expected.
It's because HN is not really full of smart people. It's full of people who think they're smart and take pride in that idea that they're pretty intelligent.
ChatGPT equalizes intelligence. And that is an attack on their identity. It also exposes their ACTUAL intelligence which is to say most of HN is not too smart.
how can you ask this question on a post titled "Amateur armed with ChatGPT solves an Erdős problem"???? are you looking for some randomised controlled trial? omg
Idk, going out on a limb and guessing the folks who hang out on erdosproblems.com aren’t run-of-the-mill dumbasses. The prompt, if you look at it, is actually quite clever. Not as clever as the proof. But far from the equalization OP posits.
Yes, I love living in communism too. Imagine if you had to pay money for it or something. The wealthiest people would get unrestricted access to intelligence while the poor none. And the people in the middle would eventually find themselves unable to function without a product they can no longer afford. Chilling, huh? Good thing humans are known for sharing in the benefits of technological progress equally. /s
They used ChatGPT Pro to solve it. Over 50% of people in the world couldn't afford ChatGPT Pro ($200/mo) even if they spent more than half of their income on it. [1]
What was that about "spreading FUD about unaffordability"?
They didn't buy ChatGPT Pro themselves. You could've done the same as the students in the article and gotten a free subscription, if you were interested in this instead of trolling.
ChatGPT flattened the difference between top .0001 percentile mathematician and an amateur. This is the definition of making intelligence more available.
You are exaggerating the situation by essentially claiming that since some people can’t afford 200 dollars, ChatGPT is not democratising intelligence. It’s a bit strange to claim this, because according to you it only becomes affordable when the maximal number of people can afford it. It’s a bit childish.
Directionally it is democratising. Are more people able to afford higher level intelligence? Yes.
> ChatGPT flattened the difference between top .0001 percentile mathematician and an amateur
It flattened the difference between a top epsilon percentile mathematician and an amateur with money. It didn't flatten the difference between an amateur with a little money and an amateur with a lot of money. It widened it. That's the part I'm scared about.
You are shrugging this off because it currently isn't that expensive. But we're talking about the massively subsidized price here, which is bound to get orders of magnitude higher when the bubble pops. Models are also likely to get much better. If it gets to a point where the only way to obtain exceptionally high intelligence is with an exceptionally high net worth and vice versa, how is that going to democratize anything?
Intelligence is Intelligence. It's intelligent because it does intelligent things. If someone feels the need to add a 'real' and 'fake' moniker to it so they can exclude the machine and make themselves feel better (or for whatever reason) then they are the one meant to be doing the defining, and to tell us how it can be tested for. If they can't, then there's no reason to pay attention to any of it. It's the equivalent of nonsensical rambling. At the end of the day, the semantic quibbling won't change anything.
> It's intelligent because it does intelligent things.
Most people would consider someone who can calculate 56863*2446 instantly in their head to be intelligent. Does that mean pocket calculators are intelligent? The result is the same.
> then they are the one meant to be doing the defining, and to tell us how it can be tested for. If they can't, then there's no reason to pay attention to any of it.
That is the equivalent of responding to criticism with “can you do better?”. One does not need to be a chef (or even know how to cook) to know when food tastes foul. Similarly, one does not need to have a tight definition of “life” to say a dog is alive but a rock isn’t. Definitions evolve all the time when new information arises, and some (like “art”) we haven’t been able to pin down despite centuries of thinking about it.
For one, everything its 'intelligence' knows about solving the problem is contained within the finite context window (a fixed-size memory buffer) for the particular model and session. Unless the contents of the context window are saved to storage and reloaded later, then unlike a human, it won't "remember" that it solved the problem or save its work somewhere to be easily referenced later.
For one, everything humans' "intelligence" knows about solving the problem is contained within the finite brain size for the particular person and life. Unless the memory contents of the brain are being saved to storage and reloaded later, it won't "remember" that it solved the problem and save its work somewhere to be easily referenced in a later life.
What you're describing sounds more like the model lacking awareness than lacking intelligence. Why does it need to know it solved the problem to be intelligent?
We say African elephants are intelligent for a number of reasons, one of which is that they remember where sources of water are in very dry conditions and can successfully navigate back to them across relatively large distances. An intelligent being that can't remember its own past is at a significant disadvantage compared to others that can, which is exactly one of the reasons why Alzheimer's patients often require full-time caregivers.
There's probably a limit to how intelligent something can be with no long term memory, but solving Erdos problems in 80 minutes is clearly not above it, and I think the true limit is probably much higher than that.
As another commenter pointed out, these models are being trained to save and read context into files, so denying them the use of an ability they do have just makes your claim tautological.
I think one day the VCs will have given the monkeys on typewriters enough money that these kinds of comments can be generated without human intervention.
And how about the creative rationalizations about how statistical text generation is actual intelligence? As if there is any intent or motive behind the words that are generated or the ability to learn literally any new thing after it has been trained on human output?
2022 called, wants this argument back. When you're "statistically generating text" to find zero-day vulnerabilities in hard targets, building Linux kernel modules, assembly-optimizing elliptic curve signature algorithms, and solving arbitrary undergraduate math problems instantaneously --- not to mention apparently solving Erdos problems --- the "statistical text" stuff has stopped being a useful description of what's happening, something closer to "it's made of atoms and obeys the laws of thermodynamics" than it is to "a real boundary condition of what it can accomplish".
I don't doubt that there are many very real and meaningful limitations of these systems that deserve to be called out. But "text generation" isn't doing that work.
But the systems that do that impressive work are no longer just LLMs. Look at the Claude Code leak - it’s a sprawling, redundant maze relying on tools and tests to approximate useful output. The actual LLM is a small portion of the total system. It’s a useful tool, but it’s obviously not truly intelligent - it was hacked together using the near-trillions of dollars AI labs have received for this explicit purpose.
What does this matter? You can build a working coding agent for yourself extremely quickly; it's remarkably straightforward to do (more people should). But look underneath all the "sprawling tools": the LLM itself is a sprawling maze of matrices. It's all sprawling, it's all crazy, and it's insane what they're capable of doing.
Again if you want to say they're limited in some way, I'm all ears, I'm sure they are. But none of that has anything to do with "statistical text generation". Apparently, a huge chunk of all knowledge work is "statistical text generation". I choose to draw from that the conclusion that the "text generation" part of this is not interesting.
Well, hang on a second - it sounds like you may actually disagree with the user who created this thread. That user claims that these systems exhibit “real intelligence”, and success on this Erdos problem is proof.
You seem to be making the claim that LLMs are statistical text generators, but statistical text generation is good enough to succeed in certain cases. Those are different arguments. What do you actually believe? Are we even in disagreement?
I don't have any opinion about "real intelligence" or not. I'm not a P(doom)er, and I don't think we're on the brink of ascending as a species. But I'm also allergic to arguments like "they're just statistical text generators", because that truly does not capture what these things do or what their capabilities are.
Just to clarify because I’m not sure I understand:
So you agree that LLMs are in fact statistical text generators, but you don't like people using that fact in arguments about the capabilities of these things?
Not parent but I think you're being rather dense. They are _obviously_ statistical text generators. There's plenty of source code out there, anyone can go and inspect it and see for themselves so disputing that is akin to disputing the details of basic arithmetic.
But it is no longer useful to bring that fact up when conversing about their capabilities. Saying "well it's a statistical text generator so ..." is approximately as useful as saying "well it's made of atoms so ...". There are probably some very niche circumstances under which statements of each of those forms is useful but by and large they are not and you can safely ignore anyone who utters them.
Solving open math problems is strong evidence of intelligence so there's not really any need for rationalization? I don't understand why intelligence would require intent or motive? Isn't intent just the behaviour of making a specific thing happen rather than other things?
I haven't used stable diffusion enough to have a strong opinion on it. But my thinking is LLMs have only recently started contributing novel solutions to problems, so maybe there is some threshold above which there's less sloppy remixing of training data and more ability to form novel insights, and image generators haven't crossed this line yet.
Question for those who believe LLMs aren't intelligent and are merely statistical word predictors: how do you reconcile such achievements with that point of view?
(To be clear: I'm not agreeing or disagreeing. I sometimes feel the same too. I'm just curious how others reconcile these.)
Those things aren't mutually exclusive. They are demonstrably statistical token predictors (go examine an open source implementation) and they clearly exhibit intelligence.
Nah, people are going to say: It just used these 500 weird tricks from all kinds of different areas. A human could totally have done it. Nobody looked. I guess P/NP wasn't that hard after all.
I feel like a year ago I would have said impossible. Now, I am not so sure anymore. Although, if I wrote the prompt and the correct result would be presented to me I wouldn't even know. Would still need a mathematician to verify it.
The LLM took an entirely different route, using a formula that was well known in related parts of math, but which no one had thought to apply to this type of question.
Of course LLMs are still absolutely useless at actual maths computation, but I think this is one area where AI can excel --- the ability to combine many sources of knowledge and synthesise, may sometimes yield very useful results.
Also reminds me of the old saying, "a broken clock is right twice a day."
> Every Mathematician Has Only a Few Tricks
>
> A long time ago an older and well-known number theorist made some disparaging remarks about Paul Erdös’s work.
> You admire Erdös’s contributions to mathematics as much as I do,
> and I felt annoyed when the older mathematician flatly and definitively stated
> that all of Erdös’s work could be “reduced” to a few tricks which Erdös repeatedly relied on in his proofs.
> What the number theorist did not realize is that other mathematicians, even the very best,
> also rely on a few tricks which they use over and over.
> Take Hilbert. The second volume of Hilbert’s collected papers contains Hilbert’s papers in invariant theory.
> I have made a point of reading some of these papers with care.
> It is sad to note that some of Hilbert’s beautiful results have been completely forgotten.
> But on reading the proofs of Hilbert’s striking and deep theorems in invariant theory,
> it was surprising to verify that Hilbert’s proofs relied on the same few tricks.
> Even Hilbert had only a few tricks!
>
> - Gian-Carlo Rota - "Ten Lessons I Wish I Had Been Taught"
I think when thinking about progress as a society, people need to internalize better that we all without exception are on this world for the first time.
We may have collectively filled libraries full of books, and created yottabytes of digital data, but in the end to create something novel somebody has to read and understand all of this stuff. Obviously this is not possible. Read one book per day from birth to death and you still only get to consume like 80*365=29200 books in the best case, from the millions upon millions of books that have been written.
So these "few tricks" are the accumulation of a lifetime of mathematical training, the culmination of the slice of knowledge that the respective mathematician immersed themselves into.
To discover new math and become famous you need both the talent and skill to apply your knowledge in novel ways, but also be lucky that you picked a field of math that has novel things with interesting applications to discover plus you picked up the right tools and right mental model that allows you to discover these things.
This does not go for math only, but also for pretty much all other non-trivial fields. There is a reason why history repeats.
And it's actually a compelling argument for why AI is still a big deal even though it's at its core a parrot. It's a parrot, yes, but unlike any human, it actually was able to ingest the entirety of human knowledge.
> it actually was able to ingest the entirety of human knowledge
Even this, though, is not useful, to us.
It remains true that a life without struggle and achievement is not really worth living...
So, it is nice that there is something that could possibly ingest the whole of human knowledge, but that is still not useful, to us.
People are still making a hullabaloo about "using AI" in companies, and there was some nonsense about how there will be only two types of companies, AI ones and defunct ones, but in truth, there will simply be no companies...
Anyways I'm sure I will get down voted by the sightless lemmings on here...
The combinatorial nature of trying things randomly means that it would take millennia or longer for light-speed monkeys typing at a keyboard, or GPUs, to solve such a problem without direction.
By now, people should stop dismissing RL-trained reasoning LLMs as stupid, aimless text predictors or combiners. They wouldn’t say the same thing about high-achieving, but non-creative, college students who can only solve hard conventional problems.
Yes, current LLMs likely still lack some major aspects of intelligence. They probably wouldn’t be able to come up with general relativity on their own with only training data up to 1905.
Neither did the vast majority of physicists back then.
> Yes, current LLMs likely still lack some major aspects of intelligence.
Indeed, and so do current humans! And just like LLMs, humans are bad at keeping this fact in view.
On a more serious note, we're going to have a hard time until we can psychologically decouple the concepts of intelligence and consciousness. Like, an existentially hard time.
I've been using LLMs for much the same purpose: solving problems within my field of expertise where the limiting factor is not intelligence per se, but the ability to connect the right dots from among a vast corpus of knowledge that I would never realistically be able to imbibe and remember over the course of a lifetime.
Once the dots are connected, I can verify the solutions and/or extend them in creative ways with comparatively little effort.
It really is incredible what otherwise intractable problems have become solvable as a result.
Wait, what do you mean "LLMs are still absolutely useless at actual maths computation"? I rely on them constantly for maths (linear algebra, multivariable calc, stat) --- literally thousands of problems run through GPT5 over the last 12 months, and to my recollection zero failures. But maybe you're thinking of something more specific?
They are bad at math. But they are good at writing code and as an optimization some providers have it secretly write code to answer the problem, run it and give you the answer without telling you what it did in the middle part.
What would I do to demonstrate that they are bad at math? If by "maths" we mean things like working out a double integral for a joint probability problem, or anything simpler than that, GPT5 has been flawless.
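For concreteness, the kind of problem I mean - a made-up example of mine, not one of the actual prompts: given the joint density f(x,y) = e^{-x-y} on x, y > 0, find P(X < Y).

    P(X < Y) = \int_0^\infty \int_0^y e^{-x-y} \, dx \, dy
             = \int_0^\infty e^{-y}\left(1 - e^{-y}\right) dy
             = 1 - \tfrac{1}{2} = \tfrac{1}{2}

That class of thing, setting up the region, doing the inner integral, simplifying, is what GPT5 has handled flawlessly for me.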
What tier are you using? I have run lots of problems and am very impressed, but I find stupid errors a lot more frequently than that, e.g., arithmetic errors buried in a derivation or a bad definition, say 1/15 times. I would love to get zero failures out of thousands of (what sounds like college-level math) posed problems.
Calc, stat, etc. from a textbook are things they would naturally be good at, but I don't think textbook computations that are in the training set, and extrapolations of them, are what's in question here.
They are not great at playing chess either, which is both computational and analytic.
I think this is wrong and a category error (none of the problems I've given it are in a textbook; they're virtually all randomized), but, try this: just give me a problem to hand off to GPT5, and we'll see how it does.
Further evidence against your claim, if you don't want to take me up on that: I hand problems off to GPT5 to check my own answers. None of the dumb mistakes I make, or the missed opportunities for simplification, are in the book, and, again: it's flawless at pointing out those problems, despite being primed with a prompt suggesting I'm pretty sure I have the right answers.
I only have rudimentary understanding of calculus, trigonometry, Google Sheets, and astronomy, but I was able to construct an accurate spreadsheet for astrometry calculations by using Grok and Gemini (both free, no subscription, just my personal account) to surface the formulas for measuring the distance between 2-3 points on the celestial sphere. The LLMs assisted me in also writing functions to convert DMS/HMS coordinates to decimal, and work in radians as well.
I found and fixed bugs I wrote into the formulas and spreadsheets, and the LLMs were not my sole reference, but once the LLM mentioned the names of concepts and functions, I used Wikipedia for the general gist of things, and I appreciated the LLMs' relevant explanations that connected these disciplines together.
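For anyone curious, the core of it is small. A hedged sketch of the kind of helpers involved - the function names and the example coordinates are mine, not exactly what the LLMs produced:

    # HMS/DMS -> decimal degrees, then angular separation on the
    # celestial sphere via the spherical law of cosines.
    import math

    def hms_to_deg(h, m, s):
        """Right ascension: 1 hour = 15 degrees."""
        return 15.0 * (h + m / 60 + s / 3600)

    def dms_to_deg(d, m, s):
        """Declination; the sign of d carries through."""
        sign = -1.0 if d < 0 else 1.0
        return sign * (abs(d) + m / 60 + s / 3600)

    def separation_deg(ra1, dec1, ra2, dec2):
        """Great-circle separation in degrees; all inputs in degrees."""
        ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
        c = (math.sin(dec1) * math.sin(dec2)
             + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
        return math.degrees(math.acos(max(-1.0, min(1.0, c))))

    # Roughly Sirius vs. Betelgeuse:
    print(separation_deg(hms_to_deg(6, 45, 9), dms_to_deg(-16, 42, 58),
                         hms_to_deg(5, 55, 10), dms_to_deg(7, 24, 25)))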
Perhaps learning how to get AI to solve your problems is the most important lesson to learn now? The rest seems like the current equivalent of learning cursive.
Care to actually refute? Interesting that even an LLM would give an attempt at it, but apparently those who only bother to hit the downvote button aren't even meeting that level of "intelligence".
My big question with all these announcements is: How many other people were using the AI on problems like this, and, failing? Given the excitement around AI at the moment I think the answer is: a lot.
Then my second question is how much VC money did all those tokens cost.
I've tried my hand at a few of the Erdős problems and came up short; you didn't hear about those. But if a mathematician at Harvard solved one, you would probably still hear about it a bit. Just the possibility that a Pro subscription running for 80 minutes solved an Erdős problem is astounding. Maybe some researchers should get a grant and burn a couple of data centers' worth of tokens for a day/week/month and see what it comes up with?
Capitalism already is a poor allocator of human effort, resources, and energy, so why lock in on this specifically? There are entire professions that are essentially worthless to society and exist only to perpetuate the inherent contradictions of the system. Why not focus on all that wasted human effort? Or on the fact that everyone has to do some arbitrary sellable labor to justify their existence, rather than something they might truly enjoy or that might make the world better?
Looking around, the evidence doesn't seem to support this conclusion. 50% of food thrown away, yet people go hungry. Every privatized industry diminishes in quality and reach. Selects and optimizes for profit rather than for human need.
> Looking around, the evidence doesn't seem to support this conclusion.
It absolutely does if you look at facts and not "vibes". There are fewer people starving now than ever before, and it's a giant, giant difference. We are tackling more and more diseases thanks to big pharma. Even semi-socialist countries such as China have opened their markets. Basically the only countries that do not implement capitalist solutions are the ones you'd never want to live in, such as North Korea or Cuba (funny thing - even China urged Cuba to free its markets).
I think we should at least ask the latter. If it turned out it cost $100,000 to generate this solution, I would question its value. Erdős problems are usually pure-math curiosities, AFAIK. They often have no meaningful practical applications.
Also, it's one thing if the AI age means we all have to adapt to using AI as a tool, and another thing entirely if it means the only people who can do useful research are the ones with huge budgets.
Getting off-topic, but as a successful high-school dropout I am compelled to remind anyone reading this that [the American] college [system] is a scam.
That's not to say that there aren't benefits to tertiary education, for many people in different contexts. It's just not the golden path that it's made out to be.
Many people currently in college are just wasting their money and should enroll in trades programs instead.
Meanwhile, nothing about being in or out of school is mutually exclusive to using LLMs as a force multiplier for learning - or solving math problems, apparently.
These are absolutely worth studying, but being what they are, nobody should be dumping massive amounts of money on them. I would not find it persuasive if researchers used LLMs to solve the Collatz conjecture or finally decode Etruscan. These would be extremely valuable results, but it is unlikely to be worth having an LLM just grind tokens like crazy to get them.
If solving even the biggest problems in pure maths is not worth it to you, then I guess we should stop all pure maths research: researchers are paid far more than the potential token spend, frequently for decades, and they frequently work on much less important and easier problems.
Maybe... but I would love if 1% of the investment in AI were redirected to the mathematics education and professional research that would allow progress on any of these problems...
No meaningful, practical applications? You realize that sounds incredibly naive given the history of mathematics, right? People thought this way about number theory in general, and about many other things that turned out to have quite important practical applications. Your statement is also a bit odd in that researchers are already paid throughout their whole careers to solve such problems.
> You realize that sounds incredibly naive in the history of mathematics, right?
This is after-the-fact justification. You are arguing that because a thing (number theory) turned out to have practical applications, we should have dumped a lot more effort into it. There is no basis for this argument whatsoever; it also seems to involve inventing a time machine. Number theory had no practical applications until the development of public-key cryptography, but you cannot make funding decisions based on the future, since it's unknowable.
Once we get something working, sure, you can justify more aggressive investment. This is not to say that we should not invest in pie-in-the-sky ideas. We absolutely should and need to. Moonshot research or even somewhat esoteric research is vital, but the current investment in AI is so far out of the ballpark of rational. There’s an energy of a fait accompli here, except it’s still very plausible this is all unsustainable and the market implodes instead.
> Number theory had no practical applications until the development of public-key cryptography, but you cannot make funding decisions based on the future since it’s unknowable.
You are completely missing the point. The point is that we should invest in pure maths because it has always been an investment with very good ROI. The funding should be focused on what experts believe will advance pure maths more (not on whether we believe that in 100 years some specific area will find an application), and that's pretty much what we are doing right now. I think it's just your anti-AI sentiment that's clouding your judgement: since AI succeeded in proving pure-maths results, you are inclined to downplay it by saying that, well, pure maths is worthless anyway.
> “I didn’t know what the problem was—I was just doing Erdős problems as I do sometimes, giving them to the AI and seeing what it can come up with,” he says. “And it came up with what looked like a right solution.” He sent it to his occasional collaborator Kevin Barreto, a second-year undergraduate in mathematics at the University of Cambridge.
Seems like standard 23-year-old behavior. You're spending $100-$200/mo on the pro subscription and want to get your money's worth. So you burn some tokens on this legendarily hard math problem sometimes. You've seen enough wrong answers to know that this one looks interesting, and you pass it on to a friend who actually knows math and is at a place where experts can recognize it as correct.
Seems like a classic example of an inexpert human labeling ML output.
Scientific American going out of business next lol, weak headline. ChatGPT, let's have a better headline for the God among Men who realized the capability of the new tool, which many underestimate or puff up needlessly. Fun times we live in. One love all.
This just shows that with the right training, in this case a thesis on Erdős problems, they were able to prompt and check the output. So you still needed the know-how to even begin to figure it out. "Lichtman proved Erdős right as part of his doctoral thesis in 2022."
Lichtman is an expert who commented for the story. Liam Price is the one who prompted ChatGPT. "He’s 23 years old and has no advanced mathematics training."
“I didn’t know what the problem was—I was just doing Erdős problems as I do sometimes, giving them to the AI and seeing what it can come up with,” he says. “And it came up with what looked like a right solution.”
"He sent it to his occasional collaborator Kevin Barreto, a second-year undergraduate in mathematics at the University of Cambridge."
So basically two undergrads/graduate students in math; "advanced" is subjective at that point.
It's implied by "no advanced mathematics training?"
The article you linked (thanks for the unpaywalled link, by the way) describes him only as an amateur mathematician, but describes Barreto as a math student. If they were both math students, I feel it would say so?
Or perhaps you're arguing it's implicit in him having solved the problem? If so, you're just assuming your conclusion. "AI didn't prove it by itself; Price was a mathematician. Well, he must have been a mathematician to be able to prove it!"
I'm saying that it wasn't a random person who had no training in math. Still a miraculous achievement; I'm just trying to show they still had to study maths to even understand how to present the problem and verify it.
https://archive.ph/2w4fi
Here is the chat:
Then "Thought for 80m 17s"https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...
I always assumed the "interesting!" markers were actual markers. A kind of tag for the system to annotate its context.
I don't have ChatGPT, only Gemini and Claude. But how do you make a language model think for 80 minutes???
It has a “high effort” mode that makes it think really long.
Give it hard enough problems?
Tried w/ 5.5 Pro, Extended Thinking. 17 minutes:
-----------------------------
Yes. In fact the proposed bound is true, and the constant 1 is sharp.
Let w(a) = 1/(a log a).
I will prove that, uniformly for every primitive A ⊂ [x, ∞), ∑_{a∈A} w(a) ≤ 1 + O(1/log x), which is stronger than the requested 1 + o(1).
https://chatgpt.com/share/69ed8e24-15e8-83ea-96ac-784801e4a6...
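To get a feel for the quantity being bounded, here is a small numeric illustration of my own (not from the quoted transcript): summing the weight w(a) = 1/(a log a) over one particular primitive set, the primes. The infinite sum over all primes converges to roughly 1.6366, which is why a constant near 1 for primitive sets confined to [x, ∞) is a much stronger statement:

```python
# Partial sum of 1/(p log p) over primes up to a bound. The full sum over
# all primes converges to ~1.6366; convergence is very slow.
import math

def primes_up_to(n):
    """Plain sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return [i for i, flag in enumerate(sieve) if flag]

print(sum(1 / (p * math.log(p)) for p in primes_up_to(10 ** 6)))
```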
Tried the same prompt in DeepSeek 4
https://chat.deepseek.com/share/nyuz0vvy2unfbb97fv
Comes up with a proof.
I am curious whether there is a “harness” for maths out there (like the system prompt and tool collection in Claude Code, but for maths instead of coding)?
Asking the LLM to structure its response into a plan and an implementation, and allowing it to call tools like Python, Sage, Lean, etc.
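Roughly what I have in mind, as a toy sketch (`ask_model` is a hypothetical stand-in for whatever LLM API you'd use; real harnesses are far more involved):

```python
# A toy maths-harness loop: the model alternates between planning text,
# tool calls, and a final answer. Only a Python tool is wired up here.
import subprocess

def run_python(code: str) -> str:
    """Tool: execute a short Python snippet and capture its output."""
    result = subprocess.run(["python3", "-c", code],
                            capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr

def harness(problem: str, ask_model, max_steps: int = 10) -> str:
    transcript = [
        "You are a careful mathematician. First write a PLAN, then work "
        "step by step. To test something numerically, reply with a line "
        "'TOOL:python' followed by code. Finish with a line 'ANSWER: ...'.",
        f"Problem: {problem}",
    ]
    for _ in range(max_steps):
        reply = ask_model("\n".join(transcript))  # hypothetical LLM call
        transcript.append(reply)
        if reply.startswith("TOOL:python"):
            # Feed the tool output back into the conversation.
            transcript.append("TOOL OUTPUT:\n" + run_python(reply.partition("\n")[2]))
        elif "ANSWER:" in reply:
            return reply
    return "No answer within the step budget."
```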
Also curious about this, it seems like it would be important to guide these tools more specifically based on the domain of expertise.
Mine took 20min. Pro. https://chatgpt.com/share/69ed83b1-3704-8322-bcf2-322aa85d7a... But I wish I was math smart to know if it worked or not.
The link you provided is for a canvas, I think, rather than the convo.
Ask it to formalize it in Lean.
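The definitional layer you'd need to sanity-check is small. A hedged Lean 4 / Mathlib-style sketch (the name and phrasing are mine, not from any actual formalization of this problem):

```lean
-- A set of naturals (all > 1) is primitive if no element divides another.
-- Sketch only; not checked against Mathlib.
def Primitive (A : Set ℕ) : Prop :=
  (∀ a ∈ A, 1 < a) ∧ ∀ a ∈ A, ∀ b ∈ A, a ∣ b → a = b
```

The analytic statement (the weighted sum bounded by 1 + o(1)) takes considerably more machinery, and checking that it matches the English statement is exactly where mathematical maturity comes in.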
thanks
Formalize this in the form of an Iranian Lego Trump Diss Rap video.
When using the web interface for ChatGPT like this, is there any way to tell which model is actually being used?
>don't search the internet.
I think this was key. Otherwise the LLM could think it can't be done.
But it was trained on the internet.
"Knowing" (guessing really) what is possible and not is a huge deciding factor in if you can do that thing or not, meaning if you "know" it isn't possible you'll probably never be able to do it, but if you didn't know it wasn't possible, it is possible :)
Tried the same prompt and ended up nowhere close on the free plan.
Is there a known lag before the Pro plan's abilities migrate to the free plans?
GPT 5.5 Pro is not available to any plan outside of the ChatGPT Pro ($100 or $200) tier or the API, as far as consumer access goes.
Yes, but don't we expect GPT 5.5 Pro to eventually reach the free tier? Maybe I'm missing something because I only use the free tier. But the free tier has gotten way better over the last few years. I'm pretty sure, based on descriptions on this site from paid subscribers, that the free tier now is better than the paid tier of, say, 2 years ago. That's the lag I'm wondering about.
Free ChatGPT is like a fast car with a barely responsive steering wheel. The guardrails on that thing are insane, even for math. It won't let you think. It will try to fix mistakes you haven't even made yet, based on intent that was ascribed to you for no reason. It veers off in some crazy direction thinking that's what you meant, and trying to address even a little bit of that creates almost a combinatorial explosion of even more wrong things. That's why I stick to Claude. The latter is chill and only addresses what you typed, isn't verbose, and actually asks what you're getting at with your post. That said, ChatGPT is more technical and can easily solve math problems that stump Claude.
So this doesn't happen in the paid plans of ChatGPT? But why?
Paid plans give you access to much larger, more intelligent models which have thinking enabled (inference time compute). In the example here you can see GPT Pro taking 20-80 minutes to respond with the proof.
All this is far more expensive to serve so it’s locked away behind paid plans.
I do not think this is true. You will continue to get smaller, cheaper-to-host models in the free tier that are distilled from current and former frontier models. They will continue to improve, but I’d be very surprised if, e.g., 5.4-mini (I think this is the free tier model) beat o3 on many benchmarks, or real world use cases.
I won’t even leave ChatGPT on “Auto” under any circumstances; it’s vastly worse on hallucinations, sycophancy, basically everything.
Anyway, your needs may be met perfectly fine on the free tier product, but you’re using a very different product than the Pro tier gets.
You should pay for it if you find value in it.
They pay for it with their personal data.
Tangential but I learned today that GPT-5.5 in ChatGPT (Plus) has a smaller context window than the one in the API. (Or at least it thinks it does.)
I'd guess / hope the Pro one has the full context window.
Notably, 5.5 on the API has a higher price for context beyond what ChatGPT gets, and 5.5 Pro on the API does not differentiate pricing based on context size (it’s eye-bleedingly expensive already :)
Do not use the free plan. It is not good.
Does the free plan even have access to thinking models?
Technically yes, gpt-5.4-mini is available on the free plan
Was this a surprise?
For the uninitiated, Paul Erdős was a pretty famous but very eccentric mathematician who lived for most of the 1900s.
He had a habit of seeking out and documenting mathematical problems people were working on.
The problems range in difficulty from "easy homework for a current undergrad in math" to "you're getting a Fields Medal if you can figure this out".
There's nothing that really connects the problems other than the fact that one of the smartest people of the last 100 years didn't immediately know the answer when someone posed it to him.
One of the things people have been doing with LLMs is to see if they can come up with proofs for these problems as a sort of benchmark.
Each time there's a new model release a few more get solved.
> Each time there's a new model release a few more get solved.
I'm no expert, but based on the commentary from mathematicians, this Erdős proof is a unique milestone because the problem received previous attention from multiple professional mathematicians, and the proof was surprising, elegant, and revealed some new connections.
The previous ChatGPT Erdős proofs have been qualitatively less impressive, more akin to literature search or solving easier problems that have been neglected.
Reading the prompt[1], one wonders if stoking the model to be unconventional is part of the success: "this ... may require non-trivial, creative and novel elements"
[1] https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...
>one wonders if stoking the model to be unconventional is part of the success
I've long suspected that a lot of these models' real capabilities are still locked behind certain prompts, despite the big labs spending tons of effort on making default responses to simple prompts better. Even really dumb shit like "Answer this: ..." vs "Question: ..." vs "... you'll be judged by <competitor>" that should have zero impact in an ideal world can significantly impact benchmark results. The problem is that you can waste a ton of time finding the right prompt using these "dumb" approaches, while the model actually just required some very specific context that was obvious to you and not to it in many day-to-day situations. My go-to method is still to have the model ask me questions as the very first step to any of these problems. They kind of tried that with deep research since the early o-series, but it still needs improvement.
Just the right "prompt" is exactly what happened here. Lean has been developed and incorporated into it's data set. Also, token responses only vaguely correlate to "human language" and it's been proven transformers develop their own internal representation that has created a whole field called machanistic interpretation. Being able to more correctly "parse", AKA using Lean and the right "Prompts, insights and suggestions", will take a whole new meaning in the future.
> mechanistic interpretability
Awesome term/info, and (completely orthogonal to whether they’ll take err jerbs): I’m really excited about the social/civic picture that might be enabled by a defined and verifiable ontological and taxonomical foundation shared across humanity, particularly coupled with potential ‘legislation as code’ or ‘legal system as code’ solutions.
I’m thinking on a time horizon a bit past my own lifespan, but: even the possibility to objectively map out some specific aspect of a regional approach to social rights in a given time period and consider it with another social framework, alongside automated & verifiable execution of policy, irrespective of the language of origin is incredible.
Instead of hundreds and thousands of incommensurate legislative silos we might create a bazaar of shared improvement and governance efficiency. Turnkey mature governance and anti-corruption measures for newborn nations and countries trying to break out of vicious historical exploitation cycles. Fingers crossed.
Ah, yes, 2001 but on land.
Model output reflects your input, and the effect is self-reinforcing over the course of a whole conversation. The color you add around a problem influences the model's behavior.
A "dumber"/vague framing will get a less insightful solution, or possibly no solution at all.
I don't even necessarily think this is a critical flaw; in general it's just the model tuning its responses to your style of prompt. People utilize LLMs for all kinds of different tasks, and the "modes of thought" for responding to an Erdős problem versus software engineering versus a more human/soft-skills topic are all very different. I think the "prompt sensitivity" issue just comes bundled along with this general behavior.
They're tuned to target a certain customer demographic solving certain problems. I've seen standard AI models do absolutely brilliant things sometimes. But the prompts needed to get them to perform the way GPT-3 did seem to get lengthier and lengthier over time. At some point we'll probably just spin out smaller, specialized models to do certain things.
> “The raw output of ChatGPT’s proof was actually quite poor. So it required an expert to kind of sift through and actually understand what it was trying to say,” Lichtman says. But now he and Tao have shortened the proof so that it better distills the LLM’s key insight.
Interestingly, it was an elegant technique, but the proof still required a lot of work.
The article is about solving a previously unsolved one. That is a harder set, of course.
It seems like a lot of scientific advancements occurred by someone applying technique X from one field to problem Y in another. I feel like LLMs are much better at making these types of connections than humans because they 1) know about many more theories/approaches than a single human can and 2) don't need to worry about looking silly in front of their peers.
Exactly. Much of the intellectual work is, in fact, intellectual labor. It's mostly about combining various pieces of information in one place, the exact kind of task at which LLMs far outperform humans. People traditionally misclassified this class of work as "creative". It's not, really.
Having a new insight that leads to the combination of two distinct ideas is definitionally creative.
You can say this problem needed a low amount of total creativity, but saying it's void of all creativity seems wrong.
The creative bit is figuring out which two or more pieces might work together into something new. The labour part is combining them, especially if that is actually laborious.
Which gets at the other possibility: having a list of distinct things and then iterating over all pairs or combinations. That I probably would not qualify as "creative" work.
Yeah, I've been grappling with the definition of creativity too. There's a gamedev talk [0] on creativity that gave me useful perspective. Here's what I wrote elsewhere:
---
i've been thinking about raph's definition of creativity [0]: permuting one set of ideas with another set of ideas
(or trying an idea in new contexts)
this is a systematic process, doable even by machine once enough pattern libraries have been catalogued.
on a small scale, there's sprint.cards [1] or oblique strats [2]. on a large scale, there's llms...
it's freeing to approach creativity as a deliberate practice rather than waiting on some fickle muse. yet it's a bit disappointing to see idea generation so mechanical and dehumanized.
i am comforted by the value of mushy human abilities surrounding the creative process:
mostly 1) taste, the ability to recognize pleasing output,
...
[0] https://www.youtube.com/watch?v=zyVTxGpEO30
[1] https://sprint.cards/
[2] https://stoney.sb.org/eno/oblique.html
I agree except: this is creative work. Creativity can be and is being mechanised. True originality is extremely rare. Most novelty is the repurposing of one idea or concept elsewhere in a way we all find surprising, but the choice to apply A to B could have been made for any reason, including mechanical: very many inventions are accidents. In-depth knowledge / conceptual understanding of something is built on abstraction, and abstractions are portable.
If you had a list of N concepts and M ways to apply them, you could try all N*M combinations and get some very interesting results. For a real example, see the theory of inventive problem solving (TRIZ) and its amusing "40 principles of invention" by Soviet inventor Genrich Altshuller. https://en.wikipedia.org/wiki/TRIZ
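As a toy illustration of how mechanical that N*M enumeration is (the lists are placeholders, obviously):

```python
# Enumerate every (concept, principle) pairing, TRIZ-style.
from itertools import product

concepts = ["primitive sets", "sieve methods", "error-correcting codes"]
principles = ["segmentation", "inversion", "do it in reverse"]

for concept, principle in product(concepts, principles):
    print(f"Try applying '{principle}' to '{concept}'")
```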
What is your idea of "creative"/"creativity" then?
Coming up with said novel techniques in the first place. Arguably something that most humans can't really do reliably or at all.
Novelty is overrated in mathematics by those outside mathematics. We desperately need lots of less novel things in math right now.
I always thought that way about genius level.
Maybe all intellectual work is intellectual labor?
This is exactly what creativity is.
> Much of the intellectual work is, in fact, intellectual labor.
That's a great point. It's in line with research being carried on the backs of graduate students, whose work is to hyperfocus on areas.
Isn't that science too?
> Much of the intellectual work is, in fact, intellectual labor.
Not surprising, because the two words you used are synonyms. Who ever classified mathematical work as creative? Kids in third-grade math class?
> the exact kind of task at which LLMs far outperform humans.
LLMs only outperform humans in creating loads of bullshit. 6 years in and they remain shiny toys for easily impressionable idiots.
As I understand it, models form connections (weak or strong) between everything in their training sets, even the smallest details. They've already made other breakthroughs directly because of this ability and this line of research is likely to be incredibly fruitful.
> someone applying technique X from one field to problem Y in another
Witten is the canonical example of someone taking mathematics techniques and applying them to physics problems, but what made him legendary was the opposite direction: he used physical intuition and string theory to solve open problems in pure mathematics.
This is what I personally consider as "reasoning" ... knowledge generalization and application across domains.
Less reasoning than a dimension of brute force unfamiliar to human brains.
Trying to diminish this as brute force (something, by the way, that is categorically not 'unfamiliar to human brains', as anyone who has ever worked on complex, slippery problems will tell you) is foolish, when the models hypothesize along the way to their solutions. That's reasoning.
Familiar, but not effective enough for survival.
This is what I have been doing. I don't think I've made any amazing breakthroughs, but at the same time I can't help feeling I've come across some whitepaper-worthy realizations. Being able to correlate across a lot of domains that I feel I intuitively understand but have no depth of knowledge in has been a fun exercise in LLM experimentation.
> It seems like a lot of scientific advancements occurred by someone applying technique X from one field to problem Y in another.
Yeah, you should look into the Langlands program sometime.
I'm thinking once we have much of the math literature formalized it's going to be possible to mine commonalities like that. Think of it as automated refactoring, applied to math.
As a civilization we went the left-brained/sequential/language-based way of thinking (with computers and AI being the crowning achievement of it). Personally, I remember switching around 3rd grade from the whole-page-at-once reading mode to the word-by-word, line-by-line mode, and that mode has stuck with me ever since. (At some point at university, probably at the peak of my abilities, I had for a period a deeper/wider/non-linear perception of at least my area of math specialization, though I'm not sure whether it was mastery by the left brain or the right brain getting plugged in too.) LLMs will definitely beat us at that sequential way of thinking. That makes me wonder whether we will have to push into whatever right-brainedness we still have left, and whether AI will get there faster too. Maybe we'll abandon the left brain completely, leaving it to AI.
If that is your hope, you are probably in for a rude awakening. Left-brained/right-brained is a wooden exaggeration, according to more recent research [1].
[1] e.g. https://www.sciencenewstoday.org/left-brain-vs-right-brain-t...
Well, maybe. The poster you replied to wasn't discussing literal neuroanatomy, they were using "left/right-brained" in the colloquial, metaphorical sense.
Accuracy and creativity are often quite difficult to achieve at the same time. Looks like LLMs can do it, even though one can question how creative it really is...
Can one? It's surpassed the creativity of humans in this one problem at least.
Some Erdős problems are basically trivial using sophisticated techniques that were developed later.
I remember one of my professors, a coauthor of Erdős, boasting to us after a quiz about how proud he was that he was able to assign an Erdős problem that had gone unsolved for a while as just a quiz problem for his undergrads.
Worth mentioning, though, that people have already tried running all of them through LLMs at this point.
So this is proof of the models actually getting stronger (previous generations of LLMs were unable to solve this one).
Not definitively. LLMs are stochastic with respect to input, temperature and the exact prompt. It's possible that the model was already capable of it but never received the exact right conditions to produce this output.
Every model is able to solve each problem, given the right prompt. (Worst case, the prompt contains the solution.)
Interesting... Exhaustive brute force prompting might expose previously unknown capabilities in existing models. Seems like a whole can of worms.
> So this is proof of the models actually getting stronger (previous generations of LLMs were unable to solve this one).
No, it's not.
While I don't dispute that new models may perform better at certain tasks, the fact that someone was able to use them to solve a novel problem is not proof of this.
LLM output is nondeterministic. Given the same prompt, the same LLM will generate different output, especially when it involves a large number of output tokens, as in this case. One of those attempts might produce a correct output, but this is not certain, and it is difficult if not impossible for a human who is not an expert in the domain to determine, as shown in this thread.
As others have pointed out, a key part of the prompt used here may have been "don't search the internet" as it would most likely have defaulted to starting off with existing approaches to that problem...
Minor aside: these models do not return the same answer every time you prompt them, which makes it harder to reason about their effectiveness.
You don't need to say "Minor aside" either. Thankfully language is a creative endeavour not a scientific one.
Context: parent originally said "you should not say 'worth mentioning', if it's worth mentioning you can just say it". That sentence has now been edited out so my comment looks weird.
Tao mentions that the conventional approach for this problem seems to be a dead-end, but it’s apparently a super ‘obvious’ first step. This seems very hopeful to me — in that we now have a new approach line to evaluate / assess for related problems.
> “The raw output of ChatGPT’s proof was actually quite poor. So it required an expert to kind of sift through and actually understand what it was trying to say,” Lichtman says.
This is how I feel when I read any mathematics paper.
Tbh, a ton of academic papers are quite poorly written. I'm not a PhD researcher, but I did have to implement quite a few of them (computer graphics, signals & systems, etc.), and with most of them I basically had to reconstruct the author's thought process from scratch.
The formulas were opaque, the notation unique and unconventional, terms appeared out of nowhere, and sometimes standard techniques (like 'we did least-squares optimization') were expanded in detail while other, actually complex parts were glossed over.
My short academic career, where I did my share of "what the hell are they saying they did" reverse-engineering of others' papers, proved to be excellent training for when I eventually transitioned to engineering.
The standard has fallen over the years for obvious reasons.
I asked ChatGPT to draw the outline of an ellipse using Unicode braille. I asked for 30x8 and it absolutely nailed it. A beautiful piece of ascii (er, Unicode) art. But I wanted to mark the origin! So I asked for a 31x7 ellipse instead. It completely flubbed it, and for 31x9 too.
When a model gives a really good answer, does that just mean it’s seen the problem before? When it gives a crappy answer, is that not simply indicating the problem is novel?
I wouldn't ask an LLM to output this directly. For ellipse ASCII art, I would guess that having it write a Python program to generate the art and then run it would work much better. Using Claude Sonnet 4.6 on a free account it seemed to work (sorry in advance if the hacker news formatting is horrendous):
⠀⠀⠀⠀⠀⣀⣠⠤⠔⠒⠒⠋⠉⠉⠉⠉⠉⠉⠉⠙⠒⠒⠢⠤⣄⣀⠀⠀⠀⠀⠀
⠀⢀⡠⠖⠋⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠙⠲⢄⡀⠀
⣰⠋⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠙⣆
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠹⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣠⠏
⠀⠈⠑⠦⣄⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣠⠴⠊⠁⠀
⠀⠀⠀⠀⠀⠉⠙⠒⠢⠤⠤⣄⣀⣀⣀⣀⣀⣀⣀⣠⠤⠤⠔⠒⠋⠉⠀⠀⠀⠀⠀
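The generator is along these lines (my own reconstruction of the approach, not Claude's actual output): rasterize the outline onto a dot grid, then pack each 2x4 cell of dots into a codepoint in the U+2800 braille block.

```python
# Draw an ellipse outline as Unicode braille art, cols x rows characters.
import math

def braille_ellipse(cols=31, rows=7):
    # Each braille char holds a 2x4 dot cell, so the dot grid is 2*cols by 4*rows.
    width, height = 2 * cols, 4 * rows
    cx, cy = (width - 1) / 2, (height - 1) / 2
    a, b = cx, cy  # semi-axes in dot units
    grid = [[False] * width for _ in range(height)]
    steps = 4 * (width + height)  # dense parametric sampling of the outline
    for i in range(steps):
        t = 2 * math.pi * i / steps
        grid[round(cy + b * math.sin(t))][round(cx + a * math.cos(t))] = True
    # Bit values of the 8 braille dots, indexed [column][row-within-cell].
    bits = [[0x01, 0x02, 0x04, 0x40], [0x08, 0x10, 0x20, 0x80]]
    lines = []
    for ry in range(rows):
        line = ""
        for rx in range(cols):
            code = 0x2800
            for dx in range(2):
                for dy in range(4):
                    if grid[4 * ry + dy][2 * rx + dx]:
                        code |= bits[dx][dy]
            line += chr(code)
        lines.append(line)
    return "\n".join(lines)

print(braille_ellipse())
```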
Do you posit that there are enough examples of 30x8 ellipses encoded in braille online for ChatGPT to learn from but not 31x7 or 31x9 ellipses? That seems unlikely.
At this point we should make a GitHub repo with a huge list of unsolved “dry lab” problems and spin up a harness to try and solve them all every new release.
There is in fact just such a repo maintained by Terence Tao and other mathematicians [1] who are actively using LLMs to try to find solutions to them.
[1] https://github.com/teorth/erdosproblems
…and this problem was in fact sourced directly from that list!
That's literally what the Erdős problems are. This post is about one of them being solved.
Except that Erdős problems are solved all the time, so many of them are already solved. Quite sure the last time I saw an article about an LLM solving an Erdős problem someone even tracked down a solution published by Erdős himself.
that's actually a brilliant idea
1) How do you know the clanker respects the instruction not to search the internet?
2) Jared Lichtman is indeed a mathematician at Stanford University, but he is involved in the AI startup math.inc, which seems more relevant here. Terence Tao is involved in a partnership program with that startup.
3) Liam Price is a general AI booster on Twitter. A lot of AI boosting on Twitter is not organic, and who knows what help he got. Nothing on this Twitter feed is organic.
4) Scientific American is owned by Springer Nature, which is an AI booster:
https://group.springernature.com/gp/group/ai
> How do you know the clanker respects the instruction not to search the internet?
You can't, but given that it's a previously unsolved problem, it doesn't seem relevant? (nor are the author's potential biases - the claims are easily verified independently)
Mandatory disclaimers https://github.com/teorth/erdosproblems/wiki/Disclaimers-and...
They explicitly say many of these disclaimers don't apply in the article.
Which one do you trust most, the disclaimers or the article?
Humans, and very often the machines we create, solve problems additively: we build on top of existing foundations, and we can get stuck in a way of thinking as a result, because people are loath to reinvent the wheel. So I don't think it's surprising that a naive LLM, because of the way it's trained, came up with something that many experts in the field didn't try.
I think LLMs can help in limited cases like this by just coming up with a different way of approaching a problem. It doesn’t have to be right, it just needs to give someone an alternative and maybe that will shake things up to get a solution.
That said, I have no idea what the practical value of this Erdős problem is. If you asked me whether this demonstrates that LLMs are not junk, my general impression is that it's like asking me in 1928 whether we should spend millions of dollars of research money on number theory. The answer is no, and get out of my office.
This program was brought to you by the private equity engagement pod.
Given that the problem is 60 years old, isn't there a chance it was indirectly solved already and the model just cross-referenced information to figure it out?
Looking at the website, this problem was never discussed by humans; the last comments were about GPT discovering it. I was expecting older comments for a 60-year-old problem.
Am I missing something?
Great discovery though. There might be other problems in the same situation that are worth a "GPT check".
Exceedingly unlikely. This was one of the more discussed Erdos problems, and multiple experts have attested to the technique's novelty. If you're referring to the lack of comments on the erdosproblems website, that doesn't really mean much. From its own blog[0], the site was only started in 2023 and only really gained momentum as a place to discuss AI solving attempts, you aren't going to see serious mathematicians discussing the problems there even if there have been significant efforts to solve it.
[0]: https://www.erdosproblems.com/forum/thread/blog:1
To some extent, does it matter?
If models are able to pull and join information that already existed in pieces but humankind never discovered by itself, doesn’t this count towards progress anyways?
It would be very helpful to know in understanding the capabilities of the models; and in getting intuition about where they are best applicable.
If the reason it was able to output the proof is that it happened to be included in an in-house university report written in Georgian, then that would make it less useful for research than if it's new entirely.
Discussed at the time: https://news.ycombinator.com/item?id=47774494
Could someone share a bit about the problem and the key portion of the proof, for someone who just knows the basics of proofs?
A similar announcement was made a few months ago, and Terence Tao came out a few days later and said it wasn't what it seemed at first, in that it was a rediscovery of an already known (albeit esoteric) result...
They literally have a quote from Tao in the article saying it was a novel approach humans hadn't tried, and that the problem hadn't been solved even after a lot of professional attention.
Can other AI agents such as Gemini, Claude, or DeepSeek also solve this problem?
Obviously nowhere near Erdős-problem complexity, but I've been using GPT (in Codex) to prove a couple of theorems (for algos), and I've found it a bit better than Claude (Code) in this respect.
referring to Tao as just a 'mathematician' gave me a good chuckle
Current headline:
"An amateur just solved a 60-year-old math problem—by asking AI"
A more honest title would be:
"An AI just solved a 60-year-old math problem—after being asked by amateur"
(Imagine the headline claimed instead that a professor just solved a math problem by asking a grad student.)
Previous problems solved by AI had some amount of expert guidance/steering. Here, I guess the emphasis is that there was none of that.
I will get downvoted for this, but I can't help thinking that billions of dollars have gone into ChatGPT over a period of years, and an LLM can direct all its "attention" (in a metaphorical sense) at one problem. I think if you gave top mathematicians a few million (a fraction of a percent of the ChatGPT budget) to solve this problem over four years, they probably would have at least made significant progress. I don't think ChatGPT has solved thousands of similar problems (even stretching that across all human disciplines). Basically my thesis is that universal basic income could have had a similar impact, while also encouraging human flourishing elsewhere.
Sam Altman already ran a scaled pilot of UBI; unfortunately it had disappointing results, which led to almost no one talking about UBI these days.
What’s beginning to emerge is that the problem was maybe easier than expected, and it was as if there was some kind of mental block.
Hindsight is 20/20.
How do you get real mathematicians to check the potential slop? At some point Tao will be getting spammed by claws finding problems to solve and submitting maybe-proofs/answers.
I wonder if the rationalizations people come up with for why this isn't real intelligence will be as creative as ChatGPT's solution.
Remember when people thought multiplying numbers, remembering a large number of facts, and being good at rote calculations was intelligence?
Some people think that multiplying numbers, remembering a large number of facts, and being good at calculations is intelligence.
Most intelligent people do not think that.
Eventually, we will arrive at the same conclusion for what LLMs are doing now.
I've had a similar notion that Time() is a necessary test function. Maybe it's because of the limitations of human cognition. (We have biases and blind-spots and human intelligence itself is erratic.)
I find it's helpful to avoid conflating the following three topics:
/1/ Is the tool useful?
/2/ At scale, what is the economic opportunity and social/environmental impact?
/3/ Is the tool intelligent?
Casual observation suggests that most people agree on /1/. An LLM can be a useful tool. (Present case: someone found a novel approach to a proof.) So are pocket calculators, personal computers, and portable telephones. None of these tools confers intelligence, although these tools may be used adeptly and intelligently.
For /2/, any level of observation suggests that LLMs offer a notable opportunity and have a social/environmental impact. (Present case: students benefitted in their studies.) A better understanding comes with Time() ... our species is just not good at preparing for risks at scale. The other challenge is that competing interests may see economic opportunities that don't align for social/environmental Good.
Topic /3/ is of course the source of energetic, contentious debate. Any claim of intelligence for a tool has always had a limited application. Even a complex tool like a computer, a modern aircraft, or a guided missile is not "intelligent". These tools are meant to be operated by educated/trained personnel. IBM's Deep Blue and Watson made headlines -- but was defeating humans at games proof of Intelligence?
On this particular point, we should worry seriously about conferring trust and confidence on stochastic software in any context where we expect humans to act responsibly and be fully accountable. No tool, no software system, no corporation has ever provided a guarantee that harm won't ensue. Instead, they hire very smart lawyers.
Remember when people thought solving Erdős problems required intelligence? Is there anything an LLM could ever do that would count as intelligence? Surely the trend has to break at some point; if so, what would be the thing that crosses the line into real intelligence?
> Remember when people thought solving Erdős problems required intelligence? Is there anything an LLM could ever do that would count as intelligence?
Hah. It reminds me of this great quote, from the '80s:
> There is a related “Theorem” about progress in AI: once some mental function is programmed, people soon cease to consider it as an essential ingredient of “real thinking”. The ineluctable core of intelligence is always in that next thing which hasn’t yet been programmed. This “Theorem” was first proposed to me by Larry Tesler, so I call it Tesler’s Theorem: “AI is whatever hasn’t been done yet.”
We are seeing this right now in the comments. 50 years later, people are still doing this! Oh, this was solved, but it was trivial, of course this isn't real intelligence.
That is a “gotcha” born of either ignorance (nothing wrong with that, we’re all ignorant of something) or bad faith. Definitions shift as we learn more. Darwin’s definition of life is not the same as Descartes’ or Plato’s or anyone in between or since because we learn and evolve our thinking.
Are you also going to argue definitions of life before we even learned of microscopic or single cell organisms are correct and that the definitions we use today are wrong? That they are shifting goal posts? That “centuries later, people are still doing this”? No, that would be absurd.
I don't see it as a gotcha. Just an (evergreen, it seems) observation that people will absolutely move the goalposts every time there's something new. And people can be ignorant outsiders or experts in that field as well.
For example, ~2 years ago, an expert in ML publicly made this remark on stage: LLMs can't do math. Today they absolutely and obviously, can. Yet somehow it's not impressive anymore. Or, and this is the key part of the quote, this is somehow not related to "intelligence". Something that 2 years ago was not possible (again, according to a leading expert in this field), is possible today. And yet this is somehow something that they always could do, and since they're doing it today, is suddenly no longer important. On to the next one!
No idea why this is related to darwin or definitions of life. The definitions don't change. What people considered important 2 years ago, is suddenly not important anymore. The only thing that changed is that today we can see that capability. Ergo, the quote holds.
> For example, ~2 years ago, an expert in ML
See, that’s a poor argument already. Anyone could counter that with other experts in ML publicly making remarks that AI would have replaced 80% of the work force or cured multiple diseases by now, which obviously hasn’t happened. That’s about as good an argument as when people countered NFT critics by citing how Clifford Stoll said the internet was a fad.
> made this remark on stage: LLMs can't do math. Today they absolutely and obviously, can.
How exactly are “LLMs can’t” and “do math” defined? As you described it, that sentence does not mean “will never be able to”, so there’s no contradiction. Furthermore, it continues to be true that you cannot trust LLMs on their own for basic arithmetic. They may e.g. call an external tool to do it, but pattern matching on text isn’t sufficient.
> The definitions don't change.
Of course they do, what are you talking about? Definitions change all the time with new information. That’s called science.
The definition of "can/cannot do math" didn't change. That's not up for debate. 2 years ago they couldn't solve an erdos problem (people have tried, Tao has tried ~1 year ago). Today they can.
Definitions don't change. The idea that now that they can it's no longer intelligence is changing. And that's literally moving the goalposts. Read the thread here, go to the bottom part. There are zillions of comments saying this.
You seem keen on not trying to understand what the quote is saying. This is not a good-faith discussion, and it's not going anywhere. We're already miles from where we started. The quote is an observation (and an old one at that) about goalposts moving. If you can't or won't see that, there's no reason to continue this thread.
> The definition of "can/cannot do math" didn't change. That's not up for debate.
That is not the argument. The point is that the way you phrased it is ambiguous. “Math” isn’t a single thing, and “cannot” can either mean “cannot yet” or “cannot ever”. I don’t know what the “expert” said since you haven’t provided that information, I’m directly asking you to clarify the meaning of their words (better yet, link to them so we can properly arrive at a consensus).
> Definitions don't change.
Yes they do! All the time!
https://www.merriam-webster.com/wordplay/words-that-used-to-...
> And that's literally moving the goalposts.
Good example. There are no literal goal posts here to be moved. But with the new accepted definition of the words, that’s OK.
> There are zillions of comments saying this.
Saying what, exactly? Please be clear, you keep being ambiguous. The thread barely crossed a couple of hundred comments as of now, there are not “zillions” of comments in agreement of anything.
> You are keen to not trying to understand what the quote is saying. (…) If you can't or won't see that, there's no reason to continue this thread.
Indeed, if you ascribe wrong motivations and put a wall before understanding what someone is arguing, there is indeed no reason to continue the thread. The only wrong part of your assessment is who is doing the thing you’re complaining about.
He’s a booster and I don’t think he argues in good faith.
He seems to be fixated on this notion that humans are static and do not evolve - clearly this is false. What people thought as being a determinant for intelligence also changes as things evolve.
Well, the famous Turing test was evidently insufficient. All that happened is that the test is dead and nobody ever mentions it anymore. I'm not sure that any other test would fare any better once solved.
I've spent a good chunk of time formalising mathematics.
Doing formalized mathematics is as intelligent as multiplying numbers together.
The only reason why it's so hard now is that the standard notation is the equivalent of Roman numerals.
When you start using a sane metalanguage, and not just augmented English, to do proofs, you gain the same increase in capabilities as going from word equations to algebra.
>the standard notation is the equivalent of Roman numerals.
But the Roman numerals are easy. I was able to use them before 1st grade and I can't touch any "standard notation" to this day.
When will LLM folks realize that automated theorem provers have existed for decades and that non-ML theorem provers have solved non-trivial math problems tougher than this Erdős problem?
Proposing and proving something like Gödel's theorem's definitely requires intelligence.
Solving an already proposed problem is just crunching through a large search space.
Automated theorem provers can't prove this problem. Which non-trivial math problems do you think are tougher than this Erdős problem?
So the only intelligent people in history are those who invent new fields of mathematics, got it.
You can just about make out those goalposts on the surface of the moon with a good telescope at this point.
"Hi ChatGPT, propose and prove something radically new in the genre of Gödel's theorem."
How is this not just another proposed problem (albeit with a search space much larger than an Erdos problem's)?
I think the point the GP is making is that Gödel's theorem wasn't part of any "genre". Gödel, or somebody, had to invent the whole field, and we haven't seen LLMs invent new fields of mathematics yet.
But this isn't a fair bar to hold it to. There are plenty of intelligent people out there, including 99% of professional mathematicians, who never invent new fields of mathematics.
None of it is really from logical thought. The rationalizations don't make any sense, but they haven't for a while. It's an emotional response. Honestly, it's to be expected.
It's because HN is not really full of smart people. It's full of people who think they're smart and take pride in the idea that they're pretty intelligent.
ChatGPT equalizes intelligence. And that is an attack on their identity. It also exposes their ACTUAL intelligence which is to say most of HN is not too smart.
> ChatGPT equalizes intelligence
Citation needed
How can you ask this question on a post titled "Amateur armed with ChatGPT solves an Erdős problem"???? Are you looking for some randomised control trial? omg
We just look at comments from AI boosters and it is self-evident that no intelligence is being equalized.
Idk, going out on a limb and guessing the folks who hang out on erdosproblems.com aren’t run-of-the-mill dumbasses. The prompt, if you look at it, is actually quite clever. Not as clever as the proof. But far from the equalization OP posits.
Directionally it is correct - an amateur wouldn’t be able to do this without ChatGPT. You can’t expect maximal democratisation
> ChatGPT equalizes intelligence
Yes, I love living in communism too. Imagine if you had to pay money for it or something. The wealthiest people would get unrestricted access to intelligence while the poor none. And the people in the middle would eventually find themselves unable to function without a product they can no longer afford. Chilling, huh? Good thing humans are known for sharing in the benefits of technological progress equally. /s
Huh?
Before ChatGPT it cost ~$100,000 to acquire intelligence good enough to solve this Erdős problem; now it costs ~$200.
I'm really confused about what you are even taking issue with.
What? The post is literally titled "Amateur armed with ChatGPT solves an Erdős problem". Stop spreading FUD about unaffordability.
They used ChatGPT Pro to solve it. Over 50% of people in the world couldn't afford ChatGPT Pro ($200/mo) even if they spent more than half of their income on it. [1]
What was that about "spreading FUD about unaffordability"?
[1] https://ourworldindata.org/grapher/share-living-with-less-th...
They didn't buy ChatGPT Pro themselves. You could've done the same as the students in the article and gotten a free subscription if you were interested in this instead of trolling.
> You could've done the same
Please show me the steps to get a $200 subscription for free that works 100% of the time regardless of who you are. I'm listening.
ChatGPT flattened the difference between top .0001 percentile mathematician and an amateur. This is the definition of making intelligence more available.
You are exaggerating the situation by essentially claiming that since some people can't afford 200 dollars, ChatGPT is not democratising intelligence. It's a bit strange to claim this, because according to you it only becomes affordable when the maximal number of people can afford it. It's a bit childish.
Directionally it is democratising. Are more people able to afford higher level intelligence? Yes.
> ChatGPT flattened the difference between top .0001 percentile mathematician and an amateur
It flattened the difference between a top epsilon percentile mathematician and an amateur with money. It didn't flatten the difference between an amateur with a little money and an amateur with a lot of money. It widened it. That's the part I'm scared about.
You are shrugging this off because it currently isn't that expensive. But we're talking about the massively subsidized price here, which is bound to get orders of magnitude higher when the bubble pops. Models are also likely to get much better. If it gets to a point where the only way to obtain exceptionally high intelligence is with an exceptionally high net worth and vice versa, how is that going to democratize anything?
Proving a negative is a pretty high bar. You also have the problem of defining "real intelligence", which I suspect you can't.
Intelligence is intelligence. It's intelligent because it does intelligent things. If someone feels the need to add a 'real' or 'fake' moniker to it so they can exclude the machine and make themselves feel better (or for whatever reason), then they are the ones meant to be doing the defining, and to tell us how it can be tested for. If they can't, then there's no reason to pay attention to any of it. It's the equivalent of nonsensical rambling. At the end of the day, the semantic quibbling won't change anything.
> It's intelligent because it does intelligent things.
Most people would consider someone who can calculate 56863*2446 instantly in their head to be intelligent. Does that mean pocket calculators are intelligent? The result is the same.
> then they are the one meant to be doing the defining, and to tell us how it can be tested for. If they can't, then there's no reason to pay attention to any of it.
That is the equivalent of responding to criticism with “can you do better?”. One does not need to be a chef (or even know how to cook) to know when food tastes foul. Similarly, one does not need to have a tight definition of “life” to say a dog is alive but a rock isn’t. Definitions evolve all the time when new information arises, and some (like “art”) we haven’t been able to pin down despite centuries of thinking about it.
LLMs are definitely intelligent - just not general like humans, and very, very jagged (succeeding and failing in head-scratching ways).
Well, it still gets easy problems wrong.
With real general intelligence you'd expect it to solve problems up to a certain difficulty at a good clip.
That "it" is a huge variety and range of things...
For one, everything its 'intelligence' knows about solving the problem is contained within the finite context window memory buffer size for the particular model and session. Unless the memory contents of the context window are being saved to storage and reloaded later, unlike a human, it won't "remember" that it solved the problem and save its work somewhere to be easily referenced later.
For one, everything humans' "intelligence" knows about solving the problem is contained within the finite brain size for the particular person and life. Unless the memory contents of the brain are being saved to storage and reloaded later, it won't "remember" that it solved the problem and save its work somewhere to be easily referenced in a later life.
There's humans that have memory issues, or full blown Anterograde amnesia.
There are humans who can’t read. That doesn’t mean Grammarly is “intelligent”. These things are tools - nothing more, nothing less.
What you're describing sounds more like the model lacking awareness than lacking intelligence. Why does it need to know it solved the problem to be intelligent?
We say African elephants are intelligent for a number of reasons, one of which is that they remember where sources of water are in very dry conditions and can successfully navigate back to them across relatively large distances. An intelligent being that can't remember its own past is at a significant disadvantage compared to others that can, which is exactly one of the reasons why Alzheimer's patients often require full-time caregivers.
There's probably a limit to how intelligent something can be with no long term memory, but solving Erdos problems in 80 minutes is clearly not above it, and I think the true limit is probably much higher than that.
You are confusing lack of intelligence with the presence of impairment.
As another commenter pointed out, these models are being trained to save and read context into files, so denying them the use of an ability they have just makes your claim tautological.
All modern harnesses write memory files for context later.
<edit> My mistake. Responded to a bot but can't delete now. Sorry. <edit>
No, but I'm interested to know what it is?
I think one day the VCs will have given the monkeys on typewriters enough money that these kinds of comments can be generated without human intervention.
You're really telling on yourself if you think an LLM is intelligent.
"This is real intelligence" is the bear position, so I think it's real intelligence.
And how about the creative rationalizations about how statistical text generation is actual intelligence? As if there were any intent or motive behind the words being generated, or any ability to learn literally anything new after it has been trained on human output?
2022 called, wants this argument back. When you're "statistically generating text" to find zero-day vulnerabilities in hard targets, building Linux kernel modules, assembly-optimizing elliptic curve signature algorithms, and solving arbitrary undergraduate math problems instantaneously --- not to mention apparently solving Erdos problems --- the "statistical text" stuff has stopped being a useful description of what's happening, something closer to "it's made of atoms and obeys the laws of thermodynamics" than it is to "a real boundary condition of what it can accomplish".
I don't doubt that there are many very real and meaningful limitations of these systems that deserve to be called out. But "text generation" isn't doing that work.
But the systems that do that impressive work are no longer just LLMs. Look at the Claude Code leak - it’s a sprawling, redundant maze relying on tools and tests to approximate useful output. The actual LLM is a small portion of the total system. It’s a useful tool, but it’s obviously not truly intelligent - it was hacked together using the near-trillions of dollars AI labs have received for this explicit purpose.
What does this matter? You can build a working coding agent for yourself extremely quickly; it's remarkably straightforward to do (more people should). But look underneath all the "sprawling tools": the LLM itself is a sprawling maze of matrices. It's all sprawling, it's all crazy, and it's insane what they're capable of doing.
Again if you want to say they're limited in some way, I'm all ears, I'm sure they are. But none of that has anything to do with "statistical text generation". Apparently, a huge chunk of all knowledge work is "statistical text generation". I choose to draw from that the conclusion that the "text generation" part of this is not interesting.
Well, hang on a second - it sounds like you may actually disagree with the user who created this thread. That user claims that these systems exhibit “real intelligence”, and success on this Erdos problem is proof.
You seem to be making the claim that LLMs are statistical text generators, but statistical text generation is good enough to succeed in certain cases. Those are different arguments. What do you actually believe? Are we even in disagreement?
I don't have any opinion about "real intelligence" or not. I'm not a P(doom)er, and I don't think we're on the brink of ascending as a species. But I'm also allergic to arguments like "they're just statistical text generators", because that truly does not capture what these things do or what their capabilities are.
Just to clarify because I’m not sure I understand:
So you agree that LLMs are in fact statistical text generators, but you don't like people using that fact in arguments about the capabilities of these things?
It's like a genotype/phenotype distinction, the genotype may be statistical text generator but the phenotype is something much more.
Not parent but I think you're being rather dense. They are _obviously_ statistical text generators. There's plenty of source code out there, anyone can go and inspect it and see for themselves so disputing that is akin to disputing the details of basic arithmetic.
But it is no longer useful to bring that fact up when conversing about their capabilities. Saying "well, it's a statistical text generator, so ..." is approximately as useful as saying "well, it's made of atoms, so ...". There are probably some very niche circumstances under which statements of each of those forms are useful, but by and large they are not, and you can safely ignore anyone who utters them.
He does say that LLMs are just one part of the systems used these days.
Solving open math problems is strong evidence of intelligence so there's not really any need for rationalization? I don't understand why intelligence would require intent or motive? Isn't intent just the behaviour of making a specific thing happen rather than other things?
I'm curious, do you think that this also applies to stable diffusion? Are these models "creative" too?
I haven't used stable diffusion enough to have a strong opinion on it. But my thinking is LLMs have only recently started contributing novel solutions to problems, so maybe there is some threshold above which there's less sloppy remixing of training data and more ability to form novel insights, and image generators haven't crossed this line yet.
Yeah? Those models are creative.
The LLM did not solve the problem.
Who did then?
Question for those who believe LLMs aren't intelligent and are merely statistical word predictors: how do you reconcile such achievements with that point of view?
(To be clear: I'm not agreeing or disagreeing. I sometimes feel the same too. I'm just curious how others reconcile these.)
It doesn't matter whether you drive there or walk. If your goal is cave exploration, the tools are irrelevant.
But in this specific case the AI actually explored the cave for you. Comparing it to a car getting you to the cave is a really bad comparison.
Whoosh
Those things aren't mutually exclusive. They are demonstrably statistical token predictors (go examine an open source implementation) and they clearly exhibit intelligence.
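If anyone wants to see the "statistical token predictor" part concretely, the decoding loop at the core of every implementation boils down to something like this toy sketch (the vocabulary and logits here are made up; in a real model the logits come out of billions of trained parameters):

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 0.8) -> int:
    """Turn a vector of logits into a probability distribution and sample one token id."""
    scaled = logits / temperature
    scaled -= scaled.max()          # subtract the max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

# Toy vocabulary and logits standing in for a real model's output layer.
vocab = ["the", "cat", "sat", "on", "mat"]
logits = np.array([2.0, 0.5, 1.0, 0.1, 0.3])
print(vocab[sample_next_token(logits)])  # most often "the", sometimes the others
```

Everything intelligent-looking lives in how the logits are produced, not in the sampling step itself.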
Now do P vs NP.
If/when these things solve our hardest problems, that's going to lead to some very uncomfortable conversations and realizations.
Nah, people are going to say: It just used these 500 weird tricks from all kinds of different areas. A human could totally have done it. Nobody looked. I guess P/NP wasn't that hard after all.
I feel like a year ago I would have said impossible. Now I am not so sure anymore. Although, even if I wrote the prompt and the correct result were presented to me, I wouldn't know. I would still need a mathematician to verify it.
The LLM took an entirely different route, using a formula that was well known in related parts of math, but which no one had thought to apply to this type of question.
Of course LLMs are still absolutely useless at actual maths computation, but I think this is one area where AI can excel --- the ability to combine many sources of knowledge and synthesise may sometimes yield very useful results.
Also reminds me of the old saying, "a broken clock is right twice a day."
I think that when considering progress as a society, people need to internalize better that all of us, without exception, are in this world for the first time.
We may have collectively filled libraries full of books, and created yottabytes of digital data, but in the end, to create something novel, somebody has to read and understand all of this stuff. Obviously that is not possible: read one book per day from birth to death and you still only get through about 80*365 = 29,200 books in the best case, out of the millions upon millions of books that have been written.
So these "few tricks" are the accumulation of a lifetime of mathematical training, the culmination of the slice of knowledge that the respective mathematician immersed themselves in. To discover new math and become famous you need the talent and skill to apply your knowledge in novel ways, but also the luck to have picked a field of math that still has interesting novel things to discover, and to have picked up the right tools and the right mental model that allow you to discover them.
This does not go for math only, but also for pretty much all other non-trivial fields. There is a reason why history repeats.
And it's actually a compelling argument for why AI is still a big deal even though it's at its core a parrot. It's a parrot, yes, but unlike a human, it was actually able to ingest the entirety of human knowledge.
> it actually was able to ingest the entirety of human knowledge
Even this, though, is not useful to us.
It remains true that a life without struggle and achievement is not really worth living...
So it is nice that there is something that could possibly ingest the whole of human knowledge, but that is still not useful to us.
People are still making a hullabaloo about "using AI" in companies, and there was some nonsense about how there would be only two types of companies, AI ones and defunct ones, but in truth there will simply be no companies...
Anyways, I'm sure I will get downvoted by the sightless lemmings on here...
> "a broken clock is right twice a day."
The combinatorial nature of trying things randomly means that it would take millennia or longer for light-speed monkeys typing at keyboards, or GPUs, to solve such a problem without direction: even a 1,000-character proof drawn from a 60-symbol alphabet is one string among 60^1000 possibilities.
By now, people should stop dismissing RL-trained reasoning LLMs as stupid, aimless text predictors or combiners. They wouldn’t say the same thing about high-achieving, but non-creative, college students who can only solve hard conventional problems.
Yes, current LLMs likely still lack some major aspects of intelligence. They probably wouldn’t be able to come up with general relativity on their own with only training data up to 1905.
Neither did the vast majority of physicists back then.
> Yes, current LLMs likely still lack some major aspects of intelligence.
Indeed, and so do current humans! And just like LLMs, humans are bad at keeping this fact in view.
On a more serious note, we're going to have a hard time until we can psychologically decouple the concepts of intelligence and consciousness. Like, an existentially hard time.
Yeah, they're great at interpolation - they'll just never be worth much at extrapolation.
Luckily for us, whole fortunes can be made by filling in the blanks between what we know and what we realize.
That deserves to be on a plaque somewhere.
I've been using LLMs for much the same purpose: solving problems within my field of expertise where the limiting factor is not intelligence per se, but the ability to connect the right dots from among a vast corpus of knowledge that I would never realistically be able to imbibe and remember over the course of a lifetime.
Once the dots are connected, I can verify the solutions and/or extend them in creative ways with comparatively little effort.
It really is incredible what otherwise intractable problems have become solvable as a result.
What's your field?
Paint by numbers
And by having more of those blanks filled humans might be able to come up with much better extrapolations than what we have right now.
People keep saying this, but the only ways I know of for formalizing this statement appear to be probably false?
I don’t know what this claim is supposed to mean.
If it isn’t supposed to have a precise technical meaning, why is it using the word “interpolate”?
> "a broken clock is right twice a day"
and homo sapiens, glancing at the clock when it happens to be right, may conjure an entire zodiac to explain it.
And homo sapiens, glancing at a system that gets better and better at solving problems, tries to deny it and comes up with the broken-clock analogy.
A stopped clock.
A broken clock can be broken in ways which result in it never being correct.
Those are just analog. If it's a broken digital clock, then all bets are off.
Wait, what do you mean "LLMs are still absolutely useless at actual maths computation"? I rely on them constantly for maths (linear algebra, multivariable calc, stat) --- literally thousands of problems run through GPT5 over the last 12 months, and to my recollection zero failures. But maybe you're thinking of something more specific?
They are bad at math. But they are good at writing code, and as an optimization some providers have the model quietly write code to answer the problem, run it, and give you the answer without telling you what happened in the middle.
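A rough sketch of that pattern, assuming it works roughly like a tool-calling loop (ask_llm is a stub standing in for a real chat-completion call; this is not any provider's actual API):

```python
import subprocess
import sys
import tempfile

def ask_llm(prompt: str) -> str:
    # Stub standing in for a real chat-completion call; a provider's
    # model would return generated code here.
    return "print(56863 * 2446)"

def solve_via_code(question: str) -> str:
    # 1. Ask the model for a short script instead of a direct answer.
    script = ask_llm(f"Write a Python script that prints the answer to: {question}")
    # 2. Run the generated script in a subprocess.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script)
        path = f.name
    result = subprocess.run([sys.executable, path],
                            capture_output=True, text=True, timeout=30)
    # 3. Hand back only the program's output, hiding the middle step.
    return result.stdout.strip()

print(solve_via_code("What is 56863 * 2446?"))  # -> 139086898
```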
Someone should tell the mathematicians that if they use a calculator or a whiteboard or, heaven forbid, a computer, they are "bad at math".
What would I do to demonstrate that they are bad at math? If by "maths" we mean things like working out a double integral for a joint probability problem, or anything simpler than that, GPT5 has been flawless.
Are they bad at math? Or are they bad at arithmetic?
if you don't know much math, it's easy to confuse the two
Neither.
What tier are you using? I have run lots of problems and am very impressed, but I find stupid errors a lot more frequently than that, e.g., arithmetic errors buried in a derivation or a bad definition, say 1/15 times. I would love to get zero failures out of thousands of (what sounds like college-level math) posed problems.
I have a standard OpenAI/ChatGPT Pro account; GPT5 is my daily driver for math, and Claude for code.
Calc, stat, etc. from a textbook are things they would naturally be good at, but I don't think textbook computations that are in the training set, and extrapolations of them, are what's in question here.
They are not great at playing chess either, which is both computational and analytic.
I think this is wrong and a category error (none of the problems I've given it are in a textbook; they're virtually all randomized), but, try this: just give me a problem to hand off to GPT5, and we'll see how it does.
Further evidence against your claim, if you don't want to take me up on that: I hand problems off to GPT5 to check my own answers. None of the dumb mistakes I make or the missed opportunities for simplification are in any book, and, again: it's flawless at pointing out those problems, despite being primed with a prompt suggesting I'm pretty sure I have the right answers.
I only have a rudimentary understanding of calculus, trigonometry, Google Sheets, and astronomy, but I was able to construct an accurate spreadsheet for astrometry calculations by using Grok and Gemini (both free, no subscription, just my personal account) to surface the formulas for measuring the distance between 2-3 points on the celestial sphere. The LLMs also assisted me in writing functions to convert DMS/HMS coordinates to decimal, and to work in radians as well.
I found and fixed bugs I wrote into the formulas and spreadsheets, and the LLMs were not my sole reference, but once the LLM mentioned the names of concepts and functions, I used Wikipedia for the general gist of things, and I appreciated the LLMs' relevant explanations that connected these disciplines together.
I did this on March 14, 2026
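For anyone curious, the core of that spreadsheet is just the spherical law of cosines. Here is a minimal sketch of the same calculation (the function names are mine, and the star coordinates are approximate):

```python
import math

def dms_to_deg(d: float, m: float, s: float) -> float:
    """Convert degrees/arcminutes/arcseconds to decimal degrees."""
    sign = -1.0 if d < 0 else 1.0
    return sign * (abs(d) + m / 60 + s / 3600)

def angular_separation(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Angular separation in degrees between two points on the celestial
    sphere (RA/Dec in decimal degrees), via the spherical law of cosines."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    cos_theta = (math.sin(dec1) * math.sin(dec2)
                 + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

print(dms_to_deg(-16, 42, 58))  # -> about -16.716 (Sirius's declination)

# Example: Sirius vs. Betelgeuse, approximate J2000 coordinates.
print(angular_separation(101.287, -16.716, 88.793, 7.407))  # roughly 27 degrees
```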
>I rely on them constantly for maths (linear algebra, multivariable calc, stat)
That's one way to waste a ton of tuition money to just have a clanker do your learning for you.
Unless you're teaching it, in which case I hope your salary is cut by whatever percentage your clanker reduces your workload.
Perhaps learning how to get AI to solve your problems is the most important lesson to learn now? The rest seems like the current equivalent of learning cursive.
The ultimate generalist
Also just the sheer value of brute force.
80 hours! 80 hours of just trying shit!
It's 80 minutes, not 80 hours.
and you can be sure mathematicians spent way more than 80 hrs on it
80 minutes! 80 minutes of just trying shit!
... shit that solved an apparently significant Erdős problem.
That is not nothing, no matter how much you hate AI.
It shows that AI is apparently very good at brute-forcing.
Are the human mathematicians who wanted to solve this problem just too stupid to brute force for 80 minutes?
This isn't brute force.
It is in the same way that educated guessing is.
Care to actually refute? Interesting that even an LLM would give an attempt at it, but apparently those who only bother to hit the downvote button aren't even meeting that level of "intelligence".
How long do you figure it’d take to solve the problem yourself?
>ChatGPT, prompted by an amateur, solves an Erdős problem.
There, fixed that for you.
WTF!?
Wake me up when it creates a cancer cure or a fusion reactor.
So you can move the goal post again?
It was always the same: increasing the human life span, space exploration, solving the energy crisis.
Big if true.
This is not a good Saturday night for humanity
My big question with all these announcements is: how many other people were using the AI on problems like this, and failing? Given the excitement around AI at the moment, I think the answer is: a lot.
Then my second question is how much VC money did all those tokens cost.
I've tried my hand at a few of the Erdős problems and came up short; you didn't hear about them. But if a mathematician at Harvard solved one, you would probably still hear about it a bit. Just the possibility that a Pro subscription running for 80 minutes solved an Erdős problem is astounding. Maybe we get some researchers to get a grant and burn a couple of data centers' worth of tokens for a day/week/month and see what it comes up with?
The question is how many people tried to solve this Erdos problem with AI and how many total minutes have been spent on it.
Why do you care about either of those questions?
Because it could be a massive waste of time and money.
Why do you think it's a waste of time and money? I really can't see it.
Capitalism already is a poor allocator of human effort, resources, and energy, why lock in on this specifically? There's entire professions that are essentially worthless to society that exist only to perpetuate the inherent contradictions of this system, why not focus more on all that wasted human effort? Or the fact that everyone has to do some arbitrary sellable labor in order to justify their existence, rather than something they might truly enjoy or might make the world better?
> Capitalism already is a poor allocator of human effort, resources, and energy, why lock in on this specifically?
It's absolutely the best allocator of human effort there is. It has some problems, but compared to the alternatives it's almost perfect.
No, it is the best of what we know.
There's something better out there that nobody has had the imagination to figure out and build alignment toward.
It can also be true that capitalism is transitional, a way to get to a place where much of the capital one needs has already been invented.
Looking around, the evidence doesn't seem to support this conclusion. Half of all food is thrown away, yet people go hungry. Every privatized industry diminishes in quality and reach. The system selects and optimizes for profit rather than for human need.
> Looking around, the evidence doesn't seem to support this conclusion.
It absolutely does if you look at facts and not "vibes". There are fewer people starving now than ever before, and it's a giant, giant difference. We are tackling more and more diseases thanks to big pharma. Even semi-socialist countries such as China have opened their markets. Basically the only countries that do not implement capitalist solutions are the ones you'd never want to live in, such as North Korea or Cuba (funny thing - even China urged Cuba to free its markets).
I think we should at least ask the latter, if it turned out it cost $100,000 to generate this solution, I would question the value of it. Erdős problems are usually pure math curiosities AFAIK. They often have no meaningful practical applications.
Also, it's one thing if the AI age means we all have to adapt to using AI as a tool, and another thing entirely if it means the only people who can do useful research are the ones with huge budgets.
Your logic undoes your point, because the kid who "solved" this technically didn't even have to invest in a degree.
America should fund tertiary education better, and that would solve even more problems.
Getting off-topic, but as a successful high-school dropout I am compelled to remind anyone reading this that [the American] college [system] is a scam.
That's not to say that there aren't benefits to tertiary education, for many people in different contexts. It's just not the golden path that it's made out to be.
Many people currently in college are just wasting their money and should enroll in trades programs instead.
Meanwhile, nothing about being in or out of school is mutually exclusive with using LLMs as a force multiplier for learning - or for solving math problems, apparently.
Neither does the Collatz conjecture, Fermat's last theorem, ....
(Of course, those problems are on another plane than this one.)
But that’s exactly my point.
These are absolutely worth studying, but being what they are, nobody should be dumping massive amounts of money on them. I would not find it persuasive if researchers used LLMs to solve the Collatz conjecture or finally decode Etruscan. These results would be extremely valuable, but it is unlikely to be worth having an LLM grind tokens like crazy to get them.
If solving even the biggest problems in pure maths is not worth it for you, then I guess we should stop all pure maths research - researchers are paid much more than the potential token spend, frequently for decades, and they frequently work on much less important and easier problems.
Is it worth it to buy a super-yacht?
No.
Maybe... but I would love it if 1% of the investment in AI were redirected to the mathematics education and professional research that would allow progress on any of these problems...
I would question it at $60k. At $100k it's a steal.
No meaningful, practical applications? You realize that sounds incredibly naive in the history of mathematics, right? People thought this way about number theory in general, and many other things that turned out to have quite important practical applications. Your statement is also a bit odd in that researchers are already paid throughout their whole careers to solve such problems. I don't know.
> You realize that sounds incredibly naive in the history of mathematics, right?
This is after-the-fact justification. You are arguing that because a thing (number theory) turned out to have practical applications, we should have dumped a lot more effort into it. There is no basis for this argument whatsoever; it also seems to involve inventing a time machine. Number theory had no practical applications until the development of public-key cryptography, but you cannot make funding decisions based on the future, since it's unknowable.
Once we get something working, sure, you can justify more aggressive investment. This is not to say that we should not invest in pie-in-the-sky ideas. We absolutely should and need to. Moonshot research or even somewhat esoteric research is vital, but the current investment in AI is so far out of the ballpark of rational. There’s an energy of a fait accompli here, except it’s still very plausible this is all unsustainable and the market implodes instead.
> Number theory had no practical applications until the development of public-key cryptography, but you cannot make funding decisions based on the future since it’s unknowable.
You are completely missing the point. The point is that we should invest in pure maths because it has always been an investment with very good ROI. The funding should be focused on what experts believe will advance pure maths most (not on whether we believe that in 100 years some specific area will find an application), and that's pretty much what we are doing right now. I think it's just your anti-AI sentiment that's clouding your judgement: since AI succeeded in proving pure maths results, you are inclined to downplay it by saying that, well, pure maths is worthless anyway.
Can you imagine how many bags of chips we could buy if we stopped funding cancer research?
It's so expensive!
Can you imagine how much ChatGPT cancer research we could fund if we stopped funding cancer research?
AI is my favourite weird collaborator
> He’s 23 years old and has no advanced mathematics training.
How is he even posing the question and having even a vague idea of what the proof means or how to understand it?
> “I didn’t know what the problem was—I was just doing Erdős problems as I do sometimes, giving them to the AI and seeing what it can come up with,” he says. “And it came up with what looked like a right solution.” He sent it to his occasional collaborator Kevin Barreto, a second-year undergraduate in mathematics at the University of Cambridge.
Seems like standard 23-year-old behavior. You're spending $100-$200/mo on the pro subscription, and you want to get your money's worth. So you burn some tokens on this legendarily hard math problem sometimes. You've seen enough wrong answers to know that this one looks interesting, and you pass it on to a friend who actually knows math and who is at a place where experts can recognize it as correct.
Seems like a classic example of a non-expert human labeling ML output.
According to the article he was using the free ChatGPT tier at first, until someone gifted him a Pro subscription to encourage the "vibe-mathing".
Couldn't he have just asked ChatGPT if it was correct? Why do we still feel the need to loop in a human?
Because society is run by humans, not chatpgt.
my guess would be due to having an interest in the field
Is the conjecture not trivially sound at an intuition level? It's surprising that this proof was difficult.
Scientific American going out of business next, lol, weak headline. ChatGPT, let's have a better headline for the god among men who realized the capability of the new tool that many underestimate or puff up needlessly. Fun times we live in. One love, all.
This just shows that with the right training - in this case a thesis on Erdős problems - they were able to prompt and check the output. So they still needed the know-how to even begin to figure it out. "Lichtman proved Erdős right as part of his doctoral thesis in 2022."
Lichtman is an expert who commented for the story. Liam Price is the one who prompted ChatGPT. "He’s 23 years old and has no advanced mathematics training."
“I didn’t know what the problem was—I was just doing Erdős problems as I do sometimes, giving them to the AI and seeing what it can come up with,” he says. “And it came up with what looked like a right solution.”
"He sent it to his occasional collaborator Kevin Barreto, a second-year undergraduate in mathematics at the University of Cambridge."
So basically two undergrads/graduates in math; "advanced" is subjective at that point.
I don't see where it says Price was an undergraduate/graduate in math.
I don't see where it doesn't say he is; I feel it's implied. Another source proves me right? https://www.newscientist.com/article/2511954-amateur-mathema...
https://archive.is/oQvO4
It's implied by "no advanced mathematics training"?
The article you linked (thanks for the unpaywalled link, by the way) describes him only as an amateur mathematician, but describes Barreto as a math student. If they were both math students, I feel it would say so?
Or perhaps you're arguing it's implicit in him having solved the problem? If so, you're just assuming your conclusion. "AI didn't prove it by itself; Price was a mathematician. Well, he must have been a mathematician to be able to prove it!"
I'm saying that it wasn't a random person who had no training in math. It's still a miraculous achievement; I'm just trying to show that they still had to study maths to even understand how to present the problem and verify it.