Opus 4.6 (high) is doing things for me that I don't know how to do myself. Moreover, I don't fully understand what it did even after it's done. But it works. The domain is automated debugging and RE.
How much domain experience do you have? Is it helping you solve problems for paying customers?
In my opinion, AI agents are currently just as capable as novice developers. Their main advantage is that they’re much faster than we are when the task involves generating a lot of code.
If the task is simple, I spend more time telling it what to do than doing it myself. But if the task is complex, I use certain skills/commands and create intermediate files (more than necessary) between each step (analysis, planning, design, workflow, and implementation) and clear the context between each of them. The result is fairly accurate, but not perfect.
My take is, we remain the architects of our code, and AI agents are an excellent tool that we need to master.
If your project requires the solution of a tricky algorithmic issue, then is the AI system able to solve that part, or do you have to give it the solution?
They're still pretty dreadful. They're better than I was at 21, so I'd say they're good for graduate level, but nothing beyond that.
Which models + versions are you using? Can you give a specific problem that you found them to be bad at?
The most recent logic I tried getting it to code for me was a set of recursive C# functions to reverse-navigate a node map (a Microsoft Project plan with various feeding chains), calculate all possible paths, and return them as a list of objects.
It kept producing code that looked to the eye like it might work, but each time I ran it, it would just throw schoolboy exceptions. I got tired of telling it to correct the things it kept forgetting to check for (nulls, path starts, empty lists), and just coded it from scratch myself.
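For what it's worth, the defensive checks described above (missing nodes, empty predecessor lists, cycles) can be stated up front. Here is a minimal sketch in Python rather than C#; the graph shape and function name are invented for illustration, not taken from the actual project:

```python
def all_paths_to_starts(graph, task):
    """Return every predecessor path from `task` back to a start node.

    `graph` maps a task id to the list of task ids that feed into it.
    A task with no predecessors is a start node.
    """
    if not graph or task not in graph:
        return []  # unknown task or no graph: nothing to traverse

    def walk(node, seen):
        if node in seen:
            return []  # cycle guard: never recurse into a node twice
        preds = graph.get(node) or []  # treat missing/None as empty
        if not preds:
            return [[node]]  # start node: a complete path ends here
        out = []
        for p in preds:
            for path in walk(p, seen | {node}):
                out.append(path + [node])  # extend each partial path
        return out

    return walk(task, frozenset())

# A tiny plan: "start" feeds "a" and "b", which both feed "finish".
plan = {"finish": ["a", "b"], "a": ["start"], "b": ["start"], "start": []}
print(all_paths_to_starts(plan, "finish"))
# → [['start', 'a', 'finish'], ['start', 'b', 'finish']]
```

The seen-set on the recursion is what keeps a malformed plan (a dependency cycle) from blowing the stack, and the `or []` normalization covers both missing and null predecessor lists in one place.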
I find ChatGPT is like pair-programming with a junior, except I'm not getting paid to coach them like I would if it were an actual graduate hire.
Your prompts are zero-out-of-10 quality.
Learn how to prompt better and you'll be fine.
I think I'm doing just fine, thanks for your concern.
Keeping context is something they are bad at. For now, I admit, but they are.
Given a long-haul goal with instructions and everything, they will reinvent the wheel four times, and on one of those you will get a square. Reminds me of that monkey's-paw wish thing. You look at your finished app. It looks beautiful, but its inner workings are a ball of confusion.
I am not sure "better" is the metric; that answer is definitely mostly no. But faster? Absolutely.
I don't use ChatGPT, but I've been using an agent with Claude Sonnet 4. My answer may not be useful to you, but I'll talk about my experience with that and hope it may help you.
So this AI agent... It is much faster at writing code when given specific instructions. But it keeps losing context on architecture, and I can't really let it build complex things with interdependencies that build on each other. At times it feels like pair programming with a guy who is so crazy fast that I'm left behind with my head spinning, wondering how we just jumped from a hello world to a working thing that would have taken me ten iterations. And I get a bad feeling when I then wonder how this app is doing what it does, because my agent can't explain it, and I would be stupid to believe what it hallucinated, because it sounds really solid until you scratch the surface.
At the beginning I was almost euphoric about my new friend; now I'm sometimes disappointed, sometimes confused, but I am learning to give better, more concise instructions and to make smaller development jumps. It is tempting to set a long-haul goal and let it run. But I think for now, even if it is much faster at the small things, it would also be faster at building a catastrophic spaghetti-code nightmare if not used with great care.
> I don't use ChatGPT, but I've been using an agent with Claude Sonnet 4.
Are you using Sonnet 4.6?
> So this AI agent... It is much faster at writing code when given specific instructions. But it keeps losing context on architecture, and I can't really let it build complex things with interdependencies that build on each other.
I've only built small things (< 1000 lines) with the systems, so I might be missing this problem.
Is it better than you at building small self-contained things?
> And I get a bad feeling when I then wonder how this app is doing what it does, because my agent can't explain it, and I would be stupid to believe what it hallucinated, because it sounds really solid until you scratch the surface.
Do you ask it to generate test suites for the things that it builds?
> it would also be faster at building a catastrophic spaghetti-code nightmare if not used with great care.
noted
I started working with this two weeks ago, so I'm learning as I go (or should I say stumble and fall). Weird as it may sound, what I found so trustworthy at the beginning was that it sounded so rational and logical, as if it really knew better, and I liked letting it do its thing. Obviously it did not go so well, and I had to correct a lot. But I am learning, what can I say? And yes, I gave it many commandments like "thou shalt always test before releasing", and it sounded so convincing when it confirmed what an excellent idea that was, that I was at least surprised -imagine that- when something did not go as planned on prod because of, well, you know...
Did you tell it that it should test, or did you have it generate actual tests that you could run if you wanted to?
It is better at syntax and boilerplate. It writes cleaner code than I would have. But it is absolute shit at actually designing systems, in particular if you are integrating multiple platforms and stacks.
What models + versions are you using?
Is it bad at designing systems that don't have a bunch of integrations?