Genuinely excited to try this out. I've started using Codex much more heavily in the past two months and honestly, it's been shockingly good. Not perfect mind you, but it keeps impressing me with what it's able to "get". It often gets stuff wrong, and at times runs with faulty assumptions, but overall it's no worse than having average L3-L4 engs at your disposal.
That being said, the app is stuck at the launch screen, with "Loading projects..." taking forever...
Edit: A lot of links to documentation aren't working yet. E.g.: https://developers.openai.com/codex/guides/environments. My current setup involves having a bunch of different environments in their own VMs using Tart and using VS Code Remote for each of them. I'm not married to that setup, but I'm curious how it handles multiple environments.
Edit 2: Link is working now. Looks like I might have to tweak my setup to have port offsets instead of running VMs.
I have the $20 a month subscription for ChatGPT and the $200/year subscription to Claude (company reimbursed).
I have yet to hit usage limits with Codex. I continuously reach it with Claude. I use them both the same way - hands on the wheel and very interactive, small changes and tell them both to update a file to keep up with what’s done and what to do as I test.
Codex gets caught in a loop more often trying to fix an issue. I tell it to summarize the issue, what it’s tried and then I throw Claude at it.
Claude can usually fix it. Once it is fixed, I tell Claude to note it in the same file, and then I go back to Codex.
I will say that, in my observation, doing small modifications or asking for a bunch of stuff fills the context just the same. It depends on your codebase and the rest of the stuff you use (subagents, skills, etc.)
I used to minimise the changes and try to get the most out of it. I ran countless tests and variations. It didn't really matter much whether I told it to do everything or to change one line. I feel Claude Code tries to fill the context as fast as possible anyway
I am not sure how much Claude is worth right now. I still prefer it over Codex, but I am starting to feel that's just bias
The trick to reach the usage limit is to run many agents in parallel. Not that it’s an explicit goal of mine but I keep thinking of this blog post [0] and then try to get Codex to do as much for me as possible in parallel
[0]: http://theoryofconstraints.blogspot.com/2007/06/toc-stories-...
Telling a bunch of agents to do stuff is like treating them as senior developers you trust to take an ambiguous business requirement and use their best judgment.
But doing that with AI feels like hiring an outsourcing firm for a project and having them come back with an unmaintainable mess that's hard to reason through 5 weeks later.
I very much micromanage my AI agents and test and validate their work. I treat them like a mid-level ticket-taker code monkey.
I have found Codex to be an exceptional code reviewer of Claude's work.
Same here. In my experience, Codex usually knocks backend/highly "logical" tasks out of the park, while it stumbles at times over fairly basic front-end/UI tasks.
But overall it does seem to be consistently improving. Looking forward to seeing how this makes it easier to work with.
Hey thank you for calling out the broken link. That should be fixed now. Will make sure to track down the other broken links. We'll track down why loading is taking a while for you. Should definitely be snappier.
Is this the only announcement for Apple platform devs?
I thought Codex team tweeted about something coming for Xcode users - but maybe it just meant devs who are Apple users, not devs working on Apple platform apps...
Cool, looks like I'll stay on Cursor. All the alternatives come out buggy, while Cursor cares a lot about developer experience.
BTW OpenAI should think a bit about polishing their main apps instead of trying to come out with new ones while the originals are still buggy.
(I work on Codex) One detail you might appreciate is that we built the app with a ton of code sharing with the CLI (as core agent harness) and the VSCode extension (UI layer), so that as we improve any of those, we polish them all.
Any chance you'll enable remote development on a self-hosted machine with this app?
I.e., I think the Codex webapp on a self-hosted machine would be great. This is important when you need a beefier machine (with potentially a GPU).
Working remotely with the app would truly be great
Interested in this as well.
Any reason to switch from vscode with codex to this app? To me it looks like this app is more for non-developers but maybe I’m missing something
Good question! VS Code is still a great place for deep, hands-on coding with the Codex IDE extension.
We built the Codex app to make it easier to run and supervise multiple agents across projects, let longer-running tasks execute in parallel, and keep a higher-level view of what’s happening. Would love to hear your feedback!
OK, 'projects', but this would make a lot more sense if we could connect remotely to the projects, which works without a problem using the IDE plugin. Right now I don't see any advantage to using this.
Looks like another Claude App/Cowork-type competitor with slightly different tradeoffs (Cowork just calls Claude Code in a VM, this just calls Codex CLI with OS sandboxing).
Here's the Codex tech stack in case anyone was interested like me.
Framework: Electron 40.0.0
Frontend:
- React 19.2.0
- Jotai (state management)
- TanStack React Form
- Vite (bundler)
- TypeScript
Backend/Main Process:
- Node.js
- better-sqlite3 (local database)
- node-pty (terminal emulation)
- Zod (validation)
- Immer (immutable state)
Build & Dev:
- pnpm (package manager)
- Electron Forge
- Vitest (testing)
- ESLint + Prettier
Native/macOS:
- Sparkle (auto-updates)
- Squirrel (installer)
- electron-liquid-glass (macOS vibrancy effects)
- Sentry (error tracking)
The use of the name Codex and the focus on diffs and worktrees suggests this is still more dev-focused than Cowork.
They have the same stack as a boot camper; quite telling.
It's basically what Emdash (https://www.emdash.sh/), Conductor (https://www.conductor.build/) & CO have been building but as first class product from OpenAI.
Raises the question of whether Anthropic will follow up with a first-class Claude Code "multi agent" (git worktree) app themselves.
https://code.claude.com/docs/en/desktop
Oh, I didn't know Claude Code already had a desktop app
It isn’t its own app, but it’s built in to their desktop, mobile and web apps.
And it uses worktrees.
Maybe a dumb question on my side, but if you are using a GUI like Emdash with Claude Code, are you getting the full Claude Code harness under the hood, or are you "just" leveraging the model?
I can answer for Conductor: you're getting the full Claude Code, it's just a GUI wrapper on top of CC. It makes it easy to create worktrees (1 click) and manage them.
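For anyone curious, the one-click step these wrappers provide is essentially a plain `git worktree` call. Here's a minimal sketch in Python (the helper name and paths are made up for illustration, not how Conductor actually implements it):

```python
import pathlib
import subprocess

def create_agent_worktree(repo: str, branch: str) -> pathlib.Path:
    """Give an agent its own checkout so it can't clobber your main working tree."""
    repo_path = pathlib.Path(repo).expanduser().resolve()
    target = repo_path.parent / f"{repo_path.name}-{branch.replace('/', '-')}"
    # `git worktree add -b <branch> <path>` creates the branch and the checkout in one step.
    subprocess.run(
        ["git", "-C", str(repo_path), "worktree", "add", "-b", branch, str(target)],
        check=True,
    )
    return target

# e.g. create_agent_worktree("~/code/myapp", "agent/fix-login"),
# then point Claude Code or Codex at the returned directory.
```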
Yeah, I wanted a better terminal for operating many TUI agents at once, and none of these worked because they all want to own the agent.
I ended up building a terminal[0] with Tauri and xterm that works exactly how I want.
0 - screenshot: https://x.com/thisritchie/status/2016861571897606504?s=20
Emdash invokes CC, Codex, etc. natively, so users are getting the raw version of each agent.
They have Claude Code web in research preview
The landing page for the demo game "Voxel Velocity" mentions "<Enter> start" at the bottom, but <Enter> actually changes selection. One would think that after 7mm tokens and use of a QA agent, they would catch something like this.
It's interesting, isn't it? On the one hand the game is quite impressive. Although it doesn't have anything particularly novel (and it shouldn't, given the prompt), it still would have taken me several days working nonstop. On the other hand, there's plenty of paper cuts.
I think these subtle issues are just harder to provide a "harness" for, like a compiler or rigorous test suite that lets the LLM converge toward a good (if sometimes inelegant) solution. Probably a finer-tuned QA agent would have changed the final result.
I'm a Claude Code user primarily. The best UI based orchestrator I've used is Zenflow by Zencoder.ai -- I am in no way affiliated with them, but their UI / tool can connect to any model or service you have. They offer their own model but I've not used it.
What I like is that the sessions are highly configurable from their plan.md which translates a md document into a process. So you can tweak and add steps. This is similar to some of the other workflow tools I've seen around hooks and such -- but presented in a way that is easy for me to use. I also like that it can update the plan.md as it goes to dynamically add steps and even add "hooks" as needed based on the problem.
Always sounds so interesting, and then I do a search only to find out it's another product trying to sell you your 20th "AI credit package." I really don't see how these apps will last that long. I pay for the big three already, and no, I don't want to cancel them just so I can use your product.
Aren't there 500+ aggregator services?
This is the 5th OpenAI product called Codex if I'm counting correctly
Bit of a buried lede:
> For a limited time we're including Codex with ChatGPT Free
Is this the first free frontier coding agent? (I know there have been OSS coding agents for years, but not Codex/Claude Code.)
That depends on whether Gemini CLI counts. I've had generally bad experiences with it, but it is free for at least some usage.
Google also has aistudio.google.com, which is a Lovable competitor, and it's free for unlimited use. That seems to work much better than Gemini CLI, even on similar tasks.
No.
I am glad not to depend on AI. It would annoy me to no end how it tries to assimilate everything. It's like systemd on roids in this respect: it will swallow up more and more tasks. Granted, in a way this is saying "then it was not necessary to have these things anymore now that AI solves it all", but I am skeptical of "the promised land" here. Skynet was not trusted back in 1982 or so. I don't trust AI either.
I'm the same way but I've got the gloomy sense that folks like us are about to be swept aside by the flood if we don't "adapt."
I got invites to seven AI-centered meetings late last week.
I feel the same way about using the Internet or books to code. I'd rather just have the source code so that I'm not dependent on anything other than my own brain.
Good luck.
To me, the obvious next step for these companies is to integrate their products with web hosting. At this point, the remaining hurdle for non-developers is deploying their creations to the cloud with built-in monetization.
Just tell it to use your gcp/aws account using the cli, makes it infinitely powerful in terms of deployment. (Also, while I might miss some parts of programming that I have given to AI, I certainly don't miss working with clouds).
I dont think these are made for non-devs, Lovable and other which are built for non-devs already provide hosting.
interestingly opencode's first product was an IaC platform... seems to be where this is all going.
- looks like OpenAI's answer to Claude Code Desktop / Cowork
- workspace agent runner apps (like Conductor) get more and more obsolete
- "vibe working" is becoming a thing - people use folder based agents to do their work (not just coding)
- new workflows seem to be evolving into folder based workspaces, where agents can self-configure MCP servers and skills + memory files and instructions
Kinda interested to see if OpenAI has the ideas & shipping power to compete with Anthropic going forward; Anthropic doesn't just have an edge over OpenAI because of how OP their models are at coding, but also because they innovate on workflows and AI tooling standards; OpenAI so far has only followed in adoption (MCP, skills, now Codex desktop) but rarely pushed the SOTA themselves.
Also interesting that they are both only for macOS. I’m feeling a bit left out on the Windows and Linux side, but this seems like an ongoing trend.
My guess is that OpenAI/Anthropic employees work on macOS and mostly vibe code these new applications (be it the Atlas browser or now Codex Desktop); I wouldn't be surprised if Codex Desktop was built in a month or less.
Linux/Windows requires extra testing as well as some adjustments to the software stack (e.g. Liquid Glass only works on Mac); to get the thing out the door ASAP, they release macOS first.
Looks like they forgot the part of the code editor where you can… edit code. Claude Code in Zed is about the most optimal experience I can imagine. I want the agent on the side and a code editor in the middle.
That’s not really a negative for me as I can easily jump into vscode where I already have my workspace for coding set up exactly as I like it. This being a completely separate app just to get the agentic work right is a good direction imo
Yeah, but it's annoying to find the file the agent just edited without any IDE/editor integration; you have to add a command which opens the file in VS Code after editing.
It would be nice to have an integrated development environment.
Usage like this is becoming a rarity. Most people are editing significantly less and "agent interfaces" are slowly taking the focus.
"most" people aren't even using AI yet
Of those that are, most are not vibe coding, so an editor is still required at many points
OpenAI, ChatGPT, Codex
So many of the things they built paved the way for the truly good ones (Claude, Gemini) to evolve. I am thankful for what they have done.
But the quality is gone, and they are now in catch-up mode. This is clear, not just from the quality of GPT-5.x outputs, but from this article.
They launch something new and flashy that should get the attention of all of us. And yet, they only launch on Apple devices?
Then, there are typos in the article. Again. I can't believe they would be sloppy about this with so much on the line. EDIT: since I know someone will ask, couple of examples - "7MM Tokens", "...this prompt initial prompt..."
And why are they not giving the full prompt used for these examples? "...that we've summarized for clarity" but we want to see the actual prompt. How unclear do we need to make our prompts to get to the level that you're showing us? Slight red flag there.
Anyway, good luck to them, and I hope it improves! Happy to try it out when it does, or at the very least, when it exists for a platform I own.
Not sure when you last evaluated the tools, but I strongly prefer Codex to Claude Code and Gemini.
Codex gets complex tasks right and I don't keep hitting usage limits constantly. (This is comparing the $20 ChatGPT plan to the $200 Claude Pro Max plan, fwiw.)
The tooling around ChatGPT and Codex is thinner, but their models are far more dependable, imo, than Anthropic's at this very moment.
What about us Linux users? This is Mac only. Do they plan to support the CLI version with all the features they are adding to the desktop app?
Hi! Romain here, I work at OpenAI. The team actually built the Codex app in Electron so we can support both Windows and Linux very soon. Stay tuned!
Let me guess, you use MacOS yourself?
Not only is it Mac only, it appears to be ARM only as well. The app won't launch on my Intel Mac.
Guess MacOS gives you pass for early-access stuff, right? /s
From a developer's perspective it makes sense, though. You can test experimental stuff where configurations are almost the same in terms of OS and underlying hardware, so no weird, edge-case bugs at this stage.
This is an ode to opencode, and to how OpenAI, very strangely, is just porting the layout and features of real open source.
So much valuation, so much internal competition and shenanigans that the creatives left.
Mac only. Again.
Apple is great, but this is OpenAI devs showing their disconnect from the mainstream. It's complacent at best, contemptuous at worst.
SamA or somebody really needs to give the product managers here a kick up the arse.
Hi! Romain here, I work on Codex at OpenAI. We totally hear you. The team actually built the app in Electron specifically so we can support Windows and Linux as well. We shipped macOS first, but Windows is coming very soon. Appreciate you calling this out. Stay tuned!
Only thing i'd add re windows is it's taking us some time to get really solid sandboxing working on Windows, where there are fewer OS-level primitives for it. There's some more at https://developers.openai.com/codex/windows and we'd love help with testing and feedback to make it robust.
Curious why Electron and not native?
Wouldn’t native give better performance and more system integration?
He literally says why Electron in the comment you are replying to.
Going cross-platform doesn't sound like the main reason (or I hope not). For a company that size, is it really that hard to hire a small specialised team?! It would be a good showcase for their Codex too.
When you're a trillion dollar company that burns more coal than Bangladesh in order to harness a hyperintelligent Machine God to serve your whims, you don't have the resources to maintain native clients for three separate targets.
Electron? Why can't Codex write, or at least translate, your application to native code instead of using a multi-hundred-mb browser wrapper to display text? Is this the future of software engineering Codex is promising me?
Windows is almost ready. It's already running but we are solving a few more things before the release to make sure it works well.
If you were going to release a product for developers as soon as it was ready for developers to try, such that you could only launch on one platform and then follow up later with the rest, macOS is the obvious choice. There's nothing contemptuous about that.
> For a limited time, Codex will also be available to ChatGPT Free and Go users to help build more with agents. We’re also doubling rate limits for existing Codex users across all paid plans during this period.
Is there more information about it? For how long and what are the limits?
no
This looks interesting, and I already use Codex a fair bit in VS Code etc., but I'm having trouble leaving a 'code editor with AI' for an environment that looks like it treats the code as a hidden, secondary artefact. I guess the key thing is the multi-agent spinning-plates part.
(I work on Codex) I think for us the big unlock was GPT-5.2 and GPT-5.2-Codex, where we found ourselves needing to make many fewer manual edits.
I find that's the case too. For more complex things, my future ask would be something that formalizes verification/testing into the AI dev cycle. My confidence in not needing to see the code is directly proportional to my level of comfort with test coverage (even if it's fairly high-level UI/integration coverage rather than 1 != 0 unit stuff).
> "Localize my app and add the option to change units"
To me this still feels like the wrong way to interact with a coding agent. Does this lead people to success? I've never seen it not go off the rails in some way unless you provide clear boundaries as to what the scope of the expected change is. It's gonna write code if you don't even want it to yet, it's gonna write the test first or the logic first, whichever you don't want it to do. It'll be much too verbose or much too hacky, etc.
The better models can handle that prompt assuming there is an existing clean codebase and the scope of the task is not too large. The existing code can act as an implicit boundary.
Weaker models give you the experience you describe, and when using a 100% LLM codebase I think it can end up in a hall of mirrors.
Now I have an idea to try, have a 2nd LLM processing pass that normalizes the vibe-code to some personal style and standard to break it out of the Stack Overflow snippet maze it can get itself in.
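If anyone wants to experiment with that idea, here's a minimal sketch of such a second pass using the OpenAI Python SDK (the model name and the STYLE.md path are assumptions, not anyone's actual setup):

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

STYLE_GUIDE = Path("STYLE.md").read_text()  # hypothetical personal style/standards doc

def normalize(source: str) -> str:
    """Second pass: rewrite freshly vibe-coded source to match a personal style guide."""
    resp = client.chat.completions.create(
        model="gpt-5.2",  # assumption; use whatever model you have access to
        messages=[
            {
                "role": "system",
                "content": "Rewrite the user's code to follow this style guide "
                           "without changing its behavior:\n" + STYLE_GUIDE,
            },
            {"role": "user", "content": source},
        ],
    )
    return resp.choices[0].message.content

# e.g. Path("feature.py").write_text(normalize(Path("feature.py").read_text()))
```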
I've had no issues with prompts like that. I use Cursor with their plan mode, so I get a nice markdown file to iterate on or edit myself before it actually does anything.
100%
First phase: Plan. Mandatory to complete, as well as get AI feedback from a separate context or model. Iterate until complete.
Only then move on to the Second Phase: make edits.
Better planning == Better execution
Until a few days ago (when I switched to Codex), I would have agreed. My workflow was "thoroughly written issues" -> plan -> implement. Without the plan step, there is a high likelihood that Claude Code (either normal or with GLM-4.7) or Cursor drifts off in the wrong direction.
With Codex, I increasingly can skip the plan step, and it just toils along until it has finished the issue. It can be more "lazy" at times and ask before going ahead more often, but usually in a reasonable scope (and sometimes at points where I think other services would have gone ahead on a wrong tangent and burnt more tokens of their more limited usage).
I wouldn't be surprised if, within the next 1-2 model iterations, a plan step won't be worth the effort anymore, given a good enough initial written issue.
I still use tons of non-plan-mode edits with Cursor too. For the example prompt above, I'd plan it out first just to make sure it does it in a way I want, since I personally know there are tons of ways to implement it. But for simple changes, or when I don't want a plan on purpose, I just use a normal agent.
And then
> gh-address-comments address comments
Inspiring stuff. I would love to be the one writing GH comments here. /s
But maybe there's a complementary gh-leave-comments to have it review PRs for you too.
These paid offerings geared toward software development must be a hell of a lot "smarter" than the regular chatbots. The amount of nonsense and bad or outright wrong code Gemini and ChatGPT throw at me lately is off the charts. I feel like they are getting dumber.
I don’t understand why we are getting these software products that want to have vendor lock in when the underlying system isn’t being improved. I prefer Claude Code right now because it's a better product. Gemini just has a weird context window that poisons the rest of the generated code (when online). Comparing ChatGPT Codex vs Claude, I feel that Claude is the better product, and I don't use enough tokens to justify Claude Pro at $100, so I just have a regular ChatGPT subscription for productivity tasks.
> I don’t understand why we are getting these software products that want to have vendor lock in when the underlying system isn’t being improved.
I think it's clear now that the pace of model improvements is asymptotic (or at least it's reached a local maximum) and the model itself provides no moat. (Every few weeks last year, the perception of "the best model" changed, based on basically nothing other than random vibes and hearsay.)
As a result, the labs are starting to focus on vertical integration (that is, building up the product stack) to deepen their moat.
It’s the inconsistency that gets me. Very similar tasks, similar complexity, same code base, same prompting:
Session A knocks it out of the park. Chef’s kiss.
Session B just does some random vandalism.
I'm excited to try this out, it seems like it would solve a lot of my workflow issues. I hope there is the ability to review/edit research docs and plans it generates and not just code.
I really look forward to using this. I tried Codex first time yesterday and it was able to complete a task (i.e. drawing Penrose tilings) that Claude Code previously failed at. Also a little overwhelmed by all the new features that this app brings. I feel that I'm behind all the fancy new tools.
I've been using AI vibe coding tools since Copilot was basically spicy autocomplete, and this feels like the next obvious step: less “help me type” and more “please do this while I watch nervously.” The agent model sounds powerful, but in practice it's still a lot of supervision, retries, and quiet hope it doesn't hallucinate itself into a refactor I didn't ask for.
Genuinely curious if people would just let this rip with no obvious isolation?
I’m aware Mac OS has some isolation/sandboxes but without running codex via docker I wouldn’t be running codex.
(Appreciate there are still risks)
Shameless plug, but you can sandbox codex cli without a container using my macOS app: https://multitui.com
(I work on Codex) We have a robust sandbox for macOS and Linux. Not quite yet for Windows, but working on that! Docs: https://developers.openai.com/codex/security
I've been using Codex regularly, and it's pretty good with the model set to extra high and a pretty generous context.
From the video, I can see how this app would be useful for:
- Creating branches without having to open another terminal, or creating a new branch before the session.
- Seeing diffs in the same app.
- Working on multiple sessions at once without switching CLIs.
- I quite like "address the comments"; I can see how this would be valuable.
I will give it a try for sure
Wow, this is nearly an exact copy of Codex Monitor[1]: voice mode, project + threads/agents, git panel, PR button, terminal drawer, IDE integrations, local/worktree/cloud edits, archiving threads, etc.
[1] https://github.com/Dimillian/CodexMonitor
Codex Monitor seems like an Antigravity Agent Manager clone. It came out after, too.
A bunch of the features you listed were already in the Codex extension too. False outrage at its finest.
Antigravity is a white-labeled $2B fork of Windsurf, so it really starts there, but maybe someone knows what Windsurf derived from to keep the chain going?
I have both Codex Monitor and this new Codex app open side by side right now; aside from the theme, I struggle to tell them apart. Antigravity's Agent Manager is obviously different, but these two are twins.
I have a very hard time getting worked up over this. There are a ton of entrants in this category, they all generally look the same. Cribbing features seems par for the course.
I guess this was the next thing that was bound to happen... I tried Google's Antigravity and found it quite buggy.
May give a go at this and Claude Code desktop as well, but Cursor guys are still working the hardest to keep themselves alive.
Does anybody know when Codex is going to roll out subagent support? That has been an absolute game changer in Claude Code. It lets me run with a single session for so much longer and chip away at much more complex tasks. This was my biggest pain point when I used Codex last week.
It's already out.
Can you explain how to use it? I’ve tried asking it to do “create 3 files using multiple sub agents” and other similar wording. It never works.
Is it in the main Codex build? There doesn’t seem to be an experiment for it.
https://github.com/openai/codex/issues/2604
Is there any marked difference or benefit over Claude Code?
It’s possible to run up to 4 agents at once vs. Claude Code’s single thread. Sometimes I’ll find meaningful quality differences between what agents produce.
Interesting. Has anyone found running multiple parallel agents useful in practice?
Maybe it's because I'm not used to the flow, but I prefer to work directly on the machine where I'm logged in via ssh, instead of working "somewhere in a git tree", and then have to deploy/test/etc.
Once this app (or a similar app by Anthropic) allows me to have the same level of "orchestration" but on a remote machine, I'll test it.
I typically bounce between Claude Code and Codex for the same project, and generally enjoy using both to check each other.
One cool thing about this: upon installing, it immediately found all the previous projects I've used with Codex and has those projects in the sidebar with all of the "threads" (sessions) I've had with Codex on them!
This does look like it would simplify some aspects of using Codex on Mac. However, when I first saw the headline I thought this was going to be a phone app, and that started running a whole list of ideas through my brain... :(
But overall, looks very nice and I'm looking forward to giving it a try.
I don't know why no frontier model lab can ship a mobile app that doesn't use a cloud VM but can connect to your laptop/server and work against local files there when on the same network (e.g. on Tailscale). Or, even better, act as a remote control for a harness running on that remote device, so you could seamlessly switch between phone and laptop/server.
Bugs me they treat MacOS as first class. Do people actually develop on a Mac in 2026? Why not just start with Linux?
I mean if they were targeting "software engineers" in general then Windows would be the obvious choice in 2026 as much as in 2006. But these early releases are all about the SF bubble where Mac is very much dominant.
Can't build iOS apps on anything else sadly.
How is this better than vscode with the codex extension?
I'm managing context with codex inside VSCode using different threads. I'm trying to figure out if there are use cases where I'd rather be in this app.
Is this not just a skinned version of Goose: https://block.github.io/goose/
they are all copies of each other. Did you expect them to build something completely new? Software Development is stuck in an AI hole where we only build AI features.
In the end, this and all the other 89372304 AI projects are just OpenAI/Anthropic API wrappers, but at least this one has first-party support, which maybe gives it a slight advantage?
Having dictation and worktree support built in is nice. Currently there is a whole ecosystem of tools implementing similar functionality for Claude Code. The automations look cool too!
seems like I need to update my toolset for the 3rd time this week
Is it open source? Do they disclose which framework they use for the GUI? Is it Electron or Tauri?
lol ofc not
looks like the same framework they used to build chatgpt desktop (electron)
edit - from another comment:
> Hi! Romain here, I work on Codex at OpenAI. We totally hear you. The team actually built the app in Electron specifically so we can support Windows and Linux as well. We shipped macOS first, but Windows is coming very soon. Appreciate you calling this out. Stay tuned!
Does this support users who access Codex via Azure OpenAI API keys?
It keeps offering me to "Get Plus" even though I am signed in and already have a Plus plan.
Codex has really grown on me lately. I re-signed to try it out on a project of mine, and it turned out to be a really great addition to my toolkit.
It isn't always perfect, and its CLI (how I mostly use it) isn't as sophisticated as OpenCode, which is my default.
I am happy with this app. I am using Superset, a terminal app which, surprisingly, is well positioned to help you if you work in the CLI like I do. But like I said, the new desktop app seems like a solid addition.
Currently using opencode with Codex 5.2 and wondering why I should switch.
Built an open source lightweight version of this that works with any cli agent: https://github.com/built-by-as/FleetCode
> Work with multiple agents in parallel
But you can already do that, in the terminal. Open your favourite terminal, use splits or tmux and spin up as many claude code or codex instances as you want. In parallel. I do it constantly. For all kinds of tasks, not only coding.
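For anyone who hasn't tried it, here's a minimal sketch of that setup driven from Python (the session name and project paths are made up; it assumes tmux and the codex CLI are on your PATH):

```python
import os
import subprocess

# Hypothetical project checkouts, one agent per tmux window.
projects = [os.path.expanduser(p) for p in ("~/work/api", "~/work/webapp", "~/work/infra")]

# Start a detached session running the first agent, then add one window per extra project.
subprocess.run(
    ["tmux", "new-session", "-d", "-s", "agents", "-c", projects[0], "codex"],
    check=True,
)
for path in projects[1:]:
    subprocess.run(["tmux", "new-window", "-t", "agents", "-c", path, "codex"], check=True)

# Attach and flip between windows (Ctrl-b n / Ctrl-b p) to supervise each agent.
subprocess.run(["tmux", "attach-session", "-t", "agents"])
```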
The inclusion of a live vibe-coded game on the webpage is fun, except the game barely works, and it's odd they didn't attempt any polish/QA for what is ostensibly a PR announcement. It just adds more fuel to the argument that vibecoding results in AI slop.
To be fair, the premise is that they one-shotted it. I'd just be suspicious if it were any better (the point of the POC is that it just about works).
I agree; if it had been polished, I would not have trusted the demo at all. The fact that it shows what you can potentially expect from a one-shot is cooler.
Does the Codex app host MCP Apps?
> and we're doubling the rate limits on Plus, Pro, Business, Enterprise, and Edu plans.
I love competition
This is so garbage. OpenAI is never catching up.
I really want to like the native Mac app aesthetic but I kinda hate it. It screams minimalist but also clearly tells me it’s not meant for a power user. That ruggedness and sensitivity is missing.
No Linux support? :(
> We're also excited to show more people what's now possible with Codex . For a limited time we're including Codex with ChatGPT Free and Go, and we're doubling the rate limits on Plus, Pro, Business, Enterprise, and Edu plans.
Translated from Marketingspeak, this is presumably "we're also desperate for some people to actually use it because everyone shrugged and went back to Claude Code when we released it".
I dunno, feels like the models have different weak/strong points, sometimes I can sit with Claude Code for an hour with some issue, try it with Codex and have it solved in five minutes, and also the opposite happens. I tend to use Codex mostly when I care more about correctness and not missing anything, Claude when it's more important I do it fast and I know exactly what it needs to do, Codex seems to require less hand-holding. Of course, just anecdotal.
GPT models definitely seem stronger when they "get it" and in the types of problems they "get", while claude seems more holistic but not "as smart" as some of the spikes GPT can get.
Yeah, this is clearly just a marketing re-release, but if they've executed well I'm happy to try it.
They also claim 2x usage from December (though 2x a tiny amount is still a tiny amount)
Given the prevalence of Opencode and its ability to use any model and provider, I don't see a reason why anyone would bother with random vendors' half-assed tools.
For starters, money. There is no better value out there that I'm aware of than Claude Code Max. Claude Code also just works way better than Opencode, in my experience. Though I know there are those that have experienced the exact opposite.
Are you calling OpenAI a random vendor?
That's like calling Coca Cola a random beverage vendor