Marketing pages tell you a tool is magic. Benchmarks tell you it scores well on problems someone curated. Neither tells you what happens at hour six of a real workday when the agent is three minutes into a task you cannot interrupt, or when it quietly edits code you did not select. For that, you go to the threads where developers vent, brag, and argue with each other, and you read what they say when no vendor is listening.
So that is what we did. What follows is a cross-section of real reactions, grouped honestly: the people who say it changed their work, the people who walked away furious, the nuanced middle that is usually closest to the truth, and the specific structural gripes that keep coming up. Read them as a chorus, not a verdict. The pattern across all four buckets is the real signal.
The Raves
Developers who say it changed how they ship.
I've submitted over 60 PRs with Codex today. A mixture of Python, Typescript and C#. I haven't experienced any major issues, it has been brilliant to say the least. The only time it tripped up was on a YAML file, but it was an easy correction.
My company has had access to a preview of OpenAI Codex (and other tools). It's wild just how fast things are moving along. This morning I merged close to a dozen PR's that were completed by Codex. It's not fully "there" just yet, but with all of these tools we have stopped hiring junior SWE's entirely.
I cannot speak to Claude, but I have a top rated, multiplayer VR app that I have hardly written any of the code for, as ChatGPT does it for me. Of course I have to direct it as its logic is its weak spot right now, but it is very good at spitting out syntax that meets my needs and is performant on the mobile VR platforms.
I had the same frustration as you until I tried Codex. It's the one tool that's actually been helpful
I've got several large raises at my job by spitting out small programs that solve big problems in a very short time, using ChatGPT and doing the coding myself with ChatGPT as support. It provided me with the initial template, I broke it down to small tasks, and I solved the bugs it got stuck on.
Most months my Pro subscription more than pays for itself. I was thinking I didn't need it because I have been using Windsurf and the API a lot more, but now that Codex is out, it looks like I'm keeping it another month.
If codex-1 is really that good, I have no problem paying $200 a month for the improved productivity.
The Rage
The ones who walked away angry, or broke something first.
no idea if codex cli and the new codex are actually different in terms of quality, but based on my experience with codex cli it was horrible. Claude code outperforms it in every way, can't say yet about the new one
I'm eager to try it out and report back how it compares to Claude Code. The previous Codex product was wack.
I will also add that my curiosity, too, is present, and codex was, in fact, underwhelming.
Well. As I said. It hallucinated. I broke down the bug and still didn't fix. Maybe because I was using the free version of it? But Chatgpt fixed the bug for me..but there are coders relating that o1 is better. So I don't know about it..
I was using Copilot a while ago and for some reason it edited code that was not even selected and broke it. Since then I stopped using any AI tools integrated with editors, I just use ChatGPT or Claude or whatever, and put the code together myself.
I'm constantly arguing with ChatGPT about it lol. I'll give it code, ask it a question. It'll spit out code. Try the code, might fix one thing, and break 3 others that worked fine before its "fix". Tell it what it did, spits out another set of code. Fixes the issue it broke, and breaks something else.
It has its moments. I've been coding with ChatGPT-4 for a while and don't really see a difference between the two. Earlier today it changed a piece of working code I gave it to modify for no reason (it didn't need to be modified) which broke the app. It was a simple fix but still such a simple "mistake" makes it hard to rely on totally. I don't mind when it makes mistakes on something I'm trying to learn because it forces me to understand more. It's when I need to write some simple script for work that I could write myself but use Chat to be more efficient and it fucks up that really pisses me off.
I hit the rate limit constantly with codex cli. Before a recent update, the tool would just crash on rate limit, now it waits and takes 10 min to get an answer for a simple request. I tried loading $100 in the console, but it didn't bump me to tier 2. I guess I need to spend $50, not just load $50, for tier 2? Should I run dummy requests to burn $50 until I get to a higher tier to finally make the tool usable? :-/
You can get it all on gemini, for $20. Openai is not worth it anymore, and that's coming from a user who used it every day for a year. For 5 users on chat gpt, $150 a month. For 5 users (using google one, and adding other users as "family") on gemini? $20 a month. Also if you cancel on chatgpt (with teams), that means everything you have in there gets deleted, you can't unlink the account. It's a trap. Gemini comes with more than just "Access to models" too. Gemini comes with 2tb of cloud storage, and tons of other little features. And this? codex is pathetic. I am in the act of getting away from openai, because I have a feeling google is the only place to do programming related tasks anymore. Openai simply can't compete with their costs, because google isn't depending on ai for income. And I mean look at what openai does for teams. You get essentially nothing.
I just spent most of the day using Codex CLI with 40 mini. Didn't come close to Claude Code for frontend stuff. I do, however, expect Codex to improve rapidly where as Anthropic takes FOREVER to get stuff out the door, so at some point I don't think Claude will have a chance. I'll use it for now though.
I tried using ChatGPT to help with a new website idea. It created a site, but things did not work as I wanted, so I asked it to fix this or that, which it did, but doing so broke other things. Like if I had asked a button to do something, it would work initially, but then after asking for it to fix something completely separate, it removes the code behind the button click. For no reason. Loads of things like this with AI, it fixes one thing (or tries to) and breaks other things. It has no memory of what you had previously told it to do. Also, you ask it to do something, like add some code for when a user clicks a button. It claims it added the code, and fixed this or that, when it totally hasn't. It's just lying. As the OP said, I think AI can be used to write a specific function. But not be able to build a whole solution.
The Mixed Verdict
Useful, with a but. The most honest reviews usually are.
I've been using them both all day today. Codex is much more thoughtful about the changes it recommends but it's very slow. Every request is 3-5 minutes. So if it's a task where I know it's a very quick well defined task, I find Cursor more useful. Been turning to Codex for more thoughtful stuff/refactoring where I can't avoid a weird turn by Cursor.
Having the opposite experience. GPT-4 wrote me 1k lines of Python and, while it got many things wrong, it was still hours faster than doing it all manually. And for the cases where GPT-4 failed, Codex / Copilot saved the day. However your mileage may vary based on what your codebase is written in, the dependencies, APIs, etc. It is not taking our jobs yet.
Idk man I have zero issues getting both Claude and ChatGPT to spit out decent, functional code. You have to really be able to explain what you want in detail.
> In your experience, is it better than Copilot? In what cases, and when not? From my testing so far - Copilot is better at completing short sections of code as you type, but I think this is mostly because of latency. Copilot responses feel almost instantaneous, where Jetbrains is taking a couple of seconds. This might just be down to the servers being overloaded at launch. The quality of Jetbrains autocomplete suggestions seems very similar. Since they're using OpenAI under the hood, my guess is it's based on the same or similar codex model as Copilot and would be expected to perform similarly. Where Jetbrains AI shines is that it can work as an agent and make larger changes and refactorings spread out in different places (vs just completing one section of code)
Idk if I can say I've built anything impressive yet, just a few projects of various complexity that involve AI in some way. But I have zero coding experience and I had to learn SO MUCH. I use VS code with the Cline extension for coding. I typically use Sonnet 3.5, because I ran into so many issues with 3.7. I had to learn how to set up a database through firebase, how to handle API keys, how to prompt effectively, refactoring and modular code, fine tuning an API model, how to use the command terminal when I never used it before, the whole GitHub pages process, dev server for testing, and so much more. Anytime I ran into an error or something broke and I didn't know what it meant, I would pull up chatgpt and have it explain it to me like I'm 10, then I'll get it, fix it and move on. There's a lot more I could say but I'm still learning and figuring it all out.
ChatGPT is 2.5 years old and I use it daily to help with my code. But I think your vision is a bit short-sighted. It's like looking at a newborn crawling and concluding that it's never gonna be a runner.
I only started coding java in 2022 and python last year. I'm lazy and just chatgpt coding hahaha.
It would have to be A LOT better than Claude Code for me to let it run unattended and not expect garbage. I'm a solopreneur with 25 years of coding experience, and currently I'm spending over $1,000/mo on Claude Code. It does pretty well but I had to disable auto-accept because I have to steer or correct it too often. It's easier if I catch it early, instead of letting it go at it (which costs more money and more time to fix). As Codex seems to work unattended, unless the model is a lot better than Sonnet 3.7, I'm skeptical.
Quirks, Limits, And Red Flags
The specific gripes: speed, privacy, no interrupt, walled-off APIs.
Is there a way to opt out of them storing & training on the git codebase you upload to Codex? I couldn't find any mention of data privacy in their PR.
I found that OpenAI's codex tool, after completing a task, runs an agent to remove useless comments. They couldn't find a way to avoid it so they work around it lmao
I use both and it depends on the task. Codex doesn't allow you to interrupt once the process has kicked off, but you can fire off an infinite number of parallel agents that push PRs at the end of their run. That's extremely useful for well scoped grunt work across a full codebase (imagine Deep Research on your repo). Cline x Gemini is good when you need control over the process. Great for complex work that still requires human intervention. To date, OAI's SOTA models can't be beat on codebase understanding when you run it via their platform (either Codex cloud or Codex CLI) because their internal tool use is pretty insane. Gemini x Cline/Roo are hard to beat on day-to-day, general purpose coding where you want to be in the driver's seat and don't exactly have clear vision of "what next"
You're right. Cursor made me try codex cli, claude cli and some other alternatives. I won't return, lol
> Codex- Top of the line Context, which for me means everything. Downloads the entire codebase each time in its own VM and runs tests and just does shit different than anyone else. I've probably missed something here but is Codex an IDE? I thought it was "just" a ChatGPT-like interface where you connected your repo, could ask it to perform tasks via one or many background agents that each spin up a VM of your app etc?
My main issue is that it can't connect to any api or external database. This is very restrictive for me as most of my backend stuff involves connecting to my db, other api for development. Unless someone has a clever way for me to build in codex and then test it using some other process?
Even GPT5 and 6 will still have the limit of being an LLM. Which means it can't truly determine what is actually correct nor will it be able to implement the code it spits out and check aspects such as efficiency. That's where an AGI will come in and truly replace Devs, and at that point most jobs will be filled by an AGI anyway. Although AGIs are just a work of fiction for now and it might be something we never achieve as a species. That said ChatGPT, or a better suited implementation of Codex like GitHub CoPilot definitely helps remove monotony from writing code which I am all for. I'm finding I can focus on the details as well as the bigger picture more whilst AI does the boring stuff that I always hated doing anyway.
It's basically a codex that can read/write really fast. I use it fairly often, but the use case of writing code for me is just not that great compared to information retrieval, even though companies are pushing the former a lot harder
The Pattern
Read all 34 reactions together and a shape emerges. When Codex and ChatGPT work, people describe speed and leverage in glowing terms: 60 PRs in a day, a top-rated app they barely wrote. When they fail, it is rarely a small miss. It is working code broken for no reason, an edit to code nobody selected, a confident answer that fixed one thing and broke another. The fluency is real, and so is the cost of trusting it without checking. The middle camp, developers who call it genuinely useful but only when you already know what you are doing, describes the same truth from the calmer side.
Got a Codex or ChatGPT coding story of your own, a win or a wreck? Share it here.