
The Day The Vibe Died: Why GPT-5 Fell Short of Expectations

By: Archie Rowberry

On the 7th of August 2025, GPT-5 came out: the biggest, most expensive, most vaunted LLM ever developed. This was the model that would propel us beyond needing white-collar workers, developers, accountants, or engineers.

And yet, when it was released, it flopped: markets and media had a distinctly flat reaction to its abilities and, for the time being at least, white-collar work is safe. There are a couple of reasons worth talking about as to why GPT-5 is not the model people expected.

The most obvious is hype. The entire world of AI users was making so much noise, and so many CEOs were declaring GPT-5 a planet-killing model, that people were, rightly or wrongly, expecting a leap in performance at least as great as that between GPT-3 and GPT-4. The reality is that many competitors have reached parity in performance with OpenAI, who no longer hold a competitive edge when it comes to delivering next-generation frontier models. The ingredients for developing LLMs are now well known and, with the competitors having caught up, the going is no longer quite as good as it was for OpenAI. We’ve known this for some time, though: models couldn’t become smarter indefinitely, and we would see a plateauing of their capabilities, at least in the short to medium term. AI experts and investors had already baked this model’s slight anti-climax into their valuations of the industry.

The second, and to my mind more interesting, reason is to do with how we’ve come to use these models and what we’ve come to expect from them. People didn’t like the vibes of the new model, and their issues were varied and often conflicting: it was too verbose, it wasn’t verbose enough, it wouldn’t understand them, it would follow instructions too obligingly, it no longer felt warm and supportive. Pulling clear areas for improvement out of the feedback users presented seemed impossible, with no consistent pain points beyond the model feeling “different” and “off”. GPT-5 could and should be the most advanced and capable model on the market, enabling users to perform a variety of tasks at a near-postgraduate level, and yet people were not vibing with it.

Why was it that OpenAI’s super-users, those who should theoretically be best at maximising the performance of LLMs, had the most visceral reaction to the new model? To answer that, we need to look less at the models and more at ourselves, and at how familiar we’ve become with LLMs.

There are two brilliant things about humans:

  • We love patterns; it’s what made us so great at hunting and gathering on the plains of Africa. Predators, prey, and edible and poisonous foods were easily catalogued and identified by our superior pattern-matching brains.
  • We’re also deeply social beings who crave interaction and are phenomenally efficient at understanding the behavioural patterns in each other. This has enabled us to build bonds quickly, work together effectively, and empathise naturally.

Combined, however, these two strengths expose a flaw in how we relate to LLMs: we have imprinted personalities and behaviours onto these models. We’ve anthropomorphised not only their capabilities but their flaws.

  • The GPT-4 series of models all felt incredibly similar tonally because they were trained on very similar data, using very similar cost functions. Their alignment data was the same; they shared the same goals as one another. This family has been around since the launch of GPT-4 in March 2023, which is over two years we’ve had to become accustomed to their quirks and idiosyncrasies. We humans are no slouches and are more than capable of becoming intimately familiar with a person in that time, so why wouldn’t we become equally familiar with GPT-4?
  • GPT-5, on the other hand, had a significantly updated set of training data, was trained using a different set of evaluation criteria, and had some very consciously different decisions made about its safety and alignment goals. The result was a model that was materially different in its behaviour and in its reactions to the questions users posed. When GPT-5 was released, people reacted as any of us might when a friend we’ve known for two years becomes a completely different person: badly…

Philosopher John Searle posed a thought experiment in 1980 called the Chinese Room. It places our subject inside a room with no windows, two doors each with a letterbox, and a very large book. The book contains every Mandarin phrase a native speaker could be expected to respond to, along with the relevant response in Mandarin. In this scenario our English-speaking subject takes prompts in Mandarin through one door, dutifully copies out the matching response from the book, and returns it through the other door. The subject has absolutely no comprehension of what they’re doing, what they’re saying, or what the person sending the messages is trying to achieve. Yet to the person outside the room, the subject is fluent in Mandarin and has a unique character. That impression is identical to our subconscious impression of LLMs: even when we rationally understand that we are talking to a machine trained on the sum total of the internet’s data, we imprint a behaviour onto it, giving the model human traits that inform how we believe we should interact with it going forward.
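To make the mechanics of the room concrete, here is a minimal sketch in Python; the phrases and replies are invented purely for illustration. The person in the room is nothing more than a lookup table, producing fluent-looking Mandarin with zero comprehension of what is being said:

    # A minimal sketch of Searle's Chinese Room: the "book" is just a lookup table.
    # The entries below are invented for illustration only.
    PHRASE_BOOK = {
        "你好吗？": "我很好，谢谢。",          # "How are you?" -> "I'm fine, thanks."
        "今天天气怎么样？": "今天天气很好。",  # "How's the weather?" -> "The weather is nice."
    }

    def person_in_the_room(prompt: str) -> str:
        # Copy out the matching response with no comprehension of either side.
        return PHRASE_BOOK.get(prompt, "对不起，我不明白。")  # "Sorry, I don't understand."

    # To the person outside, these replies look fluent and even characterful.
    print(person_in_the_room("你好吗？"))
    print(person_in_the_room("今天天气怎么样？"))

The point is not the code itself but the disconnect it illustrates: fluency on the outside implies nothing about understanding on the inside, and it is that same disconnect which lets us read character into LLMs.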

What’s becoming increasingly clear as we move into the AI age is that, in order for a business to deliver value with LLMs, it needs to consider the users more than the LLMs themselves. Humans are the owners and operators of a business’s processes and will remain so for the foreseeable future. It is at the interface between AI and humans that the most friction appears: humans must trust the models to behave in a certain way, and building that trust is key to offloading tasks to these machines effectively.

Raw intelligence is now secondary to the ability of an organisation to use it effectively; the application layer is where companies can make or break the integration of AI.

At AutogenAI we’ve built systems and workflows specifically for bid-writers, leveraging our understanding of the proposal-writing process to align models’ behaviours to a single, neat persona you can rely on to accelerate your bid-writing. Our understanding of models’ strengths and weaknesses, and the resources we’ve put into profiling their capabilities, mean that we can match specific models to specific tasks within the bid-writing process. We tailor the entire AI generation pipeline to users’ expectations and preferences, making their experience as consistent and seamless as possible.

Turn AI into your most reliable bid partner—book a demo with AutogenAI.

September 09, 2025