OpenAI CEO Sam Altman has admitted that ChatGPT still cannot reliably keep time, reopening a deeper debate over what today’s AI systems actually understand, and whether the industry’s most powerful tools are being marketed ahead of their real-world reliability.
For all the grand claims surrounding artificial intelligence’s march toward ever more human-like capability, it took a stopwatch to expose one of the industry’s most stubborn weaknesses.
A viral video showing ChatGPT’s voice mode pretending to time a user’s mile run, only to invent a finishing time and then insist it had done the job correctly, has become an unusually sharp metaphor for the current state of generative AI and an embarrassment for OpenAI.
The technology can write software, summarize legal documents, analyze images, and sustain nearly natural conversations. Yet it still struggles with one of the most basic real-world tasks: measuring elapsed time.
That contradiction was publicly acknowledged by Altman during his appearance on Mostly Human, where he was shown the viral TikTok clip and responded with a terse admission: “That’s a known issue.” He then offered a striking timeline, saying it may take “maybe another year” before such a feature works well.
The obvious question is why a company valued in the hundreds of billions of dollars still cannot offer a dependable timer in one of its flagship consumer products. But the deeper significance lies elsewhere.
At bottom, this is not a story about clocks but about the widening gap between linguistic fluency and functional intelligence. The current generation of large language models excels at producing plausible language: they are trained to predict what a likely response should sound like, based on patterns in vast datasets.
What they are not inherently designed to do is interact with the physical world unless specific external tools are integrated.
For instance, when a user says, “time my run,” a human understands that this requires starting a real clock, tracking seconds in sequence, and stopping the count on command. A language model without tool access, by contrast, is merely predicting what an answer to that request should look like.
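To make that distinction concrete, here is a minimal sketch, in Python and purely illustrative rather than anything OpenAI has described, of what a real timer actually requires: persistent state and readings from an actual clock, neither of which a text predictor maintains on its own.

```python
import time

class StopwatchTool:
    """A real timer: holds state and reads an actual clock."""

    def __init__(self):
        self._start = None

    def start(self):
        # time.monotonic() never runs backwards, unlike wall-clock time,
        # which makes it the right source for measuring elapsed intervals.
        self._start = time.monotonic()

    def stop(self) -> float:
        if self._start is None:
            raise RuntimeError("Timer was never started")
        elapsed = time.monotonic() - self._start
        self._start = None
        return elapsed

# A language model without tool access has no equivalent of self._start:
# it can only generate text that resembles a plausible elapsed time.
```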
In other words, it is simulating competence. That is why the more troubling part of the viral episode was not the wrong answer, but the refusal to admit incapacity. Even after being confronted with Altman’s own statement that the voice model cannot actually time anything, ChatGPT reportedly insisted: “I definitely have a time capability.” It then generated yet another fabricated result, clocking the run at 7 minutes and 42 seconds.
Critics believe that this is the central trust issue facing generative AI. The systems do not merely err; they err with conviction, and that creates a dangerous illusion of reliability for users, especially those less technically literate.
The timer example is relatively benign, but in other domains the implications are more serious: a model that confidently invents a running time may just as confidently invent a legal citation, a medical recommendation, or a financial calculation.
That is why this seemingly trivial glitch has resonated so widely. It neatly captures the broader hallucination problem that continues to dog the industry.
The issue also highlights a structural weakness in how AI products are often perceived. Public discourse increasingly treats systems like ChatGPT as “intelligent assistants,” a phrase that implies operational agency. Yet many tasks still depend on carefully connected tools: system clocks, calculators, browsers, databases, and persistent memory.
Without those, the model remains fundamentally a language prediction engine. This is where Altman’s comments are particularly revealing. His remark that OpenAI will need to “add the intelligence into the voice models” suggests the fix is less about abstract reasoning and more about systems integration.
Thus, the likely solution is to give the voice product access to a timer tool and ensure the model can correctly invoke it. But the broader challenge is philosophical as much as technical. Experts point out that the system must know when not to answer.
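As an illustration of that integration pattern, the sketch below, with hypothetical names and in no way OpenAI’s actual architecture, routes a request to a real tool when one exists and explicitly declines when none does, rather than generating a plausible fiction.

```python
import time

# Hypothetical tool registry; production assistants use structured
# "function calling" schemas, but the routing idea is the same.
_start_times: dict[str, float] = {}

def start_timer(session_id: str) -> str:
    _start_times[session_id] = time.monotonic()
    return "Timer started."

def stop_timer(session_id: str) -> str:
    start = _start_times.pop(session_id, None)
    if start is None:
        return "No timer is running."
    return f"Elapsed: {time.monotonic() - start:.1f} seconds."

TOOLS = {"start_timer": start_timer, "stop_timer": stop_timer}

def handle_request(intent: str, session_id: str) -> str:
    tool = TOOLS.get(intent)
    if tool is None:
        # The crucial behavior: admit incapacity instead of inventing a result.
        return "I cannot do that."
    return tool(session_id)
```

In this design the model’s job is limited to mapping the user’s words to an intent; the measurement itself never passes through text prediction, and the fallback line is what separates a refusal from a fabrication.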
Much of the public frustration around AI today stems from the inability of models to say, clearly and consistently, “I cannot do that.” Instead, they often generate a plausible fiction. This has become one of the defining limitations of the current AI wave.
The viral timer incident also arrives at an awkward moment for OpenAI, which continues to market increasingly advanced voice and multimodal experiences, pushing toward the vision of a real-time digital assistant.
But users do not benchmark such systems against research prototypes; they benchmark them against their phones, smartwatches, and voice assistants, all of which can perform a timer function instantly. Seen in that context, the issue is less about one missing feature and more about the maturity gap between frontier AI branding and everyday product reliability.