Large Language Models Struggle With Reading Clocks
This article is part of an exclusive IEEE Journal Watch series in partnership with IEEE Xplore.
The rapidly developing capabilities of artificial intelligence have worried many people. But don’t worry just yet: if you can read an analog clock correctly, you’re still beating AI in this regard.
AI models capable of analyzing multiple types of media, including text, images, and video, called multimodal large language models (MLLMs), are gaining traction in applications such as sports analytics and autonomous driving. But sometimes these models fail at what seem like the simplest tasks, including accurately reading the time from an analog clock. This raises questions about exactly which aspects of image analysis these models struggle with.
For example, when it comes to reading traditional clock faces, do models have difficulty distinguishing between the short and long hands? Or do they struggle to determine the exact angle and direction of the hands relative to the numbers? Answers to these seemingly trivial questions can provide critical insight into the models' fundamental limitations.
Javier Conde, an assistant professor at the Universidad Politécnica de Madrid, and colleagues at the Politecnico di Milano and the University of Valladolid sought to investigate these limitations in a recent study. The results, published October 16 in IEEE Internet Computing, suggest that if an MLLM struggles with one aspect of image analysis, this can cause a cascading effect that degrades other aspects of its analysis.
How well can artificial intelligence tell time?
First, the research team built a large dataset of synthetic images of analog clocks that collectively displayed more than 43,000 specific times, and tested the ability of four different MLLMs to read the time in a subset of the images. All four models initially failed to tell the time accurately. The researchers were able to boost the models' performance by fine-tuning them with an additional 5,000 images from the dataset, then testing them again on images they had not seen before. However, the models' performance dropped again when they were tested on a completely new set of clock images.
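The paper's actual data pipeline isn't described here, but a minimal sketch of how such a synthetic clock image can be rendered for a known ground-truth time, assuming a plain matplotlib drawing rather than the authors' own tooling, looks something like this:

```python
# Illustrative sketch only: renders a labeled analog clock for a given time.
# This is an assumed approach, not the researchers' actual dataset generator.
import math
import matplotlib.pyplot as plt

def draw_clock(hour, minute, path="clock.png"):
    fig, ax = plt.subplots(figsize=(2, 2))
    ax.set_aspect("equal")
    ax.axis("off")

    # Clock face and hour tick marks (12 at the top, 30 degrees per hour).
    ax.add_patch(plt.Circle((0, 0), 1.0, fill=False, linewidth=2))
    for h in range(12):
        angle = math.radians(90 - h * 30)
        ax.plot([0.9 * math.cos(angle), math.cos(angle)],
                [0.9 * math.sin(angle), math.sin(angle)], color="black")

    # Hand angles: the minute hand moves 6 degrees per minute; the hour hand
    # moves 30 degrees per hour plus 0.5 degrees per elapsed minute.
    minute_angle = math.radians(90 - minute * 6)
    hour_angle = math.radians(90 - ((hour % 12) * 30 + minute * 0.5))
    ax.plot([0, 0.55 * math.cos(hour_angle)], [0, 0.55 * math.sin(hour_angle)],
            linewidth=4, color="black")   # short hour hand
    ax.plot([0, 0.85 * math.cos(minute_angle)], [0, 0.85 * math.sin(minute_angle)],
            linewidth=2, color="black")   # long minute hand

    fig.savefig(path, dpi=150)
    plt.close(fig)

# Example: render 10:08; the filename doubles as the ground-truth label.
draw_clock(10, 8, "clock_10_08.png")
```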
The results touch on a major limitation of many AI models: they are good at recognizing data they are familiar with, but often fail to recognize new scenarios they have not yet encountered in their training data. In other words, they often lack generalizability.
Conde and his colleagues wanted to dig deeper into what makes it so difficult for MLLMs to tell time. If the problem were simply the models' limited exposure to the spatial configurations of clock hands, then further fine-tuning could address the limitation: expose the model to more data, and it should get better at the task at hand.
In a series of experiments, they created new datasets of analog clocks, either distorting the shape of the clock face or changing the appearance of the hands, for example by adding arrows to their tips. "Although such variations do not pose much difficulty for humans, models often fail at this task," explains Conde, citing Salvador Dalí's famous painting of melting clocks, The Persistence of Memory. While humans can still decipher the time on distorted, melting clocks, MLLMs struggle to read similarly distorted clock faces.
The results show that MLLMs struggle to determine the spatial orientation of the clock hands, but struggle even more when the hands have an unusual appearance (for example, arrows at their tips) that the model has not been extensively exposed to. Nor were these issues independent of one another: in additional experiments, the researchers found that when an MLLM made an error in recognizing the clock hands, it in turn led to larger spatial errors.
"It appears that reading the time is not as simple a task as it may seem, as the model must identify the hands of the clock, determine their directions, and integrate these observations to infer the correct time," explains Conde, noting that models struggle to carry out these steps simultaneously.
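To see why those steps are coupled, consider the arithmetic the final "integration" step involves once the hands have been located. The sketch below assumes hand angles measured clockwise from 12 o'clock; an actual MLLM never exposes such angles explicitly, so this is only a simplified illustration of how an early recognition mistake corrupts the final reading:

```python
# Simplified sketch: turning estimated hand angles (degrees clockwise from 12)
# into a clock reading. The angles are assumed inputs for illustration.
def angles_to_time(hour_angle_deg, minute_angle_deg):
    minute = round(minute_angle_deg / 6) % 60   # minute hand: 6 degrees per minute
    hour = int(hour_angle_deg // 30) % 12       # hour hand: 30 degrees per hour
    return (hour if hour else 12), minute

# If the short hand is misidentified as the long one, both readings break:
print(angles_to_time(304.0, 48.0))   # correct assignment -> (10, 8)
print(angles_to_time(48.0, 304.0))   # hands swapped      -> (1, 51)
```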
In their study, the researchers stressed that in more complex real-world scenarios, such as medical image analysis or perception for autonomous driving, subtle failures like these can have far more serious consequences.
"These results show that we cannot take model performance for granted," says Conde, emphasizing that extensive training and testing with diverse inputs is essential to ensure models remain robust to the range of scenarios they are likely to encounter in real-world applications.
Many people expect AI to continue to improve, which in turn raises the question: Will AI models eventually be able to accurately read traditional analog clocks? Only time will tell.
This story was updated on November 8, 2025 to correct that the analog clock data the researchers used was synthetic data, not publicly available data.