ChartNet lifts smaller AI chart models

MIT and IBM researchers have released ChartNet, a large synthetic dataset designed to teach smaller vision-language models to read charts, extract their underlying data and reason about them. In reported tests, models trained on it beat far larger commercial systems on several standard chart tasks.

The dataset, available through Hugging Face, contains 1.7 million richly annotated core chart samples and a broader collection running into millions of synthetic examples. Its release is aimed at one of the persistent weaknesses in enterprise AI: the ability to interpret charts embedded in market reports, scientific papers, dashboards and policy documents without misreading numbers or inventing trends.

Chart understanding is a demanding task because models must combine visual recognition, numerical extraction and language reasoning. A line chart, bar chart or scatter plot may require the system to identify axes, labels, legends, scale changes, colour coding, underlying data tables and the relationships between variables. Errors in any one of these steps can produce misleading summaries, especially in finance, healthcare, energy and public policy, where charts often carry the central evidence behind a decision.

ChartNet was built by researchers from MIT, the MIT-IBM Computing Research Lab and IBM Research. The project’s lead author is Jovana Kondic, an MIT electrical engineering and computer science graduate student. The wider team includes Pengyuan Li, Dhiraj Joshi and Isaac Sanchez from IBM Research, along with Aude Oliva and Rogerio Feris from the MIT-IBM research network. The work is scheduled for presentation at the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

The dataset uses a code-guided synthesis pipeline rather than relying mainly on chart images gathered from the web. Starting from seed charts, the system reconstructs plotting code, then alters chart type, data values, topics, colours and visual styles to produce many variations. Each sample is aligned with rendered chart images, plotting code, a numerical table, a natural-language summary and question-answer pairs with reasoning steps.
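As a rough illustration of how code-guided synthesis can keep every annotation in step with the chart, here is a minimal Python sketch. It is not the ChartNet pipeline: the names `ChartSample` and `make_variation`, the field layout and the perturbation rule are all assumptions made for this example, and the rendered image is left out (running the generated matplotlib code would produce it).

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ChartSample:
    """Hypothetical container for one aligned sample (image omitted)."""
    plotting_code: str
    table: List[Tuple[str, float]]
    summary: str
    qa_pairs: List[Dict[str, str]] = field(default_factory=list)


def make_variation(seed_table, chart_type, topic, colour, rng):
    """Derive one variation of a seed chart (illustrative only)."""
    # Perturb the seed values so each variation carries different data.
    table = [(label, round(value * rng.uniform(0.5, 1.5), 1))
             for label, value in seed_table]
    labels = [label for label, _ in table]
    values = [value for _, value in table]

    # Swap the chart type by changing the matplotlib call in the code.
    plot_call = {"bar": "ax.bar", "line": "ax.plot"}[chart_type]
    code = (
        "import matplotlib.pyplot as plt\n"
        "fig, ax = plt.subplots()\n"
        f"{plot_call}({labels!r}, {values!r}, color={colour!r})\n"
        f"ax.set_title({topic!r})\n"
        "fig.savefig('chart.png')\n"
    )

    # Summary and question-answer pair are computed from the same table,
    # so they cannot drift away from what the chart shows.
    top_label, top_value = max(table, key=lambda row: row[1])
    summary = f"{topic}: {top_label} has the highest value ({top_value})."
    qa = [{
        "question": f"Which category is highest in '{topic}'?",
        "reasoning": f"Compare all values; {top_label} is largest at {top_value}.",
        "answer": top_label,
    }]
    return ChartSample(code, table, summary, qa)


seed = [("Q1", 10.0), ("Q2", 14.0), ("Q3", 9.0)]
sample = make_variation(seed, "bar", "Quarterly sales", "tab:blue",
                        random.Random(0))
print(sample.plotting_code)
print(sample.summary)
```

Because the code, table, summary and answer all derive from one set of values, changing the data or the chart type regenerates every annotation together, which is the property the article attributes to the dataset's aligned samples.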

That structure gives AI models more than visual examples. It allows them to connect the chart image to its underlying data and to the code that generated it, creating a stronger bridge between pixels, numbers and language. The dataset covers 24 chart types and six plotting libraries, and includes quality checks intended to remove malformed or visually inaccurate examples.
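The article does not describe how the quality checks work. Purely as an assumed example, a basic filter for malformed samples might look like the sketch below; the function name `passes_basic_checks` and the dictionary keys are hypothetical, and catching visually inaccurate charts would additionally require inspecting the rendered image, which this sketch does not attempt.

```python
import math


def passes_basic_checks(sample):
    """Reject obviously malformed samples (illustrative, not ChartNet's checks)."""
    # The plotting code must at least parse as Python.
    try:
        compile(sample["plotting_code"], "<chart>", "exec")
    except (SyntaxError, ValueError):
        return False

    # The table must be non-empty and contain only finite numbers.
    table = sample["table"]
    if not table:
        return False
    if any(not math.isfinite(value) for _, value in table):
        return False
    return True


good = {"plotting_code": "x = [1, 2, 3]\n", "table": [("A", 1.0), ("B", 2.5)]}
broken_code = {"plotting_code": "ax.bar((", "table": [("A", 1.0)]}
bad_value = {"plotting_code": "x = 1\n", "table": [("A", float("nan"))]}

kept = [s for s in (good, broken_code, bad_value) if passes_basic_checks(s)]
print(len(kept))
```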

Testing showed that fine-tuning open-source vision-language models on ChartNet improved performance across chart reconstruction, data extraction, summarisation and question-answering. Smaller systems, including models in the 2-billion to 7-billion parameter range, were able to outperform much larger off-the-shelf models on several standard chart tasks. In chart data extraction, the best ChartNet-tuned Granite Vision model scored above GPT-4o in reported evaluations, while LLaVA-7B showed large gains after training on the dataset.

The findings challenge the assumption that model size alone is the main route to better performance. For chart interpretation, the researchers argue that tightly aligned training data can matter more than simply scaling parameters. That is significant for smaller companies and research teams that cannot afford frontier commercial models or massive computing budgets.

ChartNet also includes specialised subsets beyond its synthetic core. A human-verified set contains more than 94,000 examples, including a 2,000-chart test set. A grounding and localisation subset is designed to help models identify specific visual regions when answering questions. A real-world chart collection adds 30,000 examples from published visualisations, while safety-focused content is listed as part of the roadmap.

The release comes as businesses increasingly deploy AI systems to process dense documents containing tables, diagrams and charts. In banking and asset management, such models may be asked to explain earnings trends, compare sector performance or extract figures from investor presentations. In scientific publishing, they may be used to summarise experimental results. In government and development work, they may assist with interpreting demographic, climate and fiscal data.


