AI is only as good as the data that fuels it

By George Kurian

Artificial intelligence (AI) has recently been described in many ways—revolutionary, an economic game changer, a “beast” that is either overhyped or underhyped. I like to think of AI as a new frontier in the great tradition of tools that have propelled humankind forward—the next stage of the information revolution, following the industrial and scientific revolutions that came before it. And like any significant innovation before it, AI has the potential to be a force for good or a source of chaos.

AI holds great promise for businesses: Predictive AI, powered by machine learning, is already being used to recognize patterns, drastically improve efficiency, and solve business and social problems better and faster than anything we’ve seen. It can be used to improve medical research, like predicting how proteins fold to affect biological functions. It can help detect financial fraud to protect both customers and the company’s bottom line. It can aid natural disaster planning by better predicting crises and their ripple effects. We know because we have been helping customers achieve these AI-driven objectives for many years.

And generative AI not only recognizes patterns but also generates new patterns. This capability can enable software developers to be more productive, help content creators deliver much more immersive experiences, and make it far easier for customers, employees, citizens, and students to find the information they need.

All of these possibilities depend on one thing: data. This has long been true—better datasets have allowed prior generations of AI tools to deliver better predictions, and by training on very large datasets, large language models have powered generative AI to new levels of capability. Current innovations are rapidly improving these foundation models by using customers’ private data to provide better context or to fine-tune an existing model so that it makes better decisions. The eminent computer scientist Peter Norvig summarizes it elegantly: “More data beats clever algorithms, but better data beats more data.”

Simply put, AI is built on a foundation of data—data storage, safety, and accessibility are critical to the insight and analysis AI provides. And your organization’s AI capabilities are only as good as the data that fuels them.

What this moment needs: integration, performance, and trust

Operationalizing AI requires managing multiple versions of models and keeping them up to date with the latest datasets. This means that massive amounts of data must flow freely—whether that is the enterprise’s own data or other relevant datasets that customers use to improve their AI systems. Of course, we know better than anyone that this isn’t the data equivalent of opening a spillway on a dam. Not only is the volume of data massive and unrelenting, but it is also scattered, often unstructured, and in need of protection. Complex technology and disparate organizational and data silos are major hurdles to getting AI projects into production. To capitalize on the best of AI, you need the most complete, powerful, and sustainable solutions, without the bottlenecks of traditional data silos. A modern, intelligent, integrated hybrid cloud data infrastructure is the foundation of AI.

Whether you’re a small or large business, here’s how you can optimize your data engine to take advantage of the intelligent technology revolution:

  • Ensure that your data and AI organizations are integrated. Often the biggest gap in organizational readiness for an AI-powered, data-driven future is fragmented ownership of data, siloed data platforms and infrastructure, and a disparate range of specialists who operate independently of one another. For example, many organizations have data analysts and engineers who deeply understand the data, data scientists who can apply modern analytic tools to that data, and business analysts who understand how to apply data and AI recommendations to drive business outcomes. These roles need to work closely together as one team to accelerate AI impact.
  • Assess and consolidate your unstructured data. For many years, businesses have invested in tools such as databases, data warehouses, and business intelligence software to extract value from structured data. Generative AI, however, provides a powerful engine to derive value from the largest and fastest-growing part of a company’s data: unstructured data. Text is still the leading format of most organizations’ data, and documents, audio files, and large files such as images and video make up the biggest share of it. Natural language processing (NLP) and computer vision (CV) are among the most mature AI tools, and they are seeing the fastest application in generative AI. Make sure you have an up-to-date view of your unstructured data landscape and its relevant applications so that you’re ready to use it with the generative AI applications for your business.
  • Integrate your workloads and data with intelligent hybrid multicloud infrastructure. Data volumes, types, and speeds are growing inexorably. With massive amounts of data to process, simplicity and integration go a long way. A data pipeline is effectively the architectural system for collecting, transporting, processing, transforming, storing, retrieving, and presenting data (a minimal sketch follows this list). Today’s leading-edge AI teams want to marry the scalability and unrelenting pace of innovation of the public clouds with the security and governance of on-premises environments by building hybrid cloud data pipelines.
  • Prioritize the security and governance of your data. With great power comes great responsibility. The saying might be trite, but there’s a reason for its ubiquity, and it’s particularly relevant to AI. AI has benefits from a security standpoint—it can identify cyberthreats in real time and create models for error detection—but it can also be dangerous. With AI, your private data is far more valuable, but it can also be a source of errors, bias, and other inaccuracies in your model. So it needs to be well secured and well governed.
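
To make those pipeline stages concrete, here is a minimal, illustrative sketch in Python. The stage functions (collect, transform, store) and the in-memory catalog are hypothetical placeholders chosen for illustration only; a real hybrid cloud pipeline would read from and write to actual on-premises and cloud data stores.

    # Illustrative sketch of the pipeline stages described above.
    # The stage names and the in-memory "catalog" are hypothetical placeholders,
    # not any particular product's API.
    from dataclasses import dataclass

    @dataclass
    class Record:
        source: str   # where the data was collected from
        text: str     # raw, unstructured content

    def collect(sources: list[str]) -> list[Record]:
        # Collect and transport: pull raw records from each source (files, feeds, APIs).
        return [Record(source=s, text=f"raw content from {s}") for s in sources]

    def transform(records: list[Record]) -> list[dict]:
        # Process and transform: clean and structure the data so models can use it.
        return [{"source": r.source, "tokens": r.text.lower().split()} for r in records]

    def store(rows: list[dict], catalog: dict) -> None:
        # Store and retrieve: persist transformed data where AI workloads can reach it.
        for row in rows:
            catalog.setdefault(row["source"], []).append(row)

    if __name__ == "__main__":
        catalog: dict = {}
        store(transform(collect(["on-prem-share", "cloud-bucket"])), catalog)
        print(catalog)  # present: a downstream AI application would read from here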

By optimizing your data engine, you can build a solid foundation that lets you unlock the power of AI responsibly, safely, and affordably.

The writer is the CEO of NetApp.