Wide range of problems when using data streaming to train AI systems

New research from Conduktor indicates that while data streaming is becoming a core component of artificial intelligence strategies, organisations are facing mounting challenges with data quality, scalability and infrastructure costs as they attempt to expand AI initiatives.
The study surveyed 200 senior IT and data executives at companies with annual revenues above US$50 million. It found that 43 per cent are already using streaming data to train or run AI models, alongside broader usage for workflow automation (83 per cent) and real-time decision-making (51 per cent).
Respondents said integration with AI and machine learning has become the most important capability for data-streaming systems to address, ranking ahead of support for connecting to mainstream business applications and ahead of security and governance requirements.
However, organisations reported ongoing quality issues when using streaming data in AI workflows. The most common problems include inconsistent data formats, duplicate events that skew model performance, and missing or incomplete data. As companies scale their data infrastructure to support more advanced AI workloads, the concerns broaden: 72 per cent cited data privacy and security challenges, 59 per cent pointed to rising infrastructure costs and 58 per cent reported limitations in real-time processing capacity.
Executives said they rely on a range of major platforms for AI data preparation and model lifecycle management, including Google Cloud Vertex AI, Microsoft Azure Machine Learning, Amazon SageMaker Data Wrangler and AWS Glue DataBrew.
Despite the pain points, confidence levels remain high. Eighty per cent of respondents rated their current integration between data-streaming platforms and AI/ML systems as “good”, while 9 per cent rated it “excellent”. Only 11 per cent said integration was “average”.
Conduktor chief executive Nicolas Orban said that high-quality, contextualised streaming data is essential as AI workloads become central to business operations. He noted that even with strong integration confidence, significant gaps remain: more than half of organisations still struggle with data quality, and almost three-quarters report privacy and compliance risks when scaling their data infrastructure.
“Fragmented data creates chaos, including missed signals, duplicated work, low trust and poor decisions,” Orban said. He added that centralising streaming data into a unified platform can improve visibility and control for enterprise IT teams.
According to analyst firm Dataintelo, the global market for streaming-data processing software was valued at about US$9.5 billion in 2023 and is forecast to reach US$23.8 billion by 2032, driven by demand for real-time processing across social media, IoT ecosystems and enterprise systems.