Accelerates Data Access and Production Deployments for NVIDIA-Driven Data Science
MapR Technologies has announced support within the MapR Data Platform to accelerate data access and production deployments for data science through the RAPIDS open-source software.
MapR uniquely helps data scientists accelerate access to the training data they need by easing the onboarding, cleansing, and cataloging of data and feeding it at high performance to GPUs and NVIDIA DGX systems. The MapR solution also manages the deployment of multiple models into production to speed business impact.
“The challenge for most data scientists is the data logistics to locate, prep and access the right data for training. In many cases, 90 percent of the time is spent data wrangling,” said Anil Gadre, EVP and chief product officer, MapR Technologies. “MapR complements RAPIDS with a data management and logistics fabric to accelerate the high-scale processing and access of disparate data across geographies. The same fabric also speeds the deployment of models into production and coordinates the continuous deployment and updating of multiple models to impact business in real-time at scale.”
Central to the solution is the ability to coordinate data flows from across the enterprise and, through a pre-built MapR container for GPUs, integrate them easily into NVIDIA’s end-to-end data science training pipelines. The MapR Data Platform for RAPIDS enables data scientists to:
- Collect data at scale from a variety of sources and preserve raw data so that potentially valuable features are not lost
- Make input and output data available to many independent applications even across geographically distant locations, on premises, in the cloud or at the edge
- Manage multiple models during development and easily roll into production
- Improve evaluation methods for comparing models during development and production, including use of a reference model for baseline successful performance
- Support rapid stream-based delivery of standard file formats, including Parquet, ORC, JSON, Avro, and CSV, directly into RAPIDS
“MapR’s work with NVIDIA in the RAPIDS ecosystem is helping make broad adoption in the enterprise easy for the largest breadth of workloads,” said Clément Farabet, vice president of AI infrastructure at NVIDIA. “MapR’s ability to span on-prem and cloud, from IoT edge to core with a scalable, high-performance common platform means that more data can be fed to GPUs and more innovative applications can be created by data scientists faster.”
The MapR container for GPUs, which is designed to make it easy to integrate automated data logistics into RAPIDS, is available today in NVIDIA’s repository at www.RAPIDS.ai.
A Reference Architecture (URL) providing detailed technical information on how to configure NVIDIA DGX systems with the MapR Data Platform for optimal use is also available.
MapR Technologies, provider of the industry’s leading data platform for AI and analytics, enables enterprises to inject analytics into their business processes to increase revenue, reduce costs, and mitigate risks. MapR addresses the data complexities of high-scale and mission-critical distributed processing from the cloud to the edge, IoT analytics, and container persistence. Global 2000 enterprises trust the MapR Data Platform to help them solve their most complex AI and analytics challenges. Amazon, Cisco, Google, Microsoft, SAP, and other leading businesses are all part of the MapR ecosystem. For more information, visit mapr.com.