According to the February 2020 Gartner Magic Quadrant Report, vendors in the extremely competitive and “thrillingly innovative” Data Science and Machine Learning (DSML) market have packed more AI-enhanced features into their platforms than ever before.
Of the many comparison points Gartner used to evaluate vendor offerings, one set of criteria separated the leaders from the rest: how well a given DSML product enables users who are not expert data scientists to build a machine learning model pipeline. This includes:
- AI-enhanced automation to help with data ingest, visualization, and feature discovery.
- Drag-and-drop tools to build pipelines more quickly.
- Collaboration tools that bring multi-disciplinary input to model management (MLOps).
- Multiple ways to integrate model pipelines into business processes.
DSML platforms aren’t just about getting the expert data scientist all the most powerful tools and technologies anymore. As Gartner emphasized, DSML platform vendors compete to enable “citizen data scientists” to build out the rest of the pipeline in which the expert data scientists embed the models. This lets expert data scientists focus on what they do best—a win for everybody.
This article examines the competitive features that enable DSML platforms to help non-expert users to efficiently build and maintain a model pipeline.
Ingest and Meld Large Data from Disparate Sources
Data volumes have grown so large that people need automated help to handle them. Siemens’ extensive computer systems, for example, generated six terabytes of log data every day. Hidden within that data was evidence of 60,000 cyber threats per second from viruses, malware, and other forms of cybercrime. Siemens needed to respond automatically to protect its own and its customers’ data.
A competitive DSML platform must do the following to help with big data flows and management:
- Ingest data from disparate sources, such as company data silos.
- Combine that data with external sources, such as IBM’s X-Force Exchange, which supplies threat patterns that help to analyze it.
- House the data in a cloud-based location such as Amazon’s Simple Storage Service (S3).
Siemens used Amazon SageMaker to automatically label and prepare the data. The ingest process took in the constant stream of log data to enable downstream processing in the DSML pipeline, and the cloud-based repository consolidated the data in one place that fed all the downstream processes.
“On AWS [and SageMaker], our AI-driven cyber-security platform easily exceeds the strongest published benchmarks in the world,” said Jan Pospisil, Senior Data Scientist at the Siemens Cyber Defense Center.
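The ingest-and-combine steps listed above can be sketched in a few lines of Python. This is only an illustration, not how Siemens or SageMaker does it: the log rows, the threat patterns, and the function names `ingest` and `label` are all hypothetical, and a simple substring match stands in for whatever labeling a real platform performs.

```python
import csv
import io

# Hypothetical internal log export (CSV text) standing in for a company data silo.
INTERNAL_LOGS = """timestamp,host,message
2020-02-01T10:00:00,web-01,GET /index.html 200
2020-02-01T10:00:01,web-02,POST /login payload=evil.exe
2020-02-01T10:00:02,db-01,connection from 203.0.113.7
"""

# Hypothetical external threat patterns, standing in for a third-party feed.
THREAT_PATTERNS = ["evil.exe", "203.0.113.7"]

def ingest(csv_text):
    """Parse raw CSV log text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def label(records, patterns):
    """Flag each record whose message contains any known threat pattern."""
    labeled = []
    for record in records:
        hit = any(pattern in record["message"] for pattern in patterns)
        labeled.append({**record, "threat": hit})
    return labeled

records = label(ingest(INTERNAL_LOGS), THREAT_PATTERNS)
flagged = [r["host"] for r in records if r["threat"]]
print(flagged)
```

In a real pipeline the labeled records would then be written to a shared store, such as an S3 bucket, so that every downstream step reads from one place.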
Data ingest is just the first step in the pipeline. DSML platforms must also assist “citizen data scientists” to analyze and understand the data, a necessary step before applying models.
ML-Augmented Data Visualization and Feature Discovery
The Gartner MQ report made a clear distinction between business intelligence/analytics platforms and DSML platforms. The report emphasized that ML-aided data discovery is one of the key features that a vendor’s offering must have to be competitive and reach the coveted Leader quadrant.
In a DZone article on data visualization, Shelby Blitz explained how ML helps data scientists:
“One of the biggest benefits is that machine learning algorithms expedite the data discovery process… they automatically improve their analysis as they scan information.”
That’s exactly what General Electric Power (GEP) did with its Enterprise Resource Planning (ERP) data. An AI-enhanced data science tool provided software dashboards with visualizations that range from big-picture to detailed views at different levels of abstraction. The visualization capability helped to identify patterns in the data more quickly than conventional coding techniques. GEP fed the automated analysis into its proprietary analysis methods, which were already tailored to its business.
Data visualization shows the patterns, but data scientists need to refine those patterns into a set of data features. Competitive DSML platforms offer a toolkit of ML-enhanced techniques to find features that already exist in the data, or to process the data further to create them.
The set of discovered data features helps the data science team to decide which predictive models to use. This leads to the next step in the pipeline where the model process gets built.
Drag-and-Drop Tools to Build Common Models
Expert data scientists know how to write code for modeling projects. However, most model applications can use off-the-shelf algorithms. Gartner praised vendors that provided an intuitive user experience (UX) for building the model that consumes the pipeline’s data and its features.
“Automated machine learning improves the productivity of data scientists by handling much of the repetitive tasks associated with model development,” said Susan Kahler, Global Product Marketing Manager for AI at SAS Institute Inc., a seven-year Leader in the Gartner DSML Report.
SAS, like the other five Leaders in the 2020 report, was highly rated for offering drag-and-drop user interfaces with ML-enhanced features that connect such model building blocks as:
- Data inputs
- Data transformations
- Model algorithms
- Data sample steps
- Model training steps
- Data outputs
Once the data scientist finishes the model diagram, platform features allow the model to be trained and tested. The model builder UX enables citizen data scientists to quickly assemble a set of competing models. This provides grist for the model evaluation process that comes next.
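To make the building blocks concrete, here is a minimal Python sketch of what a finished model diagram might correspond to in code. It is not any vendor's generated output: the toy data, the threshold "model," and every function name are assumptions chosen for illustration, with one function per block type.

```python
import random
from statistics import mean

def data_input():
    # Hypothetical toy data: label is 1 when the raw feature exceeds 5.
    return [(x, int(x > 5)) for x in range(11)]

def transform(rows):
    # Data transformation: scale the feature into the 0..1 range.
    return [(x / 10, y) for x, y in rows]

def sample(rows, seed=0, train_share=0.7):
    # Data sample step: shuffle, then split into training and test rows.
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]

def train(train_rows):
    # Model algorithm and training: a threshold halfway between the class means.
    positives = [x for x, y in train_rows if y == 1]
    negatives = [x for x, y in train_rows if y == 0]
    threshold = (mean(positives) + mean(negatives)) / 2
    return lambda x: int(x > threshold)

def output(model, test_rows):
    # Data output: share of held-out rows the model labels correctly.
    return sum(model(x) == y for x, y in test_rows) / len(test_rows)

# Wire the blocks together in the order the diagram would connect them.
train_rows, test_rows = sample(transform(data_input()))
model = train(train_rows)
accuracy = output(model, test_rows)
print(f"held-out accuracy: {accuracy:.2f}")
```

Swapping in a different `train` function while leaving the other blocks alone is the code equivalent of dragging a new algorithm onto the canvas, which is how a set of competing models gets assembled quickly.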
Collaboration Environment to Build, Assess, and Replace Models
Sound model management requires the collaboration of many disciplines, among them:
- Expert data scientists
- Citizen data scientists
- Project managers
- Business stakeholders
- Government regulation compliance experts
Gartner called out vendors that provided an integrated UX that allowed all of these roles to collaborate smoothly. Collaboration within the flow of model development and evaluation improves business outcomes and empowers valuable exchanges within data science teams. It also allows business managers and stakeholders to follow results and progress.
Equally important, Gartner gave high marks to vendors who supported performance monitoring of deployed models, and group decision-making on when to replace a production model with the best challenger from among the models still in development. Collaboration in the overall MLOps process separated the Leaders from the rest of the MQ field.
Multiple Ways to Build MLOps into Business Processes
The 2020 Gartner report required platforms to support model operationalization (or MLOps). MLOps is a term that encompasses:
- Putting a model into production
- Monitoring its performance as data flows through it
- Building new models to challenge the production (or champion) model
- Replacing the production model with a new champion when its performance degrades
The MLOps development cycle is a never-ending process. All DSML platforms need to support it as basic functionality.
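The champion/challenger part of that cycle can be sketched as follows. This is a simplified illustration under stated assumptions: the models are plain threshold functions, the "recent data" is made up, and `pick_champion` with its `min_gain` margin is a hypothetical helper rather than any platform's API.

```python
def accuracy(model, rows):
    """Share of (feature, label) rows the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def pick_champion(champion, challengers, recent_rows, min_gain=0.02):
    """Keep the current champion unless a challenger beats it by at least min_gain.

    champion is a (name, model) pair; challengers is a list of such pairs.
    """
    best_name, best_model = champion
    best_score = accuracy(best_model, recent_rows)
    for name, model in challengers:
        score = accuracy(model, recent_rows)
        if score >= best_score + min_gain:
            best_name, best_model, best_score = name, model, score
    return best_name, best_model, best_score

# Hypothetical recent production data: the true cut-off has drifted from 5 to 7.
recent_rows = [(x, int(x > 7)) for x in range(10)]

champion = ("threshold_5", lambda x: int(x > 5))
challengers = [
    ("threshold_7", lambda x: int(x > 7)),
    ("threshold_3", lambda x: int(x > 3)),
]

name, model, score = pick_champion(champion, challengers, recent_rows)
print(name, score)
```

The `min_gain` margin is a design choice: it avoids swapping the production model over a difference that could be noise. In practice that decision is usually made by the team rather than by a single threshold.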
However, the capability that Gartner looked for in its 2020 report was flexibility in how the DSML platform supported deploying the models—the more targets the better.
Model generation output is usually code that’s much smaller than the model environment itself. Ways to deploy this code include:
- Desktop deployment: For local/development execution
- Server deployment: For behind-the-firewall execution
- Cloud deployment: To make the recommendations available as a service
- On-premises deployment: To integrate with customer systems
- Containerized deployment: To run as a Docker container with an API that allows interaction
- Library deployment: To bundle the code into a library that can be integrated into an application
Deployment to a variety of business environments has historically been the hardest part of MLOps. Once a model is deployed, the DSML platform must also be able to measure its performance to track when it starts to lose effectiveness. That’s when the next champion model needs to be deployed.
The Gartner report praised vendors that offered robust deployment and monitoring functionality.
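One simple way to picture performance tracking for a deployed model is a rolling window over recent predictions. The sketch below is an assumption-laden illustration, not a description of any platform's monitoring feature: the `RollingMonitor` class, the window size, and the alert threshold are all hypothetical.

```python
from collections import deque

class RollingMonitor:
    """Track accuracy over the most recent predictions of a deployed model."""

    def __init__(self, window=5, alert_below=0.7):
        self.outcomes = deque(maxlen=window)  # True for each correct prediction
        self.alert_below = alert_below

    def record(self, prediction, actual):
        self.outcomes.append(prediction == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self):
        # Only raise the flag once the window is full, to avoid early false alarms.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.alert_below

monitor = RollingMonitor(window=5, alert_below=0.7)

# Hypothetical stream of (prediction, actual) pairs: five hits, then two misses.
stream = [(1, 1)] * 5 + [(1, 0)] * 2
for prediction, actual in stream:
    monitor.record(prediction, actual)

print(monitor.accuracy(), monitor.degraded())
```

When `degraded()` turns true, that is the signal to compare the production model against its challengers.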
The 2020 Magic Quadrant for DSML showed nearly all of the vendors to the right of the Visionary line and clustered near or fully within the Leader quadrant. That was stiff competition. To distinguish between competitors, the report called out the most competitive features that supported more participation from roles other than the expert data scientist through ML-enhanced features, ease of use, and model deployment options.
However, within the competition, the DSML market is “simultaneously more vibrant and messier than ever. Vendors [built] rapidly evolving solutions with numerous open-source components.”
That means vendor offerings have significant differences between them. Gartner suggests the following procedure when a company wants to get further into this area:
- Start with open source platforms to gain expertise and experience.
- Then look at all the vendors, whether small, medium, or large.
- Pick the vendor that best fits what you need now.
This could mean picking a Niche Player or Visionary because their focused offering matches your requirements well. Established Leaders typically charge premium prices for their platforms.
The good news is that you have a lot to choose from, with more vendors competing to enter this report in years to come. The fact that vendors are trying to get more personnel engaged in using the platform means that building a model pipeline will continue to get faster and more comprehensive, making it easier to get models deployed where businesses need them most.