Tuesday, July 14, 2009

Data Mining for MySQL: Scoring your MySQL data just became a lot easier!

Many databases currently allow for data mining and analysis. SQL Server, for example, benefits from SQL Server Integration Services (SSIS), and Oracle from Oracle Data Miner. MySQL users, on the other hand, have generally used tools such as R and SPSS for data mining and to build statistical models. There is even an R package, RMySQL, that provides an interface between R and MySQL. Both R and SPSS (as well as a host of other statistical tools) can export PMML (Predictive Model Markup Language), the standard way to represent data mining models.
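To make "representing a model in PMML" concrete, here is a minimal sketch of what a scoring engine does with an exported model: parse the PMML and evaluate it for one record. The model below is a hypothetical, hand-written PMML 4.0 linear regression (fields `age` and `income`, coefficients made up for illustration), and the evaluator handles only `NumericPredictor` terms. It is not how ADAPA or any other product is implemented.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML 4.0 regression model: score = 1.5 + 2.0*age + 0.5*income
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_0" version="4.0">
  <Header description="Hypothetical example model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="2.0"/>
      <NumericPredictor name="income" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_0"}


def score_regression(pmml_text, record):
    """Evaluate the first RegressionTable of a PMML RegressionModel.

    Only NumericPredictor terms (coefficient * value ** exponent) are handled;
    categorical predictors, normalization methods and missing values are not.
    """
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    result = float(table.get("intercept", "0"))
    for pred in table.findall("p:NumericPredictor", NS):
        value = float(record[pred.get("name")])
        exponent = int(pred.get("exponent", "1"))  # PMML default exponent is 1
        result += float(pred.get("coefficient")) * value ** exponent
    return result


print(score_regression(PMML_DOC, {"age": 40, "income": 10}))  # 86.5
```

Because the model is plain XML, any tool that exports PMML and any engine that reads it can exchange the same model without custom glue code.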

We have recently shown that predictive models can be deployed from SQL Server to the Amazon Cloud in a matter of minutes, using a script task in SSIS and the ADAPA Scoring Engine (see our earlier SSIS/ADAPA posting). This time, we would like to make a similar case for MySQL.

Keep in mind that building a model is a very different task from deploying or executing it. The model development phase usually consists mostly of data analysis, data massaging, and feature selection. During model execution, all you need are the most important data fields (a much smaller subset than the one you used during model development) to generate your decisions. In addition, the required pre-processing can itself be represented in PMML.
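As an illustration of pre-processing that PMML can carry along with a model, the sketch below mimics PMML's `NormContinuous` transformation: a piecewise-linear mapping defined by `(orig, norm)` points, with an `outliers` setting for values outside the defined range. This is a simplified re-implementation for explanation only; the function name and argument layout are our own assumptions, not part of any PMML library.

```python
from bisect import bisect_right


def norm_continuous(x, points, outliers="asIs"):
    """Piecewise-linear normalization in the style of PMML's NormContinuous.

    points:   list of (orig, norm) pairs sorted by orig (at least two).
    outliers: "asIs"            -> extrapolate along the first/last segment
              "asExtremeValues" -> clamp to the first/last norm value
              "asMissingValues" -> return None for out-of-range input
    """
    origs = [o for o, _ in points]
    if x < origs[0] or x > origs[-1]:
        if outliers == "asExtremeValues":
            return points[0][1] if x < origs[0] else points[-1][1]
        if outliers == "asMissingValues":
            return None
    # Segment containing x, or the nearest end segment when extrapolating.
    i = bisect_right(origs, x) - 1
    i = max(0, min(i, len(points) - 2))
    (o1, n1), (o2, n2) = points[i], points[i + 1]
    return n1 + (x - o1) / (o2 - o1) * (n2 - n1)


age_points = [(0, 0.0), (100, 1.0)]  # hypothetical: map age 0..100 onto 0..1
print(norm_continuous(50, age_points))                               # 0.5
print(norm_continuous(150, age_points))                              # 1.5
print(norm_continuous(150, age_points, outliers="asExtremeValues"))  # 1.0
```

Keeping such transformations inside the PMML file means the scoring side only needs the raw input fields; it does not have to re-implement the modeler's data preparation.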

Model Deployment: Once a model exists, it can easily be uploaded into ADAPA, which makes it available for execution right away via Web Services.

Model Execution: The task then is to extract data from your MySQL database, score it, and write the scored data back into the database. You can easily do that with yet another open source tool: Jitterbit. It maps data from MySQL into a Web Service call to ADAPA, which returns the scored data to Jitterbit for writing back into MySQL.
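The extract, score, and write-back cycle that Jitterbit performs can be sketched in a few lines of Python. This sketch makes two loud assumptions: Python's built-in `sqlite3` stands in for MySQL so the example runs anywhere, and `score_record` is a hypothetical local function standing in for the ADAPA web service call. The table name `customers` and its columns are likewise made up for illustration.

```python
import sqlite3


def score_record(age, income):
    """Hypothetical stand-in for the ADAPA web-service call.

    In the setup described above, Jitterbit sends these fields to ADAPA and
    receives the model output; here a fixed linear formula is applied instead.
    """
    return 1.5 + 2.0 * age + 0.5 * income


def score_table(conn):
    """Extract unscored rows, score them, and write the results back."""
    rows = conn.execute(
        "SELECT id, age, income FROM customers WHERE score IS NULL"
    ).fetchall()
    updates = [(score_record(age, income), row_id) for row_id, age, income in rows]
    conn.executemany("UPDATE customers SET score = ? WHERE id = ?", updates)
    conn.commit()
    return len(updates)


# Demo with an in-memory database (schema and data are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, age REAL, income REAL, score REAL)"
)
conn.executemany("INSERT INTO customers (age, income) VALUES (?, ?)", [(40, 10), (25, 4)])
print(score_table(conn))  # 2
print(conn.execute("SELECT id, score FROM customers ORDER BY id").fetchall())
# [(1, 86.5), (2, 53.5)]
```

Selecting only rows `WHERE score IS NULL` makes the job safe to re-run: a second pass finds nothing left to score. Against a real MySQL server you would swap in a MySQL driver, whose placeholder style is typically `%s` rather than `?`.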

Process in Detail - Blog: We have described this process step by step in a separate blog post.

Process in Detail - Video: We have also made a video describing this process. A YouTube version of the video is available, but we highly recommend the high-definition version.


Wednesday, July 8, 2009

KDD 2009 Panel Report: Open Standards and Cloud Computing

Leading Experts Debate Emerging Trends for Predictive Analytics and Data Mining.

At KDD 2009 in Paris, the leading conference on Knowledge Discovery and Data Mining, a panel of experts discussed various topics related to open standards and cloud computing, with a particular focus on the practical use of statistical algorithms, the reliable production deployment of models, and the integration of predictive analytics into other systems.

Moderated by Zementis, the panel was composed of a distinguished group of thought leaders representing key software vendors in the data mining industry, including DMG / Open Data Group, IBM, KNIME, KXEN, MicroStrategy, Pervasive, SAS and SPSS.

The first major focus of the discussion was the Predictive Model Markup Language (PMML). All vendors on the panel strongly support PMML, the de facto standard for model exchange, and will continue to actively improve its features and usability through their products. To improve compatibility among vendors, the DMG and Zementis now offer a comprehensive PMML converter to check, validate, and convert PMML models. The panel also coincided with the general release announcement of PMML 4.0, the latest version of the standard.

Turning to the emerging trend of Cloud Computing, the panel made clear that all vendors are actively investigating how to leverage the cloud most effectively for predictive analytics and data mining. Several vendors already provide cloud-based solutions, either on a public cloud infrastructure like Amazon EC2 or in their own data centers.

PMML and Cloud Computing are a reality and available today! There was no doubt that PMML as a standard has been accepted and has evolved into a valuable foundation for the predictive analytics industry. Cloud Computing will deliver additional benefits for various data mining solutions, either through a private or a public cloud infrastructure depending on the nature of the application.

For a more detailed summary of the panel, please see the KDD 2009 Panel Report, which summarizes the questions and answers from the discussion.

Copyright © 2009-2014 Zementis Incorporated. All rights reserved.
