Data mining scientists work hard to analyze historical data and to build the best predictive solutions out of it. IT engineers, on the other hand, are usually responsible for bringing these solutions to life by recoding them into a format suitable for operational deployment. Because data mining scientists and engineers tend to inhabit different information worlds, moving a predictive solution from the scientist's desktop to the operational environment can take months, with details lost in translation along the way. The advent of data-mining-specific open standards such as the Predictive Model Markup Language (PMML) has turned this situation upside down: models can now be deployed by the same team that builds them, in a matter of minutes.
In this talk to the ACM Data Mining Group, given at the LinkedIn auditorium in Sunnyvale, Dr. Alex Guazzelli not only provides the business rationale behind PMML, but also describes its main components. Besides describing the most common modeling techniques, PMML has, as of version 4.0 (released in 2009), been capable of handling complex pre-processing tasks. As of version 4.1, released in December 2011, PMML has also incorporated complex post-processing into its structure, as well as the ability to represent model ensembles, segmentation, chaining, and composition within a single language element. This combined representational power, in which an entire predictive solution (from pre-processing to model(s) to post-processing) can be represented in a single PMML file, attests to the language's refinement and maturity.
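To make the "single PMML file" idea concrete, the sketch below embeds a small hand-written PMML 4.1 document and lists its pre-processing, model, and post-processing parts using only Python's standard library. It is an illustration under stated assumptions, not material from the talk: the field names (`x`, `x_scaled`, `y_doubled`), the coefficients, and the helper `summarize` are all hypothetical, and the element names and namespace follow the PMML 4.1 schema as best recalled, so they should be checked against the official specification before being relied on.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for PMML 4.1 documents.
PMML_NS = "http://www.dmg.org/PMML-4_1"

# Hand-written, illustrative PMML: one derived field (pre-processing),
# one regression model, and one transformed output (post-processing).
EXAMPLE_PMML = """
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Illustrative example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="x_scaled" optype="continuous" dataType="double">
      <NormContinuous field="x">
        <LinearNorm orig="0" norm="0"/>
        <LinearNorm orig="100" norm="1"/>
      </NormContinuous>
    </DerivedField>
  </TransformationDictionary>
  <RegressionModel modelName="demo" functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <Output>
      <OutputField name="predicted_y" feature="predictedValue"/>
      <OutputField name="y_doubled" feature="transformedValue"
                   optype="continuous" dataType="double">
        <Apply function="*">
          <FieldRef field="predicted_y"/>
          <Constant>2</Constant>
        </Apply>
      </OutputField>
    </Output>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x_scaled" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def summarize(pmml_text):
    """Return the names of the pre-processing, model, and post-processing parts."""
    root = ET.fromstring(pmml_text)
    ns = {"p": PMML_NS}

    # Pre-processing: derived fields declared in the TransformationDictionary.
    derived = [
        field.get("name")
        for field in root.findall("p:TransformationDictionary/p:DerivedField", ns)
    ]

    # Models: top-level elements whose tag ends in "Model".
    models = [
        child.tag.split("}")[1] for child in root if child.tag.endswith("Model")
    ]

    # Post-processing: output fields computed from other outputs.
    transformed = [
        out.get("name")
        for out in root.findall(".//p:Output/p:OutputField", ns)
        if out.get("feature") == "transformedValue"
    ]

    return {
        "pre_processing": derived,
        "models": models,
        "post_processing": transformed,
    }


summary = summarize(EXAMPLE_PMML)
print(summary)
```

The script only inspects the document's structure; it does not score any data. Actually executing the model would require a PMML-compliant scoring engine, which is outside the scope of this sketch.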