Monday, March 10, 2014

The Data Mining Group releases PMML version 4.2 - PRESS RELEASE

Chicago, IL 2/25/2014 – The Data Mining Group announced today the release of PMML v 4.2. PMML is an application and system independent XML interchange format for statistical and data mining models. The goal of the PMML standard is to encapsulate a model independent of applications or systems in such a way that two different applications (the PMML Producer and Consumer) can use it.

“As a standard, PMML provides the glue to unify data science and operational IT. With one common process and standard, PMML is the missing piece for Big Data initiatives to enable rapid deployment of data mining models. Broad vendor support and rapid customer adoption demonstrates that PMML delivers on its promise to reduce cost, complexity and risk of predictive analytics,” says Alex Guazzelli, Vice President of Analytics, Zementis. “You can not build and deploy predictive models over big data without using multiple models and no one should build multiple models without PMML,” says Bob Grossman, Founder and Partner at Open Data Group.

Some of the elements that are new to PMML v4.2 include:

  • Improved support for post-processing, model types, and model elements
  • A completely new element for text mining
  • Scorecards now introduce the ability to compute points based on expressions
  • New built-in functions, including “matches” and “replace” for the use of regular expressions 

“With PMML our customers and partners are able to drive real value from their predictive models right away, using the open standard,” said Andrew Flint, Senior Director of Product Management at FICO (NYSE: FICO). “Models built in most commercial or open source data mining tools, such as FICO Model Builder or R, can be instantly deployed in the FICO Analytic Cloud using PMML. The net result is quicker time to innovation and value on analytic applications, and the ability to combine the power of standards-based predictive analytics with the scalability of cloud computing.”

Radhika Kulkarni, Vice President of Advanced Analytics at SAS notes, “SAS continues to support the analytic collaboration that PMML provides to users. The recent release of SAS Enterprise Miner 13.1 provides users the ability to not only consume PMML from Open Source R models, but also produce PMML, which can be consumed by other applications. SAS Model Manager enables users to consume and manage R and PMML models as part of the SAS ecosystem. Sharing analytic models is paramount to the analytic lifecycle.”

“We are extremely excited to continue our long-running PMML support to PMML 4.2,” said Scott Cappiello, Vice President, Program Management, MicroStrategy Incorporated. “As a firm believer in providing the maximum analytic flexibility to organizations, PMML provides significant advantage in folding in analytics beyond our native and open source R capabilities, to provide business users the full range of business analytics in the big data age.”

"We are happy to see PMML's impact continuing to grow and will keep being among the first to integrate new PMML features into KNIME. Starting with our summer release KNIME will also support Naive Bayes Models and we will keep adding to its PMML Preprocessing abilities as well," says Kilian Thiel of KNIME.

About PMML:
PMML is the leading standard for statistical and data mining models and supported by over 20 vendors and organizations. With PMML, it is straightforward to develop a model on one system using one application and deploy the model on another system using another application.

About DMG:
The Data Mining Group (DMG) is an independent, vendor led consortium that develops data mining standards, such as the Predictive Model Markup Language (PMML). DMG members include: IBM, MicroStrategy, SAS, Experian, Pervasive Software, Zementis, Equifax, FICO, KNIME, NASA, Open Data Group, Rapid-I, Togaware, and Visa.

For more information about the Data Mining Group and the PMML standard, go to: www.dmg.org

PMML 4.2 is here! What is new? What changed?

PMML 4.2 is out! That's really great. The DMG (Data Mining Group) has been working on this new version of PMML for over two years now. And, I can truly say, it is the best PMML ever! If you haven't seen the press release for the new version, please see the posting below:

http://www.kdnuggets.com/2014/02/data-mining-group-pmml-v42-predictive-modeling-standard.html

What changed?


PMML is a very mature language. And so, there aren't really any dramatic changes in the language at this point. One noteworthy change is that older versions of PMML called the target field of a predictive model "predicted". This was confusing, since a predicted field is usually the result of scoring or executing a model: the score, so to speak. Well, PMML 4.2 clears things up a bit. The target field is now simply "target". A small change, but a huge step towards making it clear that the Output element is where the predicted outputs should be defined.
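
To make the renaming concrete, here is a minimal Python sketch that rewrites the usage type of mining fields in a small, made-up MiningSchema fragment. The field names are hypothetical and the fragment omits the PMML namespace for brevity; treat it as an illustration, not a complete migration tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the older style: the target field is marked "predicted".
OLD_SCHEMA = """
<MiningSchema>
  <MiningField name="age" usageType="active"/>
  <MiningField name="income" usageType="active"/>
  <MiningField name="churn" usageType="predicted"/>
</MiningSchema>
"""

def upgrade_usage_type(xml_text):
    """Return the fragment with usageType="predicted" rewritten to "target"."""
    root = ET.fromstring(xml_text)
    for field in root.iter("MiningField"):
        if field.get("usageType") == "predicted":
            field.set("usageType", "target")
    return ET.tostring(root, encoding="unicode")

upgraded = upgrade_usage_type(OLD_SCHEMA)

# Collect the names of all fields now marked as targets.
targets = [f.get("name") for f in ET.fromstring(upgraded).iter("MiningField")
           if f.get("usageType") == "target"]
print(targets)
```

A real PMML file declares an XML namespace on its root element, so the tag lookups above would need to be namespace-qualified.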

Continuous Inputs for Naive Bayes Models


This is a great new enhancement to the NaiveBayes model element. We wrote an entire paper about this new feature and presented it at the KDD 2013 PMML Workshop. If you use Naive Bayes models, you should definitely take a look at our article.


And, now you can benefit from actually having our proposed changes in PMML itself! This is really remarkable and we are all already benefiting from it. The Zementis Py2PMML (Python to PMML) Converter uses the proposed changes to convert Gaussian Naive Bayes models from scikit-learn to PMML.
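
For readers new to the idea, the sketch below shows how a continuous input can enter a Naive Bayes score, assuming one Gaussian distribution per class (as in scikit-learn's Gaussian Naive Bayes). The class names, means, variances and priors are invented for illustration, and the code does not reproduce how PMML stores or evaluates these statistics.

```python
import math

def gaussian_likelihood(x, mean, variance):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

# Hypothetical per-class statistics for one continuous input, plus class priors.
STATS = {
    "yes": {"mean": 30.0, "variance": 25.0, "prior": 0.4},
    "no": {"mean": 50.0, "variance": 100.0, "prior": 0.6},
}

def posterior(x):
    """Normalized class probabilities for a single continuous input value."""
    scores = {c: s["prior"] * gaussian_likelihood(x, s["mean"], s["variance"])
              for c, s in STATS.items()}
    total = sum(scores.values())
    return {c: v / total for c, v in scores.items()}

probs = posterior(32.0)
print(max(probs, key=probs.get))
```

With several inputs, the per-input likelihoods are multiplied together under the usual Naive Bayes independence assumption.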


Complex Point Allocation for Scorecards


The Scorecard model element was introduced to PMML in version 4.1. It was a good element then, but it is really great now in PMML 4.2. We added a way to compute complex values for the allocation of points for an attribute (under a certain characteristic) through the use of expressions. That means you can use input or derived values to derive the actual value for the points. Very cool!
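
The idea can be sketched in a few lines of Python. Below, each attribute of a characteristic carries either a constant number of points or a function of the input record, which stands in for a PMML expression. All names, thresholds and coefficients are hypothetical, and returning zero when nothing matches is an assumption of this sketch.

```python
# Each attribute is a (predicate, points) pair. The points entry is either a
# constant or a function of the input record (the "expression" case).
CHARACTERISTIC_INCOME = [
    (lambda r: r["income"] < 1000, 3.0),
    (lambda r: r["income"] >= 1000, lambda r: 0.03 * r["income"] + 11.0),
]

def partial_score(characteristic, record):
    """Return the points of the first attribute whose predicate matches."""
    for predicate, points in characteristic:
        if predicate(record):
            return points(record) if callable(points) else points
    return 0.0  # assumption: no matching attribute contributes nothing

print(partial_score(CHARACTERISTIC_INCOME, {"income": 500}))
print(partial_score(CHARACTERISTIC_INCOME, {"income": 2000}))
```

The first record gets the fixed points of its attribute, while the second gets points computed from the income value itself.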

Andy Flint (FICO) and I wrote a paper about the Scorecard element for the KDD 2011 PMML Workshop. So, if you haven't seen it yet, it will get you started on using PMML to represent scorecards and reason codes.


Revised Output Element


The Output element was completely revised. It is much simpler to use. With PMML 4.2, you have direct access to all the model outputs and all post-processing directly from the attribute "feature".

The attribute segmentId also allows users to output particular fields from segments in a multiple model scenario. 

The newly revised output element spells flexibility. It allows you to get what you need out of your predictive solutions.
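
As a rough illustration of the "feature" and segmentId attributes, the Python sketch below reads a small, invented Output fragment and lists what each output field asks for. The field names and the segment id are hypothetical, and the fragment omits the PMML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical Output fragment; field names and the segment id are made up.
OUTPUT = """
<Output>
  <OutputField name="score" feature="predictedValue"/>
  <OutputField name="p_churn" feature="probability" value="yes"/>
  <OutputField name="score_seg2" feature="predictedValue" segmentId="2"/>
</Output>
"""

def describe_outputs(xml_text):
    """Map each output field name to its (feature, segmentId) pair."""
    root = ET.fromstring(xml_text)
    return {f.get("name"): (f.get("feature"), f.get("segmentId"))
            for f in root.iter("OutputField")}

outputs = describe_outputs(OUTPUT)
for name, (feature, segment) in outputs.items():
    print(name, feature, segment)
```

Fields without a segmentId report None here, meaning they refer to the output of the model as a whole rather than to one segment.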

For a complete list of all the changes in PMML 4.2 (small and large), see:


What is new? Text Mining!


PMML 4.2 introduces the use of regular expressions to PMML. This is solely so that users can process text more efficiently. The most straightforward additions are three new built-in functions for concatenating, replacing and matching strings using regular expressions.
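
To give a feel for what these functions do, here are rough Python analogues built on the standard re module. The helper names are hypothetical, the regular expression dialect of a PMML engine may differ from Python's, and treating "matches" as a search anywhere in the string is an assumption of this sketch.

```python
import re

def pmml_like_matches(text, pattern):
    """True if the pattern is found in the text (assumed search semantics)."""
    return re.search(pattern, text) is not None

def pmml_like_replace(text, pattern, replacement):
    """Replace every occurrence of the pattern with the replacement string."""
    return re.sub(pattern, replacement, text)

def pmml_like_concat(*parts):
    """Join all arguments into one string, converting non-strings first."""
    return "".join(str(p) for p in parts)

print(pmml_like_matches("February", "ar?y"))
print(pmml_like_replace("BBBA", "B+", "c"))
print(pmml_like_concat("PMML", "-", 4.2))
```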

The more elaborate addition is the incorporation of a brand new transformation element in PMML to extract term frequencies from text. The ideas for this element were presented at the KDD 2013 PMML Workshop by Benjamin De Boe, Misha Bouzinier and Dirk Van Hyfte (InterSystems). Their paper is a great resource for finding out the details behind the ideas that led to the new text mining element in PMML.
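
In its simplest form, extracting a term frequency means counting how often a term occurs in a text field. The Python sketch below does just that with whole-word matching and optional case sensitivity. It is only an illustration of the concept: the function name and sample text are made up, and the options of the actual PMML element are not reproduced here.

```python
import re

def term_frequency(text, term, case_sensitive=False):
    """Count whole-word occurrences of a term (one or more words) in text."""
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags))

doc = "The Sun was setting while the golden sun rays hit the sea."
print(term_frequency(doc, "sun"))
print(term_frequency(doc, "sun", case_sensitive=True))
print(term_frequency(doc, "sun rays"))
```

The resulting count can then be used like any other derived numeric field, for example as an input to a model.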


Obviously, the changes described above are also new, but it was nice to break the news into two pieces. For the grand finale, though, there is nothing better than taking a look at PMML 4.2 itself.


Enjoy!





Copyright © 2009-2014 Zementis Incorporated. All rights reserved.
