Monday, June 30, 2014

Zementis presents at useR! 2014 - Happening now at UCLA


useR! 2014 is happening now at UCLA. For more information, see: http://user2014.stat.ucla.edu/

The useR! conference is the main gathering of R users and experts on the planet. It features invited talks, tutorials, presentations and posters. This year, Zementis is giving a presentation on Model Ensembles and PMML. It will take place on Tuesday (July 1st) at 4 PM PDT.


For the abstract of our presentation, please refer to: http://user2014.stat.ucla.edu/abstracts/talks/112_Jena.pdf

Zementis will also be presenting a poster on Tuesday at 5:30 PM PDT. This poster will showcase the pmmlTransformations package. For the abstract of our poster presentation, please refer to: http://user2014.stat.ucla.edu/abstracts/posters/113_Jena.pdf

PMML, the Predictive Model Markup Language, is the perfect vehicle for the deployment of predictive analytics. It is especially important for the deployment of model ensembles such as Random Forest models, which are usually composed of hundreds if not thousands of decision trees. PMML is supported in R via the pmml and pmmlTransformations packages. For a detailed description of these packages, please refer to:
https://support.zementis.com/entries/21197842-PMML-Export-Functionality-in-R-Supported-Packages

Thursday, June 19, 2014

Introducing Py2PMML (Python to PMML)

The Zementis Python to PMML Converter (Py2PMML) provides you with an easy-to-use interface to translate your Python-generated machine learning models into PMML, the Predictive Model Markup Language standard. In particular, it allows models built using scikit-learn to be consumed by the Zementis ADAPA and UPPI scoring engines.

Once translated into PMML, models can be easily deployed and scored against new incoming data. For example, models can be deployed in ADAPA for real-time scoring or in UPPI for big data scoring in-database or in Hadoop.
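
To make "scoring a PMML model" concrete, here is a minimal sketch in Python. It assumes a hand-written PMML 4.2 linear regression with made-up field names and coefficients, and a hypothetical score() helper that only understands this one model shape. It is not how ADAPA or UPPI work internally; it only illustrates that a PMML file carries everything needed to compute a prediction.

```python
import xml.etree.ElementTree as ET

# Namespace used by PMML 4.2 documents.
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}

# A hand-written toy model (illustrative values, not produced by any tool).
PMML_DOC = """
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="Toy linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def score(pmml_text, record):
    """Hypothetical helper: score one record against a single-table
    linear RegressionModel (intercept + sum of coefficient * value)."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        name = predictor.get("name")
        result += float(predictor.get("coefficient")) * record[name]
    return result


prediction = score(PMML_DOC, {"x1": 3.0, "x2": 4.0})
print(prediction)
```

A real scoring engine also handles data validation, transformations, and the many other PMML model elements; this sketch deliberately ignores all of that.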

How does it work?


Easy! Once you build your model using the scikit-learn library, all you need to do is write out a .txt file containing the model's parameters. The .txt file needs to follow a strict order and contain all the required information. This is the file used by Py2PMML to generate the corresponding PMML file for your model. With the PMML file in hand, you can simply deploy it in ADAPA for real-time scoring or UPPI for big data scoring.
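
As a rough illustration of the "write out a .txt file" step, the sketch below dumps the parameters of a linear model to a text file. The key=value layout, the write_params() helper, and the feature names are all assumptions made for this example; they are NOT the format Py2PMML expects, which is described in the posting for each model type.

```python
import os
import tempfile


def write_params(path, model_type, feature_names, coefficients, intercept):
    """Hypothetical helper: write model parameters as key=value lines.

    The layout is illustrative only and is not the Py2PMML input format.
    """
    if len(feature_names) != len(coefficients):
        raise ValueError("need exactly one coefficient per feature")
    lines = ["model_type=%s" % model_type, "intercept=%r" % intercept]
    for name, coef in zip(feature_names, coefficients):
        lines.append("%s=%r" % (name, coef))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")


# For a fitted scikit-learn LinearRegression these values would come from
# model.coef_ and model.intercept_; they are hard-coded here so the sketch
# runs without scikit-learn installed.
out_path = os.path.join(tempfile.mkdtemp(), "model_params.txt")
write_params(out_path, "LinearRegression", ["age", "income"], [0.25, -1.5], 3.0)

with open(out_path, encoding="utf-8") as handle:
    written = handle.read().splitlines()
print(written)
```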



What are the supported model types?


As of now, the supported scikit-learn predictive modeling classes are:

Supported pre-processing classes are (contact us for details):

  • Class MinMaxScaler - Standardizes features by scaling each feature to a given range
  • Class OneHotEncoder - Creates dummy continuous variables out of categorical variables
  • Missing Value Replacement
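
To show what the three pre-processing steps above compute, here is a minimal pure-Python sketch. The function names are hypothetical and the code does not use scikit-learn or Py2PMML; it only mirrors the idea behind each step on a single feature column.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly rescale values so the minimum maps to the low end of
    feature_range and the maximum maps to the high end."""
    low, high = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Constant feature: map every value to the low end of the range.
        return [low for _ in values]
    return [low + (v - v_min) / span * (high - low) for v in values]


def one_hot(values):
    """Turn a categorical column into one 0/1 dummy column per category.

    Returns the sorted category list and one row of dummies per value."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, rows


def replace_missing(values, replacement):
    """Replace missing entries (represented here as None) with a fixed value."""
    return [replacement if v is None else v for v in values]


scaled = min_max_scale([10.0, 15.0, 20.0])
categories, dummies = one_hot(["red", "blue", "red"])
filled = replace_missing([1.0, None, 3.0], 2.0)
print(scaled, categories, dummies, filled)
```

Once such steps are expressed in PMML, they travel with the model, so the scoring engine applies the same transformations to new data that were applied during training.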
To learn exactly how each .txt file needs to be generated so that Py2PMML can do its job, please take a look at the specific posting for the particular model type you are interested in converting to PMML.

References


Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.

Monday, June 9, 2014

Online PMML Course @ UCSD Extension: Register today!

The Predictive Model Markup Language (PMML) standard is touted as the standard for predictive analytics and data mining models. It allows predictive models built in one application to be moved to another without any re-coding. PMML has become imperative for companies wanting to extract value and insight from Big Data. In the Big Data era, the agile deployment of predictive models is essential. Given the volume and velocity associated with Big Data, one cannot spend weeks or months re-coding a predictive model into the IT operational environment where it actually produces value (the fourth V in Big Data).

Also, as predictive models become more complex through the use of random forest models, model ensembles, and deep learning neural networks, PMML becomes even more relevant since model recoding is simply not an option.

Zementis has paired up with UCSD Extension to offer the first online PMML course. This is a great opportunity for individuals and companies alike to master PMML so that they can muster their predictive analytics resources around a single standard and in doing so, benefit from all it can offer.

http://extension.ucsd.edu/studyarea/index.cfm?vAction=singleCourse&vCourse=CSE-41184

Course Benefits
  • Learn how to represent an entire data mining solution using open standards
  • Understand how to use PMML effectively as a vehicle for model logging, versioning and deployment
  • Identify and correct issues with PMML code as well as add missing computations to auto-generated PMML code

Course Dates

07/14/14 - 08/25/14

PMML is supported by most commercial and open-source data mining tools. Companies and tools that support PMML include IBM SPSS, SAS, R, SAP KXEN, Zementis, KNIME, RapidMiner, FICO, StatSoft, Angoss, Microstrategy ... The standard itself is very mature and its latest release is version 4.2.

For more details about PMML, please visit the Zementis PMML Resources page.







Copyright © 2009-2014 Zementis Incorporated. All rights reserved.
