Tuesday, December 30, 2014

Scoring data with ADAPA using Pentaho Data Integration

Predictive model integration for MySQL, Microsoft SQL Server, Oracle and PostgreSQL


The main use of predictive models is to generate predictions for new data. This data frequently resides in databases like MySQL, and the ADAPA scoring engine needs a way to easily access it. One way of accomplishing this is by using the Pentaho Data Integration (PDI) tool, and in this post we outline how to score data from relational databases using the ADAPA REST API and PDI.

PDI provides an easy-to-use point-and-click interface to manage the whole workflow: retrieving the data, scoring it through ADAPA, and saving the results elsewhere. PDI can read from and write to different databases, including MySQL, Microsoft SQL Server, Oracle, PostgreSQL, and others. It can also act as a client to the ADAPA Scoring Engine by leveraging the ADAPA REST API, taking care of transforming the data into the necessary formats, JSON and URL-encoded parameters in this case.

Prior to starting, we assume that:
  • PDI is installed
  • Data to be scored is stored in either MySQL, Microsoft SQL Server, Oracle or PostgreSQL
  • A PMML model for the data is deployed and available through the ADAPA REST API.

The process is built and executed in PDI. The transformation should consist of the following steps:
  • Retrieve data from the database
  • Transform to a JSON object
  • Convert the JSON object to a URL as a method to transmit it
  • Send URL to ADAPA through REST API
  • Capture ADAPA output
  • Write the scoring result back to a flat file
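
In Python terms, the JSON-to-URL steps above amount to something like the following sketch. The endpoint URL and the `record` parameter name are assumptions for illustration, not the actual ADAPA API; PDI performs the equivalent transformation with its built-in steps.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical ADAPA REST endpoint; the real URL depends on your
# ADAPA installation and the name of the deployed model.
ADAPA_URL = "http://localhost:8080/adapars/apply/NeuralNetworkModel"

def record_to_request(record):
    """Mimic the PDI steps: row -> JSON object -> URL-encoded query -> GET request."""
    payload = json.dumps(record)            # transform the row to a JSON object
    query = urlencode({"record": payload})  # convert the JSON object to URL form
    return Request(ADAPA_URL + "?" + query, method="GET")

row = {"sepal_length": 5.1, "sepal_width": 3.5}
req = record_to_request(row)
print(req.full_url)
# The HTTP response would then be captured and written back to a flat file.
```
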

For detailed step-by-step instructions using a neural network model deployed in ADAPA, please review the following videos:


Monday, December 29, 2014

Using MySQL as a Client to the ADAPA Scoring Engine

Predictive analytics scoring with MySQL and ADAPA


In this blog post, we outline how to use a MySQL database as a client to the ADAPA Scoring Engine by leveraging the ADAPA REST API to execute a predictive analytics model based on the Predictive Model Markup Language (PMML) industry standard.

We assume that:
  • MySQL and cURL are installed
  • Necessary MySQL tables are already created
  • A PMML model for the data is deployed and available through the ADAPA REST API.

One option for making API calls from MySQL is the MySQL-UDF-HTTP package, which enables the creation of user-defined functions (UDFs) for HTTP REST operations in a database. This package is available on Google Code and installs on top of MySQL. We can leverage the UDFs created with this package to make REST API calls to ADAPA from MySQL. Specifically, we use HTTP GET requests to the ADAPA engine to score one record at a time. An advantage of using these functions is that we can easily write the scores back to the database.

In addition, the scoring process can be automated with database triggers. Triggers automatically execute database queries when specified events occur. In this case, we can write functions to score and update or insert records, and set triggers to execute these functions on update and insert events. The HTTP UDF is called by the scoring function to send a GET request to the ADAPA REST API.
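
The trigger pattern can be sketched in Python as follows. `ScoredTable` and `score_record` are illustrative stand-ins for the MySQL table, trigger, and HTTP UDF; they are not actual MySQL-UDF-HTTP calls.

```python
def score_record(record):
    # Placeholder for the HTTP GET call the UDF would make to the
    # ADAPA REST API; the threshold rule here is purely illustrative.
    return 0.5 if record["amount"] > 100 else 0.1

class ScoredTable:
    """A tiny in-memory stand-in for a database table with an insert trigger."""
    def __init__(self, scorer):
        self.rows = []
        self.scorer = scorer

    def insert(self, record):
        # "Trigger": score the record on insert and write the score back.
        record["score"] = self.scorer(record)
        self.rows.append(record)

table = ScoredTable(score_record)
table.insert({"id": 1, "amount": 250.0})
print(table.rows[0]["score"])  # -> 0.5
```

An update trigger would follow the same shape: re-score the changed record and overwrite its stored score.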

Using only SQL and UDFs, the approach above lets us execute complex predictive analytics models directly from one of the most commonly used databases, score the records, and write the results back into a database table.

A step-by-step tutorial, including installing MySQL-UDF-HTTP and writing functions and triggers, is available in this video.

Tuesday, December 23, 2014

451 Research Initiates Coverage of Zementis

2014 has been a busy year of growth for Zementis, as our customers, business partners and employees have noticed. Industry analysts have also noticed, with two leading research firms initiating coverage of Zementis in 2014. In May, Gartner included Zementis on its short list of “Cool Vendors in Data Science”, and just this week, 451 Research published its first report on Zementis.

451 Research is well known in the technology sector as a highly respected research and advisory company, and is especially noted for its coverage of emerging technologies and the companies that bring these technologies to market. Within the Enterprise Software sector, Krishna Roy leads 451’s analytical efforts in the realm of business applications and software infrastructure, including big data analytics and predictive analytics. She has a longstanding background in technology journalism, and covers the big data analytics segment extensively. 451’s subscribers will find an amazing number of insightful reports on big data and predictive analytics that bear her byline.

The universe of companies in her analytical portfolio numbers more than 100, affording her a fantastic vantage point from which to study the technologies and market dynamics that shape the competitive landscape and define the market’s evolutionary trajectory. This breadth of coverage also makes her schedule extremely full. Zementis is honored that she has chosen to devote some of her scarce time to studying our company and imparting her perspectives and insights.

To access the report, click on the link below:

Impact Report - December 22, 2014


Wednesday, November 12, 2014

IBM and Zementis Release White Paper: "Enhancing Predictive Analytics"

Continuing their longstanding partnership, Zementis and IBM recently released a white paper that details how organizations can improve their business agility through predictive analytics. The paper describes some of the key benefits that organizations can derive from applying predictive analytics to key decisions, outlines some of the operational and technical challenges that organizations commonly face in this effort, and showcases the capabilities that IBM and Zementis make possible to unlock the full potential of big data through predictive analytics.

Together, Zementis and IBM help enterprises overcome many challenges associated with their big data efforts, simplifying and accelerating the deployment of predictive models and making possible large-scale analytics that once seemed impractical to execute.

Zementis' UPPI™ solution is integrated with several of IBM's flagship big data analytics platforms, including IBM PureData™ System for Analytics, powered by Netezza® technology, and IBM InfoSphere® BigInsights™ software. In each case, the joint IBM/Zementis solution helps companies deploy, execute and integrate scalable, standards-based predictive analytics. UPPI extends the in-database and Hadoop-based predictive analytics capabilities of these IBM platforms through the use of Hive, a data warehouse system for Hadoop.

The white paper describes the benefits of Zementis' open standards approach to predictive analytics, as well as the technical capabilities of the joint solutions with IBM and the tangible benefits that organizations can realize by making IBM and Zementis a foundational element of their big data analytics strategy and architecture.

Highlights of the joint solution include:
  • Enables near real-time predictive model deployment through a universal, flexible approach
  • Reduces cost and complexity of deploying and utilizing predictive analytics for big data
  • Delivers standards-based execution of predictive analytics for in-database scoring
  • Accelerates time-to-market for enhancing intelligent decision making via predictive data
  • Supports highly dynamic and complex data environments with massively parallel processing
  • Extends the analytics functionality and business value of robust IBM platforms
Download the white paper

Thursday, November 6, 2014

Microsoft and Zementis Announce ADAPA for Azure

On October 28, Microsoft and Zementis unveiled the culmination of a collaborative effort that had begun many months before. Zementis' real-time predictive analytics decision engine, ADAPA®, became officially certified on Azure - Microsoft's dynamic and innovative cloud platform.


Microsoft Azure offers enterprise users a powerful collection of integrated services - compute, storage, data, networking, and applications - and is the only major cloud platform ranked by Gartner as an industry leader for both infrastructure-as-a-service (IaaS) and platform-as-a-service (PaaS).

With ADAPA on Azure, organizations can develop predictive models using most open source and commercial data mining tools and deploy machine learning models rapidly to generate predictive insights in real-time. New and existing Zementis customers can now take advantage of the "pay per use" model with Azure to reduce infrastructure costs and total cost of ownership.

Microsoft's Azure Marketplace makes it easy to find, purchase and launch ADAPA. Configuration is easy, and organizations can be up and running quickly with predictive analytics that leverage the efficiency, scalability, security and performance of Microsoft's enterprise-grade cloud environment.
 
"With a focus on rapid deployment and integration of predictive algorithms through open standards, Zementis embraces the cloud," said Garth Fort, General Manager of Enterprise Partners, Microsoft. "ADAPA allows customers to take advantage of the compute resources in Azure to support predictive analytics solutions and quickly scale capacity as computing requirements change, paying only for the resources used."

To purchase and deploy ADAPA from the Azure Marketplace today, please visit Microsoft Azure

Wednesday, October 1, 2014

Zementis/Teradata Whitepaper: Massively Parallel In-database Predictions with PMML


Zementis and Teradata have teamed up to make available a whitepaper that not only discusses the benefits of in-database scoring using UPPI for Teradata/Aster, but also shares performance numbers that will blow you away! Enjoy!

DOWNLOAD WHITEPAPER

Abstract

Open standards enable interoperability and portability across systems and solutions. Such a level of flexibility creates new opportunities for addressing exceedingly demanding business agility and performance requirements. The Predictive Model Markup Language (PMML) is the embodiment of an open standard and delivers such benefits in the world of data mining and predictive analytics. This means that models developed in any environment and tool set can be deployed and used in a completely different system.

In the context of Big Data, there is an urgent need to apply the power of predictive analytics to derive reliable predictions, and hence business decisions, from the vast amounts of data collected by many organizations. In this paper, we discuss how the PMML standard enables embedding advanced predictive models directly into the database or the data warehouse, alongside the actual data to be scored. More importantly, we show how we can easily take advantage of a highly parallel database architecture to efficiently derive predictions from very large volumes of data.

DOWNLOAD WHITEPAPER

Monday, August 25, 2014

Hortonworks/Zementis Webinar: Hadoop’s Advantages for Machine Learning and Predictive Analytics

Please join Ofer Mendelevitch, Director of Data Science at Hortonworks, and Michael Zeller, Founder and CEO of Zementis, as they present key lessons on what drives successful implementations of big data analytics projects. Their knowledge comes from working with dozens of companies, from small cloud-based start-ups to some of the largest companies in the world.

When: Wednesday, September 10, 2014 at 10 am PST / 1 pm EST

REGISTRATION

Hortonworks will present their approach to using Apache Hadoop for predictive models with big data, and the benefits of Hadoop to data scientists. Zementis will demonstrate how to quickly deploy, execute, and optimize predictive models from open source machine learning tools like R and Python as well as commercial data mining vendors like IBM, SAP and SAS.

Zementis leverages PMML (Predictive Model Markup Language), the open industry standard, to provide a higher ROI for big data and predictive analytics initiatives, while reducing IT costs and improving the quality of predictive model management, all without changing how data science teams do their day-to-day work.

Whether your company is just beginning to work with predictive analytics or has an experienced data science team, this webinar will provide valuable insights on how to move predictive models into an operational environment based on Hadoop and Hive using open industry standards, eliminating the custom coding and delays typically associated with these projects. Please join us for this exciting presentation and discussion.

REGISTRATION

Wednesday, August 20, 2014

Zementis Sponsors SIGKDD 2014 Test of Time Award

The SIGKDD Test of Time Award recognizes outstanding papers from past KDD conferences, beyond the last decade, that have had an important impact on the data mining research community. SIGKDD is the ACM Special Interest Group for Knowledge Discovery and Data Mining. For 20 years, well before the advent of “Big Data”, the annual SIGKDD conference has been the leading global forum for data scientists and practitioners from academia, industry and government to disseminate cutting-edge research results and to demonstrate innovative applications.


It is our privilege to support the SIGKDD 2014 Test of Time Award as it recognizes influential contributions published in KDD conference proceedings which have had a substantial impact on data science.  Selected by a committee of leading scientists and supported by thousands of citations since their original publication, one could almost call it the “Nobel Prize in Data Science.”





The following three papers were selected by the award committee to receive the inaugural award:

  • A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise [KDD 1996]
  • Integrating Classification and Association Rule Mining [KDD 1998]
  • Maximizing the Spread of Influence through a Social Network [KDD 2003]
For abstracts and additional details, please see the SIGKDD web site blog.

Please join us at KDD 2014 in New York City, August 24-27, to celebrate the winners at an interdisciplinary event which will bring together researchers and practitioners from data science, data mining, knowledge discovery, large-scale data analytics, and big data.

Monday, August 18, 2014

UCSD's John Freeman interview with Alex Guazzelli, Zementis CTO: Predictive Analytics, Big Data, and PMML


Dr. Alex Guazzelli, Zementis CTO, has been extra busy lately teaching a class at UCSD Extension entitled "Predictive Models with PMML". As the 6-week course nears its end for the Summer quarter, Dr. Guazzelli was invited by John Freeman, Director of Communications for UCSD Extension, for an interview on UCTV's Career Talk to discuss Predictive Analytics, Big Data and PMML.

The interview itself was broadcast last week and it is now AVAILABLE ONLINE.

Tuesday, July 22, 2014

Alpine Data Labs/Zementis Webinar - Breaking Down Barriers for Predictive Analytics

If you are interested in learning how PMML, the Predictive Model Markup Language standard, is being used by Alpine Data Labs and Zementis to instantly move predictive analytic models from the scientist's desktop into the IT operational environment, be sure to join us for the upcoming Alpine Data Labs/Zementis webinar on July 30th, featuring Steven Hillion, CPO of Alpine Data Labs, and Michael Zeller, CEO of Zementis.

Register here!



This will be an engaging, fast-paced and informative presentation and discussion of the latest tools and trends in predictive analytics. The webinar will include a demo of the PMML capabilities in Alpine Data Labs Chorus 4.0 and instant deployment of predictive models via Zementis solutions.

Title:

Breaking Down Barriers for Predictive Analytics


When?

Wednesday, July 30 2014 at 1 pm ET / noon CT / 11 am MT / 10 am PT / 5 pm GMT

Register here!

Thursday, July 17, 2014

ADAPA with PMML 4.2 support now available on the AWS Marketplace

Zementis has been offering its ADAPA decision engine as a service on the Amazon Cloud for a few years now. With ADAPA on the Amazon Cloud, companies all over the world benefit from fast deployment and execution of predictive analytics via Web-services and PMML, the Predictive Model Markup Language. You can even launch your own ADAPA instance in the cloud through the AWS Marketplace with a single click.



ADAPA and its sister product, the Universal PMML Plug-in (UPPI), are PMML-based scoring engines. That is, they can consume predictive models built in any data mining tool, as long as the model is represented in PMML, the Predictive Model Markup Language standard. PMML is supported by most commercial and open-source data mining tools, including FICO, IBM SPSS, KNIME, RapidMiner, R, SAS, and SAP. With PMML, one can simply move a predictive model from the scientist's desktop, where it was built, to the IT operational environment with no need for custom code.

Zementis was the first company to announce compatibility with PMML 4.2, the latest version of the PMML standard. PMML 4.2 introduces extensive text mining capabilities into the standard and now Zementis is bringing these exciting new PMML features to its AWS customers. Learn about all the new cool features introduced in PMML 4.2.

It is really super simple to deploy and score your models using ADAPA. And now, with PMML 4.2 support on the Amazon Cloud, predictive analytics as a service has just become amazingly powerful.

Visit the Zementis website for details

Monday, June 30, 2014

Zementis presents at useR! 2014 - Happening now at UCLA


useR! 2014 is happening now at UCLA. For more information, see: http://user2014.stat.ucla.edu/

The useR! conference is the main gathering of R users and experts on the planet. It features invited talks, tutorials, presentations and posters. This year, Zementis is giving a presentation on Model Ensembles and PMML. It will take place on Tuesday (July 1st) at 4 PM PST.


For the abstract of our presentation, please refer to: http://user2014.stat.ucla.edu/abstracts/talks/112_Jena.pdf

Zementis will also be presenting a poster on Tuesday at 5:30 PM PST. This poster will showcase the pmmlTransformations package. For the abstract of our poster presentation, please refer to: http://user2014.stat.ucla.edu/abstracts/posters/113_Jena.pdf

PMML, the Predictive Model Markup Language, is the perfect vehicle for the deployment of predictive analytics. It is especially important for the deployment of model ensembles such as Random Forest models, which are usually composed of hundreds, if not thousands, of decision trees. PMML is supported in R via the pmml and pmmlTransformations packages. For a detailed description of these packages, please refer to:
https://support.zementis.com/entries/21197842-PMML-Export-Functionality-in-R-Supported-Packages

Thursday, June 19, 2014

Introducing Py2PMML (Python to PMML)

The Zementis Python to PMML Converter (Py2PMML) provides an easy-to-use interface to translate your Python-generated machine learning models into PMML, the Predictive Model Markup Language standard. In particular, it allows models built using scikit-learn to be consumed by the Zementis ADAPA and UPPI scoring engines.

Once translated into PMML, models can be easily deployed and scored against new incoming data. For example, models can be deployed in ADAPA for real-time scoring or UPPI for big data scoring in-database or Hadoop.

How does it work?


Easy! Once you build your model using the scikit-learn library, all you need to do is write out a .txt file containing the model's parameters. The .txt file needs to follow a strict order and contain all the required information. This is the file used by Py2PMML to generate the corresponding PMML file for your model. With the PMML file in hand, you can simply deploy it in ADAPA for real-time scoring or UPPI for big data scoring.
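
The exact layout of the parameter file is defined by Py2PMML and documented in the model-specific postings; the sketch below only illustrates the general idea, with a purely hypothetical field order. `model_params` stands in for the parameters of a fitted scikit-learn model.

```python
import os
import tempfile

# Stand-in for a fitted scikit-learn model's parameters
# (e.g. a linear regression's intercept and coefficients).
model_params = {
    "model_type": "LinearRegression",
    "intercept": 0.25,
    "coefficients": [1.5, -0.75, 2.0],
}

def write_params_txt(params, path):
    # One field per line, in a fixed order (hypothetical; the real
    # Py2PMML input format is specified in the per-model postings).
    with open(path, "w") as f:
        f.write(params["model_type"] + "\n")
        f.write(str(params["intercept"]) + "\n")
        f.write(" ".join(str(c) for c in params["coefficients"]) + "\n")

path = os.path.join(tempfile.gettempdir(), "py2pmml_input.txt")
write_params_txt(model_params, path)
print(open(path).read())
```
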



What are the supported model types?


As of now, the supported scikit-learn predictive modeling classes are:

Supported pre-processing classes are (contact us for details):

  • Class MinMaxScaler - Standardizes features by scaling each feature to a given range
  • Class OneHotEncoder - Creates dummy continuous variables out of categorical variables
  • Missing Value Replacement
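
For readers unfamiliar with the two scikit-learn classes above, a minimal sketch of what they compute (defaults may vary slightly across scikit-learn versions):

```python
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# MinMaxScaler: each feature is scaled into a given range, [0, 1] by default.
X = [[1.0], [5.0], [9.0]]
scaled = MinMaxScaler().fit_transform(X)
print(scaled.ravel())  # values scaled into [0, 1]

# OneHotEncoder: each categorical value becomes its own dummy column.
cats = [[0], [1], [2]]
encoded = OneHotEncoder().fit_transform(cats).toarray()
print(encoded.shape)  # three categories -> three dummy columns
```
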
To learn exactly how each .txt file needs to be generated so that Py2PMML can do its job, please take a look at the specific posting for the particular model type you are interested in converting to PMML.

References


Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.

Monday, June 9, 2014

Online PMML Course @ UCSD Extension: Register today!

The Predictive Model Markup Language (PMML) standard is touted as the standard for predictive analytics and data mining models. It allows predictive models built in one application to be moved to another without any re-coding. PMML has become imperative for companies wanting to extract value and insight from Big Data. In the Big Data era, the agile deployment of predictive models is essential. Given the volume and velocity associated with Big Data, one cannot spend weeks or months re-coding a predictive model for the IT operational environment where it actually produces value (the fourth V in Big Data).

Also, as predictive models become more complex through the use of random forest models, model ensembles, and deep learning neural networks, PMML becomes even more relevant since model recoding is simply not an option.

Zementis has paired up with UCSD Extension to offer the first online PMML course. This is a great opportunity for individuals and companies alike to master PMML so that they can consolidate their predictive analytics resources around a single standard and, in doing so, benefit from all it has to offer.

http://extension.ucsd.edu/studyarea/index.cfm?vAction=singleCourse&vCourse=CSE-41184

Course Benefits
  • Learn how to represent an entire data mining solution using open-standards
  • Understand how to use PMML effectively as a vehicle for model logging, versioning and deployment
  • Identify and correct issues with PMML code as well as add missing computations to auto-generated PMML code

Course Dates

07/14/14 - 08/25/14

PMML is supported by most commercial and open-source data mining tools. Companies and tools that support PMML include IBM SPSS, SAS, R, SAP KXEN, Zementis, KNIME, RapidMiner, FICO, StatSoft, Angoss, Microstrategy ... The standard itself is very mature and its latest release is version 4.2.

For more details about PMML, please visit the Zementis PMML Resources page.


Thursday, May 29, 2014

Zementis is a finalist for the SAP 2014 Startup Focus Award

Zementis is proud to be a finalist for the SAP 2014 Startup Focus Award in the Most Innovative Company category. The list of all finalists has just been announced.

http://www.saphana.com/community/learn/startups/news-views/blog/2014/05/28/2014-startup-focus-award-finalists

Customers are increasingly facing the challenge of implementing more intelligent real-time decisions within the context of big data. Business insights are critical for making intelligent business decisions, and these insights often lie buried in massive volumes of fast-changing and increasingly varied data. Predictive analytics based on statistical algorithms and machine learning can reveal these insights.

Once an organization’s data science team has developed predictive models, the team must then collaborate with the internal IT organization to deploy those models so that business users can incorporate predictive analytics into their decision making. For a data-driven enterprise, the agile deployment, integration and execution of predictive models has become an essential strategic capability.

Zementis and SAP have partnered to deliver this capability to enterprises and enable consistent, accurate predictive analytics as an operational capability, at scale. Our joint solution combines Zementis ADAPA, a scoring engine for predictive analytics, with SAP HANA, the premier platform for in-memory computing.

As a joint solution, ADAPA for SAP HANA represents a universal platform for the operational deployment and execution of predictive analytics. It delivers:
  • Real-time scoring through HANA
  • Superior performance via super-fast in-memory processing
  • High scalability, to support dynamic computing requirements associated with real-time big data
  • Inherent flexibility to support complex computations irrespective of predictive model type or data volume

With ADAPA for SAP HANA, organizations become agile consumers of big data for predictive analytics. Zementis and SAP have removed the complexity of this critical business capability, freeing organizations to focus on developing the best possible predictive models and using those models to make the most intelligent business decisions.

For more details about ADAPA for SAP HANA please contact Zementis, or download the ADAPA for SAP HANA data sheet and watch a demo video. It shows how Zementis and SAP have overcome the challenge of fraud detection in e-commerce through the use of predictive analytics.

Wednesday, May 28, 2014

Creating, Modifying, Deploying and Scoring Predictive Models with PMML, ADAPA and KNIME

Zementis and KNIME co-presented a webinar on creating, modifying, deploying and scoring predictive models using PMML. The webinar is now available for viewing on-demand (see below). It starts with Iris Adae from KNIME giving an overview of PMML, the Predictive Model Markup Language standard, as well as of the extensive support KNIME offers for PMML. PMML is the de facto standard for predictive analytics and can be produced by KNIME for a number of modeling techniques as well as data pre-processing nodes/computations.

Iris' presentation and demo are then followed by Alex Guazzelli from Zementis who shows how easy it is for anyone to benefit from models built in KNIME or R and deployed in the Zementis ADAPA Scoring Engine for execution. Once uploaded in ADAPA, models are available for scoring via web-services (SOAP or REST). KNIME can then be used to connect to a database, read in data and pass it through ADAPA for scoring via the REST API.

 On-demand webinar (available on YouTube):

 

This webinar shows how easily one can move a predictive model from the scientist's desktop to the IT operational environment. When training a model, scientists rely on historical data, but when using the model on a regular basis, the model is moved, or deployed, into production, where it is presented with new data. ADAPA provides a scalable and lightning-fast scoring engine for models that live in production. And, although KNIME data mining nodes are typically used by scientists to build models, its database and REST nodes, as well as its PMML-enabled nodes, can simply be used to create a flow for passing models and data for scoring in ADAPA.

Use-cases discussed are:

  • Read data from a flat file, use KNIME for data pre-processing and building of a neural network model. Export the entire predictive workflow as a PMML file and then take this PMML file and upload and score it in ADAPA via its Admin Web Console. 
  • Read data from a database (MySQL, SQLServer, Oracle, ...), build model in KNIME, export model as a PMML file and deploy it in ADAPA using its REST API. This use-case also shows new or testing data flowing from the database and into ADAPA for scoring via a sequence of KNIME nodes. 
  • The video also shows a case in which one can use KNIME nodes to simply read a PMML file produced in any PMML-compliant data mining tool (R, SAS EM, SPSS, ...), upload it in ADAPA using the REST API and score new data from MySQL in ADAPA also through the REST interface. Note that in this case, the model has already been trained and we are just using KNIME to deploy the existing PMML file in ADAPA for scoring. 
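
As a rough sketch of the two REST interactions these use-cases rely on: the endpoint paths, HTTP verbs and parameter names below are assumptions for illustration; consult the ADAPA REST API documentation for the real ones. KNIME's REST nodes build the equivalent requests graphically.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL of an ADAPA installation.
BASE = "http://localhost:8080/adapars"

def deploy_request(pmml_bytes, model_name):
    # Upload a PMML file (e.g. exported from KNIME or R) to the engine.
    return Request(f"{BASE}/model/{model_name}", data=pmml_bytes, method="PUT")

def score_request(model_name, record):
    # Score one record against the deployed model via HTTP GET.
    query = urlencode({"record": json.dumps(record)})
    return Request(f"{BASE}/apply/{model_name}?{query}", method="GET")

deploy = deploy_request(b"<PMML>...</PMML>", "iris_nn")
score = score_request("iris_nn", {"sepal_length": 5.1})
print(deploy.get_method(), score.get_method())  # -> PUT GET
```
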

To watch only the Zementis-discussed use-cases, see:

Tuesday, May 27, 2014

Zementis and SAP: HANA Marketplace, SAP Blog - Interview, Big Data Bus

The Zementis partnership with SAP is manifesting itself in a number of ways. This week, we would like to share with you three new developments.

1) ADAPA is now being offered on the SAP HANA Marketplace.

2) An interview with our CEO, Mike Zeller, was just featured by SAP on the SAP Blogs.


3) Zementis was again part of the SAP Big Data Bus and the "Big Data Theatre". This time, the bus was parked outside US Bank in Englewood, Colorado. We were engaged in a myriad of conversations with the many people that came through the bus about how ADAPA and SAP HANA work together to bring predictive analytics and real-time scoring to transactional data and millions of accounts, in any industry.

Visit the Zementis ADAPA for SAP HANA page for more details on the Zementis and SAP real-time solution for predictive analytics.




Thursday, May 15, 2014

Transforming R to PMML: Zementis Presentation to the Bay Area R Users Group

The Zementis team was honored to give a presentation this week (May 12) to the Bay Area R Users Group. Our talk addressed how to convert predictive models developed in R to PMML, the Predictive Model Markup Language standard. We described the pmml and pmmlTransformations packages (see details below) and discussed the benefits of converting to PMML, which include:
  • Overcoming R's memory and speed limitations 
  • Deploying predictive models built in R in minutes, not months
  • Making many predictive models operational at once
  • Using PMML's multiple-models element to deploy ensembles, segmentation, and chaining
In our presentation, we also discussed how Zementis' technology not only enables models to work with RDBMS and NoSQL databases, but also how it enables real-time scoring against in-flight data.

R PMML Package


A PMML package for R that exports all kinds of predictive models is available directly from CRAN.
The pmml package offers support for the following data mining algorithms:
  • ksvm (kernlab): Support Vector Machines
  • nnet: Neural Networks
  • rpart: C&RT Decision Trees 
  • lm and glm (stats): Linear and Binary Logistic Regression Models 
  • arules: Association Rules
  • kmeans and hclust: Clustering Models
  • multinom (nnet): Multinomial Logistic Regression Models
  • glm (stats): Generalized Linear Models for classification and regression with a wide variety of link functions 
  • randomForest: Random Forest Models for classification and regression
  • coxph (survival): Cox Regression Models to calculate survival and stratified cumulative hazards
  • naiveBayes (e1071): Naive Bayes Classifiers
  • glmnet: Linear ElasticNet Regression Models
  • ada: Stochastic Boosting
  • svm (e1071): Support Vector Machines

The pmml package can also export data transformations built with the pmmlTransformations package (see below). In addition, it can merge two distinct PMML files into one: if the transformations and the model were saved as separate PMML files, it can combine both into a single file, as described in Chapter 5 of the PMML book, PMML in Action.

How does it work?


Simple: once you build your model using any of the supported model types, pass the model object as an input parameter to the pmml function, as shown in the figure below:



Example - sequence of R commands used to build a linear regression model using lm and the Iris dataset:


Documentation


For more on the pmml package, please take a look at the paper we published in The R Journal. For that, just follow the link below:
Also, make sure to check out the package's documentation from CRAN:

2) CRAN: pmml Package

R PMML Transformations Package


This is a brand new R package. Called pmmlTransformations, this package transforms data; when used in conjunction with the pmml package, it allows data transformations to be exported together with the predictive model in a single PMML file. Transformations currently supported include:
  • Min-max normalization
  • Z-score normalization
  • Dummy-fication of categorical variables
  • Value Mapping
  • Discretization (binning)
  • Variable renaming

If you would like to contribute code to the pmmlTransformations package, please feel free to contact us.

How does it work?


The pmmlTransformations package works in tandem with the pmml package so that data pre-processing can be represented together with the model in the resulting PMML code. 

In R, as shown in the figure below, this process includes three steps:

  1. With the use of the pmmlTransformations package, transform the raw input data as appropriate
  2. Use transformed and raw data as inputs to the modeling function/package (hclust, nnet, glm, ...)
  3. Output the entire solution (data pre-processing + model) in PMML using the pmml package


Example - sequence of R commands used to build a linear regression model using lm with transformed data


Documentation


For more on the pmmlTransformations package, please take a look at the paper we wrote for the KDD 2013 PMML Workshop. For that, just follow the link below:
Also, make sure to check out the package's documentation from CRAN:






Copyright © 2009-2014 Zementis Incorporated. All rights reserved.

Privacy - Terms Of Use - Contact Us