Thursday, December 18, 2008
At Zementis, we have already tested the KNIME PMML export quite extensively and we are excited to report that it works well with ADAPA, our PMML scoring engine.
KNIME 2.0 currently exports models in PMML 3.1, which are automatically converted to the latest PMML format once uploaded into any of the Zementis scoring products.
In addition to the native import/export nodes for PMML, KNIME also provides access to an R node that allows the export of PMML. This node leverages the R PMML export library that Zementis has been supporting and which has been covered extensively in the ADAPA support blog.
The KNIME blog provides a detailed description of how to convert R models into PMML.
Wednesday, November 26, 2008
In 2008, we gave a talk (11/12/08) at the Forum on Analytics (sponsored by the San Diego Software Industry Council - SDSIC). The talk was entitled "Easy Expression and Execution of Data Mining Models through PMML". Please click here to see the presentation slides. The presentation transcripts follow below (after the abstract).
PMML (Predictive Model Markup Language) is an XML-based language used to define data mining models. It was specified by the Data Mining Group, an independent group of leading technology companies. By providing a uniform standard to represent predictive models, PMML allows for the exchange of predictive solutions between different applications and various vendors. Many statistical packages already support the PMML standard; these include, for example, SAS and SPSS. In an effort to broaden the scientific workbench available to data mining scientists and to support the open source community, Zementis recently contributed code to the R project. In particular, we implemented the export of neural network models built with the nnet R package as well as Support Vector Machines built with the ksvm function of the kernlab R package. The same PMML exporter can also produce decision trees built with rpart and linear regression models built with lm. The PMML exporter package is currently available through CRAN (the Comprehensive R Archive Network).
All of the R exported PMML models are readily available to be uploaded into an execution engine for scoring or classification. For example, the ADAPA engine, which can be used for production deployment of PMML models, is currently available as a service in the Amazon Elastic Compute Cloud (Amazon EC2).
Our aim here is to show how one can quickly build a data mining model in R, such as a Support Vector Machine, and use the PMML exporter to produce a model file which can be uploaded and executed in a different application. We demonstrate how one can use data containing expected results to verify correct model deployment. If all computed and expected values match, the model can be considered ready for production, i.e. available for generating predictions on incoming data as part of an overall enterprise decision management strategy. From R to ADAPA, we use PMML as an effective way to express and execute data mining models.
Our work shows how PMML can be effectively used to allow for model exchange between different applications. Also, it highlights how one can benefit from an open-source statistical package such as R to easily export models into PMML and upload them into ADAPA, a light-weight scoring engine which consumes several PMML models. The ease of model expression and execution allows data mining scientists to concentrate on the important tasks: data analysis and model building. Real-time, scalable execution is handled through software tools which communicate through a common language, PMML.
Below, please find the transcripts of our talk, organized per slide.
Slide 1 - Title
My talk will be divided into three parts, all of which are centered around Open Standards. I will start by talking about the Development of Predictive Models using R. I will then talk about Deployment and in doing so I will focus on PMML. Finally, I will talk about the real-time execution of PMML files.
So, R is our software of choice for this presentation, given that it is open source and a GNU project. R is available for free over the internet. R allows for data manipulation, calculation, and graphical display. It provides a wide variety of statistical techniques and it is highly extensible.
But, how to export models out of R? Once you build your models in R, you can easily export them into PMML.
We recently contributed code to the R PMML package which can now export a variety of modeling techniques which include … to name a few:
Support Vector Machines
Great! But, what is PMML? PMML stands for Predictive Model Markup Language. It is an XML-based language which is the de facto standard for exchanging predictive models between compliant applications. For this reason, PMML avoids proprietary issues and incompatibilities.
PMML provides a clear separation of tasks in which model deployment easily follows model development. In this way, PMML frees scientists to focus on model building. PMML eliminates the need for custom model deployment. In doing so, it ensures scalability and reliability.
PMML is a mature standard and is widely supported by the industry. It is developed and maintained by the Data Mining Group which is a vendor independent consortium with several major supporters including IBM, Oracle, Microsoft, SAS, SPSS, Fair Isaac and Zementis.
A single PMML file can be used to represent data transformations as well as the model itself. In doing so, PMML brings data transformations and statistical models together.
PMML allows for the definition of a data dictionary which is used to define all the raw data fields coming into the model, including missing value strategy and outlier treatment.
Several data transformations can also be expressed in PMML which can be used to extract feature detectors from the raw data.
On the other hand, post-processing of results allows for tailored decisions.
So, here it is, R … The R GUI is very simple.
Imagine that we want to build a neural network model to solve the Iris classification problem. In this case, all we need to do is load the R neural network package nnet.
We then assign the data file containing the Iris data set to the R object we called Iris.
And call nnet with the right parameters, which include the data set used to train the model as well as the size of the hidden layer. We assign the network to the R object IrisNet.
Once trained, this network can be easily exported into PMML. We need to load the PMML package first and then call it with our neural network object as a parameter.
Here is your PMML code. Hot from the oven!
Note the data dictionary which contains a description of the four input variables used to train the model.
Also note that in this simple example, there were no data transformations. We are using the input data AS IS.
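To make the data dictionary idea concrete, here is a minimal Python sketch that lists the fields declared in a PMML file. The PMML fragment is hand-written and abbreviated for illustration; the field names and the helper `list_data_fields` are assumptions and do not reproduce the exact output of the R exporter.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated PMML fragment for illustration only;
# the field names are assumptions, not the exact output of the R exporter.
PMML_TEXT = """<PMML version="3.1" xmlns="http://www.dmg.org/PMML-3_1">
  <DataDictionary numberOfFields="5">
    <DataField name="sepal_length" optype="continuous" dataType="double"/>
    <DataField name="sepal_width" optype="continuous" dataType="double"/>
    <DataField name="petal_length" optype="continuous" dataType="double"/>
    <DataField name="petal_width" optype="continuous" dataType="double"/>
    <DataField name="class" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>"""


def list_data_fields(pmml_text):
    """Return (name, optype, dataType) for every DataField in the document."""
    root = ET.fromstring(pmml_text)
    # Reuse whatever namespace the root element declares, e.g. "{http://www.dmg.org/PMML-3_1}".
    ns_prefix = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    return [
        (field.get("name"), field.get("optype"), field.get("dataType"))
        for field in root.iter(ns_prefix + "DataField")
    ]


for name, optype, data_type in list_data_fields(PMML_TEXT):
    print(name, optype, data_type)
```

In this sketch the dictionary holds the four inputs plus the categorical target, so five fields are listed in total.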
So, you did all your data analysis, built your model and generated PMML using R … but what now?
What can you do if you need to execute this model on an Iris field and use it in real time?
This is where a predictive analytics scoring engine fits in. We are going to use ADAPA here as an example.
ADAPA allows for data transformations and models to be uploaded and executed in real time via web-services calls.
It is an environment to manage and execute not only one but many predictive models and rule sets.
ADAPA is not a model building environment. We used R for that!
This is a screen-shot of the ADAPA management console.
PMML files are easily uploaded and maintained through this interface.
Note that several models have already been uploaded into ADAPA including the IRIS Neural Network model we just built using R.
Once a model is uploaded, it needs to be validated. This is accomplished via a score matching test.
In this case we uploaded 150 IRIS data records containing the input variables as well as the expected output. ADAPA will then compare and match computed and expected values.
If any mismatches are found, ADAPA allows for complete traceability of its internal decisions.
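The idea behind a score matching test can be sketched in a few lines of Python. This is only an illustration of the comparison step, not ADAPA's implementation; the function name `match_scores` and the numeric tolerance are assumptions.

```python
def match_scores(pairs, tolerance=1e-6):
    """Compare (expected, computed) pairs and return the 1-based row ids that mismatch.

    Class labels (strings) must be identical; numeric scores may differ by
    at most `tolerance` (an assumed threshold, chosen for illustration).
    """
    mismatches = []
    for row_id, (expected, computed) in enumerate(pairs, start=1):
        if isinstance(expected, str) or isinstance(computed, str):
            matched = expected == computed
        else:
            matched = abs(expected - computed) <= tolerance
        if not matched:
            mismatches.append(row_id)
    return mismatches


# Hypothetical validation records: expected value first, engine output second.
records = [
    ("setosa", "setosa"),
    ("virginica", "versicolor"),
    (0.5, 0.5000000001),
]
print(match_scores(records))
```

An empty result would mean every computed value matched its expected value; any row ids returned are the records worth tracing.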
Great! I just showed you how to Easily Express and Execute Data Mining Models using PMML … all in 6 steps.
R allows for Data Analysis, Model Building, and PMML Export.
PMML can then be uploaded into a compliant decision engine.
Data reaches the engine via web-service calls.
And in so doing,
Model execution is performed in real-time.
Slide 14 - Thank you!
We launched our ADAPA predictive analytics decision engine on EC2, which allows users to deploy, integrate, and execute statistical scoring models, e.g., using algorithms like neural networks, support vector machines (SVMs), decision trees, clustering models, naive Bayes classifiers and various regression models.
Finally, companies have the option to save a lot of software licensing cost by buying predictive analytics like books, under a cost effective pay-as-you-go SaaS license model.
It is great to see more mathematical applications migrate to the cloud. This is one of the best opportunities where cloud computing can reduce cost and complexity of implementing computational efforts in HPC, large-scale simulations, and predictive analytics.
Tuesday, November 11, 2008
[High Res Video: Exporting PMML from R]
This example walks through the steps required to create a neural network model for the Iris classification problem and then export the model as a PMML file. Once exported in PMML format, the model is ready to be deployed and executed, e.g., in the Zementis ADAPA scoring engine.
To follow this example, you will need to download the PMML package, and the Iris.csv data file.
Tuesday, November 4, 2008
While attending the Business Rules Forum 2008 in Orlando, it was apparent that many of the rules engine vendors have discovered the value of predictive analytics and are considering support for the Predictive Model Markup Language (PMML) standard. As part of the Enterprise Decision Management track of the conference, Zementis presented on open source, open standards, and cloud computing under the title of Agile Deployment of Predictive Analytics Using Amazon EC2.
The synergies of rules and predictive models form the foundation for an Enterprise Decision Management strategy. While rules describe explicit knowledge, predictive analytics leverages implicit knowledge that is embedded in your data.
At Zementis, we have long combined rules and predictive models for better decisions. Our ADAPA decisioning engine seamlessly integrates the Drools rules engine. If you are already using Drools, you can effectively deploy predictive models in your business process with minimal effort.
Even if you are using a rules engine from Ilog, Corticon, or another rules engine vendor, ADAPA provides an easy, SOA-based integration via web services. Predictive analytics does not have to be complex or costly to deploy, if you follow open standards and leverage best-of-breed solutions.
Want to learn more about combining rules and predictive analytics? Please contact us!
Wednesday, October 22, 2008
If your company or software is not listed in the table, please let us know and we will update it. Thanks!
Friday, October 10, 2008
Monday, October 6, 2008
Wednesday, August 6, 2008
Yes, we do! You will find below videos about ADAPA and its business proposition, the ADAPA Console (how to upload and manage models), the ADAPA Control Center (how to launch and manage your ADAPA instances in the Amazon Cloud), and much more.
Our list of videos is available in two formats: YouTube and high resolution.
* YouTube - Predictive Analytics with R, PMML, ADAPA, and Excel
* YouTube - ADAPA Add-in for Excel 2007 - Demo
* YouTube - The PMML Path towards True Interoperability in Data Mining
* YouTube - Predictive Analytics + Business Rules = Enhanced Decisioning (RulesFest 2010 Slide Show)
* YouTube - ADAPA Predictive Analytics Scoring Engine - Demo (6 min)
* YouTube - ADAPA Means Business (90 seconds)
* YouTube - ADAPA Control Center Tutorial
* YouTube - ADAPA Console Tutorial
* YouTube - R to PMML Export Example
Also, feel free to check and subscribe to our YouTube Channel.
* High Res Video: ADAPA Add-in for Excel 2007 - Demo (6 min)
* High Res Video: ADAPA Predictive Analytics Scoring Engine - Demo (6 min)
* High Res Video: ADAPA Means Business (90 seconds)
* High Res Video: R to PMML Export Example
* High Res Video: ADAPA Console Tutorial
* High Res Video: ADAPA Control Center Tutorial
Thursday, May 22, 2008
Wednesday, May 21, 2008
Tuesday, May 20, 2008
Starting with data analysis and model development, you can effectively use the Predictive Model Markup Language (PMML) standard to move complex decision models from the scientist's desktop into a scalable production environment hosted on the Amazon Elastic Compute Cloud (Amazon EC2).
Expressing Models in PMML
PMML is an XML-based language used to define predictive models. It was specified by the Data Mining Group, an independent group of leading technology companies including Zementis. By providing a uniform standard to represent such models, PMML allows for the exchange of predictive solutions between different applications and various vendors.
Open source statistical tools such as R can be used to develop data mining models based on historical data. R allows for models to be exported into PMML which can then be imported into an operational decision platform and be ready for production use in a matter of minutes.
On-Demand Predictive Analytics
Amazon EC2 is a reliable, on-demand infrastructure on which we offer the ADAPA® (Adaptive Decision And Predictive Analytics) Predictive Decisioning Engine based on the Software as a Service (SaaS) paradigm. ADAPA imports models expressed in PMML and executes these in batch mode, or real-time via web-services.
Our service is implemented as a private, dedicated Amazon EC2 instance of ADAPA. Each client has access to his/her own ADAPA Engine instance via HTTP/HTTPS. In this way, models and data for one client never share the same engine with other clients.
The ADAPA Control Center
In order to have ADAPA readily available on Amazon EC2, we built the ADAPA Control Center application, which allows the user to launch and manage all ADAPA instances from a single location (see figure below).
Our service easily scales together with the client’s organizational needs for more power and predictive analytics resources. From the ADAPA Control Center, one can launch new as well as terminate existing instances. Amazon EC2 offers several instance types to address different processing needs. These are: small, large, extra-large, and high-CPU (medium and extra-large) as well as high-memory (extra-large, double extra-large, and quadruple extra-large). Also, whenever an instance is no longer necessary, it can be terminated in a matter of seconds.
The ADAPA Console
Each instance executes a single version of the ADAPA engine, which can be easily accessed through the Control Center. The engine itself is accessible through the ADAPA Console which allows for the easy managing of predictive models and data files. The instance owner can use the console to upload new models as well as score or classify records on data files in batch mode. Real-time execution of models is achieved through the use of web-services. The ADAPA Console offers a very intuitive interface which is divided into two main sections: model and data management. These allow for existing models to be used for generating decisions on different data sets. Also, new models can be easily uploaded and existing models can be removed in a matter of seconds.
Using a SaaS solution to break down traditional barriers that currently slow the adoption of predictive analytics, our strategy translates predictive models into operational assets with minimal deployment costs and leverages the inherent scalability of utility computing.
In summary, ADAPA revolutionizes the world of predictive analytics, since it allows for:
- Cost-effective and reliable service based on Amazon’s EC2 infrastructure
- Secure execution of predictive models through dedicated and controlled instances including HTTPS and Web-Services security
- On-demand computing. Choice of instance type (small, large, and extra-large) and launch of multiple instances.
- Superior time-to-market by providing rapid deployment of predictive models and an agile enterprise decision management environment.
Friday, May 9, 2008
- none (default strategy)
For information on each strategy, please visit, for example, the PMML 3.2 Decision Trees specification page at the Data Mining Group website.
Thursday, May 8, 2008
Amazon EC2 is a web service that provides resizable compute capacity in the Cloud. It is designed to make web-scale computing easier. Amazon EC2 provides you with complete control of your computing resources and lets you run on Amazon's proven computing environment.
By utilizing the ADAPA Control Center, you can launch and terminate a new ADAPA instance in minutes which allows you to quickly scale capacity, both up and down, as your computing requirements change.
Finally, offering ADAPA on Amazon EC2 as a service changes the economics of predictive analytics by allowing you to pay only for the computing that you actually use. What a concept ... huh?
Yes, decision trees are part of the modeling elements supported by ADAPA (to see a list of all techniques click here).
You can build your decision tree model with different training algorithms, export the tree as a PMML file (or convert the resulting model to PMML), and upload it into ADAPA for decisioning.
As for modeling techniques, ADAPA currently supports the following PMML elements:
- Neural Networks
- Support Vector Machines
- Association Rules
- General Regression
- Decision Trees
- Clustering Models
- Naive Bayes Classifiers
- Ruleset Models
- Multiple models (ensembles, segmentation, and model composition)
If you are unsure about what a PMML element represents, please check the DMG (Data Mining Group) webpage which defines PMML. Also, take a look at the feature list for the ADAPA Predictive Analytics Engine for all the PMML elements supported.
Wednesday, April 30, 2008
Tuesday, April 29, 2008
Wednesday, April 16, 2008
1) The model ADAPA loaded and executed may be different from the model you built in your development environment. This may reflect a problem with ADAPA or one of the issues described below.
2) It may be the case that the PMML file you got out of your model development environment does not really represent all aspects of the model or is semantically problematic.
In both cases, you can try to follow ADAPA's decisions by clicking on the row id for the record you want to look at in the ADAPA Console (see figure below - orange arrow points to row id 3). The row id is a hyperlink and will allow you to download a text file containing a log of computations. This may be very helpful in determining why ADAPA generated the value(s) it did.
Also, the problem may have to do with your data validation file itself. It may be the case that you generated your model in SPSS, for example, exported it as a PMML file and uploaded it into ADAPA. So far so good, but how about the data? If you saved your data in SPSS as well, you have to make sure you saved the expected value or prediction with the correct name. SPSS usually calls this value "PRE_1." You will need to change the name of this variable to the name of the predicted variable defined in the PMML file. Also, if your data contains the original target used to build the model, you will need to rename it to something different than the predicted variable. Your new predicted variable now should be the predicted result you got out of SPSS or any model development environment you used to score the data in the first place.
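The renaming described above can be scripted. Below is a minimal Python sketch that assumes the validation data was saved as a CSV file with a header row; the function name, the `_original` suffix and the sample column names are assumptions made for illustration.

```python
import csv
import io


def rename_validation_columns(csv_text, predicted_name, spss_name="PRE_1",
                              suffix="_original"):
    """Rename header columns so the saved prediction carries the PMML predicted name.

    - The column holding the tool's prediction (e.g. "PRE_1") becomes `predicted_name`.
    - A column already called `predicted_name` (the original target) gets `suffix`
      appended so the two no longer clash.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    new_header = []
    for column in header:
        if column == predicted_name:
            new_header.append(column + suffix)
        elif column == spss_name:
            new_header.append(predicted_name)
        else:
            new_header.append(column)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(new_header)
    writer.writerows(body)
    return out.getvalue()


# Hypothetical validation data: "class" is the original target, "PRE_1" the saved prediction.
sample = "petal_length,class,PRE_1\n1.4,setosa,setosa\n"
print(rename_validation_columns(sample, predicted_name="class"))
```

Only the header row changes; the data rows are written back untouched.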
Monday, April 7, 2008
Friday, March 21, 2008
Wednesday, March 19, 2008
Wednesday, March 12, 2008
This happens usually when the PMML tag on the top of the file refers to http://xml.spss.com/spss/spss-logreg instead of http://www.dmg.org/PMML-3_1.
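A quick way to see which namespace the root element of a file declares is sketched below in Python. The helper names and the two sample documents are assumptions for illustration; the check simply tests whether the namespace starts with the DMG prefix quoted above.

```python
import xml.etree.ElementTree as ET

DMG_NAMESPACE_PREFIX = "http://www.dmg.org/PMML-"


def root_namespace(pmml_text):
    """Return the namespace URI declared on the root element, or "" if none."""
    root = ET.fromstring(pmml_text)
    if root.tag.startswith("{"):
        return root.tag[1:root.tag.index("}")]
    return ""


def uses_dmg_namespace(pmml_text):
    """True when the root element lives in a DMG PMML namespace."""
    return root_namespace(pmml_text).startswith(DMG_NAMESPACE_PREFIX)


# Two hypothetical one-element documents for illustration.
dmg_file = '<PMML version="3.1" xmlns="http://www.dmg.org/PMML-3_1"/>'
spss_file = '<PMML version="3.1" xmlns="http://xml.spss.com/spss/spss-logreg"/>'

print(uses_dmg_namespace(dmg_file), uses_dmg_namespace(spss_file))
```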
Whenever a file containing an unreadable character is uploaded to the PMML Converter, an error will be produced with the following message: "This is not an xml file". In cases like this, we suggest you locate any unreadable characters and delete them before conversion. If you have more specific information about this problem, please let us know.
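One possible way to locate such characters is sketched below in Python. It assumes that "unreadable" means characters outside the range allowed by XML 1.0 (for example, stray control characters); other causes, such as a wrong file encoding, are not covered. The function name is hypothetical.

```python
import re

# Characters permitted by XML 1.0: tab, line feed, carriage return,
# and the ranges U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
_ILLEGAL_XML_CHAR = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text):
    """Return (position, code point in hex) for each character XML 1.0 forbids."""
    return [(m.start(), hex(ord(m.group()))) for m in _ILLEGAL_XML_CHAR.finditer(text)]


def strip_illegal_xml_chars(text):
    """Return a copy of `text` with the forbidden characters removed."""
    return _ILLEGAL_XML_CHAR.sub("", text)


# Hypothetical damaged content containing a NUL and a vertical-tab character.
damaged = "<PMML>\x00ok\x0b</PMML>"
print(find_illegal_xml_chars(damaged))
print(strip_illegal_xml_chars(damaged))
```

Reviewing the reported positions before deleting anything is the safer route, since a removed character may have been part of a field name or value.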
Wednesday, February 27, 2008
Friday, February 22, 2008
Yes, you can. SAS Enterprise Miner exports PMML for a variety of modeling techniques, including neural networks. Please note that, depending on the version of Enterprise Miner you have, models are exported in an older version of PMML (PMML 2.1 or 3.1). These are automatically converted to the latest version of PMML once uploaded into ADAPA.
If you only have the base SAS product, you will probably need to export your model to PMML by writing your own script. Feel free to contact us for tips and help on how to do that.
Thursday, February 21, 2008
In SPSS Statistics version 16 and beyond, you can export PMML for neural networks (back-propagation and radial-basis) by selecting the Export tab on the model building menu. Note that scaling of numerical variables and dummy-fication of categorical variables is expressed in the resulting PMML file under the TransformationDictionary element.
Tuesday, February 19, 2008
For example, the transformation element NormContinuous can be used to implement simple normalization functions such as the z-score transformation (X - m ) / s, where m is the mean value and s is the standard deviation.
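NormContinuous describes a piecewise linear mapping through pairs of original and normalized values. The Python sketch below shows, under the assumption that two such pairs are used, how the points (0, -m/s) and (m, 0) reproduce the z-score. The function names and the sample numbers are made up for illustration.

```python
def linear_norm(x, point_a, point_b):
    """Map x along the straight line through two (orig, norm) points."""
    (orig_a, norm_a), (orig_b, norm_b) = point_a, point_b
    slope = (norm_b - norm_a) / (orig_b - orig_a)
    return norm_a + (x - orig_a) * slope


def zscore_points(m, s):
    """Two (orig, norm) points that encode (x - m) / s.

    Assumes m != 0; with m == 0 the two points coincide and a different
    pair (e.g. orig = 0 and orig = 1) would be needed.
    """
    return (0.0, -m / s), (m, 0.0)


# Hypothetical mean and standard deviation for one input field.
m, s = 5.8, 0.8
point_a, point_b = zscore_points(m, s)

x = 7.4
print(linear_norm(x, point_a, point_b), (x - m) / s)
```

Both printed values agree up to floating-point rounding, which is the check one would run before writing the two points into a transformation.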
Please refer to the transformations page of the DMG website for PMML examples.
Monday, February 11, 2008
Yes, you can. ADAPA supports deployment of multiple models.
Once your models are uploaded successfully, they are ready to be used, either in batch mode via the ADAPA Console and the Excel add-in or in real time through Web Services.
If you have a model already in place, ADAPA will throw an error if you try to upload another model with the same name (the name of a model is specified inside the PMML file in its model element).
ADAPA also supports composing of multiple models into a single model. This important feature supports a variety of model composition cases such as model segmentation, composition, ensembles, and chaining.
For examples and instructions on how to represent model composition in PMML and ADAPA, please refer to the book "PMML in Action" available at amazon.com.
If an input value is missing in a given data record and the value is part of an active PMML input variable (as defined in the Mining Schema element), then ADAPA will try to replace the missing value by the replacement value specified in the Mining Schema. So, if you get a score back for a data record containing missing values, that's because ADAPA is replacing the missing values by the replacement values specified in your PMML file.
I mentioned "try" before because you may have not specified a replacement value in the mining schema. If that is the case, ADAPA will not produce a score for the given data record with missing data.
This is slightly different than what is implied by PMML itself (see the mining schema PMML element), but we feel it gives the user better control over what ADAPA should do in case of missing values. In this way, if your model is a neural network model, for example, you will need to explicitly define the replacement value to be zero for every input if that is what you want. This is in contrast to having ADAPA do that in an automatic way for every type of modeling technique.
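The behavior described above can be summarized in a short Python sketch. It illustrates the rule only and is not ADAPA's code: the function name, the dictionary layout and the sample field names are assumptions. In a PMML file, the replacement value corresponds to the missingValueReplacement attribute of a MiningField.

```python
def prepare_record(record, replacements):
    """Fill missing inputs from `replacements`, or return None if that is impossible.

    `replacements` maps each active input field to its replacement value,
    or to None when the mining schema specifies no replacement.
    Returning None stands for "no score is produced for this record".
    """
    prepared = {}
    for field, replacement in replacements.items():
        value = record.get(field)
        if value is None or value == "":
            if replacement is None:
                return None
            value = replacement
        prepared[field] = value
    return prepared


# Hypothetical mining schema: only petal_width has a replacement value.
schema = {"petal_length": None, "petal_width": 0.0}

print(prepare_record({"petal_length": 1.4, "petal_width": None}, schema))
print(prepare_record({"petal_length": None, "petal_width": 0.2}, schema))
```

The first record is completed with the replacement value and can be scored; the second has a missing input with no replacement defined, so no score would be produced.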
Click here to learn more on how missing values are handled in decision trees.
Friday, February 8, 2008
As a scoring engine, ADAPA supports the PMML standard (versions 2.0 to 4.2). In this way, different data mining models can be uploaded into the engine and executed in real-time or batch mode.
For a more detailed list of features, feel free to take a look at the ADAPA page. If you are still unsure about any of the features or would like to learn more about it, drop us a note or give us a call. You can find our contact information in the contacts page of the Zementis website.
In PMML, activation functions are divided into two groups. Group 1 contains the following functions:
Note that if you export regression models from SPSS, these will be in the general regression format.
Thursday, February 7, 2008
After the model is trained, a file will be created in the specified location containing a PMML representation of your linear regression model. A similar sequence of actions and results should work for Multinomial Logistic models.
Important things to notice about the PMML file:
1) The model is represented as a general regression PMML element;
2) For SPSS versions up to version 14, data transformations are not part of the PMML file. Therefore, you will need to add any data transformations manually. You can use the Transformation Generator tool to graphically design your transformations and then paste the resulting PMML code into your file.