Showing posts with label PMML.

Tuesday, May 20, 2008

What is the scoop behind PMML and Amazon EC2?

Organizations increasingly recognize the value that predictive analytics offers to their business. However, the complexity of developing, integrating, and deploying predictive models is often considered cost-prohibitive for many projects. In light of mature open source solutions, open standards, and SOA principles, we propose an agile model development life cycle that allows us to quickly leverage predictive analytics in operational environments.

Starting with data analysis and model development, you can use the Predictive Model Markup Language (PMML) standard to move complex decision models from the scientist's desktop into a scalable production environment hosted on the Amazon Elastic Compute Cloud (Amazon EC2).

Expressing Models in PMML

PMML is an XML-based language used to define predictive models. It was specified by the Data Mining Group, an independent group of leading technology companies including Zementis. By providing a uniform standard to represent such models, PMML allows for the exchange of predictive solutions between different applications and various vendors.
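To make the structure concrete, here is a minimal, hand-written PMML 3.2 skeleton (a one-input linear regression invented for illustration, not one of the Zementis examples) and a short Python sketch that reads it with the standard library's ElementTree. A PMML document has a root `PMML` element with a `version` attribute, a `DataDictionary` declaring the fields, and one or more model elements; the helper names `summarize` and `local_name` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written PMML 3.2 skeleton for illustration only: y = 1.0 + 2.0 * x
PMML_DOC = """<?xml version="1.0"?>
<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
  <Header copyright="example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-3_2"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize(pmml_text):
    """Return the PMML version, the declared field names, and the model element names."""
    root = ET.fromstring(pmml_text)
    fields = [f.get("name") for f in root.findall("pmml:DataDictionary/pmml:DataField", NS)]
    # Model elements sit directly under <PMML>, next to Header and DataDictionary.
    non_models = {"Header", "DataDictionary", "TransformationDictionary"}
    models = [local_name(c.tag) for c in root if local_name(c.tag) not in non_models]
    return root.get("version"), fields, models


version, fields, models = summarize(PMML_DOC)
print(version, fields, models)  # 3.2 ['x', 'y'] ['RegressionModel']
```

Because the model is plain XML, any application that understands the standard can read the same file, which is what makes the exchange between vendors possible.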

Open source statistical tools such as R can be used to develop data mining models from historical data. R can export these models to PMML, and the resulting files can then be imported into an operational decision platform, ready for production use within minutes.

On-Demand Predictive Analytics

Amazon EC2 provides a reliable, on-demand infrastructure on which we offer the ADAPA® (Adaptive Decision And Predictive Analytics) decision engine as Software as a Service (SaaS). ADAPA Predictive Analytics Edition imports models expressed in PMML and executes them either in batch mode or in real time via web services.

Our service is implemented as a private, dedicated Amazon EC2 instance of the ADAPA® Predictive Analytics Edition. Each client accesses his or her own ADAPA® engine instance via HTTP/HTTPS, so one client's models and data never share an ADAPA® engine with those of other clients.

The ADAPA Control Center

To make ADAPA readily available on Amazon EC2, we built the ADAPA Control Center, an application that allows users to launch and manage all of their ADAPA instances from a single location (see figure below).



An ADAPA instance contains all the functionality of the ADAPA Predictive Analytics Edition, and the service scales with the client's need for more processing power and predictive analytics resources. From the ADAPA Control Center, one can launch new instances as well as terminate existing ones. Up to 20 instances can be deployed at any one time, and Amazon EC2 offers three instance types to address different processing needs: small, large, and extra-large. Whenever an instance is no longer needed, it can be terminated in a matter of seconds.
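The launch rule above can be expressed as a small check. This is a hypothetical Python sketch, not part of the ADAPA Control Center: the function name `can_launch` is invented, and the instance types and the limit of 20 are taken from this post as it describes Amazon EC2, not from EC2's current catalog.

```python
# Values taken from the description above; they are not queried from EC2.
INSTANCE_TYPES = {"small", "large", "extra-large"}
MAX_INSTANCES = 20


def can_launch(running, requested, instance_type):
    """Return True if `requested` new instances fit within the instance limit."""
    if instance_type not in INSTANCE_TYPES:
        raise ValueError(f"unknown instance type: {instance_type}")
    if requested < 1:
        raise ValueError("requested must be at least 1")
    return running + requested <= MAX_INSTANCES


print(can_launch(18, 2, "large"))  # True
print(can_launch(19, 2, "small"))  # False
```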

The ADAPA Console

Each instance runs a single version of the ADAPA Predictive Analytics engine, which can be reached through the Control Center. The engine itself is accessed through the ADAPA Console, which makes it easy to manage predictive models and data files. The instance owner can use the console to upload new models as well as to score or classify the records in data files in batch mode. Real-time execution of models is achieved through web services. The ADAPA Console offers an intuitive interface divided into two main sections: model management and data management. Together, these allow existing models to be used to generate decisions on different data sets. New models can be uploaded, and existing models removed, in a matter of seconds.


By using a SaaS solution to break down the traditional barriers that slow the adoption of predictive analytics, our strategy turns predictive models into operational assets with minimal deployment costs and leverages the inherent scalability of utility computing.

In summary, ADAPA revolutionizes the world of predictive analytics, since it allows for:

  • Cost-effective and reliable service based on Amazon’s EC2 infrastructure

  • Secure execution of predictive models through dedicated and controlled instances including HTTPS and Web-Services security

  • On-demand computing. Choice of instance type (small, large, and extra-large) and launch of multiple instances.

  • Superior time-to-market by providing rapid deployment of predictive models and an agile enterprise decision management environment.

Friday, May 9, 2008

How does ADAPA handle missing values for Decision Trees?

PMML 3.2 offers six different strategies for handling missing values in Decision Trees, and ADAPA supports all of them. These are:

  • lastPrediction

  • nullPrediction

  • defaultChild

  • weightedConfidence

  • aggregateNodes

  • none (default strategy)


For information on each strategy, please visit the PMML 3.2 Decision Trees specification page at the Data Mining Group website.
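To illustrate how three of these strategies differ, here is a minimal Python sketch of tree traversal with three-valued predicates, where a predicate is UNKNOWN (`None`) when its input is missing. It is a simplified reading of the PMML 3.2 semantics, not ADAPA's implementation: `Node`, `simple_predicate`, and `score` are hypothetical names, `weightedConfidence`, `aggregateNodes`, and `none` are left out, the default child is referenced by list index rather than by Node id, and returning the current node's score when no child matches is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# A predicate returns True, False, or None (UNKNOWN, because an input is missing).
Predicate = Callable[[dict], Optional[bool]]


def always_true(record: dict) -> Optional[bool]:
    return True


def simple_predicate(name: str, op: str, value: float) -> Predicate:
    """Rough analogue of a PMML SimplePredicate; UNKNOWN when the field is missing."""
    ops = {"lessThan": lambda x: x < value, "greaterOrEqual": lambda x: x >= value}

    def pred(record: dict) -> Optional[bool]:
        x = record.get(name)
        return None if x is None else ops[op](x)

    return pred


@dataclass
class Node:
    score: str
    predicate: Predicate = always_true
    children: List["Node"] = field(default_factory=list)
    default_child: Optional[int] = None  # list index here; PMML uses the child's id


def score(root: Node, record: dict, strategy: str) -> Optional[str]:
    node = root
    while node.children:
        chosen = None
        for child in node.children:
            result = child.predicate(record)
            if result is None:  # predicate is UNKNOWN
                if strategy == "lastPrediction":
                    return node.score  # stop and return the score of the last node reached
                if strategy == "nullPrediction":
                    return None  # stop and return no prediction
                if strategy == "defaultChild":
                    if node.default_child is None:
                        raise ValueError("defaultChild strategy needs a default child")
                    chosen = node.children[node.default_child]
                    break  # continue with the designated default child
                raise NotImplementedError(strategy)
            if result:
                chosen = child
                break
        if chosen is None:
            return node.score  # assumption: no child matched, keep the current score
        node = chosen
    return node.score


tree = Node("medium", default_child=1, children=[
    Node("high", simple_predicate("age", "lessThan", 30)),
    Node("low", simple_predicate("age", "greaterOrEqual", 30)),
])
for strategy in ("lastPrediction", "nullPrediction", "defaultChild"):
    print(strategy, score(tree, {}, strategy))  # medium, None, low respectively
```

With the `age` input missing, the same tree yields three different answers, which is why the strategy is worth choosing deliberately when exporting a tree.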

Thursday, May 8, 2008

Does ADAPA support decision trees?

Yes, decision trees are the latest addition to the suite of modeling techniques supported by ADAPA (to see a list of all techniques click here).

You can build your decision tree model with different training algorithms, export the tree as a PMML file (or convert the resulting model to PMML), and upload it into ADAPA for decisioning.

Does ADAPA support all modeling techniques specified in PMML?

We are constantly working towards that goal. Currently, ADAPA supports several PMML elements, including pre- and post-processing elements.

As for modeling techniques, it supports the following PMML elements:
  • Neural Networks

  • Support Vector Machines

  • Regression

  • General Regression

  • Decision Trees

If you are interested in using ADAPA but the PMML element you use is not listed here, feel free to contact us.
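If you want to check a file before uploading it, one quick way is to look at the model elements directly under the root `PMML` element. The Python sketch below assumes the supported set is exactly the five techniques listed above, whose PMML element names are `NeuralNetwork`, `SupportVectorMachineModel`, `RegressionModel`, `GeneralRegressionModel`, and `TreeModel`. The helper `unsupported_models` is hypothetical and does not replace ADAPA's own validation.

```python
import xml.etree.ElementTree as ET

# PMML element names for the techniques listed in this post.
SUPPORTED = {"NeuralNetwork", "SupportVectorMachineModel", "RegressionModel",
             "GeneralRegressionModel", "TreeModel"}
# Top-level PMML elements that are not models.
NON_MODEL = {"Header", "MiningBuildTask", "DataDictionary",
             "TransformationDictionary", "Extension"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def unsupported_models(pmml_text):
    """Return the names of top-level model elements outside the SUPPORTED set."""
    root = ET.fromstring(pmml_text)
    names = [local_name(child.tag) for child in root]
    return [n for n in names if n not in NON_MODEL and n not in SUPPORTED]


doc = ('<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">'
       '<Header/><DataDictionary/><ClusteringModel/></PMML>')
print(unsupported_models(doc))  # ['ClusteringModel']
```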

If you are unsure about what a PMML element represents, please check the DMG (Data Mining Group) web page that defines PMML 3.2 (the latest version of PMML). Also, take a look at the Zementis ADAPA Predictive Analytics page, which lists the modeling techniques that ADAPA supports through the PMML elements listed above.

Friday, March 21, 2008

Even when my model uploads successfully, I still get warnings. Is that OK?

Yes, it is OK. Warnings are generated whenever ADAPA finds inconsistencies in the PMML file that do not affect scoring. However, you should check all warnings even if you get a perfect score match, i.e., even if your model validates fine.

For example, if you upload the model "Iris_SVM" from the PMML Examples page of the Zementis website, you will get warnings. If you click on the "warnings" hyperlink in the ADAPA Predictive Analytics demo, you will get back a file named "uploadedFile.xml". If you open this file in your XML editor, you will see the PMML model you just uploaded, with all detected warnings embedded in it as PMML extensions. ADAPA also adds a comment at the top of the file summarizing its findings. For the "Iris_SVM" model, it reads as follows:

Comment generated by ADAPA: There are at least 6 warnings in this PMML document.
Detailed information can be found as comments or as Extension elements embedded in the appropriate locations within this document.


The first warning concerns the timestamp, which is not in the right format. For all other warnings, look for PMML extensions with name = "WARNING".
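Rather than scanning the returned file by eye, you can list the warning extensions programmatically. The Python sketch below finds every `Extension` element whose `name` is "WARNING" and reports the element it is attached to. Only the `name = "WARNING"` convention comes from this post; the `value` attribute in the sample is an assumption about where a warning's text might be stored, so the hypothetical `list_warnings` helper returns all attributes plus any element text.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def list_warnings(pmml_text):
    """Return (parent element, attributes, text) for each Extension named "WARNING".

    ElementTree's default parser drops XML comments, so the summary comment
    at the top of the file is not included here.
    """
    root = ET.fromstring(pmml_text)
    found = []
    for parent in root.iter():
        for child in parent:
            if local_name(child.tag) == "Extension" and child.get("name") == "WARNING":
                found.append((local_name(parent.tag), dict(child.attrib),
                              (child.text or "").strip()))
    return found


# Hypothetical excerpt; the "value" attribute layout is an assumption.
sample = """<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
  <Header>
    <Extension name="WARNING" value="timestamp not in the expected format"/>
  </Header>
</PMML>"""
for parent, attrs, text in list_warnings(sample):
    print(parent, attrs.get("value"))  # Header timestamp not in the expected format
```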

Monday, March 17, 2008

How do I use the PMML 3.2 Converter?

For a video tutorial on how to use the PMML 3.2 Converter, click here ...

Simple: just "Browse" for your old PMML file (version 2.1, 3.0, or 3.1). Once you find it, click on "Convert". If the model is converted successfully, it will be available for download. Just click on the link provided (see image below) and save the file locally.
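Before converting, it can help to check which PMML version a file declares. The hypothetical Python helper below reads the `version` attribute of the root `PMML` element and sorts the file into one of three buckets, using the source versions the converter accepts according to this post. It only inspects that attribute; checking that the file is otherwise valid PMML is left to the converter.

```python
import xml.etree.ElementTree as ET

# Source versions the PMML 3.2 Converter accepts, according to this post.
CONVERTIBLE = {"2.1", "3.0", "3.1"}


def conversion_status(pmml_text):
    """Classify a file as 'ready' (3.2), 'convert' (2.1/3.0/3.1), or 'unsupported'."""
    version = ET.fromstring(pmml_text).get("version")
    if version == "3.2":
        return "ready"
    if version in CONVERTIBLE:
        return "convert"
    return "unsupported"


print(conversion_status('<PMML version="3.1" xmlns="http://www.dmg.org/PMML-3_1"/>'))  # convert
```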



You can then use your new PMML 3.2 file in ADAPA. Check the ADAPA Predictive Analytics demo on the Zementis website.

Note that the converter expects valid PMML files. Auto-generated PMML code may sometimes contain elements that are not valid PMML. In that case, the converter will display a message stating that your file does not conform to the PMML specification. You can review the identified problems by clicking on the "details" hyperlink: each error encountered has a specific comment generated by the converter, and at the very top of the file a summary comment states the total number of problems encountered during conversion. Use this information to obtain a valid PMML file before attempting the conversion again.

For supported PMML elements as well as known issues with auto-generated PMML code, check our other blog posts under the "PMML Converter" label.

How do I use the ADAPA Predictive Analytics demo?

For a video tutorial on how to use the ADAPA Predictive Analytics demo, click here ...

When you first launch the ADAPA Predictive Analytics demo, you will see a single tab entitled "New Model". This is where you upload your PMML 3.2 model: browse for your model file and, once you locate it, click on "Upload".

In the example below, the model named "Audit_NN" was uploaded successfully into ADAPA. This model can be found among the PMML examples supplied on the Zementis website.



You then need to click on the model tab, in this case "Audit_NN", to upload a data file for verification or scoring. Note that the tab contains the model description together with a link to a file listing any warnings ADAPA generated while uploading the model.



Note that the ADAPA Predictive Analytics demo only accepts uploaded files of up to 1 MB.

You can now browse for the data validation file, in this case "Audit_NN.csv", which contains 2000 records with inputs as well as the expected output value for the "Audit_NN" model. This file is also available on the PMML examples page of the Zementis website.

Once the data file is uploaded and results are generated, the ADAPA Predictive Analytics demo will display the first five records as shown below.



Note that in this case, there is a perfect match between computed and expected values for the predicted field. If, for any reason, the results do not match, the first five mismatched records will be displayed instead. All values under the "Computed Value" column are hyperlinks. By clicking on one of these values, you will have access to debugging information which can be very useful in case you do get a mismatch.
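The comparison the demo performs can be sketched as follows. This is a hypothetical Python outline, not ADAPA code: `stand_in_model` replaces whatever actually produces the computed value, the target column name `TARGET` is invented, and comparing numeric values with a small tolerance (and everything else as strings) is an assumption of this sketch.

```python
import csv
import io


def same(computed, expected, tol=1e-6):
    """Compare numerically when both values parse as numbers, otherwise as strings."""
    try:
        return abs(float(computed) - float(expected)) <= tol
    except (TypeError, ValueError):
        return str(computed) == expected


def validate(rows, model, target, limit=5):
    """Return (mismatch count, first `limit` mismatched rows with the computed value added)."""
    count, shown = 0, []
    for row in rows:
        inputs = {k: v for k, v in row.items() if k != target}
        computed = model(inputs)
        if not same(computed, row[target]):
            count += 1
            if len(shown) < limit:
                shown.append({**row, "Computed Value": computed})
    return count, shown


def stand_in_model(record):
    """Hypothetical model used only to exercise validate()."""
    return 1 if float(record["x"]) > 0.5 else 0


data = io.StringIO("x,TARGET\n0.9,1\n0.1,0\n0.7,0\n")
count, shown = validate(csv.DictReader(data), stand_in_model, "TARGET")
print(count, shown)  # 1 [{'x': '0.7', 'TARGET': '0', 'Computed Value': 1}]
```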

Once you are able to validate that the model is working as expected, you can browse and upload a file for scoring.

Monday, February 11, 2008

What happens if my data contains records with missing values? Will ADAPA score such records anyway?

If an input value is missing from a given data record and the value belongs to an active PMML input variable (as defined in the mining schema), ADAPA will try to replace the missing value with the replacement value specified in the mining schema. So, if you get a score back for a data record containing missing values, that is because ADAPA replaced the missing values with the replacement values specified in the mining schema.

I said "try" because you may not have specified a replacement value in the mining schema. In that case, ADAPA will not produce a score for the data record with missing data.

This is slightly different from what PMML itself implies (see the mining schema PMML element), but we feel it gives the user better control over what ADAPA should do with missing values. For example, if your model is a neural network and you want missing inputs to be replaced by zero, you need to explicitly define a replacement value of zero for every input. The alternative would be for ADAPA to make that choice automatically for every type of modeling technique.
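The behavior described above can be sketched in Python as follows. `prepare` is a hypothetical helper: it reads `missingValueReplacement` from each active `MiningField` (in PMML, a field without a `usageType` is active), fills in missing inputs, and returns `None` when an active input is missing and has no replacement, meaning no score. Treating an empty string as missing is an assumption about how empty CSV cells arrive, and the sample schema is invented.

```python
import xml.etree.ElementTree as ET

# Invented mining schema: "age" has a replacement value, "income" does not.
SCHEMA = """<MiningSchema xmlns="http://www.dmg.org/PMML-3_2">
  <MiningField name="age" missingValueReplacement="40"/>
  <MiningField name="income"/>
  <MiningField name="risk" usageType="predicted"/>
</MiningSchema>"""


def prepare(record, schema_text):
    """Fill missing active inputs from missingValueReplacement; None means no score."""
    schema = ET.fromstring(schema_text)
    completed = dict(record)
    for mf in schema:
        if mf.tag.rsplit("}", 1)[-1] != "MiningField":
            continue
        if mf.get("usageType", "active") != "active":
            continue  # predicted and supplementary fields are not model inputs
        name = mf.get("name")
        if completed.get(name) in (None, ""):
            replacement = mf.get("missingValueReplacement")
            if replacement is None:
                return None  # no replacement specified: the record is not scored
            completed[name] = replacement
    return completed


print(prepare({"income": "1000"}, SCHEMA))  # {'income': '1000', 'age': '40'}
print(prepare({"age": "30"}, SCHEMA))       # None
```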

Click here to learn more on how missing values are handled in decision trees.

Friday, February 8, 2008

ADAPA refuses to upload my model, what should I do?

Before uploading a PMML file, ADAPA makes sure that it is a valid PMML 3.2 file. During this phase, you may get many syntax errors, and all of them need to be resolved before ADAPA will upload the model. Syntax errors (meaning your model is not valid according to the PMML 3.2 schema) are displayed as embedded comments in your model file; try clicking on the "details" hyperlink and opening the file in an XML editor.

Once your file passes schema validation, it is also checked semantically (ADAPA asks: does this model make sense?). Semantic errors and warnings are displayed in the model file as embedded PMML extensions.

If you are using a licensed version of ADAPA through its web management console, errors and warnings are displayed in the console itself.

Use the information you get back from ADAPA to correct your PMML file and give it another try. If you use a PMML element not currently supported by ADAPA, feel free to let us know. You can find our contact information on the contacts page of the Zementis website.

Does ADAPA support all aspects of the Neural Network PMML element?

Almost all. ADAPA does not support Neural Networks with recurrent connections.

What is PMML and how can I learn more about it?

The Predictive Model Markup Language (PMML) is an XML-based language which provides a standard for applications to define statistical and data mining models and to share models between PMML compliant applications.

Therefore, proprietary issues and incompatibilities should no longer be a barrier to the exchange of models between applications from different vendors.

You can learn more about PMML on the DMG (Data Mining Group) website. You will also find PMML examples and information on PMML exporters and converters on the support page of the Zementis website.

What kind of activation functions for Neural Networks are supported by ADAPA?

ADAPA supports all activation functions listed in PMML 3.2 for the Neural Network model element.

In PMML, activation functions are divided into two groups. Group 1 contains the following functions:
  • threshold
  • logistic
  • tanh
  • identity
  • exponential
  • reciprocal
  • square
  • Gauss
  • sine
  • cosine
  • Elliott
  • arctan
Group 2 contains only one function:
  • radialBasis
For more details, please take a look at the PMML 3.2 Neural Network specification.
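For reference, the Group 1 functions can be written as plain Python functions of a neuron's net input Z, the weighted sum of its inputs plus its bias. This is a sketch based on the formulas in the PMML 3.2 Neural Network specification, not ADAPA's code: `activate` and `neuron_output` are hypothetical names, `threshold` stands in for PMML's threshold attribute, and `radialBasis` is omitted because it computes Z differently.

```python
import math


def activate(name, z, threshold=0.0):
    """Apply a PMML Group 1 activation function to the net input z."""
    functions = {
        "threshold": lambda v: 1.0 if v > threshold else 0.0,
        "logistic": lambda v: 1.0 / (1.0 + math.exp(-v)),
        "tanh": math.tanh,  # equals (1 - exp(-2v)) / (1 + exp(-2v))
        "identity": lambda v: v,
        "exponential": math.exp,
        "reciprocal": lambda v: 1.0 / v,
        "square": lambda v: v * v,
        "Gauss": lambda v: math.exp(-(v * v)),
        "sine": math.sin,
        "cosine": math.cos,
        "Elliott": lambda v: v / (1.0 + abs(v)),
        "arctan": lambda v: 2.0 * math.atan(v) / math.pi,
    }
    return functions[name](z)


def neuron_output(weights, inputs, bias, name):
    """Z = sum(weight_i * input_i) + bias, then the chosen activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activate(name, z)


print(neuron_output([1.0, -1.0], [2.0, 2.0], 0.0, "logistic"))  # 0.5
```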

What types of Neural Network models built with R nnet can I export to PMML?

You can export most of the Neural Network models you build with the R nnet package into PMML 3.2 by using the PMML package available from Togaware. See the link below:

http://rattle.togaware.com/

The PMML package is also available through CRAN.

The function to be used is pmml.nnet. With this function, a PMML representation can be obtained for Neural Networks implementing:

  • multi-class classification
  • binary classification
  • regression
Details you should know:

  1. Scaling of input variables: Since nnet does not automatically scale numerical inputs, you will need to add scaling to the generated PMML file by hand if you plan to use the model to compute scores/results from raw data. Scaling numerical values in PMML is easy; see our blog post on scaling and transformations in PMML for details.
  2. The PMML exporter uses transformations to create dummy variables for categorical inputs. These are expressed in the NeuralInputs element of the resulting PMML file.
  3. PMML does not support the censored variant of softmax.
  4. Given that nnet uses a single output node to represent binary classification, the resulting PMML file contains a discretizer with a threshold set to 0.5.
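Regarding item 1, scaling in PMML is typically expressed with a `NormContinuous` transformation built from `LinearNorm` points, which define a piecewise-linear mapping from raw to normalized values. The Python sketch below shows that mapping for a hypothetical min-max scaling of a raw input from [10, 50] to [0, 1]. Values outside the defined range are extrapolated from the first or last segment, which is intended to mirror PMML's default `outliers="asIs"` treatment; `norm_continuous` is a hypothetical name.

```python
from bisect import bisect_right


def norm_continuous(x, points):
    """Piecewise-linear scaling in the style of PMML NormContinuous/LinearNorm.

    `points` is a list of (orig, norm) pairs sorted by orig.
    """
    if len(points) < 2:
        raise ValueError("need at least two LinearNorm points")
    origs = [p[0] for p in points]
    # Pick the segment containing x, clamped to the first or last segment.
    i = min(max(bisect_right(origs, x) - 1, 0), len(points) - 2)
    (a0, b0), (a1, b1) = points[i], points[i + 1]
    return b0 + (x - a0) * (b1 - b0) / (a1 - a0)


scale = [(10.0, 0.0), (50.0, 1.0)]  # hypothetical raw range [10, 50] -> [0, 1]
print(norm_continuous(30.0, scale), norm_continuous(70.0, scale))  # 0.5 1.5
```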

BTW, any of the models you build in nnet and export using the PMML package can be uploaded directly into ADAPA for scoring.

Thursday, February 7, 2008

How can I export PMML code from older versions of SPSS?

In older versions of SPSS, such as SPSS 11.5, a linear regression model can be exported to PMML through the following sequence of menus: Analyze -> Regression -> Linear... -> Save... This opens the "Linear Regression: Save" box. At the bottom of that box, under "Export model information to XML file", enter the name and location of the PMML file to be written.

After the model is trained, a file will be created in the specified location containing a PMML representation of your linear regression model. A similar sequence of actions and results should work for Multinomial Logistic models.

Important things to notice about the PMML file:

1) The model is represented as a general regression PMML element;

2) For older versions of SPSS (like 11.5), the export is in PMML 2.0. This file will need to be converted to PMML 3.2 before it can be uploaded into ADAPA.

3) For SPSS versions up to version 14, data transformations are not part of the PMML file. Therefore, you will need to add any data transformations manually.