Wednesday, February 27, 2008

Can I use the PMML converter to convert any kind of modeling technique to PMML 3.2?

You can use the PMML converter to validate your PMML file against the specification for versions 2.1, 3.0, 3.1, and 3.2. If validation is not successful, the converter will give you a file back with explanations for why the validation failed (click on the "details" hyper-link).

So, validation must succeed before any conversion takes place; that is, your file needs to conform to the PMML specification as published on the DMG website (for any of the PMML versions listed above).

The PMML converter will only convert the following model elements to PMML 3.2:
  • Neural Networks

  • Decision Trees

  • Support Vector Machines

  • General Regression

  • Regression

It will also convert pre- and post-processing PMML elements.
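The checks described above can be approximated locally before you submit a file. The sketch below is only an illustration and not the converter's actual logic: the function name `precheck` and the heuristic for spotting model elements are assumptions, while the element names are the ones used by the PMML schema for the model types listed above.

```python
import xml.etree.ElementTree as ET

# PMML element names for the model types listed above.
CONVERTIBLE_MODELS = {
    "NeuralNetwork",
    "TreeModel",
    "SupportVectorMachineModel",
    "GeneralRegressionModel",
    "RegressionModel",
}
ACCEPTED_VERSIONS = {"2.1", "3.0", "3.1", "3.2"}


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}TreeModel' -> 'TreeModel'."""
    return tag.rsplit("}", 1)[-1]


def precheck(pmml_text):
    """Return (version, model element names, whether conversion looks feasible).

    Heuristic (an assumption of this sketch): a top-level element whose name
    ends in 'Model' or 'Network' is treated as a model element.
    """
    root = ET.fromstring(pmml_text)
    version = root.get("version")
    models = [local_name(child.tag) for child in root
              if local_name(child.tag).endswith(("Model", "Network"))]
    ok = (version in ACCEPTED_VERSIONS
          and bool(models)
          and all(m in CONVERTIBLE_MODELS for m in models))
    return version, models, ok


# A made-up, minimal PMML skeleton used only to exercise the function.
example = """<PMML version="3.1" xmlns="http://www.dmg.org/PMML-3_1">
  <Header/>
  <DataDictionary/>
  <TreeModel functionName="classification"/>
</PMML>"""

print(precheck(example))
```

This only inspects the version attribute and the model element name; it does not replace schema validation.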

Friday, February 22, 2008

Can I export my Weka model into PMML?

As far as we know, Weka does not export models into PMML automatically. According to a Weka blog, this functionality may be incorporated by the end of 2008.

If you have more precise information about this topic, please feel free to let us know so that it can be incorporated into this blog.

Can I export my SAS model into PMML?

Yes, you can. SAS Enterprise Miner exports PMML 2.1 for a variety of modeling techniques, including neural networks. Please note that given that 2.1 is an older version of PMML, you will need to convert your model to PMML 3.2 before uploading it into ADAPA. For that, you can use the PMML Converter Tool, which is also available as an iGoogle Gadget.

If you only have the base SAS product, you will probably need to export your model to PMML by writing your own script. Feel free to contact us for tips on how to do that.

Thursday, February 21, 2008

How can I export PMML from SPSS?

In newer versions of SPSS (I believe starting with version 14), you can export many different modeling techniques into PMML. For some techniques, it also exports the transformations that are applied to the input data before the model is built.

In SPSS version 16, you can export PMML for neural networks (back-propagation and radial-basis) by selecting the Export tab on the model building menu. Note that scaling of numerical variables and dummy-fication of categorical variables are expressed in the resulting PMML file under the TransformationDictionary element.

SPSS PMML issues:
  • We have noticed a problem with the resulting PMML file whenever data transformations are present. In the PMML schema, the element DerivedField is required to have attributes optype and dataType, which are missing in the SPSS export. If you try to load any model that is not conforming to the PMML schema into ADAPA, it will complain and refuse to upload such a model.
  • Also, SPSS generates PMML code with SPSS-specific tags, such as x-Basis, which are not part of PMML. Such tags wrap model information that is not necessary for your model to run successfully in ADAPA. However, to conform to the PMML schema, they need to be deleted from the PMML file.
  • SPSS generates models without a modelName. Although this is not a required field in PMML, it is convenient to have a name for your model so that it can be managed easily in ADAPA.
Note that newer versions of SPSS export models into PMML version 3.1. Given that ADAPA consumes PMML 3.2, any SPSS export will have to be converted from 3.1 to 3.2 before being loaded into ADAPA.

We have made available a list of converters from older versions of PMML (2.1, 3.0, and 3.1) to version 3.2. These converters also correct the SPSS problems mentioned above. In this way, your SPSS generated models can be uploaded successfully into ADAPA. All converters are available on the Zementis web site through our PMML Converter Tool or in form of an iGoogle gadget.
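If you would like to see which of the SPSS issues listed above affect a given file before converting it, a small scan like the following may help. It is a rough sketch under stated assumptions: `find_spss_issues` is a hypothetical helper, the sample document is made up (including where the x-Basis tag sits), and the converters themselves may work quite differently.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def find_spss_issues(pmml_text):
    """List the three kinds of problems described above, in document order."""
    issues = []
    root = ET.fromstring(pmml_text)
    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "DerivedField":
            # The PMML schema requires both attributes on DerivedField.
            for attr in ("optype", "dataType"):
                if elem.get(attr) is None:
                    issues.append(f"DerivedField {elem.get('name')!r} lacks {attr}")
        elif name.startswith("x-"):
            issues.append(f"vendor-specific tag <{name}>")
    # Heuristic: top-level elements ending in 'Model' or 'Network' are models.
    for child in root:
        name = local_name(child.tag)
        if name.endswith(("Model", "Network")) and child.get("modelName") is None:
            issues.append(f"<{name}> has no modelName")
    return issues


# Made-up fragment for illustration only.
sample = """<PMML version="3.1">
  <TransformationDictionary>
    <DerivedField name="age_scaled"><FieldRef field="age"/></DerivedField>
  </TransformationDictionary>
  <NeuralNetwork functionName="classification">
    <x-Basis/>
  </NeuralNetwork>
</PMML>"""

for issue in find_spss_issues(sample):
    print(issue)
```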

Tuesday, February 19, 2008

How can I represent scaling of numerical variables in PMML?

You can scale numerical variables by using transformations in PMML. These can be part of the TransformationDictionary or LocalTransformations elements. For neural networks, numerical transformations can also be done in the NeuralInputs element (as well as the NeuralOutputs element).

For example, the transformation element NormContinuous can be used to implement simple normalization functions such as the z-score transformation (X - m) / s, where m is the mean value and s is the standard deviation.

Please refer to the transformations page of the DMG website for PMML examples.
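To make the z-score case concrete: NormContinuous interpolates linearly between the points given by its LinearNorm children, so two points on the line (X - m) / s are enough. The sketch below mirrors that idea in Python; the function name, the sample values m = 40 and s = 10, and the choice to extrapolate outside the given points are assumptions made for illustration.

```python
# The PMML fragment this sketch imitates (m = 40, s = 10 are made-up values):
#
#   <NormContinuous field="X">
#     <LinearNorm orig="0"  norm="-4"/>   <!-- (0 - m) / s -->
#     <LinearNorm orig="40" norm="0"/>    <!-- (m - m) / s -->
#   </NormContinuous>


def norm_continuous(x, points):
    """Piecewise-linear interpolation through (orig, norm) points sorted by orig.

    Values outside the covered range are extrapolated from the nearest
    segment (an assumption of this sketch). Needs at least two points.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            break
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


m, s = 40.0, 10.0
z_points = [(0.0, -m / s), (m, 0.0)]

for value in (25.0, 40.0, 55.0):
    # Both columns should agree: interpolation vs. the direct z-score formula.
    print(value, norm_continuous(value, z_points), (value - m) / s)
```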

Monday, February 11, 2008

How many data records can ADAPA produce results for?

ADAPA itself places no limit on the number of data records it can process. However, if you are using the ADAPA Predictive Analytics Demo on our web site or the iGoogle Gadget Demo, the limit is 100 records.

If you need to process more than 100 records at a time or would like to connect to ADAPA through web-services, visit the how-to-buy page on the Zementis website, or contact us directly by phone or e-mail.

The ADAPA Predictive Analytics Edition is now being offered as a service through the Amazon Elastic Compute Cloud (Amazon EC2). Once you subscribe to this service, there is no limit of data records or models that you can upload into ADAPA.

Can I upload multiple models into ADAPA?

Yes, you can. The ADAPA Predictive Analytics Edition and the ADAPA Enterprise Edition support deployment of multiple models.

The limited demo version and the iGoogle Gadget demo only allow you to upload one model at a time. Once your model is uploaded successfully, you can use it to produce results/decisions for your data.

If you have a model already in place, it will be deleted once you upload another model. The new model will then be used to produce results against any new data.

If you need multiple models to be available and/or model segmentation, you will need to purchase a licensed version of ADAPA. Please feel free to contact us at any time if that is the case. Our contact information can be found on the contacts page of the Zementis website.

What happens if my data contains records with missing values? Will ADAPA score such records anyway?

If an input value is missing in a given data record and the value is part of an active PMML input variable (as defined in the mining schema), then ADAPA will try to replace the missing value by the replacement value specified in the mining schema. So, if you get a score back for a data record containing missing values, that's because ADAPA is replacing the missing values by the replacement values specified in the mining schema.

I say "try" because you may not have specified a replacement value in the mining schema. If that is the case, ADAPA will not produce a score for the given data record with missing data.

This is slightly different than what is implied by PMML itself (see the mining schema PMML element), but we feel it gives the user better control over what ADAPA should do in case of missing values. In this way, if your model is a neural network model, for example, you will need to explicitly define the replacement value to be zero for every input if that is what you want. This is in contrast to having ADAPA do that in an automatic way for every type of modeling technique.

To learn more, see our post on how missing values are handled in decision trees.
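The behavior described above can be sketched in a few lines. This is an illustration of the rule, not ADAPA code: `prepare_record` and the sample mining schema are hypothetical, while `usageType` and `missingValueReplacement` are the MiningField attributes defined by PMML.

```python
import xml.etree.ElementTree as ET

# Made-up mining schema: "age" has a replacement value, "income" does not.
MINING_SCHEMA = """<MiningSchema>
  <MiningField name="age" usageType="active" missingValueReplacement="0"/>
  <MiningField name="income" usageType="active"/>
  <MiningField name="risk" usageType="predicted"/>
</MiningSchema>"""


def prepare_record(record, schema_xml):
    """Fill missing active inputs from the mining schema.

    Returns None when a missing input has no replacement value, meaning
    the record would not be scored. Replacement values are kept as the
    strings found in the XML; type conversion is left out of this sketch.
    """
    prepared = dict(record)
    for field in ET.fromstring(schema_xml).iter("MiningField"):
        # usageType defaults to "active" when the attribute is absent.
        if field.get("usageType", "active") != "active":
            continue
        name = field.get("name")
        if prepared.get(name) is None:
            replacement = field.get("missingValueReplacement")
            if replacement is None:
                return None
            prepared[name] = replacement
    return prepared


print(prepare_record({"age": None, "income": 50000}, MINING_SCHEMA))
print(prepare_record({"age": 30, "income": None}, MINING_SCHEMA))
```

The first record is completed with the replacement value for age; the second cannot be completed, so no score would be produced for it.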

How can I sign in for the paid version of ADAPA?

That's easy. First, go to the "How to Buy" page on the Zementis website. On this page, once you click the URL for subscribing to the ADAPA Predictive Analytics Edition, you will be directed to Amazon and will be faced with a page that looks like this:



If you already have an Amazon account, feel free to use it. If not, you will need to create one by selecting "I am a new user." Once you are able to sign in, you will need to subscribe to the Elastic Compute Cloud (EC2) service. The next few pages will walk you through that process. The first page will look like this:



Note that this is just a welcoming page with information about Amazon EC2 and S3 (Simple Storage Service). Go to the bottom of the page and click on "Sign Up for Amazon EC2". Once you do that, you will now be faced with a page that looks like this:



In this page, you will have the chance to review your choices for processing with EC2 and storage with S3 (you do not need S3 to run ADAPA). Click on the icon next to "Amazon Elastic Compute Cloud" to review EC2 options and on the icon next to "Amazon Simple Storage Service" to review S3 options. Also, make sure your payment information is correct by scrolling to the bottom of the page. Once you have reviewed all options, click on "Complete Sign Up." Note that you will have the chance to launch different instance types (small, large, and extra large) later on through the ADAPA Control Center. Once you complete the sign up process for EC2, you will be faced with a page like this:



As the message says, a confirmation note will be sent to your e-mail address. This note will contain information on how to access your Amazon Web Services (AWS) account. It will also contain a hyper link to a page in which you will be able to retrieve your Access Key ID and your Secret Access Key (both needed to run ADAPA - so, keep that in mind). In the current page, you can explore more about what EC2 can offer by clicking on the hyper links provided. Once you are ready to move on, click "Continue." You will now be faced with a page that looks like this:



This is a confirmation page. Review the pricing information for ADAPA. Given that you have not yet used any of the services, your total is still $0.00. Click on "Place your order." You will then be faced with a page containing a header like the one below:



Note that the header shows an activation code. You do not need this code to run ADAPA. Underneath this header, find the ADAPA Control Center. You are now good to go, provided that you have your access keys (check the note you received via e-mail for instructions on how to obtain them). Please check our blog posts on the ADAPA Control Center to learn how to use it to launch ADAPA instances. You can now close the header by clicking on the "close" icon in the top right corner.

Once you come back to Amazon later on and sign in, this header will be shown again, but now it will display the option for you to check and manage your ADAPA subscription. See new header below.



For information on how to manage your subscription, please check our blog on how to manage your EC2 ADAPA subscription. We also have blogs on how to manage your EC2 AWS account. Make sure to check that too.

Thanks for purchasing ADAPA.

Friday, February 8, 2008

ADAPA refuses to upload my model, what should I do?

Before successfully uploading a PMML file, ADAPA will make sure that it is a valid PMML 3.2 file. During this phase, you may get many syntax errors. All errors need to be resolved before ADAPA successfully uploads a model. Syntax errors (your model is not a valid model according to the PMML 3.2 schema) are displayed as embedded comments in your model file - try clicking on the "details" hyperlink and open the file in an XML editor.

Once your file passes schema validation, it will also be semantically checked (ADAPA will ask: does this model make sense?). Semantic errors and warnings are displayed in the model file as embedded PMML extensions.

If you are using a licensed version of ADAPA through its web management console, errors and warnings are displayed in the console itself.

Use the information you get back from ADAPA to correct your PMML file and give it another try. If you use a PMML element not currently supported by ADAPA, feel free to let us know. You can find our contact information on the contacts page of the Zementis website.

How can I test my model once it is successfully uploaded into ADAPA?

This is a good question. Given that you built your model outside of ADAPA, you want to make sure that both ADAPA and your development environment produce exactly the same results.

ADAPA provides an integrated testing process to make sure your model was uploaded and works as expected. It allows for a test file containing from 1 to thousands of records with all the necessary input variables and the expected result for each record to be uploaded for score matching.

This can be done easily through its web management console. After processing the file, ADAPA returns statistics on total amount of matched and unmatched records, percentages, etc. If any records failed the matching test, a complete list of all failed records is displayed. One can then peer through computed information for each record to locate where expected and computed values differed and thus pinpoint the source of the problem.
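The score-matching idea described above can be sketched outside of ADAPA as well. The following is a minimal illustration, assuming numeric scores and a hypothetical tolerance; `match_scores` and the toy scoring function are made up and do not reflect how the console computes its statistics.

```python
def match_scores(records, score_fn, tolerance=1e-6):
    """Compare expected scores with computed ones.

    records: list of (inputs, expected_score) pairs.
    score_fn: callable that maps inputs to a computed score.
    """
    failed = []
    for index, (inputs, expected) in enumerate(records):
        computed = score_fn(inputs)
        if abs(computed - expected) > tolerance:
            failed.append((index, expected, computed))
    total = len(records)
    matched = total - len(failed)
    return {
        "total": total,
        "matched": matched,
        "unmatched": len(failed),
        "percent_matched": 100.0 * matched / total if total else 0.0,
        "failed": failed,
    }


# Toy "model" standing in for the deployed scoring engine.
def toy_score(inputs):
    return 2 * inputs["x"] + 1


test_file = [({"x": 1}, 3.0), ({"x": 2}, 5.0), ({"x": 3}, 8.0)]
report = match_scores(test_file, toy_score)
print(report)
```

Here the third record fails the match, so it appears in the list of failed records together with its expected and computed values.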

PMML also offers a Model Verification element for similar testing purposes. In this way, verification records are part of the PMML file itself. As of now, ADAPA does not support the Model Verification PMML element.

Does ADAPA support all aspects of the Neural Network PMML element?

Almost all. ADAPA does not support Neural Networks with recurrent connections.

What is PMML and how can I learn more about it?

The Predictive Model Markup Language (PMML) is an XML-based language which provides a standard for applications to define statistical and data mining models and to share models between PMML compliant applications.

Therefore, proprietary issues and incompatibilities should no longer be a barrier to the exchange of models between applications from different vendors.

You can learn more about PMML by taking a look at the DMG (Data Mining Group) website. You will also find PMML examples and information on PMML exporters and converters on the support page of the Zementis website.

What is ADAPA anyway?

ADAPA (Adaptive Decision And Predictive Analytics) is intrinsically a decision engine. It combines the power of predictive analytics and business rules to facilitate the tasks of managing and designing automated decision systems.

As a scoring engine, ADAPA supports the PMML standard. In this way, different data mining models can be uploaded into the engine and executed in real-time or batch mode. The predictive analytics capabilities of ADAPA are only a single aspect of the ADAPA offering, which also includes business rules, auditing & reporting, web services, web management console, testing, and more.

There is a whole lot of information about ADAPA posted on different websites. You can get a basic introduction by reading about it on Wikipedia. For a more detailed list of features, feel free to take a look at our company's products page. If you are still unsure about any of the features or would like to learn more about it, drop us a note or give us a call. You can find our contact information on the contacts page of the Zementis website.

What kind of activation functions for Neural Networks are supported by ADAPA?

ADAPA supports the full list of PMML 3.2 activation functions for the Neural Network model element.

In PMML, activation functions are divided into two groups. Group 1 contains the following functions:
  • threshold
  • logistic
  • tanh
  • identity
  • exponential
  • reciprocal
  • square
  • Gauss
  • sine
  • cosine
  • Elliott
  • arctan
Group 2 contains only one function:
  • radialBasis
For more details, please take a look at the PMML 3.2 Neural Network specification.
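As a rough illustration of how a Group 1 activation function is applied, the sketch below computes the output of a single neuron from its weighted inputs. It uses the usual textbook formulas for a subset of the functions listed above; threshold, arctan, and radialBasis are left out because their PMML definitions involve extra parameters or scaling, so please consult the specification for those. The names `ACTIVATIONS` and `neuron_output` are made up for this example.

```python
import math

# Textbook definitions for a subset of the Group 1 functions.
ACTIVATIONS = {
    "logistic": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh": math.tanh,
    "identity": lambda z: z,
    "exponential": math.exp,
    "reciprocal": lambda z: 1.0 / z,
    "square": lambda z: z * z,
    "Gauss": lambda z: math.exp(-(z * z)),
    "sine": math.sin,
    "cosine": math.cos,
    "Elliott": lambda z: z / (1.0 + abs(z)),
}


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return ACTIVATIONS[activation](z)


# Made-up weights: the weighted sum is 0.5 - 0.5 = 0.
print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.0, "logistic"))
print(neuron_output([1.0], [1.0], 0.0, "Elliott"))
```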

What types of Neural Network models built with R nnet can I export to PMML?

You can basically export most of the Neural Network models you build using the R nnet package into PMML 3.2 by using the PMML package available from Togaware. See link below:

http://rattle.togaware.com/

The PMML package is also available through CRAN.

The function to be used is pmml.nnet. With this function, a PMML representation can be obtained for Neural Networks implementing:

  • multi-class classification
  • binary classification
  • regression
Details you should know:

  1. Scaling of input variables: Since nnet does not automatically implement scaling of numerical inputs, you will need to add scaling to the generated PMML file by hand if you are planning to use the model to compute scores/results from raw data. Scaling of numerical values in PMML is easy. See blog on scaling and transformations in PMML for details.
  2. The PMML exporter uses transformations to create dummy variables for categorical inputs. These are expressed in the NeuralInputs element of the resulting PMML file.
  3. PMML does not support the censored variant of softmax.
  4. Given that nnet uses a single output node to represent binary classification, the resulting PMML file contains a discretizer with a threshold set to 0.5.

BTW, any of the models you build in nnet and export using the PMML package can be uploaded directly into ADAPA for scoring.

Does ADAPA support all general regression PMML models?

Yes, ADAPA supports the entire list of general regression PMML model elements. These are:
  • regression
  • generalLinear
  • multinomialLogistic
  • ordinalMultinomial
  • generalizedLinear
ADAPA also supports all the link and cumulative link functions defined in PMML 3.2 for general regression models.

Note that if you export regression models from SPSS, these will be in the general regression format. SPSS versions 15 and 16 export PMML 3.1 and so the PMML file will need to be converted to PMML 3.2 before it is uploaded into ADAPA.

What kind of normalization methods does ADAPA support for the regression PMML element?

ADAPA supports all the PMML normalization methods available for the regression element: softmax, simplemax, logit, probit, cloglog, loglog, exp and cauchit.

Note however that ADAPA currently does not support the ordinal version of these normalization methods, only categorical. Ordinal multinomial regression models are supported by ADAPA through the general regression PMML element.
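For readers unfamiliar with these names, the sketch below shows two of the categorical normalization methods: softmax divides the exponentials of the per-category regression outputs by their sum, and simplemax divides the raw outputs by their sum. The function names and sample values are assumptions for illustration, and subtracting the maximum in softmax is a numerical-stability detail of this sketch rather than something taken from the specification.

```python
import math


def softmax(outputs):
    """exp(y_j) / sum_i exp(y_i), computed in a numerically stable way."""
    largest = max(outputs)
    exps = [math.exp(y - largest) for y in outputs]
    total = sum(exps)
    return [e / total for e in exps]


def simplemax(outputs):
    """y_j / sum_i y_i; only meaningful when the outputs are non-negative."""
    total = sum(outputs)
    return [y / total for y in outputs]


# Made-up regression outputs for three target categories.
raw = [1.0, 1.0, 2.0]
print(softmax(raw))
print(simplemax(raw))
```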

Extending the SVM element in PMML to allow for multiclass-classification using the one-against-one approach in ADAPA.

For multiclass-classification with k classes, k > 2, the R ksvm function uses the one-against-one approach, in which k(k-1)/2 binary classifiers are trained; the appropriate class is found by a voting scheme.

In order to implement such a scheme in ADAPA, we needed to extend PMML 3.2. Basically, PMML asks for a single target category to be associated with each Support Vector Machine. In case of a binary classifier, PMML actually asks for the alternate binary target category.

So, in order to implement the one-against-one approach, we needed to give each machine an extra alternate target category given that all k(k-1)/2 machines are binary classifiers.

Note that ADAPA also supports one-against-all approach (also known as one-against-rest) for which the PMML extension is not necessary.

Voting schemes for multiclass-classification problems in SVM are described in:

C.-W. Hsu and C.-J. Lin
A comparison on methods for multi-class support vector machines
IEEE Transactions on Neural Networks, 13(2002) 415-425.
http://www.csie.ntu.edu.tw/~cjlin/papers/multisvm.ps.gz
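The one-against-one voting scheme described above can be sketched as follows. This is a simplified illustration and not ADAPA's or ksvm's implementation: each binary classifier is replaced by a made-up nearest-centroid rule on a single number, and ties are broken by whichever class received its first vote earliest.

```python
from collections import Counter
from itertools import combinations


def one_against_one(classes, decide, x):
    """Run one binary decision per pair of classes and return the top-voted class.

    decide(a, b, x) must return either a or b. For k classes there are
    k(k-1)/2 pairs, hence k(k-1)/2 votes.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[decide(a, b, x)] += 1
    winner = votes.most_common(1)[0][0]
    return winner, dict(votes)


# Toy stand-in for the binary machines: pick the class with the nearer centroid.
centroids = {"A": 0.0, "B": 5.0, "C": 10.0}


def nearest_centroid(a, b, x):
    return a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b


print(one_against_one(["A", "B", "C"], nearest_centroid, 6.0))
```

With three classes, three binary decisions are made; for the input 6.0, class B wins two of them.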

What types of regression models built with R can I export to PMML?

Quick answer:
  • Linear Regression
  • Binary Logistic Regression
You can basically export any linear regression models you build using the R glm function and the gaussian family into PMML 3.2 by using the PMML package available from Togaware. See link below:

http://rattle.togaware.com/

The PMML package is also available through CRAN.

The function to be used is named pmml.lm. The original version only allowed for the exporting of linear regression models. We have extended it to also export binary logistic regression models built using the R function glm and the binomial family.

The following example trains a binary logistic regression model for the audit dataset and exports its equivalent PMML 3.2 code:

library(pmml)  # provides pmml.lm
audit <- read.csv(file("http://rattle.togaware.com/audit.csv"))
# Fit a binary logistic regression on the selected columns of the audit data
binlog <- glm(Adjusted ~ ., data=audit[,c(2:8,10:13)], family=binomial(logit))
pmml.lm(binlog)

Note that this function does not support multinomial logistic regression models or any other regression models built using the VGAM package.

What types of SVM models built with R ksvm can I export to PMML?

You can basically export any SVM models you build using the ksvm function (from the R kernlab package) into PMML 3.2 by using the PMML package available from Togaware. See link below:

http://rattle.togaware.com/

The PMML package is also available through CRAN.

The function to be used is pmml.ksvm. With this function, a PMML representation can be obtained for SVMs implementing:

  • multi-class classification
  • binary classification
  • regression

Note that it also implements transformations for the input variables by following the scaling scheme used by ksvm. It also uses transformations to create dummy variables for any categorical inputs.

We have encountered an issue with ksvm while building the dummy-fication piece. Basically, except for the first categorical variable in a model, all other categorical variables lose their first input category. That is, ksvm does not create a dummy variable for the first category. We have already pointed this out to the author of ksvm. For now, the PMML export code mimics this issue so that you can get a match during scoring.
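To picture the dummy-fication behavior just described, the sketch below lists which dummy variables would exist for a set of categorical inputs: all categories for the first variable, and all but the first category for every later one. It only illustrates the description above; the function name and the variable and category names are made up, and it is not derived from the ksvm source.

```python
def dummy_columns(categorical_levels):
    """Return the dummy variable names implied by the behavior described above.

    categorical_levels: dict mapping each categorical variable (in model
    order) to its ordered list of categories.
    """
    columns = []
    for position, (variable, levels) in enumerate(categorical_levels.items()):
        # Only the first categorical variable keeps its first category.
        kept = levels if position == 0 else levels[1:]
        columns.extend(f"{variable}={level}" for level in kept)
    return columns


# Hypothetical categorical inputs.
inputs = {
    "Sex": ["Female", "Male"],
    "Marital": ["Absent", "Divorced", "Married"],
}
print(dummy_columns(inputs))
```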

The example below shows how to train a support vector machine to perform binary classification using the audit dataset provided by Togaware (thanks to Graham Williams).

require(kernlab)  # provides ksvm
require(pmml)     # provides pmml.ksvm
audit <- read.csv(file("http://rattle.togaware.com/audit.csv"))
myksvm <- ksvm(as.factor(Adjusted) ~ ., data=audit[,c(2:10,13)], kernel="rbfdot", prob.model=TRUE)
pmml.ksvm(myksvm, data=audit)

BTW, any models you build in ksvm and export using the PMML package can be uploaded directly into ADAPA for scoring.

Thursday, February 7, 2008

How can I export PMML code from R?

A PMML package for R that exports all kinds of predictive models is available from Togaware. Go to the site shown in the link below and download the PMML package (or get it through CRAN):

http://rattle.togaware.com/

Zementis has contributed exporter functionality for the following R packages/functions:

1) ksvm (Support Vector Machines);

2) nnet (Neural Networks).

Both ksvm and nnet export PMML 3.2.

We have also updated the linear and binary logistic regression function to export PMML 3.2. This function works with models built using the R glm function.

In addition to the models listed above, the pmml package also offers PMML 3.1 export for decision trees (package rpart) and clustering models. You can use the PMML Converter to convert decision trees from version 3.1 to 3.2 and upload them into ADAPA.

How can I export PMML code from older versions of SPSS?

In older versions of SPSS, like SPSS 11.5, a linear regression model can be exported to PMML by going through the following sequence of menus: Analyze -> Regression -> Linear... -> Save... You will find yourself in box "Linear Regression: Save". Enter the file name and location you want the PMML file to be written to in "Export model information to XML file" at the bottom of the "Save" box.

After the model is trained, a file will be created in the specified location containing a PMML representation of your linear regression model. A similar sequence of actions and results should work for Multinomial Logistic models.

Important things to notice about the PMML file:

1) The model is represented as a general regression PMML element;

2) For older versions of SPSS (like 11.5), the export is in PMML 2.0. This file will need to be converted to PMML 3.2 before it can be uploaded into ADAPA.

3) For SPSS versions up to version 14, data transformations are not part of the PMML file. Therefore, you will need to add any data transformations manually.