Wednesday, April 16, 2008

ADAPA's computed value is not the same as the expected value, why is that?

A validation test can fail for many reasons. Two of the main ones are:

1) The model ADAPA loaded and executed may be different from the model you built in your development environment. This may reflect a problem with ADAPA itself or with the PMML file (see reason 2 below).

2) The PMML file exported from your model development environment may not represent all aspects of the model, or it may be semantically problematic.

In both cases, you can trace ADAPA's decisions by clicking on the computed value, which is a hyperlink, and reading through its log of computations, which is displayed as a text file. This can be very helpful in determining why ADAPA generated the value(s) it did.

The problem may also lie in the data validation file itself. Suppose you generated your model in SPSS, exported it as PMML, converted it using the iGoogle converter, and uploaded it into ADAPA. So far so good, but how about the data? If you saved your data in SPSS as well, make sure the expected value (the prediction) is saved under the correct name. SPSS usually calls this value "PRE_1". You will need to rename this variable to the name of the predicted variable defined in the PMML file. Also, if your data contains the original target used to build the model, you will need to rename it to something other than the name of the predicted variable. The column that carries the predicted variable's name should then hold the predicted result you got out of SPSS, or out of whichever model development environment you used to score the data in the first place.
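The renaming described above can be sketched in a few lines of Python. This is only an illustration: the function name, the "_actual" suffix given to the original target, and the idea of working on the CSV text directly are all assumptions made for the example, not something ADAPA or SPSS requires.

```python
import csv
import io


def rename_validation_columns(csv_text, predicted_name,
                              spss_prediction="PRE_1",
                              target_suffix="_actual"):
    """Give the saved prediction the name of the PMML predicted field.

    Hypothetical helper: the "_actual" suffix is an arbitrary choice for
    moving the original target out of the way.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows or spss_prediction not in rows[0]:
        raise ValueError("column %r not found in header" % spss_prediction)

    new_header = []
    for name in rows[0]:
        if name == predicted_name:
            # Original target used to build the model: rename it.
            new_header.append(name + target_suffix)
        elif name == spss_prediction:
            # Saved prediction: this becomes the expected value.
            new_header.append(predicted_name)
        else:
            new_header.append(name)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(new_header)
    writer.writerows(rows[1:])
    return out.getvalue()
```

For a file whose header is "Age,Adjusted,PRE_1" and whose PMML predicted field is "Adjusted", the header becomes "Age,Adjusted_actual,Adjusted" and the data rows are left unchanged.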

Monday, April 7, 2008

When uploading PMML example files into ADAPA, I get computed and expected values. What do they mean?

Each of the examples listed on the Zementis website is composed of two files: a PMML model file (.xml) and a validation file (.csv).

Given that the models were built in a tool other than ADAPA, we want to make sure that the development tool and ADAPA produce the same results. This is done by supplying ADAPA with the expected results for a number of input records. ADAPA then automatically compares each expected value with its own computed value. If both values match for all records (and given enough validation records), we can feel confident that ADAPA has uploaded the model correctly. From then on, there is no need to supply ADAPA with the expected results, since all we really want is to get the computed results back.
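If you want to run the same kind of comparison yourself, for instance on results you downloaded, a minimal sketch could look like the one below. It is not ADAPA's own matching logic: the function name and the numeric tolerance are assumptions made for the example.

```python
import math


def find_mismatches(expected, computed, tolerance=1e-6):
    """Return the indices of records whose two values differ.

    Numeric values are compared within an absolute tolerance (an assumed
    value); anything that is not a number is compared as text.
    """
    if len(expected) != len(computed):
        raise ValueError("both lists must have one value per record")

    mismatches = []
    for index, (exp, comp) in enumerate(zip(expected, computed)):
        try:
            same = math.isclose(float(exp), float(comp),
                                rel_tol=0.0, abs_tol=tolerance)
        except (TypeError, ValueError):
            # Categorical predictions such as class labels.
            same = str(exp).strip() == str(comp).strip()
        if not same:
            mismatches.append(index)
    return mismatches
```

An empty result means every record matched. Whether that is enough to trust the model still depends on how many validation records you supplied.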

Friday, March 21, 2008

Even when my model uploads successfully, I still get warnings. Is that OK?

Yes, it is OK. Warnings are generated whenever ADAPA finds inconsistencies in the PMML file which do not affect scoring. However, you should check all warnings even if you get a perfect score match, i.e. your model validates fine.

For example, if you upload the model "Iris_SVM" from the PMML Examples page of the Zementis website, you will get warnings. If you click on the "warnings" hyperlink in the ADAPA Predictive Analytics demo, you will get back a file entitled "uploadedFile.xml". By opening this file in your XML editor, you will be able to see the PMML model you just uploaded with all detected warnings embedded into it as PMML extensions. ADAPA will also generate a comment at the top of the file giving a summary of its findings. For the "Iris_SVM" model, this reads as follows:

Comment generated by ADAPA: There are at least 6 warnings in this PMML document.
Detailed information can be found as comments or as Extension elements embedded in the appropriate locations within this document.


The first warning concerns the timestamp, which is not in the right format. For all other warnings, look for PMML extensions with name = "WARNING".
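If you would rather not search the file by hand, a short script can collect these extensions for you. The sketch below relies only on the fact that the warnings are PMML Extension elements with name = "WARNING". That the message text sits in the "value" attribute is an assumption, so check it against your own "uploadedFile.xml". The function name is made up for the example.

```python
import xml.etree.ElementTree as ET


def list_warnings(pmml_text):
    """Collect the text of every Extension element named "WARNING".

    Assumption: the warning message is stored in the "value" attribute.
    """
    root = ET.fromstring(pmml_text)
    warnings = []
    for element in root.iter():
        # Drop any XML namespace prefix of the form "{uri}Extension".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "Extension" and element.get("name") == "WARNING":
            warnings.append(element.get("value", ""))
    return warnings
```

You could call it with the contents of the file, for example `list_warnings(open("uploadedFile.xml").read())`. Note that the summary comment at the top of the file is an XML comment, so this sketch does not return it.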

Wednesday, March 19, 2008

What is the format I should use for my data file for scoring with the ADAPA Predictive Analytics demo?

You should upload your data as a CSV file. Make sure the data file contains all the input fields you actually use in your model. If you are missing a field, ADAPA will not generate any scores.

Also, the first row should contain the names of the variables.

For example, for the model "Audit_NN" available on the PMML Examples page of the Zementis website, the first 6 rows (the header plus 5 records) of the .csv data file used to validate the model look like this:

Age,Employment,Education,Marital,Occupation,Income,Sex,Deductions,Hours,Adjusted
38,Private,College,Unmarried,Service,81838,Female,0,72,0
35,Private,Associate,Absent,Transport,72099,Male,0,30,0
32,Private,HSgrad,Divorced,Clerical,154676.74,Male,0,40,0
45,Private,Bachelor,Married,Repair,27743.82,Male,0,55,1
60,Private,College,Married,Executive,7568.23,Male,0,40,0

Note that the variable "Adjusted" is actually the predicted field. It is present in the example above since we are using this file for validation (score matching). Obviously, if you are only trying to score your data, you should leave the predicted column out. ADAPA will return computed scores for each entry.
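Both checks, that no input field is missing and that the predicted column is left out for plain scoring, are easy to script before you upload. The following is a minimal sketch. The function names are hypothetical, and the list of required fields is something you would take from your own model.

```python
import csv
import io


def missing_fields(csv_text, required):
    """Return the required field names that are absent from the header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [name for name in required if name not in header]


def drop_column(csv_text, column):
    """Return the CSV text without the given column, e.g. the predicted field."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows or column not in rows[0]:
        raise ValueError("column %r not found in header" % column)
    index = rows[0].index(column)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow(row[:index] + row[index + 1:])
    return out.getvalue()
```

Applied to the "Audit_NN" sample above, `drop_column(text, "Adjusted")` keeps the nine input columns from "Age" to "Hours" and removes the last one.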

Monday, March 17, 2008

How do I use the ADAPA PMML 3.2 Converter?

Simple, just "Browse" for your old PMML file (versions 2.1, 3.0, or 3.1). Once you find it, click on "Convert". If the model is converted successfully, it will be available for download. Just click on the link provided (see image below) and save the file locally.



You can then use your new PMML 3.2 file in ADAPA. Check the ADAPA Predictive Analytics demo on the Zementis website.

Note that the converter expects valid PMML files. Auto-generated PMML code may sometimes contain elements that are not valid PMML. In that case, the converter will display a message stating that your file does not conform to the PMML specification. You can review the identified problems by clicking on the "details" hyperlink. Any errors encountered will have specific comments generated by the converter. At the very top of the file, there is also a summary comment stating how many problems were encountered in total during conversion. You can use this information as feedback to obtain a valid PMML file before attempting conversion again.
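If you are not sure whether a file needs conversion at all, you can look at the "version" attribute of its root PMML element. Here is a minimal sketch; the function name is made up for the example, and the check only reads the version, it does not tell you whether the file is valid PMML.

```python
import xml.etree.ElementTree as ET

# Versions the converter accepts as input, as listed above.
CONVERTIBLE_VERSIONS = {"2.1", "3.0", "3.1"}


def needs_conversion(pmml_text):
    """Return True if the root element declares PMML 2.1, 3.0 or 3.1."""
    root = ET.fromstring(pmml_text)
    return root.get("version") in CONVERTIBLE_VERSIONS
```

A file that already declares version "3.2" returns False and can be uploaded into ADAPA as it is.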

For supported PMML elements as well as known issues with auto-generated PMML code, check our other blog posts under the "PMML Converter" label.

How do I use the ADAPA Predictive Analytics demo?

When you first launch the ADAPA Predictive Analytics demo, you will see a single tab entitled "New Model". This is where you upload your PMML 3.2 model. You can browse for your model file and, once you locate it, click on "Upload".

In the example below, the model named "Audit_NN" was uploaded successfully into ADAPA. This model can be found among the PMML examples supplied on the Zementis website.



You then need to click on the model tab, which in this case is "Audit_NN", to be able to upload a data file for verification or scoring. Note that the tab will contain the model description together with a link to a file containing any warnings that ADAPA generated while uploading the model.



Note that the ADAPA Predictive Analytics demo only accepts uploaded files of up to 1MB.

You can now browse for the data validation file, in this case "Audit_NN.csv", which contains 2000 records with inputs as well as the expected output value for the "Audit_NN" model. This file is also available on the PMML examples page of the Zementis website.

Once the data file is uploaded and results are generated, the ADAPA Predictive Analytics demo will display the first five records as shown below.



Note that in this case, there is a perfect match between computed and expected values for the predicted field. If, for any reason, the results do not match, the first five mismatched records will be displayed instead. All values under the "Computed Value" column are hyperlinks. By clicking on one of these values, you will have access to debugging information which can be very useful in case you do get a mismatch.

Once you are able to validate that the model is working as expected, you can browse and upload a file for scoring.

Friday, March 14, 2008

Does ADAPA support rules and reporting?

Currently, rules and reporting are only part of the ADAPA Enterprise Edition, which is offered as a software license for on-site deployment. If you are interested in this Edition or would like to see us offer it as a hosted solution, please let us know!

The ADAPA Predictive Analytics Edition, featured in the limited demo on the Zementis website and offered as a hosted solution via Amazon's Elastic Compute Cloud (EC2), does not include rules or reporting. It is focused on the deployment of predictive models using PMML.