Welcome to the technical support knowledge base for ADAPA on the Cloud. Our blogs cover general questions and information related to predictive models, PMML, and supported functionality of the ADAPA predictive decisioning platform. Please use the search tool or the FAQ Categories to the left to find the information you are looking for. If you can't find it, feel free to contact us.



© Predictive Analytics by Zementis, Inc. - All Rights Reserved.




Wednesday, March 19, 2008

What is the format I should use for my data file for batch scoring in ADAPA?

You should upload your data as a CSV file. Make sure the data file contains every input field your model uses. If a field is missing, ADAPA will not generate any scores.

The first row should contain the names of the variables.

For example, for the model "Audit_NN" available in the PMML Examples page of the Zementis website, the first 6 rows of the .csv data file used to validate the model look like:

AGE,Employment,Education,Marital,Occupation,Income,Sex,Deductions,Hours,Adjusted
38,Private,College,Unmarried,Service,81838,Female,0,72,0
35,Private,Associate,Absent,Transport,72099,Male,0,30,0
32,Private,HSgrad,Divorced,Clerical,154676.74,Male,0,40,0
45,Private,Bachelor,Married,Repair,27743.82,Male,0,55,1
60,Private,College,Married,Executive,7568.23,Male,0,40,0
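
Before uploading, it can help to confirm locally that the header row names every input field. The following is a minimal Python sketch, not part of ADAPA: the function name missing_fields is hypothetical, and the list of required fields simply assumes the model uses all nine non-predicted columns from the example header.

```python
import csv
import io

# Assumption: the model uses all nine non-predicted columns of the example.
# "Adjusted" is the predicted field, so it is not required for scoring.
REQUIRED_FIELDS = ["AGE", "Employment", "Education", "Marital", "Occupation",
                   "Income", "Sex", "Deductions", "Hours"]


def missing_fields(csv_text, required=REQUIRED_FIELDS):
    """Return the required field names that are absent from the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in required if name not in header]


sample = (
    "AGE,Employment,Education,Marital,Occupation,Income,Sex,Deductions,Hours,Adjusted\n"
    "38,Private,College,Unmarried,Service,81838,Female,0,72,0\n"
)
# An empty list means every required input field is present in the header.
print(missing_fields(sample))
```

Note that this check compares names exactly, so a stray space in a header name (for example " Deductions") would be reported as a missing field.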

ADAPA also supports the use of double quotes around any of the fields (data or field names). Therefore, the following line is also compatible with ADAPA:

"38","Private","College","Unmarried","Service", ...

You should use double quotes to include commas inside a string, as shown below:

"Ryan, Private": without double quotes, ADAPA would treat this single value as two separate values.

You should also use double quotes to represent blank characters before or after a string. For example:

" AGE", "AGE ", and "AGE" represent different values whereas "AGE" and AGE are the same.

To represent a double quote inside a string, double it: "COLOR:""YELLOW""" will be interpreted by ADAPA as COLOR:"YELLOW". Use this doubled form only inside a string that is itself surrounded by double quotes.
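
The quoting rules above match the conventions used by Python's standard csv module, so one way to produce such lines is sketched below. The helper names to_csv_line and from_csv_line are hypothetical, and it is an assumption that ADAPA's parser agrees with Python's in every edge case; the sketch only illustrates the three rules described here (embedded commas, leading or trailing blanks, and doubled quotes).

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one record, quoting every field and doubling embedded quotes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, doublequote=True,
                        lineterminator="\n")
    writer.writerow(fields)
    return buffer.getvalue()


def from_csv_line(line):
    """Parse one record back into its list of field values."""
    return next(csv.reader(io.StringIO(line)))


# A leading blank, an embedded comma, and embedded double quotes.
record = [" AGE", "Ryan, Private", 'COLOR:"YELLOW"']
line = to_csv_line(record)
print(line, end="")  # " AGE","Ryan, Private","COLOR:""YELLOW"""
```

Quoting every field is a deliberately conservative choice: it preserves leading and trailing blanks without having to decide which fields need quotes.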

For more on how to format your .csv file, click here. Beware, though, that ADAPA does not allow fields to contain embedded line breaks: in ADAPA, each record is represented by a single line.
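
Because a record must fit on a single line, it may be worth scanning your data for embedded line breaks before writing the file. This is a small illustrative Python sketch; the function name find_multiline_fields and the sample rows are made up for the example.

```python
def find_multiline_fields(rows):
    """Return (row_index, column_index) pairs for fields containing a line break."""
    problems = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            text = str(value)
            if "\n" in text or "\r" in text:
                problems.append((r, c))
    return problems


rows = [["38", "Private"], ["35", "Night\nshift"]]
print(find_multiline_fields(rows))  # [(1, 1)]
```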

Predicted Field

Note that in the example above, the variable "Adjusted" is the predicted field. It is present because this file is used for validation (score matching). If you only want to score your data, leave the predicted column out; ADAPA will return a computed score for each record.
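
If you already have a validation file and want a scoring-only version of it, the predicted column can be stripped locally. The Python sketch below is one possible way to do that; drop_column is a hypothetical helper, and the three-column sample is a shortened stand-in for the full example file.

```python
import csv
import io


def drop_column(csv_text, column):
    """Return a copy of the CSV text without the named column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name != column]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped column in each row.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


validation = "AGE,Income,Adjusted\n38,81838,0\n45,27743.82,1\n"
scoring = drop_column(validation, "Adjusted")
print(scoring, end="")
```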






