Welcome to the technical support knowledge base for ADAPA on the Cloud. Our blogs cover general questions and information related to predictive models, PMML, and supported functionality of the ADAPA predictive decisioning platform. Please use the search tool or the FAQ Categories to the left to find the information you are looking for. If you can't find it, feel free to contact us.



© Predictive Analytics by Zementis, Inc. - All Rights Reserved.




Monday, October 12, 2009

Test data sets available for demo-ing the ADAPA Add-in for Microsoft Office Excel.

The e-mail you received from Zementis with the link to the Excel add-in download also contains a link to a test data file in Microsoft Excel 2007 format. This file contains three data sets that can be scored against the ADAPA demo instance, provided you choose "Use Demo" in the ADAPA Add-in "Setup Connection" dialog box. The demo comes pre-loaded with three models, one for each data set in the test file. These models were trained and expressed in PMML by Zementis scientists using KNIME, R and SPSS before being uploaded to ADAPA.


The three models available in the ADAPA demo instance, and the corresponding data sets in the sample Excel file, are:


IrisMLRModel: A multinomial logistic regression model trained with the Iris data set.

The Iris data set: This is perhaps the best-known data set in the pattern recognition literature. It contains 3 classes representing different types of the Iris plant, and each class is represented by 50 records. For more information on the Iris data set, see the Iris page at the UCI Repository of Machine Learning Databases - http://archive.ics.uci.edu/ml/datasets/Iris (Asuncion, A. & Newman, D.J. (2007). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science). Note that for scoring, the class has been omitted from the data set. ADAPA produces it as a result of the scoring process, together with the probability associated with each of the three classes of Iris plant covered by the data: Setosa, Versicolor, and Virginica.
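As a rough illustration of how a multinomial logistic regression turns the four Iris measurements into a predicted class plus one probability per class, here is a minimal Python sketch. The coefficients, the feature order, and the function names are made up for illustration; they are not taken from IrisMLRModel or from ADAPA.

```python
import math

# HYPOTHETICAL coefficients, chosen only for illustration. Each list is
# [intercept, sepal length, sepal width, petal length, petal width].
# These are NOT the coefficients of IrisMLRModel.
COEFFS = {
    "Setosa":     [10.0, 0.0, 0.0, -3.0, -3.0],
    "Versicolor": [5.0, 0.0, 0.0, -1.0, -1.0],
    "Virginica":  [0.0, 0.0, 0.0, 0.0, 0.0],  # reference class
}

def class_probabilities(features):
    """Return a probability per class using the softmax of the linear scores."""
    scores = {
        name: w[0] + sum(wi * xi for wi, xi in zip(w[1:], features))
        for name, w in COEFFS.items()
    }
    # Subtract the largest score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

def predict(features):
    """Return the most probable class together with all class probabilities."""
    probs = class_probabilities(features)
    return max(probs, key=probs.get), probs

# One record: sepal length, sepal width, petal length, petal width (in cm).
label, probs = predict([5.1, 3.5, 1.4, 0.2])
print(label, probs)
```

The three probabilities always sum to 1, and the predicted class is simply the one with the highest probability, which mirrors the kind of output described above.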


AuditSVMModel: A support vector machine trained with the Audit data set.

The Audit data set: This data set is supplied as part of the Rattle package - http://rattle.togaware.com (it is also available for download as a CSV file from http://rattle.togaware.com/audit.csv). This is an artificial data set consisting of fictional clients who have been audited, perhaps for tax refund compliance. For each case an outcome is recorded (whether the taxpayer's claims had to be adjusted or not) and any amount of adjustment that resulted is also recorded. Note that for scoring, the adjusted field has been omitted from the data set. It will be produced by ADAPA as a result of the scoring process.
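If you want to prepare your own scoring file from a CSV export in the same way, the sketch below shows one possible approach in Python: it removes a target column so that only the model inputs remain. The column names and the sample rows are assumptions made up for this example; check the header of the actual CSV file before relying on them.

```python
import csv
import io

def drop_columns(csv_text, columns_to_drop):
    """Return CSV text with the named columns removed (header and rows)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name not in columns_to_drop]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped columns in each row.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

# Made-up rows with HYPOTHETICAL column names, for illustration only.
SAMPLE = "Age,Income,Adjusted\n38,81838.0,0\n35,72099.0,1\n"

scoring_input = drop_columns(SAMPLE, {"Adjusted"})
print(scoring_input)
```

The same function can be applied to the text of a downloaded CSV file; only the set of column names passed as the second argument needs to change.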


LoanNNModel: A neural network model trained with mortgage loan data.

The Loan data set: This data set contains loan-level data for several adjustable-rate mortgage (ARM) loans. ARM loans originated by subprime lenders in the US were a key factor behind the financial crisis that began in 2008 and affected the entire world. The data set contains eleven features, which are used as model inputs. The output is a score signifying the risk of default for each loan. The score ranges from 0 to 1000: the higher the score, the higher the risk of default. Note that the score produced by ADAPA for this data set is hypothetical.







Copyright © 2009 Zementis Incorporated. All rights reserved.
