Monday, February 22, 2010

Scorecards in PMML: A Primer

Scorecards are extremely popular because they provide a clear and effective way to predict outcomes in a variety of situations. By clear, I mean that the logic behind the scores obtained from a scorecard can be easily understood and appreciated. Scorecards are effective in situations where you want to predict the probability of someone or something being "bad" or "good". These probabilities can then be readily used for decision making.

Like any data mining model, a scorecard contains a set of input fields that are used to predict a certain target value. This prediction can be seen as an assessment of a prospect, a customer, or a scenario for which an outcome is predicted based on historical data. In a scorecard, input fields, also referred to as characteristics (for example, "Age"), are broken down into attributes (for example, the "20-29" and "30-39" age groups), each with a specific partial score associated with it. These scores represent the influence of the input attributes on the target and are readily available for inspection. For example, a high partial score for a particular attribute could imply a heavy dependence of the target value on that attribute. The partial scores are then summed to obtain an overall score for the target value (is it good or bad?).

ADAPA provides two different ways to represent scorecards. The first is through rules, as described in the ADAPA Scorecard Guide; the second, described here, is through PMML.
Given that PMML does not offer a specific scorecard element, we use a RegressionModel element to implement different score allocation strategies and to compute the overall score. More specifically, we show here how to represent different kinds of attributes (categorical, continuous, and complex) and their corresponding partial scores by using data transformations and built-in functions (see tutorial on data processing in PMML).

Score Allocation for Categorical Attributes

Typical score allocation for categorical attributes is done by associating a partial score with each attribute. In the PMML code shown below, input field "var1" may contain one of the following values (or attributes): "positive", "negative", and "neutral", each of which has a partial score defined (see table below for score allocation details). Note that the code also accounts for missing values. In the PMML example, the resulting partial score is assigned to the derived variable "derivedVar1".

Note that for categorical attributes, we simply use the MapValues element as described in to implement score allocation. If the input field has a large set of attributes, score allocation can easily be implemented with the TableLocator element.
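For readers who want to see the same allocation logic outside of PMML, here is a minimal Python sketch of what the MapValues lookup does. It is an illustration under assumptions, not the PMML listing: the partial scores are hypothetical placeholders (not the values from the score table), and `VAR1_SCORES`, `VAR1_MISSING_SCORE`, and `score_var1` are names made up for this example. The missing-value score plays the role of the `mapMissingTo` attribute of MapValues.

```python
from typing import Optional

# Hypothetical partial scores for each attribute of var1 (placeholder values).
VAR1_SCORES = {
    "positive": 10.0,
    "negative": -5.0,
    "neutral": 0.0,
}

# Partial score used when var1 is missing (the role of mapMissingTo).
VAR1_MISSING_SCORE = -2.0


def score_var1(value: Optional[str]) -> float:
    """Return the partial score (derivedVar1) for a categorical var1 value."""
    if value is None:
        return VAR1_MISSING_SCORE
    # Table lookup, like a MapValues row keyed on the input value.
    if value not in VAR1_SCORES:
        raise ValueError(f"unexpected attribute for var1: {value!r}")
    return VAR1_SCORES[value]
```

For example, `score_var1("negative")` returns `-5.0` and `score_var1(None)` returns `-2.0`. Raising an error for an unknown category is a choice made in this sketch; in PMML, MapValues can instead fall back to its `defaultValue` attribute.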

Score Allocation for Continuous Attributes

In the PMML code shown below, the continuous input field "var2" has been discretized into three ranges, or attributes: "less than 100", "greater than or equal to 100 and less than 200", and "greater than or equal to 200" (see table for score allocation details). Note that the code also accounts for missing values. In the PMML example, the resulting partial score is assigned to the derived variable "derivedVar2a".

Note that for continuous attributes, we simply use the Discretize element to implement score allocation.
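The binning that Discretize performs can be sketched in a few lines of Python. This is an illustration, not the PMML listing: the bin scores and the missing-value score are hypothetical placeholders, the names `VAR2_EDGES`, `VAR2_BIN_SCORES`, and `score_var2a` are made up, and placing a value of exactly 200 in the top bin is an assumption of this sketch.

```python
import math
from bisect import bisect_right
from typing import Optional

# Bin edges for var2, giving the intervals (-inf, 100), [100, 200), [200, +inf).
# Placing exactly 200 in the top bin is an assumption made for this sketch.
VAR2_EDGES = [100.0, 200.0]

# Hypothetical partial score for each bin, in order (placeholder values).
VAR2_BIN_SCORES = [-10.0, 5.0, 15.0]

# Partial score for a missing value (the role of mapMissingTo in Discretize).
VAR2_MISSING_SCORE = 0.0


def score_var2a(value: Optional[float]) -> float:
    """Return the partial score (derivedVar2a) for a continuous var2 value."""
    if value is None or math.isnan(value):
        return VAR2_MISSING_SCORE
    # bisect_right counts how many edges are <= value, which is exactly the
    # index of the left-closed, right-open bin that contains the value.
    return VAR2_BIN_SCORES[bisect_right(VAR2_EDGES, value)]
```

Keeping the edges in a sorted list means adding a bin only requires one new edge and one new score, much like adding another DiscretizeBin.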

Score Allocation for Complex Attributes

If the attributes are complex, built-in functions can be used to implement score allocation. The PMML code shown below uses several built-in functions to implement a complex score allocation (see table for details). As in the previous score allocation examples, this also accounts for missing values. In the PMML example, the resulting partial score is assigned to derived variable "derivedVar2b".

Note that we use the built-in IF-THEN-ELSE function in conjunction with arithmetic operators to implement the necessary logic. Built-in functions in PMML are very powerful and can be used to represent a wide variety of complex score allocation strategies.
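To illustrate the kind of logic such an expression encodes, here is a Python sketch with a made-up rule (not the one from the score allocation table): a fixed score when var2 is missing, a fixed penalty below 100, a linear ramp between 100 and 200, and a cap above that. The thresholds, the ramp, and the name `score_var2b` are hypothetical; the comments indicate roughly which PMML built-in functions (`isMissing`, `lessThan`, `if`, `-`, `/`) an equivalent Apply expression could use.

```python
import math
from typing import Optional


def score_var2b(value: Optional[float]) -> float:
    """Hypothetical complex partial score (derivedVar2b) for var2."""
    # PMML analogue: if(isMissing(var2), 0, ...)
    if value is None or math.isnan(value):
        return 0.0
    # PMML analogue: if(lessThan(var2, 100), -10, ...)
    if value < 100:
        return -10.0
    # PMML analogue: if(lessThan(var2, 200), (var2 - 100) / 10, 10)
    # The score rises linearly from 0 at 100 toward 10 at 200.
    if value < 200:
        return (value - 100.0) / 10.0
    return 10.0
```

Unlike a Discretize bin, which assigns one constant per range, the middle branch makes the partial score depend on the value itself, which is what the built-in functions make possible.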

Computing the Overall Score

The score allocation examples shown here include input attributes related either to "var1", which is a categorical field, or to "var2", which is continuous. For each attribute associated with these fields, a PMML transformation assigns a partial score to one of the derived fields: "derivedVar1", "derivedVar2a", or "derivedVar2b".

Finally, as shown in the PMML code below, the sum of all partial scores is implemented via a regression table in which all regression coefficients are set to 1. Note also that the score allocations for all attributes are represented as transformations placed inside the LocalTransformations element.
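The regression-table trick can be checked with a few lines of Python. This sketch evaluates a generic linear regression table (intercept plus coefficient times value for each predictor) and then sets every coefficient to 1 so that it reduces to the sum of the partial scores. The function name `regression_score`, the intercept of 0, and the partial score values are assumptions for illustration.

```python
from typing import Mapping


def regression_score(
    predictors: Mapping[str, float],
    coefficients: Mapping[str, float],
    intercept: float = 0.0,
) -> float:
    """Evaluate a linear regression table: intercept + sum of coefficient * value."""
    return intercept + sum(
        coef * predictors[name] for name, coef in coefficients.items()
    )


# Every partial score enters with a coefficient of 1 (and, in this sketch, an
# intercept of 0), so the "regression" reduces to a plain sum.
SCORECARD_COEFFICIENTS = {"derivedVar1": 1.0, "derivedVar2a": 1.0, "derivedVar2b": 1.0}

# Hypothetical partial scores for a single record.
partials = {"derivedVar1": 10.0, "derivedVar2a": 5.0, "derivedVar2b": 5.0}
overall = regression_score(partials, SCORECARD_COEFFICIENTS)
print(overall)  # 20.0
```

Because the model is an ordinary regression, a nonzero intercept could also carry a constant base score, although the example in this post only sums the partial scores.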

A file containing the full PMML example shown here as well as data for model verification can be found in the PMML Examples page of the Zementis website.

A great deal of information about Scorecards, PMML and ADAPA is available on various websites. If you want to learn more about how to represent data processing in PMML, including different ways to perform score allocation for complex attributes, make sure to check out our PMML Data Processing Primer.

For a more detailed list of ADAPA features, feel free to take a tour of ADAPA on the Cloud or check what is inside the ADAPA box. If you are still unsure about any of the features or would like to learn more about them and how ADAPA can represent scorecards using rules, drop us a note or give us a call. You can find our contact information in the contacts page of the Zementis website.

Tuesday, February 16, 2010

3 Ways to Access Your Predictive Analytics in the Cloud

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.

Thursday, February 4, 2010

ADAPA Add-in for Excel - On-Line Video Tutorial is Now Available!

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.

ADAPA 2.20 Released: Enhanced PMML Support and Improved Web Console.

Zementis is constantly adding new features to ADAPA. Its latest release, 2.20 (February 2, 2010), adds important new features: automatic PMML conversion, model composition, and an improved Web Console experience.

Integrated PMML Converter (and Corrector)

With this release, the popular PMML Converter has been seamlessly incorporated into ADAPA. As a result, ADAPA can now directly import older PMML versions. As you may already know, many modeling tools still export older versions of PMML. Now you can import these older versions directly into ADAPA without having to upgrade them manually beforehand. In addition, we have added functionality that automatically corrects a number of common problems found in PMML generated by some popular modeling tools, allowing the models to work as intended.

We recommend that you always run a score matching test to validate imported models against your data and modeling environment. And if you find that our PMML converter/corrector still misses something, do let us know.
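The post does not describe how ADAPA performs score matching, so the following is only a generic Python sketch of the idea: compare, record by record, the scores produced in your modeling environment with the scores returned by the imported model, and flag any that differ beyond a tolerance. The function name `score_mismatches` and the default tolerances are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple


def score_mismatches(
    expected: Sequence[float],
    actual: Sequence[float],
    rel_tol: float = 1e-6,
    abs_tol: float = 1e-9,
) -> List[Tuple[int, float, float]]:
    """Return (record index, expected, actual) for every score that disagrees."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same number of records")
    return [
        (i, e, a)
        for i, (e, a) in enumerate(zip(expected, actual))
        if not math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
    ]
```

For example, `score_mismatches([0.25, 0.75], [0.25, 0.70])` returns `[(1, 0.75, 0.7)]`. Using a tolerance rather than exact equality avoids flagging harmless floating-point differences between tools.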

Model Composition

ADAPA now supports composing multiple models into a single model. This important feature covers a variety of model composition cases, such as model selection or segmentation, model sequencing, and value post-processing. For examples and instructions on how to represent model composition in PMML and ADAPA, please refer to the ADAPA Predictive Analytics Guide, available for download from the ADAPA Console Help page.
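As a rough, tool-independent illustration of two of these composition cases, the Python sketch below shows model selection (score a record with the first model whose condition matches it) and model sequencing (feed one model's output into the next). The names `select_first` and `chain` are hypothetical, and this is not how ADAPA implements composition; in PMML 4, similar patterns can be expressed with a MiningModel whose Segmentation uses, for example, the `selectFirst` or `modelChain` multiple-model method.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Record = Dict[str, float]
Model = Callable[[Record], float]
Predicate = Callable[[Record], bool]


def select_first(
    segments: Sequence[Tuple[Predicate, Model]], record: Record
) -> Optional[float]:
    """Model selection: score with the first model whose predicate matches."""
    for predicate, model in segments:
        if predicate(record):
            return model(record)
    return None  # no segment applies to this record


def chain(models: Sequence[Tuple[str, Model]], record: Record) -> Record:
    """Model sequencing: store each model's output under a name so that
    later models in the chain can use it as an input."""
    enriched = dict(record)
    for output_name, model in models:
        enriched[output_name] = model(enriched)
    return enriched
```

For instance, with segments `[(lambda r: r["age"] < 30, young_model), (lambda r: True, default_model)]`, a record with an age of 25 is scored by `young_model` and every other record falls through to `default_model` (both models being placeholders).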

Web Console

The ADAPA Web Console now allows you to download any of your imported models. This feature makes it easy to review your models, including any warning messages generated during import. In addition, since all imported models are automatically converted, the download feature lets you retrieve and review the upgraded (and possibly corrected) version of your model.

For more information on this exciting new feature, please feel free to contact us.

Copyright © 2009-2014 Zementis Incorporated. All rights reserved.
