Monday, February 11, 2008
What happens if my data contains records with missing values? Will ADAPA score such records anyway?
If an input value is missing in a given data record and the value belongs to an active PMML input variable (as defined in the Mining Schema element), then ADAPA will try to replace the missing value with the replacement value specified in the Mining Schema. So, if you get a score back for a data record containing missing values, that's because ADAPA has substituted the replacement values specified in your PMML file.
I said "try" because you may not have specified a replacement value in the Mining Schema. If that is the case, ADAPA will not produce a score for the data record with the missing value.
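To make the rule concrete, here is a minimal Python sketch of the behavior described above. It is not ADAPA's code, just an illustration under a few assumptions: the field names (age, income, risk) and the helper functions load_replacements and prepare_record are made up for the example, the Mining Schema fragment carries no XML namespace, and a missing value is represented as None or an empty string. The only PMML pieces used are the MiningField element and its usageType and missingValueReplacement attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical Mining Schema fragment: "age" has a replacement value,
# "income" does not, and "risk" is the predicted field (not an input).
MINING_SCHEMA = """
<MiningSchema>
  <MiningField name="age" usageType="active" missingValueReplacement="40"/>
  <MiningField name="income" usageType="active"/>
  <MiningField name="risk" usageType="predicted"/>
</MiningSchema>
"""

def load_replacements(schema_xml):
    """Map each active field name to its replacement value (None if unspecified)."""
    root = ET.fromstring(schema_xml.strip())
    replacements = {}
    for field in root.findall("MiningField"):
        # In PMML, usageType defaults to "active" when the attribute is omitted.
        if field.get("usageType", "active") == "active":
            replacements[field.get("name")] = field.get("missingValueReplacement")
    return replacements

def prepare_record(record, replacements):
    """Fill in missing values; return None if the record cannot be scored."""
    prepared = {}
    for name, replacement in replacements.items():
        value = record.get(name)
        if value is None or value == "":
            if replacement is None:
                # Missing value and no replacement specified: no score.
                return None
            value = replacement
        prepared[name] = value
    return prepared
```

With this schema, a record missing age is filled in with the replacement value (as the string read from the XML attribute), while a record missing income comes back as None, meaning no score would be produced for it.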
This is slightly different from what is implied by PMML itself (see the mining schema PMML element), but we feel it gives the user better control over what ADAPA should do in case of missing values. For example, if your model is a neural network and you want missing inputs to be treated as zero, you need to explicitly define a replacement value of zero for every input. ADAPA will not do that automatically for every type of modeling technique.
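If you do want zero as the replacement for every input, editing each field by hand gets tedious. The sketch below shows one way to add the attribute in bulk with Python's standard xml.etree.ElementTree module. The function name add_zero_replacement and the sample field names are hypothetical, and the fragment is assumed to have no XML namespace (a full PMML file usually declares one, so the element lookup would need adjusting).

```python
import xml.etree.ElementTree as ET

def add_zero_replacement(schema_xml):
    """Set missingValueReplacement="0" on every active field that lacks one."""
    root = ET.fromstring(schema_xml.strip())
    for field in root.findall("MiningField"):
        # In PMML, usageType defaults to "active" when the attribute is omitted.
        is_active = field.get("usageType", "active") == "active"
        if is_active and field.get("missingValueReplacement") is None:
            field.set("missingValueReplacement", "0")
    return ET.tostring(root, encoding="unicode")

# Hypothetical schema for a small neural network model.
SCHEMA = """
<MiningSchema>
  <MiningField name="x1"/>
  <MiningField name="x2" missingValueReplacement="1.5"/>
  <MiningField name="y" usageType="predicted"/>
</MiningSchema>
"""

UPDATED = add_zero_replacement(SCHEMA)
```

Fields that already specify a replacement value are left alone, and non-input fields such as the predicted one are skipped.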
Click here to learn more about how missing values are handled in decision trees.