Friday, February 8, 2008

Extending the SVM element in PMML to allow for multiclass classification using the one-against-one approach in ADAPA.

For multiclass classification with k classes (k > 2), the ksvm function in R (from the kernlab package) uses the one-against-one approach: k(k-1)/2 binary classifiers are trained, one for each pair of classes, and the predicted class is found by a voting scheme.
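Here is a minimal R sketch of the voting step, for illustration only. The function name ovo_vote is ours, not kernlab's or ADAPA's. The sketch also assumes that the winning class of each binary machine is already known and listed in the column order produced by combn(). Each of the k(k-1)/2 machines casts one vote, and the class with the most votes wins.

```r
# Hypothetical sketch of one-against-one voting; not kernlab's or ADAPA's code.
# `winners[i]` is the class picked by the i-th binary machine, where machines
# are ordered like the columns of combn(classes, 2).
ovo_vote <- function(classes, winners) {
  pairs <- combn(classes, 2)  # one column per binary machine: k(k-1)/2 columns
  # Each machine must vote for one of the two classes it separates
  stopifnot(length(winners) == ncol(pairs),
            all(winners == pairs[1, ] | winners == pairs[2, ]))
  # One vote per machine; classes that received no votes count as 0
  votes <- table(factor(winners, levels = classes))
  # Most-voted class; which.max() breaks ties in favour of the earlier class
  names(votes)[which.max(votes)]
}

# Machines for 3 classes: (A,B), (A,C), (B,C)
# Hypothetical outcomes for one record: (A,B) -> A, (A,C) -> C, (B,C) -> C
ovo_vote(c("A", "B", "C"), c("A", "C", "C"))  # "C"
```

The tie-breaking rule here (first class in the given order wins) is a choice made for this sketch. Other implementations may resolve ties differently.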

To implement such a scheme in ADAPA, we needed to extend PMML 3.2. PMML associates a single target category with each Support Vector Machine. In the case of a binary classifier, PMML actually asks for the alternate binary target category.

In the one-against-one approach, all k(k-1)/2 machines are binary classifiers, each separating one specific pair of classes. We therefore needed to give each machine an extra, alternate target category.
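To make the extra category concrete, the hypothetical R helper below lists the machines that a one-against-one model with k = 4 classes would need. Each machine pairs a target category with an alternate target category. The column names echo the wording of this post. They are illustrative assumptions, not the exact attribute names used by the ADAPA extension.

```r
# Hypothetical helper: one row per binary machine of a one-against-one model.
# Column names are illustrative only, not taken from the PMML schema.
ovo_machines <- function(classes) {
  pairs <- combn(classes, 2)  # k(k-1)/2 unordered pairs of classes
  data.frame(machine = seq_len(ncol(pairs)),
             targetCategory = pairs[1, ],
             alternateTargetCategory = pairs[2, ],
             stringsAsFactors = FALSE)
}

m <- ovo_machines(c("A", "B", "C", "D"))
print(m)  # 6 rows: (A,B), (A,C), (A,D), (B,C), (B,D), (C,D)
```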

Note that ADAPA also supports the one-against-all approach (also known as one-against-rest). This approach does not need the PMML extension, since each of its k machines separates a single target category from all the others.
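For contrast, a one-against-all prediction needs no pairing. Each of the k machines produces a score for its own class, and the class with the highest score is chosen. The sketch below assumes that these scores have already been collected in an n x k matrix with one named column per class. The function name ova_predict is hypothetical.

```r
# Hypothetical sketch of one-against-all prediction.
# `dec` is an n x k matrix; column j holds the score of the "class j vs rest" machine.
ova_predict <- function(dec) {
  # Pick the highest-scoring column in each row (first one on ties)
  colnames(dec)[max.col(dec, ties.method = "first")]
}

# Two made-up records scored by three class-vs-rest machines
dec <- rbind(c(A = 0.8, B = -0.2, C = -1.1),
             c(A = -0.5, B = -0.3, C = 0.4))
ova_predict(dec)  # "A" "C"
```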

Voting schemes for multiclass SVM classification are described in:

C.-W. Hsu and C.-J. Lin
A comparison of methods for multi-class support vector machines
IEEE Transactions on Neural Networks, 13(2002) 415-425.
http://www.csie.ntu.edu.tw/~cjlin/papers/multisvm.ps.gz

What types of SVM models built with R ksvm can I export to PMML?

You can export essentially any SVM model you build with ksvm (from the kernlab R package) into PMML 3.2 by using the PMML package available from Togaware. See the link below:

http://rattle.togaware.com/

The PMML package is also available through CRAN.

The function to be used is pmml.ksvm. With this function, a PMML representation can be obtained for SVMs implementing:

  • multi-class classification
  • binary classification
  • regression

Note that pmml.ksvm also exports transformations for the input variables. These follow the scaling scheme used by ksvm. It also exports transformations that create dummy variables for any categorical inputs.
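The scaling part can be pictured as a z-score transformation whose parameters come from the training data. The kernlab documentation states that ksvm scales data internally to zero mean and unit variance by default (scaled = TRUE). The sketch below reproduces that idea for numeric inputs using base R only and a made-up training frame. It is not pmml.ksvm's code, and the helper name scale_like_training is hypothetical.

```r
# Made-up numeric training inputs (illustration only)
train <- data.frame(Age = c(20, 30, 40, 50), Hours = c(35, 40, 45, 60))

# Scaling parameters learned once from the training data
center <- colMeans(train)          # per-column mean
spread <- apply(train, 2, sd)      # per-column standard deviation

# Hypothetical helper: apply the *training* mean/sd to any new records,
# which is the information an exported scaling transformation has to carry
scale_like_training <- function(newdata) {
  scale(as.matrix(newdata), center = center, scale = spread)
}

# A record sitting exactly at the training means maps to 0 in both columns
scale_like_training(data.frame(Age = 35, Hours = 45))
```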

We encountered an issue with ksvm while building the dummy-variable ("dummy-fication") piece: every categorical variable except the first one in a model loses its first input category. That is, ksvm does not create a dummy variable for the first category of those variables. We have already pointed this out to the author of ksvm. For now, the PMML export code mimics this behavior so that scores from the exported model match those produced by ksvm.
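One plausible explanation is R's own model.matrix() behavior when a formula has no intercept. This is an assumption, not a reading of kernlab's source. Without an intercept, the first factor is coded with one indicator column per level, while later factors get treatment contrasts, which drop their first level. The toy data below is made up for illustration and shows this pattern.

```r
# Made-up data with two categorical inputs (illustration only)
d <- data.frame(Employment = factor(c("private", "public", "self")),
                Gender = factor(c("male", "female", "male")))

# No intercept ("- 1"): the first factor gets a full set of dummies,
# the second factor loses its first level ("female")
mm <- model.matrix(~ Employment + Gender - 1, data = d)
colnames(mm)
# "Employmentprivate" "Employmentpublic" "Employmentself" "Gendermale"
```

Here Gender keeps only the Gendermale column. This is the kind of missing first category described above.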

The example below shows how to train a support vector machine to perform binary classification using the audit dataset provided by Togaware (thanks to Graham Williams).

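require(kernlab)  # provides ksvm()
require(pmml)     # provides pmml.ksvm()
# Load the audit dataset; Adjusted is the binary (0/1) target
audit <- read.csv(file("http://rattle.togaware.com/audit.csv"))
# Train an RBF-kernel SVM on a subset of the columns, with class probabilities
myksvm <- ksvm(as.factor(Adjusted) ~ ., data=audit[,c(2:10,13)], kernel="rbfdot", prob.model=TRUE)
# Export the fitted model as PMML
pmml.ksvm(myksvm, data=audit)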

BTW, any models you build in ksvm and export using the PMML package can be uploaded directly into ADAPA for scoring.