Once translated into PMML, models can be easily deployed and scored against new incoming data. For example, models can be deployed in ADAPA for real-time scoring, or in UPPI for big data scoring in-database or on Hadoop.
How does it work?
Easy! Once you build your model using the scikit-learn library, all you need to do is write out a .txt file containing the model's parameters. The file must list the parameters in a strict order and contain all the required information. Py2PMML uses this file to generate the corresponding PMML file for your model. With the PMML file in hand, you can simply deploy it in ADAPA for real-time scoring or in UPPI for big data scoring.
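To make the "write out a .txt file" step concrete, here is a minimal sketch of dumping a fitted scikit-learn model's parameters to a text file. The file layout (the `key=value` lines), the helper name `write_model_parameters`, the file name `model_params.txt`, and the toy training data are all assumptions made for illustration. This is not the order or format Py2PMML actually requires, which is not described here.

```python
import os
import tempfile

from sklearn.linear_model import LogisticRegression


def write_model_parameters(model, feature_names, path):
    """Write a fitted LogisticRegression's parameters to a plain-text file.

    NOTE: this key=value layout is hypothetical. It is NOT the strict
    format that Py2PMML expects; it only illustrates the general idea.
    """
    lines = [
        "model_type=LogisticRegression",
        "features=" + ",".join(feature_names),
        "classes=" + ",".join(str(c) for c in model.classes_),
    ]
    # coef_ has one row per decision function (a single row for binary problems).
    for row, (intercept, coefs) in enumerate(zip(model.intercept_, model.coef_)):
        lines.append(f"intercept_{row}={float(intercept)!r}")
        lines.append(
            f"coefficients_{row}=" + ",".join(repr(float(c)) for c in coefs)
        )
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines


# Toy data, invented for this example: two features, two classes.
X = [[0.0, 1.0], [1.0, 0.5], [2.0, 3.0], [3.0, 2.5]]
y = [0, 0, 1, 1]
model = LogisticRegression().fit(X, y)

params_path = os.path.join(tempfile.mkdtemp(), "model_params.txt")
written_lines = write_model_parameters(model, ["x1", "x2"], params_path)
```

The fitted attributes used here (`classes_`, `intercept_`, `coef_`) are the ones scikit-learn exposes for `LogisticRegression`; other model types store their parameters in different attributes, so a real export would need one writer per supported class.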
What are the supported model types?
As of now, the supported scikit-learn predictive modeling classes are:
- Class RandomForestClassifier - Random Forest Models
- Class RandomForestRegressor - Random Forest Models
- Class DecisionTreeClassifier - Decision Trees
- Class DecisionTreeRegressor - Decision Trees
- Class KMeans - KMeans Clustering
- Class LogisticRegression - Logistic Regression
- Class LinearRegression - Linear Regression
- Class GaussianNB - Gaussian Naive Bayes
- Class BernoulliNB - Bernoulli Naive Bayes
Supported pre-processing classes are (contact us for details):
- Class MinMaxScaler - Rescales each feature to a given range
- Class OneHotEncoder - Creates dummy continuous variables out of categorical variables
- Missing Value Replacement
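As a rough illustration of what these pre-processing steps do to the data, the sketch below applies them to a tiny made-up dataset. The list above does not name a class for missing value replacement, so the use of scikit-learn's `SimpleImputer` (with the mean strategy) is an assumption, as are the variable names and sample values.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# One numeric feature with a missing entry (values invented for the example).
ages = np.array([[20.0], [np.nan], [40.0]])

# Missing value replacement: fill the gap with the column mean (assumed strategy).
imputed_ages = SimpleImputer(strategy="mean").fit_transform(ages)

# MinMaxScaler: map the observed minimum to 0 and the observed maximum to 1.
scaled_ages = MinMaxScaler(feature_range=(0, 1)).fit_transform(imputed_ages)

# One categorical feature; OneHotEncoder emits one 0/1 column per category.
colors = np.array([["red"], ["green"], ["red"]])
encoder = OneHotEncoder()
encoded_colors = encoder.fit_transform(colors).toarray()

print(imputed_ages.ravel())    # imputed column
print(scaled_ages.ravel())     # rescaled column
print(encoder.categories_[0])  # category order used for the dummy columns
print(encoded_colors)          # one row per sample, one column per category
```

The fitted parameters of each transformer (for example the scaler's `data_min_` and `data_max_`, or the encoder's `categories_`) are what would have to be carried over for the same transformation to be reproduced at scoring time.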
Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.