Friday, December 9, 2011

In-database Scoring with PMML, Zementis, and Sybase IQ: Big Data Analytics Made Easy

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.

Friday, December 2, 2011

KNIME PMML Support: Model Import and Export + Pre-processing

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.

Tuesday, October 25, 2011

Operational Deployment of Predictive Solutions: Lost in Translation? Not with PMML

Traditionally, the deployment of predictive solutions has been, to put it mildly, cumbersome. As shown in the Figure below, data mining scientists work hard to analyze historical data and to build the best predictive solutions out of it. Engineers, on the other hand, are usually responsible for bringing these solutions to life by recoding them into a format suitable for production deployment. Given that data mining scientists and engineers tend to inhabit different information worlds, a predictive solution can easily get lost in translation on its way from the scientist's desktop to production.


Luckily, the advent of PMML (Predictive Model Markup Language) changed this scenario radically. PMML is the de facto standard used to represent predictive solutions. With it, there is no need for scientists to write a Word document describing the solution. They can just export it as a PMML file. Today, all major data mining tools and statistical packages support PMML. These include IBM SPSS, SAS, R, KNIME, RapidMiner, KXEN, ... Also, tools such as the Zementis Transformations Generator and KNIME allow for easy PMML coding of pre- and post-processing steps.

Great! Once a PMML file exists, it can be easily deployed in production with ADAPA, the Zementis scoring engine. ADAPA even allows for models to be deployed in the Amazon Cloud and accessed from anywhere via web services. Zementis also offers in-database scoring via its Universal PMML Plug-in, which is also available for Hadoop. In this way, a process that could take six months now takes minutes.


PMML and ADAPA have transformed model deployment forever. If you or your company are still spending time and resources on deploying your predictive analytics the traditional way, make sure to contact us. The secret behind exceptional predictive analytics is out!

Friday, April 22, 2011

ADAPA 3.4 Released: Association Rules

PMML, the Predictive Model Markup Language, allows for a predictive analytic model to be developed in one application and easily moved to another for production deployment and execution.

Once a predictive model is exported from a PMML-compliant tool such as SAS EM, SPSS/IBM, R, KNIME, RapidMiner, ... it can be uploaded directly into the Zementis ADAPA engine which makes the model available for execution via its console or as a web-service. ADAPA can already import most of the techniques defined by the PMML standard and now, with the release of ADAPA 3.4, we have expanded it even further to cover Association Rules.

Association Rules


Analysts always want to explore rules and relations between variables in large data sets. The learning mechanism of Association Rules serves this purpose. The rules discovered by Association Rules often provide useful information for marketing activities. For example, they can be used for discovering relations between products in transaction data in supermarkets. In this way, an association rule can be found to indicate that if a customer purchases beef and cat food together, he/she is most likely to also buy tuna cans.
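To make the supermarket example above concrete, here is a minimal sketch of the two numbers usually reported for an association rule, support and confidence. The baskets are invented for illustration, and the helper names `support` and `confidence` are our own; this is not how ADAPA or any particular mining tool computes them.

```python
# Toy market-basket data; every basket here is made up for illustration.
transactions = [
    {"beef", "cat food", "tuna"},
    {"beef", "cat food", "tuna", "milk"},
    {"beef", "cat food"},
    {"beef", "bread"},
    {"cat food", "tuna"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Support of antecedent and consequent together, divided by
    the support of the antecedent alone."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


# Rule: {beef, cat food} -> {tuna}
rule_support = support({"beef", "cat food", "tuna"}, transactions)
rule_confidence = confidence({"beef", "cat food"}, {"tuna"}, transactions)
print(f"support={rule_support:.3f} confidence={rule_confidence:.3f}")
```

A rule with high confidence but very low support applies to only a handful of baskets, which is why both numbers are normally looked at together.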

An Association Rule Model in PMML is represented by the element "AssociationModel". ADAPA and PMML support two different formats for representing Association Rules. These are "rectangular" and "transactional". To learn more about these two formats, please read our posting: Association Rules in ADAPA.

PMML Conversion

In addition, with the release of ADAPA 3.4, we were able to make ADAPA even better when it comes to converting and correcting PMML files. This is yet another big step towards true interoperability. In many cases, even if the model has syntactic or semantic problems, ADAPA automatically corrects known issues for models exported from several model development environments. For that, we analyze PMML files submitted to us by our partners and clients.


If, for any reason, your PMML code cannot be converted or corrected automatically, feel free to contact us. We are here to help!

Wednesday, April 20, 2011

Predictions in the Cloud with ADAPA

ADAPA is the first standards-based, real-time predictive decisioning engine available on the market and the first scoring engine accessible on the Amazon Cloud as a service. ADAPA on the Cloud combines the benefits of Software as a Service (SaaS), the scalability of cloud computing and the extensive feature set of ADAPA on Site.

What do you mean by standards-based?

ADAPA executes predictive models represented in PMML (Predictive Model Markup Language). PMML is the standard for representing predictive models currently exported from all major commercial and open-source data mining tools.

With PMML, you can build your model in IBM SPSS, SAS, R, KNIME, ... export it as a PMML file and upload it into ADAPA. Once you do that, your model is ready to be used from anywhere via web services. You can even execute your models directly from within Excel.




Is ADAPA really fast?


ADAPA is very fast. We recently published a study in the ACM SIGKDD Newsletter in which we show that ADAPA can easily score thousands of transactions per second. On the High-CPU Extra-Large instance, ADAPA can score 300 million transactions per hour. FAST!
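To put the hourly figure on the same scale as the per-second one, here is the arithmetic as a small sketch. It only converts the number quoted above; the result is an average throughput for the whole instance, not the latency of any single request.

```python
TRANSACTIONS_PER_HOUR = 300_000_000  # figure quoted in the post
SECONDS_PER_HOUR = 60 * 60

# Average number of transactions scored each second.
per_second = TRANSACTIONS_PER_HOUR / SECONDS_PER_HOUR

# Average time budget per transaction, in microseconds, if the
# work were spread evenly over the hour.
avg_microseconds = 1_000_000 / per_second

print(f"{per_second:,.0f} transactions per second")
print(f"{avg_microseconds:.1f} microseconds per transaction on average")
```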

What kind of models does it support?

Modeling techniques currently supported are:
  • Neural Networks
  • Association Rules
  • Support Vector Machines
  • Naive Bayes Classifiers
  • Ruleset Models
  • Clustering Models (including Two-Step Clustering)
  • Decision Trees
  • Regression Models (including Cox Regression Models)
  • Scorecards
  • Multiple Models (ensemble, composition, and segmentation)

How about data pre- and post-processing?

ADAPA transforms your raw data into meaningful feature detectors before scoring it. It post-processes the output of your predictive model so that it conforms to your requirements. ADAPA supports all the PMML built-in functions and data manipulations (as well as user defined functions). To learn more about how to represent pre- and post-processing operations in PMML, please take a look at our PMML data manipulation primer or simply contact us.

Can I combine predictive analytics with business rules?


ADAPA provides seamless integration of predictive analytics and rules. Simply put, ADAPA allows data-driven insight and expert knowledge to be combined into a single and powerful decision strategy. That is because, in addition to a sophisticated predictive analytics engine, ADAPA also incorporates the full functionality of a rules engine.

How do I pay for it? Is it expensive?

Once you sign up for ADAPA on the Cloud through Amazon.com, ADAPA charges show up on your credit card bill. Amazon handles all the billing. You can even use the same account you use to buy books. ADAPA on the Cloud does not cost an arm and a leg. Check out our pricing! And, the best part, you pay only for what you actually use.

Tuesday, April 19, 2011

Webinar: Deploying Predictive Analytics with PMML, Revolution R, and ADAPA


Presented: Wednesday, April 13th, 2011
Presenters: Alex Guazzelli, Vice President - Analytics, Zementis Inc.
David Smith, Vice President - Marketing, Revolution Analytics

View the on-demand replay of the webinar

Download the webinar presentation

The rule in the past was that whenever a predictive model was built in a particular development environment, it remained in that environment forever, unless it was manually recoded to work somewhere else. This rule has been shattered with the advent of PMML (Predictive Model Markup Language). By providing a uniform standard to represent predictive models, PMML allows for the exchange of predictive solutions between different applications and various vendors.

In this joint webinar from Revolution Analytics and Zementis, you’ll learn:
  • How to use data to create predictive models in the R language, with Revolution R Enterprise
  • The purpose of the PMML standard, and the predictive models it supports
  • How to export predictive models from R using PMML
  • How to score predictive models in PMML using ADAPA, from within Microsoft Excel and in the cloud
This webinar is suitable for any technology professional with an interest in predictive models who wishes to learn more about Revolution R, PMML and ADAPA.

Download the whitepaper:
Deploying Advanced Analytics Using R & PMML

Monday, April 18, 2011

ADAPA and PMML Association Rules

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.

Wednesday, April 6, 2011

ADAPA Web Services

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.

Saturday, March 12, 2011

Universal PMML Plug-in for EMC Greenplum Database

It is our pleasure to announce a new Zementis product, the Universal PMML Plug-in for in-database scoring. Available now for the EMC Greenplum Database, a high-performance massively parallel processing (MPP) database, the plug-in leverages the Predictive Model Markup Language (PMML) to execute predictive models directly within EMC Greenplum, for highly optimized in-database scoring.
Developed by the Data Mining Group (DMG), PMML is supported by all major data mining vendors, e.g., IBM SPSS, SAS, Teradata, FICO, STATISTICA, MicroStrategy, TIBCO and Revolution Analytics as well as open source tools like R, KNIME and RapidMiner. With PMML, models built in any of these data mining tools can now instantly be deployed in the EMC Greenplum database. The net result is the ability to leverage the power of standards-based predictive analytics on a massive scale, right where the data resides.
"By partnering with Zementis, a true PMML innovator, we are able to offer a vendor-agnostic solution for moving enterprise-level predictive analytics into the database execution environment," said Dr. Steven Hillion, Vice President of Analytics at EMC Greenplum. "With Zementis and PMML, the de-facto standard for representing data mining models, we are eliminating the need to recode predictive analytic models in order to deploy them within our database. In turn, this enables an analyst to reduce the time to insight required in most businesses today."

Want to learn more?

To learn more about how the EMC Greenplum Database and the Universal PMML Plug-in work together, feel free to:
The Universal PMML Plug-in for the EMC Greenplum Database is available now. Contact us today for more information.

Thursday, March 10, 2011

Friday, February 11, 2011

Predictive Analytics Toolkit: Open Standards and Cloud Computing

Organizations around the globe increasingly recognize the value that predictive analytics offers to their business. The complexity of development, integration, and deployment of predictive models, however, is often considered cost-prohibitive for many projects. Building on mature open source solutions, open standards, and SOA principles, we offer an agile model development life cycle that allows us to quickly leverage predictive analytics in operational environments.

Starting with data analysis and model development, you can use the Predictive Model Markup Language (PMML) standard to move complex decision models from the scientist's desktop into a scalable production environment hosted on the Amazon Elastic Compute Cloud (Amazon EC2).

Expressing Models in PMML

PMML is an XML-based language used to define predictive models. It was specified by the Data Mining Group (DMG), an independent group of leading technology companies including Zementis. By providing a uniform standard to represent such models, PMML allows for the exchange of predictive solutions between different applications and various vendors.

Open source statistical tools such as R can be used to develop data mining models based on historical data. R allows for models to be exported into PMML which can then be imported into an operational decision platform and be ready for production use in a matter of minutes.
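To give a feel for what such an exported file contains, here is a minimal sketch that reads a tiny, hand-written PMML regression model and scores one record with it. The model, its field names and its coefficients are invented for illustration, and the two helper functions are our own. A real scoring engine such as ADAPA handles far more of the standard (data types, missing values, transformations, other model types), so treat this as a reading aid rather than a description of how any engine works.

```python
import xml.etree.ElementTree as ET

# Hand-written toy model; names and numbers are assumptions for illustration.
PMML_TEXT = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="Toy linear model, for illustration only"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="toy_spend" functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="spend" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="income" coefficient="0.001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# PMML elements live in a version-specific XML namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def load_linear_model(pmml_text):
    """Extract the intercept and per-field coefficients of the first
    RegressionTable. Exponents other than 1 are ignored in this sketch."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(record, intercept, coefficients):
    """Linear prediction: intercept plus the sum of coefficient * value."""
    return intercept + sum(
        coefficient * record[name] for name, coefficient in coefficients.items()
    )


intercept, coefficients = load_linear_model(PMML_TEXT)
prediction = score({"age": 40.0, "income": 50000.0}, intercept, coefficients)
print(prediction)
```

The point of the sketch is that everything needed to reproduce the prediction is in the file itself, which is what lets a model move between tools without being recoded.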

On-Demand Predictive Analytics

Amazon EC2 is a reliable, on-demand infrastructure on which we offer the ADAPA Predictive Decisioning Engine based on the Software as a Service (SaaS) paradigm. ADAPA imports models expressed in PMML and executes them in batch mode or in real time via web services.

Our service is implemented as a private, dedicated Amazon EC2 instance of ADAPA. Each client has access to his/her own ADAPA instance via HTTP/HTTPS. In this way, models and data for one client never share the same engine with other clients.

Using a SaaS solution to break down traditional barriers that currently slow the adoption of predictive analytics, our strategy translates predictive models into operational assets with minimal deployment costs and leverages the inherent scalability of utility computing.

In summary, ADAPA allows for:
  • Cost-effective and reliable service based on Amazon’s EC2 infrastructure
  • Secure execution of predictive models through dedicated and controlled instances including HTTPS and Web-Services security
  • On-demand computing. Choice of instance type (small, large, extra-large, ...) and launch of multiple instances.
  • Superior time-to-market by providing rapid deployment of predictive models and an agile enterprise decision management environment.
For a practical guide, watch:

Friday, January 28, 2011

PMML group in LinkedIn: Close to 1,000 members!


The Predictive Model Markup Language (PMML) is the leading standard for representing statistical and data mining models. With PMML, it is straightforward to develop a model on one system using one application and deploy the model on another system using another application. PMML reduces complexity and bridges the gap between development and production deployment of predictive analytics.

PMML is governed by the Data Mining Group (DMG), an independent, vendor-led consortium that develops data mining standards. PMML is currently supported by over 20 vendors and organizations, and both awareness and use of the standard are growing quickly. To give people a place to come together to learn about and discuss topics related to PMML, we have created a PMML interest group on LinkedIn. The group is going strong, with many PMML-related discussions and announcements every week. And we are happy to announce that the group is now nearing 1,000 members!

To join the Predictive Model Markup Language (PMML) group on LinkedIn, please follow this link:

http://www.linkedin.com/groupRegistration?gid=2328634

The group aims to serve as a central resource on the practical application of PMML and its benefits for business and IT. PMML increases business agility by eliminating the need for proprietary solutions or custom code development. For this reason, it is a critical element in the quest for business process optimization and automated, intelligent decisions.

We encourage active participation in the PMML group from the entire community. Please post your questions! The group already contains postings related to:
  • The value of PMML for business and IT
  • PMML powered products
  • Links to a general introduction and overview presentation
If your organization is already supporting the PMML standard, please feel welcome to share information about your products which do so.

Thursday, January 20, 2011

ADAPA in the Cloud - Full Feature List

Broad support for predictive algorithms
ADAPA supports an extensive collection of statistical and data mining algorithms. These are:


  • Ruleset Models
  • Clustering Models (Distribution-Based, Center-Based, and 2-Step Clustering)
  • Decision Trees (for classification and regression) together with multiple missing value handling strategies (Default Child, Last Prediction, Null Prediction, Weighted Confidence, Aggregate Nodes)
  • Naive Bayes Classifiers
  • Association Rules
  • Neural Networks (Back-Propagation, Radial-Basis Function, and Neural-Gas)
  • Regression Models (Linear, Polynomial, and Logistic) and General Regression Models (General Linear, Ordinal Multinomial, Generalized Linear, Cox)
  • Support Vector Machines (for regression and multi-class and binary classification)
  • Scorecards (including reason codes and point allocation for categorical, continuous, and complex attributes)
  • Multiple Models (Segmentation, Ensembles - including Random Forest Models, Chaining and Model Composition)

Model interfaces: pre- and post-processing
Additionally, ADAPA supports a myriad of functions for implementing data pre- and post-processing. These include:
  • Text Mining
  • Value Mapping
  • Discretization
  • Normalization
  • Scaling
  • Regular Expressions
  • Logical and Arithmetic Operators
  • Lookup Tables
  • Custom Functions
and much, much more.

If you think of anything ADAPA cannot do or something else you need to do in terms of data manipulation, let us know.
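As a rough illustration of three of the pre-processing steps listed above (discretization, normalization and value mapping), here is a minimal sketch. In PMML these steps are expressed declaratively inside the model file; the function names, bins and lookup table below are our own assumptions, chosen only to show what each step does to a raw record.

```python
def normalize(value, low, high):
    """Min-max normalization: map [low, high] linearly onto [0, 1]."""
    return (value - low) / (high - low)


def discretize(value, bins, default=None):
    """Return the label of the first half-open interval [left, right)
    that contains `value`, or `default` if none does."""
    for left, right, label in bins:
        if left <= value < right:
            return label
    return default


def map_value(value, table, default=None):
    """Value mapping: look a category up in a table, with a fallback."""
    return table.get(value, default)


# Hypothetical raw input record.
raw = {"age": 35, "income": 42000.0, "state": "CA"}

AGE_BINS = [(0, 30, "young"), (30, 60, "middle"), (60, 200, "senior")]
REGIONS = {"CA": "west", "NY": "east"}

features = {
    "age_band": discretize(raw["age"], AGE_BINS),
    "income_scaled": normalize(raw["income"], 0.0, 100000.0),
    "region": map_value(raw["state"], REGIONS, default="other"),
}
print(features)
```

Keeping these steps in the same file as the model means the production system applies exactly the transformations the model was trained with.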

Automatic conversion (and correction) for older versions of PMML
ADAPA consumes model files that conform to PMML, version 2.0 through 4.2. If your model development environment exports an older version, ADAPA will automatically convert your file into a 4.2 compliant format. It will also correct a number of common problems found in PMML generated by some popular modeling tools, allowing the models to work as intended.

Web-based management and interactive execution of predictive models and business rules
  • Model management: Models and rule sets are deployed and managed through an intuitive, Web-based management console, the ADAPA Console.
  • Model verification: The ADAPA Console includes a model validation test, allowing models to be verified for correctness. When you provide ADAPA with a test file containing input data and expected results for a model, the engine reports any deviations from the expected results, which makes errors much easier to trace and model deployment issues much easier to debug. The console also provides easy access to our rules testing framework, in which business rules are submitted to regression and acceptance testing.
  • Batch-scoring: The console also lets you upload a (compressed) CSV data file and batch-score it against any of the deployed models. Results are returned in the same format and may be downloaded for further processing and visualization.
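The idea behind model verification can be sketched in a few lines: score every row of a test file and report the rows where the result deviates from the expected value. Everything below is an assumption made for illustration, including the CSV layout, the `expected_score` column name, the tolerance and the stand-in model; it does not describe the file format or the checks the ADAPA Console actually uses.

```python
import csv
import io

# Hypothetical test file: input columns plus one expected-result column.
TEST_CSV = """x1,x2,expected_score
1.0,2.0,8.0
0.0,0.0,1.0
2.0,1.0,9.5
"""


def toy_model(row):
    """Stand-in for a deployed model: 1 + 3*x1 + 2*x2."""
    return 1.0 + 3.0 * row["x1"] + 2.0 * row["x2"]


def verify(csv_text, model, tolerance=1e-6):
    """Return (line number, expected, actual) for every row whose
    score differs from the expected value by more than `tolerance`."""
    deviations = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 of the file is the header, so data starts at line 2.
    for line_no, raw in enumerate(reader, start=2):
        row = {name: float(value) for name, value in raw.items()}
        expected = row.pop("expected_score")
        actual = model(row)
        if abs(actual - expected) > tolerance:
            deviations.append((line_no, expected, actual))
    return deviations


deviations = verify(TEST_CSV, toy_model)
for line_no, expected, actual in deviations:
    print(f"line {line_no}: expected {expected}, got {actual}")
```

In this toy file the third data row is deliberately wrong, so it is the only one reported.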

Simplified integration via SOA
Service Oriented Architecture (SOA) principles simplify integration with existing IT infrastructure. Since ADAPA publishes all deployed models and rule sets as a Web-Service, you can score data records from within your own environment. With the simple execution of a web service call (SOAP or REST), you are able to leverage the power of predictive models and business rules on-demand or in real-time.

Data scoring from inside Excel
The ADAPA Add-in for Microsoft Office Excel 2007, 2010 and 2013 allows you to easily score data using ADAPA in the Cloud. Once the Add-in is installed, all you need to do is select your data in Excel, connect to ADAPA and start scoring right away. Your predictions will be made available as new columns.

On-demand predictive analytics solution
ADAPA in the Cloud is a fully hosted Software-as-a-Service (SaaS) solution. You only pay for the service and the capacity that is used, eliminating the necessity for expensive software licenses and in-house hardware resources. As the business grows, ADAPA in the Cloud provides a cost-effective expansion path, for example, by adding multiple ADAPA instances for scalability or failover. The SaaS model removes the burden for you to manage a scalable, on-demand computing infrastructure.
At any given time, launch one or more instances using the ADAPA Control Center Web interface. For each instance, select the most appropriate instance type: “large”, “high-CPU” or “high-IO”.

Private instance for all your decisioning needs
We provide you with a single-tenant architecture. The service is implemented as a private, dedicated instance of ADAPA that encapsulates your predictive models and business rules. Only you have access to your private ADAPA instance(s) via HTTPS. Your decisioning files and data never share the same engine with other clients. You launch and terminate your dedicated ADAPA instances through the secure ADAPA Control Center.

Trusted, secure, scalable cloud infrastructure
Zementis leverages Amazon Web Services to provide on-demand infrastructure for ADAPA on the Cloud. The Amazon Elastic Compute Cloud (Amazon EC2) offers utility computing with virtually unlimited scalability. Billing and subscription management are handled through Amazon. Your payment information remains secure and confidential, and you enjoy the convenience of using your existing Amazon.com account. Yes, the same account you use to buy books.

Wednesday, January 19, 2011

ADAPA 3.3 Released: Extended PMML Coverage

PMML, the Predictive Model Markup Language, allows for a predictive analytic model to be developed in one application and easily moved to another for production deployment and execution.

Once a predictive model is exported from a PMML-compliant tool such as SAS EM, SPSS/IBM, R, KNIME, RapidMiner, ... it can be uploaded directly into the Zementis ADAPA engine which makes the model available for execution via its console or as a web-service. ADAPA can already import most of the techniques defined by the PMML standard and now, with the release of ADAPA 3.3, we have expanded it even further to cover Cox Regression and Ruleset models.

Cox Regression Models

The Cox proportional hazards model for survival analysis is used in various industries, including pharmaceuticals and telecommunications.

Ruleset Models

Ruleset models can be thought of as flattened decision tree models. A ruleset consists of a number of rules in which each rule contains a predicate and a predicted class value.
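The structure described above, a list of rules that each pair a predicate with a predicted class, can be sketched as follows. The rules, field names and class labels are invented for illustration, and the sketch uses a "first matching rule wins" strategy; the PMML standard also defines other ways of selecting among matching rules, which are not shown here.

```python
# Each rule: (rule id, predicate over a record, predicted class).
# All rules below are hypothetical examples.
rules = [
    ("r1", lambda r: r["amount"] > 1000 and r["country"] != r["home_country"], "review"),
    ("r2", lambda r: r["amount"] > 5000, "review"),
    ("r3", lambda r: True, "approve"),  # catch-all rule
]


def first_hit(record, rules, default=None):
    """Return (rule id, predicted class) of the first rule whose
    predicate is true, or (None, default) if no rule fires."""
    for rule_id, predicate, predicted in rules:
        if predicate(record):
            return rule_id, predicted
    return None, default


abroad = {"amount": 1500, "country": "FR", "home_country": "US"}
at_home = {"amount": 50, "country": "US", "home_country": "US"}

print(first_hit(abroad, rules))
print(first_hit(at_home, rules))
```

Because the rules are evaluated in order, a flattened decision tree maps naturally onto this form: each path from the root to a leaf becomes one rule.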

As of now, ADAPA supports the following modeling techniques:
  • Ruleset Models
  • Clustering Models (Distribution-Based, Center-Based, and 2-Step Clustering)
  • Decision Trees (for classification and regression) together with multiple missing value handling strategies (Default Child, Last Prediction, Null Prediction, Weighted Confidence, Aggregate Nodes)
  • Naive Bayes Classifiers
  • Neural Networks (Back-Propagation, Radial-Basis Function, and Neural-Gas)
  • Regression Models (Linear, Polynomial, and Logistic) and General Regression Models (General Linear, Ordinal Multinomial, Generalized Linear, Cox)
  • Support Vector Machines (for regression and multi-class and binary classification)
  • Scorecards (point allocation for categorical, continuous, and complex attributes)

Additionally, ADAPA supports a myriad of functions for implementing data pre- and post-processing. These include:

  • Value Mapping
  • Discretization
  • Normalization
  • Scaling
  • Logical and Arithmetic Operators

and much, much more.

If you think of anything ADAPA cannot do or something else you need to do in terms of data manipulation, let us know.

For your free trial of ADAPA, please register at: https://myadapa.zementis.com/

Thursday, January 13, 2011

Predictive Analytics + Business Rules = Enhanced Decisioning


Business Rules are ubiquitous today. They manage the day-to-day operations of thousands of companies worldwide. From stocking to maintenance, rules are an integral part of the way we do business in the 21st century. This kind of knowledge, known as Expert Knowledge, is forged from years of experience, or from what turned out to be the “logical thing to do”.

However, along with the information age, more and more data started being gathered all over the world about the processes and services we as a society came to benefit from. In this sea of data, predictive algorithms were designed to extract its hidden patterns, i.e. knowledge that is hidden from the human eye. This is known as Data-Driven Knowledge.

In an ideal world, business rules and predictive models live side by side benefiting from each other since both encode complementary types of knowledge.

In the presentation below, originally given at RulesFest 2010, Dr. Alex Guazzelli starts by differentiating the two types of knowledge. He then makes the point that companies can get Enhanced Decisioning whenever expert and data-driven knowledge are combined. Dr. Guazzelli goes on to describe the making of a predictive solution by using a "fish processing plant" as an analogy for any process that can benefit from intelligent decisioning. He ends by showing how such a solution can be deployed using PMML (the Predictive Model Markup Language) and easily moved to the production environment using ADAPA, the Zementis Predictive Decisioning Engine.






Copyright © 2009-2014 Zementis Incorporated. All rights reserved.
