Wednesday, April 16, 2014

Standards-based Predictive Analytics with SAP HANA and Zementis ADAPA

At the recent DEMO Enterprise 2014 conference, Zementis announced its participation in the SAP® Startup Focus program and launched ADAPA for SAP HANA, a standards-based predictive analytics scoring engine. 

ADAPA for SAP HANA provides a simple plug-and-play platform to deploy the most complex predictive models and execute them in real-time, even in the context of Big Data.

In joining the SAP HANA Startup Focus program, Zementis set out to address two key challenges related to the operational deployment of predictive analytics: agile deployment and scalable execution.

Transactional data has for years pushed the boundaries of predictive analytics. The financial industry, for example, has been using transactional data to detect fraud and abuse for decades with complex custom solutions. Real-time scoring is paramount for companies to be able to predict and prevent fraudulent activity before it actually happens.  Likewise, the Internet of Things (IoT) demands effective processing of sensor data to employ predictive maintenance for detecting issues before they turn into device failures.

To solve these challenges, Zementis combined its ADAPA predictive analytics scoring engine with SAP HANA in a true plug-and-play platform that is universally applicable across industries. ADAPA serves scoring requests and executes predictive models, while HANA offloads complex model preprocessing and the computation of aggregates.

In this scenario, real-time execution critically depends on HANA serving complex data lookups and aggregate profile computation in a few milliseconds.  In a high-volume environment, such aggregates or lookups may have to be computed over millions of transactions.
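The following minimal Python sketch makes this division of labor concrete. It has two steps: an aggregation step, which is the kind of profile computation the post hands to HANA, feeding a scoring step, which is the part ADAPA executes. The sketch does not use any HANA or ADAPA API. The `Transaction` type, the profile fields and the toy scoring rule are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Transaction:
    account_id: str
    amount: float


def aggregate_profile(history, account_id):
    """Aggregation step (the role the post gives to HANA): summarize one
    account's past transactions into a small profile."""
    amounts = [t.amount for t in history if t.account_id == account_id]
    if not amounts:
        return {"txn_count": 0, "avg_amount": 0.0, "max_amount": 0.0}
    return {
        "txn_count": len(amounts),
        "avg_amount": mean(amounts),
        "max_amount": max(amounts),
    }


def score(txn, profile):
    """Scoring step (the role the post gives to ADAPA). Hypothetical toy model:
    the further an amount exceeds the account's average, the higher the score."""
    if profile["txn_count"] == 0 or profile["avg_amount"] <= 0:
        return 0.5  # no usable history: return a neutral score (arbitrary choice)
    ratio = txn.amount / profile["avg_amount"]
    return min(1.0, ratio / 10.0)  # cap the score at 1.0


history = [Transaction("A", 20.0), Transaction("A", 30.0),
           Transaction("A", 40.0), Transaction("B", 500.0)]
profile = aggregate_profile(history, "A")
print(score(Transaction("A", 300.0), profile))  # prints 1.0
print(score(Transaction("A", 30.0), profile))   # prints 0.1
```

In a real deployment, the profile would be computed over millions of stored transactions. Only the small profile would travel to the scoring engine.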

ADAPA provides scalable real-time scoring of the core model, plus agility for model deployment through the Predictive Model Markup Language (PMML) industry standard.  Clients are able to instantly deploy existing predictive models from various data mining tools.  For example, you can take a complex predictive model from SAS Enterprise Miner, export it in PMML format and simply make it available for real-time scoring in ADAPA for SAP HANA.  The same process, of course, applies to most commercial tools, e.g. SAP Predictive Analysis, KXEN, IBM SPSS, as well as open source tools like R and KNIME.

The unique aspect of the Zementis / SAP platform is that it combines the benefits of an open standard for predictive analytics with the power of in-memory computing.

For more product details, please see

Monday, March 10, 2014

The Data Mining Group releases PMML version 4.2 - PRESS RELEASE

Chicago, IL 2/25/2014 – The Data Mining Group announced today the release of PMML v4.2. PMML is an application- and system-independent XML interchange format for statistical and data mining models. The goal of the PMML standard is to encapsulate a model independently of applications or systems, so that two different applications (the PMML Producer and Consumer) can use it.

“As a standard, PMML provides the glue to unify data science and operational IT. With one common process and standard, PMML is the missing piece for Big Data initiatives to enable rapid deployment of data mining models. Broad vendor support and rapid customer adoption demonstrate that PMML delivers on its promise to reduce cost, complexity and risk of predictive analytics,” says Alex Guazzelli, Vice President of Analytics, Zementis. “You cannot build and deploy predictive models over big data without using multiple models and no one should build multiple models without PMML,” says Bob Grossman, Founder and Partner at Open Data Group.

Some of the elements that are new to PMML v4.2 include:

  • Improved support for post-processing, model types, and model elements
  • A completely new element for text mining
  • Scorecards now introduce the ability to compute points based on expressions
  • New built-in functions, including “matches” and “replace” for the use of regular expressions 

“With PMML our customers and partners are able to drive real value from their predictive models right away, using the open standard,” said Andrew Flint, Senior Director of Product Management at FICO (NYSE: FICO). “Models built in most commercial or open source data mining tools, such as FICO Model Builder or R, can be instantly deployed in the FICO Analytic Cloud using PMML. The net result is quicker time to innovation and value on analytic applications, and the ability to combine the power of standards-based predictive analytics with the scalability of cloud computing.”

Radhika Kulkarni, Vice President of Advanced Analytics at SAS notes, “SAS continues to support the analytic collaboration that PMML provides to users. The recent release of SAS Enterprise Miner 13.1 provides users the ability to not only consume PMML from Open Source R models, but also produce PMML, which can be consumed by other applications. SAS Model Manager enables users to consume and manage R and PMML models as part of the SAS ecosystem. Sharing analytic models is paramount to the analytic lifecycle.”

“We are extremely excited to continue our long-running PMML support to PMML 4.2,” said Scott Cappiello, Vice President, Program Management, MicroStrategy Incorporated. “As a firm believer in providing the maximum analytic flexibility to organizations, PMML provides significant advantage in folding in analytics beyond our native and open source R capabilities, to provide business users the full range of business analytics in the big data age.”

"We are happy to see PMML's impact continuing to grow and will keep being among the first to integrate new PMML features into KNIME. Starting with our summer release KNIME will also support Naive Bayes Models and we will keep adding to its PMML Preprocessing abilities as well," says Kilian Thiel of KNIME.

About PMML:
PMML is the leading standard for statistical and data mining models and is supported by over 20 vendors and organizations. With PMML, it is straightforward to develop a model on one system using one application and deploy the model on another system using another application.

About DMG:
The Data Mining Group (DMG) is an independent, vendor-led consortium that develops data mining standards, such as the Predictive Model Markup Language (PMML). DMG members include: IBM, MicroStrategy, SAS, Experian, Pervasive Software, Zementis, Equifax, FICO, KNIME, NASA, Open Data Group, Rapid-I, Togaware, and Visa.

For more information about the Data Mining Group and the PMML standard, go to:

PMML 4.2 is here! What is new? What changed?

PMML 4.2 is out! That's really great. The DMG (Data Mining Group) has been working on this new version of PMML for over two years now. And I can truly say it is the best PMML ever! If you haven't seen the press release for the new version, please take a look at the DMG press release posting.

What changed?

PMML is a very mature language, so there aren't any dramatic changes to it at this point. One noteworthy change concerns the target field of a predictive model, which older versions of PMML marked with the usage type "predicted". This was confusing, since a predicted field is usually the result of scoring or executing a model (the score, so to speak). PMML 4.2 clears things up: the target field is now simply "target". A small change, but a huge step towards making it clear that the Output element is where the predicted outputs should be defined.
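Here is how the change shows up in a PMML document. The small Python helper below uses only the standard library. It finds the target field whether it is marked the PMML 4.2 way (`usageType="target"`) or the older way (`usageType="predicted"`). The model fragment and field names are made up. A real PMML file would also carry a Header, a DataDictionary and the model parameters.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML 4.2 fragment: the target "y" is marked usageType="target",
# while the predicted output is declared separately in the Output element.
PMML_42 = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <Output>
      <OutputField name="Predicted_y" feature="predictedValue"/>
    </Output>
  </RegressionModel>
</PMML>"""

# The same fragment written the pre-4.2 way (namespace left unchanged for brevity).
PMML_LEGACY = PMML_42.replace('usageType="target"', 'usageType="predicted"')

# Usage types that mark a model's target; "predicted" is the older spelling.
TARGET_USAGE_TYPES = {"target", "predicted"}


def local_name(tag):
    """Drop the namespace: '{http://www.dmg.org/PMML-4_2}MiningField' -> 'MiningField'."""
    return tag.rsplit("}", 1)[-1]


def target_fields(pmml_text):
    """Names of MiningFields used as targets (usageType defaults to 'active')."""
    root = ET.fromstring(pmml_text)
    return [
        el.get("name")
        for el in root.iter()
        if local_name(el.tag) == "MiningField"
        and el.get("usageType", "active") in TARGET_USAGE_TYPES
    ]


def output_fields(pmml_text):
    """Map each OutputField name to its feature (default 'predictedValue')."""
    root = ET.fromstring(pmml_text)
    return {
        el.get("name"): el.get("feature", "predictedValue")
        for el in root.iter()
        if local_name(el.tag) == "OutputField"
    }


print(target_fields(PMML_42), target_fields(PMML_LEGACY))  # ['y'] ['y']
print(output_fields(PMML_42))  # {'Predicted_y': 'predictedValue'}
```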

Continuous Inputs for Naive Bayes Models

This is a great new enhancement to the NaiveBayes model element. We wrote an entire paper about this new feature and presented it at the KDD 2013 PMML Workshop. If you use Naive Bayes models, you should definitely take a look at our article.

And, now you can benefit from actually having our proposed changes in PMML itself! This is really remarkable and we are all already benefiting from it. The Zementis Py2PMML (Python to PMML) Converter uses the proposed changes to convert Gaussian Naive Bayes models from scikit-learn to PMML.
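If the idea is new to you, here is a minimal Python sketch of what continuous inputs mean for scoring. Each continuous input contributes a Gaussian likelihood per class, based on a mean and a variance per class. This is the information PMML 4.2 lets you record for a BayesInput. The likelihoods are multiplied by the class prior. The field name, counts and statistics below are hypothetical. The sketch also omits details that a full PMML consumer handles, such as categorical inputs and the threshold that replaces zero probabilities.

```python
import math


def gaussian_density(x, mean, variance):
    """Normal probability density N(x; mean, variance)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def naive_bayes_posteriors(record, class_counts, gaussian_stats):
    """record: {input: value}; class_counts: {class: training count};
    gaussian_stats: {input: {class: (mean, variance)}}."""
    total = sum(class_counts.values())
    joint = {}
    for cls, count in class_counts.items():
        p = count / total  # class prior from the training counts
        for field, value in record.items():
            mu, variance = gaussian_stats[field][cls]
            p *= gaussian_density(value, mu, variance)  # naive independence assumption
        joint[cls] = p
    evidence = sum(joint.values())  # normalize so the posteriors sum to 1
    return {cls: p / evidence for cls, p in joint.items()}


counts = {"normal": 80, "faulty": 20}
stats = {"vibration": {"normal": (1.0, 0.25), "faulty": (3.0, 1.0)}}
print(naive_bayes_posteriors({"vibration": 2.8}, counts, stats))
```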

Complex Point Allocation for Scorecards

The Scorecard model element was introduced to PMML in version 4.1. It was a good element then, but it is really great now in PMML 4.2. We added a way to compute complex values for the points allocated to an attribute (under a certain characteristic) through the use of expressions. This means you can use input or derived values to compute the actual number of points. Very cool!
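The idea behind expression-based points fits in a few lines of Python. This is not PMML syntax. Each characteristic is a list of (predicate, points) pairs. The points are either a fixed number or a function of the input record, and the function stands in for a PMML 4.2 expression. The characteristics, cut-offs and point values are hypothetical. The sketch also keeps each characteristic's partial score, because reason codes are ranked from those values.

```python
# Each characteristic holds attributes as (predicate, points) pairs; the first
# attribute whose predicate is true contributes its points. Points are either a
# fixed number (a simple partial score) or a function of the input record
# (standing in for an expression-based partial score).
SCORECARD = {
    "initial_score": 100.0,
    "characteristics": {
        "age": [
            (lambda r: r["age"] < 30, 10.0),
            (lambda r: r["age"] >= 30, lambda r: 0.5 * r["age"]),
        ],
        "income": [
            (lambda r: r["income"] < 1000, 5.0),
            (lambda r: r["income"] >= 1000, lambda r: r["income"] / 1000.0),
        ],
    },
}


def score(record, card=SCORECARD):
    """Return the total score and each characteristic's partial score."""
    partial = {}
    for name, attributes in card["characteristics"].items():
        for predicate, points in attributes:
            if predicate(record):
                partial[name] = points(record) if callable(points) else points
                break
        else:
            raise ValueError(f"no attribute of {name!r} matched")
    return card["initial_score"] + sum(partial.values()), partial


total, parts = score({"age": 40, "income": 2000})
print(total, parts)  # 122.0 {'age': 20.0, 'income': 2.0}
```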

Andy Flint (FICO) and I wrote a paper about the Scorecard element for the KDD 2011 PMML Workshop. So, if you haven't seen it yet, it will get you started on how to use PMML to represent scorecards and reason codes.

Revised Output Element

The Output element was completely revised and is now much simpler to use. With PMML 4.2, you have direct access to all the model outputs, plus all post-processing, directly through the attribute "feature".

The attribute segmentId also allows users to output particular fields from segments in a multiple-model scenario.
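The following Python sketch shows the lookup logic this enables. It assumes a hypothetical segmented model whose results, both the final result and the result of each segment, have already been computed. Each OutputField picks a `feature` from the final result. When `segmentId` is set, it picks from the result of that segment instead. The results dictionary is an illustrative stand-in and is not part of PMML.

```python
import xml.etree.ElementTree as ET

# Hypothetical Output element for a segmented (multiple-model) PMML file.
OUTPUT_XML = """<Output xmlns="http://www.dmg.org/PMML-4_2">
  <OutputField name="final_class" feature="predictedValue"/>
  <OutputField name="prob_yes" feature="probability" value="yes"/>
  <OutputField name="segment2_class" feature="predictedValue" segmentId="2"/>
</Output>"""

# Hypothetical scoring results: key None holds the final (combined) result,
# the other keys hold each segment's result, by segment id.
RESULTS = {
    None: {"predictedValue": "yes", "probability": {"yes": 0.7, "no": 0.3}},
    "1": {"predictedValue": "yes", "probability": {"yes": 0.9, "no": 0.1}},
    "2": {"predictedValue": "no", "probability": {"yes": 0.4, "no": 0.6}},
}


def resolve_outputs(output_xml, results):
    outputs = {}
    for field in ET.fromstring(output_xml):
        if field.tag.rsplit("}", 1)[-1] != "OutputField":
            continue
        source = results[field.get("segmentId")]  # no segmentId -> final result
        feature = field.get("feature", "predictedValue")
        value = source[feature]
        if feature == "probability":
            value = value[field.get("value")]  # probability of one category
        outputs[field.get("name")] = value
    return outputs


print(resolve_outputs(OUTPUT_XML, RESULTS))
# {'final_class': 'yes', 'prob_yes': 0.7, 'segment2_class': 'no'}
```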

The newly revised output element spells flexibility. It allows you to get what you need out of your predictive solutions.

For a complete list of all the changes in PMML 4.2 (small and large), see:

What is new?

PMML 4.2 introduces regular expressions to PMML, so that users can process text more efficiently. The most straightforward additions are simple: three new built-in functions for concatenating strings and for replacing and matching strings using regular expressions.
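As a rough illustration, the three functions behave much like the following Python helpers, which are built on the standard `re` module. The helper names are hypothetical. The sketch assumes that `matches` succeeds when the pattern is found anywhere in the string. Regex dialects also differ in details, such as how a replacement string refers to capture groups, so check the PMML 4.2 specification before relying on edge cases.

```python
import re


def pmml_concat(*values):
    """Concatenate the string forms of all arguments."""
    return "".join(str(v) for v in values)


def pmml_matches(value, pattern):
    """True if the regular expression matches anywhere in the string
    (assumed substring semantics)."""
    return re.search(pattern, value) is not None


def pmml_replace(value, pattern, replacement):
    """Replace every match of the regular expression with the replacement."""
    return re.sub(pattern, replacement, value)


# Example: normalize a free-text field before using it as a model input.
raw = "Order #A-1027: item DAMAGED on arrival"
clean = pmml_replace(raw.lower(), r"[^a-z0-9 ]", "")
print(clean)                                 # order a1027 item damaged on arrival
print(pmml_matches(clean, r"\bdamaged\b"))   # True
print(pmml_concat("ID-", 1027))              # ID-1027
```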

The more elaborate addition is a brand new transformation element in PMML (TextIndex) for extracting term frequencies from text. The ideas for this element were presented at the KDD 2013 PMML Workshop by Benjamin De Boe, Misha Bouzinier and Dirk Van Hyfte (InterSystems). Their paper is a great resource for finding out the details behind the ideas that led to the new text mining element in PMML.
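The core idea of counting how often given terms occur in a text field fits in a short Python sketch. It is loosely modeled on the kinds of options a text index offers: case sensitivity, a word-separator regular expression, and raw versus binary counts. The function and parameter names are hypothetical. The sketch handles single-word terms only and does nothing beyond exact token matches.

```python
import re
from collections import Counter


def term_frequencies(text, terms, case_sensitive=False,
                     word_separator=r"[\s.,;:!?]+", binary=False):
    """Count how often each single-word term occurs in the text.

    binary=True reports 1/0 (present/absent) instead of raw counts.
    """
    if not case_sensitive:
        text = text.lower()
    # Split on the separator pattern and drop empty tokens at the edges.
    tokens = [tok for tok in re.split(word_separator, text) if tok]
    counts = Counter(tokens)
    result = {}
    for term in terms:
        n = counts[term if case_sensitive else term.lower()]
        result[term] = min(n, 1) if binary else n
    return result


note = "Engine failure. Engine overheated, engine stopped."
print(term_frequencies(note, ["engine", "failure", "brake"]))
# {'engine': 3, 'failure': 1, 'brake': 0}
```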

Of course, the changes described above are also new, but it was nice to break the news into two pieces. For the grand finale, though, there is nothing better than taking a look at PMML 4.2 itself.


Wednesday, January 29, 2014

Standards in Predictive Analytics: R, Hadoop and PMML (a white paper by James Taylor)

James Taylor (@jamet123) is remarkable in capturing the nuances and mood of the data analytics and decision management industry and community. As a celebrated author and an avid writer, James has been writing more and more about the technologies that transform Big Data into real value and insights that can then drive smart business decisions. It is not a surprise then that James has just made available a white paper entitled "Standards in Predictive Analytics" focusing on PMML, the Predictive Model Markup Language, R, and Hadoop.

Why R? 

Well, you can use R for pretty much anything in analytics these days. Besides allowing users to do data discovery, it also provides a myriad of packages for model building and predictive analytics.

Why Hadoop? 

It almost goes without saying. Hadoop is an amazing platform for processing predictive analytic models on top of Big Data.

Why PMML? 

PMML is really the glue between model building (say, R, SAS EM, IBM SPSS, KXEN, KNIME, Python scikit-learn, ...) and the production system. With PMML, moving a model from the scientist's desktop to production (say, Hadoop, Cloud, in-database, ...) is straightforward. It boils down to this:

R -> PMML -> Hadoop
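To show what the PMML step buys you, here is a toy PMML consumer in Python that scores a tiny decision tree using only the standard library. The tree is hypothetical, of the kind R or another tool might export. The scorer supports only `True`, `False` and numeric `SimplePredicate` elements. When no child node matches, it returns the parent's score, which is one of several strategies PMML allows. A production engine handles far more of the standard than this.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-4_2"}

# Hypothetical decision tree, as a model-building tool might export it to PMML.
TREE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header description="hypothetical example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="temperature" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="temperature"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <Node score="low">
      <True/>
      <Node score="high">
        <SimplePredicate field="temperature" operator="greaterThan" value="90"/>
      </Node>
      <Node score="low">
        <True/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

# Numeric comparison operators of SimplePredicate.
OPERATORS = {
    "equal": lambda a, b: a == b,
    "notEqual": lambda a, b: a != b,
    "lessThan": lambda a, b: a < b,
    "lessOrEqual": lambda a, b: a <= b,
    "greaterThan": lambda a, b: a > b,
    "greaterOrEqual": lambda a, b: a >= b,
}


def predicate_holds(node, record):
    # The node's predicate is its first child element that is not an Extension.
    pred = next(el for el in node if not el.tag.endswith("Extension"))
    kind = pred.tag.rsplit("}", 1)[-1]
    if kind == "True":
        return True
    if kind == "False":
        return False
    if kind == "SimplePredicate":
        value = float(record[pred.get("field")])
        return OPERATORS[pred.get("operator")](value, float(pred.get("value")))
    raise NotImplementedError(f"predicate {kind} not supported in this sketch")


def score_tree(pmml_text, record):
    node = ET.fromstring(pmml_text).find("p:TreeModel/p:Node", NS)
    while True:
        for child in node.findall("p:Node", NS):
            if predicate_holds(child, record):
                node = child  # descend into the first child whose predicate is true
                break
        else:
            # Leaf reached, or no child matched: return this node's score.
            return node.get("score")


print(score_tree(TREE_PMML, {"temperature": 95}))  # high
print(score_tree(TREE_PMML, {"temperature": 70}))  # low
```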

But, I should stop here and let you read James' wise words yourself. The white paper is available through the Zementis website. To download it, simply click below.


And, if you would like to check James' latest writings, make sure to check his website:

Wednesday, January 8, 2014

Watch Zementis/Datameer Webinar: Best Practices for Big Data Analytics with Machine Learning

Please watch the  Zementis and Datameer webinar entitled "Best Practices for Big Data Analytics with Machine Learning."


In this webinar, we demonstrate through an industry specific use case how to identify patterns and relationships to make sound predictions using smart data analytics. You will learn best practices on:
  • Selecting the right machine learning approach for business and IT
  • Visualizing machine learning on Hadoop
  • Leveraging existing predictive algorithms on Hadoop

Monday, November 11, 2013

Big Data Scoring - IBM PureData and Zementis Universal PMML Plug-in (UPPI)

In-database scoring is one of the most straightforward ways to gain insights from Big Data. It is no surprise, then, that the Zementis Universal PMML Plug-in (UPPI) is now being offered for a variety of database platforms. These include IBM PureData for Analytics (Netezza), Pivotal/Greenplum, SAP Sybase IQ, Teradata and Teradata Aster. Zementis also offers UPPI for Hadoop/Hive, including IBM PureData for Hadoop as well as InfoSphere BigInsights. It is in this context that we travelled to Vegas to attend the IBM Information on Demand (IOD) Conference.

I must say, I am always impressed by the IBM universe of products and tools offered for analytics (descriptive and predictive) as well as for Big Data in general. Zementis had a booth inside the PureData exhibit area, next to all the PureData appliances. As you can imagine, traffic was solid, not just because of all the blinking lights but also because the conference itself attracts a lot of people. I believe there were about 14,000 attendees this year.

Why in-database scoring? Well, it's simple: not all analytic tasks are born equal. If one is confronted with massive volumes of data that need to be scored on a regular basis, in-database scoring is the logical thing to do. In all likelihood, the data in this case is already stored in a database and, with in-database scoring, there is no data movement. Data and models reside together; hence, scores and predictions flow at an accelerated pace.
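SQLite, which ships with Python, can illustrate the "no data movement" point. A scoring function is registered as a SQL function with `sqlite3.Connection.create_function`, so rows are scored inside the query instead of being exported first. This only illustrates the concept. It is not how UPPI is installed or invoked, and the churn model and its coefficients are hypothetical.

```python
import math
import sqlite3


def churn_score(tenure_months, support_calls):
    """Hypothetical logistic model standing in for a deployed predictive model."""
    z = -1.0 + 0.5 * support_calls - 0.1 * tenure_months
    return 1.0 / (1.0 + math.exp(-z))


def score_in_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, tenure_months REAL, support_calls REAL)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 24, 1), (2, 3, 6)],
    )
    # Register the model as a SQL function so scoring runs inside the query,
    # next to the data, instead of exporting the rows first.
    conn.create_function("churn_score", 2, churn_score)
    rows = conn.execute(
        "SELECT id, churn_score(tenure_months, support_calls) FROM customers ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


for customer_id, score in score_in_database():
    print(customer_id, round(score, 3))
```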

Why scoring in Hadoop? Big Data and Hadoop are somewhat synonymous terms these days, since the latter offers an important technological platform for tackling the challenge of analyzing large volumes of data. In fact, predictive analytics is paramount for companies to extract value and insight from such data. By offering the Universal PMML Plug-in (UPPI) for Hadoop, Zementis takes a big step in making its technology available for companies around the globe to easily deploy, execute, and integrate scalable standards-based predictive analytics on a massively parallel scale through the use of Hive, a data warehouse system for Hadoop.

UPPI brings together essential technologies, offering the best combination of open standards and scalability for the application of predictive analytics. It fully supports the Predictive Model Markup Language (PMML), the de facto standard for data mining applications, which enables the integration of predictive models from IBM/SPSS, SAS, R, and many more.

Thursday, October 31, 2013

ADAPA on Amazon AWS Marketplace: Predictive Analytics and Big Data Scoring 1-click away

Clients benefit from our solutions by being able to use PMML, the Predictive Model Markup Language, to move their predictive models from IBM SPSS, R, SAS EM, ... and deploy them instantly in a variety of platforms, including the Amazon Elastic Compute Cloud (Amazon EC2).

ADAPA on the Amazon Cloud brings the power of our real-time, PMML-based scoring engine to the cloud. It comes pre-installed on a virtual server, which we call an "ADAPA Instance".

The AWS (Amazon Web Services) Marketplace puts ADAPA at your fingertips on three different types of virtual machines. Once you select the machine type and the cloud region in which you want it to run (US, Europe, Latin America or Asia-Pacific), all you need to do is select 1-Click Launch, and moments later your ADAPA instance is up and running, ready for deployment and execution.

Big Data Scoring through ADAPA with S3 Processing

Zementis makes it super easy to score your big data by connecting your Amazon S3 (Simple Storage Service) bucket to your predictive models deployed in ADAPA on the Amazon Cloud. ADAPA with S3 Processing is intended for mission-critical applications that require very high throughput of predictive analytics. While ADAPA provides real-time scoring via a Web-services API, S3 Processing addresses use cases with scoring requirements that involve tens or hundreds of millions of rows at a time.
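Batch scoring at that scale is mostly about streaming. Here is a minimal sketch that assumes rows arrive as CSV lines, for example read line by line from an S3 object (the S3 access itself is not shown). The Python generator below scores rows as they stream past and hands them on in fixed-size chunks, so memory use stays bounded. It is not the ADAPA S3 Processing interface, and the model and column names are hypothetical.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List


def score_in_chunks(lines: Iterable[str],
                    model: Callable[[Dict[str, str]], float],
                    chunk_size: int = 10_000) -> Iterator[List[Dict[str, object]]]:
    """Stream CSV rows through a model and yield scored rows in fixed-size chunks,
    so memory use stays bounded no matter how many rows the input has."""
    chunk: List[Dict[str, object]] = []
    for row in csv.DictReader(lines):
        chunk.append({**row, "score": model(row)})
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final, possibly smaller, chunk


def flag_large(row: Dict[str, str]) -> float:
    """Hypothetical model: flag amounts above 100."""
    return 1.0 if float(row["amount"]) > 100 else 0.0


# In-memory stand-in for a large CSV object.
data = io.StringIO("id,amount\n1,10\n2,250\n3,40\n4,900\n5,15\n")
for chunk in score_in_chunks(data, flag_large, chunk_size=2):
    print([(r["id"], r["score"]) for r in chunk])
```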

Thursday, October 10, 2013

CIO Review: Zementis selected as one of the top 20 most promising big data companies

Selected by a distinguished panel comprising CEOs, CIOs, VCs, industry analysts and the editorial board of CIO Review, Zementis has been named by CIO Review as one of the "Top 20 Most Promising Big Data Companies in 2013." Congratulations, Zementis!


That comes as no surprise, since Zementis is all about kicking down barriers to the fast deployment and execution of predictive solutions. By leveraging the PMML (Predictive Model Markup Language) standard, Zementis' products allow predictive models built anywhere (IBM SPSS, KXEN, KNIME, R, SAS, ...) to be deployed right away on-site, in the cloud (Amazon, IBM, FICO), in-database (Pivotal/Greenplum, SAP Sybase IQ, IBM PureData for Analytics/Netezza, Teradata and Teradata Aster) or in Hadoop (Hive or Datameer).

Predictive analytics has been used for many years to learn patterns from historical data to literally predict the future. Well-known techniques include neural networks, decision trees, and regression models. Although these techniques have been applied to a myriad of problems, the advent of big data, cost-efficient processing power, and open standards have propelled predictive analytics to new heights.

Big data involves large amounts of structured and unstructured data captured from people (e.g., online transactions, tweets, ...) as well as sensors (e.g., GPS signals in mobile devices). With big data, companies can now start to assemble a 360-degree view of their customers and processes. Luckily, powerful and cost-efficient computing platforms such as the cloud and Hadoop are here to address the processing requirements imposed by the combination of big data and predictive analytics.

Creating predictive solutions is just part of the equation. Once built, they need to be transitioned to the operational environment where they are actually put to use. In the agile world we live in today, the Predictive Model Markup Language (PMML) delivers the necessary representational power for solutions to be quickly and easily exchanged between systems, allowing predictions to move at the speed of business.

Zementis' PMML-based products, ADAPA for real-time scoring and UPPI for big data scoring, are designed from the ground up to deliver the agility necessary for models to be easily deployed on a variety of platforms and put to work right away.

Zementis ADAPA and UPPI kick down the barriers to big data adoption!

Copyright © 2009 Zementis Incorporated. All rights reserved.
