Monday, November 11, 2013

Big Data Scoring - IBM PureData and Zementis Universal PMML Plug-in (UPPI)

In-database scoring is one of the most straightforward ways to gain insights from Big Data. It is no surprise then that the Zementis Universal PMML Plug-in (UPPI) is now being offered for a variety of database platforms. These include IBM PureData for Analytics (Netezza), Pivotal/Greenplum, SAP Sybase IQ, Teradata and Teradata Aster. Zementis also offers UPPI for Hadoop/Hive, including IBM PureData for Hadoop as well as InfoSphere BigInsights. It is in this context that we travelled to Vegas to attend the IBM Information on Demand (IOD) Conference.

I must say, I am always impressed by the IBM universe of products and tools that are being offered for analytics (descriptive and predictive) as well as Big Data in general. Zementis had a booth inside the PureData exhibit area, right next to all the PureData appliances. As you can imagine, traffic was solid not just because of all the blinking lights but also because the conference itself attracts a lot of people. I believe there were 14 thousand attendees this year.

Why in-database scoring? Well, simple. Not all analytic tasks are born the same. If one is confronted with massive volumes of data that need to be scored on a regular basis, in-database scoring sounds like the logical thing to do. In all likelihood, the data in this case is already stored in a database and, with in-database scoring, there is no data movement. Data and models reside together; hence, scores and predictions flow at an accelerated pace.
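To make the idea concrete, here is a minimal sketch in Python. It uses SQLite purely as a stand-in for a real analytic database, and it is not how UPPI itself is implemented. The table, the column names, and the tiny linear model are all hypothetical. The point is only that the model is registered as a SQL function, so scores are computed as part of the query instead of exporting rows to a separate tool.

```python
import sqlite3

# Hypothetical linear model; in practice these numbers would come from a PMML file.
INTERCEPT = 1.0
COEFFICIENTS = {"age": 0.5, "balance": 0.25}


def score(age, balance):
    """Linear score for one row; the database calls this once per row."""
    return INTERCEPT + COEFFICIENTS["age"] * age + COEFFICIENTS["balance"] * balance


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, age REAL, balance REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 30.0, 40.0), (2, 60.0, 20.0)],
)

# Register the model as a SQL function so that scoring happens inside the query.
conn.create_function("score", 2, score)

rows = conn.execute(
    "SELECT id, score(age, balance) FROM customers ORDER BY id"
).fetchall()
print(rows)
```

The same pattern, a model exposed as a function that the query engine calls row by row, is what makes scoring inside the database attractive when the table has millions of rows rather than two.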

Why scoring in Hadoop? Big Data and Hadoop are somewhat synonymous terms these days, since the latter offers an important technological platform to tackle the challenge of analyzing large volumes of data. In fact, predictive analytics is paramount for companies to extract value and insight from such data. By offering the Universal PMML Plug-in (UPPI) for Hadoop, Zementis takes a big step in making its technology available for companies around the globe to easily deploy, execute, and integrate scalable standards-based predictive analytics on a massively parallel scale through the use of Hive, a data warehouse system for Hadoop.

UPPI brings together essential technologies, offering the best combination of open standards and scalability for the application of predictive analytics. It fully supports the Predictive Model Markup Language (PMML), the de facto standard for data mining applications, which enables the integration of predictive models from IBM/SPSS, SAS, R, and many more.

Thursday, October 31, 2013

ADAPA on Amazon AWS Marketplace: Predictive Analytics and Big Data Scoring 1-click away

Clients benefit from our solutions by being able to use PMML, the Predictive Model Markup Language, to move their predictive models from IBM SPSS, R, SAS EM, ... and deploy them instantly on a variety of platforms, including the Amazon Elastic Compute Cloud (Amazon EC2).

ADAPA on the Amazon Cloud offers the power of our real-time PMML-based scoring engine, pre-installed on a virtual server in the cloud. We call that an "ADAPA Instance".

The AWS (Amazon Web Services) Marketplace gives you the power of having ADAPA at your fingertips on three different types of virtual machines. Once you select the machine type and the cloud region in which you want it to run (US, Europe, Latin America or Asia-Pacific), all you need to select is 1-Click Launch and moments later your ADAPA instance is up and running, ready for deployment and execution. 

Big Data Scoring through ADAPA with S3 Processing

Zementis makes it super easy to score your big data by connecting your Amazon S3 (Simple Storage Service) bucket to your predictive models deployed in ADAPA on the Amazon Cloud. ADAPA with S3 Processing is intended for mission critical applications that require very high throughput of predictive analytics. While ADAPA provides real-time scoring via a Web-services API, S3 Processing addresses use cases with scoring requirements that involve tens or hundreds of millions of rows at a time.

Thursday, October 10, 2013

CIO Review: Zementis selected as one of the top 20 most promising big data companies

Selected by a distinguished panel comprising CEOs, CIOs, VCs, industry analysts and the editorial board of CIO Review, Zementis has been named by CIO Review as one of the "Top 20 Most Promising Big Data Companies in 2013." Congratulations Zementis!


That comes as no surprise since Zementis is all about kicking down barriers for the fast deployment and execution of predictive solutions. By leveraging the PMML (Predictive Model Markup Language) standard, Zementis' products allow for predictive models built anywhere (IBM SPSS, KXEN, KNIME, R, SAS, ...) to be deployed right away on-site, in the cloud (Amazon, IBM, FICO), in-database (Pivotal/Greenplum, SAP Sybase IQ, IBM PureData for Analytics/Netezza, Teradata and Teradata Aster) or in Hadoop (Hive or Datameer).

Predictive analytics has been used for many years to learn patterns from historical data and use them to predict future outcomes. Well-known techniques include neural networks, decision trees, and regression models. Although these techniques have been applied to a myriad of problems, the advent of big data, cost-efficient processing power, and open standards have propelled predictive analytics to new heights.

Big data involves large amounts of structured and unstructured data that are captured from people (e.g., on-line transactions, tweets, ...) as well as sensors (e.g., GPS signals in mobile devices). With big data, companies can now start to assemble a 360-degree view of their customers and processes. Luckily, powerful and cost-efficient computing platforms such as the cloud and Hadoop are here to address the processing requirements imposed by the combination of big data and predictive analytics.

Creating predictive solutions is just part of the equation. Once built, they need to be transitioned to the operational environment where they are actually put to use. In the agile world we live in today, the Predictive Model Markup Language (PMML) delivers the necessary representational power for solutions to be quickly and easily exchanged between systems, allowing for predictions to move at the speed of business.

Zementis' PMML-based products, ADAPA for real-time scoring and UPPI for big data scoring, are designed from the ground up to deliver the agility necessary for models to be easily deployed on a variety of platforms and to be put to work right away.

Zementis ADAPA and UPPI kick down the barriers for big data adoption!

Wednesday, October 2, 2013

R PMML Support: BetteR than EveR

How does it work? Simple! Once you build your model in R using any of the PMML supported model types, pass the model object as an input parameter to the pmml function of the pmml package as shown in the figure below.

[Figure: the pmml package]

The pmml package offers export for a variety of model types, including:

   •   ksvm (kernlab): Support Vector Machines 
   •   nnet: Neural Networks 
   •   rpart: C&RT Decision Trees 
   •   lm & glm (stats): Linear and Binary Logistic Regression Models 
   •   arules: Association Rules 
   •   kmeans and hclust: Clustering Models 
   •   multinom (nnet): Multinomial Logistic Regression Models 
   •   glm (stats): Generalized Linear Models for classification and regression with 
         a wide variety of link functions 
   •   randomForest: Random Forest Models for classification and regression 
   •   coxph (survival): Cox Regression Models to calculate survival and stratified 
         cumulative hazards 
   •   naiveBayes (e1071): Naive Bayes Classifiers 
   •   glmnet: Linear ElasticNet Regression Models 
   •   ada: Stochastic Boosting (coming soon) 
   •   svm (e1071): Support Vector Machines (coming soon)

The pmml package can also export data transformations built with the pmmlTransformations package (see below). It can also be used to merge two distinct PMML files into one. For example, if transformations and model were saved into separate PMML files, it can combine both files, as described in Chapter 5 of the PMML book - PMML in Action.
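For readers who want to see what such a merge amounts to at the XML level, here is a rough sketch in Python using only the standard library. It is not the pmml package's own code. The two small PMML documents are made up for illustration, and the sketch assumes PMML 4.1, where the TransformationDictionary element sits right after the DataDictionary and before the model.

```python
import xml.etree.ElementTree as ET

NS = "http://www.dmg.org/PMML-4_1"
ET.register_namespace("", NS)  # serialize with PMML as the default namespace

# Hypothetical file 1: transformations only (x1 rescaled from [0, 10] to [0, 1]).
TRANSFORMS = f"""<PMML version="4.1" xmlns="{NS}">
  <Header/>
  <DataDictionary numberOfFields="1">
    <DataField name="x1" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="x1_norm" optype="continuous" dataType="double">
      <NormContinuous field="x1">
        <LinearNorm orig="0" norm="0"/>
        <LinearNorm orig="10" norm="1"/>
      </NormContinuous>
    </DerivedField>
  </TransformationDictionary>
</PMML>"""

# Hypothetical file 2: a regression model that expects the derived field x1_norm.
MODEL = f"""<PMML version="4.1" xmlns="{NS}">
  <Header/>
  <DataDictionary numberOfFields="2">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1_norm" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def merge(transforms_xml, model_xml):
    """Copy the TransformationDictionary of one document into the other."""
    transforms = ET.fromstring(transforms_xml)
    model = ET.fromstring(model_xml)
    derived = transforms.find(f"{{{NS}}}TransformationDictionary")
    if derived is None:
        raise ValueError("no TransformationDictionary to merge")
    # Place the transformations directly after the model file's DataDictionary.
    tags = [child.tag for child in model]
    position = tags.index(f"{{{NS}}}DataDictionary") + 1
    model.insert(position, derived)
    return ET.tostring(model, encoding="unicode")


merged = merge(TRANSFORMS, MODEL)
print(merged)
```

A real merge would also have to reconcile the two data dictionaries and check field names; the sketch skips that and simply trusts that both files describe the same input fields.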

Data Transformations - the R pmmlTransformations Package

The pmmlTransformations package transforms data and, when used in conjunction with the pmml package, allows for data transformations to be exported together with the predictive model in a single PMML file. Transformations currently supported are:

   •   Min-max normalization 
   •   Z-score normalization 
   •   Dummy-fication of categorical variables 
   •   Value Mapping 
   •   Variable renaming
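If these names are unfamiliar, the sketch below shows what each transformation computes, written in plain Python instead of R. The function names are made up for this illustration and are not the pmmlTransformations API. The z-score version assumes the sample standard deviation, and min-max assumes the column has at least two distinct values.

```python
from statistics import mean, stdev


def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]


def dummify(values):
    """Turn a categorical column into one 0/1 indicator column per category."""
    return {c: [int(v == c) for v in values] for c in sorted(set(values))}


def map_values(values, mapping, default=None):
    """Replace each value by its mapped counterpart, or by default if unmapped."""
    return [mapping.get(v, default) for v in values]


def rename(record, old, new):
    """Return a copy of record with the key old renamed to new."""
    return {(new if key == old else key): value for key, value in record.items()}


print(min_max([2.0, 4.0, 6.0]))
print(z_score([1.0, 2.0, 3.0]))
print(dummify(["a", "b", "a"]))
print(map_values(["lo", "hi", "??"], {"lo": 0, "hi": 1}, default=-1))
print(rename({"x1": 3.0}, "x1", "income"))
```

Exporting these steps inside the PMML file matters because the scoring engine must apply exactly the same parameters (minimum, maximum, mean, standard deviation) that were computed on the training data.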

To learn more about this package, check out the paper we presented at the KDD 2013 PMML Workshop.

Tuesday, September 24, 2013

Predictive Analytics Deployment - A No-brainer with PMML

Model deployment used to be a big task. Predictive models, once built, needed to be re-coded into production to be able to score new data. This process was prone to errors and could easily take up to six months. Re-coding of predictive models has no place in the big data era we live in. Since data is changing rapidly, model deployment needs to be instantaneous and error-free.

PMML, the Predictive Model Markup Language, is the standard to represent predictive models. Given that PMML can be produced by all the top commercial and open-source data mining tools (e.g., FICO Model Builder, SAS EM, IBM SPSS, R, KNIME, ...), a predictive model can be easily moved into the production environment once it is represented as a PMML file.

Zementis offers ADAPA for real-time scoring and UPPI for big data scoring, which make the entire model deployment process a no-brainer. Given that ADAPA and UPPI are universal PMML consumers (they accept any version of PMML produced by any PMML-compliant tool), they can make predictive models instantly available for execution inside the production environment.
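To see why a PMML file removes the need for re-coding, consider the toy consumer below, written in Python with the standard library only. It is a sketch of the idea and has nothing to do with how ADAPA or UPPI work internally. The embedded PMML 4.1 regression model and its field names are invented for the example, and the code handles only numeric predictors of a single regression table.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-4_1"}

# Hypothetical model file: y = 1.5 + 2.0 * x1 - 0.5 * x2
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" exponent="1" coefficient="2.0"/>
      <NumericPredictor name="x2" exponent="1" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""


def score_record(pmml_text, record):
    """Evaluate the regression table of a PMML document for one input record."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", NS):
        exponent = int(predictor.get("exponent", "1"))
        value = record[predictor.get("name")]
        result += float(predictor.get("coefficient")) * value ** exponent
    return result


print(score_record(PMML_DOC, {"x1": 3.0, "x2": 4.0}))
```

Nothing in score_record is specific to this particular model: swap in a different PMML file with other coefficients or fields and the same code scores it, which is the whole point of representing models in a standard format.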

Check out the Zementis website for details.

Tuesday, September 10, 2013

PMML Workshop - UCSD Extension - Oct 24-25 (Register today!)

October 24-25, 2013
San Diego Supercomputer Center (SDSC), UC San Diego Campus

The Predictive Model Markup Language (PMML) is the de facto standard to represent data mining and predictive analytic models. With PMML, one can easily share a predictive solution among PMML-compliant applications and systems.
Developed in partnership with the San Diego Supercomputer Center’s (SDSC) Predictive Analytics Center of Excellence (PACE), this 2-day, hands-on workshop will explore how the PMML language allows for models to be deployed in minutes. You will get to know its business value and the data mining tools and companies supporting PMML. You will also begin to understand the language elements and capabilities and learn how to effectively extract the most out of your PMML code.

Workshop Benefits
  • Practice PMML on SDSC’s Gordon with the guidance of world class instructors from industry and academia.
  • Learn how to represent an entire data mining solution using open-standards
  • Understand how to use PMML effectively as a vehicle for model logging, versioning and deployment
  • Identify and correct issues with PMML code as well as add missing computations to auto-generated PMML code
  • PLUS…Receive a comprehensive tour of SDSC to discover its inner workings, extensive capabilities and current projects.
Instructors
  • Alex Guazzelli, Ph.D., Vice President of Analytics, Zementis, Inc.
  • Natasha Balac, Ph.D., Director of PACE, SDSC, UC San Diego
  • Paul Rodriguez, Ph.D., Research Programmer Analyst, SDSC, UC San Diego
Scholarships Available!
Thanks to the generous underwriting of Zementis, three (3) half-tuition scholarships are available.
 Learn more and apply
Note: Students should have a fundamental knowledge of data mining methods and basic experience with a computer programming language. Students must bring a laptop (Mac or PC) each day to fully participate during the hands-on portion of the workshop.
Course Number: CSE-41184   Credit: 2 units

Wednesday, July 10, 2013

The Zementis Partnership with Karmasphere

We are excited to announce our partnership with Karmasphere, a leader in Big Data analytics. This alliance brings the power of PMML-based scoring to the Karmasphere Hadoop universe. Hadoop has emerged as the go-to platform for Big Data analytics. It is no surprise then that Zementis has been hard at work making its predictive analytics engine available for Hadoop users in different shapes and forms (Karmasphere, Hive, Datameer).

The 2013 Hadoop Summit held in San Jose two weeks ago was a memorable event. The summit was abuzz with excitement. It was the perfect venue for Zementis and Karmasphere to announce their partnership.

How does it work? Simple. Zementis makes PMML-based models available as standard Hive User Defined Functions (UDFs) that can be readily consumed and managed by users of the Karmasphere Workspace for Big Data Analytics.

Practically speaking, it is now easier than ever to move models from your favorite model building environment for scoring in Hadoop. If you use R, for example, simply save your models in PMML-format and use Zementis and Karmasphere to deploy them natively on Hadoop, across all your data and dimensions. No custom code necessary! Easy, cost-efficient and fast.

As Martin Hall, founder and CTO of Karmasphere, says, "Zementis' experience making analytic models portable is formidable. We're tapping into that, preparing for a world where more off-the-shelf, very powerful, standard analytics can be available to more analysts, enabling them to work faster in the world of Big Data."

Read the press release!

Friday, May 10, 2013

The Zementis Partnership with FICO

Stuart Wells, FICO CTO, announced the strategic partnership between Zementis and FICO at FICO World on May 2, 2013. FICO clients will now benefit from the outstanding Zementis scoring technology.

How? The Zementis ADAPA scoring engine provides a highly scalable framework to deploy, integrate, and execute complex data mining and predictive models based on the PMML standard. Models built in most commercial and open source data mining tools, such as FICO Model Builder or R, can now instantly be deployed in the FICO Analytic Cloud.

Customers, application developers and FICO partners will be able to extract value and insight from their predictive models and data immediately, using ADAPA and PMML. This will result in quicker time to innovation and value on their analytic applications.

Read the press release!

Predictive Analytics Deployment

Zementis offers software solutions that enable scalable, real-time execution of predictive analytics across a variety of platforms based on the PMML standard. These include:

ADAPA Scoring Engine: Our solution for real-time scoring. ADAPA is available for on-site deployment as a traditional license or as a service in the Amazon Elastic Compute Cloud (EC2) and IBM SmartCloud Enterprise. And now, with our FICO partnership, ADAPA will also be available in the FICO Analytic Cloud.

UPPI, the Universal PMML Plug-in: The leading solution for Big Data, UPPI provides scoring in-database and for Hadoop. It is available for EMC Greenplum, IBM Netezza, SAP Sybase IQ, Teradata/Aster as well as Hadoop/Hive and Datameer. 

Tuesday, May 7, 2013

KDD 2013 PMML Workshop (August 11, 2013)

Come and join us for the KDD PMML Workshop to be held in Chicago on August 11. Organized by the Data Mining Group (DMG), this workshop will feature invited talks and presentations of selected papers. 
KDD PMML Workshop
What: A half-day workshop on the Predictive Model Markup Language (PMML)
When: Sunday, August 11, 2013 - Time TBD
Where: Chicago, IL - Chicago Sheraton

Call For Papers
  • Abstracts due: May 14, 2013, 23:59 CT
  • Papers due: May 24, 2013, 23:59 CT
  • Acceptance notification: May 31, 2013
  • Final Camera Ready Paper Due: June 7, 2013
  • CFP Website

Thursday, April 11, 2013

Predictive Model Markup Language (PMML) Workshop at KDD 2013 in Chicago

Please join us for a Predictive Model Markup Language (PMML) workshop at KDD 2013 in Chicago on August 11, 2013, to exchange exciting new developments, leading practices, and high impact applications in big data, knowledge discovery and data mining which utilize the PMML standard. 

The annual ACM SIGKDD conference on Knowledge Discovery and Data Mining (KDD) is the premier international forum for data mining and big data researchers and practitioners from academia, industry, and government to share their ideas, research results and experiences. We invite submission of papers describing implementations of the Predictive Model Markup Language (PMML). Submitted papers will go through a competitive peer review process. Please consult the workshop website for full details regarding paper preparation and submission guidelines. 

PMML workshop website 

KDD conference web site

Wednesday, March 27, 2013

The Zementis Partnership with Infocom

It is our pleasure to announce a strategic partnership with Infocom. If you missed out on our press release, here is the headline:

Zementis and Infocom partner to deliver predictive analytic solutions in Japan.


Dedicated to the Japanese market, Infocom combines strong expertise in data mining and predictive analytics with extensive delivery and consulting capabilities.

Zementis offers software solutions that enable scalable, real-time execution of predictive analytics across a variety of platforms based on the PMML standard. These include the ADAPA Scoring Engine available for on-site deployment or in the cloud, and UPPI, the Universal PMML Plug-in for in-database scoring and Hadoop (available for IBM Netezza, Teradata/Aster, EMC Greenplum, SAP Sybase IQ as well as Hadoop and Datameer).

Infocom will market, distribute and support Zementis' predictive analytics software in Japan.

To take a look at the press release, click HERE.


Friday, March 8, 2013

Making the Case for PMML and ADAPA

If you are not familiar with PMML, the Predictive Model Markup Language, you may be wondering what all the fuss is about ...

PMML is the de facto standard to represent data mining and predictive analytic solutions. With PMML, one can easily share a predictive solution among PMML-compliant applications and systems. For example, you can build your model in R, export it in PMML, and use ADAPA, the Zementis Scoring Engine, to deploy it in production.

Many data mining models are a one-time affair. You use historical data to build the model and use it to analyze ... historical data. Wait! That sounds more like descriptive analytics, not predictive analytics. Well, that is sort of true. To be truly predictive, a data mining model needs to be applied to new data. These are the models that need to be operationally deployed and, from my point of view, these are the solutions that are truly revolutionizing the way we do business and live in the Big Data world.

If you then want to use your data mining model to make predictions when presented with new data, it needs to be a dynamic asset. It cannot be static. You need to be able to build it and instantly put it to use. And that's where PMML and ADAPA come in handy.

Obviously, a few data mining tools try to lock you in. You happily build the model using tool A, just to realize that you need the same tool to execute it. In this case, you are missing out. Here are some of the benefits of moving your predictive model to ADAPA:
  • Overcome speed/memory limitations
  • Dramatically lower your infrastructure cost
  • Tap into all the advantages of cloud computing with ADAPA on the Cloud (IBM SmartCloud or Amazon EC2)
  • Produce scores in real-time (using Web Services or Java API), on-demand, or batch-mode
  • Execute your models directly from Excel, by using the ADAPA Add-in for Excel
  • Benefit from using a set of PMML-compliant model development tools (best of breed)
  • Deploy your models in minutes
  • Manage models via Web Services or a Web console
  • Upload one or many models into ADAPA at once
  • Benefit from the seamless integration of business rules and predictive models (yes, for those who need it, ADAPA comes with a business rules engine)
PMML and ADAPA allow you to use best of breed tools (not the same old tool) for the job at hand. Also, you can leverage the expertise from a diverse group of data scientists. That means not all your data scientists need to be experts in a single tool. They can use different tools that share one thing in common, the PMML standard. And, once represented in PMML, models can be easily understood by all team members. PMML allows for transparency and, in doing so, fosters best practices.

Why not benefit from: 1) an open standard to represent data mining models; and 2) a proven scoring engine that consumes any version of PMML and makes it available for execution right away, in real-time?

Also keep in mind that ADAPA's sister product, the Universal PMML Plug-in (UPPI), allows you to move the same PMML file in-database or into Hadoop. UPPI is currently available for EMC Greenplum, SAP Sybase IQ, IBM Netezza, and Teradata/Aster. With UPPI for in-database scoring, there is no need to move your data outside the database. Data and models reside inside it and so there is minimal data movement and maximum scoring speed. UPPI is also available for Datameer and will soon be available for Hadoop/Hive.

Making a model operational in minutes has never been easier! And it is all because of PMML and scoring tools such as ADAPA and UPPI.

Thursday, February 28, 2013

In-database scoring with Teradata and Teradata Aster

The partnership between Zementis and Teradata allows customers with a variety of data mining tools to efficiently deploy predictive models based on the Predictive Model Markup Language (PMML) standard.  Focused on Big Data applications, the Universal PMML Plug-in (UPPI) for Teradata enables scalable execution of standards-based predictive analytics directly within the Teradata data warehouse.

To read more about the benefits of running your predictive solutions inside Teradata and Teradata Aster, please visit:

PMML Scoring

Zementis offers a range of products that make possible the deployment of predictive solutions and data mining models built in all the top commercial and open-source data mining tools. Our products include the ADAPA Scoring Engine for real-time scoring and UPPI, which is currently available for a host of database platforms as well as Hadoop/Datameer. For a list of available platforms, please visit our in-database products page.


Not all analytic tasks are born the same. If one is confronted with massive volumes of data that need to be scored on a regular basis, in-database scoring sounds like the logical thing to do. In all likelihood, the data in this case is already stored in a database and, with in-database scoring, there is no data movement. Data and models reside together; hence, scores and predictions flow at an accelerated pace.

Copyright © 2009-2014 Zementis Incorporated. All rights reserved.
