Wednesday, December 2, 2009

ADAPA 2.19 released. New and improved Web Services functionality.



Zementis is constantly adding new features to ADAPA. In its latest 2.19 release (November 30, 2009), it adds an important new feature that significantly enhances bulk scoring of large data sets: scoring of CSV files through web service calls.

With this new feature, an application can now submit input data compiled in a CSV file for scoring (ADAPA already supported the SOAP XML format).

As an example, a simple application can export data from a database into a file, score the file, and import the results back into the database. And, much like the other web services, the file can be scored against multiple models in a single web service call, saving unnecessary round-trip messages.

Because CSV is a more compact data format than the SOAP XML representation, this can lead to significant savings in the volume of data being exchanged. In addition, the CSV file can even be submitted in a compressed format to minimize network transfer time.
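To give a rough feel for the size difference, the sketch below serializes the same records as CSV, as a naive XML document, and as gzip-compressed CSV. The field names and the XML layout are made up for illustration. They are not ADAPA's actual SOAP schema, and no web service call is made.

```python
import csv
import gzip
import io

# Hypothetical input records; the field names are illustrative only.
records = [
    {"age": 30 + i % 40, "income": 40000 + 125 * i, "state": "CA"}
    for i in range(1000)
]


def to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_naive_xml(rows):
    """Serialize rows as element-per-field XML (an assumed layout, not the ADAPA SOAP schema)."""
    parts = ["<records>"]
    for row in rows:
        fields = "".join(f"<{k}>{v}</{k}>" for k, v in row.items())
        parts.append(f"<record>{fields}</record>")
    parts.append("</records>")
    return "".join(parts)


csv_bytes = to_csv(records).encode("utf-8")
xml_bytes = to_naive_xml(records).encode("utf-8")
gz_bytes = gzip.compress(csv_bytes)  # compressed form of the CSV payload

print("XML bytes:", len(xml_bytes))
print("CSV bytes:", len(csv_bytes))
print("gzip CSV bytes:", len(gz_bytes))
```

The exact ratios depend entirely on the data and on the XML schema in use, so the numbers printed here should be read as an illustration rather than a benchmark.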

For more information on this exciting new feature, please feel free to contact us.

Friday, November 13, 2009

Validation, Correction, and Conversion

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.

Friday, November 6, 2009

ADAPA 2.18 released. New features and enhanced PMML support!



Zementis is constantly adding new features to ADAPA. A great example is the release of ADAPA version 2.18 (released on November 4, 2009). This release is packed with new features for its web services component as well as enhanced PMML (Predictive Model Markup Language) support.





Web Services

A new web service has been added that brings the following functionality:
  • Import or export models

  • Remove models

  • Describe models (get model information)

  • Apply multiple models over multiple records

This new service comes as a second option besides the existing "model as an operation" web services. As of this version, the "model as an operation" web service is referred to as the RPC Web Service, reflecting the fact that every model is practically turned into a separate Remote Procedure Call. Note that in order to accommodate the new web service, the address (URL) and namespaces of the existing ones have been modified. Any applications developed against the web services of ADAPA 2.17 (or earlier) will require minor modifications to work with 2.18. Details can be found in the Web Services documentation which is accessible from the ADAPA Console Help Page.

For a list of all the Web Services available for Predictive Models in ADAPA, please click HERE.









ADAPA 2.18 adds support for three additional aspects of PMML:
  • DefineFunction in TransformationDictionary: A DefineFunction in PMML allows for a function to be defined once and called multiple times with a different list of arguments.

  • Targets for classification models (Targets were already supported for regression models)

  • TwoStep Clustering Models: A TwoStep Clustering Model is a clustering model exported by SPSS which is able to cluster very large datasets with mixed continuous and discrete data types. An overview of this model is given in the SPSS Enabling Technologies Division web page. Essentially, this method calculates distances between points using the distance measure of J. D. Banfield and A. E. Raftery ("Model-based Gaussian and non-Gaussian clustering", Biometrics, 49, 1993, pp. 803-821) as extended by M. Meila and D. Heckerman ("An experimental comparison of several clustering methods", Microsoft Research Technical Report, 1998).

    For tips on how to represent TwoStep Clustering Models in PMML or for how to use the DefineFunction and Targets elements, please refer to the ADAPA Predictive Analytics Guide available for download from the ADAPA Console Help page.

    For more information on PMML support in ADAPA, please click HERE.
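As a minimal sketch of the "define once, call multiple times" idea behind DefineFunction, the Python snippet below embeds a small PMML fragment and parses it with the standard library. The function name scaleAndShift and the field names are hypothetical, and the fragment is only a TransformationDictionary, not a complete PMML document.

```python
import xml.etree.ElementTree as ET

# Illustrative PMML fragment; function and field names are hypothetical.
PMML_FRAGMENT = """
<TransformationDictionary>
  <DefineFunction name="scaleAndShift" optype="continuous" dataType="double">
    <ParameterField name="x" optype="continuous" dataType="double"/>
    <ParameterField name="factor" optype="continuous" dataType="double"/>
    <Apply function="+">
      <Apply function="*">
        <FieldRef field="x"/>
        <FieldRef field="factor"/>
      </Apply>
      <Constant dataType="double">1.0</Constant>
    </Apply>
  </DefineFunction>
  <DerivedField name="DerivedVar1" optype="continuous" dataType="double">
    <Apply function="scaleAndShift">
      <FieldRef field="InputVar1"/>
      <Constant dataType="double">2.0</Constant>
    </Apply>
  </DerivedField>
  <DerivedField name="DerivedVar2" optype="continuous" dataType="double">
    <Apply function="scaleAndShift">
      <FieldRef field="InputVar2"/>
      <Constant dataType="double">0.5</Constant>
    </Apply>
  </DerivedField>
</TransformationDictionary>
"""

root = ET.fromstring(PMML_FRAGMENT.strip())

# Names of the user-defined functions declared in the dictionary.
defined = [f.get("name") for f in root.findall("DefineFunction")]

# Every Apply element that calls one of those user-defined functions.
calls = [a for a in root.iter("Apply") if a.get("function") in defined]

print(defined, "is called", len(calls), "times")
```

Here the function body is written once, and each DerivedField supplies its own argument list.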

Thursday, November 5, 2009

ADAPA® Web Services for Predictive Analytics and Business Rules

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.

Wednesday, October 28, 2009

ADAPA Add-in Help: Apply Model

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.

ADAPA Add-in Help: Setup Connection

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.

Thursday, October 22, 2009

SAIC and Zementis to bring "smarts" to the Smart Grid.

It seems that every tech blog or news story I read these days is talking about two things: Cloud Computing and the Smart Grid. Well, here I will be covering both topics and more. I "predict" you will find it interesting.

The Smart Grid is definitely where we should go as a nation and as a world that is faced with several challenges, not only in terms of growing energy demand but also global warming. Both need to be addressed at the same time. But how can it be done? I am sure that there are many answers to this question. No matter what the answer is, though, it must involve the modernization of our existing energy grid, or the advent of the Smart Grid.

But again, what makes a grid smart? Is a better and more efficient grid a smarter grid? Probably! However, to be better and more efficient, it needs to go further than sophisticated meters and sensors ... and the enormous amount of data collected from them. How can data make the grid smarter? By collecting data from the grid on demand, one can make smarter decisions which in turn optimize the grid and make it truly intelligent. Turning this data into knowledge and acting upon it in real-time will provide the benefits that we are seeking and lead to the power grid of the future that we envision.

Whenever we are overwhelmed by vast amounts of data, Predictive Analytics can help us make sense of it all. Do I mean using Artificial Intelligence (AI)? Yes! Predictive Analytics is a branch of AI which involves building statistical models that can learn patterns hidden in vast amounts of data and learn to detect the same patterns as they occur, enabling the grid to predict whenever non-optimal or unwanted conditions are about to occur. Forewarned is forearmed! These predictive models are then able to apply their knowledge of the grid to improve its performance. The range of applications is really only dependent on your imagination.

Zementis, a leading provider of predictive analytics solutions, recently launched its scoring engine on the Amazon Cloud. The ADAPA engine can be used from anywhere on the globe at any time. It takes in raw data and produces intelligent decisions which can be embedded into grid applications.

Since ADAPA leverages Cloud Computing and Open Standards, it is able to deliver the "smarts" to the Smart Grid in a very efficient way. Cloud Computing moves hardware and software into a web-based, pay-as-you-go business model. This allows utility companies to leverage massive computing power and speed for a fraction of the cost they would traditionally incur to meet the same requirements.

SAIC and Zementis signed a Marketing Agreement. The two companies are now working together to deliver breakthrough predictive analytics solutions to utility companies. Zementis’ expertise combined with SAIC’s deep domain knowledge in the energy sector will help utility providers to be more efficient and agile in addressing all the challenges they face when moving to a truly smart grid.

Monday, October 19, 2009

The latest ACM SIGKDD Explorations Newsletter is out. Focus on open source analytics and PMML.


The latest issue of ACM SIGKDD Explorations is out! This issue is relevant in many ways, since it not only gives special attention to open source analytics (including articles on Weka and KNIME), but it also discusses PMML and cloud computing.

PMML, in particular, gets special treatment. It is described in a full article written by Rick Pechter from Microstrategy. As Rick puts it, "the Predictive Model Markup Language data mining standard has arguably become one of the most widely adopted data mining standards in use today."

PMML is also discussed in most of the other articles, including the one by Zementis, entitled: "Efficient Deployment of Predictive Analytics through Open Standards and Cloud Computing". In this article, we use the ADAPA scoring engine to illustrate how the benefits of PMML and cloud computing can be combined to offer a platform that leverages these elements to deliver an efficient deployment process for statistical models.

So, don't miss out on this issue of SIGKDD Explorations. We invite you to explore all the peer-reviewed articles in detail.

Monday, October 12, 2009

Test data sets available for demo-ing the ADAPA Add-in for Microsoft Office Excel.

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.

Friday, October 9, 2009

Predictive Analytics at your fingertips: Scoring data in Microsoft Office Excel.

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.

Monday, October 5, 2009

How to score data using the ADAPA Add-in for Microsoft Office Excel?

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.

Wednesday, September 30, 2009

Predictive Model Markup Language (PMML) Interest Group on LinkedIn

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.

Monday, August 24, 2009

EU Cloud and Naive Bayes Classifiers. Predictive Analytics on the go now on both sides of the pond.

You may be wondering ... what is the link between the Amazon European Union (EU) Cloud and Naive Bayes Classifiers? The short answer: They are two new features of the latest version of the ADAPA Scoring Engine.

ADAPA is a revolutionary scoring engine since it allows people anywhere, at any time, to deploy and execute their predictive models as a Service on the Amazon Elastic Compute Cloud (EC2). ADAPA is also completely standards-based and so reads models expressed in PMML, which is the de facto standard to represent predictive models and their interfaces (data pre- and post-processing). PMML is exported from all the major commercial and open-source model development tools.

In the latest version of ADAPA, users can launch ADAPA instances (virtual machines running ADAPA) on the Amazon EU EC2 (in addition to US EC2). This new feature addresses regulatory constraints for companies in the EU and brings ADAPA closer to users not only in the EU, but also in adjacent regions.


Besides being able to have ADAPA closer to users and data, the engine can now also deploy and execute Naive Bayes Classifiers. This is the latest addition to the list of modeling techniques already supported by ADAPA.
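For readers new to the technique, here is a minimal sketch of how a Naive Bayes classifier over categorical inputs can be scored from simple count tables. It illustrates the general method only and is not ADAPA's implementation. The data set, the field names, and the small threshold used in place of zero probabilities are all assumptions made for the example.

```python
from collections import Counter


def train(rows, target):
    """Build class counts and per-(field, value) class counts from training rows."""
    priors = Counter(row[target] for row in rows)
    counts = {}
    for row in rows:
        for field, value in row.items():
            if field != target:
                counts.setdefault((field, value), Counter())[row[target]] += 1
    return priors, counts


def score(priors, counts, record, threshold=0.001):
    """Return normalized class probabilities for one record."""
    total = sum(priors.values())
    likelihoods = {}
    for cls, n_cls in priors.items():
        p = n_cls / total  # prior probability of the class
        for field, value in record.items():
            c = counts.get((field, value), Counter())[cls]
            # Conditional probability, or the assumed threshold if never observed.
            p *= (c / n_cls) if c > 0 else threshold
        likelihoods[cls] = p
    norm = sum(likelihoods.values())
    return {cls: p / norm for cls, p in likelihoods.items()}


# Tiny made-up training set.
training = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "sunny", "windy": "no", "play": "yes"},
]

priors, counts = train(training, target="play")
probs = score(priors, counts, {"outlook": "sunny", "windy": "no"})
print(probs)
```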

Predictive Analytics on the go is now available for launching on both sides of the pond. Closer to users and data.

Tuesday, July 14, 2009

Data Mining for MySQL: Scoring your MySQL data just became a lot easier!

Many databases currently allow for data mining and analysis. SQL Server, for example, benefits from SQL Server Integration Services (SSIS) and Oracle from Oracle Data Miner. MySQL users, on the other hand, have in general used tools such as R and SPSS for data mining and to build statistical models. There is even an R package that builds an interface between R and MySQL (called RMySQL). Both R and SPSS (as well as a host of other statistical tools) are able to export PMML (Predictive Model Markup Language) which is the standard way to represent data mining models (for more on PMML, click here).

We have recently shown that one can easily deploy predictive models from SQL Server on the Amazon Cloud in a matter of minutes by using a script task in SSIS and the ADAPA Scoring Engine (see SSIS/ADAPA posting here). This time, we would like to make a similar case for MySQL.

Mind that building a model is a very different task than deploying one or executing it. The model development phase is usually mostly made of data analysis and massaging as well as feature selection. During model execution all you need are the most important data pieces (a much smaller sample of data fields than what you used during model development) to generate your decisions. In addition, the required pre-processing can be represented in PMML (for more on pre-processing and PMML, click here).

Model Deployment: Once a model exists, it can be easily uploaded in ADAPA which makes models available right away for execution via Web Services.

Model Execution: The task then is to extract data from your MySQL database, score it, and write the scored data back into the database. You can easily do that by using yet another open source tool: Jitterbit. It allows for data to be mapped from MySQL into a Web Service Call to ADAPA which returns the data back to Jitterbit and MySQL.
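The extract, score, and write-back loop can also be sketched in a few lines of Python. This is not the Jitterbit setup described above: it uses an in-memory SQLite database as a stand-in for MySQL, and score_record is a placeholder for the remote scoring call, not an ADAPA API. Table and column names are hypothetical.

```python
import sqlite3


def score_record(age, income):
    """Placeholder for a remote scoring call; returns a made-up score (NOT an ADAPA API)."""
    return round(0.01 * age + income / 100000.0, 4)


# In-memory SQLite stands in for the MySQL database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, age INTEGER, income REAL, score REAL)"
)
conn.executemany(
    "INSERT INTO customers (id, age, income) VALUES (?, ?, ?)",
    [(1, 30, 50000.0), (2, 45, 82000.0), (3, 61, 39000.0)],
)

# 1. Extract only the fields the model needs, for rows not yet scored.
rows = conn.execute(
    "SELECT id, age, income FROM customers WHERE score IS NULL"
).fetchall()

# 2. Score each record.
scored = [(score_record(age, income), row_id) for row_id, age, income in rows]

# 3. Write the scores back into the database.
conn.executemany("UPDATE customers SET score = ? WHERE id = ?", scored)
conn.commit()

unscored = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE score IS NULL"
).fetchone()[0]
print("rows still unscored:", unscored)
```

In a real deployment, step 2 would be replaced by a call to the scoring web service, ideally batching many records per request.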



Process in Detail - Blog: We have described this process on a step-by-step basis here.

Process in Detail - Video: We have also made a video describing this process. The YouTube version of this video can be accessed below, but we highly recommend the high-definition version of it.

Scoring your MySQL data just became a lot easier!

Wednesday, July 8, 2009

KDD 2009 Panel Report: Open Standards and Cloud Computing

Leading Experts Debate Emerging Trends for Predictive Analytics and Data Mining.

At KDD 2009 in Paris, the leading conference on Knowledge Discovery and Data Mining, a panel of experts discussed various topics related to open standards and cloud computing, with a particular focus on the practical use of statistical algorithms, reliable production deployment of models and the integration of predictive analytics within other systems.


Moderated by Zementis, the panel was comprised of a distinguished group of thought leaders representing key software vendors in the data mining industry including DMG / Open Data Group, IBM, KNIME, KXEN, Microstrategy, Pervasive, SAS and SPSS.

The first major focus of the discussion was the Predictive Model Markup Language (PMML). All vendors on the panel strongly support PMML, the de facto standard for model exchange. It was evident that all panel members champion the PMML standard and will continue to actively improve features and usability through their products. Addressing enhanced compatibility among vendors, the DMG and Zementis now offer a comprehensive PMML converter to check, validate, and convert PMML models. The panel also coincided with the general release announcement of PMML 4.0, the latest version of the standard.

Turning towards the emerging trend of Cloud Computing, it was evident that all vendors are actively investigating how to leverage the cloud most effectively for predictive analytics and data mining. Several vendors already provide cloud-based solutions, either on a public cloud infrastructure like Amazon EC2 or their own data center.



PMML and Cloud Computing are a reality and available today! There was no doubt that PMML as a standard has been accepted and has evolved into a valuable foundation for the predictive analytics industry. Cloud Computing will deliver additional benefits for various data mining solutions, either through a private or a public cloud infrastructure depending on the nature of the application.

For a more detailed summary of the panel, please review the KDD 2009 Panel Report which summarizes questions and answers from the discussion.






Monday, June 22, 2009

Examining PMML 4.0 - Part I: Pre-Processing

You may be wondering what all the fuss around PMML and its 4.0 version is about. So, we decided to explore all that PMML 4.0 has to offer in a series of blog posts. In part I, we will be exploring its improved pre-processing capabilities.

All data mining models manipulate the raw data in one way or another before passing it through a neural network, support vector machine, or regression model. Therefore, a language that wants to represent all the computations that go into a model also needs to be able to represent the data transformations that were applied to the raw data before scoring takes place. PMML is this language! It is the Yin and Yang of data mining.

Let's first recap the pre-processing capabilities available in PMML 3.2. This version of PMML allows for the following out-of-the-box data transformations:
  • Normalization of continuous variables: this is accomplished via the NormContinuous element of PMML. It is mostly used to normalize a variable between 0 and 1. See example below (real PMML code) in which two variables are normalized: the first between 0 and 1 and the second between 0 and 4.
  • Normalizing Categorical Inputs: normally used to transform strings into numerical variables. This is accomplished by the element NormDiscrete. In the PMML example below, a categorical variable creates dummy variables that will be assigned values 1 or 0 depending on the category assumed by the input variable.
  • Discretization: this is used to transform continuous variables into strings. This is accomplished by the Discretize element. In the PMML example below, if the input variable is equal to 500, it is transformed to low; if equal to 5000, it is transformed to medium; and if 50,000, it is high.
  • Value Mapping: this is accomplished in PMML by the use of a mapping table and the element MapValues. To make things more interesting, in the PMML example below, we combine elements MapValues and NormDiscrete to group small sets of categorical values. Specifically, we want to find out if the input variable belongs to a specific group of colors. We do that by using MapValues to map different colors to the same number. We then use the element NormDiscrete to create dummy variables which are used to indicate group membership.
  • Arithmetic Expressions: PMML offers a range of arithmetic functions (as well as string and date/time manipulation functions) that can be arranged in different ways to express complex arithmetic expressions. The example below solves the following operation:
ResultVar=maximum(round(InputVar1/3.3),2^(1+log(1.3*InputVar2+1)))

  • PMML 4.0 - Boolean Operations: Not only does PMML 4.0 allow Boolean operations to be fully expressed, but it also allows these to be nested into IF-THEN-ELSE logic. These new built-in functions offer a vast new array of possibilities for representing data transformations in PMML. So, we devote the rest of this review to transformations that can now be easily expressed in PMML 4.0.
We start with the PMML code below which implements the following logical and arithmetic operations:
IF InputVar1 == "Partner" THEN DerivedVar1 = "P" ELSE DerivedVar2 = 2 * InputVar2



Note that it uses the newly defined 4.0 functions: "if", "equal", and "not" as well as function "*".

The PMML code below assumes that both "then" and "else" parts of the "if" use the same derived variable to implement the following operations:
IF InputVar1 == "Partner" THEN DerivedVar1 = "5.1 * InputVar2" ELSE DerivedVar1 = "InputVar2 / 3.3"

Finally, we end our list of PMML pre-processing examples by showing the use of 4.0 functions "isMissing" and "isIn" combined with function "if". The PMML example below implements the following operations:
IF InputVar is missing THEN DerivedVar = 1 ELSE (IF InputVar is in ("Partner", "Associate", "Colleague") THEN DerivedVar = 2 ELSE DerivedVar = 3)
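For readers who want to check the intended logic independently of any PMML code, the operations stated in this post can be mirrored in plain Python. This is only a sketch of the semantics under a few assumptions: "log" is taken to be the natural logarithm, Python's built-in round stands in for the PMML rounding function, a missing input is modeled as None, and a derived variable that is not assigned in a branch is returned as None.

```python
import math


def result_var(input_var1, input_var2):
    """ResultVar = maximum(round(InputVar1/3.3), 2^(1+log(1.3*InputVar2+1))).

    Assumes 'log' means the natural logarithm.
    """
    return max(round(input_var1 / 3.3), 2 ** (1 + math.log(1.3 * input_var2 + 1)))


def first_example(input_var1, input_var2):
    """IF InputVar1 == "Partner" THEN DerivedVar1 = "P" ELSE DerivedVar2 = 2 * InputVar2.

    Two derived variables: one guarded by equal, the other by not(equal).
    """
    is_partner = input_var1 == "Partner"
    derived_var1 = "P" if is_partner else None
    derived_var2 = 2 * input_var2 if not is_partner else None
    return derived_var1, derived_var2


def second_example(input_var1, input_var2):
    """Both branches of the 'if' assign the same derived variable."""
    return 5.1 * input_var2 if input_var1 == "Partner" else input_var2 / 3.3


def third_example(input_var):
    """Nested 'if' combining isMissing and isIn."""
    if input_var is None:  # isMissing
        return 1
    if input_var in ("Partner", "Associate", "Colleague"):  # isIn
        return 2
    return 3


print(result_var(10, 0))
print(first_example("Partner", 4), first_example("Client", 4))
print(third_example(None), third_example("Associate"), third_example("Vendor"))
```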


We finish part I of our PMML tour hoping that this short description of its pre-processing capabilities can help you to easily navigate through all the data transformations available in PMML 4.0.

Tuesday, June 16, 2009

PMML 4.0 is here!

The DMG (Data Mining Group) has just released PMML 4.0, the latest and greatest version of the Predictive Model Markup Language.

Zementis, together with SPSS, SAS, IBM, Open Data Group, Salford Systems, Microstrategy and all the other contributing members of the DMG, is proud to be part of the making of PMML, the de facto standard to represent data mining models.

Not only can PMML represent a wide range of statistical techniques, but it can also be used to represent the data transformations necessary to transform raw data into meaningful feature detectors. In this way, PMML offers a standard to represent data manipulation and modeling in a single concise way.



Improved Pre-Processing Capabilities

PMML 4.0 extends the range of pre-processing capabilities supported by older versions by adding a range of boolean operations (e.g., and, or, not, equal, notEqual, greaterOrEqual, ...) to the list of built-in functions. These, combined with an IF-THEN-ELSE function which is also new to PMML, allow for the representation of a wide range of feature detectors.

For examples on how to use these new pre-processing capabilities as well as all the standard PMML transformations, please check the PMML Data Pre-Processing Primer.

Time Series Models


PMML 4.0 also extends the existing standard by allowing for the representation of Time Series Models. In particular, it allows for data miners and data mining tools to represent Exponential Smoothing models and offers place holders for ARIMA, Seasonal Trend Decomposition, and Spectral Analysis which are to be supported in the near future.
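As a reminder of what an Exponential Smoothing model computes, here is a minimal sketch of the simplest variant (a single smoothed level, no trend or seasonality). The smoothing constant and the sample series are arbitrary, and the code illustrates the technique itself rather than how PMML encodes it.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    level_t = alpha * x_t + (1 - alpha) * level_(t-1), seeded with the first value.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    levels = [level]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
        levels.append(level)
    return levels


def forecast_next(series, alpha):
    """One-step-ahead forecast: the last smoothed level."""
    return simple_exponential_smoothing(series, alpha)[-1]


print(simple_exponential_smoothing([10, 20, 30], alpha=0.5))
print(forecast_next([10, 20, 30], alpha=0.5))
```

A larger alpha makes the forecast react faster to recent observations, while a smaller one smooths out noise.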

Model Explanation

Other additions are Model Explanation and Multiple Models. Model Explanation allows for evaluation and model performance measures to be part of the PMML file itself. In this way, not only data manipulation and models get to be defined, but also associated ROC Graph, Gains/Lift Charts, Confusion Matrix, Field Correlations, Univariate Statistics, and more.

Multiple Models

Multiple Models allows for model composition, ensembles, and segmentation. It replaces the old Model Composition element to offer greater flexibility for combining different model types, such as regression and decision trees.
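The two ideas of ensembles and segmentation can be sketched in plain Python as follows. The two toy models, their coefficients, and the "region" field are invented for the example. The code shows the general concepts only and does not reproduce the PMML syntax.

```python
def regression_model(record):
    """Toy regression model with made-up coefficients."""
    return 1.5 * record["x"] + 2.0


def tree_model(record):
    """Toy decision tree with a single split."""
    return 10.0 if record["x"] > 3 else 4.0


def average_ensemble(models, record):
    """Ensemble: combine the predictions of several models by averaging."""
    return sum(model(record) for model in models) / len(models)


def segmented(segments, record):
    """Segmentation: use the first model whose predicate matches the record."""
    for predicate, model in segments:
        if predicate(record):
            return model(record)
    return None  # no segment applies


segments = [
    (lambda r: r["region"] == "EU", tree_model),
    (lambda r: True, regression_model),  # fallback segment
]

print(average_ensemble([regression_model, tree_model], {"x": 4}))
print(segmented(segments, {"x": 2, "region": "US"}))
print(segmented(segments, {"x": 2, "region": "EU"}))
```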

Extending Existing Elements

Last, but not least, PMML 4.0 offers a range of extensions to existing elements, such as the addition of multi-class classification for Support Vector Machines, improved representation for Association Rules, and the addition of Cox Regression Models.

There is no doubt that PMML is here to stay. The announcement of PMML 4.0 attests to the commitment of the leading data mining vendors to be able to represent their solutions through a single language, a language that can be understood by all. It is our vision that users will be free to share models among many solutions, benefiting from an environment in which interoperability is truly attainable.

For more information on PMML and a list of useful links, please check PMML 101. Also, check the article "PMML: An Open Standard for Sharing Models" just published in The R Journal.

We also invite the entire community to join our on-going PMML discussion at the AnalyticBridge website.

Monday, June 1, 2009

How to Score 300,000,000 Customer Records for $3

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.

Thursday, May 28, 2009

The R Journal - A Refereed Journal for the R Project Launches


The R project, a free software environment for statistical computing and graphics, now features a peer-reviewed journal.


The R newsletter has been transformed into The R Journal, a refereed journal for articles covering topics that are of interest to users or developers of R. The first issue is now available online. As a sign of the open source R project gaining significant momentum, The R Journal intends to reach a wide audience with high-quality papers that are focused on R.

As a supporter of the R PMML Package (see blog and video tutorial), Zementis (together with Togaware) is honored that our article "PMML: An Open Standard for Sharing Models" has been selected by the editorial board to be published as part of the inaugural issue.

While our article emphasizes the importance of the Predictive Model Markup Language (PMML) standard, other contributed research papers address a wide range of topics including graphics and parallel computing. Of more general interest are the invited articles on "The Future of R" which provide the reader with an overview of R's programming model and R-Forge, a central, collaborative platform for the development of R packages.

Following the feature in the New York Times in early 2009, this is yet another significant milestone for R.

Wednesday, May 20, 2009

KDD 2009 Panel on Open Standards and Cloud Computing


Emerging Trends in Open Standards and Cloud Computing for Data Mining.

Please join us for an exciting panel discussion at the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining in Paris.

Over the past decade, we have seen tremendous interest in the application of data mining and statistical algorithms, first in research and science and, more recently, across various industries. Impacting scientific and business applications alike, interoperability and open standards still lack broader adoption in the data mining community. In addition, emerging trends in cloud computing and Software as a Service will play a critical role in promoting the effective implementation and widespread application of predictive models.

The panel will discuss various topics related to open standards and cloud computing, with a particular focus on the practical use of statistical algorithms, reliable production deployment of models and the integration of predictive analytics within other systems.

Moderated by Zementis, the panel is comprised of a distinguished group of thought leaders representing key software vendors in the data mining industry including DMG / Open Data Group, IBM, KNIME, KXEN, Microstrategy, Pervasive, SAS and SPSS.

For details, please see the KDD 2009 web site.

Tuesday, May 12, 2009

SAP selects Zementis as a winner in the SAP BusinessObjects Explorer Development Contest

Recently, SAP sponsored a contest that challenged the developer community to create and demonstrate the most innovative use of SAP BusinessObjects Explorer APIs inside of another application or process. The challenge was a great success, and SAP received a large number of high-quality submissions.



In its solution, Zementis combined on-demand, high-performance processing for predictive algorithms through the ADAPA Predictive Analytics scoring engine on Amazon EC2 with dynamic, on-demand visualization of the results in SAP BusinessObjects Explorer. The Zementis innovation leverages services deployed in two distinct cloud computing infrastructures, Amazon EC2 and SAP, demonstrating true interoperability between services, platforms, and vendors.

We also posted a brief VIDEO that illustrates the integration between ADAPA and the SAP BusinessObjects Explorer.

(SAP Announcement Details)

Tuesday, March 24, 2009

ADAPA Predictive Analytics Scoring Engine - Demo Video

In order to fully leverage the power of predictive analytics and data mining algorithms, users need the capability to seamlessly integrate statistical decision models into operational systems. Fortunately, the maturity of open standards and cloud computing allows us to finally deliver an agile deployment framework that combines on-demand scalability and a cost-effective Software-as-a-Service (SaaS) licensing model.

ADAPA on the Amazon Cloud is a deployment platform and scoring engine, which delivers:

  • Better ... Support of Open Standards

  • Faster ... Deployment and Scoring

  • Cheaper ... Total Cost of Ownership

  • Easier ... Launch and Integration


Please take a few moments to watch the following video to learn why ADAPA is a true quantum leap for the predictive analytics industry and how it will empower your business to make better decisions today.

Interested in experiencing ADAPA firsthand? Let us take your Enterprise Decision Management (EDM) strategy to the next level: contact us now for a free trial.




[High Res Video: ADAPA Predictive Analytics Scoring - Demo]

Monday, March 23, 2009

ADAPA means business - Predictive Analytics in 90 seconds

Watch this short video to learn how ADAPA will help you predict future customer behavior today. What can predictive analytics do for you, when you deploy your predictive models in ADAPA and are able to start using them right away?

Combining predictive analytics with cloud computing, ADAPA presents new opportunities to leverage predictive models in real-time or in batch mode, with new flexibility and at a lower total cost of ownership.



[High Res Video: ADAPA Means Business]

Tuesday, February 17, 2009

Data pre-processing in PMML and ADAPA - A Primer

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.

Thursday, January 8, 2009

Statistical Analysis Software R Featured in NYT Article

We highly recommend the following article about R in the New York Times: Data Analysts Captured by R's Power.

It shows how excellent open source software can prosper!

NYT, January 6, 2009. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.
[...] those most familiar with the software estimate that close to 250,000 people work with it regularly.


Zementis has been working with the R community, specifically to extend the support for the Predictive Model Markup Language (PMML) standard which allows model exchange among various statistical software tools.

Got models in R? Deploy and score them in ADAPA in minutes on the Amazon EC2 cloud computing infrastructure!

If you develop your models in R, you can easily deploy and execute these models in the Zementis ADAPA scoring engine (using the PMML standard). This not only eliminates potential memory constraints in R but also speeds execution and allows SOA-based integration. For the IT department, ADAPA delivers reliability and scalability needed for production-ready deployment and real-time predictive analytics.

Contact us for a free trial!

Wednesday, January 7, 2009

Rattle Version 2.4.0 Released - Open Source Data Mining and Extensive PMML Support

Version 2.4.0 of the free data mining software, Rattle, has been released.

Through its extensive PMML support, Rattle perfectly complements the Zementis ADAPA predictive analytics scoring engine. Build your predictive models in Rattle/R and then score/deploy/integrate them via ADAPA in your production environment.

The aim of Rattle is to provide a simple and intuitive interface that allows a user to quickly load data from a CSV file (or via ODBC), transform and explore the data, build and evaluate models, and export models as PMML (Predictive Model Markup Language) or as scores. All of this with little knowledge of R.


For more details, please see the announcement at KDnuggets News.

Monday, January 5, 2009

ADAPA on the Cloud - Security on Amazon EC2

This posting has been moved to the Zementis Support Site. You can still access it by clicking HERE.





Copyright © 2009-2014 Zementis Incorporated. All rights reserved.
