Data Mining Software Comparison

Data mining software, meaning software that extracts information from a data set and structures it so that people can easily interpret and apply it, has become increasingly important. With the digital revolution and the rapid growth of the Internet, almost any business or industry, even those not traditionally associated with the information economy, can benefit from robust data mining capabilities.

As a result, there is a large (and growing) number of software applications for data mining, each with a data mining company behind it. While all data mining software runs on the same principles, it’s not all created equal. This article discusses three of the most popular data mining packages, which most businesses will already be familiar with.

SPSS Modeler (IBM)

One of the most popular data mining applications is IBM’s SPSS Modeler, now in version 15. SPSS Modeler is a proprietary system that can integrate with other data mining applications (such as Microsoft’s SQL Server Analysis Services). It offers a visual interface that lets non-programmers work with it by connecting “nodes” of information on screen, while the software builds the complex data mining operations in the background. The results of these operations can then be fed (again, using simple visual metaphors) to existing databases or other custom-built applications, such as warehouse logistics applications.

SPSS Modeler can work directly with SQL databases and queries, which is a huge advantage for the software. In combination with its visual interface, this makes SPSS Modeler ideal for desktop applications in which non-programmers work with the database. However, SPSS Modeler is one of the priciest and most complex solutions: server-side licenses start at $12,000 a year (for a single license), and installation generally has to be customized by IBM.

SAS Data Mining

Another popular choice of data mining software is SAS Data Mining. Its chief advantage is that it is generally more affordable than SPSS Modeler while still providing a very powerful and flexible data mining tool for both small businesses and large enterprises.

While SAS Data Mining has a graphical user interface (GUI), it is not as visually easy to use as SPSS Modeler. The GUI for SAS is mainly a windowed readout of queries and nodes, which makes it very challenging for non-programmers to use without extensive training. On price, SAS Data Mining has the edge over SPSS Modeler, with a desktop (individual) version selling for $8,700 per license. Larger installations require a custom price quote, and SAS Data Mining is not available at retail stores.

One major downside of SAS Data Mining is that it does not work natively with SQL queries. It translates SQL queries into a proprietary SAS format, and the output from these operations is not reliably “pushable” back into a SQL database or related applications.
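To make the idea of a “pushable” result concrete, here is a minimal sketch of the round trip a tool with native SQL support makes possible: read rows with a SQL query, derive a result, and write that result back into the same database. It uses Python’s built-in sqlite3 module rather than any of the products discussed here, and the table names (orders, customer_segments), the function names, and the spending threshold are all hypothetical, chosen only for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 60.0), (1, 70.0), (2, 20.0)],
    )
    conn.commit()
    return conn


def score_and_push_back(conn, threshold=100.0):
    """Read with SQL, label each customer, and write the labels back."""
    rows = conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    ).fetchall()

    # Hypothetical rule standing in for a real data mining model.
    scored = [
        (customer_id, total, "high" if total >= threshold else "low")
        for customer_id, total in rows
    ]

    # Push the results back so other applications can query them.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_segments "
        "(customer_id INTEGER PRIMARY KEY, total REAL, segment TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_segments VALUES (?, ?, ?)", scored
    )
    conn.commit()
    return scored


if __name__ == "__main__":
    conn = build_demo_db()
    score_and_push_back(conn)
    for row in conn.execute(
        "SELECT customer_id, total, segment FROM customer_segments "
        "ORDER BY customer_id"
    ):
        print(row)
```

When the results land in an ordinary SQL table like this, a reporting tool or logistics application can read them with a plain query. That last write-back step is the one that becomes unreliable when output is held in a proprietary format.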

WEKA KnowledgeFlow

WEKA KnowledgeFlow is a popular open-source alternative for data mining. It is free to download and install, extensible, and embeddable in existing systems. As with many open-source solutions, you have to balance the low cost of acquisition against a lack of support, although small companies that provide installation and support for WEKA do exist. However, WEKA is an old codebase, and its data mining capabilities have not kept up with modern times. While it can match about 95% of the features of SPSS Modeler and SAS Data Mining, that last 5% can be significant.

For example, WEKA cannot handle any sort of text analytics, an increasingly important aspect of modern data mining. However, depending on your needs for data mining tools and services, WEKA may be a good alternative if your budget doesn’t allow for a major investment in data mining software.

– Data Czar @ DEO
