Find the right data mining suite for you

When we discuss data mining suites, we often focus on whether they implement a particular algorithm when, in fact, a specific algorithm is only one small part of the whole suite. More often than not, the suite’s reporting capabilities are far more important than the theory behind the chosen algorithms.

Depending on the business problem at hand, being able to see how the results were derived can be just as important as the results themselves. In other words, using a prediction model to score potential candidates is useful, but it's just as useful to know how each dimension in the underlying dataset contributed to those predictions. In this column, marketing automation experts from bpm’online will show you how to focus your evaluation on the capabilities of an application suite rather than on the mathematical and statistical theory coded into the software.

Let’s take a national fast-food chain as an example. Say the chain is considering several locations for a new restaurant. Data such as population density, traffic flow, average household income, historical growth records, and the number of competitors within a certain radius is collected for existing restaurant locations. This information is used to build a predictive model that estimates a restaurant's sales in its first year of business.

Next, the same data is collected for each location where the new restaurant might be built and fed through the model to calculate predicted sales for each site. Naturally, the locations that score highest are the best candidates. But we can gain even more insight by examining the predictive model itself. Perhaps traffic flow matters more, or less, than average household income, and perhaps the number of competitors has a direct effect on predicted sales. This insight, or intelligence, can then be used to expand the restaurant business into geographic areas not previously considered.
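To make "examining the model" concrete, here is a minimal Python sketch that assumes the suite produced a simple linear model. The intercept, the coefficients, the feature names, and both candidate sites are hypothetical values invented for illustration; a real suite may use a different model type entirely. Scoring ranks the sites, while the per-feature breakdown shows how much each dimension moved each prediction.

```python
from typing import Dict, List, Tuple

# Hypothetical linear site-selection model. The intercept and coefficients
# are illustrative assumptions, not output from any real data mining suite.
INTERCEPT = 250_000.0  # baseline first-year sales in USD (hypothetical)
COEFFICIENTS: Dict[str, float] = {
    "population_density": 40.0,          # USD per resident per square mile
    "traffic_flow": 5.0,                 # USD per vehicle per day
    "household_income": 3.0,             # USD per dollar of average income
    "competitors_in_radius": -30_000.0,  # USD per competing restaurant
}


def contributions(site: Dict[str, float]) -> Dict[str, float]:
    """Break one site's prediction into per-feature terms (coefficient * value)."""
    return {name: coef * site[name] for name, coef in COEFFICIENTS.items()}


def predict_sales(site: Dict[str, float]) -> float:
    """Predicted first-year sales: intercept plus the sum of feature terms."""
    return INTERCEPT + sum(contributions(site).values())


def rank_sites(sites: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Score every candidate site and return (name, score) pairs, best first."""
    scored = [(name, predict_sales(features)) for name, features in sites.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidate sites, for illustration only.
candidates: Dict[str, Dict[str, float]] = {
    "Site A": {"population_density": 3000, "traffic_flow": 20000,
               "household_income": 55000, "competitors_in_radius": 4},
    "Site B": {"population_density": 1500, "traffic_flow": 35000,
               "household_income": 70000, "competitors_in_radius": 1},
}

if __name__ == "__main__":
    for name, score in rank_sites(candidates):
        print(f"{name}: predicted sales ${score:,.0f}")
        # Show how each dimension pushed this site's prediction up or down.
        for feature, value in contributions(candidates[name]).items():
            print(f"    {feature:>22}: {value:+,.0f}")
```

With these made-up numbers, Site B ranks first even though Site A sits in a denser area, because Site B faces fewer competitors. Note that the raw coefficients are in different units (residents, vehicles, dollars), so comparing them directly says little about which factor matters most; per-site contributions, or coefficients fitted on standardized inputs, are the fairer comparison. This is the kind of breakdown a suite's reporting tools should surface for you.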

Obviously, there’s more to a data mining suite than just mathematical algorithms. I’ve divided my methodology into three sections: installation and maintenance, capabilities, and project management. Each section is further divided into areas I’ve rated in terms of importance to the effectiveness of the data mining suite. Depending upon your working environment and software needs, you may want to modify these weights slightly.

To use this methodology, first create a chart with 11 rows and one column for each data mining suite being evaluated. (See Table 1 for an example.) The first three rows contain each suite's name, manufacturer, and purchase price. The next seven rows contain the rating you assign to each evaluation area, on a scale from zero to ten, with ten being the highest score. The last row holds the suite's overall rating, calculated by multiplying each of the seven ratings by the weight of its area and then summing the weighted values in that suite's column.
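The overall-rating calculation can be sketched in a few lines of Python. The weights for Architecture (5) and Traditional Criteria (8) come from this column; the other five area names and weights are hypothetical placeholders, as are the suite names, vendors, prices, and ratings. Substitute your own areas and weights.

```python
from dataclasses import dataclass
from typing import Dict, List

# Area weights (importance). "Architecture" (Medium-5) and
# "Traditional Criteria" (High-8) come from the column; the remaining
# five area names and weights are hypothetical placeholders.
WEIGHTS: Dict[str, int] = {
    "Architecture": 5,
    "Traditional Criteria": 8,
    "Area 3 (hypothetical)": 6,
    "Area 4 (hypothetical)": 6,
    "Area 5 (hypothetical)": 4,
    "Area 6 (hypothetical)": 4,
    "Area 7 (hypothetical)": 3,
}


@dataclass
class SuiteColumn:
    """One column of the evaluation chart, i.e. one suite being evaluated."""
    name: str
    manufacturer: str
    price: float
    ratings: Dict[str, float]  # one 0-10 rating per evaluation area


def ratings_for(scores: List[float]) -> Dict[str, float]:
    """Pair seven scores with the area names, in chart row order."""
    if len(scores) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} ratings, got {len(scores)}")
    return dict(zip(WEIGHTS, scores))


def overall_rating(ratings: Dict[str, float],
                   weights: Dict[str, int] = WEIGHTS) -> float:
    """The chart's last row: sum of rating * weight over all evaluation areas."""
    if set(ratings) != set(weights):
        raise ValueError("each suite needs exactly one rating per evaluation area")
    for area, score in ratings.items():
        if not 0 <= score <= 10:
            raise ValueError(f"rating for {area!r} must be 0-10, got {score}")
    return sum(score * weights[area] for area, score in ratings.items())


# Hypothetical chart columns, for illustration only.
chart = [
    SuiteColumn("Suite X", "Vendor 1", 25_000, ratings_for([7, 9, 6, 5, 8, 4, 7])),
    SuiteColumn("Suite Y", "Vendor 2", 40_000, ratings_for([9, 6, 7, 8, 5, 6, 6])),
]

if __name__ == "__main__":
    # Print the suites from highest to lowest overall rating.
    for suite in sorted(chart, key=lambda s: overall_rating(s.ratings), reverse=True):
        print(f"{suite.name} ({suite.manufacturer}, ${suite.price:,.0f}): "
              f"overall rating {overall_rating(suite.ratings)}")
```

With these made-up ratings, Suite Y (245) narrowly beats Suite X (242); the maximum possible score under these weights is 10 × 36 = 360. As in the chart, purchase price sits in its own row and does not enter the weighted sum, so you can compare the overall rating against cost directly.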

Installation And Maintenance

Architecture (Importance: Medium-5). Here we focus on how the software suite is set up and how many options you have during installation. Questions you should ask include:

  • What options do you have when you first install the software?
  • Is it a client/server, browser-based, or stand-alone PC application?
  • Is the interface a GUI, command line, or API?
  • How easy is it to change the default settings?
  • How are macro commands used?
  • Can members of the data mining team share results and data sets easily?

Traditional Criteria (Importance: High-8). The questions in this area are typical of any software purchase, which is why the area carries a high weight. To assign numeric ratings here, you should know how many people will use the software directly and how many will review its results later. The cost of purchasing or licensing the software, along with a maintenance agreement, is obviously of critical importance.

Training should also be considered. It can be delivered through traditional classes or by computer. Classroom training can take place at the vendor’s location or on site, and computer-based training can be delivered over the Internet or on CD-ROM. Consultants can also be brought in to guide the first project through to completion. Consultants not only provide more insight into using the data mining suite to its fullest, but they can also be instrumental in bringing together the database and business experts to maximize return.

Author: Brandon Park