Rexer Analytics 2010 Data Miner Survey Summary Report
-
Upload
vignesh-manickam -
Category
Documents
-
view
218 -
download
0
Transcript of Rexer Analytics 2010 Data Miner Survey Summary Report
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
1/37
4th Annual Data Miner Survey 2010 Survey Summary Report
Rexer Analytics
For more information contactKarl Rexer, PhD
www.RexerAnalytics.com
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
2/37
!2011 Rexer Analytics 2
Outline
Overview & Key Findings Where & How Data Miners Work
Whats Important to Data Miners
Data Mining Tools: Usage & Satisfaction
Overcoming Challenges & Optimism about the Future
Appendix: Where do Data Miners Come From?
Appendix: Rexer Analytics
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
3/37
!2011 Rexer Analytics 3
Overview & Key Findings
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
4/37
!2011 Rexer Analytics 4
2010 Data Miner Survey: Overview
Fourth annual survey
50 questions
Data collected online in early 2010
10,000+ invitations emailed,
plus promoted by newsgroups,
vendors, and bloggers
Respondents:735 data miners from
60 countries
Corporate
Consultants
Note: Data from tool vendors (companiesmaking data mining software) was excluded
from many analyses.
Academics
NGO / Govt
Vendors
North America
USA 40%
Canada 4%
Europe Germany 7%
UK 5%
France 4%
Poland 4%
Asia Pacific
India 4%
Australia 3%
China 2%
Central & SouthAmerica (4%)
Columbia 2%
Brazil 1%
Middle East & Africa (3%) Israel 1%
Turkey 1%
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
5/37
!2011 Rexer Analytics 5
Key Findings
FIELDS & GOALS: Data miners work in a diverse set of fields. CRM / Marketing has been the #1 fieldin each of the past four years. Fittingly, improving the understanding of customers, retainingcustomers and other CRM goals are also the goals identified by the most data miners surveyed.
ALGORITHMS: Decision trees, regression, and cluster analysis continue to form a triad of corealgorithms for most data miners. However, a wide variety of algorithms are being used. This year, forthe first time, the survey asked about Ensemble Models, and 22% of data miners report using them.A third of data miners currently use text mining and another third plan to in the future.
MODELS: About one-third of data miners typically build final models with 10 or fewer variables, whileabout 28% generally construct models with more than 45 variables.
TOOLS: After a steady rise across the past few years, the open source data mining software R overtookother tools to become the tool used by more data miners (43%) than any other. STATISTICA, which hasalso been climbing in the rankings, is selected as the primarydata mining tool by the most data miners(18%). Data miners report using an average of 4.6 software tools overall. STATISTICA, IBM SPSSModeler, and R received the strongest satisfaction ratings in both 2010 and 2009.
TECHNOLOGY: Data Mining most often occurs on a desktop or laptop computer, and frequently thedata is stored locally. Model scoring typically happens using the same software used to develop models.
STATISTICA users are more likely than other tool users to deploy models using PMML.
CHALLENGES:As in previous years, dirty data, explaining data mining to others, and difficult access todata are the top challenges data miners face. This year data miners also shared best practices forovercoming these challenges.
FUTURE: Data miners are optimistic about continued growth in the number of projects they will beconducting, and growth in data mining adoption is the number one future trend identified. There isroom to improve: only 13% of data miners rate their companys analytic capabilities as excellent and
only 8% rate their data quality as very strong.
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
6/37
!2011 Rexer Analytics 6
Where & How Data Miners Work
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
7/37!2011 Rexer Analytics 7
Where Data Miners Work
More data miners work in CRM / Marketing than any other field.
Many data miners also work in Academia and Financial Services.
These have been the three most commonly reported fields in each of the fourannual data miner surveys (2007-2010).
Many data miners work in several fields.
Question: In what fields do you TYPICALLY apply data mining? (Select all that apply)
10%
10%
10%
11%
13%
13%
14%
15%
15%
25%
29%
41%
0% 10% 20% 30% 40% 50%
Government
Internet-based
Manufacturing
Medical
Technology
Pharmaceutical
Retail
Telecommunications
Insurance
Academic
Financial
CRM/Marketing
Vendors were excluded from this analysis.
Data Mining is everywhere!
Data miners also report working in Non-profit (6%), Hospitality/Entertainment/
Sports (4%), Military/Security (3%), andOther (9%).
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
8/37!2011 Rexer Analytics 8
8%
9%
9%
11%
12%
13%
14%
16%
21%
21%
22%
25%
26%
27%
31%
32%
60%
68%
69%
0% 10% 20% 30% 40% 50% 60% 70%
MARS
Uplift Modeling
Link Analysis
Genetic Algorithms
Social Network Analysis
Rule Induction
Survival Analysis
Anomoly Detection
Bayesian
Support Vector
Ensemble Models
Association Rules
Text MiningFactor Analysis
Neural Nets
Time Series
Cluster Analysis
Regression
Decision Trees
Data Mining Algorithms
Decision trees, regression, and cluster analysis continue to form a triad of corealgorithms for most data miners. This has been very consistent over time.
However, a wide variety of algorithms are being used.
Question: What algorithms/analytic methods do you TYPICALLY use? (Select all that apply)
Corporate Consultants Academic NGO / Govt
10% 12% 4% 5%
Consultants are more likely to use
Ensemble Models
Consultants and corporate data miners aremore likely to use Uplift Modeling
Corporate Consultants Academic NGO / Govt
21% 27% 20% 18%
Vendors were excluded from this analysis.
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
9/37!2011 Rexer Analytics 9
Text Mining
STATISTICA Text Miner 19%
IBM SPSS Modeler 17%
SAS Text Miner 9%
IBM SPSS Text Analytics 7%
Rapid Miner 6%
Provalis Wordstat 2%
GATE 2%
KXEN 2%
Oracle Text or ODM 1%
Megaputer Text Analyst 1%
Autonomy 1%
Other 35%Text Miners
About a third of data minerscurrently incorporate text
mining into their analyses,
while another third plan to
do so.
Software Used
Plan to StartText Mining
No Plans to
Conduct TextMining
!" $!" %!" &!"
The focus of our text miningis to extract key themes
(sentiment analysis)
We use text fields as inputs /predictors in a larger model
We use text mining as part ofsocial network analyses
55%
59%
21%
Question: Which is the best description of your use of text mining?
Question: What software do you use to perform text mining?
Question: Which statement(s) describe your use of text mining?
Vendors were excluded from this analysis.
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
10/37!2011 Rexer Analytics 10
Number of Variables in Final Models
About one-third of data miners typically build final models with 10 or
fewer variables, while about 28% generally construct models with morethan 45 variables.
10%
26%
13%
11%
3%
6%
1%2%
0%
7%8%
13%
0%
5%
10%
15%
20%
25%
30%
1 to 5 6 to 10 11 to 15 16 to 20 21 to 25 26 to 30 31 to 35 36 to 40 41 to 45 46 to 50 51 to
100
More
than 100
Number of Variables in Models
Question: How many variables are typically incorporated into your final model? Vendors were excluded from this analysis.
There were no significant differences in numberof variables by company size, computing
platform, or employment type.
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
11/37!2011 Rexer Analytics 11
Number of Analytic Projects
About half of data miners report completing 5 or fewer projects in 2009.
Data miners completing 6 or more projects are more likely to use larger,centralized computing platforms.
1%
8%
46%
20%
14%
8%
3%
0%
10%
20%
30%
40%
50%
None One 2 to 5 6 to 10 11 to 20 21 to 50 More than
50
Number of Analytic Projects in 2009
Question: Please estimate the number of analytic projects you completed in 2009.
Data miners completing 6 or more projects in 2009 aremore likely to use a centralized main frame (23% vs.
15%), a local server (31% vs. 23%), or a desktop withdata on server, mainframe, or cloud (45% vs. 36%)
than are those completing fewer projects.
Data mining academics and consultants tended to
conduct fewer projects than did data miners incorporate or NGO / government settings (5.0, 9.8,
19.4, and 21.0 projects, respectively).
Vendors were excluded from this analysis.
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
12/37!2011 Rexer Analytics 12
35%
24%
49%
39%
26%
18%
7%
0%60%
Computing Environments
Most data mining happens on desktop and laptop computers.
Frequently the data and processing is local(not on servers, mainframe or cloud).
Only a small minority of data mining is on the cloud.
Question: What are the computing environments/platforms on which datamining/analytics occurs at your company/organization? (Check all that apply)
Corporate
Consultant
Academic
NGO/
Govt
Vendor
5% 10% 7% 3% 14%
20% 16% 14% 32% 26%
28% 30% 19% 29% 45%
48% 36% 25% 47% 39%
43% 49% 58% 58% 35%
29% 24% 15% 32% 37%
28% 36% 46% 42% 44%
Cloud Computing
Centralized Mainframe/Server
Local Server
Desktop PC/Workstation (with data &processing on server, mainframe or cloud)
Desktop PC/Workstation (withdata & processing locally)
Laptop PC (with data & processingon server, mainframe or cloud)
Laptop PC (with data &processing locally)
Overall
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
13/37!2011 Rexer Analytics 13
Internal vs. External Data Mining Resources
Data miners report that data mining is generally handled internally. Data miners
reported very similar results in the 2009 survey.
Also,14% of data miners report that their company offshores some analytics (anincrease from 8% in the 2009 survey).
Entirely internal
More internalthan external
Approximatelyhalf internal &
half external
Entirelyexternal
(1%)
More externalthan internal
Question: To what degree does your organization use internal versus external resources foryour data mining projects?
Offshoring Question (not graphed): Has your company moved any data mining or other
analytics to another country to take advantage of lower wages in the destination country?
Note: Responders answer this question
about their company/organization,therefore they are themselves internal.
The probability of external responses islimited by the structure of the question.
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
14/37!2011 Rexer Analytics 14
Whats Important to Data Miners
V d
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
15/37!2011 Rexer Analytics 15
The goals for data mining analyses are very diverse.
Several CRM goals are high on the list. More than a third of data miners identify that they are using datamining to improve the understanding of customers.
Some data mining tools are used for a wide range of goals, and others have more specific, niche uses.
Goals for Analyses
Overall
IBM
SPSS
Statistics
IBM
SPSS
Modeler Knime R SAS
SAS
Enterprise
Miner
STATISTICA
(StatSoft) WekaTotal Number of Goals 3.8 4.5 4.9 1.8 3.4 4.4 4.7 2.9 2.1
Improving understanding of customers 37% 55% 52% 15% 28% 64% 42% 22% 14%
Retaining customers 32% 45% 55% 3% 19% 42% 48% 15% 11%
Improving direct marketing programs 29% 40% 39% 12% 19% 48% 48% 19% 14%
Selling products / services to existingcustomers 29% 29% 38% 9% 21% 40% 55% 17% 11%
Market research / survey analysis 29% 62% 32% 12% 26% 31% 26% 33% 11%
Acquiring customers 27% 40% 45% 3% 19% 37% 39% 20% 4%
Risk management / credit scoring 26% 24% 32% 9% 19% 31% 39% 21% 25%
Improving customer experiences 25% 24% 35% 12% 19% 31% 32% 16% 14%
Fraud detection or prevention 21% 12% 28% 0% 23% 12% 39% 27% 7%
Sales forecasting 21% 29% 27% 6% 26% 27% 19% 16% 11%Medical advancement / drug discovery /biotech / genomics 18% 19% 4% 61% 26% 9% 10% 9% 36%
Price optimization 13% 21% 21% 0% 12% 13% 19% 13% 0%Manufacturing improvement 10% 2% 10% 6% 19% 1% 3% 23% 4%
Investment planning / optimization 10% 14% 6% 3% 9% 7% 16% 9% 7%
Website or search optimization 8% 10% 14% 3% 7% 7% 6% 2% 4%
Criminal or terrorist detection 6% 2% 13% 0% 2% 3% 3% 4% 7%
Collections 6% 0% 7% 3% 7% 13% 3% 3% 0%
Software optimization 6% 2% 4% 9% 12% 4% 3% 3% 4%
Language understanding 4% 0% 10% 0% 5% 1% 3% 1% 7%
Fundraising 3% 14% 4% 0% 5% 1% 6% 1% 4%
Higher Frequency Lower FrequencyQuestion: What were the goals of your analyses in 2009? (Select all that apply)
Vendors wereexcluded from
this analysis.
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
16/37!2011 Rexer Analytics 16
Importance of Model Explainability
Model explainability / transparency is important to most data miners.
It is particularly important to data miners working in NGO / Govt settings and todata miners who use IBM SPSS Statistics as their primary tool.
Percent indicating Model Explainability isExtremely Importantby Primary Tool Used
Question: How important is model explainability / transparency to you?
Extremely Important
39% 38% 16% 4%
0% 20% 40% 60% 80% 100%
ModeratelyImportant
Not at allImportant
(2%)
17%
25%
30%
33%
37%
38%
45%
60%
0% 10% 20% 30% 40% 50% 60% 70%
R
Knime
Weka
STATISTICA
SAS Enterprise
IBM SPSS Modeler
SAS
IBM SPSS Statistics
Percent indicating Model Explainability isExtremely Importantby Employment Type
49%
58%
41%
35%
38%
0% 10% 20% 30% 40% 50% 60% 70%
Vendor
NGO / Gov't
Academic
Consultant
Corporate
Overall
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
17/37!2011 Rexer Analytics 17
Data Mining Tools:Usage & Satisfaction
Survey Questions:
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
18/37!2011 Rexer Analytics 18
Data Mining SoftwareSurvey Questions:
What Data mining/analytic tools did you use in 2009?(rate each as never, occasionally, or frequently)
What one Data Mining software package do you usemost frequently?
Overall Corporate Consultants Academics NGO / Govt
Data miners report using an average of 4.6 software tools.
R is used by the most data miners (43%).
STATISTICA is the primary data mining tool chosen most often (18%).
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
19/37!2011 Rexer Analytics 19
Satisfaction with Data Mining Tools
Question: Please rate your overall satisfaction with your primary Data Mining software package.
2010 2009
Sample size < 20
STATISTICA received the highest satisfaction ratings. R and
IBM SPSS Modeler users are also quite satisfied.
These three tools also had the highest satisfaction rates in 2009.
Vendors were excluded from this analysis.
Vendors were
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
20/37!2011 Rexer Analytics 20
Continuing Use of Data Mining Tools
2010 2009
Sample size < 20
About 80% of STATISTICA and R users report that they are extremely
likely to stay with these primary tools over the next 3 years.
Only 18% of Weka users feel this way.
The ranking of tools on this measure is different than that seen in 2009.
Question: What is the likelihood that you will continue to use this tool as your primary Data Mining software package over the next 3 years?
Vendors wereexcluded from
this analysis.
Vendors were
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
21/37!2011 Rexer Analytics 21
Satisfaction with Tools: Details
STATISTICA received strong ratings across all dimensions.
OverallIBM SPSSStatistics
IBM SPSSModeler Knime R SAS
SAS
EnterpriseMiner
STATISTICA(StatSoft) Weka
Quality and accuracy of model performance 4.33 4.20 4.12 3.96 4.51 4.20 4.24 4.82 3.65
Variety of available algorithms 4.32 3.70 4.28 4.10 4.71 4.04 3.97 4.85 4.15
Dependability/Stability of software 4.26 4.03 3.88 4.18 4.24 4.37 3.96 4.76 3.96
Data manipulation capabilities 4.24 4.15 4.22 4.03 4.02 4.44 3.83 4.64 3.52
Ease of use 4.19 4.28 4.53 4.33 3.60 3.64 3.69 4.63 4.08
The software contains a specific analytic
technique that I need 4.17 4.08 3.90 3.93 4.46 3.98 3.90 4.63 3.79
Ability to automate repetitive tasks 4.11 3.65 3.71 3.71 4.50 4.23 3.75 4.68 3.67Quality of user interface 4.09 3.98 4.37 4.36 3.66 3.54 3.45 4.71 3.46
Good metrics of model quality 4.08 3.74 3.88 3.46 4.32 3.96 4.00 4.70 3.52
Ability to handle very large data sets 4.07 3.83 3.97 3.79 3.00 4.59 4.24 4.78 2.77
Good variable discovery, profiling and
selection 4.06 3.55 4.13 3.72 4.08 3.82 3.79 4.71 3.42
Ease of scoring models to other data sets 4.05 3.33 4.16 4.00 4.06 3.83 3.93 4.55 3.67
Quality of output / Ease of interpretation 4.04 4.03 4.00 3.86 3.85 3.70 3.69 4.74 3.21
Speed 4.03 3.83 3.78 3.93 3.63 4.07 3.83 4.76 3.24Ability to modify algorithm options to fine-tune
analyses 3.97 3.45 3.63 3.97 4.53 3.73 3.64 4.45 3.92
Useful help menu, demos and tutorials 3.88 3.78 3.88 3.64 3.78 3.50 3.57 4.50 3.30
Enables mining within one's database 3.88 3.48 3.94 3.70 3.50 4.09 3.57 4.30 3.39
Strong graphical visualization of models 3.70 3.35 3.62 3.37 4.13 2.68 3.17 4.80 2.88
Cost of software 3.69 3.55 3.23 4.88 4.69 2.31 2.36 4.18 4.84
Higher Satisfaction Lower SatisfactionMean satisfaction rating on 1-5 scale
Question: Rate how satisfied you are with the performance of your primary data mining package (identified earlier) on each of these factors.
Vendors wereexcluded from
this analysis.
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
22/37!2011 Rexer Analytics 22
Factors Most Related to Primary Tool Satisfaction
The simple correlations between the detailed satisfaction items and overall primarytool satisfaction reveal the factors most closely related to primary tool satisfaction.
Correlation with Overall SatisfactionOverall Corporate Consultant Academic NGO / Govt
Correlation Rank Correlation Rank Correlation Rank Correlation Rank Correlation Rank
Quality of user interface .428 1 .469 4 .430 2 .532 1 .142 16
Good variable discovery, profiling
and selection.419 2 .523 2 .315 11 .311 11 .469 3
Good metrics of model quality .414 3 .462 5 .385 4 .379 7 .503 2
Ease of use .402 4 .528 1 .279 14 .385 6 .346 10
Quality and accuracy of model
performance.397 5 .444 6 .377 5 .391 5 .178 14
Useful help menu, demos and
tutorials.396 6 .422 11 .444 1 .284 14 .413 6
The software contains a specific
analytic technique that I need.394 7 .437 8 .368 6 .267 16 .574 1
Quality of output / Ease of
interpretation.392 8 .506 3 .428 3 .244 17 -.172 19
Variety of available algorithms .371 9 .426 9 .344 8 .313 10 .405 7
Strong graphical visualization of
models.366 10 .406 12 .362 7 .095 19 .424 4
Ease of scoring models to other
data sets .365 11 .440 7 .317 10 .244 18 .302 11Speed .360 12 .424 10 .253 16 .428 3 .351 8
Dependability/Stability of software .337 13 .360 15 .289 13 .501 2 .158 15
Ability to automate repetitive tasks .332 14 .352 16 .306 12 .427 4 .190 13
Ability to modify algorithm options to
fine-tune analyses.332 15 .342 17 .324 9 .375 8 .351 9
Data manipulation capabilities .322 16 .371 14 .278 15 .308 12 .416 5
Ability to handle very large data sets .289 17 .384 13 .203 18 .286 13 .229 12
Enables mining within one's
database.217 18 .225 19 .222 17 .274 15 .060 17
Cost of software .202 19 .320 18 .118 19 .325 9 .000 18
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
23/37!2011 Rexer Analytics 23
Data Mining Tools: Strengths & Weaknesses
Tool strengths and weaknesses were identified by the satisfaction ratings of data minerswho considered each tool to be their primary data mining tool.
IBM SPSS
Statistics
IBM SPSS
Modeler Knime R SAS
SAS Enterprise
MinerSTATISTICA
(StatSoft) Weka
Top 3
Strengths
1)Ease of use
2)Quality andaccuracy ofmodelperformance
3)Data
Manipulationcapabilities
1)Ease of use
2)Quality ofuser interface
3)Variety ofavailablealgorithms
1)Cost ofsoftware
2)Quality ofuser interface
3)Ease of use
1)Variety ofavailablealgorithms
2)Cost ofsoftware
3)Ability to
modifyalgorithmoptions tofine-tuneanalyses
1)Ability tohandle verylarge datasets
2)DataManipulationcapabilities
3)Dependability/stability ofsoftware
1)Ability tohandle verylarge datasets
2)Quality andaccuracy ofmodel
performance
3)Good metricsof modelperformance
1)Variety ofavailablealgorithms
2)Quality andaccuracy ofmodelperformance
3)Stronggraphicalvisualizationof models
1)Cost ofsoftware
2)Variety ofavailablealgorithms
3)Ease of use
Top 3Weaknesses
1)Ease ofscoring toother datasets
2)Stronggraphicalvisualizationof models
3)Ability tomodifyalgorithmoptions tofine-tuneanalyses
1)Cost ofsoftware
2)Stronggraphicalvisualizationof models
3)Ability tomodifyalgorithmoptions tofine-tuneanalyses
1)Stronggraphicalvisualizationof models
2)Good metricsof modelquality
3)Useful helpmenu, demosand tutorials
1)Ability tohandle verylarge datasets
2)Enablesmining withinonesdatabase
3)Ease of use
1)Cost ofsoftware
2)Stronggraphicalvisualizationof models
3)Useful helpmenu, demosand tutorials
1)Cost ofsoftware
2)Stronggraphicalvisualizationof models
3)Quality ofuser interface
1)Cost ofsoftware
2)Enablesmining withinonesdatabase
3)Ability tomodifyalgorithmoptions tofine-tuneanalyses
1)Ability tohandle verylarge datasets
2)Stronggraphicalvisualizationof models
3)Quality ofoutput/easeofinterpretation
Vendors were excluded from this analysis. Note: Strengths and weaknesses determined by mean on 5-point satisfaction scale.
Question: Rate how satisfied you are with the performance of your primary data mining package (identified earlier) on each of these factors.
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
24/37!2011 Rexer Analytics 24
Priorities in Selecting a Data Mining Package Data miners have a diverse set of priorities when selecting a data mining package.
Almost of third of data miners identify model performance among their top 3 priorities.
Users of different tools identified different factors as their top priorities.
Overall
IBM
SPSS
Statistics
IBM
SPSS
Modeler Knime R SAS
SAS
Enterprise
Miner
STATISTICA
(StatSoft) Weka
Quality and accuracy of model performance 31% 24% 23% 24% 19% 18% 35% 64% 29%
Ability to handle very large data sets 27% 33% 30% 12% 16% 39% 29% 18% 25%
Variety of available algorithms 26% 33% 34% 21% 42% 22% 19% 19% 43%
Cost of software 23% 26% 21% 36% 33% 21% 23% 6% 29%
Data manipulation capabilities 22% 19% 38% 21% 30% 27% 23% 18% 18%
Ease of use20% 26% 24% 39% 9% 12% 26% 21% 7%
Dependability/Stability of software 18% 12% 20% 12% 16% 21% 13% 34% 4%
Strong graphical visualization of models 15% 10% 7% 12% 26% 19% 13% 21% 11%
Speed 14% 5% 7% 9% 23% 13% 10% 21% 18%The software contains a specific analytic
technique that I need 13% 17% 7% 15% 16% 18% 6% 10% 18%
Ability to automate repetitive tasks 13% 12% 10% 21% 14% 16% 6% 10% 11%
Quality of graphics 12% 26% 11% 12% 7% 4% 13% 8% 11%
Quality of user interface 11% 7% 15% 12% 2% 6% 13% 13% 14%Ability to modify algorithm options to fine-tune
analyses 10% 7% 8% 9% 23% 7% 23% 5% 18%Good variable discovery, profiling and
selection9% 2% 10% 9% 2% 13% 13% 5% 11%
Good metrics of model quality 7% 0% 4% 12% 7% 7% 16% 15% 11%
Ease of scoring models to other data sets 6% 5% 10% 3% 2% 10% 6% 8% 0%
Enables mining within ones database 5% 10% 6% 6% 0% 4% 0% 0% 4%
Useful help menu, demos and tutorials 4% 10% 4% 9% 5% 0% 3% 2% 0%
Higher Priority Lower Priority
The table shows the frequency that each
factor was ranked among the top 3.Question: Please identify the top five factors that are most import
to you when selecting your primary data mining software.
Vendors wereexcluded from
this analysis.
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
25/37
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
26/37
!2011 Rexer Analytics 26
7%
11%
11%
12%
13%
13%
26%
51%
0% 20% 40% 60%
.Net
C code
PMML
Java
We dont score our models tonew data
Other
SQL
Data mining tool used to create
model
Technologies Used to Score / Deploy Models
Models are often scored / deployed using the same data mining tool used to create them.
This is especially true among SPSS Modeler users.
STATISTICA users are more likely thananyone else to score / deploy using PMML.
Question: What technology do you use to score or deploy yourmodels to new data records? (Select all that apply)
Primary Tool
Score
Using Data
Mining Tool*
Score
Using SQL*
Score
Using
PMML*
IBM SPSS
Statistics50% 31% 2%
IBM SPSS
Modeler
69% 31% 10%
Knime 55% 9% 15%
R 53% 16% 12%
SAS 54% 34% 3%
SAS Enterprise
Miner45% 32% 10%
STATISTICA(StatSoft)
43% 23% 32%
Weka 36% 7% 0%
Vendors were excluded from this analysis.
Exclusively Data
Mining Tool
Data Mining Tool
and Other
Exclusively
Something Other
than Data Mining
Tool
*Percentages are within the group of data miners whoidentify these tools as their primary tool.
31%
21%
48%
Data mining tool used to create model
SQL
Other
We dont score our models to new data
Java
PMML
C code
.Net
PMML I d E
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
27/37
!2011 Rexer Analytics 27
PMML Import and Export
PMML adoption is strongest among STATISTICA users.
! The previous page showed that 32% of STATISTICA primary users score / deploy using PMML.
!In the tables below we see that a third of the people importing PMML are importing their PMML
into STATISTICA. A similar proportion of PMML exporters are exporting their PMML from
STATISTICA.
PMML Export Tool % cases
STATISTICA (Statsoft) 36%
Knime 14%
R 13%
IBM/SPSS Modeler 9%
Own package/code 6%
Rapid Miner 6%
SAS 6%
KXEN 3%
SPSS 3%
PMML Import Tool % cases
STATISTICA (Statsoft) 33%
Knime 11%
R 9%
SAS 6%
IBM/SPSS Modeler 5%
Zementis ADAPA 5%
SAS Enterprise Miner 4%
Portrait Customer
Analytics3%
WEKA 3%
Oracle 3%
Note: Any tools identified by fewer than 3% of responders are not shown.
Question: If you use PMML, what
software package(s) do you use toproduce (export) your PMML?
Question: If you use PMML, what
software package(s) do you use toconsume (import) your PMML?
Vendors were excluded from this analysis.
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
28/37
!2011 Rexer Analytics 28
Overcoming Challenges
&
Optimism about the Future
Ch ll & O i Th
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
29/37
!2011 Rexer Analytics 29
Challenges & Overcoming Them
Data miners face a consistent set of challenges. The topchallenges facing data miners are:
Dirty data: #1 challenge every year, 2007-2010
Explaining data mining to others: Always in the top 4 challenges,2007-2010
Difficult access to data: Always in the top 3 challenges, 2007-2010
This year survey respondents shared their BestPractices for overcoming these challenges.
E.g., Dirty Data: Use anomaly detection to flag records to put beforesubject matter experts.
E.g., Dirty Data: All projects begin with low-level data reports showingcounts of records, verification of keys (uniqueness, widows/orphans), anddistributions of field contents. These reports are echoed back to the datacontent experts.
A summarized list of the best practices outlined by the 2010 survey
responders is available at www.RexerAnalytics.com.
A l ti C bilit & D t Q lit
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
30/37
!2011 Rexer Analytics 30
Analytic Capability & Data Quality
Analytic capability:
Theres room to improve if were going to Compete on Analytics.
Data miners at small companies (
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
31/37
!2011 Rexer Analytics 31
Data Mining and the Economy
Question: How will the number of data mining projects your organization
conducts in 2010 compare to what has been typical in the past few years?
There is a strong market for data mining:
73% of data miners foresee increases in the number of data mining projects.
Data miners working in diverse settings share this optimism.
Number of Data Mining Projects in 2010
F t T d i D t Mi i
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
32/37
!2011 Rexer Analytics 32
Number of respondents
Survey respondents shared their ideas about future trends in data
mining (an open-ended survey question).
The future trend foreseen by the most data miners is a growth in the
adoption of data mining.
Future Trends in Data Mining
50
32
32
26
15
15
12
11
! #! $! %! &! '! (!
Growth in Data Mining Adoption
Text Mining
Social Network Analysis
Automation
Cloud Computing
Data Visualization
Tools Get Easier to Use
Scaling to Bigger Data
Question: What do you envision as the primary future
trends in data mining? (text box provided for response)
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
33/37
!2011 Rexer Analytics 33
Appendix: Where do Data MinersCome From?
Ed ti l B k d
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
34/37
!2011 Rexer Analytics 34
Educational Background
Question: What is your formal training/education/dominant work background? (Check option that is best description)
Data miners have diverse backgrounds.
Almost a quarter of data miners have backgrounds in computer science.
Many others were trained in statistics or engineering.
24%
18%
12%
9%
9%
7%
6%
3%
3%
2%
2%
3%
0% 5% 10% 15% 20% 25%
Computer Science / IT
Statistics
Engineering
EconomicsMathematics
Social Science
Medical / Biological Sciences
Market Research
Financial Forecasting / Risk
Operation Research
Humanities / Liberal Arts
Physics
Business / Marketing
Other
1%
1%
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
35/37
!2011 Rexer Analytics 35
Appendix: Rexer Analytics
Rexer Analytics Overview
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
36/37
!2011 Rexer Analytics 36
Company Summary
Small privately held consulting firm Founded in 2002
Focus: Analytic and CRM Consulting
(applied statistics & data mining)
Example Projects Fraud detection
Customer attrition analysis & prediction
Text mining
Customer segmentation
Sales forecasting
Market basket analysis
Analysis of product returns and failure
CRM metric design & measurement
Predictive models for campaign targeting & cross-sell
Survey research (to understand customer needs & customer decision making)
Senior Staff
Karl Rexer, PhD Paul Gearan
Heather Allen, PhD
Roberta Chicos
Recent Clients Redbox
ADT Securities
Hewlett-Packard
Davol (CR Bard)
Oracle
Loan Depot
Accudata
Palladium
Rexer Analytics Overview
-
7/21/2019 Rexer Analytics 2010 Data Miner Survey Summary Report
37/37
Karl Rexer, PhD
617-233-8185
Rexer Analytics
30 Vine Street
Winchester, MA 01890USA
www.RexerAnalytics.com
For more information contact: