Project 2 Data Mining Part 1
Project II: Data Mining a Mushroom Dataset
Group 1
Raymond Borges
Jarilyn Hernandez
The Mushroom Dataset
Data Set Characteristics: Multivariate
Attribute Characteristics: Categorical
Number of Instances: 8124
Number of Attributes: 22
Area: Life
Date Donated: 1987
This data set includes descriptions of hypothetical samples
corresponding to 23 species of gilled mushrooms in the
Agaricus and Lepiota Family.
Each species is identified as definitely edible, definitely
poisonous, or of unknown edibility and not recommended.
This latter class was combined with the poisonous one.
Mushroom Dataset
22 Independent attributes
1 Class Attribute (Can you eat it?)
Edible: 4,208 (51.8%)
Poisonous: 3,916 (48.2%)
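The class percentages follow directly from the two counts on this slide. A quick check in Python (the variable names are illustrative, not from the slides):

```python
# Class counts as stated on the slide.
counts = {"edible": 4208, "poisonous": 3916}
total = sum(counts.values())  # should match the 8124 instances reported earlier

# Share of each class as a percentage, rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
print(total, shares)
```

The classes are close to balanced, so a trivial "always edible" classifier would only reach about 52% accuracy.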
Mushroom Dataset
22 Attributes Total
18 Intrinsic to the Mushroom
4 Others
1 Habitat
1 Population
1 Bruises
1 Odor
Odor attribute, 1R Learner
The simplest rule: 98.52% accuracy
a = almond
c = creosote
f = foul
l = anise
m = musty
n = none
p = pungent
s = spicy
y = fishy
[Figure: class distribution by odor code (a, c, f, l, m, n, p, s, y)]
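A 1R learner picks the single attribute whose "majority class per value" rule makes the fewest training errors. The sketch below is a minimal version of that idea, not the Weka implementation used for the slides; the six toy rows and the function name `one_r` are made up for illustration and are not taken from the mushroom data.

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """Return (attribute index, value -> class rule, training accuracy)."""
    best = None
    for a in range(len(rows[0])):
        # Count class labels for each value of attribute a.
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[a]][y] += 1
        # Predict the majority class for each value.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        correct = sum(c.most_common(1)[0][1] for c in counts.values())
        acc = correct / len(rows)
        if best is None or acc > best[2]:
            best = (a, rule, acc)
    return best

# Hypothetical toy data: (odor, cap-color) with class e/p.
rows = [("n", "w"), ("n", "b"), ("f", "w"), ("f", "b"), ("a", "w"), ("p", "b")]
labels = ["e", "e", "p", "p", "e", "p"]
attr, rule, acc = one_r(rows, labels)
print(attr, rule, acc)
```

On the toy rows the odor column (index 0) wins because every odor value maps to a single class.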
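J48 Tree: 100% Classification
E = Edible
P = Poisonous
[Figure: J48 decision tree with leaves labelled E or P. Branch values shown, grouped by the attribute they belong to:
odor: almond, creosote, foul, anise, musty, none, pungent, spicy, fishy
spore-print-color: black, brown, buff, chocolate, green, orange, purple, yellow, white
gill-size: broad, narrow
gill-spacing: crowded, distant, close
population: abundant, clustered, numerous, scattered, several, solitary]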
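J48 is Weka's implementation of C4.5, which chooses each split by gain ratio (information gain divided by the split's own entropy). The following is a minimal sketch of that criterion only, not of the full tree builder; the toy columns are hypothetical and the function names are illustrative.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())

def gain_ratio(values, labels):
    """Information gain of splitting on `values`, normalised by split entropy."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy columns, not taken from the mushroom data.
labels = ["e", "e", "p", "p", "e", "p"]
odor = ["n", "n", "f", "f", "a", "p"]
cap_color = ["w", "b", "w", "b", "w", "b"]
print(gain_ratio(odor, labels), gain_ratio(cap_color, labels))
```

In this toy example odor separates the classes perfectly, so it scores a higher gain ratio than cap color and would be chosen as the root split.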
Simplest rule-set (Benchmark)
A mushroom is poisonous if any of these rules applies (accuracy is cumulative):
1. Odor is not almond, anise, or none
(120 poisonous cases missed, 98.52% accuracy)
2. Spore-print-color = green
(48 cases missed, 99.41% accuracy)
3. Odor = none and stalk-surface-below-ring = scaly and stalk-color-above-ring is not brown
(8 cases missed, 99.90% accuracy)
4. Habitat = leaves and cap-color = white
(alternatively, population = clustered and cap-color = white)
(100% accuracy)
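The rule set above can be written as a single predicate, and the quoted accuracies can be checked from the missed-case counts and the 8124 instances. This is a sketch under two assumptions: attribute values are spelled out in full (the raw dataset uses one-letter codes), and the function and variable names are illustrative.

```python
def is_poisonous(m):
    """Apply the four benchmark rules to a dict of attribute name -> value."""
    if m["odor"] not in ("almond", "anise", "none"):
        return True                                   # rule 1
    if m["spore-print-color"] == "green":
        return True                                   # rule 2
    if (m["odor"] == "none"
            and m["stalk-surface-below-ring"] == "scaly"
            and m["stalk-color-above-ring"] != "brown"):
        return True                                   # rule 3
    if m["habitat"] == "leaves" and m["cap-color"] == "white":
        return True                                   # rule 4
    return False

TOTAL = 8124  # number of instances in the dataset

def accuracy_after(missed):
    """Accuracy (in %) when `missed` poisonous cases are still misclassified."""
    return round(100 * (1 - missed / TOTAL), 2)

for missed in (120, 48, 8):
    print(missed, accuracy_after(missed))
```

Each rule only ever flags a mushroom as poisonous, so adding rules can reduce missed poisonous cases but cannot introduce them.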
Habitat Insights
[Figure: class distribution by habitat: woods, grasses, leaves, meadows, paths, urban, waste]
Waste is safe but stay away from paths
Population Insights
[Figure: class distribution by population: abundant, clustered, numerous, scattered, several, solitary]
Mushrooms travel safer in groups
Information Knowledge
[Figure: Population Data % Rates vs. Mushrooms. Series: % Poisonous, % Edible. Categories: abundant, clustered, numerous, scattered, several, solitary. Axis: 0% to 120%]
Poisonous/Edible Ratio vs. Mushroom Population Density
[Figure: Poisonous/Edible Ratio (y-axis, -50% to 300%) plotted against Mushroom Density (x-axis, 0 to 7). Points labelled solitary, several, scattered, numerous, clustered, abundant]
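One way to compute a poisonous/edible ratio per population value is sketched below. It assumes the ratio is the count of poisonous samples divided by the count of edible samples, expressed as a percentage; the slides do not state the exact formula, and the toy records and function name are hypothetical.

```python
from collections import Counter

def poisonous_edible_ratio(records):
    """Map each population value to 100 * (#poisonous / #edible)."""
    poisonous, edible = Counter(), Counter()
    for population, label in records:
        if label == "p":
            poisonous[population] += 1
        else:
            edible[population] += 1
    # Values with no edible samples are skipped because the ratio is undefined.
    return {pop: 100 * poisonous[pop] / edible[pop] for pop in edible}

# Hypothetical toy records: (population, class), not taken from the dataset.
records = [("solitary", "e"), ("solitary", "p"),
           ("several", "p"), ("several", "p"), ("several", "e"),
           ("abundant", "e")]
ratios = poisonous_edible_ratio(records)
print(ratios)
```

A ratio above 100% means poisonous samples outnumber edible ones for that population value.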
Conclusions
If it stinks, don’t eat it: 98.52% accuracy
If it doesn’t stink and its spore-print color is not green, then you have a 99.41% chance of survival
Odor and spore-print color may be the best attributes statistically, but not in the field
Future Work
Use more easily identified attributes to classify mushrooms, to produce a method of easier visual classification
Eliminate nonvisual attributes
Focus on visual-cue attributes, e.g. habitat, population, cap and stalk
Compare the two methods