Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big...
Transcript of Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big...
Seeing the Big Picture Segmenting Images to Create Data
15.071x – The Analytics Edge
Image Segmentation
• Divide up digital images to salient regions/clusters corresponding to individual surfaces, objects, or natural parts of objects
• Clusters should be uniform and homogenous with respect to certain characteristics (color, intensity, texture)
• Goal: Useful and analyzable image representation
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 1
Wide Applications
• Medical Imaging • Locate tissue classes, organs, pathologies and tumors • Measure tissue/tumor volume
• Object Detection • Detect facial features in photos • Detect pedestrians in footages of surveillance videos
• Recognition tasks • Fingerprint/Iris recognition
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 2
Various Methods
• Clustering methods • Partition image to clusters based on differences in pixel
colors, intensity or texture
• Edge detection • Based on the detection of discontinuity, such as an abrupt
change in the gray level in gray-scale images
• Region-growing methods • Divides image into regions, then sequentially merges
sufficiently similar regions
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 3
In this Recitation…
• Review hierarchical and k-means clustering in R
• Restrict ourselves to gray-scale images • Simple example of a flower image (flower.csv) • Medical imaging application with examples of transverse
MRI images of the brain (healthy.csv and tumor.csv)
• Compare the use, pros and cons of all analytics methods we have seen so far
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 4
Grayscale Images
• Image is represented as a matrix of pixel intensity values ranging from 0 (black) to 1 (white)
• For 8 bits/pixel (bpp), 256 color levels 7 pixels
0 .2 .5 00 0.2
.2 0 0 .5.5 .20
0 .5 .7 00 0.5
.5 0 1 .7.7 .50
0 .5 .7 00 0.5
.2 0 0 .5.5 .20
0 .2 .5 00 0.2
7 pixels
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 5
0
.2
0 .7
.5 1 .5
0 .7 0
.2 .2
0 0
• Cluster pixels according to their intensity values
0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1
Grayscale Image Segmentation
.2 .5 00 0.2
0 0 .5.5 .20
.5 00 0.5
0 .7.7 0
.5 00 .5
0 0 .5.5 0
.2 .5 00 .2
0
.2
0
.5
0
.2
0 0 Intensity Vector of size n = 7x7
7 pixels Pairwise distances n(n-1)/2
.7
1
.7
.5
0
.2
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 6
Dendrogram Example
ABCDEF
A B C F
BCDEF
BC
DEF
DE
D EData Observations
Cluster
Distance between clusters BCDEF and BC
Level 5
Level 4
Level 3
Level 2
Level 1
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 7
Dendrogram Example
ABCDEF
A B C F
BCDEF
BC
DEF
DE
D E Level 1
Level 2
Level 3
Level 4
Level 5
A, BC, DE, F 4 clusters
A, BC, DEF 3 clusters
A, BCDEF 2 clusters
CO
AR
SER
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 8
Dendrogram Example
ABCDEF
A B C F
BCDEF
BC
DEF
DE
D E Level 1
Level 2
Level 3
Level 4
Level 5
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9
Flower Dendrogram
4 clusters
2 clusters
3 clusters
15.071x – Predictive Diagnosis: Discovering Patterns for Disease Detection 10
k-Means Clustering
• The k-means clustering aims at partitioning the data into k clusters in which each data point belongs to the cluster whose mean is the nearest
k-Means Clustering Algorithm
1. Specify desired number of clusters k
2. Randomly assign each data point to a cluster
3. Compute cluster centroids
4. Re-assign each point to the closest cluster centroid
5. Re-compute cluster centroids
6. Repeat 4 and 5 until no improvement is made
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 11
k-Means Clustering
k-Means Clustering Algorithm
1. Specify desired number of clusters k
k = 2
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 12
k-Means Clustering
k-Means Clustering Algorithm
1. Specify desired number of clusters k
2. Randomly assign each data point to a cluster
k = 2
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 13
k-Means Clustering
k-Means Clustering Algorithm
1. Specify desired number of clusters k
2. Randomly assign each data point to a cluster
k = 2
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 14
k-Means Clustering
k-Means Clustering Algorithm
1. Specify desired number of clusters k
2. Randomly assign each data point to a cluster
k = 2
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 15
k-Means Clustering Algorithm
1. Specify desired number ofclusters k
2. Randomly assign each datapoint to a cluster
3. Compute cluster centroids
k-Means Clustering
k-Means Clustering Algorithm
1. Specify desired number of clusters k
2. Randomly assign each data point to a cluster
3. Compute cluster centroids
4. Re-assign each point to the closest cluster centroid
k = 2
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 16
k-Means Clustering
k-Means Clustering Algorithm
1. Specify desired number of clusters k
2. Randomly assign each data point to a cluster
3. Compute cluster centroids
4. Re-assign each point to the closest cluster centroid
k = 2
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 17
k-Means Clustering
k-Means Clustering Algorithm
1. Specify desired number of clusters k
2. Randomly assign each data point to a cluster
3. Compute cluster centroids
4. Re-assign each point to the closest cluster centroid
5. Re-compute cluster centroids
6. Repeat 4 and 5 until no improvement is made
k = 2
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 18
First Taste of a Fascinating Field
• MRI image segmentation is subject of ongoing research
• k-means is a good starting point, but not enough • Advanced clustering techniques such as the modified
fuzzy k-means (MFCM) clustering technique • Packages in R specialized for medical image analysis
http://cran.r-project.org/web/views/MedicalImaging.html
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 19
3D Reconstruction
• 3D reconstruction from 2D cross sectional MRI images
MRI image removed due to copyright restrictions.
• Important in the medical field for diagnosis, surgical planning and biological research
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 20
Comparison of Methods
Used For Pros Cons
Lin
ear
Reg
ress
ion
Log
isti
cR
egre
ssio
n
Predicting a continuous outcome (salary, price, number of votes, etc.)
Predicting a categorical outcome (Yes/No, Sell/ Buy, Accept/Reject, etc.)
•
•
Simple, well recognized Works on small and large datasets
• Assumes a linear relationship Y = a log(X)+b
• Computes probabilities that can be used to
• Assumes a linear relationship
assess confidence of the prediction
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 21
Comparison of Methods
Used For Pros Cons
CA
RT
R
ando
m
For
ests
Predicting a categorical outcome (quality rating 1--5, Buy/Sell/Hold) or a continuous outcome (salary, price, etc.)
• Can handle datasets without a linear relationship
• Easy to explain and interpret
Same as CART • Can improve accuracy over CART
• May not work well with small datasets
• Many parameters to adjust
• Not as easy to explain as CART
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 22
Comparison of Methods
• Finding similargroups
• Clustering intosmaller groups andapplying predictivemethods on groups
Same as Hierarchical Clustering
Used For Pros Cons
Hie
rarc
hica
lC
lust
erin
g k-
mea
nsC
lust
erin
g
• No need to select • Hard to use with
•
number ofclusters a prioriVisualize with a
large datasets
dendrogram
• Works with anydataset size
• Need to selectnumber ofclusters beforealgorithm
15.071x – Seeing the Big Picture: Segmenting Images to Create Data 23
MIT OpenCourseWare https://ocw.mit.edu/
15.071 Analytics Edge Spring 2017
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.