Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big...

25
Seeing the Big Picture Segmenting Images to Create Data 15.071x – The Analytics Edge

Transcript of Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big...

Page 1: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

Seeing the Big Picture Segmenting Images to Create Data

15.071x – The Analytics Edge

Page 2: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

Image Segmentation

• Divide up digital images to salient regions/clusters corresponding to individual surfaces, objects, or natural parts of objects

• Clusters should be uniform and homogenous with respect to certain characteristics (color, intensity, texture)

• Goal: Useful and analyzable image representation

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 1

Page 3: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

Wide Applications

• Medical Imaging • Locate tissue classes, organs, pathologies and tumors • Measure tissue/tumor volume

• Object Detection • Detect facial features in photos • Detect pedestrians in footages of surveillance videos

• Recognition tasks • Fingerprint/Iris recognition

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 2

Page 4: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

Various Methods

• Clustering methods • Partition image to clusters based on differences in pixel

colors, intensity or texture

• Edge detection • Based on the detection of discontinuity, such as an abrupt

change in the gray level in gray-scale images

• Region-growing methods • Divides image into regions, then sequentially merges

sufficiently similar regions

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 3

Page 5: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

In this Recitation…

• Review hierarchical and k-means clustering in R

• Restrict ourselves to gray-scale images • Simple example of a flower image (flower.csv) • Medical imaging application with examples of transverse

MRI images of the brain (healthy.csv and tumor.csv)

• Compare the use, pros and cons of all analytics methods we have seen so far

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 4

Page 6: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

Grayscale Images

• Image is represented as a matrix of pixel intensity values ranging from 0 (black) to 1 (white)

• For 8 bits/pixel (bpp), 256 color levels 7 pixels

0 .2 .5 00 0.2

.2 0 0 .5.5 .20

0 .5 .7 00 0.5

.5 0 1 .7.7 .50

0 .5 .7 00 0.5

.2 0 0 .5.5 .20

0 .2 .5 00 0.2

7 pixels

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 5

Page 7: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

0

.2

0 .7

.5 1 .5

0 .7 0

.2 .2

0 0

• Cluster pixels according to their intensity values

0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1

Grayscale Image Segmentation

.2 .5 00 0.2

0 0 .5.5 .20

.5 00 0.5

0 .7.7 0

.5 00 .5

0 0 .5.5 0

.2 .5 00 .2

0

.2

0

.5

0

.2

0 0 Intensity Vector of size n = 7x7

7 pixels Pairwise distances n(n-1)/2

.7

1

.7

.5

0

.2

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 6

Page 8: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

Dendrogram Example

ABCDEF

A B C F

BCDEF

BC

DEF

DE

D EData Observations

Cluster

Distance between clusters BCDEF and BC

Level 5

Level 4

Level 3

Level 2

Level 1

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 7

Page 9: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

Dendrogram Example

ABCDEF

A B C F

BCDEF

BC

DEF

DE

D E Level 1

Level 2

Level 3

Level 4

Level 5

A, BC, DE, F 4 clusters

A, BC, DEF 3 clusters

A, BCDEF 2 clusters

CO

AR

SER

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 8

Page 10: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

Dendrogram Example

ABCDEF

A B C F

BCDEF

BC

DEF

DE

D E Level 1

Level 2

Level 3

Level 4

Level 5

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9

Page 11: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

Flower Dendrogram

4 clusters

2 clusters

3 clusters

15.071x – Predictive Diagnosis: Discovering Patterns for Disease Detection 10

Page 12: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

k-Means Clustering

• The k-means clustering aims at partitioning the data into k clusters in which each data point belongs to the cluster whose mean is the nearest

k-Means Clustering Algorithm

1. Specify desired number of clusters k

2. Randomly assign each data point to a cluster

3. Compute cluster centroids

4. Re-assign each point to the closest cluster centroid

5. Re-compute cluster centroids

6. Repeat 4 and 5 until no improvement is made

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 11

Page 13: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

k-Means Clustering

k-Means Clustering Algorithm

1. Specify desired number of clusters k

k = 2

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 12

Page 14: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

k-Means Clustering

k-Means Clustering Algorithm

1. Specify desired number of clusters k

2. Randomly assign each data point to a cluster

k = 2

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 13

Page 15: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

k-Means Clustering

k-Means Clustering Algorithm

1. Specify desired number of clusters k

2. Randomly assign each data point to a cluster

k = 2

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 14

Page 16: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

k-Means Clustering

k-Means Clustering Algorithm

1. Specify desired number of clusters k

2. Randomly assign each data point to a cluster

k = 2

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 15

Page 17: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

k-Means Clustering Algorithm

1. Specify desired number ofclusters k

2. Randomly assign each datapoint to a cluster

3. Compute cluster centroids

k-Means Clustering

k-Means Clustering Algorithm

1. Specify desired number of clusters k

2. Randomly assign each data point to a cluster

3. Compute cluster centroids

4. Re-assign each point to the closest cluster centroid

k = 2

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 16

Page 18: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

k-Means Clustering

k-Means Clustering Algorithm

1. Specify desired number of clusters k

2. Randomly assign each data point to a cluster

3. Compute cluster centroids

4. Re-assign each point to the closest cluster centroid

k = 2

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 17

Page 19: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

k-Means Clustering

k-Means Clustering Algorithm

1. Specify desired number of clusters k

2. Randomly assign each data point to a cluster

3. Compute cluster centroids

4. Re-assign each point to the closest cluster centroid

5. Re-compute cluster centroids

6. Repeat 4 and 5 until no improvement is made

k = 2

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 18

Page 20: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

First Taste of a Fascinating Field

• MRI image segmentation is subject of ongoing research

• k-means is a good starting point, but not enough • Advanced clustering techniques such as the modified

fuzzy k-means (MFCM) clustering technique • Packages in R specialized for medical image analysis

http://cran.r-project.org/web/views/MedicalImaging.html

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 19

Page 21: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

3D Reconstruction

• 3D reconstruction from 2D cross sectional MRI images

MRI image removed due to copyright restrictions.

• Important in the medical field for diagnosis, surgical planning and biological research

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 20

Page 22: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

Comparison of Methods

Used For Pros Cons

Lin

ear

Reg

ress

ion

Log

isti

cR

egre

ssio

n

Predicting a continuous outcome (salary, price, number of votes, etc.)

Predicting a categorical outcome (Yes/No, Sell/ Buy, Accept/Reject, etc.)

Simple, well recognized Works on small and large datasets

• Assumes a linear relationship Y = a log(X)+b

• Computes probabilities that can be used to

• Assumes a linear relationship

assess confidence of the prediction

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 21

Page 23: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

Comparison of Methods

Used For Pros Cons

CA

RT

R

ando

m

For

ests

Predicting a categorical outcome (quality rating 1--5, Buy/Sell/Hold) or a continuous outcome (salary, price, etc.)

• Can handle datasets without a linear relationship

• Easy to explain and interpret

Same as CART • Can improve accuracy over CART

• May not work well with small datasets

• Many parameters to adjust

• Not as easy to explain as CART

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 22

Page 24: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

Comparison of Methods

• Finding similargroups

• Clustering intosmaller groups andapplying predictivemethods on groups

Same as Hierarchical Clustering

Used For Pros Cons

Hie

rarc

hica

lC

lust

erin

g k-

mea

nsC

lust

erin

g

• No need to select • Hard to use with

number ofclusters a prioriVisualize with a

large datasets

dendrogram

• Works with anydataset size

• Need to selectnumber ofclusters beforealgorithm

15.071x – Seeing the Big Picture: Segmenting Images to Create Data 23

Page 25: Seeing the Big Picture - MIT OpenCourseWare · 2020-03-09 · Level 5 . 15.071x – Seeing the Big Picture: Segmenting Images to Create Data 9. Flower Dendrogram . 4 clusters 2 clusters

MIT OpenCourseWare https://ocw.mit.edu/

15.071 Analytics Edge Spring 2017

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.