Rajia cluster analysis

REHANA RAJ DFK1307 DEPT OF FISH PROCESSING TECHNOLOGY COLLEGE OF FISHERIES MANGALORE CLUSTER ANALYSIS

Transcript of Rajia cluster analysis

Page 1: Rajia cluster analysis

REHANA RAJ

DFK1307

DEPT OF FISH PROCESSING TECHNOLOGY

COLLEGE OF FISHERIES

MANGALORE

CLUSTER ANALYSIS

Page 2: Rajia cluster analysis

Cluster analysis is a multivariate statistical technique

in which a large data set is segregated into several

groups based on homogeneity or similarity measures.

Cluster analysis makes a sensible and informative

classification of an initially unclassified set of data

with the desired accuracy, using the variable values

observed on each individual.

It saves a lot of resources in terms of time, money, etc.

Page 3: Rajia cluster analysis

Figure: data before clustering and after clustering

Page 4: Rajia cluster analysis

To assign observations to groups (‘clusters’)

To divide the observations into homogeneous and

distinct groups

To reduce the complexity of data

Page 5: Rajia cluster analysis

Generates several groups from the data set, each containing similar items

Homogeneous within each group and, as much as

possible, heterogeneous with respect to other groups

Normally, data consists of objects or persons

Segregation is done based on more than two

variables.
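
The idea of "homogeneous within, heterogeneous between" can be checked numerically. The sketch below is only an illustration: the two groups, their values, and the choice of Euclidean distance as the similarity measure are assumptions, not data from these slides. Each observation is measured on three variables.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical observations, each measured on three variables.
group_a = [(1.0, 2.0, 1.5), (1.2, 1.8, 1.4), (0.9, 2.1, 1.6)]
group_b = [(8.0, 9.0, 7.5), (8.3, 8.7, 7.9), (7.8, 9.2, 7.4)]

def mean_within(group):
    """Average distance between all pairs inside one group."""
    pairs = list(combinations(group, 2))
    return sum(euclidean(p, q) for p, q in pairs) / len(pairs)

def mean_between(g1, g2):
    """Average distance between members of two different groups."""
    return sum(euclidean(p, q) for p in g1 for q in g2) / (len(g1) * len(g2))

within_a = mean_within(group_a)
within_b = mean_within(group_b)
between = mean_between(group_a, group_b)

print("within A:", round(within_a, 3))
print("within B:", round(within_b, 3))
print("between A and B:", round(between, 3))
```

A good grouping shows small within-group distances and a much larger between-group distance.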

Page 6: Rajia cluster analysis

Hierarchical Clustering

Centroid-based clustering

Distribution-based clustering

Density-based clustering

Page 7: Rajia cluster analysis

Hierarchical clustering is a method of cluster analysis which

seeks to build a hierarchy of clusters.

Two types:

Agglomerative (bottom-up):

◦ Start with each document being a single cluster.

◦ Eventually all documents belong to the same cluster.

Divisive (top-down):

◦ Start with all documents belonging to the same cluster.

◦ Eventually each document forms a cluster on its own.

The number of clusters k need not be fixed in advance.
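
A minimal sketch of the agglomerative (bottom-up) procedure, written in plain Python. The sample points, the function names, and the use of single linkage (distance between the closest members of two clusters) are assumptions made for illustration; other linkage rules are equally valid.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(c1, c2, points):
    """Distance between two clusters = distance of their closest members."""
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

def agglomerative(points, target_clusters=1):
    """Start with one cluster per point and merge the closest pair each step."""
    clusters = [[i] for i in range(len(points))]
    history = []  # (members of first cluster, members of second, distance)
    while len(clusters) > target_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, history

# Hypothetical 2-D points (indices 0..4).
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.5, 5.0), (10.0, 1.0)]

# Stop early at three clusters ...
three, _ = agglomerative(points, target_clusters=3)
# ... or merge all the way down to one cluster and keep the merge history.
full, history = agglomerative(points)

print(three)
for left, right, dist in history:
    print(left, "+", right, "at distance", round(dist, 3))
```

The merge history is the information a dendrogram draws, and cutting it at different levels gives different numbers of clusters.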

Page 8: Rajia cluster analysis

Construction of a tree-based hierarchical diagram,

usually called a dendrogram. E.g., in the case of taxonomic

classification:

animal

  vertebrate: fish, reptile, amphib., mammal

  invertebrate: worm, insect, crustacean

Page 9: Rajia cluster analysis

In this clustering, clusters are

represented by a central

vector, which may not

necessarily be a member of

the data set.

Aims to partition n

observations into k clusters.

Each observation belongs to

the cluster with the nearest

mean.

Here, the number of clusters is

fixed to k (k-means clustering).
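
The assign-then-update loop behind k-means can be sketched as follows. This is a minimal illustration, not a production implementation: the sample points, the function names, and the choice of starting centroids are assumptions.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def k_means(points, initial_centroids, max_iter=100):
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins the nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = mean_point(members)
    return labels, centroids

# Hypothetical 2-D observations; k = 2 is fixed by the two starting centroids.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])

print(labels)
print(centroids)
```

Note that the final centroids are means of their members, so they are generally not observations from the data set.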

Page 10: Rajia cluster analysis

Clusters can be defined as objects most likely belonging to the same

distribution.

It can capture correlation and dependence between attributes.
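
One common way to do distribution-based clustering is to fit a mixture of Gaussians with the expectation-maximization (EM) algorithm. The sketch below is a simplified assumption-laden illustration: it uses one variable and two components, the data and starting means are made up, and being one-dimensional it cannot show correlation between attributes.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, start_means, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    means = list(start_means)
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: probability that each point belongs to each distribution.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each distribution.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # floor keeps the variance positive
    # Each point is assigned to the distribution it most likely came from.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return labels, means, variances, weights

# Hypothetical measurements drawn around two different levels.
data = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.8, 5.1]
labels, means, variances, weights = em_two_gaussians(data, start_means=[0.0, 6.0])

print(labels)
print([round(m, 3) for m in means])
```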

Page 11: Rajia cluster analysis

Clusters are based on density: they are areas of higher density than the rest of the data set.

Objects in the sparse areas that are required to separate

clusters are usually considered to be noise and border

points.

The most popular density based clustering method is

DBSCAN (density-based spatial clustering of applications

with noise).

OPTICS (Ordering Points To Identify the Clustering

Structure) is a generalization of DBSCAN that handles

different densities much better.

Page 12: Rajia cluster analysis

Density-based clustering

with DBSCAN.

DBSCAN assumes clusters of

similar density, and may have

problems separating nearby

clusters

OPTICS is a DBSCAN variant

that handles different densities

much better
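
A compact sketch of the DBSCAN idea in plain Python. The sample points and the values chosen for the two usual DBSCAN parameters (the neighbourhood radius, here `eps`, and the minimum number of neighbours, here `min_pts`) are assumptions for illustration only.

```python
import math

NOISE = -1

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j in range(len(points)) if euclidean(points[i], points[j]) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None = not visited yet
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # sparse area; may later become a border point
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels

# Two hypothetical dense groups and one isolated point.
points = [
    (1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (1.1, 0.8),
    (5.0, 5.0), (5.2, 5.1), (4.9, 5.3), (5.1, 4.8),
    (3.0, 9.0),
]
labels = dbscan(points, eps=0.6, min_pts=3)
print(labels)
```

The isolated point lies in a sparse area and is labelled as noise (-1) instead of being forced into a cluster.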

Page 13: Rajia cluster analysis

1. Forming the clusters from the given data set – resulting

in a new variable that identifies cluster members among

the cases (one phase cluster)

2. Description of clusters by re-crossing with the data

(Two phase cluster)
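
The two steps above can be sketched as follows. The cases, variable names, and cluster labels are hypothetical; the labels stand in for the output of whichever clustering method was used in step 1.

```python
from collections import defaultdict

# Hypothetical cases with two measured variables (values are made up).
cases = [
    {"id": 1, "x1": 1.0, "x2": 10.0},
    {"id": 2, "x1": 2.0, "x2": 20.0},
    {"id": 3, "x1": 3.0, "x2": 30.0},
    {"id": 4, "x1": 7.0, "x2": 1.0},
    {"id": 5, "x1": 8.0, "x2": 2.0},
    {"id": 6, "x1": 9.0, "x2": 3.0},
]

# Step 1 (assumed already done): clustering yields one label per case,
# stored as a new variable that identifies cluster membership.
labels = [0, 0, 0, 1, 1, 1]
for case, label in zip(cases, labels):
    case["cluster"] = label

def describe(cases, variables):
    """Step 2: cross the cluster variable with the data to describe each cluster."""
    groups = defaultdict(list)
    for case in cases:
        groups[case["cluster"]].append(case)
    summary = {}
    for cluster, members in sorted(groups.items()):
        summary[cluster] = {"size": len(members)}
        for var in variables:
            summary[cluster][var] = sum(m[var] for m in members) / len(members)
    return summary

summary = describe(cases, ["x1", "x2"])
for cluster, stats in summary.items():
    print(cluster, stats)
```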

Page 14: Rajia cluster analysis

One phase cluster

Forming of clusters from the chosen data set

VALUE ADDED PRODUCTS: FISH CUTLET, FISH FINGER, FISH BURGER

Page 15: Rajia cluster analysis

FISH CUTLET

Two phase cluster: Seer fish, Mackerel

Third phase cluster: Baked, Fried

Page 16: Rajia cluster analysis

Advantages:

Cuts down the cost of preparing a sampling frame and other administrative factors.

No special scales of measurement are necessary.

Visual graphics provide a clear understanding of the clusters.

Disadvantages:

The choice of cluster-forming variables is often not based on

theory but made at random.

In some cases, the determination of clusters is difficult.

Page 17: Rajia cluster analysis

Marketing: Helps marketers discover distinct groups in their

customer bases, and then use this knowledge to develop targeted

marketing programs

Land use: Identification of areas of similar land use in an earth

observation database

Insurance: Identifying groups of motor insurance policy holders

with a high average claim cost

City-planning: Identifying groups of houses according to their

house type, value, and geographical location

Earthquake studies: Observed earthquake epicenters should be

clustered along continental faults

Page 18: Rajia cluster analysis

Thank you for your kind attention!