Discrete data mapping
Discrete Data Mapping: Problem of HR-Analytics
Debdulal Dutta Roy, Ph.D. (Psy.)
Psychology Research Unit, Indian Statistical Institute, Kolkata
Workshop: QIP-STC (AICTE) on HR Analytics, Hands-on Training
VGSOM, IIT Kharagpur
11.5.2015
HR analytics and Discrete data
• HR analytics broadly covers two approaches: association and prediction. Discrete data mapping follows the former. It is a multivariate statistical model for exploring associations among data points. Associated discrete data points form a neighbourhood. The map shows the distances among neighbourhoods, e.g., neighbourhoods of human resource activities (recruitment, training, placement, promotion, incentives etc.) and those of employee performance (attrition, engagement etc.). The model is useful for big data (data from multiple companies). In this model, multi-dimensional data are plotted on a two-dimensional plot. The technique helps organizations identify relationships and trends, which can in turn inform predictions of future behaviours or events.
Truth is that you can measure
• Truth = Response - Error
• Any response is affected by fixed or random errors.
• Errors can be controlled through sampling, control of the environment, instruments and statistics.
• Any response can be measured as discrete or continuous data.
• Discrete data cannot be divided into fractions, but continuous data can.
• Discrete data can be summarised by frequency or percentage.
• The two types of data can be converted into each other by transformation.
• Transformation loses important properties of the original data.
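A minimal Python sketch of the last points, assuming a hypothetical list of training hours (the values and the cut-points are invented for illustration and are not from the workshop data). It recodes a continuous variable into categories, summarises the result by frequency and percentage, and shows how many distinct values the transformation collapses.

```python
from collections import Counter

# Hypothetical continuous data: training hours per employee (illustrative only).
hours = [2.5, 3.0, 7.5, 8.0, 12.0, 15.5, 16.0, 21.0]

def recode(h):
    """Transform a continuous value into a discrete category (assumed cut-points)."""
    if h < 5:
        return "low"
    if h < 15:
        return "medium"
    return "high"

levels = [recode(h) for h in hours]

# Discrete data are summarised by frequency and percentage.
freq = Counter(levels)
pct = {k: 100 * v / len(levels) for k, v in freq.items()}

for level in ("low", "medium", "high"):
    print(f"{level:<7} f={freq[level]}  %={pct[level]:.1f}")

# The transformation loses information: many distinct values collapse to 3 labels.
print(len(set(hours)), "distinct values ->", len(set(levels)), "categories")
```

After recoding, 2.5 and 3.0 hours are indistinguishable: both are simply "low".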
D. Dutta Roy, ISI., Kolkata
Discrete vs. Continuous
• Discrete data can be numeric (such as the number of apples) or categorical (such as red or blue, male or female, good or bad). Continuous data are not restricted to separate defined values and can take any value within a continuous range.
Lecture notes: Discrete Data Mapping by D. Dutta Roy, ISI., Kolkata
HR Analytics
• HR analytics data include headcounts (numbers of people) for recruitment, training, placement, promotion, incentives etc., and for performance outcomes such as attrition and engagement.
• Analytics can produce one-way, two-way or multi-way tables.
• A stem-and-leaf plot can be used to map discrete data.
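As a rough illustration of a stem-and-leaf plot for headcount data, here is a small Python sketch. The function name `stem_leaf` and the recruitment figures are hypothetical, and the split into tens (stem) and units (leaf) is one common convention, assumed here.

```python
from collections import defaultdict

def stem_leaf(values):
    """Return stem-and-leaf rows for non-negative integers (stem = tens, leaf = units)."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    lo, hi = min(rows), max(rows)
    # Include empty stems so gaps in the distribution stay visible.
    return [(stem, rows.get(stem, [])) for stem in range(lo, hi + 1)]

# Hypothetical monthly recruitment headcounts (illustrative only).
recruits = [12, 15, 21, 23, 23, 28, 34, 41, 47, 52]

plot = stem_leaf(recruits)
for stem, leaves in plot:
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each printed row keeps the individual values readable while the row lengths show the shape of the distribution.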
Stem-Leaf Plot of One-way table of Discrete data
Two-Way table or Crosstabulation
• Crosstabulation is a combination of two (or more) frequency tables arranged so that each cell in the resulting table represents a unique combination of specific values of the crosstabulated variables.
• Thus, crosstabulation allows us to examine frequencies of observations that belong to specific categories on more than one variable.
• By examining these frequencies, we can identify relations between the crosstabulated variables. Only categorical (nominal) variables or variables with a relatively small number of different meaningful values should be crosstabulated.
• Where we do want to include a continuous variable in a crosstabulation (e.g., income), we can first recode it into a small number of distinct ranges (e.g., low, medium, high).
• Crosstabulation can be computed with a PivotTable in MS Excel.
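Outside Excel, the same two-way table can be built in a few lines of code. The sketch below uses only the Python standard library. The employee records, department names and income cut-points are hypothetical, chosen to show the recoding of a continuous variable before crosstabulation.

```python
from collections import Counter

# Hypothetical employee records: (department, monthly income); illustrative only.
records = [
    ("Sales", 18000), ("Sales", 42000), ("Sales", 25000),
    ("IT", 65000), ("IT", 48000), ("IT", 71000),
    ("HR", 30000), ("HR", 52000),
]

def income_band(income):
    """Recode continuous income into ranges (assumed cut-points)."""
    if income < 30000:
        return "low"
    if income < 60000:
        return "medium"
    return "high"

# Each cell counts one unique combination of the two categorical variables.
cells = Counter((dept, income_band(inc)) for dept, inc in records)

depts = ["Sales", "IT", "HR"]
bands = ["low", "medium", "high"]
print(f"{'':<6}" + "".join(f"{b:>8}" for b in bands) + f"{'Total':>8}")
for d in depts:
    row = [cells[(d, b)] for b in bands]
    print(f"{d:<6}" + "".join(f"{n:>8}" for n in row) + f"{sum(row):>8}")
```

A `Counter` keyed by (row category, column category) is enough here because a crosstabulation cell is just the count of one combination.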
Histogram of Two-way table
Test of Significance
• The Pearson Chi-square is the most common test of significance for the relationship between categorical variables.
• Coefficient Phi: a measure of correlation between two categorical variables in a 2 x 2 table, computed as the square root of Chi-square divided by N. Its value can range from 0 (no relation between factors; Chi-square=0.0) to 1 (perfect relation between the two factors in the table).
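A small worked example may help here. The sketch below computes the Pearson Chi-square (without continuity correction) and Phi for a 2 x 2 table in plain Python. The counts are hypothetical, and the p-value line relies on the fact that, for one degree of freedom, the Chi-square tail probability equals `erfc(sqrt(chi2 / 2))`.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2 x 2 frequency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row[i] * col[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

# Hypothetical counts: rows = trained / not trained, columns = stayed / left.
table = [[40, 10],
         [20, 30]]

chi2, n = chi_square_2x2(table)
phi = math.sqrt(chi2 / n)                  # phi for a 2 x 2 table
p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square tail probability, df = 1
print(f"chi2={chi2:.3f}  phi={phi:.3f}  p={p_value:.5f}")
```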
Coefficient of Contingency
• The coefficient of contingency is a Chi-square based measure of the relation between two categorical variables (proposed by Pearson, the originator of the Chi-square test). Its advantage over the ordinary Chi-square is that it is more easily interpreted, since its value always lies between 0 and 1 (where 0 means complete independence). Its attainable maximum depends on the size of the table and stays below 1.
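To make the range concrete, here is a short Python sketch of Pearson's coefficient, C = sqrt(Chi-square / (Chi-square + N)), applied to two hypothetical 2 x 2 tables: one with no association and one with perfect association. The function names and the tables are assumptions made for illustration.

```python
import math

def chi_square(table):
    """Pearson chi-square for an r x c frequency table."""
    n = sum(sum(r) for r in table)
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    chi2 = sum(
        (obs - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
        for i, r in enumerate(table)
        for j, obs in enumerate(r)
    )
    return chi2, n

def contingency_coefficient(table):
    """Pearson's C = sqrt(chi2 / (chi2 + N))."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))

# Hypothetical 2 x 2 tables (illustrative only).
independent = [[25, 25], [25, 25]]   # no association
perfect = [[50, 0], [0, 50]]         # perfect association

print(contingency_coefficient(independent))        # 0.0
print(round(contingency_coefficient(perfect), 4))  # 0.7071
```

Even with perfect association the 2 x 2 table yields about 0.71 rather than 1, which is why C values should only be compared across tables of the same size.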
Correspondence Analysis
• The Crosstabs procedure offers several measures of association and tests of association but cannot graphically represent any relationships between the variables.
• Correspondence analysis describes the relationships between two nominal variables in a correspondence table in a low-dimensional space.
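For readers who want the computation behind the map, the usual textbook formulation of simple correspondence analysis is sketched below. The notation is an assumption (it does not appear in the slides): N is the frequency table with grand total n, r and c are the row and column masses, and D_r, D_c are the diagonal matrices of those masses.

```latex
\begin{aligned}
P &= \tfrac{1}{n}\, N, \qquad r = P\mathbf{1}, \qquad c = P^{\top}\mathbf{1}, \\
S &= D_r^{-1/2}\,\bigl(P - r c^{\top}\bigr)\,D_c^{-1/2} = U \Sigma V^{\top}, \\
F &= D_r^{-1/2}\, U \Sigma, \qquad G = D_c^{-1/2}\, V \Sigma, \\
\sum_k \sigma_k^{2} &= \chi^{2} / n .
\end{aligned}
```

The first two columns of F and G give the row and column coordinates plotted on the two-dimensional map, and the last line states that the total inertia being decomposed is the Chi-square statistic divided by n.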
Frequency Table (N=902 respondents)
Reasons for work preference     0    1    2    3    4    5    6  Total
Achievement                     6   31  115  236  265  201   48    902
Application                     1   20   50  126  274  296  135    902
Knowledge                       3   22   68  156  239  304  110    902
Aesthetic                      29  146  249  270  155   43   10    902
Affiliation                    29  219  320  202  109   23    0    902
Harm avoidance                 85  417  239  100   45   13    3    902
Recognition                    10  108  258  299  141   72   14    902
0: Least important; 1: Less important; 2: Important; 4: More important; 5: Most important
Frequency distribution provides information about data grouping
Neighbourhood
• In the frequency table, there are 7 row variables (reasons for work preference) and 7 column categories (scores 0 to 6). Neighbourhoods can be formed by clustering the rows, the columns, and the row-column correspondence.
• So, partitioning of the row and column variables is important.
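One simple way to look for row neighbourhoods, offered here as a sketch rather than as the method used in the lecture, is to compare row profiles with the chi-square distance that correspondence analysis is built on. The Python below uses the frequencies from the table (N=902 per row). The helper names `chi2_distance` and `nearest` are hypothetical.

```python
import math

# Frequency table from the slides: rows = reasons, columns = scores 0..6.
table = {
    "Achievement":    [6, 31, 115, 236, 265, 201, 48],
    "Application":    [1, 20, 50, 126, 274, 296, 135],
    "Knowledge":      [3, 22, 68, 156, 239, 304, 110],
    "Aesthetic":      [29, 146, 249, 270, 155, 43, 10],
    "Affiliation":    [29, 219, 320, 202, 109, 23, 0],
    "Harm avoidance": [85, 417, 239, 100, 45, 13, 3],
    "Recognition":    [10, 108, 258, 299, 141, 72, 14],
}

grand = sum(sum(f) for f in table.values())
col_mass = [sum(col) / grand for col in zip(*table.values())]
profiles = {k: [x / sum(f) for x in f] for k, f in table.items()}

def chi2_distance(a, b):
    """Chi-square distance between two row profiles, weighted by column mass."""
    return math.sqrt(sum((x - y) ** 2 / m
                         for x, y, m in zip(profiles[a], profiles[b], col_mass)))

def nearest(name):
    """Closest other row, i.e. the row's nearest neighbour."""
    return min((k for k in table if k != name),
               key=lambda k: chi2_distance(name, k))

for name in table:
    print(f"{name:<15} -> {nearest(name)}")
```

With these data, Application and Knowledge come out as each other's nearest neighbours, while Harm avoidance sits closest to Affiliation, which suggests two separate neighbourhoods among the reasons for work preference.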
Correspondence of row and column variables
Scoring Categories (f = frequency; % = percentage of the column total)
                           0           1           2           3           4           5           6  Total
                    f      %    f      %    f      %    f      %    f      %    f      %    f      %
Achievement         6   3.68   31   3.22  115   8.85  236  16.99  265  21.58  201  21.11   48  15.00    902
Application         1   0.61   20   2.08   50   3.85  126   9.07  274  22.31  296  31.09  135  42.19    902
Knowledge           3   1.84   22   2.28   68   5.23  156  11.23  239  19.46  304  31.93  110  34.38    902
Aesthetic          29  17.79  146  15.16  249  19.17  270  19.44  155  12.62   43   4.52   10   3.13    902
Affiliation        29  17.79  219  22.74  320  24.63  202  14.54  109   8.88   23   2.42    0   0.00    902
Harm avoidance     85  52.15  417  43.30  239  18.40  100   7.20   45   3.66   13   1.37    3   0.94    902
Recognition        10   6.13  108  11.21  258  19.86  299  21.53  141  11.48   72   7.56   14   4.38    902
Total             163 100.00  963 100.00 1299 100.00 1389 100.00 1228 100.00  952 100.00  320 100.00   6314
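The percentages in this table are column percentages: each frequency divided by its column total. The following Python sketch recomputes them from the frequencies so they can be checked. The variable names are illustrative.

```python
# Frequencies from the table: rows = reasons, columns = scores 0..6.
freq = {
    "Achievement":    [6, 31, 115, 236, 265, 201, 48],
    "Application":    [1, 20, 50, 126, 274, 296, 135],
    "Knowledge":      [3, 22, 68, 156, 239, 304, 110],
    "Aesthetic":      [29, 146, 249, 270, 155, 43, 10],
    "Affiliation":    [29, 219, 320, 202, 109, 23, 0],
    "Harm avoidance": [85, 417, 239, 100, 45, 13, 3],
    "Recognition":    [10, 108, 258, 299, 141, 72, 14],
}

# Column totals: how many ratings were given at each score.
col_totals = [sum(col) for col in zip(*freq.values())]

# Column percentage = cell frequency / column total * 100.
col_pct = {
    name: [round(100 * f / t, 2) for f, t in zip(row, col_totals)]
    for name, row in freq.items()
}

print("Column totals:", col_totals)
for name, pcts in col_pct.items():
    print(f"{name:<15}" + "".join(f"{p:>8.2f}" for p in pcts))
```

Reading down a column shows which reasons dominate each score. For example, more than half of all "0" ratings fall on Harm avoidance.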
Neighbourhood Data Mapping (N=902)
Where Chi-square fails, this model works (Job Analysis Data, N=200)
Thank You