Discrete data mapping

Discrete Data Mapping : Problem of HR-Analytics

Debdulal Dutta Roy, Ph.D. (Psy.)

Psychology Research Unit

INDIAN STATISTICAL INSTITUTE, KOLKATA

Workshop: QIP-STC (AICTE) on HR Analytics - Hands-on Training

VGSOM, IIT Kharagpur

11.5.2015

HR analytics and Discrete data

• HR analytics broadly covers two approaches: association and predictive. Discrete data mapping follows the former. It is a multivariate statistical model for exploring associations among data points.

• Associations among discrete data form neighbourhoods. The map shows the distances among neighbourhoods, e.g., neighbourhoods of human resource activities (recruitment, training, placement, promotion, incentives etc.) and those of employee performance (attrition, engagement etc.).

• The model is useful for big data (data from multiple companies). Multi-dimensional data are plotted on a two-dimensional plot.

• This technique helps organizations identify relationships and trends and predict future behaviours or events.

Truth is that you can measure

• Truth = Response – Error

• Any response is affected by fixed or random errors.

• Errors can be controlled through sampling, the environment, instruments, and statistics.

• Any response can be measured as discrete or continuous data.

• Discrete data cannot be divided into fractions, but continuous data can.

• Discrete data can be summarized by frequency or percentage.

• Both types of data can be converted into each other by transformation.

• Transformation loses important properties of the original data.

D. Dutta Roy, ISI., Kolkata

Discrete VS Continuous

• Discrete data can be numeric -- like numbers of apples -- but it can also be categorical -- like red or blue, or male or female, or good or bad. Continuous data are not restricted to defined separate values, but can occupy any value over a continuous range.

Lecture notes: Discrete Data Mapping by D. Dutta Roy, ISI., Kolkata

HR Analytics

• HR analytics data include head counts (numbers of people) for recruitment, training, placement, promotion, incentives etc., and for performance indicators such as attrition, engagement etc.

• Analytics can prepare one-way, two-way or multi-way tables.

• A stem-and-leaf plot can be used to map discrete data.


Stem-Leaf Plot of One-way table of Discrete data
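A stem-and-leaf plot of a one-way table can be sketched in a few lines. The head counts below are hypothetical (they are not from the lecture), and the helper name `stem_leaf` is an assumption introduced for illustration. Stems are the tens digits and leaves are the units digits.

```python
from collections import defaultdict

def stem_leaf(values):
    """Return stem-and-leaf rows for non-negative integers (stem = tens, leaf = units)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lo, hi = min(stems), max(stems)
    # Empty stems are kept so that gaps in the data remain visible.
    return [f"{s:>2} | {''.join(str(d) for d in stems.get(s, []))}"
            for s in range(lo, hi + 1)]

# Hypothetical monthly recruitment head counts (illustrative data only).
head_counts = [12, 15, 21, 23, 23, 27, 30, 34, 41, 45, 48, 52]

rows = stem_leaf(head_counts)
for row in rows:
    print(row)
```

For these values the row for stem 2 reads ` 2 | 1337`, i.e. the head counts 21, 23, 23 and 27.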


Two-Way table or Crosstabulation

• Cross tabulation is a combination of two (or more) frequency tables arranged such that each cell in the resulting table represents a unique combination of specific values of crosstabulated variables.

• Thus, crosstabulation allows us to examine frequencies of observations that belong to specific categories on more than one variable.

• By examining these frequencies, we can identify relations between crosstabulated variables. Only categorical (nominal) variables or variables with a relatively small number of different meaningful values should be crosstabulated.

• Note that in the cases where we do want to include a continuous variable in a crosstabulation (e.g., income), we can first recode it into a particular number of distinct ranges (e.g., low, medium, high).

• Cross tabulation can be computed with a Pivot Table in MS-Excel.

http://www.statsoft.com/textbook/statistics-glossary/n.aspx?button=n#Nominal%20Scale
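Outside a spreadsheet, the same two-way table can be built by counting pairs of categories. The sketch below first recodes a continuous variable (income) into three ranges, as described above, and then cross-tabulates it against attrition. The records, the cut-offs 25 and 50, and the name `recode_income` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical records: (monthly income in thousands, attrition status).
records = [
    (18, "left"), (22, "left"), (24, "stayed"), (30, "stayed"), (35, "left"),
    (42, "stayed"), (48, "stayed"), (55, "stayed"), (61, "stayed"), (75, "left"),
]

def recode_income(income):
    """Recode a continuous income into ordered ranges (cut-offs are assumed)."""
    if income < 25:
        return "low"
    if income < 50:
        return "medium"
    return "high"

# Each cell counts one unique combination of income range and status.
cells = Counter((recode_income(income), status) for income, status in records)

print(f"{'income':<8}{'left':>6}{'stayed':>8}{'total':>7}")
for level in ("low", "medium", "high"):
    left, stayed = cells[(level, "left")], cells[(level, "stayed")]
    print(f"{level:<8}{left:>6}{stayed:>8}{left + stayed:>7}")
```

Note that the recoding discards the exact incomes: 18 and 22 both become "low", which is the loss of information that transformation brings.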

Histogram of Two-way table

Test of Significance

• The Pearson Chi-square is the most common test for significance of the relationship between categorical variables.

• Coefficient Phi: It is a measure of correlation between two categorical variables in a 2 x 2 table. Its value can range from 0 (no relation between factors; Chi-square=0.0) to 1 (perfect relation between the two factors in the table).
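As a minimal sketch of both statistics for a 2 x 2 table: the function below computes the Pearson Chi-square from observed and expected frequencies, Phi as the square root of Chi-square divided by N, and a p-value that is valid only for 1 degree of freedom. The counts (training by attrition) are hypothetical, the function name is an assumption, and no continuity correction is applied.

```python
import math

def chi_square_2x2(table):
    """Pearson Chi-square, Phi and p-value (df = 1) for a 2 x 2 frequency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    phi = math.sqrt(chi2 / n)
    # With 1 degree of freedom, Chi-square is a squared standard normal,
    # so the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, phi, p_value

# Hypothetical counts: rows = trained / not trained, columns = stayed / left.
observed = [[30, 10], [20, 40]]
chi2, phi, p_value = chi_square_2x2(observed)
print(f"Chi-square = {chi2:.3f}, Phi = {phi:.3f}, p = {p_value:.5f}")
```

Here every expected frequency is 20 or 30, so Chi-square is 50/3 (about 16.67) and Phi is about 0.41.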

Coefficient of Contingency

• The coefficient of contingency is a Chi-square based measure of the relation between two categorical variables (proposed by Pearson, the originator of the Chi-square test). Its advantage over the ordinary Chi-square is that it is more easily interpreted, since its range is always limited to 0 through 1 (where 0 means complete independence). Its attainable maximum depends on the size of the table and stays below 1.
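The coefficient can be computed as C = sqrt(Chi-square / (Chi-square + N)). The sketch below works for any r x c table. The tables are hypothetical and the function names are assumptions for illustration. The perfectly associated 2 x 2 table shows that C reaches only sqrt(1/2), not 1.

```python
import math

def pearson_chi_square(table):
    """Return (Chi-square, N) for an r x c table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def contingency_coefficient(table):
    """Pearson's coefficient of contingency: sqrt(chi2 / (chi2 + N))."""
    chi2, n = pearson_chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))

# Hypothetical 3 x 3 table: promotion level (rows) by engagement level (columns).
promotion_by_engagement = [[30, 15, 5], [15, 20, 15], [5, 15, 30]]
c_sample = contingency_coefficient(promotion_by_engagement)

c_independent = contingency_coefficient([[10, 20], [20, 40]])  # no association
c_perfect = contingency_coefficient([[50, 0], [0, 50]])        # perfect association

print(f"sample C = {c_sample:.3f}")
print(f"independent C = {c_independent:.3f}, perfect 2 x 2 C = {c_perfect:.3f}")
```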

Correspondence Analysis

• The Crosstabs procedure offers several measures of association and tests of association but cannot graphically represent any relationships between the variables.

• The goal of correspondence analysis is to describe the relationships between two nominal variables in a correspondence table in a low-dimensional space.

Frequency Table (N=902 respondents)

Reasons for work preference     0     1     2     3     4     5     6   Total
Achievement                     6    31   115   236   265   201    48     902
Application                     1    20    50   126   274   296   135     902
Knowledge                       3    22    68   156   239   304   110     902
Aesthetic                      29   146   249   270   155    43    10     902
Affiliation                    29   219   320   202   109    23     0     902
Harm avoidance                 85   417   239   100    45    13     3     902
Recognition                    10   108   258   299   141    72    14     902

0: Least important; 1: Less important; 2: Important; 4: More important; 5: Most important

Frequency distribution provides information about data grouping

Neighbourhood

• The frequency table has 7 column categories (scores 0 to 6) and 7 row categories (reasons for work preference). Neighbourhoods can be formed by clustering the rows, the columns, and the row-column correspondences.

• So, partitioning the row and column variables is important.
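One way to look for row neighbourhoods is to compare row profiles (each row divided by its total) using the chi-square distance, which weights each column by the inverse of its share of the grand total. This is a sketch under that assumption. The lecture does not state which clustering rule it uses, and the names `chi_distance` and `nearest` are introduced here for illustration. The frequencies are those of the frequency table (N=902).

```python
import math

reasons = ["Achievement", "Application", "Knowledge", "Aesthetic",
           "Affiliation", "Harm avoidance", "Recognition"]
freq = [
    [6, 31, 115, 236, 265, 201, 48],
    [1, 20, 50, 126, 274, 296, 135],
    [3, 22, 68, 156, 239, 304, 110],
    [29, 146, 249, 270, 155, 43, 10],
    [29, 219, 320, 202, 109, 23, 0],
    [85, 417, 239, 100, 45, 13, 3],
    [10, 108, 258, 299, 141, 72, 14],
]

grand_total = sum(sum(row) for row in freq)
col_mass = [sum(col) / grand_total for col in zip(*freq)]   # column shares
profiles = [[f / sum(row) for f in row] for row in freq]    # row profiles

def chi_distance(p, q):
    """Chi-square distance between two row profiles."""
    return math.sqrt(sum((a - b) ** 2 / m for a, b, m in zip(p, q, col_mass)))

# For every reason, find the other reason with the closest profile.
nearest = {}
for i, name in enumerate(reasons):
    others = [(chi_distance(profiles[i], profiles[k]), reasons[k])
              for k in range(len(reasons)) if k != i]
    nearest[name] = min(others)[1]

for name in reasons:
    print(f"{name:15s} -> {nearest[name]}")
```

With these data, Application and Knowledge are each other's nearest neighbour, while Harm avoidance lies far from both.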

Correspondence of row and column variables

Scoring categories (f = frequency, % = percentage of the column total)

                      0            1            2            3            4            5            6       Total
Reason              f      %     f      %     f      %     f      %     f      %     f      %     f      %
Achievement         6   3.68    31   3.22   115   8.85   236  16.99   265  21.58   201  21.11    48  15.00    902
Application         1   0.61    20   2.08    50   3.85   126   9.07   274  22.31   296  31.09   135  42.19    902
Knowledge           3   1.84    22   2.28    68   5.23   156  11.23   239  19.46   304  31.93   110  34.38    902
Aesthetic          29  17.79   146  15.16   249  19.17   270  19.44   155  12.62    43   4.52    10   3.13    902
Affiliation        29  17.79   219  22.74   320  24.63   202  14.54   109   8.88    23   2.42     0   0.00    902
Harm avoidance     85  52.15   417  43.30   239  18.40   100   7.20    45   3.66    13   1.37     3   0.94    902
Recognition        10   6.13   108  11.21   258  19.86   299  21.53   141  11.48    72   7.56    14   4.38    902
Total             163 100.00   963 100.00  1299 100.00  1389 100.00  1228 100.00   952 100.00   320 100.00   6314

Neighbourhood Data Mapping (N=902)
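A two-dimensional map of this kind can be obtained by simple correspondence analysis of the frequency table. The sketch below is an assumption about the general technique, not the software or settings used in the lecture. It forms standardized residuals, extracts the two leading axes, and prints principal coordinates for the rows (reasons) and columns (scores). To stay within the standard library it uses power iteration with deflation in place of a singular value decomposition. In practice a linear-algebra library would normally be used.

```python
import math

reasons = ["Achievement", "Application", "Knowledge", "Aesthetic",
           "Affiliation", "Harm avoidance", "Recognition"]
freq = [
    [6, 31, 115, 236, 265, 201, 48],
    [1, 20, 50, 126, 274, 296, 135],
    [3, 22, 68, 156, 239, 304, 110],
    [29, 146, 249, 270, 155, 43, 10],
    [29, 219, 320, 202, 109, 23, 0],
    [85, 417, 239, 100, 45, 13, 3],
    [10, 108, 258, 299, 141, 72, 14],
]

n = sum(sum(row) for row in freq)
r = [sum(row) / n for row in freq]          # row masses
c = [sum(col) / n for col in zip(*freq)]    # column masses
n_rows, n_cols = len(r), len(c)

# Standardized residuals: (p_ij - r_i * c_j) / sqrt(r_i * c_j)
S = [[(freq[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j])
      for j in range(n_cols)] for i in range(n_rows)]
total_inertia = sum(s * s for row in S for s in row)   # equals Chi-square / n

def top_eigenpairs(matrix, k, iters=1000):
    """Leading k eigenpairs of a symmetric matrix by power iteration with deflation."""
    size = len(matrix)
    a = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [1.0 / (idx + 1) for idx in range(size)]    # arbitrary start vector
        for _ in range(iters):
            w = [sum(a[p][q] * v[q] for q in range(size)) for p in range(size)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[p] * sum(a[p][q] * v[q] for q in range(size)) for p in range(size))
        pairs.append((lam, v))
        for p in range(size):                           # remove the axis just found
            for q in range(size):
                a[p][q] -= lam * v[p] * v[q]
    return pairs

# Eigenvectors of S^T S are the column axes; eigenvalues are principal inertias.
sts = [[sum(S[i][p] * S[i][q] for i in range(n_rows)) for q in range(n_cols)]
       for p in range(n_cols)]
pairs = top_eigenpairs(sts, 2)

row_coords = [[sum(S[i][j] * v[j] for j in range(n_cols)) / math.sqrt(r[i])
               for _, v in pairs] for i in range(n_rows)]
col_coords = [[math.sqrt(max(lam, 0.0)) * v[j] / math.sqrt(c[j])
               for lam, v in pairs] for j in range(n_cols)]

for name, (x, y) in zip(reasons, row_coords):
    print(f"{name:15s} {x:7.3f} {y:7.3f}")
for j, (x, y) in enumerate(col_coords):
    print(f"score {j}         {x:7.3f} {y:7.3f}")
print(f"share of inertia on 2 axes: {(pairs[0][0] + pairs[1][0]) / total_inertia:.3f}")
```

Plotting the row and column coordinates on the same two axes gives the map: reasons that lie close together form a neighbourhood, and the sign of each axis is arbitrary.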


Where Chi-Square fails, this model works (Job Analysis Data, N=200)


Thank You
