Improving the Web Design Mining Web Data at Cityjob


Page 1: Improving  the  Web Design  Mining Web Data at Cityjob

Improving the Web Design Mining Web Data at Cityjob.com

Hing-Po Lo, Linda Lu, Miriam Chan

Department of Management Sciences

[email protected]

City University of Hong Kong, Hong Kong

Page 2: Improving  the  Web Design  Mining Web Data at Cityjob

I. Introduction

Data Mining Customer Relationship Management

The Web

Page 3: Improving  the  Web Design  Mining Web Data at Cityjob

A. The Web

[Figure: Worldwide Internet Commerce Revenues: Business and Consumer Segments, 1996-2002. Vertical axis: US$B (0 to 600); horizontal axis: years 1996 to 2002; series: Consumer, Business-Business.]

• More than 200 million surfers per day

• Huge volume of data captured from the Web

• Only 2% of web data analyzed

Page 4: Improving  the  Web Design  Mining Web Data at Cityjob

B. Customer Relationship Management

• DOT COM companies

• require the use of CRM to establish a personalized relationship with their customers

• work in an “information-intensive” and “ultra-competitive” mode

Page 5: Improving  the  Web Design  Mining Web Data at Cityjob

C. Data Mining Tools

• Many software packages and web vendors can help to explore and mine web log files.

• Most study the “clickstream” at the “session level”. To conduct CRM, one has to analyze the web log file at the “customer level”.

• Tailor-made software using SAS macros and Enterprise Miner has been developed.

Page 6: Improving  the  Web Design  Mining Web Data at Cityjob

Cityjob.COM

• It offers information on almost all posts available from major companies in HK.

• It receives several thousand visitors per day on average.

Page 7: Improving  the  Web Design  Mining Web Data at Cityjob

II. The Data

Study Period: 11 December 2000 to 4 February 2001

Three types of data files:

• Web log files;

• Subscribers’ profiles;

• Jobs’ profiles.

Page 8: Improving  the  Web Design  Mining Web Data at Cityjob

1. Web log files

#Software: Microsoft Internet Information Server 4.0

#Version: 1.0

#Date: 2000-12-11 00:00:00

#Fields: date time c-ip cs-username s-sitename s-computername s-ip cs-method cs-uri-stem cs-uri-query sc-status sc-win32-status sc-bytes cs-bytes time-taken cs(Cookie)

2000-12-11 00:00:00 208.223.166.3 - W3SVC4 PROD5_WEB 202.130.170.225 GET /default.asp - 200 0 15838 645 1297 RMID=d0dfa603398e0850;+CityjobID=LASTUPD=20001130&LOGIN=sloo;+IND=000;+OPN=000;+CTY=091;+RDB=c80200000000000000020028311b1b0000000000000000;+ASPSESSIO
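The log layout above can be read mechanically because the #Fields directive names every column. The following is a minimal Python sketch, not the authors' SAS macro: the function names `parse_w3c_log` and `login_from_cookie` are hypothetical, the cookie in the sample is shortened, and treating the LOGIN value inside the CityjobID cookie as the subscriber key is an assumption about how sessions could be tied to customers.

```python
def parse_w3c_log(lines):
    """Parse W3C extended log lines into dicts keyed by the #Fields names."""
    fields = []
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The directive lists the column names for the entries that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives (#Software, #Version, #Date)
        records.append(dict(zip(fields, line.split())))
    return records


def login_from_cookie(cookie):
    """Return the LOGIN value nested in the CityjobID cookie, or None."""
    for part in cookie.split(";+"):
        name, _, value = part.partition("=")
        if name == "CityjobID":
            for pair in value.split("&"):
                key, _, val = pair.partition("=")
                if key == "LOGIN":
                    return val
    return None


# Sample taken from the slide, with the cookie shortened for readability.
sample = [
    "#Software: Microsoft Internet Information Server 4.0",
    "#Fields: date time c-ip cs-username s-sitename s-computername s-ip "
    "cs-method cs-uri-stem cs-uri-query sc-status sc-win32-status sc-bytes "
    "cs-bytes time-taken cs(Cookie)",
    "2000-12-11 00:00:00 208.223.166.3 - W3SVC4 PROD5_WEB 202.130.170.225 "
    "GET /default.asp - 200 0 15838 645 1297 "
    "RMID=d0dfa603398e0850;+CityjobID=LASTUPD=20001130&LOGIN=sloo;+IND=000",
]
records = parse_w3c_log(sample)
print(records[0]["cs-uri-stem"], login_from_cookie(records[0]["cs(Cookie)"]))
```

Splitting on whitespace works here only because the cookie field encodes its separators as ";+" and contains no spaces.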

Page 9: Improving  the  Web Design  Mining Web Data at Cityjob

2. Subscribers’ profiles

User ID  Age  Sex  Ed. level  P. income  H. income  Country  Marital Status  Em. Status  Occ.

cityjob94290 27 F SEC HK S FT CUS

cityjob94293 26 M DIP 2 HK S FT FIN

cityjob94338 28 F SEC HK S FT ACC

cityjob94345 34 M UC 8 9 HK M FT MGT

Cont’d

Ind  Reg. Date  Interest

HOT  20001030  MKT

BNK  20001030  BANK, FIN, INVEST, MKT

OMF  20001030  ENTER, GAME, HKNEWS, PROPOMF

DPT  20001030  CNEWS, COMPU, ECON, ENTER, HKNEWS,

Page 10: Improving  the  Web Design  Mining Web Data at Cityjob

3. Jobs’ profiles

Job ID  Title  Type  Work Exp.  Quali.  Industry  Level

cityjobB7200  ORG. MANAGER  IT  4  UC  BANK  MID

cityjobAVU10  EXECUTIVE OFFICER II  LEG  3  DIP  GOV  JUN

cityjobB7040  ASST. ACCOUNTANT  ACC  5  SEC  RET  PRO

cityjobB7530  SALES EXECUTIVE  SAL  4  UC  TDG  JUN

Page 11: Improving  the  Web Design  Mining Web Data at Cityjob

Web log files

Subscribers’ files Jobs’ files

Page 12: Improving  the  Web Design  Mining Web Data at Cityjob

SAS macros were written to perform the following tasks:

A: Reading the web log files

B: Cleaning the data files

C: Creating new variables

D: Merging the data files

E: Preparing different SAS data files

Page 13: Improving  the  Web Design  Mining Web Data at Cityjob

Useful Summary Information

A. Subscribers’ profiles

B. Jobs’ profiles

C. Web log files

D. Web log files + User ID

E. Web log files + Job ID

Page 14: Improving  the  Web Design  Mining Web Data at Cityjob

[Figure: Relative Percentage of Count in Each Hour. Vertical axis: Relative Percentage (0% to 8%); horizontal axis: Time.]
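The hourly profile in this chart can be derived from the time field of the web log. The sketch below is a hedged Python illustration rather than the original SAS code; the function name `hourly_percentages` and the sample times are hypothetical.

```python
from collections import Counter


def hourly_percentages(times):
    """Map each hour (0-23) to its share of all hits, in percent."""
    counts = Counter(int(t.split(":")[0]) for t in times)
    total = sum(counts.values())
    if total == 0:
        return {hour: 0.0 for hour in range(24)}
    return {hour: 100.0 * counts[hour] / total for hour in range(24)}


# Hypothetical values of the log's "time" field (HH:MM:SS).
times = ["00:00:00", "00:10:41", "13:05:12", "13:59:59"]
shares = hourly_percentages(times)
print(shares[0], shares[13])
```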

Page 15: Improving  the  Web Design  Mining Web Data at Cityjob

The most popular jobs

Job ID  Title  Industry  Visit No.  Popularity Index

cityjobCM070  OFFICER - CORPORATE BANKING  BNK  7748  100.0

cityjobC8570  ADMINISTRATIVE ASSISTANT  GOV  6552  84.6

cityjobCDU20  EXECUTIVE TRAINEE - INVESTMENT PRODUCTS  BNK  5148  64.9

cityjobCL580  CONTRACT HOUSING OFFICER  GOV  4944  63.8

cityjobCK570  EXECUTIVES FOR CORPORATE FINANCE  BNK  4664  60.2
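The slides do not define the popularity index. One reading that is consistent with most rows of the table is the visit count scaled so that the most visited job scores 100. The Python sketch below illustrates that assumed formula only; the function name `popularity_index` is hypothetical.

```python
def popularity_index(visits):
    """Scale visit counts so the most visited job scores 100 (assumed formula)."""
    top = max(visits.values())
    return {job: round(100.0 * n / top, 1) for job, n in visits.items()}


# Visit counts taken from the table of most popular jobs.
visits = {
    "cityjobCM070": 7748,
    "cityjobC8570": 6552,
    "cityjobCL580": 4944,
    "cityjobCK570": 4664,
}
index = popularity_index(visits)
print(index)
```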

Page 16: Improving  the  Web Design  Mining Web Data at Cityjob

Ⅲ. Collaborative Filtering

1. By Association Rules

• Whenever a visitor enquires about a particular job, we can “cross sell” similar jobs by recommending other jobs that have the highest association with the original one.

• The association is based on the click history of all the visitors to the Web site.

Page 17: Improving  the  Web Design  Mining Web Data at Cityjob

For example, if

• Job A: cityjobCF520:

Title: Assistant Accountant; Qualification: Diploma; Working experience: one year

then

• Job B: cityjobCF180:

Title: Assistant Accountant; Qualification: Diploma; Working experience: three years

• Job C: cityjobCF100:

Title: Assistant Accountant; Qualification: University/College; Working experience: not specified

• Job D: cityjobCEUJ0:

Title: Assistant Accountant; Qualification: Not specified; Working experience: two years

Page 18: Improving  the  Web Design  Mining Web Data at Cityjob

This group of 4 jobs has a

• Confidence Value of 50.3%:

given that a visitor enquires about job A, the probability that he would also enquire about jobs B, C, and D is 0.503;

• Lift Value of 298.46:

if a visitor has enquired about job A, he is almost 300 times more likely to enquire about jobs B, C, and D than a visitor chosen at random.
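Confidence and lift follow directly from counts over visitors' click histories: confidence is the share of visitors to job A who also viewed B, C, and D, and lift divides that by the share of all visitors who viewed B, C, and D. The Python sketch below uses made-up baskets and a hypothetical function name `rule_metrics`; it illustrates the standard definitions and is not the Enterprise Miner procedure used in the study.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Return (confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(baskets)
    n_a = sum(1 for b in baskets if antecedent <= b)
    n_c = sum(1 for b in baskets if consequent <= b)
    n_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = n_both / n_a          # P(consequent | antecedent)
    lift = confidence / (n_c / n)      # relative to a visitor chosen at random
    return confidence, lift


# Hypothetical data: one set of viewed job IDs per visitor.
baskets = [
    {"A", "B", "C", "D"},
    {"A", "B", "C", "D"},
    {"A"},
    {"A", "E"},
    {"E"},
    {"F"},
    {"E", "F"},
    {"B"},
]
confidence, lift = rule_metrics(baskets, {"A"}, {"B", "C", "D"})
print(confidence, lift)
```

On the slide's figures, a confidence of 0.503 with a lift of 298.46 implies that only about 0.17% of all visitors viewed jobs B, C, and D together (0.503 / 298.46).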

Page 19: Improving  the  Web Design  Mining Web Data at Cityjob

2. By Popularity Index

For example, if

• Job A: cityjobCDU20

Title: EXECUTIVE TRAINEE - INVESTMENT PRODUCTS, Type: FIN, Working Experience: 0, Qualification: UC, Industry: BNK, Level: JUN, Index of popularity: 64.9.

then (with same type, industry and qualification)

• Job B: cityjobCM470

Title: ASSOCIATE (TREASURY), Type: FIN, Working Experience: 3, Qualification: UC, Industry: BNK, Level: JUN, Index of popularity: 59.2.

• Job C: cityjobCM470

Title: ASSOCIATES (CRM), Type: FIN, Working Experience: 2, Qualification: UC, Industry: BNK, Level: JUN, Index of popularity: 44.6.

• Job D: cityjobCFLC0

Title: DEALER & INVESTOR ADVISOR, Type: FIN, Working Experience: 3, Qualification: UC, Industry: BNK, Level: PRO, Index of popularity: 36.6.
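The popularity-based recommendation in this example amounts to filtering jobs on the same type, industry, and qualification as the job being viewed, then ranking by popularity index. A minimal Python sketch follows, assuming a list-of-dicts job table; the function name `recommend` and the last job entry are hypothetical, added only to show a non-matching job being filtered out.

```python
def recommend(target, jobs, top_n=3):
    """Jobs sharing type, industry and qualification with target, most popular first."""
    keys = ("type", "industry", "qualification")
    matches = [
        job for job in jobs
        if job is not target and all(job[k] == target[k] for k in keys)
    ]
    return sorted(matches, key=lambda job: job["index"], reverse=True)[:top_n]


# Titles, attributes and indices as listed on the slide.
jobs = [
    {"title": "EXECUTIVE TRAINEE - INVESTMENT PRODUCTS", "type": "FIN",
     "qualification": "UC", "industry": "BNK", "index": 64.9},
    {"title": "DEALER & INVESTOR ADVISOR", "type": "FIN",
     "qualification": "UC", "industry": "BNK", "index": 36.6},
    {"title": "ASSOCIATE (TREASURY)", "type": "FIN",
     "qualification": "UC", "industry": "BNK", "index": 59.2},
    {"title": "ASSOCIATES (CRM)", "type": "FIN",
     "qualification": "UC", "industry": "BNK", "index": 44.6},
    # Hypothetical job with a made-up index, present only to be filtered out.
    {"title": "SALES EXECUTIVE", "type": "SAL",
     "qualification": "UC", "industry": "TDG", "index": 70.0},
]
suggested = [job["title"] for job in recommend(jobs[0], jobs)]
print(suggested)
```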

Page 20: Improving  the  Web Design  Mining Web Data at Cityjob

Ⅳ. Predictive Models

1. Churn (Attrition) model

To identify subscribers who are highly likely to stop visiting the Web site, so that Cityjob.com can take action to retain them. It is often less expensive to retain subscribers than to win them back.

2. Popular job model

What are the characteristics of jobs that would attract more visitors? Are they related to their job type and job industry?

Page 21: Improving  the  Web Design  Mining Web Data at Cityjob

1. The Churn (Attrition) Model

• Sample: All subscribers of Cityjob.com.

• Dependent Variable: Visit = 1 if the subscriber has visited Cityjob.com during the study period; Visit = 0 otherwise.

Page 22: Improving  the  Web Design  Mining Web Data at Cityjob

• Factors used: Gender; Age; Educational Level; dummy variables for interest and country; no. of days since registration.

• Sampling procedure: Stratified sampling based on the variable “Visit” is used to obtain an equal number of observations from the two groups of subscribers (Visit = 1 and Visit = 0).

• Data partition: Training data 70%, Validation data 30%
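The sampling and partition steps can be sketched as follows. This is a hedged Python illustration with made-up subscriber rows, not the Enterprise Miner nodes used in the study; the function names `balanced_sample` and `partition`, the fixed seed, and the class sizes are all assumptions.

```python
import random


def balanced_sample(rows, label_key, rng):
    """Draw the same number of rows from each class of label_key."""
    groups = {}
    for row in rows:
        groups.setdefault(row[label_key], []).append(row)
    n = min(len(group) for group in groups.values())
    sample = []
    for group in groups.values():
        sample.extend(rng.sample(group, n))
    return sample


def partition(rows, train_fraction, rng):
    """Shuffle a copy of rows and split it into training and validation parts."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical subscribers: 30 who visited, 70 who did not.
subscribers = [{"id": i, "visit": 1 if i < 30 else 0} for i in range(100)]
sample = balanced_sample(subscribers, "visit", rng)
train, valid = partition(sample, 0.7, rng)
print(len(sample), len(train), len(valid))
```

Balancing the two groups changes the base rate of Visit in the modelling data, so predicted probabilities from a model fitted on such a sample would need adjusting before being read as population rates.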

Page 23: Improving  the  Web Design  Mining Web Data at Cityjob
Page 24: Improving  the  Web Design  Mining Web Data at Cityjob

• Lift Chart

Churn model (logistic regression)

Important factors:

1. No. of days since registration

2. Educational level

3. Gender

4. Whether the subscriber has an interest in computer games

Page 25: Improving  the  Web Design  Mining Web Data at Cityjob

2. The Popular Job Model

• Sample: All jobs advertised on Cityjob.com.

• Dependent Variable: Popular = 1 if the job has been visited at least 20 times; Popular = 0 otherwise.

Page 26: Improving  the  Web Design  Mining Web Data at Cityjob

• Factors used: Dummy variables for different job types, job industries, job level, qualification required, working experience.

• Data partition: Training data 70%, Validation data 30%

• Missing values: missing values for working experience and qualification required were replaced by 0 and 3 (Secondary school completed) respectively.
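The target definition and missing-value rules for the popular job model can be written out as a small preparation step. The Python sketch below assumes each job is a dict with optional `experience` and `qualification` entries; the function name `prepare_job` and the qualification code 5 in the usage lines are hypothetical, since the slides give only the meaning of code 3.

```python
DEFAULT_EXPERIENCE = 0       # replacement for missing working experience
DEFAULT_QUALIFICATION = 3    # Secondary school completed
POPULAR_THRESHOLD = 20       # visits needed for Popular = 1


def prepare_job(job, visits):
    """Fill missing values and attach the Popular target for one job."""
    experience = job.get("experience")
    qualification = job.get("qualification")
    return {
        "experience": DEFAULT_EXPERIENCE if experience is None else experience,
        "qualification": DEFAULT_QUALIFICATION if qualification is None else qualification,
        "popular": 1 if visits >= POPULAR_THRESHOLD else 0,
    }


# Hypothetical jobs: one with both values missing, one fully specified.
row_a = prepare_job({"experience": None, "qualification": None}, visits=25)
row_b = prepare_job({"experience": 4, "qualification": 5}, visits=19)
print(row_a, row_b)
```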

Page 27: Improving  the  Web Design  Mining Web Data at Cityjob
Page 28: Improving  the  Web Design  Mining Web Data at Cityjob

• Lift Chart

Popular job model (logistic regression)

Important factors:

1. Higher qualification (more likely)

2. Higher level (more likely)

3. Job industries: accounting, banking, building, construction (more likely)

4. Job types: art/design/creative, engineering, sales (less likely)

Page 29: Improving  the  Web Design  Mining Web Data at Cityjob

Ⅴ. Recommendation

1. Web Design

a. To develop a collaborative filtering system

b. To include a popularity index

2. Marketing Strategies

a. To develop appropriate marketing strategies for customer retention

b. To develop Cityjob.com’s own web monitor system

Page 30: Improving  the  Web Design  Mining Web Data at Cityjob

Ⅵ. Unexpected Discovery

One user came every day during the study period at exactly the same time (4:00 a.m. HK time) and stayed for one to three hours, browsing more than 500 pages each time (an average of 5 sec. per page).
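A pattern like this can be surfaced by counting pages per visitor per day and flagging visitors who exceed a page threshold on every day. The Python sketch below is an illustration under assumptions: the visitor key (an IP address or login), the function names, and the small thresholds in the usage lines are hypothetical, chosen so the toy data stays short.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_page_counts(hits):
    """Count page requests per visitor per calendar day."""
    counts = defaultdict(lambda: defaultdict(int))
    for visitor, timestamp in hits:
        counts[visitor][timestamp.date()] += 1
    return counts


def heavy_every_day(hits, min_pages, n_days):
    """Visitors with more than min_pages requests on at least n_days days."""
    flagged = set()
    for visitor, per_day in daily_page_counts(hits).items():
        busy_days = sum(1 for count in per_day.values() if count > min_pages)
        if busy_days >= n_days:
            flagged.add(visitor)
    return flagged


# Toy data: "robot" requests a page every 5 seconds from 4:00 a.m. on two days.
start = datetime(2000, 12, 11, 4, 0, 0)
hits = [
    ("robot", start + timedelta(days=day, seconds=5 * i))
    for day in range(2)
    for i in range(4)
]
hits.append(("human", datetime(2000, 12, 11, 9, 30, 0)))
flagged = heavy_every_day(hits, min_pages=3, n_days=2)
print(flagged)
```

For the case on the slide, the corresponding call would use a threshold of 500 pages and the number of days in the study period.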

Page 31: Improving  the  Web Design  Mining Web Data at Cityjob

The End