Group 7_DMBD Term Project


  • 8/10/2019 Group 7_DMBD Term Project

    1/14

    By Nancy, Subhash Rajeev, Vishnu Poduval, Vishal Wagh, Eswar Sunil Kumar, Aaron Ernest


    2/14

    Objectives of the study

    Business Objective

    To gauge the buying behavior of customers from an e-commerce point of view and thereby identify the major issues that prevent different classes of users from using the internet to make purchases

    Factors Under Consideration

    General demographics of the users

    Technology demographics of the users

    Internet shopping habits

    Web and Internet Usage habits

    Source of database

    GVU Center's 8th WWW User Survey (run from October 10, 1997 to November 16, 1997)

    Special pointers to the survey provided by Yahoo, Netscape and WebTV


    3/14

    Mining

    Main challenge faced: branching

    Categorization of the data

    General: Gender; Primary Language; Registered to Vote; Important Issue Facing the Internet

    Technology: Connection Speed & Upgrades; Email Accounts; Equipment Owned; Frequency of Switching Browsers

    Internet Usage: Indispensable Technologies; Frequency of Use; Navigation Services; Years on Internet

    Privacy: Cookie Privacy; Internet Laws; Content Providers have right to resell user information

    E-commerce: Reasons for Using the Web for Personal Shopping; Time Spent Searching Personal Shopping; Success Rate of Personal Shopping

    Which arewho have/online? W

    What are t

    for people What is th

    reasons wiclassificatgeography

    Is there a spent on thabits?

    How muchmatter in t

    customers


    4/14

    Dataset: numeric, stored in an ASCII file

    No. of records: 10,044. Fields: 60. After cleaning, no. of records: 7,290; no. of training records: 1,822; no. of test records: 5

    The training dto develop the

    later tested onfields are highleach other.In most of suchthem have bee
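A minimal sketch of the sampling step, assuming a simple random draw of 1,822 training records from the 7,290 cleaned records, with everything else held out for testing. The slides do not state the sampling method, so `split_records`, the seed, and the stand-in record ids are all hypothetical.

```python
import random


def split_records(records, n_train, seed=0):
    """Randomly partition records into a training list and a test list."""
    if not 0 <= n_train <= len(records):
        raise ValueError("n_train must be between 0 and len(records)")
    shuffled = list(records)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in record ids (assumption); real rows would come from the cleaned file.
cleaned = list(range(7290))
train, test = split_records(cleaned, n_train=1822)
print(len(train), len(test))
```

Fixing the seed keeps the partition reproducible, so different models can be compared on the same held-out records.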

    Overview of Dataset

    Variables (Demographic Segmentation)    Variables (Psychographic Segmentation)
    Age                                     Years on Internet
    Country                                 Major Occupation
    Gender                                  Who_pays_for_access
    Education_Attainment                    Willingness_to_Pay_Fees
    Race                                    Most_important_Issue_Facing_the_Internet
    Major Geographical Location             Sexual Preference
    Marital Status                          Primary_Place_of_WWW_Access
    Household Income

    The entire recoded asindex to thin an addit

    file


    5/14

    Methodology

    Data cleaning

    Sampling: training and test

    Demographic segmentation models trained

    C 5.0 with balancing

    C 5.0 with balancing and boosting

    C 5.0 with balancing, boosting and misclassification costs

    Psychographic segmentation models trained

    C 5.0 balanced

    C 5.0 boosted

    C 5.0 with misclassification penalties

    Neural network

    Boosted Neural network

    Testing of models

    Interpreting results of most accurate and suitable model
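The balancing step named in the model lists can be illustrated with a small sketch. It assumes balancing means oversampling the minority class until both classes are the same size. The slides do not give the balancing factor, so this is only one plausible reading. The function `oversample_minority` and the toy rows are hypothetical; the toy data only mirrors the 78% "Yes" share of the target variable.

```python
import random
from collections import Counter


def oversample_minority(rows, label_of, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to top the smaller class up to the target size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Toy rows (assumption): 78 "Yes" and 22 "No", not the actual survey records.
toy = [("Yes", i) for i in range(78)] + [("No", i) for i in range(22)]
balanced = oversample_minority(toy, label_of=lambda row: row[0])
counts = Counter(label for label, _ in balanced)
print(counts["Yes"], counts["No"])
```

Balancing should be applied to the training records only, so that the test records keep the original class distribution.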

    Target variWhether uonlineVariable tyDistributio78% Yes, 2

    Factor of baMisclassific

    demographNo. of trial

    Other models were used as well, only these have been represented in the PPT for having the mostconfusion matrices and lift curves

    Sampling mTraining Testing D

    EvaluationGains chamatrices


    6/14

    Methodology: Models used (Demographic, Psychographic)


    7/14

    Results: Demographic

    C 5.0 with balancing

    C 5.0 with balancing, boosting and misclassification costs

    C 5.0 with balancing and boosting

    C 5.0 with balancing, boosting and misclassification costs has the highest model accuracy for te

    It also has the highest number of true positives
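For reference, accuracy and the true-positive count both come straight from a confusion matrix. The sketch below uses toy labels rather than the project's results. The helper names `confusion_counts` and `accuracy` are hypothetical, and "Yes" is assumed to be the positive class (the user buys online).

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count true/false positives and negatives for a binary target."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Share of records whose predicted class matches the actual class."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


# Toy labels (assumption), only to show the calculation.
actual = ["Yes", "Yes", "No", "Yes", "No", "No", "Yes", "Yes"]
predicted = ["Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes"]
cm = confusion_counts(actual, predicted)
print(cm, accuracy(cm))
```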


    8/14

    Results: Demographic

    C 5.0 with balancing

    C 5.0 with balancing, boosting and misclassification costs

    C 5.0 with balancin

    C 5.0 with balancing, boosting and misclassification costs has the highest lift
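Lift can be read as how much more concentrated the "Yes" cases are among the top-scored records than in the data overall. The sketch below assumes each record has a model score for "Yes". The function `lift_at` and the toy scores are hypothetical and are not taken from the project's lift curves.

```python
def lift_at(scores, actual, fraction, positive="Yes"):
    """Lift in the top `fraction` of records when ranked by descending score."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))
    top_rate = sum(1 for _, a in ranked[:k] if a == positive) / k
    overall_rate = sum(1 for a in actual if a == positive) / len(actual)
    return top_rate / overall_rate


# Toy scores and labels (assumption), only to show the calculation.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = ["Yes", "Yes", "No", "Yes", "No", "No", "Yes", "No", "No", "No"]
print(lift_at(scores, actual, 0.2), lift_at(scores, actual, 1.0))
```

A lift of 1 at a fraction of 1.0 is expected for any model, since the whole dataset is selected. Models differ in how far above 1 they stay at small fractions.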



    9/14

    Results: Demographic - Balanced C 5.0 model with boosting and misclassification penalties
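One way to picture the effect of misclassification penalties is a decision rule that picks the class with the lowest expected cost, not the most probable class. This is a generic illustration, not a description of how C 5.0 applies costs internally. The function `min_cost_class` and the cost values are hypothetical, since the slides do not list the penalties used.

```python
def min_cost_class(probabilities, costs):
    """Return the class whose prediction has the lowest expected misclassification cost.

    probabilities: dict mapping class -> estimated probability
    costs: dict mapping (actual, predicted) -> penalty; missing pairs cost 0
    """
    def expected_cost(predicted):
        return sum(
            p * costs.get((actual, predicted), 0.0)
            for actual, p in probabilities.items()
        )

    return min(probabilities, key=expected_cost)


# Hypothetical penalties: missing a real buyer costs three times a false alarm.
costs = {("Yes", "No"): 3.0, ("No", "Yes"): 1.0}
equal_costs = {("Yes", "No"): 1.0, ("No", "Yes"): 1.0}
probs = {"Yes": 0.4, "No": 0.6}
print(min_cost_class(probs, costs), min_cost_class(probs, equal_costs))
```

With equal penalties the rule reduces to choosing the most probable class. Raising the penalty on missed buyers shifts borderline records toward "Yes", which tends to increase true positives at the expense of more false positives.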


    10/14

    Results: Psychographic

    Normal C 5.0 with balancing

    C 5.0 with boosting

    C 5.0 with misclassification penalties

    Simple neural network


    11/14

    Results: Psychographic

    Normal C 5.0 with balancing

    C 5.0 with boosting

    C 5.0 with misclassification penalties

    Simple neural network

    Neural network with


    12/14

    Results: Psychographic - Simple neural network

    Time spent on the internet seems to be a crucial factor in understanding what users consider when deciding whether to purchase online


    13/14

    Interpretation and Concluding Remarks

    Thus we can see that age, years spent on the internet and primary place of internet access are the most important characteristics that decide whether users buy online

    Surprisingly, purchasing power or monthly income and occupation had much lower importance when it came to predicting whether a user would purchase online

    However, since this data is from 1997, a time when users were new to the internet, it makes sense that familiarity with the medium was necessary in order to reach consumers

    E-commerce shoppers during this era were early adopters in the product lifecycle; they were technology savvy, already used to the internet, and trusted it as well.

    E- commerce websites ther

    only advertise online since the cost of advertising as agsince their buyers were ma

    familiar with th
