Predictive Analytics for Franchise Expansion

Helping franchises identify ideal neighbourhoods for expansion through data mining techniques

CASE STUDY - Williams Fresh Cafe

We begin with the existing set of Williams Fresh Cafe locations, on which we build our model. The figure to the left shows where Williams currently has storefronts. A quick glance shows that these locations are concentrated in South-Western Ontario.


We then obtain a set of new locations by running our algorithm. As shown above, the new locations overlap with the existing franchise locations. However, the predictions also include locations in Ottawa and Kingston, and these are the locations we are most interested in because they represent uncaptured market share for Williams.

Finally, we examine the new locations outside the overlapping set. In Ottawa, we see that a handful of the recommended locations are close to post-secondary educational establishments and central business districts. In Kingston, the recommended locations are also close to post-secondary educational establishments. In general, the predicted set can be read as a short list of possible locations for Williams to expand into, which reduces the time required to make franchise expansion decisions and helps management make more informed, data-driven decisions.

1. BACKGROUND

Current Site Selection Procedure:

i. Slow (takes weeks)

ii. Labour intensive

iii. Inconsistent and unscientific

2. OBJECTIVE

Given a set of existing locations for a QSR (Quick Service Restaurant) franchise, identify a set of favourable neighbourhoods for further expansion within Ontario.

3. APPROACH

Use Ontario demographic data, existing storefront location data, and the locations of post-secondary institutions as inputs to data mining algorithms.
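One attribute the model lists, distance to educational establishments, can be derived from the storefront and institution location data. The poster does not say how this distance was computed. The sketch below assumes great-circle (haversine) distance between latitude/longitude pairs; the function names and the Earth-radius constant are illustrative assumptions, not the team's implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (an approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_nearest(point, institutions):
    """Distance from one (lat, lon) point to the closest institution in a list."""
    return min(haversine_km(point[0], point[1], lat, lon) for lat, lon in institutions)
```

Calling `distance_to_nearest(neighbourhood_centre, institution_coords)` once per neighbourhood would give one numeric feature per row, which can then be standardized along with the demographic attributes.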

4. MODEL

i. Consolidate, cleanse, and standardize the data sets

ii. Identify key demographic attributes that improve model performance (including but not limited to):

Population

Age

Gender

Minority Status

Occupation

Distance to Educational Establishments

Mode of Transportation

Income

Marital Status

Number of Children

Educational Attainment

Type of Household

iv. A/B test various attributes to improve model accuracy

v. For each existing franchise store location, calculate the Euclidean distance between it and each predicted neighbourhood in Ontario (with p(success) ≥ 95%) to find neighbourhoods with similar demographic features

vi. Find the predicted neighbourhoods in Ontario that appear most often across all of a franchise’s store locations when the distance calculations are filtered to the lowest Euclidean distance values
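Steps v and vi can be sketched as a nearest-neighbour vote. This is a minimal illustration under stated assumptions: the poster does not say how many "lowest distance" neighbourhoods are kept per store, so the cut-off `k` is hypothetical, as are the function names. Features are z-scored first so that a large-valued attribute such as income does not dominate the distance.

```python
import math
from collections import Counter


def standardize(rows):
    """Z-score each column; columns with zero spread are left unscaled."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def rank_candidates(stores, candidates, k=2):
    """Count how often each candidate is among the k nearest to a store.

    stores and candidates map a name to a list of numeric features.
    Candidates are assumed to be pre-filtered to p(success) >= 0.95.
    """
    names = list(candidates)
    scaled = standardize(list(stores.values()) + [candidates[n] for n in names])
    store_vecs = scaled[:len(stores)]
    cand_vecs = dict(zip(names, scaled[len(stores):]))
    counts = Counter()
    for vec in store_vecs:
        # Keep only the k candidates with the lowest Euclidean distance.
        nearest = sorted(names, key=lambda n: math.dist(vec, cand_vecs[n]))[:k]
        counts.update(nearest)
    return counts.most_common()
```

The result is a list of `(neighbourhood, votes)` pairs, most frequent first; neighbourhoods that are never among a store's `k` nearest do not appear at all.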

5. RESULTS

Using 10-fold cross-validation, our logistic regression model has a classification accuracy of 76% for Williams, 64% for Second Cup, and 51% for Tim Hortons.
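For readers unfamiliar with the evaluation method, 10-fold cross-validation can be sketched as follows. This is an illustration and not the team's code: the helper names are assumptions, and a simple midpoint-threshold classifier (hypothetical) stands in for the logistic regression model so that the example stays short.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, pool the accuracy."""
    correct = 0
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in held_out)
    return correct / len(X)


def fit_threshold(X, y):
    """Stand-in classifier: midpoint between the two class means (feature 0)."""
    zeros = [x[0] for x, label in zip(X, y) if label == 0]
    ones = [x[0] for x, label in zip(X, y) if label == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict_threshold(threshold, x):
    return 1 if x[0] > threshold else 0
```

Each neighbourhood is scored exactly once by a model that never saw it during training. When reading the figures above, note that on a two-class problem with balanced classes an accuracy near 50% is close to random guessing; the poster does not report the class balance.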

6. IMPACT

7. CONCLUSION

Data mining can be an effective tool to help franchises identify new neighbourhoods to expand into that have demographics similar to those of their existing locations. However, demographic information is not the only determinant of a good location. Other important factors include the number of competitors, real estate availability, and intangible factors such as management quality. A data-driven approach to site expansion still faces challenges, because data quality plays a pivotal role.

DATA                              SOURCE
Core Demographic Data - Ontario   2013 Projected Data & RSI Canada
Existing Storefront Locations     Williams Website, Second Cup Website, Tim Hortons Website
Post Secondary Institutions       Ministry of Education

Team 11: Chad Xu, Jason Wang, Jonathan Ong, Lamin Ceesay

Faculty Advisors: Dr. Mark D. Smucker, Dr. Lukasz Golab

Sponsor: Piinpoint (Jim Robeson)

4. MODEL (continued)

iii. Use logistic regression to obtain a probability of “success” for every Ontario neighbourhood
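The logistic regression step can be sketched in a few lines. This is a minimal sketch under stated assumptions: the poster does not define the "success" label, so the example assumes 1 for a neighbourhood that already contains a storefront and 0 otherwise. The plain gradient-descent fit, the learning rate, and the function names are illustrative choices; the team may well have used a library implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    """Estimated probability of "success" for one neighbourhood."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def shortlist(w, b, neighbourhoods, threshold=0.95):
    """Names of neighbourhoods whose probability meets the cut-off."""
    return [name for name, x in neighbourhoods.items()
            if predict_proba(w, b, x) >= threshold]
```

The `threshold=0.95` default mirrors the p(success) ≥ 95% filter used in step v; features are assumed to be standardized before fitting.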

AREA      IMPACT
Ethical   Increased chance of capitalistic abuse of ethnic and gender information
Health    Higher exposure to fast foods
Safety    Minimize financial risks for franchises looking to expand operations
Society   Identify markets for new service