FORDHAM UNIVERSITY
Graduate School of Business
Marketing Decision Models, MKGB 77AA
Professor: Mohammad G. Nejad, Ph.D.
Strategic Analysis of a School Supply Vending Machine Campaign in New York City
Team Members:
Anqi Wang
Dongqi Wang
Hannah Parker
Guillermo Ponte
Sean Scott
Dear CMO,
At Staples, we are committed to helping our customers every day. We want to grow our
businesses, increase our profitability and advance in key strategic initiatives. This has historically
enabled us to stay competitive and create more value for our shareholders. We want to offer our
customers the best products and services. We operate a tailored portfolio with unique
characteristics for each unique location. Also, we want to make our customers’ lives easier,
because we understand that time is very valuable for them. Therefore, we want to not only
enhance their shopping experiences but also seamlessly integrate them with their day-to-day lives.
Profitable companies of the 21st century will be those that align the needs of their business with
the needs of the world around them. As marketing employees at Staples, we want to focus on
two pillars: Accessibility and Affordability.
Our top priority is to continue to improve service and value. We want to be accessible for our
customers, because we understand how busy they are. At the same time, Staples understands
that customers have different income levels and needs. Thus, we want to offer a range of
products suited to each unique demographic. Our biggest opportunities to become more accessible
lie in cost savings in areas such as supply chain, merchandising, store operations and real
estate, marketing, salesforce, business process and IT outsourcing, and customer service.
In 2015, Staples saw sales decrease from the previous year. Now, for 2016, we want to change
this. We want to increase our sales significantly through a new, exciting sales program targeted
to urban areas. We remain focused on optimizing our retail square footage in North America
through store closures and improved productivity. The proposed installation of 250 vending
machines around NYC area will allow us to accomplish these goals.
Our corporate responsibility programs are a critical part of our customer commitment. We
understand that our customers are more aware of the environment, and we want to offer them
new programs to reduce their carbon footprint. We want to introduce a new holistic marketing
campaign where our customers will buy eco-friendly products and recycle their old supplies in a
completely new shopping environment. I invite you to read more about our corporate
responsibility and the marketing campaign on the following pages.
Sincerely,
Staples Marketing Team
Table of Contents
I. Executive Summary
II. Project Flow Chart
III. Phase 1 - Data Analysis:
a. Summary
b. Memo to the Manager
c. Analysis Flow Chart
d. Analysis: Internal
e. Analysis: External
IV. Phase 2 - Strategic Analysis:
a. Summary
b. Company Overview
c. Market Overview
d. Opportunity
e. Target Customer
f. Goals
V. Phase 3 - Marketing Metrics
a. Summary
b. Metric Overview (Table)
c. Metrics Model (Figure)
d. Marketing Dashboard
VI. Limitations
VII. Works Cited
VIII. Appendices:
a. Tables and Figures
b. SAS Code
Executive Summary
For the past 30 years Staples Inc. has found success as a trusted supplier of office and
school related products to both businesses and individual customers. Staples’ in-store copying
and faxing services as well as its established B2B division have helped build a large and
expansive customer base. Staples’ main strengths include its large product portfolio and
numerous retail channels which include stores, catalogs, and a website. Despite Staples’ success,
the office supplies industry is currently in decline. The majority of office work shifting online
has threatened paper, pens, and printer ink with obsolescence. Staples is currently fighting to
merge with Office Depot in order to effectively compete with online retailers such as
Amazon. Staples is a heavy hitting retailer, but opportunities for further growth are certainly
available.
Based on marketing intelligence formed from analysis on internal Staples data, external
demographic data, and industry secondary reports, it is recommended that Staples move forward
with a plan to begin offering their products in Staples vending machines. The vending machines
will initially be rolled out in the five boroughs of New York City and will include all green
products and a recycling receptacle. This original strategy will bolster Staples’ multi-channel
approach and help Staples to ultimately reduce costly retail space.
Performing analysis on internal Staples data was the first step in developing the
marketing intelligence project. The dataset included information on 14,448 orders from 10,000
different households. The dataset was first aggregated by customer ID to give the number of
orders, sum revenue, and days since last order for each customer. The customers were then
classified as either “High Level”, “Average”, or “Low Level” based on their revenue. Regression
models of revenue were then created. The regression found that orders placed in quarter 3 and
orders placed via the web generate more revenue than other orders. Credit cards account for
nearly 90% of total revenue, although the regression associates credit card payment with
slightly lower revenue per order.
The next stage of the internal data analysis involved finding key customer segments to
target. Customers were clustered by number of orders, sum revenue, and days since last
order. The first segment made up three quarters of the dataset and was characterized by their
light use of Staples. The second segment was the smallest of the four, making up less than one
percent of the dataset. The third segment made up about a fifth of the dataset and customers
in this segment were characterized by their consistent use of Staples. The fourth and final
segment made up about a fifth of the dataset and consisted of slightly more desirable customers
than segment 3.
In order to further assess Staples’ current position in the market, external data was
collected. Census data was obtained that detailed demographic variables for all New York City
zip codes. The goal was to ascertain what New York City looks like in terms of Staples’ target
market. It was found that all five boroughs of New York fall well above national averages in
terms of income, education level, and student enrollment. Additional analysis involved
understanding the geography of the city in terms of demographic characteristics and behavior, in
order to efficiently roll out the campaign. Zip codes were clustered using the variables number
of students enrolled in Pre-K through 12th grade, median household income, and percent of
population with a Bachelor’s degree or higher. Four clusters were found. The four identified
segments were “Wealthy Families”, “Lower Middle Class Families”, “Downtown Manhattanites”,
and “Upper Middle Class Families.” The second and fourth segments are the most attractive to
Staples because of their high student enrollment. Further details of the segments can be found in
Figure 37.
Based on secondary reports, it is known that Staples would like to reduce its retail space
and maintain its multi-channel approach to business. Based on internal data analysis it is known
that Staples’ customers prefer to pay with a credit card and prefer to not make purchases in brick
and mortar locations. The vending machine approach will provide a way for Staples to reach its
goals while giving its customers a new and convenient way to purchase office and school
supplies with their credit cards. The vending machines will be distributed in zip codes deemed
desirable by the external data analysis, as well as areas near schools and shopping malls.
The vending machines will first be rolled out in New York City because of its large and
diverse population and its willingness to care for the environment. The recycling receptacle and
inclusion of green products will encourage already environmentally conscious New Yorkers to
use the vending machines to satisfy their office and school supplies needs and will promote
additional eco-friendly behavior from the rest of the population.
In order to measure the effectiveness of the campaign, 11 key marketing metrics will be
formulated and tracked. For profit monitoring, customer deciles, return on marketing investment,
profit margin, payback period, internal rate of return, and break-even point will be tracked.
Multichannel sales success will be measured through market penetration rates, acquisition costs,
and a sales funnel approach. The awareness of the sustainability campaign will be measured
using customer awareness. Finally, in order to improve the customer experience, Staples will
monitor the firm’s Net Promoter Score.
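As a minimal sketch of one of these metrics, the Net Promoter Score is conventionally computed from 0-10 "likelihood to recommend" ratings as the percentage of promoters (ratings of 9 or 10) minus the percentage of detractors (ratings of 0 through 6). The survey responses and the function name below are hypothetical and are not Staples data.

```python
def net_promoter_score(ratings):
    """Return the NPS (-100 to 100) from 0-10 likelihood-to-recommend ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)    # ratings of 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)   # ratings of 0 through 6
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical survey responses, for illustration only.
sample_ratings = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]
sample_nps = net_promoter_score(sample_ratings)
print(f"NPS: {sample_nps:.0f}")
```

Ratings of 7 and 8 (passives) count toward the total number of responses but toward neither group.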
Implementing the vending machines approach in New York City is a way for Staples to
attain its goals of reducing retail space and maintaining its multiple channels and large product
portfolio while increasing the share of its customers’ wallets. Should the project be a success in
New York City, it can be scaled to other urban and suburban areas across the country. Solid
marketing intelligence supports this original idea and its future success. The future for Staples,
the Office Supplies Industry, and vending machines is bright.
Project Flow Chart
Phase 1
a. Summary
Phase 1 of this Staples marketing intelligence project involved looking at both internal
and external data in order to identify Staples’ strengths, its customer demographics, and
characteristics of a specific market in order to help identify a new opportunity. Analysis of
internal data was performed in SAS. This analysis produced four customer segments that were
clustered on the variables order frequency, revenue, and days since last order. Regression
models were then run with revenue as the dependent variable. The analysis indicated that
orders placed via the web and in quarter 3 generate more revenue, and that credit cards
account for most of the revenue.
External zip code data was also analyzed in SAS in an effort to get a well-rounded
understanding of Staples’ environment. In order to focus this project, New York City was chosen
as the target location. New York City ZIP codes were clustered based on income, student
enrollment and educational attainment. Four distinct clusters of ZIP codes were identified – they
vary significantly by income and education. Two clusters are concentrated in Manhattan and can
be described by high income, high education, and moderate to below average student enrollment.
The other clusters are more heavily focused in Brooklyn, the Bronx, Queens, and Staten Island.
These remaining groups are characterized by lower income but a higher enrollment rate for
students in secondary school.
b. Memo to Manager
Based on internal dataset analysis, our team has several suggestions for the manager.
1. Staples needs a creative slogan that gives customers a positive image of the brand;
2. Cooperate with credit card issuers to provide credits, gifts, or cash back when customers
pay for orders by credit card, especially in off-seasons (Quarter 1, Quarter 2, and Quarter 4);
3. Provide marketing promotions such as seasonal discounts, quantity discounts, or gross
promotions to stimulate customers to purchase more and thereby increase sum revenue, especially
in off-seasons (Quarter 1, Quarter 2, and Quarter 4).
External analysis has also provided several key insights:
4. New York City has a much higher average income and education rate than the national
average
5. Manhattan is characterized by extremely high income – these areas will value higher-end
products
6. The remaining boroughs have lower incomes on average, but much higher enrollment
rates – these areas are more family oriented
7. Overall, a larger proportion of NYC is characterized by lower income and larger student
enrollment – this may be a more lucrative area for Staples to target any new marketing
campaigns
c. Analysis Flow Chart
d. Analysis Procedure – Internal Data Analysis
In order to understand Staples’ current customer base, a twelve-year internal dataset was
analyzed (through April 30, 2009). The dataset includes variables such as order source, quantity
of items purchased, returns, payment information, and purchaser ZIP code.
There is one record per order, with multiple orders per household. Orders within the same
household are indicated with matching Household-ID numbers (one number per unique
household). The dataset contains 14,448 order records from 10,000 unique households.
The purpose for analyzing the internal dataset is to find what factors influence revenue
for each customer order. This objective will be met by running regression models to predict
revenue for each order and segmenting customers based on their number of orders, revenue, and
purchase recency. A marketing campaign will then be designed to help Staples appeal to more
customers and thus increase revenue. The analysis includes two main parts: dataset modification/
aggregation and data analysis.
1. Dataset Modification and Data Aggregation
For dataset modification, real revenue was calculated for all 14,448 orders, as the gross
product revenue before and after the date “01/25/2007”. A new variable, “Quarter”, was created
based on the month of each order. Next, payment category was recoded into specific payment
methods. Total item quantity was also calculated for each order. Next, orders were coded based
on order method; each order was coded as one of the following: catalog, web, or other.
Data modification also required creating a new variable to indicate if the order was in Quarter 3
(1 is Yes, 0 is No).
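The Quarter variable and the Quarter 3 indicator described above could be derived along the following lines. This is a sketch in Python rather than the SAS used in the project; the function names and sample dates are hypothetical, and calendar quarters are assumed (Quarter 3 covering July through September).

```python
from datetime import date


def quarter_of(order_date: date) -> int:
    """Return the calendar quarter (1-4) for an order date."""
    return (order_date.month - 1) // 3 + 1


def is_quarter_3(order_date: date) -> int:
    """Indicator variable: 1 if the order fell in July-September, else 0."""
    return 1 if quarter_of(order_date) == 3 else 0


# Hypothetical order dates, for illustration only.
sample_dates = [date(2008, 2, 14), date(2008, 8, 20), date(2008, 12, 1)]
quarters = [quarter_of(d) for d in sample_dates]
q3_flags = [is_quarter_3(d) for d in sample_dates]
print(quarters, q3_flags)
```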
The dataset was then aggregated by customer ID. This left 10,000 customer records. For
each customer ID, new variables were created based on this aggregation: number of orders, sum
revenue, and days since last order (“04/30/2009” was set as the current day). After that,
customers were defined as either “High level customer”, “Average level customer”, or “Low
level customer” based on their sum revenue compared to mean and standard deviation of total
revenue.
If sum revenue of one customer is greater than [mean + standard deviation], we defined
that customer as “High level customer”. If sum revenue of one customer is lower than [mean –
standard deviation], we defined that customer as “Low level customer”. If sum revenue of a
customer is between [mean – standard deviation] and [mean + standard deviation], we defined
that customer as “Average level customer”.
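The aggregation and the three-level classification can be sketched as follows. The order rows, field names, and helper names are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption, since the report does not say which form was used.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev

REFERENCE_DATE = date(2009, 4, 30)  # the "current day" used in the report

# Hypothetical (household_id, order_date, revenue) rows, for illustration only.
orders = [
    ("H1", date(2008, 8, 1), 40.0), ("H1", date(2009, 1, 15), 60.0),
    ("H2", date(2008, 9, 10), 100.0), ("H3", date(2007, 8, 5), 100.0),
    ("H4", date(2008, 3, 3), 100.0), ("H5", date(2006, 7, 20), 10.0),
    ("H6", date(2008, 8, 12), 90.0), ("H6", date(2009, 2, 1), 100.0),
]


def aggregate(rows):
    """Collapse order rows to one record per household."""
    grouped = defaultdict(list)
    for household_id, order_date, revenue in rows:
        grouped[household_id].append((order_date, revenue))
    summary = {}
    for household_id, items in grouped.items():
        last_order = max(order_date for order_date, _ in items)
        summary[household_id] = {
            "n_orders": len(items),
            "sum_revenue": sum(revenue for _, revenue in items),
            "days_since_last": (REFERENCE_DATE - last_order).days,
        }
    return summary


def classify(summary):
    """Label each household by sum revenue relative to mean +/- one standard deviation."""
    revenues = [rec["sum_revenue"] for rec in summary.values()]
    mu, sigma = mean(revenues), stdev(revenues)  # assumption: sample std dev
    levels = {}
    for household_id, rec in summary.items():
        if rec["sum_revenue"] > mu + sigma:
            levels[household_id] = "High level customer"
        elif rec["sum_revenue"] < mu - sigma:
            levels[household_id] = "Low level customer"
        else:
            levels[household_id] = "Average level customer"
    return levels


customers = aggregate(orders)
levels = classify(customers)
```

Because revenue is typically right-skewed, mean minus one standard deviation can fall below zero on real data, in which case no customer would be labeled “Low level customer”.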
2. Data Analysis on Modified Dataset
The first step in analyzing the modified dataset was calculating sum revenue of each
Quarter and each Payment Method. This would provide insights into which quarters generate the
most revenue. Figure 1 and Figure 2 indicate the total revenue and the percentage of total
revenue for each Quarter.
These two graphics show that people purchase most in Quarter 3, which contributed 82%
of total revenue. This is most likely due to school starting in late summer and the high demand
for school supplies during this quarter.
Next, payment method was analyzed. We wanted to know which Payment Method people
most frequently used. Figure 3 and Figure 4 indicate the total revenue and percentage of total
revenue for each Payment Method.
Based on these Figures, most people pay Staples by Credit Card, since nearly
90% of total revenue was generated by this payment method. The next phase of analysis
involved looking into order channel. According to the dataset, there are three channels: Catalog
Order, Web Order, and Other. Figure 5 and Figure 6 show total revenue and percentage of total
revenue for each Order Indicator.
These two Figures illustrate that nearly 1/3 of total revenue was created by Web Orders
while nearly 2/3 of total revenue was created by Catalog Orders. Catalog and Web are the two main
ways orders are placed. Catalog is still the most popular channel, even though people are
increasingly purchasing online.
Finally, a multiple linear regression model was built to predict real revenue. Real
Revenue was chosen as the dependent variable, and Quantity, Web Order, Credit Card, and
Quarter 3 were selected as independent variables. Figure 7 shows the result of the multiple
linear regression model.
According to the analysis results, the p-value for the overall model is less than 0.0001, so the
multiple regression model is significant. The p-value for each independent variable is less than
0.05, which means all independent variables have a significant influence on the dependent
variable. The model is shown below:
Real Revenue = 10.80037 + (16.63232*Quantity) + (2.80396*Web Order) – (1.84099*Credit
Card) + (5.57690*Quarter 3)
This model indicates that, holding the other variables constant, each additional item purchased
adds $16.63 to real revenue; an order placed via the web generates $2.80 more real revenue; and
an order paid by credit card generates $1.84 less.
Additionally, an order placed in Quarter 3 generates $5.58 more revenue than an order in
any other quarter. According to the t-values, Quantity has the most influence on real revenue,
while Credit Card has the least.
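To make the fitted equation concrete, the sketch below evaluates it for a single order. The coefficients are the ones reported above; the function name and the example order are hypothetical.

```python
# Coefficients as reported for the fitted model.
INTERCEPT = 10.80037
COEFFICIENTS = {
    "quantity": 16.63232,
    "web_order": 2.80396,
    "credit_card": -1.84099,
    "quarter_3": 5.57690,
}


def predict_real_revenue(quantity, web_order, credit_card, quarter_3):
    """Predicted real revenue for one order; the last three inputs are 0/1 indicators."""
    return (
        INTERCEPT
        + COEFFICIENTS["quantity"] * quantity
        + COEFFICIENTS["web_order"] * web_order
        + COEFFICIENTS["credit_card"] * credit_card
        + COEFFICIENTS["quarter_3"] * quarter_3
    )


# Hypothetical order: 3 items, placed on the web, paid by credit card, in Quarter 3.
example = predict_real_revenue(quantity=3, web_order=1, credit_card=1, quarter_3=1)
# The same order placed outside Quarter 3.
off_season = predict_real_revenue(quantity=3, web_order=1, credit_card=1, quarter_3=0)
print(f"{example:.2f} vs {off_season:.2f}")
```

The difference between the two predictions equals the Quarter 3 coefficient, which is how each indicator's coefficient should be read.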
The adjusted R-squared is 0.5678, which means that 56.78% of the variance in real revenue is
explained by this regression model. While the R-squared is sufficient, other factors not in the
model also influence real revenue.
3. Data Analysis on Aggregated Dataset
For data analysis on the aggregated dataset, the objective is to segment Staples customers
based on order frequency, sum revenue and days since last order. The objective is to find out
unique features for each segment and to find the most valuable customers for Staples. After
several attempts, Staples customers were grouped into 5 clusters. Figure 8 shows the initial seed
centers for the 5 clusters. The three variables have been standardized, so that they do not have
unequal influence on the clusters. Figure 9 indicates the mean values for each variable for each
cluster. The mean numbers in the table are still standardized, and Figure 10 shows descriptive
statistics for each Cluster. Only one customer was placed into Cluster 5. This customer is most
likely an outlier, so Cluster 5 was not treated as a separate segment. The summaries of the
clusters are listed below.
Cluster 1: There are 7,401 customers in Cluster 1, and these customers had the following
characteristics:
1. On average they had the least number of orders (1.08); many of them only purchased once.
2. On average, they purchased the least amount in terms of revenue ($47.98);
3. The average days since last order are 17,478.
Cluster 1 represents Staples’ “light users”. This group is important because of its size.
Staples must find a way to get them to purchase again.
Cluster 2: 63 customers with the following characteristics:
1. They purchased frequently at Staples (average of 7.57 times);
2. Their average revenue is relatively high ($593.40)
Customers in Cluster 2 can be defined as “heavy users” at Staples. They make purchases
at Staples frequently and buy much more than customers in other clusters. However, the number
of customers in Cluster 2 is low. Staples should aim to transition customers into Cluster 2.
Cluster 3 & Cluster 4: Customers in Cluster 3 and Cluster 4 had similar buying habits.
They know and like Staples, but only make purchases when they really need school and office
supplies. There is only one customer in Cluster 5, so this individual was moved into cluster 2.
That customer is also a heavy user.
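The clustering step behind these segments (standardize the three variables, then group customers around seed centers) can be sketched as below. This assumes a plain k-means procedure like the one named later for the ZIP code analysis; the customer records, seed choice, and function names are hypothetical, and the report's SAS implementation may differ in its details.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so that no variable dominates the distance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
        for p in points
    ]


def k_means(points, seeds, max_iter=100):
    """Alternate assignment and center updates until the centers stop moving."""
    centers = list(seeds)
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for k, old_center in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Keep the old center if a cluster ends up empty.
            updated.append(tuple(mean(dim) for dim in zip(*members)) if members else old_center)
        if updated == centers:
            break
        centers = updated
    return assign(points, centers), centers


# Hypothetical (number of orders, sum revenue, days since last order) records.
customers = [(1, 40.0, 900), (1, 50.0, 1000), (1, 45.0, 950), (8, 600.0, 30), (7, 580.0, 60)]
scaled = standardize(customers)
labels, centers = k_means(scaled, seeds=[scaled[0], scaled[3]])
print(labels)
```

In this toy data the first three records behave like the “light users” and the last two like the “heavy users”, so they separate into two groups.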
After identifying clusters, sum revenue generated by each segment and by each customer
level was calculated. Figure 11 and Figure 12 show the sum revenue and percentage of each
cluster; Figure 13 and Figure 14 indicate the sum revenue and percentage at a customer level;
Figure 15 shows the number of each customer level, and Figure 16 shows the most popular
Payment Method used by each Cluster.
As seen in these figures, most of the revenue came from Cluster 1 and Cluster 3.
Although customers in Cluster 1 and Cluster 3 are not heavy users for Staples, they have huge
purchasing power. It is important to make sure that these customers have access to a Staples for
when they do need office and school supplies.
It seems that the top 11.36% of Staples customers (High level customers) generated
nearly 40% of the revenue. This is in the spirit of the Pareto principle (the rule of thumb that
about 80% of a company’s revenue comes from the top 20% of its customers): a small share of
customers accounts for a disproportionate share of revenue. As
seen in Figure 16, every cluster places the majority of orders with credit cards. Based on
this figure, perhaps Staples should cooperate with credit card companies, providing some
credits, gifts, or cash back if customers use a credit card to pay for the order.
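The revenue concentration figure discussed at the start of this paragraph can be computed along these lines. The revenue values and the function name are hypothetical; only the method is illustrated.

```python
def revenue_share_of_top(revenues, top_fraction):
    """Share of total revenue contributed by the top `top_fraction` of customers."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(revenues, reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)


# Hypothetical sum revenue per customer: eight small accounts and two large ones.
sample_revenues = [100.0] * 8 + [600.0, 600.0]
top_20_share = revenue_share_of_top(sample_revenues, 0.20)
print(f"Top 20% of customers generate {top_20_share:.0%} of revenue")
```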
The next step in the analysis involved identifying which variables had relationships with
the sum revenue for each customer. Days Since Last Order and Number of Orders were selected
as the specific variables. Next, the Pearson coefficient was calculated between Sum Revenue and
Days Since Last Order and between Sum Revenue and Number of Orders. Figure 17 and Figure
18 show the results of this correlation analysis.
The p-values of the two analyses are both less than 0.0001, indicating that the relationship
between Sum Revenue and Days Since Last Order is significant, as is the relationship between
Sum Revenue and Number of Orders. The Pearson coefficient between Sum
Revenue and Days Since Last Order is -0.13064, which means Sum Revenue has a weak
negative relationship with Days Since Last Order, so recency is only weakly associated with
sum revenue. The Pearson coefficient between Sum Revenue and Number of Orders
is 0.62946, which means Sum Revenue has a fairly strong positive relationship with Number of
Orders: more orders mean more revenue. That is why it is necessary to find a way to increase
the purchase rates of the large segments.
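For reference, the Pearson coefficient used in this correlation analysis can be computed as in the sketch below. The customer values are hypothetical and chosen only to show one positive and one negative relationship; they do not reproduce the reported coefficients.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical per-customer values, for illustration only.
number_of_orders = [1, 1, 2, 3, 5]
sum_revenue = [40.0, 55.0, 90.0, 160.0, 300.0]
days_since_last_order = [800, 700, 400, 200, 30]

r_orders = pearson_r(sum_revenue, number_of_orders)
r_recency = pearson_r(sum_revenue, days_since_last_order)
print(f"orders: {r_orders:.3f}, recency: {r_recency:.3f}")
```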
From the analysis, several key findings were identified:
1. People purchase the most from Staples in Quarter 3 (July, August, and September);
2. Most orders were paid by Credit Card;
3. Nearly 2/3 of the orders and the revenue came from catalog orders;
4. The regression model, using the independent variables quantity, web order, credit card, and
quarter 3, explains 56.78% of the variance in order revenue;
5. Customers are segmented into 5 clusters, and each cluster has unique characteristics;
6. For each customer, Revenue has a strong positive relationship with the number of orders
placed;
7. The top 11.36% of customers (High level customers) generated nearly 40% of revenue.
e. Analysis Procedure - External Data Analysis
In order to craft and execute a new marketing campaign for Staples, a target location was
first selected. New York City is one of the largest cities in the world. In effect, it serves as one of
Staples’ largest target markets. In order to understand the 5 boroughs of New York City
demographically, New York City ZIP code data was collected from Census.gov. Variables
extracted from the Census American Fact Finder portal include: (by ZIP code) population, total
number of households, total number of households that are families, number of students enrolled
in Pre-K through 12th grade, median household income, and various other educational attainment
and income demographics. The primary objective of this external data analysis was to ascertain
what New York City looks like in terms of our target market, on a micro-level by ZIP code.
Overall the data shows that, on average, the 5 boroughs of NYC fall well above the
national average in terms of income, education level, and student enrollment. As seen in Figure
25, the average population for a ZIP code in NYC is 46,747. Compared to the US average of
7,034, this is extremely high. Moreover, an average ZIP code has 45,830 households and 80% of
those (36,909) are family households. The number of students enrolled in school (Pre-K through
12th grade) is 11,682 in an average ZIP code in NYC, compared to the national average of only 1,374.
Median household income is $65,458.64; the national average is $53,482. In the average NYC ZIP
code, 30.06% of residents have at least a Bachelor’s degree; the national average is 21.23%.
Based on these figures, New York City is an attractive urban area for Staples to launch a
campaign. If a campaign can work in an area as well-suited to the target market as NYC, it can
be scaled to other urban areas.
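The comparison with national averages can be expressed as simple ratios. The sketch below uses only the per-ZIP-code figures quoted in this section; the dictionary keys and function name are hypothetical labels.

```python
# Averages per ZIP code as quoted in the text: (NYC, national).
averages = {
    "population": (46_747, 7_034),
    "students_prek_12": (11_682, 1_374),
    "median_household_income": (65_458.64, 53_482),
    "pct_bachelors_or_higher": (30.06, 21.23),
}


def ratio_to_national(nyc_value, national_value):
    """How many times larger the NYC figure is than the national figure."""
    return nyc_value / national_value


ratios = {name: ratio_to_national(nyc, us) for name, (nyc, us) in averages.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the national average")
```

The population and enrollment ratios are counts per ZIP code, so they partly reflect how much more populous NYC ZIP codes are; the income and education ratios are the more like-for-like comparisons.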
In order for Staples to effectively launch a marketing campaign, the company first needs
to understand the geography in terms of demographic characteristics and behavior. One analysis
that will benefit Staples is the rate of students in secondary school. Given the firm’s large
product portfolio of school supplies, Staples should understand where to target campaigns geared
toward students. Staples needs to understand how income and education levels affect the number
of students enrolled in secondary school. Figure 29 shows the relationship between the number
of students enrolled in Pre-K through 12th grade and Median Household Income. As Median
Household Income increases, the number of students enrolled in school decreases. Since Staples
needs to find ZIP codes highly concentrated with students, it may be beneficial to consider
targeting lower-income areas specifically and to offer more value-focused products. Relatedly,
Figure 30 shows that as the poverty level of a ZIP code increases, student enrollment increases.
There is also a negative relationship between student enrollment and percent of the population
with a Bachelor’s degree or more (Figure 31). As the population becomes more affluent and
educated, student enrollment tends to decrease.
The overall demographic characteristics mentioned previously suggest that this
strategy will be well-piloted in New York City given the large population characterized by high
income and education. However, the data also shows that the relationships of income and
education with student enrollment are not in Staples’ favor. For example, as income and education
increase, the number of young students decreases. Ideally, Staples would like to choose areas
where high income and education coincide with higher levels of student enrollment. Essentially,
these ideal areas will be full of wealthy families.
Now, Staples needs to understand how the ZIP codes can be grouped together according to
these traits. With this, the firm will be able to focus in on two location-based strategies:
1. Find areas that do not fit the norm – ZIP codes characterized by high income, education,
and student enrollment
a. These ZIP codes will be easy to target – Staples can emphasize high-quality
products and charge a premium for sustainability
2. Find areas that may be in sync with the averages, but identify unique strategies that will
encourage these individuals to take action
a. These ZIP codes will be trickier to target – Staples will need to offer competitive
prices that compel price-sensitive families to use these kiosks
Since one large target market for Staples is families with young children who frequently
shop for school supplies, ZIP codes were clustered in SAS using K-Means by three variables:
number of students enrolled in Pre-K through 12th grade, median household income, and percent
of the population with a Bachelor’s degree or more. As seen in Figure 32, four distinct clusters
were identified based on student enrollment, income, and education.
Cluster 1, containing 32 NYC ZIP codes, has on average 3,190 students enrolled in Pre-K
through 12th grade, a median household income of $109,016.81, and 61.45% of the adult
population has a Bachelor’s degree or higher. Cluster 2 (the largest cluster, with 80 ZIP codes)
has a much higher number of students enrolled in school: 11,583.54 on average. This
cluster’s median household income is $40,160.88, and 18.10% of its population on average has a
Bachelor’s degree or higher. Cluster 3 comprises only 2 ZIP codes – it is characterized by low
student enrollment (1,107.50), extremely high income ($247,778.50) and high educational
attainment (60.77%). Finally, Cluster 4 comprises 65 ZIP codes. This group on average has
5,906 students enrolled in school, a median income of $69,540.49 and only 28.38% of
its population on average having a Bachelor’s degree or more.
Cluster 1 – Wealthy Families
Overall, these descriptive statistics provide insights on the demographics of these clusters
as a whole. Cluster 1 has high income, high educational attainment, and student enrollment that
is above the national average, though below the NYC average. Cluster 1 is likely willing to pay a
premium for environmentally friendly products (given that they are financially stable and
educated), and they have a decent number of students enrolled. These are Staples’ “Wealthy Families”.
Cluster 2 – Lower Middle Class Families
Cluster 2 contrasts sharply with Cluster 1. Cluster 2 is characterized by extremely high
student enrollment rates, low income, and low education. Cluster 2 represents our “Lower
Middle Class Families”. They likely look for value when buying school supplies for their
children – they are on a budget and working hard to provide for their families. Offering discounts
will help alleviate some of the risk they associate with purchases.
Cluster 3 - Downtown Manhattanites
Cluster 3 can be classified as outliers. These two ZIP codes have low student enrollment,
extremely high income and tend to be very well-educated. They are geographically situated
downtown in TriBeCa, an area famous for affluent celebrities and executives. This group
represents our “Downtown Manhattanites” who have been successful professionally, but haven’t
focused on family as much. This cluster will respond better to high-end products that may not be
family related – but more suitable for professional use.
Cluster 4 – Upper Middle Class Families
Finally, Cluster 4 is our middle-of-the-road cluster in all aspects. They have average
student enrollment, average income, and average educational attainment levels. These ZIP codes
contain working professionals who also have families. Cluster 4, our “Upper Middle Class
Families”, will be interested in products for their children that are reasonably priced, but they
also place value on the look and feel. This group of working professionals may also be receptive
to the idea of sustainable products.
Figure 37 shows geographically where each of these distinct groups of ZIP codes is
located. As described above, each group will require a different marketing strategy aimed at
offering a value proposition that is suitable to their needs.
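The four cluster profiles reported above can be collected into a small data structure and ranked by student enrollment, the criterion used to single out the most attractive segments. The class and field names are hypothetical; the figures are the ones quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZipCluster:
    cluster_id: int
    label: str
    zip_codes: int
    avg_students: float      # average Pre-K through 12th grade enrollment
    median_income: float     # median household income, USD
    pct_bachelors: float     # percent with a Bachelor's degree or higher


clusters = [
    ZipCluster(1, "Wealthy Families", 32, 3190.0, 109016.81, 61.45),
    ZipCluster(2, "Lower Middle Class Families", 80, 11583.54, 40160.88, 18.10),
    ZipCluster(3, "Downtown Manhattanites", 2, 1107.50, 247778.50, 60.77),
    ZipCluster(4, "Upper Middle Class Families", 65, 5906.0, 69540.49, 28.38),
]

# Rank clusters from highest to lowest average student enrollment.
by_enrollment = sorted(clusters, key=lambda c: c.avg_students, reverse=True)
enrollment_ranking = [c.cluster_id for c in by_enrollment]
total_zip_codes = sum(c.zip_codes for c in clusters)

for c in by_enrollment:
    print(f"Cluster {c.cluster_id} ({c.label}): {c.avg_students:,.0f} students, "
          f"{c.zip_codes} ZIP codes")
```

Clusters 2 and 4 come out on top, and together they cover 145 of the 179 clustered ZIP codes.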
Based on the results of analysis on internal data, external data, and secondary industry
reports it is recommended that Staples offer their products in vending machines. The vending
machines should first be rolled out in New York City because of NYC’s diverse population,
willingness to care for the environment, and willingness to try new things. The recycling
receptacle and inclusion of green products will encourage already environmentally conscious
New Yorkers to use the vending machines to satisfy their office and school supplies needs and
will promote additional eco-friendly behavior from the rest of the population. Staples currently
fulfills the majority of their customers’ orders through their distribution network. However, the
vending machines will make accessing Staples products even more easy and convenient.
The new vending machine campaign will increase both sales and customer loyalty by
addressing the needs of the targeted segment. This segment can be described as educated
families who, based on their decision to live in a large city, value convenience and, according to
our research, care about the environment. The machines will be located in ZIP codes whose
populations fit our desired profile and in close proximity to the schools the children attend. This
will eliminate the need to travel to one of Staples’ costly brick-and-mortar locations and give
easy access to school supplies on the way to and from school each day.
Effective implementation of the vending machines will help Staples reach its goals of
improving customer experience, increasing share of wallet, developing its multi-channel
approach, and increasing its large product offering.
Phase 2
a. Summary
Based on the results of analysis on internal data, external data, and secondary industry
reports, it is recommended that Staples offer its products in vending machines. The vending
machines should first be rolled out in New York City because of NYC’s diverse population,
willingness to care for the environment, and willingness to try new things. The recycling
receptacle and inclusion of green products will encourage already environmentally conscious
New Yorkers to use the vending machines to satisfy their office and school supplies needs and
will promote additional eco-friendly behavior from the rest of the population.
The new vending machine campaign will increase both sales and customer loyalty by
addressing the needs of the segments we are targeting. This segment can be described as
educated families who, based on their decision to live in a large city, value convenience, and,
according to our research, care about the environment. The machines will be located in zip
codes whose populations fit our desired profile and in close proximity to the schools the children
attend. This will eliminate the need to travel to one of Staples’ costly brick and mortar locations
and give easy access to school supplies on the way to and from school each day.
Effective implementation of the vending machines will help Staples reach its goals of
improving customer experience, increasing share of wallet, developing its multi-channel
approach, and increasing its large product offering.
Ultimately, it is recommended that Staples roll out the vending machines because they
allow Staples to reduce its retail space and maintain its multi-channel approach and large product
offering. The vending machines should be rolled out in the appropriate New York City ZIP codes
(based on external data analysis) as well as near schools and shopping malls.
b. Company Overview
Staples, Inc. is a brick-and-mortar and online retailer that provides an assortment of office
products to businesses and individual consumers. The company offers a lowest price guarantee
and emphasizes convenience: consumers can purchase in store, online, via a mobile device, or
through social apps. Staples has in-store business centers that offer shipping, copying,
scanning, faxing, computer workstations, tech services, printing and marketing services, small
business lending, and credit services (“Staples and Office Depot”, 2016).
Staples Business Advantage is the B2B (business-to-business) division of the company.
This division helps business customers make purchases for their company in a curated way with
customer service, competitive pricing, and an e-commerce site designed for B2B purchasing.
Product offerings within this division include: office supplies, facilities cleaning and
maintenance, breakroom snacks, furniture, and printing and marketing services (“Staples and
Office Depot”, 2016).
Staples falls within the office supplies and stationery stores industry. This industry is made
up of retail stores that engage in one or more of the following (“OFFICE SUPPLIES &
STATIONERY STORES INDUSTRY”, 2016):
● Retailing new stationery, school supplies, and office supplies
● Selling a combination of new office equipment, furniture, and supplies
● Selling new office equipment, furniture, and supplies in combination with selling
computers
Being a leading supplier of office supplies across multiple countries, Staples must cater to
the demands of several different customer bases within the industry. In order to do this, Staples
maintains various retail channels: contract businesses, retail stores, catalogs, and the Web. Major
strengths for Staples internally include its multi-channel approach, its large product portfolio,
and its strong distribution network. Staples’ main weakness is its dependence on third-party
vendors. Opportunities in the industry include growth in online retail sales, private label
brand growth, and a focus on cost control in order to increase profits. Threats to the
industry include increased competition from major online retailers such as
Amazon, as well as a decline in paper consumption in offices (Staples, Inc., 2014). As of 2016,
there were 1,450 retail outlets in the United States that fall into this industry. Industry sales totaled
$16,254,000,000 ($2,309,000 per establishment) (“OFFICE SUPPLIES & STATIONERY
STORES INDUSTRY”, 2016).
In order to develop a strategy to improve Staples’ bottom line within a niche target
market, Staples’ strengths should be further examined. Staples has proved to be the best in the
industry at two things:
1. Utilizing multiple retail channels in order to target end-customers in both the B2C and
B2B space
2. Maintaining a large product and services portfolio (Staples, Inc., 2014)
Staples uses multiple channels such as in-store and Web in order to cater to the demands of
its multi-faceted customer base. Staples uses its contract business to target medium to large sized
businesses and offer them special services such as account management, delivery, proprietary
items stocking, and a wide assortment of environmentally friendly products and services. Its
online and retail stores are crafted to satisfy individual consumers on a geographic basis. For
instance, in suburban areas Staples offers large supercenters. In more urban and rural markets,
the firm operates smaller format stores. Staples’ most dominant strategy, therefore, is to leverage
its retail channels in order to attract different customer groups with distinct purchasing behaviors
(Staples, Inc., 2014).
By focusing on cost control, Staples can leverage another key opportunity. Currently, the
firm is focusing on considerably reducing its costs. In 2012, Staples set a goal of achieving $250
million in annual pre-tax savings by 2015. Its plan has been to save in areas like product cost,
store operations, and supply chain. In order to achieve this, the firm has started to reduce
retail space in North America by 15% (Staples, Inc., 2014).
c. Market Overview
A major opportunity within the office supplies industry is the rapid growth of online retail.
Consumers increasingly prefer online shopping because of its interactivity.
The U.S. Department of Commerce reported that online retail sales grew 16.1% between
2010 and 2013, while in-store sales grew by only 4.3%. Currently, Staples drives $10 billion in
sales yearly from its online channel (Staples, Inc., 2014).
The digital age has led the office supply industry to become much less powerful than it
once was. With most office work shifting online, paper and printer ink are becoming nearly
obsolete. Staples competes with online and traditional retailers focused on the office supply
industry (Office Depot) as well as broader mass merchants (Wal-Mart, Target, and Amazon). This has
led to consolidation within the industry (Staples, Inc., 2014).
Currently, Office Depot and Staples are fighting to merge. The
companies argue that the merger will leave them better positioned to serve the “changing
needs of business customers” and to compete against larger, more diverse competitors like
Amazon. The FTC is trying to stop the merger out of concern about monopolization in the industry.
However, with e-commerce sites such as Amazon expanding so rapidly, the two firms feel a
merger is the only way to compete (“Staples and Office Depot”, 2016).
d. Opportunity
Secondary reports indicate that Staples is looking to reduce retail space and store
operations costs while still maintaining its multi-channel approach and its large product portfolio.
Based on internal and external research combined with an understanding of the current industry
and market environment, these goals can be achieved with the implementation of Staples
vending machines. The project will initially be rolled out in the five boroughs of New York City,
and will include a receptacle to recycle used Staples products. New York City is the United
States’ 6th greenest city, and the recycling aspect of the new sales channel will encourage
eco-friendly behavior and draw new users to the machines (NerdWallet, 2016). The contents of the
vending machines will include Staples’ “recycled and eco-friendly” product line as well as other
high performing office and school supplies products. The strategic placement of the vending
machines throughout the five boroughs will be based on analysis of ZIP code level demographic
data as well as the spatial distribution of schools and shopping malls. The successful
implementation of the vending machines will enable Staples to eliminate retail space while still
offering multiple purchase channels and a diverse portfolio of products.
New York City is one of the largest cities in the world, with a rich culture and history. The
population of NYC is approximately 8,491,079, and 26% of the total population is under 18
years of age. There are 3,095,931 households in total, and the average household size is 3 people.
In 2012, total retail sales per capita were $11,067. The median household income across
New York City stands at $53,657, according to 2014 figures from Department of Numbers. The per capita
income is $33,095 (Census Data).
In New York City there are more than 26,000 people living in each square mile. It takes
75,000 trees to print a Sunday edition of the New York Times. New York City has more people
than 39 of the 50 states in the U.S. The borough of Brooklyn on its own would be the fourth
largest city in the United States. Queens would also rank fourth nationally. Manhattan’s daytime
population swells to 3.94 million, with commuters adding a net 1.34 million people
(Bigapple.com).
The New York City public school system is the largest in the world. New York State’s
policy is to provide language access to public services and programs. More than 1.1 million
students are taught in more than 1,700 public schools with a budget of nearly $25 billion. NYC
spends $19,076 each year per student. The public school system is managed by the New York
City Department of Education. In addition, there are approximately 900
privately run secular and religious schools in the city (Schools NYC).
Staples needs to take full advantage of this population by offering the products that it
needs. Consumers are expected to spend about $68 billion on back-to-school shopping this year,
compared to $75 billion last year (Fortune.com, 2016). This is a 9.3% decrease that is part of
a larger trend brought on by increasing dependence on technology. Stocking school supply
staples and placing the vending machines in areas near schools and populations of young
students will help Staples claim a larger share of this market and potentially bring spending on
school supplies back up.
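The 9.3% decrease follows directly from the two spending totals cited. A minimal sketch of the arithmetic, written in Python purely for illustration (the function name `percent_change` is a hypothetical helper and is not part of the original analysis):

```python
def percent_change(previous: float, current: float) -> float:
    """Percentage change from previous to current (negative means a decrease)."""
    return (current - previous) / previous * 100.0


# Back-to-school spending in billions of dollars, as cited in the text.
last_year = 75.0
this_year = 68.0

change = percent_change(last_year, this_year)
print(f"Change in back-to-school spending: {change:.1f}%")
```

The same helper could be reused for any other year-over-year comparison in the report, such as the change in total sales.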
The median family income for NYC was $71,115 in 2014, according to NYC.gov. New York
is home to 653,000 households with at least one child under the age of 18. A family of four
in Manhattan could spend an average of $93,500, which would cover only food, transport,
housing, health care, child care, and taxes but not vacations, eating out, or savings (NY Daily
News). Total household expenditures in New York City are above the national average, and
education is no exception.
In addition to being a media and commercial hub, New York City is home to residents
who are ready and willing to care for the environment. Based on the findings of a 2014 poll,
New York City was named the 6th greenest city in the United States (Pew Research Center,
2015). Factors including willingness to use public transportation, support for restrictions on
pollution, and willingness to recycle contributed to the rankings. Every day in New York City,
2.0 pounds of paper/cardboard recycling and 0.21 pounds of metal/glass recycling are collected
per resident (New York City Municipal Refuse and Recycling
Statistics, 2015). These are well above the respective daily national averages of 0.09 and 0.05
pounds per person (Municipal Solid Waste, 2015). The recycling receptacle and inclusion of
green products will encourage already environmentally conscious New Yorkers to use the
vending machines to satisfy their office and school supplies needs and will promote additional
eco-friendly behavior from the rest of the population.
Staples must take the secondary report research and internal and external data analysis
into account in order for the new vending machines to achieve all the company’s goals. These
goals include reducing its retail space, maintaining its multi-channel approach and large product
portfolio, increasing share of wallet with existing customers, and taking full advantage of New
York’s large and unique market.
We currently fulfill the majority of customers’ orders through our distribution network.
As we expand our assortment, we are increasingly relying on third parties to fulfill orders and
deliver products directly to our customers. However, we want to keep strengthening our relationship
with customers by understanding them better, supporting them with products specific to their
needs, and making it easier to access our products.
The new vending machine campaign will increase both sales and customer loyalty by
addressing the needs of the segments we are targeting. This segment can be described as
educated families who, based on their decision to live in a large city, value convenience, and,
according to our research, care about the environment. The machines will be located in zip
codes whose populations fit our desired profile and in close proximity to the schools the children
attend. This will eliminate the need to travel to one of Staples’ costly brick and mortar locations
and give easy access to school supplies on the way to and from school each day.
Citizens who can afford to are more likely to buy eco-friendly products than those who
cannot. Because the segment we are targeting is in a high income and education bracket, and
because New York is known to have citizens who care about the environment, the recycling
aspect and inclusion of green products will contribute to the success of the Staples Vending
Machines.
Staples is strategically looking for ways to reduce its number of stores while still
retaining customers and increasing customer lifetime value. As the firm looks to cut costs,
vending machines provide an attractive solution given that they are less expensive to operate in
the long term (Success with Self-Serve Kiosks, 2010). Retailers in multiple industries have seen
the benefit of using self-service machines in order to reduce operating costs. In 2010, the market
for these machines was $3.2 billion.
Retail vending machines (or self-service kiosks) provide several key benefits to large
corporations looking to increase sales without the burden of large store-fronts. Among the firms
that have utilized self-service kiosks for product sales are Macy’s, Best Buy, Proactiv, Benefit
Cosmetics, and Nespresso. Macy’s, for example, understood that its customers placed value on
one-stop shopping. As a result, the retailer launched e-Spot, an automated shop offering
consumer electronics with touchscreen technology for handling sales transactions. Brands inside
Macy’s e-Spot include Apple, Beats, Skullcandy, and iHome. Prices range from $24.99 to
$599.99. A transaction is completed in less than two minutes. Customers have given very positive
feedback to Macy’s; they enjoy the no-pressure experience that it provides. Today’s shoppers
value ease of use and instant gratification (Zoom Systems, 2016).
Best Buy was the first retailer to launch an automated retail experience in the consumer
electronics market. The firm introduced the Best Buy Express ZoomShop in 2008. The
machines help consumers stay connected by offering products such as digital cameras,
headphones, phone chargers, and other travel gadgets. Best Buy is known to place these
machines in airports (Zoom Systems, 2016).
Macy’s and Best Buy both understood a value proposition their customers would respond to
well: self-service and instant shopping. For targeting urban families with school-aged
children during the busy back-to-school season as well as the off-season, Staples self-service kiosks
are an obvious solution. Not only will these automated machines (placed strategically throughout
the five boroughs of NYC) help Staples reduce the number of physical stores in the area, but they
will also provide customers with an interesting, convenient new way to shop.
e. Target Customer
The segment we will target as a marketing priority is families in the five boroughs
of New York City with children enrolled in secondary school. Later, we plan to expand the
target market to households across North America.
New York City’s above-average population, income levels, education levels, and student
enrollment make it a great springboard location for the campaign. According to Staples’ reports,
its B2B sales increased by 1.1% in 2015 while total sales declined by 6.4%. Therefore, it is
necessary to focus on acquisition and retention of household customers. Households make up a
large share of Staples’ sales, and they spend billions of dollars a year on the products and
services Staples sells. Targeting households with young children will support this effort and
drive sales, growth, and profit for the company. Specifically, we divide our target customers
in New York City into three groups:
Wealthy Families
Households with high income and high educational levels that are located in areas of
above average student enrollment are our primary target customers. Since environmentalism is
by now deeply rooted in the consumer mind-set and public-policy arena, it will make sense to
market Staples school supplies to affluent and well educated families because they are more
likely to embrace the idea of green and sustainable products. They will pay a premium for the
chance to do well and, in many cases, be seen doing well.
Upper Middle Class Families
Families with average income and average educational attainment levels in areas of
average student enrollment are our second target market. Compared with households with high
income and educational levels, they are not willing to pay extra for expensive school supplies,
but they are still influenced by green consumerism, which will eventually persuade them to
use the vending machines. They focus more on the practical value than the price of the product.
Lower Middle Class Families
Families with low income and low education in areas of extremely high student
enrollment will make up our third targeted segment. They are on a budget and are working hard
to provide for their children, and they make up a large population in New York City. This means
they have high demand for school products for their children. They are price sensitive and are
not very interested in green products. Hence, for this target market, Staples can cooperate with
schools and offer a program targeted at parents and teachers. By offering reward points and
discounts, Staples can win the favor of lower middle class families.
Value Proposition
We strongly believe in delivering school products that generate environmental benefits.
As a world-class retailer, Staples will let customers shop however and whenever they want,
whether in store, online, at a vending machine, or on mobile devices. We will also provide
our customers with the most sustainable products, improving our offering of recycling and green
services.
f. Goals
Reduce costs and maximize profits
The digital age has led the office supplies industry to become much less powerful than it
once was. With most office work shifting online, paper and printer ink are becoming nearly
obsolete. Although the company remains profitable, Staples will struggle to survive
in the coming years. Thus, our primary goal in 2016 is to reduce expenses, maximize profit
from existing customers, and acquire new customers. We aim to grow sales by 5% within the
next 2 years. We also strive to obtain 45% of the market share within 2 years. As we improve
customer satisfaction and gradually expand the offerings through vending machines, we expect a
50% market share within 5 years.
Develop multi-channel marketing
While Staples’ brick-and-mortar stores are losing money, we aim to shift focus
towards a multichannel approach so as to leverage the existing stores and maximize profit. By
launching vending machines outside Staples’ retail space, or a new type of smaller store that engages
customers with interactive kiosks, we are striving to drive more sales to both online and offline
stores.
Green and recycling product
From the environmental perspective, we are devoted to selling more sustainable products
and services. We will continue to improve sourcing, identification and the promotion of greener
products to customers while at the same time offering easy recycling solutions. For example, by
2020, we aim to recycle 100 million paper, ink and toner cartridges each year across all
operations, especially from the vending machine. In sum, our goal is to make more sustainable
business practices happen.
Provide the best customer experience
Customers are of great importance to us. Therefore, we are making efforts to provide our
customers with an optimal, customized in-store and online experience and to help them find the best
deals and the right products quickly. Our objective is to let customers shop however and
whenever they want, and ultimately to earn their satisfaction and loyalty.
Perceptual Map
On the basis of our perceptual map, we see a need for a brand that is not
only easy to buy but also eco-friendly. Thus, the value proposition of our product could give us a
competitive edge over the other brands mentioned above.
Phase 3
a. Summary
Phase 3 of this marketing intelligence project involved identifying metrics and
developing a dashboard in order to track the success of this new campaign. Metrics and
dashboard implementation are crucial in evaluating any marketing program. Once established,
these metrics must be tracked on a regular basis and provided to management in the form of a
user-friendly dashboard.
Metrics were created based on the marketing team’s key goals: increase profits, promote
multichannel sales, increase sustainability awareness, and improve customer experience. In order
to track profits, the team developed measures to calculate: customer deciles, return on marketing
investment, payback period, profit margin, internal rate of return, and break-even point.
Multichannel sales success will be tracked through market penetration costs, acquisition costs,
and a sales funnel approach. This will inform the team on how well these vending machines are
performing. Sustainability awareness will be measured through customer awareness, and
customer satisfaction will be tracked by Net Promoter Score.
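Several of the profit and satisfaction metrics listed above have standard textbook definitions. The sketch below, in Python, shows one common way to compute them. The function names and all input values are hypothetical placeholders rather than Staples data, and the team's dashboard may define these measures differently.

```python
from typing import Optional, Sequence


def romi(incremental_gross_profit: float, marketing_spend: float) -> float:
    """Return on marketing investment: net return per dollar spent."""
    return (incremental_gross_profit - marketing_spend) / marketing_spend


def profit_margin(revenue: float, total_costs: float) -> float:
    """Profit as a fraction of revenue."""
    return (revenue - total_costs) / revenue


def break_even_units(fixed_costs: float, price: float, variable_cost: float) -> float:
    """Units that must be sold for contribution margin to cover fixed costs."""
    return fixed_costs / (price - variable_cost)


def payback_period(initial_outlay: float, cash_flows: Sequence[float]) -> Optional[float]:
    """Periods needed to recover the outlay, interpolating within the final period."""
    remaining = initial_outlay
    for period, flow in enumerate(cash_flows):
        if remaining <= 0:
            return float(period)
        if flow >= remaining:
            return period + remaining / flow
        remaining -= flow
    return None  # Outlay is not recovered within the given horizon.


def net_present_value(rate: float, cash_flows: Sequence[float]) -> float:
    """Discounted sum of cash flows, where cash_flows[0] occurs today."""
    return sum(flow / (1.0 + rate) ** t for t, flow in enumerate(cash_flows))


def internal_rate_of_return(cash_flows: Sequence[float], low: float = -0.99,
                            high: float = 10.0, tol: float = 1e-9) -> Optional[float]:
    """Rate at which NPV is zero, found by bisection; None if NPV has no sign change."""
    npv_low = net_present_value(low, cash_flows)
    if npv_low * net_present_value(high, cash_flows) > 0:
        return None
    while high - low > tol:
        mid = (low + high) / 2.0
        npv_mid = net_present_value(mid, cash_flows)
        if npv_low * npv_mid <= 0:
            high = mid
        else:
            low, npv_low = mid, npv_mid
    return (low + high) / 2.0


def net_promoter_score(ratings: Sequence[int]) -> float:
    """Percent promoters (9-10) minus percent detractors (0-6) on a 0-10 scale."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical inputs for a single vending machine; not Staples data.
print("ROMI:", romi(150_000, 100_000))
print("Payback (periods):", payback_period(10_000, [4_000, 4_000, 4_000]))
print("Break-even units:", break_even_units(10_000, 5.0, 3.0))
print("NPS:", net_promoter_score([10, 9, 8, 7, 6, 0, 10, 9, 3, 8]))
```

Keeping each metric as a small function makes it straightforward to recompute the dashboard values whenever new sales or survey data arrive.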
This marketing metrics model is also presented in the form of a flow chart and dashboard
so that management can more clearly visualize the ways in which this campaign will help Staples
reach its ultimate goal of increasing profits.
b. Metric Overview
Product Adoption Funnel
Customer Deciles
c. Marketing Model (Figure)
d. Dashboard
Limitations
The team acknowledges some limitations in the scope of this marketing intelligence project.
Primarily, this analytics project is strictly limited to New York City. External research was
conducted specifically with the five boroughs of New York City in mind, and all assumptions and
strategy decisions were based on this research. While it can be assumed that a project successful in one
urban area can scale to other urban areas, additional research will need to be conducted to get a
better understanding of these different markets.
Another limitation of this analysis lies in our inability to connect our external market research
with our internal customer dataset. Because external market research is on a Census level, the team is
unable to identify the percentage of individuals that actually belong to Staples’ customer base. A stronger
marketing campaign could be developed if it were known which individuals in the target market were
current customers and which Staples was looking to acquire as new customers.
In terms of internal data analysis, revenue, number of orders and recency were the only attributes
measured for each customer. However, customers can interact with and create value for firms in a
variety of ways including customer lifetime value, customer referral value, customer influencer value, as
well as customer knowledge value (Kumar, 2010). Past purchasing behavior is only one input for
predicting future behavior. Future customer behavior will also be influenced by things such as a
customer’s number of connections to other customers, the emotional valence of the customer’s reviews, and
willingness to recommend. Including additional variables would help build a more accurate model of customer
behavior.
Works Cited
OFFICE SUPPLIES & STATIONERY STORES INDUSTRY (NAICS 45321). (2016). World
Industry & Market Outlook Report, 1-166.
Staples and Office Depot Issue Open Letter to Customers. (March 18, 2016).
http://investor.staples.com/phoenix.zhtml?c=96244&p=irol-newsArticle&ID=2149502
Q4 2015 Finances. http://finance.yahoo.com/news/staples-inc-announces-fourth-quarter-110000503.html
Staples Performance Summary 2012-2014.
http://www.staples.com/sbd/cre/marketing/about_us/documents/globalperfsummary-2015.pdf
Will 2016 Be Staples, Inc.’s Worst Year Yet? (January 19, 2016).
http://www.fool.com/investing/general/2016/01/19/will-2016-be-stapless-bestworst-year-yet.aspx
Staples, Inc. SWOT Analysis. (2014). Staples, Inc. SWOT Analysis, 1-8
Zoom Systems. (2016). http://www.zoomsystems.com/our-clients/best-buy
Success with Self-Serve Kiosks. (2010). Specialty Retail Report.
http://specialtyretail.com/issue/2010/01/retail-products/success-with-self-serve-kiosks/
Appendix A: Tables and Figures
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7 Multiple Linear Regression Model on Real Revenue
Figure 8 K-means Cluster Analysis Initial Seed Centers
Figure 9 Means of K-mean Cluster
Figure 10 Descriptive Statistics for each Cluster
Figure 11
Figure 12
Figure 13
Figure 14
Figure 15 Numbers for each Customer Level
Figure 16
Figure 17: Correlation between Sum Revenue and Days since Last Order
Figure 18: Correlation between Sum Revenue and Number of Order
Figure 19
Figure 20
Figure 21
Figure 22
Figure 23
Figure 24
Figure 25
NYC Census Data: Descriptive Statistics
Figure 26
NYC Census Data: Count of Students Enrolled in Pre-K through 12th Grade by ZIP code
Figure 27
NYC Census Data: Count of Median Household Income Range by ZIP Code
Figure 28
NYC Census Data: Count of Population with a Bachelor or More by ZIP Code
Figure 29
NYC Census Data: How Does Median Household Income Affect Secondary School
Enrollment by ZIP Code
Figure 30
NYC Census Data: How Does the Count of People Below the Poverty Level Affect Secondary School
Enrollment by ZIP Code?
Figure 31
NYC Census Data: How Does a ZIP Code’s Percent of People with a Bachelors or More Affect
Secondary School Enrollment by ZIP Code?
Figure 32
K-Means Cluster Analysis: Final Cluster Means
Figure 33
K-Means Cluster Analysis: Clusters Broken Down by Median Income and Student
Enrollment in Pre-K through 12
Figure 34
Distribution of Median Income by Cluster
Figure 35
Distribution of Secondary School Enrollment by Cluster
Figure 36
Distribution of Educational Attainment by Cluster
Figure 37
Appendix B: SAS Code
I. Internal Analysis
/********** internal data analysis ***************/
/* Generated Code (IMPORT) */
/* Source File: Data Set 8 DMEF0509-2 - EXCEL.xlsx */
/* Source Path: /folders/myfolders/Mylib */
/* Code generated on: Thursday, April 14, 2016 18:30:00PM */
/* Import dataset named Import */;
PROC IMPORT DATAFILE="C:\SAS\Data Set 8 DMEF0509-2 - EXCEL.xlsx" dbms=xlsx
OUT=IMPORT REPLACE;
RUN;
/* DATA Part */;
/* Create and alter new datasets */;
/* Create a new dataset named MODIFY from IMPORT */;
DATA MODIFY;
SET IMPORT;
/* Create a new variable named MonthNumber indicating the month of each order, then change the date
format to mmddyy10., and calculate RealRevenue for each order depending on whether it falls on or after "01/25/2007" */;
MonthNumber=month(InputDate);
FORMAT InputDate mmddyy10.;
IF InputDate ge input("01/25/2007",mmddyy10.) THEN
RealRevenue=GrossProductRevenueAmount+ShippingHandling+SalesTax;
ELSE
RealRevenue=GrossProductRevenueAmount-CancelAmount-ReturnedAmount+CouponAmount-
AdditionalChargesAmount;
/* Create a new variable named Quarter based on Monthnumber */;
IF MonthNumber IN(1,2,3) THEN Quarter=4;
ELSE IF MonthNumber IN(4,5,6) THEN Quarter=1;
ELSE IF MonthNumber IN(7,8,9) THEN Quarter=2;
ELSE Quarter=3;
/* Create a new variable named PaymentMethod based on PaymentCategoryCode */;
IF PaymentCategoryCode=1 THEN PaymentMethod="Cash";
ELSE IF PaymentCategoryCode=2 THEN PaymentMethod="Credit Card";
ELSE IF PaymentCategoryCode=3 THEN PaymentMethod="Debit Card";
ELSE PaymentMethod="Coupon/GiftCard";
/* Create a new variable named Quantity indicating purchase quantity of each order */;
IF CatalogItemIndicator="Y" THEN Quantity=CatalogItemQuantity;
ELSE Quantity=WebitemQuantity;
/* Create a new variable named CatalogOrder indicating if the order was Catalog order. 1 is Yes; 0 is
No */;
IF CatalogItemIndicator="Y" THEN CatalogOrder=1;
ELSE CatalogOrder=0;
/* Create a new variable named WebOrder indicating if the order was Web order. 1 is Yes; 0 is No
*/;
IF WebItemIndicator="Y" THEN WebOrder=1;
ELSE WebOrder=0;
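/* Create a new variable named CreditCard indicating if the order was purchased by a credit card. 1
is Yes; 0 is No. PaymentMethod takes its length (4) from its first assignment, "Cash", so
"Credit Card" is stored as "Cred" */;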
IF PaymentMethod="Cred" THEN CreditCard=1;
ELSE CreditCard=0;
/* Create a new variable named Quarter3 indicating if the order was in Quarter3. 1 is Yes; 0 is No */;
IF Quarter=3 THEN Quarter3=1;
ELSE Quarter3=0;
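/* Create a new variable named OrderIndicator indicating the channel through which the order was placed.
Its length (6) comes from its first assignment, "Others", so "Catalog" is stored as "Catalo" */;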
IF CatalogItemIndicator="N" AND WebItemIndicator="N" THEN OrderIndicator="Others";
ELSE IF CatalogItemIndicator="Y" THEN OrderIndicator="Catalog";
ELSE OrderIndicator="Web";
RUN;
/* Create a new dataset named AGGREGATE from MODIFY, keep ID, Inputdate, RealRevenue and
PaymentMethod variables, and sort by ID */;
/* Sort the dataset MODIFY by ID and Inputedate */;
PROC SORT DATA=MODIFY;
BY ID Inputdate;
DATA AGGREGATE;
SET MODIFY(KEEP=ID Inputdate RealRevenue PaymentMethod);
BY ID;
/* Count the number of orders for each customer ID */;
DO;
IF FIRST.ID THEN NumberofOrder=1;
ELSE NumberofOrder+1;
END;
/* Aggregate the Revenue by each Customer ID */;
DO;
IF FIRST.ID THEN SumRevenue=RealRevenue;
ELSE SumRevenue+RealRevenue;
END;
IF LAST.ID;
DROP RealRevenue;
/* ADD the last day "04/30/2009" */;
LASTDAY=Input("04/30/2009", mmddyy10.);
FORMAT LASTDAY mmddyy10.;
/* Calculate the days since last order for each customer */;
DaysSinceLastOrder=LASTDAY-InputDate;
RENAME PaymentMethod=LastPaymentMethod;
RUN;
/* Alter the dataset AGGREGATE: add the mean and standard deviation of SumRevenue, compute
X1=Mean+STD and X2=Mean-STD, and then classify each customer */;
DATA AGGREGATE;
SET AGGREGATE;
MERGEVAL=1;
RUN;
PROC MEANS DATA=AGGREGATE NOPRINT;
VAR SumRevenue;
OUTPUT OUT=MEANS1;
RUN;
DATA STDDEV1;
SET MEANS1;
IF _STAT_='STD';
MERGEVAL=1;
KEEP SumRevenue MERGEVAL;
RENAME SumRevenue = STD_SumRevenue;
RUN;
DATA MEANS1;
SET MEANS1;
IF _STAT_='MEAN';
MERGEVAL=1;
KEEP SumRevenue MERGEVAL;
RENAME SumRevenue = MEAN_SumRevenue;
DATA AGGREGATE;
MERGE AGGREGATE MEANS1 STDDEV1;
BY MERGEVAL;
DROP MERGEVAL;
RUN;
DATA AGGREGATE;
SET AGGREGATE;
X1 = Mean_SumRevenue+STD_SumRevenue;
X2 = Mean_SumRevenue-STD_SumRevenue;
IF SumRevenue gt X1 THEN Customertype = "High quality customers";
Else IF SumRevenue lt X2 THEN Customertype = "Low quality customers";
Else Customertype = "Avg quality customers";
DROP X1 X2;
Run;
/* Alter the dataset AGGREGATE, order data sequence by Sumrevenue, and then slice customers
into Deciles by SumRevenue */;
PROC SORT DATA=AGGREGATE;
BY DESCENDING SumRevenue;
DATA AGGREGATE;
SET AGGREGATE;
BY DESCENDING SumRevenue;
/* Declare the length so that "90-100%" (7 characters) is not truncated */;
LENGTH CustomerDecilesbySumRevenue $7;
Number=_N_;
IF 1 LE Number LE 1000 THEN CustomerDecilesbySumRevenue="0-10%";
ELSE IF 1001 LE Number LE 2000 THEN CustomerDecilesbySumRevenue="10-20%";
ELSE IF 2001 LE Number LE 3000 THEN CustomerDecilesbySumRevenue="20-30%";
ELSE IF 3001 LE Number LE 4000 THEN CustomerDecilesbySumRevenue="30-40%";
ELSE IF 4001 LE Number LE 5000 THEN CustomerDecilesbySumRevenue="40-50%";
ELSE IF 5001 LE Number LE 6000 THEN CustomerDecilesbySumRevenue="50-60%";
ELSE IF 6001 LE Number LE 7000 THEN CustomerDecilesbySumRevenue="60-70%";
ELSE IF 7001 LE Number LE 8000 THEN CustomerDecilesbySumRevenue="70-80%";
ELSE IF 8001 LE Number LE 9000 THEN CustomerDecilesbySumRevenue="80-90%";
ELSE CustomerDecilesbySumRevenue="90-100%";
RUN;
/* Create a new dataset Named DECILES from dataset AGGREGATE to calculate sum revenue for
each decile and percentage of sum revenue by total revenue for each decile */;
DATA DECILES;
SET AGGREGATE(KEEP=ID SumRevenue CustomerDecilesbySumRevenue);
BY CustomerDecilesbySumRevenue DESCENDING SumRevenue;
IF FIRST.CustomerDecilesbySumRevenue THEN SumDecileRevenue=SumRevenue;
ELSE SumDecileRevenue+SumRevenue;
IF LAST.CustomerDecilesbySumRevenue;
MERGEVAL=1;
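/* Running total of the decile sums; the last row holds the grand total.
MERGEVAL is not a BY variable, so FIRST.MERGEVAL is never set and a plain sum statement is used */;
TotalRevenue+SumDecileRevenue;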
RUN;
DATA TOTAL1;
SET DECILES(KEEP=TotalRevenue) END=EOF;
MERGEVAL=1;
RENAME TotalRevenue=Total_Revenue;
/* Keep only the last row, which holds the grand total */;
IF EOF;
RUN;
DATA DECILES;
MERGE DECILES TOTAL1;
BY MERGEVAL;
DROP MERGEVAL ID SumRevenue TotalRevenue Total_Revenue;
PercentageofTotalRevenue=SumDecileRevenue/Total_Revenue;
FORMAT PercentageofTotalRevenue percent8.2;
RUN;
PROC PRINT DATA=DECILES;
RUN;
/* PROC part */;
/* PROC part for dataset MODIFY */;
/* Calculate Sum of RealRevenue on each Quarter */;
TITLE "Sum RealRevenue by Quarter";
PROC SORT DATA=MODIFY;
BY Quarter;
PROC MEANS SUM DATA=MODIFY;
VAR RealRevenue;
CLASS Quarter;
RUN;
/* Calculate Sum of RealRevenue on each PaymentMethod */;
TITLE "Sum RealRevenue by PaymentMethod";
PROC SORT DATA=MODIFY;
BY PaymentMethod;
PROC MEANS SUM DATA=MODIFY;
VAR RealRevenue;
CLASS PaymentMethod;
RUN;
/* Calculate Sum of RealRevenue on each OrderIndicator */;
TITLE "Sum RealRevenue by OrderIndicator";
PROC SORT DATA=MODIFY;
BY OrderIndicator;
PROC MEANS SUM DATA=MODIFY;
VAR RealRevenue;
CLASS OrderIndicator;
RUN;
/* Multiple Linear Regression Model on each customer order (RealRevenue is dependent variable;
Quantity, WebOrder, CreditCard and Quarter3 are independent variables) */;
TITLE "Multiple Linear Regression Model on RealRevenue";
PROC REG DATA=MODIFY PLOTS(MAXPOINTS=20000);
MODEL RealRevenue = Quantity WebOrder CreditCard Quarter3;
RUN;
/* PROC part for dataset AGGREGATE */;
/* K-Means Clustering */;
PROC STDIZE DATA=AGGREGATE OUT=STANDARD METHOD=STD;
VAR NumberofOrder SumRevenue DaysSinceLastOrder;
RUN;
PROC FASTCLUS DATA=STANDARD OUT=CLUSTER
MAXCLUSTERS=5 MAXITER=100;
VAR NumberofOrder SumRevenue DaysSinceLastOrder;
RUN;
/* Merge Dataset AGGREGATE and Dataset CLUSTER by ID */;
PROC SORT DATA=AGGREGATE;
BY ID;
RUN;
PROC SORT DATA=CLUSTER;
BY ID;
RUN;
DATA AGGREGATE;
MERGE AGGREGATE CLUSTER(KEEP=ID CLUSTER);
BY ID;
RUN;
/* Descriptive Statistics on each Cluster */;
PROC SORT DATA=AGGREGATE;
BY Cluster;
TITLE "Descriptive Statistics on each Cluster";
PROC MEANS MEAN N MAX MIN DATA=AGGREGATE;
VAR NumberofOrder SumRevenue DaysSinceLastOrder;
CLASS Cluster;
RUN;
/* Histogram and Scatter Diagram of SumRevenue and DaysSinceLastOrder */;
TITLE "Histogram and Scatter Diagram of SumRevenue and DaysSinceLastOrder";
ODS Graphics ON;
PROC CORR DATA=AGGREGATE NOMISS
PLOTS(MAXPOINTS=20000)=MATRIX(HISTOGRAM);
VAR SumRevenue DaysSinceLastOrder;
RUN;
ODS Graphics OFF;
/* Histogram and Scatter Diagram of SumRevenue and NumberofOrder */;
TITLE "Histogram and Scatter Diagram of SumRevenue and NumberofOrder";
ODS Graphics ON;
PROC CORR DATA=AGGREGATE NOMISS
PLOTS(MAXPOINTS=20000)=MATRIX(HISTOGRAM);
VAR SumRevenue NumberofOrder;
RUN;
ODS Graphics OFF;
/* Sum Revenue by Customer type */;
TITLE "Sum Revenue by Customer type";
PROC SORT DATA=AGGREGATE;
BY Customertype;
PROC MEANS SUM DATA=AGGREGATE;
VAR SumRevenue;
CLASS Customertype;
RUN;
/* Sum Revenue by Cluster */;
TITLE "Sum Revenue by Cluster";
PROC SORT DATA=AGGREGATE;
BY Cluster;
PROC MEANS SUM DATA=AGGREGATE MAXDEC=2;
VAR SumRevenue;
CLASS Cluster;
RUN;
/* Export Datasets MODIFY, AGGREGATE and DECILES as Excel files to make charts */;
PROC EXPORT DATA=MODIFY OUTFILE="C:\SAS\Modify.xlsx" DBMS=xlsx Replace;
PROC EXPORT DATA=AGGREGATE OUTFILE="C:\SAS\Aggregate.xlsx" DBMS=xlsx Replace;
PROC EXPORT DATA=DECILES OUTFILE="C:\SAS\Deciles.xlsx" DBMS=xlsx Replace;
RUN;
II. External Analysis
/********** external data analysis ***********/
/* Source File: nycdata2.xlsx */
/* Source Path: /folders/myfolders/Mylib */
/* Code generated on: Sunday, April 10, 2016 3:13:39 PM */
%web_drop_table(mylib.nycdata);
FILENAME REFFILE "C:\SAS\nycdata2.xlsx" TERMSTR=CR;
PROC IMPORT DATAFILE=REFFILE
DBMS=XLSX
OUT=mylib.nycdata;
GETNAMES=YES;
RUN;
PROC CONTENTS DATA=mylib.nycdata; RUN;
%web_open_table(mylib.nycdata);
/* View data */
DATA mydata;
set mylib.nycdata;
run;
proc print data=mylib.nycdata;
run;
/* new variable showing percent of zipcode population with a bachelor's degree or more */
data mylib.nycdata;
set mylib.nycdata;
pctbachormore= bachelors_or_more/ total_population;
run;
proc print data=mylib.nycdata;
run;
/* new variable showing percent of zipcode households that are families*/
data mylib.nycdata;
set mylib.nycdata;
pctfamily= total_family_households/ total_households;
run;
proc print data=mylib.nycdata;
run;
/* Histogram of students enrolled in pre-K through grade 12 */
DATA mydata;
set mylib.nycdata;
PROC UNIVARIATE;
Var students_prek_through_12;
Histogram;
RUN;
/* Histogram of Median Income */
DATA mydata;
set mylib.nycdata;
PROC UNIVARIATE;
Var Median_Household_Income;
Histogram;
RUN;
/* Histogram of Educated Population */
DATA mydata;
set mylib.nycdata;
PROC UNIVARIATE;
Var pctbachormore;
Histogram;
RUN;
/* Basic Descriptive Stats */
DATA mydata;
set mylib.nycdata;
PROC Means N Mean STD Min Max;
VAR TOTAL_POPULATION TOTAL_HOUSEHOLDS TOTAL_FAMILY_HOUSEHOLDS
SINGLE_FATHER_HOUSEHOLDS SINGLE_MOTHER_HOUSEHOLDS
NONFAMILY_HOUSEHOLDS TOTAL_SCHOOL_ENROLLMENT
STUDENTS_PREK_THROUGH_12 INCOME_BELOW_POVERTY_LEVEL
MARRIED_COUPLE_FAMILIES
INCOME_UNDER_10K INCOME_10K_TO_14_9K INCOME_15K_TO_19_9K
INCOME_20K_TO_24_9K INCOME_25K_TO_29_9K
INCOME_30K_TO_34_9K INCOME_35K_TO_39_9K INCOME_40K_TO_44_9K
INCOME_45K_TO_49_9K INCOME_50K_TO_59_9K
INCOME_60K_TO_74_9K INCOME_75K_TO_99_9K INCOME_100K_TO_124_9K
INCOME_125K_TO_149_9K INCOME_150K_TO_199_9K
INCOME_200K_PLUS MEDIAN_HOUSEHOLD_INCOME HIGH_SCHOOL_DIPLOMA
BACHELORS GRADUATE BACHELORS_OR_MORE pctbachormore pctfamily;
Run;
/* Quartiles */
DATA mydata;
set mylib.nycdata;
PROC MEANS q1 median q3 max;
Var total_family_households TOTAL_SCHOOL_ENROLLMENT
STUDENTS_PREK_THROUGH_12 MEDIAN_HOUSEHOLD_INCOME pctbachormore;
Run;
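/* K-Means Clustering. Note: PROC FASTCLUS below reads mylib.nycdata, so the standardized dataset Stand is created but not used */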
proc stdize data=mylib.nycdata out=Stand method=std;
var
STUDENTS_PREK_THROUGH_12 MEDIAN_HOUSEHOLD_INCOME pctbachormore;
run;
proc fastclus data=mylib.nycdata out=Clust
maxclusters=4 maxiter=100;
var
STUDENTS_PREK_THROUGH_12 MEDIAN_HOUSEHOLD_INCOME pctbachormore;
run;
/* frequency table showing counties and clusters */
proc freq data=Clust;
tables county*Cluster;
run;
/* plot clusters */
proc candisc data=Clust out=Can noprint;
class Cluster;
var STUDENTS_PREK_THROUGH_12 MEDIAN_HOUSEHOLD_INCOME pctbachormore;
run;
proc sgplot data=Can;
scatter y=median_household_income x=students_prek_through_12/ group=Cluster;
run;
/***** cluster 1************/
/* descriptives*/
DATA clusters;
set work.Clust;
where
CLUSTER = 1;
PROC Means N Mean STD Min Max;
VAR TOTAL_POPULATION TOTAL_HOUSEHOLDS TOTAL_FAMILY_HOUSEHOLDS
SINGLE_FATHER_HOUSEHOLDS SINGLE_MOTHER_HOUSEHOLDS
NONFAMILY_HOUSEHOLDS TOTAL_SCHOOL_ENROLLMENT
STUDENTS_PREK_THROUGH_12 INCOME_BELOW_POVERTY_LEVEL
MARRIED_COUPLE_FAMILIES
INCOME_UNDER_10K INCOME_10K_TO_14_9K INCOME_15K_TO_19_9K
INCOME_20K_TO_24_9K INCOME_25K_TO_29_9K
INCOME_30K_TO_34_9K INCOME_35K_TO_39_9K INCOME_40K_TO_44_9K
INCOME_45K_TO_49_9K INCOME_50K_TO_59_9K
INCOME_60K_TO_74_9K INCOME_75K_TO_99_9K INCOME_100K_TO_124_9K
INCOME_125K_TO_149_9K INCOME_150K_TO_199_9K
INCOME_200K_PLUS MEDIAN_HOUSEHOLD_INCOME HIGH_SCHOOL_DIPLOMA
BACHELORS GRADUATE BACHELORS_OR_MORE pctbachormore pctfamily;
Run;
/* histograms */
DATA clusters;
set work.clust;
where cluster = 1;
PROC UNIVARIATE;
Var STUDENTS_PREK_THROUGH_12;
Histogram;
RUN;
/* confidence intervals */
PROC MEANS DATA = work.clust alpha =.05 CLM;
where cluster = 1;
VAR median_household_income STUDENTS_PREK_THROUGH_12 pctbachormore;
Run;
/******* cluster 2*********/
/* descriptives*/
DATA clusters;
set work.Clust;
where
CLUSTER = 2;
PROC Means N Mean STD Min Max;
VAR TOTAL_POPULATION TOTAL_HOUSEHOLDS TOTAL_FAMILY_HOUSEHOLDS
SINGLE_FATHER_HOUSEHOLDS SINGLE_MOTHER_HOUSEHOLDS
NONFAMILY_HOUSEHOLDS TOTAL_SCHOOL_ENROLLMENT
STUDENTS_PREK_THROUGH_12 INCOME_BELOW_POVERTY_LEVEL
MARRIED_COUPLE_FAMILIES
INCOME_UNDER_10K INCOME_10K_TO_14_9K INCOME_15K_TO_19_9K
INCOME_20K_TO_24_9K INCOME_25K_TO_29_9K
INCOME_30K_TO_34_9K INCOME_35K_TO_39_9K INCOME_40K_TO_44_9K
INCOME_45K_TO_49_9K INCOME_50K_TO_59_9K
INCOME_60K_TO_74_9K INCOME_75K_TO_99_9K INCOME_100K_TO_124_9K
INCOME_125K_TO_149_9K INCOME_150K_TO_199_9K
INCOME_200K_PLUS MEDIAN_HOUSEHOLD_INCOME HIGH_SCHOOL_DIPLOMA
BACHELORS GRADUATE BACHELORS_OR_MORE pctbachormore pctfamily;
Run;
/* histograms */
DATA clusters;
set work.clust;
where cluster = 2;
PROC UNIVARIATE;
Var STUDENTS_PREK_THROUGH_12;
Histogram;
RUN;
/* confidence intervals */
PROC MEANS DATA = work.clust alpha =.05 CLM;
where cluster = 2;
VAR median_household_income STUDENTS_PREK_THROUGH_12 pctbachormore;
Run;
/********** cluster 3********/
/* descriptives*/
DATA clusters;
set work.Clust;
where
CLUSTER = 3;
PROC Means N Mean STD Min Max;
VAR TOTAL_POPULATION TOTAL_HOUSEHOLDS TOTAL_FAMILY_HOUSEHOLDS
SINGLE_FATHER_HOUSEHOLDS SINGLE_MOTHER_HOUSEHOLDS
NONFAMILY_HOUSEHOLDS TOTAL_SCHOOL_ENROLLMENT
STUDENTS_PREK_THROUGH_12 INCOME_BELOW_POVERTY_LEVEL
MARRIED_COUPLE_FAMILIES
INCOME_UNDER_10K INCOME_10K_TO_14_9K INCOME_15K_TO_19_9K
INCOME_20K_TO_24_9K INCOME_25K_TO_29_9K
INCOME_30K_TO_34_9K INCOME_35K_TO_39_9K INCOME_40K_TO_44_9K
INCOME_45K_TO_49_9K INCOME_50K_TO_59_9K
INCOME_60K_TO_74_9K INCOME_75K_TO_99_9K INCOME_100K_TO_124_9K
INCOME_125K_TO_149_9K INCOME_150K_TO_199_9K
INCOME_200K_PLUS MEDIAN_HOUSEHOLD_INCOME HIGH_SCHOOL_DIPLOMA
BACHELORS GRADUATE BACHELORS_OR_MORE pctbachormore pctfamily;
Run;
/* histograms */
DATA clusters;
set work.clust;
where cluster = 3;
PROC UNIVARIATE;
Var STUDENTS_PREK_THROUGH_12;
Histogram;
RUN;
/* confidence intervals */
PROC MEANS DATA = work.clust alpha =.05 CLM;
where cluster = 3;
VAR median_household_income STUDENTS_PREK_THROUGH_12 pctbachormore;
Run;
/******* cluster 4***********/
/* descriptives*/
DATA clusters;
set work.Clust;
where
CLUSTER = 4;
PROC Means N Mean STD Min Max;
VAR TOTAL_POPULATION TOTAL_HOUSEHOLDS TOTAL_FAMILY_HOUSEHOLDS
SINGLE_FATHER_HOUSEHOLDS SINGLE_MOTHER_HOUSEHOLDS
NONFAMILY_HOUSEHOLDS TOTAL_SCHOOL_ENROLLMENT
STUDENTS_PREK_THROUGH_12 INCOME_BELOW_POVERTY_LEVEL
MARRIED_COUPLE_FAMILIES
INCOME_UNDER_10K INCOME_10K_TO_14_9K INCOME_15K_TO_19_9K
INCOME_20K_TO_24_9K INCOME_25K_TO_29_9K
INCOME_30K_TO_34_9K INCOME_35K_TO_39_9K INCOME_40K_TO_44_9K
INCOME_45K_TO_49_9K INCOME_50K_TO_59_9K
INCOME_60K_TO_74_9K INCOME_75K_TO_99_9K INCOME_100K_TO_124_9K
INCOME_125K_TO_149_9K INCOME_150K_TO_199_9K
INCOME_200K_PLUS MEDIAN_HOUSEHOLD_INCOME HIGH_SCHOOL_DIPLOMA
BACHELORS GRADUATE BACHELORS_OR_MORE pctbachormore pctfamily;
Run;
/* histograms */
DATA clusters;
set work.clust;
where cluster = 4;
PROC UNIVARIATE;
Var STUDENTS_PREK_THROUGH_12;
Histogram;
RUN;
/* confidence intervals */
PROC MEANS DATA = work.clust alpha =.05 CLM;
where cluster = 4;
VAR median_household_income STUDENTS_PREK_THROUGH_12 pctbachormore;
Run;
/* more plots*/
/* scatter plots */
/* bachormore and students*/
proc sgplot data=work.clust;
reg x=pctbachormore y=STUDENTS_PREK_THROUGH_12/ lineattrs=(color=red thickness=2);
Title "The Relationship Between Education Level and Students in School";
run;
/* poverty and students*/
proc sgplot data=work.clust;
reg x=INCOME_BELOW_POVERTY_LEVEL
y=STUDENTS_PREK_THROUGH_12/ lineattrs=(color=red thickness=2);
Title " ";
run;
/* median income and students*/
proc sgplot data=work.clust;
reg x=MEDIAN_HOUSEHOLD_INCOME
y=STUDENTS_PREK_THROUGH_12/ lineattrs=(color=red thickness=2);
Title " ";
run;
/* students and family households */
proc sgplot data=work.clust;
reg x=total_family_households y=STUDENTS_PREK_THROUGH_12/ lineattrs=(color=red thickness=2);
run;
/* box plots for clusters */
PROC SORT DATA = work.clust OUT=MyDataProcessed;
BY Cluster;
proc boxplot data=mydataprocessed;
plot MEDIAN_HOUSEHOLD_INCOME*cluster;
run;
PROC SORT DATA = work.clust OUT=MyDataProcessed;
BY Cluster;
proc boxplot data=mydataprocessed;
plot students_prek_through_12*cluster;
run;
PROC SORT DATA = work.clust OUT=MyDataProcessed;
BY Cluster;
proc boxplot data=mydataprocessed;
plot pctbachormore*cluster;
run;
/* Hierarchical Clustering --- K MEANS RESULTS ARE MUCH BETTER */
proc cluster data=mylib.nycdata method=centroid ccc pseudo out= tree;
var TOTAL_SCHOOL_ENROLLMENT: STUDENTS_PREK_THROUGH_12:
MEDIAN_HOUSEHOLD_INCOME: pctbachormore:;
copy ZIP: CITY: COUNTY: TOTAL_POPULATION: TOTAL_HOUSEHOLDS:
TOTAL_FAMILY_HOUSEHOLDS:;
run;
proc tree data = tree noprint nclusters=3 out=out;
copy ZIP: CITY: COUNTY: TOTAL_POPULATION: TOTAL_HOUSEHOLDS:
TOTAL_FAMILY_HOUSEHOLDS:;
run;
PROC PRINT DATA=work.out;
run;