
Electronic copy available at: http://ssrn.com/abstract=2424840


MEASURING E-COMMERCE CONCENTRATION EFFECTS WHEN PRODUCT POPULARITY IS CHANNEL-SPECIFIC

Gonca Soysal

Alejandro Zentner

August, 2014

This paper uses household-level panel data from three large apparel retailers to examine how

e-commerce affects the concentration of sales across products. In our data there are remarkable

differences between the products that are popular in online versus offline channels. When the

relative popularity of products differs by channel, as in our data, we demonstrate that the

traditional long tail metrics used in the literature provide biased results regarding changes to the

concentration of sales caused by the growth in online sales. We propose an alternative metric that

allows us to measure concentration effects when product popularity varies by channel. Our results

demonstrate that ignoring differences in product popularity across channels can lead to erroneous

conclusions regarding whether e-commerce increases or decreases the concentration of sales across

products. Examining how the migration of consumers from brick and mortar to online channels

affects the anatomy of their purchases is important for guiding managerial practice.

* Gonca Soysal ([email protected]) is Assistant Professor of Marketing and Alejandro Zentner

([email protected]) is Associate Professor of Managerial Economics, Naveen Jindal School of Management,

University of Texas at Dallas. We are grateful to the Wharton Customer Analytics Initiative (WCAI) and the

anonymous retailers for making the dataset used in this study available.



1. Introduction

E-commerce need not affect all industries in the same way. Most prior

empirical studies examining how the migration of consumers from offline to online channels affects

their purchases focused on a narrow set of product categories (i.e., books and movies), but it is

unclear whether we can generalize the results from these studies to other product categories. For

example, while several academic studies focusing on the book or movie markets have found

support for the long tail hypothesis predicting a lower concentration of sales across products as

consumers move online (e.g., Brynjolfsson, Hu, and Smith 2003; Zentner, Smith, and Kaya 2013),

it is unclear whether e-commerce changes the concentration of sales in the same way for other

product categories where physical examination before purchase might be more important (e.g.,

clothing, art, furniture, eyewear, or fresh produce). In this paper we focus on the apparel industry

and use household-level panel data from three large apparel retailers in order to study how

e-commerce affects the concentration of sales across products.

The e-commerce literature distinguishes between digital and non-digital product attributes based

on how easily these product attributes can be communicated over the Internet (e.g., Lal and

Sarvary 1999; Lee and Bell 2013; Bell, Gallino, and Moreno 2013), and products have different

combinations of digital and non-digital product attributes.1 Markets where digital

product attributes are prevalent have captured most of the attention in the long tail versus superstar

effects literature (e.g., books, movies). It is not clear, however, whether the results from these

studies will generalize to markets where non-digital product attributes are important in product

choice. Specifically, one key difference when examining markets where non-digital attributes are

important (e.g., apparel industry) is that brick and mortar stores have an advantage over the online

channel when purchasing certain products because they provide consumers with the opportunity to

physically examine product characteristics (e.g., personal fit, color, or texture). Since non-digital

product attributes are critical when choosing certain items (e.g., women's swimsuits), customers

may have a preference towards purchasing such items offline, which might cause a discrepancy in

product popularity across channels. We show that in our data there are remarkable differences

1 For example, price is a digital product attribute because information regarding price is easily communicated over the

Internet. Conversely, personal fit is a non-digital attribute because personal fit is not easily communicated over the

Internet.



between the products that are popular in the online and the offline channels. A substantial number

of products that take a large share of the transactions in one channel (are superstar products in one

channel) take a small share of the transactions in the other channel (are niche products in the other

channel), and vice versa. For instance, the data from one of our focal companies show that 14 of

the top 50 best-selling products in the online channel are not even among the top 1000 best-selling

products in the offline channel. The data from our two other companies show similar patterns.

Our objective is not restricted to examining whether or not empirical results regarding e-commerce

concentration effects from examining books or movies are generalizable to other product

categories. We also demonstrate that the metrics employed to evaluate concentration effects in the

book or movie industries should not be used to examine other product categories. We will show

that in our data online sales and sales at physical stores center on different sets of products and as

a result the locations of the online and offline sales distributions are different (see Figure 1). A

metric seeking to gauge whether the overall concentration of sales increases or decreases as

consumers move online must therefore account for the differences in not only the concentration

but also the location of the distributions of online and offline sales.

Figure 1: Distribution Concentration versus Location

The figure illustrates an example with only five products (products A through E), which are

arranged arbitrarily on the horizontal axis. The distributions 1 and 2 in the left panel, representing

online and offline sales respectively, have different concentration but are centered on the same set

of products: product B, product C, and product D are popular in both channels, and product C is

the most popular product in both channels. The distributions 3 and 4 in the right panel, representing

online and offline sales respectively, have the same concentration but are centered on different sets

of products: product B is the most popular product in the online channel (distribution 3) whereas

product D is the most popular product in the offline channel (distribution 4). Figure 1 only

represents an example where we have arbitrarily designed the distributions of sales.
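The distinction drawn in Figure 1 can be made concrete with a minimal sketch. The sales figures below are hypothetical (the figure itself reports no numbers), and the helper names `shape` and `best_seller` are illustrative only. The sorted vector of sales shares stands in for concentration, and the identity of the best seller stands in for location.

```python
def shape(sales):
    """Sales shares sorted from largest to smallest (ignores product identity)."""
    total = sum(sales.values())
    return sorted((units / total for units in sales.values()), reverse=True)


def best_seller(sales):
    """Product with the largest sales in this distribution."""
    return max(sales, key=sales.get)


# Left panel (hypothetical numbers): different concentration, same location.
dist_1 = {"A": 5, "B": 20, "C": 50, "D": 20, "E": 5}    # online
dist_2 = {"A": 10, "B": 25, "C": 30, "D": 25, "E": 10}  # offline

# Right panel (hypothetical numbers): same concentration, different location.
dist_3 = {"A": 20, "B": 40, "C": 20, "D": 10, "E": 10}  # online
dist_4 = {"A": 10, "B": 10, "C": 20, "D": 40, "E": 20}  # offline

print(best_seller(dist_1), best_seller(dist_2), shape(dist_1) == shape(dist_2))
print(best_seller(dist_3), best_seller(dist_4), shape(dist_3) == shape(dist_4))
```

Any concentration measure that depends only on the sorted shares treats distributions 3 and 4 as identical, even though their best sellers differ.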


Most prior studies on the long tail use data from either online or offline markets exclusively to

define product sales ranks and assume that product popularity online and offline are either equal or

at least similar. We will show that using the traditional long tail metrics (not accounting for

differences in locations of the distributions of online and offline sales) for our focal industry may

result in seemingly contradictory results regarding whether the concentration of sales increases or

decreases as consumers move online. For instance, studies using data from offline sales

exclusively to determine product sales ranks will incorrectly tend to find large long tail effects

when the distributions of online and offline sales concentrate around different sets of products.

However, when correctly interpreted, the results using a metric based exclusively on offline sales

intuitively suggest that, as consumers move from offline to online markets, sales concentrate

around products that are popular online and away from products that are popular offline.

Conversely, studies using data from online sales exclusively to determine product sales ranks will

tend to find spurious superstar effects because when consumers move from offline to online

markets sales concentrate around those products that are popular online and niche offline.

Because traditional long-tail metrics do not provide information on the overall concentration effects of

e-commerce when sales online and offline concentrate around different sets of products, in

this paper we propose a metric that allows the measurement of the overall concentration effects

accounting for the differences in the locations of the online and offline sales distributions. Using

our proposed metric, we find long tail effects when consumers use the online channel more

frequently for two of our retailers and no changes in concentration for the other retailer. We also

show how biased the estimates can be, either toward finding large long tail effects or large

superstar effects, when using metrics based on either online sales or offline sales exclusively.

We believe that our results are important for the long tail literature. The examination of long tail

versus superstar e-commerce effects has focused on a narrow set of products and has overlooked

the possible existence of differences in product popularity across channels. Our analyses

demonstrate that these differences are substantial for the apparel industry, and our results show

how ignoring these differences can cause incorrect conclusions regarding whether e-commerce

increases or decreases the concentration of sales and, more importantly, regarding the size of the

concentration effects. Our paper demonstrates that long tail versus superstar examinations must


account for cross channel popularity differences. Examining how prevalent product popularity

differences are across channels for other industries is an important avenue for future research.

Studying how the concentration of sales changes as consumers move online is not only important

for the academic literature but also important for managerial practice. Our results show that

producers and retailers in this industry should consider the differences in product popularity in the

online versus offline channels, and shift their efforts toward products that are popular in the online

channel as consumers move online. Our results suggest that two of our focal companies should

shift their efforts toward a larger set of products as e-commerce gains market share relative to sales

at brick and mortar stores, but our results also suggest the third company should not change its

overall variety as consumers move online. Comparing the size of our concentration effect

estimates and those from using traditional metrics is important for managers when making

decisions over variety; our concentration effect estimates are bracketed by large spurious long tail

effects when measuring product popularity based on offline sales exclusively and large spurious

superstar effects when measuring product popularity based on online sales exclusively.

2. Literature

Our paper contributes to a growing literature on how information technology is affecting the

anatomy of consumer purchases, and most directly to how e-commerce affects sales concentration

patterns. The empirical studies in this literature can be categorized by the type of data they employ.

Using cross-sectional data, Brynjolfsson, Hu, and Smith (2003) find that online book retailers offer

a wider variety than physical stores do, and also show that a large proportion of online book

purchases occur for titles not stocked in brick and mortar stores. Brynjolfsson, Hu, and Simester

(2011) also use cross-sectional data to examine sales of clothing for a retailer selling via both the

Internet and the catalog channels. They find that the concentration of sales is lower for the Internet

channel compared to the catalog channel. Although Brynjolfsson, Hu, and Simester (2011) also

examine the clothing industry, our study is different because neither the catalog channel nor the

Internet channel allows for an easy evaluation of non-digital product attributes, unlike the brick

and mortar channel. Brynjolfsson et al. (2009) use cross-sectional data to study the nature of

competition between brick-and-mortar and internet retailers. They show that internet retailers face

significant competition from brick and mortar retailers when selling popular products, but the


competition is not as intense for niche products. Using aggregated data at the level of the movie

title, Elberse and Oberholzer-Gee (2007) study how online commerce affected the distribution of

video sales from 2000 to 2005. They find that the tail was longer in 2005, but also that in 2005 superstar

products took a larger proportion of sales than ever before. Waldfogel (2012) uses data aggregated

by albums, and shows that Internet markets decrease the concentration of music sales among a few

artists. Using panel data, Zentner, Smith, and Kaya (2013) examine the movie rental market, and

find that superstar DVD titles take a smaller share of the market as the closure of physical stores

made consumers shift from offline to online marketplaces. Goldfarb et al. (2013) also use panel

data, and show how e-commerce might produce long tail effects by decreasing social inhibitions.

While Pozzi (2012) does not focus on how e-commerce may affect the concentration of sales per

se, he examines brand exploration online and offline by using panel data on grocery shopping. He

finds that brand exploration for groceries is more prevalent at physical stores than in online

marketplaces. Forman et al. (2009) use location and product level monthly panel data from

Amazon.com on sales of books, and show that when a physical store opens locally, people's online

purchases of the nationally most popular products decline relative to the purchases of products

unlikely to be popular or available offline. Our study also uses panel data and documents that for

the apparel industry a substantial number of products that are superstars in one channel are niche in

the other channel. We also show how to measure overall sales concentration in this setting (i.e.,

when the distributions of online and offline sales center on different sets of products).

Our results also contribute to the literature on how recommendation systems and popularity lists

affect the sales of popular and niche products. In this literature Tucker and Zhang (2011) study the

impact of popularity information on sales, arguing that titles with niche appeal may benefit from

being listed in popular product lists more than general appeal products do. Likewise, Fleder and

Hosanagar (2009) and Oestreicher-Singer and Sundararajan (2011) analyze how peer-based

automated recommendation lists influence online preferences for long tail versus superstar

products, with the former authors finding that recommendation lists can either increase or decrease

sales of long tail products, and the latter authors finding that product categories that are more

sensitive to recommendation networks are also more likely to have higher sales of long tail

products. In contrast to our study, these studies focus on the online market exclusively and do not

examine sales from physical stores or cross-channel choices.


3. Data and Setting

Our data come from three different brands of a North American specialty apparel retailer.

Although these three brands belong to the same parent company, each brand has its own brick-and-mortar

stores and an online store selling the brand's independent and exclusive line of clothing and

home accessories. The stores from the three brands are not co-located and the brands are managed

independently. Our data cover a two year period from July 1st, 2010 through June 30th, 2012. For

each brand, the company randomly sampled a total of 14,000 customers from all the customers who

were active during the two-year data period. However, the total number of customers we

observe for each brand exceeds 14,000 customers due to cross-shopping across brands (when

customers are sampled for a brand, the information is collected for the transactions from all three

brands). For each customer, we observe purchases from both the retailer's physical stores and the

online store. The company matches credit and debit card purchases to a specific customer using

card numbers and customer names, and matches cash purchases using e-mail addresses, which the

store clerks are trained to request. For each purchase event, we observe the purchase channel,

physical store number for transactions made at physical stores, list of purchased items, and number

of units and dollar amount for each item purchased. In addition to the detailed transaction data, for

each customer we also observe the gender, age, date of first purchase from the retailer, and

geographic location (latitude and longitude of the census block of the customer's residence).2

Table 1 presents summary statistics for our data. Over our two year period of analysis, we observe

a total of 197,220 transactions for Brand A; 19% of these transactions (or 36,528 transactions) took

place online. For Brand B we observe a total of 148,069 transactions; 27% (or 40,162 transactions)

took place online. For Brand C we observe a total of 37,225 transactions; 47% (or 17,453

transactions) took place online.

Table 1 shows that the share of online business is rather large for all three brands, and that a

substantial fraction of customers use both the online and offline channels. Out of 24,072 unique

customers who made a purchase from Brand A, 42% (or 10,079 consumers) ever made a purchase

online, 92% (or 22,206 customers) ever made a purchase offline, and 34% (or 8,213 customers)

2 Census blocks are small geographic units. According to the Census, Census Blocks are generally defined to contain

between 600 and 3,000 people. http://www.census.gov/geo/reference/gtc/gtc_bg.html


used both channels. For Brand B, there are a total of 27,008 unique customers who made a

purchase, 51% (or 13,826 customers) ever made a purchase online, 87% (or 23,582 customers)

ever made a purchase offline, and 39% (or 10,400 customers) used both channels. For Brand C,

there are a total of 12,035 unique customers who made a purchase in our sample, 55% (or 6,587

customers) made a purchase online, 67% (or 8,012 customers) made a purchase offline, and 21%

(or 2,564 customers) used both channels.
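As a quick consistency check on the counts quoted above, the overall totals and percentages can be recomputed from the channel-level figures. The sketch below only transcribes numbers reported in the text and in Table 1; the dictionary layout and the function name `summarize` are illustrative.

```python
# Counts transcribed from the text and Table 1.
table1 = {
    "Brand A": {"online_tx": 36528, "offline_tx": 160692,
                "online_cust": 10079, "offline_cust": 22206, "both": 8213},
    "Brand B": {"online_tx": 40162, "offline_tx": 107907,
                "online_cust": 13826, "offline_cust": 23582, "both": 10400},
    "Brand C": {"online_tx": 17453, "offline_tx": 19772,
                "online_cust": 6587, "offline_cust": 8012, "both": 2564},
}


def summarize(row):
    total_tx = row["online_tx"] + row["offline_tx"]
    # Inclusion-exclusion: multichannel customers appear in both channel counts.
    total_cust = row["online_cust"] + row["offline_cust"] - row["both"]
    return {
        "total_tx": total_tx,
        "total_cust": total_cust,
        "online_tx_pct": round(100 * row["online_tx"] / total_tx),
        "multichannel_pct": round(100 * row["both"] / total_cust),
    }


summary = {brand: summarize(row) for brand, row in table1.items()}
for brand, stats in summary.items():
    print(brand, stats)
```

The recomputed totals match the overall transaction and customer counts reported for each brand.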

Because a substantial portion of the customers in our data are multichannel and we also observe

several transactions per customer during our observation period, in our empirical analysis we are

able to exploit the panel nature of the data. Customers made on average 7.31 transactions offline

and 3.62 transactions online for Brand A, 4.64 transactions offline and 2.91 transactions online for

Brand B, and 2.54 transactions offline and 2.65 transactions online for Brand C.

Customers purchased a similar number of units per transaction whether they used the online or the

offline channel. For Brand A (Brand B, Brand C) they purchased an average of 2.54 (2.96, 2.01)

units when they used the online channel and 2.63 (2.60, 1.89) units when they used the offline

channel. In spite of buying a similar number of units per transaction in both channels, it is

interesting that the dollar values of the transactions were smaller in the physical store channel than

in the online channel.

Table 1: Summary Statistics

Standard deviations are in parentheses.

                                        Brand A                         Brand B                         Brand C
                                        Online          Offline         Online          Offline         Online          Offline
# of Transactions by Channel            36,528          160,692         40,162          107,907         17,453          19,772
# of Transactions Overall               197,220                         148,069                         37,225
# of Customers by Channel               10,079          22,206          13,826          23,582          6,587           8,012
# of Multichannel Customers             8,213                           10,400                          2,564
# of Customers Overall                  24,072                          27,008                          12,035
Average # of Transactions Per Customer  3.62 (7.15)     7.31 (11.98)    2.91 (3.48)     4.64 (6.55)     2.65 (6.08)     2.54 (5.68)
Average Transaction Size (# of Units)   2.54 (2.97)     2.63 (1.77)     2.96 (2.34)     2.60 (1.82)     2.01 (1.63)     1.89 (1.40)
Average Transaction Size ($)            159.97 (170.3)  115.12 (99.37)  96.17 (76.29)   70.56 (62.34)   145.54 (153.50) 113.74 (106.54)


A prior stream of literature that investigated the impact of migration of customers from offline to

online channels on the distribution of sales across products made a distinction between popular and

niche products based on the number of transactions made for each product. These previous studies

used sales ranks from one channel (either offline or online) to classify products as either popular or

niche. This approach is valid for categories like books or movies where the distributions of sales

across products are likely to be similar for the online and offline channels, and popular products in

one channel are also popular in the other channel. One possible explanation for this similarity is

that the need to physically examine the product before purchase is minimal for product categories

such as books or movies: digital product attributes are prevalent in these product categories

allowing consumers to easily evaluate product features and characteristics both online and offline.

However, the ability to physically examine the product before purchase (a benefit unique to the

offline channel) is more important for some product categories where non-digital product attributes

are more prevalent. For example, in the apparel industry, the importance of actual color, quality of

materials, and personal fit (i.e., the non-digital attributes) for some products might result in large

differences between the distributions of sales across products for the online and offline channels.

Products where physical examination is important (e.g., a woman's swimsuit) might be more

popular in the offline channel whereas products where physical examination is relatively less

important (e.g., a men's classic white dress shirt) might be more popular in the online channel. If

the distributions of sales across products are substantially different for the online and the offline

channels, examining sales from only one channel in order to classify goods as popular versus niche

would be misleading.

For our empirical analysis we must also specify the appropriate level in the hierarchy for product

definition. Unlike the case of movies or books where the title may suffice to identify a unique

product,3 in the apparel industry a unique style (design) is often offered in a variety of sizes and

colors. One can therefore either conduct the analysis at the style level (aggregating over different

sizes and colors) or at the unique SKU level (SKUs are unique identifiers for a color and size

within a specific style). We conduct our analysis at the style level in the main text, and replicate

the analysis at the SKU level in the appendix as a robustness check.
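The choice of product definition matters because sales ranks can differ between the two levels. The following minimal sketch uses hypothetical transaction lines (the style identifiers, colors, sizes, and unit counts are invented for illustration) to show a style-level aggregation next to a SKU-level one.

```python
from collections import Counter

# Hypothetical transaction lines: (style, color, size, units).
lines = [
    ("S1", "white", "M", 2),
    ("S1", "white", "L", 1),
    ("S1", "blue", "M", 1),
    ("S2", "black", "S", 3),
]


def sales_by_sku(lines):
    """Total units per SKU, where a SKU is a (style, color, size) combination."""
    totals = Counter()
    for style, color, size, units in lines:
        totals[(style, color, size)] += units
    return totals


def sales_by_style(lines):
    """Total units per style, aggregating over colors and sizes."""
    totals = Counter()
    for style, _color, _size, units in lines:
        totals[style] += units
    return totals


print(sales_by_sku(lines).most_common(1))
print(sales_by_style(lines).most_common(1))
```

In this toy data the best-selling SKU belongs to style S2, while the best-selling style is S1, so a popularity ranking depends on the level of aggregation.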

3 There are also some issues when defining products for the movie or book industries. For example, should a movie in

DVD and the same movie in Blu-ray disc be classified as the same product?


Figure 2 shows the commonality of superstar products across the online and offline channels for

our observation period for all three brands. Areas B, C, and D in Figure 2 represent the 100

products with highest overall sales (Top 100 Overall). The products in area B are among the Top

100 in sales in the offline channel and in overall sales but not among the Top 100 in sales in the

online channel, and the products in area D are among the Top 100 in sales in the online channel

and in overall sales but not among the Top 100 in sales in the offline channel. The products in area

C (the middle intersection) are in the Top 100 in overall sales, online sales, and offline sales alike.

The low commonality of top products online and offline is remarkable for all three brands. Only 43

products for Brands A and B and 45 products for Brand C rank within the Top 100 in sales in both

the online and offline channels; these statistics demonstrate the important differences in superstar

products across the online and offline channels.
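The areas described for Figure 2 are set operations on three top-N lists. A minimal sketch follows, assuming sales are held in plain dictionaries keyed by product; the product identifiers, the sales figures, and the function names `top_n` and `figure2_areas` are hypothetical, and N is set to 2 so that the toy data stay readable.

```python
def top_n(sales, n):
    """Set of the n best-selling products; ties are broken by product id."""
    return set(sorted(sales, key=lambda p: (-sales[p], p))[:n])


def figure2_areas(online, offline, n=100):
    """Count products in the regions described for Figure 2."""
    products = set(online) | set(offline)
    overall = {p: online.get(p, 0) + offline.get(p, 0) for p in products}
    top_on, top_off, top_all = top_n(online, n), top_n(offline, n), top_n(overall, n)
    return {
        "B": len((top_off & top_all) - top_on),   # top offline and overall, not online
        "C": len(top_on & top_off & top_all),     # top in all three lists
        "D": len((top_on & top_all) - top_off),   # top online and overall, not offline
        "E": len(top_on - top_all),               # top online, not top overall
    }


# Hypothetical sales in which the offline channel is much larger.
online = {"p1": 9, "p2": 8, "p3": 1, "p4": 0}
offline = {"p1": 30, "p2": 2, "p3": 25, "p4": 20}
areas = figure2_areas(online, offline, n=2)
print(areas)
```

Because the offline channel dominates overall sales in this toy data, the overall top list coincides with the offline top list and an online best seller falls into area E.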

The area E in Figure 2 represents the products that are Top 100 from the online channel but not

Top 100 when considering overall sales from both channels. This area is rather large for all brands

(50 for Brand A, 57 for Brand B, and 36 for Brand C).

The differences in product assortment online versus offline are not as wide in our focal industry

compared to other industries examined in prior research, where product assortment is substantially

larger online than at physical stores (e.g., see Zentner, Smith, and Kaya 2013 for statistics on

product assortment online versus offline in the movie rental industry). Our data suggest that almost

all top products either online or offline were available from both channels. For example, we can

investigate product assortment by channel for the products in Figure 2 by examining products with

zero sales in either the online or offline channel. In Figure 2 for Brand A (Brand B, Brand C) only

3 (4, 0) out of the 57 (57, 55) products that are among the Top 100 in the offline channel and not

among the Top 100 in the online channel actually have zero sales in the online channel. Similarly,

in Figure 2 for Brand A (Brand B, Brand C) only 6 (10, 6) out of the 57 (57, 55) products that are

among the Top 100 in the online channel and not among the Top 100 in the offline channel have

zero sales in the offline channel. Although our focus in this paper is not on explaining whether the

concentration effects of e-commerce arise from the demand side (e.g., the way consumers search

online versus offline) or the supply side (e.g., assortment differences online versus offline), these

statistics suggest that the cross-channel popularity differences that we document are likely to be

driven by the demand side and not by cross-channel differences in product assortment.


The offline channel has a much larger share of overall sales compared to the online channel -- 81%

of all transactions for Brand A take place in brick and mortar stores, 73% for Brand B, and 53%

for Brand C. Thus, the list of popular products in overall sales is dominated by products that are

popular from the offline channel. As we explain below, the dominance of the offline channel also

invalidates the use of overall sales (adding the sales from online and offline channels) to classify

products as popular or niche when there are differences in product popularity across online and

offline channels.

Figure 2: Commonality in Popular Products in Online and Offline Channels (one panel each for Brand A, Brand B, and Brand C)

In Tables 2a, 2b, and 2c we further investigate the differences in product popularity ranks online

and offline by computing popularity ranks for various thresholds. For example, the top left cell in

Table 2a for Brand A shows that only 21 of the top 50 products offline are also within the top 50

products in the online channel. Table 2a also shows that some products can be very popular in one

channel and have very low sales in the other channel (e.g., for Brand A 10 of the top 50 online

products are not even among the top 1000 products offline). Tables 2b and 2c show similar patterns

for Brands B and C.
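A cross-tabulation of this kind can be sketched as follows: each cell counts how many of the top k products offline also fall within a given rank threshold online. The sales figures, thresholds, and function names below are hypothetical and chosen only to keep the example small.

```python
def ranks(sales):
    """Map each product to its sales rank (1 = best seller); ties broken by id."""
    ordered = sorted(sales, key=lambda p: (-sales[p], p))
    return {p: i + 1 for i, p in enumerate(ordered)}


def rank_cross_table(online, offline, thresholds):
    """table[(k_off, k_on)] = number of offline top-k_off products within the online top-k_on."""
    r_on, r_off = ranks(online), ranks(offline)
    unranked = float("inf")  # products never sold online fail every threshold
    table = {}
    for k_off in thresholds:
        for k_on in thresholds:
            table[(k_off, k_on)] = sum(
                1 for p, r in r_off.items()
                if r <= k_off and r_on.get(p, unranked) <= k_on
            )
    return table


# Hypothetical sales with very different popularity orderings by channel.
online = {"a": 50, "b": 40, "c": 30, "d": 20, "e": 10}
offline = {"e": 50, "a": 40, "d": 30, "c": 20, "b": 10}
table = rank_cross_table(online, offline, thresholds=(1, 2, 3))
print(table[(1, 1)], table[(2, 2)], table[(3, 3)])
```

With real data the thresholds would be values such as 50, 100, and 1000, as in the comparisons quoted above.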

Table 2a: Comparison of Popular Products Offline vs. Online for Brand A for Various Rank Thresholds

Table 2b: Comparison of Popular Products Offline vs. Online for Brand B for Various Rank Thresholds

Table 2c: Comparison of Popular Products Offline vs. Online for Brand C for Various Rank Thresholds


While Figure 2 reveals the differences in superstar products in online and offline markets, Tables

2a, 2b, and 2c further reveal that a substantial number of products are superstar in one channel and

niche in the other channel. Together these statistics demonstrate that a metric seeking to examine

the existence of long tail effects as consumers move online must account for the differences in the

locations of the distributions of online versus offline sales.

Table 3 presents summary statistics showing the concentration of sales for all brands in the online

versus offline channels. We measure the concentration of sales by channel using the cumulative

share of transactions taken by the top-ranked products in each channel. Computing concentration

in a channel looking at the share of best sellers in that channel allows us to isolate the differences

in the concentration of the sales distributions across channels from the positions of these

distributions. Table 3 shows that, once we control for the differences in product popularity across

the channels, the cumulative share of transactions taken by the top-ranked products is slightly


larger offline compared to online for Brands A and B, and larger online compared to offline for

Brand C.
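The within-channel concentration measure described above can be sketched in a few lines: products are ranked using the channel's own transactions, and the cumulative share of the top k is read off. The transaction counts, thresholds, and function name below are hypothetical.

```python
from itertools import accumulate


def cumulative_top_shares(sales, thresholds):
    """Share of a channel's transactions taken by its own top-k products, for each k >= 1."""
    ranked = sorted(sales.values(), reverse=True)
    total = sum(ranked)
    running = list(accumulate(ranked))
    return {k: running[min(k, len(ranked)) - 1] / total for k in thresholds}


# Hypothetical transaction counts by product in each channel.
online = {"p1": 50, "p2": 30, "p3": 10, "p4": 5, "p5": 5}
offline = {"p1": 10, "p2": 10, "p3": 20, "p4": 40, "p5": 20}

online_shares = cumulative_top_shares(online, thresholds=(1, 2, 3))
offline_shares = cumulative_top_shares(offline, thresholds=(1, 2, 3))
print(online_shares)
print(offline_shares)
```

Because each channel is ranked on its own sales, the comparison reflects only the spread of each distribution and not which products sit at the top.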

Table 3: Cumulative Share of Transactions Taken by the Top Ranked Products in the Online and Offline Channels (one panel each for Brand A, Brand B, and Brand C)

4. Econometric Model

Our goal in this paper is to establish how the concentration of sales is expected to change when

consumers gravitate toward online channels. While previous research has found that e-commerce

is expected to drive sales from the most popular products in the head of the popularity distribution

toward less popular products in the tail of the popularity distribution, we noted previously that these

results may not generalize to the specialty apparel industry because of the likely differences in the


way individuals search for clothing compared to the way individuals search for books or movies

that have captured most of the attention in prior research.

Moreover, the summary statistics presented previously show that in our focal industry there are

remarkable differences between the popularity rankings of products across online versus offline

markets: online and offline sales concentrate around different sets of products. Overlooking the

differences in the locations of the distributions of online and offline sales may lead to misleading

managerial implications for producers and retailers in the specialty apparel industry in deciding

whether they should focus their production and stocking efforts on either long tail or superstar

products as consumers move online.

Our empirical analysis uses individual and transaction level panel data. For each transaction in the

data made by household i on date t we define a dummy variable indicating whether the transaction

was made online or offline:

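$$Online_{it} = \begin{cases} 1 & \text{if the transaction was made online} \\ 0 & \text{if the transaction was made at a physical store} \end{cases}$$

For each transaction we also define the share of products in the transaction taken by the purchases of top 100 products as follows:

$$ShareTop100^{m,c}_{it} = \frac{\text{number of products in the transaction that rank among the top 100}}{\text{total number of products in the transaction}}$$

The superscript m indicates that the sales ranking used to compute the numerator of $ShareTop100^{m,c}_{it}$ is calculated using monthly sales, and the superscript c indicates which one of the following three different ways is used to compute the monthly sales ranks of products:

a- Using sales from the physical store channel exclusively;

b- Using sales from the online channel exclusively;

c- Using sales from the physical store channel for transactions made at physical stores and using sales from the online channel for transactions made online, i.e.:

$$\text{ranking used for transaction } it = \begin{cases} \text{monthly ranking based on online sales} & \text{if } Online_{it} = 1 \\ \text{monthly ranking based on physical store sales} & \text{if } Online_{it} = 0 \end{cases}$$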

While the first two metrics have been traditionally used to measure how e-commerce affects the

concentration of sales, these metrics are less useful in our context due to the large differences

between the popularity rankings of products across online versus offline markets: niche products in

the offline market might be popular products in the online market and vice versa. In this context, it

becomes necessary to disentangle the differences in the spread (concentration) of the distributions

of online and offline sales from the differences in the location of the distributions of online and

offline sales, since these distributions concentrate around different sets of products (see Figure 1

above).4 By separately evaluating the popularity ranks for transactions made at physical stores

versus online, our third and proposed metric is useful for examining whether consumers' purchases

become more or less concentrated when they move to the online market in a way that is not

contaminated by the differences in the location of the distributions of online and offline sales.
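The three ways of ranking products described above can be illustrated with a minimal sketch. It assumes a toy list of transactions, a cutoff of k = 1 instead of 100, and hypothetical helper names (`monthly_top_sets`, `top_share`); none of these come from the paper's data or code.

```python
from collections import Counter, defaultdict

def monthly_top_sets(transactions, k):
    """Map (month, channel) to the set of the k best-selling products."""
    counts = defaultdict(Counter)
    for t in transactions:
        counts[(t["month"], t["channel"])].update(t["items"])
    return {key: {p for p, _ in c.most_common(k)} for key, c in counts.items()}

def top_share(t, top_sets, rank_channel=None):
    """Share of the items in transaction t that belong to a top-k set.

    rank_channel="offline" or "online" ranks by one channel only (metrics a and b);
    rank_channel=None ranks by the transaction's own channel (metric c).
    """
    channel = rank_channel if rank_channel is not None else t["channel"]
    top = top_sets.get((t["month"], channel), set())
    return sum(item in top for item in t["items"]) / len(t["items"])

# Toy data (illustrative only): jeans lead offline, dresses lead online.
transactions = [
    {"household": 1, "month": 1, "channel": "offline", "items": ["jeans", "jeans", "tee"]},
    {"household": 2, "month": 1, "channel": "offline", "items": ["jeans", "scarf"]},
    {"household": 1, "month": 1, "channel": "online", "items": ["dress", "dress", "jeans"]},
    {"household": 3, "month": 1, "channel": "online", "items": ["dress", "tee"]},
]
tops = monthly_top_sets(transactions, k=1)
online_tx = transactions[2]
share_a = top_share(online_tx, tops, "offline")  # metric a: offline ranking
share_b = top_share(online_tx, tops, "online")   # metric b: online ranking
share_c = top_share(online_tx, tops)             # metric c: own-channel ranking
```

In this toy example the same online transaction looks like a niche purchase under the offline ranking and like a superstar purchase under the online ranking, which is the location difference that the third metric is meant to remove.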

Our empirical approach is to estimate fixed effects models of the following form:

$$\text{ShareTop100}_{it}^{m,c} = \beta \, \text{Online}_{it} + \alpha_i + \delta_m + \varepsilon_{it} \qquad (1)$$

Model (1) examines the change in the share of products in each transaction taken by the purchases

of top 100 products when consumers move from offline to online channels (when the dummy

variable representing an online transaction goes from 0 to 1). In Model (1), $\alpha_i$ represents a fixed

effect for household i and $\delta_m$ represents a fixed effect for month m. Household fixed effects

capture the heterogeneity from time-invariant characteristics of each household as well as other

characteristics that are unlikely to change substantially during our two year observation period

(e.g., preferences, income, or household size). Month fixed effects absorb aggregate shocks over

time such as time trends or seasonality shocks.

4 It might seem reasonable to use aggregate sales from both online and offline markets to calculate an overall ranking

of sales. However, using aggregate sales does not separate the location from the spread of the distributions of online

and offline sales. Moreover, using aggregate sales to compute ranks would provide an invalid metric in our setting

because it would give a substantially larger weight to the distribution of sales at physical stores than to the distribution

of sales online (we note above that a much larger fraction of the overall sales from our focal firms are made at physical

stores than online). For reference, we replicate our analysis using a ranking based on aggregate sales in the Appendix.

Similar to Zentner, Smith, and Kaya (2013), our panel data estimation approach accounts for the

potential sorting of heterogeneous consumers into channels by controlling for all time-invariant

characteristics of each household.
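A regression with household and month fixed effects can be sketched as follows. This is a minimal illustration rather than the estimation code used in the paper: it removes both sets of fixed effects by alternating group demeaning and then computes the slope on the online dummy, and it does not compute standard errors. The function names and the toy panel are assumptions made for the example.

```python
from collections import defaultdict

def demean_by(values, groups):
    """Subtract from each value the mean of its group."""
    total, count = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        total[g] += v
        count[g] += 1
    return [v - total[g] / count[g] for v, g in zip(values, groups)]

def two_way_within(values, households, months, sweeps=200):
    """Alternately remove household means and month means."""
    out = list(values)
    for _ in range(sweeps):
        out = demean_by(out, households)
        out = demean_by(out, months)
    return out

def fe_slope(share, online, households, months):
    """Slope on the online dummy after partialling out both sets of fixed effects."""
    y = two_way_within(share, households, months)
    x = two_way_within(online, households, months)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Toy panel (illustrative only): three households observed in two months.
households = ["h1", "h1", "h2", "h2", "h3", "h3"]
months = [1, 2, 1, 2, 1, 2]
online = [0, 1, 1, 0, 0, 0]
alpha = {"h1": 0.30, "h2": 0.20, "h3": 0.40}  # household effects
delta = {1: 0.00, 2: 0.02}                    # month effects
beta = -0.05                                  # effect of buying online
share = [alpha[h] + delta[m] + beta * d for h, m, d in zip(households, months, online)]

estimate = fe_slope(share, online, households, months)
```

Because the toy shares are generated without noise, the estimate recovers the assumed beta even though households differ in their baseline shares, which is the sorting problem the household fixed effects are meant to address.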

5. Ordinary Least Squares Results

Table 4 presents OLS estimation results for Model (1) using data from Firms A, B, and C. All

regressions include fixed effects for each month and also include household level fixed effects.

The standard errors are clustered at the household level to allow for the possibility of serial

correlation over time.

In Column I of Table 4 we define product popularity based on sales from the online channel

exclusively. When defining popularity in this way the regression results show superstar effects for

all firms: the results show that the share of transactions taken by the top 100 online products

increases when consumers move to online markets. The predicted superstar effects are large in size

for all three firms. For Firm A the top 100 products online take 17.8% of all sales, and the

estimation results predict that the share of transactions taken by the top 100 online products would

increase by 43.3% if consumers moved all of their transactions to online markets.5 For Firms B

and C the estimation results predict that the share of transactions taken by the top 100 online

products would increase by 42.3% and 19.6% respectively if all transactions moved to online

markets.

In Column II of Table 4 we define product popularity based on sales from the physical store

channel exclusively. The results in Column II contrast with those in Column I; they show long tail

effects instead of superstar effects for all three firms: the results in Column II show that the share

of transactions taken by the top 100 offline products decreases when consumers move from offline

to online markets. The predicted long tail effects from Column II are large in size; according to the

estimates in Column II the share of transactions taken by the top 100 offline products would

decrease as consumers move all of their transactions to online markets (by 32.9% for Firm A,

55.8% for Firm B, and 47.8% for Firm C).

5 This predicted effect is calculated using the proportion of all transactions taken by the online channel (18.9%), the

regression coefficient (0.0955), and the proportion of transactions taken by the top 100 products from the online

market (0.178): (1-0.189)x(0.0955/0.178).
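The calculation described in footnote 5 can be written as a small function. The sketch below uses only the rounded figures printed in the text and in Table 4 for Firm A; the function name is an assumption. With rounded inputs the formula gives values close to, but not exactly equal to, the percentages reported in the text (43.3% and 32.9%), presumably because the reported figures are based on unrounded inputs.

```python
def predicted_relative_change(coef, online_share, mean_top_share):
    """Relative change in the top-100 share if the remaining offline
    transactions moved online: (1 - online share) x (coefficient / mean share)."""
    return (1.0 - online_share) * coef / mean_top_share

# Firm A, Column I: ranks based on online sales.
firm_a_online_rank = predicted_relative_change(0.0955, 0.189, 0.178)

# Firm A, Column II: ranks based on offline sales.
firm_a_offline_rank = predicted_relative_change(-0.0988, 0.189, 0.242)

print(f"Column I:  {firm_a_online_rank:+.1%}")
print(f"Column II: {firm_a_offline_rank:+.1%}")
```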

Table 4: Share Taken by Top 100 Products in Each Transaction

OLS Estimates

                            I                II               III
                            Rank Based on    Rank Based on    Proposed
                            Online Sales     Offline Sales    Metric

Firm A
Dummy Online Purchases      0.0955***        -0.0988***       0.0038
                            (0.0033)         (0.0029)         (0.0035)
Observations                171,964          171,964          171,964
R-squared                   0.1909           0.2021           0.1866

Firm B
Dummy Online Purchases      0.0738***        -0.1519***       -0.0577***
                            (0.0027)         (0.0026)         (0.0030)
Observations                134,934          134,934          134,934
R-squared                   0.2537           0.2889           0.261

Firm C
Dummy Online Purchases      0.1220***        -0.3516***       -0.1594***
                            (0.0095)         (0.0087)         (0.0097)
Observations                33,355           33,355           33,355
R-squared                   0.411            0.5117           0.4157

Includes fixed effects for both months (24) and individuals (Company A: 22,464; Company B: 23,991; Company C: 11,106).

Standard errors in parentheses are clustered by household.

* significant at 10%; ** significant at 5%; *** significant at 1%

For Firm A, the mean of the dependent variable is 0.178 in Column I, 0.242 in Column II, and 0.261 in Column III.

For Firm B, the mean of the dependent variable is 0.125 in Column I, 0.196 in Column II, and 0.221 in Column III.

For Firm C, the mean of the dependent variable is 0.329 in Column I, 0.389 in Column II, and 0.481 in Column III.

The mean of the independent variable (Dummy Online Purchases) is 0.189 for Firm A, 0.278 for Firm B, and 0.469 for Firm C.

The estimates in Columns I and II of Table 4 provide contradictory conclusions regarding whether

online commerce produces long tail or superstar effects. Based on the results from these

regressions it is unclear whether producers and retailers should focus their resources on producing

and stocking long tail or superstar products as consumers move to online markets.

The seemingly implausible results in Columns I and II actually have an intuitive explanation, and

the explanation might have been anticipated from our summary statistics showing that there are

differences in the locations of the distributions of online and offline sales: some popular products

at physical stores are niche in online markets and vice versa. The results in Columns I and II are

consistent with consumers increasing their purchases of products that are niche at physical stores

but popular in online markets as they move from offline to online markets.

It might seem that popularity ranks based on overall sales, combining sales from both the online and

offline channels, would be the correct basis for measuring concentration effects.

However, a metric seeking to measure changes in the concentration of sales as consumers move to

online markets must allow for separating the differences in the location versus the concentration of

the distributions of online and offline sales. Although ranks based on overall sales combine sales

from both channels, this metric does not separate the two relevant moments: location and

concentration of the online and offline sales distributions. Moreover, using overall sales gives a

substantially larger weight to the distribution of sales at physical stores than to the distribution of

sales online as we note above (see footnote 4 and the Appendix where we present results using

overall sales to compute sales rankings).

In Column III of Table 4 we show the results using our proposed metric that accounts for the

differences in the locations of the distributions of online and offline sales. This metric uses sales

from physical stores when evaluating popularity ranks for transactions made at physical stores, and

sales from the online channel when evaluating popularity ranks for transactions made online. The

results show no change in the concentration of sales for Firm A. When consumers buying from

Firm A move their purchases to the online channel, they buy a similar fraction of products from

the head of the online sales distribution as they were buying from the head of the offline sales

distribution. The estimation results in Column III show long tail effects for Firms B and C; these

long tail effects are, however, substantially smaller in size than when using sales from the offline

channel exclusively to define popularity ranks.

Our results demonstrate the importance of accounting for the differences in product popularity

across the online and offline channels when measuring the size of the e-commerce impact on the

concentration of sales. This is particularly important when examining industries where product

popularity differs markedly between the online and offline channels, such as our focal industry, where

ignoring these differences would lead to wrong conclusions regarding the effect of e-commerce on

the concentration of sales.


6. Instrumental Variable Results

Our regressions in the previous section allow us to control for the heterogeneity across consumers

who might sort into either the online or the physical store channel. However, a second confounding

factor may arise due to channel selection if customers who use both channels choose the specific

channel based on the types of products they desire to purchase. This confounding factor would

cause a bias in measuring concentration effects if the choice of the channel is correlated with

product popularity (e.g., consumers choose the online channel to purchase either niche or popular

products). In this section we examine e-commerce concentration effects after controlling for this

source of potential endogeneity in channel choice for multichannel consumers. In order to break

this potential endogeneity problem, we need to observe changes in channel choice that are

unrelated to the popularity of the products that the customers wish to purchase. We use the entry of

physical stores during our study period as an instrumental variable. Our focal retailer opened a

large number of physical stores for its three brands: it increased the number of physical stores by

17.6% for Brand A, 23.2% for Brand B, and 89.1% for Brand C. The entry of physical stores

decreases the transportation cost for customers who live near the location of the entrant stores, and

therefore may affect their channel choice (see Forman et al. 2009 and Choi and Bell 2011). We

believe that the entry of physical stores is a valid instrument. While the entry of physical stores and

the location of the new store are obviously choices for the firms, we believe these choices are

unrelated to the relative demand for niche versus superstar products making the instrument

unrelated to the error term (i.e., exogenous).
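The logic of the instrument can be illustrated with a deliberately simplified sketch on synthetic data. It assumes a single binary instrument (a store nearby) rather than the several distance dummies used in the paper, omits the fixed effects, and uses hypothetical variable names. The unobserved variable stands for a shopper wanting popular products on a given occasion, which here also makes the online channel more likely.

```python
def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def ols_slope(y, x):
    """Slope from regressing y on x."""
    return cov(x, y) / cov(x, x)

def iv_slope(y, x, z):
    """Just-identified IV estimate: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)

# Synthetic data (illustrative only).
store_nearby = [0, 0, 0, 0, 1, 1, 1, 1]   # instrument: lowers online purchasing
wants_popular = [0, 0, 1, 1, 0, 0, 1, 1]  # unobserved; unrelated to the instrument
online = [1, 0, 1, 1, 0, 0, 1, 0]         # channel choice depends on both
true_effect = -0.1                        # assumed effect of buying online
share = [true_effect * x + 0.2 * u for x, u in zip(online, wants_popular)]

ols_estimate = ols_slope(share, online)
iv_estimate = iv_slope(share, online, store_nearby)
```

In this construction the OLS slope is pulled toward zero by channel selection, while the IV slope recovers the assumed effect because store proximity shifts channel choice without being related to the unobserved product preference.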

Table 5 presents Instrumental Variables results for Firms A, B, and C. The regressions in Column I

of Table 5 present the first stage regression results examining how the entry of physical stores

affects consumers' channel choices. These regressions include household level fixed effects; the

results of these regressions show that customers decrease the proportion of purchases from the

online channel and increase the proportion of purchases from physical stores when physical stores

enter close to where they live. For Firm A some of the estimates for the coefficients on the

dummies indicating the presence of a store near the consumer's location in each month are not

statistically significant; for Firms B and C the coefficient estimates on all dummies indicating the

presence of a store near the consumer's location are statistically significant and also large in size.

For example, using the first stage coefficient estimates for Firm B shows that consumers who did


not have a store within 20 miles from where they live would increase the likelihood of buying from

the physical store by 25.9% when a physical store enters within a mile of where they live.

Column II in Table 5 presents the second stage instrumental variable results. These results are

consistent with the OLS results in Column III of Table 4: they show no statistically significant

changes in concentration for Firm A (the positive sign in fact suggests superstar effects) and long

tail effects for Firms B and C.

Unlike the OLS results in the previous section, the Instrumental Variable results account for the

types of products that customers select to purchase online versus offline. Thus, comparing the OLS

results in Table 4 that are affected by the selection of the channel and the Instrumental Variable

results in Table 5 that account for the selection of the channel we can speculate about whether

consumers choose to purchase niche versus popular products in online versus offline markets. The

Instrumental Variable results for Firms B and C in Table 5 predict a longer tail compared to the

OLS results in Column III of Table 4, suggesting that consumers actually select the online channel

to purchase relatively more superstar products than when selecting the physical store channel. Any

bias in long tail effect estimates from the OLS regressions for Firms B and C would therefore be in

the direction of finding superstar effects. For Firm A, the size of the instrumental variable

coefficient estimate in Table 5 shows more superstar effects than the size of the OLS coefficient

estimate in Table 4, suggesting that consumers select the online channel to purchase relatively

more long tail products than when selecting the brick and mortar store channel (although neither

the OLS nor the IV coefficient estimates are statistically significant for Firm A).

Table 5: Share Taken by Top 100 Products in Each Transaction

IV Estimates

                                      I              II
                                      First Stage    Second Stage - Proposed Metric

Firm A
Dummy Store between 0 and 1 Miles     -0.0760**      na
                                      (0.0321)       na
Dummy Store between 1 and 3 Miles     -0.0294        na
                                      (0.0235)       na
Dummy Store between 3 and 10 Miles    -0.0569***     na
                                      (0.0210)       na

                                      (0.0265)       na
Dummy Online Purchases                na             0.0946
                                      na             (0.2727)
Observations                          171,964        171,964
R-squared                             0.4259         na

Firm B

                                      (0.0434)       na

                                      (0.0339)       na

                                      (0.0235)       na

                                      (0.0234)       na
Dummy Online Purchases                na             -0.1100*
                                      na             (0.0668)
R-squared                             0.4495         na

Firm C

                                      (0.0927)       na

                                      (0.0646)       na

                                      (0.0402)       na

                                      (0.0460)       na
Dummy Online Purchases                na             -0.1800*
                                      na             (0.1038)
R-squared                             0.6417         na

Includes fixed effects for both months (24) and individuals (Company A: 22,464; Company B: 23,991; Company C: 11,106).

7. Conclusion

While the long tail hypothesis was considered one of the best ideas of 2005 by industry

observers (BusinessWeek 2005), most empirical examinations of the long tail hypothesis have

focused on a narrow set of product categories: movies and books. One important question we

raise in this paper is whether prior findings regarding the long tail from examining movies or

books generalize to other contexts, in particular industries where physical examination before

purchase might be more prevalent than the movie and book industries.

Our empirical analysis focuses on the apparel industry, where non-digital product attributes are

more prevalent and thus physical examination of product characteristics such as personal fit, color,

or texture is more important than it is for movies or books. We use a unique individual-level panel

data set of purchase transactions from three specialty apparel retailers that operate both brick and

mortar and online channels. We demonstrate that sales online and offline are substantially different

in our focal industry, to an extent that creates wide differences between the sets of products that

are popular online versus offline. This characteristic of our focal industry not only prevents

generalizing prior results for movies or books to the apparel industry; it also means that the methods

previously employed to examine long tail effects in the markets for movies or books are invalid for

the apparel industry in particular, and valid in general only in the special case where product

popularity does not differ across channels.

In this paper we demonstrate how ignoring the way sales in the online and offline channels

concentrate around different sets of products leads to misleading conclusions regarding

e-commerce concentration effects. We show that measuring long tail effects using data from offline

(online) sales exclusively when the distributions of online and offline sales concentrate around

different sets of products biases the estimates toward finding large long tail (superstar) effects.

However, both of these estimated effects just indicate that, as consumers move online, sales move

toward products that are popular online and away from products that are popular offline.

To overcome this challenge, we propose a metric that isolates the difference in the concentration of

sales across the offline and the online channels (second moment) by controlling for the difference

in the locations of the online and offline sales distributions. In our empirical analysis we control


for consumer heterogeneity by using individual-level fixed effects, and further control for channel

selection effects by using store entry as an instrumental variable. Qualitatively, using our proposed

metric we find long tail effects when consumers use the online channel more frequently for two of

our retailers and no changes in concentration for the other retailer. More importantly, in terms of

size our concentration effect estimates are bracketed by large spurious long tail effects when

measuring product popularity based on offline sales exclusively and large spurious superstar

effects when measuring product popularity based on online sales exclusively.

We believe that our results and methods are important not only for the long tail versus superstar

literature, but also for managerial practice. Ignoring the differences in product popularity across

channels, and using an incorrect measure of the e-commerce concentration effects may lead to

managerial errors regarding product selection, product variety, and stocking decisions. Examining

differences in product popularity by channel and long tail effects for other industries where

product popularity might differ across online and offline channels is warranted.


Appendix

Appendix A Analysis at the SKU Level

In the database each specific color and size combination within a style is assigned a unique SKU

code. In the main text we conducted our analysis at the style level, which aggregates items

offered in a variety of sizes and colors. To check for robustness, in this appendix we replicate our

analysis in the main text using information disaggregated at the SKU level. We show that the

conclusions in the main text are accentuated when using data at the SKU level; our analysis at the

style level in the main text therefore represents a conservative choice.

Table A1 presents OLS estimates for Model (1) in the main text using SKU level data. The results

in Table A1 are similar to the results at the style level presented in the main text. Column I in

Table A1 shows superstar effects when product popularity is based on online sales exclusively.

Moreover, the predicted superstar effects in Column I of Table A1 are larger in magnitude

compared to the results in the main text using data at the style level. The estimation results in

Column I of Table A1 indicate that the share of transactions taken by the top 100 online products

would increase by 118% if consumers moved all of their transactions to online markets for Firm A

(109% for Firm B and 42% for Firm C).

Column II of Table A1 presents results computing product popularity based on offline sales

exclusively; the results predict long tail effects for all three firms as in the style level analysis

presented in the main text. In terms of size, the estimation results in Column II of Table A1

indicate that the share of transactions taken by the top 100 offline products would decrease by 47%

if consumers moved all of their transactions to online markets for Firm A (66% for Firm B and

61% for Firm C). The predicted long tail effects in Column II of Table A1 are slightly larger in

magnitude compared to those in the main text using data at the style level.

In Column III of Table A1 we present the results using our proposed metric on data aggregated at

the SKU level. The results indicate the existence of superstar effects for Firms A and B (23% and

5% respectively) and long tail effects for Firm C (15%).

Table A1: Share Taken by Top 100 Products in Each Transaction

OLS Estimates at the SKU Level

                            I                II               III
                            Rank Based on    Rank Based on    Proposed
                            Online Sales     Offline Sales    Metric

Firm A
Dummy Online Purchases      0.0696***        -0.0439***       0.0257***
                            (0.0021)         (0.0016)         (0.0023)
Observations                171,964          171,964          171,964
R-squared                   0.2393           0.2259           0.2169

Firm B
Dummy Online Purchases      0.0575***        -0.0549***       0.0051***
                            (0.0018)         (0.0015)         (0.0020)
Observations                134,934          134,934          134,934
R-squared                   0.3128           0.2526           0.2610

Firm C
Dummy Online Purchases
                            (0.0063)         (0.0067)         (0.0075)
Observations                33,355           33,355           33,355
R-squared                   0.417            0.4441           0.3931

Includes fixed effects for both months (24) and individuals (Company A: 22,464; Company B: 23,991; Company C: 11,106).

For Firm A, the mean of the dependent variable is 0.048 in Column I, 0.076 in Column II, and 0.089 in Column III.

For Firm B, the mean of the dependent variable is 0.038 in Column I, 0.06 in Column II, and 0.077 in Column III.

For Firm C, the mean of the dependent variable is 0.128 in Column I, 0.157 in Column II, and 0.216 in Column III.

The differences in product popularity across channels are magnified when conducting the analysis

at the more disaggregated SKU level compared to the style level used in the main text. Figure A1 is

analogous to Figure 2 in the main text, and shows the commonality of superstar products across the

online and offline channels for all three brands when examining the data at the more disaggregated

SKU level. Compared to the style level analysis presented in Figure 2 in the main text, the analysis

at the SKU level in Figure A1 shows that the number of products in the intersection area C,

representing the number of products that are in the top 100 of the overall, online, and offline rankings,

drops from 43 (in Figure 2 in the main text) to 23 (in Figure A1) for Brand A, from 43 to 14 for

Brand B and from 45 to 19 for Brand C.

Figure A1: Commonality in Popular Products in Online and Offline Channels, SKU Level Analysis (panels: Brand A, Brand B, Brand C)

Tables A2a through A2c are analogous to Tables 2a through 2c in the main text, and tabulate

popularity ranks for various thresholds for the online and offline channels calculated at the SKU

level. Compared to the statistics in Tables 2a through 2c in the main text using data at the style

level, the differences in product popularity across the online and offline channels are substantially

larger when computing the statistics at the SKU level. For example, Table A2b for Brand B shows

that 28 of the top 50 products in the online channel are not even among the top 1000 products in

the offline channel, whereas Table 2b for the same brand in the main text shows that 14 of the top 50

products in the online channel are not among the top 1000 products in the offline channel. The data

from our two other companies show similar patterns.
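The kind of tabulation reported in these tables can be sketched as follows: for the top products in one channel, count how many fall outside a given rank threshold in the other channel. The function names and the toy sales figures are assumptions for illustration and are unrelated to the retailers' data.

```python
def ranks(sales):
    """Map each product to its sales rank (1 = best seller)."""
    ordered = sorted(sales, key=sales.get, reverse=True)
    return {product: i + 1 for i, product in enumerate(ordered)}

def count_outside(online_rank, offline_rank, top_online, top_offline):
    """Number of the top_online online products that are not among the
    top_offline offline products (unsold products count as outside)."""
    return sum(
        1
        for product, r in online_rank.items()
        if r <= top_online and offline_rank.get(product, float("inf")) > top_offline
    )

# Toy sales by channel (illustrative only).
online_sales = {"a": 50, "b": 40, "c": 30, "d": 5}
offline_sales = {"c": 90, "d": 80, "e": 70, "a": 10}

online_rank = ranks(online_sales)
offline_rank = ranks(offline_sales)
outside_top2 = count_outside(online_rank, offline_rank, top_online=2, top_offline=3)
outside_top3 = count_outside(online_rank, offline_rank, top_online=3, top_offline=3)
```

In the toy data both of the two best-selling online products fall outside the offline top three, one because it sells poorly offline and one because it is not sold offline at all.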

Table A2a: Comparison of Popular Products Offline vs. Online for Brand A at

the SKU Level

Table A2b: Comparison of Popular Products Offline vs. Online for Brand B at

the SKU Level


Table A2c: Comparison of Popular Products Offline vs. Online for Brand C at

the SKU Level

The greater differences in product popularity across channels when conducting the analysis at the

SKU level generate a greater contrast between the superstar effects in Column I of Table A1 and

the long tail effects in Column II of Table A1 than the contrast between the corresponding estimates

in Columns I and II of Table 4 in the main text.

Table A3 presents IV results using data at the SKU level. The first stage results in Column I are

identical to those in the main text (the instrumented and instrumental variables in Table A3 and

Table 5 in the main text are the same). The second stage IV results in Column II of Table A3 are

consistent with our results at the style level in the main text.

Table A3: Share Taken by Top 100 Products in Each Transaction

IV Estimates at the SKU Level

                                      I              II
                                      First Stage    Second Stage - Proposed Metric

Firm A
Dummy Store between 0 and 1 Miles     -0.0760**      na
                                      (0.0321)       na
Dummy Store between 1 and 3 Miles     -0.0294        na
                                      (0.0235)       na
Dummy Store between 3 and 10 Miles    -0.0569***     na
                                      (0.0210)       na

                                      (0.0265)       na
Dummy Online Purchases                na             0.062
                                      na             (0.1741)
R-squared                             0.4259         na

Firm B

                                      (0.0434)       na

                                      (0.0339)       na

                                      (0.0235)       na

                                      (0.0234)       na
Dummy Online Purchases                na             -0.1016**
                                      na             (0.0432)
R-squared                             0.4495         na

Firm C

                                      (0.0927)       na

                                      (0.0646)       na

                                      (0.0402)       na

                                      (0.0460)       na
Dummy Online Purchases                na             -0.0869
                                      na             (0.0854)
R-squared                             0.6417         na

Includes fixed effects for both months (24) and individuals (Company A: 22,464; Company B: 23,991; Company C: 11,106).

Appendix B Popularity Based on Overall Sales

Table A4 presents results using popularity ranks based on overall sales (aggregating sales from

both the online and offline channels). The table also presents the results from Table 4 in the main

text for comparison.

The results for Firms A and B in Table A4 show that the results from an analysis using overall

sales to rank product popularity (Column IV) closely resemble the results from an analysis

computing popularity using offline sales exclusively (Column II). This is expected for Firms A and

B since offline sales account for a substantially larger share of the overall sales than online sales

for these companies (see Table 1 in the main text). As a result, basing product popularity on

overall sales is similar to basing product popularity on sales from the dominant channel; using

overall sales does not control for the differences in product popularity across channels when the

distributions of online and offline sales are centered on different locations. This demonstrates that

using overall sales also produces misleading results.
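A small synthetic example illustrates why a ranking based on overall sales tends to reproduce the ranking of the dominant channel. The product names and sales volumes below are invented for illustration; the only assumption carried over from the text is that offline volumes are much larger than online volumes.

```python
from collections import Counter

def top_k(sales, k):
    """Set of the k best-selling products."""
    return {product for product, _ in Counter(sales).most_common(k)}

# Synthetic unit sales (illustrative only): the offline channel is far larger.
offline = {"p1": 900, "p2": 800, "p3": 700, "p4": 40, "p5": 30, "p6": 20}
online = {"p1": 5, "p2": 4, "p3": 60, "p4": 90, "p5": 80, "p6": 3}
overall = Counter(offline) + Counter(online)

top_overall = top_k(overall, 3)
top_offline = top_k(offline, 3)
top_online = top_k(online, 3)

shared_with_offline = len(top_overall & top_offline)
shared_with_online = len(top_overall & top_online)
```

Here the overall top three coincides with the offline top three and shares only one product with the online top three, so a metric built on the overall ranking inherits the location of the offline distribution.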


Table A4: Share Taken by Top 100 Products in Each Transaction

OLS Estimates Including Overall Sales Measure

                            I                II               III              IV
                            Rank Based on    Rank Based on    Proposed         Rank Based on
                            Online Sales     Offline Sales    Metric           Overall Sales

Firm A
Dummy Online Purchases      0.0955***        -0.0988***       0.0038           -0.0709***
                            (0.0033)         (0.0029)         (0.0035)         (0.0031)
Observations                171,964          171,964          171,964          171,964
R-squared                   0.1909           0.2021           0.1866           0.1967

Firm B
Dummy Online Purchases      0.0738***        -0.1519***       -0.0577***       -0.1162***
                            (0.0027)         (0.0026)         (0.0030)         (0.0027)
Observations                134,934          134,934          134,934          134,934
R-squared                   0.2537           0.2889           0.261            0.2743

Firm C
Dummy Online Purchases      0.1220***        -0.3516***       -0.1594***       -0.1861***
                            (0.0095)         (0.0087)         (0.0097)         (0.0096)
Observations                33,355           33,355           33,355           33,355
R-squared                   0.411            0.5117           0.4157           0.4359

Includes fixed effects for both months (24) and individuals (Company A: 22,464; Company B: 23,991; Company C: 11,106).

For Firm A, the mean of the dependent variable is 0.178 in Column I, 0.242 in Column II, 0.261 in Column III, and 0.244 in Column IV.

For Firm B, the mean of the dependent variable is 0.125 in Column I, 0.196 in Column II, 0.221 in Column III, and 0.199 in Column IV.

For Firm C, the mean of the dependent variable is 0.329 in Column I, 0.389 in Column II, 0.481 in Column III, and 0.413 in Column IV.

REFERENCES

Bell, David, Santiago Gallino, and Antonio Moreno (2013), Inventory Showrooms and Customer Migration in Omni-channel Retail: The Effect of Product Information, Working Paper, The Wharton School, University of Pennsylvania.

Brynjolfsson, Erik, Yu Hu, and Michael Smith (2003), Consumer Surplus in the Digital Economy: Estimating the Value of Increased Product Variety, Management Science, 49(11), 1580-1596.

Brynjolfsson, Erik, Yu Hu, and Mohammad Rahman (2009), Battle of the Retail Channels: How Product Selection and Geography Drive Cross-Channel Competition, Management Science, 55(11), 1755-1765.

Brynjolfsson, Erik, Yu Hu, and Duncan Simester (2011), Goodbye Pareto Principle, Hello Long Tail: The Effect of Search Costs on the Concentration of Product Sales, Management Science, 57(8), 1373-1386.

BusinessWeek (2005), Best of 2005: Ideas. How the Net can find markets for the obscure. Accessed March 8, 2014, http://images.businessweek.com/ss/05/12/bestideas/source/11.htm.

Elberse, Anita, and Felix Oberholzer-Gee (2007), Superstars and Underdogs: An Examination of the Long Tail Phenomenon in Video Sales, MSI Reports: Working Paper Series, 4, 49-72.

Fleder, Daniel, and Kartik Hosanagar (2009), Blockbuster Culture's Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity, Management Science, 55(5), 697-712.

Forman, Chris, Anindya Ghose, and Avi Goldfarb (2009), Competition Between Local and Electronic Markets: How the Benefit of Buying Online Depends on Where You Live, Management Science, 55(1), 47-57.

Goldfarb, Avi, Ryan C. McDevitt, Sampsa Samila, and Brian Silverman (2013), The Effect of Social Interaction on Economic Transactions: An Embarrassment of Niches? Working Paper, University of Toronto.

Choi, Jeonghye, and David Bell (2011), Preference Minorities and the Internet, Journal of Marketing Research, 48, 670-682.

Lal, Rajiv, and Miklos Sarvary (1999), When and How Is the Internet Likely to Decrease Price Competition?, Marketing Science, 18(4), 485-503.

Lee, Jae Young, and David R. Bell (2013), Neighborhood Social Capital and Social Learning for Experience Attributes of Products, Marketing Science, 32(6), 960-976.

Oestreicher-Singer, Gal, and Arun Sundararajan (2012), Recommendation Networks and the Long Tail of Electronic Commerce, MIS Quarterly, 36(1), 65-84.

Pozzi, Andreas (2012), Shopping Cost and Brand Exploration in Online Grocery, American Economic Journal: Microeconomics, 4(3), 96-120.

Tucker, Catherine, and Juanjuan Zhang (2011), How Does Popularity Information Affect Choices? A Field Experiment, Management Science, 57(5), 828-842.

Waldfogel, Joel (2012), And the Bands Played On: Digital Disintermediation and the Quality of New Recorded Music, Working Paper, University of Minnesota, Minneapolis, Minnesota.

Zentner, Alejandro, Michael Smith, and Cuneyd Kaya (2013), How Video Rental Patterns Change as Consumers Move Online, Management Science, 59(11), 2622-2634.
