Defining Geographic Markets from Probabilistic Clusters: A Machine Learning Algorithm Applied

to Supermarket Scanner Data[footnoteRef:1] [1: Correspondence: [email protected] (S. Bruestle). This work is partially supported by the European Community H2020 programme under the funding schemes: INFRAIA-1-2014-2015: Research Infrastructures G.A. 654024 SoBigData, http://www.sobigdata.eu. We thank UniCoop Tirreno for providing the data. In addition, we thank Ken Elzinga (University of Virginia), Nathan Larson (American University), Michael Kelly (Lafayette College), Sidney Michelini (Paris School of Business), Carlo Schwarz (University of Warwick), and Dina Guo (Freddie Mac) for their helpful comments. We also thank the following Federal Maritime Commission (FMC) employees for their helpful comments: Roy Pearson, Robert Blair, and Sean Watterson. Their comments were their own and do not represent the views of the FMC or the United States government. Finally, we gave a talk on the project to the Consiglio Nazionale delle Ricerche (CNR), Italy, in 2018. We thank the audience for their helpful comments.]


Draft: 9/19/2019

By Stephen Bruestle,[footnoteRef:2] [2: Stephen Bruestle is the primary author and instigator in this line of research. He came up with the concept for this paper. He performed all the statistical analysis and clustering. He created all the tables and figures. He reviewed the literatures from multiple scientific disciplines. He developed all the economics, theory, and analysis. And, he wrote the paper. Dr. Bruestle is an economist at the U.S. Federal Maritime Commission (FMC), which deals with anticompetition issues. Among other responsibilities, the FMC administers and monitors international liner shipping’s limited immunity from United States antitrust laws. The opinions and ideas in this paper are Dr. Bruestle’s own and do not represent the views of the FMC or the United States government. ]

Luca Pappalardo,[footnoteRef:3] and Riccardo Guidotti3 [3: It is a condition of publishing content using data from SoBigData to credit their support researchers, which in this case are Luca Pappalardo and Riccardo Guidotti. They provided the Coop dataset. They extracted the data. They cleaned the data. And, they provided helpful comments on how data scientists cluster. Luca Pappalardo and Riccardo Guidotti are at the Institute of Information Science and Technologies (ISTI), Consiglio Nazionale delle Ricerche (CNR). ]

Abstract: We propose estimating geographic markets in two steps. First, estimate clusters of transactions that are interchangeable in use. Second, estimate markets from these clusters. We argue that these clusters are subsets of markets. We draw on both antitrust cases and economic intuition. We model and estimate these clusters using techniques from machine learning and data science. We model these clusters using Blei et al.’s (2003) Latent Dirichlet Allocation (LDA) model. And, we estimate this model using Griffiths and Steyvers’s (2004) Gibbs Sampling algorithm (Gibbs LDA). We apply these ideas to a real-world example. We use transaction-level scanner data from the largest supermarket franchise in Italy. We find fourteen clusters. We present strong evidence that LDA fits the data. This shows that these interchangeability clusters exist in the marketplace. Then, we compare Gibbs LDA clusters with clusters from the Elzinga-Hogarty (E-H) test. We find similar clusters. LDA has a few identifiable parameters. The E-H test has too many parameters for identification. Also, Gibbs LDA avoids the silent majority fallacy of the E-H test. Then, we estimate markets from the Gibbs LDA clusters. We use consumption overlap and price stationarity tests on the clusters. We find four grocery markets in Tuscany.

JEL Codes: L100, D400, C380, L400, C150

Keywords: defining markets, clustering, interchangeability in use, machine learning, Latent Dirichlet Allocation (LDA), Gibbs Sampling (Gibbs LDA), bags of products, Elzinga-Hogarty test, elbow method, sampling methods, consumption overlap, antitrust markets, economics markets, Markov Chain Monte Carlo (MCMC), silent majority fallacy

1. Introduction and Literature Review

Market definition is a form of clustering. Clustering is organizing objects into groups whose members are similar according to some measure. Market definition is organizing transactions into “markets” whose members are similar according to some measure.

All methods used to define markets are forms of clustering. Numerical tests are forms of clustering. Human judgement is a form of clustering.

In antitrust cases, the size of the markets often determines the outcome of the case. For example, in the United States v. Philadelphia National Bank, the district court defined the market as a broad geographic area, where there were many competitors so the merged firm could not abuse market power. Therefore, the banks were allowed to merge. Then, the Supreme Court overturned this decision. They defined the market as a narrower geographic area, where few firms compete. This would give the merged firm the ability to abuse its dominant position by raising prices. Therefore, the Supreme Court disallowed the merger.[footnoteRef:4] [4: The defendant in a proposed merger does not always argue for larger markets. Sometimes the defendant argues for smaller markets. They argue this to show that the two firms are not competitors. For example, in the United States v. Continental Can, the defendant argued that metal containers and glass containers were in two different markets. This way there would be no concern over a merger between a metal container manufacturer and a glass container manufacturer. See United States v. Continental Can Co., 217 F. Supp. 761, 782.]

You should not confuse this with a cluster market. A cluster market is a different concept. A cluster market is where each firm sells a group of complements. For example, grocery stores are cluster markets. They sell a group of complements, which include meat, milk, vegetables, and so on.

[Table 1 about here.]

Table 1 shows the leading tests used to define markets. All these tests are forms of clustering.

Marshall (1920) defined markets based on the law of one price; that is the “prices of products in the same market tend to equality with due allowance for transportation cost.” [footnoteRef:5] We call clusters based on the law of one price: economics markets. [5: He was building on Cournot (1897).]

The law of one price comes from arbitrage. For example, suppose products 1 and 2 are identical. Suppose store 1 sells product 1 for $p_1$. Store 2 sells product 2 for a higher price $p_2 > p_1$. Then someone could make a profit through arbitrage—buying product 1 and selling it in front of store 2. Or, equivalently, all consumers would buy from store 1 and not store 2. This would induce store 1 to increase $p_1$ and store 2 to lower $p_2$ until $p_1 = p_2$.

We use several standard clustering methods based on the law of one price. These include price correlation comparison (e.g. the Nestlé–Perrier merger), stationarity tests (e.g. Stigler and Sherwin, 1985; Forni, 2004), Granger causality tests (Cartwright et al., 1989; Slade, 1986), and natural experiments on price movements (Davis and Garcés, 2009, pg. 185-188).

The 1984 Horizontal Merger Guidelines define markets based on a hypothetical monopolist test; that is a market is the smallest area or group of products where a hypothetical, profit-maximizing monopolist would impose a 'small but significant and nontransitory' increase in price. We call clusters based on a hypothetical monopolist test: antitrust markets.

For example, suppose there are $Y$ products. Suppose product 2 is the closest substitute to product 1. Product 3 is the second closest substitute to product 1. And, so on. Let the perfectly competitive price be $p_c$. Suppose a single firm sells products 1 through $y$. Let the single firm’s profit from setting a price of $p$ be $\pi_y(p)$. Then the antitrust market is the smallest $y$ such that $\pi_y\big((1+s)\,p_c\big) > \pi_y(p_c)$, where $s$ is a small but significant price increase (e.g., 5%).
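The paper does not implement this test, but a purely illustrative sketch may help. In the Python snippet below, the linear demand system, its parameter values, the 5% increase, and the function names are all our own assumptions; the sketch only shows the logic of adding the closest substitutes to the candidate set until the price increase becomes profitable.

```python
import numpy as np

def monopolist_profit(prices, base_demand, own_slope, cross_slope, cost):
    """Profit of a hypothetical monopolist over a candidate set of products,
    under an assumed linear demand system with symmetric substitution."""
    prices = np.asarray(prices, dtype=float)
    # Demand for each product falls in its own price and rises in rivals' prices.
    demand = base_demand - own_slope * prices + cross_slope * (prices.sum() - prices)
    demand = np.clip(demand, 0.0, None)
    return float(((prices - cost) * demand).sum())

def passes_ssnip(n_products, p_c=1.0, increase=0.05, **demand_params):
    """True if raising all prices in the candidate set by `increase` (e.g. 5%)
    above the competitive level raises the hypothetical monopolist's profit."""
    at_pc = monopolist_profit([p_c] * n_products, **demand_params)
    at_ssnip = monopolist_profit([p_c * (1 + increase)] * n_products, **demand_params)
    return at_ssnip > at_pc

# Illustrative parameters (made up): add substitutes until the SSNIP is profitable.
params = dict(base_demand=100.0, own_slope=80.0, cross_slope=30.0, cost=0.5)
for y in range(1, 6):
    if passes_ssnip(y, **params):
        print(f"smallest antitrust market in this example: products 1 through {y}")
        break
```

With these made-up numbers, a 5% increase is unprofitable when the monopolist controls only product 1 (consumers switch to the close substitute) but becomes profitable once product 2 is added to the candidate set.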

We use several standard clustering methods based on this concept. These include the Small but Significant and Nontransitory Increase in Price (SSNIP) test (1984 Merger Guidelines), Critical Loss Analysis (Langenfeld and Li, 2001), and the Full Equilibrium Relevant Market (FERM) test (e.g. Ivaldi and Lorincz, 2005).

We often cannot find markets from just one of these tests.[footnoteRef:6] Defendants and plaintiffs often combine tests. They often blend market definitions. Sometimes they use economics markets. Sometimes they use antitrust markets. And sometimes they use both. [6: Kaplow (2013) argues that we do not need to define antitrust markets. If we have enough information, then we have enough to estimate competitive effects. Then we do not need to find antitrust markets.]

In the United States v. du Pont (Cellophane), the Supreme Court established an additional market definition to be used with the other market definitions. They later refined it in Brown Shoe v. United States and the United States v. Continental Can.[footnoteRef:7] They established that products are in the same market if they are interchangeable in use; or equivalently, products are in the same market if consumers use them for the same purpose. Let’s call clusters based on this standard: interchangeability clusters. [7: In the United States v. du Pont (Cellophane), products are in the same market if they are "reasonably interchangeable by consumers for the same purposes." In Brown Shoe v. United States, this became the "boundaries of a product market are determined by the reasonable interchangeability of use." And then in the United States v. Continental Can, this became "interchangeability of use." See United States v. E.I. du Pont de Nemours 8c Co., 351 U.S. 377, 395 (1956) (Cellophane); Brown Shoe Co. v. United States, 370 U.S. 294, 325 (1962); and United States v. Continental Can Co., 378 U.S. 441, 449 (1964).]

The courts still use this definition today.[footnoteRef:8] [8: See for example Geneva Pharm. v. Barr Labs, Inc., 386 F.3d 485, 496 (2nd Cir. 2004); United States v. Microsoft Corporation, 253 F.3d 34 (D.C. Cir. 2001); and the Federal Trade Commission v. Shire ViroPharma, Inc., Complaint for Injunctive and Other Equitable Relief. (District Court 2017).]

You should not confuse these clusters with the set of all functional substitutes. Functional substitutes are all the products that consumers could use for the same purpose. Interchangeability clusters are products that we observe consumers using for the same purpose. For example, consumers could use caviar in the place of salmon in salads; caviar and salmon can be used for the same purpose. They are functional substitutes. But consumers do not use them for the same purpose, because caviar costs a lot more. They are not interchangeable in use (Davis and Garcés, 2009, pg. 166-167).

In Brown Shoe v. United States, the Supreme Court established that the standard is interchangeable in use.[footnoteRef:9] We need to observe consumers using the products for the same purpose. If consumers could use the products for the same purpose, then the products are not necessarily in the same market. We need to observe consumers using the products interchangeably. [9: In the United States v. Brown Shoe, the court tried using functional substitutes to determine markets. The court concluded that market delineation "cannot be determined by any process of logic and should be determined by the processes of observation... In other words, determine how the industry itself and how users, the public, treat the shoe product." See Brown Shoe Co. v. United States, 370 U.S. 294, 730 (1962).]

Similarly, neighboring towns might not be in the same market. Consumers might not use the stores in the two towns interchangeably.

The standard gauge for interchangeability in use is consumption overlap, which measures the extent to which the same consumers purchase both products. For example, suppose 30% of the consumers that purchase product 1 also purchase product 2. Then, the consumption overlap is 30%.
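A minimal sketch of this gauge, assuming we observe the set of consumers who purchase each product (the consumer IDs and function name below are hypothetical):

```python
def consumption_overlap(buyers_of_1, buyers_of_2):
    """Share of product 1's consumers who also purchase product 2."""
    buyers_of_1, buyers_of_2 = set(buyers_of_1), set(buyers_of_2)
    if not buyers_of_1:
        return 0.0
    return len(buyers_of_1 & buyers_of_2) / len(buyers_of_1)

# Example matching the text: 3 of product 1's 10 consumers also buy product 2.
print(consumption_overlap(range(10), [7, 8, 9, 10, 11]))   # 0.3
```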

Elzinga and Hogarty (1973; 1978) created the most common way to cluster based on this gauge. We call this method: the Elzinga-Hogarty (E-H) test. It is an algorithm that finds geographic interchangeability clusters.

Many criticize the E-H test based on its failure in legal cases.[footnoteRef:10] [10: The courts used the test extensively in hospital merger cases. Between 1994 and 2000, the FTC and DOJ challenged 7 of 900 hospital mergers, and then lost all seven. The FTC and DOJ concluded (U.S. Federal Trade Commission, 2005, ch. 4, pg. 5): "... that the Elzinga-Hogarty test is not valid or reliable in defining geographic markets in hospital merger cases." Although this conclusion does not necessarily apply to other industries, it does show there are some weaknesses in the test.]

Capps et al. (2001) criticize the test. They claim there is a silent majority fallacy: what is true for some consumers is not necessarily true for all consumers. Potentially, some consumers substitute product 1 for product 2, while a silent majority would not at any price.

This fallacy arises because the E-H test relies on consumption overlap. All tests of consumption overlap have this fallacy. We need to evolve beyond consumption overlap.

Also, the E-H test is not identified. In the E-H test, you split the map into subregions. We usually define a subregion as a ZIP Code or a municipality. The E-H test determines whether each subregion is in the market. Therefore, each subregion creates one parameter, which is one if the subregion is in the market and zero otherwise. This gives the E-H test too many parameters.

This problem occurs in most clustering algorithms.[footnoteRef:11] [11: One exception is the clustering algorithms used in marketing science to segment consumers based on surveys (see Sarstedt and Mooi, 2014, ch. 9). This includes hierarchical and K-means clustering. They are not used in antitrust litigation. They can be identified. This is from asking a sufficiently large number of survey questions. However, this is not usually done as additional questions change the meaning of the results.]

In this paper, we address both criticisms by creating a model for interchangeability in use that has few parameters. Each subregion has a random effect governed by these parameters. The model relies on more than consumption overlap. The model uses a more complete view of interchangeability in use. The silent majority fallacy leads to poor model fit. Therefore, we avoid the fallacy when we find good model fit.

Our approach follows from a property of transactions that belong to the same interchangeability cluster: across such transactions, the expected shares of products tend to remain constant.

For example, suppose there are two transactions. In each transaction, the consumer buys either product 1 or product 2. In transaction 1, we guess that the consumer has a 30% chance of buying product 1. Suppose that in transaction 2, the consumer uses the products for the same purposes. As far as we know, each consumer desires each product equally. Then, in transaction 2, we would guess that the consumer has a 30% chance of buying product 1. Therefore, consumers exchange both products at the same rate.

With sufficiently many products, the converse is true. If the expected shares of products remain constant, then the transactions are in the same cluster.

Suppose this were false. Then, there exist two transactions with the same product shares and different substitutability. So, there exists some factor that affects elasticities and not product shares. This becomes absurd when there are numerous products. You would have numerous equal product shares unaffected by something that affects elasticity, which may be theoretically possible but is not practical.

Therefore, with sufficiently many products, clusters are within markets; they are subsets of markets.

Yet, these clusters might not equal markets. For example, multiuse goods could create a chain of substitution across clusters. Markets could contain more than one cluster.

An antitrust market is a set of clusters. Suppose the monopolist could set individual prices for each transaction, and the monopolist knew how much each consumer valued each good in each transaction. Then, each transaction is one market. Now suppose there exists some small friction or cost to price discrimination. Firms would want to set the same prices for all similar transactions. Transactions with the same interchangeability in use would be close enough for the same price. We see this when firms strategize by consumer segment. Therefore, clusters are markets when there exist no barriers to price discrimination between clusters.[footnoteRef:12] When these barriers exist, clusters combine to form markets. [12: Barriers to price discrimination include imperfect information, government regulations, social barriers, arbitrage, multiuse products, and the additive costs associated with price discrimination.]

Therefore, this paper estimates markets in two steps. First, we estimate the interchangeability clusters. Then, we estimate the markets from the clusters. Finding the clusters first makes it easier to find markets. It reduces the number of dimensions.

In this paper, we model and estimate these clusters using techniques from machine learning and data science. We model clusters using Blei et al.’s (2003) Latent Dirichlet Allocation (LDA) model. And, we estimate this model using Griffiths and Steyvers’s (2004) Gibbs Sampling algorithm (Gibbs LDA).

LDA is a general clustering model. Blei et al. (2003) made it to cluster documents based on word patterns. Others have used it to: classify genetic sequences based on animal traits (Chen et al., 2010; Pritchard et al., 2000), recognize objects in photographs (Fei-Fei and Perona, 2005; Sivic et al., 2005; Russell et al., 2006; Cao and Fei-Fei, 2007; Wang et al., 2008), cluster videos (Niebles et al., 2008; Wang et al., 2007), cluster music (Hu, 2009), analyze social networks (Airoldi et al., 2008), cluster the disabilities of the elderly population (Erosheva et al., 2007), cluster CEO behavior (Bandiera et al., 2017), and predict user tastes and preferences based on consumer reviews (Marlin, 2004).

LDA comes from machine learning. The goal is to automate clustering in a way that mimics human thought. It comes from the same family of algorithms used to create modern search engines like Google, Yahoo, and Bing.

In this paper, we focus on geographic markets. In the next paper, we plan to focus on product markets. In theory, we should solve for both geographic and product markets jointly. In practice, we solve for them separately (Davis and Garcés, 2009, pg. 163).

Marketing and data scientists have long used clustering to find consumer segments. They mostly cluster consumers based on survey results (see Sarstedt and Mooi, 2014). Notably, Guidotti and Gabrielli (2017) and Guidotti et al. (2018) cluster based on when consumers make purchases.

This paper proposes we take it a step further. We find antitrust markets from clusters.

While we wrote this paper for antitrust economists, others might find it useful. This paper might help data scientists develop new clustering techniques. It might help marketing scientists segment consumers. And, it might help us cluster for other purposes.

We organize this paper as follows:

In section 2, we adapt LDA as a model for defining markets. We cluster transactions based on the patterns of purchases, and we give two illustrative examples.

In section 3, we summarize Gibbs LDA. This is the technique used to estimate LDA, and we give one simulated example.

In section 4, we apply these ideas to a real-world example. We use transaction-level scanner data from the largest supermarket franchise in Italy (sec. 4.1). We estimate 14 Gibbs LDA clusters and model fit (sec. 4.2). Then, we compare these results to results found using the E-H test (sec. 4.3). We get similar clusters. And, we discuss some advantages of using Gibbs LDA over the E-H test. Then, we find markets from the Gibbs LDA clusters (sec. 4.4).

In section 5, we conclude.

2. Latent Dirichlet Allocation (LDA) Model

In this section, we adapt the Latent Dirichlet Allocation (LDA) model to a new situation. We do not change the model. We change what the model clusters. Blei et al. (2003) made it to cluster documents based on word patterns. We use LDA to cluster transactions based on purchase patterns.

This model is a two-step process. First, a consumer draws a cluster. Second, the consumer draws a product from that cluster.

In section 3, we solve for the reverse of this model to estimate the clusters. We see the products drawn. We use the inverted model to estimate the clusters.

2.1 The Model

There exist I consumer segments. Each consumer segment is a region or subregion of the map. Each could be a municipality, a ZIP Code, a city block, a household, or even a consumer.[footnoteRef:13] [13: The choice of the size of the segment depends on how fine-grained you want the results. The model gives consumers within a segment the same preferences.]

For each consumer segment $i$, some random process determines its expenditure $E_i$. This random process is not critical to anything that follows. LDA models product selection and does not model the choice of the number of purchases.

Expenditure could depend on prices or incomes. It could depend on the variables in the model. It is ancillary to the model. It does not matter how it is determined.

The cost of each transaction is some fixed amount, P.

This creates a fixed number of transactions. Each consumer segment $i$ has $N_i = E_i / P$ transactions. The total number of transactions is $N = \sum_{i=1}^{I} N_i$.

This purposely ignores pricing effects. We are trying to find the boundaries of markets. Pricing effects are within markets, not across boundaries.

Each consumer segment $i$ draws a random K-dimensional vector of tastes $\theta_i$ from a Dirichlet distribution with a K-dimensional parameter $\alpha$, where K is an exogenously given number of clusters. Consumer segments buy from clusters in different proportions. Consumer segment $i$'s vector of these proportions is $\theta_i$. The $k$th element $\theta_{ik}$ is the probability that a given purchase by consumer segment $i$ is from cluster $k$.

For example, the "college student" consumer segment purchases more beer than wine, so $\theta_{\text{student},\text{beer}} > \theta_{\text{student},\text{wine}}$. The "downtown" consumer segment purchases more wine than beer, so $\theta_{\text{downtown},\text{wine}} > \theta_{\text{downtown},\text{beer}}$.

The Dirichlet distribution draws random variables on the (K-1)-simplex and has a probability density function given by:

$$p(\theta_i \mid \alpha) = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \theta_{ik}^{\,\alpha_k - 1} \qquad (1)$$

Then, products are purchased in a two-step process:

First Step: for each consumer segment $i$ and for each purchase $n \in \{1, \dots, N_i\}$, a random cluster $z_{in}$ is drawn from the multinomial distribution with a parameter of $\theta_i$. The probability of choosing cluster $k$ is $\theta_{ik}$.

These draws of clusters are technically not independent. They are conditional on $\theta_i$. Within each consumer segment, they are independent and identically distributed.

Therefore, we assume that the order of purchases does not matter.

For example, suppose two clusters exist: products from the east mall and products from the west mall. In the first step, each consumer determines which mall they are purchasing from. The probability that a purchase from consumer segment $i$ is from the east mall is $\theta_{i,\text{east}}$.

Second Step: for each consumer segment $i$ and for each purchase $n$, a random product $y_{in}$ is drawn from the multinomial distribution with a parameter $\beta_{z_{in}}$, conditioning on the cluster $z_{in}$. The probability of purchasing product $y$ is the parameter $\beta_{z_{in},y}$.

For example, suppose two clusters exist: products in the east mall and products in the west mall. The second step does not depend on the consumer segment. It just depends on the mall. The probability that a purchase from the east mall is sneakers is $\beta_{\text{east},\text{sneakers}}$.

Note that cluster shares, $\theta$, can depend on prices and incomes. The model treats $\theta$ as constant. It is ancillary to the model. It does not matter how it is determined. We are trying to find the boundaries of markets. Within-market forces determine cluster and market shares. You can analyze within-market forces with a different model when you assess market power. It would be a separate analysis.

A good analogy for this model is that a cluster is a bag of products. For each consumer segment $i$ and for each transaction $n$, first a random bag of products is drawn from the multinomial distribution with a parameter of $\theta_i$. Then, a random product is drawn from the bag of products using the multinomial distribution with a parameter $\beta$. The economist does not observe the bags drawn or the structure of the bags. The structure of the bags is $\theta$ and $\beta$. He or she only observes $y$, all the products purchased. In section 3, we show how the economist estimates the structure of the bags from $y$.

For example, suppose Santa randomly distributes toys using this model. First, he randomly draws either "naughty" or "nice". This depends on the neighborhood. Some neighborhoods are naughtier, and some are nicer. Second, he randomly picks a toy from the bag. The toy drawn depends on the bag and not the neighborhood. The economist observes the number of each toy delivered to each neighborhood. He or she does not know there is a "naughty" bag and a "nice" bag. He or she does not know if kids in the inner-city are naughtier or nicer than kids in the suburbs. He or she infers the bags based on the toys delivered.
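The generative process is short enough to sketch directly. The Python snippet below is our own illustration, not code from the paper; it draws each segment's tastes from a Dirichlet distribution and then draws a bag and a product for each transaction. The parameter values at the bottom reuse the shoe-cluster probabilities from the simulated example in section 3.2.

```python
import numpy as np

def simulate_lda(alpha, beta, n_transactions, seed=0):
    """Draw purchases from the LDA generative process of section 2.1.
    alpha          : (K,) Dirichlet parameter over clusters ("bags of products")
    beta           : (K, Y) product probabilities within each cluster
    n_transactions : number of transactions for each consumer segment"""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    K, Y = beta.shape
    theta, z, y = [], [], []
    for n_i in n_transactions:
        theta_i = rng.dirichlet(alpha)                           # segment's taste over bags
        z_i = rng.choice(K, size=n_i, p=theta_i)                 # first step: draw a bag
        y_i = np.array([rng.choice(Y, p=beta[k]) for k in z_i])  # second step: draw a product
        theta.append(theta_i); z.append(z_i); y.append(y_i)
    return theta, z, y

# Hypothetical two-cluster example with the shoe probabilities from section 3.2.
# Products: [wool slippers, snow boots, sneakers, sandals, flip flops]
alpha = [0.2, 0.2]
beta = [[0.4, 0.4, 0.2, 0.0, 0.0],   # "cold weather shoes" cluster
        [0.0, 0.0, 0.2, 0.4, 0.4]]   # "warm weather shoes" cluster
theta, z, y = simulate_lda(alpha, beta, n_transactions=[10] * 20)
```

The economist's problem is the inverse: only `y` is observed, and $\theta$, $\beta$, and the cluster draws `z` must be recovered, which is what section 3 addresses.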

2.2 Example 1: Corner Stores & Grocery Stores

Figure 2 shows an illustrative example of the LDA model.

[Figure 2 about here.]

Here, the cost of one transaction is the time and money spent on one trip to a corner store or to a grocery store. Therefore, a transaction is a shopping trip.

There exist two bags of products or, equivalently, stores ($K = 2$). Each bag of stores is a different cluster. Cluster 1 is `Corner Stores'. Corner stores are in the city center. Cluster 2 is `Grocery Stores’. Grocery stores are in the outskirts.

Also, three consumer segments exist: urbanites, suburbanites, and country folk. Urbanites live in the inner-city. Suburbanites live in the suburbs. And, country folk live in the outskirts.

For each consumer segment, consumers make a fixed number of shopping trips. This is determined by some random process. Urbanites make $N_1$ shopping trips, suburbanites make $N_2$ shopping trips, and country folk make $N_3$ shopping trips. Each consumer segment draws a $\theta_i$. This represents the probability of drawing each bag of stores. Urbanites draw a corner store all the time. Therefore, their $\theta_{1,\text{corner}} = 1$. Suburbanites draw a corner store half the time and a grocery store half the time. Therefore, their $\theta_{2,\text{corner}} = 0.5$ and $\theta_{2,\text{grocery}} = 0.5$. Likewise, for country folk, $\theta_{3,\text{grocery}} = 1$.

First Step: for each shopping trip, $n$, consumer segment $i$ selects the store that it visits by drawing a random bag of stores. The probability of drawing bag $k$ is $\theta_{ik}$.

Second Step: the consumer draws the store from the selected bag. This process only depends on the bag. This process is the same for any consumer drawing from the same bag. When a consumer visits a cluster, the location of their home does not affect the choice of the store. If a consumer visits a corner store, then he or she has a 25% chance of visiting `7-Eleven', a 20% chance of visiting `Speedway', and so on. If a consumer visits a grocery store, then he or she has a 25% chance of visiting `ShopRite', a 20% chance of visiting `Aldi', and so on.

Note that suburbanites buy from both corner stores and grocery stores, but do not use both sets of stores for the same purpose. They would not use a corner store in the place of a grocery store. And, they would not use a grocery store in the place of a corner store. Corner stores and grocery stores have consumption overlap, but they are not interchangeable.

The economist observes the number of shopping trips to each store by each consumer segment. He or she does not observe the bags, the number of bags, and which stores belong to which bag. In section 3, we show how he or she uses the stores visited to estimate the clusters.

2.3 Example 2: Televised Sports Programs in India

Figure 3 shows another illustrative example of the LDA model.[footnoteRef:14] In India, the north and west love cricket, and the south loves soccer (i.e. world football). [14: We made up the numbers in this example.]

[Figure 3 about here.]

Here the cost of one transaction is an hour of a consumer's time.

There exist two bags of sports programs ($K = 2$). Each bag of sports programs is a different cluster. Cluster 1 is `Cricket'. Cluster 2 is `Soccer'.

Also, two consumer segments exist: `North & West India' and `South India'.

For each consumer segment, consumers watch a fixed number of hours of programs. This is determined by some random process. Northern and western Indian consumers watch $N_1$ hours of sports programs. Southern Indian consumers watch $N_2$ hours of sports programs. Each consumer segment draws a $\theta_i$. This represents the probability of drawing each bag of sports programs. Northern and western Indian consumers watch more cricket. They watch cricket 95% of the time, so their $\theta_{1,\text{cricket}} = 0.95$ and $\theta_{1,\text{soccer}} = 0.05$. Southern Indian consumers watch more soccer. They watch soccer 80% of the time, so their $\theta_{2,\text{soccer}} = 0.80$ and $\theta_{2,\text{cricket}} = 0.20$.

First Step: for each hour of television, $n$, consumer segment $i$ selects what it watches by first drawing a random bag of sports programs. The probability of drawing bag $k$ is $\theta_{ik}$.

Second Step: the consumer draws the sports program from the selected bag. This process only depends on the bag. This process is the same for any consumer drawing from the same bag. If the consumer watches cricket, then he or she has a 35% chance of watching the `Ranji Trophy', a 30% chance of watching the `Duleep Trophy', and so on. If the consumer watches soccer, then he or she has a 43% chance of watching the `Asian Cup', a 27% chance of watching the `King's Cup', and so on.

Note one multiuse good exists. `India TV Sports News’ is both a cricket and a soccer program. Consumers could watch the program to get cricket news. Or, consumers could watch the program to get soccer news. Therefore, it belongs to both clusters.

The economist observes the number of viewers for each sports program by each consumer segment. He or she does not observe the bags, the number of bags, and which programs belong to which bag. In section 3, we show how he or she uses the number of viewers to estimate the clusters.

3. Model Estimation

3.1 Estimation Technique

In this section, we describe how to estimate the model from the previous section. This estimates the clusters from the purchases.

We can identify the LDA model because it has few parameters.[footnoteRef:15] These parameters affect the model globally and each consumer segment as a random effect. The global parameters define the distribution of the random effects. [15: This set of parameters is K, $\alpha$, and $\beta$.]

Unfortunately, you cannot directly solve for the most likely parameters. No closed-form solution exists. It is intractable due to coupling of the parameters (see: Blei et al., 2003).

In this paper, we estimate the model with a form of Markov Chain Monte Carlo (MCMC).

Specifically, we use Griffiths and Steyvers’s (2004) Gibbs Sampling algorithm (Gibbs LDA).[footnoteRef:16] Gibbs sampling is an efficient and easy-to-implement form of MCMC. [16: Computer scientists created several alternative techniques to estimate LDA. These include mean field variational inference (Blei et al., 2003), expected propagation (Minka and Lafferty, 2002), and collapsed variational inference (Teh et al., 2006).]

The idea is that if, in addition to the purchases $y$, we know all the cluster assignments but the one for consumer $i$'s last purchase $n$ (i.e. we know $z_{-in}$), then the probability distribution for this unknown cluster assignment $z_{in}$ would be:

$$P(z_{in} = k \mid z_{-in}, y) \propto (a_{ik} + \alpha)\,\frac{b_{k,y_{in}} + \beta}{c_k + Y\beta} \qquad (2)$$

where $\alpha$ and $\beta$ are smoothing parameters; $a_{ik}$ is the number of consumer $i$'s other purchases in cluster $k$; $b_{k,y_{in}}$ is the number of cluster $k$'s other purchases of product $y_{in}$; $c_k$ is the total number of cluster $k$'s other purchases; and $Y$ is the number of products.

In Gibbs LDA, you start with a random cluster assignment. Then, you update all the cluster assignments using (2) until convergence.
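The following is a minimal collapsed Gibbs sampler in the spirit of equation (2). It is our own sketch, not the GibbsLDA++ code used in the paper; the count arrays mirror the quantities defined above, and the default smoothing values are illustrative.

```python
import numpy as np

def gibbs_lda(y, segment, K, Y, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (sketch of equation (2)).
    y       : product index of each transaction (length-N integer array)
    segment : consumer-segment index of each transaction (length-N integer array)
    K, Y    : number of clusters and number of products
    alpha, beta : smoothing parameters (illustrative defaults)"""
    rng = np.random.default_rng(seed)
    y, segment = np.asarray(y), np.asarray(segment)
    I = int(segment.max()) + 1
    z = rng.integers(K, size=len(y))              # random initial cluster assignments
    a = np.zeros((I, K))                          # consumer i's purchases in cluster k
    b = np.zeros((K, Y))                          # cluster k's purchases of product y
    c = np.zeros(K)                               # cluster k's total purchases
    for n in range(len(y)):
        a[segment[n], z[n]] += 1; b[z[n], y[n]] += 1; c[z[n]] += 1
    for _ in range(n_iter):
        for n in range(len(y)):
            i, v, k_old = segment[n], y[n], z[n]
            a[i, k_old] -= 1; b[k_old, v] -= 1; c[k_old] -= 1       # drop current assignment
            p = (a[i] + alpha) * (b[:, v] + beta) / (c + Y * beta)  # equation (2)
            k_new = rng.choice(K, p=p / p.sum())
            z[n] = k_new
            a[i, k_new] += 1; b[k_new, v] += 1; c[k_new] += 1
    return z, a, b
```

After burn-in, the smoothed and normalized rows of `a` and `b` estimate $\theta$ and $\beta$.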

We discuss Gibbs LDA in more detail in Appendix B. This appendix makes Gibbs LDA more accessible. The Gibbs LDA literature ignored a few statistical issues. We address these issues. This paper is the first to estimate the standard errors of the model fit of Gibbs LDA. In addition, we show how to ensure convergence.

3.2 Small Simulated Example

In this section, we test Gibbs LDA with a small simulated example. We see if we derive the same generative structure.

Suppose there exist 20 consumer segments. Suppose each consumer segment makes ten purchases. We draw these purchases using LDA with α=0.2. There exist two clusters: “cold weather shoes” and “warm weather shoes”. A draw from the “cold weather shoes” cluster has a 40% chance of being wool slippers, a 40% chance of being snow boots, and a 20% chance of being sneakers. A draw from the “warm weather shoes” cluster has a 20% chance of being sneakers, a 40% chance of being sandals, and a 40% chance of being flip flops.

The economist estimates the clusters from the observed purchases. He or she does not know there is a "cold weather shoe" cluster and a "hot weather shoe" cluster. He or she delineates the clusters based on the observed purchases.

[Table 4 about here.]

Table 4 shows that Gibbs LDA quickly estimates the cluster structure. To make it easier to interpret, we ordered the consumer segments by $\theta_{i2}$, the warm weather cluster share. Low $\theta_{i2}$ is north. High $\theta_{i2}$ is south. Panel (a) is the initial cluster assignments. We assigned these randomly. And, panel (b) is the cluster assignments of the 50th iteration of Gibbs LDA. A white dot means the purchase is assigned to the first cluster (cold weather shoes). And, a black dot means the purchase is assigned to the second cluster (warm weather shoes).

Gibbs LDA performed well with a small number of iterations and purchases. It correctly guessed 91.5% of the cluster assignments.

4. Example from Italian Supermarket Scanner Data

In this section, we apply these ideas to a real-world example. We use transaction-level scanner data from the largest supermarket franchise in Italy (sec. 4.1). We estimate Gibbs LDA clusters and we defend model fit (sec. 4.2). Then, we compare these results to results found using the Elzinga-Hogarty (E-H) test. We get similar clusters. And, we discuss some advantages of Gibbs LDA clusters over E-H clusters (sec. 4.3). Then, we find markets from the Gibbs LDA clusters (sec. 4.4).

4.1 Data Description

In this real-world example, we use data from Unicoop Tirreno, which is part of Coop Italia. Coop Italia is the largest supermarket franchise in Italy. The data comes from the loyalty cards of residents of Tuscany. Coop knows all the purchases each member made at each of their stores. Therefore, we know if a consumer shops at one store or multiple stores. The data is all their purchases from 2010 to 2015 in stores in Tuscany. This is 99.7% of Unicoop Tirreno’s revenue from the stores in Tuscany.[footnoteRef:17] [17: Based on dividing total expenditure in the data by Coop Tirreno's total retail and wholesale revenues for Tuscany 2010-2015, as reported in their annual budgets (Unicoop Tirreno, 2013; 2015; 2016).]

[Table 5 about here.]

Table 5 shows some summary statistics for the data. The data is from 71 stores in 34 municipalities (or “comunes”) and 5 provinces. Between 2010 and 2015, 7 of these stores closed, and 12 of these stores opened. An average of 122,852 consumers visited at least one store per month. Each of these consumers visited an average of 1.375 stores in a month. The average revenue was 46.094 million EUR per month. Therefore, a consumer spent an average of 375 EUR per month.

[Table 6 about here.]

Table 6 shows how much consumers living in each province spent in stores in each of the provinces from Sept. – Nov. 2015. This is the period that we use to estimate Gibbs LDA.[footnoteRef:18] Consumers from Massa & Carrara spent 98.29% in stores in their own province. They spent 1.66% in Lucca. They spent 0.04% in Livorno. And so on. Consumers from Lucca spent 99.58% in stores in their own province. They spent 0.29% in Massa & Carrara. And so on. Tuscany residents not in the 5 provinces spent 0.00% of their expenditure in Massa & Carrara. They spent 0.75% in Lucca. And so on. [18: We explain in sec. 4.2.1.]

Consumers mostly spent money in their own provinces. There are a couple of exceptions. We discuss these exceptions in section 4.4.

4.2 Estimate the Clusters and Model Fit

In this section, we run Gibbs LDA on this data.[footnoteRef:19] We create a cross-sectional sample from the Coop data. And, we split this cross-sectional data into three samples (sec. 4.2.1). We use the first sample to find the optimal number of clusters (sec. 4.2.2). We use the second sample to estimate the model using that number of clusters (sec. 4.2.3). We use the third sample to test out-of-sample model fit (sec. 4.2.4). Then, we run Gibbs LDA on the combined cross-sectional data. This gives us our main results (sec. 4.2.5). And, then, we rerun the entire process to verify the results (sec. 4.2.6). [19: We use Phan and Nguyen’s (2007) GibbsLDA++ program to run Gibbs LDA. It is well-tested and used in more than 48 academic publications. There exist many alternatives, including code for: MATLAB (Griffiths and Steyvers, 2011; MathWorks, 2018); R (Chang, 2015); Python (Riddell, 2015); and Stata (Schwarz, 2018). The Stata code shows promise. It was being beta tested when we wrote this paper.]

4.2.1 Data Sampling

First, we create our cross-sectional dataset and then split it into three samples.

We take a cross-section because the model assumes constant cluster shares, $\theta$. Over time, we would expect prices to change, which would change the cluster shares. We do not model these changes. Therefore, we pick a small time period to keep $\theta$ constant. We only use data from a three-month period: Sept. - Nov. 2015. This is the most recent three-month period without an Italian holiday.[footnoteRef:20] [20: We choose the last period instead of the middle period. We want our clusters to more closely reflect current market conditions. ]

Gibbs LDA uses three variables. These are: the consumer segment, the product ID, and the number of transactions.

We set our consumer segments as municipalities. There were 287 municipalities in Tuscany. We observe sales from residents in 210 of them. We choose to segment consumers at this level to be consistent with the courts.

We set our products as individual stores. We observe 66 stores, so we observe 66 products.

We set a transaction as 10 EUR of spending. Thus, we count 10 EUR of bananas in the same way we count 10 EUR of mangos. This is a nice round number. And, it requires rounding in less than .001% of transactions.

We round to whole-numbered expenses with stochastic rounding. We round $X$ up to $\lceil X \rceil$ with probability $X - \lfloor X \rfloor$ and down to $\lfloor X \rfloor$ with probability $1 - (X - \lfloor X \rfloor)$. Unlike rounding to the nearest, this gives unbiased results. The machine learning literature often uses stochastic rounding (e.g. Gupta et al., 2015).
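A short sketch of stochastic rounding (our own helper, not the paper's code):

```python
import numpy as np

def stochastic_round(x, rng=None):
    """Round x up with probability equal to its fractional part, down otherwise.
    Unbiased: the expected value of the rounded amount equals x."""
    if rng is None:
        rng = np.random.default_rng(0)
    floor = np.floor(x)
    return floor + (rng.random(np.shape(x)) < (x - floor))

# A basket of 2.3 ten-EUR units counts as 3 transactions 30% of the time and 2 transactions 70% of the time.
baskets_in_10_eur_units = np.full(100_000, 2.3)
print(stochastic_round(baskets_in_10_eur_units).mean())   # close to 2.3
```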

We limit the number of transactions to 10,000 per consumer segment. Truncation is common with Gibbs LDA. It makes Gibbs LDA run faster. And, it gives more weight to consumer segments with fewer observations. There are 91 consumer segments with too many transactions. For each of these segments, we select 10,000 transactions with random sampling.

Therefore, there are three variables. The first variable is the consumer’s municipality. The second is the store. The third is the number of transactions (in 10 EUR increments).

As we mention earlier, we split the cross-sectional data into three samples. Let the first sample be Sample A. We use it to find the number of clusters (sec. 4.2.2). Let the second sample be Sample B. We use it to estimate the model and the clusters (sec. 4.2.3). And, let the third sample be Sample C. We use it to test out-of-sample model fit (sec. 4.2.4).

[Table 7 about here.]

Table 7 shows how we sample. We split the samples by municipality. Sample A comes from 20% of the municipalities. Sample B comes from a different 40% of the municipalities. Sample C comes from the remaining 40% of the municipalities. We sample this way to be consistent with the Gibbs LDA literature.[footnoteRef:21] [21: We get similar results when we randomly split by consumer and when we randomly split by month.]

4.2.2 Determine the Number of Clusters

Next, we use Sample A to determine the optimal number of clusters, K.

The optimal K is not the K that maximizes model fit. Better fit is not always better or preferable. A larger K means a more complex model. There are more clusters to fit the data. This gives the model more flexibility to fit the error. As a result, increasing K always increases model fit.[footnoteRef:22] [22: Similarly, in Ordinary Least Squares (OLS), increasing the number of parameters always increases the R-squared. The additional parameter always allows the regression to fit more of the data. Yet, the additional parameter is not always statistically significant or meaningful.]

Therefore, we choose the smallest K such that adding a cluster does not meaningfully increase model fit. This is called the elbow method. You choose the $K$ such that lower values meaningfully increase fit while higher values do not meaningfully increase fit.
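A sketch of the elbow computation is below. It is illustrative only: scikit-learn's LatentDirichletAllocation fits LDA by variational inference rather than the Gibbs sampler used in the paper, so the numbers would differ, and the function name and per-transaction normalization are our own choices. The input `counts` is an assumed consumer-segment by product matrix of transaction counts.

```python
from sklearn.decomposition import LatentDirichletAllocation

def elbow_curve(counts, k_values, seed=0):
    """Average log-likelihood per transaction for each candidate K.
    counts : numpy array (n_segments, n_products) of transaction counts."""
    n_transactions = counts.sum()
    fits = []
    for k in k_values:
        lda = LatentDirichletAllocation(n_components=k, random_state=seed,
                                        learning_method="batch", max_iter=100)
        lda.fit(counts)
        fits.append(lda.score(counts) / n_transactions)   # per-transaction log-likelihood
    return fits

# Usage: fits = elbow_curve(counts, range(2, 31)); plot fits against K and pick
# the smallest K after which fit stops improving meaningfully (the "elbow").
```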

Figure 8 shows how this works with our data.

[Figure 8 about here.]

Figure 8 plots model fit against the number of clusters. For each $K$, we run Gibbs LDA for 2000 iterations on Sample A. Then, we calculate the model fit of the results. We give fit in terms of the average log-likelihood of a transaction.[footnoteRef:23] [23: We look at model fit per transaction so we can compare results of different sample sizes.]

Figure 8 shows that the optimal K is 14. When K is smaller, adding a cluster meaningfully increases fit. When K is larger, adding a cluster does not meaningfully increase fit.

Note the fact that a clear elbow exists also indicates that the model fits the data. The elbow method would not work well if the data was not clearly clustered.

4.2.3 Estimate the Model with In-Sample Data

Next, we use Sample B to estimate the model and find the clusters. We keep $K = 14$.

We run Gibbs LDA for 2000 iterations on Sample B. We estimate the model fit of each iteration.

[Figure 9 about here.]

Figure 9 shows that model fit converges within 20 iterations.[footnoteRef:24] [24: Even when Gibbs LDA is stationary, the average log-likelihood still produces some small variation due to the MCMC process. ]

We choose to discard the first 64 iterations. This avoids any bias from the initial random cluster assignments. 64 exceeds 20 iterations, and it is a convenient number when we estimate error. See Appendix B.5.1 for more on how to choose the discard period.

[Table 10 about here.]

Table 10 shows the model fit of the remaining iterations. We find a high average log-likelihood of -1.447 and a tiny standard error of 3.022E-06.[footnoteRef:25] [25: This average log-likelihood means that each transaction has a geometric mean of likelihood of about 23.5%. If the average log-likelihood is x, then the geometric mean of the likelihood of each transaction is $e^x$.]

This high likelihood and tiny standard error indicates that the model fits the data.

4.2.4 Test the Out-of-Sample Model Fit

Next, we use Sample C to test out-of-sample model fit. We use Wallach et al.’s (2009) Chib-style estimator to reduce error.[footnoteRef:26] [26: We use their Chib-style estimator because they do not provide the code for the "left-to-right" algorithm. They show that both algorithms are efficient. ]

Out-of-sample fit in MCMC is not straightforward. Each estimate from the in-sample data incorporates some error. The previous Gibbs LDA literature ignores this error. They estimate out-of-sample fit using the last iterative estimate on Sample B. They ignore the variation in estimating Sample B when estimating the model fit of Sample C.

Therefore, we test how Sample C fits several estimates of running Gibbs LDA on Sample B. We use 44 estimates of running Gibbs LDA on Sample B. We use the estimate from the 108th iteration of Gibbs LDA on Sample B. We use the estimate from the 152nd iteration of Gibbs LDA on Sample B. And so on.[footnoteRef:27] Each time, we run the Chib-style estimator for 200 iterations. We find an average log-likelihood of -3.464. This is high but not as high as the in-sample model fit. We should expect this. In addition, we find a tiny standard error of 1.136E-02. [27: You could potentially run the estimator for each of Sample B's iterations. Then you could estimate standard errors with the method of batch means. We did not do this. It would have taken our computer several months. Perhaps, we need to develop better estimators. Perhaps, Dr. Bruestle just needs a better computer.]

This high out-of-sample likelihood and tiny standard error indicates that the model fits the data.

Also, we estimate out-of-sample fit in the same way as the previous literature. We only use the estimate from the last iteration of Gibbs LDA on Sample B. We run the Chib-style estimator for 1000 iterations. We find a high average log-likelihood of -3.471.

This high out-of-sample likelihood indicates that the model fits the data.

4.2.5 Estimate the Model with Data from all the Municipalities

Then, we run Gibbs LDA on the cross-sectional dataset from all municipalities. This combines Samples A, B, and C. These are the most accurate results, because these results come from the most data.

We run Gibbs LDA for 2000 iterations. We discard the results from the first 64 iterations.[footnoteRef:28] Table 10 shows the model fit statistics. We find a high average log-likelihood of -1.451 and a tiny standard error of 4.232E-06. [28: The algorithm still seems to converge within 20 iterations. ]

This similar finding indicates that the model fits the data.

Then, we create geographic maps of each cluster.

[Figure 11 about here.]

Figure 11 shows six examples of the maps. There are maps of large and small clusters. k4 and k7 are large clusters. k10 and k3 are small clusters. We shade each municipality by expenditure in the cluster. Dark green means that residents of the municipality spent a lot in the cluster. Light green means that they spent a little in the cluster. Yellow means that we do not observe any data from that municipality. In general, the clusters appear to be contiguous.[footnoteRef:29] Red stars are stores with at least 20% of the cluster’s expenditures. Purple triangles are stores with between 5% and 20% of the cluster’s expenditures. Blue dots are stores with less than 5% of the cluster’s expenditures. In general, stores locate within or close to their consumers. To see all maps, refer to Web Appendix E.1. [29: There is one and only one cluster (k9) that does not appear contiguous. It appears to combine three less cohesive clusters. When we performed the Elzinga-Hogarty Test (sec. 4.3), we found a similar cluster. ]

This geographical clustering indicates that the model fits the data. Gibbs LDA uses expenditure data. We do not use data on proximity. We do not use data on which stores neighbor each other. We do not use data on which municipalities neighbor each other. Geographical clustering indicates that these are the true clusters.

To interpret our results, we create names for each cluster.

Also, we order clusters approximately from north-to-south.

[Table 12 about here.]

Table 12 shows our names for the fourteen clusters and their total expenditures. These names make our clusters easy to interpret. For more information about each cluster, refer to Web Appendix E.1.

Easily interpretable clusters indicate that the model fits the data.

In addition, we use our results to test model fit on data from every month from 2010 to 2015. This requires no MCMC iterations, because we already have estimates for $\theta$ and $\beta$.[footnoteRef:30] We discard the results based on the first 64 iterations of the cross-sectional data. We calculate the model fit statistics from the remaining iterations. We find high average log-likelihoods of -1.56 to -1.17 with some evidence of seasonality. In addition, we find tiny standard errors of 6.011E-06 to 3.712E-05. [30: Therefore, it does not use the Chib-style estimator. ]

These high likelihoods and tiny standard errors on the out-of-sample monthly data indicate that the model fits the data.

4.2.6 Rerun the Experiment to Verify the Results

Finally, we rerun the entire process to verify our results. We rerun the data sampling, all the tests of model fit, and Gibbs LDA. Let’s call this `run 2’ and the previous analysis `run 1’.

We find the same optimal number of clusters.

We find similar model fit statistics. For more detail, see Table 10.

These similar statistics indicate that the model fits the data.

[Figure 13 about here.]

Figure 13 shows that we find similar clusters in run 2.[footnoteRef:31] It shows the Kullback-Leibler (KL) distance between clusters in run 1 and run 2. The KL distance measures the similarity between clusters.[footnoteRef:32] A KL distance of 0 means the distribution of stores is equal in both clusters. A higher distance means the clusters differ more. In Figure 13, we shade higher KL distances darker. [31: This follows similar analysis in Steyvers and Griffiths (2007).] [32: Formally, the KL distance between the store distributions $p$ and $q$ of two clusters is $\tfrac{1}{2}\sum_{y} p_y \log\frac{p_y}{q_y} + \tfrac{1}{2}\sum_{y} q_y \log\frac{q_y}{p_y}$.]
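A sketch of this comparison, assuming (as in the footnote above) a symmetrized KL distance between two clusters' store distributions; the function name and the small smoothing constant are our own choices.

```python
import numpy as np

def kl_distance(p, q, eps=1e-12):
    """Symmetrized Kullback-Leibler distance between two clusters' store
    distributions (two rows of beta). Returns 0 when the distributions are equal."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return 0.5 * np.sum(p * np.log(p / q)) + 0.5 * np.sum(q * np.log(q / p))

# Distance matrix between run 1 and run 2 clusters, as shaded in Figure 13:
# dist = [[kl_distance(b1, b2) for b2 in beta_run2] for b1 in beta_run1]
```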

The lower distances on the diagonal show that overall we find similar clusters.

These similar results indicate that the model fits the data.

4.3 Elzinga-Hogarty (E-H) Test

In this section, we compare Gibbs LDA to Elzinga and Hogarty’s (1973; 1978) E-H test. It is an algorithm that finds interchangeability clusters. We often use the E-H test in antitrust cases. First, we define the E-H algorithm that we use for comparison (sec. 4.3.1). Then, we create a dataset from the Coop data (sec. 4.3.2). Then, we find the E-H clusters (sec. 4.3.3). We compare the E-H clusters to the Gibbs LDA clusters that we estimate in the previous section. Finally, we discuss the advantages of Gibbs LDA over the E-H test (sec. 4.3.4).

4.3.1 E-H Algorithm

First, we define the E-H algorithm that we compare against Gibbs LDA.

The E-H test consists of two parts: a demand-side test and a supply-side test. We call the demand-side test: "little from the outside" (LIFO). It tests whether nearly all consumption by the candidate area comes from stores in the candidate area. This tests for consumption overlap with stores from outside the area. We call the supply-side test: "little out from the inside" (LOFI). It tests whether nearly all sales volume from the candidate area comes from consumers in the candidate area.[footnoteRef:33] [33: This weakly tests for overlapping production. In some sense, Gibbs LDA also tests overlapping production. This requires some firms making multiple products.]

These two tests complement each other.  LIFO tests demand but ignores supply. LOFI tests supply but ignores demand. Elzinga and Hogarty (1973) argue that a market satisfies both tests.

Formally, the LIFO and LOFI tests are:[footnoteRef:34] [34: Although Elzinga and Hogarty (1973) initially suggest using 75% thresholds, 90% thresholds have become the most commonly accepted threshold. In California v. Sutter Health System, Judge Maxine Chesney finds: "a service area based on the 90 percent level of significance... to be more appropriate than one based on an 85 percent threshold as proposed by the plaintiff. Courts have generally acknowledged the 90 percent level of significance." See California v. Sutter, 130 F Supp. 2d. 1109 (N.D. Cal. 2002). That said, Frech et al. (2003, pg. 3) points out that there is no economic reasoning for a 90% threshold.]

$$\frac{\text{purchases by consumers in the candidate area from stores in the candidate area}}{\text{total purchases by consumers in the candidate area}} \geq 90\% \qquad \text{(LIFO)}$$

$$\frac{\text{sales by stores in the candidate area to consumers in the candidate area}}{\text{total sales by stores in the candidate area}} \geq 90\% \qquad \text{(LOFI)}$$

The courts apply the E-H test in many ways. See Frech et al. (2003) for a review.

To compare Gibbs LDA clusters and E-H clusters, we settle on one algorithm based on what is most commonly accepted in court (a code sketch of these steps follows the list):

(1) Start with an initial set of store(s). This consists of all the stores in some initial municipality.

(2) Set the candidate area as the municipality or municipalities of the set of stores.[footnoteRef:35] [35: Elzinga and Hogarty (1973) claim that the initial candidate area should be the smallest area around the largest store or plant that accounts for 75 percent of its commerce. This alternative is equivalent.]

(3) If the candidate area satisfies both (LIFO) and (LOFI), then you are done. This is the E-H cluster. Otherwise, add a municipality to the candidate area. Use the municipality with the highest volume of sales of the remaining municipalities.[footnoteRef:36] [36: Frech et al. (2003) review several alternatives to picking the next municipality. Two district courts have endorsed using the volume of sales method (Frech et al., 2003). This is the method said to be closest to how firms think about their service areas. See California v. Sutter, 130 F. Supp. 2d at 1122; see also FTC v. Freeman Hosp., 69 F.3d 260 (8th Cir. 1995).]

(4) If this additional municipality has more stores, then add those stores to the set of stores and return to step (2). Otherwise, return to step (3). [footnoteRef:37] [37: Frech et al. (2003) find that it is more reasonable and consistent to update the initial candidate region in this manner than to only use the volume of sales from the initial stores.]
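The following is our own sketch of these four steps, not the implementation used in the paper. It assumes a single municipality-by-municipality sales matrix, so steps (2) and (4) collapse into tracking municipalities directly, and it interprets step (3) with the volume-of-sales reading discussed in the footnotes.

```python
import numpy as np

def elzinga_hogarty(sales, start, threshold=0.90, max_steps=100):
    """Sketch of the E-H algorithm in section 4.3.1.
    sales[b, s] : price-adjusted volume sold by stores in municipality s
                  to consumers in municipality b (assumed input)
    start       : index of the initial municipality (home of the initial store(s))"""
    n = sales.shape[0]
    area = {start}
    for _ in range(max_steps):
        inside = sorted(area)
        internal = sales[np.ix_(inside, inside)].sum()
        lifo = internal / sales[inside, :].sum()   # consumption coming from inside stores
        lofi = internal / sales[:, inside].sum()   # inside stores' sales staying inside
        if lifo >= threshold and lofi >= threshold:
            return area
        outside = [m for m in range(n) if m not in area]
        if not outside:
            return area
        # Step (3): add the municipality buying the most from the candidate area's stores.
        purchases = sales[outside][:, inside].sum(axis=1)
        area.add(outside[int(purchases.argmax())])
    return area
```

Running this from every store's home municipality, as in section 4.3.3, produces one candidate E-H cluster per starting point.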

4.3.2 E-H Dataset

Then, we create a data sample for the E-H test out of the Coop data.

It is okay to use the E-H test on supermarket data. Although grocery stores sell many goods, consumers treat store choice as one good. Grocery shopping is a nested problem. First, consumers select a store. Then, consumers decide which products to buy (e.g. Besanko et al., 1998).

Therefore, we treat grocery sales as one good.

To apply the E-H test, we create a price-adjusted measure of volume of sales. We normalize sales volumes by a price index. Then, we use data from Sept. – Nov. 2015 to create an E-H dataset. These are the same purchases that we use in Gibbs LDA.

First, we create store-level price indices. We follow Hausman (1996), who creates similar price indices, and we remove seasonality. For details on the creation of the price indices, see Appendix D.

Then, we use these price indices to create a price-adjusted measure of the volume of sales. We divide expenditures at a store by its price index.

This creates a cross-sectional dataset. One observation is the price-adjusted volume of sales from a seller-municipality to a buyer-municipality.

4.3.3 Similar Clusters

Next, we run the E-H algorithm on the E-H data. We start the algorithm from every store. This creates 25 unique clusters. On average, each cluster contains 2.68 municipalities, 1.96 of which contain one of the stores. And, on average, each cluster shares 0.92 municipalities with another cluster.

Then, we match each E-H cluster with its most similar Gibbs LDA cluster. Each Gibbs LDA cluster matches at least one similar E-H cluster. Often several E-H clusters match the same Gibbs LDA cluster. Many E-H clusters are similar. And, 21 of the 25 E-H clusters match a Gibbs LDA cluster. We expect a few non-matches. The two algorithms use different thresholds.[footnoteRef:38] [38: Perhaps the elbow method could be applied to the E-H test. We leave this to future research. ]

[Figure 14 about here.]

Figure 14 shows E-H clusters superimposed on their Gibbs LDA clusters. The red dashed and dotted lines are the boundaries of the different E-H clusters.

Both algorithms seem to estimate the same clusters.

4.3.4 Advantages of Gibbs LDA over the E-H Test

Although you get similar clusters with both Gibbs LDA and the E-H test, Gibbs LDA has some advantages.

Gibbs LDA assigns every transaction to a cluster. The E-H test often assigns the same transaction to multiple clusters.

As a result, Gibbs LDA handles contested regions better. The E-H test treats a contested region as belonging to both clusters, while Gibbs LDA assigns a contested region probabilistically to each cluster.

Gibbs LDA does not fall for the silent majority fallacy. As we mention earlier, the E-H test could fail because it relies on consumption overlap. Suppose there exists consumption overlap between two stores. Let’s call the consumers who buy from both stores: travelers. Further, suppose there exists a silent majority not willing to substitute one store for another. We would infer the silent majority would substitute one good for another based on the travelers. This leads to false results in the LIFO test (through consumption overlap). Yet, Gibbs LDA would not lead to false results. It relies on a more complete view of interchangeability in use. In Gibbs LDA, the silent majority fallacy leads to poor model fit. Therefore, we avoid the fallacy when we find good model fit.

Gibbs LDA is identifiable. The E-H test is not.

Gibbs LDA has many measures of model fit. The E-H test does not.

Therefore, Gibbs LDA is preferable to the E-H test in some situations.

4.4 Estimate Markets from Gibbs LDA Clusters

In this section, we estimate the markets from the Gibbs LDA clusters. Potentially, we could do this with the E-H clusters. Yet, it would be more difficult. There are too many overlapping E-H clusters.

It is easier to define markets from Gibbs LDA clusters than from municipalities. There are 66 municipalities and 14 clusters. Gibbs LDA reduces the number of dimensions in the problem.

First, we look at consumption overlap. We start with the consumption overlap between clusters. Then, we look at the consumption overlap between provinces. The courts often use consumption overlap to define markets.

[Figure 15 about here.]

Figure 15 shows how much consumption overlap exists between clusters. The horizontal axis is the threshold, in terms of expenditure, that we use to determine which municipalities belong to a cluster. The municipalities that belong to cluster k are the smallest set of municipalities that together account for the threshold share of cluster k's expenditures, ordering municipalities from largest to smallest expenditure in cluster k. The vertical axis is the average number of municipalities shared with other clusters. It measures consumption overlap.
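
A minimal sketch of this computation, assuming a municipality-by-cluster expenditure matrix; averaging the shared counts over pairs of clusters is our reading of the figure's vertical axis, and all names are illustrative.

```python
import numpy as np
import pandas as pd

def cluster_members(exp_by_muni: pd.Series, threshold: float) -> set:
    """Smallest set of municipalities (largest spenders first) that together
    account for `threshold` of a cluster's expenditures."""
    shares = exp_by_muni.sort_values(ascending=False) / exp_by_muni.sum()
    cum = shares.cumsum()
    n = int(np.searchsorted(cum.to_numpy(), threshold) + 1)
    return set(cum.index[:n])

def avg_shared_municipalities(expenditure: pd.DataFrame, threshold: float) -> float:
    """`expenditure` holds municipality-by-cluster expenditures (rows =
    municipalities, columns = clusters). Returns the average number of
    municipalities shared across pairs of clusters at the given threshold."""
    members = {k: cluster_members(expenditure[k], threshold) for k in expenditure}
    ks = list(members)
    shared = [len(members[a] & members[b]) for a in ks for b in ks if a != b]
    return float(np.mean(shared))
```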

Figure 15 shows moderate evidence of clusters overlapping. Yet, the measure is sensitive to the choice of threshold, and there is no established threshold.

Therefore, multiple clusters could be in the same market. We cannot tell from consumption overlap alone. We need to look at other measures at the cluster level.

Before we do that, let’s look at the province-level.

[Figure 16 about here.]

Figure 16 shows how much consumption overlap exists between provinces. It shows how much consumers from each province spent in stores in each province. For example, Siena residents spent 13.80% of their expenditures in Grosseto.

Figure 16 shows most provinces are separate.

A couple of exceptions exist:

Grosseto and Siena could be in the same market. Siena residents spent 13.80% of their expenditures in Grosseto. Siena residents did not spend much. Siena stores did not take in much revenue. There was only one store in Siena, and consumers in Siena mostly lived near the western border close to Grosseto.

Massa & Carrara and Lucca could be in the same market. Massa & Carrara residents spent 1.67% of their expenditures in Lucca. There were only two stores in Massa & Carrara, and consumers in Massa & Carrara mostly lived near the southern border close to Lucca.

Therefore, we investigate three regions: (a) the province of Massa & Carrara and the province of Lucca, (b) the province of Livorno, and (c) the provinces of Grosseto and Siena.

Next, we create cluster-level price indices to test for markets within these three regions. We start with the store-level price indices that we create for the E-H test. Then, we create the cluster-level price indices from the store-level price indices. Cluster k's index is the weighted average of the store-level indices, weighting by our estimated cluster-store probabilities (Φ). And, we remove seasonality. For details on the creation of the price indices, see Appendix D.
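
A minimal sketch of the weighting step, assuming the store-level indices are stacked in a months-by-stores array; deseasonalizing and the index construction itself (Appendix D) are omitted.

```python
import numpy as np

def cluster_price_indices(store_index: np.ndarray, phi: np.ndarray) -> np.ndarray:
    """Cluster-level indices as phi-weighted averages of store-level indices.

    store_index: T x S array of store-level price indices (months x stores).
    phi:         K x S array of estimated cluster-store probabilities.
    Returns a T x K array of cluster-level indices.
    """
    weights = phi / phi.sum(axis=1, keepdims=True)  # rows of phi should already sum to 1
    return store_index @ weights.T
```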

[Table 17 about here.]

Then, we choose a pricing test to define markets within these three regions. We choose Forni's (2004) stationarity test based on a process of elimination. Different tests suit different situations, depending on the data. Table 17 shows this process of elimination applied to the Coop data.

Stationarity tests come from the law of one price. The idea is that the ratio of prices of two goods in the same market should remain constant. Therefore, if this ratio is stationary, the products are in the same market. If this ratio is not stationary, the products are not in the same market.

Forni (2004) combines two tests: one that shows products are in the same market and one that shows products are not in the same market. The Augmented Dickey-Fuller (ADF) test has the null hypothesis of non-stationarity. Therefore, a rejection of the null hypothesis shows that the stores are in the same market, but failing to reject does not tell us whether they are in the same market or not. The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test has the null hypothesis of stationarity. Therefore, a rejection of the null hypothesis shows that the stores are not in the same market, but failing to reject does not tell us whether they are in the same market or not. By combining the ADF and the KPSS tests, we can test whether stores are in the same market or not.

[Table 18 about here.]

Tables 18(a)-(c) show the results of running these tests on the log of the price ratios between clusters. Following Forni (2004), we test ADF with orders of 4 and 8. We reject the ADF null if either order rejects at 10% significance. We test KPSS with Bartlett windows of eight and sixteen months. We reject the KPSS null if either window rejects at 10% significance. If we reject the ADF null and fail to reject the KPSS null, then the clusters are in the same market. We indicate this with an "S". If we reject the KPSS null and fail to reject the ADF null, then the clusters are in different markets. We indicate this with a "D". If we fail to reject both nulls, then we cannot determine whether the two clusters are in the same market. We indicate this with a "?". This only occurs once.
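
A minimal sketch of this combined classification using statsmodels; the constant-only regression and the handling of the (undiscussed) case where both tests reject are assumptions.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

def forni_classification(log_price_ratio: np.ndarray, alpha: float = 0.10) -> str:
    """Combine ADF and KPSS on a log price ratio, as described above.

    Returns 'S' (same market), 'D' (different markets), or '?' (undetermined).
    The case where both tests reject also falls into '?' here.
    """
    # Reject the ADF null of non-stationarity if either lag order rejects at 10%.
    adf_reject = any(adfuller(log_price_ratio, maxlag=p, autolag=None)[1] < alpha
                     for p in (4, 8))
    # Reject the KPSS null of stationarity if either Bartlett window rejects at 10%.
    kpss_reject = any(kpss(log_price_ratio, regression="c", nlags=w)[1] < alpha
                      for w in (8, 16))
    if adf_reject and not kpss_reject:
        return "S"
    if kpss_reject and not adf_reject:
        return "D"
    return "?"
```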

Table 18(a) shows the results between clusters in the province of Massa & Carrara and the province of Lucca. All the clusters (k1 - k3) are in the same market.

Table 18(b) shows the results between the clusters in the province of Livorno. There exist two markets: the City of Livorno (k4) and the rest of the province (k5 - k9).

Table 18(c) shows the results between the clusters in the provinces of Grosseto and Siena. The tests indicate that there potentially exist two overlapping markets. One potential market consists of k10, k11, k12, and k14. Let's call this M1. Another potential market consists of k10, k12, k13, and possibly k14. Let's call this M2. M1 concentrates in the north. M2 concentrates in the south.

M1 and M2 overlap enough to be one market. 47.6%-49.9% of M1's expenditures are in the overlapping clusters, and 68.3%-69.3% of M2's expenditures are in the overlapping clusters.

Therefore, although k11 and k13 do not directly compete, they indirectly compete through a chain of substitutes with a significant market share.

In Brown Shoe v. United States, the court accepted Joan Robinson's position that market boundaries could only be drawn at gaps in chains of substitutes (see Werden, 1992, pg. 158).

Therefore, Grosseto and Siena are one market.

Therefore, there exist four markets: (1) Massa & Carrara and Lucca (k1 - k3), (2) the City of Livorno (k4), (3) the rest of Livorno (k5 - k9), and (4) Grosseto and Siena (k10 - k14).

5. Conclusion

Market definition is a form of clustering. There are many recent advances in clustering in marketing science, data science, and machine learning. We should evaluate and adapt these advances to the problem of market definition.

In this paper, we show that interchangeability clusters are subsets of markets.

We demonstrate that you should estimate markets in two steps. First, you solve for the interchangeability clusters. Then, you estimate the markets from the clusters. Finding the clusters first makes it easier to find the markets. It reduces the number of dimensions.

Also, we show how to estimate clusters with Gibbs LDA. We show how to find the number of clusters and how to test model fit.

Also, we present strong evidence that LDA fits the Coop data. This shows that these interchangeability clusters exist in the marketplace.

In addition, we show how to interpret the results of Gibbs LDA.

In addition, we discuss some advantages of Gibbs LDA clusters over E-H clusters. Gibbs LDA is identifiable. It has better measures of model fit. And, it avoids the silent majority fallacy.

In addition, we show how to estimate geographic markets from Gibbs LDA clusters.

In the next paper, we plan to explore product-markets. Potentially, we could use Gibbs LDA to define both geographic and product markets together.

In future papers, we need to evaluate other clustering algorithms as methods for empirically defining markets. For example, we could potentially use hierarchical LDA (Griffiths et al., 2004) to find niche markets.

Also, we can extend Gibbs LDA to model pricing effects. Gibbs LDA is very flexible and easy to extend. Potentially, we can let Φ depend on price with some adjustment to (2).

Further, we need to explore the implications of probabilistic clusters. The resulting markets are not necessarily "lines in the sand".

Also, it would be useful to apply Gibbs LDA to different industries. This would help us establish better standards for its use in court.

In conclusion, this paper is the beginning of a new line of research, not an end. We hope this encourages economists to use modern clustering techniques to define markets. Businesses use more and more big data. They use clustering to segment consumers. It is becoming part of how firms think about their customers. We need to explore its potential for future antitrust cases.

References

Airoldi, Edoardo M, David M Blei, Stephen E Fienberg, and Eric P Xing, “Mixed membership stochastic blockmodels,” Journal of Machine Learning Research, 2008, 9 (Sep), 1981–2014.

Angelino, Elaine, Matthew James Johnson, Ryan P Adams et al., “Patterns of scalable bayesian inference,” Foundations and Trends in Machine Learning, 2016, 9 (2-3), 119–247.

Bandiera, Oriana, Stephen Hansen, Andrea Prat, and Raffaella Sadun, “CEO Behavior and Firm Performance,” Technical Report, National Bureau of Economic Research 2017.

Besanko, David, Sachin Gupta, and Dipak Jain, “Logit demand estimation under competitive pricing behavior. An equilibrium framework,” Management Science, 1998, 44 (11-part-1), 1533–1547.

Blei, David M., Andrew Ng, and Michael Jordan, “Latent Dirichlet allocation,” JMLR, 2003, 3, 993–1022.

Cao, Liangliang and Li Fei-Fei, “Spatially coherent latent topic model for concurrent segmentation and classification of objects and scenes,” in “Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on” IEEE 2007, pp. 1–8.

Capps, Cory S, David Dranove, Shane Greenstein, and Mark Satterthwaite, “The silent majority fallacy of the Elzinga-Hogarty criteria: a critique and new approach to analyzing hospital mergers,” Technical Report, National Bureau of Economic Research 2001.

Cartwright, Phillip A, David R Kamerschen, and Mei-Ying Huang, “Price correlation and granger causality tests for market definition,” Review of Industrial Organization, 1989, 4 (2), 79–98.

Chang, Jonathan, “Package ‘lda’: Collapsed Gibbs Sampling Methods for Topic Models [r software],” 2015. https://cran.r-project.org/web/packages/lda/ [Accessed: 2018-04-05].

Chen, Xin, Xiaohua Hu, Xiajiong Shen, and Gail Rosen, “Probabilistic topic modeling for genomic data interpretation,” in “Bioinformatics and Biomedicine (BIBM), 2010 IEEE International Conference on” IEEE 2010, pp. 149–152.

Cournot, Antoine Augustin, “Researches into the Mathematical Principles of the Theory of Wealth,” Macmillan, 1897.

Davis, Peter and Eliana Garcés, Quantitative techniques for competition and antitrust analysis, Princeton University Press, 2009.

Elzinga, Kenneth G and Thomas F Hogarty, “The Problem of Geographic Market Delineation in Antimerger Suits,” Antitrust Bull., 1973, 18, 45.

______ and ______, “The problem of geographic market delineation revisited: the case of coal,” Antitrust Bull., 1978, 23, 1.

Erosheva, Elena A, Stephen E Fienberg, and Cyrille Joutard, “Describing disability through individual-level mixture models for multivariate binary data,” The annals of applied statistics, 2007, 1 (2), 346.

Fei-Fei, Li and Pietro Perona, “A bayesian hierarchical model for learning natural scene categories,” in “Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on,” Vol. 2 IEEE 2005, pp. 524–531.

Forni, Mario, “Using stationarity tests in antitrust market definition,” American Law and Economics Review, 2004, 6 (2), 441–464.

Frech, Harry E III, James Langenfeld, and R Forrest McCluer, “Elzinga-Hogarty tests and alternative approaches for market share calculations in hospital markets,” Antitrust LJ, 2003, 71, 921.

Geyer, Charles J, “Practical markov chain monte carlo,” Statistical science, 1992, pp. 473–483.

Gilks, Walter R, Sylvia Richardson, and David Spiegelhalter, Markov chain Monte Carlo in practice, CRC press, 1995.

Glynn, Peter W and Donald L Iglehart, “Simulation output analysis using standardized time series,” Mathematics of Operations Research, 1990, 15 (1), 1–16.

______ and Ward Whitt, “Estimating the asymptotic variance with batch means,” Operations Research Letters, 1991, 10 (8), 431–435.

Griffiths, Thomas L and Mark Steyvers, “Finding scientific topics,” Proceedings of the National academy of Sciences, 2004, 101 (suppl 1), 5228–5235.

______ and ______, “Matlab Topic Modeling Toolbox 1.4,” 2011. http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm [Accessed: 2018-03-29].

______, Michael I Jordan, Joshua B Tenenbaum, and Mark Steyvers, “Hierarchical topic models and the nested chinese restaurant process,” Advances in neural information processing systems, 2004, 17-24.

Guidotti, Riccardo and Lorenzo Gabrielli, “Recognizing residents and tourists with retail data using shopping profiles,” International Conference on Smart Objects and Technologies for Social Good, Springer, Cham, 2017, pp. 353–363.

______, ______, Anna Monreale, Dino Pedreschi, and Fosca Giannotti, “Discovering temporal regularities in retail customer’s shopping behavior,” EPJ Data Science, 2018, 7 (1), 6.

Gupta, Suyog, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan, “Deep learning with limited numerical precision,” in “International Conference on Machine Learning” 2015, pp. 1737–1746.

Hausman, Jerry A, “Valuation of new goods under perfect and imperfect competition,” in “The economics of new goods,” University of Chicago Press, 1996, pp. 207–248.

Hu, Diane J, “Latent dirichlet allocation for text, images, and music,” University of California, San Diego, 2009, 26, 2013.

Ivaldi, Marc and Szabolcs Lorincz, “A full equilibrium relevant market test: application to computer servers,” 2005.

Jones, Galin L, Murali Haran, Brian S Caffo, and Ronald Neath, “Fixed-width output analysis for Markov chain Monte Carlo,” Journal of the American Statistical Association, 2006, 101 (476), 1537–1547.

Kaplow, Louis, “Market definition: Impossible and counterproductive,” Antitrust Law Journal, 2013, 79 (1), 361–379.

Lancichinetti, Andrea, M Irmak Sirer, Jane X Wang, Daniel Acuna, Konrad Körding, and Luís A Nunes Amaral, “High-reproducibility and high-accuracy method for automated topic classification,” Physical Review X, 2015, 5 (1), 011007.

Langenfeld, James and Wenqing Li, “Critical loss analysis in evaluating mergers,” The Antitrust Bulletin, 2001, 46 (2), 299–337.

Marlin, Benjamin M, “Modeling user rating profiles for collaborative filtering,” in “Advances in neural information processing systems” 2004, pp. 627–634.

Marshall, Alfred, “Principles of Economics (London, 1920),” Book V, 1920, p. 324.

MathWorks, “Matlab Text Analytics Toolbox 2018a [MATLAB software],” 2018. https://www.mathworks.com/help/textanalytics/ [Accessed: 2018-03-29].

Mimno, David, “The Dirichlet-multinomial distribution [INFO 6150 Class Handout],” 2016. Retrieved from https://mimno.infosci.cornell.edu/info6150/exercises/polya.pdf [Accessed: 2018-03-07].

Minka, Thomas and John Lafferty, “Expectation-Propagation for the Generative Aspect Model,” Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, 2002, pp. 352–359.

Niebles, Juan Carlos, Hongcheng Wang, and Li Fei-Fei, “Unsupervised learning of human action categories using spatial-temporal words,” International journal of computer vision, 2008, 79 (3), 299–318.

Phan, Xuan-Hieu and Cam-Tu Nguyen, “GibbsLDA++: A C/C++ Implementation of Latent Dirichlet Allocation [C/C++ software],” 2007. http://gibbslda.sourceforge.net/ [Accessed: 2018-04-05].

Pritchard, Jonathan K, Matthew Stephens, and Peter Donnelly, “Inference of population structure using multilocus genotype data,” Genetics, 2000, 155 (2), 945–959.

Riddell, Allen, “lda 1.0.5: Topic modeling with latent Dirichlet allocation [python software],” 2015. https://pypi.python.org/pypi/lda [Accessed: 2018-04-05].

Robert, Christian P and George Casella, “Monte Carlo statistical methods,” 2004.

Russell, Bryan C, William T Freeman, Alexei A Efros, Josef Sivic, and Andrew Zisserman, “Using multiple segmentations to discover objects and their extent in image collections,” in “Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on,” Vol. 2 IEEE 2006, pp. 1605–1614.

Sarstedt, Marko and Erik Mooi, “Cluster analysis,” in “A concise guide to market research,” Springer, 2014, pp. 273–324.

Schwarz, Carlo, “ldagibbs: A command for topic modeling in Stata using latent Dirichlet allocation,” The Stata Journal, 2018, 18 (1), 101–117.

Sivic, Josef, Bryan C Russell, Alexei A Efros, Andrew Zisserman, and William T Freeman, “Discovering objects and their location in images,” in “Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on,” Vol. 1 IEEE 2005, pp. 370–377.

Slade, Margaret E, “Exogeneity tests of market boundaries applied to petroleum products,” The Journal of Industrial Economics, 1986, pp. 291–303.

Steyvers, Mark and Tom Griffiths, “Probabilistic topic models,” Handbook of latent semantic analysis, 2007, 427 (7), 424–440.

Stigler, George J and Robert A Sherwin, “The extent of the market,” The Journal of Law and Economics, 1985, 28 (3), 555–585.

Teh, Yee-Whye, David Newman, and Max Welling, “A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation,” in “NIPS” 2006.

Unicoop Tirreno, “Il Bilancio Consuntivo 2012,” https://tinyurl.com/y4uw8vqm June 2013.

______, “Il Bilancio Consuntivo 2012,” https://tinyurl.com/y5sacvbo June 2015.

______, “Il Bilancio 2015,” https://tinyurl.com/yxc64yx2 June 2016.

United Nations. Statistical Division, Classification of Individual Consumption According to Purpose (COICOP) 2018, Vol. Series M. No. 99, United Nations Publications, 2018.

U.S. Federal Trade Commission, Improving healthcare: a dose of competition; a report by the Federal Trade Commission and Department of Justice (July 2004), with various supplementary materials, Springer, 2005.

Wallach, Hanna M, Iain Murray, Ruslan Salakhutdinov, and David Mimno, “Evaluation methods for topic models,” in “Proceedings of the 26th annual international conference on machine learning” ACM 2009, pp. 1105–1112.

Wang, Xiaogang, Xiaoxu Ma, and Eric Grimson, “Unsupervised activity perception by hierarchical bayesian models,” in “Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on” IEEE 2007, pp. 1–8.

______, ______, and ______, “Spatial latent dirichlet allocation,” in “Advances in neural information processing systems” 2008, pp. 1577–1584.

Werden, Gregory J, “The history of antitrust market delineation,” Marq. L. Rev., 1992, 76, 123.

Appendix A. Tables and Figures

Table 1: Tests for Defining Markets

Economics Market (Marshall, 1920): prices of goods in the same market tend to equality with due allowance for transportation cost.

Tests:

• Price Correlation Comparison (e.g. Nestlé–Perrier merger)

• Stationarity Tests (e.g. Stigler and Sherwin, 1985; Forni, 2004)

• Granger Causality Tests (Cartwright et al., 1989; Slade, 1986)

• Natural Experiments on Price Movements (Davis and Garcés, 2009, pg. 185-188)

Antitrust Market (1984 Merger Guidelines): the smallest area or group of goods where a hypothetical, profit-maximizing monopolist would impose a 'small but significant and nontransitory' increase in price.

Tests:

• Small but Significant and Nontransitory Increase in Price (SSNIP) test (1984 Merger Guidelines)

• Critical Loss Analysis (Langenfeld and Li, 2001)

• Full Equilibrium Relevant Market (FERM) test (e.g. Ivaldi and Lorincz, 2005)

Interchangeability Cluster (U.S. v. DuPont (Cellophane); Brown Shoe Co. v. U.S.; U.S. v. Continental Can): goods are in the same market if they are interchangeable in use.

Qualitative tests:

• Functional Substitutes (Davis and Garcés, 2009, pg. 166-167)

• Contiguity in Geographic Markets

Quantitative tests:

• Elzinga-Hogarty Test (Elzinga and Hogarty, 1973; 1978)

• Latent Dirichlet Allocation (proposed in this paper)

Figure 2: Example 1 (Corner Stores & Grocery Stores)

First, a consumer randomly draws a bag (i.e. a cluster). This random process depends on the consumer segment. Then the consumer draws a random product from the bag. In this case, the consumer segments are urbanites, suburbanites, and country folk. And, the bags are corner stores and grocery stores.

Figure 3: Example 2 (Televised Sports Programs in India)

One multiuse good exists. 'India TV Sports News' is both a cricket and a soccer program.

Table 4: Illustration of Gibbs Sampling Applied to a Small LDA Example

(a) The initial random cluster assignments

(b) The cluster assignments after 50 iterations of Gibbs Sampling

There are 20 consumer segments. Each consumer segment makes ten purchases. We draw these purchases using LDA with α=0.2. Two clusters exist: “cold weather shoes” and “warm weather shoes”. A draw from the “cold weather shoes” cluster has a 40% chance of being wool slippers, a 40% chance of being snow boots, and a 20% chance of being sneakers. A draw from the “warm weather shoes” cluster has a 20% chance of being sneakers, a 40% chance of being sandals, and a 40% chance of being flip flops. A white dot means the purchase is assigned to the first cluster (cold weather shoes). And, a black dot means the purchase is assigned to the second cluster (warm weather shoes).
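
A minimal sketch of the generative process behind Table 4's synthetic data, using the probabilities listed above; the Gibbs Sampling step that recovers the cluster assignments is not shown, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
products = ["wool slippers", "snow boots", "sneakers", "sandals", "flip flops"]
phi = np.array([[0.4, 0.4, 0.2, 0.0, 0.0],   # cold weather shoes
                [0.0, 0.0, 0.2, 0.4, 0.4]])  # warm weather shoes
alpha, n_segments, n_purchases = 0.2, 20, 10

purchases = []
for _ in range(n_segments):
    theta = rng.dirichlet([alpha, alpha])                 # segment's cluster mixture
    clusters = rng.choice(2, size=n_purchases, p=theta)   # a cluster for each purchase
    purchases.append([products[rng.choice(5, p=phi[k])] for k in clusters])
```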

Table 5: Summary Statistics of Coop Data

Left block: 2010 - 2015 (sample for price indices). Right block: Sept. - Nov. 2015 (sample for Gibbs LDA & E-H test).

Province | #Stores | Avg. Monthly #Consumers* | Avg. Monthly Revenue (mil. EUR) | #Stores | Avg. Monthly #Consumers* | Avg. Monthly Revenue (mil. EUR)
Massa & Carrara | 2 | 7,148.26 | 2.316 | 2 | 6,501.33 | 2.838
Lucca | 7 | 10,114.71 | 3.517 | 7 | 9,826.33 | 4.323
Livorno | 29 | 75,288.76 | 29.957 | 27 | 68,530.67 | 34.425
Grosseto | 32 | 32,134.51 | 10.261 | 28 | 29,231.33 | 11.948
Siena | 1 | 129.08 | 0.042 | 1 | 128.00 | 0.051
All Tuscany | 71 | 122,851.50 | 46.094 | 65 | 112,503.70 | 53.585

* = province-level numbers of consumers do not aggregate to the Tuscany-level number of consumers because some consumers shop in more than one province.

Table 6: Expenditures by Consumer’s Province and Store’s Province (Sept. – Nov. 2015)

The last five columns give the fraction of each consumer province's expenditure spent in stores in each province.

Consumer's Province | Avg. Monthly Expenditure (mil. EUR) | Massa & Carrara | Lucca | Livorno | Grosseto | Siena
Massa & Carrara | 2.874 | 98.29% | 1.66% | 0.04% | 0.01% | 0.00%
Lucca | 4.285 | 0.29% | 99.58% | 0.12% | 0.01% | 0.00%
Livorno | 33.900 | 0.00% | 0.01% | 99.31% | 0.68% | 0.00%
Grosseto | 11.744 | 0.00% | 0.00% | 0.36% | 99.63% | 0.01%
Siena | 0.057 | 0.00% | 0.00% | 0.66% | 13.45% | 85.89%
Others (Tuscany) | 0.725 | 0.00% | 0.75% | 97.89% | 1.27% | 0.09%
Total | 53.585 | 5.30% | 8.07% | 64.24% | 22.30% | 0.09%

Table 7: Subsampling Summary

Sample | Split | Number of Municipalities | Number of Transactions
Sample A – Find the Number of Clusters | 20% | 42 | 309,266
Sample B – Estimate the Clusters | 40% | 84 | 427,920
Sample C – Out of Sample Testing | 40% | 84 | 430,691
Cross-Sectional Dataset = Combined Samples A, B & C | 100% | 210 | 1,167,877

Note: all samples cover Sept. – Nov. 2015.

Figure 8: Model Fit by Number of Clusters

For each K=2, …, 40, we run Gibbs LDA for 2000 iterations on Sample A. This graph is the estimated average (per transaction) log-likelihood for every value of K.

Figure 9: Model Fit of the first 30 Iterations of Gibbs LDA on Sample B

Average log-likelihood converges within 20 iterations. We avoid transient bias by discarding the results from the first 64 iterations.

Table 10: Model Fit Statistics for Samples of Gibbs LDA

Run | Sample | Average Log-Likelihood | Standard Error
Run 1 | Sample B* | -1.447 | 3.022E-06
Run 1 | Sample C (44 iterations of B)# | -3.464 | 1.136E-02
Run 1 | Sample C (last iteration of B)! | -3.471 | --
Run 1 | Cross-Sectional Dataset* | -1.451 | 4.232E-06
Run 2 | Sample B* | -1.413 | 2.868E-06
Run 2 | Sample C (44 iterations of B)# | -3.615 | 4.208E-03
Run 2 | Sample C (last iteration of B)! | -3.622 | --
Run 2 | Cross-Sectional Dataset* | -1.475 | 4.124E-06

* = We run Gibbs LDA for 2000 iterations. We discard the results from the first 64 iterations. We estimate the average log-likelihood of each remaining iteration. We report the average of these average log-likelihoods. We estimate the standard errors using the method of batch means (see Appendix B.5.1).

# = We estimate the average log-likelihood using Wallach et al.'s (2009) Chib-style estimator for out-of-sample LDA. We use 44 estimates from Sample B: the estimate from the 108th iteration of Gibbs LDA on Sample B, the estimate from the 152nd iteration, and so on. Each time, we run the estimator for 200 iterations. The average and standard error are the average and standard error of these 44 estimates.

! = We estimate the average log-likelihood using Wallach et al.'s (2009) Chib-style estimator for out-of-sample LDA. We only use the estimate from the last iteration of Gibbs LDA on Sample B. We run the Chib-style estimator for 1000 iterations.
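
A minimal sketch of the method of batch means used for the standard errors marked with *; the number of batches here is an illustrative choice, and the paper's exact implementation is in Appendix B.5.1.

```python
import numpy as np

def batch_means_se(draws: np.ndarray, n_batches: int = 40) -> float:
    """Standard error of a chain's mean by the method of batch means.

    `draws` are the per-iteration average log-likelihoods after burn-in.
    """
    usable = len(draws) - len(draws) % n_batches   # trim so batches are equal-sized
    batch_avgs = draws[:usable].reshape(n_batches, -1).mean(axis=1)
    return float(batch_avgs.std(ddof=1) / np.sqrt(n_batches))
```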

Figure 11: Example Maps of Gibbs LDA Clusters

(a) k4: City of Livorno (N. Prov. of Livorno) [38.59 mil. EUR Expenditures]

(b) k7: Piombino (Southern Livorno) [11.84 mil. EUR Expenditures]

(c) k11: Follonica (North-Western Grosseto) [11.11 mil. EUR Expenditures]

(d) k8: Island of Elba [6.18 mil. EUR Expenditures]

(e) k10: Northern Grosseto [3.48 mil. EUR Expenditures]

(f) k3: Bagni di Lucca (Eastern Lucca) [0.35 mil. EUR Expenditures]

Note: municipality boundaries data powered by MapIt (http://mapit.openpolis.it).

We shade each municipality by expenditure in the cluster. Dark green means that residents of the municipality spent a lot in the cluster. Light green means that they spent a little in the cluster. Yellow means that we do not observe any data from that municipality.

Red stars are stores with at least 20% of the cluster’s expenditures. Purple triangles are stores with between 5% and 20% of the cluster’s expenditures. Blue dots are stores with less than 5% of the cluster’s expenditures.

These results are typical of all 14 clusters. Maps of all the clusters are available in Web Appendix E.1 ().

Table 12: Cluster Descriptions and Expenditures

Cluster | Description | Total Expenditure in mil. EUR (Sept. - Nov. 2015)

Province of Massa & Carrara & Province of Lucca:
k1 | Northern Lucca | 0.42
k2 | S. Massa & Carrara and W. Lucca | 14.89
k3 | Bagni di Lucca (Eastern Lucca) | 0.35

Province of Livorno:
k4 | City of Livorno (N. part of the Prov. of Livorno) | 38.59
k5 | North-Central Livorno | 4.28
k6 | Cecina (South-Central Livorno) | 8.11
k7 | Piombino (Southern Livorno) | 11.84
k8 | Island of Elba | 6.18
k9 | etc. Livorno | 7.88

Province of Grosseto:
k10 | Northern Grosseto | 3.48
k11 | Follonica (North-Western Grosseto) | 11.11
k12 | City of Grosseto (Central Grosseto) | 7.08
k13 | Southern Grosseto | 4.91

Province of Siena:
k14 | Western Siena | 0.51

Figure 13: Stability of Clusters between Different Runs

KL distance between each cluster of Run 1 (rows) and each cluster of Run 2 (columns):

Run 1 \ Run 2 | k10' | k2' | k9' | k3' | k13' | k8' | k12' | k4' | k5' | k11' | k6' | k14' | k1' | k7'
k10 | 0 | 8.9 | 8.7 | 9.3 | 9.1 | 9.3 | 9.7 | 10.5 | 8.3 | 10.4 | 8.4 | 11.4 | 10 | 11.1
k2 | 9.1 | 2.9 | 18.3 | 20.8 | 21.2 | 20.6 | 19.7 | 21.6 | 12 | 21.5 | 17.3 | 21.8 | 19.8 | 17
k9 | 9.7 | 17.2 | 4.6 | 21.4 | 21.9 | 21.4 | 22.1 | 22.2 | 14.5 | 15.9 | 20.8 | 13.2 | 21.9 | 14.9
k3 | 9 | 19.1 | 14.2 | 6.3 | 21.1 | 20.7 | 15.6 | 18.8 | 19.8 | 21.2 | 20.7 | 22.3 | 16.1 | 21.9
k13 | 10.6 | 19.7 | 21.1 | 21.2 | 6.7 | 21.2 | 22.1 | 22 | 13.9 | 22 | 11.3 | 22.8 | 21.7 | 22.5
k8 | 9.1 | 19.2 | 20.7 | 15.8 | 16.3 | 7.8 | 21.6 | 21.6 | 19.5 | 21.1 | 21.1 | 22.4 | 20.9 | 20.8
k12 | 9.1 | 18.8 | 18.5 | 19.1 | 17.7 | 19 | 8.5 | 21.3 | 14.5 | 21.2 | 9.6 | 22.1 | 21 | 21.7

k4

11.1

20.2

14.6

21.6

20.3

21.8

22.6

9.7

20.9

22.5

21.8