Granular Firms - Brandeis...
Granular Firms: Micro-Data on the Private Sector and Large-Scale Agent Models
Rob Axtell
Computational Social Science Program, Department of Computational and Data
Sciences
Department of Economics
Krasnow Institute for Advanced Study
George Mason University
External Professor, Santa Fe Institute
External Faculty Member, Northwestern Institute on Complexity, Northwestern
University
Outline
Background (5 mins)
Mental model calibration (5 mins)
The data (5 mins)
Agent-based computational models (5 mins)
A full-scale model (5 mins)
Computational aspects (5 mins)
The Private Sector
Most output, innovation due to firms
Most workers are employees of firms
What do we know about firms?
Many ‘competing’ theories:
Most read like philosophy not science; falsifiable?
Data on universe of firms progressively available
(U.S.)
There do not exist models that explain all these data
Common is the case study of individual firms
Goals:
Describe these data
Can a model be built that explains them?
U.S. Private Sector: Calibration
Firms: ~30 million
6 million have employees
Publicly-traded firms (~10 thousand)
Wal-Mart
~100 thousand/month
Labor: ~120 million
WORK
WAGES
30 million in the public sector
~2 million/month
~2 million/month
~3 million/month
Unemployed
Out of labor force
Job-to-job
MICRO-DATA ON ALL U.S. FIRMS FROM TAX RECORDS
➤ Firm sizes by employees (input), receipts (output), market capitalization (public), plant and equipment, depreciation,…
➤ Firm ages and, when they go out of business, firm lifetimes
➤ Firm sizes conditional on age and firm ages by size
➤ Firm productivities (output per unit of input), by size and age
➤ Firm growth rates (size changes per unit time), annually and over longer periods; growth conditional on firm size and age
➤ Firm employment by size, age, productivity, growth rates,…
➤ Employment tenure, by size, age, productivity, growth rates,…
➤ Employment networks from employer-employee matched data
The Data
[Figure grid: panel titles]
Firm entry, exit; Number of firms
Size dist (workers); Size dist (output); Returns to scale; Productivity dist
Age distribution; Survival vs age; Avg size vs age; Avg age vs size; Joint dist of size, age
Avg growth rate; Growth rate dist; Avg growth vs size; Avg growth vs age; Growth vs size, age; Growth var vs size
Largest firm; Employment by age; Income dist; Job tenure dist; Hiring + separation
Job to job flows; Labor flow network; LFN degree dist; LFN edge weight dist; LFN disassortativity; LFN clustering coeff
Productivity by size; Income by firm size; Firm lifetimes
Number of Firms/Avg Size
SOURCE: CENSUS
Number of Entrants…
…and Exits
Firm Sizes
“U.S. Firm Sizes are Zipf Distributed,” RL Axtell, Science, 293 (Sept 7, 2001), pp. 1818-20
$\Pr[S \ge s_i] = 1 - F(s_i) = s_i^{-a}$
Average firm size ~ 20
Median ~ 3-4
Mode = 1
Source: Census
Plot annotations: slope -2; ~ZIPF DISTRIBUTION
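As a rough illustration of the tail formula on this slide, the sketch below draws synthetic sizes from a continuous law with Pr[S ≥ s] = s^(-a) for s ≥ 1 and recovers the exponent with the standard maximum-likelihood estimator. The value a = 1 (the Zipf case), the sample size, and the function names are assumptions chosen for illustration; the data are synthetic, not the Census data shown on the slide. With a = 1 the density falls off as s^(-2), which would match the -2 annotation if that label is the log-log slope of the density.

```python
import math
import random

def sample_sizes(n, a, rng):
    """Inverse-transform sampling: U in (0, 1] gives S = U**(-1/a) with Pr[S >= s] = s**(-a)."""
    return [(1.0 - rng.random()) ** (-1.0 / a) for _ in range(n)]

def mle_tail_exponent(sizes):
    """Maximum-likelihood estimate of a for a Pareto law with minimum size 1."""
    return len(sizes) / sum(math.log(s) for s in sizes)

rng = random.Random(0)                      # fixed seed for reproducibility
sizes = sample_sizes(200_000, 1.0, rng)     # a = 1.0 is an assumed, illustrative value
a_hat = mle_tail_exponent(sizes)
median_size = sorted(sizes)[len(sizes) // 2]
print(f"estimated exponent: {a_hat:.3f}, median size: {median_size:.2f}")
```

Real firm sizes are integers, so summary statistics from this continuous sketch are not expected to reproduce the mean, median, and mode quoted on the slide.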
Stationary U.S. Firm Sizes
Source: Census
2000-2012
Individual firms move up and down the distribution over time
Firm Sizes in France
SOURCE: GARICANO, LELARGE AND VAN REENEN, 2013
broken power law
Size of the Largest Firm
Source: Luttmer [2011]
Labor Productivity
BIG FIRMS DO NOT HAVE THE HIGHEST PRODUCTIVITY!
IMPLICATIONS: PROBLEMATICAL FOR THE NEW NEW TRADE THEORY
FIRM SIZE, S:
1 < S < 100
100 < S < 10,000
10,000 < S < 1,000,000
Firm Ages are Stationary
Source: Census
WEIBULL DISTRIBUTION
Survival Probability
Source: Census
YOUNG FIRMS HAVE A HIGHER FAILURE RATE
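The slides attach the Weibull label to firm ages and note that young firms fail more often. As a hedged illustration only: if firm lifetimes followed a Weibull law with shape parameter below 1, the failure (hazard) rate would decline with age. The shape 0.7 and scale 10 (years) below are made-up values, not estimates from the Census data, and the function names are hypothetical.

```python
import math

def weibull_survival(t, shape, scale):
    """Probability that a firm is still alive at age t."""
    return math.exp(-((t / scale) ** shape))

def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate at age t (for t > 0)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

SHAPE, SCALE = 0.7, 10.0   # assumed values for illustration only
ages = (1, 5, 10, 20)
hazards = [weibull_hazard(t, SHAPE, SCALE) for t in ages]
survival = [weibull_survival(t, SHAPE, SCALE) for t in ages]
for t, h, s in zip(ages, hazards, survival):
    print(f"age {t:>2}: hazard {h:.4f}, survival {s:.3f}")
```

With a shape parameter above 1 the same functions would give a hazard that rises with age, so the sign of (shape - 1) is what carries the "young firms fail more" pattern in this sketch.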
Firm Growth: Subbotin dist
Source: Census and SBA; Perline, Axtell and Teitelbaum [2006]
Panels: ANNUAL; 5 YEARS
More variance in separations
Davis, Haltiwanger and Schuh [1996]
LESS HEAVY-TAILED OVER TIME
Job Tenure
Source: BLS
EXPONENTIAL DISTRIBUTION
U.S. Wages are Stationary
Source: Yakovenko
Buyer-Supplier Networks
Source: Atalay, Hortaçsu, Roberts and Syverson
Labor Flow Network (dissertation of Omar Guerrero)
Data Summary
Approximately stationary distributions of:
Firm sizes (by many measures)
Firm productivity, by size
Firm ages, survival probabilities, etc., by size
Firm growth rates, by size and age
Job tenure, by size and age, and growth…
Wages, by size and age and growth and tenure…
Networks…
Gross regularities any theory needs to hit…
Theories
Coase (1930s): Why do firms exist at all?
Why did GM buy Fisher Body (Milwaukee)?
Berle and Means (1930s): corporate structure
Business schools: case study approach
Data on public firms (1950s): stochastic growth
Organizational models (1960s and beyond)
Theories developed in the pre-micro-data era are
not easily brought to bear on such data
Models
How to Create a Model Grounded in the Data?
What parts of the conventional theory of the firm
can be brought to bear on the data?
Machine learning approach?
Stochastic growth approach?
ABM approach: large numbers of interacting
agents
‘Growing’ the Firm Population
Start with 120M workers, emerge:
6 million firms (with employees)
3 million job changers each month
100 thousand start-ups each month
20 thousand largest firms employ 1/2 of workers
1 firm with one million employees
What microeconomic specification can reproduce
these and other empirical facts?
Methodology: Computational agents
How to realize 10^8 agents?
Model components
• Heterogeneous agents, otherwise no groups or
identical groups
• In order to get large firms to form we need increasing
returns to size/scale (team production)
• Agents adjust their behavior to one another
• Agents cannot be rational because the environment is
too complex, so boundedly rational
• Compensation system: rules for dividing team output
• Each agent has a social network from which it learns
about jobs
Model: Team Production
Consider a group of N agents, each of whom
supplies input (‘effort’)
Total effort level:
Total output:
Each agent receives compensation
proportional to input and a share of output:
Agents have Cobb-Douglas preferences for
income (output shares) and leisure,
Agents periodically seek utility
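The formulas for this slide did not survive in the transcript, so the following is only a minimal sketch under stated assumptions, not the author's specification. It assumes output O(E) = aE + bE^beta with beta > 1 (increasing returns), income that mixes a share proportional to input with an equal share of output (the mixing weight `f` is hypothetical), and Cobb-Douglas utility income^theta × leisure^(1 - theta) with a unit time endowment. All parameter values and function names are illustrative.

```python
def output(total_effort, a=1.0, b=1.0, beta=2.0):
    """Assumed team output with increasing returns (beta > 1)."""
    return a * total_effort + b * total_effort ** beta

def income(i, efforts, f=0.0):
    """Mix of an input-proportional share (weight f) and an equal share (weight 1 - f)."""
    total = sum(efforts)
    out = output(total)
    proportional = out * efforts[i] / total if total > 0 else 0.0
    return f * proportional + (1.0 - f) * out / len(efforts)

def utility(i, efforts, theta, endowment=1.0, f=0.0):
    """Cobb-Douglas preferences over income and leisure."""
    leisure = endowment - efforts[i]
    return income(i, efforts, f) ** theta * leisure ** (1.0 - theta)

def best_effort(i, efforts, theta, endowment=1.0, f=0.0, steps=1000):
    """Grid search for agent i's utility-maximizing effort, holding others fixed."""
    best_e, best_u = 0.0, -1.0
    for k in range(steps + 1):
        e = endowment * k / steps
        trial = efforts[:i] + [e] + efforts[i + 1:]
        u = utility(i, trial, theta, endowment, f)
        if u > best_u:
            best_e, best_u = e, u
    return best_e

solo = best_effort(0, [0.0], theta=0.5)          # one-person firm
in_team = best_effort(0, [0.5] * 10, theta=0.5)  # ten-person firm, equal shares
print(f"solo effort: {solo:.3f}, effort inside a team of 10: {in_team:.3f}")
```

Under these assumed forms with equal shares, the agent works when alone but chooses zero effort inside a ten-person team whose other members each supply 0.5, which is the free-riding incentive in its starkest form.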
Analytical Results
Nash equilibria always exist and are unique
Agents under-supply effort at Nash equilibrium (Holmström)
Nash equilibrium is dynamically unstable for sufficiently large groups
Pseudo-code
For all agents:
    Consider staying in current job; how hard to work?
    Consider a few other firms
    Consider doing a start-up
    Do option w/highest reward
For all firms:
    Produce output
    Pay employees
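The pseudo-code above can be turned into a small runnable sketch. Everything specific here is an assumption for illustration: the output function, equal sharing of output, unit time endowment, choosing candidate firms by sampling two random agents, and all names and parameter values. It is a toy with 13 agents, not the full-scale model.

```python
import random

def output(total_effort, a=1.0, b=1.0, beta=2.0):
    """Assumed increasing-returns team output (hypothetical functional form)."""
    return a * total_effort + b * total_effort ** beta

def best_response(others_effort, n_members, theta, steps=50):
    """Grid-search effort in [0, 1]; returns (utility, effort) under equal shares."""
    best = (-1.0, 0.0)
    for k in range(steps + 1):
        e = k / steps
        inc = output(others_effort + e) / n_members
        best = max(best, (inc ** theta * (1.0 - e) ** (1.0 - theta), e))
    return best

def firm_totals(firm_of, effort):
    """Map firm id -> [total effort, head count]."""
    totals = {}
    for i, f in firm_of.items():
        t = totals.setdefault(f, [0.0, 0])
        t[0] += effort[i]
        t[1] += 1
    return totals

def agent_phase(firm_of, effort, theta, rng, n_probe=2):
    """For all agents: compare staying, a few other firms, and a start-up."""
    totals = firm_totals(firm_of, effort)
    next_id = max(totals) + 1                 # id reserved for a start-up
    order = list(firm_of)
    rng.shuffle(order)
    for i in order:
        home = firm_of[i]
        totals[home][0] -= effort[i]          # take agent i out of its firm
        totals[home][1] -= 1
        options = {home, next_id}
        options.update(firm_of[j] for j in rng.sample(order, n_probe))
        best = None
        for f in sorted(options):
            others, size = totals.get(f, (0.0, 0))
            u, e = best_response(max(others, 0.0), size + 1, theta[i])
            if best is None or u > best[0]:
                best = (u, e, f)
        _, effort[i], firm_of[i] = best       # do option with highest reward
        t = totals.setdefault(firm_of[i], [0.0, 0])
        t[0] += effort[i]
        t[1] += 1
        if firm_of[i] == next_id:
            next_id += 1

def firm_phase(firm_of, effort):
    """For all firms: produce output and pay each employee an equal share."""
    totals = firm_totals(firm_of, effort)
    return {i: output(totals[f][0]) / totals[f][1] for i, f in firm_of.items()}

rng = random.Random(1)
agents = range(13)
firm_of = {i: rng.randrange(5) for i in agents}        # 13 agents, up to 5 firms
effort = {i: 0.5 for i in agents}
theta = {i: rng.uniform(0.1, 0.9) for i in agents}     # heterogeneous preferences
for _ in range(20):
    agent_phase(firm_of, effort, theta, rng)
    wages = firm_phase(firm_of, effort)
print(sorted(t[1] for t in firm_totals(firm_of, effort).values()))
```

The printed list gives the firm sizes after 20 periods; the number of agents is conserved while the set of firms changes from period to period.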
Basic Idea
t t+1
5 FIRMS
13 AGENTS
5 DIFFERENT FIRMS
13 AGENTS (CONSERVED)
Model dynamics with 1000 agents
Base Parameterization
Size of the U.S. private sector
Results
[Results grid: panel titles]
Number of firms; Average firm size; Size dist (workers); Size dist (output); Returns to scale; Productivity dist
Average firm age; Age distribution; Survival vs age; Avg size vs age; Avg age vs size; Joint dist of size, age
Avg growth rate; Growth rate dist; Avg growth vs size; Avg growth vs age; Growth vs size, age; Growth var vs size
Firm entry, exit; Largest firm; Employment by age; Income dist; Job tenure dist; Hiring + separation
Job to job flows; Labor flow network; LFN degree dist; LFN edge weight dist; LFN assortativity; LFN clustering coeff
[Results grid: model values overlaid on the panels, in extraction order]
6 million; 20 workers; Zipf; Zipf; constant; heavy-tailed
Weibull; increasing linearly
increasing
Laplace; Subbotin; decreasing; decreasing; increasing; 1/6 law
1 million; 100K/mo
3 million/mo; power law; power law; complex
exponential; simultaneous; exponential
disassortative
linear inc
log(size)
exponential
hierarchical
Pareto
exponential; 15 years
Realized Number of Firms
TOTAL
ENTRANTS
EXITS
avg firm size ~ 20 =
Realized Firm Size Distributions
Plot annotation: slope -2
Firm Size Statistics
Realized Productivity
SMALL FIRMS
MEDIUM FIRMS
LARGE FIRMS
Realized Firm Ages
Realized Firm Survival
Realized Firm Growth
KEY (firm size bins): 8-15; 16-31; 32-63; 64-127; 128-255; 256-511; 512-1023
Realized Job Tenure
US data
Model output
Realizing 10^8 agents
Needed: 500 bytes/agent => 60 GB, 1 KB/firm => 6 GB
What doesn’t work:
hardware: vector/cluster HPC, multiple boxes, clouds;
software: MPI; Java threads; ‘big data’ languages: Hadoop,
Go, Scala, Erlang, Clojure, Haskell...
What is needed:
large ‘flat’ memory space, OS to address it (Unix)
lots of processors (many cores/processor)
low level language (C/C++, OpenMP, Intel TBB, some GPUs)
2015 $US: $10K for 256 GB RAM, $25K for 1000 GB RAM
What this gets you:
compute time of 12-24 hours to remove transient
output ‘data’ directly comparable to real-world data
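The memory figures on this slide follow from simple arithmetic, checked in the sketch below. It assumes the full-scale population sizes used elsewhere in the deck (120 million workers, 6 million firms) and decimal units (1 KB = 1000 bytes, 1 GB = 10^9 bytes); the variable names are illustrative.

```python
N_AGENTS = 120_000_000    # workers in the full-scale model
N_FIRMS = 6_000_000       # firms with employees
BYTES_PER_AGENT = 500
BYTES_PER_FIRM = 1_000    # 1 KB, taken as 1000 bytes (assumption)
GB = 10**9                # decimal gigabyte (assumption)

agent_gb = N_AGENTS * BYTES_PER_AGENT / GB
firm_gb = N_FIRMS * BYTES_PER_FIRM / GB
total_gb = agent_gb + firm_gb
print(f"agents: {agent_gb:.0f} GB, firms: {firm_gb:.0f} GB, total: {total_gb:.0f} GB")
```

The combined state of roughly 66 GB fits within the 256 GB configuration priced on the slide.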
Fork/Join Parallelization of Agent Codes
Population of agents or firms
‘Fork’ it up into pieces, each to execute on a single core
Let it run 1 month
Join to compute statistics and do housekeeping
Standard paradigm in C/C++ (pthreads)
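The fork/join pattern described on this slide can be sketched structurally as follows. This is not the C/C++ pthreads implementation the slide refers to: it swaps in Python's `concurrent.futures.ThreadPoolExecutor` purely to show the shape of the pattern, and Python threads will not deliver the multi-core speedup of native threads. The per-agent update (adding one month of tenure) and the function names are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate_chunk(chunk):
    """Placeholder monthly update: advance each agent's tenure by one month."""
    return [tenure + 1 for tenure in chunk]

def fork_join_step(population, n_workers=4):
    """Fork the population into chunks, run them concurrently, then join."""
    if not population:
        raise ValueError("population must be non-empty")
    size = -(-len(population) // n_workers)   # ceiling division
    chunks = [population[k:k + size] for k in range(0, len(population), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(simulate_chunk, chunks))   # fork; map waits for all
    merged = [x for part in parts for x in part]         # join
    mean_tenure = sum(merged) / len(merged)              # join-phase statistics
    return merged, mean_tenure

population = list(range(10))                  # toy tenures, in months
updated, mean_tenure = fork_join_step(population, n_workers=3)
print(updated, mean_tenure)
```

Because `pool.map` returns results in input order, the merged population keeps the same agent ordering as before the fork, which keeps the join-phase bookkeeping simple.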
Summary
New data guides model building:
Older theories are insufficiently quantitative
We have more data than we can explain presently
Built and calibrated a full scale model of the U.S.
private sector (120 million workers, 6 million firms)
Endogenous dynamics: realistic job flows + firm
formation for micro reasons; no exogenous shocks
Microeconomic level is not in (Nash) equilibrium
Macro-level is approximately stationary
Possible Implications
IO courses focused on purely strategic (game-theoretic)
considerations seem archaic vis-a-vis
comprehensive micro-data
Could a model of the U.S. private sector serve as
the basis for the production side of
macroeconomics?
Can machine learning be used to develop
alternative models that also fit the data?
Can such models be developed to have
economic interpretations/meaning?
Why a Full-Scale Model?
Comprehensive data (administratively complete,
every firm) are increasingly available…
Models at reduced scale require rescaling of the
results in order to compare to the data
We have enough going on already, let’s simplify by
eliminating scale effects!
Links to the reigning computational zeitgeist: ‘whole
cell’ simulation, brain from ‘every last neuron,’
turbulence via CFD, climate from GCMs
Why not: Computing challenges are very real…