Estimating Risk and Return of Infrequently-Traded Assets ...web.math.ku.dk/~rolf/MS_211207.pdf ·...
Transcript of Estimating Risk and Return of Infrequently-Traded Assets ...web.math.ku.dk/~rolf/MS_211207.pdf ·...
1
Preliminary and Incomplete
Comments Welcome
Estimating Risk and Return of Infrequently-Traded Assets: A
Bayesian Selection Model of Venture Capital
Arthur Korteweg
Morten Sorensen†
October, 2007
Abstract: When estimating risk and return of infrequently traded assets, a bias arises due
to heteroscedasticity and sample selection. We present a Bayesian model to resolve these
problems, and estimate the model using data with VCs’ investments in entrepreneurial
companies. Correcting for the bias, we find that the intercept of the market model is
significantly lower, and both systematic and idiosyncratic risks are significantly higher,
than suggested by standard methods. The results are robust across specifications, and the
estimates of the selection equation provide new insights into determinants of refinancing
decisions for these companies.
We present a new empirical model of the risk and return of infrequently traded assets and
apply it to investments in entrepreneurial companies by venture capital (VC) investors. A
† Stanford University GSB and University of Chicago GSB and NBER. We are grateful to John Cochrane for very helpful discussions and suggestions and to Susan Woodward of Sand Hill Econometrics for generous access to data and the Center for Research in Security Prices (CRSP) and the Kauffman Foundation for financial support.
2
number of assets, such as privately held companies, real estate, corporate bonds, some
collateralized bonds and other OTC securities are only infrequently traded. Since assets’
market values are only known when they trade, data with these assets’ valuations and
returns are necessarily sporadic. When the current value reflects an earlier trade, this
leads to “stale-price” problem (Scholes and Williams (1977) and Dimson (1979)). In
addition, when the timing of the trades, and hence the timing of the observed valuations
and returns, is endogenous, a potentially important sample selection problem arises (see
Woodward (2004) and Cochrane (2005)). Focusing on the latter problem, we present a
dynamic sample selection model and use it to estimate the risk and return of VC
investments.
Our main findings are that, correcting for the selection bias, the investments are
more risky than previously found. Our estimates of β range from 2.6 to 3.0. Estimates of
the Fama-French three-factor model result in loadings on the SMB factor of 0.9 to 1.1
and loadings on the HML of -1.7 to -2.0. Not surprisingly, entrepreneurial companies
behave much like small growth companies. Moreover, we find that the intercepts in the
market model decline substantially after correction for the selection.
Our model extends previous models by Woodward (2004) and Cochrane (2005).
We find that it is more flexible and numerically tractable, and this allows us to impose
additional restrictions implied by the selection model and to estimate a more flexible
specification of the selection equation. We estimate industry-specific risk and returns,
and we find substantial variation across the major industry groups. Further, we estimate a
3
Fama-French three-factor specification and, perhaps not surprisingly, find that the returns
of entrepreneurial companies load heavily on the size (small) and M/B (growth) factors.
The empirical model combines a dynamic asset pricing model with a selection
model. We assume that the valuations of entrepreneurial companies develop according to
a standard CAPM or three-factor log-normal model. However, most of these valuations
unobserved and must be treated as latent variables in the model. Only when a company
receives a new round of financing, goes public, or is acquired is its valuation observed,
and to account for the potential endogeneity of these events, we add a selection equation
to the model. For each company, at each point in time, this equation specifies the
probability of observing the company’s valuation as a function of the valuation itself, the
time since last refinancing, and general market conditions.
Estimation of the model is numerically difficult. For each company, at each point
in time, the selection equation defines a latent selection variable, which interacts with the
latent valuation variables and the other variables in the model. To estimate the model, it
is necessary to account for the entire joint distribution of the latent selection and
valuations variables, and evaluation of the likelihood function requires simultaneously
integrating all these variables, which is practically infeasible due to the “curse of
dimensionality.” To overcome this numerical problem, we estimate the model using a
Bayesian methodology, based on a Markov Chain Monte Carlo (MCMC) method known
as Gibbs sampling. The model is constructed such that the problem of simulating the
parameters’ posterior distribution can be separated into three smaller interrelated
problems: a Bayesian regression, a draw of truncated random variables, and a Kalman
4
Filtering problem. Each of these problems is well understood and numerically easy, and
the posterior distribution can be simulated by solving these smaller problems in an
iterative way specified by the Gibbs sampling procedure.
A. Previous Literature
Our analysis is closely related to Woodward (2004) and Cochrane (2005). Like
them, we develop a statistical selection model for the valuations of the entrepreneurial
companies, but we extend the models in important ways. Unlike Woodward (2004), we
explicitly model the path of the unobserved market values and impose all restrictions
implied by the selection model. Similarly to the standard Heckman (1979) sample
selection model, the selection model specifies that a valuation is only observed when a
selection variable exceeds a certain threshold. Conversely, the valuation is unobserved
when the valuation is below this threshold. However, in contrast to Heckman (1979),
observations with unobserved outcomes contain important information about the outcome
or valuation variables, because the time-series nature of the data introduces serial
correlations in the values. In other words, in the standard selection model the
observations with unobserved outcomes are used to estimate the first stage but not the
second stage. Here, the serial correlation in valuations means that the restrictions on
valuations implied by observations where valuations are unobserved must also be
imposed in the second stage.
Compared to Cochrane (2005), our model is numerically more tractable, and we
are able to estimate more flexible specifications. In particular, Cochrane (2005) assumes
that the probability of observing a valuation through a refinancing or exit event is a
5
function of the valuation alone. While the valuation is clearly an important determinant of
these events, this specification may be overly parsimonious. The specification implies
that a company with high valuations should refinance each single period, which does not
happen in practice. By including the time since last financing round as an additional
variable in the selection equation, we capture the infrequent nature of these events.
Further, we include variables with the aggregate investment activity of VCs in the
market. These variables are related to the probability of an entrepreneurial company
receiving refinancing but independent of the company’s value (conditional on the market
return), and provide exogenous variation in the selection equation. It is well known that
such exogenous variation helps the statistical identification of selection model generally.
In addition, Ljungqvist and Richardson (2003), Kaplan and Schoar (2005),
Driessen, Lin and Phallippou (2007), and Phalippou and Zollo (2006) estimate various
aspects of the risk and return of investments in entrepreneurial companies from the cash
flows between VCs and their limited partners (LP), sometimes summarized in the funds’
IRRs. These flows capture the entire net return that VCs earn on their investors (net of
fees), but we find that there are some advantages to estimating the risk and return from
the market values of the individual portfolio companies instead. First, we have more
observations. Clearly, there are more individual entrepreneurial companies than VC
funds, and the total return earned by a fund necessarily reflects an average return earned
across a portfolio of companies over a long period of time. It is not possible to attribute
this return to a particular time periods, which makes it difficult to calculate betas and
exposure to risk factors more generally, and it is hard to calculate, for example, industry
specific effects.
6
Second, and perhaps more importantly, the estimation of risk and return from cash
flows alone faces an identification problem. To illustrate this problem, consider two VCs,
both with an initial endowment of $1 and both investing for two periods, after which their
entire capital is paid out. Their returns are given by the CAPM model, the market returns
are 10% and -11.2% in the two periods, and both VCs earn zero α. VC One has a β of
one and pays out 10% of his total capital after the first period. Hence, the two cash flows
observed for this investor are (1 + 10%) x 10% = 1.1 and (1 + 10%) (1 - 10%) (1 -
11.2%) = 0.879. VC Two has a β of negative one and pays out 12.2% of her total capital
after the first period. Hence, the observed cash flows for VC Two are also (1 - 10%) x
12.2% = 1.1 and (1 - 10%) (1 - 12.2%) (1 + 11%) = 0.879. The identification problem is
obvious. The two investors have opposite betas, yet return exactly the same cash flows to
their investors. When these are the observed cash flows, it is not possible to determine
whether the investor has a positive or a negative β. If α is allowed to differ from zero, this
identification problem becomes even more problematic. To our knowledge, this
identification problem has not been formally solved, and the risk and return calculations
that are based on observed cash flows (or IRRs) implicitly rely on some (unspecified)
combination of functional form and distributional assumptions, assumptions about funds’
payout ratios, and assumptions about parameters being equal across funds and time
periods. Obviously, when the estimates are based directly on market values, instead of
cash flows, this identification problem disappears.
The paper proceeds as follows. Section one presents the econometric model.
Section two presents the data. Section three contains a discussion of the empirical results,
and Section four concludes.
7
I. Econometric Model
A. The Valuation Equation
Let the market value of company i be denoted Vi and assume that this value develops
according to a continuous-time log-normal market model (the three-factor models arises
from a standard extension of this specification).
α β⎛ ⎞
− = + − +⎜ ⎟⎝ ⎠
i mi im i
i m
dV dPrdt rdt dw
V P (1)
Here Pm is the value of the market portfolio, r is the risk-free rate, and wi is a standard
Brownian motion with volatility σ 2w . Starting from this continuous-time CAPM allows us
to define the model’s parameters independently of the timing between the observed
valuations. This is important when comparing the MCMC estimator to the OLS and GLS
estimators below. An additional advantage is that this specification implies that the
discrete-time evolution of the log-valuations is linear, which allows us to draw on
standard techniques from Kalman Filtering. The non-linear Kalman Filter presents
substantially greater challenges.
To derive the discrete-time change in the valuation, we use Ito’s lemma to write
σ α β σ⎛ ⎞⎛ ⎞ ⎛ ⎞+ − = + + − +⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
2 21 1ln ln
2 2i i i im m m id V r dt dt d P r dt dw (2)
8
where σ 2i and σ 2
m are the instantaneous variances of iV and mP , respectively, and the
relationship between the variances is σ σ β σ= +2 2 2 2i w im m . The discrete-time distribution of
the return until time t is
σ α β β σ⎛ ⎞ ⎛ ⎞+ − = + + − +⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠ ∫2 2
0
( ) ( )1 1ln ln
(0) 2 (0) 2
ti m
i i im im m ii m
V t P tr t t r t dw
V P (3)
Rearranging leads to
( )α σ β β σ β ε⎛ ⎞⎛ ⎞− = − + − + − +⎜ ⎟⎜ ⎟
⎝ ⎠ ⎝ ⎠2 2( ) ( )1 1
ln 1 ln(0) 2 2 (0)i m
i w im im m im ii m
V t P trt t rt
V P (4)
Define = ( ) / (0)m m mr P t P and =fr rt . Let the intercept of the model be denoted
( )α α σ β β σ= − + −2 21 11
2 2i i w im im m , and let ( )εε σ 2~ 0,i N for εσ σ=2 2wt . Then αi is the
one-period intercept of the log-market model. While this is not the abnormal return, this
intercept is useful for comparing estimates arising from different statistical models. We
can now write the one-period valuation equation for the econometric model as
( )( )α β ε+ = + + + − +ln ( 1) ln ( ) lni i f i im m f iV t V t r r r (5)
B. The Selection Equation
Most valuations are unobserved and are latent variables in the model. A valuation
is only observed when the company experiences a refinancing or exit event, and we use
the following selection equation to account for the endogeneity of these events. For
company i, at time t, define the selection variable ( )iw t as
9
γ η′= +( ) ( ) ( )i i iw t X t t (6)
Here, Xi(t) is a vector of characteristics (including a constant term) which affects whether
a refinancing or exit event happens. One of these characteristics is typically the asset’s
own valuation or the return earned, either in total or since the previous financing round.
This is natural, since more successful companies are more likely to receive additional
financing, be acquired, or go public. Other characteristics are the time since the previous
financing round and variables capturing the general market conditions. The error term
ηη σ 2( ) ~ (0, )i t N is assumed i.i.d. and captures unobserved shocks. The valuation ( )iV t is
observed when ≥( ) 0iw t , and we define
[ ]= ≥( ) 1 ( ) 0i id t w t (7)
such that ( )id t equals one when a valuation is observed and is zero otherwise. Overall,
the model assumes that the population distribution of ( )( ), ( ), ( ), ( ) ( ), ( )i m f i i iX t r t r t d t V t d t is
observed in the data.
C. Gibbs Sampler
To estimate the model using Gibbs sampling (Geman and Geman (1984) and
Gelfand and Smith (1990)), the variables are divided into three blocks: The first block
contains the parameters, ( )θ α β γ σ= 2, , , . The second block contains the latent selection
variables, ( )iw t . And the third block contains the latent valuations, ( )iV t . The Gibbs
sampler simulates the joint posterior distribution of all these variables by iteratively
simulating the variables in each block conditional on the previous values of the variables
10
in the other blocks. Note that the resulting posterior distribution, is not just the posterior
distribution over the parameters in the model, but the distribution is augmented with the
latent selection and valuation variables. This augmentation technique is introduced by
Tanner and Wong (1987) and significantly improves the numerical tractability of the
model. The model simulated using 500 iterations for burn-in followed by 500 iterations
for the actual estimation. The simulation of the variables in each of the three blocks is
described next.
First, the parameters are sampled using a Bayesian regression. The parameters of
the valuation equation are estimated by a Bayesian regression, in which all the valuations
(both observed and latent) are regressed on the returns to the market and a constant term.
It is well know that, conditional on the variance, the posterior distribution of these
parameters is the Normal distribution
( )2( , | ) ~ ,Nα β σ μ Σ (8)
and the distribution of the variance is the Inverse Gamma distribution
2 ~ ( , )IG a bσ (9)
The parameters in the Normal distribution are
( )
( )
1 20 0
11 20
' /
' /
Z Y
Z Z
μ μ σ
σ
−
−−
= Σ Σ +
Σ = Σ + (10)
where the vector Y contains the monthly excess returns (in logs) computed from the latent
valuations ( )iV t . The matrix Z contains a constant term and the monthly excess market
11
returns (in logs), corresponding to the excess returns. The parameters in the Inverse
Gamma distribution are
0
0
12'2
ii
a a T
e eb b
= +
= +
∑ (11)
where Ti is the number of months between the first and last observation for firm i, and a0
and b0 represent the prior distribution for 2σ , which is 0 0( , )IG a b . The vector e contains
the stacked idiosyncratic returns, ( )i tε ,
For the selection equation, the posterior distribution of the parameters is found by
a Bayesian regression of the selection variables on the exogenous variables. The scale of
this equation is not identified, and it is normalized by setting the variance equal to one.
As a result, the posterior distribution of γ is simply
~ ( , )Nγ θ Ω (12)
where
( )( )
110
10 0
'
'
X X
X Wθ θ
−−
−
Ω = Ω +
= Ω Ω + (13)
The matrix X contains Xi(t) stacked, and W is the stacked vector ( )iw t . The prior
distribution of the parameters, is a Normal distribution with covariance matrix Ω0 and
mean θ0 .
12
In the second block, the selection variables are sampled conditional on the
previous values of the valuation variables and parameters. The conditional distribution of
the selection variables follows directly from equations (5) to (7).
( )( ) ~ ( ) ,1i iw t TN X t γ′ (14)
where TN denotes a truncated Normal distribution. The truncation is at zero from above
when the corresponding valuation is unobserved and at zero from below when it is
observed.
In the final block, the latent valuation variables are simulated. This block is the
most complex part of the estimation procedure. After conditioning on the parameters and
the selection variables, the evolution of the valuation variables corresponds to a Kalman
Filter, which filters out the unobserved valuations using the observed valuations and the
restrictions imposed by the selection equation. To simulate the conditional posterior
distribution of these valuations, we rely on the Forward Filtering Backwards Sampling
(FFBS) procedure by Carter and Kohn (1994). This procedure allows us to draw a sample
path of all a company’s valuations from the posterior distribution of this path implied by
the observed valuations and the selection equation. In the terminology of the Kalman
Filter, the latent valuation variables are state variables, and the transition rule is the
valuation equation
( ) ( ) ( )ln ( 1) ln ( ) ( ) ( )i i i f m f iV t V t r r t r tα β ε+ = + + + − + (15)
13
In the filtering terminology, iα and ( )β −( )m fr t r are observed controls, acting on the
state. Depending on whether the valuation is observed or not, the system has one or two
observation equations. Since we condition on the selection variables, these are always
“observed,” and they are given by the equation
γ η′= +( ) ( ) ( )i i iw t X t t (16)
When a valuation is observed, this observed valuation is modeled using a second
observation equation
( ) ( )= *ln ( ) ln ( )i iV t V t (17)
Here *iV is the observed valuation, iV is the valuation in the Kalman Filter. These two
valuations are equal, since valuations are assumed observed without error.
D. Prior Distributions
We use diffuse priors for the model parameters. The prior mean of α is zero and
we use the GLS estimate of β as its prior mean. The prior variance of both α and β are
chosen to be equal to 1000 times their estimated GLS variance (which is typically around
.008). The prior distribution of 2σ is an Inverse Gamma distribution with parameters 2.1
and 600. This distribution implies that ( ) 0.12E σ = and σ is between 1 and 50 percent
per month with 99% probability.
For the parameters in the selection equation, we take the prior distributions to be
Normal with prior variances of 100. The intercept has a prior mean of -1, and the prior
14
mean of the log valuations is 1. The prior mean coefficient on time since last refinancing
and its square are 1 and -0.1. All other coefficients have means of zero. The prior
distributions are diffuse relative to the resulting posterior distributions, and the estimates
are largely robust to changes in the means and variances of the prior distributions, and the
estimated coefficients reflect information in the data rather than in the priors.
II. Description of Data
Monthly market returns and Fama-French portfolios are downloaded from
Kenneth French’s website. The factor returns are constructed from all NYSE, AMEX and
NASDAQ firms in CRSP. Monthly Treasury-Bill rates are from Ibbotson Associates, and
are also available on Ken French’s website.
A. Venture Capital Data
The data with VCs’ investments in entrepreneurial companies are provided by
Sand Hill Econometrics (SHE), a commercial data provider. The data contain the
majority of US investments, in the period from 1987 to 2005. SHE combines and extends
two commercially available databases, Venture Xpert (formerly Venture Economics) and
VentureOne. These two databases are extensively used in the VC literature, and Gompers
and Lerner (1999) and Kaplan, Sensoy and Strömberg (2002) investigate the
completeness of the Venture Xpert data and find that they contain the majority of the
investments and missing investments tend to be less significant ones. In addition, SHE
has spent substantial amount of time and effort to ensure the accuracy of the data. This
includes removing investment rounds that did not actually occur, adding investment
15
rounds that were not in the original data, and consolidating rounds, so that each round
corresponds to a single actual investment by one or more VC. Cochrane (2005) uses
similar data, but our version is more recent and many of the previously encountered data
problems have been resolved.
B. Calculating Valuations
VCs distinguish between the pre- and post-money valuation of an investment, and the
data contain both of these valuations for a large number of the investments. When a VC
invests I in a company with a total value of PVPOST after the investment (the post-money
valuation), the PVPRE (the pre-money valuation) is defined by = +POST PREPV PV I .
To illustrate, imagine VC A invests $1m in a company with a pre-money
valuation of $2m and a post-money valuation of $3m. This implies that VC A receives
1/3 of the shares of the company. In the next round, VC B invests $4m, at a pre-money
valuation of $6m and a post-money valuation of $10m. The issuance of new shares to VC
B dilutes VC A’s ownership fraction to 1/5, and the value of VC A’s shares is 1/5 x
$10m. = $2m. This number can also be calculated using VC A’s original ownership
fraction and the pre-money valuation in the second round, as 1/3 x $6m = $2m. In short,
we calculate the investor’s return from round t to round t+ using the formula
+
+
= −,
( )1
( )PRE
t tPOST
PV tr
PV t (18)
16
We can use this relationship to construct a new valuation variable that strips out
the dilution of the original ownership. Starting from V(0) = 1, future values of this
valuation are calculated using the equation
+
+
=,
( )
( ) t t
V tr
V t (19)
and the resulting valuations are used as the observed valuations in the estimation
procedure.
Note that this variable can only be constructed when pre- and post-money
valuations are observed for consecutive rounds. When they are not observed for an
intermediate investment round, it is impossible to adjust for the dilution and the valuation
variable is “restarted” after the break. Note that while this reduces the number of
valuations used for the estimation, it does not introduce any bias.
C. Descriptive Statistics
We observe a total of 15,169 financing rounds, of which 9,637 have data on
valuations. The observations are concentrated in the Biotech and IT industries. There is a
total of 3,237 unique start-ups. A third of these firms ultimately go public, another 742
are acquired, and we have explicit information about 444 companies being liquidated.
There is no information about the fate of the remaining 940 firms in the data. Some of
these may be alive and well, some may be “zombies,” and most have likely been
liquidated. The empirical model must deal with this uncertainty about unobserved
outcomes.
17
On average, an entrepreneurial firm receives 4.7 financing rounds (the median is 4
rounds). However, this distribution is highly skewed, with some firms having as many as
9 rounds. On average, 12.4 months (the median 10 months) pass between consecutive
rounds. This distribution is highly also skewed. While 5% of follow-on investments occur
after as little as 2 months, another 5% take more than 32 months. The observed returns
between rounds are 129% on average, with an enormous standard deviation of 340%.
**** TABLE I ABOUT HERE ****
III. Empirical Results
We first present estimates of the CAPM model and discuss the econometric issues
in the context of this model. Then we present and discuss the effect of major industry
groups and time periods, different risk-factors, and market-wide determinants of the
return to VC investments.
We compare estimates arising from a standard OLS, a GLS, and our MCMC
procedures. The OLS and GLS estimators are standard estimation procedures, which
estimate the risk and return in a regression framework without accounting for the
endogeneity of the observed returns. For these two estimators, we calculate the excess
log-return and regress it on the corresponding excess log-return on the market. To
facilitate the comparisons of the various estimators, these returns are calculated from the
basic specification in equation (4). In particular, the following equation is estimated using
OLS
18
( ) ( ) ( ) ( ), , ,ln ( ) / ( ) ln( ) ln lnt t t t t ti i f OLS OLS m f OLSV t V t r t t r rα β ε
+ + ++ + ⎡ ⎤− = − + − +⎣ ⎦ (20)
For each observed return, +t and t represent the time of the current and the previous
valuations, respectively.
The GLS estimator adjusts for the heteroscedasticity in equation (20), arising
since valuations that are further apart have more volatile error terms. Formally, equation
(4) implies that ( )2~ 0, ( )OLS wN t tε σ+ − , and the GLS estimator normalizes this variance
by dividing equation (20) by the square root of the time between observed valuations.
Hence, the GLS estimator is estimated from the specification
( ) ( )
, , ,ln ( ) / ( ) ln( ) ln( ) ln( )t t t t t ti i f m f
GLS GLS GLS
V t V t r r rt t
t t t tα β ε
+ + +++
+ +
− −= − + +
− − (21)
A. Heteroscedasticity, OLS, GLS
Estimates of the OLS, the GLS, and the MCMC models are presented in Table II.
All these estimates are calculated without accounting for the endogeneity of the observed
returns. We observe that the intercept of monthly log-returns from the OLS and the GLS
models are 0.91%OLSα = and 2.39%GLSα = , respectively. The difference between these
two estimates is statistically significant, but difficult to interpret (Cochrane (2005) finds
an annual intercept of 462%).
To turn these estimates into annual returns, we correct for the log-linearization
and add 21 2σ to the intercept. The estimated idiosyncratic risk in the OLS regression is
88%. However, due to the heteroskedasticity, it is not clear how to interpret this value.
19
The GLS estimation finds a volatility of 32.66%. Correcting the intercept, leads to an
estimate of the monthly abnormal return of 7.7%, a substantial return.
**** TABLE II ABOUT HERE ****
B. The Selection Equation
In table II we observe that, ignoring selection, the GLS and MCMC methods
produce similar estimates. To investigate the magnitude of the selection problem, we
compare MCMC with and without the selection equation. In table III we observe that the
log-market model intercept declines from 2.13% to somewhere between -2.11% and -
3.45%, depending on the specification of the selection model. The β increases from the
initial estimate of 1.6152 to somewhere between 2.6377 and 3.0157, again depending on
the selection model. The idiosyncratic risk estimates also increase from 32.43% to
somewhere between 39.49% and 45.74%. The decrease in the intercept and the increases
in the β and volatility estimates are the expected changes in the estimates following the
inclusion of the selection equation in the model.
The estimates of the parameters in the selection equation are also sensible. Across
all specifications, the probability of observing a valuation depends positively on the
return since the previous financing round. As the time since last financing round (τ)
increases, it is more likely that a new financing round occurs (in other words, the firm can
trade off return and time since last refinancing). However, when the time since previous
financing round becomes sufficiently large, the τ2 term dominates, and the chance of
refinancing declines.
20
For the identification of selection models, it is important to have independent
variation in the selection equation. We use the number of acquisitions or public offerings
of VC backed companies to provide this variation, called ACQ and IPO, respectively.
The identifying assumption is that, conditional on the current valuation, ACQ and IPO do
not contain any information about the future returns earned by the company. This
assumption is true as long as investors rationally incorporate all beliefs about future
returns into the current valuation. In specification 3 and 4 we find that ACQ and IPO
enter positively and significantly. When there is more activity in the aggregate VC
market, it is more likely that a given firm obtains a new financing round.
C. Estimating a Three-Factor Model
To provide further insights into the determinants of the risk and return to
investments in entrepreneurial companies, Table IV presents estimates of a Fama-French
three-factor specification. Not surprisingly, we find a large positive loading on the market
factor, indicating that entrepreneurial companies are very exposed to general market
conditions. We find a positive loading on the SMB factor, indicating that the value of
entrepreneurial companies behave like values of small stocks. Finally, we find negative
loadings on the HMB factor, indicating that our valuations behave like valuations of
companies will small book-to-market ratios (growth companies). Perhaps the more
surprising finding is the large magnitudes of these factor loadings. Controlling for these
factors leaves the α largely unchanged.
21
D. Industry Betas
Table V presents estimates of alphas and betas for the four major industry
classifications: Biotech, IT, Retail and Other. There are substantial differences in the risk
and return across these four industries, with IT consistently having the highest betas, and
Biotech the lowest betas. Correspondingly, Biotech has the largest estimated monthly
intercepts, while the lowest intercepts are found for Retail and Other investments.
IV. Conclusion
When estimating risk and return for assets with infrequently observed market
valuations, a number of empirical problems arise. When the duration between the
observed valuations is variable, this introduces heteroschedasticity in the standard OLS
approach. We show that a straightforward GLS correction, resolves this problem, and
does change the estimated parameters substantially. Moreover, when the timing of the
observed valuations is endogenous, this creates a sample selection problem. To resolve
this problem, we introduce a new sample selection model.
We estimate our model using a Bayesian approach, relying on MCMC methods.
The model generalizes previous models by Cochrane (2005) and Woodward (2004). Like
Cochrane (2005), it explicitly models the underlying path of unobserved valuations,
which is important for imposing the constraints of the selection model in the instances
where valuations are unobserved. Unlike Cochrane (2005), it is numerically more
tractable and allows for more flexible and realistic specifications of the selection
equation, which is important for accurately correcting for selection bias and provides
22
insights into the process governing refinancing and exit events for entrepreneurial
companies.
Our first result is that correcting for heteroscedasticity in the error terms is
important for the estimates. The GLS estimates and the MCMC estimates (ignoring
selection) correct for the heteroscedasticity in similar ways. In addition, when correcting
for the selection bias, the MCMC procedure finds significantly higher risk exposures,
both in terms of systematic and idiosyncratic risks, and lower intercepts. These findings
are robust across a number of specifications of the pricing model, including CAPM, the
Fama-French three-factor, and industry specific loadings on the market factor. Further,
the results are robust to various specifications of the selection equation. We include the
underlying valuations to correct for the endogeneity. We include the time since previous
financing round to control for the fact that even well performing companies only receive
financing with some lags. Further, we include market-wide variables, such as the total
number of companies receiving VC financing and the total amount invested during the
same month. These market-wide variables provide exogenous variation in the selection
equation, since it is reasonable to assume that, conditional on the entrepreneurial
company’s valuation, they are independent of its future returns.
We are reluctant to interpret the intercepts as abnormal returns for two reasons.
First, the investors and the entrepreneurs cannot earn these returns, since they cannot
rebalance they portfolio and since the calculation does not adequately correct for the fact
that most investments end with returns of negative one hundred percent for everybody.
Second, translating the intercept in the log-normal model into an abnormal return in the
23
standard arithmetic model is not a trivial task either. Both of these issues are issues we
are currently tackling.
24
Bibliography
Carter, C., and R. Kohn, 1994, On Gibbs Sampling for State Space Models, Biometrika 81, 541-553.
Cochrane, John, 2005, The Risk and Return of Venture Capital, Journal of Financial Economics 75, 3-52.
Dimson, E., 1979, Risk Measurement When Shares are Subject to Infrequent Trading, Journal of Financial Economics 7, 197-226.
Driessen, Joost, Tse-Chun Lin, and Ludovic Phallippou, 2007, Measuring the risk of private equity funds: A new approach, working paper.
Gelfand, Alan, and Adrian Smith, 1990, Sampling Based Approaches to Calculating Marginal Densities, Journal of the American Statistical Association 85, 398-409.
Geman, S., and D. Geman, 1984, Stochastic Relaxation, Gibbs Distributions, adn the Bayesian Restoration of Images, IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721-741.
Gompers, Paul, and Josh Lerner, 1999. The Venture Capital Cycle (MIT Press, Cambridge).
Heckman, James, 1979, Sample Selection Bias as a Specification Error, Econometrica 47, 153-162.
Kaplan, Steven, and Antoinette Schoar, 2005, Private Equity Performance: Returns, Persistence, and Capital Flows, Journal of Finance 60, 1791-1823.
Kaplan, Steven, Berk Sensoy, and Per Strömberg, 2002, How Well do Venture Capital Databases Reflect Actual Investments? working paper, University of Chicago.
Ljungqvist, Alexander, and Matthew Richardson, 2003, The Cash Flow, Return and Risk Characteristics of Private Equity, working paper.
Phalippou, Ludovic, and Maurizio Zollo, 2006, What Drives Private Equity Fund Performance? working paper.
Scholes, M., and J. Williams, 1977, Estimating Betas from Nonsynchronous Data, Journal of Financial Economics 5, 309-328.
Tanner, Martin, and Wing Wong, 1987, The Calculation of Posterior Distributions by Data Augmentation, Journal of the American Statistical Association 82, 528-549.
25
Woodward, Susan, 2004, Measuring Risk and Performance for Private Equity, working paper.
Table I: Descriptive Statistics Table I presents descriptive statistics for the VC sample. Across the four industry classifications, Panel A contains the total number of rounds with and without observed returns. Similarly, Panel B contains the number of entrepreneurial firms across the four industry classifications and according to the observed exits. Finally, Panel C contains information about the average number of rounds received by each entrepreneurial firm. The variable tau contains the time since the previous round (measured in months), and return is the return earned since the previous round (presented in percent).
Panel A: Total Rounds
Total Biotech IT Retail Other Number of rounds 15169 3892 9152 1700 425 With valuations 9637 2485 5759 1126 267
Panel B: Number of firms Total Biotech IT Retail Other IPO 1111 343 623 121 24 acquisition 742 137 507 74 24 liquidated 444 70 288 77 9 unknown/still alive 940 260 549 92 39 Total 3237 810 1967 364 96
Panel C: Rounds per firm mean median stdev p5 p95 total 4.6861 4 2.2607 2 9 w/ valid data 2.9771 3 1.3099 2 6 tau 12.3522 10 11.2771 2 32 return 128.77 50 340.29 -62.99 500.77
Table II: OLS, GLS, and MCMC Estimates The table presents OLS, GLS and MCMC estimates of the market model in monthly log-returns. The alphas are intercepts and the betas are coefficients from a regression of the log-returns to the companies on the market log-return. Sigma is the estimated variance of the error term from this regression. The GLS estimator scales each observation with the inverse of the square-root of the time since last financing round, as specified in the text. Reported MCMC estimates are mean and standard deviation of the parameters’ simulated posterior distribution. The simulations use 500 iterations preceded by 500 discarded iterations for burn-in. For frequentist estimates ***, **, and * denote statistical significance at the 1%, 5%, and 10% levels, respectively. For Bayesian estimates ***, **, and * denote whether zero is contained in the 1%, 5%, and 10% credible intervals, respectively.
OLS GLS MCMC 1 2 3
Coef. Std. Err. Coef. Std. Err. Mean Std. Dev. alpha_tilde 0.0091 (0.0008)*** 0.0239 (0.0013)*** 0.0213 (0.0012) *** beta 1.6104 (0.0672)*** 1.3891 (0.0869)*** 1.6152 (0.0739) *** sigma 0.8777 0.3266 0.3243 (0.0031) ***
Table III: MCMC Estimates with selection correction
1 2 3 4 Mean Std. Dev. Mean Std. Dev. Mean Std. Dev. Mean Std. Dev.
alpha_tilde -0.0211 (0.0011)*** -0.0345 (0.0038)*** -0.0318 (0.0046)*** -0.0295 (0.0037)*** beta 2.8898 (0.1087)*** 3.0157 (0.1460) 2.7753 (0.1635)*** 2.6377 (0.1529)*** sigma 0.3949 (0.0028)*** 0.4574 (0.0193)*** 0.4552 (0.0214)*** 0.4498 (0.0173)*** Selection Equation return 0.1810 (0.0040)*** 0.2214 (0.0085)*** 0.2165 (0.0172)*** 0.2157 (0.0137)*** tau 0.2635 (0.0117)*** 0.2664 (0.0110)*** 0.2644 (0.0115)*** tau^2 -0.0290 (0.0024)*** -0.0298 (0.0028)*** -0.0295 (0.0025)*** X_rounds 0.4876 (0.0662)*** X_dollars 11.9621 (2.8409)*** -8.2378 (4.1443)* X_ACQ 2.7620 (0.5679)*** 0.3078 (0.5833) X_IPO 5.0547 (0.7148)*** 5.4601 (0.6836)*** const -1.4817 (0.0052)*** -1.6304 (0.0093) *** -1.8056 (0.0164)*** -1.8604 (0.0174)***
Table IV: MCMC Estimates of Fama-French 3-factor Model The table presents the posterior distributions of the parameters of the Fama-French model in log-returns. Factor and risk-free returns are from Kenneth French’s website. The MCMC algorithm uses two different selection models and 500 iterations for burn-in and 500 iterations to sample the posterior distributions. Alphas and sigmas are annualized.
OLS GLS MCMC 1 2 3 4
Coef. Std. Err. Coef. Std. Err. Mean Std. Dev. Mean Std. Dev. alpha_tilde 0.0082 (0.0009)*** 0.0216 (0.0014)*** -0.0316 (0.0034)*** -0.0278 (0.0036)*** RMRF 1.4299 (0.0694)*** 1.2327 (0.0877)*** 2.5446 (0.1562)*** 2.2662 (0.1354)*** SMB 0.4832 (0.0803)*** 0.7230 (0.1026)*** 1.0948 (0.1343)*** 0.9155 (0.1182)*** HML -0.6322 (0.0686)*** -0.8113 (0.0856)*** -2.0071 (0.1189)*** -1.7411 (0.1580)*** sigma 0.8714 0.3240 0.4455 (0.0126)*** 0.4401 (0.0169)*** Selection Equation return 0.2258 (0.0109)*** 0.2244 (0.0128)*** tau 0.2756 (0.0124)*** 0.2803 (0.0146)*** tau^2 -0.0296 (0.0024)*** -0.0302 (0.0026)*** X_rounds 0.4614 (0.0647)*** X_dollars -10.8600 (3.9612)*** X_ACQ 1.1112 (0.6038)** X_IPO 5.2952 (0.6930)*** const -1.6386 (0.0090)*** -1.8691 (0.0194)***
Table V: Industry level MCMC Estimates
OLS GLS MCMC 1 2 3 4
Coef. Std. Err. Coef. Std. Err. Mean Std. Dev. Mean Std. Dev. alpha_tilde Biotech 0.0114 (0.0016)*** 0.0200 (0.0025)*** -0.0285 (0.0035) *** -0.0249 (0.0036)*** IT 0.0091 (0.0011)*** 0.0258 (0.0017) *** -0.0318 (0.0036) *** -0.0278 (0.0039)*** Retail 0.0068 (0.0030)** 0.0326 (0.0044)*** -0.0492 (0.0058) *** -0.0476 (0.0062)*** Other 0.0063 (0.0053) 0.0105 (0.0077) -0.0505 (0.0081) *** -0.0462 (0.0080)*** beta Biotech 0.5927 (0.1300)*** 0.3140 (0.1736)* 2.0956 (0.2288) *** 1.6607 (0.1841)*** IT 1.8656 (0.0843)*** 1.6240 (0.1087)*** 2.9741 (0.1647) *** 2.7108 (0.1563)*** Retail 3.1028 (0.2244)*** 2.7645 (0.2757)*** 5.4072 (0.3543) *** 5.1907 (0.3538)*** Other 0.8432 (0.4012)** 0.6402 (0.5138) 1.7414 (0.6055) *** 1.2323 (0.5800)*** sigma 0.8680 0.3232 0.4508 (0.0136) 0.4445 (0.0172)*** Selection Equation return 0.2204 (0.0140) *** 0.2108 (0.0136)*** tau 0.2594 (0.0123) *** 0.2610 (0.0120)*** tau^2 -0.0287 (0.0021) *** -0.0299 (0.0023)*** X_rounds 0.4714 (0.0632)*** X_dollars -6.8734 (3.8495)* X_ACQ 0.2844 (0.6360) X_IPO 5.2208 (0.7185)*** const -1.6293 (0.0094) *** -1.8505 (0.0175)***