Some Models of Information Aggregation
and Consensus in Networks
John N. Tsitsiklis (MIT)
ACCESS, KTH, January 2011
Overview
• Information propagation/aggregation in networks
– Engineered versus social networks
– Bayesian versus naive updates
• Review a few key models and results
• Is this applied probability?
– Theoretical AP: random graphs
– Applied AP: ad campaigns, etc.
– Narrative AP: "explain" social phenomena
– Recreational AP: play with toy models
[Slide: screenshot of the Amazon.com product page for "The Wisdom of Crowds" (Paperback) by James Surowiecki]
Myopia and Herding
The Wisdom or Madness of Crowds
The Wisdom of Crowds
Wisdom versus Herding of Rational but Selfish Agents
Consensus and Averaging
Social sciences
• Merging of “expert” opinions
• Evolution of public opinion
• Reputation systems
• Modeling of jurors
• Language evolution
• Charles Mackay (London, 1841): Extraordinary Popular Delusions and the Madness of Crowds
– "Men, it has been well said, think in herds; it will be seen that they go mad in herds, . . ."
Sensor and other Engineered Networks
• Fusion of available information
– we get to design the nodes’ behavior
The Basic Setup
• Each node i endowed with private information Xi
– Nodes form opinions, make decisions
– Nodes observe opinions/decisions or receive messages
– Nodes update opinions/decisions
– Everyone wants a "good" decision; common objective
• Questions
– Convergence? To what? How fast? Quality of limit?
– Does the underlying graph matter? (Tree, acyclic, random, . . .)
• Variations
– Hypothesis testing (binary decisions): P(H0 | Xi)
– Parameter estimation: E[Y | Xi]
– Optimization: min_u E[c(u, Y) | Xi]
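As a minimal concrete instance of the hypothesis-testing variation, a single node's posterior P(H0 | Xi) is just Bayes' rule applied to its private signal. A small sketch (the prior and likelihood numbers are illustrative, not from the talk):

```python
def posterior_h0(prior_h0, lik_h0, lik_h1):
    """Bayes' rule for a binary hypothesis: returns P(H0 | X_i).

    prior_h0       : prior probability of H0
    lik_h0, lik_h1 : likelihoods P(X_i | H0) and P(X_i | H1)
                     of the observed private signal X_i
    """
    joint_h0 = prior_h0 * lik_h0
    joint_h1 = (1 - prior_h0) * lik_h1
    return joint_h0 / (joint_h0 + joint_h1)

# A signal twice as likely under H1 moves a uniform prior to P(H0 | X_i) = 1/3.
print(posterior_h0(0.5, 0.3, 0.6))
```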
• Social science: postulate update mechanism
• Engineering: design/optimize update mechanism
Bayesian Models
Myopia and Herding
Choosing the Best Tavern
Where do we eat tonight?
(Bikhchandani, Hirshleifer & Welch, 1992; Banerjee, 1992)
• Prior slightly in favor of A
• Private info points to the better one, with some probability of error
• Information is there, but is not used
[Figure: a sequence of agents choosing between taverns A and B; once early choices favor A, later agents copy and choose A regardless of their own signals]
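The point of the slide ("information is there, but is not used") can be reproduced with a toy sequential-choice simulation in the spirit of the Bikhchandani–Hirshleifer–Welch/Banerjee model. This is a sketch under simplifying assumptions that differ slightly from the slide: symmetric prior instead of a prior slightly favoring A, equally informative signals, and ties broken by one's own signal:

```python
def run_cascade(signals):
    """Sequential myopic-Bayes choices between taverns A and B.

    signals : list of private signals (True = 'A looks better'), each
              assumed correct with the same probability q > 1/2.
    Returns the list of public choices (True = chose A).

    Pre-cascade choices reveal the chooser's signal; once the revealed
    signals favor one tavern by 2 or more, every later agent copies
    regardless of its own signal (an information cascade).
    """
    choices = []
    revealed = []  # signals inferred from pre-cascade choices
    for s in signals:
        lead = sum(1 if r else -1 for r in revealed)
        if abs(lead) >= 2:                  # cascade: private info is ignored
            choices.append(lead > 0)
        else:
            score = lead + (1 if s else -1)
            choice = s if score == 0 else score > 0
            choices.append(choice)
            revealed.append(choice)         # this choice is informative
    return choices

# Two early misleading signals for B start a cascade on B; the four
# correct signals for A that follow are never used.
print(run_cascade([False, False, True, True, True, True]))
```

With more signal accuracy or richer messages the cascade can be delayed, but with binary actions it forms with positive probability no matter how many well-informed agents come later.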
Tandem networks
[Figure: message-flip diagram. Yn ∈ {0, 1}; under hypothesis Hj, the message flips 0 → 1 with probability pj(0 → 1) and 1 → 0 with probability pj(1 → 0).]
Tandem networks
Let !j > 0 be the prior probability of hypothesis Hj and Pt(n) = !0P0(Yn,n = 1)+!1P1(Yn,n =
0) be the probability of error at sensor n. The goal of a system designer is to design a strategy
so that the probability of error Pt(n) is minimized. Let P !t (n) = inf Pt(n), where the infimum is
taken over all possible strategies.
1 2 n
X1 X2 Xn
Figure 1: A tandem network.
The problem of finding optimal strategies has been studied in [1–3], while the asymptotic
performance of a long tandem network (i.e., n ! ") is considered in [2, 4–8] (some of these
works do not restrict the message sent by each sensor to be binary). In the case of binary
communications, [4, 8] find necessary and su!cient conditions under which the error probability
goes to zero in the limit of large n. To be specific, the error probability stays bounded away
from zero i" there exists a B < " such that | log dP1
dP0| # B. In the case when the log-likelihood
ratio is unbounded, numerical examples have indicated that the error probability goes to zero
much slower than an exponential rate. In a parallel configuration, in which all sensors send their
messages directly to a single fusion center (i.e., "i is now a function of only Xi), the rate of decay
of the error probability is known to be exponential [9]. This suggests that a tandem configuration
performs worse than a parallel configuration, when n is large. It has been conjectured in [2,8,10,11]
that indeed, the rate of decay of the error probability is sub-exponential. However, no rigorous
proof is known for this result. The goal of this paper is to prove this conjecture.
We first note that there is a caveat to the sub-exponential decay conjecture: the probability
measures P0 and P1 are assumed to be equivalent measures, i.e., they are absolutely continuous
w.r.t. each other. Indeed, if there exists a measurable set A such that P0(A) > 0 and P1(A) = 0,
then an exponential decay rate can be achieved as follows: each sensor always declares 1 till the
2
• Estimation with limited memory(Cover, 1969; Hellman & Cover, 1970; Koplowitz, 1975)
• en = optimal P(Yn is incorrect)
• en ! 0? How fast?
10
[Figure: a tandem of agents passing messages Y1, Y2, . . . , Yn−1]
• Xi are i.i.d.
• H0: Xi ~ P0, H1: Xi ~ P1
• γ: R → {0, 1}
• minimize P(error) over {γi}
• en = min P(Yn is incorrect)
• Private info: likelihood ratio Li = P1(Xi)/P0(Xi)
Tandem networks
• Xi are i.i.d.; H0: Xi ~ P0, H1: Xi ~ P1
• binary message/decision functions γi: Yi = γi(Xi, Yi−1)
• "Social network" view: each γi is myopic Bayes-optimal
• "Engineered system" view: {γi} designed for best end decision
– simple sensor network
– estimation with limited memory [Cover, 1969]
Tandem Network Results
• en = P(Yn is incorrect) → 0?
The Two Regimes
[Figure: densities P0(Xi) and P1(Xi) versus x, with the likelihood ratio L and the 0 → 1 / 1 → 0 switching thresholds]
• Bounded likelihood ratios (B-LR): 0 < a < dP1/dP0(Xi) < b < ∞
– never get compelling evidence
• Unbounded likelihood ratios (U-LR): 0 < dP1/dP0(Xi) < ∞
– arbitrarily compelling evidence is possible
Tandem Network Results
• en = P(Yn is incorrect)
• Private info: likelihood ratio Li = P1(Xi)/P0(Xi)
• Two regimes:
– range of Li: [a, b] with 0 < a < b < ∞: Engineered en ↛ 0 ⟹ Myopic en ↛ 0 (Cover, 1969)
– range of Li: (0, ∞): Myopic en → 0 (Papastavrou & Athans, 1992) ⟹ Engineered en → 0 (Cover, 1969), but slowly (Tay, JNT & Win, 2008)
The Two Regimes

                          Social (myopic)                   Engineered
U-LR:                     en → 0                       ⇒    en → 0, but slowly
                          (Papastavrou & Athans, 92)        (Cover, 69; Tay, Win & JNT, 08)
B-LR:                     en ↛ 0                       ⇐    en ↛ 0 (Koplowitz, 75)
B-LR, ternary messages:   en ↛ 0 (Dia & JNT, 09)            en → 0 (Koplowitz, 75)
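The B-LR row can be sanity-checked numerically. The sketch below is my own toy instance, not from the talk: Bernoulli(0.3) vs. Bernoulli(0.6) observations (so likelihood ratios are bounded), uniform priors, and a myopic rule where each node declares the MAP hypothesis given its own observation and its predecessor's bit. The error probabilities e_n are computed exactly by propagating the message distribution; they improve for a few nodes and then freeze at a positive value, consistent with en ↛ 0.

```python
# Illustrative sketch (not from the talk): exact myopic error probabilities
# in a tandem network with bounded likelihood ratios.
P = {0: {0: 0.7, 1: 0.3},   # P(X = x | H0): Bernoulli(0.3) under H0
     1: {0: 0.4, 1: 0.6}}   # P(X = x | H1): Bernoulli(0.6) under H1

def myopic_tandem_errors(n):
    """Return [e_1, ..., e_n] under uniform priors, computed exactly."""
    # q[h][y] = P(predecessor's bit = y | Hh); node 1's "predecessor"
    # is the degenerate, uninformative message Y0 = 0.
    q = [{0: 1.0, 1: 0.0}, {0: 1.0, 1: 0.0}]
    errors = []
    for _ in range(n):
        # MAP decision for each (own observation x, predecessor bit y)
        decide = {(x, y): int(P[1][x] * q[1][y] > P[0][x] * q[0][y])
                  for x in (0, 1) for y in (0, 1)}
        # push the message distribution through the decision rule
        q = [{b: sum(P[h][x] * q[h][y]
                     for x in (0, 1) for y in (0, 1) if decide[(x, y)] == b)
              for b in (0, 1)} for h in (0, 1)]
        errors.append(0.5 * q[0][1] + 0.5 * q[1][0])
    return errors

e = myopic_tandem_errors(20)
print(e[0], e[3], e[19])   # e_n stalls at a positive value after a few nodes
```

In this instance the myopic rule degenerates into "copy the predecessor" after four nodes, so e_n stops improving entirely.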
Engineered — Architectural Comparisons

(Architecture diagrams omitted; several pairwise comparisons remain unclear.)

• less information, but “same performance” (Tay, Win & JNT, 08)
• more information, but “same performance” (Zoumpoulis & JNT, 09; Kreidl, Zoumpoulis & JNT, 10)
• but same exponent (Neyman–Pearson version)
General Acyclic Networks — Engineered
(Tay, Win & JNT, 08; Zoumpoulis & JNT, 09)

(Architecture diagrams omitted; some comparisons remain unclear.)

• en ≈ exp{−λn}, and λ is known (JNT, 88)
Another example

(Figure: leaves v1, v2, . . . , vm, each with branching factor 2, feeding a fusion node f.)

• Leaves make Bernoulli observations Xi
• Easy to verify that λ*_{tree,NP} < λ*_{star,NP}
• Maybe low branching factor near the leaves is the culprit
General Acyclic Networks — Engineered
(Tay, Win & JNT, 08; Zoumpoulis & JNT, 09)

(Architecture diagrams omitted; some comparisons remain unclear.)

• information loss, but same exponent (Neyman–Pearson version)
• worse error exponent
• en ≈ exp{−λn}, and λ is known (JNT, 88)
A Simple Tree

(Figure: fusion node f with two subtrees v1, v2, each aggregating m leaves.)

• Each subtree: optimal NP test for P0(false alarm) = α/2
• P1(missed detection) ≈ e^{−m λ*_{star,NP}}
• Fusion rule: declare H1 iff [Yv1 = 1 or Yv2 = 1]
A Perspective on Bayesian Optimality
• Engineering
– Optimal rules are hard to find
– Can design “good” schemes (asymptotic, etc.)
• Social Networks
– Is Bayesian updating plausible?
• More realistic settings?
– Correlated private information
– Multiple hypotheses
– Complex topologies or interaction sequences
26
“Naive” Information Aggregation (Consensus and Averaging)
The Setting

• n agents
  – starting values xi(0)
• reach consensus on some x*, with either:
  – mini xi(0) ≤ x* ≤ maxi xi(0) (consensus)
  – x* = (x1(0) + · · · + xn(0))/n (averaging)
  – averaging when xi ∈ {−1, +1} (voting)
• interested in:
  – genuinely distributed algorithms
  – no synchronization
  – no “infrastructure” such as spanning trees
• simple updates, such as: xi := (xi + xj)/2
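The update xi := (xi + xj)/2 already does a lot of work on its own. A minimal sketch (my own toy instance: 10 agents, random communicating pairs) shows that repeated pairwise midpoints conserve the sum, so the agents converge to the exact average of the starting values:

```python
import random

# Toy sketch of the pairwise update xi := (xi + xj)/2 ("gossip" averaging);
# the pair-selection scheme and parameters are arbitrary choices.
random.seed(0)
n = 10
x = [float(i) for i in range(n)]      # starting values x_i(0)
target = sum(x) / n                    # the average the agents should reach

for _ in range(2000):
    i, j = random.sample(range(n), 2)  # a randomly chosen communicating pair
    x[i] = x[j] = (x[i] + x[j]) / 2    # both move to the midpoint

spread = max(x) - min(x)
print(spread, abs(sum(x) / n - target))
```

Each step is a local, unsynchronized exchange with no spanning tree or leader, which is exactly the regime the slide asks for.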
Social sciences
• Merging of “expert” opinions
• Evolution of public opinion
• Evolution of reputation
• Modeling of jurors
• Language evolution
• Preference for “simple” models
– behavior described by “rules of thumb”
– less complex than Bayesian updating
• interested in modeling, analysis (descriptive theory)
– . . . and narratives
Engineering

• Distributed computation and sensor networks
  – Fusion of individual estimates
  – Distributed Kalman filtering
  – Distributed optimization
  – Distributed reinforcement learning
• Networking
  – Load balancing and resource allocation
  – Clock synchronization
  – Reputation management in ad hoc networks
  – Network monitoring
• Multiagent coordination and control
  – Coverage control
  – Monitoring
  – Creating virtual coordinates for geographic routing
  – Decentralized task assignment
  – Flocking
Distributed optimization

minimize over x:  Σ_{i=1}^n fi(x)

• centralized gradient iteration: x := x − α Σ_{i=1}^n ∇fi(x)
• distributed gradient algorithm: xi := xi − α ∇fi(xi)
• reconciling distributed updates: xi := (xi + xj)/2
• converges (when the stepsize α is small enough) under minimal assumptions
  – time-varying interconnection topology
• an averaging algorithm is needed here

minimize over x:  f(x)

• centralized: x := x − α∇f(x) + noise
• distributed: xi := xi − α∇f(xi) + noisei
• reconcile: xi := (xi + 2xj)/3
• a consensus algorithm suffices here
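A minimal sketch of the first scheme, on a toy instance of my own choosing: quadratic objectives fi(x) = (x − ci)²/2, whose sum is minimized at mean(c), with local gradient steps reconciled by doubly stochastic averaging over a ring (weights 1/2 self, 1/4 per neighbor). With a small constant stepsize the agents agree near the true minimizer:

```python
# Sketch (toy instance, not from the talk): distributed gradient descent on
# sum_i f_i(x), f_i(x) = (x - c_i)^2 / 2, reconciled by ring averaging.
c = [0.0, 1.0, 2.0, 3.0]       # each agent's private minimizer; mean is 1.5
n = len(c)
alpha = 0.01                    # small constant stepsize
x = [0.0] * n

for _ in range(5000):
    # local (distributed) gradient step on each agent's own f_i
    x = [xi - alpha * (xi - ci) for xi, ci in zip(x, c)]
    # reconcile via a doubly stochastic ring-averaging matrix
    x = [0.5 * x[i] + 0.25 * x[(i - 1) % n] + 0.25 * x[(i + 1) % n]
         for i in range(n)]

print([round(v, 3) for v in x], sum(x) / n)
```

Because the averaging matrix is doubly stochastic, the network-wide average tracks the centralized iteration exactly, and the residual disagreement between agents is O(α).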
Distributed optimization

minimize Σ_{i=1}^n fi(x)
• agent i: update xi in the direction of improvement of fi; reconcile updates through a consensus or averaging algorithm

minimize f(x1, . . . , xn)
• agent i: update xi, based on (possibly outdated) values of xj
Asynchronous computation model
(JNT, Bertsekas, Athans, 1986)

xi(t + 1) = Σ_j aij(t) xj(t − dij(t))

• aij(t): nonzero whenever i receives a message from j
• dij(t): delay of that message
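A minimal sketch of the delayed update (my own two-agent toy instance, not from the talk): each agent averages its current value with the other agent's value from one step earlier, i.e. aii = aij = 1/2 and dij(t) = 1. Despite the stale information, the disagreement still contracts and both agents reach consensus:

```python
# Toy sketch of x_i(t+1) = sum_j a_ij(t) x_j(t - d_ij(t)) with a fixed
# one-step delay; weights and delays are illustrative choices.
hist = [[0.0, 0.0], [10.0, 10.0]]     # hist[i] = [x_i(t-1), x_i(t)]

for _ in range(300):
    x0 = 0.5 * hist[0][1] + 0.5 * hist[1][0]   # uses x_1 delayed one step
    x1 = 0.5 * hist[1][1] + 0.5 * hist[0][0]   # uses x_0 delayed one step
    hist[0] = [hist[0][1], x0]
    hist[1] = [hist[1][1], x1]

print(hist[0][1], hist[1][1])
```

The disagreement obeys d(t+1) = d(t)/2 − d(t−1)/2, whose characteristic roots have modulus √(1/2) < 1, so the delay slows convergence but does not break it.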
The DeGroot opinion pooling model (1974)

xi(t + 1) = Σ_j aij xj(t),   aij ≥ 0, Σ_j aij = 1
x(t + 1) = Ax(t),   A: stochastic matrix

• Markov chain theory + “mixing conditions”
  ⇒ convergence of A^t to a matrix with equal rows
  ⇒ convergence of each xi to Σ_j πj xj(0)
  ⇒ convergence rate estimates
• x(t + 1) = A(t)x(t): mixing conditions for nonstationary Markov chains (Chatterjee and Seneta, 1977)
• Θ(n²) for line graphs
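A minimal sketch of DeGroot pooling on a toy instance of my own choosing. The matrix A below is row-stochastic but not doubly stochastic, so opinions merge to the π-weighted combination Σ_j πj xj(0), which differs from the plain average:

```python
# Sketch of x(t+1) = A x(t) with a (row-)stochastic, non-doubly-stochastic A.
A = [[0.5, 0.25, 0.25],
     [0.5, 0.25, 0.25],
     [0.0, 0.5, 0.5]]
A = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]                # stationary distribution pi = (1/4, 1/2, 1/4)
x = [0.0, 1.0, 5.0]                   # plain average = 2.0; pi-weighted = 1.75

for _ in range(200):
    x = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]

print(x)   # all opinions agree on the pi-weighted value, not the mean
```

The consensus value is 0.25·0 + 0.5·1 + 0.25·5 = 1.75: the central agent's opinion carries twice the weight of the peripheral ones, as π dictates.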
Averaging algorithms

• A doubly stochastic: 1ᵀAx = 1ᵀx, where 1ᵀ = [1 1 · · · 1]
• x1 + · · · + xn is conserved
• convergence to x* = (x1(0) + · · · + xn(0))/n
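The sum-conservation property is easy to verify directly. A small sketch (my own symmetric doubly stochastic choice of A): since the column sums of A are 1, each step preserves x1 + · · · + xn, so the common limit must be the exact average:

```python
# Sketch: with doubly stochastic A, the sum is conserved at every step,
# so consensus lands on the exact average of the initial values.
A = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]
x = [0.0, 3.0, 9.0]                   # sum 12, average 4

for _ in range(100):
    x = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]

print(x, sum(x))   # all entries near 4, sum still 12
```

Contrast with the DeGroot example above: a merely row-stochastic A also yields consensus, but on a π-weighted combination rather than the average.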
Convergence time (time to get close to “steady-state”)

• Equal weight to all neighbors
  – Directed graphs: exponential in n
  – Undirected graphs: O(n³), tight (Landau and Odlyzko, 1981)
• Better results for special graphs (Erdős–Rényi, geometric, small world)
A critique
• Social Networks
– “The process that it describes is intuitively appealing.”(DeGroot, 74)
– How plausible is this type of synchronism?
• Engineering
– If the graph is indeed fixed:elect a leader, form a spanning tree, accumulate on tree
– Want simplicity, and robustness w.r.t. changing topologies, failures, etc.
– Different models and specs?
45
Time-Varying/Chaotic Environments

Asynchronous computation model (JNT, Bertsekas, Athans, 1986):

xi(t + 1) = Σ_j aij(t) xj(t − dij(t))

• aij(t): nonzero whenever i receives a message from j
• dij(t): delay of that message
• zero-delay case: x(t + 1) = A(t)x(t) (inhomogeneous chain)
Variants

xi(t + 1) = Σ_j aij xj(t),   aij ≥ 0, Σ_j aij = 1
x(t + 1) = Ax(t),   A: stochastic matrix

• Fixed matrix A, subject to a given graph/zero pattern: optimize A via SDP (Boyd & Xiao, 2003)
• i.i.d. random graphs: same (in expectation) as fixed graphs; convergence rate ↔ “mixing times” (Boyd et al., 2005)
• Fairly arbitrary sequence of graphs/matrices A(t): worst-case analysis
• “equal-neighbor model”: xi := average of received messages and own value
• bidirectional model: ∀ t: i → j iff i ← j
• doubly stochastic A(t): sum & average preserving
Consensus convergence

xi(t + 1) = Σ_j aij(t) xj(t)

• aii(t) > 0;  aij(t) > 0 ⇒ aij(t) ≥ α > 0
• “strong connectivity in bounded time”: over B time steps the “communication graph” is strongly connected
• Convergence to consensus: ∀ i : xi(t) → x* = convex combination of initial values (JNT, Bertsekas, Athans, 86; Jadbabaie et al., 03)
• “convergence time”: exponential in n and B
  – even with: symmetric graph at each time, equal weight to each neighbor (Cao, Spielman, Morse, 05)
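A tiny sketch of consensus under a time-varying graph (my own three-agent toy instance): no single step uses a connected graph, but the union of the graphs over every two steps is connected, which is enough for consensus — to some convex combination of the initial values, not necessarily the average:

```python
# Sketch: the network alternates between "only agents 0,1 talk" and
# "only agents 1,2 talk"; connectivity in bounded time still gives consensus.
x = [0.0, 0.0, 1.0]

for t in range(200):
    if t % 2 == 0:
        x[0] = x[1] = (x[0] + x[1]) / 2    # graph 1: edge {0, 1} only
    else:
        x[1] = x[2] = (x[1] + x[2]) / 2    # graph 2: edge {1, 2} only

print(x, max(x) - min(x))
```

Each update is a convex combination, so all values stay inside [min xi(0), max xi(0)], exactly as the convergence result promises.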
Averaging in Time-Varying Setting

• x(t + 1) = A(t)x(t)
  – A(t) doubly stochastic, for all t
  – nonzero aij(t) ≥ α > 0
  – convergence time: O(n²/α)
• Improved convergence rate
  – exchange “load” with up to two neighbors at a time
  – can use α = O(1)
  – convergence time: O(n²)
• Is there an Ω(n²) lower bound to be discovered?
Averaging algorithms – analysis

• Suppose initially Σ_i xi(0) = 0 and Σ_i xi² = 1
  – some node has |xi| ≥ 1/√n, and some node has the opposite sign
  – total gap at least 1/√n ⇒ some gap at least 1/n^1.5
• Over B steps, communicate across the gap: V improves by ≥ α/n³
  – Convergence time: O(n³/α);  α ≈ 1/degree ⇒ O(n⁴)
  – Account for simultaneous contributions of all gaps: O(n²/α)
  – Keep degree bounded ⇒ 1/α bounded
• Averaging in time-varying bidirectional graphs: no harder than consensus on fixed graphs
• Various convergence proofs of optimization algorithms remain valid
  – Improves the convergence time estimate for subgradient methods [Nedic, Olshevsky, Ozdaglar, JNT, 09]
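The quantity V driving this analysis is the quadratic potential V = Σ_i xi². For a pairwise averaging step the improvement can be checked exactly (a small sketch on values of my own choosing): averaging xi and xj reduces V by precisely (xi − xj)²/2, which is why a gap of size 1/n^1.5 yields the per-round progress quoted above.

```python
# Sketch of the potential argument: one pairwise averaging step decreases
# V = sum_i x_i^2 by exactly (x_i - x_j)^2 / 2.
x = [3.0, -1.0, 2.0, -4.0]            # sum is 0, matching the slide's setup

def pair_average(x, i, j):
    x[i] = x[j] = (x[i] + x[j]) / 2

V_before = sum(v * v for v in x)
gap_sq = (x[0] - x[3]) ** 2           # the pair with the largest gap
pair_average(x, 0, 3)
V_after = sum(v * v for v in x)

print(V_before - V_after, gap_sq / 2)  # the two quantities coincide
```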
Closing Thoughts

• Engineering
  – Bayesian: near-optimal designs possible
  – Naive: interesting, manageable design questions
• Social Networks
  – What are the plausible models?

Thank you!