A new family of regular semivalues and applications Roberto Lucchetti Politecnico di Milano,Italy.


A new family of regular semivalues and applications

Roberto Lucchetti, Politecnico di Milano, Italy


Main goal:

To rank genes from DNA data provided by Microarray Analysis.

Tools: cooperative game theory, in particular power indices.

Power indices rank players according to their “strength” in the game.

In the EU Council, the strongest states (GE, FR, IT, UK) have roughly 10 times the power of the weakest state (MT).

In the UN Security Council, the veto players have roughly 100 (10) times the power of the non-permanent members, according to Shapley (Banzhaf).


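A (TU) game is a pair $(N, v)$ with $v: 2^N \to \mathbb{R}$ and $v(\emptyset) = 0$, where

$N = \{1, \dots, n\}$ is the set of players,

$v$ is the characteristic function of the game.

A set $A \subseteq N$ is called a coalition.

$v(A)$ is the utility (or cost) for the coalition $A$.

$G_N$ represents the set of all games having $N$ as set of players.

Remark:

$G_N \cong \mathbb{R}^{2^n - 1}$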


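A basis for $G_N$:

Unanimity games: for each nonempty $S \subseteq N$, $u_S(T) = 1$ if $S \subseteq T$, and $u_S(T) = 0$ otherwise.

Subclass of games:

Simple games. Among them the weighted majority games $v = [q; w_1, \dots, w_n]$: $v(A) = 1$ if $\sum_{i \in A} w_i \ge q$, and $v(A) = 0$ otherwise.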
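As a minimal sketch (not taken from the slides), the following Python snippet represents a game as a function on coalitions, builds the weighted majority game [3; 2, 1, 1] (an illustrative choice), and recovers its coordinates in the unanimity basis by Möbius inversion. All function names are hypothetical.

```python
from itertools import combinations


def coalitions(players):
    """Yield every nonempty coalition of `players` as a frozenset."""
    ordered = sorted(players)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def unanimity(S):
    """Unanimity game u_S: a coalition T gets 1 exactly when it contains S."""
    S = frozenset(S)
    return lambda T: 1 if S <= frozenset(T) else 0


def weighted_majority(q, weights):
    """Game [q; w_1, ..., w_n]: a coalition wins (value 1) when its weight reaches q."""
    return lambda A: 1 if sum(weights[i] for i in A) >= q else 0


def unanimity_coefficients(players, v):
    """Coordinates c_S of v in the unanimity basis (Moebius inversion):
    c_S = sum over nonempty T inside S of (-1)^(|S| - |T|) * v(T)."""
    coeffs = {}
    for S in coalitions(players):
        coeffs[S] = sum((-1) ** (len(S) - len(T)) * v(T) for T in coalitions(S))
    return coeffs


players = {1, 2, 3}
v = weighted_majority(3, {1: 2, 2: 1, 3: 1})
coeffs = unanimity_coefficients(players, v)

# Rebuild v from its coordinates to check that the unanimity games span it.
rebuilt = {
    A: sum(c * unanimity(S)(A) for S, c in coeffs.items())
    for A in coalitions(players)
}
```

With three players there are 2^3 - 1 = 7 coordinates, matching the dimension in the remark on $G_N$; for this game they give v = u_{1,2} + u_{1,3} - u_{1,2,3}.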


Introduction: how a microarray works

A chip can contain millions of DNA probes.


Hybridization

When a single strand of DNA meets a single strand of mRNA, they stick to each other if they are complementary.

Hybridization helps researchers identify which RNA sequences are present in a sample; this tells them which genes the organism is expressing and how strongly.


GeneChip microarrays use the natural chemical attraction between the RNA target (from the sample preparation) and the DNA on the array to determine the expression level of a given gene.

Bases (DNA/RNA): adenine (A), guanine (G), thymine (T)/uracil (U), cytosine (C).


The RNA extracted from a sample is copied into cRNA (through a process known as PCR). Copying the RNA allows it to be more easily detected on the array. At the same time as the RNA is copied, a fluorescent chemical molecule called biotin is attached to the strand. This molecule will show where the sample RNA has stuck to the DNA probe on the array.


If the gene is highly expressed, many RNA molecules will stick to the probe, and the probe location will shine brightly when the laser hits it.

If the sample RNA doesn’t match, it will be rejected by the probe on the array, and when the laser hits the probe, nothing glows.


The whole point of microarray gene expression analysis is to compare expression levels among different samples. Let’s simplify the situation with an example in which we have four genes and two samples.

Gene1: 2RUDE, Gene2: 2LOUD, Gene3: GETOUT, Gene4: FATMET

Gene4 is not glowing.


          array 1   array 2   array 3   array 4   …
gene 1    0.67      0.45      1.32      1.34      …
gene 2    1.01      1.13      1.54      2.13      …
gene 3    1.38      1.21      1.23      0.12      …
gene 4    0.65      0.98      0.54      …         …
gene 5    0.17      1.32      2.43      …         …
…         …         …         …         …         …

The entry in row "gene 4" and column "array 2" (0.98) is the expression level of gene 4 in array 2.


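The Microarray Game:

An $m \times n$ Boolean matrix $M$ such that $M_{ij} = 1$ if gene $i$ is abnormally expressed in sample $j$, and $M_{ij} = 0$ otherwise.

Given the column $M_{\cdot j}$, $\operatorname{supp}(M_{\cdot j}) = \{i : M_{ij} = 1\}$.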


          Sample 1   Sample 2   Sample 3
gene1     0.5        0.2        1
gene2     0.4        1          0.3
gene3     0.8        0.4        0.2

          Sample 1   Sample 2   Sample 3   Sample 4
gene1     0.7        0.3        1.8        0.8
gene2     0.1        0.2        0.5        0.9
gene3     1          0.6        1.7        0.1

          Sample 1   Sample 2   Sample 3   Sample 4
gene1     0          0          1          0
gene2     1          1          0          0
gene3     1          0          1          1
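The three tables above can be read as reference (normal) samples, tumor samples, and the resulting Boolean matrix. The sketch below rests on two assumptions that the transcript does not state explicitly: a tumor value is marked 1 when it leaves the [min, max] range of the reference samples for that gene (this reproduces the third table), and the microarray game is the average of the unanimity games on the column supports, as in the Moretti, Patrone and Bonassi paper listed in the references. Function names are hypothetical.

```python
from fractions import Fraction


def discretize(normal, tumor):
    """Boolean matrix: 1 where a tumor value falls outside the
    [min, max] range of the normal samples of the same gene."""
    matrix = []
    for normal_row, tumor_row in zip(normal, tumor):
        low, high = min(normal_row), max(normal_row)
        matrix.append([0 if low <= x <= high else 1 for x in tumor_row])
    return matrix


def microarray_shapley(matrix):
    """Shapley value of the game v = average over samples of u_supp(column).
    In a unanimity game u_S each member of S receives 1/|S|, so each
    sample spreads 1/(number of samples) evenly over its support."""
    n_genes, n_samples = len(matrix), len(matrix[0])
    value = [Fraction(0)] * n_genes
    for j in range(n_samples):
        support = [i for i in range(n_genes) if matrix[i][j] == 1]
        for i in support:
            value[i] += Fraction(1, len(support) * n_samples)
    return value


normal = [[0.5, 0.2, 1.0], [0.4, 1.0, 0.3], [0.8, 0.4, 0.2]]
tumor = [[0.7, 0.3, 1.8, 0.8], [0.1, 0.2, 0.5, 0.9], [1.0, 0.6, 1.7, 0.1]]

boolean = discretize(normal, tumor)
ranking = microarray_shapley(boolean)
```

Under these assumptions the three genes receive 1/8, 3/8 and 1/2 respectively, so gene3 ranks first.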


A power index for the game $(N, v)$ is a vector $(x_1, \dots, x_n)$ such that $x_i$ represents the power of player $i$ in the game $v$.

Weighted voting does not work…

The most famous:

Shapley ($\sigma$) and Banzhaf ($\beta$).


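$v(S \cup \{i\}) - v(S)$: the marginal contribution of $i$ to $S \cup \{i\}$.

Shapley ($\sigma$) and Banzhaf ($\beta$), with $s = |S|$:

$\sigma_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{s!\,(n-s-1)!}{n!} \, [v(S \cup \{i\}) - v(S)]$

$\beta_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{1}{2^{n-1}} \, [v(S \cup \{i\}) - v(S)]$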
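To make the two indices concrete, here is a brute-force sketch that sums weighted marginal contributions over all coalitions, using the standard Shapley weight s!(n-s-1)!/n! and Banzhaf weight 1/2^(n-1). The UN Security Council model used as an example (15 members, 9 votes needed, all 5 permanent members required) is a common simplification and an assumption here, not something the slides spell out; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def marginal_sums(players, v, weight):
    """For each player i, sum weight(|S|, n) * (v(S + i) - v(S))
    over all coalitions S that do not contain i."""
    n = len(players)
    result = {}
    for i in players:
        others = [p for p in players if p != i]
        total = Fraction(0)
        for size in range(n):
            w = weight(size, n)
            for combo in combinations(others, size):
                S = frozenset(combo)
                marginal = v(S | {i}) - v(S)
                if marginal:
                    total += w * marginal
        result[i] = total
    return result


def shapley(players, v):
    return marginal_sums(
        players, v,
        lambda s, n: Fraction(factorial(s) * factorial(n - s - 1), factorial(n)),
    )


def banzhaf(players, v):
    return marginal_sums(players, v, lambda s, n: Fraction(1, 2 ** (n - 1)))


# Assumed model of the UN Security Council: players 1..5 are permanent.
PERMANENT = frozenset(range(1, 6))
PLAYERS = list(range(1, 16))


def un_council(A):
    """A coalition wins with at least 9 members including all permanent ones."""
    return 1 if PERMANENT <= A and len(A) >= 9 else 0


sigma = shapley(PLAYERS, un_council)
beta = banzhaf(PLAYERS, un_council)
shapley_ratio = sigma[1] / sigma[6]   # permanent vs non-permanent
banzhaf_ratio = beta[1] / beta[6]
```

In this model the permanent-to-non-permanent ratio is 421/4 (about 105) for Shapley and 212/21 (about 10) for Banzhaf, in line with the "100 (10) times" figures quoted earlier. The enumeration is exponential in the number of players, so it is only practical for small games.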


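$\varphi$ is a probabilistic value if, for each player $i$, there is a probability $p_i$ on $2^{N \setminus \{i\}}$ such that

$\varphi_i(v) = \sum_{S \subseteq N \setminus \{i\}} p_i(S) \, [v(S \cup \{i\}) - v(S)]$

Shapley: $p_i(S) = \frac{s!\,(n-s-1)!}{n!}$, with $s = |S|$

Banzhaf: $p_i(S) = \frac{1}{2^{n-1}}$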


If $p_i(S) = p(|S|) > 0$, the probabilistic value is called a regular semivalue.

Examples:

Banzhaf, Shapley, p-binomial.

Regular semivalues are points in the simplex:

$\{(p_0, \dots, p_{n-1}) : p_s > 0, \ \sum_{s=0}^{n-1} \binom{n-1}{s} p_s = 1\}$
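The simplex condition can be checked numerically. The sketch below assumes the standard normalisation for semivalues, namely that the weights p_s (one per coalition size s = 0, …, n-1) are positive and satisfy sum_s C(n-1, s) p_s = 1, and it takes the p-binomial weights to be q^s (1-q)^(n-1-s) for a parameter 0 < q < 1. Both are standard definitions rather than text recovered from the slide, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def shapley_weights(n):
    """p_s = s! (n-s-1)! / n! for coalition sizes s = 0, ..., n-1."""
    return [Fraction(factorial(s) * factorial(n - s - 1), factorial(n))
            for s in range(n)]


def banzhaf_weights(n):
    """p_s = 1 / 2^(n-1) for every coalition size."""
    return [Fraction(1, 2 ** (n - 1)) for _ in range(n)]


def binomial_weights(n, q):
    """p_s = q^s (1-q)^(n-1-s), with 0 < q < 1 (assumed parametrisation)."""
    return [q ** s * (1 - q) ** (n - 1 - s) for s in range(n)]


def is_regular_semivalue(p):
    """All weights positive and sum_s C(n-1, s) * p_s equal to 1."""
    n = len(p)
    return all(w > 0 for w in p) and sum(comb(n - 1, s) * p[s] for s in range(n)) == 1


n = 6
checks = {
    "shapley": is_regular_semivalue(shapley_weights(n)),
    "banzhaf": is_regular_semivalue(banzhaf_weights(n)),
    "binomial": is_regular_semivalue(binomial_weights(n, Fraction(1, 3))),
}
```

Exact fractions keep the equality test free of rounding error; with q = 1/2 the binomial weights coincide with the Banzhaf weights.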


Properties for power indices

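Let $\varphi: G_N \to \mathbb{R}^n$.

The solution $\varphi$ has the dummy player (DP) property if, for each player $i$ such that $v(A \cup \{i\}) = v(A) + v(\{i\})$ for all coalitions $A$ not containing $i$, $\varphi_i(v) = v(\{i\})$.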


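Let $\vartheta: N \to N$ be a permutation.

Given the game $v$, denote by $\vartheta v$ the game $(\vartheta v)(A) = v(\vartheta(A))$, and by $\vartheta^*: \mathbb{R}^n \to \mathbb{R}^n$ the map $\vartheta^*(x)_i = x_{\vartheta(i)}$.

The solution $\varphi$ has the symmetry (S) property if, for each permutation $\vartheta$ as above, $\varphi(\vartheta v) = \vartheta^*(\varphi(v))$.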


The new family of power indices

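Let $a > 0$.

Define $\sigma^a$ on the unanimity game $u_S$ as $\sigma^a_i(u_S) = 1/s^a$ if $i \in S$ and $\sigma^a_i(u_S) = 0$ otherwise, where $s = |S|$,

and extend it by linearity on a generic $v \in G_N$.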
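The "define on unanimity games, then extend by linearity" construction can be sketched as follows. The code assumes that the value, written sigma^a here, gives 1/s^a to each member of S in the unanimity game u_S with s = |S|. This is a reading of the garbled slide, suggested by the later statement that the family contains the Shapley value (which gives 1/s), and should be treated as an assumption. The exponent is restricted to integers to keep the arithmetic exact, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def nonempty_subsets(players):
    """Yield every nonempty subset of `players` as a frozenset."""
    ordered = sorted(players)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def unanimity_coefficients(players, v):
    """Coordinates c_S of v in the unanimity basis, so that v = sum_S c_S u_S."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in nonempty_subsets(S))
        for S in nonempty_subsets(players)
    }


def sigma_a(players, v, a):
    """Linear extension: sigma^a_i(v) = sum over S containing i of c_S / |S|^a.
    Assumes sigma^a_i(u_S) = 1 / |S|^a for i in S (integer a >= 1 here)."""
    coeffs = unanimity_coefficients(players, v)
    value = {i: Fraction(0) for i in players}
    for S, c in coeffs.items():
        for i in S:
            value[i] += Fraction(1, len(S) ** a) * c
    return value


# Illustrative weighted majority game [3; 2, 1, 1].
weights = {1: 2, 2: 1, 3: 1}


def v(A):
    return 1 if sum(weights[i] for i in A) >= 3 else 0


players = [1, 2, 3]
shapley_like = sigma_a(players, v, 1)   # a = 1 should reproduce Shapley
sigma_two = sigma_a(players, v, 2)
```

For this game a = 1 gives (2/3, 1/6, 1/6), the Shapley value, while a = 2 gives (7/18, 5/36, 5/36), which no longer sums to v(N).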


Theorem 1

There exists one and only one value fulfilling the symmetry, linearity and dummy player properties, and assigning $a_s$ to all non-null players in the unanimity game $u_S$ (with $s = |S|$), where $a_1 = 1$ and $a_s > 0$ for $s = 2, \dots, n$.

The value fulfills the formula:


Theorem 2

$\sigma^a$ is a regular semivalue for all $a > 0$. $\sigma^2$ fulfills the formula:

Corollary

The family of the weighting coefficients of the values $\sigma^a$, $a > 0$, is an open curve in the simplex of the regular semivalues, containing the Shapley value. The addition of the Banzhaf value to the curve provides a one-point compactification of the curve.


Theorem 3 study of the term:

Key tool

Let , let

Then

Moreover, for all natural l, and positive real a,x:

Finally, for each natural m, the following formula holds:


Calculating the indices in weighted majority games

Let the coefficients count in how many ways the sum of the weights of $j$ players different from $i$ can give $k$. Then the following proposition holds.

Let $\sigma^a$ be the value defined in the theorem above. Let $q$ be a positive integer, and let $w_1, \dots, w_n$ be non-negative integers.

Let $v = [q; w_1, \dots, w_n]$ be the associated weighted majority game. Then the following formula holds:

An efficient algorithm based on generating functions and formal series allows for a fast calculation of the coefficients.
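The authors' algorithm is not shown in the transcript. As a hedged illustration of the generating-function idea, the sketch below expands the product of (1 + y x^{w_m}) over the players m other than i by dynamic programming: the coefficient of y^j x^k counts the coalitions of j players, none of them i, with total weight k. Player i is decisive exactly for the coalitions with weight between q - w_i and q - 1, so any semivalue with size weights p_j is a weighted sum of these counts. Function names and the 0-based indexing are assumptions.

```python
from fractions import Fraction
from math import factorial


def count_coalitions(weights, i):
    """counts[j][k] = number of coalitions of j players, none equal to i,
    whose weights sum to k (coefficients of prod_{m != i} (1 + y x^{w_m}))."""
    n = len(weights)
    total = sum(weights) - weights[i]
    counts = [[0] * (total + 1) for _ in range(n)]
    counts[0][0] = 1
    for m, w in enumerate(weights):
        if m == i:
            continue
        # Multiply by (1 + y x^w); sizes go downwards so player m is used once.
        for j in range(n - 2, -1, -1):
            for k in range(total - w, -1, -1):
                if counts[j][k]:
                    counts[j + 1][k + w] += counts[j][k]
    return counts


def semivalue_wmg(q, weights, p):
    """Semivalue with size weights p[0..n-1] in the game [q; w_1, ..., w_n]."""
    n = len(weights)
    result = []
    for i in range(n):
        counts = count_coalitions(weights, i)
        max_weight = len(counts[0]) - 1
        low, high = max(q - weights[i], 0), min(q - 1, max_weight)
        # Coalitions that lose alone but win once player i joins, by size j.
        swings = [sum(counts[j][low:high + 1]) for j in range(n)]
        result.append(sum(p[j] * swings[j] for j in range(n)))
    return result


n = 3
shapley_p = [Fraction(factorial(s) * factorial(n - s - 1), factorial(n)) for s in range(n)]
banzhaf_p = [Fraction(1, 2 ** (n - 1))] * n

shapley_index = semivalue_wmg(3, [2, 1, 1], shapley_p)
banzhaf_index = semivalue_wmg(3, [2, 1, 1], banzhaf_p)
```

The cost is polynomial in n and in the total weight rather than exponential in n, which is what makes games of the size of the EU Council tractable.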


Applications

The EU


SY = Shapley, S2 = $\sigma^2$, BF = Banzhaf; the last three columns give each index as a ratio to the value for Malta (MT).

State   SY         S2         BF         SY(I)/MT      S2(I)/MT      BF(I)/MT
GE      0.086738   0.02797    0.032688   10.6066383    9.815722703   8.260803639
FR      0.086738   0.02797    0.032688   10.6066383    9.815722703   8.260803639
IT      0.086738   0.02797    0.032688   10.6066383    9.815722703   8.260803639
UK      0.086738   0.02797    0.032688   10.6066383    9.815722703   8.260803639
SP      0.079975   0.025999   0.031164   9.77960769    9.123884457   7.875663381
PL      0.079975   0.025999   0.031164   9.77960769    9.123884457   7.875663381
RO      0.039937   0.013476   0.017889   4.88360405    4.729163962   4.520849128
NL      0.036825   0.012476   0.016691   4.5031054     4.378366807   4.218094516
BE      0.034068   0.011555   0.015475   4.16600531    4.055048061   3.910791003
CZ      0.034068   0.011555   0.015475   4.16600531    4.055048061   3.910791003
GR      0.034068   0.011555   0.015475   4.16600531    4.055048061   3.910791003
HU      0.034068   0.011555   0.015475   4.16600531    4.055048061   3.910791003
PT      0.034068   0.011555   0.015475   4.16600531    4.055048061   3.910791003
SE      0.028193   0.00961    0.012989   3.44756282    3.372390341   3.282537276
AU      0.028193   0.00961    0.012989   3.44756282    3.372390341   3.282537276
BG      0.028193   0.00961    0.012989   3.44756282    3.372390341   3.282537276
FI      0.019606   0.006721   0.00916    2.39749856    2.358602005   2.314885014
DK      0.019606   0.006721   0.00916    2.39749856    2.358602005   2.314885014
SK      0.019606   0.006721   0.00916    2.39749856    2.358602005   2.314885014
IR      0.019606   0.006721   0.00916    2.39749856    2.358602005   2.314885014
LT      0.019606   0.006721   0.00916    2.39749856    2.358602005   2.314885014
LV      0.011042   0.003813   0.005251   1.35024683    1.338033557   1.327015416
SLO     0.011042   0.003813   0.005251   1.35024683    1.338033557   1.327015416
CY      0.011042   0.003813   0.005251   1.35024683    1.338033557   1.327015416
ES      0.011042   0.003813   0.005251   1.35024683    1.338033557   1.327015416
LU      0.011042   0.003813   0.005251   1.35024683    1.338033557   1.327015416
MT      0.008178   0.00285    0.003957   1             1             1


The power indices, restricted to the 56 genes that all indices rank among their first 100. Data from 40 tumor samples vs. 22 normal samples, 2000 genes.

[Figure: normalized values (vertical axis, 0 to 0.7) plotted against genes (horizontal axis, 0 to 60). Series: sigma2 (10^-4), sigma3 (10^-6), Banzhaf (10^-13), Shapley (10^-2).]


Data from a colorectal cancer study: 10 healthy and 12 tumoral tissues.

An extended microarray game also considers how much the genes are abnormally expressed w.r.t. a normality interval. Given the normality interval $[m_i, M_i]$ of gene $i$ and its standard deviation $s_i$, let $N_i^k = [m_i - k s_i, M_i + k s_i]$. Assign $k$ to cell $ij$ of the matrix if the value of gene $i$ in patient $j$ falls in $N_i^k \setminus N_i^{k-1}$.

A weighted Shapley value is used to rank genes. This allows better differentiation of the genes. Taking the first 100 genes in the ranking, the game is formed as an average of weighted majority games. Then we calculate the Shapley, Banzhaf and $\sigma^2$ indices.
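The interval rule of the extended microarray game can be sketched as below. The code reads the k-th interval as [m - k*s, M + k*s], taking the upper endpoint to be built from M (an assumption about the garbled slide text), and assigns to each value the smallest k whose interval contains it, so values inside [m, M] get 0. The function name and the sample numbers are hypothetical.

```python
from math import ceil


def abnormality_level(x, m, M, s):
    """Smallest integer k >= 0 such that m - k*s <= x <= M + k*s.
    Assumes m <= M and s > 0; exact interval endpoints may be affected
    by floating-point rounding."""
    if m <= x <= M:
        return 0
    distance = x - M if x > M else m - x
    return ceil(distance / s)


# Hypothetical gene with normality interval [1.0, 2.0] and deviation 0.5.
patient_values = [1.5, 2.5, 2.6, 0.4]
levels = [abnormality_level(x, 1.0, 2.0, 0.5) for x in patient_values]
```

Here the four values receive levels 0, 1, 2 and 2, so the matrix entry grows with the distance from the normality interval instead of being a plain 0/1 flag.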


Gene expression analysis was performed by using Human Genome U133A-Plus 2.0 GeneChip arrays (Affymetrix, Inc., Calif).

The following 7 genes are quoted in the medical literature as having great importance in the onset of the disease: CYR61, UCHL1, FOS, FOSB, EGR1, VIP, KRT24.

One of them was ranked around the 100th position by the weighted Shapley value. All the others are among the first 50 and played the subsequent game.

Gene     S     B     $\sigma^2$
FOSB     2     1     1
CYR61    1     2     2
FOS      3     3     3
VIP      5     5     6
EGR1     10    9     9
KRT24    45    35    35


References

R. Lucchetti, P. Radrizzani, E. Munarini, A new family of regular semivalues and applications, Int. J. of Game Theory, DOI 10.1007/s00182-010-0263-5.

R. Lucchetti, S. Moretti, F. Patrone, P. Radrizzani, The Shapley and Banzhaf indices in microarray games, Computers and Operations Research 37 (2010), p. 1406-1412.

R. Lucchetti, P. Radrizzani, Microarray Data Analysis Via Weighted Indices and Weighted Majority Games, Computational Intelligence Methods for Bioinformatics and Biostatistics II, Masulli, Peterson, Tagliaferri (Eds), Lecture Notes in Computer Science, Springer (2010), p. 179-190.

S. Moretti, F. Patrone, S. Bonassi, The class of microarray games and the relevance index for genes, TOP 15 (2007), p. 256-280.

D. Albino, P. Scaruffi, S. Moretti, S. Coco, C. Di Cristofano, A. Cavazzana, M. Truini, S. Stigliani, S. Bonassi, G.P. Tonini, Stroma poor and stroma rich gene signatures show a low intratumoral gene expression heterogeneity in Neuroblastic tumors, Cancer 113 (2008), p. 1412-1422.