Statistical Analysis of Networks - Networks,...

144
Statistical Analysis of Networks - Networks, correlation and time series Dimitris Kugiumtzis November 7, 2018 Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time

Transcript of Statistical Analysis of Networks - Networks,...

Page 1: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Statistical Analysis of Networks -Networks, correlation and time series

Dimitris Kugiumtzis

November 7, 2018

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 2: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Contents

7/11/2018 Networks, correlation and time series

14/11/2018 Correlation, complexity, and coupling measures oftime series

21/11/2018 Analysis of multi-variate time series by means ofnetworks

16/11/2018 Connectivity networks and applications

5/12/2018 Networks from time series using Matlab

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 3: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Introduction - Example: Games of world cup 1930 - 2006

Data from: http://gd2006.org/contest/details.php#worldcup see [1]

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 4: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Introduction - Example: Flight connections

Data from: https://au.pinterest.com/pin/488077678338752549

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 5: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Introduction - Example: Flight connections

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 6: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Introduction - Example: Flight connections

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 7: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Introduction - Example: Ship connections

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 8: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Introduction - Example: similar web-pages

Data from: http://vlado.fmf.uni-lj.si/pub/networks/data/GD/gd97/B97.net

see [1]

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 9: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Introduction - Example: Finance

Network ?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 10: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Introduction - Example: Finance

Network ?Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 11: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Introduction - Example: Brain Data

Network ?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 12: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Introduction - Example: Brain Data

Network ?Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 13: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Introduction - Example: brain network

ECoG: ”Linear and nonlinearassociation measures produce similarassociation matrices and networks.”

?

It is important to:

Use appropriate measure ofcorrelation / association /causality.

Assess the significance of themeasure.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 14: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Introduction - Example: brain network

ECoG: ”Linear and nonlinearassociation measures produce similarassociation matrices and networks.”

?

It is important to:

Use appropriate measure ofcorrelation / association /causality.

Assess the significance of themeasure.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 15: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Introduction - Example: brain network

ECoG: ”Linear and nonlinearassociation measures produce similarassociation matrices and networks.”

?

It is important to:

Use appropriate measure ofcorrelation / association /causality.

Assess the significance of themeasure.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 16: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Introduction - Example: brain network

ECoG: ”Linear and nonlinearassociation measures produce similarassociation matrices and networks.”

?

It is important to:

Use appropriate measure ofcorrelation / association /causality.

Assess the significance of themeasure.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 17: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Introduction - Example: brain network

ECoG: ”Linear and nonlinearassociation measures produce similarassociation matrices and networks.”

?

It is important to:

Use appropriate measure ofcorrelation / association /causality.

Assess the significance of themeasure.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 18: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Network

A network consists of nodes and links

Node: national team, web-page, ...Link: match between two teams, link between two web-pages, ...Each node is an entity and the link denotes a connection betweentwo entities.

Here, we will study a different (specific) type of nodes and links:Each node type is a variable.The link denotes some form of association or correlation betweenthe variables.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 19: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Network

A network consists of nodes and linksNode: national team, web-page, ...

Link: match between two teams, link between two web-pages, ...Each node is an entity and the link denotes a connection betweentwo entities.

Here, we will study a different (specific) type of nodes and links:Each node type is a variable.The link denotes some form of association or correlation betweenthe variables.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 20: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Network

A network consists of nodes and linksNode: national team, web-page, ...Link: match between two teams, link between two web-pages, ...

Each node is an entity and the link denotes a connection betweentwo entities.

Here, we will study a different (specific) type of nodes and links:Each node type is a variable.The link denotes some form of association or correlation betweenthe variables.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 21: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Network

A network consists of nodes and linksNode: national team, web-page, ...Link: match between two teams, link between two web-pages, ...Each node is an entity and the link denotes a connection betweentwo entities.

Here, we will study a different (specific) type of nodes and links:Each node type is a variable.The link denotes some form of association or correlation betweenthe variables.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 22: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Network

A network consists of nodes and linksNode: national team, web-page, ...Link: match between two teams, link between two web-pages, ...Each node is an entity and the link denotes a connection betweentwo entities.

Here, we will study a different (specific) type of nodes and links:Each node type is a variable.The link denotes some form of association or correlation betweenthe variables.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 23: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Association Network (link: association rule) see [2], Sec 7.3

Link: association level between attributes of the two nodes.Association is not necessarily determined by a statistical measure.

Example

Scientometrics: Study the relationship among various scientificdisciplines. see [2], Sec 3.5.1

Node (unit): scientific journalLink: interaction between two journals,e.g. a link is established if journal i cites journal j at least oncewithin a given period (directed link), and vice versa (undirectedlink).

Another association rule is the Jaccard measure:JACij = JACji =

Cij+Cji∑k 6=j Cik+

∑k 6=i Cjk

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 24: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Association Network (link: association rule) see [2], Sec 7.3

Link: association level between attributes of the two nodes.Association is not necessarily determined by a statistical measure.

Example

Scientometrics: Study the relationship among various scientificdisciplines. see [2], Sec 3.5.1

Node (unit): scientific journalLink: interaction between two journals,e.g. a link is established if journal i cites journal j at least oncewithin a given period (directed link), and vice versa (undirectedlink).

Another association rule is the Jaccard measure:JACij = JACji =

Cij+Cji∑k 6=j Cik+

∑k 6=i Cjk

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 25: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Association Network (link: association rule) see [2], Sec 7.3

Link: association level between attributes of the two nodes.Association is not necessarily determined by a statistical measure.

Example

Scientometrics: Study the relationship among various scientificdisciplines. see [2], Sec 3.5.1

Node (unit): scientific journalLink: interaction between two journals,e.g. a link is established if journal i cites journal j at least oncewithin a given period (directed link), and vice versa (undirectedlink).

Another association rule is the Jaccard measure:JACij = JACji =

Cij+Cji∑k 6=j Cik+

∑k 6=i Cjk

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 26: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Association Network (link: association rule) see [2], Sec 7.3

Link: association level between attributes of the two nodes.Association is not necessarily determined by a statistical measure.

Example

Scientometrics: Study the relationship among various scientificdisciplines. see [2], Sec 3.5.1

Node (unit): scientific journal

Link: interaction between two journals,e.g. a link is established if journal i cites journal j at least oncewithin a given period (directed link), and vice versa (undirectedlink).

Another association rule is the Jaccard measure:JACij = JACji =

Cij+Cji∑k 6=j Cik+

∑k 6=i Cjk

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 27: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Association Network (link: association rule) see [2], Sec 7.3

Link: association level between attributes of the two nodes.Association is not necessarily determined by a statistical measure.

Example

Scientometrics: Study the relationship among various scientificdisciplines. see [2], Sec 3.5.1

Node (unit): scientific journalLink: interaction between two journals,

e.g. a link is established if journal i cites journal j at least oncewithin a given period (directed link), and vice versa (undirectedlink).

Another association rule is the Jaccard measure:JACij = JACji =

Cij+Cji∑k 6=j Cik+

∑k 6=i Cjk

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 28: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Association Network (link: association rule) see [2], Sec 7.3

Link: association level between attributes of the two nodes.Association is not necessarily determined by a statistical measure.

Example

Scientometrics: Study the relationship among various scientificdisciplines. see [2], Sec 3.5.1

Node (unit): scientific journalLink: interaction between two journals,e.g. a link is established if journal i cites journal j at least oncewithin a given period (directed link), and vice versa (undirectedlink).

Another association rule is the Jaccard measure:JACij = JACji =

Cij+Cji∑k 6=j Cik+

∑k 6=i Cjk

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 29: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Association Network (link: association rule) see [2], Sec 7.3

Link: association level between attributes of the two nodes.Association is not necessarily determined by a statistical measure.

Example

Scientometrics: Study the relationship among various scientificdisciplines. see [2], Sec 3.5.1

Node (unit): scientific journalLink: interaction between two journals,e.g. a link is established if journal i cites journal j at least oncewithin a given period (directed link), and vice versa (undirectedlink).

Another association rule is the Jaccard measure:JACij = JACji =

Cij+Cji∑k 6=j Cik+

∑k 6=i Cjk

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 30: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Association Network (link: association rule) see [2], Sec 7.3

Link: association level between attributes of the two nodes.Association is not necessarily determined by a statistical measure.

Example

Scientometrics: Study the relationship among various scientificdisciplines. see [2], Sec 3.5.1

Node (unit): scientific journalLink: interaction between two journals,e.g. a link is established if journal i cites journal j at least oncewithin a given period (directed link), and vice versa (undirectedlink).

Another association rule is the Jaccard measure:JACij = JACji =

Cij+Cji∑k 6=j Cik+

∑k 6=i Cjk

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 31: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Association Network (link: association rule) see [2], Sec 7.3

Link: association level between attributes of the two nodes.Association is not necessarily determined by a statistical measure.

Example

Scientometrics: Study the relationship among various scientificdisciplines. see [2], Sec 3.5.1

Node (unit): scientific journalLink: interaction between two journals,e.g. a link is established if journal i cites journal j at least oncewithin a given period (directed link), and vice versa (undirectedlink).

Another association rule is the Jaccard measure:JACij = JACji =

Cij+Cji∑k 6=j Cik+

∑k 6=i Cjk

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 32: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

A “backbone” map of Science and Social Science:7121 journals from year 2000

Comp Sci

PolySciLaw

LIS

Geogr

Hist

Econ

Sociol

Nursing

Educ

Comm

Psychol

Geront

Neurol

RadiolSport Sci

Oper Res

Math

Robot

AIStat

Psychol

Anthrop

Elect Eng

Physics

Mech Eng

ConstrMatSci

FuelsElectChemP Chem

Chemistry

AnalytChem

Astro

Env

Pharma

Neuro Sci

Chem Eng

Polymer

GeoSci

GeoSci

Paleo

Meteorol

EnvMarine

Social Sci

SoilPlant

Ecol

Agric

Earth Sciences

Psychol

OtoRh

HealthCare

BiomedRehab

Gen Med

GenetCardio

Ped

Food Sci

Zool

EntoVet Med

Parasit

Ophth

DairyEndocr

Ob/Gyn

Virol

Hemat

Oncol

Immun

BioChem

Nutr

Endocr

Urol

Dentist

Derm

Pathol

Gastro

SurgMedicine

ApplMath

Aerosp

CondMatNuc

EmergMed Gen/Org

42

Source: http://grants.nih.gov/grants/KM/OERRM/OER KM events/Borner.pdf

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 33: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

The 212 nodes represent clusters of journals for different disciplines

44

Source: http://grants.nih.gov/grants/KM/OERRM/OER KM events/Borner.pdf

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 34: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Association Network (node: vector of attributes)

Each node i is presented with a vector xi of n observed attributesxi = [xi1, . . . , xin]′ .

A similarity measure sim(i , j) quantifies the level of associationbetween two such nodes i and j .

sim(i , j) may take numerical values.

A link is assigned if the level of sim(i , j) constitutes non-trivialassociation between i and j .

sim(i , j) is not directly observable but can be inferred by theinformation in xi and xj .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 35: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Association Network (node: vector of attributes)

Each node i is presented with a vector xi of n observed attributesxi = [xi1, . . . , xin]′ .

A similarity measure sim(i , j) quantifies the level of associationbetween two such nodes i and j .

sim(i , j) may take numerical values.

A link is assigned if the level of sim(i , j) constitutes non-trivialassociation between i and j .

sim(i , j) is not directly observable but can be inferred by theinformation in xi and xj .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 36: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Association Network (node: vector of attributes)

Each node i is presented with a vector xi of n observed attributesxi = [xi1, . . . , xin]′ .

A similarity measure sim(i , j) quantifies the level of associationbetween two such nodes i and j .

sim(i , j) may take numerical values.

A link is assigned if the level of sim(i , j) constitutes non-trivialassociation between i and j .

sim(i , j) is not directly observable but can be inferred by theinformation in xi and xj .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 37: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Association Network (node: vector of attributes)

Each node i is presented with a vector xi of n observed attributesxi = [xi1, . . . , xin]′ .

A similarity measure sim(i , j) quantifies the level of associationbetween two such nodes i and j .

sim(i , j) may take numerical values.

A link is assigned if the level of sim(i , j) constitutes non-trivialassociation between i and j .

sim(i , j) is not directly observable but can be inferred by theinformation in xi and xj .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 38: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Association Network (node: vector of attributes)

Each node i is presented with a vector xi of n observed attributesxi = [xi1, . . . , xin]′ .

A similarity measure sim(i , j) quantifies the level of associationbetween two such nodes i and j .

sim(i , j) may take numerical values.

A link is assigned if the level of sim(i , j) constitutes non-trivialassociation between i and j .

sim(i , j) is not directly observable but can be inferred by theinformation in xi and xj .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 39: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 1: Profile of Department web-sites

Example

Consider as network an ensemble of Departments of some sort(e.g. of the same University, discipline, country etc).

The interest is in studying the quality / strength / similarity of theweb-profiles of the Departments.Node: a Dept web-site, assigned with a number of attributes:

Staff members having their home-page linked to theDepartment web-pages (e.g. given as percentage).

Number of announcements within the last month.

Other ???

What is an appropriate sim(i , j) to infer links?

... the above is the first exercise!You should determine and collect data for: Departments (e.g. 5),attributes (e.g. 4-5), and determine a suitable sim(i , j).You may use a software (e.g. pajek) to draw the network.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 40: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 1: Profile of Department web-sites

Example

Consider as network an ensemble of Departments of some sort(e.g. of the same University, discipline, country etc).The interest is in studying the quality / strength / similarity of theweb-profiles of the Departments.

Node: a Dept web-site, assigned with a number of attributes:

Staff members having their home-page linked to theDepartment web-pages (e.g. given as percentage).

Number of announcements within the last month.

Other ???

What is an appropriate sim(i , j) to infer links?

... the above is the first exercise!You should determine and collect data for: Departments (e.g. 5),attributes (e.g. 4-5), and determine a suitable sim(i , j).You may use a software (e.g. pajek) to draw the network.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 41: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 1: Profile of Department web-sites

Example

Consider as network an ensemble of Departments of some sort(e.g. of the same University, discipline, country etc).The interest is in studying the quality / strength / similarity of theweb-profiles of the Departments.Node: a Dept web-site, assigned with a number of attributes:

Staff members having their home-page linked to theDepartment web-pages (e.g. given as percentage).

Number of announcements within the last month.

Other ???

What is an appropriate sim(i , j) to infer links?

... the above is the first exercise!You should determine and collect data for: Departments (e.g. 5),attributes (e.g. 4-5), and determine a suitable sim(i , j).You may use a software (e.g. pajek) to draw the network.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 42: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 1: Profile of Department web-sites

Example

Consider as network an ensemble of Departments of some sort(e.g. of the same University, discipline, country etc).The interest is in studying the quality / strength / similarity of theweb-profiles of the Departments.Node: a Dept web-site, assigned with a number of attributes:

Staff members having their home-page linked to theDepartment web-pages (e.g. given as percentage).

Number of announcements within the last month.

Other ???

What is an appropriate sim(i , j) to infer links?

... the above is the first exercise!You should determine and collect data for: Departments (e.g. 5),attributes (e.g. 4-5), and determine a suitable sim(i , j).You may use a software (e.g. pajek) to draw the network.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 43: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 1: Profile of Department web-sites

Example

Consider as network an ensemble of Departments of some sort(e.g. of the same University, discipline, country etc).The interest is in studying the quality / strength / similarity of theweb-profiles of the Departments.Node: a Dept web-site, assigned with a number of attributes:

Staff members having their home-page linked to theDepartment web-pages (e.g. given as percentage).

Number of announcements within the last month.

Other ???

What is an appropriate sim(i , j) to infer links?

... the above is the first exercise!You should determine and collect data for: Departments (e.g. 5),attributes (e.g. 4-5), and determine a suitable sim(i , j).You may use a software (e.g. pajek) to draw the network.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 44: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 1: Profile of Department web-sites

Example

Consider as network an ensemble of Departments of some sort(e.g. of the same University, discipline, country etc).The interest is in studying the quality / strength / similarity of theweb-profiles of the Departments.Node: a Dept web-site, assigned with a number of attributes:

Staff members having their home-page linked to theDepartment web-pages (e.g. given as percentage).

Number of announcements within the last month.

Other ???

What is an appropriate sim(i , j) to infer links?

... the above is the first exercise!You should determine and collect data for: Departments (e.g. 5),attributes (e.g. 4-5), and determine a suitable sim(i , j).You may use a software (e.g. pajek) to draw the network.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 45: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 1: Profile of Department web-sites

Example

Consider as network an ensemble of Departments of some sort(e.g. of the same University, discipline, country etc).The interest is in studying the quality / strength / similarity of theweb-profiles of the Departments.Node: a Dept web-site, assigned with a number of attributes:

Staff members having their home-page linked to theDepartment web-pages (e.g. given as percentage).

Number of announcements within the last month.

Other ???

What is an appropriate sim(i , j) to infer links?

... the above is the first exercise!You should determine and collect data for: Departments (e.g. 5),attributes (e.g. 4-5), and determine a suitable sim(i , j).You may use a software (e.g. pajek) to draw the network.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 46: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 1: Profile of Department web-sites

Example

Consider as network an ensemble of Departments of some sort(e.g. of the same University, discipline, country etc).The interest is in studying the quality / strength / similarity of theweb-profiles of the Departments.Node: a Dept web-site, assigned with a number of attributes:

Staff members having their home-page linked to theDepartment web-pages (e.g. given as percentage).

Number of announcements within the last month.

Other ???

What is an appropriate sim(i , j) to infer links?

... the above is the first exercise!

You should determine and collect data for: Departments (e.g. 5),attributes (e.g. 4-5), and determine a suitable sim(i , j).You may use a software (e.g. pajek) to draw the network.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 47: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 1: Profile of Department web-sites

Example

Consider as network an ensemble of Departments of some sort(e.g. of the same University, discipline, country etc).The interest is in studying the quality / strength / similarity of theweb-profiles of the Departments.Node: a Dept web-site, assigned with a number of attributes:

Staff members having their home-page linked to theDepartment web-pages (e.g. given as percentage).

Number of announcements within the last month.

Other ???

What is an appropriate sim(i , j) to infer links?

... the above is the first exercise!You should determine and collect data for: Departments (e.g. 5),attributes (e.g. 4-5), and determine a suitable sim(i , j).

You may use a software (e.g. pajek) to draw the network.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 48: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 1: Profile of Department web-sites

Example

Consider as network an ensemble of Departments of some sort(e.g. of the same University, discipline, country etc).The interest is in studying the quality / strength / similarity of theweb-profiles of the Departments.Node: a Dept web-site, assigned with a number of attributes:

Staff members having their home-page linked to theDepartment web-pages (e.g. given as percentage).

Number of announcements within the last month.

Other ???

What is an appropriate sim(i , j) to infer links?

... the above is the first exercise!You should determine and collect data for: Departments (e.g. 5),attributes (e.g. 4-5), and determine a suitable sim(i , j).You may use a software (e.g. pajek) to draw the network.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 49: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation Network see [2], Sec 7.3.1

For each node i , an attribute Xi is assigned that is considered as acontinuous random variable.

For each Xi there are n observations: xi = [xi1, . . . , xin]′.

A similarity measure sim(i , j) quantifies the level of correlationbetween Xi and Xj .

A standard similarity measure is the Pearson correlation coefficientCorr(X ,Y ) = rXY = sXY

sX sYsXY : sample covariance of X and Y , sX : sample SD of X

Example

Gene Regulation from Microarray Data: Patterns of regulatoryinteractions among genes can be described by networks.Node: the geneXi : relative level of RNA expression of the gene i in a cell.xi : Microarray measurements of the RNA level at n experiments(different conditions).Link: regulatory relationship, sim(i , j) := Corr(Xi ,Xj) = rXi ,Xj

= rij

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 50: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation Network see [2], Sec 7.3.1

For each node i , an attribute Xi is assigned that is considered as acontinuous random variable.

For each Xi there are n observations: xi = [xi1, . . . , xin]′.

A similarity measure sim(i , j) quantifies the level of correlationbetween Xi and Xj .

A standard similarity measure is the Pearson correlation coefficientCorr(X ,Y ) = rXY = sXY

sX sYsXY : sample covariance of X and Y , sX : sample SD of X

Example

Gene Regulation from Microarray Data: Patterns of regulatoryinteractions among genes can be described by networks.Node: the geneXi : relative level of RNA expression of the gene i in a cell.xi : Microarray measurements of the RNA level at n experiments(different conditions).Link: regulatory relationship, sim(i , j) := Corr(Xi ,Xj) = rXi ,Xj

= rij

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 51: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation Network see [2], Sec 7.3.1

For each node i , an attribute Xi is assigned that is considered as acontinuous random variable.

For each Xi there are n observations: xi = [xi1, . . . , xin]′.

A similarity measure sim(i , j) quantifies the level of correlationbetween Xi and Xj .

A standard similarity measure is the Pearson correlation coefficientCorr(X ,Y ) = rXY = sXY

sX sYsXY : sample covariance of X and Y , sX : sample SD of X

Example

Gene Regulation from Microarray Data: Patterns of regulatoryinteractions among genes can be described by networks.Node: the geneXi : relative level of RNA expression of the gene i in a cell.xi : Microarray measurements of the RNA level at n experiments(different conditions).Link: regulatory relationship, sim(i , j) := Corr(Xi ,Xj) = rXi ,Xj

= rij

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 52: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation Network see [2], Sec 7.3.1

For each node i , an attribute Xi is assigned that is considered as acontinuous random variable.

For each Xi there are n observations: xi = [xi1, . . . , xin]′.

A similarity measure sim(i , j) quantifies the level of correlationbetween Xi and Xj .

A standard similarity measure is the Pearson correlation coefficientCorr(X ,Y ) = rXY = sXY

sX sYsXY : sample covariance of X and Y , sX : sample SD of X

Example

Gene Regulation from Microarray Data: Patterns of regulatoryinteractions among genes can be described by networks.Node: the geneXi : relative level of RNA expression of the gene i in a cell.xi : Microarray measurements of the RNA level at n experiments(different conditions).Link: regulatory relationship, sim(i , j) := Corr(Xi ,Xj) = rXi ,Xj

= rij

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 53: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation Network see [2], Sec 7.3.1

For each node i , an attribute Xi is assigned that is considered as acontinuous random variable.

For each Xi there are n observations: xi = [xi1, . . . , xin]′.

A similarity measure sim(i , j) quantifies the level of correlationbetween Xi and Xj .

A standard similarity measure is the Pearson correlation coefficientCorr(X ,Y ) = rXY = sXY

sX sYsXY : sample covariance of X and Y , sX : sample SD of X

Example

Gene Regulation from Microarray Data: Patterns of regulatoryinteractions among genes can be described by networks.

Node: the geneXi : relative level of RNA expression of the gene i in a cell.xi : Microarray measurements of the RNA level at n experiments(different conditions).Link: regulatory relationship, sim(i , j) := Corr(Xi ,Xj) = rXi ,Xj

= rij

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 54: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation Network see [2], Sec 7.3.1

For each node i , an attribute Xi is assigned that is considered as acontinuous random variable.

For each Xi there are n observations: xi = [xi1, . . . , xin]′.

A similarity measure sim(i , j) quantifies the level of correlationbetween Xi and Xj .

A standard similarity measure is the Pearson correlation coefficientCorr(X ,Y ) = rXY = sXY

sX sYsXY : sample covariance of X and Y , sX : sample SD of X

Example

Gene Regulation from Microarray Data: Patterns of regulatoryinteractions among genes can be described by networks.Node: the gene

Xi : relative level of RNA expression of the gene i in a cell.xi : Microarray measurements of the RNA level at n experiments(different conditions).Link: regulatory relationship, sim(i , j) := Corr(Xi ,Xj) = rXi ,Xj

= rij

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 55: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation Network see [2], Sec 7.3.1

For each node i , an attribute Xi is assigned that is considered as acontinuous random variable.

For each Xi there are n observations: xi = [xi1, . . . , xin]′.

A similarity measure sim(i , j) quantifies the level of correlationbetween Xi and Xj .

A standard similarity measure is the Pearson correlation coefficientCorr(X ,Y ) = rXY = sXY

sX sYsXY : sample covariance of X and Y , sX : sample SD of X

Example

Gene Regulation from Microarray Data: Patterns of regulatoryinteractions among genes can be described by networks.Node: the geneXi : relative level of RNA expression of the gene i in a cell.

xi : Microarray measurements of the RNA level at n experiments(different conditions).Link: regulatory relationship, sim(i , j) := Corr(Xi ,Xj) = rXi ,Xj

= rij

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 56: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation Network see [2], Sec 7.3.1

For each node i , an attribute Xi is assigned that is considered as acontinuous random variable.

For each Xi there are n observations: xi = [xi1, . . . , xin]′.

A similarity measure sim(i , j) quantifies the level of correlationbetween Xi and Xj .

A standard similarity measure is the Pearson correlation coefficientCorr(X ,Y ) = rXY = sXY

sX sYsXY : sample covariance of X and Y , sX : sample SD of X

Example

Gene Regulation from Microarray Data: Patterns of regulatoryinteractions among genes can be described by networks.Node: the geneXi : relative level of RNA expression of the gene i in a cell.xi : Microarray measurements of the RNA level at n experiments(different conditions).

Link: regulatory relationship, sim(i , j) := Corr(Xi ,Xj) = rXi ,Xj= rij

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 57: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation Network see [2], Sec 7.3.1

For each node i , an attribute Xi is assigned that is considered as acontinuous random variable.

For each Xi there are n observations: xi = [xi1, . . . , xin]′.

A similarity measure sim(i , j) quantifies the level of correlationbetween Xi and Xj .

A standard similarity measure is the Pearson correlation coefficientCorr(X ,Y ) = rXY = sXY

sX sYsXY : sample covariance of X and Y , sX : sample SD of X

Example

Gene Regulation from Microarray Data: Patterns of regulatoryinteractions among genes can be described by networks.Node: the geneXi : relative level of RNA expression of the gene i in a cell.xi : Microarray measurements of the RNA level at n experiments(different conditions).Link: regulatory relationship, sim(i , j) := Corr(Xi ,Xj) = rXi ,Xj

= rij

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 58: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation from Microarray Data

Data from: http://m3d.bu.edu/cgi-bin/web/array/index.pl see [2], Sec 7.3.1

Three genes: lrp, aroG, tyrR, and 41 experiments.

Which links are “non-trivial”?Significance test for correlation coefficient?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 59: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation from Microarray Data

Data from: http://m3d.bu.edu/cgi-bin/web/array/index.pl see [2], Sec 7.3.1

Three genes: lrp, aroG, tyrR, and 41 experiments.

Which links are “non-trivial”?Significance test for correlation coefficient?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 60: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation from Microarray Data

Data from: http://m3d.bu.edu/cgi-bin/web/array/index.pl see [2], Sec 7.3.1

Three genes: lrp, aroG, tyrR, and 41 experiments.

Which links are “non-trivial”?Significance test for correlation coefficient?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 61: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation from Microarray Data

Data from: http://m3d.bu.edu/cgi-bin/web/array/index.pl see [2], Sec 7.3.1

Three genes: lrp, aroG, tyrR, and 41 experiments.

Which links are “non-trivial”?

Significance test for correlation coefficient?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 62: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation from Microarray Data

Data from: http://m3d.bu.edu/cgi-bin/web/array/index.pl see [2], Sec 7.3.1

Three genes: lrp, aroG, tyrR, and 41 experiments.

Which links are “non-trivial”?Significance test for correlation coefficient?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 63: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of correlation (parametric test)

Let ρXi ,Xj= ρij be the true Pearson correlation coefficient of Xi

and Xj .

Hypothesis test for significance:H0 : ρij = 0, H1 : ρij 6= 0.

Estimate of ρij : rij =sijsi sj

.

Parametric testing, assuming (Xi ,Xj) ∼ N([µi , µj ], [σ2i , σ

2j ], ρij):

Test statistic:

t =rij√n−2√

1−r2ij∼ tn−2, or

z = tanh−1(rij) = 12 log

[1+r2ij1−r2ij

]∼ N(0, 1

n−3)

Test all pairs at the significance level α? Multiple testing?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 64: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of correlation (parametric test)

Let ρXi ,Xj= ρij be the true Pearson correlation coefficient of Xi

and Xj .

Hypothesis test for significance:H0 : ρij = 0, H1 : ρij 6= 0.

Estimate of ρij : rij =sijsi sj

.

Parametric testing, assuming (Xi ,Xj) ∼ N([µi , µj ], [σ2i , σ

2j ], ρij):

Test statistic:

t =rij√n−2√

1−r2ij∼ tn−2, or

z = tanh−1(rij) = 12 log

[1+r2ij1−r2ij

]∼ N(0, 1

n−3)

Test all pairs at the significance level α? Multiple testing?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 65: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of correlation (parametric test)

Let ρXi ,Xj= ρij be the true Pearson correlation coefficient of Xi

and Xj .

Hypothesis test for significance:H0 : ρij = 0, H1 : ρij 6= 0.

Estimate of ρij : rij =sijsi sj

.

Parametric testing, assuming (Xi ,Xj) ∼ N([µi , µj ], [σ2i , σ

2j ], ρij):

Test statistic:

t =rij√n−2√

1−r2ij∼ tn−2, or

z = tanh−1(rij) = 12 log

[1+r2ij1−r2ij

]∼ N(0, 1

n−3)

Test all pairs at the significance level α? Multiple testing?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 66: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of correlation (parametric test)

Let ρXi ,Xj= ρij be the true Pearson correlation coefficient of Xi

and Xj .

Hypothesis test for significance:H0 : ρij = 0, H1 : ρij 6= 0.

Estimate of ρij : rij =sijsi sj

.

Parametric testing, assuming (Xi ,Xj) ∼ N([µi , µj ], [σ2i , σ

2j ], ρij):

Test statistic:

t =rij√n−2√

1−r2ij∼ tn−2, or

z = tanh−1(rij) = 12 log

[1+r2ij1−r2ij

]∼ N(0, 1

n−3)

Test all pairs at the significance level α? Multiple testing?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 67: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of correlation (parametric test)

Let ρXi ,Xj= ρij be the true Pearson correlation coefficient of Xi

and Xj .

Hypothesis test for significance:H0 : ρij = 0, H1 : ρij 6= 0.

Estimate of ρij : rij =sijsi sj

.

Parametric testing, assuming (Xi ,Xj) ∼ N([µi , µj ], [σ2i , σ

2j ], ρij):

Test statistic:

t =rij√n−2√

1−r2ij∼ tn−2, or

z = tanh−1(rij) = 12 log

[1+r2ij1−r2ij

]∼ N(0, 1

n−3)

Test all pairs at the significance level α? Multiple testing?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 68: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of correlation (parametric test)

Let ρXi ,Xj= ρij be the true Pearson correlation coefficient of Xi

and Xj .

Hypothesis test for significance:H0 : ρij = 0, H1 : ρij 6= 0.

Estimate of ρij : rij =sijsi sj

.

Parametric testing, assuming (Xi ,Xj) ∼ N([µi , µj ], [σ2i , σ

2j ], ρij):

Test statistic:

t =rij√n−2√

1−r2ij∼ tn−2, or

z = tanh−1(rij) = 12 log

[1+r2ij1−r2ij

]∼ N(0, 1

n−3)

Test all pairs at the significance level α? Multiple testing?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 69: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Parametric test for the significance of the correlation for the genes:lrp, aroG, tyrR, n = 41.

gene pair rij t-statistic (p-value) z-statistic (p-value)

lrp-aroG 0.78 7.79 (0.0000) 6.48 (0.0000)lrp-tyrR -0.36 -2.41 (0.0208) -2.32 (0.0202)aroG-tyrR -0.21 -1.36 (0.1929) -1.30 (0.1942)

Does significance level α = 0.01 establishes “non-trivial” links?Does (Xi ,Xj) ∼ N([µi , µj ], [σ

2i , σ

2j ], ρij) hold?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 70: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Parametric test for the significance of the correlation for the genes:lrp, aroG, tyrR, n = 41.

gene pair rij t-statistic (p-value) z-statistic (p-value)

lrp-aroG 0.78 7.79 (0.0000) 6.48 (0.0000)lrp-tyrR -0.36 -2.41 (0.0208) -2.32 (0.0202)aroG-tyrR -0.21 -1.36 (0.1929) -1.30 (0.1942)

Does significance level α = 0.01 establishes “non-trivial” links?Does (Xi ,Xj) ∼ N([µi , µj ], [σ

2i , σ

2j ], ρij) hold?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 71: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Parametric test for the significance of the correlation for the genes:lrp, aroG, tyrR, n = 41.

gene pair rij t-statistic (p-value) z-statistic (p-value)

lrp-aroG 0.78 7.79 (0.0000) 6.48 (0.0000)lrp-tyrR -0.36 -2.41 (0.0208) -2.32 (0.0202)aroG-tyrR -0.21 -1.36 (0.1929) -1.30 (0.1942)

Does significance level α = 0.01 establishes “non-trivial” links?Does (Xi ,Xj) ∼ N([µi , µj ], [σ

2i , σ

2j ], ρij) hold?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 72: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Parametric test for the significance of the correlation for the genes:lrp, aroG, tyrR, n = 41.

gene pair rij t-statistic (p-value) z-statistic (p-value)

lrp-aroG 0.78 7.79 (0.0000) 6.48 (0.0000)lrp-tyrR -0.36 -2.41 (0.0208) -2.32 (0.0202)aroG-tyrR -0.21 -1.36 (0.1929) -1.30 (0.1942)

Does significance level α = 0.01 establishes “non-trivial” links?Does (Xi ,Xj) ∼ N([µi , µj ], [σ

2i , σ

2j ], ρij) hold?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 73: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Parametric test for the significance of the correlation for the genes:lrp, aroG, tyrR, n = 41.

gene pair rij t-statistic (p-value) z-statistic (p-value)

lrp-aroG 0.78 7.79 (0.0000) 6.48 (0.0000)lrp-tyrR -0.36 -2.41 (0.0208) -2.32 (0.0202)aroG-tyrR -0.21 -1.36 (0.1929) -1.30 (0.1942)

Does significance level α = 0.01 establishes “non-trivial” links?

Does (Xi ,Xj) ∼ N([µi , µj ], [σ2i , σ

2j ], ρij) hold?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 74: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Parametric test for the significance of the correlation for the genes:lrp, aroG, tyrR, n = 41.

gene pair rij t-statistic (p-value) z-statistic (p-value)

lrp-aroG 0.78 7.79 (0.0000) 6.48 (0.0000)lrp-tyrR -0.36 -2.41 (0.0208) -2.32 (0.0202)aroG-tyrR -0.21 -1.36 (0.1929) -1.30 (0.1942)

Does significance level α = 0.01 establishes “non-trivial” links?Does (Xi ,Xj) ∼ N([µi , µj ], [σ

2i , σ

2j ], ρij) hold?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 75: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Does Xi ∼ N(µi , σ2i ) hold?

The results of the parametric testing are called into question!

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 76: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Does Xi ∼ N(µi , σ2i ) hold?

The results of the parametric testing are called into question!

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 77: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Does Xi ∼ N(µi , σ2i ) hold?

The results of the parametric testing are called into question!

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 78: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of correlation (nonparametric test)

Nonparametric testing: draw the null distribution of rij fromresampled pairs consistent to H0 : ρij = 0.

1 For an “original” pair (xi , xj), generate B randomized samplepairs (x∗bi , x

∗bj ), b = 1, . . . ,B. Generation of each b pair:

Let first variable intact, x∗bi = xi .Shuffle randomly the samples of the other variable to get x∗bj .

Each sample pair (x∗bi , x∗bj ), b = 1, . . . ,B is from (Xi ,Xj)

under the hypothesis of independence (x∗bi preserves themarginal distribution of xi , the same for j).

2 Compute r∗bij on each pair (x∗bi , x∗bj ), b = 1, . . . ,B.

The ensemble {r∗bij }Bb=1 forms the empirical null distributionof rij .

3 Reject H0 if sample rij is not in the distribution of {r∗bij }Bb=1

(using rank ordering).

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 79: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of correlation (nonparametric test)

Nonparametric testing: draw the null distribution of rij fromresampled pairs consistent to H0 : ρij = 0.

1 For an “original” pair (xi , xj), generate B randomized samplepairs (x∗bi , x

∗bj ), b = 1, . . . ,B. Generation of each b pair:

Let first variable intact, x∗bi = xi .Shuffle randomly the samples of the other variable to get x∗bj .

Each sample pair (x∗bi , x∗bj ), b = 1, . . . ,B is from (Xi ,Xj)

under the hypothesis of independence (x∗bi preserves themarginal distribution of xi , the same for j).

2 Compute r∗bij on each pair (x∗bi , x∗bj ), b = 1, . . . ,B.

The ensemble {r∗bij }Bb=1 forms the empirical null distributionof rij .

3 Reject H0 if sample rij is not in the distribution of {r∗bij }Bb=1

(using rank ordering).

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 80: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of correlation (nonparametric test)

Nonparametric testing: draw the null distribution of rij fromresampled pairs consistent to H0 : ρij = 0.

1 For an “original” pair (xi , xj), generate B randomized samplepairs (x∗bi , x

∗bj ), b = 1, . . . ,B. Generation of each b pair:

Let first variable intact, x∗bi = xi .

Shuffle randomly the samples of the other variable to get x∗bj .

Each sample pair (x∗bi , x∗bj ), b = 1, . . . ,B is from (Xi ,Xj)

under the hypothesis of independence (x∗bi preserves themarginal distribution of xi , the same for j).

2 Compute r∗bij on each pair (x∗bi , x∗bj ), b = 1, . . . ,B.

The ensemble {r∗bij }Bb=1 forms the empirical null distributionof rij .

3 Reject H0 if sample rij is not in the distribution of {r∗bij }Bb=1

(using rank ordering).

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 81: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of correlation (nonparametric test)

Nonparametric testing: draw the null distribution of rij fromresampled pairs consistent to H0 : ρij = 0.

1 For an “original” pair (xi , xj), generate B randomized samplepairs (x∗bi , x

∗bj ), b = 1, . . . ,B. Generation of each b pair:

Let first variable intact, x∗bi = xi .Shuffle randomly the samples of the other variable to get x∗bj .

Each sample pair (x∗bi , x∗bj ), b = 1, . . . ,B is from (Xi ,Xj)

under the hypothesis of independence (x∗bi preserves themarginal distribution of xi , the same for j).

2 Compute r∗bij on each pair (x∗bi , x∗bj ), b = 1, . . . ,B.

The ensemble {r∗bij }Bb=1 forms the empirical null distributionof rij .

3 Reject H0 if sample rij is not in the distribution of {r∗bij }Bb=1

(using rank ordering).

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 82: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of correlation (nonparametric test)

Nonparametric testing: draw the null distribution of rij fromresampled pairs consistent to H0 : ρij = 0.

1 For an “original” pair (xi , xj), generate B randomized samplepairs (x∗bi , x

∗bj ), b = 1, . . . ,B. Generation of each b pair:

Let first variable intact, x∗bi = xi .Shuffle randomly the samples of the other variable to get x∗bj .

Each sample pair (x∗bi , x∗bj ), b = 1, . . . ,B is from (Xi ,Xj)

under the hypothesis of independence (x∗bi preserves themarginal distribution of xi , the same for j).

2 Compute r∗bij on each pair (x∗bi , x∗bj ), b = 1, . . . ,B.

The ensemble {r∗bij }Bb=1 forms the empirical null distributionof rij .

3 Reject H0 if sample rij is not in the distribution of {r∗bij }Bb=1

(using rank ordering).

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 83: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of correlation (nonparametric test)

Nonparametric testing: draw the null distribution of rij fromresampled pairs consistent to H0 : ρij = 0.

1 For an “original” pair (xi , xj), generate B randomized samplepairs (x∗bi , x

∗bj ), b = 1, . . . ,B. Generation of each b pair:

Let first variable intact, x∗bi = xi .Shuffle randomly the samples of the other variable to get x∗bj .

Each sample pair (x∗bi , x∗bj ), b = 1, . . . ,B is from (Xi ,Xj)

under the hypothesis of independence (x∗bi preserves themarginal distribution of xi , the same for j).

2 Compute r∗bij on each pair (x∗bi , x∗bj ), b = 1, . . . ,B.

The ensemble {r∗bij }Bb=1 forms the empirical null distributionof rij .

3 Reject H0 if sample rij is not in the distribution of {r∗bij }Bb=1

(using rank ordering).

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 84: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of correlation (nonparametric test)

Nonparametric testing: draw the null distribution of rij fromresampled pairs consistent to H0 : ρij = 0.

1 For an “original” pair (xi , xj), generate B randomized samplepairs (x∗bi , x

∗bj ), b = 1, . . . ,B. Generation of each b pair:

Let first variable intact, x∗bi = xi .Shuffle randomly the samples of the other variable to get x∗bj .

Each sample pair (x∗bi , x∗bj ), b = 1, . . . ,B is from (Xi ,Xj)

under the hypothesis of independence (x∗bi preserves themarginal distribution of xi , the same for j).

2 Compute r∗bij on each pair (x∗bi , x∗bj ), b = 1, . . . ,B.

The ensemble {r∗bij }Bb=1 forms the empirical null distributionof rij .

3 Reject H0 if sample rij is not in the distribution of {r∗bij }Bb=1

(using rank ordering).

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 85: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of correlation (nonparametric test)

Nonparametric testing: draw the null distribution of rij fromresampled pairs consistent to H0 : ρij = 0.

1 For an “original” pair (xi , xj), generate B randomized samplepairs (x∗bi , x

∗bj ), b = 1, . . . ,B. Generation of each b pair:

Let first variable intact, x∗bi = xi .Shuffle randomly the samples of the other variable to get x∗bj .

Each sample pair (x∗bi , x∗bj ), b = 1, . . . ,B is from (Xi ,Xj)

under the hypothesis of independence (x∗bi preserves themarginal distribution of xi , the same for j).

2 Compute r∗bij on each pair (x∗bi , x∗bj ), b = 1, . . . ,B.

The ensemble {r∗bij }Bb=1 forms the empirical null distributionof rij .

3 Reject H0 if sample rij is not in the distribution of {r∗bij }Bb=1

(using rank ordering).

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 86: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Nonparametric testing, 1000 randomized samples.

gene pair rij t-statistic (p-value) rank (p-value)

lrp-aroG 0.78 6.48 (0.0000) 1001 (0.0013)lrp-tyrR -0.36 -2.32 (0.0202) 17 (0.0333)aroG-tyrR -0.21 -1.30 (0.1942) 99 (0.1971)

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 87: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Nonparametric testing, 1000 randomized samples.

gene pair rij t-statistic (p-value) rank (p-value)

lrp-aroG 0.78 6.48 (0.0000) 1001 (0.0013)lrp-tyrR -0.36 -2.32 (0.0202) 17 (0.0333)aroG-tyrR -0.21 -1.30 (0.1942) 99 (0.1971)

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 88: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Nonparametric testing, 1000 randomized samples.

gene pair rij t-statistic (p-value) rank (p-value)

lrp-aroG 0.78 6.48 (0.0000) 1001 (0.0013)lrp-tyrR -0.36 -2.32 (0.0202) 17 (0.0333)aroG-tyrR -0.21 -1.30 (0.1942) 99 (0.1971)

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 89: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Nonparametric testing, 1000 randomized samples.

gene pair rij t-statistic (p-value) rank (p-value)

lrp-aroG 0.78 6.48 (0.0000) 1001 (0.0013)lrp-tyrR -0.36 -2.32 (0.0202) 17 (0.0333)aroG-tyrR -0.21 -1.30 (0.1942) 99 (0.1971)

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 90: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation coefficient and correlation matrix

The correlation coefficients rij , i , j = 1, . . . ,N form a correlationmatrix (positive semidefinite)

Establishing the statistically significant rij , i , j = 1, . . . ,N, thecorrelation matrix is converted to the adjacency matrix.

Example

Correlation for the genes: lrp, aroG, tyrR

gene rijlrp-aroG 0.78lrp-tyrR −0.36aroG-tyrR −0.21

−→ R =

1 0.78 −0.360.78 1 −0.21−0.36 −0.21 1

R =

1 0.78 −0.360.78 1 −0.21−0.36 −0.21 1

−→ R =

0 1 01 0 00 0 0

Are the link(s) found statistically significant also “non-trivial”?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 91: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation coefficient and correlation matrix

The correlation coefficients rij , i , j = 1, . . . ,N form a correlationmatrix (positive semidefinite)

Establishing the statistically significant rij , i , j = 1, . . . ,N, thecorrelation matrix is converted to the adjacency matrix.

Example

Correlation for the genes: lrp, aroG, tyrR

gene rijlrp-aroG 0.78lrp-tyrR −0.36aroG-tyrR −0.21

−→ R =

1 0.78 −0.360.78 1 −0.21−0.36 −0.21 1

R =

1 0.78 −0.360.78 1 −0.21−0.36 −0.21 1

−→ R =

0 1 01 0 00 0 0

Are the link(s) found statistically significant also “non-trivial”?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 92: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation coefficient and correlation matrix

The correlation coefficients rij , i , j = 1, . . . ,N form a correlationmatrix (positive semidefinite)

Establishing the statistically significant rij , i , j = 1, . . . ,N, thecorrelation matrix is converted to the adjacency matrix.

Example

Correlation for the genes: lrp, aroG, tyrR

gene rijlrp-aroG 0.78lrp-tyrR −0.36aroG-tyrR −0.21

−→ R =

1 0.78 −0.360.78 1 −0.21−0.36 −0.21 1

R =

1 0.78 −0.360.78 1 −0.21−0.36 −0.21 1

−→ R =

0 1 01 0 00 0 0

Are the link(s) found statistically significant also “non-trivial”?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 93: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation coefficient and correlation matrix

The correlation coefficients rij , i , j = 1, . . . ,N form a correlationmatrix (positive semidefinite)

Establishing the statistically significant rij , i , j = 1, . . . ,N, thecorrelation matrix is converted to the adjacency matrix.

Example

Correlation for the genes: lrp, aroG, tyrR

gene rijlrp-aroG 0.78lrp-tyrR −0.36aroG-tyrR −0.21

−→ R =

1 0.78 −0.360.78 1 −0.21−0.36 −0.21 1

R =

1 0.78 −0.360.78 1 −0.21−0.36 −0.21 1

−→ R =

0 1 01 0 00 0 0

Are the link(s) found statistically significant also “non-trivial”?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 94: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 2: Micro-array Data

Use any subset (3 or more) of the genes in file Ecoliv4Build6ex1

(in ascii or excel format, see course web-page).Using the 41 experiments for each gene, form the correlationnetwork for the selected genes.

You should:

1 Use the correlation coefficient rij as similarity measure of twogenes i and j .

2 Use parametric and nonparametric test of significance for eachcorrelation coefficient rij .

3 Identify whether the significant links are the same withparametric and nonparametric testing.

4 Form the networks from significant links from each test.

matlab:

for the Pearson correlation coefficient you may use the function corrcoef

for random shuffling you may use the function randperm

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 95: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 2: Micro-array Data

Use any subset (3 or more) of the genes in file Ecoliv4Build6ex1

(in ascii or excel format, see course web-page).Using the 41 experiments for each gene, form the correlationnetwork for the selected genes.You should:

1 Use the correlation coefficient rij as similarity measure of twogenes i and j .

2 Use parametric and nonparametric test of significance for eachcorrelation coefficient rij .

3 Identify whether the significant links are the same withparametric and nonparametric testing.

4 Form the networks from significant links from each test.

matlab:

for the Pearson correlation coefficient you may use the function corrcoef

for random shuffling you may use the function randperm

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 96: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 2: Micro-array Data

Use any subset (3 or more) of the genes in file Ecoliv4Build6ex1

(in ascii or excel format, see course web-page).Using the 41 experiments for each gene, form the correlationnetwork for the selected genes.You should:

1 Use the correlation coefficient rij as similarity measure of twogenes i and j .

2 Use parametric and nonparametric test of significance for eachcorrelation coefficient rij .

3 Identify whether the significant links are the same withparametric and nonparametric testing.

4 Form the networks from significant links from each test.

matlab:

for the Pearson correlation coefficient you may use the function corrcoef

for random shuffling you may use the function randperm

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 97: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 2: Micro-array Data

Use any subset (3 or more) of the genes in file Ecoliv4Build6ex1

(in ascii or excel format, see course web-page).Using the 41 experiments for each gene, form the correlationnetwork for the selected genes.You should:

1 Use the correlation coefficient rij as similarity measure of twogenes i and j .

2 Use parametric and nonparametric test of significance for eachcorrelation coefficient rij .

3 Identify whether the significant links are the same withparametric and nonparametric testing.

4 Form the networks from significant links from each test.

matlab:

for the Pearson correlation coefficient you may use the function corrcoef

for random shuffling you may use the function randperm

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 98: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 2: Micro-array Data

Use any subset (3 or more) of the genes in file Ecoliv4Build6ex1

(in ascii or excel format, see course web-page).Using the 41 experiments for each gene, form the correlationnetwork for the selected genes.You should:

1 Use the correlation coefficient rij as similarity measure of twogenes i and j .

2 Use parametric and nonparametric test of significance for eachcorrelation coefficient rij .

3 Identify whether the significant links are the same withparametric and nonparametric testing.

4 Form the networks from significant links from each test.

matlab:

for the Pearson correlation coefficient you may use the function corrcoef

for random shuffling you may use the function randperm

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 99: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 2: Micro-array Data

Use any subset (3 or more) of the genes in file Ecoliv4Build6ex1

(in ascii or excel format, see course web-page).Using the 41 experiments for each gene, form the correlationnetwork for the selected genes.You should:

1 Use the correlation coefficient rij as similarity measure of twogenes i and j .

2 Use parametric and nonparametric test of significance for eachcorrelation coefficient rij .

3 Identify whether the significant links are the same withparametric and nonparametric testing.

4 Form the networks from significant links from each test.

matlab:

for the Pearson correlation coefficient you may use the function corrcoef

for random shuffling you may use the function randperm

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 100: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 2: Micro-array Data

Use any subset (3 or more) of the genes in file Ecoliv4Build6ex1

(in ascii or excel format, see course web-page).Using the 41 experiments for each gene, form the correlationnetwork for the selected genes.You should:

1 Use the correlation coefficient rij as similarity measure of twogenes i and j .

2 Use parametric and nonparametric test of significance for eachcorrelation coefficient rij .

3 Identify whether the significant links are the same withparametric and nonparametric testing.

4 Form the networks from significant links from each test.

matlab:

for the Pearson correlation coefficient you may use the function corrcoef

for random shuffling you may use the function randperm

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 101: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Partial Correlation Network see [2], Sec 7.3.1

If Xi and Xj are found to have a large rij :

1 There is direct dependence of Xi on Xj , or of Xj on Xi , orboth.

2 Both Xi and Xj are dependent on an other variable (node) Xk

or on m other variables (nodes) XK = {Xk1 , . . . ,Xkm}, whereK = {k1, . . . , km}.

Case 2 may be considered as ’trivial’ correlation and may notsuggest a link (i , j).

To maintain links of only direct dependence, the appropriatesimilarity measure is the partial correlation

ρij |K =σij |K

σii |Kσjj |K

ρij |K = 0 if Xi and Xj are independent, conditional to XK .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 102: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Partial Correlation Network see [2], Sec 7.3.1

If Xi and Xj are found to have a large rij :

1 There is direct dependence of Xi on Xj , or of Xj on Xi , orboth.

2 Both Xi and Xj are dependent on an other variable (node) Xk

or on m other variables (nodes) XK = {Xk1 , . . . ,Xkm}, whereK = {k1, . . . , km}.

Case 2 may be considered as ’trivial’ correlation and may notsuggest a link (i , j).

To maintain links of only direct dependence, the appropriatesimilarity measure is the partial correlation

ρij |K =σij |K

σii |Kσjj |K

ρij |K = 0 if Xi and Xj are independent, conditional to XK .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 103: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Partial Correlation Network see [2], Sec 7.3.1

If Xi and Xj are found to have a large rij :

1 There is direct dependence of Xi on Xj , or of Xj on Xi , orboth.

2 Both Xi and Xj are dependent on an other variable (node) Xk

or on m other variables (nodes) XK = {Xk1 , . . . ,Xkm}, whereK = {k1, . . . , km}.

Case 2 may be considered as ’trivial’ correlation and may notsuggest a link (i , j).

To maintain links of only direct dependence, the appropriatesimilarity measure is the partial correlation

ρij |K =σij |K

σii |Kσjj |K

ρij |K = 0 if Xi and Xj are independent, conditional to XK .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 104: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Partial Correlation Network see [2], Sec 7.3.1

If Xi and Xj are found to have a large rij :

1 There is direct dependence of Xi on Xj , or of Xj on Xi , orboth.

2 Both Xi and Xj are dependent on an other variable (node) Xk

or on m other variables (nodes) XK = {Xk1 , . . . ,Xkm}, whereK = {k1, . . . , km}.

Case 2 may be considered as ’trivial’ correlation and may notsuggest a link (i , j).

To maintain links of only direct dependence, the appropriatesimilarity measure is the partial correlation

ρij |K =σij |K

σii |Kσjj |K

ρij |K = 0 if Xi and Xj are independent, conditional to XK .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 105: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Partial Correlation Network see [2], Sec 7.3.1

If Xi and Xj are found to have a large rij :

1 There is direct dependence of Xi on Xj , or of Xj on Xi , orboth.

2 Both Xi and Xj are dependent on an other variable (node) Xk

or on m other variables (nodes) XK = {Xk1 , . . . ,Xkm}, whereK = {k1, . . . , km}.

Case 2 may be considered as ’trivial’ correlation and may notsuggest a link (i , j).

To maintain links of only direct dependence, the appropriatesimilarity measure is the partial correlation

ρij |K =σij |K

σii |Kσjj |K

ρij |K = 0 if Xi and Xj are independent, conditional to XK .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 106: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Partial variance / covariance

σij |K , σii |K and σjj |K are components of the 2× 2 partialcovariance matrix

Σ11|2 =

[s2ii |K sij |Ksij |K s2jj |K

]

Σ11|2 is defined as

Σ11|2 = Σ11 − Σ12Σ−122 Σ21

The matrices Σ11, Σ12, Σ22 and Σ21 are components of thepartitioned covariance matrix

Cov(W) =

[Σ11 Σ12

Σ21 Σ22

]of all involved variables partitioned as W = [W1,W2]′, andW1 = [Xi ,Xj ]

′, W2 = XK .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 107: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Partial variance / covariance

σij |K , σii |K and σjj |K are components of the 2× 2 partialcovariance matrix

Σ11|2 =

[s2ii |K sij |Ksij |K s2jj |K

]

Σ11|2 is defined as

Σ11|2 = Σ11 − Σ12Σ−122 Σ21

The matrices Σ11, Σ12, Σ22 and Σ21 are components of thepartitioned covariance matrix

Cov(W) =

[Σ11 Σ12

Σ21 Σ22

]of all involved variables partitioned as W = [W1,W2]′, andW1 = [Xi ,Xj ]

′, W2 = XK .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 108: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Partial variance / covariance

σij |K , σii |K and σjj |K are components of the 2× 2 partialcovariance matrix

Σ11|2 =

[s2ii |K sij |Ksij |K s2jj |K

]

Σ11|2 is defined as

Σ11|2 = Σ11 − Σ12Σ−122 Σ21

The matrices Σ11, Σ12, Σ22 and Σ21 are components of thepartitioned covariance matrix

Cov(W) =

[Σ11 Σ12

Σ21 Σ22

]of all involved variables partitioned as W = [W1,W2]′, andW1 = [Xi ,Xj ]

′, W2 = XK .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 109: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of Partial Correlation

How to select the variables (nodes), to which the correlationbetween Xi and Xj is to be conditioned on?

How many, that is what is m?

Which m variables from a total of N − 2 variables?

Let us suppose we have decided the set of variables to condition onXK = {Xk1 , . . . ,Xkm}.To assess for a “non-trivial” link, test for significance of ρij |K :H0 : ρij |K = 0, H1 : ρij |K 6= 0.

The estimate of ρij |K is the sample partial correlation rij |K .Given n observations xi = [xi1, . . . , xin]′ for each variable Xi , rij |K iscomputationally derived in these steps:

1 Compute the residuals ei of multiple linear regression of Xi onXK .

2 Similarly, compute the residuals ej of Xj on XK .

3 rij |K = rei ,ej , the correlation coefficient of ei and ej .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 110: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of Partial Correlation

How to select the variables (nodes), to which the correlationbetween Xi and Xj is to be conditioned on?

How many, that is what is m?

Which m variables from a total of N − 2 variables?

Let us suppose we have decided the set of variables to condition onXK = {Xk1 , . . . ,Xkm}.To assess for a “non-trivial” link, test for significance of ρij |K :H0 : ρij |K = 0, H1 : ρij |K 6= 0.

The estimate of ρij |K is the sample partial correlation rij |K .Given n observations xi = [xi1, . . . , xin]′ for each variable Xi , rij |K iscomputationally derived in these steps:

1 Compute the residuals ei of multiple linear regression of Xi onXK .

2 Similarly, compute the residuals ej of Xj on XK .

3 rij |K = rei ,ej , the correlation coefficient of ei and ej .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 111: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of Partial Correlation

How to select the variables (nodes), to which the correlationbetween Xi and Xj is to be conditioned on?

How many, that is what is m?

Which m variables from a total of N − 2 variables?

Let us suppose we have decided the set of variables to condition onXK = {Xk1 , . . . ,Xkm}.To assess for a “non-trivial” link, test for significance of ρij |K :H0 : ρij |K = 0, H1 : ρij |K 6= 0.

The estimate of ρij |K is the sample partial correlation rij |K .Given n observations xi = [xi1, . . . , xin]′ for each variable Xi , rij |K iscomputationally derived in these steps:

1 Compute the residuals ei of multiple linear regression of Xi onXK .

2 Similarly, compute the residuals ej of Xj on XK .

3 rij |K = rei ,ej , the correlation coefficient of ei and ej .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 112: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of Partial Correlation

How to select the variables (nodes), to which the correlationbetween Xi and Xj is to be conditioned on?

How many, that is what is m?

Which m variables from a total of N − 2 variables?

Let us suppose we have decided the set of variables to condition onXK = {Xk1 , . . . ,Xkm}.

To assess for a “non-trivial” link, test for significance of ρij |K :H0 : ρij |K = 0, H1 : ρij |K 6= 0.

The estimate of ρij |K is the sample partial correlation rij |K .Given n observations xi = [xi1, . . . , xin]′ for each variable Xi , rij |K iscomputationally derived in these steps:

1 Compute the residuals ei of multiple linear regression of Xi onXK .

2 Similarly, compute the residuals ej of Xj on XK .

3 rij |K = rei ,ej , the correlation coefficient of ei and ej .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 113: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of Partial Correlation

How to select the variables (nodes), to which the correlationbetween Xi and Xj is to be conditioned on?

How many, that is what is m?

Which m variables from a total of N − 2 variables?

Let us suppose we have decided the set of variables to condition onXK = {Xk1 , . . . ,Xkm}.To assess for a “non-trivial” link, test for significance of ρij |K :H0 : ρij |K = 0, H1 : ρij |K 6= 0.

The estimate of ρij |K is the sample partial correlation rij |K .Given n observations xi = [xi1, . . . , xin]′ for each variable Xi , rij |K iscomputationally derived in these steps:

1 Compute the residuals ei of multiple linear regression of Xi onXK .

2 Similarly, compute the residuals ej of Xj on XK .

3 rij |K = rei ,ej , the correlation coefficient of ei and ej .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 114: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of Partial Correlation

How to select the variables (nodes), to which the correlationbetween Xi and Xj is to be conditioned on?

How many, that is what is m?

Which m variables from a total of N − 2 variables?

Let us suppose we have decided the set of variables to condition onXK = {Xk1 , . . . ,Xkm}.To assess for a “non-trivial” link, test for significance of ρij |K :H0 : ρij |K = 0, H1 : ρij |K 6= 0.

The estimate of ρij |K is the sample partial correlation rij |K .

Given n observations xi = [xi1, . . . , xin]′ for each variable Xi , rij |K iscomputationally derived in these steps:

1 Compute the residuals ei of multiple linear regression of Xi onXK .

2 Similarly, compute the residuals ej of Xj on XK .

3 rij |K = rei ,ej , the correlation coefficient of ei and ej .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 115: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of Partial Correlation

How to select the variables (nodes), to which the correlationbetween Xi and Xj is to be conditioned on?

How many, that is what is m?

Which m variables from a total of N − 2 variables?

Let us suppose we have decided the set of variables to condition onXK = {Xk1 , . . . ,Xkm}.To assess for a “non-trivial” link, test for significance of ρij |K :H0 : ρij |K = 0, H1 : ρij |K 6= 0.

The estimate of ρij |K is the sample partial correlation rij |K .Given n observations xi = [xi1, . . . , xin]′ for each variable Xi , rij |K iscomputationally derived in these steps:

1 Compute the residuals ei of multiple linear regression of Xi onXK .

2 Similarly, compute the residuals ej of Xj on XK .

3 rij |K = rei ,ej , the correlation coefficient of ei and ej .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 116: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of Partial Correlation

How to select the variables (nodes), to which the correlationbetween Xi and Xj is to be conditioned on?

How many, that is what is m?

Which m variables from a total of N − 2 variables?

Let us suppose we have decided the set of variables to condition onXK = {Xk1 , . . . ,Xkm}.To assess for a “non-trivial” link, test for significance of ρij |K :H0 : ρij |K = 0, H1 : ρij |K 6= 0.

The estimate of ρij |K is the sample partial correlation rij |K .Given n observations xi = [xi1, . . . , xin]′ for each variable Xi , rij |K iscomputationally derived in these steps:

1 Compute the residuals ei of multiple linear regression of Xi onXK .

2 Similarly, compute the residuals ej of Xj on XK .

3 rij |K = rei ,ej , the correlation coefficient of ei and ej .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 117: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of Partial Correlation

How to select the variables (nodes), to which the correlationbetween Xi and Xj is to be conditioned on?

How many, that is what is m?

Which m variables from a total of N − 2 variables?

Let us suppose we have decided the set of variables to condition onXK = {Xk1 , . . . ,Xkm}.To assess for a “non-trivial” link, test for significance of ρij |K :H0 : ρij |K = 0, H1 : ρij |K 6= 0.

The estimate of ρij |K is the sample partial correlation rij |K .Given n observations xi = [xi1, . . . , xin]′ for each variable Xi , rij |K iscomputationally derived in these steps:

1 Compute the residuals ei of multiple linear regression of Xi onXK .

2 Similarly, compute the residuals ej of Xj on XK .

3 rij |K = rei ,ej , the correlation coefficient of ei and ej .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 118: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Significance of Partial Correlation

How to select the variables (nodes), to which the correlationbetween Xi and Xj is to be conditioned on?

How many, that is what is m?

Which m variables from a total of N − 2 variables?

Let us suppose we have decided the set of variables to condition onXK = {Xk1 , . . . ,Xkm}.To assess for a “non-trivial” link, test for significance of ρij |K :H0 : ρij |K = 0, H1 : ρij |K 6= 0.

The estimate of ρij |K is the sample partial correlation rij |K .Given n observations xi = [xi1, . . . , xin]′ for each variable Xi , rij |K iscomputationally derived in these steps:

1 Compute the residuals ei of multiple linear regression of Xi onXK .

2 Similarly, compute the residuals ej of Xj on XK .

3 rij |K = rei ,ej , the correlation coefficient of ei and ej .

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 119: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Partial correlation for the three genes: lrp, aroG, tyrR, and 41experiments.

gene pair rij rij |k z-statistic (p-value) rank (p-value)

lrp-aroG 0.78 0.77 6.25 (0.0000) 1001 (0.0013)lrp-tyrR -0.36 -0.32 -2.04 (0.0411) 23 (0.0453)aroG-tyrR -0.21 0.13 0.77 (0.4421) 765 (0.4727)

Only the partial correlation of aroG-tyrR is substantially differentfrom the correlation coefficient.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 120: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Partial correlation for the three genes: lrp, aroG, tyrR, and 41experiments.

gene pair rij rij |k z-statistic (p-value) rank (p-value)

lrp-aroG 0.78 0.77 6.25 (0.0000) 1001 (0.0013)lrp-tyrR -0.36 -0.32 -2.04 (0.0411) 23 (0.0453)aroG-tyrR -0.21 0.13 0.77 (0.4421) 765 (0.4727)

Only the partial correlation of aroG-tyrR is substantially differentfrom the correlation coefficient.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 121: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Partial correlation for the three genes: lrp, aroG, tyrR, and 41experiments.

gene pair rij rij |k z-statistic (p-value) rank (p-value)

lrp-aroG 0.78 0.77 6.25 (0.0000) 1001 (0.0013)lrp-tyrR -0.36 -0.32 -2.04 (0.0411) 23 (0.0453)aroG-tyrR -0.21 0.13 0.77 (0.4421) 765 (0.4727)

Only the partial correlation of aroG-tyrR is substantially differentfrom the correlation coefficient.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 122: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Gene Regulation (continuing)

Partial correlation for the three genes: lrp, aroG, tyrR, and 41experiments.

gene pair rij rij |k z-statistic (p-value) rank (p-value)

lrp-aroG 0.78 0.77 6.25 (0.0000) 1001 (0.0013)lrp-tyrR -0.36 -0.32 -2.04 (0.0411) 23 (0.0453)aroG-tyrR -0.21 0.13 0.77 (0.4421) 765 (0.4727)

Only the partial correlation of aroG-tyrR is substantially differentfrom the correlation coefficient.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 123: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 3: Micro-array Data

Do the same as in Exercise 2 but using the partial correlation assimilarity matrix.

matlab: for the partial correlation you may use the function parcorr (in the

Econometrics toolbox)

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 124: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Exercise 3: Micro-array Data

Do the same as in Exercise 2 but using the partial correlation assimilarity matrix.

matlab: for the partial correlation you may use the function parcorr (in the

Econometrics toolbox)

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 125: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation Network and Time Series

So far, the n measurements of attribute Xi are independent.

Further, we suppose that the n measurements may be ordered,typically being time dependent.

The vector xi = [xi1, . . . , xin]′ denotes a time series of Xi .

A similarity measure sim(i , j) quantifies the level of

correlation or coupling between Xi and Xj (undirected link)

causality from Xi and Xj , and vice versa (directed link).

A standard similarity measure is again Corr(Xi ,Xj) = rXi ,Yj.

Others ???

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 126: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation Network and Time Series

So far, the n measurements of attribute Xi are independent.

Further, we suppose that the n measurements may be ordered,typically being time dependent.

The vector xi = [xi1, . . . , xin]′ denotes a time series of Xi .

A similarity measure sim(i , j) quantifies the level of

correlation or coupling between Xi and Xj (undirected link)

causality from Xi and Xj , and vice versa (directed link).

A standard similarity measure is again Corr(Xi ,Xj) = rXi ,Yj.

Others ???

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 127: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation Network and Time Series

So far, the n measurements of attribute Xi are independent.

Further, we suppose that the n measurements may be ordered,typically being time dependent.

The vector xi = [xi1, . . . , xin]′ denotes a time series of Xi .

A similarity measure sim(i , j) quantifies the level of

correlation or coupling between Xi and Xj (undirected link)

causality from Xi and Xj , and vice versa (directed link).

A standard similarity measure is again Corr(Xi ,Xj) = rXi ,Yj.

Others ???

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 128: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation Network and Time Series

So far, the n measurements of attribute Xi are independent.

Further, we suppose that the n measurements may be ordered,typically being time dependent.

The vector xi = [xi1, . . . , xin]′ denotes a time series of Xi .

A similarity measure sim(i , j) quantifies the level of

correlation or coupling between Xi and Xj (undirected link)

causality from Xi and Xj , and vice versa (directed link).

A standard similarity measure is again Corr(Xi ,Xj) = rXi ,Yj.

Others ???

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 129: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation Network and Time Series

So far, the n measurements of attribute Xi are independent.

Further, we suppose that the n measurements may be ordered,typically being time dependent.

The vector xi = [xi1, . . . , xin]′ denotes a time series of Xi .

A similarity measure sim(i , j) quantifies the level of

correlation or coupling between Xi and Xj (undirected link)

causality from Xi and Xj , and vice versa (directed link).

A standard similarity measure is again Corr(Xi ,Xj) = rXi ,Yj.

Others ???

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 130: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation Network and Time Series

So far, the n measurements of attribute Xi are independent.

Further, we suppose that the n measurements may be ordered,typically being time dependent.

The vector xi = [xi1, . . . , xin]′ denotes a time series of Xi .

A similarity measure sim(i , j) quantifies the level of

correlation or coupling between Xi and Xj (undirected link)

causality from Xi and Xj , and vice versa (directed link).

A standard similarity measure is again Corr(Xi ,Xj) = rXi ,Yj.

Others ???

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 131: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Correlation Network and Time Series

So far, the n measurements of attribute Xi are independent.

Further, we suppose that the n measurements may be ordered,typically being time dependent.

The vector xi = [xi1, . . . , xin]′ denotes a time series of Xi .

A similarity measure sim(i , j) quantifies the level of

correlation or coupling between Xi and Xj (undirected link)

causality from Xi and Xj , and vice versa (directed link).

A standard similarity measure is again Corr(Xi ,Xj) = rXi ,Yj.

Others ???

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 132: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: World financial markets

N = 8 world stock markets, daily indices, n = 100 days.

Similar indices, links among world stock markets?

Can we use the same similarity measure as for time-independentobservations?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 133: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: World financial markets

N = 8 world stock markets, daily indices, n = 100 days.

Similar indices, links among world stock markets?

Can we use the same similarity measure as for time-independentobservations?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 134: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: World financial markets

N = 8 world stock markets, daily indices, n = 100 days.

Similar indices, links among world stock markets?

Can we use the same similarity measure as for time-independentobservations?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 135: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: World financial markets

N = 8 world stock markets, daily indices, n = 100 days.

Similar indices, links among world stock markets?

Can we use the same similarity measure as for time-independentobservations?

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 136: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: World financial markets, correlation coefficient

Upper triangular: sample correlation coefficient rij .Lower triangular: p-value for significance test for ρij (z-statistic)

USA AUS UK GER GRE MAL SAF CROUSA 0.86 0.92 0.88 0.89 0.33 0.27 0.75AUS 0 0.91 0.82 0.90 0.56 0.27 0.83UK 0 0 0.88 0.92 0.40 0.31 0.74GER 0 0 0 0.84 0.44 0.53 0.61GRE 0 0 0 0 0.40 0.16 0.82MAL 0.0008 0 0 0 0 0.54 0.38SAF 0.0057 0.0065 0.0017 0 0.1154 0 -0.15CRO 0 0 0 0 0 0.0001 0.1408

Almost all indices are strongly correlated.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 137: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: World financial markets, correlation coefficient

Upper triangular: sample correlation coefficient rij .Lower triangular: p-value for significance test for ρij (z-statistic)

USA AUS UK GER GRE MAL SAF CROUSA 0.86 0.92 0.88 0.89 0.33 0.27 0.75AUS 0 0.91 0.82 0.90 0.56 0.27 0.83UK 0 0 0.88 0.92 0.40 0.31 0.74GER 0 0 0 0.84 0.44 0.53 0.61GRE 0 0 0 0 0.40 0.16 0.82MAL 0.0008 0 0 0 0 0.54 0.38SAF 0.0057 0.0065 0.0017 0 0.1154 0 -0.15CRO 0 0 0 0 0 0.0001 0.1408

Almost all indices are strongly correlated.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 138: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: World financial markets, correlation networkAdjacency matrix, threshold at α = 0.01 (multiple testing?)

USA AUS UK GER GRE MAL SAF CRO

USA 0 1 1 1 1 1 1 1AUS 1 0 1 1 1 1 1 1UK 1 1 0 1 1 1 1 1GER 1 1 1 0 1 1 1 1GRE 1 1 1 1 0 1 0 1MAL 1 1 1 1 1 0 1 1SAF 1 1 1 1 0 1 0 0CRO 1 1 1 1 1 1 0 0

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 139: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: World financial markets, correlation networkAdjacency matrix, threshold at α = 0.01 (multiple testing?)

USA AUS UK GER GRE MAL SAF CRO

USA 0 1 1 1 1 1 1 1AUS 1 0 1 1 1 1 1 1UK 1 1 0 1 1 1 1 1GER 1 1 1 0 1 1 1 1GRE 1 1 1 1 0 1 0 1MAL 1 1 1 1 1 0 1 1SAF 1 1 1 1 0 1 0 0CRO 1 1 1 1 1 1 0 0

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 140: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: World financial markets, partial correlation

Upper triangular: partial correlation rij |K , conditioned on all|K | = 6 rest variables.Lower triangular: p-value for significance test for ρij |K (z-statistic)

USA AUS UK GER GRE MAL SAF CROUSA 0.01 0.37 0.27 0.07 -0.27 0.11 0.27AUS 0.9378 0.42 -0.02 0.15 0.30 0.10 0.38UK 0.0002 0 0.08 0.36 -0.16 0.08 -0.11GER 0.0081 0.8469 0.4693 0.38 -0.31 0.66 0.26GRE 0.4946 0.1392 0.0003 0.0001 0.19 -0.36 0.01MAL 0.0083 0.0033 0.1232 0.0026 0.0710 0.68 0.46SAF 0.2908 0.3554 0.4321 0 0.0003 0 -0.70CRO 0.0079 0.0002 0.3149 0.0099 0.9083 0 0

Correlation between any two indices decreased when conditionedon all others.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 141: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: World financial markets, partial correlation

Upper triangular: partial correlation rij |K , conditioned on all|K | = 6 rest variables.Lower triangular: p-value for significance test for ρij |K (z-statistic)

USA AUS UK GER GRE MAL SAF CROUSA 0.01 0.37 0.27 0.07 -0.27 0.11 0.27AUS 0.9378 0.42 -0.02 0.15 0.30 0.10 0.38UK 0.0002 0 0.08 0.36 -0.16 0.08 -0.11GER 0.0081 0.8469 0.4693 0.38 -0.31 0.66 0.26GRE 0.4946 0.1392 0.0003 0.0001 0.19 -0.36 0.01MAL 0.0083 0.0033 0.1232 0.0026 0.0710 0.68 0.46SAF 0.2908 0.3554 0.4321 0 0.0003 0 -0.70CRO 0.0079 0.0002 0.3149 0.0099 0.9083 0 0

Correlation between any two indices decreased when conditionedon all others.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 142: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Financial markets, partial correlation network

Adjacency matrix, threshold at α = 0.01USA AUS UK GER GRE MAL SAF CRO

USA 0 0 1 1 0 1 0 1AUS 0 0 1 0 0 1 0 1UK 1 1 0 0 1 0 0 0GER 1 0 1 0 1 1 1 1GRE 0 0 1 1 0 0 1 0MAL 1 1 1 1 0 0 1 1SAF 0 0 1 1 1 1 0 1CRO 1 1 1 1 0 1 1 0

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 143: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Example: Financial markets, partial correlation network

Adjacency matrix, threshold at α = 0.01USA AUS UK GER GRE MAL SAF CRO

USA 0 0 1 1 0 1 0 1AUS 0 0 1 0 0 1 0 1UK 1 1 0 0 1 0 0 0GER 1 0 1 0 1 1 1 1GRE 0 0 1 1 0 0 1 0MAL 1 1 1 1 0 0 1 1SAF 0 0 1 1 1 1 0 1CRO 1 1 1 1 0 1 1 0

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series

Page 144: Statistical Analysis of Networks - Networks, …users.auth.gr/dkugiu/Teach/Networks/KugiumtzisLecture1.pdfDimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation

Literature

[1] Data sets for pajek software,http://vlado.fmf.uni-lj.si/pub/networks/data.

[2] Kolaczyk ED (2009) Statistical Analysis of Network Data,Springer.

Dimitris Kugiumtzis Statistical Analysis of Networks -Networks, correlation and time series