R meetup-2016-02-09-pc

Uncovering political connections of firms using machine learning methods

BURN meetup, 9th February 2016

János Divényi (@janosdivenyi), Jenő Pál (@paljenczy)

CEU Microdata

Ádám Szeidl, Miklós Koren

Political connections and favoritism in Hungary

Political connections matter

Guess the color: left or right

Altus Zrt. Fittelina Kft. Mahír Zrt. Közgép Zrt.

How to automate the process for each firm?

Framework

Information and decision rule

Information

Firm register (~5M rows) and election data (~350k rows)

data.table
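The slide names data.table as the tool for combining the two sources. The sketch below is a rough, language-neutral illustration and not the authors' R code. It hash-joins a toy firm register to toy election records on a normalized name. The field names (`firm`, `person`, `candidate`, `side`, `year`), the normalization and the example people are all hypothetical. In R, a data.table join on the name column plays the same role.

```python
from collections import defaultdict

def normalize(name):
    # Hypothetical normalization: case-fold and collapse repeated whitespace.
    return " ".join(name.casefold().split())

def join_on_name(firm_register, election_data):
    """Inner join of firm-register rows with election rows on normalized name.

    Each firm person is paired with every candidate sharing the name, so
    frequent names multiply the number of output rows.
    """
    index = defaultdict(list)  # normalized name -> election rows
    for row in election_data:
        index[normalize(row["candidate"])].append(row)
    matches = []
    for row in firm_register:
        for cand in index.get(normalize(row["person"]), []):
            matches.append({"firm": row["firm"], "person": row["person"],
                            "side": cand["side"], "year": cand["year"]})
    return matches

# Toy data with made-up names (hypothetical schema).
firm_register = [
    {"firm": "firm1", "person": "Péter Nagy"},
    {"firm": "firm1", "person": "Anna Kovács"},
    {"firm": "firm2", "person": "János Tóth"},
]
election_data = [
    {"candidate": "Péter  Nagy", "side": "right", "year": 2010},
    {"candidate": "péter nagy", "side": "left", "year": 1998},
    {"candidate": "János Tóth", "side": "left", "year": 2006},
]
matches = join_on_name(firm_register, election_data)
```

Here `matches` has three rows. The single firm person Péter Nagy matches two different candidates, which shows how a plain name join can pair one person with several politicians.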

Decision rule

~1B rows

The firm is right if there are more right than left politicians in the firm
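A minimal sketch of this majority rule follows, assuming the matches have already been reduced to one `(firm, side)` pair per matched politician. The slide states only the condition for "right". Returning "left" for the reverse case and `None` for a tie are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def color_by_majority(sides):
    """'right' if more matched politicians are right than left.

    The source states only the 'right' condition; returning 'left' for the
    reverse and None for a tie are assumptions.
    """
    counts = Counter(sides)
    if counts["right"] > counts["left"]:
        return "right"
    if counts["left"] > counts["right"]:
        return "left"
    return None

def color_firms(matches):
    # matches: iterable of (firm, side) pairs, one per matched politician
    sides_by_firm = defaultdict(list)
    for firm, side in matches:
        sides_by_firm[firm].append(side)
    return {firm: color_by_majority(sides) for firm, sides in sides_by_firm.items()}

colors = color_firms([
    ("firm1", "right"), ("firm1", "right"), ("firm1", "left"),
    ("firm2", "left"),
    ("firm3", "left"), ("firm3", "right"),
])
# colors == {"firm1": "right", "firm2": "left", "firm3": None}
```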

Guess the color: left or right

Altus Zrt. Fittelina Kft. Mahír Zrt. Közgép Zrt.

[Figure: grid of 50 sample firms, firm1 to firm50, to be colored left or right]

Ferenc Gyurcsány, PM of the left coalition, 2004-2009

Ferenc Gyurcsány, local representative at Nyíregyháza, 1998

Framework

Information and decision rule

Improve data

What is the chance that a person in the firm register and a politician are the same person?

Probabilistic coloring (example: 69% left, 31% other)

Decision rule

~1B rows

The firm is right if the average right probability is larger than the average left probability
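The probabilistic rule can be sketched the same way. This version assumes each firm person already carries a probability of being a left politician and a probability of being a right politician, with the remainder counted as "other". The slide's example person is 69% left and 31% other. The input layout and the handling of the non-right cases are assumptions.

```python
from collections import defaultdict

def color_by_average_probability(people):
    """people: iterable of (firm, p_left, p_right), one per firm person.

    A firm is 'right' if the average right probability exceeds the average
    left probability. Returning 'left' for the reverse and None for a tie
    are assumptions.
    """
    probs = defaultdict(list)
    for firm, p_left, p_right in people:
        probs[firm].append((p_left, p_right))
    colors = {}
    for firm, rows in probs.items():
        avg_left = sum(p for p, _ in rows) / len(rows)
        avg_right = sum(p for _, p in rows) / len(rows)
        if avg_right > avg_left:
            colors[firm] = "right"
        elif avg_left > avg_right:
            colors[firm] = "left"
        else:
            colors[firm] = None
    return colors

colors = color_by_average_probability([
    ("firm1", 0.69, 0.0),  # the slide's example person: 69% left, 31% other
    ("firm1", 0.1, 0.5),
    ("firm2", 0.0, 0.8),
    ("firm2", 0.2, 0.3),
])
# colors == {"firm1": "left", "firm2": "right"}
```

Unlike the counting rule, an uncertain match now contributes only in proportion to its probability, so a likely namesake no longer counts as a full politician.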

Guess the color: left or right

Altus Zrt. Fittelina Kft. Mahír Zrt. Közgép Zrt.

[Figure: grid of 50 sample firms, firm1 to firm50, to be colored left or right]

Framework

Information and decision rule

Improve information

Links: common ownership or location

Oligarchopedia

igraph
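The talk uses igraph for links such as common ownership. The sketch below is not igraph's API. It is a stdlib illustration of one natural reading of "links": two firms are connected when they share an owner, and a breadth-first search finds the resulting groups of linked firms. In R, igraph's `components()` computes such groups. The owner and firm identifiers are hypothetical, and links through shared location are left out.

```python
from collections import defaultdict, deque

def link_firms(ownership):
    """ownership: iterable of (owner, firm) pairs.

    Returns an adjacency map in which two firms are linked when they share
    at least one owner. Firms with no link still appear, with an empty set.
    """
    firms_by_owner = defaultdict(set)
    for owner, firm in ownership:
        firms_by_owner[owner].add(firm)
    adj = defaultdict(set)
    for firms in firms_by_owner.values():
        for firm in firms:
            adj[firm] |= firms - {firm}
    return adj

def connected_groups(adj):
    """Breadth-first search for connected components (groups of linked firms)."""
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            firm = queue.popleft()
            group.append(firm)
            for other in adj[firm]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        groups.append(sorted(group))
    return groups

ownership = [("owner1", "firm1"), ("owner1", "firm2"),
             ("owner2", "firm2"), ("owner2", "firm3"),
             ("owner3", "firm4")]
groups = connected_groups(link_firms(ownership))
# groups == [["firm1", "firm2", "firm3"], ["firm4"]]
```

Firm1 and firm3 share no owner but end up in one group through firm2. Such indirect links are what a graph representation adds beyond a single join.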

Improve decision rule

Use machine learning instead of ad hoc algorithms; this requires training data

caret (Classification And REgression Training)

one interface to many algorithms

streamlines the process of machine learning

parallel computation with reproducibility

doParallel

The train function
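caret's `train()` evaluates each candidate tuning value on resamples of the training data and keeps the best one. The Python sketch below is not caret. It shows that resample-and-tune loop with a k-nearest-neighbour classifier whose `k` is chosen by 5-fold cross-validated accuracy. The features, labels and candidate grid are synthetic and hypothetical. The talk does not say which algorithms the authors trained.

```python
import random

def knn_predict(train, x, k):
    """Majority vote of the k nearest training rows (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))[:k]
    votes = sum(1 if label == "right" else -1 for _, label in nearest)
    return "right" if votes > 0 else "left"

def cv_accuracy(data, k, folds=5, seed=1):
    """k-fold cross-validated accuracy; a fixed seed gives every candidate the same folds."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    correct = 0
    for i in range(folds):
        held_out = rows[i::folds]
        train = [r for j, r in enumerate(rows) if j % folds != i]
        correct += sum(knn_predict(train, x, k) == y for x, y in held_out)
    return correct / len(rows)

def tune(data, grid=(1, 3, 5), folds=5, seed=1):
    """Score each candidate k on the same resamples and keep the best one."""
    scores = {k: cv_accuracy(data, k, folds, seed) for k in grid}
    best = max(grid, key=lambda k: scores[k])
    return best, scores

# Synthetic, hypothetical features: two well-separated clusters of firms.
rng = random.Random(0)
data = ([((rng.gauss(0, 0.1), rng.gauss(0, 0.1)), "left") for _ in range(20)]
        + [((rng.gauss(1, 0.1), rng.gauss(1, 0.1)), "right") for _ in range(20)])
best_k, scores = tune(data)
```

Reusing one seed means every candidate `k` is scored on identical folds, so differences in accuracy reflect the model and not the resampling. The grid uses only odd values of `k` so that the vote cannot tie.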

Parallel computation

Seeds for parallel stochastic models
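With doParallel registered as a backend, caret can run resamples in parallel, and `trainControl()` accepts a `seeds` argument so that stochastic models stay reproducible. The underlying idea can be sketched in Python, though this is not the R mechanism. Fix one seed per task before dispatch and give each task its own random generator, so results do not depend on how many workers run or in what order. The `stochastic_fit` task is a hypothetical stand-in for a model fit. Threads are used only to keep the example self-contained.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def stochastic_fit(task_id, seed):
    """Stand-in for one stochastic model fit: it draws only from its own seeded RNG."""
    rng = random.Random(seed)
    draws = [rng.random() for _ in range(1000)]
    return task_id, sum(draws) / len(draws)

def run_parallel(n_tasks, master_seed, workers):
    """Fix one seed per task up front, then run the tasks on a worker pool."""
    seeds = random.Random(master_seed).sample(range(10**6), n_tasks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(stochastic_fit, range(n_tasks), seeds))

serial = run_parallel(8, master_seed=42, workers=1)
parallel = run_parallel(8, master_seed=42, workers=4)
# serial == parallel: results do not depend on the number of workers
```

Sharing one global generator across workers would make each task's draws depend on scheduling. Per-task seeds remove that dependence.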

Guess the color: left or right

[Figure: grid of 50 sample firms, firm1 to firm50, to be colored left or right]

Takeaways

An iterative process involving data manipulation, visualization, modelling, etc.

data.table

igraph

ggplot2

caret

ROCR

doParallel

Miklós Koren, Ádám Szeidl, Márta Bisztray, Anna Csonka, Krisztián Fekete, Attila Gáspár, Dániel Molnár, Gábor Nyéki, Krisztina Orbán, Rita Pető, Balázs Reizer, Mátyás Steiner, Bálint Szilágyi, Ferenc Szűcs, András Vereckei, Zsófia Kőműves, Olivér Kiss, Dániel Pass, Dávid Popper and others...

Thanks for your attention