The Application of Data Mining Methods In Monitoring of Ecosystems Jiri BILA and Jakub JURA...



Page 1: The Application of Data Mining Methods In Monitoring of Ecosystems Jiri BILA and Jakub JURA Jiri.Bila@fs.cvut.cz Jakub.Jura@fs.cvut.cz Department of Instrumentation.

The Application of Data Mining Methods in Monitoring of Ecosystems

Jiri BILA and Jakub JURA

Jiri.Bila@fs.cvut.cz Jakub.Jura@fs.cvut.cz

Department of Instrumentation and Control Engineering, Faculty of Mechanical Engineering, CTU in Prague, Technická 4, 166 07 Prague 6


Monitoring of Ecosystems

• 11 measuring stations
• 13 variables
• Sampling period: 6 minutes


Database system for monitoring

Class diagram of the database system (classes with their attributes and operations):

Central module (CM)
– Attributes: Data Batch F-M : TBatch; Control : TControl; Segment MSi : TSEGMSi; Graph MSi : TGraph_MSi; URCM : TUR_CM
– Operations: Transfer of control into TBatch(); Transfer of control into TControl(); Transfer of control into TGraphMSi(); Transfer of control into TSEGMSi(); Transfer of control into TUR_CM(); Export_of_data()

TUR_MS
– Attributes: X : Rainfall; X : Humidity 2m
– Operations: Realise operation (X)(); Set operation (X)()

TMS
– Attributes: Rainfall : float (mm/m2); Humidity_2m : float (%); Temperature_2m : float (°C); Humidity_30cm : float (%); Temperature_30cm : float (°C); GR_Incidence : float (W/m2); GR_Reflection : float (W/m2); Earth_Humidity : float (%); Earth_temperature1 : float (°C); Earth_temperature2 : float (°C); Earth_temperature3 : float (°C); Wind speed : float (m/s); Wind direction : integer (°); Tension_aku : float (V); URMS : TUR_MS
– Operations: Test of data correctness(); Data fix(); Display variable's value(); Draw graph(); Archiving of values ki/pi(); Take over control from CM(); Transfer of control into CM()

Class of Measuring Stations
– MS1 : TMS; MS2 : TMS; MS3 : TMS; MS4 : TMS; MS5 : TMS; MS6 : TMS; MS7 : TMS; MS8 : TMS; MS9 : TMS; MS10 : TMS; MS11 : TMS

TControl
– Operations: Selection MSi(); Transfer of control into MSi(); Transfer of control into CM()

TSEGMSi
– Operations: Compare_segments of database(); Transfer_segments of database()

TGraph_MSi
– Operations: Draw the compare graph()

TUR_CM
– Operations: Open the activity DBS(); Terminate the activity DBS(); Transfer of control to subclasses()

TUR_PR
– Operations: Set the operation (Y)(); Realise the operation (Y)()

Program interface
– Attributes: UR : TUR_PR
– Operations: Set the batch size(); Transfer the data batch(); Test of batch correctness(); Clean up data space(); Synchronisation of the data transfer()

TBatch
– Operations: Prepare transfer(); Control transfer(); Terminate transfer()
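The TMS class in the diagram above can be sketched as a Python dataclass, which makes the record layout of one measuring station explicit. This is a minimal illustration under assumptions: field names follow the TMS attributes in snake_case, the method `is_plausible` is a hypothetical stand-in for "Test of data correctness()" (the actual correctness rules are not given), and its limits are only physical bounds (humidity 0 to 100 %, wind direction 0 to 360°, non-negative rainfall, radiation and wind speed).

```python
from dataclasses import dataclass


@dataclass
class TMS:
    """One record of a measuring station; fields follow the TMS class."""
    rainfall: float            # mm/m2
    humidity_2m: float         # %
    temperature_2m: float      # °C
    humidity_30cm: float       # %
    temperature_30cm: float    # °C
    gr_incidence: float        # W/m2, incident global radiation
    gr_reflection: float       # W/m2, reflected global radiation
    earth_humidity: float      # %
    earth_temperature1: float  # °C
    earth_temperature2: float  # °C
    earth_temperature3: float  # °C
    wind_speed: float          # m/s
    wind_direction: int        # degrees
    tension_aku: float = float("nan")  # V; absent from the exported sample

    def is_plausible(self) -> bool:
        """Hypothetical stand-in for 'Test of data correctness()'."""
        humidities = (self.humidity_2m, self.humidity_30cm, self.earth_humidity)
        return (
            self.rainfall >= 0
            and all(0 <= h <= 100 for h in humidities)
            and self.gr_incidence >= 0
            and self.gr_reflection >= 0
            and self.wind_speed >= 0
            and 0 <= self.wind_direction <= 360
        )


# Hypothetical registry of the eleven stations MS1..MS11 (latest record or None).
stations = {f"MS{i}": None for i in range(1, 12)}
```

Keeping the check as a method mirrors the diagram, where the correctness test is an operation of TMS rather than of the central module.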


Measuring station Domanín (sample of exported records)

Measurement time;Rainfall;Humidity 2 m;Temperature 2 m;Humidity 30 cm;Temperature 30 cm;GR incidence;GR reflection;Soil humidity;Soil temp. 1;Soil temp. 2;Soil temp. 3;Wind speed;Wind direction
dd.mm.yyyy hh:mm;mm;%;°C;%;°C;W/m2;W/m2;%;°C;°C;°C;m/s;°
2.12.2007 8:10;0;86,64;3,25;97,28;0,37;12,9;4;27,3;1,37;2,25;2,51;3,29;179;
2.12.2007 8:20;0;83;4,16;93,2;2,13;20,9;8,5;27,3;1,3;2,19;2,45;2,66;187;
2.12.2007 8:30;0;82,99;4,14;87,42;3,37;35,4;12,5;27,3;1,28;2,16;2,43;2,12;177;
2.12.2007 8:40;0;77,77;5,36;69,57;7,78;96,6;37,8;27,3;1,32;2,17;2,47;2,47;180;
2.12.2007 8:50;0;73,95;6,26;63,33;9,15;113,6;41;27,3;1,35;2,15;2,45;3,9;210;
2.12.2007 9:00;0;74,4;6,58;65,94;9,15;142,9;48,9;27,3;1,39;2,13;2,44;2,72;200;
2.12.2007 9:10;0;72,88;6,64;71,3;7,45;82,4;21,3;27,3;1,47;2,12;2,44;4,48;200;
2.12.2007 9:20;0;74,43;6,4;77,81;6,22;95,9;22,2;27,3;1,54;2,1;2,44;3,06;200;
2.12.2007 9:30;0;76,73;6,35;77,94;6,81;100,2;21;27,3;1,59;2,09;2,42;2,06;200;
2.12.2007 9:40;0;74,1;6,64;78,8;6,26;70,4;12,9;27,3;1,66;2,08;2,42;2,38;190;
2.12.2007 9:50;0;73,35;7,96;65,78;10,77;246,1;69;27,3;1,72;2,07;2,4;2,86;190;
2.12.2007 10:00;0;72,86;8,11;67,03;10,22;147,8;34,6;27,3;1,86;2,07;2,41;1,62;200;
2.12.2007 10:10;0;72,79;7,75;72,31;9,02;99,8;19,2;27,3;1,99;2,03;2,36;3,84;240;
2.12.2007 10:20;0;69,31;7,8;72,7;8,22;128,2;26,5;27,3;2,17;2,08;2,42;7,63;240;
2.12.2007 10:30;0;69,93;8,14;67,83;10,26;291,3;75,4;27,3;2,19;2,05;2,36;6,88;260;
2.12.2007 10:40;0;67,24;8,36;65,16;10,72;256,9;64,6;27,3;2,29;2,05;2,36;6,06;240;
2.12.2007 10:50;0;67,75;8,5;66,25;10,9;339,7;83,4;27,3;2,37;2,08;2,36;4,72;200;
2.12.2007 11:00;0;70,91;7,44;75,82;7,02;63,1;10,4;27,3;2,4;2,09;2,38;6,26;210;
2.12.2007 11:10;0;71,28;7,46;75,37;7,64;165,9;31,4;27,3;2,39;2,12;2,36;3,43;220;
2.12.2007 11:20;0;68,88;7,91;68,22;9,28;132,5;25,5;27,3;2,55;2,18;2,4;3,94;230;
2.12.2007 11:30;0;68,87;8,46;70,71;9,84;390,7;94,3;27,3;2,59;2,16;2,37;4,34;230;
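Turning such raw lines into numeric records is the first step of the database transformation the slides refer to. The sketch below assumes each exported line holds a `d.m.yyyy h:mm` timestamp followed by semicolon-separated values written with a decimal comma, in the column order of the station header, with a trailing semicolon; `parse_record` and `COLUMNS` are hypothetical names.

```python
from datetime import datetime

# Column order of the exported station records (assumed from the header).
COLUMNS = [
    "rainfall", "humidity_2m", "temperature_2m", "humidity_30cm",
    "temperature_30cm", "gr_incidence", "gr_reflection", "earth_humidity",
    "earth_temperature1", "earth_temperature2", "earth_temperature3",
    "wind_speed", "wind_direction",
]


def parse_record(line: str) -> tuple[datetime, dict[str, float]]:
    """Parse one 'd.m.yyyy h:mm;v1;v2;...;' line that uses decimal commas."""
    fields = line.strip().rstrip(";").split(";")
    timestamp = datetime.strptime(fields[0], "%d.%m.%Y %H:%M")
    # Replace the decimal comma by a point before converting to float.
    values = [float(f.replace(",", ".")) for f in fields[1:]]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
    return timestamp, dict(zip(COLUMNS, values))
```

Raising on a wrong field count, rather than silently padding, keeps truncated or corrupted lines out of the transformed data.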


Data Mining

• Knowledge discovery in databases is “the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” Fayyad (1996).


Data Mining Methods Used

• Conceptual Lattice
• Rough Sets


Conceptual Lattice

• Data mining context
• $C = (O, I, R)$

– O is a set of objects x
– I is a set of items (attributes) y
– R is a binary relation, $R \subseteq O \times I$


Conceptual Lattice

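• Conceptual lattice L
• Derived from the data mining context C
• $X = \{x \in O \mid \forall y \in Y: x \, R \, y\}$
• $Y = \{y \in I \mid \forall x \in X: x \, R \, y\}$

– $X \subseteq O$ is the largest set of objects related to every item in Y
– $Y \subseteq I$ is the largest set of items shared by every object in X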
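The two derivation rules above can be sketched in Python: a pair (X, Y) is a node of the lattice exactly when Y is the set of items shared by X and X is the set of objects having all items of Y. This is a brute-force sketch under assumptions: the context is stored as a dictionary from each object to its set of related items, every subset of items is closed in turn (practical only for small contexts), and the names `common_items`, `common_objects`, `concepts` and the toy context in the usage lines are hypothetical.

```python
from itertools import combinations

Context = dict[str, set[str]]  # object x -> set of items y with x R y


def common_items(ctx: Context, objects: set[str], all_items: set[str]) -> set[str]:
    """Y = {y in I | for all x in X: x R y}; the empty X gives all of I."""
    result = set(all_items)
    for x in objects:
        result &= ctx[x]
    return result


def common_objects(ctx: Context, items: set[str]) -> set[str]:
    """X = {x in O | for all y in Y: x R y}."""
    return {x for x, row in ctx.items() if items <= row}


def concepts(ctx: Context) -> list[tuple[frozenset[str], frozenset[str]]]:
    """All pairs (X, Y) satisfying both derivation rules, largest X first."""
    all_items = set().union(*ctx.values())
    found = set()
    for k in range(len(all_items) + 1):
        for subset in combinations(sorted(all_items), k):
            x = common_objects(ctx, set(subset))   # close the item subset...
            y = common_items(ctx, x, all_items)    # ...and derive its items back
            found.add((frozenset(x), frozenset(y)))
    return sorted(found, key=lambda c: (-len(c[0]), sorted(c[1])))
```

For the toy context `{"x1": {"a", "b"}, "x2": {"b", "c"}, "x3": {"b"}}` the function returns four pairs, from `({x1, x2, x3}, {b})` down to `(∅, {a, b, c})`.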


Conceptual Lattice

• Hasse diagram
• The Hasse diagram is constructed using the partial order $<$.

– An edge from $H_1$ to $H_2$ exists if $H_1 < H_2$ and there is no element $H_3$ satisfying $H_1 < H_3 < H_2$.

– $H_1$ is an antecedent of $H_2$ ($H_2$ is a descendant of $H_1$).

– Each pair (X, Y) represents one node of the Hasse diagram.
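The edge rule above is the covering relation of a partial order. A minimal sketch, assuming nodes are (X, Y) pairs and that $H_1 < H_2$ means the object set of $H_1$ is a proper subset of that of $H_2$; `less` and `hasse_edges` are hypothetical helpers, and the search over all triples is meant only for small lattices.

```python
Node = tuple[frozenset[str], frozenset[str]]  # (X, Y): objects, items


def less(h1: Node, h2: Node) -> bool:
    """H1 < H2 when the object set of H1 is a proper subset of that of H2."""
    return h1[0] < h2[0]


def hasse_edges(nodes: list[Node]) -> list[tuple[Node, Node]]:
    """Edge H1 -> H2 iff H1 < H2 and no H3 satisfies H1 < H3 < H2."""
    edges = []
    for h1 in nodes:
        for h2 in nodes:
            if less(h1, h2) and not any(
                less(h1, h3) and less(h3, h2) for h3 in nodes
            ):
                edges.append((h1, h2))
    return edges
```

Only covering pairs become edges; relations implied by transitivity (for example bottom below top through an intermediate node) are left out, which is what keeps a Hasse diagram readable.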


Database Transformation



Conceptual Lattice - Example

• $C = (\{A_0, A_1, A_2, A_3, A_4\}, \{3, 4, 7, 8, 9\}, R)$

• Where:
– C … context of data mining
– A0, A1, A2, A3, A4 … monitoring classes
– 3, 4, 7, 8, 9 … situations
– R … relation represented in table MG


Conceptual Lattice

MG   3  4  7  8  9
A0   1  1  1  1  1
A1   1  1  0  0  0
A2   0  1  1  1  1
A3   1  0  1  1  0
A4   1  1  1  0  1

Table MG, which represents relation R (rows: monitoring classes; columns: situations).


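Conceptual Lattice - Hasse Diagram

Hasse diagram of the lattice for table MG (figure). Nodes, written as (situations), monitoring classes, grouped by the number of monitoring classes:
– (3,4,7,8,9), A0
– (3,4), A0 A1; (4,7,8,9), A0 A2; (3,7,8), A0 A3; (3,4,7,9), A0 A4
– (4), A0 A1 A2; (3), A0 A1 A3; (3,4), A0 A1 A4; (7,8), A0 A2 A3; (4,7,9), A0 A2 A4; (3,7), A0 A3 A4
– (4), A0 A1 A2 A4; (3), A0 A1 A3 A4; (7), A0 A2 A3 A4
– (∅), A0 A1 A2 A3 A4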
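For table MG, the pairs (X, Y) satisfying both derivation rules of the conceptual lattice can be computed directly and compared with the nodes listed above. This sketch assumes situations are the objects and monitoring classes the items, as in the node notation (situations), classes; it uses the fact that every Y of such a pair is an intersection of table columns (or the full set of classes). `MG` and `mg_concepts` are hypothetical names.

```python
from itertools import combinations

# Table MG read column by column: situation -> monitoring classes related to it.
MG = {
    3: {"A0", "A1", "A3", "A4"},
    4: {"A0", "A1", "A2", "A4"},
    7: {"A0", "A2", "A3", "A4"},
    8: {"A0", "A2", "A3"},
    9: {"A0", "A2", "A4"},
}
ALL_CLASSES = frozenset().union(*MG.values())


def mg_concepts() -> list[tuple[tuple[int, ...], tuple[str, ...]]]:
    """(situations, classes) pairs, ordered by the number of classes."""
    intents = {ALL_CLASSES}  # classes shared by the empty set of situations
    for k in range(1, len(MG) + 1):
        for group in combinations(MG, k):
            intents.add(frozenset.intersection(*(frozenset(MG[s]) for s in group)))
    result = []
    for y in intents:
        x = sorted(s for s, row in MG.items() if y <= row)  # situations having all of Y
        result.append((tuple(x), tuple(sorted(y))))
    return sorted(result, key=lambda c: (len(c[1]), c[1]))
```

The first pair returned is `((3, 4, 7, 8, 9), ("A0",))` and the last is `((), ("A0", "A1", "A2", "A3", "A4"))`, matching the top and bottom nodes of the diagram.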


Conceptual Lattice

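• Support and confidence measure the reliability and validity of a rule.

• Support
– $\mathrm{supp}(A_i, S) = \mathrm{card}(\{s \in S \mid Ict(s, A_i)\}) / \mathrm{card}(S)$
– $\mathrm{Supp}(A_i \Rightarrow A_j, S) = \mathrm{supp}(A_i \wedge A_j, S)$

• Confidence
– $\mathrm{Conf}(A_i \Rightarrow A_j, S) = \mathrm{Supp}(A_i \Rightarrow A_j, S) / \mathrm{supp}(A_i, S)$

where S is the set of situations and $Ict(s, A_i)$ holds when situation s is related to monitoring class $A_i$.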


Rules derived from table MG:

No. i   Rule ri          Supp(ri)   Conf(ri)
1       A1 ⇒ A2          0.2        0.5
2       A1 ⇒ A3          0.2        0.5
3       A1 ⇒ A4          0.4        1
4       A2 ⇒ A3          0.4        0.5
5       A3 ⇒ A4          0.4        0.66
6       A1 ∧ A2 ⇒ A4     0.2        1
7       A2 ∧ A4 ⇒ A1     0.2        0.33
8       A2 ∧ A3 ⇒ A4     0.2        0.5
9       A2 ∧ A4 ⇒ A3     0.2        0.33
10      A1 ∧ A3 ⇒ A4     0.2        1
11      A3 ∧ A4 ⇒ A1     0.2        0.5
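The support and confidence formulas can be checked against the rules table with a short computation on table MG. A minimal sketch, assuming S is the set of the five situations and a rule's antecedent and consequent are sets of monitoring classes; `supp` and `conf` are hypothetical helpers, and exact fractions are used so the rounding in the table is visible.

```python
from fractions import Fraction

# Table MG: situation s -> monitoring classes A_i with Ict(s, A_i).
MG = {
    3: {"A0", "A1", "A3", "A4"},
    4: {"A0", "A1", "A2", "A4"},
    7: {"A0", "A2", "A3", "A4"},
    8: {"A0", "A2", "A3"},
    9: {"A0", "A2", "A4"},
}


def supp(classes: set[str]) -> Fraction:
    """Share of situations related to every class in `classes`."""
    hits = sum(1 for row in MG.values() if classes <= row)
    return Fraction(hits, len(MG))


def conf(antecedent: set[str], consequent: set[str]) -> Fraction:
    """Conf(antecedent => consequent) = Supp(rule) / supp(antecedent).

    Raises ZeroDivisionError if no situation supports the antecedent.
    """
    return supp(antecedent | consequent) / supp(antecedent)
```

For example, `conf({"A3"}, {"A4"})` evaluates to `Fraction(2, 3)`, which the table shows as 0.66, and `supp({"A1", "A4"})` evaluates to `Fraction(2, 5)`, i.e. 0.4.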


Rough Sets

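• Relation of indiscernibility
• $\forall x_1, x_2 \in U: \; (x_1 \, RE(A) \, x_2) \Leftrightarrow (\forall a_i \in A: g(x_1, a_i) = g(x_2, a_i))$

• Where:
– U … universe of elements
– A … set of attributes
– $V_{a_i}$ … set of values of attribute $a_i$
– $g: U \times A \to V$, where $V = \bigcup_i V_{a_i}$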
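The indiscernibility relation splits U into equivalence classes U/RE(A): two elements fall into the same class when g gives them equal values on every attribute in A. A minimal sketch, assuming the information system is stored as a dictionary from each element to its attribute values (so g(x, a) is `table[x][a]`); `partition` and the small discretised table are hypothetical.

```python
from collections import defaultdict
from typing import Hashable

# Information system: g(x, a) is table[x][a].
Table = dict[str, dict[str, Hashable]]


def partition(table: Table, attributes: list[str]) -> list[frozenset[str]]:
    """U/RE(A): group elements whose values agree on every attribute in A."""
    classes = defaultdict(set)
    for x, row in table.items():
        key = tuple(row[a] for a in attributes)  # (g(x, a1), ..., g(x, an))
        classes[key].add(x)
    return [frozenset(c) for c in classes.values()]


# Hypothetical measurements already discretised into levels.
table = {
    "m1": {"humidity": "high", "temperature": "low"},
    "m2": {"humidity": "high", "temperature": "low"},
    "m3": {"humidity": "low", "temperature": "low"},
    "m4": {"humidity": "low", "temperature": "high"},
}
```

With both attributes the classes are {m1, m2}, {m3} and {m4}; with `["temperature"]` alone, m1, m2 and m3 become indiscernible.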


Rough Sets

• Which elements of the universe U belong, and with what certainty, to the subset $X \subseteq U$ that we are interested in?

• Lower approximation
• Upper approximation
• Border set (boundary region)


Rough Sets

• Lower approximation
• The lower approximation (positive region $\mathrm{Posi}_{RE}(X)$) is the set of objects that certainly belong to X.

• $\mathrm{Posi}_{RE}(X) = \bigcup \{ Y \mid Y \in U/RE \wedge Y \subseteq X \}$


Rough Sets

• Upper approximation
• The set of elements of U that may (possibly) belong to X.
• $\mathrm{Poss}_{RE}(X) = \bigcup \{ Y \mid Y \in U/RE \wedge Y \cap X \neq \emptyset \}$


Rough Sets

• Boundary region
• The difference between the upper and lower approximations of X.
• $\mathrm{Bound}_{RE}(X) = \mathrm{Poss}_{RE}(X) \setminus \mathrm{Posi}_{RE}(X)$
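The lower approximation, upper approximation and boundary region follow directly from a partition U/RE. A minimal sketch, assuming the partition is given as a list of frozensets (for example the equivalence classes of an indiscernibility relation) and X is an ordinary set; `lower`, `upper` and `boundary` are hypothetical names.

```python
def lower(blocks: list[frozenset], x: set) -> set:
    """Posi_RE(X): union of the classes Y in U/RE with Y a subset of X."""
    return set().union(*(y for y in blocks if y <= x))


def upper(blocks: list[frozenset], x: set) -> set:
    """Poss_RE(X): union of the classes Y in U/RE that intersect X."""
    return set().union(*(y for y in blocks if y & x))


def boundary(blocks: list[frozenset], x: set) -> set:
    """Bound_RE(X) = Poss_RE(X) minus Posi_RE(X)."""
    return upper(blocks, x) - lower(blocks, x)
```

For the partition `[{m1, m2}, {m3}, {m4}]` and X = {m1, m3}, the lower approximation is {m3}, the upper approximation is {m1, m2, m3} and the boundary is {m1, m2}: m1 is in X but cannot be told apart from m2, which is not.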


Rough Sets

• Rough set
• A rough set is a subset X of the universe U that is described by its upper and lower approximations $(\mathrm{Poss}_{RE}(X), \mathrm{Posi}_{RE}(X))$ and for which:

• $\mathrm{Bound}_{RE}(X) \neq \emptyset$


Rough Sets

• Accuracy of the rough approximation: $\alpha_{RE}(X) = \mathrm{card}(\mathrm{Posi}_{RE}(X)) / \mathrm{card}(\mathrm{Poss}_{RE}(X))$
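The accuracy ratio and the rough-set condition can be evaluated from the two approximations. A minimal sketch, assuming both approximations are already computed as sets; `accuracy` and `is_rough` are hypothetical names, and the sketch raises an error for an empty upper approximation instead of choosing a convention for that case.

```python
from fractions import Fraction


def accuracy(lower_x: set, upper_x: set) -> Fraction:
    """alpha_RE(X) = card(Posi_RE(X)) / card(Poss_RE(X))."""
    if not upper_x:
        raise ValueError("upper approximation is empty")  # ratio undefined
    return Fraction(len(lower_x), len(upper_x))


def is_rough(lower_x: set, upper_x: set) -> bool:
    """X is rough when Bound_RE(X) = Poss_RE(X) minus Posi_RE(X) is non-empty."""
    return bool(upper_x - lower_x)
```

With lower approximation {m3} and upper approximation {m1, m2, m3}, the accuracy is `Fraction(1, 3)` and X is rough; an accuracy of 1 means both approximations coincide, so the boundary is empty and X is not rough.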


Conclusion

• The paper proposed the application of two data mining methods. Fragments of a monitoring system database were used as the supporting data. The paper emphasises that the original database content cannot be used directly: it must first be transformed into forms usable by the selected data mining methods. The success of the data mining process also depends strongly on the definition of the monitoring classes and the “operation” situations (formulated by experts).