A multi criteria evaluation of environmental databases using hasse

A Multi-Criteria Evaluation of Environmental databases using Hasse diagram technique (Research Paper) By: K.Balamurugan MFCS M.Tech-CSE-1 st year 28/06/2022 Pondicherry University 1


03/05/2023 1

A Multi-Criteria Evaluation of Environmental databases using Hasse diagram technique

(Research Paper)

By: K. Balamurugan

MFCS, M.Tech-CSE, 1st year

Pondicherry University


Abstract

• We apply the Hasse Diagram Technique (HDT), which originates in discrete mathematics; the software tool is named ProRank.

• It is a multi-criteria evaluation method which can be used as a tool to rank objects and is hence also applicable to decision making.

• The HDT reveals the best and the worst databases, and conflicts among them that are due to their different information content.

• We evaluate 15 Internet databases with respect to the existence of data on 24 chemicals. For each database x, the information on a chemical is coded as 0 = not available or 1 = available.

• Subsets of the databases are evaluated: single databases, European versus US databases, and databases which comprise 2001-10,000 chemicals.

• Only one database, ChemExper Catalog, contains all 24 chemicals. The comparison of European and US databases revealed no marked difference in the quantity of the selected information base, thus refuting the widespread notion that US databases cover more chemicals than do European databases.

• To sum up, the data availability on the chosen test-set of chemicals is far from satisfactory.


1. Introduction

• The increasing complexity of environmental problems, the growing number of topics involved and keen competition between conflicting interests make decisions and decision support difficult.

• In most environmental problems multi-criteria questions arise. Hence, decision-support tools are required which are able to solve multi-criteria problems.

• One such multi-criteria decision-support tool is the Hasse Diagram Technique, which is based on discrete mathematics.

• The commercial software for HDT, ProRank, has been applied in the present study. ProRank presents a rather new approach based on partially ordered sets. It avoids the loss of information that results from merging characterizing properties and thus preserves important elements of the evaluation and decision-making processes.


• The demand for decision-support tools is particularly strong in the field of water pollution. Close cooperation between scientists and decision makers is necessary, and scientists will be a key element in decision making.

• This raises the question of where the job of the scientist ends and the task of the decision maker begins.

• Several approaches for multi-criteria decision-support methods and tools for environmental applications exist.


• One example is a fuzzy knowledge-based decision-support system that provides information on the environmental impact of anthropic activities by examining their effects on groundwater quality.

• Another is the Hasse Diagram Technique (HDT), which normally provides more than one favourable solution (a partial order).

• A White Paper calls for collecting data on chemicals for their risk assessment, leading, where necessary, to risk reduction.

• The gap in knowledge about intrinsic properties of existing substances should be closed to ensure that equivalent information to that on new substances is available. The available information on existing chemicals, as well as on pharmaceuticals, should be thoroughly examined and best use made of it in order to waive testing, wherever appropriate.

• However, publicly available knowledge of existing chemicals contains significant gaps. For example, the contents of the IUCLID (International Uniform Chemical Information Database) were evaluated in a study in 1999. Considerable data gaps were found in environmental fate and pathways, and in ecotoxicity parameters. The sparse data situation in environmental and chemical databases was confirmed by an evaluation approach.

• In that study the contents of databases were evaluated, whereas in the present paper we study the data availability on different kinds of chemicals in several subsets of databases.


2. Data availability for environmental chemicals

• Data availability for environmental chemicals is strongly related to structuring and archiving them in environmental and chemical databases.

• Several approaches are used to assess the quality of databases in environmental sciences and chemistry. Commercial databases in toxicology were evaluated by applying environmental and toxicological evaluation criteria.

• Commercial online databases and CD-ROMs were examined with chemical and environmental evaluation parameters.

• Data availability is an important prerequisite for scrutinizing chemical substances (existing chemicals as well as pharmaceuticals) for their environmental behaviour and effect.

• We intend to determine whether publicly available databases comprise information on environmental chemicals, and in a further step, evaluate what kind of information is available.


3. Methodology

3.1. Background of the Hasse Diagram Technique (HDT)

• The Hasse Diagram Technique (HDT) is an approach based on partially ordered sets that preserves important elements of the evaluation and decision-making processes.

• The basis of the HDT is the assumption that a ranking can be performed, while avoiding the use of an ordering index.

• For an evaluation of the objects they must be compared. The comparison is made by examining characteristic properties (attributes, descriptors) of these objects.

• If the evaluation is aimed at assessing criteria, then the attributes (synonym: descriptors) are thought of as measures of how well a criterion is fulfilled.

• For an object x, the attributes are denoted as q(1,x), q(2,x), …, q(m,x) and are often written as a tuple q(x).


• The properties are gathered into a set without reference to the actual values realized by the objects.

• This set of properties is called the information base IB. Often subsets of the IB are needed.

• Consider now two objects x and y. We say x ≥ y (with respect to the m properties of interest) if q(i,x) ≥ q(i,y) for all i = 1, 2, …, m and there is at least one i* for which q(i*,x) > q(i*,y). Because of the demand “for all”, this definition is called the “generality principle”.

• If q(i,x) ≥ q(i,y) for all i = 1, …, m, or q(i,x) ≤ q(i,y) for all i = 1, …, m, then the objects x and y are comparable. The mere fact that x is comparable with y (without the information about the orientation) is often denoted as x ⊥ y.
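The order relation defined on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the three example tuples are hypothetical and are not taken from the study or from ProRank.

```python
from typing import Sequence


def geq(qx: Sequence[float], qy: Sequence[float]) -> bool:
    """True if q(i,x) >= q(i,y) holds for every attribute i."""
    if len(qx) != len(qy):
        raise ValueError("both tuples must have the same number of attributes")
    return all(a >= b for a, b in zip(qx, qy))


def strictly_greater(qx: Sequence[float], qy: Sequence[float]) -> bool:
    """Generality principle: >= in all attributes and > in at least one."""
    return geq(qx, qy) and any(a > b for a, b in zip(qx, qy))


def comparable(qx: Sequence[float], qy: Sequence[float]) -> bool:
    """True if one tuple is >= the other in all attributes."""
    return geq(qx, qy) or geq(qy, qx)


# Hypothetical 0/1 tuples with m = 4 attributes (not data from the study).
q_x = (1, 1, 0, 1)
q_y = (1, 0, 0, 1)
q_z = (0, 1, 1, 0)

print(strictly_greater(q_x, q_y))  # x is above y
print(comparable(q_x, q_z))        # x and z cannot be ordered
```

Note that the comparison never aggregates the attributes into a single score; it only checks the attribute-wise inequalities.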


• However, one often finds q(i,x) < q(i,y) for the indices i in one index set and q(i,x) > q(i,y) for the indices i in another index set.

In such a case, the objects x and y are incomparable, and one writes x || y. Although incomparabilities are not wanted in a final decision, they reveal interesting conflicts among the objects.
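To see which attributes cause a conflict, the two index sets can be listed explicitly. The following is a minimal sketch; the helper names and the example tuples are hypothetical, and indices are counted from 1 to match the notation q(i,x).

```python
def conflict_indices(qx, qy):
    """Return (indices where q(i,x) < q(i,y), indices where q(i,x) > q(i,y))."""
    pairs = list(enumerate(zip(qx, qy), start=1))
    lower = [i for i, (a, b) in pairs if a < b]
    higher = [i for i, (a, b) in pairs if a > b]
    return lower, higher


def incomparable(qx, qy):
    """x || y holds exactly when both index sets are non-empty."""
    lower, higher = conflict_indices(qx, qy)
    return bool(lower) and bool(higher)


# Hypothetical tuples (not data from the study).
q_x = (1, 0, 1, 1)
q_y = (0, 1, 1, 1)

lower, higher = conflict_indices(q_x, q_y)
print(lower, higher, incomparable(q_x, q_y))
```

Here attribute 2 favours y and attribute 1 favours x, so neither object can be placed above the other.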

The main framework of HDT can be characterized as follows:

1. Selecting a set of elements of interest which are to be compared, E. The set E is called the ground set. This name expresses that the ground set, together with at least one binary relation among the elements of E, acquires a structure which can often be represented as a digraph, as in the case discussed here.

2. Selecting a set of properties by which the comparison is performed, called the information base IB.


3. Finding a common orientation for all properties, according to the criteria to which they are assigned.

4. Analysing whether one of the following three relations is valid: equivalence, comparability or incomparability (the latter two as defined above).

• Equivalence: the corresponding equivalence relation R is the equality of the two tuples q(x) and q(y). R yields the quotient set E/R.

• Almost all operations in HDT are based on the quotient set E/R. For example, the visualization in WHASSE is based on representatives taken from the equivalence classes. In the software ProRank, the vertices are associated with the equivalence classes themselves.
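Forming the quotient set E/R amounts to grouping objects that have identical tuples. The sketch below shows one way to do this in Python; the object names and values are hypothetical, and the code is not meant to mirror how WHASSE or ProRank are implemented.

```python
from collections import defaultdict


def quotient_set(data):
    """Group object names by identical attribute tuples (equivalence classes)."""
    classes = defaultdict(list)
    for name, q in data.items():
        classes[tuple(q)].append(name)
    # One list per equivalence class, in order of first appearance.
    return [sorted(members) for members in classes.values()]


# Hypothetical data-matrix: object name -> attribute tuple.
data = {
    "A": (1, 0, 1),
    "B": (1, 0, 1),  # same tuple as A, so A and B are equivalent
    "C": (0, 0, 1),
    "D": (1, 1, 1),
}

classes = quotient_set(data)
representatives = [members[0] for members in classes]
print(classes)
print(representatives)
```

Taking the first member of each class gives the representatives that would be drawn in a diagram.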


• The relation defined above among all objects is indeed an order relation, because it fulfils the axioms of order, namely:

• reflexivity (one can compare each object with itself)

• antisymmetry (if x is preferred to y, then the reverse is only true if the two objects are equal or equivalent)

• transitivity (if x is better than y, and y is better than z, then x is better than z).

• A set E equipped with an order relation ≤ is said to be an ordered set, or partially ordered set, or briefly “poset”, and is denoted as (E, ≤).

• Because the ≤ comparison depends on the selection of the information base (and on the data representation: classified or not, rounded, and so on), we also write (E, IB) to denote this important influence of the IB on any ranking.

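• Concerning the evaluation of the ecotoxicity of environmental chemicals, for example, the attributes are oriented as follows:

• small values: “good”, relatively non-hazardous

• large values: “bad”, relatively hazardous.

• Raw lethal concentrations (LC50 values) run the other way, since a small LC50 means that a low concentration is already lethal, so they must be inverted before they fit this orientation.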


• In our applications, the circles near the top of the page of the Hasse Diagram indicate objects that are the ‘‘better’’ objects according to the criteria/attributes used to rank them.

• The objects not ‘‘covered’’ by other objects are called maximal objects. Objects which do not cover other objects are called minimal objects.

• Equivalent objects are different objects that have the same data with respect to a given set of attributes. Only one representative of the equivalent objects is shown in the Hasse Diagram and named Kn in the ProRank software.

• The total numbers of comparabilities V and incomparabilities U, and their local analogues, that is to say the numbers of comparabilities V(x) and incomparabilities U(x) of a certain element x, are useful quantities for the documentation of the Hasse Diagram and for the estimation of ranking uncertainties.

• Whereas V can be considered as a degree of correlatedness among the attributes, the quantity U generally provides information about the extent of conflicts among the objects.
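The quantities named on this slide can be computed directly from a data-matrix. The sketch below assumes that equivalent objects have already been merged, so all tuples are distinct; the function name `analyse` and the example data are hypothetical.

```python
from itertools import combinations


def leq(qx, qy):
    """True if q(i,x) <= q(i,y) for every attribute i."""
    return all(a <= b for a, b in zip(qx, qy))


def comparable(qx, qy):
    return leq(qx, qy) or leq(qy, qx)


def analyse(data):
    """Maximal/minimal objects and comparability counts (distinct tuples assumed)."""
    names = list(data)
    maximal = [x for x in names
               if not any(y != x and leq(data[x], data[y]) for y in names)]
    minimal = [x for x in names
               if not any(y != x and leq(data[y], data[x]) for y in names)]
    v_local = {x: 0 for x in names}  # V(x)
    u_local = {x: 0 for x in names}  # U(x)
    for x, y in combinations(names, 2):
        counts = v_local if comparable(data[x], data[y]) else u_local
        counts[x] += 1
        counts[y] += 1
    return {
        "maximal": maximal,
        "minimal": minimal,
        "V": sum(v_local.values()) // 2,  # each pair was counted twice
        "U": sum(u_local.values()) // 2,
        "V(x)": v_local,
        "U(x)": u_local,
    }


# Hypothetical data-matrix with distinct tuples.
data = {"A": (1, 1, 1), "B": (1, 1, 0), "C": (0, 1, 1), "D": (0, 1, 0)}
result = analyse(data)
print(result["maximal"], result["minimal"], result["V"], result["U"])
```

In this small example only the pair B, C is incomparable, which is why U = 1 and both B and C have U(x) = 1.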


3.2. Characterizing numbers of the Hasse Diagram

• In order to interpret a Hasse Diagram, some further terms and notations have to be introduced:

• N: the number of objects.

• IB: “information base”, the set of attributes characterizing the objects within the evaluation study.

• m: number of attributes of IB.

• Rk(x): the rank of object x within a total order.

• Chain: a set of mutually comparable objects.

• Antichain: a set of mutually incomparable objects.

• Articulation point: a vertex of the transitive hull of the digraph, the elimination of which would increase the number of hierarchies.


• If articulation points exist, the Hasse Diagram can almost be separated into hierarchies. That means that the identification of articulation points helps to discover specific data structures within the data-matrix.

• Levels: a first screening and a partitioning of the set E according to increasing values of the attributes. The levels are defined by the longest chain within the Hasse Diagram. The assignment of objects to levels cannot be done uniquely from the point of view of order theory, but it becomes unique if additional rules are introduced (for example conservatively, meaning that objects are assigned to the highest possible level).

• The set of levels, together with the ≤ relation, forms a new poset (L, ≤), which represents a chain over all objects of L, that is to say a total order. The empirical poset (E, ≤) and the poset (L, ≤) are related by an order-preserving map.
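A level assignment of this kind can be sketched as follows. The sketch rests on two assumptions that are not spelled out on the slide: the number of levels equals the number of objects in the longest chain, and the “conservative” rule places each object at the highest level that still leaves room for the longest chain above it. The example data are hypothetical.

```python
from functools import lru_cache


def assign_levels(data):
    """Assign each object to the highest possible level (distinct tuples assumed)."""
    names = list(data)

    def below(x, y):
        # Strict order: x < y if x <= y in every attribute and the tuples differ.
        return data[x] != data[y] and all(
            a <= b for a, b in zip(data[x], data[y]))

    @lru_cache(maxsize=None)
    def chain_above(x):
        # Number of objects in the longest chain strictly above x.
        return max((1 + chain_above(y) for y in names if below(x, y)), default=0)

    @lru_cache(maxsize=None)
    def chain_below(x):
        # Number of objects in the longest chain strictly below x.
        return max((1 + chain_below(y) for y in names if below(y, x)), default=0)

    n_levels = max(chain_below(x) for x in names) + 1
    return {x: n_levels - chain_above(x) for x in names}


# Hypothetical data-matrix: F lies only below A and above nothing.
data = {
    "A": (1, 1, 1),
    "B": (1, 1, 0),
    "C": (0, 1, 1),
    "D": (0, 1, 0),
    "F": (1, 0, 1),
}
levels = assign_levels(data)
print(levels)
```

Object F illustrates the non-uniqueness: counted from the bottom it could sit on level 1, but the conservative rule places it on level 2, directly below A.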


4. Set-up of a data-matrix to be evaluated by Hasse Diagram Technique


• For the evaluation of databases with chemicals, a data-matrix was set up consisting of 15 Internet databases and 24 chemicals: 12 pharmaceuticals as well as 12 High Production Volume Chemicals (HPVCs).

• The databases are listed in Table 1, together with their abbreviation, Internet address (URL = Uniform Resource Locator) and number of chemicals.

• The queries were made by CAS-numbers. If a chemical could not be found in this step, a search by trade name, as listed in Table 2, was performed.
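The coding step can be illustrated with a small sketch that turns search results into a 0/1 data-matrix. All names and values below (`DB_A`, `chem_1`, and so on) are hypothetical placeholders, not entries from Table 1 or Table 2.

```python
def build_data_matrix(found, chemicals):
    """Code each database as a 0/1 tuple: 1 = data available, 0 = not available."""
    return {
        db: tuple(1 if chem in hits else 0 for chem in chemicals)
        for db, hits in found.items()
    }


# Hypothetical test-set and search results (placeholders only).
chemicals = ["chem_1", "chem_2", "chem_3"]
found = {
    "DB_A": {"chem_1", "chem_3"},
    "DB_B": set(),
    "DB_C": {"chem_1", "chem_2", "chem_3"},
}

matrix = build_data_matrix(found, chemicals)
coverage = {db: sum(row) for db, row in matrix.items()}
print(matrix)
print(coverage)
```

The row sums give the number of test chemicals covered by each database, while the rows themselves are the tuples q(x) that enter the partial order.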


• Four different types of numerical databases can be distinguished:

• Single databases which cover only one data collection (BID, CIV, GES, HSD, ICS, NCL, OEK).

• Multi-database databases which encompass several databases under the same name and search interface (ECO, ENV, EFD, ESI, EXT).

• Monograph databases which cover extensive reviews on very few chemicals (EHC, OIH).

• Catalog Database (CEX).


5. Applying the multi-criteria evaluation approach

5.1. Evaluation of the 15 databases

The Hasse Diagram Technique using the ProRank program was applied to the complete 15 × 24 data-matrix, and the result is given in Fig. 1. The diagram is structured into seven levels, numbered from the bottom (minimal objects) to the top (maximal object).

Only 13 databases are individually shown in the diagram, with the equivalent objects (databases) indicated by the letter K: K1 means that the database CIV is equivalent to ICS, and K2 means that EHC and EXT are equivalent.

The catalog database CEX (ChemExper Catalog of Chemical Suppliers, Physical Characteristics) is the only maximal object in this evaluation approach and is therefore also called the greatest object. The CEX database is connected with all other databases in the downward direction; hence, in our approach, it comprises more chemicals than any other database.

The minimal objects are OIH (OECD Integrated HPV Database), BID (Biocatalysis/Biodegradation Database) and the equivalent objects EHC (Environmental Health Criteria Monographs) and EXT (EXTOXNET).


• Applying the so-called sensitivity matrix (W-matrix), we determined that the high production volume chemicals CMC (Chlormequat chloride) and ISO (Isoproturon) have the greatest impact on the Hasse Diagram.

• This means that their absence or presence (coded by 0/1) is most important in this data analysis.


5.2. Evaluation of single databases

• Seven databases were so-called single databases, meaning that they consisted of only one data collection: BID, CIV, GES, HSD, ICS, NCL, OEK.

• In the ProRank program a subset of objects (databases) can easily be generated, and in this case we evaluated a 7 × 24 subset of the data-matrix.

• Two different types of Hasse Diagrams are given in Fig. 2: the standard diagram (left) and the bar diagram (right).

• In the bar diagram the bars represent the attributes (chemicals), so the ≤ relation is easy to interpret.

• The maximal objects are HSD and GES. However, neither covers all 24 chemicals, and the two databases are not comparable to each other.

• The HSD database includes 20 chemicals, whereas GES includes only 15 chemicals. Since the code 0/1 can also be interpreted as the characteristic function of a set for a given database, the ≤ relation corresponds to set inclusion.

• For example, regarding the high production volume chemical ISO (Isoproturon), the GES database provides data, whereas HSD does not.

• The BID database is the minimal object.
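For 0/1 coded data, the component-wise ≤ comparison and set inclusion give the same answer. The sketch below checks this exhaustively for three placeholder chemicals; the names `chem_1` to `chem_3` and the two example tuples are hypothetical.

```python
from itertools import product

CHEMICALS = ("chem_1", "chem_2", "chem_3")  # hypothetical placeholders


def as_set(q):
    """Interpret a 0/1 tuple as the set of chemicals coded 1."""
    return {chem for chem, flag in zip(CHEMICALS, q) if flag == 1}


def leq(qx, qy):
    """Component-wise comparison used in the partial order."""
    return all(a <= b for a, b in zip(qx, qy))


# Exhaustive check over all 0/1 tuples of length 3.
all_tuples = list(product((0, 1), repeat=len(CHEMICALS)))
inclusion_matches_order = all(
    leq(qx, qy) == (as_set(qx) <= as_set(qy))
    for qx in all_tuples
    for qy in all_tuples
)
print(inclusion_matches_order)

# For an incomparable pair, the set differences name the conflicting chemicals.
q_a = (1, 1, 0)
q_b = (0, 1, 1)
only_in_a = as_set(q_a) - as_set(q_b)
only_in_b = as_set(q_b) - as_set(q_a)
print(only_in_a, only_in_b)
```

A database that covers more chemicals in total is therefore not automatically above one that covers fewer: it is above it only if its set of covered chemicals contains the other set.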


5.3. Evaluation of databases with respect to their origin

• We identified three classes of databases according to their origin:

• EU databases: CIV, ESI, GES, NCL, OEK.

• US databases: BID, ECO, EFD, ENV, EXT, HSD.

• International databases: CEX, EHC, ICS, OIH.

• We compared the data availability of the five EU databases and the six US databases (Fig. 3). There were no equivalent objects in either of the two test-sets.

• The Hasse Diagram of the European databases shows a total of nine comparabilities and only one incomparability, namely CIV || NCL.

• This incomparability is caused by the HPV chemical ISO (Isoproturon) and by the pharmaceutical PHE (Phenazone). The CIV database provides data for Phenazone (code = 1) whereas NCL does not; the reverse is the case for Isoproturon.

• The maximal object of the European databases, ESI, provides information for 23 out of the 24 chemicals.

• The minimal object of the European databases, OEK, provides information on only seven out of 24 chemicals.


• The Hasse Diagram of the six US databases shows 10 comparabilities and five incomparabilities.

• The maximal objects of the US databases are HSD, with 20 out of 24 chemicals, and EFD, with 22 out of 24 chemicals.

• The minimal object BID covers only two out of 24 chemicals, and the other minimal object, EXT, has information on five chemicals out of 24.

• It should be mentioned that most of the US databases are multiple data collections (multi-database databases), whereas all EU databases (with the exception of ESI) are single databases.

• With respect to the evaluation approach, single databases do not give worse results than multi-database databases.
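The reported counts can be cross-checked with a short calculation: when there are no equivalent objects, every pair of objects is either comparable or incomparable, so V + U must equal N(N − 1)/2. The sketch below uses only the numbers stated above for the EU and US subsets; the function and variable names are hypothetical.

```python
def total_pairs(n_objects):
    """Number of unordered pairs among n objects."""
    return n_objects * (n_objects - 1) // 2


# (N, V, U) as reported for the two subsets.
reported = {
    "EU databases": (5, 9, 1),
    "US databases": (6, 10, 5),
}

checks = {
    label: total_pairs(n) == v + u
    for label, (n, v, u) in reported.items()
}
print(checks)
```

Both subsets pass the check: 5 databases give 10 pairs (9 + 1) and 6 databases give 15 pairs (10 + 5).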


5.4. Evaluation of databases with respect to the number of chemicals they contain

• The chosen test-set of 15 databases was divided into the following three clusters with respect to the quantity of information:

• Databases containing fewer than 2000 chemicals: BID, EXT, EHC, ICS.

• Databases containing 2001-10,000 chemicals: CIV, ECO, GES, HSD, NCL, OEK, OIH.

• Databases containing more than 10,001 chemicals: CEX, EFD, ENV, ESI.


• The largest group comprises the databases covering 2001 to 10,000 chemicals; for this group of seven databases, we generated the Hasse Diagram discussed below.


• The Hasse Bar Diagram is structured into four levels, and shows 11 comparabilities against 10 incomparabilities. As discussed above, HSD and GES are incomparable with each other.

• The incomparability of the pair ECO and NCL is induced by two subsets of chemicals: Isoproturon (ECO coded by 0, NCL by 1), and the chemicals {CAR, CLO, DAP, EES, PHE}, for which ECO has the code 1 and NCL the code 0.

• By elimination of the objects CIV and ECO, two hierarchies appear. The set {CIV, ECO} is called an articulation set (in generalization of articulation point).

• The subset {HSD, OIH} has a peculiar data structure by which it is separated from the other databases {GES, NCL, OEK}.

• The two maximal objects are HSD and GES. HSD comprises 4800 chemicals, whereas GES comprises 8000 chemicals.
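The idea of an articulation set can be illustrated by counting hierarchies before and after removing objects. The sketch below treats a hierarchy as a connected component of the comparability graph, which is an assumption made for illustration; the object names and tuples are hypothetical and are not the data of the seven databases.

```python
def leq(qx, qy):
    return all(a <= b for a, b in zip(qx, qy))


def comparable(qx, qy):
    return leq(qx, qy) or leq(qy, qx)


def count_hierarchies(data, removed=frozenset()):
    """Count connected components of the comparability graph after removal."""
    names = [n for n in data if n not in removed]
    seen = set()
    components = 0
    for start in names:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first search over comparable objects
            x = stack.pop()
            for y in names:
                if y not in seen and comparable(data[x], data[y]):
                    seen.add(y)
                    stack.append(y)
    return components


# Hypothetical poset: TOP and BOTTOM hold two otherwise separate chains together.
data = {
    "TOP": (1, 1, 1, 1),
    "P": (1, 1, 0, 0),
    "Q": (1, 0, 0, 0),
    "R": (0, 0, 1, 1),
    "S": (0, 0, 1, 0),
    "BOTTOM": (0, 0, 0, 0),
}

print(count_hierarchies(data))                              # all objects
print(count_hierarchies(data, removed={"TOP"}))             # one object removed
print(count_hierarchies(data, removed={"TOP", "BOTTOM"}))   # both removed
```

In this example neither TOP nor BOTTOM is an articulation point on its own, but removing both splits the diagram into the hierarchies {P, Q} and {R, S}, so {TOP, BOTTOM} acts as an articulation set.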


6. Conclusions

• We analysed the quality of databases with respect to the complete data-matrix comprising 15 databases and 24 chemicals.

• Subsets of the set of databases (objects) were investigated in three independent steps: the type of database, the origin of database and the number of chemicals contained in each database.

• The Hasse Diagrams generated by the software package ProRank show the most important and least important databases.

• Furthermore, the comparabilities and incomparabilities are demonstrated and interpreted by examples. The bar diagrams give a concrete insight into the partial order method, and the importance of an articulation set is explained.


Thank you