Steps in Normalization


  • 7/26/2019 Steps in Normalization


    THE UNIVERSITY OF TEXAS AT AUSTIN

    SCHOOL OF INFORMATION

    LIS 384K.11 (known as INF 385M, beginning with the Fall Semester 2003)

    DATABASE-MANAGEMENT PRINCIPLES AND APPLICATIONS

    R. E. Wyllys

    Steps in Normalization

    Contents:

    Section 1. Introduction
    Section 2. Summary of Definitions of the Normal Forms
    Section 3. Functional Dependency and Determinants
    Section 4. The 1st Normal Form (1NF)
    Section 5. The 2nd Normal Form (2NF)
    Section 6. Anomalies and Normalization
    Section 7. Turning a Table with Anomalies into Single-Theme Tables
    Section 8. The 3rd Normal Form (3NF)
    Section 9. The Boyce-Codd Normal Form (BCNF)
    Section 10. The 4th Normal Form (4NF)
    Section 11. The 5th Normal Form (5NF) and the Domain-Key Normal Form (DKNF)
      Section 11.1. Converting a Table with Partial Dependencies into DKNF Tables
      Section 11.2. Converting a Table with Transitive Dependencies into DKNF Tables
      Section 11.3. Converting into DKNF a Table in Which Not Every Determinant Is a Candidate Key
      Section 11.4. Converting a Table with Multivalued Dependencies into DKNF
      Section 11.5. Single-Theme Tables and the DKNF

    Section 1. Introduction

    This handout discusses the normalization of databases. Our goal here is to explain, and to illustrate the need for,

    the various normal forms through examples of sets of relations. The relations in the examples present various

    difficulties, which are removed by procedures stemming from the relevant definitions of normal forms.

    Note: This lesson presents a detailed discussion of normalization. For a simple introduction to the ideas of

    normalization, one source is my lesson entitled Overview of Normalization.

    Section 2. Summary of Definitions of the Normal Forms

    1st Normal Form (1NF)

    Definition: A table (relation) is in 1NF if


    1. There are no duplicated rows in the table.

    2. Each cell is single-valued (i.e., there are no repeating groups or arrays).

    3. Entries in a column (attribute, field) are of the same kind.

    Note: The order of the rows is immaterial; the order of the columns is immaterial.

    Note: The requirement that there be no duplicated rows in the table means that the table has a key (although the
    key might be made up of more than one column--even, possibly, of all the columns).

    2nd Normal Form (2NF)

    Definition: A table is in 2NF if it is in 1NF and if all non-key attributes are dependent on all of the key.

    Note: Since a partial dependency occurs when a non-key attribute is dependent on only a part of the (composite)

    key, the definition of 2NF is sometimes phrased as, "A table is in 2NF if it is in 1NF and if it has no partial

    dependencies."

    3rd Normal Form (3NF)

    Definition: A table is in 3NF if it is in 2NF and if it has no transitive dependencies.

    Boyce-Codd Normal Form (BCNF)

    Definition: A table is in BCNF if it is in 3NF and if every determinant is a candidate key.

    4th Normal Form (4NF)

    Definition: A table is in 4NF if it is in BCNF and if it has no multi-valued dependencies.

    5th Normal Form (5NF)

    Definition: A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in 4NF and if

    every join dependency in the table is a consequence of the candidate keys of the table.

    Domain-Key Normal Form (DKNF)

    Definition: A table is in DKNF if every constraint on the table is a logical consequence of the definition of

    keys and domains.

    Section 3. Functional Dependency and Determinants

    Before we develop the ideas of normalization further, it is important for you to have an understanding of

    "functional dependency." The essence of this idea is that if the existence of something, call it A, implies that B

    must exist and have a certain value, then we say that "B is functionally dependent on A." We also often express
    this idea by saying that "A determines B," or that "B is a function of A," or that "A functionally governs B."

    Often, the notions of functionality and functional dependency are expressed briefly by the statement, "If A, then

    B." It is important to note that the value B must be unique for a given value of A, i.e., any given value of A
    must imply just one and only one value of B, in order for the relationship to qualify for the name "function."

    (However, this does not necessarily prevent different values of A from implying the same value of B.)
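
    The uniqueness requirement just described can be checked mechanically. The following Python fragment is only
    a minimal sketch: the function name determines and the sample rows are made up for illustration and are not
    part of the original lesson. It reports whether, in a given set of rows, each value of one attribute is
    paired with just one value of another.

```python
def determines(rows, a, b):
    """Return True if attribute `a` functionally determines attribute `b` in `rows`."""
    seen = {}  # maps each value of `a` to the single value of `b` it has been paired with
    for row in rows:
        if row[a] in seen and seen[row[a]] != row[b]:
            return False  # one value of A is paired with two different values of B
        seen[row[a]] = row[b]
    return True


# Hypothetical sample data: different values of A may share the same value of B.
sample = [
    {"A": 1, "B": "p"},
    {"A": 2, "B": "p"},
    {"A": 3, "B": "q"},
]

a_determines_b = determines(sample, "A", "B")  # every A value has exactly one B value
b_determines_a = determines(sample, "B", "A")  # "p" is paired with both 1 and 2
print(a_determines_b, b_determines_a)
```

    In this sample, B is functionally dependent on A, but A is not functionally dependent on B, because the
    value "p" of B corresponds to two different values of A.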


    For the terminology of relational databases, the word "function" was borrowed from mathematics, where it is

    common to say things like "y is a function of x" or "y = f(x)". (The latter expression is read "y equals f of x".)
    The determining value, x, is called the argument; the determined value, y or f(x), is called the result.

    The expression "y = f(x)" is a very general, and abstract, way of talking about functionality. Outside of
    mathematics--and, in particular, ordinarily in relational database management--we talk not abstractly but in

    terms of particular examples. (Indeed, the general idea of a "function" is best understood when one has seen

    enough examples of specific functions to be able to start generalizing about the abstract, or general, properties

    that the specific functions share.)

    Here are some examples of functions. An easy one is y = x^2. This particular function says that if we are given a
    particular value for x, say 3, then we must say that y has the value 9. (We could also write y = f(x) = x^2 or just
    f(x) = x^2.) Another easy one is y = x^3. This particular function says that if we are given a particular value
    for x, say -2, then we must say that y has the value -8.

    A common way of indicating functions is to place the determining and determined values side by side in a table.
    Thus we can place sample values of the function, y = x^2, in a table like the one shown here.

    Value of x ("argument," or "A")    Value of y = x^2 ("the function," or "the result," or "B")
    3                                  9
    4                                  16
    -3                                 9

    This table shows just three of the infinity of possible pairs of values, x and y, for the function y = x^2. It
    also shows that for some functions, different values of x (here, 3 and -3) imply the same value (here, 9) of
    the function.

    The functions we have given as examples so far have been functions that are specified by an algebraic formula.
    But the idea of function is more general; i.e., functions need not be algebraically defined. The essence of the
    idea of function is that to a specified determining value corresponds a unique determined value. This essence
    can be defined, among other ways, by placing the determining and determined values in a table that displays
    and/or defines the relationship between the argument and the result.

    Note that the table above displays, but does not fully define, the relationship, y = x^2. This function, since
    it has an infinite number of pairs of values, cannot be fully defined in a table. For functions that involve
    only a finite number of pairs of values of argument and result, a table is often a convenient way--and may in
    fact be the only way--of displaying and, at the same time, defining the function.

    Here is a simple example of a finite function that is both displayed and defined in a table. Most of you will be
    familiar with the conventional (though often delightfully breakable) rules for serving different types of wines
    with different courses in a dinner. Let us assume for the purpose of this example that these rules can be
    summarized as follows: with meat, serve red wine; with fish, white wine; and with cheese, rosé wine. Then the
    following table defines the course-wine function:

    Dinner Course    Type of Wine
    meat             red
    fish             white
    cheese           rosé

    But note that this table looks just like a database table. In fact, there is no reason not to consider it a
    database table. Indeed, this table defines a relation in the database sense: it has columns, each of which
    contains entries of the same kind, and it has no duplicate rows. In other words, not only does the course-wine
    table display the data about the conventional rules for which wine to serve with which course, but also the
    table can be viewed as defining a function for which the determining value is the dinner course and the
    determined value is the type of wine. Thus we can say that type of wine is functionally dependent on the dinner
    course, or equally well, that the course determines the wine.


    In relational database terminology, we often call the argument of the function (the dinner course in this
    example) the "determinant", and we often use an arrow notation to exhibit the functional dependency. Thus, we
    can say that the dinner course is the determinant of the type of wine, and we can write: dinner course → wine.
    And we can say that the attribute, type of wine, is functionally dependent on the attribute, dinner course.

    In general, a functional dependency is a relationship among attributes. In relational databases, we can have a
    determinant that governs one other attribute or several other attributes. To go back to our mathematical
    examples for a moment, we could view the situation of functional dependency of several attributes on one
    determinant as being like having several linked functions that share an argument and can be displayed
    economically in just one table. For example, consider the following table that displays sample values of the
    algebraic functions y = x^2, y = x^3, and y = x^4.

    Value of x    Value of x^2    Value of x^3    Value of x^4
    3             9               27              81
    4             16              64              256
    -3            9               -27             81

    Looking at this table from the relational-database point of view, we can say that the attributes x^2, x^3, and
    x^4 are all functionally dependent on the attribute x.

    Similarly, we could expand the dinner-course and wine table to exhibit also the type of cutlery that would be
    appropriate in the case of a formal dinner.

    Dinner Course    Type of Wine    Type of Cutlery
    meat             red             meat fork
    fish             white           fish fork
    cheese           rosé            cheese fork

    From this table we see that the attributes, type of wine and type of cutlery, are functionally dependent on the
    attribute, dinner course. Using the arrow notation, we have:

    dinner course → wine

    and

    dinner course → cutlery.

    Section 4. The 1st Normal Form (1NF)

    Now we are ready to come to grips with the ideas of normalization. The following table, containing information
    about some students at Enormous State University, is a table that is in 1st Normal Form, 1NF. (Here as
    elsewhere in the rest of this discussion, you may want to refer back to Section 2. Summary of Definitions of the
    Normal Forms, where the various normal forms are defined.)

    Table 4.1

    Social Security Number  FirstName  LastName  Major
    123-45-6789             Jack       Jones     Library and Information Science
    222-33-4444             Lynn       Lee       Library and Information Science
    987-65-4321             Mary       Ruiz      Pre-Medicine
    123-54-3210             Lynn       Smith     Pre-Law
    111-33-5555             Jane       Jones     Library and Information Science

    You can easily verify for yourself that this table satisfies the definition of 1NF: viz., it has no duplicated
    rows; each cell is single-valued (i.e., there are no repeating groups or arrays); and all the entries in a given
    column are of the same kind.

    In Table 4.1 we can see that the key, SSN, functionally determines the other attributes; i.e., a given Social
    Security Number implies (determines) a particular value for each of the attributes FirstName, LastName, and
    Major (assuming, at least for the moment, that a student is allowed to have only one major). In the arrow
    notation: SSN → FirstName, SSN → LastName, and SSN → Major.

    A key attribute will, by the definition of key, uniquely determine the values of the other attributes in a table; i.e.,
    all non-key attributes in a table will be functionally dependent on the key. But there may be non-key attributes
    in a table that determine other attributes in that table. Consider the following table:

    Table 4.2

    FirstName  LastName  Major         Level
    Jack       Jones     LIS           Graduate
    Lynn       Lee       LIS           Graduate
    Mary       Ruiz      Pre-Medicine  Undergraduate
    Lynn       Smith     Pre-Law       Undergraduate
    Jane       Jones     LIS           Graduate

    In Table 4.2 the Level attribute can be said to be functionally dependent on the Major attribute. Thus we have
    an example of an attribute that is functionally dependent on a non-key attribute. This statement is true in the
    table per se, and that is all that the definition of functional dependence requires; but the statement also
    reflects the real-world fact that Library and Information Science is a major that is open only to graduate
    students and that Pre-Medicine and Pre-Law are majors that are open only to undergraduate students.

    Section 5. The 2nd Normal Form (2NF)

    Table 4.2 has another interesting aspect. Its key is a composite key, consisting of the paired attributes,
    FirstName and LastName. The Level attribute is functionally dependent on this composite key, of course; but,
    in addition, Level can be seen to be dependent on only the attribute LastName. (This is true because each value
    of LastName is paired with just one value of Level. In contrast, there are two occurrences of the value Lynn
    for the attribute FirstName, and the two Lynns are paired with different values of Level, so Level is not
    functionally dependent on FirstName.) Thus this table fails to qualify as a 2nd Normal Form table, since the
    definition of 2NF requires that all non-key attributes be dependent on all of the key. (Admittedly, this example
    of a partial dependency is artificially contrived, but nevertheless it illustrates the problem of partial
    dependency.)

    We can turn Table 4.2 into a table in 2NF in an easy way, by adding a column for the Social Security Number,
    which will then be the natural thing to use as the key.

    Table 5.1

    SSN          FirstName  LastName  Major         Level
    123-45-6789  Jack       Jones     LIS           Graduate
    222-33-4444  Lynn       Lee       LIS           Graduate
    987-65-4321  Mary       Ruiz      Pre-Medicine  Undergraduate
    123-54-3210  Lynn       Smith     Pre-Law       Undergraduate
    111-33-5555  Jane       Jones     LIS           Graduate

    With the SSN defined as the key, Table 5.1 is in 2NF, as you can easily verify. This illustrates the fact that any
    table that is in 1NF and has a single-attribute (i.e., a non-composite) key is automatically also in 2NF.

    Table 5.1 still exhibits some problems, however. For example, it contains some repeated information about the

    LIS-Graduate pairing.
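
    The partial dependency discussed in this section can also be looked for mechanically. The Python fragment
    below is a sketch under stated assumptions, not a general 2NF test: the helper names determines and
    partial_dependencies are invented for illustration, and the check only examines the rows actually present,
    which is the sense in which the text says a dependency is "true in the table per se." It tries every proper
    subset of the key and reports each non-key attribute that the subset determines.

```python
from itertools import combinations


def determines(rows, lhs, rhs):
    """True if the attributes in the tuple `lhs` together determine attribute `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # setdefault stores the first value seen for this key and returns the stored value
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (key_subset, attribute) pairs where a proper subset of the key determines a non-key attribute."""
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(key, size):
            for attr in non_key:
                if determines(rows, subset, attr):
                    found.append((subset, attr))
    return found


columns = ("FirstName", "LastName", "Major", "Level")
table_4_2 = [dict(zip(columns, values)) for values in [
    ("Jack", "Jones", "LIS", "Graduate"),
    ("Lynn", "Lee", "LIS", "Graduate"),
    ("Mary", "Ruiz", "Pre-Medicine", "Undergraduate"),
    ("Lynn", "Smith", "Pre-Law", "Undergraduate"),
    ("Jane", "Jones", "LIS", "Graduate"),
]]

# Table 5.1 is Table 4.2 with an SSN column added to serve as a single-attribute key.
ssns = ["123-45-6789", "222-33-4444", "987-65-4321", "123-54-3210", "111-33-5555"]
table_5_1 = [dict(row, SSN=ssn) for row, ssn in zip(table_4_2, ssns)]

found_4_2 = partial_dependencies(table_4_2, ("FirstName", "LastName"), ("Major", "Level"))
found_5_1 = partial_dependencies(table_5_1, ("SSN",), ("FirstName", "LastName", "Major", "Level"))
print(found_4_2)
print(found_5_1)
```

    For Table 4.2 the check reports that LastName alone determines Level, as discussed above; in this small
    sample it also happens to report that LastName alone determines Major. For Table 5.1 the list is empty,
    because a single-attribute key has no proper non-empty subsets, which is why such a table is automatically
    in 2NF.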

    Section 6. Anomalies and Normalization

    At this point it is appropriate to note that the main thrust behind the idea of normalizing databases is the

    avoidance of insertion and deletion anomalies in databases.

    To illustrate the idea of anomalies, consider what would happen to our knowledge (at least, as explicitly

    contained in a table) of the level of the major, Pre-Medicine, if Mary Ruiz left Enormous State University. With

    the deletion of the row for Ms. Ruiz, we would lose the information that Pre-Medicine is an Undergraduate

    major. This is an example of a deletion anomaly. We may possess the real-world information that Pre-Medicine
    is an Undergraduate major, but no such information is explicitly contained in a table in our database.

    As an example of an insertion anomaly, we can suppose that a new student wants to enroll in ESU: e.g., suppose

    Jane Doe wants to major in Public Affairs. From the information in Table 5.1 we cannot tell whether Public

    Affairs is an Undergraduate or a Graduate major; in fact, we do not even know whether Public Affairs is an

    established major at ESU. We do not know whether it is permissible to insert the value, Public Affairs, as a
    value of the attribute, Major, or what to insert for the attribute, Level, if we were to assume that Public Affairs

    is a valid value for Major. The point is that while we may possess real-world information about whether Public

    Affairs is a major at ESU and what its level is, this information is not explicitly contained in any table that we

    have thus far mentioned as part of our database.

    A database-management system, a DBMS, can work only with the information that we put explicitly into its
    tables for a given database and into its rules for working with those tables, where such rules are appropriate and

    possible.
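
    The two anomalies can be reproduced in a few lines of Python. This is only an illustrative sketch: the rows
    are those of Table 5.1, while the helper name level_of is an assumption introduced here. The helper returns
    whatever the table itself records about the level of a major, which is all a DBMS would have to work with.

```python
# Rows of Table 5.1: (SSN, FirstName, LastName, Major, Level)
table_5_1 = [
    ("123-45-6789", "Jack", "Jones", "LIS", "Graduate"),
    ("222-33-4444", "Lynn", "Lee", "LIS", "Graduate"),
    ("987-65-4321", "Mary", "Ruiz", "Pre-Medicine", "Undergraduate"),
    ("123-54-3210", "Lynn", "Smith", "Pre-Law", "Undergraduate"),
    ("111-33-5555", "Jane", "Jones", "LIS", "Graduate"),
]


def level_of(major, rows):
    """Return the set of levels that `rows` records for `major` (empty if the table says nothing)."""
    return {level for (_, _, _, m, level) in rows if m == major}


before = level_of("Pre-Medicine", table_5_1)

# Deletion anomaly: removing Mary Ruiz's row removes the only record of the Pre-Medicine level.
after_deletion = [row for row in table_5_1 if row[0] != "987-65-4321"]
after = level_of("Pre-Medicine", after_deletion)

# Insertion anomaly: the table cannot tell us whether Public Affairs is a major, or at what level.
public_affairs = level_of("Public Affairs", table_5_1)

print(before, after, public_affairs)
```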

    How do anomalies relate to normalization? The simple answer is that by arranging that the tables in a database

    are sufficiently normalized (in practice, this typically means to at least the 4th level of normalization), we can

    ensure that anomalies will not arise in our database. Anomalies are difficult to avoid directly, because with


    databases of typical complexity (i.e., several tables) the database designer can easily overlook possible

    problems. Normalization offers a rigorous way of avoiding unrecognized anomalies.

    Normalization may look like a difficult process when one views it from the standpoint of the formal definitions

    of the various normal forms, as presented in Section 2 of this handout. But in practice, you can easily attain
    sufficient normalization in your database by simply ensuring that the tables in your database are what we can

    call "single-theme" tables. This idea will be illustrated as we proceed through the rest of the discussion in this

    handout.

    Section 7. Turning a Table with Anomalies (Table 5.1) into Single-Theme Tables

    Although Table 5.1 is in 2NF, it is still open to the problems of insertion and deletion anomalies, as the
    discussion in the preceding section shows. The reason is that Table 5.1 deals with more than a single theme.

    What can we do to turn it into a set of tables that are, or at least come closer to being, single-theme tables?

    A reasonable way to proceed is to note that Table 5.1 deals with both information about students (their names

    and SSNs) and information about majors and levels. This should strike you as two different themes. Presented

    below is one possible set of single-theme tables dealing with the information in Table 5.1. (To save space, the
    following tables also contain some information that is not in Table 5.1, and the discussion will deal with this

    added information.)

    Table 7.1

    SSN          FirstName  LastName
    123-45-6789  Jack       Jones
    222-33-4444  Lynn       Lee
    987-65-4321  Mary       Ruiz
    123-54-3210  Lynn       Smith
    111-33-5555  Jane       Jones
    999-88-7777  Newton     Gingpoor

    Table 7.2

    Major           Level
    LIS             Graduate
    Pre-Medicine    Undergraduate
    Pre-Law         Undergraduate
    Public Affairs  Graduate

    Table 7.3

    SSN          Major
    123-45-6789  LIS
    222-33-4444  LIS
    987-65-4321  Pre-Medicine
    123-54-3210  Pre-Law
    111-33-5555  LIS

    The three preceding tables should strike you as providing a better arrangement of the information in Table 5.1.
    For one thing, this arrangement puts the information about the students into a smaller table, Table 7.1, which
    happily fails to contain redundant information about the LIS-Graduate pairing. For another thing, this
    arrangement permits us to enter information about students (e.g., Newton Gingpoor) who have not yet identified
    themselves as pursuing a particular major. For still another thing, it puts the information about the Major-Level
    pairings into a separate table, Table 7.2, which can easily be expanded to include information (e.g., that the
    Public Affairs major is at the Graduate level) about majors for which, at the moment, there may be no students
    registered. Finally, Table 7.3 provides the needed link between individual students and their majors (note that
    Newton Gingpoor's SSN is not in this Table 7.3, which tells us that he has not yet selected a major).

    Tables 7.1 - 7.3 are single-theme tables and are in 2NF, as you can easily verify. (In fact, they are in DKNF, but

    we are not yet ready to discuss the latter level in detail.)
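
    To see that nothing is lost by splitting Table 5.1 this way, the three tables can be loaded into a small
    database and joined back together. The following sketch uses Python's built-in sqlite3 module with an
    in-memory database. The table and column names (student, major, student_major, and so on) are assumptions
    chosen for this illustration; the handout itself refers to the tables only as Tables 7.1, 7.2, and 7.3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (ssn TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);  -- Table 7.1
    CREATE TABLE major (major TEXT PRIMARY KEY, level TEXT);                       -- Table 7.2
    CREATE TABLE student_major (                                                   -- Table 7.3
        ssn TEXT PRIMARY KEY REFERENCES student(ssn),  -- one major per student, as assumed in the text
        major TEXT REFERENCES major(major)
    );
""")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", [
    ("123-45-6789", "Jack", "Jones"),
    ("222-33-4444", "Lynn", "Lee"),
    ("987-65-4321", "Mary", "Ruiz"),
    ("123-54-3210", "Lynn", "Smith"),  # SSN as given in Tables 5.1 and 7.3
    ("111-33-5555", "Jane", "Jones"),
    ("999-88-7777", "Newton", "Gingpoor"),
])
conn.executemany("INSERT INTO major VALUES (?, ?)", [
    ("LIS", "Graduate"),
    ("Pre-Medicine", "Undergraduate"),
    ("Pre-Law", "Undergraduate"),
    ("Public Affairs", "Graduate"),
])
conn.executemany("INSERT INTO student_major VALUES (?, ?)", [
    ("123-45-6789", "LIS"),
    ("222-33-4444", "LIS"),
    ("987-65-4321", "Pre-Medicine"),
    ("123-54-3210", "Pre-Law"),
    ("111-33-5555", "LIS"),
])

# Joining the three single-theme tables reproduces the rows of Table 5.1.
rebuilt = conn.execute("""
    SELECT s.ssn, s.first_name, s.last_name, m.major, m.level
    FROM student AS s
    JOIN student_major AS sm ON sm.ssn = s.ssn
    JOIN major AS m ON m.major = sm.major
    ORDER BY s.ssn
""").fetchall()
for row in rebuilt:
    print(row)

# Removing Mary Ruiz no longer loses the fact that Pre-Medicine is an Undergraduate major.
conn.execute("DELETE FROM student_major WHERE ssn = '987-65-4321'")
conn.execute("DELETE FROM student WHERE ssn = '987-65-4321'")
premed_level = conn.execute(
    "SELECT level FROM major WHERE major = 'Pre-Medicine'"
).fetchone()[0]
print(premed_level)
```

    Newton Gingpoor does not appear in the joined result because he has no row in the linking table, and the
    Public Affairs major is recorded even though no student is registered in it.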

    Section 8. The 3rd Normal Form (3NF)

    In order to discuss the 3rd Normal Form, we need to begin by discussing the idea of transitive dependencies.

    In mathematics and logic, a transitive relationship is a relationship of the following form: "If A implies B, and if

    also B implies C, then A implies C." An example is: "If John Doe is a human, and if every human is a primate,

    then John Doe must be a primate." Another way of putting it is this: "If A functionally governs B, and if B
    functionally governs C, then A functionally governs C." In the arrow notation, we have:

    [(A → B) and (B → C)] → (A → C)


    The following table, Table 8.1, provides an example of how transitive dependencies can occur in a table in a

    relational database.

    Table 8.1

    Author Last Name  Author First Name  Book Title                             Subject               Collection or Library                       Building
    Berdahl           Robert             The Politics of the Prussian Nobility  History               PCL General Stacks                          Perry-Castañeda Library
    Yudof             Mark               Child Abuse and Neglect                Legal Procedures      Law Library                                 Townes Hall
    Harmon            Glynn              Human Memory and Knowledge             Cognitive Psychology  PCL General Stacks                          Perry-Castañeda Library
    Graves            Robert             The Golden Fleece                      Greek Literature      Classics Library                            Waggener Hall
    Miksa             Francis            Charles Ammi Cutter                    Library Biography     Library and Information Science Collection  Perry-Castañeda Library
    Hunter            David              Music Publishing and Collecting        Music Literature      Fine Arts Library                           Fine Arts Building
    Graves            Robert             English and Scottish Ballads           Folksong              PCL General Stacks                          Perry-Castañeda Library

    By examining Table 8.1 we can infer that books dealing with history, cognitive psychology, and folksong are
    assigned to the PCL General Stacks collection; that books dealing with legal procedures are assigned to the Law
    Library; that books dealing with Greek literature are assigned to the Classics Library; that books dealing with
    library biography are assigned to the Library and Information Science Collection (LISC); and that books dealing
    with music literature are assigned to the Fine Arts Library.

    Further, we can infer that the PCL General Stacks collection and the LISC are both housed in the
    Perry-Castañeda Library (PCL) building; that the Classics Library is housed in Waggener Hall; and that the Law
    Library and Fine Arts Library are housed, respectively, in Townes Hall and the Fine Arts Building.

    Thus we see that there is a transitive dependency in Table 8.1: any book that deals with history, cognitive
    psychology, or library biography will be physically housed in the PCL building (unless it is temporarily
    checked out to a borrower); any book dealing with legal procedures will be housed in Townes Hall; and so on.
    In short, if we know what subject a book deals with, we also know not only what library or collection it will be
    assigned to but also what building it is physically housed in.
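
    The chain of inferences just described can be traced in code. The following Python sketch is an illustration
    only; the variable names are assumptions made here. It reads the Subject, Collection or Library, and Building
    columns of Table 8.1, builds the two direct dependencies, composes them, and then confirms that the composed
    Subject → Building mapping agrees with every row of the table.

```python
# (Subject, Collection or Library, Building) columns of Table 8.1
table_8_1 = [
    ("History", "PCL General Stacks", "Perry-Castañeda Library"),
    ("Legal Procedures", "Law Library", "Townes Hall"),
    ("Cognitive Psychology", "PCL General Stacks", "Perry-Castañeda Library"),
    ("Greek Literature", "Classics Library", "Waggener Hall"),
    ("Library Biography", "Library and Information Science Collection", "Perry-Castañeda Library"),
    ("Music Literature", "Fine Arts Library", "Fine Arts Building"),
    ("Folksong", "PCL General Stacks", "Perry-Castañeda Library"),
]

# The two direct dependencies: Subject -> Collection and Collection -> Building.
subject_to_collection = {}
collection_to_building = {}
for subject, collection, building in table_8_1:
    subject_to_collection[subject] = collection
    collection_to_building[collection] = building

# Composing them gives the transitive dependency Subject -> Building.
subject_to_building = {
    subject: collection_to_building[collection]
    for subject, collection in subject_to_collection.items()
}

# The composed mapping should agree with the Building column in every row.
consistent = all(subject_to_building[s] == b for s, _, b in table_8_1)
print(subject_to_building["Legal Procedures"], consistent)
```

    Note that the building is never looked up from the subject directly: it is reached only by way of the
    collection, which is what makes the dependency transitive.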

    What is wrong with having a transitive dependency or dependencies in a table? For one thing, there isduplicated information: from three different rows we can see that the PCL General Stacks are in the PCL

    building. For another thing, we have possible deletion anomalies: if the Yudof book were lost and its row

    removed from Table 8.1, we would lose the information that books on legal procedures are assigned to the Law

  • 7/26/2019 Steps in Normalization

    10/25

    Library and also the information the Law Library is in Townes Hall. As a third problem, we have possible

    insertion anomalies: if we wanted to add a chemistry book to the table, we would find that Table 8.1 nowherecontains the fact that the Chemistry Library is in Robert A.Welch Hall. As a fourth problem, we have the

    chance of making errors in updating: a careless data-entry clerk might add a book to the LISC but mistakenly

    enter Townes Hall in the building column.
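
    The deletion anomaly described above can be made concrete with a small sketch. The Python below is only an illustration, not part of the original lesson: it holds the rows of Table 8.1 as tuples, and the helper name known_buildings is an assumption introduced here. It shows that removing the single Yudof row also removes the only record of where the Law Library is housed.

```python
# Table 8.1 rows: (author_last, author_first, title, subject, collection, building).
TABLE_8_1 = [
    ("Berdahl", "Robert", "The Politics of the Prussian Nobility",
     "History", "PCL General Stacks", "Perry-Castañeda Library"),
    ("Yudof", "Mark", "Child Abuse and Neglect",
     "Legal Procedures", "Law Library", "Townes Hall"),
    ("Harmon", "Glynn", "Human Memory and Knowledge",
     "Cognitive Psychology", "PCL General Stacks", "Perry-Castañeda Library"),
    ("Graves", "Robert", "The Golden Fleece",
     "Greek Literature", "Classics Library", "Waggener Hall"),
    ("Miksa", "Francis", "Charles Ammi Cutter",
     "Library Biography", "Library and Information Science Collection",
     "Perry-Castañeda Library"),
    ("Hunter", "David", "Music Publishing and Collecting",
     "Music Literature", "Fine Arts Library", "Fine Arts Building"),
    ("Graves", "Robert", "English and Scottish Ballads",
     "Folksong", "PCL General Stacks", "Perry-Castañeda Library"),
]


def known_buildings(rows):
    """Return the collection-to-building facts that the given rows still record."""
    return {row[4]: row[5] for row in rows}


before = known_buildings(TABLE_8_1)
# Simulate losing the Yudof book: its row is the only one that mentions the Law Library.
after = known_buildings([row for row in TABLE_8_1 if row[0] != "Yudof"])

print(before.get("Law Library"))  # the building is recorded while the row exists
print(after.get("Law Library"))   # the fact is no longer recorded after the deletion
```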

    The solution to the problem is, once again, to place the information in Table 8.1 into appropriate single-theme

    tables. Here is one such possible arrangement:

    Table 8.2

    Author Last Name | Author First Name | Book Title
    Berdahl | Robert | The Politics of the Prussian Nobility
    Yudof | Mark | Child Abuse and Neglect
    Harmon | Glynn | Human Memory and Knowledge
    Graves | Robert | The Golden Fleece
    Miksa | Francis | Charles Ammi Cutter
    Hunter | David | Music Publishing and Collecting
    Graves | Robert | English and Scottish Ballads

    Table 8.3

    Book Title | Subject
    The Politics of the Prussian Nobility | History
    Child Abuse and Neglect | Legal Procedures
    Human Memory and Knowledge | Cognitive Psychology
    The Golden Fleece | Greek Literature
    Charles Ammi Cutter | Library Biography
    Music Publishing and Collecting | Music Literature
    English and Scottish Ballads | Folksong


    Table 8.4

    Subject | Collection or Library
    History | PCL General Stacks
    Legal Procedures | Law Library
    Cognitive Psychology | PCL General Stacks
    Greek Literature | Classics Library
    Library Biography | Library and Information Science Collection
    Music Literature | Fine Arts Library
    Folksong | PCL General Stacks

    Table 8.5

    Collection or Library | Building
    PCL General Stacks | Perry-Castañeda Library
    Law Library | Townes Hall
    Classics Library | Waggener Hall
    Library and Information Science Collection | Perry-Castañeda Library
    Fine Arts Library | Fine Arts Building

    You can verify for yourself that none of these tables contains a transitive dependency; hence, all of them are in 3NF (and, in fact, in DKNF).
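
    As an informal check that nothing was lost in the split, the following Python sketch (an illustration only; the dictionary names and the function building_for_title are assumptions made here) stores Tables 8.3, 8.4, and 8.5 as lookups and follows the chain from title to building. It also records the Chemistry Library's building before any chemistry book exists, which the single-table layout could not do.

```python
# Table 8.3: Book Title -> Subject
SUBJECT_OF = {
    "The Politics of the Prussian Nobility": "History",
    "Child Abuse and Neglect": "Legal Procedures",
    "Human Memory and Knowledge": "Cognitive Psychology",
    "The Golden Fleece": "Greek Literature",
    "Charles Ammi Cutter": "Library Biography",
    "Music Publishing and Collecting": "Music Literature",
    "English and Scottish Ballads": "Folksong",
}

# Table 8.4: Subject -> Collection or Library
COLLECTION_FOR = {
    "History": "PCL General Stacks",
    "Legal Procedures": "Law Library",
    "Cognitive Psychology": "PCL General Stacks",
    "Greek Literature": "Classics Library",
    "Library Biography": "Library and Information Science Collection",
    "Music Literature": "Fine Arts Library",
    "Folksong": "PCL General Stacks",
}

# Table 8.5: Collection or Library -> Building
BUILDING_OF = {
    "PCL General Stacks": "Perry-Castañeda Library",
    "Law Library": "Townes Hall",
    "Classics Library": "Waggener Hall",
    "Library and Information Science Collection": "Perry-Castañeda Library",
    "Fine Arts Library": "Fine Arts Building",
}


def building_for_title(title):
    """Follow Title -> Subject -> Collection -> Building across the three tables."""
    return BUILDING_OF[COLLECTION_FOR[SUBJECT_OF[title]]]


# No insertion anomaly: a library's building can be stored
# even though no book has been assigned to that library yet.
BUILDING_OF["Chemistry Library"] = "Robert A. Welch Hall"

print(building_for_title("Child Abuse and Neglect"))
print(building_for_title("Charles Ammi Cutter"))
```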

    We can note in passing that the fact that Table 8.2 contains the first and last names of Robert Graves in two different rows suggests that it might be worthwhile to replace it with two further tables, along the lines of:

    Table 8.6

    Author Last Name | Author First Name | Author Identification Number
    Berdahl | Robert | 001
    Yudof | Mark | 002
    Harmon | Glynn | 003
    Graves | Robert | 004
    Miksa | Francis | 005
    Hunter | David | 006

    Table 8.7

    Author Identification Number | Book Title
    001 | The Politics of the Prussian Nobility
    002 | Child Abuse and Neglect
    003 | Human Memory and Knowledge
    004 | The Golden Fleece
    005 | Charles Ammi Cutter
    006 | Music Publishing and Collecting
    004 | English and Scottish Ballads

    Though Tables 8.6 and 8.7 together take a little more space than Table 8.2, it is easy to see that given a much larger collection, in which there would be many more authors with multiple works to their credit, Tables 8.6 and 8.7 would be more economical of storage space than Table 8.2. Furthermore, the structure of Tables 8.6 and 8.7 lessens the chance of making updating errors (e.g., typing Grave instead of Graves, or Miska instead of Miksa).
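
    A brief sketch may help show why the two-table arrangement reduces updating errors. The Python below is an illustration under assumptions of our own (the names AUTHORS, BOOKS, and books_with_authors do not come from the lesson): each author's name is typed once in Table 8.6, and the listing of Table 8.2 is produced by matching identification numbers.

```python
# Table 8.6: Author Identification Number -> (last name, first name)
AUTHORS = {
    "001": ("Berdahl", "Robert"),
    "002": ("Yudof", "Mark"),
    "003": ("Harmon", "Glynn"),
    "004": ("Graves", "Robert"),
    "005": ("Miksa", "Francis"),
    "006": ("Hunter", "David"),
}

# Table 8.7: (Author Identification Number, Book Title)
BOOKS = [
    ("001", "The Politics of the Prussian Nobility"),
    ("002", "Child Abuse and Neglect"),
    ("003", "Human Memory and Knowledge"),
    ("004", "The Golden Fleece"),
    ("005", "Charles Ammi Cutter"),
    ("006", "Music Publishing and Collecting"),
    ("004", "English and Scottish Ballads"),
]


def books_with_authors(authors, books):
    """Join Table 8.7 to Table 8.6 on the author identification number."""
    return [
        (authors[author_id][0], authors[author_id][1], title)
        for author_id, title in books
    ]


# Both Graves titles draw the name from the single stored record "004".
for last, first, title in books_with_authors(AUTHORS, BOOKS):
    print(f"{last}, {first}: {title}")
```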

    Section 9. The Boyce-Codd Normal Form (BCNF)

    The Boyce-Codd Normal Form (BCNF) deals with the anomalies that can occur when a table fails to have the property that every determinant is a candidate key. Here is an example, Table 9.1, that fails to have this property. (In Table 9.1 the SSNs are to be interpreted as those of students with the stated majors and advisers. Note that each of students 123-45-6789 and 987-65-4321 has two majors, with a different adviser for each major.)

    Table 9.1

    SSN | Major | Adviser
    123-45-6789 | Library and Information Science | Dewey
    123-45-6789 | Public Affairs | Roosevelt
    222-33-4444 | Library and Information Science | Putnam
    555-12-1212 | Library and Information Science | Dewey
    987-65-4321 | Pre-Medicine | Semmelweis
    987-65-4321 | Biochemistry | Pasteur
    123-54-3210 | Pre-Law | Hammurabi

    We begin by showing that Table 9.1 lacks the required property, viz., that every determinant be a candidate key.

    What are the determinants in Table 9.1? One determinant is the pair of attributes, SSN and Major. Each distinct pair of values of SSN and Major determines a unique value for the attribute, Adviser. Another determinant is the pair, SSN and Adviser, which determines unique values of the attribute, Major. Still another determinant is the attribute, Adviser, for each different value of Adviser determines a unique value of the attribute, Major. (These observations about Table 9.1 correspond to the real-world facts that each student has a single adviser for each of his or her majors, and each adviser advises in just one major.)

    Now we need to examine these three determinants with respect to the question of whether they are candidate keys. The answer is that the pair, SSN and Major, is a candidate key, for each such pair uniquely identifies a row in Table 9.1. In similar fashion, the pair, SSN and Adviser, is a candidate key. But the determinant, Adviser, is not a candidate key, because the value Dewey occurs in two rows of the Adviser column. So Table 9.1 fails to meet the condition that every determinant in it be a candidate key.

    It is easy to check on the anomalies in Table 9.1. For example, if student 987-65-4321 were to leave Enormous State University, the table would lose the information that Semmelweis is an adviser for the Pre-Medicine major. As another example, Table 9.1 has no information about advisers for students majoring in history.

    As usual, the solution lies in constructing single-theme tables containing the information in Table 9.1. Here are two tables that will do the job.

    Table 9.2

    SSN | Adviser
    123-45-6789 | Dewey
    123-45-6789 | Roosevelt
    222-33-4444 | Putnam
    555-12-1212 | Dewey
    987-65-4321 | Semmelweis
    987-65-4321 | Pasteur
    123-54-3210 | Hammurabi

    Table 9.3

    Major | Adviser
    Library and Information Science | Dewey
    Public Affairs | Roosevelt
    Library and Information Science | Putnam
    Pre-Medicine | Semmelweis
    Biochemistry | Pasteur
    Pre-Law | Hammurabi
    History | Herodotus

    By way of an example of the value of separating Table 9.1 into single-theme tables, Table 9.3 includes information about at least one faculty member at ESU who could be the adviser of a student who wanted to major in history.

    Tables 9.2 and 9.3 are in BCNF (in fact, they are in DKNF), since every determinant in them is also a candidate key. You can easily verify this statement if you note that the key in Table 9.2 is a composite key, SSN and Adviser.
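
    The determinant and candidate-key checks made in this section can be repeated mechanically on the sample rows. The Python sketch below is only an illustration: the function names determines and identifies_rows are assumptions introduced here, and the checks apply to the seven rows of Table 9.1 rather than to every possible state of the table, which is what a functional dependency really asserts. The second function uses the same test as the text, namely whether the columns uniquely identify a row.

```python
TABLE_9_1 = [
    {"SSN": "123-45-6789", "Major": "Library and Information Science", "Adviser": "Dewey"},
    {"SSN": "123-45-6789", "Major": "Public Affairs", "Adviser": "Roosevelt"},
    {"SSN": "222-33-4444", "Major": "Library and Information Science", "Adviser": "Putnam"},
    {"SSN": "555-12-1212", "Major": "Library and Information Science", "Adviser": "Dewey"},
    {"SSN": "987-65-4321", "Major": "Pre-Medicine", "Adviser": "Semmelweis"},
    {"SSN": "987-65-4321", "Major": "Biochemistry", "Adviser": "Pasteur"},
    {"SSN": "123-54-3210", "Major": "Pre-Law", "Adviser": "Hammurabi"},
]


def determines(rows, lhs, rhs):
    """True if equal values in the lhs columns always go with equal values in the rhs columns."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def identifies_rows(rows, columns):
    """True if no two rows share the same values in the given columns."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    return len(keys) == len(set(keys))


CHECKS = [
    (("SSN", "Major"), ("Adviser",)),
    (("SSN", "Adviser"), ("Major",)),
    (("Adviser",), ("Major",)),
]
for lhs, rhs in CHECKS:
    print(lhs, "->", rhs,
          "| is a determinant:", determines(TABLE_9_1, lhs, rhs),
          "| identifies rows:", identifies_rows(TABLE_9_1, lhs))
```

    On these rows, Adviser is reported as a determinant of Major that does not identify rows, which is the BCNF failure discussed above.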

    Section 10. The 4th Normal Form (4NF)

    The 4th Normal Form is concerned with the anomalies that can occur when a table fails to have the property of containing no multivalued dependencies (i.e., the anomalies that can occur when a table does have such dependencies). We develop below a table that has these undesirable multivalued dependencies.

    Suppose we have some information about the hobbies of some students at Enormous State University and want to put this information into a database. Suppose, in particular, that Jack Jones's hobbies are surfing the Internet and playing chess; Lynn Lee's, photography and stamp collecting; Mary Ruiz's, surfing the Internet and photography; and Lynn Smith's, playing poker.

    If we (foolishly) try to put all this information into just one table, here is what we get.


    Table 10.1

    LastName | Major | Hobby
    Jones | Library and Information Science | Surfing the Internet
    Jones | Library and Information Science | Chess
    Jones | Public Affairs | Surfing the Internet
    Jones | Public Affairs | Chess
    Lee | Library and Information Science | Photography
    Lee | Library and Information Science | Stamp collecting
    Ruiz | Pre-Medicine | Surfing the Internet
    Ruiz | Pre-Medicine | Photography
    Ruiz | Biochemistry | Surfing the Internet
    Ruiz | Biochemistry | Photography
    Smith | Pre-Law | Playing poker

    The problem is that Jack Jones, for example, has two majors and two hobbies. If we coupled each of his majors with just one of his hobbies (e.g., LIS with chess, or Public Affairs with surfing the Internet), we would imply that Jack plays chess only as an LIS major and surfs the Internet only as a Public Affairs major. This would not make sense. (Note that in this relatively small and simple example, it is obvious that such restrictive pairing does not make sense. In practice, however, the problems arise in connection with much larger tables, where it may be very difficult to detect that restrictive pairing has occurred.) To avoid such false implications, we enter all pairings of majors and hobbies for all the students. Obviously, however, this approach has the problem of redundant information. Equally obviously, updating this table presents anomalies; for example, you can work out for yourself what would have to be added to Table 10.1 if Jones took up tennis as a third hobby.

    This situation is an example of the effects of multivalued dependencies. A multivalued dependency occurs when (a) a table has at least three attributes, (b) two of the attributes are multivalued, and (c) the values of the multivalued attributes depend on only one of the remaining attributes. Table 10.1 fits these specifications for the following reasons: The LastName attribute determines multiple values of the attributes Major and Hobby, but neither of these latter attributes depends on the other; they are independent.

    The notation for multivalued dependency is a double arrow. In this example, we can write: LastName →→ Major, and LastName →→ Hobby. We read these expressions as, "LastName multidetermines Major" and "LastName multidetermines Hobby."

    Once again, single-theme tables provide the solution. We break Table 10.1 down into the following tables.

    Table 10.2

    LastName | Major
    Jones | Library and Information Science
    Jones | Public Affairs
    Lee | Library and Information Science
    Ruiz | Pre-Medicine
    Ruiz | Biochemistry
    Smith | Pre-Law

    Table 10.3

    LastName | Hobby
    Jones | Surfing the Internet
    Jones | Chess
    Lee | Photography
    Lee | Stamp collecting
    Ruiz | Surfing the Internet
    Ruiz | Photography
    Smith | Playing poker

    Tables 10.2 and 10.3 display, separately, the various students' majors and hobbies; and while doing so, these tables correctly avoid suggesting any connections between particular majors and particular hobbies.
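
    One way to convince yourself that the split loses nothing is to recombine the two tables. The Python sketch below is an illustration only (the names MAJORS, HOBBIES, and join_on_last_name are assumptions made here): pairing every major of a student with every hobby of the same student regenerates the all-pairings rows of Table 10.1, and adding tennis for Jones takes one new row in the hobby table instead of one new row per major.

```python
# Table 10.2: (LastName, Major)
MAJORS = [
    ("Jones", "Library and Information Science"),
    ("Jones", "Public Affairs"),
    ("Lee", "Library and Information Science"),
    ("Ruiz", "Pre-Medicine"),
    ("Ruiz", "Biochemistry"),
    ("Smith", "Pre-Law"),
]

# Table 10.3: (LastName, Hobby)
HOBBIES = [
    ("Jones", "Surfing the Internet"),
    ("Jones", "Chess"),
    ("Lee", "Photography"),
    ("Lee", "Stamp collecting"),
    ("Ruiz", "Surfing the Internet"),
    ("Ruiz", "Photography"),
    ("Smith", "Playing poker"),
]


def join_on_last_name(majors, hobbies):
    """Pair every major of a student with every hobby of the same student."""
    return [
        (name, major, hobby)
        for name, major in majors
        for hobby_name, hobby in hobbies
        if hobby_name == name
    ]


table_10_1 = join_on_last_name(MAJORS, HOBBIES)
print(len(table_10_1), "rows regenerated from", len(MAJORS) + len(HOBBIES), "stored rows")

# Jones takes up tennis: a single stored row is added to the hobby table.
with_tennis = join_on_last_name(MAJORS, HOBBIES + [("Jones", "Tennis")])
print(len(with_tennis) - len(table_10_1), "extra rows appear in the combined view")
```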

    Section 11. The 5th Normal Form (5NF) and the Domain-Key Normal Form (DKNF)

    The 5th Normal Form is difficult to illustrate in terms of relatively simple examples. Hence, we will not attempt to illustrate the 5NF property of having every join dependency in the table be a consequence of the candidate keys of the table. This omission is a minor one, for at least two reasons: First, in practice the 4NF is often regarded as sufficient; and second, the Domain-Key Normal Form (DKNF) subsumes the 5NF.

    The DKNF is important because it offers a complete solution to the problem of avoiding anomalies: A set of tables (relations) that is in DKNF is known, as a consequence of a theorem proved by Ronald Fagin in 1981, to be free of anomalies. We do not attempt here to reproduce the proof of Fagin's theorem but merely to illustrate how the theorem can be applied in practice.

    The DKNF definition is this: A relation is in DKNF if every constraint on the relation is a logical consequence of the definitions of keys and domains. To understand what this definition means, we begin by noting that the central ideas are embodied in the words "constraint," "key," and "domain." By "key" Fagin means both primary keys and candidate keys. By "domain" Fagin means the set of definitions of the contents of attributes (columns) and any limitations on the kind of data to be stored in the columns, such as a limitation to only numeric data or only logical data; in addition, domain limitations may include such matters as the format (e.g., a limitation on numeric data to being expressed to exactly two decimal digits). By "constraint" Fagin means any rule dealing with attributes that is clear enough so that one can decide whether the rule is upheld or broken by any set of the data with which one is dealing.
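
    To make the notions of "domain" and "key" a little more tangible, here is a small Python sketch. It is an illustration under assumptions of our own, not Fagin's formalism: the function constraint_violations, the (Major, Level) example, and the particular domain rules are all hypothetical. It simply checks each row against per-column domain rules and checks that the key columns are unique.

```python
def constraint_violations(rows, domains, key):
    """List domain and key violations for rows given as dicts.

    domains maps each column name to a predicate over values;
    key is the tuple of column names whose values must be unique.
    """
    problems = []
    seen_keys = set()
    for number, row in enumerate(rows, start=1):
        for column, allowed in domains.items():
            if not allowed(row[column]):
                problems.append(f"row {number}: {column}={row[column]!r} is outside its domain")
        key_value = tuple(row[column] for column in key)
        if key_value in seen_keys:
            problems.append(f"row {number}: duplicate key {key_value!r}")
        seen_keys.add(key_value)
    return problems


# Hypothetical domain definitions for a two-column (Major, Level) table.
DOMAINS = {
    "Major": lambda value: isinstance(value, str) and value != "",
    "Level": lambda value: value in {"Graduate", "Undergraduate"},
}
KEY = ("Major",)

good_rows = [
    {"Major": "LIS", "Level": "Graduate"},
    {"Major": "Pre-Law", "Level": "Undergraduate"},
]
# The extra row breaks both the Level domain and the uniqueness of the key.
bad_rows = good_rows + [{"Major": "LIS", "Level": "Postdoctoral"}]

print(constraint_violations(good_rows, DOMAINS, KEY))
for problem in constraint_violations(bad_rows, DOMAINS, KEY):
    print(problem)
```

    In a table of this shape, the rule that each major has one level needs no separate check: it follows from Major being the key.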

    There is an important qualification to be attached to the DKNF definition as presented in the preceding paragraph. Fagin excludes constraints that are time-dependent or relate to changes made in data values. That means that a time-dependent constraint (or other constraint on changes in value) may exist in a table and may fail to be a logical consequence of the definitions of keys and domains, yet the table may nevertheless be in DKNF.

    As an illustration, some states have a property-tax rule specifying that the assessed value of the primary-residence property owned by a citizen over 65 cannot be increased above the value that was assessed in the year in which the property owner turned 65. The existence of such a rule would not, in itself, prevent a table of properties and their assessed values from being in DKNF.

    Achieving DKNF amounts to establishing a set of tables in each of which the constraints follow logically from (i.e., are logical consequences of) the keys and the domain definitions. Although there is no direct procedure for converting an arbitrary table into one or more tables each of which is in DKNF, in practice the effort to replace an arbitrary table by a set of single-theme tables achieves the goal. To show this, we consider some of the previous examples from the DKNF point of view.

    Section 11.1. Converting a Table with Partial Dependencies into DKNF Tables

    Here once again is the table, Table 4.2, that we used in our discussion of the problem of partial dependencies. Since we are going to use it here, we name this copy of it Table 11.1.1.

    Table 11.1.1

    FirstName | LastName | Major | Level
    Jack | Jones | LIS | Graduate
    Lynn | Lee | LIS | Graduate
    Mary | Ruiz | Pre-Medicine | Undergraduate
    Lynn | Smith | Pre-Law | Undergraduate
    Jane | Jones | LIS | Graduate

    Let us consider Table 11.1.1 from the DKNF point of view. First, we see that the key is composite, consisting of the LastName-FirstName pair of attributes. We see also that all other attributes in the table are dependent on this key. But there is another significant aspect to this table: the Level attribute is dependent on the LastName attribute, i.e., Level is dependent on just part of the key. (As noted earlier, this partial dependency is contrived, but nevertheless it illustrates the problem of partial dependency.) Because Level is dependent on just LastName, the table fails to be one in which all constraints are logical consequences of the key; hence, Table 11.1.1 is not in DKNF.

    From the DKNF point of view, therefore, we see that we should take the Level attribute out of Table 11.1.1 and put it in some other table, or tables, where it will be a logical consequence of the keys and domains. Clearly, a table that associates just the attributes Major and Level will achieve this.

    We will also need a table that provides the necessary link between the paired attributes, FirstName and LastName, and the attribute Major. In such a table, the attribute Major will be a logical consequence of the keys and domains.

    Thus it appears that we need two tables, one containing just Major and Level, and the other containing FirstName, LastName, and Major. We can indicate this more briefly as Table A: (Major, Level) and Table B: (FirstName, LastName, Major).

    Here are the tables.

    Table 11.1.2 (Table A as described above)

    Major | Level
    LIS | Graduate
    Pre-Medicine | Undergraduate
    Pre-Law | Undergraduate

    Table 11.1.3 (Table B as described above)

    FirstName | LastName | Major
    Jack | Jones | LIS
    Lynn | Lee | LIS
    Mary | Ruiz | Pre-Medicine
    Lynn | Smith | Pre-Law
    Jane | Jones | LIS

    These are single-theme tables, and we arrived at them by steps aimed at achieving DKNF.
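
    The partial dependency that motivated this split can also be looked for mechanically. The Python sketch below is an illustration only, with names of our own choosing (determines, partial_dependencies), and it inspects just the five sample rows: it tries every proper subset of the composite key and reports those that, on these rows, determine Level by themselves.

```python
from itertools import combinations

TABLE_11_1_1 = [
    {"FirstName": "Jack", "LastName": "Jones", "Major": "LIS", "Level": "Graduate"},
    {"FirstName": "Lynn", "LastName": "Lee", "Major": "LIS", "Level": "Graduate"},
    {"FirstName": "Mary", "LastName": "Ruiz", "Major": "Pre-Medicine", "Level": "Undergraduate"},
    {"FirstName": "Lynn", "LastName": "Smith", "Major": "Pre-Law", "Level": "Undergraduate"},
    {"FirstName": "Jane", "LastName": "Jones", "Major": "LIS", "Level": "Graduate"},
]

TABLE_11_1_2 = [
    {"Major": "LIS", "Level": "Graduate"},
    {"Major": "Pre-Medicine", "Level": "Undergraduate"},
    {"Major": "Pre-Law", "Level": "Undergraduate"},
]


def determines(rows, lhs, rhs):
    """True if equal values in the lhs columns always go with equal values in the rhs columns."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def partial_dependencies(rows, key, attribute):
    """Return the proper subsets of key that, on these rows, determine attribute alone."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            if determines(rows, subset, (attribute,)):
                found.append(subset)
    return found


# Level depends on part of the composite key in the original table ...
print(partial_dependencies(TABLE_11_1_1, ("FirstName", "LastName"), "Level"))
# ... but a single-column key has no proper subsets, so none can arise in Table 11.1.2.
print(partial_dependencies(TABLE_11_1_2, ("Major",), "Level"))
```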

    Section 11.2. Converting a Table with Transitive Dependencies into DKNF Tables

    Here once again is the table, Table 8.1, that we used in our discussion of transitive dependencies. Since we are going to use it here, we name this copy of it Table 11.2.1.

    Table 11.2.1

    Author Last Name | Author First Name | Book Title | Subject | Collection or Library | Building
    Berdahl | Robert | The Politics of the Prussian Nobility | History | PCL General Stacks | Perry-Castañeda Library
    Yudof | Mark | Child Abuse and Neglect | Legal Procedures | Law Library | Townes Hall
    Harmon | Glynn | Human Memory and Knowledge | Cognitive Psychology | PCL General Stacks | Perry-Castañeda Library
    Graves | Robert | The Golden Fleece | Greek Literature | Classics Library | Waggener Hall
    Miksa | Francis | Charles Ammi Cutter | Library Biography | Library and Information Science Collection | Perry-Castañeda Library
    Hunter | David | Music Publishing and Collecting | Music Literature | Fine Arts Library | Fine Arts Building
    Graves | Robert | English and Scottish Ballads | Folksong | PCL General Stacks | Perry-Castañeda Library

    You will recall from the discussion of this table as Table 8.1 that it exhibits the following transitive dependencies: Book Title → Subject, Subject → Collection-Library, and Collection-Library → Building. From the DKNF point of view, this means that the primary key, Book Title, is not the only thing that determines the Collection-Library attribute and the Building attribute. In turn, this means that there are constraints that are not logical consequences of the key and, hence, that the table is not in DKNF.

    Reasoning from the DKNF point of view, we would like to have a table in which the Building attribute is a logical consequence of the key; constructing a table containing the Collection-Library and Building attributes, with Collection-Library as key, will accomplish that. Again from the DKNF point of view, we would like to have a table in which the Collection-Library attribute is a logical consequence of the key; clearly, a table containing Subject (as key) and Collection-Library suffices. The same point of view leads us to desire a table in which the Author First Name and Author Last Name attributes will be logical consequences of the key; such a table is one that contains Book Title (as key), Author First Name, and Author Last Name. Finally, a table that contains Book Title (as key) and Subject will be (1) a table in which the attribute Subject will be a logical consequence of the key and (2) a table that provides the necessary connection between Title and Subject.

    Thus from the DKNF point of view, we are led to the same tables as previously:

    Table 11.2.2

    Author Last Name | Author First Name | Book Title
    Berdahl | Robert | The Politics of the Prussian Nobility
    Yudof | Mark | Child Abuse and Neglect
    Harmon | Glynn | Human Memory and Knowledge
    Graves | Robert | The Golden Fleece
    Miksa | Francis | Charles Ammi Cutter
    Hunter | David | Music Publishing and Collecting
    Graves | Robert | English and Scottish Ballads

    Table 11.2.3

    Book Title | Subject
    The Politics of the Prussian Nobility | History
    Child Abuse and Neglect | Legal Procedures
    Human Memory and Knowledge | Cognitive Psychology
    The Golden Fleece | Greek Literature
    Charles Ammi Cutter | Library Biography
    Music Publishing and Collecting | Music Literature
    English and Scottish Ballads | Folksong

    Table 11.2.4

    Subject | Collection or Library
    History | PCL General Stacks
    Legal Procedures | Law Library
    Cognitive Psychology | PCL General Stacks
    Greek Literature | Classics Library
    Library Biography | Library and Information Science Collection
    Music Literature | Fine Arts Library
    Folksong | PCL General Stacks

    Table 11.2.5

    Collection or Library | Building
    PCL General Stacks | Perry-Castañeda Library
    Law Library | Townes Hall
    Classics Library | Waggener Hall
    Library and Information Science Collection | Perry-Castañeda Library
    Fine Arts Library | Fine Arts Building

    These are the tables presented in Section 8 as single-theme tables that solved the transitive-dependency problem of Table 8.1. Here we have arrived at these same tables by considering how the information in Table 11.2.1 (the same information as in Table 8.1) should be re-arranged from the DKNF point of view.

    Section 11.3. Converting into DKNF a Table in Which Not Every Determinant Is a Candidate Key

    Here is the table, Table 9.1, that we used earlier to illustrate the problem of a table in which not every determinant is a candidate key. Since we are going to use it here, we name this copy of it Table 11.3.1.

    Table 11.3.1

    SSN | Major | Adviser
    123-45-6789 | Library and Information Science | Dewey
    123-45-6789 | Public Affairs | Roosevelt
    222-33-4444 | Library and Information Science | Putnam
    555-12-1212 | Library and Information Science | Dewey
    987-65-4321 | Pre-Medicine | Semmelweis
    987-65-4321 | Biochemistry | Pasteur
    123-54-3210 | Pre-Law | Hammurabi

    You will recall from the discussion of this table as Table 9.1 that one determinant is the pair of attributes, SSN and Major, which determines Adviser; another determinant is the pair, SSN and Adviser, which determines Major; and still another is Adviser alone, which also determines Major. And you will recall that the candidate keys are the pairs, SSN-Major and SSN-Adviser. The third determinant, Adviser, is not a candidate key.

    From the DKNF point of view, we reason as follows: If we choose SSN-Adviser as the key, then Major is determined by, and hence is a logical consequence of, this key. If, instead, we choose SSN-Major as the key, then Adviser is determined by, and hence is a logical consequence of, this alternative key. But in either case, the third constraint, viz., that Adviser determines Major, is not a logical consequence of the key. Hence, the table is not in DKNF.

    In order to move from this table to a set of tables in DKNF, we can argue, from the DKNF point of view, that we need to move Major into a table in which it will be a logical consequence of the key. Such a table would obviously need to have Adviser as the key. If we put Adviser and Major into such a table, then we will need at least one other table, viz., a table that provides the necessary link between SSN and Adviser, so that we will know who each student's adviser is. Once we have put SSN and Adviser into such a table, there is nothing further that needs to be done.

    Here are the tables.

    Table 11.3.2

    Major | Adviser
    Library and Information Science | Dewey
    Public Affairs | Roosevelt
    Library and Information Science | Putnam
    Pre-Medicine | Semmelweis
    Biochemistry | Pasteur
    Pre-Law | Hammurabi
    History | Herodotus

    Table 11.3.3

    SSN | Adviser
    123-45-6789 | Dewey
    123-45-6789 | Roosevelt
    222-33-4444 | Putnam
    555-12-1212 | Dewey
    987-65-4321 | Semmelweis
    987-65-4321 | Pasteur
    123-54-3210 | Hammurabi

    These are the tables presented in Section 9 as single-theme tables that solved the failure of Table 9.1 to be in Boyce-Codd Normal Form. Here we have arrived at these same tables by considering how the information in Table 11.3.1 (the same information as in Table 9.1) should be re-arranged from the DKNF point of view.

    Section 11.4. Converting a Table with Multivalued Dependencies into DKNF

    Here is the table,Table 10.1,that we used previously to illustrate the problem of multivalued dependencies.

    Since we going to use it here, we name this copy of it Table 11.4.1.

    Table 11.4.1

    If we analyze Table 11.4.1 from theDKNF point of view, the first thing we see

    is that the key in the table is composite. It

    is the triple, LastName-Major-Hobby. But

    in an intuitive sense, the natural keywould be just LastName, since we know

    that there are just four students involvedand that we are trying to present data

    about their majors and their hobbies.

    The complications arise because some of

    the students have more than one major

    and/or more than one hobby. Another wayof putting it is that the complications of

    the table arise from the fact that we are

    trying to display, in just one table, moreinformation than it is practicable todisplay in a single table.

    From the DKNF point of view, we havetwo constraints. One constraint concerns

    the natural key, LastName, and the

    attribute, Major. If we set up one table thathouses these attributes, then the constraint

    on Major will be a logical consequence of

    the key, LastName. The other constraint

    concerns the natural key, LastName, andthe attribute, Hobby. If we set up a second

    table that houses these attributes, then the

    constraint on Hobby will be a logicalconsequence of the key, LastName. Having set up these two tables, we will find that there is nothing further to

    be done.

    Here are the tables.

    Table 11.4.2

    LastName Major Hobby

    Jones Library and Information

    Science

    Surfing the Internet

    Jones Library and Information

    Science

    Chess

    Jones Public Affairs Surfing the Internet

    Jones Public Affairs Chess

    Lee Library and InformationScience

    Photography

    Lee Library and Information

    Science

    Stamp collecting

    Ruiz Pre-Medicine Surfing the Internet

    Ruiz Pre-Medicine Photography

    Ruiz Biochemistry Surfing the Internet

    Ruiz Biochemistry Photography

    Smith Pre-Law Playing poker


    LastName Major

    Jones Library and Information Science

    Jones Public Affairs

    Lee Library and Information Science

    Ruiz Pre-Medicine

    Ruiz Biochemistry

    Smith Pre-Law

    Table 11.4.3

    LastName Hobby

    Jones Surfing the Internet

    Jones Chess

    Lee Photography

    Lee Stamp collecting

    Ruiz Surfing the Internet

    Ruiz Photography

    Smith Playing poker

    These are the tables presented in Section 10 as single-theme tables that solved the failure of Table 10.1 to be in 4NF. Here we have arrived at these same tables by considering how the information in Table 11.4.1 (the same information as in Table 10.1) should be re-arranged from the DKNF point of view.
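One way to check that nothing was lost in this re-arrangement is to join the two single-theme tables on LastName and confirm that the result projects back onto the same two tables. The following is a minimal Python sketch of that check, not part of the original handout; the names `student_majors`, `student_hobbies`, and `natural_join` are hypothetical, and the rows are copied from the LastName/Major and LastName/Hobby tables above.

```python
# Rows of the two single-theme tables, copied from the tables above.
# The variable and function names are illustrative assumptions.
student_majors = {
    ("Jones", "Library and Information Science"),
    ("Jones", "Public Affairs"),
    ("Lee", "Library and Information Science"),
    ("Ruiz", "Pre-Medicine"),
    ("Ruiz", "Biochemistry"),
    ("Smith", "Pre-Law"),
}
student_hobbies = {
    ("Jones", "Surfing the Internet"),
    ("Jones", "Chess"),
    ("Lee", "Photography"),
    ("Lee", "Stamp collecting"),
    ("Ruiz", "Surfing the Internet"),
    ("Ruiz", "Photography"),
    ("Smith", "Playing poker"),
}


def natural_join(majors, hobbies):
    """Join on LastName: pair every major of a student with every hobby."""
    return {
        (name, major, hobby)
        for (name, major) in majors
        for (other_name, hobby) in hobbies
        if name == other_name
    }


combined = natural_join(student_majors, student_hobbies)

# Project the combined table back onto (LastName, Major) and (LastName, Hobby).
majors_again = {(name, major) for (name, major, _hobby) in combined}
hobbies_again = {(name, hobby) for (name, _major, hobby) in combined}

print(len(combined))                      # 11
print(majors_again == student_majors)     # True
print(hobbies_again == student_hobbies)   # True
```

The join yields 11 rows (4 for Jones, 2 for Lee, 4 for Ruiz, 1 for Smith), so the 13 rows held in the two smaller tables carry the same information as the three-column table.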

    Section 11.5. Single-Theme Tables and the DKNF

    What has the preceding discussion shown us?

    We have seen that when we analyze, from the DKNF point of view, tables with various kinds of problems, we

    find--again and again--that the solutions to the problems consist in turning a complicated, multi-theme table into

    sets of single-theme tables, tables which satisfy the requirements of the DKNF. If, on the other hand, we analyze


    a complicated, problem-laden table from the point of view of turning it into a set of single-theme tables, we

    thereby achieve--again and again--a set of tables that satisfy the requirements of the DKNF.

    In short, sets of single-theme tables will almost always be sets of tables in DKNF and, as such, will be sets of

    tables that avoid the various kinds of anomalies that we want to avoid.
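As a small illustration of one such anomaly, the Python sketch below (not from the original handout) counts how many rows must be written to record one new hobby for a student. The rows for Ruiz are taken from the three-column table above; the new hobby "Chess" for Ruiz and the function names are hypothetical, chosen only for the example.

```python
# Ruiz's rows in the multi-theme (LastName, Major, Hobby) table above.
multi_theme = [
    ("Ruiz", "Pre-Medicine", "Surfing the Internet"),
    ("Ruiz", "Pre-Medicine", "Photography"),
    ("Ruiz", "Biochemistry", "Surfing the Internet"),
    ("Ruiz", "Biochemistry", "Photography"),
]


def rows_to_add_multi_theme(table, name, new_hobby):
    """In the multi-theme table, the new hobby must be repeated once per
    major; otherwise it would falsely appear to be tied to a single major."""
    majors = sorted({major for (n, major, _hobby) in table if n == name})
    return [(name, major, new_hobby) for major in majors]


def rows_to_add_single_theme(name, new_hobby):
    """In the single-theme (LastName, Hobby) table, one row is enough."""
    return [(name, new_hobby)]


# Hypothetical update: Ruiz takes up chess.
multi_rows = rows_to_add_multi_theme(multi_theme, "Ruiz", "Chess")
single_rows = rows_to_add_single_theme("Ruiz", "Chess")

print(len(multi_rows))   # 2
print(len(single_rows))  # 1
```

With the multi-theme table, forgetting either of the two rows leaves the data inconsistent; with single-theme tables, that mistake is not possible.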

    Let your motto be:

    Strive for Single-Theme Tables