Advanced Data Structures Notes

  • 8/11/2019 Advanced Data Structures Notes

    We proceed by comparing successive characters of W to "parallel" characters of S, moving from one to the next if they match. However, in the fourth step, we get S[3] is a space and W[3] = 'D', a mismatch. Rather than beginning to search again at S[1], we note that no 'A' occurs between positions 0 and 3 in S except at 0; hence, having checked all those characters previously, we know there is no chance of finding the beginning of a match if we check them again. Therefore we move on to the next character, setting m = 4 and i = 0.

                     1         2
        m: 01234567890123456789012
        S: ABC ABCDAB ABCDABCDABDE
        W:     ABCDABD
        i:     0123456

    We quickly obtain a nearly complete match "ABCDAB" when, at W[6] (S[10]), we again have a discrepancy. However, just prior to the end of the current partial match, we passed an "AB" which could be the beginning of a new match, so we must take this into consideration. As we already know that these characters match the two characters prior to the current position, we need not check them again; we simply reset m = 8, i = 2 and continue matching the current character. Thus, not only do we omit previously matched characters of S, but also previously matched characters of W.

                     1         2
        m: 01234567890123456789012
        S: ABC ABCDAB ABCDABCDABDE
        W:         ABCDABD
        i:         0123456

    This search fails immediately, however, as the pattern still does not contain a space, so as in the first trial, we return to the beginning of W and begin searching at the next character of S: m = 11, reset i = 0.

                     1         2
        m: 01234567890123456789012
        S: ABC ABCDAB ABCDABCDABDE
        W:            ABCDABD
        i:            0123456

    Once again we immediately hit upon a match "ABCDAB" but the next character, 'C', does not match the final character 'D' of the word W. Reasoning as before, we set m = 15, to start at the two-character string "AB" leading up to the current position, set i = 2, and continue matching from the current position.

                     1         2
        m: 01234567890123456789012
        S: ABC ABCDAB ABCDABCDABDE
        W:                ABCDABD
        i:                0123456

    This time we are able to complete the match, whose first character is S[15].

    Algorithm:

    www.jntuworld.com || www.android.jntuworld.com || www.jwjobs.net || www.android.jwjobs.net

        algorithm kmp_search:
            input:
                an array of characters, S (the text to be searched)
                an array of characters, W (the word sought)
            output:
                an integer (the zero-based position in S at which W is found)

            define variables:
                an integer, m ← 0 (the beginning of the current match in S)
                an integer, i ← 0 (the position of the current character in W)
                an array of integers, T (the table, computed elsewhere)

            while m + i < length(S) do
                if W[i] = S[m + i] then
                    if i = length(W) - 1 then
                        return m
                    let i ← i + 1
                else
                    let m ← m + i - T[i]
                    if T[i] > -1 then
                        let i ← T[i]
                    else
                        let i ← 0

            (if we reach here, we have searched all of S unsuccessfully)
            return the length of S
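    The pseudocode above leaves the table T "computed elsewhere". The following is a minimal C++ sketch of both pieces, assuming the convention T[0] = -1 that the test T[i] > -1 implies. The function name kmp_table and the use of std::string are illustrative choices, not part of the notes.

```cpp
#include <string>
#include <vector>

// Builds the table T for the word W (assumed convention: T[0] = -1).
// T[i] is the position in W to resume from after a mismatch at W[i].
std::vector<int> kmp_table(const std::string& W) {
    std::vector<int> T(W.size(), 0);
    if (W.empty()) return T;
    T[0] = -1;
    std::size_t pos = 2;  // next entry of T to fill (T[1] stays 0)
    int cnd = 0;          // length of the current candidate prefix
    while (pos < W.size()) {
        if (W[pos - 1] == W[cnd]) {
            T[pos++] = ++cnd;      // candidate prefix grows by one
        } else if (cnd > 0) {
            cnd = T[cnd];          // fall back to a shorter candidate
        } else {
            T[pos++] = 0;          // no candidate at all
        }
    }
    return T;
}

// Direct transcription of kmp_search: returns the zero-based position of W
// in S, or the length of S when W does not occur.
int kmp_search(const std::string& S, const std::string& W) {
    const int n = static_cast<int>(S.size());
    const int k = static_cast<int>(W.size());
    if (k == 0) return 0;  // assumption: an empty word matches at position 0
    const std::vector<int> T = kmp_table(W);
    int m = 0, i = 0;
    while (m + i < n) {
        if (W[i] == S[m + i]) {
            if (i == k - 1) return m;
            ++i;
        } else {
            m = m + i - T[i];
            i = (T[i] > -1) ? T[i] : 0;
        }
    }
    return n;
}
```

    Called on the worked example, kmp_search("ABC ABCDAB ABCDABCDABDE", "ABCDABD") returns 15, the position found in the walk-through.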

    Efficiency of KMP:

    Since the two portions of the algorithm (building the table and searching) have, respectively, complexities of O(k) and O(n), where k is the length of W and n is the length of S, the complexity of the overall algorithm is O(n + k). These complexities are the same, no matter how many repetitive patterns are in W or S.

    Implementation of KMP:

        #include <cstring>
        #include <iomanip>
        #include <iostream>
        using namespace std;

        class str {
        private:
            int f[45];      // failure function of the pattern
            int i, j, m, n; // m: pattern length, n: text length
        public:
            void failure(char[]);
            int kmpmatch(char[], char[]);
        };

        // f[i] = length of the longest proper prefix of x[0..i]
        // that is also a suffix of x[0..i].
        void str::failure(char x[]) {
            m = strlen(x);
            j = 0; i = 1; f[0] = 0;
            while (i < m) {
                if (x[j] == x[i]) {
                    f[i] = j + 1; i++; j++;
                } else if (j > 0) {
                    j = f[j - 1];
                } else {
                    f[i] = 0; i++;
                }
            }
        }

        // Returns the index in t where p first occurs, or -1.
        int str::kmpmatch(char t[], char p[]) {
            failure(p);
            i = 0; j = 0;
            n = strlen(t);
            while (i < n) {
                if (p[j] == t[i]) {
                    if (j == m - 1) return i - m + 1;
                    i++; j++;
                } else if (j > 0) {
                    j = f[j - 1];   // fall back in the pattern, not in the text
                } else {
                    i++;
                }
            }
            return -1;
        }

        int main() {
            str b;
            char t[50], p[20];
            // The prompt and result messages are illustrative.
            cout << "Enter the text: ";
            cin >> setw(sizeof t) >> t;
            cout << "Enter the pattern: ";
            cin >> setw(sizeof p) >> p;
            int a = b.kmpmatch(t, p);
            if (a != -1)
                cout << "Pattern found at position " << a << endl;
            else
                cout << "Pattern not found" << endl;
            return 0;
        }

    The Boyer-Moore algorithm searches for occurrences of a pattern P (of length n) in a text T (of length m) by performing explicit character comparisons at different alignments. Instead of a brute-force search of all alignments (of which there are m - n + 1), Boyer-Moore uses information gained by preprocessing P to skip as many alignments as possible.

    The algorithm begins at alignment k = n, so the start of P is aligned with the start of T. Characters in P and T are then compared starting at index n in P and k in T, moving backward: the strings are matched from the end of P to the start of P. The comparisons continue until either the beginning of P is reached (which means there is a match) or a mismatch occurs, upon which the alignment is shifted to the right according to the maximum value permitted by a number of rules. The comparisons are performed again at the new alignment, and the process repeats until the alignment is shifted past the end of T, which means no further matches will be found.

    The shift rules are implemented as constant-time table lookups, using tables generated during the preprocessing of P.

    The Good Suffix Rule

    Description

        - - - - X - - K - - - - -
        M A N P A N A M A N A P -
        A N A M P N A M - - - - -
        - - - - A N A M P N A M -

    Demonstration of good suffix rule with pattern ANAMPNAM.

    The good suffix rule is markedly more complex in both concept and implementation than the bad character rule. It is the reason comparisons begin at the end of the pattern rather than the start, and is formally stated thus:[3]

    Suppose for a given alignment of P and T, a substring t of T matches a suffix of P, but a mismatch occurs at the next comparison to the left. Then find, if it exists, the right-most copy t' of t in P such that t' is not a suffix of P and the character to the left of t' in P differs from the character to the left of t in P. Shift P to the right so that substring t' in P is below substring t in T. If t' does not exist, then shift the left end of P past the left end of t in T by the least amount so that a prefix of the shifted pattern matches a suffix of t in T. If no such shift is possible, then shift P by n places to the right. If an occurrence of P is found, then shift P by the least amount so that a proper prefix of the shifted P matches a suffix of the occurrence of P in T. If no such shift is possible, then shift P by n places, that is, shift P past T.

    Preprocessing

    The good suffix rule requires two tables: one for use in the general case, and another for use when either the general case returns no meaningful result or a match occurs. These tables will be designated L and H respectively. Their definitions are as follows:[3]

    For each i, L[i] is the largest position less than n such that string P[i..n] matches a suffix of P[1..L[i]] and such that the character preceding that suffix is not equal to P[i-1]. L[i] is defined to be zero if there is no position satisfying the condition.

    Let H[i] denote the length of the largest suffix of P[i..n] that is also a prefix of P, if one exists. If none exists, let H[i] be zero.
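    To make the two definitions concrete, here is a brute-force C++ sketch that evaluates L and H directly from their wording. It is quadratic or worse and is not the linear-time preprocessing used in practice. The struct and function names are illustrative, and one assumption is made explicit: a copy of P[i..n] that starts at position 1 has no preceding character and is treated as satisfying the "preceding character differs" condition.

```cpp
#include <string>
#include <vector>

struct GoodSuffixTables {
    std::vector<int> L;  // L[i] for i = 1..n (index 0 unused)
    std::vector<int> H;  // H[i] for i = 1..n (index 0 unused)
};

GoodSuffixTables good_suffix_tables(const std::string& pattern) {
    const int n = static_cast<int>(pattern.size());
    // Pad with one character so indices match the 1-based notation P[1..n].
    const std::string P = " " + pattern;
    GoodSuffixTables t{std::vector<int>(n + 1, 0), std::vector<int>(n + 1, 0)};

    // L[i]: largest end position j < n of a copy of P[i..n] whose preceding
    // character differs from P[i-1] (or does not exist; see assumption above).
    for (int i = 2; i <= n; ++i) {
        const int len = n - i + 1;                 // length of P[i..n]
        for (int j = n - 1; j >= len; --j) {       // try the largest j first
            const int start = j - len + 1;         // the copy would be P[start..j]
            const bool same = P.compare(start, len, P, i, len) == 0;
            const bool differs = (start == 1) || (P[start - 1] != P[i - 1]);
            if (same && differs) { t.L[i] = j; break; }
        }
    }

    // H[i]: length of the longest suffix of P[i..n] that is also a prefix of P.
    for (int i = 1; i <= n; ++i) {
        for (int len = n - i + 1; len >= 1; --len) {
            if (P.compare(n - len + 1, len, P, 1, len) == 0) { t.H[i] = len; break; }
        }
    }
    return t;
}
```

    For the pattern ANAMPNAM of the demonstration, this gives L[6] = 4, so the shift n - L[6] is 8 - 4 = 4 places, which is the shift shown in the diagram.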

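    Both of these tables are constructible in O(n) time and use O(n) space. The alignment shift for index i in P is given by n - L[i] or n - H[i]. H should only be used if L[i] is zero or a match has been found.

    Performance:

    The Boyer-Moore algorithm as presented in the original paper has worst-case running time of O(n+m) only if the pattern does not appear in the text. When the pattern does occur in the text, the worst-case running time of the original algorithm is O(nm).

    Implementation of Boyer-Moore:

        #include <cstring>
        #include <iomanip>
        #include <iostream>
        using namespace std;

        // This program shifts using only the last occurrence of the mismatched
        // text character in the pattern; it does not build the L and H tables.
        // Note that here n is the text length and m is the pattern length.
        class bm {
        public:
            char M[20], P[15];   // M: text, P: pattern
            int last(char);
            int psize();         // declared in the notes; no body is shown and it is not called
            int boyer(int, int);
            int min(int, int);
        };

        // Index of the last occurrence of ch in P, or -1 if ch does not occur.
        int bm::last(char ch) {
            int l[256], i, k;
            for (i = 0; i < 256; i++) l[i] = -1;
            for (k = 0; P[k] != '\0'; k++) l[(unsigned char)P[k]] = k;
            return l[(unsigned char)ch];
        }

        // Returns the index in M at which P starts, or -1 if P does not occur.
        int bm::boyer(int n, int m) {
            if (m == 0 || m > n) return -1;
            int i = m - 1, j = m - 1;   // compare from the end of the pattern
            do {
                if (P[j] == M[i]) {
                    if (j == 0)
                        return i;
                    else {
                        i = i - 1;
                        j = j - 1;
                    }
                } else {
                    i = i + m - min(j, 1 + last(M[i]));
                    j = m - 1;
                }
            } while (i < n);
            return -1;
        }

        int bm::min(int i, int j) { return (j > i) ? i : j; }

        int main() {
            bm b1;
            int m, n, x;
            // The prompt and result messages are illustrative.
            cout << "Enter the text: ";
            cin >> setw(sizeof b1.M) >> b1.M;
            cout << "Enter the pattern: ";
            cin >> setw(sizeof b1.P) >> b1.P;
            m = strlen(b1.P);
            n = strlen(b1.M);
            x = b1.boyer(n, m);
            if (x == -1)
                cout << "Pattern not found" << endl;
            else
                cout << "Pattern found at position " << x << endl;
            return 0;
        }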

    Ex: bit one of 1000 is 1, and bits two, three, and four are 0.

    All keys in the left subtree of a node at level i have bit i equal to zero, whereas those in the right subtree of nodes at this level have bit i equal to one.

    Assume a fixed number of bits. Not empty =>
        Root contains one dictionary pair (any pair).
        All remaining pairs whose key begins with a 0 are in the left subtree.
        All remaining pairs whose key begins with a 1 are in the right subtree.
        Left and right subtrees are digital subtrees on remaining bits.

    This digital search tree contains the keys 1000, 0010, 1001, 0001, 1100, 0000.

    Example: Start with an empty digital search tree and insert a pair whose key is 0110.

    Now, insert a pair whose key is 0010.

    Now, insert a pair whose key is 1001.

    Now, insert a pair whose key is 1011.

    Now, insert a pair whose key is 0000.

    Search and Insert:

    The digital search tree functions to search and insert are quite similar to the corresponding functions for binary search trees. The essential difference is that the subtree to move to is determined by a bit in the search key rather than by the result of the comparison of the search key and the key in the current node.
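    The difference from a binary search tree can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions: keys are 4 bits wide (kKeyBits), bits are numbered from 1 at the left as in the notes, and the names DstNode, dst_insert, and dst_search are hypothetical.

```cpp
#include <cstdint>
#include <memory>

constexpr int kKeyBits = 4;  // assumed fixed key width

struct DstNode {
    std::uint32_t key;
    std::unique_ptr<DstNode> left, right;
    explicit DstNode(std::uint32_t k) : key(k) {}
};

// Bit i of key, where bit 1 is the leftmost of the kKeyBits bits.
inline bool bit(std::uint32_t key, int i) {
    return ((key >> (kKeyBits - i)) & 1u) != 0;
}

// Inserts key; returns false if it is already present.
bool dst_insert(std::unique_ptr<DstNode>& root, std::uint32_t key) {
    std::unique_ptr<DstNode>* link = &root;
    for (int level = 1; *link; ++level) {
        if ((*link)->key == key) return false;
        // The branch is chosen by bit `level` of the key, not by comparing keys.
        link = bit(key, level) ? &(*link)->right : &(*link)->left;
    }
    *link = std::make_unique<DstNode>(key);
    return true;
}

// Returns the node holding key, or nullptr if the key is absent.
const DstNode* dst_search(const std::unique_ptr<DstNode>& root, std::uint32_t key) {
    const DstNode* n = root.get();
    for (int level = 1; n && n->key != key; ++level)
        n = bit(key, level) ? n->right.get() : n->left.get();
    return n;
}
```

    Inserting the keys 0110, 0010, 1001, 1011, 0000 in that order places 0110 at the root, 0010 and 1001 as its left and right children, 1011 as the left child of 1001, and 0000 as the left child of 0010.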

    Try to build the digital search tree:

        A 00001    S 10011    E 00101    R 10010    C 00011    H 01000
        I 01001    N 01110    G 00111    X 11000    M 01101    P 10000

    When dealing with very long keys, the cost of a key comparison is high. We can reduce the number of key comparisons to one by using a related structure called Patricia.

    We shall develop this structure in three steps. First, we introduce a structure called a binary trie.

    Then we transform binary tries into compressed binary tries. Finally, from compressed binary tries we obtain Patricia.

    7.5 Binary Trie

    A binary trie is a binary tree that has two kinds of nodes: branch nodes and element nodes.

    A branch node has the two data members LeftChild and RightChild. It has no data member.

    An element node has the single data member data.

    Branch nodes are used to build a binary tree search structure similar to that of a digital search tree. This search structure leads to element nodes.

    [Figure: six-element binary trie]

    Compressed Binary Trie:

    The binary trie contains branch nodes whose degree is one. By adding another data member, BitNumber, to each branch node, we can eliminate all degree-one branch nodes from the trie. The BitNumber data member of a branch node gives the bit number of the key that is to be used at this node.

    [Figure: binary trie with degree-one nodes eliminated]
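    A search in a compressed binary trie can be sketched as follows. This is an illustrative C++ fragment, not code from the notes: it assumes 4-bit keys with bits numbered from 1 at the left, and it folds the two node kinds into one hypothetical struct TrieNode with an is_branch flag.

```cpp
#include <cstdint>

constexpr int kKeyBits = 4;  // assumed fixed key width

// Bit i of key, where bit 1 is the leftmost of the kKeyBits bits.
inline bool bit(std::uint32_t key, int i) {
    return ((key >> (kKeyBits - i)) & 1u) != 0;
}

struct TrieNode {
    bool is_branch;          // true: branch node, false: element node
    int bit_number;          // branch nodes: which bit of the key to test
    const TrieNode* left;    // branch nodes: followed when the bit is 0
    const TrieNode* right;   // branch nodes: followed when the bit is 1
    std::uint32_t key;       // element nodes: the stored key
};

// Follows branch nodes using only the bits they name, then performs a single
// key comparison at the element node that is reached.
const TrieNode* trie_search(const TrieNode* root, std::uint32_t key) {
    const TrieNode* n = root;
    while (n && n->is_branch)
        n = bit(key, n->bit_number) ? n->right : n->left;
    return (n && n->key == key) ? n : nullptr;
}
```

    Because skipped bits are never inspected on the way down, the final comparison is what rejects a key such as 0100 that merely follows the same path as a stored key.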

    7.6 Patricia (Practical Algorithm To Retrieve Information Coded In Alphanumeric)

    Compressed binary tries may be represented using nodes of a single type. The new nodes, called augmented branch nodes, are the original branch nodes augmented by the data member data. The resulting structure is called Patricia and is obtained from a compressed binary trie in the following way:

    (1) Replace each branch node by an augmented branch node.
    (2) Eliminate the element nodes.
    (3) Store the data previously in the element nodes in the data member of the augmented branch nodes. Since every nonempty compressed binary trie has one less branch node than it has element nodes, it is necessary to add one augmented branch node. This node is called the head node. The remaining structure is the left subtree of the head node. The head node has BitNumber equal to zero. Its right-child data member is not used. Data is assigned to augmented branch nodes in such a way that the BitNumber in the augmented branch node is less than or equal to that in the parent of the element node that contained this data.
    (4) Replace the original pointers to element nodes by pointers to the respective augmented branch nodes.

        #include <stdio.h>
        #include <stdlib.h>

        /* Assumed supporting definitions; the notes use them without showing them. */
        #define KEY_BITS 4                /* assumed key width in bits */
        #define IS_FULL(ptr) (!(ptr))     /* true when malloc returned NULL */
        typedef struct { unsigned key; } element;

        /* bit i of k, where bit 1 is the leftmost of the KEY_BITS bits */
        int bit(unsigned k, int i) { return (k >> (KEY_BITS - i)) & 1; }

        typedef struct patricia_tree *patricia;
        struct patricia_tree {
            int bit_number;
            element data;
            patricia left_child, right_child;
        };
        patricia root;

    Patricia search:

        patricia search(patricia t, unsigned k)
        {
            /* search the Patricia tree t; return the last node y encountered;
               if k == y->data.key, the key is in the tree */
            patricia p, y;
            if (!t) return NULL;   /* empty tree */
            y = t->left_child;
            p = t;
            /* a child pointer that does not increase bit_number points back up */
            while (y->bit_number > p->bit_number) {
                p = y;
                y = (bit(k, y->bit_number)) ? y->right_child : y->left_child;
            }
            return y;
        }

    Patricia insert:

        void insert(patricia *t, element x)
        {
            /* insert x into the Patricia tree *t */
            patricia s, p, y, z;
            int i;
            if (!(*t)) {   /* empty tree: create the head node */
                *t = (patricia)malloc(sizeof(struct patricia_tree));
                if (IS_FULL(*t)) {
                    fprintf(stderr, "The memory is full\n");
                    exit(1);
                }
                (*t)->bit_number = 0;
                (*t)->data = x;
                (*t)->left_child = *t;
                (*t)->right_child = NULL;   /* not used in the head node */
                return;
            }
            y = search(*t, x.key);
            if (x.key == y->data.key) {
                fprintf(stderr, "The key is in the tree. Insertion fails.\n");
                exit(1);
            }
            /* find the first bit where x.key and y->data.key differ */
            for (i = 1; bit(x.key, i) == bit(y->data.key, i); i++)
                ;
            /* search tree using the first i-1 bits */
            s = (*t)->left_child;
            p = *t;
            while (s->bit_number > p->bit_number && s->bit_number < i) {
                p = s;
                s = (bit(x.key, s->bit_number)) ? s->right_child : s->left_child;
            }
            /* add x as a child of p */
            z = (patricia)malloc(sizeof(struct patricia_tree));
            if (IS_FULL(z)) {
                fprintf(stderr, "The memory is full\n");
                exit(1);
            }
            z->data = x;
            z->bit_number = i;
            z->left_child = (bit(x.key, i)) ? s : z;
            z->right_child = (bit(x.key, i)) ? z : s;
            if (s == p->left_child)
                p->left_child = z;
            else
                p->right_child = z;
        }

    Assignment Questions: Pattern matching and Tries

    1. Explain the following pattern matching algorithms:
        i. the Boyer-Moore algorithm
        ii. the Knuth-Morris-Pratt algorithm
    2. Define tries. Give the concepts of the digital search tree. What are the applications of tries?
    3. Write short notes on:
        i. binary trie
        ii. Patricia
        iii. Multi-way trie
    4. Describe an efficient algorithm to find the longest palindrome that is a suffix of a string T of length n.

    UNIT-VIII

    Topics: File Structures: Fundamental File Processing Operations - opening files, closing files, reading and writing file contents, special characters in files. Fundamental File Structure Concepts - field and record organization, managing fixed-length, fixed-field buffers.

    8.1 Fundamental file processing operations

    physical file
        A file as seen by the operating system, and which actually exists on secondary storage.
    logical file
        A file as seen by a program.

    Programs read and write data from logical files.
    Before a logical file can be used, it must be associated with a physical file.
    This act of connection is called "opening" the file.
    Data in a physical file is persistent.
    Data in a logical file is temporary.
    A logical file is identified (within the program) by a program variable or constant.

    C++ supports file access on three levels:
        Unbuffered, unformatted file access using handles.
        Buffered, formatted file access using the FILE structure.
        Buffered, formatted file access using classes.

    8.1.1 Opening Files

    open
        To associate a logical program file with a physical system file.
    protection mode
        The security status of a file, defining who is allowed to access a file and which access modes are allowed.
    access mode
        The type of (file) access allowed.
    file descriptor
        A cardinal number used as the identifier for a logical file by operating systems such as UNIX and PC-DOS.

    In C++, a file is opened by using library functions.
    The name of the physical file must be supplied to an open function.
    The open function must also be supplied with an access mode.
    The open function can also be supplied with a protection mode.
    The access mode has several aspects:
        Is the file to be accessed by reading, by writing, or by both?
        What should be done with existing contents of the file?
        Should a new file be created if none exists?
        Should any character translation be done?
    For handle level access, the logical file is declared as an int.
    The handle is also known as a file descriptor.
    The C++ open function is used to open a file for handle level access.
    The value returned by the open is assigned to the file variable.
    For FILE level access, the logical file is declared as a FILE *.
    The C++ fopen function is used to open a file for FILE level access.
    The value returned by the fopen is assigned to the file variable.
    For class level access, the logical file is declared as an fstream (or as an ifstream or an ofstream).
    The C++ open method of the class is used to open a file for class level access.

    8.1.2 Closing Files

    close
        To disassociate a logical program file from a physical system file.

    Closing a file frees system resources for reuse.
    Data may not be actually written to the physical file until a logical file is closed.
    A program should close a file when it is no longer needed.
    The C++ close function is used to close a file for handle level access.
    The logical file to be closed is an argument to the handle close function.
    The C++ fclose function is used to close a file for FILE level access.
    The logical file to be closed is an argument to the FILE fclose function.
    The C++ close method of the class is used to close a file for class level access.
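    As a rough illustration of opening and closing, the sketch below uses the FILE level and the class level only. Handle level access is left out because open and close for handles come from operating-system headers rather than the standard library. The function names and the text written to the file are illustrative.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// FILE level: the logical file is a FILE*, and "w" is the access mode
// (write, create the file if needed, discard existing contents).
bool create_with_fopen(const std::string& path) {
    std::FILE* f = std::fopen(path.c_str(), "w");
    if (!f) return false;              // the association could not be made
    std::fputs("hello\n", f);
    return std::fclose(f) == 0;        // closing flushes buffered data
}

// Class level: the logical file is an ifstream, opened with its open method
// and released with its close method.
bool can_open_with_ifstream(const std::string& path) {
    std::ifstream in;
    in.open(path, std::ios::in);
    if (!in.is_open()) return false;
    in.close();
    return true;
}
```

    A stream object also closes its file when it goes out of scope, so the explicit close call is shown here only to mirror the text.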

    8.1.3 Reading and writing

    read
        To transfer data from a file to program variable(s).
    write
        To transfer data to a file from program variable(s) or constant(s).
    end-of-file
        A physical location just beyond the last datum in a file.

    The read and write operations are performed on the logical file with calls to library functions.
    For read, one or more variables must be supplied to the read function, to receive the data from the file.
    For write, one or more values (as variables or constants) must be supplied to the write function, to provide the data for the file.
    For unformatted transfers, the amount of data to be transferred must also be supplied.
    The C++ read function is used to read a file with handle level access.
    The C++ write function is used to write a file with handle level access.
    The C++ fread function is used to read a file with FILE level access.
    The C++ fwrite function is used to write a file with FILE level access.
    The C++ read method of the class is used to read a file with class level access.
    The C++ write method of the class is used to write a file with class level access.
    The acronym for end-of-file is EOF.
    When a file reaches EOF, no more data can be read.
    Data can be written at or past EOF.
    The C++ eof function is used to detect end-of-file with handle level access.
    The handle eof function detects when the file pointer is at end-of-file.
    The C++ feof function is used to detect end-of-file with FILE level access.
    The FILE feof function detects when the file pointer is past end-of-file.
    The C++ eof method of the class is used to detect end-of-file with class level access.
    The stream class eof function detects when the file pointer is past end-of-file.
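    The class level read and write methods, and the way end-of-file shows up only after a read has run past the data, can be sketched as below. This is an illustrative example under assumptions: the file holds raw int values written by the same program, and the function names are hypothetical.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Unformatted write: the amount of data (in bytes) is supplied explicitly.
bool write_ints(const std::string& path, const std::vector<int>& values) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(int)));
    return out.good();
}

// Unformatted read: each call receives one int into the variable v.
std::vector<int> read_ints(const std::string& path) {
    std::vector<int> values;
    std::ifstream in(path, std::ios::binary);
    int v;
    // The loop ends when a read cannot be completed. Only then does
    // in.eof() report true, because the stream detects end-of-file after
    // an attempt to read past the last datum.
    while (in.read(reinterpret_cast<char*>(&v), sizeof v))
        values.push_back(v);
    return values;
}
```

    Testing the result of read itself, rather than calling eof before reading, avoids processing a stale value after the last successful read.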

    Seeking:

    seek
        To move to a specified location in a file.
    byte offset
        The distance, measured in bytes, from the beginning.

    Seeking moves an attribute in the file called the file pointer.
    C++ library functions allow seeking.
    In DOS, Windows, and UNIX, files are organized as streams of bytes, and locations are in terms of byte count.
    Seeking can be specified from one of three reference points:
        o The beginning of the file.
        o The end of the file.
        o The current file pointer position.
    The C++ lseek function is used to seek with handle level access.
    The C++ fseek function is used to seek with FILE level access.
    The C++ seekg method of the class is used to seek with class level access for read (get).
    The C++ seekp method of the class is used to seek with class level access for write (put).
    C++ allows, but does not require, separate file pointers and seek functions for reading and writing.
    Most implementations of C++ do not have separate file pointers.
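    A short class level sketch of seeking from two of the three reference points follows. The helper names file_size and byte_at are illustrative, and the file is opened in binary mode so that byte offsets correspond to the bytes actually stored.

```cpp
#include <fstream>
#include <string>

// Seeks to the end of the file and reports the file pointer position,
// which is the size of the file in bytes (-1 if the file cannot be opened).
long file_size(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return -1;
    in.seekg(0, std::ios::end);           // reference point: end of file
    return static_cast<long>(in.tellg());
}

// Seeks to a byte offset from the beginning and reads one byte there.
// Returns the end-of-file value when the offset is at or past the end.
int byte_at(const std::string& path, long offset) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return std::char_traits<char>::eof();
    in.seekg(offset, std::ios::beg);      // reference point: beginning
    return in.get();
}
```

    The third reference point, the current position, is selected in the same way with std::ios::cur.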

    8.1.4 Special characters in files

    Specifics of files can vary with the operating system.
    The C++ language was written originally for the UNIX operating system.
    UNIX and DOS (Windows) systems handle separators between lines differently.
    In UNIX files, lines are separated by a single new line character (ASCII line feed).
    In DOS (Windows) files, lines are separated by two characters (ASCII carriage return and line feed).
    When DOS files are opened in text mode, the internal separator ('\n') is translated to the external separator (CR LF) during read and write.
    When DOS files are opened in binary mode, the internal separator ('\n') is not translated to the external separator (CR LF) during read and write.
    In DOS (Windows) files, end-of-file can be marked by a "control-Z" character (ASCII SUB).
    In C++ implementations for DOS, a control-Z in a file is interpreted as end-of-file.
    Other operating systems may handle line separation and end-of-file differently.
    Implementations of C++ should treat text files so that the internal representations are the same as UNIX.
    On UNIX systems, files opened in text mode or binary mode behave the same way.

    8.2 Fundamental file structure concepts
    Persistent = Retained after execution of the program which created it.
    When we build file structures, we are making it possible to make data persistent. That is, one program can store data from memory to a file, and terminate. Later, another program can retrieve the data from the file, and process it in memory.
    In this chapter, we look at file structures which can be used to organize the data within the file, and at the algorithms which can be used to store and retrieve the data sequentially.


    8.2.1 Field and Record organization
    Record: A subdivision of a file, containing data related to a single entity.
    Field: A subdivision of a record containing a single attribute of the entity which the record describes.
    Stream of bytes: A file which is regarded as being without structure beyond separation into a sequential set of bytes.
    Within a program, data is temporarily stored in variables.
    Individual values can be aggregated into structures, which can be treated as a single variable with parts.
    In C++, classes are typically used as an aggregate structure.
    C++ Person class (version 0.1):
    class Person {
    public:
        char FirstName [11];
        char LastName [11];
        char Address [21];
        char City [21];
        char State [3];
        char ZIP [6];   // five digits plus the terminating '\0'
    };
    With this class declaration, variables can be declared to be of type Person. The individual fields within a Person can be referred to as the name of the variable and the name of the field, separated by a period (.).
    C++ Program:
    #include <iostream>
    #include <cstring>
    using namespace std;

    class Person {
    public:
        char FirstName [11];
        char LastName [11];
        char Address [31];
        char City [21];
        char State [3];
        char ZIP [6];   // five digits plus the terminating '\0' that strcpy writes
    };

    void Display (Person);

    int main () {
        Person Clerk;
        Person Customer;

        strcpy (Clerk.FirstName, "Fred");
        strcpy (Clerk.LastName, "Flintstone");
        strcpy (Clerk.Address, "4444 Granite Place");
        strcpy (Clerk.City, "Rockville");


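        strcpy (Clerk.State, "MD");
        strcpy (Clerk.ZIP, "00001");

        strcpy (Customer.FirstName, "Lily");
        strcpy (Customer.LastName, "Munster");
        strcpy (Customer.Address, "1313 Mockingbird Lane");
        strcpy (Customer.City, "Hollywood");
        strcpy (Customer.State, "CA");
        strcpy (Customer.ZIP, "90210");

        Display (Clerk);
        Display (Customer);
    }

    // Only "cout" survives of the original body; the statement below is a
    // reconstruction that prints each field of the record.
    void Display (Person Someone) {
        std::cout << Someone.FirstName << ' ' << Someone.LastName << '\n'
                  << Someone.Address << '\n'
                  << Someone.City << ", " << Someone.State << ' ' << Someone.ZIP << '\n';
    }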


    Fixed length records: A record which is predetermined to be the same length as the other records in the file.

    Record 1   Record 2   Record 3   Record 4   Record 5

    The file is divided into records of equal size.
    All records within a file have the same size.
    Different files can have different length records.
    Programs which access the file must know the record length.
    Offset, or position, of the nth record of a file can be calculated.
    There is no external overhead for record separation.
    There may be internal fragmentation (unused space within records).
    There will be no external fragmentation (unused space outside of records) except for deleted records.
    Individual records can always be updated in place.
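The offset calculation is a single multiplication: with records counted from 0, record n starts at byte n times the record length. The sketch below assumes a 16-byte record length and hypothetical helper names (recordOffset, writeFixed, readFixed); it illustrates the idea and is not the notes' own code.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

const std::size_t kRecordLength = 16;  // assumed record size for this sketch

// Byte offset of record n (counting from 0): n * record length.
std::streamoff recordOffset(std::size_t n) {
    return static_cast<std::streamoff>(n * kRecordLength);
}

// Writes each string as one fixed-length record, padded with '\0'.
void writeFixed(const std::string& path, const std::vector<std::string>& recs) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    for (const std::string& r : recs) {
        std::string buf = r.substr(0, kRecordLength);  // truncate long values
        buf.resize(kRecordLength, '\0');               // padding is internal fragmentation
        out.write(buf.data(), static_cast<std::streamsize>(kRecordLength));
    }
}

// Reads record n directly, without scanning the records before it.
std::string readFixed(const std::string& path, std::size_t n) {
    std::ifstream in(path, std::ios::binary);
    in.seekg(recordOffset(n), std::ios::beg);
    std::string buf(kRecordLength, '\0');
    in.read(&buf[0], static_cast<std::streamsize>(kRecordLength));
    return buf.substr(0, buf.find('\0'));              // strip the padding
}
```

After writeFixed("fixed.tmp", {"first", "second", "third"}), the call readFixed("fixed.tmp", 2) seeks straight to byte 32 and returns "third".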

    Example (80 byte records):

    00  66 69 72 73 74 20 6C 69 6E 65 00 00 01 00 00 00  first line......
    10  00 00 00 00 00 00 00 00 FF FF FF FF 00 00 00 00  ................
    20  68 FB 12 00 DC E0 40 00 3C BA 42 00 78 FB 12 00  h.....@.<.B.x...


    Delimited variable length records:
    Offset, or position, of the nth record of a file cannot be calculated.
    There is external overhead for record separation equal to the size of the delimiter per record.
    There should be no internal fragmentation (unused space within records).
    There may be external fragmentation (unused space outside of records) after file updating.
    Individual records cannot always be updated in place.

    Example (Delimiter = ASCII 30 (hex 1E) = RS character):

    00  66 69 72 73 74 20 6C 69 6E 65 1E 73 65 63 6F 6E  first line.secon
    10  64 20 6C 69 6E 65 1E                             d line.

    Example (Delimiter = '\n'):

    00  46 69 72 73 74 20 28 31 73 74 29 20 4C 69 6E 65  First (1st) Line
    10  0D 0A 53 65 63 6F 6E 64 20 28 32 6E 64 29 20 6C  ..Second (2nd) l
    20  69 6E 65 0D 0A                                   ine..
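Reading delimited records means scanning forward until the delimiter is found. A minimal sketch, assuming the RS character (hex 1E) from the first example as the delimiter; the function name readDelimited is hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

const char kRecordSeparator = '\x1E';  // ASCII RS, the delimiter assumed here

// Splits a byte stream into records; each record ends with the delimiter.
std::vector<std::string> readDelimited(std::istream& in) {
    std::vector<std::string> records;
    std::string rec;
    // std::getline accepts any delimiter character, not only '\n'.
    while (std::getline(in, rec, kRecordSeparator)) {
        records.push_back(rec);
    }
    return records;
}
```

Because the reader has to examine every byte to find each delimiter, reaching the nth record requires reading all the records before it.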

    Disadvantage: the offset of each record cannot be calculated from its record number. This makes direct access impossible.
    Disadvantage: there is space overhead for the delimiter suffix.
    Advantage: there will probably be no internal fragmentation (unusable space within records).

    Length prefixed variable length records:

    110 Record 1   40 Record 2   100 Record 3   80 Record 4   70 Record 5

    The records within a file are prefixed by a length byte or bytes.
    Records within a file can have different sizes.
    Different files can have different length records.
    Programs which access the file must know the size and format of the length prefix.
    Offset, or position, of the nth record of a file cannot be calculated.
    There is external overhead for record separation equal to the size of the length prefix per record.
    There should be no internal fragmentation (unused space within records).
    There may be external fragmentation (unused space outside of records) after file updating.
    Individual records cannot always be updated in place.
    Example:

    00  0A 00 46 69 72 73 74 20 4C 69 6E 65 0B 00 53 65  ..First Line..Se
    10  63 6F 6E 64 20 4C 69 6E 65 1F 00 54 68 69 72 64  cond Line..Third
    20  20 4C 69 6E 65 20 77 69 74 68 20 6D 6F 72 65 20   Line with more
    30  63 68 61 72 61 63 74 65 72 73                    characters
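The dump above uses a two-byte length stored low byte first (0A 00 for a 10-byte record). The sketch below writes and reads records in that layout. The function names writePrefixed and readPrefixed are hypothetical, and the code assumes no record is longer than 65535 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes one record preceded by a 2-byte length, low byte first.
// Assumes rec.size() fits in 16 bits.
void writePrefixed(std::ostream& out, const std::string& rec) {
    const std::uint16_t len = static_cast<std::uint16_t>(rec.size());
    out.put(static_cast<char>(len & 0xFF));         // low byte
    out.put(static_cast<char>((len >> 8) & 0xFF));  // high byte
    out.write(rec.data(), len);
}

// Reads the next record into `rec`; returns false when no record remains.
bool readPrefixed(std::istream& in, std::string& rec) {
    char lo = 0, hi = 0;
    if (!in.get(lo) || !in.get(hi)) return false;
    const std::size_t len =
        static_cast<std::size_t>(static_cast<unsigned char>(lo)) |
        (static_cast<std::size_t>(static_cast<unsigned char>(hi)) << 8);
    rec.assign(len, '\0');
    if (len > 0) in.read(&rec[0], static_cast<std::streamsize>(len));
    return static_cast<bool>(in);
}
```

Unlike a delimiter, the prefix places no restriction on the bytes inside a record, and a reader can skip a record without examining its contents by seeking forward by its length.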

    Disadvantage: the offset of each record cannot be calculated from its record number. This makes direct access impossible.
    Disadvantage: there is space overhead for the length prefix.
    Advantage: there will probably be no internal fragmentation (unusable space within records).


    Indexed variable length records:

    An auxiliary file can be used to point to the beginning of each record.
    In this case, the data records can be contiguous, with no delimiters or length prefixes between them.
    If the records are contiguous, the only access is through the index file.
    Example:
    Index File:

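    00  12 00 00 00 25 00 00 00 47 00 00 00              ....%...G...

    Each index entry is a four-byte integer, least significant byte first: 12, 25 and 47 (hex). In this example each entry is the offset at which a record ends, which is also where the next record begins.

    Data File:

    00  46 69 72 73 74 20 28 31 73 74 29 20 53 74 72 69  First (1st) Stri
    10  6E 67 53 65 63 6F 6E 64 20 28 32 6E 64 29 20 53  ngSecond (2nd) S
    20  74 72 69 6E 67 54 68 69 72 64 20 28 33 72 64 29  tringThird (3rd)
    30  20 53 74 72 69 6E 67 20 77 68 69 63 68 20 69 73   String which is
    40  20 6C 6F 6E 67 65 72                              longer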

    Advantage: the offset of each record is contained in the index, and can be looked up from its record number. This makes direct access possible.
    Disadvantage: there is space overhead for the index file.
    Disadvantage: there is time overhead for the index file.
    Advantage: there will probably be no internal fragmentation (unusable space within records).

    The time overhead for accessing the index file can be minimized by reading the entire index file into memory when the files are opened.
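Once the index is in memory, fetching a record is a lookup followed by a single read. The sketch below keeps both the index and the data in memory for brevity; the RecordIndex type and the function names are assumptions for illustration, and a file-based version would seek to the looked-up offset instead of calling substr.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory index: starts[i] is the byte offset where record i
// begins; one extra final entry holds the total length of the data.
struct RecordIndex {
    std::vector<std::size_t> starts;
};

// Appends a record to the contiguous data and notes where the next begins.
void appendRecord(std::string& data, RecordIndex& index, const std::string& rec) {
    if (index.starts.empty()) index.starts.push_back(0);
    data += rec;                           // no delimiter, no padding
    index.starts.push_back(data.size());   // start of the next record
}

// Direct access: record n occupies the bytes [starts[n], starts[n + 1]).
std::string fetchRecord(const std::string& data, const RecordIndex& index,
                        std::size_t n) {
    const std::size_t begin = index.starts.at(n);
    const std::size_t end = index.starts.at(n + 1);
    return data.substr(begin, end - begin);
}
```

Appending the three strings from the example produces the offsets 0, 0x12, 0x25 and 0x47, the last three of which match the entries in the index file dump.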

    Fixed field count records:
    Records can be recognized if they always contain the same (predetermined) number of fields.

    Delineation of fields in a record:

    Fixed length fields:

    Field 1   Field 2   Field 3   Field 4   Field 5

    Each record is divided into fields of correspondingly equal size.
    Different fields within a record can have different sizes.
    Different records can have different length fields.
    Programs which access the record must know the field lengths.
    There is no external overhead for field separation.
    There may be internal fragmentation (unused space within fields).

    Delimited variable length fields:

    Field 1 ! Field 2 ! Field 3 ! Field 4 ! Field 5 !

    The fields within a record are followed by a delimiting byte or series of bytes.


    Fields within a record can have different sizes.
    Different records can have different length fields.
    Programs which access the record must know the delimiter.
    The delimiter cannot occur within the data.
    If used with delimited records, the field delimiter must be different from the record delimiter.
    There is external overhead for field separation equal to the size of the delimiter per field.
    There should be no internal fragmentation (unused space within fields).
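Packing and unpacking delimited fields can be sketched as follows. The function names are hypothetical and the delimiter is passed in by the caller; as stated above, the code assumes that the delimiter never occurs inside a field.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Joins fields into one record, ending every field with `delim`.
std::string packFields(const std::vector<std::string>& fields, char delim) {
    std::string record;
    for (const std::string& f : fields) {
        record += f;      // assumes `delim` never occurs inside a field
        record += delim;
    }
    return record;
}

// Splits a record back into its fields.
std::vector<std::string> unpackFields(const std::string& record, char delim) {
    std::vector<std::string> fields;
    std::istringstream in(record);
    std::string f;
    while (std::getline(in, f, delim)) fields.push_back(f);
    return fields;
}
```

For example, packFields({"Fred", "Flintstone", "MD"}, '|') yields "Fred|Flintstone|MD|", which costs one byte of overhead per field and no padding.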

    Length prefixed variable length fields:

    12 Field 1   4 Field 2   10 Field 3   8 Field 4   7 Field 5

    The fields within a record are prefixed by a length byte or bytes.
    Fields within a record can have different sizes.
    Different records can have different length fields.
    Programs which access the record must know the size and format of the length prefix.
    There is external overhead for field separation equal to the size of the length prefix per field.
    There should be no internal fragmentation (unused space within fields).

    Representing record or field length:
    Record or field length can be represented in either binary or character form.
    The length can be considered as another hidden field within the record.
    This length field can be either fixed length or delimited.
    When character form is used, a space can be used to delimit the length field.
    A two byte fixed length field could be used to hold lengths of 0 to 65535 bytes in binary form.
    A two byte fixed length field could be used to hold lengths of 0 to 99 bytes in decimal character form.
    A variable length field delimited by a space could be used to hold effectively any length.
    In some languages, such as strict Pascal, it is difficult to mix binary values and character values in the same file.
    The C++ language is flexible enough so that the use of either binary or character format is easy.
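The two representations can be compared side by side. In this sketch the function names are hypothetical, the binary form is assumed to be two bytes with the low byte first, and the character form is assumed to be decimal digits followed by a delimiting space.

```cpp
#include <cstdint>
#include <string>

// Binary form: a fixed 2-byte field, low byte first, covering 0..65535.
std::string lengthAsBinary(std::uint16_t len) {
    std::string out;
    out += static_cast<char>(len & 0xFF);
    out += static_cast<char>((len >> 8) & 0xFF);
    return out;
}

// Character form: decimal digits followed by a space as the delimiter.
std::string lengthAsText(std::uint16_t len) {
    return std::to_string(len) + ' ';
}
```

A length of 40 becomes the two bytes 28 00 in binary form and the three characters "40 " in character form. The character form stays readable in a text editor, while the binary form has a fixed size whatever the value.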

    Tagged fields:
    Tags, in the form "Keyword=Value", can be used in fields.
    Use of tags does not in itself allow separation of fields, which must be done with another method.
    Use of tags adds significant space overhead to the file.
    Use of tags does add flexibility to the file structure.
    Fields can be added without affecting the basic structure of the file.
    Tags can be useful when records have sparse fields - that is, when a significant number of the possible attributes are absent.
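Since tags do not separate fields by themselves, a reader first splits the record with another method and then splits each field at the '='. The sketch below assumes delimited fields; the function name parseTagged and the keywords in the usage note are hypothetical.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parses a record of "Keyword=Value" fields separated by `delim`.
// Fields that contain no '=' are skipped.
std::map<std::string, std::string> parseTagged(const std::string& record,
                                               char delim) {
    std::map<std::string, std::string> fields;
    std::istringstream in(record);
    std::string field;
    while (std::getline(in, field, delim)) {
        const std::string::size_type eq = field.find('=');
        if (eq == std::string::npos) continue;
        fields[field.substr(0, eq)] = field.substr(eq + 1);
    }
    return fields;
}
```

Parsing "First=Fred|Last=Flintstone|State=MD|" with '|' as the delimiter gives three entries. An attribute that is absent from the record, such as a ZIP code here, takes up no space at all, which is the benefit for sparse records.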

    Mixing numbers and characters: use of a file dump
    A file dump gives us the ability to look inside a file at the actual bytes that are stored.
    Octal dump: od -xc filename
    e.g. the number 40, stored as ASCII characters and as a short integer


    Byte order:
    The byte order of integers (and floating point numbers) is not the same on all computers.
    This is hardware dependent (CPU), not software dependent.
    Many computers store numbers as might be expected, most significant byte first (big-endian): 40 (base 10) = 28 (base 16) is stored in a four byte integer as 00 00 00 28.
    PCs reverse the byte order, and store numbers with the least significant byte first (little-endian): 40 (base 10) = 28 (base 16) is stored in a four byte integer as 28 00 00 00.
    On most computers, the number 40 would be stored in character form as its ASCII values: 34 30.
    IBM mainframe computers use EBCDIC instead of ASCII, and would store "40" as F4 F0.
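A file that is to be read on machines with different byte orders can fix the order itself by writing integers one byte at a time. This is a sketch with hypothetical function names; choosing most significant byte first for the file is an assumption, and the opposite order would work equally well if applied consistently.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// True when the least significant byte is stored first (as on PCs).
bool isLittleEndian() {
    const std::uint32_t value = 1;
    unsigned char first = 0;
    std::memcpy(&first, &value, 1);   // inspect the byte at the lowest address
    return first == 1;
}

// Serializes a 32-bit integer most significant byte first, regardless of
// the byte order of the machine running the code.
std::string toBigEndian(std::uint32_t v) {
    std::string out(4, '\0');
    out[0] = static_cast<char>((v >> 24) & 0xFF);
    out[1] = static_cast<char>((v >> 16) & 0xFF);
    out[2] = static_cast<char>((v >> 8) & 0xFF);
    out[3] = static_cast<char>(v & 0xFF);
    return out;
}

// Reverses toBigEndian; `b` must hold at least four bytes.
std::uint32_t fromBigEndian(const std::string& b) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(b[0])) << 24)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b[1])) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b[2])) << 8)
         |  static_cast<std::uint32_t>(static_cast<unsigned char>(b[3]));
}
```

Because the shifts operate on the value and not on its layout in memory, toBigEndian(40) produces the bytes 00 00 00 28 on any machine, whatever isLittleEndian() reports.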

    8.3 Managing fixed length, fixed field buffers
    For fixed length, fixed field buffers, instead of writing the size of each field or each record to the file, we can write methods that control the fixed length of each field.
    The class FixedLengthBuffer is a subclass of IOBuffer. An object of the FixedLengthBuffer class records the size of each record.
    The class FixedFieldBuffer, a subclass of FixedLengthBuffer, supports buffers that are both fixed length and fixed field. It is given below:
    class FixedFieldBuffer : public FixedLengthBuffer {
    public:
        FixedFieldBuffer (int maxFields, int RecordSize = 3000);
        FixedFieldBuffer (int maxFields, int * fieldSize);
        int AddField (int fieldSize);                   // define the next field
        int Pack (const void * field, int size = -1);
        int Unpack (void * field, int maxBytes = -1);
        int NumberOfFields () const;                    // return number of defined fields
    protected:
        int * FieldSize;   // array to hold field sizes
        int maxFields;     // max number of fields
        int NumFields;     // actual number of defined fields
    };
    The AddField method is used to specify the size of the next field. The number of fields defined so far can be obtained using the NumberOfFields method.
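The idea behind AddField, Pack and Unpack can be tried out without the IOBuffer hierarchy. The class below is a hypothetical, simplified stand-in and not the class from the notes: it does not derive from FixedLengthBuffer, it works on std::string values, it pads fields with blanks, and Rewind and Bytes are helpers added for this sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-alone simplification of a fixed field buffer.
class SimpleFixedFieldBuffer {
public:
    // Defines the next field; returns the number of fields defined so far.
    int AddField(int fieldSize) {
        fieldSizes_.push_back(fieldSize);
        buffer_.resize(buffer_.size() + fieldSize, ' ');
        return NumberOfFields();
    }

    int NumberOfFields() const { return static_cast<int>(fieldSizes_.size()); }

    // Copies `value` into the next field, truncating or blank-padding it to
    // the field's size. Returns the field size, or -1 if no field is left.
    int Pack(const std::string& value) {
        if (nextField_ >= NumberOfFields()) return -1;
        const int size = fieldSizes_[nextField_];
        std::string padded = value.substr(0, size);
        padded.resize(size, ' ');
        buffer_.replace(offset_, size, padded);
        offset_ += size;
        ++nextField_;
        return size;
    }

    // Extracts the next field with trailing blanks removed.
    // Returns the field size, or -1 if no field is left.
    int Unpack(std::string& value) {
        if (nextField_ >= NumberOfFields()) return -1;
        const int size = fieldSizes_[nextField_];
        value = buffer_.substr(offset_, size);
        // find_last_not_of returns npos for an all-blank field; npos + 1
        // wraps to 0, so erase(0) leaves an empty string in that case.
        value.erase(value.find_last_not_of(' ') + 1);
        offset_ += size;
        ++nextField_;
        return size;
    }

    // Restarts packing or unpacking at the first field.
    void Rewind() { nextField_ = 0; offset_ = 0; }

    // The fixed length record as it would be written to a file.
    const std::string& Bytes() const { return buffer_; }

private:
    std::vector<int> fieldSizes_;
    std::string buffer_;
    int nextField_ = 0;
    int offset_ = 0;
};
```

With fields of sizes 10, 10 and 2 defined, packing "Fred", "Flintstone" and "MD" produces a 22 byte record. No lengths or delimiters are stored in it, because the field sizes are known to the buffer object and not to the file.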

    Assignment Questions:

    File Structures
    1. Explain the fundamental File Processing Operations
       i. Opening files
       ii. Closing files
       iii. Reading and Writing file contents
       iv. Special characters in files

    2. Discuss the fundamental File Structure Concepts
       i. Field and record organization


       ii. Managing fixed-length, fixed-field buffers
