Advanced Data Structures Notes
Upload
chinkal-nagpal -
Transcript of Advanced Data Structures Notes
-
8/11/2019 Advanced Data Structures Notes
We proceed by comparing successive characters of W to "parallel" characters of S, moving from one to the next if they match. However, in the fourth step, we get S[3] is a space and W[3] = 'D', a mismatch. Rather than beginning to search again at S[1], we note that no 'A' occurs between positions 0 and 3 in S except at 0; hence, having checked all those characters previously, we know there is no chance of finding the beginning of a match if we check them again. Therefore we move on to the next character, setting m = 4 and i = 0.

             1         2
m: 01234567890123456789012
S: ABC ABCDAB ABCDABCDABDE
W:     ABCDABD
i:     0123456

We quickly obtain a nearly complete match "ABCDAB" when, at W[6] (S[10]), we again have a discrepancy. However, just prior to the end of the current partial match, we passed an "AB" which could be the beginning of a new match, so we must take this into consideration. As we already know that these characters match the two characters prior to the current position, we need not check them again; we simply reset m = 8, i = 2 and continue matching the current character. Thus, not only do we omit previously matched characters of S, but also previously matched characters of W.

             1         2
m: 01234567890123456789012
S: ABC ABCDAB ABCDABCDABDE
W:         ABCDABD
i:         0123456

This search fails immediately, however, as the pattern still does not contain a space, so as in the first trial, we return to the beginning of W and begin searching at the next character of S: m = 11, reset i = 0.

             1         2
m: 01234567890123456789012
S: ABC ABCDAB ABCDABCDABDE
W:            ABCDABD
i:            0123456

Once again we immediately hit upon a match "ABCDAB" but the next character, 'C', does not match the final character 'D' of the word W. Reasoning as before, we set m = 15, to start at the two-character string "AB" leading up to the current position, set i = 2, and continue matching from the current position.

             1         2
m: 01234567890123456789012
S: ABC ABCDAB ABCDABCDABDE
W:                ABCDABD
i:                0123456

This time we are able to complete the match, whose first character is S[15].

Algorithm:
algorithm kmp_search:
    input:
        an array of characters, S (the text to be searched)
        an array of characters, W (the word sought)
    output:
        an integer (the zero-based position in S at which W is found)

    define variables:
        an integer, m ← 0 (the beginning of the current match in S)
        an integer, i ← 0 (the position of the current character in W)
        an array of integers, T (the table, computed elsewhere)

    while m + i < length(S) do
        if W[i] = S[m + i] then
            if i = length(W) - 1 then
                return m
            let i ← i + 1
        else
            let m ← m + i - T[i]
            if T[i] > -1 then
                let i ← T[i]
            else
                let i ← 0

    (if we reach here, we have searched all of S unsuccessfully)
    return the length of S
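The pseudocode above relies on a table T that is "computed elsewhere". The following is a minimal C++ sketch of both pieces, assuming the common convention T[0] = -1 that the pseudocode's test T[i] > -1 suggests. The function names kmp_table and kmp_search are chosen here for illustration and do not come from the notes.

```cpp
#include <string>
#include <vector>

// Build the partial-match table T for the word w, with T[0] = -1.
// T[i] is the length of the longest proper prefix of w that is also
// a suffix of w[0..i-1].
std::vector<int> kmp_table(const std::string& w) {
    if (w.empty()) return {};
    std::vector<int> t(w.size(), 0);
    t[0] = -1;
    int pos = 2;  // next table entry to compute
    int cnd = 0;  // length of the current candidate prefix
    while (pos < static_cast<int>(w.size())) {
        if (w[pos - 1] == w[cnd]) {
            ++cnd;
            t[pos] = cnd;
            ++pos;
        } else if (cnd > 0) {
            cnd = t[cnd];  // fall back to a shorter candidate
        } else {
            t[pos] = 0;
            ++pos;
        }
    }
    return t;
}

// Direct transcription of the kmp_search pseudocode: returns the zero-based
// position of w in s, or s.size() if w does not occur.
int kmp_search(const std::string& s, const std::string& w) {
    if (w.empty()) return 0;
    const std::vector<int> t = kmp_table(w);
    const int n = static_cast<int>(s.size());
    const int k = static_cast<int>(w.size());
    int m = 0;  // beginning of the current match in s
    int i = 0;  // position of the current character in w
    while (m + i < n) {
        if (w[i] == s[m + i]) {
            if (i == k - 1) return m;
            ++i;
        } else {
            m = m + i - t[i];
            i = (t[i] > -1) ? t[i] : 0;
        }
    }
    return n;
}
```

With the strings from the walk-through, kmp_search("ABC ABCDAB ABCDABCDABDE", "ABCDABD") follows the same sequence of shifts (m = 4, 8, 11, 15) and returns 15.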
Efficiency of KMP:
Since the two portions of the algorithm (building the table and searching) have complexities of O(k) and O(n) respectively, where k is the length of W and n is the length of S, the complexity of the overall algorithm is O(n + k). These complexities are the same, no matter how many repetitive patterns are in W or S.

Implementation of KMP:
#include <iostream>
#include <cstring>
using namespace std;

class str {
private:
    int f[45];          // failure function of the pattern
    int i, j, m, n;
public:
    void failure(char[]);
    int kmpmatch(char[], char[]);
};

// Compute the failure function f for pattern x:
// f[i] is the length of the longest proper prefix of x[0..i]
// that is also a suffix of x[0..i].
void str::failure(char x[]) {
    m = strlen(x);
    j = 0;
    i = 1;
    f[0] = 0;
    while (i < m) {
        if (x[j] == x[i]) {
            f[i] = j + 1;
            i++;
            j++;
        } else if (j > 0)
            j = f[j - 1];
        else {
            f[i] = 0;
            i++;
        }
    }
}

// Return the index in t where p first occurs, or -1 if it does not occur.
int str::kmpmatch(char t[], char p[]) {
    failure(p);
    i = 0;
    j = 0;
    n = strlen(t);
    while (i < n) {
        if (p[j] == t[i]) {
            if (j == m - 1)
                return i - m + 1;
            i++;
            j++;
        } else if (j > 0)
            j = f[j - 1];   // fall back in the pattern (index j, not i)
        else
            i++;
    }
    return -1;
}

int main() {
    str b;
    char t[50], p[20];
    cout << "Enter the text: ";
    cin >> t;
    cout << "Enter the pattern: ";
    cin >> p;
    int a = b.kmpmatch(t, p);
    if (a != -1)
        cout << "Pattern found at position " << a << endl;
    else
        cout << "Pattern not found" << endl;
    return 0;
}
The Boyer-Moore algorithm searches for occurrences of P in T by performing explicit character comparisons at different alignments. Instead of a brute-force search of all alignments (of which there are m - n + 1, where n is the length of P and m is the length of T), Boyer-Moore uses information gained by preprocessing P to skip as many alignments as possible.
The algorithm begins at alignment k = n, so the start of P is aligned with the start of T. Characters in P and T are then compared starting at index n in P and k in T, moving backward: the strings are matched from the end of P to the start of P. The comparisons continue until either the beginning of P is reached (which means there is a match) or a mismatch occurs, upon which the alignment is shifted to the right according to the maximum value permitted by a number of rules. The comparisons are performed again at the new alignment, and the process repeats until the alignment is shifted past the end of T, which means no further matches will be found.
The shift rules are implemented as constant-time table lookups, using tables generated during the preprocessing of P.
The Good Suffix Rule
Description
- - - - X - - K - - - - -
M A N P A N A M A N A P -
A N A M P N A M - - - - -
- - - - A N A M P N A M -
Demonstration of good suffix rule with pattern ANAMPNAM.

The good suffix rule is markedly more complex in both concept and implementation than the bad character rule. It is the reason comparisons begin at the end of the pattern rather than the start, and is formally stated thus:[3]
Suppose for a given alignment of P and T, a substring t of T matches a suffix of P, but a mismatch occurs at the next comparison to the left. Then find, if it exists, the right-most copy t' of t in P such that t' is not a suffix of P and the character to the left of t' in P differs from the character to the left of t in P. Shift P to the right so that substring t' in P is below substring t in T. If t' does not exist, then shift the left end of P past the left end of t in T by the least amount so that a prefix of the shifted pattern matches a suffix of t in T. If no such shift is possible, then shift P by n places to the right. If an occurrence of P is found, then shift P by the least amount so that a proper prefix of the shifted P matches a suffix of the occurrence of P in T. If no such shift is possible, then shift P by n places, that is, shift P past T.
Preprocessing
The good suffix rule requires two tables: one for use in the general case, and another for use when either the general case returns no meaningful result or a match occurs. These tables will be designated L and H respectively. Their definitions are as follows:[3]
For each i, L[i] is the largest position less than n such that string P[i..n] matches a suffix of P[1..L[i]] and such that the character preceding that suffix is not equal to P[i-1]. L[i] is defined to be zero if there is no position satisfying the condition.
Let H[i] denote the length of the largest suffix of P[i..n] that is also a prefix of P, if one exists. If none exists, let H[i] be zero.
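To make the definition of H concrete, here is a small C++ sketch that computes it by direct comparison. It is deliberately naive and is not the linear-time construction; it also uses 0-based indices, so h[i] here corresponds to H[i+1] in the 1-based notation of the text. The function name prefix_suffix_table is an illustrative choice, not something taken from the notes.

```cpp
#include <string>
#include <vector>

// h[i] = length of the longest suffix of p[i..n-1] that is also a prefix
// of p, or 0 if there is none (0-based indices).
std::vector<int> prefix_suffix_table(const std::string& p) {
    const int n = static_cast<int>(p.size());
    std::vector<int> h(n, 0);
    for (int i = 0; i < n; ++i) {
        // Try candidate suffix lengths from longest to shortest.
        for (int len = n - i; len >= 1; --len) {
            // Compare the last len characters of p with its first len characters.
            if (p.compare(n - len, len, p, 0, len) == 0) {
                h[i] = len;
                break;
            }
        }
    }
    return h;
}
```

For the pattern "ABCAB" this gives {5, 2, 2, 2, 0}: the suffix "AB" is also a prefix, so every position whose remaining suffix still contains "AB" gets 2. For the demonstration pattern ANAMPNAM only the whole pattern qualifies, so every entry after the first is 0.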
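Both of these tables are constructible in O(n) time and use O(n) space. The alignment shift for index i in P is given by n - L[i] or n - H[i]. H should only be used if L[i] is zero or a match has been found.

Performance:
The Boyer-Moore algorithm as presented in the original paper has worst-case running time of O(n+m) only if the pattern does not appear in the text.

Implementation of Boyer-Moore:
// Note: this listing uses only the last-occurrence (bad character) shift.
// Parts of the original listing were lost in extraction and are reconstructed.
#include <iostream>
#include <cstring>
using namespace std;

class bm {
public:
    char M[20], P[15];      // M: text, P: pattern
    int last(char);
    int psize();
    int boyer(int, int);
    int min(int, int);
};

// Index of the last occurrence of ch in P, or -1 if ch does not occur.
int bm::last(char ch) {
    int i, k = -1;
    for (i = 0; i < psize(); i++)
        if (P[i] == ch)
            k = i;
    return k;
}

int bm::psize() {
    return strlen(P);
}

// Search for P (length m) in M (length n), comparing right to left.
// Returns the index of the first match, or -1 if there is none.
int bm::boyer(int n, int m) {
    if (m == 0 || m > n)
        return -1;
    int i = m - 1, j = m - 1;
    do {
        if (P[j] == M[i]) {
            if (j == 0)
                return i;
            else {
                i = i - 1;
                j = j - 1;
            }
        } else {
            i = i + m - min(j, 1 + last(M[i]));
            j = m - 1;
        }
    } while (i <= n - 1);
    return -1;
}

int bm::min(int i, int j) {
    return ((j > i) ? i : j);
}

int main() {
    bm b1;
    int m, n, x;
    cout << "Enter the text: ";
    cin >> b1.M;
    cout << "Enter the pattern: ";
    cin >> b1.P;
    m = strlen(b1.P);
    n = strlen(b1.M);
    x = b1.boyer(n, m);
    if (x == -1)
        cout << "Pattern not found" << endl;
    else
        cout << "Pattern found at position " << x << endl;
    return 0;
}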
Ex: bit one of 1000 is 1, and bits two, three, and four are 0.
All keys in the left subtree of a node at level i have bit i equal to zero, whereas those in the right subtree of nodes at this level have bit i = 1.
Assume a fixed number of bits.
Not empty =>
    Root contains one dictionary pair (any pair).
    All remaining pairs whose key begins with a 0 are in the left subtree.
    All remaining pairs whose key begins with a 1 are in the right subtree.
    Left and right subtrees are digital subtrees on the remaining bits.
This digital search tree contains the keys 1000, 0010, 1001, 0001, 1100, 0000.
Example: Start with an empty digital search tree and insert a pair whose key is 0110.
Now, insert a pair whose key is 0010.
Now, insert a pair whose key is 1001.
Now, insert a pair whose key is 1011.
Now, insert a pair whose key is 0000.
Search and Insert:
The digital search tree functions to search and insert are quite similar to the corresponding functions for binary search trees.
The essential difference is that the subtree to move to is determined by a bit in the search key rather than by the result of the comparison of the search key and the key in the current node.
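The search and insert rule described above can be sketched in C++ as follows. This is an illustrative sketch rather than the notes' own code: it assumes 4-bit unsigned keys (kBits), numbers bits from 1 at the most significant end as in the "bit one of 1000 is 1" example, and stores keys only rather than dictionary pairs. The names DstNode, bit, dst_search and dst_insert are hypothetical.

```cpp
#include <memory>

constexpr int kBits = 4;  // assumed fixed key width

struct DstNode {
    unsigned key;
    std::unique_ptr<DstNode> left, right;
    explicit DstNode(unsigned k) : key(k) {}
};

// Bit i of key, counting from 1 at the most significant of the kBits bits.
inline bool bit(unsigned key, int i) {
    return ((key >> (kBits - i)) & 1u) != 0;
}

// At level i the branch is chosen by bit i of the search key,
// not by comparing the search key with the key in the node.
bool dst_search(const DstNode* root, unsigned key) {
    int level = 1;
    for (const DstNode* p = root; p != nullptr; ++level) {
        if (p->key == key) return true;
        if (level > kBits) return false;  // no bits left to branch on
        p = bit(key, level) ? p->right.get() : p->left.get();
    }
    return false;
}

// Returns false if the key is already present (or cannot be placed).
bool dst_insert(std::unique_ptr<DstNode>& root, unsigned key) {
    std::unique_ptr<DstNode>* slot = &root;
    for (int level = 1; *slot; ++level) {
        if ((*slot)->key == key) return false;  // duplicate key
        if (level > kBits) return false;
        slot = bit(key, level) ? &(*slot)->right : &(*slot)->left;
    }
    *slot = std::make_unique<DstNode>(key);
    return true;
}
```

Inserting the example keys 0110, 0010, 1001, 1011, 0000 in that order puts 0110 at the root, 0010 as its left child, 1001 as its right child, 1011 as the left child of 1001 (its second bit is 0), and 0000 as the left child of 0010.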
Try to build the digital search tree:
A 00001, S 10011, E 00101, R 10010, C 00011, H 01000,
I 01001, N 01110, G 00111, X 11000, M 01101, P 10000
When we are dealing with very long keys, the cost of a key comparison is high. We can reduce the number of key comparisons to one by using a related structure called Patricia.
We shall develop this structure in three steps. First, we introduce a structure called a binary trie. Then we transform binary tries into compressed binary tries. Finally, from compressed binary tries we obtain Patricia.

7.5 Binary Trie
A binary trie is a binary tree that has two kinds of nodes: branch nodes and element nodes.
A branch node has the two data members LeftChild and RightChild. It has no data member named data.
An element node has the single data member data.
Branch nodes are used to build a binary tree search structure similar to that of a digital search tree. This search structure leads to element nodes.
Six element binary trie:

Compressed Binary trie:
The binary trie contains branch nodes whose degree is one. By adding another data member, BitNumber, to each branch node, we can eliminate all degree-one branch nodes from the trie. The BitNumber data member of a branch node gives the bit number of the key that is to be used at this node.
Binary Trie with degree one nodes eliminated:
7.6 Patricia (Practical Algorithm to Retrieve Information Coded in Alphanumeric)
Compressed binary tries may be represented using nodes of a single type. The new nodes, called augmented branch nodes, are the original branch nodes augmented by the data member data. The resulting structure is called Patricia and is obtained from a compressed binary trie in the following way:
(1) Replace each branch node by an augmented branch node.
(2) Eliminate the element nodes.
(3) Store the data previously in the element nodes in the data data members of the augmented branch nodes. Since every nonempty compressed binary trie has one less branch node than it has element nodes, it is necessary to add one augmented branch node. This node is called the head node. The remaining structure is the left subtree of the head node. The head node has BitNumber equal to zero. Its right-child data member is not used. The assignment of data to augmented branch nodes is done in such a way that the BitNumber in the augmented branch node is less than or equal to that in the parent of the element node that contained this data.
(4) Replace the original pointers to element nodes by pointers to the respective augmented branch nodes.
/* The type element (with an unsigned member key), the function bit(k, i)
   (bit i of key k) and the macro IS_FULL (true when malloc returned NULL)
   are assumed to be defined elsewhere. */
typedef struct patricia_tree *patricia;
struct patricia_tree {
    int bit_number;
    element data;
    patricia left_child, right_child;
};
patricia root;

Patricia search:
patricia search(patricia t, unsigned k)
{
/* search the Patricia tree t; return the last node y encountered;
   if k == y->data.key, the key is in the tree */
    patricia p, y;
    if (!t) return NULL;    /* empty tree */
    y = t->left_child;
    p = t;
    while (y->bit_number > p->bit_number) {
        p = y;
        y = (bit(k, y->bit_number)) ? y->right_child : y->left_child;
    }
    return y;
}

Patricia insert:
void insert(patricia *t, element x)
{
/* insert x into the Patricia tree *t */
    patricia s, p, y, z;
    int i;
    if (!(*t)) {    /* empty tree: create the head node */
        *t = (patricia)malloc(sizeof(struct patricia_tree));
        if (IS_FULL(*t)) {
            fprintf(stderr, "The memory is full\n");
            exit(1);
        }
        (*t)->bit_number = 0;
        (*t)->data = x;
        (*t)->left_child = *t;
        return;     /* x is now stored in the head node */
    }
    y = search(*t, x.key);
    if (x.key == y->data.key) {
        fprintf(stderr, "The key is in the tree. Insertion fails.\n");
        exit(1);
    }
    /* find the first bit where x.key and y->data.key differ */
    for (i = 1; bit(x.key, i) == bit(y->data.key, i); i++)
        ;
    /* search tree using the first i-1 bits */
    s = (*t)->left_child;
    p = *t;
    while (s->bit_number > p->bit_number && s->bit_number < i) {
        p = s;
        s = (bit(x.key, s->bit_number)) ? s->right_child : s->left_child;
    }
    /* add x as a child of p */
    z = (patricia)malloc(sizeof(struct patricia_tree));
    if (IS_FULL(z)) {
        fprintf(stderr, "The memory is full\n");
        exit(1);
    }
    z->data = x;
    z->bit_number = i;
    z->left_child = (bit(x.key, i)) ? s : z;
    z->right_child = (bit(x.key, i)) ? z : s;
    if (s == p->left_child)
        p->left_child = z;
    else
        p->right_child = z;
}
Assignment Questions: Pattern matching and Tries
1. Explain pattern matching algorithms:
    i. the Boyer-Moore algorithm
    ii. the Knuth-Morris-Pratt algorithm
2. Define tries. Give the concepts of a digital search tree. What are the applications of tries?
3. Write short notes on:
    i. binary trie
    ii. Patricia
    iii. Multi-way trie
4. Describe an efficient algorithm to find the longest palindrome that is a suffix of a string T of length n.
UNIT-VIII
Topics:
File Structures: Fundamental File Processing Operations - opening files, closing files, reading and writing file contents, special characters in files.
Fundamental File Structure Concepts - field and record organization, managing fixed-length, fixed-field buffers.
8.1 Fundamental file processing operations
physical file: A file as seen by the operating system, and which actually exists on secondary storage.
logical file: A file as seen by a program.
Programs read and write data from logical files.
Before a logical file can be used, it must be associated with a physical file.
This act of connection is called "opening" the file.
Data in a physical file is persistent.
Data in a logical file is temporary.
A logical file is identified (within the program) by a program variable or constant.
C++ supports file access on three levels:
    Unbuffered, unformatted file access using handles.
    Buffered, formatted file access using the FILE structure.
    Buffered, formatted file access using classes.

8.1.1 Opening Files
open: To associate a logical program file with a physical system file.
protection mode: The security status of a file, defining who is allowed to access a file and which access modes are allowed.
access mode: The type of (file) access allowed.
file descriptor: A cardinal number used as the identifier for a logical file by operating systems such as UNIX and PC-DOS.
In C++, a file is opened by using library functions.
The name of the physical file must be supplied to an open function.
The open function must also be supplied with an access mode.
The open function can also be supplied with a protection mode.
The access mode has several aspects:
    Is the file to be accessed by reading, by writing, or by both?
    What should be done with existing contents of the file?
    Should a new file be created if none exists?
    Should any character translation be done?
For handle level access, the logical file is declared as an int.
The handle is also known as a file descriptor.
The C++ open function is used to open a file for handle level access.
The value returned by open is assigned to the file variable.
For FILE level access, the logical file is declared as a FILE *.
The C++ fopen function is used to open a file for FILE level access.
The value returned by fopen is assigned to the file variable.
For class level access, the logical file is declared as an fstream (or as an ifstream or an ofstream).
The C++ open method of the class is used to open a file for class level access.

8.1.2 Closing Files
close: To disassociate a logical program file from a physical system file.
Closing a file frees system resources for reuse.
Data may not be actually written to the physical file until a logical file is closed.
A program should close a file when it is no longer needed.
The C++ close function is used to close a file for handle level access.
The logical file to be closed is an argument to the handle close function.
The C++ fclose function is used to close a file for FILE level access.
The logical file to be closed is an argument to the FILE fclose function.
The C++ close method of the class is used to close a file for class level access.
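The three access levels can be put side by side in a short C++ sketch. It assumes a POSIX system for the handle level (open and close come from <fcntl.h> and <unistd.h>, not from the C++ standard library). The function name open_close_demo and the permission value 0644 are illustrative choices.

```cpp
#include <cstdio>
#include <fstream>
#include <fcntl.h>   // open and the O_* access-mode flags (POSIX)
#include <unistd.h>  // close (POSIX)

// Opens and closes the same physical file at each of the three levels.
// Returns true only if every open and close succeeded.
bool open_close_demo(const char* path) {
    // Handle level: the logical file is an int (the file descriptor).
    // O_WRONLY | O_CREAT | O_TRUNC is the access mode, 0644 the protection mode.
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return false;
    if (close(fd) != 0) return false;

    // FILE level: the logical file is a FILE *.
    std::FILE* fp = std::fopen(path, "r");
    if (fp == nullptr) return false;
    if (std::fclose(fp) != 0) return false;

    // Class level: the logical file is an fstream object.
    std::fstream fs;
    fs.open(path, std::ios::in | std::ios::out);
    if (!fs.is_open()) return false;
    fs.close();
    return !fs.fail();
}
```

Calling open_close_demo("demo.txt") creates the file at the handle level if it does not exist, then reopens the same physical file through the two buffered interfaces.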
8.1.3 Reading and writing
read: To transfer data from a file to program variable(s).
write: To transfer data to a file from program variable(s) or constant(s).
end-of-file: A physical location just beyond the last datum in a file.
The read and write operations are performed on the logical file with calls to library functions.
For read, one or more variables must be supplied to the read function, to receive the data from the file.
For write, one or more values (as variables or constants) must be supplied to the write function, to provide the data for the file.
For unformatted transfers, the amount of data to be transferred must also be supplied.
The C++ read function is used to read a file with handle level access.
The C++ write function is used to write a file with handle level access.
The C++ fread function is used to read a file with FILE level access.
The C++ fwrite function is used to write a file with FILE level access.
The C++ read method of the class is used to read a file with class level access.
The C++ write method of the class is used to write a file with class level access.
The acronym for end-of-file is EOF.
When a file reaches EOF, no more data can be read.
Data can be written at or past EOF.
The C++ eof function is used to detect end-of-file with handle level access.
The handle eof function detects when the file pointer is at end-of-file.
The C++ feof function is used to detect end-of-file with FILE level access.
The FILE feof function detects when the file pointer is past end-of-file.
The C++ eof method of the class is used to detect end-of-file with class level access.
The stream class eof function detects when the file pointer is past end-of-file.
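As a sketch of class level reading and writing, the function below writes a string with the write method, reads it back in fixed-size chunks with the read method, and checks eof afterwards. The name write_then_read and the 16-byte chunk size are arbitrary choices made for this illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Writes text to path, then reads the file back into `back`.
// Returns true if the write succeeded and reading stopped at end-of-file.
bool write_then_read(const std::string& path, const std::string& text,
                     std::string& back) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    // Unformatted transfer: the amount of data must be supplied.
    out.write(text.data(), static_cast<std::streamsize>(text.size()));
    out.close();  // make sure the data reaches the physical file
    if (!out) return false;

    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    back.clear();
    char buf[16];
    // A short final read fails but still delivers gcount() bytes.
    while (in.read(buf, sizeof buf) || in.gcount() > 0) {
        back.append(buf, static_cast<std::size_t>(in.gcount()));
    }
    // eof() is set only after a read has tried to go past the last byte.
    return in.eof();
}
```

The loop condition reflects the "past end-of-file" behaviour described above: the stream reports EOF only once a read has attempted to go beyond the last datum, so the last partial chunk must be collected before the loop ends.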
Seeking:
seek: To move to a specified location in a file.
byte offset: The distance, measured in bytes, from the beginning.
Seeking moves an attribute in the file called the file pointer.
C++ library functions allow seeking.
In DOS, Windows, and UNIX, files are organized as streams of bytes, and locations are in terms of byte count.
Seeking can be specified from one of three reference points:
    o The beginning of the file.
    o The end of the file.
    o The current file pointer position.
The C++ lseek function is used to seek with handle level access.
The C++ fseek function is used to seek with FILE level access.
The C++ seekg method of the class is used to seek with class level access for read (get).
The C++ seekp method of the class is used to seek with class level access for write (put).
C++ allows, but does not require, separate file pointers and seek functions for reading and writing.
Most implementations of C++ do not have separate file pointers.
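A brief class level sketch of seeking from two of the three reference points: byte_at seeks from the beginning of the file, and file_size seeks to the end and reads back the file pointer with tellg. Both function names are illustrative only.

```cpp
#include <fstream>
#include <string>

// Returns the byte stored at the given offset from the beginning of the
// file, or -1 if the file cannot be opened or the offset is at or past EOF.
int byte_at(const std::string& path, std::streamoff offset) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return -1;
    in.seekg(offset, std::ios::beg);  // reference point: beginning of file
    char c;
    if (!in.get(c)) return -1;
    return static_cast<unsigned char>(c);
}

// Returns the size of the file in bytes, or -1 if it cannot be opened.
std::streamoff file_size(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return -1;
    in.seekg(0, std::ios::end);       // reference point: end of file
    return static_cast<std::streamoff>(in.tellg());
}
```

For a file containing the seven bytes "ABCDEFG", byte_at(path, 3) yields 'D' and file_size(path) yields 7.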
8.1.4 Special characters in files
Specifics of files can vary with the operating system.
The C++ language was written originally for the UNIX operating system.
UNIX and DOS (Windows) systems handle separators between lines differently.
In UNIX files, lines are separated by a single new line character (ASCII line feed).
In DOS (Windows) files, lines are separated by two characters (ASCII carriage return and line feed).
When DOS files are opened in text mode, the internal separator ('\n') is translated to the external separator (carriage return, line feed) during read and write.
When DOS files are opened in binary mode, the internal separator ('\n') is not translated to the external separator (carriage return, line feed) during read and write.
In DOS (Windows) files, end-of-file can be marked by a "control-Z" character (ASCII SUB).
In C++ implementations for DOS, a control-Z in a file is interpreted as end-of-file.
Other operating systems may handle line separation and end-of-file differently.
Implementations of C++ should treat text files so that the internal representations are the same as UNIX.
On UNIX systems, files opened in text mode or binary mode behave the same way.
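One way to see the separator difference is to open a file in binary mode, where no translation takes place, and inspect the bytes around each line feed. The sketch below is only an illustration; the LineEnding enumeration and the function detect_line_ending are hypothetical names introduced here.

```cpp
#include <fstream>
#include <string>

enum class LineEnding { None, Unix, Dos, Mixed };

// Classifies the line separators found in a file. The file is opened in
// binary mode so that carriage returns are not translated away.
LineEnding detect_line_ending(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    bool saw_lf = false;    // a line feed not preceded by a carriage return
    bool saw_crlf = false;  // a carriage return followed by a line feed
    char prev = '\0';
    char c;
    while (in.get(c)) {
        if (c == '\n') {
            if (prev == '\r') saw_crlf = true;
            else saw_lf = true;
        }
        prev = c;
    }
    if (saw_lf && saw_crlf) return LineEnding::Mixed;
    if (saw_crlf) return LineEnding::Dos;
    if (saw_lf) return LineEnding::Unix;
    return LineEnding::None;
}
```

A file written as "a\nb\n" in binary mode is classified as Unix on any platform, while one written as "a\r\nb\r\n" is classified as Dos.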
8.2 Fundamental file structure concepts
Persistent = Retained after execution of the program which created it.
When we build file structures, we are making it possible to make data persistent. That is, one program can store data from memory to a file, and terminate. Later, another program can retrieve the data from the file, and process it in memory.
In this chapter, we look at file structures which can be used to organize the data within the file, and at the algorithms which can be used to store and retrieve the data sequentially.
8.2.1 Field and Record organization
Record: A subdivision of a file, containing data related to a single entity.
Field: A subdivision of a record containing a single attribute of the entity which the record describes.
Stream of bytes: A file which is regarded as being without structure beyond separation into a sequential set of bytes.
Within a program, data is temporarily stored in variables.
Individual values can be aggregated into structures, which can be treated as a single variable with parts.
In C++, classes are typically used as an aggregate structure.
C++ Person class (version 0.1):
    class Person {
    public:
        char FirstName [11];
        char LastName [11];
        char Address [31];
        char City [21];
        char State [3];
        char ZIP [6];   // 5 digits plus the terminating '\0'
    };
With this class declaration, variables can be declared to be of type Person. The individual fields within a Person can be referred to as the name of the variable and the name of the field, separated by a period (.).
C++ Program:
    #include <cstring>    // strcpy
    #include <iostream>   // cout
    using namespace std;

    class Person {
    public:
        char FirstName [11];
        char LastName [11];
        char Address [31];
        char City [21];
        char State [3];
        char ZIP [6];   // 5 digits plus the terminating '\0'
    };

    void Display (Person);

    int main () {
        Person Clerk;
        Person Customer;

        strcpy (Clerk.FirstName, "Fred");
        strcpy (Clerk.LastName, "Flintstone");
        strcpy (Clerk.Address, "4444 Granite Place");
        strcpy (Clerk.City, "Rockville");
        strcpy (Clerk.State, "MD");
        strcpy (Clerk.ZIP, "00001");

        strcpy (Customer.FirstName, "Lily");
        strcpy (Customer.LastName, "Munster");
        strcpy (Customer.Address, "1313 Mockingbird Lane");
        strcpy (Customer.City, "Hollywood");
        strcpy (Customer.State, "CA");
        strcpy (Customer.ZIP, "90210");

        Display (Clerk);
        Display (Customer);
    }
void Display (Person Someone) {cout
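The listing breaks off inside Display. As a rough sketch of how a complete version might look, the block below assumes that Display simply prints each field, and it sizes ZIP at 6 bytes so that a five-digit code and its terminating '\0' both fit. SetField and Format are hypothetical helpers that do not appear in the notes, and the output layout is an assumption.

```cpp
#include <cstddef>
#include <cstring>
#include <iostream>
#include <sstream>
#include <string>

class Person {
public:
    char FirstName[11];
    char LastName[11];
    char Address[31];
    char City[21];
    char State[3];
    char ZIP[6];  // assumption: 5 digits plus the terminating '\0'
};

// Hypothetical helper: copy at most N-1 characters and always terminate,
// so an over-long value is truncated instead of overflowing the field.
template <std::size_t N>
void SetField(char (&field)[N], const char* value) {
    std::strncpy(field, value, N - 1);
    field[N - 1] = '\0';
}

// Assumed layout: name, street address, then city, state and ZIP.
std::string Format(const Person& p) {
    std::ostringstream out;
    out << p.FirstName << ' ' << p.LastName << '\n'
        << p.Address << '\n'
        << p.City << ", " << p.State << ' ' << p.ZIP << '\n';
    return out.str();
}

void Display(const Person& someone) { std::cout << Format(someone); }
```

Passing the Person by const reference avoids copying the whole record on each call; the original prototype passes it by value, which also works.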
Fixed length records: A record which is predetermined to be the same length as the other records in the file.
| Record 1 | Record 2 | Record 3 | Record 4 | Record 5 |
The file is divided into records of equal size.
All records within a file have the same size.
Different files can have different length records.
Programs which access the file must know the record length.
Offset, or position, of the nth record of a file can be calculated.
There is no external overhead for record separation.
There may be internal fragmentation (unused space within records).
There will be no external fragmentation (unused space outside of records) except for deleted records.
Individual records can always be updated in place.
Example (80 byte records):
 0  66 69 72 73 74 20 6C 69 6E 65 00 00 01 00 00 00  first line......
10  00 00 00 00 00 00 00 00 FF FF FF FF 00 00 00 00  ................
20  68 FB 12 00 DC E0 40 00 3C BA 42 00 78 FB 12 00  h.....@.
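Because every record has the same length, the position of a record is a single multiplication, and the record can be read directly. A minimal sketch, assuming records are numbered from 0; RecordOffset and ReadFixedRecord are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a real file
#include <string>

// Byte offset of record n (counting from 0) in a file of fixed length records.
std::size_t RecordOffset(std::size_t n, std::size_t recordLength) {
    return n * recordLength;
}

// Hypothetical helper: read record n directly, without scanning the records
// before it. Returns false if the stream does not hold a complete record n.
bool ReadFixedRecord(std::istream& file, std::size_t n,
                     std::size_t recordLength, std::string& record) {
    assert(recordLength > 0);
    record.assign(recordLength, '\0');
    file.clear();  // forget any earlier end-of-file or failure state
    file.seekg(static_cast<std::streamoff>(RecordOffset(n, recordLength)));
    file.read(&record[0], static_cast<std::streamsize>(recordLength));
    return static_cast<std::size_t>(file.gcount()) == recordLength;
}
```

With 80 byte records, record 3 starts at byte 240. A real file should be opened in binary mode so that no line separator translation shifts the offsets.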
Delimited variable length records:
Offset, or position, of the nth record of a file cannot be calculated.
There is external overhead for record separation equal to the size of the delimiter per record.
There should be no internal fragmentation (unused space within records).
There may be external fragmentation (unused space outside of records) after file updating.
Individual records cannot always be updated in place.
Example (Delimiter = ASCII 30 (hex 1E) = RS character):
 0  66 69 72 73 74 20 6C 69 6E 65 1E 73 65 63 6F 6E  first line.secon
10  64 20 6C 69 6E 65 1E                             d line.
Example (Delimiter = '\n'):
 0  46 69 72 73 74 20 28 31 73 74 29 20 4C 69 6E 65  First (1st) Line
10  0D 0A 53 65 63 6F 6E 64 20 28 32 6E 64 29 20 6C  ..Second (2nd) l
20  69 6E 65 0D 0A                                   ine..
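Reading delimited records amounts to scanning for the delimiter. The sketch below uses std::getline with an explicit delimiter; ReadDelimitedRecords and kRecordSeparator are hypothetical names chosen for this example.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a real file
#include <string>
#include <vector>

const char kRecordSeparator = '\x1E';  // ASCII RS, as in the first dump

// Hypothetical helper: read every record up to its delimiter. The delimiter
// itself is consumed and is not stored in the record.
std::vector<std::string> ReadDelimitedRecords(std::istream& file,
                                              char delimiter = kRecordSeparator) {
    std::vector<std::string> records;
    std::string record;
    while (std::getline(file, record, delimiter)) {
        records.push_back(record);
    }
    return records;
}
```

Reaching record n this way means reading past n delimiters first, since nothing in the file says where a given record starts.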
Disadvantage: the offset of each record cannot be calculated from its record number. This makes direct access impossible.
Disadvantage: there is space overhead for the delimiter suffix.
Advantage: there will probably be no internal fragmentation (unusable space within records).

Length prefixed variable length records:
| 110 | Record 1 | 40 | Record 2 | 100 | Record 3 | 80 | Record 4 | 70 | Record 5 |
The records within a file are prefixed by a length byte or bytes.
Records within a file can have different sizes.
Different files can have different length records.
Programs which access the file must know the size and format of the length prefix.
Offset, or position, of the nth record of a file cannot be calculated.
There is external overhead for record separation equal to the size of the length prefix per record.
There should be no internal fragmentation (unused space within records).
There may be external fragmentation (unused space outside of records) after file updating.
Individual records cannot always be updated in place.
Example:
 0  0A 00 46 69 72 73 74 20 4C 69 6E 65 0B 00 53 65  ..First Line..Se
10  63 6F 6E 64 20 4C 69 6E 65 1F 00 54 68 69 72 64  cond Line..Third
20  20 4C 69 6E 65 20 77 69 74 68 20 6D 6F 72 65 20   Line with more 
30  63 68 61 72 61 63 74 65 72 73                    characters
Disadvantage: the offset of each record cannot be calculated from its record number. This makes direct access impossible.
Disadvantage: there is space overhead for the length prefix.
Advantage: there will probably be no internal fragmentation (unusable space within records).
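The dump above can be read as a two byte length, least significant byte first, followed by that many bytes of data. A minimal sketch under that assumption; WritePrefixedRecord and ReadPrefixedRecord are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // string streams, handy for trying this without a real file
#include <string>

// Write a record preceded by its length as two bytes, low byte first
// (the prefix format is an assumption read off the sample dump).
void WritePrefixedRecord(std::ostream& file, const std::string& record) {
    assert(record.size() <= 0xFFFF);  // the length must fit in two bytes
    const char prefix[2] = {
        static_cast<char>(record.size() & 0xFF),
        static_cast<char>((record.size() >> 8) & 0xFF)};
    file.write(prefix, 2);
    file.write(record.data(), static_cast<std::streamsize>(record.size()));
}

// Read the next record. Returns false at end of file or on a truncated record.
bool ReadPrefixedRecord(std::istream& file, std::string& record) {
    unsigned char prefix[2];
    if (!file.read(reinterpret_cast<char*>(prefix), 2)) return false;
    const std::size_t length =
        static_cast<std::size_t>(prefix[0]) |
        (static_cast<std::size_t>(prefix[1]) << 8);
    record.assign(length, '\0');
    if (length == 0) return true;
    return static_cast<bool>(
        file.read(&record[0], static_cast<std::streamsize>(length)));
}
```

Unlike a delimiter, a length prefix places no restriction on which byte values may occur inside the record.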
Indexed variable length records:
An auxiliary file can be used to point to the beginning of each record.
In this case, the data records can be contiguous.
If the records are contiguous, the only access is through the index file.
Example:
Index File:
 0  12 00 00 00 25 00 00 00 47 00 00 00              ....%...G...
Data File:
 0  46 69 72 73 74 20 28 31 73 74 29 20 53 74 72 69  First (1st) Stri
10  6E 67 53 65 63 6F 6E 64 20 28 32 6E 64 29 20 53  ngSecond (2nd) S
20  74 72 69 6E 67 54 68 69 72 64 20 28 33 72 64 29  tringThird (3rd)
30  20 53 74 72 69 6E 67 20 77 68 69 63 68 20 69 73   String which is
40  20 6C 6F 6E 67 65 72                              longer
Advantage: the offset of each record is contained in the index, and can be looked up from its record number. This makes direct access possible.
Disadvantage: there is space overhead for the index file.
Disadvantage: there is time overhead for the index file.
Advantage: there will probably be no internal fragmentation (unusable space within records).
The time overhead for accessing the index file can be minimized by reading the entire index file into memory when the files are opened.
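In the sample dump, the index entries 12, 25 and 47 (hex) are 18, 37 and 71 in decimal, which match the offset just past the end of each record, so record n runs from the previous entry (or 0) up to its own entry. The sketch below works on in-memory copies of both files and assumes four byte entries stored least significant byte first; DecodeIndex and FetchRecord are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode an index image made of 4-byte offsets, low byte first
// (the entry format is an assumption read off the sample dump).
std::vector<std::uint32_t> DecodeIndex(const std::string& indexBytes) {
    std::vector<std::uint32_t> offsets;
    for (std::size_t i = 0; i + 4 <= indexBytes.size(); i += 4) {
        std::uint32_t value = 0;
        for (std::size_t b = 4; b-- > 0;) {
            value = (value << 8) | static_cast<unsigned char>(indexBytes[i + b]);
        }
        offsets.push_back(value);
    }
    return offsets;
}

// Record n (counting from 0) occupies [start, end) in the data image, where
// end is entry n of the index and start is entry n-1, or 0 for the first record.
std::string FetchRecord(const std::string& data,
                        const std::vector<std::uint32_t>& endOffsets,
                        std::size_t n) {
    assert(n < endOffsets.size());
    const std::size_t start = (n == 0) ? 0 : endOffsets[n - 1];
    const std::size_t end = endOffsets[n];
    return data.substr(start, end - start);
}
```

Holding the decoded index in a vector corresponds to reading the whole index file into memory when the files are opened.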
Fixed field count records:
Records can be recognized if they always contain the same (predetermined) number of fields.

Delineation of fields in a record:
Fixed length fields:
| Field 1 | Field 2 | Field 3 | Field 4 | Field 5 |
Each record is divided into fields of correspondingly equal size.
Different fields within a record can have different sizes.
Different records can have different length fields.
Programs which access the record must know the field lengths.
There is no external overhead for field separation.
There may be internal fragmentation (unused space within fields).

Delimited variable length fields:
Field 1 ! Field 2 ! Field 3 ! Field 4 ! Field 5 !
The fields within a record are followed by a delimiting byte or series of bytes.
Fields within a record can have different sizes.
Different records can have different length fields.
Programs which access the record must know the delimiter.
The delimiter cannot occur within the data.
If used with delimited records, the field delimiter must be different from the record delimiter.
There is external overhead for field separation equal to the size of the delimiter per field.
There should be no internal fragmentation (unused space within fields).
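Packing and unpacking delimited fields can be sketched as a join and a split. The example assumes a single byte delimiter that follows every field, as in the diagram; JoinFields and SplitFields are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build a record in which every field is followed by the delimiter.
std::string JoinFields(const std::vector<std::string>& fields, char delimiter) {
    std::string record;
    for (const std::string& field : fields) {
        // The delimiter cannot occur within the data.
        assert(field.find(delimiter) == std::string::npos);
        record += field;
        record += delimiter;
    }
    return record;
}

// Recover the fields. Text after the last delimiter, if any, is treated as
// an incomplete field and ignored.
std::vector<std::string> SplitFields(const std::string& record, char delimiter) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    std::string::size_type end;
    while ((end = record.find(delimiter, start)) != std::string::npos) {
        fields.push_back(record.substr(start, end - start));
        start = end + 1;
    }
    return fields;
}
```

An empty field costs only the delimiter byte, which is where the saving over fixed length fields comes from.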
Length prefixed variable length fields:
| 12 | Field 1 | 4 | Field 2 | 10 | Field 3 | 8 | Field 4 | 7 | Field 5 |
The fields within a record are prefixed by a length byte or bytes.
Fields within a record can have different sizes.
Different records can have different length fields.
Programs which access the record must know the size and format of the length prefix.
There is external overhead for field separation equal to the size of the length prefix per field.
There should be no internal fragmentation (unused space within fields).
Representing record or field length:
Record or field length can be represented in either binary or character form.
The length can be considered as another hidden field within the record.
This length field can be either fixed length or delimited.
When character form is used, a space can be used to delimit the length field.
A two byte fixed length field could be used to hold lengths of 0 to 65535 bytes in binary form.
A two byte fixed length field could be used to hold lengths of 0 to 99 bytes in decimal character form.
A variable length field delimited by a space could be used to hold effectively any length.
In some languages, such as strict Pascal, it is difficult to mix binary values and character values in the same file.
The C++ language is flexible enough so that the use of either binary or character format is easy.
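The three representations of a length can be compared side by side. This is an illustrative sketch only: the function names are hypothetical, and the binary form assumes the least significant byte is written first.

```cpp
#include <cassert>
#include <string>

// Two-byte binary form: holds 0 to 65535 (low byte first, by assumption).
std::string LengthAsBinary16(unsigned length) {
    assert(length <= 65535u);
    std::string out;
    out += static_cast<char>(length & 0xFFu);
    out += static_cast<char>((length >> 8) & 0xFFu);
    return out;
}

// Two-byte decimal character form: holds 0 to 99.
std::string LengthAsTwoDigits(unsigned length) {
    assert(length <= 99u);
    std::string out;
    out += static_cast<char>('0' + length / 10);
    out += static_cast<char>('0' + length % 10);
    return out;
}

// Variable length character form, delimited by a space: any length.
std::string LengthAsDelimitedText(unsigned long length) {
    return std::to_string(length) + ' ';
}
```

The character forms stay readable in a text editor, while the binary form packs a much larger range into the same two bytes.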
Tagged fields:
Tags, in the form "Keyword=Value", can be used in fields.
Use of tags does not in itself allow separation of fields, which must be done with another method.
Use of tags adds significant space overhead to the file.
Use of tags does add flexibility to the file structure.
Fields can be added without affecting the basic structure of the file.
Tags can be useful when records have sparse fields - that is, when a significant number of the possible attributes are absent.
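Since tags do not separate fields by themselves, a tagged record still needs a delimiter between fields. The sketch below assumes a single byte field delimiter chosen by the caller; ParseTaggedRecord is a hypothetical name.

```cpp
#include <map>
#include <string>

// Hypothetical helper: split a record on the field delimiter, then split
// each field at its first '=' into a keyword and a value. Fields without
// an '=' are skipped.
std::map<std::string, std::string> ParseTaggedRecord(const std::string& record,
                                                     char fieldDelimiter) {
    std::map<std::string, std::string> fields;
    std::string::size_type start = 0;
    while (start < record.size()) {
        std::string::size_type end = record.find(fieldDelimiter, start);
        if (end == std::string::npos) end = record.size();
        const std::string field = record.substr(start, end - start);
        const std::string::size_type eq = field.find('=');
        if (eq != std::string::npos) {
            fields[field.substr(0, eq)] = field.substr(eq + 1);
        }
        start = end + 1;
    }
    return fields;
}
```

A record that omits a keyword simply produces no entry for it, which is what makes tags convenient for sparse records.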
Mixing numbers and characters: use of a file dump
A file dump gives us the ability to look inside a file at the actual bytes that are stored.
Octal Dump: od -xc filename
e.g. The number 40, stored as ASCII characters and as a short integer
Byte order:
The byte order of integers (and floating point numbers) is not the same on all computers.
This is hardware dependent (CPU), not software dependent.
Many computers store numbers as might be expected: 40 (decimal) = 28 (hexadecimal) is stored in a four byte integer as 00 00 00 28.
PCs reverse the byte order, and store numbers with the least significant byte first: 40 (decimal) = 28 (hexadecimal) is stored in a four byte integer as 28 00 00 00.
On most computers, the number 40 would be stored in character form in its ASCII values: 34 30.
IBM mainframe computers use EBCDIC instead of ASCII, and would store "40" as F4 F0.
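The byte order of the machine at hand can be checked by copying an integer into a byte array and printing the bytes in memory order. A minimal sketch; HexDump, BytesOfInt, BytesOfText and HostIsLittleEndian are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Format bytes as two-digit hexadecimal values separated by spaces.
std::string HexDump(const unsigned char* bytes, std::size_t count) {
    std::string out;
    char pair[3];
    for (std::size_t i = 0; i < count; ++i) {
        std::snprintf(pair, sizeof pair, "%02X", static_cast<unsigned>(bytes[i]));
        if (i != 0) out += ' ';
        out += pair;
    }
    return out;
}

// The bytes of a four byte integer in the order they sit in memory.
std::string BytesOfInt(std::uint32_t value) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    return HexDump(bytes, sizeof value);
}

// The bytes of a number written in character form.
std::string BytesOfText(const std::string& text) {
    return HexDump(reinterpret_cast<const unsigned char*>(text.data()),
                   text.size());
}

// True when the least significant byte is stored first.
bool HostIsLittleEndian() {
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

A file format that stores binary integers should fix one byte order and convert on machines that use the other, otherwise files written on one kind of machine are misread on the other.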
8.3 Managing fixed-length, fixed-field buffers
With a fixed-length, fixed-field buffer, instead of writing the size of each field or each record, we can write methods that control the fixed length of each field.
The class FixedLengthBuffer is a subclass of IOBuffer. This class supports both the fixed length and fixed field buffers.
An object of the FixedLengthBuffer class can record the size of each record.
The FixedFieldBuffer class, derived from FixedLengthBuffer, is declared as given below:
    class FixedFieldBuffer : public FixedLengthBuffer {
    public:
        FixedFieldBuffer (int maxFields, int RecordSize = 3000);
        FixedFieldBuffer (int maxFields, int * fieldSize);
        int AddField (int fieldSize);                   // define the next field
        int Pack (const void * field, int size = -1);
        int Unpack (void * field, int maxBytes = -1);
        int NumberOfFields () const;                    // return number of defined fields
    protected:
        int * FieldSize;   // array to hold field sizes
        int maxFields;     // max number of fields
        int NumFields;     // actual number of defined fields
    };
The AddField method is used to specify the size of the field. The total number of fields can be obtained using the NumberOfFields method.
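The idea behind the buffer can be tried out with a simplified stand-alone class. This is only a sketch: SimpleFixedFieldBuffer is hypothetical, it does not derive from FixedLengthBuffer or IOBuffer (whose interfaces are not shown in these notes), it packs strings rather than raw memory, and Rewind and Bytes are additions made for the illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

class SimpleFixedFieldBuffer {
public:
    explicit SimpleFixedFieldBuffer(int maxFields)
        : maxFields_(maxFields), nextField_(0), nextByte_(0) {}

    // Define the next field; fails once maxFields fields are defined.
    bool AddField(int fieldSize) {
        if (fieldSize <= 0 || NumberOfFields() >= maxFields_) return false;
        fieldSizes_.push_back(static_cast<std::size_t>(fieldSize));
        buffer_.append(static_cast<std::size_t>(fieldSize), '\0');
        return true;
    }

    int NumberOfFields() const { return static_cast<int>(fieldSizes_.size()); }

    // Start again at the first field, for example between packing and unpacking.
    void Rewind() { nextField_ = 0; nextByte_ = 0; }

    // Store a value in the next field, padded with '\0' to the field size.
    bool Pack(const std::string& value) {
        if (nextField_ >= fieldSizes_.size()) return false;
        const std::size_t size = fieldSizes_[nextField_];
        if (value.size() > size) return false;
        std::string padded = value;
        padded.resize(size, '\0');
        buffer_.replace(nextByte_, size, padded);
        Advance(size);
        return true;
    }

    // Fetch the next field, dropping the '\0' padding.
    bool Unpack(std::string& value) {
        if (nextField_ >= fieldSizes_.size()) return false;
        const std::size_t size = fieldSizes_[nextField_];
        value = buffer_.substr(nextByte_, size);
        const std::size_t pad = value.find('\0');
        if (pad != std::string::npos) value.erase(pad);
        Advance(size);
        return true;
    }

    // The fixed-length record image that would be written to the file.
    const std::string& Bytes() const { return buffer_; }

private:
    void Advance(std::size_t size) { ++nextField_; nextByte_ += size; }

    int maxFields_;
    std::size_t nextField_;
    std::size_t nextByte_;
    std::vector<std::size_t> fieldSizes_;
    std::string buffer_;
};
```

Because the field sizes are declared once through AddField, no sizes or delimiters need to be stored in the record itself.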
Assignment Questions:
File Structures
1. Explain the fundamental File Processing Operations
   i. Opening files
   ii. Closing files
   iii. Reading and writing file contents
   iv. Special characters in files
2. Discuss the fundamental File Structure Concepts
   i. Field and record organization
   ii. Managing fixed-length, fixed-field buffers