Files are used for long-term retention of large amounts of data, even after the program that created...

Chapter 17 Files and Streams, LINQ*


Data Hierarchy

Files are used for long-term retention of large amounts of data, even after the program that created the data terminates. Data maintained in files is often called persistent data.

The smallest data item that computers support is called a bit, short for “binary digit”: a digit that can assume one of two values (0 or 1).

Digits, letters and special symbols are referred to as characters. Bytes are composed of 8 bits.

C# uses the Unicode® character set (www.unicode.org) in which characters are composed of 2 bytes.

Just as characters are composed of bits, fields are composed of characters. A field is a group of characters that conveys meaning. Typically, a record is composed of several related fields. A file is a group of related records.

To facilitate the retrieval of specific records from a file, at least one field in each record is chosen as a record key, which uniquely identifies a record.

A common file organization is called a sequential file, in which records typically are stored in order by a record-key field.


Files

A file can be seen as:
1. A stream of bytes (no structure), or
2. A collection of records with fields


Random access

Random access lets a program access a specific record without having to retrieve all the records before it.

File structures allowing this:
◦ indexed files
◦ hashed files

To access a record in a file randomly, we need to know the address of the record.


Random files: indexed

An indexed file can have more than one index, each with a different key. For example, records in an employee file can be retrieved based on either social security number or last name. This type of indexed file is usually called an inverted file.
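As a rough in-memory sketch of the idea, the class below keeps one list of records and two indexes over it. The names Employee, IndexedEmployeeFile, FindBySsn and FindByLastName are hypothetical, and a list position stands in for a record's address on disk.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record type; the field names are assumptions for illustration.
public class Employee
{
    public string Ssn { get; set; }
    public string LastName { get; set; }
    public decimal Salary { get; set; }
}

public class IndexedEmployeeFile
{
    // Stands in for the data file: a record's "address" is its position in this list.
    private readonly List<Employee> records = new List<Employee>();

    // Two indexes over the same records, each with a different key.
    private readonly Dictionary<string, int> ssnIndex = new Dictionary<string, int>();
    private readonly Dictionary<string, List<int>> lastNameIndex =
        new Dictionary<string, List<int>>();

    public void Add(Employee e)
    {
        int address = records.Count;
        records.Add(e);
        ssnIndex[e.Ssn] = address; // the social security number is assumed unique

        // Several employees may share a last name, so this index maps to a list.
        if (!lastNameIndex.TryGetValue(e.LastName, out List<int> addresses))
        {
            addresses = new List<int>();
            lastNameIndex[e.LastName] = addresses;
        }
        addresses.Add(address);
    }

    // Returns the matching record, or null if the key is not in the index.
    public Employee FindBySsn(string ssn)
    {
        return ssnIndex.TryGetValue(ssn, out int address) ? records[address] : null;
    }

    public List<Employee> FindByLastName(string lastName)
    {
        var result = new List<Employee>();
        if (lastNameIndex.TryGetValue(lastName, out List<int> addresses))
        {
            foreach (int address in addresses)
                result.Add(records[address]);
        }
        return result;
    }
}

public static class IndexedFileDemo
{
    public static void Main()
    {
        var file = new IndexedEmployeeFile();
        file.Add(new Employee { Ssn = "111111111", LastName = "ADAMS", Salary = 15000m });
        file.Add(new Employee { Ssn = "222222222", LastName = "BAKER", Salary = 25000m });

        Console.WriteLine(file.FindBySsn("222222222").LastName);  // BAKER
        Console.WriteLine(file.FindByLastName("ADAMS").Count);    // 1
    }
}
```

Either index leads to the same stored record, so neither lookup has to read the records that come before it.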


Random files: hashed

A hashed file uses a mathematical function to map a key to a record address. The user gives the key, the function maps the key to the address and passes it to the operating system, and the record is retrieved.


Random files: hashing methods

There are several hashing methods.

Example: Direct hashing

In direct hashing, the key is the data file address without any algorithmic manipulation. The file must therefore contain a record for every possible key. Although situations suitable for direct hashing are limited, it can be very powerful, because it guarantees that there are no synonyms or collisions, which can occur with other methods.


Random files: hashing methods cont

Example: Modulo division (%) hashing

Modulo division hashing (also called division remainder hashing) divides the key by the file size and uses the remainder plus 1 for the address. This gives the simple hashing algorithm address = key % list_size + 1, where list_size is the number of elements in the file. The reason for adding 1 to the result of the mod operation is that the list starts with 1 instead of 0.
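The two hashing methods described so far can be sketched in a few lines of C#. The class name HashingDemo, the method names, the file size 307 and the sample keys are all assumptions chosen for illustration.

```csharp
using System;

public static class HashingDemo
{
    // Direct hashing: the key itself is the address, with no computation.
    public static int DirectHash(int key)
    {
        return key;
    }

    // Modulo-division hashing: the remainder plus 1, giving addresses 1..listSize.
    public static int ModuloHash(int key, int listSize)
    {
        if (listSize <= 0) throw new ArgumentOutOfRangeException(nameof(listSize));
        if (key < 0) throw new ArgumentOutOfRangeException(nameof(key));
        return key % listSize + 1;
    }

    public static void Main()
    {
        const int listSize = 307; // hypothetical number of elements in the file

        Console.WriteLine(ModuloHash(121267, listSize)); // 3
        Console.WriteLine(ModuloHash(121574, listSize)); // 3 (a different key, same address)
        Console.WriteLine(DirectHash(42));               // 42
    }
}
```

The two sample keys differ by exactly 307, so they hash to the same address. Such keys are synonyms and cause a collision, which direct hashing avoids at the cost of reserving a record for every possible key.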


Sequential Files

Records can only be accessed one after another from beginning to end.

Records are stored one after another in auxiliary storage:
◦ disk
◦ tape

An EOF (end-of-file) marker follows the last record.
◦ The operating system has no information about the record addresses; it only knows where the whole file is stored.
◦ The operating system knows that records are sequential.


17.2 Data Hierarchy


Sequential file: Processing records

Pseudo code:

While Not EOF
{
    // Read the next record
    // Process the record
}

Record key: identifies a record to facilitate the retrieval of specific records from a file.
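The read-and-process loop from the pseudo code might look like this in C#, assuming one record per line. The names SequentialRead and ProcessAll are hypothetical, and a StringReader stands in for a real file so the sketch runs without any file on disk.

```csharp
using System;
using System.IO;

public static class SequentialRead
{
    // Reads every record (one per line) from beginning to end and
    // returns how many records were processed.
    public static int ProcessAll(TextReader reader, Action<string> processRecord)
    {
        int count = 0;
        string record;

        // ReadLine returns null at the end of the input, which plays
        // the role of the "While Not EOF" test.
        while ((record = reader.ReadLine()) != null)
        {
            processRecord(record);
            count++;
        }
        return count;
    }

    public static void Main()
    {
        // A StreamReader opened on a file could be passed in the same way.
        using (var reader = new StringReader("100,Nancy,Brown\n200,Stacey,Dunn\n"))
        {
            int n = ProcessAll(reader, r => Console.WriteLine("Processing " + r));
            Console.WriteLine(n + " records");
        }
    }
}
```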


Sequential file: Applications

Sequential files are used by applications that need to access all records from beginning to end.
◦ Ex.: Personal information

Because each record is processed, sequential access is more efficient and easier than random access.


Sequential vs Random (https://technet.microsoft.com/en-us/library/cc938619.aspx)

Comparing random versus sequential operations is one way of assessing application efficiency in terms of disk use.

Accessing data sequentially is much faster than accessing it randomly because of the way in which the disk hardware works. The seek operation, which occurs when the disk head positions itself at the right disk cylinder to access data requested, takes more time than any other part of the I/O process.

Because reading randomly involves a higher number of seek operations than does sequential reading, random reads deliver a lower rate of throughput. The same is true for random writing. You might find it useful to examine your workload to determine whether it accesses data randomly or sequentially. If you find disk access is predominantly random, you might want to pay particular attention to the activities being done and monitor for the emergence of a bottleneck.

For workloads of either random or sequential I/O, use drives with faster rotational speeds. For workloads that are predominantly random I/O, use a drive with faster seek time.


Sequential files: updating

Sequential files must be updated periodically to reflect changes in information.

All of the records need to be checked and updated (if necessary) sequentially. The update involves:
◦ new master file
◦ old master file
◦ transaction file (contains the changes to be applied to the master file: add, delete, change)
◦ report message

[Figure: UPDATE PROGRAM, with the OLD MASTER and TRANSACTION files as input and the NEW MASTER file and ERROR MESSAGES as output]


Example: Sequential Update with Data Files

ERROR MESSAGES:

NO MATCH 500000000
DUPLICATE ADDITION 888888888

OLD MASTER FILE:

111111111  ADAMS            015000  NEW YORK
222222222  BAKER            025000  NEW YORK
333333333  ZIDROW           008000  NEW YORK
444444444  MILGROM          040000  BOSTON
555555555  BENJAMIN         100000  CHICAGO
666666666  SHERRY           007500  CHICAGO
777777777  BOROW            017500  BOSTON
888888888  JAMES            050000  NEW YORK
999999999  RENAZEV          030000  NEW YORK

TRANSACTION FILE (last column: A = add, C = change, D = delete):

222222222                   028000            C
222222222                           BOSTON    C
400000000  NEW EMPLOYEE     016000  BOSTON    A
500000000                   020000            C
610000000  NEW EMPLOYEE II  018000  CHICAGO   A
610000000                           NEW YORK  C
666666666  SHERRY                             D
777777777                   055000            C
888888888  JAMES            017500  NEW YORK  A

NEW MASTER FILE:

111111111  ADAMS            015000  NEW YORK
222222222  BAKER            028000  BOSTON
333333333  ZIDROW           008000  NEW YORK
400000000  NEW EMPLOYEE     016000  BOSTON
444444444  MILGROM          040000  BOSTON
555555555  BENJAMIN         100000  CHICAGO
610000000  NEW EMPLOYEE II  018000  NEW YORK
777777777  BOROW            055000  NEW YORK
888888888  JAMES            050000  NEW YORK
999999999  RENAZEV          030000  NEW YORK
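One way to sketch such an update program is a single pass that walks the old master and the transactions in key order. This is an assumed implementation, not the program behind the slide: the names Rec, Trans and SequentialUpdate are hypothetical, both inputs are assumed sorted by key, and an empty field in a change transaction is taken to mean "leave unchanged". The demo uses a subset of the records above.

```csharp
using System;
using System.Collections.Generic;

public class Rec
{
    public string Key, Name, Salary, Location;
    public Rec(string key, string name, string salary, string location)
    {
        Key = key; Name = name; Salary = salary; Location = location;
    }
}

public class Trans : Rec
{
    public char Code; // 'A' = add, 'C' = change, 'D' = delete
    public Trans(string key, string name, string salary, string location, char code)
        : base(key, name, salary, location) { Code = code; }
}

public static class SequentialUpdate
{
    public static List<Rec> Update(List<Rec> oldMaster, List<Trans> transactions,
                                   List<string> errors)
    {
        var newMaster = new List<Rec>();
        int m = 0, t = 0;
        while (m < oldMaster.Count || t < transactions.Count)
        {
            // The next key to handle is the smaller of the two current keys.
            string key;
            if (t >= transactions.Count) key = oldMaster[m].Key;
            else if (m >= oldMaster.Count) key = transactions[t].Key;
            else key = string.CompareOrdinal(oldMaster[m].Key, transactions[t].Key) <= 0
                ? oldMaster[m].Key : transactions[t].Key;

            // Copy the old master record for this key, if there is one.
            Rec current = null;
            if (m < oldMaster.Count && oldMaster[m].Key == key)
            {
                Rec old = oldMaster[m++];
                current = new Rec(old.Key, old.Name, old.Salary, old.Location);
            }

            // Apply every transaction that carries this key, in file order.
            while (t < transactions.Count && transactions[t].Key == key)
            {
                Trans tr = transactions[t++];
                if (tr.Code == 'A')
                {
                    if (current != null) errors.Add("DUPLICATE ADDITION " + key);
                    else current = new Rec(tr.Key, tr.Name, tr.Salary, tr.Location);
                }
                else if (current == null)
                {
                    errors.Add("NO MATCH " + key); // change or delete without a record
                }
                else if (tr.Code == 'D')
                {
                    current = null;
                }
                else // 'C': copy only the fields the transaction supplies
                {
                    if (tr.Name != "") current.Name = tr.Name;
                    if (tr.Salary != "") current.Salary = tr.Salary;
                    if (tr.Location != "") current.Location = tr.Location;
                }
            }
            if (current != null) newMaster.Add(current);
        }
        return newMaster;
    }

    public static void Main()
    {
        var oldMaster = new List<Rec> {
            new Rec("222222222", "BAKER", "025000", "NEW YORK"),
            new Rec("666666666", "SHERRY", "007500", "CHICAGO"),
            new Rec("888888888", "JAMES", "050000", "NEW YORK") };
        var transactions = new List<Trans> {
            new Trans("222222222", "", "028000", "", 'C'),
            new Trans("222222222", "", "", "BOSTON", 'C'),
            new Trans("400000000", "NEW EMPLOYEE", "016000", "BOSTON", 'A'),
            new Trans("500000000", "", "020000", "", 'C'),
            new Trans("666666666", "SHERRY", "", "", 'D'),
            new Trans("888888888", "JAMES", "017500", "NEW YORK", 'A') };
        var errors = new List<string>();

        foreach (Rec r in Update(oldMaster, transactions, errors))
            Console.WriteLine(r.Key + " " + r.Name + " " + r.Salary + " " + r.Location);
        foreach (string e in errors)
            Console.WriteLine(e);
    }
}
```

Because both files are read strictly front to back, every old master record is visited exactly once, which is the access pattern that sequential files support.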


Text files

A text file is a file of characters. It cannot contain integers, floating-point numbers, or any other data structures in their internal memory format. To store these data types, they must be converted to their character equivalent formats.

Some files can only use character data types. Most notable are the file streams (input/output objects in object-oriented languages such as C#, C++ and Java) for keyboards, monitors and printers. This is why we need special functions to format data that is input from or output to these devices.


Binary files

A binary file is a collection of data stored in the internal format of the computer. In this definition, data can be an integer (including other data types represented as unsigned integers, such as image, audio, or video), a floating-point number or any other structured data (except a file).

Unlike text files, binary files contain data that is meaningful only if it is properly interpreted by a program. If the data is textual, one byte is used to represent one character (in ASCII encoding). But if the data is numeric, two or more bytes are considered a data item.
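To make the difference concrete, the sketch below produces the bytes that a text file and a binary file would hold for the same int. The class and method names are hypothetical; BinaryWriter is used as one standard way to write a value in its internal format.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

public static class TextVersusBinary
{
    // Text form: the value is converted to characters, one ASCII byte per digit.
    public static byte[] AsText(int value)
    {
        return Encoding.ASCII.GetBytes(value.ToString(CultureInfo.InvariantCulture));
    }

    // Binary form: the 4-byte internal representation of an int.
    public static byte[] AsBinary(int value)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(value);
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static void Main()
    {
        int value = 1234567;
        Console.WriteLine(BitConverter.ToString(AsText(value)));   // 31-32-33-34-35-36-37
        Console.WriteLine(BitConverter.ToString(AsBinary(value))); // 87-D6-12-00
    }
}
```

The text form takes seven bytes that a person can read as digits. The binary form always takes four bytes, which only make sense to a program that knows they represent an int.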


17.3 Files and Streams

C# views each file as a sequential stream of bytes.

When a console application executes, the runtime environment creates the Console.Out, Console.In and Console.Error streams:
◦ Console.In refers to the standard input stream object, which enables a program to input data from the keyboard.
◦ Console.Out refers to the standard output stream object, which enables a program to output data to the screen.
◦ Console.Error refers to the standard error stream object, which enables a program to output error messages to the screen.

Console methods Write and WriteLine use Console.Out to perform output.

Console methods Read and ReadLine use Console.In to perform input.
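A small sketch of the three console streams in use follows. The class name ConsoleStreamsDemo and the helper Report are hypothetical; Report takes TextWriter parameters so that Console.Out and Console.Error can be passed in explicitly.

```csharp
using System;
using System.IO;

public static class ConsoleStreamsDemo
{
    // Writes a normal message to 'output' or a diagnostic message to 'error'.
    public static void Report(TextWriter output, TextWriter error, string name)
    {
        if (string.IsNullOrWhiteSpace(name))
            error.WriteLine("Error: no name was entered.");
        else
            output.WriteLine("Hello, " + name + "!");
    }

    public static void Main()
    {
        Console.Write("Enter your name: ");   // Write uses Console.Out
        string name = Console.ReadLine();     // ReadLine uses Console.In
        Report(Console.Out, Console.Error, name);
    }
}
```

Both messages normally appear on the screen, but because they travel on separate streams, the output and the error messages can be redirected independently.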


17.4 Classes File and Directory


Class Directory provides capabilities for manipulating directories

The DirectoryInfo object returned by method CreateDirectory contains information about a directory
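A brief sketch of CreateDirectory and the DirectoryInfo it returns is shown below. The helper CreateUnder and the directory name are hypothetical, and the demo works in the system temporary folder so that it cleans up after itself.

```csharp
using System;
using System.IO;

public static class DirectoryDemo
{
    // Creates 'name' under 'parent' and returns information about it.
    public static DirectoryInfo CreateUnder(string parent, string name)
    {
        string path = Path.Combine(parent, name);

        // CreateDirectory does nothing if the directory already exists;
        // either way it returns a DirectoryInfo describing the directory.
        return Directory.CreateDirectory(path);
    }

    public static void Main()
    {
        string name = "ch17-demo-" + Guid.NewGuid().ToString("N"); // hypothetical name
        DirectoryInfo info = CreateUnder(Path.GetTempPath(), name);

        Console.WriteLine(info.Name);                        // the directory's own name
        Console.WriteLine(info.FullName);                    // its full path
        Console.WriteLine(info.CreationTime);                // when it was created
        Console.WriteLine(Directory.Exists(info.FullName));  // True

        Directory.Delete(info.FullName);                     // remove the empty directory
    }
}
```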


Using (from msdn)

Defines a scope, outside of which an object or objects will be disposed.

C#, through the .NET Framework common language runtime (CLR), automatically releases the memory used to store objects that are no longer required. The release of memory is non-deterministic; memory is released whenever the CLR decides to perform garbage collection. However, it is usually best to release limited resources such as file handles and network connections as quickly as possible.

The using statement allows the programmer to specify when objects that use resources should release them. The object provided to the using statement must implement the IDisposable interface. This interface provides the Dispose  method, which should release the object's resources.

A using statement can be exited either when the end of the using statement is reached or if an exception is thrown and control leaves the statement block before the end of the statement.
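The exception case can be checked with a small sketch. TrackedResource is a hypothetical class that only records whether Dispose has run; real candidates for a using statement are objects such as FileStream or StreamWriter.

```csharp
using System;

// Hypothetical resource type that records when it is disposed.
public class TrackedResource : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        Disposed = true;
        Console.WriteLine("Resource released");
    }
}

public static class UsingDemo
{
    // Throws inside a using block and returns the resource afterwards
    // so that the caller can inspect it.
    public static TrackedResource UseAndFail()
    {
        var resource = new TrackedResource();
        try
        {
            using (resource)
            {
                throw new InvalidOperationException("something went wrong");
            }
        }
        catch (InvalidOperationException)
        {
            // By the time control reaches this handler, Dispose has already run.
        }
        return resource;
    }

    public static void Main()
    {
        Console.WriteLine(UseAndFail().Disposed); // True
    }
}
```

The program prints "Resource released" before "True", showing that the resource was released as control left the block, not later at garbage collection.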


map


Revisit LINQ*

using System;
using System.Linq;

class IntroToLINQ
{
    static void Main()
    {
        // The Three Parts of a LINQ Query:
        // 1. Data source.
        int[] numbers = new int[7] { 0, 1, 2, 3, 4, 5, 6 };

        // 2. Query creation.
        // numQuery is an IEnumerable<int>
        var numQuery =
            from num in numbers
            where (num % 2) == 0
            select num;

        // 3. Query execution.
        foreach (int num in numQuery)
        {
            Console.Write("{0,1} ", num);
        }
    }
}


Revisit LINQ cont

In LINQ the execution of the query is distinct from the query itself.

No data is retrieved by creating a query variable.

Create a data source from an XML document:

// using System.Xml.Linq;
XElement contacts = XElement.Load(@"c:\myContactList.xml");

Create a data source from a database (LINQ to SQL):

Northwnd db = new Northwnd(@"c:\northwnd.mdf");

// Query for customers in London.
IQueryable<Customer> custQuery =
    from cust in db.Customers
    where cust.City == "London"
    select cust;


Revisit LINQ execution

Queries that perform aggregation functions over a range of source elements must first iterate over those elements. Examples of such queries are Count, Max, Average, and First. These execute without an explicit foreach statement because the query itself must use foreach in order to return a result.

Note also that these types of queries return a single value, not an IEnumerable collection.

var evenNumQuery =
    from num in numbers
    where (num % 2) == 0
    select num;

int evenNumCount = evenNumQuery.Count();


Revisit LINQ forced execution

To force immediate execution of any query and cache its results, call the ToList<TSource> or ToArray<TSource> methods.

List<int> numQuery2 =
    (from num in numbers
     where (num % 2) == 0
     select num).ToList();

or

var numQuery3 =
    (from num in numbers
     where (num % 2) == 0
     select num).ToArray();
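The practical difference between a deferred query and a cached result shows up when the data source changes after the query is created. The sketch below is illustrative only; the class name DeferredExecutionDemo and the method Run are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns { count seen by the deferred query, count held in the cached list }.
    public static int[] Run()
    {
        var numbers = new List<int> { 0, 1, 2, 3, 4, 5, 6 };

        // Creating the query retrieves no data yet.
        IEnumerable<int> evens =
            from num in numbers
            where (num % 2) == 0
            select num;

        // ToList executes the query now and caches 0, 2, 4, 6.
        List<int> cached = evens.ToList();

        // Change the data source after the query was created.
        numbers.Add(8);

        // Count executes the deferred query again, so it sees the new element.
        return new[] { evens.Count(), cached.Count };
    }

    public static void Main()
    {
        int[] counts = Run();
        Console.WriteLine(counts[0]); // 5
        Console.WriteLine(counts[1]); // 4
    }
}
```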


17.5 Creating a Sequential-Access Text File

Step 1. Build REUSABLE dll in BankLibrary.sln. Record simulator


CreateFileForm Class

SaveFileDialog is used for selecting files.

The constant FileMode.OpenOrCreate indicates that the FileStream should open the file if it exists or create the file if it does not.

To preserve the original contents of a file, use FileMode.Append.

The constant FileAccess.Write indicates that the program can perform only write operations with the FileStream object.

There are two other FileAccess constants—FileAccess.Read for read-only access and FileAccess.ReadWrite for both read and write access.

An IOException is thrown if there is a problem opening the file or creating the StreamWriter.

StreamWriter method WriteLine writes a sequence of characters to a file.

The StreamWriter object is constructed with a FileStream argument that specifies the file to which the StreamWriter will output text.

Method Close throws an IOException if the file or stream cannot be closed properly.
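Leaving out the form and the SaveFileDialog, the file-writing part might be sketched as follows. The class name CreateTextFile, the method WriteRecords and the temporary file location are assumptions; the FileStream and StreamWriter calls are the ones described above, wrapped in using statements instead of explicit Close calls.

```csharp
using System;
using System.IO;

public static class CreateTextFile
{
    // Writes one record per line to the file at 'path'.
    public static void WriteRecords(string path, string[] records)
    {
        // OpenOrCreate opens the file if it exists or creates it if it does not.
        // It does not clear an existing file, so writing starts at the beginning
        // and any old bytes beyond the new data remain.
        using (var output = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
        using (var writer = new StreamWriter(output))
        {
            foreach (string record in records)
                writer.WriteLine(record);
        } // the writer and the stream are closed here, even if an exception occurs
    }

    public static void Main()
    {
        // Hypothetical location; the chapter's program asks the user for it instead.
        string path = Path.Combine(Path.GetTempPath(), "clients-demo.txt");
        try
        {
            WriteRecords(path, new[] { "100,Nancy,Brown,-25.54", "200,Stacey,Dunn,314.33" });
            Console.WriteLine("Wrote " + path);
        }
        catch (IOException ex)
        {
            Console.Error.WriteLine("Error writing file: " + ex.Message);
        }
    }
}
```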


Same as “”


Must be a Key


Select where to save first, then enter data


Clients.txt

100,Nancy,Brown,-25.54
200,Stacey,Dunn,314.33
300,Doug,Barker,0.00
400,Dave,Smith,258.34
500,Sam,Stone,34.98


17.6 Reading Data from a Sequential-Access Text File


ReadSequentialAccessFileForm

OpenFileDialog is used to open a file.

The behavior and GUI for the Save and Open dialog types are identical, except that Save is replaced by Open.

Specify read-only access to a file by passing constant FileAccess.Read as the third argument to the FileStream constructor.

StreamReader method ReadLine reads the next line from the file.
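Without the form and the OpenFileDialog, the reading side might be sketched like this. The class name ReadTextFile and the method ReadRecords are hypothetical, and the comma-separated layout is assumed to match the Clients.txt records shown earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

public static class ReadTextFile
{
    // Reads every line of the file and splits it into its comma-separated fields.
    public static List<string[]> ReadRecords(string path)
    {
        var records = new List<string[]>();

        // FileAccess.Read as the third argument requests read-only access.
        using (var input = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(input))
        {
            string line;
            while ((line = reader.ReadLine()) != null) // null marks the end of the file
                records.Add(line.Split(','));
        }
        return records;
    }

    public static void Main()
    {
        // Create a small sample file so the sketch is self-contained.
        string path = Path.Combine(Path.GetTempPath(), "clients-read-demo.txt");
        File.WriteAllLines(path, new[] { "100,Nancy,Brown,-25.54", "200,Stacey,Dunn,314.33" });

        foreach (string[] fields in ReadRecords(path))
        {
            decimal balance = decimal.Parse(fields[3], CultureInfo.InvariantCulture);
            Console.WriteLine(fields[0] + " " + fields[1] + " " + fields[2] + " " + balance);
        }
        File.Delete(path);
    }
}
```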


File Position

• A FileStream object can reposition its file-position pointer to any position in the file.

• When a FileStream object is opened, its file-position pointer is set to byte position 0.

• You can use StreamReader property BaseStream to invoke the Seek method of the underlying FileStream to reset the file-position pointer back to the beginning of the file.

• Exercise: Add a Start/Beginning button to the ReadSequentialAccessFileForm program to go to the very first record.
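A possible starting point for repositioning, shown as a console sketch and not as the form-based solution, is given below. The names RewindDemo, Rewind and ReadFirstLineTwice are hypothetical. The call to DiscardBufferedData is needed because StreamReader keeps its own buffer, which would otherwise still hold data read before the seek.

```csharp
using System;
using System.IO;

public static class RewindDemo
{
    // Moves the reader back to the first record of the underlying file.
    public static void Rewind(StreamReader reader)
    {
        reader.BaseStream.Seek(0, SeekOrigin.Begin); // file-position pointer to byte 0
        reader.DiscardBufferedData();                // drop characters buffered before the seek
    }

    // Reads two lines, rewinds, and reads the first line again.
    public static string ReadFirstLineTwice(string path)
    {
        using (var input = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(input))
        {
            string first = reader.ReadLine();
            reader.ReadLine(); // advance past the second record
            Rewind(reader);
            string again = reader.ReadLine();
            return first + "|" + again;
        }
    }

    public static void Main()
    {
        // Create a small sample file so the sketch is self-contained.
        string path = Path.Combine(Path.GetTempPath(), "clients-rewind-demo.txt");
        File.WriteAllLines(path, new[] { "100,Nancy,Brown,-25.54", "200,Stacey,Dunn,314.33" });

        Console.WriteLine(ReadFirstLineTwice(path));
        File.Delete(path);
    }
}
```

In the form-based program, the body of Rewind is roughly what a Start/Beginning button's event handler would run before displaying the next record.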


17.7 Case Study: Credit Inquiry Program


filter


Serialization

Class BinaryFormatter enables entire objects to be written to or read from a stream.

BinaryFormatter method Serialize writes an object’s representation to a file.

BinaryFormatter method Deserialize reads this representation from a file and reconstructs the original object.

Both methods throw a SerializationException if an error occurs during serialization or deserialization.


Serialization (writing to a file)

• Method Serialize takes the FileStream object as the first argument so that the BinaryFormatter can write its second argument to the correct file.

• Remember that we are now using binary files, which are not human readable.


17.9 Creating a Sequential-Access File Using Object Serialization


Deserialization (reading from a file)

• Deserialize returns a reference of type object.

• If an error occurs during deserialization, a SerializationException is thrown.


17.10 Reading and Deserializing Data from a Binary File


Clients.bin

100,Nancy,Brown,-25.54
200,Stacey,Dunn,314.33
300,Doug,Barker,0.00
400,Dave,Smith,258.34
500,Sam,Stone,34.98
