Comp 335 File Structures Fundamental File Structure Concepts.

Post on 19-Jan-2018


Comp 335 File Structures

Fundamental File Structure Concepts

File Organization

File Organization is how the data is organized in the file.

How the data is written to file must be considered carefully, because this dictates how the data is read back in.

Example of Data saved to File

Assume a programmer writes all data to file by using strings.

Data to be saved on file: Towns and Populations

Searcy      15000
Bald Knob   3500
Romance     950

Example of Data saved to File

When saved on file:

Searcy15000Bald Knob3500Romance950

Considerations when Writing Data to File

Must keep the “integrity” of the individual units of data (fields) which we wrote.

Group logical units of data together in records.

Within each record, organize the data on file in a way that will maintain “field separation”. In other words, write it in a way where the data can be recaptured.

Common Field Structures

Force fields to have a predictable length.

Begin each field with a length indicator.

Place a delimiter at the end of each field to separate it from the next.

Use a “keyword = value” expression to identify each field and its contents.

Fields with a predictable length

Data to be saved on file: Towns and Populations

Searcy      15000
Bald Knob   3500
Romance     950

Assume that: Towns (char [12]) and Population (char [7])

When written to file:

Searcy      15000  Bald Knob   3500   Romance     950

Fields with a predictable length

A good method if all of the data to be stored is fixed in length.

What if the data to be stored is variable in length?

A lot of space is wasted on padding.
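A minimal sketch of fixed-length fields, using the widths from the slide (12 bytes for the town, 7 for the population). The function names, the choice of space padding, and left justification are assumptions made for illustration; the slides do not specify them.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>

// Field widths taken from the slide: Towns is char[12], Population is char[7].
constexpr std::size_t kTownWidth = 12;
constexpr std::size_t kPopWidth = 7;

// Write `value` left-justified and padded with spaces to exactly `width` bytes.
// (Hypothetical helper; space padding is an assumption of this sketch.)
void writeFixedField(std::ostream& out, const std::string& value, std::size_t width) {
    assert(value.size() <= width && "value does not fit in the fixed-length field");
    std::string padded = value;
    padded.resize(width, ' ');
    out.write(padded.data(), static_cast<std::streamsize>(width));
}

// Read exactly `width` bytes, then strip the trailing padding.
std::string readFixedField(std::istream& in, std::size_t width) {
    std::string buf(width, ' ');
    in.read(&buf[0], static_cast<std::streamsize>(width));
    std::size_t last = buf.find_last_not_of(' ');
    return last == std::string::npos ? std::string() : buf.substr(0, last + 1);
}
```

Both functions take plain streams, so they work the same on a std::fstream or an in-memory std::stringstream. Writing "Bald Knob" and "3500" this way always takes 19 bytes, of which 6 are padding.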

Fields with a length indicator

Data to be saved on file: Towns and Populations

Searcy      15000
Bald Knob   3500
Romance     950

Assume that: Towns (char [12]) and Population (char [7])

When written to file:

6Searcy5150009Bald Knob435007Romance3950

Fields with a length indicator

The length indicator tells how many bytes to read.

How many bytes should you use for the length indicator?

1 byte (field size max = 255)

2 bytes (field size max = 65535)

This method should save space if the data is quite variable in length.

In this case, the file mixes binary data (the length bytes) with text.
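A minimal sketch of length-indicated fields with a 1-byte binary length, so a field can hold at most 255 bytes. The function names are hypothetical, and the sketch assumes the stream is opened in binary mode when a real file is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>

// Write one field as a 1-byte binary length followed by the field's bytes.
void writeLengthPrefixed(std::ostream& out, const std::string& value) {
    assert(value.size() <= 255 && "a 1-byte length indicator caps the field at 255 bytes");
    out.put(static_cast<char>(static_cast<std::uint8_t>(value.size())));
    out.write(value.data(), static_cast<std::streamsize>(value.size()));
}

// Read the length byte, then read exactly that many bytes.
// Returns false when no complete field could be read.
bool readLengthPrefixed(std::istream& in, std::string& value) {
    char lenByte;
    if (!in.get(lenByte)) return false;
    std::size_t len = static_cast<std::uint8_t>(lenByte);
    value.assign(len, '\0');
    in.read(&value[0], static_cast<std::streamsize>(len));
    return in.gcount() == static_cast<std::streamsize>(len);
}
```

Writing "Searcy" and "15000" this way takes 13 bytes (1 + 6 + 1 + 5). The length bytes are raw binary values, not the printable digits shown on the slide, which is why the file is no longer pure text.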

Fields separated by delimiters

Data to be saved on file: Towns and Populations

Searcy      15000
Bald Knob   3500
Romance     950

Assume that: Towns (char [12]) and Population (char [7])

When written to file:

Searcy|15000|Bald Knob|3500|Romance|950

Fields separated by delimiters

Could possibly save more space.

Delimiter choice must not be part of valid data.

Language must provide instructions to read data based on a sentinel value.

In C++, getline is overloaded to be able to handle this.
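A minimal sketch of delimited fields, using the three-argument form of std::getline that takes a delimiter character. The '|' delimiter matches the slide; the function names are hypothetical.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

constexpr char kDelim = '|';  // assumed never to occur inside a field

// Write one field followed by the delimiter.
void writeDelimited(std::ostream& out, const std::string& value) {
    assert(value.find(kDelim) == std::string::npos && "delimiter must not be part of valid data");
    out << value << kDelim;
}

// Read every field up to end of input. std::getline stops at each delimiter
// and discards it; a final field without a trailing delimiter is still returned.
std::vector<std::string> readDelimited(std::istream& in) {
    std::vector<std::string> fields;
    std::string field;
    while (std::getline(in, field, kDelim)) {
        fields.push_back(field);
    }
    return fields;
}
```

Reading the line "Searcy|15000|Bald Knob|3500|Romance|950" with readDelimited yields six fields. Note that "Bald Knob" comes back intact, because only '|' ends a field, not the space.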

Fields separated by “keyword = value”

Data to be saved on file: Towns and Populations

Searcy      15000
Bald Knob   3500
Romance     950

Assume that: Towns (char [12]) and Population (char [7])

When written to file:

TOWN=Searcy|POP=15000|TOWN=Bald Knob|POP=3500|TOWN=Romance|POP=950

Fields separated by “keyword = value”

This can waste a lot of space in the file, because the keyword is stored again with every field.

It is a good technique if some fields are not used at times within records.

It also is good if you just want to save a lot of information on file and not organize the data within records.
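A minimal sketch of writing and parsing “keyword = value” fields. It assumes '|' separates the fields and '=' separates keyword from value, as in the slide's example; the function names and the decision to skip malformed items are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, std::string>;

// Write one field as KEYWORD=value followed by '|'.
void writeKeywordField(std::ostream& out, const std::string& keyword, const std::string& value) {
    assert(keyword.find_first_of("=|") == std::string::npos && "keyword must not contain '=' or '|'");
    assert(value.find('|') == std::string::npos && "value must not contain the delimiter");
    out << keyword << '=' << value << '|';
}

// Split the input on '|', then split each item at its first '='.
// Items without '=' are skipped.
std::vector<KeyValue> parseKeywordFields(std::istream& in) {
    std::vector<KeyValue> result;
    std::string item;
    while (std::getline(in, item, '|')) {
        std::size_t eq = item.find('=');
        if (eq == std::string::npos) continue;
        result.emplace_back(item.substr(0, eq), item.substr(eq + 1));
    }
    return result;
}
```

Because every field names itself, a record may simply leave a field out: a TOWN with no POP after it still parses, and the reader can tell which field is missing.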

Record Organization

Fields can be combined to form a record.

An entire record can be read in at a time into a buffer and then fields can be parsed out.

This is common because most of the time we want to read and write records, not individual fields.

Fixed-Length Records

A frequently utilized method for file organization.

This can mean that each field must be fixed length, but it does not have to.

The record could be just a fixed-size “container” to store a variable number of variable length fields.

Fixed-Length Records

Data to be saved on file: Towns and Populations

Searcy      15000
Bald Knob   3500
Romance     950

Assume that: Towns (char [12]) and Population (char [7]).

These fields are combined in a 19 byte record.

When written to file:

Searcy      15000  Bald Knob   3500   Romance     950

Fixed-Length Records

Makes DIRECT ACCESS to records feasible: record n (counting from 0) starts at byte n × record length, which helps reduce seeks!

Space could be wasted if the fields within the record are highly variable.
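A minimal sketch of direct access to fixed-length records, using the 19 byte record from the slide (12 bytes of town plus 7 of population). The struct, the helper names, and the space padding are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>

constexpr std::size_t kTownWidth = 12;
constexpr std::size_t kPopWidth = 7;
constexpr std::size_t kRecordSize = kTownWidth + kPopWidth;  // 19 bytes

struct TownRecord {
    std::string town;
    std::string population;
};

// Pad a value with spaces to its fixed width (hypothetical helper).
std::string pad(const std::string& value, std::size_t width) {
    assert(value.size() <= width && "value does not fit in its field");
    std::string padded = value;
    padded.resize(width, ' ');
    return padded;
}

// Strip trailing padding (hypothetical helper).
std::string unpad(const std::string& field) {
    std::size_t last = field.find_last_not_of(' ');
    return last == std::string::npos ? std::string() : field.substr(0, last + 1);
}

// Append one 19-byte record at the current write position.
void writeRecord(std::ostream& out, const TownRecord& r) {
    out << pad(r.town, kTownWidth) << pad(r.population, kPopWidth);
}

// Direct access: record n starts at byte n * kRecordSize, so one seek reaches it.
bool readRecordAt(std::istream& in, std::size_t n, TownRecord& r) {
    in.clear();  // forget any earlier end-of-file or failure state
    in.seekg(static_cast<std::streamoff>(n * kRecordSize), std::ios::beg);
    char buf[kRecordSize];
    if (!in.read(buf, static_cast<std::streamsize>(kRecordSize))) return false;
    r.town = unpad(std::string(buf, kTownWidth));
    r.population = unpad(std::string(buf + kTownWidth, kPopWidth));
    return true;
}
```

With the three towns written in order, readRecordAt(file, 2, r) jumps straight to byte 38 and returns Romance without reading the two records before it.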

Variable Length Records

Store just the data within the records, no wasted space.

Sequential access is needed to get to each record.

Typically a length indicator is given at the beginning of the record.

It can be combined with “field integrity” techniques.

Variable Length Records

Data to be saved on file: Towns and Populations

Searcy      15000
Bald Knob   3500
Romance     950

Assume that: Towns (char [12]) and Population (char [7]).

These fields are combined in a 19 byte record.

When written to file:

13Searcy|15000|15Bald Knob|3500|12Romance|950|
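A minimal sketch of variable length records that reproduces the layout above: a record length, then the fields, each ended by '|'. It assumes the length is stored as exactly two ASCII digits, which limits a record to 99 bytes; the slides do not say how the length is encoded, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Write one record: 2-digit ASCII length, then each field followed by '|'.
void writeVarRecord(std::ostream& out, const std::vector<std::string>& fields) {
    std::string body;
    for (const std::string& f : fields) {
        body += f;
        body += '|';
    }
    assert(body.size() <= 99 && "a 2-digit length indicator caps the record at 99 bytes");
    out << static_cast<char>('0' + body.size() / 10)
        << static_cast<char>('0' + body.size() % 10)
        << body;
}

// Read the 2-digit length, read that many bytes into a buffer,
// then parse the fields out of the buffer.
bool readVarRecord(std::istream& in, std::vector<std::string>& fields) {
    char d[2];
    if (!in.read(d, 2)) return false;
    if (!std::isdigit(static_cast<unsigned char>(d[0])) ||
        !std::isdigit(static_cast<unsigned char>(d[1]))) return false;
    std::size_t len = static_cast<std::size_t>(d[0] - '0') * 10 + static_cast<std::size_t>(d[1] - '0');
    std::string body(len, '\0');
    if (!in.read(&body[0], static_cast<std::streamsize>(len))) return false;
    fields.clear();
    std::istringstream parts(body);
    std::string field;
    while (std::getline(parts, field, '|')) fields.push_back(field);
    return true;
}
```

The length lets the reader pull a whole record into a buffer with one read, or skip it without looking at its fields. Getting to record n still means stepping over the n records before it.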

Variable Length Records

To improve access to records (which will minimize seeks), an index can be used which can store the offsets of each variable length record in the file.

Variable Length Records

Data to be saved on file: Towns and Populations

Searcy      15000
Bald Knob   3500
Romance     950

When written to file:

Searcy|15000|Bald Knob|3500|Romance|950|

Index of Offsets

0 13 28 40

Variable Length Records

To obtain direct access to variable length records, each offset address can be associated with a key which uniquely identifies each record.

The index can be searched for the key; once the address is found, the record can be accessed directly.
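A minimal sketch of a keyed index over the delimited file shown earlier. It assumes every record is exactly two '|'-terminated fields (town, then population) and that the town name is the unique key; std::map stands in for the index, and the function names are hypothetical.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

using OffsetIndex = std::map<std::string, std::streamoff>;

// Scan the file once, recording the byte offset where each record starts.
OffsetIndex buildIndex(std::istream& in) {
    OffsetIndex index;
    in.clear();
    in.seekg(0, std::ios::beg);
    std::string town, population;
    while (true) {
        std::streamoff start = in.tellg();
        assert(start >= 0 && "stream position must be available");
        if (!std::getline(in, town, '|') || !std::getline(in, population, '|')) break;
        index[town] = start;
    }
    return index;
}

// Search the index for the key, seek to the stored offset, and read that record.
bool lookupPopulation(std::istream& in, const OffsetIndex& index,
                      const std::string& key, std::string& population) {
    OffsetIndex::const_iterator it = index.find(key);
    if (it == index.end()) return false;
    in.clear();  // forget any earlier end-of-file state
    in.seekg(it->second, std::ios::beg);
    std::string town;
    return std::getline(in, town, '|') && std::getline(in, population, '|') && town == key;
}
```

For the file "Searcy|15000|Bald Knob|3500|Romance|950|" the index maps Searcy to 0, Bald Knob to 13, and Romance to 28, the first three offsets listed on the slide. The fourth, 40, is the end of the file, where the next record would start.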