Advanced Database Management System Using Oracle 10g: Moving Data

Page 1: moving data.pdf

Advanced Database Management System Using Oracle 10g

Moving Data

Page 2: moving data.pdf

Moving Data : General Architecture

Page 3: moving data.pdf

Moving Data : General Architecture

Page 4: moving data.pdf

Directory Object : Overview

Page 5: moving data.pdf

Directory Object : Overview [cont…]

Directory objects are logical structures that represent a physical directory on the server's file system.

They contain the location of a specific operating system directory.

They provide greater file management flexibility because the names of the directory objects can be used in Enterprise Manager, so you are not required to hard-code directory path specifications.

Directory objects are owned by the SYS user.

Directory names are unique across the database because all the directories are located in a single name space (that is, SYS).

Page 6: moving data.pdf

Directory Object : Overview [cont…]

Directory objects are required when you specify the file locations for Data Pump because it accesses files on the server rather than on the client.

In Enterprise Manager, select Administration -> Directory Objects.

To edit or delete a directory object, select the directory object and click the appropriate button.

Page 7: moving data.pdf

Creating Directory Objects

Page 8: moving data.pdf

Creating Directory Objects [cont…]

Page 9: moving data.pdf

SQL*Loader : Overview

Page 10: moving data.pdf

SQL*Loader : Overview [cont…]

SQL*Loader loads data from external files into tables of an Oracle database.

It has a powerful data parsing engine that puts little limitation on the format of the data in the data file.

Page 11: moving data.pdf

SQL*Loader : Overview [cont…]

The files that are used by SQL*Loader are as follows:

Input Data Files :

SQL*Loader reads data from one or more files that are specified in the control file.

From SQL*Loader's perspective, the data in the data file is organized as records.

A particular data file can be in fixed record format, variable record format, or stream record format.

The record format can be specified in the control file with the INFILE parameter.

If no record format is specified, the default is stream record format.
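
As a hedged illustration of the three record formats, the control-file sketch below shows how each one might be requested on the INFILE line. The file names, the table example_table, and its columns are hypothetical, and the byte counts are arbitrary; the quoted format strings follow the form used in Oracle's SQL*Loader documentation.

```sql
LOAD DATA
-- Use ONE of the INFILE lines below; the other two are commented out.
-- Fixed record format: every record is exactly 11 bytes long.
INFILE 'example_fixed.dat' "fix 11"
-- Variable record format: each record begins with a 3-byte length field.
-- INFILE 'example_var.dat' "var 3"
-- Stream record format with an explicit terminator ("|" followed by a newline).
-- INFILE 'example_stream.dat' "str '|\n'"
INTO TABLE example_table
FIELDS TERMINATED BY ','
( col1 CHAR,
  col2 CHAR
)
```

If the quoted string after the file name is left out, the file is read in stream record format, as stated above.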

Page 12: moving data.pdf

SQL*Loader : Overview [cont…]

The files that are used by SQL*Loader [cont…]

Control File :

The control file is a text file that is written in a language that SQL*Loader understands.

The control file indicates to SQL*Loader where to find the data, how to parse and interpret the data, where to insert the data, and so on.

Page 13: moving data.pdf

SQL*Loader : Overview [cont…]

The files that are used by SQL*Loader [cont…]

Control File [cont…]

Although not precisely defined, a control file can be said to have three sections:

The first section contains session-wide information, for example:

Global options, such as the input data file name and the records to be skipped.

INFILE clauses to specify where the input data is located.

Data to be loaded.

Page 14: moving data.pdf

SQL*Loader : Overview [cont…]

The files that are used by SQL*Loader [cont…]

Control File [cont…]

Although not precisely defined, a control file can be said to have three sections:

The second section consists of one or more INTO TABLE blocks. Each of these blocks contains information about the table (such as the table name and the columns of the table) into which the data is to be loaded.

The third section is optional and, if present, contains input data.
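
The three sections can be pictured with a small control file such as the sketch below. It is only an illustration: the table animal_feeding and its columns are borrowed from the example used later in these slides, and the OPTIONS line and the header row in the data are assumptions added to show a global option.

```sql
-- Section 1: session-wide information
OPTIONS (SKIP=1)            -- global option: skip the first record (a header line here)
LOAD DATA
INFILE *                    -- the input data is inside this control file
APPEND
-- Section 2: one INTO TABLE block naming the table and its columns
INTO TABLE animal_feeding
( animal_id    INTEGER EXTERNAL TERMINATED BY ',',
  feeding_date DATE "dd-mon-yyyy" TERMINATED BY ',',
  pounds_eaten DECIMAL EXTERNAL TERMINATED BY ',',
  note         CHAR TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
)
-- Section 3 (optional): the input data itself
BEGINDATA
animal_id,feeding_date,pounds_eaten,note
100,1-jan-2000,23.5,"Flipper seemed unusually hungry today."
```

When the data lives in a separate file, the third section is simply absent and INFILE names that file instead of using the asterisk.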

Page 15: moving data.pdf

SQL*Loader : Overview [cont…]

The files that are used by SQL*Loader [cont…]

Log file :

When SQL*Loader begins execution, it creates a log file. If it cannot create a log file, execution terminates.

The log file contains a detailed summary of the load, including a description of any errors that occurred during the load.

Bad file :

The bad file contains the records that are rejected, either by SQL*Loader or by the Oracle database.

Data file records are rejected by SQL*Loader when the input format is invalid.

Page 16: moving data.pdf

SQL*Loader : Overview [cont…]

The files that are used by SQL*Loader [cont…]

Bad file [cont…]:

After a data file record is accepted for processing by SQL*Loader, it is sent to the Oracle database for insertion into a table as a row.

If the Oracle database determines that the row is valid, then the row is inserted into the table.

If the row is determined to be invalid, then the record is rejected and SQL*Loader puts it in the bad file.

Page 17: moving data.pdf

SQL*Loader : Overview [cont…]

The files that are used by SQL*Loader [cont…]

Discard file :

This file is created only when it is needed, and only if you have specified that a discard file should be enabled.

The discard file contains records that are filtered out of the load because they do not match any record-selection criteria specified in the control file.
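
To make the difference between the bad file and the discard file concrete, here is a minimal control-file sketch. The file names and the WHEN condition are illustrative assumptions; the table animal_feeding comes from the example used later in these slides.

```sql
LOAD DATA
INFILE 'animal_feeding.csv'
BADFILE 'animal_feeding.bad'         -- records rejected by SQL*Loader or the database
DISCARDFILE 'animal_feeding.dsc'     -- enables the discard file
APPEND
INTO TABLE animal_feeding
WHEN (animal_id = '100')             -- record-selection criterion
( animal_id    INTEGER EXTERNAL TERMINATED BY ',',
  feeding_date DATE "dd-mon-yyyy" TERMINATED BY ',',
  pounds_eaten DECIMAL EXTERNAL TERMINATED BY ',',
  note         CHAR TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
)
```

Under these assumptions, a well-formed record for any animal other than 100 would go to the discard file, while a record with, say, an unparsable date would go to the bad file.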

Page 18: moving data.pdf

Loading Data with SQL*Loader

Page 19: moving data.pdf

Using SQL*Loader

SQL*Loader is an Oracle utility that enables you to efficiently load large amounts of data into a database.

If you have data in a flat file, such as a comma-delimited text file, and you need to get that data into the Oracle database, SQL*Loader is the tool to use.

Page 20: moving data.pdf

Introducing SQL*Loader

Using SQL*Loader, you can do the following:

Load data from a delimited text file, such as a comma-delimited file

Load data from a fixed-width text file

Load data from a binary file

Combine multiple input records into one logical record

Store data from one logical record into one table or into several tables

Page 21: moving data.pdf

Introducing SQL*Loader [cont…]

Using SQL*Loader, you can do the following:

Write SQL expressions to validate and transform data as it is being read from a file

Combine data from multiple files into one table

Filter the data in the input file, loading only selected rows

Collect bad records (that is, those records that won't load) into a separate file where you can fix them

And more…!

Page 22: moving data.pdf

Understanding the SQL*Loader Control File

To use SQL*Loader, you need to have:

A database

A flat file to load, and

A control file to describe the contents of the flat file

Page 23: moving data.pdf

Understanding the SQL*Loader Control File [cont…]

Page 24: moving data.pdf

Understanding the SQL*Loader Control File [cont…]

[Figure: example SQL*Loader control file, not reproduced in this transcript]

Page 25: moving data.pdf

Understanding the SQL*Loader Control File [cont…]

Control files, such as the one illustrated in the previous figure, contain a number of commands and clauses describing the data that SQL*Loader is reading.

Control files also tell SQL*Loader where to store that data, and they can define validation expressions for that data.

The control file is aptly named, because it controls almost every aspect of how SQL*Loader operates.

The control file describes the format of the data in the input file and tells SQL*Loader which tables and columns to populate with this data.

Page 26: moving data.pdf

Understanding the SQL*Loader Control File [cont…]

When you write a control file, you need to be concerned with the following questions:

What file, or files, contain the data you want to load?

What table, or tables, are you loading?

What is the format of the data that you are loading?

What do you want to do with records that won't load?

All of these items represent things that you specify when you write a SQL*Loader control file.

Page 27: moving data.pdf

Understanding the SQL*Loader Control File [cont…]

Generally, control files consist of one long command that starts out like this:

LOAD DATA

The keyword DATA is optional. Everything else in a control file is a clause of some sort that is added onto this command.

Page 28: moving data.pdf

Understanding the SQL*Loader Control File [cont…]

Create the following table:

CREATE TABLE animal_feeding (
    animal_id    NUMBER,
    feeding_date DATE,
    pounds_eaten NUMBER (5,2),
    note         VARCHAR2 (80)
);

Page 29: moving data.pdf

Specifying the input file

You use the INFILE clause to identify the file containing the data that you want to load.

The data can be in a file separate from the control file, which is usually the case, or you can place the data within the control file itself.

Use multiple INFILE clauses if your data is spread across several files.

Control File Data:

If you are loading the data from a text file, you have the option of placing the LOAD command at the beginning of that file, which then becomes the control file.

Page 30: moving data.pdf

Specifying the input file [cont…]

Control File Data:

To specify that SQL*Loader looks in the control file for the data, supply an asterisk (*) for the file name in the INFILE clause. For example:

LOAD DATA
INFILE *
…..
….
BEGINDATA
data
data
data

Page 31: moving data.pdf

Specifying the input file [cont…]

Control File Data:

If you do include your data in the control file, the last clause of your LOAD command must be the BEGINDATA clause.

This tells SQL*Loader where the command ends and where your data begins.

SQL*Loader will begin reading data from the line immediately following BEGINDATA.

Page 32: moving data.pdf

Specifying the input file [cont…]

Data in a Separate File:

Although you can have data in the control file, it's more common to have it in a separate file.

In that case, you place the file name after the keyword INFILE, as shown in this example:

LOAD DATA
INFILE 'animal_feeding.csv'
. …

Page 33: moving data.pdf

Specifying the input file [cont…]

Data in Multiple Files:

You can use multiple INFILE clauses to load data from several files at once. The clauses must follow each other, as shown below:

LOAD DATA
INFILE 'animal_feeding_fixed_1.dat'
INFILE 'animal_feeding_fixed_2.dat'
. …

Page 34: moving data.pdf

Loading data into nonempty tables

After listing the input file, or files, in SQL*Loader, you need to specify whether you expect the table that you are loading to be empty.

By default, SQL*Loader expects that you are loading the data into a completely empty table.

If, when the load starts, SQL*Loader finds even one row in the table, the load will be aborted.

Four keywords control SQL*Loader's behaviour when it comes to dealing with empty vs. nonempty tables:

Page 35: moving data.pdf

Loading data into nonempty tables

INSERT : Specifies that you are loading an empty table. SQL*Loader will abort the load if the table contains data to start with.

APPEND : Specifies that you are adding data to a table. SQL*Loader will proceed with the load even if preexisting data is in the table.

REPLACE : Specifies that you want to replace the data in the table. Before loading, SQL*Loader will delete any existing data.

TRUNCATE : Specifies the same as REPLACE, but SQL*Loader uses the TRUNCATE statement instead of a DELETE statement to delete the existing data.

Page 36: moving data.pdf

Loading data into nonempty tables [cont…]

Place the keyword for whichever option you choose after the INFILE clause, as shown in this example:

LOAD DATA
INFILE 'animal_feeding.csv'
APPEND
….
….

If you don't specify an option, then INSERT is assumed by default.

Page 37: moving data.pdf

Specifying the table to load

In SQL*Loader, you use the INTO TABLE clause to specify which table or tables you want to load.

It also specifies the format of the data contained in the input file.

The INTO TABLE clause is the most complex of all the clauses.

Page 38: moving data.pdf

Specifying the table to load [cont…]

Loading One Table:

Example:

LOAD DATA
INFILE 'load1.csv'
INSERT
INTO TABLE LOAD_TEST
(
  eno char terminated by ",",
  ename char terminated by ",",
  city char terminated by ","
)

Page 39: moving data.pdf

Using SQL*Loader Data Types

SQL*Loader supports a variety of data types. Some of the most useful data types for loading data from text files are given below, as data type name followed by description:

CHAR : Identifies character data. If you are loading data into any type of text field, such as VARCHAR2, CHAR, or CLOB, use the SQL*Loader CHAR data type.

DATE ["format"] : Identifies a date. Even though the format is optional, specify it to avoid problems.

INTEGER EXTERNAL : Identifies an integer value that is stored in character form. For example, the character string "123" is a valid INTEGER EXTERNAL value.

Page 40: moving data.pdf

Using SQL*Loader Data Types [cont…]

DECIMAL EXTERNAL : Identifies a numeric value that is stored in character form and that may include a decimal point. The string "-123.45" is a good example of this data type.

ZONED (precision, scale) : Zoned decimal fields are numeric values that are represented as character strings and that contain an assumed decimal point. For example, a definition of ZONED (5,2) would cause "12345" to be interpreted as 123.45.

Page 41: moving data.pdf

Creating a control file [example]

Loading One Table:

Example:

LOAD DATA
INFILE 'load1.csv'
INSERT
INTO TABLE LOAD_TEST
(
  eno char terminated by ",",
  ename char terminated by ",",
  city char terminated by ","
)

Page 42: moving data.pdf

Describing fixed-width columns

The INTO TABLE clause contains a field list within parentheses. This list defines the fields being loaded from the flat file into the table. Each entry in the field list has this general format:

column_name POSITION (start:end) datatype

column_name : the name of a column in the table that you are loading

POSITION (start:end) : the position of the column within the record. The values for start and end represent the character positions of the first and last characters of the column. The first character of a record is always position 1.

Page 43: moving data.pdf

Describing fixed-width columns

datatype : a SQL*Loader data type that identifies the type of data being loaded.

You will need to write one field list entry for each column that you are loading. As an example, consider the following record:

10010-jan-200002350Flipper seemed unusually hungry today.

The above record contains a three-digit ID number, followed by a date, followed by a five-digit number, followed by a text field.

The ID number occupies character positions 1 through 3 and is an integer, so its definition would look like this:

animal_id POSITION (1:3) INTEGER EXTERNAL

The date field is next. It occupies character positions 4 through 14, and its definition looks like this:

feeding_date POSITION (4:14) DATE "dd-mon-yyyy"

Page 44: moving data.pdf

Example: Loading fixed-width data

LOAD DATA
INFILE 'animal_feeding_fixed_1.dat'
APPEND
INTO TABLE animal_feeding
TRAILING NULLCOLS
( animal_id    POSITION (1:3)   INTEGER EXTERNAL,
  feeding_date POSITION (4:14)  DATE "dd-mon-yyyy",
  pounds_eaten POSITION (15:19) ZONED (5,2),
  note         POSITION (20:99) CHAR
)

Page 45: moving data.pdf

Describing Delimited Columns

The format for describing delimited data, such as comma-delimited data, is similar to that used for fixed-width data. The difference is that you need to specify the delimiter being used. The general format of a delimited column definition looks like this:

column_name datatype TERMINATED BY 'delim' [OPTIONALLY ENCLOSED BY 'delim']

The elements of this column definition are described as follows:

column_name : the name of a column in the table that you are loading

datatype : a SQL*Loader data type

TERMINATED BY 'delim' : identifies the delimiter that marks the end of the column

OPTIONALLY ENCLOSED BY 'delim' : specifies an optional enclosing character. Many text values, for example, are enclosed by quotation marks.

Page 46: moving data.pdf

Describing Delimited Columns [cont…]

When describing delimited fields, we must be careful to describe them in the order in which they occur. Take a look at the following record, which contains delimited data:

100,1-jan-2000,23.5,"Flipper seemed unusually hungry today."

It can be defined as below:

animal_id INTEGER EXTERNAL TERMINATED BY ',',
feeding_date DATE "dd-mon-yyyy" TERMINATED BY ',',
pounds_eaten DECIMAL EXTERNAL TERMINATED BY ',',
note CHAR TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
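
When every field shares the same delimiter, the per-field TERMINATED BY clauses can be replaced by a single FIELDS clause at the table level. The sketch below is an alternative formulation rather than the slide's own method; the input file name is assumed to be the animal_feeding.csv file used earlier.

```sql
LOAD DATA
INFILE 'animal_feeding.csv'
APPEND
INTO TABLE animal_feeding
-- One FIELDS clause sets the delimiter and enclosure for all fields below.
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
( animal_id    INTEGER EXTERNAL,
  feeding_date DATE "dd-mon-yyyy",
  pounds_eaten DECIMAL EXTERNAL,
  note         CHAR
)
```

The fields must still be listed in the order in which they occur in each record.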

Page 47: moving data.pdf

Working with short records

When dealing with delimited data, you occasionally run into cases where not all fields are present in each record in a data file. For example, look at these two records:

100,1-jan-2000,23.5,"Flipper seemed unusually hungry today."
151,1-jan-2000,55

The first record contains a note, while the second does not.

SQL*Loader's default behavior is to consider the second record as an error because not all fields are present.

You can change this behavior and cause SQL*Loader to treat missing values at the end of a record as nulls by using the TRAILING NULLCOLS clause.

Page 48: moving data.pdf

Working with short records [cont…]

The TRAILING NULLCOLS clause is part of the INTO TABLE clause, and it appears as follows:

INTO TABLE animal_feeding
TRAILING NULLCOLS
(
)
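
Putting the pieces together, a complete control file for the two comma-delimited records shown earlier might look like the sketch below. The input file name animal_feeding.csv and the APPEND keyword are assumptions; the field list is the delimited definition given earlier in these slides.

```sql
LOAD DATA
INFILE 'animal_feeding.csv'
APPEND
INTO TABLE animal_feeding
TRAILING NULLCOLS              -- missing fields at the end of a record become NULL
( animal_id    INTEGER EXTERNAL TERMINATED BY ',',
  feeding_date DATE "dd-mon-yyyy" TERMINATED BY ',',
  pounds_eaten DECIMAL EXTERNAL TERMINATED BY ',',
  note         CHAR TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
)
```

With this clause in place, the record 151,1-jan-2000,55 would load with a null note instead of being treated as an error.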

Page 49: moving data.pdf

Converting Blanks to Nulls

When dealing with data in fixed-width columns, you will find that missing values appear as blanks in the data file. For example:

100120-mar-2012good morning all
11100223-mar-2012this is demo

The first record is missing the two-digit id value. If this case is not handled, the record will be rejected from the load.

If you prefer to treat a blank field as a null, you can use the NULLIF clause to tell SQL*Loader to interpret it as a null value.

The NULLIF clause comes after the datatype and takes the following form:

NULLIF field_name=BLANKS

For example:

cid POSITION (1:3) INTEGER EXTERNAL
    NULLIF cid=BLANKS,
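
As a fuller sketch, the NULLIF clause can be added to the fixed-width animal_feeding control file from earlier in these slides. Applying it to pounds_eaten is an illustrative assumption; any field that may be blank in the data file could be handled the same way.

```sql
LOAD DATA
INFILE 'animal_feeding_fixed_1.dat'
APPEND
INTO TABLE animal_feeding
TRAILING NULLCOLS
( animal_id    POSITION (1:3)   INTEGER EXTERNAL,
  feeding_date POSITION (4:14)  DATE "dd-mon-yyyy",
  -- If positions 15 through 19 are all blanks, load NULL instead of a number.
  pounds_eaten POSITION (15:19) ZONED (5,2)
               NULLIF pounds_eaten=BLANKS,
  note         POSITION (20:99) CHAR
)
```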