Data Types: simple and compound, abstraction and implementation

Post on 17-Jan-2016



Data Types

simple and compound, abstraction and implementation

A brief history

- simple types
- arrays - compounds of same type
- strings
- records - compounds of different types
- pointers and references
- user defined types
- abstract data types
- objects

Simple types

- not composed of other types
- hardware or software implemented
- usually in hardware: integer, floating point, binary-coded decimal
- usually in software: character, boolean, user-defined types

Integer

- 2's complement or unsigned
- operations exact within range
- range depends on size of virtual cell
- typical size: 1, 2, 4, 8 bytes
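As a rough illustration of how the range follows from the cell size, the sketch below computes the limits for a cell of 1 to 8 bytes. The function names are made up for this example and are not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of a 2's complement integer stored in `bytes` bytes (1..8). */
int64_t twos_complement_max(unsigned bytes)
{
    assert(bytes >= 1 && bytes <= 8);
    unsigned bits = 8 * bytes;
    return (int64_t)((UINT64_C(1) << (bits - 1)) - 1);
}

/* Smallest value: there is one more negative number than positive ones. */
int64_t twos_complement_min(unsigned bytes)
{
    return -twos_complement_max(bytes) - 1;
}

/* Largest unsigned value for the same cell size. */
uint64_t unsigned_max(unsigned bytes)
{
    assert(bytes >= 1 && bytes <= 8);
    return bytes == 8 ? UINT64_MAX : (UINT64_C(1) << (8 * bytes)) - 1;
}
```

For a 1-byte cell this gives -128..127 signed and 0..255 unsigned.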

Floating Point

- based on scientific notation
- representations and operations are approximate
- range and precision depend on size of virtual cell (usually 4 or 8 bytes)
- 8-byte layout: 1 sign bit, 11 exponent bits, 52 fraction bits
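The 1 / 11 / 52 bit split of an 8-byte value can be inspected directly. This is a minimal sketch that assumes the machine uses the IEEE 754 double format; the DoubleFields type and split_double function are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of an 8-byte IEEE 754 double. */
typedef struct {
    unsigned sign;       /* 1 bit */
    unsigned exponent;   /* 11 bits, stored with a bias of 1023 */
    uint64_t fraction;   /* 52 bits */
} DoubleFields;

DoubleFields split_double(double value)
{
    uint64_t bits;
    DoubleFields f;

    assert(sizeof bits == sizeof value);   /* assumes an 8-byte double */
    memcpy(&bits, &value, sizeof bits);    /* copy the bits, no conversion */

    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.fraction = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}
```

Under that assumption, 1.0 splits into sign 0, stored exponent 1023 and fraction 0.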

Binary Coded Decimal

- 'exact' decimal arithmetic
- decimal digits in 4-bit code
- range and precision depend on size of virtual cell - 2 digits per byte

(figure: one byte split into two 4-bit digits; example digits 5 9 0 5 1 8 7 8 with a defined decimal point)
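Packing two decimal digits per byte might look like the sketch below. It assumes the first digit goes in the high 4 bits, which is one common convention and not something the slide specifies; bcd_pack and bcd_digit are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack decimal digits (each 0..9) two per byte, first digit in the high
   4 bits. This sketch requires an even number of digits. */
void bcd_pack(const uint8_t *digits, size_t count, uint8_t *out)
{
    assert(count % 2 == 0);
    for (size_t i = 0; i < count; i += 2) {
        assert(digits[i] <= 9 && digits[i + 1] <= 9);
        out[i / 2] = (uint8_t)((digits[i] << 4) | digits[i + 1]);
    }
}

/* Recover digit number `index` from the packed bytes. */
uint8_t bcd_digit(const uint8_t *packed, size_t index)
{
    uint8_t byte = packed[index / 2];
    return (index % 2 == 0) ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0F);
}
```

With this layout the digits 5 9 0 5 1 8 7 8 occupy four bytes: 0x59, 0x05, 0x18, 0x78.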

Character

- ASCII - 128 character set - 1 byte
- Unicode - 2 byte extension (today usually encoded as UTF-8, UTF-16 or UTF-32)
- usually coded as unsigned integer

Boolean

- 1 bit is sufficient but... no bit-wise addressability in hardware
- store in a byte - space inefficient
- store 8 per byte - execution inefficient
- C: 0 = false, non-zero = true
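The "8 per byte" option costs a shift and a mask on every access, which is where the execution overhead comes from. A small sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Read boolean number `index` (0..7) out of a byte holding eight of them. */
bool packed_get(uint8_t flags, unsigned index)
{
    assert(index < 8);
    return ((flags >> index) & 1u) != 0;
}

/* Return a copy of `flags` with boolean number `index` set to `value`. */
uint8_t packed_set(uint8_t flags, unsigned index, bool value)
{
    assert(index < 8);
    uint8_t mask = (uint8_t)(1u << index);
    return value ? (uint8_t)(flags | mask) : (uint8_t)(flags & (uint8_t)~mask);
}
```

Storing one boolean per byte avoids these extra operations at the price of seven unused bits.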

User-defined types

implemented (like character and boolean usually are) as a coding of unsigned integer

enumerated type (Pascal example):

type suit = (club, diamond, heart, spade);
var lead : suit;
lead := heart;

internally represented as { 0, 1, 2, 3 }
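C enumerations use the same integer coding, so a rough C counterpart of the suit type can show the representation directly. The suit_name helper is a hypothetical addition for illustration.

```c
#include <assert.h>

enum suit { CLUB, DIAMOND, HEART, SPADE };   /* coded as 0, 1, 2, 3 */

/* Because the values are small integers, they can index a table directly. */
const char *suit_name(enum suit s)
{
    static const char *const names[] = { "club", "diamond", "heart", "spade" };
    assert((int)s >= CLUB && (int)s <= SPADE);
    return names[s];
}
```

Here HEART has the value 2, matching the third position in the Pascal declaration.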

operations:

User-defined types

implemented as a restricted range of integer

subrange type (Ada example):

subtype CENTURY20 is INTEGER range 1900..1999;
BIRTHYEAR : CENTURY20;
BIRTHYEAR := 1981;

User-defined types

Type compatibility issues:

- can two enumerated types contain the same constant?
- can defined types be coerced with integer, with each other?

Memory management intro

The parser creates a symbol table of identifiers, including variables. Some information (the name, plus more) is bound at this time, and more as the program is compiled, by storage in the symbol table:

e.g. int x;

--> name: x, type: int, address: offset

Strings

- first use: output formatting only
- quasi-primitive type in most languages (not just arrays of character)
  - operations: initialization, substring, catenation, comparison
- the length problem: fixed or varying?
- no standard string model

C

char *s = "abc";
int len = strlen(s);

- array of char with terminal
- extended syntax
- library of methods

Strings - examples

Java

String s = "abc" + x;
s = s.substring(0, 2);

- fixed length array
- extended syntax
- class with 70 methods

(figure: the C string "abc" stored as a b c 0, an array of char ending in a terminating zero)

Strings - representations

- fixed length and content (static)
- fixed length and varying content (FORTRAN)
- varying length and content by reallocation (Java String)
- varying length and content by extension (Java StringBuffer)
- varying length and content (C)

In symbol table:

Static str: Length, Address
Dynamic str: MaxLength, CurrLength, Address
char*: Address
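The dynamic descriptor (maximum length, current length, address) can be sketched as a C struct. The DynString type, its functions and the doubling growth policy are all assumptions made for this example and do not describe how any particular language implements strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor: maximum length, current length, address. */
typedef struct {
    size_t max_length;
    size_t curr_length;
    char  *address;
} DynString;

/* Create an empty string with room for max_length characters.
   Returns 0 on success, -1 if the allocation fails. */
int dyn_init(DynString *s, size_t max_length)
{
    assert(max_length > 0);
    s->address = malloc(max_length);
    if (s->address == NULL)
        return -1;
    s->max_length = max_length;
    s->curr_length = 0;
    return 0;
}

/* Append text, growing the storage by reallocation when it does not fit. */
int dyn_append(DynString *s, const char *text)
{
    size_t extra = strlen(text);
    if (s->curr_length + extra > s->max_length) {
        size_t new_max = 2 * (s->curr_length + extra);
        char *grown = realloc(s->address, new_max);
        if (grown == NULL)
            return -1;
        s->address = grown;
        s->max_length = new_max;
    }
    memcpy(s->address + s->curr_length, text, extra);
    s->curr_length += extra;
    return 0;
}
```

The content varies freely up to max_length; only an append that exceeds it triggers a reallocation. The caller releases the storage with free(s.address).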

Compound (1) Arrays

- collection of elements of one type
- access to individual elements is computed at execution time by position, O(1), or O(dim)

Arrays – design decisions

- indexing:
  - dimensions - limit? recursive?
  - types - int, other, user defined?
  - first index: 0, 1, variable
  - range checking - no (C), yes (Java)
  - lexemes - 'subscripts' = (), []?

Arrays – design decisions: binding times

- type, index type
- index range (i.e. array size), space:
  - static
  - fixed stack-dynamic
  - stack-dynamic
  - heap-dynamic
- initial values of elements at storage allocation? e.g. int[] x = {1,2,3};

Arrays – operations

- on elements - based on type
- on entire array as variables:
  - vector and matrix operations, e.g. APL
  - subarray (~ substring)
  - subarray dimensions (slices)

Arrays – storage

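(figure: array descriptor in the symbol table, pointing to the stored elements)

<array>
element type, size
index type
index lower bound
index upper bound
address

The address field points to the elements, stored in order from lower bound to upper bound.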

Arrays – element access

<array>
element type, size
index type
index lower bound
index upper bound
address

address of a[i] = address + (i - lower bound) * size
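The access formula can be written out in C against a descriptor holding the fields listed on the slide. ArrayDescriptor and element_address are hypothetical names; the range check is included only to show where it would go.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor holding the fields the slide lists. */
typedef struct {
    size_t element_size;
    long   lower_bound;
    long   upper_bound;
    char  *address;        /* address of the first element */
} ArrayDescriptor;

/* address of a[i] = address + (i - lower bound) * size */
void *element_address(const ArrayDescriptor *a, long i)
{
    assert(i >= a->lower_bound && i <= a->upper_bound);   /* range check */
    return a->address + (size_t)(i - a->lower_bound) * a->element_size;
}
```

The computation is one subtraction, one multiplication and one addition whatever the value of i, which is why access by position is O(1).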

Arrays - multidimensional

- contiguous or not
- row major, column major order
- computed location of element
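For a contiguous two-dimensional array, the two storage orders differ only in which index is scaled. A short sketch, assuming zero-based indices and offsets counted in elements (the function names are made up here):

```c
#include <assert.h>
#include <stddef.h>

/* Row major: the elements of one row are adjacent in memory. */
size_t row_major_offset(size_t rows, size_t cols, size_t i, size_t j)
{
    assert(i < rows && j < cols);
    return i * cols + j;
}

/* Column major: the elements of one column are adjacent in memory. */
size_t column_major_offset(size_t rows, size_t cols, size_t i, size_t j)
{
    assert(i < rows && j < cols);
    return j * rows + i;
}
```

In a 3 x 4 array, element [1][2] sits at offset 6 in row major order and at offset 7 in column major order. C lays out its built-in arrays in row major order.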

Jagged arrays

Implemented as arrays of arrays

(figure: an <array> descriptor whose 4 elements are themselves <array> descriptors, with 3, 7, 4 and 5 elements; every descriptor holds index type, index lower bound, index upper bound and address)
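An array of arrays with rows of different lengths could be built as follows. This is a sketch only: the Row type and the jagged_create and jagged_free functions are hypothetical, and each row simply carries its own length and storage address.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical row descriptor: its own length and its own storage. */
typedef struct {
    size_t length;
    int   *elements;
} Row;

/* Allocate row_count rows with the given lengths, zero-filled.
   Returns NULL if any allocation fails. */
Row *jagged_create(size_t row_count, const size_t *lengths)
{
    assert(row_count > 0);
    Row *rows = malloc(row_count * sizeof *rows);
    if (rows == NULL)
        return NULL;
    for (size_t r = 0; r < row_count; r++) {
        assert(lengths[r] > 0);
        rows[r].length = lengths[r];
        rows[r].elements = calloc(lengths[r], sizeof(int));
        if (rows[r].elements == NULL) {
            while (r > 0)                 /* undo the rows already built */
                free(rows[--r].elements);
            free(rows);
            return NULL;
        }
    }
    return rows;
}

void jagged_free(Row *rows, size_t row_count)
{
    for (size_t r = 0; r < row_count; r++)
        free(rows[r].elements);
    free(rows);
}
```

Accessing rows[1].elements[6] takes two address computations, one per level, instead of the single computation of a contiguous array.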

(2) Associative Arrays - maps

- values accessed by keys, not indices
- no order of elements
- automatic growth of capacity
- operations: add/set, get, remove
- fast search for individual data
- slower for batch processing than array
- Java classes; Perl data structure

Associative Arrays - implementation

- hash tables based on key value
- most operations 'near O(1)'
- expanding capacity may be O(n)

For a Java class that combines features of array and associative array, see LinkedHashMap

(3) Records

- multiple elements of any type
- elements accessed by field name
- design issues:
  - hierarchical definition (records within records)
  - syntax of naming
  - scopes for elliptical (incomplete) reference to fields

Records - implementation

type course =
  record
    dept : array[1..4] of char;
    code : integer;
  end;

(figure: the <record> descriptor for course lists field dept, array [1..4] of char, at offset 0 and field code, integer, at offset 4, plus the address of the record's storage; the example record holds C O S C 3127. An <array> descriptor a, with element type and size, index type, index lower and upper bounds and address, is shown alongside.)
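Field access by offset can be made visible in C with offsetof. The struct below is an assumed C counterpart of the Pascal course record, and course_code_by_offset is a hypothetical helper that does by hand what compiled field access does implicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed C counterpart of the Pascal record `course`. */
struct course {
    char dept[4];   /* offset 0; four characters, no terminator */
    int  code;      /* offset 4 on typical machines with a 4-byte int */
};

/* Fetch the code field as base address + field offset. */
int course_code_by_offset(const struct course *c)
{
    int code;
    assert(c != NULL);
    memcpy(&code, (const char *)c + offsetof(struct course, code), sizeof code);
    return code;
}
```

Unlike an array index, the offset of a field is a constant known at compile time, so no run-time arithmetic on an index is needed.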

(4) Pascal variant records (unions)

type coord = (polar, cart);
     point =
       record
         case rep : coord of
           polar: ( radians : boolean;
                    radius : real;
                    angle : real);
           cart:  ( x : real;
                    y : real);
       end;

Note:

• varying space requirements
• discriminant field is optional (rep)
• type checking loopholes: Ada has a similar variant record but closed these loopholes

Other unions

- Fortran EQUIVALENCE
- C union
- not inside records
- no type checking

* unions do not cause type coercion - data is reinterpreted

Sebesta's C example

union flexType {
    int intE1;
    float floatE1;
};

union flexType ell;
float x;

ell.intE1 = 27;
x = ell.floatE1;   /* the bits of 27 are reinterpreted as a float, not converted */


(5) Sets (Pascal)

- defined on one (discrete) base type
- implementation imposes maximum size (set of integer; is not possible)

type day = (M, Tu, W, Th, F, Sa, Su);
     dayset = set of day;
var work, wknd : dayset;
    today : day;

today := F;
work := [M, Tu, W, Th, F];
wknd := [Sa, Su, F];
if today in work * wknd then ...

(figure: bit vectors, one bit per day in the order M Tu W Th F Sa Su)

work         1 1 1 1 1 0 0
wknd         0 0 0 0 1 1 1
work * wknd  0 0 0 0 1 0 0
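The bit-vector implementation of sets maps directly onto bitwise operators. In this sketch the DaySet type and the set_ functions are hypothetical, and bit 0 (the least significant bit) stands for M, so the bits read right to left compared with the slide's picture.

```c
#include <assert.h>
#include <stdbool.h>

enum day { M, TU, W, TH, F, SA, SU };   /* coded 0..6 */
typedef unsigned DaySet;                /* bit d is 1 when day d is a member */

DaySet set_add(DaySet s, enum day d)
{
    assert((int)d >= M && (int)d <= SU);
    return s | (1u << d);
}

/* Set intersection is a single bitwise AND. */
DaySet set_intersection(DaySet a, DaySet b)
{
    return a & b;
}

/* Membership test: shift the wanted bit down and mask it. */
bool set_contains(DaySet s, enum day d)
{
    assert((int)d >= M && (int)d <= SU);
    return ((s >> d) & 1u) != 0;
}
```

Because a whole set fits in one machine word, the maximum set size is tied to the word size, which is the limit the slide mentions.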

(6) Pointers and references

references are dereferenced pointers (whatever that means)

primary purpose: dynamic memory access

secondary purpose: indirect addressing as in machine instructions

Pointers (and references)

data type that stores an address in the format of the machine (usually 4 or 8 bytes) or a "null"

a pointer must be dereferenced to get the data at the address it contains

a reference is a pointer data type that is automatically dereferenced

Dereferencing example

In C++:

double x, y;
Point p(0.0, 0.0);
Point *pref;
pref = &p;
x = p.X;          // field access
y = (*pref).Y;    // dereferencing, then field access

In Java:

Point2D.Double p;
p = new Point2D.Double(0.0, 0.0);
double xCoord = p.x;    // dereferencing and field access combined

Pointers hold addresses

Indirect addressing in C: pointer to statically allocated memory

int a, b;
int *iptr, *jptr;
a = 100;
iptr = &a;
jptr = iptr;
b = *jptr;

int x, y, arr[4];
int *iptr;
iptr = arr;
arr[2] = 33;
x = iptr[2];
y = *(iptr + 2);

Security loophole...

Pointer arithmetic

Arithmetic operations on addresses

int x;
int *iptr;
iptr = &x;
for (;;) {
    /* process location *iptr */
    iptr++;
}

Scan through memory starting at x

Basic dynamic memory management model:

Heap manager keeps list of available memory cells

“Allocate” operation transfers cell from list in heap to program

“Deallocate” transfers cell from program back to list in heap

Tradeoffs of fixed or variable sized cells
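The allocate and deallocate operations of this model can be sketched for fixed-size cells with a linked free list. The pool size, the FreeCell type and the heap_ function names are assumptions made for the example; a real heap manager is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

#define CELL_COUNT 4

typedef union FreeCell {
    union FreeCell *next;   /* used while the cell is on the free list */
    double payload;         /* used while the program owns the cell */
} FreeCell;

static FreeCell pool[CELL_COUNT];
static FreeCell *free_list = NULL;

/* Put every cell of the pool on the list of available cells. */
void heap_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < CELL_COUNT; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* "Allocate": transfer a cell from the list to the program; NULL if empty. */
void *heap_allocate(void)
{
    FreeCell *cell = free_list;
    if (cell != NULL)
        free_list = cell->next;
    return cell;
}

/* "Deallocate": transfer the cell from the program back to the list. */
void heap_deallocate(void *p)
{
    FreeCell *cell = p;
    assert(cell >= pool && cell < pool + CELL_COUNT);
    cell->next = free_list;
    free_list = cell;
}
```

With fixed-size cells both operations are a couple of pointer assignments. Variable-sized cells would also need size bookkeeping and a search for a cell that fits.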

Problems with pointers and dynamic memory: 1

Dangling reference: pointer points to de-allocated memory

Point *q;
Point *p = new Point(0,0);
q = p;
delete p;
// q is dangling - any reference through q should cause
// an error - 'tombstones' will do error check

Problems with pointers and dynamic memory: 2

Memory leakage: memory cell with no reference to it

Point *p = new Point(0,0);
p = new Point(3,4);
// memory containing Point(0,0) object
// is inaccessible - counting references will help

Cause of reference problems

- multiple references to a memory cell
- deallocation of memory cells

Where is responsibility?

- automatic deallocation (garbage collection), OR
- user responsibility (explicit 'delete')

User management of memory

- dangling references can be detected as errors but not prevented
  - tombstones
  - lock and key
- memory leakage is a continuing problem

int *q = new int(6);
int *p = q;
p = nullptr;

(figure: p and q both point to the cell holding 6; after p is set to null, only q still refers to it)

Garbage Collection

1. Reference counting: ongoing, "eager" - memory cells returned to heap as soon as all references removed.

2. Garbage collection: occasional, "lazy" - let unreferenced memory cells 'leak' till heap is nearly empty, then collect them.

Reference counting:

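- reference count in cell
- count 0 -> return cell to heap

int *q = new int(6);
int *p = q;     // reference count of the cell: 2
p = nullptr;    // reference count of the cell: 1
q = nullptr;    // reference count of the cell: 0

Classic problem: circular linked lists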
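A count stored in the cell itself can be sketched in C as below. The Cell type, the cell_ functions and the cells_alive counter are hypothetical; in a language with built-in reference counting the retain and release steps would be generated automatically on pointer assignment.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical heap cell that carries its own reference count. */
typedef struct {
    int ref_count;
    int value;
} Cell;

static int cells_alive = 0;   /* bookkeeping for this sketch only */

/* Allocate a cell; the caller holds the first reference. */
Cell *cell_new(int value)
{
    Cell *c = malloc(sizeof *c);
    if (c != NULL) {
        c->ref_count = 1;
        c->value = value;
        cells_alive++;
    }
    return c;
}

/* Copying a pointer adds a reference. */
Cell *cell_retain(Cell *c)
{
    assert(c != NULL);
    c->ref_count++;
    return c;
}

/* Dropping a pointer removes one; at count 0 the cell returns to the heap. */
void cell_release(Cell *c)
{
    assert(c != NULL && c->ref_count > 0);
    if (--c->ref_count == 0) {
        free(c);
        cells_alive--;
    }
}
```

Two cells that refer to each other keep both counts at 1 or more even when the program can no longer reach them, which is why circular structures are the classic problem for this scheme.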

Garbage Collection: (mark-sweep)

1. All cells in memory marked inaccessible (f)

2. Follow all references in program and mark cells accessible (t)

3. Return inaccessible cells to heap

(figure: cells carrying an 'accessible' marker; the two referenced cells are marked t, the unreferenced cell stays f)

Classic problem: effect on program performance
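The three mark-sweep steps can be sketched over a small fixed heap. Everything here is an assumption made for illustration: the Cell layout with a single outgoing reference, the 8-cell heap, and the roots array standing in for the references held by the program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_CELLS 8

/* Hypothetical cell: one outgoing reference plus the 'accessible' marker. */
typedef struct Cell {
    struct Cell *next;   /* reference to another cell, or NULL */
    bool accessible;     /* the marker */
    bool in_use;         /* false when the cell is back on the free heap */
} Cell;

static Cell heap[HEAP_CELLS];

/* Runs the three steps over `heap`; returns the number of cells reclaimed. */
size_t mark_sweep(Cell *const *roots, size_t root_count)
{
    size_t reclaimed = 0;
    assert(roots != NULL || root_count == 0);

    /* 1. Mark every cell inaccessible. */
    for (size_t i = 0; i < HEAP_CELLS; i++)
        heap[i].accessible = false;

    /* 2. Follow references from the program's roots, marking as we go.
          Stopping at an already marked cell makes cycles harmless. */
    for (size_t r = 0; r < root_count; r++)
        for (Cell *c = roots[r]; c != NULL && !c->accessible; c = c->next)
            c->accessible = true;

    /* 3. Return in-use cells that were never reached. */
    for (size_t i = 0; i < HEAP_CELLS; i++) {
        if (heap[i].in_use && !heap[i].accessible) {
            heap[i].in_use = false;
            heap[i].next = NULL;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

Steps 1 and 3 visit every cell in the heap regardless of how much garbage there is, which is where the cost to program performance comes from. An unreachable pair of cells that refer to each other is reclaimed here, since only reachability from the roots matters.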

A sloppy Java example from Main (Data Structures)

public class ObjectStack
{
    private Object[] data;
    private int manyItems;
    ....
    public Object pop()
    {
        if (manyItems == 0)
            throw new EmptyStackException();
        return data[--manyItems]; // leaves reference in data
    }
}

Managing heap of variable-sized cells

Necessary for objects with different space requirements

- Problem: tracking cell size
- Problem: heap defragmentation
  - keep blocks list in size order?
  - keep blocks list in sequence order?