Lecture 10: Structs and Lists Records of value. Structured Data So far, all variables have stored...

30
Lecture 10: Structs and Lists Records of value

Transcript of Lecture 10: Structs and Lists Records of value. Structured Data So far, all variables have stored...

Lecture 10: Structs and Lists

Records of value

Structured Data

• So far, all variables have stored one value• A structured record is a variable consisting of fields that

can be of different types• For example, a person record might have name, age, ...• To define your own structured record type, use the keyword

struct and the name of your type, followed by the list of fields inside a block of braces

• The block must be followed by semicolon, as you are allowed to define a bunch of variables at the same time

• struct person { int age; char name[30]; } p1, p2, p3;• To access a field, use the dot operator, e.g. p1.age

typedef

• After declaration, you can use your struct as you would use any other type to declare variables and parameters

• You can even use your struct as a field of another struct• However, the C syntax doesn't allow you to simply use the

name of your struct as type, but annoyingly, it must be preceded with keyword struct every time you use it

• Often after struct declaration, we use typedef to make a handy one-word alias for the struct

• For example, typedef struct person person;• Since aliases created with typedef reside in a different

namespace than other names, it is legal to reuse the name

Operations on structs

• structs can be assigned to each other and passed to and from functions either by value or (preferably) pointers

• However, structs can't be compared for equality with == • Since struct guarantees that its fields are stored in memory

in the exact order they are listed, there may be slack bytes between fields to achieve proper alignment

• Equality testing cannot be simple memory comparison byte by byte, since slack byte values shouldn't matter

• Also if some field is a string, it should be compared with strcmp, the slack in the end again being irrelevant

• When defining a struct, define your own functions for equality test, output, etc.

Example: Equality Test for person

int person_equals(person p1, person p2) {    return p1.age == p2.age &&               strcmp(p1.name, p2.name) == 0;}

int main() {    person bob = { 40, "Bob" }; /* struct initialization */    person bob2 = { 40, "Bob" };    person tom = { 32, "Tom" };    printf("%d %d\n", person_equals(bob, tom),              person_equals(bob, bob2) ); /* 0 1 */    return 0;}

Interlude: Unions

• For each struct, the compiler precomputes the offset for each data field in memory from the beginning of struct, based on the types of fields

• An union is defined exactly like a struct, but with union• In an union, the field type itself is variant, and the union

instance stores one value at the time • Simply implemented by defining each field offset to be 0• Size of union instance equals the size of its largest field• Rarely needed these days with ample memory• If you can edit somebody's stdio.h or some other often

included file, a classic prank is #define struct union 

Pointers to structs

• Just like with any type, you define a pointer to that type by appending an asterisk to the end of the type

• If p is a struct person*, you need to dereference the pointer to access the fields in the struct it points to, e.g. (*p).age

• p.age would be a type error, as pointers don't have fields• As convenient syntactic sugar, use p -> age• Often, a struct internally contains a pointer field to another

struct of the same type• This is necessary in all data structures that consist of

dynamically allocated nodes, since we need pointers to refer to nodes that don't themselves have names, which are in a fixed supply in our code

Opaque Data Types

• When defining a data structure that consists of several nodes that are dynamically added and removed, that data structure should always be designed to be opaque

• Users of the type never access the data directly, but only through functions coming with the data type, one function per possible allowed operation on that data type, getting a pointer to the data structure as parameter

• For example, a dynamic set might consist of operations "create set", "add element", "remove element", "search for element" and "release set"

• The users of the data type call these functions, but never, for example, explicitly create and release nodes of that data structure using malloc and free

Interlude: OOP

• An opaque data type is defined in a module whose header file the users of that type #include, and whose implementation is linked to the final executable

• This hides the implementation details, but inside the module, functions still operate on data from the outside

• In object-oriented programming, all functions are written inside the struct as methods, making that struct a class

• An instance of a class is called an object• In a sense, objects are live and control their own behaviour,

instead of being "dead data" operated on by outside functions

• Subtyping with inheritance to create subclasses• Method calls resolved polymorphically at runtime

Linked Lists

• Classic simple implementation of a dynamic set, a stack, a queue, or many other opaque data structures

• A linked list consists of a chain of nodes, each node storing one element (a key) and a next pointer to the next node

• Unlike arrays, lists are not a random access structure• Problem: how do we get to the first node, from which we can

then step to any other node?• A separate header node represents the list as a whole, and

contains a head pointer to the first actual node• If the list is currently empty, head == NULL• For the last node in the chain, next == NULL (a simple

linear list) or points back to the beginning (a cyclic list)

Example: sln.h

#ifndef SLN_H#define SLN_Hstruct sln_node {  struct sln_node* next;  int key;};typedef struct sln_node sln_node;struct sln_list {  struct sln_node* head;};typedef struct sln_list sln_list;/* Function prototypes */#endif

Recycling Unused Nodes

• Many data structures that consist of nodes that are  dynamically added and removed to it tend to be constantly allocating and releasing such nodes

• For greater efficiency, instead of actually releasing an unused node, recycle it in a separate free list

• Whenever you need a new node, instead of using malloc to create a node, take one from the free list if there are any

• Nodes don't go sour in the memory, so the node that you take from the free list is just as good as a fresh new one

• Can yield a noticeable speedup in some applications

Free List Operations

sln_node* sln_allocate_node() {  sln_node* n;  if(freelist == NULL) {    freelist = malloc(sizeof(sln_node));    freelist->next = NULL;  }  n = freelist; freelist = n->next; n->next = NULL;  return n;}

void sln_release_node(sln_node* n) {  n->next = freelist; freelist = n; }

Creating a New List

/* Create a new singly-linked list. We only need to allocate the header node and initialize its head field to NULL. */ 

sln_list* sln_create() {  sln_list* list = malloc(sizeof(sln_list));  list->head = NULL;  return list;}

Releasing the Entire List

/* Release the list and all its nodes. */

void sln_release(sln_list* list) {  sln_node* n = list->head;  sln_node* m;  while(n != NULL) {    m = n->next;    sln_release_node(n);    n = m;  }  free(list);}

Opaque Data Type Lifetime

• When we define an opaque data type, we define functions to create and release one instance of it

• The user code should never have to use malloc and free explicitly, but this logic is encapsulated in our functions

• However, the user code is still responsible for not using an instance that has been released

• In C, it is not possible to reliably determine if the instance has been released, since once a memory block has been released, anything can happen to its contents

• Those contents might get set to whatever values you interpret to mean "this block is still in use"

• For the same reason, you can't look at memory and determine if the instance has been properly created

Inserting a Key

/* Insert a new element to the list. In a linked list, inserting a node to the beginning of the list is especially easy. */

void sln_insert(sln_list* list, int key) {  sln_node* n = sln_allocate_node();  n->key = key;  n->next = list->head;  list->head = n;}

Searching for a Key

/* Check if the list contains the given key. */

int sln_contains(sln_list* list, int key) {  sln_node* n = list->head;   while(n != NULL && n->key != key) {    n = n->next;  }  return (n == NULL)? 0: 1;}

Removing a Key (1)

int sln_remove(sln_list* list, int key) {  sln_node* n;  sln_node* m;  n = list->head;  if(n == NULL) { return 0; }  if(n->key == key) {    list->head = n->next;    sln_release_node(n);    return 1;  } /* Continued on next slide */

Removing a Key (2)

  while(n->next != NULL && n->next->key != key) {      n = n->next;  }  if(n->next != NULL) {      m = n->next;      n->next = m->next;      sln_release_node(m);      return 1;  }  return 0;}

FIFO Queue with Flexible Arrays

• A linked list can be used to implement a last-in-first-out stack, or a first-in-first-out queue

• Push new element in one end, and pop the element in the same end (stack) or the other end (queue)

• Alternatively, elements can be stored in an array• To avoid having to resize the array all the time, we maintain

slack in the end of the array• Field capacity tells the true size of the underlying array,

whereas size tells how much of that capacity is in use• When pushing an element to an array that is full, we resize

by allocating a new array twice as big, so there is again good slack for some time

arrayqueue.h

#ifndef ARRAYQUEUE_H#define ARRAYQUEUE_Hstruct array_queue;typedef struct array_queue array_queue;array_queue* aq_create();void aq_release(array_queue*);int aq_size(array_queue*);int aq_pop_front(array_queue*);int aq_push_back(array_queue*, int); #endif

Storing a Queue in an Array

• A LIFO stack is easier to implement in a flexible array than a FIFO queue, since both push and pop occur in the end

• Need to maintain one index size to store the current end• In a FIFO queue, pop happens in the other size than push• If we push by adding the element to the end, pop must take

the oldest element from the front• Unlike in a stack, front does not remain at the beginning of

the array• A ring buffer consists of an array and two indices head and

tail, so that tail chases the head• Once either index goes past the end of the array, it comes

back to the beginning (hence the name)

The arrayqueue structure

const int INITIALCAPACITY = 2;

struct array_queue {  int* elements;  int size, capacity, head, tail;};

typedef struct array_queue array_queue;

Creating an arrayqueue

array_queue* aq_create() {  array_queue* aq;  aq = malloc(sizeof(array_queue));  aq->capacity = INITIALCAPACITY;  aq->head = aq->tail = aq->size = 0;  aq->elements = (int*) malloc(sizeof(int) * INITIALCAPACITY);  return aq;}

Releasing an arrayqueue

/* In writing aq_release, we have to be careful to release the array aq->elements before releasing the struct aq itself. Otherwise, if we release aq first, we can no longer access its field elements to get to the element array. Note how releasing memory blocks in a data structure typically happens in reverse order to how they were allocated. */

void aq_release(array_queue* aq) {  free(aq->elements);  free(aq);}

Expanding for Push

if(aq->size == aq->capacity) {    int* new_elements = (int*) malloc(sizeof(int) * 2 * aq->size);    for(i = 0; i < aq->size; i++) {      new_elements[i] = aq->elements[aq->tail];      aq->tail = (aq->tail + 1) % aq->capacity;    }    free(aq->elements);    aq->elements = new_elements;    aq->tail = 0; aq->head = aq->size;    aq->capacity = 2 * aq->capacity;  }

Pushing an Element

int aq_push_back(array_queue* aq, int key) {  int i;  expand_if_full(aq);  aq->elements[aq->head] = key;  aq->head = (aq->head + 1) % aq->capacity;  aq->size++;}

Popping an Element

int aq_pop_front(array_queue* aq) {  int key = aq->elements[aq->tail];  if(aq->size == 0) { printf("Error!\n"); exit(1); }  aq->tail = (aq->tail + 1) % aq->capacity;  aq->size--;  return key;}

 

"Object-oriented programming is an exceptionally bad idea which could only have originated in California."

Edsger W. Dijkstra