Topic 10 - Strings. CISC105 – Topic 10 Characters and Strings Thus far, we have not used the char...

60
Topic 10 - Strings
  • date post

    19-Dec-2015
  • Category

    Documents

  • view

    240
  • download

    0

Transcript of Topic 10 - Strings. CISC105 – Topic 10 Characters and Strings Thus far, we have not used the char...

Topic 10 - Strings

CISC105 – Topic 10

Characters and Strings Thus far, we have not used the char data type too much, as most applications deal with a group of characters, a data structure known as a string.

The C programming language implements the string as an array of characters (chars).

CISC105 – Topic 10

String Declarations A string, as it is simply an array of

characters, is declared as:char string1[20];

This declares a string named string1 which can hold strings from length 0 to length 19.

Notice that this fact means we “lose,” or cannot use one of the 20 characters declared. We will see why this is shortly.

CISC105 – Topic 10

String Initialization Strings can be initialized using a

string constant, in this form:char string1[20] = “Hello there”;

So, what does this string look like in memory?H e l l o t h e r

e \0 ? ? ? ? ? ? ? ?

CISC105 – Topic 10

Strings in Memory

As can be seen, the first characters (array elements) are the initialization string. Then comes a \0 character. This is the null character. This is a special character that means “end of string”. The array elements that follow the null character are not initialized, therefore their contents are indeterminate and unknown.

H e l l o t h e r

e \0 ? ? ? ? ? ? ? ?

CISC105 – Topic 10

Strings in Memory All strings must end with a null

character. When a string is enclosed in double quotes, i.e. “Hello there”, a null character is automatically inserted.

The double quotes mean “the string is whatever is inside the quotes followed by a null character.”

CISC105 – Topic 10

Strings in Memory The requirement of this null

character to indicate the end of the string shows why a string declaration of char string1[20];can only hold strings of length 0 to 19. The one “lost” character of the twenty allocated for string1 is needed for the null character.

CISC105 – Topic 10

String Input/Output The standard C input and output

functions, printf and scanf can directly work with strings using the placeholder “%s”.

For example, what is the output of the following program fragment?

char string1[20] = “Hello there.”;

printf(“%s”,string1);

Hello there.

CISC105 – Topic 10

String Input/Output The printf function, as well as other

functions that work with strings, depend on the strings they receive as arguments to be terminated with the null character.

Therefore, it is important that whenever a programmer-written function constructs an array, it inserts the null character at the end of the array.

CISC105 – Topic 10

String Input/Output We can also use the “%s” placeholder

with the scanf statement. For instance, to read a string from the keyboard and place it in string1:

char string1[20];

scanf(“%s”,string1);

CISC105 – Topic 10

String Input/Output

Notice that we do not use the “&” operator on string1.

Remember that the name of an array is the memory address of its first element. Thus, since string1 is a memory address, it does not need the “address of” operator.

char string1[20];

scanf(“%s”,string1);

CISC105 – Topic 10

String Input/Output Using the scanf function with the “%s”

placeholder stores the user input string in the character array (string) corresponding to the placeholder.

This input function does NOT check to make sure the string it read from the keyboard is short enough to fit in the specified character array.

Thus, it is possible to overrun the character array using a scanf statement. Thus, make sure the array that you are scanfing into is large enough to handle all possible inputs.

CISC105 – Topic 10

String Input/Output Notice that as the scanf function

treats a space as the end of a data input, we cannot input strings with spaces in them using the “%s” placeholder.

So…we need a way to read in a line of text into a character array (string). This function is the gets function.

CISC105 – Topic 10

String Input/Output The function gets works as follows:

gets(char []); Here, we simply put the name of the

character array we wish to store the user input string in as the only parameter passed to gets.

This function performs like scanf in that it waits for the user to input some text and press <enter>. Once the user presses <enter>, the entire line is stored in the specified character array.

CISC105 – Topic 10

String Input/Output

Here, if we were to type in “Hello there” string1 would be set to “Hello there”.

char string1[20];char string2[20];

gets(string1);

CISC105 – Topic 10

String Operations and Assignment Previously, we have seen the

assignment operator “=” used to store data in a variable.

We can use the “=” operator during string initialization (setting an initial value when the string is declared).

This is the ONLY time we can use the assignment operator to set a string.

CISC105 – Topic 10

String Operationsand Assignment We have seen before that the name of an

array is the memory address of the first element of that array (in this case, the first character in the string).

Thus, any statement such as:char string1[20];string1 = “Hello”;would attempt to set the address of the first element of the array, which cannot be changed by using such an assignment statement.

CISC105 – Topic 10

String Operationsand Assignment C provides a number of routines to

manipulate strings, through library functions.

All of the string manipulation functions are defined in string.h.

Each of the string manipulation functions takes, and returns, variables of type char *. Keep in mind that a string is represented by the memory address of its first element (as is always true with arrays).

CISC105 – Topic 10

String Operationsand Assignments The first, and most basic, string

manipulation library function is the string-copy function, strcpy().

This function takes two parameters, the destination string and the source string.

Its prototype looks like:strcpy(char *dest, const char *source);

CISC105 – Topic 10

The strcpy Function This function makes a copy of the source

string in the destination string. This is a simple example of the strcpy()

function:char string1[20];

strcpy(string1,”Hello there”); This code fragment sets string1 to “Hello

there” (with the last ‘e’ followed by a null character, as indicated with the double quotes).

CISC105 – Topic 10

The strcpy Function

? ? ? ? ? ? ? ? ? ?

? ? ? ? ? ? ? ? ? ?

H e l l o t h e r

e \0 ? ? ? ? ? ? ? ?

char string1[20];

strcpy(string1,”Hello there”);

string1

string1

CISC105 – Topic 10

The strcpy Function It is important to note that the strcpy

function does not perform any bounds checking.

Therefore, just as like using the “%s” placeholder with scanf, we can overflow the destination array.

It is important to ensure the destination array is large enough to hold the string that the strcpy function will place in it.

CISC105 – Topic 10

The strcpy Function

? ? ? ? ? ? ? ? ? ?

? ? ? ? ? ? ? ? ? ?

T h i s i s a

r e a l l y l o n

char string1[20];

strcpy(string1,”This is a really long string”);

string1

string1

CISC105 – Topic 10

The strcpy Function When such an overflow occurs, the same

problems that were discussed for array overstepping can occur.

Most of the time, the array will overflow into unallocated memory.

However, if the memory cells immediately following the array are allocated and then the strcpy function oversteps the array bounds, it can overwrite the variables which are allocated to these memory cells.

CISC105 – Topic 10

The strncpy Function This problem can be resolved using the strncpy function.

This function is a “counted” version of the strcpy function. It takes three arguments, the destination string, the source destination, and then the number of characters to copy.

Its prototype looks like:strncpy( char *dest,

const char *source,size_t size);

size_t is an unsignedinteger. Thus, it cannot

be negative

CISC105 – Topic 10

The strncpy Function Thus, in order to prevent copying past

the bounds of the array in the previous example, we could write:char string1[20];

strncpy(string1, ”This is a really long string”, 20);

This would only copy the first twenty characters (into index 0 to 19) and thus would not overwrite any memory past that allocated for the array.

CISC105 – Topic 10

The strncpy Function

Notice that even though we no longer overwrite memory past that allocated for the string, the string is still not valid, as it does not end with a null character.

T h i s i s a

r e a l l y l o n

string1

CISC105 – Topic 10

The strncpy Function

Thus, we need to set the last allocated string character to the null character manually. This can be done using a simple character assignment statement.

T h i s i s a

r e a l l y l o \0

string1

char string1[20];

strncpy(string1, ”This is a really long string”, 20);

string1[19] = ‘\0’;

CISC105 – Topic 10

Substrings

Keep in mind that string evaluates to the address of the first character, or string = &string[0].

Using this, we can see how to use a substring. For example, to reference the substring “there”, we could simply use &string[6] instead of string.

H e l l o t h e r

e \0 ? ? ? ? ? ? ? ?

string

CISC105 – Topic 10

Substrings

So, what would be the output of the following code fragment, assuming string as shown above?printf(“%s\n”,string);printf(“%s\n”,&string[0]);printf(“%s\n”,&string[6]);printf(“%s\n”,&string[2]);

H e l l o t h e r

e \0 ? ? ? ? ? ? ? ?

string

Hello thereHello theretherello there

CISC105 – Topic 10

Substrings

So, we can combine this and the strncpy function to extract a substring.

Therefore, to extract “there” from string and put it in string2, we could write:

strncpy(string2,&string[6],5);string2[5] = ‘\0’;

Note that we need to manually insert the null character into string2, as the data strncpy copies into string2 does not end with one.

H e l l o t h e r

e J o h n \0 ? ? ?

string

CISC105 – Topic 10

String Lengths Notice that the size of the

character array is an upper limit on the size of the string it can contain.

Thus, we need some way to find the size of the string contained in a character array (the actual number of characters present before the null character).

CISC105 – Topic 10

String Lengths This can be determined using the strlen function. This function takes one parameter, a char * (or string name) and returns one value, the length in characters of the string, a size_t value, which is an unsigned integer.

CISC105 – Topic 10

String Lengths What would be the output of the

following program fragment?H e l l o t h e r

e J o h n \0 ? ? ?

string

int string_size;string_size = strlen(string);printf(“Size = %d\n”,string_size);

Size = 16

CISC105 – Topic 10

String Lengths What would be the output of the

following program fragment?char string[40];int string_size;strcpy(string, ”I think this is a nice string.”);string_size = strlen(string);printf(“Size = %d\n”,string_size);printf(“%s\n”,&string[18]);

Size = 30nice string.

CISC105 – Topic 10

String Lengths Note that the strlen function determines

the number of characters between the memory address provided (normally the string name, thus the address of the first character) and the first-encountered null character.

If a string does not end in a null character, the function will not work correctly.

CISC105 – Topic 10

String Lengths Additionally, if a string is overstepped (a

longer string is stored than characters allocated), the string length, as reported by strlen, could be greater than the size of the character array declared to hold the string.

Note that this is NOT valid. Strings should not overflow their array size; such an overflow is not correct programming and could result in incorrect program operation.

CISC105 – Topic 10

String Concatenation Sometimes, a program may need to

append one string onto another string. This is known as concatenation, or

concatenating the arrays. This is accomplished using the C library

functions strcat and strncat. The strncat function is simply a

“counted” version of strcat, just as strncpy is a “counted” version of strcpy.

CISC105 – Topic 10

String Concatenation This function, strcat, takes two parameters,

the destination string and the source string. This function takes the second string and

appends it to the first string, overwriting the null character at the end of the first string with the first character of the second string.

The last character copied onto the end of the first string is the null character of the second string.

CISC105 – Topic 10

String ConcatenationH e l l o \0

J i m \0

string1

string2

char string1[10] = “Hello”;char string2[10] = “Jim”;

strcat(string1,string2);

J i m \0

CISC105 – Topic 10

String Concatenation Bear in mind that the strcat (and strncat) functions assume ample space in the destination array to hold what is copied into it.

No bounds checking is present…so make sure the first array (the one that is being copied to) is big enough to hold the contents of both strings!

CISC105 – Topic 10

String Comparison Frequently, we may wish to compare

strings. For integers, floating-point numbers, and

other such data types, we can use simple comparison operators such as:int x=9, y=27;

if (x < y) { … }

if (y < x) { … }

if (y == x) { … }

CISC105 – Topic 10

String Comparison If we were to do this with strings…char string1[20] = “Hello.”;char string2[20] = “A dog.”;if (string1 < string2) { … }

This is a valid comparison; however it does not compare the strings “Hello.” and “A dog.”

As the name of an array is the memory address of its first element, this comparison statement tests to see if the memory address string1 begins at is less than the memory address string2 begins at.

CISC105 – Topic 10

String Comparison Thus, C provides the strcmp function to

compare strings. This function takes two strings and

returns an integer value: A negative integer if str1 is less than str2. Zero if str1 equals str2. A positive integer if str1 is greater than str2.

So, when is a string less than, equal to, or greater than another string?

CISC105 – Topic 10

String Comparison The simplest case is that of

equality. If the two strings are the same length (same number of characters) and each character of one string is equal to the corresponding character in the other string, the strings are considered equal.

CISC105 – Topic 10

String Comparison There are two rules for less

than/greater than and strings: If the first n characters of str1 and str2

match and str1[n] and str2[n] are the first non-matching characters, str1 is less than str2 if str1[n] < str2[n].

If str1 is shorter in length than str2 and all of the characters of str1 match the corresponding characters of str2, str1 is less than str2.

CISC105 – Topic 10

String Comparison So what would be the result of the

following comparison function calls? strcmp(“hello”,“jackson”); strcmp(“red balloon”,“red car”); strcmp(“blue”,“blue”); strcmp(“aardvark”,“aardvarks”);

CISC105 – Topic 10

String Comparison Notice that the strcmp function

uses character comparisons to compare strings; the elements of the first string (characters) are compared with elements of the second string.

As such, the rules for comparing char datatypes come into play.

CISC105 – Topic 10

String Comparison Recalling, every character (symbol) is

actually a number, as defined by the ASCII code.

As such, these numbers are what is actually compared when characters are being compared (which occurs when strings are being compared).

This is fairly transparent as “A” is less than “B” which is less than “C”, etc…

However, it is important to realize that “Z” is less than “a”.

CISC105 – Topic 10

String Comparison The uppercase letters, in the ASCII

code, have lower codes than the lowercase letters.

Thus, if we were to call:strcmp(“Zebra”,“aardvark”);

This function would return a negative value as ‘Z’ is less then ‘a’, thus “Zebra” is less than “aardvark”.

CISC105 – Topic 10

String Comparison Thus, it is important to keep in

mind that in the ASCII code, which governs the values of characters and thus their comparisons, uppercase letters have smaller values than lowercase letters.

CISC105 – Topic 10

String Comparison So what would be the result of the

following comparison function calls? strcmp(“hello”,“Jackson”); strcmp(“red Balloon”,“red car”); strcmp(“blue”,“blUe”); strcmp(“AARDVARK”,“AARDVARKs”);

CISC105 – Topic 10

String Input/Output Functions We have already seen the printf

and scanf functions as the primary method of input and output in the C language.

We can now introduce sprintf and sscanf which use a string instead of the principle I/O devices (the keyboard and the display screen).

CISC105 – Topic 10

String Input/Output Functions We can use the sprintf function to

create a string. This function works the same as printf except that it takes one additional parameter, the string to store the result in.

The format of sprintf is:sprintf(string,format string,

print list); Notice that the format is the same as for printf with the addition of the string.

CISC105 – Topic 10

String Input/Output Functions So, the sprintf function works the

exact same as the printf function except that instead of displaying the resulting text, it stores it in the specified string.

Note that this function, like all others that operate on strings, does not do bounds checking.

CISC105 – Topic 10

String Input/Output Functions This program fragment

int month = 12;

int day = 25;

int year = 2000;

char string1[20];

sprintf(string1, “%d/%d/%d”,

month,day,year);

would store the string “12/25/2000” in the character array string1.

CISC105 – Topic 10

String Input/Output Functions What would be the output of the

following code fragment?

char string1[20] = “Today is ”;char string2[20] = “How about that?”;char string3[40];int day = 27;

sprintf(string3,”Hello Jeff. %s Monday, the %d th of\nNovember. %s”, string1, day, string2);printf(“%s”,string3);

Hello Jeff. Today is Monday, the 27 th ofNovember. How about that?

CISC105 – Topic 10

String Input/Output Functions Just as sprintf is simply a version of printf that outputs to a string and not the display, sscanf is a version of scanf that takes input from a string and not the keyboard.

The format of sscanf is:sscanf(string,format string,

scan list); Notice that the format is the same as

for scanf with the addition of the string.

CISC105 – Topic 10

String Input/Output Functions This program fragment

char string1[20] = “12/25/2000”;

int month;

int day;

int year;

sscanf(string1, “%d/%d/%d”,

&month,&day,&year);

would store the value 12 in month, the value 25 in day, and the value 2000 in year.

CISC105 – Topic 10

String Input/Output Functions What would be the output of the

following code fragment?char string1[30] = “Hello 24 is 3.14159”;char word1[10], word2[10];int number1;float number2;

sscanf(string1,”%s %d %s %f”, word1, &number1, word2, &number2);

printf(“Word 1 = %s\nWord 2 = %s\n number1 = %d\nnumber2= %f\n”, word1,word2,number1,number2);

Word 1 = HelloWord 2 = isnumber1 = 24number2 = 3.14159