(5) cpp dynamic memory_arrays_and_c-strings

29
1 Nico Ludwig (@ersatzteilchen) (5) Basics of the C++ Programming Language

Transcript of (5) cpp dynamic memory_arrays_and_c-strings

Page 1: (5) cpp dynamic memory_arrays_and_c-strings

1

Nico Ludwig (@ersatzteilchen)

(5) Basics of the C++ Programming Language

Page 2: (5) cpp dynamic memory_arrays_and_c-strings

2

TOC

● (5) Basics of the C++ Programming Language

– The Heap: Dynamic Memory and dynamic Array Allocation

– Automatic versus Dynamic Arrays

– A Glimpse of the Topic "Stack versus Heap"

● "Geometric" Properties of the Heap and the Stack

– Lost Pointers and Memory Leaks

– Advanced C-strings: Buffers, Concatenation and Formatting

● Sources:

– Bruce Eckel, Thinking in C++ Vol I

– Bjarne Stroustrup, The C++ Programming Language

Page 3: (5) cpp dynamic memory_arrays_and_c-strings

3

Automatic Arrays have a Compile Time fixed Size

● The size of automatic arrays can't be set at run time:

int count = 0;std::cout<<"How many numbers do you want to enter?"<<std::endl;std::cin>>count;if (0 < count) {

int numbers[count]; // Invalid (in C++)! The symbol count must be a compile-time constant!for (int i = 0; i < count; ++i) {

std::cout<<"Enter number: "<<(i + 1)<<std::endl;std::cin>>numbers[i];

}}

Page 4: (5) cpp dynamic memory_arrays_and_c-strings

4

The Correct Way of creating dynamic Arrays

● In C/C++, dynamic memory allocation on the heap is needed in such a case.

– In C, we access the heap with the functions std::malloc() and std::free() in <cstdlib>.

– The process of requesting memory manually is called "allocation of memory".

– std::malloc() creates a block of specified size in the heap memory and returns a void* to this block to the caller.

● For array creation the []-declarator is not used with std::malloc()!

● (Dynamically created arrays can have the length 0!)

– The caller needs to check the returned void* for validity (Did std::malloc() succeed?).

● The returned void* must not be 0!

– The caller needs to cast this void* to the correct pointer-type.

– Then the dynamically created array can be used like an "ordinary" array.

– When the work is done, the array must be manually freed somewhere in the code.

● In this course we make a distinction between heapand free store. This distinction is not defined bythe C++ standard. Instead it is used colloquially totell the memory managed bystd::malloc()/std::free() from that managed bynew/delete, because they could work in differentmemory locations each.

Page 5: (5) cpp dynamic memory_arrays_and_c-strings

5

The Correct Way of creating dynamic Arrays in Code

● Let's review and correct the example of dynamic array creation:

int count = 0;std::cout<<"How many numbers do you want to enter?"<<std::endl;std::cin>>count;if (0 < count) { // Create a properly sized block in heap. The function std::malloc() returns a generic

// pointer (void*) and we have to cast this generic pointer to the type we need.int* numbers = static_cast<int*>(std::malloc(sizeof(int) * count));if (numbers) { // Check, whether std::malloc() was successful.

for (int i = 0; i < count; ++i) { // Loop over the dynamically created array: std::cout<<"Enter number: "<<(i + 1)<<std::endl;// Use the block like an ordinary array, e.g. with the []-operator:std::cin>>numbers[i];

}std::free(numbers); // When done with the array, it must be freed!

}}

● Consequently check the success of std::malloc()!- This is called defensive programming.

● Why do we need to cast here?● This is the first time we really need to cast. -

Here we need to cast a pointer to memory ofraw type to a pointer to memory of the type weneed.

● The void* represents a pointer to memory ofunknown type; casting is required to get a typewith which we can work. A void* is irrelevant byitself, we can't even dereference it. The onlything we can do with it is comparing it with otherpointers or 0.

● Interestingly the conversion from void* toanother pointer type is seen as conversionbetween related types, so a static_cast issufficient.

Page 6: (5) cpp dynamic memory_arrays_and_c-strings

6

Example: Why dynamic Memory is needed: Returning Arrays

● Automatic arrays can't be returned from functions:

● In C/C++, dynamic memory allocation on the heap is needed in such a case.

– Again, the C way is to use the functions std::malloc() and std::free().

– 1. In GetValues() the dynamic array will be created and returned.

– 2. The caller can then use the returned array.

– 3. Then the caller has to free the dynamically created array!

int* GetValues() { // Defining a function that returns a pointer to aint values[] = {1, 2, 3}; // locally defined array (created on the stack) .return values; // This pointer points to the 1st item of values.

}//-------------------------------------------------------------------------------------------------------------------int* vals = GetValues(); // Semantically wrong! vals points to astd::cout<<"2. val is: "<<vals[1]<<std::endl; // discarded memory location.// The array "values" is gone away, vals points to its scraps, probably rubbish!

Page 7: (5) cpp dynamic memory_arrays_and_c-strings

7

The Correct Way of returning Arrays

● Let's review and correct the example of returning an array:

int* GetValues() { // Allocate an array of three ints:int* values = static_cast<int*>(std::malloc(sizeof(int) * 3));if (values) { // Check std::malloc()'s success and fill the array. Indeed the

values[0] = 1; // allocation of memory and assigning of array values must bevalues[1] = 2; // separated, when std::malloc() is used (there is no Resourcevalues[2] = 3; // Acquisition is Initialization (RAII)).

} // If std::malloc() failed, let's just "forward" the 0, the caller needs to check for 0!return values; // Return the pointer to the heap block (i.e. to the array).

} //----------------------------------------------------------------------------------------------------------------------int* vals = GetValues();if (vals) { // We have to check for nullity again, GetValues() could have failed!

std::cout<<"2nd value is: "<<vals[1]<<std::endl; // Use vals as array!// >2nd value is 2std::free(vals); // The caller (!) needs to free vals!

}

● Once again: Consequently check the success ofstd::malloc()! - Program defensively!

● RAII means that a resource, which requires e.g.dynamic memory or other operating systemresources, will be initialized and freed analogousto the lifetime of a variable. - Practically it meansthat a resource from dynamic memory can becontrolled by a variable on the stack! - This can beimplemented with user defined types.

Page 8: (5) cpp dynamic memory_arrays_and_c-strings

8

Heap Functions' Signatures in Detail

void* malloc(size_t size); // memory allocate in <cstdlib>// size – The size of a portion of the heap in bytes. size_t is a typedef for a unit representing a// count of bytes on a machine, typically it is of type unsigned int.// returns – A generic pointer to the allocated portion, or 0 in case of error (e.g. out of memory).// The pointer is generic because std::malloc() can't know the type of the allocated contents, it// just knows the size that the caller passed and it returns the location of that block in case// of success. The pointer must be cast to the type, the caller awaits. - We select the color of// the contact lenses to view the block.

void free(void* ptr); // in <cstdlib>// ptr – The pointer to a block of content, allocated with std::malloc(), std::calloc() or// std::realloc(). The attempt to free a pointer to a static variable is undefined. Calling std::free()// with 0 is allowed, it just has no effect.

● Can we assume that std::malloc() returns apointer to a gap-free portion in the heap?• As far as our current knowledge is concerned:

yes! It is required, because std::malloc() can beused to create arrays, and arrays need torepresent contiguous blocks of memory.

Page 9: (5) cpp dynamic memory_arrays_and_c-strings

9

Wrap up: automatic and dynamic Arrays in Code

● Creation of automatic arrays:

● Creation and freeing of dynamic arrays:

int* dynamicArray = static_cast<int*>(std::malloc(sizeof(int) * 100));// - Indeed the syntax looks weird, not even similar to the autoArray example.// - The type of the variable we assign to is int-pointer.// - The function std::malloc() is used to create a raw memory-block in heap.// - std::malloc() returns a generic pointer (void*) to this raw memory-block, it does so, because// it doesn't know, what the programmer wants to do.// - So, as programmers we need to tell C/C++ that we want to use the allocated memory// block as int-array, therefor we need to cast the generic pointer (void*) to int-pointer . - We// "put contact lenses on".std::free(dynamicArray); // Free the dynamically created array in the right place.

int autoArray[100];// This is a simple automatic array with a compile time constant size of 100.

● Whether objects are stored on the heap or thestack is an implementation detail in many otherlanguages, not so in C++, because the programerexplicitly decides, where an object will be created.(Annotated C# Language Reference, AndersHejlsberg et. al.)

Page 10: (5) cpp dynamic memory_arrays_and_c-strings

10

But, how does returning of "normal" Variables compute?

● When a local variable is returned from a function, it will be simply copied.

● Arrays can't be returned by value, so they can't be copied!

– Here the story is completely different, we have to use the heap generally!

int GetValue() { // Just returns an int:int value = 42; // value is an automatic variable on the stack.return value; // Returns value. value will be popped from GetValue()'s stack.// Then the content of value will be pushed to the stack of the caller function. // In effect value will be copied to its caller when GetValue() returns.

}//------------------------------------------------------------------------------------------------------------------------void Foo() { // Calls GetValue():

int val = GetValue(); // The returned int was pushed on Foo()'s stack by GetValue() and // will be copied into the variable val.

}

● Arrays can also be generated as static arraysinstead of automatic arrays. A static array can bereturned from a function. The lifetime of a static array is not restricted to a function's local scope.

Page 11: (5) cpp dynamic memory_arrays_and_c-strings

11

Stack vs. Heap: It's not a Mystery, just two Concepts

● The stack is a conceptual place, where local (auto) variables reside.

– This is a little oversimplification, but each function has its own stack.

– The lifetime of a stack variable is visible by its scope (i.e. automatic: auto).

– The stack is controlled by hardware and owned by hardware.

● The heap is a conceptual place, where all dynamic contents reside.

– All functions of a program generally use the same heap.

– Dynamic content must be created by the programmer manually.

– The heap is controlled by software, the heap manager (std::malloc(), std::free() etc.).

– There is always an "entity" that is in charge for the allocated heap memory.

– This "entity" is responsible for explicit freeing the allocated heap memory.

– In the end, the lifetime of a dynamic content is controlled by the entity's programmer.

● We'd try to control as less memory as possible manually: using the stack is preferred!

● What is a scope?● See RAII!

Page 12: (5) cpp dynamic memory_arrays_and_c-strings

12

Stack vs. Heap: In Memory (RAM)

● There is the illusion that all the machine's memory is owned by a program.

– This is not true, each program uses its own portion of memory respectively.

– But in the following graphics we stick to this illusion.

● The memory is segmented, different segments have different functions.

● Esp. the stack and heap segment often have special "geometric" properties:

– The heap segment resides at lower addresses than the stack segment.

– The addresses of subsequent stack variables are decreasing.

● This is called "descending stack".

– The stack evolves/grows to lower, the heap to greater addresses.

● In fact, stack and heap grow to meet each other halfway!

– Compared to the stack, the heap is very big, because dynamic contents are typically bigger than automatic contents (e.g. localvariables).

Page 13: (5) cpp dynamic memory_arrays_and_c-strings

13

Stack vs. Heap: Conventional Locations in Memory

0

232 - 1

Stack segment

Heap segment

0

232 - 1

● Why 232-1?● Well 232 is 4294967296 (ca. 4GB), but we have

to subtract 1 in order to get space for theaddress 0!

Page 14: (5) cpp dynamic memory_arrays_and_c-strings

14

The lost Pointer to the Heap Memory in Code

void F(int count) {// Allocating an array of three ints. F() is in charge of the dynamic content, to which// p points!int* p = static_cast<int*>(std::malloc(sizeof(int) * count));// The auto variable p will go out of scope and will be popped from stack. But the// referred dynamic content is still around!

}//-------------------------------------------------------------------------------------------------------------------// Calling F():F(3); // Oops, nobody did free the dynamic content, to which p was pointing to! Now there is// no pointer to the dynamic content in avail. This is a semantic error, a memory leak of// sizeof(int) * 3. The compiler will not see any problem here!

Page 15: (5) cpp dynamic memory_arrays_and_c-strings

15

The lost Pointer to the Heap Memory in Memory

Stack segment

Heap segment

0

232 - 1

? 1 2

4B

0

0xc0005968 p

4B

void F(int count) { // Allocating an array of three ints. int* p = static_cast<int*>(std::malloc(sizeof(int) * count));if (p) { // Check std::malloc()'s success.

for (int i = 0; i < count; ++i) {p[i] = i;

}}

}

// Calling F():F(3);// After F() did run: oops! The pointer to the allocated three// ints is lost, the allocated memory is orphaned. We have a// memory leak of 12B.

:-(

Page 16: (5) cpp dynamic memory_arrays_and_c-strings

16

How to handle dynamic Content responsibly in Code

int* G(int count) {// Allocating an array of three ints. G() is in charge of the dynamic content, to which// p points to!int* p = static_cast<int*>(std::malloc(sizeof(int) * count));// Returning p. Then G()'s caller is in charge of the dynamic content!return p; // The stack variable p will go out of scope and will be popped from the

// stack. But the referred dynamic content is still around!}//-------------------------------------------------------------------------------------------------------------------// Calling G():int* t = G(3);if (t) { // Fine! The returned pointer will be checked and freed correctly.

std::free(t); }

Page 17: (5) cpp dynamic memory_arrays_and_c-strings

17

Handling dynamic Content responsibly in Memory

Stack segment

Heap segment

0

232 - 1

1 2

4B

0

0xc0005968 p

4B

int* G(int count) { // Allocating an array of three ints. int* p = static_cast<int*>(std::malloc(sizeof(int) * count));if (p) { // Check std::malloc()'s success.

for (int i = 0; i < count; ++i) {p[i] = i;

}}return p; // This time: return p!

}

int* t = G(3); // Call G() and receive the pointer.if (t) { // Check and free the content (i.e. the

std::free(t); // memory from the heap). }// The local variable t is still on the stack.

t

1 20

Page 18: (5) cpp dynamic memory_arrays_and_c-strings

18

Potential Problems with Heap Memory

● It is needed to check, whether allocation was successful!

● It is needed to free dynamic content in the right place manually.

– We have to keep in mind that there is no garbage collection in C/C++.

– So, we should not forget to free dynamically created content!

– We should free dynamically created content as early as possible, but not too early!

– We should not free dynamically created content more than once!

– We should not free dynamically created content that we don't own.

● It's impossible to distinguish pointers to the stack from pointers to the heap.

– Don't free pointers to the stack (i.e. pointers not from the heap)! -> It will result in undefined behavior!

● Wherever function interfaces deal with dynamically content, it should be documented where this memory must be freed.Who's the owner? Who's in charge?

Page 19: (5) cpp dynamic memory_arrays_and_c-strings

19

More Information about Heap Memory

● There exist two further useful functions to deal with heap memory (<cstdlib>):

– std::realloc() resizes a given block of heap memory.

– std::calloc() allocates a block of size * count and initiates all "items" with 0.

– The returned value must be checked for 0-pointer and freed with std::free().

● Free store in C++:

– In C++ the heap segment can also be used as free store.

– The operators new and delete act as interface to the free store.

– These operators represent C++' way to dynamically allocate/deallocate user defined types.

– In general, C's heap memory and C++' free store are incompatible.

● Often 3rd party libraries invent own allocation/deallocation mechanisms:

– to deal with platform specialities,

– and/or to encapsulate usage of dynamic contents.

● std::realloc():• Present items will be preserved up to the

passed length.• Possibly a pointer to another location in the

heap is returned, rendering the passed addressinvalid.

• The returned pointer should be checked for 0,before it is assigned to the passed pointervariable. Don't do it like so: p = std::realloc(p,5000)! If reallocation didn't succeed the memoryto which the passed pointer p refers won't betouched.

Page 20: (5) cpp dynamic memory_arrays_and_c-strings

20

Putting Heap Memory to work with C-strings

● As c-strings are char arrays underneath, they share the limits of other arrays:

– Limitation 1: We can not resize/extend or assign c-strings.

– Limitation 2: We can not return an automatic c-string variable from a function.

● These limitations can be solved by usage of the heap memory:

– Pattern for 1: Create a dynamically sized char-array to hold a modified/resized copy of the original c-string. E.g.: Replace asubstring of the original c-string.

– Pattern for 2: Copy a c-string into a dynamically sized char-array and return the pointer from a function.

● Sidebar: Peculiarities of c-strings, not directly shared with other arrays:

– C-strings are 0-terminated, so the very last char-array item contains a 0.

– We can get a c-string's length (std::strlen()), this is impossible with other arrays.

– As c-strings are const char-arrays, we can't modify them.

– Indeed we can return a c-string literal from a function!

● Why is it possible to return a literal c-string from afunction?● C-string literals are stored in the static portion of

memory (they are an example of a static arrays,which were mentioned earlier)! We'll discussstatic memory in a future lecture.

Page 21: (5) cpp dynamic memory_arrays_and_c-strings

21

The Roles of const char[] and char[]

● When we need to create an array that must be filled afterwards, we can not use arrays with const items. Instead we needarrays as modifiable buffers.

– This is needed, cause we need to assign to the items, in order to modify the content!

● So, c-strings are of type const char[], their matching buffer type is char[].

– The buffers we allocate dynamically for char-based c-strings are always of type char[].

● Where c-string functions accept const char*'s as parameters, we can safely pass char[] buffers; they will be decayed to constchar*.

● To sum up (char-based c-strings):

– C-strings are of type const char[], they're not modifiable

– C-string buffers are of type char[], they're modifiable.

Page 22: (5) cpp dynamic memory_arrays_and_c-strings

22

Working with C-strings: Functions for individual Chars

● A rich set of functions dealing with individual chars can be found in <cctype>.

● Character predicates (these functions await an int/char and return an int/bool):

– std::islower(), std::isupper(), std::isalpha(), std::isdigit(), std::isspace() etc.

● Character casing (these functions await an int/char and return an int/char):

– std::tolower() and std::toupper(), their result must be cast to char for presentation.

char ch = '1';if (std::isdigit(ch)) {

std::cout<<ch<<" is a digit!"<<std::endl;// >1 is a digit

}

char ch = 'X';// If ch is no upper case letter, the same char will be returned.std::cout<<ch<<" as lower case: "<<static_cast<char>(std::tolower(ch))<<std::endl;// >X as lower case: x

● If we pass a non-letter char tostd::tolower()/std::toupper() the passed char willjust be returned.

Page 23: (5) cpp dynamic memory_arrays_and_c-strings

23

Working with C-strings: Parsing C-strings

● There exists a set of functions to parse c-strings to fundamental types in <cstdlib>.

– Parsing means reading a c-string and reinterpret its content e.g. as fundamental type.

– Most important: std::atoi() (c-string to int) and std::atof() (c-string to double).

– These functions return 0 in case of error.

// A c-string that can be interpreted as int:const char anInt[] = "42";// Parse the int from anInt's content:int theInt = std::atoi(anInt);

// A c-string that can be interpreted as double:const char aDouble[] = "1.786";// Parse the double from aDouble's double:double theDouble = std::atof(aDouble);

Page 24: (5) cpp dynamic memory_arrays_and_c-strings

24

Working with C-strings: Modify the Case of a C-string

● Putting it all together, here a first example of c-string modification:

const char* oldText = "home"; // The original plain c-string. // Create a buffer in the heap, large enough to hold the original c-string. The buffer needs a size of:// sizeof(char) * (count of chars/letters + one byte for the 0-termination).char* newText = static_cast<char*>(std::malloc(sizeof(char) * (std::strlen(oldText) + 1)));if (newText) { // Check std::malloc()'s success.

// Loop over the original c-string and store the upper case variant of each char and the// 0-termination into newText at the same index.for (int i = 0; i < std::strlen(oldText) + 1; ++i) {

newText[i] = std::toupper(oldText[i]);}std::cout<<"The modified text: "<<newText<<std::endl;// >The modified text: HOME"std::free(newText); // Free the buffer.

}

Page 25: (5) cpp dynamic memory_arrays_and_c-strings

25

C-strings: Formatting and Concatenation of C-strings

● With the function std::sprintf() in <cstdlib> we can create and format c-strings.

int sprintf(char* buffer, const char* format, ...); // in <cstdlib>// buffer – A char buffer that will contain the effectively created c-string content. The content// of the buffer is the virtual result of this function; buffer is rather an output parameter,// than an input parameter!// format – A c-string that contains a format that describes the disguise of the c-string// to create. This "format-string" defines a template of the c-string to create, with// placeholders of values to be replaced.// ... – An arbitrary set of further arguments that are used to "satisfy" and replace the// placeholders in the "format-string". (The ...-notation is called "ellipsis" in C++.)// returns – The count of resulting chars in buffer (w/o the 0-termination).

Page 26: (5) cpp dynamic memory_arrays_and_c-strings

26

C-strings: "Format-strings" and Examples

● There exist many type field characters acting as format placeholders.

– But %d and %s (maybe also %p for pointers) are the most important ones:

– There are different ways to control the format in a more detailed fashion:

● Additional flags control alignment and padding, also the width and precision can be controlled.

– If the format-string and the arguments don't match the behavior is undefined.

char buffer[1000]; // buffer is an automatic fixed buffer of large extend.// The placeholder %d awaits an int to be replaced:std::sprintf(buffer, "Answer: %d items", 42);std::cout<<buffer<<std::endl; // Will print "Answer: 42 items".// The placeholder %s awaits another c-string to be replaced:std::sprintf(buffer, "It's %s's problem!", "Rick");std::cout<<buffer<<std::endl; // Will print "It's Ricks's problem!".// Now concatenation of e.g. three c-strings can be accomplished:std::sprintf(buffer, "%s%s%s", "Weyland Yutani", " at ", "LV-426");std::cout<<buffer<<std::endl; // Will print "Weyland Yutani at LV-426".

● std::sprintf():● Superfluous arguments will be ignored.● Too few arguments result in undefined behavior.

Page 27: (5) cpp dynamic memory_arrays_and_c-strings

27

C-strings: Formatting C-strings the Dynamic Way

● In most scenarios we don't know, how large the buffer must be to hold the result.

– We can define a very large (maybe auto) buffer, but this is neither safe nor efficient.

– Preferred: we can calculate the buffer length and create it dynamically!

● Let's learn how this works.

Page 28: (5) cpp dynamic memory_arrays_and_c-strings

28

C-strings: Formatting C-strings the Dynamic Way in Code

● This is a good way to safely concatenate c-strings with efficient memory usage:

const char s1[] = "Weyland Yutani";const char s2[] = " at ";const char s3[] = "LV-426";// Calculate the length of the resulting c-string:int countOfChars = std::strlen(s1) + std::strlen(s2) + std::strlen(s3);// Allocate buffer with the exact size, sufficient for our situation:char* buffer = static_cast<char*>(std::malloc(sizeof(char) * (countOfChars + 1)));if (buffer) { // Check std::malloc()'s success then do the concatenation:

std::sprintf(buffer, "%s%s%s", s1, s2, s3);std::cout<<buffer<<std::endl;// >Weyland Yutani at LV-426"std::free(buffer); // Free buffer.

}

Page 29: (5) cpp dynamic memory_arrays_and_c-strings

29

Thank you!