60-140 Lecture 2a Dr. Robert D. Kent. Data concepts Operator basics.


Types, symbols and values

Data concepts
◦ Data versus Information
◦ Data Typing
◦ Symbols and Referencing
◦ Values

Computers are specialized tools (hardware) built to process data using components (instruction logic) designed to perform specific (well-defined) transformations

◦ Instructions are simply bit-strings (0’s and 1’s) that encode
  the type of operation (e.g. +, -, *, =)
  the location(s) of the values to be operated on (or values embedded within, or implied by, the instruction itself)

◦ Operand data are bit-strings that encode values according to specified representations that computer hardware (ALU) can operate on “meaningfully”

In order to really understand programming it is necessary to appreciate both data and logic
◦ The same is true of problem solving in general, but we often take an intuitive view of Data and focus on Process

◦ Data may present limitations or obstacles to problem solving

◦ Data representation is problem dependent and therefore requires special consideration

◦ With computer hardware, there may be significant performance differences between similar operations on different data types (eg. Integer versus Real)

Information is a human conceptualization that is much broader than Data.

Data (singular: datum) refers to a value in a measurement system
◦ EX> Three meters
  Data: Three   System: Metric Length
◦ EX> 100 stone
  Data: 100   System: (Brit.) Weight

Is it meaningful to ask – what is the total of Three meters and 100 stone?

NO!

Clearly, if we ignore the context of the values Three and 100, we can just add the numbers
◦ But the result is meaningless because it lacks cogent informational content

Data alone, without information (context), is typically meaningless
◦ Operations on data must always be designed carefully to account for context (i.e. information)

Another example:

Imagine a time (~0 BCE/AD) in Italy when two owners of goats decide to combine their herds into one for a common business

At the time of merger, each must count their own goats (a labour-intensive task, using fingers, sticks and the Roman numbering system)
◦ One has DXXVII goats, the other DCCCXLIII goats
◦ What is the total number of goats?

The notion (concept) of TOTAL (or sum) is not at issue: both goat herders understand this concept
◦ What is difficult is how to calculate the value of the Total without having to merge the herds into a single pen and then count them all again, starting at one (I).

DXXVII plus DCCCXLIII

Five Hundred Twenty Seven plus Eight Hundred Forty Three

Five Two Seven plus Eight Four Three equals One Three Seven Zero

527 + 843 = 1370

Five Hundred Twenty Seven plus Eight Hundred Forty Three equals One Thousand Three Hundred Seventy

DXXVII plus DCCCXLIII equals MCCCLXX

Courtesy of Arabic insights in mathematics

Now, think about how many different kinds of mental operations you have performed: translation, organization, representational formatting, addition !

This is more about handling information than simply data alone.

Now we know how to tell the goats from the sheep

Lessons Learned?
◦ Computers, through logic, do exactly what programmers tell them to do
◦ Most errors are due to mistaking information for data and leaving out essential aspects of logic

Data can be grouped into types according to the context of the values used
◦ Integers are used to count whole (i.e. complete) things
  1 person, 4 balls, 12 moons
◦ Real numbers are used to describe both integer and fractional portions of wholes
  Pi = 3.14159 (approx.) is the ratio of a circle's circumference to its diameter
  The average number of children per Canadian family is 1.4
  The set of integers forms a proper subset of the set of real numbers.

Other types of data can be constructed using the mathematical concept of mapping (a type of transformational logic)
◦ Ordinal sequencing is the simplest form of usage

Characters can be organized into sequences
◦ Lower Case Alphabetic: a, b, c, ..., z
◦ Upper Case Alphabetic: A, B, C, ..., Z
◦ Digits : 0, 1, 2, ..., 9
◦ Punctuation : { , . / ! ? ; : ‘ “ [ ] ( ) } $ @ _
◦ Operators : < > = * & % ^ - +
  And other special symbols

The organization of character sequences has several forms
◦ First developed by Hollerith (still used in Fortran)
◦ BCD and EBCDIC
◦ ASCII (7 bit and 8 bit)
◦ Unicode

Although we will not require knowledge (i.e. memorization) of the ASCII code, students should familiarize themselves with it and note
◦ how code subgroups are sequenced
◦ the interpretive meanings of the various codes
◦ the breadth of the code's applicability to both printing of characters and communications

APPENDIX C of textbook.

In the C language several data types have been specifically designed and planned for within compilers, taking account of modern computer instruction logic (hardware)
◦ Integer : int
◦ Real : float, double
◦ Character : char

These are called the primitive data types.
◦ Supported in hardware by most computers

Integer variables are defined in declaration statements, as follows:

int SymbolName ; /* one variable */
int VarName1, VarName2 ; /* two variable list */

When the compiler interprets the first statement it
◦ reserves enough room for data to be stored,
◦ translates the user-defined SymbolName into a set of numerical address references that CPU hardware can operate on, and
◦ utilizes the data type assigned (int) to perform semantic consistency checking (and code generation) throughout the program

int SymbolName ;

When the program is eventually compiled and then executed (a.out), a suitable amount of space (L bits, or L/8 bytes) in RAM is allocated to SymbolName
◦ Most computers will allocate 4 bytes (32 bits)
◦ An integer representation is applied (e.g. 2’s complement)

Values may be in the range from $-2^{L-1}$ (minimum, negative) up to $2^{L-1} - 1$ (maximum, positive)
◦ For a 32-bit integer: $2^{31}$ is about 2.1 billion

Integers can come in flavours, or sub-types.

short int ShortIntVar ; /* typically 16b, max 32767 */
long int LongIntVar ; /* 64b on many systems, 2^63 ~ 10^19 */

unsigned int PosIntVar ; /* ONLY >= 0; max about 4.29 billion at 32b (65K at 16b) */

◦ Each of these subtypes is useful for solving problems when the range of values is restricted (i.e. small, or positive) or when a larger range is needed
  Often, specific computers will show differences in performance when operating on integer subtypes

Real valued variables are declared as follows

float FloatVar ;
double DoubleVar ;

Values that are stored in float- and double-sized memory allocations follow formats specified by standards organizations (e.g. IEEE, ANSI)
◦ Size
◦ Representation

Consider the real number (conventional form):

1234.56789

Restate in scientific notation:

$+0.123456789 \times 10^{4}$

It is obvious that the amount of space that can be allocated to store real values is finite.

For real data, this means that there is a limit to how many significant digits can be stored
◦ Thus, when operating on real data, answers will be adjusted to the available precision offered by each machine
◦ This leads to a potential loss of accuracy in calculations
  With potentially devastating effects !
  This subject is typically dealt with in courses (and books) on Numerical Analysis and Applied Mathematics

From Mathematics we know that the Set of Integer Numbers is a subset of the Set of Real Numbers

This view is carried over into most programming languages, but with an important caveat:
◦ Semantics (Compilers)
  Integer valued expressions are subsets of real valued expressions (compatibility)
  The converse is not true (incompatibility)
◦ Hardware
  Integer and Floating Point calculations are performed by different hardware components which are sensitive to the representational formats of each data type

Character valued variables are declared as follows

char CharVar ;

Characters represented using the ASCII encodings are allocated one (1) byte of storage
◦ Exactly and only 1 character per variable

Technically speaking, char is a subtype of int

Later in your study of C, you will encounter the concept of a collection of characters, or strings.
◦ This will involve array and logical delimiter concepts and techniques
◦ An important category of algorithms is that of string processing
  Word processing
  Language translation, compilers
  Natural language processing (NLP) and artificial intelligence (AI)

As you continue learning the C language you will
◦ Develop an understanding of functions and how they are given a data type attribute
◦ Understand the notion and practice of abstract data types
◦ Understand how to work with arbitrary collections of bits
  What the bits represent is only restricted by the limits of your imagination (and some meaningful logic)

You will also need to understand the fundamental logic operations of Boolean Set Theory
  and, or, complement, nand, nor, exclusive or, exclusive nor

A quick note on Input/Output.

Assume the declaration: int N = 5 ;

Consider: printf ( "Total = %d\n", N ) ;

The %d is used to indicate that an integer (decimal) value is to be outputted.

The value at location N is assumed to be an int data type – if it is not, then a logical error will occur.

The value outputted (5) will be formatted (by default) to start at the position of the %, with a minus sign (-) if N is negative, followed by as many digits as are required.


Assume the declaration: int N ;

scanf ( "%d", &N ) ;

The %d is used to indicate that an integer (decimal) value is to be inputted.

The variable N is assumed to be an int data type – if it is not, then a logical error will likely occur somewhere in the program.

The variable N is preceded by the ampersand operator (&) which signifies “address of”.

In other words, we scan the input for a valid integer and store that “at the address of location N”


In both printf() and scanf() library functions we note that the first operand within parentheses is a string of characters (enclosed within quotation marks " ")
  Within this string are included data specifier codes, each preceded by a %
  Integer (int) : %d
  Real (float) : %f
  Character (char) : %c

User defined variable names (and later functions and data structures) are used to benefit algorithm designers (ie. programmers)

Variables are abstractions of the data values used in actual calculations
◦ We find it easier to refer to X in a formula than to think separately about each specific value that X might represent

Compilers are programs that follow rigorous rules of logic
◦ Programmers must follow these rules through the formal definitions and requirements of each programming language

In C
◦ All symbols (names) must be declared before they may be referenced
◦ All symbol declarations must follow the C rules of grammar and syntax
◦ Any undeclared symbol references will be reported as compiler errors
  Misspellings account for most such errors
  C language declared symbol names are CaSe sensitive

Data values (called literal values) are stated using conventional formats

Integers:
◦ 0 -1 4789 (no commas)

Reals:
◦ 0.0 -1.0 3.14159 12345.0 (no commas; in C the decimal point marks a literal as real)

Characters: (sandwiched between two apostrophes)
◦ 'a' 'b' ',' 'A' 'Y' '$' '\n'

Accuracy is an important consideration when planning solutions

◦ Do not over-specify real values when the machine precision will not allow this (eg. stating Pi with too many digits)

◦ Integers have an upper-limit value (about 2.1 billion) that may be exceeded
  Ex. Factorial of 12, 13, 14 ?

◦ Reals may suffer from both an overflow and an underflow that can lead to erroneous calculations

Assignment, Arithmetic, Relations, Expressions, Data types

Operator basics
◦ Assignment
◦ Arithmetic
◦ Relational
◦ Logical

Expressions
Data types

An operator is a symbol that denotes a specific action.
◦ Operator symbols may be single characters, or they may be terms
◦ Each action must be well-defined (unambiguous) in a mathematical (logical) sense
◦ Actions have both Semantic and Logical aspects
  The meaning of the operation (human)
  How the operation is performed (computer)
◦ Actions may be understood as sometimes failing
  These are noted as exceptions and are usually reportable, or remedial (healing) actions may be prescribed and carried out by computers and operating systems.

The equals symbol (=) is used to denote the concept of assignment of a value to a variable
◦ This also means that data is being stored in RAM (usually; rarely in the CPU)

◦ Examples:

int N = 0 ; /* declare N and store 0 */

N = 5 ; /* Store 5 at location N, replace 0 */

The way we humans often say this, in English, is:

Set N equal to the value 5.

In the programming sense, one must be more careful and vigilant to ensure that it is understood that a value is being stored at a memory location.

In other words, before the value 5 is actually stored it is not known if N already contains this value. However, once the value has been stored it is clear that the value stored at location N is equivalent (equal) to the value 5.

The assignment operator must be used with care and attention to detail

◦ Avoid using = where you intend to perform a comparison for equivalence (equality) using ==

◦ You may use = more than once in a statement
  This may be confusing and should be avoided when it is necessary to assure clarity of code.

◦ Examples:

N = M = 5 ; /* Store 5 at both locations M and N */

N = ( M == 3 ) ; /* Evaluate if M is equal to 3 - store result at location N */

A final point to emphasize

◦ Assignment requires Right-to-Left type compatibility

◦ This means that for every expression of the form A = B
  If the type of A and the type of B are identical then the assignment does not require conversion and is directly implementable
  Otherwise, the type of B should be a subset (sub-type) of the type of A. When A and B have different types, the data representation must be converted (which may take several primitive operations and be time consuming)

Arithmetic operators are used to express the logic of numerical operations
◦ This logic may depend on data type

The operators may be grouped as follows:
◦ Addition and Subtraction : + -
◦ Multiplication : *
◦ Integer Division : / %
◦ Floating point Division : /
◦ Auto-Increment and Auto-Decrement
  ++ and --
  Pre- versus Post-

Addition, subtraction and multiplication of numbers are all meaningful operations
◦ Learned by small children all over the world !
◦ From a mechanical viewpoint, we all learn to perform these operations in the same way (same algorithms) for both integers and real numbers.
  There are some differences to be careful of (more later).

We denote the operator symbols
◦ Addition and Subtraction : + (plus) - (hyphen)
◦ Multiplication : * (asterisk)

Unary versus Binary
◦ It is meaningful to say -X (negative X) so C permits use of the minus symbol (hyphen) as a unary operator. It also permits use of + as unary.
  Ex. A = -3 ;
  Clearly, multiplication (*) of numbers does not make sense as a unary operator, but we will see later that * does indeed act unarily on a specific data type

◦ All operators have typical use as binary operators in arithmetic expression units of the general form

Operand1 arith_op Operand2

There are considerable differences between how different computers may handle the int and float (or double) data types
◦ As a general rule, floating point hardware is slower than integer hardware for the same arithmetic operation.

Programmers should work with ints unless it is quite clear that floats should be used
◦ NOTE: For programs involving financial calculations it is advised to store currency values as integers (low order 2 digits are the cents) and perform integer based computations
  Ex. $1,256.73 becomes 125673

There are two division operators in C
◦ / (quotient) and % (modulus)
◦ Both are binary operators
◦ Modulus is defined only for integer operands; it evaluates to the remainder
  If X = Q * Y + R, where Q is the integer quotient X / Y, then X % Y evaluates to R

Integer Division : / %
◦ int X=5, Y=3, N, M ;
◦ N = X / Y ; /* evaluates to 1 */
◦ M = X % Y ; /* evaluates to 2 */

Floating point Division : /
◦ An expensive operation: use sparingly !

A simple illustration of Modulus:

Consider the problem of a 12 hour digital clock. The clock starts at time 0, then counts up in 1 hour increments: 1, 2, 3, .... , 10, 11, and then resets to 0 on the twelfth hour.

A statement that updates the Hour (assumed of int data type) is :

Hour = ( Hour + 1 ) % 12 ;

Note how this behaves. When Hour is any value from 0 to 10 inclusive, the right side expression (Hour + 1) evaluates from 1 to 11 and the modulus division does not change this result.

However, when Hour is 11, Hour + 1 is 12 and 12 % 12 evaluates to 0. If this statement is in a loop structure, the clock repeatedly counts through the 12 hour cycle.

A common programming statement involves adding (or subtracting) 1 to (from) a variable used for counting

◦ N = N + 1 ; N = N - 1 ;

◦ The addition of 1 to an integer variable is called incrementation

◦ Similarly, subtracting 1 from an integer variable is called decrementation

The C language supports two operators that automatically generate increment or decrement statements on integer variables
◦ Auto-Increment ++
◦ Auto-Decrement --

◦ Examples: (Equivalent statements)
  Explicit        Post-auto   Pre-auto
  N = N + 1 ;     N++ ;       ++N ;
  N = N - 1 ;     N-- ;       --N ;

There is a very important difference between using these operators before versus after a variable symbol

◦ AFTER (POST) :
  If an expression contains N++, the expression is evaluated using the value stored at the location N. After the expression is evaluated, the value at N is incremented by 1.

◦ BEFORE (PRE) :
  If an expression contains ++N, the value at N is incremented by 1 and stored at N, before any other parts of the expression are evaluated. The expression is then evaluated using the new value at N.

Assume the declarations with initial values specified
◦ int A, B, N = 4, M = 3 ;

What are the final values of A, B, N and M ?

◦ A = N++ ;
◦ B = ++M + N-- ; /* watch out ! */
◦ A = --A ; /* intended as --A ; modifying A twice in one statement is not well defined in standard C */

◦ ANSWER: A = 3 B = 9 N = 4 M = 4

Operator augmentation involves combining two operator symbols to form a new symbol with extended meaning

Arithmetic Assignment operators combine the expressiveness of arithmetic and assignment and permit abbreviation of coding

◦ += and -=
◦ *=
◦ /= and %=

◦ In some cases they may lead to hardware optimization of executable code.

Although these operations have a certain kind of elegance, they may create ambiguity.
◦ Programmers should therefore ensure that programs retain clarity.

◦ Examples:◦ Longhand Shorthand

X = X + Y ; X += Y ;

X = X * Y ; X *= Y ;

X = X % Y ; X %= Y ;

Relational operators are used to express the concept of comparison of two values
◦ Based on the Boolean notions of True and False

This is vital to decision making logic where we do something – or not – based on evaluating an expression

◦ while ( Age > 0 ) .....

◦ if ( Num <= 0 ) .....

Formally, these operators are defined as

◦ Equivalence (Equal to) : ==
◦ Non-equivalence (Not equal to) : !=

◦ Open Precursor (Less than) : <
◦ Closed Precursor (Less than or equal to) : <=

◦ Open Successor (Greater than) : >
◦ Closed Successor (Greater than or equal to) : >=

The operators pair up as complements: == with !=, < with >=, and > with <=.

Each relational operator is a binary operator, with an operand on the left and another on the right of the operator symbol(s)

Relational expressions are formed using units of the form:

◦ Operand1 rel_op Operand2

The value of a relational expression is always 0 (meaning false) or 1 (meaning true).
◦ The data type is an integer
◦ These are fundamental expression units in Boolean Set Theory
◦ Sometimes called propositions.

Boolean Set Theory defines several operations that act on values 0 and 1
◦ These values apply to relational expressions and also integer variables (limited to these two values)

Complement (Not) : !
◦ Unary       ! ( X < Y )

Intersection (And) : &&
◦ Binary      ( X < Y ) && ( Age > 20 )

Union (inclusive Or) : ||
◦ Binary      ( X < Y ) || ( Age > 20 )

The logical operators considered at this time are a subset of the logic operators. The remaining operators will be considered later.

The main use of these operators is in forming complex decision logic
◦ Several logical sub-expressions can be combined into a single expression
◦ This is very useful in the condition expressions that appear in if or while structures

PROPOSITION
I will go to the movies if:

I have $20 in my pocket
AND I have enough gas in my car
OR it is $10 Tuesday special night
AND I have $10 in my pocket
AND I am able to walk to the movie theater

C is one of the languages that contains a ternary operator (written ? :), an operator that acts on three operands

This operator is used for simplified expression of decision logic intended to provide a result

(A > B ) ? 10 : 20

If it is true that A > B, the expression evaluates to 10 – otherwise 20.

Complex expressions can be constructed using the various operators seen so far
◦ Such expressions must be constructed with care, taking into account the issue of data type compatibility
◦ It is also important to avoid ambiguity in how the expression is to be interpreted (both by the compiler and by the programmer)

Parentheses ( ) are often used to encapsulate sub-expression terms
◦ Sub-expressions within parentheses are compiled before other terms.

When an expression is constructed using parenthesized sub-expressions, these sub-expressions themselves may be further broken down into parenthesized sub-sub-expressions

This is referred to as nesting of expressions
◦ Innermost nested sub-expressions are evaluated first by compilers (and during execution)

Example:

( 1 + 5 ) * 3 - ( 4 - 2 ) % 3

( 6 ) * 3 - ( 2 ) % 3

18 - 2

16

Example:

( 1 + 5 ) * ( 3 - ( 4 - 2 ) / ( 5 - 1 ) ) % 3

( 6 ) * ( 3 - ( 2 ) / ( 4 ) ) % 3

6 * ( 3 - 0 ) % 3

6 * 3 % 3

18 % 3 = 0

Defined in C as default types:
◦ char - ASCII
◦ int
  Default signed
  unsigned int
  short int
  unsigned short int
  long int
  unsigned long int
◦ float, double
  Extended precision float: long double

Not defined in C:
◦ Bit, boolean (defined in some languages, e.g. C++)

Compilers are designed to execute with well-defined logic.
◦ In order for C source code programs to be translated properly, programmers must follow the rules of the language in coding

Precedence ordering
◦ Fixed by the rules of grammar defined by the C language designers Brian Kernighan and Dennis Ritchie (and many others)
◦ Ordering of operators by application rules
◦ Left to right rule (LR) Right to left rule (RL)

Precedence ordering (highest first)
◦ Parentheses [LR]
  Nesting: innermost to outermost
◦ Unary postfix [LR]
◦ Unary prefix (including negation and complement !), (type) cast [RL]
◦ Multiplication, Division, Modulus [LR]
◦ Add, Subtract [LR]
◦ Relational
  < <= > >= [LR]
  == != [LR]
◦ Logical operators [LR]
  And &&
  Or ||
