60-140 Lecture 2a Dr. Robert D. Kent. Data concepts Operator basics.
60-140 Lecture 2a, Dr. Robert D. Kent
Data concepts Operator basics
Types, symbols and values
Data concepts
◦ Data versus Information
◦ Data Typing
◦ Symbols and Referencing
◦ Values
Computers are specialized tools (hardware) built to process data using components (instruction logic) designed to perform specific (well-defined) transformations
◦ Instructions are simply bit-strings (0's and 1's) that encode
  the type of operation (eg. +, -, *, =)
  the location(s) of the values to be operated on (or values embedded within, or implied by, the instruction itself)
◦ Operand data are bit-strings that encode values according to specified representations that computer hardware (ALU) can operate on “meaningfully”
In order to really understand programming it is necessary to appreciate both data and logic
◦ The same is true of problem solving in general, but we often take an intuitive view of Data and focus on Process
◦ Data may present limitations or obstacles to problem solving
◦ Data representation is problem dependent and therefore requires special consideration
◦ With computer hardware, there may be significant performance differences between similar operations on different data types (eg. Integer versus Real)
Information is a human conceptualization that is much broader than Data.
Data (singular: datum) refers to a value in a measurement system
◦ EX> Three meters
  Data: Three   System: Metric Length
◦ EX> 100 stone
  Data: 100   System: (Brit.) Weight
Is it meaningful to ask – what is the total of Three meters and 100 stone?
NO!
Clearly, if we ignore the context of the values Three and 100, we can just add numbers
◦ But, the result is meaningless because it lacks cogent informational content
Data alone, without information (context), is typically meaningless
◦ Operations on data must always be designed carefully to account for context (ie. Information)
Another example:
Imagine a time (~0 BCE/AD) in Italy when two owners of goats decide to combine their herds into one for a common business
At the time of merger, each must count their own goats (a labour intensive task, using fingers, sticks and the Roman numbering system)
◦ One has DXXVII goats, the other DCCCXLIII goats
◦ What is the total number of goats?
The notion (concept) of TOTAL (or sum) is not at issue: both goat herders understand this concept
◦ What is difficult is how to calculate the value of the Total without having to merge the herds into a single pen and then count them all again, starting at one (I).
DXXVII plus DCCCXLIII
Five Hundred Twenty Seven plus Eight Hundred Forty Three
Five Two Seven plus Eight Four Three
    527
  + 843
   1370
Five Two Seven plus Eight Four Three equals One Three Seven Zero
Five Hundred Twenty Seven plus Eight Hundred Forty Three equals One Thousand Three Hundred Seventy
DXXVII plus DCCCXLIII equals MCCCLXX
Courtesy of Arabic insights in mathematics
Now, think about how many different kinds of mental operations you have
performed – translation, organization, representational formatting, addition !
This is more about handling information than simply data alone.
Now we know how to tell the goats from the sheep
Lessons Learned?
◦ Computers, through logic, do exactly what programmers tell them to do
◦ Most errors are due to mistaking information for data and leaving out essential aspects of logic
Data can be grouped into types according to the context of the values used
◦ Integers are used to count whole (ie. complete) things
  1 person, 4 balls, 12 moons
◦ Real numbers are used to describe both integer and fractional portions of wholes
  Pi = 3.14159 (approx) is the ratio of a circle's circumference to its diameter
  The average number of children per Canadian family is 1.4
  The set of integers forms a proper subset of the set of real numbers.
Other types of data can be constructed using the mathematical concept of mapping (a type of transformational logic)
◦ Ordinal sequencing is the simplest form of usage
Characters can be organized into sequences
◦ Lower Case Alphabetic: a, b, c, ..., z
◦ Upper Case Alphabetic: A, B, C, ... Z
◦ Digits : 0, 1, 2, ... 9
◦ Punctuation : { , . / ! ? ; : ‘ “ [ ] ( ) } $ @ _
◦ Operators : < > = * & % ^ - +
◦ And other special symbols
The organization of character sequences has several forms
◦ First developed by Hollerith (still used in Fortran)
◦ BCD and EBCDIC
◦ ASCII (7 bit and 8 bit)
◦ Unicode
Although we will not require knowledge (ie. memorization) of the ASCII code, students should familiarize themselves with it and note
◦ how code subgroups are sequenced
◦ the interpretive meanings of the various codes
◦ the breadth of the code applicability to both printing of characters and also communications
See APPENDIX C of the textbook.
In the C language several data types have been specifically designed and planned for within compilers, taking account of modern computer instruction logic (hardware)
◦ Integer : int
◦ Real : float, double
◦ Character : char
These are called the primitive data types.
◦ Supported in hardware by most computers
Integer variables are defined in declaration statements, as follows:
int SymbolName ; /* one variable */
int VarName1, VarName2 ; /* two variable list */
When the compiler interprets the first statement it
◦ reserves enough room for data to be stored,
◦ translates the user-defined SymbolName into a set of numerical address references that CPU hardware can operate on, and
◦ utilizes the data type assigned (int) to perform semantic consistency checking (and code generation) throughout the program
int SymbolName ;
When the program is eventually compiled and then executed (a.out), a suitable amount of space (L bits, or L/8 bytes) in RAM is allocated to SymbolName
◦ Most computers will allocate 4 bytes (32 bits)
◦ An integer representation is applied (eg. 2's complement)
Values may be in the range from -2^(L-1) (minimum, negative) up to 2^(L-1) - 1 (maximum, positive)
◦ For a 32-bit integer: 2^31 is about 2.1 billion
Integers can come in flavours, or sub-types.
short int ShortIntVar ; /* typically 16b, max 32767 */
long int LongIntVar ; /* 64b on many systems, max 2^63 - 1 ~ 10^19 */
unsigned int PosIntVar ; /* ONLY >= 0; max 65535 (65K) if 16b, about 4.29 billion if 32b */
◦ Each of these subtypes is useful for solving problems when the range of values is restricted (ie. small, or positive) or when a larger range is needed
  Often, specific computers will show differences in performance when operating on integer subtypes
Real valued variables are declared as follows
float FloatVar ;
double DoubleVar ;
Values that are stored in float- and double-sized memory allocations are specified by standards organizations (eg. IEEE, ANSI)
◦ Size
◦ Representation
Consider the real number (conventional form):
1234.56789
Restate in scientific notation:
+ 0.123456789 x 10^4
It is obvious that the amount of space that can be allocated to store real values is finite.
For real data, this means that there is a limit to how many significant digits can be stored
◦ Thus, when operating on real data, answers will be adjusted to the available precision offered by each machine
◦ This leads to a potential loss of accuracy in calculations
  With potentially devastating effects !
  This subject is typically dealt with in courses (and books) on Numerical Analysis and Applied Mathematics
From Mathematics we know that the Set of Integer Numbers is a subset of the Set of Real Numbers
This view is carried out in most programming languages, but with an important caveat:
◦ Semantics (Compilers)
  Integer valued expressions are subsets of real valued expressions (compatibility)
  The converse is not true (incompatibility)
◦ Hardware
  Integer and Floating Point calculations are performed by different hardware components which are sensitive to the representational formats of each data type
Character valued variables are declared as follows
char CharVar ;
Characters represented using the ASCII encodings are allocated one (1) byte of storage
◦ Exactly and only 1 character per variable
Technically speaking, char is a subtype of int
Later in your study of C, you will encounter the concept of a collection of characters, or strings.
◦ This will involve array and logical delimiter concepts and techniques
◦ An important category of algorithms is that of string processing
  Word processing
  Language translation, compilers
  Natural language processing (NLP) and artificial intelligence (AI)
As you continue learning the C language you will
◦ Develop an understanding of functions and how they are given a data type attribute
◦ Understand the notion and practice of abstract data types
◦ Understand how to work with arbitrary collections of bits
  What the bits represent is only restricted by the limits of your imagination (and some meaningful logic)
You will also need to understand the fundamental logic operations of Boolean Set Theory
  and, or, complement, nand, nor, exclusive or, exclusive nor
A quick note on Input/Output.
Assume the declaration: int N = 5 ;
Consider: printf ( "Total = %d\n", N ) ;
The %d is used to indicate that an integer (decimal) value is to be outputted.
The value at location N is assumed to be an int data type – if it is not, then a logical error will occur.
The value outputted (5) will be formatted (by default) to start at the position of the %, with a minus sign (-) if N is negative, followed by as many digits as are required.
A quick note on Input/Output.
Assume the declaration: int N ;
scanf ( "%d", &N ) ;
The %d is used to indicate that an integer (decimal) value is to be inputted.
The variable N is assumed to be an int data type – if it is not, then a logical error will likely occur somewhere in the program.
The variable N is preceded by the ampersand operator (&) which signifies “address of”.
In other words, we scan the input for a valid integer and store that “at the address of location N”
A quick note on Input/Output.
In both printf() and scanf() library functions we note that the first operand within parentheses is a string of characters (enclosed within quotation marks " ")
  Within this string are included data specifier codes, each preceded by a %
  Integer (int) : %d
  Real (float) : %f
  Character (char) : %c
User defined variable names (and later functions and data structures) are used to benefit algorithm designers (ie. programmers)
Variables are abstractions of the data values used in actual calculations
◦ We find it easier to refer to X in a formula than to think separately about each specific value that X might represent
Compilers are programs that follow rigorous rules of logic
◦ Programmers must follow these rules through the formal definitions and requirements of each programming language
In C
◦ All symbols (names) must be declared before they may be referenced
◦ All symbol declarations must follow the C rules of grammar and syntax
◦ Any undeclared symbol references will be reported as compiler errors
  Mis-spellings account for most such errors
  C language declared symbol names are CaSe sensitive
Data values (called literal values) are stated using conventional formats
Integers:
◦ 0 -1 4789 (no commas)
Reals:
◦ 0.0 -1.0 3.14159 12345.0 (no commas; the decimal point marks the literal as real)
Characters: (sandwiched between two apostrophes)
◦ 'a' 'b' ',' 'A' 'Y' '$' '\n'
Accuracy is an important consideration when planning solutions
◦ Do not over-specify real values when the machine precision will not allow this (eg. stating Pi with too many digits)
◦ Integers have an upper-limit value (about 2.1 billion) that may be exceeded
  Ex. Factorial of 12, 13, 14 ?
◦ Reals may suffer from both an overflow and an underflow that can lead to erroneous calculations
Assignment, Arithmetic, Relations, Expressions, Data types
Operator basics
◦ Assignment
◦ Arithmetic
◦ Relational
◦ Logical
Expressions
Data types
An operator is a symbol that denotes a specific action.
◦ Operator symbols may be single characters, or they may be terms
◦ Each action must be well-defined (unambiguous) in a mathematical (logical) sense
◦ Actions have both Semantic and Logical aspects
  The meaning of the operation (human)
  How the operation is performed (computer)
◦ Actions may be understood as sometimes failing
  These are noted as exceptions and are usually reportable, or remedial (healing) actions may be prescribed and carried out by computers and operating systems.
The equals symbol (=) is used to denote the concept of assignment of a value to a variable
◦ This also means that data is being stored in RAM (usually; rarely in the CPU)
◦ Examples:
int N = 0 ; /* declare N and store 0 */
N = 5 ; /* Store 5 at location N, replace 0 */
The way we humans often say this, in English, is:
Set N equal to the value 5.
In the programming sense, one must be more careful and vigilant to ensure that it is understood that a value
is being stored at a memory location.
In other words, before the value 5 is actually stored it is not known if N already contains this value. However,
once the value has been stored it is clear that the value stored at location N is equivalent (equal to) the value 5.
The assignment operator must be used with care and attention to detail
◦ Avoid using = where you intend to perform a comparison for equivalence (equality) using ==
◦ You may use = more than once in a statement
  This may be confusing and should be avoided when it is necessary to assure clarity of code.
◦ Examples:
N = M = 5 ; /* Store 5 at both locations M and N */
N = ( M == 3 ) ; /* Evaluate if M is equal to 3 - store result at location N */
A final point to emphasize
◦ Assignment requires Right-to-Left type compatibility
◦ This means that for every expression: A = B
  If the type of A and the type of B are identical then the assignment does not require conversion and is directly implementable
  If A and B have different types, the value of B must be converted to the representation of A (which may take several primitive operations and be time consuming). The conversion is safe when the type of B is a proper subset (sub-type) of the type of A (eg. int to double); otherwise information may be lost (eg. double to int discards the fraction)
Arithmetic operators are used to express the logic of numerical operations
◦ This logic may depend on data type
The operators may be grouped as follows:
◦ Addition and Subtraction : + -
◦ Multiplication : *
◦ Integer Division : / %
◦ Floating point Division : /
◦ Auto-Increment and Auto-Decrement : ++ and --
  Pre- versus Post-
Addition, subtraction and multiplication of numbers are all meaningful operations
◦ Learned by small children all over the world !
◦ From a mechanical viewpoint, we all learn to perform these operations in the same way (same algorithms) for both integers and real numbers.
  There are some differences to be careful of (more later).
We denote the operator symbols
◦ Addition and Subtraction : + (plus) - (hyphen)
◦ Multiplication : * (asterisk)
Unary versus Binary
◦ It is meaningful to say -X (negative X) so C permits use of the minus symbol (hyphen) as a unary operator. It also permits use of + as unary.
  Ex. A = -3 ;
  Clearly, multiplication (*) of numbers does not make sense as a unary operator, but we will see later that * does indeed act unarily on a specific data type
◦ All operators have typical use as binary operators in arithmetic expression units of the general form
  Operand1 arith_op Operand2
There are considerable differences between how different computers may handle the int and float (or double) data types
◦ As a general rule, floating point hardware is slower than integer hardware for the same arithmetic operation.
Programmers should work with ints unless it is quite clear that floats should be used
◦ NOTE: For programs involving financial calculations it is advised to store currency values as integers (low order 2 digits are the cents) and perform integer based computations
  Ex. $1,256.73 becomes 125673
There are two division operators in C
◦ / (quotient) and % (modulus)
◦ Both are binary operators
◦ Modulus division is defined only for integers, since it evaluates to the remainder
  If X divided by Y gives quotient Q and remainder R (that is, X = Q * Y + R), then X / Y evaluates to Q and X % Y evaluates to R
Integer Division : / %
◦ int X=5, Y=3, N, M ;
◦ N = X / Y ; /* evaluates to 1 */
◦ M = X % Y ; /* evaluates to 2 */
Floating point Division : /
◦ An expensive operation: use sparingly !
A simple illustration of Modulus:
Consider the problem of a 12 hour digital clock. The clock starts at time 0, then counts up in 1 hour increments: 1, 2, 3, .... , 10, 11, and then resets to 0 on the twelfth hour.
A statement that updates the Hour (assumed of int data type) is :
Hour = ( Hour + 1 ) % 12 ;
Note how this behaves. When Hour is any value from 0 to 10 inclusive, the right side expression (Hour + 1) evaluates from 1 to 11 and the modulus division does not change this result.
However, when Hour is 11, (Hour + 1) is 12 and 12 % 12 is 0, so the rhs evaluates to 0. If this statement is in a loop structure, the clock repeatedly counts through the 12 hour cycle.
A common programming statement involves adding (or subtracting) 1 to (from) a variable used for counting
◦ N = N + 1 ; N = N - 1 ;
◦ The addition of 1 to an integer variable is called incrementation
◦ Similarly, subtracting 1 from an integer variable is called decrementation
The C language supports two operators that automatically generate increment or decrement statements on integer variables
◦ Auto-Increment ++
◦ Auto-Decrement --
◦ Examples: (Equivalent statements)
  Explicit       Post-auto   Pre-auto
  N = N + 1 ;    N++ ;       ++N ;
  N = N - 1 ;    N-- ;       --N ;
There is a very important difference between using these operators before versus after a variable symbol
◦ AFTER (POST) :
  If an expression contains N++, the expression is evaluated using the value stored at the location N. After the expression is evaluated, the value at N is incremented by 1.
◦ BEFORE (PRE) :
  If an expression contains ++N, the value at N is incremented by 1 and stored at N, before any other parts of the expression are evaluated. The expression is then evaluated using the new value at N.
Assume the declarations with initial values specified
◦ int A, B, N = 4, M = 3 ;
What are the final values of A, B, N and M ?
◦ A = N++ ;
◦ B = ++M + N-- ; /* watch out ! */
◦ A = --A ;
◦ ANSWER: A = 3 B = 9 N = 4 M = 4
Operator augmentation involves combining two operator symbols to form a new symbol with extended meaning
Arithmetic Assignment operators combine the expressiveness of arithmetic and assignment and permit abbreviation of coding
◦ += and -=
◦ *=
◦ /= and %=
◦ In some cases they may lead to hardware optimization of executable code.
Although these operations have a certain kind of elegance, they may create ambiguity.
◦ However, programmers should ensure that programs have clarity.
◦ Examples:
  Longhand        Shorthand
  X = X + Y ;     X += Y ;
  X = X * Y ;     X *= Y ;
  X = X % Y ;     X %= Y ;
Relational operators are used to express the concept of comparison of two values
◦ Based on the Boolean notions of True and False
This is vital to decision making logic where we do something – or not – based on evaluating an expression
◦ while ( Age > 0 ) .....
◦ if ( Num <= 0 ) .....
Formally, these operators are defined as
◦ Equivalence (Equal to) : ==
◦ Non-equivalence (Not equal to) : !=
◦ Open Precursor (Less than) : <
◦ Closed Precursor (Less than or equal to) : <=
◦ Open Successor (Greater than) : >
◦ Closed Successor (Greater than or equal to) : >=
The operators form complementary pairs: == with !=, < with >=, and > with <=. For the same two operands, when one operator of a pair evaluates to true the other evaluates to false.
Each relational operator is a binary operator, with an operand on the left and another on the right of the operator symbol(s)
Relational expressions are formed using units of the form:
◦ Operand1 rel_op Operand2
The value of a relational expression is always 0 (meaning false) or 1 (meaning true).
◦ The data type is an integer
◦ These are fundamental expression units in Boolean Set Theory
◦ Sometimes called propositions.
Boolean Set Theory defines several operations that act on values 0 and 1
◦ These values apply to relational expressions and also integer variables (limited to these two values)
Complement (Not) : !
◦ Unary     ! ( X < Y )
Intersection (And) : &&
◦ Binary    ( X < Y ) && ( Age > 20 )
Union (inclusive Or) : ||
◦ Binary    ( X < Y ) || ( Age > 20 )
The logical operators considered at this time are a subset of the logic operators. The remaining operators will be considered later.
The main use of these operators is in forming complex decision logic
◦ Several logical sub-expressions can be combined into a single expression
◦ This is very useful in the condition expressions that appear in if or while structures
PROPOSITION
I will go to the movies if:
  I have $20 in my pocket
  AND I have enough gas in my car
  OR it is $10 Tuesday special night
  AND I have $10 in my pocket
  AND I am able to walk to the movie theater
C provides a ternary operator (the conditional operator ?:), an operator that acts on three operands
This operator is used for simplified expression of decision logic intended to provide a result
(A > B ) ? 10 : 20
If it is true that A > B, the expression evaluates to 10 – otherwise 20.
Complex expressions can be constructed using the various operators seen so far
◦ Such expressions must be constructed with care, taking into account the issue of data type compatibility
◦ It is also important to avoid ambiguity in how the expression is to be interpreted (both by the compiler and by the programmer)
Parentheses ( ) are often used to encapsulate sub-expression terms
◦ Sub-expressions within parentheses are evaluated before other terms.
When an expression is constructed using parenthesized sub-expressions, these sub-expressions themselves may be further broken down into parenthesized sub-sub-expressions
This is referred to as nesting of expressions
◦ Innermost nested sub-expressions are evaluated first by compilers (and during execution)
Example:
( 1 + 5 ) * 3 - ( 4 - 2 ) % 3
( 6 ) * 3 - ( 2 ) % 3
18 - 2
16
Example:
( 1 + 5 ) * ( 3 - ( 4 - 2 ) / ( 5 - 1 ) ) % 3
( 6 ) * ( 3 - ( 2 ) / ( 4 ) ) % 3
6 * ( 3 - 0 ) % 3
6 * 3 % 3
18 % 3 = 0
Defined in C as default types:
◦ char - ASCII
◦ int
  Default: signed
  unsigned int
  short int
  unsigned short int
  long int
  unsigned long int
◦ float, double
  Extended precision float: long double
Not defined in C:
◦ Bit, boolean (is defined in some languages/C++)
Compilers are designed to execute with well-defined logic.
◦ In order for compilers to properly translate C source code programs, programmers must follow the rules of the language in coding
Precedence ordering
◦ Fixed by the rules of grammar defined by the C language designers
  Brian Kernighan and Dennis Ritchie (and many others)
◦ Ordering of operators by application rules
◦ Left to right rule (LR)
◦ Right to left rule (RL)
Precedence ordering (highest first, for the operators seen so far)
◦ Parentheses [LR]
  Nesting: innermost to outermost
◦ Unary postfix ++ -- [LR]
◦ Unary prefix ++ -- + - (negation) ! (complement), (type) cast [RL]
◦ Multiplication, Division, Modulus [LR]
◦ Add, Subtract [LR]
◦ Relational
  < <= > >= [LR]
  == != [LR]
◦ Logical operators [LR]
  And &&
  Or ||
◦ Conditional ?: [RL]
◦ Assignment = += -= *= /= %= [RL]