Topic 1 Data Representation

CS Topic 1 - Data Representation v2

1

Data representation considers how a computer uses numbers to represent data inside the computer. Three types of data are considered at this stage:

1. Numbers, including positive numbers, negative numbers and fractions.

2. Text.

3. Graphics.


2

Binary (Base 2)

The binary system requires only two symbols: 0 and 1. The columns in binary represent:

2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0

128s 64s 32s 16s 8s 4s 2s units

e.g. the binary number 0 0 0 1 0 1 0 1

is equal to 16 + 4 + 1 = 21 in decimal.

The number 1110 = 8+4+2 = 14 in decimal
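The column method above can be sketched in a few lines of Python. This is only an illustration; the function name binary_to_decimal is made up for this example and is not part of the course material.

```python
def binary_to_decimal(bits: str) -> int:
    """Convert an unsigned binary string such as '1110' to its decimal value."""
    total = 0
    # Walk from the rightmost digit (the units column) towards the left.
    for position, digit in enumerate(reversed(bits)):
        if digit not in ("0", "1"):
            raise ValueError(f"not a binary digit: {digit!r}")
        # Each column is worth 2 to the power of its position.
        total += int(digit) * 2 ** position
    return total


print(binary_to_decimal("00010101"))  # 16 + 4 + 1
print(binary_to_decimal("1110"))      # 8 + 4 + 2
```

The two calls reproduce the worked examples on this slide (21 and 14).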


3

Try the following. Show your working:

Example: the number 1110 = 8+4+2 = 14 in decimal

1. 0110 = 4+2 = 6

2. 1001 = 8+1 = 9

3. 0101 = 4+1 = 5

4. 1111 = 8+4+2+1 = 15

5. 0010 = 2

6. 1101 = 8+4+1 = 13


4

Try to learn the following powers of 2 by heart

2^8 = 256

2^10 = 1024 = 1K

2^16 = 65,536 = 64K

2^20 = 1,048,576 = 1 MB

2^24 = 16 MB

2^30 = 1 GB

2^32 = 4 GB

2^40 = 1 TB = 1 Terabyte


5

Remember the units used in the binary system.

1 byte = 8 bits

1 Kilobyte = 1024 bytes

1 Megabyte = 1024 Kilobytes

1 Gigabyte = 1024 Megabytes

1 Terabyte = 1024 Gigabytes

2048 Kilobytes = ?
A. 1024 Megabytes
B. 1 Gigabyte
C. 2 Megabytes
D. 4096 bytes

Answer: C. 2 Megabytes

3 Gigabytes = ?
A. 24 Terabytes
B. 3072 Megabytes
C. 24 Kilobytes
D. 3072 Terabytes

Answer: B. 3072 Megabytes


6

Here are some useful terms used in binary

Bit: Binary digit (1 or 0)

Byte: Group of 8 bits, 2^8 = 256 values

Least significant bit (LSB): Bit furthest to the right (units)

Most significant bit (MSB): Bit furthest to the left


7

The computer is a two-state (binary) machine. All components inside a computer and all backing storage devices have only two states. e.g.

• a switch is on or off.

• a transistor conducts or does not conduct.

• a signal is a pulse of electricity or no pulse.

• an area of a magnetic disk is positive or negative.

• with laser technology light can reflect in two different directions.

Binary, using the digits 0 and 1, can therefore be represented by any two-state system.


8

Advantages of using Binary

1. A simple two-state system is less complex to represent using electrical signals than our decimal ten-state system. Degradation in signal levels does not corrupt the information as easily and so there is less chance of errors.

2. A two-state system is easy to store magnetically and optically.

3. Calculations are simpler. There are only four rules for addition. These can be easily built into the electronic circuits.

0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 0 carry 1
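To show how little is needed for binary addition, here is a rough Python sketch that adds two binary strings column by column. The function name add_binary is an assumption made for this example. Note that once a carry comes in, a column can also be 1 + 1 + 1 = 1 carry 1, which follows from applying the four rules twice.

```python
def add_binary(a: str, b: str) -> str:
    """Add two unsigned binary strings column by column, right to left."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad the shorter number with 0s
    carry = 0
    result = []
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry
        result.append(str(total % 2))  # digit written in this column
        carry = total // 2             # 1 if the column overflowed
    if carry:
        result.append("1")             # final carry becomes a new column
    return "".join(reversed(result))


print(add_binary("0110", "0101"))  # 6 + 5
print(add_binary("1", "1"))        # 1 + 1 = 0 carry 1
```

The first call gives 1011 (11 in decimal) and the second gives 10.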


9

The disadvantages of using binary are that:

1. A binary number has more digits than its decimal equivalent. i.e. it will be longer. This is not a problem for the computer but it makes it harder for us to read and work with.

2. Binary is more difficult than decimal for us to read as we are more used to decimal.


10

An integer is a whole number, positive or negative.

Every integer stored in the computer is allocated the same amount of space, whether it is a large integer or a small integer. The number of bits allocated determines the range of numbers which can be stored.

If one byte were allocated then the largest integer would be:

11111111 which is 255 in decimal or 2^8 - 1

Two bytes would allow a largest integer of:

2^16 - 1 = 65535, giving a range from 0 to 65535 (2^16 = 65,536 possibilities).
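The link between the number of bits and the range of positive integers can be checked with a short Python sketch. The function name unsigned_range is hypothetical and used only for illustration.

```python
def unsigned_range(bits: int) -> tuple[int, int]:
    """Return the smallest and largest positive integer that fit in `bits` bits."""
    # With n bits there are 2^n patterns, numbered 0 up to 2^n - 1.
    return 0, 2 ** bits - 1


print(unsigned_range(8))   # one byte
print(unsigned_range(16))  # two bytes
```

One byte gives 0 to 255 and two bytes give 0 to 65535, matching the figures above.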


11

If a computer only had to store positive integers then we could easily convert each number into its binary equivalent as you saw in the examples earlier.

However, negative numbers have to be stored too, so we need a method of representing a negative sign using 1s and 0s.

Modern computers use the two’s complement method to represent integers.


12

With this method we take the most significant bit (the one on the far left) and give its column a negative value.

The following examples illustrate the principle using 4 bit numbers to help you understand. A modern computer would use 32 bit numbers for integers.

In your NABS and final exams you are likely to be asked to use 8 bit numbers and you will practise with these later.


13

Two’s Complement

Binary Decimal
1000 -8
1001 -7
1010 -6
1011 -5
1100 -4
1101 -3
1110 -2
1111 -1
0000 0
0001 +1
0010 +2
0011 +3
0100 +4
0101 +5
0110 +6
0111 +7

In this table the 1 at the far left represents -8 (negative 8).

Make sure that you understand this concept.

Note that the range is still 2^4 = 16 numbers, from -8 to +7.
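The idea that the leftmost column carries a negative value can be checked with a small Python sketch. The function name twos_complement_value is an assumption made for this example. It prints all sixteen 4 bit patterns with their values, in counting order rather than the order of the table.

```python
def twos_complement_value(bits: str) -> int:
    """Decode a two's complement bit string, e.g. '1001' -> -7."""
    width = len(bits)
    # The most significant bit is worth -(2^(width-1)), e.g. -8 for 4 bits.
    value = -int(bits[0]) * 2 ** (width - 1)
    # The remaining columns follow the normal positive binary rules.
    for position, digit in enumerate(reversed(bits[1:])):
        value += int(digit) * 2 ** position
    return value


for n in range(16):
    pattern = format(n, "04b")
    print(pattern, twos_complement_value(pattern))
```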


14

Range and Accuracy of Two’s Complement

1. The range of numbers which can be stored depends on the number of bits being used.

4 bit numbers have a range of -8 to +7

8 bit numbers have a range of -128 to +127

2. In a modern computer 32 bits are used to store integers. This gives 2^32 different values, a range from -2,147,483,648 to +2,147,483,647.

3. Numbers stored using two’s complement are always 100% accurate.


15

8 Bit Two’s complement numbers

Here is an example of how to work out the two’s complement for the number -80.

1. The number is negative so put a 1 in the first column (the -128 column).

2. Subtract the 80 from 128: 128 - 80 = 48.

3. Now make 48 from the remaining columns using normal binary rules: 48 - 32 = 16, then 16 - 16 = 0, so put a 1 in the 32 and 16 columns and a 0 everywhere else.

-128 64 32 16 8 4 2 1

1 0 1 1 0 0 0 0

-80 = 10110000


16

Express the following numbers using 8 bit two’s complement:

-128 64 32 16 8 4 2 1

1. -45 = 1 1 0 1 0 0 1 1

2. -21 = 1 1 1 0 1 0 1 1

3. -16 = 1 1 1 1 0 0 0 0

4. 127 = 0 1 1 1 1 1 1 1

5. -129 = Number out of range
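The three-step method for negative numbers can be sketched in Python as follows. The function name to_twos_complement and its bits parameter are assumptions made for this example. It follows the same steps: put a 1 in the negative column, add the size of that column to the number, then write the remainder in ordinary binary.

```python
def to_twos_complement(value: int, bits: int = 8) -> str:
    """Encode an integer as a two's complement bit string of the given width."""
    lowest = -(2 ** (bits - 1))      # -128 for 8 bits
    highest = 2 ** (bits - 1) - 1    # +127 for 8 bits
    if not lowest <= value <= highest:
        raise ValueError(f"{value} is out of range for {bits} bits")
    if value < 0:
        # Step 1: a 1 goes in the negative column.
        # Step 2: work out what is left, e.g. 128 - 80 = 48.
        remainder = value + 2 ** (bits - 1)
        # Step 3: make the remainder from the other columns.
        return "1" + format(remainder, f"0{bits - 1}b")
    return format(value, f"0{bits}b")


print(to_twos_complement(-80))
print(to_twos_complement(127))
```

The first call gives 10110000 and the second gives 01111111. Calling it with -129 raises a ValueError because the number is out of range for 8 bits.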


17

Real numbers (numbers with a decimal point in them) are stored using floating point representation. This is like standard form/scientific notation used in decimal.

1101.101 = .1101101 x 2^100 (the exponent 100 is written in binary)

1. The binary point is moved to the far left.

2. The point has been moved 4 places to the left so we need to multiply by 2^4. The power 4 = 100 in binary.


18

The general form of this representation is

m x b^e

where
m = mantissa (the number)
b = base
e = exponent (the power)

As the base is always 2 and the point is always at the far left, we only need to store the mantissa and the exponent, so the number 1101.101 becomes:

Mantissa Exponent
1101101 100
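The split into mantissa and exponent can be sketched in Python. This is a simplified illustration and not how real hardware stores floating point numbers. The function name normalise is made up, and the sketch assumes a positive binary number with at least one 1 before the point.

```python
def normalise(binary: str) -> tuple[str, str]:
    """Split a binary number like '1101.101' into (mantissa, exponent in binary)."""
    whole, _, fraction = binary.partition(".")
    whole = whole.lstrip("0")  # leading zeros do not change the value
    if not whole:
        raise ValueError("this sketch only handles numbers of 1 or more")
    # Moving the point to the far left shifts it once per digit of the whole part.
    exponent = len(whole)
    mantissa = whole + fraction
    return mantissa, format(exponent, "b")


print(normalise("1101.101"))
```

For 1101.101 this gives the mantissa 1101101 and the exponent 100, as on the slide.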


19

Range and Precision of Floating Point numbers

1. The range of numbers which can be stored depends on the number of bits being used for the exponent. The exponent has no effect on precision.

2. The precision of the numbers being stored depends on the number of bits being used for the mantissa. The mantissa has no effect on range.

3. In a modern computer, floating point might allow, for example:

A 4 byte mantissa: -2^31 to +2^31

A 1 byte exponent: -128 to 127

In decimal this means accuracy to 9 significant figures and a range from 10^-38 to 10^38.


20

Text is made up of characters and each character is allocated its own binary code.

The set of characters that can be represented by a computer is known as the character set.

Western world alphabets need around 80 characters.

These are made up of 26 upper case letters, 26 lower case letters, 10 digits 0-9, and around 20 punctuation marks.

80 characters would need a 7 bit code. This would allow 2^7 = 128 different codes.


21

It is useful to have a standard code so that text can be transferred between different types of computer easily without the need for translation.

ASCII and Unicode are two of the most common codes in use today.

ASCII (American Standard Code for Information Interchange) is a 7 bit code allowing 128 characters. These include 96 displayable characters and 32 control characters which control the display devices. Examples of these include:

Code 13 = Carriage Return

Code 9 = TAB

Code 10 = Line feed

Code 8 = Backspace
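Python's built-in ord and chr functions make it easy to look at ASCII codes. The sketch below is illustrative only; the helper name text_to_ascii_bits is an assumption made for this example.

```python
def text_to_ascii_bits(text: str) -> list[str]:
    """Return the 7 bit ASCII code of each character as a binary string."""
    codes = []
    for character in text:
        code = ord(character)  # the character's numeric code
        if code > 127:
            raise ValueError(f"{character!r} is not a 7 bit ASCII character")
        codes.append(format(code, "07b"))
    return codes


print(text_to_ascii_bits("Hi!"))

# The control characters listed above, shown as Python escape sequences.
for code in (8, 9, 10, 13):
    print(code, repr(chr(code)))
```

The letter H has code 72, which is 1001000 as a 7 bit pattern.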


22

ASCII is often extended to 8 bit which allows 2^8 = 256 different characters.

These include alphabetic characters in foreign languages and accented characters. This standard became known as extended ASCII and then ISO 8859.

ASCII was designed to cope with Western based character sets such as English, French, German but did not include Japanese or Arabic symbol shapes.

The increase in worldwide communication led to a need for a larger standard code to cope with other foreign alphabets, technical symbols etc.


23

Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.

www.unicode.org

Unicode originally used a 16 bit code for each character.

This provides a unique code for up to 2^16 = 65,536 characters.

Unicode includes all the ASCII character codes to ensure compatibility.


24

Unicode

Advantage: Can represent many more characters than ASCII.

Disadvantage: It takes up more space to store Unicode than it does to store ASCII.


25

The graphic is seen as a matrix of pixels (picture elements) and the colour of each pixel is represented by a binary code.

This simple graphic of a match stick man could be stored as a series of binary numbers.

000111000       ███
000111000       ███
111111111    █████████
000111000       ███
001000100      █   █
110000011    ██     ██

In black and white mode, each pixel requires a one bit code: 0 for white, 1 for black.
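As a rough illustration, the match stick man can be stored as a list of bit strings and drawn from them in Python. The names ROWS, render and storage_bits are made up for this sketch, and the characters # and . simply stand in for black and white pixels.

```python
# Each string is one row of the image; 1 = black pixel, 0 = white pixel.
ROWS = [
    "000111000",
    "000111000",
    "111111111",
    "000111000",
    "001000100",
    "110000011",
]


def render(rows: list[str], black: str = "#", white: str = ".") -> str:
    """Turn rows of 1s and 0s into a printable picture."""
    return "\n".join(
        "".join(black if bit == "1" else white for bit in row) for row in rows
    )


def storage_bits(rows: list[str]) -> int:
    """One bit per pixel, so the storage needed is simply the pixel count."""
    return sum(len(row) for row in rows)


print(render(ROWS))
print(storage_bits(ROWS), "bits")
```

The image is 9 pixels wide and 6 pixels high, so it needs 54 bits in black and white mode.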


26

Resolution refers to the number of pixels in the width and height of the image. The more pixels there are in the image the higher the resolution.

A typical 15" TFT screen could have a resolution of 1024 x 768 = 786,432 pixels

Bit depth refers to the number of bits needed to represent the colour of each pixel. Greyscale simply means shades of grey and so each shade needs its own code.

A 2 colour image would require a 1 bit code, e.g.

0 = red

1 = green


27

A 16 colour image would need a 4 bit code (16 = 2^4).

0000 = red
0001 = green
0010 = blue
0011 = yellow
0100 = orange
0101 = etc.
0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111

Increasing the number of colours that are available increases the size of the code for each colour.

Bit depth x (No of bits in code) | No of colours available = 2^x
1 | 2
4 | 16
8 | 256
16 | 65,536
24 (true colour) | 16 million


28

Here is an example of how to calculate memory requirements for an image on a screen 800 x 600 using 16 million colours.

Number of pixels = 800 x 600 = 480000 pixels

Bit depth is 2^x = 16 million so bit depth = 24, i.e. you need a 24 bit code to represent the colour for each pixel.

The file size is 480000 x 24 bits = 11520000 bits

Divide by 8 to find the number of bytes. 11520000/8 = 1440000 bytes

Keep dividing by 1024 until you have an appropriate unit. 1440000/1024 = 1406.25 KB, and 1406.25/1024 = 1.4 MB (to one decimal place).
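The same calculation can be sketched in Python. The function names image_size_bytes and human_readable are assumptions made for this example, and the sketch ignores any file headers or compression that a real image format would add.

```python
def image_size_bytes(width: int, height: int, bit_depth: int) -> float:
    """Uncompressed bitmap size: pixels x bit depth, then bits to bytes."""
    pixels = width * height
    bits = pixels * bit_depth
    return bits / 8


def human_readable(size_bytes: float) -> str:
    """Keep dividing by 1024 until the number is below 1024."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    size = float(size_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            return f"{size:.2f} {unit}"
        size /= 1024


size = image_size_bytes(800, 600, 24)
print(size)                  # 1440000.0 bytes
print(human_readable(size))  # 1.37 MB
```

To two decimal places the answer is 1.37 MB, which rounds to the 1.4 MB given above.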


29

Remember that the size of an image depends on the number of pixels and the bit depth.

1. Find the number of pixels.

2. Find the bit depth. (express answer in bits)

3. Multiply the pixels by the bit depth to give an answer in bits.

4. Divide by 8 to give the answer in bytes.

5. Keep dividing by 1024 to find the answer in KB, MB or GB.

Resolution | No of colours | File size
640 x 420 | 16 | 131.25 KB
800 x 600 | 65,536 | 937.5 KB
1024 x 768 | 256 | 768 KB


30

Sometimes you are given the bit depth in the question e.g. 24 bit colour. This makes the question easier.

If you are only told how many colours can be represented then unfortunately you have to calculate the bit depth using the equation:

2^x = number of colours

where x is the bit depth.

Use a calculator to do this if necessary.
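Finding x in 2^x = number of colours can also be done in Python. The sketch below uses a hypothetical function name, bit_depth_for, and returns the smallest whole number of bits that gives at least the requested number of colours.

```python
def bit_depth_for(colours: int) -> int:
    """Smallest x such that 2^x is at least the number of colours."""
    if colours < 2:
        raise ValueError("an image needs at least 2 colours")
    # The largest code needed is colours - 1; bit_length() counts its bits.
    return (colours - 1).bit_length()


for colours in (2, 16, 256, 65536, 16_000_000):
    print(colours, "colours need", bit_depth_for(colours), "bits")
```

For example, 16 colours need 4 bits and 16 million colours need 24 bits.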


31

A higher bit depth allows more colours so the quality of photographs etc. will improve.

Disadvantage: the file size will increase.

If asked to work out how many images can be stored on a backing storage medium then remember to round down your answer as you would want to store complete images. Here is a worked example:


32

How many 8.4 MB images can be stored on a 1 GB memory stick?

1. Make sure that each number is using the same units. So 1 GB = 1024 MB

2. Divide the capacity by the size of one image: 1024/8.4 = 121.904

3. Round down the answer. (Remember that you wouldn’t store a part of an image!)

You can store 121 images on a 1 GB memory stick.
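The worked example can be checked with a short Python sketch. The function name images_that_fit and its parameter names are made up for this illustration.

```python
import math


def images_that_fit(capacity_gb: float, image_mb: float) -> int:
    """How many complete images of a given size fit on a storage device."""
    capacity_mb = capacity_gb * 1024  # convert so both values are in MB
    # Round down: part of an image is no use.
    return math.floor(capacity_mb / image_mb)


print(images_that_fit(1, 8.4))
```

This prints 121, matching the answer above.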


33

Bit Map graphics - Advantages

1. You can edit individual pixels in the image.

2. It is easy to draw freehand shapes.

Bit Map graphics - Disadvantages

1. File sizes are large as the content of every pixel has to be stored (even blank (background) pixels).

2. Resolution dependent - when a graphic is created at a particular resolution it cannot then take advantage of a higher resolution device. It becomes "blocky" if enlarged.

3. It is difficult to manipulate shapes on the screen. (e.g. move, scale, rotate or layer)


34

A graphic is seen as being made up of a series of objects.

A mathematical description of each object is stored as a set of instructions or formulae.

A straight line can be stored as a set of two co-ordinate pairs, a line colour, thickness, pattern and layer.

A square has four co-ordinate pairs (one for each corner), line colour, thickness, pattern, fill pattern and layer.

This information allows the objects to be represented accurately.
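One way to picture a stored object is as a small record of attributes. The Python sketch below is only an illustration: the class name Line, its field names and the scale function are all assumptions, and real vector packages use their own formats.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Line:
    """A straight line described by its attributes, not by its pixels."""
    start: tuple[float, float]
    end: tuple[float, float]
    colour: str
    thickness: float
    pattern: str
    layer: int


def scale(line: Line, factor: float) -> Line:
    """Return a copy of the line with both end points scaled."""
    return replace(
        line,
        start=(line.start[0] * factor, line.start[1] * factor),
        end=(line.end[0] * factor, line.end[1] * factor),
    )


line = Line(start=(0, 0), end=(10, 5), colour="black",
            thickness=1, pattern="solid", layer=0)
print(scale(line, 2))
```

Scaling only changes two pairs of numbers, however large the line is drawn. In this sketch the thickness is deliberately left unchanged.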


35

Vector graphics - Advantages

1. Resolution independent - a graphic created at a particular resolution can take advantage of a higher resolution device. It will still look in proportion.

2. It is easy to manipulate shapes on the screen. (e.g. move, scale, rotate or layer)

3. File sizes are generally smaller as values do not need to be held for every pixel.

4. Objects can be grouped to form larger objects that can then be manipulated as a single object.

Vector graphics - Disadvantages

1. It is difficult to represent freehand shapes as the computer needs to describe them mathematically.

2. You cannot edit individual pixels.


36

Bit mapped & vector graphics - File size

Bit-mapped - At any given resolution and bit depth, the file size will be the same. It doesn’t matter what is actually on the screen. The content of every pixel has to be stored.

Vector - The more objects there are on the screen the bigger the file size will be.


37

Graphics on screen and at the printer

Bit mapped and Vector are different ways of representing graphics in RAM and on disk.

It is important to remember that monitors and printers always display a graphic as a bit-map.

A vector graphic has to be converted into a bit map before it is displayed on the screen or printed out. This is called rasterising or rendering.

Bit mapped packages often have the word Paint or Photo associated with them, e.g. Adobe Photoshop.

Vector packages often contain the words Draw or Design, e.g. Corel Draw.
