Storage of Digital Data Within a Computer System

Understanding the elements of computer storage will enable a GIS user to design optimum storage for different types of data.

2.1 Bits

Computers function on two basic states, on and off. The smallest unit of storage is called a bit (short for binary digit). Each bit can have one of 2 values: "on" (indicated by the value 1) and "off" (indicated by the value 0). Bits are grouped together in sets of eight, called bytes.

2.2 Binary Systems

Computers use a binary system for storing numbers. In a binary system, the only figures are 1 and 0. Binary systems are best explained by comparison to the familiar decimal system. (A decimal system uses 10 figures.)

  o In a decimal system, the digits 206 represent the number that is made up of 2 lots of 10² plus 0 lots of 10¹ plus 6 lots of 10⁰ (from high school mathematics: 10² is 100, 10¹ is 10, and 10⁰ is 1).
  o In a binary system, the digits 101 represent the number that is made up of 1 lot of 2² plus 0 lots of 2¹ plus 1 lot of 2⁰ (2² is 4, 2¹ is 2, and 2⁰ is 1, so the number is 4 + 1 = 5).

Counting from 1 to 10 in binary gives the following series of numbers:

1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010

Binary and decimal systems are just 2 number systems: potentially there are many others that could be used. Two others, octal and hexadecimal, are common because they are also used in computing. Table 1 gives the numbers for counting from 1 to 20 in these systems.

2.3 Bytes

One byte of storage is 8 bits, and so can hold integer numbers in the range 0 to 255.

  o (Integer numbers are numbers that don't have decimal points.)
  o The number 255 is the limit, because it is the binary number 11111111 (8 1's), which equals 1 (which is 2⁰), plus 2 (which is 2¹), plus 4 (2²), plus 8 (2³), plus 16, plus 32, plus 64, plus 128.

This is a very useful range of data. Much (but certainly not all) non-spatial data in a GIS falls in this range. For example:

1. Even in complex forested areas, tree species can usually be allotted a discrete code within this range. BUT elevation often falls outside this range.
2. Numbers can be stored as character data. For example, it is useful to be able to store lot numbers for land parcels.

  o A number stored as a character string will be stored as a series of characters.
  o It is not usually possible to use numbers stored as character strings in mathematical operations, such as addition.

If a character string includes spaces (the ASCII code 32), it is necessary to use a terminator to indicate the extent of the string. Different software uses different terminators; for example, some use single quotes (') and others use double quotes (").

Stack and heaps

What goes inside when you declare a variable?

When you declare a variable in a .NET application, it allocates some chunk of memory in the RAM. [...]

Now the execution moves to the next step. As the name says stack, it stacks this memory allocation on top of the first memory allocation. You can think about the stack as a series of compartments or boxes put on top of each other.

Memory allocation and de-allocation is done using LIFO (Last In First Out) logic. In other words, memory is allocated and de-allocated at only one end of the memory, i.e., the top of the stack.

Line 3: In line 3, we have created an object. When this line is executed, it creates a pointer on the stack and the actual object is stored in a different type of memory location called the 'Heap'. The 'Heap' does not track running memory; it's just a pile of objects which can be reached at any moment of time. The heap is used for dynamic memory allocation.

One more important point to note here is that reference pointers are allocated on the stack. The statement Class1 cls1; does not allocate memory for an instance of Class1; it only allocates a stack variable cls1 (and sets it to null). The time it hits the new keyword, it allocates on the "heap".

Exiting the method (the fun): Now finally the execution control starts exiting the method.
When it passes the end control, it clears all the memory variables which are assigned on the stack. In other words, all variables which are related to the int data type are de-allocated in 'LIFO' fashion from the stack.

The big catch: it did not de-allocate the heap memory. This memory will be de-allocated later by the garbage collector.

Now many of our developer friends must be wondering why there are two types of memory. Can't we just allocate everything on one memory type and be done?

If you look closely, primitive data types are not complex; they hold single values like 'int i = 0'. Object data types are complex; they reference other objects or other primitive data types. In other words, they hold references to multiple other values, and each one of them must be stored in memory. Object types need dynamic memory while primitive ones need static memory. If the requirement is for dynamic memory, it's allocated on the heap, or else it goes on the stack.

Image taken from http://michaelbungartz.wordpress.com/

Value types and reference types

Now that we have understood the concept of stack and heap, it's time to understand the concept of value types and reference types. Value types are types which hold both data and memory in the same location. A reference type has a pointer which points to the memory location.

Below is a simple integer data type with name i whose value is assigned to another integer data type with name j. Both these memory values are allocated on the stack.

When we assign the int value to the other int value, it creates a completely different copy. In other words, if you change either of them, the other does not change. These kinds of data types are called 'Value types'.

When we create an object and when we assign an object to another object, they both point to the same memory location as shown in the below code snippet.
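A rough sketch of the two behaviours just described, written in Python purely for illustration. The class name `Sample` is hypothetical, and Python has no .NET-style value types, so plain integer assignment is used here only to imitate copy semantics.

```python
class Sample:
    """Hypothetical class standing in for the article's object type."""

    def __init__(self, value):
        self.value = value


# Copy semantics: i and j hold independent values.
i = 10
j = i        # j starts with the same value as i
j = 20       # changing j does not change i
print(i, j)

# Reference semantics: obj and obj1 name the same object.
obj = Sample(10)
obj1 = obj           # no new object is created here
obj1.value = 20      # the change is visible through both names
print(obj.value, obj1.value)
print(obj is obj1)
```

After the last assignment, both names report the new value because only one object exists; the integers, by contrast, end up with different values.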
So when we assign obj to obj1, they both point to the same memory location.

In other words, if we change one of them, the other object is also affected; this is termed 'Reference types'.

In .NET, depending on the data type, the variable is either assigned on the stack or on the heap. 'String' and 'Objects' are reference types, and any other .NET primitive data types are assigned on the stack. The figure below explains the same in a more detailed manner.

Moving Data from Stack to Heap: Boxing and unboxing

Wow, you have been given so much knowledge, so what's the use of it in actual programming? One of the biggest implications is to understand the performance hit which is incurred due to data moving from stack to heap and vice versa.

Consider the below code snippet. When we move a value type to a reference type, data is moved from the stack to the heap. When we move a reference type to a value type, the data is moved from the heap to the stack.

This movement of data from the heap to the stack and vice versa creates a performance hit.

When the data moves from value types to reference types, it is termed 'Boxing' and the reverse is termed 'UnBoxing'.

If you compile the above code and see the same in ILDASM, you can see in the IL code how 'boxing' and 'unboxing' look. The figure below demonstrates the same.

Performance implication of boxing and unboxing

In order to see how the performance is impacted, we ran the below two functions 10,000 times. One function has boxing and the other function is simple. We used a stopwatch object to monitor the time taken.

The boxing function was executed in 3542 ms while without boxing, the code was executed in 2477 ms. In other words, try to avoid boxing and unboxing. In a project where you need boxing and unboxing, use it only when it's absolutely necessary.

With this article, sample code is attached which demonstrates this performance implication.

Currently I have not included source code for unboxing but the same holds true for it.
You can write code and experiment with it using the Stopwatch class.

Numeric Data Types (Visual Basic)

Visual Basic supplies several numeric data types for handling numbers in various representations. Integral types represent only whole numbers (positive, negative, and zero), and nonintegral types represent numbers with both integer and fractional parts.

For a table showing a side-by-side comparison of the Visual Basic data types, see Data Type Summary (Visual Basic).

Integral Numeric Types

Integral data types are those that represent only numbers without fractional parts.

The signed integral data types are SByte (8-bit), Short (16-bit), Integer (32-bit), and Long (64-bit). If a variable always stores integers rather than fractional numbers, declare it as one of these types.

The unsigned integral types are Byte (8-bit), UShort (16-bit), UInteger (32-bit), and ULong (64-bit). If a variable contains binary data, or data of unknown nature, declare it as one of these types.

Performance

Arithmetic operations are faster with integral types than with other data types. They are fastest with the Integer and UInteger types in Visual Basic.

Large Integers

If you need to hold an integer larger than the Integer data type can hold, you can use the Long data type instead. Long variables can hold numbers from -9,223,372,036,854,775,808 through 9,223,372,036,854,775,807. Operations with Long are slightly slower than with Integer.

If you need even larger values, you can use the Decimal data type. You can hold numbers from -79,228,162,514,264,337,593,543,950,335 through 79,228,162,514,264,337,593,543,950,335 in a Decimal variable if you do not use any decimal places.
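The limits quoted for the integral types follow directly from their bit widths. A minimal sketch in Python (chosen only for illustration; the helper names `signed_range` and `unsigned_range` are hypothetical) that derives them, assuming two's-complement storage for the signed types and a 96-bit integer magnitude plus a sign for Decimal:

```python
def signed_range(bits):
    """Smallest and largest value of a two's-complement integer of this width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def unsigned_range(bits):
    """Smallest and largest value of an unsigned integer of this width."""
    return 0, 2 ** bits - 1


# Visual Basic signed type names paired with their widths in bits.
for name, bits in [("SByte", 8), ("Short", 16), ("Integer", 32), ("Long", 64)]:
    low, high = signed_range(bits)
    print(f"{name:8} {low:,} through {high:,}")

# The unsigned counterparts use the same widths without a sign bit.
for name, bits in [("Byte", 8), ("UShort", 16), ("UInteger", 32), ("ULong", 64)]:
    low, high = unsigned_range(bits)
    print(f"{name:8} {low:,} through {high:,}")

# Decimal with no decimal places: a 96-bit magnitude that can be negated.
decimal_max = 2 ** 96 - 1
print(f"Decimal  {-decimal_max:,} through {decimal_max:,}")
```

Each extra bit doubles the number of representable values, which is why dropping the sign bit roughly doubles the largest positive value an unsigned type can hold.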
However, operations with Decimal numbers are considerably slower than with any other numeric data type.

Small Integers

If you do not need the full range of the Integer data type, you can use the Short data type, which can hold integers from -32,768 through 32,767. For the smallest integer range, the SByte data type holds integers from -128 through 127. If you have a very large number of variables that hold small integers, the common language runtime can sometimes store your Short and SByte variables more efficiently and save memory consumption. However, operations with Short and SByte are somewhat slower than with Integer.

Unsigned Integers

If you know that your variable never needs to hold a negative number, you can use the unsigned types Byte, UShort, UInteger, and ULong. Each of these data types can hold a positive integer twice as large as its corresponding signed type (SByte, Short, Integer, and Long). In terms of performance, each unsigned type is exactly as efficient as its corresponding signed type. In particular, UInteger shares with Integer the distinction of being the most efficient of all the elementary numeric data types.

Nonintegral Numeric Types

Nonintegral data types are those that represent numbers with both integer and fractional parts.

The nonintegral numeric data types are Decimal (128-bit fixed point), Single (32-bit floating point), and Double (64-bit floating point). They are all signed types. If a variable can contain a fraction, declare it as one of these types.

Decimal is not a floating-point data type. Decimal numbers have a binary integer value and an integer scaling factor that specifies what portion of the value is a decimal fraction.

You can use Decimal variables for money values.
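A small comparison suggests why a decimal type suits money values. This sketch uses Python for illustration only: its built-in `float` is a 64-bit binary floating-point number like Double, while its `decimal.Decimal` is a decimal type that is similar in spirit to, but not the same as, the Visual Basic Decimal.

```python
from decimal import Decimal

# Three payments of 0.10 added as binary floating-point numbers.
float_total = 0.10 + 0.10 + 0.10
print(float_total)
print(float_total == 0.30)   # False: 0.10 has no exact binary representation

# The same sum with a decimal type keeps the exact value.
decimal_total = Decimal("0.10") + Decimal("0.10") + Decimal("0.10")
print(decimal_total)
print(decimal_total == Decimal("0.30"))  # True
```

Note that the decimal values are built from strings; constructing them from floats would carry the binary rounding error along.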
The advantage is the precision of the values. The Double data type is faster and requires less memory, but it is subject to rounding errors. The Decimal data type retains complete accuracy to 28 decimal places.

Floating-point (Single and Double) numbers have larger ranges than Decimal numbers but can be subject to rounding errors. Floating-point types support fewer significant digits than Decimal but can represent values of greater magnitude.

Nonintegral number values can be expressed as mmmEeee, in which mmm is the mantissa (the significant digits) and eee is the exponent (a power of 10). The highest positive values of the nonintegral types are 7.9228162514264337593543950335E+28 for Decimal, 3.4028235E+38 for Single, and 1.79769313486231570E+308 for Double.

Performance

Double is the most efficient of the fractional data types, because the processors on current platforms perform floating-point operations in double precision. However, operations with Double are not as fast as with the integral types such as Integer.

Small Magnitudes

For numbers with the smallest possible magnitude (closest to 0), Double variables can hold numbers as small as -4.94065645841246544E-324 for negative values and 4.94065645841246544E-324 for positive values.

Small Fractional Numbers

If you do not need the full range of the Double data type, you can use the Single data type, which can hold floating-point numbers from -3.4028235E+38 through 3.4028235E+38. The smallest magnitudes for Single variables are -1.401298E-45 for negative values and 1.401298E-45 for positive values.
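The limits quoted for Single and Double are those of the IEEE 754 32-bit and 64-bit formats, so they can be checked from any language that exposes those formats. A sketch in Python (an illustration only; the variable names are hypothetical) that rebuilds the extreme values from their bit patterns:

```python
import struct
import sys

# Largest finite 32-bit float: exponent field 0xFE with all mantissa bits set.
single_max = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]
# Smallest positive 32-bit float: only the lowest mantissa bit set.
single_min = struct.unpack("<f", struct.pack("<I", 0x00000001))[0]

# Python's float is a 64-bit double, so its largest value matches Double.
double_max = sys.float_info.max
# Smallest positive 64-bit float, again built from its bit pattern.
double_min = struct.unpack("<d", struct.pack("<Q", 1))[0]

print(f"Single max {single_max:.7E}")
print(f"Single min {single_min:.6E}")
print(f"Double max {double_max:.16E}")
print(f"Double min {double_min:.14E}")
```

The negative limits are the same bit patterns with the sign bit set, which is why each range is symmetric around zero.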
If you have a very large number of variables that hold small floating-point numbers, the common language runtime can sometimes store your Single variables more efficiently and save memory consumption.

Character Data Types (Visual Basic)

Visual Basic provides character data types to deal with printable and displayable characters. While they both deal with Unicode characters, Char holds a single character whereas String contains an indefinite number of characters.

For a table that displays a side-by-side comparison of the Visual Basic data types, see Data Type Summary (Visual Basic).

Char Type

The Char data type is a single two-byte (16-bit) Unicode character. If a variable always stores exactly one character, declare it as Char. For example:

    ' Initialize the prefix variable to the character 'a'.
    Dim prefix As Char = "a"

Each possible value in a Char or String variable is a code point, or character code, in the Unicode character set. Unicode characters include the basic ASCII character set, various other alphabet letters, accents, currency symbols, fractions, diacritics, and mathematical and technical symbols.

Note: The Unicode character set reserves the code points D800 through DFFF (55296 through 57343 decimal) for surrogate pairs. A Char variable cannot hold a surrogate pair, and a String uses two positions to hold such a pair.

For more information, see Char Data Type (Visual Basic).

String Type

The String data type is a sequence of zero or more two-byte (16-bit) Unicode characters. If a variable can contain an indefinite number of characters, declare it as String. For example:

    ' Initialize the name variable to "Monday".
    Dim name As String = "Monday"

Miscellaneous Data Types (Visual Basic)

Visual Basic supplies several data types that are not oriented toward numbers or characters. Instead, they deal with specialized data such as yes/no values, date/time values, and object addresses.

For a table showing a side-by-side comparison of the Visual Basic data types, see Data Type Summary (Visual Basic).

Boolean Type

The Boolean data type is an unsigned value that is interpreted as either True or False. Its data width depends on the implementing platform. If a variable can contain only two-state values such as true/false, yes/no, or on/off, declare it as Boolean.

Date Type

The Date data type is a 64-bit value that holds both date and time information. Each increment represents 100 nanoseconds of elapsed time since the beginning (12:00 AM) of January 1 of the year 1 in the Gregorian calendar. If a variable can contain a date value, a time value, or both, declare it as Date.

Object Type

The Object data type is a 32-bit address that points to an object instance within your application or in some other application. An Object variable can refer to any object your application recognizes, or to data of any data type. This includes both value types, such as Integer, Boolean, and structure instances, and reference types, which are instances of objects created from classes such as String and Form, and array instances.

If a variable stores a pointer to an instance of a class that you do not know at compile time, or if it can point to data of various data types, declare it as Object.

The advantage of the Object data type is that you can use it to store data of any data type. The disadvantage is that you incur extra operations that take more execution time and make your application perform slower. If you use an Object variable for value types, you incur boxing and unboxing.
If you use it for reference types, you incur late binding.

Understand Computer Decision Structures

Control Structures (loops, ifs, and switch)

Welcome to my tutorial on Control Structures (or constructs for short). In this tutorial, we shall go through each control structure in turn, and then we shall finish by demonstrating them in use with a basic example.

Why is this important to learn about?

Control structures are one of the most fundamental concepts in any programming language you will come across. If you want to know C#, and if you want to learn to program with any language, you must have a firm grip on the control structures of the language. Generally speaking, if you learn them in one language, you will pick them up in different languages very quickly.

Definitions of terms used

Iteration: This is the act of repeating something (that 'something' being code statements in the programming context).

Conditional: A conditional action is said to be one that is only performed if a certain condition is true.

Note: All examples were created using Visual Studio 2010, targeting the .NET Framework 4.0. We'll do our best to point out anything that might not work in older versions.

Before we begin, you should have a reasonable understanding of:

  o Data Types
  o Operators

A basic knowledge of methods would be useful, but is not required.

Note that all examples are written in a ConsoleApplication's Main() method.

Iteration Constructs

Iteration constructs (otherwise known as loops) allow us to repeat code in a cyclic fashion. There are 4 types of iteration constructs in C#, each of which is covered next.

For Loops

For loops allow us to specify the number of times to repeat a block of code. This is best demonstrated with an example:

    for (int i = 1; i <= 5; i++)
    {
        Console.WriteLine(i);
    }

This basic example prints out all the numbers from 1 to 5. How does it do this? I shall go through what happens when the loop executes, step by step:

1. An integer variable called 'i' is declared and set equal to 1 in this line: int i = 1. This is called the initialisation variable.
2. The condition i <= 5 is checked. If 'i' is less than or equal to 5, then the body of the loop is executed. Otherwise, the loop terminates. 1 is less than 5, so the condition evaluates to true, and the body (the code statements in the curly braces) of the loop executes.
3. The body is executed once, and the value of 'i' is printed out, which, of course, is currently 1. Next, this statement is executed: i++. This means 'increment the value of i by 1', so 'i' now equals 2.
4. Now, the condition i <= 5 is evaluated again. 'i' is now equal to 2, which is less than 5, so the loop body executes again, printing out 2.
5. i++ is then executed, meaning 'i' is now 3. i <= 5 evaluates to true as 3 is less than 5, so "3" gets printed out to the console.
6. i++ is executed again, meaning 'i' equals 4, which is still less than 5, so the loop body executes again, leading to 4 being printed.
7. i++ is executed once again, meaning 'i' equals 5. i <= 5 is testing if 'i' is less than OR equal to 5. 'i' is now equal to 5, so it still evaluates to true, meaning the loop body is executed again, thus printing out 5.
8. i++ is executed again, meaning 'i' now equals 6. Now, i <= 5 evaluates to false. Therefore, the loop terminates as the terminating condition has evaluated to false. The loop doesn't get executed any more, and the program can continue to execute statements that come after the loop.

Thus, we see the numbers from 1 to 5 printed out to the console in this example.

Note that you can use any condition as the terminating condition, as long as it evaluates to true or false.

A few points to note:

  o The i++ statement can be changed to any other valid statement that modifies the variable 'i'. A few examples include i-- (decrement the variable by 1), i += 2 (add 2 to the variable), etc.
  o In our example, the scope of the variable 'i' is limited to the body of the for loop. So, if we did this:

    for (int i = 1; i