ARM Microcontroller Code Size (Full)

download ARM Microcontroller Code Size (Full)

of 16

description

jjjjj

Transcript of ARM Microcontroller Code Size (Full)

  • ARMMicrocontrollerCodeSizeAnalysis|Overview 1

    32BitMicrocontrollerCodeSizeAnalysisDraft1.2.4.JosephYiu,AndrewFrame

    OverviewMicrocontrollerapplicationprogramcodesizecandirectlyaffectthecostandpowerconsumptionofproductsthereforeitisalmostalwaysviewedasanimportantfactorintheselectionofamicrocontrollerforembeddedprojects.Sincethereleaseandavailabilityof32bitprocessorssuchastheARMCortexM3,moreandmoremicrocontrollerusershavediscoveredthebenefitsofswitchingto32bitproductslowerpower,greaterenergyefficiency,smallercodesizeandmuchbetterperformance.Whilstmostofthebenefitsofusing32bitmicrocontrollersarewidelyknown,thecodesizeadvantageof32bitmicrocontrollersislessobvious.

    Inthisarticlewewillexplainwhy32bitmicrocontrollerscanreduceapplicationcodesizewhilststillachievinghighsystemperformanceandeaseofuse.

    Typicalmythsofprogramsize

    Myth#1:8bitand16bitmicrocontrollershavesmallercodesizeThereisacommonmisconceptionthatswitchingfroman8bitmicrocontrollertoa32bitmicrocontrollerwillresultinmuchbiggercodesizewhy?Manypeoplehavetheimpressionthat8bitmicrocontrollersuse8bitinstructionsand32bitmicrocontrollersuse32bitinstructions.Thisimpressionisoftenreinforcedbyslightlymisleadingmarketingfromthe8bitand16bitmicrocontrollervendors.

    Inreality,manyinstructionsin8bitmicrocontrollersare16bit,24bitsorothersizeslargerthan8bit,forexample,thePIC18instructionsizesare16bitand,withthe8051architecture,althoughsomeinstructionsare1bytelong,manyothersare2or3byteslong.

    Sowouldcodesizebebettermovingtoa16bitmicrocontroller?Notnecessarily.TakingtheMSP430asanexample,asingleoperandinstructioncantake4bytes(32bits)andadoubleoperandinstructioncantake6bytes(48bits).Intheworstcase,anextendedimmediate/indexinstructioninMSP430Xcantake8bytes(64bits).

    SohowaboutthecodesizeforARMCortexmicrocontrollers?TheARMCortexM3andCortexM0processorsarebasedonThumb2technology,whichprovidesexcellentcodedensity.Thumb2microcontrollershave16bitinstructionsaswellas32bitinstructions,withthe32bitinstructionfunctionalityasupersetofthe16bitversion.InmostcasesaCcompilerwillusethe16bitversionoftheinstruction.The32bitversionwouldonlybeusedwhentheoperationcannotbeperformed

  • ARMMicrocontrollerCodeSizeAnalysis|Typicalmythsofprogramsize 2

    witha16bitinstruction.Asaresult,mostoftheinstructionsinanARMCortexmicrocontrollerprogramare16bits.Thatsevensmallerthansomeoftheinstructionsin8bitmicrocontrollers.

    Instruction size

    8051Min

    Max

    Number of bits

    PIC18 MSP430 / MSP430X

    Min

    Max

    ARM

    Min

    Max

    PIC24

    16

    32

    48

    64

    Figure1:Sizeofasingleinstructioninvariousprocessors

    WithinacompiledprogramforCortexMprocessors,thenumberof32bitinstructionscanbeonlyasmallportionofthetotalinstructioncount.Forexample,theamountof32bitinstructionsintheDhrystoneprogramimageisonly15.8%ofthetotalinstructioncount(averageinstructionsizeis18.53bits)whencompiledfortheCortexM3.FortheCortexM0theratioof32bitinstructionsisevenlowerat5.4%(averageinstructionsize16.9bits).

    Myth#2:Myapplicationonlyprocesses8bitdataand16bitdataManyembeddeddevelopersthinkthatiftheirapplicationonlyprocesses8bitdatathenthereisnobenefitinswitchingtoa32bitmicrocontroller.However,lookingintotheoutputfromtheCcompilercarefully,inmostcasesthehumbleintegerdatatypeisactually16bits.Sowhenyouhaveaforloopwithanintegerasloopindex,comparingavaluetoanintegervalue,orusingaClibraryfunctionthatusesaninteger(e.g.memcpy()),youareactuallyusing16bitorlargerdata.Thiscanaffectcodesizeandperformanceinvariousways:

    Foreachintegercomputation,an8bitprocessorwillneedmultipleinstructionstocarryouttheoperations.Thisdirectlyincreasesthecodesizeandtheclockcyclecount.

    Iftheintegervaluehastobesavedintomemory,orifyouneedtoloadanimmediatevaluefromprogramROMtothisinteger,itwilltakemultipleinstructionsandmultipleclockcycles.

    Sinceanintegercantakeuptwo8bitregisters,moreregistersarerequiredtoholdthesamenumberofintegervariables.Whenthereareaninsufficientnumberofregistersintheregisterbanktoholdlocalvariables,somehavetobestoredinmemory.Thusan8bitmicrocontrollermightresultinmorememoryaccesseswhichincreasescodesizeandreducesperformanceandpowerefficiency.Thesameissueappliestotheprocessingof32bitdataon16bitmicrocontrollers.

  • ARMMicrocontrollerCodeSizeAnalysis|Typicalmythsofprogramsize 3

    Sincemoreregistersarerequiredtoholdanintegerinan8bitmicrocontrollerwhenpassingvariablestoafunctionviathestack,orsavingregistercontentsduringcontextswitchingorinterruptservicing,thenumberofstackoperationsrequiredismorethanthatof32bitmicrocontrollers.Thisincreasestheprogramsize,andcanalsoaffectinterruptlatencybecauseanInterruptServiceRoutine(ISR)mustmakesurethatallregistersusedaresavedatISRentryandrestoredatISRexit.Thesameissueappliestotheprocessingof32bitdataon16bitmicrocontrollers.

    Thereisevenmorebadnewsfor8bitmicrocontrollerusers:memoryaddresspointerstakemultiplebytessodataprocessinginvolvingtheuseofpointerscanthereforebeextremelyinefficient.

    Myth#3:A32bitprocessorisnotefficientathandling8bitand16bitdataMost32bitprocessorsareactuallyveryefficientathandling8bitand16bitdata.Compactmemoryaccessinstructionsforsignedandunsigned8bit,16bitand32bitdataareallavailable.Therearealsoanumberofinstructionsspeciallyincludedfordatatypeconversions.Overallthehandlingof8bitand16bitdatain32bitprocessorssuchastheARMCortexmicrocontrollersisjustaseasyandefficientashandling32bitdata.

    Myth#4:ClibrariesforARMprocessorsaretoobigTherearevariousClibraryoptionsforARMprocessors.Formicrocontrollerapplications,anumberofcompilervendorshavedevelopedClibrarieswithamuchsmallerfootprint.Forexample,theARMdevelopmenttoolshaveasmallerversionoftheClibrarycalledMicroLib.TheseClibrariesareespeciallydesignedformicrocontrollersandallowapplicationcodesizetobesmallandefficient.

    Myth#5:InterrupthandlingonARMmicrocontrollersismorecomplexOntheARMCortexmicrocontrollerstheinterruptserviceroutinesarejustnormalCsubroutines.VectoredornestedinterruptsaresupportedbytheNestedVectoredInterruptController(NVIC)withnoneedforsoftwareintervention.Infactthesetupprocessandprocessingofaninterruptrequestismuchsimplerthan8bitand16bitmicrocontrollers,asgenerallyyouonlyneedtoprogramtheprioritylevelofaninterruptandthenenableit.

    Theinterruptvectorsarestoredinavectortableinthebeginningofthememory,normallywithintheflash,withouttheneedforanysoftwareprogrammingsteps.WhenaninterruptrequesttakesplacetheprocessorautomaticallyfetchesthecorrespondinginterruptvectorandstartstoexecutetheISR.Someoftheregistersarepushedtothestackbyahardwaresequenceandrestoredautomaticallywhentheinterrupthandlerexits.TheotherregistersthatarenotcoveredbythehardwarestackingsequencearepushedontothestackbyCcompilergeneratedcodeonlyiftheregisterisusedandmodifiedwithintheISR.

  • ARMMicrocontrollerCodeSizeAnalysis|Typicalmythsofprogramsize 4

    Whataboutmovingto16bitmicrocontrollers?16bitmicrocontrollerscanbeefficientinhandling16bitintegersand8bitdata(e.g.strings)howeverthecodesizeisstillnotasoptimalasusing32bitprocessors:

    Handlingof32bitdata:iftheapplicationrequireshandlingofanylonginteger(32bit)orfloatingpointtypesthentheefficiencyof16bitprocessorsisgreatlyreducedbecausemultipleinstructionsarerequiredforeachprocessingoperation,aswellasdatatransfersbetweentheprocessorandthememory.

    Registerusage:Whenprocessing32bitdata,16bitprocessorsrequirestworegisterstoholdeach32bitvariable.Thisreducesthenumberofvariablesthatcanbeheldintheregisterbank,hencereducingprocessingspeedaswellasincreasingstackoperationsandmemoryaccesses.

    Memoryaddressingmode:Many16bitarchitecturesprovideonlybasicaddressingmodessimilarto8bitarchitectures.Asaresult,thecodedensityispoorwhentheyareusedinapplicationsthatrequireprocessingofcomplexdatasets.

    64Kbyteslimitation:Many16bitprocessorsarelimitedto64Kbytesofaddressablememoryreducingthefunctionalityoftheapplication.Some16bitarchitectureshaveextensionstoallowmorethan64Kbytesofmemorytobeaccessed,however,theseextensionshaveaninstructioncodeandclockcycleoverhead,forexample,amemorypointerwouldbelargerthan16bitsandmightrequiremultipleinstructionsandmultipleregisterstoprocessit.

  • ARMMicrocontrollerCodeSizeAnalysis|InstructionSetefficiency 5

    InstructionSetefficiencyWhencustomersporttheirapplicationsfrom8bitarchitecturetoARMCortexmicrocontrollers,theyveryoftenfindthatthetotalcodehasdramaticallydecreased.Forexample,whenMelfas(aleadingcompanyincapacitivesensingtouchscreencontrollers)evaluatedtheCortexM0processor,theyfoundthattheCortexM0programsizewaslessthanhalfofthatofthe8051and,atthesametime,deliveredfivetimesmoreperformanceatthesameclockfrequency.This,forexample,couldenablethemtoruntheapplicationat1/5clockspeedoftheequivalent8051product,reducingthepowerconsumption,andloweringproductcostatthesametimeduetoasmallerprogramflashsizerequirements.

    SohowdoesARMarchitectureprovidesuchbigadvantages?ThekeyfactorisThumb2technologywhichprovidesahighlyefficientunifiedinstructionset.

    PowerfulAddressingmodeTheARMCortexmicrocontrollerssupportanumberofaddressingmodesformemorytransferinstructions.Forexample:

    Immediateoffset(Address=Registervalue+offset)

    Registeroffset((Address=Registervalue1+shifted(Registervalue2))

    PCrelated(Address=CurrentPCvalue+offset)

    Stackpointerrelated(Address=SP+offset)

    Multipleregisterloadandstore,withoptionalautomaticbaseaddressupdate

    PUSH/POPinstructionswithmultipleregisters

    Asaresultofthesevariousaddressingmodes,datatransferbetweenregistersandmemorycanbehandledwithfewerinstructions.SincethePUSHandPOPinstructionssupportmultipleregisters,inmostcases,savingandrestoringofregistersinafunctioncallwillonlyneedonePUSHinthebeginningoffunctionandonePOPattheendofthefunction.ThePOPcanevenbecombinedwiththereturninstructionattheendoffunctiontofurtherreducetheinstructioncount.

    ConditionalbranchesAlmostallprocessorsprovideconditionalbranchinstructionshoweverARMprocessorsprovideimprovedconditionalbranchingbyhavingseparatedbranchconditionsforsignedandunsigneddataoperationresults,andprovidingagoodbranchrange.

    Forexample,whencomparingtheconditionalbranchesoftheCortexM0andMSP430,theCortexM0hasmorebranchconditionsavailable,makingitpossibletogeneratemorecompactcodenomatterwhetherthedatabeingprocessissignedorunsigned.TheMSP430conditionalbranchesmightrequiremultipleinstructionstogetthesameoperations.

  • ARMMicrocontrollerCodeSizeAnalysis|InstructionSetefficiency 6

    Generallythesamesituationappliestomany8bitor16bitmicrocontrollerswhendealingwithsigneddata,additionalstepsmightalsoberequiredintheconditionalbranch.

    InadditiontothebranchinstructionsavailableintheCortexM0,theCortexM3processoralsosupportscompareandbranchinstructions(CBZandCBNZ).Thisfurthersimplifiessomeoftheconditionalbranchinstructionsequence.

    ConditionalExecutionAnotherareathatallowstheARMCortexM3microcontrollerstohavemorecompactcodeistheconditionalexecutionfeature.TheCortexM3supportsaninstructioncalledIT(IFTHEN).Thisinstructionallowsupto4subsequentinstructionstobeconditionallyexecutedreducingtheneedforadditionalbranches.Forexample,

    if(xpos1

  • ARMMicrocontrollerCodeSizeAnalysis|InstructionSetefficiency 7

    MOV.W R11, R14 MOV.W R12, R13 JMP Label2 Label1 MOV.W R11, R13 MOV.W R12, R14 Label2 ThisresultsinanextratwobytesfortheMSP430whencomparedtoCortexM3.

    MultiplyandDivideBoththeCortexM0andCortexM3processorssupportsinglecyclemultiplyoperations.TheCortexM3alsohasmultiplyandmultiplyaccumulateinstructionsfor32bitor64bitresults.Theseinstructionsgreatlyreducethecodesizerequiredwhenhandlingmultiplicationoflargevariables.

    Mostother8bitand16bitmicrocontrollersalsohavemultiplyinstructionshoweverthelimitationoftheregistersizeoftenmeansthatthemultiplicationrequiresmultiplesteps,iftheresultneedstobemorethan8or16bits.

    TheMSP430doesnothavemultiplyinstruction(MSP430documentslaa329,reference1).Tocarryoutmultiplicationeitheramemorymappedhardwaremultiplierisused,orthemultiplyoperationhastobehandledbysoftwareusingaddandshift.Evenifahardwaremultiplierispresentthememorymappednatureofthemultiplierresultsintheadditionaloverheadoftransferringdatatoandfromtheexternalhardware.Inaddition,usingthemultiplierwithinaninterrupthandlercouldcauseexistingdatainthemultipliertobelost.Asaresult,interruptsareusuallydisabledbeforeamultiplyoperationandtheinterruptisreenabledaftermultiplicationiscompleted.Thisaddsadditionalsoftwareoverheadandaffectsinterruptlatencyanddeterminism.

    TheCortexM3processoralsohasunsignedandsignedintegerdivideinstructions.ThisreducesthecodesizerequiredinapplicationsthatneedtoperformintegerdivisionbecausethereisnoneedfortheClibrarytoincludeafunctionforhandlingdivideoperations.

    PowerfulinstructionsetInadditionaltothestandarddataprocessing,memoryaccessandprogramcontrolinstructions,theCortexmicrocontrollersalsosupportanumberofotherinstructionstohelpdatatypeconversion.TheCortexM3processoralsosupportsanumberofbitfieldoperationsreducingthesoftwareoverheadin,forexample,peripheralcontrolandcommunicationdataprocessing.

  • ARMMicrocontrollerCodeSizeAnalysis|Breakingthe64Kbytememorybarrier 8

    Breakingthe64KbytememorybarrierAsalreadymentioned,many8bitand16bitmicrocontrollersarelimitedto64kbytesaddressablememory.Duetothenatureof8bitand16bitmicrocontrollerarchitecture,thecodingefficiencyofthesemicrocontrollersoftendecreasesdramaticallywhentheapplicationexceedsthe64kbytememorybarrier.In8bitand16bitmicrocontrollers(e.g.8051,PIC24,C166)thisisoftenhandledbymemorybankswitchingormemorysegmentationwiththeswitchingcodegeneratedautomaticallybytheCcompilers.Everytimeafunctionordatainadifferentmemorypageisrequiredbankswitchingcodewouldbeneededandhencefurtherincreasestheprogramsize.

    Figure2:Increasecodesizeoverheadofmemorybankswitchingorsegmentationin8bitand16bitsystems

    Thememorybankswitchingnotonlycreateslargercodebutitalsogreatlyreducestheperformanceofasystem.Thisisespeciallythecaseifthedatabeingprocessedisondifferentmemorybank(e.g.copyingablockofdatafromonepagetoanotherpagecanbeverycostlyintermsofperformance.)Thisisparticularlyinefficientfor8bitmicrocontrollerslikethe8051becausetheMCS51

    architecturedoesnothavepropersupportforsuchamemorybankswitchingfeature.ThereforememoryswitchinghastobecarriedoutbysavingandupdatingmemorybankcontrollikeI/Oportregisters.Inaddition,thememorypageswitchingcodeusuallyhastobecarriedoutinacongestedsharedmemoryspacewithlimitedsize.Atthesametimesomeofthememorypagesmightnotbefullyutilizedandmemoryspaceiswasted.

    Forthe8bitand16bitmicrocontrollersthatsupportmemoryofover64kthisoftencomesataprice.TheMSP430Xdesignovercomesthe64KbytesmemorybarrierbyincreasingtheProgramCounter(PC)andregisterwidthto20bits.Despitenomemorypagingbeinginvolved,thesizesofsomeMSP430XinstructionsareconsiderablylargerthantheoriginalMSP430.Forexample,whenthelargememorymodelisused,adoubleoperandformattedinstructioncantake8bytesratherthan6(a33%increases):

  • ARMMicrocontrollerCodeSizeAnalysis|Examples 9

    Op-code

    15 12 11 8

    Rsrc Ad

    7

    B/W

    6

    As

    5

    Rdst

    4 3 0

    Source or destination 15:0

    Destination 15:0

    MSP430 Double Operand intruction

    Op-code

    15 12 11 8

    Rsrc Ad

    7

    B/W

    6

    As

    5

    Rdst

    4 3 0

    Source or destination 15:0

    Destination 15:0

    MSP430X Double Operand

    intruction

    00011 Source 19:16 Destination 19:16A/L Rsrv

    Figure3:SupportoflargermemorysystemincreasesthesizeofsomeinstructionsinMSP430X

    Apartfromthesizeoftheinstructionitself,theuseofthe20bitaddressingalsoincreasesthenumberofstackoperationsrequired.Sincethememoryisonly16bit,thesavingofa20bitaddresspointerwillneedtwostackpushoperations,resultinginextrainstructionsandpoorutilizationofthestackmemory.

    Figure4:UseoflargememorydatamodelinMSP430Xincreasescodesize

    Asaresult,anMSP430Xapplicationhasalowercodedensitywhenthelargememorymodelisused,whichisrequiredwhentheaddressrangeexceedsthe64krange.

    InARMCortexmicrocontrollers,32bitlinearaddressingisusedtoprovide4GBofmemoryspaceforembeddedapplications.Thereforethereisnopagingoverheadandtheprogrammingmodeliseasytouse.

  • ARMMicrocontrollerCodeSizeAnalysis|Examples 10

    ExamplesTodemonstratethecodesizecomparedto8bitand16bitprocessors,anumberoftestcasesarecompiledandillustratedhere.ThetestsarebasedonMSP430CompetitiveBenchmarkdocumentfromTexasinstruments(SLAA205C,reference2).Theresultslistedhereshowtotalprogrammemorysizeinbytes.

    MSP430results:

    ThetestslistedarecompiledusingIAREmbeddedWorkbench4.20.1withhardwaremultiplerenabled,optimizationlevelsettoHighwithSizeoptimization.Unlessspecified,theSmalldatamodelisusedandtypedoubleis32bit.Theresultsareobtainedatlinkeroutputreport(CODE+CONST).

    ARMCortexprocessorresults:

    ThetestslistedarecompiledusingRealViewDevelopmentSuite4.0SP2.Optimizationlevelis3forsize,minimalvectortable,andMicroLIBisused.Theresultsareobtainedatlinkeroutputreport(VECTORS+CODE).

    Test GenericMSP430

    MSP430F5438 MSP430F5438largedatamodel

    CortexM3

    Math8bit 198 198 202 144

    Math16bit 144 144 144 144

    Math32bit 256 244 256 120

    MathFloat 1122 1122 1162 600

    Matrix2dim8bit 180 178 196 184

    Matrix2dim16bit 268 246 290 256

    Matrixmult 276 228 (linkererror) 228

    Switch8bit 200 218 218 160

    Switch16bit 198 218 218 160

    Firfilter(Note1) 1202 1170 1222 716(820withoutmodification)

    Dhry 923 893 1079 900

    Whet(Note2) 6434 6308 6614 4384(8496withoutmodification)

  • ARMMicrocontrollerCodeSizeAnalysis|Examples 11

    Note1:TheconstantdataarrayintheFirfiltertestismodifiedtouse16bitdatatypeontheCortexMprocessor(constunsignedshortintINPUT[]).

    Note2:Whencertainmathfunctionsareused(sin,cos,atan,sqrt,exp,log)intheARMCstandardthedoubleprecisionlibrariesareusedbydefault.Thiscanresultinsignificantlylargerprogramsizeunlessadjustmentsaremade.Inordertoachieveanequivalentcomparison,theprogramcodeiseditedsothatsingleprecisionversionsareused(sinf,cosf,atanf,sqrtd,expf,logf).Also,someoftheconstantdefinitionshavebeenadjustedtosingleprecision(e.g.1.0becomes1.0F).

    Figure5:Codesizecomparisonforbasicoperations

    Thetotalsizeforsimpletests(integermath,matrixandswitchtests)are:

    Summaryforsimpletests

    GenericMSP430 MSP430F5438 CortexM3

    Totalsize(bytes) 1720 1674 1396

    Advantage(%smaller) 2.6% 18.8%

    Forapplicationsusingfloatingpoint,thereusasignicantadvantageforCortexmicrocontrollers.,whereasDhrystoneprogramsizeiscloser.

  • ARMMicrocontrollerCodeSizeAnalysis|Examples 12

    Figure6:Codesizecomparisonforfloatingpointoperationsandbenchmarksuites

    Thetotalsizeforbenchmarkandfloatingpointtests(Dhrystone,Whetstone,FirfilterandMathFloat)are:

    Summaryforsimpletests

    GenericMSP430 MSP430F5438 CortexM3

    Totalsize(bytes) 9681 9493 6600

    Advantage(%smaller) 1.9% 31.8%

    Observations:

    1. Fromtheresults,wecanseethattheCortexmicrocontrollershavebettercodedensitycomparedtoMSP430inmostcases.TheremainingtestsshowsimilarcodedensitywhencomparedtoMSP430.

    2. Oneofthetests(firfilter)usesanintegerdatatypeforaconstantarray.Sinceanintegeris32bitintheARMprocessorandis16bitonMSP430,theprogramhasbeenmodifiedtoallowadirectcomparison.

    3. WhenthelargedatamemorymodelisusedwithMSP430,thecodesizeincreasesbyupto20%(dhrystone).

    4. WeareunabletoreproducealloftheclaimedresultsintheTexasInstrumentsdocument.ThismaybebecausethestorageofconstantdatainROMmighthavebeenomittedfromtheircodesizecalculations.

  • ARMMicrocontrollerCodeSizeAnalysis|Additionalinvestigationonfloatingpoint 13

    AdditionalinvestigationonfloatingpointWhenanalysingtheresultsofthewhetstonebenchmarkitbecameapparentthattheMSP430Ccompileronlygeneratedsingleprecisionfloatingoperations,whiletheARMCcompilergenerateddoubleprecisionoperationsforsomeofthemathfunctionsused.

    AfterchangingthecodetouseonlysingleprecisionfloatingpointsthecodesizereduceddramaticallyandresultedinmuchsmallercodesizethantheMSP430codesize.

    TheIARMSP430compilerhasanoptiontodefinefloatingpoint:Sizeoftypedoublewhichisbydefaultsetto32bit(singleprecision).Ifitissetto64bit(asinARMCcompiler),thecodesizeincreasedsignificantly.

    Programsize GenericMSP430 MSP430430F5438

    TypeDoubleis32bit 6434 6308

    TypeDoubleis64bit 11510 11798

    TheseresultsmatchthoseseenfortheARMCortexM3processor.

    Programsize CortexM3

    Whetstonemodifiedtousesingleprecisiononly 4384

    Outofboxcompileforwhetstone(usedoubleprecisionformathfunctions)

    8496

    Theoptionofsettingtypedoubleto32bitisquitesensibleforsmallmicrocontrollerapplicationswheretheCcodemightonlyneedtoprocesssourcedatageneratedfrom12bit/14bitADC.Benchmarkingusingdifferentdefaulttypescanmakeaverybigdifferenceandnotshowaccuratecomparativeresults.

  • ARMMicrocontrollerCodeSizeAnalysis|RecommendationsonhowtogetthesmallestcodesizewithCortexMmicrocontrollers

    14

    RecommendationsonhowtogetthesmallestcodesizewithCortexMmicrocontrollers

    UseMicroLibIntheARMdevelopmenttoolsthereisanoptiontousetheareaoptimizedMicroLIBratherthanthestandardClibraries.TheMicroLIBissuitableformostembeddedapplicationsandhasamuchsmallercodesizewhencomparedtothestandardClibrary.

    EnsuretheuseofareaoptimizationsTheperformanceofCortexMmicrocontrollersismuchhigherthanthatof16bitand8bitmicrocontrollerssowhenportingapplicationsfromthesemicrocontrollersyoucangenerallyselectthehighestareaoptimizationratherthanselectingoptimizationsforspeed.Theresultingperformancewillstillbemuchhigherthanthatofa16bitor8bitsystemrunningatthesameclockfrequency.

    UsetherightdatatypeWhenportingapplicationsfrom8bitor16bitmicrocontrollers,youmightneedtomodifythedatatypeforconstantarraystoachievethemostoptimalprogramsize.Forexample,anintegerisnormally16bitin8bitand16bitmicrocontrollers,whileinARMmicrocontrollersintegersare32bit.

    Type Numberofbitsin8051

    NumberofbitsinMSP430

    NumberofbitsinARM

    char,unsignedchar 8 8 8

    enum 8/16 16 8/16/32(smallestischosen)

    short,unsignedshort 16 16 16

    int,unsignedint 16 16 32

    long,unsignedlong 32 32 32

    float 32 32 32

    double 32 32 64

    Whenportingaconstantarrayofintegersfroman8bitor16bitarchitecture,youshouldmodifythedatatypefrominttoshortinttomakesuretheconstantarrayremainsthesamesize.Forexample,

    constintmydata={1234,5678,};

    Thisshouldbechangedto:

    constshortintmydata={1234,5678,};

  • ARMMicrocontrollerCodeSizeAnalysis|RecommendationsonhowtogetthesmallestcodesizewithCortexMmicrocontrollers

    15

    Foranarrayofintegervariables(nonconstantdata),changingfromanintegertoashortintegermightalsopreventanincreaseinmemoryusageduringsoftwareporting.Mostotherdata(e.g.variables)doesnotrequiremodification.

    FloatingpointfunctionsSomefloatingpointfunctionsaredefinedassingleprecisionin8bitor16bitmicrocontrollersandarebydefaultdefinedasdoubleprecisioninARMmicrocontrollers,aswehavefoundoutwiththewhetstonetestanalysis.Whenportingapplicationcodefrom8bitor16bitmicrocontrollerstoanARMmicrocontroller,youmighthavetoadjustmathfunctionstosingleprecisionversionsandmodifyconstantdefinitionstoensurethattheprogrambehavesinthesameway.Forexample,inthewhetstoneprogramcode,asectionofcodeusessomemathfunctionsthataredoubleprecisioninARMcompilers:

    X=T*atan(T2*sin(X)*cos(X)/(cos(X+Y)+cos(XY)1.0));

    Y=T*atan(T2*sin(Y)*cos(Y)/(cos(X+Y)+cos(XY)1.0));

    Ifwewanttousesingleprecisiononly,theprogramcodehastobechangedto

    X=T*atanf(T2*sinf(X)*cosf(X)/(cosf(X+Y)+cosf(XY)1.0F));

    Y=T*atanf(T2*sinf(Y)*cosf(Y)/(cosf(X+Y)+cosf(XY)1.0F));

    Otherconstantdefinitionssuchas:

    /*Module7:Procedurecalls*/X=1.0;Y=1.0;Z=1.0;shouldtobechangedtothefollowingforsingleprecisionrepresentation:

    /*Module7:Procedurecalls*/X=1.0F;Y=1.0F;Z=1.0F;

    DefineperipheralsasdatastructureYoucanalsoreduceprogramsizebydefiningregistersinperipheralsasadatastructure.Forexample,insteadofrepresentingtheSysTicktimerregistersas

    #define SYSTICK_CTRL (*((volatile unsigned long *)(0xE000E010))) #define SYSTICK_LOAD (*((volatile unsigned long *)(0xE000E014))) #define SYSTICK_VAL (*((volatile unsigned long *)(0xE000E018))) #define SYSTICK_CALIB (*((volatile unsigned long *)(0xE000E01C)))

  • ARMMicrocontrollerCodeSizeAnalysis|Conclusions 16

    youcandefinetheSysTickregistersas:

    typedef struct { volatile unsigned int CTRL; volatile unsigned int LOAD; volatile unsigned int VAL; unsigned int CALIB; } SysTick_Type; #define SysTick ((SysTick_Type *) 0xE000E010) Bydoingthis,youonlyneedoneaddressconstanttobestoredintheprogramROM.Theregisteraccesseswillbeusingthisaddressconstantwithdifferentaddressoffsetsfordifferentregisters.Ifasequenceofhardwareregisteraccessesisrequiredforaperipheral,usingadatastructurecanreducecodesizeaswellasimproveperformance.Most8bitmicrocontrollersdonothavethesameaddressingmodefeaturewhichcanresultinamuchlargercodesizeforthesametask.

    Conclusions32bitprocessorsprovideequalormoreoftenbettercodesizethan8bitand16bitarchitectureswhilstatthesametimedeliveringmuchbetterperformance.

    Forusersof8bitmicrocontrollers,movingtoa16bitarchitecturecansolvesomeoftheinherentproblemswith8bitarchitectures,however,theoverallbenefitsofmigratingfrom8bitto16bitismuchlessthanthatachievedbymigratingtothe32bitCortexprocessors.

    Asthepowerconsumptionandcostof32bitmicrocontrollershasreduceddramaticallyoverlastfewyears,32bitprocessorshavebecomethebestchoiceformanyembeddedprojects.

    ReferenceThefollowingarticlesonMSP430arereferenced:

    Reference

    1 MSP430CompetitiveBenchmarkinghttp://focus.ti.com/lit/an/slaa205c/slaa205c.pdf

    2 EfficientMultiplicationandDivisionUsingMSP430http://focus.ti.com/lit/an/slaa329/slaa329.pdf