ARM Microcontroller Code Size (Full)

ARM Microcontroller Code Size Analysis

32-Bit Microcontroller Code Size Analysis, Draft 1.2.4. Joseph Yiu, Andrew Frame

Overview

Microcontroller application program code size can directly affect the cost and power consumption of products, and is therefore almost always viewed as an important factor in the selection of a microcontroller for embedded projects. Since the release and availability of 32-bit processors such as the ARM Cortex-M3, more and more microcontroller users have discovered the benefits of switching to 32-bit products: lower power, greater energy efficiency, smaller code size and much better performance. Whilst most of the benefits of using 32-bit microcontrollers are widely known, the code size advantage of 32-bit microcontrollers is less obvious.

In this article we will explain why 32-bit microcontrollers can reduce application code size whilst still achieving high system performance and ease of use.

Typical myths of program size

Myth #1: 8-bit and 16-bit microcontrollers have smaller code size

There is a common misconception that switching from an 8-bit microcontroller to a 32-bit microcontroller will result in much bigger code size. Why? Many people have the impression that 8-bit microcontrollers use 8-bit instructions and 32-bit microcontrollers use 32-bit instructions. This impression is often reinforced by slightly misleading marketing from the 8-bit and 16-bit microcontroller vendors.

In reality, many instructions in 8-bit microcontrollers are 16 bits, 24 bits or other sizes larger than 8 bits. For example, PIC18 instructions are 16 bits wide and, with the 8051 architecture, although some instructions are 1 byte long, many others are 2 or 3 bytes long.

So would code size be better moving to a 16-bit microcontroller? Not necessarily. Taking the MSP430 as an example, a single-operand instruction can take 4 bytes (32 bits) and a double-operand instruction can take 6 bytes (48 bits). In the worst case, an extended immediate/index instruction in MSP430X can take 8 bytes (64 bits).

So how about the code size for ARM Cortex microcontrollers? The ARM Cortex-M3 and Cortex-M0 processors are based on Thumb-2 technology, which provides excellent code density. Thumb-2 microcontrollers have 16-bit instructions as well as 32-bit instructions, with the 32-bit instruction functionality a superset of the 16-bit version. In most cases a C compiler will use the 16-bit version of the instruction. The 32-bit version would only be used when the operation cannot be performed with a 16-bit instruction. As a result, most of the instructions in an ARM Cortex microcontroller program are 16 bits. That's even smaller than some of the instructions in 8-bit microcontrollers.

Figure 1: Size of a single instruction in various processors (bar chart of minimum and maximum instruction size in number of bits, on a scale marked 16, 32, 48 and 64, for the 8051, PIC18, PIC24, MSP430 / MSP430X and ARM)

Within a compiled program for Cortex-M processors, the number of 32-bit instructions can be only a small portion of the total instruction count. For example, 32-bit instructions make up only 15.8% of the total instruction count in the Dhrystone program image (average instruction size 18.53 bits) when compiled for the Cortex-M3. For the Cortex-M0 the ratio of 32-bit instructions is even lower at 5.4% (average instruction size 16.9 bits).
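The quoted averages follow directly from the instruction mix. The sketch below is only a worked version of that arithmetic, assuming every instruction in the image is either 16 or 32 bits wide; the function name is illustrative and not part of the original analysis.

```c
/* Average instruction size in bits for a program image in which every
 * instruction is either 16 or 32 bits wide.
 * fraction_32bit: share of 32-bit instructions, from 0.0 to 1.0. */
double average_instruction_bits(double fraction_32bit)
{
    return 16.0 * (1.0 - fraction_32bit) + 32.0 * fraction_32bit;
}
```

With a 32-bit share of 0.158 this gives about 18.53 bits, and with 0.054 about 16.86 bits, in line with the Cortex-M3 and Cortex-M0 figures quoted above.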

Myth #2: My application only processes 8-bit data and 16-bit data

Many embedded developers think that if their application only processes 8-bit data then there is no benefit in switching to a 32-bit microcontroller. However, looking carefully at the output from the C compiler, in most cases the humble integer data type is actually 16 bits. So when you have a for loop with an integer as loop index, compare a value to an integer value, or use a C library function that uses an integer (e.g. memcpy()), you are actually using 16-bit or larger data. This can affect code size and performance in various ways:

For each integer computation, an 8-bit processor will need multiple instructions to carry out the operations. This directly increases the code size and the clock cycle count.

If the integer value has to be saved into memory, or if you need to load an immediate value from program ROM to this integer, it will take multiple instructions and multiple clock cycles.

Since an integer can take up two 8-bit registers, more registers are required to hold the same number of integer variables. When there is an insufficient number of registers in the register bank to hold local variables, some have to be stored in memory. Thus an 8-bit microcontroller might result in more memory accesses, which increases code size and reduces performance and power efficiency. The same issue applies to the processing of 32-bit data on 16-bit microcontrollers.

Since more registers are required to hold an integer in an 8-bit microcontroller, the number of stack operations required when passing variables to a function via the stack, or when saving register contents during context switching or interrupt servicing, is higher than on 32-bit microcontrollers. This increases the program size, and can also affect interrupt latency because an Interrupt Service Routine (ISR) must make sure that all registers used are saved at ISR entry and restored at ISR exit. The same issue applies to the processing of 32-bit data on 16-bit microcontrollers.

There is even more bad news for 8-bit microcontroller users: memory address pointers take multiple bytes, so data processing involving the use of pointers can be extremely inefficient.
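To make the first point concrete, the sketch below writes out a 16-bit addition using only 8-bit operations, which is roughly the work an 8-bit processor has to do for a single integer add. It is an illustration written for this article's argument, not compiler output, and the function name is hypothetical.

```c
#include <stdint.h>

/* Adds two 16-bit values using only 8-bit operations: add the low bytes,
 * then add the high bytes plus the carry out of the low-byte addition. */
uint16_t add16_with_8bit_ops(uint8_t a_lo, uint8_t a_hi,
                             uint8_t b_lo, uint8_t b_hi)
{
    uint8_t sum_lo = (uint8_t)(a_lo + b_lo);          /* first ADD */
    uint8_t carry  = (uint8_t)(sum_lo < a_lo);        /* carry out of the low bytes */
    uint8_t sum_hi = (uint8_t)(a_hi + b_hi + carry);  /* second ADD, with carry */

    return (uint16_t)(((uint16_t)sum_hi << 8) | sum_lo);
}
```

On a 32-bit processor the same addition is a single register-to-register operation, and the operands occupy one register each rather than two.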

Myth #3: A 32-bit processor is not efficient at handling 8-bit and 16-bit data

Most 32-bit processors are actually very efficient at handling 8-bit and 16-bit data. Compact memory access instructions for signed and unsigned 8-bit, 16-bit and 32-bit data are all available. There are also a number of instructions specially included for data type conversions. Overall, the handling of 8-bit and 16-bit data in 32-bit processors such as the ARM Cortex microcontrollers is just as easy and efficient as handling 32-bit data.

Myth #4: C libraries for ARM processors are too big

There are various C library options for ARM processors. For microcontroller applications, a number of compiler vendors have developed C libraries with a much smaller footprint. For example, the ARM development tools have a smaller version of the C library called MicroLIB. These C libraries are especially designed for microcontrollers and allow application code size to be small and efficient.

Myth #5: Interrupt handling on ARM microcontrollers is more complex

On the ARM Cortex microcontrollers the interrupt service routines are just normal C subroutines. Vectored and nested interrupts are supported by the Nested Vectored Interrupt Controller (NVIC) with no need for software intervention. In fact, setting up and processing an interrupt request is much simpler than on 8-bit and 16-bit microcontrollers, as generally you only need to program the priority level of an interrupt and then enable it.

The interrupt vectors are stored in a vector table at the beginning of the memory, normally within the flash, without the need for any software programming steps. When an interrupt request takes place the processor automatically fetches the corresponding interrupt vector and starts to execute the ISR. Some of the registers are pushed to the stack by a hardware sequence and restored automatically when the interrupt handler exits. The other registers that are not covered by the hardware stacking sequence are pushed onto the stack by C compiler generated code only if the register is used and modified within the ISR.
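The two setup steps (program the priority, then enable the interrupt) can be sketched as below. This is a minimal sketch under stated assumptions: the function names and the handler name are hypothetical, the number of implemented priority bits is device specific and passed in by the caller, and byte access to the priority registers is assumed as on the Cortex-M3. On hardware the NVIC set-enable registers start at 0xE000E100 and the priority registers at 0xE000E400; the functions take pointers instead so that the sketch can also be run against ordinary RAM.

```c
#include <stdint.h>

/* Sets the priority of interrupt 'irq'. 'prio_bits' (1 to 8) is the number
 * of priority bits the device implements; the priority value occupies the
 * top bits of each 8-bit priority field. */
void irq_set_priority(volatile uint8_t *nvic_ipr, unsigned irq,
                      unsigned priority, unsigned prio_bits)
{
    nvic_ipr[irq] = (uint8_t)(priority << (8u - prio_bits));
}

/* Enables interrupt 'irq'. On the real set-enable registers, writing 1 to
 * a bit enables that interrupt and writing 0 to the other bits has no
 * effect, so a plain store is sufficient. */
void irq_enable(volatile uint32_t *nvic_iser, unsigned irq)
{
    nvic_iser[irq >> 5] = (uint32_t)1 << (irq & 31u);
}

/* The interrupt service routine is an ordinary C function whose address is
 * placed in the vector table. */
volatile uint32_t timer_ticks;

void Timer0_Handler(void)
{
    timer_ticks++;
}
```

No assembly wrapper or special keyword is needed around the handler in this sketch, which is the point the text makes about ISRs being normal C subroutines.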


What about moving to 16-bit microcontrollers?

16-bit microcontrollers can be efficient in handling 16-bit integers and 8-bit data (e.g. strings); however, the code size is still not as optimal as with 32-bit processors:

Handling of 32-bit data: if the application requires handling of any long integer (32-bit) or floating point types, then the efficiency of 16-bit processors is greatly reduced because multiple instructions are required for each processing operation, as well as for data transfers between the processor and the memory.

Register usage: when processing 32-bit data, 16-bit processors require two registers to hold each 32-bit variable. This reduces the number of variables that can be held in the register bank, hence reducing processing speed as well as increasing stack operations and memory accesses.

Memory addressing mode: many 16-bit architectures provide only basic addressing modes similar to 8-bit architectures. As a result, the code density is poor when they are used in applications that require processing of complex data sets.

64K bytes limitation: many 16-bit processors are limited to 64K bytes of addressable memory, reducing the functionality of the application. Some 16-bit architectures have extensions to allow more than 64K bytes of memory to be accessed; however, these extensions have an instruction code and clock cycle overhead. For example, a memory pointer would be larger than 16 bits and might require multiple instructions and multiple registers to process it.


Instruction Set efficiency

When customers port their applications from an 8-bit architecture to ARM Cortex microcontrollers, they very often find that the total code size has decreased dramatically. For example, when Melfas (a leading company in capacitive sensing touch screen controllers) evaluated the Cortex-M0 processor, they found that the Cortex-M0 program size was less than half of that of the 8051 and that, at the same time, the Cortex-M0 delivered five times more performance at the same clock frequency. This could, for example, enable them to run the application at 1/5 of the clock speed of the equivalent 8051 product, reducing the power consumption, and lowering product cost at the same time due to a smaller program flash size requirement.

So how does the ARM architecture provide such big advantages? The key factor is Thumb-2 technology, which provides a highly efficient unified instruction set.

Powerful Addressing mode

The ARM Cortex microcontrollers support a number of addressing modes for memory transfer instructions. For example:

Immediate offset (Address = Register value + offset)

Register offset (Address = Register value 1 + shifted(Register value 2))

PC related (Address = Current PC value + offset)

Stack pointer related (Address = SP + offset)

Multiple register load and store, with optional automatic base address update

PUSH/POP instructions with multiple registers

As a result of these various addressing modes, data transfer between registers and memory can be handled with fewer instructions. Since the PUSH and POP instructions support multiple registers, in most cases, saving and restoring of registers in a function call will only need one PUSH at the beginning of the function and one POP at the end of the function. The POP can even be combined with the return instruction at the end of the function to further reduce the instruction count.
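The C fragments below show the kind of source code these addressing modes serve. The mapping described in the comments is what a compiler would typically choose, not a guarantee; the actual instructions depend on the compiler and its options. All names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

struct sample {
    uint32_t id;
    uint16_t flags;
    int16_t  value;
};

/* Structure field access: base register plus a constant offset
 * (immediate offset addressing). */
int16_t sample_value(const struct sample *s)
{
    return s->value;
}

/* Array indexing: base register plus an index register scaled by the
 * element size (register offset addressing with a shift). */
uint32_t table_lookup(const uint32_t *table, size_t index)
{
    return table[index];
}

/* A function with several live local values and a call inside a loop:
 * the registers it needs to preserve can usually be saved with one PUSH
 * on entry and restored with one POP that also performs the return. */
uint32_t table_sum(const uint32_t *table, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += table_lookup(table, i);
    }
    return sum;
}
```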

Conditional branches

Almost all processors provide conditional branch instructions; however, ARM processors provide improved conditional branching by having separate branch conditions for signed and unsigned data operation results, and by providing a good branch range.

For example, when comparing the conditional branches of the Cortex-M0 and MSP430, the Cortex-M0 has more branch conditions available, making it possible to generate more compact code no matter whether the data being processed is signed or unsigned. The MSP430 conditional branches might require multiple instructions to get the same operations.

Generally the same situation applies to many 8-bit or 16-bit microcontrollers: when dealing with signed data, additional steps might also be required in the conditional branch.

In addition to the branch instructions available in the Cortex-M0, the Cortex-M3 processor also supports compare and branch instructions (CBZ and CBNZ). This further simplifies some conditional branch instruction sequences.
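The difference between signed and unsigned conditions is easy to see from C. In the sketch below (hypothetical function names), the first two functions compare the same bit patterns but need different branch conditions to give the right answer; the third is a test-against-zero loop of the kind that compare and branch instructions are aimed at. Which instructions are actually emitted depends on the compiler.

```c
#include <stdint.h>

/* Signed comparison: needs a signed condition such as "less than". */
int is_below_signed(int32_t a, int32_t b)
{
    return a < b;
}

/* Unsigned comparison: the same bit patterns need an unsigned condition
 * such as "lower" to give the correct result. */
int is_below_unsigned(uint32_t a, uint32_t b)
{
    return a < b;
}

/* Counts the elements before the first zero, up to 'max' elements.
 * Each iteration tests a value against zero and branches on the result. */
unsigned count_until_zero(const uint8_t *data, unsigned max)
{
    unsigned n = 0;
    while (n < max && data[n] != 0) {
        n++;
    }
    return n;
}
```

For the bit pattern 0xFFFFFFFF compared with 1, the signed version returns true (the value is -1) and the unsigned version returns false, so an instruction set with only one family of conditions needs extra steps for one of the two cases.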

Conditional Execution

Another area that allows the ARM Cortex-M3 microcontrollers to have more compact code is the conditional execution feature. The Cortex-M3 supports an instruction called IT (IF-THEN). This instruction allows up to 4 subsequent instructions to be conditionally executed, reducing the need for additional branches. For example,

if(xpos1


        MOV.W R11, R14
        MOV.W R12, R13
        JMP Label2
Label1  MOV.W R11, R13
        MOV.W R12, R14
Label2

This results in an extra two bytes for the MSP430 when compared to Cortex-M3.
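As a separate illustration (with hypothetical names, and not the listing discussed above), the C function below has the shape that suits conditional execution: each arm of the if/else is only two simple assignments. A compiler targeting the Cortex-M3 may place such short sequences in an IT block instead of branching around them, although whether it does so depends on the compiler and optimization settings.

```c
/* Stores the smaller of a and b in *low and the larger in *high. */
void order_pair(int a, int b, int *low, int *high)
{
    if (a < b) {
        *low  = a;
        *high = b;
    } else {
        *low  = b;
        *high = a;
    }
}
```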

Multiply and Divide

Both the Cortex-M0 and Cortex-M3 processors support single cycle multiply operations. The Cortex-M3 also has multiply and multiply-accumulate instructions for 32-bit or 64-bit results. These instructions greatly reduce the code size required when handling multiplication of large variables.

Most other 8-bit and 16-bit microcontrollers also have multiply instructions; however, the limitation of the register size often means that the multiplication requires multiple steps if the result needs to be more than 8 or 16 bits.

The MSP430 does not have a multiply instruction (MSP430 document SLAA329, reference 1). To carry out multiplication either a memory mapped hardware multiplier is used, or the multiply operation has to be handled by software using add and shift. Even if a hardware multiplier is present, the memory mapped nature of the multiplier results in the additional overhead of transferring data to and from the external hardware. In addition, using the multiplier within an interrupt handler could cause existing data in the multiplier to be lost. As a result, interrupts are usually disabled before a multiply operation and re-enabled after the multiplication is completed. This adds software overhead and affects interrupt latency and determinism.

The Cortex-M3 processor also has unsigned and signed integer divide instructions. This reduces the code size required in applications that need to perform integer division because there is no need for the C library to include a function for handling divide operations.
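For comparison, the sketch below shows a generic add-and-shift multiply of the kind a processor without a multiply instruction has to run in software, next to the plain C operators that a compiler can map onto multiply and divide instructions where the processor has them. The routine is a textbook illustration with hypothetical names, not code taken from any vendor library.

```c
#include <stdint.h>

/* Software multiply by add and shift: one loop iteration per bit of the
 * multiplier, adding the shifted multiplicand whenever the bit is set. */
uint32_t mul16_shift_add(uint16_t a, uint16_t b)
{
    uint32_t product = 0;
    uint32_t addend  = a;

    while (b != 0) {
        if (b & 1u) {
            product += addend;
        }
        addend <<= 1;
        b >>= 1;
    }
    return product;
}

/* 32 x 32 -> 64-bit multiply written as a single C expression. */
uint64_t mul32_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Unsigned integer division; the divisor must be non-zero. */
uint32_t div32(uint32_t n, uint32_t d)
{
    return n / d;
}
```

The loop version costs both code space and up to sixteen iterations per multiplication, which is the overhead the multiply instructions remove.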

Powerful instruction set

In addition to the standard data processing, memory access and program control instructions, the Cortex microcontrollers also support a number of other instructions to help data type conversion. The Cortex-M3 processor also supports a number of bit field operations, reducing the software overhead in, for example, peripheral control and communication data processing.
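Bit field operations are the kind of work shown in the sketch below: pulling a field out of a register value or packet word, and writing a field back without disturbing its neighbours. Written in portable C each takes a shift and one or more masks; on the Cortex-M3 a compiler can often reduce them to a single bit field extract or insert instruction. The function names are hypothetical.

```c
#include <stdint.h>

/* Returns the 'width'-bit field of 'value' that starts at bit 'lsb'.
 * Requires 1 <= width <= 32 and lsb + width <= 32. */
uint32_t bitfield_extract(uint32_t value, unsigned lsb, unsigned width)
{
    uint32_t mask = (width < 32u) ? (((uint32_t)1 << width) - 1u)
                                  : 0xFFFFFFFFu;
    return (value >> lsb) & mask;
}

/* Returns 'value' with the same field replaced by the low 'width' bits of
 * 'field'; all other bits are left unchanged. */
uint32_t bitfield_insert(uint32_t value, uint32_t field,
                         unsigned lsb, unsigned width)
{
    uint32_t mask = (width < 32u) ? (((uint32_t)1 << width) - 1u)
                                  : 0xFFFFFFFFu;
    return (value & ~(mask << lsb)) | ((field & mask) << lsb);
}
```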


Breaking the 64K byte memory barrier

As already mentioned, many 8-bit and 16-bit microcontrollers are limited to 64K bytes of addressable memory. Due to the nature of 8-bit and 16-bit microcontroller architectures, the coding efficiency of these microcontrollers often decreases dramatically when the application exceeds the 64K byte memory barrier. In 8-bit and 16-bit microcontrollers (e.g. 8051, PIC24, C166) this is often handled by memory bank switching or memory segmentation, with the switching code generated automatically by the C compilers. Every time a function or data in a different memory page is required, bank switching code is needed, which further increases the program size.

Figure 2: Increased code size overhead of memory bank switching or segmentation in 8-bit and 16-bit systems

Memory bank switching not only creates larger code but also greatly reduces the performance of a system. This is especially the case if the data being processed is in a different memory bank (e.g. copying a block of data from one page to another page can be very costly in terms of performance). This is particularly inefficient for 8-bit microcontrollers like the 8051 because the MCS-51 architecture does not have proper support for such a memory bank switching feature. Therefore memory switching has to be carried out by saving and updating memory bank controls such as I/O port registers. In addition, the memory page switching code usually has to be carried out in a congested shared memory space of limited size. At the same time some of the memory pages might not be fully utilized and memory space is wasted.

For the 8-bit and 16-bit microcontrollers that support more than 64K bytes of memory, this often comes at a price. The MSP430X design overcomes the 64K byte memory barrier by increasing the Program Counter (PC) and register width to 20 bits. Despite no memory paging being involved, some MSP430X instructions are considerably larger than in the original MSP430. For example, when the large memory model is used, a double-operand formatted instruction can take 8 bytes rather than 6 (a 33% increase):

MSP430 double-operand instruction: one 16-bit instruction word (Op-code in bits 15:12, Rsrc in bits 11:8, Ad in bit 7, B/W in bit 6, As in bits 5:4, Rdst in bits 3:0), followed by a "Source or destination 15:0" word and a "Destination 15:0" word.

MSP430X double-operand instruction: the same three words, preceded by an extension word containing the fields 00011, Source 19:16, A/L, Rsrv and Destination 19:16.

Figure 3: Support of larger memory system increases the size of some instructions in MSP430X

Apart from the size of the instruction itself, the use of 20-bit addressing also increases the number of stack operations required. Since the memory is only 16 bits wide, saving a 20-bit address pointer needs two stack push operations, resulting in extra instructions and poor utilization of the stack memory.

Figure 4: Use of large memory data model in MSP430X increases code size

As a result, an MSP430X application has a lower code density when the large memory model is used, which is required when the address range exceeds the 64K range.

In ARM Cortex microcontrollers, 32-bit linear addressing is used to provide 4GB of memory space for embedded applications. Therefore there is no paging overhead and the programming model is easy to use.


Examples

To demonstrate the code size compared to 8-bit and 16-bit processors, a number of test cases are compiled and illustrated here. The tests are based on the MSP430 Competitive Benchmarking document from Texas Instruments (SLAA205C, reference 2). The results listed here show total program memory size in bytes.

MSP430 results:

The tests listed are compiled using IAR Embedded Workbench 4.20.1 with the hardware multiplier enabled and the optimization level set to High with Size optimization. Unless specified, the Small data model is used and type double is 32-bit. The results are obtained from the linker output report (CODE + CONST).

ARM Cortex processor results:

The tests listed are compiled using RealView Development Suite 4.0-SP2. The optimization level is 3 for size, with a minimal vector table, and MicroLIB is used. The results are obtained from the linker output report (VECTORS + CODE).

Test                 Generic MSP430   MSP430F5438   MSP430F5438 large data model   Cortex-M3
Math8bit             198              198           202                            144
Math16bit            144              144           144                            144
Math32bit            256              244           256                            120
MathFloat            1122             1122          1162                           600
Matrix2dim8bit       180              178           196                            184
Matrix2dim16bit      268              246           290                            256
Matrixmult           276              228           (linker error)                 228
Switch8bit           200              218           218                            160
Switch16bit          198              218           218                            160
Firfilter (Note 1)   1202             1170          1222                           716 (820 without modification)
Dhry                 923              893           1079                           900
Whet (Note 2)        6434             6308          6614                           4384 (8496 without modification)


Note 1: The constant data array in the Firfilter test is modified to use a 16-bit data type on the Cortex-M processor (const unsigned short int INPUT[]).

Note 2: When certain math functions are used (sin, cos, atan, sqrt, exp, log), the ARM C standard library uses the double precision versions by default. This can result in significantly larger program size unless adjustments are made. In order to achieve an equivalent comparison, the program code is edited so that single precision versions are used (sinf, cosf, atanf, sqrtf, expf, logf). Also, some of the constant definitions have been adjusted to single precision (e.g. 1.0 becomes 1.0F).

Figure 5: Code size comparison for basic operations

The total sizes for the simple tests (integer math, matrix and switch tests) are:

Summary for simple tests

                        Generic MSP430   MSP430F5438   Cortex-M3
Total size (bytes)      1720             1674          1396
Advantage (% smaller)                    2.6%          18.8%
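The advantage row is each total expressed relative to the Generic MSP430 total. A minimal sketch of that arithmetic, with an illustrative function name:

```c
/* Percentage by which 'size' is smaller than 'baseline' (both in bytes). */
double percent_smaller(unsigned baseline, unsigned size)
{
    return 100.0 * ((double)baseline - (double)size) / (double)baseline;
}
```

For the Cortex-M3 column, percent_smaller(1720, 1396) evaluates to approximately 18.8.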

For applications using floating point, there is a significant advantage for Cortex microcontrollers, whereas the Dhrystone program size is closer.

Figure 6: Code size comparison for floating point operations and benchmark suites

The total sizes for the benchmark and floating point tests (Dhrystone, Whetstone, Firfilter and MathFloat) are:

Summary for benchmark and floating point tests

                        Generic MSP430   MSP430F5438   Cortex-M3
Total size (bytes)      9681             9493          6600
Advantage (% smaller)                    1.9%          31.8%

Observations:

1. From the results, we can see that the Cortex microcontrollers have better code density compared to MSP430 in most cases. The remaining tests show similar code density when compared to MSP430.

2. One of the tests (Firfilter) uses an integer data type for a constant array. Since an integer is 32-bit on the ARM processor and 16-bit on MSP430, the program has been modified to allow a direct comparison.

3. When the large data memory model is used with MSP430, the code size increases by up to 20% (Dhrystone).

4. We are unable to reproduce all of the claimed results in the Texas Instruments document. This may be because the storage of constant data in ROM might have been omitted from their code size calculations.


Additional investigation on floating point

When analysing the results of the Whetstone benchmark it became apparent that the MSP430 C compiler only generated single precision floating point operations, while the ARM C compiler generated double precision operations for some of the math functions used.

After changing the code to use only single precision floating point, the code size reduced dramatically and became much smaller than the MSP430 code size.

The IAR MSP430 compiler has an option to define the floating point type: "Size of type double", which is by default set to 32 bits (single precision). If it is set to 64 bits (as in the ARM C compiler), the code size increases significantly.

Program size               Generic MSP430   MSP430F5438
Type double is 32-bit      6434             6308
Type double is 64-bit      11510            11798

The Cortex-M3 processor shows the same pattern:

Program size                                                                   Cortex-M3
Whetstone modified to use single precision only                                4384
Out-of-box compile for Whetstone (uses double precision for math functions)    8496

The option of setting type double to 32 bits is quite sensible for small microcontroller applications where the C code might only need to process source data generated from a 12-bit or 14-bit ADC. However, benchmarking with different default types can make a very big difference to the results and does not give an accurate comparison.


Recommendations on how to get the smallest code size with Cortex-M microcontrollers

Use MicroLIB

In the ARM development tools there is an option to use the area optimized MicroLIB rather than the standard C libraries. MicroLIB is suitable for most embedded applications and has a much smaller code size when compared to the standard C library.

Ensure the use of area optimizations

The performance of Cortex-M microcontrollers is much higher than that of 16-bit and 8-bit microcontrollers, so when porting applications from these microcontrollers you can generally select the highest area optimization rather than selecting optimizations for speed. The resulting performance will still be much higher than that of a 16-bit or 8-bit system running at the same clock frequency.

Use the right data type

When porting applications from 8-bit or 16-bit microcontrollers, you might need to modify the data type for constant arrays to achieve the most optimal program size. For example, an integer is normally 16-bit in 8-bit and 16-bit microcontrollers, while in ARM microcontrollers integers are 32-bit.

Type                     Number of bits in 8051   Number of bits in MSP430   Number of bits in ARM
char, unsigned char      8                        8                          8
enum                     8/16                     16                         8/16/32 (smallest is chosen)
short, unsigned short    16                       16                         16
int, unsigned int        16                       16                         32
long, unsigned long      32                       32                         32
float                    32                       32                         32
double                   32                       32                         64

When porting a constant array of integers from an 8-bit or 16-bit architecture, you should modify the data type from int to short int to make sure the constant array remains the same size. For example,

const int mydata[] = {1234, 5678,};

This should be changed to:

const short int mydata[] = {1234, 5678,};
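The effect on ROM usage can be checked with sizeof. The sketch below uses fixed-width types so that the sizes are the same on any compiler; on ARM, plain int corresponds to the 32-bit case and short int to the 16-bit case. The array contents and names are made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* The same four constants stored with two different element types. */
const int32_t table_as_int32[] = { 1234, 5678, 9012, 3456 };
const int16_t table_as_int16[] = { 1234, 5678, 9012, 3456 };

/* Storage taken by each table, in bytes. */
size_t table_bytes_int32(void) { return sizeof table_as_int32; }
size_t table_bytes_int16(void) { return sizeof table_as_int16; }
```

The 32-bit table takes 16 bytes and the 16-bit table 8 bytes, so a large constant array ported without changing its type doubles in size.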


For an array of integer variables (non-constant data), changing from an integer to a short integer might also prevent an increase in memory usage during software porting. Most other data (e.g. variables) does not require modification.

Floating point functions

Some floating point functions are defined as single precision in 8-bit or 16-bit microcontrollers and are by default defined as double precision in ARM microcontrollers, as we have found out with the Whetstone test analysis. When porting application code from 8-bit or 16-bit microcontrollers to an ARM microcontroller, you might have to adjust math functions to single precision versions and modify constant definitions to ensure that the program behaves in the same way. For example, in the Whetstone program code, a section of code uses some math functions that are double precision in ARM compilers:

X=T*atan(T2*sin(X)*cos(X)/(cos(X+Y)+cos(X-Y)-1.0));

Y=T*atan(T2*sin(Y)*cos(Y)/(cos(X+Y)+cos(X-Y)-1.0));

If we want to use single precision only, the program code has to be changed to

X=T*atanf(T2*sinf(X)*cosf(X)/(cosf(X+Y)+cosf(X-Y)-1.0F));

Y=T*atanf(T2*sinf(Y)*cosf(Y)/(cosf(X+Y)+cosf(X-Y)-1.0F));

Other constant definitions such as:

/* Module 7: Procedure calls */
X=1.0; Y=1.0; Z=1.0;

should be changed to the following for single precision representation:

/* Module 7: Procedure calls */
X=1.0F; Y=1.0F; Z=1.0F;
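The reason the F suffix matters is that an unsuffixed floating point constant has type double in C, so any expression it appears in is evaluated in double precision even when every variable is a float. The sketch below, with hypothetical function names, makes the difference visible through the size of the expression type.

```c
#include <stddef.h>

/* The unsuffixed constant is a double, so the multiplication is carried
 * out in double precision and the result is converted back to float. */
float halve_promoted(float x)
{
    return x * 0.5;
}

/* With the F suffix the constant is a float and the whole expression
 * stays in single precision. */
float halve_single(float x)
{
    return x * 0.5F;
}

/* Size of the type of each expression (the operand is not evaluated). */
size_t promoted_expr_size(float x) { return sizeof(x * 0.5); }
size_t single_expr_size(float x)   { return sizeof(x * 0.5F); }
```

On a processor without floating point hardware, the promoted version pulls in double precision library routines for the multiply and the conversions, which is the extra code size described above.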

Define peripherals as data structure

You can also reduce program size by defining registers in peripherals as a data structure. For example, instead of representing the SysTick timer registers as

#define SYSTICK_CTRL  (*((volatile unsigned long *)(0xE000E010)))
#define SYSTICK_LOAD  (*((volatile unsigned long *)(0xE000E014)))
#define SYSTICK_VAL   (*((volatile unsigned long *)(0xE000E018)))
#define SYSTICK_CALIB (*((volatile unsigned long *)(0xE000E01C)))

you can define the SysTick registers as:

typedef struct {
    volatile unsigned int CTRL;
    volatile unsigned int LOAD;
    volatile unsigned int VAL;
    unsigned int CALIB;
} SysTick_Type;

#define SysTick ((SysTick_Type *) 0xE000E010)

By doing this, you only need one address constant to be stored in the program ROM. The register accesses will use this address constant with different address offsets for different registers. If a sequence of hardware register accesses is required for a peripheral, using a data structure can reduce code size as well as improve performance. Most 8-bit microcontrollers do not have the same addressing mode feature, which can result in a much larger code size for the same task.
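A sequence of register accesses through one base pointer looks like the sketch below. The structure repeats the layout from the text with fixed-width types; the type name, bit-mask names and function name are hypothetical. On hardware the pointer would be the SysTick base address 0xE000E010, but the function takes it as a parameter so that the sketch can also be run against a structure in ordinary RAM.

```c
#include <stdint.h>

/* Register layout of the SysTick timer, one 32-bit word per register. */
typedef struct {
    volatile uint32_t CTRL;
    volatile uint32_t LOAD;
    volatile uint32_t VAL;
    volatile uint32_t CALIB;
} SysTickRegs;

#define SYSTICK_CTRL_ENABLE    (1u << 0)  /* counter enable */
#define SYSTICK_CTRL_TICKINT   (1u << 1)  /* request the exception at zero */
#define SYSTICK_CTRL_CLKSOURCE (1u << 2)  /* use the processor clock */

/* Starts the timer with the given reload value. All three stores use the
 * same base address with a different constant offset. */
void systick_start(SysTickRegs *regs, uint32_t reload)
{
    regs->LOAD = reload & 0x00FFFFFFu;   /* the counter is 24 bits wide */
    regs->VAL  = 0;                      /* clear the current value */
    regs->CTRL = SYSTICK_CTRL_CLKSOURCE
               | SYSTICK_CTRL_TICKINT
               | SYSTICK_CTRL_ENABLE;
}
```

With separate address macros each of the three stores would typically need its own address constant, whereas here one constant serves the whole sequence.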

Conclusions

32-bit processors provide equal or, more often, better code size than 8-bit and 16-bit architectures whilst at the same time delivering much better performance.

For users of 8-bit microcontrollers, moving to a 16-bit architecture can solve some of the inherent problems with 8-bit architectures; however, the overall benefits of migrating from 8-bit to 16-bit are much less than those achieved by migrating to the 32-bit Cortex processors.

As the power consumption and cost of 32-bit microcontrollers have reduced dramatically over the last few years, 32-bit processors have become the best choice for many embedded projects.

Reference

The following articles on MSP430 are referenced:

1  Efficient Multiplication and Division Using MSP430, http://focus.ti.com/lit/an/slaa329/slaa329.pdf

2  MSP430 Competitive Benchmarking, http://focus.ti.com/lit/an/slaa205c/slaa205c.pdf
