Learn Python Programming: A beginner's guide to learning the fundamentals of Python ... ·...

Learn Python Programming Second Edition

A beginner's guide to learning the fundamentals of Python language to write efficient, high-quality code

Fabrizio Romano

BIRMINGHAM - MUMBAI

Learn Python Programming Second Edition

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Richa Tripathi
Acquisition Editor: Karan Sadawana
Content Development Editor: Rohit Singh
Technical Editor: Romy Dias
Copy Editor: Safis Editing
Project Coordinator: Vaidehi Sawant
Proofreader: Safis Editing
Indexer: Mariammal Chettiyar
Graphics: Jason Monteiro
Production Coordinator: Shantanu Zagade

First published: December 2015
Second edition: June 2018

Production reference: 1280618

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78899-666-2

www.packtpub.com

To my dear dear friend and mentor, Torsten Alexander Lange. Thank you for all the love and support.

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Foreword

I first got to know Fabrizio when he became our lead developer a few years ago. It was quickly apparent that he was one of those rare people who combine rigorous technical expertise with a genuine care for the people around him and a true passion to mentor and teach. Whether it was designing a system, pairing to write code, doing code reviews, or even organizing team card games at lunch, Fab was always thinking not only about the best way to do the job, but also about how to make sure that the entire team had the skills and motivation to do their best.

You'll meet the same wise and caring guide in this book. Every chapter, every example, every explanation has been carefully thought out, driven by a desire to impart the best and most accurate understanding of the technology, and to do it with kindness. Fab takes you under his wing to teach you both Python's syntax and its best practices.

I'm also impressed with the scope of this book. Python has grown and evolved over the years, and it now spans an enormous ecosystem, being used for web development, routine data handling, and ETL, and increasingly for data science. If you are new to the Python ecosystem, it's often hard to know what to study to achieve your goals. In this book, you will find useful examples exposing you to many different uses of Python, which will help guide you as you move through the breadth that Python offers.

I hope you will enjoy learning Python and become a member of our global community. I'm proud to have been asked to write this, but above all, I'm pleased that Fab will be your guide.

Naomi Ceder

Python Software Foundation Fellow

Contributors

About the author

Fabrizio Romano was born in Italy in 1975. He holds a master's degree in computer science engineering from the University of Padova. He is also a certified scrum master, Reiki master and teacher, and a member of CNHC.

He moved to London in 2011 to work for companies such as Glasses Direct, TBG/Sprinklr, and student.com. He now works at Sohonet as a Principal Engineer/Team Lead.

He has given talks on Teaching Python and TDD at two editions of EuroPython, and at Skillsmatter and ProgSCon, in London.

I'm grateful to all those who helped me create this book. Special thanks to Dr. Naomi Ceder for writing the foreword to this edition, and to Heinrich Kruger and Julio Trigo for reviewing this volume. To my friends and family, who love me and support me every day, thank you. And to Petra Lange, for always being so lovely to me, thank you.

About the reviewers

Heinrich Kruger was born in South Africa in 1981. He obtained a bachelor's degree with honors from the University of the Witwatersrand in South Africa in 2005 and a master's degree in computer science from Utrecht University in the Netherlands in 2008.

He worked as a research assistant at Utrecht University from 2009 until 2013 and has been working as a professional software developer since 2014. He has been using Python for personal projects and in his studies since 2004, and professionally since 2014.

Julio Vicente Trigo Guijarro is a computer scientist and software engineer with over a decade of experience in software development. He completed his studies at the University of Alicante, Spain, in 2007. He has worked with several technologies and languages, including Microsoft Dynamics NAV, Java, JavaScript, and Python. He is a certified Scrum Master. He has been using Python since 2012, and he is passionate about software design, quality, and coding standards. He currently works as senior software developer and team lead at Sohonet, developing real-time collaboration applications.

I would like to thank my parents for their love, good advice, and continuous support. I would also like to thank all the friends I have met along the way, who enriched my life, for keeping up my motivation and helping me progress.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page
Copyright and Credits
Learn Python Programming
Second Edition
Dedication
Packt Upsell
Why subscribe?
PacktPub.com
Foreword
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews

1. A Gentle Introduction to Python
A proper introduction
Enter the Python
About Python
Portability
Coherence
Developer productivity
An extensive library
Software quality
Software integration
Satisfaction and enjoyment
What are the drawbacks?
Who is using Python today?
Setting up the environment
Python 2 versus Python 3
Installing Python
Setting up the Python interpreter
About virtualenv
Your first virtual environment
Your friend, the console
How you can run a Python program
Running Python scripts
Running the Python interactive shell
Running Python as a service
Running Python as a GUI application
How is Python code organized?
How do we use modules and packages?
Python's execution model
Names and namespaces
Scopes
Objects and classes
Guidelines on how to write good code
The Python culture
A note on IDEs
Summary

2. Built-in Data Types
Everything is an object
Mutable or immutable? That is the question
Numbers
Integers
Booleans
Real numbers
Complex numbers
Fractions and decimals
Immutable sequences
Strings and bytes
Encoding and decoding strings
Indexing and slicing strings
String formatting
Tuples
Mutable sequences
Lists
Byte arrays
Set types
Mapping types – dictionaries
The collections module
namedtuple
defaultdict
ChainMap
Enums
Final considerations
Small values caching
How to choose data structures
About indexing and slicing
About the names
Summary

3. Iterating and Making Decisions
Conditional programming
A specialized else – elif
The ternary operator
Looping
The for loop
Iterating over a range
Iterating over a sequence
Iterators and iterables
Iterating over multiple sequences
The while loop
The break and continue statements
A special else clause
Putting all this together
A prime generator
Applying discounts
A quick peek at the itertools module
Infinite iterators
Iterators terminating on the shortest input sequence
Combinatoric generators
Summary

4. Functions, the Building Blocks of Code
Why use functions?
Reducing code duplication
Splitting a complex task
Hiding implementation details
Improving readability
Improving traceability
Scopes and name resolution
The global and nonlocal statements
Input parameters
Argument-passing
Assignment to argument names doesn't affect the caller
Changing a mutable affects the caller
How to specify input parameters
Positional arguments
Keyword arguments and default values
Variable positional arguments
Variable keyword arguments
Keyword-only arguments
Combining input parameters
Additional unpacking generalizations
Avoid the trap! Mutable defaults
Return values
Returning multiple values
A few useful tips
Recursive functions
Anonymous functions
Function attributes
Built-in functions
One final example
Documenting your code
Importing objects
Relative imports
Summary

5. Saving Time and Memory
The map, zip, and filter functions
map
zip
filter
Comprehensions
Nested comprehensions
Filtering a comprehension
dict comprehensions
set comprehensions
Generators
Generator functions
Going beyond next
The yield from expression
Generator expressions
Some performance considerations
Don't overdo comprehensions and generators
Name localization
Generation behavior in built-ins
One last example
Summary

6. OOP, Decorators, and Iterators
Decorators
A decorator factory
Object-oriented programming (OOP)
The simplest Python class
Class and object namespaces
Attribute shadowing
Me, myself, and I – using the self variable
Initializing an instance
OOP is about code reuse
Inheritance and composition
Accessing a base class
Multiple inheritance
Method resolution order
Class and static methods
Static methods
Class methods
Private methods and name mangling
The property decorator
Operator overloading
Polymorphism – a brief overview
Data classes
Writing a custom iterator
Summary

7. Files and Data Persistence
Working with files and directories
Opening files
Using a context manager to open a file
Reading and writing to a file
Reading and writing in binary mode
Protecting against overriding an existing file
Checking for file and directory existence
Manipulating files and directories
Manipulating pathnames
Temporary files and directories
Directory content
File and directory compression
Data interchange formats
Working with JSON
Custom encoding/decoding with JSON
IO, streams, and requests
Using an in-memory stream
Making HTTP requests
Persisting data on disk
Serializing data with pickle
Saving data with shelve
Saving data to a database
Summary

8. Testing, Profiling, and Dealing with Exceptions
Testing your application
The anatomy of a test
Testing guidelines
Unit testing
Writing a unit test
Mock objects and patching
Assertions
Testing a CSV generator
Boundaries and granularity
Testing the export function
Final considerations
Test-driven development
Exceptions
Profiling Python
When to profile?
Summary

9. Cryptography and Tokens
The need for cryptography
Useful guidelines
Hashlib
Secrets
Random numbers
Token generation
Digest comparison
HMAC
JSON Web Tokens
Registered claims
Time-related claims
Auth-related claims
Using asymmetric (public-key) algorithms
Useful references
Summary

10. Concurrent Execution
Concurrency versus parallelism
Threads and processes – an overview
Quick anatomy of a thread
Killing threads
Context-switching
The Global Interpreter Lock
Race conditions and deadlocks
Race conditions
Scenario A – race condition not happening
Scenario B – race condition happening
Locks to the rescue
Scenario C – using a lock
Deadlocks
Quick anatomy of a process
Properties of a process
Multithreading or multiprocessing?
Concurrent execution in Python
Starting a thread
Starting a process
Stopping threads and processes
Stopping a process
Spawning multiple threads
Dealing with race conditions
A thread's local data
Thread and process communication
Thread communication
Sending events
Inter-process communication with queues
Thread and process pools
Using a process to add a timeout to a function
Case examples
Example one – concurrent mergesort
Single-thread mergesort
Single-thread multipart mergesort
Multithreaded mergesort
Multiprocess mergesort
Example two – batch sudoku-solver
What is Sudoku?
Implementing a sudoku-solver in Python
Solving sudoku with multiprocessing
Example three – downloading random pictures
Downloading random pictures with asyncio
Summary

11. Debugging and Troubleshooting
Debugging techniques
Debugging with print
Debugging with a custom function
Inspecting the traceback
Using the Python debugger
Inspecting log files
Other techniques
Profiling
Assertions
Where to find information
Troubleshooting guidelines
Using console editors
Where to inspect
Using tests to debug
Monitoring
Summary

12. GUIs and Scripts
First approach – scripting
The imports
Parsing arguments
The business logic
Second approach – a GUI application
The imports
The layout logic
The business logic
Fetching the web page
Saving the images
Alerting the user
How can we improve the application?
Where do we go from here?
The turtle module
wxPython, PyQt, and PyGTK
The principle of least astonishment
Threading considerations
Summary

13. Data Science
IPython and Jupyter Notebook
Installing the required libraries
Using Anaconda
Starting a Notebook
Dealing with data
Setting up the Notebook
Preparing the data
Cleaning the data
Creating the DataFrame
Unpacking the campaign name
Unpacking the user data
Cleaning everything up
Saving the DataFrame to a file
Visualizing the results
Where do we go from here?
Summary

14. Web Development
What is the web?
How does the web work?
The Django web framework
Django design philosophy
The model layer
The view layer
The template layer
The Django URL dispatcher
Regular expressions
A regex website
Setting up Django
Starting the project
Creating users
Adding the Entry model
Customizing the admin panel
Creating the form
Writing the views
The home view
The entry list view
The form view
Tying up URLs and views
Writing the templates
The future of web development
Writing a Flask view
Building a JSON quote server in Falcon
Summary

A farewell
Other Books You May Enjoy
Leave a review - let other readers know what you think

Preface

When I started writing the first edition of this book, I knew very little about what was expected. Gradually, I learned how to convert each topic into a story. I wanted to talk about Python by offering useful, simple, easy-to-grasp examples, but, at the same time, I wanted to pour my own experience into the pages, anything I've learned over the years that I thought would be valuable for the reader: something to think about, reflect upon, and hopefully assimilate. Readers may disagree and come up with a different way of doing things, but hopefully a better way.

I wanted this book to not just be about the language but about programming. The art of programming, in fact, comprises many aspects, and language is just one of them.

Another crucial aspect of programming is independence: the ability to unblock yourself when you hit a wall and don't know what to do to solve the problem you're facing. There is no book that can teach it, so I thought, instead of trying to teach that aspect, I will try and train the reader in it. Therefore, I left comments, questions, and remarks scattered throughout the whole book, hoping to inspire the reader. I hoped that they would take the time to browse the Web or the official documentation, to dig deeper, learn more, and discover the pleasure of finding things out by themselves.

Finally, I wanted to write a book that, even in its presentation, would be slightly different. So, I decided, with my editor, to write the first part in a theoretical way, presenting topics that would describe the characteristics of Python, and to have a second part made up of various real-life projects, to show the reader how much can be achieved with this language.

With all these goals in mind, I then had to face the hardest challenge: take all the content I wanted to write and make it fit in the number of pages that were allowed. It has been tough, and sacrifices were made.

My efforts have been rewarded though: to this day, after almost 3 years, I still receive lovely messages from readers, every now and then, who thank me and tell me things like "your book has empowered me." To me, it is the most beautiful compliment. I know that the language might change and pass, but I have managed to share some of my knowledge with the reader, and that piece of knowledge will stick with them.

And now, I have written the second edition of this book, and this time, I had a little more space. So I decided to add a chapter about IO, which was desperately needed, and I even had the opportunity to add two more chapters, one about secrets and one about concurrent execution. The latter is definitely the most challenging chapter in the whole book, and its purpose is that of stimulating the reader to reach a level where they will be able to easily digest the code in it and understand its concepts.

I have kept all the original chapters, except for the last one that was slightly redundant. They have all been refreshed and updated to the latest version of Python, which is 3.7 at the time of writing.

When I look at this book, I see a much more mature product. There are more chapters, and the content has been reorganized to better fit the narrative, but the soul of the book is still there. The main and most important point, empowering the reader, is still very much intact.

I hope that this edition will be even more successful than the previous one, and that it will help the readers become great programmers. I hope to help them develop critical thinking, great skills, and the ability to adapt over time, thanks to the solid foundation they have acquired from the book.

Who this book is for

Python is the most popular introductory teaching language in the top computer science universities in the US, so if you are new to software development, or if you have little experience and would like to start off on the right foot, then this language and this book are what you need. Its amazing design and portability will help you to become productive regardless of the environment you choose to work with.

If you have already worked with Python or any other language, this book can still be useful to you, both as a reference to Python's fundamentals, and for providing a wide range of considerations and suggestions collected over two decades of experience.

What this book covers

Chapter 1, A Gentle Introduction to Python, introduces you to fundamental programming concepts. It guides you through getting Python up and running on your computer and introduces you to some of its constructs.

Chapter 2, Built-in Data Types, introduces you to Python built-in data types. Python has a very rich set of native data types, and this chapter will give you a description and a short example for each of them.

Chapter 3, Iterating and Making Decisions, teaches you how to control the flow of your code by inspecting conditions, applying logic, and performing loops.

Chapter 4, Functions, the Building Blocks of Code, teaches you how to write functions. Functions are the keys to reusing code, to reducing debugging time, and, in general, to writing better code.

Chapter 5, Saving Time and Memory, introduces you to the functional aspects of Python programming. This chapter teaches you how to write comprehensions and generators, which are powerful tools that you can use to speed up your code and save memory.

Chapter 6, OOP, Decorators, and Iterators, teaches you the basics of object-oriented programming with Python. It shows you the key concepts and all the potentials of this paradigm. It also shows you one of the most beloved characteristics of Python: decorators. Finally, it also covers the concept of iterators.

Chapter 7, Files and Data Persistence, teaches you how to deal with files, streams, data interchange formats, and databases, among other things.

Chapter 8, Testing, Profiling, and Dealing with Exceptions, teaches you how to make your code more robust, fast, and stable using techniques such as testing and profiling. It also formally defines the concept of exceptions.

Chapter 9, Cryptography and Tokens, touches upon the concepts of security, hashes, encryption, and tokens, which are part of day-to-day programming at present.

Chapter 10, Concurrent Execution, is a challenging chapter that describes how to do many things at the same time. It provides an introduction to the theoretical aspects of this subject and then presents three nice exercises that are developed with different techniques, thereby enabling the reader to understand the differences between the paradigms presented.

Chapter 11, Debugging and Troubleshooting, shows you the main methods for debugging your code and some examples on how to apply them.

Chapter 12, GUIs and Scripts, guides you through an example from two different points of view. They are at opposite ends of the spectrum: one implementation is a script, and another one is a proper graphical user interface application.

Chapter 13, Data Science, introduces a few key concepts and a very special tool, the Jupyter Notebook.

Chapter 14, Web Development, introduces the fundamentals of web development and delivers a project using the Django web framework. The example will be based on regular expressions.

To get the most out of this book

You are encouraged to follow the examples in this book. In order to do so, you will need a computer, an internet connection, and a browser. The book is written for Python 3.7, but the examples should also work, for the most part, with any recent Python 3.* version. I have given guidelines on how to install Python on your operating system. The procedures to do that change all the time, so you will need to refer to the most up-to-date guide on the Web to find precise setup instructions. I have also explained how to install all the extra libraries used in the various examples and provided suggestions if the reader finds any issues during the installation of any of them. No particular editor is required to type the code; however, I suggest that those who are interested in following the examples should consider adopting a proper coding environment. I have given suggestions on this matter in the first chapter.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Learn-Python-Programming-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Within the learn.pp folder, we will create a virtual environment called learnpp."

A block of code is set as follows:

# we define a function, called local
def local():
    m = 7
    print(m)

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

# key.points.mutable.assignment.py
x = [1, 2, 3]
def func(x):
    x[1] = 42  # this changes the caller!
    x = 'something else'  # this points x to a new string object

Any command-line input or output is written as follows:

>>> import sys
>>> print(sys.version)

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "To open the console in Windows, go to the Start menu, choose Run, and type cmd."

Warnings or important notes appear like this.

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email feedback@packtpub.com and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at questions@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packtpub.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

A Gentle Introduction to Python

"Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime."

– Chinese proverb

According to Wikipedia, computer programming is:

"...a process that leads from an original formulation of a computing problem to executable computer programs. Programming involves activities such as analysis, developing understanding, generating algorithms, verification of requirements of algorithms including their correctness and resources consumption, and implementation (commonly referred to as coding) of algorithms in a target programming language."

In a nutshell, coding is telling a computer to do something using a language it understands.

Computers are very powerful tools, but unfortunately, they can't think for themselves. They need to be told everything: how to perform a task, how to evaluate a condition to decide which path to follow, how to handle data that comes from a device, such as the network or a disk, and how to react when something unforeseen happens, say, something is broken or missing.

You can code in many different styles and languages. Is it hard? I would say yes and no. It's a bit like writing. Everybody can learn how to write, and you can too. But, what if you wanted to become a poet? Then writing alone is not enough. You have to acquire a whole other set of skills and this will take a longer and greater effort.

In the end, it all comes down to how far you want to go down the road. Coding is not just putting together some instructions that work. It is so much more!

Good code is short, fast, elegant, easy to read and understand, simple, easy to modify and extend, easy to scale and refactor, and easy to test. It takes time to be able to write code that has all these qualities at the same time, but the good news is that you're taking the first step towards it at this very moment by reading this book. And I have no doubt you can do it. Anyone can; in fact, we all program all the time, only we aren't aware of it.

Would you like an example?

Say you want to make instant coffee. You have to get a mug, the instant coffee jar, a teaspoon, water, and the kettle. Even if you're not aware of it, you're evaluating a lot of data. You're making sure that there is water in the kettle and that the kettle is plugged in, that the mug is clean, and that there is enough coffee in the jar. Then, you boil the water and maybe, in the meantime, you put some coffee in the mug. When the water is ready, you pour it into the cup, and stir.

So, how is this programming?

Well, we gathered resources (the kettle, coffee, water, teaspoon, and mug) and we verified some conditions concerning them (the kettle is plugged in, the mug is clean, and there is enough coffee). Then we started two actions (boiling the water and putting coffee in the mug), and when both of them were completed, we finally ended the procedure by pouring water into the mug and stirring.

Can you see it? I have just described the high-level functionality of a coffee program. It wasn't that hard because this is what the brain does all day long: evaluate conditions, decide to take actions, carry out tasks, repeat some of them, and stop at some point. Clean objects, put them back, and so on.
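To make the comparison concrete, here is a toy sketch of the coffee routine written as a Python function. It is only an illustration: the function name, its parameters, and the step descriptions are all made up for this example, and the two "parallel" actions are simply run one after the other.

```python
# A toy model of the coffee routine. Every name here is hypothetical,
# chosen only to mirror the steps described in the text.

def make_instant_coffee(kettle_has_water, kettle_plugged_in, mug_clean, coffee_left):
    """Return the list of steps performed, or raise if a condition fails."""
    # Verify the conditions concerning our resources first.
    if not (kettle_has_water and kettle_plugged_in):
        raise RuntimeError("the kettle is not ready")
    if not mug_clean:
        raise RuntimeError("the mug is dirty")
    if coffee_left < 1:
        raise RuntimeError("not enough coffee in the jar")

    steps = []
    steps.append("boil water")          # first action
    steps.append("put coffee in mug")   # second action
    # Both actions are complete, so we can end the procedure.
    steps.append("pour water into mug")
    steps.append("stir")
    return steps


print(make_instant_coffee(True, True, True, coffee_left=3))
```

Calling the function with an unplugged kettle, for instance `make_instant_coffee(True, False, True, 3)`, raises a `RuntimeError` instead of returning the steps, which is one simple way of reacting when something is broken or missing.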

All you need now is to learn how to deconstruct all those actions you do automatically in real life so that a computer can actually make some sense of them. And you need to learn a language as well, to instruct it.

So this is what this book is for. I'll tell you how to do it and I'll try to do that by means of many simple but focused examples (my favorite kind).

In this chapter, we are going to cover the following:

Python's characteristics and ecosystem
Guidelines on how to get up and running with Python and virtual environments
How to run Python programs
How to organize Python code and Python's execution model

A proper introduction

I love to make references to the real world when I teach coding; I believe they help people retain the concepts better. However, now is the time to be a bit more rigorous and see what coding is from a more technical perspective.

When we write code, we're instructing a computer about the things it has to do. Where does the action happen? In many places: the computer memory, hard drives, network cables, the CPU, and so on. It's a whole world, which most of the time is the representation of a subset of the real world.

If you write a piece of software that allows people to buy clothes online, you will have to represent real people, real clothes, real brands, sizes, and so on and so forth, within the boundaries of a program.

In order to do so, you will need to create and handle objects in the program you're writing. A person can be an object. A car is an object. A pair of socks is an object. Luckily, Python understands objects very well.

The two main features any object has are properties and methods. Let's take a person object as an example. Typically in a computer program, you'll represent people as customers or employees. The properties that you store against them are things like the name, the SSN, the age, whether they have a driving license, their email, gender, and so on. In a computer program, you store all the data you need in order to use an object for the purpose you're serving. If you are coding a website to sell clothes, you probably want to store the heights and weights as well as other measures of your customers so that you can suggest the appropriate clothes for them. So, properties are characteristics of an object. We use them all the time: "Could you pass me that pen?" "Which one?" "The black one." Here, we used the black property of a pen to identify it (most likely among a blue and a red one).

Methods are things that an object can do. As a person, I have methods such as speak, walk, sleep, wake up, eat, dream, write, read, and so on. All the things that I can do could be seen as methods of the objects that represent me.
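As a minimal sketch of the idea, a person could be modeled with a small class. The class name, its attributes, and its methods below are assumptions made up for this illustration; classes themselves are covered properly in Chapter 6.

```python
class Person:
    """A hypothetical person object with a few properties and methods."""

    def __init__(self, name, age, email):
        # Properties: characteristics we store against the object.
        self.name = name
        self.age = age
        self.email = email

    # Methods: things the object can do.
    def speak(self, words):
        return f"{self.name} says: {words}"

    def have_birthday(self):
        self.age += 1
        return self.age


customer = Person("Ada", 36, "ada@example.com")
print(customer.name)             # inspect a property
print(customer.speak("hello"))   # run a method
print(customer.have_birthday())  # a method may change a property
```

Inspecting `customer.name` reads a property, while `customer.speak("hello")` runs a method; the same two ideas apply to any object you will meet in Python.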

So, now that you know what objects are and that they expose methods that you can run and properties that you can inspect, you're ready to start coding. Coding in fact is simply about managing those objects that live in the subset of the world that we're reproducing in our software. You can create, use, reuse, and delete objects as you please.

According to the Data Model chapter on the official Python documentation (https://docs.python.org/3/reference/datamodel.html):

"Objects are Python's abstraction for data. All data in a Python program is represented by objects or by relations between objects."

We'll take a closer look at Python objects in Chapter 6, OOP, Decorators, and Iterators. For now, all we need to know is that every object in Python has an ID (or identity), a type, and a value.

Once created, the ID of an object is never changed. It's a unique identifier for it, and it's used behind the scenes by Python to retrieve the object when we want to use it.

The type, as well, never changes. The type tells what operations are supported by the object and the possible values that can be assigned to it.

We'll see Python's most important data types in Chapter 2, Built-in Data Types.

The value can either change or not. If it can, the object is said to be mutable, while when it cannot, the object is said to be immutable.
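The built-in functions id() and type() let you look at these three aspects directly. The short sketch below uses a list (mutable) and a string (immutable); the variable names are arbitrary and chosen only for this example.

```python
numbers = [1, 2, 3]                 # a list is mutable
identity_before = id(numbers)
numbers.append(4)                   # the value changes...
assert id(numbers) == identity_before   # ...but the identity does not
assert type(numbers) is list            # ...and neither does the type

label = "python"                    # a string is immutable
try:
    label[0] = "P"                  # item assignment is not supported
except TypeError as error:
    print("cannot change a string in place:", error)

upper_label = label.upper()         # builds a new object; label is untouched
print(label, upper_label)
```

Note that an ID is only guaranteed to be unique among objects that exist at the same time, so it is best treated as something to compare rather than something to store.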

How do we use an object? We give it a name, of course! When you give an object a name, then you can use the name to retrieve the object and use it.

In a more generic sense, objects such as numbers, strings (text), collections, and so on are associated with a name. Usually, we say that this name is the name of a variable. You can see the variable as being like a box, which you can use to hold data.
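A small sketch may help here. The box picture is a useful first approximation; under the hood, a name refers to an object, and more than one name can refer to the same object. The names used below are arbitrary.

```python
colors = ["red", "blue"]     # the name colors now refers to a list object
palette = colors             # a second name for the very same object

palette.append("black")      # change the object through one name...
print(colors)                # ...and the change is visible through the other
print(colors is palette)     # True: both names refer to one object

colors = ["green"]           # rebinding: colors now refers to a new object
print(palette)               # the original list is still reachable as palette
```

Names and namespaces are discussed in more detail later in this chapter; for now it is enough to know that a name is how you get back to an object.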

So, you have all the objects you need; what now? Well, we need to use them, right? We may want to send them over a network connection or store them in a database. Maybe display them on a web page or write them into a file. In order to do so, we need to react to a user filling in a form, or pressing a button, or opening a web page and performing a search. We react by running our code, evaluating conditions to choose which parts to execute, how many times, and under which circumstances.

And to do all this, basically we need a language. That's what Python is for. Python is the language we'll use together throughout this book to instruct the computer to do something for us.

Now, enough of this theoretical stuff; let's get started.

Enter the Python

Python is the marvelous creation of Guido van Rossum, a Dutch computer scientist and mathematician who decided to gift the world with a project he was playing around with over Christmas 1989. The language appeared to the public somewhere around 1991, and since then has evolved to be one of the leading programming languages used worldwide today.

I started programming when I was 7 years old, on a Commodore VIC-20, which was later replaced by its bigger brother, the Commodore 64. Its language was BASIC. Later on, I landed on Pascal, Assembly, C, C++, Java, JavaScript, Visual Basic, PHP, ASP, ASP.NET, C#, and other minor languages I cannot even remember, but only when I landed on Python did I finally have that feeling that you have when you find the right couch in the shop. When all of your body parts are yelling, "Buy this one! This one is perfect for us!"

It took me about a day to get used to it. Its syntax is a bit different from what I was used to, but after getting past that initial feeling of discomfort (like having new shoes), I just fell in love with it. Deeply. Let's see why.

AboutPythonBeforewegetintothegorydetails,let'sgetasenseofwhysomeonewouldwanttousePython(IwouldrecommendyoutoreadthePythonpageonWikipediatogetamoredetailedintroduction).

Tomymind,Pythonepitomizesthefollowingqualities.

PortabilityPythonrunseverywhere,andportingaprogramfromLinuxtoWindowsorMacisusuallyjustamatteroffixingpathsandsettings.Pythonisdesignedforportabilityandittakescareofspecificoperatingsystem(OS)quirksbehindinterfacesthatshieldyoufromthepainofhavingtowritecodetailoredtoaspecificplatform.
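As one small illustration of such an interface, the standard library's pathlib module builds file paths with the right separator for each platform. The following is only a sketch; the folder and file names are invented for the example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same path components, rendered with each platform's conventions.
parts = ("srv", "learn.pp", "data.txt")

posix_path = PurePosixPath(*parts)
windows_path = PureWindowsPath(*parts)

print(posix_path)
print(windows_path)
```

In everyday code you would normally use pathlib.Path, which picks the flavor that matches the OS the program is running on, so the same source works unchanged on Linux, macOS, and Windows.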

Coherence

Python is extremely logical and coherent. You can see it was designed by a brilliant computer scientist. Most of the time, you can just guess what a method is called if you don't know it.

You may not realize how important this is right now, especially if you are at the beginning, but this is a major feature. It means less clutter in your head, as well as less skimming through the documentation, and less need for mappings in your brain when you code.

Developer productivity

According to Mark Lutz (Learning Python, 5th Edition, O'Reilly Media), a Python program is typically one-fifth to one-third the size of equivalent Java or C++ code. This means the job gets done faster. And faster is good. Faster means a faster response on the market. Less code not only means less code to write, but also less code to read (and professional coders read much more than they write), less code to maintain, to debug, and to refactor.

Another important aspect is that Python runs without the need for lengthy and time-consuming compilation and linkage steps, so you don't have to wait to see the results of your work.

An extensive library

Python has an incredibly wide standard library (it's said to come with batteries included). If that wasn't enough, the Python community all over the world maintains a body of third-party libraries, tailored to specific needs, which you can access freely at the Python Package Index (PyPI). When you code Python and you realize that you need a certain feature, in most cases, there is at least one library where that feature has already been implemented for you.

Software quality

Python is heavily focused on readability, coherence, and quality. The language uniformity allows for high readability and this is crucial nowadays where coding is more of a collective effort than a solo endeavor. Another important aspect of Python is its intrinsic multiparadigm nature. You can use it as a scripting language, but you also can exploit object-oriented, imperative, and functional programming styles. It is versatile.

Software integration

Another important aspect is that Python can be extended and integrated with many other languages, which means that even when a company is using a different language as their mainstream tool, Python can come in and act as a glue agent between complex applications that need to talk to each other in some way. This is kind of an advanced topic, but in the real world, this feature is very important.

Satisfaction and enjoyment

Last, but not least, there is the fun of it! Working with Python is fun. I can code for 8 hours and leave the office happy and satisfied, alien to the struggle other coders have to endure because they use languages that don't provide them with the same amount of well-designed data structures and constructs. Python makes coding fun, no doubt about it. And fun promotes motivation and productivity.

These are the major aspects of why I would recommend Python to everyone. Of course, there are many other technical and advanced features that I could have talked about, but they don't really pertain to an introductory section like this one. They will come up naturally, chapter after chapter, in this book.

What are the drawbacks?

Probably, the only drawback that one could find in Python, which is not due to personal preferences, is its execution speed. Typically, Python is slower than its compiled brothers. The standard implementation of Python produces, when you run an application, a compiled version of the source code called bytecode (with the extension .pyc), which is then run by the Python interpreter. The advantage of this approach is portability, which we pay for with a slowdown due to the fact that Python is not compiled down to machine level as are other languages.
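If you are curious about what bytecode looks like, the following sketch compiles a one-line program in memory and lists its instructions with the standard library's dis module. The source string and the name result are invented for the example, and the exact instructions printed vary between Python versions.

```python
import dis

# Compile a tiny piece of source code into a code object (bytecode).
source = "result = 2 + 3"
code_object = compile(source, "<example>", "exec")

# Run the compiled code in its own namespace and read the outcome back.
namespace = {}
exec(code_object, namespace)
print(namespace["result"])

# Show the bytecode instructions the interpreter executes.
dis.dis(code_object)
```

This is roughly what happens behind the scenes when you run a module: the source is compiled to bytecode once, and the interpreter then executes that bytecode.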

However, Python speed is rarely a problem today, hence its wide use regardless of this suboptimal feature. What happens is that, in real life, hardware cost is no longer a problem, and usually it's easy enough to gain speed by parallelizing tasks. Moreover, many programs spend a great proportion of the time waiting for I/O operations to complete; therefore, the raw execution speed is often a secondary factor to the overall performance. When it comes to number crunching though, one can switch to faster Python implementations, such as PyPy, which provides an average five-fold speedup by implementing advanced compilation techniques (check http://pypy.org/ for reference).

When doing data science, you'll most likely find that the libraries that you use with Python, such as Pandas and NumPy, achieve native speed due to the way they are implemented.

If that wasn't a good-enough argument, you can always consider that Python has been used to drive the backend of services such as Spotify and Instagram, where performance is a concern. Nonetheless, Python has done its job perfectly adequately.

Who is using Python today?

Not yet convinced? Let's take a very brief look at the companies that are using Python today: Google, YouTube, Dropbox, Yahoo!, Zope Corporation, Industrial Light & Magic, Walt Disney Feature Animation, Blender 3D, Pixar, NASA, the NSA, Red Hat, Nokia, IBM, Netflix, Yelp, Intel, Cisco, HP, Qualcomm, and JPMorgan Chase, to name just a few.

Even games such as Battlefield 2, Civilization IV, and QuArK are implemented using Python.

Python is used in many different contexts, such as system programming, web programming, GUI applications, gaming and robotics, rapid prototyping, system integration, data science, database applications, and much more. Several prestigious universities have also adopted Python as their main language in computer science courses.

Setting up the environment

Before we talk about installing Python on your system, let me tell you about which Python version I'll be using in this book.

Python 2 versus Python 3

Python comes in two main versions: Python 2, which is the past, and Python 3, which is the present. The two versions, though very similar, are incompatible in some respects.

In the real world, Python 2 is actually quite far from being the past. In short, even though Python 3 has been out since 2008, the transition phase from Version 2 is still far from being over. This is mostly due to the fact that Python 2 is widely used in the industry, and of course, companies aren't so keen on updating their systems just for the sake of updating them, following the "if it ain't broke, don't fix it" philosophy. You can read all about the transition between the two versions on the web.

Another issue that has hindered the transition is the availability of third-party libraries. Usually, a Python project relies on dozens of external libraries, and of course, when you start a new project, you need to be sure that there is already a Version-3-compatible library for any business requirement that may come up. If that's not the case, starting a brand-new project in Python 3 means introducing a potential risk, which many companies are not happy to take.

At the time of writing, though, the majority of the most widely used libraries have been ported to Python 3, and it's quite safe to start a project in Python 3 for most cases. Many of the libraries have been rewritten so that they are compatible with both versions, mostly harnessing the power of the six library (the name comes from the multiplication 2 x 3, due to the porting from Version 2 to 3), which helps introspecting and adapting the behavior according to the version used. According to PEP 373 (https://legacy.python.org/dev/peps/pep-0373/), the end of life (EOL) of Python 2.7 has been set to 2020, and there won't be a Python 2.8, so this is the time when companies that have projects running in Python 2 need to start devising an upgrade strategy to move to Python 3 before it's too late.

On my box (MacBook Pro), this is the latest Python version I have:

>>> import sys
>>> print(sys.version)
3.7.0a3 (default, Jan 27 2018, 00:46:45)
[Clang 9.0.0 (clang-900.0.39.2)]

So you can see that the version is an alpha release of Python 3.7, which will be released in June 2018. The preceding text is a little bit of Python code that I typed into my console. We'll talk about it in a moment.

All the examples in this book will be run using Python 3.7. Even though at the moment the final version might still be slightly different than what I have, I will make sure that all the code and examples are up to date with 3.7 by the time the book is published.

Some of the code can also run in Python 2.7, either as it is or with minor tweaks, but at this point in time, I think it's better to learn Python 3, and then, if you need to, learn the differences it has with Python 2, rather than going the other way around.

Don't worry about this version thing though; it's not that big an issue in practice.
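If you ever need code to find out which version it is running on, sys.version_info offers the same information as sys.version in a form that is easy to compare. This is only a sketch; the helper name is_python3 is made up for the example.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the major version number is 3 or higher."""
    return version_info[0] >= 3


# The first three fields are the major, minor, and micro version numbers.
print(sys.version_info[:3])
print(is_python3())
```

Compatibility libraries such as six rely on checks of this kind to adapt their behavior to the interpreter in use.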

Installing Python

I never really got the point of having a setup section in a book, regardless of what it is that you have to set up. Most of the time, between the time the author writes the instructions and the time you actually try them out, months have passed. That is, if you're lucky. One version change and things may not work in the way that is described in the book. Luckily, we have the web now, so in order to help you get up and running, I'll just give you pointers and objectives.

I am conscious that the majority of readers would probably have preferred to have guidelines in the book. I doubt it would have made their life much easier, as I strongly believe that if you want to get started with Python you have to put in that initial effort in order to get familiar with the ecosystem. It is very important, and it will boost your confidence to face the material in the chapters ahead. If you get stuck, remember that Google is your friend.

Setting up the Python interpreter

First of all, let's talk about your OS. Python is fully integrated and most likely already installed in almost every Linux distribution. If you have a Mac, it's likely that Python is already there as well (however, possibly only Python 2.7), whereas if you're using Windows, you probably need to install it.

Getting Python and the libraries you need up and running requires a bit of handiwork. Linux and macOS seem to be the most user-friendly OSes for Python programmers; Windows, on the other hand, is the one that requires the biggest effort.

My current system is a MacBook Pro, and this is what I will use throughout the book, along with Python 3.7.

The place you want to start is the official Python website: https://www.python.org. This website hosts the official Python documentation and many other resources that you will find very useful. Take the time to explore it.

Another excellent, resourceful website on Python and its ecosystem is http://docs.python-guide.org. You can find instructions to set up Python on different operating systems, using different methods.

Find the download section and choose the installer for your OS. If you are on Windows, make sure that when you run the installer, you check the option install pip (actually, I would suggest making a complete installation, just to be safe, of all the components the installer holds). We'll talk about pip later.

Now that Python is installed in your system, the objective is to be able to open a console and run the Python interactive shell by typing python.

Please note that I usually refer to the Python interactive shell simply as the Python console.

To open the console in Windows, go to the Start menu, choose Run, and type cmd. If you encounter anything that looks like a permission problem while working on the examples in this book, please make sure you are running the console with administrator rights.

On macOS, you can start a Terminal by going to Applications | Utilities | Terminal.

If you are on Linux, you know all that there is to know about the console.

I will use the term console interchangeably to indicate the Linux console, the Windows Command Prompt, and the Macintosh Terminal. I will also indicate the command-line prompt with the Linux default format, like this:

$ sudo apt-get update

If you're not familiar with that, please take some time to learn the basics of how a console works. In a nutshell, after the $ sign, you normally find an instruction that you have to type. Pay attention to capitalization and spaces, as they are very important.

Whatever console you open, type python at the prompt, and make sure the Python interactive shell shows up. Type exit() to quit. Keep in mind that you may have to specify python3 if your OS comes with Python 2.* preinstalled.

This is roughly what you should see when you run Python (it will change in some details according to the version and OS):

$ python3.7
Python 3.7.0a3 (default, Jan 27 2018, 00:46:45)
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Now that Python is set up and you can run it, it's time to make sure you have the other tool that will be indispensable to follow the examples in the book: virtualenv.

About virtualenv

As you probably have guessed by its name, virtualenv is all about virtual environments. Let me explain what they are and why we need them, and let me do it by means of a simple example.

You install Python on your system and you start working on a website for Client X. You create a project folder and start coding. Along the way, you also install some libraries; for example, the Django framework, which we'll see in depth in Chapter 14, Web Development. Let's say the Django version you install for Project X is 1.7.1.

Now, your website is so good that you get another client, Y. She wants you to build another website, so you start Project Y and, along the way, you need to install Django again. The only issue is that now the Django version is 1.8 and you cannot install it on your system because this would replace the version you installed for Project X. You don't want to risk introducing incompatibility issues, so you have two choices: either you stick with the version you have currently on your machine, or you upgrade it and make sure the first project is still working correctly with the new version.

Let's be honest, neither of these options is very appealing, right? Definitely not. So, here's the solution: virtualenv!

virtualenv is a tool that allows you to create a virtual environment. In other words, it is a tool to create isolated Python environments, each of which is a folder that contains all the necessary executables to use the packages that a Python project would need (think of packages as libraries for the time being).

So you create a virtual environment for Project X, install all the dependencies, and then you create a virtual environment for Project Y, installing all its dependencies without the slightest worry because every library you install ends up within the boundaries of the appropriate virtual environment. In our example, Project X will hold Django 1.7.1, while Project Y will hold Django 1.8.

It is of vital importance that you never install libraries directly at the system level. Linux, for example, relies on Python for many different tasks and operations, and if you fiddle with the system installation of Python, you risk compromising the integrity of the whole system (guess to whom this happened...). So take this as a rule, like brushing your teeth before going to bed: always, always create a virtual environment when you start a new project.

To install virtualenv on your system, there are a few different ways. On a Debian-based distribution of Linux, for example, you can install it with the following command:

$ sudo apt-get install python-virtualenv

Probably, the easiest way is to follow the instructions you can find on the virtualenv official website: https://virtualenv.pypa.io.

You will find that one of the most common ways to install virtualenv is by using pip, a package management system used to install and manage software packages written in Python.

As of Python 3.5, the suggested way to create a virtual environment is to use the venv module. Please see the official documentation for further information. However, at the time of writing, virtualenv is still by far the tool most used for creating virtual environments.
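For reference, the venv module is usually invoked from the console (python3 -m venv followed by a folder name), but it can also be driven from Python code. The sketch below is an illustration only, not the workflow this book follows: it creates a throwaway environment named learnpp inside a temporary folder and skips installing pip to stay quick.

```python
import tempfile
import venv
from pathlib import Path

# Create an isolated environment inside a temporary project folder.
with tempfile.TemporaryDirectory() as project_dir:
    env_dir = Path(project_dir) / "learnpp"
    venv.create(env_dir, with_pip=False)  # with_pip=False skips installing pip

    # Every environment created by venv carries a pyvenv.cfg file.
    config_exists = (env_dir / "pyvenv.cfg").is_file()
    print(config_exists)
```

The temporary folder, and the environment with it, is deleted as soon as the with block ends, so running the sketch leaves nothing behind on your system.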

Your first virtual environment

It is very easy to create a virtual environment, but depending on how your system is configured and which Python version you want the virtual environment to run, you need to run the command properly. Another thing you will need to do with a virtual environment, when you want to work with it, is to activate it. Activating a virtual environment basically produces some path juggling behind the scenes so that when you call the Python interpreter, you're actually calling the active virtual environment one, instead of the mere system one.

I'll show you a full example on my Macintosh console. We will:

1. Create a folder named learn.pp under your project root (which in my case is a folder called srv, in my home folder). Please adapt the paths according to the setup you fancy on your box.

2. Within the learn.pp folder, we will create a virtual environment called learnpp.

Some developers prefer to call all virtual environments using the same name (for example, .venv). This way they can run scripts against any virtualenv by just knowing the name of the project they dwell in. The dot in .venv is there because in Linux/macOS prepending a name with a dot makes that file or folder invisible.

3. After creating the virtual environment, we will activate it. The methods are slightly different between Linux, macOS, and Windows.

4. Then, we'll make sure that we are running the desired Python version (3.7.*) by running the Python interactive shell.

5. Finally, we will deactivate the virtual environment using the deactivate command.

These five simple steps will show you all you have to do to start and use a project.

Here's an example of how those steps might look (note that you might get a slightly different result, according to your OS, Python version, and so on) on macOS (commands that start with a # are comments, spaces have been introduced for readability, and ⇢ indicates where the line has wrapped around due to lack of space):

fabmp:srv fab$ # step 1 - create folder
fabmp:srv fab$ mkdir learn.pp
fabmp:srv fab$ cd learn.pp
fabmp:learn.pp fab$ # step 2 - create virtual environment
fabmp:learn.pp fab$ which python3.7
/Users/fab/.pyenv/shims/python3.7
fabmp:learn.pp fab$ virtualenv -p
⇢ /Users/fab/.pyenv/shims/python3.7 learnpp
Running virtualenv with interpreter /Users/fab/.pyenv/shims/python3.7
Using base prefix '/Users/fab/.pyenv/versions/3.7.0a3'
New python executable in /Users/fab/srv/learn.pp/learnpp/bin/python3.7
Also creating executable in /Users/fab/srv/learn.pp/learnpp/bin/python
Installing setuptools, pip, wheel...done.
fabmp:learn.pp fab$ # step 3 - activate virtual environment
fabmp:learn.pp fab$ source learnpp/bin/activate
(learnpp) fabmp:learn.pp fab$ # step 4 - verify which python
(learnpp) fabmp:learn.pp fab$ which python
/Users/fab/srv/learn.pp/learnpp/bin/python
(learnpp) fabmp:learn.pp fab$ python
Python 3.7.0a3 (default, Jan 27 2018, 00:46:45)
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
(learnpp) fabmp:learn.pp fab$ # step 5 - deactivate
(learnpp) fabmp:learn.pp fab$ deactivate
fabmp:learn.pp fab$

Notice that I had to tell virtualenv explicitly to use the Python 3.7 interpreter because on my box Python 2.7 is the default one. Had I not done that, I would have had a virtual environment with Python 2.7 instead of Python 3.7.

You can combine the two instructions for step 2 in one single command like this:

$ virtualenv -p $(which python3.7) learnpp

I chose to be explicitly verbose in this instance, to help you understand each bit of the procedure.

Another thing to notice is that in order to activate a virtual environment, we need to run the bin/activate script found inside the environment folder, and that script needs to be sourced. When a script is sourced, it means that it is executed in the current shell, and therefore its effects last after the execution. This is very important. Also notice how the prompt changes after we activate the virtual environment, showing its name on the left (and how it disappears when we deactivate it). On Linux, the steps are the same so I won't repeat them here. On Windows, things change slightly, but the concepts are the same. Please refer to the official virtualenv website for guidance.
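You can also ask Python itself whether it is running from a virtual environment. The following sketch is based on two facts about the interpreter: environments created with venv (and recent virtualenv releases) make sys.prefix differ from sys.base_prefix, while older virtualenv releases set a sys.real_prefix attribute instead. The function name in_virtual_environment is made up for the example.

```python
import sys


def in_virtual_environment():
    """Best-effort check for an active venv or virtualenv environment."""
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


# sys.executable is the path of the interpreter that is actually running.
print(sys.executable)
print(in_virtual_environment())
```

Run it once with the environment activated and once after deactivate: the interpreter path and the answer should change accordingly.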

At this point, you should be able to create and activate a virtual environment. Please try and create another one without me guiding you. Get acquainted with this procedure because it's something that you will always be doing: we never work system-wide with Python, remember? It's extremely important.

So, with the scaffolding out of the way, we're ready to talk a bit more about Python and how you can use it. Before we do that though, allow me to speak a few words about the console.

Your friend, the console

In this era of GUIs and touchscreen devices, it seems a little ridiculous to have to resort to a tool such as the console, when everything is just about one click away.

But the truth is every time you remove your right hand from the keyboard (or the left one, if you're a lefty) to grab your mouse and move the cursor over to the spot you want to click on, you're losing time. Getting things done with the console, counter-intuitive as it may be, results in higher productivity and speed. I know, you have to trust me on this.

Speed and productivity are important and, personally, I have nothing against the mouse, but there is another very good reason for which you may want to get well-acquainted with the console: when you develop code that ends up on some server, the console might be the only available tool. If you make friends with it, I promise you, you will never get lost when it's of utmost importance that you don't (typically, when the website is down and you have to investigate very quickly what's going on).

So it's really up to you. If you're undecided, please grant me the benefit of the doubt and give it a try. It's easier than you think, and you'll never regret it. There is nothing more pitiful than a good developer who gets lost within an SSH connection to a server because they are used to their own custom set of tools, and only to that.

Now, let's get back to Python.

How you can run a Python program

There are a few different ways in which you can run a Python program.

Running Python scripts

Python can be used as a scripting language. In fact, it always proves itself very useful. Scripts are files (usually small ones) that you normally execute to carry out a specific task. Many developers end up having their own arsenal of tools that they fire when they need to perform a task. For example, you can have scripts to parse data in one format and render it in another. Or you can use a script to work with files and folders. You can create or modify configuration files, and much more. Technically, there is not much that cannot be done in a script.

It's quite common to have scripts running at a precise time on a server. For example, if your website database needs cleaning every 24 hours (for example, the table that stores the user sessions, which expire pretty quickly but aren't cleaned automatically), you could set up a Cron job that fires your script at 3:00 A.M. every day.

According to Wikipedia, the software utility Cron is a time-based job scheduler in Unix-like computer operating systems. People who set up and maintain software environments use Cron to schedule jobs (commands or shell scripts) to run periodically at fixed times, dates, or intervals.

I have Python scripts to do all the menial tasks that would take me minutes or more to do manually and that, at some point, I decided to automate. We'll devote half of Chapter 12, GUIs and Scripts, to scripting with Python.
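To give a flavor of what such a script might look like, here is a minimal sketch of the format-conversion task mentioned above: it takes data in CSV format and renders it as JSON, using only the standard library. The function name csv_to_json and the sample data are invented for the example.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Parse CSV text (first row holds the column names) and return JSON text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [dict(row) for row in reader]
    return json.dumps(rows)


if __name__ == "__main__":
    # This part only runs when the file is executed as a script.
    sample = "name,age\nAda,36\n"
    print(csv_to_json(sample))
```

Saved in a file (say, convert.py, a hypothetical name), it could be run from the console with python convert.py, or scheduled with Cron like any other command.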

Running the Python interactive shell

Another way of running Python is by calling the interactive shell. This is something we already saw when we typed python on the command line of our console.

So, open a console, activate your virtual environment (which by now should be second nature to you, right?), and type python. You will be presented with a couple of lines that should look like this:

$ python
Python 3.7.0a3 (default, Jan 27 2018, 00:46:45)
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Those >>> are the prompt of the shell. They tell you that Python is waiting for you to type something. If you type a simple instruction, something that fits in one line, that's all you'll see. However, if you type something that requires more than one line of code, the shell will change the prompt to ..., giving you a visual clue that you're typing a multiline statement (or anything that would require more than one line of code).

Go on, try it out; let's do some basic math:

>>> 2 + 4
6
>>> 10 / 4
2.5
>>> 2 ** 1024

179769313486231590772930519078902473361797697894230657273430081157732675805500963132708477322407536021120113879871393357658789768814416622492847430639474124377767893424865485276302219601246094119453082952085005768838150682342462881473913110540827237163350510684586298239947245938479716304835356329624224137216

The last operation is showing you something incredible. We raise 2 to the power of 1024, and Python is handling this task with no trouble at all. Try to do it in Java, C++, or C#. It won't work with their built-in integer types; you need special big-number libraries or classes to handle such big numbers.
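A short sketch can put that number in perspective. Python integers grow as large as memory allows, whereas floating point numbers have a fixed range, so converting this particular integer to a float fails. The variable names are made up for the example.

```python
big = 2 ** 1024

# Python integers have arbitrary precision, so we can inspect the result.
digits = len(str(big))   # number of decimal digits
bits = big.bit_length()  # number of bits needed to represent it

# Floats have a fixed size, and this value is just beyond their range.
try:
    float(big)
    overflowed = False
except OverflowError:
    overflowed = True

print(digits, bits, overflowed)
```

The integer needs 1,025 bits, far more than the 64 bits that a typical built-in integer type offers in the other languages mentioned above.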

I use the interactive shell every day. It's extremely useful to debug very quickly, for example, to check if a data structure supports an operation. Or maybe to inspect or run a piece of code.

When you use Django (a web framework), the interactive shell is coupled with it and allows you to work your way through the framework tools, to inspect the data in the database, and many more things. You will find that the interactive shell will soon become one of your dearest friends on the journey you are embarking on.

Another solution, which comes in a much nicer graphic layout, is to use the Integrated Development and Learning Environment (IDLE). It's quite a simple IDE, which is intended mostly for beginners. It has a slightly larger set of capabilities than the naked interactive shell you get in the console, so you may want to explore it. It comes for free in the Windows Python installer and you can easily install it in any other system. You can find information about it on the Python website.

Guido van Rossum named Python after the British comedy group, Monty Python, so it's rumored that the name IDLE has been chosen in honor of Eric Idle, one of Monty Python's founding members.

Running Python as a service

Apart from being run as a script, and within the boundaries of a shell, Python can be coded and run as an application. We'll see many examples throughout the book about this mode. And we'll understand more about it in a moment, when we talk about how Python code is organized and run.

Running Python as a GUI application

Python can also be used to build graphical user interface (GUI) applications. There are several frameworks available, some of which are cross-platform and some others are platform-specific. In Chapter 12, GUIs and Scripts, we'll see an example of a GUI application created using Tkinter, which is an object-oriented layer that lives on top of Tk (Tkinter means Tk interface).

Tk is a GUI toolkit that takes desktop application development to a higher level than the conventional approach. It is the standard GUI for Tool Command Language (Tcl), but also for many other dynamic languages, and it can produce rich native applications that run seamlessly under Windows, Linux, macOS, and more.

Tkinter comes bundled with Python; therefore, it gives the programmer easy access to the GUI world, and for these reasons, I have chosen it to be the framework for the GUI examples that I'll present in this book.

Among the other GUI frameworks, we find that the following are the most widely used:

PyQt

wxPython

PyGTK

Describing them in detail is outside the scope of this book, but you can find all the information you need on the Python website (https://docs.python.org/3/faq/gui.html) in the What platform-independent GUI toolkits exist for Python? section. If GUIs are what you're looking for, remember to choose the one you want according to some principles. Make sure they:

Offer all the features you may need to develop your project

Run on all the platforms you may need to support

Rely on a community that is as wide and active as possible

Wrap graphic drivers/tools that you can easily install/access

How is Python code organized?

Let's talk a little bit about how Python code is organized. In this section, we'll start going down the rabbit hole a little bit more and introduce more technical names and concepts.

Starting with the basics, how is Python code organized? Of course, you write your code into files. When you save a file with the extension .py, that file is said to be a Python module.

If you're on Windows or macOS, which typically hide file extensions from the user, please make sure you change the configuration so that you can see the complete names of the files. This is not strictly a requirement, but a suggestion.

It would be impractical to save all the code that is required for software to work within one single file. That solution works for scripts, which are usually not longer than a few hundred lines (and often they are much shorter than that).

A complete Python application can be made of hundreds of thousands of lines of code, so you will have to scatter it through different modules, which is better, but not nearly good enough. It turns out that even like this, it would still be impractical to work with the code. So Python gives you another structure, called package, which allows you to group modules together. A package is nothing more than a folder, which must contain a special file, __init__.py, that doesn't need to hold any code but whose presence is required to tell Python that the folder is not just some folder, but it's actually a package (note that as of Python 3.3, the __init__.py module is not strictly required anymore).

As always, an example will make all of this much clearer. I have created an example structure in my book project, and when I type in my console:

$ tree -v example

I get a tree representation of the contents of the ch1/example folder, which holds the code for the examples of this chapter. Here's what the structure of a really simple application could look like:

example
├── core.py
├── run.py
└── util
    ├── __init__.py
    ├── db.py
    ├── math.py
    └── network.py

You can see that within the root of this example, we have two modules, core.py and run.py, and one package: util. Within core.py, there may be the core logic of our application. On the other hand, within the run.py module, we can probably find the logic to start the application. Within the util package, I expect to find various utility tools, and in fact, we can guess that the modules there are named based on the types of tools they hold: db.py would hold tools to work with databases, math.py would, of course, hold mathematical tools (maybe our application deals with financial data), and network.py would probably hold tools to send/receive data on networks.

As explained before, the __init__.py file is there just to tell Python that util is a package and not just a mere folder.
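To see a package at work, the following sketch recreates a cut-down version of that layout in a temporary folder and then imports from it. Everything inside the files is invented for the illustration (the ping and main functions in particular); the real ch1/example files may look quite different.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_example(root):
    """Create a tiny project: a util package and a core module that uses it."""
    package = root / "util"
    package.mkdir()
    (package / "__init__.py").write_text("")  # marks util as a package
    (package / "network.py").write_text("def ping():\n    return 'pong'\n")
    (root / "core.py").write_text(
        "from util.network import ping\n\n\ndef main():\n    return ping()\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_example(root)
    sys.path.insert(0, str(root))  # let Python find the new modules
    try:
        core = importlib.import_module("core")
        result = core.main()
    finally:
        # Undo the changes so the sketch leaves no trace behind.
        sys.path.remove(str(root))
        for name in ("core", "util", "util.network"):
            sys.modules.pop(name, None)

print(result)
```

The line from util.network import ping inside core.py reads almost like the folder structure itself: from the util package, take the network module, and import ping from it.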

Had this software been organized within modules only, it would have been harder to infer its structure. I put a module-only example under the ch1/files_only folder; see it for yourself:

$ tree -v files_only

This shows us a completely different picture:

files_only/
├── core.py
├── db.py
├── math.py
├── network.py
└── run.py

It is a little harder to guess what each module does, right? Now, consider that this is just a simple example, so you can guess how much harder it would be to understand a real application if we couldn't organize the code in packages and modules.

How do we use modules and packages?

When a developer is writing an application, it is likely that they will need to apply the same piece of logic in different parts of it. For example, when writing a parser for the data that comes from a form that a user can fill in a web page, the application will have to validate whether a certain field is holding a number or not. Regardless of how the logic for this kind of validation is written, it's likely that it will be needed in more than one place.

For example, in a poll application, where the user is asked many questions, it's likely that several of them will require a numeric answer. For example:

What is your age?

How many pets do you own?

How many children do you have?

How many times have you been married?

It would be very bad practice to copy/paste (or, more properly said: duplicate) the validation logic in every place where we expect a numeric answer. This would violate the don't repeat yourself (DRY) principle, which states that you should never repeat the same piece of code more than once in your application. I feel the need to stress the importance of this principle: you should never repeat the same piece of code more than once in your application (pun intended).

There are several reasons why repeating the same piece of logic can be very bad, the most important ones being:

There could be a bug in the logic, and therefore, you would have to correct it in every place that the logic is applied.

You may want to amend the way you carry out the validation, and again you would have to change it in every place it is applied.

You may forget to fix/amend a piece of logic because you missed it when searching for all its occurrences. This would leave wrong/inconsistent behavior in your application.

Your code would be longer than needed, for no good reason.

Python is a wonderful language and provides you with all the tools you need to apply all the coding best practices. For this particular example, we need to be able to reuse a piece of code. To be able to reuse a piece of code, we need to have a construct that will hold the code for us so that we can call that construct every time we need to repeat the logic inside it. That construct exists, and it's called a function.
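As a taste of what is to come, here is a minimal sketch of the poll example: the validation logic lives in one function, and every question that expects a numeric answer reuses it. The function name is_whole_number and the sample answers are made up for the illustration.

```python
def is_whole_number(answer):
    """Return True if the text of the answer can be read as an integer."""
    try:
        int(answer)
    except ValueError:
        return False
    return True


# Hypothetical answers collected from the poll form, always as text.
answers = {
    "What is your age?": "34",
    "How many pets do you own?": "two",
    "How many children do you have?": "0",
}

# The same function validates every answer: no duplicated logic.
invalid_questions = [
    question for question, answer in answers.items()
    if not is_whole_number(answer)
]
print(invalid_questions)
```

If the validation rules ever change (for example, to reject negative numbers), there is exactly one place to amend.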

I'm not going too deep into the specifics here, so please just remember that a function is a block of organized, reusable code that is used to perform a task. Functions can assume many forms and names, according to what kind of environment they belong to, but for now this is not important. We'll see the details when we are able to appreciate them, later on, in the book. Functions are the building blocks of modularity in your application, and they are almost indispensable. Unless you're writing a super-simple script, you'll use functions all the time. We'll explore functions in Chapter 4, Functions, the Building Blocks of Code.

Python comes with a very extensive library, as I have already said a few pages ago. Now, maybe it's a good time to define what a library is: a library is a collection of functions and objects that provide functionalities that enrich the abilities of a language.

For example, within Python's math library, we can find a plethora of functions, one of which is the factorial function, which of course calculates the factorial of a number.

In mathematics, the factorial of a non-negative integer number N, denoted as N!, is defined as the product of all positive integers less than or equal to N. For example, the factorial of 5 is calculated as: 5! = 5 * 4 * 3 * 2 * 1 = 120

The factorial of 0 is 0! = 1, to respect the convention for an empty product.

So, if you wanted to use this function in your code, all you would have to do is to import it and call it with the right input values. Don't worry too much if input values and the concept of calling are not very clear for now; please just concentrate on the import part. We use a library by importing what we need from it, and then we use it.

In Python, to calculate the factorial of number 5, we just need the following code:

>>> from math import factorial
>>> factorial(5)
120

Whatever we type in the shell, if it has a printable representation, will be printed on the console for us (in this case, the result of the function call: 120).
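To connect the mathematical definition given above with the library function, here is a sketch that computes the product by hand and compares it with math.factorial. The function name factorial_product is made up; in real code you would simply use the one from the math library.

```python
from math import factorial


def factorial_product(n):
    """Multiply all positive integers less than or equal to n."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    result = 1  # the empty product, which is why 0! equals 1
    for k in range(2, n + 1):
        result *= k
    return result


print(factorial_product(5), factorial(5))
print(factorial_product(0), factorial(0))
```

Both functions agree, but the library version has already been written, tested, and optimized for you, which is the whole point of having a library.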

So, let's go back to our example, the one with core.py, run.py, util, and so on.

In our example, the package util is our utility library. Our custom utility belt that holds all those reusable tools (that is, functions), which we need in our application. Some of them will deal with databases (db.py), some with the network (network.py), and some will perform mathematical calculations (math.py) that are outside the scope of Python's standard math library and, therefore, we have to code them for ourselves.

We will see in detail how to import functions and use them in their dedicated chapter. Let's now talk about another very important concept: Python's execution model.

Python's execution model

In this section, I would like to introduce you to a few very important concepts, such as scope, names, and namespaces. You can read all about Python's execution model in the official language reference, of course, but I would argue that it is quite technical and abstract, so let me give you a less formal explanation first.

Names and namespaces

Say you are looking for a book, so you go to the library and ask someone for the book you want to fetch. They tell you something like Second Floor, Section X, Row Three. So you go up the stairs, look for Section X, and so on.

It would be very different to enter a library where all the books are piled together in random order in one big room. No floors, no sections, no rows, no order. Fetching a book would be extremely hard.

When we write code, we have the same issue: we have to try and organize it so that it will be easy for someone who has no prior knowledge about it to find what they're looking for. When software is structured correctly, it also promotes code reuse. On the other hand, disorganized software is more likely to expose scattered pieces of duplicated logic.

Firstofall,let'sstartwiththebook.WerefertoabookbyitstitleandinPythonlingo,thatwouldbeaname.Pythonnamesaretheclosestabstractiontowhatotherlanguagescallvariables.Namesbasicallyrefertoobjectsandareintroducedbyname-bindingoperations.Let'smakeaquickexample(noticethatanythingthatfollowsa#isacomment):

>>> n = 3  # integer number
>>> address = "221b Baker Street, NW1 6XE, London"  # Sherlock Holmes' address
>>> employee = {
...     'age': 45,
...     'role': 'CTO',
...     'SSN': 'AB1234567',
... }
>>> # let's print them
>>> n
3
>>> address
'221b Baker Street, NW1 6XE, London'
>>> employee
{'age': 45, 'role': 'CTO', 'SSN': 'AB1234567'}
>>> other_name
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'other_name' is not defined

We defined three objects in the preceding code (do you remember the three features every Python object has?):

An integer number n (type: int, value: 3)
A string address (type: str, value: Sherlock Holmes' address)
A dictionary employee (type: dict, value: a dictionary that holds three key/value pairs)

Don't worry, I know you're not supposed to know what a dictionary is. We'll see in Chapter 2, Built-in Data Types, that it's the king of Python data structures.

Have you noticed that the prompt changed from >>> to ... when I typed in the definition of employee? That's because the definition spans over multiple lines.

So, what are n, address, and employee? They are names. Names that we can use to retrieve data within our code. They need to be kept somewhere so that whenever we need to retrieve those objects, we can use their names to fetch them. We need some space to hold them, hence: namespaces!

A namespace is therefore a mapping from names to objects. Examples are the set of built-in names (containing functions that are always accessible in any Python program), the global names in a module, and the local names in a function. Even the set of attributes of an object can be considered a namespace.
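If you want to see that a namespace really is a mapping, the following minimal sketch peeks into three of them. The helper show_locals and the book object are hypothetical names I made up for illustration; vars, locals, and the builtins module are part of standard Python.

```python
import builtins
from types import SimpleNamespace

# The built-in namespace is exposed through the builtins module.
builtin_names = vars(builtins)
has_print = 'print' in builtin_names


def show_locals():
    m = 7            # m lives only in this function's local namespace
    return locals()  # a dictionary: name -> object


local_names = show_locals()

# The attributes of an object form a namespace too.
book = SimpleNamespace(title='Some title', floor=2)
attribute_names = vars(book)

print(has_print)
print(local_names)
print(attribute_names)
```

In each case what comes back is a plain dictionary-like mapping from names to objects. The global names of a module can be inspected in the same spirit with the built-in globals function.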

The beauty of namespaces is that they allow you to define and organize your names with clarity, without overlapping or interference. For example, the namespace associated with that book we were looking for in the library can be used to import the book itself, like this:

from library.second_floor.section_x.row_three import book

We start from the library namespace, and by means of the dot (.) operator, we walk into that namespace. Within this namespace, we look for second_floor, and again we walk into it with the . operator. We then walk into section_x, and finally within the last namespace, row_three, we find the name we were looking for: book.

Walking through a namespace will be clearer when we deal with real code examples. For now, just keep in mind that namespaces are places where names are associated with objects.

There is another concept, closely related to that of a namespace, that I'd like to briefly talk about: the scope.

Scopes

According to Python's documentation:

"A scope is a textual region of a Python program, where a namespace is directly accessible."

Directly accessible means that when you're looking for an unqualified reference to a name, Python tries to find it in the namespace.

Scopes are determined statically, but actually, during runtime, they are used dynamically. This means that by inspecting the source code, you can tell what the scope of an object is, but this doesn't prevent the software from altering that during runtime. There are four different scopes that Python makes accessible (not necessarily all of them are present at the same time, of course):

The local scope, which is the innermost one and contains the local names.
The enclosing scope, that is, the scope of any enclosing function. It contains non-local names and also non-global names.
The global scope contains the global names.
The built-in scope contains the built-in names. Python comes with a set of functions that you can use in an off-the-shelf fashion, such as print, all, abs, and so on. They live in the built-in scope.

The rule is the following: when we refer to a name, Python starts looking for it in the current namespace. If the name is not found, Python continues the search to the enclosing scope and this continues until the built-in scope is searched. If a name hasn't been found after searching the built-in scope, then Python raises a NameError exception, which basically means that the name hasn't been defined (you saw this in the preceding example).

The order in which the namespaces are scanned when looking for a name is therefore: local, enclosing, global, built-in (LEGB).

This is all very theoretical, so let's see an example. In order to show you local and enclosing namespaces, I will have to define a few functions. Don't worry if you are not familiar with their syntax for the moment. We'll study functions in Chapter 4, Functions, the Building Blocks of Code. Just remember that in the following code, when you see def, it means I'm defining a function:

# scopes1.py
# Local versus Global

# we define a function, called local
def local():
    m = 7
    print(m)

m = 5
print(m)

# we call, or `execute` the function local
local()

In the preceding example, we define the same name m, both in the global scope and in the local one (the one defined by the local function). When we execute this program with the following command (have you activated your virtualenv?):

$ python scopes1.py

We see two numbers printed on the console: 5 and 7.

What happens is that the Python interpreter parses the file, top to bottom. First, it finds a couple of comment lines, which are skipped, then it parses the definition of the function local. When called, this function does two things: it binds the name m to an object representing the number 7 and prints it. The Python interpreter keeps going and it finds another name binding. This time the binding happens in the global scope and the value is 5. The next line is a call to the print function, which is executed (and so we get the first value printed on the console: 5).

After this, there is a call to the function local. At this point, Python executes the function, so at this time, the binding m = 7 happens and it's printed.

One very important thing to notice is that the part of the code that belongs to the definition of the local function is indented by four spaces on the right. Python, in fact, delimits blocks of code by indentation: the indented lines form the body of the function, and that body is the function's local scope. You walk into it by indenting, and walk out of it by unindenting. Some coders use two spaces, others three, but the suggested number of spaces to use is four. It's a good measure to maximize readability. We'll talk more about all the conventions you should embrace when writing Python code later.

What would happen if we removed that m = 7 line? Remember the LEGB rule.

Python would start looking for m in the local scope (function local), and, not finding it, it would go to the next enclosing scope. The next one, in this case, is the global one because there is no enclosing function wrapped around local. Therefore, we would see the number 5 printed twice on the console. Let's actually see what the code would look like:

# scopes2.py
# Local versus Global

def local():
    # m doesn't belong to the scope defined by the local function
    # so Python will keep looking into the next enclosing scope.
    # m is finally found in the global scope
    print(m, 'printing from the local scope')

m = 5
print(m, 'printing from the global scope')

local()

Running scopes2.py will print this:

$ python scopes2.py
5 printing from the global scope
5 printing from the local scope

As expected, Python prints m the first time, then when the function local is called, m isn't found in its scope, so Python looks for it following the LEGB chain until m is found in the global scope.

Let's see an example with an extra layer, the enclosing scope:

# scopes3.py
# Local, Enclosing and Global

def enclosing_func():
    m = 13

    def local():
        # m doesn't belong to the scope defined by the local
        # function so Python will keep looking into the next
        # enclosing scope. This time m is found in the enclosing
        # scope
        print(m, 'printing from the local scope')

    # calling the function local
    local()

m = 5
print(m, 'printing from the global scope')

enclosing_func()

Running scopes3.py will print on the console:

$ python scopes3.py
5 printing from the global scope
13 printing from the local scope

As you can see, the print instruction from the function local is referring to m as before. m is still not defined within the function itself, so Python starts walking scopes following the LEGB order. This time m is found in the enclosing scope.
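To round off the LEGB picture, here is a minimal sketch that touches all four levels in one go, including the built-in scope and the NameError you get when the chain is exhausted. The names label, result, message, and undefined_name are hypothetical, chosen only for this illustration.

```python
# A name bound at module level lives in the global scope.
label = 'global'


def enclosing_func():
    label = 'enclosing'  # shadows the global name inside this function

    def local():
        # label is not bound here, so it is found in the enclosing scope;
        # len is not bound in any of our scopes, so it comes from built-ins.
        return label, len(label)

    return local()


result = enclosing_func()
print(result)

try:
    undefined_name  # not bound in any of the four scopes
except NameError as err:
    message = str(err)
    print(message)
```

Notice that the global label is never touched: the inner function stops at the first scope in which the name is found.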

Don't worry if this is still not perfectly clear for now. It will come to you as we go through the examples in the book. The Classes section of the Python tutorial (https://docs.python.org/3/tutorial/classes.html) has an interesting paragraph about scopes and namespaces. Make sure you read it at some point if you want a deeper understanding of the subject.

Before we finish off this chapter, I would like to talk a bit more about objects. After all, basically everything in Python is an object, so I think they deserve a bit more attention.

Objects and classes

When I introduced objects previously in the A proper introduction section of the chapter, I said that we use them to represent real-life objects. For example, we sell goods of any kind on the web nowadays and we need to be able to handle, store, and represent them properly. But objects are actually so much more than that. Most of what you will ever do, in Python, has to do with manipulating objects.

So, without going into too much detail (we'll do that in Chapter 6, OOP, Decorators, and Iterators), I want to give you the in a nutshell kind of explanation about classes and objects.

We've already seen that objects are Python's abstraction for data. In fact, everything in Python is an object, including numbers, strings (data structures that hold text), containers, collections, even functions. You can think of them as if they were boxes with at least three features: an ID (unique), a type, and a value.

But how do they come to life? How do we create them? How do we write our own custom objects? The answer lies in one simple word: classes.

Objects are, in fact, instances of classes. The beauty of Python is that classes are objects themselves, but let's not go down this road. It leads to one of the most advanced concepts of this language: metaclasses. For now, the best way for you to get the difference between classes and objects is by means of an example.

Say a friend tells you, I bought a new bike! You immediately understand what she's talking about. Have you seen the bike? No. Do you know what color it is? Nope. The brand? Nope. Do you know anything about it? Nope. But at the same time, you know everything you need in order to understand what your friend meant when she told you she bought a new bike. You know that a bike has two wheels attached to a frame, a saddle, pedals, handlebars, brakes, and so on. In other words, even if you haven't seen the bike itself, you know the concept of bike. An abstract set of features and characteristics that together form something called bike.

In computer programming, that is called a class. It's that simple. Classes are used to create objects. In fact, objects are said to be instances of classes.

In other words, we all know what a bike is; we know the class. But then I have my own bike, which is an instance of the bike class. And my bike is an object with its own characteristics and methods. You have your own bike. Same class, but different instance. Every bike ever created in the world is an instance of the bike class.

Let's see an example. We will write a class that defines a bike and then we'll create two bikes, one red and one blue. I'll keep the code very simple, but don't fret if you don't understand everything about it; all you need to care about at this moment is to understand the difference between a class and an object (or instance of a class):

# bike.py
# let's define the class Bike
class Bike:

    def __init__(self, colour, frame_material):
        self.colour = colour
        self.frame_material = frame_material

    def brake(self):
        print("Braking!")

# let's create a couple of instances
red_bike = Bike('Red', 'Carbon fiber')
blue_bike = Bike('Blue', 'Steel')

# let's inspect the objects we have, instances of the Bike class.
print(red_bike.colour)  # prints: Red
print(red_bike.frame_material)  # prints: Carbon fiber
print(blue_bike.colour)  # prints: Blue
print(blue_bike.frame_material)  # prints: Steel

# let's brake!
red_bike.brake()  # prints: Braking!

I hope by now I don't need to tell you to run the file every time, right? The filename is indicated in the first line of the code block. Just run $ python filename, and you'll be fine. But remember to have your virtualenv activated!

So many interesting things to notice here. First things first; the definition of a class happens with the class statement. Whatever code comes after the class statement, and is indented, is called the body of the class. In our case, the last line that belongs to the class definition is the print("Braking!") one.

After having defined the class, we're ready to create instances. You can see that the class body hosts the definition of two methods. A method is basically (and simplistically) a function that belongs to a class.

The first method, __init__, is an initializer. It uses some Python magic to set up the objects with the values we pass when we create them.

Every method that has leading and trailing double underscores, in Python, is called a magic method. Magic methods are used by Python for a multitude of different purposes; hence it's never a good idea to name a custom method using two leading and trailing underscores. This naming convention is best left to Python.

The other method we defined, brake, is just an example of an additional method that we could call if we wanted to brake the bike. It contains just a call to print, of course; it's an example.

We created two bikes then. One has red color and a carbon fiber frame, and the other one has blue color and a steel frame. We pass those values upon creation. After creation, we print out the colour and the frame material of both bikes, just as an example. We also call the brake method of the red_bike.

One last thing to notice. You remember I told you that the set of attributes of an object is considered to be a namespace? I hope it's clearer what I meant now. You see that by getting to the frame_material attribute through different namespaces (red_bike, blue_bike), we obtain different values. No overlapping, no confusion.

The dot (.) operator is of course the means we use to walk into a namespace, in the case of objects as well.
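If you'd like to actually look inside those two namespaces, the built-in vars function returns the dictionary that holds an instance's attributes. The sketch below repeats a trimmed-down Bike class so that it can run on its own; red_namespace and blue_namespace are just illustrative names.

```python
class Bike:
    def __init__(self, colour, frame_material):
        self.colour = colour
        self.frame_material = frame_material


red_bike = Bike('Red', 'Carbon fiber')
blue_bike = Bike('Blue', 'Steel')

# vars() returns the mapping of attribute names to objects for each instance
red_namespace = vars(red_bike)
blue_namespace = vars(blue_bike)

print(red_namespace)
print(blue_namespace)
```

The same names (colour, frame_material) appear in both mappings, each tied to different objects, which is exactly the no-overlapping property described above.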

Guidelines on how to write good code

Writing good code is not as easy as it seems. As I already said before, good code exposes a long list of qualities that are quite hard to put together. Writing good code is, to some extent, an art. Regardless of where on the path you will be happy to settle, there is something that you can embrace which will make your code instantly better: PEP 8.

According to Wikipedia:

"Python's development is conducted largely through the Python Enhancement Proposal (PEP) process. The PEP process is the primary mechanism for proposing major new features, for collecting community input on an issue, and for documenting the design decisions that have gone into Python."

PEP 8 is perhaps the most famous of all PEPs. It lays out a simple but effective set of guidelines to define Python aesthetics so that we write beautiful Python code. If you take one suggestion out of this chapter, please let it be this: use it. Embrace it. You will thank me later.

Coding today is no longer a check-in/check-out business. Rather, it's more of a social effort. Several developers collaborate on a piece of code through tools such as Git and Mercurial, and the result is code that is fathered by many different hands.

Git and Mercurial are probably the distributed revision control systems that are most used today. They are essential tools designed to help teams of developers collaborate on the same software.

These days, more than ever, we need to have a consistent way of writing code, so that readability is maximized. When all developers of a company abide by PEP 8, it's not uncommon for any of them landing on a piece of code to think they wrote it themselves. It actually happens to me all the time (I always forget the code I write).

This has a tremendous advantage: when you read code that you could have written yourself, you read it easily. Without a convention, every coder would structure the code the way they like most, or simply the way they were taught or are used to, and this would mean having to interpret every line according to someone else's style. It would mean having to lose much more time just trying to understand it. Thanks to PEP 8, we can avoid this. I'm such a fan of it that I won't sign off a code review if the code doesn't respect it. So, please take the time to study it; it's very important.

In the examples in this book, I will try to respect it as much as I can. Unfortunately, I don't have the luxury of 79 characters (which is the maximum line length suggested by PEP 8), and I will have to cut down on blank lines and other things, but I promise you I'll try to lay out my code so that it's as readable as possible.

The Python culture

Python has been adopted widely in all coding industries. It's used by many different companies for many different purposes, and it's also used in education (it's an excellent language for that purpose, because of its many qualities and the fact that it's easy to learn).

One of the reasons Python is so popular today is that the community around it is vast, vibrant, and full of brilliant people. Many events are organized all over the world, mostly either around Python or its main web framework, Django.

Python is open, and very often so are the minds of those who embrace it. Check out the community page on the Python website for more information and get involved!

There is another aspect to Python which revolves around the notion of being Pythonic. It has to do with the fact that Python allows you to use some idioms that aren't found elsewhere, at least not in the same form or as easy to use (I feel quite claustrophobic when I have to code in a language which is not Python now).

Anyway, over the years, this concept of being Pythonic has emerged and, the way I understand it, is something along the lines of doing things the way they are supposed to be done in Python.

To help you understand a little bit more about Python's culture and about being Pythonic, I will show you the Zen of Python, a lovely Easter egg that is very popular. Open up a Python console and type import this. What follows is the result of this line:

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

There are two levels of reading here. One is to consider it as a set of guidelines that have been put down in a fun way. The other one is to keep it in mind, and maybe read it once in a while, trying to understand how it refers to something deeper: some Python characteristics that you will have to understand deeply in order to write Python the way it's supposed to be written. Start with the fun level, and then dig deeper. Always dig deeper.

A note on IDEs

Just a few words about IDEs. To follow the examples in this book, you don't need one; any text editor will do fine. If you want to have more advanced features, such as syntax coloring and auto completion, you will have to fetch yourself an IDE. You can find a comprehensive list of open source IDEs (just Google Python IDEs) on the Python website. I personally use Sublime Text editor. It's free to try out and it costs just a few dollars. I have tried many IDEs in my life, but this is the one that makes me most productive.

Two important pieces of advice:

Whatever IDE you choose to use, try to learn it well so that you can exploit its strengths, but don't depend on it. Exercise yourself to work with VIM (or any other text editor) once in a while; learn to be able to do some work on any platform, with any set of tools.
Whatever text editor/IDE you use, when it comes to writing Python, indentation is four spaces. Don't use tabs, don't mix them with spaces. Use four spaces, not two, not three, not five. Just use four. The whole world works like that, and you don't want to become an outcast because you were fond of the three-space layout.

Summary

In this chapter, we started to explore the world of programming and that of Python. We've barely scratched the surface, touching concepts that will be discussed later on in the book in greater detail.

We talked about Python's main features, who is using it and for what, and the different ways in which we can write a Python program.

In the last part of the chapter, we flew over the fundamental notions of namespaces, scopes, classes, and objects. We also saw how Python code can be organized using modules and packages.

On a practical level, we learned how to install Python on our system, how to make sure we have the tools we need, pip and virtualenv, and we also created and activated our first virtual environment. This will allow us to work in a self-contained environment without the risk of compromising the Python system installation.

Now you're ready to start this journey with me. All you need is enthusiasm, an activated virtual environment, this book, your fingers, and some coffee.

Try to follow the examples; I'll keep them simple and short. If you put them under your fingertips, you will retain them much better than if you just read them.

In the next chapter, we will explore Python's rich set of built-in data types. There's much to cover and much to learn!

Built-in Data Types

"Data! Data! Data!" he cried impatiently. "I can't make bricks without clay."

– Sherlock Holmes – The Adventure of the Copper Beeches

Everything you do with a computer is managing data. Data comes in many different shapes and flavors. It's the music you listen to, the movies you stream, the PDFs you open. Even the source of the chapter you're reading at this very moment is just a file, which is data.

Data can be simple, an integer number to represent an age, or complex, like an order placed on a website. It can be about a single object or about a collection of them. Data can even be about data, that is, metadata. Data that describes the design of other data structures or data that describes application data or its context. In Python, objects are an abstraction for data, and Python has an amazing variety of data structures that you can use to represent data, or combine them to create your own custom data.

In this chapter, we are going to cover the following:

Python objects' structures
Mutability and immutability
Built-in data types: numbers, strings, sequences, collections, and mapping types
The collections module
Enumerations

Everything is an object

Before we delve into the specifics, I want you to be very clear about objects in Python, so let's talk a little bit more about them. As we already said, everything in Python is an object. But what really happens when you type an instruction like age = 42 in a Python module?

If you go to http://pythontutor.com/, you can type that instruction into a text box and get its visual representation. Keep this website in mind; it's very useful to consolidate your understanding of what goes on behind the scenes.

So, what happens is that an object is created. It gets an id, the type is set to int (integer number), and the value to 42. A name age is placed in the global namespace, pointing to that object. Therefore, whenever we are in the global namespace, after the execution of that line, we can retrieve that object by simply accessing it through its name: age.

If you were to move house, you would put all the knives, forks, and spoons in a box and label it cutlery. Can you see it's exactly the same concept? Here's a screenshot of what it may look like (you may have to tweak the settings to get to the same view):

So, for the rest of this chapter, whenever you read something such as name = some_value, think of a name placed in the namespace that is tied to the scope in which the instruction was written, with a nice arrow pointing to an object that has an id, a type, and a value. There is a little bit more to say about this mechanism, but it's much easier to talk about it over an example, so we'll get back to this later.

Mutable or immutable? That is the question

A first fundamental distinction that Python makes on data is about whether or not the value of an object changes. If the value can change, the object is called mutable, while if the value cannot change, the object is called immutable.

It is very important that you understand the distinction between mutable and immutable because it affects the code you write, so here's a question:

>>> age = 42
>>> age
42
>>> age = 43  #A
>>> age
43

In the preceding code, on the line #A, have I changed the value of age? Well, no. But now it's 43 (I hear you say...). Yes, it's 43, but 42 was an integer number, of the type int, which is immutable. So, what happened is really that on the first line, age is a name that is set to point to an int object, whose value is 42. When we type age = 43, what happens is that another object is created, of the type int and value 43 (also, the id will be different), and the name age is set to point to it. So, we didn't change that 42 to 43. We actually just pointed age to a different location: the new int object whose value is 43. Let's see the same code also printing the IDs:

>>> age = 42
>>> id(age)
4377553168
>>> age = 43
>>> id(age)
4377553200

Notice that we print the IDs by calling the built-in id function. As you can see, they are different, as expected. Bear in mind that age points to one object at a time: 42 first, then 43. Never together.
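The actual ID values will differ on your machine, so here is a minimal sketch that checks the idea without relying on specific numbers. The names first_id, second_id, and alias are hypothetical, introduced only for this illustration.

```python
age = 42
first_id = id(age)
alias = age          # a second name pointing to the same int object

age = 43             # rebinds the name age; the object 42 is untouched
second_id = id(age)

print(first_id != second_id)  # two distinct objects
print(alias)                  # the other name still points to 42
```

Rebinding age had no effect on alias, which is what you would expect if the assignment moved the name rather than changing the object.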

Now, let's see the same example using a mutable object. For this example, let's just use a Person object, that has a property age (don't worry about the class declaration for now; it's there only for completeness):

>>> class Person():
...     def __init__(self, age):
...         self.age = age
...
>>> fab = Person(age=42)
>>> fab.age
42
>>> id(fab)
4380878496
>>> id(fab.age)
4377553168
>>> fab.age = 25  # I wish!
>>> id(fab)  # will be the same
4380878496
>>> id(fab.age)  # will be different
4377552624

In this case, I set up an object fab whose type is Person (a custom class). On creation, the object is given the age of 42. I'm printing it, along with the object id, and the ID of age as well. Notice that, even after I change age to be 25, the ID of fab stays the same (while the ID of age has changed, of course). Custom objects in Python are mutable (unless you code them not to be). Keep this concept in mind; it's very important. I'll remind you about it throughout the rest of the chapter.
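The same behaviour shows up with built-in containers. As a hedged preview (lists and tuples are covered properly later in the chapter), the sketch below mutates a list in place and then tries, and fails, to do the same to a tuple. All the names here are illustrative.

```python
numbers = [1, 2, 3]
alias = numbers            # both names refer to the same list object
id_before = id(numbers)

numbers.append(4)          # mutates the list in place
id_after = id(numbers)

print(id_before == id_after)   # the identity is unchanged
print(alias)                   # the alias sees the change too

point = (1, 2)
try:
    point[0] = 10          # tuples are immutable, so this raises
except TypeError as err:
    error_message = str(err)
    print(error_message)
```

The practical consequence is that any name pointing to a mutable object will observe changes made through any other name, which is worth keeping in mind when you pass such objects around.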

Numbers

Let's start by exploring Python's built-in data types for numbers. Python was designed by a man with a master's degree in mathematics and computer science, so it's only logical that it has amazing support for numbers.

Numbers are immutable objects.

Integers

Python integers have an unlimited range, subject only to the available virtual memory. This means that it doesn't really matter how big a number you want to store is: as long as it can fit in your computer's memory, Python will take care of it. Integer numbers can be positive, negative, and 0 (zero). They support all the basic mathematical operations, as shown in the following example:

>>> a = 14
>>> b = 3
>>> a + b  # addition
17
>>> a - b  # subtraction
11
>>> a * b  # multiplication
42
>>> a / b  # true division
4.666666666666667
>>> a // b  # integer division
4
>>> a % b  # modulo operation (remainder of division)
2
>>> a ** b  # power operation
2744

The preceding code should be easy to understand. Just notice one important thing: Python has two division operators, one performs the so-called true division (/), which returns the quotient of the operands, and the other one, the so-called integer division (//), which returns the floored quotient of the operands. It might be worth noting that in Python 2 the division operator / behaves differently than in Python 3. See how the two operators differ for positive and negative numbers:

>>> 7 / 4  # true division
1.75
>>> 7 // 4  # integer division, flooring returns 1
1
>>> -7 / 4  # true division again, result is opposite of previous
-1.75
>>> -7 // 4  # integer div., result not the opposite of previous
-2

This is an interesting example. If you were expecting a -1 on the last line, don't feel bad, it's just the way Python works. The result of an integer division in Python is always rounded towards minus infinity. If, instead of flooring, you want to truncate a number to an integer, you can use the built-in int function, as shown in the following example:

>>> int(1.75)
1
>>> int(-1.75)
-1

Notice that the truncation is done toward 0.
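To put flooring and truncation side by side, here is a minimal sketch using the standard math module and the built-in divmod function. The variable names are mine, chosen for illustration.

```python
import math

floored = -7 // 4                    # floor division rounds toward minus infinity
truncated = int(-7 / 4)              # int() truncates toward zero
also_floored = math.floor(-1.75)     # explicit floor of a float
also_truncated = math.trunc(-1.75)   # explicit truncation of a float

# divmod returns the floored quotient and the remainder together,
# and they always satisfy: a == quotient * b + remainder
quotient, remainder = divmod(-7, 4)

print(floored, truncated, also_floored, also_truncated)
print(quotient, remainder)
```

Because the quotient is floored, the remainder of -7 divided by 4 comes out positive, so the identity in the comment still holds.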

There is also an operator to calculate the remainder of a division. It's called the modulo operator, and it's represented by a percent sign (%):

>>> 10 % 3  # remainder of the division 10 // 3
1
>>> 10 % 4  # remainder of the division 10 // 4
2

One nice feature introduced in Python 3.6 is the ability to add underscores within number literals (between digits or base specifiers, but not leading or trailing). The purpose is to help make some numbers more readable, like for example 1_000_000_000:

>>> n = 1_024
>>> n
1024
>>> hex_n = 0x_4_0_0  # 0x400 == 1024
>>> hex_n
1024

Booleans

Boolean algebra is that subset of algebra in which the values of the variables are the truth values: true and false. In Python, True and False are two keywords that are used to represent truth values. Booleans are a subclass of integers, and behave respectively like 1 and 0. The equivalent of the int class for Booleans is the bool class, which returns either True or False. Every built-in Python object has a value in the Boolean context, which means they basically evaluate to either True or False when fed to the bool function. We'll see all about this in Chapter 3, Iterating and Making Decisions.

Boolean values can be combined in Boolean expressions using the logical operators and, or, and not. Again, we'll see them in full in the next chapter, so for now let's just see a simple example:

>>> int(True)  # True behaves like 1
1
>>> int(False)  # False behaves like 0
0
>>> bool(1)  # 1 evaluates to True in a boolean context
True
>>> bool(-42)  # and so does every non-zero number
True
>>> bool(0)  # 0 evaluates to False
False
>>> # quick peek at the operators (and, or, not)
>>> not True
False
>>> not False
True
>>> True and True
True
>>> False or True
True

You can see that True and False belong to a subclass of integers when you try to add them. Python upcasts them to integers and performs the addition:

>>> 1 + True
2
>>> False + 42
42
>>> 7 - True
6

Upcasting is a type conversion operation that goes from a subclass to its parent. In the example presented here, True and False, which belong to a class derived from the integer class, are converted back to integers when needed. This topic is about inheritance and will be explained in detail in Chapter 6, OOP, Decorators, and Iterators.
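A minimal sketch to verify the subclass relationship directly, and to show one place where it comes in handy. The ages list and the other names are made-up illustration data, and the generator expression is syntax you will meet properly in later chapters.

```python
is_subclass = issubclass(bool, int)       # bool derives from int
is_int_instance = isinstance(True, int)   # so True is also an int

# Because True behaves like 1, summing Booleans counts the True values.
ages = [15, 22, 37, 12]
adults = sum(age >= 18 for age in ages)

# Truth value testing works for other built-in objects too:
# empty containers and empty strings are False, non-empty ones are True.
truthiness = [bool(''), bool('text'), bool([]), bool([0])]

print(is_subclass, is_int_instance, adults, truthiness)
```

Note that bool([0]) is True: the list contains one element, and it is the emptiness of the container that matters, not the truth value of what is inside it.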

Real numbers

Real numbers, or floating point numbers, are represented in Python according to the IEEE 754 double-precision binary floating-point format, which is stored in 64 bits of information divided into three sections: sign, exponent, and mantissa.

Quench your thirst for knowledge about this format on Wikipedia: http://en.wikipedia.org/wiki/Double-precision_floating-point_format.

Usually, programming languages give coders two different formats: single and double precision. The former takes up 32 bits of memory, and the latter 64. Python supports only the double format. Let's see a simple example:

>>> pi = 3.1415926536  # how many digits of PI can you remember?
>>> radius = 4.5
>>> area = pi * (radius ** 2)
>>> area
63.617251235400005

In the calculation of the area, I wrapped the radius ** 2 within parentheses. Even though that wasn't necessary because the power operator has higher precedence than the multiplication one, I think the formula reads more easily like that. Moreover, should you get a slightly different result for the area, don't worry. It might depend on your OS, how Python was compiled, and so on. As long as the first few decimal digits are correct, you know it's a correct result.

The sys.float_info struct sequence holds information about how floating point numbers will behave on your system. This is what I see on my box:

>>> import sys
>>> sys.float_info
sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308,
min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53,
epsilon=2.220446049250313e-16, radix=2, rounds=1)

Let's make a few considerations here: we have 64 bits to represent float numbers. This means we can represent at most 2 ** 64 == 18,446,744,073,709,551,616 numbers with that amount of bits. Take a look at the max and epsilon values for the float numbers, and you'll realize it's impossible to represent them all. There is just not enough space, so they are approximated to the closest representable number. You probably think that only extremely big or extremely small numbers suffer from this issue. Well, think again and try the following in your console:

>>> 0.3 - 0.1 * 3  # this should be 0!!!
-5.551115123125783e-17

What does this tell you? It tells you that double precision numbers suffer from approximation issues even when it comes to simple numbers like 0.1 or 0.3. Why is this important? It can be a big problem if you're handling prices, or financial calculations, or any kind of data that must not be approximated. Don't worry, Python gives you the decimal type, which doesn't suffer from these issues; we'll see it in a moment.
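When you do have to compare floats, a common approach is to test for closeness rather than equality. The sketch below uses math.isclose from the standard library; the tolerance of 1e-12 is an arbitrary value I picked for illustration, not a recommendation.

```python
import math

difference = 0.3 - 0.1 * 3

exactly_equal = (0.1 * 3 == 0.3)            # direct comparison fails
close_enough = math.isclose(0.1 * 3, 0.3)   # relative tolerance, default 1e-09

# A relative tolerance is of no use when comparing against zero,
# so an absolute tolerance has to be supplied explicitly.
near_zero = math.isclose(difference, 0.0, abs_tol=1e-12)

print(exactly_equal, close_enough, near_zero)
```

This doesn't remove the approximation, it only makes the comparison tolerant of it, which is why it's not a substitute for a decimal type when exactness matters.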

Complex numbers

Python gives you complex numbers support out of the box. If you don't know what complex numbers are, they are numbers that can be expressed in the form a + ib where a and b are real numbers, and i (or j if you're an engineer) is the imaginary unit, that is, the square root of -1. a and b are called, respectively, the real and imaginary part of the number.

It's actually unlikely you'll be using them, unless you're coding something scientific. Let's see a small example:

>>> c = 3.14 + 2.73j
>>> c.real  # real part
3.14
>>> c.imag  # imaginary part
2.73
>>> c.conjugate()  # conjugate of A + Bj is A - Bj
(3.14-2.73j)
>>> c * 2  # multiplication is allowed
(6.28+5.46j)
>>> c ** 2  # power operation as well
(2.4067000000000007+17.1444j)
>>> d = 1 + 1j  # addition and subtraction as well
>>> c - d
(2.14+1.73j)

Fractions and decimals

Let's finish the tour of the number department with a look at fractions and decimals. Fractions hold a rational numerator and denominator in their lowest forms. Let's see a quick example:

>>> from fractions import Fraction
>>> Fraction(10, 6)  # mad hatter?
Fraction(5, 3)  # notice it's been simplified
>>> Fraction(1, 3) + Fraction(2, 3)  # 1/3 + 2/3 == 3/3 == 1/1
Fraction(1, 1)
>>> f = Fraction(10, 6)
>>> f.numerator
5
>>> f.denominator
3

Although they can be very useful at times, it's not that common to spot them in commercial software. It is much more common to see decimal numbers being used in all those contexts where precision is everything; for example, in scientific and financial calculations.

It's important to remember that arbitrary precision decimal numbers come at a price in performance, of course. The amount of data to be stored for each number is far greater than it is for fractions or floats, and the way they are handled causes the Python interpreter much more work behind the scenes. Another interesting thing to note is that you can get and set the precision by accessing decimal.getcontext().prec.
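A minimal sketch of reading and changing the precision. I'm using decimal.localcontext here, which is a standard library helper that applies the change only inside the with block; the precision of 5 digits is an arbitrary value for illustration.

```python
from decimal import Decimal, getcontext, localcontext

default_precision = getcontext().prec   # 28 significant digits unless changed

# localcontext() works on a temporary copy of the current context, so the
# change below does not leak into the rest of the program.
with localcontext() as ctx:
    ctx.prec = 5
    one_third_short = Decimal(1) / Decimal(3)

one_third_default = Decimal(1) / Decimal(3)

print(default_precision)
print(one_third_short)
print(one_third_default)
```

Setting decimal.getcontext().prec directly works too, but it changes the precision for all subsequent calculations in the current thread, so the temporary context is often the safer choice.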

Let'sseeaquickexamplewithdecimalnumbers:

>>>fromdecimalimportDecimalasD#renameforbrevity

>>>D(3.14)#pi,fromfloat,soapproximationissues

Decimal('3.140000000000000124344978758017532527446746826171875')

>>>D('3.14')#pi,fromastring,sonoapproximationissues

Decimal('3.14')

>>>D(0.1)*D(3)-D(0.3)#fromfloat,westillhavetheissue

Decimal('2.775557561565156540423631668E-17')

>>>D('0.1')*D(3)-D('0.3')#fromstring,allperfect

Decimal('0.0')

>>>D('1.4').as_integer_ratio()#7/5=1.4(isn'tthiscool?!)

(7,5)

Notice that when we construct a Decimal number from a float, it takes on all the approximation issues the float may come with. On the other hand, when the Decimal has no approximation issues (for example, when we feed an int or a string representation to the constructor), then the calculation has no quirky behavior.

When it comes to money, use decimals.
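As a rough sketch of what using decimals for money can look like, here is a small invoice calculation. The price and tax rate are made-up values, and rounding to cents with ROUND_HALF_UP is my assumption, not a rule from this chapter. The second part shows one way of changing the precision locally:

```python
from decimal import Decimal, ROUND_HALF_UP, localcontext

price = Decimal("19.99")      # hypothetical unit price
tax_rate = Decimal("0.0825")  # hypothetical tax rate

subtotal = price * 3          # exact: 59.97
# Round the tax to two decimal places (cents).
tax = (subtotal * tax_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
total = subtotal + tax

# Precision applies to arithmetic results; here we lower it temporarily.
with localcontext() as ctx:
    ctx.prec = 5
    third = Decimal(1) / Decimal(3)

print(subtotal, tax, total, third)
```

Using localcontext keeps the change of precision confined to the with block, so the rest of the program is unaffected.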

This concludes our introduction to built-in numeric types. Let's now look at sequences.

Immutable sequences

Let's start with immutable sequences: strings, tuples, and bytes.

Strings and bytes

Textual data in Python is handled with str objects, more commonly known as strings. They are immutable sequences of Unicode code points. Unicode code points can represent a character, but can also have other meanings, such as formatting data, for example. Python, unlike many other languages, doesn't have a char type, so a single character is rendered simply by a string of length 1.

Unicode is an excellent way to handle data, and should be used for the internals of any application. When it comes to storing textual data though, or sending it on the network, you may want to encode it, using an appropriate encoding for the medium you're using. The result of an encoding is a bytes object, whose syntax and behavior are similar to those of strings. String literals are written in Python using single, double, or triple quotes (both single or double). If built with triple quotes, a string can span multiple lines. An example will clarify this:

>>> # 4 ways to make a string
>>> str1 = 'This is a string. We built it with single quotes.'
>>> str2 = "This is also a string, but built with double quotes."
>>> str3 = '''This is built using triple quotes,
... so it can span multiple lines.'''
>>> str4 = """This too
... is a multiline one
... built with triple double-quotes."""
>>> str4  #A
'This too\nis a multiline one\nbuilt with triple double-quotes.'
>>> print(str4)  #B
This too
is a multiline one
built with triple double-quotes.

In #A and #B, we print str4, first implicitly, and then explicitly, using the print function. A nice exercise would be to find out why they are different. Are you up to the challenge? (hint: look up the str function.)

Strings, like any sequence, have a length. You can get this by calling the len function:

>>> len(str1)
49

Encoding and decoding strings

Using the encode/decode methods, we can encode Unicode strings and decode bytes objects. UTF-8 is a variable length character encoding, capable of encoding all possible Unicode code points. It is the dominant encoding for the web. Notice also that by adding a literal b in front of a string declaration, we're creating a bytes object:

>>> s = "This is üŋíc0de"  # unicode string: code points
>>> type(s)
<class 'str'>
>>> encoded_s = s.encode('utf-8')  # utf-8 encoded version of s
>>> encoded_s
b'This is \xc3\xbc\xc5\x8b\xc3\xadc0de'  # result: bytes object
>>> type(encoded_s)  # another way to verify it
<class 'bytes'>
>>> encoded_s.decode('utf-8')  # let's revert to the original
'This is üŋíc0de'
>>> bytes_obj = b"A bytes object"  # a bytes object
>>> type(bytes_obj)
<class 'bytes'>
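One consequence of UTF-8 being a variable length encoding is that the number of code points in a string and the number of bytes in its encoded form can differ. The following is a small sketch of that, plus what may happen when the target encoding cannot represent a character. The variable names are illustrative only:

```python
text = "üŋíc0de"
data = text.encode("utf-8")

code_points = len(text)  # 7 code points
byte_count = len(data)   # 10 bytes: the three accented letters take 2 bytes each

# ASCII cannot represent the first three characters.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# The errors argument trades the exception for a lossy result.
lossy = text.encode("ascii", errors="replace")

round_trip = data.decode("utf-8") == text
print(code_points, byte_count, ascii_ok, lossy, round_trip)
```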

Indexing and slicing strings

When manipulating sequences, it's very common to have to access them at one precise position (indexing), or to get a subsequence out of them (slicing). When dealing with immutable sequences, both operations are read-only.

While indexing comes in one form, a zero-based access to any position within the sequence, slicing comes in different forms. When you get a slice of a sequence, you can specify the start and stop positions, and the step. They are separated with a colon (:) like this: my_sequence[start:stop:step]. All the arguments are optional, start is inclusive, and stop is exclusive. It's much easier to show an example, rather than explain them further in words:

>>> s = "The trouble is you think you have time."
>>> s[0]  # indexing at position 0, which is the first char
'T'
>>> s[5]  # indexing at position 5, which is the sixth char
'r'
>>> s[:4]  # slicing, we specify only the stop position
'The '
>>> s[4:]  # slicing, we specify only the start position
'trouble is you think you have time.'
>>> s[2:14]  # slicing, both start and stop positions
'e trouble is'
>>> s[2:14:3]  # slicing, start, stop and step (every 3 chars)
'erb '
>>> s[:]  # quick way of making a copy
'The trouble is you think you have time.'

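Of all the lines, the last one is probably the most interesting. If you don't specify a parameter, Python will fill in the default for you. In this case, start will be the start of the string, stop will be the end of the string, and step will be the default 1. This is an easy and quick way of obtaining a copy of a sequence. For a mutable sequence, such as a list, you get a new object with the same value; for an immutable string, the interpreter may simply hand back the same object, which is harmless since it cannot be changed anyway. Can you find a way to get the reversed copy of a string using slicing (don't look it up; find it for yourself)?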
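Two more details are worth a quick sketch: negative positions count from the end of the sequence, and slices are forgiving about out-of-range positions while plain indexing is not. The variable names below are just for illustration:

```python
s = "The trouble is you think you have time."

last_char = s[-1]      # negative indexes count from the end
last_word = s[-5:]     # the last five characters
second_word = s[4:11]  # start inclusive, stop exclusive

# A slice can be stored in a slice object and reused.
word = slice(4, 11)
same_word = s[word]

# Slicing past the end gives an empty string...
beyond = s[100:]

# ...while indexing past the end raises IndexError.
try:
    s[100]
    index_ok = True
except IndexError:
    index_ok = False

print(last_char, last_word, second_word, repr(beyond), index_ok)
```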

String formatting

One of the features strings have is the ability to be used as a template. There are several different ways of formatting a string, and for the full list of possibilities, I encourage you to look up the documentation. Here are some common examples:

>>> greet_old = 'Hello %s!'
>>> greet_old % 'Fabrizio'
'Hello Fabrizio!'
>>> greet_positional = 'Hello {} {}!'
>>> greet_positional.format('Fabrizio', 'Romano')
'Hello Fabrizio Romano!'
>>> greet_positional_idx = 'This is {0}! {1} loves {0}!'
>>> greet_positional_idx.format('Python', 'Fabrizio')
'This is Python! Fabrizio loves Python!'
>>> greet_positional_idx.format('Coffee', 'Fab')
'This is Coffee! Fab loves Coffee!'
>>> keyword = 'Hello, my name is {name} {last_name}'
>>> keyword.format(name='Fabrizio', last_name='Romano')
'Hello, my name is Fabrizio Romano'

In the previous example, you can see four different ways of formatting strings. The first one, which relies on the % operator, is the oldest style; it is still supported, but it is generally discouraged in new code. The more modern way to format a string is by using the format string method. You can see, from the different examples, that a pair of curly braces acts as a placeholder within the string. When we call format, we feed it data that replaces the placeholders. We can specify indexes (and much more) within the curly braces, and even names, which implies we'll have to call format using keyword arguments instead of positional ones.

Notice how greet_positional_idx is rendered differently by feeding different data to the call to format. Apparently, I'm into Python and coffee... big surprise!

One last feature I want to show you is a relatively new addition to Python (Version 3.6) and it's called formatted string literals. This feature is quite cool: strings are prefixed with f, and contain replacement fields surrounded by curly braces. Replacement fields are expressions evaluated at runtime, and then formatted using the format protocol:

>>> name = 'Fab'
>>> age = 42
>>> f"Hello! My name is {name} and I'm {age}"
"Hello! My name is Fab and I'm 42"
>>> from math import pi
>>> f"No arguing with {pi}, it's irrational..."
"No arguing with 3.141592653589793, it's irrational..."

Check out the official documentation to learn everything about string formatting and how powerful it can be.
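To give you a taste of the "much more" that can go inside the curly braces, here is a short sketch using format specifications. They work the same way in format calls and in formatted string literals. The values are arbitrary examples of mine:

```python
from math import pi

two_decimals = f"{pi:.2f}"           # fixed-point, two decimal places
thousands = f"{1234567:,}"           # comma as thousands separator
padded = f"{'Fab':>6}|{'Fab':<6}|"   # right- and left-aligned in 6 columns
percent = "{:.1%}".format(0.256)     # the same mini-language with format()

print(two_decimals, thousands, padded, percent)
```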

Tuples

The last immutable sequence type we're going to see is the tuple. A tuple is a sequence of arbitrary Python objects. In a tuple, items are separated by commas. They are used everywhere in Python, because they allow for patterns that are hard to reproduce in other languages. Sometimes tuples are used implicitly; for example, to set up multiple variables on one line, or to allow a function to return multiple different objects (in many other languages, a function usually returns one object only), and even in the Python console, you can use tuples implicitly to print multiple elements with one single instruction. We'll see examples for all these cases:

>>> t = ()  # empty tuple
>>> type(t)
<class 'tuple'>
>>> one_element_tuple = (42, )  # you need the comma!
>>> three_elements_tuple = (1, 3, 5)  # parentheses are optional here
>>> a, b, c = 1, 2, 3  # tuple for multiple assignment
>>> a, b, c  # implicit tuple to print with one instruction
(1, 2, 3)
>>> 3 in three_elements_tuple  # membership test
True

Notice that the membership operator in can also be used with lists, strings, dictionaries, and, in general, with collection and sequence objects.

Notice that to create a tuple with one item, we need to put that comma after the item. The reason is that without the comma that item is just itself wrapped in parentheses, kind of in a redundant mathematical expression. Notice also that on assignment, parentheses are optional, so my_tuple = 1, 2, 3 is the same as my_tuple = (1, 2, 3).

One thing that tuple assignment allows us to do is one-line swaps, with no need for a third temporary variable. Let's see first a more traditional way of doing it:

>>> a, b = 1, 2
>>> c = a  # we need three lines and a temporary var c
>>> a = b
>>> b = c
>>> a, b  # a and b have been swapped
(2, 1)

And now let's see how we would do it in Python:

>>> a, b = 0, 1
>>> a, b = b, a  # this is the Pythonic way to do it
>>> a, b
(1, 0)

Take a look at the line that shows you the Pythonic way of swapping two values. Do you remember what I wrote in Chapter 1, A Gentle Introduction to Python? A Python program is typically one-fifth to one-third the size of equivalent Java or C++ code, and features like one-line swaps contribute to this. Python is elegant, where elegance in this context also means economy.

Because they are immutable, tuples can be used as keys for dictionaries (we'll see this shortly). To me, tuples are Python's built-in data type that most closely represents a mathematical vector. This doesn't mean that this was the reason for which they were created though. Tuples usually contain a heterogeneous sequence of elements, while on the other hand, lists are most of the time homogeneous. Moreover, tuples are normally accessed via unpacking or indexing, while lists are usually iterated over.
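Here is a minimal sketch of two of the patterns just mentioned: a function that returns several objects at once by means of an implicit tuple, and a tuple used as a dictionary key. The function min_max and the grid dictionary are hypothetical names I made up for the illustration:

```python
def min_max(values):
    # The comma builds a tuple, so two objects travel back together.
    return min(values), max(values)

# Unpacking assigns each element of the returned tuple to a name.
low, high = min_max([3, 1, 4, 1, 5])

# Tuples are immutable (and hashable when their items are),
# so they can be used as dictionary keys.
grid = {(0, 0): "origin", (1, 2): "treasure"}
found = grid[(1, 2)]

# Trying to change a tuple in place fails.
point = (1, 2)
try:
    point[0] = 5
    changed = True
except TypeError:
    changed = False

print(low, high, found, changed)
```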

Mutable sequences

Mutable sequences differ from their immutable sisters in that they can be changed after creation. There are two mutable sequence types in Python: lists and byte arrays. I said before that the dictionary is the king of data structures in Python. I guess this makes the list its rightful queen.

Lists

Python lists are mutable sequences. They are very similar to tuples, but they don't have the restrictions of immutability. Lists are commonly used to store collections of homogeneous objects, but there is nothing preventing you from storing heterogeneous collections as well. Lists can be created in many different ways. Let's see an example:

>>> []  # empty list
[]
>>> list()  # same as []
[]
>>> [1, 2, 3]  # as with tuples, items are comma separated
[1, 2, 3]
>>> [x + 5 for x in [2, 3, 4]]  # Python is magic
[7, 8, 9]
>>> list((1, 3, 5, 7, 9))  # list from a tuple
[1, 3, 5, 7, 9]
>>> list('hello')  # list from a string
['h', 'e', 'l', 'l', 'o']

In the previous example, I showed you how to create a list using different techniques. I would like you to take a good look at the line that says Python is magic, which I am not expecting you to fully understand at this point (unless you cheated and you're not a novice!). That is called a list comprehension, a very powerful functional feature of Python, which we'll see in detail in Chapter 5, Saving Time and Memory. I just wanted to make your mouth water at this point.

Creating lists is good, but the real fun comes when we use them, so let's see the main methods they gift us with:

>>> a = [1, 2, 1, 3]
>>> a.append(13)  # we can append anything at the end
>>> a
[1, 2, 1, 3, 13]
>>> a.count(1)  # how many `1` are there in the list?
2
>>> a.extend([5, 7])  # extend the list by another (or sequence)
>>> a
[1, 2, 1, 3, 13, 5, 7]
>>> a.index(13)  # position of `13` in the list (0-based indexing)
4
>>> a.insert(0, 17)  # insert `17` at position 0
>>> a
[17, 1, 2, 1, 3, 13, 5, 7]
>>> a.pop()  # pop (remove and return) last element
7
>>> a.pop(3)  # pop element at position 3
1
>>> a
[17, 1, 2, 3, 13, 5]
>>> a.remove(17)  # remove `17` from the list
>>> a
[1, 2, 3, 13, 5]
>>> a.reverse()  # reverse the order of the elements in the list
>>> a
[5, 13, 3, 2, 1]
>>> a.sort()  # sort the list
>>> a
[1, 2, 3, 5, 13]
>>> a.clear()  # remove all elements from the list
>>> a
[]

The preceding code gives you a roundup of a list's main methods. I want to show you how powerful they are, using extend as an example. You can extend lists using any sequence type:

>>> a = list('hello')  # makes a list from a string
>>> a
['h', 'e', 'l', 'l', 'o']
>>> a.append(100)  # append 100, heterogeneous type
>>> a
['h', 'e', 'l', 'l', 'o', 100]
>>> a.extend((1, 2, 3))  # extend using tuple
>>> a
['h', 'e', 'l', 'l', 'o', 100, 1, 2, 3]
>>> a.extend('...')  # extend using string
>>> a
['h', 'e', 'l', 'l', 'o', 100, 1, 2, 3, '.', '.', '.']

Now, let's see what the most common operations you can do with lists are:

>>> a = [1, 3, 5, 7]
>>> min(a)  # minimum value in the list
1
>>> max(a)  # maximum value in the list
7
>>> sum(a)  # sum of all values in the list
16
>>> len(a)  # number of elements in the list
4
>>> b = [6, 7, 8]
>>> a + b  # `+` with list means concatenation
[1, 3, 5, 7, 6, 7, 8]
>>> a * 2  # `*` has also a special meaning
[1, 3, 5, 7, 1, 3, 5, 7]

The last two lines in the preceding code are quite interesting because they introduce us to a concept called operator overloading. In short, it means that operators such as +, -, *, %, and so on, may represent different operations according to the context they are used in. It doesn't make any sense to sum two lists, right? Therefore, the + sign is used to concatenate them. Similarly, the * sign is used to concatenate the list to itself according to the right operand.
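One thing worth knowing about * on lists is that it repeats references to the same items rather than copying them. With numbers this makes no difference, but with nested lists it can surprise you. The following sketch (my own example, using a list comprehension as the alternative) shows the effect:

```python
# Both rows are the very same inner list, repeated twice.
shared = [[0] * 3] * 2
shared[0][0] = 1  # changing one row shows up in the other

# Building each row separately gives independent inner lists.
independent = [[0] * 3 for _ in range(2)]
independent[0][0] = 1

print(shared)       # both rows changed
print(independent)  # only the first row changed
```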

Now, let's take a step further and see something a little more interesting. I want to show you how powerful the sorted function can be and how easy it is in Python to achieve results that require a great deal of effort in other languages:

>>> from operator import itemgetter
>>> a = [(5, 3), (1, 3), (1, 2), (2, -1), (4, 9)]
>>> sorted(a)
[(1, 2), (1, 3), (2, -1), (4, 9), (5, 3)]
>>> sorted(a, key=itemgetter(0))
[(1, 3), (1, 2), (2, -1), (4, 9), (5, 3)]
>>> sorted(a, key=itemgetter(0, 1))
[(1, 2), (1, 3), (2, -1), (4, 9), (5, 3)]
>>> sorted(a, key=itemgetter(1))
[(2, -1), (1, 2), (5, 3), (1, 3), (4, 9)]
>>> sorted(a, key=itemgetter(1), reverse=True)
[(4, 9), (5, 3), (1, 3), (1, 2), (2, -1)]

The preceding code deserves a little explanation. First of all, a is a list of tuples. This means each element in a is a tuple (a 2-tuple, to be precise). When we call sorted(some_list), we get a sorted version of some_list. In this case, the sorting on a 2-tuple works by sorting them on the first item in the tuple, and on the second when the first one is the same. You can see this behavior in the result of sorted(a), which yields [(1, 2), (1, 3), ...]. Python also gives us the ability to control which element(s) of the tuple the sorting must be run against. Notice that when we instruct the sorted function to work on the first element of each tuple (by key=itemgetter(0)), the result is different: [(1, 3), (1, 2), ...]. The sorting is done only on the first element of each tuple (which is the one at position 0). If we want to replicate the default behavior of a simple sorted(a) call, we need to use key=itemgetter(0, 1), which tells Python to sort first on the elements at position 0 within the tuples, and then on those at position 1. Compare the results and you'll see they match.

For completeness, I included an example of sorting only on the elements at position 1, and the same but in reverse order. If you have ever seen sorting in Java, I expect you to be quite impressed at this moment.

The Python sorting algorithm is very powerful, and it was written by Tim Peters (we've already seen this name, can you recall when?). It is aptly named Timsort, and it is a blend between merge and insertion sort and has better time performance than most other algorithms used for mainstream programming languages. Timsort is a stable sorting algorithm, which means that when multiple records have the same key, their original order is preserved. We've seen this in the result of sorted(a, key=itemgetter(0)), which has yielded [(1, 3), (1, 2), ...], in which the order of those two tuples has been preserved because they have the same value at position 0.
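Stability is not just a curiosity: it lets you sort on several keys by sorting in passes, from the least important key to the most important one. Here is a small sketch of that idea, using made-up (name, age) records:

```python
from operator import itemgetter

# Hypothetical (name, age) records, invented for this sketch.
people = [("carol", 30), ("alice", 30), ("dave", 25), ("bob", 25)]

# Pass 1: sort on the secondary key (name).
by_name = sorted(people, key=itemgetter(0))

# Pass 2: sort on the primary key (age). Because the sort is stable,
# people with the same age stay in the name order from pass 1.
by_age_then_name = sorted(by_name, key=itemgetter(1))

# The same result in one pass, asking itemgetter for both positions.
one_pass = sorted(people, key=itemgetter(1, 0))

print(by_age_then_name)
```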

Byte arrays

To conclude our overview of mutable sequence types, let's spend a couple of minutes on the bytearray type. Basically, they represent the mutable version of bytes objects. They expose most of the usual methods of mutable sequences as well as most of the methods of the bytes type. Items are integers in the range [0, 256).

When it comes to intervals, I'm going to use the standard notation for open/closed ranges. A square bracket on one end means that the value is included, while a round bracket means it's excluded. The granularity is usually inferred by the type of the edge elements so, for example, the interval [3, 7] means all integers between 3 and 7, inclusive. On the other hand, (3, 7) means all integers between 3 and 7 exclusive (hence 4, 5, and 6). Items in a bytearray type are integers between 0 and 256; 0 is included, 256 is not. One reason intervals are often expressed like this is to ease coding. If we break a range [a, b) into N consecutive ranges, we can easily represent the original one as a concatenation like this:

[a, k1) + [k1, k2) + [k2, k3) + ... + [k(N-1), b)

The middle points (ki) being excluded on one end, and included on the other end, allow for easy concatenation and splitting when intervals are handled in the code.
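Python's range and slicing both follow this half-open convention, which is why pieces glue back together without gaps or overlaps. A quick sketch, with split points I picked arbitrarily:

```python
# [0, 10) split at 3 and 7 into three consecutive half-open ranges.
whole = list(range(0, 10))
parts = list(range(0, 3)) + list(range(3, 7)) + list(range(7, 10))

# The same holds for slices: the stop of one is the start of the next.
letters = list("abcdefghij")
k = 4
rejoined = letters[:k] + letters[k:]

print(parts == whole, rejoined == letters)
```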

Let's see a quick example with the bytearray type:

>>> bytearray()  # empty bytearray object
bytearray(b'')
>>> bytearray(10)  # zero-filled instance with given length
bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
>>> bytearray(range(5))  # bytearray from iterable of integers
bytearray(b'\x00\x01\x02\x03\x04')
>>> name = bytearray(b'Lina')  #A - bytearray from bytes
>>> name.replace(b'L', b'l')
bytearray(b'lina')
>>> name.endswith(b'na')
True
>>> name.upper()
bytearray(b'LINA')
>>> name.count(b'L')
1

As you can see in the preceding code, there are a few ways to create a bytearray object. They can be useful in many situations; for example, when receiving data through a socket, they eliminate the need to concatenate data while polling, hence they can prove to be very handy. On the line #A, I created a bytearray called name from the bytes literal b'Lina' to show you how the bytearray object exposes methods from both sequences and strings, which is extremely handy. If you think about it, they can be considered as mutable strings.
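To illustrate the buffering idea without an actual socket, here is a sketch in which the incoming chunks are simply a hard-coded list (an assumption of mine, standing in for data received over the network):

```python
# Hypothetical chunks, standing in for data arriving over a socket.
chunks = [b"GET / ", b"HTTP/1.1", b"\r\n"]

buffer = bytearray()
for chunk in chunks:
    buffer.extend(chunk)  # grows the same object in place

# Being mutable, a bytearray supports slice assignment.
buffer[0:3] = b"PUT"

first_byte = buffer[0]    # items are integers in [0, 256)
message = bytes(buffer)   # an immutable snapshot, when we need one

print(first_byte, message)
```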

Set types

Python also provides two set types, set and frozenset. The set type is mutable, while frozenset is immutable. They are unordered collections of hashable objects. Hashability is a characteristic that allows an object to be used as a set member as well as a key for a dictionary, as we'll see very soon.

From the official documentation: "An object is hashable if it has a hash value which never changes during its lifetime, and can be compared to other objects. Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally. All of Python's immutable built-in objects are hashable while mutable containers are not."

Objects that compare equal must have the same hash value. Sets are very commonly used to test for membership, so let's introduce the in operator in the following example:

>>> small_primes = set()  # empty set
>>> small_primes.add(2)  # adding one element at a time
>>> small_primes.add(3)
>>> small_primes.add(5)
>>> small_primes
{2, 3, 5}
>>> small_primes.add(1)  # Look what I've done, 1 is not a prime!
>>> small_primes
{1, 2, 3, 5}
>>> small_primes.remove(1)  # so let's remove it
>>> 3 in small_primes  # membership test
True
>>> 4 in small_primes
False
>>> 4 not in small_primes  # negated membership test
True
>>> small_primes.add(3)  # trying to add 3 again
>>> small_primes
{2, 3, 5}  # no change, duplication is not allowed
>>> bigger_primes = set([5, 7, 11, 13])  # faster creation
>>> small_primes | bigger_primes  # union operator `|`
{2, 3, 5, 7, 11, 13}
>>> small_primes & bigger_primes  # intersection operator `&`
{5}
>>> small_primes - bigger_primes  # difference operator `-`
{2, 3}

In the preceding code, you can see two different ways to create a set. One creates an empty set and then adds elements one at a time. The other creates the set using a list of numbers as an argument to the constructor, which does all the work for us. Of course, you can create a set from a list or tuple (or any iterable) and then you can add and remove members from the set as you please.

We'll look at iterable objects and iteration in the next chapter. For now, just know that iterable objects are objects you can iterate on in a direction.

Another way of creating a set is by simply using the curly braces notation, like this:

>>> small_primes = {2, 3, 5, 5, 3}
>>> small_primes
{2, 3, 5}

Notice I added some duplication to emphasize that the resulting set won't have any. Let's see an example about the immutable counterpart of the set type, frozenset:

>>> small_primes = frozenset([2, 3, 5, 7])
>>> bigger_primes = frozenset([5, 7, 11])
>>> small_primes.add(11)  # we cannot add to a frozenset
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
>>> small_primes.remove(2)  # nor can we remove
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'remove'
>>> small_primes & bigger_primes  # intersect, union, etc. allowed
frozenset({5, 7})

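As you can see, frozenset objects are quite limited compared with their mutable counterpart. They still prove very effective for membership tests, union, intersection, and difference operations, and, being immutable, they are hashable, so they can be used where a set cannot (for example, as members of another set).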
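The following sketch puts hashability to the test. It shows a few hashable objects living in a set, a list being rejected, and a frozenset used as a dictionary key. All names are made up for the illustration:

```python
# Numbers, strings, tuples of hashables, and frozensets are all hashable.
hashable_examples = {42, "text", (1, 2), frozenset({1, 2})}

# A list is a mutable container, so it cannot be a set member.
try:
    {[1, 2]}
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# A frozenset works as a dictionary key; element order is irrelevant.
groups = {frozenset({"a", "b"}): "first pair"}
lookup = groups[frozenset({"b", "a"})]

# Objects that compare equal have the same hash value.
equal_hashes = (1 == 1.0) and (hash(1) == hash(1.0))

print(len(hashable_examples), list_is_hashable, lookup, equal_hashes)
```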

Mapping types – dictionaries

Of all the built-in Python data types, the dictionary is easily the most interesting one. It's the only standard mapping type, and it is the backbone of every Python object.

A dictionary maps keys to values. Keys need to be hashable objects, while values can be of any arbitrary type. Dictionaries are mutable objects. There are quite a few different ways to create a dictionary, so let me give you a simple example of how to create a dictionary equal to {'A': 1, 'Z': -1} in five different ways:

>>> a = dict(A=1, Z=-1)
>>> b = {'A': 1, 'Z': -1}
>>> c = dict(zip(['A', 'Z'], [1, -1]))
>>> d = dict([('A', 1), ('Z', -1)])
>>> e = dict({'Z': -1, 'A': 1})
>>> a == b == c == d == e  # are they all the same?
True  # They are indeed

Have you noticed those double equals? Assignment is done with one equal, while to check whether an object has the same value as another one (or five in one go, in this case), we use double equals. There is also another way to compare objects, which involves the is operator, and checks whether the two objects are actually the same object (that is, if they have the same ID, not just the same value), but unless you have a good reason to use it, you should use the double equals instead. In the preceding code, I also used one nice function: zip. It is named after the real-life zip, which glues together two things taking one element from each at a time. Let me show you an example:

>>> list(zip(['h', 'e', 'l', 'l', 'o'], [1, 2, 3, 4, 5]))
[('h', 1), ('e', 2), ('l', 3), ('l', 4), ('o', 5)]
>>> list(zip('hello', range(1, 6)))  # equivalent, more Pythonic
[('h', 1), ('e', 2), ('l', 3), ('l', 4), ('o', 5)]

In the preceding example, I have created the same list in two different ways, one more explicit, and the other a little bit more Pythonic. Forget for a moment that I had to wrap the list constructor around the zip call (the reason is because zip returns an iterator, not a list, so if I want to see the result I need to exhaust that iterator into something, a list in this case), and concentrate on the result. See how zip has coupled the first elements of its two arguments together, then the second ones, then the third ones, and so on and so forth? Take a look at your pants (or at your purse, if you're a lady) and you'll see the same behavior in your actual zip. But let's go back to dictionaries and see how many wonderful methods they expose for allowing us to manipulate them as we want.

Let's start with the basic operations:

>>> d = {}
>>> d['a'] = 1  # let's set a couple of (key, value) pairs
>>> d['b'] = 2
>>> len(d)  # how many pairs?
2
>>> d['a']  # what is the value of 'a'?
1
>>> d  # how does `d` look now?
{'a': 1, 'b': 2}
>>> del d['a']  # let's remove `a`
>>> d
{'b': 2}
>>> d['c'] = 3  # let's add 'c': 3
>>> 'c' in d  # membership is checked against the keys
True
>>> 3 in d  # not the values
False
>>> 'e' in d
False
>>> d.clear()  # let's clean everything from this dictionary
>>> d
{}

Notice how accessing keys of a dictionary, regardless of the type of operation we're performing, is done through square brackets. Do you remember strings, lists, and tuples? We were accessing elements at some position through square brackets as well, which is yet another example of Python's consistency.

Let's see now three special objects called dictionary views: keys, values, and items. These objects provide a dynamic view of the dictionary entries and they change when the dictionary changes. keys() returns all the keys in the dictionary, values() returns all the values in the dictionary, and items() returns all the (key, value) pairs in the dictionary.

According to the Python documentation: "Keys and values are iterated over in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary's history of insertions and deletions. If keys, values and items views are iterated over with no intervening modifications to the dictionary, the order of items will directly correspond."

Enough with this chatter; let's put all this down into code:

>>> d = dict(zip('hello', range(5)))
>>> d
{'h': 0, 'e': 1, 'l': 3, 'o': 4}
>>> d.keys()
dict_keys(['h', 'e', 'l', 'o'])
>>> d.values()
dict_values([0, 1, 3, 4])
>>> d.items()
dict_items([('h', 0), ('e', 1), ('l', 3), ('o', 4)])
>>> 3 in d.values()
True
>>> ('o', 4) in d.items()
True

There are a few things to notice in the preceding code. First, notice how we're creating a dictionary by iterating over the zipped version of the string 'hello' and the integers 0 to 4 produced by range(5). The string 'hello' has two 'l' characters inside, and they are paired up with the values 2 and 3 by the zip function. Notice how in the dictionary, the second occurrence of the 'l' key (the one with value 3), overwrites the first one (the one with value 2). Another thing to notice is that when asking for any view, the original order is now preserved, while before Version 3.6 there was no guarantee of that.

As of Python 3.6, the dict type has been reimplemented to use a more compact representation. This resulted in dictionaries using 20% to 25% less memory when compared to Python 3.5. Moreover, in Python 3.6, as a side effect, dictionaries are natively ordered. This feature has received such a welcome from the community that in 3.7 it has become a legit feature of the language rather than an implementation side effect. A dict is ordered if it remembers the order in which keys were first inserted.
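Since I said that views are dynamic, here is a minimal sketch showing what that means in practice: a view taken before a change still reflects the dictionary after the change. It also shows that a keys view behaves a bit like a set. The names are illustrative only:

```python
d = {"h": 0, "e": 1}
keys = d.keys()    # taken once...
items = d.items()

d["l"] = 3         # ...and the dictionary changes afterwards

keys_now = list(keys)           # the view follows the dictionary
has_new_pair = ("l", 3) in items

# Keys views support set operations such as intersection.
common = keys & {"e", "x"}

print(keys_now, has_new_pair, common)
```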

We'll see how these views are fundamental tools when we talk about iterating over collections. Let's take a look now at some other methods exposed by Python's dictionaries; there are plenty of them and they are very useful:

>>> d
{'h': 0, 'e': 1, 'l': 3, 'o': 4}
>>> d.popitem()  # removes the last inserted item (as of Python 3.7)
('o', 4)
>>> d
{'h': 0, 'e': 1, 'l': 3}
>>> d.pop('l')  # remove item with key `l`
3
>>> d.pop('not-a-key')  # remove a key not in dictionary: KeyError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'not-a-key'
>>> d.pop('not-a-key', 'default-value')  # with a default value?
'default-value'  # we get the default value
>>> d.update({'another': 'value'})  # we can update dict this way
>>> d.update(a=13)  # or this way (like a function call)
>>> d
{'h': 0, 'e': 1, 'another': 'value', 'a': 13}
>>> d.get('a')  # same as d['a'] but if key is missing no KeyError
13
>>> d.get('a', 177)  # default value used if key is missing
13
>>> d.get('b', 177)  # like in this case
177
>>> d.get('b')  # key is not there, so None is returned

All these methods are quite simple to understand, but it's worth talking about that None, for a moment. Every function in Python returns None, unless the return statement is explicitly used to return something else, but we'll see this when we explore functions. None is frequently used to represent the absence of a value, and it is quite commonly used as a default value for arguments in function declaration. Some inexperienced coders sometimes write code that returns either False or None. Both False and None evaluate to False in a Boolean context so it may seem there is not much difference between them. But actually, I would argue there is quite an important difference: False means that we have information, and the information we have is False. None means no information. And no information is very different from information that is False. In layman's terms, if you ask your mechanic, "Is my car ready?", there is a big difference between the answer, "No, it's not" (False) and, "I have no idea" (None).
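The mechanic story translates quite naturally into code. In this sketch, is_car_ready, describe, and the plates are all hypothetical names I invented; the point is only that the three possible answers can be told apart by testing for None explicitly:

```python
def is_car_ready(jobs, plate):
    # dict.get returns None when the plate is unknown: no information.
    return jobs.get(plate)

def describe(answer):
    # Test for None first: a plain truth test would merge None and False.
    if answer is None:
        return "I have no idea"
    return "Yes, it is" if answer else "No, it's not"

jobs = {"AB123": True, "CD456": False}  # made-up workshop records

answers = [is_car_ready(jobs, plate) for plate in ("AB123", "CD456", "ZZ999")]
replies = [describe(answer) for answer in answers]

print(answers)
print(replies)
```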

One last method I really like about dictionaries is setdefault. It behaves like get, but also sets the key with the given value if it is not there. Let's see an example:

>>> d = {}
>>> d.setdefault('a', 1)  # 'a' is missing, we get default value
1
>>> d
{'a': 1}  # also, the key/value pair ('a', 1) has now been added
>>> d.setdefault('a', 5)  # let's try to override the value
1
>>> d
{'a': 1}  # no override, as expected
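A typical place where setdefault shines is grouping: we fetch the list for a key, creating it on the fly the first time we meet that key. The sketch below uses a for loop, which we'll only cover properly in the next chapter, and a list of words I made up:

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

groups = {}
for word in words:
    # Get the list for this initial, creating an empty one if missing,
    # and append to it in the same expression.
    groups.setdefault(word[0], []).append(word)

print(groups)
```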

So, we're now at the end of this tour. Test your knowledge about dictionaries by trying to foresee what d looks like after this line:

>>> d = {}
>>> d.setdefault('a', {}).setdefault('b', []).append(1)

Don't worry if you don't get it immediately. I just wanted to encourage you to experiment with dictionaries.

This concludes our tour of built-in data types. Before I discuss some considerations about what we've seen in this chapter, I want to take a peek briefly at the collections module.

The collections module

When Python's general purpose built-in containers (tuple, list, set, and dict) aren't enough, we can find specialized container data types in the collections module. They are:

Data type       Description
namedtuple()    Factory function for creating tuple subclasses with named fields
deque           List-like container with fast appends and pops on either end
ChainMap        Dictionary-like class for creating a single view of multiple mappings
Counter         Dictionary subclass for counting hashable objects
OrderedDict     Dictionary subclass that remembers the order entries were added
defaultdict     Dictionary subclass that calls a factory function to supply missing values
UserDict        Wrapper around dictionary objects for easier dictionary subclassing
UserList        Wrapper around list objects for easier list subclassing
UserString      Wrapper around string objects for easier string subclassing

We don't have the room to cover all of them, but you can find plenty of examples in the official documentation, so here I'll just give a small example to show you namedtuple, defaultdict, and ChainMap.

namedtuple

A namedtuple is a tuple-like object that has fields accessible by attribute lookup as well as being indexable and iterable (it's actually a subclass of tuple). This is sort of a compromise between a full-fledged object and a tuple, and it can be useful in those cases where you don't need the full power of a custom object, but you want your code to be more readable by avoiding weird indexing. Another use case is when there is a chance that items in the tuple need to change their position after refactoring, forcing the coder to refactor also all the logic involved, which can be very tricky. As usual, an example is better than a thousand words (or was it a picture?). Say we are handling data about the left and right eyes of a patient. We save one value for the left eye (position 0) and one for the right eye (position 1) in a regular tuple. Here's how that might be:

>>> vision = (9.5, 8.8)
>>> vision
(9.5, 8.8)
>>> vision[0]  # left eye (implicit positional reference)
9.5
>>> vision[1]  # right eye (implicit positional reference)
8.8

Now let's pretend we handle vision objects all the time, and at some point the designer decides to enhance them by adding information for the combined vision, so that a vision object stores data in this format: (left eye, combined, right eye).

Do you see the trouble we're in now? We may have a lot of code that depends on vision[0] being the left eye information (which it still is) and vision[1] being the right eye information (which is no longer the case). We have to refactor our code wherever we handle these objects, changing vision[1] to vision[2], and it can be painful. We could have probably approached this a bit better from the beginning, by using a namedtuple. Let me show you what I mean:

>>> from collections import namedtuple
>>> Vision = namedtuple('Vision', ['left', 'right'])
>>> vision = Vision(9.5, 8.8)
>>> vision[0]
9.5
>>> vision.left  # same as vision[0], but explicit
9.5
>>> vision.right  # same as vision[1], but explicit
8.8

If within our code, we refer to the left and right eyes using vision.left and vision.right, all we need to do to fix the new design issue is to change our factory and the way we create instances. The rest of the code won't need to change:

>>> Vision = namedtuple('Vision', ['left', 'combined', 'right'])
>>> vision = Vision(9.5, 9.2, 8.8)
>>> vision.left  # still correct
9.5
>>> vision.right  # still correct (though now is vision[2])
8.8
>>> vision.combined  # the new vision[1]
9.2

You can see how convenient it is to refer to those values by name rather than by position. After all, a wise man once wrote, "Explicit is better than implicit" (can you recall where? Think Zen if you can't...). This example may be a little extreme; of course, it's not likely that our code designer will go for a change like this, but you'd be amazed to see how frequently issues similar to this one happen in a professional environment, and how painful it is to refactor them.
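A namedtuple is still a tuple, so it can be unpacked, and it also comes with a couple of helper methods. The following is a small sketch of _replace and _asdict, reusing the three-field Vision from before (the new value 9.0 is just an example):

```python
from collections import namedtuple

Vision = namedtuple("Vision", ["left", "combined", "right"])
vision = Vision(9.5, 9.2, 8.8)

# It unpacks like any other tuple.
left, combined, right = vision

# Being immutable, it cannot be changed in place; _replace builds
# a new instance with some fields swapped.
improved = vision._replace(right=9.0)

# _asdict gives a mapping from field names to values.
as_dict = dict(vision._asdict())

print(improved, as_dict)
```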

defaultdict

The defaultdict data type is one of my favorites. It allows you to avoid checking if a key is in a dictionary by simply inserting it for you on your first access attempt, with a default value whose type you pass on creation. In some cases, this tool can be very handy and shorten your code a little. Let's see a quick example. Say we are updating the value of age, by adding one year. If age is not there, we assume it was 0 and we update it to 1:

>>> d = {}
>>> d['age'] = d.get('age', 0) + 1  # age not there, we get 0 + 1
>>> d
{'age': 1}
>>> d = {'age': 39}
>>> d['age'] = d.get('age', 0) + 1  # age is there, we get 40
>>> d
{'age': 40}

In each of the two cases above, the second line is actually the short version of a four-lines-long if clause that we would have to write if dictionaries didn't have the get method (we'll see all about if clauses in Chapter 3, Iterating and Making Decisions). Now let's see how it would work with a defaultdict data type:

>>> from collections import defaultdict
>>> dd = defaultdict(int)  # int is the default type (0 the value)
>>> dd['age'] += 1  # short for dd['age'] = dd['age'] + 1
>>> dd
defaultdict(<class 'int'>, {'age': 1})  # 1, as expected

Notice how we just need to instruct the defaultdict factory that we want an int number to be used in case the key is missing (we'll get 0, which is the default for the int type). Also, notice that even though in this example there is no gain on the number of lines, there is definitely a gain in readability, which is very important. You can also use a different technique to instantiate a defaultdict data type, which involves creating a factory object. To dig deeper, please refer to the official documentation.
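As a sketch of what passing a different factory can look like, here are two more defaultdict instances: one using list, handy for grouping, and one using a small function of my own (any callable that takes no arguments will do). The data is invented for the example:

```python
from collections import defaultdict

words = ["a", "to", "be", "cat"]

# list as the factory: a missing key starts out as an empty list.
by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)

# A custom factory: here, a hypothetical starting stock of 10 items.
stock = defaultdict(lambda: 10)
stock["apples"] -= 3

# A membership test does not insert the key; accessing it does.
before_access = "pears" in stock
pears = stock["pears"]
after_access = "pears" in stock

print(dict(by_length), dict(stock))
```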

ChainMap

ChainMap is an extremely nice data type which was introduced in Python 3.3. It behaves like a normal dictionary but, according to the Python documentation, it "is provided for quickly linking a number of mappings so they can be treated as a single unit". This is usually much faster than creating one dictionary and running multiple update calls on it. ChainMap can be used to simulate nested scopes and is useful in templating. The underlying mappings are stored in a list. That list is public and can be accessed or updated using the maps attribute. Lookups search the underlying mappings successively until a key is found. By contrast, writes, updates, and deletions only operate on the first mapping.

A very common use case is providing defaults, so let's see an example:

>>> from collections import ChainMap
>>> default_connection = {'host': 'localhost', 'port': 4567}
>>> connection = {'port': 5678}
>>> conn = ChainMap(connection, default_connection)  # map creation
>>> conn['port']  # port is found in the first dictionary
5678
>>> conn['host']  # host is fetched from the second dictionary
'localhost'
>>> conn.maps  # we can see the mapping objects
[{'port': 5678}, {'host': 'localhost', 'port': 4567}]
>>> conn['host'] = 'packtpub.com'  # let's add host
>>> conn.maps
[{'port': 5678, 'host': 'packtpub.com'},
 {'host': 'localhost', 'port': 4567}]
>>> del conn['port']  # let's remove the port information
>>> conn.maps
[{'host': 'packtpub.com'}, {'host': 'localhost', 'port': 4567}]
>>> conn['port']  # now port is fetched from the second dictionary
4567
>>> dict(conn)  # easy to merge and convert to regular dictionary
{'host': 'packtpub.com', 'port': 4567}

I just love how Python makes your life easy. You work on a ChainMap object, configure the first mapping as you want, and when you need a complete dictionary with all the defaults as well as the customized items, you just feed the ChainMap object to a dict constructor. If you have never coded in other languages, such as Java or C++, you probably won't be able to appreciate fully how precious this is, and how Python makes your life so much easier. I do, and I feel claustrophobic every time I have to code in some other language.
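I mentioned that ChainMap can simulate nested scopes. A minimal sketch of that, assuming three made-up layers of settings, uses the new_child method to stack a temporary mapping on top of the existing ones:

```python
from collections import ChainMap

# Hypothetical layers of settings, from most to least specific.
command_line = {}
environment = {"user": "admin"}
defaults = {"user": "guest", "color": "red"}

settings = ChainMap(command_line, environment, defaults)
user = settings["user"]    # found in environment, before defaults

# new_child puts a fresh mapping in front, like entering an inner scope.
inner = settings.new_child({"color": "blue"})
inner_color = inner["color"]     # the inner scope wins
outer_color = settings["color"]  # the outer chain is untouched

# Writes only touch the first mapping of the chain they are made on.
inner["user"] = "temp"

print(user, inner_color, outer_color, environment["user"])
```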

Enums

Technically not a built-in data type, as you have to import them from the enum module, but definitely worth mentioning, are enumerations. They were introduced in Python 3.4, and though it is not that common to see them in professional code (yet), I thought I'd give you an example anyway.

The official definition goes like this: "An enumeration is a set of symbolic names (members) bound to unique, constant values. Within an enumeration, the members can be compared by identity, and the enumeration itself can be iterated over."

Say you need to represent traffic lights. In your code, you might resort to doing this:

>>> GREEN = 1
>>> YELLOW = 2
>>> RED = 4
>>> TRAFFIC_LIGHTS = (GREEN, YELLOW, RED)
>>> # or with a dict
>>> traffic_lights = {'GREEN': 1, 'YELLOW': 2, 'RED': 4}

There's nothing special about the preceding code. It's something, in fact, that is very common to find. But, consider doing this instead:

>>> from enum import Enum
>>> class TrafficLight(Enum):
...     GREEN = 1
...     YELLOW = 2
...     RED = 4
...
>>> TrafficLight.GREEN
<TrafficLight.GREEN: 1>
>>> TrafficLight.GREEN.name
'GREEN'
>>> TrafficLight.GREEN.value
1
>>> TrafficLight(1)
<TrafficLight.GREEN: 1>
>>> TrafficLight(4)
<TrafficLight.RED: 4>

Ignoring for a moment the (relative) complexity of a class definition, you can appreciate how this might be more advantageous. The data structure is much cleaner, and the API it provides is much more powerful. I encourage you to check out the official documentation to explore all the great features you can find in the enum module. I think it's worth exploring, at least once.
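The two properties named in the official definition, comparison by identity and iteration, can be tried out with a short sketch. It reuses the TrafficLight class from the example; the variable names are only illustrative.

```python
from enum import Enum


class TrafficLight(Enum):
    GREEN = 1
    YELLOW = 2
    RED = 4


# Members can be looked up by name as well as by value.
by_name = TrafficLight['RED']
by_value = TrafficLight(4)

# Both lookups return the very same member object.
same_member = by_name is by_value

# Iterating over the class yields the members in definition order.
names = [light.name for light in TrafficLight]

# A plain Enum member does not compare equal to its raw value.
equals_raw_value = TrafficLight.GREEN == 1

print(same_member, names, equals_raw_value)
```

The last line is one reason an enumeration is safer than bare integer constants: a TrafficLight member cannot be confused with an unrelated number that happens to have the same value.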

Final considerations

That's it. Now you have seen a very good proportion of the data structures that you will use in Python. I encourage you to take a dive into the Python documentation and experiment further with each and every data type we've seen in this chapter. It's worth it, believe me. Everything you'll write will be about handling data, so make sure your knowledge about it is rock solid.

Before we leap into Chapter 3, Iterating and Making Decisions, I'd like to share some final considerations about different aspects that to my mind are important and not to be neglected.

Small values caching

When we discussed objects at the beginning of this chapter, we saw that when we assign a name to an object, Python creates the object, sets its value, and then points the name to it. We can assign different names to the same value and we expect different objects to be created, like this:

>>> a = 1000000
>>> b = 1000000
>>> id(a) == id(b)
False

In the preceding example, a and b are assigned to two int objects that have the same value but are not the same object: as you can see, their IDs are different. So let's do it again:

>>> a = 5
>>> b = 5
>>> id(a) == id(b)
True

Oh, oh! Is Python broken? Why are the two objects the same now? We didn't do a = b = 5, we set them up separately. Well, the answer is performance. Python caches short strings and small numbers, to avoid having many copies of them clogging up the system memory (in CPython, for example, the integers from -5 to 256 are cached). This is an implementation detail, and everything is handled properly under the hood, so you don't need to worry a bit, but make sure that you remember this behavior should your code ever need to fiddle with IDs.
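A practical consequence is that values should be compared with ==, while identity checks (is, or comparing id results) depend on what the interpreter happens to cache. The following is a small sketch; the integers are built with int() at runtime so the comparison involves two separately created objects.

```python
# Build the two integers at runtime from strings.
a = int('1000000')
b = int('1000000')

values_match = a == b  # compares values: reliable
same_object = a is b   # compares identities: implementation dependent

print(values_match)
print(same_object)  # do not rely on this result in real code
```

Only the first result is guaranteed by the language. The second one may differ between Python implementations and versions.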

How to choose data structures

As we've seen, Python provides you with several built-in data types and sometimes, if you're not that experienced, choosing the one that serves you best can be tricky, especially when it comes to collections. For example, say you have many dictionaries to store, each of which represents a customer. Within each customer dictionary, there's an 'id': 'code' unique identification code. In what kind of collection would you place them? Well, unless I know more about these customers, it's very hard to answer. What kind of access will I need? What sort of operations will I have to perform on each of them, and how many times? Will the collection change over time? Will I need to modify the customer dictionaries in any way? What is going to be the most frequent operation I will have to perform on the collection?

If you can answer the preceding questions, then you will know what to choose. If the collection never shrinks or grows (in other words, it won't need to add/delete any customer object after creation) or shuffles, then tuples are a possible choice. Otherwise, lists are a good candidate. Every customer dictionary has a unique identifier though, so even a dictionary could work. Let me draft these options for you:

# example customer objects
customer1 = {'id': 'abc123', 'full_name': 'Master Yoda'}
customer2 = {'id': 'def456', 'full_name': 'Obi-Wan Kenobi'}
customer3 = {'id': 'ghi789', 'full_name': 'Anakin Skywalker'}

# collect them in a tuple
customers = (customer1, customer2, customer3)

# or collect them in a list
customers = [customer1, customer2, customer3]

# or maybe within a dictionary, they have a unique id after all
customers = {
    'abc123': customer1,
    'def456': customer2,
    'ghi789': customer3,
}

Some customers we have there, right? I probably wouldn't go with the tuple option, unless I wanted to highlight that the collection is not going to change. I'd say usually a list is better, as it allows for more flexibility.

Another factor to keep in mind is that tuples and lists are ordered collections. If you use a dictionary (prior to Python 3.6) or a set, you lose the ordering, so you need to know if ordering is important in your application.

What about performance? For example, in a list, operations such as insertion and membership can take O(n), while they are O(1) on average for a dictionary. It's not always possible to use dictionaries though, if we don't have the guarantee that we can uniquely identify each item of the collection by means of one of its properties, and that the property in question is hashable (so it can be a key in dict).

If you're wondering what O(n) and O(1) mean, please Google big O notation. In this context, let's just say that if performing an operation Op on a data structure takes O(f(n)), it would mean that Op takes at most a time t ≤ c * f(n) to complete, where c is some positive constant, n is the size of the input, and f is some function. So, think of O(...) as an upper bound for the running time of an operation (it can be used also to size other measurable quantities, of course).
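To make the difference concrete, here is a minimal sketch of fetching a customer by ID from a list and from a dictionary. The helper find_in_list and the names customers_list and customers_by_id are made up for the illustration; the customer data mirrors the example above.

```python
customers_list = [
    {'id': 'abc123', 'full_name': 'Master Yoda'},
    {'id': 'def456', 'full_name': 'Obi-Wan Kenobi'},
    {'id': 'ghi789', 'full_name': 'Anakin Skywalker'},
]


def find_in_list(customers, customer_id):
    """Scan the list item by item: O(n) in the worst case."""
    for customer in customers:
        if customer['id'] == customer_id:
            return customer
    return None


# Index the same customers by their unique (and hashable) id.
customers_by_id = {}
for customer in customers_list:
    customers_by_id[customer['id']] = customer

# Both lookups return the same object, but the dictionary gets there
# without scanning: O(1) on average.
print(find_in_list(customers_list, 'ghi789'))
print(customers_by_id['ghi789'])
```

With three customers the difference is irrelevant. It starts to matter when the collection is large and lookups by ID are the most frequent operation.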

Another way of understanding if you have chosen the right data structure is by looking at the code you have to write in order to manipulate it. If everything comes easily and flows naturally, then you probably have chosen correctly, but if you find yourself thinking your code is getting unnecessarily complicated, then you probably should try and decide whether you need to reconsider your choices. It's quite hard to give advice without a practical case though, so when you choose a data structure for your data, try to keep ease of use and performance in mind and give precedence to what matters most in the context you are in.

About indexing and slicing

At the beginning of this chapter, we saw slicing applied on strings. Slicing, in general, applies to a sequence: tuples, lists, strings, and so on. With lists, slicing can also be used for assignment. I've almost never seen this used in professional code, but still, you know you can. Could you slice dictionaries or sets? I hear you scream, "Of course not!". Excellent; I see we're on the same page here, so let's talk about indexing.
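Since slice assignment on lists is only mentioned in passing, here is a brief sketch of what it looks like. The list and its values are arbitrary.

```python
numbers = list(range(10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Replace the three elements at positions 2, 3, and 4 with two new
# ones. The replacement does not need to have the same length.
numbers[2:5] = [20, 30]
after_replace = list(numbers)

# Assigning an empty list to a slice removes those elements.
numbers[:2] = []
after_removal = list(numbers)

print(after_replace)
print(after_removal)
```

Note that the list is modified in place: no new list is created by the assignment itself.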

There is one characteristic about Python indexing I haven't mentioned before. I'll show you by way of an example. How do you address the last element of a collection? Let's see:

>>> a = list(range(10))  # `a` has 10 elements. Last one is 9.
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> len(a)  # its length is 10 elements
10
>>> a[len(a) - 1]  # position of last one is len(a) - 1
9
>>> a[-1]  # but we don't need len(a)! Python rocks!
9
>>> a[-2]  # equivalent to len(a) - 2
8
>>> a[-3]  # equivalent to len(a) - 3
7

If the list a has 10 elements, because of the 0-index positioning system of Python, the first one is at position 0 and the last one is at position 9. In the preceding example, the elements are conveniently placed in a position equal to their value: 0 is at position 0, 1 at position 1, and so on.

So, in order to fetch the last element, we need to know the length of the whole list (or tuple, or string, and so on) and then subtract 1. Hence: len(a) - 1. This is so common an operation that Python provides you with a way to retrieve elements using negative indexing. This proves very useful when you do data manipulation. Here's a nice diagram about how indexing works on the string "HelloThere" (which is Obi-Wan Kenobi sarcastically greeting General Grievous):

[Figure: positive and negative index positions for each character of the string]

Trying to address indexes greater than 9 or smaller than -10 will raise an IndexError, as expected.
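The relationship between negative and positive indexes, and the IndexError raised outside the valid range, can be checked with a few lines. This sketch uses the 10-element list a from the example; safe_get is a hypothetical helper written only for this illustration.

```python
a = list(range(10))

# For a sequence of length n, index -k addresses the same element as
# index n - k, for k from 1 to n.
for negative in range(-1, -len(a) - 1, -1):
    assert a[negative] == a[len(a) + negative]


def safe_get(sequence, index):
    """Return sequence[index], or None when the index is out of range."""
    try:
        return sequence[index]
    except IndexError:
        return None


print(safe_get(a, 9))    # last valid positive index
print(safe_get(a, -10))  # last valid negative index (first element)
print(safe_get(a, 10))   # out of range
print(safe_get(a, -11))  # out of range
```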

About the names

You may have noticed that, in order to keep the examples as short as possible, I have called many objects using simple letters, like a, b, c, d, and so on. This is perfectly OK when you debug on the console or when you show that a + b == 7, but it's bad practice when it comes to professional coding (or any type of coding, for that matter). I hope you will indulge me if I sometimes do it; the reason is to present the code in a more compact way.

In a real environment though, when you choose names for your data, you should choose them carefully and they should reflect what the data is about. So, if you have a collection of Customer objects, customers is a perfectly good name for it. Would customers_list, customers_tuple, or customers_collection work as well? Think about it for a second. Is it good to tie the name of the collection to the data type? I don't think so, at least in most cases. So I'd say if you have an excellent reason to do so, go ahead; otherwise, don't. The reason is, once that customers_tuple starts being used in different places of your code, and you realize you actually want to use a list instead of a tuple, you're in for some fun refactoring (also known as wasted time). Names for data should be nouns, and names for functions should be verbs. Names should be as expressive as possible. Python is actually a very good example when it comes to names. Most of the time you can just guess what a function is called if you know what it does. Crazy, huh?

Chapter 2, Meaningful Names, of Clean Code by Robert C. Martin (Prentice Hall) is entirely dedicated to names. It's an amazing book that helped me improve my coding style in many different ways, and is a must-read if you want to take your coding to the next level.

Summary

In this chapter, we've explored the built-in data types of Python. We've seen how many there are and how much can be achieved by just using them in different combinations.

We've seen number types, sequences, sets, mappings, collections (and a special guest appearance by Enum), we've seen that everything is an object, we've learned the difference between mutable and immutable, and we've also learned about slicing and indexing (and, proudly, negative indexing as well).

We've presented simple examples, but there's much more that you can learn about this subject, so stick your nose into the official documentation and explore.

Most of all, I encourage you to try out all the exercises by yourself, get your fingers using that code, build some muscle memory, and experiment, experiment, experiment. Learn what happens when you divide by zero, when you combine different number types into a single expression, when you manage strings. Play with all data types. Exercise them, break them, discover all their methods, enjoy them, and learn them very, very well.

If your foundation is not rock solid, how good can your code be? And data is the foundation for everything. Data shapes what dances around it.

The more you progress with the book, the more it's likely that you will find some discrepancies or maybe a small typo here and there in my code (or yours). You will get an error message, something will break. That's wonderful! When you code, things break all the time, you debug and fix all the time, so consider errors as useful exercises to learn something new about the language you're using, and not as failures or problems. Errors will keep coming up until your very last line of code, that's for sure, so you may as well start making your peace with them now.

The next chapter is about iterating and making decisions. We'll see how actually to put those collections to use, and take decisions based on the data we're presented with. We'll start to go a little faster now that your knowledge is building up, so make sure you're comfortable with the contents of this chapter before you move to the next one. Once more, have fun, explore, break things. It's a very good way to learn.

Iterating and Making Decisions

"Insanity: doing the same thing over and over again and expecting different results." – Albert Einstein

In the previous chapter, we looked at Python's built-in data types. Now that you're familiar with data in its many forms and shapes, it's time to start looking at how a program can use it.

According to Wikipedia: In computer science, control flow (or alternatively, flow of control) refers to the specification of the order in which the individual statements, instructions or function calls of an imperative program are executed or evaluated.

In order to control the flow of a program, we have two main weapons: conditional programming (also known as branching) and looping. We can use them in many different combinations and variations, but in this chapter, instead of going through all the possible forms of those two constructs in a documentation fashion, I'd rather give you the basics and then I'll write a couple of small scripts with you. In the first one, we'll see how to create a rudimentary prime-number generator, while in the second one, we'll see how to apply discounts to customers based on coupons. This way, you should get a better feeling for how conditional programming and looping can be used.

In this chapter, we are going to cover the following:

Conditional programming
Looping in Python
A quick peek at the itertools module

Conditional programming

Conditional programming, or branching, is something you do every day, every moment. It's about evaluating conditions: if the light is green, then I can cross; if it's raining, then I'm taking the umbrella; and if I'm late for work, then I'll call my manager.

The main tool is the if statement, which comes in different forms and colors, but basically it evaluates an expression and, based on the result, chooses which part of the code to execute. As usual, let's look at an example:

# conditional.1.py
late = True
if late:
    print('I need to call my manager!')

This is possibly the simplest example: when fed to the if statement, late acts as a conditional expression, which is evaluated in a Boolean context (exactly like if we were calling bool(late)). If the result of the evaluation is True, then we enter the body of the code immediately after the if statement. Notice that the print instruction is indented: this means it belongs to a scope defined by the if clause. Execution of this code yields:

$ python conditional.1.py
I need to call my manager!

Since late is True, the print call was executed. Let's expand on this example:

# conditional.2.py
late = False
if late:
    print('I need to call my manager!')  #1
else:
    print('no need to call my manager...')  #2

This time I set late = False, so when I execute the code, the result is different:

$ python conditional.2.py
no need to call my manager...

Depending on the result of evaluating the late expression, we can either enter block #1 or block #2, but not both. Block #1 is executed when late evaluates to True, while block #2 is executed when late evaluates to False. Try assigning False/True values to the late name, and see how the output for this code changes accordingly.

The preceding example also introduces the else clause, which becomes very handy when we want to provide an alternative set of instructions to be executed when an expression evaluates to False within an if clause. The else clause is optional, as is evident by comparing the preceding two examples.

A specialized else: elif

Sometimes all you need is to do something if a condition is met (a simple if clause). At other times, you need to provide an alternative, in case the condition is False (if/else clause), but there are situations where you may have more than two paths to choose from. Since calling the manager (or not calling them) is kind of a binary type of example (either you call or you don't), let's change the type of example and keep expanding. This time, we decide on tax percentages. If my income is less than $10,000, I won't pay any taxes. If it is between $10,000 and $30,000, I'll pay 20% in taxes. If it is between $30,000 and $100,000, I'll pay 35% in taxes, and if it's over $100,000, I'll (gladly) pay 45% in taxes. Let's put this all down into beautiful Python code:

# taxes.py
income = 15000
if income < 10000:
    tax_coefficient = 0.0  #1
elif income < 30000:
    tax_coefficient = 0.2  #2
elif income < 100000:
    tax_coefficient = 0.35  #3
else:
    tax_coefficient = 0.45  #4

print('I will pay:', income * tax_coefficient, 'in taxes')

Executing the preceding code yields:

$ python taxes.py
I will pay: 3000.0 in taxes

Let's go through the example line by line: we start by setting up the income value. In the example, my income is $15,000. We enter the if clause. Notice that this time we also introduced the elif clause, which is a contraction of else-if, and it's different from a bare else clause in that it also has its own condition. So, the if expression of income < 10000 evaluates to False, therefore block #1 is not executed.

The control passes to the next condition evaluator: elif income < 30000. This one evaluates to True, therefore block #2 is executed, and because of this, Python then resumes execution after the whole if/elif/elif/else clause (which we can just call the if clause from now on). There is only one instruction after the if clause, the print call, which tells us I will pay 3000.0 in taxes this year (15,000 * 20%). Notice that the order is mandatory: if comes first, then (optionally) as many elif clauses as you need, and then (optionally) an else clause.

Interesting, right? No matter how many lines of code you may have within each block, when one of the conditions evaluates to True, the associated block is executed and then execution resumes after the whole clause. If none of the conditions evaluates to True (for example, income = 200000), then the body of the else clause would be executed (block #4). This example expands our understanding of the behavior of the else clause. Its block of code is executed when none of the preceding if/elif/.../elif expressions has evaluated to True.

Try to modify the value of income until you can comfortably execute all blocks at will (one per execution, of course). And then try the boundaries. This is crucial: whenever you have conditions expressed as equalities or inequalities (==, !=, <, >, <=, >=), those numbers represent boundaries. It is essential to test boundaries thoroughly. Should I allow you to drive at 18 or 17? Am I checking your age with age < 18, or age <= 18? You can't imagine how many times I've had to fix subtle bugs that stemmed from using the wrong operator, so go ahead and experiment with the preceding code. Change some < to <= and set income to be one of the boundary values (10,000, 30,000, 100,000) as well as any value in between. See how the result changes, and get a good understanding of it before proceeding.
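One way to try the boundaries systematically is to wrap the same if/elif/else chain in a function and call it with values on either side of each threshold. This is only a sketch: functions are covered later in the book, and the function name tax_coefficient_for is made up for the illustration.

```python
def tax_coefficient_for(income):
    """Same branching logic as taxes.py, returning the coefficient."""
    if income < 10000:
        return 0.0
    elif income < 30000:
        return 0.2
    elif income < 100000:
        return 0.35
    else:
        return 0.45


# Values just below and exactly on each boundary.
for income in (9999, 10000, 29999, 30000, 99999, 100000):
    print(income, tax_coefficient_for(income))
```

If one of the < operators were changed to <=, the coefficient printed for the corresponding boundary value would change, which is exactly the kind of subtle difference worth checking.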

Let's now see another example that shows us how to nest if clauses. Say your program encounters an error. If the alert system is the console, we print the error. If the alert system is an email, we send it according to the severity of the error. If the alert system is anything other than console or email, we don't know what to do, therefore we do nothing. Let's put this into code:

# errorsalert.py
alert_system = 'console'  # other value can be 'email'
error_severity = 'critical'  # other values: 'medium' or 'low'
error_message = 'OMG! Something terrible happened!'

if alert_system == 'console':
    print(error_message)  #1
elif alert_system == 'email':
    if error_severity == 'critical':
        send_email('admin@example.com', error_message)  #2
    elif error_severity == 'medium':
        send_email('support.1@example.com', error_message)  #3
    else:
        send_email('support.2@example.com', error_message)  #4

The preceding example is quite interesting, because of its silliness. It shows us two nested if clauses (outer and inner). It also shows us that the outer if clause doesn't have any else, while the inner one does. Notice how indentation is what allows us to nest one clause within another one.

If alert_system == 'console', body #1 is executed, and nothing else happens. On the other hand, if alert_system == 'email', then we enter into another if clause, which we called inner. In the inner if clause, according to error_severity, we send an email to either an admin, first-level support, or second-level support (blocks #2, #3, and #4). The send_email function is not defined in this example, therefore trying to run it would give you an error. In the source code of the book, which you can download from the website, I included a trick to redirect that call to a regular print function, just so you can experiment on the console without actually sending an email. Try changing the values and see how it all works.
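If you don't have the book's source code at hand, a stand-in for send_email can be as simple as the following sketch. It is an assumption made for illustration, not necessarily the trick used in the book's source: the function just prints (and returns) what it would have sent. The sent variable is also an addition, used only to record the result.

```python
def send_email(recipient, message):
    """Hypothetical stand-in: print instead of sending a real email."""
    line = 'Sending to {}: {}'.format(recipient, message)
    print(line)
    return line


alert_system = 'email'
error_severity = 'medium'
error_message = 'OMG! Something terrible happened!'

sent = None  # records what the stand-in reported, if anything

if alert_system == 'console':
    print(error_message)  #1
elif alert_system == 'email':
    if error_severity == 'critical':
        sent = send_email('admin@example.com', error_message)  #2
    elif error_severity == 'medium':
        sent = send_email('support.1@example.com', error_message)  #3
    else:
        sent = send_email('support.2@example.com', error_message)  #4
```

With the values chosen here, execution enters the outer elif and then inner block #3.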

The ternary operator

One last thing I would like to show you, before moving on to the next subject, is the ternary operator or, in layman's terms, the short version of an if/else clause. When the value of a name is to be assigned according to some condition, sometimes it's easier and more readable to use the ternary operator instead of a proper if clause. In the following example, the two code blocks do exactly the same thing:

# ternary.py
order_total = 247  # GBP

# classic if/else form
if order_total > 100:
    discount = 25  # GBP
else:
    discount = 0  # GBP
print(order_total, discount)

# ternary operator
discount = 25 if order_total > 100 else 0
print(order_total, discount)

For simple cases like this, I find it very nice to be able to express that logic in one line instead of four. Remember, as a coder, you spend much more time reading code than writing it, so Python's conciseness is invaluable.

Are you clear on how the ternary operator works? Basically, name = something if condition else something-else. So name is assigned something if condition evaluates to True, and something-else if condition evaluates to False.

Now that you know everything about controlling the path of the code, let's move on to the next subject: looping.

Looping

If you have any experience with looping in other programming languages, you will find Python's way of looping a bit different. First of all, what is looping? Looping means being able to repeat the execution of a code block more than once, according to the loop parameters we're given. There are different looping constructs, which serve different purposes, and Python has distilled all of them down to just two, which you can use to achieve everything you need. These are the for and while statements.

While it's definitely possible to do everything you need using either of them, they serve different purposes and therefore they're usually used in different contexts. We'll explore this difference thoroughly in this chapter.

The for loop

The for loop is used when looping over a sequence, such as a list, tuple, or a collection of objects. Let's start with a simple example and expand on the concept to see what the Python syntax allows us to do:

# simple.for.py
for number in [0, 1, 2, 3, 4]:
    print(number)

This simple snippet of code, when executed, prints all numbers from 0 to 4. The for loop is fed the list [0, 1, 2, 3, 4] and at each iteration, number is given a value from the sequence (which is iterated sequentially, in order), then the body of the loop is executed (the print line). The number value changes at every iteration, according to which value is coming next from the sequence. When the sequence is exhausted, the for loop terminates, and the execution of the code resumes normally with the code after the loop.

Iterating over a range

Sometimes we need to iterate over a range of numbers, and it would be quite unpleasant to have to do so by hardcoding the list somewhere. In such cases, the range function comes to the rescue. Let's see the equivalent of the previous snippet of code:

# simple.for.py
for number in range(5):
    print(number)

The range function is used extensively in Python programs when it comes to creating sequences: you can call it by passing one value, which acts as stop (counting from 0), or you can pass two values (start and stop), or even three (start, stop, and step). Check out the following example:

>>> list(range(10))  # one value: from 0 to value (excluded)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(3, 8))  # two values: from start to stop (excluded)
[3, 4, 5, 6, 7]
>>> list(range(-10, 10, 4))  # three values: step is added
[-10, -6, -2, 2, 6]

For the moment, ignore that we need to wrap range(...) within a list. The range object is a little bit special, but in this case, we're just interested in understanding what values it will return to us. You can see that the deal is the same with slicing: start is included, stop excluded, and optionally you can add a step parameter, which by default is 1.

Try modifying the parameters of the range() call in our simple.for.py code and see what it prints. Get comfortable with it.
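As a starting point for that experimentation, here is a small sketch with a negative step and an empty range. The variable names are arbitrary.

```python
# A negative step counts downward; stop is still excluded.
countdown = range(10, 0, -2)
countdown_values = list(countdown)

# A range knows its length and supports membership tests without
# being converted to a list first.
countdown_length = len(countdown)
has_six = 6 in countdown
has_zero = 0 in countdown

# If start is already past stop (for a positive step), the range is
# simply empty.
empty_values = list(range(5, 2))

print(countdown_values, countdown_length, has_six, has_zero, empty_values)
```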

Iterating over a sequence

Now we have all the tools to iterate over a sequence, so let's build on that example:

# simple.for.2.py
surnames = ['Rivest', 'Shamir', 'Adleman']
for position in range(len(surnames)):
    print(position, surnames[position])

The preceding code adds a little bit of complexity to the game. Execution will show this result:

$ python simple.for.2.py
0 Rivest
1 Shamir
2 Adleman

Let's use the inside-out technique to break it down, OK? We start from the innermost part of what we're trying to understand, and we expand outward. So, len(surnames) is the length of the surnames list: 3. Therefore, range(len(surnames)) is actually transformed into range(3). This gives us the range [0, 3), which is basically a sequence (0, 1, 2). This means that the for loop will run three iterations. In the first one, position will take value 0, while in the second one, it will take value 1, and finally value 2 in the third and last iteration. What is (0, 1, 2), if not the possible indexing positions for the surnames list? At position 0, we find 'Rivest', at position 1, 'Shamir', and at position 2, 'Adleman'. If you are curious about what these three men created together, change print(position, surnames[position]) to print(surnames[position][0], end=''), add a final print() outside of the loop, and run the code again.

Now, this style of looping is actually much closer to languages such as Java or C++. In Python, it's quite rare to see code like this. You can just iterate over any sequence or collection, so there is no need to get the list of positions and retrieve elements out of a sequence at each iteration. It's expensive, needlessly expensive. Let's change the example into a more Pythonic form:

# simple.for.3.py
surnames = ['Rivest', 'Shamir', 'Adleman']
for surname in surnames:
    print(surname)

Now that's something! It's practically English. The for loop can iterate over the surnames list, and it gives back each element in order at each iteration. Running this code will print the three surnames, one at a time. It's much easier to read, right?

What if you wanted to print the position as well though? Or what if you actually needed it? Should you go back to the range(len(...)) form? No. You can use the enumerate built-in function, like this:

# simple.for.4.py
surnames = ['Rivest', 'Shamir', 'Adleman']
for position, surname in enumerate(surnames):
    print(position, surname)

This code is very interesting as well. Notice that enumerate gives back a two-tuple (position, surname) at each iteration, but still, it's much more readable (and more efficient) than the range(len(...)) example. You can call enumerate with a start parameter, such as enumerate(iterable, start), and it will start from start, rather than 0. Just another little thing that shows you how much thought has been given in designing Python so that it makes your life easier.
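The start parameter mentioned above can be seen at work in a short sketch that numbers the surnames from 1 instead of 0. The numbered variable is only there to show the pairs that enumerate produces.

```python
surnames = ['Rivest', 'Shamir', 'Adleman']

# Counting starts from 1; the elements themselves are unchanged.
for position, surname in enumerate(surnames, 1):
    print(position, surname)

# Collect the (position, surname) pairs to inspect them.
numbered = list(enumerate(surnames, start=1))
print(numbered)
```

Only the counter is shifted: surnames[1] is still 'Shamir', even though it is paired with position 2 here.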

You can use a for loop to iterate over lists, tuples, and in general anything that Python calls iterable. This is a very important concept, so let's talk about it a bit more.

Iterators and iterables

According to the Python documentation (https://docs.python.org/3/glossary.html), an iterable is:

An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method. Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), ...). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop.

Simply put, what happens when you write for k in sequence: ... body ..., is that the for loop asks sequence for the next element, it gets something back, it calls that something k, and then executes its body. Then, once again, the for loop asks sequence for the next element, it calls it k again, and executes the body again, and so on and so forth, until the sequence is exhausted. Empty sequences will result in zero executions of the body.

Some data structures, when iterated over, produce their elements in order, such as lists, tuples, and strings, while some others don't, such as sets and dictionaries (prior to Python 3.6). Python gives us the ability to iterate over iterables, using a type of object called an iterator.

According to the official documentation (https://docs.python.org/3/glossary.html), an iterator is:

An object representing a stream of data. Repeated calls to the iterator's __next__() method (or passing it to the built-in function next()) return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its __next__() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.

Don't worry if you don't fully understand all the preceding legalese, you will in due time. I put it here as a handy reference for the future.

In practice, the whole iterable/iterator mechanism is somewhat hidden behind the code. Unless you need to code your own iterable or iterator for some reason, you won't have to worry about this too much. But it's very important to understand how Python handles this key aspect of control flow because it will shape the way you will write your code.
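To see what the for loop does behind the scenes, you can drive an iterator by hand with the built-in iter() and next() functions. The following is a minimal sketch of the behavior described in the two definitions; the variable names are arbitrary.

```python
surnames = ['Rivest', 'Shamir', 'Adleman']

# Ask the list (an iterable) for an iterator, then pull items one by one.
iterator = iter(surnames)
first = next(iterator)
second = next(iterator)
third = next(iterator)

# A further call raises StopIteration: the iterator is exhausted.
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True

# An exhausted iterator looks like an empty container...
leftovers = list(iterator)

# ...while the list itself hands out a fresh iterator every time.
fresh_pass = list(iter(surnames))

# Calling iter() on an iterator returns the iterator itself.
returns_itself = iter(iterator) is iterator

print(first, second, third, exhausted, leftovers, fresh_pass, returns_itself)
```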

Iterating over multiple sequences

Let's see another example of how to iterate over two sequences of the same length, in order to work on their respective elements in pairs. Say we have a list of people and a list of numbers representing the age of the people in the first list. We want to print a pair person/age on one line for all of them. Let's start with an example and let's refine it gradually:

# multiple.sequences.py
people = ['Conrad', 'Deepak', 'Heinrich', 'Tom']
ages = [29, 30, 34, 36]
for position in range(len(people)):
    person = people[position]
    age = ages[position]
    print(person, age)

By now, this code should be pretty straightforward for you to understand. We need to iterate over the list of positions (0, 1, 2, 3) because we want to retrieve elements from two different lists. Executing it we get the following:

$ python multiple.sequences.py
Conrad 29
Deepak 30
Heinrich 34
Tom 36

This code is both inefficient and not Pythonic. It's inefficient because retrieving an element given the position can be an expensive operation (depending on the data structure), and we're doing it from scratch at each iteration. The postal worker doesn't go back to the beginning of the road each time they deliver a letter, right? They move from house to house. From one to the next one. Let's try to make it better using enumerate:

# multiple.sequences.enumerate.py
people = ['Conrad', 'Deepak', 'Heinrich', 'Tom']
ages = [29, 30, 34, 36]
for position, person in enumerate(people):
    age = ages[position]
    print(person, age)

That's better, but still not perfect. And it's still a bit ugly. We're iterating properly on people, but we're still fetching age using positional indexing, which we want to lose as well. Well, no worries, Python gives you the zip function, remember?

Let's use it:

# multiple.sequences.zip.py
people = ['Conrad', 'Deepak', 'Heinrich', 'Tom']
ages = [29, 30, 34, 36]
for person, age in zip(people, ages):
    print(person, age)

Ah! So much better! Once again, compare the preceding code with the first example and admire Python's elegance. The reason I wanted to show this example is twofold. On the one hand, I wanted to give you an idea of how much shorter code in Python can be compared to other languages where the syntax doesn't allow you to iterate over sequences or collections as easily. And on the other hand, and much more importantly, notice that when the for loop asks zip(sequenceA, sequenceB) for the next element, it gets back a tuple, not just a single object. It gets back a tuple with as many elements as the number of sequences we feed to the zip function. Let's expand a little on the previous example in two ways, using explicit and implicit assignment:

# multiple.sequences.explicit.py
people = ['Conrad', 'Deepak', 'Heinrich', 'Tom']
ages = [29, 30, 34, 36]
nationalities = ['Poland', 'India', 'South Africa', 'England']
for person, age, nationality in zip(people, ages, nationalities):
    print(person, age, nationality)

In the preceding code, we added the nationalities list. Now that we feed three sequences to the zip function, the for loop gets back a three-tuple at each iteration. Notice that the position of the elements in the tuple respects the position of the sequences in the zip call. Executing the code will yield the following result:

$ python multiple.sequences.explicit.py
Conrad 29 Poland
Deepak 30 India
Heinrich 34 South Africa
Tom 36 England

Sometimes, for reasons that may not be clear in a simple example such as the preceding one, you may want to explode the tuple within the body of the for loop. If that is your desire, it's perfectly possible to do so:

# multiple.sequences.implicit.py
people = ['Conrad', 'Deepak', 'Heinrich', 'Tom']
ages = [29, 30, 34, 36]
nationalities = ['Poland', 'India', 'South Africa', 'England']
for data in zip(people, ages, nationalities):
    person, age, nationality = data
    print(person, age, nationality)

It's basically doing what the for loop does automatically for you, but in some cases you may want to do it yourself. Here, the three-tuple data that comes from zip(...) is exploded within the body of the for loop into three variables: person, age, and nationality.
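All the examples above use sequences of the same length. As a side note, here is a sketch of what happens when the lengths differ: zip stops as soon as the shortest sequence is exhausted, while itertools.zip_longest pads the shorter ones. The shortened ages list is made up for the illustration.

```python
from itertools import zip_longest

people = ['Conrad', 'Deepak', 'Heinrich', 'Tom']
ages = [29, 30, 34]  # one age is missing on purpose

# zip silently stops at the shortest sequence: Tom is dropped.
pairs = list(zip(people, ages))

# zip_longest keeps going and fills the gaps with fillvalue.
padded = list(zip_longest(people, ages, fillvalue=None))

print(pairs)
print(padded)
```

Because zip does not complain about the mismatch, it is worth checking the lengths yourself when a missing element would be a bug.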

ThewhileloopIntheprecedingpages,wesawtheforloopinaction.It'sincrediblyusefulwhenyouneedtoloopoverasequenceoracollection.Thekeypointtokeepinmind,whenyouneedtobeabletodiscriminatewhichloopingconstructtouse,isthattheforlooprockswhenyouhavetoiterateoverafiniteamountofelements.Itcanbeahugeamount,butstill,somethingthatendsatsomepoint.

Thereareothercasesthough,whenyoujustneedtoloopuntilsomeconditionissatisfied,orevenloopindefinitelyuntiltheapplicationisstopped,suchascaseswherewedon'treallyhavesomethingtoiterateon,andthereforetheforloopwouldbeapoorchoice.Butfearnot,forthesecases,Pythonprovidesuswiththewhileloop.

Thewhileloopissimilartotheforloop,inthattheybothloop,andateachiterationtheyexecuteabodyofinstructions.Whatisdifferentbetweenthemisthatthewhileloopdoesn'tloopoverasequence(itcan,butyouhavetowritethelogicmanuallyanditwouldn'tmakeanysense,youwouldjustwanttouseaforloop),rather,itloopsaslongasacertainconditionissatisfied.Whentheconditionisnolongersatisfied,theloopends.

Asusual,let'sseeanexamplethatwillclarifyeverythingforus.Wewanttoprintthebinaryrepresentationofapositivenumber.Inordertodoso,wecanuseasimplealgorithmthatcollectstheremaindersofdivisionby2(inreverseorder),andthatturnsouttobethebinaryrepresentationofthenumberitself:

6/2=3(remainder:0)

3/2=1(remainder:1)

1/2=0(remainder:1)

Listofremainders:0,1,1.

Inverseis1,1,0,whichisalsothebinaryrepresentationof6:110

Let'swritesomecodetocalculatethebinaryrepresentationforthenumber39:1001112:

#binary.py

n=39

remainders=[]

whilen>0:

remainder=n%2#remainderofdivisionby2

remainders.insert(0,remainder)#wekeeptrackofremainders

n//=2#wedividenby2

print(remainders)

In the preceding code, n > 0 is the condition to keep looping. We can make the code a little shorter (and more Pythonic), by using the divmod function, which is called with a number and a divisor, and returns a tuple with the result of the integer division and its remainder. For example, divmod(13, 5) would return (2, 3), and indeed 5 * 2 + 3 = 13:

# binary.2.py
n = 39
remainders = []
while n > 0:
    n, remainder = divmod(n, 2)
    remainders.insert(0, remainder)

print(remainders)

In the preceding code, we have reassigned n to the result of the division by 2, and the remainder, in one single line.

Notice that the condition in a while loop is a condition to continue looping. If it evaluates to True, then the body is executed and then another evaluation follows, and so on, until the condition evaluates to False. When that happens, the loop is exited immediately without executing its body.

If the condition never evaluates to False, the loop becomes a so-called infinite loop. Infinite loops are used, for example, when polling from network devices: you ask the socket whether there is any data, you do something with it if there is any, then you sleep for a small amount of time, and then you ask the socket again, over and over again, without ever stopping.

Having the ability to loop over a condition, or to loop indefinitely, is the reason why the for loop alone is not enough, and therefore Python provides the while loop.

By the way, if you need the binary representation of a number, check out the bin function.
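As a quick check of the while loop above, here is a minimal sketch that gets the same digits from the built-in bin and format functions. The variable names are only illustrative:

```python
n = 39
print(bin(n))          # binary string with a '0b' prefix
print(format(n, 'b'))  # same digits, without the prefix

# rebuild the list of remainders from the string, to compare with binary.py
remainders = [int(digit) for digit in format(n, 'b')]
print(remainders)
```

The list printed at the end should match what binary.py prints for the same n.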

Just for fun, let's adapt one of the examples (multiple.sequences.py) using the while logic:

# multiple.sequences.while.py
people = ['Conrad', 'Deepak', 'Heinrich', 'Tom']
ages = [29, 30, 34, 36]
position = 0
while position < len(people):
    person = people[position]
    age = ages[position]
    print(person, age)
    position += 1

In the preceding code, notice the initialization, condition, and update of the position variable, which make it possible to simulate the equivalent for loop code by handling the iteration variable manually. Everything that can be done with a for loop can also be done with a while loop, even though you can see there's a bit of boilerplate you have to go through in order to achieve the same result. The opposite is also true, but unless you have a reason to do so, you ought to use the right tool for the job, and 99.9% of the time you'll be fine.

So, to recap, use a for loop when you need to iterate over an iterable, and a while loop when you need to loop according to a condition being satisfied or not. If you keep in mind the difference between the two purposes, you will never choose the wrong looping construct.

Let's now see how to alter the normal flow of a loop.

The break and continue statements

According to the task at hand, sometimes you will need to alter the regular flow of a loop. You can either skip a single iteration (as many times as you want), or you can break out of the loop entirely. A common use case for skipping iterations is, for example, when you're iterating over a list of items and you need to work on each of them only if some condition is verified. On the other hand, if you're iterating over a collection of items, and you have found one of them that satisfies some need you have, you may decide not to continue the loop entirely and therefore break out of it. There are countless possible scenarios, so it's better to see a couple of examples.

Let's say you want to apply a 20% discount to all products in a basket list for those that have an expiration date of today. The way you achieve this is to use the continue statement, which tells the looping construct (for or while) to stop execution of the body immediately and go to the next iteration, if any. This example will take us a little deeper down the rabbit hole, so be ready to jump:

# discount.py
from datetime import date, timedelta

today = date.today()
tomorrow = today + timedelta(days=1)  # today + 1 day is tomorrow
products = [
    {'sku': '1', 'expiration_date': today, 'price': 100.0},
    {'sku': '2', 'expiration_date': tomorrow, 'price': 50},
    {'sku': '3', 'expiration_date': today, 'price': 20},
]

for product in products:
    if product['expiration_date'] != today:
        continue
    product['price'] *= 0.8  # equivalent to applying 20% discount
    print(
        'Price for sku', product['sku'],
        'is now', product['price'])

We start by importing the date and timedelta objects, then we set up our products. Those with sku as 1 and 3 have an expiration date of today, which means we want to apply a 20% discount on them. We loop over each product and we inspect the expiration date. If it is not (inequality operator, !=) today, we don't want to execute the rest of the body suite, so we continue.

Notice that it is not important where in the body suite you place the continue statement (you can even use it more than once). When you reach it, execution stops and goes back to the next iteration. If we run the discount.py module, this is the output:

$ python discount.py
Price for sku 1 is now 80.0
Price for sku 3 is now 16.0

This shows you that the last two lines of the body haven't been executed for sku number 2.

Let's now see an example of breaking out of a loop. Say we want to tell whether at least one of the elements in a list evaluates to True when fed to the bool function. Given that we need to know whether there is at least one, when we find it, we don't need to keep scanning the list any further. In Python code, this translates to using the break statement. Let's write this down into code:

# any.py
items = [0, None, 0.0, True, 0, 7]  # True and 7 evaluate to True
found = False  # this is called "flag"
for item in items:
    print('scanning item', item)
    if item:
        found = True  # we update the flag
        break

if found:  # we inspect the flag
    print('At least one item evaluates to True')
else:
    print('All items evaluate to False')

The preceding code is such a common pattern in programming, you will see it a lot. When you inspect items this way, basically what you do is to set up a flag variable, then start the inspection. If you find one element that matches your criteria (in this example, that evaluates to True), then you update the flag and stop iterating. After iteration, you inspect the flag and take action accordingly. Execution yields:

$ python any.py
scanning item 0
scanning item None
scanning item 0.0
scanning item True
At least one item evaluates to True

See how execution stopped after True was found? The break statement acts exactly like the continue one, in that it stops executing the body of the loop immediately, but also, prevents any other iteration from running, effectively breaking out of the loop. The continue and break statements can be used together with no limitation in their numbers, both in the for and while looping constructs.

By the way, there is no need to write code to detect whether there is at least one element in a sequence that evaluates to True. Just check out the built-in any function.
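For comparison with any.py, here is a minimal sketch of the same check done with the built-in any function. It reuses the items list from the example, and the second list is made up just to show the False case:

```python
items = [0, None, 0.0, True, 0, 7]
found = any(items)  # stops at the first truthy item, like the break in any.py
print(found)

all_falsy = [0, None, 0.0]  # illustrative data: nothing here evaluates to True
print(any(all_falsy))
```

The flag, the loop, and the break are all folded into a single call, which is why any is usually the better choice when you don't need to do anything else inside the loop.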

A special else clause

One of the features I've seen only in the Python language is the ability to have else clauses after while and for loops. It's very rarely used, but it's definitely nice to have. In short, you can have an else suite after a for or while loop. If the loop ends normally, because of exhaustion of the iterator (for loop) or because the condition is finally not met (while loop), then the else suite (if present) is executed. In case execution is interrupted by a break statement, the else clause is not executed. Let's take an example of a for loop that iterates over a group of items, looking for one that would match some condition. In case we don't find at least one that satisfies the condition, we want to raise an exception. This means we want to arrest the regular execution of the program and signal that there was an error, or exception, that we cannot deal with. Exceptions will be the subject of Chapter 8, Testing, Profiling, and Dealing with Exceptions, so don't worry if you don't fully understand them now. Just bear in mind that they will alter the regular flow of the code.

Let me now show you two examples that do exactly the same thing, but one of them is using the special for...else syntax. Say that we want to find, among a collection of people, one that could drive a car:

# for.no.else.py
class DriverException(Exception):
    pass

people = [('James', 17), ('Kirk', 9), ('Lars', 13), ('Robert', 8)]
driver = None
for person, age in people:
    if age >= 18:
        driver = (person, age)
        break

if driver is None:
    raise DriverException('Driver not found.')

Notice the flag pattern again. We set the driver to be None, then if we find one, we update the driver flag, and then, at the end of the loop, we inspect it to see whether one was found. I kind of have the feeling that those kids would drive a very metallic car, but anyway, notice that if a driver is not found, DriverException is raised, signaling to the program that execution cannot continue (we're lacking the driver).

The same functionality can be rewritten a bit more elegantly using the following code:

# for.else.py
class DriverException(Exception):
    pass

people = [('James', 17), ('Kirk', 9), ('Lars', 13), ('Robert', 8)]
for person, age in people:
    if age >= 18:
        driver = (person, age)
        break
else:
    raise DriverException('Driver not found.')

Notice that we aren't forced to use the flag pattern any more. The exception is raised as part of the for loop logic, which makes good sense because the for loop is checking on some condition. All we need is to set up a driver object in case we find one, because the rest of the code is going to use that information somewhere. Notice the code is shorter and more elegant, because the logic is now correctly grouped together where it belongs.

In the Transforming Code into Beautiful, Idiomatic Python video, Raymond Hettinger suggests a much better name for the else statement associated with a for loop: nobreak. If you struggle remembering how the else works for a for loop, simply remembering this fact should help you.
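The else clause works the same way after a while loop. Here is a small sketch of that, not taken from the examples above: the names n, divisor, and result are made up for illustration, and the loop simply looks for a divisor of n.

```python
n = 13
divisor = 2
while divisor < n:
    if n % divisor == 0:
        result = f'{n} is divisible by {divisor}'
        break
    divisor += 1
else:  # "nobreak": runs only when the condition became False without a break
    result = f'{n} has no divisor in [2, {n})'

print(result)
```

Try changing n to 15: the break is hit, so the else suite is skipped.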

Putting all this together

Now that you have seen all there is to see about conditionals and loops, it's time to spice things up a little, and look at those two examples I anticipated at the beginning of this chapter. We'll mix and match here, so you can see how you can use all these concepts together. Let's start by writing some code to generate a list of prime numbers up to some limit. Please bear in mind that I'm going to write a very inefficient and rudimentary algorithm to detect primes. The important thing for you is to concentrate on those bits in the code that belong to this chapter's subject.

A prime generator

According to Wikipedia:

A prime number (or a prime) is a natural number greater than 1 that has no positive divisors other than 1 and itself. A natural number greater than 1 that is not a prime number is called a composite number.

Based on this definition, if we consider the first 10 natural numbers, we can see that 2, 3, 5, and 7 are primes, while 1, 4, 6, 8, 9, and 10 are not. In order to have a computer tell you whether a number, N, is prime, you can divide that number by all natural numbers in the range [2, N). If any of those divisions yields zero as a remainder, then the number is not a prime. Enough chatter, let's get down to business. I'll write two versions of this, the second of which will exploit the for...else syntax:

# primes.py
primes = []  # this will contain the primes in the end
upto = 100  # the limit, inclusive
for n in range(2, upto + 1):
    is_prime = True  # flag, new at each iteration of outer for
    for divisor in range(2, n):
        if n % divisor == 0:
            is_prime = False
            break
    if is_prime:  # check on flag
        primes.append(n)

print(primes)

There are a lot of things to notice in the preceding code. First of all, we set up an empty primes list, which will contain the primes at the end. The limit is 100, and you can see it's inclusive in the way we call range() in the outer loop. If we wrote range(2, upto) that would be [2, upto), right? Therefore range(2, upto + 1) gives us [2, upto + 1) == [2, upto].

So, there are two for loops. In the outer one, we loop over the candidate primes, that is, all natural numbers from 2 to upto. Inside each iteration of this outer loop, we set up a flag (which is set to True at each iteration), and then start dividing the current n by all numbers from 2 to n - 1. If we find a proper divisor for n, it means n is composite, and therefore we set the flag to False and break the loop. Notice that when we break the inner one, the outer one keeps on going normally. The reason why we break after having found a proper divisor for n is that we don't need any further information to be able to tell that n is not a prime.

When we check on the is_prime flag, if it is still True, it means we couldn't find any number in [2, n) that is a proper divisor for n, therefore n is a prime. We append n to the primes list, and hop! Another iteration proceeds, until n equals 100.

Running this code yields:

$ python primes.py
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]

Before we proceed, one question: of all the iterations of the outer loop, one of them is different from all the others. Could you tell which one, and why? Think about it for a second, go back to the code, try to figure it out for yourself, and then keep reading on.

Did you figure it out? If not, don't feel bad, it's perfectly normal. I asked you to do it as a small exercise because it's what coders do all the time. The skill to understand what the code does by simply looking at it is something you build over time. It's very important, so try to exercise it whenever you can. I'll tell you the answer now: the iteration that behaves differently from all others is the first one. The reason is that in the first iteration, n is 2. Therefore the innermost for loop won't even run, because it's a for loop that iterates over range(2, 2), and what is that if not [2, 2)? Try it out for yourself, write a simple for loop with that iterable, put a print in the body suite, and see whether anything happens (it won't...).

Now, from an algorithmic point of view, this code is inefficient, so let's at least make it more beautiful:

# primes.else.py
primes = []
upto = 100
for n in range(2, upto + 1):
    for divisor in range(2, n):
        if n % divisor == 0:
            break
    else:
        primes.append(n)

print(primes)

Much nicer, right? The is_prime flag is gone, and we append n to the primes list when we know the inner for loop hasn't encountered any break statements. See how the code looks cleaner and reads better?

Applying discounts

In this example, I want to show you a technique I like a lot. In many programming languages, other than the if/elif/else constructs, in whatever form or syntax they may come, you can find another statement, usually called switch/case, that in Python is missing. It is the equivalent of a cascade of if/elif/.../elif/else clauses, with a syntax similar to this (warning! JavaScript code!):

/* switch.js */
switch (day_number) {
    case 1:
    case 2:
    case 3:
    case 4:
    case 5:
        day = "Weekday";
        break;
    case 6:
        day = "Saturday";
        break;
    case 0:
        day = "Sunday";
        break;
    default:
        day = "";
        alert(day_number + ' is not a valid day number.')
}

In the preceding code, we switch on a variable called day_number. This means we get its value and then we decide what case it fits in (if any). From 1 to 5 there is a cascade, which means no matter the number, [1, 5] all go down to the bit of logic that sets day as "Weekday". Then we have single cases for 0 and 6, and a default case to prevent errors, which alerts the system that day_number is not a valid day number, that is, not in [0, 6]. Python is perfectly capable of realizing such logic using if/elif/else statements:

# switch.py
if 1 <= day_number <= 5:
    day = 'Weekday'
elif day_number == 6:
    day = 'Saturday'
elif day_number == 0:
    day = 'Sunday'
else:
    day = ''
    raise ValueError(
        str(day_number) + ' is not a valid day number.')

In the preceding code, we reproduce the same logic of the JavaScript snippet in Python, using if/elif/else statements. I raised the ValueError exception just as an example at the end, if day_number is not in [0, 6]. This is one possible way of translating the switch/case logic, but there is also another one, sometimes called dispatching, which I will show you in the last version of the next example.

By the way, did you notice the first line of the previous snippet? Have you noticed that Python can make double (actually, even multiple) comparisons? It's just wonderful!
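To make the chained comparison a bit more concrete, here is a short sketch showing that it behaves like two comparisons joined by and. The value of day_number and the names chained and explicit are just assumptions for the demo:

```python
day_number = 3  # illustrative value
chained = 1 <= day_number <= 5
explicit = (1 <= day_number) and (day_number <= 5)
print(chained, explicit)

# chains can be longer than two comparisons
longer = 0 < 1 < 2 < 3
print(longer)
```

Both forms give the same answer; the chained one just reads closer to the way you would write the condition in maths.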

Let's start the new example by simply writing some code that assigns a discount to customers based on their coupon value. I'll keep the logic down to a minimum here, remember that all we really care about is understanding conditionals and loops:

# coupons.py
customers = [
    dict(id=1, total=200, coupon_code='F20'),  # F20: fixed, £20
    dict(id=2, total=150, coupon_code='P30'),  # P30: percent, 30%
    dict(id=3, total=100, coupon_code='P50'),  # P50: percent, 50%
    dict(id=4, total=110, coupon_code='F15'),  # F15: fixed, £15
]

for customer in customers:
    code = customer['coupon_code']
    if code == 'F20':
        customer['discount'] = 20.0
    elif code == 'F15':
        customer['discount'] = 15.0
    elif code == 'P30':
        customer['discount'] = customer['total'] * 0.3
    elif code == 'P50':
        customer['discount'] = customer['total'] * 0.5
    else:
        customer['discount'] = 0.0

for customer in customers:
    print(customer['id'], customer['total'], customer['discount'])

We start by setting up some customers. They have an order total, a coupon code, and an ID. I made up four different types of coupons, two are fixed and two are percentage-based. You can see that in the if/elif/else cascade I apply the discount accordingly, and I set it as a 'discount' key in the customer dictionary.

At the end, I just print out part of the data to see whether my code is working properly:

$ python coupons.py
1 200 20.0
2 150 45.0
3 100 50.0
4 110 15.0

This code is simple to understand, but all those clauses are kind of cluttering the logic. It's not easy to see what's going on at a first glance, and I don't like it. In cases like this, you can exploit a dictionary to your advantage, like this:

# coupons.dict.py
customers = [
    dict(id=1, total=200, coupon_code='F20'),  # F20: fixed, £20
    dict(id=2, total=150, coupon_code='P30'),  # P30: percent, 30%
    dict(id=3, total=100, coupon_code='P50'),  # P50: percent, 50%
    dict(id=4, total=110, coupon_code='F15'),  # F15: fixed, £15
]

discounts = {
    'F20': (0.0, 20.0),  # each value is (percent, fixed)
    'P30': (0.3, 0.0),
    'P50': (0.5, 0.0),
    'F15': (0.0, 15.0),
}

for customer in customers:
    code = customer['coupon_code']
    percent, fixed = discounts.get(code, (0.0, 0.0))
    customer['discount'] = percent * customer['total'] + fixed

for customer in customers:
    print(customer['id'], customer['total'], customer['discount'])

Running the preceding code yields exactly the same result we had from the snippet before it. We spared two lines, but more importantly, we gained a lot in readability, as the body of the for loop now is just three lines long, and very easy to understand. The concept here is to use a dictionary as a dispatcher. In other words, we try to fetch something from the dictionary based on a code (our coupon_code), and by using dict.get(key, default), we make sure we also cater for when the code is not in the dictionary and we need a default value.

Notice that I had to apply some very simple linear algebra in order to calculate the discount properly. Each discount has a percentage and fixed part in the dictionary, represented by a two-tuple. By applying percent * total + fixed, we get the correct discount. When percent is 0, the formula just gives the fixed amount, and it gives percent * total when fixed is 0.

This technique is important because it is also used in other contexts, with functions, where it actually becomes much more powerful than what we've seen in the preceding snippet. Another advantage of using it is that you can code it in such a way that the keys and values of the discounts dictionary are fetched dynamically (for example, from a database). This will allow the code to adapt to whatever discounts and conditions you have, without having to modify anything.
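To give a rough idea of what dispatching with functions could look like, here is a sketch. It is not the version from this book: the function names (fixed_20, fixed_15, percent_30, percent_50, no_discount) and the get_discount helper are hypothetical, and functions themselves are only covered properly in the next chapter.

```python
# each hypothetical handler takes the order total and returns the discount
def fixed_20(total):
    return 20.0

def fixed_15(total):
    return 15.0

def percent_30(total):
    return total * 0.3

def percent_50(total):
    return total * 0.5

def no_discount(total):
    return 0.0

# the dictionary maps a coupon code to the function that handles it
discounts = {
    'F20': fixed_20,
    'F15': fixed_15,
    'P30': percent_30,
    'P50': percent_50,
}

def get_discount(code, total):
    handler = discounts.get(code, no_discount)  # default when code is unknown
    return handler(total)

print(get_discount('P30', 150))
print(get_discount('XYZ', 100))
```

The values in the dictionary are now pieces of behavior rather than data, so each coupon type can apply whatever logic it needs without touching the loop that uses it.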

If it's not completely clear to you how it works, I suggest you take your time and experiment with it. Change values and add print statements to see what's going on while the program is running.

A quick peek at the itertools module

A chapter about iterables, iterators, conditional logic, and looping wouldn't be complete without a few words about the itertools module. If you are into iterating, this is a kind of heaven.

The Python official documentation (https://docs.python.org/2/library/itertools.html) describes the itertools module as follows:

This module implements a number of iterator building blocks inspired by constructs from APL, Haskell, and SML. Each has been recast in a form suitable for Python. The module standardizes a core set of fast, memory efficient tools that are useful by themselves or in combination. Together, they form an “iterator algebra” making it possible to construct specialized tools succinctly and efficiently in pure Python.

By no means do I have the room here to show you all the goodies you can find in this module, so I encourage you to go check it out for yourself, I promise you'll enjoy it. In a nutshell, it provides you with three broad categories of iterators. I will give you a very small example of one iterator taken from each one of them, just to make your mouth water a little.

Infinite iterators

Infinite iterators allow you to work with a for loop in a different fashion, such as if it were a while loop:

# infinite.py
from itertools import count

for n in count(5, 3):
    if n > 20:
        break
    print(n, end=', ')  # instead of newline, comma and space

Running the code gives this:

$ python infinite.py
5, 8, 11, 14, 17, 20,

The count factory class makes an iterator that just goes on and on counting. It starts from 5 and keeps adding 3 to it. We need to break it manually if we don't want to get stuck in an infinite loop.
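Breaking manually is not the only way to stop count. As a hedged sketch, two other tools from the same module, islice and takewhile, can put a limit on an infinite iterator. The variable names are made up, and the lambda is just a tiny anonymous function (those are covered in the next chapter):

```python
from itertools import count, islice, takewhile

# take exactly six values from the infinite iterator
first_six = list(islice(count(5, 3), 6))

# keep taking values for as long as the condition holds
up_to_20 = list(takewhile(lambda n: n <= 20, count(5, 3)))

print(first_six)
print(up_to_20)
```

Both lists contain the same numbers that infinite.py prints.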

Iterators terminating on the shortest input sequence

This category is very interesting. It allows you to create an iterator based on multiple iterators, combining their values according to some logic. The key point here is that among those iterators, in case any of them are shorter than the rest, the resulting iterator won't break, it will simply stop as soon as the shortest iterator is exhausted. This is very theoretical, I know, so let me give you an example using compress. This iterator gives you back the data according to a corresponding item in a selector being True or False:

compress('ABC', (1, 0, 1)) would give back 'A' and 'C', because they correspond to 1. Let's see a simple example:

# compress.py
from itertools import compress

data = range(10)
even_selector = [1, 0] * 10
odd_selector = [0, 1] * 10

even_numbers = list(compress(data, even_selector))
odd_numbers = list(compress(data, odd_selector))

print(odd_selector)
print(list(data))
print(even_numbers)
print(odd_numbers)

Notice that odd_selector and even_selector are 20 elements long, while data is just 10 elements long. compress will stop as soon as data has yielded its last element. Running this code produces the following:

$ python compress.py
[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 2, 4, 6, 8]
[1, 3, 5, 7, 9]

It's a very fast and nice way of selecting elements out of an iterable. The code is very simple, just notice that instead of using a for loop to iterate over each value that is given back by the compress calls, we used list(), which does the same, but instead of executing a body of instructions, puts all the values into a list and returns it.

Combinatoric generators

Last but not least, combinatoric generators. These are really fun, if you are into this kind of thing. Let's just see a simple example on permutations.

According to Wolfram Mathworld:

A permutation, also called an "arrangement number" or "order", is a rearrangement of the elements of an ordered list S into a one-to-one correspondence with S itself.

For example, there are six permutations of ABC: ABC, ACB, BAC, BCA, CAB, and CBA.

If a set has N elements, then the number of permutations of them is N! (N factorial). For the ABC string, the permutations are 3! = 3 * 2 * 1 = 6. Let's do it in Python:

# permutations.py
from itertools import permutations

print(list(permutations('ABC')))

This very short snippet of code produces the following result:

$ python permutations.py
[('A', 'B', 'C'), ('A', 'C', 'B'), ('B', 'A', 'C'), ('B', 'C', 'A'), ('C', 'A', 'B'), ('C', 'B', 'A')]

Be very careful when you play with permutations. Their number grows at a rate that is proportional to the factorial of the number of the elements you're permuting, and that number can get really big, really fast.
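To get a feeling for how fast that is, here is a small sketch that prints N! for the first few values of N using math.factorial, and checks the formula against an actual count for a four-letter string. The names sizes and four_letter_count are only illustrative:

```python
from itertools import permutations
from math import factorial

# number of permutations for 1 to 10 elements
sizes = {n: factorial(n) for n in range(1, 11)}
for n, size in sizes.items():
    print(n, size)

# counting the permutations of a short string confirms the formula
four_letter_count = len(list(permutations('ABCD')))
print(four_letter_count == factorial(4))
```

Ten elements already give millions of permutations, so think twice before wrapping permutations of a long sequence in list().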

Summary

In this chapter, we've taken another step toward expanding our coding vocabulary. We've seen how to drive the execution of the code by evaluating conditions, and we've seen how to loop and iterate over sequences and collections of objects. This gives us the power to control what happens when our code is run, which means we are getting an idea of how to shape it so that it does what we want and it reacts to data that changes dynamically.

We've also seen how to combine everything together in a couple of simple examples, and in the end, we took a brief look at the itertools module, which is full of interesting iterators that can enrich our abilities with Python even more.

Now it's time to switch gears, take another step forward, and talk about functions. The next chapter is all about them because they are extremely important. Make sure you're comfortable with what has been covered up to now. I want to provide you with interesting examples, so I'll have to go a little faster. Ready? Turn the page.

Functions, the Building Blocks of Code

"To create architecture is to put in order. Put what in order? Functions and objects."

– Le Corbusier

In the previous chapters, we have seen that everything is an object in Python, and functions are no exception. But, what exactly is a function? A function is a sequence of instructions that performs a task, bundled as a unit. This unit can then be imported and used wherever it's needed. There are many advantages to using functions in your code, as we'll see shortly.

In this chapter, we are going to cover the following:

- Functions: what they are and why we should use them
- Scopes and name resolution
- Function signatures: input parameters and return values
- Recursive and anonymous functions
- Importing objects for code reuse

I believe the saying, a picture is worth one thousand words, is particularly true when explaining functions to someone who is new to this concept, so please take a look at the following diagram:

As you can see, a function is a block of instructions, packaged as a whole, like a box. Functions can accept input arguments and produce output values. Both of these are optional, as we'll see in the examples in this chapter.

A function in Python is defined by using the def keyword, after which the name of the function follows, terminated by a pair of parentheses (which may or may not contain input parameters), and a colon (:) signals the end of the function definition line. Immediately afterwards, indented by four spaces, we find the body of the function, which is the set of instructions that the function will execute when called.

Note that the indentation by four spaces is not mandatory, but it is the amount of spaces suggested by PEP 8, and, in practice, it is the most widely used spacing measure.

A function may or may not return an output. If a function wants to return an output, it does so by using the return keyword, followed by the desired output. If you have an eagle eye, you may have noticed the little * after Optional in the output section of the preceding diagram. This is because a function always returns something in Python, even if you don't explicitly use the return clause. If the function has no return statement in its body, or no value is given to the return statement itself, the function returns None. The reasons behind this design choice are outside the scope of an introductory chapter, so all you need to know is that this behavior will make your life easier. As always, thank you, Python.
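Here is a minimal sketch of the three cases just described. The function names are made up for the demo:

```python
def no_return():
    pass  # no return statement at all

def bare_return():
    return  # return with no value

def explicit_return():
    return 42  # return followed by the desired output

print(no_return())
print(bare_return())
print(explicit_return())
```

The first two calls both give back None, while the third gives back the value that follows the return keyword.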

Why use functions?

Functions are among the most important concepts and constructs of any language, so let me give you a few reasons why we need them:

- They reduce code duplication in a program. By having a specific task taken care of by a nice block of packaged code that we can import and call whenever we want, we don't need to duplicate its implementation.
- They help in splitting a complex task or procedure into smaller blocks, each of which becomes a function.
- They hide the implementation details from their users.
- They improve traceability.
- They improve readability.

Let's look at a few examples to get a better understanding of each point.

Reducing code duplication

Imagine that you are writing a piece of scientific software, and you need to calculate primes up to a limit, as we did in the previous chapter. You have a nice algorithm to calculate them, so you copy and paste it to wherever you need. One day, though, your friend, B. Riemann, gives you a better algorithm to calculate primes, which will save you a lot of time. At this point, you need to go over your whole code base and replace the old code with the new one.

This is actually a bad way to go about it. It's error-prone, you never know what lines you are chopping out or leaving in by mistake, when you cut and paste code into other code, and you may also risk missing one of the places where prime calculation is done, leaving your software in an inconsistent state where the same action is performed in different places in different ways. What if, instead of replacing code with a better version of it, you need to fix a bug, and you miss one of the places? That would be even worse.

So, what should you do? Simple! You write a function, get_prime_numbers(upto), and use it anywhere you need a list of primes. When B. Riemann comes to you and gives you the new code, all you have to do is replace the body of that function with the new implementation, and you're done! The rest of the software will automatically adapt, since it's just calling the function.

Your code will be shorter, it will not suffer from inconsistencies between old and new ways of performing a task, or undetected bugs due to copy-and-paste failures or oversights. Use functions, and you'll only gain from it, I promise.
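As a sketch of what get_prime_numbers(upto) might look like, here is the naive algorithm from primes.else.py in the previous chapter wrapped in a function. This is only one possible body, assuming upto is inclusive as it was there:

```python
def get_prime_numbers(upto):
    """Return the list of primes in [2, upto], using the naive algorithm."""
    primes = []
    for n in range(2, upto + 1):
        for divisor in range(2, n):
            if n % divisor == 0:
                break
        else:  # no break: no proper divisor was found
            primes.append(n)
    return primes

print(get_prime_numbers(20))
```

When a better algorithm comes along, only the body between def and return needs to change. Every call to get_prime_numbers stays exactly as it is.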

Splitting a complex task

Functions are also very useful for splitting long or complex tasks into smaller ones. The end result is that the code benefits from it in several ways, for example, readability, testability, and reuse. To give you a simple example, imagine that you're preparing a report. Your code needs to fetch data from a data source, parse it, filter it, polish it, and then a whole series of algorithms needs to be run against it, in order to produce the results that will feed the Report class. It's not uncommon to read procedures like this that are just one big do_report(data_source) function. There are tens or hundreds of lines of code that end with return report.

These situations are slightly more common in scientific code, which tends to be brilliant from an algorithmic point of view, but sometimes lacks the touch of experienced programmers when it comes to the style in which it is written. Now, picture a few hundred lines of code. It's very hard to follow through, to find the places where things are changing context (such as finishing one task and starting the next one). Do you have the picture in your mind? Good. Don't do it! Instead, look at this code:

# data.science.example.py
def do_report(data_source):
    # fetch and prepare data
    data = fetch_data(data_source)
    parsed_data = parse_data(data)
    filtered_data = filter_data(parsed_data)
    polished_data = polish_data(filtered_data)

    # run algorithms on data
    final_data = analyse(polished_data)

    # create and return report
    report = Report(final_data)
    return report

The previous example is fictitious, of course, but can you see how easy it would be to go through the code? If the end result looks wrong, it would be very easy to debug each of the single data outputs in the do_report function. Moreover, it's even easier to exclude part of the process temporarily from the whole procedure (you just need to comment out the parts you need to suspend). Code like this is easier to deal with.

Hiding implementation details

Let's stay with the preceding example to talk about this point as well. You can see that, by going through the code of the do_report function, you can get a pretty good understanding without reading one single line of implementation. This is because functions hide the implementation details. This feature means that, if you don't need to delve into the details, you are not forced to, in the way you would if do_report was just one big, fat function. In order to understand what was going on, you would have to read every single line of code. With functions, you don't need to. This reduces the time you spend reading the code and since, in a professional environment, reading code takes much more time than actually writing it, it's very important to reduce it by as much as we can.

Improving readability

Coders sometimes don't see the point in writing a function with a body of one or two lines of code, so let's look at an example that shows you why you should do it.

Imagine that you need to multiply two matrices:

Would you prefer to have to read this code:

# matrix.multiplication.nofunc.py
a = [[1, 2], [3, 4]]
b = [[5, 1], [2, 1]]
c = [[sum(i * j for i, j in zip(r, c)) for c in zip(*b)]
     for r in a]

Or would you prefer this one:

# matrix.multiplication.func.py
# this function could also be defined in another module
def matrix_mul(a, b):
    return [[sum(i * j for i, j in zip(r, c)) for c in zip(*b)]
            for r in a]

a = [[1, 2], [3, 4]]
b = [[5, 1], [2, 1]]
c = matrix_mul(a, b)

It's much easier to understand that c is the result of the multiplication between a and b in the second example. It's much easier to read through the code and, if you don't need to modify that multiplication logic, you don't even need to go into the implementation details. Therefore, readability is improved here while, in the first snippet, you would have to spend time trying to understand what that complicated list comprehension is doing.

Don't worry if you don't understand list comprehensions, we'll study them in Chapter 5, Saving Time and Memory.

Improving traceability

Imagine that you have written an e-commerce website. You have displayed the product prices all over the pages. Imagine that the prices in your database are stored with no VAT (sales tax), but you want to display them on the website with VAT at 20%. Here are a few ways of calculating the VAT-inclusive price from the VAT-exclusive price:

# vat.py
price = 100  # GBP, no VAT
final_price1 = price * 1.2
final_price2 = price + price / 5.0
final_price3 = price * (100 + 20) / 100.0
final_price4 = price + price * 0.2

All these four different ways of calculating a VAT-inclusive price are perfectly acceptable, and I promise you I have found them all in my colleagues' code, over the years. Now, imagine that you have started selling your products in different countries and some of them have different VAT rates, so you need to refactor your code (throughout the website) in order to make that VAT calculation dynamic.

How do you trace all the places in which you are performing a VAT calculation? Coding today is a collaborative task and you cannot be sure that the VAT has been calculated using only one of those forms. It's going to be hell, believe me.

So, let's write a function that takes the input values, vat and price (VAT-exclusive), and returns a VAT-inclusive price:

# vat.function.py
def calculate_price_with_vat(price, vat):
    return price * (100 + vat) / 100

Now you can import that function and use it in any place in your website where you need to calculate a VAT-inclusive price, and when you need to trace those calls, you can search for calculate_price_with_vat.

Note that, in the preceding example, price is assumed to be VAT-exclusive, and vat is a percentage value (for example, 19, 20, or 23).
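A short usage sketch of that function follows. The prices and rates passed in are made-up sample values, and the function is repeated only so that the snippet runs on its own:

```python
def calculate_price_with_vat(price, vat):
    return price * (100 + vat) / 100

# sample calls with illustrative prices and VAT percentages
uk_price = calculate_price_with_vat(100, 20)
other_price = calculate_price_with_vat(50, 23)

print(uk_price)
print(other_price)
```

Every VAT calculation now goes through one name, so a search for calculate_price_with_vat finds all of them.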

Scopes and name resolution

Do you remember when we talked about scopes and namespaces in Chapter 1, A Gentle Introduction to Python? We're going to expand on that concept now. Finally, we can talk about functions and this will make everything easier to understand. Let's start with a very simple example:

# scoping.level.1.py
def my_function():
    test = 1  # this is defined in the local scope of the function
    print('my_function:', test)

test = 0  # this is defined in the global scope
my_function()
print('global:', test)

I have defined the test name in two different places in the previous example. It is actually in two different scopes. One is the global scope (test = 0), and the other is the local scope of the my_function function (test = 1). If you execute the code, you'll see this:

$ python scoping.level.1.py
my_function: 1
global: 0

It's clear that test = 1 shadows the test = 0 assignment in my_function. In the global context, test is still 0, as you can see from the output of the program, but we define the test name again in the function body, and we set it to point to an integer of value 1. Both test names therefore exist, one in the global scope, pointing to an int object with a value of 0, the other in the my_function scope, pointing to an int object with a value of 1. Let's comment out the line with test = 1. Python searches for the test name in the next enclosing namespace (recall the LEGB rule: local, enclosing, global, built-in described in Chapter 1, A Gentle Introduction to Python) and, in this case, we will see the value 0 printed twice. Try it in your code.
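For reference, this is roughly what the experiment looks like with the test = 1 line commented out. It is the same code as scoping.level.1.py, with only that one change:

```python
def my_function():
    # test = 1  # commented out: there is no local name any more
    print('my_function:', test)  # Python finds test in the global scope

test = 0  # this is defined in the global scope
my_function()
print('global:', test)
```

Since the local scope no longer defines test, the lookup inside my_function falls through to the global scope and both lines show the same value.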

Now,let'sraisethestakeshereandlevelup:

#scoping.level.2.py

defouter():

test=1#outerscope

definner():

test=2#innerscope

print('inner:',test)

inner()

print('outer:',test)

test=0#globalscope

outer()

print('global:',test)

In the preceding code, we have two levels of shadowing. One level is in the function outer, and the other one is in the function inner. It is far from rocket science, but it can be tricky. If we run the code, we get:

$ python scoping.level.2.py
inner: 2
outer: 1
global: 0

Try commenting out the test = 1 line. Can you figure out what the result will be? Well, when reaching the print('outer:', test) line, Python will have to look for test in the next enclosing scope, therefore it will find and print 0, instead of 1. Make sure you comment out test = 2 as well, to see whether you understand what happens, and whether the LEGB rule is clear, before proceeding.

Another thing to note is that Python gives you the ability to define a function in another function. The inner function's name is defined within the namespace of the outer function, exactly as would happen with any other name.
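As a minimal sketch of the LEGB lookup described in this section, the following code defines no local or enclosing test name, so the lookup falls through to the global scope, and then all the way to the built-ins for len. The function names here are hypothetical and are not part of the book's source code:

```python
# legb_sketch.py (hypothetical file name)
test = 0  # global scope

def outer():
    def inner():
        # no local or enclosing 'test' exists, so the global one is found
        return test
    return inner()

def uses_builtin():
    # 'len' is not local, enclosing, or global here: it is found in built-ins
    return len('abc')

result_global = outer()
result_builtin = uses_builtin()
print(result_global, result_builtin)  # 0 3
```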

The global and nonlocal statements

Going back to the preceding example, we can alter what happens to the shadowing of the test name by using one of these two special statements: global and nonlocal. As you can see from the previous example, when we define test = 2 in the inner function, we overwrite test neither in the outer function nor in the global scope. We can get read access to those names if we use them in a nested scope that doesn't define them, but we cannot modify them because, when we write an assignment instruction, we're actually defining a new name in the current scope.

How do we change this behavior? Well, we can use the nonlocal statement. According to the official documentation:

"The nonlocal statement causes the listed identifiers to refer to previously bound variables in the nearest enclosing scope excluding globals."

Let's introduce it in the inner function, and see what happens:

# scoping.level.2.nonlocal.py
def outer():
    test = 1  # outer scope

    def inner():
        nonlocal test
        test = 2  # nearest enclosing scope (which is 'outer')
        print('inner:', test)

    inner()
    print('outer:', test)

test = 0  # global scope
outer()
print('global:', test)

Notice how in the body of the inner function, I have declared the test name to be nonlocal. Running this code produces the following result:

$ python scoping.level.2.nonlocal.py
inner: 2
outer: 2
global: 0

Wow, look at that result! It means that, by declaring test to be nonlocal in the inner function, we actually get to bind the test name to the one declared in the outer function. If we removed the nonlocal test line from the inner function and tried the same trick in the outer function, we would get a SyntaxError, because the nonlocal statement works on enclosing scopes excluding the global one.

Is there a way to get to that test = 0 in the global namespace then? Of course, we just need to use the global statement:

# scoping.level.2.global.py
def outer():
    test = 1  # outer scope

    def inner():
        global test
        test = 2  # global scope
        print('inner:', test)

    inner()
    print('outer:', test)

test = 0  # global scope
outer()
print('global:', test)

Note that we have now declared the test name to be global, which will basically bind it to the one we defined in the global namespace (test = 0). Run the code and you should get the following:

$ python scoping.level.2.global.py
inner: 2
outer: 1
global: 2

This shows that the name affected by the test = 2 assignment is now the global one. This trick would also work in the outer function because, in this case, we're referring to the global scope. Try it for yourself and see what changes. Get comfortable with scopes and name resolution; it's very important. Also, could you tell what happens if you defined inner outside outer in the preceding examples?
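One place where nonlocal is commonly put to work is a function that needs to remember state between calls. The following is only a sketch under that assumption; make_counter and increment are hypothetical names that do not appear in the book's source code:

```python
def make_counter():
    count = 0  # lives in the local scope of make_counter

    def increment():
        nonlocal count  # rebind the enclosing name, don't create a local one
        count += 1
        return count

    return increment

counter = make_counter()
first = counter()
second = counter()
print(first, second)  # 1 2
```

Without the nonlocal count line, count += 1 would be treated as an assignment to a new local name, and the call would fail with an UnboundLocalError.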

Input parameters

At the beginning of this chapter, we saw that a function can take input parameters. Before we delve into all possible types of parameters, let's make sure you have a clear understanding of what passing a parameter to a function means. There are three key points to keep in mind:

1. Argument-passing is nothing more than assigning an object to a local variable name
2. Assigning an object to an argument name inside a function doesn't affect the caller
3. Changing a mutable object argument in a function affects the caller

Let's look at an example for each of these points.

Argument-passing

Take a look at the following code. We declare a name, x, in the global scope, then we declare a function, func(y), and finally we call it, passing x:

# key.points.argument.passing.py
x = 3
def func(y):
    print(y)
func(x)  # prints: 3

When func is called with x, within its local scope, a name, y, is created, and it's pointed to the same object x is pointing to. This is better clarified by the following figure (don't worry about Python 3.3, this is a feature that hasn't changed):

The right part of the preceding figure depicts the state of the program when execution has reached the end, after func has returned (None). Take a look at the Frames column, and note that we have two names, x and func, in the global namespace (Global frame), pointing to an int (with a value of 3) and to a function object, respectively. Right beneath it, in the rectangle titled func, we can see the function's local namespace, in which only one name has been defined: y. Because we have called func with x (line 5 in the left part of the figure), y is pointing to the same object that x is pointing to. This is what happens under the hood when an argument is passed to a function. If we had used the name x instead of y in the function definition, things would have been exactly the same (only maybe a bit confusing at first): there would be a local x in the function, and a global x outside, as we saw in the Scopes and name resolution section previously in this chapter.

So, in a nutshell, what really happens is that the function creates, in its local scope, the names defined as arguments and, when we call it, we basically tell Python which objects those names must be pointed toward.

Assignment to argument names doesn't affect the caller

This is something that can be tricky to understand at first, so let's look at an example:

# key.points.assignment.py
x = 3
def func(x):
    x = 7  # defining a local x, not changing the global one
func(x)
print(x)  # prints: 3

In the preceding code, when the x = 7 line is executed, within the local scope of the func function, the name, x, is pointed to an integer with a value of 7, leaving the global x unaltered.

Changing a mutable affects the caller

This is the final point, and it's very important because Python apparently behaves differently with mutables (just apparently, though). Let's look at an example:

# key.points.mutable.py
x = [1, 2, 3]
def func(x):
    x[1] = 42  # this affects the caller!
func(x)
print(x)  # prints: [1, 42, 3]

Wow, we actually changed the original object! If you think about it, there is nothing weird in this behavior. The x name in the function is set to point to the caller object by the function call and, within the body of the function, we're not changing x, in that we're not changing its reference or, in other words, we are not changing which object x is pointing to. We're accessing that object's element at position 1, and changing its value.

Remember point #2 under the Input parameters section: Assigning an object to an argument name within a function doesn't affect the caller. If that is clear to you, the following code should not be surprising:

# key.points.mutable.assignment.py
x = [1, 2, 3]
def func(x):
    x[1] = 42  # this changes the caller!
    x = 'something else'  # this points x to a new string object
func(x)
print(x)  # still prints: [1, 42, 3]

Take a look at the two lines in the body of func. At first, like before, we just access the caller object again, at position 1, and change its value to number 42. Then, we reassign x to point to the 'something else' string. This leaves the caller unaltered and, in fact, the output is the same as that of the previous snippet.

Take your time to play around with this concept, and experiment with prints and calls to the id function until everything is clear in your mind. This is one of the key aspects of Python and it must be very clear, otherwise you risk introducing subtle bugs into your code. Once again, the Python Tutor website (http://www.pythontutor.com/) will help you a lot by giving you a visual representation of these concepts.
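One possible way to run the experiment with id suggested above is sketched here. The parameter is named y instead of x purely so that the local and global names are easy to tell apart; the variable names are illustrative assumptions:

```python
x = [1, 2, 3]

def func(y):
    same_before = id(y) == id(x)           # y points to the caller's object
    y[1] = 42                              # in-place change, same object
    same_after_mutation = id(y) == id(x)
    y = 'something else'                   # rebinding: y is a new object now
    same_after_rebinding = id(y) == id(x)
    return same_before, same_after_mutation, same_after_rebinding

flags = func(x)
print(flags)  # (True, True, False)
print(x)      # [1, 42, 3]
```

Mutating the list never changes its identity, which is why the caller sees the change, while rebinding the local name has no effect outside the function.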

Now that we have a good understanding of input parameters and how they behave, let's see how we can specify them.

How to specify input parameters

There are five different ways of specifying input parameters:

- Positional arguments
- Keyword arguments
- Variable positional arguments
- Variable keyword arguments
- Keyword-only arguments

Let's look at them one by one.

Positional arguments

Positional arguments are read from left to right and they are the most common type of arguments:

# arguments.positional.py
def func(a, b, c):
    print(a, b, c)
func(1, 2, 3)  # prints: 1 2 3

There is not much else to say. They can be as numerous as you want and they are assigned by position. In the function call, 1 comes first, 2 comes second, and 3 comes third, therefore they are assigned to a, b, and c, respectively.

Keyword arguments and default values

Keyword arguments are assigned by keyword using the name=value syntax:

# arguments.keyword.py
def func(a, b, c):
    print(a, b, c)
func(a=1, c=2, b=3)  # prints: 1 3 2

Keyword arguments are matched by name, even when they don't respect the definition's original position (we'll see that there is a limitation to this behavior later, when we mix and match different types of arguments).

The counterpart of keyword arguments, on the definition side, is default values. The syntax is the same, name=value, and allows us to not have to provide an argument if we are happy with the given default:

# arguments.default.py
def func(a, b=4, c=88):
    print(a, b, c)

func(1)              # prints: 1 4 88
func(b=5, a=7, c=9)  # prints: 7 5 9
func(42, c=9)        # prints: 42 4 9
func(42, 43, 44)     # prints: 42 43 44

There are two things to notice, which are very important. First of all, you cannot specify a default argument on the left of a positional one. Second, note how in the examples, when an argument is passed without using the argument_name=value syntax, it must be the first one in the list, and it is always assigned to a. Notice also that passing values in a positional fashion still works, and follows the function signature order (last line of the example).

Try and scramble those arguments and see what happens. Python error messages are very good at telling you what's wrong. So, for example, if you tried something such as this:

# arguments.default.error.py
def func(a, b=4, c=88):
    print(a, b, c)
func(b=1, c=2, 42)  # positional argument after keyword one

You would get the following error:

$ python arguments.default.error.py
  File "arguments.default.error.py", line 4
    func(b=1, c=2, 42)  # positional argument after keyword one
                   ^
SyntaxError: positional argument follows keyword argument

This informs you that you've called the function incorrectly.

Variable positional arguments

Sometimes you may want to pass a variable number of positional arguments to a function, and Python provides you with the ability to do it. Let's look at a very common use case, the minimum function. This is a function that calculates the minimum of its input values:

# arguments.variable.positional.py
def minimum(*n):
    # print(type(n))  # n is a tuple
    if n:  # explained after the code
        mn = n[0]
        for value in n[1:]:
            if value < mn:
                mn = value
        print(mn)

minimum(1, 3, -7, 9)  # n = (1, 3, -7, 9) - prints: -7
minimum()             # n = () - prints: nothing

As you can see, when we specify a parameter prepending a * to its name, we are telling Python that that parameter will be collecting a variable number of positional arguments, according to how the function is called. Within the function, n is a tuple. Uncomment print(type(n)) to see for yourself and play around with it for a bit.

Have you noticed how we checked whether n wasn't empty with a simple if n:? This is because collection objects evaluate to True when non-empty, and otherwise False in Python. This is true for tuples, sets, lists, dictionaries, and so on. One other thing to note is that we may want to throw an error when we call the function with no arguments, instead of silently doing nothing. In this context, we're not concerned about making this function robust, but in understanding variable positional arguments.
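For completeness, here is one way the stricter variant mentioned above could look. This is only a sketch, not the book's code: it returns the result instead of printing it, and both the choice of TypeError and the wording of the message are assumptions:

```python
def minimum(*n):
    if not n:  # an empty tuple evaluates to False
        raise TypeError('minimum() expected at least 1 argument, got 0')
    mn = n[0]
    for value in n[1:]:
        if value < mn:
            mn = value
    return mn

smallest = minimum(1, 3, -7, 9)
print(smallest)  # -7

try:
    minimum()
except TypeError as err:
    print('error:', err)
```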

Let's make another example to show you two things that, in my experience, are confusing to those who are new to this:

# arguments.variable.positional.unpacking.py
def func(*args):
    print(args)

values = (1, 3, -7, 9)
func(values)   # equivalent to: func((1, 3, -7, 9))
func(*values)  # equivalent to: func(1, 3, -7, 9)

Take a good look at the last two lines of the preceding example. In the first one, we call func with one argument, a four-element tuple. In the second one, by using the * syntax, we're doing something called unpacking, which means that the four-element tuple is unpacked, and the function is called with four arguments: 1, 3, -7, 9.

This behavior is part of the magic Python does to allow you to do amazing things when calling functions dynamically.

Variable keyword arguments

Variable keyword arguments are very similar to variable positional arguments. The only difference is the syntax (** instead of *) and that they are collected in a dictionary. Collection and unpacking work in the same way, so let's look at an example:

# arguments.variable.keyword.py
def func(**kwargs):
    print(kwargs)

# All calls equivalent. They print: {'a': 1, 'b': 42}
func(a=1, b=42)
func(**{'a': 1, 'b': 42})
func(**dict(a=1, b=42))

All the calls are equivalent in the preceding example. You can see that adding a ** in front of the parameter name in the function definition tells Python to use that name to collect a variable number of keyword parameters. On the other hand, when we call the function, we can either pass name=value arguments explicitly, or unpack a dictionary using the same ** syntax.

The reason why being able to pass a variable number of keyword parameters is so important may not be evident at the moment, so, how about a more realistic example? Let's define a function that connects to a database. We want to connect to a default database by simply calling this function with no parameters. We also want to connect to any other database by passing the function the appropriate arguments. Before you read on, try to spend a couple of minutes figuring out a solution by yourself:

# arguments.variable.db.py
def connect(**options):
    conn_params = {
        'host': options.get('host', '127.0.0.1'),
        'port': options.get('port', 5432),
        'user': options.get('user', ''),
        'pwd': options.get('pwd', ''),
    }
    print(conn_params)
    # we then connect to the db (commented out)
    # db.connect(**conn_params)

connect()
connect(host='127.0.0.42', port=5433)
connect(port=5431, user='fab', pwd='gandalf')

Note that in the function, we can prepare a dictionary of connection parameters (conn_params) using default values as fallbacks, allowing them to be overwritten if they are provided in the function call. There are better ways to do this with fewer lines of code, but we're not concerned with that right now. Running the preceding code yields the following result:

$ python arguments.variable.db.py
{'host': '127.0.0.1', 'port': 5432, 'user': '', 'pwd': ''}
{'host': '127.0.0.42', 'port': 5433, 'user': '', 'pwd': ''}
{'host': '127.0.0.1', 'port': 5431, 'user': 'fab', 'pwd': 'gandalf'}

Note the correspondence between the function calls and the output. Notice how default values are overridden according to what was passed to the function.
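As for the shorter alternatives hinted at above, one possibility is to keep the defaults in a dictionary and merge the options on top of them. This is a sketch, not necessarily what the author had in mind; it returns the dictionary instead of printing it, it relies on dictionary unpacking (Python 3.5+), and, unlike the original, it also lets through any extra keys the caller passes:

```python
def connect(**options):
    defaults = {'host': '127.0.0.1', 'port': 5432, 'user': '', 'pwd': ''}
    # later keys win, so anything in options overrides the defaults
    conn_params = {**defaults, **options}
    return conn_params

default_params = connect()
custom_params = connect(port=5431, user='fab', pwd='gandalf')
print(default_params)
print(custom_params)
```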

Keyword-only arguments

Python 3 allows for a new type of parameter: the keyword-only parameter. We are going to study them only briefly as their use cases are not that frequent. There are two ways of specifying them, either after the variable positional arguments, or after a bare *. Let's see an example of both:

# arguments.keyword.only.py
def kwo(*a, c):
    print(a, c)

kwo(1, 2, 3, c=7)  # prints: (1, 2, 3) 7
kwo(c=4)           # prints: () 4
# kwo(1, 2)  # breaks with the following error
# TypeError: kwo() missing 1 required keyword-only argument: 'c'

def kwo2(a, b=42, *, c):
    print(a, b, c)

kwo2(3, b=7, c=99)  # prints: 3 7 99
kwo2(3, c=13)       # prints: 3 42 13
# kwo2(3, 23)  # breaks with the following error
# TypeError: kwo2() missing 1 required keyword-only argument: 'c'

As anticipated, the function, kwo, takes a variable number of positional arguments (a) and a keyword-only one, c. The results of the calls are straightforward and you can uncomment the third call to see what error Python returns. Note that the call is syntactically valid; it fails at runtime with a TypeError because c was not provided.

The same applies to the function, kwo2, which differs from kwo in that it takes a positional argument, a, a keyword argument, b, and then a keyword-only one, c. You can uncomment the third call to see the error.

Now that you know how to specify different types of input parameters, let's see how you can combine them in function definitions.

Combining input parameters

You can combine input parameters, as long as you follow these ordering rules:

When defining a function, normal positional arguments come first (name), then any default arguments (name=value), then the variable positional arguments (*name or simply *), then any keyword-only arguments (either name or name=value form is good), and then any variable keyword arguments (**name).

On the other hand, when calling a function, arguments must be given in the following order: positional arguments first (value), then any combination of keyword arguments (name=value), variable positional arguments (*name), and then variable keyword arguments (**name).

Since this can be a bit tricky when left hanging in the theoretical world, let's look at a couple of quick examples:

# arguments.all.py
def func(a, b, c=7, *args, **kwargs):
    print('a, b, c:', a, b, c)
    print('args:', args)
    print('kwargs:', kwargs)

func(1, 2, 3, *(5, 7, 9), **{'A': 'a', 'B': 'b'})
func(1, 2, 3, 5, 7, 9, A='a', B='b')  # same as previous one

Note the order of the parameters in the function definition, and that the two calls are equivalent. In the first one, we're using the unpacking operators for iterables and dictionaries, while in the second one we're using a more explicit syntax. The execution of this yields the following (I printed only the result of one call, the other one being the same):

$ python arguments.all.py
a, b, c: 1 2 3
args: (5, 7, 9)
kwargs: {'A': 'a', 'B': 'b'}

Let's now look at an example with keyword-only arguments:

# arguments.all.kwonly.py
def func_with_kwonly(a, b=42, *args, c, d=256, **kwargs):
    print('a, b:', a, b)
    print('c, d:', c, d)
    print('args:', args)
    print('kwargs:', kwargs)

# both calls equivalent
func_with_kwonly(3, 42, c=0, d=1, *(7, 9, 11), e='E', f='F')
func_with_kwonly(3, 42, *(7, 9, 11), c=0, d=1, e='E', f='F')

Note the keyword-only arguments, c and d, in the function declaration. They come after the *args variable positional argument, and it would be the same if they came right after a single * (in which case there wouldn't be a variable positional argument). The execution of this yields the following (I printed only the result of one call):

$ python arguments.all.kwonly.py
a, b: 3 42
c, d: 0 1
args: (7, 9, 11)
kwargs: {'e': 'E', 'f': 'F'}

One other thing to note is the names I gave to the variable positional and keyword arguments. You're free to choose differently, but be aware that args and kwargs are the conventional names given to these parameters, at least generically.

Additional unpacking generalizations

One of the recent new features, introduced in Python 3.5, is the ability to extend the iterable (*) and dictionary (**) unpacking operators to allow unpacking in more positions, an arbitrary number of times, and in additional circumstances. I'll present you with an example concerning function calls:

# additional.unpacking.py
def additional(*args, **kwargs):
    print(args)
    print(kwargs)

args1 = (1, 2, 3)
args2 = [4, 5]
kwargs1 = dict(option1=10, option2=20)
kwargs2 = {'option3': 30}
additional(*args1, *args2, **kwargs1, **kwargs2)

In the previous example, we defined a simple function that prints its input arguments, args and kwargs. The new feature lies in the way we call this function. Notice how we can unpack multiple iterables and dictionaries, and they are correctly coalesced under args and kwargs. The reason why this feature is important is that it allows us not to have to merge args1 with args2, and kwargs1 with kwargs2 in the code. Running the code produces:

$ python additional.unpacking.py
(1, 2, 3, 4, 5)
{'option1': 10, 'option2': 20, 'option3': 30}

Please refer to PEP 448 (https://www.python.org/dev/peps/pep-0448/) to learn the full extent of this new feature and see further examples.
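The "additional circumstances" covered by PEP 448 include list and dictionary displays, not just function calls. The following sketch reuses the names from the example above with slightly different sample values (option2 is deliberately repeated) to show how duplicates are handled in a dictionary display:

```python
args1 = (1, 2, 3)
args2 = [4, 5]
merged_list = [*args1, *args2, 6]  # unpack two iterables into one list

kwargs1 = dict(option1=10, option2=20)
kwargs2 = {'option2': 99, 'option3': 30}
merged_dict = {**kwargs1, **kwargs2}  # on duplicate keys, the last one wins

print(merged_list)  # [1, 2, 3, 4, 5, 6]
print(merged_dict)  # {'option1': 10, 'option2': 99, 'option3': 30}
```

Be aware that function calls are stricter: unpacking two dictionaries that share a key into the same call raises a TypeError instead of silently keeping the last value.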

Avoid the trap! Mutable defaults

One thing to be very aware of with Python is that default values are created at def time, therefore, subsequent calls to the same function will possibly behave differently according to the mutability of their default values. Let's look at an example:

# arguments.defaults.mutable.py
def func(a=[], b={}):
    print(a)
    print(b)
    print('#' * 12)
    a.append(len(a))  # this will affect a's default value
    b[len(a)] = len(a)  # and this will affect b's one

func()
func()
func()

Both parameters have mutable default values. This means that, if you affect those objects, any modification will stick around in subsequent function calls. See if you can understand the output of those calls:

$ python arguments.defaults.mutable.py
[]
{}
############
[0]
{1: 1}
############
[0, 1]
{1: 1, 2: 2}
############

It's interesting, isn't it? While this behavior may seem very weird at first, it actually makes sense, and it's very handy, for example, when using memoization techniques (Google an example of that, if you're interested). Even more interesting is what happens when, between the calls, we introduce one that doesn't use defaults, such as this:

# arguments.defaults.mutable.intermediate.call.py
func()
func(a=[1, 2, 3], b={'B': 1})
func()

When we run this code, this is the output:

$ python arguments.defaults.mutable.intermediate.call.py
[]
{}
############
[1, 2, 3]
{'B': 1}
############
[0]
{1: 1}
############

This output shows us that the defaults are retained even if we call the function with other values. One question that comes to mind is, how do I get a fresh empty value every time? Well, the convention is the following:

# arguments.defaults.mutable.no.trap.py
def func(a=None):
    if a is None:
        a = []
    # do whatever you want with `a` ...

Note that, by using the preceding technique, if a isn't passed when calling the function, you always get a brand new, empty list.
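To put the two sides of this behavior next to each other, here is a small sketch. The first function uses a mutable default on purpose, as a cache for the memoization technique mentioned earlier; the second applies the None convention to avoid sharing state. Both fib and append_item are hypothetical names chosen for illustration:

```python
def fib(n, _cache={0: 0, 1: 1}):
    # deliberate use: _cache is created once, at def time, and is
    # shared by every call, so computed results are remembered
    if n not in _cache:
        _cache[n] = fib(n - 1) + fib(n - 2)
    return _cache[n]

def append_item(item, items=None):
    # avoiding the trap: a fresh list is created on each call
    if items is None:
        items = []
    items.append(item)
    return items

print(fib(30))           # 832040
print(append_item('a'))  # ['a']
print(append_item('b'))  # ['b'], not ['a', 'b']
```

In real code, the standard library's functools.lru_cache decorator is usually a cleaner way to memoize than a hand-rolled cache in a default argument.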

Okay, enough with the input, let's look at the other side of the coin, the output.

Return values

The return values of functions are one of those things where Python is ahead of most other languages. Functions are usually allowed to return one object (one value) but, in Python, you can return a tuple, and this implies that you can return whatever you want. This feature allows a coder to write software that would be much harder to write in any other language, or certainly more tedious. We've already said that to return something from a function we need to use the return statement, followed by what we want to return. There can be as many return statements as needed in the body of a function.

On the other hand, if within the body of a function we don't return anything, or we invoke a bare return statement, the function will return None. This behavior is harmless and, even though I don't have the room here to go into detail explaining why Python was designed like this, let me just tell you that this feature allows for several interesting patterns, and confirms Python as a very consistent language.

I say it's harmless because you are never forced to collect the result of a function call. I'll show you what I mean with an example:

# return.none.py
def func():
    pass

func()  # the return of this call won't be collected. It's lost.
a = func()  # the return of this one instead is collected into `a`
print(a)  # prints: None

Note that the whole body of the function is composed only of the pass statement. As the official documentation tells us, pass is a null operation. When it is executed, nothing happens. It is useful as a placeholder when a statement is required syntactically, but no code needs to be executed. In other languages, we would probably just indicate that with a pair of curly brackets ({}), which define an empty scope, but in Python, a scope is defined by indenting code, therefore a statement such as pass is necessary.

Notice also that the first call of the func function returns a value (None) which we don't collect. As I said before, collecting the return value of a function call is not mandatory.

Now, that's good but not very interesting so, how about we write an interesting function? Remember that in Chapter 1, A Gentle Introduction to Python, we talked about the factorial function. Let's write our own here (for simplicity, I will assume the function is always called correctly with appropriate values so I won't sanity-check the input argument):

# return.single.value.py
def factorial(n):
    if n in (0, 1):
        return 1
    result = n
    for k in range(2, n):
        result *= k
    return result

f5 = factorial(5)  # f5 = 120

Note that we have two points of return. If n is either 0 or 1 (in Python it's common to use the in type of check, as I did, instead of the more verbose if n == 0 or n == 1:), we return 1. Otherwise, we perform the required calculation and we return result. Note that result starts at n, so the loop only needs to multiply by the numbers from 2 to n - 1. Let's try to write this function a little bit more succinctly:

# return.single.value.2.py
from functools import reduce
from operator import mul

def factorial(n):
    return reduce(mul, range(1, n + 1), 1)

f5 = factorial(5)  # f5 = 120

I know what you're thinking: one line? Python is elegant, and concise! I think this function is readable even if you have never seen reduce or mul, but if you can't read it or understand it, set aside a few minutes and do some research on the Python documentation until its behavior is clear to you. Being able to look up functions in the documentation and understand code written by someone else is a task every developer needs to be able to perform, so take this as a challenge.

To this end, make sure you look up the help function, which proves quite helpful when exploring with the console.
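If the reduce call is still opaque after reading the documentation, it may help to see it unrolled into an explicit loop. The sketch below is one way to do that; factorial_loop is a hypothetical helper, and math.factorial from the standard library is used only as an independent check:

```python
from functools import reduce
from math import factorial as math_factorial
from operator import mul

def factorial(n):
    return reduce(mul, range(1, n + 1), 1)

def factorial_loop(n):
    result = 1  # plays the role of reduce's initial value
    for k in range(1, n + 1):
        result = mul(result, k)  # same as result * k
    return result

# all three implementations should agree on small inputs
checks = [
    factorial(n) == factorial_loop(n) == math_factorial(n)
    for n in range(8)
]
print(all(checks))  # True
```

The initial value of 1 is what makes factorial(0) work: with an empty range, reduce simply returns it.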

Returning multiple values

Unlike in most other languages, in Python it's very easy to return multiple objects from a function. This feature opens up a whole world of possibilities and allows you to code in a style that is hard to reproduce with other languages. Our thinking is limited by the tools we use, therefore when Python gives you more freedom than other languages, it is actually boosting your own creativity as well. To return multiple values is very easy, you just use tuples (either explicitly or implicitly). Let's look at a simple example that mimics the divmod built-in function:

# return.multiple.py
def moddiv(a, b):
    return a // b, a % b

print(moddiv(20, 7))  # prints (2, 6)

I could have wrapped the returned expression, a // b, a % b, in brackets, making it an explicit tuple, but there's no need for that. The preceding function returns both the result and the remainder of the division, at the same time.

In the source code for this example, I have left a simple example of a test function to make sure my code is doing the correct calculation.
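The author's test function is not reproduced in this text, so the following is only a guess at what such a check could look like: it compares moddiv against the divmod built-in over a small range of integers. The name test_moddiv and the chosen ranges are assumptions:

```python
def moddiv(a, b):
    return a // b, a % b

def test_moddiv():
    # compare against the built-in for positive and negative operands
    for a in range(-20, 21):
        for b in (-7, -3, 1, 2, 7):
            assert moddiv(a, b) == divmod(a, b)

test_moddiv()

# the returned tuple can be unpacked directly into two names
quotient, remainder = moddiv(20, 7)
print(quotient, remainder)  # 2 6
```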

A few useful tips

When writing functions, it's very useful to follow guidelines so that you write them well. I'll quickly point some of them out:

- Functions should do one thing: Functions that do one thing are easy to describe in one short sentence. Functions that do multiple things can be split into smaller functions that do one thing. These smaller functions are usually easier to read and understand. Remember the data science example we saw a few pages ago.
- Functions should be small: The smaller they are, the easier it is to test them and to write them so that they do one thing.
- The fewer input parameters, the better: Functions that take a lot of arguments quickly become harder to manage (among other issues).
- Functions should be consistent in their return values: Returning False or None is not the same thing, even if within a Boolean context they both evaluate to False. False means that we have information (False), while None means that there is no information. Try writing functions that return in a consistent way, no matter what happens in their body.
- Functions shouldn't have side effects: In other words, functions should not affect the values you call them with. This is probably the hardest statement to understand at this point, so I'll give you an example using lists. In the following code, note how numbers is not sorted by the sorted function, which actually returns a sorted copy of numbers. Conversely, the list.sort() method is acting on the numbers object itself, and that is fine because it is a method (a function that belongs to an object and therefore has the rights to modify it):

>>> numbers = [4, 1, 7, 5]
>>> sorted(numbers)  # won't sort the original `numbers` list
[1, 4, 5, 7]
>>> numbers  # let's verify
[4, 1, 7, 5]  # good, untouched
>>> numbers.sort()  # this will act on the list
>>> numbers
[1, 4, 5, 7]

Follow these guidelines and you'll write better functions, which will serve you well.

Chapter 3, Functions, in Clean Code by Robert C. Martin (Prentice Hall) is dedicated to functions and it's probably the best set of guidelines I've ever read on the subject.

Recursive functions

When a function calls itself to produce a result, it is said to be recursive. Sometimes recursive functions are very useful in that they make it easier to write code. Some algorithms are very easy to write using the recursive paradigm, while others are not. There is no recursive function that cannot be rewritten in an iterative fashion, so it's usually up to the programmer to choose the best approach for the case at hand.

The body of a recursive function usually has two sections: one where the return value depends on a subsequent call to itself, and one where it doesn't (called a base case).

As an example, we can consider the (hopefully familiar by now) factorial function, N!. The base case is when N is either 0 or 1. The function returns 1 with no need for further calculation. On the other hand, in the general case, N! returns the product 1 * 2 * ... * (N-1) * N. If you think about it, N! can be rewritten like this: N! = (N-1)! * N. As a practical example, consider 5! = 1 * 2 * 3 * 4 * 5 = (1 * 2 * 3 * 4) * 5 = 4! * 5.

Let's write this down in code:

# recursive.factorial.py
def factorial(n):
    if n in (0, 1):  # base case
        return 1
    return factorial(n - 1) * n  # recursive case

When writing recursive functions, always consider how many nested calls you make, since there is a limit. For further information on this, check out sys.getrecursionlimit() and sys.setrecursionlimit().

Recursive functions are used a lot when writing algorithms and they can be really fun to write. As an exercise, try to solve a couple of simple problems using both a recursive and an iterative approach.
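As a starting point for that exercise, here is a sketch that places a recursive and an iterative factorial side by side and shows what happens when the recursion limit is exceeded. The function names are hypothetical, and the actual limit depends on your interpreter and its configuration:

```python
import sys

def factorial_recursive(n):
    if n in (0, 1):  # base case
        return 1
    return factorial_recursive(n - 1) * n  # recursive case

def factorial_iterative(n):
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

limit = sys.getrecursionlimit()  # often 1000, but not guaranteed
same = factorial_recursive(20) == factorial_iterative(20)

try:
    factorial_recursive(limit + 100)  # needs more nested calls than allowed
    hit_limit = False
except RecursionError:
    hit_limit = True

print(same, hit_limit)  # True True
```

The iterative version has no such ceiling, which is one practical reason to prefer it when the depth of the recursion grows with the size of the input.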

Anonymous functions

One last type of function that I want to talk about is the anonymous function. These functions, which are called lambdas in Python, are usually used when a fully-fledged function with its own name would be overkill, and all we want is a quick, simple one-liner that does the job.

Imagine that you want a list of all the numbers up to N that are multiples of five. Imagine that you want to filter those out using the filter function, which takes a function and an iterable and constructs a filter object that you can iterate on, from those elements of the iterable for which the function returns True. Without using an anonymous function, you would do something like this:

# filter.regular.py
def is_multiple_of_five(n):
    return not n % 5

def get_multiples_of_five(n):
    return list(filter(is_multiple_of_five, range(n)))

Note how we use is_multiple_of_five to filter the first n natural numbers. This seems a bit excessive: the task is simple and we don't need to keep the is_multiple_of_five function around for anything else. Let's rewrite it using a lambda function:

# filter.lambda.py
def get_multiples_of_five(n):
    return list(filter(lambda k: not k % 5, range(n)))

The logic is exactly the same but the filtering function is now a lambda. Defining a lambda is very easy and follows this form: func_name = lambda [parameter_list]: expression. A function object is returned, which is equivalent to this: def func_name([parameter_list]): return expression.

Note that optional parameters are indicated following the common syntax of wrapping them in square brackets.

Let's look at another couple of examples of equivalent functions defined in the two forms:

# lambda.explained.py
# example 1: adder
def adder(a, b):
    return a + b

# is equivalent to:
adder_lambda = lambda a, b: a + b

# example 2: to uppercase
def to_upper(s):
    return s.upper()

# is equivalent to:
to_upper_lambda = lambda s: s.upper()

The preceding examples are very simple. The first one adds two numbers, and the second one produces the uppercase version of a string. Note that I assigned what is returned by the lambda expressions to a name (adder_lambda, to_upper_lambda), but there is no need for that when you use lambdas in the way we did in the filter example.
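Another place where an unnamed lambda fits naturally is the key argument of the sorted built-in. The sketch below uses made-up sample data (the people list is purely illustrative):

```python
# made-up sample data: (name, age) pairs
people = [('alice', 36), ('bob', 28), ('carol', 62)]

# sort by the second element of each pair
by_age = sorted(people, key=lambda person: person[1])

# sort by the length of the name; ties keep their original order
by_name_length = sorted(people, key=lambda person: len(person[0]))

print(by_age)          # [('bob', 28), ('alice', 36), ('carol', 62)]
print(by_name_length)  # [('bob', 28), ('alice', 36), ('carol', 62)]
```

In both calls, the lambda exists only for the duration of the call, which is exactly the throwaway use case described above.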

Function attributes

Every function is a fully-fledged object and, as such, they have many attributes. Some of them are special and can be used in an introspective way to inspect the function object at runtime. The following script is an example that shows a part of them and how to display their value for an example function:

# func.attributes.py
def multiplication(a, b=1):
    """Return a multiplied by b. """
    return a * b

special_attributes = [
    "__doc__", "__name__", "__qualname__", "__module__",
    "__defaults__", "__code__", "__globals__", "__dict__",
    "__closure__", "__annotations__", "__kwdefaults__",
]

for attribute in special_attributes:
    print(attribute, '->', getattr(multiplication, attribute))

I used the built-in getattr function to get the value of those attributes. getattr(obj, attribute) is equivalent to obj.attribute and comes in handy when we need to get an attribute at runtime using its string name. Running this script yields:

$ python func.attributes.py
__doc__ -> Return a multiplied by b.
__name__ -> multiplication
__qualname__ -> multiplication
__module__ -> __main__
__defaults__ -> (1,)
__code__ -> <code object multiplication at 0x10caf7660, file "func.attributes.py", line 1>
__globals__ -> {...omitted...}
__dict__ -> {}
__closure__ -> None
__annotations__ -> {}
__kwdefaults__ -> None

I have omitted the value of the __globals__ attribute, as it was too big. An explanation of the meaning of this attribute can be found in the Callable types section of the Python Data Model documentation page (https://docs.python.org/3/reference/datamodel.html#the-standard-type-hierarchy). Should you want to see all the attributes of an object, just call dir(object_name) and you'll be given the list of all of its attributes.
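Two small details about getattr and dir are easy to try out. The sketch below reuses the multiplication function from the script above; the attribute name no_such_attribute is deliberately made up to show the optional third argument of getattr, which is returned instead of raising an AttributeError:

```python
def multiplication(a, b=1):
    """Return a multiplied by b."""
    return a * b

# getattr(obj, 'name') is the dynamic form of obj.name
name = getattr(multiplication, '__name__')

# a third argument is used as a fallback when the attribute is missing
missing = getattr(multiplication, 'no_such_attribute', 'fallback')

# dir returns a list of attribute names as strings
has_doc = '__doc__' in dir(multiplication)

print(name, missing, has_doc)  # multiplication fallback True
```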

Built-in functions

Python comes with a lot of built-in functions. They are available anywhere and you can get a list of them by inspecting the builtins module with dir(__builtins__), or by going to the official Python documentation. Unfortunately, I don't have the room to go through all of them here. We've already seen some of them, such as any, bin, bool, divmod, filter, float, getattr, id, int, len, list, min, print, set, tuple, type, and zip, but there are many more, which you should read at least once. Get familiar with them, experiment, write a small piece of code for each of them, and make sure you have them at your fingertips so that you can use them when you need them.

One final example

Before we finish off this chapter, how about one last example? I was thinking we could write a function to generate a list of prime numbers up to a limit. We've already seen the code for this so let's make it a function and, to keep it interesting, let's optimize it a bit.

It turns out that you don't need to divide a number, N, by all numbers from 2 to N-1 to decide whether it is prime. You can stop at √N. Moreover, you don't need to test the division for all numbers from 2 to √N, you can just use the primes in that range. I'll leave it to you to figure out why this works, if you're interested. Let's see how the code changes:

# primes.py
from math import sqrt, ceil

def get_primes(n):
    """Calculate a list of primes up to n (included). """
    primelist = []
    for candidate in range(2, n + 1):
        is_prime = True
        root = ceil(sqrt(candidate))  # division limit
        for prime in primelist:  # we try only the primes
            if prime > root:  # no need to check any further
                break
            if candidate % prime == 0:
                is_prime = False
                break
        if is_prime:
            primelist.append(candidate)
    return primelist

The code is largely the same as in the previous chapter. We have changed the division algorithm so that we only test divisibility using the previously calculated primes, and we stop once the testing divisor is greater than the root of the candidate. We used the primelist result list to get the primes for the division. We calculated the root value using a fancy formula, the integer value of the ceiling of the root of the candidate. While a simple int(k ** 0.5) + 1 would have served our purpose as well, the formula I chose is cleaner and requires me to use a couple of imports, which I wanted to show you. Check out the functions in the math module, they are very interesting!
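A quick way to gain confidence in an optimized function like this is to compare it against a deliberately naive one. The sketch below repeats get_primes so that it can run on its own; is_prime_naive is a hypothetical helper that simply tries every divisor from 2 to n - 1:

```python
from math import sqrt, ceil

def get_primes(n):
    """Calculate a list of primes up to n (included)."""
    primelist = []
    for candidate in range(2, n + 1):
        is_prime = True
        root = ceil(sqrt(candidate))  # division limit
        for prime in primelist:  # we try only the primes
            if prime > root:  # no need to check any further
                break
            if candidate % prime == 0:
                is_prime = False
                break
        if is_prime:
            primelist.append(candidate)
    return primelist

def is_prime_naive(n):
    # slow but obviously correct: no divisor between 2 and n - 1
    return n > 1 and all(n % d for d in range(2, n))

expected = [k for k in range(2, 101) if is_prime_naive(k)]
print(get_primes(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(get_primes(100) == expected)  # True
```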

Documenting your code

I'm a big fan of code that doesn't need documentation. When you program correctly, choose the right names and take care of the details, your code should come out as self-explanatory and documentation should not be needed. Sometimes a comment is very useful though, and so is some documentation. You can find the guidelines for documenting Python in PEP 257 - Docstring conventions (https://www.python.org/dev/peps/pep-0257/), but I'll show you the basics here.

Python is documented with strings, which are aptly called docstrings. Any object can be documented, and you can use either one-line or multiline docstrings. One-liners are very simple. They should not provide another signature for the function, but clearly state its purpose:

# docstrings.py
def square(n):
    """Return the square of a number n."""
    return n ** 2

def get_username(userid):
    """Return the username of a user given their id."""
    return db.get(user_id=userid).username

Using triple double-quoted strings allows you to expand easily later on. Use sentences that end in a period, and don't leave blank lines before or after.

Multiline docstrings are structured in a similar way. There should be a one-liner that briefly gives you the gist of what the object is about, and then a more verbose description. As an example, I have documented a fictitious connect function, using the Sphinx notation:

def connect(host, port, user, password):
    """Connect to a database.

    Connect to a PostgreSQL database directly, using the given
    parameters.

    :param host: The host IP.
    :param port: The desired port.
    :param user: The connection username.
    :param password: The connection password.
    :return: The connection object.
    """
    # body of the function here...
    return connection

Sphinx is probably the most widely used tool for creating Python documentation. In fact, the official Python documentation was written with it. It's definitely worth spending some time checking it out.
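Docstrings are not just comments: Python stores them on the object, so tools (and you) can read them back at runtime. As a small sketch, the following reads the docstring of the square function through its __doc__ attribute and through inspect.getdoc, which also cleans up the indentation of multiline docstrings.

```python
import inspect

def square(n):
    """Return the square of a number n."""
    return n ** 2

print(square.__doc__)  # prints: Return the square of a number n.
print(inspect.getdoc(square))  # same text, with indentation normalized
```

Calling help(square) in an interactive session shows the same text along with the function signature.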

Importing objects

Now that you know a lot about functions, let's look at how to use them. The whole point of writing functions is to be able to reuse them later, and in Python, this translates to importing them into the namespace where you need them. There are many different ways to import objects into a namespace, but the most common ones are import module_name and from module_name import function_name. Of course, these are quite simplistic examples, but bear with me for the time being.

The import module_name form finds the module_name module and defines a name for it in the local namespace where the import statement is executed. The from module_name import identifier form is a little bit more complicated than that, but basically does the same thing. It finds module_name and searches for an attribute (or a submodule) and stores a reference to identifier in the local namespace.

Both forms have the option to change the name of the imported object using the as clause:

from mymodule import myfunc as better_named_func

Just to give you a flavor of what importing looks like, here's an example from a test module of one of my projects (notice that the blank lines between blocks of imports follow the guidelines from PEP 8 at https://www.python.org/dev/peps/pep-0008/#imports: standard library, third party, and local code):

from datetime import datetime, timezone  # two imports on the same line
from unittest.mock import patch  # single import

import pytest  # third party library

from core.models import (  # multiline import
    Exam,
    Exercise,
    Solution,
)

When you have a structure of files starting in the root of your project, you can use the dot notation to get to the object you want to import into your current namespace, be it a package, a module, a class, a function, or anything else. The from module import syntax also allows a catch-all clause, from module import *, which is sometimes used to get all the names from a module into the current namespace at once, but it's frowned upon for several reasons, such as performance and the risk of silently shadowing other names. You can read all that there is to know about imports in the official Python documentation but, before we leave the subject, let me give you a better example.

Imagine that you have defined a couple of functions: square(n) and cube(n) in a module, funcdef.py, which is in the lib folder. You want to use them in a couple of modules that are at the same level of the lib folder, called func_import.py and func_from.py. Showing the tree structure of that project produces something like this:

├── func_from.py
├── func_import.py
├── lib
    ├── funcdef.py
    └── __init__.py

Before I show you the code of each module, please remember that in order to tell Python that the lib folder is actually a package, we put an __init__.py module in it.

There are two things to note about the __init__.py file. First of all, it is a fully-fledged Python module so you can put code into it as you would with any other module. Second, as of Python 3.3, its presence is no longer strictly required to make a folder be interpreted as a Python package.

The code is as follows:

# funcdef.py
def square(n):
    return n ** 2

def cube(n):
    return n ** 3

# func_import.py
import lib.funcdef
print(lib.funcdef.square(10))
print(lib.funcdef.cube(10))

# func_from.py
from lib.funcdef import square, cube
print(square(10))
print(cube(10))

Both these files, when executed, print 100 and 1000. You can see how differently we then access the square and cube functions, according to how and what we imported in the current scope.
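If you don't have the lib folder at hand, you can try the same import forms against a standard library module. The sketch below uses math purely as a stand-in (an assumption made so that the example runs anywhere); the aliases square_root and m are made-up names for illustration.

```python
import math                             # binds the name "math"
import math as m                        # same module, bound to "m"
from math import sqrt                   # binds only the name "sqrt"
from math import sqrt as square_root    # same function, different name

print(math.sqrt(16), m.sqrt(16), sqrt(16), square_root(16))

# All four names lead to the very same function object.
print(sqrt is math.sqrt, square_root is sqrt, m is math)
```

Whichever form you choose, nothing is copied: each import just binds a name in the current namespace to an object that already exists.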

Relative imports

The imports we've seen so far are called absolute, that is, they define the whole path of the module that we want to import, or from which we want to import an object. There is another way of importing objects into Python, which is called a relative import. It's helpful in situations where we want to rearrange the structure of large packages without having to edit sub-packages, or when we want to make a module inside a package able to import itself. Relative imports are done by adding as many leading dots in front of the module as the number of folders we need to backtrack, in order to find what we're searching for. Simply put, it is something such as this:

from .mymodule import myfunc

For a complete explanation of relative imports, refer to PEP 328 (https://www.python.org/dev/peps/pep-0328/). In later chapters, we'll create projects using different libraries and we'll use several different types of imports, including relative ones, so make sure you take a bit of time to read up about it in the official Python documentation.

Summary

In this chapter, we explored the world of functions. They are extremely important and, from now on, we'll use them basically everywhere. We talked about the main reasons for using them, the most important of which are code reuse and implementation hiding.

We saw that a function object is like a box that takes optional inputs and produces outputs. We can feed input values to a function in many different ways, using positional and keyword arguments, and using variable syntax for both types.

Now you should know how to write a function, document it, import it into your code, and call it.

The next chapter will force me to push my foot down on the throttle even more, so I suggest you take any opportunity you get to consolidate and enrich the knowledge you've gathered so far by putting your nose into the Python official documentation.

Saving Time and Memory

"It's not the daily increase but daily decrease. Hack away at the unessential."

– Bruce Lee

I love this quote from Bruce Lee. He was such a wise man! Especially, the second part, "hack away at the unessential", is to me what makes a computer program elegant. After all, if there is a better way of doing things so that we don't waste time or memory, why not?

Sometimes, there are valid reasons for not pushing our code up to the maximum limit: for example, sometimes to achieve a negligible improvement, we have to sacrifice on readability or maintainability. Does it make any sense to have a web page served in 1 second with unreadable, complicated code, when we can serve it in 1.05 seconds with readable, clean code? No, it makes no sense.

On the other hand, sometimes it's perfectly reasonable to try to shave off a millisecond from a function, especially when the function is meant to be called thousands of times. Every millisecond you save there means one second saved per thousand calls, and this could be meaningful for your application.

In light of these considerations, the focus of this chapter will not be to give you the tools to push your code to the absolute limits of performance and optimization "no matter what," but rather, to enable you to write efficient, elegant code that reads well, runs fast, and doesn't waste resources in an obvious way.

In this chapter, we are going to cover the following:

The map, zip, and filter functions
Comprehensions
Generators

I will perform several measurements and comparisons, and cautiously draw some conclusions. Please do keep in mind that on a different box with a different setup or a different operating system, results may vary. Take a look at this code:

# squares.py
def square1(n):
    return n ** 2  # squaring through the power operator

def square2(n):
    return n * n  # squaring through multiplication

Both functions return the square of n, but which is faster? From a simple benchmark I ran on them, it looks like the second is slightly faster. If you think about it, it makes sense: calculating the power of a number involves multiplication and therefore, whatever algorithm you may use to perform the power operation, it's not likely to beat a simple multiplication such as the one in square2.
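The benchmark itself isn't shown, so the following is only a sketch of how such a comparison could be run with the timeit module from the standard library. The argument 12345 and the number of repetitions are arbitrary choices, and the timings you get will depend on your machine and Python version.

```python
from timeit import timeit

def square1(n):
    return n ** 2  # squaring through the power operator

def square2(n):
    return n * n  # squaring through multiplication

# Time 200,000 calls of each function (arbitrary, illustrative numbers).
t1 = timeit(lambda: square1(12345), number=200_000)
t2 = timeit(lambda: square2(12345), number=200_000)

print(f"power operator: {t1:.4f}s")
print(f"multiplication: {t2:.4f}s")
```

Run it a few times before drawing conclusions: single measurements of operations this small are noisy.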

Do we care about this result? In most cases, no. If you're coding an e-commerce website, chances are you won't ever even need to raise a number to the second power, and if you do, it's likely to be a sporadic operation. You don't need to concern yourself with saving a fraction of a microsecond on a function you call a few times.

So, when does optimization become important? One very common case is when you have to deal with huge collections of data. If you're applying the same function on a million customer objects, then you want your function to be tuned up to its best. Gaining 1/10 of a second on a function called one million times saves you 100,000 seconds, which is about 27.8 hours. That's not the same, right? So, let's focus on collections, and let's see which tools Python gives you to handle them with efficiency and grace.

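Many of the concepts we will see in this chapter are based on those of the iterator and iterable. Simply put, an iterator is an object that can return its next element when asked, and that raises a StopIteration exception when exhausted, while an iterable is an object from which such an iterator can be obtained. We'll see how to code custom iterator and iterable objects in Chapter 6, OOP, Decorators, and Iterators.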
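To make this a little more concrete, here is a minimal sketch that drives an iterator by hand using the built-in iter and next functions. The list of two numbers is an arbitrary example.

```python
numbers = [10, 20]       # a list is an iterable
iterator = iter(numbers)  # ask the iterable for an iterator

first = next(iterator)   # 10
second = next(iterator)  # 20

try:
    next(iterator)       # nothing left: StopIteration is raised
except StopIteration:
    exhausted = True
else:
    exhausted = False

print(first, second, exhausted)  # prints: 10 20 True
```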

Due to the nature of the objects we're going to explore in this chapter, I was often forced to wrap the code in a list constructor. This is because passing an iterator/generator to list(...) exhausts it and puts all the generated items in a newly created list, which I can easily print to show you its content. This technique hinders readability, so let me introduce an alias for list:

# alias.py
>>> range(7)
range(0, 7)
>>> list(range(7))  # put all elements in a list to view them
[0, 1, 2, 3, 4, 5, 6]
>>> _ = list  # create an "alias" to list
>>> _(range(7))  # same as list(range(7))
[0, 1, 2, 3, 4, 5, 6]

Three parts of this session deserve attention. The first one, list(range(7)), is the call we need to do in order to show what would be generated by range(7); the second one, _ = list, is the moment when I create the alias to list (I chose the hopefully unobtrusive underscore); and the third one, _(range(7)), is the equivalent call, when I use the alias instead of list.

Hopefully readability will benefit from this, and please keep in mind that I will assume this alias to have been defined for all the code in this chapter.

The map, zip, and filter functions

We'll start by reviewing map, filter, and zip, which are the main built-in functions one can employ when handling collections, and then we'll learn how to achieve the same results using two very important constructs: comprehensions and generators. Fasten your seatbelt!

map

According to the official Python documentation:

map(function, iterable, ...) returns an iterator that applies function to every item of iterable, yielding the results. If additional iterable arguments are passed, function must take that many arguments and is applied to the items from all iterables in parallel. With multiple iterables, the iterator stops when the shortest iterable is exhausted.

We will explain the concept of yielding later on in the chapter. For now, let's translate this into code: we'll use a lambda function that takes a variable number of positional arguments, and just returns them as a tuple:

# map.example.py
>>> map(lambda *a: a, range(3))  # 1 iterable
<map object at 0x10acf8f98>  # Not useful! Let's use alias
>>> _(map(lambda *a: a, range(3)))  # 1 iterable
[(0,), (1,), (2,)]
>>> _(map(lambda *a: a, range(3), 'abc'))  # 2 iterables
[(0, 'a'), (1, 'b'), (2, 'c')]
>>> _(map(lambda *a: a, range(3), 'abc', range(4, 7)))  # 3
[(0, 'a', 4), (1, 'b', 5), (2, 'c', 6)]
>>> # map stops at the shortest iterator
>>> _(map(lambda *a: a, (), 'abc'))  # empty tuple is shortest
[]
>>> _(map(lambda *a: a, (1, 2), 'abc'))  # (1, 2) shortest
[(1, 'a'), (2, 'b')]
>>> _(map(lambda *a: a, (1, 2, 3, 4), 'abc'))  # 'abc' shortest
[(1, 'a'), (2, 'b'), (3, 'c')]

In the preceding code, you can see why we have to wrap calls in list(...) (or its alias, _, in this case). Without it, I get the string representation of a map object, which is not really useful in this context, is it?

You can also notice how the elements of each iterable are applied to the function; at first, the first element of each iterable, then the second one of each iterable, and so on. Notice also that map stops when the shortest of the iterables we called it with is exhausted. This is actually a very nice behavior; it doesn't force us to level off all the iterables to a common length, and it doesn't break if they aren't all the same length.

map is very useful when you have to apply the same function to one or more collections of objects. As a more interesting example, let's see the decorate-sort-undecorate idiom (also known as Schwartzian transform). It's a technique that was extremely popular when Python sorting wasn't providing key-functions, and therefore is less used today, but it's a cool trick that still comes in handy once in a while.

Let's see a variation of it in the next example: we want to sort in descending order by the sum of credits accumulated by students, so to have the best student at position 0. We write a function to produce a decorated object, we sort, and then we undecorate. Each student has credits in three (possibly different) subjects. In this context, to decorate an object means to transform it, either adding extra data to it, or putting it into another object, in a way that allows us to be able to sort the original objects the way we want. This technique has nothing to do with Python decorators, which we will explore later on in the book.

After the sorting, we revert the decorated objects to get the original ones from them. This is called to undecorate:

# decorate.sort.undecorate.py
students = [
    dict(id=0, credits=dict(math=9, physics=6, history=7)),
    dict(id=1, credits=dict(math=6, physics=7, latin=10)),
    dict(id=2, credits=dict(history=8, physics=9, chemistry=10)),
    dict(id=3, credits=dict(math=5, physics=5, geography=7)),
]

def decorate(student):
    # create a 2-tuple (sum of credits, student) from student dict
    return (sum(student['credits'].values()), student)

def undecorate(decorated_student):
    # discard sum of credits, return original student dict
    return decorated_student[1]

students = sorted(map(decorate, students), reverse=True)
students = _(map(undecorate, students))

Let's start by understanding what each student object is. In fact, let's print the first one:

{'credits': {'history': 7, 'math': 9, 'physics': 6}, 'id': 0}

You can see that it's a dictionary with two keys: id and credits. The value of credits is also a dictionary in which there are three subject/grade key/value pairs. As I'm sure you recall from our visit in the data structures world, calling dict.values() returns an iterable view object that contains only the values. Therefore, sum(student['credits'].values()) for the first student is equivalent to sum((9, 6, 7)).

Let's print the result of calling decorate with the first student:

>>> decorate(students[0])
(22, {'credits': {'history': 7, 'math': 9, 'physics': 6}, 'id': 0})

If we decorate all the students like this, we can sort them on their total amount of credits by just sorting the list of tuples. In order to apply the decoration to each item in students, we call map(decorate, students). Then we sort the result, and then we undecorate in a similar fashion. If you have gone through the previous chapters correctly, understanding this code shouldn't be too hard.

Printing students after running the whole code yields:

$ python decorate.sort.undecorate.py
[{'credits': {'chemistry': 10, 'history': 8, 'physics': 9}, 'id': 2},
 {'credits': {'latin': 10, 'math': 6, 'physics': 7}, 'id': 1},
 {'credits': {'history': 7, 'math': 9, 'physics': 6}, 'id': 0},
 {'credits': {'geography': 7, 'math': 5, 'physics': 5}, 'id': 3}]

And you can see, by the order of the student objects, that they have indeed been sorted by the sum of their credits.

For more on the decorate-sort-undecorate idiom, there's a very nice introduction in the sorting how-to section of the official Python documentation (https://docs.python.org/3.7/howto/sorting.html#the-old-way-using-decorate-sort-undecorate).

One thing to notice about the sorting part: what if two or more students share the same total sum? The sorting algorithm would then proceed to sort the tuples by comparing the student objects with each other. This doesn't make any sense: in Python 3, dictionaries don't support ordering comparisons, so the sort would fail with a TypeError, and with other kinds of objects it could lead to unpredictable results. If you want to be sure to avoid this issue, one simple solution is to create a three-tuple instead of a two-tuple, having the sum of credits in the first position, the position of the student object in the students list in the second one, and the student object itself in the third one. This way, if the sum of credits is the same, the tuples will be sorted against the position, which will always be different and therefore enough to resolve the sorting between any pair of tuples.
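Here is a sketch of that three-tuple variation. The fifth student (id=4) is a hypothetical addition whose credits sum to 22, the same as student 0, so that there is actually a tie to resolve; itertools.count is used here as one possible way to supply the positions.

```python
from itertools import count

students = [
    dict(id=0, credits=dict(math=9, physics=6, history=7)),
    dict(id=1, credits=dict(math=6, physics=7, latin=10)),
    dict(id=2, credits=dict(history=8, physics=9, chemistry=10)),
    dict(id=3, credits=dict(math=5, physics=5, geography=7)),
    dict(id=4, credits=dict(math=9, physics=7, history=6)),  # hypothetical tie
]

def decorate(position, student):
    # 3-tuple: (sum of credits, position in the list, student)
    return (sum(student['credits'].values()), position, student)

def undecorate(decorated_student):
    # discard sum and position, return the original student dict
    return decorated_student[2]

# map takes one item from count() and one from students at each step
decorated = sorted(map(decorate, count(), students), reverse=True)
ranking = [student['id'] for student in map(undecorate, decorated)]
print(ranking)  # prints: [2, 1, 4, 0, 3]
```

Note that, because of reverse=True, tied students come out with the higher position first (4 before 0). If you need ties in their original order, you could decorate with the negated position instead.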

zip

We've already covered zip in the previous chapters, so let's just define it properly and then I want to show you how you could combine it with map.

According to the Python documentation:

zip(*iterables) returns an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The iterator stops when the shortest input iterable is exhausted. With a single iterable argument, it returns an iterator of 1-tuples. With no arguments, it returns an empty iterator.

Let's see an example:

# zip.grades.py
>>> grades = [18, 23, 30, 27]
>>> avgs = [22, 21, 29, 24]
>>> _(zip(avgs, grades))
[(22, 18), (21, 23), (29, 30), (24, 27)]
>>> _(map(lambda *a: a, avgs, grades))  # equivalent to zip
[(22, 18), (21, 23), (29, 30), (24, 27)]

In the preceding code, we're zipping together the average and the grade for the last exam, for each student. Notice how easy it is to reproduce zip using map (last two instructions of the example). Here as well, to visualize results we have to use our _ alias.

A simple example on the combined use of map and zip could be a way of calculating the element-wise maximum amongst sequences, that is, the maximum of the first element of each sequence, then the maximum of the second one, and so on:

# maxims.py
>>> a = [5, 9, 2, 4, 7]
>>> b = [3, 7, 1, 9, 2]
>>> c = [6, 8, 0, 5, 3]
>>> maxs = map(lambda n: max(*n), zip(a, b, c))
>>> _(maxs)
[6, 9, 2, 9, 7]

Notice how easy it is to calculate the max values of three sequences. zip is not strictly needed of course, we could just use map. Sometimes it's hard, when showing a simple example, to grasp why using a technique might be good or bad. We forget that we aren't always in control of the source code, we might have to use a third-party library, which we can't change the way we want. Having different ways to work with data is therefore really helpful.
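For the curious, here is what the map-only version could look like. Since map accepts several iterables and passes one element from each to the function, max can be handed to it directly; this is a sketch of one possible rewrite, using the same three lists.

```python
a = [5, 9, 2, 4, 7]
b = [3, 7, 1, 9, 2]
c = [6, 8, 0, 5, 3]

# map + zip: each zipped tuple is unpacked into max
maxs_zip = list(map(lambda n: max(*n), zip(a, b, c)))

# map only: max receives one element from each list at every step
maxs_map = list(map(max, a, b, c))

print(maxs_map)  # prints: [6, 9, 2, 9, 7]
print(maxs_zip == maxs_map)  # prints: True
```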

filter

According to the Python documentation:

filter(function, iterable) constructs an iterator from those elements of iterable for which function returns True. iterable may be either a sequence, a container which supports iteration, or an iterator. If function is None, the identity function is assumed, that is, all elements of iterable that are false are removed.

Let's see a very quick example:

# filter.py
>>> test = [2, 5, 8, 0, 0, 1, 0]
>>> _(filter(None, test))
[2, 5, 8, 1]
>>> _(filter(lambda x: x, test))  # equivalent to previous one
[2, 5, 8, 1]
>>> _(filter(lambda x: x > 4, test))  # keep only items > 4
[5, 8]

In the preceding code, notice how the second call to filter is equivalent to the first one. If we pass a function that takes one argument and returns the argument itself, only those arguments that evaluate to True in a Boolean context will be kept, therefore this behavior is exactly the same as passing None. It's often a very good exercise to mimic some of the built-in Python behaviors. When you succeed, you can say you fully understand how Python behaves in a specific situation.
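In that spirit, here is one possible way to mimic filter. The name my_filter is made up for this sketch, and the implementation relies on a generator expression, a construct explained later in this chapter; it is not how the real built-in is implemented, just a function that behaves the same way on these inputs.

```python
def my_filter(function, iterable):
    """Hypothetical imitation of the built-in filter."""
    if function is None:
        function = bool  # None means "keep the items that are truthy"
    # Lazily produce only the items for which function(item) is truthy.
    return (item for item in iterable if function(item))

test = [2, 5, 8, 0, 0, 1, 0]
print(list(my_filter(None, test)))  # prints: [2, 5, 8, 1]
print(list(my_filter(lambda x: x > 4, test)))  # prints: [5, 8]
```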

Armed with map, zip, and filter (and several other functions from the Python standard library) we can massage sequences very effectively. But those functions are not the only way to do it. So let's see one of the nicest features of Python: comprehensions.

Comprehensions

Comprehensions are a concise notation for performing some operation on a collection of elements, and/or selecting a subset of them that meet some condition. They are borrowed from the functional programming language Haskell (https://www.haskell.org/), and contribute to giving Python a functional flavor, together with iterators and generators.

Python offers you different types of comprehensions: list, dict, and set. We'll concentrate on the first one for now, and then it will be easy to explain the other two.

Let's start with a very simple example. I want to calculate a list with the squares of the first 10 natural numbers. How would you do it? There are a couple of equivalent ways:

# squares.map.py
# If you code like this you are not a Python dev! ;)
>>> squares = []
>>> for n in range(10):
...     squares.append(n ** 2)
...
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# This is better, one line, nice and readable
>>> squares = map(lambda n: n**2, range(10))
>>> _(squares)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The preceding example should be nothing new for you. Let's see how to achieve the same result using a list comprehension:

# squares.comprehension.py
>>> [n ** 2 for n in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

As simple as that. Isn't it elegant? Basically we have put a for loop within square brackets. Let's now filter out the odd squares. I'll show you how to do it with map and filter first, and then using a list comprehension again:

# even.squares.py
# using map and filter
sq1 = list(
    map(lambda n: n ** 2, filter(lambda n: not n % 2, range(10)))
)
# equivalent, but using list comprehensions
sq2 = [n ** 2 for n in range(10) if not n % 2]

print(sq1, sq1 == sq2)  # prints: [0, 4, 16, 36, 64] True

I think that now the difference in readability is evident. The list comprehension reads much better. It's almost English: give me all squares (n ** 2) for n between 0 and 9 if n is even.

According to the Python documentation:

A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it.

Nested comprehensions

Let's see an example of nested loops. It's very common when dealing with algorithms to have to iterate on a sequence using two placeholders. The first one runs through the whole sequence, left to right. The second one as well, but it starts from the first one, instead of 0. The concept is that of testing all pairs without duplication. Let's see the classical for loop equivalent:

# pairs.for.loop.py
items = 'ABCD'
pairs = []

for a in range(len(items)):
    for b in range(a, len(items)):
        pairs.append((items[a], items[b]))

If you print pairs at the end, you get:

$ python pairs.for.loop.py
[('A', 'A'), ('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'B'), ('B', 'C'), ('B', 'D'),
 ('C', 'C'), ('C', 'D'), ('D', 'D')]

All the tuples with the same letter are those where b is at the same position as a. Now, let's see how we can translate this in a list comprehension:

# pairs.list.comprehension.py
items = 'ABCD'
pairs = [(items[a], items[b])
    for a in range(len(items)) for b in range(a, len(items))]

This version is just two lines long and achieves the same result. Notice that in this particular case, because the for loop over b has a dependency on a, it must follow the for loop over a in the comprehension. If you swap them around, you'll get a name error.
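You can verify the name error for yourself with the sketch below, which assumes that no variable called a already exists in the surrounding scope. As a side note, for this specific "all pairs without duplication" pattern the standard library also offers itertools.combinations_with_replacement, which I mention here only as a cross-check.

```python
from itertools import combinations_with_replacement

items = 'ABCD'
pairs = [(items[a], items[b])
         for a in range(len(items)) for b in range(a, len(items))]

# Swapped clauses: the loop over b now comes first, and range(a, ...)
# is evaluated before any a exists (assuming no outer variable named a).
try:
    [(items[a], items[b])
     for b in range(a, len(items)) for a in range(len(items))]
    raised = False
except NameError as err:
    raised = True
    print('NameError:', err)

# The same pairs, in the same order, from the standard library.
print(pairs == list(combinations_with_replacement(items, 2)))  # prints: True
```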

Filtering a comprehension

We can apply filtering to a comprehension. Let's do it first with filter. Let's find all Pythagorean triples whose short sides are numbers smaller than 10. We obviously don't want to test a combination twice, and therefore we'll use a trick similar to the one we saw in the previous example:

# pythagorean.triple.py
from math import sqrt
# this will generate all possible pairs
mx = 10
triples = [(a, b, sqrt(a**2 + b**2))
    for a in range(1, mx) for b in range(a, mx)]
# this will filter out all non pythagorean triples
triples = list(
    filter(lambda triple: triple[2].is_integer(), triples))

print(triples)  # prints: [(3, 4, 5.0), (6, 8, 10.0)]

A Pythagorean triple is a triple (a, b, c) of integer numbers satisfying the equation a² + b² = c².

In the preceding code, we generated a list of three-tuples, triples. Each tuple contains two integer numbers (the legs), and the hypotenuse of the Pythagorean triangle whose legs are the first two numbers in the tuple. For example, when a is 3 and b is 4, the tuple will be (3, 4, 5.0), and when a is 5 and b is 7, the tuple will be (5, 7, 8.602325267042627).

After having all the triples done, we need to filter out all those that don't have a hypotenuse that is an integer number. In order to do this, we filter based on float_number.is_integer() being True. This means that of the two example tuples I showed you before, the one with 5.0 hypotenuse will be retained, while the one with the 8.602325267042627 hypotenuse will be discarded.

This is good, but I don't like that the triple has two integer numbers and a float. They are supposed to be all integers, so let's use map to fix this:

# pythagorean.triple.int.py
from math import sqrt
mx = 10
triples = [(a, b, sqrt(a**2 + b**2))
    for a in range(1, mx) for b in range(a, mx)]
triples = filter(lambda triple: triple[2].is_integer(), triples)
# this will make the third number in the tuples integer
triples = list(
    map(lambda triple: triple[:2] + (int(triple[2]), ), triples))

print(triples)  # prints: [(3, 4, 5), (6, 8, 10)]

Notice the step we added. We take each element in triples and we slice it, taking only the first two elements in it. Then, we concatenate the slice with a one-tuple, in which we put the integer version of that float number that we didn't like. Seems like a lot of work, right? Indeed it is. Let's see how to do all this with a list comprehension:

# pythagorean.triple.comprehension.py
from math import sqrt
# this step is the same as before
mx = 10
triples = [(a, b, sqrt(a**2 + b**2))
    for a in range(1, mx) for b in range(a, mx)]
# here we combine filter and map in one CLEAN list comprehension
triples = [(a, b, int(c)) for a, b, c in triples if c.is_integer()]

print(triples)  # prints: [(3, 4, 5), (6, 8, 10)]

I know. It's much better, isn't it? It's clean, readable, shorter. In other words, it's elegant.
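If you feel like experimenting further, the two steps can even be merged into a single comprehension. The following sketch is a variation of mine rather than part of the original example: it uses math.isqrt (available from Python 3.8), which works on integers only and therefore avoids floats altogether.

```python
from math import isqrt  # integer square root, Python 3.8+

mx = 10
triples = [
    (a, b, isqrt(a**2 + b**2))
    for a in range(1, mx)
    for b in range(a, mx)
    # keep the pair only if a**2 + b**2 is a perfect square
    if isqrt(a**2 + b**2) ** 2 == a**2 + b**2
]

print(triples)  # prints: [(3, 4, 5), (6, 8, 10)]
```

Whether this reads better than the two-step version is debatable, since the sum of squares is computed more than once; it's shown only to illustrate that a comprehension can generate and filter in one go.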

I'm going quite fast here, as anticipated in the Summary of Chapter 4, Functions, the Building Blocks of Code. Are you playing with this code? If not, I suggest you do. It's very important that you play around, break things, change things, see what happens. Make sure you have a clear understanding of what is going on. You want to become a ninja, right?

dict comprehensions

Dictionary and set comprehensions work exactly like the list ones, only there is a little difference in the syntax. The following example will suffice to explain everything you need to know:

# dictionary.comprehensions.py
from string import ascii_lowercase
lettermap = dict((c, k) for k, c in enumerate(ascii_lowercase, 1))

If you print lettermap, you will see the following (I omitted the middle results, you get the gist):

$ python dictionary.comprehensions.py
{'a': 1,
 'b': 2,
 ...
 'y': 25,
 'z': 26}

What happens in the preceding code is that we're feeding the dict constructor with a comprehension (technically, a generator expression, we'll see it in a bit). We tell the dict constructor to make key/value pairs from each tuple in the comprehension. We enumerate the sequence of all lowercase ASCII letters, starting from 1, using enumerate. Piece of cake. There is also another way to do the same thing, which is closer to the other dictionary syntax:

lettermap = {c: k for k, c in enumerate(ascii_lowercase, 1)}

It does exactly the same thing, with a slightly different syntax that highlights a bit more of the key: value part.

Dictionaries do not allow duplication in the keys, as shown in the following example:

# dictionary.comprehensions.duplicates.py
word = 'Hello'
swaps = {c: c.swapcase() for c in word}
print(swaps)  # prints: {'H': 'h', 'e': 'E', 'l': 'L', 'o': 'O'}

We create a dictionary with keys, the letters in the 'Hello' string, and values of the same letters, but with the case swapped. Notice there is only one 'l': 'L' pair.

The constructor doesn't complain, it simply reassigns duplicates to the latest value. Let's make this clearer with another example; let's assign to each key its position in the string:

# dictionary.comprehensions.positions.py
word = 'Hello'
positions = {c: k for k, c in enumerate(word)}
print(positions)  # prints: {'H': 0, 'e': 1, 'l': 3, 'o': 4}

Notice the value associated with the letter 'l': 3. The 'l': 2 pair isn't there; it has been overridden by 'l': 3.

set comprehensions

The set comprehensions are very similar to list and dictionary ones. Python allows either the set() constructor or the explicit {} syntax to be used. Let's see one quick example:

# set.comprehensions.py
word = 'Hello'
letters1 = set(c for c in word)
letters2 = {c for c in word}
print(letters1)  # prints: {'H', 'o', 'e', 'l'}
print(letters1 == letters2)  # prints: True

Notice how for set comprehensions, as for dictionaries, duplication is not allowed and therefore the resulting set has only four letters. Also, notice that the expressions assigned to letters1 and letters2 produce equivalent sets.

The syntax used to create letters2 is very similar to the one we can use to create a dictionary comprehension. You can spot the difference only by the fact that dictionaries require keys and values, separated by colons, while sets don't.
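Because the two syntaxes are so close, it's worth checking what you actually get. The short sketch below shows that the colon makes the difference inside a comprehension, and also that a pair of empty braces creates a dictionary, not a set. The variable names are arbitrary.

```python
letters = {c for c in 'Hello'}               # no colon: set comprehension
lettermap = {c: c.upper() for c in 'Hello'}  # colon: dict comprehension

empty_braces = {}    # careful: this is an empty dict
empty_set = set()    # an empty set needs the constructor

print(type(letters).__name__, type(lettermap).__name__)  # prints: set dict
print(type(empty_braces).__name__, type(empty_set).__name__)  # prints: dict set
```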

Generators

Generators are a very powerful tool that Python gifts us with. They are based on the concepts of iteration, as we said before, and they allow for coding patterns that combine elegance with efficiency.

Generators are of two types:

Generator functions: These are very similar to regular functions, but instead of returning results through return statements, they use yield, which allows them to suspend and resume their state between each call
Generator expressions: These are very similar to the list comprehensions we've seen in this chapter, but instead of returning a list they return an object that produces results one by one

Generator functions

Generator functions behave like regular functions in all respects, except for one difference. Instead of collecting results and returning them at once, they are automatically turned by Python into iterators that yield results one at a time when you call next on them.

This is all very theoretical so, let's make it clear why such a mechanism is so powerful, and then let's see an example.

Say I asked you to count out loud from 1 to 1,000,000. You start, and at some point I ask you to stop. After some time, I ask you to resume. At this point, what is the minimum information you need to be able to resume correctly? Well, you need to remember the last number you called. If I stopped you after 31,415, you will just go on with 31,416, and so on.

The point is, you don't need to remember all the numbers you said before 31,415, nor do you need them to be written down somewhere. Well, you may not know it, but you're behaving like a generator already!

Take a good look at the following code:

# first.n.squares.py
def get_squares(n):  # classic function approach
    return [x ** 2 for x in range(n)]
print(get_squares(10))

def get_squares_gen(n):  # generator approach
    for x in range(n):
        yield x ** 2  # we yield, we don't return
print(list(get_squares_gen(10)))

The result of the two print statements will be the same: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]. But there is a huge difference between the two functions. get_squares is a classic function that collects all the squares of numbers in [0, n) in a list, and returns it. On the other hand, get_squares_gen is a generator, and behaves very differently. Each time the interpreter reaches the yield line, its execution is suspended. The only reason those print statements return the same result is because we fed get_squares_gen to the list constructor, which exhausts the generator completely by asking the next element until a StopIteration is raised. Let's see this in detail:

# first.n.squares.manual.py
def get_squares_gen(n):
    for x in range(n):
        yield x ** 2

squares = get_squares_gen(4)  # this creates a generator object
print(squares)  # <generator object get_squares_gen at 0x10dd...>
print(next(squares))  # prints: 0
print(next(squares))  # prints: 1
print(next(squares))  # prints: 4
print(next(squares))  # prints: 9
# the following raises StopIteration, the generator is exhausted,
# any further call to next will keep raising StopIteration
print(next(squares))

In the preceding code, each time we call next on the generator object, we either start it (first next) or make it resume from the last suspension point (any other next).

The first time we call next on it, we get 0, which is the square of 0, then 1, then 4, then 9, and since the for loop stops after that (n is 4), then the generator naturally ends. A classic function would at that point just return None, but in order to comply with the iteration protocol, a generator will instead raise a StopIteration exception.

This explains how a for loop works. When you write for k in range(n), what happens under the hood is that the for loop gets an iterator out of range(n) and starts calling next on it, until StopIteration is raised, which tells the for loop that the iteration has reached its end.
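To see the mechanism spelled out, here is a rough imitation of a for loop written with a while loop. The function manual_for is a hypothetical helper for this sketch; the real for statement is implemented by the interpreter, but it follows the same steps.

```python
def manual_for(iterable, action):
    """Hypothetical helper: do by hand what a for loop does."""
    iterator = iter(iterable)      # 1. get an iterator out of the iterable
    while True:
        try:
            item = next(iterator)  # 2. ask for the next element
        except StopIteration:
            break                  # 3. iteration has reached its end
        action(item)               # the "body" of the loop

collected = []
manual_for(range(4), collected.append)
print(collected)  # prints: [0, 1, 2, 3]
```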

Having this behavior built into every iteration aspect of Python makes generators even more powerful because once we write them, we'll be able to plug them into whatever iteration mechanism we want.

At this point, you're probably asking yourself why you would want to use a generator instead of a regular function. Well, the title of this chapter should suggest the answer. I'll talk about performance later, so for now let's concentrate on another aspect: sometimes generators allow you to do something that wouldn't be possible with a simple list. For example, say you want to analyze all permutations of a sequence. If the sequence has a length of N, then the number of its permutations is N!. This means that if the sequence is 10 elements long, the number of permutations is 3,628,800. But a sequence of 20 elements would have 2,432,902,008,176,640,000 permutations. They grow factorially.

Now imagine you have a classic function that is attempting to calculate all permutations, put them in a list, and return it to you. With 10 elements, it would already have to build and hold a list of over 3.6 million items, but for 20 elements there is simply no way that it can be done.

On the other hand, a generator function will be able to start the computation and give you back the first permutation, then the second, and so on. Of course you won't have the time to parse them all, there are too many, but at least you'll be able to work with some of them.
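You can observe this laziness with itertools.permutations from the standard library. It is an iterator rather than a generator function, but it produces its results one at a time in the same spirit. The sketch below takes only the first three permutations of a 20-element sequence using itertools.islice.

```python
from itertools import islice, permutations
from math import factorial

sequence = range(20)
print(factorial(20))  # prints: 2432902008176640000

lazy = permutations(sequence)        # returns immediately, nothing stored
first_three = list(islice(lazy, 3))  # compute just three of them

for permutation in first_three:
    print(permutation)
```

Calling list(permutations(sequence)) instead would try to build the whole collection, which is the approach the text above rules out.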

Remember when we were talking about the break statement in for loops? When we found a number dividing a candidate prime we were breaking the loop, and there was no need to go on.

Sometimes it's exactly the same, only the amount of data you have to iterate over is so huge that you cannot keep it all in memory in a list. In this case, generators are invaluable: they make possible what wouldn't be possible otherwise.

So, in order to save memory (and time), use generator functions whenever possible.

It's also worth noting that you can use the return statement in a generator function. It will cause a StopIteration exception to be raised, effectively ending the iteration. This is extremely important. If a return statement were actually to make the function hand a value back to the code that is iterating over it, it would break the iteration protocol. Python's consistency prevents this, and allows us great ease when coding. Let's see a quick example:

#gen.yield.return.py

defgeometric_progression(a,q):

k=0

whileTrue:

result=a*q**k

ifresult<=100000:

yieldresult

else:

return

k+=1

forningeometric_progression(2,5):

print(n)

Theprecedingcodeyieldsalltermsofthegeometricprogression,a,aq,aq2,aq3,....Whentheprogressionproducesatermthatisgreaterthan100000,thegeneratorstops(withareturnstatement).Runningthecodeproducesthefollowingresult:

$pythongen.yield.return.py

2

10

50

250

1250

6250

31250

Thenexttermwouldhavebeen156250,whichistoobig.

SpeakingaboutStopIteration,asofPython3.5,thewaythatexceptionsarehandledingeneratorshaschanged.Tounderstandtheimplicationsofthechangeisprobablyaskingtoomuchofyouatthispoint,sojustknowthatyoucanreadallaboutitinPEP479(https://legacy.python.org/dev/peps/pep-0479/).

Going beyond next

At the beginning of this chapter, I told you that generator objects are based on the iteration protocol. We'll see in Chapter 6, OOP, Decorators, and Iterators a complete example of how to write a custom iterator/iterable object. For now, I just want you to understand how next() works.

What happens when you call next(generator) is that you're calling the generator.__next__() method. Remember, a method is just a function that belongs to an object, and objects in Python can have special methods. __next__() is just one of these and its purpose is to return the next element of the iteration, or to raise StopIteration when the iteration is over and there are no more elements to return.

If you recall, in Python, an object's special methods are also called magic methods, or dunder (from "double underscore") methods.

When we write a generator function, Python automatically transforms it into an object that is very similar to an iterator, and when we call next(generator), that call is transformed into generator.__next__(). Let's revisit the previous example about generating squares:

# first.n.squares.manual.method.py
def get_squares_gen(n):
    for x in range(n):
        yield x ** 2

squares = get_squares_gen(3)
print(squares.__next__())  # prints: 0
print(squares.__next__())  # prints: 1
print(squares.__next__())  # prints: 4
# a fourth call to squares.__next__() would raise StopIteration: the
# generator is exhausted, and any further call will keep raising StopIteration

The result is exactly the same as in the previous example, only this time instead of using the next(squares) proxy call, we're directly calling squares.__next__().

Generator objects also have three other methods that allow us to control their behavior: send, throw, and close. send allows us to communicate a value back to the generator object, while throw and close, respectively, allow us to raise an exception within the generator and close it. Their use is quite advanced and I won't be covering them here in detail, but I want to spend a few words on send, with a simple example:

# gen.send.preparation.py
def counter(start=0):
    n = start
    while True:
        yield n
        n += 1

c = counter()
print(next(c))  # prints: 0
print(next(c))  # prints: 1
print(next(c))  # prints: 2

The preceding code creates a generator object that will run forever. You can keep calling next on it, and it will never stop. Alternatively, you can put it in a for loop, for example, for n in counter(): ..., and it will go on forever as well. But what if you wanted to stop it at some point? One solution is to use a variable to control the while loop. Something such as this:

# gen.send.preparation.stop.py
stop = False

def counter(start=0):
    n = start
    while not stop:
        yield n
        n += 1

c = counter()
print(next(c))  # prints: 0
print(next(c))  # prints: 1
stop = True
print(next(c))  # raises StopIteration

This will do it. We start with stop = False, and until we change it to True, the generator will just keep going, like before. The moment we change stop to True though, the while loop will exit, and the next call will raise a StopIteration exception. This trick works, but I don't like it. We depend on an external variable, and this can lead to issues: what if another function changes that stop? Moreover, the code is scattered. In a nutshell, this isn't good enough.

We can make it better by using generator.send(). When we call generator.send(), the value that we feed to send will be passed into the generator, execution is resumed, and we can fetch it via the yield expression. This is all very complicated when explained with words, so let's see an example:

# gen.send.py
def counter(start=0):
    n = start
    while True:
        result = yield n             # A
        print(type(result), result)  # B
        if result == 'Q':
            break
        n += 1

c = counter()
print(next(c))         # C
print(c.send('Wow!'))  # D
print(next(c))         # E
print(c.send('Q'))     # F

Execution of the preceding code produces the following:

$ python gen.send.py
0
<class 'str'> Wow!
1
<class 'NoneType'> None
2
<class 'str'> Q
Traceback (most recent call last):
  File "gen.send.py", line 14, in <module>
    print(c.send('Q'))  # F
StopIteration

I think it's worth going through this code line by line, as if we were executing it, to see whether we can understand what's going on.

We start the generator execution with a call to next (#C). Within the generator, n is set to the same value as start. The while loop is entered, execution stops (#A) and n (0) is yielded back to the caller. 0 is printed on the console.

We then call send (#D), execution resumes, and result is set to 'Wow!' (still #A), then its type and value are printed on the console (#B). result is not 'Q', therefore n is incremented by 1 and execution goes back to the while condition, which, being True, evaluates to True (that wasn't hard to guess, right?). Another loop cycle begins, execution stops again (#A), and n (1) is yielded back to the caller. 1 is printed on the console.

At this point, we call next (#E), execution is resumed again (#A), and because we are not sending anything to the generator explicitly, Python behaves exactly like functions that are not using the return statement; the yield n expression (#A) returns None. result therefore is set to None, and its type and value are yet again printed on the console (#B). Execution continues, result is not 'Q' so n is incremented by 1, and we start another loop again. Execution stops again (#A) and n (2) is yielded back to the caller. 2 is printed on the console.

And now for the grand finale: we call send again (#F), but this time we pass in 'Q', therefore when execution is resumed, result is set to 'Q' (#A). Its type and value are printed on the console (#B), and then finally the if clause evaluates to True and the while loop is stopped by the break statement. The generator naturally terminates, which means a StopIteration exception is raised. You can see the print of its traceback on the last few lines printed on the console.

This is not at all simple to understand at first, so if it's not clear to you, don't be discouraged. You can keep reading on and then you can come back to this example after some time.

Using send allows for interesting patterns, and it's worth noting that send can also be used to start the execution of a generator (provided you call it with None).
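Here is a small sketch of one such pattern, a running average, which also shows a generator being started with send(None) instead of next. The running_average name and its logic are my own illustration and do not come from the examples above:

```python
def running_average():
    """Yield the average of all the values sent in so far."""
    total = 0
    count = 0
    average = None
    while True:
        value = yield average   # receives whatever the caller sends
        total += value
        count += 1
        average = total / count

avg = running_average()
avg.send(None)          # same as next(avg): runs up to the first yield
print(avg.send(10))     # prints: 10.0
print(avg.send(20))     # prints: 15.0
print(avg.send(60))     # prints: 30.0
```

The first send has to pass None because, before the generator has reached its first yield, there is no yield expression waiting to receive a value; sending anything else at that point raises a TypeError.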

The yield from expression

Another interesting construct is the yield from expression. This expression allows you to yield values from a subiterator. Its use allows for quite advanced patterns, so let's just see a very quick example of it:

# gen.yield.for.py
def print_squares(start, end):
    for n in range(start, end):
        yield n ** 2

for n in print_squares(2, 5):
    print(n)

The previous code prints the numbers 4, 9, 16 on the console (on separate lines). By now, I expect you to be able to understand it by yourself, but let's quickly recap what happens. The for loop outside the function gets an iterator from print_squares(2, 5) and calls next on it until iteration is over. Every time next is called on the generator, execution resumes and is then suspended again on yield n ** 2, which returns the square of the current n. Let's see how we can transform this code benefiting from the yield from expression:

# gen.yield.from.py
def print_squares(start, end):
    yield from (n ** 2 for n in range(start, end))

for n in print_squares(2, 5):
    print(n)

This code produces the same result, but as you can see yield from is actually running a subiterator, (n ** 2 ...). The yield from expression returns to the caller each value the subiterator is producing. It's shorter and it reads better.
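One place where yield from tends to pay off is when a generator delegates to several subiterators in sequence. The following sketch assumes two helper generators, squares and cubes, and a combining function, squares_then_cubes; all three names are hypothetical and only serve the illustration:

```python
def squares(start, end):
    for n in range(start, end):
        yield n ** 2

def cubes(start, end):
    for n in range(start, end):
        yield n ** 3

def squares_then_cubes(start, end):
    # Each yield from drains one subiterator before moving to the next.
    yield from squares(start, end)
    yield from cubes(start, end)

print(list(squares_then_cubes(2, 5)))  # prints: [4, 9, 16, 8, 27, 64]
```

Without yield from, squares_then_cubes would need two explicit for loops, each with its own yield.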

Generator expressions

Let's now talk about the other technique to generate values one at a time.

The syntax is exactly the same as list comprehensions, only, instead of wrapping the comprehension with square brackets, you wrap it with round brackets. That is called a generator expression.

In general, generator expressions behave like equivalent list comprehensions, but there is one very important thing to remember: generators allow for one iteration only, then they will be exhausted. Let's see an example:

# generator.expressions.py
>>> cubes = [k**3 for k in range(10)]  # regular list
>>> cubes
[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]
>>> type(cubes)
<class 'list'>
>>> cubes_gen = (k**3 for k in range(10))  # create as generator
>>> cubes_gen
<generator object <genexpr> at 0x103fb5a98>
>>> type(cubes_gen)
<class 'generator'>
>>> _(cubes_gen)  # this will exhaust the generator
[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]
>>> _(cubes_gen)  # nothing more to give
[]

Look at the line in which the generator expression is created and assigned the name cubes_gen. You can see it's a generator object. In order to see its elements, we can use a for loop, a manual set of calls to next, or simply feed it to a list constructor, which is what I did (remember I'm using _ as an alias).

Notice how, once the generator has been exhausted, there is no way to recover the same elements from it again. We need to recreate it if we want to use it from scratch again.

In the next few examples, let's see how to reproduce map and filter using generator expressions:

# gen.map.py
def adder(*n):
    return sum(n)

s1 = sum(map(lambda *n: adder(*n), range(100), range(1, 101)))
s2 = sum(adder(*n) for n in zip(range(100), range(1, 101)))

In the previous example, s1 and s2 are exactly the same: they are the sum of adder(0, 1), adder(1, 2), adder(2, 3), and so on, which translates to 1 + 3 + 5 + ... + 199. The syntax is different, though I find the generator expression to be much more readable:

# gen.filter.py
cubes = [x**3 for x in range(10)]

odd_cubes1 = filter(lambda cube: cube % 2, cubes)
odd_cubes2 = (cube for cube in cubes if cube % 2)

In the previous example, odd_cubes1 and odd_cubes2 are the same: they generate a sequence of odd cubes. Yet again, I prefer the generator syntax. This should be evident when things get a little more complicated:

# gen.map.filter.py
N = 20
cubes1 = map(
    lambda n: (n, n**3),
    filter(lambda n: n % 3 == 0 or n % 5 == 0, range(N))
)
cubes2 = (
    (n, n**3) for n in range(N) if n % 3 == 0 or n % 5 == 0)

The preceding code creates two generators, cubes1 and cubes2. They are exactly the same, and return two-tuples $(n, n^3)$ when n is a multiple of 3 or 5.

If you print list(cubes1), you get: [(0, 0), (3, 27), (5, 125), (6, 216), (9, 729), (10, 1000), (12, 1728), (15, 3375), (18, 5832)].

See how much better the generator expression reads? It may be debatable when things are very simple, but as soon as you start nesting functions a bit, like we did in this example, the superiority of the generator syntax is evident. It's shorter, simpler, and more elegant.

Now, let me ask you a question: what is the difference between the following lines of code?

# sum.example.py
s1 = sum([n**2 for n in range(10**6)])
s2 = sum((n**2 for n in range(10**6)))
s3 = sum(n**2 for n in range(10**6))

Strictly speaking, they all produce the same sum. The expressions to get s2 and s3 are exactly the same because the brackets in s2 are redundant. They are both generator expressions inside the sum function. The expression to get s1 is different though. Inside sum, we find a list comprehension. This means that in order to calculate s1, the sum function has to call next on a list a million times.

Do you see where we're losing time and memory? Before sum can start calling next on that list, the list needs to have been created, which is a waste of time and space. It's much better for sum to call next on a simple generator expression. There is no need to have all the numbers from range(10**6) stored in a list.
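A rough way to see the difference in space is sys.getsizeof, which reports the size in bytes of the object itself. Bear in mind that for a list it counts only the container (not the integers it refers to), and that the exact figures vary across Python versions and platforms, so the sketch below compares the sizes instead of quoting them:

```python
import sys

squares_list = [n ** 2 for n in range(10 ** 5)]   # all values built up front
squares_gen = (n ** 2 for n in range(10 ** 5))    # values produced on demand

list_size = sys.getsizeof(squares_list)  # size of the list object itself
gen_size = sys.getsizeof(squares_gen)    # size of the generator object

print(list_size > gen_size)                   # prints: True
print(sum(squares_list) == sum(squares_gen))  # prints: True
```

The generator object stays the same small size no matter how many values it will eventually produce, while the list grows with the number of elements.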

So, watch out for extra brackets when you write your expressions: sometimes it's easy to skip over these details, which makes our code very different. If you don't believe me, check out the following code:

# sum.example.2.py
s = sum([n**2 for n in range(10**8)])  # this is killed
# s = sum(n**2 for n in range(10**8))  # this succeeds
print(s)  # prints: 333333328333333350000000

Try running the preceding example. If I run the first line on my old Linux box with 8 GB RAM, this is what I get:

$ python sum.example.2.py
Killed

On the other hand, if I comment out the first line, and uncomment the second one, this is the result:

$ python sum.example.2.py
333333328333333350000000

Sweet generator expressions. The difference between the two lines is that in the first one, a list with the squares of the first hundred million numbers must be made before being able to sum them up. That list is huge, and we ran out of memory (at least, my box did; if yours doesn't, try a bigger number), therefore the operating system kills the process for us. Sad face.

But when we remove the square brackets, we don't have a list any more. The sum function receives 0, 1, 4, 9, and so on until the last one, and sums them up. No problems, happy face.

Some performance considerations

So, we've seen that we have many different ways to achieve the same result. We can use any combination of map, zip, and filter, or choose to go with a comprehension, or maybe choose to use a generator, either function or expression. We may even decide to go with for loops; when the logic to apply to each running parameter isn't simple, they may be the best option.

Other than readability concerns though, let's talk about performance. When it comes to performance, usually there are two factors that play a major role: space and time.

Space means the size of the memory that a data structure is going to take up. The best way to choose is to ask yourself if you really need a list (or tuple) or if a simple generator function would work as well. If the answer is yes, go with the generator, it'll save a lot of space. The same goes for functions; if you don't actually need them to return a list or tuple, then you can transform them into generator functions as well.

Sometimes, you will have to use lists (or tuples), for example there are algorithms that scan sequences using multiple pointers or maybe they run over the sequence more than once. A generator function (or expression) can be iterated over only once and then it's exhausted, so in these situations, it wouldn't be the right choice.

Time is a bit harder than space because it depends on more variables and therefore it isn't possible to state that X is faster than Y with absolute certainty for all cases. However, based on tests run on Python today, we can say that on average, map exhibits performance similar to list comprehensions and generator expressions, while for loops are consistently slower.

In order to appreciate the reasoning behind these statements fully, we need to understand how Python works, and this is a bit outside the scope of this book, as it's too technical in detail. Let's just say that, roughly speaking, map and list comprehensions do most of their looping work at C-language speed within the interpreter, while a Python for loop runs more of its steps as Python bytecode within the Python Virtual Machine, which is often much slower.

There are several different implementations of Python. The original one, and still the most common one, is CPython (https://github.com/python/cpython), which is written in C. C is one of the most powerful and popular programming languages still used today.

How about we do a small exercise and try to find out whether the claims I made are accurate? I will write a small piece of code that collects the results of divmod(a, b) for a certain set of integer pairs, (a, b). I will use the time function from the time module to calculate the elapsed time of the operations that I will perform:

# performances.py
from time import time
mx = 5000

t = time()  # start time for the for loop
floop = []
for a in range(1, mx):
    for b in range(a, mx):
        floop.append(divmod(a, b))
print('for loop: {:.4f} s'.format(time() - t))  # elapsed time

t = time()  # start time for the list comprehension
compr = [
    divmod(a, b) for a in range(1, mx) for b in range(a, mx)]
print('list comprehension: {:.4f} s'.format(time() - t))

t = time()  # start time for the generator expression
gener = list(
    divmod(a, b) for a in range(1, mx) for b in range(a, mx))
print('generator expression: {:.4f} s'.format(time() - t))

As you can see, we're creating three lists: floop, compr, and gener. Running the code produces the following:

$ python performances.py
for loop: 4.4814 s
list comprehension: 3.0210 s
generator expression: 3.4334 s

The list comprehension runs in ~67% of the time taken by the for loop. That's impressive. The generator expression came quite close to that, with a good ~77%. The reason the generator expression is slower is that we need to feed it to the list() constructor, and this has a little bit more overhead compared to a sheer list comprehension. If I didn't have to retain the results of those calculations, a generator would probably have been a more suitable option.

It is interesting to notice that, within the body of the for loop, we're appending data to a list. This implies that Python does the work, behind the scenes, of resizing it every now and then, allocating space for items to be appended. I guessed that creating a list of zeros, and simply filling it with the results, might have sped up the for loop, but I was wrong. Check it for yourself, you just need mx * (mx - 1) // 2 elements to be preallocated.

Let's see a similar example that compares a for loop and a map call:

# performances.map.py
from time import time
mx = 2 * 10 ** 7

t = time()
absloop = []
for n in range(mx):
    absloop.append(abs(n))
print('for loop: {:.4f} s'.format(time() - t))

t = time()
abslist = [abs(n) for n in range(mx)]
print('list comprehension: {:.4f} s'.format(time() - t))

t = time()
absmap = list(map(abs, range(mx)))
print('map: {:.4f} s'.format(time() - t))

This code is conceptually very similar to the previous example. The only thing that has changed is that we're applying the abs function instead of the divmod one, and we have only one loop instead of two nested ones. Execution gives the following result:

$ python performances.map.py
for loop: 3.8948 s
list comprehension: 1.8594 s
map: 1.1548 s

And map wins the race: ~62% of the list comprehension and ~30% of the for loop. Take these results with a pinch of salt, as things might be different according to various factors, such as OS and Python version. But in general, I think it's safe to say that these results are good enough for having an idea when it comes to coding for performance.
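If you want to repeat this kind of measurement with a little less noise, the standard library's timeit module can run each candidate several times for you. The sketch below is an alternative to the time()-based approach used above, not a replacement for it; the three function names and the smaller MX value are assumptions chosen to keep the run short:

```python
from timeit import timeit

def with_loop(mx):
    result = []
    for n in range(mx):
        result.append(abs(n))
    return result

def with_comprehension(mx):
    return [abs(n) for n in range(mx)]

def with_map(mx):
    return list(map(abs, range(mx)))

MX = 10 ** 5  # much smaller than in the text, to keep the run short
for func in (with_loop, with_comprehension, with_map):
    # total time, in seconds, for 20 calls of func(MX)
    elapsed = timeit(lambda: func(MX), number=20)
    print('{}: {:.4f} s'.format(func.__name__, elapsed))
```

The absolute numbers will differ from machine to machine, so what matters is how the three timings compare with each other on the same run.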

Apart from the case-by-case little differences though, it's quite clear that the for loop option is the slowest one, so let's see why we still want to use it.

Don't overdo comprehensions and generators

We've seen how powerful list comprehensions and generator expressions can be. And they are, don't get me wrong, but the feeling that I have when I deal with them is that their complexity grows exponentially. The more you try to do within a single comprehension or a generator expression, the harder it becomes to read, understand, and therefore maintain or change.

If you check the Zen of Python again, there are a few lines that I think are worth keeping in mind when dealing with optimized code:

>>> import this
...
Explicit is better than implicit.
Simple is better than complex.
...
Readability counts.
...
If the implementation is hard to explain, it's a bad idea.
...

Comprehensions and generator expressions are more implicit than explicit, can be quite difficult to read and understand, and they can be hard to explain. Sometimes you have to break them apart using the inside-out technique, to understand what's going on.

To give you an example, let's talk a bit more about Pythagorean triples. Just to remind you, a Pythagorean triple is a tuple of positive integers (a, b, c) such that $a^2 + b^2 = c^2$.

We saw how to calculate them in the Filtering a comprehension section, but we did it in a very inefficient way because we were scanning all pairs of numbers below a certain threshold, calculating the hypotenuse, and filtering out those that were not producing a triple.

A better way to get a list of Pythagorean triples is to generate them directly. There are many different formulas you can use to do this; we'll use the Euclidean formula.

This formula says that any triple (a, b, c), where $a = m^2 - n^2$, $b = 2mn$, $c = m^2 + n^2$, with m and n positive integers such that $m > n$, is a Pythagorean triple. For example, when m = 2 and n = 1, we find the smallest triple: (3, 4, 5).
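Before building on the formula, it can be reassuring to check it numerically for a few pairs. The euclid_triple helper below is a hypothetical name introduced only for this check:

```python
def euclid_triple(m, n):
    """Return the triple generated by the pair m > n > 0."""
    a = m ** 2 - n ** 2
    b = 2 * m * n
    c = m ** 2 + n ** 2
    return a, b, c

for m, n in [(2, 1), (3, 2), (4, 1)]:
    a, b, c = euclid_triple(m, n)
    # the last value printed tells us whether a^2 + b^2 == c^2 holds
    print((m, n), (a, b, c), a ** 2 + b ** 2 == c ** 2)
```

The three pairs produce (3, 4, 5), (5, 12, 13), and (15, 8, 17), and the check comes out True for each of them.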

There is one catch though: consider the triple (6, 8, 10) that is just like (3, 4, 5) with all the numbers multiplied by 2. This triple is definitely Pythagorean, since $6^2 + 8^2 = 10^2$, but we can derive it from (3, 4, 5) simply by multiplying each of its elements by 2. Same goes for (9, 12, 15), (12, 16, 20), and in general for all the triples that we can write as $(3k, 4k, 5k)$, with k being a positive integer greater than 1.

A triple that cannot be obtained by multiplying the elements of another one by some factor, k, is called primitive. Another way of stating this is: if the three elements of a triple are coprime, then the triple is primitive. Two numbers are coprime when they don't share any prime factor amongst their divisors, that is, their greatest common divisor (GCD) is 1. For example, 3 and 5 are coprime, while 3 and 6 are not, because they are both divisible by 3.

So, the Euclidean formula tells us that if m and n are coprime, and m - n is odd, the triple they generate is primitive. In the following example, we will write a generator expression to calculate all the primitive Pythagorean triples whose hypotenuse (c) is less than or equal to some integer, N. This means we want all triples for which $m^2 + n^2 \le N$. When n is 1, the formula looks like this: $m^2 \le N - 1$, which means we can approximate the calculation with an upper bound of $m \le N^{1/2}$.

So, to recap: m must be greater than n, they must also be coprime, and their difference m - n must be odd. Moreover, in order to avoid useless calculations, we'll put the upper bound for m at floor(sqrt(N)) + 1.

The floor function for a real number, x, gives the maximum integer, n, such that $n \le x$, for example, floor(3.8) = 3, floor(13.1) = 13. Taking floor(sqrt(N)) + 1 means taking the integer part of the square root of N and adding a minimal margin just to make sure we don't miss any numbers.
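The floor values quoted above, and the resulting bound for N = 50, can be checked quickly with the math module. This is only a sanity check; the variable name upper is my own:

```python
from math import floor, sqrt

print(floor(3.8))    # prints: 3
print(floor(13.1))   # prints: 13
print(floor(4.0))    # prints: 4

N = 50
upper = floor(sqrt(N)) + 1
print(upper)                  # prints: 8
print(list(range(1, upper)))  # prints: [1, 2, 3, 4, 5, 6, 7]
```

Since range excludes its upper limit, using floor(sqrt(N)) + 1 as the limit lets m reach floor(sqrt(N)) itself.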

Let's put all of this into code, step by step. Let's start by writing a simple gcd function that uses Euclid's algorithm:

# functions.py
def gcd(a, b):
    """Calculate the Greatest Common Divisor of (a, b)."""
    while b != 0:
        a, b = b, a % b
    return a

The explanation of Euclid's algorithm is available on the web, so I won't spend any time here talking about it; we need to focus on the generator expression. The next step is to use the knowledge we gathered before to generate a list of primitive Pythagorean triples:

# pythagorean.triple.generation.py
from functions import gcd

N = 50

triples = sorted(                                    # 1
    ((a, b, c) for a, b, c in (                      # 2
        ((m**2 - n**2), (2 * m * n), (m**2 + n**2))  # 3
        for m in range(1, int(N**.5) + 1)            # 4
        for n in range(1, m)                         # 5
        if (m - n) % 2 and gcd(m, n) == 1            # 6
    ) if c <= N), key=lambda *triple: sum(*triple)   # 7
)

There you go. It's not easy to read, so let's go through it line by line. At #3, we start a generator expression that is creating triples. You can see from #4 and #5 that we're looping on m in [1, M) with M being the integer part of sqrt(N), plus 1. On the other hand, n loops within [1, m), to respect the m > n rule. It's worth noting how I calculated sqrt(N), that is, N**.5, which is just another way to do it that I wanted to show you.

At #6, you can see the filtering conditions to make the triples primitive: (m - n) % 2 evaluates to True when (m - n) is odd, and gcd(m, n) == 1 means m and n are coprime. With these in place, we know the triples will be primitive. This takes care of the innermost generator expression. The outermost one starts at #2, and finishes at #7. We take the triples (a, b, c) in (...innermost generator...) such that c <= N.

Finally, at #1 we apply sorting, to present the list in order. At #7, after the outermost generator expression is closed, you can see that we specify the sorting key to be the sum a + b + c. This is just my personal preference, there is no mathematical reason behind it.

So, what do you think? Was it straightforward to read? I don't think so. And believe me, this is still a simple example; I have seen much worse in my career.

This kind of code is difficult to understand, debug, and modify. It shouldn't find a place in a professional environment.

So, let's see if we can rewrite this code into something more readable:

# pythagorean.triple.generation.for.py
from functions import gcd

def gen_triples(N):
    for m in range(1, int(N**.5) + 1):            # 1
        for n in range(1, m):                     # 2
            if (m - n) % 2 and gcd(m, n) == 1:    # 3
                c = m**2 + n**2                   # 4
                if c <= N:                        # 5
                    a = m**2 - n**2               # 6
                    b = 2 * m * n                 # 7
                    yield (a, b, c)               # 8

triples = sorted(
    gen_triples(50), key=lambda *triple: sum(*triple))  # 9

This is so much better. Let's go through it, line by line. You'll see how much easier it is to understand.

We start looping at #1 and #2, in exactly the same way we were looping in the previous example. On line #3, we have the filtering for primitive triples. On line #4, we deviate a bit from what we were doing before: we calculate c, and on line #5, we filter on c being less than or equal to N. Only when c satisfies that condition do we calculate a and b, and yield the resulting tuple. It's always good to delay all calculations for as long as possible so that we don't waste time and CPU. On the last line, we apply sorting with the same key we were using in the generator expression example.

I hope you agree, this example is easier to understand. And I promise you, if you have to modify the code one day, you'll find that modifying this one is easy, while modifying the other version will take much longer (and it will be more error-prone).

If you print the results of both examples (they are the same), you will get this:

[(3, 4, 5), (5, 12, 13), (15, 8, 17), (7, 24, 25), (21, 20, 29), (35, 12, 37), (9, 40, 41)]

The moral of the story is, try and use comprehensions and generator expressions as much as you can, but if the code starts to be complicated to modify or to read, you may want to refactor it into something more readable. Your colleagues will thank you.

Name localization

Now that we are familiar with all types of comprehensions and generator expressions, let's talk about name localization within them. Python 3.* localizes loop variables in all four forms of comprehensions: list, dict, set, and generator expressions. This behavior is therefore different from that of the for loop. Let's see a simple example to show all the cases:

# scopes.py
A = 100
ex1 = [A for A in range(5)]
print(A)  # prints: 100

ex2 = list(A for A in range(5))
print(A)  # prints: 100

ex3 = dict((A, 2 * A) for A in range(5))
print(A)  # prints: 100

ex4 = set(A for A in range(5))
print(A)  # prints: 100

s = 0
for A in range(5):
    s += A
print(A)  # prints: 4

In the preceding code, we declare a global name, A = 100, and then we exercise the four comprehensions: list, generator expression, dictionary, and set. None of them alters the global name, A. Conversely, you can see at the end that the for loop modifies it. The last print statement prints 4.

Let's see what happens if A wasn't there:

# scopes.noglobal.py
ex1 = [A for A in range(5)]
print(A)  # breaks: NameError: name 'A' is not defined

The preceding code would work the same with any of the four types of comprehensions. After we run the first line, A is not defined in the global namespace. Once again, the for loop behaves differently:

# scopes.for.py
s = 0
for A in range(5):
    s += A
print(A)  # prints: 4
print(globals())

The preceding code shows that after a for loop, if the loop variable wasn't defined before it, we can find it in the global frame. To make sure of it, let's take a peek at it by calling the globals() built-in function:

$ python scopes.for.py
4
{'__name__': '__main__', '__doc__': None, ..., 's': 10, 'A': 4}

Together with a lot of other boilerplate stuff that I have omitted, we can spot 'A': 4.

Generation behavior in built-ins

Among the built-in types, the generation behavior is now quite common. This is a major difference between Python 2 and Python 3. A lot of functions, such as map, zip, and filter, have been transformed so that they return objects that behave like iterables. The idea behind this change is that if you need to make a list of those results, you can always wrap the call in list(), and you're done. On the other hand, if you just need to iterate and want to keep the impact on memory as light as possible, you can use those functions safely.

Another notable example is the range function. In Python 2 it returns a list, and there is another function called xrange that returns an object that you can iterate on, which generates the numbers on the fly. In Python 3 this function has gone, and range now behaves like it.

But this concept, in general, is now quite widespread. You can find it in the open() function, which is used to operate on file objects (we'll see it in Chapter 7, Files and Data Persistence), but also in enumerate, in the dictionary keys, values, and items methods, and several other places.
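A few lines at the console are enough to observe this lazy behavior in Python 3. The variable names below are only illustrative:

```python
numbers = range(10 ** 12)          # no list of a trillion items is built
print(numbers[10])                 # prints: 10

doubled = map(lambda n: n * 2, numbers)   # nothing is computed yet
print(next(doubled))               # prints: 0
print(next(doubled))               # prints: 2

pairs = zip('abc', range(3))
print(list(pairs))                 # prints: [('a', 0), ('b', 1), ('c', 2)]
print(list(pairs))                 # prints: [] (the iterator is exhausted)
```

Note that map and zip return iterators, which can be consumed only once, while a range object can be indexed and iterated over repeatedly.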

It all makes sense: Python's aim is to try to reduce the memory footprint by avoiding wasting space wherever possible, especially in those functions and methods that are used extensively in most situations.

Do you remember the beginning of this chapter? I said that it makes more sense to optimize the performance of code that has to deal with a lot of objects, rather than shaving off a few milliseconds from a function that we call twice a day.

One last example

Before we finish this chapter, I'll show you a simple problem that I used to submit to candidates for a Python developer role in a company I used to work for.

The problem is the following: given the sequence 0 1 1 2 3 5 8 13 21 ..., write a function that would return the terms of this sequence up to some limit, N.

If you haven't recognized it, that is the Fibonacci sequence, which is defined as $F(0) = 0$, $F(1) = 1$ and, for any $n > 1$, $F(n) = F(n-1) + F(n-2)$. This sequence is excellent to test knowledge about recursion, memoization techniques, and other technical details, but in this case, it was a good opportunity to check whether the candidate knew about generators.

Let's start from a rudimentary version of a function, and then improve on it:

# fibonacci.first.py
def fibonacci(N):
    """Return all fibonacci numbers up to N."""
    result = [0]
    next_n = 1
    while next_n <= N:
        result.append(next_n)
        next_n = sum(result[-2:])
    return result

print(fibonacci(0))   # [0]
print(fibonacci(1))   # [0, 1, 1]
print(fibonacci(50))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

From the top: we set up the result list to a starting value of [0]. Then we start the iteration from the next element (next_n), which is 1. While the next element is not greater than N, we keep appending it to the list and calculating the next. We calculate the next element by taking a slice of the last two elements in the result list and passing it to the sum function. Add some print statements here and there if this is not clear to you, but by now I would expect it not to be an issue.

When the condition of the while loop evaluates to False, we exit the loop and return result. You can see the result of those print statements in the comments next to each of them.

At this point, I would ask the candidate the following question: What if I just wanted to iterate over those numbers? A good candidate would then change the code to what you'll find here (an excellent candidate would have started with it!):

# fibonacci.second.py
def fibonacci(N):
    """Return all fibonacci numbers up to N."""
    yield 0
    if N == 0:
        return
    a = 0
    b = 1
    while b <= N:
        yield b
        a, b = b, a + b

print(list(fibonacci(0)))   # [0]
print(list(fibonacci(1)))   # [0, 1, 1]
print(list(fibonacci(50)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

This is actually one of the solutions I was given. I don't know why I kept it, but I'm glad I did so I can show it to you. Now, the fibonacci function is a generator function. First we yield 0, then if N is 0, we return (this will cause a StopIteration exception to be raised). If that's not the case, we start iterating, yielding b at every loop cycle, and then updating a and b. All we need in order to be able to produce the next element of the sequence is the past two: a and b, respectively.

This code is much better, has a lighter memory footprint, and all we have to do to get a list of Fibonacci numbers is to wrap the call with list(), as usual. But what about elegance? I can't leave it like that, can I? Let's try the following:

# fibonacci.elegant.py
def fibonacci(N):
    """Return all fibonacci numbers up to N."""
    a, b = 0, 1
    while a <= N:
        yield a
        a, b = b, a + b

Much better. The whole body of the function is four lines, five if you count the docstring. Notice how, in this case, using tuple assignment (a, b = 0, 1 and a, b = b, a + b) helps in making the code shorter, and more readable.
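To convince ourselves that the elegant version behaves like the previous ones, and to see its laziness at work, we can run a quick check. The function is repeated here so that the snippet is self-contained; itertools.islice and the first_five name are my additions, not part of the original exercise:

```python
from itertools import islice

def fibonacci(N):
    """Yield all Fibonacci numbers up to N."""
    a, b = 0, 1
    while a <= N:
        yield a
        a, b = b, a + b

print(list(fibonacci(50)))   # prints: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Because the function is lazy, a huge limit costs nothing until we iterate.
first_five = list(islice(fibonacci(10 ** 100), 5))
print(first_five)            # prints: [0, 1, 1, 2, 3]
```

With the list-based version, the same enormous limit would force the whole list to be built before we could look at the first element.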

Summary

In this chapter, we explored the concept of iteration and generation a bit more deeply. We looked at the map, zip, and filter functions in detail, and learned how to use them as an alternative to a regular for loop approach.

Then we covered the concept of comprehensions, for lists, dictionaries, and sets. We explored their syntax and how to use them as an alternative to both the classic for loop approach and also to the use of the map, zip, and filter functions.

Finally, we talked about the concept of generation, in two forms: generator functions and expressions. We learned how to save time and space by using generation techniques and saw how they can make possible what wouldn't normally be if we used a conventional approach based on lists.

We talked about performance, and saw that for loops are last in terms of speed, but they provide the best readability and flexibility to change. On the other hand, functions such as map and filter, and list comprehensions, can be much faster.

The complexity of the code written using these techniques grows exponentially so, in order to favor readability and ease of maintainability, we still need to use the classic for loop approach at times. Another difference is in the name localization, where the for loop behaves differently from all other types of comprehensions.

The next chapter will be all about objects and classes. It is structurally similar to this one, in that we won't explore many different subjects, just a few of them, but we'll try to dive into them a little bit more deeply.

Make sure you understand the concepts of this chapter before moving on to the next one. We're building a wall brick by brick, and if the foundation is not solid, we won't get very far.

OOP, Decorators, and Iterators

La classe non è acqua. (Class will out)

– Italian saying

I could probably write a whole book about object-oriented programming (OOP) and classes. In this chapter, I'm facing the hard challenge of finding the balance between breadth and depth. There are simply too many things to tell, and plenty of them would take more than this whole chapter if I described them in depth. Therefore, I will try to give you what I think is a good panoramic view of the fundamentals, plus a few things that may come in handy in the next chapters. Python's official documentation will help in filling the gaps.

In this chapter, we are going to cover the following topics:

Decorators
OOP with Python
Iterators

DecoratorsInChapter5,SavingTimeandMemory,Imeasuredtheexecutiontimeofvariousexpressions.Ifyourecall,Ihadtoinitializeavariabletothestarttime,andsubtractitfromthecurrenttimeafterexecutioninordertocalculatetheelapsedtime.Ialsoprinteditontheconsoleaftereachmeasurement.Thatwasverytedious.

Everytimeyoufindyourselfrepeatingthings,analarmbellshouldgooff.Canyouputthatcodeinafunctionandavoidrepetition?Theanswermostofthetimeisyes,solet'slookatanexample:

#decorators/time.measure.start.py

fromtimeimportsleep,time

deff():

sleep(.3)

defg():

sleep(.5)

t=time()

f()

print('ftook:',time()-t)#ftook:0.3001396656036377

t=time()

g()

print('gtook:',time()-t)#gtook:0.5039339065551758

Intheprecedingcode,Idefinedtwofunctions,fandg,whichdonothingbutsleep(by0.3and0.5seconds,respectively).Iusedthesleepfunctiontosuspendtheexecutionforthedesiredamountoftime.Noticehowthetimemeasureisprettyaccurate.Now,howdoweavoidrepeatingthatcodeandthosecalculations?Onefirstpotentialapproachcouldbethefollowing:

#decorators/time.measure.dry.py

fromtimeimportsleep,time

deff():

sleep(.3)

defg():

sleep(.5)

defmeasure(func):

t=time()

func()

print(func.__name__,'took:',time()-t)

measure(f)#ftook:0.30434322357177734

measure(g)#gtook:0.5048270225524902

Ah, much better now. The whole timing mechanism has been encapsulated into a function so we don't repeat code. We print the function name dynamically and it's easy enough to code. What if we need to pass arguments to the function we measure? This code would get just a bit more complicated, so let's see an example:

# decorators/time.measure.arguments.py
from time import sleep, time

def f(sleep_time=0.1):
    sleep(sleep_time)

def measure(func, *args, **kwargs):
    t = time()
    func(*args, **kwargs)
    print(func.__name__, 'took:', time() - t)

measure(f, sleep_time=0.3)  # f took: 0.30056095123291016
measure(f, 0.2)  # f took: 0.2033553123474121

Now, f is expecting to be fed sleep_time (with a default value of 0.1), so we don't need g anymore. I also had to change the measure function so that it now accepts a function, any variable positional arguments, and any variable keyword arguments. In this way, whatever we call measure with, we redirect those arguments to the call to func we do inside.

This is very good, but we can push it a little bit further. Let's say we want to somehow have that timing behavior built into the f function, so that we could just call it and have that measure taken. Here's how we could do it:

# decorators/time.measure.deco1.py
from time import sleep, time

def f(sleep_time=0.1):
    sleep(sleep_time)

def measure(func):
    def wrapper(*args, **kwargs):
        t = time()
        func(*args, **kwargs)
        print(func.__name__, 'took:', time() - t)
    return wrapper

f = measure(f)  # decoration point

f(0.2)  # f took: 0.20372915267944336
f(sleep_time=0.3)  # f took: 0.30455899238586426
print(f.__name__)  # wrapper  <- ouch!

The preceding code is probably not so straightforward. Let's see what happens here. The magic is in the decoration point. We basically reassign f with whatever is returned by measure when we call it with f as an argument. Within measure, we define another function, wrapper, and then we return it. So, the net effect is that after the decoration point, when we call f, we're actually calling wrapper. Since the wrapper inside is calling func, which is f, we are actually closing the loop like that. If you don't believe me, take a look at the last line.

wrapper is actually... a wrapper. It takes variable positional and keyword arguments, and calls f with them. It also does the time measurement calculation around the call.

This technique is called decoration, and measure is, effectively, a decorator. This paradigm became so popular and widely used that at some point, Python added a special syntax for it (check out https://www.python.org/dev/peps/pep-0318/). Let's explore three cases: one decorator, two decorators, and one decorator that takes arguments:

# decorators/syntax.py
def func(arg1, arg2, ...):
    pass
func = decorator(func)

# is equivalent to the following:

@decorator
def func(arg1, arg2, ...):
    pass

Basically, instead of manually reassigning the function to what was returned by the decorator, we prepend the definition of the function with the special syntax, @decorator_name.

We can apply multiple decorators to the same function in the following way:

# decorators/syntax.py
def func(arg1, arg2, ...):
    pass
func = deco1(deco2(func))

# is equivalent to the following:

@deco1
@deco2
def func(arg1, arg2, ...):
    pass

When applying multiple decorators, pay attention to the order. In the preceding example, func is decorated with deco2 first, and the result is decorated with deco1. A good rule of thumb is: the closer the decorator is to the function, the sooner it is applied.
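To make the ordering rule tangible, here is a minimal runnable sketch. The two decorators, add_parens and add_brackets, are names I made up for illustration; each one simply wraps the string returned by the decorated function, so the nesting of the brackets in the output reveals the order of application.

```python
def add_parens(func):
    # Hypothetical decorator: wraps the result in round brackets.
    def wrapper(*args, **kwargs):
        return '(' + func(*args, **kwargs) + ')'
    return wrapper

def add_brackets(func):
    # Hypothetical decorator: wraps the result in square brackets.
    def wrapper(*args, **kwargs):
        return '[' + func(*args, **kwargs) + ']'
    return wrapper

@add_brackets   # applied second, so it ends up outermost
@add_parens     # closest to the function, so it is applied first
def word():
    return 'x'

print(word())  # [(x)]
```

Swapping the two @ lines would give ([x]) instead.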

Some decorators can take arguments. This technique is generally used to produce other decorators. Let's look at the syntax, and then we'll see an example of it:

# decorators/syntax.py
def func(arg1, arg2, ...):
    pass
func = decoarg(arg_a, arg_b)(func)

# is equivalent to the following:

@decoarg(arg_a, arg_b)
def func(arg1, arg2, ...):
    pass

As you can see, this case is a bit different. First, decoarg is called with the given arguments, and then its return value (the actual decorator) is called with func. Before I give you another example, let's fix one thing that is bothering me. I don't want to lose the original function name and docstring (and other attributes as well, check the documentation for the details) when I decorate it. But because inside our decorator we return wrapper, the original attributes from func are lost and f ends up being assigned the attributes of wrapper. There is an easy fix for that from the beautiful functools module. I will fix the last example, and I will also rewrite its syntax to use the @ operator:

# decorators/time.measure.deco2.py
from time import sleep, time
from functools import wraps

def measure(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        t = time()
        func(*args, **kwargs)
        print(func.__name__, 'took:', time() - t)
    return wrapper

@measure
def f(sleep_time=0.1):
    """I'm a cat. I love to sleep!"""
    sleep(sleep_time)

f(sleep_time=0.3)  # f took: 0.3010902404785156
print(f.__name__, ':', f.__doc__)  # f : I'm a cat. I love to sleep!

Now we're talking! As you can see, all we need to do is to tell Python that wrapper actually wraps func (by means of the wraps function), and you can see that the original name and docstring are now maintained.
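As a small aside, wraps also stores a reference to the undecorated function in the __wrapped__ attribute of the wrapper. The sketch below uses a do-nothing decorator called passthrough (a name chosen just for this illustration) to show the preserved metadata and how to reach the original function.

```python
from functools import wraps

def passthrough(func):
    # Hypothetical decorator that adds no behavior of its own.
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@passthrough
def greet(name):
    """Return a greeting."""
    return 'Hello, ' + name

print(greet.__name__)    # greet
print(greet.__doc__)     # Return a greeting.

original = greet.__wrapped__  # the undecorated function
print(original('Ada'))   # Hello, Ada
```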

Let's see another example. I want a decorator that prints an error message when the result of a function is greater than a certain threshold. I will also take this opportunity to show you how to apply two decorators at once:

# decorators/two.decorators.py
from time import sleep, time
from functools import wraps

def measure(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        t = time()
        result = func(*args, **kwargs)
        print(func.__name__, 'took:', time() - t)
        return result
    return wrapper

def max_result(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        if result > 100:
            print('Result is too big ({0}). Max allowed is 100.'
                  .format(result))
        return result
    return wrapper

@measure
@max_result
def cube(n):
    return n ** 3

print(cube(2))
print(cube(5))

Take your time in studying the preceding example until you are sure you understand it well. If you do, I don't think there is any decorator you now won't be able to write.

I had to enhance the measure decorator, so that its wrapper now returns the result of the call to func. The max_result decorator does that as well, but before returning, it checks that result is not greater than 100, which is the maximum allowed. I decorated cube with both of them. First, max_result is applied, then measure. Running this code yields this result:

$ python two.decorators.py
cube took: 3.0994415283203125e-06
8

Result is too big (125). Max allowed is 100.
cube took: 1.0013580322265625e-05
125

For your convenience, I have separated the results of the two calls with a blank line. In the first call, the result is 8, which passes the threshold check. The running time is measured and printed. Finally, we print the result (8).

On the second call, the result is 125, so the error message is printed, the result returned, and then it's the turn of measure, which prints the running time again, and finally, we print the result (125).

Had I decorated the cube function with the same two decorators but in a different order, the error message would have followed the line that prints the running time, instead of preceding it.

A decorator factory

Let's simplify this example now, going back to a single decorator: max_result. I want to make it so that I can decorate different functions with different thresholds, as I don't want to write one decorator for each threshold. Let's amend max_result so that it allows us to decorate functions specifying the threshold dynamically:

# decorators/decorators.factory.py
from functools import wraps

def max_result(threshold):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if result > threshold:
                print(
                    'Result is too big ({0}). Max allowed is {1}.'
                    .format(result, threshold))
            return result
        return wrapper
    return decorator

@max_result(75)
def cube(n):
    return n ** 3

print(cube(5))

The preceding code shows you how to write a decorator factory. If you recall, decorating a function with a decorator that takes arguments is the same as writing func = decorator(argA, argB)(func), so when we decorate cube with max_result(75), we're doing cube = max_result(75)(cube).

Let's go through what happens, step by step. When we call max_result(75), we enter its body. A decorator function is defined inside, which takes a function as its only argument. Inside that function, the usual decorator trick is performed. We define wrapper, inside of which we check the result of the original function's call. The beauty of this approach is that from the innermost level, we can still refer to both func and threshold, which allows us to set the threshold dynamically.

wrapper returns result, decorator returns wrapper, and max_result returns decorator. This means that our cube = max_result(75)(cube) call actually becomes cube = decorator(cube). Not just any decorator though, but one for which threshold has a value of 75. This is achieved by a mechanism called closure, which is outside of the scope of this chapter but still very interesting, so I mentioned it for you to do some research on it.
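If you want a head start on that research, the following is a minimal sketch of a closure, unrelated to decorators. The make_multiplier function is a made-up example: each inner function it returns remembers the factor it was created with, in the same way wrapper remembers threshold.

```python
def make_multiplier(factor):
    # The inner function refers to `factor`, a variable that belongs
    # to the enclosing function's scope.
    def multiply(n):
        return n * factor
    return multiply

double = make_multiplier(2)
triple = make_multiplier(3)

print(double(5), triple(5))  # 10 15

# The remembered value is stored in a "cell" attached to the function.
print(double.__closure__[0].cell_contents)  # 2
```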

Running the last example produces the following result:

$ python decorators.factory.py
Result is too big (125). Max allowed is 75.
125

The preceding code allows me to use the max_result decorator with different thresholds at my own will, like this:

# decorators/decorators.factory.py
@max_result(75)
def cube(n):
    return n ** 3

@max_result(100)
def square(n):
    return n ** 2

@max_result(1000)
def multiply(a, b):
    return a * b

Note that every decoration uses a different threshold value.

Decorators are very popular in Python. They are used quite often and they simplify (and beautify, I dare say) the code a lot.

Object-oriented programming (OOP)

It's been quite a long and hopefully nice journey and, by now, we should be ready to explore OOP. I'll use the definition from Kindler, E.; Krivy, I. (2011), Object-oriented simulation of systems with sophisticated control, International Journal of General Systems, and adapt it to Python:

Object-oriented programming (OOP) is a programming paradigm based on the concept of "objects", which are data structures that contain data, in the form of attributes, and code, in the form of functions known as methods. A distinguishing feature of objects is that an object's methods can access and often modify the data attributes of the object with which they are associated (objects have a notion of "self"). In OO programming, computer programs are designed by making them out of objects that interact with one another.

Python has full support for this paradigm. Actually, as we have already said, everything in Python is an object, so this shows that OOP is not just supported by Python, but it's a part of its very core.

The two main players in OOP are objects and classes. Classes are used to create objects (objects are instances of the classes from which they were created), so we could see them as instance factories. When objects are created by a class, they inherit the class attributes and methods. They represent concrete items in the program's domain.

The simplest Python class

I will start with the simplest class you could ever write in Python:

# oop/simplest.class.py
class Simplest():  # when empty, the brackets are optional
    pass

print(type(Simplest))  # what type is this object?
simp = Simplest()  # we create an instance of Simplest: simp
print(type(simp))  # what type is simp?
# is simp an instance of Simplest?
print(type(simp) == Simplest)  # There's a better way for this

Let's run the preceding code and explain it line by line:

$ python simplest.class.py
<class 'type'>
<class '__main__.Simplest'>
True

The Simplest class I defined has only the pass instruction in its body, which means it doesn't have any custom attributes or methods. Brackets after the name are optional if empty. I will print its type (__main__ is the name of the scope in which top-level code executes), and I am aware that, in the comment, I wrote object instead of class. It turns out that, as you can see by the result of that print, classes are actually objects. To be precise, they are instances of type. Explaining this concept would lead us to a talk about metaclasses and metaprogramming, advanced concepts that require a solid grasp of the fundamentals to be understood and are beyond the scope of this chapter. As usual, I mentioned it to leave a pointer for you, for when you'll be ready to dig deeper.

Let's go back to the example: I used Simplest to create an instance, simp. You can see that the syntax to create an instance is the same as we use to call a function. Then we print what type simp belongs to and we verify that simp is in fact an instance of Simplest. I'll show you a better way of doing this later on in the chapter.

Up to now, it's all very simple. What happens when we write class ClassName(): pass, though? Well, what Python does is create a class object and assign it a name. This is very similar to what happens when we declare a function using def.

Class and object namespaces

After the class object has been created (which usually happens when the module is first imported), it basically represents a namespace. We can call that class to create its instances. Each instance inherits the class attributes and methods and is given its own namespace. We already know that, to walk a namespace, all we need to do is to use the dot (.) operator.

Let's look at another example:

# oop/class.namespaces.py
class Person:
    species = 'Human'

print(Person.species)  # Human
Person.alive = True  # Added dynamically!
print(Person.alive)  # True

man = Person()
print(man.species)  # Human (inherited)
print(man.alive)  # True (inherited)

Person.alive = False
print(man.alive)  # False (inherited)

man.name = 'Darth'
man.surname = 'Vader'
print(man.name, man.surname)  # Darth Vader

In the preceding example, I have defined a class attribute called species. Any variable defined in the body of a class is an attribute that belongs to that class. In the code, I have also defined Person.alive, which is another class attribute. You can see that there is no restriction on accessing that attribute from the class. You can see that man, which is an instance of Person, inherits both of them, and reflects them instantly when they change.

man has also two attributes that belong to its own namespace and therefore are called instance attributes: name and surname.

Class attributes are shared among all instances, while instance attributes are not; therefore, you should use class attributes to provide the states and behaviors to be shared by all instances, and use instance attributes for data that belongs just to one specific object.
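The sharing becomes very visible when the class attribute is a mutable object. Here is a small sketch, with a hypothetical Basket class that is not part of the book's source code: the list defined in the class body is a single object seen by every instance, while lists assigned on the instances are independent.

```python
class Basket:
    shared_items = []   # class attribute: one list for all instances

a = Basket()
b = Basket()
a.own_items = []        # instance attributes: one list per object
b.own_items = []

a.shared_items.append('apple')  # mutates the single list on the class
a.own_items.append('pear')      # mutates only a's list

print(b.shared_items)  # ['apple']
print(b.own_items)     # []
print(a.shared_items is Basket.shared_items)  # True
```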

Attribute shadowing

When you search for an attribute in an object, if it is not found, Python keeps searching in the class that was used to create that object (and keeps searching until it's either found or the end of the inheritance chain is reached). This leads to an interesting shadowing behavior. Let's look at another example:

# oop/class.attribute.shadowing.py
class Point:
    x = 10
    y = 7

p = Point()
print(p.x)  # 10 (from class attribute)
print(p.y)  # 7 (from class attribute)

p.x = 12  # p gets its own `x` attribute
print(p.x)  # 12 (now found on the instance)
print(Point.x)  # 10 (class attribute still the same)

del p.x  # we delete instance attribute
print(p.x)  # 10 (now search has to go again to find class attr)

p.z = 3  # let's make it a 3D point
print(p.z)  # 3

print(Point.z)
# AttributeError: type object 'Point' has no attribute 'z'

The preceding code is very interesting. We have defined a class called Point with two class attributes, x and y. When we create an instance, p, you can see that we can print both x and y from the p namespace (p.x and p.y). What happens when we do that is Python doesn't find any x or y attributes on the instance, and therefore searches the class, and finds them there.

Then we give p its own x attribute by assigning p.x = 12. This behavior may appear a bit weird at first, but if you think about it, it's exactly the same as what happens in a function that declares x = 12 when there is a global x = 10 outside. We know that x = 12 won't affect the global one, and for classes and instances, it is exactly the same.

After assigning p.x = 12, when we print it, the search doesn't need to read the class attributes, because x is found on the instance, therefore we get 12 printed out. We also print Point.x, which refers to x in the class namespace.

And then, we delete x from the namespace of p, which means that, on the next line, when we print it again, Python will go again and search for it in the class, because it won't be found in the instance anymore.

The last three lines show you that assigning attributes to an instance doesn't mean that they will be found in the class. Instances get whatever is in the class, but the opposite is not true.

What do you think about putting the x and y coordinates as class attributes? Do you think it was a good idea? What if you added another instance of Point? Would that help to show why class attributes can be very useful?

Me, myself, and I – using the self variable

From within a method defined in a class, we can refer to an instance by means of a special argument, called self by convention. self is always the first argument of an instance method. Let's examine this behavior together with how we can share, not just attributes, but methods with all instances:

# oop/class.self.py
class Square:
    side = 8
    def area(self):  # self is a reference to an instance
        return self.side ** 2

sq = Square()
print(sq.area())  # 64 (side is found on the class)
print(Square.area(sq))  # 64 (equivalent to sq.area())

sq.side = 10
print(sq.area())  # 100 (side is found on the instance)

Note how the area method is used by sq. The two calls, Square.area(sq) and sq.area(), are equivalent, and teach us how the mechanism works. Either you pass the instance to the method call (Square.area(sq)), which within the method will take the name self, or you can use a more comfortable syntax, sq.area(), and Python will translate that for you behind the scenes.
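If you are curious about what "behind the scenes" means, the sketch below peeks at the object Python builds when you write sq.area without calling it. It is a bound method: a small object that remembers both the instance and the plain function defined on the class. The Square class is repeated here only to keep the snippet self-contained.

```python
class Square:
    side = 8

    def area(self):
        return self.side ** 2

sq = Square()
bound = sq.area   # no call yet: this is a bound method object

print(bound.__self__ is sq)            # True (the instance it is bound to)
print(bound.__func__ is Square.area)   # True (the function on the class)
print(bound())                         # 64, same as Square.area(sq)
```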

Let's look at a better example:

# oop/class.price.py
class Price:
    def final_price(self, vat, discount=0):
        """Returns price after applying vat and fixed discount."""
        return (self.net_price * (100 + vat) / 100) - discount

p1 = Price()
p1.net_price = 100
print(Price.final_price(p1, 20, 10))  # 110.0 (100 * 1.2 - 10)
print(p1.final_price(20, 10))  # equivalent

The preceding code shows you that nothing prevents us from using arguments when declaring methods. We can use the exact same syntax as we used with the function, but we need to remember that the first argument will always be the instance. We don't necessarily need to call it self, but it's the convention, and this is one of the few cases where it's very important to abide by it.

Initializing an instance

Have you noticed how, before calling p1.final_price(...), we had to assign net_price to p1? There is a better way to do it. In other languages, this would be called a constructor, but in Python, it's not. It is actually an initializer, since it works on an already-created instance, and therefore it's called __init__. It's a magic method, which is run right after the object is created. Python objects also have a __new__ method, which is the actual constructor. In practice, it's not so common to have to override it, though; it's a practice that is mostly used when coding metaclasses, which, as we mentioned, is a fairly advanced topic that we won't explore in the book:

# oop/class.init.py
class Rectangle:
    def __init__(self, side_a, side_b):
        self.side_a = side_a
        self.side_b = side_b

    def area(self):
        return self.side_a * self.side_b

r1 = Rectangle(10, 4)
print(r1.side_a, r1.side_b)  # 10 4
print(r1.area())  # 40

r2 = Rectangle(7, 3)
print(r2.area())  # 21

Things are finally starting to take shape. When an object is created, the __init__ method is automatically run for us. In this case, I coded it so that when we create an object (by calling the class name like a function), we pass arguments to the creation call, like we would on any regular function call. The way we pass parameters follows the signature of the __init__ method, and therefore, in the two creation statements, 10 and 7 will be side_a for r1 and r2, respectively, while 4 and 3 will be side_b. You can see that the calls to area() from r1 and r2 reflect that they have different instance attributes. Setting up objects in this way is much nicer and more convenient.
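To see the difference between the constructor and the initializer with your own eyes, here is a sketch that overrides both just to record when they run. The Traced class and its events list are invented for this illustration; you would rarely need to write __new__ in everyday code.

```python
class Traced:
    events = []  # hypothetical log, shared at class level

    def __new__(cls, *args, **kwargs):
        # The constructor: creates and returns the new, empty instance.
        cls.events.append('__new__')
        return object.__new__(cls)

    def __init__(self, value):
        # The initializer: receives the already-created instance as self.
        self.events.append('__init__')
        self.value = value

t = Traced(42)
print(Traced.events)  # ['__new__', '__init__']
print(t.value)        # 42
```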

OOP is about code reuse

By now it should be pretty clear: OOP is all about code reuse. We define a class, we create instances, and those instances use methods that are defined only in the class. They will behave differently according to how the instances have been set up by the initializer.

Inheritance and composition

But this is just half of the story: OOP is much more powerful. We have two main design constructs to exploit: inheritance and composition.

Inheritance means that two objects are related by means of an Is-A type of relationship. On the other hand, composition means that two objects are related by means of a Has-A type of relationship. It's all very easy to explain with an example:

# oop/class_inheritance.py
class Engine:
    def start(self):
        pass

    def stop(self):
        pass

class ElectricEngine(Engine):  # Is-A Engine
    pass

class V8Engine(Engine):  # Is-A Engine
    pass

class Car:
    engine_cls = Engine

    def __init__(self):
        self.engine = self.engine_cls()  # Has-A Engine

    def start(self):
        print(
            'Starting engine {0} for car {1}... Wroom, wroom!'
            .format(
                self.engine.__class__.__name__,
                self.__class__.__name__)
        )
        self.engine.start()

    def stop(self):
        self.engine.stop()

class RaceCar(Car):  # Is-A Car
    engine_cls = V8Engine

class CityCar(Car):  # Is-A Car
    engine_cls = ElectricEngine

class F1Car(RaceCar):  # Is-A RaceCar and also Is-A Car
    pass  # engine_cls same as parent

car = Car()
racecar = RaceCar()
citycar = CityCar()
f1car = F1Car()
cars = [car, racecar, citycar, f1car]

for car in cars:
    car.start()

""" Prints:
Starting engine Engine for car Car... Wroom, wroom!
Starting engine V8Engine for car RaceCar... Wroom, wroom!
Starting engine ElectricEngine for car CityCar... Wroom, wroom!
Starting engine V8Engine for car F1Car... Wroom, wroom!
"""

The preceding example shows you both the Is-A and Has-A types of relationships between objects. First of all, let's consider Engine. It's a simple class that has two methods, start and stop. We then define ElectricEngine and V8Engine, which both inherit from Engine. You can see that by the fact that when we define them, we put Engine within the brackets after the class name.

This means that both ElectricEngine and V8Engine inherit attributes and methods from the Engine class, which is said to be their base class.

The same happens with cars. Car is a base class for both RaceCar and CityCar. RaceCar is also the base class for F1Car. Another way of saying this is that F1Car inherits from RaceCar, which inherits from Car. Therefore, F1Car Is-A RaceCar and RaceCar Is-A Car. Because of the transitive property, we can say that F1Car Is-A Car as well. CityCar too, Is-A Car.

When we define class A(B): pass, we say A is the child of B, and B is the parent of A. Parent and base are synonyms, as are child and derived. Also, we say that a class inherits from another class, or that it extends it.

This is the inheritance mechanism.

On the other hand, let's go back to the code. Each car class has a class attribute, engine_cls, which is a reference to the engine class we want to assign to each type of car. Car has a generic Engine, while the two race cars have a powerful V8 engine, and the city car has an electric one.

When a car is created, in the initializer method, __init__, we create an instance of whatever engine class is assigned to the car, and set it as the engine instance attribute.

It makes sense to have engine_cls shared among all class instances because it's quite likely that all instances of the same car class will have the same kind of engine. On the other hand, it wouldn't be good to have one single engine (an instance of any Engine class) as a class attribute, because we would be sharing one engine among all instances, which is incorrect.
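The following sketch contrasts the two options. It is a simplified variant of the example, not the book's code: I gave Engine a running flag so that the sharing is observable, and the two car classes, SharedEngineCar and OwnEngineCar, are hypothetical names.

```python
class Engine:
    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

class SharedEngineCar:
    engine = Engine()   # ONE engine instance, created with the class

class OwnEngineCar:
    engine_cls = Engine  # the class is shared...

    def __init__(self):
        self.engine = self.engine_cls()  # ...but each car gets its own engine

a, b = SharedEngineCar(), SharedEngineCar()
a.engine.start()
print(b.engine.running)  # True: starting a's engine also "started" b's

c, d = OwnEngineCar(), OwnEngineCar()
c.engine.start()
print(d.engine.running)  # False: d's engine is a separate object
```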

The type of relationship between a car and its engine is a Has-A type. A car Has-A engine. This is called composition, and reflects the fact that objects can be made of many other objects. A car Has-A engine, gears, wheels, a frame, doors, seats, and so on.

When designing OOP code, it is of vital importance to describe objects in this way so that we can use inheritance and composition correctly to structure our code in the best way.

Notice how I had to avoid having dots in the class_inheritance.py script name, as dots in module names make imports difficult. Most modules in the source code of the book are meant to be run as standalone scripts, therefore I chose to add dots to enhance readability when possible, but in general, you want to avoid dots in your module names.

Before we leave this paragraph, let's check whether I told you the truth with another example:

# oop/class.issubclass.isinstance.py
from class_inheritance import Car, RaceCar, F1Car

car = Car()
racecar = RaceCar()
f1car = F1Car()
cars = [(car, 'car'), (racecar, 'racecar'), (f1car, 'f1car')]
car_classes = [Car, RaceCar, F1Car]

for car, car_name in cars:
    for class_ in car_classes:
        belongs = isinstance(car, class_)
        msg = 'is a' if belongs else 'is not a'
        print(car_name, msg, class_.__name__)

""" Prints:
car is a Car
car is not a RaceCar
car is not a F1Car
racecar is a Car
racecar is a RaceCar
racecar is not a F1Car
f1car is a Car
f1car is a RaceCar
f1car is a F1Car
"""

As you can see, car is just an instance of Car, while racecar is an instance of RaceCar (and of Car, by extension) and f1car is an instance of F1Car (and of both RaceCar and Car, by extension). A banana is an instance of Banana. But, also, it is a Fruit. Also, it is Food, right? This is the same concept. To check whether an object is an instance of a class, use the isinstance function. It is recommended over sheer type comparison: (type(object) == Class).
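Here is a minimal sketch of why isinstance is preferred. Two empty classes are redefined so that the snippet runs on its own: the type comparison only matches the exact class, while isinstance also honors inheritance, and it accepts a tuple of classes as well.

```python
class Car:
    pass

class RaceCar(Car):
    pass

racecar = RaceCar()

print(type(racecar) == Car)             # False: exact class only
print(isinstance(racecar, Car))         # True: RaceCar Is-A Car
print(isinstance(racecar, (int, Car)))  # True: any class in the tuple
```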

Notice I have left out the prints you get when instantiating the cars. We saw them in the previous example.

Let's also check inheritance – same setup, different logic in the for loops:

# oop/class.issubclass.isinstance.py
for class1 in car_classes:
    for class2 in car_classes:
        is_subclass = issubclass(class1, class2)
        msg = '{0} a subclass of'.format(
            'is' if is_subclass else 'is not')
        print(class1.__name__, msg, class2.__name__)

""" Prints:
Car is a subclass of Car
Car is not a subclass of RaceCar
Car is not a subclass of F1Car
RaceCar is a subclass of Car
RaceCar is a subclass of RaceCar
RaceCar is not a subclass of F1Car
F1Car is a subclass of Car
F1Car is a subclass of RaceCar
F1Car is a subclass of F1Car
"""

Interestingly, we learn that a class is a subclass of itself. Check the output of the preceding example to see that it matches the explanation I provided.

One thing to notice about conventions is that class names are always written using CapWords, which means ThisWayIsCorrect, as opposed to functions and methods, which are written this_way_is_correct. Also, when in the code, you want to use a name that is a Python-reserved keyword or a built-in function or class, the convention is to add a trailing underscore to the name. In the first for loop example, I'm looping through the class names using for class_ in ..., because class is a reserved word. But you already knew all this because you have thoroughly studied PEP 8, right?

To help you picture the difference between Is-A and Has-A, take a look at the following diagram:

Accessing a base class

We've already seen class declarations, such as class ClassA: pass and class ClassB(BaseClassName): pass. When we don't specify a base class explicitly, Python will set the special object class as the base class for the one we're defining. Ultimately, all classes derive from object. Note that, if you don't specify a base class, brackets are optional.

Therefore, writing class A: pass or class A(): pass or class A(object): pass is exactly the same thing. The object class is a special class in that it has the methods that are common to all Python classes, and it doesn't allow you to set any attributes on it.
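Both claims are easy to check. In the sketch below (the colour attribute is just an arbitrary name), a class defined without a base reports object as its only base, and a bare object() instance refuses a new attribute, while an instance of an ordinary class accepts it.

```python
class A:       # no base class given, so object is used implicitly
    pass

print(A.__bases__)  # (<class 'object'>,)

a = A()
a.colour = 'red'    # instances of ordinary classes accept new attributes

plain = object()
try:
    plain.colour = 'red'
    refused = False
except AttributeError:
    refused = True  # bare object instances do not

print(refused)  # True
```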

Let's see how we can access a base class from within a class:

# oop/super.duplication.py
class Book:
    def __init__(self, title, publisher, pages):
        self.title = title
        self.publisher = publisher
        self.pages = pages

class Ebook(Book):
    def __init__(self, title, publisher, pages, format_):
        self.title = title
        self.publisher = publisher
        self.pages = pages
        self.format_ = format_

Take a look at the preceding code. Three of the input parameters are duplicated in Ebook. This is quite bad practice because we now have two sets of instructions that are doing the same thing. Moreover, any change in the signature of Book.__init__ will not be reflected in Ebook. We know that Ebook Is-A Book, and therefore we would probably want changes to be reflected in the children classes.

Let's see one way to fix this issue:

# oop/super.explicit.py
class Book:
    def __init__(self, title, publisher, pages):
        self.title = title
        self.publisher = publisher
        self.pages = pages

class Ebook(Book):
    def __init__(self, title, publisher, pages, format_):
        Book.__init__(self, title, publisher, pages)
        self.format_ = format_

ebook = Ebook(
    'Learn Python Programming', 'Packt Publishing', 500, 'PDF')
print(ebook.title)  # Learn Python Programming
print(ebook.publisher)  # Packt Publishing
print(ebook.pages)  # 500
print(ebook.format_)  # PDF

Now, that's better. We have removed that nasty duplication. Basically, we tell Python to call the __init__ method of the Book class, and we feed self to the call, making sure that we bind that call to the present instance.

If we modify the logic within the __init__ method of Book, we don't need to touch Ebook; it will auto-adapt to the change.

This approach is good, but we can still do a bit better. Say that we change the name of Book to Liber, because we've fallen in love with Latin. We have to change the __init__ method of Ebook to reflect the change. This can be avoided by using super:

# oop/super.implicit.py
class Book:
    def __init__(self, title, publisher, pages):
        self.title = title
        self.publisher = publisher
        self.pages = pages

class Ebook(Book):
    def __init__(self, title, publisher, pages, format_):
        super().__init__(title, publisher, pages)
        # Another way to do the same thing is:
        # super(Ebook, self).__init__(title, publisher, pages)
        self.format_ = format_

ebook = Ebook(
    'Learn Python Programming', 'Packt Publishing', 500, 'PDF')
print(ebook.title)  # Learn Python Programming
print(ebook.publisher)  # Packt Publishing
print(ebook.pages)  # 500
print(ebook.format_)  # PDF

super is a function that returns a proxy object that delegates method calls to a parent or sibling class. In this case, it will delegate that call to __init__ to the Book class, and the beauty of this method is that now we're even free to change Book to Liber without having to touch the logic in the __init__ method of Ebook.
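super is not limited to __init__. As a sketch, here is a trimmed-down version of the example in which the base class has already been renamed to Liber, and in which I added a describe method (an invented name, not part of the book's code) that the child extends by delegating to the parent. The body of Ebook never mentions the parent by name.

```python
class Liber:  # the renamed Book, reduced to one field for brevity
    def __init__(self, title):
        self.title = title

    def describe(self):  # hypothetical method, for illustration only
        return self.title

class Ebook(Liber):
    def __init__(self, title, format_):
        super().__init__(title)
        self.format_ = format_

    def describe(self):
        # Reuse the parent's logic, then add to it.
        return super().describe() + ' [' + self.format_ + ']'

ebook = Ebook('Learn Python Programming', 'PDF')
print(ebook.describe())  # Learn Python Programming [PDF]
```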

Now that we know how to access a base class from a child, let's explore Python's multiple inheritance.

Multiple inheritance

Apart from composing a class using more than one base class, what is of interest here is how an attribute search is performed. Take a look at the following diagram:

As you can see, Shape and Plotter act as base classes for all the others. Polygon inherits directly from them, RegularPolygon inherits from Polygon, and both RegularHexagon and Square inherit from RegularPolygon. Note also that Shape and Plotter implicitly inherit from object, therefore we have what is called a diamond or, in simpler terms, more than one path to reach a base class. We'll see why this matters in a few moments. Let's translate it into some simple code:

# oop/multiple.inheritance.py
class Shape:
    geometric_type = 'Generic Shape'

    def area(self):  # This acts as placeholder for the interface
        raise NotImplementedError

    def get_geometric_type(self):
        return self.geometric_type

class Plotter:
    def plot(self, ratio, topleft):
        # Imagine some nice plotting logic here...
        print('Plotting at {}, ratio {}.'.format(
            topleft, ratio))

class Polygon(Shape, Plotter):  # base class for polygons
    geometric_type = 'Polygon'

class RegularPolygon(Polygon):  # Is-A Polygon
    geometric_type = 'Regular Polygon'

    def __init__(self, side):
        self.side = side

class RegularHexagon(RegularPolygon):  # Is-A RegularPolygon
    geometric_type = 'RegularHexagon'

    def area(self):
        return 1.5 * (3 ** .5 * self.side ** 2)

class Square(RegularPolygon):  # Is-A RegularPolygon
    geometric_type = 'Square'

    def area(self):
        return self.side * self.side

hexagon = RegularHexagon(10)
print(hexagon.area())  # 259.8076211353316
print(hexagon.get_geometric_type())  # RegularHexagon
hexagon.plot(0.8, (75, 77))  # Plotting at (75, 77), ratio 0.8.

square = Square(12)
print(square.area())  # 144
print(square.get_geometric_type())  # Square
square.plot(0.93, (74, 75))  # Plotting at (74, 75), ratio 0.93.

Take a look at the preceding code: the Shape class has one attribute, geometric_type, and two methods: area and get_geometric_type. It's quite common to use base classes (such as Shape, in our example) to define an interface, that is, methods for which children must provide an implementation. There are different and better ways to do this, but I want to keep this example as simple as possible.
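One such alternative, shown here only as a sketch and not as the approach used in this chapter, is the abc module from the standard library. Marking area as an abstract method makes Python refuse to instantiate any child that forgets to implement it, instead of failing later when the method is called. The Blob class is a deliberately incomplete, made-up child.

```python
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self):
        """Return the area of the shape."""

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side

class Blob(Shape):  # hypothetical child that forgets to implement area
    pass

print(Square(12).area())  # 144

try:
    Blob()
except TypeError as err:
    print('Cannot instantiate Blob:', err)
```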

We also have the Plotter class, which adds the plot method, thereby providing plotting capabilities for any class that inherits from it. Of course, the plot implementation is just a dummy print in this example. The first interesting class is Polygon, which inherits from both Shape and Plotter.

There are many types of polygons, one of which is the regular one, which is both equiangular (all angles are equal) and equilateral (all sides are equal), so we create the RegularPolygon class that inherits from Polygon. For a regular polygon, where all sides are equal, we can implement a simple __init__ method on RegularPolygon, which takes the length of the side. Finally, we create the RegularHexagon and Square classes, which both inherit from RegularPolygon.

This structure is quite long, but hopefully gives you an idea of how to specialize the classification of your objects when you design the code.

Now, please take a look at the last eight lines. Note that when I call the area method on hexagon and square, I get the correct area for both. This is because they both provide the correct implementation logic for it. Also, I can call get_geometric_type on both of them, even though it is not defined on their classes, and Python has to go all the way up to Shape to find an implementation for it. Note that, even though the implementation is provided in the Shape class, the self.geometric_type used for the return value is correctly taken from the caller instance.

The plot method calls are also interesting, and show you how you can enrich your objects with capabilities they wouldn't otherwise have. This technique is very popular in web frameworks such as Django (which we'll explore in Chapter 14, Web Development), which provides special classes called mixins, whose capabilities you can just use out of the box. All you have to do is to define the desired mixin as one of the base classes for your own, and that's it.
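To give you a feel for the idea outside of any framework, here is a tiny homemade mixin. JsonMixin and Point are names I invented for this sketch; the mixin holds no data of its own and only contributes one method, which works on whatever instance attributes the host class defines.

```python
import json

class JsonMixin:
    # Hypothetical mixin: adds JSON serialization to any class
    # whose instance attributes are JSON-friendly values.
    def to_json(self):
        return json.dumps(vars(self), sort_keys=True)

class Point(JsonMixin):
    def __init__(self, x, y):
        self.x = x
        self.y = y

print(Point(1, 2).to_json())  # {"x": 1, "y": 2}
```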

Multiple inheritance is powerful, but can also get really messy, so we need to make sure we understand what happens when we use it.

Method resolution order

By now, we know that when you ask for someobject.attribute and attribute is not found on that object, Python starts searching in the class that someobject was created from. If it's not there either, Python searches up the inheritance chain until either attribute is found or the object class is reached. This is quite simple to understand if the inheritance chain is only composed of single-inheritance steps, which means that classes have only one parent. However, when multiple inheritance is involved, there are cases when it's not straightforward to predict what will be the next class that will be searched for if an attribute is not found.

Python provides a way to always know the order in which classes are searched on attribute lookup: the Method Resolution Order (MRO).

The MRO is the order in which base classes are searched for a member during lookup. From version 2.3, Python uses an algorithm called C3, which guarantees monotonicity. In Python 2.2, new-style classes were introduced. The way you write a new-style class in Python 2.* is to define it with an explicit object base class. Classic classes were not explicitly inheriting from object and have been removed in Python 3. One of the differences between classic and new-style classes in Python 2.* is that new-style classes are searched with the new MRO.

With regards to the previous example, let's see the MRO for the Square class:

# oop/multiple.inheritance.py
print(square.__class__.__mro__)
# prints:
# (<class '__main__.Square'>, <class '__main__.RegularPolygon'>,
#  <class '__main__.Polygon'>, <class '__main__.Shape'>,
#  <class '__main__.Plotter'>, <class 'object'>)

TogettotheMROofaclass,wecangofromtheinstancetoits__class__attribute,andfromthattoits__mro__attribute.Alternatively,wecouldhavecalledSquare.__mro__,orSquare.mro()directly,butifyouhavetodoitdynamically,it'smorelikelyyouwillhaveanobjectthanaclass.

NotethattheonlypointofdoubtisthebisectionafterPolygon,wheretheinheritancechainbreaksintotwoways:oneleadstoShapeandtheothertoPlotter.WeknowbyscanningtheMROfortheSquareclassthatShapeissearchedbeforePlotter.

Why is this important? Well, consider the following code:

# oop/mro.simple.py
class A:
    label = 'a'

class B(A):
    label = 'b'

class C(A):
    label = 'c'

class D(B, C):
    pass

d = D()
print(d.label)  # Hypothetically this could be either 'b' or 'c'

Both B and C inherit from A, and D inherits from both B and C. This means that the lookup for the label attribute can reach the top (A) through either B or C. Depending on which is reached first, we get a different result.

So, in the preceding example, we get 'b', which is what we were expecting, since B is the leftmost one among the base classes of D. But what happens if I remove the label attribute from B? This would be a confusing situation: will the algorithm go all the way up to A or will it get to C first? Let's find out:

# oop/mro.py
class A:
    label = 'a'

class B(A):
    pass  # was: label = 'b'

class C(A):
    label = 'c'

class D(B, C):
    pass

d = D()
print(d.label)  # 'c'
print(d.__class__.mro())  # notice another way to get the MRO
# prints:
# [<class '__main__.D'>, <class '__main__.B'>,
#  <class '__main__.C'>, <class '__main__.A'>, <class 'object'>]

So, we learn that the MRO is D-B-C-A-object, which means that when we ask for d.label, Python looks in D and B first, finds nothing there, and gets 'c' from C before it ever reaches A.

In day-to-day programming, it is not common to have to deal with the MRO, but the first time you fight against some mixin from a framework, I promise you'll be glad I spent a paragraph explaining it.
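To get a feel for what a consistent ordering means in practice, here is a minimal sketch. The class names Base, Child, Broken, and Fine are made up for illustration and are not part of the book's source code. It shows that Python refuses to create a class when the bases are listed in an order that cannot be reconciled with the inheritance chain:

```python
class Base:
    pass


class Child(Base):
    pass


# Base is listed before its own subclass Child, so no ordering can keep
# both "a class comes before its parents" and "bases in the order listed".
try:
    class Broken(Base, Child):
        pass
except TypeError as err:
    mro_error = str(err)
    print(mro_error)


# Swapping the two bases gives Python a consistent order to work with.
class Fine(Child, Base):
    pass


print([cls.__name__ for cls in Fine.__mro__])
# ['Fine', 'Child', 'Base', 'object']
```

If you ever see a TypeError mentioning the MRO when defining a class, the order of the base classes is the first thing to check.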

Class and static methods

So far, we have coded classes with attributes in the form of data and instance methods, but there are two other types of methods that we can place inside a class: static methods and class methods.

Static methods

As you may recall, when you create a class object, Python assigns a name to it. That name acts as a namespace, and sometimes it makes sense to group functionalities under it. Static methods are perfect for this use case since, unlike instance methods, they are not passed any special argument. Let's look at an example of an imaginary StringUtil class:

# oop/static.methods.py
class StringUtil:

    @staticmethod
    def is_palindrome(s, case_insensitive=True):
        # we allow only letters and numbers
        s = ''.join(c for c in s if c.isalnum())  # Study this!
        # For case insensitive comparison, we lower-case s
        if case_insensitive:
            s = s.lower()
        for c in range(len(s) // 2):
            if s[c] != s[-c - 1]:
                return False
        return True

    @staticmethod
    def get_unique_words(sentence):
        return set(sentence.split())

print(StringUtil.is_palindrome(
    'Radar', case_insensitive=False))  # False: Case Sensitive
print(StringUtil.is_palindrome('A nut for a jar of tuna'))  # True
print(StringUtil.is_palindrome('Never Odd, Or Even!'))  # True
print(StringUtil.is_palindrome(
    'In Girum Imus Nocte Et Consumimur Igni')  # Latin! Show-off!
)  # True

print(StringUtil.get_unique_words(
    'I love palindromes. I really really love them!'))
# {'them!', 'really', 'palindromes.', 'I', 'love'}

The preceding code is quite interesting. First of all, we learn that static methods are created by simply applying the staticmethod decorator to them. You can see that they aren't passed any special argument so, apart from the decoration, they really just look like functions.

We have a class, StringUtil, that acts as a container for functions. Another approach would be to have a separate module with functions inside. It's really a matter of preference most of the time.

The logic inside is_palindrome should be straightforward for you to understand by now, but, just in case, let's go through it. First, we remove all characters from s that are neither letters nor numbers. In order to do this, we use the join method of a string object (an empty string object, in this case). By calling join on an empty string, the result is that all elements in the iterable you pass to join will be concatenated together. We feed join a generator expression that says to take any character from s if the character is alphanumeric, that is, either a letter or a number. This is because, in palindrome sentences, we want to discard anything that is not a letter or a number.

We then lowercase s if case_insensitive is True, and then we proceed to check whether it is a palindrome. In order to do this, we compare the first and last characters, then the second and the second to last, and so on. If at any point we find a difference, it means the string isn't a palindrome and therefore we can return False. On the other hand, if we exit the for loop normally, it means no differences were found, and we can therefore say the string is a palindrome.

Notice that this code works correctly regardless of the length of the string; that is, whether the length is odd or even. len(s) // 2 reaches half of s, and if s is an odd number of characters long, the middle one won't be checked (such as in RaDaR, D is not checked), but we don't care; it would be compared with itself, so it would always pass that check.

get_unique_words is much simpler: it just returns a set to which we feed a list with the words from a sentence. The set class removes any duplication for us, so we don't need to do anything else.

The StringUtil class provides us a nice container namespace for methods that are meant to work on strings. I could have coded a similar example with a MathUtil class, and some static methods to work on numbers, but I wanted to show you something different.
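As a point of comparison with the explicit loop, the same check can be sketched by comparing the cleaned-up string with its own reverse. This is an alternative written as a plain function for brevity, not the version used in the book's source code:

```python
def is_palindrome(s, case_insensitive=True):
    # keep only letters and digits, as in the StringUtil version
    s = ''.join(c for c in s if c.isalnum())
    if case_insensitive:
        s = s.lower()
    # a string is a palindrome when it equals its own reverse
    return s == s[::-1]


print(is_palindrome('Radar', case_insensitive=False))  # False
print(is_palindrome('A nut for a jar of tuna'))  # True
print(is_palindrome(''))  # True
```

The slicing version builds a reversed copy of the string, while the loop stops at the first mismatch; for short strings the difference hardly matters.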

Class methods

Class methods are slightly different from static methods in that, like instance methods, they also take a special first argument, but in this case, it is the class object itself. A very common use case for coding class methods is to provide factory capability to a class. Let's see an example:

# oop/class.methods.factory.py
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def from_tuple(cls, coords):  # cls is Point
        return cls(*coords)

    @classmethod
    def from_point(cls, point):  # cls is Point
        return cls(point.x, point.y)

p = Point.from_tuple((3, 7))
print(p.x, p.y)  # 3 7
q = Point.from_point(p)
print(q.x, q.y)  # 3 7

In the preceding code, I showed you how to use a class method to create a factory for the class. In this case, we want to create a Point instance by passing both coordinates (regular creation p = Point(3, 7)), but we also want to be able to create an instance by passing a tuple (Point.from_tuple) or another instance (Point.from_point).

Within the two class methods, the cls argument refers to the Point class. As with the instance method, which takes self as the first argument, the class method takes a cls argument. Both self and cls are named after a convention that you are not forced to follow but are strongly encouraged to respect. It is so strong a convention that parsers, linters, and any tool that automatically does something with your code expect it, so it's much better to stick to it.
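One consequence of receiving cls rather than hardcoding Point is that factories keep working for subclasses. The following sketch assumes a hypothetical LabelledPoint subclass, which is not part of the book's code, to illustrate the point:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def from_tuple(cls, coords):
        # cls is whichever class the method was called on
        return cls(*coords)


class LabelledPoint(Point):  # hypothetical subclass, for illustration only
    label = 'unnamed'


p = LabelledPoint.from_tuple((3, 7))
print(type(p).__name__)  # LabelledPoint
print(p.x, p.y, p.label)  # 3 7 unnamed
```

Had from_tuple returned Point(*coords) instead, the call on LabelledPoint would have produced a plain Point.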

Class and static methods play well together. Static methods are actually quite helpful in breaking up the logic of a class method to improve its layout. Let's see an example by refactoring the StringUtil class:

# oop/class.methods.split.py
class StringUtil:

    @classmethod
    def is_palindrome(cls, s, case_insensitive=True):
        s = cls._strip_string(s)
        # For case insensitive comparison, we lower-case s
        if case_insensitive:
            s = s.lower()
        return cls._is_palindrome(s)

    @staticmethod
    def _strip_string(s):
        return ''.join(c for c in s if c.isalnum())

    @staticmethod
    def _is_palindrome(s):
        for c in range(len(s) // 2):
            if s[c] != s[-c - 1]:
                return False
        return True

    @staticmethod
    def get_unique_words(sentence):
        return set(sentence.split())

print(StringUtil.is_palindrome('A nut for a jar of tuna'))  # True
print(StringUtil.is_palindrome('A nut for a jar of beans'))  # False

Compare this code with the previous version. First of all, note that even though is_palindrome is now a class method, we call it in the same way we were calling it when it was a static one. The reason why we changed it to a class method is that after factoring out a couple of pieces of logic (_strip_string and _is_palindrome), we need to get a reference to them. If we had no cls in our method, the only option would be to call them like this: StringUtil._strip_string(...) and StringUtil._is_palindrome(...). That is not good practice, because we would hardcode the class name in the is_palindrome method, thereby putting ourselves in the position of having to modify it whenever we want to change the class name. Using cls stands in for the class name, which means our code won't need any amendments.

Notice how the new logic reads much better than the previous version. Moreover, notice that, by naming the factored-out methods with a leading underscore, I am hinting that those methods are not supposed to be called from outside the class, but this will be the subject of the next section.

Private methods and name mangling

If you have any background with languages like Java, C#, or C++, then you know they allow the programmer to assign a privacy status to attributes (both data and methods). Each language has its own slightly different flavor for this, but the gist is that public attributes are accessible from any point in the code, while private ones are accessible only within the scope they are defined in.

In Python, there is no such thing. Everything is public; therefore, we rely on conventions and on a mechanism called name mangling.

The convention is as follows: if an attribute's name has no leading underscores, it is considered public. This means you can access it and modify it freely. When the name has one leading underscore, the attribute is considered private, which means it's probably meant to be used internally and you should not use it or modify it from the outside. Very common use cases for private attributes are helper methods that are supposed to be used by public ones (possibly in call chains in conjunction with other methods), and internal data, such as scaling factors, or any other data that ideally we would put in a constant (a variable that cannot change, but, surprise, surprise, Python doesn't have those either).

This characteristic usually scares people from other backgrounds off; they feel threatened by the lack of privacy. To be honest, in my whole professional experience with Python, I've never heard anyone screaming "oh my God, we have a terrible bug because Python lacks private attributes!" Not once, I swear.

That said, the call for privacy actually makes sense because without it, you risk introducing bugs into your code for real. Let me show you what I mean:

# oop/private.attrs.py
class A:
    def __init__(self, factor):
        self._factor = factor

    def op1(self):
        print('Op1 with factor {}...'.format(self._factor))

class B(A):
    def op2(self, factor):
        self._factor = factor
        print('Op2 with factor {}...'.format(self._factor))

obj = B(100)
obj.op1()    # Op1 with factor 100...
obj.op2(42)  # Op2 with factor 42...
obj.op1()    # Op1 with factor 42...  <- This is BAD

In the preceding code, we have an attribute called _factor, and let's pretend it's so important that it isn't modified at runtime after the instance is created, because op1 depends on it to function correctly. We've named it with a leading underscore, but the issue here is that when we call obj.op2(42), we modify it, and this is reflected in subsequent calls to op1.

Let's fix this undesired behavior by adding another leading underscore:

# oop/private.attrs.fixed.py
class A:
    def __init__(self, factor):
        self.__factor = factor

    def op1(self):
        print('Op1 with factor {}...'.format(self.__factor))

class B(A):
    def op2(self, factor):
        self.__factor = factor
        print('Op2 with factor {}...'.format(self.__factor))

obj = B(100)
obj.op1()    # Op1 with factor 100...
obj.op2(42)  # Op2 with factor 42...
obj.op1()    # Op1 with factor 100...  <- Wohoo! Now it's GOOD!

Wow, look at that! Now it's working as desired. Python is kind of magic and in this case, what is happening is that the name-mangling mechanism has kicked in.

Name mangling means that any attribute name that has at least two leading underscores and at most one trailing underscore, such as __my_attr, is replaced with a name that includes an underscore and the class name before the actual name, such as _ClassName__my_attr.

This means that when you inherit from a class, the mangling mechanism gives your private attribute two different names in the base and child classes so that name collision is avoided. Every class and instance object stores references to their attributes in a special attribute called __dict__, so let's inspect obj.__dict__ to see name mangling in action:

# oop/private.attrs.py
print(obj.__dict__.keys())
# dict_keys(['_factor'])

This is the _factor attribute that we find in the problematic version of this example. But look at the one that is using __factor:

# oop/private.attrs.fixed.py
print(obj.__dict__.keys())
# dict_keys(['_A__factor', '_B__factor'])

See? obj has two attributes now, _A__factor (mangled within the A class), and _B__factor (mangled within the B class). This is the mechanism that ensures that when op2 does self.__factor = factor, __factor in A isn't changed, because you're actually touching _B__factor, which leaves _A__factor safe and sound.

If you're designing a library with classes that are meant to be used and extended by other developers, you will need to keep this in mind in order to avoid the unintentional overriding of your attributes. Bugs like these can be pretty subtle and hard to spot.
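It is worth stressing that mangling is a renaming scheme and not an access control mechanism. The following minimal sketch, which reuses a trimmed-down A class purely for illustration, shows that the mangled attribute can still be reached from the outside by anyone who spells out its full name:

```python
class A:
    def __init__(self, factor):
        self.__factor = factor  # stored on the instance as _A__factor


obj = A(100)

try:
    obj.__factor  # names are only mangled inside a class body
except AttributeError:
    print('obj.__factor is not reachable under that name')

print(obj._A__factor)  # 100
print(list(vars(obj)))  # ['_A__factor']
```

Reaching for _A__factor from outside the class works, but it goes against the convention and should be reserved for debugging.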

The property decorator

Another thing that would be a crime not to mention is the property decorator. Imagine that you have an age attribute in a Person class and at some point you want to make sure that when you change its value, you're also checking that age is within a proper range, such as [18, 99]. You can write accessor methods, such as get_age() and set_age(...) (also called getters and setters), and put the logic there. get_age() will most likely just return age, while set_age(...) will also do the range check. The problem is that you may already have a lot of code accessing the age attribute directly, which means you're now up to some tedious refactoring. Languages like Java overcome this problem by using the accessor pattern basically by default. Many Java Integrated Development Environments (IDEs) autocomplete an attribute declaration by writing getter and setter accessor method stubs for you on the fly.

Python is smarter, and does this with the property decorator. When you decorate a method with property, you can use the name of the method as if it were a data attribute. Because of this, it's always best to refrain from putting logic that would take a while to complete in such methods because, by accessing them as attributes, we are not expecting to wait.

Let's look at an example:

# oop/property.py
class Person:
    def __init__(self, age):
        self.age = age  # anyone can modify this freely

class PersonWithAccessors:
    def __init__(self, age):
        self._age = age

    def get_age(self):
        return self._age

    def set_age(self, age):
        if 18 <= age <= 99:
            self._age = age
        else:
            raise ValueError('Age must be within [18, 99]')

class PersonPythonic:
    def __init__(self, age):
        self._age = age

    @property
    def age(self):
        return self._age

    @age.setter
    def age(self, age):
        if 18 <= age <= 99:
            self._age = age
        else:
            raise ValueError('Age must be within [18, 99]')

person = PersonPythonic(39)
print(person.age)  # 39 - Notice we access as data attribute
person.age = 42    # Notice we access as data attribute
print(person.age)  # 42
person.age = 100   # ValueError: Age must be within [18, 99]

The Person class may be the first version we write. Then we realize we need to put the range logic in place so, with another language, we would have to rewrite Person as the PersonWithAccessors class, and refactor all the code that was using Person.age. In Python, we rewrite Person as PersonPythonic (you normally wouldn't change the name, of course) so that the age is stored in a private _age variable, and we define property getters and setters using that decoration, which allows us to keep using the person instances as we were before. A getter is a method that is called when we access an attribute for reading. On the other hand, a setter is a method that is called when we access an attribute to write it. In other languages, such as Java, it's customary to define them as get_age() and set_age(int value), but I find the Python syntax much neater. It allows you to start writing simple code and refactor later on, only when you need it; there is no need to pollute your code with accessors only because they may be helpful in the future.

The property decorator also allows for read-only data (no setter) and for special actions when the attribute is deleted. Please refer to the official documentation to dig deeper.
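As a small illustration of those last two features, here is a sketch of a class with a read-only property and a deleter. The Person class below, with its name attribute and the choice to reset the age to None on deletion, is an assumption made for the example and is not taken from the book's source code:

```python
class Person:
    def __init__(self, name, age):
        self._name = name
        self._age = age

    @property
    def name(self):  # no setter defined: the attribute is read-only
        return self._name

    @property
    def age(self):
        return self._age

    @age.deleter
    def age(self):
        # runs when `del person.age` is executed
        self._age = None


person = Person('Ada', 39)

try:
    person.name = 'Grace'
except AttributeError:
    print('name is read-only')

del person.age
print(person.age)  # None
```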

Operator overloading

I find Python's approach to operator overloading to be brilliant. To overload an operator means to give it a meaning according to the context in which it is used. For example, the + operator means addition when we deal with numbers, but concatenation when we deal with sequences.

In Python, when you use operators, you're most likely calling the special methods of some objects behind the scenes. For example, the a[k] call roughly translates to type(a).__getitem__(a, k).

As an example, let's create a class that stores a string and evaluates to True if '42' is part of that string, and False otherwise. Also, let's give the class a length property that corresponds to that of the stored string:

# oop/operator.overloading.py
class Weird:
    def __init__(self, s):
        self._s = s

    def __len__(self):
        return len(self._s)

    def __bool__(self):
        return '42' in self._s

weird = Weird('Hello! I am 9 years old!')
print(len(weird))   # 24
print(bool(weird))  # False

weird2 = Weird('Hello! I am 42 years old!')
print(len(weird2))   # 25
print(bool(weird2))  # True

That was fun, wasn't it? For the complete list of magic methods that you can override in order to provide your custom implementation of operators for your classes, please refer to the Python data model in the official documentation.
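The Weird class overloads len and bool, which are built-in functions rather than symbols. To see the same mechanism applied to actual operators, here is a minimal sketch of a hypothetical Vector class (not part of the book's code) that supports + and ==:

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):  # called for: self + other
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):  # called for: self == other
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f'Vector({self.x}, {self.y})'


v = Vector(1, 2) + Vector(3, 4)
print(v)  # Vector(4, 6)
print(v == Vector(4, 6))  # True
```

Returning NotImplemented for unsupported operand types lets Python try the other operand's methods before giving up.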

Polymorphism – a brief overview

The word polymorphism comes from the Greek polys (many, much) and morphē (form, shape), and its meaning is the provision of a single interface for entities of different types.

In our car example, we call engine.start(), regardless of what kind of engine it is. As long as it exposes the start method, we can call it. That's polymorphism in action.

In other languages, such as Java, in order to give a function the ability to accept different types and call a method on them, those types need to be coded in such a way that they share an interface. In this way, the compiler knows that the method will be available regardless of the type of the object the function is fed (as long as it extends the proper interface, of course).

In Python, things are different. Polymorphism is implicit: nothing prevents you from calling a method on an object; therefore, technically, there is no need to implement interfaces or other patterns.

There is a special kind of polymorphism called ad hoc polymorphism, which is what we saw in the previous section: operator overloading. This is the ability of an operator to change shape, according to the type of data it is fed.

Polymorphism also allows Python programmers to simply use the interface (methods and properties) exposed from an object rather than having to check which class it was instantiated from. This allows the code to be more compact and feel more natural.
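A minimal sketch of this implicit polymorphism follows. The two engine classes and the start_all function are made up for this illustration and are not the ones from the earlier car example; the only thing they share is a start method:

```python
class ElectricEngine:
    def start(self):
        return 'Electric engine humming'


class V8Engine:
    def start(self):
        return 'V8 engine roaring'


def start_all(engines):
    # no type checks: anything exposing a start() method will do
    return [engine.start() for engine in engines]


print(start_all([ElectricEngine(), V8Engine()]))
# ['Electric engine humming', 'V8 engine roaring']
```

Neither class inherits from a common base or declares an interface, yet start_all works with both.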

I cannot spend too much time on polymorphism, but I encourage you to check it out by yourself; it will expand your understanding of OOP. Good luck!

Data classes

Before we leave the OOP realm, there is one last thing I want to mention: data classes. Introduced in Python 3.7 by PEP 557 (https://www.python.org/dev/peps/pep-0557/), they can be described as "mutable named tuples with defaults". Let's dive into an example:

# oop/dataclass.py
from dataclasses import dataclass

@dataclass
class Body:
    '''Class to represent a physical body.'''
    name: str
    mass: float = 0.  # Kg
    speed: float = 1.  # m/s

    def kinetic_energy(self) -> float:
        return (self.mass * self.speed ** 2) / 2

body = Body('Ball', 19, 3.1415)
print(body.kinetic_energy())  # 93.755711375 Joule
print(body)  # Body(name='Ball', mass=19, speed=3.1415)

In the previous code, I have created a class to represent a physical body, with one method that allows me to calculate its kinetic energy (using the renowned formula Ek = ½mv²). Notice that name is supposed to be a string, while mass and speed are both floats, and both are given a default value. It's also interesting that I didn't have to write any __init__ method; it's done for me by the dataclass decorator, along with methods for comparison and for producing the string representation of the object (implicitly called on the last line by print).

You can read all the specifications in PEP 557 if you are curious, but for now just remember that data classes might offer a nicer, slightly more powerful alternative to named tuples, in case you need it.
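To illustrate the generated comparison methods, and one of the options described in PEP 557, here is a short sketch. The Coordinates class is hypothetical and only serves the example; frozen=True asks the decorator to make instances read-only:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Coordinates:  # hypothetical class, for illustration only
    lat: float
    lon: float = 0.0


a = Coordinates(51.5)
b = Coordinates(51.5, 0.0)
print(a == b)  # True: the generated __eq__ compares field by field

try:
    a.lat = 0.0
except FrozenInstanceError:
    print('frozen instances cannot be modified')
```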

Writing a custom iterator

Now we have all the tools to appreciate how we can write our own custom iterator. Let's first define an iterable and an iterator:

Iterable: An object is said to be iterable if it's capable of returning its members one at a time. Lists, tuples, strings, and dictionaries are all iterables. Custom objects that define either of the __iter__ or __getitem__ methods are also iterables.

Iterator: An object is said to be an iterator if it represents a stream of data. A custom iterator is required to provide an implementation for __iter__ that returns the object itself, and an implementation for __next__ that returns the next item of the data stream until the stream is exhausted, at which point all successive calls to __next__ simply raise the StopIteration exception. Built-in functions, such as iter and next, are mapped to call __iter__ and __next__ on an object, behind the scenes.

Let's write an iterator that returns all the odd characters from a string first (the first, third, fifth, and so on, which sit at indexes 0, 2, 4, ...), and then the even ones:

# iterators/iterator.py
class OddEven:

    def __init__(self, data):
        self._data = data
        self.indexes = (list(range(0, len(data), 2)) +
            list(range(1, len(data), 2)))

    def __iter__(self):
        return self

    def __next__(self):
        if self.indexes:
            return self._data[self.indexes.pop(0)]
        raise StopIteration

oddeven = OddEven('ThIsIsCoOl!')
print(''.join(c for c in oddeven))  # TIICO!hssol

oddeven = OddEven('HoLa')  # or manually...
it = iter(oddeven)  # this calls oddeven.__iter__ internally
print(next(it))  # H
print(next(it))  # L
print(next(it))  # o
print(next(it))  # a

So, we needed to provide an implementation for __iter__ that returned the object itself, and then one for __next__. Let's go through it. What needed to happen was the return of _data[0], _data[2], _data[4], ..., _data[1], _data[3], _data[5], ... until we had returned every item in the data. In order to do that, we prepared a list, indexes, such as [0, 2, 4, 6, ..., 1, 3, 5, ...], and while there was at least one element in it, we popped the first one and returned the element from the data that was at that position, thereby achieving our goal. When indexes was empty, we raised StopIteration, as required by the iterator protocol.

There are other ways to achieve the same result, so go ahead and try to code a different one yourself. Make sure the end result works for all edge cases: empty sequences, sequences of length 1, 2, and so on.
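If you want something to compare your own attempt against, one possible alternative is a generator function built on slicing. This is only a sketch of one option, and the odd_even name is made up for the example; it relies on the data supporting slices, which strings, lists, and tuples do:

```python
def odd_even(data):
    # items at indexes 0, 2, 4, ... come first
    yield from data[::2]
    # then the ones at indexes 1, 3, 5, ...
    yield from data[1::2]


print(''.join(odd_even('ThIsIsCoOl!')))  # TIICO!hssol
print(list(odd_even('')))  # []
print(list(odd_even('a')))  # ['a']
print(list(odd_even('ab')))  # ['a', 'b']
```

Calling odd_even returns a generator, which implements __iter__ and __next__ for us, so the iterator protocol described above is still what drives the loop.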

Summary

In this chapter, we looked at decorators, discovered the reasons for having them, and covered a few examples using one or more at the same time. We also saw decorators that take arguments, which are usually used as decorator factories.

We scratched the surface of object-oriented programming in Python. We covered all the basics, so you should now be able to understand the code that will come in future chapters. We talked about all kinds of methods and attributes that one can write in a class, we explored inheritance versus composition, method overriding, properties, operator overloading, and polymorphism.

At the end, we very briefly touched base on iterators, so now you understand generators more deeply.

In the next chapter, we're going to see how to deal with files and how to persist data in several different ways and formats.

Files and Data Persistence

"Persistence is the key to the adventure we call life."

– Torsten Alexander Lange

In the previous chapters, we have explored several different aspects of Python. As the examples have a didactic purpose, we've run them in a simple Python shell, or in the form of a Python module. They ran, maybe printed something on the console, and then they terminated, leaving no trace of their brief existence.

Real-world applications, though, are generally much different. Naturally, they still run in memory, but they interact with networks, disks, and databases. They also exchange information with other applications and devices, using formats that are suitable for the situation.

In this chapter, we are going to start closing in on the real world by exploring the following:

Files and directories
Compression
Networks and streams
The JSON data-interchange format
Data persistence with pickle and shelve, from the standard library
Data persistence with SQLAlchemy

As usual, I will try to balance breadth and depth, so that by the end of the chapter, you will have a solid grasp of the fundamentals and will know how to fetch further information on the web.

Working with files and directories

When it comes to files and directories, Python offers plenty of useful tools. In particular, in the following examples, we will leverage the os and shutil modules. As we'll be reading and writing on the disk, I will be using a file, fear.txt, which contains an excerpt from Fear, by Thich Nhat Hanh, as a guinea pig for some of our examples.

Opening files

Opening a file in Python is very simple and intuitive. In fact, we just need to use the open function. Let's see a quick example:

# files/open_try.py
fh = open('fear.txt', 'rt')  # r: read, t: text

for line in fh.readlines():
    print(line.strip())  # remove whitespace and print

fh.close()

The previous code is very simple. We call open, passing the filename, and telling open that we want to read it in text mode. There is no path information before the filename; therefore, open will assume the file is in the same folder the script is run from. This means that if we run this script from outside the files folder, then fear.txt won't be found.

Once the file has been opened, we obtain a file object back, fh, which we can use to work on the content of the file. In this case, we use the readlines() method to iterate over all the lines in the file, and print them. We call strip() on each line to get rid of any extra spaces around the content, including the line termination character at the end, since print will already add one for us. This is a quick and dirty solution that works in this example, but should the content of the file contain meaningful spaces that need to be preserved, you will have to be slightly more careful in how you sanitize the data. At the end of the script, we flush and close the stream.

Closing a file is very important, as we don't want to risk failing to release the handle we have on it. Therefore, we need to apply some precaution, and wrap the previous logic in a try/finally block. This has the effect that, whatever error might occur while we read the file, we can rest assured that close() will be called. Notice that open is called just before the try block: if opening the file fails, there is no file object to close, and a call to fh.close() in the finally clause would itself fail with a NameError:

# files/open_try.py
fh = open('fear.txt', 'rt')  # opened before try, so fh exists in finally
try:
    for line in fh.readlines():
        print(line.strip())
finally:
    fh.close()

The logic is exactly the same, but now it is also safe.

Don't worry if you don't understand try/finally for now. We will explore how to deal with exceptions in the next chapter. For now, suffice to say that putting code within the body of a try block adds a mechanism around that code that allows us to detect errors (which are called exceptions) and decide what to do if they happen. In this case, we don't really do anything in case of errors, but by closing the file within the finally block, we make sure that line is executed whether or not any error has happened.

We can simplify the previous example this way:

# files/open_try.py
fh = open('fear.txt')  # rt is default; opened before try, as fh is used in finally
try:
    for line in fh:  # we can iterate directly on fh
        print(line.strip())
finally:
    fh.close()

As you can see, rt is the default mode for opening files, so we don't need to specify it. Moreover, we can simply iterate on fh, without explicitly calling readlines() on it. Python is very nice and gives us shorthands to make our code shorter and simpler to read.

All the previous examples produce a print of the file on the console (check out the source code to read the whole content):

An excerpt from Fear - By Thich Nhat Hanh

The Present Is Free from Fear

When we are not fully present, we are not really living. We’re not really there, either
for our loved ones or for ourselves. If we’re not there, then where are we? We are
running, running, running, even during our sleep. We run because we’re trying to escape
from our fear.
...

Using a context manager to open a file

Let's admit it: the prospect of having to disseminate our code with try/finally blocks is not one of the best. As usual, Python gives us a much nicer way to open a file in a secure fashion: by using a context manager. Let's see the code first:

# files/open_with.py
with open('fear.txt') as fh:
    for line in fh:
        print(line.strip())

The previous example is equivalent to the one before it, but reads so much better. The with statement supports the concept of a runtime context defined by a context manager. This is implemented using a pair of methods, __enter__ and __exit__, that allow user-defined classes to define a runtime context that is entered before the statement body is executed and exited when the statement ends. The file object returned by the open function is itself a context manager, which is why it can be used directly in a with statement, but the true beauty of it lies in the fact that fh.close() will be called automatically for us, even in case of errors.

Context managers are used in several different scenarios, such as thread synchronization, closure of files or other objects, and management of network and database connections. You can find information about them in the contextlib documentation page (https://docs.python.org/3.7/library/contextlib.html).
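To make the role of __enter__ and __exit__ more concrete, here is a minimal sketch of a user-defined context manager. The Tracker class is invented for this illustration; it simply records when the runtime context is entered and exited, and shows that __exit__ runs even when the body raises:

```python
class Tracker:  # hypothetical class, for illustration only
    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self  # this is what gets bound to the name after `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('exit')
        return False  # a falsy value lets any exception propagate


tracker = Tracker()
try:
    with tracker as t:
        t.events.append('body')
        raise ValueError('something went wrong')
except ValueError:
    pass

print(tracker.events)  # ['enter', 'body', 'exit']
```

File objects follow the same protocol, with the call to close() happening inside their __exit__ method.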

Reading and writing to a file

Now that we know how to open a file, let's see a couple of different ways that we have to read and write to it:

# files/print_file.py
with open('print_example.txt', 'w') as fw:
    print('Hey I am printing into a file!!!', file=fw)

A first approach uses the print function, which you've seen plenty of times in the previous chapters. After obtaining a file object, this time specifying that we intend to write to it ("w"), we can tell print to direct its output to the file instead of the default, sys.stdout, which is mapped to the console when the code is executed in one.

The previous code has the effect of creating the print_example.txt file if it doesn't exist, or truncating it in case it does, and writes the line Hey I am printing into a file!!! to it.

This is all nice and easy, but not what we typically do when we want to write to a file. Let's see a much more common approach:

# files/read_write.py
with open('fear.txt') as f:
    lines = [line.rstrip() for line in f]

with open('fear_copy.txt', 'w') as fw:
    fw.write('\n'.join(lines))

In the previous example, we first open fear.txt and collect its content into a list, line by line. Notice that this time, I'm calling a more precise method, rstrip(), as an example, to make sure I only strip the whitespace on the right-hand side of every line.

In the second part of the snippet, we create a new file, fear_copy.txt, and we write to it all the lines from the original file, joined by a newline, \n. Python is gracious and works by default with universal newlines, which means that even though the original file might have a newline that is different than \n, it will be translated automatically for us before the line is returned. This behavior is, of course, customizable, but normally it is exactly what you want. Speaking of newlines, can you think of one of them that might be missing in the copy?

Reading and writing in binary mode

Notice that by opening a file passing t in the options (or omitting it, as it is the default), we're opening the file in text mode. This means that the content of the file is treated and interpreted as text. If you wish to write bytes to a file, you can open it in binary mode. This is a common requirement when you deal with files that don't just contain raw text, such as images, audio/video, and, in general, any other proprietary format.

In order to handle files in binary mode, simply specify the b flag when opening them, as in the following example:

# files/read_write_bin.py
with open('example.bin', 'wb') as fw:
    fw.write(b'This is binary data...')

with open('example.bin', 'rb') as f:
    print(f.read())  # prints: b'This is binary data...'

In this example, I'm still using text as binary data, but it could be anything you want. You can see it's treated as binary by the fact that you get the b'This ...' prefix in the output.
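The difference between text and bytes becomes more visible as soon as non-ASCII characters are involved. The following sketch writes an encoded string in binary mode and reads the raw bytes back; the temporary directory, the sample word, and the choice of UTF-8 are all assumptions made for the example:

```python
import os
import tempfile

text = 'Caff\u00e8'  # 'Caffè': five characters, the last one non-ASCII

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.bin')
    with open(path, 'wb') as fw:
        fw.write(text.encode('utf-8'))  # str -> bytes
    with open(path, 'rb') as f:
        raw = f.read()

print(raw)  # b'Caff\xc3\xa8'
print(len(text), len(raw))  # 5 6
print(raw.decode('utf-8') == text)  # True
```

In binary mode, no encoding or newline translation is applied: what you write is exactly what ends up on disk.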

Protecting against overriding an existing file

Python gives us the ability to open files for writing. By using the w flag, we open a file and truncate its content. This means the file is overwritten with an empty file, and the original content is lost. If you wish to only open a file for writing in case it doesn't exist, you can use the x flag instead, as in the following example:

# files/write_not_exists.py
with open('write_x.txt', 'x') as fw:
    fw.write('Writing line 1')  # this succeeds

with open('write_x.txt', 'x') as fw:
    fw.write('Writing line 2')  # this fails

If you run the previous snippet, you will find a file called write_x.txt in your directory, containing only one line of text. The second part of the snippet, in fact, fails to execute. This is the output I get on my console:

$ python write_not_exists.py
Traceback (most recent call last):
  File "write_not_exists.py", line 6, in <module>
    with open('write_x.txt', 'x') as fw:
FileExistsError: [Errno 17] File exists: 'write_x.txt'
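In a real program, you would probably want to handle that failure rather than let the script crash. Here is a minimal sketch of how that could look; it works inside a temporary directory (an assumption made so the example can be run repeatedly without leaving files behind):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'write_x.txt')

    with open(path, 'x') as fw:
        fw.write('Writing line 1')  # the file is new: this succeeds

    try:
        with open(path, 'x') as fw:
            fw.write('Writing line 2')  # never reached
    except FileExistsError:
        print('File already exists, leaving it untouched')

    with open(path) as f:
        content = f.read()

print(content)  # Writing line 1
```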

Checking for file and directory existence

If you want to make sure a file or directory exists (or it doesn't), the os.path module is what you need. Let's see a small example:

# files/existence.py
import os

filename = 'fear.txt'
path = os.path.dirname(os.path.abspath(filename))

print(os.path.isfile(filename))  # True
print(os.path.isdir(path))  # True
print(path)  # /Users/fab/srv/lpp/ch7/files

The preceding snippet is quite interesting. After declaring the filename with a relative reference (in that it is missing the path information), we use abspath to calculate the full, absolute path of the file. Then, we get the path information (by removing the filename at the end) by calling dirname on it. The result, as you can see, is printed on the last line. Notice also how we check for existence, both for a file and a directory, by calling isfile and isdir. In the os.path module, you find all the functions you need to work with pathnames.

Should you ever need to work with paths in a different way, you can check out pathlib. While os.path works with strings, pathlib offers classes representing filesystem paths with semantics appropriate for different operating systems. It is beyond the scope of this chapter, but if you're interested, check out PEP 428 (https://www.python.org/dev/peps/pep-0428/), and its page in the standard library.
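Just to give a taste of the difference, here is a rough pathlib counterpart of the previous snippet. It is a sketch rather than a full introduction, and it builds the path from the current working directory, so the last line only prints True if fear.txt is actually there:

```python
from pathlib import Path

# an absolute path object built from the current working directory
path = Path.cwd() / 'fear.txt'

print(path.name)  # fear.txt
print(path.suffix)  # .txt
print(path.parent.is_dir())  # True: the parent is the current directory
print(path.is_file())  # depends on whether fear.txt exists
```

The / operator joins path segments, playing the role that os.path.join plays for strings.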

Manipulating files and directories

Let's see a couple of quick examples on how to manipulate files and directories. The first example manipulates the content:

# files/manipulation.py
from collections import Counter
from string import ascii_letters

chars = ascii_letters + ' '

def sanitize(s, chars):
    return ''.join(c for c in s if c in chars)

def reverse(s):
    return s[::-1]

with open('fear.txt') as stream:
    lines = [line.rstrip() for line in stream]

with open('raef.txt', 'w') as stream:
    stream.write('\n'.join(reverse(line) for line in lines))

# now we can calculate some statistics
lines = [sanitize(line, chars) for line in lines]
whole = ' '.join(lines)
cnt = Counter(whole.lower().split())
print(cnt.most_common(3))

The previous example defines two functions: sanitize and reverse. They are simple functions whose purpose is to remove anything that is not a letter or space from a string, and produce the reversed copy of a string, respectively.

We open fear.txt and we read its content into a list. Then we create a new file, raef.txt, which will contain the horizontally-mirrored version of the original one. We write all the content of lines with a single operation, using join on a newline character. Maybe more interesting is the bit at the end. First, we reassign lines to a sanitized version of itself, by means of a list comprehension. Then we put the lines together in the whole string, and finally, we pass the result to Counter. Notice that we put the string in lowercase and split it. This way, each word will be counted correctly, regardless of its case, and, thanks to split, we don't need to worry about extra spaces anywhere. When we print the three most common words, we realize that truly Thich Nhat Hanh's focus is on others, as we is the most common word in the text:

$ python manipulation.py
[('we', 17), ('the', 13), ('were', 7)]

Let's now see an example of manipulation more oriented to disk operations, in which we put the shutil module to use:

# files/ops_create.py
import shutil
import os

BASE_PATH = 'ops_example'  # this will be our base path
os.mkdir(BASE_PATH)

path_b = os.path.join(BASE_PATH, 'A', 'B')
path_c = os.path.join(BASE_PATH, 'A', 'C')
path_d = os.path.join(BASE_PATH, 'A', 'D')

os.makedirs(path_b)
os.makedirs(path_c)

for filename in ('ex1.txt', 'ex2.txt', 'ex3.txt'):
    with open(os.path.join(path_b, filename), 'w') as stream:
        stream.write(f'Some content here in {filename}\n')

shutil.move(path_b, path_d)

shutil.move(
    os.path.join(path_d, 'ex1.txt'),
    os.path.join(path_d, 'ex1d.txt')
)

In the previous code, we start by declaring a base path, which will safely contain all the files and folders we're going to create. We then use makedirs to create two directories: ops_example/A/B and ops_example/A/C. (Can you think of a way of creating the two directories by using map?)

We use os.path.join to concatenate directory names, as using / would specialize the code to run on a platform where the directory separator is /, but then the code would fail on platforms with a different separator. Let's delegate to join the task of figuring out which is the appropriate separator.

After creating the directories, within a simple for loop, we put some code that creates three files in directory B. Then, we move the folder B and its content to a different name: D. And finally, we rename ex1.txt to ex1d.txt. If you open that file, you'll see it still contains the original text from the for loop. Calling tree on the result produces the following:

$ tree ops_example/
ops_example/
└── A
    ├── C
    └── D
        ├── ex1d.txt
        ├── ex2.txt
        └── ex3.txt

Manipulating pathnames

Let's explore a little more the abilities of os.path by means of a simple example:

# files/paths.py
import os

filename = 'fear.txt'
path = os.path.abspath(filename)

print(path)
print(os.path.basename(path))
print(os.path.dirname(path))
print(os.path.splitext(path))
print(os.path.split(path))

readme_path = os.path.join(
    os.path.dirname(path), '..', '..', 'README.rst')

print(readme_path)
print(os.path.normpath(readme_path))

Reading the result is probably a good enough explanation for this simple example:

/Users/fab/srv/lpp/ch7/files/fear.txt  # path
fear.txt  # basename
/Users/fab/srv/lpp/ch7/files  # dirname
('/Users/fab/srv/lpp/ch7/files/fear', '.txt')  # splitext
('/Users/fab/srv/lpp/ch7/files', 'fear.txt')  # split
/Users/fab/srv/lpp/ch7/files/../../README.rst  # readme_path
/Users/fab/srv/lpp/README.rst  # normalized

Temporary files and directories

Sometimes, it's very useful to be able to create a temporary directory or file when running some code. For example, when writing tests that affect the disk, you can use temporary files and directories to run your logic and assert that it's correct, and to be sure that at the end of the test run, the test folder has no leftovers. Let's see how you do it in Python:

# files/tmp.py
import os
from tempfile import NamedTemporaryFile, TemporaryDirectory

with TemporaryDirectory(dir='.') as td:
    print('Temp directory:', td)
    with NamedTemporaryFile(dir=td) as t:
        name = t.name
        print(os.path.abspath(name))

The preceding example is quite straightforward: we create a temporary directory in the current one ("."), and we create a named temporary file in it. We print the directory name, as well as the full path of the file:

$ python tmp.py
Temp directory: ./tmpwa9bdwgo
/Users/fab/srv/lpp/ch7/files/tmpwa9bdwgo/tmp3d45hm46

Running this script will produce a different result every time. After all, it's a temporary random name we're creating here, right?

Directory content

With Python, you can also inspect the content of a directory. I'll show you two ways of doing this:

# files/listing.py
import os

with os.scandir('.') as it:
    for entry in it:
        print(
            entry.name, entry.path,
            'File' if entry.is_file() else 'Folder'
        )

This snippet uses os.scandir, called on the current directory. We iterate on the results, each of which is an instance of os.DirEntry, a nice class that exposes useful properties and methods. In the code, we access a subset of those: name, path, and is_file(). Running the code yields the following (I omitted a few results for brevity):

$ python listing.py
fixed_amount.py ./fixed_amount.py File
existence.py ./existence.py File
...
ops_example ./ops_example Folder
...

A more powerful way to scan a directory tree is given to us by os.walk. Let's see an example:

# files/walking.py
import os

for root, dirs, files in os.walk('.'):
    print(os.path.abspath(root))
    if dirs:
        print('Directories:')
        for dir_ in dirs:
            print(dir_)
        print()
    if files:
        print('Files:')
        for filename in files:
            print(filename)
        print()

Running the preceding snippet will produce a list of all files and directories in the current one, and it will do the same for each sub-directory.
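Because os.walk yields one (root, dirs, files) triple per directory, it is easy to aggregate information over a whole tree. The following is a small sketch, not taken from the book's source code: the count_extensions function and the throwaway files it scans are made up for illustration, and the tree is built in a temporary directory so nothing is left behind.

```python
import os
from collections import Counter
from tempfile import TemporaryDirectory

def count_extensions(top):
    """Walk the tree rooted at top and count files per extension."""
    counter = Counter()
    for root, dirs, files in os.walk(top):
        for filename in files:
            _, ext = os.path.splitext(filename)
            counter[ext] += 1
    return counter

# Build a tiny throwaway tree so the sketch is self-contained.
with TemporaryDirectory() as td:
    os.mkdir(os.path.join(td, 'sub'))
    for name in ('a.txt', 'b.txt', os.path.join('sub', 'c.py')):
        with open(os.path.join(td, name), 'w') as stream:
            stream.write('x')
    counts = count_extensions(td)

print(counts['.txt'], counts['.py'])  # 2 1
```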

File and directory compression

Before we leave this section, let me give you an example of how to create a compressed file. In the source code of the book, I have two examples: one creates a ZIP file, while the other one creates a tar.gz file. Python allows you to create compressed files in several different ways and formats. Here, I am going to show you how to create the most common one, ZIP:

# files/compression/zip.py
from zipfile import ZipFile

with ZipFile('example.zip', 'w') as zp:
    zp.write('content1.txt')
    zp.write('content2.txt')
    zp.write('subfolder/content3.txt')
    zp.write('subfolder/content4.txt')

with ZipFile('example.zip') as zp:
    zp.extract('content1.txt', 'extract_zip')
    zp.extract('subfolder/content3.txt', 'extract_zip')

In the preceding code, we import ZipFile, and then, within a context manager, we write into it four dummy content files (two of which are in a sub-folder, to show that ZIP preserves the full path). Afterwards, as an example, we open the compressed file and extract a couple of files from it, into the extract_zip directory. If you are interested in learning more about data compression, make sure you check out the Data Compression and Archiving section of the standard library documentation (https://docs.python.org/3.7/library/archiving.html), where you'll be able to learn all about this topic.
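For the tar.gz format, the standard library offers the tarfile module. The following is a minimal sketch of the idea and not the example from the book's source code. The file names are placeholders, and everything happens inside a temporary directory.

```python
import os
import tarfile
from tempfile import TemporaryDirectory

with TemporaryDirectory() as td:
    source = os.path.join(td, 'content1.txt')
    with open(source, 'w') as stream:
        stream.write('some dummy content\n')

    archive = os.path.join(td, 'example.tar.gz')

    # 'w:gz' opens the archive for writing with gzip compression.
    with tarfile.open(archive, 'w:gz') as tar:
        # arcname sets the name stored inside the archive.
        tar.add(source, arcname='content1.txt')

    # 'r:gz' reads it back; getnames lists the archive members.
    with tarfile.open(archive, 'r:gz') as tar:
        names = tar.getnames()

print(names)  # ['content1.txt']
```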

Data interchange formats

Modern software architecture tends to split an application into several components. Whether you embrace the service-oriented architecture paradigm, or you push it even further into the microservices realm, these components will have to exchange data. But even if you are coding a monolithic application, whose codebase is contained in one project, chances are that you still have to exchange data with APIs, other programs, or simply handle the data flow between the frontend and the backend part of your website, which very likely won't speak the same language.

Choosing the right format in which to exchange information is crucial. A language-specific format has the advantage that the language itself is very likely to provide you with all the tools to make serialization and deserialization a breeze. However, you will lose the ability to talk to other components that have been written in different versions of the same language, or in different languages altogether. Regardless of what the future looks like, going with a language-specific format should only be done if it is the only possible choice for the given situation.

A much better approach is to choose a format that is language agnostic, and can be spoken by all (or at least most) languages. In the team I lead, we have people from England, Poland, South Africa, Spain, Greece, India, Italy, to mention just a few. We all speak English, so regardless of our native tongue, we can all understand each other (well... mostly!).

In the software world, some popular formats have become the de facto standard over recent years. The most famous ones probably are XML, YAML, and JSON. The Python standard library features the xml and json modules, and, on PyPI (https://pypi.org/), you can find a few different packages to work with YAML.

In the Python environment, JSON is probably the most commonly used one. It wins over the other two because of being part of the standard library, and for its simplicity. If you have ever worked with XML, you know what a nightmare it can be.

Working with JSON

JSON is the acronym of JavaScript Object Notation, and it is a subset of the JavaScript language. It has been there for almost two decades now, so it is well known and widely adopted by basically all languages, even though it is actually language independent. You can read all about it on its website (https://www.json.org/), but I'm going to give you a quick introduction to it now.

JSON is based on two structures: a collection of name/value pairs, and an ordered list of values. You will immediately realize that these two objects map to the dictionary and list data types in Python, respectively. As data types, it offers strings, numbers, objects, and values, such as true, false, and null. Let's see a quick example to get us started:

# json_examples/json_basic.py
import sys
import json

data = {
    'big_number': 2 ** 3141,
    'max_float': sys.float_info.max,
    'a_list': [2, 3, 5, 7],
}

json_data = json.dumps(data)
data_out = json.loads(json_data)
assert data == data_out  # json and back, data matches

We begin by importing the sys and json modules. Then we create a simple dictionary with some numbers inside and a list. I wanted to test serializing and deserializing using very big numbers, both int and float, so I put 2 ** 3141 and whatever is the biggest floating point number my system can handle.

We serialize with json.dumps, which takes data and converts it into a JSON formatted string. That data is then fed into json.loads, which does the opposite: from a JSON formatted string, it reconstructs the data into Python. On the last line, we make sure that the original data and the result of the serialization/deserialization through JSON match.

Let's see, in the next example, what JSON data would look like if we printed it:

# json_examples/json_basic.py
import json

info = {
    'full_name': 'Sherlock Holmes',
    'address': {
        'street': '221B Baker St',
        'zip': 'NW1 6XE',
        'city': 'London',
        'country': 'UK',
    }
}

print(json.dumps(info, indent=2, sort_keys=True))

In this example, we create a dictionary with Sherlock Holmes' data in it. If, like me, you're a fan of Sherlock Holmes, and are in London, you'll find his museum at that address (which I recommend visiting, it's small but very nice).

Notice how we call json.dumps, though. We have told it to indent with two spaces, and sort keys alphabetically. The result is this:

$ python json_basic.py
{
  "address": {
    "city": "London",
    "country": "UK",
    "street": "221B Baker St",
    "zip": "NW1 6XE"
  },
  "full_name": "Sherlock Holmes"
}

The similarity with Python is huge. One difference is that if you place a comma after the last element in a dictionary, like I've done in Python (as is customary), JSON will complain.
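A quick way to see the complaint is to feed json.loads a string with a trailing comma. This is a small sketch: the failed flag is only there to make the outcome easy to check, and the exact wording of the error message may vary between Python versions.

```python
import json

# A trailing comma is perfectly fine in a Python literal...
python_literal = {'a_list': [2, 3, 5, 7],}

# ...but the same comma makes the JSON text invalid.
try:
    json.loads('{"a_list": [2, 3, 5, 7],}')
except json.JSONDecodeError as err:
    failed = True
    print('Invalid JSON:', err)
else:
    failed = False
```

Note that json.dumps never emits a trailing comma, so this only bites when the JSON text is written by hand or by another tool.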

Let me show you something interesting:

# json_examples/json_tuple.py
import json

data_in = {
    'a_tuple': (1, 2, 3, 4, 5),
}

json_data = json.dumps(data_in)
print(json_data)  # {"a_tuple": [1, 2, 3, 4, 5]}

data_out = json.loads(json_data)
print(data_out)  # {'a_tuple': [1, 2, 3, 4, 5]}

In this example, we have put a tuple, instead of a list. The interesting bit is that, conceptually, a tuple is also an ordered list of items. It doesn't have the flexibility of a list, but still, it is considered the same from the perspective of JSON. Therefore, as you can see by the first print, in JSON a tuple is transformed into a list. Naturally then, the information that it was a tuple is lost, and when deserialization happens, what we have in data_out['a_tuple'] is actually a list. It is important that you keep this in mind when dealing with data, as going through a transformation process that involves a format that only comprises a subset of the data structures you can use implies there will be information loss. In this case, we lost the information about the type (tuple versus list).

This is actually a common problem. For example, you can't serialize all Python objects to JSON, as it is not clear if JSON should revert that (or how). Think about datetime, for example. An instance of that class is a Python object that JSON won't allow serializing. If we transform it into a string such as 2018-03-04T12:00:30Z, which is the ISO 8601 representation of a date with time and time zone information, what should JSON do when deserializing? Should it say this is actually deserializable into a datetime object, so I'd better do it, or should it simply consider it as a string and leave it as it is? What about data types that can be interpreted in more than one way?

The answer is that when dealing with data interchange, we often need to transform our objects into a simpler format prior to serializing them with JSON. This way, we will know how to reconstruct them correctly when we deserialize them.
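As a minimal sketch of that idea, we can reduce a datetime to its ISO 8601 string before serializing, and rebuild it afterwards because we know which field holds a date. The created key is made up for this illustration, and datetime.fromisoformat requires Python 3.7 or later.

```python
import json
from datetime import datetime, timezone

moment = datetime(2018, 3, 4, 12, 0, 30, tzinfo=timezone.utc)

# Reduce the datetime to a plain string before handing it to json.
json_data = json.dumps({'created': moment.isoformat()})
print(json_data)  # {"created": "2018-03-04T12:00:30+00:00"}

# JSON hands back a string; we rebuild the object ourselves because
# we know that this particular field holds a datetime.
raw = json.loads(json_data)
restored = datetime.fromisoformat(raw['created'])
print(restored == moment)  # True
```

The knowledge that created is a date lives in our code rather than in the JSON text, which is exactly the trade-off described above.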

In some cases, though, and mostly for internal use, it is useful to be able to serialize custom objects, so, just for fun, I'm going to show you how with two examples: complex numbers (because I love math) and datetime objects.

Custom encoding/decoding with JSON

In the JSON world, we can consider terms like encoding/decoding as synonyms of serializing/deserializing. They basically all mean transforming to and back from JSON. In the following example, I'm going to show you how to encode complex numbers:

# json_examples/json_cplx.py
import json

class ComplexEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, complex):
            return {
                '_meta': '_complex',
                'num': [obj.real, obj.imag],
            }
        return json.JSONEncoder.default(self, obj)

data = {
    'an_int': 42,
    'a_float': 3.14159265,
    'a_complex': 3 + 4j,
}

json_data = json.dumps(data, cls=ComplexEncoder)
print(json_data)

def object_hook(obj):
    try:
        if obj['_meta'] == '_complex':
            return complex(*obj['num'])
    except (KeyError, TypeError):
        return obj

data_out = json.loads(json_data, object_hook=object_hook)
print(data_out)

We start by defining a ComplexEncoder class, which needs to implement the default method. This method is called, one object at a time, for every object the encoder does not know how to serialize on its own, and it receives that object in the obj variable. At some point, obj will be our complex number, 3+4j. When that is true, we return a dictionary with some custom meta information, and a list that contains both the real and the imaginary part of the number. That is all we need to do to avoid losing information for a complex number.

We then call json.dumps, but this time we use the cls argument to specify our custom encoder. The result is printed:

{"an_int": 42, "a_float": 3.14159265, "a_complex": {"_meta": "_complex", "num": [3.0, 4.0]}}

Half the job is done. For the deserialization part, we could have written another class that would inherit from JSONDecoder, but, just for fun, I've used a different technique that is simpler and uses a small function: object_hook.

Within the body of object_hook, we find another try block, but don't worry about it for now. I'll explain it in detail in the next chapter. The important part is the two lines within the body of the try block itself. The function receives an object (notice, the function is only called when obj is a dictionary), and if the metadata matches our convention for complex numbers, we pass the real and imaginary parts to the complex function. The try/except block is there only to prevent malformed JSON from ruining the party (and if that happens, we simply return the object as it is).
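To check when the hook actually fires, here is a small sketch with a hypothetical spy_hook function that records everything it receives and returns it unchanged. The decoder calls it once per JSON object, innermost first, and never for lists, numbers, or strings.

```python
import json

seen = []

def spy_hook(obj):
    # Record every value the decoder hands us, then return it untouched.
    seen.append(obj)
    return obj

json.loads(
    '{"outer": {"inner": [1, 2, 3]}, "n": 42}',
    object_hook=spy_hook,
)

for item in seen:
    print(type(item).__name__, item)
# dict {'inner': [1, 2, 3]}
# dict {'outer': {'inner': [1, 2, 3]}, 'n': 42}
```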

The last print outputs:

{'an_int': 42, 'a_float': 3.14159265, 'a_complex': (3+4j)}

You can see that a_complex has been correctly deserialized.

Let's see a slightly more complex (no pun intended) example now: dealing with datetime objects. I'm going to split the code into two blocks, the serializing part, and the deserializing afterwards:

# json_examples/json_datetime.py
import json
from datetime import datetime, timedelta, timezone

now = datetime.now()
now_tz = datetime.now(tz=timezone(timedelta(hours=1)))

class DatetimeEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            try:
                off = obj.utcoffset().seconds
            except AttributeError:
                off = None
            return {
                '_meta': '_datetime',
                'data': obj.timetuple()[:6] + (obj.microsecond, ),
                'utcoffset': off,
            }
        return json.JSONEncoder.default(self, obj)

data = {
    'an_int': 42,
    'a_float': 3.14159265,
    'a_datetime': now,
    'a_datetime_tz': now_tz,
}

json_data = json.dumps(data, cls=DatetimeEncoder)
print(json_data)

The reason why this example is slightly more complex lies in the fact that datetime objects in Python can be time zone aware or not; therefore, we need to be more careful. The flow is basically the same as before, only it is dealing with a different data type. We start by getting the current date and time information, and we do it both without (now) and with (now_tz) time zone awareness, just to make sure our script works. We then proceed to define a custom encoder as before, and we implement once again the default method. The important bits in that method are how we get the time zone offset (off) information, in seconds, and how we structure the dictionary that returns the data. This time, the metadata says it's datetime information, and then we save the first six items in the time tuple (year, month, day, hour, minute, and second), plus the microseconds in the data key, and the offset after that. Could you tell that the value of data is a concatenation of tuples? Good job if you could!

When we have our custom encoder, we proceed to create some data, and then we serialize. The print statement outputs (after I've done some prettifying):

{
    "a_datetime": {
        "_meta": "_datetime",
        "data": [2018, 3, 18, 17, 57, 27, 438792],
        "utcoffset": null
    },
    "a_datetime_tz": {
        "_meta": "_datetime",
        "data": [2018, 3, 18, 18, 57, 27, 438810],
        "utcoffset": 3600
    },
    "a_float": 3.14159265,
    "an_int": 42
}

Interestingly, we find out that None is translated to null, its JavaScript equivalent. Moreover, we can see our data seems to have been encoded properly. Let's proceed to the second part of the script:

# json_examples/json_datetime.py
def object_hook(obj):
    try:
        if obj['_meta'] == '_datetime':
            if obj['utcoffset'] is None:
                tz = None
            else:
                tz = timezone(timedelta(seconds=obj['utcoffset']))
            return datetime(*obj['data'], tzinfo=tz)
    except (KeyError, TypeError):
        return obj

data_out = json.loads(json_data, object_hook=object_hook)

Once again, we first verify that the metadata is telling us it's a datetime, and then we proceed to fetch the time zone information. Once we have that, we pass the 7-tuple (using * to unpack its values in the call) and the time zone information to the datetime call, getting back our original object. Let's verify it by printing data_out:

{
    'a_datetime': datetime.datetime(2018, 3, 18, 18, 1, 46, 54693),
    'a_datetime_tz': datetime.datetime(
        2018, 3, 18, 19, 1, 46, 54711,
        tzinfo=datetime.timezone(datetime.timedelta(seconds=3600))),
    'a_float': 3.14159265,
    'an_int': 42
}

As you can see, we got everything back correctly. As an exercise, I'd like to challenge you to write the same logic, but for a date object, which should be simpler.

Before we move on to the next topic, a word of caution. Perhaps it is counter-intuitive, but working with datetime objects can be one of the trickiest things to do, so, although I'm pretty sure this code is doing what it is supposed to do, I want to stress that I only tested it very lightly. So if you intend to grab it and use it, please do test it thoroughly. Test for different time zones, test for daylight saving time being on and off, test for dates before the epoch, and so on. You might find that the code in this section then would need some modifications to suit your cases.
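One case worth testing is a time zone west of UTC. A negative timedelta is normalized into a negative number of days plus a positive number of seconds, so its seconds attribute alone does not describe the offset, while total_seconds does. The sketch below only illustrates that behavior; the names west, restored_wrong, and restored_right are made up for this example.

```python
from datetime import datetime, timedelta, timezone

west = timezone(timedelta(hours=-5))
moment = datetime(2018, 3, 18, 12, 0, tzinfo=west)
offset = moment.utcoffset()

# -5 hours is stored as days=-1, seconds=68400.
print(offset.days, offset.seconds)  # -1 68400
print(offset.total_seconds())       # -18000.0

# Rebuilding the time zone from .seconds yields UTC+19:00, not UTC-05:00.
restored_wrong = timezone(timedelta(seconds=offset.seconds))
restored_right = timezone(timedelta(seconds=offset.total_seconds()))

print(restored_wrong == west)  # False
print(restored_right == west)  # True
```

If you adapt the encoder from this section for production use, storing total_seconds() rather than seconds is one possible way to keep negative offsets intact.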

Let's now move to the next topic, IO.

IO, streams, and requests

IO stands for input/output, and it broadly refers to the communication between a computer and the outside world. There are several different types of IO, and it is outside the scope of this chapter to explain all of them, but I still want to offer you a couple of examples.

Using an in-memory stream

The first will show you the io.StringIO class, which is an in-memory stream for text IO. The second one instead will escape the locality of our computer, and show you how to perform an HTTP request. Let's see the first example:

# io_examples/string_io.py
import io

stream = io.StringIO()
stream.write('Learning Python Programming.\n')
print('Become a Python ninja!', file=stream)

contents = stream.getvalue()
print(contents)

stream.close()

In the preceding code snippet, we import the io module from the standard library. This is a very interesting module that features many tools related to streams and IO. One of them is StringIO, which is an in-memory buffer in which we're going to write two sentences, using two different methods, as we did with files in the first examples of this chapter. We can either call StringIO.write or we can use print, and tell it to direct the data to our stream.

By calling getvalue, we can get the content of the stream (and print it), and finally we close it. The call to close causes the text buffer to be immediately discarded.

There is a more elegant way to write the previous code (can you guess it, before you look?):

# io_examples/string_io.py
with io.StringIO() as stream:
    stream.write('Learning Python Programming.\n')
    print('Become a Python ninja!', file=stream)
    contents = stream.getvalue()
    print(contents)

Yes, it is again a context manager. Like open, io.StringIO works well within a context manager block. Notice the similarity with open: in this case too, we don't need to manually close the stream.

In-memory objects can be useful in a multitude of situations. Memory is much faster than a disk and, for small amounts of data, can be the perfect choice.

When running the script, the output is:

$ python string_io.py
Learning Python Programming.
Become a Python ninja!
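One situation where an in-memory stream shines is testing code that expects a file. In the following sketch, the most_common_words function is a made-up helper that reads from any text stream, so we can exercise it with a StringIO instead of creating a real file on disk.

```python
import io
from collections import Counter

def most_common_words(stream, n):
    """Count words read from any text stream (a real file or a StringIO)."""
    words = stream.read().lower().split()
    return Counter(words).most_common(n)

# The StringIO stands in for a file opened with open(...).
fake_file = io.StringIO('We walk.\nWe breathe.\nThe path is here.\n')
result = most_common_words(fake_file, 1)
print(result)  # [('we', 2)]
```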

Making HTTP requests

Let's now explore a couple of examples on HTTP requests. I will use the requests library for these examples, which you can install with pip. We're going to perform HTTP requests against the httpbin.org API, which, interestingly, was developed by Kenneth Reitz, the creator of the requests library itself. This library is amongst the most widely adopted all over the world:

import requests

urls = {
    'get': 'https://httpbin.org/get?title=learn+python+programming',
    'headers': 'https://httpbin.org/headers',
    'ip': 'https://httpbin.org/ip',
    'now': 'https://now.httpbin.org/',
    'user-agent': 'https://httpbin.org/user-agent',
    'UUID': 'https://httpbin.org/uuid',
}

def get_content(title, url):
    resp = requests.get(url)
    print(f'Response for {title}')
    print(resp.json())

for title, url in urls.items():
    get_content(title, url)
    print('-' * 40)

The preceding snippet should be simple to understand. I declare a dictionary of URLs against which I want to perform requests. I have encapsulated the code that performs the request into a tiny function: get_content. As you can see, very simply, we perform a GET request (by using requests.get), and we print the title and the JSON decoded version of the body of the response. Let me spend a word about this last bit.

When we perform a request to a website, or API, we get back a response object, which is, very simply, what was returned by the server we performed the request against. The body of all responses from httpbin.org happens to be JSON encoded, so instead of getting the body as it is (by getting resp.text) and manually decoding it, calling json.loads on it, we simply combine the two by leveraging the json method on the response object. There are plenty of reasons why the requests package has become so widely adopted, and one of them is definitely its ease of use.

Now, when you perform a request in your application, you will want to have a much more robust approach in dealing with errors and so on, but for this chapter, a simple example will do. Don't worry, I will give you a more comprehensive introduction to HTTP requests in Chapter 14, Web Development.

Going back to our code, in the end, we run a for loop and get all the URLs. When you run it, you will see the result of each call printed on your console, like this (prettified and trimmed for brevity):

$ python reqs.py
Response for get
{
  "args": {
    "title": "learn python programming"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "close",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.19.0"
  },
  "origin": "82.47.175.158",
  "url": "https://httpbin.org/get?title=learn+python+programming"
}
... rest of the output omitted ...

Notice that you might get a slightly different output in terms of version numbers and IPs, which is fine. Now, GET is only one of the HTTP verbs, and it is definitely the most commonly used. The second one is the ubiquitous POST, which is the type of request you make when you need to send data to the server. Every time you submit a form on the web, you're basically making a POST request. So, let's try to make one programmatically:

# io_examples/reqs_post.py
import requests

url = 'https://httpbin.org/post'
data = dict(title='Learn Python Programming')

resp = requests.post(url, data=data)
print('Response for POST')
print(resp.json())

The previous code is very similar to the one we saw before, only this time we don't call get, but post, and because we want to send some data, we specify that in the call. The requests library offers much, much more than this, and it has been praised by the community for the beautiful API it exposes. It is a project that I encourage you to check out and explore, as you will end up using it all the time, anyway.

Running the previous script (and applying some prettifying magic to the output) yields the following:

$ python reqs_post.py
Response for POST
{'args': {},
 'data': '',
 'files': {},
 'form': {'title': 'Learn Python Programming'},
 'headers': {'Accept': '*/*',
             'Accept-Encoding': 'gzip, deflate',
             'Connection': 'close',
             'Content-Length': '30',
             'Content-Type': 'application/x-www-form-urlencoded',
             'Host': 'httpbin.org',
             'User-Agent': 'python-requests/2.7.0 CPython/3.7.0b2 '
                           'Darwin/17.4.0'},
 'json': None,
 'origin': '82.45.123.178',
 'url': 'https://httpbin.org/post'}

Notice how the headers are now different, and we find the data we sent in the form key/value pair of the response body.
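The Content-Type and Content-Length headers give a hint about what travelled over the wire: the dictionary was form-encoded into a short string. The following sketch uses urlencode from the standard library to produce the same style of body. It is only an approximation for illustration, since requests performs its own encoding internally.

```python
from urllib.parse import urlencode

data = dict(title='Learn Python Programming')

# application/x-www-form-urlencoded: key=value pairs, spaces become +
body = urlencode(data)
print(body)       # title=Learn+Python+Programming
print(len(body))  # 30
```

The length of the encoded string matches the Content-Length value of 30 reported in the response shown above.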

I hope these short examples are enough to get you started, especially with requests. The web changes every day, so it's worth learning the basics and then brushing up every now and then.

Let's now move on to the last topic of this chapter: persisting data on disk in different formats.

Persisting data on disk

In the last section of this chapter, we're exploring how to persist data on disk in three different formats. We will explore pickle, shelve, and a short example that will involve accessing a database using SQLAlchemy, the most widely adopted ORM library in the Python ecosystem.

Serializing data with pickle

The pickle module, from the Python standard library, offers tools to convert Python objects into byte streams, and vice versa. Even though there is a partial overlap in the API that pickle and json expose, the two are quite different. As we have seen previously in this chapter, JSON is a text format, human readable, language independent, and supports only a restricted subset of Python data types. The pickle module, on the other hand, is not human readable, translates to bytes, is Python specific, and, thanks to the wonderful Python introspection capabilities, it supports an extremely large number of data types.

Regardless of these differences, though, which you should know when you consider whether to use one or the other, I think that the most important concern regarding pickle lies in the security threats you are exposed to when you use it. Unpickling erroneous or malicious data from an untrusted source can be very dangerous, so if you decide to adopt it in your application, you need to be extra careful.
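To see why unpickling is risky, consider that a pickle can instruct the loader to call an arbitrary callable with arbitrary arguments. The sketch below is deliberately harmless: the made-up Sneaky class asks to be rebuilt by calling list('abc'), but a malicious payload could just as well name a function that runs a shell command.

```python
import pickle

class Sneaky:
    def __reduce__(self):
        # Tell pickle: "to rebuild me, call list('abc')".
        return (list, ('abc',))

payload = pickle.dumps(Sneaky())

# The loader calls whatever the payload names, with the given arguments.
result = pickle.loads(payload)
print(result)  # ['a', 'b', 'c']
```

Notice that we did not get a Sneaky instance back: the loader simply executed the call described in the payload, which is why you should only unpickle data you trust.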

That said, let's see it in action, by means of a simple example:

# persistence/pickler.py
import pickle
from dataclasses import dataclass

@dataclass
class Person:
    first_name: str
    last_name: str
    id: int

    def greet(self):
        print(f'Hi, I am {self.first_name} {self.last_name}'
              f' and my ID is {self.id}'
        )

people = [
    Person('Obi-Wan', 'Kenobi', 123),
    Person('Anakin', 'Skywalker', 456),
]

# save data in binary format to a file
with open('data.pickle', 'wb') as stream:
    pickle.dump(people, stream)

# load data from a file
with open('data.pickle', 'rb') as stream:
    peeps = pickle.load(stream)

for person in peeps:
    person.greet()

In the previous example, we create a Person class using the dataclass decorator, which we have seen in Chapter 6, OOP, Decorators, and Iterators. The only reason I wrote this example with a data class is to show you how effortlessly pickle deals with it, with no need for us to do anything we wouldn't do for a simpler data type.

The class has three attributes: first_name, last_name, and id. It also exposes a greet method, which simply prints a hello message with the data.

We create a list of instances, and then we save it to a file. In order to do so, we use pickle.dump, to which we feed the content to be pickled, and the stream to which we want to write. Immediately after that, we read from that same file, and by using pickle.load, we convert back into Python the whole content of that stream. Just to make sure that the objects have been converted correctly, we call the greet method on both of them. The result is the following:

$ python pickler.py
Hi, I am Obi-Wan Kenobi and my ID is 123
Hi, I am Anakin Skywalker and my ID is 456

The pickle module also allows you to convert to (and from) bytes objects, by means of the dumps and loads functions (note the s at the end of both names). In day-to-day applications, pickle is usually used when we need to persist Python data that is not supposed to be exchanged with another application. One example I stumbled upon recently was the session management in a Flask plugin, which pickles the session object before sending it to Redis. In practice, though, you are unlikely to have to deal with this library very often.
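Here is a minimal sketch of dumps and loads in action. It reuses the tuple from the JSON section to show that, unlike JSON, pickle preserves the Python type on the way back.

```python
import pickle

data_in = {'a_tuple': (1, 2, 3, 4, 5)}

blob = pickle.dumps(data_in)   # serialize to a bytes object, no file needed
data_out = pickle.loads(blob)  # and back to Python

print(type(blob).__name__)  # bytes
print(data_out)             # {'a_tuple': (1, 2, 3, 4, 5)}
```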

Another tool that is possibly used even less, but that proves to be very useful when you are short of resources, is shelve.

Saving data with shelve

A shelf is a persistent dictionary-like object. The beauty of it is that the values you save into a shelf can be any object you can pickle, so you're not restricted like you would be if you were using a database. Albeit interesting and useful, the shelve module is used quite rarely in practice. Just for completeness, let's see a quick example of how it works:

# persistence/shelf.py
import shelve

class Person:
    def __init__(self, name, id):
        self.name = name
        self.id = id

with shelve.open('shelf1.shelve') as db:
    db['obi1'] = Person('Obi-Wan', 123)
    db['ani'] = Person('Anakin', 456)
    db['a_list'] = [2, 3, 5]
    db['delete_me'] = 'we will have to delete this one...'

    print(list(db.keys()))  # ['ani', 'a_list', 'delete_me', 'obi1']

    del db['delete_me']  # gone!

    print(list(db.keys()))  # ['ani', 'a_list', 'obi1']

    print('delete_me' in db)  # False
    print('ani' in db)  # True

    a_list = db['a_list']
    a_list.append(7)
    db['a_list'] = a_list
    print(db['a_list'])  # [2, 3, 5, 7]

Apart from the wiring and the boilerplate around it, the previous example resembles an exercise with dictionaries. We create a simple Person class and then we open a shelve file within a context manager. As you can see, we use the dictionary syntax to store four objects: two Person instances, a list, and a string. If we print the keys, we get a list containing the four keys we used. Immediately after printing it, we delete the (aptly named) delete_me key/value pair from the shelf. Printing the keys again shows the deletion has succeeded. We then test a couple of keys for membership, and finally, we append number 7 to a_list. Notice how we have to extract the list from the shelf, modify it, and save it again.

In case this behavior is undesired, there is something we can do:

# persistence/shelf.py
with shelve.open('shelf2.shelve', writeback=True) as db:
    db['a_list'] = [11, 13, 17]
    db['a_list'].append(19)  # in-place append!
    print(db['a_list'])  # [11, 13, 17, 19]

By opening the shelf with writeback=True, we enable the writeback feature, which allows us to simply append to a_list as if it actually was a value within a regular dictionary. The reason why this feature is not active by default is that it comes with a price that you pay in terms of memory consumption and slower closing of the shelf.
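To appreciate what writeback changes, here is a small sketch of what happens without it. Each access to db['a_list'] unpickles a fresh copy of the stored value, so an in-place append modifies that temporary copy and is silently lost. The shelf_demo filename is a placeholder, and the shelf lives in a temporary directory so it leaves no files behind.

```python
import os
import shelve
from tempfile import TemporaryDirectory

with TemporaryDirectory() as td:
    path = os.path.join(td, 'shelf_demo')
    with shelve.open(path) as db:  # writeback defaults to False
        db['a_list'] = [11, 13, 17]
        db['a_list'].append(19)    # mutates a temporary copy only
        stored = db['a_list']      # a fresh copy, read back from the shelf

print(stored)  # [11, 13, 17]
```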

Now that we have paid homage to the standard library modules related to data persistence, let's take a look at the most widely adopted ORM in the Python ecosystem: SQLAlchemy.

Saving data to a database

For this example, we are going to work with an in-memory database, which will make things simpler for us. In the source code of the book, I have left a couple of comments to show you how to generate a SQLite file, so I hope you'll explore that option as well.

You can find a free database browser for SQLite at sqlitebrowser.org. If you are not satisfied with it, you will be able to find a wide range of tools, some free, some not free, that you can use to access and manipulate a database file.

Before we dive into the code, allow me to briefly introduce the concept of a relational database.

A relational database is a database that allows you to save data following the relational model, invented in 1969 by Edgar F. Codd. In this model, data is stored in one or more tables. Each table has rows (also known as records, or tuples), each of which represents an entry in the table. Tables also have columns (also known as attributes), each of which represents an attribute of the records. Each record is identified through a unique key, more commonly known as the primary key, which is the union of one or more columns in the table. To give you an example: imagine a table called Users, with columns id, username, password, name, and surname. Such a table would be perfect to contain users of our system. Each row would represent a different user. For example, a row with the values 3, gianchub, my_wonderful_pwd, Fabrizio, and Romano, would represent my user in the system.

The reason why the model is called relational is because you can establish relations between tables. For example, if you added a table called PhoneNumbers to our fictitious database, you could insert phone numbers into it, and then, through a relation, establish which phone number belongs to which user.

In order to query a relational database, we need a special language. The main standard is called SQL, which stands for Structured Query Language. It is born out of something called relational algebra, which is a very nice family of algebras used to model data stored according to the relational model, and to perform queries on it. The most common operations you can perform usually involve filtering on the rows or columns, joining tables, aggregating the results according to some criteria, and so on. To give you an example in English, a query on our imaginary database could be: Fetch all users (username, name, surname) whose username starts with "m", who have at most one phone number. In this query, we are asking for a subset of the columns in the Users table. We are filtering on users by taking only those whose username starts with the letter m, and even further, only those who have at most one phone number.
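To make the English query concrete, here is a sketch that runs it against an in-memory SQLite database through the sqlite3 module from the standard library. The table layout, the column names, and the sample rows are all assumptions made up for this illustration, and the SQL shown is only one of several ways to express the query.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE users (
        id INTEGER PRIMARY KEY, username TEXT, name TEXT, surname TEXT);
    CREATE TABLE phone_numbers (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        number TEXT);
    INSERT INTO users VALUES
        (1, 'mario', 'Mario', 'Rossi'),
        (2, 'maria', 'Maria', 'Bianchi'),
        (3, 'gianchub', 'Fabrizio', 'Romano');
    INSERT INTO phone_numbers VALUES
        (1, 1, '555-0001'), (2, 1, '555-0002'), (3, 2, '555-0003');
''')

# Filter on the username, join the two tables, then aggregate per user.
query = '''
    SELECT u.username, u.name, u.surname
    FROM users AS u
    LEFT JOIN phone_numbers AS p ON p.user_id = u.id
    WHERE u.username LIKE 'm%'
    GROUP BY u.id
    HAVING COUNT(p.id) <= 1
    ORDER BY u.username
'''
rows = conn.execute(query).fetchall()
conn.close()

print(rows)  # [('maria', 'Maria', 'Bianchi')]
```

The user mario is excluded because he has two phone numbers, and gianchub is excluded because the username does not start with m. The LEFT JOIN makes sure that users with no phone number at all would still be counted (with zero numbers).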

Back in the days when I was a student in Padova, I spent a whole semester learning both the relational algebra semantics, and the standard SQL (amongst other things). If it wasn't for a major bicycle accident I had the day of the exam, I would say that this was one of the most fun exams I ever had to prepare.

Now, each database comes with its own flavor of SQL. They all respect the standard to some extent, but none fully does, and they are all different from one another in some respects. This poses an issue in modern software development. If our application contains SQL code, it is quite likely that if we decided to use a different database engine, or maybe a different version of the same engine, we would find our SQL code needs amending.

This can be quite painful, especially since SQL queries can become very, very complicated quite quickly. In order to alleviate this pain a little, computer scientists (bless them) have created code that maps objects of a particular language to tables of a relational database. Unsurprisingly, the name of such tools is Object-Relational Mapping (ORMs).

In modern application development, you would normally start interacting with a database by using an ORM, and should you find yourself in a situation where you can't perform a query you need to perform, through the ORM, you would then resort to using SQL directly. This is a good compromise between having no SQL at all, and using no ORM, which ultimately means specializing the code that interacts with the database, with the aforementioned disadvantages.

In this section, I'd like to show an example that leverages SQLAlchemy, the most popular Python ORM. We are going to define two models (Person and Address) which map to a table each, and then we're going to populate the database and perform a few queries on it.

Let'sstartwiththemodeldeclarations:

# persistence/alchemy_models.py
# On SQLAlchemy 1.4 and later, declarative_base can also be
# imported from sqlalchemy.orm.
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import (
    Column, Integer, String, ForeignKey, create_engine)
from sqlalchemy.orm import relationship

At the beginning, we import some functions and types. The first thing we need to do then is to create an engine. This engine tells SQLAlchemy about the type of database we have chosen for our example:

# persistence/alchemy_models.py
engine = create_engine('sqlite:///:memory:')
Base = declarative_base()


class Person(Base):
    __tablename__ = 'person'

    id = Column(Integer, primary_key=True)
    name = Column(String)
    age = Column(Integer)

    addresses = relationship(
        'Address',
        back_populates='person',
        order_by='Address.email',
        cascade='all, delete-orphan'
    )

    def __repr__(self):
        return f'{self.name}(id={self.id})'


class Address(Base):
    __tablename__ = 'address'

    id = Column(Integer, primary_key=True)
    email = Column(String)
    person_id = Column(ForeignKey('person.id'))
    person = relationship('Person', back_populates='addresses')

    def __str__(self):
        return self.email
    __repr__ = __str__


Base.metadata.create_all(engine)

Each model then inherits from the Base class, which in this example consists of the mere default, returned by declarative_base(). We define Person, which maps to a table called person, and exposes the attributes id, name, and age. We also declare a relationship with the Address model, by stating that accessing the addresses attribute will fetch all the entries in the address table that are related to the particular Person instance we're dealing with. The cascade option affects how creation and deletion work, but it is a more advanced concept, so I'd suggest you gloss over it for now and maybe investigate more later on.

The last thing we declare is the __repr__ method, which provides us with the official string representation of an object. This is supposed to be a representation that can be used to completely reconstruct the object, but in this example, I simply use it to provide something in output. Python redirects repr(obj) to a call to obj.__repr__().

We also declare the Address model, which will contain email addresses, and a reference to the person they belong to. You can see the person_id and person attributes are both about setting a relation between the Address and Person instances. Note how I declared the __str__ method on Address, and then assigned an alias to it, called __repr__. This means that calling either repr or str on Address objects will ultimately result in calling the __str__ method. This is quite a common technique in Python, so I took the opportunity to show it to you here.
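The aliasing technique just described can be tried without a database. The following is a small sketch with a plain class (not a SQLAlchemy model), written only to show how str, repr, and containers behave once __repr__ points at __str__.

```python
class Address:
    def __init__(self, email):
        self.email = email

    def __str__(self):
        return self.email

    # Alias: repr() and str() now both end up in __str__.
    __repr__ = __str__


address = Address('obi1@example.com')
print(str(address))
print(repr(address))
# Containers such as lists use repr() for their items, which is why
# a list of addresses prints as bare emails.
print([address])
```

This is also why printing the addresses list of a person shows readable emails rather than the default angle-bracket representation.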

On the last line, we tell the engine to create tables in the database according to our models.

A deeper understanding of this code would require much more space than I can afford, so I encourage you to read up on database management systems (DBMS), SQL, Relational Algebra, and SQLAlchemy.

Now that we have our models, let's use them to persist some data!

Let's take a look at the following example:

# persistence/alchemy.py
from alchemy_models import Person, Address, engine
from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)
session = Session()

First we create a session, which is the object we use to manage the database. Next, we proceed by creating two people:

anakin = Person(name='Anakin Skywalker', age=32)
obi1 = Person(name='Obi-Wan Kenobi', age=40)

We then add email addresses to both of them, using two different techniques. One assigns them to a list, and the other one simply appends them:

obi1.addresses = [
    Address(email='obi1@example.com'),
    Address(email='wanwan@example.com'),
]

anakin.addresses.append(Address(email='ani@example.com'))
anakin.addresses.append(Address(email='evil.dart@example.com'))
anakin.addresses.append(Address(email='vader@example.com'))

We haven't touched the database yet. It's only when we use the session object that something actually happens in it:

session.add(anakin)
session.add(obi1)
session.commit()

Adding the two Person instances is enough to also add their addresses (this is thanks to the cascading effect). Calling commit is what actually tells SQLAlchemy to commit the transaction and save the data in the database. A transaction is an operation that provides something like a sandbox, but in a database context. As long as the transaction hasn't been committed, we can roll back any modification we have made to the database, and by so doing, revert to the state we were in before starting the transaction itself. SQLAlchemy offers more complex and granular ways to deal with transactions, which you can study in its official documentation, as it is quite an advanced topic.

We now query for all the people whose name starts with Obi by using like, which hooks to the LIKE operator in SQL:

obi1 = session.query(Person).filter(
    Person.name.like('Obi%')
).first()
print(obi1, obi1.addresses)

We take the first result of that query (we know we only have Obi-Wan anyway), and print it. We then fetch anakin, by using an exact match on his name (just to show you a different way of filtering):

anakin = session.query(Person).filter(
    Person.name == 'Anakin Skywalker'
).first()
print(anakin, anakin.addresses)

We then capture Anakin's ID, and delete the anakin object from the global frame:

anakin_id = anakin.id
del anakin

The reason we do this is because I want to show you how to fetch an object by its ID. Before we do that, we write the display_info function, which we will use to display the full content of the database (fetched starting from the addresses, in order to demonstrate how to fetch objects by using a relation attribute in SQLAlchemy):

def display_info():
    # get all addresses first
    addresses = session.query(Address).all()

    # display results
    for address in addresses:
        print(f'{address.person.name} <{address.email}>')

    # display how many objects we have in total
    print('people: {}, addresses: {}'.format(
        session.query(Person).count(),
        session.query(Address).count())
    )

The display_info function prints all the addresses, along with the respective person's name, and, at the end, produces a final piece of information regarding the number of objects in the database. We call the function, then we fetch and delete anakin (think about Darth Vader and you won't be sad about deleting him), and then we display the info again, to verify he's actually disappeared from the database:

display_info()

anakin = session.query(Person).get(anakin_id)
session.delete(anakin)
session.commit()

display_info()

The output of all these snippets run together is the following (for your convenience, I have separated the output into four blocks, to reflect the four blocks of code that actually produce that output):

$ python alchemy.py
Obi-Wan Kenobi(id=2) [obi1@example.com, wanwan@example.com]

Anakin Skywalker(id=1) [ani@example.com, evil.dart@example.com, vader@example.com]

Anakin Skywalker <ani@example.com>
Anakin Skywalker <evil.dart@example.com>
Anakin Skywalker <vader@example.com>
Obi-Wan Kenobi <obi1@example.com>
Obi-Wan Kenobi <wanwan@example.com>
people: 2, addresses: 5

Obi-Wan Kenobi <obi1@example.com>
Obi-Wan Kenobi <wanwan@example.com>
people: 1, addresses: 2

As you can see from the last two blocks, deleting anakin has deleted one Person object, and the three addresses associated with it. Again, this is due to the fact that cascading took place when we deleted anakin.
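If you would like to see a transaction rollback and a cascading delete without installing anything, here is a rough sketch using the standard library's sqlite3 module. Note the assumptions: this is not SQLAlchemy, the schema is a cut-down imitation of the models above, and the cascade here is the database-level ON DELETE CASCADE, which is a different mechanism from the ORM-level cascade option used on the relationship.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# SQLite only enforces foreign keys when explicitly asked to.
conn.execute('PRAGMA foreign_keys = ON')
conn.execute('CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)')
conn.execute(
    'CREATE TABLE address ('
    ' id INTEGER PRIMARY KEY,'
    ' email TEXT,'
    ' person_id INTEGER REFERENCES person(id) ON DELETE CASCADE)'
)
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Anakin Skywalker')")
conn.execute(
    "INSERT INTO address (email, person_id) VALUES ('ani@example.com', 1)")
conn.commit()  # from here on, the two rows are saved


def counts():
    """Return (number of people, number of addresses)."""
    people = conn.execute('SELECT COUNT(*) FROM person').fetchone()[0]
    addresses = conn.execute('SELECT COUNT(*) FROM address').fetchone()[0]
    return people, addresses


# The DELETE opens a new transaction; the address goes with the person.
conn.execute('DELETE FROM person WHERE id = 1')
after_delete = counts()

# Nothing was committed, so we can still go back to the saved state.
conn.rollback()
after_rollback = counts()

print(after_delete, after_rollback)
```

Had we called conn.commit() instead of conn.rollback(), the deletion would have become permanent.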

This concludes our brief introduction to data persistence. It is a vast and, at times, complex domain, which I encourage you to explore, learning as much theory as possible. Lack of knowledge or proper understanding, when it comes to database systems, can really bite.

Summary

In this chapter, we have explored working with files and directories. We have learned how to open files for reading and writing and how to do that more elegantly by using context managers. We also explored directories: how to list their content, both recursively and not. We also learned about pathnames, which are the gateway to accessing both files and directories.

We then briefly saw how to create a ZIP archive, and extract its content. The source code of the book also contains an example with a different compression format: tar.gz.

We talked about data interchange formats, and have explored JSON in some depth. We had some fun writing custom encoders and decoders for specific Python data types.

Then we explored IO, both with in-memory streams and HTTP requests.

And finally, we saw how to persist data using pickle, shelve, and the SQLAlchemy ORM library.

You should now have a pretty good idea of how to deal with files and data persistence, and I hope you will take the time to explore these topics in much more depth by yourself.

The next chapter will look at testing, profiling, and dealing with exceptions.

Testing, Profiling, and Dealing with Exceptions

"Just as the wise accepts gold after testing it by heating, cutting and rubbing it, so are my words to be accepted after examining them, but not out of respect for me."

– Buddha

I love this quote by the Buddha. Within the software world, it translates perfectly into the healthy habit of never trusting code just because someone smart wrote it or because it's been working fine for a long time. If it has not been tested, code is not to be trusted.

Why are tests so important? Well, for one, they give you predictability. Or, at least, they help you achieve high predictability. Unfortunately, there is always some bug that sneaks into the code. But we definitely want our code to be as predictable as possible. What we don't want is to have a surprise, in other words, our code behaving in an unpredictable way. Would you be happy to know that the software that checks on the sensors of the plane that is taking you on your holidays sometimes goes crazy? No, probably not.

Therefore, we need to test our code; we need to check that its behavior is correct, that it works as expected when it deals with edge cases, that it doesn't hang when the components it's talking to are broken or unreachable, that its performance is well within the acceptable range, and so on.

This chapter is all about that: making sure that your code is prepared to face the scary outside world, that it's fast enough, and that it can deal with unexpected or exceptional conditions.

In this chapter, we're going to explore the following topics:

Testing (several aspects of it, including a brief introduction to test-driven development)

Exception handling

Profiling and performances

Let's start by understanding what testing is.

Testing your application

There are many different kinds of tests, so many, in fact, that companies often have a dedicated department, called quality assurance (QA), made up of individuals who spend their day testing the software the company developers produce.

To start making an initial classification, we can divide tests into two broad categories: white-box and black-box tests.

White-box tests are those that exercise the internals of the code; they inspect it down to a very fine level of detail. On the other hand, black-box tests are those that consider the software under test as if within a box, the internals of which are ignored. Even the technology, or the language used inside the box, is not important for black-box tests. What they do is plug input into one end of the box and verify the output at the other end, and that's it.

There is also an in-between category, called gray-box testing, which involves testing a system in the same way we do with the black-box approach, but having some knowledge about the algorithms and data structures used to write the software and only partial access to its source code.

There are many different kinds of tests in these categories, each of which serves a different purpose. To give you an idea, here are a few:

Frontend tests: Make sure that the client side of your application is exposing the information that it should, all the links, the buttons, the advertising, everything that needs to be shown to the client. It may also verify that it is possible to walk a certain path through the user interface.

Scenario tests: Make use of stories (or scenarios) that help the tester work through a complex problem or test a part of the system.

Integration tests: Verify the behavior of the various components of your application when they are working together sending messages through interfaces.

Smoke tests: Particularly useful when you deploy a new update on your application. They check whether the most essential, vital parts of your application are still working as they should and that they are not on fire. This term comes from when engineers tested circuits by making sure nothing was smoking.

Acceptance tests, or user acceptance testing (UAT): What a developer does with a product owner (for example, in a SCRUM environment) to determine whether the work that was commissioned was carried out correctly.

Functional tests: Verify the features or functionalities of your software.

Destructive tests: Take down parts of your system, simulating a failure, to establish how well the remaining parts of the system perform. These kinds of tests are performed extensively by companies that need to provide an extremely reliable service, such as Amazon and Netflix, for example.

Performance tests: Aim to verify how well the system performs under a specific load of data or traffic so that, for example, engineers can get a better understanding of the bottlenecks in the system that could bring it to its knees in a heavy-load situation, or those that prevent scalability.

Usability tests, and the closely related user experience (UX) tests: Aim to check whether the user interface is simple and easy to understand and use. They aim to provide input to the designers so that the user experience is improved.

Security and penetration tests: Aim to verify how well the system is protected against attacks and intrusions.

Unit tests: Help the developer to write the code in a robust and consistent way, providing the first line of feedback and defense against coding mistakes, refactoring mistakes, and so on.

Regression tests: Provide the developer with useful information about a feature being compromised in the system after an update. Some of the causes for a system being said to have a regression are an old bug coming back to life, an existing feature being compromised, or a new issue being introduced.

Many books and articles have been written about testing, and I have to point you to those resources if you're interested in finding out more about all the different kinds of tests. In this chapter, we will concentrate on unit tests, since they are the backbone of software-crafting and form the vast majority of tests that are written by a developer.

Testing is an art, an art that you don't learn from books, I'm afraid. You can learn all the definitions (and you should), and try to collect as much knowledge about testing as you can, but you will likely be able to test your software properly only when you have done it for long enough in the field.

When you are having trouble refactoring a bit of code, because every little thing you touch makes a test blow up, you learn how to write less rigid and limiting tests, which still verify the correctness of your code but, at the same time, allow you the freedom and joy to play with it, to shape it as you want.

When you are being called too often to fix unexpected bugs in your code, you learn how to write tests more thoroughly, how to come up with a more comprehensive list of edge cases, and strategies to cope with them before they turn into bugs.

When you are spending too much time reading tests and trying to refactor them to change a small feature in the code, you learn to write simpler, shorter, and better-focused tests.

I could go on with this when you... you learn..., but I guess you get the picture. You need to get your hands dirty and build experience. My suggestion? Study the theory as much as you can, and then experiment using different approaches. Also, try to learn from experienced coders; it's very effective.

The anatomy of a test

Before we concentrate on unit tests, let's see what a test is, and what its purpose is.

A test is a piece of code whose purpose is to verify something in our system. It may be that we're calling a function passing two integers, that an object has a property called donald_duck, or that when you place an order on some API, after a minute you can see it dissected into its basic elements, in the database.

A test is typically composed of three sections:

Preparation: This is where you set up the scene. You prepare all the data, the objects, and the services you need in the places you need them so that they are ready to be used.

Execution: This is where you execute the bit of logic that you're checking against. You perform an action using the data and the interfaces you have set up in the preparation phase.

Verification: This is where you verify the results and make sure they are according to your expectations. You check the returned value of a function, or that some data is in the database, some is not, some has changed, a request has been made, something has happened, a method has been called, and so on.
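As a rough illustration of those three sections, here is a tiny test laid out phase by phase. The add_item function and the cart dictionary are hypothetical, invented only so that there is something to test.

```python
def add_item(cart, name, price):
    """Hypothetical code under test: add an item, return the new total."""
    cart[name] = cart.get(name, 0) + price
    return sum(cart.values())


def test_add_item_returns_new_total():
    # Preparation: set up the data the test needs.
    cart = {'book': 20}

    # Execution: run the bit of logic we are checking.
    total = add_item(cart, 'pen', 5)

    # Verification: compare the results with our expectations.
    assert total == 25
    assert cart == {'book': 20, 'pen': 5}


test_add_item_returns_new_total()
```

A test runner such as pytest would discover and call the test function for you; calling it by hand on the last line just keeps the sketch self-contained.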

While tests usually follow this structure, in a test suite, you will typically find some other constructs that take part in the testing game:

Setup: This is something quite commonly found in several different tests. It's logic that can be customized to run for every test, class, module, or even for a whole session. In this phase, developers usually set up connections to databases, maybe populate them with data that will be needed there for the test to make sense, and so on.

Teardown: This is the opposite of the setup; the teardown phase takes place when the tests have been run. Like the setup, it can be customized to run for every test, class or module, or session. Typically in this phase, we destroy any artefacts that were created for the test suite, and clean up after ourselves.

Fixtures: They are pieces of data used in the tests. By using a specific set of fixtures, outcomes are predictable and therefore tests can perform verifications against them.
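To see setup, teardown, and fixture data working together, here is a small sketch. It deliberately uses the standard library's unittest module rather than pytest so that it runs with no extra installs; the file name and its content are made up for the example.

```python
import os
import tempfile
import unittest


class TestNotesFile(unittest.TestCase):
    def setUp(self):
        # Setup: runs before every test method.
        self.tmpdir = tempfile.TemporaryDirectory()
        self.path = os.path.join(self.tmpdir.name, 'notes.txt')
        with open(self.path, 'w') as f:
            f.write('hello\n')  # fixture data with a known content

    def tearDown(self):
        # Teardown: runs after every test method, even a failing one.
        self.tmpdir.cleanup()

    def test_file_has_one_line(self):
        with open(self.path) as f:
            self.assertEqual(f.readlines(), ['hello\n'])

    def test_appending_adds_a_line(self):
        with open(self.path, 'a') as f:
            f.write('world\n')
        with open(self.path) as f:
            self.assertEqual(len(f.readlines()), 2)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNotesFile)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because setUp rebuilds the file from scratch each time, the line appended in one test never leaks into the other, whatever order they run in.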

In this chapter, we will use the pytest Python library. It is an incredibly powerful tool that makes testing much easier and provides plenty of helpers so that the test logic can focus more on the actual testing than the wiring around it. You will see, when we get to the code, that one of the characteristics of pytest is that fixtures, setup, and teardown often blend into one.

Testing guidelines

Like software, tests can be good or bad, with a whole range of shades in the middle. To write good tests, here are some guidelines:

Keep them as simple as possible. It's okay to violate some good coding rules, such as hardcoding values or duplicating code. Tests need, first and foremost, to be as readable as possible and easy to understand. When tests are hard to read or understand, you can never be confident they are actually making sure your code is performing correctly.

Tests should verify one thing and one thing only. It's very important that you keep them short and contained. It's perfectly fine to write multiple tests to exercise a single object or function. Just make sure that each test has one and only one purpose.

Tests should not make any unnecessary assumption when verifying data. This is tricky to understand at first, but it is important. Verifying that the result of a function call is [1, 2, 3] is not the same as saying the output is a list that contains the numbers 1, 2, and 3. In the former, we're also assuming the ordering; in the latter, we're only assuming which items are in the list. The differences sometimes are quite subtle, but they are still very important.

Tests should exercise the what, rather than the how. Tests should focus on checking what a function is supposed to do, rather than how it is doing it. For example, focus on the fact that it's calculating the square root of a number (the what), instead of on the fact that it is calling math.sqrt to do it (the how). Unless you're writing performance tests or you have a particular need to verify how a certain action is performed, try to avoid this type of testing and focus on the what. Testing the how leads to restrictive tests and makes refactoring hard. Moreover, the type of test you have to write when you concentrate on the how is more likely to degrade the quality of your testing code base when you amend your software frequently.

Tests should use the minimal set of fixtures needed to do the job. This is another crucial point. Fixtures have a tendency to grow over time. They also tend to change every now and then. If you use large amounts of fixtures and ignore redundancies in your tests, refactoring will take longer. Spotting bugs will be harder. Try to use a set of fixtures that is big enough for the test to perform correctly, but not any bigger.

Tests should run as fast as possible. A good test code base could end up being much longer than the code being tested itself. It varies according to the situation and the developer, but, whatever the length, you'll end up having hundreds, if not thousands, of tests to run, which means the faster they run, the faster you can get back to writing code. When using TDD, for example, you run tests very often, so speed is essential.

Tests should use up the least possible amount of resources. The reason for this is that every developer who checks out your code should be able to run your tests, no matter how powerful their box is. It could be a skinny virtual machine or a neglected Jenkins box; your tests should run without chewing up too many resources.
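The guideline about unnecessary assumptions is easier to appreciate with code in front of you. In this sketch, unique_numbers is a hypothetical function whose contract promises which items come back, but says nothing about their order.

```python
def unique_numbers(values):
    """Hypothetical code under test: return the distinct values.

    The order of the result is not part of the contract.
    """
    return list(set(values))


result = unique_numbers([3, 1, 2, 3, 1])

# Over-specified: this would also assert an ordering that the
# function never promised, so it is left commented out.
# assert result == [1, 2, 3]

# Just enough: check which items are present, not where they are.
assert sorted(result) == [1, 2, 3]
assert len(result) == 3
```

The over-specified version might well pass today, which is exactly what makes it dangerous: it would start failing the day the implementation changes, even though the behavior that matters is still correct.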

A Jenkins box is a machine that runs Jenkins, software that is capable of, among many other things, running your tests automatically. Jenkins is frequently used in companies where developers use practices such as continuous integration and extreme programming.

Unit testing

Now that you have an idea about what testing is and why we need it, let's introduce the developer's best friend: the unit test.

Before we proceed with the examples, allow me to share some words of caution: I'll try to give you the fundamentals about unit testing, but I don't follow any particular school of thought or methodology to the letter. Over the years, I have tried many different testing approaches, eventually coming up with my own way of doing things, which is constantly evolving. To put it as Bruce Lee would have: "Absorb what is useful, discard what is useless and add what is specifically your own."

Writing a unit test

Unit tests take their name after the fact that they are used to test small units of code. To explain how to write a unit test, let's take a look at a simple snippet:

# data.py
def get_clean_data(source):
    data = load_data(source)
    cleaned_data = clean_data(data)
    return cleaned_data

The get_clean_data function is responsible for getting data from source, cleaning it, and returning it to the caller. How do we test this function?

One way of doing this is to call it and then make sure that load_data was called once with source as its only argument. Then we have to verify that clean_data was called once, with the return value of load_data. And, finally, we would need to make sure that the return value of clean_data is what is returned by the get_clean_data function as well.

To do this, we need to set up the source and run this code, and this may be a problem. One of the golden rules of unit testing is that anything that crosses the boundaries of your application needs to be simulated. We don't want to talk to a real data source, and we don't want to actually run real functions if they are communicating with anything that is not contained in our application. A few examples would be a database, a search service, an external API, and a file in the filesystem.

We need these restrictions to act as a shield, so that we can always run our tests safely without the fear of destroying something in a real data source.

Another reason is that it may be quite difficult for a single developer to reproduce the whole architecture on their box. It may require the setting up of databases, APIs, services, files and folders, and so on and so forth, and this can be difficult, time-consuming, or sometimes not even possible.

Very simply put, an application programming interface (API) is a set of tools for building software applications. An API expresses a software component in terms of its operations, input and output, and underlying types. For example, if you create a software that needs to interface with a data provider service, it's very likely that you will have to go through their API in order to gain access to the data.

Therefore, in our unit tests, we need to simulate all those things in some way. Unit tests need to be run by any developer without the need for the whole system to be set up on their box.

A different approach, which I always favor when it's possible to do so, is to simulate entities without using fake objects, but using special-purpose test objects instead. For example, if your code talks to a database, instead of faking all the functions and methods that talk to the database and programming the fake objects so that they return what the real ones would, I'd much rather spawn a test database, set up the tables and data I need, and then patch the connection settings so that my tests are running real code, against the test database, thereby doing no harm at all. In-memory databases are excellent options for these cases.
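Here is a minimal sketch of that test-database idea, using an in-memory SQLite database from the standard library. The count_adults function, the person table, and the sample rows are all assumptions made up for the example; the point is only that the code under test runs real SQL against a throwaway database.

```python
import sqlite3


def count_adults(conn):
    """Hypothetical code under test: count people aged 18 or over."""
    row = conn.execute(
        'SELECT COUNT(*) FROM person WHERE age >= 18').fetchone()
    return row[0]


def make_test_db():
    """Spawn a throwaway in-memory database holding known rows."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE person (name TEXT, age INTEGER)')
    conn.executemany(
        'INSERT INTO person VALUES (?, ?)',
        [('Anakin', 32), ('Obi-Wan', 40), ('Youngling', 9)],
    )
    conn.commit()
    return conn


def test_count_adults():
    conn = make_test_db()
    try:
        assert count_adults(conn) == 2
    finally:
        conn.close()  # the database vanishes with the connection


test_count_adults()
```

Nothing is faked here: if the SQL inside count_adults were wrong, the test would notice, which is something a mock of the connection could not tell us.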

One of the applications that allow you to spawn a database for testing is Django. Within the django.test package, you can find several tools that help you write your tests so that you won't have to simulate the dialog with a database. By writing tests this way, you will also be able to check on transactions, encodings, and all other database-related aspects of programming. Another advantage of this approach is the ability to check against things that can change from one database to another.

Sometimes, though, it's still not possible, and we need to use fakes, so let's talk about them.

Mock objects and patching

First of all, in Python, these fake objects are called mocks. Before Python 3.3, the mock library was a third-party library that basically every project would install via pip but, from Python 3.3 onward, it has been included in the standard library as unittest.mock, and rightfully so, given its importance and how widespread it is.

The act of replacing a real object or function (or in general, any piece of data structure) with a mock is called patching. The mock library provides the patch tool, which can act as a function or class decorator, and even as a context manager that you can use to mock things out. Once you have replaced everything you don't need to run with suitable mocks, you can pass to the second phase of the test and run the code you are exercising. After the execution, you will be able to check those mocks to verify that your code has worked correctly.
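To tie patching back to get_clean_data, here is one possible sketch of the test described earlier. The bodies of load_data and clean_data are placeholders invented for the example. Because everything lives in a single file, the sketch uses patch.dict on the file's own globals; in a real project with a data.py module, you would more typically write something like patch('data.load_data').

```python
from unittest.mock import Mock, patch, sentinel


def load_data(source):
    raise RuntimeError('placeholder: would talk to a real data source')


def clean_data(data):
    raise RuntimeError('placeholder: real cleaning is not needed here')


def get_clean_data(source):
    data = load_data(source)
    cleaned_data = clean_data(data)
    return cleaned_data


def test_get_clean_data():
    load_mock = Mock(return_value=sentinel.raw)
    clean_mock = Mock(return_value=sentinel.cleaned)

    # Swap the two names in this file's globals for the duration of
    # the with block; the originals are restored on exit.
    with patch.dict(globals(), load_data=load_mock, clean_data=clean_mock):
        result = get_clean_data(sentinel.source)

    # load_data was called once, with source as its only argument.
    load_mock.assert_called_once_with(sentinel.source)
    # clean_data was called once, with the return value of load_data.
    clean_mock.assert_called_once_with(sentinel.raw)
    # What clean_data returned is what the caller gets back.
    assert result is sentinel.cleaned


test_get_clean_data()
```

The sentinel objects are just unique, named placeholders, handy when a test cares about which object is passed along rather than about its content.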

Assertions

The verification phase is done through the use of assertions. An assertion is a function (or method) that you can use to verify equality between objects, as well as other conditions. When a condition is not met, the assertion will raise an exception that will make your test fail. You can find a list of assertions in the unittest module documentation; however, when using pytest, you will typically use the generic assert statement, which makes things even simpler.
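As a quick illustration of the assert statement, the following sketch shows what happens when a condition holds and when it doesn't. The square function is just a stand-in for code under test.

```python
def square(n):
    return n * n


# A passing assertion does nothing at all.
assert square(3) == 9

# A failing assertion raises AssertionError. The optional expression
# after the comma becomes the message attached to the exception.
try:
    assert square(3) == 10, 'square(3) should be 9'
except AssertionError as err:
    message = str(err)

print(message)
```

In a real test you would not catch the AssertionError yourself; you let it propagate, and the test runner reports the failure.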

Testing a CSV generator

Let's now adopt a practical approach. I will show you how to test a piece of code, and we will touch on the rest of the important concepts around unit testing, within the context of this example.

We want to write an export function that does the following: it takes a list of dictionaries, each of which represents a user. It creates a CSV file, puts a header in it, and then proceeds to add all the users who are deemed valid according to some rules. The export function also takes a filename, which will be the name for the CSV in output. And, finally, it takes an indication on whether to allow an existing file with the same name to be overwritten.

As for the users, they must abide by the following: each user has at least an email, a name, and an age. There can be a fourth field representing the role, but it's optional. The user's email address needs to be valid, the name needs to be non-empty, and the age must be an integer between 18 and 65.

This is our task, so now I'm going to show you the code, and then we're going to analyze the tests I wrote for it. But, first things first, in the following code snippets, I'll be using two third-party libraries: marshmallow and pytest. They both are in the requirements of the book's source code, so make sure you have installed them with pip.

marshmallow is a wonderful library that provides us with the ability to serialize and deserialize objects and, most importantly, gives us the ability to define a schema that we can use to validate a user dictionary. pytest is one of the best pieces of software I have ever seen. It is used everywhere now, and has replaced other tools such as nose, for example. It provides us with great tools to write beautiful short tests.

But let's get to the code. I called it api.py just because it exposes a function that we can use to do things. I'll show it to you in chunks:

# api.py
import os
import csv
from copy import deepcopy

from marshmallow import Schema, fields, pre_load
from marshmallow.validate import Length, Range


class UserSchema(Schema):
    """Represent a *valid* user."""

    email = fields.Email(required=True)
    name = fields.String(required=True, validate=Length(min=1))
    age = fields.Integer(
        required=True, validate=Range(min=18, max=65)
    )
    role = fields.String()

    @pre_load(pass_many=False)
    def strip_name(self, data, **kwargs):
        # **kwargs keeps this hook usable with marshmallow 3, which
        # passes extra keyword arguments (such as many) to hooks.
        data_copy = deepcopy(data)
        try:
            data_copy['name'] = data_copy['name'].strip()
        except (AttributeError, KeyError, TypeError):
            pass
        return data_copy


schema = UserSchema()

This first part is where we import all the modules we need (os and csv), and some tools from marshmallow, and then we define the schema for the users. As you can see, we inherit from marshmallow.Schema, and then we set four fields. Notice we are using two String fields, one Email field, and one Integer field. These will already provide us with some validation from marshmallow. Notice there is no required=True in the role field.

We need to add a couple of custom bits, though. On age, we pass validate=Range(min=18, max=65) to make sure the value is within the range we want. marshmallow reports a validation error in case it's not, and it will kindly take care of reporting an error should we pass anything that is not a valid integer.

Next, we take care of name, because the fact that a name key in the dictionary is there doesn't guarantee that the name is actually non-empty. So, in strip_name, which is decorated with pre_load and therefore runs before validation, we take its value and strip all leading and trailing whitespace characters; if the result is empty, the Length(min=1) validator rejects it. Notice we don't need to add a custom validator for the email field. This is because marshmallow will validate it, and a valid email cannot be empty.

We then instantiate schema, so that we can use it to validate data. So let's write the export function:

# api.py
def export(filename, users, overwrite=True):
    """Export a CSV file.

    Create a CSV file and fill with valid users. If `overwrite`
    is False and file already exists, raise IOError.
    """
    if not overwrite and os.path.isfile(filename):
        raise IOError(f"'{filename}' already exists.")

    valid_users = get_valid_users(users)
    write_csv(filename, valid_users)

As you see, its internals are quite straightforward. If overwrite is False and the file already exists, we raise IOError with a message saying the file already exists. Otherwise, if we can proceed, we simply get the list of valid users and feed it to write_csv, which is responsible for actually doing the job. Let's see how all these functions are defined:

# api.py
def get_valid_users(users):
    """Yield one valid user at a time from users."""
    yield from filter(is_valid, users)


def is_valid(user):
    """Return whether or not the user is valid."""
    return not schema.validate(user)

Turns out I coded get_valid_users as a generator, as there is no need to make a potentially big list in order to put it in a file. We can validate and save them one by one. The heart of validation is simply a delegation to schema.validate, which uses marshmallow's validation engine. The way this works is by returning a dictionary, which is empty if validation succeeded, or else it will contain error information. We don't really care about collecting the error information for this task, so we simply ignore it, and within is_valid we basically return True if the return value from schema.validate is empty, and False otherwise.
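If you don't have marshmallow at hand, the same shape can be explored with a stand-in. In this sketch, validate is a hypothetical replacement for schema.validate that only checks the age; what matters is the pattern of "empty dictionary means valid" and the lazy behavior of the generator.

```python
def validate(user):
    """Hypothetical stand-in for schema.validate: return error info."""
    errors = {}
    if not isinstance(user.get('age'), int):
        errors['age'] = ['Missing or not an integer.']
    return errors


def is_valid(user):
    # An empty dictionary is falsy, so "no errors" turns into True.
    return not validate(user)


def get_valid_users(users):
    yield from filter(is_valid, users)


users = [{'name': 'a', 'age': 20}, {'name': 'b'}, {'name': 'c', 'age': 30}]
valid = get_valid_users(users)  # nothing has been validated yet

first = next(valid)   # validates just enough users to yield one
rest = list(valid)    # consumes whatever is left
print(first, rest)
```

Since users are produced one at a time, a consumer such as a CSV writer can save each one as it arrives, without the whole list of valid users ever existing in memory.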

One last piece is missing; here it is:

# api.py
def write_csv(filename, users):
    """Write a CSV given a filename and a list of users.

    The users are assumed to be valid for the given CSV structure.
    """
    fieldnames = ['email', 'name', 'age', 'role']

    # 'x' is exclusive creation: it raises FileExistsError if
    # the file already exists.
    with open(filename, 'x', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        for user in users:
            writer.writerow(user)

Again, the logic is straightforward. We define the header in fieldnames, then we open filename for writing in exclusive-creation mode ('x'), and we specify newline='', which is recommended in the documentation when dealing with CSV files. Be aware that mode 'x' raises FileExistsError if the file is already there, so, as written, passing overwrite=True to export does not by itself replace an existing file. When the file has been created, we get a writer object by using the csv.DictWriter class. The beauty of this tool is that it is capable of mapping the user dictionaries to the field names, so we don't need to take care of the ordering.

We write the header first, and then we loop over the users and add them one by one. Notice, this function assumes it is fed a list of valid users, and it may break if that assumption is false (with the default values, it would break if any user dictionary had extra fields).
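Both DictWriter behaviors mentioned above (mapping keys to columns regardless of order, and breaking on extra fields) can be checked quickly with an in-memory buffer instead of a real file. The sample rows are made up for the sketch.

```python
import csv
import io

fieldnames = ['email', 'name', 'age', 'role']

buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()

# Keys can come in any order, and a missing 'role' becomes an
# empty cell in the output.
writer.writerow({'age': 18, 'name': 'Primus Minimus',
                 'email': 'minimal@example.com'})
output = buffer.getvalue()

# With the default extrasaction='raise', a key that is not listed
# in fieldnames makes writerow raise ValueError.
try:
    writer.writerow({'email': 'x@example.com', 'name': 'X', 'age': 30,
                     'nickname': 'oops'})
except ValueError:
    extra_field_rejected = True
else:
    extra_field_rejected = False

print(output)
print(extra_field_rejected)
```

If silently dropping unknown keys were preferable, DictWriter also accepts extrasaction='ignore'; the default is the stricter choice, which suits a function that assumes it is fed valid users.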

That's the whole code you have to keep in mind. I suggest you spend a moment to go through it again. There is no need to memorize it, and the fact that I have used small helper functions with meaningful names will enable you to follow the testing along more easily.

Let's now get to the interesting part: testing our export function. Once again, I'll show you the code in chunks:

# tests/test_api.py
import os
from unittest.mock import patch, mock_open, call

import pytest

from ..api import is_valid, export, write_csv

Let's start from the imports: we need os, a few tools from unittest.mock (patch, mock_open, and call), then pytest, and, finally, we use a relative import to fetch the three functions that we want to actually test: is_valid, export, and write_csv.

Before we can write tests, though, we need to make a few fixtures. As you will see, a fixture is a function that is decorated with the pytest.fixture decorator. In most cases, we expect a fixture to return something, so that we can use it in a test. We have some requirements for a user dictionary, so let's write a couple of users: one with minimal requirements, and one with full requirements. Both need to be valid. Here is the code:

# tests/test_api.py
@pytest.fixture
def min_user():
    """Represent a valid user with minimal data."""
    return {
        'email': 'minimal@example.com',
        'name': 'Primus Minimus',
        'age': 18,
    }


@pytest.fixture
def full_user():
    """Represent valid user with full data."""
    return {
        'email': 'full@example.com',
        'name': 'Maximus Plenus',
        'age': 65,
        'role': 'emperor',
    }

In this example, the only difference is the presence of the role key, but it's enough to show you the point, I hope. Notice that instead of simply declaring dictionaries at a module level, we actually have written two functions that return a dictionary, and we have decorated them with the pytest.fixture decorator. This is because when you declare a dictionary at module-level, which is supposed to be used in your tests, you need to make sure you copy it at the beginning of every test. If you don't, you may have a test that modifies it, and this will affect all tests that follow it, compromising their integrity.
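The danger of a shared module-level dictionary is easy to reproduce without pytest. In this sketch, the functions play the role of tests (all names are invented for the illustration), and make_user stands in for a fixture by building a brand-new dictionary on every call.

```python
SHARED_USER = {'email': 'minimal@example.com', 'age': 18}


def make_user():
    """Return a brand-new dictionary on every call, like a fixture."""
    return {'email': 'minimal@example.com', 'age': 18}


def first_check_shared():
    SHARED_USER['age'] = 10    # mutates the module-level dictionary
    return SHARED_USER['age']


def second_check_shared():
    return SHARED_USER['age']  # sees the change made by the first check


def first_check_fresh():
    user = make_user()
    user['age'] = 10           # only this check's own copy changes
    return user['age']


def second_check_fresh():
    return make_user()['age']


first_check_shared()
leaked_age = second_check_shared()

first_check_fresh()
fresh_age = second_check_fresh()

print(leaked_age, fresh_age)
```

With the shared dictionary, the second check observes an age of 10 that it never set; with a fresh dictionary per check, it sees the original 18, no matter what ran before it.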

By using these fixtures, pytest will give us a new dictionary every test run, so we don't need to go through that pain ourselves. Notice that if a fixture returns another type, instead of dict, then that is what you will get in the test. Fixtures also are composable, which means they can be used in one another, which is a very powerful feature of pytest. To show you this, let's write a fixture for a list of users, in which we put the two we already have, plus one that would fail validation because it has no age. Let's take a look at the following code:

# tests/test_api.py
@pytest.fixture
def users(min_user, full_user):
    """List of users, two valid and one invalid."""
    bad_user = {
        'email': 'invalid@example.com',
        'name': 'Horribilis',
    }
    return [min_user, bad_user, full_user]

Nice. So, now we have two users that we can use individually, but also we have a list of three users. The first round of tests will be testing how we are validating a user. We will group all the tests for this task within a class. This not only helps give related tests a namespace, a place to be, but, as we'll see later on, it allows us to declare class-level fixtures, which are defined just for the tests belonging to the class. Take a look at this code:

# tests/test_api.py
class TestIsValid:
    """Test how code verifies whether a user is valid or not."""

    def test_minimal(self, min_user):
        assert is_valid(min_user)

    def test_full(self, full_user):
        assert is_valid(full_user)

We start very simply by making sure our fixtures are actually passing validation. This is very important, as those fixtures will be used everywhere, so we want them to be perfect. Next, we test the age. Two things to notice here. First, I will not repeat the class signature, so the code that follows is indented by four spaces, because these are all methods within the same class. Second, we're going to use parametrization quite heavily.

Parametrization is a technique that enables us to run the same test multiple times, but feeding different data to it. It is very useful, as it allows us to write the test only once with no repetition, and the result will be very intelligently handled by pytest, which will run all those tests as if they were actually separate, thus providing us with clear error messages when they fail. If you parametrize manually, you lose this feature, and believe me you won't be happy. Let's see how we test the age:

# tests/test_api.py
    @pytest.mark.parametrize('age', range(18))
    def test_invalid_age_too_young(self, age, min_user):
        min_user['age'] = age
        assert not is_valid(min_user)

Right, so we start by writing a test to check that validation fails when the user is too young. According to our rule, a user is too young when they are younger than 18. We check for every age between 0 and 17, by using range.

If you take a look at how the parametrization works, you'll see we declare the name of an object, which we then pass to the signature of the method, and then we specify which values this object will take. For each value, the test will be run once. In the case of this first test, the object's name is age, and the values are all those returned by range(18), which means all integer numbers from 0 to 17 are included. Notice how we feed age to the test method, right after self, and then we do something else, which is also very interesting. We pass this method a fixture: min_user. This has the effect of activating that fixture for the test run, so that we can use it, and can refer to it from within the test. In this case, we simply change the age within the min_user dictionary, and then we verify that the result of is_valid(min_user) is False.

We do this last bit by asserting on the fact that not False is True. In pytest, this is how you check for something. You simply assert that something is truthy. If that is the case, the test has succeeded. Should it instead be the opposite, the test would fail.
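To get a feeling for what the decorator does on our behalf, here is a rough, hand-rolled sketch of the same idea in plain Python. It is not how pytest is implemented: is_valid_age is a hypothetical stand-in for is_valid that only knows the age rule, and make_min_user and run_parametrized are names invented for this illustration.

```python
def is_valid_age(user):
    """Hypothetical stand-in for is_valid: only checks 18 <= age <= 65."""
    age = user.get('age')
    return isinstance(age, int) and not isinstance(age, bool) and 18 <= age <= 65


def make_min_user():
    """Plays the role of the min_user fixture: a fresh dict on each call."""
    return {'email': 'minimal@example.com', 'name': 'Primus Minimus', 'age': 18}


def run_parametrized(test_func, values):
    """Run test_func once per value and record one outcome per case."""
    outcomes = {}
    for value in values:
        try:
            test_func(value, make_min_user())  # fresh "fixture" for every case
        except AssertionError:
            outcomes[value] = 'failed'
        else:
            outcomes[value] = 'passed'
    return outcomes


def check_too_young(age, min_user):
    min_user['age'] = age
    assert not is_valid_age(min_user)


outcomes = run_parametrized(check_too_young, range(18))
mixed = run_parametrized(check_too_young, range(16, 20))
```

Every value gets its own outcome, so in mixed the cases for 18 and 19 fail without hiding the fact that 16 and 17 pass. With pytest you get this per-case reporting for free, along with readable test names.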

Let's proceed and add all the tests needed to make validation fail on the age:

# tests/test_api.py
    @pytest.mark.parametrize('age', range(66, 100))
    def test_invalid_age_too_old(self, age, min_user):
        min_user['age'] = age
        assert not is_valid(min_user)

    @pytest.mark.parametrize('age', ['NaN', 3.1415, None])
    def test_invalid_age_wrong_type(self, age, min_user):
        min_user['age'] = age
        assert not is_valid(min_user)

So, another two tests. One takes care of the other end of the spectrum, from 66 years of age to 99. And the second one instead makes sure that age is invalid when it's not an integer number, so we pass some values, such as a string, a float, and None, just to make sure. Notice how the structure of the test is basically always the same, but, thanks to the parametrization, we feed very different input arguments to it.

Now that we have the age-failing all sorted out, let's add a test that actually checks the age is within the valid range:

# tests/test_api.py
    @pytest.mark.parametrize('age', range(18, 66))
    def test_valid_age(self, age, min_user):
        min_user['age'] = age
        assert is_valid(min_user)

It's as easy as that. We pass the correct range, from 18 to 65, and remove the not in the assertion. Notice how all tests start with the test_ prefix, and have a different name.

We can consider the age as being taken care of. Let's move on to write tests on mandatory fields:

# tests/test_api.py
    @pytest.mark.parametrize('field', ['email', 'name', 'age'])
    def test_mandatory_fields(self, field, min_user):
        min_user.pop(field)
        assert not is_valid(min_user)

    @pytest.mark.parametrize('field', ['email', 'name', 'age'])
    def test_mandatory_fields_empty(self, field, min_user):
        min_user[field] = ''
        assert not is_valid(min_user)

    def test_name_whitespace_only(self, min_user):
        min_user['name'] = '\n\t'
        assert not is_valid(min_user)

The previous three tests still belong to the same class. The first one tests whether a user is invalid when one of the mandatory fields is missing. Notice that at every test run, the min_user fixture is restored, so we only have one missing field per test run, which is the appropriate way to check for mandatory fields. We simply pop the key out of the dictionary. This time the parametrization object takes the name field, and, by looking at the first test, you see all the mandatory fields in the parametrization decorator: email, name, and age.

In the second one, things are a little different. Instead of popping keys out, we simply set them (one at a time) to the empty string. Finally, in the third one, we check for the name to be made of whitespace only.

The previous tests take care of mandatory fields being there and being non-empty, and of the formatting around the name key of a user. Good. Let's now write the last two tests for this class. We want to check email validity, and type for email, name, and the role:

# tests/test_api.py
    @pytest.mark.parametrize(
        'email, outcome',
        [
            ('missing_at.com', False),
            ('@missing_start.com', False),
            ('missing_end@', False),
            ('missing_dot@example', False),
            ('good.one@example.com', True),
            ('δοκιμή@παράδειγμα.δοκιμή', True),
            ('аджай@экзампл.рус', True),
        ]
    )
    def test_email(self, email, outcome, min_user):
        min_user['email'] = email
        assert is_valid(min_user) == outcome

This time, the parametrization is slightly more complex. We define two objects (email and outcome), and then we pass a list of tuples, instead of a simple list, to the decorator. What happens is that each time the test is run, one of those tuples will be unpacked so as to fill the values of email and outcome, respectively. This allows us to write one test for both valid and invalid email addresses, instead of two separate ones. We define an email address, and we specify the outcome we expect from validation. The first four are invalid email addresses, but the last three are actually valid. I have used a couple of examples with Unicode, just to make sure we're not forgetting to include our friends from all over the world in the validation.

Notice how the validation is done: we assert that the result of the call matches the outcome we have set.
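The unpacking of each tuple into email and outcome can be mimicked with an ordinary loop. The sketch below is only an illustration: EMAIL_RE and looks_like_email are hypothetical and deliberately simplistic, and they are not the validation rule used by the schema in this chapter.

```python
import re

# Hypothetical, deliberately simple pattern: something@something.something
EMAIL_RE = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]+$')


def looks_like_email(value):
    """Return True if value is a string that matches the simple pattern."""
    return isinstance(value, str) and EMAIL_RE.match(value) is not None


CASES = [
    ('missing_at.com', False),
    ('@missing_start.com', False),
    ('missing_end@', False),
    ('missing_dot@example', False),
    ('good.one@example.com', True),
    ('δοκιμή@παράδειγμα.δοκιμή', True),
]

# Each tuple is unpacked into (email, outcome), just like in the decorator.
results = [
    (email, looks_like_email(email) == outcome)
    for email, outcome in CASES
]
```

Comparing the result of the check with the expected outcome is what lets a single test body cover both the valid and the invalid addresses.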

Let's now write a simple test to make sure validation fails when we feed the wrong type to the fields (again, the age has been taken care of separately before):

# tests/test_api.py
    @pytest.mark.parametrize(
        'field, value',
        [
            ('email', None),
            ('email', 3.1415),
            ('email', {}),
            ('name', None),
            ('name', 3.1415),
            ('name', {}),
            ('role', None),
            ('role', 3.1415),
            ('role', {}),
        ]
    )
    def test_invalid_types(self, field, value, min_user):
        min_user[field] = value
        assert not is_valid(min_user)

As we did before, just for fun, we pass three different values, none of which is actually a string. This test could be expanded to include more values, but, honestly, we shouldn't need to write tests such as this one. I have included it here just to show you what's possible.

Before we move to the next test class, let me talk about something we have seen when we were checking the age.

Boundaries and granularity

While checking for the age, we have written three tests to cover the three ranges: 0-17 (fail), 18-65 (success), 66-99 (fail). Why did we do this? The answer lies in the fact that we are dealing with two boundaries: 18 and 65. So our testing needs to focus on the three regions those two boundaries define: before 18, between 18 and 65, and after 65. How you do it is not crucial, as long as you make sure you test the boundaries correctly. This means if someone changes the validation in the schema from 18 <= value <= 65 to 18 <= value < 65 (notice the missing =), there must be a test that fails on the 65.

This concept is known as boundary, and it's very important that you recognize boundaries in your code so that you can test against them.

Another important thing to understand is which zoom level we want to use to get close to the boundaries. In other words, which unit should I use to move around them? In the case of age, we're dealing with integers, so a unit of 1 will be the perfect choice (which is why we used 16, 17, 18, 19, 20, ...). But what if you were testing for a timestamp? Well, in that case, the correct granularity will likely be different. If the code has to act differently according to your timestamp and that timestamp represents seconds, then the granularity of your tests should zoom down to seconds. If the timestamp represents years, then years should be the unit you use. I hope you get the picture. This concept is known as granularity, and needs to be combined with that of boundaries, so that by going around the boundaries with the correct granularity, you can make sure your tests are not leaving anything to chance.
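As a small illustration of both ideas, the following sketch probes each boundary one unit at a time. The two rule functions are hypothetical stand-ins for the schema check (buggy_rule contains the off-by-one change described above), and boundary_cases is a helper invented for this example.

```python
MIN_AGE, MAX_AGE = 18, 65  # the two boundaries discussed in the text


def correct_rule(age):
    return MIN_AGE <= age <= MAX_AGE


def buggy_rule(age):
    # Off-by-one on the upper boundary: '<' instead of '<='.
    return MIN_AGE <= age < MAX_AGE


def boundary_cases(low, high, step=1):
    """Values one step either side of each boundary, with expected results."""
    return [
        (low - step, False), (low, True), (low + step, True),
        (high - step, True), (high, True), (high + step, False),
    ]


def failing_cases(rule):
    """Return the probed values for which rule gives the wrong answer."""
    return [
        value
        for value, expected in boundary_cases(MIN_AGE, MAX_AGE)
        if rule(value) != expected
    ]
```

Here failing_cases(correct_rule) is empty, while failing_cases(buggy_rule) singles out 65. The step argument is where granularity comes in: for a timestamp in seconds you would step by one second rather than by one year.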

Let's now continue with our example, and test the export function.

Testing the export function

In the same test module, I have defined another class that represents a test suite for the export function. Here it is:

# tests/test_api.py
class TestExport:

    @pytest.fixture
    def csv_file(self, tmpdir):
        yield tmpdir.join("out.csv")

    @pytest.fixture
    def existing_file(self, tmpdir):
        existing = tmpdir.join('existing.csv')
        existing.write('Please leave me alone...')
        yield existing

Let's start by understanding the fixtures. We have defined them at class level this time, which means they are available only to the tests in this class. We don't need these fixtures outside of this class, so it doesn't make sense to declare them at module level like we've done with the user ones.

So, we need two files. If you recall what I wrote at the beginning of this chapter, when it comes to interaction with databases, disks, networks, and so on, we should mock everything out. However, when possible, I prefer to use a different technique. In this case, I will employ temporary folders, which are created for the test and cleaned up afterwards, without us having to manage them by hand. I am much happier if I can avoid mocking. Mocking is amazing, but it can be tricky, and a source of bugs, unless it's done correctly.

Now, the first fixture, csv_file, obtains a reference to a temporary folder through pytest's built-in tmpdir fixture. We can consider the logic up to and including the yield as the setup phase. The fixture itself, in terms of data, is represented by the temporary file name. The file itself is not present yet. When a test runs, the fixture is created, and at the end of the test, the rest of the fixture code (the one after yield, if any) is executed. That part can be considered the teardown phase. In this case, there is no code after the yield, so there is no explicit teardown: the lifetime of the temporary folder is managed by pytest, which gives every test a fresh folder and prunes old ones on later runs. You can put much more in each phase of any fixture, and with experience, I'm sure you'll master the art of doing setup and teardown this way. It actually comes very naturally quite quickly.
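If the split between setup and teardown around a yield feels abstract, the following sketch reproduces the same shape with the standard library only. It is an analogy, not pytest's machinery: csv_path is a hypothetical context manager, and the events list exists just to record the order in which things happen.

```python
import contextlib
import os
import tempfile

events = []  # records the order in which the phases run


@contextlib.contextmanager
def csv_path():
    """Hypothetical stand-in for a yield fixture, built on the stdlib only."""
    with tempfile.TemporaryDirectory() as folder:
        events.append('setup')
        yield os.path.join(folder, 'out.csv')  # value handed to the "test"
        events.append('teardown')              # runs after the "test" body
    # leaving the with block removes the folder and all its content


with csv_path() as path:
    events.append('test')
    with open(path, 'w') as stream:
        stream.write('email,name,age,role\n')
    existed_during_test = os.path.exists(path)

exists_after_test = os.path.exists(path)
```

Everything before the yield runs first, the body of the with block plays the part of the test, and whatever follows the yield runs last. In this sketch the teardown really does delete the folder, because TemporaryDirectory is told to do so when its block exits.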

The second fixture is very similar to the first one, but we'll use it to test that we can prevent overwriting when we call export with overwrite=False. So we create a file in the temporary folder, and we put some content into it, just to have the means to verify it hasn't been touched.

Notice how both fixtures are returning the file name with the full path information, to make sure we actually use the temporary folder in our code. Let's now see the tests:

# tests/test_api.py
    def test_export(self, users, csv_file):
        export(csv_file, users)

        lines = csv_file.readlines()

        assert [
            'email,name,age,role\n',
            'minimal@example.com,Primus Minimus,18,\n',
            'full@example.com,Maximus Plenus,65,emperor\n',
        ] == lines

This test employs the users and csv_file fixtures, and immediately calls export with them. We expect that a file has been created, and populated with the two valid users we have (remember the list contains three users, but one is invalid).

To verify that, we open the temporary file, and collect all its lines into a list. We then compare the content of the file with a list of the lines that we expect to be in it. Notice we only put the header, and the two valid users, in the correct order.

Now we need another test, to make sure that if there is a comma in one of the values, our CSV is still generated correctly. Since this is a comma-separated values (CSV) file, we need to make sure that a comma in the data doesn't break things up:

# tests/test_api.py
    def test_export_quoting(self, min_user, csv_file):
        min_user['name'] = 'A name, with a comma'

        export(csv_file, [min_user])

        lines = csv_file.readlines()
        assert [
            'email,name,age,role\n',
            'minimal@example.com,"A name, with a comma",18,\n',
        ] == lines

This time, we don't need the whole users list, we just need one as we're testing a specific thing, and we have the previous test to make sure we're generating the file correctly with all the users. Remember, always try to minimize the work you do within a test.

So, we use min_user, and put a nice comma in its name. We then repeat the procedure, which is very similar to that of the previous test, and finally we make sure that the name is put in the CSV file surrounded by double quotes. This is enough for any good CSV parser to understand that they don't have to break on the comma inside the double quotes.
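You can see this quoting behavior in isolation with the standard csv module. The sketch below is not the book's export function: to_csv_lines is a hypothetical helper that writes to an in-memory buffer, and it assumes the same four column names used in the tests.

```python
import csv
import io

FIELDS = ['email', 'name', 'age', 'role']


def to_csv_lines(users):
    """Serialize user dicts to CSV text lines (a sketch, not the real export)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, restval='', lineterminator='\n'
    )
    writer.writeheader()
    writer.writerows(users)  # missing keys (role, here) become empty fields
    return buffer.getvalue().splitlines(keepends=True)


lines = to_csv_lines([
    {'email': 'minimal@example.com', 'name': 'A name, with a comma', 'age': 18},
])

# Reading the text back shows that the quoted comma does not split the field.
parsed = list(csv.DictReader(io.StringIO(''.join(lines))))
```

The writer adds the double quotes only where they are needed, and the reader gives us the name back in one piece.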

Now I want one more test, which needs to check that when the file exists and we don't want to overwrite it, our code won't touch it:

# tests/test_api.py
    def test_does_not_overwrite(self, users, existing_file):
        with pytest.raises(IOError) as err:
            export(existing_file, users, overwrite=False)

        assert err.match(
            r"'{}' already exists\.".format(existing_file)
        )

        # let's also verify the file is still intact
        assert existing_file.read() == 'Please leave me alone...'

This is a beautiful test, because it allows me to show you how you can tell pytest that you expect a function call to raise an exception. We do it in the context manager given to us by pytest.raises, to which we feed the exception we expect from the call we make inside the body of that context manager. If the exception is not raised, the test will fail.

I like to be thorough in my tests, so I don't want to stop there. I also assert on the message, by using the convenient err.match helper (watch out, it takes a regular expression, not a simple string; we'll see regular expressions in Chapter 14, Web Development).

Finally, let's make sure that the file still contains its original content (which is why I created the existing_file fixture) by opening it, and comparing all of its content to the string it should be.
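Because the message is matched as a regular expression, any special character in the file name is interpreted as part of the pattern. The sketch below, which uses only the standard re module, shows one way this can bite and how re.escape avoids it. Note that guarded_export and the path are made up for the illustration; they are not the book's export function.

```python
import re


def guarded_export(filename, overwrite=True, existing=()):
    """Hypothetical stand-in: refuse to overwrite a file that 'exists'."""
    if not overwrite and filename in existing:
        raise IOError(f"'{filename}' already exists.")
    return filename


# A made-up path that happens to contain regex metacharacters.
path = '/tmp/pytest-0/existing(1).csv'

try:
    guarded_export(path, overwrite=False, existing={path})
except IOError as err:
    message = str(err)

# Interpolating the raw path turns "(1)" into a regex group.
naive = r"'{}' already exists\.".format(path)
# Escaping the path first makes every character match literally.
escaped = r"'{}' already exists\.".format(re.escape(path))

naive_matches = re.search(naive, message) is not None
escaped_matches = re.search(escaped, message) is not None
```

With this particular path the naive pattern does not match the message, while the escaped one does. Temporary paths usually contain nothing worse than dots, so the test in the chapter works as written, but escaping is a cheap safeguard when a path ends up inside a pattern.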

Final considerations

Before we move on to the next topic, let me just wrap up with some considerations.

First, I hope you have noticed that I haven't tested all the functions I wrote. Specifically, I didn't test get_valid_users, validate, and write_csv. The reason is that these functions are implicitly tested by our test suite. We have tested is_valid and export, which is more than enough to make sure our schema is validating users correctly, and the export function is dealing with filtering out invalid users correctly, respecting existing files when needed, and writing a proper CSV. The functions we haven't tested are the internals: they provide logic that participates in doing something that we have thoroughly tested anyway. Would adding extra tests for those functions be good or bad? Think about it for a moment.

The answer is actually difficult. The more you test, the less you can refactor that code. As it is now, I could easily decide to call is_valid with another name, and I wouldn't have to change any of my tests. If you think about it, it makes sense, because as long as is_valid provides correct validation to the get_valid_users function, I don't really need to know about it. Does this make sense to you?

If instead I had tests for the validate function, then I would have to change them, if I decided to call it differently (or to somehow change its signature).

So, what is the right thing to do? Tests or no tests? It will be up to you. You have to find the right balance. My personal take on this matter is that everything needs to be thoroughly tested, either directly or indirectly. And I want the smallest possible test suite that guarantees me that. This way, I will have a great test suite in terms of coverage, but not any bigger than necessary. You need to maintain those tests!

I hope this example made sense to you. I think it has allowed me to touch on the important topics.

If you check out the source code for the book, in the test_api.py module, I have added a couple of extra test classes, which will show you how different testing would have been had I decided to go all the way with the mocks. Make sure you read that code and understand it well. It is quite straightforward and will offer you a good comparison with my personal approach, which I have shown you here.

Now, how about we run those tests? (The output is re-arranged to fit this book's format):

$ pytest tests
====================== test session starts ======================
platform darwin -- Python 3.7.0b2, pytest-3.5.0, py-1.5.3, ...
rootdir: /Users/fab/srv/lpp/ch8, inifile:
collected 132 items

tests/test_api.py ...............................................
.................................................................
.................... [100%]

================== 132 passed in 0.41 seconds ===================

Make sure you run $ pytest tests from within the ch8 folder (add the -vv flag for a verbose output that will show you how parametrization modifies the names of your tests). As you can see, 132 tests were run in less than half a second, and they all succeeded. I strongly suggest you check out this code and play with it. Change something in the code and see whether any test is breaking. Understand why it is breaking. Is it something important that means the test isn't good enough? Or is it something silly that shouldn't cause the test to break? All these apparently innocuous questions will help you gain deep insight into the art of testing.

I also suggest you study the unittest module, and pytest too. These are tools you will use all the time, so you need to be very familiar with them.

Let's now check out test-driven development!

Test-driven development

Let's talk briefly about test-driven development (TDD). It is a methodology that was rediscovered by Kent Beck, who wrote Test-Driven Development by Example, Addison Wesley, 2002, which I encourage you to check out if you want to learn about the fundamentals of this subject.

TDD is a software development methodology that is based on the continuous repetition of a very short development cycle.

First, the developer writes a test, and makes it run. The test is supposed to check a feature that is not yet part of the code. Maybe it is a new feature to be added, or something to be removed or amended. Running the test will make it fail and, because of this, this phase is called Red.

When the test has failed, the developer writes the minimal amount of code to make it pass. When running the test succeeds, we have the so-called Green phase. In this phase, it is okay to write code that cheats, just to make the test pass. This technique is called fake it 'til you make it. Later, tests are enriched with different edge cases, and the cheating code then has to be rewritten with proper logic. Adding other test cases is called triangulation.

The last piece of the cycle is where the developer takes care of both the code and the tests (at separate times) and refactors them until they are in the desired state. This last phase is called Refactor.

The TDD mantra therefore is Red-Green-Refactor.
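Here is a toy walk through the cycle, compressed into a few lines. It is only a sketch: the add function and the check helper are invented for this illustration, and in real life each step would be a separate run of your test suite.

```python
def check(func, cases):
    """Return the cases for which func does not produce the expected value."""
    return [
        (args, expected)
        for args, expected in cases
        if func(*args) != expected
    ]


# Red: the first test case exists before any implementation does.
first_test = [((2, 2), 4)]


# Green, by cheating ("fake it"): the least code that makes the test pass.
def add_fake(a, b):
    return 4


# Triangulation: a second case that the fake cannot satisfy.
more_tests = first_test + [((3, 4), 7)]


# Refactor: replace the fake with proper logic; all the cases pass again.
def add(a, b):
    return a + b
```

The fake passes first_test, fails on the new case in more_tests, and the proper implementation passes both, which is the Red-Green-Refactor rhythm in miniature.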

At first, it feels really weird to write tests before the code, and I must confess it took me a while to get used to it. If you stick to it, though, and force yourself to learn this slightly counter-intuitive way of working, at some point something almost magical happens, and you will see the quality of your code increase in a way that wouldn't be possible otherwise.

When you write your code before the tests, you have to take care of what the code has to do and how it has to do it, both at the same time. On the other hand, when you write tests before the code, you can concentrate on the what part alone, while you write them. When you write the code afterward, you will mostly have to take care of how the code has to implement what is required by the tests. This shift in focus allows your mind to concentrate on the what and how parts in separate moments, yielding a brain power boost that will surprise you.

There are several other benefits that come from the adoption of this technique:

You will refactor with much more confidence: Tests will break if you introduce bugs. Moreover, the architectural refactor will also benefit from having tests that act as guardians.

The code will be more readable: This is crucial in our time, when coding is a social activity and every professional developer spends much more time reading code than writing it.

The code will be more loosely coupled and easier to test and maintain: Writing the tests first forces you to think more deeply about code structure.

Writing tests first requires you to have a better understanding of the business requirements: If your understanding of the requirements is lacking information, you'll find writing a test extremely challenging and this situation acts as a sentinel for you.

Having everything unit tested means the code will be easier to debug: Moreover, small tests are perfect for providing alternative documentation. English can be misleading, but five lines of Python in a simple test are very hard to misunderstand.

Higher speed: It's faster to write tests and code than it is to write the code first and then lose time debugging it. If you don't write tests, you will probably deliver the code sooner, but then you will have to track the bugs down and solve them (and, rest assured, there will be bugs). The combined time taken to write the code and then debug it is usually longer than the time taken to develop the code with TDD, where having tests running before the code is written ensures that the number of bugs in it will be much lower than in the other case.

On the other hand, the main shortcomings of this technique are the following ones:

The whole company needs to believe in it: Otherwise, you will have to constantly argue with your boss, who will not understand why it takes you so long to deliver. The truth is, it may take you a bit longer to deliver in the short-term, but in the long-term, you gain a lot with TDD. However, it is quite hard to see the long-term because it's not under our noses like the short-term is. I have fought battles with stubborn bosses in my career, to be able to code using TDD. Sometimes it has been painful, but always well worth it, and I have never regretted it because, in the end, the quality of the result has always been appreciated.

If you fail to understand the business requirements, this will reflect in the tests you write, and therefore it will reflect in the code too: This kind of problem is quite hard to spot until you do UAT, but one thing that you can do to reduce the likelihood of it happening is to pair with another developer. Pairing will inevitably require discussions about the business requirements, and discussion will bring clarification, which will help writing correct tests.

Badly written tests are hard to maintain: This is a fact. Tests with too many mocks or with extra assumptions or badly-structured data will soon become a burden. Don't let this discourage you; just keep experimenting and change the way you write them until you find a way that doesn't require a huge amount of work every time you touch your code.

I'm quite passionate about TDD. When I interview for a job, I always ask whether the company adopts it. I encourage you to check it out and use it. Use it until you feel something clicking in your mind. You won't regret it, I promise.

Exceptions

Even though I haven't formally introduced them to you, by now I expect you to at least have a vague idea of what an exception is. In the previous chapters, we've seen that when an iterator is exhausted, calling next on it raises a StopIteration exception. We met IndexError when we tried accessing a list at a position that was outside the valid range. We also met AttributeError when we tried accessing an attribute on an object that didn't have it, and KeyError when we did the same with a key and a dictionary.

Now the time has come for us to talk about exceptions.

Sometimes, even though an operation or a piece of code is correct, there are conditions in which something may go wrong. For example, if we're converting user input from string to int, the user could accidentally type a letter in place of a digit, making it impossible for us to convert that value into a number. When dividing numbers, we may not know in advance whether we're attempting a division by zero. When opening a file, it could be missing or corrupted.

When an error is detected during execution, it is called an exception. Exceptions are not necessarily lethal; in fact, we've seen that StopIteration is deeply integrated in the Python generator and iterator mechanisms. Normally, though, if you don't take the necessary precautions, an exception will cause your application to break. Sometimes, this is the desired behavior, but in other cases, we want to prevent and control problems such as these. For example, we may alert the user that the file they're trying to open is corrupted or that it is missing so that they can either fix it or provide another file, without the need for the application to die because of this issue. Let's see an example of a few exceptions:

# exceptions/first.example.py
>>> gen = (n for n in range(2))
>>> next(gen)
0
>>> next(gen)
1
>>> next(gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> print(undefined_name)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'undefined_name' is not defined
>>> mylist = [1, 2, 3]
>>> mylist[5]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> mydict = {'a': 'A', 'b': 'B'}
>>> mydict['c']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'c'
>>> 1 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero

As you can see, the Python shell is quite forgiving. We can see the Traceback, so that we have information about the error, but the program doesn't die. This is a special behavior: a regular program or a script would normally die if nothing were done to handle exceptions.

To handle an exception, Python gives you the try statement. When you enter the try clause, Python will watch out for one or more different types of exceptions (according to how you instruct it), and if they are raised, it will allow you to react. The try statement is composed of the try clause, which opens the statement, one or more except clauses (all optional) that define what to do when an exception is caught, an else clause (optional), which is executed when the try clause is exited without any exception raised, and a finally clause (optional), whose code is executed regardless of whatever happened in the other clauses. The finally clause is typically used to clean up resources (we saw this in Chapter 7, Files and Data Persistence, when we were opening files without using a context manager).

Mind the order: it's important. Also, try must be followed by at least one except clause or a finally clause. Let's see an example:

# exceptions/try.syntax.py
def try_syntax(numerator, denominator):
    try:
        print(f'In the try block: {numerator}/{denominator}')
        result = numerator / denominator
    except ZeroDivisionError as zde:
        print(zde)
    else:
        print('The result is:', result)
        return result
    finally:
        print('Exiting')

print(try_syntax(12, 4))
print(try_syntax(11, 0))

The preceding example defines a simple try_syntax function. We perform the division of two numbers. We are prepared to catch a ZeroDivisionError exception if we call the function with denominator = 0. Initially, the code enters the try block. If denominator is not 0, result is calculated and the execution, after leaving the try block, resumes in the else block. We print result and return it. Take a look at the output and you'll notice that just before returning result, which is the exit point of the function, Python executes the finally clause.

When denominator is 0, things change. We enter the except block and print zde. The else block isn't executed because an exception was raised in the try block. Before (implicitly) returning None, we still execute the finally block. Take a look at the output and see whether it makes sense to you:

$ python try.syntax.py
In the try block: 12/4      # try
The result is: 3.0          # else
Exiting                     # finally
3.0                         # return within else

In the try block: 11/0      # try
division by zero            # except
Exiting                     # finally
None                        # implicit return end of function

When you execute a try block, you may want to catch more than one exception. For example, when trying to decode a JSON object, you may run into a ValueError for malformed JSON, or a TypeError if the type of the data you're feeding to json.loads() is not a string. In this case, you may structure your code like this:

# exceptions/json.example.py
import json
json_data = '{}'

try:
    data = json.loads(json_data)
except (ValueError, TypeError) as e:
    print(type(e), e)

This code will catch both ValueError and TypeError. Try changing json_data = '{}' to json_data = 2 or json_data = '{{', and you'll see the different output.
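If you prefer to see the three cases side by side, here is a small sketch that wraps the same logic in a function. The name try_load and its return convention are my own choices for the illustration.

```python
import json


def try_load(raw):
    """Return (data, None) on success, or (None, exception name) on failure."""
    try:
        return json.loads(raw), None
    except (ValueError, TypeError) as e:
        return None, type(e).__name__


ok = try_load('{}')         # valid JSON: an empty object
bad_type = try_load(2)      # not a string: TypeError
bad_json = try_load('{{')   # malformed JSON: a ValueError subclass
```

Notice that the malformed input is reported as JSONDecodeError. That class inherits from ValueError, which is why the except clause above still catches it.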

If you want to handle multiple exceptions differently, you can just add more except clauses, like this:

# exceptions/multiple.except.py
try:
    # some code
except Exception1:
    # react to Exception1
except (Exception2, Exception3):
    # react to Exception2 or Exception3
except Exception4:
    # react to Exception4
...

Keep in mind that an exception is handled in the first block that defines that exception class or any of its bases. Therefore, when you stack multiple except clauses like we've just done, make sure that you put specific exceptions at the top and generic ones at the bottom. In OOP terms, children on top, grandparents at the bottom. Moreover, remember that only one except handler is executed when an exception is raised.
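The following sketch shows why the order matters, using two built-in exceptions that are related by inheritance: KeyError is a subclass of LookupError. The two function names are made up for the example.

```python
def specific_first(container, key):
    try:
        return container[key]
    except KeyError:          # child class first
        return 'handled by KeyError'
    except LookupError:       # more generic base class afterwards
        return 'handled by LookupError'


def generic_first(container, key):
    try:
        return container[key]
    except LookupError:       # also matches KeyError and IndexError
        return 'handled by LookupError'
    except KeyError:          # never reached: the clause above wins
        return 'handled by KeyError'


missing_key = specific_first({}, 'c')
shadowed = generic_first({}, 'c')
missing_index = specific_first([], 5)   # IndexError is a LookupError too
```

In generic_first, the KeyError clause is dead code, because the more generic clause above it always intercepts the exception first.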

You can also write custom exceptions. To do that, you just have to inherit from any other exception class. Python's built-in exceptions are too many to be listed here, so I have to point you to the official documentation. One important thing to know is that every Python exception derives from BaseException, but your custom exceptions should never inherit directly from it. The reason is that handling such an exception will also trap system-exiting exceptions, such as SystemExit and KeyboardInterrupt, which derive from BaseException, and this could lead to severe issues. In the case of disaster, you want to be able to Ctrl + C your way out of an application.

You can easily solve the problem by inheriting from Exception, which inherits from BaseException but doesn't include any system-exiting exception in its children because they are siblings in the built-in exceptions hierarchy (see https://docs.python.org/3/library/exceptions.html#exception-hierarchy).
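A quick way to convince yourself is to check which exceptions a plain except Exception clause would actually catch. In this sketch, ValidationError is a hypothetical custom exception, and the helper function exists only for the demonstration.

```python
class ValidationError(Exception):
    """Hypothetical custom exception; note that the base class is Exception."""


def caught_by_except_exception(exc):
    """Return True if a plain 'except Exception' clause would catch exc."""
    try:
        raise exc
    except Exception:
        return True
    except BaseException:
        # Only system-exiting exceptions and the like end up here.
        return False


custom = caught_by_except_exception(ValidationError('bad user'))
interrupt = caught_by_except_exception(KeyboardInterrupt())
sys_exit = caught_by_except_exception(SystemExit())
```

The custom exception is caught, while KeyboardInterrupt and SystemExit slip past except Exception, which is exactly what lets Ctrl + C keep working.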

Programming with exceptions can be very tricky. You could inadvertently silence errors, or trap exceptions that aren't meant to be handled. Play it safe by keeping in mind a few guidelines: always put in the try clause only the code that may cause the exception(s) that you want to handle. When you write except clauses, be as specific as you can, don't just resort to except Exception because it's easy. Use tests to make sure your code handles edge cases in a way that requires the least possible amount of exception handling. Writing an except statement without specifying any exception would catch any exception, therefore exposing your code to the same risks you incur when you derive your custom exceptions from BaseException.

You will find information about exceptions almost everywhere on the web. Some coders use them abundantly, others sparingly. Find your own way of dealing with them by taking examples from other people's source code. There are plenty of interesting open source projects on websites such as GitHub (https://github.com) and Bitbucket (https://bitbucket.org/).

Before we talk about profiling, let me show you an unconventional use of exceptions, just to give you something to help you expand your views on them. They are not just simply errors:

# exceptions/for.loop.py
n = 100
found = False
for a in range(n):
    if found: break
    for b in range(n):
        if found: break
        for c in range(n):
            if 42 * a + 17 * b + c == 5096:
                found = True
                print(a, b, c)  # 79 99 95

The preceding code is quite a common idiom if you deal with numbers. You have to iterate over a few nested ranges and look for a particular combination of a, b, and c that satisfies a condition. In the example, condition is a trivial linear equation, but imagine something much cooler than that. What bugs me is having to check whether the solution has been found at the beginning of each loop, in order to break out of them as fast as we can when it is. The breakout logic interferes with the rest of the code and I don't like it, so I came up with a different solution for this. Take a look at it, and see whether you can adapt it to other cases too:

# exceptions/for.loop.py
class ExitLoopException(Exception):
    pass

try:
    n = 100
    for a in range(n):
        for b in range(n):
            for c in range(n):
                if 42 * a + 17 * b + c == 5096:
                    raise ExitLoopException(a, b, c)
except ExitLoopException as ele:
    print(ele)  # (79, 99, 95)

Can you see how much more elegant it is? Now the breakout logic is entirely handled with a simple exception whose name even hints at its purpose. As soon as the result is found, we raise it, and immediately the control is given to the except clause that handles it. This is food for thought. This example indirectly shows you how to raise your own exceptions. Read up on the official documentation to dive into the beautiful details of this subject.

Moreover, if you are up for a challenge, you might want to try to make this last example into a context manager for nested for loops. Good luck!
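If you want to compare notes after trying it yourself, here is one possible shape for such a context manager. Treat it as a sketch under my own assumptions rather than the answer: ExitLoop, nested_break, and the holder dictionary are all names invented for this example.

```python
import contextlib


class ExitLoop(Exception):
    """Raised to leave every nested loop at once."""


@contextlib.contextmanager
def nested_break():
    """Swallow ExitLoop and expose its arguments on a small result holder."""
    holder = {'found': None}
    try:
        yield holder
    except ExitLoop as signal:
        # The exception carries the values that were passed to raise.
        holder['found'] = signal.args


n = 100
with nested_break() as outcome:
    for a in range(n):
        for b in range(n):
            for c in range(n):
                if 42 * a + 17 * b + c == 5096:
                    raise ExitLoop(a, b, c)

solution = outcome['found']
```

The loops themselves stay free of any breakout bookkeeping, and the result is available on outcome once the with block has finished. Any other exception raised inside the block still propagates normally, since only ExitLoop is handled.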

Profiling Python

There are a few different ways to profile a Python application. Profiling means having the application run while keeping track of several different parameters, such as the number of times a function is called and the amount of time spent inside it. Profiling can help us find the bottlenecks in our application, so that we can improve only what is really slowing us down.

If you take a look at the profiling section in the standard library official documentation, you will see that there are a couple of different implementations of the same profiling interface: profile and cProfile.

cProfile is recommended for most users; it's a C extension with reasonable overhead that makes it suitable for profiling long-running programs.

profile is a pure Python module whose interface is imitated by cProfile, but which adds significant overhead to profiled programs.

This interface does deterministic profiling, which means that all function calls, function returns, and exception events are monitored, and precise timings are made for the intervals between these events. Another approach, called statistical profiling, randomly samples the effective instruction pointer, and deduces where time is being spent.

The latter usually involves less overhead, but provides only approximate results. Moreover, because of the way the Python interpreter runs the code, deterministic profiling doesn't add as much overhead as one would think, so I'll show you a simple example using cProfile from the command line.

We're going to calculate Pythagorean triples (I know, you've missed them...) using the following code:

# profiling/triples.py
def calc_triples(mx):
    triples = []
    for a in range(1, mx + 1):
        for b in range(a, mx + 1):
            hypotenuse = calc_hypotenuse(a, b)
            if is_int(hypotenuse):
                triples.append((a, b, int(hypotenuse)))
    return triples

def calc_hypotenuse(a, b):
    return (a**2 + b**2) ** .5

def is_int(n):  # n is expected to be a float
    return n.is_integer()

triples = calc_triples(1000)

The script is extremely simple; we iterate over the interval [1, mx] with a and b (avoiding repetition of pairs by setting b >= a) and we check whether they belong to a right triangle. We use calc_hypotenuse to get hypotenuse for a and b, and then, with is_int, we check whether it is an integer, which means (a, b, c) is a Pythagorean triple. When we profile this script, we get information in a tabular form. The columns are ncalls, tottime, percall, cumtime, percall, and filename:lineno(function). They represent the number of calls we made to a function, how much time we spent in it, and so on. I'll trim a couple of columns to save space, so if you run the profiling yourself, don't worry if you get a different result. Here is the command and its output:

$ python -m cProfile triples.py
1502538 function calls in 0.704 seconds

Ordered by: standard name

ncalls  tottime  percall  filename:lineno(function)
500500    0.393    0.000  triples.py:17(calc_hypotenuse)
500500    0.096    0.000  triples.py:21(is_int)
     1    0.000    0.000  triples.py:4(<module>)
     1    0.176    0.176  triples.py:4(calc_triples)
     1    0.000    0.000  {built-in method builtins.exec}
  1034    0.000    0.000  {method 'append' of 'list' objects}
     1    0.000    0.000  {method 'disable' of '_lsprof.Profil...
500500    0.038    0.000  {method 'is_integer' of 'float' objects}

Even with this limited amount of data, we can still infer some useful information about this code. First, we can see that the time complexity of the algorithm we have chosen grows with the square of the input size. The number of times we get inside the inner loop body is exactly mx(mx+1)/2. We run the script with mx=1000, which means we get 500500 times inside the inner for loop. Three main things happen inside that loop: we call calc_hypotenuse, we call is_int, and, if the condition is met, we append the triple to the triples list.

Taking a look at the profiling report, we notice that the algorithm has spent 0.393 seconds inside calc_hypotenuse, which is way more than the 0.096 seconds spent inside is_int, given that they were called the same number of times, so let's see whether we can boost calc_hypotenuse a little.

As it turns out, we can. As I mentioned earlier in this book, the ** power operator is quite expensive, and in calc_hypotenuse, we're using it three times. Fortunately, we can easily transform two of those into simple multiplications, like this:

def calc_hypotenuse(a, b):
    return (a*a + b*b) ** .5

This simple change should improve things. If we run the profiling again, we see that 0.393 is now down to 0.137. Not bad! This means now we're spending only about 35% of the time inside calc_hypotenuse that we were before.
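If you want to compare the two versions in isolation, without the rest of the script, the standard library timeit module is one option. The following is only a sketch: the function names hyp_pow and hyp_mul are made up for this comparison, and the timings will vary from machine to machine, so I'm not quoting any numbers.

```python
import timeit


def hyp_pow(a, b):
    # Original version: three uses of the ** operator
    return (a**2 + b**2) ** .5


def hyp_mul(a, b):
    # Revised version: two multiplications and a single **
    return (a*a + b*b) ** .5


if __name__ == '__main__':
    for fn in (hyp_pow, hyp_mul):
        # Time 200,000 calls of each version with the same arguments
        elapsed = timeit.timeit(lambda: fn(3, 4), number=200_000)
        print(f'{fn.__name__}: {elapsed:.3f}s')
```

Both functions return the same value for the same input; only the time taken should differ.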

Let's see whether we can improve is_int as well, by changing it, like this:

def is_int(n):
    return n == int(n)

This implementation is different, and the advantage is that it also works when n is an integer. Alas, when we run the profiling against it, we see that the time taken inside the is_int function has gone up to 0.135 seconds, so, in this case, we need to revert to the previous implementation. You will find the three versions in the source code for the book.

This example was trivial, of course, but enough to show you how one could profile an application. Having the number of calls that are performed against a function helps us better understand the time complexity of our algorithms. For example, you wouldn't believe how many coders fail to see that those two for loops run proportionally to the square of the input size.

One thing to mention: depending on what system you're using, results may be different. Therefore, it's quite important to be able to profile software on a system that is as close as possible to the one the software is deployed on, if not actually on that one.
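The command line is not the only way to run cProfile. As a sketch, the same measurement can be taken from within Python by using cProfile.Profile together with pstats.Stats. I'm assuming a smaller input (mx=100) here to keep the run short, and the hyp_calls name is just for illustration.

```python
import cProfile
import pstats


def calc_hypotenuse(a, b):
    return (a*a + b*b) ** .5


def calc_triples(mx):
    triples = []
    for a in range(1, mx + 1):
        for b in range(a, mx + 1):
            hypotenuse = calc_hypotenuse(a, b)
            if hypotenuse.is_integer():
                triples.append((a, b, int(hypotenuse)))
    return triples


profiler = cProfile.Profile()
profiler.enable()
triples = calc_triples(100)
profiler.disable()

# Show the five entries with the highest internal time
stats = pstats.Stats(profiler)
stats.sort_stats('tottime').print_stats(5)

# stats.stats maps (filename, lineno, function name) to timing data;
# the second item of each value is the number of calls
hyp_calls = sum(
    nc
    for (_, _, name), (cc, nc, tt, ct, callers) in stats.stats.items()
    if name == 'calc_hypotenuse'
)
print(hyp_calls)
```

With mx=100, the number of calls to calc_hypotenuse should match mx(mx+1)/2, which is 5050.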

When to profile?

Profiling is super cool, but we need to know when it is appropriate to do it, and in what measure we need to address the results we get from it.

Donald Knuth once said, "premature optimization is the root of all evil," and, although I wouldn't have put it down so drastically, I do agree with him. After all, who am I to disagree with the man who gave us The Art of Computer Programming, TeX, and some of the coolest algorithms I have ever studied when I was a university student?

So, first and foremost: correctness. You want your code to deliver the correct results, therefore write tests, find edge cases, and stress your code in every way you think makes sense. Don't be protective, don't put things in the back of your brain for later because you think they're not likely to happen. Be thorough.

Second, take care of coding best practices. Remember the following: readability, extensibility, loose coupling, modularity, and design. Apply OOP principles: encapsulation, abstraction, single responsibility, open/closed, and so on. Read up on these concepts. They will open horizons for you, and they will expand the way you think about code.

Third, refactor like a beast! The Boy Scouts rule says:

"Always leave the campground cleaner than you found it."

Apply this rule to your code.

And, finally, when all of this has been taken care of, then and only then, take care of optimizing and profiling.

Run your profiler and identify bottlenecks. When you have an idea of the bottlenecks you need to address, start with the worst one first. Sometimes, fixing a bottleneck causes a ripple effect that will expand and change the way the rest of the code works. Sometimes this is only a little, sometimes a bit more, according to how your code was designed and implemented. Therefore, start with the biggest issue first.

One of the reasons Python is so popular is that it is possible to implement it in many different ways. So, if you find yourself having trouble boosting up some part of your code using sheer Python, nothing prevents you from rolling up your sleeves, buying 200 liters of coffee, and rewriting the slow piece of code in C. Guaranteed to be fun!

Summary

In this chapter, we explored the world of testing, exceptions, and profiling.

I tried to give you a fairly comprehensive overview of testing, especially unit testing, which is the kind of testing that a developer mostly does. I hope I have succeeded in channeling the message that testing is not something that is perfectly defined that you can learn from a book. You need to experiment with it a lot before you get comfortable. Of all the efforts a coder must make in terms of study and experimentation, I'd say testing is the one that is the most important.

We briefly saw how we can prevent our program from dying because of errors, called exceptions, that happen at runtime. And, to steer away from the usual ground, I have given you an example of a somewhat unconventional use of exceptions to break out of nested for loops. That's not the only case, and I'm sure you'll discover others as you grow as a coder.

At the end, we very briefly touched on profiling, with a simple example and a few guidelines. I wanted to talk about profiling for the sake of completeness, so at least you can play around with it.

In the next chapter, we're going to explore the wonderful world of secrets, hashing, and creating tokens.

I am aware that I gave you a lot of pointers in this chapter, with no links or directions. I'm afraid this was by choice. As a coder, there won't be a single day at work when you won't have to look something up in a documentation page, in a manual, on a website, and so on. I think it's vital for a coder to be able to search effectively for the information they need, so I hope you'll forgive me for this extra training. After all, it's all for your benefit.

Cryptography and Tokens

"Three may keep a Secret, if two of them are dead."

– Benjamin Franklin, Poor Richard's Almanack

In this short chapter, I am going to give you a brief overview of the cryptographic services offered by the Python standard library. I am also going to touch upon something called JSON Web Token, which is a very interesting standard to represent claims securely between two parties.

In particular, we are going to explore the following:

Hashlib
Secrets
HMAC
JSON Web Tokens with PyJWT, which seems to be the most popular Python library for dealing with JWTs

Let's start by spending a moment talking about cryptography and why it is so important.

The need for cryptography

According to the statistics you can find all over the web, the estimated number of smartphone users in 2019 will be around 2.5 billion. Each of those people knows the PIN to unlock their phone, the credentials to log in to applications we all use to do, well, basically everything, from buying food to finding a street, from sending a message to a friend, to seeing if our bitcoin wallet has increased in value since we last checked 10 seconds ago.

If you are an application developer, you have to take security very, very seriously. It doesn't matter how small or apparently insignificant your application is: security should always be a concern for you.

Security in information technology is achieved by employing several different means, but by far, the most important one is cryptography. Everything you do with your computer or phone should include a layer where cryptography takes place (and if not, that's really bad). It is used to pay online with a credit card, to transfer messages over the network in a way that even if someone intercepts them, they won't be able to read them, and it is used to encrypt your files when you back them up in the cloud (because you do, right?). Lists of examples are endless.

Now, the purpose of this chapter is not that of teaching you the difference between hashing and encryption, as I could write a whole other book on the subject. Rather, it is that of showing you how you can use the tools that Python offers you to create digests, tokens, and in general, to be on the safe(r) side when you need to implement something cryptography-related.

Useful guidelines

Always remember the following rules:

Rule number one: Do not attempt to create your own hash or encryption functions. Simply don't. Use tools and functions that are there already. It is incredibly tough to come up with a good, solid, robust algorithm to do hashing or encryption, so it's best to leave it to professional cryptographers.
Rule number two: Follow rule number one.

Those are the only two rules you need. Apart from them, it is very useful to understand cryptography, so you need to try and learn as much as you can about this subject. There is plenty of information on the web, but for your convenience, I'll put some useful references at the end of this chapter.

Now, let's dig into the first of the standard library modules I want to show you: hashlib.

Hashlib

This module exposes a common interface to many different secure hash and message digest algorithms. The difference in those two terms is simply historical: older algorithms were called digests, while the modern algorithms are called hashes.

In general, a hash function is any function that can be used to map data of an arbitrary size to data of a fixed size. It is a one-way function, in that it is not expected to be possible to recover the message given its hash.

There are several algorithms that can be used to calculate a hash, so let's see how to find out which ones are supported by your system (note, your results might be different than mine):

>>> import hashlib
>>> hashlib.algorithms_available
{'SHA512', 'SHA256', 'shake_256', 'sha3_256', 'ecdsa-with-SHA1',
 'DSA-SHA', 'sha1', 'sha384', 'sha3_224', 'whirlpool', 'mdc2',
 'RIPEMD160', 'shake_128', 'MD4', 'dsaEncryption', 'dsaWithSHA',
 'SHA1', 'blake2s', 'md5', 'sha', 'sha224', 'SHA', 'MD5',
 'sha256', 'SHA384', 'sha3_384', 'md4', 'SHA224', 'MDC2',
 'sha3_512', 'sha512', 'blake2b', 'DSA', 'ripemd160'}
>>> hashlib.algorithms_guaranteed
{'blake2s', 'md5', 'sha224', 'sha3_512', 'shake_256', 'sha3_256',
 'shake_128', 'sha256', 'sha1', 'sha512', 'blake2b', 'sha3_384',
 'sha384', 'sha3_224'}

By opening a Python shell, we can get the list of available algorithms for our system. If our application has to talk to third-party applications, it's always best to pick an algorithm out of those guaranteed, though, as that means every platform actually supports them. Notice that a lot of them start with sha, which means secure hash algorithm. Let's keep going in the same shell: we are going to create a hash for the binary string b'Hash me now!', and we're going to do it in two ways:

>>> h = hashlib.blake2b()
>>> h.update(b'Hash me')
>>> h.update(b' now!')
>>> h.hexdigest()
'56441b566db9aafcf8cdad3a4729fa4b2bfaab0ada36155ece29f52ff70e1e9d'
'7f54cacfe44bc97c7e904cf79944357d023877929430bc58eb2dae168e73cedf'
>>> h.digest()
b'VD\x1bVm\xb9\xaa\xfc\xf8\xcd\xad:G)\xfaK+\xfa\xab\n\xda6\x15^'
b'\xce)\xf5/\xf7\x0e\x1e\x9d\x7fT\xca\xcf\xe4K\xc9|~\x90L\xf7'
b'\x99D5}\x028w\x92\x940\xbcX\xeb-\xae\x16\x8es\xce\xdf'
>>> h.block_size
128
>>> h.digest_size
64
>>> h.name
'blake2b'

We have used the blake2b cryptographic function, which is quite sophisticated and was added in Python 3.6. After creating the hash object h, we update its message in two steps. Not that we need to, but sometimes we need to hash data that is not available all at once, so it's good to know we can do it in steps.

When the message is like we want it to be, we get the hex representation of the digest. This will use two characters per byte (as each character represents 4 bits, which is half a byte). We also get the byte representation of the digest, and then we inspect its details: it has a block size (the internal block size of the hash algorithm in bytes) of 128 bytes, a digest size (the size of the resulting hash in bytes) of 64 bytes, and a name. Could all this be done in one simpler line? Yes, of course:

>>> hashlib.blake2b(b'Hash me now!').hexdigest()
'56441b566db9aafcf8cdad3a4729fa4b2bfaab0ada36155ece29f52ff70e1e9d'
'7f54cacfe44bc97c7e904cf79944357d023877929430bc58eb2dae168e73cedf'

Notice how the same message produces the same hash, which of course is expected.
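Updating a hash in steps is most useful when the data arrives in chunks, for example when reading a large file. Here is a minimal sketch of that idea, assuming an in-memory stream in place of a real file; the hash_stream helper and the chunk size are my own choices for illustration.

```python
import hashlib
import io


def hash_stream(stream, chunk_size=4096):
    """Hash a binary stream piece by piece, without loading it all at once."""
    h = hashlib.blake2b()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object means the stream is exhausted
            break
        h.update(chunk)
    return h.hexdigest()


data = b'Hash me now!' * 1000
streamed = hash_stream(io.BytesIO(data), chunk_size=64)
one_shot = hashlib.blake2b(data).hexdigest()
print(streamed == one_shot)
```

The digest obtained chunk by chunk is the same as the one computed over the whole message in a single call.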

Let's see what we get if, instead of the blake2b function, we use sha256:

>>> hashlib.sha256(b'Hash me now!').hexdigest()
'10d561fa94a89a25ea0c7aa47708bdb353bbb062a17820292cd905a3a60d6783'

The resulting hash is shorter (32 bytes instead of 64), which leaves a smaller margin against brute-force and collision attacks.

Hashing is a very interesting topic, and of course the simple examples we've seen so far are just the start. The blake2b function allows us a great deal of flexibility in terms of customization. This is extremely useful to prevent some kinds of attacks (for the full explanation of those threats, please do refer to the standard documentation for the hashlib module at https://docs.python.org/3.7/library/hashlib.html). Let's see another example where we customize a hash by adding a key, a salt, and a personalization string (person). All of this extra information will cause the hash to be different than the one we would get if we didn't provide them, and is crucial in adding extra security to the data handled in our system:

>>> h = hashlib.blake2b(
...     b'Important payload', digest_size=16, key=b'secret-key',
...     salt=b'random-salt', person=b'fabrizio'
... )
>>> h.hexdigest()
'c2d63ead796d0d6d734a5c3c578b6e41'

The resulting hash is only 16 bytes long. Among the customization parameters, salt is probably the most famous one. It is random data that is used as an additional input to a one-way function that hashes data. It is commonly stored alongside the resulting hash, in order to provide the means to recover the same hash given the same message.
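To make the role of the salt a little more concrete, here is a short sketch that hashes the same message with two different random salts. The variable names are mine, and I'm using os.urandom to produce the salts; blake2b accepts a salt of up to 16 bytes.

```python
import hashlib
import os

message = b'Important payload'
salt_a = os.urandom(16)  # 16 bytes is the maximum salt size for blake2b
salt_b = os.urandom(16)

digest_a = hashlib.blake2b(message, digest_size=16, salt=salt_a).hexdigest()
digest_b = hashlib.blake2b(message, digest_size=16, salt=salt_b).hexdigest()

# Reusing the stored salt gives us back the very same digest
digest_a_again = hashlib.blake2b(
    message, digest_size=16, salt=salt_a
).hexdigest()

print(digest_a == digest_b)        # different salts, different digests
print(digest_a == digest_a_again)  # same salt, same digest
```

This is why the salt needs to be stored next to the hash: without it, there is no way to reproduce the digest for comparison later on.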

If you want to make sure you hash a password properly, you can use pbkdf2_hmac, a key derivation algorithm that allows you to specify a salt and also the number of iterations used by the algorithm itself. As computers get more and more powerful, it is important to increase the number of iterations we do over time, otherwise the likelihood of a successful brute-force attack on our data increases as time passes. Here's how you would use such an algorithm:

>>> import os
>>> dk = hashlib.pbkdf2_hmac(
...     'sha256', b'Password123', os.urandom(16), 100000
... )
>>> dk.hex()
'f8715c37906df067466ce84973e6e52a955be025a59c9100d9183c4cbec27a9e'

Notice I have used os.urandom to provide a 16-byte random salt, as recommended by the documentation.
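In practice, you need to keep the salt in order to check a password later on. What follows is just a sketch of how that might look; hash_password, verify_password, and the ITERATIONS constant are hypothetical names I chose for this example, not part of hashlib.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # same figure as in the shell session above


def hash_password(password, salt=None):
    """Return (salt, derived_key); generate a fresh salt if none is given."""
    if salt is None:
        salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac(
        'sha256', password.encode('utf-8'), salt, ITERATIONS
    )
    return salt, dk


def verify_password(password, salt, expected):
    """Derive the key again with the stored salt and compare safely."""
    _, dk = hash_password(password, salt)
    return hmac.compare_digest(dk, expected)


salt, stored = hash_password('Password123')
print(verify_password('Password123', salt, stored))
print(verify_password('password123', salt, stored))
```

The comparison is done with hmac.compare_digest rather than ==, for reasons that will become clear when we talk about digest comparison.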

I encourage you to explore and experiment with this module, as sooner or later you will have to use it. Now, let's move on to the secrets one.

Secrets

This nice, small module is used for generating cryptographically strong, random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets. It was added in Python 3.6, and basically deals with three things: random numbers, tokens, and digest comparison. Let's explore them very quickly.

Random numbers

We can use three functions in order to deal with random numbers:

# secrs/secr_rand.py
import secrets

print(secrets.choice('Choose one of these words'.split()))
print(secrets.randbelow(10 ** 6))
print(secrets.randbits(32))

The first one, choice, picks an element at random from a non-empty sequence. The second one, randbelow, generates a random integer in the range [0, n), where n is the argument you call it with, and the third one, randbits, generates an integer with n random bits in it. Running that code produces the following output (which is always different):

$ python secr_rand.py
one
504156
3172492450

You should use these functions instead of those from the random module whenever you need randomness in the context of cryptography, as these are specially designed for this task. Let's see what the module gives us for tokens.

Token generation

Again, we have three functions that all produce a token, albeit in different formats. Let's see the example:

# secrs/secr_rand.py
print(secrets.token_bytes(16))
print(secrets.token_hex(32))
print(secrets.token_urlsafe(32))

The first one, token_bytes, simply returns a random byte string containing n bytes (16, in this example). The other two do the same, but token_hex returns a token in hexadecimal format, and token_urlsafe returns a token that only contains characters suitable for being included in a URL. Let's see the output (which is a continuation from the previous run):

b'\xda\x863\xeb\xbb|\x8fk\x9b\xbd\x14Q\xd4\x8d\x15}'
9f90fd042229570bf633e91e92505523811b45e1c3a72074e19bbeb2e5111bf7
bl4qz_Av7QNvPEqZtKsLuTOUsNLFmXW3O03pn50leiY
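Note that the argument to these functions is always a number of random bytes, not the length of the resulting text. A quick sketch to check the lengths (the variable names are mine):

```python
import secrets

raw = secrets.token_bytes(16)     # 16 random bytes
hexed = secrets.token_hex(32)     # 32 bytes, two hex characters per byte
url = secrets.token_urlsafe(32)   # 32 bytes, Base64-encoded, no padding

print(len(raw), len(hexed), len(url))
```

So 32 bytes become 64 hexadecimal characters, or 43 URL-safe characters, which matches the lengths of the tokens in the output above.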

This is all nice, so why don't we have some fun and write a random password generator using these tools?

# secrs/secr_gen.py
import secrets
from string import digits, ascii_letters

def generate_pwd(length=8):
    chars = digits + ascii_letters
    return ''.join(secrets.choice(chars) for c in range(length))

def generate_secure_pwd(length=16, upper=3, digits=3):
    if length < upper + digits + 1:
        raise ValueError('Nice try!')
    while True:
        pwd = generate_pwd(length)
        if (any(c.islower() for c in pwd)
                and sum(c.isupper() for c in pwd) >= upper
                and sum(c.isdigit() for c in pwd) >= digits):
            return pwd

print(generate_secure_pwd())
print(generate_secure_pwd(length=3, upper=1, digits=1))

In the previous code, we defined two functions. generate_pwd simply generates a random string of given length by joining together length characters picked at random from a string that contains all the letters of the alphabet (lowercase and uppercase), and the 10 decimal digits.

Then, we define another function, generate_secure_pwd, that simply keeps calling generate_pwd until the random string we get matches the requirements, which are quite simple. The password must have at least one lowercase character, at least upper uppercase characters, at least digits digits, and exactly length characters.

Before we dive into the while loop, it's worth noting that if we sum together the requirements (uppercase, lowercase, and digits) and that sum is greater than the overall length of the password, there is no way we can ever satisfy the condition within the loop. So, in order to avoid getting stuck in an infinite loop, I have put a check clause in the first line of the body, and I raise a ValueError in case I need it. Could you think of how to write a test for this edge case?

The body of the while loop is straightforward: first we generate the random password, and then we verify the conditions by using any and sum. any returns True if any of the items in the iterable it's called with evaluate to True. The use of sum is actually slightly more tricky here, in that it exploits polymorphism. Can you see what I'm talking about before you read on?

Well, it's very simple: in Python, bool is a subclass of int, and True and False behave like the integers 1 and 0. Therefore, when summing on an iterable of True/False values, they will automatically be interpreted like integers by the sum function. That is called polymorphism, and we've briefly talked about it in Chapter 6, OOP, Decorators, and Iterators.
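If you'd like to see this with your own eyes, here is a tiny sketch that counts uppercase characters and digits in a sample password. I'm using the first password from the run shown in this section as sample data.

```python
pwd = 'nsL5voJnCi7Ote3F'

# bool is a subclass of int, so True and False can be added up
print(issubclass(bool, int))
print(True + True)

upper_count = sum(c.isupper() for c in pwd)  # counts L, J, C, O, F
digit_count = sum(c.isdigit() for c in pwd)  # counts 5, 7, 3
print(upper_count, digit_count)
```

Each call to isupper or isdigit returns a Boolean, and sum simply adds them up as if they were ones and zeros.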

Running the example produces the following result:

$ python secr_gen.py
nsL5voJnCi7Ote3F
J5e

The second password is probably not too secure...

One last example, before we move on to the next module. Let's generate a reset password URL:

# secrs/secr_reset.py
import secrets

def get_reset_pwd_url(token_length=16):
    token = secrets.token_urlsafe(token_length)
    return f'https://fabdomain.com/reset-pwd/{token}'

print(get_reset_pwd_url())

This function is so easy I will only show you the output:

$ python secr_reset.py
https://fabdomain.com/reset-pwd/m4jb7aKgzTGuyjs9lTIspw

Digest comparison

This is probably quite surprising, but within secrets, you can find the compare_digest(a, b) function, which is the equivalent of comparing two digests by simply doing a == b. So, why do we need that function? It's because it has been designed to prevent timing attacks. These kinds of attacks can infer information about where the two digests start being different, according to the time it takes for the comparison to fail. So, compare_digest prevents this attack by removing the correlation between time and failures. I think this is a brilliant example of how sophisticated attacking methods can be. If you raised your eyebrows in astonishment, maybe now it's clearer why I said to never implement cryptography functions by yourself.
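As a small sketch of how compare_digest might be used, imagine we have to check a token submitted by a user against the one we expect. The is_valid helper and the expected variable are hypothetical, made up for this example.

```python
import secrets

expected = secrets.token_hex(16)  # the token we handed out earlier


def is_valid(candidate):
    # Same result as candidate == expected, but the time taken does
    # not reveal where the two strings start to differ
    return secrets.compare_digest(candidate, expected)


print(is_valid(expected))
print(is_valid(expected + 'x'))
```

Both arguments must be of the same type: either two ASCII strings, or two bytes-like objects.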

And that's it! Now, let's check out hmac.

HMAC

This module implements the HMAC algorithm, as described by RFC 2104 (https://tools.ietf.org/html/rfc2104.html). Since it is very small, but nonetheless important, I will provide you with a simple example:

# hmc.py
import hmac
import hashlib

def calc_digest(key, message):
    key = bytes(key, 'utf-8')
    message = bytes(message, 'utf-8')
    dig = hmac.new(key, message, hashlib.sha256)
    return dig.hexdigest()

digest = calc_digest('secret-key', 'Important Message')

As you can see, the interface is always the same or similar. We first convert the key and the message into bytes, and then create a digest instance that we will use to get a hexadecimal representation of the hash. Not much else to say, but I thought to add this module anyway, for completeness.
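The typical use of an HMAC is to let the receiver of a message check that it hasn't been altered, and that it comes from someone who knows the key. Here is a minimal sketch of that round trip; the sign and verify names are my own, and I'm comparing the digests with hmac.compare_digest for the reasons explained in the previous section.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Return the hex HMAC-SHA256 of message under key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, signature: str) -> bool:
    """Recompute the HMAC and compare it with the received one."""
    return hmac.compare_digest(sign(key, message), signature)


key = b'secret-key'
signature = sign(key, b'Important Message')

print(verify(key, b'Important Message', signature))   # untouched message
print(verify(key, b'Important message', signature))   # altered message
print(verify(b'wrong-key', b'Important Message', signature))
```

Changing a single character in the message, or using a different key, is enough for the verification to fail.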

Now, let's move on to a different type of token: JWTs.

JSON Web Tokens

A JSON Web Token, or JWT, is a JSON-based open standard for creating tokens that assert some number of claims. You can learn all about this technology on the website (https://jwt.io/). In a nutshell, this type of token is composed of three sections, separated by a dot, in the format A.B.C. B is the payload, which is where we put the data and the claims. C is the signature, which is used to verify the validity of the token, and A is the header, which names the algorithm used to compute the signature. A, B, and C are all encoded with a URL safe Base64 encoding (which I'll refer to as Base64URL).

Base64 is a very popular binary-to-text encoding scheme that represents binary data in an ASCII string format by translating it into a radix-64 representation. The radix-64 representation uses the letters A-Z, a-z, and the digits 0-9, plus the two symbols + and / for a grand total of 64 symbols altogether. Therefore, not surprisingly, the Base64 alphabet is made up of these 64 symbols. Base64 is used, for example, to encode images attached in an email. It happens seamlessly, so the vast majority of people are completely oblivious of this fact.

The reason why a JWT is encoded using Base64URL is because of the characters + and /, which in a URL context mean space, and path separator, respectively. Therefore in the URL safe version, they are replaced with - and _. Moreover, any padding character (=), which is normally used in Base64, is stripped out, as this too has a specific meaning within a URL.
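The difference between the two alphabets is easy to see with the base64 module from the standard library. In this sketch, I picked two bytes that happen to encode to + and / in plain Base64; the b64url_decode helper is my own, and it simply puts back the padding that was stripped out.

```python
import base64

raw = b'\xfb\xff'

standard = base64.b64encode(raw)          # b'+/8='
url_safe = base64.urlsafe_b64encode(raw)  # b'-_8='
stripped = url_safe.rstrip(b'=')          # b'-_8', as used in a JWT

print(standard, url_safe, stripped)


def b64url_decode(data: bytes) -> bytes:
    # Restore the padding so that the length is a multiple of 4
    padding = b'=' * (-len(data) % 4)
    return base64.urlsafe_b64decode(data + padding)


print(b64url_decode(stripped) == raw)
```

Notice that the only changes are the two replaced symbols and the missing padding; everything else is identical to regular Base64.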

The way this type of token works is therefore slightly different than what we are used to when we work with hashes. In fact, the information that the token carries is always visible. You just need to decode A and B to get the algorithm and the payload. However, the security lies in part C, which, for the HMAC-based algorithms, is an HMAC hash computed over parts A and B of the token. If you try to modify the B part by editing the payload, encoding it back to Base64, and replacing it in the token, the signature won't match any more, and therefore the token will be invalid.

This means that we can build a payload with claims such as logged in as admin, or something along those lines, and as long as the token is valid, we know we can trust that that user is actually logged in as an admin.
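To see how the three parts fit together, here is a sketch that builds and verifies an HS256 token using only the standard library. This is for illustration only: sign_hs256, verify_hs256, and the two Base64URL helpers are names I made up, the header layout is an assumption on my part, and in real code you should rely on a proper library (remember rule number one).

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(raw: bytes) -> bytes:
    return base64.urlsafe_b64encode(raw).rstrip(b'=')


def b64url_decode(data: bytes) -> bytes:
    return base64.urlsafe_b64decode(data + b'=' * (-len(data) % 4))


def sign_hs256(payload: dict, key: bytes) -> bytes:
    header = {'typ': 'JWT', 'alg': 'HS256'}
    # Parts A and B: compact JSON, Base64URL-encoded
    parts = [
        b64url_encode(json.dumps(obj, separators=(',', ':')).encode('utf-8'))
        for obj in (header, payload)
    ]
    signing_input = b'.'.join(parts)
    # Part C: HMAC-SHA256 over 'A.B'
    signature = hmac.new(key, signing_input, hashlib.sha256).digest()
    return signing_input + b'.' + b64url_encode(signature)


def verify_hs256(token: bytes, key: bytes) -> dict:
    signing_input, _, signature = token.rpartition(b'.')
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(b64url_decode(signature), expected):
        raise ValueError('Signature mismatch')
    payload_b64 = signing_input.split(b'.')[1]
    return json.loads(b64url_decode(payload_b64))


token = sign_hs256({'payload': 'data', 'id': 123456789}, b'secret-key')
print(token)
print(verify_hs256(token, b'secret-key'))
```

If you swap the B part of the token for a different payload and leave C untouched, verify_hs256 raises a ValueError, which is exactly the protection described above.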

When dealing with JWTs, you want to make sure you have researched how to handle them safely. Things like not accepting unsigned tokens, or restricting the list of algorithms you use to encode and decode, as well as other security measures, are very important and you should take the time to investigate and learn them.

For this part of the code, you will have to have the PyJWT and cryptography Python packages installed. As always, you will find them in the requirements of the source code of this book.

Let's start with a simple example:

# tok.py
import jwt

data = {'payload': 'data', 'id': 123456789}

token = jwt.encode(data, 'secret-key')
data_out = jwt.decode(token, 'secret-key')
print(token)
print(data_out)

We define the data payload, which contains an ID and some payload data. Then, we create a token using the jwt.encode function, which takes at least the payload and a secret key, which is used to compute the signature. The default algorithm used to calculate the token is HS256. Let's see the output:

$ python tok.py
b'eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJwYXlsb2FkIjoiZGF0YSIsImlkIjoxMjM0NTY3ODl9.WFRY-uoACMoNYX97PXXjEfXFQO1rCyFCyiwxzOVMn40'
{'payload': 'data', 'id': 123456789}

So, as you can see, the token is a binary string of Base64URL-encoded pieces of data. We have called jwt.decode, providing the correct secret key. Had we done otherwise, the decoding would have broken.

Sometimes, you might want to be able to inspect the content of the token without verifying it. You can do so by simply calling decode this way:

# tok.py
jwt.decode(token, verify=False)

Note that the verify argument belongs to the PyJWT 1.x API used throughout this chapter; PyJWT 2.0 and later expect options={'verify_signature': False} instead, and require an explicit algorithms list when decoding.

This is useful, for example, when values in the token payload are needed to recover the secret key, but that technique is quite advanced so I won't be spending time on it in this context. Instead, let's see how we can specify a different algorithm for computing the signature:

# tok.py
token512 = jwt.encode(data, 'secret-key', algorithm='HS512')
data_out = jwt.decode(token512, 'secret-key', algorithms=['HS512'])
print(data_out)

The output is our original payload dictionary. The algorithms argument takes a list, so in case you want to allow more than one algorithm in the decoding phase, you can specify several of them, instead of only one.

Now, while you are free to put whatever you want in the token payload, there are some claims that have been standardized, and they enable you to have a great deal of control over the token.

Registered claims

At the time of writing this book, these are the registered claims:

iss: The issuer of the token
sub: The subject, that is, the party this token is carrying information about
aud: The audience for the token
exp: The expiration time, after which the token is considered to be invalid
nbf: The not before (time), or the time before which the token is considered to be not valid yet
iat: The time at which the token was issued
jti: The token ID

Claims can also be categorized as public or private:

Private: Are those that are defined by users (consumers and producers) of the JWTs. In other words, these are ad hoc claims used for a particular case. As such, care must be taken to prevent collisions.
Public: Are claims that are either registered with the IANA JSON Web Token Claims Registry (a registry where users can register their claims and thus prevent collisions), or named using a collision resistant name (for instance, by prepending a namespace to its name).

To learn all about claims, please refer to the official website. Now, let's see a couple of code examples involving a subset of these claims.

Time-related claims

Let's see how we might use the claims related to time:

# claims_time.py
from datetime import datetime, timedelta
from time import sleep
import jwt

iat = datetime.utcnow()
nbf = iat + timedelta(seconds=1)
exp = iat + timedelta(seconds=3)
data = {'payload': 'data', 'nbf': nbf, 'exp': exp, 'iat': iat}

def decode(token, secret):
    print(datetime.utcnow().time().isoformat())
    try:
        print(jwt.decode(token, secret))
    except (
        jwt.ImmatureSignatureError, jwt.ExpiredSignatureError
    ) as err:
        print(err)
        print(type(err))

secret = 'secret-key'
token = jwt.encode(data, secret)

decode(token, secret)
sleep(2)
decode(token, secret)
sleep(2)
decode(token, secret)

In this example, we set the issued at (iat) claim to the current UTC time (UTC stands for Coordinated Universal Time). We then set the not before (nbf) and expire time (exp) at 1 and 3 seconds from now, respectively. We then defined a decode helper function that reacts to a token not being valid yet, or being expired, by trapping the appropriate exceptions, and then we call it three times, interspersed by two calls to sleep. This way, we will try to decode the token when it's not valid yet, then when it's valid, and finally when it's already expired. This function also prints a useful timestamp before attempting to decode the token. Let's see how it goes (blank lines have been added for readability):

$ python claims_time.py
14:04:13.469778
The token is not yet valid (nbf)
<class 'jwt.exceptions.ImmatureSignatureError'>

14:04:15.475362
{'payload': 'data', 'nbf': 1522591454, 'exp': 1522591456, 'iat': 1522591453}

14:04:17.476948
Signature has expired
<class 'jwt.exceptions.ExpiredSignatureError'>

As you can see, it all executed as expected. We get nice, descriptive messages from the exceptions, and get the original payload back when the token is actually valid.
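The logic behind these two checks is simple enough to sketch with plain Python. The check_time_claims function below is hypothetical and only mimics the idea; it is not how PyJWT is implemented. I'm using the same issue time as in the run above, so that the timestamps match.

```python
from datetime import datetime, timedelta, timezone


def check_time_claims(payload, now):
    """Classify a payload by comparing nbf and exp with the given time."""
    ts = now.timestamp()
    if 'nbf' in payload and ts < payload['nbf']:
        return 'not yet valid'
    if 'exp' in payload and ts >= payload['exp']:
        return 'expired'
    return 'valid'


iat = datetime(2018, 4, 1, 14, 4, 13, tzinfo=timezone.utc)
payload = {
    'iat': int(iat.timestamp()),
    'nbf': int((iat + timedelta(seconds=1)).timestamp()),
    'exp': int((iat + timedelta(seconds=3)).timestamp()),
}

# Check at issue time, then 2 and 4 seconds later
results = [
    check_time_claims(payload, iat + timedelta(seconds=offset))
    for offset in (0, 2, 4)
]
print(payload)
print(results)
```

The claims are stored as integer timestamps (seconds since the epoch), which is why the decoded payload above shows numbers rather than datetime objects.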

Auth-related claims

Let's see another quick example involving the issuer (iss) and audience (aud) claims. The code is conceptually very similar to the previous example, and we're going to exercise it in the same way:

# claims_auth.py
import jwt

data = {'payload': 'data', 'iss': 'fab', 'aud': 'learn-python'}
secret = 'secret-key'
token = jwt.encode(data, secret)

def decode(token, secret, issuer=None, audience=None):
    try:
        print(jwt.decode(
            token, secret, issuer=issuer, audience=audience))
    except (
        jwt.InvalidIssuerError, jwt.InvalidAudienceError
    ) as err:
        print(err)
        print(type(err))

decode(token, secret)
# not providing the issuer won't break
decode(token, secret, audience='learn-python')
# not providing the audience will break
decode(token, secret, issuer='fab')
# both will break
decode(token, secret, issuer='wrong', audience='learn-python')
decode(token, secret, issuer='fab', audience='wrong')

decode(token, secret, issuer='fab', audience='learn-python')

As you can see, this time, we have specified issuer and audience. It turns out that if we don't provide the issuer when decoding the token, it won't cause the decoding to break. However, providing the wrong issuer will actually break decoding. On the other hand, both failing to provide the audience, or providing the wrong audience, will break decoding.

As in the previous example, I have written a custom decode function that reacts to the appropriate exceptions. See if you can follow along with the calls and the relative output that follows (I'll help with some blank lines):

$ python claims_auth.py
Invalid audience
<class 'jwt.exceptions.InvalidAudienceError'>

{'payload': 'data', 'iss': 'fab', 'aud': 'learn-python'}

Invalid audience
<class 'jwt.exceptions.InvalidAudienceError'>

Invalid issuer
<class 'jwt.exceptions.InvalidIssuerError'>

Invalid audience
<class 'jwt.exceptions.InvalidAudienceError'>

{'payload': 'data', 'iss': 'fab', 'aud': 'learn-python'}

Now, let's see one final example for a more complex use case.

Using asymmetric (public-key) algorithms

Sometimes, using a shared secret is not the best option. In those cases, it might be useful to adopt a different technique. In this example, we are going to create a token (and decode it) using a pair of RSA keys.

Public key cryptography, or asymmetrical cryptography, is any cryptographic system that uses pairs of keys: public keys which may be disseminated widely, and private keys which are known only to the owner. If you are interested in learning more about this topic, please see the end of this chapter for recommendations.

Now, let's create two pairs of keys. One pair will have no password, and one will. To create them, I'm going to use the ssh-keygen tool from OpenSSH (https://www.ssh.com/ssh/keygen/). In the folder where my scripts for this chapter are, I created an rsa subfolder. Within it, run the following:

$ ssh-keygen -t rsa

Give the name key to the path (it will be saved in the current folder), and simply hit the Enter key when asked for the password. When done, do the same again, but this time use the name keypwd for the key, and give it a password. The one I chose is the classic Password123. When you are done, change back to the ch9 folder, and run this code:

# token_rsa.py
import jwt
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import serialization

data = {'payload': 'data'}

def encode(data, priv_filename, priv_pwd=None, algorithm='RS256'):
    with open(priv_filename, 'rb') as key:
        private_key = serialization.load_pem_private_key(
            key.read(),
            password=priv_pwd,
            backend=default_backend()
        )
    return jwt.encode(data, private_key, algorithm=algorithm)

def decode(data, pub_filename, algorithm='RS256'):
    with open(pub_filename, 'rb') as key:
        public_key = key.read()
    # decode expects a list of allowed algorithms
    return jwt.decode(data, public_key, algorithms=[algorithm])

# no pwd
token = encode(data, 'rsa/key')
data_out = decode(token, 'rsa/key.pub')
print(data_out)

# with pwd
token = encode(data, 'rsa/keypwd', priv_pwd=b'Password123')
data_out = decode(token, 'rsa/keypwd.pub')
print(data_out)

In the previous example, we defined a couple of custom functions to encode and decode tokens using private/public keys. As you can see in the signature of the encode function, we are using the RS256 algorithm this time. We need to open the private key file by using the special load_pem_private_key function, which allows us to specify a content, password, and backend. .pem is the name of the format in which our keys have been created. If you take a look at those files, you will probably recognize them, since they are quite popular.

The logic is pretty straightforward, and I would encourage you to think about at least one use case where this technique might be more suitable than using a shared key.

Useful references

Here, you can find a list of useful references if you want to dig deeper into the fascinating world of cryptography:

Cryptography: https://en.wikipedia.org/wiki/Cryptography
JSON Web Tokens: https://jwt.io
Hash functions: https://en.wikipedia.org/wiki/Cryptographic_hash_function
HMAC: https://en.wikipedia.org/wiki/HMAC
Cryptography services (Python STD library): https://docs.python.org/3.7/library/crypto.html
IANA JSON Web Token Claims Registry: https://www.iana.org/assignments/jwt/jwt.xhtml
PyJWT library: https://pyjwt.readthedocs.io/
Cryptography library: https://cryptography.io/

There is way more on the web, and plenty of books you can also study, but I'd recommend that you start with the main concepts and then gradually dive into the specifics you want to understand more thoroughly.

Summary

In this short chapter, we explored the world of cryptography in the Python standard library. We learned how to create a hash (or digest) for a message using different cryptographic functions. We also learned how to create tokens and deal with random data when it comes to the cryptography context.

We then took a small tour outside the standard library to learn about JSON Web Tokens, which are used intensively today in authentication and claims-related functionalities by modern systems and applications.

The most important thing is to understand that doing things manually can be very risky when it comes to cryptography, so it's always best to leave it to the professionals and simply use the tools we have available.

The next chapter will be all about moving away from one line of software execution. We're going to learn how software works in the real world, explore concurrent execution, and learn about threads, processes, and the tools Python gives us to do more than one thing at a time, so to speak.

Concurrent Execution

"What do we want? Now! When do we want it? Fewer race conditions!"

– Anna Melzer

In this chapter, I'm going to up the game a little bit, both in terms of the concepts I'll present, and in the complexity of the code snippets I'll show you. If you don't feel up to the task, or as you are reading through you realize it is getting too difficult, feel free to skip it. You can always come back to it when you feel ready.

The plan is to take a detour from the familiar single-threaded execution paradigm, and deep dive into what can be described as concurrent execution. I will only be able to scratch the surface of this complex topic, so I won't expect you to be a master of concurrency by the time you're done reading, but I will, as usual, try to give you enough information so that you can then proceed by walking the path, so to speak.

We will learn about all the important concepts that apply to this area of programming, and I will try to show you examples coded in different styles, to give you a solid understanding of the basics of these topics. To dig deep into this challenging and interesting branch of programming, you will have to refer to the Concurrent Execution section in the Python documentation (https://docs.python.org/3.7/library/concurrency.html), and maybe supplement your knowledge by studying books on the subject.

In particular, we are going to explore the following:

The theory behind threads and processes
Writing multithreaded code
Writing multiprocessing code
Using executors to spawn threads and processes
A brief example of programming with asyncio

Let's start by getting the theory out of the way.

Concurrency versus parallelism

Concurrency and parallelism are often mistaken for the same thing, but there is a distinction between them. Concurrency is the ability to make progress on multiple things over the same period of time, not necessarily in parallel. Parallelism is the ability to do a number of things at literally the same instant.

Imagine you take your other half to the theater. There are two lines: that is, for VIP and regular tickets. There is only one functionary checking tickets and so, in order to avoid blocking either of the two queues, they check one ticket from the VIP line, then one from the regular line. Over time, both queues are processed. This is an example of concurrency.

Now imagine that another functionary joins, so now we have one functionary per queue. This way, both queues will be processed each by its own functionary. This is an example of parallelism.

Modern laptop processors feature multiple cores (normally two to four). A core is an independent processing unit that belongs to a processor. Having more than one core means that the CPU in question has the physical ability to actually execute tasks in parallel. Within each core, normally there is a constant alternation of streams of work, which is concurrent execution.
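As a quick, hedged illustration, the standard library can report how many logical CPUs the operating system exposes. Take the figure only as a hint about how much true parallelism is available: it counts logical CPUs rather than physical cores, and os.cpu_count() may return None when the number cannot be determined.

```python
import os

# Number of logical CPUs reported by the OS (None if it cannot be determined).
logical_cpus = os.cpu_count()

if logical_cpus is None:
    print('Could not determine the number of CPUs.')
else:
    print(f'This machine exposes {logical_cpus} logical CPU(s).')
```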

Bear in mind that I'm keeping the discussion generic on purpose here. According to which system you are using, there will be differences in how execution is handled, so I will concentrate on the concepts that are common to all, or at least most, systems.

Threads and processes – an overview

A thread can be defined as a sequence of instructions that can be run by a scheduler, which is that part of the operating system that decides which chunk of work will receive the necessary resources to be carried out. Typically, a thread lives within a process. A process can be defined as an instance of a computer program that is being executed.

In previous chapters, we have run our own modules and scripts with commands similar to $ python my_script.py. What happens when a command like that is run, is that a Python process is created. Within it, a main thread of execution is spawned. The instructions in the script are what will be run within that thread.

This is just one way of working though, and Python can actually use more than one thread within the same process, and can even spawn multiple processes. Unsurprisingly, these branches of computer science are called multithreading and multiprocessing.

In order to understand the difference, let's take a moment to explore threads and processes in slightly more depth.

Quick anatomy of a thread

Generally speaking, there are two different types of threads:

User-level threads: Threads that we can create and manage in order to perform a task
Kernel-level threads: Low-level threads that run in kernel mode and act on behalf of the operating system

Given that Python works at the user level, we're not going to deep dive into kernel threads at this time. Instead, we will explore several examples of user-level threads in this chapter's examples.

A thread can be in any of the following states:

New thread: A thread that hasn't started yet, and hasn't been allocated any resources.
Runnable: The thread is waiting to run. It has all the resources needed to run, and as soon as the scheduler gives it the green light, it will be run.
Running: A thread whose stream of instructions is being executed. From this state, it can go back to a non-running state, or die.
Not-running: A thread that has been paused. This could be due to another thread taking precedence over it, or simply because the thread is waiting for a long-running IO operation to finish.
Dead: A thread that has died because it has reached the natural end of its stream of execution, or it has been killed.

Transitions between states are provoked either by our actions or by the scheduler. There is one thing to bear in mind, though; it is best not to interfere with the death of a thread.

Killing threads

Killing threads is not considered to be good practice. Python doesn't provide the ability to kill a thread by calling a method or function, and this should be a hint that killing threads isn't something you want to be doing.

One reason is that a thread might have children (threads spawned from within the thread itself), which would be orphaned when their parent dies. Another reason could be that if the thread you're killing is holding a resource that needs to be closed properly, you might prevent that from happening and that could potentially lead to problems.

Later, we will see an example of how we can work around these issues.

Context-switching

We have said that the scheduler can decide when a thread can run, or is paused, and so on. Any time a running thread needs to be suspended so that another can be run, the scheduler saves the state of the running thread in a way that it will be possible, at a later time, to resume execution exactly where it was paused.

This act is called context-switching. People do that all the time too. We are doing some paperwork, and we hear bing! on our phone. We stop the paperwork and check our phone. When we're done dealing with what was probably the umpteenth picture of a funny cat, we go back to our paperwork. We don't start the paperwork from the beginning, though; we simply continue where we had left off.

Context-switching is a marvelous ability of modern computers, but it can become troublesome if you generate too many threads. The scheduler then will try to give each of them a chance to run for a little time, and there will be a lot of time spent saving and recovering the state of the threads that are respectively paused and restarted.

In order to avoid this problem, it is quite common to limit the number of threads (the same consideration applies to processes) that can be run at any given point in time. This is achieved by using a structure called a pool, the size of which can be decided by the programmer. In a nutshell, we create a pool and then assign tasks to its threads. When all the threads of the pool are busy, the program won't be able to spawn a new thread until one of them terminates (and goes back to the pool). Pools are also great for saving resources, in that they provide recycling features to the thread ecosystem.

When you write multithreaded code, it is useful to have information about the machine your software is going to run on. That information, coupled with some profiling (we'll learn about it in Chapter 11, Debugging and Troubleshooting), should enable us to calibrate the size of our pools correctly.

The Global Interpreter Lock

In July 2015, I attended the EuroPython conference in Bilbao, where I gave a talk about test-driven development. The camera operator unfortunately lost the first half of it, but I've since been able to give that talk another couple of times, so you can find a complete version of it on the web. At the conference, I had the great pleasure of meeting Guido van Rossum and talking to him, and I also attended his keynote speech.

One of the topics he addressed was the infamous Global Interpreter Lock (GIL). The GIL is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. This means that even though you can write multithreaded code in Python, there is only one thread running at any point in time (per process, of course).

In computer programming, a mutual exclusion object (mutex) is a program object that allows multiple program threads to share the same resource, such as file access, but not simultaneously.

This is normally seen as an undesired limitation of the language, and many developers take pride in cursing this great villain. The truth lies somewhere else though, as was beautifully explained by Raymond Hettinger in his Keynote on Concurrency, at PyBay 2017 (https://bit.ly/2KcijOB). About 10 minutes in, Raymond explains that it is actually quite simple to remove the GIL from Python. It takes about a day of work. The price you pay for this GIL-ectomy though, is that you then have to apply locks yourself wherever they are needed in your code. This leads to a more expensive footprint, as multitudes of individual locks take more time to be acquired and released, and most importantly, it introduces the risk of bugs, as writing robust multithreaded code is not easy and you might end up having to write dozens or hundreds of locks.

In order to understand what a lock is, and why you might want to use it, we first need to talk about one of the perils of multithreaded programming: race conditions.

Race conditions and deadlocks

When it comes to writing multithreaded code, you need to be aware of the dangers that come when your code is no longer executed linearly. By that, I mean that multithreaded code is exposed to the risk of being paused at any point in time by the scheduler, because it has decided to give some CPU time to another stream of instructions.

This behavior exposes you to different types of risks, the two most famous being race conditions and deadlocks. Let's talk about them briefly.

Race conditions

A race condition is a behavior of a system where the output of a procedure depends on the sequence or timing of other uncontrollable events. When these events don't unfold in the order intended by the programmer, a race condition becomes a bug.

It's much easier to explain this with an example.

Imagine you have two threads running. Both are performing the same task, which consists of reading a value from a location, performing an action with that value, incrementing the value by 1 unit, and saving it back. Say that the action is to post that value to an API.

Scenario A – race condition not happening

Thread A reads the value (1), posts 1 to the API, then increments it to 2, and saves it back. Right after this, the scheduler pauses Thread A, and runs Thread B. Thread B reads the value (now 2), posts 2 to the API, increments it to 3, and saves it back.

At this point, after the operation has happened twice, the value stored is correct: 1 + 2 = 3. Moreover, the API has been called with both 1 and 2, correctly.

Scenario B – race condition happening

Thread A reads the value (1), posts it to the API, increments it to 2, but before it can save it back, the scheduler decides to pause Thread A in favor of Thread B.

Thread B reads the value (still 1!), posts it to the API, increments it to 2, and saves it back. The scheduler then switches over to Thread A again. Thread A resumes its stream of work by simply saving the value it was holding after incrementing, which is 2.

After this scenario, even though the operation has happened twice as in Scenario A, the value saved is 2, and the API has been called twice with 1.
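To make the two scenarios concrete, here is a small sketch that replays them deterministically, without any real threads. The run_schedule helper, the step names, and the list standing in for the API are all made up for this illustration; the point is only to show how the order of the same eight steps changes the outcome.

```python
def run_schedule(schedule):
    """Replay a fixed interleaving of two threads' steps, deterministically."""
    shared = {'value': 1}   # the shared location both threads use
    held = {}               # each thread's private copy of the value
    api_calls = []          # values "posted" to the pretend API

    for thread, step in schedule:
        if step == 'read':
            held[thread] = shared['value']
        elif step == 'post':
            api_calls.append(held[thread])
        elif step == 'incr':
            held[thread] += 1
        elif step == 'save':
            shared['value'] = held[thread]
    return shared['value'], api_calls


steps = ['read', 'post', 'incr', 'save']

# Scenario A: Thread A completes all four steps before Thread B starts.
scenario_a = [('A', s) for s in steps] + [('B', s) for s in steps]

# Scenario B: Thread A is paused just before saving, B runs, then A saves.
scenario_b = (
    [('A', s) for s in steps[:3]]
    + [('B', s) for s in steps]
    + [('A', 'save')]
)

print(run_schedule(scenario_a))  # (3, [1, 2])
print(run_schedule(scenario_b))  # (2, [1, 1])
```

With real threads we don't get to choose the schedule, which is exactly why the second outcome is a bug waiting to happen.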

In a real-life situation, with multiple threads and real code performing several operations, the overall behavior of the program explodes into a myriad of possibilities. We'll see an example of this later on, and we'll fix it using locks.

The main problem with race conditions is that they make our code non-deterministic, which is bad. There are areas in computer science where non-determinism is used to achieve things, and that's fine, but in general you want to be able to predict how your code will behave, and race conditions make it impossible to do so.

Locks to the rescue

Locks come to the rescue when dealing with race conditions. For example, in order to fix the preceding example, all you need is a lock around the procedure. A lock is like a guardian that will allow only one thread to take hold of it (we say to acquire a lock), and until that thread releases the lock, no other thread can acquire it. They will have to sit and wait until the lock is available again.

Scenario C – using a lock

Thread A acquires the lock, reads the value (1), posts to the API, increases to 2, and the scheduler suspends it. Thread B is given some CPU time, so it tries to acquire the lock. But the lock hasn't been released yet by Thread A, so Thread B sits and waits. The scheduler might notice this, and quickly decide to switch back to Thread A.

Thread A saves 2, and releases the lock, making it available to all other threads.

At this point, whether the lock is acquired again by Thread A, or by Thread B (because the scheduler might have decided to switch again), is not important. The procedure will always be carried out correctly, since the lock makes sure that when a thread reads a value, it has to complete the procedure (ping API, increment, and save) before any other thread can read the value as well.
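The following sketch shows the "guardian" behavior in isolation, using threading.Lock from the standard library. To keep the result predictable, the second thread asks for the lock with blocking=False, so instead of sitting and waiting it is simply told whether it got the lock. The try_to_enter function and the attempts list are names invented for this illustration.

```python
import threading

lock = threading.Lock()
attempts = []

def try_to_enter():
    # blocking=False makes acquire() return immediately instead of waiting.
    got_it = lock.acquire(blocking=False)
    attempts.append(got_it)
    if got_it:
        lock.release()

lock.acquire()          # "Thread A" (here, the main thread) holds the lock
b = threading.Thread(target=try_to_enter)
b.start()
b.join()                # B was refused: the lock is still held

lock.release()          # A is done, the lock is available again
b = threading.Thread(target=try_to_enter)
b.start()
b.join()                # this time B acquires (and releases) the lock

print(attempts)  # [False, True]
```

With the default blocking=True, the first attempt would have waited until the main thread released the lock, which is the behavior described in Scenario C.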

There are a multitude of different locks available in the standard library. I definitely encourage you to read up on them to understand all the perils you might encounter when coding multithreaded code, and how to solve them.

Let's now talk about deadlocks.

Deadlocks

A deadlock is a state in which each member of a group is waiting for some other member to take action, such as sending a message or, more commonly, releasing a lock, or a resource.

A simple example will help you get the picture. Imagine two little kids playing together. Find a toy that is made of two parts, and give each of them one part. Naturally, neither of them will want to give the other one their part, and they will want the other one to release the part they have. So neither of them will be able to play with the toy, as they each hold half of it, and will indefinitely wait for the other kid to release the other half.

Don't worry, no kids were harmed during the making of this example. It all happened in my mind.

Another example could be having two threads execute the same procedure again. The procedure requires acquiring two resources, A and B, both guarded by a separate lock. Thread 1 acquires A, and Thread 2 acquires B, and then they will wait indefinitely until the other one releases the resource it has. But that won't happen, as they both are instructed to wait and acquire the second resource in order to complete the procedure. Threads can be much more stubborn than kids.

You can solve this problem in several ways. The easiest one might be simply to apply an order to the resources acquisition, which means that the thread that gets A, will also get all the rest: B, C, and so on.

Another way is to put a lock around the whole resources acquisition procedure, so that even if it might happen out of order, it will still be within the context of a lock, which means only one thread at a time can actually gather all the resources.
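Here is a minimal sketch of the first remedy, ordered acquisition. The two locks stand for the hypothetical resources A and B, and use_both is a name chosen for this example. Because every thread asks for A before B, no thread can ever hold B while waiting for A, so the circular wait that defines a deadlock cannot form.

```python
import threading

lock_a = threading.Lock()   # guards hypothetical resource A
lock_b = threading.Lock()   # guards hypothetical resource B
log = []

def use_both(name):
    # Every thread acquires the locks in the same order: A first, then B.
    with lock_a:
        with lock_b:
            log.append(name)

threads = [
    threading.Thread(target=use_both, args=(f'T{i}',)) for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)   # the timeout is only a safety net for this demo

print(sorted(log))  # ['T0', 'T1', 'T2', 'T3']
```

If one of the threads took lock_b first and lock_a second, the program could hang exactly as described for Thread 1 and Thread 2.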

Let's now pause our talk on threads for a moment, and explore processes.

Quick anatomy of a process

Processes are normally more complex than threads. In general, they contain a main thread, but can also be multithreaded if you choose. They are capable of spawning multiple sub-threads, each of which contains its own set of registers and a stack. Each process provides all the resources that the computer needs in order to execute the program.

Similarly to using multiple threads, we can design our code to take advantage of a multiprocessing design. Multiple processes are likely to run over multiple cores, therefore with multiprocessing, you can truly parallelize computation. Their memory footprints, though, are slightly heavier than those of threads, and another drawback to using multiple processes is that inter-process communication (IPC) tends to be more expensive than communication between threads.

Properties of a process

A UNIX process is created by the operating system. It typically contains the following:

A process ID, process group ID, user ID, or group ID
An environment and working directory
Program instructions
Registers, a stack, and a heap
File descriptors
Signal actions
Shared libraries
Inter-process communication tools (pipes, message queues, semaphores, or shared memory)

If you are curious about processes, open up a shell and type $ top. This command displays and updates sorted information about the processes that are running in your system. When I run it on my machine, the first line tells me the following:

$ top
Processes: 477 total, 4 running, 473 sleeping, 2234 threads
...

This gives you an idea about how much work our computers are doing without us being really aware of it.

Multithreading or multiprocessing?

Given all this information, deciding which approach is the best means having an understanding of the type of work that needs to be carried out, and knowledge about the system that will be dedicated to doing that work.

There are advantages to both approaches, so let's try to clarify the main differences.

Here are some advantages of using multithreading:

Threads are all born within the same process. They share resources and can communicate with one another very easily. Communication between processes requires more complex structures and techniques.
The overhead of spawning a thread is smaller than that of a process. Moreover, their memory footprint is also smaller.
Threads can be very effective for blocking, IO-bound applications. For example, while one thread is blocked waiting for a network connection to give back some data, work can be easily and effectively switched to another thread.
Because there aren't any shared resources between processes, we need to use IPC techniques, and they require more memory than communication between threads.

Here are some advantages of using multiprocessing:

We can avoid the limitations of the GIL by using processes.
Sub-processes that fail won't kill the main application.
Threads suffer from issues such as race conditions and deadlocks; while using processes the likelihood of having to deal with them is greatly reduced.
Context-switching of threads can become quite expensive when their number is above a certain threshold.
Processes can make better use of multicore processors.
Processes are better than multiple threads at handling CPU-intensive tasks.

In this chapter, I'll show you both approaches for multiple examples, so hopefully you'll gain a good understanding of the various different techniques. Let's get to the code then!

Concurrent execution in Python

Let's start by exploring the basics of Python multithreading and multiprocessing with some simple examples.

Keep in mind that several of the following examples will produce an output that depends on a particular run. When dealing with threads, things can get non-deterministic, as I mentioned earlier. So, if you experience different results, it is absolutely fine. You will probably notice that some of your results will vary from run to run too.

Starting a thread

First things first, let's start a thread:

# start.py
import threading

def sum_and_product(a, b):
    s, p = a + b, a * b
    print(f'{a}+{b}={s}, {a}*{b}={p}')

t = threading.Thread(
    target=sum_and_product, name='SumProd', args=(3, 7)
)
t.start()

After importing threading, we define a function: sum_and_product. This function calculates the sum and the product of two numbers, and prints the results. The interesting bit is after the function. We instantiate t from threading.Thread. This is our thread. We passed the name of the function that will be run as the thread body, we gave it a name, and passed the arguments 3 and 7, which will be fed into the function as a and b, respectively.

After having created the thread, we start it with the homonymous method.

At this point, Python will start executing the function in a new thread, and when that operation is done, the whole program will be done as well, and exit. Let's run it:

$ python start.py
3+7=10, 3*7=21

Starting a thread is therefore quite simple. Let's see a more interesting example where we display more information:

# start_with_info.py
import threading
from time import sleep

def sum_and_product(a, b):
    sleep(.2)
    print_current()
    s, p = a + b, a * b
    print(f'{a}+{b}={s}, {a}*{b}={p}')

def status(t):
    if t.is_alive():
        print(f'Thread {t.name} is alive.')
    else:
        print(f'Thread {t.name} has terminated.')

def print_current():
    print('The current thread is {}.'.format(
        threading.current_thread()
    ))
    print('Threads: {}'.format(list(threading.enumerate())))

print_current()
t = threading.Thread(
    target=sum_and_product, name='SumProd', args=(3, 7)
)
t.start()
status(t)
t.join()
status(t)

In this example, the thread logic is exactly the same as in the previous one, so you don't need to sweat on it and can concentrate on the (insane!) amount of logging information I added. We use two functions to display information: status and print_current. The first one takes a thread as input and displays its name and whether or not it's alive by calling its is_alive method. The second one prints the current thread, and then enumerates all the threads in the process. This information comes from threading.current_thread and threading.enumerate.

There is a reason why I put .2 seconds of sleeping time within the function. When the thread starts, its first instruction is to sleep for a moment. The sneaky scheduler will catch that, and switch execution back to the main thread. You can verify this by the fact that in the output, you will see the result of status(t) before that of print_current from within the thread. This means that that call happens while the thread is sleeping.

Finally, notice I called t.join() at the end. That instructs Python to block until the thread has completed. I did that because I want the last call to status(t) to tell us that the thread is gone. Let's peek at the output (slightly rearranged for readability):

$ python start_with_info.py
The current thread is
    <_MainThread(MainThread, started 140735733822336)>.
Threads: [<_MainThread(MainThread, started 140735733822336)>]
Thread SumProd is alive.
The current thread is <Thread(SumProd, started 123145375604736)>.
Threads: [
    <_MainThread(MainThread, started 140735733822336)>,
    <Thread(SumProd, started 123145375604736)>
]
3+7=10, 3*7=21
Thread SumProd has terminated.

As you can see, at first the current thread is the main thread. The enumeration shows only one thread. Then we create and start SumProd. We print its status and we learn it is alive. Then, and this time from within SumProd, we display information about the current thread again. Of course, now the current thread is SumProd, and we can see that enumerating all threads returns both of them. After the result is printed, we verify, with one last call to status, that the thread has terminated, as predicted. Should you get different results (apart from the IDs of the threads, of course), try increasing the sleeping time and see whether anything changes.

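Starting a process

Let's now see an equivalent example, but instead of using a thread, we'll use a process:

# start_proc.py
import multiprocessing

...

# The guard is required on platforms that spawn a fresh interpreter
# for each child process (such as Windows).
if __name__ == '__main__':
    p = multiprocessing.Process(
        target=sum_and_product, name='SumProdProc', args=(7, 9)
    )
    p.start()

The code is the same as for the first example, but instead of using a Thread, we instantiate multiprocessing.Process, and we do so under an if __name__ == '__main__' guard, which multiprocessing needs on platforms that start each child in a fresh interpreter. The sum_and_product function is the same as before. The output is also the same, except the numbers are different.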

Stopping threads and processes

As mentioned before, in general, stopping a thread is a bad idea, and the same goes for a process. Being sure you've taken care to dispose and close everything that is open can be quite difficult. However, there are situations in which you might want to be able to stop a thread, so let me show you how to do it:

# stop.py
import threading
from time import sleep

class Fibo(threading.Thread):
    def __init__(self, *a, **kwa):
        super().__init__(*a, **kwa)
        self._running = True

    def stop(self):
        self._running = False

    def run(self):
        a, b = 0, 1
        while self._running:
            print(a, end=' ')
            a, b = b, a + b
            sleep(0.07)
        print()

fibo = Fibo()
fibo.start()
sleep(1)
fibo.stop()
fibo.join()
print('All done.')

For this example, we use a Fibonacci generator. We've seen it before so I won't explain it. The important bit to focus on is the _running attribute. First of all, notice the class inherits from Thread. By overriding the __init__ method, we can set the _running flag to True. When you write a thread this way, instead of giving it a target function, you simply override the run method in the class. Our run method calculates a new Fibonacci number, and then sleeps for about 0.07 seconds.

In the last block of code, we create and start an instance of our class. Then we sleep for one second, which should give the thread time to produce about 14 Fibonacci numbers. When we call fibo.stop(), we aren't actually stopping the thread. We simply set our flag to False, and this allows the code within run to reach its natural end. This means that the thread will die organically. We call join to make sure the thread is actually done before we print All done. on the console. Let's check the output:

$ python stop.py
0 1 1 2 3 5 8 13 21 34 55 89 144 233
All done.

Check how many numbers were printed: 14, as predicted.

This is basically a workaround technique that allows you to stop a thread. If you design your code correctly according to multithreading paradigms, you shouldn't have to kill threads all the time, so let that need become your alarm bell that something could be designed better.
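A common variation on the same workaround, sketched here as an alternative rather than as the approach used in stop.py, replaces the boolean flag with a threading.Event. The stop_event and numbers attributes are names I picked for this sketch. One nicety is that the event's wait method doubles as an interruptible sleep, so the thread reacts to the stop request right away instead of finishing its nap first.

```python
import threading
from time import sleep

class Fibo(threading.Thread):
    def __init__(self, *a, **kwa):
        super().__init__(*a, **kwa)
        self.stop_event = threading.Event()
        self.numbers = []   # collected instead of printed, for inspection

    def stop(self):
        self.stop_event.set()

    def run(self):
        a, b = 0, 1
        while not self.stop_event.is_set():
            self.numbers.append(a)
            a, b = b, a + b
            # Returns as soon as the event is set, or after the timeout.
            self.stop_event.wait(0.01)

fibo = Fibo()
fibo.start()
while len(fibo.numbers) < 5:   # let the thread produce a few numbers
    sleep(0.01)
fibo.stop()
fibo.join()
print(fibo.numbers[:5])  # [0, 1, 1, 2, 3]
```

As before, the thread is never killed: it notices the request and lets run reach its natural end.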

Stopping a process

When it comes to stopping a process, things are different, and fuss-free. You can use either the terminate or kill method, but please make sure you know what you're doing, as all the preceding considerations about open resources left hanging are still true.

Spawning multiple threads

Just for fun, let's play with two threads now:

# starwars.py
import threading
from time import sleep
from random import random

def run(n):
    t = threading.current_thread()
    for count in range(n):
        print(f'Hello from {t.name}! ({count})')
        sleep(0.2 * random())

obi = threading.Thread(target=run, name='Obi-Wan', args=(4, ))
ani = threading.Thread(target=run, name='Anakin', args=(3, ))
obi.start()
ani.start()
obi.join()
ani.join()

The run function simply gets the current thread, and then enters a loop of n cycles, in which it prints a greeting message, and sleeps for a random amount of time, between 0 and 0.2 seconds (random() returns a float between 0 and 1).

The purpose of this example is to show you how a scheduler might jump between threads, so it helps to make them sleep a little. Let's see the output:

$ python starwars.py
Hello from Obi-Wan! (0)
Hello from Anakin! (0)
Hello from Obi-Wan! (1)
Hello from Obi-Wan! (2)
Hello from Anakin! (1)
Hello from Obi-Wan! (3)
Hello from Anakin! (2)

As you can see, the output alternates randomly between the two. Every time that happens, you know a context switch has been performed by the scheduler.

Dealing with race conditions

Now that we have the tools to start threads and run them, let's simulate a race condition such as the one we discussed earlier:

# race.py
import threading
from time import sleep
from random import random

counter = 0
randsleep = lambda: sleep(0.1 * random())

def incr(n):
    global counter
    for count in range(n):
        current = counter
        randsleep()
        counter = current + 1
        randsleep()

n = 5
t1 = threading.Thread(target=incr, args=(n, ))
t2 = threading.Thread(target=incr, args=(n, ))
t1.start()
t2.start()
t1.join()
t2.join()
print(f'Counter: {counter}')

In this example, we define the incr function, which gets a number n as input, and loops n times. In each cycle, it reads the value of the counter, sleeps for a random amount of time (between 0 and 0.1 seconds) by calling randsleep, a tiny lambda function I wrote to improve readability, then increases the value of the counter by 1.

I chose to use global in order to have read/write access to counter, but it could be anything really, so feel free to experiment with that yourself.

The whole script basically starts two threads, each of which runs the same function, and gets n = 5. Notice how we need to join on both threads at the end to make sure that when we print the final value of the counter (last line), both threads are done doing their work.

When we print the final value, we would expect the counter to be 10, right? Two threads, five loops each, that makes 10. However, we almost never get 10 if we run this script. I ran it myself many times, and it seems to always hit somewhere between 5 and 7. The reason this happens is that there is a race condition in this code, and those random sleeps I added are there to exacerbate it. If you removed them, there would still be a race condition, because the counter is increased in a non-atomic way (which means an operation that can be broken down in multiple steps, and therefore paused in between). However, the likelihood of that race condition showing is really low, so adding the random sleep helps.

Let's analyze the code. t1 gets the current value of the counter, say, 3. t1 then sleeps for a moment. If the scheduler switches context in that moment, pausing t1 and starting t2, t2 will read the same value, 3. Whatever happens afterward, we know that both threads will update the counter to be 4, which will be incorrect as after two increments it should have gone up to 5. Adding the second random sleep call, after the update, helps the scheduler switch more frequently, and makes it easier to show the race condition. Try commenting out one of them, and see how the result changes (it will do so, dramatically).

Now that we have identified the issue, let's fix it by using a lock. The code is basically the same, so I'll show you only what changes:

# race_with_lock.py
incr_lock = threading.Lock()

def incr(n):
    global counter
    for count in range(n):
        with incr_lock:
            current = counter
            randsleep()
            counter = current + 1
            randsleep()

This time we have created a lock, from the threading.Lock class. We could call its acquire and release methods manually, or we can be Pythonic and use it within a context manager, which looks much nicer, and does the whole acquire/release business for us. Notice I left the random sleeps in the code. However, every time you run it, it will now return 10.

The difference is this: when the first thread acquires that lock, it doesn't matter that when it's sleeping, a moment later, the scheduler switches the context. The second thread will try to acquire the lock, and Python will answer with a resounding no. So, the second thread will just sit and wait until that lock is released. As soon as the scheduler switches back to the first thread, and the lock is released, then the other thread will have a chance (if it gets there first, which is not necessarily guaranteed), to acquire the lock and update the counter. Try adding some prints into that logic to see whether the threads alternate perfectly or not. My guess is that they won't, at least not every time. Remember the threading.current_thread function, to be able to see which thread is actually printing the information.
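For comparison, this is roughly what calling acquire and release manually looks like. It is a sketch of the equivalent long-hand form, not a recommendation: the try/finally is there so that the lock is released even if the protected code raises, which is what the context manager gives us for free. I dropped the random sleeps and raised the loop count to 1,000 per thread, purely to keep the run short.

```python
import threading

counter = 0
incr_lock = threading.Lock()

def incr(n):
    global counter
    for _ in range(n):
        incr_lock.acquire()
        try:
            current = counter
            counter = current + 1
        finally:
            # Always release, even if the protected code raises.
            incr_lock.release()

threads = [threading.Thread(target=incr, args=(1000, )) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f'Counter: {counter}')  # Counter: 2000
```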

Python offers several data structures in the threading module: Lock, RLock, Condition, Semaphore, Event, Timer, and Barrier. I won't be able to show you all of them, because unfortunately I don't have the room to explain all the use cases, but reading the documentation of the threading module (https://docs.python.org/3.7/library/threading.html) will be a good place to start understanding them.
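To give you a taste of one of them, here is a small sketch with a Semaphore, which behaves like a lock that can be held by up to N threads at once. Everything in it (the worker function, the active and peak counters, the limit of 2) is invented for the illustration; the only thing it tries to show is that no more than MAX_CONCURRENT threads are ever inside the guarded section together.

```python
import threading
from time import sleep

MAX_CONCURRENT = 2
slots = threading.Semaphore(MAX_CONCURRENT)
state_lock = threading.Lock()   # protects the two counters below
active = 0
peak = 0

def worker():
    global active, peak
    with slots:                 # at most MAX_CONCURRENT threads get in
        with state_lock:
            active += 1
            peak = max(peak, active)
        sleep(0.02)             # pretend to do some work
        with state_lock:
            active -= 1

threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f'Peak concurrency: {peak} (limit: {MAX_CONCURRENT})')
```

The peak you see may vary from run to run, but it should never exceed the limit.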

Let's now see an example about a thread's local data.

A thread's local data

The threading module offers a way to implement local data for threads. Local data is an object that holds thread-specific data. Let me show you an example, and allow me to sneak in a Barrier too, so I can tell you how it works:

# local.py
import threading
from random import randint

local = threading.local()

def run(local, barrier):
    local.my_value = randint(0, 10**2)
    t = threading.current_thread()
    print(f'Thread {t.name} has value {local.my_value}')
    barrier.wait()
    print(f'Thread {t.name} still has value {local.my_value}')

count = 3
barrier = threading.Barrier(count)
threads = [
    threading.Thread(
        target=run, name=f'T{name}', args=(local, barrier)
    ) for name in range(count)
]
for t in threads:
    t.start()

We start by defining local. That is the special object that holds thread-specific data. We run three threads. Each of them will assign a random value to local.my_value, and print it. Then the thread reaches a Barrier object, which is programmed to hold three threads in total. When the barrier is hit by the third thread, they all can pass. It's basically a nice way to make sure that N threads reach a certain point and they all wait until every single one of them has arrived.

Now, if local was a normal, dummy object, the second thread would override the value of local.my_value, and the third would do the same. This means that we would see them printing different values in the first set of prints, but they would show the same value (the last one) in the second round of prints. But that doesn't happen, thanks to local. The output shows the following:

$ python local.py
Thread T0 has value 61
Thread T1 has value 52
Thread T2 has value 38
Thread T2 still has value 38
Thread T0 still has value 61
Thread T1 still has value 52

Notice the wrong order, due to the scheduler switching context, but the values are all correct.

Thread and process communication

We have seen quite a lot of examples so far. So, let's explore how to make threads and processes talk to one another by employing a queue. Let's start with threads.

Thread communication

For this example, we will be using a normal Queue, from the queue module:

# comm_queue.py
import threading
from queue import Queue

SENTINEL = object()

def producer(q, n):
    a, b = 0, 1
    while a <= n:
        q.put(a)
        a, b = b, a + b
    q.put(SENTINEL)

def consumer(q):
    while True:
        num = q.get()
        q.task_done()
        if num is SENTINEL:
            break
        print(f'Got number {num}')

q = Queue()
cns = threading.Thread(target=consumer, args=(q, ))
prd = threading.Thread(target=producer, args=(q, 35))
cns.start()
prd.start()
q.join()

The logic is very basic. We have a producer function that generates Fibonacci numbers and puts them in a queue. When the next number is greater than a given n, the producer exits the while loop, and puts one last thing in the queue: a SENTINEL. A SENTINEL is any object that is used to signal something, and in our case, it signals to the consumer that the producer is done.

The interesting bit of logic is in the consumer function. It loops indefinitely, reading values out of the queue and printing them out. There are a couple of things to notice here. First, see how we are calling q.task_done()? That is to acknowledge that the element in the queue has been processed. The purpose of this is to allow the final instruction in the code, q.join(), to unblock when all elements have been acknowledged, so that the execution can end.

Second, notice how we use the is operator to compare against the items in order to find the sentinel. We'll see shortly that when using a multiprocessing.Queue this won't be possible anymore. Before we get there, would you be able to guess why?

Running this example produces a series of lines, such as Got number 0, Got number 1, and so on, until 34, since the limit we put is 35, and the next Fibonacci number would be 55.

Sending events

Another way to make threads communicate is to fire events. Let me quickly show you an example of that:

# evt.py
import threading

def fire():
    print('Firing event...')
    event.set()

def listen():
    event.wait()
    print('Event has been fired')

event = threading.Event()
t1 = threading.Thread(target=fire)
t2 = threading.Thread(target=listen)
t2.start()
t1.start()

Here we have two threads that run fire and listen, respectively firing and listening for an event. To fire an event, call the set method on it. The t2 thread, which is started first, is already listening to the event, and will sit there until the event is fired. The output from the previous example is the following:

$ python evt.py
Firing event...
Event has been fired

Events are great in some situations. Think about having threads that are waiting on a connection object to be ready, before they can actually start using it. They could be waiting on an event, and one thread could be checking that connection, and firing the event when it's ready. Events are fun to play with, so make sure you experiment and think about use cases for them.
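The connection scenario just described could be sketched as follows. There is no real connection here: connection_ready, worker, and checker are hypothetical names, and the checker fires the event straight away where real code would first probe the connection. What the sketch does show is that a single set call releases every thread that is waiting on the event.

```python
import threading

connection_ready = threading.Event()
results = []
results_lock = threading.Lock()

def worker(name):
    # Block until the (hypothetical) connection is announced as ready.
    connection_ready.wait()
    with results_lock:
        results.append(f'{name} is using the connection')

def checker():
    # A real checker would test the connection here before firing.
    connection_ready.set()

workers = [
    threading.Thread(target=worker, args=(f'W{i}', )) for i in range(3)
]
for w in workers:
    w.start()

chk = threading.Thread(target=checker)
chk.start()

chk.join()
for w in workers:
    w.join()

print(sorted(results))
```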

Inter-processcommunicationwithqueuesLet'snowseehowtocommunicatebetweenprocessesusingaqueue.Thisexampleisveryverysimilartotheoneforthreads:

#comm_queue_proc.py

importmultiprocessing

SENTINEL='STOP'

defproducer(q,n):

a,b=0,1

whilea<=n:

q.put(a)

a,b=b,a+b

q.put(SENTINEL)

defconsumer(q):

whileTrue:

num=q.get()

ifnum==SENTINEL:

break

print(f'Gotnumber{num}')

q=multiprocessing.Queue()

cns=multiprocessing.Process(target=consumer,args=(q,))

prd=multiprocessing.Process(target=producer,args=(q,35))

cns.start()

prd.start()

Asyoucansee,inthiscase,wehavetouseaqueuethatisaninstanceofmultiprocessing.Queue,whichdoesn'texposeatask_donemethod.However,becauseofthewaythisqueueisdesigned,itautomaticallyjoinsthemainthread,thereforeweonlyneedtostartthetwoprocessesandallwillwork.Theoutputofthisexampleisthesameastheonebefore.

When it comes to IPC, be careful. Objects are pickled when they enter the queue, so IDs get lost, and there are a few other subtle things to take care of. This is why in this example I can no longer use an object as a sentinel, and compare using is, like I did in the multi-threaded version. That sentinel object would be pickled in the queue (because this time the Queue comes from multiprocessing and not from queue like before), and would assume a new ID after unpickling, failing to compare correctly. The string "STOP" in this case does the trick, and it will be up to you to find a suitable value for a sentinel, which needs to be something that could never clash with any of the items that could be in the same queue. I leave it up to you to refer to the documentation, and learn as much as you can on this topic.
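To see why identity comparison stops working, here is a minimal sketch that uses the pickle module directly, without any queue or process involved. It only imitates the round trip an item goes through; the variable names are made up for the example.

```python
import pickle

# An object-based sentinel loses its identity after a pickle round trip.
sentinel_obj = object()
clone = pickle.loads(pickle.dumps(sentinel_obj))
print(clone is sentinel_obj)  # False

# A string sentinel still compares equal with ==.
SENTINEL = 'STOP'
restored = pickle.loads(pickle.dumps(SENTINEL))
print(restored == SENTINEL)  # True
```

This is why the example compares with == against a value, rather than with is against an object.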

Queues aren't the only way to communicate between processes. You can also use pipes (multiprocessing.Pipe), which provide a connection (as in, a pipe, clearly) from one process to another, and vice versa. You can find plenty of examples in the documentation; they aren't that different from what we've seen here.

Thread and process pools

As mentioned before, pools are structures designed to hold N objects (threads, processes, and so on). When the usage reaches capacity, no work is assigned to a thread (or process) until one of those currently working becomes available again. Pools, therefore, are a great way to limit the number of threads (or processes) that can be alive at the same time, preventing the system from starving due to resource exhaustion, or the computation time from being affected by too much context switching.

In the following examples, I will be tapping into the concurrent.futures module to use the ThreadPoolExecutor and ProcessPoolExecutor executors. These two classes use a pool of threads (and processes, respectively) to execute calls asynchronously. They both accept a parameter, max_workers, which sets the upper limit to how many threads (or processes) can be used at the same time by the executor.

Let's start from the multithreaded example:

# pool.py
from concurrent.futures import ThreadPoolExecutor, as_completed
from random import randint
import threading

def run(name):
    value = randint(0, 10**2)
    tname = threading.current_thread().name
    print(f'Hi, I am {name} ({tname}) and my value is {value}')
    return (name, value)

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [
        executor.submit(run, f'T{name}') for name in range(5)
    ]
    for future in as_completed(futures):
        name, value = future.result()
        print(f'Thread {name} returned {value}')

After importing the necessary bits, we define the run function. It gets a random value, prints it, and returns it, along with the name argument it was called with. The interesting bit comes right after the function.

As you can see, we're using a context manager to call ThreadPoolExecutor, to which we pass max_workers=3, which means the pool size is 3. This means only three threads at any time will be alive.

We define a list of future objects by making a list comprehension, in which we call submit on our executor object. We instruct the executor to run the run function, with a name that will go from T0 to T4. A future is an object that encapsulates the asynchronous execution of a callable.

Then we loop over the future objects, as they are done. To do this, we use as_completed to get an iterator of the future instances that returns them as soon as they complete (finish or were cancelled). We grab the result of each future by calling the homonymous method, and simply print it. Given that run returns a tuple name, value, we expect the result to be a two-tuple containing name and value. If we print the output of a run (bear in mind each run can potentially be slightly different), we get:

$ python pool.py
Hi, I am T0 (ThreadPoolExecutor-0_0) and my value is 5
Hi, I am T1 (ThreadPoolExecutor-0_0) and my value is 23
Hi, I am T2 (ThreadPoolExecutor-0_1) and my value is 58
Thread T1 returned 23
Thread T0 returned 5
Hi, I am T3 (ThreadPoolExecutor-0_0) and my value is 93
Hi, I am T4 (ThreadPoolExecutor-0_1) and my value is 62
Thread T2 returned 58
Thread T3 returned 93
Thread T4 returned 62

Before reading on, can you tell why the output looks like this? Could you explain what happened? Spend a moment thinking about it.

So, what goes on is that three tasks start running, so we get three Hi, I am ... messages printed out. Once the pool is at capacity, we need to wait for at least one thread to complete before anything else can happen. In the example run, T1 and T0 complete (which is signaled by the printing of what they returned), so their threads return to the pool and can be used again. They get run with names T3 and T4, and finally all three, T2, T3, and T4, complete. You can see from the thread names in the output how the threads are actually reused, and how they are reassigned to T3 and T4 after the earlier tasks complete.
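If you want to convince yourself that the pool never runs more than max_workers tasks at once, a small counter can help. The following is a rough sketch, not the book's code: the stats dictionary, the lock, and the artificial sleep are all assumptions added for the demonstration, and it uses executor.map instead of submit to keep things short.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from time import sleep

lock = threading.Lock()
stats = {'active': 0, 'peak': 0}

def run(name):
    # Record how many tasks are running at the same time.
    with lock:
        stats['active'] += 1
        stats['peak'] = max(stats['peak'], stats['active'])
    sleep(0.05)  # simulate some work
    with lock:
        stats['active'] -= 1
    return name

with ThreadPoolExecutor(max_workers=3) as executor:
    # map returns the results in the order of the inputs.
    names = list(executor.map(run, [f'T{k}' for k in range(5)]))

print(names)
print('Peak concurrency:', stats['peak'])
```

The peak value can never exceed the pool size, no matter how many tasks are submitted.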

Let's now see the same example, but with the multiprocess design:

# pool_proc.py
from concurrent.futures import ProcessPoolExecutor, as_completed
from random import randint
from time import sleep

def run(name):
    sleep(.05)
    value = randint(0, 10**2)
    print(f'Hi, I am {name} and my value is {value}')
    return (name, value)

with ProcessPoolExecutor(max_workers=3) as executor:
    futures = [
        executor.submit(run, f'P{name}') for name in range(5)
    ]
    for future in as_completed(futures):
        name, value = future.result()
        print(f'Process {name} returned {value}')

The difference is truly minimal. We use ProcessPoolExecutor this time, and the run function is exactly the same, with one small addition: we sleep for 50 milliseconds at the beginning of each run. This is to exaggerate the behavior and have the output clearly show the size of the pool, which is still three. If we run the example, we get:

$ python pool_proc.py
Hi, I am P0 and my value is 19
Hi, I am P1 and my value is 97
Hi, I am P2 and my value is 74
Process P0 returned 19
Process P1 returned 97
Process P2 returned 74
Hi, I am P3 and my value is 80
Hi, I am P4 and my value is 68
Process P3 returned 80
Process P4 returned 68

This output clearly shows the pool size being three. It is very interesting to notice that if we remove that call to sleep, most of the time the output will have five prints of Hi, I am ..., followed by five prints of Process Px returned .... How can we explain that? Well, it's simple. By the time the first three processes are done, and returned by as_completed, all three are asked for their result, and whatever is returned is printed. While this happens, the executor can already start recycling two processes to run the final two tasks, and they happen to print their Hi, I am ... messages before the prints in the for loop are allowed to take place.

This basically means ProcessPoolExecutor is quite fast and aggressive (in terms of getting the scheduler's attention), and it's worth noting that this behavior doesn't happen with the thread counterpart, in which, if you recall, we didn't need to use any artificial sleeping.

The important thing to keep in mind though, is being able to appreciate that even simple examples such as these can already be slightly tricky to understand or explain. Let this be a lesson to you, so that you raise your attention to 110% when you code for multithreaded or multiprocess designs.

Let's now move on to a more interesting example.

Using a process to add a timeout to a function

Most, if not all, libraries that expose functions to make HTTP requests provide the ability to specify a timeout when performing the request. This means that if after X seconds (X being the timeout) the request hasn't completed, the whole operation is aborted and execution resumes from the next instruction. Not all functions expose this feature though, so, when a function doesn't provide the ability to be interrupted, we can use a process to simulate that behavior. In this example, we'll be trying to translate a hostname into an IPv4 address. The gethostbyname function, from the socket module, doesn't allow us to put a timeout on the operation though, so we use a process to do that artificially. The code that follows might not be so straightforward, so I encourage you to spend some time going through it before you read on for the explanation:

# hostres/util.py
import socket
from multiprocessing import Process, Queue

def resolve(hostname, timeout=5):
    exitcode, ip = resolve_host(hostname, timeout)
    if exitcode == 0:
        return ip
    else:
        return hostname

def resolve_host(hostname, timeout):
    queue = Queue()
    proc = Process(target=gethostbyname, args=(hostname, queue))
    proc.start()
    proc.join(timeout=timeout)

    if queue.empty():
        proc.terminate()
        ip = None
    else:
        ip = queue.get()
    return proc.exitcode, ip

def gethostbyname(hostname, queue):
    ip = socket.gethostbyname(hostname)
    queue.put(ip)

Let's start from resolve. It simply takes a hostname and a timeout, and calls resolve_host with them. If the exit code is 0 (which means the process terminated correctly), it returns the IPv4 that corresponds to that host. Otherwise, it returns the hostname itself, as a fallback mechanism.

Next, let's talk about gethostbyname. It takes a hostname and a queue, and calls socket.gethostbyname to resolve the hostname. When the result is available, it is put into the queue. Now, this is where the issue lies. If the call to socket.gethostbyname takes longer than the timeout we want to assign, we need to kill it.

The resolve_host function does exactly this. It receives the hostname and the timeout, and, at first, it simply creates a queue. Then it spawns a new process that takes gethostbyname as the target, and passes the appropriate arguments. Then the process is started and joined on, but with a timeout.

Now, the successful scenario is this: the call to socket.gethostbyname succeeds quickly, the IP is in the queue, the process terminates well before its timeout time, and when we get to the if part, the queue will not be empty. We fetch the IP from it, and return it, alongside the process exit code.

In the unsuccessful scenario, the call to socket.gethostbyname takes too long, and the process is killed after its timeout has expired. Because the call failed, no IP has been inserted in the queue, and therefore it will be empty. In the if logic, we therefore set the IP to None, and return as before. The resolve function will find that the exit code is not 0 (as the process didn't terminate happily, but was killed instead), and will correctly return the hostname instead of the IP, which we couldn't get anyway.

In the source code of the book, in the hostres folder of this chapter, I have added some tests to make sure this behavior is actually correct. You can find instructions on how to run them in the README.md file in the folder. Make sure you check the test code too, it should be quite interesting.
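For comparison, here is a sketch of a different technique: waiting on a future with a timeout, using a thread rather than a process. It is not what the code above does, and it has an important limitation: a thread cannot be terminated, so the slow call keeps running in the background after we stop waiting for it. The slow_lookup function is a hypothetical stand-in for a blocking call, and the address it returns is a documentation-only placeholder.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
from time import sleep

def slow_lookup(hostname):
    # Hypothetical stand-in for a blocking call such as
    # socket.gethostbyname; it just sleeps and returns a fake IP.
    sleep(0.2)
    return '192.0.2.1'

def resolve_with_thread(hostname, timeout):
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(slow_lookup, hostname)
    try:
        # Wait at most `timeout` seconds for the result.
        return future.result(timeout=timeout)
    except TimeoutError:
        # Same fallback idea as resolve: give back the hostname.
        return hostname
    finally:
        # Don't block here; the worker thread finishes on its own.
        executor.shutdown(wait=False)

print(resolve_with_thread('example.com', timeout=0.01))
print(resolve_with_thread('example.com', timeout=1))
```

The process-based approach is heavier, but it is the one that can actually stop the work when the timeout expires.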

Case examples

In this final part of the chapter, I am going to show you three case examples in which we'll see how to do the same thing by employing different approaches (single-thread, multithread, and multiprocess). Finally, I'll dedicate a few words to asyncio, a module that introduces yet another way of doing asynchronous programming in Python.

Example one – concurrent mergesort

The first example will revolve around the mergesort algorithm. This sorting algorithm is based on the divide et impera (divide and conquer) design paradigm. The way it works is very simple. You have a list of numbers you want to sort. The first step is to divide the list into two parts, sort them, and merge the results back into one sorted list. Let me give you a simple example with six numbers. Imagine we have a list, v=[8, 5, 3, 9, 0, 2]. The first step would be to divide the list, v, into two sublists of three numbers: v1=[8, 5, 3] and v2=[9, 0, 2]. Then we sort v1 and v2 by recursively calling mergesort on them. The result would be v1=[3, 5, 8] and v2=[0, 2, 9]. In order to combine v1 and v2 back into a sorted v, we simply consider the first item in both lists, and pick the minimum of those. The first iteration would compare 3 and 0. We pick 0, leaving v2=[2, 9]. Then we rinse and repeat: we compare 3 and 2, we pick 2, so now v2=[9]. Then we compare 3 and 9. This time we pick 3, leaving v1=[5, 8], and so on and so forth. Next we would pick 5 (5 versus 9), then 8 (8 versus 9), and finally 9. This would give us a new, sorted version of v: v=[0, 2, 3, 5, 8, 9].

The reason why I chose this algorithm as an example is twofold. First, it is easy to parallelize. You split the list in two, have two processes work on them, and then collect the results. Second, it is possible to amend the algorithm so that it splits the initial list into any N ≥ 2 parts, and assigns those parts to N processes. Recombination is as simple as dealing with just two parts. This characteristic makes it a good candidate for a concurrent implementation.

Single-thread mergesort

Let's see how all this translates into code, starting by learning how to code our own homemade mergesort:

# ms/algo/mergesort.py
def sort(v):
    if len(v) <= 1:
        return v
    mid = len(v) // 2
    v1, v2 = sort(v[:mid]), sort(v[mid:])
    return merge(v1, v2)

def merge(v1, v2):
    v = []
    h = k = 0
    len_v1, len_v2 = len(v1), len(v2)
    while h < len_v1 or k < len_v2:
        if k == len_v2 or (h < len_v1 and v1[h] < v2[k]):
            v.append(v1[h])
            h += 1
        else:
            v.append(v2[k])
            k += 1
    return v

Let's start from the sort function. First we encounter the base of the recursion, which says that if the list has 0 or 1 elements, we don't need to sort it, we can simply return it as it is. If that is not the case, then we calculate the midpoint (mid), and recursively call sort on v[:mid] and v[mid:]. I hope you are by now very familiar with the slicing syntax, but just in case you need a refresher, the first one is all elements in v up to the mid index (excluded), and the second one is all elements from mid to the end. The results of sorting them are assigned respectively to v1 and v2. Finally, we call merge, passing v1 and v2.

The logic of merge uses two pointers, h and k, to keep track of which elements in v1 and v2 we have already compared. If we find that the minimum is in v1, we append it to v, and increase h. On the other hand, if the minimum is in v2, we append it to v but increase k this time. The procedure is running in a while loop whose condition, combined with the inner if, makes sure we don't get errors due to indexes out of bounds. It's a pretty standard algorithm that you can find in many different variations on the web.
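As a quick check, we can run merge on the two sorted halves from the worked example given earlier, v1=[3, 5, 8] and v2=[0, 2, 9]. The function is repeated here only to keep the snippet self-contained.

```python
def merge(v1, v2):
    v = []
    h = k = 0
    len_v1, len_v2 = len(v1), len(v2)
    while h < len_v1 or k < len_v2:
        # Take from v1 when v2 is exhausted, or when v1 has the
        # smaller head; otherwise take from v2.
        if k == len_v2 or (h < len_v1 and v1[h] < v2[k]):
            v.append(v1[h])
            h += 1
        else:
            v.append(v2[k])
            k += 1
    return v

print(merge([3, 5, 8], [0, 2, 9]))  # [0, 2, 3, 5, 8, 9]
print(merge([], [1, 2]))            # [1, 2]
```

The picks happen in the order described in the prose: 0, 2, 3, 5, 8, and finally 9.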

In order to make sure this code is solid, I have written a test suite that resides in the ch10/ms folder. I encourage you to check it out.

Now that we have the building blocks, let's see how we modify this to make it so that it works with an arbitrary number of parts.

Single-thread multipart mergesort

The code for the multipart version of the algorithm is quite simple. We can reuse the merge function, but we'll have to rewrite the sort one:

# ms/algo/multi_mergesort.py
from functools import reduce
from .mergesort import merge

def sort(v, parts=2):
    assert parts > 1, 'Parts need to be at least 2.'
    if len(v) <= 1:
        return v

    chunk_len = max(1, len(v) // parts)
    chunks = (
        sort(v[k: k + chunk_len], parts=parts)
        for k in range(0, len(v), chunk_len)
    )
    return multi_merge(*chunks)

def multi_merge(*v):
    return reduce(merge, v)

We saw reduce in Chapter 4, Functions, the Building Blocks of Code, when we coded our own factorial function. The way it works within multi_merge is to merge the first two lists in v. Then the result is merged with the third one, after which the result is merged with the fourth one, and so on.
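The folding behavior of reduce can be made visible with itertools.accumulate, which yields every intermediate result. In this sketch, merge is a stand-in built on heapq.merge from the standard library, used only to keep the snippet short; it is not the book's merge function, although it produces the same result for sorted inputs.

```python
from functools import reduce
from heapq import merge as heap_merge
from itertools import accumulate

def merge(v1, v2):
    # Stand-in for the book's merge: combine two sorted lists.
    return list(heap_merge(v1, v2))

chunks = [[1, 4], [2, 5], [0, 3]]

# Each step merges the running result with the next sorted chunk.
steps = list(accumulate(chunks, merge))
print(steps)  # [[1, 4], [1, 2, 4, 5], [0, 1, 2, 3, 4, 5]]

# reduce only returns the last of those steps.
print(reduce(merge, chunks))  # [0, 1, 2, 3, 4, 5]
```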

Take a look at the new version of sort. It takes the v list, and the number of parts we want to split it into. The first thing we do is check that we passed a correct number for parts, which needs to be at least two. Then, like before, we have the base of the recursion. And finally we get into the main logic of the function, which is simply a multipart version of the one we saw in the previous example. We calculate the length of each chunk using the max function, just in case there are fewer elements in the list than parts. And then we write a generator expression that calls sort recursively on each chunk. Finally, we merge all the results by calling multi_merge.

I am aware that in explaining this code, I haven't been as exhaustive as I usually am, and I'm afraid it is on purpose. The example that comes after the mergesort will be much more complex, so I would like to encourage you to really try to understand the previous two snippets as thoroughly as you can.

Now, let's take this example to the next step: multithreading.

Multithreaded mergesort

In this example, we amend the sort function once again, so that, after the initial division into chunks, it spawns a thread per part. Each thread uses the single-threaded version of the algorithm to sort its part, and then at the end we use the multi-merge technique to calculate the final result. Translating into Python:

# ms/algo/mergesort_thread.py
from functools import reduce
from math import ceil
from concurrent.futures import ThreadPoolExecutor, as_completed
from .mergesort import sort as _sort, merge

def sort(v, workers=2):
    if len(v) == 0:
        return v
    dim = ceil(len(v) / workers)
    chunks = (v[k: k + dim] for k in range(0, len(v), dim))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [
            executor.submit(_sort, chunk) for chunk in chunks
        ]
        return reduce(
            merge,
            (future.result() for future in as_completed(futures))
        )

We import all the required tools, including executors, the ceiling function, and sort and merge from the single-threaded version of the algorithm. Notice how I changed the name of the single-threaded sort into _sort upon importing it.

In this version of sort, we check whether v is empty first, and if not we proceed. We calculate the dimension of each chunk using the ceil function. It's basically doing what we were doing with max in the previous snippet, but I wanted to show you another way to solve the issue.
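The two ways of sizing chunks are close, but not identical, and a tiny experiment shows the difference. The helper names below are made up for this sketch; each one isolates just the chunking lines from the corresponding sort function.

```python
from math import ceil

def chunks_with_max(v, parts):
    # Chunk size as in the multipart version: floor division,
    # with a minimum of 1.
    chunk_len = max(1, len(v) // parts)
    return [v[k: k + chunk_len] for k in range(0, len(v), chunk_len)]

def chunks_with_ceil(v, workers):
    # Chunk size as in the multithreaded version: ceiling division.
    dim = ceil(len(v) / workers)
    return [v[k: k + dim] for k in range(0, len(v), dim)]

v = list(range(7))
print(chunks_with_max(v, 3))   # [[0, 1], [2, 3], [4, 5], [6]]
print(chunks_with_ceil(v, 3))  # [[0, 1, 2], [3, 4, 5], [6]]
```

With floor division, a list whose length is not a multiple of parts can produce one chunk more than requested, while the ceiling approach never produces more chunks than workers. Both handle a list shorter than the number of parts without errors.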

When we have the dimension, we calculate the chunks and prepare a nice generator expression to serve them to the executor. The rest is straightforward: we define a list of future objects, each of which is the result of calling submit on the executor. Each future object runs the single-threaded _sort algorithm on the chunk it has been assigned to.

Finally, as they are returned by the as_completed function, the results are merged using the same technique we saw in the earlier multipart example.

Multiprocess mergesort

To perform the final step, we need to amend only two lines in the previous code. If you have paid attention in the introductory examples, you will know which of the two lines I am referring to. In order to save some space, I'll just give you the diff of the code:

# ms/algo/mergesort_proc.py
...
from concurrent.futures import ProcessPoolExecutor, as_completed
...

def sort(v, workers=2):
    ...
    with ProcessPoolExecutor(max_workers=workers) as executor:
    ...

That's it! Basically all you have to do is use ProcessPoolExecutor instead of ThreadPoolExecutor, and instead of spawning threads, you are spawning processes.

Do you recall when I was saying that processes can actually run on different cores, while threads run within the same process so they are not actually running in parallel? This is a good example to show you a consequence of choosing one approach or the other. Because the code is CPU-intensive, and there is no IO going on, splitting the list and having threads working the chunks doesn't add any advantage. On the other hand, using processes does. I have run some performance tests (run the ch10/ms/performance.py module by yourself and you will see how your machine performs) and the results confirm my expectations:

$ python performance.py

Testing Sort
Size: 100000
Elapsed time: 0.492s
Size: 500000
Elapsed time: 2.739s

Testing Sort Thread
Size: 100000
Elapsed time: 0.482s
Size: 500000
Elapsed time: 2.818s

Testing Sort Proc
Size: 100000
Elapsed time: 0.313s
Size: 500000
Elapsed time: 1.586s

The two tests are run on two lists of 100,000 and 500,000 items, respectively. And I am using four workers for the multithreaded and multiprocessing versions. Using different sizes is quite useful when looking for patterns. As you can see, the time elapsed is basically the same for the first two versions (single-threaded and multithreaded), but it drops to a bit more than half for the multiprocessing version (roughly 58% to 64% of the single-threaded time in this run). It doesn't quite get down to 50% because having to spawn processes, and handle them, comes at a price. But still, you can definitely appreciate that I have a processor with two cores on my machine.

This also tells you that even though I used four workers in the multiprocessing version, I can still only parallelize proportionately to the number of cores my processor has. Therefore, going beyond two workers makes very little difference.

Now that you are all warmed up, let's move on to the next example.

Example two – batch sudoku-solver

In this example, we are going to explore a sudoku-solver. We are not going to go into much detail with it, as the point is not that of understanding how to solve sudoku, but rather to show you how to use multi-processing to solve a batch of sudoku puzzles.

What is interesting in this example is that instead of making the comparison between single and multithreaded versions again, we're going to skip that and compare the single-threaded version with two different multiprocess versions. One will assign one puzzle per worker, so if we solve 1,000 puzzles, we'll use 1,000 workers (well, we will use a pool of N workers, each of which is constantly recycled). The other version will instead divide the initial batch of puzzles by the pool size, and batch-solve each chunk within one process. This means, assuming a pool size of four, dividing those 1,000 puzzles into chunks of 250 puzzles each, and giving each chunk to one worker, for a total of four of them.

The code I will present to you for the sudoku-solver (without the multiprocessing part) comes from a solution designed by Peter Norvig, which has been distributed under the MIT license. His solution is so efficient that, after trying to re-implement my own for a few days, and getting to the same result, I simply gave up and decided to go with his design. I did do a lot of refactoring though, because I wasn't happy with his choice of function and variable names, so I made those more book friendly, so to speak. You can find the original code, a link to the original page from which I got it, and the original MIT license, in the ch10/sudoku/norvig folder. If you follow the link, you'll find a very thorough explanation of the sudoku-solver by Norvig himself.

What is Sudoku?

First things first. What is a sudoku puzzle? Sudoku is a number-placement puzzle based on logic that originated in Japan. The objective is to fill a 9x9 grid with digits so that each row, column, and box (3x3 subgrids that compose the grid) contains all of the digits from 1 to 9. You start from a partially populated grid, and add number after number using logic considerations.

Sudoku can be interpreted, from a computer science perspective, as a problem that fits in the exact cover category. Donald Knuth, the author of The Art of Computer Programming (and many other wonderful books), has devised an algorithm, called Algorithm X, to solve problems in this category. A beautiful and efficient implementation of Algorithm X, called Dancing Links, which harnesses the power of circular doubly-linked lists, can be used to solve sudoku. The beauty of this approach is that all it requires is a mapping between the structure of the sudoku and the Dancing Links algorithm, and without having to do any of the logic deductions normally needed to solve the puzzle, it gets to the solution at the speed of light.

Many years ago, when my free time was a number greater than zero, I wrote a Dancing Links sudoku-solver in C#, which I still have archived somewhere, which was great fun to design and code. I definitely encourage you to check out the literature and code your own solver, it's a great exercise, if you can spare the time.

In this example's solution though, we're going to use a search algorithm used in conjunction with a process that, in artificial intelligence, is known as constraint propagation. The two are quite commonly used together to make a problem simpler to solve. We'll see that in our example, they are enough for us to be able to solve a difficult sudoku in a matter of milliseconds.

Implementing a sudoku-solver in Python

Let's now explore my refactored implementation of the solver. I'm going to present the code to you in steps, as it is quite involved (also, I won't repeat the source name at the top of each snippet, until I move to another module):

# sudoku/algo/solver.py
import os
from itertools import zip_longest, chain
from time import time

def cross_product(v1, v2):
    return [w1 + w2 for w1 in v1 for w2 in v2]

def chunk(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

We start with some imports, and then we define a couple of useful functions: cross_product and chunk. They do exactly what the names hint at. The first one returns the cross-product between two iterables, while the second one returns an iterator of chunks (tuples) from iterable, each of which has n elements, and the last of which might be padded with a given fillvalue, should the length of iterable not be a multiple of n. Then we proceed to define a few structures, which will be used by the solver:

digits = '123456789'
rows = 'ABCDEFGHI'
cols = digits
squares = cross_product(rows, cols)
all_units = (
    [cross_product(rows, c) for c in cols]
    + [cross_product(r, cols) for r in rows]
    + [cross_product(rs, cs)
        for rs in chunk(rows, 3) for cs in chunk(cols, 3)]
)
units = dict(
    (square, [unit for unit in all_units if square in unit])
    for square in squares
)
peers = dict(
    (square, set(chain(*units[square])) - set([square]))
    for square in squares
)

Without going too much into detail, let's hover over these objects. squares is a list of all squares in the grid. Squares are represented by a string such as A3 or C7. Rows are labeled with letters, and columns with numbers, so A3 will indicate the square in the first row, and third column.
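To get a feel for the two helpers and for how squares is built, we can call them on small inputs. The functions are copied from the module so that the snippet runs on its own; the sample arguments are arbitrary.

```python
from itertools import zip_longest

def cross_product(v1, v2):
    return [w1 + w2 for w1 in v1 for w2 in v2]

def chunk(iterable, n, fillvalue=None):
    # The same iterator repeated n times, so zip_longest pulls
    # n consecutive items for each tuple it yields.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

print(cross_product('AB', '12'))
# ['A1', 'A2', 'B1', 'B2']

print(list(chunk('ABCDEFG', 3, 'x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]

squares = cross_product('ABCDEFGHI', '123456789')
print(len(squares), squares[0], squares[2], squares[-1])
# 81 A1 A3 I9
```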

all_units is a list of all possible rows, columns, and blocks. Each of those elements is represented as a list of the squares that belong to the row/column/block. units is a more complex structure. It is a dictionary with 81 keys. Each key represents a square, and the corresponding value is a list with three elements in it: a row, a column, and a block. Of course, those are the row, column, and block that the square belongs to.

Finally, peers is a dictionary very similar to units, but the value of each key (which still represents a square), is a set containing all peers for that square. Peers are defined as all the squares belonging to the row, column, and block the square in the key belongs to. These structures will be used in the calculation of the solution, when attempting to solve a puzzle.
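A few sanity checks make the shape of these structures concrete. The definitions below are the same as in the module, repeated so the snippet is self-contained; the checks themselves are my own addition and only count elements.

```python
from itertools import zip_longest, chain

def cross_product(v1, v2):
    return [w1 + w2 for w1 in v1 for w2 in v2]

def chunk(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

digits = '123456789'
rows = 'ABCDEFGHI'
cols = digits
squares = cross_product(rows, cols)
all_units = (
    [cross_product(rows, c) for c in cols]
    + [cross_product(r, cols) for r in rows]
    + [cross_product(rs, cs)
        for rs in chunk(rows, 3) for cs in chunk(cols, 3)]
)
units = dict(
    (square, [unit for unit in all_units if square in unit])
    for square in squares
)
peers = dict(
    (square, set(chain(*units[square])) - set([square]))
    for square in squares
)

print(len(squares))                # 81 squares
print(len(all_units))              # 9 columns + 9 rows + 9 blocks = 27
print(len(units['A3']))            # 3 units per square
print(len(peers['A3']))            # 8 + 8 + 4 = 20 peers
print('A3' in peers['A3'])         # False: a square is not its own peer
```

The 20 peers come from the 8 other squares in the row, the 8 other squares in the column, and the 4 squares of the block that are not already in that row or column.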

Before we take a look at the function that parses the input lines, let me give you an example of what an input puzzle looks like:

1..3.......75...3..3.4.8.2...47....9.........689....4..5..178.4.....2.75.......1.

The first nine characters represent the first row, then another nine for the second row, and so on. Empty squares are represented by dots:

def parse_puzzle(puzzle):
    assert set(puzzle) <= set('.0123456789')
    assert len(puzzle) == 81

    grid = dict((square, digits) for square in squares)
    for square, digit in zip(squares, puzzle):
        if digit in digits and not place(grid, square, digit):
            return False  # Incongruent puzzle
    return grid

def solve(puzzle):
    grid = parse_puzzle(puzzle)
    return search(grid)

This simple parse_puzzle function is used to parse an input puzzle. We do a little bit of sanity checking at the beginning, asserting that the set of characters in the input puzzle is a subset of the set of all digits plus a dot. Then we make sure we have 81 input characters, and finally we define grid, which initially is simply a dictionary with 81 keys, each of which is a square, all with the same value, which is a string of all possible digits. This is because a square in a completely empty grid has the potential to become any number from 1 to 9.

The for loop is definitely the most interesting part. We parse each of the 81 characters in the input puzzle, coupling them with the corresponding square in the grid, and we try to "place" them. I put that in double quotes because, as we'll see in a moment, the place function does much more than simply setting a given number in a given square. If we find that we cannot place a digit from the input puzzle, it means the input is invalid, and we return False. Otherwise, we're good to go and we return the grid.

parse_puzzle is used in the solve function, which simply parses the input puzzle, and unleashes search on it. What follows is therefore the heart of the algorithm:

def search(grid):
    if not grid:
        return False
    if all(len(grid[square]) == 1 for square in squares):
        return grid  # Solved
    values, square = min(
        (len(grid[square]), square) for square in squares
        if len(grid[square]) > 1
    )
    for digit in grid[square]:
        result = search(place(grid.copy(), square, digit))
        if result:
            return result

This simple function first checks whether the grid is actually non-empty. Then it tries to see whether the grid is solved. A solved grid will have one value per square. If that is not the case, it loops through each square and finds the square with the fewest candidates. If a square has a string value of only one digit, it means a number has been placed in that square. But if the value is more than one digit, then those are possible candidates, so we need to find the square with the fewest candidates, and try them. Trying a square whose candidates are 23 is much better than trying one whose candidates are 23589. In the first case, we have a 50% chance of getting the right value, while in the second one, we only have 20%. Choosing the square with the fewest candidates therefore maximizes the chances for us to place good numbers in the grid.
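The selection step relies on the fact that tuples compare element by element, so min over (length, square) pairs picks the shortest candidate string first. Here is a minimal sketch on a made-up four-square grid (a real grid has 81 squares):

```python
# Toy grid: A1 is already placed, the others still have candidates.
grid = {'A1': '7', 'A2': '23589', 'A3': '23', 'A4': '469'}

count, square = min(
    (len(candidates), sq)
    for sq, candidates in grid.items()
    if len(candidates) > 1  # skip squares that are already placed
)
print(count, square)  # 2 A3

# Chance of guessing right if exactly one candidate is correct.
print(1 / len(grid['A3']), 1 / len(grid['A2']))  # 0.5 0.2
```

When two squares have the same number of candidates, the tie is broken by the square name, since that is the second element of the tuple.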

Once the candidates have been found, we try them in order and if any of them results in being successful, we have solved the grid and we return. You might have noticed the use of the place function in the search too. So let's explore its code:

def place(grid, square, digit):
    """Eliminate all the other values (except digit) from
    grid[square] and propagate.
    Return grid, or False if a contradiction is detected.
    """
    other_vals = grid[square].replace(digit, '')
    if all(eliminate(grid, square, val) for val in other_vals):
        return grid
    return False

This function takes a work-in-progress grid, and tries to place a given digit in a given square. As I mentioned before, "placing" is not that straightforward. In fact, when we place a number, we have to propagate the consequences of that action throughout the grid. We do that by calling the eliminate function, which applies two strategies of the sudoku game:

If a square has only one possible value, eliminate that value from the square's peers
If a unit has only one place for a value, place the value there

Let me briefly offer an example of both points. For the first one, if you place, say, number 7 in a square, then you can eliminate 7 from the list of candidates for all the squares that belong to the row, column, and block that square belongs to.

For the second point, say you're examining the fourth row and, of all the squares that belong to it, only one of them has number 7 in its candidates. This means that number 7 can only go in that precise square, so you should go ahead and place it there.

The following function, eliminate, applies these two rules. Its code is quite involved, so instead of going line by line and offering an excruciating explanation, I have added some comments, and will leave you with the task of understanding it:

def eliminate(grid, square, digit):
    """Eliminate digit from grid[square]. Propagate when candidates
    are <= 2.
    Return grid, or False if a contradiction is detected.
    """
    if digit not in grid[square]:
        return grid  # already eliminated
    grid[square] = grid[square].replace(digit, '')

    ## (1) If a square is reduced to one value, eliminate value
    ## from peers.
    if len(grid[square]) == 0:
        return False  # nothing left to place here, wrong solution
    elif len(grid[square]) == 1:
        value = grid[square]
        if not all(
            eliminate(grid, peer, value) for peer in peers[square]
        ):
            return False

    ## (2) If a unit is reduced to only one place for a value,
    ## then put it there.
    for unit in units[square]:
        places = [sqr for sqr in unit if digit in grid[sqr]]
        if len(places) == 0:
            return False  # No place for this value
        elif len(places) == 1:
            # digit can only be in one place in unit,
            # assign it there
            if not place(grid, places[0], digit):
                return False
    return grid

The rest of the functions in the module aren't important for the rest of this example, so I will skip them. You can run this module by itself; it will first perform a series of checks on its data structures, and then it will solve all the sudoku puzzles I have placed in the sudoku/puzzles folder. But that is not what we're interested in, right? We want to see how to solve sudoku using multiprocessing techniques, so let's get to it.

Solving sudoku with multiprocessing

In this module, we're going to implement three functions. The first one simply solves a batch of sudoku puzzles, with no multiprocessing involved. We will use the results for benchmarking. The second and the third ones will use multiprocessing, with and without batch-solving, so we can appreciate the differences. Let's start:

# sudoku/process_solver.py
import os
from functools import reduce
from operator import concat
from math import ceil
from time import time
from contextlib import contextmanager
from concurrent.futures import ProcessPoolExecutor, as_completed
from unittest import TestCase
from algo.solver import solve

@contextmanager
def timer():
    t = time()
    yield
    tot = time() - t
    print(f'Elapsed time: {tot:.3f}s')

After a long list of imports, we define a context manager that we're going to use as a timer device. It takes a reference to the current time (t), and then it yields. After having yielded, that's when the body of the managed context is executed. Finally, on exiting the managed context, we calculate tot, which is the total amount of time elapsed, and print it. It's a simple and elegant context manager written with the decoration technique, and it's super fun. Let's now see the three functions I mentioned earlier:

def batch_solve(puzzles):
    # Single thread batch solve.
    return [solve(puzzle) for puzzle in puzzles]

This one is a single-threaded simple batch solver, which will give us a time to compare against. It simply returns a list of all solved grids. Boring. Now, check out the following code:

def parallel_single_solver(puzzles, workers=4):
    # Parallel solve - 1 process per each puzzle
    with ProcessPoolExecutor(max_workers=workers) as executor:
        futures = (
            executor.submit(solve, puzzle) for puzzle in puzzles
        )
        return [
            future.result() for future in as_completed(futures)
        ]

This one is much better. It uses ProcessPoolExecutor to use a pool of workers, each of which is used to solve roughly one-fourth of the puzzles. This is because we are spawning one future object per puzzle. The logic is extremely similar to any multiprocessing example we have already seen in the chapter. Let's see the third function:

def parallel_batch_solver(puzzles, workers=4):
    # Parallel batch solve - Puzzles are chunked into `workers`
    # chunks. A process for each chunk.
    assert len(puzzles) >= workers
    dim = ceil(len(puzzles) / workers)
    chunks = (
        puzzles[k: k + dim] for k in range(0, len(puzzles), dim)
    )
    with ProcessPoolExecutor(max_workers=workers) as executor:
        futures = (
            executor.submit(batch_solve, chunk) for chunk in chunks
        )
        results = (
            future.result() for future in as_completed(futures)
        )
        return reduce(concat, results)

This last function is slightly different. Instead of spawning one future object per puzzle, it splits the total list of puzzles into workers chunks, and then creates one future object per chunk. This means that if workers is eight, we're going to spawn eight future objects. Notice that instead of passing solve to executor.submit, we're passing batch_solve, which does the trick. The reason why I coded the last two functions so differently is because I was curious to see the severity of the impact of the overhead we incur when we recycle processes from a pool a non-negligible number of times.

Now that we have the functions defined, let's use them:

puzzles_file = os.path.join('puzzles', 'sudoku-topn234.txt')
with open(puzzles_file) as stream:
    puzzles = [puzzle.strip() for puzzle in stream]

# single thread solve
with timer():
    res_batch = batch_solve(puzzles)

# parallel solve, 1 process per puzzle
with timer():
    res_parallel_single = parallel_single_solver(puzzles)

# parallel batch solve, 1 batch per process
with timer():
    res_parallel_batch = parallel_batch_solver(puzzles)

# Quick way to verify that the results are the same, but
# possibly in a different order, as they depend on how the
# processes have been scheduled.
assert_items_equal = TestCase().assertCountEqual
assert_items_equal(res_batch, res_parallel_single)
assert_items_equal(res_batch, res_parallel_batch)
print('Done.')

Weuseasetof234veryhardsudokupuzzlesforthisbenchmarkingsession.Asyoucansee,wesimplyrunthethreefunctions,batch_solve,parallel_single_solver,andparallel_batch_solver,allwithinatimedcontext.Wecollecttheresults,and,justtomakesure,weverifythatalltherunshaveproducedthesameresults.

Ofcourse,inthesecondandthirdruns,wehaveusedmultiprocessing,sowecannotguaranteethattheorderintheresultswillbethesameasthatofthesingle-threadedbatch_solve.ThisminorissueisbrilliantlysolvedwiththeaidofassertCountEqual,oneoftheworst-namedmethodsinthePythonstandardlibrary.WefinditintheTestCaseclass,whichwecaninstantiatejusttotakeareferencetothemethodweneed.We'renotactuallyrunningunittests,butthisisacooltrick,andIwantedtoshowittoyou.Let'sseetheoutputofrunningthismodule:

$ python process_solver.py
Elapsed time: 5.368s
Elapsed time: 2.856s
Elapsed time: 2.818s
Done.

Wow. That is quite interesting. First of all, you can once again see that my machine has a two-core processor, as the time elapsed for the multiprocessing runs is about half the time taken by the single-threaded solver. However, what is actually much more interesting is the fact that there is basically no difference in the time taken by the two multiprocessing functions. Multiple runs sometimes end in favor of one approach, and sometimes in favor of the other. Understanding why requires a deep understanding of all the components that are taking part in the game, not just the processes, and therefore is not something we can discuss here. It is fairly safe to say though, that the two approaches are comparable in terms of performance.

In the source code for the book, you can find tests in the sudoku folder, with instructions on how to run them. Take the time to check them out!
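As an aside on the assertCountEqual trick used in the benchmark, here is a small standalone sketch of how it behaves. The sample lists are made up for illustration; the method compares elements and their counts while ignoring order.

```python
from unittest import TestCase

# Borrow the method from a throwaway TestCase instance, as in the benchmark.
assert_items_equal = TestCase().assertCountEqual

# Same elements with the same counts, different order: passes silently.
assert_items_equal([1, 2, 2, 3], [3, 2, 1, 2])

# Different counts: raises AssertionError.
try:
    assert_items_equal([1, 2], [1, 2, 2])
except AssertionError:
    counts_differ = True
else:
    counts_differ = False

print(counts_differ)  # True
```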

And now, let's get to the final example.

Example three – downloading random pictures

This example has been fun to code. We are going to download random pictures from a website. I'll show you three versions: a serial one, a multiprocessing one, and finally a solution coded using asyncio. In these examples, we are going to use a website called http://lorempixel.com, which provides you with an API that you can call to get random images. If you find that the website is down or slow, you can use an excellent alternative to it: https://lorempizza.com/.

It may be something of a cliché for a book written by an Italian, but the pictures are gorgeous. You can search for another alternative on the web, if you want to have some fun. Whatever website you choose, please be sensible and try not to hammer it by making a million requests to it. The multiprocessing and asyncio versions of this code can be quite aggressive!

Let's start by exploring the single-threaded version of the code:

# aio/randompix_serial.py
import os
from secrets import token_hex
import requests

PICS_FOLDER = 'pics'
URL = 'http://lorempixel.com/640/480/'

def download(url):
    resp = requests.get(url)  # use the argument, not the global URL
    return save_image(resp.content)

def save_image(content):
    filename = '{}.jpg'.format(token_hex(4))
    path = os.path.join(PICS_FOLDER, filename)
    with open(path, 'wb') as stream:
        stream.write(content)
    return filename

def batch_download(url, n):
    return [download(url) for _ in range(n)]

if __name__ == '__main__':
    saved = batch_download(URL, 10)
    print(saved)

This code should be straightforward to you by now. We define a download function, which makes a request to the given URL and saves the result by calling save_image, feeding it the body of the response from the website. Saving the image is very simple: we create a random filename with token_hex, just because it's fun, then we calculate the full path of the file, create it in binary mode, and write into it the content of the response. We return the filename to be able to print it on screen. Finally, batch_download simply runs the n requests we want to run and returns the filenames as a result.

You can leapfrog the if __name__ ... line for now; it will be explained in Chapter 12, GUIs and Scripts, and it's not important here. All we do is call batch_download with the URL and we tell it to download 10 images. If you have an editor, open the pics folder, and you can see it getting populated in a few seconds (also notice: the script assumes the pics folder exists).

Let's spice things up a bit. Let's introduce multiprocessing (the code is vastly similar, so I will not repeat it):

# aio/randompix_proc.py
...
from concurrent.futures import ProcessPoolExecutor, as_completed
...

def batch_download(url, n, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as executor:
        futures = (executor.submit(download, url) for _ in range(n))
        return [future.result() for future in as_completed(futures)]

...

The technique should be familiar to you by now. We simply submit jobs to the executor, and collect the results as they become available. Because this is IO-bound code, the processes work quite fast and there is heavy context-switching while the processes are waiting for the API response. If you have a view over the pics folder, you will notice that it's not getting populated in a linear fashion any more, but rather, in batches.

Let's now look at the asyncio version of this example.

Downloading random pictures with asyncio

The code is probably the most challenging of the whole chapter, so don't feel bad if it is too much for you at this moment in time. I have added this example just as a mouthwatering device, to encourage you to dig deeper into the heart of Python asynchronous programming. Another thing worth knowing is that there are probably several other ways to write this same logic, so please bear in mind that this is just one of the possible examples.

The asyncio module provides infrastructure for writing single-threaded, concurrent code using coroutines, multiplexing IO access over sockets and other resources, running network clients and servers, and other related primitives. It was added to Python in version 3.4, and some claim it will become the de facto standard for writing Python code in the future. I don't know whether that's true, but I know it is definitely worth seeing an example:

# aio/randompix_corout.py
import os
from secrets import token_hex
import asyncio
import aiohttp

First of all, we cannot use requests any more, as it is not suitable for asyncio. We have to use aiohttp, so please make sure you have installed it (it's in the requirements for the book):

PICS_FOLDER = 'pics'
URL = 'http://lorempixel.com/640/480/'

async def download_image(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.read()

The previous code does not look too friendly, but it's not so bad, once you know the concepts behind it. We define the async coroutine download_image, which takes a URL as parameter.

In case you don't know, a coroutine is a computer program component that generalizes subroutines for non-preemptive multitasking, by allowing multiple entry points for suspending and resuming execution at certain locations. A subroutine is a sequence of program instructions that performs a specific task, packaged as a unit.

Inside download_image, we create a session object using the ClientSession context manager, and then we get the response by using another context manager, this time from session.get. The fact that these managers are defined as asynchronous simply means that they are able to suspend execution in their enter and exit methods. We return the content of the response by using the await keyword, which allows suspension. Notice that creating a session for each request is not optimal, but I felt that for the purpose of this example I would keep the code as straightforward as possible, so I leave its optimization to you, as an exercise.
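To make the idea of suspension at await a little more concrete, here is a minimal sketch that uses only the standard library. The worker coroutine and its delays are made up for illustration, with asyncio.sleep standing in for a slow network call; it assumes Python 3.7 or later for asyncio.run.

```python
import asyncio

events = []


async def worker(name, delay):
    events.append(f'{name} start')
    # await suspends this coroutine; the event loop can run others meanwhile
    await asyncio.sleep(delay)
    events.append(f'{name} end')
    return name


async def main():
    # Run both coroutines concurrently on a single thread
    return await asyncio.gather(worker('a', 0.05), worker('b', 0.01))


results = asyncio.run(main())
print(results)
print(events)
```

Worker a starts first, but because it suspends at the await, worker b gets to start and finish before a resumes.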

Let's proceed with the next snippet:

async def download(url, semaphore):
    async with semaphore:
        content = await download_image(url)
    filename = save_image(content)
    return filename

def save_image(content):
    filename = '{}.jpg'.format(token_hex(4))
    path = os.path.join(PICS_FOLDER, filename)
    with open(path, 'wb') as stream:
        stream.write(content)
    return filename

Another coroutine, download, gets a URL and a semaphore. All it does is fetch the content of the image by calling download_image, save it, and return the filename. The interesting bit here is the use of that semaphore. We use it as an asynchronous context manager, so that we can suspend this coroutine as well, and allow a switch to something else, but more than how, it is important to understand why we want to use a semaphore. The reason is simple: this semaphore is kind of the equivalent of a pool of threads. We use it to allow at most N coroutines to be active at the same time. We instantiate it in the next function, and we pass 10 as the initial value. Every time a coroutine acquires the semaphore, its internal counter is decreased by 1, therefore when 10 coroutines have acquired it, the next one will sit and wait, until the semaphore is released by a coroutine that has completed. This is a nice way to try to limit how aggressively we are fetching images from the website API.
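The limiting effect of the semaphore can be observed without touching the network. The following is a sketch under stated assumptions: the job coroutine, the LIMIT of 3, and the counters are all hypothetical, and asyncio.sleep stands in for the HTTP request.

```python
import asyncio

LIMIT = 3  # assumed small limit for the demo; the book's code uses 10
active = 0
max_active = 0


async def job(semaphore):
    global active, max_active
    async with semaphore:  # at most LIMIT coroutines get past this line
        active += 1
        max_active = max(max_active, active)
        await asyncio.sleep(0.01)  # stand-in for the network request
        active -= 1


async def main():
    semaphore = asyncio.Semaphore(LIMIT)
    await asyncio.gather(*(job(semaphore) for _ in range(10)))


asyncio.run(main())
print(max_active)  # 3
```

Ten jobs are scheduled, but the peak number running inside the semaphore block never exceeds the limit.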

The save_image function is not a coroutine, and its logic has already been discussed in the previous examples. Let's now get to the part of the code where execution takes place:

def batch_download(images, url):
    loop = asyncio.get_event_loop()
    semaphore = asyncio.Semaphore(10)
    cors = [download(url, semaphore) for _ in range(images)]
    res, _ = loop.run_until_complete(asyncio.wait(cors))
    loop.close()
    return [r.result() for r in res]

if __name__ == '__main__':
    saved = batch_download(20, URL)
    print(saved)

We define the batch_download function, which takes a number, images, and the URL of where to fetch them. The first thing it does is create an event loop, which is necessary to run any asynchronous code. The event loop is the central execution device provided by asyncio. It provides multiple facilities, including:

Registering, executing, and cancelling delayed calls (timeouts)
Creating client and server transports for various kinds of communication
Launching subprocesses and the associated transports for communication with an external program
Delegating costly function calls to a pool of threads

After the event loop is created, we instantiate the semaphore, and then we proceed to create a list of coroutine objects, cors. By calling loop.run_until_complete, we make sure the event loop will run until the whole task has been completed. We feed it the result of a call to asyncio.wait, which wraps the coroutines in tasks and waits for them to complete.

When done, we close the event loop, and return a list of the results yielded by each future object (the filenames of the saved images). Notice how we capture the results of the call to loop.run_until_complete. asyncio.wait returns two sets, the completed futures and the pending ones. Since we wait for all of them to complete, the pending set is empty and we don't really care for it, so we assign _ to the second item in the tuple. This is a common Python idiom used when we want to signal that we're not interested in that object.

At the end of the module, we call batch_download and we get 20 images saved. They come in batches, and the whole process is limited by a semaphore with only 10 available spots.

And that's it! To learn more about asyncio, please refer to the documentation page (https://docs.python.org/3.7/library/asyncio.html) for the asyncio module on the standard library. This example was fun to code, and hopefully it will motivate you to study hard and understand the intricacies of this wonderful side of Python.

Summary

In this chapter, we learned about concurrency and parallelism. We saw how threads and processes help in achieving one and the other. We explored the nature of threads and the issues that they expose us to: race conditions and deadlocks.

We learned how to solve those issues by using locks and careful resource management. We also learned how to make threads communicate and share data, and we talked about the scheduler, which is that part of the operating system that decides which thread will run at any given time. We then moved to processes, and explored a bunch of their properties and characteristics.

Following the initial theoretical part, we learned how to implement threads and processes in Python. We dealt with multiple threads and processes, fixed race conditions, and learned workarounds to stop threads without leaving any resource open by mistake. We also explored IPC, and used queues to exchange messages between processes and threads. We also played with events and barriers, which are some of the tools provided by the standard library to control the flow of execution in a non-deterministic environment.

After all these introductory examples, we deep dived into three case examples, which showed how to solve the same problem using different approaches: single-thread, multithread, multiprocess, and asyncio.

We learned about merge sort and how, in general, divide and conquer algorithms are easy to parallelize.

We learned about sudoku, and explored a nice solution that uses a little bit of artificial intelligence to run an efficient algorithm, which we then ran in different serial and parallel modes.

Finally, we saw how to download random pictures from a website, using serial, multiprocess, and asyncio code. The latter was by far the hardest piece of code in the whole book, and its presence in the chapter serves as a reminder, or some sort of milestone that will encourage the reader to learn Python well, and deeply.

Now we'll move on to much simpler, and mostly project-oriented chapters, where we get a taste of different real-world applications in different contexts.

Debugging and Troubleshooting

"If debugging is the process of removing software bugs, then programming must be the process of putting them in."

– Edsger W. Dijkstra

In the life of a professional coder, debugging and troubleshooting take up a significant amount of time. Even if you work on the most beautiful code base ever written by a human, there will still be bugs in it; that is guaranteed.

We spend an awful lot of time reading other people's code and, in my opinion, a good software developer is someone who keeps their attention high, even when they're reading code that is not reported to be wrong or buggy.

Being able to debug code efficiently and quickly is a skill that every coder needs to keep improving. Some think that because they have read the manual, they're fine, but the reality is, the number of variables in the game is so great that there is no manual. There are guidelines one can follow, but there is no magic book that will teach you everything you need to know in order to become good at this.

I feel that on this particular subject, I have learned the most from my colleagues. It amazes me to observe someone very skilled attacking a problem. I enjoy seeing the steps they take, the things they verify to exclude possible causes, and the way they consider the suspects that eventually lead them to a solution.

Every colleague we work with can teach us something, or surprise us with a fantastic guess that turns out to be the right one. When that happens, don't just remain in wonderment (or worse, in envy), but seize the moment and ask them how they got to that guess and why. The answer will allow you to see whether there is something you can study in-depth later on so that, maybe next time, you'll be the one who will catch the bug.

Some bugs are very easy to spot. They come out of coarse mistakes and, once you see the effects of those mistakes, it's easy to find a solution that fixes the problem.

But there are other bugs that are much more subtle, much more slippery, and require true expertise, and a great deal of creativity and out-of-the-box thinking, to be dealt with.

The worst of all, at least for me, are the nondeterministic ones. These sometimes happen, and sometimes don't. Some happen only in environment A but not in environment B, even though A and B are supposed to be exactly the same. Those bugs are the truly evil ones, and they can drive you crazy.

And of course, bugs don't just happen in the sandbox, right? With your boss telling you, "Don't worry! Take your time to fix this. Have lunch first!" Nope. They happen on a Friday at half past five, when your brain is cooked and you just want to go home. It's in those moments when everyone is getting upset in a split second, when your boss is breathing down your neck, that you have to be able to keep calm. And I do mean it. That's the most important skill to have if you want to be able to fight bugs effectively. If you allow your mind to get stressed, say goodbye to creative thinking, to logical deduction, and to everything you need at that moment. So take a deep breath, sit properly, and focus.

In this chapter, I will try to demonstrate some useful techniques that you can employ according to the severity of the bug, and a few suggestions that will hopefully boost your weapons against bugs and issues.

Specifically, we're going to look at the following:

Debugging techniques
Profiling
Assertions
Troubleshooting guidelines

Debugging techniques

In this part, I'll present you with the most common techniques, the ones I use most often; however, please don't consider this list to be exhaustive.

Debugging with print

This is probably the easiest technique of all. It's not very effective, it cannot be used everywhere, and it requires access to both the source code and a Terminal that will run it (and therefore show the results of the print function calls).

However, in many situations, this is still a quick and useful way to debug. For example, if you are developing a Django website and what happens in a page is not what you would expect, you can fill the view with prints and keep an eye on the console while you reload the page. When you scatter calls to print in your code, you normally end up in a situation where you duplicate a lot of debugging code, either because you're printing a timestamp (like we did when we were measuring how fast list comprehensions and generators were), or because you somehow have to build a string of some sort that you want to display.

Another issue is that it's extremely easy to forget calls to print in your code.

So, for these reasons, rather than using a bare call to print, I sometimes prefer to code a custom function. Let's see how.

Debugging with a custom function

Having a custom function in a snippet that you can quickly grab and paste into the code, and then use to debug, can be very useful. If you're fast, you can always code one on the fly. The important thing is to code it in a way that it won't leave stuff around when you eventually remove the calls and its definition. Therefore it's important to code it in a way that is completely self-contained. Another good reason for this requirement is that it will avoid potential name clashes with the rest of the code.

Let's see an example of such a function:

# custom.py
def debug(*msg, print_separator=True):
    print(*msg)
    if print_separator:
        print('-' * 40)

debug('Data is ...')
debug('Different', 'Strings', 'Are not a problem')
debug('After while loop', print_separator=False)

In this case, I am using a keyword-only argument to be able to print a separator, which is a line of 40 dashes.

The function is very simple. I just redirect whatever is in msg to a call to print and, if print_separator is True, I print a line separator. Running the code will show the following:

$ python custom.py
Data is ...
----------------------------------------
Different Strings Are not a problem
----------------------------------------
After while loop

As you can see, there is no separator after the last line.

This is just one easy way to somehow augment a simple call to the print function. Let's see how we can calculate a time difference between calls, using one of Python's tricky features to our advantage:

# custom_timestamp.py
from time import sleep

def debug(*msg, timestamp=[None]):
    print(*msg)
    from time import time  # local import
    if timestamp[0] is None:
        timestamp[0] = time()  #1
    else:
        now = time()
        print(
            'Time elapsed: {:.3f}s'.format(now - timestamp[0])
        )
        timestamp[0] = now  #2

debug('Entering nasty piece of code...')
sleep(.3)
debug('First step done.')
sleep(.5)
debug('Second step done.')

This is a bit trickier, but still quite simple. First, notice we import the time function from the time module from inside the debug function. This allows us to avoid having to add that import outside of the function, and maybe forget it there.

Take a look at how I defined timestamp. It's a list, of course, but what's important here is that it is a mutable object. This means that it is created once, when Python executes the function definition, and it retains its value throughout different calls. Therefore, if we put a timestamp in it after each call, we can keep track of time without having to use an external global variable. I borrowed this trick from my studies on closures, a technique that I encourage you to read about because it's very interesting.
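The behavior of a mutable default argument can be seen in isolation with a tiny sketch. The count_calls function is hypothetical and exists only to show that the default list survives between calls.

```python
def count_calls(counter=[0]):
    """The default list is created once, when the def statement runs."""
    counter[0] += 1  # mutate the shared default in place
    return counter[0]


first = count_calls()
second = count_calls()
third = count_calls()
print(first, second, third)  # 1 2 3
```

This same behavior is a well-known source of bugs when it is not intended, which is why it is usually worth a comment when used on purpose.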

Right, so, after having printed whatever message we had to print and imported time, we then inspect the content of the only item in timestamp. If it is None, we have no previous reference, therefore we set the value to the current time (#1).

On the other hand, if we have a previous reference, we can calculate a difference (which we nicely format to three decimal digits) and then we finally put the current time again in timestamp (#2). It's a nice trick, isn't it?

Running this code shows this result:

$ python custom_timestamp.py
Entering nasty piece of code...
First step done.
Time elapsed: 0.304s
Second step done.
Time elapsed: 0.505s

Whatever your situation, having a self-contained function like this can be very useful.

Inspecting the traceback

We briefly talked about the traceback in Chapter 8, Testing, Profiling, and Dealing with Exceptions, when we saw several different kinds of exceptions. The traceback gives you information about what went wrong in your application. It's helpful to read it, so let's see a small example:

# traceback_simple.py
d = {'some': 'key'}
key = 'some-other'
print(d[key])

We have a dictionary and we try to access a key that isn't in it. You should remember that this will raise a KeyError exception. Let's run the code:

$ python traceback_simple.py
Traceback (most recent call last):
  File "traceback_simple.py", line 3, in <module>
    print(d[key])
KeyError: 'some-other'

You can see that we get all the information we need: the module name, the line that caused the error (both the number and the instruction), and the error itself. With this information, you can go back to the source code and try to understand what's going on.

Let's now create a more interesting example that builds on top of this, and exercises a feature that is only available in Python 3. Imagine that we're validating a dictionary, working on mandatory fields, therefore we expect them to be there. If not, we need to raise a custom ValidatorError that we will trap further upstream in the process that runs the validator (which is not shown here, so it could be anything, really). It should be something like this:

# traceback_validator.py
class ValidatorError(Exception):
    """Raised when accessing a dict results in KeyError."""

d = {'some': 'key'}
mandatory_key = 'some-other'
try:
    print(d[mandatory_key])
except KeyError as err:
    raise ValidatorError(
        f'`{mandatory_key}` not found in d.'
    ) from err

We define a custom exception that is raised when the mandatory key isn't there. Note that its body consists of its documentation string, so we don't need to add any other statements.

Very simply, we define a dummy dict and try to access it using mandatory_key. We trap KeyError and raise ValidatorError when that happens. And we do it by using the raise ... from ... syntax, which was introduced in Python 3 by PEP 3134 (https://www.python.org/dev/peps/pep-3134/), to chain exceptions. The purpose of doing this is that we may also want to raise ValidatorError in other circumstances, not necessarily as a consequence of a mandatory key being missing. This technique allows us to run the validation in a simple try/except that only cares about ValidatorError.

Without being able to chain exceptions, we would lose information about KeyError. The code produces this result:

$ python traceback_validator.py
Traceback (most recent call last):
  File "traceback_validator.py", line 7, in <module>
    print(d[mandatory_key])
KeyError: 'some-other'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "traceback_validator.py", line 10, in <module>
    '`{}` not found in d.'.format(mandatory_key)) from err
__main__.ValidatorError: `some-other` not found in d.

This is brilliant, because we can see the traceback of the exception that led us to raise ValidatorError, as well as the traceback for the ValidatorError itself.
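The chained exception is also available programmatically, through the __cause__ attribute that raise ... from ... sets. Here is a short sketch; the validate function is a hypothetical wrapper around the same logic, added only so the chain can be caught and inspected.

```python
class ValidatorError(Exception):
    """Raised when accessing a dict results in KeyError."""


def validate(d, mandatory_key):
    # Hypothetical wrapper around the validation logic shown above.
    try:
        return d[mandatory_key]
    except KeyError as err:
        raise ValidatorError(f'`{mandatory_key}` not found in d.') from err


caught = None
try:
    validate({'some': 'key'}, 'some-other')
except ValidatorError as err:
    caught = err  # keep a reference for inspection after the block

print(caught)
print(type(caught.__cause__).__name__)  # KeyError
```

Code further upstream can therefore handle ValidatorError alone and still reach the original KeyError when it needs the details.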

I had a nice discussion with one of my reviewers about the traceback you get from the pip installer. He was having trouble setting everything up in order to review the code for Chapter 13, Data Science. His fresh Ubuntu installation was missing a few libraries that were needed by the pip packages in order to run correctly.

The reason he was blocked was that he was trying to fix the errors displayed in the traceback starting from the top one. I suggested that he start from the bottom one instead, and fix that. The reason was that, if the installer had gotten to that last line, I guess that before that, whatever error may have occurred, it was still possible to recover from it. Only after the last line, pip decided it wasn't possible to continue any further, and therefore he started fixing that one. Once the libraries required to fix that error had been installed, everything else went smoothly.

Reading a traceback can be tricky, and my friend was lacking the necessary experience to address this problem correctly. Therefore, if you end up in the same situation, don't be discouraged, and try to shake things up a bit; don't take anything for granted.

Python has a huge and wonderful community and it's very unlikely that, when you encounter a problem, you're the first one to see it, so open a browser and search. By doing so, your searching skills will also improve because you will have to trim the error down to the minimum but essential set of details that will make your search effective.

If you want to play and understand the traceback a bit better, in the standard library there is a module you can use called, surprise surprise, traceback. It provides a standard interface to extract, format, and print stack traces of Python programs, mimicking the behavior of the Python interpreter when it prints a stack trace.
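As a small taste of the traceback module, the following sketch captures the traceback of a handled exception as a string instead of letting the interpreter print it. The dictionary mirrors the earlier example; the variable name report is arbitrary.

```python
import traceback

d = {'some': 'key'}

try:
    d['some-other']
except KeyError:
    # format_exc returns the traceback text for the exception being handled
    report = traceback.format_exc()

print(report)
```

Having the text in a variable means you can log it, attach it to an error report, or search it, rather than only reading it on the console.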

Using the Python debugger

Another very effective way of debugging Python is to use the Python debugger: pdb. Instead of using it directly though, you should definitely check out the pdbpp library. pdbpp augments the standard pdb interface by providing some convenient tools, my favorite of which is the sticky mode, which allows you to see a whole function while you step through its instructions.

There are several different ways to use this debugger (whichever version, it's not important), but the most common one consists of simply setting a breakpoint and running the code. When Python reaches the breakpoint, execution is suspended and you get console access to that point so that you can inspect all the names, and so on. You can also alter data on the fly to change the flow of the program.

As a toy example, let's pretend we have a parser that is raising KeyError because a key is missing in a dictionary. The dictionary is from a JSON payload that we cannot control, and we just want, for the time being, to cheat and pass that control, since we're interested in what comes afterward. Let's see how we could intercept this moment, inspect the data, fix it, and get to the bottom of it, with pdbpp:

# pdebugger.py
# d comes from a JSON payload we don't control
d = {'first': 'v1', 'second': 'v2', 'fourth': 'v4'}
# keys also comes from a JSON payload we don't control
keys = ('first', 'second', 'third', 'fourth')

def do_something_with_value(value):
    print(value)

for key in keys:
    do_something_with_value(d[key])

print('Validation done.')

As you can see, this code will break when key gets the 'third' value, which is missing in the dictionary. Remember, we're pretending that both d and keys come dynamically from a JSON payload we don't control, so we need to inspect them in order to fix d and pass the for loop. If we run the code as it is, we get the following:

$ python pdebugger.py
v1
v2
Traceback (most recent call last):
  File "pdebugger.py", line 10, in <module>
    do_something_with_value(d[key])
KeyError: 'third'

So we see that that key is missing from the dictionary, but since every time we run this code we may get a different dictionary or keys tuple, this information doesn't really help us. Let's inject a call to pdb just before the for loop. You have two options:

import pdb
pdb.set_trace()

This is the most common way of doing it. You import pdb and call its set_trace function. Many developers have macros in their editor to add this line with a keyboard shortcut. As of Python 3.7 though, we can simplify things even further, to this:

breakpoint()

The new breakpoint built-in function calls sys.breakpointhook() under the hood, which is programmed by default to call pdb.set_trace(). However, you can reprogram sys.breakpointhook() to call whatever you want, and therefore breakpoint will point to that too, which is very convenient.

The code for this example, with the breakpoint added, is in the pdebugger_pdb.py module. If we now run this code, things get interesting (note that your output may vary a little and that all the comments in this output were added by me):

$ python pdebugger_pdb.py
(Pdb++) l
 16
 17  -> for key in keys:  # breakpoint comes in
 18         do_something_with_value(d[key])
 19
(Pdb++) keys  # inspecting the keys tuple
('first', 'second', 'third', 'fourth')
(Pdb++) d.keys()  # inspecting keys of `d`
dict_keys(['first', 'second', 'fourth'])
(Pdb++) d['third'] = 'placeholder'  # add tmp placeholder
(Pdb++) c  # continue
v1
v2
placeholder
v4
Validation done.

First, note that when you reach a breakpoint, you're served a console that tells you where you are (the Python module) and which line is the next one to be executed. You can, at this point, perform a bunch of exploratory actions, such as inspecting the code before and after the next line, printing a stack trace, and interacting with the objects. Please consult the official Python documentation (https://docs.python.org/3.7/library/pdb.html) on pdb to learn more about this. In our case, we first inspect the keys tuple. After that, we inspect the keys of d. We see that 'third' is missing, so we put it in ourselves (could this be dangerous? Think about it). Finally, now that all the keys are in, we type c, which means (c)ontinue.

pdb also gives you the ability to proceed with your code one line at a time using (n)ext, to (s)tep into a function for deeper analysis, or to handle breaks with (b)reak. For a complete list of commands, please refer to the documentation or type (h)elp in the console.

You can see, from the output of the preceding run, that we could finally get to the end of the validation.

pdb (or pdbpp) is an invaluable tool that I use every day. So, go and have fun: set a breakpoint somewhere and try to inspect it, follow the official documentation, and try the commands in your code to see their effect and learn them well.

Notice that in this example I have assumed you installed pdbpp. If that is not the case, then you might find that some commands don't work the same in pdb. One example is the letter d, which would be interpreted by pdb as the down command. In order to get around that, you would have to add a ! in front of d, to tell pdb that it is meant to be interpreted literally, and not as a command.
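Returning to the sys.breakpointhook() mechanism mentioned earlier, the following sketch shows one way to reprogram it. It assumes Python 3.7 or later; the quiet_hook function is hypothetical and simply records that breakpoint() was called instead of opening a debugger.

```python
import sys

calls = []


def quiet_hook(*args, **kwargs):
    """Hypothetical hook: record the call instead of opening a debugger."""
    calls.append((args, kwargs))


original_hook = sys.breakpointhook
sys.breakpointhook = quiet_hook
try:
    breakpoint()  # dispatches to quiet_hook, not pdb.set_trace
finally:
    sys.breakpointhook = original_hook  # restore the previous behavior

print(len(calls))  # 1
```

Restoring the original hook in a finally block keeps the change from leaking into the rest of the program.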

Inspecting log files

Another way of debugging a misbehaving application is to inspect its log files. Log files are special files in which an application writes down all sorts of things, normally related to what's going on inside of it. If an important procedure is started, I would typically expect a corresponding line in the logs. It is the same when it finishes, and possibly for what happens inside of it.

Errors need to be logged so that when a problem happens, we can inspect what went wrong by taking a look at the information in the log files.

There are many different ways to set up a logger in Python. Logging is very malleable and you can configure it. In a nutshell, there are normally four players in the game: loggers, handlers, filters, and formatters:

Loggers: Expose the interface that the application code uses directly
Handlers: Send the log records (created by loggers) to the appropriate destination
Filters: Provide a finer-grained facility for determining which log records to output
Formatters: Specify the layout of the log records in the final output

Logging is performed by calling methods on instances of the Logger class. Each line you log has a level. The levels normally used are: DEBUG, INFO, WARNING, ERROR, and CRITICAL. You can import them from the logging module. They are in order of severity and it's very important to use them properly because they will help you filter the contents of a log file based on what you're searching for. Log files usually become extremely big so it's very important to have the information in them written properly so that you can find it quickly when it matters.
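To see the four players working together, here is a minimal sketch that wires them up by hand. The logger name, the SkipNoisy filter, and the messages are all hypothetical, and an in-memory stream is used instead of a file so the result can be printed directly.

```python
import io
import logging

stream = io.StringIO()  # in-memory destination, used here instead of a file

logger = logging.getLogger('ch11.sketch')  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep records away from the root logger's handlers

handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))


class SkipNoisy(logging.Filter):
    """Hypothetical filter: drop any record whose message mentions 'noisy'."""

    def filter(self, record):
        return 'noisy' not in record.getMessage()


handler.addFilter(SkipNoisy())
logger.addHandler(handler)

logger.info('Starting...')
logger.debug('a noisy detail')  # rejected by the filter
logger.error('Something failed')

print(stream.getvalue())
```

The logger creates the records, the filter decides which ones survive, the formatter lays them out, and the handler delivers them to the stream.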

You can log to a file but you can also log to a network location, to a queue, to a console, and so on. In general, if you have an architecture that is deployed on one machine, logging to a file is acceptable, but when your architecture spans over multiple machines (such as in the case of service-oriented or microservice architectures), it's very useful to implement a centralized solution for logging so that all log messages coming from each service can be stored and investigated in a single place. It helps a lot, otherwise trying to correlate giant files from several different sources to figure out what went wrong can become truly challenging.

A service-oriented architecture (SOA) is an architectural pattern in software design in which application components provide services to other components via a communications protocol, typically over a network. The beauty of this system is that, when coded properly, each service can be written in the most appropriate language to serve its purpose. The only thing that matters is the communication with the other services, which needs to happen via a common format so that data exchange can be done. Microservice architectures are an evolution of SOAs, but follow a different set of architectural patterns.

Here, I will present you with a very simple logging example. We will log a few messages to a file:

# log.py
import logging

logging.basicConfig(
    filename='ch11.log',
    level=logging.DEBUG,  # minimum level captured in the file
    format='[%(asctime)s] %(levelname)s: %(message)s',
    datefmt='%m/%d/%Y %I:%M:%S %p')

mylist = [1, 2, 3]
logging.info('Starting to process `mylist`...')

for position in range(4):
    try:
        logging.debug(
            'Value at position %s is %s', position, mylist[position]
        )
    except IndexError:
        logging.exception('Faulty position: %s', position)

logging.info('Done parsing `mylist`.')

Let's go through it line by line. First, we import the logging module, then we set up a basic configuration. In general, a production-logging configuration is much more complicated than this, but I wanted to keep things as easy as possible. We specify a filename, the minimum logging level we want to capture in the file, and the message format. We'll log the date and time information, the level, and the message.

I will start by logging an info message that tells me we're about to process our list. Then, I will log (this time using the DEBUG level, by using the debug function) the value found at each position. I'm using debug here because I want to be able to filter out these logs in the future (by setting the minimum level to logging.INFO or more), because I might have to handle very big lists and I don't want to log all the values.

If we get IndexError (and we do, since I'm looping over range(4)), we call logging.exception(), which is the same as logging.error(), but it also logs the traceback.

At the end of the code, I log another info message saying we're done. The result is this:

# ch11.log
[05/06/2018 11:13:48 AM] INFO: Starting to process `mylist`...
[05/06/2018 11:13:48 AM] DEBUG: Value at position 0 is 1
[05/06/2018 11:13:48 AM] DEBUG: Value at position 1 is 2
[05/06/2018 11:13:48 AM] DEBUG: Value at position 2 is 3
[05/06/2018 11:13:48 AM] ERROR: Faulty position: 3
Traceback (most recent call last):
  File "log.py", line 15, in <module>
    position, mylist[position]))
IndexError: list index out of range
[05/06/2018 11:13:48 AM] INFO: Done parsing `mylist`.

This is exactly what we need to be able to debug an application that is running on a box, and not on our console. We can see what went on, the traceback of any exception raised, and so on.

The example presented here only scratches the surface of logging. For a more in-depth explanation, you can find information in the Python HOWTOs section of the official Python documentation: Logging HOWTO, and Logging Cookbook.

Logging is an art. You need to find a good balance between logging everything and logging nothing. Ideally, you should log anything that you need to make sure your application is working correctly, and possibly all errors or exceptions.

Other techniques

In this final section, I'd like to demonstrate briefly a couple of techniques that you may find useful.

Profiling

We talked about profiling in Chapter 8, Testing, Profiling, and Dealing with Exceptions, and I'm only mentioning it here because profiling can sometimes explain weird errors that are due to a component being too slow. Especially when networking is involved, having an idea of the timings and latencies your application has to go through is very important in order to understand what may be going on when problems arise, therefore I suggest you get acquainted with profiling techniques, also from a troubleshooting perspective.

Assertions

Assertions are a nice way to make your code ensure your assumptions are verified. If they are, all proceeds regularly but, if they are not, you get a nice exception that you can work with. Sometimes, instead of inspecting, it's quicker to drop a couple of assertions in the code just to exclude possibilities. Let's see an example:

#assertions.py

mylist=[1,2,3]#thisideallycomesfromsomeplace

assert4==len(mylist)#thiswillbreak

forpositioninrange(4):

print(mylist[position])

Thiscodesimulatesasituationinwhichmylistisn'tdefinedbyuslikethat,ofcourse,butwe'reassumingithasfourelements.Soweputanassertionthere,andtheresultisthis:

$pythonassertions.py

Traceback(mostrecentcalllast):

File"assertions.py",line3,in<module>

assert4==len(mylist)#thiswillbreak

AssertionError

Thistellsusexactlywheretheproblemis.
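A bare AssertionError tells you where the assumption failed, but not what the data looked like. As a small sketch (the check_length helper is a name I made up for illustration), you can attach a message to the assertion so the exception carries the offending value with it. Keep in mind that assertions are skipped when Python runs with the -O flag, so treat them as a debugging aid rather than as input validation.

```python
def check_length(items, expected):
    """Fail early, with a readable message, if the assumption is wrong."""
    assert len(items) == expected, (
        f'expected {expected} items, got {len(items)}: {items!r}')
    return items


mylist = [1, 2, 3]  # stand-in for data that comes from some place
try:
    check_length(mylist, 4)
except AssertionError as err:
    message = str(err)
    print(message)
```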

Where to find information

In the Python official documentation, there is a section dedicated to debugging and profiling, where you can read up about the bdb debugger framework, and about modules such as faulthandler, timeit, trace, tracemalloc, and of course pdb. Just head to the standard library section in the documentation and you'll find all this information very easily.

Troubleshooting guidelines

In this short section, I'd like to give you a few tips that come from my troubleshooting experience.

Using console editors

First, get comfortable using Vim or nano as an editor, and learn the basics of the console. When things break, you don't have the luxury of your editor with all the bells and whistles there. You have to connect to a box and work from there. So it's a very good idea to be comfortable browsing your production environment with console commands, and be able to edit files using console-based editors, such as vi, Vim, or nano. Don't let your usual development environment spoil you.

Where to inspect

My second suggestion concerns where to place your debugging breakpoints. It doesn't matter if you are using print, a custom function, or pdb, you still have to choose where to place the calls that provide you with the information, right?

Well, some places are better than others, and there are ways to handle the debugging progression that are better than others.

I normally avoid placing a breakpoint in an if clause because, if that clause is not exercised, I lose the chance of getting the information I wanted. Sometimes it's not easy or quick to get to the breakpoint, so think carefully before placing them.

Another important thing is where to start. Imagine that you have 100 lines of code that handle your data. Data comes in at line 1, and somehow it's wrong at line 100. You don't know where the bug is, so what do you do? You can place a breakpoint at line 1 and patiently go through all the lines, checking your data. In the worst-case scenario, 99 lines (and many cups of coffee) later, you spot the bug. So, consider using a different approach.

You start at line 50, and inspect. If the data is good, it means the bug happens later, in which case you place your next breakpoint at line 75. If the data at line 50 is already bad, you go on by placing a breakpoint at line 25. Then, you repeat. Each time, you move either backward or forward, by half the jump you did last time.

In our worst-case scenario, your debugging would go from 1, 2, 3, ..., 99, in a linear fashion, to a series of jumps such as 50, 75, 87, 93, 96, ..., 99, which is way faster. In fact, it's logarithmic. This searching technique is called binary search; it's based on a divide-and-conquer approach and it's very effective, so try to master it.
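To make the halving idea concrete, here is a rough sketch that automates it for a pipeline of steps. Everything in it is hypothetical: first_bad_step, the steps list, and the is_good check are names I invented for illustration. It also assumes that the input is good and that once the data goes bad it stays bad, which is what makes halving the range valid.

```python
def first_bad_step(steps, data, is_good):
    """Return the index of the first step whose output fails is_good."""
    def run_until(n):
        # Apply only the first n steps to the original data.
        result = data
        for step in steps[:n]:
            result = step(result)
        return result

    # Invariant: data is good after `low` steps and bad after `high` steps.
    low, high = 0, len(steps)
    while high - low > 1:
        mid = (low + high) // 2
        if is_good(run_until(mid)):
            low = mid
        else:
            high = mid
    return high - 1


# 100 harmless steps, with a faulty one planted at index 62.
steps = [lambda x: x + 1] * 100
steps[62] = lambda x: -x
culprit = first_bad_step(steps, 0, lambda x: x >= 0)
print(culprit)
```

When you do this by hand with breakpoints, you are playing the role of the while loop: each inspection halves the range in which the bug can hide.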

Using tests to debug

Do you remember Chapter 8, Testing, Profiling, and Dealing with Exceptions, about tests? Well, if we have a bug and all tests are passing, it means something is wrong or missing in our test code base. So, one approach is to modify the tests in such a way that they cater for the new edge case that has been spotted, and then work your way through the code. This approach can be very beneficial, because it makes sure that your bug will be covered by a test when it's fixed.
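As an illustration of this workflow, the sketch below uses a made-up average function that, in this imagined scenario, used to crash on an empty list. The second test is the one you would add first, watch fail, and then make pass by fixing the code. The function, the bug, and the chosen return value for empty input are all assumptions for the sake of the example.

```python
import unittest


def average(values):
    """Hypothetical function under test; the empty-list guard is the fix."""
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageTest(unittest.TestCase):

    def test_typical_input(self):
        self.assertEqual(average([1, 2, 3]), 2.0)

    def test_empty_input(self):
        # Added for the newly spotted edge case: without the guard,
        # this call raises ZeroDivisionError.
        self.assertEqual(average([]), 0.0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```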

Monitoring

Monitoring is also very important. Software applications can go completely crazy and have non-deterministic hiccups when they encounter edge-case situations such as the network being down, a queue being full, or an external component being unresponsive. In these cases, it's important to have an idea of what the big picture was when the problem occurred and be able to correlate it to something related to it in a subtle, perhaps mysterious way.

You can monitor API endpoints, processes, web page availability and load times, and basically almost everything that you can code. In general, when starting an application from scratch, it can be very useful to design it keeping in mind how you want to monitor it.

Summary

In this short chapter, we looked at different techniques and suggestions for debugging and troubleshooting our code. Debugging is an activity that is always part of a software developer's work, so it's important to be good at it.

If approached with the correct attitude, it can be fun and rewarding.

We explored techniques to inspect our code based on functions, logging, debuggers, traceback information, profiling, and assertions. We saw simple examples of most of them and we also talked about a set of guidelines that will help when it comes to facing the fire.

Just remember always to stay calm and focused, and debugging will be much easier. This, too, is a skill that needs to be learned, and it's the most important one. An agitated and stressed mind cannot work properly, logically, and creatively; therefore, if you don't strengthen it, it will be hard for you to put all of your knowledge to good use.

In the next chapter, we are going to explore GUIs and scripts, taking an interesting detour from the more common web-application scenario.

GUIs and Scripts

"A user interface is like a joke. If you have to explain it, it's not that good."

– Martin LeBlanc

In this chapter, we're going to work on a project together. We are going to write a simple scraper that finds and saves images from a web page. We'll focus on three parts:

A simple HTTP web server in Python
A script that scrapes a given URL
A GUI application that scrapes a given URL

A graphical user interface (GUI) is a type of interface that allows the user to interact with an electronic device through graphical icons, buttons, and widgets, as opposed to text-based or command-line interfaces, which require commands or text to be typed on the keyboard. In a nutshell, any browser, any office suite such as LibreOffice, and, in general, anything that pops up when you click on an icon, is a GUI application.

So, if you haven't already done so, this would be the perfect time to start a console and position yourself in a folder called ch12 in the root of your project for this book. Within that folder, we'll create two Python modules (scrape.py and guiscrape.py) and a folder (simple_server). Within simple_server, we'll write our HTML page: index.html. Images will be stored in simple_server/img.

The structure in ch12 should look like this:

$ tree -A
.
├── guiscrape.py
├── scrape.py
└── simple_server
    ├── img
    │   ├── owl-alcohol.png
    │   ├── owl-book.png
    │   ├── owl-books.png
    │   ├── owl-ebook.jpg
    │   └── owl-rose.jpeg
    ├── index.html
    └── serve.sh

If you're using either Linux or macOS, you can do what I do and put the code to start the HTTP server in a serve.sh file. On Windows, you'll probably want to use a batch file.

The HTML page we're going to scrape has the following structure:

# simple_server/index.html
<!DOCTYPE html>
<html lang="en">
  <head><title>Cool Owls!</title></head>
  <body>
    <h1>Welcome to my owl gallery</h1>
    <div>
      <img src="img/owl-alcohol.png" height="128" />
      <img src="img/owl-book.png" height="128" />
      <img src="img/owl-books.png" height="128" />
      <img src="img/owl-ebook.jpg" height="128" />
      <img src="img/owl-rose.jpeg" height="128" />
    </div>
    <p>Do you like my owls?</p>
  </body>
</html>

It's an extremely simple page, so let's just note that we have five images, three of which are PNGs and two of which are JPGs (note that even though they are both JPGs, one ends with .jpg and the other with .jpeg, which are both valid extensions for this format).

So, Python gives you a very simple HTTP server for free that you can start with the following command (in the simple_server folder):

$ python -m http.server 8000
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
127.0.0.1 - - [06/May/2018 16:54:30] "GET / HTTP/1.1" 200 -
...

The last line is the log you get when you access http://localhost:8000, where our beautiful page will be served. Alternatively, you can put that command in a file called serve.sh, and just run that with this command (make sure it's executable):

$ ./serve.sh

It will have the same effect. If you have the code for this book, your page should look something like this:

Feel free to use any other set of images, as long as you use at least one PNG and one JPG, and that in the src attribute you use relative paths, not absolute ones. I got these lovely owls from https://openclipart.org/.

First approach – scripting

Now, let's start writing the script. I'll go through the source in three steps: imports, arguments parsing, and business logic.

The imports

Here's how the script starts:

# scrape.py
import argparse
import base64
import json
import os
from bs4 import BeautifulSoup
import requests

Going through them from the top, you can see that we'll need to parse the arguments, which we'll feed to the script itself (argparse). We will need the base64 library to save the images within a JSON file (json), and we'll need to open files for writing (os). Finally, we'll need BeautifulSoup for scraping the web page easily, and requests to fetch its content. I assume you're familiar with requests as we have used it in previous chapters.

We will explore the HTTP protocol and the requests mechanism in Chapter 14, Web Development, so for now, let's just (simplistically) say that we perform an HTTP request to fetch the content of a web page. We can do it programmatically using a library, such as requests, and it's more or less the equivalent of typing a URL in your browser and pressing Enter (the browser then fetches the content of a web page and displays it to you).

Of all these imports, only the last two don't belong to the Python standard library, so make sure you have them installed:

$ pip freeze | egrep -i "soup|requests"
beautifulsoup4==4.6.0
requests==2.18.4

Of course, the version numbers might be different for you. If they're not installed, use this command to do so:

$ pip install beautifulsoup4==4.6.0 requests==2.18.4

At this point, the only thing that I reckon might confuse you is the base64/json couple, so allow me to spend a few words on that.

As we saw in the previous chapter, JSON is one of the most popular formats for data exchange between applications. It's also widely used for other purposes, for example, to save data in a file. In our script, we're going to offer the user the ability to save images as image files, or as a single JSON file. Within the JSON, we'll put a dictionary with keys as the image names and values as their content. The only issue is that JSON is a text format, so saving the binary content of an image in it is tricky, and this is where the base64 library comes to the rescue.

The base64 library is actually quite useful. For example, every time you send an email with an image attached to it, the image gets encoded with base64 before the email is sent. On the recipient side, images are automatically decoded into their original binary format so that the email client can display them.
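To see why the extra encoding step is needed, here is a short sketch using a few made-up bytes instead of a real image. The json module refuses raw bytes, while the Base64 text version goes through without complaints.

```python
import base64
import json

raw = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])  # made-up binary payload

try:
    json.dumps({'sample.png': raw})
    bytes_accepted = True
except TypeError:
    # json can only serialize text, numbers, booleans, None, lists, dicts.
    bytes_accepted = False

encoded = base64.b64encode(raw).decode('utf-8')  # bytes -> ASCII text
document = json.dumps({'sample.png': encoded})
print(bytes_accepted, document)
```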

Parsing arguments

Now that the technicalities are out of the way, let's see the second section of our script (it should be at the end of the scrape.py module):

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description='Scrape a webpage.')
    parser.add_argument(
        '-t',
        '--type',
        choices=['all', 'png', 'jpg'],
        default='all',
        help='The image type we want to scrape.')
    parser.add_argument(
        '-f',
        '--format',
        choices=['img', 'json'],
        default='img',
        help='The format images are _saved to.')
    parser.add_argument(
        'url',
        help='The URL we want to scrape for images.')
    args = parser.parse_args()
    scrape(args.url, args.format, args.type)

Look at that first line; it is a very common idiom when it comes to scripting. According to the official Python documentation, the '__main__' string is the name of the scope in which top-level code executes. A module's __name__ is set equal to '__main__' when read from standard input, a script, or from an interactive prompt.

Therefore, if you put the execution logic under that if, it will be run only when you run the script directly, as its __name__ will be '__main__'. On the other hand, should you import from this module, then its name will be set to something else, so the logic under the if won't run.

The first thing we do is define our parser. I would recommend using the standard library module, argparse, which is simple enough and quite powerful. There are other options out there, but in this case, argparse will provide us with all we need.

We want to feed our script three different pieces of data: the types of images we want to save, the format in which we want to save them, and the URL for the page to be scraped.

The types can be PNGs, JPGs, or both (default), while the format can be either image or JSON, image being the default. URL is the only mandatory argument.

So, we add the -t option, allowing also the long version, --type. The choices are 'all', 'png', and 'jpg'. We set the default to 'all' and we add a help message.

We do a similar procedure for the format argument, allowing both the short and long syntax (-f and --format), and finally we add the url argument, which is the only one that is specified differently so that it won't be treated as an option, but rather as a positional argument.

In order to parse all the arguments, all we need is parser.parse_args(). Very simple, isn't it?

The last line is where we trigger the actual logic, by calling the scrape function, passing all the arguments we just parsed. We will see its definition shortly. The nice thing about argparse is that if you call the script by passing -h, it will print a nice usage text for you automatically. Let's try it out:

$ python scrape.py -h
usage: scrape.py [-h] [-t {all,png,jpg}] [-f {img,json}] url

Scrape a webpage.

positional arguments:
  url                   The URL we want to scrape for images.

optional arguments:
  -h, --help            show this help message and exit
  -t {all,png,jpg}, --type {all,png,jpg}
                        The image type we want to scrape.
  -f {img,json}, --format {img,json}
                        The format images are _saved to.

If you think about it, the one true advantage of this is that we just need to specify the arguments and we don't have to worry about the usage text, which means we won't have to keep it in sync with the arguments' definition every time we change something. This is precious.

Here are a few different ways to call our scrape.py script, which demonstrate that type and format are optional, and how you can use the short and long syntaxes to employ them:

$ python scrape.py http://localhost:8000
$ python scrape.py -t png http://localhost:8000
$ python scrape.py --type=jpg -f json http://localhost:8000

The first one is using default values for type and format. The second one will save only PNG images, and the third one will save only JPGs, but in JSON format.
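You can also check how the arguments are interpreted without touching the command line, because parse_args accepts an explicit list of strings. The sketch below rebuilds a trimmed-down parser (help texts omitted) inside a build_parser function, which is a name I introduced for this illustration and is not part of scrape.py.

```python
import argparse


def build_parser():
    """Return a parser equivalent to the one in the script (sketch)."""
    parser = argparse.ArgumentParser(description='Scrape a webpage.')
    parser.add_argument(
        '-t', '--type', choices=['all', 'png', 'jpg'], default='all')
    parser.add_argument(
        '-f', '--format', choices=['img', 'json'], default='img')
    parser.add_argument('url')
    return parser


parser = build_parser()
# Each list mirrors what you would type after `python scrape.py`.
defaults = parser.parse_args(['http://localhost:8000'])
mixed = parser.parse_args(
    ['--type=jpg', '-f', 'json', 'http://localhost:8000'])
print(defaults.type, defaults.format)
print(mixed.type, mixed.format, mixed.url)
```

Wrapping the parser in a function like this also makes it easy to test, since nothing depends on sys.argv.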

The business logic

Now that we've seen the scaffolding, let's deep dive into the actual logic (if it looks intimidating, don't worry; we'll go through it together). Within the script, this logic lies after the imports and before the parsing (before the if __name__ clause):

def scrape(url, format_, type_):
    try:
        page = requests.get(url)
    except requests.RequestException as err:
        print(str(err))
    else:
        soup = BeautifulSoup(page.content, 'html.parser')
        images = _fetch_images(soup, url)
        images = _filter_images(images, type_)
        _save(images, format_)

Let's start with the scrape function. The first thing it does is fetch the page at the given url argument. Whatever error may happen while doing this, we catch it as a RequestException (bound to the name err) and print it. RequestException is the base exception class for all the exceptions in the requests library.

However, if things go well, and we have a page back from the GET request, then we can proceed (else branch) and feed its content to the BeautifulSoup parser. The BeautifulSoup library allows us to parse a web page in no time, without having to write all the logic that would be needed to find all the images in a page, which we really don't want to do. It's not as easy as it seems, and reinventing the wheel is never good. To fetch images, we use the _fetch_images function and we filter them with _filter_images. Finally, we call _save with the result.

Splitting the code into different functions with meaningful names allows us to read it more easily. Even if you haven't seen the logic of the _fetch_images, _filter_images, and _save functions, it's not hard to predict what they do, right? Check out the following:

def _fetch_images(soup, base_url):
    images = []
    for img in soup.findAll('img'):
        src = img.get('src')
        img_url = f'{base_url}/{src}'
        name = img_url.split('/')[-1]
        images.append(dict(name=name, url=img_url))
    return images

_fetch_images takes a BeautifulSoup object and a base URL. All it does is loop through all of the images found on the page and fill in the name and url information about them in a dictionary (one per image). All dictionaries are added to the images list, which is returned at the end.

There is some trickery going on when we get the name of an image. We split the img_url (http://localhost:8000/img/my_image_name.png) string using '/' as a separator, and we take the last item as the image name. There is a more robust way of doing this, but for this example it would be overkill. If you want to see the details of each step, try to break this logic down into smaller steps, and print the result of each of them to help yourself understand. Toward the end of the book, I'll show you another technique for debugging in a much more efficient way.

Anyway, by just adding print(images) at the end of the _fetch_images function, we get this:

[{'url': 'http://localhost:8000/img/owl-alcohol.png', 'name': 'owl-alcohol.png'},
 {'url': 'http://localhost:8000/img/owl-book.png', 'name': 'owl-book.png'}, ...]

I truncated the result for brevity. You can see each dictionary has a url and name key/value pair, which we can use to fetch, identify, and save our images as we like. At this point, I hear you asking what would happen if the images on the page were specified with an absolute path instead of a relative one, right? Good question!

The answer is that the script will fail to download them because this logic expects relative paths. I was about to add a bit of logic to solve this issue when I thought that, at this stage, it would be a nice exercise for you to do it, so I'll leave it up to you to fix it.

Hint: Inspect the start of that src variable. If it starts with 'http', it's probably an absolute path. You might also want to check out urllib.parse to do that.
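If you get stuck on the exercise, here is one possible direction, sketched with urljoin from urllib.parse. The resolve function is a name I made up; it is not part of scrape.py, and you would still need to wire it into _fetch_images yourself.

```python
from urllib.parse import urljoin


def resolve(base_url, src):
    """Return an absolute URL for src, whether relative or absolute."""
    # urljoin keeps src untouched when it is already absolute,
    # and resolves it against base_url when it is relative.
    return urljoin(base_url, src)


relative = resolve('http://localhost:8000', 'img/owl-book.png')
absolute = resolve(
    'http://localhost:8000', 'https://openclipart.org/owl.png')
print(relative)
print(absolute)
```

The appeal of this approach is that you don't have to inspect the start of src by hand; the standard library already knows the rules for combining URLs.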

I hope the body of the _filter_images function is interesting to you. I wanted to show you how to check on multiple extensions using a mapping technique:

def _filter_images(images, type_):
    if type_ == 'all':
        return images
    ext_map = {
        'png': ['.png'],
        'jpg': ['.jpg', '.jpeg'],
    }
    return [
        img for img in images
        if _matches_extension(img['name'], ext_map[type_])
    ]

def _matches_extension(filename, extension_list):
    name, extension = os.path.splitext(filename.lower())
    return extension in extension_list

In this function, if type_ is all, then no filtering is required, so we just return all the images. On the other hand, when type_ is not all, we get the allowed extensions from the ext_map dictionary, and use it to filter the images in the list comprehension that ends the function body. You can see that by using another helper function, _matches_extension, I have made the list comprehension simpler and more readable.

All _matches_extension does is split the name of the image, getting its extension, and check whether it is within the list of allowed ones. Can you find one micro-improvement (speed-wise) that could be made to this function?

I'm sure you're wondering why I have collected all the images in the list and then removed them, instead of checking whether I wanted to save them before adding them to the list. The first reason is that I needed _fetch_images in the GUI application as it is now. The second reason is that combining fetching and filtering would produce a longer and more complicated function, and I'm trying to keep the complexity level down. The third reason is that this could be a nice exercise for you to do:

def _save(images, format_):
    if images:
        if format_ == 'img':
            _save_images(images)
        else:
            _save_json(images)
        print('Done')
    else:
        print('No images to save.')

def _save_images(images):
    for img in images:
        img_data = requests.get(img['url']).content
        with open(img['name'], 'wb') as f:
            f.write(img_data)

def _save_json(images):
    data = {}
    for img in images:
        img_data = requests.get(img['url']).content
        b64_img_data = base64.b64encode(img_data)
        str_img_data = b64_img_data.decode('utf-8')
        data[img['name']] = str_img_data

    with open('images.json', 'w') as ijson:
        ijson.write(json.dumps(data))

Let's keep going through the code and inspect the _save function. You can see that, when images isn't empty, this basically acts as a dispatcher. We either call _save_images or _save_json, depending on what information is stored in the format_ variable.

We are almost done. Let's jump to _save_images. We loop on the images list and for each dictionary we find there, we perform a GET request on the image URL and save its content in a file, which we name as the image itself.

Finally, let's now step into the _save_json function. It's very similar to the previous one. We basically fill in the data dictionary. The image name is the key, and the Base64 representation of its binary content is the value. When we're done populating our dictionary, we use the json library to dump it in the images.json file. I'll give you a small preview of that:

# images.json (truncated)
{
  "owl-alcohol.png": "iVBORw0KGgoAAAANSUhEUgAAASwAAAEICA...
  "owl-book.png": "iVBORw0KGgoAAAANSUhEUgAAASwAAAEbCAYAA...
  "owl-books.png": "iVBORw0KGgoAAAANSUhEUgAAASwAAAElCAYA...
  "owl-ebook.jpg": "/9j/4AAQSkZJRgABAQEAMQAxAAD/2wBDAAEB...
  "owl-rose.jpeg": "/9j/4AAQSkZJRgABAQEANAA0AAD/2wBDAAEB...
}
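Going the other way, from the JSON file back to the binary content, is just as short. The sketch below is not part of scrape.py: load_images is a hypothetical helper, and the sample document is built from a few made-up bytes so that the example doesn't depend on the real images.json.

```python
import base64
import json


def load_images(json_text):
    """Map each image name to its original bytes (hypothetical helper)."""
    return {
        name: base64.b64decode(b64_data)
        for name, b64_data in json.loads(json_text).items()
    }


# Stand-in for the content of images.json, built from made-up bytes.
payload = b'\x89PNG\r\n'
sample = json.dumps(
    {'tiny.png': base64.b64encode(payload).decode('utf-8')})
images = load_images(sample)
print(images)
```

With the real file, you would read its text with open('images.json').read(), pass it to load_images, and write each value to disk in 'wb' mode.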

And that's it! Now, before proceeding to the next section, make sure you play with this script and understand how it works. Try to modify something, print out intermediate results, add a new argument or functionality, or scramble the logic. We're going to migrate it into a GUI application now, which will add a layer of complexity simply because we'll have to build the GUI interface, so it's important that you're well acquainted with the business logic: it will allow you to concentrate on the rest of the code.

Second approach – a GUI application

There are several libraries for writing GUI applications in Python. The most famous ones are Tkinter, wxPython, PyGTK, and PyQt. They all offer a wide range of tools and widgets that you can use to compose a GUI application.

The one I'm going to use for the rest of this chapter is Tkinter. Tkinter stands for Tk interface and it is the standard Python interface to the Tk GUI toolkit. Both Tk and Tkinter are available on most Unix platforms, macOS, as well as on Windows systems.

Let's make sure that tkinter is installed properly on your system by running this command:

$ python -m tkinter

It should open a dialog window, demonstrating a simple Tk interface. If you can see that, we're good to go. However, if it doesn't work, please search for tkinter in the Python official documentation (https://docs.python.org/3.7/library/tkinter.html). You will find several links to resources that will help you get up and running with it.

We're going to make a very simple GUI application that basically mimics the behavior of the script we saw in the first part of this chapter. We won't add the ability to save JPGs or PNGs separately, but after you've gone through this chapter, you should be able to play with the code and put that feature back in by yourself.

So, this is what we're aiming for:

Gorgeous, isn't it? As you can see, it's a very simple interface (this is how it should look on a Mac). There is a frame (that is, a container) for the URL field and the Fetch info button, another frame for the Listbox (Content) to hold the image names and the radio button to control the way we save them, and finally there is a Scrape! button at the bottom. We also have a status bar, which shows us some information.

In order to get this layout, we could just place all the widgets on a root window, but that would make the layout logic quite messy and unnecessarily complicated. So, instead, we will divide the space using frames and place the widgets in those frames. This way we will achieve a much nicer result. So, this is the draft for the layout:

We have a Root Window, which is the main window of the application. We divide it into two rows, the first one in which we place the Main Frame, and the second one in which we place the Status Frame (which will hold the status bar text). The Main Frame is subsequently divided into three rows. In the first one, we place the URL Frame, which holds the URL widgets. In the second one, we place the Img Frame, which will hold the Listbox and the Radio Frame, which will host a label and the radio button widgets. And finally we have the third one, which will just hold the Scrape button.

In order to lay out frames and widgets, we will use a layout manager, called grid, that simply divides up the space into rows and columns, as in a matrix.

Now, all the code I'm going to write comes from the guiscrape.py module, so I won't repeat its name for each snippet, to save space. The module is logically divided into three sections, not unlike the script version: imports, layout logic, and business logic. We're going to analyze them line by line, in three chunks.

The imports

Imports are like in the script version, except we've lost argparse, which is no longer needed, and we have added two lines:

# guiscrape.py
from tkinter import *
from tkinter import ttk, filedialog, messagebox
...

The first line is quite common practice when dealing with tkinter, although in general it is bad practice to import using the * syntax. You can run into name collisions and, if the module is too big, importing everything would be expensive.

After that, we import ttk, filedialog, and messagebox explicitly, following the conventional approach used with this library. ttk is the new set of styled widgets. They behave basically like the old ones, but are capable of drawing themselves correctly according to the style your OS is set on, which is nice.

The rest of the imports (omitted) is what we need in order to carry out the task you know well by now. Note that there is nothing we need to install with pip in this second part; we already have everything we need.

The layout logic

I'm going to paste it chunk by chunk so that I can explain it easily to you. You'll see how all those pieces we talked about in the layout draft are arranged and glued together. What I'm about to paste, as we did in the script before, is the final part of the guiscrape.py module. We'll leave the middle part, the business logic, for last:

if __name__ == "__main__":
    _root = Tk()
    _root.title('Scrape app')

As you know by now, we only want to execute the logic when the module is run directly, so that first line shouldn't surprise you.

In the last two lines, we set up the main window, which is an instance of the Tk class. We instantiate it and give it a title. Note that I use the prepending underscore technique for all the names of the tkinter objects, in order to avoid potential collisions with names in the business logic. I just find it cleaner like this, but you're allowed to disagree:

    _mainframe = ttk.Frame(_root, padding='5 5 5 5')
    _mainframe.grid(row=0, column=0, sticky=(E, W, N, S))

Here, we set up the Main Frame. It's a ttk.Frame instance. We set _root as its parent, and give it some padding. The padding is a measure in pixels of how much space should be inserted between the inner content and the borders in order to let our layout breathe a little, otherwise we have a sardine effect, where widgets are packed too tightly.

The second line is more interesting. We place this _mainframe on the first row (0) and first column (0) of the parent object (_root). We also say that this frame needs to extend itself in each direction by using the sticky argument with all four cardinal directions. If you're wondering where they came from, it's the from tkinter import * magic that brought them to us:

    _url_frame = ttk.LabelFrame(
        _mainframe, text='URL', padding='5 5 5 5')
    _url_frame.grid(row=0, column=0, sticky=(E, W))
    _url_frame.columnconfigure(0, weight=1)
    _url_frame.rowconfigure(0, weight=1)

Next, we start by placing the URL Frame down. This time, the parent object is _mainframe, as you will recall from our draft. This is not just a simple Frame, it's actually a LabelFrame, which means we can set the text argument and expect a rectangle to be drawn around it, with the content of the text argument written in the top-left part of it (check out the previous picture if it helps). We position this frame at (0, 0), and say that it should expand to the left and to the right. We don't need the other two directions.

Finally, we use rowconfigure and columnconfigure to make sure it behaves correctly, should it need to resize. This is just a formality in our present layout:

    _url = StringVar()
    _url.set('http://localhost:8000')
    _url_entry = ttk.Entry(
        _url_frame, width=40, textvariable=_url)
    _url_entry.grid(row=0, column=0, sticky=(E, W, S, N), padx=5)
    _fetch_btn = ttk.Button(
        _url_frame, text='Fetch info', command=fetch_url)
    _fetch_btn.grid(row=0, column=1, sticky=W, padx=5)

Here, we have the code to lay out the URL textbox and the _fetch button. A textbox in this environment is called Entry. We instantiate it as usual, setting _url_frame as its parent and giving it a width. Also, and this is the most interesting part, we set the textvariable argument to be _url. _url is a StringVar, which is an object that is now connected to Entry and will be used to manipulate its content. Therefore, we don't modify the text in the _url_entry instance directly, but by accessing _url. In this case, we call the set method on it to set the initial value to the URL of our local web page.

We position _url_entry at (0, 0), setting all four cardinal directions for it to stick to, and we also set a bit of extra padding on the left and right edges using padx, which adds padding on the x-axis (horizontal). On the other hand, pady takes care of the vertical direction.

By now, you should get that every time you call the .grid method on an object, we're basically telling the grid layout manager to place that object somewhere, according to rules that we specify as arguments in the grid() call.

Similarly, we set up and place the _fetch button. The only interesting parameter is command=fetch_url. This means that when we click this button, we call the fetch_url function. This technique is called callback:

    _img_frame = ttk.LabelFrame(
        _mainframe, text='Content', padding='9 0 0 0')
    _img_frame.grid(row=1, column=0, sticky=(N, S, E, W))

This is what we called Img Frame in the layout draft. It is placed on the second row of its parent _mainframe. It will hold the Listbox and the Radio Frame:

    _images = StringVar()
    _img_listbox = Listbox(
        _img_frame, listvariable=_images, height=6, width=25)
    _img_listbox.grid(row=0, column=0, sticky=(E, W), pady=5)
    _scrollbar = ttk.Scrollbar(
        _img_frame, orient=VERTICAL, command=_img_listbox.yview)
    _scrollbar.grid(row=0, column=1, sticky=(S, N), pady=6)
    _img_listbox.configure(yscrollcommand=_scrollbar.set)

This is probably the most interesting bit of the whole layout logic. As we did with _url_entry, we need to drive the contents of Listbox by tying it to an _images variable. We set up Listbox so that _img_frame is its parent, and _images is the variable it's tied to. We also pass some dimensions.

The interesting bit comes from the _scrollbar instance. Note that, when we instantiate it, we set its command to _img_listbox.yview. This is the first half of the contract between Listbox and Scrollbar. The other half is provided by the _img_listbox.configure method, which sets yscrollcommand=_scrollbar.set.

By providing this reciprocal bond, when we scroll on Listbox, Scrollbar will move accordingly and vice versa, when we operate Scrollbar, Listbox will scroll accordingly:

    _radio_frame = ttk.Frame(_img_frame)
    _radio_frame.grid(row=0, column=2, sticky=(N, S, W, E))

We place the Radio Frame, ready to be populated. Note that Listbox is occupying (0, 0) on _img_frame, Scrollbar (0, 1), and therefore _radio_frame will go in (0, 2):

    _choice_lbl = ttk.Label(
        _radio_frame, text="Choose how to save images")
    _choice_lbl.grid(row=0, column=0, padx=5, pady=5)
    _save_method = StringVar()
    _save_method.set('img')
    _img_only_radio = ttk.Radiobutton(
        _radio_frame, text='As Images', variable=_save_method,
        value='img')
    _img_only_radio.grid(
        row=1, column=0, padx=5, pady=2, sticky=W)
    _img_only_radio.configure(state='normal')
    _json_radio = ttk.Radiobutton(
        _radio_frame, text='As JSON', variable=_save_method,
        value='json')
    _json_radio.grid(row=2, column=0, padx=5, pady=2, sticky=W)

Firstly, we place the label, and we give it some padding. Note that the label and radio buttons are children of _radio_frame.

As for the Entry and Listbox objects, Radiobutton is also driven by a bond to an external variable, which I called _save_method. Each Radiobutton instance sets a value argument, and by checking the value on _save_method, we know which button is selected:

    _scrape_btn = ttk.Button(
        _mainframe, text='Scrape!', command=save)
    _scrape_btn.grid(row=2, column=0, sticky=E, pady=5)

On the third row of _mainframe we place the Scrape button. Its command is save, which saves the images listed in Listbox, after we have successfully parsed a web page:

    _status_frame = ttk.Frame(
        _root, relief='sunken', padding='2 2 2 2')
    _status_frame.grid(row=1, column=0, sticky=(E, W, S))
    _status_msg = StringVar()
    _status_msg.set('Type a URL to start scraping...')
    _status = ttk.Label(
        _status_frame, textvariable=_status_msg, anchor=W)
    _status.grid(row=0, column=0, sticky=(E, W))

We end the layout section by placing down the status frame, which is a simple ttk.Frame. To give it a little status bar effect, we set its relief property to 'sunken' and give it a uniform padding of two pixels. It needs to stick to the left, right, and bottom parts of the _root window, so we set its sticky attribute to (E, W, S).

We then place a label in it and, this time, we tie it to a StringVar object, because we will have to modify it every time we want to update the status bar text. You should be acquainted with this technique by now.

Finally, on the last line, we run the application by calling the mainloop method on the Tk instance:

    _root.mainloop()

Please remember that all these instructions are placed under the if __name__ == "__main__": clause in the original script.

As you can see, the code to design our GUI application is not hard. Granted, at the beginning, you have to play around a little bit. Not everything will work out perfectly at the first attempt, but I promise you it's very easy and you can find plenty of tutorials on the web. Let's now get to the interesting bit, the business logic.

The business logic

We'll analyze the business logic of the GUI application in three chunks. There is the fetching logic, the saving logic, and the alerting logic.

Fetching the web page

Let's start with the code to fetch the page and images:

config = {}

def fetch_url():
    url = _url.get()
    config['images'] = []
    _images.set(())  # initialised as an empty tuple
    try:
        page = requests.get(url)
    except requests.RequestException as err:
        _sb(str(err))
    else:
        soup = BeautifulSoup(page.content, 'html.parser')
        images = fetch_images(soup, url)
        if images:
            _images.set(tuple(img['name'] for img in images))
            _sb('Images found: {}'.format(len(images)))
        else:
            _sb('No images found')
        config['images'] = images

def fetch_images(soup, base_url):
    images = []
    for img in soup.findAll('img'):
        src = img.get('src')
        img_url = f'{base_url}/{src}'
        name = img_url.split('/')[-1]
        images.append(dict(name=name, url=img_url))
    return images

Firstofall,letmeexplainthatconfigdictionary.WeneedsomewayofpassingdatabetweentheGUIapplicationandthebusinesslogic.Now,insteadofpollutingtheglobalnamespacewithmanydifferentvariables,mypersonalpreferenceistohaveasingledictionarythatholdsalltheobjectsweneedtopassbackandforth,sothattheglobalnamespaceisn'tcloggedupwithallthosenames,andwehaveasingle,clean,easywayofknowingwherealltheobjectsthatareneededbyourapplicationare.

Inthissimpleexample,we'lljustpopulatetheconfigdictionarywiththeimageswefetchfromthepage,butIwantedtoshowyouthetechniquesothatyouhaveatleastoneexample.ThistechniquecomesfrommyexperiencewithJavaScript.Whenyoucodeawebpage,youoftenimportseveraldifferentlibraries.Ifeachoftheseclutteredtheglobalnamespacewithallsortsofvariables,theremightbeissuesinmakingeverythingwork,becauseofnameclashesandvariable

overriding.

So,it'smuchbettertoleavetheglobalnamespaceascleanaswecan.Inthiscase,Ifindthatusingoneconfigvariableismorethanacceptable.

The fetch_url function is quite similar to what we did in the script. First, we get the url value by calling _url.get(). Remember that the _url object is a StringVar instance that is tied to the _url_entry object, which is an Entry. The text field you see on the GUI is the Entry, but the text behind the scenes is the value of the StringVar object.

By calling get() on _url, we get the value of the text, which is displayed in _url_entry.

The next step is to prepare config['images'] to be an empty list, and to empty the _images variable, which is tied to _img_listbox. This, of course, has the effect of cleaning up all the items in _img_listbox.

After this preparation step, we can try to fetch the page, using the same try/except logic we adopted in the script at the beginning of the chapter. The one difference is the action we take if things go wrong. We call _sb(str(err)). _sb is a helper function whose code we'll see shortly. Basically, it sets the text in the status bar for us. Not a good name, right? I had to explain its behavior to you: food for thought.

If we can fetch the page, then we create the soup instance, and fetch the images from it. The logic of fetch_images is exactly the same as the one explained before, so I won't repeat myself here.
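The f-string used in fetch_images works for the simple page we are scraping. As a possible alternative (not the approach used in the chapter), the standard library's urllib.parse.urljoin can resolve relative, root-relative, and absolute src values. The helper name build_image_url and the example URLs below are assumptions made for this sketch.

```python
from urllib.parse import urljoin


def build_image_url(base_url, src):
    """Resolve an <img> src attribute against the URL of the page."""
    return urljoin(base_url, src)


# A relative path is appended to the base URL.
print(build_image_url('http://localhost:8000/', 'img/owl.png'))
# A root-relative path replaces the path of the page.
print(build_image_url('http://localhost:8000/pages/index.html', '/img/owl.png'))
# An absolute URL is returned untouched.
print(build_image_url('http://localhost:8000/', 'http://cdn.example.com/a.jpg'))
```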

If we have images, using a quick tuple comprehension (which is actually a generator expression fed to a tuple constructor) we feed the _images StringVar, and this has the effect of populating our _img_listbox with all the image names. Finally, we update the status bar.
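The difference between a generator expression and the tuple built from it can be seen in isolation. The image data below is made up for the sketch.

```python
images = [
    {'name': 'owl.png', 'url': 'http://localhost:8000/img/owl.png'},
    {'name': 'fox.jpg', 'url': 'http://localhost:8000/img/fox.jpg'},
]

names_gen = (img['name'] for img in images)  # a generator expression
print(type(names_gen).__name__)  # generator

names = tuple(names_gen)  # the tuple constructor consumes the generator
print(names)  # ('owl.png', 'fox.jpg')
```

When a generator expression is the only argument of a call, the extra pair of parentheses can be dropped, which is why tuple(img['name'] for img in images) reads like a comprehension.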

If there were no images, we still update the status bar, and at the end of the function, regardless of how many images were found, we update config['images'] to hold the images list. In this way, we'll be able to access the images from other functions by inspecting config['images'] without having to pass that list around.

Saving the images

The logic to save the images is pretty straightforward. Here it is:

def save():
    if not config.get('images'):
        _alert('No images to save')
        return

    if _save_method.get() == 'img':
        dirname = filedialog.askdirectory(mustexist=True)
        _save_images(dirname)
    else:
        filename = filedialog.asksaveasfilename(
            initialfile='images.json',
            filetypes=[('JSON', '.json')])
        _save_json(filename)

def _save_images(dirname):
    if dirname and config.get('images'):
        for img in config['images']:
            img_data = requests.get(img['url']).content
            filename = os.path.join(dirname, img['name'])
            with open(filename, 'wb') as f:
                f.write(img_data)
        _alert('Done')

def _save_json(filename):
    if filename and config.get('images'):
        data = {}
        for img in config['images']:
            img_data = requests.get(img['url']).content
            b64_img_data = base64.b64encode(img_data)
            str_img_data = b64_img_data.decode('utf-8')
            data[img['name']] = str_img_data

        with open(filename, 'w') as ijson:
            ijson.write(json.dumps(data))
        _alert('Done')

When the user clicks the Scrape! button, the save function is called using the callback mechanism.

The first thing that this function does is check whether there are actually any images to be saved. If not, it alerts the user about it, using another helper function, _alert, whose code we'll see shortly. No further action is performed if there are no images.

On the other hand, if the config['images'] list is not empty, save acts as a dispatcher, and it calls _save_images or _save_json, according to which value is held by _save_method. Remember, this variable is tied to the radio buttons, therefore we expect its value to be either 'img' or 'json'.

This dispatcher is a bit different from the one in the script. According to which method we have selected, a different action must be taken.

If we want to save the images as images, we need to ask the user to choose a directory. We do this by calling filedialog.askdirectory and assigning the result of the call to the dirname variable. This opens up a nice dialog window that asks us to choose a directory. The directory we choose must exist, as specified by the way we call the method. This is done so that we don't have to write code to deal with a potentially missing directory when saving the files.

Here's how this dialog should look on macOS:

If we cancel the operation, dirname will be set to an empty, falsy value instead of a path.

Before finishing analyzing the logic in save, let's quickly go through _save_images.

It's very similar to the version we had in the script, so just note that, at the beginning, in order to be sure that we actually have something to do, we check on both dirname and the presence of at least one image in config['images'].

If that's the case, it means we have at least one image to save and the path for it, so we can proceed. The logic to save the images has already been explained. The one thing we do differently this time is join the directory (which means the complete path) to the image name, by means of os.path.join.

At the end of _save_images, if we saved at least one image, we alert the user that we're done.

Let's go back now to the other branch in save. This branch is executed when the user selects the As JSON radio button before pressing the Scrape! button. In this case, we want to save a file; therefore, we cannot just ask for a directory. We want to give the user the ability to choose a filename as well. Hence, we fire up a different dialog: filedialog.asksaveasfilename.

We pass an initial filename, which is proposed to the user; they have the ability to change it if they don't like it. Moreover, because we're saving a JSON file, we're forcing the user to use the correct extension by passing the filetypes argument. It is a list, with any number of two-tuples (description, extension), that runs the logic of the dialog.

Here's how this dialog should look on macOS:

Once we have chosen a place and a filename, we can proceed with the saving logic, which is the same as it was in the previous script. We create a JSON object from a Python dictionary (data) that we populate with key/value pairs made by the image name and its Base64-encoded content.

In _save_json as well, we have a little check at the beginning that makes sure that we don't proceed unless we have a filename and at least one image to save. This ensures that if the user presses the Cancel button, nothing bad happens.
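The encoding chain used in _save_json (bytes, then Base64 bytes, then str) can be tried without any network access. In this sketch, img_data is a made-up stand-in for the content of a downloaded image.

```python
import base64
import json

# Stand-in for the bytes returned by requests.get(...).content
img_data = b'\x89PNG\r\n\x1a\n fake image bytes'

# Encode: bytes -> Base64 bytes -> str, so that json can serialize it.
encoded = base64.b64encode(img_data).decode('utf-8')
payload = json.dumps({'owl.png': encoded})

# Decode: str -> the original bytes.
restored = base64.b64decode(json.loads(payload)['owl.png'])
print(restored == img_data)  # True
```

The decode('utf-8') step matters because json.dumps cannot serialize bytes objects; it needs a str.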

Alerting the user

Finally, let's see the alerting logic. It's extremely simple:

def _sb(msg):
    _status_msg.set(msg)

def _alert(msg):
    messagebox.showinfo(message=msg)

That's it! To change the status bar message, all we need to do is access the _status_msg StringVar, as it's tied to the _status label.

On the other hand, if we want to show the user a more visible message, we can fire up a message box. Here's how it should look on macOS:

The messagebox object can also be used to warn the user (messagebox.showwarning) or to signal an error (messagebox.showerror). But it can also be used to provide dialogs that ask us whether we're sure we want to proceed or if we really want to delete that file, and so on.

If you inspect messagebox by simply printing out what dir(messagebox) returns, you'll find methods such as askokcancel, askquestion, askretrycancel, askyesno, and askyesnocancel, as well as a set of constants to verify the response of the user, such as CANCEL, NO, OK, OKCANCEL, YES, and YESNOCANCEL. You can compare these to the user's choice so that you know the next action to execute when the dialog closes.

How can we improve the application?

Now that you're accustomed to the fundamentals of designing a GUI application, I'd like to give you some suggestions on how to make ours better.

We can start with the code quality. Do you think this code is good enough, or would you improve it? If so, how? I would test it, and make sure it's robust and caters for all the various scenarios that a user might create by clicking around on the application. I would also make sure the behavior is what I would expect when the website we're scraping is down for any reason.

Another thing that we could improve is the naming. I have prudently named all the components with a leading underscore, both to highlight their somewhat private nature, and to avoid having name clashes with the underlying objects they are linked to. But in retrospect, many of those components could use a better name, so it's really up to you to refactor until you find the form that suits you best. You could start by giving a better name to the _sb function!

As far as the user interface is concerned, you could try to resize the main application. See what happens? The whole content stays exactly where it is. Empty space is added if you expand, or the whole set of widgets gradually disappears if you shrink. This behavior isn't exactly nice, therefore one quick solution could be to make the root window fixed (that is, unable to resize).

Another thing that you could do to improve the application is to add the same functionality we had in the script, to save only PNGs or JPGs. In order to do this, you could place a combo box somewhere, with three values: All, PNGs, JPGs, or something similar. The user should be able to select one of those options before saving the files.
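One possible shape for the filtering part of that exercise is sketched below. The function name filter_images and the kind values are assumptions for this illustration; wiring the function to a combo box is left out.

```python
def filter_images(images, kind='all'):
    """Return the images matching `kind` ('all', 'png', or 'jpg').

    `kind` is a hypothetical value, as it might come from a combo box.
    """
    extensions = {'png': ('.png',), 'jpg': ('.jpg', '.jpeg')}
    if kind == 'all':
        return list(images)
    return [img for img in images
            if img['name'].lower().endswith(extensions[kind])]


images = [{'name': 'owl.png'}, {'name': 'fox.JPG'}, {'name': 'cat.gif'}]
print([img['name'] for img in filter_images(images, 'jpg')])  # ['fox.JPG']
```

Lowercasing the name before the check makes the comparison case-insensitive, and str.endswith accepts a tuple of suffixes, which keeps the test compact.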

Even better, you could change the declaration of Listbox so that it's possible to select multiple images at the same time, and only the selected ones will be saved. If you manage to do this (it's not as hard as it seems, believe me), then you should consider presenting the Listbox a bit better, maybe providing alternating background colors for the rows.

Another nice thing you could add is a button that opens up a dialog to select a file. The file must be one of the JSON files the application can produce. Once selected, you could run some logic to reconstruct the images from their Base64-encoded version. The logic to do this is very simple, so here's an example:

import base64
import json

with open('images.json', 'r') as f:
    data = json.loads(f.read())

for (name, b64val) in data.items():
    with open(name, 'wb') as f:
        f.write(base64.b64decode(b64val))

As you can see, we need to open images.json in read mode, and grab the data dictionary. Once we have it, we can loop through its items, and save each image with the Base64-decoded content. I'll leave it up to you to tie this logic to a button in the application.

Another cool feature that you could add is the ability to open up a preview pane that shows any image you select from Listbox, so that the user can take a peek at the images before deciding to save them.

Finally, one last suggestion for this application is to add a menu. Maybe even a simple menu with File and ? to provide the usual Help or About. Just for fun. Adding menus is not that complicated; you can add text, keyboard shortcuts, images, and so on.

Where do we go from here?

If you are interested in digging deeper into the world of GUIs, then I'd like to offer you the following suggestions.

The turtle module

The turtle module is an extended reimplementation of the eponymous module from the Python standard distribution up to version Python 2.5. It's a very popular way to introduce children to programming.

It's based on the idea of an imaginary turtle starting at (0, 0) in the Cartesian plane. You can programmatically command the turtle to move forward and backward, rotate, and so on; by combining all the possible moves, all sorts of intricate shapes and images can be drawn.

It's definitely worth checking out, if only to see something different.

wxPython, PyQt, and PyGTK

After you have explored the vastness of the tkinter realm, I'd suggest you explore other GUI libraries: wxPython (https://www.wxpython.org/), PyQt (https://riverbankcomputing.com/software/pyqt/intro), and PyGTK (https://pygobject.readthedocs.io/en/latest/). You may find out that one of these works better for you, or makes it easier for you to code the application you need.

I believe that coders can realize their ideas only when they are conscious of what tools they have available. If your toolset is too narrow, your ideas may seem impossible or extremely hard to realize, and they risk remaining exactly what they are, just ideas.

Of course, the technological spectrum today is humongous, so knowing everything is not possible; therefore, when you are about to learn a new technology or a new subject, my suggestion is to grow your knowledge by exploring breadth first.

Investigate several things, and then go deep with the one or the few that looked most promising. This way you'll be able to be productive with at least one tool, and when the tool no longer fits your needs, you'll know where to dig deeper, thanks to your previous exploration.

The principle of least astonishment

When designing an interface, there are many different things to bear in mind. One of them, which for me is the most important, is the law or principle of least astonishment. It basically states that if in your design a necessary feature has a high astonishing factor, it may be necessary to redesign your application. To give you one example, when you're used to working with Windows, where the buttons to minimize, maximize, and close a window are on the top-right corner, it's quite hard to work on Linux, where they are at the top-left corner. You'll find yourself constantly going to the top-right corner only to discover once more that the buttons are on the other side.

If a certain button has become so important in applications that it's now placed in a precise location by designers, please don't innovate. Just follow the convention. Users will only become frustrated when they have to waste time looking for a button that is not where it's supposed to be.

The disregard for this rule is the reason why I cannot work with products such as Jira. It takes me minutes to do simple things that should require seconds.

Threading considerations

This topic is outside the scope of this book, but I do want to mention it.

If you are coding a GUI application that needs to perform a long-running operation when a button is clicked, you will see that your application will probably freeze until the operation has been carried out. In order to avoid this, and maintain the application's responsiveness, you may need to run that time-expensive operation in a different thread (or even a different process) so that the OS will be able to dedicate a little bit of time to the GUI every now and then, to keep it responsive.

Gain a good grasp of the fundamentals first, and then have fun exploring them!
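As a starting point for that exploration, here is a minimal, GUI-free sketch of the idea: a slow task runs in a worker thread and hands its result back through a queue. The function slow_operation is a made-up stand-in for something like downloading many images; a real tkinter application would typically poll the queue from the main thread rather than block on join.

```python
import queue
import threading
import time


def slow_operation(results):
    """Stand-in for a long-running task, such as a large download."""
    time.sleep(0.1)
    results.put('Done')


results = queue.Queue()
worker = threading.Thread(target=slow_operation, args=(results,), daemon=True)
worker.start()

# The main thread is free to do other work here; in a GUI, this is
# where the event loop would keep processing clicks and redraws.
worker.join()
message = results.get()
print(message)  # Done
```

Passing results through a queue.Queue keeps the worker from touching shared state directly, which is the usual way to avoid surprises when more than one thread is involved.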

Summary

In this chapter, we worked on a project together. We have written a script that scrapes a very simple web page and accepts optional commands that alter its behavior in doing so. We also coded a GUI application to do the same thing by clicking buttons instead of typing on a console. I hope you enjoyed reading it and following along as much as I enjoyed writing it.

We saw many different concepts, such as working with files and performing HTTP requests, and we talked about guidelines for usability and design.

I have only been able to scratch the surface, but hopefully you have a good starting point from which to expand your exploration.

Throughout the chapter, I have pointed out several different ways you could improve the application, and I have challenged you with a few exercises and questions. I hope you have taken the time to play with those ideas. You can learn a lot just by playing around with fun applications like the one we've coded together.

In the next chapter, we're going to talk about data science, or at least about the tools that a Python programmer has when it comes to facing this subject.

Data Science

"If we have data, let's look at data. If all we have are opinions, let's go with mine."

– Jim Barksdale, former Netscape CEO

Data science is a very broad term and can assume several different meanings based on context, understanding, tools, and so on. There are countless books on this subject, which is not suitable for the faint-hearted.

In order to do proper data science, you need to, at the very least, know mathematics and statistics. Then, you may want to dig into other subjects, such as pattern recognition and machine learning and, of course, there is a plethora of languages and tools you can choose from.

I won't be able to talk about everything here. Therefore, in order to render this chapter meaningful, we're going to work on a cool project together instead.

Around the year 2012/2013, I was working for a top-tier social media company in London. I stayed there for two years, and I was privileged to work with several people whose brilliance I can only start to describe. We were the first in the world to have access to the Twitter Ads API, and we were partners with Facebook as well. That means a lot of data.

Our analysts were dealing with a huge number of campaigns and they were struggling with the amount of work they had to do, so the development team I was a part of tried to help by introducing them to Python and to the tools Python gives you to deal with data. It was a very interesting journey that led me to mentor several people in the company and eventually took me to Manila where, for two weeks, I gave intensive training in Python and data science to the analysts over there.

The project we're going to do in this chapter is a lightweight version of the final example I presented to my students in Manila. I have rewritten it to a size that will fit this chapter, and made a few adjustments here and there for teaching purposes, but all the main concepts are there, so it should be fun and instructional for you.

Specifically, we are going to explore the following:

The Jupyter Notebook
Pandas and NumPy: main libraries for data science in Python
A few concepts around Pandas's DataFrame class
Creating and manipulating a dataset

Let's start by talking about Roman gods.

IPython and Jupyter Notebook

In 2001, Fernando Perez was a graduate student in physics at CU Boulder, and was trying to improve the Python shell so that he could have the niceties he was used to when he was working with tools such as Mathematica and Maple. The result of that effort took the name IPython.

In a nutshell, that small script began as an enhanced version of the Python shell and, through the effort of other coders and eventually with proper funding from several different companies, it became the wonderful and successful project it is today. Some 10 years after its birth, a Notebook environment was created, powered by technologies such as WebSockets, the Tornado web server, jQuery, CodeMirror, and MathJax. The ZeroMQ library was also used to handle the messages between the Notebook interface and the Python core that lies behind it.

The IPython Notebook has become so popular and widely used that, over time, all sorts of goodies have been added to it. It can handle widgets, parallel computing, all sorts of media formats, and much, much more. Moreover, at some point, it became possible to code in languages other than Python from within the Notebook.

This has led to a huge project that at some stage has been split into two: IPython has been stripped down to focus more on the kernel part and the shell, while the Notebook has become a brand new project called Jupyter. Jupyter allows interactive scientific computations to be made in more than 40 languages.

This chapter's project will all be coded and run in a Jupyter Notebook, so let me explain in a few words what a Notebook is.

A Notebook environment is a web page that exposes a simple menu and the cells in which you can run Python code. Even though the cells are separate entities that you can run individually, they all share the same Python kernel. This means that all the names that you define in a cell (the variables, functions, and so on) will be available in any other cell.

Simply put, a Python kernel is a process in which Python is running. The Notebook web page is, therefore, an interface exposed to the user for driving this kernel. The web page communicates with it using a very fast messaging system.

Apart from all the graphical advantages, the beauty of having such an environment lies in the ability to run a Python script in chunks, and this can be a tremendous advantage. Take a script that is connecting to a database to fetch data and then manipulate that data. If you do it in the conventional way, with a Python script, you have to fetch the data every time you want to experiment with it. Within a Notebook environment, you can fetch the data in a cell and then manipulate and experiment with it in other cells, so fetching it every time is not necessary.

The Notebook environment is also extremely helpful for data science because it allows for step-by-step introspection. You do one chunk of work and then verify it. You then do another chunk and verify again, and so on.

It's also invaluable for prototyping because the results are there, right in front of your eyes, immediately available.

If you want to know more about these tools, please check out ipython.org and jupyter.org.

I have created a very simple example Notebook with a fibonacci function that gives you the list of all the Fibonacci numbers smaller than a given N. In my browser, it looks like this:

Every cell has an In [] label. If there's nothing between the brackets, it means that a cell has never been executed. If there is a number, it means that the cell has been executed, and the number represents the order in which the cell was executed. Finally, a * means that the cell is currently being executed.

You can see in the picture that in the first cell I have defined the fibonacci function, and I have executed it. This has the effect of placing the fibonacci name in the global frame associated with the Notebook, therefore the fibonacci function is now available to the other cells as well. In fact, in the second cell, I can run fibonacci(100) and see the results in Out [2]. In the third cell, I have shown you one of the several magic functions you can find in a Notebook: %timeit runs the code several times and provides you with a nice benchmark for it. All the measurements for the list comprehensions and generators I did in Chapter 5, Saving Time and Memory, were carried out with this nice feature.
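For reference, a function with the behavior described above could be sketched as follows. This is an assumption about what the first cell contains; in particular, starting the sequence from 0 is a choice made here, and the code in the actual Notebook may differ.

```python
def fibonacci(n):
    """Return the list of Fibonacci numbers smaller than n."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result


print(fibonacci(100))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

In a Notebook, the definition would go in one cell and the call in another; typing %timeit fibonacci(100) in a third cell benchmarks the call.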

You can execute a cell as many times as you want, and change the order in which you run them. Cells are very malleable; you can also put in Markdown text or render them as headers.

Markdown is a lightweight markup language with plain text formatting syntax designed so that it can be converted to HTML and many other formats.

Also, whatever you place in the last row of a cell will be automatically printed for you. This is very handy because you're not forced to write print(...) explicitly.

Feel free to explore the Notebook environment; once you're friends with it, it's a long-lasting relationship, I promise.

Installing the required libraries

In order to run the Notebook, you have to install a handful of libraries, each of which collaborates with the others to make the whole thing work. Alternatively, you can just install Jupyter and it will take care of everything for you. For this chapter, there are a few other dependencies that we need to install. You can find them listed in requirements/requirements.data.science.in. To install them, please take a look at README.rst in the root folder of the project, and you will find instructions specifically for this chapter.

Using Anaconda

Sometimes installing data science libraries can be extremely painful. If you are struggling to install the libraries for this chapter in your virtual environment, an alternative choice you have is to install Anaconda. Anaconda is a free and open source distribution of the Python and R programming languages for data science and machine-learning-related applications that aims to simplify package management and deployment. You can download it from the anaconda.org website. Once you have installed it in your system, take a peek at the various requirements for this chapter and install them through Anaconda.

Starting a Notebook

Once you have all the required libraries installed, you can start a Notebook either with the following command or by using the Anaconda interface:

$ jupyter notebook

You will have an open page in your browser at this address (the port might be different): http://localhost:8888/. Go to that page and create a new Notebook using the menu. When you feel comfortable with it, you're ready to go. I strongly encourage you to try and get a Jupyter environment running before you read on. It is an excellent exercise sometimes to have to deal with difficult dependencies.

Our project will take place in a Notebook, therefore I will tag each code snippet with the cell number it belongs to, so that you can easily reproduce the code and follow along.

If you familiarize yourself with the keyboard shortcuts (look in the Notebook's Help section), you will be able to move between cells and handle their content without having to reach for the mouse. This will make you more proficient and way faster when you work in a Notebook.

Let's now move on and talk about the most interesting part of this chapter: data.

Dealing with data

Typically, when you deal with data, this is the path you go through: you fetch it, you clean and manipulate it, and then you inspect it, and present results as values, spreadsheets, graphs, and so on. I want you to be in charge of all three steps of the process without having any external dependency on a data provider, so we're going to do the following:

1. We're going to create the data, simulating the fact that it comes in a format that is not perfect or ready to be worked on
2. We're going to clean it and feed it to the main tool we'll use in the project, which is DataFrame from the pandas library
3. We're going to manipulate the data in DataFrame
4. We're going to save DataFrame to a file in different formats
5. We're going to inspect the data and get some results out of it

Setting up the Notebook

First things first, let's produce the data. We start from the ch13-dataprep Notebook:

#1
import json
import random
from datetime import date, timedelta

import faker

Cell #1 takes care of the imports. We have already encountered them, apart from faker. You can use this module to prepare fake data. It's very useful in tests, when you prepare your fixtures, to get all sorts of things such as names, email addresses, phone numbers, and credit card details. It is all fake, of course.

Preparing the data

We want to achieve the following data structure: we're going to have a list of user objects. Each user object will be linked to a number of campaign objects. In Python, everything is an object, so I'm using this term in a generic way. The user object may be a string, a dictionary, or something else.

A campaign in the social media world is a promotional campaign that a media agency runs on social media networks on behalf of a client. Remember that we're going to prepare this data so that it's not in perfect shape (but it won't be that bad either...):

#2
fake = faker.Faker()

Firstly, we instantiate the Faker that we'll use to create the data:

#3
usernames = set()
usernames_no = 1000

# populate the set with 1000 unique usernames
while len(usernames) < usernames_no:
    usernames.add(fake.user_name())

Then we need usernames. I want 1,000 unique usernames, so I keep adding to the usernames set until it has 1,000 elements. A set doesn't allow duplicate elements, therefore uniqueness is guaranteed:

#4
def get_random_name_and_gender():
    skew = .6  # 60% of users will be female
    male = random.random() > skew
    if male:
        return fake.name_male(), 'M'
    else:
        return fake.name_female(), 'F'

def get_users(usernames):
    users = []
    for username in usernames:
        name, gender = get_random_name_and_gender()
        user = {
            'username': username,
            'name': name,
            'gender': gender,
            'email': fake.email(),
            'age': fake.random_int(min=18, max=90),
            'address': fake.address(),
        }
        users.append(json.dumps(user))
    return users

users = get_users(usernames)
users[:3]

Here, we create a list of users. Each username has now been augmented to a full-blown user dictionary, with other details such as name, gender, and email. Each user dictionary is then dumped to JSON and added to the list. This data structure is not optimal, of course, but we're simulating a scenario where users come to us like that.

Note the skewed use of random.random() to make 60% of users female. The rest of the logic should be very easy for you to understand.
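If the skew trick is not obvious, it may help to sample it in isolation. random.random() returns a float in the range [0, 1), so it exceeds 0.6 about 40% of the time. The sample size and the fixed seed below are choices made for this sketch only.

```python
import random

random.seed(42)  # fixed seed only so that the sketch is repeatable

skew = .6
samples = 10_000

# Same test as in get_random_name_and_gender: "male" when the draw > skew.
females = sum(1 for _ in range(samples) if not random.random() > skew)
print(females / samples)  # close to 0.6
```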

Note also the last line. Each cell automatically prints what's on the last line; therefore, the output of #4 is a list with the first three users:

['{"username": "samuel62", "name": "Tonya Lucas", "gender": "F", "email":
"anthonyrobinson@robbins.biz", "age": 27, "address": "PSC 8934, Box 4049\\nAPO AA
43073"}',
 '{"username": "eallen", "name": "Charles Harmon", "gender": "M", "email":
"courtneycollins@hotmail.com", "age": 28, "address": "38661 Clark Mews Apt.
528\\nAnthonychester, ID 25919"}',
 '{"username": "amartinez", "name": "Laura Dunn", "gender": "F", "email":
"jeffrey35@yahoo.com", "age": 88, "address": "0536 Daniel Court Apt. 541\\nPort
Christopher, HI 49399-3415"}']

I hope you're following along with your own Notebook. If you are, please note that all data is generated using random functions and values; therefore, you will see different results. They will change every time you execute the Notebook.

The following cell, #5, contains the logic to generate a campaign name:

#5
# campaign name format:
# InternalType_StartDate_EndDate_TargetAge_TargetGender_Currency
def get_type():
    # just some gibberish internal codes
    types = ['AKX', 'BYU', 'GRZ', 'KTR']
    return random.choice(types)

def get_start_end_dates():
    duration = random.randint(1, 2 * 365)
    offset = random.randint(-365, 365)
    start = date.today() - timedelta(days=offset)
    end = start + timedelta(days=duration)

    def _format_date(date_):
        return date_.strftime("%Y%m%d")

    return _format_date(start), _format_date(end)

def get_age():
    age = random.randint(20, 45)
    age -= age % 5
    diff = random.randint(5, 25)
    diff -= diff % 5
    return '{}-{}'.format(age, age + diff)

def get_gender():
    return random.choice(('M', 'F', 'B'))

def get_currency():
    return random.choice(('GBP', 'EUR', 'USD'))

def get_campaign_name():
    separator = '_'
    type_ = get_type()
    start, end = get_start_end_dates()
    age = get_age()
    gender = get_gender()
    currency = get_currency()
    return separator.join(
        (type_, start, end, age, gender, currency))

Analysts use spreadsheets all the time, and they come up with all sorts of coding techniques to compress as much information as possible into the campaign names. The format I chose is a simple example of that technique: there is a code that tells us the campaign type, then the start and end dates, then the target age and gender, and finally the currency. All values are separated by an underscore.
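Because every value is separated by an underscore, a name in this format can be unpacked again with a single split. The sketch below shows one way to do it; parse_campaign_name is a hypothetical helper written for this illustration, and the sample name simply follows the format described above.

```python
from datetime import datetime


def parse_campaign_name(name):
    """Split a campaign name into its six underscore-separated parts."""
    type_, start, end, age, gender, currency = name.split('_')
    return {
        'type': type_,
        'start': datetime.strptime(start, '%Y%m%d').date(),
        'end': datetime.strptime(end, '%Y%m%d').date(),
        'age': age,
        'gender': gender,
        'currency': currency,
    }


parsed = parse_campaign_name('GRZ_20171018_20171116_35-55_B_EUR')
print(parsed['type'], parsed['start'], parsed['end'], parsed['currency'])
# GRZ 2017-10-18 2017-11-16 EUR
```

This only works because none of the individual values contains an underscore; the age range uses a hyphen for exactly that reason.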

In the get_type function, I use random.choice() to get one value randomly out of a collection. Probably more interesting is get_start_end_dates. First, I get the duration for the campaign, which goes from one day to two years (randomly), then I get a random offset in time which I subtract from today's date in order to get the start date. Given that the offset is a random number between -365 and 365, would anything be different if I added it to today's date instead of subtracting it?

When I have both the start and end dates, I return a stringified version of each of them; they are joined by underscores later, in get_campaign_name.

Then, we have a bit of modular trickery going on with the age calculation. I hope you remember the modulo operator (%) from Chapter 2, Built-in Data Types.

What happens here is that I want an age range that has multiples of five as extremes. There are many ways to do it, but what I do is get a random number between 20 and 45 for the left extreme, and remove the remainder of the division by 5. So, if, for example, I get 28, I will remove 28 % 5 = 3 from it, getting 25. I could have just used random.randrange(), but it's hard to resist modular division.
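The two approaches mentioned here can be put side by side. This is only a sketch of the idea; the seed is fixed so that the run is repeatable, and the stop value 50 is chosen so that 45 is still included.

```python
import random

random.seed(1)  # fixed seed only so that the sketch is repeatable

# Modulo approach, as in get_age: round a random integer down
# to the nearest multiple of 5.
age = random.randint(20, 45)
age -= age % 5

# randrange approach: pick directly from 20, 25, 30, 35, 40, 45.
age_alt = random.randrange(20, 50, 5)

print(age, age_alt)
```

Both produce a multiple of five between 20 and 45, but not with the same probabilities: with the modulo approach, 45 comes up only when randint returns exactly 45, whereas each of the other values can come from five different integers. randrange picks all six values with equal probability.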

The rest of the functions are just some other applications of random.choice() and the last one, get_campaign_name, is nothing more than a collector for all these puzzle pieces that returns the final campaign name:

#6
# campaign data:
# name, budget, spent, clicks, impressions
def get_campaign_data():
    name = get_campaign_name()
    budget = random.randint(10**3, 10**6)
    spent = random.randint(10**2, budget)
    clicks = int(random.triangular(10**2, 10**5, 0.2 * 10**5))
    impressions = int(random.gauss(0.5 * 10**6, 2))
    return {
        'cmp_name': name,
        'cmp_bgt': budget,
        'cmp_spent': spent,
        'cmp_clicks': clicks,
        'cmp_impr': impressions
    }

In #6, we write a function that creates a complete campaign object. I used a few different functions from the random module. random.randint() gives you an integer between two extremes. The problem with it is that it follows a uniform probability distribution, which means that any number in the interval has the same probability of coming up.

Therefore, when dealing with a lot of data, if you distribute your fixtures using a uniform distribution, the results you get will all look similar. For this reason, I chose to use triangular and gauss, for clicks and impressions. They use different probability distributions so that we'll have something more interesting to see in the end.
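A quick way to get a feel for the three distributions is to sample each of them and compare the mean and the standard deviation. The sketch below reuses the arguments from cell #6; the uniform range, the sample size, and the seed are choices made here for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed only so that the sketch is repeatable
n = 10_000

uniform = [random.randint(10**2, 10**5) for _ in range(n)]
triangular = [random.triangular(10**2, 10**5, 0.2 * 10**5) for _ in range(n)]
gaussian = [random.gauss(0.5 * 10**6, 2) for _ in range(n)]

for label, values in (('uniform', uniform),
                      ('triangular', triangular),
                      ('gauss', gaussian)):
    print(label,
          round(statistics.mean(values)),
          round(statistics.stdev(values), 1))
```

The triangular samples should average close to (low + high + mode) / 3, roughly 40,000 with these arguments, while the Gaussian samples stay within a few units of 500,000 because the standard deviation passed to gauss is only 2.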

Just to make sure we're on the same page with the terminology: clicks represents the number of clicks on a campaign advertisement, budget is the total amount of money allocated for the campaign, spent is how much of that money has already been spent, and impressions is the number of times the campaign has been fetched, as a resource, from its source, regardless of the number of clicks that were performed on the campaign. Normally, the number of impressions is greater than the number of clicks.

Now that we have the data, it's time to put it all together:

#7
def get_data(users):
    data = []
    for user in users:
        campaigns = [get_campaign_data()
                     for _ in range(random.randint(2, 8))]
        data.append({'user': user, 'campaigns': campaigns})
    return data

As you can see, each item in data is a dictionary with a user and a list of campaigns that are associated with that user.

Cleaning the data

Let's start cleaning the data:

#8
rough_data = get_data(users)
rough_data[:2]  # let's take a peek

We simulate fetching the data from a source and then inspect it. The Notebook is the perfect tool for inspecting your steps. You can vary the granularity to your needs. The first item in rough_data looks like this:

{'user': '{"username": "samuel62", "name": "Tonya Lucas", "gender": "F", "email":
"anthonyrobinson@robbins.biz", "age": 27, "address": "PSC 8934, Box 4049\\nAPO AA
43073"}',
 'campaigns': [{'cmp_name': 'GRZ_20171018_20171116_35-55_B_EUR',
   'cmp_bgt': 999613,
   'cmp_spent': 43168,
   'cmp_clicks': 35603,
   'cmp_impr': 500001},
  ...
  {'cmp_name': 'BYU_20171122_20181016_30-45_B_USD',
   'cmp_bgt': 561058,
   'cmp_spent': 472283,
   'cmp_clicks': 44823,
   'cmp_impr': 499999}]}

So, we now start working on it:

#9
data = []
for datum in rough_data:
    for campaign in datum['campaigns']:
        campaign.update({'user': datum['user']})
        data.append(campaign)
data[:2]  # let's take another peek

The first thing we need to do in order to be able to feed DataFrame with this data is to denormalize it. This means transforming data into a list whose items are campaign dictionaries, augmented with their relative user dictionary. Users will be duplicated in each campaign they belong to. The first item in data looks like this:

{'cmp_name': 'GRZ_20171018_20171116_35-55_B_EUR',
 'cmp_bgt': 999613,
 'cmp_spent': 43168,
 'cmp_clicks': 35603,
 'cmp_impr': 500001,
 'user': '{"username": "samuel62", "name": "Tonya Lucas", "gender": "F", "email":
"anthonyrobinson@robbins.biz", "age": 27, "address": "PSC 8934, Box 4049\\nAPO AA
43073"}'}

You can see that the user object has been brought into the campaign dictionary, and that it is repeated for each campaign.
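The same denormalization can be watched on a tiny, made-up dataset, where the duplication of the user is easy to spot. Note one deliberate difference from cell #9: this sketch builds a new dictionary for each campaign instead of calling update, so the nested input is left untouched.

```python
rough = [
    {'user': 'alice', 'campaigns': [{'cmp_name': 'A1'}, {'cmp_name': 'A2'}]},
    {'user': 'bob', 'campaigns': [{'cmp_name': 'B1'}]},
]

flat = []
for datum in rough:
    for campaign in datum['campaigns']:
        # Copy the campaign and attach its user to the copy.
        flat.append({**campaign, 'user': datum['user']})

print(flat)
# [{'cmp_name': 'A1', 'user': 'alice'},
#  {'cmp_name': 'A2', 'user': 'alice'},
#  {'cmp_name': 'B1', 'user': 'bob'}]
```

Two users with three campaigns in total become three flat rows, which is the shape a tabular structure expects.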

Now, I would like to help you and offer a deterministic second part of the chapter, so I'm going to save the data I generated here so that I (and you, too) will be able to load it from the next Notebook, and we should then have the same results:

#10
with open('data.json', 'w') as stream:
    stream.write(json.dumps(data))

You should find the data.json file in the source code for the book. Now we are done with ch13-dataprep, so we can close it, and open up ch13.

Creating the DataFrame

First, we have another round of imports:

#1
import json
import calendar
import numpy as np
from pandas import DataFrame
import arrow
import pandas as pd

The json and calendar libraries come from the standard library. numpy is the NumPy library, the fundamental package for scientific computing with Python. NumPy stands for Numeric Python, and it is one of the most widely-used libraries in the data science environment. I'll say a few words about it later on in this chapter. pandas is the very core upon which the whole project is based. Pandas stands for Python Data Analysis Library. Among many other things, it provides DataFrame, a matrix-like data structure with advanced processing capabilities. It's customary to import DataFrame separately and then to import pandas as pd.

arrow is a nice third-party library that speeds up dealing with dates dramatically. Technically, we could do it with the standard library, but I see no reason not to expand the range of the example and show you something different.

After the imports, we load the data as follows:

#2
with open('data.json') as stream:
    data = json.loads(stream.read())

And finally, it's time to create DataFrame:

#3
df = DataFrame(data)
df.head()

We can inspect the first five rows using the head method of DataFrame. You should see something like this:

Jupyter renders the output of the df.head() call as HTML automatically. In order to have a text-based output, simply wrap df.head() in a print call.

The DataFrame structure is very powerful. It allows us to manipulate a lot of its contents. You can filter by rows or columns, aggregate data, and perform many other operations. You can operate with rows or columns without suffering the time penalty you would have to pay if you were working on data with pure Python. This happens because, under the covers, pandas is harnessing the power of the NumPy library, which itself draws its incredible speed from the low-level implementation of its core.

Using DataFrame allows us to couple the power of NumPy with spreadsheet-like capabilities so that we'll be able to work on our data in a fashion that is similar to what an analyst could do. Only, we do it with code.

But let's go back to our project. Let's see two ways to quickly get a bird's eye view of the data:

#4
df.count()

count yields a count of all the non-empty cells in each column. This is good to help you understand how sparse your data can be. In our case, we have no missing values, so the output is:

cmp_bgt       5037
cmp_clicks    5037
cmp_impr      5037
cmp_name      5037
cmp_spent     5037
user          5037
dtype: int64

Nice! We have 5,037 rows, and the counts themselves are integers (dtype: int64 means long integers, because they take 64 bits each). Given that we have 1,000 users and the number of campaigns per user is a random number between 2 and 8, we're exactly in line with what I was expecting:

#5

df.describe()

Thedescribemethodisanice,quickwaytointrospectabitfurther:

             cmp_bgt    cmp_clicks       cmp_impr      cmp_spent
count    5037.000000   5037.000000    5037.000000    5037.000000
mean   496930.317054  40920.962676  499999.498312  246963.542783
std    287126.683484  21758.505210       2.033342  217822.037701
min      1057.000000    341.000000  499993.000000     114.000000
25%    247663.000000  23340.000000  499998.000000   64853.000000
50%    491650.000000  37919.000000  500000.000000  183716.000000
75%    745093.000000  56253.000000  500001.000000  379478.000000
max    999577.000000  99654.000000  500008.000000  975799.000000

As you can see, it gives us several measures, such as count, mean, std (standard deviation), min, and max, and shows how data is distributed across the quartiles (the 25%, 50%, and 75% rows). Thanks to this method, we already have a rough idea of how our data is structured.

Let'sseewhicharethethreecampaignswiththehighestandlowestbudgets:

#6

df.sort_values(by=['cmp_bgt'], ascending=False).head(3)

Thisgivesthefollowingoutput:

      cmp_bgt  cmp_clicks  cmp_impr  cmp_name
3321   999577        8232    499997  GRZ_20180810_20190107_40-55_M_EUR
2361   999534       53223    499999  GRZ_20180516_20191030_25-30_B_EUR
2220   999096       13347    499999  KTR_20180620_20190809_40-50_F_USD

Andacalltotailshowsustheoneswiththelowestbudgets:

#7

df.sort_values(by=['cmp_bgt'],ascending=False).tail(3)

Unpacking the campaign name

Now it's time to increase the complexity. First of all, we want to get rid of that horrible campaign name (cmp_name). We need to explode it into parts and put each part in one dedicated column. In order to do this, we'll use the apply method of the Series object.

Thepandas.core.series.Seriesclassisbasicallyapowerfulwrapperaroundanarray(thinkofitasalistwithaugmentedcapabilities).WecanextrapolateaSeriesobjectfromDataFramebyaccessingitinthesamewaywedowithakeyinadictionary,andwecancallapplyonthatSeriesobject,whichwillrunafunctionfeedingeachitemintheSeriestoit.WecomposetheresultintoanewDataFrame,andthenjointhatDataFramewithdf:

#8

def unpack_campaign_name(name):
    # very optimistic method, assumes data in campaign name
    # is always in good state
    type_, start, end, age, gender, currency = name.split('_')
    start = arrow.get(start, 'YYYYMMDD').date()
    end = arrow.get(end, 'YYYYMMDD').date()
    return type_, start, end, age, gender, currency

campaign_data = df['cmp_name'].apply(unpack_campaign_name)
campaign_cols = [
    'Type', 'Start', 'End', 'Age', 'Gender', 'Currency']
campaign_df = DataFrame(
    campaign_data.tolist(), columns=campaign_cols, index=df.index)
campaign_df.head(3)

Withinunpack_campaign_name,wesplitthecampaignnameinparts.Weusearrow.get()togetaproperdateobjectoutofthosestrings(arrowmakesitreallyeasytodoit,doesn'tit?),andthenwereturntheobjects.Aquickpeekatthelastlinereveals:

  Type       Start         End    Age Gender Currency
0  KTR  2019-03-24  2020-11-06  20-35      F      EUR
1  GRZ  2017-05-21  2018-07-24  30-45      B      GBP
2  KTR  2017-12-18  2018-02-08  30-40      F      GBP

Nice!Oneimportantthing:evenifthedatesappearasstrings,theyarejusttherepresentationoftherealdateobjectsthatarehostedinDataFrame.

Another very important thing: when joining two DataFrame instances, it's imperative that they have the same index, otherwise pandas won't be able to know which rows go with which. Therefore, when we create campaign_df, we set its index to the one from df. This enables us to join them. When creating this DataFrame, we also pass the columns' names:

#9

df=df.join(campaign_df)

Andafterjoin,wetakeapeek,hopingtoseematchingdata:

#10

df[['cmp_name']+campaign_cols].head(3)

Thetruncatedoutputoftheprecedingcodesnippetisasfollows:

                            cmp_name Type       Start         End
0  KTR_20190324_20201106_20-35_F_EUR  KTR  2019-03-24  2020-11-06
1  GRZ_20170521_20180724_30-45_B_GBP  GRZ  2017-05-21  2018-07-24
2  KTR_20171218_20180208_30-40_F_GBP  KTR  2017-12-18  2018-02-08

Asyoucansee,joinwassuccessful;thecampaignnameandtheseparatecolumnsexposethesamedata.Didyouseewhatwedidthere?We'reaccessingDataFrameusingthesquarebracketssyntax,andwepassalistofcolumnnames.ThiswillproduceabrandnewDataFrame,withthosecolumns(inthesameorder),onwhichwethencallthehead()method.
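The point about matching indexes is easy to see on a toy example. The following sketch uses made-up campaign names and deliberately non-default index labels (10, 20, 30) to show what join does when the labels don't match, and what it does when we pass index=left.index as we did for campaign_df:

```python
import pandas as pd

left = pd.DataFrame({'cmp_name': ['A_EUR', 'B_GBP', 'C_USD']},
                    index=[10, 20, 30])

# Split each (made-up) name into its two parts.
parts = [name.split('_') for name in left['cmp_name']]

# Without index=..., pandas labels the rows 0, 1, 2.
misaligned = pd.DataFrame(parts, columns=['Type', 'Currency'])
# With index=left.index, the rows carry the same labels as left.
aligned = pd.DataFrame(parts, columns=['Type', 'Currency'],
                       index=left.index)

bad = left.join(misaligned)  # no shared labels: the new columns are all NaN
good = left.join(aligned)    # labels match: rows line up as intended

print(bad['Type'].isna().all())
print(good['Type'].tolist())
```

In our project df has the default 0, 1, 2, ... index, so the problem would not show up, but passing the index explicitly keeps the code correct even if df is later filtered or re-indexed.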

Unpacking the user data

We now do the exact same thing for each piece of user JSON data. We call apply on the user series, running the unpack_user_json function, which takes a JSON user object and transforms it into a list of its fields, which we can then inject into a brand new DataFrame, user_df. After that, we'll join user_df back with df, like we did with campaign_df:

#11

def unpack_user_json(user):
    # very optimistic as well, expects user objects
    # to have all attributes
    user = json.loads(user.strip())
    return [
        user['username'],
        user['email'],
        user['name'],
        user['gender'],
        user['age'],
        user['address'],
    ]

user_data = df['user'].apply(unpack_user_json)
user_cols = [
    'username', 'email', 'name', 'gender', 'age', 'address']
user_df = DataFrame(
    user_data.tolist(), columns=user_cols, index=df.index)

Verysimilartothepreviousoperation,isn'tit?Weshouldalsonoteherethat,whencreatinguser_df,weneedtoinstructDataFrameaboutthecolumnnamesandtheindex.Let'sjoinandtakeaquickpeek:

#12

df=df.join(user_df)

#13

df[['user']+user_cols].head(2)

Theoutputshowsusthateverythingwentwell.We'regood,butwe'renotdoneyet.Ifyoucalldf.columnsinacell,you'llseethatwestillhaveuglynamesforourcolumns.Let'schangethat:

#14

better_columns=[

'Budget','Clicks','Impressions',

'cmp_name','Spent','user',

'Type','Start','End',

'TargetAge','TargetGender','Currency',

'Username','Email','Name',

'Gender','Age','Address',

]

df.columns=better_columns

Good!Now,withtheexceptionof'cmp_name'and'user',weonlyhavenicenames.
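One caveat: assigning a list to df.columns depends on the columns being in exactly the order the list assumes (in #14, alphabetical for the original six). Depending on the pandas version, a DataFrame built from a list of dictionaries may instead keep the dictionaries' key order, in which case the labels would land on the wrong columns. A more defensive sketch, shown here on a made-up one-row frame, renames by name with a mapping, so the order doesn't matter:

```python
import pandas as pd

# Made-up single row; the column order is deliberately not alphabetical.
df_demo = pd.DataFrame({
    'cmp_name': ['GRZ_x'],
    'cmp_bgt': [100],
    'cmp_spent': [40],
    'cmp_clicks': [7],
    'cmp_impr': [50],
})

# Old name -> new name; columns not listed here are left as they are.
new_names = {
    'cmp_bgt': 'Budget',
    'cmp_clicks': 'Clicks',
    'cmp_impr': 'Impressions',
    'cmp_spent': 'Spent',
}

df_demo = df_demo.rename(columns=new_names)
print(list(df_demo.columns))
```

It is worth calling df.columns before running #14 to check which order you actually have.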

Completing the dataset

The next step will be to add some extra columns. For each campaign, we have the numbers of clicks and impressions, and we have the amounts spent. This allows us to introduce three measurement ratios: CTR, CPC, and CPI. They stand for Click Through Rate, Cost Per Click, and Cost Per Impression, respectively.

Thelasttwoarestraightforward,butCTRisnot.Sufficeittosaythatitistheratiobetweenclicksandimpressions.Itgivesyouameasureofhowmanyclickswereperformedonacampaignadvertisementperimpression—thehigherthisnumber,themoresuccessfultheadvertisementisinattractinguserstoclickonit:

#15

def calculate_extra_columns(df):
    # Click Through Rate
    df['CTR'] = df['Clicks'] / df['Impressions']
    # Cost Per Click
    df['CPC'] = df['Spent'] / df['Clicks']
    # Cost Per Impression
    df['CPI'] = df['Spent'] / df['Impressions']

calculate_extra_columns(df)

Iwrotethisasafunction,butIcouldhavejustwrittenthecodeinthecell.It'snotimportant.WhatIwantyoutonoticehereisthatwe'readdingthosethreecolumnswithonelineofcodeeach,butDataFrameappliestheoperationautomatically(thedivision,inthiscase)toeachpairofcellsfromtheappropriatecolumns.So,eveniftheyaremaskedasthreedivisions,theseareactually5037*3divisions,becausetheyareperformedforeachrow.Pandasdoesalotofworkforus,andalsodoesaverygoodjobofhidingthecomplexityofit.

Thefunction,calculate_extra_columns,takesDataFrame,andworksdirectlyonit.Thismodeofoperationiscalledin-place.Doyourememberhowlist.sort()wassortingthelist?Itisthesamedeal.Youcouldalsosaythatthisfunctionisnotpure,whichmeansithassideeffects,asitmodifiesthemutableobjectitispassedasanargument.
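For comparison, here is a sketch of what a pure version could look like. It is not the approach used in this chapter: the with_extra_columns name and the one-row sample are made up, and it relies on DataFrame.assign, which returns a new DataFrame instead of modifying the one it is called on:

```python
import pandas as pd


def with_extra_columns(df):
    """Return a new DataFrame with CTR, CPC and CPI; the input is untouched."""
    return df.assign(
        CTR=df['Clicks'] / df['Impressions'],
        CPC=df['Spent'] / df['Clicks'],
        CPI=df['Spent'] / df['Impressions'],
    )


sample = pd.DataFrame({'Spent': [50.0], 'Clicks': [10], 'Impressions': [100]})
result = with_extra_columns(sample)

print('CTR' in sample.columns)  # the original frame has no new column
print('CTR' in result.columns)  # the returned frame does
```

Which style you pick is mostly a matter of taste; the in-place version avoids creating a second DataFrame, while the pure one is easier to reason about and to test.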

Wecantakealookattheresultsbyfilteringontherelevantcolumnsandcallinghead:

#16

df[['Spent','Clicks','Impressions',

'CTR','CPC','CPI']].head(3)

Thisshowsusthatthecalculationswereperformedcorrectlyoneachrow:

    Spent  Clicks  Impressions       CTR       CPC       CPI
0   39383   62554       499997  0.125109  0.629584  0.078766
1  210452   36176       500001  0.072352  5.817448  0.420903
2  342507   62299       500001  0.124598  5.497793  0.685013

Now,Iwanttoverifytheaccuracyoftheresultsmanuallyforthefirstrow:

#17

clicks=df['Clicks'][0]

impressions=df['Impressions'][0]

spent=df['Spent'][0]

CTR=df['CTR'][0]

CPC=df['CPC'][0]

CPI=df['CPI'][0]

print('CTR:',CTR,clicks/impressions)

print('CPC:',CPC,spent/clicks)

print('CPI:',CPI,spent/impressions)

Thisyieldsthefollowingoutput:

CTR: 0.1251087506525039 0.1251087506525039
CPC: 0.6295840393899671 0.6295840393899671
CPI: 0.0787664725988356 0.0787664725988356

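This is exactly what we saw in the previous output. Of course, I wouldn't normally need to do this, but I wanted to show you how you can perform calculations this way. You can access a Series (a column) by passing its name to DataFrame, in square brackets, and then you access each row by its index label. Since our index is the default 0, 1, 2, and so on, this reads exactly as positional access would with a regular list or tuple (for explicitly positional access, regardless of the labels, you can use .iloc).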

We'realmostdonewithourDataFrame.Allwearemissingnowisacolumnthattellsusthedurationofthecampaignandacolumnthattellsuswhichdayoftheweekcorrespondstothestartdateofeachcampaign.Thisallowsmetoexpandonhowtoplaywithdateobjects:

#18

def get_day_of_the_week(day):
    number_to_day = dict(enumerate(calendar.day_name, 1))
    return number_to_day[day.isoweekday()]

def get_duration(row):
    return (row['End'] - row['Start']).days

df['DayofWeek'] = df['Start'].apply(get_day_of_the_week)
df['Duration'] = df.apply(get_duration, axis=1)

Weusedtwodifferenttechniquesherebutfirst,thecode.

get_day_of_the_weektakesadateobject.Ifyoucannotunderstandwhatitdoes,pleasetakeafewmomentstotrytounderstandforyourselfbeforereadingtheexplanation.Usetheinside-outtechniquelikewe'vedoneafewtimesbefore.

So,asI'msureyouknowbynow,ifyouputcalendar.day_nameinalistcall,youget['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'].Thismeansthat,ifweenumeratecalendar.day_namestartingfrom1,wegetpairssuchas(1,'Monday'),(2,'Tuesday'),andsoon.Ifwefeedthesepairstoadictionary,wegetamappingbetweenthedaysoftheweekasnumbers(1,2,3,...)andtheirnames.Whenthemappingiscreated,inordertogetthenameofaday,wejustneedtoknowitsnumber.Togetit,wecalldate.isoweekday(),whichtellsuswhichdayoftheweekthatdateis(asanumber).Youfeedthatintothemappingand,boom!Youhavethenameoftheday.

get_durationisinterestingaswell.First,noticeittakesanentirerow,notjustasinglevalue.Whathappensinitsbodyisthatweperformasubtractionbetweenacampaign'sendandstartdates.Whenyousubtractdateobjects,theresultisatimedeltaobject,whichrepresentsagivenamountoftime.Wetakethevalueofits.daysproperty.Itisassimpleasthat.
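If you want to convince yourself of how these two pieces work without involving pandas at all, you can try them on a single pair of dates in a plain Python console. This small sketch uses only the standard library, with the dates from the first campaign shown earlier:

```python
import calendar
from datetime import date

start = date(2019, 3, 24)
end = date(2020, 11, 6)

# Map ISO weekday numbers (1 = Monday ... 7 = Sunday) to day names.
number_to_day = dict(enumerate(calendar.day_name, 1))

delta = end - start  # subtracting two dates gives a timedelta
print(type(delta).__name__)
print(number_to_day[start.isoweekday()], delta.days)
```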

Now,wecanintroducethefunpart,theapplicationofthosetwofunctions.

ThefirstapplicationisperformedonaSeriesobject,likewedidbeforefor'user'and'cmp_name';thereisnothingnewhere.

ThesecondoneisappliedtothewholeDataFrameand,inordertoinstructpandastoperformthatoperationontherows,wepassaxis=1.

Wecanverifytheresultsveryeasily,asshownhere:

#19

df[['Start','End','Duration','DayofWeek']].head(3)

Theprecedingcodeyieldsthefollowingoutput:

        Start         End  Duration DayofWeek
0  2019-03-24  2020-11-06       593    Sunday
1  2017-05-21  2018-07-24       429    Sunday
2  2017-12-18  2018-02-08        52    Monday

So,wenowknowthatbetweenthe24thofMarch,2019andthe6thofNovember,2020thereare593days,andthatthe24thofMarch,2019isaSunday.

Ifyou'rewonderingwhatthepurposeofthisis,I'llprovideanexample.ImaginethatyouhaveacampaignthatistiedtoasportseventthatusuallytakesplaceonaSunday.Youmaywanttoinspectyourdataaccordingtothedayssothatyoucancorrelatethemtothevariousmeasurementsyouhave.We'renotgoingtodoitinthisproject,butitwasusefultosee,ifonlyforthedifferentwayofcallingapply()onDataFrame.

Cleaning everything up

Now that we have everything we want, it's time to do the final cleaning; remember we still have the 'cmp_name' and 'user' columns. Those are useless now, so they have to go. Also, I want to reorder the columns in DataFrame so that it is more relevant to the data it now contains. In order to do this, we just need to filter df on the column list we want. We'll get back a brand new DataFrame that we can reassign to df itself:

#20

final_columns=[

'Type','Start','End','Duration','DayofWeek','Budget',

'Currency','Clicks','Impressions','Spent','CTR','CPC',

'CPI','TargetAge','TargetGender','Username','Email',

'Name','Gender','Age'

]

df=df[final_columns]

Ihavegroupedthecampaigninformationatthebeginning,thenthemeasurements,andfinallytheuserdataattheend.NowourDataFrameiscleanandreadyforustoinspect.

Beforewestartgoingcrazywithgraphs,whatabouttakingasnapshotofDataFramesothatwecaneasilyreconstructitfromafile,ratherthanhavingtoredoallthestepswedidtogethere.Someanalystsmaywanttohaveitinspreadsheetform,todoadifferentkindofanalysisthantheonewewanttodo,solet'sseehowtosaveDataFrametoafile.It'seasierdonethansaid.

Saving the DataFrame to a file

We can save DataFrame in many different ways. You can type df.to_ and then press Tab to make autocompletion pop up, to see all the possible options.

We're going to save DataFrame in three different formats, just for fun. First, CSV:

#21

df.to_csv('df.csv')

ThenJSON:

#22

df.to_json('df.json')

Andfinally,inanExcelspreadsheet:

#23

df.to_excel('df.xlsx')

TheCSVfilelookslikethis(outputtruncated):

,Type,Start,End,Duration,DayofWeek,Budget,Currency,Clicks,Im

0,KTR,2019-03-24,2020-11-06,593,Sunday,847110,EUR,62554,499997

1,GRZ,2017-05-21,2018-07-24,429,Sunday,510835,GBP,36176,500001

2,KTR,2017-12-18,2018-02-08,52,Monday,720897,GBP,62299,500001,

AndtheJSONonelookslikethis(again,outputtruncated):

{

"Age":{

"0":29,

"1":29,

"10":80,

So,it'sextremelyeasytosaveDataFrameinmanydifferentformats,andthegoodnewsisthatthereverseisalsotrue:it'sveryeasytoloadaspreadsheetintoDataFrame.Theprogrammersbehindpandaswentalongwaytoeaseourtasks,somethingtobegratefulfor.
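As a small illustration of the reverse direction, here is a sketch of a CSV round trip. To keep it self-contained it writes to an in-memory buffer rather than to df.csv, and it uses a made-up two-row frame; with the real file you would pass the filename to read_csv instead:

```python
import io

import pandas as pd

original = pd.DataFrame({
    'Type': ['KTR', 'GRZ'],
    'Start': pd.to_datetime(['2019-03-24', '2017-05-21']),
    'Duration': [593, 429],
})

buffer = io.StringIO()
original.to_csv(buffer)  # same call as in #21, but into memory
buffer.seek(0)

# index_col=0 restores the saved index instead of adding a new one;
# parse_dates turns the 'Start' strings back into dates.
restored = pd.read_csv(buffer, index_col=0, parse_dates=['Start'])
print(restored.dtypes)
```

Without parse_dates, the Start column would come back as plain strings, since CSV has no notion of a date type.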

Visualizing the results

Finally, the juicy bits. In this section, we're going to visualize some results. From a data science perspective, I'm not very interested in going deep into analysis, especially because the data is completely random, but still, this code will get you started with graphs and other features.

Something I learned in my life, and this may come as a surprise to you, is that looks also count, so it's very important that when you present your results, you do your best to make them pretty.

First, we tell the Notebook to render matplotlib graphs in the cell output frame, which is convenient. We do it with the following:

#24

%matplotlib inline

Then,weproceedwithsomestyling:

#25

import matplotlib.pyplot as plt
plt.style.use(['classic', 'ggplot'])
import pylab
pylab.rcParams.update({'font.family': 'serif'})

Itspurposeistomakethegraphswewilllookatinthissectionalittlebitprettier.YoucanalsoinstructtheNotebooktodothiswhenyoustartitfromtheconsolebypassingaparameter,butIwantedtoshowyouthiswaytoosinceitcanbeannoyingtohavetorestarttheNotebookjustbecauseyouwanttoplotsomething.Inthisway,youcandoitontheflyandthenkeepworking.

Wealsousepylabtosetthefont.familytoserif.Thismightnotbenecessaryonyoursystem.TrytocommentitoutandexecutetheNotebook,andseewhetheranythingchanges.

NowthatDataFrameiscomplete,let'srundf.describe()(#26)again.Theresultsshouldlooksomethinglikethis:

Thiskindofquickresultisperfectforsatisfyingthosemanagerswhohave20secondstodedicatetoyouandjustwantroughnumbers.

Onceagain,pleasekeepinmindthatourcampaignshavedifferentcurrencies,sothesenumbersareactuallymeaningless.ThepointhereistodemonstratetheDataFramecapabilities,nottogettoacorrectordetailedanalysisofrealdata.

Alternatively,agraphisusuallymuchbetterthanatablewithnumbersbecauseit'smucheasiertoreaditanditgivesyouimmediatefeedback.So,let'sgraphoutthefourpiecesofinformationwehaveoneachcampaign—'Budget','Spent','Clicks',and'Impressions':

#27

df[['Budget','Spent','Clicks','Impressions']].hist(

bins=16,figsize=(16,6));

Weextrapolatethosefourcolumns(thiswillgiveusanotherDataFramemadewithonlythosecolumns)andcallthehistogramhist()methodonit.Wegivesomemeasurementsonthebinsandfiguresizes,butbasically,everythingisdoneautomatically.

Oneimportantthing:sincethisinstructionistheonlyoneinthiscell(whichalsomeans,it'sthelastone),theNotebookwillprintitsresultbeforedrawingthegraph.Tosuppressthisbehaviorandhaveonlythegraphdrawnwithnoprinting,justaddasemicolonattheend(youthoughtIwasreminiscingaboutJava,didn'tyou?).Herearethegraphs:

Theyarebeautiful,aren'tthey?Didyounoticetheseriffont?Howaboutthemeaningofthosefigures?Ifyougobackandtakealookatthewaywegeneratethedata,youwillseethatallthesegraphsmakeperfectsense:

Budget is simply a random integer in an interval, therefore we were expecting a uniform distribution, and there we have it; it's practically a constant line.

Spent is a uniform distribution as well, but the high end of its interval is the budget, which is moving. This means we should expect something such as a quadratic hyperbola that decreases to the right. And there it is as well.

Clicks was generated with a triangular distribution with a mean roughly 20% of the interval size, and you can see that the peak is right there, at about 20% to the left.

Impressions was a Gaussian distribution, which is the one that assumes the famous bell shape. The mean was exactly in the middle and we had a standard deviation of 2. You can see that the graph matches those parameters.

Good!Let'splotoutthemeasureswecalculated:

#28

df[['CTR','CPC','CPI']].hist(

bins=20,figsize=(16,6))

Hereistheplotrepresentation:

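We can see that the CPC distribution is heavily concentrated on the left, with a long tail toward high values (in statistical terms, it is right-skewed), meaning that most of the CPC values are very low. The CPI has a similar shape, but is less extreme.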

Now,allthisisnice,butifyouwantedtoanalyzeonlyaparticularsegmentofthedata,howwouldyoudoit?WecanapplyamasktoDataFramesothatwegetanotheronewithonlytherowsthatsatisfythemaskcondition.It'slikeapplyingaglobal,row-wiseifclause:

#29

mask=(df.Spent>0.75*df.Budget)

df[mask][['Budget','Spent','Clicks','Impressions']].hist(

bins=15,figsize=(16,6),color='g');

In this case, I prepared mask to filter out all the rows for which the amount spent is less than or equal to 75% of the budget. In other words, we'll include only those campaigns for which we have spent more than three-quarters of the budget. Notice that in mask, I am showing you an alternative way of asking for a DataFrame column, by using direct property access (object.property_name), instead of dictionary-like access (object['property_name']). If property_name is a valid Python name, and doesn't clash with an existing DataFrame attribute or method, you can use both ways interchangeably (JavaScript works like this as well).

maskisappliedinthesamewaythatweaccessadictionarywithakey.WhenyouapplymasktoDataFrame,yougetbackanotheroneandweselectonlytherelevantcolumnsonthisandcallhist()again.Thistime,justforfun,wewanttheresultstobegreen:

Notethattheshapesofthegraphshaven'tchangedmuch,apartfromtheSpentgraph,whichisquitedifferent.Thereasonforthisisthatwe'veaskedonlyfortherowswheretheamountspentisatleast75%ofthebudget.Thismeansthatwe'reincludingonlytherowswheretheamountspentisclosetothebudget.Thebudgetnumberscomefromauniformdistribution.Therefore,itisquiteobviousthattheSpentgraphisnowassumingthatkindofshape.Ifyoumaketheboundaryeventighterandaskfor85%ormore,you'llseetheSpentgraphbecomemoreandmoreliketheBudgetone.
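If the idea of a mask still feels a bit abstract, it helps to print one. In this sketch, on a made-up three-row frame, you can see that the comparison produces one Boolean per row, and that indexing with it keeps only the rows marked True:

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Budget': [100, 200, 400],
    'Spent': [80, 100, 390],
})

# The comparison is applied row by row and yields a Boolean Series.
mask = df_demo['Spent'] > 0.75 * df_demo['Budget']
print(mask.tolist())

# Indexing with the mask keeps only the rows where it is True.
filtered = df_demo[mask]
print(filtered)
```

Note that filtered keeps the original index labels (0 and 2 here), which is what allows pandas to line it up with the original data later on.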

Let'snowaskforsomethingdifferent.Howaboutthemeasureof'Spent','Clicks',and'Impressions'groupedbydayoftheweek:

#30

df_weekday=df.groupby(['DayofWeek']).sum()

df_weekday[['Impressions','Spent','Clicks']].plot(

figsize=(16,6),subplots=True);

ThefirstlinecreatesanewDataFrame,df_weekday,byaskingforagroupingby'DayofWeek'ondf.Thefunctionusedtoaggregatethedataisanaddition.

Thesecondlinegetsasliceofdf_weekdayusingalistofcolumnnames,somethingwe'reaccustomedtobynow.Ontheresult,wecallplot(),whichisabitdifferenttohist().Thesubplots=Trueoptionmakesplotdrawthreeindependentgraphs:

Interestinglyenough,wecanseethatmostoftheactionhappensonSundaysandWednesdays.Ifthisweremeaningfuldata,thiswouldpotentiallybeimportantinformationtogivetoourclients,whichiswhyI'mshowingyouthisexample.

Notethatthedaysaresortedalphabetically,whichscramblesthemupabit.Canyouthinkofaquicksolutionthatwouldfixtheissue?I'llleaveittoyouasanexercisetocomeupwithsomething.

Let'sfinishthispresentationsectionwithacouplemorethings.First,asimpleaggregation.Wewanttoaggregateon'TargetGender'and'TargetAge',andshow'Impressions'and'Spent'.Forboth,wewanttosee'mean'andthestandarddeviation('std'):

#31

agg_config={

'Impressions':['mean','std'],

'Spent':['mean','std'],

}

df.groupby(['TargetGender','TargetAge']).agg(agg_config)

It'sveryeasytodo.Wewillprepareadictionarythatwe'lluseasaconfiguration.Then,weperformagroupingonthe'TargetGender'and'TargetAge'columns,andwepassourconfigurationdictionarytotheagg()method.Theresultistruncatedandrearrangedalittlebittomakeitfit,andshownhere:

                          Impressions                     Spent
                                 mean       std            mean
TargetGender TargetAge
B            20-25      499999.741573  1.904111   218917.000000
             20-30      499999.618421  2.039393   237180.644737
             20-35      499999.358025  2.039048   256378.641975
...          ...                  ...       ...             ...
M            20-25      499999.355263  2.108421   277232.276316
             20-30      499999.635294  2.075062   252140.117647
             20-35      499999.835821  1.871614   308598.149254

Thisisthetextualrepresentation,ofcourse,butyoucanalsohavetheHTMLone.

Let'sdoonemorethingbeforewewrapthischapterup.Iwanttoshowyousomethingcalledapivottable.It'skindofabuzzwordinthedataenvironment,soanexamplesuchasthisone,albeitverysimple,isamust:

#32

pivot=df.pivot_table(

values=['Impressions','Clicks','Spent'],

index=['TargetAge'],

columns=['TargetGender'],

aggfunc=np.sum

)

pivot

Wecreateapivottablethatshowsusthecorrelationbetween'TargetAge'and'Impressions','Clicks',and'Spent'.Theselastthreewillbesubdividedaccordingto'TargetGender'.Theaggregationfunction(aggfunc)usedtocalculatetheresultsisthenumpy.sumfunction(numpy.meanwouldbethedefault,hadInotspecifiedanything).

Aftercreatingthepivottable,wesimplyprintitwiththelastlineinthecell,andhere'sacropoftheresult:

It'sprettyclearandprovidesveryusefulinformationwhenthedataismeaningful.
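Since the real output is quite large, here is a sketch of the same call on a made-up four-row frame, small enough to check by hand. It pivots a single value column and passes the aggregation as the string 'sum' rather than np.sum, which pandas also accepts:

```python
import pandas as pd

df_demo = pd.DataFrame({
    'TargetAge': ['20-25', '20-25', '30-35', '30-35'],
    'TargetGender': ['F', 'M', 'F', 'F'],
    'Clicks': [10, 20, 30, 40],
})

pivot_demo = df_demo.pivot_table(
    values='Clicks',
    index='TargetAge',       # one row per age range
    columns='TargetGender',  # one column per gender
    aggfunc='sum',
)
print(pivot_demo)
```

The two '30-35'/'F' rows are summed into a single cell, while the '30-35'/'M' cell is empty (NaN) because no row has that combination.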

That's it! I'll leave you to discover more about the wonderful world of IPython, Jupyter, and data science. I strongly encourage you to get comfortable with the Notebook environment. It's much better than a console, it's extremely practical and fun to use, and you can even create slides and documents with it.

Where do we go from here?

Data science is indeed a fascinating subject. As I said in the introduction, those who want to delve into its meanders need to be well-trained in mathematics and statistics. Working with data that has been interpolated incorrectly renders any result about it useless. The same goes for data that has been extrapolated incorrectly or sampled with the wrong frequency. To give you an example, imagine a population of individuals that are aligned in a queue. If for some reason, the gender of that population alternated between male and female, the queue would be something like this: F-M-F-M-F-M-F-M-F...

Ifyousampledittakingonlytheevenelements,youwoulddrawtheconclusionthatthepopulationwasmadeuponlyofmales,whilesamplingtheoddoneswouldtellyouexactlytheopposite.
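The queue example takes only a few lines to reproduce. In this sketch, positions are counted from 1, so the even elements are the 2nd, 4th, and so on:

```python
# A queue that strictly alternates: F-M-F-M-...
queue = ['F', 'M'] * 5

odd_elements = queue[0::2]   # 1st, 3rd, 5th, ... person in the queue
even_elements = queue[1::2]  # 2nd, 4th, 6th, ... person in the queue

print(set(even_elements))
print(set(odd_elements))
```

Both samples contain half of the population, yet each one gives a completely wrong picture of it, because the sampling frequency matches the pattern in the data.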

Ofcourse,thiswasjustasillyexample,Iknow,butit'sveryeasytomakemistakesinthisfield,especiallywhendealingwithbigdatawheresamplingismandatoryandtherefore,thequalityoftheintrospectionyoumakedepends,firstandforemost,onthequalityofthesamplingitself.

WhenitcomestodatascienceandPython,thesearethemaintoolsyouwanttolookat:

NumPy (http://www.numpy.org/): This is the main package for scientific computing with Python. It contains a powerful N-dimensional array object, sophisticated (broadcasting) functions, tools for integrating C/C++ and Fortran code, useful linear algebra, the Fourier transform, random number capabilities, and much more.

Scikit-Learn (http://scikit-learn.org/): This is probably the most popular machine learning library in Python. It has simple and efficient tools for data mining and data analysis, accessible to everybody, and reusable in various contexts. It's built on NumPy, SciPy, and Matplotlib.

Pandas (http://pandas.pydata.org/): This is an open source, BSD-licensed library providing high-performance, easy-to-use data structures, and data analysis tools. We've used it throughout this chapter.

IPython (http://ipython.org/) / Jupyter (http://jupyter.org/): These provide a rich architecture for interactive computing.

Matplotlib (http://matplotlib.org/): This is a Python 2-D plotting library that produces publication-quality figures in a variety of hard-copy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shell, Jupyter Notebook, web application servers, and four graphical user interface toolkits.

Numba (http://numba.pydata.org/): This gives you the power to speed up your applications with high-performance functions written directly in Python. With a few annotations, array-oriented and math-heavy Python code can be just-in-time compiled to native machine instructions, similar in performance to C, C++, and Fortran, without having to switch languages or Python interpreters.

Bokeh (http://bokeh.pydata.org/): This is a Python-interactive visualization library that targets modern web browsers for presentation. Its goal is to provide elegant, concise construction of novel graphics in the style of D3.js, but also deliver this capability with high-performance interactivity over very large or streaming datasets.

Otherthanthesesinglelibraries,youcanalsofindecosystems,suchasSciPy(http://scipy.org/)andtheaforementionedAnaconda(https://anaconda.org/),thatbundleseveraldifferentpackagesinordertogiveyousomethingthatjustworksinan"out-of-the-box"fashion.

Installingallthesetoolsandtheirseveraldependenciesishardonsomesystems,soIsuggestthatyoutryoutecosystemsaswelltoseewhetheryouarecomfortablewiththem.Itmaybeworthit.

Summary

In this chapter, we talked about data science. Rather than attempting to explain anything about this extremely wide subject, we delved into a project. We familiarized ourselves with the Jupyter Notebook, and with different libraries, such as Pandas, Matplotlib, and NumPy.

Ofcourse,havingtocompressallthisinformationintoonesinglechaptermeansIcouldonlytouchbrieflyonthesubjectsIpresented.Ihopetheprojectwe'vegonethroughtogetherhasbeencomprehensiveenoughtogiveyouanideaofwhatcouldpotentiallybetheworkflowyoumightfollowwhenworkinginthisfield.

Thenextchapterisdedicatedtowebdevelopment.So,makesureyouhaveabrowserreadyandlet'sgo!

Web Development

"Don't believe everything you read on the web."

– Confucius

Inthischapter,we'regoingtoworkonawebsitetogether.Byworkingonasmallproject,myaimistoopenawindowforyoutotakeapeekintowhatwebdevelopmentis,alongwiththemainconceptsandtoolsyoushouldknowifyouwanttobesuccessfulwithit.

Inparticular,wearegoingtoexplorethefollowing:

The basic concepts around web programming

The Django web framework

Regular expressions

A brief overview of the Flask and Falcon web frameworks

Let'sstartwiththefundamentals.

Whatistheweb?

TheWorldWideWeb(WWW),orsimplytheweb,isawayofaccessinginformationthroughtheuseofamediumcalledtheinternet.Theinternetisahugenetworkofnetworks,anetworkinginfrastructure.Itspurposeistoconnectbillionsofdevicestogether,allaroundtheglobe,sothattheycancommunicatewithoneanother.Informationtravelsthroughtheinternetinarichvarietyoflanguages,calledprotocols,thatallowdifferentdevicestospeakthesametongueinordertosharecontent.

Thewebisaninformation-sharingmodel,builtontopoftheinternet,whichemploystheHypertextTransferProtocol(HTTP)asabasisfordatacommunication.Theweb,therefore,isjustoneofseveraldifferentwaysinformationcanbeexchangedovertheinternet;email,instantmessaging,newsgroups,andsoon,allrelyondifferentprotocols.

How does the web work?

In a nutshell, HTTP is an asymmetric request-response client-server protocol. An HTTP client sends a request message to an HTTP server. The server, in turn, returns a response message. In other words, HTTP is a pull protocol in which the client pulls information from the server (as opposed to a push protocol, in which the server pushes information down to the client). Take a look at the following diagram:

HTTPisbasedonTCP/IP(ortheTransmissionControlProtocol/InternetProtocol),whichprovidesthetoolsforareliablecommunicationexchange.

AnimportantfeatureoftheHTTPprotocolisthatit'sstateless.Thismeansthatthecurrentrequesthasnoknowledgeaboutwhathappenedinpreviousrequests.Thisisalimitation,butyoucanbrowseawebsitewiththeillusionofbeingloggedin.Underthecoversthough,whathappensisthat,onlogin,atokenofuserinformationissaved(mostoftenontheclientside,inspecialfilescalledcookies)sothateachrequesttheusermakescarriesthemeansfortheservertorecognizetheuserandprovideacustominterfacebyshowingtheirname,keepingtheirbasketpopulated,andsoon.

Eventhoughit'sveryinteresting,we'renotgoingtodelveintotherichdetailsofHTTPandhowitworks.However,we'regoingtowriteasmallwebsite,whichmeanswe'llhavetowritethecodetohandleHTTPrequestsandreturnHTTPresponses.Iwon'tkeepprependingHTTPtothetermsrequestandresponsefromnowon,asItrusttherewon'tbeanyconfusion.

The Django web framework

For our project, we're going to use one of the most popular web frameworks you can find in the Python ecosystem: Django.

Awebframeworkisasetoftools(libraries,functions,classes,andsoon)thatwecanusetocodeawebsite.Weneedtodecidewhatkindofrequestswewanttoallowtobeissuedagainstourwebserverandhowwerespondtothem.Awebframeworkistheperfecttoolfordoingthatbecauseittakescareofmanythingsforussothatwecanconcentrateonlyontheimportantbitswithouthavingtoreinventthewheel.

Therearedifferenttypesofframeworks.Notallofthemaredesignedforwritingcodefortheweb.Ingeneral,aframeworkisatoolthatprovidesfunctionalitiestofacilitatethedevelopmentofsoftwareapplications,products,andsolutions.

Djangodesignphilosophy

Djangoisdesignedaccordingtothefollowingprinciples:

Don't repeat yourself (DRY): Don't repeat code, and code in a way that makes the framework deduce as much as possible from as little as possible.

Loose coupling: The various layers of the framework shouldn't know about each other (unless absolutely necessary for whatever reason). Loose coupling works best when paralleled with high cohesion, which means putting together things that change for the same reason, and spreading apart those that change for different reasons.

Less code: Applications should use the least possible amount of code, and be written in a way that favors reuse as much as possible.

Consistency: When using the Django framework, regardless of which layer you're coding against, your experience will be very consistent with the design patterns and paradigms that were chosen to lay out the project.

Theframeworkitselfisdesignedaroundthemodel-template-view(MTV)pattern,whichisavariantofmodel-view-controller(MVC),whichiswidelyemployedbyotherframeworks.Thepurposeofsuchpatternsistoseparateconcernsandpromotecodereuseandquality.

The model layer

Of the three layers, this is the one that defines the structure of the data that is handled by the application, and deals with data sources. A model is a class that represents a data structure. Through some Django magic, models are mapped to database tables so that you can store your data in a relational database.

Arelationaldatabasestoresdataintablesinwhicheachcolumnisapropertyofthedataandeachrowrepresentsasingleitemorentryinthecollectionrepresentedbythattable.Throughtheprimarykeyofeachtable,whichisthatpartofthedatathatallowsittouniquelyidentifyeachitem,itispossibletoestablishrelationshipsbetweenitemsbelongingtodifferenttables,thatis,toputthemintorelation.

Thebeautyofthissystemisthatyoudon'thavetowritedatabase-specificcodeinordertohandleyourdata.Youjusthavetoconfigureyourmodelscorrectlyandusethem.TheworkonthedatabaseisdoneforyoubytheDjangoobject-relationalmapping(ORM),whichtakescareoftranslatingoperationsdoneonPythonobjectsintoalanguagethatarelationaldatabasecanunderstand:SQL(orStructuredQueryLanguage).WesawanexampleofORMinChapter7,FilesandDataPersistence,whereweexploredSQLAlchemy.

Onebenefitofthisapproachisthatyouwillbeabletochangedatabaseswithoutrewritingyourcode,sinceallthedatabase-specificcodeisproducedbyDjangoonthefly,accordingtowhichdatabaseit'sconnectedto.RelationaldatabasesspeakSQL,buteachofthemhasitsownuniqueflavorofit;therefore,nothavingtohardcodeanySQLinourapplicationisatremendousadvantage.

Django allows you to modify your models at any time. When you do, you can run a command that creates a migration, which is the set of instructions needed to bring the database into a state that represents the current definition of your models.

To summarize, this layer deals with defining the data structures you need to handle in your website and gives you the means to save them to and load them from the database by simply accessing the models, which are Python objects.

The view layer

The function of a view is handling a request, performing whatever action needs to be carried out, and eventually returning a response. For example, if you open your browser and request a page corresponding to a category of products in an e-commerce shop, the view will likely talk to the database, asking for all the categories that are children of the selected category (for example, to display them in a navigation sidebar) and for all the products that belong to the selected category, in order to display them on the page.

Therefore, the view is the mechanism through which we can fulfill a request. Its result, the response object, can assume several different forms: a JSON payload, text, an HTML page, and so on. When you code a website, your responses usually consist of HTML or JSON.

The Hypertext Markup Language, or HTML, is the standard markup language used to create web pages. Web browsers run engines that are capable of interpreting HTML code and rendering it into what we see when we open a page of a website.

The template layer

This is the layer that provides the bridge between backend and frontend development. When a view has to return HTML, it usually does it by preparing a context object (a dictionary) with some data, and then it feeds this context to a template, which is rendered (that is to say, transformed into HTML), and returned to the caller in the form of a response (more precisely, the body of the response). This mechanism allows for maximum code reuse. If you go back to the category example, it's easy to see that, if you browse a website that sells products, it doesn't really matter which category you click on or what type of search you perform, the layout of the products page doesn't change. What does change is the data with which that page is populated.

Therefore, the layout of the page is defined by a template, which is written in a mixture of HTML and the Django template language. The view that serves that page collects all the products to be displayed in the context dictionary, and feeds it to the template, which will be rendered into an HTML page by the Django template engine.
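The "one layout, many contexts" idea can be sketched without Django at all. The snippet below uses the standard library's string.Template purely as a stand-in for the Django template engine (which is far more capable); the render_products_page function and the sample categories are assumptions made for the sake of the example.

```python
from string import Template

# One layout, defined once. Only the data changes between pages.
page_template = Template("<h1>$category</h1>\n<ul>$items</ul>")


def render_products_page(context):
    """Turn a context dictionary into an HTML string."""
    items = "".join(f"<li>{name}</li>" for name in context["products"])
    return page_template.substitute(
        category=context["category"], items=items)


# Two different contexts fed to the same template.
books = render_products_page(
    {"category": "Books", "products": ["Learn Python"]})
toys = render_products_page(
    {"category": "Toys", "products": ["Robot", "Kite"]})
print(books)
print(toys)
```

Both pages share exactly the same structure; the view's job is only to prepare the dictionary.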

The Django URL dispatcher

The way Django associates a Uniform Resource Locator (URL) with a view is by matching the requested URL with the patterns that are registered in a special file. A URL represents a page in a website so http://mysite.com/categories?id=123 would probably point to the page for the category with ID 123 on my website, while https://mysite.com/login would probably be the user login page.
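If you are curious about what a URL is made of, the standard library can take one apart for you. This is only a small illustration using urllib.parse on the example URL above; it is not how Django itself processes requests.

```python
from urllib.parse import parse_qs, urlsplit

parts = urlsplit("http://mysite.com/categories?id=123")

# The path is the part a URL dispatcher would match against its patterns.
print(parts.scheme)   # protocol: http or https
print(parts.netloc)   # the host
print(parts.path)     # the path on that host

# The query string carries extra parameters, such as the category ID.
query = parse_qs(parts.query)
print(query)
```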

The difference between HTTP and HTTPS is that the latter adds encryption to the protocol so that the data that you exchange with the website is secured. When you put your credit card details on a website, or log in anywhere, or do anything around sensitive data, you want to make sure that you're using HTTPS.

Regular expressions

The way Django matches URLs to patterns is through a regular expression. A regular expression is a sequence of characters that defines a search pattern with which we can carry out operations, such as pattern and string matching, and find/replace.

Regular expressions have a special syntax to indicate things such as digits, letters, and spaces, as well as how many times we expect a character to appear, and much more. A complete explanation of this topic is outside the scope of this book. However, it is a very important subject, so the project we're going to work on together will revolve around it, in the hope that you will be stimulated to find the time to explore it a bit more on your own.

To give you a quick example, imagine that you wanted to specify a pattern to match a date, such as "26-12-1947". This string consists of two digits, one dash, two digits, one dash, and finally four digits. Therefore, we could write it like this: r'[0-9]{2}-[0-9]{2}-[0-9]{4}'. We created a character class by using square brackets, and we defined a range of digits inside, from 0 to 9, hence all the possible digits. Then, between curly brackets, we say that we expect two of them. Then a dash, then we repeat this pattern once as it is, and once more, by changing how many digits we expect, and without the final dash. Having a class such as [0-9] is such a common pattern that a special notation has been created as a shortcut: '\d'. Therefore, we can rewrite the pattern like this: r'\d{2}-\d{2}-\d{4}' and, for the ASCII digits used here, it will match exactly the same strings (on Python 3 strings, \d also matches other Unicode decimal digits). That r in front of the string stands for raw, and its purpose is to prevent Python from trying to interpret backslash escape sequences, so that they can be passed as-is to the regular expression engine.
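You can try both versions of the date pattern in a console with the re module. The following is just a quick sketch; the candidate strings are made up, and I use fullmatch so that the whole string has to fit the pattern.

```python
import re

long_form = re.compile(r'[0-9]{2}-[0-9]{2}-[0-9]{4}')
short_form = re.compile(r'\d{2}-\d{2}-\d{4}')

candidates = ('26-12-1947', '1947-12-26', '26/12/1947')

# For each candidate, record whether each pattern matches the whole string.
results = {
    text: (
        long_form.fullmatch(text) is not None,
        short_form.fullmatch(text) is not None,
    )
    for text in candidates
}

for text, (long_ok, short_ok) in results.items():
    print(text, long_ok, short_ok)
```

Only the first candidate has the two digits, dash, two digits, dash, four digits shape, and both patterns agree on all three strings.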

A regex website

So, here we are. We'll code a website that stores regular expressions so that we'll be able to play with them a little bit.

Before we proceed with creating the project, I'd like to talk about Cascading Style Sheets (CSS). CSS files are where we specify how the various elements on an HTML page look. You can set all sorts of properties, such as shape, size, color, margins, borders, and fonts. In this project, I have tried my best to achieve a decent result on the pages, but I'm neither a frontend developer nor a designer, so please don't pay too much attention to how things look. Try to focus on how they work.

Setting up Django

On the Django website (https://www.djangoproject.com/), you can follow the tutorial, which gives you a pretty good idea of Django's capabilities. If you want, you can follow that tutorial first and then come back to this example. So, first things first; let's install Django in your virtual environment (you will find it is already installed, as it is part of the requirements file):

$ pip install django

When this command is done, you can test it within a console (try doing it with bpython, it gives you a shell similar to IPython but with nice introspection capabilities):

>>> import django
>>> django.VERSION
(2, 0, 5, 'final', 0)

Now that Django is installed, we're good to go. We'll have to do some scaffolding, so I'll quickly guide you through that.

Starting the project

Choose a folder in the book's environment and change into that. I'll use ch14. From there, we can start a Django project with the following command:

$ django-admin startproject regex

This will prepare the skeleton for a Django project called regex. Change into the regex folder and run the following:

$ python manage.py runserver

You should be able to go to http://127.0.0.1:8000/ with your browser and see the It worked! default Django page. This means that the project is correctly set up. When you've seen the page, kill the server with Ctrl + C (or whatever it says in the console). I'll paste the final structure for the project now so that you can use it as a reference:

$ tree -A regex  # from the ch14 folder
regex
├── entries
│   ├── __init__.py
│   ├── admin.py
│   ├── forms.py
│   ├── migrations
│   │   ├── 0001_initial.py
│   │   └── __init__.py
│   ├── models.py
│   ├── static
│   │   └── entries
│   │       └── css
│   │           └── main.css
│   ├── templates
│   │   └── entries
│   │       ├── base.html
│   │       ├── footer.html
│   │       ├── home.html
│   │       ├── insert.html
│   │       └── list.html
│   └── views.py
├── manage.py
└── regex
    ├── __init__.py
    ├── settings.py
    ├── urls.py
    └── wsgi.py

Don't worry if you're missing files, we'll get there. A Django project is typically a collection of several different applications. Each application is meant to provide a functionality in a self-contained, reusable fashion. We'll create just one, called entries:

$ python manage.py startapp entries

Within the entries folder that has been created, you can get rid of the tests.py module.

Now, let's fix the regex/settings.py file in the regex folder. We need to add our application to the INSTALLED_APPS list so that we can use it (add it at the bottom of the list):

INSTALLED_APPS = [
    'django.contrib.admin',
    ...
    'entries',
]

Then, you may want to fix the language and time zone according to your personal preference. I live in London, so I set them like this:

LANGUAGE_CODE = 'en-gb'
TIME_ZONE = 'Europe/London'

There is nothing else to do in this file, so you can save and close it.

Now it's time to apply the migrations to the database. Django needs database support to handle users, sessions, and things like that, so we need to create a database and populate it with the necessary data. Luckily, this is very easily done with the following command:

$ python manage.py migrate

For this project, we use an SQLite database, which is basically just a file. On a real project, you would use a different database engine, such as MySQL or PostgreSQL.

Creating users

Now that we have a database, we can create a superuser using the console:

$ python manage.py createsuperuser

After entering the username and other details, we have a user with admin privileges. This is enough to access the Django admin section, so try to start the server:

$ python manage.py runserver

This will start the Django development server, which is a very useful built-in web server that you can use while working with Django. Now that the server is running, we can access the admin page at http://localhost:8000/admin/. I will show you a screenshot of this section later. If you log in with the credentials of the user you just created and head to the Authentication and Authorization section, you'll find Users. Open that and you will be able to see the list of users. As an admin, you can edit the details of any user you want. In our case, make sure you create a different one so that there are at least two users in the system (we'll need them later). I'll call the first user Fabrizio (username: fab) and the second one Adriano (username: adri), in honor of my father.

By the way, you should see that the Django admin panel comes for free automatically. You define your models, hook them up, and that's it. This is an incredible tool that shows how advanced Django's introspection capabilities are. Moreover, it is completely customizable and extendable. It's truly an excellent piece of work.

Adding the Entry model

Now that the boilerplate is out of the way, and we have a couple of users, we're ready to code. We start by adding the Entry model to our application so that we can store objects in the database. Here's the code you'll need to add (remember to use the project tree for reference):

# entries/models.py
from django.db import models
from django.contrib.auth.models import User
from django.utils import timezone

class Entry(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    pattern = models.CharField(max_length=255)
    test_string = models.CharField(max_length=255)
    date_added = models.DateTimeField(default=timezone.now)

    class Meta:
        verbose_name_plural = 'entries'

This is the model we'll use to store regular expressions in our system. We'll store a pattern, a test string, a reference to the user who created the entry, and the moment of creation. You can see that creating a model is actually quite easy, but nonetheless, let's go through it line by line.

First we need to import the models module from django.db. This will give us the base class for our Entry model. Django models are special classes and much is done for us behind the scenes when we inherit from models.Model.

We want a reference to the user who created the entry, so we need to import the User model from Django's authentication application, and we also need to import the timezone module to get access to the timezone.now() function, which provides us with a timezone-aware version of datetime.now(). The beauty of this is that it's hooked up with the TIME_ZONE settings I showed you before.

As for the primary key for this class, if we don't set one explicitly, Django will add one for us. A primary key is a key that allows us to uniquely identify an Entry object in the database (in this case, Django will add an auto-incrementing integer ID).

So, we define our class, and we set up four class attributes. We have a ForeignKey attribute that is our reference to the User model. We also have two CharField attributes that hold the pattern and test strings for our regular expressions. We also have a DateTimeField, whose default value is set to timezone.now. Note that we don't call timezone.now right there, it's now, not now(). So, we're not passing a datetime instance (set at the moment in time when that line is executed); rather, we're passing a callable, a function that is called when we save an entry in the database. This is similar to the callback mechanism we used in Chapter 12, GUIs and Scripts, when we were assigning commands to button clicks.
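The difference between passing now and passing now() is easy to see in plain Python. In this sketch, utc_now and resolve_default are hypothetical helpers I wrote to imitate the idea; they are not Django functions, and Django's own handling of defaults is more involved.

```python
import time
from datetime import datetime, timezone


def utc_now():
    """A stand-in for timezone.now: returns an aware datetime."""
    return datetime.now(timezone.utc)


def resolve_default(default):
    """Call the default if it is callable, otherwise use it as-is."""
    return default() if callable(default) else default


frozen = utc_now()    # called right here: one fixed moment in time
deferred = utc_now    # not called: we keep the function itself

time.sleep(0.05)      # pretend some time passes before we "save"

first = resolve_default(frozen)     # always the same old value
second = resolve_default(deferred)  # computed now, at "save" time
print(first, second)
```

With the callable, every new object gets a fresh timestamp; with the precomputed value, all of them would share the moment the module was loaded.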

The last two lines are very interesting. We define a Meta class within the Entry class itself. The Meta class is used by Django to provide all sorts of extra information for a model. Django has a great deal of logic under the hood to adapt its behavior according to the information we put into the Meta class. In this case, in the admin panel, the pluralized version of Entry would be Entrys, which is wrong, therefore we need to set it manually. We specify the plural in all lowercase, as Django takes care of capitalizing it for us when needed.

Now that we have a new model, we need to update the database to reflect the new state of the code. In order to do this, we need to instruct Django that it needs to create the code to update the database. This code is called a migration. Let's create it and execute it:

$ python manage.py makemigrations entries
$ python manage.py migrate

After these two instructions, the database will be ready to store Entry objects.

There are two different kinds of migrations: data and schema migrations. Data migrations port data from one state to another without altering its structure. For example, a data migration could set all products for a category as out of stock by switching a flag to False or 0. A schema migration is a set of instructions that alter the structure of the database schema. For example, that could be adding an age column to a Person table, or increasing the maximum length of a field to account for very long addresses. When developing with Django, it's quite common to have to perform both kinds of migrations over the course of development. Data evolves continuously, especially if you code in an agile environment.
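To see the two kinds of migration side by side, here is a small sketch in raw SQL through the sqlite3 module. Django would generate and track these changes for you; the person and product tables and their rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE product ("
    " id INTEGER PRIMARY KEY, category TEXT, in_stock INTEGER)")
conn.executemany(
    "INSERT INTO product (category, in_stock) VALUES (?, ?)",
    [("toys", 1), ("toys", 1), ("books", 1)])

# Schema migration: the structure of a table changes.
conn.execute("ALTER TABLE person ADD COLUMN age INTEGER")

# Data migration: the structure stays the same, the data changes.
conn.execute("UPDATE product SET in_stock = 0 WHERE category = 'toys'")

# Column names are the second item of each table_info row.
person_columns = [
    row[1] for row in conn.execute("PRAGMA table_info(person)")]
stock = conn.execute(
    "SELECT category, in_stock FROM product ORDER BY id").fetchall()
print(person_columns)
print(stock)
```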

Customizing the admin panel

The next step is to hook the Entry model up with the admin panel. You can do it with one line of code, but in this case, I want to add some options to customize the way the admin panel shows the entries, both in the list view of all entry items in the database and in the form view that allows us to create and modify them.

All we need to do is to add the following code:

# entries/admin.py
from django.contrib import admin
from .models import Entry

@admin.register(Entry)
class EntryAdmin(admin.ModelAdmin):
    fieldsets = [
        ('Regular Expression',
         {'fields': ['pattern', 'test_string']}),
        ('Other Information',
         {'fields': ['user', 'date_added']}),
    ]
    list_display = ('pattern', 'test_string', 'user')
    list_filter = ['user']
    search_fields = ['test_string']

This is simply beautiful. My guess is that you probably already understand most of it, even if you're new to Django.

So, we start by importing the admin module and the Entry model. Because we want to foster code reuse, we import the Entry model using a relative import (there's a dot before models). This will allow us to move or rename the application without too much trouble. Then, we define the EntryAdmin class, which inherits from admin.ModelAdmin. The decoration on the class tells Django to display the Entry model in the admin panel, and what we put in the EntryAdmin class tells Django how to customize the way it handles this model.

First, we specify the fieldsets for the create/edit page. This will divide the page into two sections so that we get a better visualization of the content (pattern and test string) and the other details (user and timestamp) separately.

Then, we customize the way the list page displays the results. We want to see all the fields, but not the date. We also want to be able to filter on the user so that we can have a list of all the entries by just one user, and we want to be able to search on test_string.

I will go ahead and add three entries, one for myself and two on behalf of my father. The result is shown in the next two screenshots. After inserting them, the list page looks like this:

I have highlighted the three parts of this view that we customized in the EntryAdmin class. We can filter by user, we can search, and we have all the fields displayed. If you click on a pattern, the edit view opens up.

After our customization, it looks like this:

Notice how we have two sections: Regular Expression and Other Information, thanks to our custom EntryAdmin class. Have a go with it, add some entries to a couple of different users, get familiar with the interface. Isn't it nice to have all this for free?

Creating the form

Every time you fill in your details on a web page, you're inserting data in form fields. A form is a part of the HTML Document Object Model (DOM) tree. In HTML, you create a form by using the form tag. When you click on the submit button, your browser normally packs the form data together and puts it in the body of a POST request. As opposed to GET requests, which are used to ask the web server for a resource, a POST request normally sends data to the web server with the aim of creating or updating a resource. For this reason, handling POST requests usually requires more care than GET requests.
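The following sketch builds, but never sends, a GET and a POST request with the standard library's urllib, just to show where form data ends up. The URLs and form values are placeholders that mirror the project we are building; no server needs to be running.

```python
from urllib.parse import urlencode
from urllib.request import Request

# A GET request simply asks for a resource: it has no body.
get_request = Request("http://localhost:8000/entries/")

# A browser encodes the form fields and puts them in the request body.
form_data = urlencode(
    {"pattern": r"\d+", "test_string": "abc123"}).encode("ascii")
post_request = Request(
    "http://localhost:8000/entries/insert", data=form_data)

print(get_request.get_method(), get_request.data)
print(post_request.get_method(), post_request.data)
```

Notice that urllib switches to POST on its own as soon as the request carries data, which is close to what a browser does when you submit a form declared with method="post".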

When the server receives data from a POST request, that data needs to be validated. Moreover, the server needs to employ security mechanisms to protect against various types of attacks. One attack that is very dangerous is the cross-site request forgery (CSRF) attack, which happens when data is sent from a domain that is not the one the user is authenticated on. Django allows you to handle this issue in a very elegant way.

So, instead of being lazy and using the Django admin to create the entries, I'm going to show you how to do it using a Django form. By using the tools the framework gives you, you get a very good degree of validation work already done (in fact, we won't need to add any custom validation ourselves).

There are two kinds of form classes in Django: Form and ModelForm. You use the former to create a form whose shape and behavior depend on how you code the class, what fields you add, and so on. On the other hand, the latter is a type of form that, albeit still customizable, infers fields and behavior from a model. Since we need a form for the Entry model, we'll use that one:

# entries/forms.py
from django.forms import ModelForm
from .models import Entry

class EntryForm(ModelForm):
    class Meta:
        model = Entry
        fields = ['pattern', 'test_string']

Amazingly enough, this is all we have to do to have a form that we can put on a page. The only notable thing here is that we restrict the fields to only pattern and test_string. Only logged-in users will be allowed access to the insert page, and therefore we don't need to ask who the user is, we already know that. As for the date, when we save an Entry, the date_added field will be set according to its default, therefore we don't need to specify that either. We'll see in the view how to feed the user information to the form before saving. So, now that the background work is done, all we need is the views and the templates. Let's start with the views.

Writing the views

We need to write three views. We need one for the home page, one to display the list of all entries for a user, and one to create a new entry. We also need views to log in and log out. But thanks to Django, we don't need to write them. I'll paste the code in steps:

# entries/views.py
import re
from django.contrib.auth.decorators import login_required
from django.contrib.messages.views import SuccessMessageMixin
from django.urls import reverse_lazy
from django.utils.decorators import method_decorator
from django.views.generic import FormView, TemplateView
from .forms import EntryForm
from .models import Entry

Let's start with the imports. We need the re module to handle regular expressions, then we need a few classes and functions from Django, and finally, we need the Entry model and the EntryForm form.

The home view

The first view is HomeView:

# entries/views.py
class HomeView(TemplateView):
    template_name = 'entries/home.html'

    @method_decorator(
        login_required(login_url=reverse_lazy('login')))
    def get(self, request, *args, **kwargs):
        return super(HomeView, self).get(request, *args, **kwargs)

It inherits from TemplateView, which means that the response will be created by rendering a template with the context we'll create in the view. All we have to do is specify the template_name class attribute to point to the correct template. Django promotes code reuse to a point that if we didn't need to make this view accessible only to logged-in users, the first two lines would have been all we needed.

However, we want this view to be accessible only to logged-in users; therefore, we need to decorate it with login_required. Now, historically views in Django were functions; therefore, this decorator was designed to accept a function, and not a method like we have in this class. We're using Django class-based views in this project so, in order to make things work, we need to transform login_required so that it accepts a method (the difference being in the first argument: self). We do this by passing login_required to method_decorator.
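To get a feel for why a function decorator needs adapting before it can wrap a method, here is a simplified sketch in plain Python. The names login_required_sketch, method_decorator_sketch, and HomeViewSketch are all hypothetical, requests are modeled as plain dictionaries, and Django's real method_decorator is more sophisticated than this.

```python
from functools import wraps


def login_required_sketch(view_func):
    """A function decorator: it expects request as the first argument."""
    @wraps(view_func)
    def wrapper(request, *args, **kwargs):
        if not request.get("user"):
            return "redirect to login"
        return view_func(request, *args, **kwargs)
    return wrapper


def method_decorator_sketch(decorator):
    """Adapt a function decorator so that it can decorate a method."""
    def adapt(method):
        @wraps(method)
        def wrapper(self, *args, **kwargs):
            # Bind self first, so the decorator sees a plain function
            # whose first argument is the request.
            def bound(*a, **kw):
                return method(self, *a, **kw)
            return decorator(bound)(*args, **kwargs)
        return wrapper
    return adapt


class HomeViewSketch:
    @method_decorator_sketch(login_required_sketch)
    def get(self, request):
        return f"welcome {request['user']}"


view = HomeViewSketch()
anonymous = view.get({"user": None})
logged_in = view.get({"user": "fab"})
print(anonymous, "|", logged_in)
```

Without the adapter, the decorator would receive self where it expects the request, and the login check would be performed on the wrong object.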

We also need to feed the login_required decorator with login_url information, and here comes another wonderful feature of Django. As you'll see after we're done with the views, in Django, you tie a view to a URL through a pattern, consisting of a string which may or may not be a regular expression, and possibly other information. You can give a name to each entry in the urls.py file so that when you want to refer to a URL, you don't have to hardcode its value into your code. All you have to do is get Django to reverse-engineer that URL from the name we gave to the entry in urls.py, defining the URL and the view that is tied to it. This mechanism will become clearer later. For now, just think of reverse('...') as a way of getting a URL from an identifier. In this way, you only write the actual URL once, in the urls.py file, which is brilliant. In the views.py code, we need to use reverse_lazy, which works exactly like reverse with one major difference: it only finds the URL when we actually need it (in a lazy fashion). The reason why reverse_lazy can be so useful is that sometimes it might happen that we need to reverse a URL from an identifier, but at the moment we call reverse, the urls.py module hasn't been loaded yet, which causes a failure. The lazy behavior of reverse_lazy solves the issue because even if the call is made before the urls.py module has been loaded, the actual reversing of the identifier, to get to the related URL, happens in a lazy fashion, later on, when urls.py has surely been loaded.
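The eager versus lazy distinction can be imitated in a few lines. In this sketch, the URLS dictionary stands in for the names registered in urls.py, and reverse_sketch and LazyReverseSketch are made-up names; Django implements reverse_lazy differently, but the timing issue is the same.

```python
URLS = {}  # stands in for urls.py, which is not "loaded" yet


def reverse_sketch(name):
    """Eager lookup: fails if the name has not been registered yet."""
    return URLS[name]


class LazyReverseSketch:
    """Remember the name now, look the URL up only when it's needed."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return reverse_sketch(self.name)


# At "import time" the lazy version is safe, the eager one is not.
login_url = LazyReverseSketch("login")
try:
    reverse_sketch("login")
    eager_failed = False
except KeyError:
    eager_failed = True

# Later on, the URL configuration is finally loaded...
URLS["login"] = "/login/"

# ...and the lazy object can resolve itself.
resolved = str(login_url)
print(eager_failed, resolved)
```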

The get method, which we just decorated, simply calls the get method of the parent class. Of course, the get method is the method that Django calls when a GET request is performed against the URL tied to this view.

The entry list view

This view is much more interesting than the previous one:

# entries/views.py
class EntryListView(TemplateView):
    template_name = 'entries/list.html'

    @method_decorator(
        login_required(login_url=reverse_lazy('login')))
    def get(self, request, *args, **kwargs):
        context = self.get_context_data(**kwargs)
        entries = Entry.objects.filter(
            user=request.user).order_by('-date_added')
        matches = (self._parse_entry(entry) for entry in entries)
        context['entries'] = list(zip(entries, matches))
        return self.render_to_response(context)

    def _parse_entry(self, entry):
        match = re.search(entry.pattern, entry.test_string)
        if match is not None:
            return (
                match.group(),
                match.groups() or None,
                match.groupdict() or None
            )
        return None

First of all, we decorate the get method as we did before. Inside of it, we need to prepare a list of Entry objects and feed it to the template, which shows it to the user. In order to do so, we start by getting the context dictionary like we're supposed to do, by calling the get_context_data method of the TemplateView class. Then, we use the ORM to get a list of the entries. We do this by accessing the objects manager, and calling a filter on it. We filter the entries according to which user is logged in, and we ask for them to be sorted in descending order (that '-' in front of the name specifies the descending order). The objects manager is the default manager every Django model is augmented with on creation: it allows us to interact with the database through its methods.

We parse each entry to get a list of matches (actually, I coded it so that matches is a generator expression). Finally, we add to the context an 'entries' key whose value is the coupling of entries and matches, so that each Entry instance is paired with the resulting match of its pattern and test string.

On the last line, we simply ask Django to render the template using the context we created.

Take a look at the _parse_entry method. All it does is perform a search on the entry.test_string with the entry.pattern. If the resulting match object is not None, it means that we found something. If so, we return a tuple with three elements: the overall group, the subgroups, and the group dictionary.

Notice that match.groups() and match.groupdict() might return respectively an empty tuple and an empty dict. In order to normalize empty results to a simpler None, I use a common pattern in Python by exploiting the or operator. A or B, in fact, will return A if A evaluates to a truthy value, or B otherwise. Can you think how this might differ from the behavior of the and operator?

If you're not familiar with those terms, don't worry, you'll see a screenshot soon with an example. We return None if there is no match (which technically is not needed, as Python would do that anyway, but I have included it here for the sake of being explicit).
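If you want to experiment with groups, subgroups, and the group dictionary outside of Django, here is a standalone sketch of the same parsing logic. parse_entry is a plain function I adapted from _parse_entry for this purpose, and the patterns and test strings are just examples.

```python
import re


def parse_entry(pattern, test_string):
    """Return (group, subgroups, group dict), or None if no match."""
    match = re.search(pattern, test_string)
    if match is not None:
        return (
            match.group(),              # the whole matched text
            match.groups() or None,     # () becomes None
            match.groupdict() or None,  # {} becomes None
        )
    return None


# No groups in the pattern: subgroups and group dict are both empty.
plain = parse_entry(r'\d{4}', 'born in 1947')

# One named group and one unnamed group.
grouped = parse_entry(r'(?P<day>\d{2})-(\d{2})', '26-12-1947')

# The pattern is not found at all.
missing = parse_entry(r'xyz', 'abc')

# The or operator returns the first truthy operand (or the last one),
# while and returns the first falsy operand (or the last one).
or_result = () or None
and_result = () and None

print(plain, grouped, missing, or_result, and_result)
```

The named group shows up both in the subgroups tuple and in the group dictionary, while the unnamed one appears only in the tuple.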

The form view

Finally, let's examine EntryFormView:

# entries/views.py
class EntryFormView(SuccessMessageMixin, FormView):
    template_name = 'entries/insert.html'
    form_class = EntryForm
    success_url = reverse_lazy('insert')
    success_message = "Entry was created successfully"

    @method_decorator(
        login_required(login_url=reverse_lazy('login')))
    def get(self, request, *args, **kwargs):
        return super(EntryFormView, self).get(
            request, *args, **kwargs)

    @method_decorator(
        login_required(login_url=reverse_lazy('login')))
    def post(self, request, *args, **kwargs):
        return super(EntryFormView, self).post(
            request, *args, **kwargs)

    def form_valid(self, form):
        self._save_with_user(form)
        return super(EntryFormView, self).form_valid(form)

    def _save_with_user(self, form):
        self.object = form.save(commit=False)
        self.object.user = self.request.user
        self.object.save()

This is particularly interesting for a few reasons. First, it shows us a nice example of Python's multiple inheritance. We want to display a message on the page, after having inserted an Entry, so we inherit from SuccessMessageMixin. But we want to handle a form as well, so we also inherit from FormView.

Note that, when you deal with mixins and inheritance, you may have to consider the order in which you specify the base classes in the class declaration, as it will affect how methods are found when going up the inheritance chain to serve a call.
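Here is a tiny sketch of why the order of the base classes matters. The classes below are simplified stand-ins with made-up names, not Django's real FormView and SuccessMessageMixin, but they cooperate through super() in a similar spirit.

```python
class FormViewSketch:
    def form_valid(self, form):
        return ["redirect to success_url"]


class SuccessMessageMixinSketch:
    def form_valid(self, form):
        # Let the next class in the chain do its work, then add ours.
        steps = super().form_valid(form)
        return steps + ["add success message"]


class EntryFormViewSketch(SuccessMessageMixinSketch, FormViewSketch):
    pass


class SwappedSketch(FormViewSketch, SuccessMessageMixinSketch):
    pass


# The method resolution order follows the order of the base classes.
mro_names = [cls.__name__ for cls in EntryFormViewSketch.__mro__]

steps = EntryFormViewSketch().form_valid(form=None)
swapped_steps = SwappedSketch().form_valid(form=None)
print(mro_names)
print(steps)
print(swapped_steps)
```

With the mixin listed first, its form_valid runs and delegates to the view. With the order swapped, the view's method is found first, it never calls super(), and the mixin is silently skipped.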

In order to set up this view correctly, we need to specify a few attributes at the beginning: the template to be rendered, the form class to be used to handle the data from the POST request, the URL we need to redirect the user to in the case of success, and the success message.

Another interesting feature is that this view needs to handle both GET and POST requests. When we land on the form page for the first time, the form is empty, and that is the GET request. On the other hand, when we fill in the form and want to submit the Entry, we make a POST request. You can see that the body of get is conceptually identical to that of HomeView. Django does everything for us.

The post method is just like get. The only reason we need to code these two methods is so that we can decorate them to require login.

Within the Django form-handling process (in the FormView class), there are a few methods that we can override in order to customize the overall behavior. We need to do it with the form_valid method. This method will be called when the form validation is successful. Its purpose is to save the form so that an Entry object is created out of it, and then stored in the database.

The only problem is that our form is missing the user. We need to intercept that moment in the chain of calls and put the user information in ourselves. This is done by calling the _save_with_user method, which is very simple.

First, we ask Django to save the form with the commit argument set to False. This creates an Entry instance without attempting to save it to the database. Saving it immediately would fail because the user information is not there.

The next line updates the Entry instance (self.object), adding the user information and, on the last line, we can safely save it. The reason I called it object and set it on the instance like that was to follow what the original FormView class does.

We're fiddling with the Django mechanism here, so if we want the whole thing to work, we need to pay attention to when and how we modify its behavior, and make sure we don't alter it incorrectly. For this reason, it's very important to remember to call the form_valid method of the base class (we use super for that) at the end of our own customized version, to make sure that every other action that method usually performs is carried out correctly.

Note how the request is tied to each view instance (self.request) so that we don't need to pass it through when we refactor our logic into methods. Note also that the user information has been added to the request automatically by Django. Finally, the reason why the whole process is split into very small methods like these is so that we need to override only those that we want to customize. All this removes the need to write a lot of code.

Now that we have the views covered, let's see how we couple them to the URLs.

Tying up URLs and views

In the urls.py module, we tie each view to a URL. There are many ways of doing this. I chose the simplest one, which works perfectly for the purposes of this exercise, but you may want to explore this subject more deeply if you intend to work with Django. This is the core around which the whole website logic will revolve; therefore, you should try to get it down correctly. Note that the urls.py module belongs to the project folder:

# regex/urls.py
from django.contrib import admin
from django.urls import path
from django.contrib.auth import views as auth_views
from django.urls import reverse_lazy
from entries.views import HomeView, EntryListView, EntryFormView

urlpatterns = [
    path('admin/', admin.site.urls),
    path('entries/', EntryListView.as_view(), name='entries'),
    path('entries/insert',
         EntryFormView.as_view(),
         name='insert'),
    path('login/',
         auth_views.login,
         kwargs={'template_name': 'admin/login.html'},
         name='login'),
    path('logout/',
         auth_views.logout,
         kwargs={'next_page': reverse_lazy('home')},
         name='logout'),
    path('', HomeView.as_view(), name='home'),
]

If you are familiar with version 1 of Django, you will notice some differences here, as this project is coded in version 2. As you can see, the magic comes from the path function, which has recently replaced the url function. First, we pass it a path string (also known as a route), then the view, and finally a name, which is what we will use in the reverse and reverse_lazy functions to recover the URL.

Note that, when using class-based views, we have to transform them into functions, which is what path is expecting. To do that, we call the as_view() method on them.

Note also that the first path entry, for the admin, is special. Instead of specifying a URL and a view, it specifies a URL prefix and another set of URL patterns (provided by the admin.site object). In this way, Django will complete all the URLs for the admin section by prepending 'admin/' to all the URLs specified in admin.site.urls. We could have done the same for our entries application (and we should have), but I feel it would have been a bit of overkill for this simple project.

The URL paths defined in this module are so simple that they don't require any regular expression to be defined. Should you need to use a regular expression, you can check out the re_path function, which is designed for that purpose.

We also include login and logout functionalities, by employing views that come straight out of the django.contrib.auth package. We enrich the declaration with the necessary information (such as the next page, for the logout view, for example) and we don't need to write a single line of code to handle authentication. This is brilliant and saves us a lot of time.

Each path declaration must be done within the urlpatterns list and on this matter, it's important to consider that, when Django is trying to find a view for a URL that has been requested, the patterns are exercised in order, from top to bottom. The first one that matches is the one that will provide the view for it so, in general, you have to put specific patterns before generic ones, otherwise they will never get a chance to be caught. To show you an example that uses regular expressions in the route declaration, '^shop/categories/$' needs to come before '^shop' (notice that the '$' signals the end of the pattern, and it is not specified in the latter), otherwise it would never be called.
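The "first match wins" rule is easy to reproduce with the re module. The resolve function and the view names below are hypothetical and only imitate the idea; Django's URL resolver does a lot more than this.

```python
import re


def resolve(routes, url):
    """Return the view name of the first pattern that matches."""
    for pattern, view_name in routes:
        if re.search(pattern, url):
            return view_name
    return None


specific_first = [
    (r'^shop/categories/$', 'categories'),
    (r'^shop', 'shop'),
]
generic_first = [
    (r'^shop', 'shop'),
    (r'^shop/categories/$', 'categories'),
]

# Same URL, same patterns, different order, different view.
good = resolve(specific_first, 'shop/categories/')
shadowed = resolve(generic_first, 'shop/categories/')

# Other shop URLs still fall through to the generic pattern.
fallback = resolve(specific_first, 'shop/cart/')
print(good, shadowed, fallback)
```

When the generic pattern comes first, it swallows every URL that starts with shop, so the categories pattern is never reached.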

So, models, forms, admin, views, and URLs are all done. All that's left is to take care of the templates. I'll have to be very brief on this part because HTML can be very verbose.

Writing the templates

All templates inherit from a base one, which provides the HTML structure for all others, in a very object-oriented programming (OOP) fashion. It also specifies a few blocks, which are areas that can be overridden by children so that they can provide custom content for those areas. Let's start with the base template:

# entries/templates/entries/base.html
{% load static from staticfiles %}
<!DOCTYPE html>
<html lang="en">
  <head>
    {% block meta %}
      <meta charset="utf-8">
      <meta name="viewport"
       content="width=device-width, initial-scale=1.0">
    {% endblock meta %}

    {% block styles %}
      <link href="{% static "entries/css/main.css" %}"
       rel="stylesheet">
    {% endblock styles %}

    <title>{% block title %}Title{% endblock title %}</title>
  </head>

  <body>
    <div id="page-content">
      {% block page-content %}
      {% endblock page-content %}
    </div>
    <div id="footer">
      {% block footer %}
      {% endblock footer %}
    </div>
  </body>
</html>

There is a good reason to repeat the entries folder from the templates one. When Django looks for a template, it searches the templates folder of every installed application. If you don't specify the paths like I did, you may get a base.html template in the entries application, and a base.html template in another app. Django would then use whichever one it finds first, which may not be the one you meant. For this reason, by putting them in a templates/entries folder and using this technique for each Django application you write, you avoid the risk of name collisions (the same goes for static files, which are collected under one folder when you deploy).

There is not much to say about this template, really, apart from the fact that it loads the static tag so that we can get easy access to the static path without hardcoding it in the template using {% static ... %}. The code in the special {% ... %} sections is code that defines logic. The code in the special {{ ... }} sections represents variables that will be rendered on the page.

We define five blocks: meta, styles, title, page-content, and footer, whose purpose is to hold the metadata, style information, title, the content of the page, and the footer, respectively. Blocks can be optionally overridden by child templates in order to provide different content within them.

Here's the footer:

# entries/templates/entries/footer.html
<div class="footer">
  Go back <a href="{% url "home" %}">home</a>.
</div>

It gives us a nice link to the home page, which comes from the following template:

# entries/templates/entries/home.html
{% extends "entries/base.html" %}
{% block title %}Welcome to the Entry website.{% endblock title %}

{% block page-content %}
  <h1>Welcome {{ user.first_name }}!</h1>

  <div class="home-option">To see the list of your entries
    please click <a href="{% url "entries" %}">here.</a>
  </div>
  <div class="home-option">To insert a new entry please click
    <a href="{% url "insert" %}">here.</a>
  </div>
  <div class="home-option">To login as another user please click
    <a href="{% url "logout" %}">here.</a>
  </div>
  <div class="home-option">To go to the admin panel
    please click <a href="{% url "admin:index" %}">here.</a>
  </div>
{% endblock page-content %}

It extends the base.html template, and overrides title and page-content. You can see that basically all it does is provide four links to the user. These are the list of entries, the insert page, the logout page, and the admin page. All of this is done without hardcoding a single URL, through the use of the {% url ... %} tag, which is the template equivalent of the reverse function.

The template for inserting an Entry is as follows:

# entries/templates/entries/insert.html
{% extends "entries/base.html" %}
{% block title %}Insert a new Entry{% endblock title %}

{% block page-content %}
  {% if messages %}
    {% for message in messages %}
      <p class="{{ message.tags }}">{{ message }}</p>
    {% endfor %}
  {% endif %}

  <h1>Insert a new Entry</h1>
  <form action="{% url "insert" %}" method="post">
    {% csrf_token %}{{ form.as_p }}
    <input type="submit" value="Insert">
  </form><br>
{% endblock page-content %}

{% block footer %}
  <div><a href="{% url "entries" %}">See your entries.</a></div>
  {% include "entries/footer.html" %}
{% endblock footer %}

There is some conditional logic at the beginning to display messages, if any, and then we define the form. Django gives us the ability to render a form by simply calling {{ form.as_p }} (alternatively, form.as_ul or form.as_table). This creates all the necessary fields and labels for us. The difference between the three commands is in the way the form is laid out: as a paragraph, as an unordered list, or as a table. We only need to wrap it in form tags and add a submit button. This behavior was designed for our convenience: we need the freedom to shape that <form> tag as we want, so Django isn't intrusive on that. Also, note the {% csrf_token %} tag.

It will be rendered into a token by Django and will become part of the data sent to the server on submission. This way, Django will be able to verify that the request was from an allowed source, thus avoiding the aforementioned CSRF issue. Did you see how we handled the token when we wrote the view for the Entry insertion? Exactly. We didn't write a single line of code for it. Django takes care of it automatically thanks to a middleware class (CsrfViewMiddleware). Please refer to the official Django documentation (https://docs.djangoproject.com/en/2.0/) to explore this subject further.
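The general idea behind such a token can be sketched with the standard library alone. This is a conceptual illustration, not how CsrfViewMiddleware works internally; issue_token and is_valid are hypothetical helper names. The server hands out an unpredictable value with the form and later checks that the submission echoes it back.

```python
import hmac
import secrets


def issue_token():
    """Create an unpredictable token to embed in the form."""
    return secrets.token_urlsafe(32)


def is_valid(expected_token, submitted_token):
    """Compare in constant time to avoid leaking timing information."""
    return hmac.compare_digest(expected_token, submitted_token)


# The server remembers the token for this user and renders it in the form.
token = issue_token()

# A genuine submission echoes the token back; a forged one cannot know it.
print(is_valid(token, token))           # True
print(is_valid(token, "forged-value"))  # False
```

A page on another site can make the browser submit a form, but it has no way to read the token, which is what makes the check meaningful.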

For this page, we also use the footer block to display a link to the home page. Finally, we have the list template, which is the most interesting one:

# entries/templates/entries/list.html
{% extends "entries/base.html" %}
{% block title %}Entries list{% endblock title %}

{% block page-content %}
  {% if entries %}
    <h1>Your entries ({{ entries|length }} found)</h1>
    <div><a href="{% url "insert" %}">Insert new entry.</a></div>

    <table class="entries-table">
      <thead>
        <tr><th>Entry</th><th>Matches</th></tr>
      </thead>
      <tbody>
        {% for entry, match in entries %}
          <tr class="entries-list {% cycle 'light-gray' 'white' %}">
            <td>
              Pattern: <code class="code">
                "{{ entry.pattern }}"</code><br>
              Test String: <code class="code">
                "{{ entry.test_string }}"</code><br>
              Added: {{ entry.date_added }}
            </td>
            <td>
              {% if match %}
                Group: {{ match.0 }}<br>
                Subgroups:
                  {{ match.1|default_if_none:"none" }}<br>
                Group Dict: {{ match.2|default_if_none:"none" }}
              {% else %}
                No matches found.
              {% endif %}
            </td>
          </tr>
        {% endfor %}
      </tbody>
    </table>
  {% else %}
    <h1>You have no entries</h1>
    <div><a href="{% url "insert" %}">Insert new entry.</a></div>
  {% endif %}
{% endblock page-content %}

{% block footer %}
  {% include "entries/footer.html" %}
{% endblock footer %}

It may take you a while to get used to the template language, but really, all there is to it is the creation of a table using a for loop. We start by checking whether there are any entries and, if so, we create a table. There are two columns, one for Entry, and the other for the match.

In the Entry column, we display the Entry object (apart from the user), and in the Matches column, we display that three-tuple we created in the EntryListView. Note that to access the attributes of an object, we use the same dot syntax we use in Python, for example {{ entry.pattern }} or {{ entry.test_string }}, and so on.

When dealing with lists and tuples, we cannot access items using the square brackets syntax, so we use the dot one as well ({{ match.0 }} is equivalent to match[0], and so on). We also use a filter, through the pipe (|) operator, to display a custom value if a match is None.
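The dot lookup and the filter can be approximated in a few lines of Python. This is a simplified sketch, not Django's code: lookup and default_if_none are hypothetical stand-ins, and the sample tuple is made up. The lookup tries a dictionary key first, then an attribute, then a numeric index, which is why the same dot works for objects and tuples alike.

```python
def lookup(obj, key):
    """Resolve `obj.key` roughly as a template dot lookup would.

    Tries dictionary lookup, then attribute lookup, then list index.
    Returns None when nothing matches (a simplification).
    """
    try:
        return obj[key]
    except (TypeError, KeyError, IndexError):
        pass
    try:
        return getattr(obj, key)
    except (TypeError, AttributeError):
        pass
    try:
        return obj[int(key)]
    except (TypeError, ValueError, KeyError, IndexError):
        return None


def default_if_none(value, default):
    """Mimic the filter: substitute `default` only when value is None."""
    return default if value is None else value


# Made-up three-tuple: group, subgroups, group dict.
match = ("2018", None, {"year": "2018"})

print(lookup(match, "0"))                           # 2018
print(default_if_none(lookup(match, "1"), "none"))  # none
print(lookup(lookup(match, "2"), "year"))           # 2018
```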

The Django template language (which is not properly Python) is kept simple for a precise reason. If you find yourself limited by the language, it means you're probably trying to do something in the template that should actually be done in the view, where that logic is more pertinent.

Allow me to show you a couple of screenshots of the list and insert templates. This is what the list of entries looks like for my father:

Note how the use of the cycle tag alternates the background color of the rows from white to light gray. Those classes are defined in the main.css file.
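The alternation performed by the cycle tag has a close relative in Python's standard library. The following sketch uses itertools.cycle to hand out the two class names in turn; row_classes and the placeholder rows are hypothetical, introduced only for this illustration.

```python
from itertools import cycle, islice


def row_classes(count, names=("light-gray", "white")):
    """Return `count` class names, repeating `names` in order."""
    return list(islice(cycle(names), count))


rows = ["first entry", "second entry", "third entry"]  # placeholder data
for row, css_class in zip(rows, row_classes(len(rows))):
    print(f'<tr class="entries-list {css_class}"><td>{row}</td></tr>')
```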

The Entry insertion page is smart enough to provide a few different scenarios. When you land on it at first, it presents you with just an empty form. If you fill it in correctly, it will display a nice message for you (see the following picture). However, if you fail to fill in both fields, it will display an error message before them, alerting you that those fields are required.

Note also the custom footer, which includes both a link to the entries list and a link to the home page:

And that's it! You can play around with the CSS styles if you want. Download the code for the book and have fun exploring and extending this project. Add something else to the model, create and apply a migration, play with the templates, there's lots to do!

Django is a very powerful framework, and offers so much more than what I've been able to show you in this chapter, so you should definitely check it out. The beauty of it is that Django is Python, so reading its source code is a very useful exercise.

The future of web development

Computer science is a very young subject, compared to other branches of science that have existed alongside humankind for centuries. One of its main characteristics is that it moves extremely fast. It leaps forward with such speed that, in just a few years, you can see changes that are comparable to real-world changes that took a century to happen. Therefore, as a coder, you must pay attention to what happens in this world, all the time.

Currently, because powerful computers are quite cheap and almost everyone has access to them, the trend is to try to avoid putting too much workload on the backend, and let the frontend handle part of it. Therefore, in the last few years, JavaScript frameworks and libraries, such as jQuery, Backbone and, more recently, React, have become very popular. Web development has shifted from a paradigm where the backend takes care of handling data, preparing it, and serving it to the frontend to display it, to a paradigm where the backend is sometimes just used as an API, a sheer data provider. The frontend fetches the data from the backend with an API call, and then it takes care of the rest.

This shift facilitates the existence of paradigms such as Single-Page Application (SPA), where, ideally, the whole page is loaded once and then evolves, based on the content that usually comes from the backend. E-commerce websites that load the results of a search in a page that doesn't refresh the surrounding structure are made with similar techniques. Browsers can perform asynchronous calls such as Asynchronous JavaScript and XML (AJAX) that can return data that can be read, manipulated, and injected back into the page with JavaScript code.

So, if you're planning to work on web development, I strongly suggest you get acquainted with JavaScript (if you're not already), and also with APIs. In the last few pages of this chapter, I'll give you an example of how to make a simple API using two different Python microframeworks: Flask and Falcon.

Writing a Flask view

Flask (http://flask.pocoo.org/) is a Python microframework. It provides far fewer features than Django, but if your project is meant to be very small, then it might be a better choice. In my experience though, when developers choose Flask at the beginning of a project, they eventually end up adding plugin after plugin, until they have what I call a Django Frankenstein project. Being agile means having to periodically spend time reducing the technical debt accumulated over time. However, switching from Flask to Django can be a daunting operation, so when starting a new project, make sure you consider its evolution. My cheeky opinion on this matter is very simple: I always go with Django, as I personally prefer it to Flask, but you might disagree with me, so I want to offer you an example.

In your ch14 folder, create a flask folder with the following structure:

$ tree -A flask  # from the ch14 folder
flask
├── main.py
└── templates
    └── main.html

Basically, we're going to code two simple files: a Flask application and an HTML template. Flask uses Jinja2 as a template engine. It's extremely popular and very fast, to the point that even Django started offering native support for it:

# flask/templates/main.html
<!doctype html>
<title>Hello from Flask</title>
<h1>
  {% if name %}
    Hello {{ name }}!
  {% else %}
    Hello shy person!
  {% endif %}
</h1>

The template is almost offensively simple. All it does is change the greeting according to the presence of the name variable. A bit more interesting is the Flask application that renders it:

# flask/main.py
from flask import Flask, render_template

app = Flask(__name__)

# Both routes point to the same view; name stays None for the root URL.
@app.route('/')
@app.route('/<name>')
def hello(name=None):
    return render_template('main.html', name=name)

We create an app object, which is a Flask application. We only feed the fully qualified name of the module, which is stored in __name__.

Then, we write a simple hello view, which takes an optional name argument. In the body of the view, we simply render the main.html template, passing to it the name argument, regardless of its value.

What's interesting is the routing. Unlike Django's way of tying up views and URLs (the urls.py module), in Flask you decorate your views with one or more @app.route decorators. In this case, we decorate twice: the first line ties the view to the root URL (/), while the second line ties the view to the root URL followed by a name (/<name>).
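If decorator-based routing feels magical, a tiny stand-in shows the mechanism. This is a toy sketch and not Flask's implementation: TinyApp and its dispatch method are hypothetical, and the matching handles only exact paths and a single /<name> segment. What matters is that route returns a decorator that records the view and hands it back unchanged, which is why decorators can be stacked.

```python
class TinyApp:
    """Hypothetical stand-in for an application object; not Flask."""

    def __init__(self):
        self.routes = {}

    def route(self, path):
        """Return a decorator that registers the view under `path`."""
        def decorator(view):
            self.routes[path] = view
            return view  # returned unchanged, so decorators can stack
        return decorator

    def dispatch(self, path):
        """Call the view for `path` (exact paths and '/<name>' only)."""
        if path in self.routes:
            return self.routes[path]()
        if "/<name>" in self.routes and path.count("/") == 1:
            return self.routes["/<name>"](name=path[1:])
        raise LookupError(f"No route for {path!r}")


app = TinyApp()


@app.route("/")
@app.route("/<name>")
def hello(name=None):
    return f"Hello {name}!" if name else "Hello shy person!"


print(app.dispatch("/"))        # Hello shy person!
print(app.dispatch("/Milena"))  # Hello Milena!
```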

Change into the flask folder and type the following (make sure you have installed Flask, either with $ pip install flask or by installing the requirements in the source code for the book):

$ FLASK_APP=main.py flask run

You can open a browser and go to http://127.0.0.1:5000/. This URL has no name information; therefore, you will see Hello shy person! It is written all nice and big. Try to add something to that URL, such as http://127.0.0.1:5000/Milena. Hit Enter and the page will change to Hello Milena! (so you will have said hello to my sister).

Of course, Flask offers you much more than this, but we don't have the room to go through a more complex example. It's definitely worth exploring, though. Several projects use it successfully and it's fun and nice to create websites or APIs with it. Flask's author, Armin Ronacher, is a successful and very prolific coder. He also created or collaborated on several other interesting projects, such as Werkzeug, Jinja2, Click, and Sphinx. He also contributed functionalities to the Python AST module.

Building a JSON quote server in Falcon

Falcon (http://falconframework.org/) is another microframework written in Python, which was designed to be light, fast, and flexible. I have seen this relatively young project evolve to become something really popular due to its speed, which is impressive, so I'm happy to show you a tiny example using it. We're going to build an API that returns a random quote from the Buddha.

In your ch14 folder, create a new one called falcon. We'll have two files: quotes.py and main.py. To run this example, install Falcon and Gunicorn ($ pip install falcon gunicorn or the full requirements for the book). Falcon is the framework, and Gunicorn (Green Unicorn) is a Python WSGI HTTP Server for Unix (which, in layman's terms, means the technology that is used to run the server).

The Web Server Gateway Interface (WSGI) is a simple calling convention for web servers to forward requests to web applications or frameworks written in Python. If you wish to learn more, please check out PEP 333, which defines the interface (PEP 3333 updates it for Python 3).
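To see what that calling convention amounts to, here is a minimal sketch that uses only the standard library. The application function is a bare WSGI callable; call_app is a hypothetical helper that plays the role a server such as Gunicorn would normally play, building the environ dictionary and collecting the reply. Real servers do considerably more.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A minimal WSGI app: a callable taking environ and start_response."""
    body = f"You requested {environ['PATH_INFO']}".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]  # an iterable of bytes


def call_app(app, path):
    """Play the server's role: build an environ, call the app, collect the reply."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(call_app(application, "/quote"))
# ('200 OK', b'You requested /quote')
```

Frameworks such as Falcon give you an object that satisfies this same interface, which is why any WSGI server can run them.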

When you're all set up, start by creating the quotes.py file:

# falcon/quotes.py
quotes = [
    "Thousands of candles can be lighted from a single candle, "
    "and the life of the candle will not be shortened. "
    "Happiness never decreases by being shared.",
    ...
    "Peace comes from within. Do not seek it without.",
    ...
]

You will find the complete list of quotes in the source code for this book. If you don't have it, you can instead fill in your favorite quotes. Note that not every line has a comma at the end. In Python, adjacent string literals are joined into a single string, and they can be spread over several lines like that as long as they are inside brackets (or braces). It's called implicit concatenation.
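A short experiment makes the behavior, and its main pitfall, easy to see. The lists below are made up for the demonstration.

```python
quotes = [
    "Peace comes from within. "
    "Do not seek it without.",  # one item, written as two literals
    "Happiness never decreases by being shared.",
]

print(len(quotes))  # 2
print(quotes[0])    # Peace comes from within. Do not seek it without.

# A forgotten comma silently merges two intended items into one.
merged = [
    "first"
    "second",
]
print(merged)  # ['firstsecond']
```

Note the trailing space inside the first literal: nothing is inserted between the pieces, so any separating space has to be part of one of the strings.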

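The code for the main application is not long, but it is interesting:

# falcon/main.py
import json
import random
import falcon
from quotes import quotes

class QuoteResource:
    def on_get(self, req, resp):
        quote = {
            'quote': random.choice(quotes),
            'author': 'The Buddha'
        }
        # Newer Falcon releases (3 and later) name this attribute resp.text.
        resp.body = json.dumps(quote)

# Newer Falcon releases (3 and later) name this class falcon.App.
api = falcon.API()
api.add_route('/quote', QuoteResource())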

Let's start with the class. In Django we had a get method, in Flask we defined a function, and here we write an on_get method, a naming style that reminds me of Java/C# event handlers. It takes a request and a response argument, both automatically fed by the framework. In its body, we define a dictionary with a randomly chosen quote, and the author information. Then we dump that dictionary to a JSON string and set the response body to its value. We don't need to return anything, because Falcon will take care of it for us.
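The handler's logic can be tried out without Falcon or a running server. The sketch below is an assumption-laden stand-in: build_body is a hypothetical function and the one-item quotes list is a placeholder. It mirrors the body of the handler and then parses the result, as a client of the API would.

```python
import json
import random

quotes = ["Peace comes from within. Do not seek it without."]  # placeholder


def build_body(rng=random):
    """Mirror the handler's logic: pick a quote, serialize it to JSON."""
    quote = {
        "quote": rng.choice(quotes),
        "author": "The Buddha",
    }
    return json.dumps(quote)


body = build_body()
print(body)

# A client would parse the string back into a dictionary.
data = json.loads(body)
print(data["author"])  # The Buddha
```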

At the end of the file, we create the Falcon application, and we call add_route on it to tie the handler we have just written to the URL we want.

When you're all set up, change to the falcon folder and type:

$ gunicorn main:api

Then, make a request (or simply open the page with your browser) to http://127.0.0.1:8000/quote. When I did it, I got this JSON in response:

{
    "quote": "Peace comes from within. Do not seek it without.",
    "author": "The Buddha"
}

Within the falcon folder, I have left a stress.py module for you, which tests how fast our Falcon code is. See if you can make it work by yourself; it should be very easy for you at this point.

Whatever framework you end up using for your web development, try to keep yourself informed about other choices too. Sometimes you may be in situations where a different framework is the right way to go, and having a working knowledge of different tools will give you an advantage.

Summary

In this chapter, we took a look at web development. We talked about important concepts, such as the DRY philosophy and the concept of a framework as a tool that provides us with many things we need in order to write code to serve requests. We also talked about the MTV pattern, and how nicely these three layers play together to realize a request-response path.

Then, we briefly introduced regular expressions, a subject of paramount importance, since they provide the tools for URL routing.

There are many different frameworks out there, and Django is definitely one of the best and most widely used, so it's worth exploring, especially its source code, which is well written.

There are other very interesting and important frameworks too, such as Flask. They provide fewer features but might be faster, both in execution time and to set up. One that is extremely fast is the Falcon project, whose benchmarks are outstanding.

It's important to get a solid understanding of how the request-response mechanism works, and how the web in general works, so that eventually it won't matter too much which framework you have to use. You will be able to pick it up quickly because it will only be a matter of getting familiar with a way of doing something you already know a lot about.

Explore at least three frameworks and try to come up with different use cases to decide which one of them could be the ideal choice. When you are able to make that choice, you will know you have a good enough understanding of them.

A farewell

I hope that you are still thirsty and that this book will be just the first of many steps you take towards Python. It's a truly wonderful language, well worth learning deeply.

I hope that you enjoyed this journey with me. I did my best to make it interesting for you. It sure was for me; I had such a great time writing these pages.

Python is open source, so please keep sharing it and consider supporting the wonderful community around it.

Until next time, my friend, farewell!
