CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za


Page 1: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

CSC1015F – Chapter 5, Strings and Input

Michelle Kuttel

mkuttel@cs.uct.ac.za

Page 2: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

The String Data Type

Used for operating on textual information.

Think of a string as a sequence of characters.

To create string literals, enclose them in single, double, or triple quotes as follows:

a = "Hello World"
b = 'Python is groovy'
c = """Computer says 'Noooo'"""

Page 3: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Comments and docstrings

It is common practice for the first statement of a function to be a documentation string describing its usage. For example:

def hello():
    """Hello World function"""
    print("Hello")
    print("I love CSC1015F")

This is called a "docstring" and can be printed thus:

print(hello.__doc__)

Page 4: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Comments and docstrings

Try printing the docstring for functions you have been using, e.g.:

print(input.__doc__)
print(eval.__doc__)

Page 5: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Checkpoint Str1: Strings and loops.

What does the following function do?

def oneAtATime(word):
    for c in word:
        print("give us a '", c, "' ... ", c, "!", sep='')
    print("What do you have? -", word)

Page 6: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Checkpoint Str1a

What does this function do?

def str1a(word):
    for i in word:
        if i in "aeiou":
            continue
        print(i, end='')

Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b

Page 7: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Some BUILT IN String functions/methods

s.capitalize()  Capitalizes the first character.
s.count(sub)    Counts the number of occurrences of sub in s.
s.isalnum()     Checks whether all characters are alphanumeric.
s.isalpha()     Checks whether all characters are alphabetic.
s.isdigit()     Checks whether all characters are digits.
s.islower()     Checks whether all characters are lowercase.
s.isspace()     Checks whether all characters are whitespace.

Page 8: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Some BUILT IN String functions/methods

s.istitle()       Checks whether the string is a title-cased string (first letter of each word capitalized).
s.isupper()       Checks whether all characters are uppercase.
s.join(t)         Joins the strings in sequence t with s as a separator.
s.lower()         Converts to lowercase.
s.lstrip([chrs])  Removes leading whitespace or characters supplied in chrs.
s.upper()         Converts a string to uppercase.

Page 9: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Some BUILT IN String functions/methods

s.replace(oldsub, newsub)  Replaces all occurrences of oldsub in s with newsub.
s.find(sub)                Finds the first occurrence of sub in s (returns its index, or -1 if sub is not found).
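A short sketch of a few of the methods listed above in action. The sample strings are illustrative only and do not come from the course material. Note that each method returns a new value and leaves the original string unchanged.

```python
# Illustrative values only; these strings are assumptions, not course data.
s = "hello bob"

capitalized = s.capitalize()          # first character upper-cased
count_o = s.count("o")                # how many times "o" occurs in s
shouted = s.upper()                   # every character upper-cased
swapped = s.replace("bob", "world")   # every "bob" replaced by "world"
position = s.find("bob")              # index of the first match, or -1
joined = "-".join(["a", "b", "c"])    # the separator goes between the items
stripped = "   padded".lstrip()       # leading whitespace removed

print(capitalized, count_o, shouted, swapped, position, joined, stripped)
print("2024".isdigit(), "abc1".isalpha(), "Hello Bob".istitle())
print(s)  # the original string is unchanged
```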

Page 10: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

BUILT IN String functions/methods

Try printing the doc string for str functions:

print(str.isdigit.__doc__)


Page 11: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

The String Data Type

As a string is a sequence of characters, we can access individual characters. This is called indexing.

The general form is: <string>[<expr>]

Indexes start at 0, so the last character in a string of n characters has index n-1.

Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b
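A minimal sketch of indexing, using a different string from the one in the diagram (the word "Python" is just an illustrative choice). It shows that the index can be any integer expression and that index n is already out of range.

```python
word = "Python"          # illustrative string, not from the slides
n = len(word)            # 6 characters, so valid indexes are 0..5

first = word[0]          # first character
last = word[n - 1]       # last character has index n-1
middle = word[1 + 1]     # any integer expression works as an index

# Index n is one past the end, so Python raises IndexError.
try:
    word[n]
    out_of_range = False
except IndexError:
    out_of_range = True

print(first, last, middle, out_of_range)
```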

Page 12: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

String functions: len

len tells you how many characters there are in a string:

len("Jabberwocky")
len("Twas brillig and the slithy toves did gyre and gimble in the wabe")

Page 13: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Checkpoint Str2: Indexing examples

What does this function do?

def str2(word):
    for i in range(0, len(word), 2):
        print(word[i], end='')

Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b

Page 14: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

More Indexing examples - indexing from the end

What is the output of these lines?

greet = "Hello Bob"
greet[-1]
greet[-2]
greet[-3]

Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b

Page 15: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Checkpoint Str3

What is the output of these lines?

def str3(word):
    for i in range(len(word)-1, -1, -1):
        print(word[i], end='')

Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b

Page 16: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Chopping strings into pieces: slicing

The previous examples can be done much more simply.

Slicing indexes a range: it returns a substring, starting at the first position and running up to, but not including, the last position.
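A small sketch of the "up to, but not including" rule, using an illustrative string of my own choosing. One handy consequence of the rule is that word[:k] + word[k:] always rebuilds the original string.

```python
word = "strings"       # illustrative string: s t r i n g s at indexes 0..6

head = word[0:3]       # characters at indexes 0, 1, 2 (index 3 is excluded)
tail = word[3:7]       # characters at indexes 3, 4, 5, 6
rebuilt = word[:3] + word[3:]   # omitted bounds default to the two ends

print(head, tail, rebuilt)
print(len(word[2:5]))  # a slice [i:j] has j - i characters when both are in range
```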

Page 17: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Examples - slicing

What is the output of these lines?

greet = "Hello Bob"
greet[0:3]
greet[5:9]
greet[:5]
greet[5:]
greet[:]

Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b

Page 18: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Checkpoint Str4: Strings and loops.

What does the following function do?

def sTree(word):
    for i in range(len(word)):
        print(word[0:i+1])

Page 19: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Checkpoint Str5: Strings and loops.

What does the following code output?

def sTree2(word):
    step = len(word)//3
    for i in range(step, step*3+1, step):
        for j in range(i):
            print(word[0:j+1])
        print("**\n**\n")

sTree2("strawberries")

Page 20: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

More info on slicing

The slicing operator may be given an optional stride, s[i:j:stride], that causes the slice to skip elements.

Then, i is the starting index; j is the ending index; and the produced subsequence is the elements s[i], s[i+stride], s[i+2*stride], and so forth until index j is reached (which is not included).

The stride may also be negative.

If the starting index i is omitted, it is set to the beginning of the sequence if stride is positive or the end of the sequence if stride is negative.

If the ending index j is omitted, it is set to the end of the sequence if stride is positive or the beginning of the sequence if stride is negative.

Page 21: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

More on slicing

Here are some examples with strides:

a = "Jabberwocky"
b = a[::2]     # b = 'Jbewcy'
c = a[::-2]    # c = 'ycwebJ'
d = a[0:5:2]   # d = 'Jbe'
e = a[5:0:-2]  # e = 'rba'
f = a[:5:1]    # f = 'Jabbe'
g = a[:5:-1]   # g = 'ykcow'
h = a[5::1]    # h = 'rwocky'
i = a[5::-1]   # i = 'rebbaJ'
j = a[5:0:-1]  # j = 'rebba'

Page 22: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Checkpoint Str6: strides

What is the output of these lines?

greet = "Hello Bob"
greet[8:5:-1]

Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b

Page 23: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Checkpoint Str7: Slicing with strides

How would you do this function in one line with no loops?

def str2(word):
    for i in range(0, len(word), 2):
        print(word[i], end='')

Index:     0 1 2 3 4 5 6 7 8
Character: H e l l o   B o b

Page 24: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Checkpoint Str8: What does this code display?

#checkpointStr8.py
def crunch(s):
    m = len(s)//2
    print(s[0], s[m], s[-1], sep='+')

crunch("omelette")
crunch("bug")

Page 25: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Example: filters

Pirate, Elmer Fudd and Swedish Chef filters produce parodies of English speech.

How would you write one in Python?
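One possible starting point, offered as a rough sketch rather than a full solution: a filter can be built from repeated calls to replace. The substitution table PIRATE_WORDS and the function name pirate are hypothetical, and a real filter would need many more rules.

```python
# Hypothetical substitution table; a real filter would need many more rules.
PIRATE_WORDS = [("hello", "ahoy"), ("my", "me"), ("friend", "matey")]

def pirate(text):
    """Return text with each listed word replaced by its pirate equivalent."""
    result = text.lower()
    for plain, salty in PIRATE_WORDS:
        # replace works on substrings, so "my" inside a longer word
        # (e.g. "mystery") would also change; that is a known limitation.
        result = result.replace(plain, salty)
    return result + ", arr!"

print(pirate("Hello my friend"))
```

Because replace matches substrings rather than whole words, a more careful version would have to split the text into words first.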

Page 26: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Example: Genetic Algorithms (GAs)

GAs attempt to mimic the process of natural evolution in a population of individuals.

They use the principles of selection and evolution to produce several solutions to a given problem.

They use biologically-derived techniques such as inheritance, mutation, natural selection, and recombination.

A GA is a computer simulation in which a population of abstract representations (called chromosomes) of candidate solutions (called individuals) to an optimization problem evolves toward better solutions.

Over time, those genetic changes which enhance the viability of an organism tend to predominate.

Page 27: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Bioinformatics Example: Crossover (recombination)

Evolution works at the chromosome level through the reproductive process.

Portions of the genetic information of each parent are combined to generate the chromosomes of the offspring.

This is called crossover.

Page 28: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Crossover Methods

Single-Point Crossover

A randomly-located cut is made at the pth bit of each parent and crossover occurs.

This produces 2 different offspring.

Page 29: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Gene splicing example (for genetic algorithms)

We can now do a cross-over!

Crossover3.py
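The contents of Crossover3.py are not reproduced in this transcript, so the following is only a sketch of single-point crossover on strings, built from slicing and concatenation. The function name, the optional p parameter and the bit-string parents are all assumptions made for illustration.

```python
import random

def single_point_crossover(parent1, parent2, p=None):
    """Cut both parents at position p and swap the tails.

    Hypothetical helper: assumes both chromosomes are strings of the
    same length (at least 2 characters).
    """
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if p is None:
        # Pick a random cut point that is never at either end.
        p = random.randint(1, len(parent1) - 1)
    child1 = parent1[:p] + parent2[p:]
    child2 = parent2[:p] + parent1[p:]
    return child1, child2

print(single_point_crossover("11111111", "00000000", p=3))
```

Passing p explicitly makes the result repeatable; leaving it out gives the randomly-located cut described on the slide.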

Page 30: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Example: palindrome program

palindrome |ˈpalɪndrəʊm| noun: a word, phrase, or sequence that reads the same backward as forward, e.g., madam or nurses run

In Python, write a program to check whether a word is a palindrome.

You don't need to use loops…
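One loop-free approach, sketched here under the assumption that spaces and letter case should be ignored (so that "nurses run" counts): compare the text with its reverse, obtained with a slice of stride -1. The function name is_palindrome is my own.

```python
def is_palindrome(text):
    """Return True if text reads the same backward as forward.

    Assumption: spaces and upper/lower case are ignored.
    """
    letters = text.replace(" ", "").lower()
    return letters == letters[::-1]   # [::-1] is the string reversed

print(is_palindrome("madam"))
print(is_palindrome("nurses run"))
print(is_palindrome("python"))
```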

Page 31: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

String representation and message encoding

On the computer hardware, strings are also represented as zeros and ones.

Computers represent characters as numeric codes, a unique code for each character.

An entire string is stored by translating each character to its equivalent code and then storing the whole thing as a sequence of binary numbers in computer memory.

There used to be a number of different codes for storing characters, which caused serious headaches!

Page 32: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

ASCII (American Standard Code for Information Interchange)

An important character encoding standard.

Its codes represent the characters found on a typical (American) computer keyboard, as well as some special control codes used for sending and receiving information.

A-Z uses values in range 65-90
a-z uses values in range 97-122

In use for a long time: developed for teletypes.

American-centric. Extended ASCII codes have been developed.


Page 34: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Unicode

A character set that includes all the ASCII characters plus many more exotic characters: http://www.unicode.org

Python supports the Unicode standard:

ord returns the numeric code of a character
chr returns the character corresponding to a code

[Figure: Unicodes for Cuneiform]
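A brief sketch of ord and chr working as inverses of each other. The particular characters are illustrative choices.

```python
code_A = ord("A")                    # numeric code of 'A'
letter = chr(97)                     # character with code 97
next_letter = chr(ord("b") + 1)      # arithmetic on codes: the letter after 'b'
codes = [ord(ch) for ch in "Bob"]    # one code per character

print(code_A, letter, next_letter, codes)
print(chr(ord("z")) == "z")          # chr undoes ord
```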

Page 35: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Characters in memory

Smallest addressable piece of memory is usually 8 bits, or a byte.

How many characters can be represented by a byte?

Page 36: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Characters in memory

Smallest addressable piece of memory is usually 8 bits, or a byte.

How many characters can be represented by a byte?

256 different values (2^8). Is this enough?

Page 37: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Characters in memory

Smallest addressable piece of memory is usually 8 bits, or a byte.

256 different values is enough for ASCII (only a 7 bit code), but not enough for UNICODE, with 100 000+ possible characters.

UNICODE uses different schemes for packing UNICODE characters into sequences of bytes.

UTF-8 is the most common: it uses a single byte for ASCII characters and up to 4 bytes for more exotic characters.
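The variable-length nature of UTF-8 can be checked with the str.encode method, which returns the bytes used to store a string. The four sample characters below are illustrative choices, written as escape sequences so the example does not depend on the file's own encoding.

```python
# Sample characters (assumed for illustration): A, e-acute, euro sign, an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

byte_lengths = []
for ch in samples:
    encoded = ch.encode("utf-8")       # bytes object holding the UTF-8 form
    byte_lengths.append(len(encoded))
    print(ch, ord(ch), len(encoded))

print(byte_lengths)
```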

Page 38: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Comparing strings

Conditions may compare numbers or strings.

When strings are compared, the order is lexicographic: strings are put into order based on the Unicode values of their characters.

e.g. "Bbbb" < "bbbb" and "B" < "a"
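A short sketch of why those comparisons hold: Python compares strings character by character, and the first differing pair of Unicode codes decides the result. The extra fruit words are illustrative.

```python
# Every uppercase letter has a smaller code than every lowercase letter.
print(ord("B"), ord("a"), ord("b"))

upper_first = "Bbbb" < "bbbb"        # decided by 'B' versus 'b'
single = "B" < "a"                   # decided by 'B' versus 'a'
later_char = "apple" < "apply"       # first difference is 'e' versus 'y'
prefix = "app" < "apple"             # a proper prefix sorts first

ordered = sorted(["pear", "Apple", "apple"])
print(upper_first, single, later_char, prefix, ordered)
```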

Page 39: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

The min function…

min(iterable[, key=func]) -> value
min(a, b, c, ...[, key=func]) -> value

With a single iterable argument, return its smallest item.

With two or more arguments, return the smallest argument.

Page 40: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Checkpoint: What do these statements evaluate as?

min("hello")
min("983456")
min("Peanut")

Page 41: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Example 2: DNA Reverse Complement Algorithm

A DNA molecule consists of two strands of nucleotides. Each nucleotide is one of the four molecules adenine, guanine, thymine, or cytosine. Adenine always pairs with thymine and guanine always pairs with cytosine.

A pair of matched nucleotides is called a base pair.

Task: write a Python program to calculate the reverse complement of any DNA strand.
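A possible sketch of the task, assuming the strand is given as a string over the letters A, C, G and T. The reverse complement is obtained by swapping each base for its partner (A with T, G with C) and reading the result backward. The function name is my own; str.maketrans and translate are standard string methods, used here instead of a loop.

```python
# Translation table mapping each base to its pairing partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand):
    """Return the reverse complement of a DNA strand (assumed upper-case ACGT)."""
    complement = strand.translate(COMPLEMENT)   # swap A<->T and C<->G
    return complement[::-1]                     # read it backward

print(reverse_complement("ATGC"))
print(reverse_complement("AAACCC"))
```

Applying the function twice returns the original strand, which is a useful sanity check.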

Page 42: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Scrabble letter scores

Different languages should have different scores for the letters.

How do you work this out? What is the algorithm?

Page 43: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Related Example: Calculating character (base) frequency

DNA has the alphabet ACGT

BaseFrequency.py
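BaseFrequency.py itself is not included in this transcript. As a hedged sketch of the idea, the count method can tally each letter of the ACGT alphabet and the totals can be turned into fractions. The function name and the sample strand are assumptions.

```python
def base_frequency(dna):
    """Return a dict mapping each base in ACGT to its fraction of the strand."""
    dna = dna.upper()
    total = len(dna)
    frequencies = {}
    for base in "ACGT":
        # Guard against an empty strand to avoid dividing by zero.
        frequencies[base] = dna.count(base) / total if total else 0.0
    return frequencies

freqs = base_frequency("AACGTT")   # illustrative strand
for base in "ACGT":
    print(base, freqs[base])
```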

Page 44: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Why would you want to do this?

You can calculate the melting temperature of DNA from the base pair percentage in the DNA.

References:

Breslauer et al. Proc. Natl. Acad. Sci. USA 83, 3746-3750

Baldino et al. Methods in Enzymol. 168, 761-777

Page 45: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Input/Output as string manipulation

eval evaluates a string as a Python expression.

Very general and can be used to turn strings into nearly any other Python data type.

The "Swiss army knife" of string conversion: eval("3+4")

Can also use Python numeric type conversion functions: int("4"), float("4")

But the string must be a numeric literal of the appropriate form, or you will get an error.

Can also convert numbers to strings with the str function.
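A small sketch of these conversions side by side, including the error raised when the string is not in the right form for int. The sample values are illustrative. Because eval runs whatever expression it is given, it is best kept away from input you do not trust.

```python
total = eval("3+4")        # the string is evaluated as an expression
whole = int("4")           # string to integer
real = float("4")          # string to floating point
text = str(3.5)            # number back to string

# "4.5" is not a valid integer literal, so int raises ValueError.
try:
    int("4.5")
    failed = False
except ValueError:
    failed = True

print(total, whole, real, text, failed)
```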

Page 46: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

String formatting with format

The built-in s.format() method is used to perform string formatting.

The {} are slots that show where the values will go.

You can "name" the values, or access them by their position (counting from zero).

>>> a = "Your name is {0} and your age is {age}"
>>> a.format("Mike", age=40)
'Your name is Mike and your age is 40'

Page 47: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Example 4: Better output for Calculating character (base) frequency

BaseFrequency2.py

Page 48: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

More on format

You can add an optional format specifier to each placeholder using a colon (:) to specify column widths, decimal places, and alignment.

The general format is: [[fill]align][sign][0][width][.precision][type]

where each part enclosed in [] is optional.

The width specifier specifies the minimum field width to use.

The align specifier is one of '<', '>', or '^' for left, right, and centered alignment within the field.

An optional fill character fill is used to pad the space.

Page 49: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

More on format

For example:

name = "Elwood"
r = "{0:<10}".format(name)   # r = 'Elwood    '
r = "{0:>10}".format(name)   # r = '    Elwood'
r = "{0:^10}".format(name)   # r = '  Elwood  '
r = "{0:=^10}".format(name)  # r = '==Elwood=='

Page 50: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

format: type specifier

The type specifier indicates the type of data.

Page 51: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

More on format

The precision part supplies the number of digits of accuracy to use for decimals.

If a leading '0' is added to the field width for numbers, numeric values are padded with leading 0s to fill the space.

x = 42
r = '{0:10d}'.format(x)      # r = '        42'
r = '{0:10x}'.format(x)      # r = '        2a'
r = '{0:10b}'.format(x)      # r = '    101010'
r = '{0:010b}'.format(x)     # r = '0000101010'

y = 3.1415926
r = '{0:10.2f}'.format(y)    # r = '      3.14'
r = '{0:10.2e}'.format(y)    # r = '  3.14e+00'
r = '{0:+10.2f}'.format(y)   # r = '     +3.14'
r = '{0:+010.2f}'.format(y)  # r = '+000003.14'
r = '{0:+10.2%}'.format(y)   # r = '  +314.16%'

Page 52: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Example: FormatEg.py


Page 53: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Checkpoint: Write down the exact output for the following code

txt = "{name}-{0}*{y}+{1}"
print(txt.format("cat", "dog", name="hat", y="rat"))
print(txt.format(1, 0, name=2, y=3))
print(txt.format(2, 3))

Page 54: CSC1015F – Chapter 5, Strings and Input Michelle Kuttel mkuttel@cs.uct.ac.za.

Format to improve formatting

BaseFrequency2.py

restuarant2.py