Lesson 2: String Operations, Writing Scripts

Fundamentals of Text Processing for Linguists

Na-Rae Han

Objectives

Writing and executing Python scripts

Learn Python basic syntax

Variable assignments: = vs. +=

Equality comparison: ==

String operations

Boolean operators: and, or, not

1/15/2014 2

Interactive programming vs. scripts

We have been programming INTERACTIVELY through the IDLE shell.

Your IDLE shell: IDLE interprets and responds to each line of code you type in. You have to retype the code next time.

We will now try writing and running a stand-alone Python SCRIPT.

Python script file: the saved file is executed.

x = 'a'+'b'

print x

… …

Python script in IDLE editor mode

1. Open up a new window (Ctrl+n).

2. Python editor window opens up. Type in:

print "Hello, world!!"

3. Save the script as "hello.py". Don't forget to include the ".py" extension!

4. Run the script by selecting "Run" > "Run Module" or pressing F5.

5. Success! The command executes in the IDLE shell window.

String operations

String: A single piece of text, composed of a sequence of characters (letters, digits, punctuation, spaces).

Operations on string objects:

print statement: prints a string

len(): returns an integer

+, +=: return a string

.endswith(), .startswith(): return True/False

in: returns True/False

.upper(), .lower(): return a string

.replace(): returns a string

.split(): splits a string into a list and returns it

Concatenation and variable assignment

>>> v = 'walk'

>>> v + 'ed'

'walked'

>>> v

'walk'

>>> vd = v + 'ed'

>>> vd

'walked'

>>> v

'walk'

>>> v = v + 'ed'

>>> v

'walked'

>>> v == vd

True

Concatenation does not change original

v + 'ed' is a concatenation operation: it returns 'walked', but the value of v is unchanged!

New variable

vd = v + 'ed': the new variable vd is assigned the output of the concatenation. v is still unaffected.

Changing the original variable

v = v + 'ed': here, v is assigned a new value: its former self suffixed with 'ed'.
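The three cases on the concatenation slides can be replayed as a small script. This is only a sketch: it uses print() with parentheses so that it also runs under Python 3 (the slides use the Python 2 print statement), and the variable names are the ones from the slides.

```python
v = 'walk'
v + 'ed'          # the result is computed and thrown away; v is untouched
print(v)          # walk

vd = v + 'ed'     # the result is stored under a new name
print(vd)         # walked
print(v)          # walk

v = v + 'ed'      # v itself is re-assigned to the concatenated string
print(v)          # walked
print(v == vd)    # True
```

Note that in a script, unlike in the shell, the bare line v + 'ed' displays nothing; only print produces output.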

==: Equality comparison

v == vd: the double equal sign confirms that v and vd have an equal value!

Equality comparison vs. assignment

=     name = value         Attaches a name to a value

==    value1 == value2     Compares two values, returns True or False
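A short sketch of the difference between = and ==. The names same and diff are made up for this illustration, and print() is written with parentheses so the code also runs under Python 3.

```python
name = 'walk'              # = attaches the name to a value
same = (name == 'walk')    # == compares two values and gives back True or False
diff = (name == 'talk')
print(same)   # True
print(diff)   # False
print(name)   # walk  (comparing did not change name)
```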

Right Hand Side (RHS) first

>>> num = 5

>>> num = num + 2

>>> num

7

num has different values in one statement: the num on the right of = has value 5, and the num on the left ends up with value 7. How could this be?

Answer: Right hand side is evaluated first, and then variable assignment happens.

1. Integer 5 is created in Python memory.

2. Variable num is created, points to the memory location.

3. num + 2 is evaluated; value 7 is created in memory.

4. The name num is then attached to integer 7 in Python memory.

5. 5 is no longer needed; gets freed up from memory.

[Diagram: num first points to int 5; after num = num + 2 it points to int 7, and int 5 is crossed out.]
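One way to watch the right-hand side being evaluated first is to keep a second name on the old value. This is a sketch only: the name old is hypothetical and not part of the slides, and print() is used so the code also runs under Python 3.

```python
num = 5
old = num        # a second name attached to the same integer 5
num = num + 2    # right-hand side first: 5 + 2 gives 7, then num is attached to 7
print(num)       # 7
print(old)       # 5  (the old value itself was never modified)
```

Because old still refers to 5 here, step 5 of the slides (freeing the 5) would not apply in this variant.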

+ and +=

>>> homer = 'doh'

>>> print homer

doh

>>> homer = homer + '!'

>>> print homer

doh!

>>> homer = homer + '!'

>>> print homer

doh!!

>>> homer += '!'

>>> print homer

doh!!!

>>> homer += '!'

>>> print homer

doh!!!!

homer = homer + '!' and homer += '!' do exactly the same thing: suffixing homer with '!'

Augmented assignments

Augmented form    Long form          Effect

foo += 10         foo = foo + 10     Add 10

foo -= 10         foo = foo - 10     Subtract 10

foo *= 10         foo = foo * 10     Multiply by 10

foo /= 10         foo = foo / 10     Divide by 10

foo **= 10        foo = foo ** 10    Raise to the power of 10

foo %= 10         foo = foo % 10     Modulo 10
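The augmented assignments above can be tried one after another. A minimal sketch, with a starting value of 2 chosen only for illustration; note the assumption about versions: under Python 2, / on two integers gives an integer, while under Python 3 it gives a float.

```python
foo = 2
foo += 10     # same as foo = foo + 10   -> 12
foo -= 10     # same as foo = foo - 10   -> 2
foo *= 10     # same as foo = foo * 10   -> 20
foo /= 10     # same as foo = foo / 10   -> 2 (the float 2.0 under Python 3)
foo **= 10    # same as foo = foo ** 10  -> 1024
foo %= 10     # same as foo = foo % 10   -> 4
print(foo)
```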

Practice

2 minutes

Repeated commands: Use and!

>>> chor = 'tra'

>>> chor = chor + '-la'

>>> print chor

tra-la

>>> chor += '-la'

>>> print chor

tra-la-la

>>> chor += '-la'

>>> print chor

tra-la-la-la

>>> num = 2

>>> num += 1

>>> print num

3

>>> num += 1

>>> print num

4

>>> num *= 2

>>> print num

8

>>> num *= 2

>>> print num

16

Also try: -= /= **=

.startswith(), .endswith()

>>> len('cat')

3

>>> 'cat'.endswith('t')

True

>>> 'cat'.startswith('t')

False

>>> 'cat'.endswith('')

True

>>> c = 'cat'

>>> c.endswith('at')

True

>>> c.endswith('cat')

True

>>> c.startswith('cat')

True

.endswith() can be called on a string

or a variable whose value is a string
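A small sketch of both calling styles, on a literal string and on a variable. The word 'walked' and the name has_past_suffix are made up for this example; print() is used so the code also runs under Python 3.

```python
word = 'walked'
print(word.endswith('ed'))         # True: called on a variable
print(word.startswith('w'))        # True
print('walked'.endswith('ing'))    # False: called directly on a string

has_past_suffix = word.endswith('ed')   # the True/False result can be stored
print(has_past_suffix)                  # True
```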

Boolean operators: and, or, not

>>> 'cat'.endswith('at')

True

>>> not 'cat'.endswith('at')

False

>>> not True

False

>>> not False

True

>>> 'cat'.endswith('at') and 'dog'.endswith('g')

True

>>> 'cat'.endswith('at') and 'dog'.endswith('at')

False

>>> 'cat'.endswith('at') or 'dog'.endswith('at')

True

>>> 'cat'.endswith('at') and not 'dog'.endswith('at')

True

not is a unary operator: not A flips the truth value of A

and and or are binary operators: they take two arguments, as in "A and B" and "A or B".
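Storing the two test results in variables makes the operators easier to read. A sketch, with a and b as hypothetical names for the results; print() is used so the code also runs under Python 3.

```python
a = 'cat'.endswith('at')   # True
b = 'dog'.endswith('at')   # False

print(not a)          # False: not flips a single truth value
print(a and b)        # False: and needs both sides to be True
print(a or b)         # True: or needs at least one side to be True
print(a and not b)    # True: not applies to b before and combines the two
```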

in as a substring operation

>>> 'a' in 'cat'

True

>>> 'ca' in 'cat'

True

>>> 'cat' in 'scattered'

True

>>> 'et' in 'scattered'

False

>>> '' in 'scattered'

True

in as a "substring" operator: A in B is True if A is a substring of B, False otherwise.

not in

>>> 'cat' not in 'scattered'

False

negating in with not
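A sketch of in and not in side by side, using the words from the slides. The variable name word is an assumption for this example, and print() is used so the code also runs under Python 3.

```python
word = 'scattered'
print('cat' in word)        # True
print('et' in word)         # False
print('cat' not in word)    # False
print(not 'cat' in word)    # False: the same test written with a leading not
```

Both negated forms give the same answer; "not in" is usually the easier one to read.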

The empty string

>>> ''

''

>>> len('')

0

>>> 'cat'.startswith('')

True

>>> 'cat'.endswith('')

True

>>> ''.startswith('')

True

>>> ''.endswith('')

True

>>> ''.endswith('t')

False

>>> '' in 'cat'

True

>>> '' in ''

True

Strings can have 0 length

'' is an empty string; the length is 0

'' starts and ends every string

The empty string '' begins and ends every string, even the empty string itself!

'' is a substring of every string

The empty string '' is a substring of every string –

even the empty string itself!
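The empty-string facts can be checked in a script. A minimal sketch; the name empty is hypothetical, the last line is an added observation rather than something from the slides, and print() is used so the code also runs under Python 3.

```python
empty = ''
print(len(empty))                 # 0
print('cat'.startswith(empty))    # True
print('cat'.endswith(empty))      # True
print(empty in 'cat')             # True
print(empty in empty)             # True
print('cat' + empty == 'cat')     # True: concatenating '' changes nothing
```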

Practice

2 minutes

>>> not True

False

>>> not False

True

>>> 'cat'.endswith('at') and 'dog'.endswith('g')

True

>>> 'cat'.endswith('at') and 'dog'.endswith('at')

False

>>> 'cat'.endswith('at') or 'dog'.endswith('at')

True

>>> 'cat'.endswith('at') and not 'dog'.endswith('at')

True

>>> 'ca' in 'cat'

True

>>> 'cat' in 'scattered'

True

>>> 'et' in 'scattered'

False

Also try with empty string ''

.upper(), .lower(), .capitalize()

>>> sign = 'Please be quiet'

>>> sign.upper()

'PLEASE BE QUIET'

>>> sign.lower()

'please be quiet'

>>> sign.capitalize()

'Please be quiet'

>>> 'hello!'.upper()

'HELLO!'

>>> 'hello!'.upper().lower()

'hello!'

>>> 'hello!'.capitalize()

'Hello!'

.upper(), .lower()

.upper() and .lower() upper-case and lower-case every character in a string.

.capitalize()

.capitalize() capitalizes the first character in the string (and lower-cases the rest).

Nested functions

'hello!'.upper().lower() uppercases 'hello!' and then lowercases it back!
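A sketch of the case methods, including the chained call. The name shout and the test word 'hELLO!' are made up for this example; print() is used so the code also runs under Python 3.

```python
sign = 'Please be quiet'
shout = sign.upper()            # returns a new string
print(shout)                    # PLEASE BE QUIET
print(sign)                     # Please be quiet  (the original is unchanged)

# Chained call: .upper() runs first, then .lower() runs on its result.
print(sign.upper().lower())     # please be quiet

# .capitalize() upper-cases the first character and lower-cases the rest.
print('hELLO!'.capitalize())    # Hello!
```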

.replace()

>>> foo = 'hello, world!'

>>> foo.replace('l', 'r')

'herro, worrd!'

>>> foo.replace('l', 'r').replace('r', 'l')

'hello, wolld!'

>>> faa = 'colour rumour'

>>> faa.replace('our', 'or')

'color rumor'

>>> mary = 'Mary had a little lamb'

>>> print mary

Mary had a little lamb

>>> mary.replace(' ', '')

'Maryhadalittlelamb'

foo.replace('l', 'r') replaces every instance of 'l' with 'r'.

Stacking .replace()

foo.replace('l', 'r').replace('r', 'l') replaces every instance of 'l' with 'r', and then replaces every 'r' with 'l'.

faa.replace('our', 'or') replaces a string ('our') with another string ('or'). Text has been Americanized!

.replace() for removing

Removing every space: achieved by replacing the space ' ' with the empty string ''.
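Splitting the stacked call into two steps shows why the result is not the original string. A sketch; step1 and step2 are hypothetical names, and print() is used so the code also runs under Python 3.

```python
foo = 'hello, world!'
step1 = foo.replace('l', 'r')      # every 'l' becomes 'r'
step2 = step1.replace('r', 'l')    # every 'r' becomes 'l', including the original 'r' in 'world'
print(step1)                       # herro, worrd!
print(step2)                       # hello, wolld!
print(foo)                         # hello, world!  (foo itself is unchanged)

# Replacing with the empty string removes the matched text.
print('Mary had a little lamb'.replace(' ', ''))   # Maryhadalittlelamb
```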

Splitting a string with .split()

>>> mary = 'Mary had a little lamb'

>>> mary

'Mary had a little lamb'

>>> print mary

Mary had a little lamb

>>> mary.split()

['Mary', 'had', 'a', 'little', 'lamb']

>>> mary.split(' ')

['Mary', 'had', 'a', 'little', 'lamb']

>>> mary.split('a')

['M', 'ry h', 'd ', ' little l', 'mb']

>>> len(mary)

22

>>> len(mary.split())

5

mary.split(): no separator given, so it splits on whitespace.

mary.split(' '): splits on every space ' '. Same result as mary.split() in this case, but not always.

mary.split('a'): splits on every 'a'.

len() works with strings and lists

len(string) returns the length of the string: the number of characters.

len(list) returns the number of items in a list.
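A sketch that counts characters and words for the same string. The name words is an assumption for this example; print() is used so the code also runs under Python 3.

```python
mary = 'Mary had a little lamb'
words = mary.split()    # a list of the whitespace-separated pieces
print(words)            # ['Mary', 'had', 'a', 'little', 'lamb']
print(len(mary))        # 22: characters, spaces included
print(len(words))       # 5: items in the list
```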

Splitting and whitespace characters

>>> hi = 'Hello mother,\nHello father.'

>>> print hi

Hello mother,

Hello father.

>>> hi.split()

['Hello', 'mother,', 'Hello', 'father.']

>>> hi.split('\n')

['Hello mother,', 'Hello father.']

>>> record = 'Lisa Simpson\tBorn 8/12 2002\tSpringfield'

>>> print record

Lisa Simpson Born 8/12 2002 Springfield

>>> record.split('\t')

['Lisa Simpson', 'Born 8/12 2002', 'Springfield']

So, as long as a string contains only spaces and no other whitespace, .split() is the same as .split(' '), right?

Nope!

>>> la = 'la di   da'

>>> la.split()

['la', 'di', 'da']

>>> la.split(' ')

['la', 'di', '', '', 'da']

>>> foo = 'colorless green ideas'

>>> foo.split('e')

['colorl', 'ss gr', '', 'n id', 'as']

Here 'di' and 'da' are separated by three spaces '   '.

.split() default behavior

If no separator is specified:

Splits on whitespace (includes space, line break \n, tab \t).

Repeated whitespaces do NOT result in empty string tokens.

If a separator is specified, then:

Repetition DOES result in empty string tokens.
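The two behaviors can be compared on a string with repeated and mixed whitespace. A sketch: la has three spaces between 'di' and 'da', the string mixed is made up for this example, and print() is used so the code also runs under Python 3.

```python
la = 'la di   da'            # three spaces between 'di' and 'da'
print(la.split())            # ['la', 'di', 'da']
print(la.split(' '))         # ['la', 'di', '', '', 'da']

mixed = 'la\tdi \n da'       # a tab, a space, a line break, a space
print(mixed.split())         # ['la', 'di', 'da']
print(mixed.split(' '))      # ['la\tdi', '\n', 'da']
```

With an explicit ' ' separator, the tab and the line break are treated as ordinary characters and stay inside the pieces.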

Practice 1

2 minutes

Practice 2

2 minutes

Practice 3

2 minutes

Practice 4

Write a Python script that:

internally stores the "Fox in Sox" text (from http://www.pitt.edu/~naraehan/ling1901/sample-texts.txt) as a string variable named "fox",

prints out the text,

prints out how many lines are in the text,

prints out how many words are in the text,

prints out how many characters are in the text,

prints out how many times 'ree' occurs in the text.

Use .count() method. Try and figure it out!

3 minutes
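One possible shape for such a script is sketched below. The string stored in fox is a short placeholder invented for this example, NOT the actual "Fox in Sox" sample text, and the names num_lines, num_words, num_chars and num_ree are hypothetical. print() is used so the code also runs under Python 3.

```python
# Placeholder text only; paste the real sample text here instead.
fox = 'three free trees\nthree tree bees'

print(fox)

num_lines = len(fox.split('\n'))   # pieces separated by line breaks
num_words = len(fox.split())       # pieces separated by any whitespace
num_chars = len(fox)               # every character, line break included
num_ree = fox.count('ree')         # occurrences of the substring 'ree'

print(num_lines)   # 2
print(num_words)   # 6
print(num_chars)   # 32
print(num_ree)     # 5
```

If the stored text ends with a line break, splitting on '\n' yields one extra empty string at the end, so the line count would be one too high.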

A note on Python syntax

Conditional: if ... : elif ... : else :

Looking ahead

>>> grade = 87
>>> if grade >= 90 :
        print 'You got an A'
    elif grade >= 80 :
        print 'You got a B'
    else :
        print 'Try harder!'

You got a B
>>>
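The same conditional as a script might look like the sketch below. Storing the text in a variable named message is an assumption made for this example; the indentation under each branch is required, and print() is used so the code also runs under Python 3.

```python
grade = 87

if grade >= 90:
    message = 'You got an A'
elif grade >= 80:          # checked only when the first test is False
    message = 'You got a B'
else:                      # reached only when every test above is False
    message = 'Try harder!'

print(message)   # You got a B
```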

Wrap-up

If you want to try more commands, visit:

A Beginner's Python Tutorial, Lesson 6

http://www.sthurlow.com/python/lesson06/

Exercise #2

http://www.pitt.edu/~naraehan/ling1991/exercise.html#ex2

Due Tuesday midnight