Post on 07-Jun-2020
Lesson 2: String Operations, Writing Scripts
Fundamentals of Text Processing for Linguists
Na-Rae Han
Objectives
Writing and executing Python scripts
Learn Python basic syntax
Variable assignments: = vs. +=
Equality comparison: ==
String operations
Boolean operators: and, or, not
1/15/2014 2
Interactive programming vs. scripts
We have been programming INTERACTIVELY through the IDLE shell.
IDLE interprets and responds to each line of code you type in.
You have to retype the code next time.
We will now try writing and running a stand-alone Python SCRIPT.
The code (x = 'a'+'b', print x, …) is saved in a Python script file, which is executed in your IDLE shell.
Python script in IDLE editor mode
1. Open up a new window (Ctrl+n).
2. Python editor window opens up. Type in:
print "Hello, world!!"
3. Save the script as "hello.py".
Don't forget to include the ".py" extension!
4. Run the script by selecting "Run" > "Run Module" or pressing F5.
5. Success! The command executes in the IDLE shell window.
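A minimal sketch of a slightly longer script, assuming a hypothetical file name hello2.py. Unlike the shell, a script shows a value only when it is explicitly printed. The sketch uses print(...) with parentheses so that it runs under both Python 2 and Python 3; the slides themselves use the Python 2 print statement.

```python
# hello2.py (hypothetical file name): type this in the IDLE editor, save, press F5.
greeting = 'Hello, ' + 'world!!'   # build the string first; nothing is displayed yet
print(greeting)                    # a script shows output only when it prints
```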
String operations
String: A single piece of text, composed of a sequence of characters.
Operations on string objects:
print statement: prints string
len() returns integer
+, += returns string
.endswith(), .startswith() returns True/False
in returns True/False
.upper(), .lower() returns string
.replace() returns string
.split() splits a string into a list, returns it
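The operations listed above can be sketched in one place. This is only an illustration: the variable names and the sample word are assumptions, and print(...) is written with parentheses so the code runs under both Python 2 and Python 3.

```python
word = 'walked'                        # sample string (an assumption for illustration)

length = len(word)                     # integer
longer = word + '!'                    # new string
ends = word.endswith('ed')             # True/False
starts = word.startswith('t')          # True/False
has_alk = 'alk' in word                # True/False
shout = word.upper()                   # new string
swapped = word.replace('ed', 'ing')    # new string
tokens = 'they walked home'.split()    # list of strings

print(length)
print(longer)
print(tokens)
```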
Concatenation and variable assignment
>>> v = 'walk'
>>> v + 'ed'
'walked'
>>> v
'walk'
>>> vd = v + 'ed'
>>> vd
'walked'
>>> v
'walk'
>>> v = v + 'ed'
>>> v
'walked'
>>> v == vd
True
Concatenation does not change original
v + 'ed' is a concatenation operation; it returns the new string 'walked'.
The value of v is unchanged!
New variable
New variable vd is assigned to the output of concatenation.
v is still unaffected.
Changing the original variable
Here, v is assigned a new value: its former self suffixed with 'ed'
==: Equality comparison
Double equal sign: confirms that
v and vd have an equal value!
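A small sketch of the same walk-through as a script, assuming the extra name before (not in the slides) to hold on to the earlier value of v. It is written with print(...) so it runs under both Python 2 and Python 3.

```python
v = 'walk'
v + 'ed'            # concatenation builds a new string; the result is thrown away here
before = v          # v still names 'walk'

vd = v + 'ed'       # a new variable holds the concatenated string
v = v + 'ed'        # now v itself is re-assigned
same = (v == vd)    # == compares the two values

print(before)
print(v)
print(same)
```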
Equality comparison vs. assignment
=    name = value        Attaches a name to a value
==   value1 == value2    Compares two values, returns True or False
Right Hand Side (RHS) first
>>> num = 5
>>> num = num + 2
>>> num
7
num has different values in one statement. How could this be?
In num = num + 2, the num on the right has value 5; the num on the left ends up with value 7.
Answer: The right-hand side is evaluated first, and then variable assignment happens.
1. Integer 5 is created in Python memory
2. Variable num is created, points to the memory location
3. num + 2 is evaluated; value 7 is created in memory
4. The name num is then attached to integer 7 in Python memory
5. 5 is no longer needed; gets freed up from memory
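The steps above can be sketched by splitting the statement in two. The names rhs_value and old_value are assumptions introduced only to make the intermediate state visible; print(...) is used so the code runs under both Python 2 and Python 3.

```python
num = 5                 # steps 1-2: integer 5 is created and named num

rhs_value = num + 2     # step 3: the right-hand side is evaluated with the old value
old_value = num         # at this point num has not changed yet
num = rhs_value         # step 4: the name num is re-attached to the result

print(old_value)
print(num)
```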
+ and +=
>>> homer = 'doh'
>>> print homer
doh
>>> homer = homer + '!'
>>> print homer
doh!
>>> homer = homer + '!'
>>> print homer
doh!!
>>> homer += '!'
>>> print homer
doh!!!
>>> homer += '!'
>>> print homer
doh!!!!
homer = homer + '!' and homer += '!' do exactly the same thing: suffixing homer with '!'
Augmented assignments
foo += 10     foo = foo + 10     Add 10
foo -= 10     foo = foo - 10     Subtract 10
foo *= 10     foo = foo * 10     Multiply by 10
foo /= 10     foo = foo / 10     Divide by 10
foo **= 10    foo = foo ** 10    Raise to the power of 10
foo %= 10     foo = foo % 10     Modulo 10
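A short sketch chaining several augmented assignments; the starting values and operands are assumptions chosen to keep the numbers small. One caution: in Python 2, / on two integers drops the remainder, while Python 3 returns a float, so the division example starts from a float to behave the same under both.

```python
foo = 7
foo += 10      # 17
foo -= 10      # 7
foo *= 10      # 70
foo %= 9       # remainder of 70 divided by 9
foo **= 2      # squared

ratio = 70.0   # a float, so /= gives the same answer in Python 2 and 3
ratio /= 10

homer = 'doh'
homer += '!'   # += also works on strings

print(foo)
print(ratio)
print(homer)
```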
Practice
2 minutes
Repeated commands: Use and!
>>> chor = 'tra'
>>> chor = chor + '-la'
>>> print chor
tra-la
>>> chor += '-la'
>>> print chor
tra-la-la
>>> chor += '-la'
>>> print chor
tra-la-la-la
>>> num = 2
>>> num += 1
>>> print num
3
>>> num += 1
>>> print num
4
>>> num *= 2
>>> print num
8
>>> num *= 2
>>> print num
16
Also try: -= /= **=
.startswith(), .endswith()
>>> len('cat')
3
>>> 'cat'.endswith('t')
True
>>> 'cat'.startswith('t')
False
>>> 'cat'.endswith('')
True
>>> c = 'cat'
>>> c.endswith('at')
True
>>> c.endswith('cat')
True
>>> c.startswith('cat')
True
.endswith() can be called on a string
or a variable whose value is a string
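A brief sketch of both calling styles, on a variable and directly on a quoted string. The sample words are assumptions; print(...) is used so the code runs under both Python 2 and Python 3.

```python
word = 'walking'

is_ing = word.endswith('ing')               # called on a variable
is_un = word.startswith('un')               # called on a variable
literal_un = 'unhappy'.startswith('un')     # called directly on a string

print(is_ing)
print(is_un)
print(literal_un)
```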
Boolean operators: and, or, not
>>> 'cat'.endswith('at')
True
>>> not 'cat'.endswith('at')
False
>>> not True
False
>>> not False
True
>>> 'cat'.endswith('at') and 'dog'.endswith('g')
True
>>> 'cat'.endswith('at') and 'dog'.endswith('at')
False
>>> 'cat'.endswith('at') or 'dog'.endswith('at')
True
>>> 'cat'.endswith('at') and not 'dog'.endswith('at')
True
not is a unary operator: not A flips the truth value of A
and and or are binary operators: they take two arguments, as in A and B, A or B
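A sketch combining the three operators on one word. The word and the variable names are assumptions for illustration; print(...) is used so the code runs under both Python 2 and Python 3.

```python
word = 'unhappily'

both = word.startswith('un') and word.endswith('ly')           # both sides must be True
either = word.endswith('ed') or word.endswith('ly')            # one True side is enough
neither = not (word.endswith('ed') or word.endswith('ing'))    # not flips the result
mixed = word.startswith('un') and not word.endswith('ly')      # not applies to the second test only

print(both)
print(either)
print(neither)
print(mixed)
```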
in as a substring operation
>>> 'a' in 'cat'
True
>>> 'ca' in 'cat'
True
>>> 'cat' in 'scattered'
True
>>> 'et' in 'scattered'
False
>>> '' in 'scattered'
True
in as a "substring" operator: A in B
True if A is a substring of B, False otherwise
not in
>>> 'cat' not in 'scattered'
False
negating in with not
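A sketch of in and not in on a longer string, assuming a sample sentence. It also shows that the substring must be contiguous: the words 'green' and 'sleep' both occur, but 'green sleep' does not. print(...) is used so the code runs under both Python 2 and Python 3.

```python
sentence = 'colorless green ideas sleep furiously'

has_green = 'green' in sentence            # True: occurs as a substring
has_blue = 'blue' in sentence              # False
no_blue = 'blue' not in sentence           # True: the negation of the line above
adjacent = 'ideas sleep' in sentence       # True: contiguous stretch of the string
not_adjacent = 'green sleep' in sentence   # False: both words occur, but not side by side

print(has_green)
print(no_blue)
print(not_adjacent)
```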
The empty string
>>> ''
''
>>> len('')
0
>>> 'cat'.startswith('')
True
>>> 'cat'.endswith('')
True
>>> ''.startswith('')
True
>>> ''.endswith('')
True
>>> ''.endswith('t')
False
>>> '' in 'cat'
True
>>> '' in ''
True
Strings can have 0 length
'' is an empty string; the length is 0
'' starts and ends every string
The empty string '' begins and ends every string, even the empty string itself!
'' is a substring of every string
The empty string '' is a substring of every string –
even the empty string itself!
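A sketch of two further consequences that follow from the same facts; the variable names are assumptions. The relation does not run the other way ('cat' is not a substring of ''), and concatenating the empty string changes nothing. print(...) is used so the code runs under both Python 2 and Python 3.

```python
empty = ''

inside = empty in 'cat'          # '' is a substring of 'cat'
reverse = 'cat' in empty         # but 'cat' is not a substring of ''
unchanged = 'cat' + empty        # adding '' leaves the string as it was
same_length = len('cat' + empty) == len('cat')

print(inside)
print(reverse)
print(unchanged)
```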
Practice
2 minutes
>>> not True
False
>>> not False
True
>>> 'cat'.endswith('at') and 'dog'.endswith('g')
True
>>> 'cat'.endswith('at') and 'dog'.endswith('at')
False
>>> 'cat'.endswith('at') or 'dog'.endswith('at')
True
>>> 'cat'.endswith('at') and not 'dog'.endswith('at')
True
>>> 'ca' in 'cat'
True
>>> 'cat' in 'scattered'
True
>>> 'et' in 'scattered'
False
Also try with the empty string ''
.upper(), .lower(), .capitalize()
>>> sign = 'Please be quiet'
>>> sign.upper()
'PLEASE BE QUIET'
>>> sign.lower()
'please be quiet'
>>> sign.capitalize()
'Please be quiet'
>>> 'hello!'.upper()
'HELLO!'
>>> 'hello!'.upper().lower()
'hello!'
>>> 'hello!'.capitalize()
'Hello!'
.upper(), .lower()
.upper() upper-cases and .lower() lower-cases every character in a string
.capitalize()
capitalizes the first character in the string and lower-cases the rest
Nested functions
'hello!'.upper().lower() uppercases 'hello!' and then lowercases it back!
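A sketch of the case methods, assuming a mixed-case sample 'hELLO!' to show that .capitalize() also lower-cases everything after the first character. The last line is one possible use: lower-casing both sides before comparing, so that 'The' and 'the' count as the same word. print(...) is used so the code runs under both Python 2 and Python 3.

```python
sign = 'Please be quiet'

loud = sign.upper()
soft = sign.lower()
capped = 'hELLO!'.capitalize()                 # first character upper, the rest lower
round_trip = 'hello!'.upper().lower()          # methods applied left to right
same_word = 'The'.lower() == 'the'.lower()     # case-insensitive comparison

print(loud)
print(capped)
print(sign)      # the original string is unchanged
```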
.replace()
>>> foo = 'hello, world!'
>>> foo.replace('l', 'r')
'herro, worrd!'
>>> foo.replace('l', 'r').replace('r', 'l')
'hello, wolld!'
>>> faa = 'colour rumour'
>>> faa.replace('our', 'or')
'color rumor'
>>> mary = 'Mary had a little lamb'
>>> print mary
Mary had a little lamb
>>> mary.replace(' ', '')
'Maryhadalittlelamb'
replaces every instance of 'l' with 'r'
Stacking .replace()
replaces every instance of 'l' with 'r',
and then replaces every 'r' with 'l'
replaces a string ('our') with another string ('or')
Text has been Americanized!
.replace() for removing
Removing every space: achieved by replacing ' ' (a space) with the empty string ''
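A sketch that unpacks the stacked .replace() into two steps, assuming the intermediate names step1 and step2. It shows why the round trip does not restore the original: the second call also changes the 'r' that was already in 'world'. print(...) is used so the code runs under both Python 2 and Python 3.

```python
foo = 'hello, world!'

step1 = foo.replace('l', 'r')       # every 'l' becomes 'r'
step2 = step1.replace('r', 'l')     # every 'r' becomes 'l', including the original one
no_spaces = 'Mary had a little lamb'.replace(' ', '')   # replace each space with ''

print(step1)
print(step2)
print(no_spaces)
print(foo)                          # foo itself is unchanged
```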
Splitting a string with .split()
>>> mary = 'Mary had a little lamb'
>>> mary
'Mary had a little lamb'
>>> print mary
Mary had a little lamb
>>> mary.split()
['Mary', 'had', 'a', 'little', 'lamb']
>>> mary.split(' ')
['Mary', 'had', 'a', 'little', 'lamb']
>>> mary.split('a')
['M', 'ry h', 'd ', ' little l', 'mb']
>>> len(mary)
22
>>> len(mary.split())
5
No separator given, as in .split(): splits on whitespace
.split(' ') splits on every ' ' (a space)
Same result in this case, but not always
.split('a') splits on every 'a'
len() works with strings and lists
len(string) returns the length
of string: # of characters
len(list) returns the # of
items in a list
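A sketch of len() on a string and on the list produced by .split(); the variable names are assumptions. Removing the spaces first gives a count of non-space characters. print(...) is used so the code runs under both Python 2 and Python 3.

```python
mary = 'Mary had a little lamb'

n_chars = len(mary)                        # characters, spaces included
n_words = len(mary.split())                # items in the list of words
n_letters = len(mary.replace(' ', ''))     # characters once the spaces are removed

print(n_chars)
print(n_words)
print(n_letters)
```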
Splitting and whitespace characters
>>> hi = 'Hello mother,\nHello father.'
>>> print hi
Hello mother,
Hello father.
>>> hi.split()
['Hello', 'mother,', 'Hello', 'father.']
>>> hi.split('\n')
['Hello mother,', 'Hello father.']
>>> record = 'Lisa Simpson\tBorn 8/12 2002\tSpringfield'
>>> print record
Lisa Simpson Born 8/12 2002 Springfield
>>> record.split('\t')
['Lisa Simpson', 'Born 8/12 2002', 'Springfield']
So, as long as a string contains only spaces and no other whitespace, .split() is the same as .split(' '), right?
Nope!
>>> la = 'la di   da'
>>> la.split()
['la', 'di', 'da']
>>> la.split(' ')
['la', 'di', '', '', 'da']
>>> foo = 'colorless green ideas'
>>> foo.split('e')
['colorl', 'ss gr', '', 'n id', 'as']
Here la has three spaces '   ' between 'di' and 'da'.
.split() default behavior
If no separator is specified:
Splits on whitespace (includes space, line break \n, tab \t).
Repeated whitespaces do NOT result in empty string tokens.
If a separator is specified, then:
Repetition DOES result in empty string tokens.
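The two rules above can be sketched side by side. The string la is written here with three spaces between 'di' and 'da', and the string mixed is an assumption added to show tab and line break handling. print(...) is used so the code runs under both Python 2 and Python 3.

```python
la = 'la di   da'                  # three spaces between 'di' and 'da'

default = la.split()               # no separator: runs of whitespace count once
on_space = la.split(' ')           # separator given: each extra space yields ''
mixed = 'la\tdi\n da'.split()      # tab, line break and space are all whitespace

print(default)
print(on_space)
print(mixed)
```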
Practice 1
2 minutes
Practice 2
2 minutes
Practice 3
2 minutes
Practice 4
Write a Python script that:
internally stores the "Fox in Sox" text as a string variable named "fox", http://www.pitt.edu/~naraehan/ling1901/sample-texts.txt
prints out the text,
prints out how many lines are in the text,
prints out how many words are in the text,
prints out how many characters are in the text,
prints out how many times 'ree' occurs in the text.
Use .count() method. Try and figure it out!
3 minutes
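A sketch of the counting techniques on a short stand-in text. The string below is an assumption made up for illustration and is NOT the "Fox in Sox" sample; the variable names are likewise hypothetical. Note that .count() counts non-overlapping occurrences. print(...) is used so the code runs under both Python 2 and Python 3.

```python
# Stand-in text (an assumption), two lines separated by a line break
text = 'Three free fleas\nflee three trees.'

n_lines = len(text.split('\n'))    # pieces between line breaks
n_words = len(text.split())        # whitespace-separated tokens
n_chars = len(text)                # every character, including the line break
n_ree = text.count('ree')          # occurrences of the substring 'ree'

print(text)
print(n_lines)
print(n_words)
print(n_chars)
print(n_ree)
```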
A note on Python syntax
Looking ahead
Conditional: if ... : elif ... : else :
>>> grade = 87
>>> if grade >= 90 :
        print 'You got an A'
elif grade >= 80 :
        print 'You got a B'
else :
        print 'Try harder!'

You got a B
>>>
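The same conditional can be sketched as a script. In a script file there are no >>> prompts, so if, elif and else line up at the left margin and each body is indented. Storing the result in message (a name assumed here) and using print(...) lets the code run under both Python 2 and Python 3.

```python
grade = 87

if grade >= 90:
    message = 'You got an A'
elif grade >= 80:              # checked only when the first test is False
    message = 'You got a B'
else:                          # reached only when every test above is False
    message = 'Try harder!'

print(message)
```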
Wrap-up
If you want to try more commands, visit:
A Beginner's Python Tutorial, Lesson 6
http://www.sthurlow.com/python/lesson06/
Exercise #2
http://www.pitt.edu/~naraehan/ling1991/exercise.html#ex2
Due Tuesday midnight