intro python II - Columbia Universitysh553/dip2014/intro python II.pdf · modules are re-usable...

Python II Shlomo Hershkop July 2014



clean up time

python allows easy clean up of input

string methods:

strip - remove whitespace from both ends

lstrip - remove leading whitespace (before the text)

rstrip - remove trailing whitespace (after the text)
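A quick sketch of the three methods on a padded string (the sample text is made up). An optional argument strips that set of characters instead of whitespace.

```python
raw = "   hello world \n"

both = raw.strip()    # whitespace removed at both ends
left = raw.lstrip()   # leading whitespace removed only
right = raw.rstrip()  # trailing whitespace removed, including the newline

print(repr(both))   # 'hello world'
print(repr(left))   # 'hello world \n'
print(repr(right))  # '   hello world'

# with an argument, strip removes those characters instead of whitespace
print("--x--".strip("-"))  # x
```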


Regular Expressions

import re

language to describe patterns

find matches

replace matches

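very flexible, but you need to understand how the matching engine works to write efficient patterns

re.search(pattern, str) - find the first match anywhere in str; returns a match object, or None if there is no match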
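A minimal sketch of re.search on a made-up string: the match object reports the matched text and where it was found, and a failed search returns None, so check before calling .group().

```python
import re

text = "intro python II"

m = re.search("python", text)  # scans the whole string for the first match
if m:
    print(m.group())  # python
    print(m.span())   # (6, 12): start and end index of the match

print(re.search("java", text))  # None: no match anywhere
```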


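re.match(regexp, str, flags) - match only at the start of str; returns a match object or None

re.findall(regexp, str, flags) - return a list of all non-overlapping matches

re.sub(regex, repl, string, count=n) - replace matches with repl; count=n limits it to the first n (the default 0 replaces all)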
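A short sketch contrasting the calls on a made-up string: match only looks at the start, search looks anywhere, findall collects every match, and sub rewrites them.

```python
import re

s = "cat bat rat"

# match only succeeds if the pattern fits at the very start of s
print(re.match(r"bat", s))            # None
print(re.match(r"cat", s).group())    # cat
print(re.search(r"bat", s).group())   # bat: search looks past the start

# findall returns every non-overlapping match as a list of strings
print(re.findall(r"[cbr]at", s))      # ['cat', 'bat', 'rat']

# sub replaces matches; count limits how many (0, the default, means all)
print(re.sub(r"at", "og", s))           # cog bog rog
print(re.sub(r"at", "og", s, count=1))  # cog bat rat
```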


building up a regular expression

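basic string: ordinary characters just match themselves

'shlomo'

re.findall('shlomo', str, re.I) - re.I (re.IGNORECASE) makes the match case-insensitive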
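A small sketch of the re.I flag on made-up text: without it only the exact lowercase spelling matches; with it every capitalization does.

```python
import re

text = "Shlomo wrote this. SHLOMO also wrote that; shlomo is busy."

print(re.findall("shlomo", text))        # ['shlomo']
print(re.findall("shlomo", text, re.I))  # ['Shlomo', 'SHLOMO', 'shlomo']
```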


character ranges

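[aeiou] - any one of the listed characters

[^aeiou] - flip it around: any character that is not listed

[a-z] - any character in the range a to z

\d - a digit, same as [0-9] for ASCII text

\D - anything except a digit

\s - whitespace

\w - alphanumeric (letters, digits, and underscore)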
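A sketch running several of these classes over one made-up string; note that [a-z] is case-sensitive, so the capital R is skipped.

```python
import re

s = "Room 42b, floor 7"

print(re.findall(r"[aeiou]", s))   # ['o', 'o', 'o', 'o']
print(re.findall(r"[a-z]+", s))    # ['oom', 'b', 'floor']: R is uppercase
print(re.findall(r"\d+", s))       # ['42', '7']
print(re.findall(r"\w+", s))       # ['Room', '42b', 'floor', '7']
print(len(re.findall(r"\s", s)))   # 3 whitespace characters
print(re.sub(r"[^a-z]", "", s))    # oombfloor: everything not a-z removed
```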


grouping

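using parentheses in the pattern

.group() - the whole match (same as .group(0))

.group(n) - the nth group; groups are numbered 1..n from left to right by their opening parenthesis

can also name a group with (?P<name>...) and fetch it with .group('name')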
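A sketch of numbered, nested, and named groups; the phone number and e-mail address are made-up samples. The nested example shows that numbering follows the opening parentheses.

```python
import re

m = re.search(r"(\d{3})-(\d{4})", "call 555-1234 now")
print(m.group())    # 555-1234
print(m.group(1))   # 555
print(m.group(2))   # 1234

# groups are numbered by their opening parenthesis, left to right
m = re.match(r"((\d+)-(\d+))", "12-34")
print(m.groups())   # ('12-34', '12', '34')

# named groups can be fetched by name
m = re.search(r"(?P<user>\w+)@(?P<host>[\w.]+)", "mail user@example.com")
print(m.group("user"))  # user
print(m.group("host"))  # example.com
print(m.groupdict())    # {'user': 'user', 'host': 'example.com'}
```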


? - one or none, same as {0,1}

+ - at least one, same as {1,}

* - zero or more, same as {0,}

\d{3,5} - between 3 and 5 digits
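A sketch of each quantifier on made-up inputs; the last line shows that {3,5} takes as many digits as it can (up to 5) and leaves the rest.

```python
import re

print(re.findall(r"colou?r", "color colour colouur"))  # ['color', 'colour']
print(re.findall(r"go+gle", "ggle gogle google"))      # ['gogle', 'google']
print(re.findall(r"ab*", "a ab abbb"))                 # ['a', 'ab', 'abbb']
print(re.findall(r"\d{3,5}", "12 123 123456"))         # ['123', '12345']
```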


greedy matching

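default is to match as much as possible

can be inefficient, since the engine may grab too much and then backtrack, and it can match more than you intended

non greedy versions match as little as possible:

??

+?

*?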
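A sketch of greedy versus non-greedy matching on a made-up HTML fragment: the greedy <.*> runs from the first < to the last >, while <.*?> stops at the first > each time.

```python
import re

html = "<b>bold</b> and <i>italic</i>"
print(re.findall(r"<.*>", html))    # ['<b>bold</b> and <i>italic</i>']
print(re.findall(r"<.*?>", html))   # ['<b>', '</b>', '<i>', '</i>']

print(re.match(r"\d+", "12345").group())    # 12345
print(re.match(r"\d+?", "12345").group())   # 1
print(re.match(r"ab?", "ab").group())       # ab
print(re.match(r"ab??", "ab").group())      # a
```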


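can limit where a match occurs (anchors):

^ - start of str

$ - end of str

special characters:

. - any character except newline

* + ? - quantifiers

escape them with a backslash to match them literally, e.g. \. matches a dot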
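A sketch of the anchors and of . versus an escaped \. on made-up strings; ^ and $ together force the whole string to match.

```python
import re

print(bool(re.search(r"^abc", "abcdef")))   # True
print(bool(re.search(r"^abc", "xabcdef")))  # False: not at the start
print(bool(re.search(r"def$", "abcdef")))   # True
print(bool(re.search(r"^\d+$", "2014")))    # True: the whole string is digits
print(bool(re.search(r"^\d+$", "2014a")))   # False

print(re.findall(r"a.c", "abc a.c"))   # ['abc', 'a.c']: . matches any character
print(re.findall(r"a\.c", "abc a.c"))  # ['a.c']: \. matches only a literal dot
```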


pat = re.compile(r"\s*(?P<header>[^:]+)\s*:(?P<value>.*?)\s*$")

a bit hard to read

re.VERBOSE lets you add whitespace and # comments inside the pattern


pat = re.compile(r"""

\s* # Skip leading whitespace

(?P<header>[^:]+) # Header name

\s* : # Whitespace, and a colon

(?P<value>.*?) # The header's value -- *? used to

# lose the following trailing whitespace

\s*$ # Trailing whitespace to end-of-line

""", re.VERBOSE)


will do a lot more regular expressions

will get lots of practice


ip address

what happens when you type www.cnn.com into the browser?

the browser needs the numeric address behind the name, of the form x.x.x.x

how do you write a regular expression to match an ipv4 address?


pat = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

problem: this also matches 999.22.999.999, which is not a valid address (each part must be 0-255)

need an extra function to check the numbers:


def valid_ip(address):
    try:
        host_bytes = address.split('.')
        valid = [int(b) for b in host_bytes]          # ValueError if a part is not a number
        valid = [b for b in valid if 0 <= b <= 255]   # keep only the parts in range
        return len(host_bytes) == 4 and len(valid) == 4
    except ValueError:
        return False
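One way to combine the two ideas, sketched with a hypothetical helper named looks_like_ipv4: the regex checks the overall shape and captures the four parts, and int() checks the 0-255 range. It uses fullmatch so the whole string has to match, since the unanchored pattern would also find an address inside a longer string such as 1.2.3.4.5.

```python
import re

# hypothetical helper: the regex checks the shape,
# then int() checks that each of the four parts is in 0-255
IPV4_SHAPE = re.compile(r"([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})")

def looks_like_ipv4(address):
    m = IPV4_SHAPE.fullmatch(address)   # the whole string must match, not just a piece
    if m is None:
        return False
    return all(0 <= int(part) <= 255 for part in m.groups())

for a in ["192.168.1.10", "999.22.999.999", "1.2.3", "1.2.3.4.5"]:
    print(a, looks_like_ipv4(a))
```

Only the first address passes; the others fail either the range check or the shape check.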


sockets

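import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com', 80))
# python 3 sockets send bytes; HTTP lines end in \r\n and a blank line ends the request
mysock.sendall(b'GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\r\n\r\n')

while True:
    data = mysock.recv(512)
    if len(data) < 1:          # empty result: the server closed the connection
        break
    print(data.decode(), end='')

mysock.close()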
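The same read-until-empty loop can be tried without a network connection. This sketch swaps in socket.socketpair(), which returns two sockets already connected to each other; the message text is made up, and the small buffer size just forces several recv() calls.

```python
import socket

# two already-connected sockets, so no network is needed
server_side, client_side = socket.socketpair()

server_side.sendall(b"hello from the other end\n")
server_side.close()   # closing makes recv() on the other end return b'' after the data

chunks = []
while True:
    data = client_side.recv(16)   # small buffer to force several reads
    if not data:                  # b'' means the peer closed the connection
        break
    chunks.append(data)
client_side.close()

text = b"".join(chunks).decode()
print(text, end="")
```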


import urllib.request

counts = dict()
fhand = urllib.request.urlopen('http://www.py4inf.com/code/romeo.txt')
for line in fhand:
    words = line.decode().split()    # lines arrive as bytes in python 3
    for word in words:
        counts[word] = counts.get(word, 0) + 1
print(counts)
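A sketch of the same counting loop that runs offline: io.StringIO stands in for the downloaded file (the text is made up), and collections.Counter is shown as a standard-library shortcut for the same bookkeeping.

```python
import io
from collections import Counter

# stand-in for the downloaded file: a file-like object yielding str lines
fhand = io.StringIO("the cat sat\nthe cat ran\n")

counts = dict()
for line in fhand:
    for word in line.split():
        counts[word] = counts.get(word, 0) + 1
print(counts)   # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}

# collections.Counter does the same counting in one call
fhand.seek(0)
word_counts = Counter(fhand.read().split())
print(word_counts.most_common(2))   # [('the', 2), ('cat', 2)]
```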


command line programs

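import os
os.system('run something')   # run a shell command; its output goes straight to the terminal

or

fh = os.popen('ls -la')      # run a command and read its output like a file
s = fh.read()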
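The Python documentation recommends the subprocess module over os.system for new code. A minimal sketch with subprocess.run, which runs the command without a shell and captures its output; it launches the current Python interpreter (sys.executable) instead of ls -la so it works on any platform.

```python
import subprocess
import sys

# run a command (a list of arguments, no shell) and capture its output as text
result = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,
    text=True,
)
print(result.returncode)   # 0
print(result.stdout)       # hello from a child process
```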


more on modules

modules are re-usable pieces of python

import x - looks for x.py (in the directories listed in sys.path)

x.foo - refers to foo defined inside module x

from x import foo, foo2 - lets you refer to them simply as foo and foo2
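A self-contained sketch of both import styles. It writes a tiny module with the hypothetical name mymod.py into a temporary directory and puts that directory at the front of sys.path so import can find it.

```python
import importlib
import pathlib
import sys
import tempfile

# write a tiny module, mymod.py (hypothetical name), into a temporary directory
tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "mymod.py").write_text(
    "def foo():\n    return 'foo'\n\ndef foo2():\n    return 'foo2'\n"
)
sys.path.insert(0, tmp)          # import searches the directories in sys.path
importlib.invalidate_caches()    # the file was created after startup

import mymod                     # finds mymod.py on sys.path
print(mymod.foo())               # foo: qualified with the module name

from mymod import foo, foo2
print(foo(), foo2())             # foo foo2: names bound directly
```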


pyc

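to speed up loading, python compiles each imported module to bytecode and caches it in a .pyc file (python 3 keeps these in a __pycache__ directory)

be careful if you delete a module but it's still being accessed: in python 2, a leftover .pyc next to the deleted .py can still be imported :)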
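A sketch for seeing where Python 3 puts the cached bytecode. It creates a throwaway module with the hypothetical name x.py, asks importlib.util.cache_from_source for the cache path, and compiles it ahead of time with py_compile; the exact file name depends on the interpreter version.

```python
import importlib.util
import pathlib
import py_compile
import tempfile

src = pathlib.Path(tempfile.mkdtemp(), "x.py")   # hypothetical module x.py
src.write_text("answer = 42\n")

# where the .pyc for x.py would be cached (name includes an interpreter tag)
expected = importlib.util.cache_from_source(str(src))
print(expected)

# compile ahead of time; returns the path of the .pyc that was written
pyc = py_compile.compile(str(src))
print(pyc == expected, pathlib.Path(pyc).exists())
```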


namespace

code run directly as a script executes in the __main__ namespace (its __name__ is '__main__')
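The usual idiom built on this: code under the __name__ check runs only when the file is executed directly, not when it is imported as a module. The main function here is a hypothetical placeholder.

```python
def main():
    # hypothetical entry point for the script
    return "running as a script"

# __name__ is "__main__" only when this file is executed directly;
# when the file is imported, __name__ is the module's own name instead
if __name__ == "__main__":
    print(main())
```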


md5

>>> import hashlib
>>> m = hashlib.md5()
>>> m.update(b"shlomo")    # hashlib works on bytes, not str
>>> m.digest()
b'a\x11\xa4\xd2\xc5\xe5\xbd\x82\x9a\xdf2Y\x0c\x08\x8a\x93'
>>> m.hexdigest()
'6111a4d2c5e5bd829adf32590c088a93'
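A small sketch showing that feeding the bytes in pieces with update() gives the same hash as passing them all to the constructor at once (the split of b"shlomo" is arbitrary), and that hexdigest() is just digest() written in hex.

```python
import hashlib

# incremental: feed data in pieces with update()
m = hashlib.md5()
m.update(b"shlo")
m.update(b"mo")

# one-shot: pass all the bytes to the constructor
one_shot = hashlib.md5(b"shlomo")

print(m.hexdigest() == one_shot.hexdigest())   # True: same bytes, same hash
print(m.digest().hex() == m.hexdigest())       # True: hexdigest is digest in hex
print(len(m.digest()))                         # 16: MD5 digests are 128 bits
```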


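for a file (read it in blocks, so a large file never has to fit in memory)

import hashlib

def hashfile(afile, hasher, blocksize=65536):
    buf = afile.read(blocksize)
    while len(buf) > 0:          # read() returns b'' at end of file
        hasher.update(buf)
        buf = afile.read(blocksize)
    return hasher.digest()

# fnamelst: a list of file names
[(fname, hashfile(open(fname, 'rb'), hashlib.md5())) for fname in fnamelst]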
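A runnable usage sketch. It repeats hashfile from the slide, creates two temporary files as a stand-in for fnamelst (which the slide leaves undefined), opens each with a with block so it is closed right away, and converts the digests to hex for readability. One file is larger than a block, so the loop runs more than once.

```python
import hashlib
import os
import tempfile

def hashfile(afile, hasher, blocksize=65536):
    # same block-by-block loop as on the slide
    buf = afile.read(blocksize)
    while len(buf) > 0:
        hasher.update(buf)
        buf = afile.read(blocksize)
    return hasher.digest()

# two throwaway files stand in for fnamelst
data = {"a.bin": b"x" * 200000, "b.bin": b"hello\n"}
tmpdir = tempfile.mkdtemp()
fnamelst = []
for name, content in data.items():
    path = os.path.join(tmpdir, name)
    with open(path, "wb") as f:
        f.write(content)
    fnamelst.append(path)

hashes = []
for fname in fnamelst:
    with open(fname, "rb") as f:   # with closes each file promptly
        hashes.append((fname, hashfile(f, hashlib.md5()).hex()))

for fname, digest in hashes:
    print(os.path.basename(fname), digest)
```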