
making connections

1 CTA Tables
    general transit feed specification
    stop names and stop times
    storing the connections in a dictionary

2 CTA Schedules
    finding connections between stops
    sparse matrices in SciPy
    visualizing a matrix

3 Adjacency Matrices
    matrices as dictionaries of dictionaries
    searching the adjacency matrix

MCS 275 Lecture 40
Programming Tools and File Management

Jan Verschelde, 19 April 2017

Programming Tools (MCS 275) making connections L-40 19 April 2017 1 / 41


GTFS of our CTA

We can download the schedules of the CTA:
http://www.transitchicago.com/developers/gtfs.aspx

GTFS = General Transit Feed Specification
is an open format for packaging scheduled service data.

A GTFS feed is a series of text files with data on lines separated by
commas (csv format).

Each file is a table in a relational database.


some tables

stops.txt: stop locations for bus or train
routes.txt: route list with unique identifiers
trips.txt: information about each trip by a vehicle
stop_times.txt: scheduled arrival and departure times for
each stop on each trip.
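Because every table is a comma-separated text file with a header line, the csv module of Python can read it into dictionaries keyed by column name. The following is a minimal sketch, assuming the header of stop_times.txt uses the column names of the GTFS specification; the helper name read_table is made up, and the sample is kept in memory so the example runs without the download.

```python
import csv
import io

# Two sample rows in the layout of stop_times.txt; the header line is an
# assumption based on the column names of the GTFS specification.
SAMPLE = '''trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,shape_dist_traveled
22043803629,07:38:30,07:38:30,30085,22,"UIC",0,98455
22043803629,07:40:30,07:40:30,30069,23,"UIC",0,100813
'''

def read_table(handle):
    """Returns the rows of a GTFS table as a list of dictionaries."""
    return list(csv.DictReader(handle))

ROWS = read_table(io.StringIO(SAMPLE))
for row in ROWS:
    print(row['stop_id'], row['stop_headsign'])
```

For the real feed, the same function could be called on an opened file, for example open('CTA/stop_times.txt', newline='').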


finding a stop name

$ python3 ctastopname.py
opening CTA/stops.txt ...
give a stop id : 3021
skipping line 0
3021 has name "California & Augusta"

The script looks for the line

3021,3021,"California & Augusta",41.89939053,-87.69688045,0,,1


ctastopname.py

FILENAME = 'CTA/stops.txt'
print('opening', FILENAME, '...')
DATAFILE = open(FILENAME, 'r')
STOPID = int(input('give a stop id : '))
COUNT = 0
STOPNAME = None
while True:
    LINE = DATAFILE.readline()
    if LINE == '':
        break
    L = LINE.split(',')
    try:
        if int(L[0]) == STOPID:
            STOPNAME = L[2]
            break
    except ValueError:
        # the header line does not start with a number
        print('skipping line', COUNT)
    COUNT = COUNT + 1
DATAFILE.close()
print(STOPID, 'has name', STOPNAME)
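Splitting on commas works for the line shown above, but a quoted stop name that itself contains a comma would be cut in two. A possible alternative, sketched here under assumptions, lets the csv module do the splitting. The function name find_stop_name, the header line, and the second data line are made up for illustration.

```python
import csv
import io

# The first data line is the one from stops.txt shown above; the header and
# the second data line (a name with a comma) are made up for illustration.
SAMPLE = '''stop_id,stop_code,stop_name,stop_lat,stop_lon,location_type,parent_station,wheelchair_boarding
3021,3021,"California & Augusta",41.89939053,-87.69688045,0,,1
9999,9999,"Example Ave, North Side",41.9,-87.7,0,,1
'''

def find_stop_name(handle, stop_id):
    """Returns the name of the stop with the given id, or None."""
    reader = csv.reader(handle)
    next(reader, None)              # skip the header line
    for fields in reader:
        if int(fields[0]) == stop_id:
            return fields[2]        # the csv module removes the quotes
    return None

print(find_stop_name(io.StringIO(SAMPLE), 3021))
print(find_stop_name(io.StringIO(SAMPLE), 9999))
```

Unlike the script, this version returns the name without the surrounding double quotes.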


finding head signs

Given an identification of a stop,
we look for all CTA vehicles that make a stop there.

$ python3 ctastoptimes.py
opening CTA/stop_times.txt ...
give a stop id : 3021
skipping line 0
adding "63rd Pl/Kedzie"
adding "Kedzie/Van Buren"
['"63rd Pl/Kedzie"', '"Kedzie/Van Buren"']

We scan the lines in stop_times.txt for where the given stop
identification occurs.


ctastoptimes.py

FILENAME = 'CTA/stop_times.txt'
print('opening', FILENAME, '...')
DATAFILE = open(FILENAME, 'r')
STOPID = int(input('give a stop id : '))
COUNT = 0
TIMES = []
while True:
    LINE = DATAFILE.readline()
    if LINE == '':
        break
    L = LINE.split(',')
    try:
        # compare with STOPID, not with the builtin function id
        if int(L[3]) == STOPID:
            if not L[5] in TIMES:
                print('adding', L[5])
                TIMES.append(L[5])
    except (ValueError, IndexError):
        # the header line has no number in its fourth field
        print('skipping line', COUNT)
    COUNT = COUNT + 1
DATAFILE.close()
print(TIMES)


finding connections

The file stop_times.txt has lines

22043803629,07:38:30,07:38:30,30085,22,"UIC",0,98455
22043803629,07:40:30,07:40:30,30069,23,"UIC",0,100813

Stops 30085 ("Clinton-Blue")
and 30069 ("UIC-Halsted") are connected
via stop head sign "UIC".

In a dictionary D we store D[(30085,30069)] = "UIC".


ctaconnections.py

The initialization and start of the loop:

FILENAME = 'CTA/stop_times.txt'
print('opening', FILENAME, '...')
DATAFILE = open(FILENAME, 'r')
COUNT = 0
PREV_STOP = -1
PREV_HEAD = ''
D = {}
while True:
    LINE = DATAFILE.readline()
    if LINE == '':
        break
    L = LINE.split(',')


ctaconnections.py

Updating the dictionary D with L:

    try:
        (STOP, HEAD) = (int(L[3]), L[5])
        if PREV_STOP == -1:
            # first data line: remember both the stop and the head sign
            (PREV_STOP, PREV_HEAD) = (STOP, HEAD)
        else:
            if PREV_HEAD == HEAD:
                D[(PREV_STOP, STOP)] = HEAD
            else:
                (PREV_STOP, PREV_HEAD) = (STOP, HEAD)
    except (ValueError, IndexError):
        print('skipping line', COUNT)
    COUNT = COUNT + 1
print(D, len(D))
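As written, the loop changes PREV_STOP only when the head sign changes, so a key pairs the first stop of a run of equal head signs with a later stop of that run. If only direct hops between neighboring stops are wanted, one possible variant compares trip ids and moves the previous stop forward on every line. This is a sketch under that assumption about the goal, not the script of the lecture; the name consecutive_connections and the last two sample rows are made up.

```python
import csv
import io

# Sample rows in the layout of stop_times.txt. The first two are the lines
# quoted above; the third row and the second trip are made up.
SAMPLE = '''22043803629,07:38:30,07:38:30,30085,22,"UIC",0,98455
22043803629,07:40:30,07:40:30,30069,23,"UIC",0,100813
22043803629,07:42:00,07:42:00,30068,24,"UIC",0,103000
22043803630,08:00:00,08:00:00,3021,1,"Kedzie/Van Buren",0,0
'''

def consecutive_connections(handle):
    """
    Returns a dictionary with keys (i, j) for stops i and j that follow
    each other on the same trip; the value is the head sign.
    """
    result = {}
    prev_trip, prev_stop = None, None
    for fields in csv.reader(handle):
        try:
            trip, stop, head = fields[0], int(fields[3]), fields[5]
        except (ValueError, IndexError):
            continue                 # header or malformed line
        if trip == prev_trip:
            result[(prev_stop, stop)] = head
        prev_trip, prev_stop = trip, stop
    return result

D = consecutive_connections(io.StringIO(SAMPLE))
print(D)
```

With this variant the sample yields the keys (30085, 30069) and (30069, 30068), but not (30085, 30068).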


a sparse matrix

There are 11,430 lines in stops.txt.
Except for the first line, every line is a stop.
Viewing each stop as a node in a graph,
there are 11,429 nodes.
The adjacency matrix has 11,429 rows and 11,429 columns, or
130,622,041 elements.
The dictionary stores 583,279 elements, less than 0.5% of the
130,622,041 possible elements.
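The arithmetic behind these counts can be checked in a few lines; the variable names are chosen only for this sketch.

```python
NODES = 11430 - 1            # lines in stops.txt minus the header line
POSSIBLE = NODES**2          # entries of the full adjacency matrix
STORED = 583279              # keys in the dictionary
PERCENT = 100.0*STORED/POSSIBLE
print(POSSIBLE)
print('%.2f%%' % PERCENT)
```

The stored fraction comes out at about 0.45%, which is why a dictionary or a sparse matrix is preferable to a full array.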


connecting the stops

$ python3 ctaconnectstops.py
opening CTA/stop_times.txt ...
loading a big file, be patient ...
skipping line 0
573036 connections
give start stop id : 30085
  give end stop id : 30069
30085 and 30069 are connected by "UIC"


the function stopdict

FILENAME = 'CTA/stop_times.txt'

def stopdict(name):
    """
    Opens the file with given name.
    The file contains scheduled arrival
    and departure times for each stop
    on each trip. On return is a dictionary
    D with keys (i,j) and strings as values,
    where i and j are stop ids and the
    value is the empty string if i and j
    are not connected by a trip, otherwise
    D[(i,j)] contains the trip name.
    """


the function main()

def main():
    """
    Creates a dictionary from the file
    stop_times.txt and prompts the user
    for a start and end stop id.
    The result of the dictionary query
    tells whether the stops are connected.
    """
    conn = stopdict(FILENAME)
    print(len(conn), 'connections')
    i = int(input('give start stop id : '))
    j = int(input('  give end stop id : '))
    outs = str(i) + ' and ' + str(j)
    if (i, j) not in conn:
        print(outs + ' are not connected')
    else:
        print(outs + ' are connected by ' + conn[(i, j)])


sparse matrices

>>> from scipy import sparse

To store an adjacency matrix similar to D[(i,j)]
we use the COOrdinate format:

>>> from numpy import array
>>> from scipy.sparse import coo_matrix
>>> row = array([0,3,1,0])
>>> col = array([0,3,1,2])
>>> data = array([4,5,7,9])
>>> A = coo_matrix((data,(row,col)),shape=(4,4))
>>> A.todense()
matrix([[4, 0, 9, 0],
        [0, 7, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 5]])


SciPy session continued

>>> B = A*A
>>> B.todense()
matrix([[16,  0, 36,  0],
        [ 0, 49,  0,  0],
        [ 0,  0,  0,  0],
        [ 0,  0,  0, 25]])

Property of adjacency matrices A: if $(A^k)_{i,j} \neq 0$,
then nodes i and j are connected by a path of length k.


dictionary of keys sparse matrices

dok_matrix is a dictionary of keys based sparse matrix:

allows for efficient access of individual elements;
can be efficiently converted to a coo_matrix.

>>> from scipy import sparse
>>> A = sparse.dok_matrix((4,4))
>>> A[1,2] = 1
>>> B = sparse.coo_matrix(A)
>>> B.todense()
matrix([[ 0., 0., 0., 0.],
        [ 0., 0., 1., 0.],
        [ 0., 0., 0., 0.],
        [ 0., 0., 0., 0.]])


session continued

>>> B.todense()
matrix([[ 0., 0., 0., 0.],
        [ 0., 0., 1., 0.],
        [ 0., 0., 0., 0.],
        [ 0., 0., 0., 0.]])

>>> B.row
array([1], dtype=int32)
>>> B.col
array([2], dtype=int32)
>>> B.data
array([ 1.])
>>> B.nnz
1

The attributes row, col, data, and nnz respectively return the row indices,
the column indices, the corresponding data, and the number of nonzeros.
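The two formats combine naturally with a dictionary of connections: fill a dok_matrix entry by entry, then convert once to a coo_matrix to read off the coordinates. A minimal sketch, assuming SciPy is installed; the helper name to_coo and the edges are made up.

```python
from scipy import sparse

# A few made-up connections between nodes numbered 0..3.
EDGES = {(0, 1): 1, (1, 2): 1, (3, 0): 1}

def to_coo(edges, dim):
    """Stores the keys of edges in a dok_matrix, returns a coo_matrix."""
    mat = sparse.dok_matrix((dim, dim))
    for (i, j) in edges:
        mat[i, j] = 1
    return sparse.coo_matrix(mat)

B = to_coo(EDGES, 4)
# sort the coordinates, as the order of the stored entries is not relied on
PAIRS = sorted(zip(B.row.tolist(), B.col.tolist()))
print(B.nnz, PAIRS)
```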


a matrix plot


the script spy_matrixplot.py

import numpy as np
from matplotlib.pyplot import spy
import matplotlib.pyplot as plt
from scipy import sparse

r = 0.1 # ratio of nonzeroes
n = 100 # dimension of the matrix
A = np.random.rand(n,n)
A = np.matrix(A < r,int)
S = sparse.coo_matrix(A)
x = S.row; y = S.col
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x,y,'.')
plt.show()
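The plot puts one dot at every (row, column) position of a nonzero entry. As a rough stand-in that needs neither numpy nor matplotlib, the same pattern can be rendered as text. The helper name text_spy is made up; the positions are those of the nonzero entries in the 4-by-4 coo_matrix example, and rows run downward here, as in a printed matrix.

```python
def text_spy(pairs, dim):
    """
    Returns a list of dim strings of length dim with a '*' at every
    (row, column) in pairs and a '.' everywhere else.
    """
    marked = set(pairs)
    return [''.join('*' if (row, col) in marked else '.'
                    for col in range(dim))
            for row in range(dim)]

# positions of the nonzero entries of a small matrix
NONZEROS = [(0, 0), (3, 3), (1, 1), (0, 2)]
for line in text_spy(NONZEROS, 4):
    print(line)
```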


the matrix plot for the CTA


the script ctamatrixplot.py

# L-40 MCS 275 Wed 20 Apr 2016 : ctamatrixplot.py

# This script creates a sparse matrix A,
# which is the adjacency matrix of the stops:
# A[i,j] = 1 if stops i and j are connected.

from scipy import sparse
import matplotlib.pyplot as plt

filename = 'CTA/stop_times.txt'
print('opening', filename, '...')
file = open(filename,'r')

n = 12165
A = sparse.dok_matrix((n,n))


the script continued

i = 0; prev_id = -1; prev_hd = ''
while True:
    d = file.readline()
    if d == '': break
    L = d.split(',')
    try:
        id = int(L[3]); hd = L[5]
        if prev_id == -1:
            (prev_id, prev_hd) = (id, hd)
        else:
            if prev_hd == hd:
                A[prev_id, id] = 1
            else:
                (prev_id, prev_hd) = (id, hd)
    except (ValueError, IndexError):
        # the header line is skipped, and so is any stop id >= n,
        # such as 30085, which is out of range for A
        pass # print('skipping line', i)
    i = i + 1


making the plot

B = sparse.coo_matrix(A)
x = B.row; y = B.col
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xlim(-1,n)
ax.set_ylim(-1,n)
ax.plot(x,y,'b.')
plt.show()


adjacency matrix

An adjacency matrix A is a matrix of zeroes and ones:

A[row][column] = 1: row and column are connected,
A[row][column] = 0: row and column are not connected.

For example:

1 0 1 0 0
0 1 1 0 1
0 0 0 0 0
1 0 1 0 1
0 1 1 1 0
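Reading the matrix row by row gives, for every node, the list of nodes it connects to. A small sketch with the example stored as a list of rows; the helper name neighbors is made up, and nodes are numbered from 0.

```python
# the example matrix, as a list of rows
A = [[1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1],
     [0, 0, 0, 0, 0],
     [1, 0, 1, 0, 1],
     [0, 1, 1, 1, 0]]

def neighbors(mat, row):
    """Returns the columns connected to row in the matrix mat."""
    return [col for (col, bit) in enumerate(mat[row]) if bit == 1]

for node in range(len(A)):
    print(node, '->', neighbors(A, node))
```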


a random adjacency matrix

from random import randint

def random_adjacencies(dim):
    """
    Returns D, a dictionary of dictionaries to
    represent a square matrix of dimension dim.
    D[row][column] is a random bit.
    """
    result = {}
    for row in range(dim):
        result[row] = {}
        for column in range(dim):
            result[row][column] = randint(0, 1)
    return result
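The function random_adjacencies stores every entry, zeros included. For a sparse matrix such as the CTA connections, a dictionary of dictionaries could instead hold only the entries that are present. The sketch below converts a dictionary with keys (i, j) into that nested form; the names to_nested and entry are made up, and so is the second key of the sample.

```python
def to_nested(pairs):
    """
    Turns a dictionary with keys (i, j) into a dictionary of
    dictionaries, so that result[i][j] == pairs[(i, j)].
    """
    result = {}
    for ((i, j), value) in pairs.items():
        result.setdefault(i, {})[j] = value
    return result

def entry(mat, row, column):
    """Returns mat[row][column], or 0 if the entry is not stored."""
    return mat.get(row, {}).get(column, 0)

# the first key is from the lecture, the second one is made up
D = {(30085, 30069): '"UIC"', (30069, 30068): '"UIC"'}
M = to_nested(D)
print(M)
print(entry(M, 30085, 30069), entry(M, 30085, 3021))
```

Looking up M[i] then gives all stops reachable from stop i without scanning every key of D.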


writing the matrix

def write(dim, mat):
    """
    Writes the square matrix of dimension dim
    represented by the dictionary mat.
    """
    for row in range(dim):
        for column in range(dim):
            print(' %d' % mat[row][column], end='')
        print('')


searching the adjacency matrix

Consider again the example:

1 0 1 0 0
0 1 1 0 1
0 0 0 0 0
1 0 1 0 1
0 1 1 1 0

Observe:
There is no direct path from 1 to 3.
We can go from 1 to 4 and from 4 to 3.


matrix-matrix multiplication

>>> import numpy as np
>>> A = np.matrix([[1, 0, 1, 0, 0],
...                [0, 1, 1, 0, 1],
...                [0, 0, 0, 0, 0],
...                [1, 0, 1, 0, 1],
...                [0, 1, 1, 1, 0]])
>>> A*A
matrix([[1, 0, 1, 0, 0],
        [0, 2, 2, 1, 1],
        [0, 0, 0, 0, 0],
        [1, 1, 2, 1, 0],
        [1, 1, 2, 0, 2]])

>>> _[1, 3]
1

$(A^2)_{i,j} = 1$: there is a path from i to j with one intermediate stop.
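The same check extends to longer paths by taking higher powers of the matrix. The sketch below uses a plain NumPy array with np.linalg.matrix_power instead of np.matrix, a substitution made here because the NumPy documentation no longer recommends the matrix class; the helper name walks is made up.

```python
import numpy as np

A = np.array([[1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1],
              [0, 0, 0, 0, 0],
              [1, 0, 1, 0, 1],
              [0, 1, 1, 1, 0]])

def walks(mat, length):
    """Entry [i, j] counts the walks of the given length from i to j."""
    return np.linalg.matrix_power(mat, length)

A2 = walks(A, 2)
# no direct link from 1 to 3, but one walk of length 2 (via node 4)
print(A[1, 3], A2[1, 3])
```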


the main program

def main():
    """
    Prompts the user for the dimension
    and shows a random adjacency matrix.
    """
    dim = int(input('Give the dimension : '))
    mtx = random_adjacencies(dim)
    write(dim, mtx)
    src = int(input('Give the source : '))
    dst = int(input('Give the destination : '))
    mxt = int(input('Give the maximum number of steps : '))
    pth = search(dim, mtx, dst, 0, mxt, [src])
    print('the path :', pth)


the specification and base case

def search(dim, mat, destination, level, maxsteps, \
        accu):
    """
    Searches the matrix mat of dimension dim
    for a path between source and destination with
    no more than maxsteps intermediate stops.
    The path is accumulated in accu,
    initialized with source.
    """
    source = accu[-1]
    if mat[source][destination] == 1:
        return accu + [destination]
    else:
        ...


the rest of the definition

        if level < maxsteps:
            for k in range(dim):
                if k not in accu:
                    if mat[source][k] == 1:
                        path = search(dim, mat, destination, \
                            level+1, maxsteps, accu + [k])
                        if path[-1] == destination:
                            return path
        return accu
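The recursive search is depth first: it returns the first path it finds within the step limit, which need not be the shortest. A breadth-first search is a different technique that visits nodes in order of distance and so returns a path with the fewest steps. The following is a sketch of that alternative on the example matrix, not the function of the lecture; the name shortest_path is made up.

```python
from collections import deque

A = [[1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1],
     [0, 0, 0, 0, 0],
     [1, 0, 1, 0, 1],
     [0, 1, 1, 1, 0]]

def shortest_path(mat, source, destination):
    """
    Breadth-first search in the adjacency matrix mat.
    Returns a list of nodes from source to destination with the
    fewest steps, or None if the destination cannot be reached.
    """
    previous = {source: None}      # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for k in range(len(mat)):
            if mat[node][k] == 1 and k not in previous:
                previous[k] = node
                queue.append(k)
    return None

print(shortest_path(A, 1, 3))
print(shortest_path(A, 2, 0))
```

On the example, the path from 1 to 3 goes through node 4, and node 2 has no outgoing connections, so nothing is reachable from it.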


Summary + Exercises

Dictionaries are good to process data on file.

1 Modify ctastopname.py so the user is prompted for a string
  instead of a number. The modified script prints all ids and
  corresponding names that have the given string as substring.

2 Instead of using numpy and scipy,
  use turtle to draw the spy plot of a matrix.

3 Instead of using numpy and scipy,
  use the canvas widget of tkinter to draw the spy plot of a matrix.

4 Apply the search to work on the adjacency matrix of the data
  obtained for the CTA.
