Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism...

23
Data Structures in Java Session 23 Instructor: Bert Huang http://www.cs.columbia.edu/~bert/courses/3134

Transcript of Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism...

Page 1: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

Data Structures in JavaSession 23

Instructor: Bert Huanghttp://www.cs.columbia.edu/~bert/courses/3134

Page 2: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

Announcements• Homework 6 due Dec. 10, last day of class

• Final exam Thursday, Dec. 17th, 4-7 PM, Hamilton 602 (this room)

• same format as midterm (open book/notes)

• Distinguished Lecture: Barbara Liskov, MIT. Turing Award Winner 2009. 11 AM Monday. Davis Auditorium, CEPSR/Schapiro

Page 3: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

Review• External sorting: merge to alternating disks

• Complexity Classes

• P, NP: Euler path

• NP-Complete, NP-Hard: Hamiltonian path, Satisfiability, Graph Coloring

• NP-Hard: Traveling Salesman

• Undecidable: Halting Problem

Page 4: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

Todayʼs Plan

• Note about hw4

• Finish discussion of complexity

• Polynomial Time Approximation Schemes

• Graph Isomorphism

• k-d trees

Page 5: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

Rehashing

• A hash table does not store the input order

• When rehashing, elements are inserted into the new table in the array order

• No penalty on homework, but make sure itʼs correct on the final

Page 6: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

NP-Hardness• An algorithm for an NP-Hard problem

can be used to solve any NP problem via polynomial time conversion

• But we donʼt know algorithms to solve NP-Hard problems in poly. time

• If we did, P = NP, so most conjecture that NP-Hard problems must be intractable

Page 7: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

Poly. Time Approximation

• Certain optimization NP-Hard problems have polynomial time approximation schemes (PTAS)

• An efficient method to find a solution within a constant of the true optimum

• e.g., Optimal TSP path length = PTAS TSP path length

• For fixed constant, must be poly. time, but can scale poorly w.r.t. constant

• E.g., is a valid PTAS timeO(p(N)(1� !))

�≤ �(1 + �)

Page 8: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

Graph Isomorphism Complexity

• The Graph Isomorphism problem is NP,

• but is unknown if NP-Complete/Hard,

• and no poly. time algorithm is known

NPP NP-Complete NP-Hard

Graph Isomorphism Subgraph Isomorphism? ? ?

Page 9: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

• Given graphs G and H, is there a 1-to-1 mapping of vertices from G to vertices from H that preserves the edge structure?

• Subgraph Isomorphism: is a subgraph of G isomorphic to H?

Y

X

WZ

Graph Isomorphism Definition

AC

D

B

B

C

DA

C

A

DB

X

Page 10: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

Complexity

NP P NP-HardNP-Complete

Satisfiability, Hamiltonian Path,

Subgraph Iso,Graph Coloring,

etc.Graph Iso?

Euler Path,Sorting,

Selection,lots of stuff

Traveling Salesman,Halting Problem

Page 11: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

Advanced Data Structures

• Weʼve mostly studied fundamental data structures

• wide application areas, very general

• In practice, youʼll often have more specific goals, and thus need to design your own data structures

Page 12: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

kd-Trees• Useful data structure for data mining and

machine learning applications

• Store elements by k-dimensional keys

• e.g., age, height, weight

• Retrieve elements by ranges in the k dimensions

• e.g., Searching for a new basketball center 18-24 year olds, 6ʼ6”-7ʼ4”, 200+ lbs

Page 13: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

1-d Range Search• BST recursive search:

(1) if key is in range, print node(2) if key > lower bound, search left(3) if key < upper bound, search right

• O(M+log N) for M items returned

• O(log N) to find nodes in range

• O(1) at each node

8

4

2 6

12

10 14

1 3 5 7 9 11 13 15

Search for 3-7

Page 14: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

height

weight

age

age

kd-Tree Structure• Binary search tree

• each level splits on alternating keys

Page 15: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

Search Algorithm• Given lower and upper bounds for each

dimension

• If key is in range, printIf key > current dimensionʼs lower bound search left childIf key < current dimensionʼs upper bound search right child

• Insert recursion is just like BST

Page 16: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

A

BC

E D

F

G

H

2-d Tree

age

heig

ht

A

BC

D

E

F

GH

Page 17: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

kd-Tree Analysis

• Since each level represents a different keys, balancing is not possible

• If we have all the points, we can build a perfectly balanced tree. How?

• Then worst case O(M + kN1−1/k)

Page 18: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

Square Root vs. Log

0 100 200 300 400 500 600 700 800 900 10000

5

10

15

20

25

30

35

logsqrt

0 100 200 300 400 500 600 700 800 900 10000

200

400

600

800

1000

logsqrtN

Page 19: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

Nearest Neighbor Search

• kd-trees are especially helpful for finding nearest neighbors

• Given a data set, find the nearest point to any element x

• Naive O(N) approach is to compute distances everywhere

• Instead, kd-tree offers O(kN1−1/k)

Page 20: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

Nearest Neighbor Algorithm

• Search for x in the kd-tree until you reach a leaf

• Consider leaf point current-best

• Backtrack along search path, and at each node:

• If current point is better, redefine current-best

• If best can be in the unexplored child*, recurse down the unexplored child

Page 21: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

Algorithm Illustration

age

heig

ht A

D

E

FD

A

B C

E F G

x

x

B

G

C

Page 22: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

kd-Tree Summary• Smart data structure for storing and searching

multi-dimensional keys

• Branch BST cycling dimensions at each level

• Running time is sub-linear, but grows to linear when k is infinite

• Efficient range search, search, and nearest-neighbor search

Page 23: Data Structures in Java - Columbia Universitybert/courses/3134/slides/Lecture23.pdfGraph Isomorphism Complexity •The Graph Isomorphism problem is NP, •but is unknown if NP-Complete/Hard,

Reading

• Complexity - Weiss 9.7

• kd-trees - Weiss 12.6