cs3102: Theory of Computation
Class 10:
DFAs in Practice
Spring 2010
University of Virginia
David Evans
Menu
• Today:
– Preparing for Exam 1
– Language class for Deterministic PDAs
– Applications of DFAs
• Thursday:
– Exam Review (if you send questions and/or topics)
– Applications of probabilistic DFAs and Grammars
Exam 1
• In class, next Tuesday, 2 March
• Covers:
Classes 1-9
(10 and 11)
Sipser Ch 0-2
Problem Sets 1-3 + Comments
Exam 1
Note: unlike nearly all other sets we draw in this class, all of these sets are
finite, and the size (roughly) represents the relative size.
What’s on the Exam?
Definitions
Language, problem, sets
Constructing and understanding computing models
Finite automata (DFA, NFA)
Pushdown automata (DPDA, NPDA)
Grammars (Context-Free Grammar)
Language Classes: Regular and Context Free
Show a language is in the class
Show a language is not in the class
Prove or disprove a closure property
Proof Methods
Proof by Induction
Proof by Construction
Understand and use the pumping lemmas for RL and CFL
Sample exam on website
should give you a good
idea what to expect
Your exam will probably also have “what’s
wrong with this proof” questions
Exam 1 Notesheet
For Exam 1, you may use only:
– Your own brain and body
– A low-tech writing instrument (pen or pencil)
– A single page (both sides) of notes that you create
You may work with others to create your notes page.
Admiral Grace Hopper
John von Neumann
Albert Einstein
Exam Help Available
• Office Hours:
– Thursdays, 8:30-9:30am
– Thursdays, after class
– Fridays, 10-11:30am (Sonali in Stacks)
– Mondays, 1:15-3pm
• TA’s Exam Review Session
– This Sunday, 5-6:30pm, Olsson 228E
All Languages
Regular
Languages
(DFA, NFA, RE, RG)
Finite
Languages
Context-Free
(CFG or NPDA)
w
a^n
a^n b^n c^n
ww
Where are the languages recognized by a Deterministic PDA?
Proving Set Equivalence
A = B ⇔ A ⊆ B and B ⊆ A
Sets A and B are equivalent if A is a subset
of B and B is a subset of A.
Proving Formalism Equivalence
Proving Formalism Non-Equivalence
All Languages
Regular
Languages
(DFA, NFA, RE, RG)
Context-Free
(CFG or NPDA)
Which of these could be true?
a^n b^n
Regular Languages (DFA, NFA, RE, RG)
Context-Free (NPDA)
DPDA
Regular Languages (DFA, NFA, RE, RG)
Context-Free (NPDA)
DPDA
How can we distinguish these two plausible possibilities?
Find some language A that can
be recognized by some NPDA
but not by any DPDA.
A
Prove by construction: for any
NPDA, there is a DPDA that
recognizes the same language.
ε, ε→$
a, ε→+
b, +→ε
ε, $ → ε
b, +→ε
b, ε→ε
ε, $ → ε
Proof by contradiction:
Assume there is a DPDA
that recognizes A. Show
how to construct an NPDA
that recognizes some
language we know is not
context free.
Proved by construction:
We showed an NPDA that
recognizes A.
Proof by contradiction. Suppose there is a DPDA M that recognizes A.
It must be in an accept state only after processing a^i b^i and a^i b^2i.
…
2i transitions, consuming a^i b^i
…
i transitions, consuming b^i
Construct M’: copy all the states on the second half, replacing b with c:
… …
What is the language of M’?
Not a Context-Free
Language!
We have a contradiction: if A is in L(DPDA), we could use the DPDA that
recognizes A to construct a DPDA that recognizes a non-context-free
language! Hence, A must not be in L(DPDA).
All Languages
Regular
Languages
(DFA, NFA, RE, RG)
Context-Free
(CFG or NPDA)
a^n b^n
A
Deterministic Context-Free Languages
Recognized by a DPDA (or DCFG)
Context-Free Languages
Deterministic Context-Free Languages
Regular Languages
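To make "recognized by a DPDA" concrete, here is a minimal sketch of a deterministic stack machine for a^n b^n (n ≥ 0). The function name and the two state labels are illustrative assumptions; the stack symbols $ and + follow the PDA transitions shown earlier. Each input symbol determines exactly one move, so no guessing is needed.

```python
def dpda_accepts_anbn(s):
    """Deterministically decide whether s is in { a^n b^n : n >= 0 }."""
    stack = ["$"]            # $ marks the bottom of the stack
    state = "reading_a"      # switches to "reading_b" after the first b
    for ch in s:
        if state == "reading_a" and ch == "a":
            stack.append("+")            # a, ε → +
        elif ch == "b" and stack[-1] == "+":
            stack.pop()                  # b, + → ε
            state = "reading_b"
        else:
            return False                 # no transition defined: reject
    return stack == ["$"]                # every a was matched by a b
```

For example, dpda_accepts_anbn("aabb") holds, while "aab" and "abab" are rejected.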
DFAs in Practice
Malware
Scanner
W32.Bolzano.Gen:
576a222bd2c20400558b4c240cd9ffff07fbffffff{0-2}5c4e544c445200{0-2}5c57494e4e545c73797374656d33325c6e746f736b726e6c2e65786500{0-29}3b4658
W32.MyLife.E:
7a6172793230*40656d61696c2e636f6d
Note: These are the signatures from ClamAV, an open source virus scanner.
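Signatures like these describe regular languages: fixed bytes, bounded gaps, and unbounded gaps. As a rough sketch, the helper below (a hypothetical name, not part of ClamAV) translates such a signature into a Python bytes regex. It assumes that `*` means "any number of bytes" and `{n-m}` means "between n and m bytes", and it handles only those two wildcard forms. Note that Python's `re` is a backtracking matcher, not a DFA; the point is only to show the regular structure.

```python
import re

def signature_to_regex(sig):
    """Translate a hex signature with * and {n-m} gaps into a bytes regex."""
    parts = []
    # Tokens: a bounded gap {n-m}, an unbounded gap *, or one hex byte.
    for token in re.findall(r"\{\d+-\d+\}|\*|[0-9a-fA-F]{2}", sig):
        if token == "*":
            parts.append(b".*")
        elif token.startswith("{"):
            lo, hi = token[1:-1].split("-")
            parts.append(b".{%d,%d}" % (int(lo), int(hi)))
        else:
            parts.append(re.escape(bytes.fromhex(token)))
    # DOTALL so that "." also matches newline bytes.
    return re.compile(b"".join(parts), re.DOTALL)

mylife = signature_to_regex("7a6172793230*40656d61696c2e636f6d")
found = mylife.search(b"...zary20 anything @email.com...") is not None
```

The W32.MyLife.E signature decodes to the bytes "zary20", any gap, then "@email.com", so `found` is true for the sample input above.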
Files
Network
Traffic
String Matching
q0 q1 q2 q3 q4 q5
t r u t h
We hold these truths to be self-evident, that …
How much work is it to scan a string of length N for a signature?
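One way to see the cost: a matching DFA reads each input character exactly once, so scanning takes N transitions. The sketch below builds such a DFA for a single literal pattern, where state k means "the last k characters read equal the first k characters of the pattern". The function names are illustrative, and the table construction is deliberately naive rather than efficient.

```python
def build_matching_dfa(pattern, alphabet):
    """Return delta, where delta[state][ch] is the next state."""
    m = len(pattern)
    delta = []
    for state in range(m + 1):
        row = {}
        for ch in alphabet:
            # Longest prefix of pattern that is a suffix of what was just read.
            candidate = pattern[:state] + ch
            k = min(m, len(candidate))
            while k > 0 and candidate[-k:] != pattern[:k]:
                k -= 1
            row[ch] = k
        delta.append(row)
    return delta

def dfa_find(text, pattern):
    """Index of the first match of pattern in text, or -1. One transition per character."""
    alphabet = set(text) | set(pattern)
    delta = build_matching_dfa(pattern, alphabet)
    state = 0
    for i, ch in enumerate(text):
        state = delta[state][ch]
        if state == len(pattern):        # reached the accepting state
            return i - len(pattern) + 1
    return -1

text = "We hold these truths to be self-evident, that"
position = dfa_find(text, "truth")
```

Here `position` is the index where "truth" begins in the sample sentence.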
Faster String Matching
q0 q1 q2 q3 q4 q5
t r u t h
We hold these truths to be self-evident, that …
s[4] = h?
s[10] = h?
s[9] = t?
s[8] = u?
Skip table:
a, b, c, d, e, f, g, i, j, k, l, m, n, o, p, q,
s, v, w, x, y, z: 6
h: 0
r: 4
t: 1
u: 2
DFA / Skipping DFA
Is a “Skipping DFA” still a DFA?
(That is, does it still only accept the
Regular Languages?)
J. Strother Moore
(UT Austin)
Boyer-Moore Fast
String Searching
Algorithm (1977)
Best case: N/(w+1) comparisons
where N is the length of the text
and w is the length of the search
string
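The skipping idea can be sketched with Horspool's simplification of Boyer-Moore, which uses only a single skip table. This is a swapped-in variant, not the full 1977 algorithm, and its table convention differs from the skip table shown above: a character that does not occur in the pattern shifts the pattern by w, not w+1. Function names are illustrative.

```python
def build_skip_table(pattern):
    """Shift for each character: distance from its last occurrence
    (ignoring the final position) to the end of the pattern."""
    m = len(pattern)
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

def horspool_find(text, pattern):
    """Return (index of first match or -1, number of character comparisons)."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0, 0
    skip = build_skip_table(pattern)
    comparisons = 0
    pos = 0
    while pos + m <= n:
        j = m - 1
        while j >= 0:                    # compare right to left
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return pos, comparisons
        # Shift by the table entry for the text character under the pattern's
        # last position; characters absent from the table shift by m.
        pos += skip.get(text[pos + m - 1], m)
    return -1, comparisons

text = "We hold these truths to be self-evident, that"
index, comparisons = horspool_find(text, "truth")
```

On the sample sentence the match is found with far fewer comparisons than characters scanned, because most alignments are rejected after one comparison.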
Is this fast enough for a malware scanner?
Virus Detection
Total number of signatures: 720,033
[Chart: Size (MB), 11/01 to 03/06; series: Symantec, RAV AV. Nate Paul’s study]
Can we scan one
input for many
possible malware
signatures quickly?
Combining DFAs?
Regular languages closed under union:
q0
qA0
qB0
qA1
qB1
ε
ε
a
a
…
…
How many states are there now?
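A small sketch of the union construction, assuming each signature is a DFA for one literal string (the helper names and dictionary layout are illustrative). The combined machine has one new start state plus all states of the component machines, so its size grows with the sum of the component sizes. Simulating it means tracking a set of current states.

```python
def literal_dfa(name, word):
    """DFA accepting exactly `word`; a missing transition means reject."""
    states = [f"{name}{i}" for i in range(len(word) + 1)]
    delta = {(states[i], ch): states[i + 1] for i, ch in enumerate(word)}
    return {"states": states, "delta": delta,
            "start": states[0], "accept": {states[-1]}}

def union_nfa(dfas):
    """New start state q0 with an ε-transition to each component's start state."""
    delta = {}
    for d in dfas:
        delta.update(d["delta"])
    return {
        "states": ["q0"] + [s for d in dfas for s in d["states"]],
        "delta": delta,
        "eps": {"q0": [d["start"] for d in dfas]},
        "accept": set().union(*(d["accept"] for d in dfas)),
    }

def accepts(nfa, text):
    current = set(nfa["eps"]["q0"])      # follow the ε-transitions out of q0
    for ch in text:
        current = {nfa["delta"][(s, ch)] for s in current
                   if (s, ch) in nfa["delta"]}
    return bool(current & nfa["accept"])

combined = union_nfa([literal_dfa("qA", "ab"), literal_dfa("qB", "ac")])
state_count = len(combined["states"])    # 1 + 3 + 3
```

With many signatures the set of current states can become large, which is the cost the question above is pointing at.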
Signatures
First byte: Set of signatures:
00000000 ~720000/256
00000001 ~720000/256
00000010 ~720000/256
…
11111111 ~720000/256
Try a Trie
q0
q00
q01
q02
qFF
0x02
…
q0000
q0001
q0002
q01FF
0x02
…
720000/(256*256) ~ 11
Alfred V. Aho and Margaret J. Corasick, 1975
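The arithmetic above: with about 720000 signatures spread evenly, the first byte narrows the set to about 720000/256 ≈ 2800 candidates, and the first two bytes to about 720000/(256*256) ≈ 11. A minimal sketch of the trie idea follows. It is a plain trie restarted at every input position, without the failure links that Aho and Corasick add to avoid rescanning, and the signature names and byte strings are made up for illustration.

```python
END = "name"   # marker key; cannot collide with byte keys, which are ints

def build_trie(signatures):
    """signatures maps a name to its bytes; shared prefixes share trie nodes."""
    root = {}
    for name, sig in signatures.items():
        node = root
        for byte in sig:
            node = node.setdefault(byte, {})
        node[END] = name
    return root

def scan(data, root):
    """Return (start offset, signature name) for every signature found in data."""
    hits = []
    for start in range(len(data)):
        node = root
        for offset in range(start, len(data)):
            node = node.get(data[offset])
            if node is None:
                break                    # no signature continues with this byte
            if END in node:
                hits.append((start, node[END]))
    return hits

# Hypothetical signatures that share the prefix 7a 61.
trie = build_trie({"SigA": bytes.fromhex("7a6172"),
                   "SigB": bytes.fromhex("7a6140")})
hits = scan(b"\x00\x7a\x61\x72\x79", trie)
```

Each input position follows at most one path through the trie, no matter how many signatures are loaded.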
q0000
Alureona
0x02
Scanner Demo
http://www.virustotal.com
Evasive Malware
Metamorphic Code: as virus
propagates, each new copy is
different
How hard is it to automatically
modify code without changing
its behavior?
Detecting Evasive Malware
• Less exact signatures
(e.g., W32.MyLife.E: 7a6172793230*40656d61696c2e636f6d)
– Dangerous: start matching benign programs if you’re not careful!
• Behavioral signatures: match the behavior, not the program text
– Undecidable in general (we’ll see in a few weeks)
– Expensive and difficult in practice (but done by all decent scanners)
Faster String Scanning Charge
• We focus on DFAs, NFAs, PDAs, CFGs, etc. as
abstract models: Number of states, time to
process, etc. don’t matter
• Lots of real applications of these models: but
in practice, what matters is different
If you have topics you want me to review,
post comments (on today’s class
announcement) by 5pm tomorrow.