Application of tries


Why Trie Data Structure?

• Search trees in general favor keys of fixed size, since this leads to efficient storage management.

• However, for retrieval-based applications that call for keys of varying length, tries provide a better option.

• Tries are also called lexicographic search trees.

• The name trie (pronounced “try”) originated from the word “retrieval”.

Definition For Trie:

• A trie of order m may be empty.

• If not empty, then it consists of an ordered sequence

of exactly m tries of order m.

• The branching at any level of the trie is determined by only a portion of the key (for example, one character), not by the whole key.

• Alphabetic keys require a trie of order 27 (26 letters of the alphabet + a blank) for their storage and retrieval.

Representation Of Trie

• A trie has two categories of node structure:

▫ Branch node

▫ Information node

• A branch node is merely a collection of LINK

fields each pointing either to a branch node or to

an information node.

• An information node holds a key that is stored in the trie.

Operations In Trie

• The three operations on the trie data structure are:

▫ Searching

▫ Insertion

▫ Deletion

Example

• Construct a trie for the keys 001, 100, 111, 011, 010.

• STEP 1: Insert(001, 100)

root
  0 -> 001
  1 -> 100

Example

• STEP 2: Insert(111)

root
  0 -> 001
  1
    0 -> 100
    1 -> 111

Example

• STEP 3: Insert(011)

root
  0
    0 -> 001
    1 -> 011
  1
    0 -> 100
    1 -> 111

Example

• STEP 4: Insert(010)

root
  0
    0 -> 001
    1
      0 -> 010
      1 -> 011
  1
    0 -> 100
    1 -> 111

INSERTION

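• To insert a key K into the trie, we begin as we would to search for K, moving down the trie as far as the key leads.

• At the point where the LINK field of a branch node leads to NIL, the key K is inserted as an information node.

• If the LINK field instead leads to an information node holding a different key, that node is replaced by a new branch node and both keys are placed below it, as in steps 2 to 4 of the example.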

Insertion

• In the trie constructed above, insert the key 101.

root
  0
    0 -> 001
    1
      0 -> 010
      1 -> 011
  1
    0
      0 -> 100
      1 -> 101
    1 -> 111
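The insertion steps can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the `Branch` class, the `insert` and `search` functions, and the use of a dictionary for the LINK fields are all assumptions made for the example. It also assumes keys of equal length, so that no key is a prefix of another.

```python
class Branch:
    """Branch node: LINK fields keyed by symbol. A missing entry plays the role of NIL."""

    def __init__(self):
        self.links = {}  # symbol -> Branch, or a str acting as an information node


def search(root, key):
    """Follow one symbol of the key per level until we leave the branch nodes."""
    node, depth = root, 0
    while isinstance(node, Branch):
        if depth >= len(key):
            return False
        node = node.links.get(key[depth])
        depth += 1
    return node == key  # node is now an information node (str) or None


def insert(root, key):
    """Insert key, assuming all keys have the same length (hypothetical restriction)."""
    node, depth = root, 0
    while True:
        symbol = key[depth]
        child = node.links.get(symbol)
        if child is None:
            # LINK leads to NIL: store the key as an information node.
            node.links[symbol] = key
            return
        if isinstance(child, Branch):
            node, depth = child, depth + 1
            continue
        if child == key:
            return  # key already present
        # LINK leads to another key: put a new branch node in its place,
        # hang the old key below it, and keep descending with the new key.
        split = Branch()
        split.links[child[depth + 1]] = child
        node.links[symbol] = split
        node, depth = split, depth + 1


root = Branch()
for k in ["001", "100", "111", "011", "010", "101"]:
    insert(root, k)

print(search(root, "101"), search(root, "110"))  # prints: True False
```

After the loop, `root` has the same shape as the trie drawn above, with 100 and 101 sharing a branch node on the third level.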

Deletion

• The deletion of a key K from a trie proceeds as one would search for the key.

• On reaching the information node (node l) holding K, that node is deleted.

▫ We must then check whether the branch node to which node l was linked still accommodates other nodes.

▫ If the branch node still holds more than one information node, at least one LINK field to another branch node, or both, the deletion is done.

▫ If the deletion leaves the branch node with just one key, we delete the branch node and push that key's information node to a higher level.

▫ If, at the higher level, that node is again the only non-empty entry, we once again delete the branch node and push the node to a higher level.

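Deletion

• Delete 010 from the trie:

root
  0
    0 -> 001
    1
      0 -> 010
      1 -> 011
  1
    0
      0 -> 100
      1 -> 101
    1 -> 111

• Removing 010 leaves its branch node with only the key 011, so that branch node is deleted and 011 is pushed up one level:

root
  0
    0 -> 001
    1 -> 011
  1
    0
      0 -> 100
      1 -> 101
    1 -> 111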
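The deletion rule, including the step that pushes a lone key to a higher level, might look like this in Python. It is a sketch under assumptions: `Branch` and `delete` are hypothetical names, the LINK fields are stored in a dictionary, and the trie is built by hand to match the example before 010 is removed.

```python
class Branch:
    """Branch node: LINK fields keyed by symbol; a str child is an information node."""

    def __init__(self, links=None):
        self.links = dict(links or {})


def delete(node, key, depth=0):
    """Remove key below `node` and return what the parent's LINK should now point to."""
    if depth >= len(key):
        return node
    symbol = key[depth]
    child = node.links.get(symbol)
    if isinstance(child, Branch):
        node.links[symbol] = delete(child, key, depth + 1)
    elif child == key:
        del node.links[symbol]
    # If a non-root branch node is left with a single information node,
    # drop the branch node and push that information node up one level.
    if depth > 0 and len(node.links) == 1:
        (only,) = node.links.values()
        if not isinstance(only, Branch):
            return only
    return node


# The example trie before deletion, built by hand.
root = Branch({
    "0": Branch({"0": "001", "1": Branch({"0": "010", "1": "011"})}),
    "1": Branch({"0": Branch({"0": "100", "1": "101"}), "1": "111"}),
})

delete(root, "010")
print(root.links["0"].links)  # prints: {'0': '001', '1': '011'}
```

The root is never collapsed in this sketch (`depth > 0`), so an almost empty trie still starts with a branch node.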

Performance Of trie

• The performance of search trees is determined by the number of keys that form the tree.

• The complexities of the search, delete and insert operations are O(h), where the height h depends on the number of keys represented in the search tree.

• In contrast, the performance of a trie depends on the length of the key (the number of characters forming the key) rather than on the number of keys.

APPLICATIONS OF TRIE DATA STRUCTURES

TRIES IN AUTO COMPLETE

• A trie is a tree-like data structure in which each node contains an array of pointers, one pointer for each character in the alphabet.

• Starting at the root node, we can trace a word by

following pointers corresponding to the letters in the

target word.

• Starting from the root node, you can check if a word

exists in the trie easily by following pointers

corresponding to the letters in the target word.
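A minimal Python sketch of this pointer-following search is shown here. It assumes lowercase words over a to z, so each node carries an array of 26 pointers; the names `TrieNode`, `Trie`, `insert` and `contains` are illustrative, and the `is_word` flag is an added assumption for marking where a stored word ends.

```python
class TrieNode:
    def __init__(self):
        self.children = [None] * 26  # one pointer per letter a..z
        self.is_word = False         # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    @staticmethod
    def _index(ch):
        return ord(ch) - ord("a")  # assumes lowercase a..z only

    def insert(self, word):
        node = self.root
        for ch in word:
            i = self._index(ch)
            if node.children[i] is None:
                node.children[i] = TrieNode()
            node = node.children[i]
        node.is_word = True

    def contains(self, word):
        node = self.root
        for ch in word:
            node = node.children[self._index(ch)]
            if node is None:
                return False  # a pointer on the path is missing
        return node.is_word


trie = Trie()
for w in ["trie", "tree", "try"]:
    trie.insert(w)

print(trie.contains("tree"), trie.contains("tr"))  # prints: True False
```

The lookup touches one node per character, so its cost depends on the length of the word and not on how many words are stored.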

AUTO COMPLETE

•Auto-complete functionality is used widely

over the internet and mobile apps. A lot of

websites and apps try to complete your

input as soon as you start typing.

•All the descendants of a node have a

common prefix of the string associated with

that node.

AUTO COMPLETE IN GOOGLE SEARCH

WHY TRIES IN AUTO COMPLETE

• Implementing auto complete using a trie is

easy.

• We simply trace pointers to get to the node that represents the string the user entered. By exploring the trie from that node down, we can enumerate all strings that complete the user's input.
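One way to sketch this in Python is shown here, using nested dictionaries as trie nodes instead of pointer arrays. The function names `build_trie` and `complete`, the `END` marker, and the word list are assumptions made for the illustration.

```python
END = object()  # marker key: a stored word ends at this node


def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def complete(root, prefix):
    """Return every stored word that starts with prefix, in alphabetical order."""
    node = root
    for ch in prefix:  # trace pointers down to the node for the prefix
        if ch not in node:
            return []
        node = node[ch]

    results = []

    def walk(current, suffix):
        if END in current:
            results.append(prefix + suffix)
        for ch in sorted(k for k in current if k is not END):
            walk(current[ch], suffix + ch)

    walk(node, "")  # explore the subtrie below the prefix node
    return results


trie = build_trie(["tree", "trie", "try", "trip", "algo"])
print(complete(trie, "tr"))  # prints: ['tree', 'trie', 'trip', 'try']
```

A real auto-complete service would usually rank the completions (for example by popularity) rather than list them alphabetically; that ranking is outside this sketch.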

CRIMINOLOGY

• Suppose that you are at the scene of a crime and

observe the first few characters CRX on the registration

plate of the getaway car. If we have a trie of

registration numbers, we can use the characters CRX

to reach a subtrie that contains all registration

numbers that begin with CRX. The elements in this

subtrie can then be examined to see which cars satisfy

other properties that might have been observed.

AUTOMATIC COMMAND COMPLETION

• When using an operating system such as Unix

or DOS, we type in system commands to

accomplish certain tasks. For example, the

Unix and DOS command cd may be used to

change the current directory.

Commands that have the prefix “ps”

ps2ascii  ps2pdf  psbook     psmandup  psselect
ps2epsi   ps2pk   pscal      psmerge   pstopnm
ps2frag   ps2ps   psidtopgm  psnup     pstops
ps2gif    psbb    pslatex    psresize  pstruct

Figure 10 Commands that begin with "ps"

We can simplify the task of typing in commands by providing a command completion facility, which automatically types in the command suffix once the user has typed a prefix long enough to identify the command uniquely. For instance, once the letters psi have been entered, we know that the command must be psidtopgm, because it is the only command with the prefix psi. In this case, instead of typing a 9-character command name, the user needs to type only its first 3 characters.
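The unique-prefix rule can be sketched in Python as follows. The per-node `count` field, the class and function names, and the choice to return `None` for an ambiguous or unknown prefix are assumptions of this example; the command list is the one from Figure 10, assumed to contain no duplicates.

```python
COMMANDS = [
    "ps2ascii", "ps2pdf", "psbook", "psmandup", "psselect",
    "ps2epsi", "ps2pk", "pscal", "psmerge", "pstopnm",
    "ps2frag", "ps2ps", "psidtopgm", "psnup", "pstops",
    "ps2gif", "psbb", "pslatex", "psresize", "pstruct",
]


class Node:
    def __init__(self):
        self.children = {}
        self.count = 0       # number of commands whose path passes through this node
        self.is_end = False  # True if a command ends here


def build(commands):
    root = Node()
    for cmd in commands:
        node = root
        node.count += 1
        for ch in cmd:
            node = node.children.setdefault(ch, Node())
            node.count += 1
        node.is_end = True
    return root


def complete(root, prefix):
    """Return the full command if prefix identifies exactly one, else None."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return None  # no command starts with this prefix
    if node.count != 1:
        return None      # prefix is still ambiguous
    completed = prefix
    while not node.is_end:
        # Exactly one command passes through here, so there is exactly one child.
        ch, node = next(iter(node.children.items()))
        completed += ch
    return completed


root = build(COMMANDS)
print(complete(root, "psi"))  # prints: psidtopgm
print(complete(root, "ps"))   # prints: None
```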

LONGEST PREFIX MATCHING

• Longest prefix match (also called maximum prefix length match) refers to an algorithm used by routers in Internet Protocol (IP) networking to select an entry from a routing table.

• Because each entry in a routing table may specify a network, one destination address may match more than one routing table entry. The most specific table entry (the one with the longest subnet mask) is called the longest prefix match. It is called this because it is also the entry where the largest number of leading address bits in the table entry match those of the destination address.

LONGEST PREFIX MATCHING

For example, consider this IPv4 routing table (CIDR notation

is used):

192.168.20.16/28

192.168.0.0/16

When the address 192.168.20.19 needs to be looked up, both entries in the routing table "match". That is, both entries contain the looked-up address. In this case, the longest prefix of the candidate routes is 192.168.20.16/28, since its subnet mask (/28) is longer than the other entry's mask (/16), making the route more specific.
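A binary trie over address bits is one common way to implement this lookup. The Python sketch below is an illustration under assumptions: it handles IPv4 only, the names `BitNode`, `add_route` and `lookup` are made up for the example, and real routers use more compact structures. It relies on the standard `ipaddress` module to parse the CIDR entries.

```python
import ipaddress


class BitNode:
    def __init__(self):
        self.children = [None, None]  # child for bit 0 and child for bit 1
        self.route = None             # routing entry that ends at this node, if any


def add_route(root, cidr):
    net = ipaddress.ip_network(cidr)
    # Keep only the first `prefixlen` bits of the 32-bit network address.
    bits = format(int(net.network_address), "032b")[: net.prefixlen]
    node = root
    for b in bits:
        i = int(b)
        if node.children[i] is None:
            node.children[i] = BitNode()
        node = node.children[i]
    node.route = cidr


def lookup(root, address):
    """Walk the address bits and remember the deepest route seen on the way."""
    bits = format(int(ipaddress.ip_address(address)), "032b")
    node, best = root, root.route
    for b in bits:
        node = node.children[int(b)]
        if node is None:
            break
        if node.route is not None:
            best = node.route  # a longer matching prefix
    return best


table = BitNode()
add_route(table, "192.168.20.16/28")
add_route(table, "192.168.0.0/16")

print(lookup(table, "192.168.20.19"))  # prints: 192.168.20.16/28
print(lookup(table, "192.168.1.1"))    # prints: 192.168.0.0/16
```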

• A network browser keeps a history of the URLs of

sites that you have visited. By organizing this

history as a trie, the user need only type the prefix

of a previously used URL and the browser can

complete the URL.

SPELL CHECKERS

• Spell checkers are ubiquitous. Word

processors have spell checkers, as

do browser-based e-mail clients.

They all work the same way: a

dictionary is stored in some data

structure, then each word of input

is submitted to a search in the data

structure, and those that fail are

flagged as spelling errors.

SPELL CHECKERS

•There are many appropriate data

structures to store the word list, including

a sorted array accessed via binary search,

a hash table, or a bloom filter. In this

exercise you are challenged to store the

word list character-by-character in a trie.

Spell Check

[Figure: trie storing the words page, peg, pest, pig, pug]

root: a b c … p … z
  p
    a -> page
    e
      g -> peg
      s -> pest
    i -> pig
    u -> pug

PHONE BOOK SEARCH..

• Tries are often used to search for a contact in a phone book.

• Prefix matching: the query a matches a*, that is, every contact whose name begins with a.

Example

Contacts in phone book: alberto, ram, sankar, star, stella

root: a b c … r s … z
  a -> alberto
  r -> ram
  s
    a -> sankar
    t
      a -> star
      e -> stella

PHONE BOOK SEARCH..

• Suffix matching: the query an matches *an*, that is, every contact whose name contains an.

• A trie can be used to index all suffixes of a text in order to carry out fast full-text searches.
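Indexing every suffix of every contact name lets a query match anywhere inside a name. The Python sketch below is a simple, memory-hungry illustration: it stores a set of names at each node, and `build_suffix_trie` and `find` are hypothetical names. The contact list is the one from the phone book example.

```python
def new_node():
    return {"names": set(), "next": {}}


def build_suffix_trie(names):
    root = new_node()
    for name in names:
        for start in range(len(name)):  # insert every suffix of the name
            node = root
            for ch in name[start:]:
                if ch not in node["next"]:
                    node["next"][ch] = new_node()
                node = node["next"][ch]
                node["names"].add(name)  # this name contains the path spelled so far
    return root


def find(root, pattern):
    """Return the names that contain pattern anywhere, in alphabetical order."""
    node = root
    for ch in pattern:
        node = node["next"].get(ch)
        if node is None:
            return []
    return sorted(node["names"])


contacts = build_suffix_trie(["alberto", "ram", "sankar", "star", "stella"])
print(find(contacts, "an"))  # prints: ['sankar']
print(find(contacts, "ar"))  # prints: ['sankar', 'star']
```

In this sketch an empty pattern returns an empty list, because no names are recorded at the root.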

TRIES IN T9

• T9 is a technology used on many mobile phones to make

typing text messages easier.

• The idea is simple - each number of the phone's keypad

corresponds to 3-4 letters of the alphabet.

• Many phones will notice when you type in a word that is

not in its dictionary, and will add that word. Others keep

track of the frequency of certain words and favor those

words over other words that have the same sequence of

keypresses.

http://en.wikipedia.org/wiki/T9_(predictive_text)

TRIES IN T9

• How does a T9 dictionary work?

• It can be implemented in several ways, one of which is a trie. The route through the trie is given by the digits typed, and each node points to a collection of words.

• T9 works by filtering the possibilities down sequentially, starting with the first possible letters.

http://en.wikipedia.org/wiki/Trie

TRIES IN T9

• It can be implemented using nested hash tables as well: the key of each hash table is a letter, and on every digit the algorithm calculates all possible routes (O(3^n) routes).

• For example, if we type '4663' we get 'good'; when we press the down button we get 'gone', then 'home', and so on.
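The digit-keyed trie can be sketched in Python as follows. The names `T9Node`, `add_word` and `candidates` are assumptions for the illustration, and candidates are offered simply in the order the words were added; a real T9 dictionary would typically order them by frequency of use.

```python
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYPAD.items() for ch in letters}


class T9Node:
    def __init__(self):
        self.children = {}  # digit -> T9Node
        self.words = []     # words spelled by the digit sequence leading here


def add_word(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(LETTER_TO_DIGIT[ch], T9Node())
    node.words.append(word)


def candidates(root, digits):
    """Return the words stored under the given key presses."""
    node = root
    for d in digits:
        node = node.children.get(d)
        if node is None:
            return []
    return node.words


dictionary = T9Node()
for w in ["good", "gone", "home", "hello"]:
    add_word(dictionary, w)

print(candidates(dictionary, "4663"))  # prints: ['good', 'gone', 'home']
```

Because the trie is keyed by digits, each key press follows a single pointer, so there is no need to enumerate the letter combinations behind the digits.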

THANK YOU
