13
Other Types of Databases: Oracle, M/MUMPS, X.500/LDAP and Search Engines, + Lessons from GM “Architects Day”
MIS 304 Winter 2006
Goals for this class
• Identify tools to help evaluate database products.
• Understand the role of other data management architectures.
• Understand the features of the MUMPS data structure.
• Understand the structure of the X.500/LDAP directory standards.
• Understand the Linear Associative Model.
Database Evaluation
• Requirements, Requirements, Requirements
• Do the evaluation!
• Make it as realistic as possible
• Use outside tools
Transaction Processing Performance Council
• www.tpc.org
M/MUMPS
There are 2 basic ways to organize data
• The tree
• The table
“M” a.k.a. MUMPS
• Massachusetts General Hospital Utility Multi-Programming System
• The ANSI standard version is now called simply “M”.
• “Multidimensional” database.
  – http://www.cache.com
The MUMPS Data Structure
• In traditional programming languages, array subscripts are numbers: SOMETHING(X,Y), SOMETHING(1,2,3,4…), or SALES(1,1,1,1) = 42.
• In MUMPS the subscripts can carry meaning, as in Sales(region,salesman,product,time). For example: TotalSales = Sales(east,Fred,clocks,Q1) + Sales(east,Ed,clocks,Q1)
Note that the indexes on the “array” can be words, not just numbers.
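A rough way to picture the MUMPS array in a mainstream language is a dictionary keyed by tuples of subscripts: only the cells that have been set take up space, and the subscripts can be words. The sketch below is Python, not M syntax; the sales figures and the helper name `cell` are hypothetical.

```python
# A hypothetical sparse, multidimensional "Sales" array in the spirit of an M global.
# Keys are (region, salesman, product, time) tuples; only cells that are set exist.
sales: dict[tuple[str, str, str, str], int] = {}

sales[("east", "Fred", "clocks", "Q1")] = 42
sales[("east", "Ed", "clocks", "Q1")] = 17
sales[("west", "Smith", "lamps", "Q2")] = 5


def cell(region: str, salesman: str, product: str, time: str) -> int:
    """Read one cell; in this sketch a cell that was never set reads as 0."""
    return sales.get((region, salesman, product, time), 0)


total_sales = cell("east", "Fred", "clocks", "Q1") + cell("east", "Ed", "clocks", "Q1")
print(total_sales)  # → 59
```

Nothing is allocated for the many (region, salesman, product, time) combinations that were never written, which is what lets word-valued subscripts work without a fixed array size.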
So What Does the Query Look Like?
TOTALSALES = 0
FOR region = EAST TO WEST
  FOR salesman = Adams TO Smith
    FOR product = 1111 TO 9999
      FOR time = Jan TO Dec
        TOTALSALES = TOTALSALES + SALES(region,salesman,product,time)
      NEXT time
    NEXT product
  NEXT salesman
NEXT region
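Read literally, the loops above visit every combination of subscripts. Because an M array stores only the cells that exist, kept in sorted subscript order, a real query usually walks the stored subscripts instead (in M, with the `$ORDER` function). The Python sketch below imitates that by scanning only the stored keys; the data and the `total_sales` helper are hypothetical, and it does not filter on time because month names do not sort in calendar order.

```python
# Hypothetical sample data: a sparse Sales array keyed by (region, salesman, product, time).
sales = {
    ("EAST", "Adams", 1111, "Jan"): 10,
    ("EAST", "Smith", 2222, "Feb"): 20,
    ("WEST", "Jones", 9999, "Dec"): 30,
    ("WEST", "Zimmer", 5000, "Mar"): 40,  # salesman falls outside Adams..Smith
}


def total_sales(data, regions=("EAST", "WEST"), salesmen=("Adams", "Smith"),
                products=(1111, 9999)):
    """Sum every stored cell whose region, salesman and product fall in the inclusive ranges.

    Only cells that actually exist are visited, in sorted key order, which is
    roughly what walking an M global with $ORDER achieves.
    """
    total = 0
    for key in sorted(data):
        region, salesman, product, _time = key
        if (regions[0] <= region <= regions[1]
                and salesmen[0] <= salesman <= salesmen[1]
                and products[0] <= product <= products[1]):
            total += data[key]
    return total


print(total_sales(sales))  # → 60
```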
Directory Services: A Special Database Case
MIS 304 Fall 2005
Class Goals
• Understand the application of naming to network management.
• Understand the idea of a classification hierarchy.
• Understand Lightweight Directory Access Protocol (LDAP) and its application.
The Case for Directories
• The “Net” has become increasingly complex.
• More need than ever to work across organizational boundaries.
• Wouldn’t it be great if everything had a unique and understandable name?
What’s in a Name
• People
• Buildings
• Computers
• Printers
• Locations
• Objects (computer)
• Roads
• Vehicles
• Rooms
• Stock locations
• Truck wells
• Servers
The Goal
If you can name it and locate it you can manage it.
What’s in a Name
• A name draws a distinction between two things. G. Spencer-Brown, Laws of Form, Dutton, 1979.
• To take advantage of human processing capabilities, names should be “friendly”.
Taxonomy
• The study of the general principles of scientific classification.
• A way to organize anything into hierarchical categories based on characteristics.
• Used widely in Biological Sciences.
Taxonomy Example
[Figure: taxonomy tree from Root to Kingdom Animal and Kingdom Plant, with Phyla (e.g. Vertebrate) below]
Taxonomy Example in Biology
• Kingdom
• Phylum (in animals) or Division (in plants)
• Class
• Order
• Family
• Genus
• Species
Taxonomy Example in IS
[Figure: directory tree from Root by country (C = US, DE, CA), organization (O = Ford, EDS, DaimlerChrysler, BCE Emergis), organizational unit (OU = People, Automotive), surname (S = Morin) and given name (G = Janice, Gary)]
X.500
• Originally part of the Open Systems Interconnection (OSI) network suite.
• Defined the directory structure on an OSI network.
• Modified to run over TCP/IP networks (Internet).
Tags
• C = Country
• O = Organization
• OU = Organizational Unit
• L = Location
• G = Given Name
• S = Surname
Person Identifier
• C = CA
• O = BCE Emergis
• OU = Automotive
• S = Morin
• G = Gary
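To show how the levels combine into one identifier, here is a minimal Python sketch that writes the path as an LDAP-style distinguished name string, most specific level first, with special characters escaped roughly as RFC 4514 describes. It assumes the slide's tags are used as attribute names; real directories usually use standard names such as `cn`, `sn`, `ou`, `o` and `c`. The helper names are hypothetical.

```python
# Characters that RFC 4514 requires to be escaped anywhere in an attribute value.
SPECIAL = set(',+"\\<>;')


def escape_value(value: str) -> str:
    """Escape an attribute value roughly following RFC 4514 string rules."""
    out = []
    for i, ch in enumerate(value):
        if ch in SPECIAL:
            out.append("\\" + ch)
        elif ch == "#" and i == 0:
            out.append("\\#")  # a leading '#' must be escaped
        elif ch == " " and (i == 0 or i == len(value) - 1):
            out.append("\\ ")  # leading or trailing spaces must be escaped
        else:
            out.append(ch)
    return "".join(out)


def to_dn(path: list[tuple[str, str]]) -> str:
    """path runs from the root (country) down to the entry itself.

    A DN string lists the most specific component first, so the path is reversed.
    """
    return ",".join(f"{attr}={escape_value(val)}" for attr, val in reversed(path))


# Assumption: the slide's tags (C, O, OU, S, G) serve as attribute names.
gary = [("C", "CA"), ("O", "BCE Emergis"), ("OU", "Automotive"), ("S", "Morin"), ("G", "Gary")]
print(to_dn(gary))  # → G=Gary,S=Morin,OU=Automotive,O=BCE Emergis,C=CA
```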
Person Identifier
• Because of object “inheritance” each level inherits the attributes of the preceding level.
Database Structure
[Figure: directory tree from Root to O = Ford, O = DaimlerChrysler and O = BCE Emergis, with Assembly and Visteon under Ford and Automotive under BCE Emergis]
• Can be either hierarchical or relational.
• If it’s relational, what’s the key?
O                OU          S
BCE Emergis      Automotive  Morin
DaimlerChrysler  Chrysler    Smith
EDS              US          Cutler
Ford             Assembly    Jones
Ford             Visteon     Morin
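One way to probe the key question is to test whether a set of columns is unique across the rows. This Python sketch uses the five rows of the table; `is_key` is a hypothetical helper, and uniqueness in a small sample cannot prove that a column set will stay unique as the directory grows.

```python
# Rows from the O / OU / S table on this slide.
rows = [
    {"O": "BCE Emergis", "OU": "Automotive", "S": "Morin"},
    {"O": "DaimlerChrysler", "OU": "Chrysler", "S": "Smith"},
    {"O": "EDS", "OU": "US", "S": "Cutler"},
    {"O": "Ford", "OU": "Assembly", "S": "Jones"},
    {"O": "Ford", "OU": "Visteon", "S": "Morin"},
]


def is_key(rows, columns):
    """True if the given columns uniquely identify every row in this data."""
    seen = set()
    for row in rows:
        value = tuple(row[c] for c in columns)
        if value in seen:
            return False
        seen.add(value)
    return True


print(is_key(rows, ["S"]))             # False: two people named Morin
print(is_key(rows, ["O"]))             # False: two Ford rows
print(is_key(rows, ["O", "OU", "S"]))  # True, but only for this sample
```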
Distinguished Name
• A globally unique string of characters.
• Almost every candidate has problems:
  – Mohamed Chang?
  – SSN?
  – An e-mail address?
• You almost always end up with a “messy” key.
Lookup in SQL
• SELECT * FROM DIRECTORY WHERE c = 'US' AND o = 'Ford' AND s = 'Morin'
• Where is DIRECTORY?
• SQL may not be the ideal answer.
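For comparison, here is the same lookup against a single in-memory SQLite table using Python's standard `sqlite3` module. The table layout and its two rows are assumptions drawn from the earlier slides; the query only works because everything already sits in one table in one database, which is exactly what a cross-organization directory cannot assume.

```python
import sqlite3

# A hypothetical DIRECTORY table holding two of the entries from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DIRECTORY (c TEXT, o TEXT, ou TEXT, s TEXT)")
conn.executemany(
    "INSERT INTO DIRECTORY VALUES (?, ?, ?, ?)",
    [("CA", "BCE Emergis", "Automotive", "Morin"),
     ("US", "Ford", "Visteon", "Morin")],
)

# Parameter placeholders avoid hand-quoting the string literals.
rows = conn.execute(
    "SELECT * FROM DIRECTORY WHERE c = ? AND o = ? AND s = ?",
    ("US", "Ford", "Morin"),
).fetchall()
print(rows)  # → [('US', 'Ford', 'Visteon', 'Morin')]
```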
LDAP
• X.500 was getting really messy.
• Most organizations did not need all of the features.
• Some U of M students wrote the Lightweight Directory Access Protocol.
• It defines how to connect to and query an X.500-style database with much less overhead.
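The LDAP counterpart of the SQL lookup is a search filter. This minimal Python sketch only builds the filter string in the RFC 4515 format, escaping the reserved characters; a client would send it to a server along with a search base (the point in the tree where the search starts). The helper names are hypothetical, and `sn` is the standard LDAP name for the slides' S (surname) tag.

```python
def escape_filter_value(value: str) -> str:
    """Escape the characters RFC 4515 reserves in assertion values."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def and_filter(**attrs: str) -> str:
    """AND together equality tests, e.g. (&(c=US)(o=Ford)(sn=Morin))."""
    parts = "".join(f"({name}={escape_filter_value(val)})" for name, val in attrs.items())
    return f"(&{parts})"


print(and_filter(c="US", o="Ford", sn="Morin"))  # → (&(c=US)(o=Ford)(sn=Morin))
```

Escaping matters because `*` in a filter means “any value”: an unescaped surname containing `*` or a parenthesis would change the meaning of the query.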
LDAP Examples
• Microsoft Exchange/Outlook
• Lotus Notes
• Novell NDS
• Netscape browser
• OpenLDAP http://www.openldap.org
• WAX500, MAX500, XAX500
Logical Extensions
• Once you can name something, locate it, and query it, you can extend the idea to any object.
Communities of Interest
• ITU-T Recommendation X.521 describes a “person” object.
• AIAG has a guideline to describe Companies and Locations.
Example 1
• cn = ITM Centerline
• ou = locations
• o = arius.com
• street = 25999 Lawrence Ave
• l = Centerline
• st = Michigan
• c = us
• postalCode = 48015-0303
• buildingNumberOfFloors = 1
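Directory entries like this one are commonly exchanged as LDIF text (RFC 2849). The Python sketch below renders Example 1 that way, base64-encoding any value that is not a plain safe string. It assumes the DN is built from the entry's cn, ou and o values, keeps the slide's buildingNumberOfFloors attribute as is, and omits the objectClass lines a real entry would need; the helper names are hypothetical.

```python
import base64


def ldif_line(attr: str, value: str) -> str:
    """Format one attribute as an LDIF line, base64-encoding values that are not SAFE-STRINGs."""
    unsafe_start = value[:1] in (" ", ":", "<")
    unsafe_chars = any(ch in "\0\r\n" or ord(ch) > 127 for ch in value)
    if unsafe_start or unsafe_chars or value.endswith(" "):
        encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
        return f"{attr}:: {encoded}"
    return f"{attr}: {value}"


def to_ldif(dn: str, attrs: list[tuple[str, str]]) -> str:
    """Build one LDIF record: the dn line first, then one line per attribute."""
    lines = [ldif_line("dn", dn)] + [ldif_line(a, v) for a, v in attrs]
    return "\n".join(lines) + "\n"


# Example 1 from the slide; the DN is assembled from its cn, ou and o values.
entry = to_ldif(
    "cn=ITM Centerline,ou=locations,o=arius.com",
    [("street", "25999 Lawrence Ave"), ("l", "Centerline"), ("st", "Michigan"),
     ("c", "us"), ("postalCode", "48015-0303"), ("buildingNumberOfFloors", "1")],
)
print(entry)
```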
Example 2
• cn = Detroit Medical Center Helipad
• ou = locations
• landingStripType = concrete
• landingStripElevation = 630 ft
• landingStripAirportID = 5MI0
• l = Detroit
• st = Michigan
• street = 420 St. Antoine
• c = us
Naming Objects
• Computer objects are somewhat different from physical things.
• Human readability is less of an issue; lookup speed matters more.
OSI ASN.1
• A notation for describing data structures.
• Uses an Object Identifier (OID) and a short text description to identify levels of the tree.
• If a labeled node is a leaf in a tree then it is an object and contains a value.
Example 1.3.6.1.2.2
[Figure: OID tree. Root → CCITT (0), ISO (1), Joint-ISO-CCITT (2); under ISO: ORG (3) → DOD (6) → Internet (1); under Internet: Directory (1), Management (2), Experimental (3), Private (4); under Management: MIB-I (1), MIB-II (2)]
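The dotted form is just the path of arc numbers from the root, so translating it to names is a walk down the tree. This Python sketch hard-codes a small slice of the registered tree (numbers as in RFC 1155 and RFC 1213, where MIB-II is mgmt 1, giving 1.3.6.1.2.1); the dictionary and the `oid_to_path` helper are hypothetical. `sysDescr` is a leaf, an object that carries a value (the system description).

```python
# A small slice of the registered OID tree, keyed by dotted prefix.
OID_NAMES = {
    "1": "iso",
    "1.3": "org",
    "1.3.6": "dod",
    "1.3.6.1": "internet",
    "1.3.6.1.1": "directory",
    "1.3.6.1.2": "mgmt",
    "1.3.6.1.3": "experimental",
    "1.3.6.1.4": "private",
    "1.3.6.1.2.1": "mib-2",
    "1.3.6.1.2.1.1": "system",
    "1.3.6.1.2.1.1.1": "sysDescr",
}


def oid_to_path(oid: str) -> str:
    """Translate a dotted OID into its symbolic path, one arc at a time."""
    arcs = oid.split(".")
    names = []
    for i in range(1, len(arcs) + 1):
        prefix = ".".join(arcs[:i])
        # Fall back to the bare number for arcs missing from this partial table.
        names.append(OID_NAMES.get(prefix, arcs[i - 1]))
    return ".".join(names)


print(oid_to_path("1.3.6.1.2.1.1.1"))  # → iso.org.dod.internet.mgmt.mib-2.system.sysDescr
```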
So What?
• You can build a cross-company directory.
  – Names are agreed on by a common standards body (AIAG).
  – Common query language (LDAP)
• Each organization keeps its own information current.
• Extensions are easy to add.
Search Engines and The Associative Retrieval Model: A New Kind of Database?
MIS 304 Fall 2004
Goals for this class
• Understand what a linear associative retrieval model is.
There are 2 basic ways to organize data?
• The tree
• The table
And…
• A matrix of associations?
The Problem to be Solved
• The Internet has a large number of documents linked together with the documents spread out physically across many web servers.
• How do you find anything?
One solution
• Build a data structure that indexes the pages.
• The structure is populated by searching individual pages with a “bot”, a program that surfs the web and returns the text of the many pages there.
• The pages returned by the bot are processed into a special kind of database.
A Simple Document Index Structure
• Create a matrix with the index terms on one axis and the documents that contain them on the other.
  – Leave out words like a, the, and, it…
  – Assign a number to each term and document.
  – Call this matrix C.
          Doc 1  Doc 2  Doc 3  Doc 4  Doc 5
Term 1      1      1      1      0      1
Term 2      0      1      1      0      0
Term 3      1      0      0      1      1
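A minimal Python sketch of building C from raw text, assuming simple whitespace tokenization and a tiny hypothetical stop-word list; a production index would also handle punctuation, stemming and much larger vocabularies. Rows are terms and columns are documents, matching the table.

```python
# Hypothetical stop words; a real list would be much longer.
STOP_WORDS = {"a", "the", "and", "it", "of", "to", "is"}


def build_index(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (terms, C) where C[i][j] = 1 if term i occurs in document j, else 0."""
    doc_words = [set(doc.lower().split()) - STOP_WORDS for doc in docs]
    terms = sorted(set().union(*doc_words))
    C = [[1 if term in words else 0 for words in doc_words] for term in terms]
    return terms, C


# Hypothetical toy documents standing in for pages fetched by the bot.
docs = ["the car and the engine", "a car is fast", "engine oil"]
terms, C = build_index(docs)
for term, row in zip(terms, C):
    print(term, row)
```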
Coordinate Retrieval
• Now suppose we want to take all of the documents we have retrieved from the web and query our C matrix for the documents in which given terms occur.
• We can do this by creating a 1xt matrix Q over the t terms, with a 1 for each term we want to search for. If we normalize C so that each row sums to 1, we get a 1xd matrix of the d documents with a score for every document:
R = QC
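The coordinate-retrieval formula can be checked by hand on the matrix C above. This Python sketch uses plain lists, normalizes each row of C to sum to 1, and searches for Term 1 and Term 3; the helper names are hypothetical.

```python
# The 3-term x 5-document matrix C from the slide.
C = [
    [1, 1, 1, 0, 1],  # Term 1
    [0, 1, 1, 0, 0],  # Term 2
    [1, 0, 0, 1, 1],  # Term 3
]


def normalize_rows(M):
    """Scale each row so it sums to 1 (rows of all zeros are left as zeros)."""
    out = []
    for row in M:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def vec_mat(q, M):
    """Multiply a 1 x n row vector by an n x m matrix, giving a 1 x m row vector."""
    return [sum(q[i] * M[i][j] for i in range(len(q))) for j in range(len(M[0]))]


Q = [1, 0, 1]  # search for Term 1 and Term 3
R = vec_mat(Q, normalize_rows(C))
print([round(r, 3) for r in R])  # → [0.583, 0.25, 0.25, 0.333, 0.583]
```

Documents 1 and 5 contain both query terms, so they score highest.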
Discussion
• This is good as far as it goes, but…
• It does nothing to handle the situation where there are more complex relationships between the terms.
• Synonyms are a good example.
• When you write a document, you don’t want to use the same word over and over again, so you use a synonym.
• As a result, the probability that both words occur in the same document is greatly increased.
Inter-term Relationships
• Suppose we want to include these inter-term relationships in our search.
• We need a Thesaurus.
Transform
• Now look through the table and count, for each pair of terms, the number of documents in which both terms occur. The diagonal is the number of documents that contain each term.
Term1 Term2 Term3
Term1 4 2 2
Term2 2 2 0
Term3 2 0 3
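This co-occurrence table is the product of C with its own transpose: entry (i, k) counts the documents in which term i and term k both appear. A short Python check against the matrix C from the earlier slide reproduces the table; `cooccurrence` is a hypothetical helper.

```python
C = [
    [1, 1, 1, 0, 1],  # Term 1
    [0, 1, 1, 0, 0],  # Term 2
    [1, 0, 0, 1, 1],  # Term 3
]


def cooccurrence(C):
    """Entry (i, k) counts documents containing both term i and term k (C times C transposed)."""
    return [[sum(a * b for a, b in zip(row_i, row_k)) for row_k in C] for row_i in C]


for row in cooccurrence(C):
    print(row)
# [4, 2, 2]
# [2, 2, 0]
# [2, 0, 3]
```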
Normalization Matrix
• Normalize the transform table so that each cell is the “cost” of the two terms occurring together. Call that matrix L.
Term1 Term2 Term3
Term1 .125 .125 .33
Term2 .125 .50 0
Term3 .33 0 .33
Query Vector
• Now create a vector of the terms you want to search for.
Term 1 Term 2 Term 3
1 0 1
Now the Math
Multiply the query vector Q by the normalized transform matrix L and then by the index term matrix C, and you get a vector R that ranks the documents on a scale of 0 to 1.
R = QLC
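Putting the pieces together, here is a Python sketch of R = QLC on the same three terms and five documents. The slide does not say exactly how L is normalized, so this sketch assumes a simple scheme: each row of the co-occurrence matrix, each row of C, and the query vector are scaled to sum to 1. Under that assumption every score falls between 0 and 1 and the scores sum to 1; the numbers are not meant to reproduce the Results slide.

```python
C = [
    [1, 1, 1, 0, 1],  # Term 1
    [0, 1, 1, 0, 0],  # Term 2
    [1, 0, 0, 1, 1],  # Term 3
]


def normalize_rows(M):
    """Scale each row so it sums to 1 (rows of all zeros are left as zeros)."""
    out = []
    for row in M:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def matmul(A, B):
    """Plain matrix product of an n x k and a k x m matrix (lists of rows)."""
    return [[sum(a * B[k][j] for k, a in enumerate(row)) for j in range(len(B[0]))] for row in A]


# Term-by-term co-occurrence counts (C times C transposed).
cooc = [[sum(a * b for a, b in zip(ri, rk)) for rk in C] for ri in C]

L = normalize_rows(cooc)         # assumption: each row of L sums to 1
Cn = normalize_rows(C)           # each row of C sums to 1, as in coordinate retrieval
Q = normalize_rows([[1, 0, 1]])  # query Term 1 and Term 3, scaled to sum to 1

R = matmul(matmul(Q, L), Cn)[0]
print([round(r, 3) for r in R])  # → [0.254, 0.175, 0.175, 0.142, 0.254]
```

Term 2 is not in the query, yet it still adds to the scores of Documents 2 and 3 through its association with Term 1; that is the thesaurus effect the transform is meant to provide.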
Results
• The result vector.
• The documents with the highest value have the most likelihood of being relevant to our search.
Doc 1 Doc 2 Doc 3 Doc 4 Doc 5
Rank 0 .5 .3 .6 .1
Discussion
• The matrix created by multiplying L and C becomes a new kind of structure: a matrix of “associations” between terms and documents that also reflects the associations among the terms themselves.
• This may be the only other way of organizing data besides the table and the tree.
• You can extend this by creating a normalized document-by-document (dxd) matrix that takes into account associations between documents (e.g., chapters or authors).
• This falls into the newer category of “connectionist” models, which includes neural networks.
A Model of Consciousness
• Some have even gone so far as to say this may be one of the structures in a conscious brain. (Kanerva, 1988)
• Do some thought experiments on your own “associative” brain by trying some stream of consciousness exercises.
Linear Associative Retrieval Model
• Giuliano and Jones, Linear Associative Retrieval, Vistas in Information Handling, Spartan Press, 1962.
• Hough, The Control of Complex Systems, Progress in Cybernetics and Systems Research, Halstead Press, 1975.
• Kanerva, Sparse Distributed Memory, MIT Press, 1988.
The Future
• More of the same
  – There is a lot of pent-up inertia.
  – SQL is a pretty good programming language.
• More XML
  – There is no stopping this train.
• More AI/connectionist/associative tools
• Bigger and bigger databases