Databases, Markup, and Regular Expressions
Date posted: 21-Oct-2014
2 November 2010
Weekly reflection
• What keeps you from “being technical,” or feeling like you are?
• Alternately, if you know you’re technical, how did you get to be that way?
• Or both! What keeps you from feeling as technical as you actually are?
Tool of the week: text editors
• AKA “programmers’ editors”
• Just the text, ma’am! No binary garbage, no WYSIWYG; that’s not what these are FOR.
• Look for:
  • Regular expressions (“grep”)
  • Syntax coloring in your favorite language
  • Code-folding, code completion... lots of bells and whistles
• Windows: UltraEdit. Mac: BBEdit (TextWrangler is OK). Cross-platform: jEdit. Emacs and vi are for geeks only.
Tip of the week: Getting the most out of library school
• An MLS does not guarantee you a library job. Anybody who says it does is lying to you.
• You get out of library school what you put in.
  • The “extras” like workshops, talks, committees? NOT EXTRAS.
  • Don’t breeze through. Take the classes that mean something.
  • Pick your practicum carefully.
  • Look for champions. You’ll need those recommendations.
• Get professionally involved NOW.
• Take any chance to have your résumé and sample cover letter read by a professional librarian.
The relational database
• Designed by E. F. Codd, who published the relational model in 1970.
• RUNS THE WORLD. Almost every non-trivial web application you’ll find has a relational DB underneath it.
• Interacts with the outside world (i.e. programs) through SQL: Structured Query Language.
• There is an actual SQL standard...
  • ... but no two databases implement it quite the same way.
  • The basics, however, are pretty consistent.
• Taught here at SLIS. If you have any thoughts of being a techie, TAKE THAT CLASS.
Tables
• Most tables represent the “things” you’re describing.
• Some tables relate those things to each other.
BOOK
book_id  book_isbn  book_barcode
1        441009328  12345_67890
2        441478123  01234_56789
3        441012248  23456_78901

PATRON
patron_id  patron_lname  patron_phone
1          Salo          262-5493
2          Gorman        265-5291
3          Tobias        265-6381
Primary key, foreign key
• Every row in a table should have some kind of unique identifier within the table: PRIMARY KEY.
  • It is often named <thing>_id, and often just a number.
• You can use a PK in other tables to refer to a row. In that other table, it is a FOREIGN KEY.
• For BOOK, could I have chosen a different PK? PATRON?
The magic: relations!
BOOK
book_id  book_isbn  book_barcode
1        441009328  12345_67890
2        441478123  01234_56789
3        441012248  23456_78901

PATRON
patron_id  patron_lname  patron_phone
1          Salo          262-5493
2          Gorman        265-5291
3          Tobias        265-6381

CHECKOUT
checkout_id  book_id  patron_id
1            2        1
2            3        1
3            1        3
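The three tables and the keys that tie them together can be sketched with Python's standard-library sqlite3 module. This is a minimal sketch under assumptions, not the schema behind the slides: the column types, storing ISBNs and barcodes as text, and the choice of SQLite are all guesses made for illustration.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Column types are assumptions; the slides give only names and sample values.
conn.executescript("""
CREATE TABLE book (
    book_id      INTEGER PRIMARY KEY,
    book_isbn    TEXT,
    book_barcode TEXT
);
CREATE TABLE patron (
    patron_id    INTEGER PRIMARY KEY,
    patron_lname TEXT,
    patron_phone TEXT
);
CREATE TABLE checkout (
    checkout_id INTEGER PRIMARY KEY,
    book_id     INTEGER REFERENCES book(book_id),      -- foreign key
    patron_id   INTEGER REFERENCES patron(patron_id)   -- foreign key
);
""")

conn.executemany("INSERT INTO book VALUES (?, ?, ?)", [
    (1, "441009328", "12345_67890"),
    (2, "441478123", "01234_56789"),
    (3, "441012248", "23456_78901"),
])
conn.executemany("INSERT INTO patron VALUES (?, ?, ?)", [
    (1, "Salo", "262-5493"),
    (2, "Gorman", "265-5291"),
    (3, "Tobias", "265-6381"),
])
conn.executemany("INSERT INTO checkout VALUES (?, ?, ?)", [
    (1, 2, 1),
    (2, 3, 1),
    (3, 1, 3),
])

# A checkout pointing at a book_id that does not exist breaks the foreign key.
try:
    conn.execute("INSERT INTO checkout VALUES (4, 99, 1)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

checkout_count = conn.execute("SELECT COUNT(*) FROM checkout").fetchone()[0]
print(fk_rejected, checkout_count)
```

The rejected insert is the point of the exercise: CHECKOUT holds nothing but keys, so the database can refuse a row that refers to a book or patron that is not there.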
My First SQL Query
• Syntax: SELECT <thing(s) you want> FROM <table(s)> WHERE <how you know which things you want>;
  • Often “how you know...” is the information you’re starting with.
• What’s the barcode on the book with the ISBN 441478123?
  • What happens if we have two copies of the book with this ISBN?
SELECT book_barcode FROM book WHERE book_isbn = '441478123';
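To try the query, here is a small sketch using Python's sqlite3 module. It assumes the ISBN is stored as text (which is why the query quotes it). The fourth row, a second copy with barcode 99999_00000, is hypothetical and not in the slides; it is there only to explore the "two copies" question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book ("
    "book_id INTEGER PRIMARY KEY, book_isbn TEXT, book_barcode TEXT)"
)
conn.executemany("INSERT INTO book VALUES (?, ?, ?)", [
    (1, "441009328", "12345_67890"),
    (2, "441478123", "01234_56789"),
    (3, "441012248", "23456_78901"),
])

# Same query as on the slide; the ? placeholder stands in for the quoted ISBN.
query = "SELECT book_barcode FROM book WHERE book_isbn = ? ORDER BY book_id"
barcodes = [row[0] for row in conn.execute(query, ("441478123",))]
print(barcodes)  # ['01234_56789']

# HYPOTHETICAL second copy of the same ISBN (not in the original data).
conn.execute("INSERT INTO book VALUES (4, '441478123', '99999_00000')")
barcodes_two_copies = [row[0] for row in conn.execute(query, ("441478123",))]
print(barcodes_two_copies)  # ['01234_56789', '99999_00000']
```

With two copies, the same query simply returns two rows: the ISBN identifies the title, and only the barcode (or book_id) identifies the physical copy.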
A little harder!
• Who has checked out the book with barcode 12345_67890?
• Oh no! Everything’s in different tables!
Subqueries
• You can put whole queries in the WHERE clause!
• So. What do you want, and from which table?
  • patron_lname from the PATRON table
  • SELECT patron_lname FROM patron WHERE...
  • Or “SELECT patron_lname, patron_phone FROM patron WHERE...”
Where what?
• Where the patron_id is associated with the right book_id in the CHECKOUT table.
  • WHERE patron_id = (SELECT patron_id FROM checkout WHERE...)
Where what?
• You now want the book_id from the BOOK table given the barcode number.
  • WHERE book_id = (SELECT book_id FROM book WHERE book_barcode = '12345_67890')
Putting it all together
• SELECT patron_lname FROM patron WHERE patron_id = (SELECT patron_id FROM checkout WHERE book_id = (SELECT book_id FROM book WHERE book_barcode = '12345_67890'));
• Whew!
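The nested query can be read from the inside out. The sketch below, again using Python's sqlite3 with assumed column types and no declared foreign keys (to keep it short), runs each layer on its own and then the whole thing at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (book_id INTEGER PRIMARY KEY, book_isbn TEXT, book_barcode TEXT);
CREATE TABLE patron (patron_id INTEGER PRIMARY KEY, patron_lname TEXT, patron_phone TEXT);
CREATE TABLE checkout (checkout_id INTEGER PRIMARY KEY, book_id INTEGER, patron_id INTEGER);
INSERT INTO book VALUES (1, '441009328', '12345_67890');
INSERT INTO book VALUES (2, '441478123', '01234_56789');
INSERT INTO book VALUES (3, '441012248', '23456_78901');
INSERT INTO patron VALUES (1, 'Salo', '262-5493');
INSERT INTO patron VALUES (2, 'Gorman', '265-5291');
INSERT INTO patron VALUES (3, 'Tobias', '265-6381');
INSERT INTO checkout VALUES (1, 2, 1);
INSERT INTO checkout VALUES (2, 3, 1);
INSERT INTO checkout VALUES (3, 1, 3);
""")

barcode = "12345_67890"

# Innermost layer: barcode -> book_id (BOOK table).
book_id = conn.execute(
    "SELECT book_id FROM book WHERE book_barcode = ?", (barcode,)
).fetchone()[0]

# Middle layer: book_id -> patron_id (CHECKOUT table).
patron_id = conn.execute(
    "SELECT patron_id FROM checkout WHERE book_id = ?", (book_id,)
).fetchone()[0]

# All three layers in one statement, as on the slide.
nested = """
SELECT patron_lname FROM patron WHERE patron_id =
    (SELECT patron_id FROM checkout WHERE book_id =
        (SELECT book_id FROM book WHERE book_barcode = ?))
"""
lname = conn.execute(nested, (barcode,)).fetchone()[0]

print(book_id, patron_id, lname)  # 1 3 Tobias
```

One caveat: "=" expects the subquery to hand back a single value. In this sample data each book has exactly one checkout row; once a book has been checked out several times, IN is the usual replacement for "=".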
Markup: XML and (X)HTML
Markup
• In the dark ages of typesetting, we told text what to look like.
  • [ol0[ep[fy120,10,12,1]blah[ep
• Renear: “presentational” markup.
• Lots of drawbacks to this approach!
  • If “what it looks like” changes, you have to change EVERY SINGLE PLACE where that particular kind of text appears.
  • You can’t do ANYTHING consistently across documents with different designs.
Paragraphs and characters
• Most WYSIWYG programs mark text this way.
  • Microsoft Word: “paragraph” and “character” styles.
• Most copyeditors still think this way, too.
  • “keymarking” = going through a manuscript to decide what each paragraph of text is and labeling it
• Notice the difference! Now you can tell text what to BE.
  • Heading 1, Body Text, Abstract, Citation
• What does that let you do?
• But there’s a problem with this, too...
Nested structures
• Structures exist in texts that are bigger than paragraphs.
  • A list has a beginning and end... but not within the same list item, most times! And abstracts can be >1 paragraph.
  • What about a section? Or a pullout? Or a chapter?
  • Need some hierarchy here!
• WYSIWYG programs can’t do this at all, or do it very badly. Markup does it very well!
• And so (leaving aside decades of development) we have XML.
Extensible Markup Language
• A set of rules for delimiting text structures.
• Also a family of standards designed to work with marked-up text structures!
  • DOM: Document Object Model (for programmers)
  • XSLT: transform one text structure to another
  • XPath: drill down into a text structure
  • ... etc.
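For a taste of what "drilling down" looks like, here is a hedged sketch using Python's xml.etree.ElementTree. Its tree API is not the W3C DOM itself, and its findall method understands only a subset of XPath, so treat this as an illustration of the two ideas rather than of the standards. The sample document is the one used in the rules later in the talk.

```python
import xml.etree.ElementTree as ET

document = "<exclamation>Hello, <addressee>World</addressee>!</exclamation>"
root = ET.fromstring(document)

# Tree-style access (the DOM idea): start at the root and look at its children.
child_tags = [child.tag for child in root]

# Path-style access (the XPath idea): ask for every <addressee> below the root.
addressees = [element.text for element in root.findall(".//addressee")]

print(root.tag, child_tags, addressees)
```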
The Rules
• Thou shalt use Unicode, or else mark thy preferred encoding.
• Thou shalt put thy markup in angle brackets, clearly marking the start and end of a text run with “tags.”
  • <exclamation>Hello, World!</exclamation>
• To mark a point instead of a text run, thou shalt use empty tags.
  • <empty /> OR <empty></empty>
• Thou shalt enclose thine entire document in ONE SET of tags.
• Thou shalt not permit overlapping text runs; thou shalt keep thy hierarchy clean.
  • Right: <exclamation>Hello, <addressee>World</addressee>!</exclamation>
  • Wrong (overlapping): <exclamation>Hello, <addressee>World!</exclamation></addressee>
More rules
• To describe a text run further, thou mayst add “attributes” (key-value pairs) to thy start tags. Thou shalt put quote marks around the value!
  • <exclamation type="greeting">Hello, World!</exclamation>
• Thou shalt neither use angle brackets nor ampersands in thy text, lest thou confuse the computer. Thou shalt refer to them thus: & as &amp;, < as &lt;, and > as &gt;.
• Thou shalt always use the same case in thine tag and attribute names.
  • Wrong: <exclamation>Hello, World!</EXCLAMATION>
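The attribute and escaping rules can be tried out with Python's standard library. In this sketch the sample text "Salo & Gorman <librarians>" is made up for illustration; escape() from xml.sax.saxutils replaces &, < and > with &amp;, &lt; and &gt;, and the parser turns them back when it reads the document.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical text containing all three troublesome characters.
raw_text = "Salo & Gorman <librarians>"

# Replace & < > with their entity references before putting the text in markup.
escaped_text = escape(raw_text)

# The attribute value is quoted, as the rule demands.
document = f'<exclamation type="greeting">{escaped_text}</exclamation>'

element = ET.fromstring(document)
print(escaped_text)
print(element.get("type"), "|", element.text)
```

The parsed text comes back identical to the original string, so the entity references are purely a storage convention.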
That’s pretty much it.
Those are the rules!
And if your document obeys them, it is “well-formed.”
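Any XML parser will tell you whether a document is well-formed. A minimal sketch with Python's xml.etree.ElementTree follows; the first three samples come from the rules above, while "two roots" and "bare ampersand" are extra illustrations written for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text as a complete XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

samples = {
    "nested": "<exclamation>Hello, <addressee>World</addressee>!</exclamation>",
    "overlapping": "<exclamation>Hello, <addressee>World!</exclamation></addressee>",
    "case mismatch": "<exclamation>Hello, World!</EXCLAMATION>",
    # The next two are illustrations added here, not taken from the slides.
    "two roots": "<exclamation>Hello</exclamation><exclamation>World</exclamation>",
    "bare ampersand": "<exclamation>Salt & pepper</exclamation>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'NOT well-formed'}")
```

Only the "nested" sample passes; each of the others breaks exactly one rule.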
But wait!
Don’t different kinds of text have rules of their own?
Markup languages
• The basic rules of XML, plus constraints relating to the type of text you’re dealing with.
  • Tag and attribute name/value constraints
  • Hierarchy constraints
  • Required/optional constraints
  • Constraints on number of occurrences
• These constraints are laid out in a Schema or DTD.
  • “Parser” checks that you’ve followed the XML rules and are “well-formed.”
  • “Validator” checks that you’ve followed your constraints. If you have, you are “well-formed” AND “valid.”
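The difference between "well-formed" and "valid" can be shown with a toy check. Python's standard library has no DTD or Schema validator, so the sketch below is a hand-rolled stand-in, not a real validator: ALLOWED_CHILDREN is a hypothetical set of hierarchy constraints invented for this example.

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL hierarchy constraints: which child tags each element may contain.
ALLOWED_CHILDREN = {
    "exclamation": {"addressee"},
    "addressee": set(),  # may contain text only
}

def is_valid(xml_text):
    """Toy check: well-formed AND every element obeys ALLOWED_CHILDREN."""
    try:
        root = ET.fromstring(xml_text)  # the "parser" step: well-formedness
    except ET.ParseError:
        return False
    for element in root.iter():  # the "validator" step: our own constraints
        if element.tag not in ALLOWED_CHILDREN:
            return False
        for child in element:
            if child.tag not in ALLOWED_CHILDREN[element.tag]:
                return False
    return True

good = "<exclamation>Hello, <addressee>World</addressee>!</exclamation>"
# Well-formed XML, but the hierarchy is upside down for our constraints.
upside_down = "<addressee>Hello, <exclamation>World</exclamation>!</addressee>"
# Well-formed XML, but <shout> is not a tag our constraints know about.
unknown_tag = "<exclamation><shout>Hello</shout></exclamation>"

print(is_valid(good), is_valid(upside_down), is_valid(unknown_tag))
```

A real Schema or DTD expresses the same kinds of constraints (plus attribute, required/optional, and occurrence rules) declaratively, and a real validator checks them for you.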
Markup languages we use
• XHTML, of course!
  • (the “X” is because this version of HTML uses the XML rules)
  • (earlier versions of HTML didn’t)
• MODS and METS and MARCXML, oh my!
• TEI
  • Text Encoding Initiative
  • For marking up books, manuscripts, dictionaries, etc.
• EAD
  • Encoded Archival Description
  • For marking up finding aids.
Regular expressions: the metadata librarian’s lifesaver!
http://xkcd.com/208