Databases, Markup, and Regular Expressions

Class slidedeck for LIS 644, "Digital Trends, Tools, and Debates," at the University of Wisconsin-Madison's School of Library and Information Studies.


Page 1: Databases, Markup, and Regular Expressions

Databases, Markup, and Regular Expressions

2 November 2010

Page 2: Databases, Markup, and Regular Expressions

Weekly reflection

• What keeps you from “being technical,” or feeling like you are?

• Alternately, if you know you’re technical, how did you get to be that way?

• Or both! What keeps you from feeling as technical as you actually are?

Page 3: Databases, Markup, and Regular Expressions

Tool of the week: text editors

• AKA “programmers’ editors”

• Just the text, ma’am! No binary garbage, no WYSIWYG; that’s not what these are FOR.

• Look for:
  • Regular expressions (“grep”)
  • Syntax coloring in your favorite language
  • Code-folding, code completion... lots of bells and whistles

• Windows: UltraEdit. Mac: BBEdit (TextWrangler is OK). Cross-platform: jEdit. Emacs and vi are for geeks only.

Page 4: Databases, Markup, and Regular Expressions

Tip of the week: Getting the most out of library school

• An MLS does not guarantee you a library job. Anybody who says it does is lying to you.

• You get out of library school what you put in.
  • The “extras” like workshops, talks, committees? NOT EXTRAS.
  • Don’t breeze through. Take the classes that mean something.
  • Pick your practicum carefully.
  • Look for champions. You’ll need those recommendations.

• Get professionally involved NOW.

• Take any chance to have your résumé and sample cover letter read by a professional librarian.

Page 5: Databases, Markup, and Regular Expressions

The relational database

• Proposed by E. F. Codd in 1970.

• RUNS THE WORLD. Almost every non-trivial web application you’ll find has a relational DB underneath it.

• Interacts with the outside world (i.e. programs) through SQL: Structured Query Language.

• There is an actual SQL standard...
  • ... but no two databases implement it quite the same way.
  • The basics, however, are pretty consistent.

• Taught here at SLIS. If you have any thoughts of being a techie, TAKE THAT CLASS.

Page 6: Databases, Markup, and Regular Expressions

Tables

• Most tables represent the “things” you’re describing.

• Some tables relate those things to each other.

BOOK

book_id  book_isbn  book_barcode
1        441009328  12345_67890
2        441478123  01234_56789
3        441012248  23456_78901

PATRON

patron_id  patron_lname  patron_phone
1          Salo          262-5493
2          Gorman        265-5291
3          Tobias        265-6381

Page 7: Databases, Markup, and Regular Expressions

Primary key, foreign key

• Every row in a table should have some kind of unique identifier within the table: its PRIMARY KEY.
  • It is often named <thing>_id, and is often just a number.

• You can use a PK in other tables to refer to a row. In that other table, it is a FOREIGN KEY.

• For BOOK, could I have chosen a different PK? What about for PATRON?

Page 8: Databases, Markup, and Regular Expressions

The magic: relations!

BOOK

book_id  book_isbn  book_barcode
1        441009328  12345_67890
2        441478123  01234_56789
3        441012248  23456_78901

PATRON

patron_id  patron_lname  patron_phone
1          Salo          262-5493
2          Gorman        265-5291
3          Tobias        265-6381

CHECKOUT

checkout_id  book_id  patron_id
1            2        1
2            3        1
3            1        3
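If you want to poke at these three tables yourself, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the NOT NULL and UNIQUE constraints, and the choice of SQLite are assumptions for illustration; the slides only give the table names, column names, and rows.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (
    book_id      INTEGER PRIMARY KEY,   -- primary key
    book_isbn    TEXT NOT NULL,         -- assumed type: text
    book_barcode TEXT NOT NULL UNIQUE   -- assumed: one barcode per copy
);
CREATE TABLE patron (
    patron_id    INTEGER PRIMARY KEY,
    patron_lname TEXT NOT NULL,
    patron_phone TEXT
);
-- CHECKOUT relates a book to a patron through two foreign keys.
CREATE TABLE checkout (
    checkout_id INTEGER PRIMARY KEY,
    book_id     INTEGER NOT NULL REFERENCES book(book_id),
    patron_id   INTEGER NOT NULL REFERENCES patron(patron_id)
);
INSERT INTO book VALUES (1, '441009328', '12345_67890');
INSERT INTO book VALUES (2, '441478123', '01234_56789');
INSERT INTO book VALUES (3, '441012248', '23456_78901');
INSERT INTO patron VALUES (1, 'Salo', '262-5493');
INSERT INTO patron VALUES (2, 'Gorman', '265-5291');
INSERT INTO patron VALUES (3, 'Tobias', '265-6381');
INSERT INTO checkout VALUES (1, 2, 1);
INSERT INTO checkout VALUES (2, 3, 1);
INSERT INTO checkout VALUES (3, 1, 3);
""")

# Each CHECKOUT row is just a pair of foreign keys plus its own primary key.
checkouts = conn.execute(
    "SELECT checkout_id, book_id, patron_id FROM checkout ORDER BY checkout_id"
).fetchall()
print(checkouts)  # [(1, 2, 1), (2, 3, 1), (3, 1, 3)]
```

Notice that CHECKOUT stores no names, phone numbers, or barcodes of its own. It only points at rows in the other two tables.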

Page 9: Databases, Markup, and Regular Expressions

My First SQL Query

• Syntax: SELECT <thing(s) you want> FROM <table(s)> WHERE <how you know which things you want>;
  • Often “how you know...” is the information you’re starting with.

• What’s the barcode on the book with the ISBN 441478123?
  • What happens if we have two copies of the book with this ISBN?

SELECT book_barcode FROM book WHERE book_isbn = '441478123';
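One way to try this query, and the "two copies" question, is the small sketch below. It assumes SQLite through Python's sqlite3 module and text columns. The fourth row (book_id 4, barcode 34567_89012) is made up for the experiment, and the ? placeholder and ORDER BY are additions to the slide's query so the output is safe and predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (book_id INTEGER PRIMARY KEY, book_isbn TEXT, book_barcode TEXT)"
)
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [
        (1, "441009328", "12345_67890"),
        (2, "441478123", "01234_56789"),
        (3, "441012248", "23456_78901"),
    ],
)

# Same shape as the slide's query; the ? is filled in by sqlite3.
query = "SELECT book_barcode FROM book WHERE book_isbn = ? ORDER BY book_id"

one_copy = conn.execute(query, ("441478123",)).fetchall()
print(one_copy)  # [('01234_56789',)]

# Hypothetical second copy: same ISBN, its own book_id and barcode.
conn.execute("INSERT INTO book VALUES (4, '441478123', '34567_89012')")
two_copies = conn.execute(query, ("441478123",)).fetchall()
print(two_copies)  # [('01234_56789',), ('34567_89012',)]
```

So a SELECT hands back every row that matches the WHERE clause. With two copies you get two barcodes, which is one reason the ISBN would make a poor primary key for BOOK.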

Page 10: Databases, Markup, and Regular Expressions

A little harder!

• Who has checked out the book with barcode 12345_67890?

• Oh no! Everything’s in different tables!

Page 11: Databases, Markup, and Regular Expressions

Subqueries

• You can put whole queries in the WHERE clause!

• So. What do you want, and from which table?
  • patron_lname from the PATRON table
  • SELECT patron_lname FROM patron WHERE...
  • Or “SELECT patron_lname, patron_phone FROM patron WHERE...”

Page 12: Databases, Markup, and Regular Expressions

Where what?

• Where the patron_id is associated with the right book_id in the CHECKOUT table.
  • WHERE patron_id = (SELECT patron_id FROM checkout WHERE...)

Page 13: Databases, Markup, and Regular Expressions

Where what?

• You now want the book_id from the BOOK table given the barcode number.
  • WHERE book_id = (SELECT book_id FROM book WHERE book_barcode = '12345_67890')

Page 14: Databases, Markup, and Regular Expressions

Putting it all together

• SELECT patron_lname FROM patron WHERE patron_id = (SELECT patron_id FROM checkout WHERE book_id = (SELECT book_id FROM book WHERE book_barcode = '12345_67890'));

• Whew!
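Here is a sketch of the whole nested query actually running, assuming SQLite via Python's sqlite3 module and the BOOK, PATRON, and CHECKOUT rows from the earlier slides. The second query (which barcodes has Salo checked out?) is not from the slides; it is added only to show IN, which is the usual substitute for = when a subquery can return more than one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (book_id INTEGER PRIMARY KEY, book_isbn TEXT, book_barcode TEXT);
CREATE TABLE patron (patron_id INTEGER PRIMARY KEY, patron_lname TEXT, patron_phone TEXT);
CREATE TABLE checkout (checkout_id INTEGER PRIMARY KEY, book_id INTEGER, patron_id INTEGER);
INSERT INTO book VALUES (1, '441009328', '12345_67890');
INSERT INTO book VALUES (2, '441478123', '01234_56789');
INSERT INTO book VALUES (3, '441012248', '23456_78901');
INSERT INTO patron VALUES (1, 'Salo', '262-5493');
INSERT INTO patron VALUES (2, 'Gorman', '265-5291');
INSERT INTO patron VALUES (3, 'Tobias', '265-6381');
INSERT INTO checkout VALUES (1, 2, 1);
INSERT INTO checkout VALUES (2, 3, 1);
INSERT INTO checkout VALUES (3, 1, 3);
""")

# Read it inside out: barcode -> book_id -> patron_id -> patron_lname.
borrower_query = """
SELECT patron_lname FROM patron WHERE patron_id = (
    SELECT patron_id FROM checkout WHERE book_id = (
        SELECT book_id FROM book WHERE book_barcode = ?))
"""
borrowers = conn.execute(borrower_query, ("12345_67890",)).fetchall()
print(borrowers)  # [('Tobias',)]

# Going the other way: Salo has two checkouts, so the middle subquery
# returns two book_ids and the outer WHERE uses IN instead of =.
salo_query = """
SELECT book_barcode FROM book WHERE book_id IN (
    SELECT book_id FROM checkout WHERE patron_id = (
        SELECT patron_id FROM patron WHERE patron_lname = ?))
ORDER BY book_id
"""
salo_books = [row[0] for row in conn.execute(salo_query, ("Salo",))]
print(salo_books)  # ['01234_56789', '23456_78901']
```

The = form only works cleanly when the subquery yields a single value. If a book had been checked out several times, SQLite would quietly use the first row it found, and some other databases would refuse to run the query at all.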

Page 15: Databases, Markup, and Regular Expressions

Markup

XML and (X)HTML

Page 16: Databases, Markup, and Regular Expressions

Markup

• In the dark ages of typesetting, we told text what to look like.
  • [ol0[ep[fy120,10,12,1]blah[ep

• Renear: “presentational” markup.

• Lots of drawbacks to this approach!
  • If “what it looks like” changes, you have to change EVERY SINGLE PLACE where that particular kind of text appears.
  • You can’t do ANYTHING consistently across documents with different designs.

Page 17: Databases, Markup, and Regular Expressions

Paragraphs and characters

• Most WYSIWYG programs mark text this way.
  • Microsoft Word: “paragraph” and “character” styles.

• Most copyeditors still think this way, too.
  • “Keymarking” = going through a manuscript to decide what each paragraph of text is and labeling it

• Notice the difference! Now you can tell text what to BE.
  • Heading 1, Body Text, Abstract, Citation

• What does that let you do?

• But there’s a problem with this, too...

Page 18: Databases, Markup, and Regular Expressions

Nested structures

• Structures exist in texts that are bigger than paragraphs.
  • A list has a beginning and end... but not within the same list item, most times! And abstracts can be >1 paragraph.
  • What about a section? Or a pullout? Or a chapter?
  • Need some hierarchy here!

• WYSIWYG programs can’t do this at all, or do it very badly. Markup does it very well!

• And so (leaving aside decades of development) we have XML.

Page 19: Databases, Markup, and Regular Expressions

Extensible Markup Language

• A set of rules for delimiting text structures.

• Also a family of standards designed to work with marked-up text structures!

  • DOM: Document Object Model (for programmers)
  • XSLT: transform one text structure to another
  • XPath: drill down into a text structure
  • ... etc.
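To give a feel for what "drilling down" means, here is a small sketch with Python's standard xml.etree.ElementTree module. Take it as an approximation: ElementTree is its own tree API, not the W3C DOM, and it understands only a subset of XPath. The sample document reuses the <exclamation> and <addressee> tags from the Rules slide.

```python
import xml.etree.ElementTree as ET

doc = "<exclamation>Hello, <addressee>World</addressee>!</exclamation>"
root = ET.fromstring(doc)  # parse the text into a tree of elements

print(root.tag)  # exclamation

# ".//addressee" is an XPath-style path: any <addressee> at any depth.
names = [el.text for el in root.findall(".//addressee")]
print(names)  # ['World']

# itertext() walks the tree and yields every text run in document order.
full_text = "".join(root.itertext())
print(full_text)  # Hello, World!
```

The point is that once text is marked up, a program can ask for "the addressee" by name instead of guessing where it sits in the sentence.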

Page 20: Databases, Markup, and Regular Expressions

The Rules

• Thou shalt use Unicode, or else mark thy preferred encoding.

• Thou shalt put thy markup in angle brackets, clearly marking the start and end of a text run with “tags.”
  • <exclamation>Hello, World!</exclamation>

• To mark a point instead of a text run, thou shalt use empty tags.
  • <empty /> OR <empty></empty>

• Thou shalt enclose thine entire document in ONE SET of tags.

• Thou shalt not permit overlapping text runs; thou shalt keep thy hierarchy clean.
  • Good: <exclamation>Hello, <addressee>World</addressee>!</exclamation>
  • Bad: <exclamation>Hello, <addressee>World!</exclamation></addressee>
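You can let a parser judge these rules for you. The sketch below uses Python's standard xml.etree.ElementTree module; is_well_formed is a helper name made up for this example, and the two_roots and empty_point strings are invented test cases. The nested and overlapping strings are the two <exclamation> examples above.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

nested = "<exclamation>Hello, <addressee>World</addressee>!</exclamation>"
overlapping = "<exclamation>Hello, <addressee>World!</exclamation></addressee>"
two_roots = "<a>one</a><b>two</b>"            # breaks the ONE SET of tags rule
empty_point = "<line>before<empty />after</line>"

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False
print(is_well_formed(two_roots))    # False
print(is_well_formed(empty_point))  # True
```

A parser is strict on purpose: it stops at the first broken rule rather than guessing what you meant.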

Page 21: Databases, Markup, and Regular Expressions

More rules

• To describe a text run further, thou mayst add “attributes” (key-value pairs) to thy start tags. Thou shalt put quote marks around the value!
  • <exclamation type="greeting">Hello, World!</exclamation>

• Thou shalt neither use angle brackets nor ampersands in thy text, lest thou confuse the computer. Thou shalt refer to them thus: & as &amp;, < as &lt;, and > as &gt;.

• Thou shalt always use the same case in thy tag and attribute names.
  • Bad: <exclamation>Hello, World!</EXCLAMATION>
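In practice you rarely type &amp; and friends by hand; a library does it. This sketch assumes Python's standard library (xml.sax.saxutils for escaping, xml.etree.ElementTree for parsing). The sample sentence in raw is invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

raw = "Q&A: is 1 < 2 > 0?"
safe = escape(raw)  # replaces &, <, and > with their entity references
print(safe)  # Q&amp;A: is 1 &lt; 2 &gt; 0?

# quoteattr() wraps an attribute value in quote marks (escaping it if needed).
element = "<exclamation type=" + quoteattr("greeting") + ">" + safe + "</exclamation>"
parsed = ET.fromstring(element)
print(parsed.get("type"))  # greeting
print(parsed.text == raw)  # True: the parser turns the entities back into characters

# Tag names are case-sensitive, so a mismatched end tag is not well-formed.
try:
    ET.fromstring("<exclamation>Hello, World!</EXCLAMATION>")
    case_ok = True
except ET.ParseError:
    case_ok = False
print(case_ok)  # False
```

Escaping and parsing are a round trip: what goes in as &amp; comes back out as a plain ampersand.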

Page 22: Databases, Markup, and Regular Expressions

That’s pretty much it.

Those are the rules!

And if your document obeys them, it is “well-formed.”

Page 23: Databases, Markup, and Regular Expressions

But wait!

Don’t different kinds of text have rules of their own?

Page 24: Databases, Markup, and Regular Expressions

Markup languages

• The basic rules of XML, plus constraints relating to the type of text you’re dealing with.
  • Tag and attribute name/value constraints
  • Hierarchy constraints
  • Required/optional constraints
  • Constraints on number of occurrences

• These constraints are laid out in a Schema or DTD.
  • “Parser” checks that you’ve followed the XML rules and are “well-formed.”
  • “Validator” checks that you’ve followed your constraints. If you have, you are “well-formed” AND “valid.”
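To make the parser/validator split concrete, here is a toy sketch in Python. It is not a real validator: real ones read a DTD or Schema, and Python's standard library does not include one. The three constraints, the ALLOWED_TYPES values, and the check function are all invented for illustration; only the <exclamation> and <addressee> names come from the earlier slides.

```python
import xml.etree.ElementTree as ET

ALLOWED_TYPES = {"greeting", "farewell"}  # hypothetical attribute values

def check(text):
    """Return a list of problems; an empty list means the toy rules pass."""
    # Step 1, the parser: is the document well-formed at all?
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return ["not well-formed: " + str(err)]
    # Step 2, the stand-in validator: does it obey our made-up constraints?
    problems = []
    if root.tag != "exclamation":
        problems.append("root element must be <exclamation>")
    if root.get("type") not in ALLOWED_TYPES:
        problems.append("type attribute is required and must be an allowed value")
    if len(root.findall("addressee")) != 1:
        problems.append("exactly one <addressee> child is required")
    return problems

valid = '<exclamation type="greeting">Hello, <addressee>World</addressee>!</exclamation>'
invalid = "<exclamation>Hello!</exclamation>"
broken = "<exclamation>Hello!"

valid_problems = check(valid)      # no problems: well-formed AND valid
invalid_problems = check(invalid)  # well-formed, but breaks two constraints
broken_problems = check(broken)    # never reaches the constraint checks

print(valid_problems)
print(invalid_problems)
print(broken_problems)
```

The middle case is the interesting one: a document can follow every XML rule and still be wrong for the kind of text it claims to be.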

Page 25: Databases, Markup, and Regular Expressions

Markup languages we use

• XHTML, of course!
  • (the “X” is because this version of HTML uses the XML rules)
  • (earlier versions of HTML didn’t)

• MODS and METS and MARCXML, oh my!

• TEI
  • Text Encoding Initiative
  • For marking up books, manuscripts, dictionaries, etc.

• EAD
  • Encoded Archival Description
  • For marking up finding aids.

Page 26: Databases, Markup, and Regular Expressions

Regular expressions

the metadata librarian’s lifesaver!

Page 27: Databases, Markup, and Regular Expressions

http://xkcd.com/208