Stein Markup 1.1
Markup Languages (ML)
Yaakov J. Stein
Chief Scientist, RAD Data Communications
What do I do?
I digest, edit and produce documents:
– business letters, email, meeting summaries
– proposals, reports, requirement specifications, project plans
– web pages, research articles, review articles, books
What do others do?
Pretty much the same
US corporations produce >100 billion documents per year
90% of a modern institution’s information is in documents
>50% of a typical corporation’s effort involves documents
That’s why word-processing software was expected to increase efficiency
But it didn’t!
Word processing?
PROs
– makes nicer-looking documents
– expedites document sharing during creation
CONs
– typically 30% of effort goes into formatting and reformatting
– doesn’t increase information accessibility
– doesn’t facilitate information mining
Databases?
Databases are the natural alternative to documents
PROs
– increase information accessibility
– facilitate information mining
CONs
– format is not human-readable
– inflexible
The solution
What we really want is to write unconstrained text
but to have information retrieval as well!
Method 1: Automatic text analysis
– an AI program analyzes the text
– recognizes document structure and sentence syntax
– performs gisting, facilitates information mining
– a complete solution is equivalent to passing the Turing test
Method 2: Manual markup
– the document author is responsible for marking
– clarifies document structure
– enables automated retrieval of selected information
– suggests presentation format
Why is text analysis hard?
The man cried FIRE !
The man cried FIRE the gun !
The man cried FIRE the gun maker !
Are MLs computer languages?
There are many different types of computer languages:
procedural languages
for (int n = 0; n < 10; n++) if (n > 5) printf("markup languages are fun!\n");
graphic languages
newpath 0 0 moveto 0 1 lineto 1 1 lineto 1 0 lineto closepath fill
database languages
SELECT book FROM biblio WHERE subject = 'DSP' AND author = 'STEIN';
logical languages
useful(dsp). useful(hardware). fun(dsp). fun(web).
interesting(X) :- useful(X), fun(X).
?- interesting(X).
They are!
Markup languages do not instruct computers directly,
as procedural languages do,
but rather indirectly,
as logical languages do
They do this by using:
elements (tags)
attributes
entities
text
<BOOK SUBJECT="dsp"> <TITLE FORMAT="short">DSP-CSP</TITLE> <AUTHOR>J. Stein</AUTHOR> This is a great book! &standard-disclaimer;</BOOK>
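To make the four ingredients concrete, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree` to parse the BOOK example. It assumes straight quotes and a `;`-terminated entity reference, and it declares `standard-disclaimer` in an internal DTD subset with hypothetical placeholder text, since the slide does not say what the entity expands to.

```python
import xml.etree.ElementTree as ET

# The BOOK example with straight quotes. The entity's replacement text is a
# hypothetical placeholder, declared in an internal DTD subset so that the
# parser knows how to expand &standard-disclaimer;.
DOC = """<!DOCTYPE BOOK [
  <!ENTITY standard-disclaimer "(Standard disclaimer text goes here.)">
]>
<BOOK SUBJECT="dsp"> <TITLE FORMAT="short">DSP-CSP</TITLE> <AUTHOR>J. Stein</AUTHOR> This is a great book! &standard-disclaimer;</BOOK>"""

root = ET.fromstring(DOC)

# Elements (tags) form a tree; attributes are name/value pairs on an element.
for elem in root.iter():
    print(elem.tag, elem.attrib)

# Text either sits inside an element (.text) or follows a child (.tail).
# By now the entity reference has been replaced by its declared text.
print(root.find("AUTHOR").tail.strip())
```

The program never "executes" the markup; it only recovers the structure the author declared, which is the indirect kind of instruction described above.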
Some markup element functions
Structural
– Clarifies document structure
– Delineates document parts
Descriptive (informative)
– Indicates
– Facilitates information retrieval
Presentational (display)
– Presents information in a nice format
– Improves human readability
Referential (links, applications)
– Provides hypertext links
– Launches applications
Structural Markup
<HEADING>September 1, 2000</HEADING>
<GREETING>Dear Prof. Stein, </GREETING>
<BODY>
I would like to tell you how much I enjoyed reading your new text
“Digital Signal Processing, A Computer Science Perspective”.
I hope we will be able to meet at the next conference.
</BODY>
<SIGNATURE>
Sincerely,
Dee Espy
</SIGNATURE>
Descriptive Markup
<DATE>September 1, 2000</DATE>
Dear <PERSON>Prof. Stein,</PERSON>
I would like to tell you how much I enjoyed reading your new text
<BOOK>
“Digital Signal Processing, A Computer Science Perspective”.
</BOOK>
I hope we will be able to meet at the next <EVENT>conference.</EVENT>
Sincerely,
<PERSON>Dee Espy</PERSON>
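Descriptive markup is what makes retrieval possible: a program can ask for "every person" or "the date" instead of searching flat text. A minimal Python sketch, assuming the letter is wrapped in a hypothetical `<LETTER>` root element (XML requires a single root) and using a hypothetical helper named `extract`:

```python
import xml.etree.ElementTree as ET

# The descriptively marked-up letter, wrapped in a hypothetical <LETTER> root.
LETTER = """<LETTER><DATE>September 1, 2000</DATE>
Dear <PERSON>Prof. Stein,</PERSON>
I would like to tell you how much I enjoyed reading your new text
<BOOK>“Digital Signal Processing, A Computer Science Perspective”.</BOOK>
I hope we will be able to meet at the next <EVENT>conference.</EVENT>
Sincerely,
<PERSON>Dee Espy</PERSON></LETTER>"""


def extract(doc, tag):
    """Return the text of every <tag> element, minus surrounding punctuation."""
    root = ET.fromstring(doc)
    return [(el.text or "").strip(" ,.\n") for el in root.iter(tag)]


print(extract(LETTER, "PERSON"))
print(extract(LETTER, "DATE"))
```

The same query would return nothing useful on the structural or presentational versions of the letter, because there the names are not marked as persons.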
Presentational Markup
<RIGHT-JUSTIFY>September 1, 2000</RIGHT-JUSTIFY>
<BOLD>Dear Prof. Stein,</BOLD>
I would like to tell you how much I enjoyed reading your new text
<UNDERLINE>
“Digital Signal Processing, A Computer Science Perspective”.
</UNDERLINE>
I hope we will be able to meet at the next
<BLINK>conference.</BLINK>
Sincerely,
<IMAGE SRC="dee-signature.jpg" ALIGN="left">
<FONT FACE="Times-Roman">Dee Espy</FONT>
Relational Markup
<today xlink:form="simple" href="date" actuate="auto">
Dear Prof. Stein,
I would like to tell you how much I enjoyed reading your new text
<A HREF="http://www.amazon.com/exec/obidos/ASIN/04712954">
“Digital Signal Processing, A Computer Science Perspective”.
</A>
I hope we will be able to meet at the next
<A HREF="conference">conference.</A>
Sincerely,
<IMAGE SRC="dee-signature.jpg" ALIGN="left">
<A HREF="mailto:[email protected]">Dee Espy</A>
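Referential markup can be harvested mechanically, which is the first step of any link management. Below is a sketch using Python's standard `html.parser`, which, unlike an XML parser, tolerates the unclosed `<IMAGE>` tag. The class name `LinkCollector` and the `http://example.com/dsp-csp` target are hypothetical placeholders.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect link targets from <A HREF> and <IMG>/<IMAGE> SRC attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling us.
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(("a", attrs["href"]))
        elif tag in ("img", "image") and "src" in attrs:
            self.links.append((tag, attrs["src"]))


# A fragment in the style of the letter; the first URL is a placeholder.
FRAGMENT = """<A HREF="http://example.com/dsp-csp">"Digital Signal Processing"</A>
I hope we will be able to meet at the next <A HREF="conference">conference.</A>
Sincerely,
<IMAGE SRC="dee-signature.jpg" ALIGN="left">"""

collector = LinkCollector()
collector.feed(FRAGMENT)
collector.close()
print(collector.links)
```

Relative targets such as `conference` still have to be resolved against the page's own URL before they can be checked.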
Generalized Markup Language
William Tunnicliffe and Stanley Rice [1960s] (independently) invent the idea of a structural markup language
Problem: a different ML is needed for each type of document (letter, report, article, book, etc.)
Charles Goldfarb, Edward Mosher and Raymond Lorie (IBM) [1973] invent the Generalized Markup Language (GML)
Solution: use a metalanguage; a Document Type Definition (DTD) defines the tags
IBM marked up 90% of its documents with GML
With GML, structure is evident
Library
  Novels
  Journals
  Textbooks
    Algebraic zoology
    Botanical history
    Computer poetry
    DSP
      DSP-CSP
        Title
          Full: Digital Signal Processing: a Computer Science Perspective
          Short: DSPCSP
        Author
          Name: Jonathan (Y) Stein
          Association: RAD Data Comm.
        Publication
          Publisher: John Wiley
          Year: 2000
          Location: New York
          ISBN: 04712954
      DSP just for fun
    Elementary QED
Standard Generalized Markup Language
Problems with GML:
– No validating parser
– Not portable (between computer systems)
Solution:
SGML
ANSI [1978]
ISO/IEC 8879 [1986] (Intl Org for Standardization / Intl Electrotechnical Commission)
JTC1/SC34/WG1 (WG 1 of SubCommittee 34 of Joint Technical Committee 1)
For presentation: Document Style Semantics and Specification Language (DSSSL)
SGML - cont.
If SGML is so good, why doesn’t anyone use it?
Complexity
– base standard >500 pages
– SGML is a metalanguage
– writing a DTD is complex programming
– marked-up text is hard to read
– DSSSL adds to the complexity
Inflexibility
– requires absolute conformity
– assumes only one correct way to mark up
– constrains the author to a dictated structure
– not good at capturing the author’s structure
HyperText Markup Language
CERN (particle physics institute in Switzerland) was an early Internet adopter
– used it extensively for collaboration (articles have long author lists)
Major problems with format incompatibility
– only straight ASCII worked reliably
Tim Berners-Lee (computer specialist) defined the requirements:
– simplicity (couldn’t expect physicists to use SGML)
– freedom (no need for validation; let the browser ignore bad markup)
– hypertext links (including to documents over the Internet)
– presentational markup (papers must look nice - authors were used to TeX)
Solution: HTML - a specific application of SGML (not a metalanguage)
HTML versions
HTML 1.0 (1989) Berners-Lee’s original CERN version
hypertext, images, head+body structure, presentational markup
HTML 2.0 (1995) IETF standard - RFC 1866
added lists, forms, etc.
HTML 3.2 (1997) W3C recommendation (incorporates Netscape extensions)
added tables, applets, super/sub-scripts
HTML 4.0 (1997) W3C recommendation (and similar ISO/IEC 15445)
minimizes presentational markup
XHTML 1.0 (2000) current W3C recommendation
reformulates HTML in XML
HTML document structure
<HTML>
<HEAD>
global definitions such as
<TITLE>Web page title</TITLE>
</HEAD>
<BODY>
marked-up text
</BODY>
</HTML>
Some HTML (body) elements
<H1>Level 1 Heading</H1>            Level 1 Heading
<H2>Level 2 Heading</H2>            Level 2 Heading
<H3>Level 3 Heading</H3>            Level 3 Heading
<EM> emphasized </EM>               emphasized
<P> Paragraph </P>                  Paragraph
<A HREF=url>link</A>                link
<UL> <LI> item 1 </LI>              • item 1
     <LI> item 2 </LI> </UL>        • item 2
<OL> <LI> item 1 </LI>              1. item 1
     <LI> item 2 </LI> </OL>        2. item 2
<IMG SRC=url>
Problems with HTML
Presentational aspects have predominated
<B> bold text </B>
<BLINK> blinking text </BLINK>
<FONT COLOR="red"> red text </FONT>
Practically no descriptive markup
– search engines are reduced to flat text search
– search by topic only through keywords or portals
Not extensible
– can’t add new tags
– unknown tags are ignored
Links are relatively simple
– user action is usually required (except IMG)
– only a full document (with offset) is linkable
– link management is a logistical nightmare
Not everything is HTML
Due to HTML limitations, other tools are also used:
Multimedia extensions
– (dynamic) gif, jpg, …
– streaming audio
Common Gateway Interface
– generate HTML on-the-fly
– Perl, C, …
Server Push / Server Pull
JavaScript
Java
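To illustrate what "generate HTML on-the-fly" means for CGI, here is a sketch of a CGI program written in Python rather than Perl or C. The web server passes the query part of the URL in the `QUERY_STRING` environment variable, and the program writes a header block, an empty line, then the generated HTML. The `name` parameter and the function name `cgi_response` are hypothetical.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def cgi_response(query_string):
    """Build a complete CGI response for a hypothetical 'name' query parameter."""
    params = parse_qs(query_string)
    name = params.get("name", ["world"])[0]
    # Escape user input so it cannot inject markup into the generated page.
    body = (
        "<HTML><HEAD><TITLE>Generated page</TITLE></HEAD>"
        "<BODY><P>Hello, " + html.escape(name) + "!</P></BODY></HTML>"
    )
    # Header lines first, then an empty line, then the document itself.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The server puts everything after '?' in the request URL into QUERY_STRING.
    sys.stdout.write(cgi_response(os.environ.get("QUERY_STRING", "")))
```

For a request ending in `?name=Dee` the generated page reads "Hello, Dee!"; no HTML file with that content exists on the server.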
eXtensible Markup Language
Simplified (best parts of) SGML (subset of features)
Flexible content management tool
W3C recommendation(s)
Extensible - can add new elements (even without DTD)
Easy to create special purpose languages (with DTD/SCHEMA)
Includes HTML-like hypertext links
– and extensions (XLINK, XPOINTER)
The future of the web !
XML - an Example
<?xml version="1.0" standalone="yes"?>
<bibliography>
<book isbn="04712954">
<title>Digital Signal Processing: a Computer Science Perspective</title>
<author>Jonathan (Y) Stein</author>
<publisher>John Wiley and Sons</publisher>
</book>
<article>
<title>False Alarm Reduction for ASR and OCR</title>
<author>Yaakov Stein</author>
<proceedings>Tenth AICVNN Symposium</proceedings>
<pages>195-200</pages>
</article>
...
</bibliography>
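Because the structure is explicit, a program can query such a file by element and attribute rather than by keyword. A minimal sketch with Python's standard `xml.etree.ElementTree`, using a trimmed copy of the bibliography (the `...` elision dropped and the ISBN attribute value quoted, as XML requires):

```python
import xml.etree.ElementTree as ET

BIBLIO = """<?xml version="1.0" standalone="yes"?>
<bibliography>
  <book isbn="04712954">
    <title>Digital Signal Processing: a Computer Science Perspective</title>
    <author>Jonathan (Y) Stein</author>
    <publisher>John Wiley and Sons</publisher>
  </book>
  <article>
    <title>False Alarm Reduction for ASR and OCR</title>
    <author>Yaakov Stein</author>
    <proceedings>Tenth AICVNN Symposium</proceedings>
    <pages>195-200</pages>
  </article>
</bibliography>"""

root = ET.fromstring(BIBLIO)

# Each child of <bibliography> is one entry; its tag says what kind of entry.
for entry in root:
    print(entry.tag, "|", entry.findtext("title"), "|", entry.findtext("author"))

# Attribute lookup: the ISBN of every book.
isbns = [book.get("isbn") for book in root.findall("book")]

# Path query: titles of articles only (book titles are skipped).
article_titles = [t.text for t in root.findall("article/title")]
print(isbns, article_titles)
```

Note that no DTD was needed: the element names `book`, `article` and `proceedings` are invented for this file, which is the extensibility HTML lacks.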
What can we do with an XML file?
Check if well-formed
Check if valid (against DTD or schema)
Display “as-is” in browser
Parse in special-purpose program (SAX, DOM)
Process (XSL) to XML, HTML, etc.
Display after processing
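The well-formedness check and the two parsing styles can be tried with Python's standard library, as sketched below. One caveat: `xml.sax` and `xml.dom.minidom` check only well-formedness; checking validity against a DTD or schema needs a validating parser (for example the third-party `lxml`), which is not shown. The names `is_well_formed` and `TagCounter` are hypothetical.

```python
import xml.dom.minidom
import xml.sax

GOOD = """<bibliography>
  <book isbn="04712954"><title>Digital Signal Processing: a Computer Science Perspective</title></book>
  <article><title>False Alarm Reduction for ASR and OCR</title></article>
</bibliography>"""

# An unquoted attribute value makes a document not well-formed.
BAD = "<book isbn=04712954><title>DSP</title></book>"


def is_well_formed(text):
    """True if the text parses as XML (no DTD validation is performed)."""
    try:
        xml.sax.parseString(text.encode("utf-8"), xml.sax.ContentHandler())
        return True
    except xml.sax.SAXParseException:
        return False


class TagCounter(xml.sax.ContentHandler):
    """SAX style: the parser pushes events; the whole tree is never stored."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


counter = TagCounter()
xml.sax.parseString(GOOD.encode("utf-8"), counter)

# DOM style: the whole document is loaded into an in-memory tree.
dom = xml.dom.minidom.parseString(GOOD)
titles = [t.firstChild.data for t in dom.getElementsByTagName("title")]
print(is_well_formed(GOOD), is_well_formed(BAD), counter.counts, titles)
```

SAX suits very large files or one-pass statistics; DOM suits programs that need to move around the document or modify it.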
Wireless Markup Language
The markup language component of the Wireless Application Protocol (WAP)
WAP Forum (1997)
– Ericsson, Motorola, Nokia, Unwired Planet (phone.com)
– bring the Internet to cellular phone users
– re-use fundamental Internet concepts (TCP/IP, HTTP, HTML, JavaScript)
but adapted to lower bandwidth, smaller screens, limited input facilities and limited computational resources
– applications scale across transport options (GSM, TDMA, CDMA, 3G)
and device types (mobile phones, personal assistants)
WML Philosophy
Defined using XML
Transported in compressed binary (for BW reduction)
Applications are modeled as decks of cards
Features:
Actions (OK, navigation, help) can be performed
Hyperlinks (like in HTML)
String variables
Timers
wbmp images (B&W)
Select boxes, forms (for input)
WMLScript (like JavaScript)
WML structure
<?xml version="1.0"?>
<!DOCTYPE wml …>
<wml>
  <card>
    <p>text</p>
    <p>text</p>
  </card>
  <card>...</card>
</wml>
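Since a WML deck is plain XML, a microbrowser can download it once and then move between its cards locally, which is what makes the deck-of-cards model attractive over a slow link. A minimal Python sketch of that idea: the card ids and titles and the `load_deck`/`follow` helpers are hypothetical, the DOCTYPE is omitted, and only intra-deck `#id` targets of `<a>` and `<go>` are followed.

```python
import xml.etree.ElementTree as ET

# A hypothetical two-card deck; "#id" targets point at cards in the same deck.
DECK = """<wml>
  <card id="start" title="Hello">
    <p>Welcome! <a href="#menu">Menu</a></p>
  </card>
  <card id="menu" title="Menu">
    <do type="accept"><go href="#start"/></do>
    <p>Pick an item</p>
  </card>
</wml>"""


def load_deck(text):
    """Map each card's id to its <card> element, in document order."""
    root = ET.fromstring(text)
    return {card.get("id"): card for card in root.iter("card")}


def follow(deck, card_id):
    """Return ids of cards in the same deck that one card links to."""
    targets = []
    for el in deck[card_id].iter():
        href = el.get("href", "")
        if el.tag in ("a", "go") and href.startswith("#") and href[1:] in deck:
            targets.append(href[1:])
    return targets


deck = load_deck(DECK)
print(list(deck), follow(deck, "start"), follow(deck, "menu"))
```

Navigating from `start` to `menu` and back needs no further network request, because both cards arrived in the same deck.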
Some WML elements
<p> </p>                                text
<a href=...> </a>                       hyperlink (anchor)
<do> </do>                              action
<go href=.../>                          go to WML page
<timer>                                 trigger event (units = tenths of a second)
<input/>                                input user text
<prev/>                                 return to previous page
$(…)                                    value of variable
<img src=… />                           display image
<postfield name=… value=…/>             name/value pair sent to the server with <go>
<select> <option> <option> </select>    select box
Some more markup languages
VML = Vector (graphics) Markup Language
VoiceXML
SSML = Speech Synthesis Markup Language
CPML = Call Policy Markup Language
DSML = Directory Services Markup Language
MathML = Mathematical Markup Language
CML = Chemical Markup Language
AML = Astronomical Markup Language
LegalXML
BSML = Bioinformatic Sequence Markup Language
GedML = Genealogical Data Markup Language
FinXML = Financial market Markup Language
ChessML
SDML = Signed Document Markup Language
RELML = Real Estate Listing Markup Language
etc. etc. etc. ...