URLs – Uniform Resource Locators

Since web pages may contain pointers to other pages, we will see how those pointers are implemented.

When the web was first created, it was apparent that having one page point to another required mechanisms for naming and locating pages. In particular, there were 3 questions that had to be answered before a selected page could be displayed:

• What is the page called?
• Where is the page located?
• How can the page be accessed?

URLs

The solution chosen identifies pages in a way that solves all 3 problems at once.

Each page is assigned a URL (Uniform Resource Locator) that effectively serves as the page’s worldwide name.

URLs

URLs have 3 parts:

• The protocol (also called a scheme)
• The DNS name of the machine on which the page is located, and
• A local name uniquely indicating the specific page (usually just a file name on the machine where it resides)

For example, the URL for the author’s department is http://www.cs.vu.nl/welcome.html. This URL consists of 3 parts: the protocol (http), the DNS name of the host (www.cs.vu.nl), and the file name (welcome.html), with certain punctuation separating the pieces.
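
The three-part split can be tried out with Python's standard urllib.parse module. This is only a minimal sketch: the helper name url_parts is made up for illustration, and the local name comes back with its leading slash because that is how urlsplit reports the path.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Return (protocol, DNS name, local name) for a URL.

    url_parts is a hypothetical helper; urlsplit does the actual work.
    """
    parts = urlsplit(url)
    # scheme -> protocol, hostname -> DNS name, path -> local (file) name
    return parts.scheme, parts.hostname, parts.path


print(url_parts("http://www.cs.vu.nl/welcome.html"))
# ('http', 'www.cs.vu.nl', '/welcome.html')
```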

URLs

Many sites have certain shortcuts for file names built in. For example, ~user/ might be mapped onto user’s WWW directory, with the convention that a reference to the directory itself implies a certain file, say, index.html.

Thus the author’s home page can be reached at http://www.cs.vu.nl/~ast/ even though the actual file name is different.

At many sites a null file name defaults to the organization’s home page.
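
One way a server might apply these shortcuts is sketched below. Everything concrete here is an assumption made for illustration: the directory layout (/home/<user>/www and /var/www), the default file name, and the function name local_file are all hypothetical, and real servers make this mapping configurable.

```python
import posixpath

DEFAULT_FILE = "index.html"        # assumed default document name
USER_WEB_DIR = "/home/{user}/www"  # hypothetical per-user WWW directory
SITE_ROOT = "/var/www"             # hypothetical site document root


def local_file(url_path):
    """Map the local-name part of a URL onto a file path on the server.

    Illustration only: no protection against ".." or other unsafe input.
    """
    if url_path.startswith("/~"):
        # "/~ast/papers/" -> user "ast", rest "papers/"
        user, _, rest = url_path[2:].partition("/")
        base = USER_WEB_DIR.format(user=user)
    else:
        base, rest = SITE_ROOT, url_path.lstrip("/")
    # A null file name, or a reference to a directory, implies the default file.
    if rest == "" or rest.endswith("/"):
        rest += DEFAULT_FILE
    return posixpath.join(base, rest)


print(local_file("/~ast/"))  # /home/ast/www/index.html
print(local_file("/"))       # /var/www/index.html
```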

URLs – mechanism

To make a piece of text clickable, the page writer must provide 2 items of information:

• The clickable text to be displayed, and
• The URL of the page to go to if the text is selected

When the text is selected, the browser looks up the host name using DNS. Now armed with the host’s IP address, the browser then establishes a TCP connection to the host. Over that connection it sends the file name using the specified protocol. Next, back comes the page.
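
The sequence just described (DNS lookup, TCP connection, send the file name, read the page) can be sketched with Python's socket module. This is a rough illustration, not how a real browser is written. The function names are made up, the request uses HTTP/1.0 so that the server closes the connection when the page is complete, and the Host header is an addition that most present-day servers expect.

```python
import socket
from urllib.parse import urlsplit


def build_request(url):
    """Return (host, port, request bytes) for a plain http:// URL."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80   # port 80 unless the URL names another one
    path = parts.path or "/"  # a null file name becomes "/"
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    return host, port, request.encode("ascii")


def fetch(url, timeout=10.0):
    """Fetch a page the way the slide describes; needs network access."""
    host, port, request = build_request(url)
    ip_address = socket.gethostbyname(host)  # 1. look up the host name using DNS
    # 2. establish a TCP connection to the host
    with socket.create_connection((ip_address, port), timeout=timeout) as conn:
        conn.sendall(request)                # 3. send the file name
        chunks = []
        while True:                          # 4. back comes the page
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("http://www.cs.vu.nl/welcome.html") would return the raw response bytes, provided the host is reachable and still serves that page.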

URLs - protocols

The URL scheme is open-ended, in the sense that it is straightforward to have protocols other than HTTP. In fact, URLs for various other protocols have been defined, and many browsers understand them.

The following table illustrates slightly simplified forms of the more common ones:

URLs - protocols

Name     Used for                 Example
http     Hypertext                http://www.cs.vu.nl/~ast/
ftp      File Transfer Protocol   ftp://ftp.cs.vu.nl/pub
file     Local file               file:///usr/Suzanne/prog.c
news     News group               news:comp.os.minix
news     News article             News:[email protected]
gopher   Gopher                   gopher://gopher.tc.umn.edu/11/Libraries
mailto   Sending email            mailto:[email protected]
telnet   Remote login             telnet://www.w3.org:80
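
Because the scheme is always the text before the first colon, a program can pick the right handler without understanding the rest of the URL. The sketch below only looks up a description for each scheme in the table. The USED_FOR dictionary and the describe function are illustrative names, not part of any browser.

```python
from urllib.parse import urlsplit

# Scheme -> purpose, copied from the table above.
USED_FOR = {
    "http": "Hypertext",
    "ftp": "File Transfer Protocol",
    "file": "Local file",
    "news": "News group or news article",
    "gopher": "Gopher",
    "mailto": "Sending email",
    "telnet": "Remote login",
}


def describe(url):
    """Return (scheme, purpose) for a URL; unknown schemes map to 'unknown'."""
    scheme = urlsplit(url).scheme  # urlsplit lowercases the scheme
    return scheme, USED_FOR.get(scheme, "unknown")


for example in ("http://www.cs.vu.nl/~ast/",
                "ftp://ftp.cs.vu.nl/pub",
                "news:comp.os.minix",
                "telnet://www.w3.org:80"):
    print(describe(example))
```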

HTTP – HyperText Transfer Protocol

The standard Web transfer protocol is HTTP (HyperText Transfer Protocol).

The HTTP protocol consists of two fairly distinct items:

• the set of requests from browsers to servers, and
• the set of responses going back the other way

HTTP

HTTP is an ASCII protocol (each interaction consists of an ASCII request, followed by one MIME-like response).

MIME (Multipurpose Internet Mail Extensions): in the early days of the ARPANET, email consisted exclusively of text messages written in English and expressed in ASCII. Nowadays on the Internet this approach is no longer adequate, as the following need to be addressed:

• Messages in languages with accents (e.g. French, German)
• Messages in non-Latin alphabets (e.g. Hebrew, Russian)
• Messages in languages without alphabets (e.g. Chinese, Japanese)
• Messages not containing text at all (e.g. audio, video)

MIME

The basic idea of MIME is to define encoding rules for non-ASCII messages. MIME defines 5 message headers:

Header                      Meaning
MIME-Version                Identifies the MIME version
Content-Description         Human-readable string telling what is in the message
Content-ID                  Unique identifier
Content-Transfer-Encoding   How the body is wrapped for transmission
Content-Type                Nature of the message
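
To see the five headers on an actual message, Python's standard email package can build one. This is a small sketch: the German sample text, the description string, and the Content-ID value are invented for illustration. MIMEText fills in MIME-Version, Content-Type, and Content-Transfer-Encoding itself, and the other two headers are set by hand.

```python
from email.mime.text import MIMEText

# A short non-ASCII body, so that an encoding is actually needed.
msg = MIMEText("Grüße aus Amsterdam", "plain", "utf-8")

# The two optional headers; both values are made up for this example.
msg["Content-Description"] = "A short greeting in German"
msg["Content-ID"] = "<greeting-1@example.invalid>"

for name in ("MIME-Version", "Content-Description", "Content-ID",
             "Content-Transfer-Encoding", "Content-Type"):
    print(f"{name}: {msg[name]}")
```

The body travels base64-encoded, and msg.get_payload(decode=True).decode("utf-8") recovers the original text.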

MIME – Content Type

Type          Subtype         Meaning
Text          Plain           Unformatted text
Text          Richtext        Text including simple formatting
Image         Gif             Still picture in GIF format
Image         Jpeg            Still picture in JPEG format
Audio         Basic           Audible sound
Video         Mpeg            Movie in MPEG format
Application   Octet-stream    An uninterpreted byte sequence
Application   Postscript      A printable document in PostScript
Message       Rfc822          A MIME RFC 822 message
Message       Partial         Message has been split for transmission
Message       External-body   Message must be fetched over the net
Multipart     Mixed           Independent parts
Multipart     Alternative     Same message in different formats
Multipart     Parallel        Parts must be viewed simultaneously
Multipart     Digest          Each part is a complete RFC 822 message
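
In a Content-Type header the type and subtype are written together, separated by a slash, as in text/html. The sketch below splits such a value with Python's standard email package. The helper name split_content_type is an assumption, and any parameters after a semicolon are simply ignored.

```python
from email.message import Message


def split_content_type(header_value):
    """Return (type, subtype) for a Content-Type value, both lowercased."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_maintype(), msg.get_content_subtype()


print(split_content_type("text/html"))  # ('text', 'html')
print(split_content_type("image/gif"))  # ('image', 'gif')
```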

HTTP - request

Although HTTP was designed for use in the Web, it has been intentionally made more general than necessary with an eye to future object-oriented applications. For this reason the first word of a request line is simply the name of the method (command) to be executed on the Web page (or general object).

The built-in methods are as follows:

Method   Description
GET      Request to read a Web page
HEAD     Request to read a Web page’s header
PUT      Request to store a Web page
POST     Append to a named resource (web page)
DELETE   Remove the Web page
LINK     Connects two existing resources
UNLINK   Breaks an existing connection between resources
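
Since the method is simply the first word of the request line, a server can split the line on spaces to find out what is being asked. The following is a minimal sketch with made-up names (METHODS, parse_request_line). It accepts only the methods listed in the table above and does no further validation.

```python
# The built-in methods from the table above.
METHODS = {"GET", "HEAD", "PUT", "POST", "DELETE", "LINK", "UNLINK"}


def parse_request_line(line):
    """Split a request line into (method, page, protocol version)."""
    method, page, version = line.split()
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    return method, page, version


print(parse_request_line("GET /hypertext/WWW/TheProject.html HTTP/1.1"))
# ('GET', '/hypertext/WWW/TheProject.html', 'HTTP/1.1')
```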

HTTP request / response

A request is just a GET line, naming the page desired and the HTTP protocol version:

GET /hypertext/WWW/TheProject.html HTTP/1.1

The response is just the raw page, headers, and MIME information.

For example, because HTTP is an ASCII protocol, it is easy for a person at a terminal (as opposed to a browser) to talk directly to Web servers. All that is needed is a TCP connection to port 80 on the server. The simplest way to get such a connection is the Telnet program:

HTTP - example

Client: telnet www.w3.org 80
Trying 18.23.0.23
Connected to www.w3.org
Client: GET /hypertext/WWW/TheProject.html HTTP/1.1
Server: HTTP/1.1 200 Document follows
Server: MIME-Version: 1.0
Server: Server: CERN/3.0
Server: Content-Type: text/html
Server: Content-Length: 8247
Server: <HEAD><TITLE>The World Wide Web Consortium (W3C) </TITLE> </HEAD>
Server: <BODY> …
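
The server's lines in this session have a regular shape: a status line, a block of headers, then the page. The sketch below takes them apart in Python. RESPONSE reuses the server lines shown above, with the page cut off after the head, and it assumes that lines end in CR LF and that an empty line separates headers from page. parse_response is a hypothetical helper, not a library function.

```python
# The server's side of the session above; the page is cut off after the head.
RESPONSE = (
    "HTTP/1.1 200 Document follows\r\n"
    "MIME-Version: 1.0\r\n"
    "Server: CERN/3.0\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 8247\r\n"
    "\r\n"
    "<HEAD><TITLE>The World Wide Web Consortium (W3C) </TITLE> </HEAD>"
)


def parse_response(text):
    """Return (version, status code, reason, headers, page) for a response."""
    head, _, page = text.partition("\r\n\r\n")  # empty line ends the headers
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, page


version, code, reason, headers, page = parse_response(RESPONSE)
print(code, reason)             # 200 Document follows
print(headers["content-type"])  # text/html
```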

HTTP Example

Alternatively, one could use a command-line browser (such as WFetch) to review the same information.

HTML – HyperText Markup Language

HTML is a markup language, a language for describing how documents are to be formatted. The term “markup” comes from the old days when copyeditors actually marked up documents to tell the printer (in those days a human being) which fonts to use, and so on.

Markup languages thus contain explicit commands for formatting. For example, in HTML, <B> means start boldface mode, and </B> means leave boldface mode.

HTML

The advantage of a markup language over one with no explicit markup is that writing a browser for it is straightforward: the browser simply has to understand the markup commands.

By embedding the markup commands within each HTML file and standardizing them, it becomes possible for any Web browser to read and reformat any Web page.

HTML

HTTP and HTML are constantly evolving. When Mosaic was the only browser, the language it interpreted, HTML 1.0, was the de facto standard.

When new browsers came along, there was a need for a formal Internet standard, so the HTML 2.0 standard was produced. Next, HTML 3.0 was created as a research effort to add many new features to HTML 2.0, including tables, toolbars, mathematical formulas, advanced style sheets (for defining page layout and the meaning of symbols), etc.

HTML – brief introduction

A proper Web page consists of a head and body enclosed by <HTML> and </HTML> tags (formatting commands), although most browsers do not complain if these tags are missing.

The head is bracketed by <HEAD> </HEAD> tags, and the body is bracketed by <BODY> </BODY> tags.

The commands inside the tags are called directives. Most HTML tags have this format, that is, <SOMETHING> to mark the beginning of something and </SOMETHING> to mark its end.
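
The begin/end pairing of tags is easy to observe with Python's standard html.parser module. In the sketch below, TagLister and list_tags are made-up names and PAGE is an invented sample page built from the tags mentioned above. Note that the parser reports tag names in lowercase.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Record every start tag and end tag in the order they appear."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def list_tags(html_text):
    parser = TagLister()
    parser.feed(html_text)
    parser.close()
    return parser.events


# An invented sample page: head and body enclosed by <HTML> ... </HTML>.
PAGE = ("<HTML><HEAD><TITLE>Sample</TITLE></HEAD>"
        "<BODY>Some <B>bold</B> text</BODY></HTML>")

for kind, tag in list_tags(PAGE):
    print(kind, tag)
```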

HTML – brief introduction

Numerous other examples of HTML are easily available. Most browsers have a menu item VIEW SOURCE or something similar. Selecting this item for an HTML page displays the current HTML source instead of the formatted output.

DNS – Domain Name System

Programs rarely refer to hosts, mailboxes, and other resources by their binary network addresses. Instead, they use ASCII strings, such as [email protected]

Nevertheless, the network itself only understands binary addresses, so some mechanism is required to convert the ASCII strings to network addresses.

DNS

Way back in the ARPANET, there was simply a file, hosts.txt, that listed all the hosts and their IP addresses. Every night, all the hosts would fetch it from the site at which it was maintained. For a network of a few hundred large timesharing machines, this approach worked reasonably well.

However, when thousands of workstations were connected to the net, everyone realized that this approach could not continue to work forever.

DNS

For one thing, the size of the file would become too large. However, even more important, host name conflicts would occur constantly unless names were centrally managed, something unthinkable in a huge international network.

To solve these problems, DNS (the Domain Name System) was invented.

DNS

The essence of DNS is the invention of a hierarchical, domain-based naming scheme and a distributed database system for implementing this naming scheme.

It is primarily used for mapping host names and email destinations to IP addresses.
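
The hierarchy is visible in the name itself: each dot separates one level from the next, with the most general domain on the right. The small sketch below lists the chain of domains a name belongs to, from the top level down. The function name domain_hierarchy is an assumption, and the host name is the one used in the URL examples earlier.

```python
def domain_hierarchy(name):
    """List the domains a name belongs to, most general first."""
    labels = name.rstrip(".").lower().split(".")
    # Join ever longer suffixes: "nl", "vu.nl", "cs.vu.nl", ...
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


print(domain_hierarchy("www.cs.vu.nl"))
# ['nl', 'vu.nl', 'cs.vu.nl', 'www.cs.vu.nl']
```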

DNS – how it is used

To map a name onto an IP address, an application program calls a library procedure called the resolver, passing it the name as a parameter. The resolver sends a UDP packet to a local DNS server, which then looks up the name and returns the IP address to the resolver, which then returns it to the caller.

Armed with the IP address, the program can then establish a TCP connection with the destination, or send it UDP packets.
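
In Python, the resolver is reachable through socket.getaddrinfo, so the call described above takes only a few lines. This is a sketch under assumptions: the wrapper name resolve is made up, only IPv4 addresses are requested, and how the system resolver actually contacts its DNS server (or whether it answers from a local file or cache) depends on the machine's configuration.

```python
import socket


def resolve(name):
    """Ask the system resolver for the IPv4 addresses of a host name."""
    infos = socket.getaddrinfo(name, None,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


# A numeric address needs no lookup, so this works without a network.
print(resolve("127.0.0.1"))  # ['127.0.0.1']
```

With network access, resolve("www.cs.vu.nl") would return whatever addresses that name maps to at the time of the call.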