Web Architecture I - Graz University of...
www.tugraz.at
Web Architecture I
03.12.2014
Web Architecture I
1
Outline
• Development of the Web
• Quality Requirements
• HTTP Protocol
• Web Architecture
• A Changing Web
• Web Applications and State Management
• Web n-Tier Architecture
• Web Data Management
Introduction
History of the web
• Devised 1989 to deliver static content
• Hypermedia: documents linked into a web
• Navigate by following links
• Underlying standards
• HTTP (Hyper Text Transfer Protocol)
• HTML (Hyper Text Mark-up Language)
• URL (Uniform Resource Locator)
• All underlying standards
• Simple
• Free of charge
Robert Cailliau [Wikipedia]
Tim Berners-Lee [Wikipedia]
World Wide Web vs. Internet
https://en.wikipedia.org/wiki/World_Wide_Web#mediaviewer/File:Internet_Key_Layers.png [Wikipedia]
Growth of the Web I
Growth of the Web II
• Time to reach 50 million people
• Telephone: 75 years
• Radio: 35 years
• TV: 13 years
• WWW: 4 years
Growth of the Web III
[Figure: Global ICT developments, 2001-2014*, per 100 inhabitants. Series: Mobile-cellular telephone subscriptions; Individuals using the Internet; Fixed-telephone subscriptions; Active mobile-broadband subscriptions; Fixed (wired)-broadband subscriptions. Data labels shown in the chart: 95,5; 40,4; 15,8; 32,0; 9,8. Note: * Estimate. Source: ITU World Telecommunication /ICT Indicators database]
Quality Requirements
Quality attributes I
• Usability - it must be very easy to use
• I.e. very easy to create, structure and reference information
• Participation was voluntary, so ease of use was the only way to attract users
• Very forgiving of errors in structuring and referencing, because users had a non-technical background
• Some things might look different from today’s point of view
Quality attributes II
• Technical simplicity - it must be very easy for
developers to implement
• All components simple and text-based
• E.g. the first version of HTTP: servers only need to respond to the GET method
• HTML very simple: easy to write parsers and browsers
• URLs extremely simple
Quality attributes III
• Extensibility - it must be easy to add new features
• The first versions of the components (standards) were very simple - improvements were needed
• User requirements change even in a closed environment
• At global scope, change is the only feature that does not change
• Examples:
• Users wanted a search facility in addition to browsing
• Interaction with the content
⇒ HTML forms were introduced
Quality attributes IV
• Scalability - it needs to match Internet scale
• Anarchic scalability (think about growth rate)
• The Internet is not under control of a single organization – it is totally decentralized
• Needs to continue operating under unanticipated load or when given malformed or maliciously constructed data
• Examples:
• 40,000 Google search queries every second
• https://en.wikipedia.org/wiki/List_of_most_viewed_YouTube_videos
Quality attributes V
• Anarchic scalability - consequences
• Clients cannot be expected to maintain knowledge of
all servers ⇒ Make it searchable!
• Servers cannot be expected to retain knowledge of
state across requests ⇒ Make it stateless!
• Documents cannot have back-links: the number of
references to a resource is proportional to the number
of people interested in that information (Google
PageRank)
Development of the Web
• The original Web was not designed to meet all of the requirements and quality attributes defined above
• It also lacked an architectural vision that would meet these ambitious requirements
• World Wide Web Consortium (W3C) was founded to solve these problems
• A lot of researchers worked on defining an architecture to meet these needs
• Security and encryption were not mentioned at all
Conclusion of all the quality attributes
Web Protocols - HTTP
Overview
• Content
• HTML (Hyper Text Mark-up Language)
• Identification
• URL (Uniform Resource Locator)
• Communication / information exchange
• HTTP (Hyper Text Transfer Protocol)
• Based on TCP connections, where TCP itself is based on IP
• Original design was completely stateless
HTTP Characteristics
• Text based protocol, human readable
• Request consists of:
• Method
• Number of headers (key-value pairs)
• Some methods allow a payload
• Response includes
• Status code
• Number of headers (key-value pairs)
• Depending on the request, a payload is returned
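The request structure listed above (method, headers as key-value pairs, optional payload) can be sketched in a few lines of Python. This is only an illustration: the function name `build_request` is hypothetical, and the sketch assumes HTTP/1.1 framing with CRLF line endings and a blank line between headers and payload.

```python
def build_request(method, target, headers, payload=b""):
    """Serialize a request: request line, headers, blank line, payload."""
    lines = [f"{method} {target} HTTP/1.1"]
    # Headers are plain key-value pairs, one per line.
    lines += [f"{key}: {value}" for key, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + payload


# A GET request has no payload; the text stays human readable.
raw = build_request("GET", "/webpage/index.html", {"Host": "example.com"})
print(raw.decode("ascii"))
```

Because the protocol is text based, the serialized request can be inspected directly, which is the human-readability property named on the slide.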
HTTP Examples
• Request
GET /webpage/index.html HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) …
Cookie: JSESSIONID=9C6694142332E65F0CB175BDF1758243;
• Response
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Content-Type: text/html;charset=ISO-8859-1
Content-Length: 64
Date: Wed, 03 Dec 2014 14:15:05 GMT

<content>
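A response like the one above can be taken apart with plain string handling, since it is just a status line, headers and a payload. The following is a minimal sketch, not a complete HTTP parser; `parse_response` is a hypothetical helper, and the sample body and its Content-Length are made up for the illustration.

```python
def parse_response(raw):
    """Split a raw response into status code, reason, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them.
        headers[key.strip().lower()] = value.strip()
    return int(code), reason, headers, body


raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html;charset=ISO-8859-1\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "<p>Hello</p>\n"
)
code, reason, headers, body = parse_response(raw_response)
print(code, reason, headers["content-type"])
```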
HTTP Versions
• HTTP/0.9 - released in 1991
• HTTP/1.0 - released in 1996
• Stateless, i.e. each request is done in a new TCP connection
• HTTP/1.1 - today's standard
• Reusing TCP connections can increase the throughput (keep-alive flag)
• A header specifying the content length is needed
• HTTP/2 - different drafts are already being tested
• HTTP/3 – talks have already started
Web Architecture
Deriving the Web architecture
• Introducing constraints on the Web architecture to
obtain an optimal solution to the requirements and
quality attributes
• Each constraint will have advantages and
disadvantages
• The whole design process is then a balancing
process
• Optimisation to obtain a best-match for the Web
architecture
Client-Server: Separation of concerns I
Client-Server: Separation of concerns II
Client-Server: Separation of concerns III
• Separates user-interface from data manipulation
concerns
• Supports independent evolvability
• Clients and servers can be developed independently
and across organizational boundaries
• E.g. someone uses Google Maps on their own
homepage
• Supports Internet-scale attribute
Stateless I
Stateless II
• Communication must be stateless in nature
• Each request from client must contain all the
information needed to process that request
• I.e. it cannot take advantage of session information stored on the server
• Session state is kept entirely on the client
• Possible drawback
• Information might need to be sent multiple times
• Important benefits are visibility, reliability and scalability
Stateless III
• Visibility:
• Only look at a single request to determine the full
nature of the request
• Reliability:
• It eases the task of recovering from partial failures
• Scalability:
• Server can free resources after each request
• Simplifies implementation because servers do not need
to manage information across multiple requests
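The stateless constraint can be illustrated with a handler whose answer depends only on the request it receives. This is a sketch under assumptions: `CATALOG`, `handle_request` and the `page` / `page_size` fields are hypothetical names, and paging stands in for any client-held session state.

```python
# Resource data held by the server; this is not per-client session state.
CATALOG = [f"doc-{i}" for i in range(7)]


def handle_request(request):
    """Answer a request using only the information contained in it."""
    page, size = request["page"], request["page_size"]
    start = page * size
    has_more = start + size < len(CATALOG)
    return {
        "results": CATALOG[start:start + size],
        # The client keeps this value and sends it with its next request.
        "next_page": page + 1 if has_more else None,
    }


first = handle_request({"page": 0, "page_size": 3})
second = handle_request({"page": first["next_page"], "page_size": 3})
print(first["results"], second["results"])
```

Since nothing is remembered between the two calls, either request could be answered by a different server instance, and the page size has to be sent again each time, which is the drawback of repeated information.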
Cache I
Cache II
• Information can be labeled (by servers) as
cacheable
• If a response is cacheable, then a client cache is
given the right to reuse that response data for later,
equivalent requests
• Advantage: Improves efficiency, scalability, user-perceived performance
• Disadvantage: Decreases reliability if the cached data no longer matches the data on the server
• Midway: ask the server whether the data has changed
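The "midway" option can be sketched as a simulation of revalidation: the client sends a validator with its request and the server answers either with the full data or with "not modified". HTTP does this with the ETag and If-None-Match headers and the 304 status code; the classes `OriginServer` and `ClientCache` below are hypothetical and only model the exchange in memory.

```python
import hashlib


class OriginServer:
    def __init__(self, body):
        self.body = body
        self.full_responses = 0  # counts how often the payload was sent

    def etag(self):
        # A validator derived from the current content.
        return hashlib.sha256(self.body.encode()).hexdigest()[:16]

    def get(self, if_none_match=None):
        if if_none_match == self.etag():
            return 304, self.etag(), None  # not modified, no payload
        self.full_responses += 1
        return 200, self.etag(), self.body


class ClientCache:
    def __init__(self, server):
        self.server = server
        self.etag = None
        self.body = None

    def fetch(self):
        status, etag, body = self.server.get(if_none_match=self.etag)
        if status == 200:  # new or changed data: refresh the cached copy
            self.etag, self.body = etag, body
        return self.body


server = OriginServer("version 1")
cache = ClientCache(server)
first = cache.fetch()       # full response
second = cache.fetch()      # revalidated, served from the cache
server.body = "version 2"
third = cache.fetch()       # content changed, full response again
```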
Uniform interface I
Uniform interface II
• Uniform interface between components
• Advantages:
• Visibility of interactions is improved
• Simplifies the overall architecture
• Decouples implementations from the services
• Improves Internet-scale
• Disadvantages:
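• Degrades efficiency, because information is transferred in a standardized form rather than one tailored to the application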
Uniform interface III
• Prerequisites for a uniform interface
• Unambiguous Identification of resources (URL)
• Manipulation of resources through representations
• In the beginning: HTML
• Later: Extensible Markup Language (XML - still
widely used)
• Now: JavaScript Object Notation (JSON)
• Self-descriptive messages
• HTTP Methods describe the action (GET, POST, PUT,
DELETE)
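The prerequisites above can be sketched as a tiny in-memory resource store: resources are identified by URL, manipulated through JSON representations, and the method alone describes the action. `ResourceStore` and the example URL are hypothetical; POST is left out to keep the sketch short, and the status codes follow common HTTP usage.

```python
import json


class ResourceStore:
    def __init__(self):
        self.resources = {}  # URL path -> resource data

    def handle(self, method, url, representation=None):
        """Apply a method to the resource at url; return (status, body)."""
        if method == "GET":
            if url not in self.resources:
                return 404, None
            return 200, json.dumps(self.resources[url])
        if method == "PUT":
            created = url not in self.resources
            self.resources[url] = json.loads(representation)
            return (201 if created else 200), None
        if method == "DELETE":
            if url not in self.resources:
                return 404, None
            del self.resources[url]
            return 204, None
        return 405, None  # method not supported in this sketch


store = ResourceStore()
store.handle("PUT", "/books/1", '{"title": "REST"}')
status, body = store.handle("GET", "/books/1")
print(status, body)
```

Every resource is handled by the same few methods, so a client does not need resource-specific operations to interact with it.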
Layered system I
https://upload.wikimedia.org/wikipedia/commons/c/c4/IP_stack_connections.svg [Wikipedia]
Layered system II
• Improves Internet-scale
• Application composed of layers that are only aware
of the neighbouring components not the complete
system
• Bounds complexity and promotes independence
between components
• Each layer
• Uses the service of the underlying layer
• Provides a service to the layer above
• Communicates with peer layers in the neighbouring components
Layered system III
• Supports scalability by introduction of proxies,
shared-caches, gateways
• E.g. load-balancing behind a gateway
• Can reduce user-perceived performance because the additional layers add processing overhead
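Load balancing behind a gateway can be sketched as follows. The client only talks to the gateway and is not aware of the servers behind it. `Backend` and `Gateway` are hypothetical classes, and round-robin is just one possible balancing strategy chosen for the illustration.

```python
from itertools import cycle


class Backend:
    def __init__(self, name):
        self.name = name
        self.handled = 0

    def handle(self, path):
        self.handled += 1
        return f"{self.name} served {path}"


class Gateway:
    """The only layer the client sees; it forwards to hidden backends."""

    def __init__(self, backends):
        self._next_backend = cycle(backends)  # round-robin order

    def handle(self, path):
        # The extra hop is the processing overhead of the added layer.
        return next(self._next_backend).handle(path)


backends = [Backend("server-a"), Backend("server-b")]
gateway = Gateway(backends)
responses = [gateway.handle("/index.html") for _ in range(4)]
print(responses)
```

More backends can be added behind the gateway without any change on the client side.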
Code on demand I
Code on demand II
• Client functionality extension by downloading code
• Advantages:
• Improves extensibility
• Independent development
• Be aware of security concerns!
• Technologies
• JavaScript (by far most important)
• Flash (is losing ground fast)
• Java applets (already dead)
• Microsoft Silverlight (was that ever used?)
A Changing Web
The Web evolved as a platform I
• The Web evolved as a platform
• Started out with simple Homepages with static
documents (1990s)
• Developed into more and more interactivity (2000s)
• Now the Web is a complex system of different types, applications and services
• Two faces of the Web nowadays
• The Web as an application platform
• The Web as a huge distributed database
The Web evolved as a platform II
Web Applications and State Management
What are the issues when building Web applications?
• User requirements
• User interface and usability
• Application state (manage state) and hypertext
(navigate)
• Addressability
• Architecture
• Scalability
• Performance
• Fast development cycles
Traditional Stack of Web-Applications
Example: Apache Tomcat
Stack, from top to bottom:
• Web Applications 1, 2, 3: application logic; answers the requests; manages the sessions; “Servlets” packaged in a war-file
• Web Application Server: support for session handling; servlet container; “Catalina” part of Tomcat
• Web Server: stateless connection handling (HTTP); “Coyote” part of Tomcat
• Operating System
• Virtual Machine / Hardware
Modern Stack of Web-Applications
Example: Dropwizard
• Each Web application is an application on OS level
• Web server functionality is provided by a library / framework (i.e. part of Dropwizard)
• Result is one complete Java application as a jar-file
• Session handling is done by the application (with library / framework)
• Solves scaling issues of the traditional stack
• Provides better isolation of applications
Stack, from top to bottom:
• Web Application 1, Web Application 2, Web Application 2, each with its own Web Server
• Operating System (OS)
• Virtual Machine / Hardware
Session Tracking
• HTTP is stateless
• Sessions are tracked by unique identifiers (Session-Id)
• Session-Ids are transmitted from and to the server
• As part of the URL (URL rewriting, permalink)
• In the HTTP Header (Cookies)
• Sessions must either be tracked by
• Application
• Client
• Both?
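Session tracking with a Session-Id transmitted in a cookie can be sketched with the Python standard library. This is an illustration only: the cookie name `SESSIONID`, the in-memory `SESSIONS` store and the `handle` function are assumptions, and a real application would also set cookie attributes and expire sessions.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # server-side session store: Session-Id -> session data


def handle(cookie_header):
    """Process one request; return (Set-Cookie value or None, visit count)."""
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("SESSIONID")
    session_id = morsel.value if morsel is not None else None
    set_cookie = None
    if session_id not in SESSIONS:
        # Unknown or missing id: start a new session with a random identifier.
        session_id = secrets.token_hex(16)
        SESSIONS[session_id] = {"visits": 0}
        set_cookie = f"SESSIONID={session_id}"
    SESSIONS[session_id]["visits"] += 1
    return set_cookie, SESSIONS[session_id]["visits"]


set_cookie, visits = handle(None)        # first request: no cookie yet
_, visits_again = handle(set_cookie)     # client sends the cookie back
print(visits, visits_again)
```

HTTP itself stays stateless here; the identifier travelling in the header is what ties the two requests together.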
Session Tracking on the server
• Cookies or URL rewriting can be used
• Web server provides only low-level tracking
• I.e. it provides the framework for session tracking, not the full logic
• Application server has other responsibilities as well
• Can lead to serious scalability problems
• Load balancing between servers becomes complicated
• Handover from one server to another within one session gets difficult or even impossible
Session Tracking on the client
• URL rewriting can be used
• Transfer parts of the application logic into the client
(Code on demand)
• Manage it there with AJAX (Asynchronous JavaScript
and XML)
• But other problems arise
• How to recover state in a new session, given that AJAX applications typically have a single URL?
• How to recover a previous state, i.e. the browser back button problem?
Session Tracking on the client and server
• The optimal solution is typically somewhere in the
middle:
• Manage only important states on the server
• Give each state its own URL
• Use linking to relate states to each other
• No management of the state on the server: no
scalability problems
• No management of the state on the client: no
recovery problems
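Giving each state its own URL can be sketched as a round trip between a state dictionary and a query string. The helper names, the base URL and the `category` / `page` parameters are hypothetical; the sketch assumes that the important state is small enough to fit into the URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def state_to_url(base, state):
    """Encode an application state as a URL (sorted for stable links)."""
    return f"{base}?{urlencode(sorted(state.items()))}"


def url_to_state(url):
    """Recover the application state from a URL."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}


base = "https://example.com/catalog"
state = {"category": "books", "page": "2"}
url = state_to_url(base, state)
# Linking relates states to each other: the next page is just another URL.
next_url = state_to_url(base, {**state, "page": "3"})
print(url, next_url)
```

A state encoded this way can be recovered in a new session and reached again with the back button, because the URL is the only thing that has to be remembered.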
Session Management - URL Rewriting
• Advantages
• Meaningful, readable, easier for humans
• URLs can be bookmarked and shared with others
• Search engines can retrieve different parts and index them
• Advantages for service integration, as you might link services to each other
• Make different content representations addressable (HTML for humans, XML or JSON for services)
• Disadvantages
• Links can get too long; browser limits are usually 2048 or 4096 characters
Session Management - Cookies
• Advantages
• Can store more data
• Limit depends on browser (4kB to 10MB per domain)
• Short URLs are kept
• Disadvantages
• Might be difficult for the user to grasp, as nothing is seen
• Legal issues
• Must be used with care; use URL rewriting whenever possible
Session Management - Example
• Google Maps uses AJAX to maintain a permalink
• Any action that you execute changes the permalink
• The permalink is kept as a part of HTML
• This is the equivalent of the address bar
Session Management - Example
Session Management - Example
• A little bit of extra DOM/JavaScript work keeps the
Permalink up to date as you navigate
• Every point on the map is a separate application
state that has its own URL
• Application states were destroyed by AJAX but were put back by application design
• It allowed communities to grow around the Google
Maps application
• Only because of proper management of application
states with URLs
Web n-Tier Architecture
Starting point - 2-layer applications
• Everything runs on the server
• One and the same script implements the application logic and the presentation (e.g. generating HTML)
• Application / Presentation
• Scripts (e.g. PHP)
• Data Management
• Relational database
[Diagram: Application / Presentation layer and Data management layer]
Problems of 2-layer applications
• Mixture of application and presentation related
functionality
• Changes in application logic lead to changes in
presentation functionality and vice versa
• E.g. changing a table that presents some application data leads to changes in the return values of some application-specific functions
• Even more dangerous: the presentation layer talks directly to the database via a data manipulation language (DML)
⇒ Better modularity is achieved with the third layer
Evolution - 3-layer applications
• Separation between Application and Presentation
layer
• No direct connection between Presentation and Data
Management
• Decoupling of Application and Presentation layer
• Possibility to exchange Presentation layers
• Example:
• Making a Web gateway to an existing application
• Old GUI (e.g. a standalone GUI) is replaced with a Web
GUI
3-layer applications - Architecture
• Presentation tier
• HTML, templates and
scripts to generate HTML
• Application logic tier
• actual application, the
business logic
• Data access tier
• manages persistent
application data
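The three tiers can be sketched as three pieces of code that only talk to their direct neighbour. All names (`TaskStore`, `TaskService`, `render_tasks`) are hypothetical, and the in-memory list is only a stand-in for the relational database used in a real deployment.

```python
from html import escape


# Data access tier: manages persistent application data.
class TaskStore:
    def __init__(self):
        self._rows = []  # stand-in for a relational database table

    def insert(self, title, done):
        self._rows.append({"title": title, "done": done})

    def all(self):
        return list(self._rows)


# Application logic tier: the business rules, no HTML and no storage details.
class TaskService:
    def __init__(self, store):
        self._store = store

    def add_task(self, title):
        title = title.strip()
        if not title:
            raise ValueError("a task needs a title")
        self._store.insert(title, done=False)

    def open_tasks(self):
        return [row["title"] for row in self._store.all() if not row["done"]]


# Presentation tier: generates HTML and talks only to the application tier.
def render_tasks(service):
    items = "".join(f"<li>{escape(title)}</li>" for title in service.open_tasks())
    return f"<ul>{items}</ul>"


service = TaskService(TaskStore())
service.add_task("Read <REST> chapter")
page = render_tasks(service)
print(page)
```

Because the presentation function never touches the store, it could be replaced by a different presentation (e.g. one producing JSON) without changing the other two tiers.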
[Diagram: User Interface, Process Logic, Data management]
3-layer applications - Surroundings
• User interacts via the Web browser
• All work is done in the Web application
• Provide GUI
• Do the actual logic
• Load & store data
• Persistence backend realised with a relational database
[Diagram: Web Browser, connected via HTTP to the Web Application (User Interface, Process Logic, Data management), connected via SQL to the Database]
3-layer applications - Client-side / Browser inclusion I
• With the introduction of AJAX, there are different possibilities for where to place the tiers
• E.g. presentation in the browser: HTML + (presentation) JavaScript; application and data access on the server
• E.g. presentation and application in the browser: HTML + (presentation and application) JavaScript; data access on the server
• Note: May require additional security considerations (if the application logic runs on the client)
3-layer applications - Client-side / Browser inclusion II
[Diagram: two variants of the stack Web Browser, HTTP, Web Application (User Interface, Process Logic, Data management), SQL, Database]
3-layer applications - Model-View-Controller
• There are numerous architecture variants built on top of N-tier architectures
• In traditional software engineering, user-oriented database applications are built with an N-tier architecture
• The most important for Web applications: Model-View-Controller architecture
• It was invented in the early days of GUIs
• To decouple the graphical interface from the application
data and logic
• Very useful also for Web applications
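The decoupling that Model-View-Controller aims for can be sketched with a counter. The names `CounterModel`, `counter_view` and `CounterController` are hypothetical, and the request is reduced to a method and a parameter dictionary for the illustration.

```python
class CounterModel:
    """Model: application data and logic, knows nothing about HTML."""

    def __init__(self):
        self.value = 0

    def increment(self, amount):
        self.value += amount


def counter_view(model):
    """View: turns the model into a representation for the user."""
    return f"<p>Count: {model.value}</p>"


class CounterController:
    """Controller: translates incoming requests into model updates."""

    def __init__(self, model):
        self.model = model

    def handle(self, method, params):
        if method == "POST":
            self.model.increment(int(params.get("amount", 1)))
        return counter_view(self.model)


controller = CounterController(CounterModel())
before = controller.handle("GET", {})
after = controller.handle("POST", {"amount": "5"})
print(before, after)
```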
3-layer applications - Current state
[Diagram: Client and Other Web Apps. access the Web Application (User Interface, Static Content, Process Logic, Data management) via HTML, JSON/XML over HTTP; the Web Application uses a Database via SQL, a Database via NoSQL, and another Web Application via HTTP]
• Browsers combine static content (HTML) with dynamic data
• Other Web applications only use the dynamic data
• Web application provides different endpoints for static and dynamic content
• Combination of existing DBs/services with new ones
Web Data Management
Data Backbone
• Often Web applications deal with relational
databases
• Need to manage relational data in object-oriented
applications
• Use design patterns like Data Access Object (DAO)
• Use object/relational mapping (ORM), like Hibernate
framework or Java Persistence API (JPA)
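The Data Access Object pattern can be sketched with the sqlite3 module from the Python standard library: the rest of the application works with objects and never sees SQL. The `Book` class, the `BookDao` class, the table layout and the sample values are all hypothetical; the slides name Hibernate and JPA as the Java options, which this sketch does not use.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Book:
    isbn: str
    title: str


class BookDao:
    """Hides the relational schema behind an object-oriented interface."""

    def __init__(self, connection):
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL)"
        )

    def save(self, book):
        self._db.execute(
            "INSERT OR REPLACE INTO book (isbn, title) VALUES (?, ?)",
            (book.isbn, book.title),
        )

    def find(self, isbn):
        row = self._db.execute(
            "SELECT isbn, title FROM book WHERE isbn = ?", (isbn,)
        ).fetchone()
        # Map the relational row back to an object, or None if absent.
        return Book(*row) if row else None


dao = BookDao(sqlite3.connect(":memory:"))
dao.save(Book("isbn-1", "Web Architecture"))
found = dao.find("isbn-1")
print(found)
```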
Web as a database
• The Web we use is full of data
• Book information, opinions, prices, arrival times, blogs,
tags, tweets, etc.
• The data is organized around a simple data model:
node-link model
• Each node is a data item that has a unique address
and a representation
• Representation formats are e.g. HTML, PDF,... for
humans, or e.g. XML, JSON for programs
• Nodes can be interlinked using their unique
addresses
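The node-link model can be sketched as a dictionary keyed by address, where each node carries a representation and links to other addresses. The `WEB` data and the `crawl` function are hypothetical; the traversal shows that following links from one node is enough to discover the others.

```python
from collections import deque

# Each node has a unique address, some data and links to other nodes.
WEB = {
    "http://example.org/books/1": {
        "data": {"title": "Book A"},
        "links": ["http://example.org/authors/7"],
    },
    "http://example.org/authors/7": {
        "data": {"name": "Author B"},
        "links": ["http://example.org/books/1", "http://example.org/publishers/3"],
    },
    "http://example.org/publishers/3": {
        "data": {"name": "Publisher C"},
        "links": [],
    },
}


def crawl(start):
    """Return all addresses reachable from start, in breadth-first order."""
    seen, queue = [start], deque([start])
    while queue:
        for target in WEB[queue.popleft()]["links"]:
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


reachable = crawl("http://example.org/books/1")
print(reachable)
```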
Information retrieval
• How to find what I’m looking for (again)?
• The mainstream approach is search engines with full-text processing
• Other approaches analyze links
• Links in databases, or within/between
documents/sites
• Mixed approach: full-text and links, e.g. Google
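Link analysis can be illustrated with a simplified PageRank-style iteration: a page ranks higher when many (highly ranked) pages link to it. This is a sketch under assumptions, not Google's actual algorithm; the `pagerank` function, the damping factor of 0.85, the iteration count and the four-page link graph are chosen for the illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively distribute rank along links; the ranks always sum to 1."""
    pages = list(links)
    rank = {page: 1 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1 - damping) / len(pages) for page in pages}
        for page, targets in links.items():
            # A page without out-links spreads its rank over all pages.
            receivers = targets or pages
            share = damping * rank[page] / len(receivers)
            for target in receivers:
                new_rank[target] += share
        rank = new_rank
    return rank


# Pages a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
print(best)
```

A mixed approach would combine such a link-based score with a full-text match score for the query.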
Managing Metadata
• Metadata is data about other data, often semi-structured
• On the web
• Tag information items (everything that you can access
via URL) in a structured manner
• Social Web 2.0 applications, e.g. http://del.icio.us or http://www.flickr.com
• Semantic annotation of Web content (Microformats)
⇒ Search inside metadata