Web Servers Guntis Bārzdiņš Artūrs Lavrenovs Normunds Grūzītis.

Post on 19-Jan-2016


Web Servers

Guntis Bārzdiņš, Artūrs Lavrenovs, Normunds Grūzītis

What a basic web server does


● Implements the HTTP protocol

● Listens for HTTP requests from clients (e.g. browsers)

● Tries to fulfill them with static content from the file system

– A web server itself serves only static files

● Receives content from clients (e.g. via HTML forms, incl. uploading of files)

● Forwards dynamic content requests for external execution

● Does other useful tasks via extension modules

Web server market share

[Figure: web server market share over time (Jun 2015). Events marked on the chart:]

F: Apache 1.1, modules supported
H: Apache supports HTTP/1.1 virtual hosting
I: Microsoft IIS/4.0 and Active Server Pages
M: Apache 2.0
Q: Microsoft .NET framework
N, O, R: Code Red worm, Nimda worm, SQL Slammer worm
V: Google App Engine
W: Microsoft Hyper-V

Apache

Has consistently been the most popular server

Highly configurable and extensible (compiled modules)

Runs on many operating systems (primarily, on Unix)

SSL / TLS support

Supports various authentication schemes

Flexible URL rewriting and aliasing

Virtual Hosts

Custom log files, etc.

Apache modules

mod_access Access control based on client hostname or IP address

mod_alias Mapping different parts of the host filesystem into the document tree, and URL redirection

mod_auth_xxx Various user authentication approaches (file, dbm, form, etc.)

mod_autoindex Automatic directory listings

mod_cgi Execution of CGI scripts

Apache modules

mod_include Server-parsed documents (SSI)

mod_mime Determining document types using file extensions

mod_proxy Caching proxy abilities

mod_rewrite Powerful URI-to-filename mapping using regular expressions

mod_usertrack User tracking using Cookies

Apache modules

mod_ssl Provides strong cryptography via the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols with the help of the open-source SSL/TLS toolkit OpenSSL

Available since Apache 1.3 (1998)

Latest version: Apache 2.4 (since 2012)

Private and public keys; certificate authorities such as Thawte (thawte.com) and Verisign (verisign.com)

Apache modules

Third-party modules for server-side scripting:

mod_php Executes PHP within Apache

mod_python Executes Python within Apache

mod_ruby Executes Ruby within Apache

mod_jk Connects Tomcat with Apache

etc.

Compiling and installing Apache

./configure --enable-layout=Debian

Use Debian style directory layout

--enable-suexec Allows you to set the uid and gid for spawned processes (CGI, SSI)

--enable-MODULE=shared Compiles, installs and adds the module as .so

--disable-MODULE Some modules are compiled by default (e.g. autoindex, cgi) and have to be disabled explicitly

Compare with installing a packaged module, e.g. apt-get install <module>

Apache directory layout

Debian

/etc/init.d/apache2

Apache control script

/etc/apache2/

Apache configuration files

/var/www/

Default Document Root

/usr/lib/cgi-bin/

Default directory for scripts

/var/log/apache2/

Log files (access.log, error.log)

/usr/bin/

htpasswd, htdigest, htdbm

/usr/lib/apache2/modules/

Apache modules

/usr/lib/apache2/suexec

CGI wrapper

Apache access log

LogFormat "%v %h %l %u %t \"%r\" %>s %b" common
CustomLog /usr/local/apache/logs/access_log common

%v – virtual host
%h – remote host
%l – remote logname (from identd, usually "-")
%u – user
%t – time
%r – HTTP request line
%>s – status code
%b – response size in bytes

www.atlants.lv 159.148.85.46 - - [21/Nov/2004:17:23:36 +0200] "GET /index.php?m=5 HTTP/1.1" 200 32257

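As an illustration of how the fields of this log format line up, here is a minimal Python sketch that parses one access-log line. The regular expression and the function name `parse_access_line` are assumptions made for this example; they mirror the `%v %h %l %u %t "%r" %>s %b` layout shown above and are not part of Apache.

```python
import re
from datetime import datetime

# Hypothetical pattern mirroring the format: %v %h %l %u %t "%r" %>s %b
LOG_LINE = re.compile(
    r'(?P<vhost>\S+) (?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_access_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # %t looks like 21/Nov/2004:17:23:36 +0200
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # %b is "-" when no body was sent
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

sample = ('www.atlants.lv 159.148.85.46 - - [21/Nov/2004:17:23:36 +0200] '
          '"GET /index.php?m=5 HTTP/1.1" 200 32257')
entry = parse_access_line(sample)
print(entry["vhost"], entry["status"], entry["size"])
```

A script like this is a typical starting point for counting requests per virtual host or per status code.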
Apache error log

ErrorLog /usr/local/apache/logs/error_log
LogLevel warn

[Sun Nov 21 09:13:42 2004] [error] PHP Fatal error: Call to undefined function PN_DBMsgError() in /home/msaule/public_html/referer.php on line 85

[Sun Nov 21 12:41:09 2004] [error] [client 81.198.145.117] File does not exist: /home/sms/public_html/favicon.ico


[Sun Nov 21 13:02:50 2004] [error] [client 66.249.66.173] File does not exist: /home/code/public_html/robots.txt

[Sun Nov 21 13:08:26 2004] [error] [client 81.198.176.114] File does not exist: /home/refuser2/public_html/_vti_bin/owssvr.dll

[Sun Nov 21 13:08:26 2004] [error] [client 81.198.176.114] File does not exist: /home/refuser2/public_html/MSOffice/cltreq.asp

Configuring Apache

Edit httpd.conf

Check configuration: apachectl configtest

Restart Apache

Test changes

http://httpd.apache.org/docs/

Virtual hosts

<VirtualHost *>

ServerName www.jrt.lv

ServerAlias www.jrt.com

CustomLog /usr/local/apache/logs/jrt_access_log common

ErrorLog /usr/local/apache/logs/jrt_error_log

DocumentRoot /home/jrt/public_html

</VirtualHost>

Configuring Apache

.htaccess (directory-level, read on every request)

AuthType Basic

AuthUserFile /home/someuser/passwd

AuthName "Admin"

require valid-user

htpasswd

htpasswd -c <password file> <username>

user1:Y90u499mUj6xE

user2:DOrWgcNwzaQUQ
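To make `AuthType Basic` less abstract, the following Python sketch shows what a client sends once the browser has asked for the password: the user name and password joined by a colon and base64-encoded in the `Authorization` header. The helper names are hypothetical. The server decodes the header and compares the password against the hashed entry in the password file; that check is not shown here.

```python
import base64

def basic_auth_header(username, password):
    """Build the value of the Authorization header for HTTP Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def decode_basic_auth(header_value):
    """Recover (username, password) from an Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password

header = basic_auth_header("Aladdin", "open sesame")
print(header)
print(decode_basic_auth(header))
```

Base64 is an encoding, not encryption, so Basic authentication should only be used over SSL/TLS.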

Dynamic content

[Diagram: Browser ↔ Web Server (static HTML, PNG, CSS, ...) ↔ Script Engine (PHP, Python, ...) ↔ Database Server (MySQL, ...)]

LAMP

● Linux - Apache - MySQL - PHP
● The most common web server stack
● Simple to install and configure
● Simple to develop web applications
● Acceptable performance and security

● apt-get install apache2 mysql-server php5 libapache2-mod-php5

MySQL

● Unix distributions are moving towards MariaDB after the acquisition of MySQL by Oracle
– MariaDB is a MySQL fork led by the original developers of MySQL

● Fast relational DB implementation
● Fairly easy to use (for app developers)
● Different storage engines
– With/without transactions, memory based, etc.

● Query caching
● User quotas

PHP

● One of the most popular programming languages for web applications

● Easy to learn (though this invites bad coding practices)
● Interpreted language
● Functions from Unix libraries and tools
● Huge number of ready-made applications, libraries and modules

Simple web app

● Create a database
● Using the MySQL command prompt accessed by

– $ mysql -u root -p
– > CREATE DATABASE `example` COLLATE 'utf8_general_ci';
– > CREATE TABLE `posts` (...)
– > CREATE USER 'example'@'localhost' IDENTIFIED BY PASSWORD '...'
– > GRANT ... ON `example`.* TO 'example'@'localhost';
– > INSERT INTO `posts` (`title`,`info`) VALUES ('a','a');

● Or be lazy and use a web interface like phpMyAdmin or Adminer

– Download the single file adminer.php

– Drop it into /var/www/

– Navigate your browser to http://localhost/adminer.php

– Do all the tasks in the browser without really knowing SQL

Simple web app

● Create the file example.php in /var/www/
● Write your HTML with PHP code inside

– Connect to the database

– Select data

– Show data

● Your simple web site is ready
● Navigate your browser to http://localhost/example.php
● Enjoy the result
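The connect / select / show steps can be sketched in a few lines. The slides use PHP with MySQL; the sketch below swaps in Python and the standard-library `sqlite3` module with an in-memory database so that it runs without any server. The `posts` table with `title` and `info` columns comes from the slides; the function name `render_posts` is made up for this example.

```python
import sqlite3
from html import escape

def render_posts(conn):
    """Select all posts and render them as a small HTML page."""
    rows = conn.execute("SELECT title, info FROM posts").fetchall()
    items = "".join(
        f"<li><b>{escape(title)}</b>: {escape(info)}</li>" for title, info in rows
    )
    return f"<html><body><ul>{items}</ul></body></html>"

# Connect (in-memory SQLite stands in for the MySQL `example` database)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (title TEXT, info TEXT)")
conn.execute("INSERT INTO posts (title, info) VALUES (?, ?)", ("a", "a"))

# Select and show
page = render_posts(conn)
print(page)
```

Escaping the values before embedding them in HTML is one of the habits that is easy to skip in a quick script and costly to skip in production.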

Simple web app

[Screenshot: page rendered from http://localhost/example.php]

Dynamic content

Web servers cannot create dynamic content by themselves

Two options for serving dynamic content:

[Apache] modules

CGI / SSI, FastCGI, SCGI, WSGI, ...

Potentially many programming languages:

PHP, Perl, Python, Java, ...

C, C++, shell scripts, ...

CGI - Common Gateway Interface

● A standard environment for web servers to interface with external executable programs
– Any script or binary executable

● For each request, the web server defines a set of environment variables derived from the request and the server configuration

● The web server starts the external program in the prepared environment
– No additional libraries required

● Sends the request body (POST data) on standard input; GET parameters are passed in QUERY_STRING

● Waits for the standard output of the executed program and returns it to the client
– With additional HTTP headers

CGI environment variables

● REQUEST_METHOD: name of the HTTP method

● PATH_INFO: path suffix, if appended to the URL after the program name and a slash

● PATH_TRANSLATED: corresponding full path as assumed by the server, if PATH_INFO is present

● SCRIPT_NAME: relative path to the program, like /cgi-bin/script.cgi

● QUERY_STRING: the part of the URL after the ? character (GET)

● REMOTE_HOST: host name of the client

● REMOTE_ADDR: IP address of the client (dot-decimal)

● Variables passed by the user agent (HTTP_ACCEPT, HTTP_ACCEPT_LANGUAGE, HTTP_USER_AGENT, HTTP_COOKIE and possibly others) contain the values of the corresponding HTTP headers

● And a few more

CGI example

#!/bin/bash

echo "Content-type: text/plain"

echo ""

echo "Hello world!"

echo "Today is:" `date`
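For comparison with the shell script, here is a sketch of a CGI program in Python that actually uses the request data: GET parameters from `QUERY_STRING`, POST data from standard input. The function name `handle_cgi` and the `name` parameter are assumptions for this example. `CONTENT_LENGTH` is a standard CGI variable; reading that many characters from a text stream is a simplification that is only exact for ASCII bodies.

```python
import os
import sys
from urllib.parse import parse_qs

def handle_cgi(environ, stdin):
    """Build a CGI response (headers, blank line, body) from the environment."""
    method = environ.get("REQUEST_METHOD", "GET")
    params = parse_qs(environ.get("QUERY_STRING", ""))
    if method == "POST":
        # The request body arrives on standard input
        length = int(environ.get("CONTENT_LENGTH") or 0)
        params.update(parse_qs(stdin.read(length)))
    name = params.get("name", ["world"])[0]
    return "Content-Type: text/plain\n\n" + f"Hello {name}! ({method})\n"

if __name__ == "__main__":
    # When run by the web server, the real environment and stdin are used
    sys.stdout.write(handle_cgi(os.environ, sys.stdin))
```

Keeping the logic in a function that takes the environment and input stream as arguments makes the script testable without a web server.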

SSI – Server Side Includes

• Directives in HTML pages that are evaluated by the server while the pages are being served

• Without having to serve the entire page via a CGI program

• Configure httpd.conf or .htaccess: Options +Includes

• Two ways to tell Apache which files should be parsed:

• Parse any file with a particular file extension:

• AddType text/html .shtml

• AddOutputFilter INCLUDES .shtml

• Parse files if they have the execute bit set:

• XBitHack on

• For existing files: chmod instead of changing the file name

SSI – Server Side Includes

• <!--#echo var="DATE_LOCAL" -->

• <!--#flastmod file="index.html" -->

• <!--#include virtual="/footer.html" -->

• <!--#include virtual="/cgi-bin/counter.pl" -->

• <!--#exec cmd="ls" -->

• Setting variables

• Conditional expressions

• A simple but Turing complete programming language

• Loops can be implemented via recursive redirects

CGI issues

● Each request forks a new process: a big overhead for process creation and destruction

● All scripts must be interpreted on each request: another overhead
– May be reduced by using compiled CGI programs

● Not scalable
● Not suitable for the needs of modern web servers
● Still widely used in embedded systems (e.g. WiFi router web management consoles) that handle only occasional requests

FastCGI

● One or more persistent processes started (pre-forked)
● Web server communicates over sockets or TCP
● Each process serves many requests
● Performance comparable to modules
● Facilitates reuse of resources (DB connections, in-memory caching, etc.)
● Separation of web server and dynamic content system
● Scalability – deploy processes across a server farm
● apt-get install libapache2-mod-fastcgi php5-fpm

Other communication methods

● Integrate the dynamic content generation system with the web server process (Apache modules)

● CGI derivatives
– Simple Common Gateway Interface (SCGI): similar to FastCGI but designed to be easier to implement

● *SGI (web-server gateway interfaces) implement a programming-language-specific method of communication between the web server and applications
– WSGI – Python, PSGI – Perl, Rack – Ruby

● Proxy requests to applications that implement communication via HTTP
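As a concrete instance of a language-specific gateway interface, a WSGI application is just a callable that takes the request environment and a `start_response` function. The sketch below defines one and calls it in-process; `call_app` is a hypothetical helper written for this example, while `setup_testing_defaults` comes from the standard-library `wsgiref` package.

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    """A minimal WSGI application: echoes the requested path."""
    body = f"Hello from {environ.get('PATH_INFO', '/')}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

def call_app(app, path):
    """Hypothetical helper: invoke a WSGI app without a server."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body

status, body = call_app(application, "/demo")
print(status, body)
```

The same `application` object can be handed to any WSGI server; for local experiments the standard library offers `wsgiref.simple_server.make_server`.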

C10K problem

● Dan Kegel, 1999
● Web servers should handle 10,000 clients simultaneously (not the same as 10K requests)
● Operating system kernel limitations
● Functionality provided by the operating system
● Web server design flaws

C10K – OS kernel

● The open-source nature of Unix kernels allowed C10K bottlenecks to be identified and fixed quickly

● Networking-related algorithms and data structures in Unix kernels were originally implemented with O(n), O(n^2), ... complexity, which was reduced to O(1) or O(n)

● As a result, the networking capabilities of Unix kernels are virtually limitless (limited only by hardware resources)

C10K – OS functionality

● New scalable I/O event notification mechanisms were implemented (epoll – Linux, kqueue – *BSD)

– Better performance than traditional poll/select, e.g. on a large number of file descriptors

– Can receive all pending events using one system call

● AIO – the POSIX asynchronous I/O (AIO) interface – allows applications to initiate one or more I/O operations that are performed asynchronously (i.e., in the background)

● The application can choose to be notified of completion of the I/O operation in a variety of ways: by delivery of a signal, by instantiation of a thread, or no notification at all
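The point about receiving all pending events with one call can be tried out from Python, whose `selectors.DefaultSelector` picks epoll on Linux and kqueue on the BSDs. This is a small sketch, not a server: local socket pairs stand in for client connections, and the payloads are made up.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on *BSD/macOS

# Stand-ins for three client connections: local socket pairs
pairs = [socket.socketpair() for _ in range(3)]
for i, (server_side, _client_side) in enumerate(pairs):
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ, data=i)

# Two of the three "clients" send something
pairs[0][1].sendall(b"hello")
pairs[2][1].sendall(b"world")

# A single select() call reports every connection that is ready to read
received = {}
for key, _mask in sel.select(timeout=1):
    received[key.data] = key.fileobj.recv(1024)

print(sorted(received.items()))

for server_side, client_side in pairs:
    sel.unregister(server_side)
    server_side.close()
    client_side.close()
sel.close()
```

The idle connection costs nothing in the loop: only descriptors with pending events are returned, which is what makes this approach scale to many mostly idle clients.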

C10K – web server design

● Non-blocking I/O for networking and disk

– Don't block waiting for an action to complete; serve other requests and wait for notifications about I/O completion

● Many threads

– Use all available CPU cores to achieve maximum concurrency, avoid locking data structures

● Each thread serves many requests

– Don't create a thread per request; reuse threads, and while one non-blocking action completes, process other requests
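The effect of "each thread serves many requests" can be simulated with Python's `asyncio`, where a single thread interleaves many pending operations. In this sketch `asyncio.sleep` is a stand-in for a non-blocking network or disk operation, and the request count and delay are arbitrary choices.

```python
import asyncio
import time

async def handle_request(request_id):
    """Pretend to wait on I/O without blocking the thread."""
    await asyncio.sleep(0.1)  # stand-in for a non-blocking I/O operation
    return request_id

async def serve(n):
    # All n requests are in flight at once on a single thread
    return await asyncio.gather(*(handle_request(i) for i in range(n)))

start = time.perf_counter()
results = asyncio.run(serve(1000))
elapsed = time.perf_counter() - start
print(f"{len(results)} requests in {elapsed:.2f}s on one thread")
```

Handled one after another with blocking waits, the same 1000 requests would take about 100 seconds; here the waits overlap, so the total stays close to a single wait.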

C10M problem

● 10 million concurrent connections per server
● Doubling the CPU speed does not double the number of open connections
● Current Unix kernels can't handle that

– Application thread locks in the kernel
– Hardware drivers (NIC)
– Memory management

● Solution: a new generation of high-load Unix kernels

– 1 main application per server
– Minimize the number of system calls
– Minimize kernel work

nginx

• A C10K web server

● Apache implements a thread-per-connection model

● nginx does not create a new process/thread per connection (does not use the thread scheduler as a packet scheduler)
– Typically, one single-threaded worker process per CPU

● Each worker can asynchronously handle thousands of concurrent connections (handles the scheduling itself)

• Event-driven: an event is, for example, a new connection

• Asynchronous: handles interaction for more than one connection at a time

• Non-blocking: does not stall the CPU while waiting on disk I/O; works on other events until the I/O completes

nginx

● Efficient CPU usage
– Fewer cores needed

● Small memory footprint per request
● High performance
– Thousands of connections/requests per second

● Often used as a front-end to high-load websites
– Load balancing (reverse proxy), caching, etc.

High-load web systems

● Busy dynamic web sites cannot reside on one server
● Need a strategy for splitting the load across multiple web servers
● One possible strategy

– One entry point, the front-end, which receives all requests and splits the load (e.g. nginx, Varnish)

– Back-ends process the requests forwarded by the front-end (e.g. nginx, Apache)

Varnish

● Proxy server

– Reverse

– Caching

– Programmable

● Load balancer
● Dynamic content generator
● Tools: logging, debugging, monitoring
● Users: Facebook, Twitter, WikiLeaks, ThePirateBay

● Developed in Norway

● Fantastic performance even on low-end servers: 1,000 to 10,000 requests per second per server is the norm

● C, plus good C programmers

● Exploits the advantages of the Unix architecture

● After tuning, tens of thousands of requests per second; more than 100k/s has been exceeded in testing

● A request-oriented, domain-specific configuration/programming language: VCL

Varnish

● Generating any dynamic web page is very slow: depending on the environment, hundreds or thousands of times slower than returning static content

● A low-end server can generate a couple of hundred such dynamic pages per second

● Any development framework makes dynamic page generation slower by another factor of tens or hundreds

● That leaves just a few tens of requests per second

● Rough math: 100 x 100 = 10,000 times slower than a static page

Caching

● Ideally, dynamic content would be returned with performance similar to that of static pages

● Content that does not change significantly within a given time interval can be stored temporarily for reuse

● Hard disk access is slow; good practice is to keep all cached content only in RAM or on a server SSD

● A caching strategy has to be designed for each specific case, and it can be very subjective

Caching

● Based on the request URL (in full or as a regular expression), you can define which requests to cache, and how long to cache a particular element or whether to cache it at all

● Advertised as able to speed up page delivery by hundreds to thousands of times, i.e. only up to about 10 times slower than static content
● Fast compared with other caching approaches
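The idea of choosing a cache lifetime per URL pattern can be illustrated with a small in-memory cache. This Python sketch is not how Varnish is implemented; the class name `TTLCache`, the rule list and the lifetimes are assumptions made up for the example.

```python
import re
import time

class TTLCache:
    """In-memory cache whose lifetime per entry depends on URL patterns."""

    def __init__(self, rules, clock=time.monotonic):
        # rules: list of (regex, ttl_seconds); the first match wins
        self.rules = [(re.compile(pattern), ttl) for pattern, ttl in rules]
        self.clock = clock
        self.store = {}  # url -> (expires_at, body)

    def ttl_for(self, url):
        for pattern, ttl in self.rules:
            if pattern.search(url):
                return ttl
        return 0  # no rule matched: do not cache

    def fetch(self, url, generate):
        now = self.clock()
        hit = self.store.get(url)
        if hit and hit[0] > now:
            return hit[1]            # fresh copy: skip the slow back-end
        body = generate(url)         # slow path: build the dynamic page
        ttl = self.ttl_for(url)
        if ttl > 0:
            self.store[url] = (now + ttl, body)
        return body

# Hypothetical rules: scripts for an hour, news pages for a minute
cache = TTLCache([(r"\.js$", 3600), (r"^/news", 60)])
print(cache.fetch("/news", lambda url: "generated " + url))
```

Passing the clock in as a parameter keeps expiry behaviour easy to test without waiting for real time to pass.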

Varnish caching

The VCL DSL

● Simple syntax (similar to C), which is translated to C and then compiled to machine code
● =, ==, !=, ~, !~, !, &&, ||, +, "string"
● if () {} else {}, set, unset, return

● 9 subroutines, which are the different stages of processing each request, where behaviour can be influenced

● Only predefined objects: client, server, req, bereq, beresp, obj, resp

sub vcl_recv {
    # Look up GET requests for JavaScript files in the cache
    if (req.request == "GET" && req.url ~ "\.js$") {
        return (lookup);
    }
}

VCL processing architecture

Integration

● A fixed caching time may not be optimal

● Content may change more often than the configured time: users get stale information

● Or less often: the servers do unnecessary work

● Solution: the server has to be told that the content must be refreshed

acl purge {
    "192.168.0.0"/24;
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged.";
    }
}

Dynamic content generation: ESI

● Web pages often consist of blocks that change at different rates
● Or there is a small block of information specific to each user (for example, "Hello, [Jānis Bērziņš], you have [0] new messages")

● We can load it after the page has loaded, using JSON, or generate the content with Varnish

<TABLE><TR><esi:include src="sveiks.html"/></TR>

<TR><TD><esi:include src="index.html"/></TD>

<TD><esi:include src="article.html"/></TD></TR>

</TABLE>

● Varnish parses the <esi> tags and puts the elements together; every element is configured and cached independently
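The assembly step can be pictured with a short Python sketch that replaces each `<esi:include>` tag with a separately obtained fragment. This is only an illustration of the idea, not Varnish's implementation: the regular expression handles just the self-closing form used above, and the fragment contents are invented.

```python
import re

# Matches only the simple self-closing form: <esi:include src="..."/>
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')

def assemble(page, fetch_fragment):
    """Replace every ESI include with the fragment returned for its src."""
    return ESI_INCLUDE.sub(lambda m: fetch_fragment(m.group(1)), page)

# Invented fragment bodies; each could have its own cache lifetime
fragments = {
    "sveiks.html": "<TD>Hello, Jānis</TD>",
    "index.html": "front page",
    "article.html": "article",
}
page = ('<TABLE><TR><esi:include src="sveiks.html"/></TR>'
        '<TR><TD><esi:include src="index.html"/></TD>'
        '<TD><esi:include src="article.html"/></TD></TR></TABLE>')

html = assemble(page, fragments.__getitem__)
print(html)
```

Because each fragment is fetched by its own URL, the per-user greeting can be short-lived or uncached while the article body stays cached for much longer.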

Load balancing

● A single URL can be served by several back-ends
● Different URLs can be served by different back-ends
● Monitoring

– Taking dead servers out of rotation (restart, upgrade, repair)
– Bringing recovered servers back in (new ones too)

● In practice this means that a pile of CHEAP desktop-grade hardware can be used for dynamic content generation

● If we add one more front-end, we get high but cheap fault tolerance

● If we use NoSQL or obtain a replicated database in some other way, expensive servers are not needed at all
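A front-end that spreads requests over back-ends and skips the ones that monitoring has marked as dead can be sketched as follows. The class `RoundRobinBalancer` and the back-end names are hypothetical; real front-ends such as Varnish or nginx offer several balancing policies, of which round-robin is only the simplest.

```python
import itertools

class RoundRobinBalancer:
    """Hand out back-ends in turn, skipping those marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        # Called when monitoring sees a dead server
        self.healthy.discard(backend)

    def mark_up(self, backend):
        # Called when a known server has recovered
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each back-end at most once per call
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy back-ends")

lb = RoundRobinBalancer(["b1", "b2", "b3"])
first_round = [lb.pick() for _ in range(4)]
lb.mark_down("b2")
degraded = [lb.pick() for _ in range(3)]
print(first_round, degraded)
```

With health marking in place, a back-end can be restarted or upgraded without clients noticing, as long as at least one other back-end stays up.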

Varnish usage in Latvia

$ curl -I www.tvnet.lv
HTTP/1.1 200 OK
Server: Apache
Last-Modified: Wed, 07 Nov 2012 20:09:08 GMT
Expires: Wed, 07 Nov 2012 20:10:08 GMT
Cache-Control: max-age=60
Vary: Accept-Encoding
Content-Type: text/html; charset=UTF-8
Content-Length: 185924
Date: Wed, 07 Nov 2012 20:10:15 GMT
X-Varnish: 2025605055 2025545136
Age: 67
Via: 1.1 varnish
Connection: keep-alive

$ curl -I www.delfi.lv
HTTP/1.1 200 OK
X-Fe-Node: nuffy
Content-type: text/html; charset=utf-8
Server: lighttpd/1.4.31 (PLD Linux)
Content-Length: 159097
Date: Wed, 07 Nov 2012 20:20:58 GMT
X-Varnish: 734492112 734450241
Age: 58
Via: 1.1 varnish
Connection: keep-alive

The current situation

● The standard web development solution is an HTTP server plus a classic dynamic content generation system (PHP, ASP, Python, etc.); it has problems with:
– Long-running requests and persistent connections
– The number of clients that can be served simultaneously
– Compatibility with other technologies
– Prospects for future development

Event-driven programming frameworks

● Neither the idea nor the implementations are new (Python Twisted, Perl Object Environment, Ruby EventMachine, Node.js)

● Low adoption in web solutions
● Solve the problems of the standard technologies
● Reactor pattern, C10K problem
● Allow web programmers to build network solutions