Apache logs monitoring
Uploaded by umair-amjad (category: Technology)
Monitoring Apache
There are many ways to examine Apache's status and performance
apachectl -v tells you the version number
apachectl -V gives you the complete compile settings
apachectl status gives you the server's status in the form of a scoreboard where, for each Apache child, you see its status as one of these characters:
_ waiting for connection
S starting up
R reading a request
W sending a reply
K keepalive
D performing DNS lookup
C closing connection
L logging information
G gracefully finishing
I idle cleanup
. open slot with no current process
_____CCCCCCC_____RR_CCCCCRR_________CC_CCC__......._____CCCCCCCRW______..................____CCCCLLCCCCCR____..................
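A scoreboard line like the one above is easier to read as a tally. The following is a minimal sketch, assuming the scoreboard has already been copied into a string; the function name summarize_scoreboard and the short sample string are made up for illustration and are not part of Apache.

```python
from collections import Counter

# Key characters as listed in the slide above.
SCOREBOARD_KEY = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading a request",
    "W": "sending a reply",
    "K": "keepalive",
    "D": "performing DNS lookup",
    "C": "closing connection",
    "L": "logging information",
    "G": "gracefully finishing",
    "I": "idle cleanup",
    ".": "open slot with no current process",
}


def summarize_scoreboard(scoreboard):
    """Return {state description: count} for each key character that occurs."""
    counts = Counter(ch for ch in scoreboard if ch in SCOREBOARD_KEY)
    return {SCOREBOARD_KEY[ch]: n for ch, n in counts.items()}


if __name__ == "__main__":
    # Short made-up scoreboard, not real server output.
    for state, n in summarize_scoreboard("__CCR_W...").items():
        print(f"{n:3d}  {state}")
```

A large share of "open slot" or "waiting" entries suggests spare capacity, while many busy states suggest the server is near its child limit.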
Extended Status
You can obtain even more information (including PID) using ./apachectl fullstatus
this gives you a snapshot of the current status of each child
to use fullstatus
load the mod_status.so module (not needed in Apache 2.2 when mod_status is compiled in, which is the default)
add the directive ExtendedStatus On to your httpd.conf file
add a <Location /server-status> container to your httpd.conf file that has the directive SetHandler server-status
Now, when you type ./apachectl fullstatus, the listing gives you more details:
Srv: child server number and generation (in the form 5-1), and PID
Accesses for this connection and for this child
Mode (as per the last slide: _, C, R, W, etc.)
CPU usage, in seconds
Seconds since the beginning of the most recent request
Milliseconds required to process the most recent request
Kilobytes transferred for this connection
Megabytes transferred for this child
server-status and server-info
You can also view this information via a web browser
Either server status information (as from the last slide), or server information (which requires mod_info)
for either or both of these, add a <Location /server-status> or <Location /server-info> container
NOTE: the URL for these is simply http://ipaddress/server-status or http://ipaddress/server-info
also add to the container the proper handler, SetHandler server-status or SetHandler server-info
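Besides the browser page, mod_status serves a plain-text variant at /server-status?auto that is meant for scripts. The following is a rough sketch of parsing such a response, assuming it has already been fetched into a string; the function name parse_status_auto and the sample text are made up for illustration, and a real response contains more fields.

```python
def parse_status_auto(text):
    """Parse 'Key: value' lines from a server-status?auto response into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


# Made-up sample response, shortened for illustration.
SAMPLE = """BusyWorkers: 3
IdleWorkers: 5
Scoreboard: __W_KW__..."""

if __name__ == "__main__":
    status = parse_status_auto(SAMPLE)
    print(status["BusyWorkers"], "busy,", status["IdleWorkers"], "idle")
```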
Information available by server-info includes
version, compilation date
modules loaded, directives of each
hostname, port
timeout, keep-alive directives
server root, configuration file location
Security
Making this information available presents a security risk
knowing the version of Apache makes it easier to attack the server and manipulate or destroy files
yet a web administrator may need to check status or server information at any time, either locally or remotely
In the container from the previous slide, let's add proper allow/deny statements to limit who can access this information
deny access to all except the specific IP address or subnet from which our web administrator will access the server information
Order deny,allow
Deny from all
Allow from 10.2.3.0/24
by using 0 as the last octet together with /24, we are allowing access to anyone on this subnetwork (10.2.3.x)
the 24 is the number of leading bits that must match, which determines how many octets are examined (8 for the first octet, 16 for the first two, 24 for the first three)
do this for both the <Location /server-status> and <Location /server-info> containers (if we use both)
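To see which clients a mask such as 10.2.3.0/24 covers, the check can be reproduced with Python's standard ipaddress module. This is only an illustration of the subnet arithmetic, not of how Apache evaluates Allow internally; the function name is_allowed and the test addresses are made up.

```python
import ipaddress

# The network from the Allow directive above.
ALLOWED = ipaddress.ip_network("10.2.3.0/24")


def is_allowed(client_ip):
    """True if client_ip falls inside the allowed subnet."""
    return ipaddress.ip_address(client_ip) in ALLOWED


if __name__ == "__main__":
    for ip in ("10.2.3.77", "10.2.4.1", "192.168.1.5"):
        print(ip, "allowed" if is_allowed(ip) else "denied")
```

With /24, only the first three octets are compared, so any address 10.2.3.0 through 10.2.3.255 matches.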
Error Pages
Apache is configured to generate a generic page on an error, based on the status code
these response pages may lack useful information, so Apache allows you to alter the default error handling
you can
create your own error pages
create your own error scripts, for instance a PHP script
generate a short automated message
use a multi-language error page available in the error directory
redirect the attempt to a local URL (see, for instance, what happens at www.nku.edu when you specify any incorrect URL/filename)
redirect the attempt to an external URL
in your httpd.conf file, you set these up using the ErrorDocument directive, of the form:
ErrorDocument error-code document-name (or message)
Examples
ErrorDocument 401 /subscribe.html
here, presumably the user was not able to log in validly and thus generated a 401 error, so we bring up the page /subscribe.html
ErrorDocument 404 /cgi-bin/notfound.php
here, we run a script that we set up to handle any 404 (URL not found) errors (this is what NKU does)
ErrorDocument 500 "Server Error!!"
here, we return a page with the text Server Error!! (a text message is enclosed in double quotes)
ErrorDocument 410 /var/web/errors/HTTP_GONE.html.var
here, we use one of the error pages made available in Apache
these can respond differently based on several situations: the language is chosen by language negotiation, and the response can include the value(s) of environment variable(s) such as $HTTP_REFERER
ErrorDocument 505 http://www.errors.org/error505.cgi
here, we redirect to an external URL because of a wrong HTTP version
Using the Multi-Language Files
To use the multi-language error document files available in your error directory, there are several steps you will have to take
create an alias from /error/ to the actual location in your filespace of your error documents
Alias /error/ /usr/local/apache2/error/
notice the use of the trailing / here!
create a <Directory> container for that directory containing at a minimum
Options IncludesNoExec
AddOutputFilter Includes html
AddHandler type-map var
the files in this directory end with a .var extension
Order deny,allow
Allow from all
this is needed since / (root) is denied to all
add your ErrorDocument directive, e.g., ErrorDocument 404 /error/HTTP_NOT_FOUND.html.var
these directives already exist in the file httpd-multilang-errordoc.conf
More on Multi-Language Pages
The nice thing about the multi-language error pages that come with Apache is that, based on browser information, the language of the returned page can be tailored
if you look at any of these files, you see Content-language entries for a number of different languages
based on the language preferences sent by the browser (the Accept-Language header), the Body with the matching Content-language is selected and returned
further, an if statement allows for a more specialized message depending on whether the page was reached directly or from a referer (a link)
In order to get the language selected appropriately, you might want to include two additional directives in your container from the previous slide:
LanguagePriority list (of languages here, e.g., en cs de es)
ForceLanguagePriority Prefer Fallback
External Redirects on Errors
An external redirect is not a matter of simply passing the buck
recall from chapter 5 that a redirect sends a response to the web browser with a redirect status code (30x) and a new URL
the web browser then sends out a new HTTP request for the new URL
this can confuse crawlers and other agents that were expecting content back from their requests, or error codes if the request could not be fulfilled; instead, they are given a new URL to pursue
the redirection can also cause problems if it arises during authentication: because the browser does not receive a 401 code, it will not prompt the user for a password, potentially leaving the user confused as to why the original request was not fulfilled and why they were taken to a different location
Automatic Logging
There are two forms of logging that are taken care of automatically
access logging: logging every request sent by clients (browsers, users, software)
error logging: logging any request that results in an error
Either type of event will place a new entry into the appropriate log file
Each entry will contain at a minimum
the time/date of the request
the URL
the IP address of the requester
For errors, the status code will be included with the entry
For accesses, the command serviced (e.g., GET), the status code, and the browser's specification (type, OS, HTTP version) will be included with the entry
Typically, Apache performs the logging itself
rather than invoking syslogd or klogd as other Linux services do
Error Logging
Errors can be written to a log file, sent to a pipe (that is, piped to a Linux command), or written to the Linux syslog service
There are two Apache directives to control error logging
ErrorLog: specify the file, pipe, or syslog
if you do not set ErrorLog, it defaults to writing to the file error_log
if you specify a filename, it is assumed to be under ServerRoot unless you specify the full path
if the filename starts with |, the information is piped to the command that follows the |, as in |cat, which would display the error information in the terminal window (probably a poor option)
if you specify syslog, the syslogd service is used and follows the action in the /etc/syslog.conf file for local7 messages
LogLevel: one of emerg, alert, crit, error, warn, notice, info, debug, listed from most to least severe (see table 7-2 on page 182 for more detail)
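LogLevel acts as a threshold: a message is recorded when it is at least as severe as the configured level. A simplified model of that comparison, assuming nothing beyond the ordering of the level names (the function name is_logged is made up for illustration):

```python
# Levels from most to least severe, as listed above.
LEVELS = ["emerg", "alert", "crit", "error", "warn", "notice", "info", "debug"]


def is_logged(message_level, configured_level):
    """True if a message of message_level is recorded under configured_level."""
    return LEVELS.index(message_level) <= LEVELS.index(configured_level)


if __name__ == "__main__":
    # With LogLevel warn, errors are kept and info messages are dropped.
    print(is_logged("error", "warn"), is_logged("info", "warn"))
```

So LogLevel debug records everything, while LogLevel emerg records only the most severe messages.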
IE Browser Error Pages
IE tends to ignore the error pages sent by Apache and displays its own, more generic page
MS considers its own pages to be more user friendly
the problem is that the error page sent by Apache might include some useful content that an IE user will not see
IE will only display the server's error page for
403, 405, 410 errors if the page's size is > 256 bytes
400, 404, 406, 408, 409, 500, 501, 505 errors if the page's size is > 512 bytes
but these pages, as generated by Apache, tend to be smaller than the byte sizes listed above
there is a way to force IE to display the sent error page using the Windows Registry, but most users will not be aware of this
or, you could create your own error pages and make sure that they are > 512 bytes to force IE to display your pages
I tried both of these and I could not get IE to display the Apache page, so I'm unsure if it's even possible!
I/O Logging
Aside from logging requests and errors, you can log regular Apache I/O if desired
this requires the use of the mod_dumpio module
this is not part of the base Apache, so it must be separately compiled
add the LoadModule statement to httpd.conf
there are three directives
DumpIOInput On (or Off, the default)
DumpIOOutput On (or Off, the default)
LogLevel value, where value is one of emerg, alert, crit, error, warn, notice, info, debug; here, you need to use debug
the I/O logging is sent to your error log file, and because this generates an enormous number of messages, you will probably not want to use this feature at all, or at least not for very long
Access Logging
All HTTP requests to your server are logged in the access log
these include requests that result in errors
Unlike error logging, these can only be logged to a specified file or written to a pipe
they cannot be sent to syslogd
You can specialize the access log using the mod_log_config module, which offers two directives
CustomLog, which allows you to specify a new place for the output (a different file or a pipe)
LogFormat, which allows you to specify how accesses are logged in terms of what types of information are recorded (we will see details on this in the next slide)
in addition, the mod_setenvif module can be used to set various environment variables based on attributes of a request
Log Formatting
The LogFormat directive allows you to specify how you want your log entries to appear
you are able to define different formats and have them sent to different files, although this may not be useful
LogFormat format name
CustomLog location name
format is a specification of the type of information to record and in what order it should be recorded (covered over the next few slides)
location is the location in your file system where you want the log file to be written; if you specify a relative path, it is relative to ServerRoot
name is the same on both lines and is used to link a specified format to a log file
you can shorten this by just writing CustomLog location format, omitting the LogFormat directive and the name, but this means that you cannot share a format between two or more different log files
You can also specify under what condition(s) a format might be used (for instance, if the access resulted in a 200 status)
therefore, you can specify multiple logs, each with its own format
More on Log Formatting
The format comprises a series of percent directives (covered on the next slide) that specify what information should be logged
these include such pieces of information as the requester's IP address, the URL requested, the time of the request, etc.
the entire format is placed inside double quotes; for example, "%a %U" means the IP address of the client and the URL requested
Conditional directives allow you to specify for which status code(s) that piece of information should be logged
multiple status codes are separated by commas, and the code(s) appear between the % and the directive
%200a means to log %a (IP address of the client) if the status code is 200
%400,401,402,403,404U means to log %U (URL) if the status code is any of 400-404
you can also place ! in front of the codes, as in %!200a, to log the value only if the status code is not one of those listed
if the condition is not met, the requested value is replaced by a hyphen (-) in the log file
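The behavior of a conditional directive can be mimicked in a few lines. This is a sketch of the rule as described above, not Apache's own code; the function name conditional_field and the sample values are made up for illustration.

```python
def conditional_field(value, status, codes, negate=False):
    """Return value if the status condition holds, otherwise a hyphen.

    codes  -- the status codes written between % and the directive letter
    negate -- True when the codes are prefixed with !
    """
    matched = status in codes
    if negate:
        matched = not matched
    return value if matched else "-"


if __name__ == "__main__":
    # %200a on a 200 response, then on a 404 response.
    print(conditional_field("10.2.3.4", 200, {200}))
    print(conditional_field("10.2.3.4", 404, {200}))
```

The second call prints a hyphen: the field is blanked, but the log line itself is still written.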
Useful Percent Directives
The full set of percent directives is given in table 7-3 on page 188; here, we look at the most useful
%a remote IP address
%A server IP address
%B bytes sent excluding headers
%X connection status when the response is complete (this was %c in late Apache 1.3 releases)
%D duration of the request, in microseconds
%f filename (resource)
%H request protocol
%m request method
%P PID of child servicing request
%s status
%t time of request
%u remote user (only available if user has authenticated)
%U requested URL
%{X}e output the value of environment variable X
%{X}i output the value of request header X (X might be User-agent or Referer)
%{format}t output the time using the provided format
Examples
The Common Log Format is a standard format developed for NCSA servers
this format string is "%h %l %u %t \"%r\" %>s %b", which is
the host, remote logname (or - if not known/supported), user name (if known through login), date, request (inside \" \" since the \ is an escape character), status (3-digit code, with the > prior to the s selecting the final status), and bytes transferred excluding the headers
Imagine that your website is linked from other sites and you want to know how often a visitor has reached your site through one of those links (referers)
use "%{Referer}i -> %U"
this records into your log file the referer and the URL (how they got here and where they tried to go)
Or you might want to know the web browser of a visitor
use "%{User-agent}i"
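Because the Common Log Format has a fixed field order (host, logname, user, time, quoted request, status, bytes), an entry can be taken apart with one regular expression. A minimal sketch follows; the names CLF_PATTERN and parse_clf and the sample line are made up for illustration, and the pattern does not handle requests that themselves contain escaped quotes.

```python
import re

# One named group per Common Log Format field.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up log line for illustration.
SAMPLE = '10.2.3.4 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'

if __name__ == "__main__":
    entry = parse_clf(SAMPLE)
    print(entry["host"], entry["request"], entry["status"])
```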
Multiple Logs
Let's imagine that we want to have one log for all successful operations, one log for redirections, and one log for 40x errors
LogFormat "%200a %200U %200t" success
CustomLog logs/success_log success
LogFormat "%301,302,303a %301,302,303U %301,302,303t" redirection
CustomLog logs/redirection_log redirection
LogFormat "%401,403,404,410a %401,403,404,410U %401,403,404,410t" error40x
CustomLog logs/error_log_40x error40x
note that every request still produces a line in each of these files; fields whose condition is not met appear as a hyphen
We could change the format so that each log file logs different types of information
for instance, we might want to know the specific error for the error_log_40x file by adding %s
note that %s will return the original request's status in the case of an internal redirection; if we want the final status, use %>s
or the size of the file (%B) on a 200 success
SetEnvIf Directives
This directive allows you to set an environment variable which you can then use for your logging
the format of the directive is
SetEnvIf attribute regex env-variable[=value]
you can set multiple variables if desired
the attribute is usually a field from the request header (e.g., Host, User-Agent, Referer, Range), or it can be one of Remote_Addr, Remote_User, Request_Method, Request_Protocol, Request_URI, or it can be an already defined environment variable
example: SetEnvIf Referer www\.nku\.edu internal
this sets the variable internal (to true)
example: SetEnvIf Remote_Addr 127\.0\.0\.1 self
example: SetEnvIf Request_URI \.gif$ type=gif
SetEnvIf Request_URI \.jpg$ type=jpg
this will set the variable type to the type of image requested
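The effect of the SetEnvIf examples above can be imitated with ordinary regular-expression searches. This is a rough model only: the RULES table and the function apply_rules are made up for illustration, requests are represented as plain dictionaries, and a variable given without =value is set to "1" here.

```python
import re

# (attribute, regex, variable, value) tuples mirroring the examples above.
RULES = [
    ("Referer", r"www\.nku\.edu", "internal", "1"),
    ("Remote_Addr", r"127\.0\.0\.1", "self", "1"),
    ("Request_URI", r"\.gif$", "type", "gif"),
    ("Request_URI", r"\.jpg$", "type", "jpg"),
]


def apply_rules(request):
    """Return the variables set for a request given as {attribute: value}."""
    env = {}
    for attribute, pattern, variable, value in RULES:
        # The regex may match anywhere in the attribute unless anchored.
        if re.search(pattern, request.get(attribute, "")):
            env[variable] = value
    return env


if __name__ == "__main__":
    print(apply_rules({"Request_URI": "/img/logo.gif", "Remote_Addr": "127.0.0.1"}))
```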
Log Rotation
If you are running a web server for even a modest-sized web site, you may receive thousands of hits a day
each of these is logged in the access_log file, and the error_log file may become large as well
log rotation is the process of moving the current log file into a retired log file
these typically appear with .# after their name, as in access_log, access_log.1, access_log.2, access_log.3
on rotation each file moves down one number, with the previous access_log.3 being deleted and the new access_log starting blank
depending on how quickly a log file fills up, you may want to rotate the files every day, every week, or every month
while you might write your own script to handle this and then schedule it as a cron job, there is a program shipped with Apache called rotatelogs that does this for you
this program is typically in the same directory as apachectl
you run it as rotatelogs filename rotationtime (in seconds; 86400 is every day), typically through a piped log
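The numbered-suffix rotation described above can be sketched as a small script of the kind you might run from cron. This is an illustration of the renaming scheme only, not what rotatelogs does; the function name rotate and the keep parameter are made up. A running server keeps its log file open, so a real script would also need to tell Apache to reopen its logs (for example with a graceful restart).

```python
import os


def rotate(path, keep=3):
    """Shift path -> path.1 -> ... -> path.keep, dropping the oldest copy."""
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)
    # Rename from the highest number down so nothing is overwritten.
    for n in range(keep - 1, 0, -1):
        src = f"{path}.{n}"
        if os.path.exists(src):
            os.rename(src, f"{path}.{n + 1}")
    if os.path.exists(path):
        os.rename(path, f"{path}.1")
    # Start a new, empty current log.
    open(path, "w").close()
```

Calling rotate("access_log") once per day would keep the current file plus the three most recent retired files.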
Favicon.ico
The favicon is an icon that is displayed in the browser's address bar next to the site's URL (you can also see these in bookmarks)
the icon resides in the web site's home directory (DocumentRoot)
If a site does not have a favicon.ico in that directory, the error and access logs typically fill up with error messages
you have three ways to prevent this
create an icon and put it in this directory
create a 0-byte file whose name is favicon.ico in this directory
suppress the log messages as follows:
SetEnvIf Request_URI favicon\.ico favicon
CustomLog logs/access_log common env=!favicon
this says that for any request for favicon.ico, set the variable favicon, and log a request only when favicon is not set
Reporting Programs
You want to search your log files for useful information
how many people are visiting? what errors are arising? is the same IP address sending numerous requests (e.g., a denial of service attack)?
wading through thousands of entries can be time consuming
you have many choices, such as using awk or writing your own shell scripts
with awk, you could count the number of times each unique IP address is found to see if you are being attacked
with your own script, you could generate a report that lists all of the 404 errors by URL so that you could see if there are URLs that users are getting wrong
AWStats is a reporting tool that can dig through your file(s) for useful information, such as trends, that you might want to share with your marketing department
this is open source software
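The two reports mentioned above (requests per IP address and 404 errors by URL) can be sketched in Python as an alternative to awk or a shell script. The sketch assumes entries in the Common Log Format; the function names and the sample entries are made up for illustration.

```python
from collections import Counter


def requests_per_ip(lines):
    """Count log entries per client address (the first field)."""
    return Counter(line.split()[0] for line in lines if line.strip())


def not_found_by_url(lines):
    """Count 404 responses per requested URL in Common Log Format lines."""
    counts = Counter()
    for line in lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue  # no quoted request field
        request = parts[1].split()  # e.g. ['GET', '/old.html', 'HTTP/1.0']
        after = parts[2].split()    # e.g. ['404', '209']
        if len(request) >= 2 and after and after[0] == "404":
            counts[request[1]] += 1
    return counts


# Made-up entries for illustration.
SAMPLE = [
    '10.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.2.3.4 - - [10/Oct/2000:13:55:40 -0700] "GET /old.html HTTP/1.0" 404 209',
    '10.9.9.9 - - [10/Oct/2000:13:56:01 -0700] "GET /old.html HTTP/1.0" 404 209',
]

if __name__ == "__main__":
    print(requests_per_ip(SAMPLE).most_common(5))
    print(not_found_by_url(SAMPLE).most_common(5))
```

An address far ahead of the rest in the first report, or a URL that dominates the second, is a good starting point for a closer look.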