Posted on 18-Jan-2016
2001 Prentice Hall, Inc. All rights reserved.
Chapter 17 - Web Automation and Networking
Outline
17.1 Introduction
17.2 Introduction to LWP
17.3 LWP Commands
17.4 The LWP::Simple Module
17.5 HTML Parsing
17.6 Introduction to Advanced Networking
17.7 Protocols
17.8 Transmission Control Protocol (TCP)
17.9 Simple Mail Transfer Protocol (SMTP)
17.10 Post Office Protocol (POP)
17.11 Searching the World Wide Web
17.1 Introduction
• Perl
– Widely used for Internet programming
– Used to create CGI scripts
– Provides many Web-related modules
– Can automate Web tasks
17.2 Introduction to LWP
• LWP
– Library for WWW in Perl (libwww-perl)
• Common use: mimic a browser's request for a Web page
– Request object: HTTP::Request
» method: the HTTP method, usually GET, PUT, POST or HEAD
» URL: address of the requested item
» headers: key-value pairs that provide extra information
» content: data sent from the client to the server
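The four parts of a request can be seen by building an HTTP::Request by hand and reading them back. This is a minimal sketch, assuming the HTTP::Request module from libwww-perl is installed; the URL and form data are hypothetical placeholders, and nothing is sent over the network.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request;

# Build a POST request by hand; it is never sent.
# The URL and form data are hypothetical placeholders.
my $request = HTTP::Request->new(
   POST => 'http://localhost/cgi-bin/form.pl' );

# headers: key-value pairs with extra information about the request
$request->header( 'Accept' => 'text/html' );
$request->content_type( 'application/x-www-form-urlencoded' );

# content: the data sent from the client to the server
$request->content( 'name=Anna&color=blue' );

print( "method:  ", $request->method(), "\n" );
print( "URL:     ", $request->uri()->as_string(), "\n" );
print( "accept:  ", $request->header( 'Accept' ), "\n" );
print( "content: ", $request->content(), "\n\n" );

# as_string shows the request roughly as it would be sent
print( $request->as_string() );
```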
17.2 Introduction to LWP (II)
– Response object: HTTP::Response
» code: status indicator for the outcome of the request (e.g. 200 or 404)
» message: short string that corresponds to the code (e.g. "OK" or "Not Found")
» headers: additional information about the response, including a description of the content
» content: data associated with the response
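An HTTP::Response can likewise be constructed directly, which shows its code, message, headers and content without contacting a server. A sketch assuming the HTTP::Response module from libwww-perl; the 404 response below is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;

# A response built by hand so the example runs offline; normally
# LWP::UserAgent->request returns one of these. The values are made up.
my $response = HTTP::Response->new(
   404, 'Not Found',
   [ 'Content-Type' => 'text/html' ],
   '<html><body>No such page</body></html>' );

print( "code:    ", $response->code(), "\n" );
print( "message: ", $response->message(), "\n" );
print( "type:    ", $response->content_type(), "\n" );
print( "content: ", $response->content(), "\n" );

# status_line joins code and message; is_success is true only for 2xx codes
print( $response->status_line(), "\n" );
print( $response->is_success() ? "success\n" : "failure\n" );
```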
17.2 Introduction to LWP (III)
– User agent: usually a Web browser; in a Perl program, an LWP::UserAgent object
» timeout: how long the agent waits for a response before giving up
» agent: name the user agent reports for itself
» from: e-mail address of the person using the agent
» credentials: usernames and passwords to send when a server requires authentication
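The user agent settings above correspond to LWP::UserAgent methods of the same names. A minimal sketch, assuming LWP::UserAgent is installed; the agent name, e-mail address, host, realm, username and password are hypothetical placeholders, and no request is made.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Configure a user agent without sending anything. All names,
# addresses and passwords below are hypothetical placeholders.
my $agent = LWP::UserAgent->new();

$agent->timeout( 30 );                      # give up after 30 seconds
$agent->agent( 'MyPerlBot/1.0' );           # sent in the User-Agent header
$agent->from( 'webmaster@example.com' );    # sent in the From header

# Login for a protected area: host:port, realm, username, password
$agent->credentials( 'localhost:80', 'Private Area', 'anna', 'secret' );

print( "timeout: ", $agent->timeout(), "\n" );
print( "agent:   ", $agent->agent(), "\n" );
print( "from:    ", $agent->from(), "\n" );
```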
17.3 LWP Commands
• LWP
– Lets a Perl program interact programmatically with a Web server
fig17_01.pl

#!/usr/bin/perl
# Fig 17.1: fig17_01.pl
# Simple LWP commands.

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $url = "http://localhost/home.html";
open( my $out, ">", "response.txt" ) or
   die( "Cannot open response.txt: $!" );

my $agent = LWP::UserAgent->new();
my $request = HTTP::Request->new( 'GET' => $url );
my $response = $agent->request( $request );

if ( $response->is_success() ) {
   print( $out $response->content() );
}
else {
   print( $out "Error: " . $response->status_line() . "\n" );
}

print( $out "\n------------------------\n" );

$url = "http://localhost/cgi-bin/fig16_02.pl";

$request = HTTP::Request->new( 'POST', $url );
$request->content_type( 'application/x-www-form-urlencoded' );
$request->content( 'type=another' );
$response = $agent->request( $request );

print( $out $response->as_string() );
print( $out "\n" );
close( $out ) or die( "Cannot close response.txt: $!" );

LWP::UserAgent->new creates a new user agent object
HTTP::Request->new( 'GET' => $url ) creates a new request object; the first argument makes it a GET request for $url
If is_success reports a successful (2xx) status, the program writes the content of the response to response.txt
Otherwise it writes the status line, which holds the status code and message
The second request is a POST: content_type tells the server how the request body is encoded, and content sets the body
as_string returns the whole response, status line and headers included, as a string

Program Output (contents of response.txt)
<html>
<title>This is my home page.</title>
<body bgcolor = "skyblue">
<h1>This is my home page.</h1>
<b>I enjoy programming, swimming, and dancing.</b>
<br></br>
<b><i>Here are some of my favorite links:</i></b>
<br></br>
<a href = "http://www.C++.com">programming</a>
<br></br>
<a href = "http://www.swimmersworld.com">swimming</a>
<br></br>
<a href = "http://www.abt.org">dancing</a>
<br></br>
</body></html>
------------------------
HTTP/1.1 200 OK
Connection: close
Date: Tue, 21 Nov 2000 15:20:19 GMT
Server: Apache/1.3.12 (Win32)
Content-Type: text/html
Client-Date: Tue, 21 Nov 2000 15:20:19 GMT
Client-Peer: 127.0.0.1:80
Title: Your Style Page

<html><head><title>Your Style Page</title></head>
<body bgcolor = "#ffffc0" text = "#ee82ee" link = "#3cb371" vlink = "#3cb371">
<p>This is your style page.</p>
<p>You chose the colors.</p>
<a href = "/fig16_01.html">Choose a new style.</a>
</body></html>
17.3 LWP Commands
Fig. 17.2 Contents of home.html.
<html>
<title>This is my home page.</title>
<body bgcolor = "skyblue">
<h1>This is my home page.</h1>
<b>I enjoy programming, swimming, and dancing.</b>
<br></br>
<b><i>Here are some of my favorite links:</i></b>
<br></br>
<a href = "http://www.C++.com">programming</a>
<br></br>
<a href = "http://www.swimmersworld.com">swimming</a>
<br></br>
<a href = "http://www.abt.org">dancing</a>
<br></br>
</body></html>
17.4 The LWP::Simple Module
• LWP::Simple module
– Provides a procedural interface to LWP
– Functions such as get, getprint and getstore each fetch a page in a single call
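A sketch of LWP::Simple that can run without a Web server: it writes a small file and fetches it through a file:// URL, which LWP handles much like an http:// one. It assumes LWP::Simple and URI::file (installed with LWP) are available; the file contents are made up. It also checks the return value of get, since get returns undef when a request fails.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;                 # exports get, getstore, is_success, ...
use File::Temp qw( tempdir );
use File::Spec;
use URI::file;

# Write a small, made-up page to a temporary directory
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'home.html' );
open( my $fh, '>', $path ) or die( "Cannot write $path: $!" );
print( $fh "<html><title>Test page</title></html>\n" );
close( $fh ) or die( "Cannot close $path: $!" );

# A file:// URL stands in for http://localhost/... here
my $url = URI::file->new_abs( $path )->as_string();

# get returns the document, or undef if the request failed
my $page = get( $url );
die( "Could not fetch $url\n" ) unless defined( $page );
print( $page );

# getstore saves the document and returns the HTTP status code
my $copy   = File::Spec->catfile( $dir, 'copy.html' );
my $status = getstore( $url, $copy );
print( "getstore status: $status\n" );
print( is_success( $status ) ? "saved\n" : "failed\n" );
```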
fig17_03.pl

#!/usr/bin/perl
# Fig 17.3: fig17_03.pl
# A program that uses LWP::Simple.

use strict;
use warnings;
use LWP::Simple;

my $url = "http://localhost/home.html";
my $page = get( $url );                   # undef if the request fails
print( "\n$page\n\n" ) if defined( $page );
my $status = getprint( $url );            # prints the page, returns the status code
print( "\n\n$status\n" );
$status = getstore( $url, "page.txt" );   # saves the page, returns the status code
print( "\n$status\n" );

get retrieves a Web page and stores its contents in a scalar
getprint retrieves the page and prints it to standard output
getstore retrieves the page and stores it in a file

Program Output
<html>
<title>This is my home page.</title>
<body bgcolor = "skyblue">
<h1>This is my home page.</h1>
<b>I enjoy programming, swimming, and dancing.</b>
<br></br>
<b><i>Here are some of my favorite links:</i></b>
<br></br>
<a href = "http://www.C++.com">programming</a>
<br></br>
<a href = "http://www.swimmersworld.com">swimming</a>
<br></br>
<a href = "http://www.abt.org">dancing</a>
<br></br>
</body></html>

<html>
<title>This is my home page.</title>
<body bgcolor = "skyblue">
<h1>This is my home page.</h1>
<b>I enjoy programming, swimming, and dancing.</b>
<br></br>
<b><i>Here are some of my favorite links:</i></b>
<br></br>
<a href = "http://www.C++.com">programming</a>
<br></br>
<a href = "http://www.swimmersworld.com">swimming</a>
<br></br>
<a href = "http://www.abt.org">dancing</a>
<br></br>
</body></html>

200

200
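17.5 HTML Parsing
• HTML::TokeParser
– Makes it easy to extract information from HTML
– A document can be walked through manually, but TokeParser is simpler
• Tokens
– Each token is an array reference whose first element is the token type
– Types
» Start tag (S): tag name, hash of attributes, attribute order and original text
» End tag (E): tag name and original text
» Text (T): the text, plus a flag marking raw (CDATA) text
» Comment (C): the comment text
» Declaration (D): the declaration text, such as a DOCTYPE
» Processing instruction (PI): a sixth, rarely seen type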
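The token layout can be seen by parsing a short HTML string and printing the type and second field of each token. A sketch assuming HTML::TokeParser (from the HTML::Parser distribution); the HTML fragment is made up. For start tags it also reads the href attribute from the attribute hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# A made-up document held in a string; the parser reads from a reference
my $html = '<!DOCTYPE html><!-- note --><p class="x">Hi <a href="/a">link</a></p>';
my $parser = HTML::TokeParser->new( \$html );

my ( @types, @links );
while ( my $token = $parser->get_token() ) {
   my $type = $token->[0];
   push( @types, $type );

   # Start tags carry a hash of attributes in element 2
   if ( $type eq 'S' && $token->[1] eq 'a' ) {
      push( @links, $token->[2]{href} );
   }

   # Element 1 is the tag name (S, E) or the text (T, C, D)
   print( "$type: $token->[1]\n" );
}

print( "links: @links\n" );
```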
17.5 HTML Parsing
Fig. 17.4 Resulting page.txt file.
<html>
<title>This is my home page.</title>
<body bgcolor = "skyblue">
<h1>This is my home page.</h1>
<b>I enjoy programming, swimming, and dancing.</b>
<br></br>
<b><i>Here are some of my favorite links:</i></b>
<br></br>
<a href = "http://www.C++.com">programming</a>
<br></br>
<a href = "http://www.swimmersworld.com">swimming</a>
<br></br>
<a href = "http://www.abt.org">dancing</a>
<br></br>
</body></html>
fig17_05.pl

#!/usr/bin/perl
# Fig 17.5: fig17_05.pl
# A program to strip tags from an HTML document.

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTML::TokeParser;

my $url = "http://localhost/home.html";
my $agent = LWP::UserAgent->new();
my $request = HTTP::Request->new( 'GET' => $url );
my $response = $agent->request( $request );
die( "Cannot fetch $url: ", $response->status_line(), "\n" )
   unless $response->is_success();
my $document = $response->content();

my $page = HTML::TokeParser->new( \$document );

# Print only text tokens, skipping tags, comments and declarations
while ( my $token = $page->get_token() ) {
   my $type = shift( @{ $token } );
   my $text = shift( @{ $token } );

   if ( $type eq "T" ) {
      print( $text );
   }
}

Gets a Web page and stores its contents in $document
Creates a new TokeParser object that reads from the string in $document
Goes through the tokens, displaying only the text
fig17_05.pl
Program Output
This is my home page. This is my home page.I enjoy programming, swimming, and dancing. Here are some of my favorite links: programming swimming dancing
17.6 Introduction to Advanced Networking
• Sockets
– Programs communicate over a network through sockets
– One connection = two sockets, one at each end
– Allow data to be passed as streams or datagrams
• Streams
– Sequenced
– Reliable
• Datagrams
– Less reliable
– Not sequenced
– Require fewer system resources
» Connection is not permanent
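Datagram sockets can be tried out on a single machine with IO::Socket::INET. This sketch, which assumes the loopback address 127.0.0.1 is available, sends one UDP datagram to a socket in the same program. There is no listen or accept step, because no connection is set up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Receiver: a datagram socket bound to a free local port
# (port 0 lets the operating system choose one)
my $receiver = IO::Socket::INET->new(
   LocalAddr => '127.0.0.1',
   LocalPort => 0,
   Proto     => 'udp',
) or die( "Cannot create receiver: $@\n" );
my $port = $receiver->sockport();

# Sender: each send is a separate, self-contained datagram
my $sender = IO::Socket::INET->new(
   PeerAddr => '127.0.0.1',
   PeerPort => $port,
   Proto    => 'udp',
) or die( "Cannot create sender: $@\n" );

$sender->send( "hello over UDP" ) or die( "send failed: $!\n" );

# recv reads one whole datagram (up to 1024 bytes here). Over a real
# network, a datagram may be lost or arrive out of order.
my $data = '';
defined( $receiver->recv( $data, 1024 ) ) or die( "recv failed: $!\n" );
print( "received: $data\n" );
```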
17.6 Introduction to Advanced Networking (II)
• Server
– One endpoint (socket)
– Listens for a connection
– Knows how to process requests
• Client
– The other endpoint (socket)
– Knows the server's address and port
– Initiates the connection
– Sends a request
17.7 Protocols
• Protocols
– Must be standardized; otherwise a server would have to know how to process each client's individual request format
– HTTP (Chapter 7)
– POP: receiving e-mail
– SMTP: sending e-mail
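17.8 Transmission Control Protocol (TCP)
• Internet connections
– TCP
• General-purpose way for computers to communicate
• Connection-oriented: a connection is set up before any data is exchanged, and data arrives reliably and in order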
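TCP's connection-oriented behavior can be seen in a single program that plays both server and client over the loopback address. This sketch assumes 127.0.0.1 is available; port 0 asks the operating system for any free port, so no fixed port number is assumed. It works in one process because the operating system completes the connection and queues it before accept is called.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Server side: a listening stream socket on a free local port
my $server = IO::Socket::INET->new(
   LocalAddr => '127.0.0.1',
   LocalPort => 0,            # let the operating system pick the port
   Proto     => 'tcp',
   Listen    => 5,
) or die( "Cannot listen: $@\n" );
my $port = $server->sockport();

# Client side: the connection is established before any data moves
my $client = IO::Socket::INET->new(
   PeerAddr => '127.0.0.1',
   PeerPort => $port,
   Proto    => 'tcp',
) or die( "Cannot connect: $@\n" );

# accept returns a new socket for this one connection
my $conn = $server->accept() or die( "accept failed: $!\n" );

# Data flows as an ordered byte stream (sockets autoflush by default)
print( $client "first\n", "second\n" );
my $first  = <$conn>;
my $second = <$conn>;
chomp( $first, $second );
print( "server got: $first, $second\n" );

print( $conn "ack\n" );
my $reply = <$client>;
print( "client got: $reply" );

close( $client );
close( $conn );
close( $server );
```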
fig17_06.pl

#!/usr/bin/perl
# Fig 17.6: fig17_06.pl
# TCP chat client.

use strict;
use warnings;
use IO::Socket;

my $host = '192.168.1.71';
my $port = 5833;

my $socket = IO::Socket::INET->new(
   PeerAddr => $host,
   PeerPort => $port,
   Proto    => "tcp",
   Type     => SOCK_STREAM )
   or die( "Cannot connect to $host:$port : $@\n" );

$| = 1;   # flush STDOUT after every print

print( $socket "What is your name?\n" );
print( "What is your name?\n" );

my $response = <$socket>;
print( "From server: $response" );

my $input = <STDIN>;

chomp( $input );

while ( $input ne "q" ) {
   print( $socket "$input\n" );
   $response = <$socket>;
   print( "From server: $response" );

   $input = <STDIN>;
   chomp( $input );
}

print( "done\n" );
print( $socket "$input\n" );

close( $socket ) or die( "Cannot close socket: $!" );

Sets the address and port of the server
Creates a socket and connects it to the server; if the connection fails, new returns undef and the program dies with the reason in $@
Turns on autoflush so output to STDOUT appears immediately
The user enters 'q' to end the chat; the 'q' is also sent to the server so it stops too
fig17_07.pl

#!/usr/bin/perl
# Fig 17.7: fig17_07.pl
# TCP chat server.

use strict;
use warnings;
use IO::Socket;

my $port = 5833;

my $server = IO::Socket::INET->new(
   LocalPort => $port,
   Type      => SOCK_STREAM,
   Listen    => 10 )
   or die( "Cannot be a server on $port: $@\n" );

$| = 1;   # flush STDOUT after every print

my $client = $server->accept();   # waits until a client connects
my $response = <$client>;

chomp( $response );
print( "From client: $response\n" );

while ( $response ne "q" ) {
   my $input = <STDIN>;
   print( $client "$input" );

   $response = <$client>;
   chomp( $response );
   print( "From client: $response\n" );
}

close( $client );
close( $server ) or die( "Cannot end connection: $!" );

Specifies the port on which the server waits for clients
Creates a new socket object bound to that port
Listen puts the socket in listening mode; 10 is the maximum number of clients that can wait in the queue to be accepted
accept waits for a client to connect and returns a new socket for talking to that client
fig17_07.pl: Program Output (images not reproduced)
17.9 Simple Mail Transfer Protocol (SMTP)
• Net::SMTP module
– Object-oriented interface for sending e-mail through an SMTP server
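A sketch of the Net::SMTP call sequence (mail, to, data, datasend, dataend, quit), with the message text built separately so the blank line between the headers and the body is easy to see. The server name and addresses are hypothetical placeholders; send_message needs a reachable SMTP server, so the call is left commented out. Each step's return value is checked, and $smtp->message() (inherited from Net::Cmd) supplies the server's reply text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Net::SMTP;

# Builds the message text: header lines, one blank line, then the body
sub build_message {
   my ( $from, $to, $subject, $body ) = @_;
   return "From: $from\n"
        . "To: $to\n"
        . "Subject: $subject\n"
        . "\n"
        . "$body\n";
}

# Sends $text through $server. All arguments are placeholders supplied
# by the caller; a reachable SMTP server is required.
sub send_message {
   my ( $server, $sender, $recipient, $text ) = @_;

   my $smtp = Net::SMTP->new( $server, Timeout => 30 )
      or die( "Cannot connect to $server: $@\n" );

   $smtp->mail( $sender )  or die( "Sender rejected: ", $smtp->message() );
   $smtp->to( $recipient ) or die( "Recipient rejected: ", $smtp->message() );
   $smtp->data()           or die( "DATA rejected: ", $smtp->message() );
   $smtp->datasend( $text );
   $smtp->dataend()        or die( "Message rejected: ", $smtp->message() );
   $smtp->quit();
   return 1;
}

my $text = build_message( 'anna@example.com', 'bob@example.com',
                          'Hello', 'Sent with Net::SMTP.' );
print( $text );

# send_message( 'smtp.example.com', 'anna@example.com',
#               'bob@example.com', $text );
```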
fig17_08.pl

#!/usr/bin/perl
# Fig. 17.8: fig17_08.pl
# Form to send an e-mail message.

use strict;
use warnings;
use CGI qw( :standard );

print( header() );

print( start_html( "Send e-mail!" ) );

print( h1( "The e-mail home page." ) );

print( start_form( -action => "fig17_09.pl" ) );

print( "Enter the SMTP server to connect to: " );
print( textfield( "server" ), br() );

print( "Enter what you want to appear in the \"from\" header: " );
print( textfield( "from" ), br() );

print( "Enter where you would like to send this e-mail: " );
print( textfield( "address" ), br() );

print( "Enter what you want to appear in the \"to\" header: " );
print( textfield( "to" ), br() );

print( "Enter what you want to appear in the \"subject\" header: " );
print( textfield( "subject" ), br() );

print( "Enter the message you want to send in the e-mail: " );
print( br() );
print( textarea( -name => "message", -rows => 5, -columns => 50,
                 -wrap => 1 ), br() );

print( br(), submit( "submit" ), end_form() );

print( end_html() );

Text field for the SMTP server
Text field for the address the e-mail is actually delivered to; the "from", "to" and "subject" fields only fill in the message headers
fig17_08.pl: Program Output (image not reproduced)
fig17_09.pl

#!/usr/bin/perl
# Fig 17.9: fig17_09.pl
# Send an e-mail message.

use strict;
use warnings;
use Net::SMTP;
use CGI qw( :standard );

my $server = param( "server" );
my $from = param( "from" );
my $address = param( "address" );
my $to = param( "to" );
my $subject = param( "subject" );
my $message = param( "message" );

my $my_address = 'my_address.smtp';   # placeholder sender (envelope) address

# Hello is the domain this client announces when it greets the server
my $smtp = Net::SMTP->new( $server, Hello => $server )
   or die( "Cannot connect to SMTP server $server: $@" );

$smtp->mail( $my_address );   # envelope sender
$smtp->to( $address );        # envelope recipient

$smtp->data();
$smtp->datasend( "From: $from\n" );
$smtp->datasend( "To: $to\n" );
$smtp->datasend( "Subject: $subject\n\n" );   # blank line ends the headers
$smtp->datasend( "$message\n" );
$smtp->dataend();
$smtp->quit();

print( header() );
print( start_html( "Send e-mail!" ) );
print( h1( "Your e-mail has been sent." ) );
print( end_html() );

Net::SMTP->new connects to the SMTP server; it returns undef, with the reason in $@, if the connection fails
The mail method starts a new message and gives the sender's address
The to method gives the recipient's address
data starts the transfer of the message text and dataend finishes it; each datasend call sends part of the headers or body
17.10 Post Office Protocol (POP)
• POP
– Designed to make retrieving e-mail stored on a mail server easier
– Allows checking, reading, storing and deleting of mail
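Net::POP3, which ships with Perl as part of libnet, offers another way to read a POP mailbox; it is swapped in here for Mail::POP3Client, the module used by fig17_11.pl. This sketch logs in, lists the messages and prints each message's From and Subject lines. The host, username and password are hypothetical placeholders that require a real POP3 server, so only the header filter is exercised at the end.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Net::POP3;

# Keeps only the From and Subject header lines, the same test
# fig17_11.pl applies with a regular expression
sub summary_lines {
   return grep { /^(From|Subject):\s+/i } @_;
}

# Prints the From and Subject lines of every message in a mailbox.
# $host, $user and $password are placeholders for a real POP3 account.
sub list_inbox {
   my ( $host, $user, $password ) = @_;

   my $pop = Net::POP3->new( $host, Timeout => 30 )
      or die( "Cannot connect to $host: $@\n" );

   # login returns the message count ("0E0" when empty) or undef on failure
   defined( $pop->login( $user, $password ) )
      or die( "Login failed\n" );

   # list returns a hash reference: message number => size in bytes
   my $sizes = $pop->list() or die( "Cannot list messages\n" );

   foreach my $number ( sort { $a <=> $b } keys %{ $sizes } ) {
      my $lines = $pop->top( $number, 0 ) or next;   # headers, no body
      print( "$number:\n", summary_lines( @{ $lines } ) );
   }

   $pop->quit();   # ends the session; nothing was marked for deletion
   return;
}

my @headers = ( "From: anna\@example.com\n",
                "Received: by mail.example.com\n",
                "Subject: Lunch?\n" );
print( summary_lines( @headers ) );

# list_inbox( 'pop.example.com', 'anna', 'secret' );
```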
fig17_10.pl

#!/usr/bin/perl
# Fig. 17.10: fig17_10.pl
# Login form for checking POP e-mail.

use strict;
use warnings;

use CGI qw( :standard );

print( header() );
print( start_html( -title => 'Please Login' ) );

print <<FORM;
<form action = "fig17_11.pl" method = "post">
<p>Username:
<input name = "userName" type = "text" size = "20"></p>
<p>Password:
<input name = "password" type = "password" size = "20"></p>
<p>Server:
<input name = "server" type = "text" size = "20"></p>
<input name = "offset" value = "0" type = "hidden">
<input type = "submit" value = "check mail">
<input type = "reset" value = "reset">
</form>
FORM

print( end_html() );

Creates an HTML form that asks for a username, a password and the name or IP address of the POP server; the hidden offset field starts the listing at the first message
fig17_11.pl

#!/usr/bin/perl
# Fig. 17.11: fig17_11.pl
# Lists the From and Subject headers of POP e-mail, 5 messages at a time.

use strict;
use warnings;
use Mail::POP3Client;
use CGI qw( :standard );

my $user = param( "userName" );
my $password = param( "password" );
my $server = param( "server" );
my $offset = param( "offset" ) || 0;

print( header() );
print( start_html( -title => "Check your mail!" ) );

my $pop = Mail::POP3Client->new( USER => $user,
   PASSWORD => $password, HOST => $server );

# Count returns -1 if the connection or login failed
my $messages = $pop->Count();

if ( $messages < 0 ) {
   print( h1( "Cannot connect: " . $pop->Message() ) );
   print( end_html() );
   exit();
}

print( "<p>You have $messages messages in your inbox.</p>" );
my $offset1 = $offset - 5;
my $offset2 = $offset + 5;
my $start = 1 + $offset;
my $end = ( $offset2 < $messages ? $offset2 : $messages );

for ( $start .. $end ) {
   print( "<p>$_: " );

   foreach ( $pop->Head( $_ ) ) {
      /^(From|subject):\s+/i and print $_, "<br/>";
   }

   print( "</p>\n" );
}

print <<FORM1 if ( $offset );
<form action = "fig17_11.pl" method = "post">
<input name = "userName" value = "$user" type = "hidden">
<input name = "password" value = "$password" type = "hidden">
<input name = "server" value = "$server" type = "hidden">
<input name = "offset" value = "$offset1" type = "hidden">
<input type = "submit" value = "See previous 5">
</form>
FORM1

print <<FORM2 if ( $end != $messages );
<form action = "fig17_11.pl" method = "post">
<input name = "userName" value = "$user" type = "hidden">
<input name = "password" value = "$password" type = "hidden">
<input name = "server" value = "$server" type = "hidden">
<input name = "offset" value = "$offset2" type = "hidden">
<input type = "submit" value = "See next 5">
</form>
FORM2

print( end_html() );

$pop->Close();

Gets the parameters the user entered in the login form
Count returns the number of messages in the inbox
Displays at most 5 messages at a time; $offset1 and $offset2 are the offsets of the previous and next groups of 5
Goes through the headers of each message, printing the From and Subject lines
The "See previous 5" and "See next 5" buttons resend the login data with the new offset in hidden fields
fig17_11.pl: Program Output (images not reproduced)
17.11 Searching the World Wide Web
• Searching
– A major application of the Web
– Perl has several modules for searching, such as WWW::Search
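Whatever module performs the search, a search request ends up as a URL whose query string carries the encoded search terms. A sketch using the URI module (installed with LWP); the endpoint search.example.com and the parameter names query and max are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

# Hypothetical search endpoint and parameter names, for illustration only
my $uri = URI->new( 'http://search.example.com/search' );

# query_form URL-encodes each key and value (spaces, "&", and so on)
$uri->query_form( query => 'perl & sockets', max => 10 );
print( $uri->as_string(), "\n" );

# Reading the form back decodes it into the original pairs
my %fields = $uri->query_form();
print( "query = $fields{query}, max = $fields{max}\n" );
```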
fig17_12.pl

#!/usr/bin/perl
# Fig. 17.12: fig17_12.pl
# Program to begin a Web search.

use strict;
use warnings;
use CGI qw( :standard );

print( header(), start_html( "Web Search" ) );

print( h1( "Search the Web!" ) );

print( start_form( -method => "post", -action => "fig17_13.pl" ) );

print( "Enter query: " );
print( textfield( "query" ), br(), br() );

print( "Enter number of sites you want " );
print( br() );
print( "from each search engine, 1-50: " );
print( textfield( "amount" ), br(), br() );

print( "<input type = \"checkbox\" " );
print( "name = \"AltaVista\" value = \"1\">" );
print( "AltaVista", br() );

print( "<input type = \"checkbox\" " );
print( "name = \"HotBot\" value = \"1\">" );
print( "HotBot", br() );

print( "<input type = \"checkbox\" " );
print( "name = \"WebCrawler\" value = \"1\">" );
print( "WebCrawler", br() );

print( "<input type = \"checkbox\" " );
print( "name = \"NorthernLight\" value = \"1\">" );
print( "NorthernLight", br() );

print( br(), submit( "Search!" ), end_form() );

print( end_html() );

Text field for the search query
Text field for the number of results wanted from each search engine
Checkboxes let the user choose which of the four search engines to use
fig17_12.pl: Program Output (image not reproduced)
fig17_13.pl

#!/usr/bin/perl
# Fig 17.13: fig17_13.pl
# A program that collects search results.

use strict;
use warnings;
use WWW::Search;
use CGI qw( :standard );

my @engines;

my $query = param( "query" );
my $amount = param( "amount" );

if ( !$query ) {
   print( header(), start_html() );
   print( h1( "Please try again." ) );
   print( "<a href = \"/cgi-bin/fig17_12.pl\">Go back</a>" );
   print( end_html() );
   exit();
}

if ( !$amount || $amount !~ /^\d+$/ || $amount > 50 ) {
   $amount = 5;
}

push( @engines, "AltaVista" ) if ( param( "AltaVista" ) );
push( @engines, "HotBot" ) if ( param( "HotBot" ) );
push( @engines, "WebCrawler" ) if ( param( "WebCrawler" ) );
push( @engines, "NorthernLight" ) if ( param( "NorthernLight" ) );

print( header() );
print( start_html( "Web Search" ) );

foreach my $engine ( @engines ) {
   my $search = WWW::Search->new( $engine );
   $search->native_query( WWW::Search::escape_query( $query ) );
   print( b( i( "Web sites found by $engine:" ) ), br() );

   for ( 1 .. $amount ) {
      # next_result returns undef when the engine has no more results
      my $result = $search->next_result() or last;
      my $value = $result->url();
      print( "<a href = \"$value\">$value</a>" );
      print( br() );
   }

   print( br() );
}

print( end_html() );

WWW::Search gives one interface to many search engines, each supported by its own back-end module
Displays a message and a link back to the form if the user did not enter a query
If the amount is missing, not a whole number, or greater than 50, it is set to 5
Adds each engine the user checked to @engines
Creates a search object for each engine and sends it the escaped query
Displays up to $amount result URLs from each engine