
March 20, 2008
Electronic Resources and Libraries

College Center for Library Automation
Tallahassee, FL

• Susan B. Campbell ([email protected])
• Jim McGill ([email protected])


automating retrieval and reporting of database usage statistics for a consortium

• CCLA provides and maintains LINCC, the Library Information Network for Community Colleges, for Florida's 28 community colleges and their 65+ libraries.
• db statistics we’re collecting and reporting
  • 19 vendors
  • over 200 databases
  • monthly reports by database, campus, and statewide
  • on demand
• customers for monthly reports
  • 28 community colleges in Florida
  • internal reports
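As a rough illustration of the three monthly views, here is a minimal Perl sketch that totals one period's counts by database, by college, and statewide. The row fields echo the VDBStatistics columns used later in this presentation, but the rows themselves are made up; a per-campus view would be one more hash keyed the same way.

```perl
use strict;
use warnings;

# Made-up sample rows: one count per college, database and period.
my @rows = (
    { college => 'COLLEGE A', datasource => 'DATABASE 1', period => '200802', quantity => 120 },
    { college => 'COLLEGE B', datasource => 'DATABASE 1', period => '200802', quantity => 80 },
    { college => 'COLLEGE A', datasource => 'DATABASE 2', period => '200802', quantity => 45 },
    { college => 'COLLEGE A', datasource => 'DATABASE 1', period => '200801', quantity => 99 },
);

# Total one period's counts by database, by college, and statewide.
sub monthly_rollup {
    my ($period, @data) = @_;
    my (%by_database, %by_college);
    my $statewide = 0;
    for my $r (grep { $_->{period} eq $period } @data) {
        $by_database{ $r->{datasource} } += $r->{quantity};
        $by_college{ $r->{college} }     += $r->{quantity};
        $statewide                       += $r->{quantity};
    }
    return (\%by_database, \%by_college, $statewide);
}

my ($db, $col, $total) = monthly_rollup('200802', @rows);
printf "%-12s %5d\n", $_, $db->{$_}  for sort keys %$db;
printf "%-12s %5d\n", $_, $col->{$_} for sort keys %$col;
printf "%-12s %5d\n", 'statewide', $total;
```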


automating retrieval and reporting of database usage statistics for a consortium

• problem
  • what we were doing and why it doesn’t work
• solution
  • the pieces, the parts and how they fit together
• future
  • what we’ve learned and our expectations

problem: what we were doing and why it doesn’t work


the problem

• Excel excess


the problem

• vendor variety

repeat 28 times or more for each vendor (and sometimes for each database)


solution: the pieces, the parts and how they fit together


the solution
• automating
  • maintenance utilities
  • handling retrieved data
  • reporting in multiple formats
  • retrieval of vendor data


intranet web interface


Vendor not responding


reporting


creating retrieval scripts: “nuts and bolts”


[Diagram: one-time, 4-step process and automated process]

One-time, 4-step process:
• Process Trace File (ParseHTTPTrace.pl)
• Generic Web Page Retrieval (GetWebPage_VENDOR.pl)
• Manual Edits (for testing and first cleanup: remove everything that isn’t in the table. This is iterative and run from the command prompt until a satisfactory file is returned.)
• Parse Web Page Information (ProcessVENDOR.pl)

Automated process:
• Web Interface, Queue
• Automated Web Page Retrieval (GetWebPage_VENDOR.pl)
• Web Page Code (GetWebPage_VENDOR.html)
• Parse Web Page Information (ProcessVENDOR.pl)
• ProcessVENDOR.sql
• SQL Server Express (Parameters, Statistics)


This is a manual process that creates GetWebPage_VENDOR.pl, the Perl script that will accept variables.

Step 1. Capture HTTP headers

Process Trace File (ParseHTTPTrace.pl)
Generic Web Page Retrieval (GetWebPage_VENDOR.pl)
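Turning a trace into a script starts with finding each request in the captured headers. The sketch below is not ParseHTTPTrace.pl. It assumes the trace holds raw HTTP/1.1 request lines, each followed by a Host header (the real ieHTTPHeaders capture layout may differ), and lists each request's method and URL in order as a starting point for an LWP script. The host name and paths are placeholders.

```perl
use strict;
use warnings;

# Hypothetical trace excerpt; the real capture format may differ.
my $trace = <<'END';
GET /login.aspx HTTP/1.1
Host: stats.example.com
Accept: */*

POST /reports.aspx HTTP/1.1
Host: stats.example.com
Content-Type: application/x-www-form-urlencoded
END

# Return one [METHOD, URL] pair per request line found, in trace order.
sub requests_from_trace {
    my ($text) = @_;
    my (@requests, $current);
    for my $line (split /\r?\n/, $text) {
        if ($line =~ m{^(GET|POST)\s+(\S+)\s+HTTP/\d\.\d}) {
            $current = { method => $1, path => $2, host => '' };
            push @requests, $current;
        }
        elsif ($current && $line =~ /^Host:\s*(\S+)/i) {
            $current->{host} = $1;    # attach Host to the latest request
        }
    }
    return map { [ $_->{method}, "http://$_->{host}$_->{path}" ] } @requests;
}

print "$_->[0] $_->[1]\n" for requests_from_trace($trace);
```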


Step 2. Modify the Perl script to accept command-line variables

The period arrives in our standard YYYYMM format and is split into two separate variables, MM and YYYY, for the vendor URL:

    $Period=$ARGV[0];        # YYYYMM - our DB format
    $ScopeCustID=$ARGV[1];   # vendor-specific scope customer ID
    $UserName=$ARGV[2];
    $Password=$ARGV[3];

    #$ScopeCustID="bcc";     # remarks: unremarked for testing
    #$Period="200701";

    $yr=substr($Period,0,4);
    $mon=substr($Period,4,2);
    $mon=~s/^0//;            # months 01-09 lose their leading zero

Automated Web Page Retrieval (GetWebPage_VENDOR.pl)
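The same period handling can be written as a reusable subroutine with input checking added. split_period is a hypothetical name, not part of the original scripts; it rejects anything that is not a valid YYYYMM value before the URL pieces are built.

```perl
use strict;
use warnings;

# Split a YYYYMM period into (year, month), dropping the month's leading zero.
sub split_period {
    my ($period) = @_;
    $period = '' unless defined $period;
    die "period must be YYYYMM, got '$period'\n"
        unless $period =~ /^(\d{4})(0[1-9]|1[0-2])$/;
    my ($yr, $mon) = ($1, $2);
    $mon =~ s/^0//;    # "02" -> "2"; "10".."12" are unchanged
    return ($yr, $mon);
}

# Usage mirrors the command-line order in step 2:
#   perl GetWebPage_VENDOR.pl 200802 <scope-cust-id> <user> <password>
my ($yr, $mon) = split_period(@ARGV ? $ARGV[0] : '200802');
print "year=$yr month=$mon\n";
```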


Step 3. Modify the script to use the command-line variables and parse runtime variables

    ... iodFromMonth=' . $mon . '&timePeriodFromYear=' . $yr . '&timeP ...
    $content0=$resp5->content;

    # SECURITY CODES: some codes are session based and must be parsed out
    # of each page to pass to subsequent pages.

    # __VIEWSTATE: skip past the field name, find value=", take the text up to
    # the closing quote before "/>", then percent-encode / + and =
    $pos=index($content0,"VIEWSTATE")+13;
    $pos2=substr($content0,$pos,5000);
    $pos3=index($pos2,"value")+7;
    $pos4=index($pos2,"\/>");
    $VIEWSTATE=substr($pos2,$pos3,$pos4-$pos3-2);
    $VIEWSTATE=~s/\//\%2F/gi;
    $VIEWSTATE=~s/\+/\%2B/gi;
    $VIEWSTATE=~s/\=/\%3D/gi;

    # __EVENTVALIDATION: same approach with a 2000-character window
    $pos=index($content0,"EVENTVALIDATION")+13;
    $pos2=substr($content0,$pos,2000);
    $pos3=index($pos2,"value")+7;
    $pos4=index($pos2,"\/>");
    $EVENTVALIDATION=substr($pos2,$pos3,$pos4-$pos3-2);
    $EVENTVALIDATION=~s/\//\%2F/gi;
    $EVENTVALIDATION=~s/\+/\%2B/gi;
    $EVENTVALIDATION=~s/\=/\%3D/gi;

Automated Web Page Retrieval (GetWebPage_VENDOR.pl)
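A sketch of the same security-code extraction using a regular expression instead of index/substr offsets, assuming the codes are ASP.NET hidden inputs of the form name="__VIEWSTATE" ... value="..." with the name attribute before the value. hidden_field and url_encode are hypothetical helpers. url_encode escapes every character outside the URL-unreserved set, which for base64 values such as __VIEWSTATE gives the same /, + and = escapes the original does by hand.

```perl
use strict;
use warnings;

# Return the value of the hidden input named "__$name", or undef if absent.
sub hidden_field {
    my ($html, $name) = @_;
    return $1
        if $html =~ /<input[^>]*\bname="__\Q$name\E"[^>]*\bvalue="([^"]*)"/i;
    return;
}

# Percent-encode every character outside A-Z a-z 0-9 - . _ ~
sub url_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

my $page = '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDw+/ab=" />';
my $vs = hidden_field($page, 'VIEWSTATE');
print url_encode($vs), "\n";    # dDw%2B%2Fab%3D
```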


Step 4. Create page parser (part 1)

Parse Web Page Information (ProcessVENDOR.pl)

Creating the ProcessVENDOR.pl script:

    $col=$ARGV[0];              # college name, when needed
    $vendor="vendorname";       # vendor name (anonymized for this presentation)
    $VDBSuffix="VENDOR";
    $jumpin="<b>Site:";         # point to begin processing the file
    $jumpout="Grand Total";     # point to stop processing the file
    require ("../VDBProcs.pl"); # include file with needed subroutines
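The $jumpin/$jumpout idea in isolation: keep only the text from the start marker up to, but not including, the stop marker. VDBProcs.pl itself is not shown in the presentation, so between_markers is a hypothetical stand-in, and keeping the remainder when no stop marker is found is an assumption.

```perl
use strict;
use warnings;

# Return the text from the start marker up to, but not including, the stop
# marker. Returns '' if the start marker is missing; keeps the rest of the
# text if the stop marker is missing.
sub between_markers {
    my ($text, $jumpin, $jumpout) = @_;
    my $start = index($text, $jumpin);
    return '' if $start < 0;
    my $end = index($text, $jumpout, $start);
    $end = length $text if $end < 0;
    return substr($text, $start, $end - $start);
}

my $page = '<html>menu<b>Site: COLLEGE A</b>'
         . '<tr><td>Sessions</td><td>4348</td></tr>Grand Total 4348</html>';
print between_markers($page, '<b>Site:', 'Grand Total'), "\n";
```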


Step 4. Create page parser (part 2)

Parse Web Page Information (ProcessVENDOR.pl)

Procedures called from the common include file, VDBProcs.pl:
• htmlclean()
• htmltotxt()
• getperiod()
• writestats()
• validation()

After processing, each table row is on one line with all carriage returns, linefeeds, and tabs removed. Blank lines and page feeds are not output, and code outside the $jumpin/$jumpout markers is ignored. Period, college name and other variables are passed from the database by VDBProcs.pl.

Validation is run on the SQL log file to look for error messages and write them to a log. Entries are made for no data, a change from a previously retrieved value for the period, or other potential problems.
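A minimal sketch of the cleanup and validation described above, using hypothetical helpers rather than the actual htmlclean()/validation() code: flatten a page to one table row per line, pick SQL Server error lines ("Msg N, Level N, ...") out of a load log, and flag missing or changed values. The sample page, log lines and labels are made up.

```perl
use strict;
use warnings;

# Remove CR, LF, tab and form-feed characters, start a new line at each <tr>,
# and drop blank lines, so each table row sits on one line.
sub rows_one_per_line {
    my ($html) = @_;
    $html =~ s/[\r\n\t\f]+//g;
    $html =~ s/(<tr\b)/\n$1/gi;
    return grep { /\S/ } split /\n/, $html;
}

# Keep log lines that look like SQL Server error messages.
sub sql_errors {
    return grep { /^Msg \d+, Level \d+/ } @_;
}

# Return a log entry for missing data or for a value that differs from one
# retrieved earlier; return nothing when the value looks fine.
sub check_value {
    my ($label, $new, $previous) = @_;
    return "$label: no data" if !defined $new || $new eq '';
    return "$label: changed from $previous to $new"
        if defined $previous && $previous != $new;
    return;
}

my $page = "<table>\n<tr>\n\t<td>Sessions</td>\n\t<td>4348</td>\n</tr>\n"
         . "<tr><td>Searches</td><td>9120</td></tr>\n</table>";
print "$_\n" for rows_one_per_line($page);

my @log = ('(1 rows affected)', 'Msg 208, Level 16, State 1, Line 1');
print "$_\n" for sql_errors(@log);
print check_value('Sessions 200802', 4400, 4348), "\n";
```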


automated process

[Diagram: automated process]
• Web Interface, Queue
• Automated Web Page Retrieval (GetWebPage_VENDOR.pl)
• Web Page Code (GetWebPage_VENDOR.html)
• Parse Web Page Information (ProcessVENDOR.pl)
• ProcessVENDOR.sql
• SQL Server Express (Parameters, Statistics)
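One way a queued request could drive the two per-vendor scripts, sketched under assumptions: the queue field names are invented, GetWebPage_VENDOR.pl takes its arguments in the step 2 $ARGV order, and ProcessVENDOR.pl takes the college name as $ARGV[0] when needed, as in step 4. The sketch only builds the command lines; the actual batch file called from SQL Server is not shown in the presentation.

```perl
use strict;
use warnings;

# Hypothetical queue entry; field names are illustrative only.
my @queue = (
    { vendor => 'VENDOR', period => '200802', custid => 'bcc',
      user => 'someuser', password => 'secret', college => 'VALENCIA COMM COLLEGE' },
);

# Build the retrieval command, then the parser command, for one request.
sub commands_for {
    my ($job) = @_;
    my $v = $job->{vendor};
    my @college = defined $job->{college} ? ($job->{college}) : ();
    return (
        [ 'perl', "GetWebPage_$v.pl", @{$job}{qw(period custid user password)} ],
        [ 'perl', "Process$v.pl", @college ],
    );
}

for my $job (@queue) {
    for my $cmd (commands_for($job)) {
        # A real runner would call system(@$cmd) and log a non-zero exit;
        # only the script name is printed here to keep credentials out of output.
        print "would run $cmd->[1]\n";
    }
}
```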


handling retrieved data

ProcessVENDOR.sql:

    delete from VDBStatistics
     where vendor='VENDOR' and college='VALENCIA COMM COLLEGE'
       and datasource='SOME VENDOR DATABASE' and datatype='Sessions'
       and subdatatype='0' and period='200802'
    insert into VDBStatistics
           ( sourcefile, vendor, college, period, datatype, subdatatype, datasource, quantity )
    values ('ProcessVENDOR.sql','VENDOR','VALENCIA COMM COLLEGE','200802','Sessions','0','SOME VENDOR DATABASE','4348')
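The delete-then-insert pair lets a period be retrieved again without duplicating rows. A sketch of generating it in Perl, with hypothetical sql_str and stat_sql helpers; sql_str doubles embedded single quotes (T-SQL string escaping) so a name such as O'NEIL cannot break the statement. With the values above it produces the same two statements, one per line.

```perl
use strict;
use warnings;

# Quote a value as a T-SQL string literal, doubling embedded single quotes.
sub sql_str {
    my ($v) = @_;
    $v =~ s/'/''/g;
    return "'$v'";
}

# Build the delete/insert pair for one statistic.
sub stat_sql {
    my (%s) = @_;
    my @key  = qw(vendor college datasource datatype subdatatype period);
    my @cols = qw(sourcefile vendor college period datatype subdatatype datasource quantity);
    my $where = join ' and ', map { my $k = $_; "$k=" . sql_str($s{$k}) } @key;
    return "delete from VDBStatistics where $where\n"
         . 'insert into VDBStatistics ( ' . join(', ', @cols) . ' ) values ('
         . join(',', map { sql_str($s{$_}) } @cols) . ")\n";
}

print stat_sql(
    sourcefile => 'ProcessVENDOR.sql', vendor => 'VENDOR',
    college    => 'VALENCIA COMM COLLEGE', period => '200802',
    datatype   => 'Sessions', subdatatype => '0',
    datasource => 'SOME VENDOR DATABASE', quantity => '4348',
);
```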


handling retrieved data

• where/how we store what we retrieve


handling retrieved data

• daily backup of the database via Windows scheduler*

* SQL Server Express does not support SQL Agent
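Because SQL Server Express has no SQL Agent, the daily backup has to be started from outside the server. A sketch of a Perl helper that builds a date-stamped sqlcmd command a Windows scheduled task could run; the database name VDB, the .\SQLEXPRESS instance and the backup folder are placeholders, and the presentation does not say which tool the scheduled job actually calls.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build a sqlcmd command that backs up $db to a file named <db>_YYYYMMDD.bak
# in $dir. @time is a localtime-style list (sec, min, hour, mday, mon, year).
sub backup_command {
    my ($db, $dir, @time) = @_;
    my $stamp = strftime('%Y%m%d', @time);
    my $sql = "BACKUP DATABASE [$db] TO DISK = N'$dir\\${db}_$stamp.bak' WITH INIT";
    return qq{sqlcmd -S .\\SQLEXPRESS -E -Q "$sql"};
}

print backup_command('VDB', 'D:\\Backups', localtime), "\n";
```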


tools


software used
• retrieval of data – free
  • Internet Explorer
  • Perl
  • LWP library (Library for the WWW for Perl)
  • ieHTTPHeaders
  • ParseHTTPTrace.pl
  • SQL Server Express and manager
  • Intranet site (IIS, .asp, VBScript, Java)
• reporting – some cost
  • EZView (low cost)
  • Crystal Reports (had it)


structure
• environment
  • each vendor has its own working directory
  • each vendor has several files in this directory:
    • batch file (called from SQL Server)
    • Perl script (gets web page)
    • Perl script (makes SQL to load data)
    • log files (for troubleshooting)


retrieval of vendor data
• ActivePerl 5.8.6 build 811 to download web pages
• run from the command prompt in development and testing
• ieHTTPHeaders: an add-on for IE that displays HTTP headers
  http://www.blunck.se/iehttpheaders/iehttpheaders.html
• once the trace file is captured with the ieHTTPHeaders add-on, use ParseHTTPTrace.pl to create the GetWebPage_VENDOR.pl file
  http://www.codeproject.com/KB/perl/webautomaton.aspx


future: what we’ve learned and our expectations


what have we learned?
• large change in service requires staffing and support
  • project name should be closely related to the service
  • administration understanding of needs
  • assignment of priorities
  • proof-of-concept
  • need for ongoing support: vendor changes, local needs
• moving from proof-of-concept is NOT trivial
  • data checking/revisions/data checking/revisions
  • handoff from development to maintenance


expectations
• future use
  • until SUSHI is widespread OR
  • until data collection and reporting in ERM products is mature OR
  • until existing automated systems have reasonable consortial pricing
• future plans
  • customer/college interface
  • hope…


Thank you

College Center for Library Automation

1753 W. Paul Dirac Drive

Tallahassee, Florida 32310

Susan Campbell [email protected]

Jim McGill [email protected]