My Website Was Lost, But Now It’s Found

Frank McCown
CS 110 – Intro to Computer Science
April 23, 2007

Frank McCown

Education
  Ph.D. in Computer Science – Old Dominion Univ. (2007 expected)
  M.S. in Computer Science – Univ. of Arkansas in Little Rock (2002)
  B.S. in Computer Science – Harding University (1996)

Work Experience
  1997-2004 – Instructor of CS at Harding University (Searcy, AR)
  1996-1997 – Software Engineer for Lockheed Martin (Denver, CO)
  1995 – Software Engineer Intern for Auto-trol (Denver, CO)

Honors
  2007 – Outstanding Graduate Research Assistant
  2006 – College of Sciences Dissertation Fellowship
  2005 – Outstanding Graduate Assistant
  2004 – Dominion Scholar

Industry vs. Academia

A 2000 survey by The Scientist magazine asked its readers:

Overall, which environment do you prefer?

[Chart: values shown, in extracted order: 39.5%, 42.5%, 18.0%; labels shown, in extracted order: No preference, Academia, Industry]

73% of survey respondents had held research positions in both industry and academia.

http://www.the-scientist.com/2001/4/16/28/2/

Industry vs. Academia

Movement
  Academia → Industry is common
  Industry → Academia is very uncommon

Flexibility
  Schedule
  Focus

Compensation

Research Interests

Digital preservation
  Will we be able to see our websites 20 years from now?

Web crawling
  How can search engines and web archives duplicate/download our websites more efficiently and effectively?

Search engines
  How much/what content do commercial search engines index and cache?
  How synchronized are search engine APIs with what the general user sees?

Black hat: http://img.webpronews.com/securitypronews/110705blackhat.jpg
Virus image: http://polarboing.com/images/topics/misc/story.computer.virus_1137794805.jpg
Hard drive: http://www.datarecoveryspecialist.com/images/head-crash-2.jpg

Web Infrastructure

Cached Image

Warrick

First developed in fall of 2005
Available for download at http://www.cs.odu.edu/~fmccown/warrick/
www2006.org – first lost website reconstructed (Nov 2005)
DCkickball.org – first website someone else reconstructed without our help (late Jan 2006)
www.iclnet.org – first website we reconstructed for someone else (mid Mar 2006)
Internet Archive officially endorses Warrick (mid Mar 2006)

Warrick-related Publications

Frank McCown, Norou Diawara, and Michael L. Nelson. Factors Affecting Website Reconstruction from the Web Infrastructure. JCDL 2007. June 2007. Vancouver, British Columbia, Canada.

Catherine C. Marshall, Frank McCown, and Michael L. Nelson. Evaluating Personal Archiving Strategies for Internet-based Information. IS&T Archiving 2007. May 2007. Arlington, Virginia, USA.

Frank McCown and Michael L. Nelson. Characterization of Search Engine Caches. IS&T Archiving 2007. May 2007. Arlington, Virginia, USA.

Frank McCown, Joan A. Smith, Michael L. Nelson, and Johan Bollen. Lazy Preservation: Reconstructing Websites by Crawling the Crawlers. WIDM 2006. November 2006. Arlington, Virginia, USA.

Frank McCown and Michael L. Nelson. Evaluation of Crawling Policies for a Web-Repository Crawler. HYPERTEXT 2006. August 2006. Odense, Denmark.

Search Engine APIs

Frank McCown and Michael L. Nelson. Poster: Search Engines and Their Public Interfaces: Which APIs are the Most Synchronized? WWW 2007.

Frank McCown and Michael L. Nelson. Agreeing to Disagree: Search Engines and Their Public Interfaces. JCDL 2007.

Thank You

Questions?