Information Science Blog Aggregation

Page 1: Information Science Blog Aggregation

ISBLOGS
building a better blog database

FRANCESCA GIANNETTI
ELLIE DICKSON
FRANNY GAEDE
DARIEN LARGE
VIRGINIA TRUEHEART

Page 2: Information Science Blog Aggregation

Goals

✦ Pull in RSS feeds to show article snippets & other info
✦ Create a tag cloud to offer an additional entry point to the collection

Create a resource for incoming and continuing iSchool students

Page 3: Information Science Blog Aggregation

Blog Curation
see also: sisyphean tasks

✦ Many (most?) incoming students are not info science people
✦ Info science is truly multi-disciplinary, blogosphere is doubleplusbig
✦ How to find the good stuff?
✦ Get your friends to find it for you
✦ Aren’t we friends?

Page 4: Information Science Blog Aggregation

Populating the DB
setting the table(s)

✦ Virginia the Architect structured the database.
✦ Look at all the table definitions. Look at ‘em.

‣ author
‣ blog
‣ blog_author
‣ blog_cat
‣ blog_maintainer
‣ category
‣ feed
‣ maintainer
‣ tag
‣ tag_blog
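The slide gives only the table names, so here is a minimal sketch of how a junction table such as blog_cat might tie blog to category. Every column name, the sample rows, and the helper blogs_in_category() are hypothetical, and an in-memory SQLite database (via PDO) is used only so the sketch runs on its own; the project's actual database engine and schema are not shown in the slides.

```php
<?php
// Sketch under assumptions: only the table names (blog, category, blog_cat)
// come from the slide. Columns, rows, and the helper are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT NOT NULL)');
$db->exec('CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL)');
// blog_cat is a junction table: one row per (blog, category) pair,
// so a blog can sit in several categories and vice versa.
$db->exec('CREATE TABLE blog_cat (
    blog_id INTEGER NOT NULL REFERENCES blog(id),
    category_id INTEGER NOT NULL REFERENCES category(id),
    PRIMARY KEY (blog_id, category_id)
)');

$db->exec("INSERT INTO blog (id, title) VALUES (1, 'Example Blog A'), (2, 'Example Blog B')");
$db->exec("INSERT INTO category (id, name) VALUES (1, 'archives'), (2, 'usability')");
$db->exec('INSERT INTO blog_cat (blog_id, category_id) VALUES (1, 1), (1, 2), (2, 2)');

// Return the titles of all blogs filed under one category name.
function blogs_in_category(PDO $db, string $category): array
{
    $stmt = $db->prepare(
        'SELECT b.title
           FROM blog b
           JOIN blog_cat bc ON bc.blog_id = b.id
           JOIN category c ON c.id = bc.category_id
          WHERE c.name = ?
          ORDER BY b.title'
    );
    $stmt->execute([$category]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

print_r(blogs_in_category($db, 'usability'));
```

The same pattern would presumably apply to blog_author, blog_maintainer, and tag_blog, each pairing blog with one other table.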

Page 5: Information Science Blog Aggregation

$toreturn['title'] = $article->find('title', 0)->plaintext;
$toreturn['pubDate'] = $article->find('published', 0)->plaintext;
$toreturn['link'] = $article->find('link', 0)->href;

// Try the entry's <summary> first; fall back to <description> if it is empty.
$articletext = trim($article->find('summary', 0)->xmltext);
if ($articletext == '') {
    print "<em style='background-color:yellow;'>Could not find article in content:encoded; trying description</em>";
    $articletext = $article->find('description', 0)->xmltext;
}

// Strip the literal "[...]" truncation marker (dots escaped so they match only dots).
$articletext = preg_replace('/\[\.\.\.\]/', "", $articletext);
// Strip images, iframes, src attributes, and opening div/span tags.
$articletext = preg_replace("/<img[^>]*\/>/", "", $articletext);
$articletext = preg_replace("/<iframe[^>]*>/", "", $articletext);
$articletext = preg_replace("/src *= *'[^']*'/", "", $articletext);
$articletext = preg_replace("/<div[^>]*>/", "", $articletext);
$articletext = preg_replace("/<span[^>]*>/", "", $articletext);

// End of the first paragraph (computed here but not used in this excerpt).
$firstparapos = strpos($articletext, "</p>");
$toreturn['text'] = $articletext;

$html->clear();
unset($html);
return $toreturn;
}
?>

RSS & PHP
acronym bros

✦ Select items to display by blog, category or maintainer
✦ Add & modify feed URLs for blogs
✦ Retrieve & display content from blog feeds

Page 6: Information Science Blog Aggregation

RSS & PHP
acronym bros

✦ Retrieve info from database
✦ Check format of URL
✦ RSS vs. Atom
✦ Retrieve contents as object
✦ Get latest item from contents
✦ Parse elements
✦ Search for text of latest blog entry
✦ Perform text processing
✦ Return the information found
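The middle of that list (RSS vs. Atom, get latest item, parse elements, text processing, return) can be sketched roughly as follows. This is not the project's code: it swaps in PHP's built-in SimpleXML in place of the parser seen in the slide excerpt, and the function names latest_entry() and clean_snippet(), the returned 'format' key, and the sample feeds are all made up for illustration.

```php
<?php
// Sketch only: SimpleXML stands in for the parser used in the slide code.
// latest_entry() and clean_snippet() are hypothetical names.

// Text processing: drop tags and the "[...]" truncation marker, tidy whitespace.
function clean_snippet(string $markup): string
{
    $text = strip_tags($markup);
    $text = str_replace('[...]', '', $text);
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Return title, date, link, and text of the first entry, or null if none is found.
function latest_entry(string $xml): ?array
{
    libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return null; // not well-formed XML
    }

    // RSS vs. Atom: RSS 2.0 has an <rss> root element, Atom has <feed>.
    if ($feed->getName() === 'rss') {
        $item = $feed->channel->item[0] ?? null;
        if ($item === null) {
            return null;
        }
        return [
            'format'  => 'rss',
            'title'   => trim((string) $item->title),
            'pubDate' => trim((string) $item->pubDate),
            'link'    => trim((string) $item->link),
            'text'    => clean_snippet((string) $item->description),
        ];
    }

    if ($feed->getName() === 'feed') {
        $entry = $feed->entry[0] ?? null;
        if ($entry === null) {
            return null;
        }
        // Prefer <summary>; fall back to the plain text of <content>.
        $text = (string) $entry->summary;
        if (trim($text) === '') {
            $text = (string) $entry->content;
        }
        return [
            'format'  => 'atom',
            'title'   => trim((string) $entry->title),
            'pubDate' => trim((string) $entry->published),
            'link'    => (string) $entry->link['href'], // Atom keeps the URL in an attribute
            'text'    => clean_snippet($text),
        ];
    }

    return null; // neither RSS nor Atom
}

// Made-up sample feeds.
$rss = '<rss version="2.0"><channel><title>T</title>'
     . '<item><title>Hello</title><link>http://example.org/a</link>'
     . '<pubDate>Sun, 01 Jan 2012 00:00:00 GMT</pubDate>'
     . '<description>&lt;p&gt;First post [...]&lt;/p&gt;</description></item>'
     . '</channel></rss>';
$atom = '<feed xmlns="http://www.w3.org/2005/Atom"><title>T</title>'
      . '<entry><title>Hi</title><link href="http://example.org/b"/>'
      . '<published>2012-01-01T00:00:00Z</published>'
      . '<summary>Short summary</summary></entry></feed>';

print_r(latest_entry($rss));
print_r(latest_entry($atom));
```

The sketch takes the first item in document order as the latest one, which is how most feeds are laid out but is not guaranteed, and it ignores Atom content that is embedded as XHTML child elements.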

Page 7: Information Science Blog Aggregation

Tag Cloud
chance of rain: 0%

✦ Ellie & Frankie installed and customized v-nessa.net’s PHP-based tag cloud like bosses

✦ And this is it:
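The cloud itself was an image on the slide. For readers who have not met one, here is a generic sketch of the usual idea: scale each tag's font size linearly with how often it is used. This is not the v-nessa.net code, and the function name tag_cloud(), the pixel range, the ?tag= link format, and the sample counts are assumptions made up for the example.

```php
<?php
// Generic tag-cloud sizing, NOT the v-nessa.net script the team installed.
// tag_cloud(), the px range, and the ?tag= URL scheme are hypothetical.
function tag_cloud(array $counts, int $minPx = 12, int $maxPx = 32): string
{
    if ($counts === []) {
        return '';
    }
    $low = min($counts);
    $spread = max($counts) - $low;
    ksort($counts); // list tags alphabetically

    $links = [];
    foreach ($counts as $tag => $count) {
        $tag = (string) $tag;
        // Weight runs from 0 (rarest tag) to 1 (most used); mid-size if all counts are equal.
        $weight = $spread > 0 ? ($count - $low) / $spread : 0.5;
        $size = (int) round($minPx + $weight * ($maxPx - $minPx));
        $links[] = sprintf(
            '<a href="?tag=%s" style="font-size:%dpx">%s</a>',
            rawurlencode($tag),
            $size,
            htmlspecialchars($tag)
        );
    }
    return implode(' ', $links);
}

echo tag_cloud(['metadata' => 5, 'archives' => 1, 'ux' => 3]);
```

With the tag and tag_blog tables, the counts would presumably come from counting tag_blog rows per tag.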

Page 8: Information Science Blog Aggregation

Putting it Together
bug squashing 4evar

✦ CSS is fun. And magic. And occasionally a pain in the ass
✦ RSS is not the most consistent medium
✦ Populating the tables through forms
✦ Backups through CRON
✦ Search