Making Mashups with Marmite
![Page 1: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/1.jpg)
Making Mashups with Marmite
Jeff Wong, Jason I. Hong
Carnegie Mellon University
![Page 2: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/2.jpg)
The Big Picture Problem
• Lots of content out there on the web
– But not always in a form amenable to your needs
– Ex. Easy to get a list of hotels in San Jose, not so easy to sort by distance to convention center
• Two observations:
– In many cases, all of the data and services people need already exist, but not connected together
– Unlikely that a web site can predict all possible needs
![Page 3: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/3.jpg)
A Solution: Mashups
• Rapidly growing community of users creating “mashups” combining content from multiple web sites
– Ex. Housingmaps.com
![Page 4: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/4.jpg)
![Page 5: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/5.jpg)
![Page 6: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/6.jpg)
![Page 8: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/8.jpg)
A Solution: Mashups
• Rapidly growing community of users creating “mashups” combining content from multiple web sites
– Ex. Housingmaps.com
– Ex. MySpace child predators
– Ex. Friendster locations
– Ex. Most popular videos on YouTube, Yahoo Video, …
• ProgrammableWeb.com statistics
– ~1500 mashups created since April 2005
– 356 open web-based APIs available
![Page 9: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/9.jpg)
But Creating Mashups is Hard
• Requires lots of skill to create a mashup
– Ex. Housingmaps creator has PhD in computer science
– Ex. MySpace child predator list took months
• Requires programming expertise in many areas
– Web crawling
– Text parsing
– Pattern matching
– Databases
– HTML
![Page 10: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/10.jpg)
Marmite: End-User Programming for Mashups
• Main idea: make it easy to create web mashups
• Use a dataflow approach connecting small operators
– Inspired by Unix pipes and Apple’s Automator
• Example:
– Get all events from Upcoming.org
– Filter out events that are too old
– Put them all onto a map
• Runs inside of a standard web browser
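The example flow above (get events, drop old ones, put the rest on a map) can be sketched as a chain of small operators, much like a Unix pipe. This is a minimal illustration in Python rather than Marmite's own browser-based code: `get_events`, `drop_events_before`, `to_map_markers`, and `run_flow` are hypothetical names, and the event rows are made-up sample data standing in for an Upcoming.org query.

```python
from datetime import date
from functools import reduce


def get_events():
    """Source: hypothetical sample rows standing in for an event feed."""
    return [
        {"name": "Jazz night", "date": date(2007, 4, 20), "address": "4400 Forbes Ave"},
        {"name": "Old talk", "date": date(2007, 3, 1), "address": "5000 Forbes Ave"},
    ]


def drop_events_before(cutoff):
    """Build a filter operator that keeps events on or after the cutoff date."""
    def op(rows):
        return [row for row in rows if row["date"] >= cutoff]
    return op


def to_map_markers(rows):
    """Sink stand-in: turn each row into a text marker instead of a real map pin."""
    return [f'{row["name"]} @ {row["address"]}' for row in rows]


def run_flow(source, *operators):
    """Feed the source's rows through each operator in order."""
    return reduce(lambda rows, op: op(rows), operators, source())


markers = run_flow(get_events, drop_events_before(date(2007, 4, 1)), to_map_markers)
print(markers)
```

Each operator takes a table of rows and returns a new one, so operators can be reordered or swapped without touching the rest of the flow.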
![Page 11: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/11.jpg)
Set of Operators
![Page 12: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/12.jpg)
Data Flow View
![Page 13: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/13.jpg)
Data View
![Page 14: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/14.jpg)
Using Marmite (Envisioned)
• Extract content from one or more web pages
– names, addresses, dates, phone #, URLs
• Process it in a data flow manner
– filtering out values or adding metadata
– integrating with other data sources (similar to a database join operation)
• Direct the output to a variety of sinks
– databases, map services, text files, visualizations, web pages, or source code that can be further edited
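The "similar to a database join" step could look roughly like the sketch below: rows from two sources are combined when they share a key. The function name `join_rows` and the hotel and area data are assumptions made for illustration; the slides do not say how Marmite performs the integration.

```python
def join_rows(left, right, key):
    """Inner join: merge each left row with every right row sharing the same key value."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical data from two different sources.
hotels = [{"name": "Hotel A", "zip": "95113"}, {"name": "Hotel B", "zip": "95110"}]
areas = [{"zip": "95113", "area": "Downtown"}]

result = join_rows(hotels, areas, "zip")
print(result)
```

Rows without a match on the other side are dropped here; an outer join that keeps them would be an equally reasonable choice.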
![Page 15: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/15.jpg)
Marmite
• Motivation and Examples
• Features and Design Rationale
• User Evaluation
![Page 16: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/16.jpg)
Features and Design Rationale
• Conducted a series of quick evaluations to understand design space and potential problems
– Automator
– Lo-fi prototypes
![Page 17: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/17.jpg)
Automator
![Page 18: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/18.jpg)
Informal Automator Evaluation
• Had three novices try three simple web-based tasks
– Warm-up task
– Traverse a set of web pages
– Download a set of images
• Some findings:
– Some difficulties knowing how to start and what to do next
– Little feedback about state of system between operations
– Difficult to iterate due to network speed issues
![Page 19: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/19.jpg)
Lo-Fi Prototypes
• 6 paper prototypes with 20 participants
![Page 20: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/20.jpg)
Design Solutions
• Problem: how to start and what to do next
• Solution: Suggest next actions
– Weak data typing to find types (addresses, numbers, etc)
– Filter operators to only show relevant ones
– Suggest operators that might be applicable
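One way to picture weak data typing driving suggestions: guess each column's type from its values with simple patterns, then offer only the operators whose input type appears in the table. The patterns, the `OPERATOR_INPUTS` table, and the operator names below are assumptions for this sketch, not Marmite's actual rules.

```python
import re

# Deliberately loose patterns: "weak" typing only needs a plausible guess.
TYPE_PATTERNS = {
    "number": re.compile(r"^-?\d+(\.\d+)?$"),
    "url": re.compile(r"^https?://\S+$"),
    "address": re.compile(r"^\d+\s+\w+.*\b(St|Ave|Blvd|Rd|Dr)\b", re.IGNORECASE),
}

# Hypothetical operators and the input type each one expects.
OPERATOR_INPUTS = {"Geocode": "address", "Filter by number": "number", "Fetch page": "url"}


def guess_type(values):
    """Return the first type whose pattern matches every value, else 'text'."""
    for type_name, pattern in TYPE_PATTERNS.items():
        if values and all(pattern.match(value) for value in values):
            return type_name
    return "text"


def suggest_operators(table):
    """List operators whose expected input type is present in some column."""
    column_types = {guess_type(values) for values in table.values()}
    return sorted(name for name, needed in OPERATOR_INPUTS.items() if needed in column_types)


table = {
    "where": ["5000 Forbes Ave", "417 S Craig St"],
    "price": ["1200", "950.50"],
}
suggestions = suggest_operators(table)
print(suggestions)
```

Because the guesses are heuristic, a real tool would presumably let the user override a wrong type rather than treat it as final.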
![Page 21: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/21.jpg)
![Page 22: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/22.jpg)
Design Solutions
• Problem: little feedback about state of system between operations
• Solution: link data flow and data view together
– Many systems take program-centric view (ex. Automator) or data-centric view (ex. spreadsheets)
– Use hybrid data flow / data view, showing an operation and its effects together
– Data view usually “spreadsheet”, other views possible too (for example, maps)
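The hybrid idea can be approximated by keeping a snapshot of the table after every operator, so each step in the flow can be displayed next to its effect on the data. The helper `run_with_views` and the sample listings are hypothetical; this only illustrates the pairing of operation and result, not Marmite's UI.

```python
def run_with_views(rows, operators):
    """Run named operators in order, recording the table after each one."""
    views = []
    for name, op in operators:
        rows = op(rows)
        views.append((name, list(rows)))
    return views


# Hypothetical two-step flow over made-up listings.
flow = [
    ("keep cheap", lambda rows: [r for r in rows if r["price"] < 1000]),
    ("add label", lambda rows: [{**r, "label": f'${r["price"]}'} for r in rows]),
]
listings = [{"price": 950}, {"price": 1200}]

views = run_with_views(listings, flow)
for name, snapshot in views:
    print(name, "->", snapshot)
```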
![Page 23: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/23.jpg)
![Page 24: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/24.jpg)
![Page 25: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/25.jpg)
Design Solutions
• Problem: difficult to iterate due to network speeds
• Solution: cache data, let people “replay” data
– Reload, pause, play
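A rough sketch of the caching idea: a source fetches over the network once, replays the cached rows on later runs, and only refetches on an explicit reload. `CachedSource` and `slow_fetch` are assumed names, and pause/play are not modeled here.

```python
class CachedSource:
    """Wrap a slow fetch function so repeated runs replay cached rows."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = None
        self.fetch_count = 0  # how many times the slow fetch actually ran

    def rows(self):
        """Return cached rows, fetching only if nothing is cached yet."""
        if self._cache is None:
            self.reload()
        return list(self._cache)

    def reload(self):
        """Discard the cache and fetch fresh rows."""
        self._cache = list(self._fetch())
        self.fetch_count += 1


def slow_fetch():
    """Hypothetical stand-in for a slow network request."""
    return [{"title": "Event A"}, {"title": "Event B"}]


source = CachedSource(slow_fetch)
first = source.rows()
second = source.rows()  # replayed from the cache, no second fetch
print(source.fetch_count)
```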
![Page 26: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/26.jpg)
Other Design Findings
• Screen real estate issues
– Collapsible operators, leaving a readable label
![Page 27: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/27.jpg)
Extracting Generic Content
• Can’t have pre-defined extractor operators for every possible web site
– Need a more general way of extracting data from pages
• Developed a generic wizard UI for selecting links
– Content from that set could be extracted via other operators
– Uses Solvent (MIT), an XPath-based algorithm for finding patterns in web pages
• Finds “groups” of related web content based on how HTML is structured
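To give a feel for structure-based grouping, the sketch below groups links by the chain of tags that encloses them, a much-simplified stand-in for an XPath. It is not Solvent's algorithm; the `LinkGrouper` class and the sample page are assumptions, and it expects reasonably well-formed HTML.

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkGrouper(HTMLParser):
    """Group link hrefs by the tag path (e.g. div/ul/li/a) leading to them."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.groups = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.groups["/".join(self.stack)].append(href)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, skipping any unclosed tags above it.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


# Hypothetical page: two listing links share a structure, the third does not.
page = """
<div><ul>
  <li><a href="/listing/1">One</a></li>
  <li><a href="/listing/2">Two</a></li>
</ul></div>
<p><a href="/about">About</a></p>
"""

grouper = LinkGrouper()
grouper.feed(page)
groups = dict(grouper.groups)
print(groups)
```

Links that sit at the same position in repeated markup end up in the same group, which is the intuition behind treating them as one set of related content.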
![Page 28: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/28.jpg)
Marmite
![Page 29: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/29.jpg)
Operators
• Operators have input types
– Operator uses this to guess which columns it wants
• Operators have output types
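A minimal sketch of how a declared input type might let an operator guess its column: the operator scans the table's column types and picks the first match. The `Operator` dataclass, its field names, and the type labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Operator:
    name: str
    input_type: str   # type of column the operator consumes
    output_type: str  # type of column the operator produces

    def pick_column(self, column_types):
        """Return the first column whose type matches input_type, or None."""
        for column, col_type in column_types.items():
            if col_type == self.input_type:
                return column
        return None


geocode = Operator("Geocode", input_type="address", output_type="lat/long")
column_types = {"title": "text", "where": "address", "price": "number"}

chosen = geocode.pick_column(column_types)
print(chosen)
```

Since this is only a guess, the user would still need a way to choose a different column when several match or none do.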
![Page 30: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/30.jpg)
Implementation
• JavaScript (for underlying code) and XML Binding Language (XBL) for UI
• Operators currently in JavaScript
– Ideally could be scriptable in any programming language
– Currently ~15 operators
![Page 31: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/31.jpg)
Marmite
• Motivation and Examples
• Features and Design Rationale
• User Evaluation
![Page 32: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/32.jpg)
Evaluation
• Informal user study with 6 people
– 2 novices
– 2 people with spreadsheet experience (formulas)
– 2 people with programming experience
• Tasks (in increasing difficulty)
– Warmup task showing how to retrieve a set of addresses and how to geocode an address
– Search for and filter out events further than a week away
– Compile a list of events from two event services and plot them on a map
– Recreate the housingmaps site
![Page 33: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/33.jpg)
Results
• Three people able to complete all tasks in ~1 hour
– First two users confused about suggested actions (automatically popped up, made manual for other 4 users)
– Novice made some progress, not able to finish all tasks
• Able to re-create housingmaps in ~15 minutes
![Page 34: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/34.jpg)
Marmite
![Page 35: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/35.jpg)
More Results
• Biggest barrier was understanding the data flow
– Did not understand input and output concept
– Applied operators as one-off, did not realize that it was a static representation of flow
– Did not understand data flow and data view were linked
![Page 36: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/36.jpg)
Future Directions
• Short-term
– Better screen-scraping operators
– More operators
– Better connection with web services (WSDL and REST)
– Better help for starting a data flow
• Long-term
– Intelligence analysis
– Better visualizations
– Location-based services
![Page 37: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/37.jpg)
Conclusions
• Marmite, a tool for creating web-based mashups
– Extract content from one or more web pages
– Process it in a data flow manner
– Direct the output to a variety of sinks
• Hybrid data flow / data view
• User evaluation shows some promising results
Jeff Wong, Jason Hong, Making Mashups with Marmite: Re-purposing Web Content through End-User Programming, CHI 2007
![Page 38: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/38.jpg)
![Page 39: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/39.jpg)
![Page 40: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/40.jpg)
![Page 41: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/41.jpg)
Marmite
![Page 42: Making Mashups with Marmite](https://reader030.fdocuments.net/reader030/viewer/2022032612/56812f2f550346895d94c2dd/html5/thumbnails/42.jpg)
Types of Operators
• Sources
– Add data into Marmite by querying databases, extracting information from web pages, and so on.
• Processors
– Modify, combine, or delete existing rows. Example operators include geocoding (converting street addresses to latitude and longitude) and filtering. Processor operators might add or remove columns as well.
• Sinks
– Redirect the flow of data out of Marmite. Examples include showing data on a map, saving it to a file, or sending it to a web page.
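The three operator types can be illustrated with one of each: a source that produces rows, a geocoding processor that adds columns and drops rows it cannot resolve, and a sink that writes CSV text. All names here are hypothetical, and `KNOWN_COORDS` is a made-up lookup table standing in for a real geocoding service.

```python
import csv
import io

# Made-up coordinates; a real processor would call a geocoding service.
KNOWN_COORDS = {"5000 Forbes Ave": (40.4443, -79.9436)}


def source():
    """Source: add rows into the flow (hypothetical sample data)."""
    return [
        {"name": "Cafe", "address": "5000 Forbes Ave"},
        {"name": "Shop", "address": "Unknown Rd"},
    ]


def geocode(rows):
    """Processor: add lat/lng columns, dropping rows with unknown addresses."""
    out = []
    for row in rows:
        coords = KNOWN_COORDS.get(row["address"])
        if coords is None:
            continue
        out.append({**row, "lat": coords[0], "lng": coords[1]})
    return out


def csv_sink(rows):
    """Sink: send the rows out of the flow as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "address", "lat", "lng"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


output = csv_sink(geocode(source()))
print(output)
```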