TECHNIQUES FOR VISUALIZING MASSIVE DATA SETS Leilani Battle, Mike Stonebraker.

Context

[Diagram: the Visualization System sends a query to the Database, and the Database returns a result.]

Problem

• Performance
  • Vis systems don’t scale well for big data
  • Or are turning into databases

• Over-plotting
  • Makes visualizations unreadable
  • Waste of time/resources

Solution: Resolution Reduction

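[Diagram: a Resolution Reduction Layer sits between the Visualization System and the Database. The Visualization System sends its query to the layer. The layer sends a query plan query to the Database and receives a query plan result, then sends a modified query and receives a reduced result, which goes back to the Visualization System.]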
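
The decision the Resolution Reduction Layer makes can be sketched roughly as follows. This is a minimal illustration, not ScalaR's actual code: it assumes the query plan result yields an estimated cell count, and the names `plan_reduction`, `aggregate`, and `max_cells` are hypothetical. Averaging is only one possible reduction; sampling or filtering would fit the same shape.

```python
from typing import List


def plan_reduction(estimated_cells: int, max_cells: int) -> int:
    """Return an aggregation factor; 1 means the query can run unmodified.

    `estimated_cells` stands in for the size estimate taken from the
    query plan result (an assumption of this sketch).
    """
    if estimated_cells <= max_cells:
        return 1
    # Ceiling division: smallest factor that brings the result under the cap.
    return -(-estimated_cells // max_cells)


def aggregate(values: List[float], factor: int) -> List[float]:
    """Average each run of `factor` consecutive values into one value."""
    reduced = []
    for start in range(0, len(values), factor):
        group = values[start:start + factor]
        reduced.append(sum(group) / len(group))
    return reduced


if __name__ == "__main__":
    data = [float(i) for i in range(1000)]
    factor = plan_reduction(estimated_cells=len(data), max_cells=100)
    print(factor, len(aggregate(data, factor)))
```

In a real deployment the aggregation would be pushed into the modified query so the database does the work; the in-memory `aggregate` here only shows the effect on the result.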

ScalaR

• Scalable vis system for data exploration
  • Web front-end
  • Uses SciDB (www.scidb.org)

• Visualizes query results
  • Performs Resolution Reduction

Demo of ScalaR

Array Browser

• Collaboration with:
  • Brown: Justin DeBrabant, Stan Zdonik, Ugur Cetintemel
  • Stanford: Zhicheng Liu, Jeff Heer

• Google Maps-style exploration experience
  • Fetches subsets of the data (aka data tiles)
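
To make the data-tile idea concrete, here is a small sketch of how a viewport could be mapped to the tiles it needs. The tile key layout `(zoom, row, col)`, the function name `tiles_for_viewport`, and the fixed square `tile_size` are all assumptions for illustration; the slides do not describe the Array Browser's actual tiling scheme.

```python
from typing import List, Tuple

TileKey = Tuple[int, int, int]  # (zoom level, tile row, tile column)


def tiles_for_viewport(zoom: int, x0: int, y0: int, x1: int, y1: int,
                       tile_size: int) -> List[TileKey]:
    """List the tiles overlapping the half-open viewport [x0, x1) x [y0, y1).

    Coordinates are array cell indices at the given zoom level.
    """
    first_col, last_col = x0 // tile_size, (x1 - 1) // tile_size
    first_row, last_row = y0 // tile_size, (y1 - 1) // tile_size
    return [(zoom, row, col)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


if __name__ == "__main__":
    # A 400-cell-wide viewport over 256-cell tiles touches three columns.
    print(tiles_for_viewport(2, 200, 0, 600, 256, tile_size=256))
```

Panning or zooming then amounts to requesting a different set of keys, so only the tiles not already held need to be fetched.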

Array Browser Example

Array Browser Architecture

Demo of Array Browser

Future Work: Prefetching

• Goal: Reduce user wait time by prefetching tiles
  • Cache tiles in the tile buffer
  • Need algorithms to decide what to prefetch
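
A tile buffer with a prefetch hook might look like the sketch below. The class name `TileBuffer`, the `fetch` callback, and the least-recently-used eviction are assumptions chosen to keep the example short; the talk leaves the eviction policy as an open question.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, Iterable


class TileBuffer:
    """Bounded tile cache; evicts the least recently used tile (an assumption)."""

    def __init__(self, capacity: int, fetch: Callable[[Hashable], Any]):
        self.capacity = capacity
        self.fetch = fetch          # hypothetical callback that queries the database
        self._tiles: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _store(self, key: Hashable, tile: Any) -> None:
        self._tiles[key] = tile
        self._tiles.move_to_end(key)
        while len(self._tiles) > self.capacity:
            self._tiles.popitem(last=False)  # drop the oldest entry

    def get(self, key: Hashable) -> Any:
        """Serve a user request, fetching from the database only on a miss."""
        if key in self._tiles:
            self.hits += 1
            self._tiles.move_to_end(key)
            return self._tiles[key]
        self.misses += 1
        tile = self.fetch(key)
        self._store(key, tile)
        return tile

    def prefetch(self, keys: Iterable[Hashable]) -> None:
        """Load predicted tiles ahead of time without counting hits or misses."""
        for key in keys:
            if key not in self._tiles:
                self._store(key, self.fetch(key))
```

A predictor supplies the keys passed to `prefetch`; a later `get` for one of those keys is then a hit and the user does not wait on the database.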

User Behavior Predictor (Seer)

• Learn common query sequences from user traces
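
One simple way to learn query sequences from traces is a first-order transition count, sketched below. This is an illustrative stand-in, not the model behind the predictor in the talk: the class name `MovePredictor` and the move labels are hypothetical, and a real predictor could condition on longer histories.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Optional, Sequence


class MovePredictor:
    """Predicts the next move from counts of observed move-to-move transitions."""

    def __init__(self) -> None:
        self._counts: Dict[Hashable, Counter] = defaultdict(Counter)

    def train(self, trace: Sequence[Hashable]) -> None:
        """Count each consecutive pair of moves in one user trace."""
        for previous, following in zip(trace, trace[1:]):
            self._counts[previous][following] += 1

    def predict(self, last_move: Hashable) -> Optional[Hashable]:
        """Return the most frequent follower of `last_move`, or None if unseen."""
        followers = self._counts.get(last_move)
        if not followers:
            return None
        return followers.most_common(1)[0][0]


if __name__ == "__main__":
    predictor = MovePredictor()
    predictor.train(["zoom_in", "pan_right", "pan_right", "zoom_in", "pan_right"])
    print(predictor.predict("zoom_in"))
```

The predicted move, combined with the user's current position, identifies which tile to prefetch.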

Statistical Analysis Predictor

• Look for statistical similarities in tiles
  • Try to guess what’s important based on patterns
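
The slides do not say which statistics are compared, so the sketch below picks one plausible option: normalized value histograms and an L1 distance between them. The function names, the bin count, and the value range are all assumptions. Candidate tiles that look most like a tile the user already examined are ranked first.

```python
from typing import Dict, Hashable, List, Sequence


def histogram(values: Sequence[float], bins: int, lo: float, hi: float) -> List[float]:
    """Normalized histogram; values outside [lo, hi] are clamped to the end bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for value in values:
        index = int((value - lo) / width)
        counts[max(0, min(index, bins - 1))] += 1
    total = len(values) or 1  # avoid dividing by zero for an empty tile
    return [count / total for count in counts]


def distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 distance between two histograms with the same number of bins."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def rank_candidates(reference: Sequence[float],
                    candidates: Dict[Hashable, Sequence[float]],
                    bins: int = 4, lo: float = 0.0, hi: float = 1.0) -> List[Hashable]:
    """Order candidate tile keys from most to least similar to the reference tile."""
    ref_hist = histogram(reference, bins, lo, hi)
    return sorted(
        candidates,
        key=lambda key: distance(ref_hist, histogram(candidates[key], bins, lo, hi)),
    )
```

The top-ranked keys would be the ones handed to the prefetcher.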

Using Multiple Predictors

• Run multiple predictors (or experts) in parallel
  • Compare predictions to user’s actual behavior
  • Use predictions from best-performing expert

• May change over time based on user’s goals
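
The selection loop described above can be sketched as follows. The scoring rule is an assumption: each expert keeps an exponentially decayed accuracy, so an expert that was good earlier can be overtaken when the user's goals shift. The class name `ExpertSelector` and the `decay` parameter are hypothetical.

```python
from typing import Callable, Dict, Hashable

Expert = Callable[[Hashable], Hashable]  # maps the last move to a predicted move


class ExpertSelector:
    """Runs every expert, scores each against the user's actual move,
    and answers with the currently best-scoring expert's prediction."""

    def __init__(self, experts: Dict[str, Expert], decay: float = 0.9):
        self.experts = experts
        self.decay = decay  # weight kept from the old score on each update
        self.scores = {name: 0.0 for name in experts}
        self._pending: Dict[str, Hashable] = {}

    def predict(self, last_move: Hashable) -> Hashable:
        """Collect all predictions and return the best expert's one."""
        self._pending = {name: expert(last_move)
                         for name, expert in self.experts.items()}
        best = max(self.scores, key=self.scores.get)
        return self._pending[best]

    def observe(self, actual_move: Hashable) -> None:
        """Update each expert's score using what the user actually did."""
        for name, guess in self._pending.items():
            hit = 1.0 if guess == actual_move else 0.0
            self.scores[name] = self.decay * self.scores[name] + (1 - self.decay) * hit
        self._pending = {}
```

Calling `predict` before each user action and `observe` after it keeps the scores current; a lower `decay` makes the selector react faster to a change in behavior.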

Other Challenges

• Lots of interesting problems left to address
  • Best eviction policy for the tile buffer?
  • How to share data between multiple users?
  • More predictors?

Questions?

[Figure labels: Gemini, Sagittarius; Dogs, Cats]

Prefetching Experts

• User behavior predictor (Seer)
  • Learn common query sequences from user traces

• Stats analysis predictor
  • Look for statistical similarities in tiles
  • Try to guess what’s important based on patterns