Similar search with trillions of time series

Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping

Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen,

Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, Eamonn Keogh

Hoan Nguyen – Trung Minh Nguyen

Abstract

Optimizations to search and mine large databases very fast

Outline

Problem

Related work

Definitions

Method

Results

Conclusion

Problem

Similarity search is an important part of most time series data mining algorithms.

Dynamic Time Warping (DTW) is the best measure to use, but it is slow to compute.

Definitions: Time series

A time series T is an ordered list:

T = t1, t2, …, tm

Definitions: Subsequence

A subsequence Ti,k of a time series T is a time series of length k that starts at position i:

Ti,k = ti, ti+1, …, ti+k-1, where 1 ≤ i ≤ m-k+1
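As a small illustration of the definition, the sketch below extracts a subsequence from a Python list. The function name `subsequence` and the sample values are hypothetical; positions are 1-based, as on the slide, and are converted to Python's 0-based indexing.

```python
def subsequence(T, i, k):
    """Return the k points of T that start at 1-based position i."""
    m = len(T)
    if not (1 <= i <= m - k + 1):
        raise ValueError("need 1 <= i <= m - k + 1")
    # Shift to 0-based indexing for the slice.
    return T[i - 1 : i - 1 + k]


T = [3, 1, 4, 1, 5, 9, 2, 6]
print(subsequence(T, 3, 4))  # [4, 1, 5, 9]
```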

Definitions: Dynamic Time Warping
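The slide gives no formula here, so the following is only a minimal sketch of the standard dynamic-programming formulation of DTW. It assumes two sequences of equal length and a warping band of half-width `r` (a Sakoe-Chiba band), and it returns the squared distance. The name `dtw` and its parameters are illustrative, not taken from the slides.

```python
import math


def dtw(q, c, r):
    """Squared DTW distance between equal-length q and c.

    Warping is restricted to cells (i, j) with |i - j| <= r.
    """
    n = len(q)
    if len(c) != n:
        raise ValueError("this sketch assumes equal lengths")
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        cur = [math.inf] * (n + 1)
        for j in range(max(1, i - r), min(n, i + r) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[n]
```

With `r = 0` no warping is allowed and the result is the squared Euclidean distance; a larger `r` lets one sequence stretch or compress locally to match the other.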

Related work: Known optimizations

Squared distance (omit the square root; the ranking of candidates is unchanged)

Lower bounding

LB_KimFL, LB_Keogh

Early abandon
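The three known optimizations can be sketched briefly. The code below is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, both sequences have equal length, `best_so_far` is a squared distance, and `r` is the half-width of the DTW warping band.

```python
import math


def early_abandon_sq_ed(q, c, best_so_far):
    """Squared Euclidean distance (no square root taken).

    Returns math.inf as soon as the running sum exceeds best_so_far.
    """
    total = 0.0
    for qi, ci in zip(q, c):
        total += (qi - ci) ** 2
        if total > best_so_far:
            return math.inf  # cannot beat the best match found so far
    return total


def lb_keogh(q, c, r):
    """LB_Keogh lower bound on the squared DTW distance.

    Builds an upper/lower envelope around q over a window of half-width r
    and sums the squared amounts by which c falls outside it.
    """
    n = len(q)
    lb = 0.0
    for i in range(n):
        window = q[max(0, i - r) : min(n, i + r + 1)]
        upper, lower = max(window), min(window)
        if c[i] > upper:
            lb += (c[i] - upper) ** 2
        elif c[i] < lower:
            lb += (c[i] - lower) ** 2
    return lb
```

Because the lower bound never exceeds the true DTW distance, a candidate whose bound already exceeds `best_so_far` can be discarded without running DTW at all.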

Method: Early abandon Z-Normalization

[Figure: normal approach. Subsequences are extracted from the long time series and each is Z-normalized (T2', T3', …) before being compared with the normalized query.]

Method: Early abandon Z-Normalization (novel approach)

Early abandon with Z-normalization

1. The query is Z-normalized.

2. The Z-normalization of each subsequence is computed on the fly, interleaved with the distance calculation.

3. If distance > best_so_far, abandon both calculations early.
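The three steps above can be sketched for Euclidean distance as follows. This is a minimal illustration under assumptions: the mean and standard deviation of each subsequence come from running sums that are updated in constant time per step, each point is normalized only when it is needed, and constant subsequences are simply skipped. The names `znorm` and `best_match_ed` are hypothetical.

```python
import math


def znorm(x):
    """Z-normalize a sequence (assumes it is not constant)."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum(v * v for v in x) / len(x) - mu * mu)
    return [(v - mu) / sigma for v in x]


def best_match_ed(T, q_norm):
    """Return (best squared distance, 0-based start) over all subsequences."""
    m = len(q_norm)
    best, best_pos = math.inf, -1
    s = sum(T[:m])                  # running sum of the current window
    s2 = sum(v * v for v in T[:m])  # running sum of squares
    for start in range(len(T) - m + 1):
        if start > 0:
            out, inc = T[start - 1], T[start + m - 1]
            s += inc - out
            s2 += inc * inc - out * out
        mu = s / m
        var = s2 / m - mu * mu
        if var <= 1e-12:
            continue  # constant window: skipped in this sketch
        sigma = math.sqrt(var)
        dist = 0.0
        for j in range(m):
            # Normalize one point at a time, only when it is needed.
            d = (T[start + j] - mu) / sigma - q_norm[j]
            dist += d * d
            if dist >= best:
                break  # abandons the distance and the remaining normalization
        else:
            best, best_pos = dist, start
    return best, best_pos


T = [5, 5, 1, 3, 2, 8, 8, 0, 9, 4]
best, pos = best_match_ed(T, znorm([10, 30, 20]))
print(pos)  # 2
```

In the usage example the query is a scaled copy of the window `[1, 3, 2]`, so that window matches after normalization. Over very long streams the running sums accumulate floating-point error, which a production version would need to address.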

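Method: Re-ordering Early Abandoning

The order in which points are compared is chosen from the query: indices are sorted by the absolute value of the Z-normalized query, largest first, so that the largest contributions to the distance tend to be added first.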
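A minimal sketch of query-based re-ordering, assuming both sequences are already Z-normalized. The ordering is computed once per query and reused for every candidate; the function names and sample values are hypothetical.

```python
import math


def query_order(q_norm):
    """Indices of the query sorted by decreasing absolute value."""
    return sorted(range(len(q_norm)), key=lambda i: abs(q_norm[i]), reverse=True)


def reordered_sq_ed(q_norm, c_norm, order, best_so_far):
    """Squared ED accumulated in the given index order, with early abandoning.

    Returns (distance or math.inf, number of points examined).
    """
    total = 0.0
    for steps, i in enumerate(order, start=1):
        total += (q_norm[i] - c_norm[i]) ** 2
        if total > best_so_far:
            return math.inf, steps
    return total, len(order)


q_norm = [0.1, -2.0, 0.3, 1.5]
order = query_order(q_norm)
print(order)  # [1, 3, 2, 0]
print(reordered_sq_ed(q_norm, [0.0, 0.0, 0.0, 0.0], order, 3.0))  # (inf, 1)
```

In this example the candidate is abandoned after one point, whereas the natural left-to-right order would need two.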

Method: Cascading Lower Bounds

Lower bounds are applied in a cascade, cheapest first, so that most candidates are pruned before the full DTW calculation.
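The cascade can be sketched as below. This is an illustration under assumptions rather than the authors' code: `lb_kim_fl` is a simplified first/last-point bound, all distances are squared, sequences have equal length, and `r` is the half-width of the warping band. All function names are hypothetical.

```python
import math


def lb_kim_fl(q, c):
    """Constant-time bound: DTW must align first with first, last with last."""
    if len(q) == 1:
        return (q[0] - c[0]) ** 2
    return (q[0] - c[0]) ** 2 + (q[-1] - c[-1]) ** 2


def lb_keogh(q, c, r):
    """Linear-time bound from the envelope of q over a window of half-width r."""
    n, lb = len(q), 0.0
    for i in range(n):
        window = q[max(0, i - r) : min(n, i + r + 1)]
        upper, lower = max(window), min(window)
        if c[i] > upper:
            lb += (c[i] - upper) ** 2
        elif c[i] < lower:
            lb += (c[i] - lower) ** 2
    return lb


def dtw(q, c, r):
    """Squared DTW with warping restricted to |i - j| <= r."""
    n = len(q)
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [math.inf] * (n + 1)
        for j in range(max(1, i - r), min(n, i + r) + 1):
            cost = (q[i - 1] - c[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[n]


def cascade_distance(q, c, r, best_so_far):
    """Return squared DTW, or math.inf if a cheaper bound prunes the candidate."""
    if lb_kim_fl(q, c) >= best_so_far:
        return math.inf  # pruned in O(1)
    if lb_keogh(q, c, r) >= best_so_far:
        return math.inf  # pruned in O(n)
    return dtw(q, c, r)  # only the survivors pay for full DTW


q = [0, 1, 2, 1, 0]
print(cascade_distance(q, [5, 6, 7, 6, 5], 1, 10.0))  # inf
print(cascade_distance(q, [0, 1, 1, 2, 0], 1, 10.0))  # 1.0
```

Ordering the bounds from cheapest to most expensive means each extra cost is paid only by candidates that the previous, cheaper bound could not rule out.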

Results

Comparison between:

Naïve

- Z-normalization from start

- full ED (or DTW) calculation

State-of-the-art (SOTA)

- Z-normalization from start

- early abandoning

- LB_Keogh bounding for DTW

UCR Suite

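Results: Baseline Tests on Random Walk

[Figure: running time (axis label: min) of UCR-ED, SOTA-ED, UCR-DTW, and SOTA-DTW on random-walk data of size Million, Billion, and Trillion; |Q| = 128.]

[Figure: running time (axis label truncated in the source: "seco") of UCR-ED, SOTA-ED, UCR-DTW, and SOTA-DTW on random-walk data of size Million and Billion; |Q| = 128, |T| = 2×10^6.]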

Results: EEG

[Figure: UCR-ED compared with SOTA-ED on EEG data.]

Conclusion

- The approach is simple yet effective.

- These optimizations can be applied to most distance measures, but may not work for some, such as Hamming distance.
