

Computationally Efficient Earthquake Detection in Continuous Waveform Data

Clara Yoon (1), Ossian O’Reilly (1), Karianne Bergen (2), Gregory Beroza (1)

(1) Department of Geophysics, (2) Institute for Computational & Mathematical Engineering, Stanford University ([email protected])

1. Introduction
2. FAST Method: Single Channel
3. Network Detection
4. FAST Detection Results
5. Summary and Future Work

1. Introduction

Figure 1: Comparison of earthquake detection methods (template matching, STA/LTA, autocorrelation, and the new approach, FAST) in terms of 3 qualitative metrics: 1) detection sensitivity, 2) general applicability, 3) computational efficiency. FAST scores high on all 3 metrics, while other detection methods score high on only 2 out of 3.

Motivation:
• Improve earthquake detection and monitoring
• Find more low-magnitude events in very large continuous data sets

New approach to earthquake detection:
• Apply “big data” methods to observational seismology
• Adapt an efficient search algorithm for finding similar audio clips [1] to detect similar waveforms in continuous seismic data
• Examples of “big data” similarity search: identify songs, find duplicate web pages, search for copyright content

New earthquake detection algorithm: Fingerprint and Similarity Thresholding (FAST)
1) Sensitive: waveform correlation, over network of stations
2) General: finds seismic signals from unknown sources
3) Efficient: fast, scalable to years of continuous data

Potential applications: find unknown seismic events
• Reduce catalog completeness magnitudes
• Identify small repeating earthquakes
• Find low SNR, non-impulsive events
• Monitor during seismically active periods
  - Foreshocks, aftershocks, swarms
• Find events in sparse seismic networks
  - Induced seismicity

2. FAST Method: Single Channel

A. Feature Extraction

Figure 2: Feature extraction computes binary fingerprints, which are compact proxies for the original waveforms.

Feature extraction steps shown in Figure 2:
• Spectral Images (short spectrogram windows)
• Haar Wavelet Transform (for fast data compression)
• Top Deviation Wavelet Coefficients (extract only key discriminative features)
• Fingerprints (sparse, binary)
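The feature extraction steps (spectral image, Haar wavelet transform, top coefficients, binary fingerprint) can be illustrated with a minimal pure-Python sketch. Several details here are assumptions rather than the authors' implementation: the function names are hypothetical, the spectral image is a toy array with power-of-two dimensions, "top deviation" is approximated by ranking coefficients by magnitude within a single image, and the two-bits-per-sign encoding is one plausible way to obtain a binary fingerprint.

```python
def haar_1d(values):
    """Full Haar decomposition of a sequence whose length is a power of two."""
    out = [float(v) for v in values]
    n = len(out)
    while n > 1:
        half = n // 2
        averages = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:n] = averages + details  # coarse part first, then detail coefficients
        n = half
    return out


def haar_2d(image):
    """Apply the 1-D transform to every row, then to every column."""
    rows = [haar_1d(row) for row in image]
    cols = [haar_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def top_k_signs(coeffs, k):
    """Keep the sign (-1, 0, +1) of the k largest-magnitude coefficients; zero the rest."""
    ranked = sorted(
        ((abs(c), r, j) for r, row in enumerate(coeffs) for j, c in enumerate(row)),
        reverse=True,
    )
    keep = {(r, j) for _, r, j in ranked[:k]}
    return [
        [((c > 0) - (c < 0)) if (r, j) in keep else 0 for j, c in enumerate(row)]
        for r, row in enumerate(coeffs)
    ]


def to_fingerprint(signs):
    """Assumed encoding: two bits per sign, +1 -> 1 0, -1 -> 0 1, 0 -> 0 0."""
    bits = []
    for row in signs:
        for s in row:
            bits.extend((1, 0) if s > 0 else (0, 1) if s < 0 else (0, 0))
    return bits


spectral_image = [[1, 1, 5, 5], [1, 1, 5, 5]]  # toy 2 x 4 "spectral image"
signs = top_k_signs(haar_2d(spectral_image), k=2)
fingerprint = to_fingerprint(signs)
```

Keeping only a few signed coefficients is what makes the fingerprint sparse; the value of `k` trades compactness against how much of the waveform's character is retained.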

[Figure 2 panels: continuous time series seismic data (amplitude vs. time in s); spectrogram (log10(|Spectrogram|), frequency in Hz vs. time in s); spectral images (frequency in Hz vs. time in s); Haar wavelet transforms (wavelet transform y index vs. x index); top deviation wavelet coefficients (values -1, 0, 1); binary fingerprints (values 0, 1; fingerprint y index vs. x index).]

[Figure 3 panels: similar waveform pair (amplitude vs. time in s; windows starting at 1266.95 s and 1629 s) and the similar fingerprint pair (legend: both 0, one 1, both 1).]

[Figure 4 panel: fingerprints A and B, their Min-Hash Signatures h(A) and h(B) with integer values 155 64 231 35 110 21 and 155 64 207 35 110 21, and the database with Tables 1 to 3. MHS subset match? Table 1: Yes, Table 2: No, Table 3: Yes.]

B. Database Generation

Waveforms: Correlation coefficient, CC = 0.9808
Fingerprints: Jaccard similarity, J = 0.7544

Figure 3: (Left) Correlation coefficient CC measures how similar two waveforms a and b are. (Right) Jaccard similarity J(A,B) measures how similar two binary fingerprints A and B are.
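As a rough illustration of the two similarity measures, the sketch below computes a zero-lag normalized correlation coefficient for waveforms and the Jaccard similarity for binary fingerprints. The function names are hypothetical, and the choice of a normalized inner product without mean removal is an assumption; the poster does not spell out its exact CC definition.

```python
import math


def correlation_coefficient(a, b):
    """Zero-lag normalized correlation of two equal-length waveforms (assumed definition)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm


def jaccard_similarity(fp_a, fp_b):
    """|A intersect B| / |A union B| for two equal-length binary fingerprints."""
    both = sum(1 for x, y in zip(fp_a, fp_b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(fp_a, fp_b) if x or y)   # bits set in at least one
    return both / either if either else 0.0


cc = correlation_coefficient([1, 2, 3], [2, 4, 6])       # scaled copies of each other
j = jaccard_similarity([1, 1, 0, 0], [1, 0, 1, 0])       # 1 shared bit out of 3 set bits
```

Positions where both fingerprints are 0 do not enter the Jaccard similarity at all, which suits sparse fingerprints where most bits are 0.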

• Locality-sensitive hash (LSH) functions [2] group highly similar fingerprints together with high probability in the database
• Min-Hash [3] reduces fingerprint dimensionality, while preserving Jaccard similarity in a probabilistic way, to short integer arrays: Min-Hash Signature (MHS)

Figure 4: LSH example: how to group 2 similar fingerprints A and B in the database. The MHS has 6 integers. Each hash table (red box) gets a different MHS subset; A and B enter the same group (oval) in Tables 1 and 3, where their MHS subsets match. However, in Table 2, the MHS subsets are not equal, so A and B enter different groups.
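The grouping described for Figure 4 can be sketched as follows: compute a Min-Hash Signature for each fingerprint, split it into one subset per hash table, and use each subset as the bucket key in its table. This is a minimal sketch under stated assumptions: the function names, the linear hash family `(a * i + b) % prime`, and the fixed seed are illustrative choices, not the authors' code. The two six-integer signatures in the example are the values legible in the Figure 4 panel.

```python
import random


def minhash_signature(fingerprint, num_hashes=6, seed=0):
    """For each random hash function, keep the minimum hash over the indices of the 1-bits.

    Assumes the fingerprint has at least one 1-bit. The same seed must be used for
    every fingerprint so that all signatures share the same hash functions.
    """
    rng = random.Random(seed)
    prime = 2_147_483_647
    params = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(num_hashes)]
    ones = [i for i, bit in enumerate(fingerprint) if bit]
    return [min((a * i + b) % prime for i in ones) for a, b in params]


def mhs_subsets(signature, num_tables=3):
    """Split the MHS into equal-length subsets, one bucket key per hash table."""
    size = len(signature) // num_tables
    return [tuple(signature[t * size:(t + 1) * size]) for t in range(num_tables)]


def build_database(fingerprints, num_tables=3, num_hashes=6):
    """Each table maps an MHS subset (bucket key) to the ids of fingerprints in that bucket."""
    tables = [dict() for _ in range(num_tables)]
    for fp_id, fp in enumerate(fingerprints):
        keys = mhs_subsets(minhash_signature(fp, num_hashes), num_tables)
        for table, key in zip(tables, keys):
            table.setdefault(key, []).append(fp_id)
    return tables


# Signatures as printed in the Figure 4 panel: subsets match in Tables 1 and 3 only.
sig_1 = [155, 64, 231, 35, 110, 21]
sig_2 = [155, 64, 207, 35, 110, 21]
matches = [k1 == k2 for k1, k2 in zip(mhs_subsets(sig_1), mhs_subsets(sig_2))]
```

Using more integers per table makes a match stricter, while using more tables gives similar fingerprints more chances to share a bucket.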

[Figure 5 panels: (A) database with Hash Table 1, Hash Table 2, ..., Hash Table b = 3; (B) (query, database) waveform pairs ranked by FAST similarity: 1, 2/3, 1/3, 0.]

C. Similarity Search

Figure 5: (A) Example database generated by LSH, with 3 hash tables (red boxes). Each table has many groups (ovals); similar earthquake waveforms (colored) are likely to be in the same group, while noise waveforms (black) are in other groups. (B) Search for waveforms in database similar to query waveform (blue). First, LSH determines the group in each table to which the query waveform belongs. Then we collect all other database waveforms in these groups, form pairs of (query, database) waveforms, and compute their FAST similarity. We ignore all other groups in the database, so the query time is near-constant, and scalable for large data sets.

FAST similarity: Fraction of hash tables with fingerprint pair in same group

Note: We show waveforms for easy visualization, but we actually store references to fingerprints in the groups. The “groups” are technically hash buckets.
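A query against the database can be sketched as below: look up only the query's bucket in each hash table, count how many tables each candidate shares with the query, and divide by the number of tables to get the FAST similarity. The data layout (one dict per table, mapping a bucket key to a list of fingerprint ids) and all names and keys are hypothetical.

```python
from collections import Counter


def fast_similarity_search(query_keys, tables):
    """Return {fingerprint id: FAST similarity} for candidates sharing a bucket with the query.

    query_keys[t] is the query's bucket key in table t; tables[t] maps keys to id lists.
    Fingerprints that never share a bucket with the query are not returned (similarity 0).
    """
    hits = Counter()
    for key, table in zip(query_keys, tables):
        for fp_id in table.get(key, []):  # only the query's own bucket is inspected
            hits[fp_id] += 1
    return {fp_id: count / len(tables) for fp_id, count in hits.items()}


# Toy database with 3 hash tables; bucket keys and ids are made up for illustration.
tables = [
    {(155, 64): ["B", "C"], (9, 9): ["N"]},
    {(231, 35): ["C"], (207, 35): ["B"]},
    {(110, 21): ["B", "C", "D"]},
]
result = fast_similarity_search([(155, 64), (231, 35), (110, 21)], tables)
```

With 3 tables the possible nonzero similarities are 1/3, 2/3, and 1, and the cost of a query depends on the size of the inspected buckets rather than on the size of the whole database.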

Figure 6: Similarity search output for single channel of data, as a sparse similarity matrix: we use every possible waveform in the data as a search query, with near-linear runtime. Each square represents a pair of fingerprints at two different times. Black squares indicate high FAST similarity, where we find highly similar waveforms.
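One way to picture the sparse similarity matrix is to walk over every bucket in every hash table and count, for each pair of fingerprints found together, how many tables they share. The sketch below stores only pairs that co-occur at least once; the names, the dict-of-pairs layout, and the optional `min_similarity` cutoff are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def sparse_similarity_matrix(tables, min_similarity=0.0):
    """Return {(i, j): FAST similarity} with i < j, for fingerprint indices sharing a bucket."""
    pair_counts = Counter()
    for table in tables:
        for bucket in table.values():
            # Every pair inside one bucket collides in this table.
            for i, j in combinations(sorted(bucket), 2):
                pair_counts[(i, j)] += 1
    n_tables = len(tables)
    return {
        pair: count / n_tables
        for pair, count in pair_counts.items()
        if count / n_tables >= min_similarity
    }


# Toy example: fingerprint indices stand for window start times in the continuous data.
tables = [
    {"a": [0, 5, 9], "b": [1]},
    {"c": [0, 5], "d": [9, 1]},
    {"e": [5, 0], "f": [9]},
]
matrix = sparse_similarity_matrix(tables)
strong = sparse_similarity_matrix(tables, min_similarity=0.5)
```

The work grows with the number of pairs inside each bucket, so this stays close to linear in the data length only while buckets remain small; a bucket crowded with repetitive noise would dominate the cost.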

3. Network Detection

Why detect over a distributed network of stations?
• Detect more low-magnitude events, as shown by template matching studies [4]
• Fewer false positive detections: coherent signal at multiple stations more likely to be earthquake, not local noise
• Need to detect on at least 4 stations to locate earthquake

Method: Network Similarity Matrix
• Sum each single-channel similarity matrix, from different stations

Example: One Pair of Waveforms

Station     FAST Similarity
CCOB.EHZ    0.31
CADB.EHZ    0.03
CAO.EHZ     0.02
CHR.EHZ     0.40
CML.EHZ     0.12
Network Similarity = 0.88 (sum over stations)

The Time 1 and Time 2 columns hold waveform plots in the original poster and are omitted here.

Figure 7: Example network similarity matrix calculation (one element) for one pair of similar earthquake waveforms from two different times, at 5 stations. FAST similarity values from each station coherently add. We apply a detection threshold on the network similarity matrix.
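The network similarity calculation can be sketched as an element-wise sum of sparse single-channel similarity matrices followed by a threshold. The per-station values below are the ones from the Figure 7 example; the pair index `(1733, 70510)`, the function names, and the threshold of 0.5 are hypothetical, since the poster does not state the threshold it used.

```python
def network_similarity(channel_matrices):
    """Sum sparse similarity matrices, each a dict keyed by a (time 1, time 2) pair."""
    total = {}
    for matrix in channel_matrices:
        for pair, similarity in matrix.items():
            total[pair] = total.get(pair, 0.0) + similarity
    return total


def detect(network_matrix, threshold):
    """Return the pairs whose summed similarity reaches the detection threshold."""
    return sorted(pair for pair, s in network_matrix.items() if s >= threshold)


pair = (1733, 70510)  # hypothetical pair of window start times
channels = {
    "CCOB.EHZ": {pair: 0.31},
    "CADB.EHZ": {pair: 0.03, (10, 20): 0.10},  # (10, 20) is made-up local noise
    "CAO.EHZ": {pair: 0.02},
    "CHR.EHZ": {pair: 0.40},
    "CML.EHZ": {pair: 0.12},
}
network = network_similarity(channels.values())
detections = detect(network, threshold=0.5)  # threshold value is an assumption
```

No single station exceeds 0.5 for this pair, but the sum of 0.88 does, while the noise pair seen at one station stays well below the threshold.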

[Figure 8 map: NCSN stations CCOB, CADB, CAO, CHR, CML; Calaveras Fault; San Jose; legend: NCSN Stations, Mainshock Mw 4.1, Catalog Earthquakes, Cities; scale bar 0 to 10 km.]

[Figure 9 example panels: 7-channel waveform plots (CCOB.EHE, CCOB.EHN, CCOB.EHZ, CADB.EHZ, CAO.EHZ, CHR.EHZ, CML.EHZ; trace number vs. time in s) for 2011-01-08, with start = 1733 s (network similarity = 1.14), start = 70510 s (network similarity = 0.58), and start = 11295 s (network similarity = 0.53).]

[Figure 10 map: station channel CCOB.EHN; Calaveras Fault; legend: NCSN Stations, Mainshock Mw 4.1, Catalog Earthquakes (Detected), Catalog Earthquakes (Missed); scale bar 0 to 10 km.]

[Figure 11 waveform panels (each 0 to 20 s, labeled by event or detection time in the continuous data, in s): Catalog events; FAST new events, also in autocorrelation; FAST new events, not in autocorrelation; False detections; Missed catalog events; Autocorrelation new events missed by FAST.]

4. FAST Detection Results

Goal: Detect uncataloged earthquakes in continuous data with FAST
• Aftershocks of Mw 4.1 event on Calaveras Fault
• Data from Northern California Seismic Network (NCSN)

Multiple Stations, 1 day continuous data

Figure 8: Map of catalog events, 5 stations, used 7 channels in network similarity matrix. CCOB: 3 components, other stations: only vertical. Bandpass filters applied to remove correlated noise: 4-10 Hz CCOB, 2-6 Hz CML, 2-10 Hz all others. Decimate to 20 sps.

FAST Detections: All 13 catalog events, 26 new events, 8 false detections

Detection Performance: Autocorrelation 37 earthquakes, FAST 39 earthquakes
Runtime Performance, 1 processor: Autocorrelation 31 hours 25 minutes, FAST 46 minutes

Figure 9: (Top) FAST detection results plotted on continuous data used for detection. (Top right) Example uncataloged earthquakes detected with FAST. (Bottom right) FAST finds about the same total number of events as autocorrelation, but runs 40 times faster.

Single channel, 1 week continuous data

Figure 10: Map of catalog events, 1 station, used single channel CCOB.EHN for detection. Bandpass filter applied to remove correlated noise: 4-10 Hz. Decimate to 20 sps.

FAST Detections: 21/24 catalog events, 68 new events
FAST Detection Errors: 12 false detections, 22 missed detections

Detection Performance: Autocorrelation 86 earthquakes, FAST 89 earthquakes
Runtime Performance, 1 processor: Autocorrelation 9 days 13 hours, FAST 1 hour 36 minutes

Figure 11: (Top) Waveforms of FAST detections: catalog events (blue), new uncataloged events (red). (Top right) Waveforms of FAST detection errors: false positives (green), false negatives (black). (Bottom right) FAST finds about the same total number of events as autocorrelation, but runs 140 times faster.

[Figure 12 plot: runtime (s) vs. data duration (s), logarithmic axes; curves: Autocorrelation, 1 processor; Autocorrelation, 1000 processors; FAST, 1 processor; annotated speedups: Day: 40x, Week: 140x, Month: 600x, Year: 7500x.]

Figure 12: Detection algorithm runtime as a function of continuous data duration. Dashed lines are extrapolations based purely on runtime scaling properties (without memory constraints): quadratic for autocorrelation, and near-linear for FAST. FAST should have its greatest utility for longer duration data sets (years), where it could run faster than even massively parallel autocorrelation.
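The speedup factors can be reproduced approximately from the reported single-processor runtimes. The sketch below anchors both methods at the measured 1-week runtimes and extrapolates under the idealized assumptions of exactly quadratic scaling for autocorrelation and exactly linear scaling for FAST, with a 30-day month and a 365-day year; these choices, and the names used, are assumptions for illustration.

```python
HOURS_PER_WEEK = 7 * 24

# Measured single-processor runtimes reported on the poster, in hours.
AUTOCORR_DAY_H = 31 + 25 / 60     # 31 hours 25 minutes for 1 day of data
FAST_DAY_H = 46 / 60              # 46 minutes for 1 day of data
AUTOCORR_WEEK_H = 9 * 24 + 13     # 9 days 13 hours for 1 week of data
FAST_WEEK_H = 1 + 36 / 60         # 1 hour 36 minutes for 1 week of data


def extrapolate(duration_hours):
    """Scale the 1-week runtimes: quadratic for autocorrelation, linear for FAST."""
    scale = duration_hours / HOURS_PER_WEEK
    autocorr = AUTOCORR_WEEK_H * scale ** 2
    fast = FAST_WEEK_H * scale
    return autocorr, fast, autocorr / fast


day_speedup = AUTOCORR_DAY_H / FAST_DAY_H        # measured, about 41x
week_speedup = AUTOCORR_WEEK_H / FAST_WEEK_H     # measured, about 143x
_, _, month_speedup = extrapolate(30 * 24)       # extrapolated, about 613x
autocorr_year_h, fast_year_h, year_speedup = extrapolate(365 * 24)  # about 7463x
```

Under these assumptions the speedup grows in proportion to the data duration, which is in line with the rounded 600x and 7500x annotations, and a year of data would take FAST a few days on one processor versus decades for single-processor autocorrelation.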

5. Summary and Future Work

Summary:
• FAST algorithm adapted efficient audio search technology to detect earthquakes with similar waveforms
• FAST finds as many total events as autocorrelation, with more false detections, but runs 140x faster on 1 week of continuous data
• Detected earthquakes in network version of FAST using continuous data from 5 stations

Future Work:
• Process longer duration continuous data (month, year): find infrequently repeating events?
• Process data in parallel
• “Data mining” on diverse data sets to detect earthquakes: foreshocks, aftershocks, swarms, induced seismicity, uncataloged events

References
[1] Baluja and Covell (2008), “Waveprint”
[2] Leskovec et al. (2014), “Mining of Massive Datasets”
[3] Broder et al. (2000), “Min-Wise Indep. Permutations”
[4] Shelly et al. (2007), “Non-volcanic tremor”