
Clara Yoon1, Ossian O’Reilly1, Karianne Bergen2, Gregory Beroza1

Department of Geophysics1, Institute for Computational & Mathematical Engineering2, Stanford University (ceyoon@stanford.edu)

Computationally Efficient Earthquake Detection in Continuous Waveform Data

1. Introduction

2. FAST Method: Single Channel

3. Network Detection

4. FAST Detection Results

5. Summary and Future Work

[Figure 1 diagram: earthquake detection methods (Template Matching, Autocorrelation, STA/LTA, and the new approach, FAST) positioned against three axes: Detection Sensitivity, General Applicability, Computational Efficiency.]

Figure 1: Comparison of earthquake detection methods in terms of 3 qualitative metrics: 1) detection sensitivity, 2) general applicability, 3) computational efficiency. FAST scores high on all 3 metrics, while other detection methods score high on only 2 out of 3.

New approach to earthquake detection:
• Apply "big data" methods to observational seismology
• Adapt an efficient search algorithm for finding similar audio clips [1] to detect similar waveforms in continuous seismic data

New earthquake detection algorithm: Fingerprint and Similarity Thresholding (FAST)
1) Sensitive: waveform correlation, over a network of stations
2) General: finds seismic signals from unknown sources
3) Efficient: fast, scalable to years of continuous data

Potential applications: find unknown seismic events
• Reduce catalog completeness magnitudes
• Identify small repeating earthquakes
• Find low-SNR, non-impulsive events
• Monitor during seismically active periods: foreshocks, aftershocks, swarms
• Find events in sparse seismic networks: induced seismicity

Motivation:
• Improve earthquake detection and monitoring
• Find more low-magnitude events in very large continuous data sets

[Related similarity-search applications: identify songs, find duplicate web pages, search for copyrighted content.]

A. Feature Extraction

Figure 2: Feature extraction computes binary fingerprints, which are compact proxies for the original waveforms.

Feature extraction pipeline: Spectral images (short spectrogram windows) → Haar wavelet transform (for fast data compression) → Top deviation wavelet coefficients (extract only key discriminative features) → Fingerprints (sparse, binary)

[Figure 2 panels: continuous time-series seismic data (amplitude vs. time), spectrogram (frequency vs. time, log10 |spectrogram|), spectral images, Haar wavelet transforms (wavelet transform x/y index), and binary fingerprints (fingerprint x/y index).]
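To make the pipeline concrete, here is a minimal Python/NumPy sketch of fingerprint extraction. The window length, overlap, and number of retained coefficients (top_k) are illustrative placeholders rather than the parameters used in FAST, and a single-level Haar transform stands in for the full wavelet decomposition.

```python
import numpy as np
from scipy.signal import spectrogram

def haar2d(image):
    """One level of a 2-D Haar transform: averages and differences along rows, then columns."""
    a = (image[:, 0::2] + image[:, 1::2]) / np.sqrt(2)
    d = (image[:, 0::2] - image[:, 1::2]) / np.sqrt(2)
    rows = np.hstack([a, d])
    a = (rows[0::2, :] + rows[1::2, :]) / np.sqrt(2)
    d = (rows[0::2, :] - rows[1::2, :]) / np.sqrt(2)
    return np.vstack([a, d])

def fingerprints(trace, fs, window_s=10.0, top_k=200):
    """Sparse binary fingerprints from a continuous trace (illustrative parameters)."""
    # spectrogram with ~1 s segments and 50% overlap -> one column every ~0.5 s
    _, _, sxx = spectrogram(trace, fs=fs, nperseg=int(fs), noverlap=int(fs) // 2)
    logspec = np.log10(sxx + 1e-12)
    cols = int(2 * window_s)                         # spectrogram columns per spectral image
    prints = []
    for start in range(0, logspec.shape[1] - cols, cols // 2):
        image = logspec[:, start:start + cols]       # spectral image (short spectrogram window)
        image = image[:image.shape[0] // 2 * 2, :image.shape[1] // 2 * 2]  # even dims for Haar
        coeffs = haar2d(image)                       # Haar wavelet transform (data compression)
        flat = coeffs.ravel()
        keep = np.argsort(np.abs(flat))[-top_k:]     # top deviation wavelet coefficients
        fp = np.zeros(2 * flat.size, dtype=bool)     # sign-encoded sparse binary fingerprint
        fp[2 * keep + (flat[keep] > 0)] = True
        prints.append(fp)
    return prints
```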

[Figure panels: a similar waveform pair (window start times 1266.95 s and 1629 s) and the corresponding similar fingerprint pair (elements marked "both 0", "one 1", "both 1").]

[Figure 4 diagram: Min-Hash signatures h(A) and h(B) of fingerprints A and B, split into 2-integer subsets per hash table; the subsets (155, 64) and (110, 21) match in Tables 1 and 3 (Yes), while 231 vs. 207 makes the Table 2 subsets differ (No).]

B. Database Generation

Waveforms: correlation coefficient CC = 0.9808. Fingerprints: Jaccard similarity J = 0.7544.

Figure 3: (Left) Correlation coefficient CC measures how similar two waveforms a and b are. (Right) Jaccard similarity J(A,B) measures how similar two binary fingerprints A and B are.
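Both similarity measures in Figure 3 take only a few lines of NumPy; this sketch assumes waveforms stored as float arrays and fingerprints as boolean arrays.

```python
import numpy as np

def correlation_coefficient(a, b):
    """Normalized correlation coefficient CC of two waveform windows a and b."""
    a = a - a.mean()
    b = b - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def jaccard(A, B):
    """Jaccard similarity J(A, B) = |A and B| / |A or B| of two binary fingerprints."""
    A, B = np.asarray(A, dtype=bool), np.asarray(B, dtype=bool)
    union = np.logical_or(A, B).sum()
    return float(np.logical_and(A, B).sum() / union) if union else 0.0
```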

• Locality-sensitive hash (LSH) functions [2] group highly similar fingerprints together with high probability in the database
• Min-Hash [3] reduces fingerprint dimensionality to short integer arrays, called Min-Hash Signatures (MHS), while preserving Jaccard similarity in a probabilistic way

Figure 4: LSH example: how to group 2 similar fingerprints A and B in the database. The MHS has 6 integers. Each hash table (red box) gets a different MHS subset; A and B enter the same group (oval) in Tables 1 and 3, where their MHS subsets match. However, in Table 2, the MHS subsets are not equal, so A and B enter different groups.
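A minimal Min-Hash sketch, assuming a fingerprint stored as a boolean array; the universal-hash constants and fixed seed are illustrative. Each signature entry is the minimum hashed position of the fingerprint's 1-bits, so the probability that two signatures agree at an entry approximates the Jaccard similarity of the two fingerprints.

```python
import numpy as np

def minhash_signature(fp, n_hashes=6, prime=2_147_483_647, seed=0):
    """Min-Hash signature (n_hashes integers) of a sparse binary fingerprint."""
    rng = np.random.default_rng(seed)            # same hash functions for every fingerprint
    a = rng.integers(1, prime, size=n_hashes)
    b = rng.integers(0, prime, size=n_hashes)
    ones = np.flatnonzero(fp)                    # positions of the 1-bits
    if ones.size == 0:
        return np.full(n_hashes, prime)          # empty fingerprint: sentinel signature
    # universal hashing h_i(x) = (a_i * x + b_i) mod prime, minimum over the 1-bit positions
    hashed = (a[:, None] * ones[None, :] + b[:, None]) % prime
    return hashed.min(axis=1)

# Fraction of matching signature entries estimates the Jaccard similarity of two fingerprints:
# np.mean(minhash_signature(fp_a) == minhash_signature(fp_b))
```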

[Figure 5 diagram: database of hash tables 1, 2, ..., b = 3 (panel A) and a similarity search query (panel B); FAST similarity color scale 0, 1/3, 2/3, 1.]

C. Similarity Search

Figure 5: (A) Example database generated by LSH, with 3 hash tables (red boxes). Each table has many groups (ovals); similar earthquake waveforms (colored) are likely to be in the same group, while noise waveforms (black) are in other groups. (B) Search for waveforms in the database similar to the query waveform (blue). First, LSH determines the group in each table to which the query waveform belongs. Then we collect all other database waveforms in these groups, form pairs of (query, database) waveforms, and compute their FAST similarity. We ignore all other groups in the database, so the query time is near-constant and scales to large data sets.

FAST similarity: fraction of hash tables with the fingerprint pair in the same group

Note: We show waveforms for easy visualization, but we actually store references to fingerprints in the groups. The "groups" are technically hash buckets.
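A sketch of database generation and similarity search on Min-Hash signatures, assuming b = 3 hash tables that are each keyed on r = 2 consecutive signature entries (the grouping shown in Figure 4); the bucket structure and names are illustrative, and self-matches would be filtered in practice.

```python
from collections import defaultdict

B, R = 3, 2                                    # b hash tables, r MHS entries per table key

def make_tables():
    """One dict per hash table, mapping an MHS subset (the bucket key) to fingerprint ids."""
    return [defaultdict(list) for _ in range(B)]

def insert(tables, fp_id, signature):
    """Database generation: add one fingerprint's signature to every hash table."""
    for t in range(B):
        key = tuple(signature[t * R:(t + 1) * R])
        tables[t][key].append(fp_id)

def fast_similarity(tables, query_signature):
    """FAST similarity = fraction of hash tables where a pair falls in the same bucket."""
    hits = defaultdict(int)
    for t in range(B):
        key = tuple(query_signature[t * R:(t + 1) * R])
        for fp_id in tables[t].get(key, []):   # only the query's own buckets are examined
            hits[fp_id] += 1
    return {fp_id: count / B for fp_id, count in hits.items()}
```

Because only the query's own buckets are examined, the per-query cost stays roughly constant as the database grows.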

Figure 6: Similarity search output for a single channel of data, as a sparse similarity matrix: we use every possible waveform in the data as a search query, with near-linear runtime. Each square represents a pair of fingerprints at two different times. Black squares indicate high FAST similarity, where we find highly similar waveforms.

Example: one pair of waveforms

Station     FAST Similarity
CCOB.EHZ    0.31
CADB.EHZ    0.03
CAO.EHZ     0.02
CHR.EHZ     0.40
CML.EHZ     0.12

Network similarity = 0.31 + 0.03 + 0.02 + 0.40 + 0.12 = 0.88

Why detect over a distributed network of stations?
• Detect more low-magnitude events, as shown by template matching studies [4]
• Fewer false positive detections: a signal coherent at multiple stations is more likely to be an earthquake than local noise
• Need detections at 4 or more stations to locate an earthquake

Figure 7: Example network similarity matrix calculation (one element) for one pair of similar earthquake waveforms from two different times, at 5 stations. FAST similarity values from each station add coherently. We apply a detection threshold to the network similarity matrix.

Method: network similarity matrix
• Sum the single-channel similarity matrices from the different stations (see the sketch below)
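A sketch of this summation, assuming each station's single-channel output is stored sparsely as a dict mapping a (time1, time2) fingerprint-pair key to its FAST similarity; the threshold value and the example window times are hypothetical, while the five per-station similarities are the ones from Figure 7.

```python
from collections import defaultdict

def network_similarity(station_matrices, threshold=0.8):
    """Sum sparse single-channel FAST similarity matrices over stations,
    then keep (time1, time2) pairs at or above the network detection threshold."""
    network = defaultdict(float)
    for matrix in station_matrices:           # one sparse similarity matrix per channel
        for pair, similarity in matrix.items():
            network[pair] += similarity       # coherent stacking across stations
    return {pair: s for pair, s in network.items() if s >= threshold}

# Figure 7 example: per-station similarities for one (hypothetical) pair of times sum to 0.88
pair = (100.0, 2500.0)                        # hypothetical window start times, in seconds
stations = [{pair: s} for s in (0.31, 0.03, 0.02, 0.40, 0.12)]
print(network_similarity(stations))           # {(100.0, 2500.0): 0.88}, up to rounding
```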

[Figure 8 map: NCSN stations CCOB, CADB, CAO, CHR, CML near San Jose along the Calaveras Fault, with the Mw 4.1 mainshock, catalog earthquakes, and cities; 10 km scale bar.]

[Example FAST detections on 2011-01-08, plotted across the 7 network channels (CCOB.EHE/EHN/EHZ, CADB.EHZ, CAO.EHZ, CHR.EHZ, CML.EHZ): network similarity 1.14 (window start 1733 s) and network similarity 0.58 (window start 70510 s).]

[Figure 10 map: single station CCOB.EHN along the Calaveras Fault, with the Mw 4.1 mainshock and catalog earthquakes marked as detected or missed; 10 km scale bar.]

[Figure 11 waveform panels, each labeled by detection time in the continuous data: catalog events; FAST new events also found by autocorrelation; FAST new events not found by autocorrelation; false detections; missed catalog events; autocorrelation new events missed by FAST.]

[Example FAST detection on 2011-01-08 across the 7 network channels: network similarity 0.53, window start 11295 s.]

[Detection performance bar charts: autocorrelation 86 vs. FAST 89 detected earthquakes (single channel, 1 week); autocorrelation 37 vs. FAST 39 detected earthquakes (network of stations, 1 day).]

[One day of continuous data (amplitude vs. time in hours) on the 7 network channels: CCOB.EHE, CCOB.EHN, CCOB.EHZ, CADB.EHZ, CAO.EHZ, CHR.EHZ, CML.EHZ.]

[Runtime performance on 1 processor: 1 day of network data, autocorrelation 31 hours 25 minutes vs. FAST 46 minutes; 1 week of single-channel data, autocorrelation 9 days 13 hours vs. FAST 1 hour 36 minutes.]

Single channel, 1 week continuous data

Multiple Stations, 1 day continuous data

Goal: Detect uncataloged earthquakes in continuous data with FAST
• Aftershocks of an Mw 4.1 event on the Calaveras Fault
• Data from the Northern California Seismic Network (NCSN)

Figure 8: Map of catalog events and the 5 stations; 7 channels used in the network similarity matrix (CCOB: 3 components; other stations: vertical only). Bandpass filters applied to remove correlated noise: 4-10 Hz at CCOB, 2-6 Hz at CML, 2-10 Hz at all other stations. Data decimated to 20 sps.
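A sketch of the preprocessing named in the caption, using SciPy; the original 100 sps sampling rate and the filter order are assumptions, while the 4-10 Hz band and the 20 sps target come from the caption.

```python
from scipy.signal import butter, sosfiltfilt, decimate

def preprocess(trace, fs=100.0, band=(4.0, 10.0), target_fs=20.0):
    """Bandpass filter to suppress correlated noise, then decimate to target_fs."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, trace)        # zero-phase filtering
    factor = int(round(fs / target_fs))       # e.g. 100 sps -> 20 sps
    return decimate(filtered, factor, zero_phase=True)
```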

FAST Detections: All 13 catalog events, 26 new events, 8 false detections

Figure 9: (Top) FAST detection results plotted on continuous data used for detection. (Top right) Example uncataloged earthquakes detected with FAST. (Bottom right) FAST finds about the same total number of events as autocorrelation, but runs 40 times faster.

Figure 10: Map of catalog events; a single channel, CCOB.EHN, used for detection. Bandpass filter applied to remove correlated noise: 4-10 Hz. Data decimated to 20 sps.

FAST Detections: 21/24 catalog events, 68 new events

Figure 11: (Top) Waveforms of FAST detections: catalog events (blue), new uncataloged events (red). (Top right) Waveforms of FAST detection errors: false positives (green), false negatives (black). (Bottom right) FAST finds about the same total number of events as autocorrelation, but runs 140 times faster.

FAST Detection Errors: 12 false detections, 22 missed detections

[Figure 12 plot: runtime (s) vs. continuous data duration (s) on log-log axes, for autocorrelation on 1 processor, autocorrelation on 1000 processors, and FAST on 1 processor; FAST speedup over autocorrelation on 1 processor: day 40x, week 140x, month 600x (extrapolated), year 7500x (extrapolated).]

Figure 12: Detection algorithm runtime as a function of continuous data duration. Dashed lines are extrapolations based purely on runtime scaling properties (without memory constraints): quadratic for autocorrelation, and near-linear for FAST. FAST should have its greatest utility for longer duration data sets (years), where it could run faster than even massively parallel autocorrelation.

Summary:
• The FAST algorithm adapts efficient audio search technology to detect earthquakes with similar waveforms
• FAST finds as many total events as autocorrelation, with more false detections, but runs 140x faster on 1 week of continuous data
• The network version of FAST detected earthquakes using continuous data from 5 stations

Future Work:
• Process longer-duration continuous data (month, year): find infrequently repeating events?
• Process data in parallel
• "Data mining" on diverse data sets to detect earthquakes: foreshocks, aftershocks, swarms, induced seismicity, uncataloged events

References
[1] Baluja and Covell (2008), "Waveprint".
[2] Leskovec et al. (2014), Mining of Massive Datasets.
[3] Broder et al. (2000), "Min-Wise Independent Permutations".
[4] Shelly et al. (2007), "Non-volcanic tremor".