"Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar...
![Page 1: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/1.jpg)
Business Proprietary & Confidential
SimilarWeb & Tel-Aviv University
On Quantum Clustering
Sigalit Bechler
December 1, 2014
![Page 2: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/2.jpg)
• SimilarWeb – a quick introduction
• Quantum Clustering
Agenda
![Page 3: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/3.jpg)
SimilarWeb
• Funding: $65M
• Founded: 2007
• Offices: 6
• Employees: 300
Some of our clients
![Page 4: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/4.jpg)
What We Do
We Provide Digital Insights to the Entire World
60M WEBSITES DAILY
FOR EVERY WEBSITE:
• TRAFFIC ESTIMATION
• TRAFFIC SOURCES
• AUDIENCE
• INDUSTRY
• CONTENT
2M MOBILE APPS DAILY
FOR EVERY MOBILE APP:
• RATING
• ENGAGEMENT
• APP STORE DATA
• CATEGORY
• KEYWORDS
![Page 5: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/5.jpg)
What We Do
We Provide Digital Insights to the Entire World
60M WEBSITES DAILY
FOR EVERY WEBSITE:
• TRAFFIC METRICS
• TRAFFIC SOURCES
• AUDIENCE
• INDUSTRY
• CONTENT
2M MOBILE APPS DAILY
FOR EVERY MOBILE APP:
• RATING
• ENGAGEMENT
• APP STORE
• CATEGORY
• KEYWORDS
INGEST: INTERNATIONAL PANEL, CRAWLING, ISP DATA, LEARNING SET
• 90K events/sec
• 4TB/day compressed
BATCH & ON DEMAND PROCESSING:
• 100TB I/O a day
• > 150 machines just in the processing cluster
• Statistical & machine learning algorithms
![Page 6: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/6.jpg)
Quantum clustering
Prof. David Horn and Dr. Assaf Gottlieb. Phys. Rev. Lett. 88 (2002) 018702
![Page 7: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/7.jpg)
Clustering - general overview
• Unsupervised learning problem: dealing with unlabeled data
• Goal: group together elements that are similar to each other in some sense
• We usually have an idea, or a desire, of what this “sense” should be
• Might discover new patterns
[Table: columns label, feature1, feature2, feature3, feature4]
![Page 8: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/8.jpg)
Clustering - general overview
• The user identity is unknown
• Leaving it in for the example
[Table: columns label, feature1, feature2, feature3, feature4; every label is shown as "?"]
![Page 9: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/9.jpg)
Clustering - general overview
• Grouping by gender
[Table: columns label, feature1, feature2, feature3, feature4]
![Page 10: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/10.jpg)
Clustering - general overview
• Grouping by fields of interest
[Table: columns label, feature1, feature2, feature3, feature4]
![Page 11: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/11.jpg)
Quantum Clustering - Motivation
• Relatively easy clustering task
• Still need to set the number of clusters manually
• Very complex clustering task
• Unbiased analysis of X-ray absorption data
![Page 12: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/12.jpg)
Quantum Clustering - Example
Analyzing Big Data with Dynamic Quantum Clustering. M. Weinstein, F. Meirer, A. Hume, Ph. Sciau, G. Shaked, R. Hofstetter, E. Persi, A. Mehta, D. Horn. http://arxiv.org/abs/1310.2700
![Page 13: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/13.jpg)
Why is it important?
• Information era: big data
• Massive collection of data
• Strong presence of outliers
• Unknown structures
• Non-trivial patterns
Quantum Clustering
Distributed computation technologies
![Page 14: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/14.jpg)
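Quantum clustering - the potential trick
1. Turn the data points into Gaussians centered on the data points: $\psi(\mathbf{x}) = \sum_i e^{-(\mathbf{x}-\mathbf{x}_i)^2 / 2\sigma^2}$
2. Plug $\psi$ into the Schrödinger equation and solve for $V(\mathbf{x})$: $H\psi \equiv \left(-\frac{\sigma^2}{2}\nabla^2 + V(\mathbf{x})\right)\psi = E\psi$. The solution for $V$ defines the potential transform.
• Single point → Gaussian → harmonic potential: $V(\mathbf{x}) = \frac{(\mathbf{x}-\mathbf{x}_1)^2}{2\sigma^2}$, with $E = d/2$
• Multiple points: $V(\mathbf{x}) = E + \frac{\sigma^2}{2}\frac{\nabla^2\psi}{\psi} = E - \frac{d}{2} + \frac{1}{2\sigma^2\psi}\sum_i (\mathbf{x}-\mathbf{x}_i)^2\, e^{-(\mathbf{x}-\mathbf{x}_i)^2 / 2\sigma^2}$
3. Move each data point toward the nearest minimum of the potential surface $V$ using gradient descent.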
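The three steps above can be sketched in NumPy. This is a minimal illustration, not the speaker's implementation: it assumes the Gaussian-sum wave function and the closed-form potential from the Horn and Gottlieb paper cited earlier in the deck, and the function names (`qc_potential`, `qc_gradient`, `qc_descend`), the step size, and the toy data are all hypothetical choices made for the example.

```python
import numpy as np


def _pairwise(points, data, sigma):
    """Differences, squared distances and Gaussian weights (points vs. data)."""
    diff = points[:, None, :] - data[None, :, :]          # shape (m, n, d)
    sq = np.sum(diff ** 2, axis=2)                        # shape (m, n)
    # Subtract each row's minimum before exponentiating. Only ratios of
    # weights are used below, so results are unchanged but underflow is avoided.
    w = np.exp(-(sq - sq.min(axis=1, keepdims=True)) / (2 * sigma ** 2))
    return diff, sq, w


def qc_potential(points, data, sigma):
    """V(x) - E at each row of `points` (E is only an additive constant)."""
    _, sq, w = _pairwise(points, data, sigma)
    d = data.shape[1]
    return -d / 2 + (sq * w).sum(axis=1) / (2 * sigma ** 2 * w.sum(axis=1))


def qc_gradient(points, data, sigma):
    """Analytic gradient of V at each row of `points`."""
    diff, sq, w = _pairwise(points, data, sigma)
    psi = w.sum(axis=1, keepdims=True)                    # shape (m, 1)
    q = (sq * w).sum(axis=1, keepdims=True) / psi         # weighted mean of sq
    coeff = w * (2 + (q - sq) / sigma ** 2)               # shape (m, n)
    return (coeff[:, :, None] * diff).sum(axis=1) / (2 * sigma ** 2 * psi)


def qc_descend(data, sigma, step=0.1, n_steps=200):
    """Step 3: slide a copy of every data point downhill on V."""
    data = np.asarray(data, dtype=float)
    points = data.copy()
    for _ in range(n_steps):
        points -= step * qc_gradient(points, data, sigma)
    return points


# Toy data (an assumption for illustration): two small squares of points.
square = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2]])
data = np.vstack([square, square + 5.0])
final = qc_descend(data, sigma=1.0)
print(np.round(final, 3))
```

In this toy case the first four points collapse onto one minimum and the last four onto another, so the two groups can be read off from where the points end up. The fixed step size is the simplest possible choice; a real run would likely need a step scaled to `sigma` and a convergence check.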
![Page 15: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/15.jpg)
Quantum clustering – reasoning
• Why does it make sense?
• $-\frac{\sigma^2}{2}\nabla^2$: models the divergence effects from the cluster center.
• $V(\mathbf{x})$: the effects that bind points from the same cluster together.
• We may say that we are looking for the minima of $V(\mathbf{x})$, since this is where the divergence effects are minimal (slow changes give a small numerator, high density gives a large denominator): $V(\mathbf{x}) - E = \frac{\sigma^2}{2}\frac{\nabla^2\psi}{\psi}$
• SVD may be performed prior to the clustering: $X = USV^T$, then perform QC on $U$ or $V$.
• This addresses the fact that each feature has a different dimension type and scale.
• It enables dimension reduction to the components with the highest variance.
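The SVD preprocessing step might look like the sketch below. It assumes NumPy's `numpy.linalg.svd`; the helper name `svd_reduce`, the choice of `k`, and the random example matrix are hypothetical, and no centering or row normalization is applied because the slide does not specify any.

```python
import numpy as np


def svd_reduce(X, k):
    """Factor X = U S V^T and keep the k leading columns of U.

    numpy.linalg.svd returns the singular values in descending order,
    so the first k columns of U correspond to the largest ones.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s, Vt


# Hypothetical example: 6 samples with 4 features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4)) * np.array([1.0, 10.0, 100.0, 1000.0])
U_k, s, Vt = svd_reduce(X, k=2)
print(U_k.shape)  # rows of U_k are the reduced points to cluster
```

The columns of `U` are orthonormal, so every retained coordinate is on a comparable scale regardless of the units of the original features, which is the motivation the slide gives.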
![Page 16: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/16.jpg)
The Crabs Example (from Ripley’s textbook): 4 classes, 50 samples each, d=5
A topographic map of the probability distribution for the crab data set with σ = 1/√2, using principal components 2 and 3. There exists only one maximum.
A topographic map of the potential for the crab data set with σ = 1/√2, using principal components 2 and 3. The four minima are denoted by crossed circles. The contours are set at values V = cE for c = 0.2, …, 1.
The data
3D plot of the potential
![Page 17: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/17.jpg)
Quantum clustering - summary
• Built-in capability to handle outliers (the divergence part): no additional parameters or processes are needed, and outliers do not affect the number of significant clusters
• A cluster may be a line or another shape, not necessarily a point in the feature space.
• The clusters are not defined by geometric or probabilistic considerations alone
• No need to pre-define the number of clusters
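One way to see why the number of clusters need not be fixed in advance: once the points have been moved to the minima of the potential, the clusters can be read off by grouping points that ended up at the same location. The sketch below is an assumption about how that grouping could be done, not something the slides describe; `label_by_minimum`, the tolerance `tol`, and the example coordinates are hypothetical.

```python
import numpy as np


def label_by_minimum(converged, tol=1e-2):
    """Group converged points that lie within `tol` of the same location.

    Returns one integer label per point and the array of distinct locations.
    The number of clusters is simply how many distinct locations appear.
    """
    converged = np.asarray(converged, dtype=float)
    centers = []
    labels = np.empty(len(converged), dtype=int)
    for i, p in enumerate(converged):
        for j, c in enumerate(centers):
            if np.linalg.norm(p - c) < tol:
                labels[i] = j          # joins an already seen minimum
                break
        else:
            centers.append(p)          # first point seen at a new minimum
            labels[i] = len(centers) - 1
    return labels, np.array(centers)


# Hypothetical end positions after gradient descent on the potential.
converged = np.array([[0.1, 0.1], [0.1000001, 0.1], [5.1, 5.1]])
labels, centers = label_by_minimum(converged)
print(labels, len(centers))
```

The greedy pass is quadratic in the worst case and is meant only to show the idea; the tolerance plays the role of deciding when two end positions count as the same minimum.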
![Page 18: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/18.jpg)
• An approximate variant of quantum clustering exists for improving time complexity.
• Sensitive to small variations in the data density, unlike geometric considerations alone.
• Possible distributed calculation:
• Since all we have to calculate is V and ∇V for every data point, the part belonging to each point can be calculated separately on a different machine.
• Performed exceptionally well in exposing hidden patterns of data structures from a wide range of fields: finance, online marketing, experimental physics, speech recognition, biological data.
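The distributed-calculation remark can be illustrated on a single machine by splitting the evaluation points into chunks and computing the potential for each chunk independently. This is a sketch under assumptions: threads stand in for separate machines, the potential formula is the closed form from the Horn and Gottlieb paper cited earlier, and `qc_potential`, `qc_potential_chunked`, and `n_workers` are hypothetical names.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def qc_potential(points, data, sigma):
    """V(x) - E at each row of `points`, using the full data set."""
    diff = points[:, None, :] - data[None, :, :]
    sq = np.sum(diff ** 2, axis=2)
    # Row-wise shift for numerical stability; only weight ratios matter.
    w = np.exp(-(sq - sq.min(axis=1, keepdims=True)) / (2 * sigma ** 2))
    d = data.shape[1]
    return -d / 2 + (sq * w).sum(axis=1) / (2 * sigma ** 2 * w.sum(axis=1))


def qc_potential_chunked(points, data, sigma, n_workers=4):
    """Evaluate V chunk by chunk; each chunk needs no result from the others."""
    chunks = np.array_split(points, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves the order of the chunks.
        parts = list(pool.map(lambda c: qc_potential(c, data, sigma), chunks))
    return np.concatenate(parts)


rng = np.random.default_rng(1)
data = rng.normal(size=(10, 2))
full = qc_potential(data, data, sigma=1.0)
chunked = qc_potential_chunked(data, data, sigma=1.0)
print(np.allclose(full, chunked))
```

Each worker still needs the whole data set to evaluate the sums, so what is distributed is the set of evaluation points, not the data itself.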
Quantum clustering
![Page 19: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/19.jpg)
• Physics may provide an interesting perspective on questions that at first glance have no connection to physics.
• It has been done in scale-space theory
• In bioinformatics, for extracting protein structure
• And many more
Quantum clustering
![Page 20: "Quantum clustering - physics inspired clustering algorithm", Sigalit Bechler, Researcher, Similar Web](https://reader035.fdocuments.net/reader035/viewer/2022070516/5873b5521a28abbc788b45ff/html5/thumbnails/20.jpg)
Thank You!