Geolocation and Cassandra at Physi
Geolocation with Cassandra
Austin Cassandra Users – Jan 21, 2016
Matt Vorst
• Cassandra User
  – Since 2011
• Architect / Java developer
• Corporate Life
  – EntekIRD & Rockwell Automation
• Serial Entrepreneur
  – EventsInCincinnati.com – Co-founder
  – Dotloop, Inc. – Co-founder and CTO
  – Physi, Inc. – Co-founder and C*O
Physi [fiz-ee] (noun)
1. a mobile app that pairs nearby people to play sports
2. a movement to make a smaller, happier, healthier world through play
Why Cassandra
• Operations is Hard
  – Most relational DBs don’t scale easily or well
  – Murphy’s Law always strikes at the worst time
  – Recovery shouldn’t come at a high cost
• Distributed Design
  – Cassandra is a distributed technology
  – Applications are designed to be distributed
Necessary Location Services
• Proximity Search
  – Postal code range search
  – Distance between postal codes
• Location Conversion
  – Postal code to latitude/longitude
  – Latitude/longitude to postal code
• Search
  – City name lookup
Setup
• Create the Keyspace
cqlsh> CREATE KEYSPACE physi WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE physi;
Postal Code to Latitude/Longitude
• Use Case
  – Place markers on a map
• Solution
  – Buy a database
  – PK: Country/postal code
Postal Code to Latitude/Longitude
• Create Column Family
cqlsh> CREATE TABLE zip_code_master (
    location_country text,
    zip_code text,
    location_uuid uuid,
    location_type text,
    city text,
    county text,
    state text,
    latitude_e6 bigint,
    longitude_e6 bigint,
    PRIMARY KEY (location_country, zip_code));
Postal Code to Latitude/Longitude
• Add data
cqlsh> INSERT INTO zip_code_master (location_country, zip_code, location_uuid, location_type, city, county, state, latitude_e6, longitude_e6)
    VALUES ('US', '45219', 7b0e6b7f-0d9a-3a66-9f9a-0df17ed5dc39, 'REGIONAL', 'Cincinnati', 'Hamilton', 'OH', 39127564, -84514489);
Postal Code to Latitude/Longitude
• Search
cqlsh> SELECT * FROM zip_code_master WHERE location_country = 'US' AND zip_code = '45219';
• Results
 location_country | zip_code | city       | county   | latitude_e6 | location_type | location_uuid                        | longitude_e6 | state
------------------+----------+------------+----------+-------------+---------------+--------------------------------------+--------------+------
               US |    45219 | Cincinnati | Hamilton |    39127564 |      REGIONAL | 7b0e6b7f-0d9a-3a66-9f9a-0df17ed5dc39 |    -84514489 |    OH
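The `latitude_e6` and `longitude_e6` columns appear to hold decimal degrees scaled by 10^6 and stored as integers. That reading is an assumption based on the column names and the sample row. A minimal Python sketch of the conversion an application might do before placing a marker on a map (the helper names `to_e6` and `from_e6` are hypothetical):

```python
E6 = 1_000_000  # assumed scale factor implied by the "_e6" suffix


def to_e6(degrees: float) -> int:
    """Convert decimal degrees to the integer form stored in the table."""
    return round(degrees * E6)


def from_e6(value_e6: int) -> float:
    """Convert a stored integer back to decimal degrees."""
    return value_e6 / E6


# Values taken from the sample row for postal code 45219.
latitude = from_e6(39127564)
longitude = from_e6(-84514489)
print(latitude, longitude)
```

Storing scaled integers avoids floating-point representation differences between clients, at the cost of one conversion on read and write.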
Postal Code to Latitude/Longitude
• Things to Know
  – Row width: ~10
  – Postal codes cover areas of different sizes
  – A single postal code can span different cities, counties, and even states
  – The largest postal code covers 10,000 mi²
Latitude/Longitude to Postal Code
• Use Case
  – Determine, server side, which postal code a user is currently in
  – Use this to return suggestions
Latitude/Longitude to Postal Code
• The Relational Way
  – Draw a box, loop, and calculate
  – Query:
    SELECT * FROM location_table
    WHERE (min lat) < latitude AND latitude < (max lat)
      AND (min long) < longitude AND longitude < (max long)
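The "draw a box" step can be sketched in Python as below. This is an illustration under stated assumptions, not the speaker's code: the function name `bounding_box` is hypothetical, the constant of about 69 miles per degree of latitude is an approximation, and the sketch ignores the poles and the antimeridian.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximate; treated as constant for this sketch


def bounding_box(lat, lon, radius_miles):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing a circle.

    Rows returned by the box query still need an exact distance check,
    because the corners of the box lie outside the circle.
    """
    d_lat = radius_miles / MILES_PER_DEGREE_LAT
    # A degree of longitude shrinks by cos(latitude) away from the equator.
    d_lon = radius_miles / (MILES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat - d_lat, lat + d_lat, lon - d_lon, lon + d_lon


min_lat, max_lat, min_lon, max_lon = bounding_box(39.127564, -84.514489, 5)
print(min_lat, max_lat, min_lon, max_lon)
```

The four values map onto the `(min lat)`, `(max lat)`, `(min long)`, and `(max long)` placeholders in the query.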
Latitude/Longitude to Postal Code
• Cassandra Solution
  – Prebuild a lookup table
    • Slice the US up into 7 mi by <=7 mi squares
    • ~69 miles per degree of latitude
    • Lines of longitude are not equally spaced (they converge toward the poles)
  – PK: latE1|longE1
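A rough sketch of how a `latE1|longE1` key could be derived from a coordinate. The slides do not say how the key is computed; truncating toward zero to tenths of a degree is an assumption that happens to match the sample data (39.127564, -84.514489 falls in cell 391, -845). The names `cell_key` and `cell_size_miles` are hypothetical.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximate


def cell_key(lat, lon):
    """Map a coordinate to its (latitude_e1, longitude_e1) cell.

    Assumption: truncate toward zero to tenths of a degree. Truncation
    merges the two cells on either side of 0 degrees, which does not
    matter for coordinates inside the US.
    """
    return int(lat * 10), int(lon * 10)


def cell_size_miles(lat):
    """Approximate (height, width) in miles of a 0.1 degree cell at this latitude."""
    height = MILES_PER_DEGREE_LAT / 10
    width = height * math.cos(math.radians(lat))
    return height, width


print(cell_key(39.127564, -84.514489))
print(cell_size_miles(39.127564))
```

The height stays near 7 miles everywhere, while the width shrinks with latitude, which is why the cells are "7 mi by <=7 mi".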
Latitude/Longitude to Postal Code
• Cassandra Solution (cont.)
  – Build: Add bordering postal codes
  – Read: Loop and calculate distance
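The read step ("loop and calculate distance") might look like the following sketch. It assumes each candidate row carries a centroid latitude and longitude (for example inside the JSON `location` value) and that the nearest centroid is an acceptable stand-in for the containing postal code. The haversine formula is a standard great-circle approximation, not necessarily what Physi used, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_postal_code(lat, lon, candidates):
    """candidates: iterable of (zip_code, centroid_lat, centroid_lon)."""
    return min(candidates, key=lambda c: haversine_miles(lat, lon, c[1], c[2]))[0]


# The 45219 centroid comes from the zip_code_master sample row.
# The other two coordinates are illustrative placeholders, not real data.
candidates = [
    ("45219", 39.127564, -84.514489),
    ("45220", 39.147, -84.520),
    ("45206", 39.127, -84.485),
]
print(nearest_postal_code(39.128, -84.515, candidates))
```

Because a cell holds at most around 50 candidates, looping over all of them on each read stays cheap.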
Latitude/Longitude to Postal Code
• Create Column Family
cqlsh> CREATE TABLE latitude_longitude_zip_code (
    latitude_e1 int,
    longitude_e1 int,
    location_country text,
    zip_code text,
    location text,
    PRIMARY KEY ((latitude_e1, longitude_e1), location_country, zip_code));
Latitude/Longitude to Postal Code
• Add data
cqlsh> INSERT INTO latitude_longitude_zip_code (latitude_e1, longitude_e1, location_country, zip_code, location)
    VALUES (391, -845, 'US', '45219', '{json data}');
cqlsh> INSERT INTO latitude_longitude_zip_code (latitude_e1, longitude_e1, location_country, zip_code, location)
    VALUES (391, -845, 'US', '45220', '{json data}');
Latitude/Longitude to Postal Code
• Search
cqlsh> SELECT * FROM latitude_longitude_zip_code
    WHERE latitude_e1 = 391 AND longitude_e1 = -845;
• Results
 latitude_e1 | longitude_e1 | location_country | zip_code | location
-------------+--------------+------------------+----------+-------------
         391 |         -845 |               US |    45206 | {json data}
         391 |         -845 |               US |    45219 | {json data}
         391 |         -845 |               US |    45220 | {json data}
Latitude/Longitude to Postal Code
• Things to Know
  – Row width: 1 to ~50
  – This was a short-lived solution
  – Now primarily using client location services
  – Still used as a fallback for web
  – Creation of the lookup table took 3 hours on localhost with RAID 0 SSDs
City Name Lookup
• Use Case
  – Auto-complete city name
• Solution
  – Create a lookup
  – RK: searchTerm
  – CN: (0 padded count)|country|city
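The zero padding matters when the count is embedded in a text column name, because text sorts character by character rather than numerically. A small Python illustration follows. The padding width of 6 and the helper `column_name` are assumptions, and the entry with a count of 9 is hypothetical.

```python
def column_name(count, country, city, width=6):
    """Build a sortable column name; width=6 is an assumed padding width."""
    return f"{count:0{width}d}|{country}|{city}"


# Counts 31 and 10 come from the example data; the count of 9 is hypothetical.
entries = [(31, "US", "austin"), (10, "US", "austell"), (9, "US", "hypotheticalcity")]

padded = sorted((column_name(*e) for e in entries), reverse=True)
unpadded = sorted((f"{c}|{co}|{ci}" for c, co, ci in entries), reverse=True)

print(padded)    # highest count first
print(unpadded)  # "9|..." incorrectly sorts ahead of "31|..."
```

With an `int` clustering column, as in the CQL table definition, the database compares the counts numerically and no padding is needed.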
City Name Lookup
• Create Column Family
cqlsh> CREATE TABLE name_search (
    search_term text,
    occurrence_count int,
    location_country text,
    city text,
    state text,
    location text,
    PRIMARY KEY ((search_term), occurrence_count, location_country, city, state));
City Name Lookup
• Add data
cqlsh> INSERT INTO name_search (search_term, occurrence_count, location_country, city, state, location)
    VALUES ('aus', 31, 'US', 'austin', 'TX', '{json data}');
cqlsh> INSERT INTO name_search (search_term, occurrence_count, location_country, city, state, location)
    VALUES ('aus', 10, 'US', 'austell', 'GA', '{json data}');
City Name Lookup
• Search
cqlsh> SELECT * FROM name_search WHERE search_term = 'aus' ORDER BY occurrence_count DESC;
• Results
 search_term | occurrence_count | location_country | city        | state | location
-------------+------------------+------------------+-------------+-------+-------------
         aus |               31 |               US |      austin |    TX | {json data}
         aus |               10 |               US |     austell |    GA | {json data}
         aus |               10 |               US | ausablefork |    NY | {json data}
City Name Lookup
• Things to Know
  – Row width: 10 – 60K
  – Remove whitespace and special characters, and convert search terms to lowercase
  – Only search when 2 or more characters have been entered
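The normalization rules above can be sketched in Python as follows. Storing one row per prefix of the city name is an assumption inferred from the example, where the key 'aus' returns austin, austell, and ausablefork. The function names are hypothetical, and this regex would also drop accented letters, which a real implementation may want to handle differently.

```python
import re

MIN_TERM_LENGTH = 2  # "only search when 2 or more characters have been entered"


def normalize(text):
    """Lowercase and drop whitespace and special characters (ASCII only)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def search_terms(city):
    """All prefixes of the normalized name, used as search_term keys at build time."""
    name = normalize(city)
    return [name[:n] for n in range(MIN_TERM_LENGTH, len(name) + 1)]


def should_search(user_input):
    """Query-time guard: skip the lookup for very short input."""
    return len(normalize(user_input)) >= MIN_TERM_LENGTH


print(search_terms("Austin"))
print(normalize(" St. Louis "))
print(should_search("a"))
```

Applying the same `normalize` function at build time and at query time keeps the typed text and the stored keys comparable.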
Postal Code Range Search
• Use Case
  – Find nearby neighborhoods
• Solution
  – Create a lookup table
  – RK: country|postal code
Postal Code Range Search
• Create Column Family
cqlsh> CREATE TABLE zip_code_distance (
    location_country text,
    zip_code text,
    distance_e2 int,
    location text,
    PRIMARY KEY ((location_country, zip_code), distance_e2));
Postal Code Range Search
• Add Data
cqlsh> INSERT INTO zip_code_distance (location_country, zip_code, distance_e2, location)
    VALUES ('US', '78741', 0, '{json data for 78741}');
cqlsh> INSERT INTO zip_code_distance (location_country, zip_code, distance_e2, location)
    VALUES ('US', '78741', 180, '{json data for 78702}');
cqlsh> INSERT INTO zip_code_distance (location_country, zip_code, distance_e2, location)
    VALUES ('US', '78741', 220, '{json data for 78721}');
Postal Code Range Search
• Search
cqlsh> SELECT * FROM zip_code_distance WHERE location_country = 'US' AND zip_code = '78741' AND distance_e2 < 200 ORDER BY distance_e2;
• Results
 location_country | zip_code | distance_e2 | location
------------------+----------+-------------+-----------------------
               US |    78741 |           0 | {json data for 78741}
               US |    78741 |         180 | {json data for 78702}
Postal Code Range Search
• Things to know
  – Row width: 1 to ~45K
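To make the `distance_e2` encoding concrete, here is a small Python model of one partition and of the range read. It assumes `distance_e2` is a distance multiplied by 100 and stored as an integer; the unit (miles here) is not stated in the slides. The names `to_e2` and `within` are hypothetical.

```python
from bisect import bisect_left


def to_e2(distance):
    """Scale a distance to hundredths and store it as an integer."""
    return round(distance * 100)


# One partition for ('US', '78741'), mirroring the example rows.
# Distances are assumed to be in miles.
neighbors = {"78741": 0.0, "78702": 1.80, "78721": 2.20}
partition = sorted((to_e2(d), zip_code) for zip_code, d in neighbors.items())


def within(partition, max_distance):
    """Emulate `distance_e2 < limit` on a partition sorted by distance_e2."""
    limit = to_e2(max_distance)
    keys = [d for d, _ in partition]
    return partition[:bisect_left(keys, limit)]


print(within(partition, 2.0))
```

Because the rows are stored sorted by distance, a radius search is a single contiguous slice of the partition. One thing to watch in the table as defined: `distance_e2` is the only clustering column, so two neighbors at the same rounded distance would share a row and the later write would replace the earlier one. Adding the neighbor's postal code as a second clustering column is one way to avoid that.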
Distance Between Postal Codes
• Use Case
  – Estimate the distance between postal codes
• Solution
  – Create a lookup table
  – RK: country|postal code
  – CN: country|postal code
  – Value: distanceE2
Distance Between Postal Codes
• Create Column Family
cqlsh> CREATE TABLE zip_code_distance_between (
    location_country_1 text,
    zip_code_1 text,
    location_country_2 text,
    zip_code_2 text,
    distance_e2 int,
    PRIMARY KEY ((location_country_1, zip_code_1), location_country_2, zip_code_2));
Distance Between Postal Codes
• Add Data
cqlsh> INSERT INTO zip_code_distance_between (location_country_1, zip_code_1, location_country_2, zip_code_2, distance_e2)
    VALUES ('US', '78741', 'US', '78741', 0);
cqlsh> INSERT INTO zip_code_distance_between (location_country_1, zip_code_1, location_country_2, zip_code_2, distance_e2)
    VALUES ('US', '78741', 'US', '78702', 180);
Distance Between Postal Codes
• Select
cqlsh> SELECT * FROM zip_code_distance_between WHERE location_country_1 = 'US' AND zip_code_1 = '78741' AND location_country_2 = 'US' AND zip_code_2 = '78702';
• Results
 location_country_1 | zip_code_1 | location_country_2 | zip_code_2 | distance_e2
--------------------+------------+--------------------+------------+-------------
                 US |      78741 |                 US |      78702 |         180
Distance Between Postal Codes
• Things to know
  – Row width: ~45K
Final Thoughts
• Why just Cassandra?
  – Fewer technologies to support
    • Operations
    • Development
  – But be reasonable
• Prebuild reference data
  – Consider prebuilding data to reduce read time