Geolocation and Cassandra at Physi


  • Geolocation with Cassandra
    Austin Cassandra Users, Jan 21, 2016

  • Matt Vorst
    Cassandra user since 2011

    Architect / Java developer

    Corporate life: Entek IRD & Rockwell Automation

    Serial co-founder: Dotloop, Inc. (co-founder and CTO); Physi, Inc. (co-founder and C*O)


  • Physi [fiz-ee] (noun)
    a mobile app that pairs nearby people to play sports

    a movement to make a smaller, happier, healthier world through play

  • Why Cassandra
    Operations is hard
      Most relational DBs don't scale easily or well
      Murphy's Law always strikes at the worst time
      Recovery shouldn't come at a high cost

    Distributed design
      Cassandra is a distributed technology
      Applications are designed to be distributed

  • Necessary Location Services
    Proximity search
      Postal code range search
      Distance between postal codes

    Location conversion
      Postal code to latitude/longitude
      Latitude/longitude to postal code

    Search
      City name lookup

  • Setup
    Create the keyspace

    cqlsh> CREATE KEYSPACE physi WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

    cqlsh> USE physi;

  • Postal Code to Latitude/Longitude
    Use case
      Place markers on a map

    Solution
      Buy a database
      PK: country/postal code

  • Postal Code to Latitude/Longitude
    Create column family

    cqlsh> CREATE TABLE zip_code_master (location_country text, zip_code text, location_uuid uuid,
             location_type text, city text, county text, state text, latitude_e6 bigint, longitude_e6 bigint,
             PRIMARY KEY (location_country, zip_code));

  • Postal Code to Latitude/Longitude
    Add data

    cqlsh> INSERT INTO zip_code_master (location_country, zip_code, location_uuid, location_type, city, county, state, latitude_e6, longitude_e6)
             VALUES ('US', '45219', 7b0e6b7f-0d9a-3a66-9f9a-0df17ed5dc39, 'REGIONAL', 'Cincinnati', 'Hamilton', 'OH', 39127564, -84514489);

  • Postal Code to Latitude/Longitude
    Search

    cqlsh> SELECT * FROM zip_code_master WHERE location_country = 'US' AND zip_code = '45219';

    Results

     location_country | zip_code | city       | county   | latitude_e6 | location_type | location_uuid                        | longitude_e6 | state
    ------------------+----------+------------+----------+-------------+---------------+--------------------------------------+--------------+-------
                   US |    45219 | Cincinnati | Hamilton |    39127564 |      REGIONAL | 7b0e6b7f-0d9a-3a66-9f9a-0df17ed5dc39 |    -84514489 |    OH

  • Postal Code to Latitude/Longitude
    Things to know
      Row width: ~10
      Postal codes cover different areas
      A single postal code can span different cities, counties, and even states
      The largest postal code covers 10,000 mi²
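
The `latitude_e6` and `longitude_e6` columns appear to hold decimal degrees scaled by 10^6 and stored as integers (39127564 for 39.127564°). The slides do not spell this out, so the sketch below treats it as an assumption; the helper names `to_e6` and `from_e6` are hypothetical.

```python
def to_e6(degrees: float) -> int:
    """Scale decimal degrees to integer micro-degrees (assumed meaning of *_e6)."""
    return round(degrees * 1_000_000)


def from_e6(value_e6: int) -> float:
    """Convert integer micro-degrees back to decimal degrees."""
    return value_e6 / 1_000_000


# Values taken from the Cincinnati (45219) row in the slides.
print(to_e6(39.127564), to_e6(-84.514489))
print(from_e6(39127564), from_e6(-84514489))
```

Storing scaled integers avoids floating-point comparison issues and keeps the column type a plain `bigint`.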

  • Latitude/Longitude to Postal Code
    Use case
      Determine, server side, which postal code a user is currently in
      Use this to return suggestions

  • Latitude/Longitude to Postal Code
    The relational way
      Draw a box, loop, and calculate

    SELECT * FROM location_table
      WHERE (min lat) < latitude AND latitude < (max lat)
        AND (min long) < longitude AND longitude < (max long)
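
The "draw a box" step needs the four bounds that the query's `(min lat)` and `(max long)` placeholders stand for. A minimal sketch of one way to compute them, assuming roughly 69 miles per degree of latitude and ignoring the poles and the antimeridian; `bounding_box` and the 5-mile radius are hypothetical.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximation; an assumption of this sketch


def bounding_box(lat: float, lon: float, radius_miles: float):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing a circle of the given radius."""
    dlat = radius_miles / MILES_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = radius_miles / (MILES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


min_lat, max_lat, min_lon, max_lon = bounding_box(39.127564, -84.514489, 5.0)
print(min_lat, max_lat, min_lon, max_lon)
```

Rows returned by the box query still have to be looped over and distance-checked, because the corners of the box lie farther away than the radius.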

  • Latitude/Longitude to Postal Code
    Cassandra solution
      Prebuild a lookup table
      Slice the US up into 7mi by

  • Latitude/Longitude to Postal Code
    Cassandra solution (cont.)
      Build: add bordering postal codes

      Read: loop and calculate distance

  • Latitude/Longitude to Postal Code
    Create column family

    cqlsh> CREATE TABLE latitude_longitude_zip_code (latitude_e1 int, longitude_e1 int,
             location_country text, zip_code text, location text,
             PRIMARY KEY ((latitude_e1, longitude_e1), location_country, zip_code));

  • Latitude/Longitude to Postal Code
    Add data

    cqlsh> INSERT INTO latitude_longitude_zip_code (latitude_e1, longitude_e1, location_country, zip_code, location)
             VALUES (391, -845, 'US', '45219', '{json data}');
    cqlsh> INSERT INTO latitude_longitude_zip_code (latitude_e1, longitude_e1, location_country, zip_code, location)
             VALUES (391, -845, 'US', '45220', '{json data}');

  • Latitude/Longitude to Postal Code
    Search

    cqlsh> SELECT * FROM latitude_longitude_zip_code WHERE latitude_e1 = 391 AND longitude_e1 = -845;

    Results

     latitude_e1 | longitude_e1 | location_country | zip_code | location
    -------------+--------------+------------------+----------+-------------
             391 |         -845 |               US |    45206 | {json data}
             391 |         -845 |               US |    45219 | {json data}
             391 |         -845 |               US |    45220 | {json data}
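
The read path described in the slides (fetch one grid cell, then loop and calculate distance) can be sketched as follows. Several things here are assumptions rather than the author's code: the cell key is taken to be degrees times 10 truncated toward zero, which matches the (391, -845) example; the distance uses the haversine formula; and the candidate centroids in `cell_rows` are made-up illustrative values standing in for the `{json data}` payload.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def grid_cell(lat, lon):
    """Map a coordinate to a (latitude_e1, longitude_e1) cell of 0.1 degree.

    Truncation toward zero is an assumption chosen to match the
    slides' (391, -845) example for Cincinnati.
    """
    return math.trunc(lat * 10), math.trunc(lon * 10)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_zip(lat, lon, candidates):
    """Pick the (zip_code, lat, lon) candidate closest to the given point."""
    return min(candidates, key=lambda c: haversine_miles(lat, lon, c[1], c[2]))[0]


# Hypothetical centroids for the postal codes stored in one cell.
cell_rows = [("45219", 39.1276, -84.5145), ("45220", 39.1480, -84.5210)]
print(grid_cell(39.127564, -84.514489))
print(nearest_zip(39.13, -84.515, cell_rows))
```

A tenth of a degree of latitude is a little under 7 miles, which is consistent with the cell size mentioned on the slide. Adding bordering postal codes at build time is what lets a single-partition read suffice even when the user stands near a cell edge.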

  • Latitude/Longitude to Postal Code
    Things to know
      Row width: 1 to ~50
      This was a short-lived solution
      Primarily using client location services
      Still used as a fallback for web
      Creation of the lookup table took 3 hours on localhost with RAID 0 SSDs

  • City Name Lookup
    Use case
      Auto-complete city name

    Solution
      Create a lookup
      RK: searchTerm
      CN: (0 padded count)|country|city
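
The zero padding in the column name matters because composite names of this kind sort as strings, and without padding "9" would sort after "31". A small sketch of the idea; the padding width of 6, the helper name `column_name`, and the count of 9 are illustrative assumptions.

```python
def column_name(count: int, country: str, city: str, width: int = 6) -> str:
    """Build a '(0 padded count)|country|city' style name; the width is an assumption."""
    return f"{count:0{width}d}|{country}|{city}"


names = [
    column_name(31, "US", "austin"),
    column_name(10, "US", "austell"),
    column_name(9, "US", "ausablefork"),  # hypothetical count, for illustration
]
# Padded names sort in numeric order even though they are compared as strings.
print(sorted(names, reverse=True))
```

The CQL table on the following slide stores the count as an `int` clustering column, which sorts numerically without any padding.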

  • City Name Lookup
    Create column family

    cqlsh> CREATE TABLE name_search (search_term text, occurrence_count int, location_country text,
             city text, state text, location text,
             PRIMARY KEY ((search_term), occurrence_count, location_country, city, state));

  • City Name Lookup
    Add data

    cqlsh> INSERT INTO name_search (search_term, occurrence_count, location_country, city, state, location)
             VALUES ('aus', 31, 'US', 'austin', 'TX', '{json data}');
    cqlsh> INSERT INTO name_search (search_term, occurrence_count, location_country, city, state, location)
             VALUES ('aus', 10, 'US', 'austell', 'GA', '{json data}');

  • City Name Lookup
    Search

    cqlsh> SELECT * FROM name_search WHERE search_term = 'aus' ORDER BY occurrence_count DESC;

    Results

     search_term | occurrence_count | location_country | city        | state | location
    -------------+------------------+------------------+-------------+-------+-------------
             aus |               31 |               US |      austin |    TX | {json data}
             aus |               10 |               US |     austell |    GA | {json data}
             aus |               10 |               US | ausablefork |    NY | {json data}

  • City Name Lookup
    Things to know
      Row width: 10 to 60K
      Remove whitespace and special characters, and convert search terms to lowercase
      Only search when 2 or more characters have been entered
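
The normalization rules above can be sketched in a few lines. Generating every prefix of a city name as a search term is an assumption about how the lookup rows were built; the slides only show the single term 'aus'. The names `normalize` and `search_terms` are hypothetical, and this version drops non-ASCII letters, which a real data set may need to handle differently.

```python
import re

MIN_TERM_LENGTH = 2  # from the slides: only search when 2 or more characters are entered


def normalize(text: str) -> str:
    """Lowercase the text and drop whitespace and special characters."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def search_terms(city: str) -> list:
    """All prefixes of the normalized name, from MIN_TERM_LENGTH up (assumed build step)."""
    name = normalize(city)
    return [name[:n] for n in range(MIN_TERM_LENGTH, len(name) + 1)]


print(normalize("St. Louis"))
print(search_terms("Austin"))
```

The same `normalize` function would be applied to the user's input before querying, so that what is typed and what is stored agree.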

  • Postal Code Range Search
    Use case
      Find nearby neighborhoods

    Solution
      Create a lookup table
      RK: country|postal code

  • Postal Code Range Search
    Create column family

    cqlsh> CREATE TABLE zip_code_distance (location_country text, zip_code text, distance_e2 int, location text,
             PRIMARY KEY ((location_country, zip_code), distance_e2));

  • Postal Code Range Search
    Add data

    cqlsh> INSERT INTO zip_code_distance (location_country, zip_code, distance_e2, location)
             VALUES ('US', '78741', 0, '{json data for 78741}');
    cqlsh> INSERT INTO zip_code_distance (location_country, zip_code, distance_e2, location)
             VALUES ('US', '78741', 180, '{json data for 78702}');
    cqlsh> INSERT INTO zip_code_distance (location_country, zip_code, distance_e2, location)
             VALUES ('US', '78741', 220, '{json data for 78721}');

  • Postal Code Range Search
    Search

    cqlsh> SELECT * FROM zip_code_distance WHERE location_country = 'US' AND zip_code = '78741'
             AND distance_e2 < 200 ORDER BY distance_e2;

    Results

     location_country | zip_code | distance_e2 | location
    ------------------+----------+-------------+-----------------------
                   US |    78741 |           0 | {json data for 78741}
                   US |    78741 |         180 | {json data for 78702}

  • Postal Code Range Search
    Things to know
      Row width: 1 to ~45K
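
The `distance_e2` column looks like a distance scaled by 100 and stored as an integer, so `distance_e2 < 200` would mean "closer than 2.00". The sketch below mimics how rows for one postal code could be prebuilt and range-filtered. The unit (miles) and the helper names `to_e2` and `build_rows` are assumptions; the slides show only the scaled values.

```python
def to_e2(distance: float) -> int:
    """Scale a distance to integer hundredths (assumed meaning of distance_e2)."""
    return round(distance * 100)


def build_rows(origin: str, distances: dict) -> list:
    """Return (zip_code, distance_e2, neighbor) rows sorted by distance, like the clustering order."""
    rows = [(origin, to_e2(d), neighbor) for neighbor, d in distances.items()]
    return sorted(rows, key=lambda r: r[1])


# The slides' example values, read as 0, 1.80 and 2.20 (units assumed to be miles).
rows = build_rows("78741", {"78721": 2.20, "78741": 0.0, "78702": 1.80})
within = [r for r in rows if r[1] < to_e2(2.0)]
print(within)
```

Because `distance_e2` is the only clustering column in the table shown, two neighbors at exactly the same scaled distance would share a primary key; the slides do not say how that case was handled.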

  • Distance Between Postal Codes
    Use case
      Estimate the distance between postal codes

    Solution
      Create a lookup table
      RK: country|postal code
      CN: country|postal code
      Value: distanceE2

  • Distance Between Postal Codes
    Create column family

    cqlsh> CREATE TABLE zip_code_distance_between (location_country_1 text, zip_code_1 text,
             location_country_2 text, zip_code_2 text, distance_e2 int,
             PRIMARY KEY ((location_country_1, zip_code_1), location_country_2, zip_code_2));

  • Distance Between Postal Codes
    Add data

    cqlsh> INSERT INTO zip_code_distance_between (location_country_1, zip_code_1, location_country_2, zip_code_2, distance_e2)
             VALUES ('US', '78741', 'US', '78741', 0);
    cqlsh> INSERT INTO zip_code_distance_between (location_country_1, zip_code_1, location_country_2, zip_code_2, distance_e2)
             VALUES ('US', '78741', 'US', '78702', 180);

  • Distance Between Postal Codes
    Select

    cqlsh> SELECT * FROM zip_code_distance_between WHERE location_country_1 = 'US' AND zip_code_1 = '78741'
             AND location_country_2 = 'US' AND zip_code_2 = '78702';

    Results

     location_country_1 | zip_code_1 | location_country_2 | zip_code_2 | distance_e2
    --------------------+------------+--------------------+------------+-------------
                     US |      78741 |                 US |      78702 |         180

  • Distance Between Postal Codes
    Things to know
      Row width: ~45K

  • Final Thoughts
    Why just Cassandra?
      Fewer technologies to support
        Operations
        Development

    But be reasonable

    Prebuild reference data
      Consider prebuilding data to reduce read time

  • Questions & Contact Info

    Matt Vorst
    CTO Physi,