Sándor Laki (C) Geolocation by IP address
1
Geolocation by IP address
Locating Internet hosts
Sándor [email protected]
http://lakis.web.elte.hu
Sándor Laki (C) Geolocation by IP address
2
Outline
RADAR a wireless solution IP2Geo on the Internet Constraint-Based Geolocation
(GeoLim) OCTANT framework
Topology-Based Geolocation
Sándor Laki (C) Geolocation by IP address
4
RADAR, a wireless approach
Focus on the indoor environment GPS does not work indoors Dedicated technologies
Goals Leverage existing infrastucture Use wireless LAN Software solution Better scalability and lower cost than dedicated
technology
Sándor Laki (C) Geolocation by IP address
5
RADAR
Key idea: Signal strength matching Offline calibration
Construct radio map (<location, Sstr>)
Real-time location and tracking Extract SStr from beacons Find table entry that best matches the
measured SStr
Sándor Laki (C) Geolocation by IP address
6
RADAR: Determine location
Find nearest neighbor in signal space (NNSS)
1st solution Physical position of NNSS gives the user
location 2nd solution
K-NNSS Average the coordinates of k nearest
neighbor gives the wanted position
Sándor Laki (C) Geolocation by IP address
7
Correlation between physical location and signal strength
•Base system:
•INFOCOM 2000 paper
•Enhanced system:
•Microsoft Technical Report MSR-TR-2000-12
Sándor Laki (C) Geolocation by IP address
9
IP2Geo - Motivation Much focus on location-aware services in wireless
and mobile contexts Such services are relevant in the Internet context
too targeted advertising event notification territorial rights management network diagnostics
It is a challenging problem IP address does not inherently contain an indication
of location
Sándor Laki (C) Geolocation by IP address
10
IP2Geo Multi-pronged approach that exploits various “properties” of
the Internet DNS names of router interfaces often indicate location network delay tends to correlate with geographic distance hosts that are aggregated for the purposes of Internet routing also
tend to be clustered geographically
GeoTrack determine location of closest router with a recognizable DNS name
GeoPing use delay measurements to estimate location
GeoCluster extrapolate partial (and possibly inaccurate) IP-to-location mapping
information using BGP prefix clusters
Sándor Laki (C) Geolocation by IP address
11
GeoTrack – main idea
Extract geographical information from DNS names of routers on the path
Localizes the target to the last router whose position is known
Example ngcore1-serial8-0-0-0.Seattle.cw.net => Seattle 184.atm6-0.xr2.ewr1.alter.net => New York dnvr-scrm.abilene.ucaid.edu => Denver
Sándor Laki (C) Geolocation by IP address
12
GeoTrack GeoTrack operation
do a traceroute to the target IP address determine location of last recognizable router along the path
Key ideas in GeoTrack partitioned city code database to minimize chance of false match ISP-specific parsing rules delay-based correction
Limitations routers may not respond to traceroute DNS name may not contain location information or lookup may
fail target host may be behind a proxy or a firewall
Sándor Laki (C) Geolocation by IP address
13
GeoPing - Delay based localization Delay-based triangulation is conceptually simple
delay to distance distance from 3 or more non-colinear points => target
location
But there are practical difficulties network path may be circuitous transmission & queuing delays may corrupt delay
estimate OWD is hard to measure OWD ≠ RTT/2 because of routing asymmetry
Sándor Laki (C) Geolocation by IP address
14
GeoPing - details Measure the network delay to the target host from several
geographically distributed probes typically more than 3 probes are used round-trip delay measured using ping utility small-sized packets => transmission delay is negligible pick minimum among several delay samples
Nearest Neighbor in Delay Space (NNDS) akin to Nearest Neighbor in Signal Space (NNSS) in RADAR construct a delay map containing (delay vector,location) tuples given a vector of delay measurements, search through the delay
map for the NNDS location of the NNDS is our estimate for the location of the target
host
More robust that directly trying to map from delay to distance
Sándor Laki (C) Geolocation by IP address
15
GeoPing – Delay tends to increase with geographic distance
Sándor Laki (C) Geolocation by IP address
18
GeoCluster A passive technique unlike GeoTrack and GeoPing
Basic idea: breaks the IP address space into clusters assign a geographical location to each cluster based on
IP-to-location third party databases given a target IP address, first find the matching cluster
using longest-prefix match. location of matching cluster is our estimate of host
location
Sándor Laki (C) Geolocation by IP address
19
GeoCluster Example:
consider the cluster 128.95.0.0/16 (containing 65536 IP addresses)
suppose we know that the location corresponding to a few IP addresses in this cluster is Seattle
then given a new address, say 128.95.4.5, we deduce that it is likely to be in Seattle too
Sándor Laki (C) Geolocation by IP address
20
GeoCluster – Clustering IP addresses
Exploit the hierarchical nature of Internet routing inter-domain routing in the Internet uses
the Border Gateway Protocol (BGP) BGP operates on address aggregates we treat these aggregates as clusters in all we had about 100,000 clusters of
different sizes
Sándor Laki (C) Geolocation by IP address
21
IP-to-location mapping
Data sources e-mail service, business web-hosting companies,
etc. requires a large, fine-grain and fresh database!
Information partial information (i.e., only for a small subset
of addresses) possibly inaccurate (e.g., manual input from
user)
Sándor Laki (C) Geolocation by IP address
22
Extrapolating IP-to-location mapping Determine location most likely to correspond to a cluster
majority polling “average” location dispersion is an indicator of our confidence in the location estimate
What if there is a large geographic spread in locations? some clusters correspond to large ISPs and the internal
subdivisions are not visible at the BGP level sub-clustering algorithm: keep sub-dividing clusters until there is
sufficient consensus in the individual sub-clusters some clients connect via proxies or firewalls (e.g., AOL clients) sub-clustering may help if there are local or regional proxies otherwise large dispersion => no location estimate made many tools fail in this regard
Sándor Laki (C) Geolocation by IP address
23
Performance of GeoCluster
Median errors:
GeoCluster ~30km
GeoPing ~300km
GeoTrack ~100km
Sándor Laki (C) Geolocation by IP address
24
Other database-oriented applications NetGeo and IP2LL
based on WHOIS DB not closely regulated the address information often indicates the
head office of the owner which may be far from the actual target
Quova Commercial service with thier own database
Gtrace using DNS LOC entries
Sándor Laki (C) Geolocation by IP address
26
Octant overview
Combine very different techniques Active and passive
Constraint-based Weighted positive and negative constraints Constraint => region
Using Bézier-regions Efficient implementations of clipping and union
operations are available
Sándor Laki (C) Geolocation by IP address
27
Octant - Notations
i : the region in which the target node is located
j : a constraint It is a region where the node might be
reside associated with weight Set of nodes
Landmarks: physical locations are at least partially known (Lj)
Every Lj has an „estimated” location Lj
Sándor Laki (C) Geolocation by IP address
28
Octant – Landmarks and constraints Primary landmark
GPS, street address Low error
Secondary landmark Position computed by Octant itself
Positive constraints ( set: ) Node A is within d miles of Lk
= (x,y) in k c(x,y,d), where c(x,y,d) is a disc.
Negative constraints( set: ) Node A is further than d miles from Lk
= (x,y) in k c(x,y,d)
Sándor Laki (C) Geolocation by IP address
30
Mapping latencies to distances Latency between a target and a landmark
bounds thier maximum distance Calculate with speed of light
delay*2/3*c Low precision
Octant’s way Dynamic calibration For each L landmark compute two bounds RL(d) and
rL(d) where d is the ping time of node i
rL(d) || loc(L) – loc(i) || RL(d)
When queuing delays are dominant then rL(d) = 0.
Sándor Laki (C) Geolocation by IP address
31
Mapping latencies to distances Each landmark periodically pings all other landmarks =>
creating a correlation table Determines the convex hull around the points => R(d)
and r(d) It is sufficient when the target has a direct and
congestion-free path to the landmark Octant introduce a cut off at latency p
a tunable percentage of landmark lie to the left of p discard the others (z is a fictitious datapoint, placed far away)
Sándor Laki (C) Geolocation by IP address
33
Last hop delays Mapping is further complicated by queuing
and transmission delays associated with the last hop Cable and DSL connections Overloaded PlanetLAB nodes
Goal: isolate the delay components which artificially inflate latencies Detailed maps of the underlying physical
network, as in network tomography (not in Octant)
Octant introduce a simple metric called height
Sándor Laki (C) Geolocation by IP address
34
Last hop delays in Octant Based on pair-wise latency measurements
between landmarks Primary landmarks: a, b, c Measure thier latencies(RTT): [a,b], [a,c],
[b,c] The positions of primary landmarks are
known -> we can estimate the transmission delays: (a,b), (a,c), (b,c)
Lasthop delay(a,b) = [a,b] - (a,b) Landmark coordinates (alon, alat),…
Sándor Laki (C) Geolocation by IP address
35
Last hop delays in Octant How much of the delays can be attributed to each
landmark? Denoted by a’, b’ and c’ // height
Similarly, for a target t, we can compute t’, as an estimation…
We can solve for t’, tlon, tlat
Sándor Laki (C) Geolocation by IP address
36
Last hop delays
tlon and tlat has relatively high error not used in the later stages
Given the target and landmark heights Each landmark can shift its
RL up if t’ < heights of the other landmarks
rL down if t’ > heights of the other landmarks
Sándor Laki (C) Geolocation by IP address
37
Indirect routes The preceding assumption
Route lengths are proportional to great circle distances
not the case in practise, due to policy routing
Example: a subscriber Ithaca, NY -> Cornell Univ. (Ithaca) Syracuse, NY -> Brockport, IL -> New York City ->
Cornell Univ. ~1 mile physical distance VS. ~800 miles length
path
Sándor Laki (C) Geolocation by IP address
38
Indirect routes discovery Landmark’s heigth can
indicate… Localizing routers on the
network path Secondary landmarks Localization by latencies Extract location from router
names Reverse DNS lookup – undns
tool Using ZIP code to determine
geographical location
Sándor Laki (C) Geolocation by IP address
39
Handling uncertainty
Filter out errorneous constraint Latency based constraints
Weight system that decreases exponentially with increasing latency
Weight threshold
Sándor Laki (C) Geolocation by IP address
40
Iterative refinement
Two phase: First, we use accurate and mostly
conservative constraints Second, less acurate and more
aggressive constraints to obtain a better estimation (inside the initial estimated region)
…
Sándor Laki (C) Geolocation by IP address
44
Motivations
Problems with CBG Use constraints that are less than speed of light
Risk of underestimates When an underestimate occurs, the final region
does not contain the true location
Topology based geolocation using the speed of light to generate constraints inspired by Sensor Network Localization
Sándor Laki (C) Geolocation by IP address
45
Summary of techniques Traceroute from landmarks
Map topology Estimate hop latency
Improve accuracy Cluster network interfaces
Increase structuring Validate location hints
Incorporate location hints Constraint optimization
Geolocate targets
Sándor Laki (C) Geolocation by IP address
46
Estimate hop latencies
Using traceroute tool to infer link latency
Estimate hop latency from the difference in RTT to adjacent routers Accurate only if the link is traversed both
directions (symmetric routing) How can we discover this property?
Three different techniques
Sándor Laki (C) Geolocation by IP address
47
Estimate hop latencies First, observing the reverse TTL values
Most routers initialize the TTL values for thier packets from a small set. 30,32,64,128,150,255
If TTL values changes significantly from one node to the next => discard the link estimate
Second, measuring paths in both direction between pairs of landmarks If both paths traverse a particular link => taking the differences of
measurements to the two endpoints
This estimation has high confidence
Third, increasing vantage points from which we probe a certain link For every link on a path from a landmark we probes to both endpoints from
all other landmarks…
If these probes pass over the link => estimate for the link
Sándor Laki (C) Geolocation by IP address
48
Clustering interfaces Clustering interfaces that belong to the same router (IP
aliases)
Sándor Laki (C) Geolocation by IP address
49
Clustering interfaces Two IP-aliases techniques
Mercator UDP probes are send to high-numbered ports on a set of
interfaces Routers send back a port-unreachable ICMP message with the
source address If two diff. interfaces replie with the same source address =>
aliases
Ally Used on pairs of interfaces Sends probes to the two if. Examines the IP-ID Most routers generate the IP-ID using a single counter that has
incremented after each packet has been created
Sándor Laki (C) Geolocation by IP address
50
Validating location hints
DNS names -> locations Some names are incorrect…
Missnamed, reconfig, reassignment of IP addresses
Topology constraints can be used to verify location hints RTT measurements -> upper bounds… Clustering -> aliases Hop latencies
Sándor Laki (C) Geolocation by IP address
51
Constraint optimization Targets:X = {xi} Landmarks:L = {li} Distance bw i and j: d(i,j) Hard delay constraint
D(li xj) <= cij // cij Speed of light Set of hdc = Cd
Soft Link Latency Constraints Hop latency bw i and j: hij
D(xi,xj) = hij + eij
Where eij is some error Set of SLLC = Cl
Sándor Laki (C) Geolocation by IP address
52
Constraint optimization
Minimize: i,j in Cl|eij|
Subject to: Cd , Cl
Not a convex optimization problem But we can recast it as a semidefinite
program Using fast solvers
SeDuMi Vivaldi
Sándor Laki (C) Geolocation by IP address
55
References RADAR and IP2Geo
http://eris.prakinf.tu-ilmenau.de/res/papers/coopStreaming/padmanabhan01Locating.pdf
Bernard Wong, Ivan Stoyanov and Emin Gün Sirer.
Octant: A Comprehensive Framework for the Geolocalization of Internet Hosts.In Proceedings of the Symposium on Networked System Design and Implementation, Cambridge, Massachusetts, April 2007.
Katz-Bassett, E., John, J. P., Krishnamurthy, A., Wetherall, D., Anderson, T., and Chawathe, Y.
Towards IP geolocation using delay and topology measurements.
In Proceedings of the 6th ACM SIGCOMM on internet Measurement (Rio de Janeriro, Brazil, October 25 - 27, 2006). IMC '06. ACM Press, New York, NY, 71-84.
Top Related