Clusters in the Expanse: Understanding and Unbiasing IPv6 ... · IPv6 measurements Differences in...
Transcript of Clusters in the Expanse: Understanding and Unbiasing IPv6 ... · IPv6 measurements Differences in...
Chair of Network Architectures and ServicesDepartment of InformaticsTechnical University of Munich
Clusters in the Expanse:Understanding and Unbiasing IPv6 Hitlists
Oliver GasserTechnical University of Munich
RIPE 77, Amsterdam
Joint work
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 2
Internet measurementsActive Internet measurements
• Important tool to understand specific networks• Which IP addresses run an HTTPS web server in the Internet?• How securely configured are IoT devices in a company network?• Are my DNS servers vulnerable to amplification attacks?
• Used by researchers, security companies,. . .
but also badactors
Why is this research relevant for operators?
• Learn measurements techniques used in IPv6 vs. in IPv4• Understand how devices can be discovered in your network• Take action by conducting measurements yourself
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 3
Internet measurementsActive Internet measurements
• Important tool to understand specific networks• Which IP addresses run an HTTPS web server in the Internet?• How securely configured are IoT devices in a company network?• Are my DNS servers vulnerable to amplification attacks?
• Used by researchers, security companies,. . . but also badactors
Why is this research relevant for operators?
• Learn measurements techniques used in IPv6 vs. in IPv4• Understand how devices can be discovered in your network• Take action by conducting measurements yourself
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 3
Internet measurementsActive Internet measurements
• Important tool to understand specific networks• Which IP addresses run an HTTPS web server in the Internet?• How securely configured are IoT devices in a company network?• Are my DNS servers vulnerable to amplification attacks?
• Used by researchers, security companies,. . . but also badactors
Why is this research relevant for operators?
• Learn measurements techniques used in IPv6 vs. in IPv4• Understand how devices can be discovered in your network• Take action by conducting measurements yourself
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 3
IPv6 measurements
Differences in IPv4 and IPv6 measurement approaches
• IPv4• Brute-force scan complete Internet in a few hours (e.g. ZMap)
• IPv6• Address space too expansive for brute force scanning• Assemble target list of IPv6 addresses for scanning→ IPv6 hitlist
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 4
IPv6 hitlist
Assembling an IPv6 hitlist
• Leverage DNS to gather IPv6 addresses• Exploit structural properties to learn new addresses• Use crowdsourcing to get client addresses
Challenges
1. Clusters in hitlist sources2. Aliased prefixes3. Finding reachable addresses
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 5
IPv6 hitlist
Assembling an IPv6 hitlist
• Leverage DNS to gather IPv6 addresses• Exploit structural properties to learn new addresses• Use crowdsourcing to get client addresses
Challenges
1. Clusters in hitlist sources2. Aliased prefixes3. Finding reachable addresses
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 5
Hitlist sourcesWhere can we learn potential IPv6 addresses?
• Domain lists: zonefiles, toplists,blacklists• Rapid7 ANY DNS• Domains extracted from Certifi-
cate Transparency• Bitcoin node addresses• RIPE Atlas: traceroutes, ipmap• Scamper: traceroute to all as-
sembled addresses
2017-082017-09
2017-102017-11
2017-122018-01
2018-022018-03
2018-042018-05
10M
20M
30M
40M
50M
60MDomainlistsFDNSCTAXFRBitnodesRIPE AtlasScamper
Figure 1: Cumulative runup ofIPv6 addresses.
Observation
• Many addresses from domain lists, CT, and scamper
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 6
Hitlist sourcesWhere can we learn potential IPv6 addresses?
• Domain lists: zonefiles, toplists,blacklists• Rapid7 ANY DNS• Domains extracted from Certifi-
cate Transparency• Bitcoin node addresses• RIPE Atlas: traceroutes, ipmap• Scamper: traceroute to all as-
sembled addresses
2017-082017-09
2017-102017-11
2017-122018-01
2018-022018-03
2018-042018-05
10M
20M
30M
40M
50M
60MDomainlistsFDNSCTAXFRBitnodesRIPE AtlasScamper
Figure 1: Cumulative runup ofIPv6 addresses.
Observation
• Many addresses from domain lists, CT, and scamper
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 6
Hitlist sourcesWhere can we learn potential IPv6 addresses?
• Domain lists: zonefiles, toplists,blacklists• Rapid7 ANY DNS• Domains extracted from Certifi-
cate Transparency• Bitcoin node addresses• RIPE Atlas: traceroutes, ipmap• Scamper: traceroute to all as-
sembled addresses
2017-082017-09
2017-102017-11
2017-122018-01
2018-022018-03
2018-042018-05
10M
20M
30M
40M
50M
60MDomainlistsFDNSCTAXFRBitnodesRIPE AtlasScamper
Figure 1: Cumulative runup ofIPv6 addresses.
Observation
• Many addresses from domain lists, CT, and scamperOliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 6
Hitlist sourcesHow diverse are the addresses from different sources?
100 101 102 103 104
AS
0.0
0.2
0.4
0.6
0.8
1.0
Frac
tion
of a
ddr.
in to
p X
ASes
DomainlistsFDNSCTAXFRBitnodesRIPE AtlasScamper
Figure 2: AS distribution for hitlist sources.
Autonomous System distribution
• Unbalanced (CT, domain lists) vs. balanced (RIPE Atlas)
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 7
Hitlist sourcesHow diverse are the addresses from different sources?
100 101 102 103 104
AS
0.0
0.2
0.4
0.6
0.8
1.0
Frac
tion
of a
ddr.
in to
p X
ASes
DomainlistsFDNSCTAXFRBitnodesRIPE AtlasScamper
Figure 2: AS distribution for hitlist sources.
Autonomous System distribution
• Unbalanced (CT, domain lists) vs. balanced (RIPE Atlas)
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 7
Hitlist sourcesHow diverse are the addresses from different sources?
100 101 102 103 104
AS
0.0
0.2
0.4
0.6
0.8
1.0
Frac
tion
of a
ddr.
in to
p X
ASes
DomainlistsFDNSCTAXFRBitnodesRIPE AtlasScamper
Figure 2: AS distribution for hitlist sources.
Autonomous System distribution
• Unbalanced (CT, domain lists) vs. balanced (RIPE Atlas)Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 7
Hitlist sourcesHow much of the announced address space do we cover?
5M
153K
2K
49
1
Inp
ut IP
ad
dre
sses
Figure 3: IPv6 prefixes with number of hitlist addresses per prefix.
BGP prefix distribution
• Good coverage of BGP prefixes: 25.5 k of 51.2 k• Some prefixes with many addresses
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 8
Hitlist sourcesHow much of the announced address space do we cover?
5M
153K
2K
49
1
Inp
ut IP
ad
dre
sses
Figure 3: IPv6 prefixes with number of hitlist addresses per prefix.
BGP prefix distribution
• Good coverage of BGP prefixes: 25.5 k of 51.2 k• Some prefixes with many addresses
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 8
Hitlist sourcesHow much of the announced address space do we cover?
5M
153K
2K
49
1
Inp
ut IP
ad
dre
sses
Figure 3: IPv6 prefixes with number of hitlist addresses per prefix.
BGP prefix distribution
• Good coverage of BGP prefixes: 25.5 k of 51.2 k• Some prefixes with many addresses
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 8
Hitlist sources
Key take-aways for network operations
1. IPv6 address space too vast to conduct brute-force measurements
2. Your addresses can be gathered from many dif-ferent publicly available sources (e.g. DNS, CT)
3. About 50 % of announced prefixes are coveredin our IPv6 hitlist
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 9
Address entropy clusteringAddressing schemes
• Question: How similar are addressing schemes in our hitlist?• Approach: Group addresses to find similar address schemes
01020304050/32 prefixes [%]
1
2
3
4
5
6
Clu
ster
ID
10 12 14 16 18 20 22 24 26 28 30 32IPv6 nybble (hex character)
0.0
0.2
0.4
0.6
0.8
1.0
Media
n e
ntr
opy
Figure 4: Addressing schemes.
• Only few addressing schemes• Low-bit addresses (e.g. ::1), privacy extensions, and EUI-
64 mapped MAC addresses clearly visible
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 10
Address entropy clusteringAddressing schemes
• Question: How similar are addressing schemes in our hitlist?• Approach: Group addresses to find similar address schemes
01020304050/32 prefixes [%]
1
2
3
4
5
6
Clu
ster
ID
10 12 14 16 18 20 22 24 26 28 30 32IPv6 nybble (hex character)
0.0
0.2
0.4
0.6
0.8
1.0
Media
n e
ntr
opy
Figure 4: Addressing schemes.
• Only few addressing schemes• Low-bit addresses (e.g. ::1), privacy extensions, and EUI-
64 mapped MAC addresses clearly visible
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 10
Address entropy clusteringAddressing schemes
• Question: How similar are addressing schemes in our hitlist?• Approach: Group addresses to find similar address schemes
01020304050/32 prefixes [%]
1
2
3
4
5
6
Clu
ster
ID
10 12 14 16 18 20 22 24 26 28 30 32IPv6 nybble (hex character)
0.0
0.2
0.4
0.6
0.8
1.0
Media
n e
ntr
opy
Figure 4: Addressing schemes.
• Only few addressing schemes• Low-bit addresses (e.g. ::1), privacy extensions, and EUI-
64 mapped MAC addresses clearly visibleOliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 10
Address entropy clustering
Key take-aways for network operations
1. Most networks use one of a handful of address-ing schemes
2. Good: Industry best practices are followed
3. Bad: Addressing schemes might uncover “hid-den” hosts
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 11
Detecting aliased prefixesAliases
• Alias: Multiple addresses belonging to the same host• Aliased prefix: Complete prefix bound to the same host• Bias: As some hosts are overrepresented, aliased prefixes
introduce bias in the hitlist
Detecting aliased prefixes using pseudo-random probing
2001:0db8:0407:8000::/64
2001:0db8:0407:8000:0151:2900:77e9:03a8...
2001:0db8:0407:8000:f693:2443:915e:1d2e
Table 1: IPv6 fan-out for multi-level aliased prefix detection.
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 12
Detecting aliased prefixesAliases
• Alias: Multiple addresses belonging to the same host• Aliased prefix: Complete prefix bound to the same host• Bias: As some hosts are overrepresented, aliased prefixes
introduce bias in the hitlist
Detecting aliased prefixes using pseudo-random probing
2001:0db8:0407:8000::/64
2001:0db8:0407:8000:0151:2900:77e9:03a8...
2001:0db8:0407:8000:f693:2443:915e:1d2e
Table 1: IPv6 fan-out for multi-level aliased prefix detection.
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 12
Detecting aliased prefixes
5M
148K
2K
48
1
Inp
ut IP
ad
dre
sses
Figure 5: All prefixes covered by hitlist.
5M
148K
2K
48
1
Inp
ut IP
ad
dre
sses
Figure 6: Aliased prefixes.
• 55.1 M raw IPv6 addresses in hitlist• Few prefixes are aliased (e.g. Amazon, see right figure)• 25.7 M IPv6 addresses in aliased prefixes (46.6 %)• Validation using fingerprinting (iTTL, TCP opts, TCP TS)
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 13
Detecting aliased prefixes
Key take-aways for network operations
1. Aliased prefixes can introduce bias in IPv6 mea-surements
2. Can be detected with pseudo-random probing
3. Using aliasing to hide your prefixes and hosts isnot very effective
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 14
Address responsiveness
Cross protocol responsiveness
• If address responds on protocol X, how likely is it to respondon protocol Y?• Goal: Identify relevant addresses for specific measurements
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 15
Address responsiveness
ICMP TCP/80 TCP/443 UDP/53 UDP/443Protocol X
UDP/443
UDP/53
TCP/443
TCP/80
ICMP
Pr[P
roto
col Y
| Pr
otoc
ol X
]
0.017 0.035 0.054 0.0065 1
0.069 0.1 0.14 1 0.029
0.29 0.58 1 0.54 0.98
0.45 1 0.91 0.61 0.99
1 0.95 0.93 0.89 0.990.2
0.4
0.6
0.8
1.0
Figure 7: Likeliness to respond on protocol Y, if responding to protocol X.
• If responsive to one of the probes→ at least 89% chance itwill answer to ICMPv6• Web protocols: QUIC→HTTPS and HTTP, HTTPS→HTTP;
but not the other way around
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 16
Address responsiveness
ICMP TCP/80 TCP/443 UDP/53 UDP/443Protocol X
UDP/443
UDP/53
TCP/443
TCP/80
ICMP
Pr[P
roto
col Y
| Pr
otoc
ol X
]
0.017 0.035 0.054 0.0065 1
0.069 0.1 0.14 1 0.029
0.29 0.58 1 0.54 0.98
0.45 1 0.91 0.61 0.99
1 0.95 0.93 0.89 0.990.2
0.4
0.6
0.8
1.0
Figure 7: Likeliness to respond on protocol Y, if responding to protocol X.
• If responsive to one of the probes→ at least 89% chance itwill answer to ICMPv6• Web protocols: QUIC→HTTPS and HTTP, HTTPS→HTTP;
but not the other way aroundOliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 16
Address responsiveness
Key take-aways for network operations
1. Knowing responsiveness on one service mightleak information about other services
2. Horizontal port scanning on all devices is notnecessary
3. Attackers might pick one port (e.g. TCP/80) andthen continue with only responsive hosts
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 17
Learning new addressesTechniques to learn new addresses
• Entropy/IP: Generate new addresses by leveraging entropyof seed addresses• Similar approach to grouping addresses based on their structure as
shown earlier• Presented at RIPE74 in Budapest by Paweł Foremski
• 6Gen: Generate new addresses in dense address regions• If we see addresses
• 2001:0db8:0407:8000::3• 2001:0db8:0407:8000::4• 2001:0db8:0407:8000::5• 2001:0db8:0407:8000::8• 2001:0db8:0407:8000::9
• Likely other valid addresses• 2001:0db8:0407:8000::6• 2001:0db8:0407:8000::7
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 18
Learning new addressesTechniques to learn new addresses
• Entropy/IP: Generate new addresses by leveraging entropyof seed addresses• Similar approach to grouping addresses based on their structure as
shown earlier• Presented at RIPE74 in Budapest by Paweł Foremski
• 6Gen: Generate new addresses in dense address regions• If we see addresses
• 2001:0db8:0407:8000::3• 2001:0db8:0407:8000::4• 2001:0db8:0407:8000::5• 2001:0db8:0407:8000::8• 2001:0db8:0407:8000::9
• Likely other valid addresses• 2001:0db8:0407:8000::6• 2001:0db8:0407:8000::7
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 18
Learning new addressesHow well do Entropy/IP and 6Gen perform?
• Input: All previously found IPv6 addresses• Generation: 118 M and 129 M, only 675 k overlapping• Responsiveness: 278 k and 489 k• Magnitude higher response rate for overlapping addresses
Table 2: Top 5 responsive protocol combinations for 6Gen and Entropy/IP.
ICMP TCP/80 TCP/443 UDP/53 UDP/443 6Gen Entropy/IP
3 7 7 7 7 66.8 % 41.1 %3 3 3 7 7 9.2 % 12.3 %7 7 7 3 7 7.3 % 23.1 %3 3 7 7 7 4.9 % 3.4 %3 3 3 7 3 3.2 % 6.1 %
• Different host populations
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 19
Learning new addressesHow well do Entropy/IP and 6Gen perform?
• Input: All previously found IPv6 addresses• Generation: 118 M and 129 M, only 675 k overlapping• Responsiveness: 278 k and 489 k• Magnitude higher response rate for overlapping addresses
Table 2: Top 5 responsive protocol combinations for 6Gen and Entropy/IP.
ICMP TCP/80 TCP/443 UDP/53 UDP/443 6Gen Entropy/IP
3 7 7 7 7 66.8 % 41.1 %3 3 3 7 7 9.2 % 12.3 %7 7 7 3 7 7.3 % 23.1 %3 3 7 7 7 4.9 % 3.4 %3 3 3 7 3 3.2 % 6.1 %
• Different host populations
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 19
Learning new addressesHow well do Entropy/IP and 6Gen perform?
• Input: All previously found IPv6 addresses• Generation: 118 M and 129 M, only 675 k overlapping• Responsiveness: 278 k and 489 k• Magnitude higher response rate for overlapping addresses
Table 2: Top 5 responsive protocol combinations for 6Gen and Entropy/IP.
ICMP TCP/80 TCP/443 UDP/53 UDP/443 6Gen Entropy/IP
3 7 7 7 7 66.8 % 41.1 %3 3 3 7 7 9.2 % 12.3 %7 7 7 3 7 7.3 % 23.1 %3 3 7 7 7 4.9 % 3.4 %3 3 3 7 3 3.2 % 6.1 %
• Different host populationsOliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 19
Learning new addresses
Key take-aways for network operations
1. Address learning uncovers previously unknownaddresses
2. Techniques provide complementary address sets
3. Hiding in the expansive IPv6 address space mightbe more difficult
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 20
Conclusion
• IPv6 Internet too vast to conduct brute-force measurements• But you might be less “hidden” in IPv6 than you’d have thought• Addressing schemes might uncover “hidden” hosts• Responsiveness of one service might leak information about
other services
ipv6hitlist.github.io
Oliver Gasser <[email protected]>https://www.net.in.tum.de/~gasser/
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 21
Conclusion
• IPv6 Internet too vast to conduct brute-force measurements• But you might be less “hidden” in IPv6 than you’d have thought• Addressing schemes might uncover “hidden” hosts• Responsiveness of one service might leak information about
other services
ipv6hitlist.github.io
Oliver Gasser <[email protected]>https://www.net.in.tum.de/~gasser/
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 21
Conclusion
• IPv6 Internet too vast to conduct brute-force measurements• But you might be less “hidden” in IPv6 than you’d have thought• Addressing schemes might uncover “hidden” hosts• Responsiveness of one service might leak information about
other services
ipv6hitlist.github.io
Oliver Gasser <[email protected]>https://www.net.in.tum.de/~gasser/
Oliver Gasser — Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists 21