SERVFAILS IN .NET AFTER PUBLICATION OF DS RECORD
DUANE WESSELS
VERISIGN LABS
Table of Contents
Introduction
Experiment Setup
    Phase 1: Pre-Signing
    Phase 2: Deliberately Unvalidatable
    Phase 3: DS in .NET
    Phase 4: Unblinding
    Phase 5: DS in root
SERVFAIL from BIND 9.7.0
Dependency on Phase Duration
When TTLs match
Earlier Versions of BIND
Fixed in BIND 9.7.1b1
Without Blinding
Workarounds
INTRODUCTION
On the day that the DS record for .NET was published in the root zone (December 9,
2010), a user reported being unable to resolve any .NET domain name for
approximately two hours, until his nameserver was restarted.
The resolver was configured with a root zone trust anchor. The resolver software was
BIND 9.7.0-P2.
This document describes an attempt to reproduce the reported problem in a lab
environment. This experiment is designed to mimic the procedures for signing the .NET
zone. There are five phases:
1. Pre-signing. The .NET zone is not signed and does not contain DNSKEY or DS
records.
2. Deliberately Unvalidatable. The .NET zone is signed, but DNSKEY records are
“blinded.”
3. DS in .NET. DS records are published in the .NET zone.
4. Unblinding. The DNSKEY records in the .NET zone are unblinded.
5. DS in root. The DS record for .NET is published in the root zone.
During the actual .NET deployment, at least 48 hours passed between phases 4 and 5,
allowing sufficient time for all DNSKEY records (which have a 24-hour TTL) to expire
from resolver caches.
EXPERIMENT SETUP
In these experiments, shorter TTLs are used. Real-world TTLs are divided by 1440 (the
number of minutes in a day), so that a 1-day TTL becomes a 1-minute TTL.
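The TTL scaling is plain arithmetic. The following is a minimal Python sketch of it; the function names are illustrative and are not taken from the original lab scripts, and integer division assumes the production TTLs are whole multiples of 1440 seconds (true for the 1-day, 2-day, and 6-day values used here).

```python
SCALE = 1440  # minutes per day: 1 production day maps to 1 lab minute

def lab_ttl(real_ttl_seconds):
    """Scale a production TTL (seconds) down to the lab TTL (seconds)."""
    # Integer division: the TTLs in this experiment divide evenly by 1440.
    return real_ttl_seconds // SCALE

def real_duration(lab_seconds):
    """Map a lab duration (seconds) back to its production equivalent (seconds)."""
    return lab_seconds * SCALE

print(lab_ttl(86400))    # 1-day DNSKEY TTL
print(lab_ttl(172800))   # 2-day .NET NS/glue TTL
print(lab_ttl(518400))   # 6-day root NS TTL
```

The three calls print 60, 120, and 360, the TTLs that appear in the lab zones. By the same arithmetic, the 48 hours left between phases 4 and 5 in production correspond to 120 seconds in the lab.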
A root zone is created with a single delegation to .NET.
A .NET zone is created with two delegations to UNSIGNED.NET and SIGNED.NET.
Zones are signed with, and served authoritatively by, BIND 9.7.0.
A Perl script using Net::DNS sends queries to the BIND resolver every 15 seconds and
logs the query time, query name, response code, AD bit, RDATA, and TTL.
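The original probe is a Perl script using Net::DNS and is not reproduced here. As a rough illustration only, the Python sketch below produces log lines in the same layout (time, name, rcode, AD bit, then RDATA and TTL per answer). It assumes the third-party dnspython package for the network query, imported inside `probe()` so the formatter works without it; `format_log_line`, `probe`, and `run` are hypothetical names.

```python
import time

def format_log_line(qtime, qname, rcode, ad, answers):
    """Build one log line: epoch time, name, rcode, AD bit, then (RDATA, TTL) pairs."""
    fields = [str(int(qtime)), qname, rcode, "AD=%d" % (1 if ad else 0)]
    for rdata, ttl in answers:
        fields += [rdata, str(ttl)]
    return " ".join(fields)

def probe(qname, server="127.0.0.1"):
    """Send one A query with the DO bit set and return a formatted log line."""
    # Assumption: dnspython is installed (the original tool used Perl Net::DNS).
    import dns.flags, dns.message, dns.query, dns.rcode, dns.rdatatype
    query = dns.message.make_query(qname, "A", want_dnssec=True)
    response = dns.query.udp(query, server, timeout=5)
    answers = [(rd.to_text(), rrset.ttl)
               for rrset in response.answer if rrset.rdtype == dns.rdatatype.A
               for rd in rrset]
    return format_log_line(time.time(), qname,
                           dns.rcode.to_text(response.rcode()),
                           bool(response.flags & dns.flags.AD), answers)

def run(names, interval=15):
    """Query every name, then sleep `interval` seconds; repeat forever."""
    while True:
        for name in names:
            print(probe(name), flush=True)
        time.sleep(interval)
```

Calling `run(["www.signed.net", "www.unsigned.net"])` against the lab resolver would print lines shaped like those in the query log that follows; a SERVFAIL response carries no answers, so its line ends at the AD field.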
PHASE 1: PRE-SIGNING
The root zone is signed with 1024-bit keys using algorithm 8 (RSASHA256). It contains
only a single delegation to .NET, plus the necessary glue.
    .       60  IN SOA  ns1. dns. (
                        1292377267 ; serial
                        15         ; refresh (15 seconds)
                        10         ; retry (10 seconds)
                        420        ; expire (7 minutes)
                        60         ; minimum (1 minute)
                        )
            60  RRSIG   SOA 8 0 60 20110119191346 (
                        20101220191346 11325 .
                        DoREtOvBUCc18cuo8Jst7wJ046Ie3PoYNc4l
                        a7B5GGkoJEWY2YwL2vsyeOrSPBc+z+waLe2R
                        mfu3OdrYA6QfUMWX0Ej4Gh83+OsWXlbxnqla
                        +8dIYY5JZws7n64izxsbVXFATM3HutECdtxi
                        /q7JxKbD3A9PBdJhKADIwG1/CK4= )
            360 NS      a.root-servers.net.
            (etc)
    net.    120 IN NS   a.gtld-servers.net.
            (etc)
PHASE 2: DELIBERATELY UNVALIDATABLE
Keys are generated for the .NET zone (1024-bit, algorithm 8). The zone is signed.
DNSKEY records are blinded before publishing, using the following commands and the
zoneblind-private.pl script from the root-dnssec repository:
    $ named-checkzone -i none -n ignore -o - net ../zones/net.signed \
        | perl zoneblind-private.pl \
        > net.blind
    $ scp -p net.blind tld:/etc/namedb/master/net.zone
DNSKEY records are given a 60-second TTL.
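The exact transformation performed by zoneblind-private.pl is not described in this document. Purely to illustrate the idea of a "blinded" key, the hypothetical Python sketch below keeps a DNSKEY record's flags, protocol, and algorithm fields but overwrites its public key with filler of the same length, so the record still parses as a DNSKEY while no signature can validate against it. It assumes one record per line in the form owner, TTL, class, type, RDATA (the layout of `named-checkzone -o -` output); the filler character and function name are assumptions, not the real script's behavior.

```python
FILLER = "A"  # hypothetical filler; the real script's placeholder key is not shown here

def blind_dnskey_line(line):
    """Replace the public-key field of a one-line DNSKEY record with filler.

    Expected fields: owner TTL class DNSKEY flags protocol algorithm key...
    Any other record type is returned unchanged.
    """
    fields = line.split()
    if len(fields) < 8 or fields[3] != "DNSKEY":
        return line
    key = "".join(fields[7:])  # the key may be split across several tokens
    return " ".join(fields[:7] + [FILLER * len(key)])
```

Because the filler keeps the key's base64 length, the blinded record stays syntactically valid in a zone file.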
PHASE 3: DS IN .NET
The SIGNED.NET zone is signed (1024-bit, algorithm 8), and its DS records are added to
the .NET zone.
The .NET zone is re-signed and re-blinded.
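The DS records added to the .NET zone in this phase are derived from SIGNED.NET's DNSKEY. In practice BIND's dnssec-dsfromkey tool generates them; the Python sketch below only shows the underlying calculation, the key tag from RFC 4034 Appendix B and the SHA-256 digest (digest type 2, RFC 4509) over the owner name in canonical wire form followed by the DNSKEY RDATA. It assumes the key is supplied as its four DNSKEY fields; the function names are illustrative.

```python
import base64
import hashlib
import struct

def name_to_wire(name):
    """Canonical wire form of a domain name: lowercase, length-prefixed labels, root byte."""
    wire = b""
    for label in name.rstrip(".").split("."):
        if label:
            wire += bytes([len(label)]) + label.lower().encode("ascii")
    return wire + b"\x00"

def dnskey_rdata(flags, protocol, algorithm, key_b64):
    """DNSKEY RDATA: 2-byte flags, 1-byte protocol, 1-byte algorithm, public key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + base64.b64decode(key_b64)

def key_tag(rdata):
    """Key tag per RFC 4034 Appendix B (valid for every algorithm except 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF

def ds_record(owner, ttl, flags, protocol, algorithm, key_b64):
    """Return a DS record (SHA-256 digest) in zone-file text form."""
    rdata = dnskey_rdata(flags, protocol, algorithm, key_b64)
    digest = hashlib.sha256(name_to_wire(owner) + rdata).hexdigest().upper()
    return f"{owner} {ttl} IN DS {key_tag(rdata)} {algorithm} 2 {digest}"
```

The same calculation applies in phase 5, when the DS record for .NET itself is added to the root zone.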
PHASE 4: UNBLINDING
The .NET zone is re-signed but no longer blinded.
PHASE 5: DS IN ROOT
DS records for .NET are added to the root zone, which is re-signed and published.
SERVFAIL FROM BIND 9.7.0
Following phase 5, once cached answers expire, BIND 9.7.0-P2 may return SERVFAIL
for names in the unsigned zone while still returning NOERROR for the signed zone.
Here is the output from the query script:
    (in phase 4 here)
    1293039186 www.unsigned.net NOERROR AD=0 127.0.0.1 30
    1293039201 www.signed.net NOERROR AD=0 127.0.0.1 15
    1293039201 www.unsigned.net NOERROR AD=0 127.0.0.1 15
    1293039216 www.signed.net NOERROR AD=0 127.0.0.1 60
    1293039216 www.unsigned.net NOERROR AD=0 127.0.0.1 60
    1293039231 www.signed.net NOERROR AD=0 127.0.0.1 45
    1293039231 www.unsigned.net NOERROR AD=0 127.0.0.1 45
    1293039246 www.signed.net NOERROR AD=0 127.0.0.1 30
    1293039246 www.unsigned.net NOERROR AD=0 127.0.0.1 30
    1293039261 www.signed.net NOERROR AD=0 127.0.0.1 15
    1293039261 www.unsigned.net NOERROR AD=0 127.0.0.1 15
    (phase 5 begins here)
    1293039276 www.signed.net NOERROR AD=0 127.0.0.1 60
    1293039277 www.unsigned.net SERVFAIL AD=0
    1293039292 www.signed.net NOERROR AD=0 127.0.0.1 44
    1293039292 www.unsigned.net SERVFAIL AD=0
    1293039307 www.signed.net NOERROR AD=0 127.0.0.1 29
    1293039307 www.unsigned.net SERVFAIL AD=0
    1293039322 www.signed.net NOERROR AD=0 127.0.0.1 14
    1293039322 www.unsigned.net SERVFAIL AD=0
    (all cached records expired by now)
    1293039338 www.signed.net NOERROR AD=1 127.0.0.1 60
    1293039338 www.unsigned.net SERVFAIL AD=0
    1293039353 www.signed.net NOERROR AD=1 127.0.0.1 45
    1293039353 www.unsigned.net SERVFAIL AD=0
    1293039368 www.signed.net NOERROR AD=1 127.0.0.1 30
    1293039368 www.unsigned.net SERVFAIL AD=0
    1293039383 www.signed.net NOERROR AD=1 127.0.0.1 15
    1293039383 www.unsigned.net SERVFAIL AD=0
    1293039398 www.signed.net NOERROR AD=1 127.0.0.1 60
    1293039398 www.unsigned.net SERVFAIL AD=0
    1293039413 www.signed.net NOERROR AD=1 127.0.0.1 44
    1293039413 www.unsigned.net SERVFAIL AD=0
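A log like this can be scanned mechanically for the transitions of interest. The Python sketch below (hypothetical helper names, assuming exactly the line layout shown, with parenthesized annotation lines) records, for each query name, the time of the first SERVFAIL and of the first answer with the AD bit set.

```python
def parse_line(line):
    """Split one log line into (time, name, rcode, ad, [(rdata, ttl), ...])."""
    fields = line.split()
    qtime, qname, rcode, ad_field = int(fields[0]), fields[1], fields[2], fields[3]
    rest = fields[4:]
    answers = list(zip(rest[0::2], (int(t) for t in rest[1::2])))
    return qtime, qname, rcode, ad_field == "AD=1", answers

def first_events(lines):
    """Per query name, the time of the first SERVFAIL and of the first AD=1 answer."""
    events = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("("):  # skip blank lines and annotations
            continue
        qtime, qname, rcode, ad, _ = parse_line(line)
        first = events.setdefault(qname, {"SERVFAIL": None, "AD=1": None})
        if rcode == "SERVFAIL" and first["SERVFAIL"] is None:
            first["SERVFAIL"] = qtime
        if ad and first["AD=1"] is None:
            first["AD=1"] = qtime
    return events
```

Applied to the full log above, it reports the first SERVFAIL for www.unsigned.net at 1293039277 (the first unsigned query after phase 5 begins) and the first validated (AD=1) answer for www.signed.net at 1293039338.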
Experimentation showed that the SERVFAIL condition did not always happen and may
depend on the duration of each phase of the deployment.
DEPENDENCY ON PHASE DURATION
A script was written to run the simulation repeatedly, varying the durations of phases 3
and 4 each time. One goal of this experiment is to discover whether the SERVFAIL
condition can be avoided by using a longer phase 4.
The duration of phase 3 is the time between publishing DS records in the .NET zone and
unblinding the .NET DNSKEYs.
The duration of phase 4 is the time between unblinding the .NET DNSKEYs and
publishing the .NET DS record in the root zone.
In this experiment, once the .NET DS record is published, the script sleeps for 70
seconds to allow cached records to expire. It then uses dig to query for
WWW.UNSIGNED.NET and records the response code. The following figure shows the
results:
[Figure: BIND 9.7.0-P2 DS Introduction Behavior. X-axis: Time between Unblinding and DS in Root (minutes or days). Y-axis: Time between Signing and Unblinding (minutes or days). Legend: SERVFAIL, NOERROR.]
The phase 3 duration is represented on the Y-axis and the phase 4 duration on the
X-axis. Red triangles indicate SERVFAIL results, while green circles show cases where
the unsigned name resolved successfully.
Recall that these experiments use TTLs equal to real-world TTLs divided by 1440. Thus,
a TTL originally equal to one day becomes a 1-minute TTL in the experiment. This is why
the axis labels indicate that the values may be read as either days (production) or
minutes (lab).
The pattern appears to be related to the sum of the phase 3 and phase 4 durations, as
shown here:
[Figure: BIND 9.7.0-P2 DS Introduction Behavior. X-axis: Time between Signing and DS in Root (minutes or days). Y-axis: Time between Signing and Unblinding (minutes or days). Legend: SERVFAIL, NOERROR.]
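Whether the outcome depends only on the sum of the phase 3 and phase 4 durations can be checked mechanically against the sweep results. The Python sketch below groups trials by that sum and tests whether each sum produced a single outcome. The function names are hypothetical, and the sample tuples in the usage lines are invented for illustration; they are not the measured results.

```python
from collections import defaultdict

def outcomes_by_sum(results, ndigits=6):
    """Group trial outcomes by phase3 + phase4.

    results: iterable of (phase3, phase4, rcode) tuples, durations in the scaled
    units used here (minutes in the lab, days in production). Sums are rounded
    so that fractional durations such as 0.1 + 0.2 group with 0.3.
    """
    groups = defaultdict(set)
    for phase3, phase4, rcode in results:
        groups[round(phase3 + phase4, ndigits)].add(rcode)
    return dict(groups)

def outcome_depends_only_on_sum(results):
    """True if every duration sum produced exactly one distinct outcome."""
    return all(len(rcodes) == 1 for rcodes in outcomes_by_sum(results).values())

# Invented sample data, for illustration only:
sample = [(0, 1, "NOERROR"), (1, 0, "NOERROR"), (0, 2, "SERVFAIL"), (1, 1, "SERVFAIL")]
print(outcome_depends_only_on_sum(sample))
```

A True result on the real grid would be consistent with the observation above, but it does not by itself explain the mechanism.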
Furthermore, the pattern repeats with a period of 2 days: a 1-day "good region" is
followed by a 1-day "bad region." This leads us to believe that the problem is caused by
the difference between the NS/glue TTL (2 days) and the DNSKEY TTL (1 day).
WHEN TTLS MATCH
In the production .NET zone, NS/glue records have 2-day TTLs and DNSKEY records
have 1-day TTLs. In the experiments so far, these have been scaled to 2-minute and
1-minute TTLs, respectively.
When both the DNSKEY and NS/glue TTLs are set to 1 day (1 minute in the lab), the
following results are obtained:
[Figure: BIND 9.7.0-P2 DS Introduction Behavior, Matching TTLs. X-axis: Time between Unblinding and DS in Root (minutes or days). Y-axis: Time between Signing and Unblinding (minutes or days). Legend: SERVFAIL, NOERROR.]
Clearly, when the TTLs match, BIND 9.7.0-P2 does not exhibit the SERVFAIL problem.
EARLIER VERSIONS OF BIND
The bug is present in versions as early as BIND 9.6.2-P2, which is one of the first
versions to support the SHA-256 algorithm (and therefore able to validate the algorithm 8
signatures used in these tests):
[Figure: BIND 9.6.2-P2 DS Introduction Behavior, Mismatched TTLs. X-axis: Time between Unblinding and DS in Root (minutes or days). Y-axis: Time between Signing and Unblinding (minutes or days). Legend: SERVFAIL, NOERROR.]
FIXED IN BIND 9.7.1b1
Based on bug descriptions in the BIND CHANGES file, the following entry appears to
describe the bug seen here:
    2890.   [bug]   Handle the introduction of new trusted-keys and
                    DS, DLV RRsets better. [RT #21097]
The same tests as above were run against BIND 9.7.1b1, with the following results:
[Figure: BIND 9.7.1b1 DS Introduction Behavior, Mismatched TTLs. X-axis: Time between Unblinding and DS in Root (minutes or days). Y-axis: Time between Signing and Unblinding (minutes or days). Legend: SERVFAIL, NOERROR.]
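Whether a particular resolver predates the release tested here can be decided by ordering BIND version strings. The Python sketch below is a minimal illustration that only understands the forms seen in this document (for example 9.6.2-P2, 9.7.0-P2, 9.7.1b1), treating alpha, beta, and release-candidate tags as earlier than the final release and -P patch releases as later. It is an assumption-laden helper, not a BIND tool, and it knows nothing about fixes that may have been carried into other branches, which this document does not cover.

```python
import re

# Pre-release stages sort before the final release; "-P" patch releases sort after it.
_STAGE_RANK = {"a": 0, "b": 1, "rc": 2, None: 3}
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?(?:-P(\d+))?$")

def version_key(version):
    """Turn a version string such as '9.7.0-P2' or '9.7.1b1' into a sortable tuple."""
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unrecognized BIND version: {version!r}")
    major, minor, patch, stage, stage_num, p = m.groups()
    return (int(major), int(minor), int(patch),
            _STAGE_RANK[stage], int(stage_num or 0), int(p or 0))

def predates_fix(version, fixed="9.7.1b1"):
    """True if `version` sorts before the first version tested here with the fix."""
    return version_key(version) < version_key(fixed)
```

Under these rules, 9.6.2-P2 and 9.7.0-P2 sort before 9.7.1b1, while 9.7.1b1 itself and 9.7.1 do not.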
WITHOUT BLINDING
In the following test, the .NET keys are not blinded, but the DNSKEY and NS TTLs
remain different. In this case the bug still manifests, indicating that the likely cause is
the introduction of the DS record in the root zone rather than the blinding/unblinding
process.
[Figure: BIND 9.7.0-P2 DS Introduction Behavior, Without Blinding. X-axis: Time between No Blinding and DS in Root (minutes or days). Y-axis: Time between Signing and No Blinding (minutes or days). Legend: SERVFAIL, NOERROR.]
WORKAROUNDS
Based on these tests, there are three ways to work around this bug in BIND:
1. Upgrade the resolver software to BIND 9.7.1b1 or later
2. Make the zone's DNSKEY and NS TTLs match
3. Restart the resolver after the DS record is published
For now, the workarounds are only enumerated here, without further discussion of their
relative merits or operational impacts.
© 2011 Verisign, Inc. All rights reserved. VERISIGN and other trademarks, service marks, and designs are registered or unregistered trademarks of Verisign, Inc. and its subsidiaries in the United States and in foreign countries. All other trademarks are property of their respective owners.