NY Web Performance - DNS as a Web Performance Tool

Intelligent DNS & Traffic Management

April 19, 2016

Kris Beevers, co-founder & CEO

DNS as a web performance tool


Hi! Me:
● Huge performance geek
● Backend & internet infrastructure at scale
● Previously: cloud, bare metal, CDN, etc.
● These days: intelligent DNS @ NS1

Doesn’t DNS “just work”?
● Sure, most of the time.
● Knowledge = power.

Today:
● DNS crash course / refresher
● What can go wrong & how perf is impacted
● Optimizing DNS lookup performance
● DNS as an offensive tool to optimize app perf

DNS LOOKUP: A QUICK RECAP.

Delegation: Authoritative nameservers for a domain are DELEGATED via the registrar at the TLD nameservers.

Recursive lookup: DNS is a hierarchical distributed database.

End-to-end lookups start at the root; continue at TLD servers; and are resolved by authorities for a domain (which may subdelegate, reference other domains, etc).

Caching resolvers: End-to-end resolution is usually performed by special servers called resolvers.

Resolvers cache DNS records according to a TTL specified by the authority.
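To make the TTL behavior concrete, here is a minimal sketch of a record cache that honors a TTL. The class name `TtlCache`, its methods, and the injectable clock are assumptions made for illustration; real resolver caches are considerably more involved.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Tiny record cache keyed by (name, record type), expiring entries by TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be simulated
        self._entries: Dict[Tuple[str, str], Tuple[float, str]] = {}

    def put(self, name: str, rtype: str, value: str, ttl: float) -> None:
        # The authority chooses the TTL; the cache only honors it.
        self._entries[(name, rtype)] = (self._clock() + ttl, value)

    def get(self, name: str, rtype: str) -> Optional[str]:
        entry = self._entries.get((name, rtype))
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the next lookup goes back to the authority.
            del self._entries[(name, rtype)]
            return None
        return value
```

A longer TTL means fewer trips to the authority but slower propagation of record changes; the cache itself has no say in that tradeoff.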

Client-side caching: Some clients (OSes, modern browsers) implement their own DNS caching.

End to end:

(1) request: www.example.com A?
(2) local cache?
(3) query: client → RESOLVER
(4) cache checks at the resolver: www.example.com A, example.com NS, com NS
(5) com NS? RESOLVER → ROOT (“.”) AUTHORITY
(6) example.com NS? RESOLVER → COM AUTHORITY
(7) www.example.com A? RESOLVER → EXAMPLE.COM AUTHORITY
(8) answer: RESOLVER → client
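The end-to-end walk can be sketched as a small simulation. Everything here is an assumption made for illustration: the `ZONES` table stands in for the root, com and example.com authorities, the address is a documentation IP, and the cache stores only final answers (a real resolver also caches the NS records along the way).

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory "authorities": each maps a name suffix to either a
# referral (delegation to a child zone) or a final answer.
ZONES: Dict[str, Dict[str, Tuple[str, str]]] = {
    ".": {"com": ("referral", "com")},
    "com": {"example.com": ("referral", "example.com")},
    "example.com": {"www.example.com": ("answer", "192.0.2.10")},
}


def resolve(name: str, cache: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return (address, authorities queried), walking root -> TLD -> domain."""
    if name in cache:
        return cache[name], []  # short circuit: no authority is contacted

    queried: List[str] = []
    zone = "."
    while True:
        queried.append(zone)
        # Find the record in this zone that covers the name being resolved.
        for suffix, (kind, value) in ZONES[zone].items():
            if name == suffix or name.endswith("." + suffix):
                break
        else:
            raise LookupError(f"{name} not found under {zone}")
        if kind == "answer":
            cache[name] = value
            return value, queried
        zone = value  # follow the delegation one level down
```

Calling `resolve("www.example.com", cache)` twice with the same `cache` dict contacts three authorities the first time and none the second, which is the short-circuiting that makes DNS performance hard to observe.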

Understanding DNS perf is hard
● Caching everywhere
● Most of the lookup process short-circuited most of the time

What are the most commonly incurred DNS lookup costs?

Depends on many factors:

● Domain popularity
● Distribution of users
● Cache TTLs

[Figure: RESOLVER, ROOT (“.”) AUTHORITY, COM AUTHORITY, EXAMPLE.COM AUTHORITY; labels: 60-100%, 5-50%]

PITFALLS: HOW DNS LOOKUP CAN NEGATIVELY IMPACT PERF.

Client → Resolver

Up to the client. Not much you can do.


Resolver → Authority

● Direct latency impact on cache miss
● Long lookup times block asset loads
● Minimize authoritative DNS response times
● Avoid embedding assets that are bad at this!


A problematic asset: What does a slow authoritative DNS lookup look like?

Results in WPT or from actual clients can be intermittent because of caching.

Examine the lookup path of your assets.

Let’s dig in (pun intended).

Follow the lookup trail: Who is the authority?

Follow the lookup trail: Is it the final authority? Query it directly.

In this case, no -- we don’t have a webserver IP yet.

Let’s continue to gather more information like a resolver would.

Follow the CNAME: Who is the authority for the CNAME?

Is it the final authority? Look up the CNAME target like a resolver would.

We’ve gotten back A records (IP addresses) -- these are the webservers for this asset.
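The trail-following above can be sketched in code. The query output from the slides is not in this transcript, so the `RECORDS` table below is entirely hypothetical (made-up names, documentation IP); the point is only the shape of the walk: query, follow the CNAME, repeat, and note every authority involved.

```python
from typing import Dict, List, Tuple

# Hypothetical records, for illustration only: name -> (type, value, authority)
RECORDS: Dict[str, Tuple[str, str, str]] = {
    "assets.example.com": ("CNAME", "assets.cdn.example.net", "ns.example.com"),
    "assets.cdn.example.net": ("A", "192.0.2.20", "ns.example.net"),
}


def follow_trail(name: str, max_hops: int = 8) -> Tuple[str, List[str]]:
    """Follow CNAMEs to the final A record; return (ip, authorities on the path)."""
    authorities: List[str] = []
    for _ in range(max_hops):
        rtype, value, authority = RECORDS[name]
        authorities.append(authority)
        if rtype == "A":
            return value, authorities
        name = value  # CNAME: restart the lookup at the target name
    raise RuntimeError("CNAME chain too long (possible loop)")
```

The list of authorities returned is the set of nameservers whose response times matter on a cache miss.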

Checking for slowness

Recap: we needed to talk to two authoritative nameservers to resolve the asset’s webserver IPs:

How performant are they?

Let’s keep it simple: traceroutes from NYC

How about traceroutes from California?
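The talk uses traceroutes; a different way to get a similar signal is to time a DNS query sent straight to the authoritative nameserver. The sketch below does that with the standard library only. The function names are assumptions, `server_ip` must be supplied by you, and it measures a single UDP round trip from wherever it runs, so treat the result as a rough indication rather than a benchmark.

```python
import socket
import struct
import time


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (class IN)."""
    # Header: ID, flags (0 = standard query, recursion not requested),
    # QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def time_query_ms(server_ip: str, name: str, timeout_s: float = 2.0) -> float:
    """Send one query straight to a nameserver and return the response time in ms."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        start = time.perf_counter()
        sock.sendto(build_query(name), (server_ip, 53))
        sock.recvfrom(512)  # raises a timeout error if the server does not answer
        return (time.perf_counter() - start) * 1000.0
```

Running `time_query_ms` against each authority on the lookup path, from more than one location, gives numbers comparable to the NYC and California checks.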

Checking for slowness

So: [nameserver name not captured in this transcript] is a problematic authority.

If there is a cache miss at a user’s resolver, in NYC this lookup will introduce at least ~80ms delay for the asset, and in California at least ~150ms.
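Because the penalty is only paid on a cache miss, the average impact depends on how often the resolver misses. A rough sketch of that arithmetic follows; the ~80ms and ~150ms penalties come from the slide, while the function name and the 20% miss rate in the usage note are assumptions chosen purely for illustration.

```python
def expected_added_delay_ms(miss_penalty_ms: float, miss_rate: float) -> float:
    """Average extra delay per lookup when only cache misses reach the authority."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss rate must be between 0 and 1")
    return miss_rate * miss_penalty_ms
```

With an assumed 20% miss rate, `expected_added_delay_ms(80, 0.2)` and `expected_added_delay_ms(150, 0.2)` give the average cost for NYC and California. The average understates what the unlucky users feel: each miss still pays the full penalty.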

A well-managed asset: Quick look at a better authoritative DNS situation.

Who is authoritative? Cloudflare.

Final authority? Nope, there’s a CNAME.

CNAME authority? NS1.

Final authority? Yep, there’s the A record.

Are Cloudflare or NS1 slow from NYC? No.

From California? No.

Signs are good that this asset’s authoritative DNS won’t cause trouble!

Hunt down assets with slow lookups in the DNS resolution path. You won’t see the impact every time, but they will cause intermittent slowdowns.

OPTIMIZING DNS LOOKUP PERF FOR YOUR DOMAINS.

“Common knowledge” is out of date
● Never okay for one of your nameservers to fail
● Geographically redundant nameservers != fast
● All managed DNS topologies not created equal

When a nameserver fails

Sure, you’re still “up” ... if 1500ms is “up”

Timeout: usually 1500ms
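A back-of-the-envelope sketch of why a failed nameserver hurts even though lookups still succeed. The 1500ms timeout is the slide's figure; the function name, the uniform random choice of nameserver, and the single successful retry are simplifying assumptions.

```python
def expected_authority_ms(nameservers: int, failed: int,
                          healthy_rtt_ms: float,
                          timeout_ms: float = 1500.0) -> float:
    """Expected authoritative lookup time when some nameservers are down.

    Assumes the resolver picks a nameserver uniformly at random, waits one
    full timeout if it is down, then retries a healthy one successfully.
    """
    if not 0 <= failed < nameservers:
        raise ValueError("need at least one healthy nameserver")
    p_dead = failed / nameservers
    # (1 - p) * rtt + p * (timeout + rtt) simplifies to rtt + p * timeout.
    return healthy_rtt_ms + p_dead * timeout_ms
```

For example, with 4 nameservers, 1 of them down, and a 20ms healthy round trip, the expected time under these assumptions is far above 20ms even though every lookup eventually resolves.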

Geographically redundant NS’s

SRTT = smoothed round trip time (performance affinity -- prefer fastest authority)

Even when SRTT is in place, do you want to leave your perf up to ISPs?

SRTT: 50-60% of resolvers. Traffic split across the two nameservers: 1-10% / 90-99%

Non-SRTT: the rest. Traffic split across the two nameservers: 50% / 50%
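A minimal sketch of the SRTT idea: keep a smoothed round trip time per nameserver and prefer the lowest. The class name, the smoothing weight of 0.7, and the simple weighted average are assumptions; actual resolver implementations differ in their smoothing and in how often they re-probe slower servers.

```python
from typing import Dict


class SrttSelector:
    """Track a smoothed RTT per nameserver and prefer the fastest one."""

    def __init__(self, smoothing: float = 0.7) -> None:
        self.smoothing = smoothing  # weight kept from the previous estimate
        self.srtt_ms: Dict[str, float] = {}

    def record(self, nameserver: str, rtt_ms: float) -> None:
        previous = self.srtt_ms.get(nameserver)
        if previous is None:
            self.srtt_ms[nameserver] = rtt_ms  # first sample seeds the estimate
        else:
            self.srtt_ms[nameserver] = (
                self.smoothing * previous + (1.0 - self.smoothing) * rtt_ms
            )

    def pick(self) -> str:
        # Performance affinity: choose the lowest smoothed RTT.
        return min(self.srtt_ms, key=self.srtt_ms.__getitem__)
```

A resolver without this kind of tracking has no reason to prefer one nameserver over another, so a slow nameserver keeps receiving an even share of queries.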

Just use managed DNS… right?

Not all managed DNS networks are created equal

[Figure: # POPs per managed DNS network; values shown: 40+, 50+, ?, <10, 10+, 15+, 30+, 15+, 20+]

Anycasted DNS Networks

The (globally) highest performing DNS networks are anycasted and tightly optimized for latency

● Operationally complex
● Expensive to build and maintain
● “Don’t try this at home”

Well-managed property: nytimes.com
Let’s look at authoritative DNS for every single asset on the nytimes.com homepage

100% anycasted managed DNS and CDN:

CDN: Facebook, Google, Akamai, Edgecast, CDNetworks

Managed DNS: NS1, AWS Route53, Dyn, Akamai

GOING ON OFFENSE: DNS AS A PERF OPTIMIZATION TOOL.

DNS lookup is an opportunity to inject intelligence in the delivery path

First indication of interest in content or application

DNS lookup: powerful tool for service endpoint selection

● Multiple datacenters
● Multiple CDNs
● Infrastructure elasticity
● …

Effective service endpoint selection is critical in 2016:

Web applications are increasingly dynamic and distributed

Web applications have increasingly wide audiences

Early 2000s
● Centralized applications
● Localized audiences
[Diagram: app datacenter with a load balancer (LB), web servers (W W W) and a database (DB)]

Early 2010s
● Centralized origin / dynamic application
● Caching CDN
● Global audience
[Diagram: origin with LB, web servers (W W W) and DB, fronted by a CDN cache]

Today
● Distributed application nodes (dynamic app delivery moving to where the users are)
● Multiple CDNs (optimized for different use cases & markets)
● Impatient global audience (blame the Big Boys for paving the way)
[Diagram: CDN A and CDN B]

Why is DNS lookup good for service endpoint selection?

Pervasive: part of every application, used by every client

Lightweight: no application / architecture changes, no hardware / software

Endpoint selection example:

Historical approach: geographic routing
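Geographic routing can be sketched as "answer with the endpoint closest to the client". The endpoint names and coordinates below are hypothetical, and the sketch assumes the client's location is already known (in practice it is estimated, for example from the resolver's IP address).

```python
import math
from typing import Dict, Tuple

# Hypothetical endpoints: name -> (latitude, longitude) in degrees.
ENDPOINTS: Dict[str, Tuple[float, float]] = {
    "nyc": (40.71, -74.01),
    "sfo": (37.77, -122.42),
    "lon": (51.51, -0.13),
}


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_endpoint(client: Tuple[float, float]) -> str:
    """Return the name of the endpoint geographically closest to the client."""
    return min(ENDPOINTS, key=lambda name: haversine_km(client, ENDPOINTS[name]))
```

Distance is only a proxy for performance: it knows nothing about network paths, endpoint health or load.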

Modern DNS traffic management platforms use real-time perf data

Infrastructure telemetry: Route based on app infrastructure performance & workload
● Topology
● Health
● Load
● Traffic
● …

Network telemetry: Route based on real measured end-user performance
● Latency
● Throughput
● Reachability
● …

[Diagram: infrastructure and network telemetry, together with RESOLVER MODELING, feed an APP-SPECIFIC ROUTING ALGO in DNS]
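A toy version of an app-specific routing decision that combines the two kinds of telemetry. The `Endpoint` fields, the load threshold, the fallback rule, and the idea of keying latency by a client network label are all assumptions made for this sketch; it is not a description of how any particular DNS platform works.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Endpoint:
    name: str
    healthy: bool                  # infrastructure telemetry: health checks
    load: float                    # infrastructure telemetry: 0.0 (idle) to 1.0 (full)
    latency_ms: Dict[str, float]   # network telemetry: measured latency per client network


def choose_endpoint(endpoints: List[Endpoint], client_network: str,
                    max_load: float = 0.8) -> str:
    """Pick the lowest-latency endpoint among those healthy and not overloaded."""
    candidates = [
        e for e in endpoints
        if e.healthy and e.load <= max_load and client_network in e.latency_ms
    ]
    if not candidates:
        # Fallback (a choice made for this sketch): require health only.
        candidates = [e for e in endpoints if e.healthy]
    if not candidates:
        raise LookupError("no healthy endpoint")
    return min(
        candidates,
        key=lambda e: e.latency_ms.get(client_network, float("inf")),
    ).name
```

The function would run at DNS lookup time, with the resolver's network standing in for the end user, and its answer becomes the record returned to that resolver.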

WRAPPING UP: SOME TAKEAWAYS.

Avoid embedding assets serviced by slow DNS

Follow the end-to-end DNS lookup path of your assets

Use globally anycasted DNS networks

Use intelligence in DNS lookup to control & optimize your traffic as:

● Your audience globalizes
● Your application becomes more distributed
● Your CDN needs become more specialized

@nsoneinc