Cassandra Summit 2014: Novel Multi-Region Clusters — Cassandra Deployments Split Between Heterogeneous Data Centres with NAT & DNS-SD


Description

Presenter: Adam Zegelin, CTO of Instaclustr

In this presentation we discuss a method of provisioning and running an Apache Cassandra deployment split between multiple heterogeneous data centres which, rather than allocating per-node public IPv4 addresses or configuring mesh VPNs, uses Port Address Translation (PAT) for node↔internet connectivity and is self-configuring and discoverable via DNS Service Discovery (DNS-SD, or wide-area Bonjour). While Cassandra has built-in support for AWS EC2 multi-region/data centre topologies (via Ec2MultiRegionSnitch, etc.), the existing solution requires the wasteful allocation of a public IPv4 address per node. Additionally, there is little support for topologies that either mix in or deploy completely on alternative infrastructure providers. Our solution uses a single public IP address per data centre, is provider-agnostic, doesn't introduce the configuration and management overheads of a mesh VPN between data centres, and allows nodes to automatically discover each other.

Transcript

Novel Multi-region Clusters
Cassandra Deployments Split Between Heterogeneous Data Centres

with NAT & DNS-SD

#CassandraSummit

Adam Zegelin
Co-founder & VP of Engineering

www.instaclustr.com

[email protected]
@adamzegelin

Instaclustr

• Instaclustr provides Cassandra-as-a-service in the cloud (Currently only on AWS — Google Cloud in private beta)

• We currently manage 50+ Cassandra nodes for various customers

• We often get requests to do cool things — and try to make them happen!

Multi-DC @ Instaclustr

• Cloud ⇄ cloud, “classic” internet-facing data centre ⇄ cloud

• Works out-of-the-box today.

• Requires per-node public IP

• Private network clusters ⇄ Cloud clusters

• Easy if your private network allocates per-node public IP addresses

• VPNs

• Something else?

• Overview of multi-region/data centre clusters

• What is supported out-of-the-box

• Alternative solutions

• Supporting technology overview (NAT/PAT and DNS-SD)

• Implementation

Single Node

• What you get from running apt-get install cassandra and /usr/bin/cassandra

• Fragile (no redundancy)

• Dev/test/sandbox only

[Diagram: a single Cassandra node]

Multi-node, Single Data Centre

• Two or more servers running Cassandra within one DC

• Replication of data (redundancy)

• Increased capacity (storage + throughput)

• Baseline for production clusters

[Diagram: three Cassandra nodes in a single data centre]

Multi-node, Multi-DC

• Cassandra running in two or more data centres

• Global deployments

• Data near your customers (reduced latency)

• Supported out-of-the-box

[Diagram: three data centres, each with three Cassandra nodes]

Snitches

• Understands data centres and racks

• Implementation may automatically determine node DC and rack (Ec2MultiRegionSnitch uses the AWS internal metadata service; GossipingPropertyFileSnitch loads a .properties file; see the example after this list)

• Node DC and rack is advertised via Gossip

• Determine node proximity (estimated link latency)

• Cluster may use a combination of Snitch implementations
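
For reference, GossipingPropertyFileSnitch reads a per-node cassandra-rackdc.properties file like the following (the DC and rack values here are illustrative):

# cassandra-rackdc.properties (one per node; the values are advertised via Gossip)
dc=US_EAST_1
rack=rack1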

Data Centres

• Collection of Racks

• Hold complete replicas of the data

• Geographically separate

• Possibly high-latency interconnects (e.g. East Coast US → Sydney, ~300ms round-trip)

Racks

• Collection of nodes

• May fail as a single unit

• Modelled on the traditional DC rack/cage (n servers running off a single UPS)

☁️

• Amazon Web Services (use Ec2MultiRegionSnitch)

• Data Centre ≡ AWS Region (e.g. US_EAST_1, AP_SOUTHEAST_2)

• Rack ≡ Availability Zone (e.g. us-east-1a, ap-southeast-2b)

• Google Cloud Platform (no out-of-the-box auto-configuring snitch — use GossipingPropertyFileSnitch, or roll your own!)

• Data Centre ≡ GCP Region (e.g. US, Europe)

• Rack ≡ Zone (e.g. us-central1-a, europe-west1-a)

Data Centre Aware

• Cassandra is data centre aware

• Only fetch data from a remote DC if absolutely required (remote data is more “expensive”)


• Clients can be made data centre aware

• If your app knows its DC, the client will talk to the closest DC (see the example below)

Cluster cluster = Cluster.builder()
    .addContactPoint(…)
    .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("US_EAST_1"))
    .build();

Multi DC Support

• Per-node public (internet-facing) IP address

• Optionally, per-node private IP address

• Per-node public address is used for inter-data centre connectivity

• Per-node private address is used for intra-data centre connectivity

Multi DC Support

• Cloud ⇄ cloud, traditional ⇄ cloud, traditional ⇄ traditional

• Easy to set up per-node public and private addresses

• Private network clusters ⇄ Cloud clusters

• Private networks: 𝑛 public addresses, shared by 𝑥 private addresses. Not 1 ↔ 1 (where often 𝑥 > 𝑛)

• Done via Network Address Translation

IPv4 Address Space Exhaustion

[Chart: projected IPv4 address pool exhaustion. Source: http://www.potaroo.net/tools/ipv4/]

Multi-DC Support

• IPv4

• Address exhaustion

• Over time, will become more expensive to purchase addresses

• Wasteful (being a good internet citizen)

Alternatives

• IPv6

• Java supports it ∴ Cassandra probably supports it (untested by us)

• Global IPv6 adoption is ~4% (according to Google — google.com/intl/en/ipv6/statistics.html)

• IPv6/IPv4 hybrid (Teredo, 6over4, et al.)

• AWS EC2 does not support IPv6. End of story. (Elastic Load Balancer does support IPv6)

Alternatives

• VPNs

• tinc, OpenVPN, etc.

• All private address space — no dual addressing

• Requires multiple links — between every DC and per client

• Address space overlaps between multiple VPNs

• Connectivity to multiple clusters is an issue (for multi-cluster apps, centralised monitoring, etc)

Data Centres    Links
3               3
5               10
7               21
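
The link counts above are the number of pairwise connections in a full mesh of n data centres:

\[
\text{links}(n) = \binom{n}{2} = \frac{n(n-1)}{2}, \qquad \text{e.g. } \binom{7}{2} = \frac{7 \cdot 6}{2} = 21
\]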

Alternatives

• Network Address Translation (NAT) (aka IP Masquerading or Port Address Translation (PAT))

• Deployed on most private networks

• Connectivity between private network clusters ⇄ Cloud clusters

• Supports client connectivity to multiple clusters

NAT Basics

• Re-maps IP address spaces (e.g. public 96.31.81.80 ↔ private 192.168.*.*)

• 𝑛 public addresses, shared by 𝑥 private addresses. Not 1 ↔ 1 (where often n = 1, 𝑥 > 𝑛)

• Port Address Translation

• Private port ↔ Public port

• Outbound connections only without port forwarding or NAT traversal

• Per DC gateway device — performs NAT and port forwarding

NAT with Inbound Connections

• Static port forwarding (configured on the gateway)

• Automatic port forwarding — UPnP, NAT-PMP/PCP (configured by the application, e.g. Cassandra; see the sketch after this list)

• NAT Traversal — STUN, ICE, etc.
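
As a sketch of the automatic option above, a node-side process could request a mapping from a UPnP-capable gateway as follows. This assumes the open-source weupnp library, and the port numbers are illustrative; it is not Instaclustr's implementation:

import org.bitlet.weupnp.GatewayDevice;
import org.bitlet.weupnp.GatewayDiscover;

public final class UpnpPortMapping {
    public static void main(String[] args) throws Exception {
        // Discover a UPnP-capable internet gateway on the local network
        GatewayDiscover discover = new GatewayDiscover();
        discover.discover();
        GatewayDevice gateway = discover.getValidGateway();
        if (gateway == null) {
            System.err.println("No UPnP gateway found; static forwarding required");
            return;
        }

        // Ask the gateway to forward public port 1236 to this host's
        // Cassandra storage port (7000). Both ports are illustrative.
        boolean mapped = gateway.addPortMapping(
                1236,                                       // external (public) port
                7000,                                       // internal port (C* storage)
                gateway.getLocalAddress().getHostAddress(), // internal host
                "TCP",
                "cassandra-storage");
        System.out.println("Port mapping created: " + mapped);
    }
}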

NAT + C∗

Situation: 𝑛 Cassandra nodes, 1 public address per data centre

• Port forward different public ports for each node

• Advertise assigned ports

• Modify Cassandra and client applications to connect to advertised ports

Advertising Port Mappings

• Extend Cassandra Gossip

• Include port numbers in node address announcements

• Allow seed node addresses to include port numbers

• Allow multiple nodes to have identical public & private addresses (only port numbers differ per DC)

• How to bootstrap? SIP?

• Cassandra must be aware of the allocated ports in order to advertise

• Hard if C* is not directly responsible for the port mapping (e.g. static port forwarding)

• Too many modifications to internals

Advertising Port Mappings

• DNS-SD — dns-sd.org (aka Bonjour/Zeroconf)

• Reads — works with existing DNS implementations (it’s just a DNS query)

• Even inside restrictive networks, DNS usually works

• Combination of DNS TXT, SRV and PTR records.

• Updates

• via DNS Update & TSIG — supported by BIND

• via API — e.g. for AWS Route 53 (sketch below)
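
For the API route, a minimal sketch of UPSERTing a node's SRV record with the AWS SDK for Java (v1); the hosted zone ID, domain names and port are illustrative:

import com.amazonaws.services.route53.AmazonRoute53Client;
import com.amazonaws.services.route53.model.Change;
import com.amazonaws.services.route53.model.ChangeAction;
import com.amazonaws.services.route53.model.ChangeBatch;
import com.amazonaws.services.route53.model.ChangeResourceRecordSetsRequest;
import com.amazonaws.services.route53.model.RRType;
import com.amazonaws.services.route53.model.ResourceRecord;
import com.amazonaws.services.route53.model.ResourceRecordSet;

public final class Route53SrvUpdate {
    public static void main(String[] args) {
        AmazonRoute53Client route53 = new AmazonRoute53Client(); // default credentials chain

        // SRV record data is "priority weight port target"
        ResourceRecordSet srv = new ResourceRecordSet()
                .withName("192-168-1-4._cassandra._tcp.example-cluster.example.com.")
                .withType(RRType.SRV)
                .withTTL(60L)
                .withResourceRecords(
                        new ResourceRecord("0 0 1236 aws-us-east1-gateway.example-cluster.example.com."));

        ChangeBatch batch = new ChangeBatch()
                .withChanges(new Change(ChangeAction.UPSERT, srv));

        route53.changeResourceRecordSets(
                new ChangeResourceRecordSetsRequest("HOSTED_ZONE_ID", batch));
    }
}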

Advertising Port Mappings

• DNS-SD cont’d.

• SRV records contain hostname and port (i.e., the hostname of the NAT gateway and the public C* port)

• TXT records contain key=value pairs (useful for additional connection & config details)

• Modify C* connection code to lookup foreign node port from DNS

• Modify client driver connection code to lookup ports from DNS

• Can be queried & updated out-of-band (updated by the NAT device or a central management server which knows which ports were mapped; see the lookup sketch below)
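
Because it is just DNS, the lookup needs no special client library; JNDI's DNS provider is enough. A minimal sketch (the helper name DnsSdLookup is ours; the service-instance name matches the dns-sd output later in the deck):

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public final class DnsSdLookup {
    /** Returns the SRV record data ("priority weight port target") for a DNS-SD service instance. */
    public static String lookupSrv(String serviceInstance) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            Attributes attrs = ctx.getAttributes(serviceInstance, new String[] { "SRV" });
            return (String) attrs.get("SRV").get(); // e.g. "0 0 1236 aws-us-east1-gateway.<domain>."
        } finally {
            ctx.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lookupSrv(
                "192-168-1-4._cassandra._tcp."
                + "1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au."));
    }
}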

Advertised Details

• Each cluster is its own browse domain

• Each NAT gateway device has an A record in the browse domain

• Each DNS-SD service is named based on the private IP address

• Requires unique private IP addresses across data centres

• SRV port is the C* thrift port

• Additional ports are advertised via TXT (example records below)
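
Put together, the records for one node might look like this in the zone (host, ports and address are taken from the dns-sd/nslookup output below; the layout itself is an illustrative sketch):

; $ORIGIN 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
aws-us-east1-gateway            A    54.209.123.195
_cassandra._tcp                 PTR  192-168-1-4._cassandra._tcp
192-168-1-4._cassandra._tcp     SRV  0 0 1236 aws-us-east1-gateway
192-168-1-4._cassandra._tcp     TXT  "version=2.0.7" "cqlport=1237"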

Configuration

• Cassandra is configured to only use private addresses

• On cluster creation

• Establish a new DNS-SD browse domain

• Create A records for each gateway device

• NAT gateway device is notified when a new C* node is started

• Allocates random public ports for C* and configures Port Forwarding

• Updates DNS-SD

• New SRV and TXT record

$ dns-sd -B _cassandra._tcp 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Browsing for _cassandra._tcp
A/R Flags if Domain                                                            Service Type     Instance Name
Add    3   0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-2-4
Add    3   0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-1-2
Add    3   0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-2-3
Add    3   0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-2-2
Add    3   0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-1-4
Add    2   0 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. _cassandra._tcp. 192-168-1-3

$ dns-sd -L 192-168-1-4 _cassandra._tcp 1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Lookup 192-168-1-4._cassandra._tcp.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
192-168-1-4._cassandra._tcp.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au. can be reached at aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.:1236 (interface 0)
version=2.0.7 cqlport=1237

$ nslookup aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.
Non-authoritative answer:
Name:    aws-us-east1-gateway.1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au
Address: 54.209.123.195

Output of dns-sd (you can also use avahi-browse, dig, or any other DNS query tool)

Java Driver Modifications

• The driver's AddressTranslater is usually a no-op (the default is IdentityTranslater)

• Modify translate() to perform a DNS-SD lookup.

• The address parameter is a node private IP address.

• Locate a service with a name = private IP address to determine public IP/port.

public interface AddressTranslater {
    public InetSocketAddress translate(InetSocketAddress address);
}
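
A minimal sketch of a DNS-SD-backed translator, assuming the Java driver 2.x AddressTranslater interface above and the DnsSdLookup helper from earlier; the browse-domain constant is illustrative, not Instaclustr's actual code:

import java.net.InetSocketAddress;
import javax.naming.NamingException;
import com.datastax.driver.core.policies.AddressTranslater;

public class DnsSdAddressTranslater implements AddressTranslater {
    // Illustrative: the cluster's DNS-SD browse domain
    private static final String CLUSTER_DOMAIN =
            "1da53f83-e635-11e3-96eb-2ec9d09504f5.clusters.instaclustr.com.au.";

    @Override
    public InetSocketAddress translate(InetSocketAddress address) {
        // Service instances are named after private IPs: 192.168.1.4 -> 192-168-1-4
        String instance = address.getAddress().getHostAddress().replace('.', '-')
                + "._cassandra._tcp." + CLUSTER_DOMAIN;
        try {
            String[] srv = DnsSdLookup.lookupSrv(instance).split(" "); // "priority weight port target"
            return new InetSocketAddress(srv[3], Integer.parseInt(srv[2]));
        } catch (NamingException e) {
            return address; // not advertised: fall back to the private address
        }
    }
}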

Modifying Cassandra

• OutboundTcpConnectionPool is responsible for managing socket connections.

• Modify newSocket() to perform a DNS-SD lookup.

• The endpoint parameter is a node private IP address.

• Locate a service with a name = private IP address to determine public IP/port

public class OutboundTcpConnectionPool
{
    ⋮
    public static Socket newSocket(InetAddress endpoint) throws IOException {…}
    ⋮
}
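
A corresponding sketch of the modified newSocket(), reusing the same illustrative DnsSdLookup helper and browse-domain constant (imports and the encrypted-connection path are elided; a real change would also have to coexist with Cassandra's SSL socket setup):

public static Socket newSocket(InetAddress endpoint) throws IOException {
    try {
        String instance = endpoint.getHostAddress().replace('.', '-')
                + "._cassandra._tcp." + CLUSTER_DOMAIN;
        String[] srv = DnsSdLookup.lookupSrv(instance).split(" "); // "priority weight port target"
        return new Socket(srv[3], Integer.parseInt(srv[2]));
    } catch (NamingException e) {
        // Not advertised via DNS-SD: connect directly on the standard storage port
        return new Socket(endpoint, DatabaseDescriptor.getStoragePort());
    }
}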

[Diagram: two data centres, each with three Cassandra nodes behind a NAT gateway; a DNS (+ DNS-SD) server (Route 53, self-hosted, etc.) holds the gateway and port records, and the client application resolves nodes through it]

Thanks! Questions?

[email protected]