C* Summit 2013: The Perils and Triumphs of using Cassandra at a .NET/Microsoft Shop by Derek...
reimagining the business of apps
#Cassandra13
©2013 NativeX Holdings, LLC
The Perils and Triumphs of using Cassandra at a .NET/Microsoft Shop
About the Presenters Jeff Smoley – Infrastructure Architect
Derek Bromenshenkel – Infrastructure Architect
Agenda • About NativeX • Why Cassandra? • Challenges • Auto Id Generation • FluentCassandra • Hector • IKVM.NET • HectorNet • Reporting Integration
About NativeX • Formerly W3i
• Home Office in Sartell, MN
• 75 miles NW of Minneapolis
• Remote Offices in MSP and SF
• 150 Employees
What NativeX Does • Marketing technology platform that enables developers to build successful businesses around their apps.
• We provide Publishers with a way to monetize and Advertisers with a way to gain distribution.
Mobile Vanity Metrics • Over 700M unique devices
• 1000s of Apps
• > 100M Monthly Active Users
• > 200GB of data ingest per week
Backstory • From 100M sessions/quarter to 5B.
• Anticipate 7B sessions in Q2.
• Growth was anticipated.
• Realized infrastructure needed to change to support this.
[Chart: API Requests (billions) by quarter, 2011 Q4 to 2013 Q1]
Original OLTP Architecture • Microsoft SQL Server
• 2 Node Cluster (Failover)
• 12 cores / node
• 192 GB mem / node
• Compellent SAN
• 172 Tiered Disk
• SSD, FC, SATA
Objectives
Scale
• Horizontal
• Incremental cost structure
Resiliency
• No single point of failure
• Geographically distributed
What is NoSQL • Stands for Not Only SQL.
• The NoSQL movement is about understanding problems and focusing on solutions.
• It’s not about silver bullets and black boxes.
• It is about using the right tool for the right problem.
Researched Products • Compared features like:
• Distributed / Shared Nothing
• Multi-Cluster Support
• Maturity & Popularity
• Documentation
• .NET Support
Why Cassandra? • Multi-node • Multi-cluster • Highly Available
• Durable • Shared Nothing • Tunable Consistency
Cassandra at NativeX • C* was not a replacement DB system.
• We continue to use MS SQL Server alongside C*.
• SQL Server used for storing configuration data.
• C* solves a very specific problem for us.
• Writing large volumes of data quickly.
• Reading very specific data out of a large record set.
Challenges • C* does not have Auto Id generation.
• How to connect to C* with C#?
• Finding a connector with good Failure Tolerance.
• How to integrate our reporting system?
Auto ID Generation • Pre-existing requirements
• Unique, 64-bit positive integers
• Increasing (sortable) is a plus
• Previously SQL Server Identity column
• A Time-based UUID is sortable and unique
• Changed everything we could
• The future for us
Auto ID – What are the options? • SQL dummy table
• Easy & familiar, but limited
• Pre-generated range
• Proposed by @mdennis
• Distributed, but more complicated to implement
• Sharding [Instagram]
• Discovered too late
• Unfamiliar with Postgres
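One way to picture the "pre-generated range" option: a shared counter hands each node a block of ids, and the node then issues ids from that block without further coordination. The following is a minimal sketch under stated assumptions, not the scheme @mdennis proposed in detail. The class names `RangeSource` and `RangeIdGenerator` and the block size are hypothetical, and an in-memory `AtomicLong` stands in for whatever shared, durable store would hold the counter in practice.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a shared, durable counter
// (in practice a database row or a coordination service).
class RangeSource {
    private final AtomicLong nextStart = new AtomicLong(1);
    private final long blockSize;

    RangeSource(long blockSize) {
        this.blockSize = blockSize;
    }

    // Atomically reserves the block [start, start + blockSize) and returns start.
    long reserveBlock() {
        return nextStart.getAndAdd(blockSize);
    }

    long blockSize() {
        return blockSize;
    }
}

// Per-node generator: issues ids from its reserved block and
// reserves a new block only when the current one is used up.
class RangeIdGenerator {
    private final RangeSource source;
    private long next = 0;
    private long end = 0; // exclusive; 0 forces a reservation on first use

    RangeIdGenerator(RangeSource source) {
        this.source = source;
    }

    synchronized long nextId() {
        if (next >= end) {
            next = source.reserveBlock();
            end = next + source.blockSize();
        }
        return next++;
    }
}
```

Ids produced this way are unique across nodes, but they only increase within a node's block, so global ordering is approximate. That is part of the added complexity noted above.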
We chose Snowflake • Built by Twitter, Apache 2.0 license
• https://github.com/twitter/snowflake
• “… network service for generating unique ID numbers at high scale..”
• Same motivation; MySQL -> C*
• A few tweaks for our Windows environment
Technical reasons for Snowflake • Meets all requirements
• Tested in high transaction system
• Java based [Scala] implementation
• Thrift server
• Run as a Windows service with Apache Daemon
• Con: Requires Apache Zookeeper
• Coordinate the worker id
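To make the requirements concrete (unique, positive, 64-bit, roughly sortable), here is a minimal Java sketch of a Snowflake-style generator. It is not Twitter's implementation. The bit widths follow the commonly described Snowflake layout (41 bits of timestamp, 10 bits of worker id, 12 bits of sequence). The custom epoch, the class name `SnowflakeStyleIdGenerator`, and the injected clock are assumptions made for illustration.

```java
import java.util.function.LongSupplier;

// Illustrative Snowflake-style generator; not Twitter's code.
class SnowflakeStyleIdGenerator {
    static final long EPOCH_MS = 1356998400000L; // assumed custom epoch: 2013-01-01T00:00:00Z
    static final int WORKER_BITS = 10;
    static final int SEQUENCE_BITS = 12;
    static final long MAX_WORKER_ID = (1L << WORKER_BITS) - 1;
    static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private final LongSupplier clock; // milliseconds since the Unix epoch
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeStyleIdGenerator(long workerId, LongSupplier clock) {
        if (workerId < 0 || workerId > MAX_WORKER_ID) {
            throw new IllegalArgumentException("worker id out of range: " + workerId);
        }
        this.workerId = workerId;
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted within this millisecond: wait for the clock to advance.
                while ((now = clock.getAsLong()) <= lastTimestamp) {
                    Thread.yield();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        // Timestamp in the high bits keeps ids time-ordered; the sign bit stays 0.
        return ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

In normal use the clock would be `System::currentTimeMillis`. Two workers never collide only if their worker ids differ, which is why something has to coordinate the worker id.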
Connecting to Snowflake • Built our own .NET Snowflake Client
• Snowflake server on each web node
• Local instance is primary
• Round robin failover to other nodes
• Auto failover AND recovery
• “Circuit Breaker” pattern
[Diagram: Servers 1 to 4, each running a Web App alongside its own SF (Snowflake) server]
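The failover behavior described on this slide can be sketched roughly as follows. This is an illustration in Java, not the .NET client that was actually built. `Breaker` and `FailoverIdClient` are hypothetical names, each endpoint is modeled as a `LongSupplier` that throws on failure (a real client would call the Snowflake Thrift service), and the cool-down value is an assumption.

```java
import java.util.List;
import java.util.function.LongSupplier;

// One endpoint plus its circuit state: after a failure the circuit is
// "open" and the endpoint is skipped until the cool-down has passed.
class Breaker {
    private final LongSupplier endpoint; // hypothetical stand-in for a remote id call
    private final long coolDownMs;
    private long openUntilMs = 0;

    Breaker(LongSupplier endpoint, long coolDownMs) {
        this.endpoint = endpoint;
        this.coolDownMs = coolDownMs;
    }

    boolean allows(long nowMs) {
        return nowMs >= openUntilMs;
    }

    long call(long nowMs) {
        try {
            long id = endpoint.getAsLong();
            openUntilMs = 0; // success closes the circuit (recovery)
            return id;
        } catch (RuntimeException e) {
            openUntilMs = nowMs + coolDownMs; // failure opens the circuit
            throw e;
        }
    }
}

// Tries the local instance first, then the remaining nodes in round-robin order.
// Breakers are only touched inside the synchronized method below.
class FailoverIdClient {
    private final List<Breaker> breakers; // index 0 is the local (primary) instance
    private int nextRemote = 1;

    FailoverIdClient(List<Breaker> breakers) {
        this.breakers = breakers;
    }

    synchronized long nextId(long nowMs) {
        int n = breakers.size();
        RuntimeException last = null;
        for (int attempt = 0; attempt < n; attempt++) {
            int idx = (attempt == 0) ? 0 : 1 + (nextRemote - 1 + attempt - 1) % (n - 1);
            Breaker b = breakers.get(idx);
            if (!b.allows(nowMs)) {
                continue; // circuit open: skip without making a call
            }
            try {
                long id = b.call(nowMs);
                if (idx != 0) {
                    nextRemote = idx % (n - 1) + 1; // spread later failovers across nodes
                }
                return id;
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw new IllegalStateException("no id server available", last);
    }
}
```

Because a failed endpoint is retried once its cool-down has passed, a node that comes back is picked up again automatically rather than staying excluded.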
Challenges • C* does not have Auto Id generation.
• How to connect to C* with C#?
• Finding a connector with good Failure Tolerance.
• How to integrate our reporting system?
Connecting to Cassandra with C# • Thrift alone too low level
• Needs
• CQL support
• Active development / support
• Wants
• ADO.NET / LINQ feel
• ????
• FluentCassandra is where we started
Vetting FluentCassandra • Pros
• Open source - https://github.com/fluentcassandra/fluentcassandra
• Nick Berardi, project owner, is excellent
• Designed for CQL
• Familiar feel
• Were able to start project development with it
Vetting FluentCassandra • Cons
• Immaturity
• Few users with high transaction system
• Permanent node blacklisting
• Lacked auto retry
• Couldn’t live with these limitations
• Tried hiring independent contractor to help us mature it
Challenges • C* does not have Auto Id generation.
• How to connect to C* with C#?
• Finding a connector with good Failure Tolerance.
• How to integrate our reporting system?
Hector: Yes, please • Popular C* connector
• Use cases matching ours
• Good maturity
• Auto node discovery
• Auto retry
• Auto failure recovery
• Written in Java – major roadblock
Help! • We knew we still needed help.
• We found a company named Concord.
• Based out of the Twin Cities.
• Specializes in System, Process, and Data Integration.
• http://concordusa.com/
Concord’s Recommendation • Concord recommended that we use IKVM.NET to port Hector to a .NET assembly.
• They had previous success using IKVM for other Java to .NET ports.
• They felt that maturing FluentCassandra was going to take longer than our timeline allowed.
About the IKVM.NET Project • http://www.ikvm.net/
• Open Source Project.
• Main contributor is Jeroen Frijters.
• He is actively contributing to the project.
• License allows for use in commercial applications.
What is IKVM.NET? • IKVM.NET includes the following components:
• A Java Virtual Machine implemented in .NET.
• A .NET implementation of the Java class libraries.
• Set of tools that enable Java and .NET interoperability.
Uses for IKVM • Drop-in JVM
• Included is a distribution of a .NET implementation of a Java Virtual Machine.
• Allows you to run jar files using the .NET stack.
• Example: ikvm -jar myapp.jar
Uses for IKVM • Use Java libraries in your .NET applications
• Using ikvmc you can compile Java bytecode to .NET IL.
• Example: ikvmc -target:library mylib.jar
Uses for IKVM • Develop .NET applications in Java
• Write code in Java.
• Compile to JVM bytecode.
• Use ikvmc to produce a .NET Executable.
• Can also use .NET APIs in Java code using the ikvmstub application to generate a Java jar file.
• Example: ikvmstub MyDotNetAssemblyName
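As a small illustration of the "write Java, ship .NET" flow, the sketch below is an ordinary Java class, with the build steps shown as comments. The class name `Greeter` and the file names are hypothetical. The `ikvmc` options used (`-target:exe`, `-main:`, `-out:`) are given as I understand them from the IKVM.NET documentation and should be checked against the installed version.

```java
// Hypothetical example class; file and jar names below are assumptions.
//
// Build steps, run from a shell:
//   javac Greeter.java
//   jar cf greeter.jar Greeter.class
//   ikvmc -target:exe -main:Greeter -out:greeter.exe greeter.jar
//
// The resulting greeter.exe runs on the .NET runtime and needs the
// IKVM runtime assemblies alongside it.
class Greeter {
    static String greet(String name) {
        return "Hello from Java, " + name + "!";
    }

    public static void main(String[] args) {
        String who = args.length > 0 ? args[0] : "world";
        System.out.println(greet(who));
    }
}
```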
Hector Converted to .NET • Per Concord’s recommendation we chose to compile the Hector jar into a .NET assembly.
• Hector and all of its dependencies are pulled into one .NET dll that can be referenced by any .NET assembly.
• In addition, you will have to reference some core IKVM assemblies.
• Each Java dependency is given its own namespace within the .NET dll.
HectorNet • Concord also created a dll called HectorNet that wraps some of the Hector behaviors and makes it feel more like .NET.
• Such as supporting connection strings.
• Mapping Thrift byte arrays to .NET data types.
• Mapping to native .NET collections instead of using Java collections.
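The byte-array mapping mentioned above comes down to conversions like the ones below. This is a sketch in Java, not HectorNet code, and `ColumnBytes` is a hypothetical name. It relies on two facts about Cassandra's encodings: a long is stored as 8 big-endian bytes, and text is stored as UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative conversions between raw column bytes and typed values.
class ColumnBytes {
    // Encodes a long as 8 big-endian bytes (ByteBuffer's default byte order).
    static byte[] fromLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decodes 8 big-endian bytes back into a long.
    static long toLong(byte[] raw) {
        if (raw.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + raw.length);
        }
        return ByteBuffer.wrap(raw).getLong();
    }

    // Encodes text as UTF-8 bytes.
    static byte[] fromText(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back into text.
    static String toText(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

A wrapper like this matters on the .NET side because `BitConverter` follows the platform's byte order, which is little-endian on typical x86 hardware, so raw bytes cannot simply be passed through unchanged.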
Challenges • C* does not have Auto Id generation.
• How to connect to C* with C#?
• Finding a connector with good Failure Tolerance.
• How to integrate our reporting system?
Integrating Reporting
[Diagram: OLTP (C*) → ETL in SSIS (Extract, Transform, Load) → OLAP (MS SQL) → Cube (SSAS)]
Integrating Reporting • The SSIS Extract process uses C# Script Tasks.
• Script Task needs references to HectorNet and all of its dependencies.
• SSIS can only reference assemblies that are in the GAC.
• Assemblies in the GAC have to be Signed.
Why Not DataStax C# Driver? • We built everything using CQL 2.0.
• Wasn’t ready in time for our launch date.
DSE for the Win! • We use DataStax Enterprise.
• Mainly for support, which continues to be a life saver.
Thank you! • We are hiring
• http://nativex.com/careers/
• Join the MSP C* Meetup
• http://www.meetup.com/Minneapolis-St-Paul-Cassandra-Meetup/
• Contact us
• [email protected]
• [email protected] @breakingtrail
• Slide Deck
• http://www.slideshare.net/jjsmoley/the-perils-and-triumphs-of-using-cassandra-at-a-netmicrosoft-shop