Scalable Databases - From Relational Databases To Polyglot Persistence
Developing polyglot persistence applications #javaone 2012
DEVELOPING POLYGLOT PERSISTENCE APPLICATIONS
Chris Richardson
Author of POJOs in Action
Founder of the original CloudFoundry.com
@crichardson [email protected] http://plainoldobjects.com/
Presentation goal
The benefits and drawbacks of polyglot persistence
and
How to design applications that use this approach
About Chris
(About Chris)
About Chris()
http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/
vmc push About-Chris
Developer Advocate for CloudFoundry.com
Signup at http://cloudfoundry.com promo code: cfjavaone
Agenda
• Why polyglot persistence?
• Using Redis as a cache
• Optimizing queries using Redis materialized views
• Synchronizing MySQL and Redis
• Tracking changes to entities
• Using a modular asynchronous architecture
Food to Go
• Take-out food delivery service
• “Launched” in 2006
Food To Go Architecture
Order taking
Restaurant Management
MySQL Database
CONSUMER RESTAURANT OWNER
Success ⇒ Growth challenges
• Increasing traffic
• Increasing data volume
• Distribute across a few data centers
• Increasing domain model complexity
Limitations of relational databases
• Scalability
• Distribution
• Schema updates
• O/R impedance mismatch
• Handling semi-structured data
Solution: Spend Money
http://upload.wikimedia.org/wikipedia/commons/e/e5/Rising_Sun_Yacht.JPG
OR
http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_5_series/madone_5_2/#
Solution: Use NoSQL
Benefits
• Higher performance
• Higher scalability
• Richer data-model
• Schema-less
Drawbacks
• Limited transactions
• Limited querying
• Relaxed consistency
• Unconstrained data
Example NoSQL Databases
Database Key features
Cassandra Extensible column store, very scalable, distributed
Neo4j Graph database
MongoDB Document-oriented, fast, scalable
Redis Key-value store, very fast
http://nosql-database.org/ lists 122+ NoSQL databases
Redis
• Advanced key-value store
• Very fast, e.g. 100K reqs/sec
• Optional persistence
• Transactions with optimistic locking
• Master-slave replication
• Sharding using client-side consistent hashing
K1 V1
K2 V2
... ...
Sorted sets
Key: myset
Value  Score
a      5.0
b      10.0
Members are sorted by score
Adding members to a sorted set
zadd myset 5.0 a (arguments: key, score, value)
Redis Server: myset = { a: 5.0 }
zadd myset 10.0 b
Redis Server: myset = { a: 5.0, b: 10.0 }
zadd myset 1.0 c
Redis Server: myset = { a: 5.0, b: 10.0, c: 1.0 }
Retrieving members by index range
zrange myset 0 1 (arguments: key, start index, end index)
Redis Server: myset = { a: 5.0, b: 10.0, c: 1.0 }
Result: c, a (members in ascending score order)
Retrieving members by score
zrangebyscore myset 1 6 (arguments: key, min value, max value)
Redis Server: myset = { a: 5.0, b: 10.0, c: 1.0 }
Result: c, a (members whose score is between 1 and 6)
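The sorted set commands above can be mimicked with a small in-memory class. This is only a sketch of the semantics, not a Redis client: the class name SortedSetSketch and its methods are made up for illustration, and it supports only non-negative indexes and inclusive score bounds.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** In-memory stand-in for one Redis sorted set (hypothetical, for illustration only). */
class SortedSetSketch {
    private static final class Entry {
        final String member;
        final double score;
        Entry(String member, double score) { this.member = member; this.score = score; }
    }

    // Entries ordered by score; ties are broken by member name.
    private final TreeSet<Entry> byScore = new TreeSet<>(
            Comparator.comparingDouble((Entry e) -> e.score).thenComparing(e -> e.member));
    // Current score of each member, needed to replace an existing entry.
    private final Map<String, Double> scores = new HashMap<>();

    /** Like ZADD: insert the member, or move it if its score changed. */
    void zadd(double score, String member) {
        Double old = scores.put(member, score);
        if (old != null) {
            byScore.remove(new Entry(member, old));
        }
        byScore.add(new Entry(member, score));
    }

    /** Like ZRANGE with non-negative, inclusive start and end indexes. */
    List<String> zrange(int start, int end) {
        List<String> result = new ArrayList<>();
        int i = 0;
        for (Entry e : byScore) {
            if (i > end) break;
            if (i >= start) result.add(e.member);
            i++;
        }
        return result;
    }

    /** Like ZRANGEBYSCORE with inclusive min and max. */
    List<String> zrangeByScore(double min, double max) {
        List<String> result = new ArrayList<>();
        for (Entry e : byScore) {
            if (e.score > max) break;
            if (e.score >= min) result.add(e.member);
        }
        return result;
    }
}
```

Adding a (5.0), b (10.0) and c (1.0) and then asking for index range 0 to 1, or score range 1 to 6, yields c followed by a in both cases, because members are kept in ascending score order.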
Redis use cases
• Replacement for Memcached
• Session state
• Cache of data retrieved from system of record (SOR)
• Replica of SOR for queries needing high-performance
• Handling tasks that overload an RDBMS
• Hit counts - INCR
• Most recent N items - LPUSH and LTRIM
• Randomly selecting an item – SRANDMEMBER
• Queuing – Lists with LPOP, RPUSH, ….
• High score tables – Sorted sets and ZINCRBY
• …
Redis is great but there are tradeoffs
• Low-level query language: PK-based access only
• Limited transaction model:
• Read first and then execute updates as batch
• Difficult to compose code
• Data must fit in memory
• Single-threaded server : run multiple with client-side sharding
• Missing features such as access control, ...
And don’t forget:
An RDBMS is fine for many applications
The future is polyglot
IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg
e.g. Netflix
• RDBMS
• SimpleDB
• Cassandra
• Hadoop/HBase
Agenda
• Why polyglot persistence?
• Using Redis as a cache
• Optimizing queries using Redis materialized views
• Synchronizing MySQL and Redis
• Tracking changes to entities
• Using a modular asynchronous architecture
Increase scalability by caching
Order taking
Restaurant Management
MySQL Database
CONSUMER RESTAURANT OWNER
Cache
Caching Options
• Where:
• Hibernate 2nd level cache
• Explicit calls from application code
• Caching aspect
• Cache technologies: Ehcache, Memcached, Infinispan, ...
Redis is also an option
Using Redis as a cache
• Spring 3.1 cache abstraction
• Annotations specify which methods to cache
• CacheManager - pluggable back-end cache
• Spring Data for Redis
• Simplifies the development of Redis applications
• Provides RedisTemplate (analogous to JdbcTemplate)
• Provides RedisCacheManager
Using Spring 3.1 Caching
@Service
public class RestaurantManagementServiceImpl implements RestaurantManagementService {

    private final RestaurantRepository restaurantRepository;

    @Autowired
    public RestaurantManagementServiceImpl(RestaurantRepository restaurantRepository) {
        this.restaurantRepository = restaurantRepository;
    }

    @Override
    public void add(Restaurant restaurant) {
        restaurantRepository.add(restaurant);
    }

    // Cache result
    @Override
    @Cacheable(value = "Restaurant")
    public Restaurant findById(int id) {
        return restaurantRepository.findRestaurant(id);
    }

    // Evict from cache
    @Override
    @CacheEvict(value = "Restaurant", key = "#restaurant.id")
    public void update(Restaurant restaurant) {
        restaurantRepository.update(restaurant);
    }
}
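What "cache result" and "evict from cache" amount to can be sketched in plain Java. This is an illustration of the cache-aside pattern only, not how Spring implements its annotations; CacheAsideSketch, its loader function and the loads counter are hypothetical names, and a ConcurrentHashMap stands in for Redis.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Cache-aside sketch (hypothetical): look in the cache first, fall back to the loader. */
class CacheAsideSketch<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>(); // stand-in for Redis
    private final Function<K, V> loader;                       // stand-in for the repository
    int loads = 0;                                             // counts calls that reached the loader

    CacheAsideSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Roughly what a cacheable finder does: return the cached value or load and store it. */
    V find(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        loads++;
        V loaded = loader.apply(key);
        if (loaded != null) {
            cache.put(key, loaded);
        }
        return loaded;
    }

    /** Roughly what an evicting update does: drop the entry so the next find reloads it. */
    void evict(K key) {
        cache.remove(key);
    }
}
```

Two consecutive find calls for the same key hit the loader once; after evict, the next find reloads and sees the updated value.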
Configuring the Redis Cache Manager
<!-- Enables caching -->
<cache:annotation-driven />

<!-- Specifies the CacheManager implementation -->
<bean id="cacheManager" class="org.springframework.data.redis.cache.RedisCacheManager">
    <!-- The RedisTemplate used to access Redis -->
    <constructor-arg ref="restaurantTemplate"/>
</bean>
Domain object to key-value mapping?
Restaurant
TimeRange MenuItem
ServiceArea
K1 V1
K2 V2
... ...
RedisTemplate
• Analogous to JdbcTemplate
• Encapsulates boilerplate code, e.g. connection management
• Maps Java objects ⇔ Redis byte[]’s
Serializers: object ⇔ byte[]
• RedisTemplate has multiple serializers
• DefaultSerializer - defaults to JdkSerializationRedisSerializer
• KeySerializer
• ValueSerializer
• HashKeySerializer
• HashValueSerializer
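The object ⇔ byte[] mapping can be illustrated with standard JDK serialization, which is the mechanism the default serializer's name refers to. The sketch below is an assumption-laden stand-in, not Spring Data Redis code: JdkSerializerSketch and its two static methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Sketch of an object <-> byte[] serializer built on JDK serialization (hypothetical). */
class JdkSerializerSketch {

    /** Turn a Serializable object into the bytes that would be stored as a Redis value. */
    static byte[] serialize(Serializable value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    /** Rebuild the object from bytes read back from Redis. */
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }
}
```

JDK serialization produces bytes that only Java can read, which is one reason to plug in a JSON value serializer instead.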
Serializing a Restaurant as JSON
@Configuration
public class RestaurantManagementRedisConfiguration {

    @Autowired
    private RestaurantObjectMapperFactory restaurantObjectMapperFactory;

    // Serialize restaurants using Jackson JSON
    private JacksonJsonRedisSerializer<Restaurant> makeRestaurantJsonSerializer() {
        JacksonJsonRedisSerializer<Restaurant> serializer =
                new JacksonJsonRedisSerializer<Restaurant>(Restaurant.class);
        ...
        return serializer;
    }

    @Bean
    @Qualifier("Restaurant")
    public RedisTemplate<String, Restaurant> restaurantTemplate(RedisConnectionFactory factory) {
        RedisTemplate<String, Restaurant> template = new RedisTemplate<String, Restaurant>();
        template.setConnectionFactory(factory);
        JacksonJsonRedisSerializer<Restaurant> jsonSerializer = makeRestaurantJsonSerializer();
        template.setValueSerializer(jsonSerializer);
        return template;
    }
}
Caching with Redis
Order taking
Restaurant Management
MySQL Database
CONSUMER RESTAURANT OWNER
Redis Cache
First Second
Agenda
• Why polyglot persistence?
• Using Redis as a cache
• Optimizing queries using Redis materialized views
• Synchronizing MySQL and Redis
• Tracking changes to entities
• Using a modular asynchronous architecture
Finding available restaurants
Available restaurants =
    Serve the zip code of the delivery address
    AND Are open at the delivery time

public interface AvailableRestaurantRepository {
    List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime);
    ...
}
Food to Go – Domain model (partial)
class Restaurant {
    long id;
    String name;
    Set<String> serviceArea;
    Set<TimeRange> openingHours;
    List<MenuItem> menuItems;
}

class MenuItem {
    String name;
    double price;
}

class TimeRange {
    long id;
    int dayOfWeek;
    int openTime;
    int closeTime;
}
Database schema

RESTAURANT table
ID  Name  …
1   Ajanta
2   Montclair Eggshop

RESTAURANT_ZIPCODE table
Restaurant_id  zipcode
1              94707
1              94619
2              94611
2              94619

RESTAURANT_TIME_RANGE table
Restaurant_id  dayOfWeek  openTime  closeTime
1              Monday     1130      1430
1              Monday     1730      2130
2              Tuesday    1130      …
Finding available restaurants on Monday, 6.15pm for 94619 zipcode
Straightforward three-way join
select r.*
from restaurant r
  inner join restaurant_time_range tr on r.id = tr.restaurant_id
  inner join restaurant_zipcode sa on r.id = sa.restaurant_id
where '94619' = sa.zip_code
  and tr.day_of_week = 'monday'
  and tr.openingtime <= 1815
  and 1815 <= tr.closingtime
How to scale queries?
Option #1: Query caching
• [ZipCode, DeliveryTime] ⇨ list of available restaurants
BUT
• Long tail queries
• Update restaurant ⇨ Flush entire cache
Ineffective
Option #2: Master/Slave replication
MySQL Master
MySQL Slave 1
MySQL Slave 2
MySQL Slave N
Master: writes, consistent reads
Slaves: queries (inconsistent reads)
Master/Slave replication
• Mostly straightforward
BUT
• Assumes that SQL query is efficient
• Complexity of administration of slaves
• Doesn’t scale writes
Option #3: Redis materialized views
Order taking
Restaurant Management
MySQL Database
CONSUMER RESTAURANT OWNER
Cache
Redis
System of Record (MySQL)
Copy (Redis)
findAvailable()
update()
BUT how to implement findAvailableRestaurants() with Redis?!
select r.*
from restaurant r
  inner join restaurant_time_range tr on r.id = tr.restaurant_id
  inner join restaurant_zipcode sa on r.id = sa.restaurant_id
where '94619' = sa.zip_code
  and tr.day_of_week = 'monday'
  and tr.openingtime <= 1815
  and 1815 <= tr.closingtime
?
K1 V1
K2 V2
... ...
Where we need to be
ZRANGEBYSCORE myset 1 6
=
select value, score
from sorted_set
where key = 'myset'
  and score >= 1
  and score <= 6
sorted_set columns: key, value, score
We need to denormalize
Think materialized view
Simplification #1: Denormalization
Restaurant_id Day_of_week Open_time Close_time Zip_code
1 Monday 1130 1430 94707
1 Monday 1130 1430 94619
1 Monday 1730 2130 94707
1 Monday 1730 2130 94619
2 Monday 0700 1430 94619
…
SELECT restaurant_id
FROM time_range_zip_code
WHERE day_of_week = 'Monday'
  AND zip_code = 94619
  AND 1815 < close_time
  AND open_time < 1815
Simpler query: no joins, two = and two <
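The denormalization step can be sketched in Java: every (time range, zip code) pair of a restaurant becomes one flat row, and the query becomes a join-free filter. Class and method names here are hypothetical, and the sketch treats the opening and closing times as inclusive, matching the <= comparisons of the original three-way join.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of denormalizing opening hours x service area into flat rows (hypothetical names). */
class DenormalizationSketch {
    static final class TimeRange {
        final String dayOfWeek;
        final int openTime;
        final int closeTime;
        TimeRange(String dayOfWeek, int openTime, int closeTime) {
            this.dayOfWeek = dayOfWeek;
            this.openTime = openTime;
            this.closeTime = closeTime;
        }
    }

    /** One row of the denormalized time_range_zip_code table. */
    static final class Row {
        final long restaurantId;
        final String dayOfWeek;
        final int openTime;
        final int closeTime;
        final String zipCode;
        Row(long restaurantId, TimeRange tr, String zipCode) {
            this.restaurantId = restaurantId;
            this.dayOfWeek = tr.dayOfWeek;
            this.openTime = tr.openTime;
            this.closeTime = tr.closeTime;
            this.zipCode = zipCode;
        }
    }

    /** Cross product of a restaurant's time ranges and zip codes. */
    static List<Row> denormalize(long restaurantId, List<TimeRange> hours, List<String> zipCodes) {
        List<Row> rows = new ArrayList<>();
        for (TimeRange tr : hours) {
            for (String zip : zipCodes) {
                rows.add(new Row(restaurantId, tr, zip));
            }
        }
        return rows;
    }

    /** The join-free query: two equality tests and two range tests per row. */
    static List<Long> findAvailable(List<Row> rows, String day, String zip, int time) {
        List<Long> ids = new ArrayList<>();
        for (Row r : rows) {
            if (r.dayOfWeek.equals(day) && r.zipCode.equals(zip)
                    && r.openTime <= time && time <= r.closeTime) {
                ids.add(r.restaurantId);
            }
        }
        return ids;
    }
}
```

For restaurant 1 with two Monday time ranges and two zip codes this produces four rows, and a query for Monday, 94619 at 1815 matches only the 1730 to 2130 row.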
Simplification #2: Application filtering
SELECT restaurant_id, open_time
FROM time_range_zip_code
WHERE day_of_week = 'Monday'
  AND zip_code = 94619
  AND 1815 < close_time
The application compares open_time with 1815 itself
Even simpler query
• No joins
• Two = and one <
Simplification #3: Eliminate multiple =’s with concatenation
SELECT restaurant_id, open_time
FROM time_range_zip_code
WHERE zip_code_day_of_week = '94619:Monday'
  AND 1815 < close_time

Restaurant_id  Zip_dow       Open_time  Close_time
1              94707:Monday  1130       1430
1              94619:Monday  1130       1430
1              94707:Monday  1730       2130
1              94619:Monday  1730       2130
2              94619:Monday  0700       1430
…
Zip_dow is the key, Close_time is the range
Simplification #4: Eliminate multiple RETURN VALUES with concatenation
SELECT open_time_restaurant_id
FROM time_range_zip_code
WHERE zip_code_day_of_week = '94619:Monday'
  AND 1815 < close_time
zip_dow open_time_restaurant_id close_time
94707:Monday 1130_1 1430
94619:Monday 1130_1 1430
94707:Monday 1730_1 2130
94619:Monday 1730_1 2130
94619:Monday 0700_2 1430
...
✔
Using a Redis sorted set as an index
Key           Sorted Set [ Entry:Score, … ]
94619:Monday  [0700_2:1430, 1130_1:1430, 1730_1:2130]
94707:Monday  [1130_1:1430, 1730_1:2130]
Example: the row (94619:Monday, 0700_2, 1430) becomes entry 0700_2 with score 1430 under key 94619:Monday
Querying with ZRANGEBYSCORE
ZRANGEBYSCORE 94619:Monday 1815 2359
Key: delivery zip and day
Min score: delivery time
Result: {1730_1}
1730 is before 1815 ⇒ Ajanta is open
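The index and the query can be tried out without a Redis server. The sketch below is an in-memory stand-in under stated assumptions: AvailabilityIndexSketch and its methods are hypothetical, a TreeMap keyed by closing time plays the role of the sorted set, times are HHMM integers no greater than 2359, and members use the open_time_restaurant_id format shown above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory sketch of the "zip:day" -> sorted set availability index (hypothetical names). */
class AvailabilityIndexSketch {
    // key "zip:day" -> closing time (the score) -> members "openTime_restaurantId"
    private final Map<String, TreeMap<Integer, TreeSet<String>>> index = new HashMap<>();

    /** Like ZADD key closeTime openTime_restaurantId. */
    void addTimeRange(String zip, String day, int openTime, int closeTime, long restaurantId) {
        String member = String.format("%04d_%d", openTime, restaurantId);
        index.computeIfAbsent(zip + ":" + day, k -> new TreeMap<>())
             .computeIfAbsent(closeTime, k -> new TreeSet<>())
             .add(member);
    }

    /** Like ZRANGEBYSCORE key time 2359, followed by filtering on opening time in the application. */
    List<String> findAvailable(String zip, String day, int time) {
        List<String> restaurantIds = new ArrayList<>();
        TreeMap<Integer, TreeSet<String>> byClosingTime = index.get(zip + ":" + day);
        if (byClosingTime == null) {
            return restaurantIds;
        }
        // Time ranges that close at or after the delivery time.
        for (TreeSet<String> members : byClosingTime.subMap(time, true, 2359, true).values()) {
            for (String member : members) {
                String[] parts = member.split("_");
                // Keep only those that are already open at the delivery time.
                if (Integer.parseInt(parts[0]) <= time) {
                    restaurantIds.add(parts[1]);
                }
            }
        }
        return restaurantIds;
    }
}
```

Loaded with the five example rows, a lookup for 94619 on Monday at 1815 returns restaurant 1 only, while a lookup at 1200 returns restaurants 2 and 1.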
Adding a Restaurant
@Component
public class AvailableRestaurantRepositoryImpl implements AvailableRestaurantRepository {

    @Override
    public void add(Restaurant restaurant) {
        addRestaurantDetails(restaurant);
        addAvailabilityIndexEntries(restaurant);
    }

    // Store as JSON
    private void addRestaurantDetails(Restaurant restaurant) {
        restaurantTemplate.opsForValue().set(keyFormatter.key(restaurant.getId()), restaurant);
    }

    private void addAvailabilityIndexEntries(Restaurant restaurant) {
        for (TimeRange tr : restaurant.getOpeningHours()) {
            String indexValue = formatTrId(restaurant, tr);
            int dayOfWeek = tr.getDayOfWeek();
            int closingTime = tr.getClosingTime();
            for (String zipCode : restaurant.getServiceArea()) {
                redisTemplate.opsForZSet().add(
                        closingTimesKey(zipCode, dayOfWeek), // key
                        indexValue,                          // member
                        closingTime);                        // score
            }
        }
    }
}
Finding available Restaurants
@Component
public class AvailableRestaurantRepositoryImpl implements AvailableRestaurantRepository {

    @Override
    public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) {
        String zipCode = deliveryAddress.getZip();
        int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);
        int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
        String closingTimesKey = closingTimesKey(zipCode, dayOfWeek);

        // Find those that close after
        Set<String> trsClosingAfter =
                redisTemplate.opsForZSet().rangeByScore(closingTimesKey, timeOfDay, 2359);

        // Filter out those that open after
        Set<String> restaurantIds = new HashSet<String>();
        for (String tr : trsClosingAfter) {
            String[] values = tr.split("_");
            if (Integer.parseInt(values[0]) <= timeOfDay)
                restaurantIds.add(values[1]);
        }

        // Retrieve open restaurants
        Collection<String> keys = keyFormatter.keys(restaurantIds);
        return availableRestaurantTemplate.opsForValue().multiGet(keys);
    }
}
Sorry Ted!
http://en.wikipedia.org/wiki/Edgar_F._Codd
Agenda
• Why polyglot persistence?
• Using Redis as a cache
• Optimizing queries using Redis materialized views
• Synchronizing MySQL and Redis
• Tracking changes to entities
• Using a modular asynchronous architecture
MySQL & Redis need to be consistent
Two-Phase commit is not an option
• Redis does not support it
• Even if it did, 2PC is best avoided http://www.infoq.com/articles/ebay-scalability-best-practices
ACID: Atomic, Consistent, Isolated, Durable
BASE: Basically Available, Soft state, Eventually consistent
BASE: An Acid Alternative http://queue.acm.org/detail.cfm?id=1394128
Updating Redis #FAIL
begin MySQL transaction
  update MySQL
  update Redis
rollback MySQL transaction
Result: Redis has update, MySQL does not

begin MySQL transaction
  update MySQL
commit MySQL transaction
<<system crashes>>
update Redis
Result: MySQL has update, Redis does not
Updating Redis reliably
Step 1 of 2
begin MySQL transaction
  update MySQL
  queue CRUD event in MySQL
commit transaction
The transaction is ACID
CRUD event contents:
• Event Id
• Operation: Create, Update, Delete
• New entity state, e.g. JSON
Updating Redis reliably
Step 2 of 2
for each CRUD event in MySQL queue
  get next CRUD event from MySQL queue
  if CRUD event is not duplicate then
    update Redis (incl. eventId)
  end if
  begin MySQL transaction
    mark CRUD event as processed
  commit transaction
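The two steps can be sketched with in-memory collections standing in for the MySQL tables. Everything named here (CrudEventQueueSketch, its fields and methods) is hypothetical; a synchronized method plays the role of a MySQL transaction, and the Redis update is passed in as a callback.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Sketch of queuing CRUD events with the entity update and processing them later (hypothetical). */
class CrudEventQueueSketch {
    static final class CrudEvent {
        final long eventId;
        final String operation;   // Create, Update or Delete
        final long entityId;
        final String entityJson;  // new entity state
        boolean processed = false;
        CrudEvent(long eventId, String operation, long entityId, String entityJson) {
            this.eventId = eventId;
            this.operation = operation;
            this.entityId = entityId;
            this.entityJson = entityJson;
        }
    }

    private final Map<Long, String> entityTable = new HashMap<>(); // stand-in for the entity table
    private final List<CrudEvent> eventTable = new ArrayList<>();  // stand-in for ENTITY_CRUD_EVENT
    private long nextEventId = 1;

    /** Step 1: update the entity and queue the event in one "transaction". */
    synchronized void updateEntity(long entityId, String entityJson) {
        entityTable.put(entityId, entityJson);
        eventTable.add(new CrudEvent(nextEventId++, "UPDATE", entityId, entityJson));
    }

    /** Step 2: apply each unprocessed event to Redis, then mark it as processed. */
    synchronized int processPending(Consumer<CrudEvent> redisUpdater) {
        int applied = 0;
        for (CrudEvent event : eventTable) {
            if (event.processed) {
                continue;
            }
            // A crash after this call but before the flag is set causes a redelivery,
            // which is why the Redis side needs duplicate detection.
            redisUpdater.accept(event);
            event.processed = true;
            applied++;
        }
        return applied;
    }
}
```

Because the event is queued in the same transaction as the entity update, an event exists exactly when the update committed; the processor can then deliver it at least once.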
ID JSON processed?
ENTITY_CRUD_EVENT
EntityCrudEventProcessor
RedisUpdater
apply(event)
Step 1 Step 2
EntityCrudEventRepository
INSERT INTO ...
Timer
SELECT ... FROM ...
Redis
Updating Redis
Optimistic locking:
  WATCH restaurant:lastSeenEventId:≪restaurantId≫
Duplicate detection:
  lastSeenEventId = GET restaurant:lastSeenEventId:≪restaurantId≫
  if (lastSeenEventId >= eventId) return;
Transaction:
  MULTI
  SET restaurant:lastSeenEventId:≪restaurantId≫ eventId
  ... update the restaurant data...
  EXEC
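The duplicate check plus optimistic locking can be imitated in plain Java with a compare-and-set on a ConcurrentHashMap. This is an analogy rather than Redis code: IdempotentUpdaterSketch and its State class are hypothetical, reading the current state corresponds to WATCH and GET, and the conditional replace corresponds to MULTI ... EXEC failing if the watched key changed.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of idempotent, optimistically locked updates keyed by lastSeenEventId (hypothetical). */
class IdempotentUpdaterSketch {
    /** Immutable snapshot of what is stored for one restaurant. */
    static final class State {
        final long lastSeenEventId;
        final String data;
        State(long lastSeenEventId, String data) {
            this.lastSeenEventId = lastSeenEventId;
            this.data = data;
        }
    }

    private final ConcurrentHashMap<Long, State> store = new ConcurrentHashMap<>();

    /** Returns true if the event was applied, false if it was a duplicate or out of date. */
    boolean apply(long restaurantId, long eventId, String data) {
        while (true) {
            State current = store.get(restaurantId);          // like WATCH followed by GET
            if (current != null && current.lastSeenEventId >= eventId) {
                return false;                                  // duplicate detection
            }
            State next = new State(eventId, data);
            boolean swapped = (current == null)
                    ? store.putIfAbsent(restaurantId, next) == null
                    : store.replace(restaurantId, current, next); // like MULTI ... EXEC
            if (swapped) {
                return true;
            }
            // Another writer changed the entry in between: re-read and retry.
        }
    }

    String dataFor(long restaurantId) {
        State state = store.get(restaurantId);
        return state == null ? null : state.data;
    }
}
```

Applying event 10 twice changes the data once; a later event 11 is applied, while a stale event 9 is ignored.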
Agenda
• Why polyglot persistence?
• Using Redis as a cache
• Optimizing queries using Redis materialized views
• Synchronizing MySQL and Redis
• Tracking changes to entities
• Using a modular asynchronous architecture
How do we generate CRUD events?
Change tracking options
• Explicit code
• Hibernate event listener
• Service-layer aspect
• CQRS/Event-sourcing
ID JSON processed?
ENTITY_CRUD_EVENT
EntityCrudEventRepository
HibernateEventListener
Hibernate event listener
public class ChangeTrackingListener
        implements PostInsertEventListener, PostDeleteEventListener, PostUpdateEventListener {

    @Autowired
    private EntityCrudEventRepository entityCrudEventRepository;

    private void maybeTrackChange(Object entity, EntityCrudEventType eventType) {
        if (isTrackedEntity(entity)) {
            entityCrudEventRepository.add(new EntityCrudEvent(eventType, entity));
        }
    }

    @Override
    public void onPostInsert(PostInsertEvent event) {
        Object entity = event.getEntity();
        maybeTrackChange(entity, EntityCrudEventType.CREATE);
    }

    @Override
    public void onPostUpdate(PostUpdateEvent event) {
        Object entity = event.getEntity();
        maybeTrackChange(entity, EntityCrudEventType.UPDATE);
    }

    @Override
    public void onPostDelete(PostDeleteEvent event) {
        Object entity = event.getEntity();
        maybeTrackChange(entity, EntityCrudEventType.DELETE);
    }
}
Agenda
• Why polyglot persistence?
• Using Redis as a cache
• Optimizing queries using Redis materialized views
• Synchronizing MySQL and Redis
• Tracking changes to entities
• Using a modular asynchronous architecture
Original architecture
WAR
RestaurantManagement
...
Drawbacks of this monolithic architecture
• Obstacle to frequent deployments
• Overloads IDE and web container
• Obstacle to scaling development
• Technology lock-in
WAR
RestaurantManagement
...
Need a more modular architecture
Using a message broker
Asynchronous is preferred
JSON is fashionable but binary format is more efficient
Modular architecture
Order taking
Restaurant Management
MySQL Database
Redis Cache
Redis
RabbitMQ
CONSUMER RESTAURANT OWNER
Event Publisher
Timer
Benefits of a modular asynchronous architecture
• Scales development: develop, deploy and scale each service independently
• Redeploy UI frequently/independently
• Improves fault isolation
• Eliminates long-term commitment to a single technology stack
• Message broker decouples producers and consumers
Step 2 of 2
for each CRUD event in MySQL queue
  get next CRUD event from MySQL queue
  publish persistent message to RabbitMQ
  begin MySQL transaction
    mark CRUD event as processed
  commit transaction
Message flow
EntityCrudEventProcessor
RedisUpdater
RABBITMQ
AvailableRestaurantManagementService
REDIS
Spring Integration glue code
RedisUpdater ⇒ AMQP
<beans>
    <!-- Creates proxy -->
    <int:gateway id="redisUpdaterGateway"
                 service-interface="net...RedisUpdater"
                 default-request-channel="eventChannel" />

    <int:channel id="eventChannel"/>

    <int:object-to-json-transformer input-channel="eventChannel" output-channel="amqpOut"/>

    <int:channel id="amqpOut"/>

    <amqp:outbound-channel-adapter channel="amqpOut"
                                   amqp-template="rabbitTemplate"
                                   routing-key="crudEvents"
                                   exchange-name="crudEvents" />
</beans>
AMQP ⇒ Available...Service
<beans>
    <amqp:inbound-channel-adapter channel="inboundJsonEventsChannel"
                                  connection-factory="rabbitConnectionFactory"
                                  queue-names="crudEvents"/>

    <int:channel id="inboundJsonEventsChannel"/>

    <int:json-to-object-transformer input-channel="inboundJsonEventsChannel"
                                    type="net.chrisrichardson.foodToGo.common.JsonEntityCrudEvent"
                                    output-channel="inboundEventsChannel"/>

    <int:channel id="inboundEventsChannel"/>

    <!-- Invokes service -->
    <int:service-activator input-channel="inboundEventsChannel"
                           ref="availableRestaurantManagementServiceImpl"
                           method="processEvent"/>
</beans>
Summary
• Each SQL/NoSQL database = set of tradeoffs
• Polyglot persistence: leverage the strengths of SQL and NoSQL databases
• Use Redis as a distributed cache
• Store denormalized data in Redis for fast querying
• Reliable database synchronization required
Questions?
@crichardson [email protected] http://slideshare.net/chris.e.richardson/
Sign up for CloudFoundry.com using promo code cfjavaone