NoSQL matters in Catchoom Recognition Service


description

At NoSQLMatters Barcelona (6 Oct 2012), David Arcos from Catchoom presented how the Catchoom Recognition Service (a SaaS platform for visual recognition) was implemented using Redis and various deployment tools. David argues that NoSQL is necessary for critical components of the service.

Transcript of NoSQL matters in Catchoom Recognition Service

1. NoSQL matters in Catchoom Recognition Service
David Arcos [email protected] | @DZPM | catchoom.com | @catchoom

2. 1) Introduction 2) What did we need? 3) How we built it 4) Advantages of NoSQL 5) Cool uses of NoSQL 6) Limits 7) Conclusion

3. Hi! I'm David Arcos
- Python/Django developer (>4 yr)
- Web backend, distributed systems, databases, scalability, security
- Team leader at Catchoom
- You can follow me at @DZPM

4. Catchoom technology recognizes an object by searching through a large collection of images in a fraction of a second. Catchoom targets application developers and integrators.

5. Our customers are leaders in Augmented Reality

6. Visual Recognition: Identify an object in front of the camera by comparing it to a huge collection of reference images

7. Examples of recognized objects:
- CD/DVD and book covers
- Newspapers and magazines
- Logos and brands
- Posters
- Packaged goods
- Monuments and places

8. Catchoom Recognition Service:
- Cloud-based Visual Recognition (SaaS)
- RESTful API to integrate
- Add VR features to your app/platform

9. Small team of 4 developers, doing Scrum

10. 1) Introduction 2) What did we need? 3) How we built it 4) Advantages of NoSQL 5) Cool uses of NoSQL 6) Limits 7) Conclusion

11. Minimum requirements:
- a public API for the final users to perform Visual Recognition
- a private API for the customer to manage the Collections and get statistics
- a nice website for the customer, providing the functionality of both APIs

12. Must be flexible:
- A customer who does Augmented Reality, and needs a 3D model (binary format) in the item
- Another one who needs just the item id
- Our data model needs to allow everything (structured and unstructured data)

13. Must be reliable:
- Images or data should never be lost
- Avoid single points of failure
- We need redundancy

14. Must be very fast:
"Layar has been using Catchoom's Visual Search technology since the launch of Layar Vision, allowing users to quickly view the AR content placed on top of images by just pointing their camera to the image. We've benchmarked Catchoom's technology in 2011 against 3 of their main competitors and found they had the best results both on speed and on successful matches (including lowest false positives)."
Dirk Groten, CTO of Layar

15. 1) Introduction 2) What did we need? 3) How we built it 4) Advantages of NoSQL 5) Cool uses of NoSQL 6) Limits 7) Conclusion

16. Technology stack:
- Development: Python, Django, Tornado, Gevent
- Deployed using: Supervisord, Nginx, gunicorn, Fabric
- AWS: EC2, S3, ELB

17. The Panel:
- typical customer portal:
  - manage your Collections, run Visual Recognition
  - get usage statistics
  - and configure the payment method :)

18. (no text on slide)

19. (no text on slide)

20. Mobile apps:
- for Android, iOS
- use the Visual Recognition API
- the code will be published

21. Data models:
- Collection: a set of items. Has at least one token.
- Item: has at least one Image. Has metadata.
- Image: you want several images if the item has different sides, logos, flavours...
- Token: for authenticating the requests.

22. Components:
- the platform is highly modular
- "Do one thing, and do it well"
- they pass JSON messages
- optimized hardware settings

23.
- Frontend: gets the API request
- Extractor: extracts the visual points
- Collector: message exchange
- Searcher: looks for matches

24. Required NoSQL features:
- key-value storage
- cache
- message lists
- message pub/sub
- real-time analysis
What servers have we chosen?

25. Required NoSQL features:
- key-value storage
- cache
- message lists
- message pub/sub
- real-time analysis

26. Required NoSQL features:
- key-value storage
- cache
- message lists
- message pub/sub
- real-time analysis
- and Filesystem:

27. 1) Introduction 2) What did we need? 3) How we built it 4) Advantages of NoSQL 5) Cool uses of NoSQL 6) Limits 7) Conclusion

28. Performance:
- Can't afford writing to disk, or querying slow databases
- Using Redis, everything stays in memory
- One V.R. query takes just 300 ms

29. Scalability:
- Need to scale different components, separately
- Load balancing using Redis Lists:
  BLPOP: Remove and get the first element in a list, or block until one is available
- But focus on the bottlenecks!

30. Unstructured data: query
- A query object has many optional parameters
  - each component can add/remove fields dynamically
  - schema change between versions
- Can't fit in a SQL table
- We model the query in Redis as JSON

31. Unstructured data: metadata
- Metadata is optional and unstructured; it can be anything from JSON to a binary blob
- Can't fit in a SQL table, and would be too slow
- Serve the data from Redis, and use S3 as a backup
- Warning: in the future, if we have huge metadata files, Redis will run out of memory. We'll improve this approach

32. Availability:
- Avoid single points of failure. Replicate everything!
- Replicating a SQL server is painful
- Redis instances configured as Master/Slave
- When the master dies:
  - promote a slave to be the new master
  - reconfigure the other slaves to use this new master
- Redis Sentinel does this (beta)

33. 1) Introduction 2) What did we need? 3) How we built it 4) Advantages of NoSQL 5) Cool uses of NoSQL 6) Limits 7) Conclusion

34. Do real-time calculations:
- Usage statistics
  - total, monthly, daily, hourly
  - per image, item or collection
- Metric monitoring for internal use
  - response times, queue size, etc.
- QoS: enforce rate limiting
  - max hits per minute

35. Sorted Sets:
- To create indexes and filters
- For example, "Most recognized images" (sorted by hits)
- Updating the Sorted Set, no need to reconsolidate:
  ZADD: Add one or more members to a sorted set, or update its score if it already exists

36. Cache:
- Redis can be used as a cache in place of memcached
- Cache everything:
  - Sessions, metadata, etc.
- ...although the website is internal: no bottleneck here
- Better focus on optimizing other stuff!

37. Volatile data:
- Redis can set an expiration time for a value
- Very easy for:
  - implementing timeouts
  - removing old queries
  - adding temporary capping

38. Messages:
- Redis implements pub/sub and lists.
- Publish/Subscribe to a channel
  - all components get the message
  - use it for monitoring
- List: push/pop messages
  - only one component gets the message
  - use the blocking versions for load balancing

39. 1) Introduction 2) What did we need? 3) How we built it 4) Advantages of NoSQL 5) Cool uses of NoSQL 6) Limits 7) Conclusion

40. Django apps compatibility:
- we use Django and several contrib and external apps.
  - ("Standing on the shoulders of giants")
- but no support for NoSQL in Django ORM
- dropping SQL is not an option!
- we use MySQL. South migrations.

41. 1) Introduction 2) What did we need? 3) How we built it 4) Advantages of NoSQL 5) Cool uses of NoSQL 6) Limits 7) Conclusion

42. Summary:
- We use a combination of SQL and NoSQL
- Using NoSQL was necessary to meet the requirements
- There are a lot of different uses for NoSQL

43. Recommendations:
- There is no silver bullet
- Use the best tool for each task
- But avoid unneeded complexity!
- Try Redis. Don't do a migration, just add it to your stack

44. Thanks for attending!
- Our beta will be ready soon. Get a free trial at http://catchoom.com
- Contact me at [email protected]
Questions?
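
Slides 29 and 38 describe load balancing with Redis lists: producers push JSON messages and each worker does a blocking pop, so every message reaches exactly one worker. The following is a minimal sketch of that pattern, not Catchoom's actual code. The queue name `queue:extractor`, the message fields, and the helpers `enqueue` and `work_once` are hypothetical. `InMemoryLists` is an assumed stand-in that imitates only the `rpush` and `blpop` calls of the redis-py client so the example runs without a server; unlike a real `BLPOP`, it never waits.

```python
import json
from collections import defaultdict, deque


class InMemoryLists:
    """Hypothetical stand-in for the two redis-py list calls used below.

    It never blocks; a real Redis BLPOP waits up to `timeout` seconds.
    """

    def __init__(self):
        self._lists = defaultdict(deque)

    def rpush(self, name, *values):
        self._lists[name].extend(values)
        return len(self._lists[name])

    def blpop(self, keys, timeout=0):
        names = [keys] if isinstance(keys, str) else list(keys)
        for name in names:
            if self._lists[name]:
                return name, self._lists[name].popleft()
        return None  # redis-py also returns None when the timeout expires


def enqueue(client, queue, message):
    """Producer side: serialize the message as JSON and append it to the list."""
    client.rpush(queue, json.dumps(message))


def work_once(client, queue, handler, timeout=1):
    """Consumer side: pop one message, handle it, report whether work was done."""
    popped = client.blpop(queue, timeout=timeout)
    if popped is None:
        return False
    _key, raw = popped
    handler(json.loads(raw))
    return True


def demo():
    client = InMemoryLists()
    for query_id in range(4):
        enqueue(client, "queue:extractor", {"query_id": query_id})
    seen = {"worker-a": [], "worker-b": []}
    workers = list(seen)
    turn = 0
    # Two workers take turns; each message is delivered to exactly one of them.
    while work_once(client, "queue:extractor", seen[workers[turn % 2]].append):
        turn += 1
    return seen
```

Against a real server, the same two functions should work with a `redis.Redis(...)` client passed as `client`, and adding capacity to one component then amounts to starting more processes that loop on `work_once` for that component's queue.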
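
Slides 34 and 37 mention enforcing "max hits per minute" and using key expiration for temporary capping. One common way to combine the two is a fixed-window counter: increment a per-token, per-minute key and let it expire on its own. This is a sketch under assumptions, since the talk does not say how the limiter was built. The key layout `hits:<token>:<minute>` and the function `allow_hit` are hypothetical, and `InMemoryCounters` is an assumed stand-in for the `incr` and `expire` calls of redis-py.

```python
import time


class InMemoryCounters:
    """Hypothetical stand-in for redis-py's incr/expire, with a settable clock."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._values = {}
        self._deadlines = {}

    def _purge(self, name):
        # Drop the key if its expiration deadline has passed.
        deadline = self._deadlines.get(name)
        if deadline is not None and self._clock() >= deadline:
            self._values.pop(name, None)
            self._deadlines.pop(name, None)

    def incr(self, name, amount=1):
        self._purge(name)
        self._values[name] = self._values.get(name, 0) + amount
        return self._values[name]

    def expire(self, name, seconds):
        if name not in self._values:
            return False
        self._deadlines[name] = self._clock() + seconds
        return True


def allow_hit(client, token, max_hits_per_minute, now=None):
    """Fixed-window limiter: one counter per token and per minute."""
    now = time.time() if now is None else now
    key = f"hits:{token}:{int(now // 60)}"
    hits = client.incr(key)
    if hits == 1:
        # First hit in this window: schedule the counter for deletion.
        client.expire(key, 120)
    return hits <= max_hits_per_minute
```

With a limit of 3, the first three calls in a given minute return `True` and later calls in the same minute return `False`. Note that the increment and the expiration are two separate commands here, so a crash between them would leave a counter without a time-to-live.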
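
Slide 35 uses a sorted set as a live "most recognized images" index. The sketch below shows one way such an index could be kept. The slide names `ZADD`; this example uses the related `ZINCRBY` command instead, because it bumps a member's score in one step. The key layout `hits:<collection>` and the functions `record_hit` and `most_recognized` are hypothetical, and `InMemorySortedSets` is an assumed stand-in for the `zincrby` and `zrevrange` calls of redis-py (it supports only non-negative indexes and an end of -1).

```python
class InMemorySortedSets:
    """Hypothetical stand-in for redis-py's zincrby/zrevrange."""

    def __init__(self):
        self._sets = {}

    def zincrby(self, name, amount, value):
        scores = self._sets.setdefault(name, {})
        scores[value] = scores.get(value, 0.0) + amount
        return scores[value]

    def zrevrange(self, name, start, end, withscores=False):
        scores = self._sets.get(name, {})
        # Highest score first; ties are broken by member name, descending.
        ranked = sorted(scores.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
        stop = None if end == -1 else end + 1  # Redis ranges include the end index
        page = ranked[start:stop]
        return page if withscores else [member for member, _ in page]


def record_hit(client, collection_id, image_id):
    """Add one recognition to the image's score, creating the member if needed."""
    return client.zincrby(f"hits:{collection_id}", 1, image_id)


def most_recognized(client, collection_id, count=10):
    """Return up to `count` (image_id, hits) pairs, most recognized first."""
    return client.zrevrange(f"hits:{collection_id}", 0, count - 1, withscores=True)
```

Because the set stays ordered as scores change, reading the top entries needs no separate consolidation step, which matches the point made on the slide.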