A multi-tool in computing clouds: Tuple Space
![Page 1: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/1.jpg)
Joerg Fritsch,
School of Computer Science & Informatics
Cardiff University, 16 January 2014
A multi-tool in computing clouds: Tuple Space
![Page 2: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/2.jpg)
• Key themes: parallelization, shared nothing, Challenging Data (aka “Big Data”)
• Tuple Space: the multi-tool
• Use Case (1): Overcoming limitations of tier based architectures
• Use Case (2): In-stream processing of Big Data
• Some miscellaneous remarks
2
Agenda
![Page 3: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/3.jpg)
• Eventually everything is about scalability.
• Scalable software: make use of 1000s of cores
– Distribution
– Decomposition & modularity
– Coordination
• Data does not fit in main memory
– Distribution
– Stream processing
• Need for speed: reduce time complexity
3
The why’s and how’s
![Page 4: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/4.jpg)
Key Themes: Parallelization
• Clouds will need to support scalable programs.
• Many programs have to parallelize relatively small computations with high interdependency.
• “Any” application scaled through distribution over parallel (multicore) hardware.
• Everything “inside a cloud” is physically distributed (data, processing).
• Large scale distributed processing. “Many Core”.
4
![Page 5: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/5.jpg)
Key Themes: Shared Nothing
5
• Synchronization = shared “something” for example memory, disk, data(base)
• Asynchronous = “shared nothing”
• Avoid synchronization issues
• Abstract multithreading and parallelization issues away from the developer, e.g. with the actor model
• Highly scalable, for example Erlang
![Page 6: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/6.jpg)
Challenging Data (aka “Big Data”)
6
• Data in computing clouds is challenging
• 3V Data (Gartner, 2001): Volume, Variety, Velocity
• Volume: perceived as “Big”
– Hadoop & traditional RDBMSs are often similar in data volume
– Differ in number of nodes (proportional to no. cores)
– Analytics
• Variety: unstructured data, data mashups
– Hadoop does not cast data into schemas, rows, columns
• Velocity: streams
![Page 7: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/7.jpg)
Challenging Data (aka “Big Data”)
7
• Batch tasks are the prevailing computational model:
– Map Reduce
– Computation over “offline” data set (on disks)
– Parallelized polynomial time: N^m / k
• Stream processing is catching up:
– Operating on real-time data
– N * log(N) time
– You only get ‘one shot’
– In-memory data structures (e.g. Redis, Memcached)
– Examples: Storm project, AWS Kinesis, Apache S4
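The ‘one shot’ constraint of stream processing can be illustrated with a small single-pass computation. The sketch below is only an assumption about what such a kernel might look like: `running_mean` is a hypothetical name, and it keeps constant state in memory while touching each value exactly once.

```python
def running_mean(stream):
    """Yield the mean of all values seen so far, reading each value once."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental update: no need to store earlier values.
        mean += (value - mean) / count
        yield mean


# Any iterator works as input; the values are never revisited.
means = list(running_mean(iter([2, 4, 6])))
print(means)
```

A batch job would instead load the whole data set from disk and could make several passes over it.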
![Page 8: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/8.jpg)
• Tuples are key-value pairs
• Tuple Space acting as Distributed Shared Memory (DSM)
• Four primitives to manipulate and store tuples: rd(), in(), out(), eval()
• No schema, ideal for unstructured data
• Tuples matched using associative lookup
• Associative lookup generally very powerful: CAM table/Routing, Data Flow programming & processors
• Commercial use as in-memory Data Grids
8
Tuple Space, Gelernter (1985)
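The four primitives can be sketched as a toy, single-process tuple space. This is a minimal illustration and not Gelernter's implementation: the class name `TupleSpace`, the wildcard `ANY` and the method name `in_` (because `in` is a Python keyword) are assumptions, and the first tuple field is used as the key.

```python
import threading

ANY = object()  # wildcard for template fields (hypothetical name)


class TupleSpace:
    """Toy in-process tuple space with Linda-style primitives."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(template, tup):
        # Associative lookup: same length, every field equal or wildcard.
        return len(template) == len(tup) and all(
            t is ANY or t == v for t, v in zip(template, tup)
        )

    def _find(self, template):
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def out(self, tup):
        """Store a tuple and wake up blocked readers."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def rd(self, template):
        """Block until a tuple matches; return it and leave it in the space."""
        with self._cond:
            while (tup := self._find(template)) is None:
                self._cond.wait()
            return tup

    def in_(self, template):
        """Block until a tuple matches; remove it from the space and return it."""
        with self._cond:
            while (tup := self._find(template)) is None:
                self._cond.wait()
            self._tuples.remove(tup)
            return tup

    def eval(self, func, *args):
        """Compute func(*args) in a new thread and out() the resulting tuple."""
        worker = threading.Thread(target=lambda: self.out(func(*args)))
        worker.start()
        return worker


space = TupleSpace()
space.out(("temperature", "cardiff", 7))
print(space.rd(("temperature", "cardiff", ANY)))
space.eval(lambda x: ("square", x, x * x), 12).join()
print(space.in_(("square", 12, ANY)))
```

Producer and consumer never address each other directly; they only agree on the shape of the tuples.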
![Page 9: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/9.jpg)
• Loose coupling
• Decoupled in
– Time
– Space
– Synchronization
• Distributed shared memory (DSM) vs distributed memory (“like MPI”)
9
Tuple Space, cont’d
![Page 10: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/10.jpg)
• In-memory key-value store; data can persist across system reboots
• No schema
• Exact keys are looked up in O(1) time; keys can also be matched with glob-style patterns, which scans the keyspace in O(N) time
– Good enough implementation of associative lookup
• Binary safe
• Other key-value stores may be equally suitable and have different advantages/disadvantages
– Distributed Hash Tables (DHT)
– Memcached: distribution
– Dynamo: presence as an application service in AWS
10
Redis Key-Value Store as Tuple Space
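How a key-value store can stand in for a tuple space may be easier to see in code. The sketch below deliberately uses a plain Python dict instead of a Redis server, and `fnmatchcase` only approximates glob-style key patterns; the key naming scheme and the helpers `rd` and `in_` are assumptions for illustration.

```python
from fnmatch import fnmatchcase

# A dict stands in for the key-value store; values are binary-safe bytes.
store = {
    "sensor:cardiff:temp": b"7",
    "sensor:cardiff:wind": b"21",
    "sensor:newport:temp": b"8",
}


def rd(pattern):
    """Non-destructive read: all (key, value) pairs whose key matches the glob."""
    return sorted((k, v) for k, v in store.items() if fnmatchcase(k, pattern))


def in_(pattern):
    """Destructive read: remove the matching pairs and return them."""
    found = rd(pattern)
    for key, _ in found:
        del store[key]
    return found


print(rd("sensor:*:temp"))      # associative lookup over part of the key
print(in_("sensor:cardiff:*"))  # consume every Cardiff reading
print(list(store))
```

In this sketch a pattern match visits every key, while an exact key lookup is a single hash access.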
![Page 11: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/11.jpg)
• Coordination vs Threading
• Composition happens outside of the worker or agent code
– FPLs: composition and currying outside of functions
– Stream processing and composition of kernels
– Unix pipes: application_1 | application_3 | application_2
– Pipes (message queues) represent the dataflow graph
• Error handling?
– What happens to the mutable state if app_3 (or any of the kernels) fails?
11
Coordination Language LINDA, Gelernter (1992)
Figure: app_1 → app_3 → app_2 → std_out
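Composition outside the worker code can be sketched with Python generators that mimic a Unix pipeline. The stage bodies below are invented for illustration; only the names app_1, app_3 and app_2 and their order come from the slide, and `pipe` is a hypothetical helper.

```python
def app_1(lines):
    """Source stage: strip whitespace and drop empty lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def app_3(lines):
    """Filter stage: keep only lines that mention 'error'."""
    for line in lines:
        if "error" in line.lower():
            yield line


def app_2(lines):
    """Transform stage: number the surviving lines."""
    for number, line in enumerate(lines, start=1):
        yield f"{number}: {line}"


def pipe(source, *stages):
    """Compose stages like a shell pipeline: source | stage_1 | stage_2 ..."""
    for stage in stages:
        source = stage(source)
    return source


log = ["boot ok\n", "\n", "Disk ERROR on sda\n", "error: link down\n"]
result = list(pipe(log, app_1, app_3, app_2))
print(result)
```

None of the stages knows its neighbours; the dataflow graph lives entirely in the call to `pipe`.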
![Page 12: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/12.jpg)
• Not enough expressive power! (for complex coordination)
• Ways to make it more sonic:
– Algebra of Communicating Processes (ACP)
– ACP generally quite suitable for streams, clicks, GUIs, dataflow programming
– Constraint Handling Rules (CHRs)
– Agents Communicating through Logic Theories (ACLT), Omicini et al. (1995), Denti et al. (1998)
• For example: barrier (i.e. MPI_Barrier) / eureka conditions, Turing-powerful implementation
12
Coordination Language LINDA, cont’d
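A barrier is a useful test case for coordination. The sketch below shows one common way to build it from plain in/out/rd on a shared counter tuple; it is an illustration under assumptions, not the ACP- or ACLT-based approach named above, and the reduced `TupleSpace` class here is hypothetical.

```python
import threading


class TupleSpace:
    """Reduced tuple space holding (name, value) tuples only."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _take(self, name, value, remove):
        """Block for a (name, value) tuple; value=None matches any value."""
        with self._cond:
            while True:
                for tup in self._tuples:
                    if tup[0] == name and (value is None or tup[1] == value):
                        if remove:
                            self._tuples.remove(tup)
                        return tup
                self._cond.wait()

    def in_(self, name, value=None):
        return self._take(name, value, remove=True)

    def rd(self, name, value=None):
        return self._take(name, value, remove=False)


def barrier(space):
    """Decrement the shared counter, then wait until it reaches zero."""
    _, remaining = space.in_("barrier")    # claim the counter tuple
    space.out(("barrier", remaining - 1))  # publish the decremented value
    space.rd("barrier", 0)                 # block until everyone has arrived


def worker(space, name, log):
    log.append(("before", name))
    barrier(space)
    log.append(("after", name))


space = TupleSpace()
workers = 3
space.out(("barrier", workers))
log = []
threads = [threading.Thread(target=worker, args=(space, i, log)) for i in range(workers)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(log)
```

Because the counter tuple is removed while a worker updates it, the decrement is race-free without any explicit lock in the worker code.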
![Page 13: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/13.jpg)
• Database, Data Grid
– No schema
• Key/value store
• Extension to programming languages
– Without adaptation not Turing-powerful
• Message bus, message queue
• Means of coordination
– Workers, Agents, Skeletons
• Memory virtualization
– Extension of main memory across physical boundaries
13
Recap: Tuple Space is like a(n) …
![Page 14: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/14.jpg)
14
Use Case (1 of 2): Overcoming limitations of tier-based architectures
![Page 15: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/15.jpg)
• Concept has been around since 1998
• Costly serialization (of data) required at every system boundary: latency!
• Often depicted with three simple tiers: web server, application server and data(base)
• Many more devices & protocols involved: redundant load balancers, spanning tree, etc.
15
Tier-based architectures
![Page 16: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/16.jpg)
• To date: not many alternatives
• Space based architectures
– GigaSpaces
– TIBCO ActiveSpaces
• Notion of a one stop shop
– Networks: L2 Ethernet fabrics
– Networks: integrated packet processing
• Nobody wants to hit a spindle!
– In-memory computing
16
Tier-based architectures (alternatives)
![Page 17: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/17.jpg)
17
The end of Tier-based architectures
Source: http://wiki.gigaspaces.com
![Page 18: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/18.jpg)
18
The end of Tier-based architectures (cont’d 1)
Source: http://wiki.gigaspaces.com
![Page 19: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/19.jpg)
19
The end of Tier-based architectures (cont’d 2)
Space based cloud platform: no tiers, implicit load balancing, harmonization of messaging, data and coordination
Traditional tier-based cloud platform
![Page 20: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/20.jpg)
20
Use Case (2 of 2): In-stream processing of Big Data
![Page 21: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/21.jpg)
21
“More programmer-friendly parallel dataflow languages await discovery, I think. Map Reduce is one (small) step in that direction.”
Jeff Hammerbacher, Engineer-to-Engineer Lectures, June 2010
![Page 22: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/22.jpg)
• Stream
– An unbounded sequence of tuples
• Map Reduce excels at ad-hoc queries but is a poor fit for recursion, and hence for machine learning (ML)
• Error resilient: Stateful stream processing
– Redis supports transactions
– Tuple space can contain global mutable state
• Tuple vs Batch / Fine grain vs coarse grain
22
Stream processing of 3V Data
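Error-resilient, stateful stream processing can be sketched as follows. Everything here is an assumption for illustration: `Store` is a hypothetical in-memory stand-in for a transactional store (it does not use Redis), and a finite list stands in for a replayable stream. The point is that the state update and the stream offset are committed together, so a crash never leaves half an update behind.

```python
import copy


class Store:
    """Stand-in for a transactional in-memory store (hypothetical API)."""

    def __init__(self):
        self.data = {"offset": 0, "counts": {}}

    def transaction(self, update):
        """Apply update to a private copy; publish it only if no error occurs."""
        draft = copy.deepcopy(self.data)
        update(draft)
        self.data = draft


def process(store, stream, fail_at=None):
    """Count words tuple by tuple, resuming from the committed offset."""
    for offset in range(store.data["offset"], len(stream)):
        word = stream[offset]

        def update(draft, word=word, offset=offset):
            draft["counts"][word] = draft["counts"].get(word, 0) + 1
            if offset == fail_at:
                raise RuntimeError("worker crashed mid-update")
            draft["offset"] = offset + 1

        store.transaction(update)


stream = ["click", "view", "click", "click"]
store = Store()
try:
    process(store, stream, fail_at=2)  # crash while handling the third tuple
except RuntimeError:
    pass
process(store, stream)                 # replay from the committed offset
print(store.data)
```

The crashed update is discarded as a whole, so the replay counts the third tuple exactly once.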
![Page 23: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/23.jpg)
• Redis has a built-in Lua interpreter to manipulate data
• Commercial tuple spaces are mostly “reactive”
• Context-based recursion on portion of data that is in memory (aka “granularity”)
23
(Reactive) in Memory Tuple Space
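What “reactive” means here can be shown with a small sketch in which writing a tuple triggers registered handlers, and a handler may itself write further tuples. `ReactiveSpace`, `react` and the alert threshold are hypothetical; real products and Redis scripting expose this differently.

```python
class ReactiveSpace:
    """Toy tuple space that runs reactions whenever a tuple is written."""

    def __init__(self):
        self.tuples = []
        self._reactions = []

    def react(self, key, handler):
        """Register handler(space, tup) for tuples whose first field is key."""
        self._reactions.append((key, handler))

    def out(self, tup):
        self.tuples.append(tup)
        for key, handler in list(self._reactions):
            if tup[0] == key:
                handler(self, tup)


alerts = []


def check(space, tup):
    """Reaction on readings: emit an alert tuple above a threshold."""
    _, sensor, value = tup
    if value > 30:
        space.out(("alert", sensor, value))


def record(space, tup):
    """Reaction on alerts: remember which sensor raised it."""
    alerts.append(tup[1])


space = ReactiveSpace()
space.react("reading", check)
space.react("alert", record)
space.out(("reading", "a", 12))
space.out(("reading", "b", 41))
print(alerts, space.tuples)
```

The computation runs where the data already sits in memory, instead of shipping the data to a client and back.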
![Page 24: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/24.jpg)
24
Tuple space architecture for in stream processing of Big Data
![Page 25: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/25.jpg)
25
Commonalities
FPLs & Flow based Programming
(Johnston, 2004)
Immutable Data. Shared nothing.
Freedom from side effects
Locality of effects
Lazy evaluation
Data dependency equivalent to scheduling
FPLs & Tuple Space
(Fritsch & Walker, 2013)
Coordination
Distribution
Decoupling
Inter process communication (IPC)
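The claim that data dependency is equivalent to scheduling can be made concrete with a topological sort of a dataflow graph. The node names and the helper `schedule` below are invented for illustration; `graphlib` is part of the Python standard library (3.9 and later).

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes whose output it consumes.
dataflow = {
    "parse": {"read"},
    "count": {"parse"},
    "filter": {"parse"},
    "report": {"count", "filter"},
}


def schedule(graph):
    """Group nodes into waves; all nodes in one wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(schedule(dataflow))
```

No explicit thread or call order is written anywhere; the execution order falls out of the dependencies alone.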
![Page 26: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/26.jpg)
26
Commonalities cont’d
Flow based Programming & Tuple Space
Both need “a space”
IP Space in Flow based programs
Tuple Space in LINDA
Altogether
(Data) Queues
Coordination does not need to reckon with side effects
Coordination & composition
Representation of dataflow graph in place of a (thread) call graph
![Page 27: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/27.jpg)
• News/RSS streams
• Clicks
– Online advertisement analytics (e.g. spider.io)
– URLs (e.g. bit.ly)
– GUI programming
• Logistics & Transportation
• Media
– GPUs (streams + kernels)
• Mashups: create new wisdom from multiple data sources (incompatible in velocity, volume, variety/structure)
– Separate errors
• Debit card transactions
– Data Masking
– Fraud detection/Feedback Context Mashups
27
Real World Applications
![Page 28: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/28.jpg)
• The ultimate mashup: batch data (aka “map reduce”) and speed data (aka “streams”)
– Lambda architecture
– Complementary to each other (e.g. Apache Spark, Lambda Architecture)
• Currently three paradigms: RDBMSs, Map Reduce, Streams.
– Distributed query processing is a key element
28
Points to ponder
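The combination of batch data and speed data can be sketched in a few lines. This is a simplified reading of the Lambda architecture under stated assumptions: the names `batch_view`, `SpeedView` and `query` are hypothetical, and event counting stands in for an arbitrary query.

```python
from collections import Counter


def batch_view(master_log):
    """Recompute counts from the complete log (slow, exact, runs periodically)."""
    return Counter(master_log)


class SpeedView:
    """Incremental counts for events that arrived after the last batch run."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        self.counts[event] += 1


def query(key, batch, speed):
    """Answer a query by merging the batch view with the speed view."""
    return batch[key] + speed.counts[key]


batch = batch_view(["a", "b", "a"])  # events covered by the last batch job
speed = SpeedView()
for event in ["a", "c"]:             # events that arrived since then
    speed.update(event)
print(query("a", batch, speed), query("c", batch, speed))
```

When the next batch run finishes, the speed view would be reset for the events the batch view now covers.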
![Page 29: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/29.jpg)
• Tuple Space is a piece of software as well
• Scalability of tuple space
– Distribution vs fast in memory computation
• Complex coordination is a must!
– So is error handling (stream replay?)
• Number of supporting elements needed
– (auto) scaler
– cloud-like deployment: DevOps recipes
– ZooKeeper?
29
Issues
![Page 30: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/30.jpg)
30
Thank you
![Page 31: A multi-tool in computing clouds: Tuple Space](https://reader034.fdocuments.net/reader034/viewer/2022051612/54c671cc4a7959e37d8b45e2/html5/thumbnails/31.jpg)
Denti, Enrico, Antonio Natali, and Andrea Omicini. "On the expressive power of a language for programming coordination media." Proceedings of the 1998 ACM symposium on Applied Computing. ACM, 1998.
Fritsch, Joerg, and Coral Walker. "CMQ-A lightweight, asynchronous high-performance messaging queue for the cloud." Journal of Cloud Computing 1.1 (2012): 1-13.
Fritsch J. Walker C. (2013), “Cwmwl, a LINDA-based PaaS fabric for the cloud”, Journal of Communications, SI on Cloud and Big Data (to be published)
Gelernter, David. "Generative communication in Linda." ACM Transactions on Programming Languages and Systems (TOPLAS) 7.1 (1985): 80-112.
Gelernter, David, and Nicholas Carriero. "Coordination languages and their significance." Communications of the ACM 35.2 (1992): 96.
Johnston, Wesley M., J. R. Hanna, and Richard J. Millar. "Advances in dataflow programming languages." ACM Computing Surveys (CSUR) 36.1 (2004): 1-34.
Omicini, A., Denti, E., & Natali, A. (1995). Agent coordination and control through logic theories. In Topics in Artificial Intelligence (pp. 439-450). Springer Berlin Heidelberg.
31
References