Apsara OS: Behind Cloud Computing
Transcript of Apsara OS: Behind Cloud Computing
![Page 1: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/1.jpg)
Apsara OS: Behind Cloud Computing
Ni Hao (倪浩), Platform Technology Department - System Platform - Computing Architecture
![Page 2: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/2.jpg)
Cloud Computing
• A buzzword that covers:
– Coding
– Architecture
– Load Balance
– Business Model
– Service
• For users: a service that delivers storage, compute, and IO resources the way a utility delivers electricity
![Page 3: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/3.jpg)
3 Models of Cloud Computing
• IaaS
– Infrastructure as a Service
• PaaS
– Platform as a Service
• SaaS
– Software as a Service
![Page 4: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/4.jpg)
Google, Amazon & Salesforce
• Amazon
– EC2, S3, SQS, SimpleDB
– Server Instance (powered by Xen)
– Open
• Google
– Google App Engine, Google Apps
– GFS, BigTable, Borg, MapReduce, Chubby
– Extremely Scalable
![Page 5: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/5.jpg)
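Things Hidden Behind
• A supercomputer with extremely strong scalability and reliability
– Scalability, The Ultimate Goal
• Utility Computing
– Offers the supercomputer's resources to users, including user management, access control, billing, and so on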
Diagram (layer labels): Super Computer, Utility Computing, Users, App Framework (EC2, S3, AppEngine)
![Page 6: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/6.jpg)
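How to Build a Supercomputer?
• Goal: Scalable & Reliable Super Computer
• Approach:
– On top of cheap commodity servers, write a reliable, highly scalable distributed operating system
  • File system
  • Process scheduling system
  • Memory management & device drivers
  • User interface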
![Page 7: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/7.jpg)
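Apsara Distributed Operating System
Diagram (component labels):
• Pangu (盘古): Storage
• Fuxi (伏羲): Scheduler
• Shennong (神农): Monitor
• Youchao (有巢): Structured Data Processing
• Nuwa (女娲): Naming
• Kuafu (夸父): Communication
• Cangjie (仓颉): Language
• Security
• Apsara OS API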
![Page 8: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/8.jpg)
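Pangu (盘古): Distributed File System
• Goals:
– Reliable, efficient read and write service for large files
– Scales to more than ten thousand machines and PB-level storage capacity
– Tens of millions of files
• Non-goals:
– High-performance access to small files
– Storage of structured data
– Random writes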
![Page 9: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/9.jpg)
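Challenges
• Cheap servers can go down at any moment, and that is to be expected
– How to guarantee reliability
• The business grows quickly and data grows massively
– How to scale linearly
• Heterogeneous environment
– Network
– Machines
– Time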
![Page 10: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/10.jpg)
Pangu Architecture
![Page 11: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/11.jpg)
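Pangu Architecture
• A file is split into multiple 64 MB chunks
• Each chunk has multiple replicas to ensure reliability
• The replicas of a chunk live on different machines, so the file stays available when one machine goes down, and Pangu immediately picks another machine and makes a new replica
• [mincopy, maxcopy]
Diagram (replica placement):
Chunk 1 @ server1, server2, server3
Chunk 2 @ server2, server3, server4
Chunk 3 @ server3, server4, server1
Chunk 4 @ server4, server1, server2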
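The placement shown in the diagram can be reproduced with a small Python sketch. The slide does not describe Pangu's real placement policy, so the round-robin rule, the function names, and the reading of `mincopy` as "the replica count below which a chunk is re-replicated" are all assumptions made for illustration.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as stated on the slide


def split_into_chunks(file_size):
    """Return how many chunks a file of file_size bytes needs (ceiling division)."""
    return -(-file_size // CHUNK_SIZE)


def place_chunks(num_chunks, servers, copies=3):
    """Assign each chunk to `copies` distinct servers, round-robin (assumed policy)."""
    if copies > len(servers):
        raise ValueError("not enough servers for the requested copy count")
    placement = {}
    for chunk in range(num_chunks):
        placement[chunk + 1] = [
            servers[(chunk + i) % len(servers)] for i in range(copies)
        ]
    return placement


def handle_server_failure(placement, failed, servers, min_copy=3):
    """Drop the failed server, then re-replicate chunks that fell below min_copy."""
    alive = [s for s in servers if s != failed]
    for holders in placement.values():
        holders[:] = [s for s in holders if s != failed]
        for candidate in alive:
            if len(holders) >= min_copy:
                break
            if candidate not in holders:
                holders.append(candidate)
    return placement


servers = ["server1", "server2", "server3", "server4"]
placement = place_chunks(4, servers)
# placement[3] is ["server3", "server4", "server1"], matching the diagram.
after_failure = handle_server_failure(place_chunks(4, servers), "server1", servers)
```

With four servers and three copies, losing `server1` leaves every chunk with three replicas on the remaining machines, which is the behavior the slide describes.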
![Page 12: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/12.jpg)
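Pangu Architecture
• Data synchronization between replicas
– Writes keep all replicas consistent with each other
– The file's metadata is updated only after every replica has written the data
• Reading data
– If the server goes down while a chunk replica is being read, the read automatically switches to another server
– Parallel reads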
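A minimal sketch of the two rules above: metadata is committed only after every replica has accepted the write, and a read falls back to the next replica when a server is down. `ChunkServer`, `write_chunk`, and `read_chunk` are hypothetical names, the servers are in-memory stand-ins, and parallel reads and cleanup of partial writes are left out.

```python
class ChunkServer:
    """In-memory stand-in for a machine that stores chunk replicas."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.chunks = {}

    def write(self, chunk_id, data):
        if not self.alive:
            raise ConnectionError(self.name + " is down")
        self.chunks[chunk_id] = data

    def read(self, chunk_id):
        if not self.alive:
            raise ConnectionError(self.name + " is down")
        return self.chunks[chunk_id]


def write_chunk(meta, chunk_id, data, replicas):
    """Write to every replica; update the metadata only if all writes succeed."""
    for server in replicas:
        server.write(chunk_id, data)  # any failure aborts before the commit below
    meta[chunk_id] = [server.name for server in replicas]


def read_chunk(chunk_id, replicas):
    """Try replicas in order, switching to the next one when a server is down."""
    for server in replicas:
        try:
            return server.read(chunk_id)
        except ConnectionError:
            continue
    raise IOError("no live replica for chunk %r" % (chunk_id,))


servers = [ChunkServer("server%d" % i) for i in (1, 2, 3)]
meta = {}
write_chunk(meta, "chunk1", b"hello", servers)
servers[0].alive = False              # the first replica's server goes down
data = read_chunk("chunk1", servers)  # the read is served by server2 instead
```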
![Page 13: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/13.jpg)
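Scalability Design
• Bottleneck: the Pangu Master
– The Master does very little work: it only manages the system's metadata
– Once a read operation has fetched the metadata, it no longer interacts with the master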
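One way to picture the second point is a client that asks the master for chunk locations once and serves later reads from its own copy. The slide does not say how Pangu clients hold metadata, so the cache, the `Master` and `Client` classes, and the example path are assumptions.

```python
class Master:
    """Holds only metadata: which servers store the chunks of each file."""

    def __init__(self, table):
        self.table = table
        self.lookups = 0  # counts how often clients contact the master

    def locate(self, path):
        self.lookups += 1
        return self.table[path]


class Client:
    def __init__(self, master):
        self.master = master
        self.cache = {}

    def chunk_locations(self, path):
        # Contact the master only on the first access to a path.
        if path not in self.cache:
            self.cache[path] = self.master.locate(path)
        return self.cache[path]


master = Master({"/logs/a": ["server1", "server2", "server3"]})
client = Client(master)
for _ in range(3):
    locations = client.chunk_locations("/logs/a")
```

After three reads of the same file, `master.lookups` is 1: the data path goes to the chunk servers, not through the master.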
![Page 14: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/14.jpg)
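Fuxi (伏羲): Distributed Scheduling System
• Turns a pile of independent machines into a single system that is
– cooperative
– highly reliable and scalable
– easy to program for distributed computing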
![Page 15: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/15.jpg)
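Fuxi Architecture
• Supports two frameworks: Service and Job
– Handles redundancy, fault tolerance, load balancing, and similar work
– Provides multiple distributed computing modes
• Service
– A server program that runs continuously
• Job
– Splits the work into a DAG (directed acyclic graph)
– Each node in the DAG is called a Task
– Supports programming models such as map reduce and sort reduce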
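To make the Job model concrete, the sketch below orders the Tasks of a small DAG so that each Task comes after the Tasks it depends on (Kahn's topological sort). This is a generic technique, not Fuxi's scheduler; the task names and the `schedule_order` function are made up for the example.

```python
from collections import deque


def schedule_order(dag):
    """Return tasks in an order where every task follows its upstream tasks.

    `dag` maps each task name to the list of tasks it depends on.
    """
    pending = {task: len(deps) for task, deps in dag.items()}
    downstream = {task: [] for task in dag}
    for task, deps in dag.items():
        for dep in deps:
            downstream[dep].append(task)

    # Tasks with no unmet dependencies can run right away.
    ready = deque(sorted(t for t, n in pending.items() if n == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in downstream[task]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(dag):
        raise ValueError("the task graph contains a cycle")
    return order


job = {
    "map1": [],
    "map2": [],
    "reduce": ["map1", "map2"],
    "output": ["reduce"],
}
order = schedule_order(job)
```

Here `map1` and `map2` have no dependencies, so a scheduler could run them in parallel on different machines before starting `reduce`.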
![Page 16: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/16.jpg)
Fuxi Architecture
![Page 17: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/17.jpg)
Fuxi Architecture
![Page 18: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/18.jpg)
Fuxi's Philosophy
• World
– Machine, Network, Job, Service…
– Massive
– Dynamic
– Inconsistent
• Reality
– What is happening in the world
• Prediction
– Predicts what will happen in the world at the next moment
• Vision
– The scheduling result produced from reality and prediction
![Page 19: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/19.jpg)
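Scheduling Principles
• Because the real-time state of the world can never be obtained, Fuxi schedules all Services and Jobs by these principles:
– Coarse-grained & timely scheduling
– Continuous optimization
– Divide and conquer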
![Page 20: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/20.jpg)
Challenges
• Predicting the future
• Scalability
![Page 21: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/21.jpg)
Sort A Large File
![Page 22: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/22.jpg)
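Youchao (有巢): Structured Data Processing System
• Built on Pangu's storage and Fuxi's scheduling
• KeyValueEngine
– Stores and retrieves small Key Value Pairs: small files, web pages
• SQLEngine
– Provides SQL support in the form of tables
– Log analysis, table joins
• IndexEngine
– Index building and querying for search engines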
![Page 23: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/23.jpg)
Youchao's Logic
• All structured data can be abstracted as one or more KeyValue Pairs

| ID  | Name  | Age | Sex    |
|-----|-------|-----|--------|
| 001 | Ivan  | 28  | Male   |
| 002 | Helen | 27  | Female |
| 003 | Tom   | 26  | Male   |

<'001', 'Name'> = 'Ivan'
<'001', 'Age'> = 28
<'001', 'Sex'> = 'Male'
<'002', 'Name'> = 'Helen'
<'002', 'Age'> = 27
<'002', 'Sex'> = 'Female'
<'003', 'Name'> = 'Tom'
<'003', 'Age'> = 26
<'003', 'Sex'> = 'Male'

Diagram (labels): Tablet 1, Tablet 2
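The table-to-pairs mapping on this slide can be written out in a few lines of Python. The function names are illustrative, and treating `ID` as the row key is read off the example rather than stated as a rule in the deck.

```python
rows = [
    {"ID": "001", "Name": "Ivan", "Age": 28, "Sex": "Male"},
    {"ID": "002", "Name": "Helen", "Age": 27, "Sex": "Female"},
    {"ID": "003", "Name": "Tom", "Age": 26, "Sex": "Male"},
]


def to_key_value_pairs(rows, key_column="ID"):
    """Flatten table rows into {(row key, column name): value} pairs."""
    pairs = {}
    for row in rows:
        for column, value in row.items():
            if column != key_column:
                pairs[(row[key_column], column)] = value
    return pairs


def to_rows(pairs, key_column="ID"):
    """Rebuild the rows from the pairs, ordered by row key."""
    rebuilt = {}
    for (row_id, column), value in sorted(pairs.items()):
        rebuilt.setdefault(row_id, {key_column: row_id})[column] = value
    return list(rebuilt.values())


pairs = to_key_value_pairs(rows)
# pairs[("001", "Name")] is "Ivan", i.e. <'001', 'Name'> = 'Ivan' on the slide.
```

Because the pairs sort by row key first, all cells of one row sit next to each other, which is what makes splitting the sorted pairs into contiguous ranges practical.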
![Page 24: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/24.jpg)
Youchao Storage
• Data is split into multiple Partitions by Key Range
• Inside each Partition, a multi-level Index is used to locate data
Diagram (labels): Partition, Youchao File (immutable, sorted), Block, Cell, Cell, MemFile, RedoLog
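The slide only names the pieces (Partition, immutable sorted file, MemFile, RedoLog), so the sketch below is one plausible reading, not Youchao's actual design: writes go to a redo log and an in-memory file, a flush freezes the in-memory file into an immutable sorted file, and reads check the newest data first. Binary search over sorted keys stands in for the multi-level index, and every class and method name is hypothetical.

```python
import bisect


class Partition:
    def __init__(self):
        self.redo_log = []      # records appended before the in-memory update
        self.mem_file = {}      # recent writes, still mutable
        self.sorted_files = []  # immutable (keys, values) tuples, newest last

    def put(self, key, value):
        self.redo_log.append((key, value))
        self.mem_file[key] = value

    def flush(self):
        """Freeze the MemFile into an immutable sorted file and reset the log."""
        if self.mem_file:
            items = sorted(self.mem_file.items())
            keys = tuple(k for k, _ in items)
            values = tuple(v for _, v in items)
            self.sorted_files.append((keys, values))
            self.mem_file = {}
            self.redo_log = []

    def get(self, key):
        if key in self.mem_file:
            return self.mem_file[key]
        for keys, values in reversed(self.sorted_files):  # newest file first
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


class Table:
    def __init__(self, split_keys):
        # split_keys [k1, k2] gives partitions (-inf, k1), [k1, k2), [k2, +inf)
        self.split_keys = sorted(split_keys)
        self.partitions = [Partition() for _ in range(len(self.split_keys) + 1)]

    def partition_for(self, key):
        return self.partitions[bisect.bisect_right(self.split_keys, key)]


table = Table(["002"])
for key, name in [("001", "Ivan"), ("002", "Helen"), ("003", "Tom")]:
    table.partition_for(key).put(key, name)
table.partition_for("003").flush()
table.partition_for("003").put("003", "Thomas")  # newer value shadows the file
```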
![Page 25: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/25.jpg)
SQL Query: a Fuxi Job

    SELECT MAX(salary), AVG(salary)
    FROM employees, departments
    WHERE employees.age > 35 AND
          employees.department_id = departments.department_id
    GROUP BY employees.department_id;
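The slide does not show how the query is broken into Tasks. As a rough illustration only, the same query can be written as four stages (filter, join, group, aggregate), each of which could be one node of a Job's DAG. The sample rows are invented, and the join assumes `department_id` is unique in `departments`.

```python
from collections import defaultdict

employees = [
    {"name": "a", "age": 40, "salary": 100, "department_id": 1},
    {"name": "b", "age": 50, "salary": 300, "department_id": 1},
    {"name": "c", "age": 30, "salary": 500, "department_id": 1},
    {"name": "d", "age": 45, "salary": 200, "department_id": 2},
    {"name": "e", "age": 60, "salary": 900, "department_id": 9},
]
departments = [{"department_id": 1}, {"department_id": 2}]


def filter_task(rows):
    """WHERE employees.age > 35"""
    return [r for r in rows if r["age"] > 35]


def join_task(rows, departments):
    """WHERE employees.department_id = departments.department_id"""
    known = {d["department_id"] for d in departments}
    return [r for r in rows if r["department_id"] in known]


def group_task(rows):
    """GROUP BY employees.department_id"""
    groups = defaultdict(list)
    for r in rows:
        groups[r["department_id"]].append(r["salary"])
    return groups


def aggregate_task(groups):
    """SELECT MAX(salary), AVG(salary) for each group"""
    return {dept: (max(s), sum(s) / len(s)) for dept, s in groups.items()}


result = aggregate_task(
    group_task(join_task(filter_task(employees), departments))
)
```

Employee `c` is dropped by the age filter and employee `e` by the join, so only departments 1 and 2 appear in `result`.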
![Page 26: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/26.jpg)
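Nuwa (女娲): Naming & Distributed Lock
• Provides to external callers:
– Naming service
  • A service program has a permanent name: nuwa://nuwa_address/盘古/master
  • Even if the server running the program goes down, this is transparent to other applications
– Election service
  • Election among multiple Masters
• Achieves extremely high reliability for itself through the PAXOS algorithm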
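A single-process sketch of the two services: a stable name that always resolves to whichever candidate currently holds it, and an election in which the first candidate wins until it is reported as failed. This leaves out PAXOS and all networking, and `NameService` with its methods is a hypothetical interface, not Nuwa's API.

```python
class NameService:
    """Single-process stand-in for a naming and election service."""

    def __init__(self):
        self.owners = {}  # name -> address of the current holder

    def elect(self, name, candidate):
        """Grant the name to the first candidate; later callers get the winner."""
        return self.owners.setdefault(name, candidate)

    def resolve(self, name):
        """Return the address currently registered under the name."""
        return self.owners[name]

    def report_failure(self, name, address):
        """Release the name if `address` still holds it, so another can win."""
        if self.owners.get(name) == address:
            del self.owners[name]


nuwa = NameService()
name = "nuwa://nuwa_address/盘古/master"
first = nuwa.elect(name, "10.0.0.1:9000")   # becomes master
second = nuwa.elect(name, "10.0.0.2:9000")  # loses, learns who the master is
nuwa.report_failure(name, "10.0.0.1:9000")  # the master's server goes down
third = nuwa.elect(name, "10.0.0.2:9000")   # the standby takes over
```

Callers keep using the same name throughout; only the address behind it changes, which is the transparency the slide refers to.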
![Page 27: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/27.jpg)
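Kuafu (夸父) & Shennong (神农)
• Kuafu: network transport
– Built on Nuwa, provides an RPC communication mechanism
• Shennong:
– Monitors the state of all processes, independently of Fuxi
– Collects performance data and reports it upward through a tree structure

    RpcEndPoint rpc("nuwa://nuwa_addr/pangu");
    rpc.AsyncCall("CreateFile", "/home/alibaba/");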
![Page 28: Apsara OS: Behind Cloud Computing](https://reader038.fdocuments.net/reader038/viewer/2022102819/568144f1550346895db1c23b/html5/thumbnails/28.jpg)
Q&A