CSCE678 - Datacenterscourses.cse.tamu.edu/chiache/csce678/s19/slides/datacenter.pdf · Types of...

Datacenters

CSCE678

Announcement

• Pop quiz results will be on GradeScope (soon)

• No class meeting on Jan 18;

Lecture video will be available on blackboard

• No class meeting on Jan 21 (MLK Day)

Everything is in the Cloud Nowadays

Datacenters Across the Globe

Datacenters with 100,000+ Servers

Google Datacenter

Microsoft Datacenter

Facebook Datacenter

Internal of a Data Center

Server

Cluster

Aggregating Resources

Google TensorFlow Processing Units (TPUs)

1 TPU Pod = 64 TPUv2= 11.52 Petaflops (11.52 x 1015 float-point ops per sec)

TPU RacksCPU Rack CPU Rack

Why Datacenters?

• Cost Efficiency:• Reduce facility, management, power cost

• Elasticity:• Adaptive resource allocation for customers’ need

• Pay-per-use

• Multi-tenancy:• Accommodating multiple users in one infrastructure

• Ease of management:• Fault/crash tolerance and disaster recovery

Cost of Maintaining a Datacenter

Types of Cloud Deployments

• Private:• Datacenters run by organizations (e.g., banks, DoD)

• Strong isolation but expensive to build

• Public:• AWS, Microsoft Azure, GCP, etc

• Rented by public

• Hybrid:• Private cloud using public cloud as backend or backup

• Private cloud hosted by public cloud

Not Just a Collection of Servers

Microsoft Azure

Azure SQL Database

SQL Data Warehouse

RedisCache

SQL Server on Virtual Machines

Azure Database

for PostgreSQL

Azure Cosmo DB

Google Cloud

Platform

Google Cloud

Storage

Google Cloud SQL

Google Cloud

Bigtable

Compute Engine

Google App Engine

CDN to CLoud

Multi-Tier Data Centers

Front End

Webservers

Back End

DB &Storage

Aggregatorsor middle tieror compute nodesor microservices

Workers

Front-End Services

• Directly deal with data from/to users

• Latency matters a lot

• Oftentimes dealing with mobile apps, streaming

service, telecommunication, etc

Middle-Tier Services

• Each node has a specific task

• Installed on commodity computers ➔ Highly elastic

and easy to scale up

• Example:

• Data processing: Hadoop, Spark

• Caching: Redis, Memcached

• Application servers: JBOSS, Tomcat

Back-end Services

• Storage/DB or batch processing

• Sometimes are separately rented and connected to

other cloud ➔ Ex: AWS S3 or EBS

• Often need all-to-all connectivity to the cloud

• Example:

• Sorting / filtering / indexing

• Federated learning

• Databases / key-value stores

Types of Cloud Services

• Software as a Service (SaaS)

• Vendors sell licensed cloud software to customers

• Naturally multi-tenant and scalable

• Example: Google Doc, Office 365, Saleforces.com

• Platform as a service (PaaS)

• Vendors/Providers provide development platforms

• Rapid deployment

• Example: Google AppEngine

Types of Cloud Services

• Infrastructure as a Service (IaaS)

• Providers provision computing resources• vCPU, memory, network, disks, etc

• Customers is provided with virtual machine instances

• a1.large, t3.medium, t2.micro, etc

• Customers have control over OS, storage, system configuration, etc

• Example: AWS EC2

Other New Service Models (I)

• Function as a Service (FaaS)

• Developers deploy functions, not VMs

• Event-driven, pay-per-use

• Spread out execution to 5000+ cores in < 1 sec

• Backend as a Service (BaaS)

• Connect web/mobile apps with APIs or SDKs

• Example: connect cloud services (DB, LDAP, etc) to a unified REST API for mobiles

FaaS + BaaS = Serverless Computing

Other New Service Models (II)

• Database as a Service (DBaaS)

• Provide users access to a relational or NoSQL database

• Big Data as a Service (BDaaS)

• Programmable analytic platform for users (i.e., PaaS)

• Security as a Service (SECaaS)

• Security primitives (single sign-on, antivirus, intrusion detection, etc) for corporates

Service Level Agreement (SLA)

• A contract about expectation and responsibility of

cloud providers and customers

• Comes with financial penalties

• Need to be defined in precise metrics

• Service Availability

• Defect Rates

• Technical Quality

• Security

Examples of SLAAvailability % Downtime per year Downtime per month Downtime per week

90% ("one nine") 36.5 days 72 hours 16.8 hours

95% 18.25 days 36 hours 8.4 hours

97% 10.96 days 21.6 hours 5.04 hours

98% 7.30 days 14.4 hours 3.36 hours

99% ("two nines") 3.65 days 7.20 hours 1.68 hours

99.50% 1.83 days 3.60 hours 50.4 minutes

99.80% 17.52 hours 86.23 minutes 20.16 minutes

99.9% ("three nines") 8.76 hours 43.8 minutes 10.1 minutes

99.95% 4.38 hours 21.56 minutes 5.04 minutes

99.99% ("four nines") 52.56 minutes 4.32 minutes 1.01 minutes

100.00% 26.28 minutes 2.16 minutes 30.24 seconds

99.999% ("five nines") 5.26 minutes 25.9 seconds 6.05 seconds

99.9999% ("six nines") 31.5 seconds 2.59 seconds 0.605 seconds

99.99999% ("seven nines") 3.15 seconds 0.259 seconds 0.0605 second

SLA Decides Everything

• Cloud providers make decisions based on SLA

• Resource Allocation

• Disaster & failure recovery

• … and a lot more

Resource Allocation

Tenant 1: 2 CPUs, 4GB RAM

4 CPUs

8GBRAM

2 CPUs

12GBRAM

Server A Server B

Resource Allocation

4 CPUs

8GBRAM

2 CPUs

12GBRAM

Server A Server B

Resource Allocation Failure

4 CPUs

8GBRAM

2 CPUs

12GBRAM

Server A Server B

Tenant 3: 2 CPUs, 6GB RAM12

Enough spared resources,but not on one server.

Overprovisioning vs Reserved

• Virtualization Technology allows overprovisioning

• Offering more resources than you have

• Example: 8GB RAM shared by 4 tenants who want 4GB

• Assuming they won’t use 4GB simultaneously

• Problem: cannot guarantee availability and

performance isolation

• Vendors are worried about violating SLA

• Out of memory will cause thrashing

• Generally, vendors just reserve resources for customers

Resource Disaggregation

• Break down a workload into smaller ones to make

allocation easier

• 2 CPU, 6GB RAM ➔ (1 CPU, 3GB RAM) * 2

• Occupation time also matters

• Short occupation ➔ easier for time-sharing

• Example: FaaS

• Proportional DRAM (128/192/…/3008 MBs) and CPUs

• Time limit (10 mins)

Dealing with Failures

• Even a small server room has failures

• A warehouse-size datacenter faces failures daily;

cannot be prevented at hardware level.

• Can only handle at software & management level

Failure Rates of Datacenters

• According to Google fellow Jeff Dean (2008),

in the first year of a datacenter:

• Typically 1,000 machine failures

• 500-1,000 machines are down for ~6 hours

• 20 racks with 40-80 machines vanish from network

• 5 racks got half of the packets missing

• 50% chance of overheating in the whole cluster, taking down all servers in 5 mins and taking 1-2 days to recover

Large Disasters in Datacenters

Apr 20, 2014Samsung SDS datacenter burning during a fire,causing disruption in global data services

Replication

Replica 1

Replica 2

Datacenter A

Datacenter B

Replica 3Sync

Availability Zones

References

• “The Datacenter as a Computer – An Introduction to the Design of Warehouse-

Scale Machines”, 2nd Edition, by Barroso, Clidaras, and Hölzle

• Course material: “Datacenter Fundamentals: The Datacenter as a Computer”,

by George Porter, UCSD

• “What is Resiliency in Azure?”

(https://azure.microsoft.com/mediahandler/files/resourcefiles/azure-resiliency-

infographic/Azure_resiliency_infographic.pdf)

CSCE678 - Datacenterscourses.cse.tamu.edu/chiache/csce678/s19/slides/datacenter.pdf · Types of...

Documents

Transcript of CSCE678 - Datacenterscourses.cse.tamu.edu/chiache/csce678/s19/slides/datacenter.pdf · Types of...

CSCE678 - MapReduce & Hadoopcourses.cse.tamu.edu/.../csce678/s19/slides/mapreduce.pdfMapReduce (1/3) •A programming model for enabling: •Automatic parallelization and distribution

Investimento R$ 1.280,00 ou € 640,00 - manoelveras.com.br datacenter.pdf · INFRAESTRUTURA DE TI VIÇO - CLOUD ACENTER ACENTER EM CAMADAS – TIER ALAÇÕES ENERGIA REFRIGERAÇÃO

A survey of attacks on Ethereum smart contractscourses.cse.tamu.edu/chiache/csce678/s19/download/atzei16.pdfcontracts secure. Indeed, several security vulnerabilities in Ethereum smart

Spark - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/spark.pdf · SparkSQL & Dataframe Catalyst Optimizer Spark Streaming Mllib (Machine learning) GraphX (Graph

CSCE678 - CAP Theoremcourses.cse.tamu.edu/.../csce678/s19/slides/CAP-theorem.pdf · 2019-04-29 · CAP Theorem •Trade-offs of three properties in a distributed system •Consistency:

AWS vs. Azure vs. GCPcourses.cse.tamu.edu/chiache/csce678/s19/slides/cloud-platforms.pdf · •AWS Lambda in 2014, followed by Azure Function & Google Cloud Function in 2016 •AWS

CSCE678 - Network Function Virtualizationcourses.cse.tamu.edu/chiache/csce678/s19/slides/nfv.pdf · •“ClickOS and the Art of Network Function Virtualization”, by Martins et

Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Considerations for Building Multi- Datacenter Applicationsjeffpoole.net/talks/multi-datacenter.pdf · - Works well if a user's data "lives" in one DC - Better latency to only have

TRUNG TÂM - fptidc.com.vnfptidc.com.vn/FTI-DataCenter.pdf · lực lao động sáng tạo trong khoa học kỹ thuật và ... Không gian dự phòng cho việc mở ... Tuyến

with VMware - Webs.nsit.com/.../pdf/high-performance-datacenter.pdf · facility failures, natural and manmade disasters, and day-to-day maintenance should not impact the service levels

ZooKeeper - courses.cse.tamu.educourses.cse.tamu.edu/chiache/csce678/s19/slides/zookeeper.pdf · ZooKeeper CSCE 678. Weak vs Strong Consistency • In distributed systems, consistency

Cloud and Datacenter Networking - unina.itwpage.unina.it/rcanonic/didattica/dcn/lucidi/DCN-L01-v01-DATACENTER.pdf · Datacenter: rack and cabinets A cabinet is the enclosure that

CSCE678 - Ethereum & Smart Contractscourses.cse.tamu.edu/chiache/csce678/s19/slides/ethereum.pdf · 2019-04-29 · Bitcoin vs Ethereum •Bitcoin: miners (& other users) only has

Professional Reliable High-end - Jonhonjonhon.us/pdf/datacenter.pdf · Rated voltage: 220V 3. ... Load current of the power contacts 15A, ... each air-conditioner can provide 30KW

Foreshadow: Extracting the Keys to the Intel SGX Kingdom ...courses.cse.tamu.edu/chiache/csce678/s19/download/vanbulck18.pdf · Various off-the-shelf Blu-ray players and initially

MIT Data Center Trade Off Analysiskepner/2005dec09-MIT-DataCenter.pdf · 2005. 12. 9. · MIT Lincoln Laboratory Slide4 Data Center Motivation • In the commercial world, large scale

StruxureWare for Data Centers Software for DataCenter.pdf · energy management, and operational performance and efficiency. With StruxureWare for Data Centers software, you can see

คู มือการใช งานระบบ Datacenter ในการรายงานhome.dsd.go.th/sdpaa/doc/Datacenter.pdf · datacenter โดยจะแสดงข

OpenStack The foundational tool of the new datacenter.pdf