QNIBTerminal: Understand your datacenter by overlaying multiple information layers.
OSDC 2014
Overlay Datacenter Information
Christian Kniep, Bull SAS
2014-04-10
About Me
❖ Me (>30y)
❖ SysOps (>10y)
❖ SysOps v1.1 (>8y)
❖ BSc (2008-2011)
❖ DevOps (>4y)
❖ R&D [OpsDev?] (>1y)
2
Agenda
❖ Motivation (InfiniBand use-case)
❖ QNIB/ng
❖ QNIBTerminal (virtual cluster using docker)
❖ Cluster Stack
3
Work Environment
I. IB / QNIBng
II. QNIBTerminal
III. Cluster Stack
4
Cluster?
5
"A computer cluster consists of a set of loosely connected or tightly connected computers that work together so that in many respects they can be viewed as a single system." - wikipedia.org
User
HPC-Cluster
6
High Performance Computing
❖ HPC: Surfing the bottleneck
❖ Weakest link breaks performance
Cluster Layers
7
❖ Hardware: IPMI, lm_sensors, IB counters
❖ Operating System: kernel, userland tools
❖ Middleware: MPI, ISV libs
❖ Services: storage, job scheduler, sshd
❖ Software: end-user application
(rough estimate)
[Diagram: who works on which layer: EndUser (Excel: KPI, SLA), Mgmt, Power User/ISV, SysOps L1/L2/L3; Events and Metrics]
Layer^n
❖ Every Layer is composed of layers
❖ How deep to go?
8
Little Data w/o Connection
❖ No way of connecting them
❖ Connecting is manual labour
❖ Experience driven
❖ Niche solutions misleading
9
IB + QNIBng Motivation
❖ Multiple data sources
10
Modular Switch
❖ Looks like one "switch"
❖ Composed of a network itself
❖ Which route is taken is transparent to the application
❖ LB1<>FB1<>LB4
❖ LB1<>FB2<>LB4
❖ LB1 ->FB1 ->LB4 / LB1 <-FB2 <-LB4
❖ Changing one plug recomputes the routes :)
11-15
Debug-Nightmare
❖ 96 port switch
❖ Multiple autonomous job-cells
❖ Job seems to fail due to bad internal link
❖ Relevant information:
❖ Job status (Resource Scheduler)
❖ Routes (IB Subnet Manager)
❖ IB counters (command line)
16
[Embedded poster] IBPM: An Open-Source-Based Framework for InfiniBand Performance Monitoring
Michael Hoefling, Michael Menth (University of Tuebingen), Christian Kniep, Marcus Camen (science+computing ag)
Michael Hoefling, Michael Menth, Christian Kniep, and Marcus Camen: "IBPM: An Open-Source-Based Framework for InfiniBand Performance Monitoring", in Proceedings of the 16th GI/ITG Conference on Measurement, Modeling, and Evaluation of Computer and Communication Systems (MMB) and Dependability and Fault Tolerance (DFT), March 2012, Kaiserslautern, Germany
Background: InfiniBand (IB)
❖ State-of-the-art communication technology for interconnects in high-performance computing data centers
❖ Point-to-point bidirectional links
❖ High throughput (40 Gbit/s with QDR)
❖ Low latency
❖ Dynamic on-line network reconfiguration
Idea
❖ Extract raw network information from the IB network
❖ Analyze output
❖ Derive statistics about the performance of the network
Topology Extraction
❖ Subnet discovery using ibnetdiscover
❖ Produces human-readable file of the network topology
❖ Process output to produce a graphical representation of the network
Remote Counter Readout
❖ Each port has its own set of performance counters
❖ Counters measure, e.g., transferred data, congestion, errors, link state changes
Features
❖ Automatic topology extraction and visualization
❖ Visualization of traffic locality, link utilization, congestion, and port performance history
ibsim-Based Network Simulation
❖ ibsim simulates an IB network
❖ Simple topology changes possible (GUI)
❖ ibsim limitations: no performance simulation possible, no data rate changes possible
Real IB Network
❖ Physical network
❖ Allows performance measurements
❖ GUI-controlled traffic scenarios
Demo Scenarios
❖ Scenario 1: Topology Changes: a node and/or switch becomes unavailable; the connectivity state is represented in the topology map
❖ Scenario 2: Port Performance and Link Utilization: nodes communicate with each other; port performance is accessible through a simple point-and-click interface on a node or switch; link utilization is visualized through utilization-based coloring of the links in the performance map
❖ Scenario 3: Traffic Locality: nodes use pre-defined traffic patterns; traffic locality is visualized through locality-based coloring of the switches in the locality map
17
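The poster's "Remote Counter Readout" works on cumulative per-port counters, so a rate is the delta between two readouts divided by the sampling interval, with wraparound handling for the narrow counters. A minimal sketch, not from the talk; the 32-bit default width is an assumption (actual counter widths depend on the HCA and firmware):

```python
def counter_rate(prev, curr, interval_s, width=32):
    """Rate from two cumulative counter readings, handling wraparound.

    IB port counters such as PortXmitData only ever increase; sampling
    them periodically and dividing the delta by the interval yields a
    rate. Counters of finite width wrap, which shows up as a negative
    delta and is corrected by adding 2**width.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = curr - prev
    if delta < 0:  # counter wrapped between the two samples
        delta += 1 << width
    return delta / interval_s

# 4000 units transferred in 2 s -> 2000 units/s
assert counter_rate(1000, 5000, 2.0) == 2000.0
# Wraparound of a 32-bit counter still gives the true delta of 500
assert counter_rate((1 << 32) - 100, 400, 1.0) == 500.0
```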
OpenSM
❖ OpenSM Performance Manager
❖ Sends token to all ports
❖ All ports reply with metrics
❖ Callback triggered for every reply
❖ Dumps info to file
[Diagram: OpenSM polling a switch]
18
OpenSM
❖ osmeventplugin
❖ qnib
❖ sends metrics to RRDtool
❖ events to PostgreSQL
❖ qnibng
❖ sends metrics to graphite
❖ events to logstash
[Diagram: OpenSM PerfMgmt and osmeventplugin feeding qnib/qnibng; switch and nodes]
19
Graphite Events: port is up/down
20
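The metrics path into Graphite can be illustrated with carbon's plaintext protocol: one "path value timestamp" line per metric, sent to the carbon listener (TCP port 2003 by default). A hedged sketch, not the talk's actual qnibng code; host, port, and metric names are placeholders:

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one metric in carbon's plaintext protocol: 'path value ts\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)

def send_metrics(lines, host="localhost", port=2003):
    """Ship pre-formatted plaintext lines to carbon's TCP listener."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))

# Formatting only (sending requires a running carbon daemon):
line = graphite_line("node01.system.memory.usage", 9, timestamp=1397088000)
assert line == "node01.system.memory.usage 9 1397088000\n"
```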
21
22
QNIBTerminal Proof of Concept
23
Cluster Stack Mock-Up
❖ IB events and metrics are not enough
❖ How to get real-world behavior?
❖ Wanted:
❖ Slurm (Resource Scheduler)
❖ MPI enabled compute nodes
❖ As much additional cluster stack as possible (Graphite, elasticsearch/logstash/kibana, Icinga, Cluster-FS, …)
24
Classical Virtualization
❖ Big overhead for simple node
❖ Resources provisioned in advance
❖ Host resources allocated
25
LXC (docker)
❖ minimal overhead (a couple of MB)
❖ no resource pinning
❖ cgroups option
❖ highly automatable
26
NOW: Watch the OSDC 2014 talk "Docker" by Tobias Schwab
Virtual Cluster Nodes
❖ Master node (etcd, DNS, slurmctld)
❖ compute nodes (slurmd)
❖ monitoring (graphite + statsd)
❖ log mgmt (ELK)
❖ alarming (Icinga) [not integrated]
27
[Diagram: host running containers master, monitoring, log mgmt, compute0, compute1, …, computeN]
Master Node
❖ takes care of inventory (etcd)
❖ provides DNS (+PTR)
❖ Integrate Rudder, ansible, chef,…?
28
Non-Master Nodes (in general)
❖ are started with the master as DNS
❖ mount /scratch and /chome (sit on SSDs)
❖ supervisord kicks in and starts services and setup scripts
❖ send metrics to graphite
❖ send logs to logstash
29
docker-compute
❖ slurmd
❖ sshd
❖ logstash-forwarder
❖ openmpi
❖ qperf
30
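Slide 29 mentions supervisord starting the services, so the compute container's process list above could be wired up with [program:x] sections roughly like this. A hypothetical snippet: the actual commands, paths, and flags in the talk's Dockerfiles may differ.

```ini
; Hypothetical supervisord snippet for the docker-compute image.
; slurmd and sshd run in the foreground (-D) so supervisord can track them.
[program:sshd]
command=/usr/sbin/sshd -D
autorestart=true

[program:slurmd]
command=/usr/sbin/slurmd -D
autorestart=true

[program:logstash-forwarder]
command=/opt/logstash-forwarder/bin/logstash-forwarder -config /etc/logstash-forwarder.conf
autorestart=true
```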
docker-graphite (monitoring)
❖ full graphite stack + statsd
❖ stresses IO (<3 SSDs)
❖ needs more care (optimize IO)
31
docker-elk (Log Mgmt)
❖ elasticsearch, logstash, kibana
❖ inputs: syslog, lumberjack
❖ filters: none
❖ outputs: elasticsearch
32
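A logstash configuration matching these bullets, i.e. syslog and lumberjack inputs, no filters, elasticsearch output, could look as follows. The ports and certificate paths are illustrative placeholders, not taken from the talk (the lumberjack input requires TLS material for the forwarders):

```conf
# Hypothetical logstash config for the docker-elk container.
input {
  syslog { port => 514 }
  lumberjack {
    port            => 5043
    ssl_certificate => "/etc/pki/logstash.crt"
    ssl_key         => "/etc/pki/logstash.key"
  }
}
# filters: none
output {
  elasticsearch { host => "localhost" }
}
```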
It’s alive!
33
Start Compute Node
34
Start Compute Node
35
Check Slurm Config
36
Run MPI-Job
37
TCP benchmark
38
QNIBTerminal Future Work
39
docker-icinga
40
❖ Icinga to provide
❖ state-of-the-cluster overview
❖ bundle with graphite/elk
❖ no big deal…
❖ Is this going to scale?
docker-(GlusterFS,Lustre)
❖ Cluster scratch to integrate with
❖ Need for kernel modules freezes the attempt
❖ Might be pushed into VirtualBox (vagrant)
41
Humans!
❖ How does SysOps/DevOps/Mgmt
❖ react to the changes?
❖ adopt them?
❖ fear them?
42
Big Data!
❖ Truckload of
❖ Metrics
❖ Events
❖ Interaction
43
job1.node01.system.memory.usage 9
job1.node13.system.memory.usage 14
job1.node35.system.memory.usage 12
job1.node95.system.memory.usage 11
target=sumSeries(job1.*.system.memory.usage)

node01.system.memory.usage 9
node13.system.memory.usage 14
node35.system.memory.usage 12
node95.system.memory.usage 11
target=sumSeries(node{01,13,35,95}.system.memory.usage)
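The contrast between the two queries can be sketched as a small helper that builds the sumSeries() target: with the job encoded in the metric path a single wildcard covers all of a job's nodes, while otherwise every node has to be enumerated by hand. A minimal illustration; the function name is made up:

```python
def sum_series_target(job=None, nodes=None, metric="system.memory.usage"):
    """Build a Graphite sumSeries() target string.

    If the job name is part of the metric path, one wildcard suffices;
    without it, the job's nodes must be listed explicitly in a brace
    expression.
    """
    if job is not None:
        return "sumSeries(%s.*.%s)" % (job, metric)
    return "sumSeries(node{%s}.%s)" % (",".join(nodes), metric)

assert sum_series_target(job="job1") == \
    "sumSeries(job1.*.system.memory.usage)"
assert sum_series_target(nodes=["01", "13", "35", "95"]) == \
    "sumSeries(node{01,13,35,95}.system.memory.usage)"
```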
pipework / mininet
❖ Currently all containers are bound to the docker0 bridge
❖ Creating a topology with virtual/real switches would be nice
❖ A first iteration might use pipework
❖ A more complete one should use vSwitches (mininet?)
44
Dockerfiles
❖ Only 3 images are fd20 based
45
Questions?
❖ Pictures
❖ p2: http://de.wikipedia.org/wiki/Datei:Audi_logo.svg http://commons.wikimedia.org/wiki/File:Daimler_AG.svg http://ffb.uni-lueneburg.de/20JahreFFB/
❖ p4: https://www.flickr.com/photos/adeneko/4229090961
❖ p6: cae t100 https://www.flickr.com/photos/losalamosnatlab/7422429706
❖ p8: http://www.brendangregg.com/Slides/SCaLE_Linux_Performance2013.pdf
❖ p9: https://www.flickr.com/photos/riafoge/6796129047
❖ p10: https://www.flickr.com/photos/119364768@N03/12928685224/
❖ p11: http://www.mellanox.com/page/products_dyn?product_family=74
❖ p23: https://www.flickr.com/photos/jaxport/3077543062
❖ p25/26: https://blog.trifork.com/2013/08/08/next-step-in-virtualization-docker-lightweight-containers/
❖ p33: https://www.flickr.com/photos/fkehren/5139094564
❖ p39: https://www.flickr.com/photos/brizzlebornandbred/12852909293
46