2005 Carsten Griwodz & Pål Halvorsen
INF5070 – media storage and distribution systems
Overview
- Resources, real-time, "continuous" media streams, …
- (CPU) Scheduling
- Memory management
Resources

Resource: "A resource is a system entity required by a task for manipulating data" [Steinmetz & Nahrstedt 95]

Characteristics:
- active: provides a service, e.g., CPU, disk, or network adapter
- passive: system capabilities required by active resources, e.g., memory
- exclusive: only one process at a time can use it, e.g., CPU
- shared: can be used by several concurrent processes, e.g., memory
- single: exists only once in the system, e.g., loudspeaker
- multiple: several exist within a system, e.g., CPUs in a multiprocessor system
Real–Time

Real-time process: "A process which delivers the results of the processing in a given time-span"

Real-time system: "A system in which the correctness of a computation depends not only on obtaining the result, but also upon providing the result on time"

There are many real-time applications, e.g.:
- temperature control in a nuclear/chemical plant: driven by interrupts from an external device; these interrupts occur irregularly
- defense system on a navy boat: driven by interrupts from an external device; these interrupts occur irregularly
- control of a flight simulator: execution at periodic intervals, scheduled by timer services which the application requests from the OS
- ...
Real–Time

Deadline: "A deadline represents the latest acceptable time for the presentation of the processing result"

Hard deadlines: must never be violated, as a violation means system failure
- too-late results have no value, e.g., processing weather forecasts
- or cause severe (catastrophic) system failure, e.g., processing of an incoming torpedo signal in a navy boat scenario

Soft deadlines: in some cases, the deadline might be missed
- not too frequently, and not by too much time
- the result may still have some (but decreasing) value, e.g., a late I-frame in MPEG
Real–Time and Multimedia

Multimedia systems:
- have periodic processing requirements (e.g., every 33 ms in a 30 fps video)
- require large bandwidths (e.g., an average of 3.5 Mbps for DVD video alone)
- typically have soft deadlines (may miss a frame)
- are non-critical (the user may be annoyed, but …)
- need predictability (guarantees)
- adapt real-time mechanisms to continuous media; priority-based schemes are of special importance
Admission and Reservation

To prevent overload, admission control may be performed:
- schedulability test: "are there enough resources available for a new stream?" / "can we find a schedule for the new task without disturbing the existing workload?" – a task is admitted if total utilization remains < 1
- yes: allow the new task and allocate/reserve resources; no: reject

Resource reservation is analogous to booking (asking for resources):
- pessimistic: avoids resource conflicts by making worst-case reservations; potentially under-utilized resources, but guaranteed QoS
- optimistic: reserves according to average load; high utilization, but overload may occur
- perfect: requires detailed knowledge of the resource requirements of all processes; too expensive / takes too much time
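The utilization-based schedulability test above can be sketched in a few lines of Python; the task parameters (processing time, period) are made up for illustration:

```python
# Admission control via a utilization-based schedulability test:
# a new task is admitted only if total utilization stays below 1.

def utilization(tasks):
    """Total utilization of a set of (processing_time, period) tasks."""
    return sum(e / p for e, p in tasks)

def admit(existing, new_task, bound=1.0):
    """Admit the new task only if utilization remains below the bound."""
    return utilization(existing + [new_task]) < bound

# Two running streams: 10 ms of work every 33 ms, and 5 ms every 40 ms.
current = [(10, 33), (5, 40)]
print(admit(current, (15, 33)))  # True: total utilization ~0.88 < 1
print(admit(current, (25, 33)))  # False: total utilization ~1.19
```

A real admission controller would also reserve the resources on a "yes"; this sketch only performs the test.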
Real–Time and Operating Systems

The operating system manages local resources (CPU, memory, disk, network card, busses, ...).

In a real-time, multimedia scenario, support is needed for real-time processing and efficient memory management. This also means support for proper …
- scheduling – high priorities for time-restrictive multimedia tasks
- timer support – a clock with fine granularity and event scheduling with high accuracy
- kernel preemption – avoid long periods where low-priority processes cannot be interrupted
- memory replacement – prevent code for real-time programs from being paged out
- fast switching – both interrupts and context switches should be fast
- ...
Streaming Data

[Figure: data offset vs. time for a playback starting at t1. The consume function may have variable or constant rate; data must arrive before its consumption time, be sent before its arrival time, and be read from disk before its sending time – so retrieval must start earlier than playback.]
Streaming Data

Buffers are needed to hold data between the functions, e.g., at the client: B(t) = A(t) − C(t), which requires ∀t : A(t) ≥ C(t).

The latest start of data arrival t0 is the latest t0 for which B(t) ≥ 0 for all t in [t0, t1], i.e., the buffer must at all times t have enough data left to consume.

[Figure: arrive function and consume function as data offset vs. time, with arrival starting at t0 and consumption at t1.]
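In discrete time, the latest feasible arrival start can be computed directly from B(t) = A(t) − C(t); the consumption pattern and arrival rate below are invented numbers for illustration:

```python
# Find the latest arrival start t0 that keeps the client buffer
# B(t) = A(t) - C(t) non-negative at every time step.

def cumulative(per_step):
    """Turn per-step amounts into a cumulative function."""
    total, out = 0, []
    for x in per_step:
        total += x
        out.append(total)
    return out

# Playback starts at t1 = 2; consumption is variable-rate.
C = cumulative([0, 0, 3, 1, 4, 2, 3])
ARRIVAL_RATE = 3  # constant arrival rate (data units per step)

def feasible(t0):
    """True if arrival starting at t0 satisfies A(t) >= C(t) for all t."""
    A = [max(0, t - t0 + 1) * ARRIVAL_RATE for t in range(len(C))]
    return all(a >= c for a, c in zip(A, C))

latest_start = max(t0 for t0 in range(len(C)) if feasible(t0))
print(latest_start)  # 2: here arrival may start exactly when playback does
```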
Streaming Data

"Continuous media" and "continuous streams" are ILLUSIONS:
- data is retrieved in blocks from disk
- blocks are transferred from the file system to the application
- packets are sent to the communication system
- packets are split into appropriate MTUs
- ... and similarly at intermediate nodes and the client

The components involved have different optimal data sizes and run as pseudo-parallel processes (in time slices) – hence the need for scheduling (to get timing and appropriate resource allocation).
Scheduling

A task is a schedulable entity (a process/thread executing a job, e.g., a packet through the communication system or a disk request through the file system).

In a multi-tasking system, several tasks may wish to use a resource simultaneously. A scheduler decides which task may use the resource, i.e., it determines the order in which requests are serviced, using a scheduling algorithm.

Each active resource (CPU, disk, NIC) needs a scheduler (passive resources are also "scheduled", but in a slightly different way).

[Figure: requests queue in front of a scheduler, which dispatches them to the resource.]
Scheduling

Scheduling algorithm classification:

dynamic:
- makes scheduling decisions at run-time
- flexible, can adapt
- considers only actual task requests and execution-time parameters
- large run-time overhead for finding a schedule

static:
- makes scheduling decisions off-line (also called pre-run-time)
- generates a dispatching table for the run-time dispatcher at compile time
- needs complete knowledge of the tasks before compiling
- small run-time overhead

preemptive:
- the currently executing task may be interrupted (preempted) by higher-priority processes
- the preempted process continues later from the same state
- potentially frequent context switches
- (almost!?) useless for disks and network cards

non-preemptive:
- a running task is allowed to finish its time slot (higher-priority processes must wait)
- reasonable for short tasks like sending a packet (used for disks and network cards)
- fewer context switches
Scheduling

Preemption:
- tasks wait for processing
- the scheduler assigns priorities
- the task with the highest priority is scheduled first
- the current execution is preempted if a higher-priority (more urgent) task arrives
- real-time and best-effort priorities (real-time processes have higher priority: if any exist, they will run)

Two kinds of preemption:
- preemption points: predictable overhead, simplified scheduler accounting
- immediate preemption: needed for hard real-time systems; needs special timers and fast interrupt and context-switch handling

[Figure: requests queue in front of a scheduler, which may preempt the task currently holding the resource.]
Scheduling

Scheduling itself is difficult and takes time:
- round-robin: an arriving real-time (RT) process must wait for all queued processes to get their turn – long delay
- priority, non-preemptive: the RT process is served next, but must wait for the currently running process to finish – some delay
- priority, preemptive: the RT process preempts the running process immediately – the only delay is switching and interrupt handling

[Figure: timelines comparing round-robin, non-preemptive priority, and preemptive priority scheduling of an RT process arriving among processes 1..N.]
Priorities and Multimedia

Multimedia streams need predictable access to resources – high priorities, e.g.:
1. multimedia traffic with guaranteed QoS (this class may not exist)
2. multimedia traffic with predictive QoS
3. other requests (must not starve)

Within each class one could have a second-level scheduler:
- classes 1 and 2: real-time scheduling with fine-grained priorities
- class 3: may use traditional approaches such as round-robin
Scheduling in Windows 2000

- Preemptive kernel; schedules threads individually
- Time slices are given in quantums: 3 quantums = 1 clock interval (the length of an interval may vary)
- defaults: Win2000 Server: 36 quantums; Win2000 Workstation (Professional): 6 quantums
- may be increased manually per thread (1x, 2x, 4x, 6x)
- foreground quantum boost (adds 0x, 1x, 2x): the active window can get longer time slices (it is assumed to need fast response)
Scheduling in Windows 2000

32 priority levels, with round-robin (RR) within each level; both interactive and throughput-oriented:

- "Real time" – 16 system levels (16–31): fixed priority, may run forever
- Variable – 15 user levels (1–15): the priority may change (thread priority = process priority ± 2): it drops when the thread uses much CPU, while user interactions and I/O completions increase it
- Idle/zero-page thread – 1 system level (0): runs whenever there are no other processes to run, e.g., clearing memory pages for the memory manager
Scheduling in Linux

Preemptive kernel. Threads and processes used to be treated equally, but Linux (in 2.6) uses thread scheduling.

- SCHED_FIFO (priorities 1–127): may run forever, no timeslices; may use its own scheduling algorithm
- SCHED_RR (priorities 1–127): RR within each priority level; timeslices of 10 ms (quantums)
- SCHED_OTHER: ordinary user processes; "nice" values (−20 to 19) map onto priorities 1 ≤ priority ≤ 40 (default 20); timeslices of 10 ms (quantums)

Threads with the highest goodness are selected first:
- real-time (FIFO and RR): goodness = 1000 + priority
- timesharing (OTHER): goodness = (quantum > 0 ? quantum + priority : 0)

Quantums are reset when no ready process has quantums left (end of epoch): quantum = (quantum / 2) + priority
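The goodness computation above can be sketched directly; this is a toy model of the 2.x-era Linux scheduler, with made-up priority and quantum values:

```python
# Goodness values as described above: real-time threads always win
# because of the +1000 offset; SCHED_OTHER threads are only eligible
# while they have quantum left in the current epoch.

def goodness(policy, priority, quantum=0):
    if policy in ("SCHED_FIFO", "SCHED_RR"):
        return 1000 + priority
    return quantum + priority if quantum > 0 else 0  # SCHED_OTHER

def epoch_reset(quantum, priority):
    """End of epoch: quantum = (quantum / 2) + priority."""
    return quantum // 2 + priority

print(goodness("SCHED_FIFO", 50))              # 1050
print(goodness("SCHED_OTHER", 20, quantum=6))  # 26
print(goodness("SCHED_OTHER", 20, quantum=0))  # 0: must wait for epoch end
print(epoch_reset(0, 20))                      # 20
```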
Scheduling in AIX

Similar to Linux (SCHED_FIFO, SCHED_RR, SCHED_OTHER), but AIX has always used only thread scheduling.

BUT, SCHED_OTHER may change the "nice" values:
- running long (whole timeslices) gives a penalty – the nice value is increased
- being interrupted (e.g., by I/O) gives the initial "nice" value back
Real–Time Scheduling

Multimedia streams are usually periodic (fixed frame rates and audio sample frequencies).

Time constraints for a periodic task:
- s – starting point (the first time the task requires processing)
- e – processing time
- d – deadline
- p – period (r – rate, r = 1/p)
- 0 ≤ e ≤ d (often d ≤ p; we will use d = p, the end of the period, but Σd ≤ Σp is enough)
- the kth processing of the task is ready at time s + (k − 1)p and must be finished by time s + (k − 1)p + d
- the scheduling algorithm must account for these properties

[Figure: timeline showing starting point s, processing time e, deadline d, and period p.]
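The release times and deadlines follow directly from the task parameters; a small sketch (times in ms, the example task is illustrative):

```python
# The kth processing of a periodic task (s = start, p = period,
# d = deadline) is ready at s + (k - 1) * p and must finish by
# s + (k - 1) * p + d.

def release_time(s, p, k):
    return s + (k - 1) * p

def absolute_deadline(s, p, d, k):
    return release_time(s, p, k) + d

# A 30 fps video task: first frame at t = 5 ms, period = deadline = 33 ms.
s, p, d = 5, 33, 33
print(release_time(s, p, 3))          # 71
print(absolute_deadline(s, p, d, 3))  # 104
```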
Real–Time Scheduling

Resource reservation:
- QoS can be guaranteed
- relies on knowledge of the tasks
- no fairness
- origin: time-sharing operating systems
- e.g., earliest deadline first (EDF) and rate monotonic (RM) (AQUA, HeiTS, RT Upcalls, ...)

Proportional share resource allocation:
- no guarantees
- requirements are specified by a relative share
- allocation in proportion to competing shares; the size of a share depends on system state and time
- origin: packet-switched networks
- e.g., Scheduler for Multimedia And Real-Time applications (SMART) (Lottery, Stride, Move-to-Rear List, ...)
Earliest Deadline First (EDF)

Preemptive scheduling based on dynamic task priorities:
- the task with the closest deadline has the highest priority, so stream priorities vary with time
- the dispatcher selects the task with the highest priority

Assumptions:
- requests for all tasks with deadlines are periodic
- the deadline of a task is equal to the end of its period (the start of the next)
- independent tasks (no precedence)
- the run-time of each task is known and constant
- context switches can be ignored
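A minimal EDF simulation under these assumptions can be written in a few lines; the task set (processing time, period) is illustrative:

```python
# Discrete-time EDF simulation: at each step, run the ready task with
# the earliest absolute deadline; a job still unfinished when its next
# period starts has missed its deadline (d = p assumed, as above).

def edf(tasks, horizon):
    """tasks: list of (processing_time e, period p). Returns miss count."""
    remaining = [0] * len(tasks)   # unfinished work of the current job
    deadline = [0] * len(tasks)    # absolute deadline of the current job
    misses = 0
    for t in range(horizon):
        for i, (e, p) in enumerate(tasks):
            if t % p == 0:         # new period: release the next job
                if remaining[i] > 0:
                    misses += 1    # previous job missed its deadline
                remaining[i] = e
                deadline[i] = t + p
        ready = [i for i in range(len(tasks)) if remaining[i] > 0]
        if ready:                  # earliest deadline runs for one unit
            i = min(ready, key=lambda i: deadline[i])
            remaining[i] -= 1
    return misses

# Utilization 2/5 + 4/7 ~ 0.97 <= 1, so EDF meets every deadline.
print(edf([(2, 5), (4, 7)], horizon=35))  # 0
```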
Earliest Deadline First (EDF)

[Figure: example dispatching of two periodic tasks A and B under EDF – the priorities change over time: task A has priority while its deadline is closer, task B otherwise.]
Rate Monotonic (RM) Scheduling

Classic algorithm for hard real-time systems with one CPU [Liu & Layland '73]

Preemptive scheduling based on static task priorities

Optimal: no other algorithm with static task priorities can schedule tasks that cannot be scheduled by RM

Assumptions:
- requests for all tasks with deadlines are periodic
- the deadline of a task is equal to the end of its period (the start of the next)
- independent tasks (no precedence)
- the run-time of each task is known and constant
- context switches can be ignored
- any non-periodic task has no deadline
Rate Monotonic (RM) Scheduling

Process priority is based on task periods:
- the task with the shortest period gets the highest static priority
- the task with the longest period gets the lowest static priority
- the dispatcher always selects the task request with the highest priority

[Figure: priority decreases with period length; example dispatching of task 1 (period p1) and task 2 (period p2) where p1 < p2, so task 1 has the highest priority.]
EDF Versus RM

It might be impossible to prevent deadline misses in a strict, fixed-priority system:

[Figure: two tasks A and B scheduled under different policies. With fixed priorities (A first or B first, with or without dropping), some instance always misses its deadline or time is wasted. Earliest deadline first meets all deadlines. Rate monotonic behaves like the first fixed-priority variant – RM may give some deadline violations that EDF avoids.]
EDF Versus RM

NOTE: this means that EDF is usually more efficient than RM – if context switches are free, EDF can schedule any workload with total utilization ≤ 1, whereas RM can only guarantee schedulability for utilizations up to ln(2).

EDF:
- dynamic priorities, changing over time
- overhead from priority switching
- QoS calculation – maximal throughput: Σ (over all streams i) Ri × ei ≤ 1, where R is the rate and e the processing time

RM:
- static priorities based on periods
- may map priorities onto fixed OS priorities (as in Linux)
- QoS calculation: Σ (over all streams i) Ri × ei ≤ ln(2)
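The two QoS formulas can be checked mechanically; the stream parameters below are illustrative, and the per-task-count RM bound n(2^(1/n) − 1), whose limit is the ln(2) used above, is applied:

```python
import math

# Schedulability tests matching the formulas above: EDF admits any
# workload with utilization <= 1, while RM's sufficient bound for n
# tasks is n * (2^(1/n) - 1), which approaches ln(2) as n grows.

def utilization(streams):
    """streams: list of (rate R_i, processing time e_i) pairs."""
    return sum(r * e for r, e in streams)

def edf_schedulable(streams):
    return utilization(streams) <= 1.0

def rm_schedulable(streams):
    n = len(streams)
    return utilization(streams) <= n * (2 ** (1 / n) - 1)

# Two streams with utilizations ~0.30 and 0.60 (periods 33 ms and 25 ms).
streams = [(1 / 33, 10), (1 / 25, 15)]
print(round(utilization(streams), 3))  # 0.903
print(edf_schedulable(streams))        # True
print(rm_schedulable(streams))         # False: 0.903 > 2*(sqrt(2)-1) ~ 0.83
print(round(math.log(2), 3))           # 0.693, the limit of the RM bound
```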
SMART (Scheduler for Multimedia And Real–Time applications)

Designed for multimedia and real-time applications.

Principles:
- priority – high-priority tasks should not suffer degradation due to the presence of low-priority tasks
- proportional sharing – allocate resources proportionally and distribute unused resources (work conserving)
- tradeoff immediate fairness – real-time and less competitive processes (short-lived, interactive, I/O-bound, ...) get instantaneously higher shares
- graceful transitions – adapt smoothly to resource demand changes
- notification – notify applications of resource changes

Proportional shares; no admission control.
SMART (Scheduler for Multimedia And Real–Time applications)

Tasks have importance and urgency:
- urgency – an immediate real-time constraint, short deadline (determines when a task will get resources)
- importance – a priority measure, expressed by a tuple: [priority p, biased virtual finishing time bvft]
  - p is static: supplied by the user or assigned a default value
  - bvft is dynamic: the virtual finishing time (degree to which the share was consumed) plus a bias (bonus for interactive tasks)

Best-effort schedule based on urgency and importance:
- find the most important tasks – compare tuples: T1 > T2 ⇔ (p1 > p2) ∨ (p1 = p2 ∧ bvft1 > bvft2)
- sort by urgency (EDF-based sorting)
- iteratively select tasks from the candidate set as long as the schedule remains feasible
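The importance comparison above is exactly a lexicographic tuple comparison; a sketch following that rule (the priority and bvft numbers are invented):

```python
# SMART importance: T1 is more important than T2 iff p1 > p2, or
# p1 = p2 and bvft1 > bvft2 -- lexicographic order on (p, bvft).

def more_important(t1, t2):
    """t1, t2: (priority, biased_virtual_finishing_time) tuples."""
    p1, b1 = t1
    p2, b2 = t2
    return p1 > p2 or (p1 == p2 and b1 > b2)

print(more_important((5, 0.2), (3, 0.9)))  # True: higher priority wins
print(more_important((5, 0.2), (5, 0.7)))  # False: tie broken on bvft
# Python's built-in tuple comparison gives the same ordering:
print(more_important((5, 0.2), (3, 0.9)) == ((5, 0.2) > (3, 0.9)))  # True
```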
Evaluation of Real–Time Scheduling

Tests performed by IBM (1993), executing tasks with and without EDF on a 57 MHz, 32 MB RAM AIX Power 1 machine.

Video playback program: one real-time process
- reads compressed data, decompresses it, and presents the video frames to the user via the X server
- the process requires 15 timeslots of 28 ms each per second – 42% of the CPU time
Evaluation of Real–Time Scheduling

[Figure: laxity (remaining time to deadline, in seconds, −0.05 to 0.05) per task/event number, with 3 load processes competing with the video playback. The non-real-time scheduler causes several deadline violations (negative laxity), while the real-time scheduler reaches all its deadlines.]
Evaluation of Real–Time Scheduling

[Figure: laxity (remaining time to deadline) per task number when the number of load processes competing with the video playback is varied (only the video process, 4 other processes, 16 other processes). NB! The EDF scheduler kept its deadlines in all cases.]
Evaluation of Real–Time Scheduling

Tests again performed by IBM (1993) on the 57 MHz, 32 MB RAM AIX Power 1 machine.

"Stupid" end-system program: 3 real-time processes only requesting CPU cycles
- each process requires 15 timeslots of 21 ms each per second – 31.5% of the CPU time each
- 94.5% of the CPU time is required for real-time tasks in total
Evaluation of Real–Time Scheduling

[Figure: laxity (in seconds) per event number with 1 load process competing with the three real-time processes. Without real-time scheduling, laxity repeatedly goes negative; the real-time scheduler reaches all its deadlines.]
Evaluation of Real–Time Scheduling

[Figure: laxity (in seconds) per event number for the three real-time processes with 1 and with 16 load processes. Regardless of the other load, the EDF scheduler reaches its deadlines (laxity almost equal to the 1-load-process scenario). NOTE: the processes are scheduled in the same order.]
Delivery Systems

[Figure: a delivery system – server and clients connected through a network, with data crossing the machines' bus(es).]
Delivery Systems

[Figure: the data path through a server node – from disk through the file system (kernel space) to the application (user space) and down to the communication system, crossing the bus(es). This causes several disk-to-memory transfers and several in-memory data movements and context switches.]
Memory Caching

[Figure: the path from disk through the file system to the application and communication system/network card – the disk accesses are expensive, so caching in the file system is possible.]

How do we manage a cache?
- how much memory to use?
- how much data to prefetch?
- which data item to replace?
- ...
Is Caching Useful in a Multimedia Scenario?

- High-rate data may need lots of memory for caching…
- Tradeoff: amount of memory, algorithm complexity, gain, …
- Cache only frequently used data – how? (e.g., the first (small) parts of a broadcast partitioning scheme, allow "top-ten" only, …)

Buffer size vs. rate – time needed to fill a given amount of memory:

            160 Kbps            1.4 Mbps            3.5 Mbps            100 Mbps
            (e.g., MP3)         (e.g., uncompressed CD)  (e.g., average DVD video)  (e.g., uncompressed HDTV)
  100 MB    85 min 20 s         9 min 31 s          3 min 49 s          8 s
  1 GB      14 hr 33 min 49 s   1 hr 37 min 31 s    39 min 01 s         1 min 22 s
  16 GB     233 hr 01 min 01 s  26 hr 00 min 23 s   10 hr 24 min 09 s   21 min 51 s
  32 GB     466 hr 02 min 02 s  52 hr 00 min 46 s   20 hr 48 min 18 s   43 min 41 s

(32 GB was the maximum amount of memory (in total) that a Dell server could manage in 2004 – and not all of it is used for caching.)
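The table entries follow from fill time = size / rate; a sketch that reproduces the 160 Kbps column (assuming binary sizes and 1 Kbps = 1024 bps, which is what the table's numbers imply):

```python
# Time to fill a buffer of a given size at a given bit rate.

def fill_seconds(size_bytes, rate_bps):
    return size_bytes * 8 / rate_bps

def fmt(seconds):
    h, rest = divmod(round(seconds), 3600)
    m, s = divmod(rest, 60)
    return f"{h} hr {m:02d} min {s:02d} s"

MB, GB = 2**20, 2**30
Kbps = 1024

print(fmt(fill_seconds(100 * MB, 160 * Kbps)))  # 1 hr 25 min 20 s (= 85 min 20 s)
print(fmt(fill_seconds(1 * GB, 160 * Kbps)))    # 14 hr 33 min 49 s
print(fmt(fill_seconds(16 * GB, 160 * Kbps)))   # 233 hr 01 min 01 s
```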
Need For Special "Multimedia Algorithms"?

Most existing systems use an LRU variant:
- keep a sorted list
- replace the first element in the list
- insert new data elements at the end
- if a data element is re-accessed (e.g., new client or rewind), move it back to the end of the list

Extreme example – video frame playout with an LRU buffer (front of the list = longest time since access, end = shortest):
- play video (7 frames):            1 2 3 4 5 6 7
- rewind and restart playout at 1:  7 6 5 4 3 2 1
- playout 2:                        1 7 6 5 4 3 2
- playout 3:                        2 1 7 6 5 4 3
- playout 4:                        3 2 1 7 6 5 4

In this case, LRU replaces the next needed frame. So the answer is, in many cases, YES…
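The effect can be reproduced with a tiny simulation; here the buffer (7 frames) is assumed smaller than the clip (10 frames), so LRU keeps evicting exactly the frames the rewound stream will need, and the replay gets no cache hits at all:

```python
# LRU buffer replaying a video: on sequential re-access of a clip
# larger than the buffer, LRU evicts each frame shortly before the
# rewound stream needs it again.

def lru_access(buffer, frame, capacity):
    """Access a frame; buffer is ordered oldest -> newest. True on hit."""
    if frame in buffer:
        buffer.remove(frame)     # re-accessed: move to most-recent end
        buffer.append(frame)
        return True
    if len(buffer) >= capacity:
        buffer.pop(0)            # evict the least recently used frame
    buffer.append(frame)
    return False

frames, capacity = list(range(1, 11)), 7
buf = []
for f in frames:                 # first playout (cold misses)
    lru_access(buf, f, capacity)
hits = sum(lru_access(buf, f, capacity) for f in frames)  # rewind, replay
print(hits)  # 0
```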
"Classification" of Mechanisms

Block-level caching considers a (possibly unrelated) set of blocks:
- each data element is viewed as an independent item
- usually used in "traditional" systems, e.g., FIFO, LRU, CLOCK, …
- multimedia (video) approaches: Least/Most Relevant for Presentation (L/MRP), …

Stream-dependent caching considers a stream object as a whole:
- related data elements are treated in the same way
- research prototypes in multimedia systems, e.g.: BASIC, DISTANCE, Interval Caching (IC), Generalized Interval Caching (GIC), Split and Merge (SAM), SHR
Least/Most Relevant for Presentation (L/MRP) [Moser et al. 95]

L/MRP is a buffer management mechanism for a single interactive, continuous data stream:
- adaptable to individual multimedia applications
- preloads the units most relevant for presentation from disk
- replaces the units least relevant for presentation
- client-pull based architecture

[Figure: the client requests a homogeneous stream (e.g., MJPEG video) from the server; the client buffer holds continuous object presentation units (COPUs), e.g., MJPEG video frames.]
Least/Most Relevant for Presentation (L/MRP) [Moser et al. 95]

Relevance values are calculated with respect to the current playout of the multimedia stream:
- presentation point (the current position in the file)
- mode / speed (forward, backward, FF, FB, jump)
- the relevance functions are configurable

[Figure: relevance value (0–1.0) per COPU number (10–26) around the current presentation point: "referenced" COPUs ahead of the presentation point in the playback direction get the highest relevance, "history" COPUs behind it get decreasing relevance, and "skipped" COPUs (16, 18, 20, 22, 24, 26) get intermediate values.]
Least/Most Relevant for Presentation (L/MRP) [Moser et al. 95]

Global relevance value:
- each COPU can have more than one relevance value: bookmark sets (known interaction points) and several viewers (clients) of the same stream
- the global relevance value of a COPU = the maximum of its relevance values

[Figure: relevance curves (history set, referenced set, bookmark set) for two clients with current presentation points S1 (around COPU 91) and S2 (around COPU 103); the global relevance value over the loaded frames is the upper envelope of all curves.]
Least/Most Relevant for Presentation (L/MRP)

L/MRP …
- … gives "few" disk accesses (compared to other schemes)
- … supports interactivity
- … supports prefetching
- … is targeted at single streams (users)
- … is expensive (!) to execute (relevance values for all COPUs are recalculated every round)

Variations:
- Q-L/MRP – extends L/MRP with multiple streams and changes the prefetching mechanism (reduces overhead) [Halvorsen et al. 98]
- MPEG-L/MRP – gives different relevance values to different MPEG frame types [Boll et al. 00]
Interval Caching (IC)

Interval caching (IC) is a caching strategy for streaming servers:
- caches data between requests for the same video stream – based on the playout intervals between requests
- a following request is thus served from the cache filled by the preceding stream
- intervals are sorted by length; the buffer requirement of an interval is the data size of the interval
- to maximize the cache hit ratio (i.e., minimize disk accesses), the shortest intervals are cached first

[Figure: streams S11–S13 on video clip 1, S21–S22 on clip 2, and S31–S34 on clip 3 form the intervals I11, I12, I21, and I31–I33, cached in order of increasing length: I32, I33, I21, I11, I31, I12.]
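A sketch of the interval selection described above (the stream names and playout positions are invented):

```python
# Interval caching: form intervals between consecutive streams of the
# same clip, then cache the shortest intervals first within a budget.

def intervals(streams_per_clip):
    """Return (length, clip, follower, leader) for consecutive streams."""
    out = []
    for clip, positions in streams_per_clip.items():
        ordered = sorted(positions.items(), key=lambda kv: kv[1])
        for (follower, p1), (leader, p2) in zip(ordered, ordered[1:]):
            out.append((p2 - p1, clip, follower, leader))
    return out

def cached_intervals(streams_per_clip, budget):
    """Greedily cache the shortest intervals that fit in the budget."""
    chosen = []
    for length, clip, follower, leader in sorted(intervals(streams_per_clip)):
        if length <= budget:
            budget -= length
            chosen.append((clip, follower, leader))
    return chosen

streams = {  # current playout positions of the concurrent streams
    "clip1": {"S11": 90, "S12": 50, "S13": 10},
    "clip2": {"S21": 70, "S22": 40},
}
print(cached_intervals(streams, budget=50))  # [('clip2', 'S22', 'S21')]
```

With a budget of 50, only the shortest interval (length 30, on clip 2) fits; the two length-40 intervals on clip 1 are served from disk.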
Generalized Interval Caching (GIC)

Interval caching (IC) does not work for short clips: a frequently accessed short clip will not be cached.

GIC generalizes the IC strategy:
- it manages intervals for long video objects as IC does
- for short clips it extends the interval definition: it keeps track of a finished stream for a while after its termination and defines the interval for a short stream as the distance between the new stream and the position the old stream would have had in a longer video object
- the cache requirement is, however, only the real requirement
- the shortest intervals are cached, as in IC

[Figure: on short video clip 1, the interval I11 between streams S11 and S12 extends past the end of the clip; the cache requirement C11 covers only the clip itself. On video clip 2, the interval I21 between S21 and S22 is handled as in IC.]
LRU vs. L/MRP vs. IC Caching

What kind of caching strategy is best for VoD streaming? Caching effect:

[Figure: five streams S1–S5 of movie X. With LRU, memory holds the most recently loaded page frames – 4 streams are served from disk, 1 from cache, and some buffering is wasted. With L/MRP, memory holds the frames with the highest global relevance values – 4 streams from disk, 1 from cache. With IC, memory holds the intervals I1–I4 between consecutive streams – 2 streams from disk, 3 from cache.]
LRU vs. L/MRP vs. IC Caching

What kind of caching strategy is best for VoD streaming? Caching effect: IC is best. CPU requirement:

- LRU: for each I/O request, reorder the LRU chain
- L/MRP: for each I/O request, for each COPU: RV = 0; for each stream: tmp = r(COPU, p, mode); RV = max(RV, tmp)
- IC: for each block consumed: if it is the last part of an interval, release the memory element
In-Memory Copy Operations

[Figure: the data path from disk through the file system to the application and down through the communication system to the network card – both the disk/network transfers and the in-memory copies between kernel and application are expensive.]
Cost of Data Transfers

Data copy operations are expensive:
- they consume CPU, memory, hub, bus, and interface resources (proportionally to the data size)
- profiling shows that ~40% of CPU time is consumed by copying data
- the speed gap between memory and CPU increases
- different memory banks have different access times

System calls cause a lot of switches between user and kernel space:
- ~450 ns in 2000 on a 933 MHz Pentium III
- ~920 ns in 2005 on a 1.7 GHz Pentium IV

[Figure: memcpy() throughput on a 1.7 GHz Pentium IV.]
Cost of Data Transfers

THUS, data movement costs should be kept small:
- careful management of continuous media data
- avoid unnecessary physical copy operations
- apply appropriate buffer management schemes
- reduce overhead by removing physical in-memory copy operations, i.e., use zero-copy data paths
Basic Idea of Zero–Copy Data Paths

[Figure: instead of copying the data between the file system, the application, and the communication system, only data pointers are passed between kernel space and user space – the data itself stays in place.]
Zero–Copy (Streaming) Mechanisms

Kernel streaming using zero-copy:
- Linux: sendfile() between two descriptors (file and TCP socket); bi-directional: disk–network and network–disk; needs TCP_CORK
- AIX: send_file(); TCP only; uni-directional: disk–network

Application streaming using zero-copy:
- INSTANCE (MMBUF-based, in NetBSD 1.5), by UniK/IFI (2000); uni-directional: disk–network (network–disk was ongoing work); stream_read() and stream_send() (zero-copy 1), stream_rdsnd() (zero-copy 2)

Other mechanisms: splice(), stream(), IO-Lite, MMBUF, …
INSTANCE Zero–Copy Transfer Rate

- Throughput increase of ~2.7 times per stream (can at least double the number of streams)
- The zero-copy transfer rate is limited by the network card and the storage system: it saturated a 1 Gbps NIC on a 32-bit, 33 MHz PCI bus
- reduced processing time by approximately 50%
- huge improvement in the number of concurrent streams

[Figure: per-stream transfer rates (approx. 6 Mbps and approx. 12 Mbps) for "read, write, with copy", "read, write, no copy", and "read, automatic write, no copy".]

Existing Linux Data Paths

A lot of research has been performed in this area! BUT, what is the status today of commodity operating systems?
Content Download

[Figure: the content download data path – from disk through the file system (kernel space) to the application (user space) and down through the communication system, crossing the bus(es).]
Content Download: read / send

[Figure: the application buffer in user space sits between the kernel's page cache and socket buffer. Data is DMA-transferred from disk into the page cache, copied into the application buffer by read, copied into the socket buffer by send, and DMA-transferred to the NIC.]

For n blocks: 2n copy operations, 2n system calls.
Content Download: mmap / send

[Figure: the file is mmap'ed into the application's address space, so the application shares the page cache with the kernel. send copies the data from the page cache into the socket buffer; DMA transfers move it from disk to the page cache and from the socket buffer to the NIC.]

For n blocks: n copy operations, 1 + n system calls.
Content Download: sendfile

[Figure: with sendfile, the kernel appends a descriptor for the page-cache data to the socket buffer, and a gather DMA transfer sends the data directly from the page cache to the NIC.]

For n blocks: 0 copy operations, 1 system call.
Content Download: Results

Tested transfer of a 1 GB file on Linux 2.6, with both UDP (with enhancements) and TCP.

[Figure: throughput results for UDP and TCP]
Streaming

[Figure: data path for streaming; the application in user space, the file system and communication system in kernel space, connected by bus(es)]
Streaming: mmap / send

Data path: DMA transfer from disk into the page cache, mapped into the application (mmap). For each packet: cork the socket, send the RTP header from the application buffer (copy), send the payload from the mapped file (copy), uncork the socket; DMA transfer to the network card.

Cost: 2n copy operations and 1 + 4n system calls (one mmap, then cork / send / send / uncork per packet).
Streaming: mmap / writev

Data path: as before, the file is mapped with mmap; for each packet a single writev gathers the RTP header (from the application buffer) and the payload (from the mapped file) into the socket buffer (two copies); DMA transfer to the network card.

Cost: 2n copy operations and 1 + n system calls; three fewer system calls per packet than the previous solution.
Streaming: sendfile

Data path: for each packet, cork the socket, send the RTP header from the application buffer (copy), sendfile the payload (append descriptor, gather DMA transfer from the page cache), uncork; DMA transfer to the network card.

Cost: n copy operations and 4n system calls (cork / send / sendfile / uncork per packet).
Streaming: Results

Tested streaming of a 1 GB file on Linux 2.6, RTP over UDP.

[Figure: throughput compared to TCP sendfile (content download)]

Compared to not sending an RTP header over UDP, we get an increase of 29 % (the additional send call). More copy operations and system calls are required, so there is potential for improvement.
Enhanced Streaming: mmap / msend

msend allows sending data from an mmap'ed file without a copy: the payload is appended as a descriptor to the socket buffer and gathered by DMA, while the RTP header is still sent with a regular send (copy).

Data path per packet: cork, send (RTP header, copy), msend (payload, append descriptor + gather DMA transfer), uncork.

Cost: n copy operations and 1 + 4n system calls; the previous solution required one more copy per packet.
Enhanced Streaming: mmap / rtpmsend

The RTP header copy is integrated into the msend system call: rtpmsend copies the header into the socket buffer and appends a descriptor for the payload (gather DMA transfer).

Data path per packet: a single rtpmsend call (after the initial mmap).

Cost: n copy operations and 1 + n system calls; the previous solution (cork / send / msend / uncork) required three more calls per packet.
Enhanced Streaming: mmap / krtpmsend

An RTP engine in the kernel adds the RTP headers, so no header is copied from user space at all; the payload is appended as a descriptor and gathered by DMA.

Cost: 0 copy operations and 1 system call; the previous solution (rtpmsend) required one more call per packet and one more copy per packet.
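To summarize the mmap-based family, the proposed calls might look like the pseudocode signatures below. This is pure illustration: msend, rtpmsend, and krtpmsend are research prototypes from this work, not part of any stock kernel, and the exact signatures are assumptions.

```
/* send bytes of an mmap'ed region without copying (payload by descriptor) */
ssize_t msend(int sock, void *mmap_addr, size_t len);

/* same, but also copy a caller-supplied RTP header into the packet */
ssize_t rtpmsend(int sock, void *rtp_hdr, size_t hdr_len,
                 void *mmap_addr, size_t len);

/* kernel RTP engine generates the headers itself */
ssize_t krtpmsend(int sock, void *mmap_addr, size_t len);
```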
Enhanced Streaming: rtpsendfile

The RTP header copy is integrated into the sendfile system call: rtpsendfile copies the header into the socket buffer and appends a descriptor for the payload (gather DMA transfer).

Cost: n copy operations and n system calls; the existing solution (cork / send / sendfile / uncork) required three more calls per packet.
Enhanced Streaming: krtpsendfile

An RTP engine in the kernel adds the RTP headers, removing the header copy from user space; the payload is appended as a descriptor and gathered by DMA.

Cost: 0 copy operations and 1 system call; the previous solution (rtpsendfile) required one more call per packet and one more copy per packet.
Enhanced Streaming: Results

Tested streaming of a 1 GB file on Linux 2.6, RTP over UDP.

[Figure: throughput of TCP sendfile (content download) and the existing streaming mechanism compared with the enhanced mmap-based mechanisms (~27 % improvement) and sendfile-based mechanisms (~25 % improvement)]
Summary

All resources need to be scheduled. Scheduling algorithms for multimedia tasks have to…
… consider real-time requirements
… provide good resource utilization
(… be implementable)

Memory management is an important issue: caching matters, and copying is expensive.

Rule of thumb: watch out for bottlenecks
- copying and data-touching operations
- frequent context switches (system calls)
- scheduling of slow devices (disk)
- ...