Memory management in Linux kernel
by vadim-nikitin
Memory management tasks
• Physical memory allocator
• Physical memory management
• Virtual memory allocator
• PTE management
• Memory allocator for the kernel's own needs
Memory management subsystem
• More than 100K lines of code
• Buddy allocator
• Page replacement (“LRU” reclaim model)
• PTE management
• SLAB/SLOB/SLUB kernel allocators
• Page cache, writeback, readahead, swap
• Cgroup memory controller
• Compaction
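To make the buddy allocator concrete, here is a toy binary buddy allocator. It is a sketch, not mm/page_alloc.c: zones, migrate types and per-CPU page lists are ignored, and the class and method names are hypothetical. It shows the two core moves: splitting a larger free block when no block of the requested order is free, and merging a freed block with its buddy, whose page frame number (PFN) differs only in bit `order`.

```python
class BuddyAllocator:
    """Toy binary buddy allocator over PFNs 0 .. 2**max_order - 1 (hypothetical)."""

    def __init__(self, max_order):
        self.max_order = max_order
        # free_lists[k] holds the start PFNs of free blocks of 2**k pages
        self.free_lists = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)  # begin with one maximal free block

    def alloc(self, order):
        """Return the start PFN of a free block of 2**order pages."""
        for k in range(order, self.max_order + 1):
            if self.free_lists[k]:
                pfn = min(self.free_lists[k])  # lowest address, for determinism
                self.free_lists[k].remove(pfn)
                # Split: each halving returns the upper half to the free lists.
                while k > order:
                    k -= 1
                    self.free_lists[k].add(pfn + (1 << k))
                return pfn
        raise MemoryError("no free block of order %d" % order)

    def free(self, pfn, order):
        """Free a block, merging with its buddy for as long as the buddy is free."""
        while order < self.max_order:
            buddy = pfn ^ (1 << order)  # buddies differ only in bit `order`
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            pfn = min(pfn, buddy)  # merged block starts at the lower buddy
            order += 1
        self.free_lists[order].add(pfn)


allocator = BuddyAllocator(max_order=4)  # 16 pages
one_page = allocator.alloc(0)    # splits 16 -> 8 + 4 + 2 + 1 + 1
two_pages = allocator.alloc(1)   # reuses the order-1 block left by that split
allocator.free(one_page, 0)
allocator.free(two_pages, 1)     # merges all the way back to one 16-page block
```

The XOR trick is what makes buddy merging cheap: finding a block's partner needs no search, only a bit flip.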
Hardware
• x86_64
• Paging (MMU, TLB, ...)
• 4 KB, 2 MB and 1 GB pages
• NUMA
• 4-level page tables
• Hardware referenced (accessed) bit
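With 4 KB pages and 4-level page tables, a 48-bit x86_64 virtual address splits into four 9-bit table indices and a 12-bit page offset. The sketch below uses the kernel's level names (PGD, PUD, PMD, PTE); the function `split_va` and the constant names are hypothetical. A 2 MB page is mapped directly by a PMD entry and a 1 GB page by a PUD entry, which is why those sizes fall out of the same bit layout.

```python
PAGE_SHIFT = 12   # 4 KB base pages
LEVEL_BITS = 9    # 512 eight-byte entries per 4 KB table

# Lowest address bit indexed by each level: bits 47..39, 38..30, 29..21, 20..12.
LEVEL_SHIFTS = {"pgd": 39, "pud": 30, "pmd": 21, "pte": 12}

# A leaf entry at a given level maps a page of this size: 4 KB, 2 MB, 1 GB.
PAGE_SIZE_AT = {"pte": 1 << 12, "pmd": 1 << 21, "pud": 1 << 30}


def split_va(va):
    """Split an x86_64 virtual address into 4-level table indices.

    Bits above 47 are ignored (in canonical addresses they only repeat bit 47).
    """
    mask = (1 << LEVEL_BITS) - 1
    parts = {name: (va >> shift) & mask for name, shift in LEVEL_SHIFTS.items()}
    parts["offset"] = va & ((1 << PAGE_SHIFT) - 1)
    return parts


print(split_va(0x7F4D0D80E000))
# {'pgd': 254, 'pud': 308, 'pmd': 108, 'pte': 14, 'offset': 0}
```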
Physical memory description
• Node (pg_data_t)
• Zone (struct zone)
• Page (struct page)
$ cat /proc/zoneinfo | grep Node
Node 0, zone DMA
Node 0, zone DMA32
Node 0, zone Normal
Node 1, zone Normal
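The node/zone hierarchy can be read programmatically from /proc/zoneinfo. This sketch assumes the "Node N, zone NAME" header and the indented "pages free N" line found in 3.x-era kernels; the other fields differ between kernel versions, and the function name is hypothetical. The input below is a made-up excerpt in that format.

```python
import re

ZONE_HEADER = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)")
PAGES_FREE = re.compile(r"^\s+pages free\s+(\d+)")


def parse_zoneinfo(text):
    """Map (node, zone name) -> free pages, from /proc/zoneinfo-style text."""
    zones = {}
    current = None
    for line in text.splitlines():
        header = ZONE_HEADER.match(line)
        if header:
            current = (int(header.group(1)), header.group(2))
            zones[current] = None  # set once the "pages free" line is seen
            continue
        free = PAGES_FREE.match(line)
        if free and current is not None:
            zones[current] = int(free.group(1))
    return zones


# Made-up excerpt; real files contain many more fields per zone.
SAMPLE = """\
Node 0, zone      DMA
  pages free     3977
        min      7
Node 0, zone    DMA32
  pages free     12345
Node 1, zone   Normal
  pages free     42
"""
print(parse_zoneinfo(SAMPLE))
```

On a live system, pass `open("/proc/zoneinfo").read()` instead of the sample.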
Virtual memory description
• Address space (struct mm_struct)
• VM area (struct vm_area_struct)
$ cat /proc/self/maps
00400000-0040c000 r-xp 00000000 08:03 2359718 /usr/bin/cat
0060b000-0060c000 r--p 0000b000 08:03 2359718 /usr/bin/cat
0060c000-0060d000 rw-p 0000c000 08:03 2359718 /usr/bin/cat
011a7000-011c8000 rw-p 00000000 00:00 0 [heap]
7f4d072e5000-7f4d0d80e000 r--p 00000000 08:03 2369473 /usr/lib/locale/locale-archive
7f4d0d80e000-7f4d0d9c2000 r-xp 00000000 08:03 2366682 /usr/lib64/libc-2.18.so
7f4d0d9c2000-7f4d0dbc2000 ---p 001b4000 08:03 2366682 /usr/lib64/libc-2.18.so
7f4d0dbc2000-7f4d0dbc6000 r--p 001b4000 08:03 2366682 /usr/lib64/libc-2.18.so
...
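Each /proc/self/maps line corresponds to one vm_area_struct: address range, permissions, file offset, device, inode and path. A minimal parser, with hypothetical `Vma` and `parse_maps_line` names, might look like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vma:
    start: int
    end: int
    perms: str            # "r", "w", "x" or "-", then "p" (private) or "s" (shared)
    offset: int           # offset into the backing file
    dev: str              # "major:minor" of the device holding the file
    inode: int            # 0 for anonymous mappings
    path: Optional[str]   # file path, a tag such as "[heap]", or None

    @property
    def size(self):
        return self.end - self.start


def parse_maps_line(line):
    # maxsplit=5 keeps a path that contains spaces in a single field
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else None
    return Vma(start, end, fields[1], int(fields[2], 16), fields[3],
               int(fields[4]), path)


vma = parse_maps_line("00400000-0040c000 r-xp 00000000 08:03 2359718 /usr/bin/cat")
print(hex(vma.start), vma.size // 4096, vma.perms, vma.path)  # 0x400000 12 r-xp /usr/bin/cat
```

Iterating over `open("/proc/self/maps")` with the same function gives the VMA list of the running interpreter.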
File mappings
• File mappings (struct address_space)
• Radix tree with all resident pages
• Page cache
• Major/minor page faults
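A fault on a file page that is already in the page cache is minor (the kernel only maps the page); a fault that must read from disk is major. The per-process counters are visible through getrusage(). In this sketch (function names hypothetical) the file is written just before it is mapped, so its pages are resident and the reads should cost minor faults only. The minor count can be lower than the number of pages touched, because since kernel 3.15 "fault-around" may map several neighbouring cached pages on one fault.

```python
import mmap
import resource
import tempfile


def fault_counts():
    """(minor, major) page faults taken so far by this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def read_mapped_file(npages=64):
    size = npages * mmap.PAGESIZE
    with tempfile.TemporaryFile() as f:
        f.write(b"\xab" * size)  # the data now sits in the page cache
        f.flush()
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as m:
            before = fault_counts()
            # touch one byte per page to fault each page into our page tables
            checksum = sum(m[i] for i in range(0, size, mmap.PAGESIZE))
            after = fault_counts()
    return checksum, after[0] - before[0], after[1] - before[1]


checksum, minor, major = read_mapped_file()
print("minor faults:", minor, "major faults:", major)
```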
Kernel API
• __get_free_page()
• kmalloc()/kfree()
• vmalloc()
• ...
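The page allocator hands out physically contiguous blocks of 2^order pages, while vmalloc() only needs whole pages that are virtually contiguous. The `get_order` below computes the same result as the kernel helper of that name for positive sizes; `pages_for` is a hypothetical helper comparing the two approaches, assuming 4 KB pages. Small kmalloc() requests are served from slab caches instead and are not modelled here.

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT


def get_order(size):
    """Smallest order with (PAGE_SIZE << order) >= size, for size >= 1."""
    if size < 1:
        raise ValueError("size must be positive")
    return ((size - 1) >> PAGE_SHIFT).bit_length()


def pages_for(size):
    """Pages used by one buddy block vs. page-granular (vmalloc-style) allocation."""
    contiguous = 1 << get_order(size)      # a whole 2**order block, physically contiguous
    page_granular = -(-size // PAGE_SIZE)   # ceil(size / PAGE_SIZE) independent pages
    return contiguous, page_granular


print(pages_for(5 * PAGE_SIZE))  # (8, 5)
```

A 20 KB request therefore costs 32 KB of contiguous memory from the buddy allocator, but only 20 KB of scattered pages through vmalloc(), at the price of extra page-table entries and TLB pressure.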
Userspace API
• Page faults
• mmap()/munmap()
• brk()
• mlock()/munlock()
• fadvise(), madvise()
• ...
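madvise() lets a process tell the kernel how it will use a range. Per madvise(2), after MADV_DONTNEED on a private anonymous mapping the pages are dropped and the next access gets a fresh zero-filled page. The sketch assumes Linux and Python 3.8+ (for `mmap.madvise`); `dontneed_demo` is a hypothetical name. Passing -1 as the file descriptor makes CPython create an anonymous mapping.

```python
import mmap

PAGE = mmap.PAGESIZE


def dontneed_demo(npages=4):
    # Anonymous (fileno -1) private mapping, the kind malloc() uses for big blocks.
    m = mmap.mmap(-1, npages * PAGE, flags=mmap.MAP_PRIVATE,
                  prot=mmap.PROT_READ | mmap.PROT_WRITE)
    try:
        m[0] = 0x5A                      # write fault allocates a zeroed page, then stores
        assert m[0] == 0x5A
        m.madvise(mmap.MADV_DONTNEED)    # drop the whole range
        return m[0]                      # re-faulted as a zero-filled page
    finally:
        m.close()


print(dontneed_demo())  # 0
```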
Memory reclaim
• Normal/direct reclaim (free pool)
• Per-node kswapd
• Working set
• Memory pressure
• File memory vs anonymous memory
• Swap
• OOM
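Background and direct reclaim are driven by per-zone watermarks (min, low, high; they are listed in /proc/zoneinfo). The following is a deliberately simplified model with hypothetical names: the real allocator also considers the allocation order, lowmem_reserve and GFP flags, may try compaction first, and can end in the OOM killer. Roughly: allocating above the low watermark is the fast path; dropping below it wakes kswapd; dropping below min forces the allocating task into direct reclaim; kswapd stops once free memory is above the high watermark.

```python
from enum import Enum


class Action(Enum):
    ALLOCATE = "allocate from the free pool"
    ALLOCATE_AND_WAKE_KSWAPD = "allocate, and wake kswapd for background reclaim"
    DIRECT_RECLAIM = "allocating task must reclaim pages itself"


def on_allocation(free, nr_pages, wmark_min, wmark_low):
    """Simplified per-zone watermark check for an allocation of nr_pages."""
    left = free - nr_pages
    if left > wmark_low:
        return Action.ALLOCATE
    if left > wmark_min:
        return Action.ALLOCATE_AND_WAKE_KSWAPD
    return Action.DIRECT_RECLAIM


def kswapd_may_sleep(free, wmark_high):
    """kswapd keeps reclaiming until free pages are above the high watermark."""
    return free > wmark_high
```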
“LRU” model
• Five doubly linked lists: inactive file, active file, inactive anon, active anon, unevictable
• Referenced flag (PG_referenced) in the flags of struct page
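The five lists are indexed the way enum lru_list in include/linux/mmzone.h does it: a base, plus 1 for "active", plus 2 for "file", with the unevictable list last. The Python mirror below is a sketch; `page_lru` follows the logic of the kernel helper of the same name (unevictable first, then file vs anon, then active).

```python
from enum import IntEnum

LRU_BASE, LRU_ACTIVE, LRU_FILE = 0, 1, 2  # offsets as in include/linux/mmzone.h


class LruList(IntEnum):
    INACTIVE_ANON = LRU_BASE
    ACTIVE_ANON = LRU_BASE + LRU_ACTIVE
    INACTIVE_FILE = LRU_BASE + LRU_FILE
    ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE
    UNEVICTABLE = 4


def page_lru(is_file, active, unevictable=False):
    """Pick the LRU list a page belongs on from its flags."""
    if unevictable:                  # e.g. mlock()ed pages
        return LruList.UNEVICTABLE
    lru = LruList.INACTIVE_FILE if is_file else LruList.INACTIVE_ANON
    return LruList(lru + (LRU_ACTIVE if active else 0))
```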
List transition rules
• mark_page_accessed():
  – unreferenced -> referenced
  – inactive && referenced -> active
• shrink_inactive_list():
  – if (ptes referenced)
    • anonymous -> active
    • referenced -> active
    • (ptes referenced > 1) -> active (3.2)
    • (vm_flags & VM_EXEC) -> active (3.2)
    • set referenced
    • rotate
  – else
    • reclaim
• shrink_active_list():
  – if referenced
    • file & VM_EXEC -> rotate
  – -> inactive
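The transition rules can be modelled in a few lines. This is a simplified sketch: `Page`, `Verdict` and `check_inactive_page` are hypothetical names, pagevec batching and mlocked pages are ignored, and the inactive-list decision follows the slide's rules, which correspond to page_check_references() in mm/vmscan.c of that era. Note how a file page mapped once needs two scans with referenced PTEs before it is activated.

```python
from dataclasses import dataclass
from enum import Enum


@dataclass
class Page:
    anon: bool = False
    active: bool = False
    referenced: bool = False   # PG_referenced
    vm_exec: bool = False      # mapped by a VM_EXEC vma (program text)


def mark_page_accessed(page):
    """Unreferenced -> referenced; inactive and referenced -> active."""
    if not page.active and page.referenced:
        page.active = True
        page.referenced = False
    elif not page.referenced:
        page.referenced = True


class Verdict(Enum):
    ACTIVATE = "move to the active list"
    KEEP = "rotate on the inactive list"
    RECLAIM = "reclaim"


def check_inactive_page(page, referenced_ptes):
    referenced_page = page.referenced
    page.referenced = False          # PG_referenced is tested and cleared
    if referenced_ptes:
        if page.anon:
            return Verdict.ACTIVATE
        page.referenced = True       # "set referenced"
        if referenced_page or referenced_ptes > 1:
            return Verdict.ACTIVATE
        if page.vm_exec:
            return Verdict.ACTIVATE
        return Verdict.KEEP          # "rotate"
    return Verdict.RECLAIM


page = Page()  # a file page mapped and touched once
print(check_inactive_page(page, referenced_ptes=1))  # Verdict.KEEP
print(check_inactive_page(page, referenced_ptes=1))  # Verdict.ACTIVATE
```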
Memory pressure balancing
• nr_pages_to_scan = nr_pages / 2^priority
• priority counts down from 12 to 0, so successive passes scan 1/4096, 1/2048, 1/1024, ... of each list
• swappiness
• active > inactive
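The priority arithmetic in numbers: DEF_PRIORITY is 12 in the kernel, and each reclaim pass scans the list size shifted right by the current priority. The sketch below (hypothetical `scan_targets`) shows only this base count; the kernel's get_scan_count() then splits it between anon and file lists using swappiness and recent rotation statistics, which is left out.

```python
DEF_PRIORITY = 12  # the first, gentlest pass scans 1/4096 of each list


def scan_targets(lru_size):
    """(priority, pages to scan) for each pass as pressure increases."""
    return [(prio, lru_size >> prio) for prio in range(DEF_PRIORITY, -1, -1)]


# 1 GB of 4 KB pages is 262144 pages: 64 at priority 12, everything at 0.
for prio, nr in scan_targets(262144)[:3]:
    print(prio, nr)
```

Lists with fewer than 4096 pages get a zero scan count at priority 12, so small lists are only scanned once pressure has pushed the priority down.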
Yasearch-specific problems & solutions
• Working set > 1/2 of available memory
• Memory thrashing
• promote_mapped_pages
• file_inactive_ratio
Monitoring & tools
• top
• vmtouch
• /proc/vmstat
• /proc/buddyinfo
• /proc/slabinfo
• perf top
• OOM messages in dmesg
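/proc/buddyinfo lists, per node and zone, how many free blocks of each order (0, 1, 2, ...) the buddy allocator holds, which makes fragmentation visible: plenty of free pages but no high-order blocks means compaction work ahead. A parsing sketch with hypothetical function names follows; the input is a made-up excerpt in the buddyinfo format.

```python
def parse_buddyinfo(text):
    """Map (node, zone) -> list of free block counts indexed by order."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        head, _, rest = line.partition("zone")
        node = int(head.split()[1].rstrip(","))
        fields = rest.split()
        result[(node, fields[0])] = [int(c) for c in fields[1:]]
    return result


def free_pages(counts):
    """Total free pages: a block of order k holds 2**k pages."""
    return sum(n << order for order, n in enumerate(counts))


def largest_free_order(counts):
    """Highest order with at least one free block, or -1 if none."""
    return max((order for order, n in enumerate(counts) if n), default=-1)


SAMPLE = """\
Node 0, zone      DMA      1      1      0      0      2      1      1      0      1      1      3
Node 0, zone   Normal    100     50     10      0      0      0      0      0      0      0      0
"""
for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
    print(node, zone, free_pages(counts), largest_free_order(counts))
```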
Demonstration
Cgroups
• Each cgroup has its own LRU lists
• No common LRU (since 3.3)!
• Common free pool(s)
• Common kswapd thread(s)
• Global reclaim vs target reclaim
Memory controller
• memory.limit_in_bytes
• memory.soft_limit_in_bytes (will be deprecated)
• memory.use_hierarchy
• ...
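The memory controller is configured through plain files. A minimal sketch for cgroup v1, with a hypothetical `create_memcg` helper: create a directory under the memory hierarchy, write the hard limit to memory.limit_in_bytes, and optionally move a process in by writing its PID to cgroup.procs. It assumes a v1 memory hierarchy and root privileges; cgroup v2 uses different file names (for example memory.max).

```python
import os


def create_memcg(root, name, limit_bytes, pid=None):
    """Create a cgroup-v1 memory cgroup under `root` and set its hard limit.

    On a real system `root` is the memory controller mount point, often
    /sys/fs/cgroup/memory; writing there requires root privileges.
    """
    path = os.path.join(root, name)
    os.makedirs(path, exist_ok=True)  # on cgroupfs, mkdir creates the control files
    with open(os.path.join(path, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))
    if pid is not None:
        with open(os.path.join(path, "cgroup.procs"), "w") as f:
            f.write(str(pid))  # moves the whole process into the cgroup
    return path


# Example (needs root and a cgroup-v1 memory hierarchy):
# create_memcg("/sys/fs/cgroup/memory", "search", 8 << 30, os.getpid())
```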
Monitoring
• memory.usage_in_bytes
• memory.max_usage_in_bytes
• memory.stat
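memory.stat is a list of "name value" lines; in cgroup v1 it includes the group's own five LRU lists (active_anon, inactive_anon, active_file, inactive_file, unevictable), reported in bytes. A small parser with hypothetical names; the input is a made-up excerpt.

```python
def parse_memory_stat(text):
    """Parse cgroup-v1 memory.stat ("name value" per line) into a dict."""
    stats = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if value:
            stats[name] = int(value)
    return stats


LRU_KEYS = ("inactive_anon", "active_anon", "inactive_file", "active_file", "unevictable")


def lru_summary(stats):
    """Bytes on each of the cgroup's LRU lists, plus file and anon totals."""
    lru = {key: stats.get(key, 0) for key in LRU_KEYS}
    lru["file"] = lru["inactive_file"] + lru["active_file"]
    lru["anon"] = lru["inactive_anon"] + lru["active_anon"]
    return lru


SAMPLE = """\
cache 4194304
rss 8388608
inactive_anon 0
active_anon 8388608
inactive_file 3145728
active_file 1048576
unevictable 0
"""
summary = lru_summary(parse_memory_stat(SAMPLE))
print(summary["file"], summary["anon"])  # 4194304 8388608
```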
Accounting
• Each page belongs to exactly one cgroup
• The cgroup that touches a page first becomes its owner
• memory.move_charge_at_immigrate
Yasearch-specific problems & solutions
• memory.low_limit_in_bytes
• First accessed – owner? mlock()? low_limit?
• memory.recharge_on_pgfault
Compaction
• Migration of movable physical pages toward the top of the zone
• https://lwn.net/Articles/368869
• Broken in 3.3-3.7
• Replacement for lumpy reclaim
• Use perf top to diagnose problems
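As the LWN article describes, compaction runs two scanners over a zone: a migration scanner moving up from the bottom looking for movable pages, and a free scanner moving down from the top looking for free pages. Movable pages are migrated into the free slots until the scanners meet, leaving contiguous free memory at the bottom. The toy model below (hypothetical `compact`, one character per page, no batching or migrate types) shows only that movement; note that an unmovable page still splits the result.

```python
def compact(zone):
    """Toy zone compaction. 'M' movable used page, 'F' free, 'U' unmovable."""
    pages = list(zone)
    migrate = 0               # migration scanner starts at the bottom
    free = len(pages) - 1     # free scanner starts at the top
    moved = 0
    while True:
        while migrate < free and pages[migrate] != "M":
            migrate += 1
        while free > migrate and pages[free] != "F":
            free -= 1
        if migrate >= free:
            break             # scanners met: this compaction pass is over
        pages[free], pages[migrate] = "M", "F"   # migrate the page upward
        moved += 1
    return "".join(pages), moved


print(compact("MFMFUMFF"))  # ('FFFFUMMM', 2)
```

After the pass, pages 0 to 3 form a free order-2 block that the buddy allocator could not have provided before.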
Thank you for your attention!