Locality of Reference
2017.10.28 NAGOYA.BIN #1
KOUJI MATSUI (@KEKYO2)
Kouji Matsui - kekyo
• NAGOYA city, AICHI pref., JP
• Twitter – @kekyo2 / Facebook
• ux-spiral corporation
• Microsoft Most Valuable Professional VS and DevTech 2015-
• Certified Scrum master / Scrum product owner
• Center CLR organizer.
• .NET/C#/F#/IL/metaprogramming or the like…
• Bike rider
Agenda
• Physical side scales
• Logical side scales
• Data stream between physicals and logicals
• Locality of reference
• Anti-locality of reference
• Conclusion
Physical side scales
[Diagram: three processors. Processor #1 contains Physical Cores #1-#4, Processor #2 contains Physical Cores #5-#8, and Processor #3 contains Physical Cores #9-#12. Each physical core is drawn with a stack of logical cores; the logical core numbering starts at #1, #17, and #33 for Processors #1, #2, and #3.]
Physical side scales
• Memory and I/O are bound to a fixed CPU/core (not configurable).
• The "shared cache memory" is bound to a fixed CPU/core (not configurable).
• The "cache memory" is bound to a fixed CPU/core (not configurable).
Logical side scales
[Diagram: five processes, Process #1 to Process #5. Each process has its own virtual memory space and a stack of threads; the thread numbering starts at #1, #11, #21, #31, and #41 respectively.]
Logical side scales
[Diagram: the thread stacks of all processes (numbered from Thread #1, #11, #21, #31, and #41) are mapped onto only four logical cores, Logical Core #1 to #4. Labels: "This is the true story" and "Execution context".]
[Diagram: the same threads and logical cores. Label: "Switch execution context".]
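The picture above (many threads, only a few logical cores, and an execution context that is switched between threads) can be sketched as a toy scheduler. This is only an illustrative model: the names `simulate_round_robin` and `count_context_switches`, the round-robin policy, and the thread count of 50 are assumptions made for this sketch, and real OS schedulers are far more involved.

```python
from collections import deque


def simulate_round_robin(thread_ids, core_count, slices):
    """Toy model: in every time slice, each logical core takes the next
    waiting thread from a shared run queue (hypothetical round-robin policy).

    Returns one dict per time slice, mapping core number -> thread id.
    """
    run_queue = deque(thread_ids)
    timeline = []
    for _ in range(slices):
        assignment = {}
        running = []
        for core in range(1, core_count + 1):
            if not run_queue:
                break                      # fewer threads than cores
            thread = run_queue.popleft()
            assignment[core] = thread      # the core now holds this thread's context
            running.append(thread)
        run_queue.extend(running)          # threads go back to wait for their next turn
        timeline.append(assignment)
    return timeline


def count_context_switches(timeline):
    """Count how often a core runs a different thread than in the previous slice."""
    switches = 0
    for previous, current in zip(timeline, timeline[1:]):
        for core, thread in current.items():
            if previous.get(core) != thread:
                switches += 1
    return switches


# Assumed numbers: 50 threads multiplexed onto 4 logical cores for 5 time slices.
timeline = simulate_round_robin(list(range(1, 51)), core_count=4, slices=5)
print(timeline[0])                       # {1: 1, 2: 2, 3: 3, 4: 4}
print(timeline[1])                       # {1: 5, 2: 6, 3: 7, 4: 8}
print(count_context_switches(timeline))  # 16
```

In this model every core switches to another thread's context on every slice, which is the situation the later slides build on: whatever the previous thread left in the core's cache is of little use to the next one.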
Data stream between physicals and logicals
[Diagram, built up over three slides: the threads run on Logical Cores #1-#4. Each logical core is drawn with its own L1/L2 cache (#1-#4). Below the L1/L2 caches are two L3 caches (#1 and #2), and below the L3 caches is the NUMA node bound memory.]
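The layering in the diagram (L1/L2 cache per logical core, shared L3 caches, then NUMA node bound memory) can be sketched as a chain of lookups. This is a minimal model under stated assumptions: the `Level` class is hypothetical, capacities and eviction are ignored, and the mapping of L1/L2 caches #1 and #2 to L3 cache #1 is assumed (the slides only show caches #3 and #4 below L3 cache #2).

```python
class Level:
    """One level of the hierarchy. `store` holds the addresses cached here."""

    def __init__(self, name, backing=None):
        self.name = name
        self.backing = backing   # next (slower) level, or None for main memory
        self.store = set()

    def read(self, address):
        """Return the name of the level that finally served the read."""
        if self.backing is None or address in self.store:
            return self.name
        served_by = self.backing.read(address)   # miss: ask the slower level
        self.store.add(address)                  # keep a copy for the next access
        return served_by


memory = Level("NUMA node bound memory")
l3 = {1: Level("L3 cache #1", memory), 2: Level("L3 cache #2", memory)}
# Assumed topology: cores 1-2 share L3 cache #1, cores 3-4 share L3 cache #2.
l1l2 = {core: Level(f"L1/L2 cache #{core}", l3[1 if core <= 2 else 2])
        for core in (1, 2, 3, 4)}

print(l1l2[4].read(0x1000))   # NUMA node bound memory (cold read)
print(l1l2[4].read(0x1000))   # L1/L2 cache #4 (same core again)
print(l1l2[3].read(0x1000))   # L3 cache #2 (neighbouring core, shared L3)
print(l1l2[1].read(0x1000))   # NUMA node bound memory (core behind the other L3)
```

The further a read has to travel down this chain, the more expensive it is, so where a thread runs determines how much of its data is already close by.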
Locality of reference
[Diagram: Thread #33 runs on Logical Core #4. The values of its context (declaredType, currentType, stopType, field, FieldInfo[]) are loaded/preloaded from NUMA node bound memory through L3 cache #2 into L1/L2 cache #4.]
[Diagram: Logical Core #4 switches to the Thread #42 context. Its values (__stack0_0, __stack0_1, __stack0_2, __stack1_0, declaredType, currentType, local0, local1, field) are loaded/preloaded along the same path.]
[Diagram, two slides: Logical Core #3 with L1/L2 cache #3 appears next to Logical Core #4 with L1/L2 cache #4, both above L3 cache #2. NUMA node bound memory holds declaredType, currentType, stopType, field, and FieldInfo[]; after a switch, stopType, field, and FieldInfo[] are shown again as loaded/preloaded copies.]
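Why it matters that a context's values are already sitting in the cache can be sketched with a tiny cache-line simulation. This is only a model under assumptions: the 64-byte line size, the 8-line LRU cache, the 8-byte value size, and the helper name `count_misses` are all chosen for illustration and do not describe any particular CPU.

```python
from collections import OrderedDict

LINE_SIZE = 64     # bytes per cache line (assumed)
CACHE_LINES = 8    # deliberately tiny cache so that evictions are visible


def count_misses(addresses):
    """Replay byte addresses through a tiny LRU cache and count line misses."""
    cache = OrderedDict()   # cache line number -> None, ordered by recency
    misses = 0
    for address in addresses:
        line = address // LINE_SIZE
        if line in cache:
            cache.move_to_end(line)         # hit: mark as most recently used
        else:
            misses += 1                     # miss: the line must be loaded
            cache[line] = None
            if len(cache) > CACHE_LINES:
                cache.popitem(last=False)   # evict the least recently used line
    return misses


VALUE_SIZE = 8   # assumed size of each value in bytes
COUNT = 64       # number of values touched per pass

# Good locality: 64 values packed next to each other (8 cache lines in total).
packed = [i * VALUE_SIZE for i in range(COUNT)]
# Poor locality: the same number of values, each one on its own cache line.
scattered = [i * LINE_SIZE for i in range(COUNT)]

# Each working set is walked twice, as a thread revisiting its own data would.
print(count_misses(packed * 2))      # 8
print(count_misses(scattered * 2))   # 128
```

The packed working set is loaded once and then served from the cache, while the scattered one misses on every access. A context switch has a similar effect in this model: the incoming thread's values first have to displace whatever the previous thread left behind.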
Anti-locality of reference
[Diagram: threads on Logical Core #3 (L1/L2 cache #3) and Logical Core #4 (L1/L2 cache #4) both load/preload the same "Common value" from NUMA node bound memory through L3 cache #2. Label: "These threads access a common value".]
[Diagram: both cores write the common value back. Label: "Race condition (receive coherence penalty)".]
STRATEGY:
• Turn to immutable
• Hashed indexer
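The coherence penalty and the two strategies named above can be sketched with a very simplified invalidate-on-write model. Everything here is an assumption made for illustration: the function `count_invalidations`, the `(core, kind, line)` operation format, and in particular the reading of "hashed indexer" as "each core writes to its own slot, chosen by an index function, with one cache line per slot". Real coherence protocols are considerably more detailed.

```python
SLOT_COUNT = 8   # hypothetical number of per-core slots, one cache line each


def count_invalidations(operations):
    """Replay (core, kind, line) operations and count how many cached
    copies of a line have to be invalidated because another core writes it.
    """
    holders = {}          # cache line -> set of cores currently caching it
    invalidations = 0
    for core, kind, line in operations:
        cores = holders.setdefault(line, set())
        if kind == "write":
            # A write needs exclusive ownership: every other copy is dropped.
            invalidations += len(cores - {core})
            cores.intersection_update({core})
        cores.add(core)   # after the access, this core holds a copy
    return invalidations


ROUNDS = 100
CORES = (3, 4)   # the two logical cores from the diagram

# (a) Both cores keep writing the same common value (cache line 0).
shared_writes = [(core, "write", 0) for _ in range(ROUNDS) for core in CORES]

# (b) "Hashed indexer", as interpreted here: each core writes its own slot.
hashed_writes = [(core, "write", core % SLOT_COUNT)
                 for _ in range(ROUNDS) for core in CORES]

# (c) "Immutable": the common value is only ever read.
shared_reads = [(core, "read", 0) for _ in range(ROUNDS) for core in CORES]

print(count_invalidations(shared_writes))   # 199
print(count_invalidations(hashed_writes))   # 0
print(count_invalidations(shared_reads))    # 0
```

In this model, alternating writes to one common value invalidate the other core's copy on almost every access, whereas read-only sharing and per-core slots never do.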
Conclusion
• The execution context is not bound to the THREAD; what executes the code is the CPU CORE.
• CPU cores have a structured, nested cache system.
• The cache miss penalty is large.
• The cache coherency penalty is large.
• Both apply to I/O systems too.
• Important cache-related architecture:
◦ Locality of reference
◦ Immutable
Thanks for joining!
My blog
◦ http://www.kekyo.net/
Current active project:
◦ IL2C - A translator implementation of .NET intermediate language to C language.
◦ YouTube: http://bit.ly/2xtu4MH
◦ GitHub: https://github.com/kekyo/IL2C