![Page 1: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/1.jpg)
© 2005 The MathWorks, Inc.
Handling Large Data Sets Efficiently in MATLAB®
Stuart McGarrity
The MathWorks, Inc.
![Page 2: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/2.jpg)
2
Handling Large Data Sets is Like…
![Page 3: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/3.jpg)
3
Agenda
Problems in handling large data sets
Strategies for handling large data sets
Maximizing available memory on your system
Minimizing required memory in MATLAB®
![Page 4: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/4.jpg)
4
Problems in Handling Large Data Sets
![Page 5: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/5.jpg)
5
What are Large Data Sets?
Data in MATLAB® represent physical quantities
Large data
– Lots of quantities
– Varying by time, space
– High resolution
– Example: 5-10 TB of data per flight test
Trends: devices, computers, RAM, hard drives
![Page 6: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/6.jpg)
6
What are Large Data Handling Problems?
Running out of memory
– Large data sets need lots of memory to store and process
– Computers have finite memory
– Data set size > available memory: “Out of memory” errors
Slowness
– Large data sets need lots of operations to process and access
– Today's CPUs have limited speed
– Slowness due to page file use
![Page 7: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/7.jpg)
7
Causes of Out of Memory Error
Required memory > available memory
– Memory constraints on (32-bit) computer system
– Memory usage characteristics of MATLAB
Lack of understanding of memory constraints or requirements
– “>>A=rand(6e3,6e3);B=svd(A); Why out of memory?”
– “I have a 1 GB file but I have 3 GB of RAM. Why out of memory?”
Mistakes
– >>a=rand(10000); % needs 800 MB of storage
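The 800 MB figure above follows from simple arithmetic: a double takes 8 bytes, so an n-by-n double matrix needs 8*n^2 bytes. The sketch below only reproduces that arithmetic for the two examples on this slide; the variable names are illustrative, and it does not attempt to estimate the extra working memory that a function such as svd needs beyond its input.

```matlab
% Estimate storage for a full double matrix before creating it.
bytesPerDouble = 8;

% a = rand(10000) is a 10000-by-10000 double matrix
nA     = 10000;
bytesA = nA * nA * bytesPerDouble;
mbA    = bytesA / 1e6;              % size in MB (decimal)

% A = rand(6e3,6e3) is the input to svd in the first example
nB     = 6e3;
bytesB = nB * nB * bytesPerDouble;
mbB    = bytesB / 1e6;              % input alone; svd needs working memory on top

fprintf('rand(10000):   %.0f MB\n', mbA);
fprintf('rand(6e3,6e3): %.0f MB (input only)\n', mbB);
```

Running this kind of estimate before allocating makes it easier to see whether a request can fit in the contiguous block sizes discussed later in the deck.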
![Page 8: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/8.jpg)
8
Example from CSSM: MATLAB Memory Limitation Problem!!!!!!!!!!!
"erican" <[email protected]> wrote in message news:<ijkq1gh4vqd8@legacy>...
It seems Matlab has a bad memory management. I got a new PC with 4G physical memory and hoped to free my worry about memory usage. However, I found that I can only use about 1G memory for several big matrices. If I continue to create small matrix, I can use about 1.5G memory. Although I tried to change the system performance so that the swap space was set to the maximum, it still doesn't work. I want to maximize the usage of my 4G memory, at least to 2G. Any suggestions?
Thanks!
![Page 9: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/9.jpg)
9
MATLAB Users’ Data Set Sizes
25% of MATLAB users have data set sizes > 100 MB
![Page 10: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/10.jpg)
10
MATLAB Users’ File Sizes
30% of MATLAB users access files > 100 MB
![Page 11: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/11.jpg)
11
Strategies for Handling Large Data Sets
![Page 12: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/12.jpg)
12
Strategies
Ensure available memory > required memory
Maximize available memory on your system
– System configuration
Minimize required memory in MATLAB
– During access, storage, processing, plotting
![Page 13: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/13.jpg)
13
Two Tactics Not Focused on Today
Use 64-bit
– Removes one key memory constraint, allowing processes to address many terabytes of data
Distributed computing
– N machines ~= N x memory
– Subset of all applications (data parallel)
![Page 14: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/14.jpg)
14
Maximizing Available Memory on Your System
![Page 15: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/15.jpg)
15
What’s the biggest array in MATLAB under Windows XP?
>>a=zeros(?,1);
>>whos
a) 600 MB
b) 1 GB
c) 1.5 GB
![Page 16: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/16.jpg)
16
Memory Constraints Causing Out of Memory Errors
[Figure: nested memory diagram. The new data requirement (zeros(1e9,1)) must fit in a contiguous free block of the MATLAB workspace, which also holds other fragments and other MATLAB variables. Workspace + MATLAB footprint and Windows DLLs = MATLAB process virtual memory (limit is OS dependent, e.g., 2 GB in Win2k). MATLAB process virtual memory + other apps = all applications' memory requirement, which must fit in total system memory (RAM + page file).]
![Page 17: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/17.jpg)
17
Total System Memory
[Figure: nested memory diagram. The new data requirement (rand(1e8,1)) must fit in a contiguous free block of the MATLAB workspace, which also holds other fragments and other MATLAB variables. Workspace + MATLAB footprint and Windows DLLs = MATLAB process virtual memory (limit is OS dependent, e.g., 2 GB in Win2k). MATLAB process virtual memory + other apps = all applications' memory requirement, which must fit in total system memory (RAM + page file).]
![Page 18: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/18.jpg)
18
Total System Memory Available
Storage for all processes =
– Physical RAM (fast and expensive) +
– Page file on disk (cheap and slow)
Memory Management Guide: Tech Note 1106
Amount of RAM affects performance; it is not a direct cause of “out of memory” errors
![Page 19: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/19.jpg)
19
Viewing Total System Memory Available and Usage
Task Manager
– Alt-Ctrl-Del
– Right-click task bar
– Physical, commit charge
Process Explorer (Google “process explorer”)
![Page 20: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/20.jpg)
20
Maximizing Total System Memory
Size
– Ensure non-zero or system-managed page file
– Max 4 GB
Performance
– Add RAM
– Max 4 GB
![Page 21: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/21.jpg)
21
The MATLAB Process’s Virtual Memory
[Figure: nested memory diagram. The new data requirement (rand(1e8,1)) must fit in a contiguous free block of the MATLAB workspace, which also holds other fragments and other MATLAB variables. Workspace + MATLAB footprint and Windows DLLs = MATLAB process virtual memory (limit is OS dependent, e.g., 2 GB in Win2k). MATLAB process virtual memory + other apps = all applications' memory requirement, which must fit in total system memory (RAM + page file).]
![Page 22: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/22.jpg)
22
It is Limited and OS-Dependent
32-bit platforms
– Windows 2000 and XP (by default): 2 GB
– Linux/UNIX/Mac, system configurable: 3-4 GB
– Windows XP with /3gb boot.ini switch: 3 GB
64-bit platforms
– Linux: 8 TB (not all 64 bits used)
![Page 23: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/23.jpg)
23
Maximizing The MATLAB Process’s Virtual Memory
Choose OS with largest process memory (in order):
– 64-bit Linux (future Win64)
– 32-bit UNIX/Linux/Mac
– Windows XP with /3gb
– Windows 2000, Windows XP (by default)
![Page 24: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/24.jpg)
24
Checking The Virtual Memory Limit
>>system_dependent memstats
UNDOCUMENTED
![Page 25: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/25.jpg)
25
Increasing Process Limit on XP to 3G: 3GB Switch
Right-click Properties> Advanced > Startup and Recovery > Edit
Make copy of [Operating system line], change comment and add /3gb
Reboot, select new OS option, check memstats
![Page 38: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/38.jpg)
38
The MATLAB Process’s Virtual Memory Limit with 3GB Switch
>>system_dependent memstats
UNDOCUMENTED
![Page 39: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/39.jpg)
39
Workspace Size and Largest Block
[Figure: nested memory diagram. The new data requirement (rand(1e8,1)) must fit in a contiguous free block of the MATLAB workspace, which also holds other fragments and other MATLAB variables. Workspace + MATLAB footprint and Windows DLLs = MATLAB process virtual memory (limit is OS dependent, e.g., 2 GB in Win2k). MATLAB process virtual memory + other apps = all applications' memory requirement, which must fit in total system memory (RAM + page file).]
![Page 40: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/40.jpg)
40
Workspace Size and Largest Block
Workspace size = 2 GB or 3 GB (process limit) minus:
– System DLLs
– Java
– MATLAB.exe and DLLs
Largest block (for numerical arrays) is further reduced by:
– Fragmentation (mainly on Windows)
– Third-party DLLs (e.g., Google Desktop, Fineprint)
– Windows security updates
![Page 41: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/41.jpg)
41
Finding Size of Workspace and Largest Block
>>system_dependent memstats
Workspace size
– 1.7 or 2.7 GB
Largest block
– Goal 1.5 GB
– Diagnose if less
Atlantis
UNDOCUMENTED
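The memstats command above is undocumented. As an alternative sketch, later MATLAB releases on Windows provide a documented memory function that reports comparable numbers, including the largest array that can currently be created. This is an assumption about the reader's setup rather than part of the original talk: the function is not available in the 2005-era release the slides describe, and the ispc guard is there because it has historically been Windows-only.

```matlab
% Query workspace limits with the documented MEMORY function (Windows only).
haveMemory = ispc;   % guard: MEMORY has historically been Windows-only

if haveMemory
    [userView, systemView] = memory;
    % Largest single array MATLAB can create right now (largest contiguous block)
    fprintf('Largest possible array: %.2f GB\n', ...
            userView.MaxPossibleArrayBytes / 2^30);
    % Memory available for all arrays together (roughly the free workspace)
    fprintf('Available for all arrays: %.2f GB\n', ...
            userView.MemAvailableAllArrays / 2^30);
    % Total virtual address space of the MATLAB process (the OS-dependent limit)
    fprintf('Process address space: %.2f GB\n', ...
            systemView.VirtualAddressSpace.Total / 2^30);
else
    disp('memory is not available on this platform; use OS tools instead.');
end
```

If the largest possible array is well below the amount available for all arrays, that gap points to the fragmentation discussed on the next slides.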
![Page 42: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/42.jpg)
42
Diagnosing Memory (Workspace) Fragmentation
>>system_dependent dumpmem
Shows MATLAB memory map and DLLs
Shows where the largest block starts
Reveals what is causing fragmentation
UNDOCUMENTED
![Page 43: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/43.jpg)
43
Example: Print Driver and New System DLLs
Third-party DLLs (e.g., Fineprint)
Windows security updates and service pack DLLs
– Use movedlls.exe
– www.mathworks.com/support (Solution Number: 1-1HE4G5)
![Page 44: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/44.jpg)
44
Maximizing Largest Contiguous Block
Uninstall third-party tools if they are the issue
Use XP SP2 and movedlls fix utility
Other techniques
– Pack (save and load): useful when using lots of variables after working for a while
– Restart MATLAB
![Page 45: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/45.jpg)
45
Final Memory Available on XP
32-bit Windows XP default
– 1.7 GB total, 1.5 GB contiguous
32-bit Windows XP with /3gb switch
– 2.7 GB total, 1.5 GB contiguous
Rough guide to what is possible
– Can process 100s of MB arrays with simple operations
– Can process 10s of MB arrays with complex operations, many operations, many lines of code
![Page 46: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/46.jpg)
46
What’s the biggest real array in MATLAB under 64-bit Linux?
>>a=zeros(?,1);
>>whos
a) 2 GB
b) 16 GB
c) 8 TB
![Page 47: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/47.jpg)
47
64-bit Linux MATLAB
Workspace size limit: TBs
Single array size limit: about 2e9 elements (2^31-2), an mxArray limitation
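The element limit stated above can be turned into a byte count for a real double array. The sketch below is just that arithmetic, assuming 8 bytes per real double element and binary gigabytes; the variable names are illustrative.

```matlab
% Largest real double array under the 2^31-2 element limit quoted above.
maxElements    = 2^31 - 2;
bytesPerDouble = 8;

maxBytes = maxElements * bytesPerDouble;
maxGB    = maxBytes / 2^30;          % binary gigabytes

fprintf('Max elements: %d\n', maxElements);
fprintf('Max size of a real double array: %.2f GB\n', maxGB);
```

So on this platform the per-array element limit, not the multi-terabyte workspace limit, bounds the largest single array.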
![Page 48: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/48.jpg)
48
Summary: Maximizing Available Memory
Maximize total system memory and performance
– Use non-zero page file and have lots of RAM
Minimize the system memory requirements
– Close other applications
Maximize MATLAB process’s virtual memory
– Use best OS or configure
Maximize largest contiguous block
– Diagnose fragmentation
![Page 49: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/49.jpg)
49
Minimizing Required Memory in MATLAB
![Page 50: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/50.jpg)
50
Minimizing Memory Requirements
Data access
Data storage
Processing
Plotting
![Page 51: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/51.jpg)
51
Application Problem Description
Semiconductor wafer thickness test data
– Six months of production, millions of wafers
Problem
– What percentage of the wafers manufactured last month meet thickness specifications?
Thickness data
– Large text file waferdata.csv
– Nine positions plus other information
– Try import
– View contents of waferdata_start.csv
![Page 52: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/52.jpg)
52
Data Access
![Page 53: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/53.jpg)
53
Take Only What You Need
Take only what you need for the calculation
– Not usually a problem with databases
– Common problem with big flat files
Consider block processing
– Independent blocks
– State saved (e.g., filtering)
Tip: Clear variable first
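Block processing with saved state can be sketched with filter, which accepts initial conditions and returns final conditions. The example below is a minimal illustration, not the talk's exercise code: the signal is a small in-memory stand-in for data that would normally be read one block at a time from a file, and the filter coefficients and block size are arbitrary choices.

```matlab
% Filter a signal block by block, carrying the filter state between blocks.
b = ones(1, 5) / 5;          % 5-point moving average (illustrative choice)
a = 1;
x = (1:20)';                 % stand-in for data too large to hold at once

yRef = filter(b, a, x);      % reference: filter everything in one call

blockSize = 6;               % arbitrary block length
state = [];                  % empty means zero initial conditions
y = zeros(size(x));
for k = 1:blockSize:numel(x)
    idx = k:min(k + blockSize - 1, numel(x));
    % Pass in the state left by the previous block, get the new state back
    [y(idx), state] = filter(b, a, x(idx), state);
end

maxDiff = max(abs(y - yRef));
fprintf('Max difference from one-shot filtering: %g\n', maxDiff);
```

Because the state is carried across block boundaries, the blockwise result matches filtering the whole signal at once, while only one block needs to be in memory at a time.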
![Page 54: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/54.jpg)
54
For Text Files Use textscan
Take rows and columns you need
For example: data = textscan(fid, format, N, 'delimiter', ',')
– Select columns with format string
– Select rows with N
Returns cells for each data type in format string. You need to convert to doubles.
Only read in nine columns and one month (1e6 rows)
Exercise 5
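The column and row selection described above can be sketched on a tiny stand-in file. The layout below (an ID, a date, then three thickness values) is hypothetical and is not the real layout of waferdata.csv; the point is only that %*s skips a column, %f keeps one, and N limits the number of rows read.

```matlab
% Create a small stand-in CSV file (hypothetical layout, not waferdata.csv).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'W001,2005-01-03,101,102,103\n');
fprintf(fid, 'W002,2005-01-03,111,112,113\n');
fprintf(fid, 'W003,2005-01-04,121,122,123\n');
fclose(fid);

% Skip the two text columns with %*s, keep the three numeric columns with %f.
fmt = '%*s %*s %f %f %f';
N   = 2;                                  % read only the first two rows

fid = fopen(fname, 'r');
c = textscan(fid, fmt, N, 'delimiter', ',');
fclose(fid);

% textscan returns one cell per kept column; combine them into one matrix.
data = [c{:}];
disp(data);

delete(fname);
```

Only the selected rows and columns are ever converted, so the text columns that are not needed for the calculation never occupy workspace memory.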
![Page 55: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/55.jpg)
55
Binary Files: Try memmapfile
Map a section of a binary file into memory
Benefits
– Faster than fread and fwrite
– Access files with MATLAB indexing operations
Other
– Random access to sections
– Can have multiple views
– Take from MATLAB memory
Example
– Simple homogeneous file
– Mixed data types and access as arrays
![Page 56: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/56.jpg)
56
Memory Mapping Example
%% Default, map whole file as uint8
m=memmapfile('waferdata_uint8.bin')
m.Data(1:20);

%% Specify format and name
m=memmapfile('waferdata_uint8.bin','Format',{'uint8' [20 100] 'x'},'Repeat',20*1000)
A=m.Data;

%% Change format on the fly
m.Format={'uint8' [1 4] 'headerbits';...
          'uint8' [4 9] 'middle';...
          'uint8' [7 1] 'tail'};
A=m.Data;
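Since waferdata_uint8.bin is not available outside the talk, the same calls can be tried on a small file created on the spot. This is a minimal sketch under that assumption: the file contents, the 20-by-5 block shape, and the field name x are made up for illustration.

```matlab
% Create a 200-byte stand-in binary file holding the values 0..199.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, uint8(0:199), 'uint8');
fclose(fid);

% Map the whole file as a uint8 column vector and index into it.
m = memmapfile(fname, 'Format', 'uint8');
firstFive = m.Data(1:5)';

% Map the same file as two consecutive 20-by-5 uint8 blocks named x.
m2 = memmapfile(fname, 'Format', {'uint8', [20 5], 'x'}, 'Repeat', 2);
secondBlock = m2.Data(2).x;      % bytes 101..200 of the file, column-major

disp(firstFive);
disp(secondBlock(1:3, 1)');

% Release the maps before deleting the file (required on Windows).
clear m m2
delete(fname);
```

Only the parts of the file that are indexed are actually paged in, which is what makes this approach attractive for files larger than the workspace.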
![Page 57: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/57.jpg)
57
Data Storage
![Page 58: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/58.jpg)
58
Use Smallest Data Type
Depends on intended actions
Complicated math (e.g., linear algebra)
– Doubles or singles, 8 or 4 bytes
– For example: a=single(7)
Simple arithmetic and original data is integers
– Integers, 1-4 bytes, for example: a=int8(7)
– Can be faster than doubles
– Try with waferdata (Exercise 6)
Sparse
– Just non-zero values and index stored
>>a = sparse(2e9,1,pi);
Exercise 6a cell execute
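The storage cost of each choice above can be checked with whos. The sketch below compares a million-element vector stored as double, single, and int8, plus a sparse vector with a single non-zero; the array length is an arbitrary choice, and the exact byte count of the sparse vector depends on the platform, so it is only printed, not assumed.

```matlab
% Compare storage for the same number of elements in different data types.
n = 1e6;

d  = zeros(n, 1);              % double: 8 bytes per element
s  = zeros(n, 1, 'single');    % single: 4 bytes per element
i8 = zeros(n, 1, 'int8');      % int8:   1 byte per element
sp = sparse(n, 1, pi);         % n-by-1 sparse vector with one non-zero value

infoD  = whos('d');
infoS  = whos('s');
infoI8 = whos('i8');
infoSp = whos('sp');

fprintf('double: %d bytes\n', infoD.bytes);
fprintf('single: %d bytes\n', infoS.bytes);
fprintf('int8:   %d bytes\n', infoI8.bytes);
fprintf('sparse: %d bytes (one non-zero)\n', infoSp.bytes);
```

For data that starts life as small integers, such as 8-bit sensor readings, keeping it in an integer type cuts storage by a factor of eight compared with the default double.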
![Page 59: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/59.jpg)
59
Use Smallest Data Type (cont.)
Categories, dates
– Cell arrays of strings, 60-byte header for each element
– Use sparingly
Contiguousness
– Numeric arrays must be contiguous
– Cell arrays and structures need not be
Comparison with C
– For numerical processing, similar choice of data types
![Page 60: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/60.jpg)
60
Load and Store as uint8
Read in uint8
– Note: when preallocating, you need to specify the data type
Exercise 6
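The preallocation note matters because zeros returns doubles unless a class is given. The sketch below shows the difference and what happens when out-of-range or fractional values are assigned into a uint8 array; the array size of 1000-by-9 is an illustrative stand-in for wafers by positions.

```matlab
% Preallocate with and without a class argument.
A = zeros(1000, 9);              % default class is double
B = zeros(1000, 9, 'uint8');     % explicitly uint8

infoA = whos('A');
infoB = whos('B');
fprintf('double preallocation: %d bytes\n', infoA.bytes);
fprintf('uint8 preallocation:  %d bytes\n', infoB.bytes);

% Values assigned into a uint8 array are converted to uint8:
B(1, 1) = 300;     % saturates at the uint8 maximum
B(1, 2) = 70.6;    % rounds to the nearest integer
disp(B(1, 1:2));
```

Saturation and rounding are worth keeping in mind when the thickness values could exceed the 0-255 range of uint8.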
![Page 61: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/61.jpg)
61
Processing
![Page 62: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/62.jpg)
62
Memory for Processing
Calculate only the results you need
MATLAB operators and functions need extra memory storage
– Passing data to functions by value and assignments
– Copy on write
Makes MATLAB safer, easier to debug
– More memory than in-place operations using pointers
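Copy on write can be sketched with a plain assignment. The example below shows only the visible behavior (the original is unaffected by a change to the copy); the point at which memory is actually duplicated is not observable from this code and has to be watched with an external tool such as Task Manager or Process Explorer. The array size is an arbitrary choice.

```matlab
% Value semantics with copy on write.
x = rand(1e6, 1);     % about 8 MB of doubles

y = x;                % no data is copied yet; x and y share the same storage
y(2) = 1.5;           % first write to y: MATLAB now makes a real copy for y

% x is untouched by the change to y
fprintf('x(2) = %.4f, y(2) = %.4f\n', x(2), y(2));
sameElsewhere = isequal(x([1 3:end]), y([1 3:end]));
fprintf('All other elements identical: %d\n', sameElsewhere);
```

The same rule applies to function arguments: passing a large array to a function is cheap until the function modifies it.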
![Page 63: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/63.jpg)
63
Monitoring MATLAB Memory Usage
Process Explorer or Task Manager
MATLAB Monitoring Tool: www.MATLABcentral.com
MATLAB Central > File Exchange > Utilities > Development Environment > MATLAB Monitoring Tool
Example:
– x=rand(10e6,1);
– y=x; y(2)=1.5;
UNDOCUMENTED
![Page 64: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/64.jpg)
64
Monitoring MATLAB Memory Usage (cont.)
Total memory usage: workspaces, graphics
>>system_dependent('CheckMallocMemoryUsage');
ans = 6820440
Shows results in bytes (not MBytes)
Starts with a few MB
To use it, set an environment variable
– MATLAB_MEM_MGR debug
– In DOS window, C:\ setx MATLAB_MEM_MGR debug
UNDOCUMENTED
![Page 65: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/65.jpg)
65
How much temporary memory does data=data+1 require in an M-file, where data is a 1 MB array?
Run M-file containing:
data=zeros(1e6,1,'int8'); data=data+1;
a) 0 MB
b) 1 MB
c) 2 MB
![Page 66: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/66.jpg)
66
JIT vs. Interpreter for Large Data Set Handling
Run loaddata
Offset in the thickness data needs to be corrected with data = data+5
– At command line
– In M-file
Run loaddata, then Exercise 7a
UNDOCUMENTED
![Page 67: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/67.jpg)
67
Minimizing Copies and Temporaries
Share data rather than pass to functions
– Nested functions
– Global
Reduce size of array to scalars or smaller blocks
– Memory copies are equal to size of array
– Process with for loops, de-vectorize
Can be slower, so must trade off speed for less memory
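The trade-off above can be sketched with a sum of squares. The vectorized form creates a temporary the same size as the input, while the blockwise loop only ever creates a temporary the size of one block. The data and block size below are illustrative choices, not values from the talk.

```matlab
% Sum of squares: vectorized versus blockwise to limit temporary memory.
x = rand(1e6, 1);

% Vectorized: x.^2 is a temporary as large as x itself.
refTotal = sum(x.^2);

% Blockwise: temporaries are at most blockSize elements long.
blockSize = 1e4;
total = 0;
for k = 1:blockSize:numel(x)
    idx = k:min(k + blockSize - 1, numel(x));
    total = total + sum(x(idx).^2);
end

relErr = abs(total - refTotal) / refTotal;
fprintf('Relative difference between the two results: %g\n', relErr);
```

The block size is the tuning knob: larger blocks run faster but need larger temporaries, and a block size of one element is the fully de-vectorized case.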
![Page 68: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/68.jpg)
68
What is the fastest way to process MATLAB matrices with for loops?
a) Down the columns
b) Along the rows
c) Doesn't matter
Exercise 7
UNDOCUMENTED
![Page 69: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com.](https://reader030.fdocuments.net/reader030/viewer/2022033020/56649e2a5503460f94b17ae0/html5/thumbnails/69.jpg)
69
Example: De-Vectorizing
1D 2D
Exercise 7
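The exercise code for the 1D and 2D cases is not part of this transcript, so the following is only a sketch of what de-vectorizing can look like. It counts elements above a threshold without building a logical temporary. The matrix size and threshold are made up; the 2D loop puts the row index in the inner loop so that it walks down each column, which matches MATLAB's column-major storage order.

```matlab
% De-vectorized counting, 1D (linear index) and 2D (row/column) versions.
A = rand(300, 200);
threshold = 0.5;

% 1D: a single loop over linear indices visits elements in storage order.
count1 = 0;
for k = 1:numel(A)
    if A(k) > threshold
        count1 = count1 + 1;
    end
end

% 2D: outer loop over columns, inner loop over rows (down the columns).
count2 = 0;
for c = 1:size(A, 2)
    for r = 1:size(A, 1)
        if A(r, c) > threshold
            count2 = count2 + 1;
        end
    end
end

% Vectorized reference: A > threshold creates a logical array the size of A.
countRef = nnz(A > threshold);
fprintf('1D: %d, 2D: %d, vectorized: %d\n', count1, count2, countRef);
```

Wrapping each version in tic and toc is a simple way to compare loop orders on a particular machine.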
Find Percentage of Wafers Meeting Specification
What percentage of the wafers meet specification?
– Must be < maximum thickness of 200
– Must be > minimum thickness of 70
Exercise 8a, 8c
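One possible way to compute this percentage is sketched below. It assumes one thickness value per wafer, and the random data is a hypothetical stand-in for the exercise data. The vectorized form is compact but builds logical temporaries with one element per data point; the loop form keeps only a scalar counter.

```matlab
% Hypothetical data: one thickness value per wafer (an assumption).
thickness = 50 + 200 * rand(1e5, 1);
maxThickness = 200;
minThickness = 70;

% Vectorized: the comparisons create logical arrays as long as the data.
inSpec = thickness < maxThickness & thickness > minThickness;
pctVectorized = 100 * sum(inSpec) / numel(thickness);

% De-vectorized: only a scalar counter is needed.
count = 0;
for k = 1:numel(thickness)
    if thickness(k) < maxThickness && thickness(k) > minThickness
        count = count + 1;
    end
end
pctLoop = 100 * count / numel(thickness);

fprintf('%.2f%% of wafers meet specification\n', pctLoop);
```

Both variants count the same elements, so the choice between them is the speed versus memory trade-off discussed earlier.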
Plotting
How much extra memory do you need to plot a 10 MB double array?
>> x = rand(125e4,1);
>> plot(x);
a) 10 MB
b) 20 MB
c) 40 MB
Exercise 9a, Process Explorer on MATLAB task
Memory Copies When Plotting
Need extra memory to plot
– For example: >> plot(data(:,1));
plot(x,y)
– Copy of x, y to xdata and ydata properties of line
plot(y)
– Copy of y in ydata and indices in xdata, 1:length(y)
Results
– Memory requirements triple at least (more temporarily)
– Integers need more (stored as doubles)
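The "triple at least" figure can be checked with simple arithmetic. This sketch only counts the copies named on the slide (the original array plus the ydata and xdata copies) and ignores the temporaries and any other figure overhead, so treat it as a lower-bound estimate.

```matlab
% Lower-bound estimate for plot(y) with a 125e4-element double vector.
nElements      = 125e4;
bytesPerDouble = 8;

dataBytes  = nElements * bytesPerDouble;   % the original array: 10,000,000 bytes
ydataBytes = dataBytes;                    % copy of y held in the line's ydata
xdataBytes = dataBytes;                    % indices 1:length(y) held in xdata

totalBytes = dataBytes + ydataBytes + xdataBytes;
extraBytes = totalBytes - dataBytes;

fprintf('Data:  %d bytes\n', dataBytes);
fprintf('Extra: %d bytes (at least)\n', extraBytes);
fprintf('Total: %d bytes (at least)\n', totalBytes);
```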
Minimize Copies: Plot Only What You Need
Limited resolution of screen/human eye
Sub-select, down-sample
– Plot every Nth element
– Plot min/max in each block
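Both down-sampling options can be sketched in a few lines. The signal, the value of `N`, and the variable names are hypothetical. The min/max variant is written as a loop over blocks so that only one block-sized temporary exists at a time.

```matlab
% Hypothetical large signal standing in for the data to be plotted.
y = cumsum(randn(1e6, 1));
N = 1000;                          % keep one point (or one min/max pair) per N samples

% Option 1: plot every Nth element.
idx  = 1:N:numel(y);
ySub = y(idx);

% Option 2: min and max of each block of N samples, which preserves
% short spikes that plain sub-selection can miss.
nBlocks = floor(numel(y) / N);     % any incomplete final block is ignored here
yMin = zeros(1, nBlocks);
yMax = zeros(1, nBlocks);
for b = 1:nBlocks
    first = (b - 1) * N + 1;
    block = y(first:first + N - 1);   % temporary of N elements only
    yMin(b) = min(block);
    yMax(b) = max(block);
end
```

The reduced vectors can then be passed to plot, for example `plot(idx, ySub)` or `plot(1:nBlocks, yMin, 1:nBlocks, yMax)`, so the line objects hold thousands of points instead of a million.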
Minimize Memory Requirements: Review
Accessing data
– Take only what you need
– Try processing in blocks
Data storage
– Store in smallest data type
Processing
– Reduce temporaries with loops, blocking, nested functions, globals
Plotting
– Plot what you need
Summary: Strategies for Handling Large Data Sets
Ensure available memory > required memory
64-bit and distributed computing
32-bit single CPU
– Maximize available memory
– Minimize required memory
Appendix
Appendix Contents
Resources
1 MB is Not Equal to Million Bytes
Setting the Paging File Size
File Size vs. Data Size
Minimizing Other Applications’ Memory
Resources
Technical note 1106
– www.mathworks.com/support, enter 1106
“Large Data Set Handling in MATLAB 7”, Digest article, November 2004: www.mathworks.com/company/newsletters/digest/nov04/newfeatures.html
1 MB is Not Equal to Million Bytes
1 KB = 2^10 bytes = 1024 bytes
– Not 1e3 bytes
1 MB = 2^20 bytes = 1048576 bytes
– Not 1e6 bytes
1 GB = 2^30 bytes = 1073741824 bytes
– Not 1e9 bytes
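As a quick illustration of why the distinction matters, the sketch below converts the size of the 125e4-element double array from the plotting example into binary megabytes. The variable names are only illustrative.

```matlab
% Binary units as defined on this slide.
KB = 2^10;
MB = 2^20;
GB = 2^30;

% The 125e4-element double array used in the plotting example.
nBytes = 125e4 * 8;            % 10,000,000 bytes, i.e. 1e7

sizeInMB = nBytes / MB;        % a little over 9.5 binary megabytes
fprintf('%d bytes = %.2f MB (binary)\n', nBytes, sizeInMB);
```

So an array described loosely as "10 MB" occupies slightly less than 10 MB when megabytes are counted as 2^20 bytes.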
Setting the Paging File Size
My Computer > Properties > Advanced > Performance > Advanced > Virtual memory, Change then press “Set”
Set to System Managed Size or Custom
Processing slows down when the page file is in use (swapping)
File Size vs. Data Size
Binary files: File size = data size
– Assuming you use same data type as in the binary files
Text files: File size ~= data size (not equal), for example:
– Data element: double = 8 bytes
– Text file element, format 1: 3.4, 4 chars = 4 bytes
– Text file element, format 2: 3.47896e+001, 13 chars = 13 bytes
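The character counts above can be reproduced with a short sketch. It assumes each text element is followed by a single delimiter character (a comma here) and that the file uses one byte per character; both are assumptions, since the slide does not say how the counts were obtained.

```matlab
% In memory, a double always takes 8 bytes regardless of its value.
value = 3.4;
info = whos('value');
bytesInMemory = info.bytes;

% In a text file, the size depends on the format. The trailing comma is
% an assumed one-character delimiter.
textFormat1 = '3.4,';
textFormat2 = '3.47896e+001,';
bytesFormat1 = numel(textFormat1);   % one byte per character assumed
bytesFormat2 = numel(textFormat2);

fprintf('In memory: %d bytes\n', bytesInMemory);
fprintf('Format 1:  %d bytes per element\n', bytesFormat1);
fprintf('Format 2:  %d bytes per element\n', bytesFormat2);
```

Depending on the format, a text file can therefore be smaller or larger than the data it expands to once loaded as doubles.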
Minimizing Other Applications’ Memory
Shut them down
Restart computer, if can’t get < 300 MB
Use Windows msconfig to avoid starting up applications
Not too important if you have a page file