TagIt: An Integrated Indexing and Search Service for File Systems
Hyogi Sim†,∗, Youngjae Kim‡, Sudharshan S. Vazhkudai∗, Geoffroy R. Vallee∗, Seung-Hwan Lim∗, and Ali R. Butt†
Virginia Tech†, Oak Ridge National Laboratory∗, Sogang University‡
Need for scientific data management service
• Big data in scientific computing
  – Ever-increasing data generation rate from scientific applications and experiments
  – Growing number of data files in the shared storage system
  – Extremely cumbersome to locate datasets of interest
    • Where are my result files from the Supernova simulation last month?
[Figure: Number of files in the Spider II storage system (32 PB) at the Oak Ridge Leadership Computing Facility]
Existing solutions
• File systems do not directly support scalable search and discovery semantics
  – GPFS, Lustre, HDFS, GlusterFS, Ceph, …
  – They are designed to provide scalable storage and failure resilience
• Tagging and search solutions in commodity/desktop file systems cannot be simply extended to large-scale file systems
  – Spotlight/HFS+ in OS X, Google Desktop
• Ad-hoc methods to manage the scientific datasets
  – Directory hierarchy and descriptive file names
  – Manual annotations and domain-specific datasets with external database catalog (e.g., ESGF, …)
Current state of the art: using an external database catalog
• Extra resources and efforts to design, deploy and maintain at scale
• Updating the external database catalog is very costly
• Disconnect between file system and external database catalog
  – Inevitable inconsistency between the two systems
[Figure: File System and External Database Catalog, connected by "FS Scan & DB Update"]
TagIt: file system-integrated data management service
File system-integrated distributed metadata indexing
• Supporting user-defined tags
• Consistent and scalable metadata indexing
Additional data management services using active operations
• Server-side data reduction/filtering
• Automatic metadata extraction framework
TagIt integrates user-defined metadata with the datasets, making the file system inherently searchable!
GlusterFS: shared-nothing distributed FS
• No dedicated metadata server
  – Directory hierarchy is mirrored on all volume servers
  – A file is created in a single volume server (DHT)
[Figure: Clients #1, #2, #3, … connect over the network to Volume servers #1, #2, #3, …, each backed by one brick (Brick #1, #2, #3)]
$ mkdir /mnt/gluster/testdir
  (/testdir is created on Brick #1, Brick #2, and Brick #3)
$ touch /mnt/gluster/testdir/file1
$ touch /mnt/gluster/testdir/file2
  (/testdir/file1 and /testdir/file2 are each created on a single brick)
• File/directory semantics are kept in volume servers
• All operations to a file are isolated to a single volume server
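The placement rule on this slide (directories mirrored everywhere, each file on exactly one server) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `pick_brick` and `Cluster` are hypothetical names, and the SHA-1 modulo hash stands in for GlusterFS's own DHT, which is not reproduced here.

```python
import hashlib
import posixpath


def pick_brick(path, num_bricks):
    """Map a file path to exactly one brick index (0-based).

    Assumption: a generic hash-modulo rule, not GlusterFS's actual DHT.
    """
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_bricks


class Cluster:
    """Toy model of shared-nothing bricks with no metadata server."""

    def __init__(self, num_bricks):
        self.bricks = [{"dirs": set(), "files": set()} for _ in range(num_bricks)]

    def mkdir(self, path):
        # The directory hierarchy is mirrored on every brick.
        for brick in self.bricks:
            brick["dirs"].add(path)

    def touch(self, path):
        # A file lives on the single brick chosen by the hash.
        idx = pick_brick(path, len(self.bricks))
        if posixpath.dirname(path) not in self.bricks[idx]["dirs"]:
            raise FileNotFoundError(posixpath.dirname(path))
        self.bricks[idx]["files"].add(path)
        return idx


cluster = Cluster(3)
cluster.mkdir("/testdir")
placement = {name: cluster.touch("/testdir/" + name) for name in ("file1", "file2")}
```

Because the brick is a pure function of the path, any client can locate a file without asking a central server, which is what lets later slides keep index updates local to one volume server.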
TagIt architecture overview
[Figure: Clients #1, #2, #3, … connect over the network to Volume Servers #1, #2, #3, …, each with a brick, a DB Manager, and an Active Manager. The DB Managers together form the Distributed Metadata Index Database. Each client runs the TagIt Utility, which provides a find-like command line utility and a proc-like dynamic virtual view.]
Distributed metadata index database
• File system distributes files evenly across multiple servers based on DHT
• In the shared-nothing architecture, all operations to a file take place in a single server
Index shard schema (one shard per volume server):
  FILEID (GID, GFID)
  FILEPATHNAME (FID, GID, PATH, NAME)
  ATTR.NAME (NID, NAME)
  ATTR.VALUE (XID, GID, NID, VALUE)
Volume server #1, Index shard #1 (Brick #1 holds metadata and data of a.txt):
  FILEID: GID=1, GFID=1000
  FILEPATHNAME: FID=10, GID=1, PATH=/my/test, NAME=a.txt
  ATTR.NAME: NID=101, NAME=job
  ATTR.VALUE: XID=100, GID=1, NID=101, VALUE="sim-a"
Volume server #2, Index shard #2 (Brick #2 holds metadata and data of b.txt):
  FILEID: GID=1, GFID=1001
  FILEPATHNAME: FID=10, GID=1, PATH=/my/test, NAME=b.txt
  ATTR.NAME: NID=101, NAME=job
  ATTR.VALUE: XID=100, GID=1, NID=101, VALUE="sim-b"
…
• Index database is evenly distributed across multiple servers
• The consistency and durability problems are localized to a single server
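To make the four-table shard layout concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The column names and example rows come from the slide; the SQL table names (`file_id`, `file_path`, `attr_name`, `attr_value`), the column types, and the `make_shard`/`search` helpers are assumptions made for illustration, not TagIt's actual schema definition.

```python
import sqlite3

# Assumed DDL: column names follow the slide, types are a guess.
SCHEMA = """
CREATE TABLE file_id    (gid INTEGER PRIMARY KEY, gfid INTEGER);
CREATE TABLE file_path  (fid INTEGER PRIMARY KEY, gid INTEGER, path TEXT, name TEXT);
CREATE TABLE attr_name  (nid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attr_value (xid INTEGER PRIMARY KEY, gid INTEGER, nid INTEGER, value TEXT);
"""

FIND_BY_TAG = """
SELECT p.path || '/' || p.name
FROM attr_value v
JOIN attr_name n ON n.nid = v.nid
JOIN file_path p ON p.gid = v.gid
WHERE n.name = ? AND v.value = ?
"""


def make_shard(gfid, filename, job):
    """Build one per-server index shard holding the slide's example rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO file_id VALUES (1, ?)", (gfid,))
    db.execute("INSERT INTO file_path VALUES (10, 1, '/my/test', ?)", (filename,))
    db.execute("INSERT INTO attr_name VALUES (101, 'job')")
    db.execute("INSERT INTO attr_value VALUES (100, 1, 101, ?)", (job,))
    db.commit()
    return db


def search(shards, name, value):
    """Run the same tag query on every shard and merge the results."""
    hits = []
    for db in shards:
        hits.extend(row[0] for row in db.execute(FIND_BY_TAG, (name, value)))
    return sorted(hits)


shards = [make_shard(1000, "a.txt", "sim-a"), make_shard(1001, "b.txt", "sim-b")]
found = search(shards, "job", "sim-a")
```

Note that both shards reuse the same local IDs (GID 1, FID 10, and so on), as on the slide: identifiers only need to be unique within a shard, since each shard is queried independently.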
Index update: TagIt-Sync
• Synchronous index update
  – Consistent, durable index database
  – Significant performance penalty due to the extra I/O operations from DB
[Figure: In the volume server, a client I/O request reaches the file I/O manager, which performs the I/O operation on the brick (e.g., XFS); the index DB manager then applies the index DB update to the metadata DB file before the call returns to the client.]
Index update: TagIt-Async
• Dedicated DB update thread
  – Negligible runtime overhead
  – Consistency and durability of the index database
• Queueing delay is under 1 ms in a congested environment (1:8 server-to-client ratio)
• For an unexpected shutdown, lost records are recovered from the GlusterFS journal and the backend local FS (30 sec. to recover metadata from 10,000 files)
[Figure: In the volume server, a client I/O request reaches the file I/O manager, which performs the I/O operation on the brick (e.g., XFS), sends an update request to the index DB manager, and returns to the client. A DB thread updates the memory-mapped (mmaped) DB, which the OS periodically syncs to the metadata DB file.]
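The asynchronous path can be sketched as a producer/consumer queue: the I/O path only enqueues an update request and returns, while a dedicated thread applies updates to the database. The sketch below is a simplified illustration, not TagIt's implementation: `AsyncIndexer` and its one-table schema are hypothetical, an in-memory SQLite database stands in for the memory-mapped DB file, and crash recovery from the journal is not modeled.

```python
import queue
import sqlite3
import threading


class AsyncIndexer:
    """Toy model of a dedicated DB update thread fed by a queue."""

    def __init__(self):
        # check_same_thread=False lets the worker thread use this connection;
        # the main thread only reads it after the worker has been joined.
        self.db = sqlite3.connect(":memory:", check_same_thread=False)
        self.db.execute("CREATE TABLE idx (path TEXT PRIMARY KEY, size INTEGER)")
        self.db.commit()
        self.pending = queue.Queue()
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def write(self, path, size):
        # The file I/O itself would happen here; the index update is only
        # queued, so the client is not blocked on database I/O.
        self.pending.put((path, size))
        return "ok"

    def _drain(self):
        while True:
            item = self.pending.get()
            if item is None:  # shutdown marker
                break
            self.db.execute("INSERT OR REPLACE INTO idx VALUES (?, ?)", item)
            self.db.commit()

    def close(self):
        self.pending.put(None)
        self.worker.join()


indexer = AsyncIndexer()
for i in range(100):
    indexer.write(f"/my/test/file{i}", i)
indexer.close()
count = indexer.db.execute("SELECT COUNT(*) FROM idx").fetchone()[0]
```

The trade-off is visible in the structure: between `write` returning and the worker committing, a record exists only in the queue, which is why the slide pairs this design with recovery from the GlusterFS journal and the backend local FS.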
TagIt: data management service
• User-defined tags and file search
  – TagIt indexes the standard POSIX extended attributes
    • setfattr and getfattr commands
  – The tagit utility searches files based on stat and extended attributes
    • e.g., Where are the files that I generated last month with the Supernova simulation?
• Advanced active operations associated with the search
  – Similar to find … -exec ...
  – Operations are offloaded and performed in the volume servers
Tagging with standard xattr commands:
user $ ls /tagit/data
chkpnt1.out chkpnt2.out chkpnt3.out run.log tmp.txt
user $ setfattr -n job -v Supernova /tagit/data/chkpnt*.out
File search based on tags:
user $ tagit /tagit/data -attr "job=Supernova"
/tagit/data/chkpnt1.out
/tagit/data/chkpnt2.out
/tagit/data/chkpnt3.out
[Figure: The client sends the search query over the network to Volume servers #1, #2, …, each with an Index DB, a brick, and an Active Manager; each server returns its search result to the client.]
Dataset of interest:
user $ tagit /tagit/data -attr "job=Supernova"
/tagit/data/chkpnt1.out    # we want to calculate the average
/tagit/data/chkpnt2.out    # of temperature values in each file
/tagit/data/chkpnt3.out
Active operation:
user $ tagit /tagit/data -attr "job=Supernova" -exec ./avg
avgtemp=1000    # result of ./avg /tagit/data/chkpnt1.out
avgtemp=2000    # result of ./avg /tagit/data/chkpnt2.out
avgtemp=1500    # result of ./avg /tagit/data/chkpnt3.out
[Figure: The client sends the search query plus the command over the network to Volume servers #1, #2, …; each server's Active Manager takes the local search result from its Index DB, performs active processing on the files in its brick, and returns the execution result to the client.]
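The active operation is essentially a scatter-gather: the query and the command travel to every volume server, each server runs the command on its own matching files, and only the small results come back. A minimal sketch, assuming hypothetical `VolumeServer` and `tagit_exec` names and invented temperature values chosen so that the averages match the slide's output:

```python
def avg(values):
    """Stand-in for the ./avg program from the slide."""
    return sum(values) / len(values)


class VolumeServer:
    """Toy volume server: path -> (tags, data) for files stored locally."""

    def __init__(self, files):
        self.files = files

    def search(self, key, value):
        # Local index lookup: only this server's files are considered.
        return sorted(p for p, (tags, _) in self.files.items() if tags.get(key) == value)

    def search_exec(self, key, value, op):
        # Active processing: run op next to the data, return only the result.
        return {p: op(self.files[p][1]) for p in self.search(key, value)}


def tagit_exec(servers, key, value, op):
    """Client side: broadcast query + command, then merge per-server results."""
    merged = {}
    for server in servers:
        merged.update(server.search_exec(key, value, op))
    return merged


# Hypothetical file contents, split across two servers.
servers = [
    VolumeServer({
        "/tagit/data/chkpnt1.out": ({"job": "Supernova"}, [900, 1100]),
        "/tagit/data/run.log": ({}, [0]),
    }),
    VolumeServer({
        "/tagit/data/chkpnt2.out": ({"job": "Supernova"}, [1900, 2100]),
        "/tagit/data/chkpnt3.out": ({"job": "Supernova"}, [1400, 1600]),
    }),
]
results = tagit_exec(servers, "job", "Supernova", avg)
```

In this model the client never reads file contents; it receives one number per matching file, which is the data-reduction effect the slide attributes to offloading.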
Dataset of interest:
user $ tagit /tagit/data -attr "job=Supernova" -exec ./avg
avgtemp=1000    # result of ./avg /tagit/data/chkpnt1.out
avgtemp=2000    # result of ./avg /tagit/data/chkpnt2.out
avgtemp=1500    # result of ./avg /tagit/data/chkpnt3.out
Indexing the result (metadata extraction):
user $ tagit /tagit/data -attr "job=Supernova" -exec ./avg -index
Search with new attribute:
user $ tagit /tagit/data -attr "job=Supernova and avgtemp>1000"
/tagit/data/chkpnt2.out
/tagit/data/chkpnt3.out
[Figure: The client sends the search query plus the command over the network to Volume servers #1, #2, …; each server's Active Manager takes the local search result, performs active processing, stores the processing result in its Index DB, and sends a return code to the client.]
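The metadata-extraction step can be read as: parse each `key=value` line printed by the operator, store it as a new attribute of the file, and let later searches filter on it. The sketch below illustrates that idea only; `index_output`, `matches`, and `search` are hypothetical helpers, the condition format is invented for the example, and the real `-index` option's parsing rules are not documented on the slide.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


def index_output(index, path, output_line):
    """Store one 'key=value' line of operator output as an attribute of path."""
    key, _, value = output_line.strip().partition("=")
    index.setdefault(path, {})[key] = value


def matches(tags, key, op, wanted):
    """Compare numerically when both sides parse as numbers, else as strings."""
    if key not in tags:
        return False
    have = tags[key]
    try:
        return OPS[op](float(have), float(wanted))
    except ValueError:
        return op == "=" and have == wanted


def search(index, conditions):
    """Return paths whose attributes satisfy every (key, op, value) condition."""
    return sorted(p for p, tags in index.items()
                  if all(matches(tags, *cond) for cond in conditions))


# Existing user-defined tags, as set with setfattr on the earlier slide.
index = {f"/tagit/data/chkpnt{i}.out": {"job": "Supernova"} for i in (1, 2, 3)}

# Outputs of ./avg from the slide, indexed as a new 'avgtemp' attribute.
for i, line in zip((1, 2, 3), ("avgtemp=1000", "avgtemp=2000", "avgtemp=1500")):
    index_output(index, f"/tagit/data/chkpnt{i}.out", line)

hot = search(index, [("job", "=", "Supernova"), ("avgtemp", ">", "1000")])
```

Here chkpnt1.out drops out because its indexed value of 1000 is not strictly greater than 1000, which agrees with the two files listed on the slide.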
Performance evaluation
• Implementation using GlusterFS-3.7
  – Modular architecture based on translators
  – Server-side: A dedicated translator for the metadata indexing and active operations using SQLite embedded database
  – Client-side: Command line utilities using GlusterFS API (glapi)
1. What is the overhead of the extra metadata indexing?
2. What is the file search performance?
3. How effective are the active operators?
1. What is the overhead of the extra metadata indexing?
[Figure: Normalized IOPS of TagIt-Async relative to GlusterFS (y-axis 0 to 1.2)]
F-create: 0.98, F-stat: 0.95, F-read: 0.98, F-remove: 0.91, D-create: 0.97, D-stat: 1.00, D-remove: 0.96
• mdtest on a 104-node cluster
• Two four-core Xeon E5410 processors with 16 GB RAM
• Mellanox MT25208 10 Gbit/sec InfiniBand network
• 80 volume servers using 80 physical nodes (tmpfs) and 24 clients
• Overhead is less than 10%
2. What is the file search performance?
• Comparing to the external database approach
  – TagIt vs. MySQL, both with 16 identical servers using SSDs
  – Workload: 1.3 million entries from a Spider II PFS daily snapshot
• Populating the index database with MySQL took 96 minutes
  1. Scan the file system
  2. Populate the database with the scan result
• TagIt does not need extra data population
Test queries and method
• Test file search queries
  – Q1: Locate files/directories with pathname containing ‘never-existing’ (name)
  – Q2: Count the number of regular files in ‘/proj’, owned by me (path, mode, uid)
  – Q3: Find regular files with a ‘.mpi’ extension owned by our group under /proj (path, mode, uid)
  – Q4: List all files owned by our group (path, mode, gid)
  – Q5: List all files that were created within the last 24 hours (path, mode, ctime)
• Test method
  – Each client executes a query 50 times in a row.
  – The number of clients is increased up to 16.
[Figure: Q5 - Record Distribution: number of records (0 to 40,000) per server ID (1 to 16), MySQL-16 vs. TagIt]
[Figure: Q4 - Record Distribution: number of records (0 to 600) per server ID (1 to 16), MySQL-16 vs. TagIt]
Chart annotations: 562/647; 35,150/50,502
[Figure: Q4: runtime (seconds, 0 to 900) vs. number of clients (1, 2, 4, 8, 16), MySQL-16 vs. TagIt]
[Figure: Q5: runtime (seconds, 0 to 1600) vs. number of clients (1, 2, 4, 8, 16), MySQL-16 vs. TagIt]
3. How effective are the active operators?
• Workload
  – AMIP* atmospheric measurement dataset with 132 netCDF files (each 1.2 GB, total 150 GB)
  – Calculating the average temperature from each file
• Offline vs. TagIt operator
  – 16 volume servers
  – Offline: traditional way with file I/O system calls
    • up to 16 processes from 16 client nodes
  – TagIt: using the active operator
    • tagit /AMIP -name *.nc -exec ./getavg
*AMIP: Atmospheric Model Intercomparison Project
Active operator runtime comparison
[Figure: runtime (seconds, 0 to 70) vs. number of clients]
Number of clients: 1, 2, 4, 8, 16
Offline-Multi: 65.52, 34.91, 19.51, 11.65, 7.28
TagIt: 4.26, 4.26, 4.26, 4.26, 4.26
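As a quick reading aid, the ratio between the two series on this slide can be computed directly from the runtime labels. The short sketch below only restates the slide's numbers; the variable names are illustrative.

```python
clients = [1, 2, 4, 8, 16]
offline_multi = [65.52, 34.91, 19.51, 11.65, 7.28]  # seconds, from the slide
tagit = [4.26] * 5                                   # seconds, from the slide

# Ratio of Offline-Multi runtime to TagIt runtime for each client count.
speedup = {c: round(o / t, 2) for c, o, t in zip(clients, offline_multi, tagit)}

for c in clients:
    print(f"{c:>2} clients: TagIt is {speedup[c]}x faster than Offline-Multi")
```

The gap narrows as offline clients are added, but even with 16 client processes the offline runtime (7.28 s) stays above TagIt's constant 4.26 s.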
TagIt summary
• File system-integrated indexing and search service
  – Consistent, scalable metadata indexing framework
  – Advanced data management services including active operations and metadata extraction
  – No need for additional resources
Questions?
Query Performance at Scale
[Figure: runtime (seconds, 0 to 7) vs. number of volume servers (2 to 94) for Q1, Q2, and Q3, showing the query broadcasting overhead; annotated 4.5 sec. and 6.1 sec.]
• 96 volume servers using 48 physical nodes (OLCF Rhea)
• Populating 105 million files, metadata index database: 140 GB
• Executing queries from a single server
Chart annotations: Q1: 0.013x, Q2: 0.018x, Q3: 0.016x
Number of results: Q1: 0, Q2: 1 (count), Q3: 4,766
Active operator runtime comparison
[Figure: runtime (seconds, 0 to 70) vs. number of clients]
Number of clients: 1, 2, 4, 8, 16
Offline-Single: 65.52, 43.56, 35.79, 34.63, 34.25
Offline-Multi: 65.52, 34.91, 19.51, 11.65, 7.28
TagIt: 4.26, 4.26, 4.26, 4.26, 4.26