Descriptive Data Analysis of File Transfer Data
![Page 1: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/1.jpg)
Descriptive Data Analysis of File Transfer Data
Sudarshan Srinivasan
Victor Hazlewood
Gregory D. Peterson
![Page 2: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/2.jpg)
Objective
· Understand the GridFTP log transfer data we have at NICS.
· Analyze the data and identify areas of potential improvement.
· Perform predictive analysis to improve efficiency.
· Apply knowledge to XSEDE service providers.
![Page 3: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/3.jpg)
NICS GridFTP Infrastructure
![Page 4: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/4.jpg)
GridFTP Logging
· GridFTP data transfer protocol version 5.2.2.
· Two types of logging: "usage" logging and "log_transfer" logging (enabled in 5.2.2).
· Prior to 5.2.2, endpoint IP address data was filled with 0.0.0.0.
· Thanks to the Globus folks for fixing this bug!
![Page 5: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/5.jpg)
Transfer Logs
· NICS uses a PostgreSQL database for storing transfer log data.
· Two new tables: n_gridftp_usage and n_gridftp_usage_detail.
· n_gridftp_usage: quick lookup of aggregate monthly GridFTP usage information.
· n_gridftp_usage_detail: detailed records of each data transfer.
· Log data includes: starttime, endtime, nbytes, user, filename, source and destination endpoints.
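The relationship between the detail table and the monthly aggregate table can be sketched as a simple GROUP BY. This is a minimal sketch, not the actual NICS schema: Python's built-in sqlite3 stands in for PostgreSQL, the column types and the `username` column name are assumptions, and the sample rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical schema: column names follow the fields listed on the slide,
# types are guesses. "user" is renamed "username" because USER is a
# reserved word in PostgreSQL.
conn.execute("""
    CREATE TABLE n_gridftp_usage_detail (
        starttime TEXT, endtime TEXT, nbytes INTEGER,
        username TEXT, filename TEXT, source TEXT, destination TEXT
    )
""")

# Made-up sample rows for illustration only.
rows = [
    ("2013-04-01 13:20:41", "2013-04-01 13:20:42", 1048576, "alice", "/a", "10.0.0.1", "10.0.0.2"),
    ("2013-04-15 09:00:00", "2013-04-15 09:00:10", 2097152, "bob", "/b", "10.0.0.1", "10.0.0.3"),
    ("2013-05-02 08:00:00", "2013-05-02 08:00:05", 524288, "alice", "/c", "10.0.0.4", "10.0.0.1"),
]
conn.executemany(
    "INSERT INTO n_gridftp_usage_detail VALUES (?, ?, ?, ?, ?, ?, ?)", rows
)

# Monthly aggregate: the kind of summary n_gridftp_usage is described as holding.
monthly = conn.execute("""
    SELECT substr(starttime, 1, 7) AS month,
           COUNT(*) AS transfers,
           SUM(nbytes) AS total_bytes
    FROM n_gridftp_usage_detail
    GROUP BY month
    ORDER BY month
""").fetchall()

for month, transfers, total_bytes in monthly:
    print(month, transfers, total_bytes)
```

Keeping a precomputed monthly table avoids rescanning tens of millions of detail rows for routine usage lookups.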
![Page 6: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/6.jpg)
Log Data Collection
· Data from each GridFTP server is copied as log files to a central NFS location.
· Each month we run a processing script on the log files that checks for errors in the log entries.
· Following this, we run a script to load the log files into a database table.
· We chose transfer log data for the year 2013 for this analysis.
DATE=20130401132041.657463 HOST=datamover1.nics.utk.edu PROG=globus-gridftp-server NL_EVNT=FTP_INFO START=20130401132041.534646 USER=username NBYTES=1048576 VOLUME=/ STREAMS=1 STRIPS=1 DEST=[192.249.6.164] TYPE=RETR CODE=226
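A log entry of this shape is a sequence of KEY=VALUE tokens, so a loading script could parse it roughly as follows. This is a minimal sketch under assumptions: the function names are hypothetical, the sample entry is modeled on the example above, and DATE is assumed to mark the end of the transfer while START marks its beginning.

```python
from datetime import datetime


def parse_transfer_line(line):
    """Split a log entry made of KEY=VALUE tokens into a dict."""
    record = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that carry no '='
            record[key] = value
    return record


def parse_timestamp(stamp):
    """Parse a YYYYMMDDhhmmss.ffffff timestamp as seen in DATE and START."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S.%f")


# Illustrative entry modeled on the example above.
sample = (
    "DATE=20130401132041.657463 HOST=datamover1.nics.utk.edu "
    "PROG=globus-gridftp-server NL_EVNT=FTP_INFO "
    "START=20130401132041.534646 USER=username NBYTES=1048576 "
    "STREAMS=1 DEST=[192.249.6.164] TYPE=RETR CODE=226"
)

record = parse_transfer_line(sample)
start = parse_timestamp(record["START"])
end = parse_timestamp(record["DATE"])  # assumption: DATE is the completion time
print(record["TYPE"], int(record["NBYTES"]), (end - start).total_seconds())
```

A monthly error check could build on this by rejecting entries whose required keys are missing or whose timestamps fail to parse.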
![Page 7: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/7.jpg)
Log Data Analysis
· Two variables were identified: number of transfers and total amount of data transferred.
· Data transfer rate based on starttime, endtime and nbytes.
· Monthly visual comparison of data coming into and going out of NICS from everywhere.
· Intra XSEDE site number of transfers and data transferred coming into and going out of NICS.
· Bucketing of transfer data based on transfer size (ts).
· R statistical computing language was used to plot all histograms and graphs.
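The transfer rate mentioned above follows directly from the three logged fields. The sketch below is one plausible way to compute it, assuming decimal gigabits (10^9 bits per second); the slides do not state which convention was used, and the function name is hypothetical.

```python
from datetime import datetime


def transfer_rate_gbps(starttime, endtime, nbytes):
    """Return the average rate in gigabits per second (10**9 bits/s)."""
    elapsed = (endtime - starttime).total_seconds()
    if elapsed <= 0:
        return None  # rate is undefined for zero or negative durations
    return nbytes * 8 / elapsed / 1e9


# Made-up example: 5 * 10**9 bytes moved in 10 seconds.
start = datetime(2013, 4, 1, 13, 20, 0)
end = datetime(2013, 4, 1, 13, 20, 10)
rate = transfer_rate_gbps(start, end, 5 * 10**9)
print(rate)  # 4.0
```

Very short transfers make the elapsed time tiny and the computed rate noisy, which is one reason to restrict rate comparisons to larger transfers.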
![Page 8: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/8.jpg)
Basic Statistics for the year 2013
Type Quantity
Total Transfers 67,160,380
Average transfers per month 5,596,698
File transfers ts > 64 GB 813 (0.001%)
File transfers 1 MB < ts < 64GB 19,374,549 (28.85%)
File transfers ts < 1 MB 47,785,018 (71.15%)
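The three size classes in the table can be reproduced with a small bucketing function. This is a sketch under assumptions: binary units (1 MB = 2^20 bytes) are used, and transfers exactly at a boundary are placed in the middle class, since the slides specify neither.

```python
from collections import Counter

MB = 2**20  # assumption: binary units; the slides do not say which
GB = 2**30


def size_bucket(nbytes):
    """Assign a transfer to one of the three size classes from the table."""
    if nbytes < MB:
        return "ts < 1 MB"
    if nbytes <= 64 * GB:
        return "1 MB <= ts <= 64 GB"  # boundary handling is an assumption
    return "ts > 64 GB"


def bucket_percentages(sizes):
    """Return the percentage of transfers that fall in each bucket."""
    counts = Counter(size_bucket(n) for n in sizes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bucket: 100.0 * count / total for bucket, count in counts.items()}


# Made-up transfer sizes in bytes, for illustration only.
sizes = [512, 4096, 900_000, 5 * MB, 70 * GB]
percentages = bucket_percentages(sizes)
print(percentages)
```

Applied to the counts in the table, the same arithmetic gives 813 / 67,160,380, or about 0.001%, for the largest class.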
![Page 9: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/9.jpg)
Number of transfers and amount transferred for the year 2013
Figure, panel 1: Number of transfers (in millions) by month. Total = 83.54 million.
Figure, panel 2: Total amount transferred (in TB) by month. Total = 1235.7 TB.
Legend: Mean
![Page 10: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/10.jpg)
Percentage of transfers vs Transfer size for the year 2013
Total transfers: 67,160,380
Figure axes: Transfer size (ts) on the x-axis, percentage of transfers on the y-axis.
![Page 11: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/11.jpg)
Transfer speed for top 500 transfers with transfer size > 1GB
Figure axes: Month on the x-axis, transfer speed (Gbps) on the y-axis.
![Page 12: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/12.jpg)
Monthly comparison between number of transfers coming into and going out of NICS for year 2013
Figure axes: Month on the x-axis, total number of transfers (in millions) on the y-axis.
![Page 13: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/13.jpg)
Monthly comparison between total amount of data coming into and going out of NICS for year 2013
Figure axes: Month on the x-axis, total amount of data moved (in TB) on the y-axis.
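Both monthly comparisons above come down to grouping transfers by month and direction. The slides do not say how direction was determined; the sketch below assumes it is derived from the logged operation type, using standard FTP semantics where RETR means the server sends a file (outbound) and STOR means the server receives one (inbound). The record layout and function name are hypothetical and the sample records are made up.

```python
from collections import defaultdict

# FTP semantics: RETR = server sends the file (data going out),
# STOR = server receives the file (data coming in).
DIRECTION = {"RETR": "out", "STOR": "in"}


def monthly_in_out(records):
    """Sum transfer counts and bytes per (month, direction)."""
    totals = defaultdict(lambda: {"transfers": 0, "nbytes": 0})
    for rec in records:
        direction = DIRECTION.get(rec["type"])
        if direction is None:
            continue  # ignore other operation types
        month = rec["start"][:6]  # YYYYMM prefix of the timestamp
        totals[(month, direction)]["transfers"] += 1
        totals[(month, direction)]["nbytes"] += rec["nbytes"]
    return dict(totals)


# Made-up records for illustration only.
records = [
    {"start": "20130401132041.534646", "type": "RETR", "nbytes": 1048576},
    {"start": "20130402080000.000000", "type": "STOR", "nbytes": 2097152},
    {"start": "20130403090000.000000", "type": "RETR", "nbytes": 1048576},
]
summary = monthly_in_out(records)
for (month, direction), agg in sorted(summary.items()):
    print(month, direction, agg["transfers"], agg["nbytes"] / 1e12, "TB")
```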
![Page 14: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/14.jpg)
Transfer data buckets for November 2013
All transfers for November 2013. Total transfers: 2,181,157
All transfers for November 2013, ts < 1 MB. Total transfers: 749,747
All transfers for November 2013, 1 MB < ts < 64 GB. Total transfers: 1,431,385
All transfers for November 2013, ts > 64 GB. Total transfers: 25
Figure axes (all four panels): Transfer size (ts) on the x-axis, percentage of transfers on the y-axis.
![Page 15: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/15.jpg)
Intra XSEDE Sites and Abbreviations
Site Name Abbreviation
Texas Advanced Computing Center TACC
Pittsburgh Supercomputing Center PSC
San Diego Supercomputer Center SDSC
National Institute for Computational Sciences / Georgia Institute of Technology NICS/GaTech
Indiana University IU
Open Science Grid OSG
National Center for Atmospheric Research NCAR
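Grouping transfers by site requires mapping each logged endpoint IP address to one of the abbreviations above. The slides do not describe how this was done; one possible approach is a lookup against per-site address blocks. In the sketch below the networks are hypothetical placeholders taken from reserved documentation ranges (RFC 5737), not the sites' real addresses, and the function name is an assumption.

```python
import ipaddress

# Hypothetical site networks using reserved documentation ranges (RFC 5737);
# real address blocks would have to be obtained from each site.
SITE_NETWORKS = {
    "TACC": ipaddress.ip_network("192.0.2.0/24"),
    "PSC": ipaddress.ip_network("198.51.100.0/24"),
    "SDSC": ipaddress.ip_network("203.0.113.0/24"),
}


def site_for_endpoint(endpoint):
    """Map an endpoint such as '[192.0.2.10]' to a site abbreviation."""
    address = ipaddress.ip_address(endpoint.strip("[]"))
    for site, network in SITE_NETWORKS.items():
        if address in network:
            return site
    return "other"


print(site_for_endpoint("[192.0.2.10]"))     # TACC (placeholder mapping)
print(site_for_endpoint("[192.249.6.164]"))  # other
```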
![Page 16: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/16.jpg)
Intra XSEDE site data coming into NICS
Figure, panel 1: Number of transfers (in thousands) by month.
Figure, panel 2: Total amount transferred (in TB) by month.
Legend: TACC, PSC, SDSC, NICS/GaTech, IU, OSG, NCAR
![Page 17: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/17.jpg)
Intra XSEDE site data going out of NICS
Figure, panel 1: Number of transfers (in thousands) by month.
Figure, panel 2: Total amount transferred (in TB) by month.
Legend: TACC, PSC, SDSC, NICS/GaTech, IU, OSG, NCAR
![Page 18: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/18.jpg)
Intra XSEDE site data coming into and going out of NICS together
Figure, panel 1: Number of transfers (in thousands) by month.
Figure, panel 2: Total amount transferred (in TB) by month.
Legend: TACC, PSC, SDSC, NICS/GaTech, IU, OSG, NCAR
![Page 19: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/19.jpg)
Future Work
· Currently in progress:
– Moving from a PostgreSQL database to loading data completely in memory on a separate machine.
– Using Apache Spark for fast large-scale data processing.
– Combining SQL, streaming, and complex analytics.
– Using advanced data mining and machine learning algorithms provided in Python libraries.
· Next step:
– Analyze by combining job data, filesystem data, and archive data.
– Visualize data flow within the XSEDE network on a geographical map.
![Page 20: Descriptive Data Analysis of File Transfer Data](https://reader033.fdocuments.net/reader033/viewer/2022051020/568162a2550346895dd31de9/html5/thumbnails/20.jpg)
Thank You!
Questions?