Horizontal Aggregations in SQL to Prepare Dataset Using Split-SPJ Method
Welcome To
Thesis Presentation
Presentation On
Horizontal Aggregations in SQL to prepare Dataset using Split-SPJ Method
A THESIS & PROJECT
BY
Arifur Rahman (074051)
Md. Taz Uddin (074044)
Md. Tareq Imran (074050)
Supervised BY
Sumaya Kazary
Assistant Professor, Dept. of CSE, DUET
Overview
1. Introduction
2. Analysis
3. Experimental Overview
4. Compare Performance
5. Future plans
3April 10, 2023
Introduction
Preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries, joining tables, and aggregating columns. Existing SQL aggregations are limited for preparing data sets because they return one column per aggregated group. In general, significant manual effort is required to build data sets where a horizontal layout is required. We propose simple, yet powerful, methods to generate SQL code that returns aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new class of functions is called horizontal aggregations, here generated with the Split-SPJ method. Horizontal aggregations build data sets with a horizontal de-normalized layout (e.g., point-dimension, observation-variable, instance-feature), which is the standard layout required by most data mining algorithms.
Introduction (Contd)
Data Mining : Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information.
Introduction (Contd)
Dataset : A dataset (or data set) is a collection of data, usually presented in tabular form. Each column represents a particular variable. Each row corresponds to a given member of the dataset in question.
Vertical Aggregation : It arranges the result of a query in a vertical layout (for example, with a GROUP BY clause in SQL). In relational database systems, aggregations are generally vertical.
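As a minimal sketch of a vertical aggregation, the following uses Python's built-in sqlite3 module. The table name `images` and the sample rows are hypothetical, chosen only to illustrate the GROUP BY layout; they are not the presentation's actual data set.

```python
import sqlite3

# Hypothetical sample rows (assumption, for illustration only).
rows = [
    ("User1", "Pic1", 31),
    ("User1", "Pic2", 27),
    ("User4", "Pic200", 10),
    ("User4", "Pic220", 26),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (facebook_id TEXT, image_name TEXT, character_length INTEGER)"
)
conn.executemany("INSERT INTO images VALUES (?, ?, ?)", rows)

# Vertical aggregation: one output row per group, one aggregated value per row.
vertical = conn.execute(
    "SELECT facebook_id, SUM(character_length) "
    "FROM images GROUP BY facebook_id ORDER BY facebook_id"
).fetchall()
print(vertical)
```

Each group contributes one row to the result, so the output grows downward as the number of groups grows.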
Introduction (Contd)
Horizontal Aggregation : Here we introduce a new class of aggregations that behave similarly to SQL standard aggregations, but which produce tables with a horizontal layout. In contrast, we call standard SQL aggregations vertical aggregations since they produce tables with a vertical layout. Horizontal aggregations require only a small syntax extension to aggregate functions called in a SELECT statement.
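The syntax extension mentioned here is not part of standard SQL, so it cannot be run directly. As a rough illustration of the horizontal layout only, the sketch below emulates it in SQLite with one `SUM(CASE ...)` expression per distinct value. This is a swapped-in, standard-SQL technique, not the slides' extension; the table, column aliases, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (facebook_id TEXT, image_name TEXT, character_length INTEGER)"
)
# Hypothetical sample rows (assumption, for illustration only).
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?)",
    [("User1", "Pic1", 31), ("User1", "Pic2", 27), ("User2", "Pic1", 14)],
)

# One output column per distinct image_name.
names = [
    r[0]
    for r in conn.execute(
        "SELECT DISTINCT image_name FROM images ORDER BY image_name"
    )
]

# Values are bound as parameters; the alias is built from the (trusted) sample names.
select_parts = [
    'SUM(CASE WHEN image_name = ? THEN character_length END) AS "len_{}"'.format(n)
    for n in names
]
sql = (
    "SELECT facebook_id, {} FROM images "
    "GROUP BY facebook_id ORDER BY facebook_id"
).format(", ".join(select_parts))

horizontal = conn.execute(sql, names).fetchall()
print(horizontal)
```

Each user now occupies a single row, with one aggregated column per image name; a missing combination comes back as NULL (`None` in Python).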
Analysis
Problem of Horizontal Aggregation : The number of columns may exceed the number of columns allowed by the DBMS. That means reaching the maximum number of columns in one table, and reaching the maximum column name length when columns are automatically named.
To elaborate on this, a horizontal aggregation can return a table that goes beyond the maximum number of columns in the DBMS when the set of columns {R1, ..., Rk} has a large number of distinct combinations of values, or when there are multiple horizontal aggregations in the same query.
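One way to anticipate this problem is to count the distinct combinations before generating the horizontal query. The sketch below assumes a hypothetical table `f` whose columns `r1` and `r2` stand in for R1, ..., Rk; the names and rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table: r1 and r2 play the role of the columns R1..Rk.
conn.execute("CREATE TABLE f (id TEXT, r1 TEXT, r2 TEXT, a INTEGER)")
conn.executemany(
    "INSERT INTO f VALUES (?, ?, ?, ?)",
    [
        ("User1", "x", "p", 1),
        ("User1", "x", "q", 2),
        ("User2", "y", "p", 3),
        ("User2", "x", "p", 4),
    ],
)

# Each distinct (r1, r2) combination becomes one column of the horizontal result.
n_columns = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT r1, r2 FROM f)"
).fetchone()[0]

# With several horizontal aggregations in one query, the counts add up.
n_columns_two_aggregations = 2 * n_columns
print(n_columns, n_columns_two_aggregations)
```

Comparing these counts against the DBMS column limit shows in advance whether a single result table is feasible.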
Analysis (Contd)
Column limit of different Database Systems :
Database               Maximum Permitted Columns
Microsoft Access       255
Microsoft SQL Server   1024
MySQL                  4096
Oracle                 Default 1000, but it can be increased by command
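The limits listed here can be turned into a simple feasibility check. The sketch below copies the figures from the slide as-is (they are not independently verified here), and the function name `exceeds_limit` is a hypothetical helper.

```python
# Column limits as listed in the presentation (assumption: taken at face value).
MAX_COLUMNS = {
    "Microsoft Access": 255,
    "Microsoft SQL Server": 1024,
    "MySQL": 4096,
    "Oracle": 1000,
}


def exceeds_limit(n_columns, dbms):
    """Return True if a result with n_columns columns does not fit in one table."""
    return n_columns > MAX_COLUMNS[dbms]


# Which of the listed systems could hold a 273-column result in a single table?
fits = sorted(name for name in MAX_COLUMNS if not exceeds_limit(273, name))
print(fits)
```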
Analysis (Contd)
Introduction to the Split-SPJ method
If the vertical attributes of a table are :
ID, VA1, VA2, VA3, VA4, ..., VA255, VA256, VA257, ..., VA272, VA273
(it is impossible to aggregate them with the SPJ method)
The output of the Split-SPJ method :
Table-1
ID, VA1, VA2, VA3, VA4, VA5, VA6, VA7, ..., VA255
Table-2
ID, VA256, VA257, ..., VA270, VA271, VA272, VA273
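The splitting step can be sketched as plain list chunking. The helper name `split_columns` and the default of 255 aggregated columns per table are assumptions that mirror the example in the slide; since the ID column also counts toward a DBMS limit, a real implementation might need a chunk size one lower.

```python
def split_columns(columns, max_per_table=255):
    """Split a list of aggregated column names into consecutive chunks."""
    return [
        columns[i:i + max_per_table]
        for i in range(0, len(columns), max_per_table)
    ]


# VA1 .. VA273, as in the example.
columns = ["VA{}".format(i) for i in range(1, 274)]

# Every output table repeats the key column so the tables can be joined back later.
tables = [["ID"] + chunk for chunk in split_columns(columns)]

for number, table in enumerate(tables, start=1):
    print("Table-{}: {} ... {}".format(number, table[1], table[-1]))
```

Repeating the key in each table is what keeps the split result equivalent to the single wide table: joining the pieces on ID recovers it.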
Experimental Overview
Vertical aggregation of experimental data :
Facebook_id   Image_name   Character_length
User1         Pic1         31
User1         Pic2         27
User1         Pic4         20
User1         Pic10        30
...
User4         Pic200       10
User4         Pic220       26
User4         Pic299       15
User4         Pic340       25
User4         Pic360       35
Experimental Overview (Contd)
Horizontal aggregation in SPJ method :
Facebook_id   Image_name_pic1   Image_name_pic2   Image_name_pic3   ...   Image_name_pic255
User1         31                31                20                ...
User2         14                17                14                ...
User3         17                15                13                ...
User4         10                5                 8                 ...
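The slides do not show the SQL behind the SPJ (select-project-join) method. The sketch below follows one common description of it, which is an assumption here: one small vertical aggregation per result column, all left-outer-joined to the list of distinct keys. Table name, aliases, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (facebook_id TEXT, image_name TEXT, character_length INTEGER)"
)
# Hypothetical sample rows (assumption, for illustration only).
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?)",
    [("User1", "Pic1", 31), ("User1", "Pic2", 27), ("User2", "Pic1", 14)],
)

names = [
    r[0]
    for r in conn.execute(
        "SELECT DISTINCT image_name FROM images ORDER BY image_name"
    )
]

select_cols = ["f0.facebook_id"]
joins = []
for i, name in enumerate(names, start=1):
    alias = "f{}".format(i)
    # Project: expose the aggregated value under a per-image column name.
    select_cols.append('{}.total AS "Image_name_{}"'.format(alias, name))
    # Select + aggregate one image name, then join it to the key table f0.
    joins.append(
        "LEFT OUTER JOIN (SELECT facebook_id, SUM(character_length) AS total "
        "FROM images WHERE image_name = ? GROUP BY facebook_id) AS {a} "
        "ON {a}.facebook_id = f0.facebook_id".format(a=alias)
    )

sql = (
    "SELECT {} FROM (SELECT DISTINCT facebook_id FROM images) AS f0 {} "
    "ORDER BY f0.facebook_id"
).format(", ".join(select_cols), " ".join(joins))

spj_rows = conn.execute(sql, names).fetchall()
print(spj_rows)
```

Because every result column adds one join and one output column, this construction runs into the DBMS column limit as the number of distinct image names grows.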
Experimental Overview (Contd)
Horizontal aggregation in proposed Split-SPJ method :
Table-1
Facebook_id   Image_name_pic1   Image_name_pic2   Image_name_pic3   ...   Image_name_pic255
User1         31                31                20                ...
User2         14                17                14                ...
User3         17                15                13                ...
User4         10                5                 8                 ...
Table-2
Facebook_id  Image_name_pic256  Image_name_pic257  Image_name_pic258  ...  Image_name_pic360
User1  31  31  60  50
User2  14  45  40
User3  17  15
User4  10  5  80
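Putting the two ideas together, a Split-SPJ run can be sketched as an SPJ-style query executed once per chunk of columns. This is an illustrative reading of the method rather than the authors' code: the helper `spj_horizontal`, the table name, the sample rows, and the tiny limit of 2 columns per table (standing in for 255) are all assumptions.

```python
import sqlite3


def spj_horizontal(conn, names):
    """Run one SPJ-style query with one aggregated column per entry in names."""
    select_cols = ["f0.facebook_id"]
    joins = []
    for i in range(1, len(names) + 1):
        select_cols.append("f{}.total".format(i))
        joins.append(
            "LEFT OUTER JOIN (SELECT facebook_id, SUM(character_length) AS total "
            "FROM images WHERE image_name = ? GROUP BY facebook_id) AS f{i} "
            "ON f{i}.facebook_id = f0.facebook_id".format(i=i)
        )
    sql = (
        "SELECT {} FROM (SELECT DISTINCT facebook_id FROM images) AS f0 {} "
        "ORDER BY f0.facebook_id"
    ).format(", ".join(select_cols), " ".join(joins))
    return conn.execute(sql, names).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (facebook_id TEXT, image_name TEXT, character_length INTEGER)"
)
# Hypothetical sample rows (assumption, for illustration only).
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?)",
    [
        ("User1", "Pic1", 31),
        ("User1", "Pic2", 27),
        ("User1", "Pic3", 20),
        ("User2", "Pic1", 14),
        ("User2", "Pic3", 13),
    ],
)

names = [
    r[0]
    for r in conn.execute(
        "SELECT DISTINCT image_name FROM images ORDER BY image_name"
    )
]

max_per_table = 2  # demo value; the slides split at 255 columns
chunks = [names[i:i + max_per_table] for i in range(0, len(names), max_per_table)]

# One result set per chunk, each keyed by facebook_id.
split_tables = {
    "Table-{}".format(n): spj_horizontal(conn, chunk)
    for n, chunk in enumerate(chunks, start=1)
}
print(split_tables)
```

Each result set stays under the chosen column limit, and the shared `facebook_id` key allows the pieces to be joined again when a consumer can handle the full width.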
Experimental Overview
Compare Performance :
When the number of aggregated columns is < 255, performance is the same for the SPJ and Split-SPJ methods.
Experimental Overview (Contd)
Compare Performance :
When the number of aggregated columns is > 255, the SPJ method is unable to aggregate beyond 255 columns.
Experimental Overview (Contd)
Compare Performance :
When the number of aggregated columns is > 255, the Split-SPJ method makes it possible to aggregate into multiple tables.
Future Plan
If the length of an aggregated column name exceeds the column name length allowed by the database, an error occurs, which may be overcome by using aliases. In addition, aggregation becomes very complex when data fields contain images or files (such as BLOB data).
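The alias idea is only named in the slides, not specified. As one possible reading, the sketch below replaces any generated column name that is too long with a short positional alias and keeps a mapping back to the original. The helper `alias_columns` and the limit of 30 characters are hypothetical; the real limit depends on the DBMS.

```python
def alias_columns(names, max_len=30):
    """Map each generated column name to a name no longer than max_len.

    Names that already fit are kept; longer ones get a positional alias
    such as c2. (Assumption: no original name collides with these aliases.)
    """
    mapping = {}
    for position, name in enumerate(names, start=1):
        if len(name) <= max_len:
            mapping[name] = name
        else:
            mapping[name] = "c{}".format(position)
    return mapping


# A short generated name and an artificially long one.
long_name = "Image_name_" + "x" * 40
aliases = alias_columns(["Image_name_pic1", long_name])
print(aliases)
```

Keeping the mapping alongside the generated tables lets later analysis steps translate the short aliases back into meaningful names.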