Thesis summary knowledge discovery from academic data using association rule mining


Knowledge Discovery from Academic Data

using Association Rule Mining

SUBMITTED BY

Rajshakhar Paul

Student ID: 0805020

Shibbir Ahmed

Student ID: 0805097

Summary of the Thesis

Department of Computer Science and Engineering

BANGLADESH UNIVERSITY OF ENGINEERING AND TECHNOLOGY


i. Introduction

Students are one of the fundamental elements of any academic institution. Indeed, the prime concern of
an educational institution is to ensure a sound technical foundation, scholarly guidance and
high-standard education for all of its students. A large educational institute such as a public
university generates large volumes of data and needs an efficient way to apply data mining techniques
to obtain knowledge on the development and improvement of academic activities. The knowledge acquired
from the institutional database can be used to look for answers to questions such as: Which factors
determine better or worse academic performance of students? What are the causes behind students'
retention, i.e., the extended continuation of studies in the university? Why do students drop out
before graduation, i.e., abandon the educational institute? Concepts and techniques of data mining are
essential to discover the hidden knowledge in large datasets.

Bangladesh University of Engineering and Technology (BUET) is the topmost technological university of
Bangladesh. It enrolls the 1000 most brilliant students, selected through a competitive examination
from among one million students completing higher secondary education. Among these 1000 students, the
top-ranked ones can get admission into the different departments under different faculties. Although
this university has most of the brightest students of Bangladesh, statistics demonstrate that the
performance of some students degrades noticeably. Some students perform outstandingly at the initial
stage of their undergraduate studies but cannot maintain the same level of excellence until
graduation. Others do not perform well initially but finish with a fairly good academic record. Again,
some students have to continue their studies year after year and take a very long time to graduate.
Unfortunately, there are also some meritorious students who drop out before graduation. Statistical
analysis alone is not sufficient to find the reasons behind these problems in any academic
institution. The hidden knowledge inside the institutional academic and personal data of students is
necessary to find the possible causes of all these problems and to take suitable precautions. That is
why knowledge discovery and data mining from academic data are essential for an educational
institution like BUET to improve the academic performance of students, refine the standard of teaching
methodologies and reshape decision making for the betterment of the institution.

Discovering the hidden knowledge in educational data and applying it properly to decision making is
essential for ensuring high quality education in any academic institution. Data mining techniques are
very effective for this purpose. However, not all data mining techniques can be applied directly to
academic data because of its complex structure, so rigorous preprocessing is required. The choice of
support and confidence and the selection of important association rules from the huge number of
generated rules are other significant problems of knowledge discovery from academic data.

ii. Motivation

In a developing country like Bangladesh, many students from rural areas come to the city for higher
education. They usually leave their families behind and have to adjust to a completely new
environment. They start their new educational life in the institution's hall: a new living place, new
types of food, new companions, a new atmosphere. They usually need some time to cope with all of these
new things, both physically and mentally, which may hamper their educational activities at the very
beginning. The scenario is somewhat more difficult for girls than for boys. So they sometimes lag
behind at the beginning of their higher studies, which may have an adverse effect on them in the long
run. On the other hand, city students are more likely to be familiar with the environment, live with
their families and have more educational, technological and psychological opportunities, which may
give them some advantages in higher education. The scenario can also be different, however: more
opportunities may drive them off track and demoralize them in their studies.

In a higher education system like that of BUET, the performance in a course depends on different
aspects such as class attendance, class tests, quizzes, assignments and term final examinations, some
of which start from the very beginning of the class. If a student gets poor marks in any of these, it
may affect the final result. Later courses sometimes depend on previous courses, so a poor result in
one course may also affect the performance in other related courses.

So it is important to discover all possible knowledge from academic data in order to know the relevant
rules behind students' performance, whether they are doing well or badly. If they cannot perform well,
the reason behind it can also be discovered.

iii. Goal and Objectives

The department of Computer Science and Engineering (CSE) is one of the prestigious departments of
BUET. Although this department has most of the brightest students of Bangladesh, statistical data
demonstrate that the performance of some students degrades noticeably. Moreover, the problems of
retention and abandonment are also prevalent among the students. The main objectives of this research
study are:

To discover knowledge of students' academic progress from academic performance and personal
statistics, through the impact of the different assessments of courses, e.g., class test, attendance,
term final examination, etc.

To find out the reasons behind the degradation of students' merit, i.e., the decay of their
potential.

To discover the causes behind extended continuation of studies before graduation, i.e., retention of
students.

To find out why some meritorious students drop out before graduation, i.e., abandonment of students.

iv. Key Techniques used to achieve the Goal

A. Data Analysis

1) Personal and Academic Data

In this research, we have considered the academic data structure of BUET. The student data of the BIIS
(BUET Institutional Information System) contain personal and academic information about each student.
We collected them anonymously for data preprocessing and data analysis. For knowledge discovery
regarding the academic performance, abandonment and retention of students (illustrated in Figure 4.1),
we have considered the personal and academic data listed in Table 4.1.

Table 4.1: Selected Data from BIIS database

Academic Information   Department
                       Admission Year / Batch
                       Overall CGPA
                       Marks of Class Test, Attendance, Two Answer Scripts, Total Marks and
                       Grades of all Theory Courses
                       Total Marks and Grades of all Sessional Courses
                       Total Completed Credit Hour
Personal Information   Gender
                       Hall Resident/Non-resident

Figure 4.1: Factors related to Academic Performance, Abandonment and Retention of students

2) Course and Curriculum

As we have experimented with the students' data of the department of Computer Science and Engineering
(CSE) in BUET, we have analyzed all the courses in the curriculum that have to be taken to complete
the BSc degree. A student has to take 68 departmental and non-departmental courses in total. All the
courses along with their credit hours are shown in Table 4.2.

Among them there are 40 theory courses (25 departmental and 15 non-departmental) and 28 sessional
courses (20 departmental, 7 non-departmental and the thesis). We determine academic performance and
the impact of other factors on the basis of these courses' final grades and the marks of attendance,
class tests, term final answer scripts, total marks, etc.

Table 4.2: All Undergraduate Courses for department of CSE

Course Type                         Credit Hour  Course Number
Departmental Theory Courses         4.0          CSE307, CSE321
                                    3.0          CSE103, CSE105, CSE201, CSE203, CSE205, CSE207, CSE209,
                                                 CSE303, CSE305, CSE309, CSE311, CSE301, CSE313, CSE315,
                                                 CSE317, CSE401, CSE403, CSE423, CSE409, CSE461, CSE463
                                    2.0          CSE100, CSE211
Departmental Sessional Courses      1.5          CSE106, CSE202, CSE206, CSE210, CSE214, CSE304, CSE308,
                                                 CSE314, CSE316, CSE404
                                    0.75         CSE204, CSE208, CSE300, CSE310, CSE322, CSE324, CSE402,
                                                 CSE410, CSE462, CSE464
Non-Departmental Theory Courses     4.0          PHY109, MATH143, EEE263, MATH243
                                    3.0          EEE163, MATH141, ME165, CHEM101, HUM175, MATH241, EEE269,
                                                 IPE493
                                    2.0          HUM211, HUM275, HUM371
Non-Departmental Sessional Courses  1.5          PHY102, EEE164, ME160, HUM272, CHEM114, EEE264, EEE270
Thesis                              6.0          CSE400

[Figure 4.1 diagram: Residence, Gender, Records of all Continuous Assessments, Records of Departmental
Courses and Records of Non Departmental Courses, shown as factors related to Academic Performance,
Student Retention and Student Abandonment.]

B. Preprocessing for Mining Academic Database

1) Relational Database

Students register for courses through their BIIS accounts. The relational database illustrated in
Figure 4.2 stores all the personal information of a student as well as the results of the courses
taken. From it we can obtain a relational table containing a student's gender, hall status,
performance in all courses, CGPA, etc.

Figure 4.2: Relational database

2) Universal Database

A universal database is created in which the records of all courses taken by a student, along with
personal information such as gender and hall status, are stored in a single row of the table. For a
specific course, the row holds the grade, attendance, marks of class tests, marks of each section
(section A and section B) of the term final answer scripts and total marks. The records of all other
courses taken are stored in the same row in the same way, after the Gender and Hall Status of the
student, and the records of the other students follow one after another. Another attribute, Student
Type, records whether the student is regular, retentive or abandoned. Since the Apriori algorithm of
Association Rule Mining requires attribute values in discrete form, identifying fields such as student
id have been omitted from the universal table.
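The flattening step described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the record layout, the student keys `s1` and `s2` and the function name `build_universal_rows` are hypothetical, while the column naming (for example `CSE103_Grade`) and the sample values follow Table 4.3.

```python
from collections import defaultdict

# Hypothetical relational records: personal data per student and one result
# record per (student, course). Field names are illustrative, not the BIIS schema.
students = {
    "s1": {"Gender": "Male", "Hall_Status": "Resident", "Student_Type": "Regular"},
    "s2": {"Gender": "Female", "Hall_Status": "Non-Resident", "Student_Type": "Regular"},
}
results = [
    {"student": "s1", "course": "CSE103", "Grade": "A+", "Attendance": 30,
     "CT": 55, "SectionA": 90, "SectionB": 75, "Total": 250},
    {"student": "s2", "course": "CSE103", "Grade": "A", "Attendance": 25,
     "CT": 45, "SectionA": 85, "SectionB": 70, "Total": 225},
]


def build_universal_rows(students, results):
    """Return one flat row per student; the student id itself is dropped."""
    rows = defaultdict(dict)
    for sid, personal in students.items():
        rows[sid].update(personal)
    for rec in results:
        for field, value in rec.items():
            if field in ("student", "course"):
                continue
            # Column names such as CSE103_Grade combine course and field.
            rows[rec["student"]][f"{rec['course']}_{field}"] = value
    # Keep the order of the students and omit the id column.
    return [rows[sid] for sid in students]


universal = build_universal_rows(students, results)
```

Each additional course simply contributes more `<course>_<field>` columns to the same row, which is what makes the table wide but directly usable as one transaction per student.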

Table 4.3: Partial portion of universal database

Gender  Hall_Status   Student_Type  CSE103_Grade  CSE103_Attendance  CSE103_CT  CSE103_SectionA  CSE103_SectionB  CSE103_Total  …
Male    Resident      Regular       A+            30                 55         90               75               250           …
Female  Non-Resident  Regular       A             25                 45         85               70               225           …
…       …             …             …             …                  …          …                …                …

3) Data Transformation

The universal database of Table 4.3 has been transformed into an equivalent transformation table by
converting each continuous valued attribute into a discrete valued attribute that represents some
knowledge, so that the Apriori algorithm of Association Rule Mining can be applied. For example, CGPA
is a continuous attribute and it has been transformed into five classes: excellent, very good, good,
average and poor. We have used one algorithm to transform all continuous numbers for attendance, class
tests, both sections of the term final answer scripts and the total marks of a course. We have used
another algorithm to transform all grades or grade points of courses and the overall CGPA into the
same five classes.

[Figure 4.2 diagram: entities Student, Grade Sheet and Course; relationship labels: achieves,
represents.]

To transform the numbers of the universal table, i.e., attendance, class tests, section A, section B
and total marks of each course, Algorithm 1 has been developed; it populates the transformed table so
that no entry holds a continuous value.

Similarly, the grades of the universal table are transformed by Algorithm 2. As the real dataset
contains CGPA in grade points, we similarly consider another variable, grade point, and transform the
continuous value of CGPA into the same five classes.

As there are theory courses of 4.0, 3.0 and 2.0 credit hours and sessional courses of 1.5 and 0.75
credit hours, we need different transformation rule tables for these different courses. The
transformation rules are given for 3.0 credit hour theory courses in Table 4.4, for 4.0 credit hour
theory courses in Table 4.5, for 2.0 credit hour theory courses in Table 4.6 and for all sessional
courses in Table 4.7.

Algorithm 1: Marks_Transformation ( )
Input: marks of Attendance, CT, Section A, Section B, Total Marks of each course from Universal
Table of Studentlist
Output: discrete level of marks for the Transformation Table
for i = 1 to |Studentlist|
    if (marks >= 80%)
        level = "Excellent"
    else if (marks < 80% && marks >= 75%)
        level = "Very Good"
    else if (marks < 75% && marks >= 60%)
        level = "Good"
    else if (marks < 60% && marks >= 50%)
        level = "Average"
    else if (marks < 50%)
        level = "Poor"
end for
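A runnable counterpart of Algorithm 1 might look like the sketch below. It assumes that each mark is compared as a percentage of the full marks of its component; the function name `marks_level` and the `full_marks` parameter are illustrative and do not come from the thesis.

```python
def marks_level(marks, full_marks):
    """Map a raw mark to one of five levels using the percentage cut-offs of Algorithm 1."""
    if full_marks <= 0:
        raise ValueError("full_marks must be positive")
    percent = 100.0 * marks / full_marks
    if percent >= 80:
        return "Excellent"
    if percent >= 75:
        return "Very Good"
    if percent >= 60:
        return "Good"
    if percent >= 50:
        return "Average"
    return "Poor"


# Class test marks out of 60, the upper limit of the class test range
# listed for a 3.0 credit theory course.
levels = [marks_level(m, 60) for m in (55, 45, 40, 30, 10)]
```

Note that the rule table for 3.0 credit theory courses lists attendance bands that start at 27 of 30 for Excellent, so attendance appears to follow different cut-offs than the percentages in Algorithm 1. The sketch follows the algorithm as written.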

Algorithm 2: Grade_Transformation ( )
Input: all acquired Grade of each course in the Courselist of the universal table
Output: transformed_grade for the Transformation Table
for i = 1 to |Courselist|
    if grade = A+
        transformed_grade = "Excellent"
    else if grade = A
        transformed_grade = "Very Good"
    else if grade = A- or B+
        transformed_grade = "Good"
    else if grade = B
        transformed_grade = "Average"
    else if grade = B- or C+ or C or D
        transformed_grade = "Poor"
end for
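Algorithm 2 is a fixed mapping, so a dictionary lookup is one simple way to sketch it in Python. The names `GRADE_LEVEL` and `grade_level` are illustrative. Grade F is not listed in Algorithm 2; this sketch assumes it is kept unchanged, because rules such as CSE100_Grade=F appear among the generated rules.

```python
# Mapping taken from Algorithm 2. Grade F is not listed there; keeping it
# unchanged is an assumption of this sketch.
GRADE_LEVEL = {
    "A+": "Excellent",
    "A": "Very Good",
    "A-": "Good",
    "B+": "Good",
    "B": "Average",
    "B-": "Poor",
    "C+": "Poor",
    "C": "Poor",
    "D": "Poor",
}


def grade_level(grade):
    """Return the discrete level for a letter grade, leaving unlisted grades as they are."""
    cleaned = grade.strip()
    return GRADE_LEVEL.get(cleaned, cleaned)


transformed = [grade_level(g) for g in ("A+", "B+", "D", "F")]
```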

Table 4.4: Transformation rule table for 3.0 credit theory course

Classified Name  Range of Marks (M)
                 Attendance  Class Test  SecA/SecB   Total
Excellent        27≤M≤30     48≤M≤60     84≤M≤105    240≤M≤300
Very Good        24≤M≤26     45≤M≤47     78≤M≤83     225≤M≤239
Good             21≤M≤23     36≤M≤44     63≤M≤77     180≤M≤224
Average          18≤M≤20     30≤M≤35     52≤M≤62     150≤M≤179
Poor             0≤M≤17      0≤M≤29      0≤M≤51      0≤M≤149

Table 4.5: Transformation rule table for 4.0 credit theory course

Classified Name  Range of Marks (M)
                 Attendance  Class Test  SecA/SecB   Total
Excellent        36≤M≤40     64≤M≤80     112≤M≤140   320≤M≤400
Very Good        32≤M≤35     60≤M≤63     105≤M≤111   300≤M≤319
Good             28≤M≤31     48≤M≤59     84≤M≤104    240≤M≤299
Average          24≤M≤27     40≤M≤47     70≤M≤83     200≤M≤239
Poor             0≤M≤23      0≤M≤39      0≤M≤69      0≤M≤199

Table 4.6: Transformation rule table for 2.0 credit theory course

Classified Name  Range of Marks (M)
                 Attendance  Class Test  SecA/SecB   Total
Excellent        18≤M≤20     32≤M≤40     56≤M≤70     160≤M≤200
Very Good        16≤M≤17     30≤M≤31     52≤M≤55     150≤M≤159
Good             14≤M≤15     24≤M≤29     42≤M≤51     120≤M≤149
Average          12≤M≤13     20≤M≤23     35≤M≤41     100≤M≤119
Poor             0≤M≤11      0≤M≤19      0≤M≤34      0≤M≤99

Table 4.7: Transformation rule table for all sessional courses

Classified Name  Range of Marks (M)
                 Sessional Credit Hour=1.5  Sessional Credit Hour=0.75
Excellent        120≤M≤150                  60≤M≤75
Very Good        112≤M≤119                  56≤M≤59
Good             90≤M≤111                   45≤M≤55
Average          75≤M≤89                    37≤M≤44
Poor             0≤M≤74                     0≤M≤36

To construct the entire transformed table as given in Table 4.8, we have used the universal table and
the above transformation rules.

Table 4.8: Transformed table from universal table

Gender  Hall_Status   Student_Type  CSE103_Grade  CSE103_Attendance  CSE103_CT  CSE103_SectionA  CSE103_SectionB  CSE103_Total  …
Male    Resident      Regular       Excellent     Excellent          Excellent  Excellent        Good             Excellent     …
Female  Non-resident  Regular       Very Good     Very Good          Very Good  Excellent        Good             Very Good     …
…       …             …             …             …                  …          …                …                …
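The rule tables can also be applied directly as range lookups. The sketch below is an assumption-laden illustration rather than the authors' code: it hard-codes the lower bounds of the 3.0 credit theory course bands and uses `bisect_right` from the Python standard library to find the band of a mark. The names `LOWER_BOUNDS_3_CREDIT` and `level_from_table` are hypothetical.

```python
from bisect import bisect_right

LEVELS = ["Poor", "Average", "Good", "Very Good", "Excellent"]

# Lower bounds of the Average, Good, Very Good and Excellent bands for a
# 3.0 credit theory course, copied from the rule table for that course type.
LOWER_BOUNDS_3_CREDIT = {
    "Attendance": [18, 21, 24, 27],
    "CT": [30, 36, 45, 48],
    "Section": [52, 63, 78, 84],   # applies to both Section A and Section B
    "Total": [150, 180, 225, 240],
}


def level_from_table(component, marks, bounds=LOWER_BOUNDS_3_CREDIT):
    """Return the band whose range contains `marks` for the given component."""
    # bisect_right counts how many lower bounds are <= marks, which is the
    # index of the matching level in LEVELS.
    return LEVELS[bisect_right(bounds[component], marks)]


# Second sample row of the universal table: 25, 45, 85, 70, 225.
sample = [
    level_from_table("Attendance", 25),
    level_from_table("CT", 45),
    level_from_table("Section", 85),
    level_from_table("Section", 70),
    level_from_table("Total", 225),
]
```

Supporting another course type only requires a second dictionary with that type's lower bounds, passed through the `bounds` parameter.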

4) Dataset and Application Environment

In this experiment, we have considered the data up to the last five graduated batches in the
department of CSE, BUET. The institutional dataset of BUET consists of the academic and personal data
of 9210 students over the last 10 years. We have selected the relevant academic and personal
information of those students, namely gender, hall status, admission year, completed credit hours, all
records of theory and sessional courses, overall CGPA, etc., from the relational BIIS database and
transformed it into the universal table structure. Finally, we transformed it into a transformed table
structure for applying association rule mining. The entire experimental setup is illustrated in
Figure 5.1.

After the preprocessing step, we obtained a transformed table of 582 students of the department of CSE
who have already graduated. The universal table also contains one additional attribute, student type:
retentive, regular or abandoned. Student type is obtained by analyzing completed credit hours and
admission year. The transformation table contains all continuous data transformed into five discrete
values: Excellent, Very Good, Good, Average and Poor. Finally, we have applied Weka Explorer to the
transformation table (in .csv file format) to generate interesting Association Rules.

Figure 5.1: Experimental Setup for applying Apriori Algorithm using Weka Explorer to generate
Association Rules

[Experimental setup diagram: BUET Institutional Dataset of 9210 Students of All Departments in Last 10
Years (Gender, Hall Status, Admission Year, Completed Credit Hour, All Records of Theory & Sessional
Courses, Overall CGPA), followed by the Universal Table Structure and the Transformation Table
Structure.

Counts shown for both table structures:
Student Type: Regular 552, Retentive 26, Abandoned 4
Gender: Male 473, Female 109
Hall Status: Resident 348, Non Resident 234

Universal Table Structure: 40 Theory Courses (Attendance, Class Test, Section A, Section B, Total,
Grade) and 28 Sessional Courses (Total Marks, Grade).
Transformation Table Structure: Poor, Average, Good, Very Good, Excellent for All Marks & Grades of 68
Theory & Sessional Courses Including Overall CGPA of 582 Students.]
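The rules in this work were generated with the Apriori implementation of Weka Explorer. To make the roles of minimum support and confidence concrete, the following is a minimal, self-contained Apriori sketch in Python. It is not Weka's implementation and does not reproduce its options or rule ranking; the function name `apriori_rules` and the four toy rows are made up for illustration.

```python
from itertools import combinations


def apriori_rules(rows, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples.

    rows: list of dicts such as {"Gender": "male", "CGPA": "Poor"};
    each attribute=value pair is treated as one item.
    """
    transactions = [frozenset(f"{k}={v}" for k, v in row.items()) for row in rows]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items}
    level = {s for s in level if support(s) >= min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join frequent (k-1)-itemsets, then prune candidates that have an
        # infrequent (k-1)-subset (the Apriori property).
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}

    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so the lookup succeeds.
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, supp, confidence))
    return rules


# Toy rows, invented for this example only.
toy_rows = [
    {"Gender": "male", "CGPA": "Poor"},
    {"Gender": "male", "CGPA": "Poor"},
    {"Gender": "male", "CGPA": "Good"},
    {"Gender": "female", "CGPA": "Good"},
]
toy_rules = apriori_rules(toy_rows, min_support=0.5, min_confidence=0.9)
```

On the toy rows only the rule CGPA=Poor ⇒ Gender=male survives: it covers 2 of 4 rows (support 0.5) and holds for both rows with CGPA=Poor (confidence 1.0). Lowering either threshold lets more rules through, which is why the choice of support and confidence determines how many rules have to be screened afterwards.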

v. Main Results and Discussions

1) Impact of Gender

We have found an impact of gender on overall academic performance. This indication is very important
in terms of the socio-economic condition of the country. In BUET, the majority of the students are
male and live in the university dormitories. Multiple factors affect the academic environment and
students' academic performance. Table 5.1 shows that the rule from poor CGPA to male gender has a very
high confidence level. We attribute this to the various societal problems that generally affect male
students in a third world country like Bangladesh. All other rules support the view that the academic
performance of female students is better than that of male students.

Table 5.1: Impact of Gender

No.  Generated Interesting Rules       Minimum Support  Confidence
01   CGPA=Poor ⇒ Gender=male           10%              87%
02   CGPA=Average ⇒ Gender=male        10%              79%
03   CGPA=Very Good ⇒ Gender=male      10%              83%
04   Gender=male ⇒ CGPA=Good           10%              26%
05   Gender=male ⇒ CGPA=Average        10%              21%
06   CGPA=Good ⇒ Gender=female         5%               22%
07   CGPA=Average ⇒ Gender=female      5%               21%
08   CGPA=Excellent ⇒ Gender=female    5%               20%
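When reading these confidence values it can help to compare them with the overall share of each gender among the 582 students (473 male, 109 female). The sketch below is not part of the original analysis: it divides a rule's confidence by the base rate of its consequent, a ratio commonly called lift. The function name `ratio_to_base_rate` is illustrative.

```python
TOTAL_STUDENTS = 582
MALE, FEMALE = 473, 109  # counts reported for the 582 graduated students


def ratio_to_base_rate(confidence, consequent_count, total=TOTAL_STUDENTS):
    """Confidence of a rule divided by the overall share of its consequent."""
    return confidence / (consequent_count / total)


male_share = MALE / TOTAL_STUDENTS                       # about 0.81
poor_to_male = ratio_to_base_rate(0.87, MALE)            # rule 01
excellent_to_female = ratio_to_base_rate(0.20, FEMALE)   # rule 08
```

A ratio close to 1 means the consequent is about as frequent among the students matched by the rule as in the whole dataset. Both ratios computed here come out slightly above 1.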

2) Impact of Residence

In BUET, most of the students live in the institution's halls, but the number of students who live at
home is also significant. Analyzing the rules, we have found that both hall residents and students
residing at home get good CGPA with decent minimum support and confidence (Table 5.2). So a student
who wants to do well academically can do so from anywhere.

Table 5.2: Impact of Hall Status

No  Generated Interesting Rules                      Minimum Support  Confidence
01  CGPA=Average ⇒ Hall_Status=Resident              10%              65%
02  CGPA=Very Good ⇒ Hall_Status=Resident            10%              63%
03  CGPA=Good ⇒ Hall_Status=Non-Resident             10%              43%
04  CGPA=Good Hall_Status=Resident ⇒ Gender=male     10%              82%

However, the percentage of students getting poor CGPA is high in the halls. In a hall there is very
little restriction, and sometimes there is no one to take care of a student as family members do. So a
student can become demoralized and get a very poor grade due to lack of study. As shown in rule number
1 of Table 5.3, the percentage of male resident students is higher in this regard. In slightly more
than half of the cases, poor CGPA holders are hall residents (rules 1 and 5 of Table 5.3).

Table 5.3: Impact of Hall Status and Gender

No  Generated Interesting Rules                              Minimum Support  Confidence
01  CGPA=Poor Gender=male ⇒ Hall_Status=Resident             5%               51%
02  CGPA=Very Good Gender=male ⇒ Hall_Status=Non-Resident    5%               40%
03  Hall_Status=Non-Resident Gender=female ⇒ CGPA=Average    5%               24%
04  Hall_Status=Resident Gender=female ⇒ CGPA=Good           5%               21%
05  CGPA=Poor ⇒ Hall_Status=Resident                         5%               52%

3) Correlation between Courses

The analyzed Association Rules show that the grade of one course may depend on prerequisite courses.
Rule number 1 shows that a student who gets an excellent grade in CSE105 also gets an excellent grade
in CSE201 with a confidence of 0.48, where CSE105 is the Structured Programming Language course and
CSE201 is the Object Oriented Programming Language course. We also discover the interrelation of
CSE311 (Data Communication-I) and CSE321 (Networking) in rules 6, 7 and 8, and the impact of CSE205
(Digital Logic Design) and CSE209 (Digital Electronics and Pulse Technique) on CSE403 (Digital System
Design) in rule number 9 of Table 5.4.

Table 5.4: Correlation between Courses

No  Generated Interesting Rules                                                   Minimum Support  Confidence
01  CSE105_Grade=Excellent ⇒ CSE201_Grade=Excellent                               10%              48%
02  CSE201_Grade=Very Good ⇒ CSE105_Grade=Very Good                               5%               30%
03  EEE163_Grade=Excellent ⇒ EEE263_Grade=Very Good                               5%               27%
04  CSE205_Grade=Excellent ⇒ CSE403_Grade=Excellent                               10%              50%
05  CSE403_Grade=Poor ⇒ CSE205_Grade=Average                                      5%               28%
06  CSE321_Grade=Average ⇒ CSE311_Grade=Average                                   5%               36%
07  CSE321_Grade=F ⇒ CSE311_Grade=Poor                                            3%               13%
08  CSE321_Grade=Poor ⇒ CSE311_Grade=Poor                                         3%               16%
09  CSE205_Grade=Very Good CSE209_Grade=Excellent ⇒ CSE403_Grade=Excellent        5%               53%

4) Impact on Retention

If a student fails a course, he or she becomes retentive, because that course has to be taken again
later to complete graduation. Rules 2, 3, 4, 5 and 6 show that retentive students usually struggle
with their grades. A student who has not passed CSE100, the first fundamental course of CSE, tends to
be retentive, i.e., he or she has not passed the later departmental courses either. This is
illustrated by rule number 1 in Table 5.5. Moreover, we have discovered that most retentive students
are hall residents and male, as illustrated with high confidence by rules 7 and 8 respectively in
Table 5.5.

Table 5.5: Impact on Retention

No  Generated Interesting Rules                        Minimum Support  Confidence
01  CSE100_Grade=F ⇒ Student Type=Retentive            5%               42%
02  Student Type=Retentive ⇒ MATH243_Grade=Poor        5%               35%
03  Student Type=Retentive ⇒ CSE205_Grade=Average      5%               35%
04  Student Type=Retentive ⇒ CSE311_Grade=Average      5%               27%
05  Student Type=Retentive ⇒ EEE263_Grade=Poor         5%               33%
06  Student Type=Retentive ⇒ CSE409_Grade=Average      5%               43%
07  Student Type=Retentive ⇒ Hall_Status=Resident      5%               65%
08  Student Type=Retentive ⇒ Gender=male               5%               81%

5) Impact on Abandonment

The students who have given up their academic studies without completing all the required courses are
typed as 'abandoned'. By analyzing the rules in Table 5.6, it is discovered that, with high
confidence, the abandoned students are male and hall residents. However, the minimum value of support
is very low. Thus the rate of abandonment is found to be very low in the CSE department of this
university.

Table 5.6: Impact on Abandonment

No  Generated Interesting Rules                                     Minimum Support  Confidence
01  Student Type=Abandoned ⇒ Gender=male                            0.5%             100%
02  Student Type=Abandoned ⇒ Hall_Status=Resident                   0.5%             75%
03  Student Type=Abandoned ⇒ Gender=male Hall_Status=Resident       0.5%             75%

6) Impact of Continuous Assessment

The grade of a course depends on various aspects such as the marks of attendance, class tests and both
sections of the term final examination. From rule number 7, which has the maximum confidence value of
1.00, we have discovered that an excellent grade in a course goes together with excellent performance
in all other aspects of continuous assessment. Again, the performance in class tests depends on
attendance, as illustrated by rule number 5 in Table 5.7 with a very high confidence of 0.95.

Table 5.7: Impact of Continuous Assessment

No  Generated Interesting Rules                                                                              Minimum Support  Confidence
01  CSE103_Attendance=Excellent CSE103_SectionB=Poor ⇒ CSE103_Grade=Average                                  10%              63%
02  CSE103_Grade=Very Good CSE103_CT=Good ⇒ CSE103_Attendance=Excellent                                      10%              97%
03  EEE163_Grade=Average ⇒ EEE163_SectionB=Poor                                                              10%              57%
04  EEE163_Grade=Very Good ⇒ EEE163_Attendance=Excellent EEE163_CT=Excellent                                 10%              67%
05  HUM275_CT=Excellent ⇒ HUM275_Attendance=Excellent                                                        10%              95%
06  HUM275_CT=Excellent HUM275_SectionA=Good ⇒ HUM275_Grade=Very Good HUM275_Attendance=Excellent            10%              75%
07  CSE401_Grade=Excellent CSE401_CT=Excellent CSE401_SectionA=Excellent ⇒ CSE401_Attendance=Excellent       10%              100%
08  CSE401_SectionB=Excellent ⇒ CSE401_Grade=Good                                                            10%              75%

7) Impact of Non Departmental Courses

After analyzing the generated Association Rules (Table 5.8), we observed various impacts of
non-departmental courses on academic performance. According to the curriculum, students have to take
some non-departmental courses, and the performance in them is added to the final result. Some students
may get poor grades in those courses. According to the generated rules, good performance in
non-departmental courses brings a good grade, but a poor grade in them does less harm to the final
CGPA, because those courses are fewer in number and most of them are studied at the beginning of the
undergraduate level. So students get enough opportunities to improve their CGPA later.

Table 5.8: Impact of Non Departmental Courses

No  Generated Interesting Rules                   Minimum Support  Confidence
01  CGPA=Very Good ⇒ HUM272_Grade=Very Good       10%              73%
02  CGPA=Very Good ⇒ MATH143_Grade=Average        5%               37%
03  CGPA=Good ⇒ EEE163_Grade=Average              5%               36%
04  CGPA=Very Good ⇒ CHEM101_Grade=Average        10%              52%
05  CGPA=Average ⇒ IPE493_Grade=Very Good         5%               29%
06  CGPA=Good ⇒ ME165_Grade=Average               10%              43%
07  CGPA=Average ⇒ MATH243_Grade=Poor             5%               27%

8) Impact of Departmental Courses

As many departmental courses are studied, and some of them are interconnected through prerequisites,
the results of departmental courses affect the final CGPA very much. From the analyzed rules, it is
found that good grades in departmental courses bring a good CGPA, while poor grades in departmental
courses result in a poor overall CGPA. This significant knowledge is discovered from the rules on the
impact of departmental courses in Table 5.9.

Table 5.9: Impact of Departmental Courses

No  Generated Interesting Rules                   Minimum Support  Confidence
01  CGPA=Very Good ⇒ CSE100_Grade=Very Good       5%               42%
02  CGPA=Very Good ⇒ CSE105_Grade=Average         5%               31%
03  CGPA=Very Good ⇒ CSE206_Grade=Very Good       10%              44%
04  CGPA=Good ⇒ CSE303_Grade=Average              5%               31%
05  CGPA=Poor ⇒ CSE321_Grade=Poor                 5%               29%
06  CGPA=Excellent ⇒ CSE401_Grade=Excellent       5%               50%
07  CGPA=Average ⇒ CSE401_Grade=Average           5%               29%
08  CGPA=Average ⇒ CSE409_Grade=Average           5%               42%

vi. Conclusions

Knowledge discovery from academic data is very important for improving the academic performance of any
higher educational institution. In this research, we study the academic system, the existing problems
and the performance data of the most renowned engineering university of Bangladesh. We have found
problems like abandonment, retention and decay of potential among the most brilliant students. We
have applied the Association Rule Mining technique to explore the root causes of these problems.
Before applying the data mining algorithm, the existing academic data has been preprocessed to make it
suitable for data mining. We have developed a data transformation technique that transforms the
relational database into an equivalent universal relational format. In this format, we have also
transformed the continuous data into discrete valued qualitative data. We have found interesting
Association Rules by applying the Apriori Association Rule generator of the WEKA tool to the
transformed data. From the large number of association rules, we have extracted the interesting rules
regarding the impacts of gender, residence and continuous assessment on academic performance. We have
also found associations among courses, retention and abandonment. The obtained results are found to be
very significant for decision makers who want to improve the overall academic condition of the
institution.

According to the results, 10% of the 582 graduated students of the CSE department are male and have a
CGPA below 3.00, and the probability of being male among poor CGPA holders is 0.87. Again, we have
discovered that 5% of all students have a poor CGPA and are hall residents, and the probability of
being a hall resident among poor CGPA holders is 0.52. We have also discovered significant
correlations between courses. For example, more than 58 students have excellent grades in both CSE105
(Structured Programming Language) and CSE201 (Object Oriented Programming Language), and the
probability of having an excellent grade in CSE201 among students with an excellent grade in CSE105 is
0.48. We have found that about 30 students have to retake the MATH243 course. We found that 5% of all
male students are both retentive and hall residents, and 65% of all retentive students are hall
residents. The abandonment rate is very low in the CSE department of BUET: we found that only 3 male
students dropped out before completing graduation, and 75% of abandoned students were hall residents.
We have also determined the impact of several non-departmental courses. For example, more than 60
students have a very good grade in HUM272 as well as a CGPA over 3.50. We have also determined the
impact of several departmental courses. For example, 5% of the 582 students have a CGPA over 3.75 and
have got A+ in CSE401, and 50% of the students with a CGPA over 3.75 have obtained A+ in CSE401.
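The head counts quoted above follow from the support and confidence percentages and the size of the dataset. The short sketch below redoes that arithmetic in Python; it assumes that a rule's listed minimum support is a lower bound on its actual support, and the function name `min_count` is illustrative.

```python
import math

TOTAL = 582      # graduated CSE students in the transformed table
ABANDONED = 4    # student type counts from the experimental setup
RETENTIVE = 26


def min_count(min_support, total=TOTAL):
    """Smallest number of students that satisfies a minimum support threshold."""
    return math.ceil(min_support * total)


count_10 = min_count(0.10)   # 10% of 582 is 58.2, so more than 58 students
count_5 = min_count(0.05)    # 5% of 582 is 29.1, so about 30 students
abandoned_share = ABANDONED / TOTAL            # a little under 0.7%
abandoned_residents = round(0.75 * ABANDONED)  # 75% of the abandoned students
retentive_residents = round(0.65 * RETENTIVE)  # 65% of the retentive students
```

The same conversion applies to any rule in the result tables: multiply the support by 582 for the number of students matching the whole rule, and multiply the confidence by the number of students matching the left-hand side.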

We hope all these quantitative findings will help decision makers improve the quality of education
provided in this department. We have applied the technique only to the CSE department of BUET, but it
is applicable to any department of any higher educational institute.