Building an Intelligent Web: Theory and Practice Pawan Lingras Saint Mary’s University Rajendra...

Building an Intelligent Web: Theory and Practice

Pawan Lingras

Saint Mary’s University

Rajendra Akerkar

American University of Armenia and SIBER, India

Discipline

Computer Science

Mathematics and Statistics

Management

Research

Graduate

Research

Graduate

Chapters 1 – 8 excluding shaded portion related to mathematics and implementation.

Complete Book

Information Retrieval

Web Mining

Chapters 2, 4 – 8 excluding shaded portion related to implementation.

Chapters 1, 2, 3, 7 and 8

Chapters 4 – 8

Chapters 1 – 8 excluding shaded portion related to implementation.

Information Retrieval

Create a list of words

Remove stop words

Stem words

Calculate frequency of each stemmed word

Figure 2.1 Transforming text document to a weighted list of keywords
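The four steps of Figure 2.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the stop-word list is a tiny hypothetical sample, and `crude_stem` is a naive suffix stripper standing in for a real stemmer such as Porter's. The resulting weights are simply raw frequencies.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use far larger lists).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}

# Naive suffix stripping, a hypothetical stand-in for a proper stemmer (e.g. Porter).
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def weighted_keywords(text):
    """Turn a text document into a weighted list of stemmed keywords."""
    words = re.findall(r"[a-z]+", text.lower())         # 1. create a list of words
    content = [w for w in words if w not in STOP_WORDS]  # 2. remove stop words
    stems = [crude_stem(w) for w in content]             # 3. stem words
    return Counter(stems)                                # 4. frequency of each stem


doc = "Mining the Web: web pages are mined for patterns, and the patterns are ranked."
print(weighted_keywords(doc).most_common(3))
```

Note how stemming maps both "mining" and "mined" to the same keyword, so their counts are pooled; that pooling is the point of step 3.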

Data mining has emerged as one of the most exciting and dynamic fields in computing science. The driving force for data mining is the presence of petabyte-scale online archives that may contain valuable information hidden within them. Commercial enterprises have been quick to recognize the value of this concept; consequently, within the span of a few years, the software market for data mining alone is expected to exceed $10 billion. Data mining refers to a family of techniques used to detect interesting nuggets of relationships or knowledge in data. While the theoretical underpinnings of the field have been around for quite some time (in the form of pattern recognition, statistics, data analysis and machine learning), the practice and use of these techniques have been largely ad hoc. With the availability of large databases to store, manage and assimilate data, the new thrust of data mining lies at the intersection of database systems, artificial intelligence and algorithms that efficiently analyze data. The distributed nature of many databases, their size and the high complexity of many techniques present interesting computational challenges.

Figure 2.43 Relationship between precision and recall (plot; recall axis marked 0.25, 0.5, 0.75 and 1)
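The trade-off in Figure 2.43 can be reproduced by computing precision (relevant retrieved / all retrieved) and recall (relevant retrieved / all relevant) after each position of a ranked result list. A minimal sketch; the document identifiers and relevance judgements below are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of one retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def precision_recall_curve(ranking, relevant):
    """(precision, recall) after each of the top-k results, k = 1..len(ranking)."""
    relevant = set(relevant)
    points, hits = [], 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        recall = hits / len(relevant) if relevant else 0.0
        points.append((hits / k, recall))
    return points


# Hypothetical ranked answer to one query and its relevance judgements.
ranking = ["d3", "d7", "d1", "d9", "d4", "d2"]
relevant = {"d3", "d1", "d2", "d5"}
for p, r in precision_recall_curve(ranking, relevant):
    print(f"precision={p:.2f} recall={r:.2f}")
```

As more results are read, recall can only rise, while precision tends to fall; here recall stops at 0.75 because the relevant document d5 is never retrieved.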

Semantic Web

The layer language model (Berners-Lee, 2001; Broekstra et al., 2001)

<h1>Student Service Centre</h1>

Welcome to the home page of the Student Service Centre.

The centre is located in the main building of the University.

You may visit us for assistance during working days.

<h2>Office hours</h2>

Mon to Thu 8am - 6pm<br>

Fri 8am - 2pm<p>

But note that the centre is not open during the weeks of the

<a href=". . .">State Of Origin</a>.

Figure 3.2 Example of a Web page of a Student Service Centre

<organization>

<serviceOffered>Admission</serviceOffered>

<organizationName>Student Service Centre</organizationName>

<staff>

<secretary>Penny Brenner</secretary>

</staff>

</organization>

Figure 3.3 XML representation of the Student Service Centre Web page

Figure 3.4 Representing classes and instances (Noy et al., 2001)

Tree representation of an XML document about a college (nodes: root, college, lecturer, location, course, @title; values: Innsbruck, NonlinearAnalysis, ModernAlgebra, DiscreteStructures, Computational Algebra, Algorithms, SamHoofer, DanielaFrost, EdwardBunker), shown in three panels: Queries 1 and 2; Queries 3 and 4; and a third panel.

<?xml version="1.0"?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

xmlns:dc="http://purl.org/dc/elements/1.1/">

<rdf:Description rdf:about="">

<dc:title>

Building an Intelligent Web: Theory and Practice

</dc:title>

<dc:creator> Rajendra Akerkar and Pawan Lingras </dc:creator>

</rdf:Description>

</rdf:RDF>

Figure 3.26 Fragment of RDF
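Because RDF/XML is ordinary XML, the Dublin Core properties in Figure 3.26 can be read with Python's standard `xml.etree.ElementTree` module. This is an illustrative sketch only: the hypothetical `dublin_core` function handles this flat `rdf:Description` shape, not RDF/XML in general, for which a dedicated RDF parser (for example the rdflib library) would be the safer choice.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

FRAGMENT = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="">
    <dc:title>
      Building an Intelligent Web: Theory and Practice
    </dc:title>
    <dc:creator> Rajendra Akerkar and Pawan Lingras </dc:creator>
  </rdf:Description>
</rdf:RDF>"""


def dublin_core(rdf_xml):
    """Return [(about, {dc_property: value})] for each rdf:Description element."""
    root = ET.fromstring(rdf_xml)
    results = []
    # ElementTree writes namespaced tags as "{namespace-uri}localname".
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about")
        props = {}
        for child in desc:
            if child.tag.startswith(f"{{{DC_NS}}}"):
                local_name = child.tag.split("}", 1)[1]
                props[local_name] = (child.text or "").strip()
        results.append((about, props))
    return results


print(dublin_core(FRAGMENT))
```

The empty `rdf:about=""` refers to the document itself, so the statements describe the book's own title and creators.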

An RDF model for automobiles

<rdf:RDF

xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"

xmlns:my="http://www.myvehicle.com/vehicle-schema/">

<rdfs:Class rdf:about="#Vehicle"/>

<rdfs:Class rdf:about="#Car">

<rdfs:subClassOf rdf:resource="#Vehicle"/>

</rdfs:Class>

<rdf:Property rdf:about="#name">

<rdfs:domain rdf:resource="#Vehicle"/>

</rdf:Property>

<rdf:Description rdf:about="#Ford">

<rdf:type rdf:resource="#Car"/>

<my:name>Ford Icon</my:name>

</rdf:Description>

<my:Truck rdf:about="#Mitsubishi">

<my:name>Mitsubishi</my:name>

<my:carry rdf:resource="#Mitsubishi"/>

</my:Truck>

</rdf:RDF>

Figure 3.29 RDF/XML file for the automobile example
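The value of the schema in Figure 3.29 is that new facts can be inferred from it. The sketch below hand-translates the figure into triples and applies three RDFS entailment rules (rdfs2 for `rdfs:domain`, rdfs9 and rdfs11 for `rdfs:subClassOf`) until nothing new appears. Assumption: the figure declares `#name` but uses `my:name`; here both are treated as the single property `name`, as the example evidently intends. The function names are hypothetical.

```python
# Triples hand-translated from Figure 3.29 (assumption: #name and my:name are one property).
TRIPLES = {
    ("Car", "rdfs:subClassOf", "Vehicle"),
    ("name", "rdfs:domain", "Vehicle"),
    ("Ford", "rdf:type", "Car"),
    ("Ford", "name", "Ford Icon"),
    ("Mitsubishi", "rdf:type", "Truck"),
    ("Mitsubishi", "name", "Mitsubishi"),
    ("Mitsubishi", "carry", "Mitsubishi"),
}


def rdfs_closure(triples):
    """Apply rdfs2, rdfs9 and rdfs11 repeatedly until a fixed point is reached."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # rdfs9: s type C and C subClassOf D  =>  s type D
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # rdfs11: C subClassOf D and D subClassOf E  =>  C subClassOf E
                if p == "rdfs:subClassOf" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdfs:subClassOf", o2))
                # rdfs2: property p has domain C and s p o  =>  s type C
                if p2 == "rdfs:domain" and p == s2:
                    new.add((s, "rdf:type", o2))
        if new <= inferred:
            return inferred
        inferred |= new


def types_of(resource, triples):
    """All classes the resource belongs to, sorted alphabetically."""
    return sorted(o for s, p, o in triples if s == resource and p == "rdf:type")


closure = rdfs_closure(TRIPLES)
print(types_of("Ford", closure))
print(types_of("Mitsubishi", closure))
```

Ford becomes a Vehicle through the subclass hierarchy; Mitsubishi becomes a Vehicle only because it has a `name`, whose domain is Vehicle, even though Truck is never declared a subclass of Vehicle.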

<topicMap id="tmrf"

xmlns = 'http://www.topicmaps.org/xtm/1.0/'

xmlns:xlink = 'http://www.w3.org/1999/xlink'>

<!-- The map contains information about the Technomathematics Research Foundation. -->

<!-- We can include comment and narrative here… -->

<!-- .... here my topics and my associations go ... -->

</topicMap>

Figure 3.30 A Topic Map document (Adapted from http://topicmaps.bond.edu.au/docs/6/1)

Classification and Association

Data Preparation

• Database Theory

• SQL

• Data Transformation

• http://www.ecn.purdue.edu/KDDCUP/data/

Classification

• Find a rule, a formula or a black-box classifier for organizing data into classes
  – Classify clients requesting loans into categories based on the likelihood of repayment
  – Classify customers into Big or Moderate Spenders based on what they buy
  – Classify customers as loyal, semi-loyal or infrequent based on the products they buy
• The classifier is developed from the data in the training set
• The reliability of the classifier is evaluated using the test set of data

Classification

• ID3 Algorithm
  – Numerical Illustration
  – Application to a Small E-commerce Dataset

• C4.5 for Experimentation

• Other approaches
  – Neural Networks
  – Fuzzy Classification
  – Rough Set Theory
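ID3 grows a decision tree by choosing, at each node, the attribute with the highest information gain (the largest drop in class entropy). A minimal sketch, assuming categorical attributes; the loan records are invented for illustration, and refinements such as numeric attributes and pruning (which C4.5 adds) are omitted. Function and field names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training set (invented for illustration, not data from the book).
LOANS = [
    {"income": "high", "credit": "good", "repaid": "yes"},
    {"income": "high", "credit": "bad", "repaid": "yes"},
    {"income": "low", "credit": "good", "repaid": "yes"},
    {"income": "low", "credit": "bad", "repaid": "no"},
    {"income": "low", "credit": "bad", "repaid": "no"},
    {"income": "high", "credit": "bad", "repaid": "yes"},
]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, attrs, target):
    """Return a tree of nested dicts {attr: {value: subtree}} with class labels at leaves."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]                              # pure node
    if not attrs:
        return Counter(labels).most_common(1)[0][0]   # majority class
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})}}


def classify(tree, row):
    """Follow the branches matching row's attribute values down to a leaf label."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]   # raises KeyError for values unseen in training
    return tree


tree = id3(LOANS, ["income", "credit"], "repaid")
print(tree)
print(classify(tree, {"income": "low", "credit": "good"}))
```

Income is chosen first because its split leaves less uncertainty than credit; credit is then used only for low-income applicants. In practice the tree would be built from the training set and `classify` applied to the test set to estimate reliability.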

Association

• Market basket analysis – determine which things go together

• Transactions might reveal that
  – customers who buy bananas also buy candles
  – cheese and pickled onions seem to occur together frequently in a shopping cart

• The information can be used for
  – arranging a physical shop or structuring the Web site
  – targeted advertising campaigns

Association

• Apriori Algorithm

• Demonstration for an E-commerce Application
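The Apriori algorithm finds frequent itemsets level by level: frequent k-itemsets are joined into (k+1)-item candidates, candidates with an infrequent subset are pruned, and the survivors are counted. A minimal sketch; the baskets are hypothetical and echo the banana/candles and cheese/pickled onions example above, and the function names are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions containing every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({s: support(s) for s in current})
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical shopping baskets.
BASKETS = [
    {"banana", "candles", "bread"},
    {"banana", "candles"},
    {"cheese", "pickled onions", "bread"},
    {"cheese", "pickled onions"},
    {"banana", "candles", "cheese", "pickled onions"},
]
freq = apriori(BASKETS, min_support=0.4)
for itemset, s in sorted(freq.items(), key=lambda kv: (-len(kv[0]), -kv[1])):
    print(sorted(itemset), s)
print(confidence(freq, {"banana"}, {"candles"}))
```

The prune step is what keeps Apriori tractable: no itemset is counted unless all of its subsets are already known to be frequent.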

Clustering

• Breaks a large database into different subgroups or clusters

• Unlike classification there are no predefined classes

• Records are grouped into clusters on the basis of their similarity to each other

• The data miners determine whether the clusters offer any useful insight


Statistical Methods

• k-means
  – Numerical Example
  – Implementation
    • Data Preparation
    • Clustering

• Other Methods
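The k-means method alternates two steps: assign every record to its nearest centroid, then move each centroid to the mean of its members, stopping when the centroids no longer change. A minimal sketch of this standard (Lloyd's) variant; the two-attribute customer points are hypothetical and Euclidean distance is assumed.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new.append(centroids[c])   # an empty cluster keeps its old centroid
        if new == centroids:
            break                          # converged
        centroids = new
    return centroids, labels


# Hypothetical customers described by (visits per month, spending in hundreds).
customers = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9)]
centroids, labels = kmeans(customers, k=2)
print(centroids)
print(labels)
```

The result depends on the initial centroids, which is why implementations often rerun k-means with several seeds and keep the tightest clustering.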

Neural Network Based Approaches

• Kohonen Self-Organising Maps
  – Numerical Demonstration
  – Application to Web Data Collection

• Other Neural Network Based Approaches
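A Kohonen self-organising map repeatedly finds the best-matching unit (BMU) for an input and pulls that unit, and to a lesser degree its grid neighbours, towards the input. A minimal one-dimensional sketch; the Gaussian neighbourhood, the linear decay of learning rate and radius, and the session vectors are all assumptions chosen for illustration, not the book's settings.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(weights[j], x))


def train_som(data, n_nodes, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a line of n_nodes units on the input vectors in data."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    radius0 = radius0 if radius0 is not None else n_nodes / 2
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)   # neighbourhood shrinks over time
        for x in rng.sample(data, len(data)):     # visit inputs in random order
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood on the 1-D grid, centred on the BMU.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


# Hypothetical Web sessions: (share of news pages, share of sport pages).
sessions = [(0.1, 0.1), (0.15, 0.05), (0.9, 0.85), (0.95, 0.9)]
som = train_som(sessions, n_nodes=4)
for s in sessions:
    print(s, "->", best_matching_unit(som, s))
```

After training, similar sessions tend to map to the same or adjacent units, which is what makes the map usable for clustering Web data.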

Clustering of customers

Web Mining

• Web Content Mining
  – Web Page Content Mining
  – Search Result Mining

• Web Structure Mining

• Web Usage Mining
  – General Access Pattern Tracking
  – Customized Usage Tracking

Web Usage Mining

High-level web usage mining process (Srivastava et al., 2000)

Applications of web usage mining

(Romanko, 2006; Srivastava et al., 2000)

140.14.6.11 - pawan [06/Sep/2001:10:46:07 -0300] "GET /s.htm HTTP/1.0" 200 2267

140.14.7.18 - raj [06/Sep/2001:11:23:53 -0300] "POST /s.cgi HTTP/1.0" 200 499
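The two log lines above follow the Common Log Format used by many Web servers: client host, identity (here "-"), authenticated user, timestamp, request line, status code and response size in bytes. Parsing such lines into fields is the first preprocessing step of web usage mining. A minimal sketch with a regular expression; the field names are hypothetical.

```python
import re
from datetime import datetime

# host ident user [time] "method path protocol" status size
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Parse one Common Log Format line into a dict, or return None if it does not match."""
    m = CLF.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


LOG = [
    '140.14.6.11 - pawan [06/Sep/2001:10:46:07 -0300] "GET /s.htm HTTP/1.0" 200 2267',
    '140.14.7.18 - raj [06/Sep/2001:11:23:53 -0300] "POST /s.cgi HTTP/1.0" 200 499',
]
for record in map(parse_clf, LOG):
    print(record["user"], record["method"], record["path"], record["time"].isoformat())
```

Keeping the time zone offset in the parsed timestamp matters when sessions are later reconstructed from logs gathered on servers in different zones.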

Clustering exercise

Classification exercise

Channel    Recall   Precision
Finance    44.3%    98.27%
Health     52.3%    89.66%
Market     49.1%    83.34%
News       44.1%    89.27%
Shopping   31.5%    91.31%
Specials   60.2%    92.86%
Sport      50.0%    91.93%
Surveys    21.9%    92.66%
Theatre    54.8%    94.63%

Table 6.8 Precision and recall for predicting users' interest in channels (Baglioni et al., 2003)

Association exercise

News Section   Minimum Requests  Maximum Requests  Mean Requests  Standard Deviation
Science        1                 97                2.3034         2.8184
Culture        1                 208               3.7878         5.9742
Sports         1                 318               5.6985         10.8360
Economics      1                 258               3.9335         7.2341
International  1                 208               3.3823         5.5540
Local Lisbon   1                 460               5.6883         11.5650
Local Port     1                 256               7.5984         13.2351
Politics       1                 208               3.3577         5.4101
Society        1                 367               4.2673         7.9853
Education      1                 90                2.6496         3.29090

Table 6.9 Summary statistics of requests to the Publico on-line newspaper (Batista and Silva, 2002)

Association mining revealed strong associations between the following pairs of news sections:

Politics and Society

Politics and International News

Politics and Sports

Society and International News

Society and Local Lisbon

Society and Sports

Society and Culture

Sports and International News

Sequence Pattern Analysis of Web Logs

Web Content Mining

Data Collection

• Web Crawlers

• Public Domain Web Crawlers

• An Implementation of a Web Crawler
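A Web crawler starts from a seed URL, downloads each page, extracts its links and queues every unseen link, which amounts to a breadth-first traversal of the Web graph. A minimal sketch using only the Python standard library; the page-fetching function is passed in as a parameter (an assumption that keeps the core testable offline), and politeness measures such as honouring robots.txt (see `urllib.robotparser`) and rate limiting are omitted.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


def fetch_http(url):
    """Download a page over the network; returns None on failure."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None


def crawl(seed, fetch=fetch_http, max_pages=50):
    """Breadth-first crawl from seed; returns {url: [absolute outgoing links]}."""
    frontier, seen, graph = deque([seed]), {seed}, {}
    while frontier and len(graph) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        # Resolve relative links and drop #fragments so one page has one URL.
        links = [urldefrag(urljoin(url, href))[0] for href in parser.links]
        graph[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return graph


# Offline demonstration with a hypothetical three-page site.
PAGES = {
    "http://example.com/": '<a href="a.html">A</a> <a href="b.html#top">B</a>',
    "http://example.com/a.html": '<a href="/">home</a> <a href="b.html">B</a>',
    "http://example.com/b.html": "<p>no links</p>",
}
print(crawl("http://example.com/", fetch=PAGES.get))
```

The returned link graph is useful beyond indexing: it is exactly the input that Web structure mining algorithms work on.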

Architecture of a search engine (Romanko, 2006)
