Myanmar Search Engine

Nyi Lynn Seck, EC (MCPA)

Search Engine Evolution

● 1st generation (uses only “on-page” data)
– Text data, word frequency, language

● 2nd generation (uses off-page, web-specific data)
– Link (or connectivity) analysis
– Click-through data (what people click)
– Anchor text (how people refer to this page)

● 3rd generation (answers “the need behind the query”)
– Semantic analysis: what is this about?
– Focus on the user need rather than on the query
– Context determination

Text Mining Research Area

● Information Retrieval (IR)
– Search Engines
– Classification
– Recommendation

● Information Extraction (IE)
– Screen scraping
– Product information (e.g. price) scraping

● Information Understanding
– Natural Language Processing (NLP)
– Question Answering
– Concept Extraction from Newsgroups
– Visualization
– Summarization

● Cross-Lingual Text Mining

● Trend Detection
– Outlier Detection

Classical Indexing

Indexing

– Keyword Indexing

– Subject Indexing (Classification)

– Collocate subjects
– Define and assign a code (call number) to each document

Tokenization

Tokenization is the process of splitting a stream of text into tokens (words or other meaningful units) that become the units of indexing.

Assign unique ID to each word & keep in a lexicon

Remove Stop/Noise words before/after tokenization
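The three steps on this slide (tokenize, assign IDs in a lexicon, remove stop words) can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the stop-word set and the regex tokenizer are assumptions, and a regex split like this only suits scripts that separate words with spaces, which Myanmar does not.

```python
import re

# Hypothetical stop-word list for illustration; a real list would be much longer.
STOP_WORDS = {"a", "the", "is", "of", "in"}

def tokenize(text):
    """Split text into lowercase word tokens (simple regex, English-style)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in stop_words]

def build_lexicon(tokens, lexicon=None):
    """Assign each distinct token a unique integer ID, in order of first appearance."""
    lexicon = {} if lexicon is None else lexicon
    for tok in tokens:
        if tok not in lexicon:
            lexicon[tok] = len(lexicon)
    return lexicon

tokens = remove_stop_words(tokenize("The index of a search engine is the lexicon"))
lexicon = build_lexicon(tokens)
print(tokens)
print(lexicon)
```

Here the stop words are removed after tokenization; removing them before would require matching them in the raw text instead.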

Stemming, Lemmatization

Stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form, generally a written word form.

Lemmatization is the process of reducing an inflected spelling to its lexical root or lemma form. The lemma form is the base form or head word form you would find in a dictionary. The combination of the lemma form with its word class (noun, verb, etc.) is called the lexeme.
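The difference between the two can be sketched with a toy example. The suffix list and the lemma dictionary below are hypothetical and English-only; they illustrate the idea (rule-based stripping versus dictionary lookup), not a production stemmer.

```python
# Hypothetical suffix list and lemma dictionary, for illustration only.
SUFFIXES = ("ing", "ed", "es", "s")
LEMMAS = {"went": "go", "better": "good", "playing": "play", "played": "play"}

def stem(word):
    """Crude suffix stripping: remove the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Dictionary lookup: return the head word if known, else the word unchanged."""
    return LEMMAS.get(word, word)

for w in ["playing", "played", "plays", "went"]:
    print(w, stem(w), lemmatize(w))
```

The stemmer handles regular suffixes without any dictionary but leaves an irregular form such as "went" untouched, while the lemmatizer maps it to "go" only because the dictionary lists it.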

[Myanmar example: two groups of word forms, each reduced to a common stem; the Myanmar text did not survive extraction]

Inverted Index
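The slide gives only the heading, so the following is one common way to build an inverted index, shown as a small sketch: each term maps to the list of documents that contain it, and a query is answered by intersecting those lists. The function names and sample documents are illustrative assumptions.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """Return the IDs of documents containing every query term (boolean AND)."""
    postings = [set(index.get(t, ())) for t in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

docs = {
    1: "myanmar search engine",
    2: "search engine evolution",
    3: "myanmar syllable structure",
}
index = build_inverted_index(docs)
print(index["myanmar"])
print(search(index, "search engine"))
```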

Formula & Algorithm?

The weight of a term that occurs in all documents
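The slide does not name a weighting formula. Assuming it refers to the common tf-idf scheme with idf = log(N / df), a term that occurs in all N documents has df = N, so its idf (and therefore its weight) is log(1) = 0. A minimal sketch under that assumption:

```python
import math

def idf(term, docs):
    """Inverse document frequency, log(N / df). Assumes the term occurs in at least one document."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df)

def tf_idf(term, doc, docs):
    """Raw term frequency in doc multiplied by the term's idf."""
    return doc.count(term) * idf(term, docs)

docs = [
    ["myanmar", "search", "engine"],
    ["search", "index"],
    ["search", "myanmar", "myanmar"],
]
print(idf("search", docs))               # "search" occurs in all three documents
print(tf_idf("myanmar", docs[2], docs))
```

This is also why terms that appear everywhere behave like stop words: they contribute nothing to ranking under this weighting.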

Stop Words

a, able, about, above, abroad, according, accordingly, across, actually, adj, after, afterwards, again, against, ago, ahead, ain't, all, allow, allows, almost, alone

What stop words will be used in a Myanmar search engine?

NGram

[Example slide: a Myanmar sentence split into 13 syllables, followed by its overlapping 2-, 3- and 4-syllable sequences and the entries labelled 2 Gram, 3 Gram and 4 Gram; the Myanmar text did not survive extraction]

Myanmar Word Segmentation using Syllable level Longest Matching: Hla Hla Htay
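The syllable n-gram grouping and the longest-matching idea named in this title can be sketched as below. This is a generic greedy longest-match over syllables, not necessarily the exact procedure of the cited work. The romanized syllables and the lexicon entries are placeholders standing in for Myanmar text, and `max_len` is an assumed limit.

```python
def syllable_ngrams(syllables, n):
    """Return every run of n consecutive syllables, joined into one string."""
    return ["".join(syllables[i:i + n]) for i in range(len(syllables) - n + 1)]

def longest_match(syllables, lexicon, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest
    syllable run found in the lexicon, falling back to a single syllable."""
    words, i = [], 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + n])
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words

# Romanized placeholder syllables stand in for Myanmar syllables.
syllables = ["myan", "mar", "sa", "ga", "lone"]
lexicon = {"myanmar", "sagalone"}  # hypothetical lexicon entries

print(syllable_ngrams(syllables, 2))
print(longest_match(syllables, lexicon))
```

Working at the syllable level keeps the candidate set small: a word boundary can only fall between syllables, never inside one.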

Simple Myanmar Syllable Structure

Diagram labels: Consonant, Medial, Killer, Diacritic

Syllable patterns:
C, C+M, C+M+V, C+M+V+K, C+M+V+K+D, C+M+V+D, C+M+K, C+M+K+D, C+M+D, C+V, C+V+K, C+V+K+D, C+V+D, C+K, C+K+D
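The pattern list on this slide can be encoded directly and compared with the general shape "C followed by optional M, V, K, D in that order". In this sketch C, M, K and D follow the diagram labels (Consonant, Medial, Killer, Diacritic); reading V as "vowel" is an assumption, since that label does not appear in the extracted text.

```python
from itertools import product

# Patterns exactly as listed on the slide, written without the "+" signs.
PATTERNS = {
    "C", "CM", "CMV", "CMVK", "CMVKD", "CMVD", "CMK", "CMKD", "CMD",
    "CV", "CVK", "CVKD", "CVD", "CK", "CKD",
}

def is_listed_syllable(classes):
    """classes: sequence of one-letter class labels, e.g. ['C', 'M', 'V']."""
    return "".join(classes) in PATTERNS

# Every combination of C plus optional M, V, K, D in fixed order (16 in total).
generated = {"C" + "".join(p) for p in product(["", "M"], ["", "V"], ["", "K"], ["", "D"])}

print(is_listed_syllable(["C", "M", "V", "K", "D"]))
print(sorted(generated - PATTERNS))
```

The comparison shows that the slide lists 15 of the 16 possible combinations; only C+D is absent. Whether that omission is intentional cannot be determined from the slide.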

Language Specific Search Engine Basic Architecture

Diagram components: WWW, Crawler, Language Identification, Language specific crawler, Page repository, Parser, Indexer, Corpus/Lexicon, Query engine, Ranking engine; input: query; output: results

Pann Yu Mon, Management and Information System Engineering Department, Nagaoka University of Technology, Japan
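The slides do not say how the Language Identification component decides whether a page is in Myanmar. One simple heuristic, offered here only as an assumption, is to measure how much of the page text falls in the Unicode Myanmar block (U+1000 to U+109F). The 0.5 threshold is hypothetical, and pages using non-Unicode font encodings would need a different test.

```python
def myanmar_ratio(text):
    """Fraction of non-whitespace characters in the Unicode Myanmar block (U+1000-U+109F)."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(1 for ch in chars if "\u1000" <= ch <= "\u109f") / len(chars)

def is_myanmar_page(text, threshold=0.5):
    """Keep a page for the language-specific crawl if enough of its text is Myanmar script."""
    return myanmar_ratio(text) >= threshold

sample = "\u1019\u103c\u1014\u103a\u1019\u102c"  # six Myanmar-block code points
print(is_myanmar_page(sample))
print(is_myanmar_page("search engine"))
```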

Crawling Coverage

Crawling Parameters

Seed URLs: 35
Level of depth: 6
Crawling time: 2 weeks
CPU: 2.40 GHz
Memory: 1 GB
Connection: 100 Mbit per second
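How seed URLs and a depth limit interact can be sketched with a breadth-first crawl. This sketch assumes "level of depth" means the number of link hops from a seed, which the slide does not state. `get_links` is a caller-supplied function (hypothetical here) so that the example runs on a toy link graph without network access.

```python
from collections import deque

def crawl(seed_urls, get_links, max_depth=6):
    """Breadth-first crawl from the seeds, following links up to max_depth hops.
    get_links(url) must return the URLs linked from url."""
    seen = set(seed_urls)
    queue = deque((url, 0) for url in seed_urls)
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # pages at the depth limit are collected but not expanded
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order

# Toy link graph standing in for the web (hypothetical URLs).
graph = {
    "a.mm": ["b.mm", "c.com"],
    "b.mm": ["d.com"],
    "c.com": [],
    "d.com": ["e.com"],
    "e.com": [],
}
print(crawl(["a.mm"], lambda u: graph.get(u, []), max_depth=2))
```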

Domains        The Number of Pages Collected
.mm                  3,555  [  1.1%]
.com               276,554  [ 83.2%]
Other gTLDs         52,245  [ 15.7%]
Total              332,354  [100.0%]
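The total and the percentage shares in the table can be re-derived from the per-domain counts. The short check below uses only the figures shown above.

```python
pages = {".mm": 3555, ".com": 276554, "Other gTLDs": 52245}

total = sum(pages.values())
shares = {domain: round(100 * count / total, 1) for domain, count in pages.items()}

print(total)
print(shares)
```

The counts sum to the stated total, and only about one page in a hundred came from the .mm domain.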

10th July 2008
