Hot News. Reporters: Hossein Kamyar, Asef Poormasoomi. Supervisor: Dr. Mohsen Kahani.

Page 1

Hot News

Reporters: Hossein Kamyar, Asef Poormasoomi

Supervisor: Dr. Mohsen Kahani

Page 2

Tehran University

Database Research Group

Natural Language and Text Processing Group

Page 3

Database Research Group

http://ece.ut.ac.ir/dbrg

Members:

Faculty staff: 8

Students: 9

Alumni: 17

Dr. Caro Lucas, Dr. Behzad Moshiri, Dr. Rohani Rankouhi

Page 4

Database Research Group

Research Project: Modernization Of Systems

Information Retrieval

Data Mining

Data Management

Page 7

Database Research Group

Industrial Project

Page 8

Database Research Group

Related Courses:

1. Introduction to Database Systems

2. Advanced Database Systems

3. Special Topics in Database Systems

4. Database Laboratory

5. Data Mining

6. Information Retrieval

7. Natural Language Processing

Page 9

Database Research Group

Persian Corpus

Hamshahri Corpus

Version 1 of the Hamshahri collection, the official release, is maintained and distributed by the CLEF organizers. This collection was used in CLEF 2008 and CLEF 2009 and has 100 queries.

Version 2 of the Hamshahri collection was prepared in 1388 (Iranian calendar, 2009/2010) with the UTIRE system in the Database Research Group of the University of Tehran, following the TREC standard.

Page 10

Database Research Group

Persian Corpus

Bijankhan Corpus

The Bijankhan corpus is a tagged corpus suitable for natural language processing research on the Persian (Farsi) language. The collection was gathered from daily news and common texts. All documents in it are categorized by subject, such as political, cultural and so on; in total there are 4300 different subjects. The Bijankhan collection contains about 2.6 million manually tagged words, with a tag set of 40 Persian POS tags.

Page 11

Database Research Group

Persian Corpus

dotIR Web Benchmark Collection

This collection was built by crawling the web in the .ir domain and contains one million documents. Then, using the in-house UTIRE software, 50 queries were created by 25 users. These queries were used to search the collection, and the retrieved pages, 18424 documents in total (369 documents per query on average), were judged by the same 25 users. In this way the documents relevant to each query were identified.

In addition, in a parallel effort to study and compare ranking algorithms, 56 features were extracted from the documents retrieved for each query, following the LETOR standard (provided by Microsoft Research Asia). Researchers can use the feature-value and relevance vectors to compare their own proposed ranking algorithms, or to train and tune algorithms.

This project was supported by the Iran Telecommunication Research Center and the Database Laboratory of the University of Tehran.
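To make the LETOR-style feature vectors concrete, here is a minimal Python sketch of reading one judged query-document pair. It assumes the usual LETOR text layout (`<relevance> qid:<id> <index>:<value> ... # comment`); whether the dotIR files follow exactly this layout is an assumption, and the sample line, the `LetorRow` type and the `parse_letor_line` helper are hypothetical names made up for illustration.

```python
from typing import Dict, NamedTuple


class LetorRow(NamedTuple):
    relevance: int              # judged relevance label for the pair
    qid: str                    # query identifier
    features: Dict[int, float]  # feature index -> feature value


def parse_letor_line(line: str) -> LetorRow:
    """Parse one line in the assumed LETOR text layout."""
    # Drop an optional trailing comment such as "# docid = ...".
    fields = line.split("#", 1)[0].split()
    relevance = int(fields[0])
    if not fields[1].startswith("qid:"):
        raise ValueError("expected 'qid:<id>' as the second field")
    qid = fields[1][len("qid:"):]
    features: Dict[int, float] = {}
    for token in fields[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return LetorRow(relevance, qid, features)


# Hypothetical sample line; a real row would list all 56 features.
sample = "2 qid:17 1:0.25 2:0.0 56:1.5 # docid = hypothetical-doc"
row = parse_letor_line(sample)
print(row.relevance, row.qid, sorted(row.features))
```

Rows that share a `qid` form the candidate list that a ranking algorithm has to order, so grouping parsed rows by `qid` is usually the next step before training or evaluation.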

Page 12

Natural Language and Text Processing Group

Members: 10

Heshaam Faili

[Assistant Professor, Ph.D. Artificial Intelligence from Sharif University of Technology]

Page 13

Research Project:

More Than 23 Papers ?

Natural Language and Text Processing Group

Page 14

Natural Language and Text Processing Group

Industrial Project

• Detection and correction of typographical, grammatical and semantic errors

• Installable on the common word processor Word

• Able to learn and improve its performance automatically

• Accurate and efficient

• Free

Page 15

Persian Corpus

1. TEP: Tehran English-Persian Parallel Corpus

First free English-Persian corpus

4-million tokens on each side

Sentence Aligned

2. TMC: Tehran Monolingual Corpus

Largest freely available monolingual corpus for Persian language

Tokenized

Suitable for Language Modeling

3. Mutual Information

http://ece.ut.ac.ir/nlp/resources.html

Natural Language and Text Processing Group

Page 16

Related Courses:

Introduction to Natural Language Processing, Dr. Heshaam Faili

Advanced Database Systems

Natural Language and Text Processing Group

Page 17

Shahid Beheshti University

The Natural Language Processing research laboratory was founded by Dr. Mehrnoush Shamsfard at the beginning of 2006 in the Computer Engineering Department of Shahid Beheshti University.

More than 25 members. More than 92 papers.

http://nlp.sbu.ac.ir/

Page 18

Research Project

A. Developing Linguistic resources

Developing Semantic annotated corpus

Developing chunked corpus

Developing parallel corpus

Developing Persian Verbs database

Semi-automatic Lexicon Acquisition 

Start : 2006

Researchers : Maliheh Monshizadeh, Elham Fekri

Page 19

Research Project

B. Fundamental Persian text processing tools

Standard Text Preparation for Persian

Stemmer /Morphological analyzer / lemmatizer

Tokenizer

POS Tagger

Spell checker

Chunker

Syntax parser

Persian Named Entity Recognition - SBUNER

Persian Anaphora resolution

Semantic Role Labelling

Start : 2006

Researchers : Samira Noferesti, Rana Forsati, Pooneh Mortazavi, Hoda Sadat Jafari

Page 20

Research Project

C. NLP Applications

Machine translation - PenTrans project

English to Persian Translation System

Persian to English Translation System

Machine translation evaluation toolkit

Persian Text summarization - PARSUMIST

Question Answering, Persian - English - SBUQA

Information Extraction - Mersad

Text understanding

Conversion between Persian sentences and first order logic

Text generation

Start : 2006

Researchers : Chakaveh Saedi, Yasaman Motazedi, Mostafa Nazari

Page 21

Research Project

D. Ontology engineering

Ontology development

Development of CMMI-ACQ ontology

Collaborative development of ontology of computer science and engineering (COMON)

Fuzzy ontologies

Ontology learning

Ontology learning from text

Ontology learning from web

Relation extraction

Ontology mapping

Evolutionary ontology matching

A Linguistic-Structural Approach to Bilingual Ontology Mapping

Ontology population and instantiation

Start : 2006

Researchers : Aynaz Taheri, Hakimeh Fadaei, Tara Akhavan, Rahim Dehkharghani, Valeh Montaghami, Bahareh Sarrafzadeh, Amir Sharifloo, Rana Forsati

Page 22

Research Project

E. Semantic Web

Semantic annotation of documents

Converting web documents into semantic web resources   

Semantic search   

Semantic web service discovery and composition

Start : 2006

Researchers : Bahareh Sarrafzadeh, Hoda Mirzaie, Maryam Haghollahi, Homan Farrokhzad

Page 23

Research Project

F. Hybrids

Application of fuzzy ontologies in qualitative reasoning

E-learning

Ontology based Content Rearrangement for Intelligent Tutoring Systems - OCRITS

Intelligent Content Management

Start : 2006

Researchers : Hamzeh Motahari, Marzieh Shariati

Page 24

Courseware

Ontology Engineering

Natural Language Processing

Semantic Web

Advanced Natural Language Processing, Fall 2005, by Regina Barzilay and Michael Collins

Columbia University, MIT

Page 25

Tools

FarsNet: the first Persian WordNet

STeP-1: Standard Text Preparation for Persian

Tokenizer

Stemmer

POS tagger

Spell checker

Page 26

Sharif University

Natural Language Processing

Web Intelligence Laboratory

Page 27

Natural Language Processing

Dr. Ghasem Sani

Dr. Heshaam Faili

Since 2003 after three inactivity

Eliza

POS Tagger

Unsupervised Natural Grammar Induction

Page 28

Supervisor: Dr. Abolhasani

with 28 members

Web Intelligence Laboratory

Page 29

Web Intelligence Laboratory

Advanced Researches:

Semantic Search Engines

Semantic Web Services

Semantic web for pervasive computing

Annotation

Semantic Grids

Social Networks Analysis

Ontology Alignment and Learning

Web Clustering

Business Intelligence

Page 30

New Researches:

Composite Web Service Execution Framework.

Tracking news to find hot topics.

Semantic Programming.

Trust model in Semantic Web.

New models for recommender systems.

Using web to create a lecture for a subject.

A Farsi framework for Information Retrieval.

A semantic based framework for business intelligence applications.

Web Intelligence Laboratory

Page 31

Science & Technology University

Unknown laboratory, but an online POS tagger

In collaboration with the Prosody (Arooz) project, supported by the Supreme Council of Information

http://persianp.ir/index.php?option=com_wrapper&view=wrapper&Itemid=7

http://www.prosody.ir

Page 32

Conferences

The Cross-Language Evaluation Forum (CLEF)

(i) Developing an infrastructure for the testing, tuning and evaluation of information retrieval systems operating on European languages in both monolingual and cross-language contexts.

(ii) Creating test suites of reusable data which can be employed by system developers for benchmarking purposes.

CLEF conferences have been held since 2000.

CLEF 2011 will be hosted by the University of Amsterdam.

Computational Approaches to Arabic Script-based Languages (CAASL)

CAASL 2011 will be held in Geneva.

Page 33

Corporation: Asr Gooyesh Pardaz

Extraction of n-gram statistics for the Persian language

Extraction of a grammar for the Persian language

Preparation of a Persian lexicon

Extraction of frequently used Persian words, broken down by topic

Projects under research:

Probabilistic models of single words, word pairs, word triples and four-word sequences for Persian and English

GPSG grammar rules for the Persian language

Probabilistic grammar

Suitable parsers

Language model

Word clustering methods
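As a rough illustration of what n-gram statistics for one- to four-word sequences involve, the Python sketch below counts n-grams in a token list and derives a maximum-likelihood conditional probability. It is a generic textbook construction, not the company's actual method; the toy English sentence and the helper names `ngram_counts` and `conditional_prob` are assumptions made for the example.

```python
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-word sequence in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def conditional_prob(tokens: Sequence[str], context: Sequence[str], word: str) -> float:
    """Maximum-likelihood estimate of P(word | context), without smoothing."""
    context = tuple(context)
    counts = ngram_counts(tokens, len(context) + 1)
    numerator = counts[context + (word,)]
    # Total count of n-grams that start with the given context.
    denominator = sum(c for gram, c in counts.items() if gram[:-1] == context)
    return numerator / denominator if denominator else 0.0


# Toy corpus (hypothetical); a real model would be estimated from a large corpus.
tokens: List[str] = "the cat sat on the mat the cat ran".split()

for n in (1, 2, 3, 4):
    print(n, sum(ngram_counts(tokens, n).values()))

print(conditional_prob(tokens, ["the"], "cat"))
```

Unsmoothed estimates assign zero probability to any sequence missing from the corpus, which is why practical language models add smoothing or back-off on top of counts like these.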

Page 34

we do ...