Hot News
Reporters: Hossein Kamyar, Asef Poormasoomi
Supervisor: Dr. Mohsen Kahani
Tehran University
Database Research Group
Natural Language and Text Processing Group
Database Research Group
http://ece.ut.ac.ir/dbrg
Members:
Faculty Staff: 8
Students: 9
Alumni: 17
Dr. Caro Lucas, Dr. Behzad Moshiri, Dr. Rohani Rankouhi
Research Projects:
Modernization of Systems
Information Retrieval
Data Mining
Data Management
Industrial Project
Related Courses:
1. Introduction to Database Systems
2. Advanced Database Systems
3. Special Topics in Database Systems
4. Database Laboratory
5. Data Mining
6. Information Retrieval
7. Natural Language Processing
Persian Corpus
Hamshahri Corpus
Version 1 of the Hamshahri collection is officially maintained and distributed by the CLEF organizers. It was used in CLEF 2008 and CLEF 2009 and has 100 queries.
Version 2 of the Hamshahri collection was prepared in 1388 SH (2009/2010) with the UTIRE system in the Database Research Group of the University of Tehran, following the TREC standard.
Bijankhan Corpus
The Bijankhan corpus is a tagged corpus suitable for natural language processing research on the Persian (Farsi) language. The collection was gathered from daily news and common texts. All documents in it are categorized by subject, such as political, cultural and so on. In total, there are 4300 different subjects. The Bijankhan collection contains about 2.6 million manually tagged words, with a tag set of 40 Persian POS tags.
dotIR: Web Benchmark Collection
This collection was built by crawling the .ir web domain and contains one million documents. Using the UTIRE software developed by the group, 25 users then created 50 queries.
These queries were run against the collection, and the retrieved pages, 18424 documents in total (369 documents per query on average), were judged by the same 25 users. This identified the documents relevant to each query.
In a parallel effort, to support the study and comparison of ranking algorithms, 56 features were extracted from the documents retrieved for each query, following the LETOR standard (provided by Microsoft Research Asia). Researchers can use the feature-value and relevance vectors for ranking, for comparing their proposed algorithms, or for training and tuning algorithms.
This project was supported by the Iran Telecommunication Research Center and the Database Laboratory of the University of Tehran.
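LETOR-style feature files are commonly plain-text lines of the form `<relevance> qid:<id> <feature>:<value> ... # comment`. The sketch below parses such lines and groups them by query. It is only an illustration: the sample lines, feature values and document ids are made up, and the actual dotIR file layout is assumed, not confirmed by the slides.

```python
from typing import Dict, Iterable, List, NamedTuple


class LetorRow(NamedTuple):
    relevance: int              # judged relevance label for the (query, document) pair
    qid: str                    # query identifier
    features: Dict[int, float]  # feature index -> feature value


def parse_letor_line(line: str) -> LetorRow:
    """Parse one line such as '2 qid:1 1:0.5 2:0.1 # docid = A'."""
    body = line.split("#", 1)[0].split()  # drop the trailing comment, then tokenize
    relevance = int(body[0])
    qid = body[1].split(":", 1)[1]
    features = {}
    for token in body[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return LetorRow(relevance, qid, features)


def group_by_query(lines: Iterable[str]) -> Dict[str, List[LetorRow]]:
    """Collect the parsed rows of each query, skipping blank lines."""
    groups: Dict[str, List[LetorRow]] = {}
    for line in lines:
        if line.strip():
            row = parse_letor_line(line)
            groups.setdefault(row.qid, []).append(row)
    return groups


# Hypothetical sample with two features per document (dotIR reports 56).
sample = [
    "2 qid:1 1:0.5 2:0.1 # docid = A",
    "0 qid:1 1:0.2 2:0.9 # docid = B",
    "1 qid:2 1:0.7 2:0.3 # docid = C",
]
groups = group_by_query(sample)
```

Grouping by query matters because ranking algorithms are trained and evaluated per query, not over the pooled documents.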
Natural Language and Text Processing Group
Members: 10
Heshaam Faili
[Assistant Professor, Ph.D. in Artificial Intelligence from Sharif University of Technology]
Research Project:
More than 23 papers (?)
Industrial Project
• Detects and corrects typographical, grammatical and semantic errors
• Can be installed on the widely used Word editor
• Learns and improves its performance automatically
• Accurate and efficient
• Free
Persian Corpus
1. TEP: Tehran English-Persian Parallel Corpus
First free English-Persian corpus
4 million tokens on each side
Sentence aligned
2. TMC: Tehran Monolingual Corpus
Largest freely available monolingual corpus for Persian language
Tokenized
Suitable for Language Modeling
3. Mutual Information
http://ece.ut.ac.ir/nlp/resources.html
Related Courses:
Introduction to Natural Language Processing, Dr. Heshaam Faili
Advanced Database Systems
Shahid Beheshti University
The Natural Language Processing research laboratory was founded by Dr. Mehrnoush Shamsfard at the beginning of 2006 in the computer engineering department of Shahid Beheshti University.
More than 25 members. More than 92 papers.
http://nlp.sbu.ac.ir/
Research Project
A. Developing Linguistic resources
Developing Semantic annotated corpus
Developing chunked corpus
Developing parallel corpus
Developing Persian Verbs database
Semi-automatic Lexicon Acquisition
Start : 2006
Researchers : Maliheh Monshizadeh, Elham Fekri
B. Fundamental Persian text processing tools
Standard Text Preparation for Persian
Stemmer /Morphological analyzer / lemmatizer
Tokenizer
POS Tagger
Spell checker
chunker
Syntax parser
Persian Named Entity Recognition - SBUNER
Persian Anaphora resolution
Semantic Role Labelling
Start : 2006
Researchers : Samira Noferesti, Rana Forsati, Pooneh Mortazavi, Hoda Sadat Jafari
C. NLP Applications
Machine translation - PenTrans project
English to Persian Translation System
Persian to English Translation System
Machine translation evaluation toolkit
Persian Text summarization – PARSUMIST
Question Answering (Persian, English) - SBUQA
Information Extraction - Mersad
Text understanding
Conversion between Persian sentences and first order logic
Text generation
Start : 2006
Researchers : Chakaveh Saedi, Yasaman Motazedi, Mostafa Nazari
D. Ontology engineering
Ontology development
Development of CMMI-ACQ ontology
Collaborative development of ontology of computer science and engineering (COMON)
Fuzzy ontologies
Ontology learning
Ontology learning from text
Ontology learning from web
Relation extraction
Ontology mapping
Evolutionary ontology matching
A Linguistic-Structural Approach to Bilingual Ontology Mapping
Ontology population and instantiation
Start : 2006
Researchers : Aynaz Taheri, Hakimeh Fadaei, Tara Akhavan, Rahim Dehkharghani, Valeh Montaghami, Bahareh Sarrafzadeh, Amir Sharifloo, Rana Forsati
E. Semantic Web
Semantic annotation of documents
Converting web documents into semantic web resources
Semantic search
Semantic web service discovery and composition
Start : 2006
Researchers : Bahareh Sarrafzadeh, Hoda Mirzaie, Maryam Haghollahi, Homan Farrokhzad
F. Hybrids
Application of fuzzy ontologies in qualitative reasoning
E-learning
Ontology-based Content Rearrangement for Intelligent Tutoring Systems - OCRITS
Intelligent Content Management
Start : 2006
Researchers : Hamzeh Motahari, Marzieh Shariati
Courseware
Ontology Engineering
Natural Language Processing
Semantic Web
Advanced Natural Language Processing, Fall 2005, by Regina Barzilay and Michael Collins
Columbia University, MIT
Tools
FarsNet: the first Persian WordNet
STeP-1: Standard Text Preparation for Persian
Tokenizer
Stemmer
POS tagger
Spell checker
Sharif University
Natural Language Processing
Web Intelligence Laboratory
Natural Language Processing
Dr. Ghasem Sani
Dr. Heshaam Faili
Since 2003 after three inactivity
Eliza
POS Tagger
Unsupervised Natural Grammar Induction
Web Intelligence Laboratory
Supervisor: Dr. Abolhasani
28 members
Advanced Research:
Semantic Search Engines
Semantic Web Services
Semantic Web for pervasive computing
Annotation
Semantic Grids
Social Network Analysis
Ontology Alignment and Learning
Web Clustering
Business Intelligence
New Research:
Composite Web Service Execution Framework
Tracking news to find hot topics
Semantic Programming
Trust model in Semantic Web
New models for recommender systems
Using the web to create a lecture for a subject
A Farsi framework for Information Retrieval
A semantic-based framework for business intelligence applications
Science & Technology University
Unknown laboratory, but: Online POS Tagger
In cooperation with the Arooz (prosody) project, supported by the Supreme Council of Information Dissemination
http://persianp.ir/index.php?option=com_wrapper&view=wrapper&Itemid=7
http://www.prosody.ir
Conferences
The Cross-Language Evaluation Forum (CLEF)
(i) developing an infrastructure for the testing, tuning and evaluation of information retrieval systems operating on European languages in both monolingual and cross-language contexts
(ii) creating test suites of reusable data which can be employed by system developers for benchmarking purposes
CLEF conferences have been held since 2000
CLEF2011 will be hosted by Amsterdam University
Computational Approaches to Arabic Script-based Languages (CAASL)
CAASL2011 will be held in Geneva
Corporation: Asr Gooyesh Pardaz
Extracting n-gram statistics for the Persian language
Extracting a grammar of the Persian language
Compiling a Persian lexicon
Extracting frequently used Persian words, broken down by topic
Projects under research:
Probabilistic models of single words, word pairs, word triples and four-word sequences for Persian and English
GPSG grammar rules for the Persian language
Probabilistic grammar
Suitable parsers
Language model
Word clustering methods
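The n-gram statistics and probabilistic word models listed above can be illustrated with a small sketch. It counts 1- to 4-grams over tokenized sentences and estimates P(word | history) by relative frequency. This is a generic illustration under stated assumptions, not the company's implementation: the function names and the toy sentences are hypothetical, and a practical language model would add smoothing for unseen n-grams.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Gram = Tuple[str, ...]


def ngrams(tokens: Sequence[str], n: int) -> List[Gram]:
    """Return all contiguous n-token windows of a sentence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(sentences: Sequence[Sequence[str]], max_n: int = 4):
    """Count n-grams for n = 1..max_n, and how often each history
    (the first n-1 tokens of an n-gram) is followed by some word."""
    ngram_counts: Dict[int, Counter] = {n: Counter() for n in range(1, max_n + 1)}
    history_counts: Dict[int, Counter] = {n: Counter() for n in range(1, max_n + 1)}
    for tokens in sentences:
        for n in range(1, max_n + 1):
            for gram in ngrams(tokens, n):
                ngram_counts[n][gram] += 1
                history_counts[n][gram[:-1]] += 1
    return ngram_counts, history_counts


def mle_probability(ngram_counts, history_counts, gram: Gram) -> float:
    """Relative-frequency estimate of P(last word | preceding words).
    For a unigram the history is empty, so this is count / total tokens."""
    n = len(gram)
    denominator = history_counts[n][gram[:-1]]
    return ngram_counts[n][gram] / denominator if denominator else 0.0


# Toy corpus with placeholder tokens (not real data).
sentences = [["a", "b", "c"], ["a", "b", "d"], ["a", "c"]]
ngram_counts, history_counts = count_ngrams(sentences)
p_b_given_a = mle_probability(ngram_counts, history_counts, ("a", "b"))
```

Counting histories separately keeps the estimates for a given history summing to one, which would not hold if the plain (n-1)-gram count were used as the denominator, because that count also includes sentence-final occurrences.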
we do ...