Gathering and Organizing System for PErsonal Language Skills - GOSPELS
G.O.S.PE.L.S.
Student: Enrico Zanardo
Supervisor: Prof. Vittore Casarosa
Free University of Bolzano-Bozen
8th October 2010
Goal
Provide users with appropriate documents based on their language skills in English, Italian and German, as assessed according to the guidelines of the European Language Portfolio (ELP).
Problems
1. Classify documents according to the "GOSPELS rating system" and match this rating to the levels of the European Language Portfolio (A1, A2, ..., C1, C2).
2. Determine the user's language skills in the three languages supported by the system (English, Italian and German).
3. Provide results in each of the three languages according to the user's skills in that language.
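Problems 2 and 3 can be illustrated with a minimal Python sketch. It assumes a hypothetical user profile that stores one ELP level per language, and a hypothetical filtering rule that keeps a document when its level does not exceed the user's level in the document's language. The names `Document`, `UserProfile` and `suitable_documents` are illustrative only, not part of GOSPELS.

```python
from dataclasses import dataclass, field

# ELP common reference levels, ordered from lowest to highest.
ELP_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
LEVEL_INDEX = {level: i for i, level in enumerate(ELP_LEVELS)}


@dataclass
class Document:
    url: str
    language: str  # e.g. "en", "it" or "de"
    level: str     # ELP level assigned by the classifier


@dataclass
class UserProfile:
    # Hypothetical structure: one ELP level per supported language.
    skills: dict[str, str] = field(default_factory=dict)


def suitable_documents(docs, profile):
    """Group documents by language, keeping only those whose level does not
    exceed the user's level in that language (an assumed matching rule)."""
    results = {}
    for doc in docs:
        user_level = profile.skills.get(doc.language)
        if user_level is None:
            continue  # no known skill in this language: skip the document
        if LEVEL_INDEX[doc.level] <= LEVEL_INDEX[user_level]:
            results.setdefault(doc.language, []).append(doc)
    return results
```

Grouping by language keeps the English, Italian and German results separate, as problem 3 requires; a stricter rule (for example, only documents at exactly the user's level) would change just the comparison.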
Solution to step 1 (classify documents)
Diagram: the GOSPELS algorithm computes the level of complexity of each document (Docs) from the frequency of the most common words and the part of speech of each word. Template documents are used to define the range of each language level.
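One way the ingredients in the diagram could combine is sketched below. The scoring rule is an assumption, not the GOSPELS formula: it selects content words by part of speech and reports the percentage of them that are not in a list of the most common words of the language. The coarse tags `NOUN`, `VERB`, `ADJ`, `ADV` and the function name `complexity_score` are hypothetical; TreeTagger's real tag sets differ per language.

```python
# Coarse part-of-speech tags treated as content words (an assumption;
# real TreeTagger tag sets are language-specific).
CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}


def complexity_score(tagged_tokens, common_words):
    """Hypothetical complexity score in the range 0..100.

    tagged_tokens: list of (word, pos) pairs produced by a PoS tagger.
    common_words: set of the most frequent (lower-case) words of the language.
    Returns the percentage of content words NOT among the common words,
    so a higher score means a harder document.
    """
    content = [word.lower() for word, pos in tagged_tokens if pos in CONTENT_POS]
    if not content:
        return 0.0  # no content words: treat the document as trivial
    unknown = sum(1 for word in content if word not in common_words)
    return 100.0 * unknown / len(content)
```

Filtering by part of speech keeps function words such as articles and conjunctions, which are frequent in every text, from diluting the score.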
Match between GOSPELS algorithm and ELP
Example Results
Figure (Italian): results of the GOSPELS algorithm for ELP levels A1, A2, B1, B2, C1 and C2. Series: rating, known words, words. Axis scales: 0 to 4500 and 0.00 to 40.00. Data labels shown: 12.66, 23.94, 25.51, 31.88, 34.09, 35.72.
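Matching algorithm scores to ELP levels through ranges derived from template documents could look like the sketch below. It assumes that the boundary between two adjacent levels is the midpoint of the mean scores of their template documents; this rule and the names `level_boundaries` and `classify` are hypothetical.

```python
from bisect import bisect_right
from statistics import mean

ELP_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def level_boundaries(template_scores):
    """Derive score boundaries between consecutive ELP levels.

    template_scores maps each ELP level to the scores of its template documents.
    Each boundary is the midpoint between the mean scores of two adjacent
    levels (an assumed rule, not necessarily the one used by GOSPELS).
    """
    means = [mean(template_scores[level]) for level in ELP_LEVELS]
    return [(low + high) / 2 for low, high in zip(means, means[1:])]


def classify(score, boundaries):
    """Return the ELP level whose range contains the score."""
    # bisect_right counts how many boundaries are <= score, which is exactly
    # the index of the level the score falls into.
    return ELP_LEVELS[bisect_right(boundaries, score)]
```

`bisect_right` needs the boundaries in increasing order, which holds only if the template means rise from A1 to C2; with overlapping template scores the ranges would need a different rule.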
Architecture diagram components:
● Internet: "unibz.org"
● CRAWLER: Apache Nutch 1.1
● LanguageLevel plug-in
● TreeTagger
● Wiktionary
● INDEXER / SEARCHER: Apache Solr 1.4, Apache Lucene
● USER Profile
● DB: PostgreSQL 8.4.4
● WEB-GUI: J2EE, on Apache Tomcat 6.0
● GOOGLE TRANSLATOR API
● Operating system: Arch Linux 2010.05
Prototype
Conclusions and possible extensions
● The prototype is stable and seems to work well.
● Further testing is required to improve and tune the algorithm
● Further testing is required to improve the matching with the ELP
● The architecture can easily support other languages
  ● It needs word frequency data for the new language
  ● It needs a PoS tagger for the new language
● The prototype can easily be modified to become an additional function of an existing digital library
  ● It has to be embedded in the indexer
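The requirements for adding a language can be expressed as a small registry, sketched here under stated assumptions: each language is registered with a set of its most common words and a tagger callable standing in for TreeTagger, and `annotate` returns hypothetical per-document fields (`words`, `known_words`) that an indexer hook might store. None of these names come from Nutch, Solr or the GOSPELS code.

```python
from dataclasses import dataclass
from typing import Callable

# A tagger turns raw text into (word, pos) pairs. In the prototype this role
# is played by TreeTagger; here it is any callable (assumption).
Tagger = Callable[[str], list[tuple[str, str]]]


@dataclass
class LanguageResources:
    common_words: set[str]  # most frequent words of the language
    tagger: Tagger          # part-of-speech tagger for the language


REGISTRY: dict[str, LanguageResources] = {}


def register_language(code: str, common_words: set[str], tagger: Tagger) -> None:
    """Adding a language only requires its word frequencies and a PoS tagger."""
    REGISTRY[code] = LanguageResources(set(common_words), tagger)


def supported_languages() -> list[str]:
    return sorted(REGISTRY)


def annotate(code: str, text: str) -> dict:
    """Hypothetical fields an indexer hook could store for each document."""
    resources = REGISTRY[code]  # KeyError if the language is not registered
    words = [word.lower() for word, _pos in resources.tagger(text)]
    known = sum(1 for word in words if word in resources.common_words)
    return {"language": code, "words": len(words), "known_words": known}
```

Keeping the language-specific resources behind one registry is what would let the same indexing code serve a new language without changes.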