Web IR / NLP Group (WING) Architecture
Min-Yen Kan
School of Computing
National University of Singapore
Projects
Funded
• CSIDM (CAS, China): Aobo, CSIDM Interns
• ForeCite (Expires Oct 2010): Kaz, Emma, Thang
Proposed
• Data Cleaning in the Cloud (UCI)
• Text Mining Clinical Articles (Duke-NUS / UCI)
– Shreyasee, Justin,
• Text Mining Scientific Articles (Global Asia Institute)
• ForeCite2
Research Topics
• Yee Fan Tan – Record Matching in Digital Libraries
• Jin Zhao – Math Equation IR
• Jesse Gozali – Phototaking Behavior
• Ziheng Lin – Rhetorical Discourse Analysis
• Cong Duy Vu Hoang – Related Work Summarization
• Jun Ping Ng – Logic in Question Answering
• Aobo Wang – Crowdsourcing for Machine Translation
• Shihong Huang, Wai Hong Loh – Tooltip translator for Firefox
• Kazunari Sugiyama – Recommender Systems in Digital Libraries
• Minh Thang Luong – ForeCite
• Emma Thuy Dung Nguyen – ForeCite
Incoming Staff (4 UROP, 1 Intern):
• Shomir Wilson (Intern) – Mention Detection in Scientific Articles, w/ Jin
• Shawn Tan (UROP) – Continuing PARCELS, w/ Jesse
• Tamisa Huangsiri, Low Wee Hung (UROP) – CSIDM Firefox, w/ Aobo, Jun Ping
• Yipeng Huang (UROP) – Cloud Data Cleaning, w/ Yee Fan, Jin
[Diagram labels: DL, IR/MM/HCI, NLP]
Responsibilities (to be discussed)
• Kaz: Non-CSIDM UROP guidance
• Yee Fan: None (Thesis Writing!!)
• Jin: RPNLPIR / Meeting and Room Bookings
• Ziheng: Publication Page / Joomla / Social
• Jesse: RoR / FC / CSX
• Aobo: RoR / Web System Admin
• Jun Ping: System Admin Lead
Cluster Architecture
Fixed IP
– CTE – RAID drive host, LDAP host, source code repository
– AYE – webserver, mailserver, mailman, virtual host on ECP
DHCP (.ddns.)
– ECP – LDAP backup
– PIE – compute server
Windows Server (.ddns)
– KPE
– KJE
– BKE
– SLE
Systems named after Singapore’s highways
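The host list above could be kept as a small machine-readable inventory so that scripts can look up which machine plays which role. The following is only a sketch: the role labels, the `Host` type, and the `hosts_with_role` helper are hypothetical names chosen for illustration, and the "virtual host on ECP" note for AYE is left out because its exact meaning is not stated on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str        # short hostname as given on the slide
    addressing: str  # "fixed" or "dhcp" (DHCP hosts appear under .ddns)
    roles: tuple     # hypothetical role labels, chosen for this sketch


CLUSTER = [
    Host("cte", "fixed", ("raid", "ldap", "source-repo")),
    Host("aye", "fixed", ("webserver", "mailserver", "mailman")),
    Host("ecp", "dhcp", ("ldap-backup",)),
    Host("pie", "dhcp", ("compute",)),
    # Windows servers; assumed to be DHCP because they are listed under .ddns
    Host("kpe", "dhcp", ("windows",)),
    Host("kje", "dhcp", ("windows",)),
    Host("bke", "dhcp", ("windows",)),
    Host("sle", "dhcp", ("windows",)),
]


def hosts_with_role(role, cluster=CLUSTER):
    """Return the names of all hosts that carry the given role label."""
    return [h.name for h in cluster if role in h.roles]
```

For example, `hosts_with_role("ldap")` looks up the main LDAP host, and `hosts_with_role("windows")` lists the Windows servers.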
OS support
All *nix group machines run CentOS 5
• stable enterprise Linux distribution
• all mount cte’s RAID drive, plus other automounts
Future
• use rsync to sync all binaries across machines
• expand RAID to encompass disks over different machines for more space (more SAN-like)
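The planned rsync step could look roughly like the sketch below. It only builds the command lines and defaults to a dry run. The source and destination directories are placeholders, and the helper names `build_rsync_commands` and `run_all` are hypothetical; the slides do not say which directories would be synced or how.

```python
import subprocess


def build_rsync_commands(src, hosts, dest, dry_run=True):
    """Build one rsync command per target host (nothing is executed here).

    A trailing slash on the source makes rsync copy the directory's
    contents rather than the directory itself.
    """
    src = src.rstrip("/") + "/"
    commands = []
    for host in hosts:
        cmd = ["rsync", "-a", "--delete"]  # archive mode; remove stale files
        if dry_run:
            cmd.append("--dry-run")        # report changes without applying
        cmd += [src, f"{host}:{dest}"]
        commands.append(cmd)
    return commands


def run_all(commands):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Because `--delete` removes files on the target that are missing from the source, reviewing the dry-run output before calling `run_all` with `dry_run=False` commands is the safer order of operations.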
RAID setup
• Currently 5.0 TB in RAID 5?
• ext3, mounted on cte
– /mnt/homes – home directories
– /mnt/rpnlpir-indep – machine-independent data (datasets)
– /mnt/rpnlpir-Linux – binaries
– /mnt/rpnlpir-Windows – binaries
Future
• DB server coming online for Rails applications
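A quick way to confirm that a machine sees the RAID mount points listed above is sketched below. The mount paths come from the slide; the `missing_mounts` helper is a hypothetical name, and whether every machine is expected to mount all four paths is an assumption.

```python
import os

# Mount points as listed on the slide.
EXPECTED_MOUNTS = [
    "/mnt/homes",
    "/mnt/rpnlpir-indep",
    "/mnt/rpnlpir-Linux",
    "/mnt/rpnlpir-Windows",
]


def missing_mounts(paths=EXPECTED_MOUNTS, is_mount=os.path.ismount):
    """Return the paths that are not currently mount points.

    The is_mount callable is injectable so the check can be exercised
    without the real filesystem.
    """
    return [p for p in paths if not is_mount(p)]
```

Calling `missing_mounts()` with no arguments checks the local machine; an empty list means all expected mounts are present.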
Webserver (aye.comp.nus.edu.sg)
• Apache
• Virtual hosts (wing.comp, linc.comp, opac.comp)
• Hosts Tomcat for Java servlets
• Hosts gmond (Ganglia monitor)
• Runs webalizer for stats
• Hosts Ruby on Rails apps (Trung’s myror script; to be deprecated soon)
• Hosts web service server (router for web service calls)
Web Services
• Our infrastructure is tuned to expose many services and demos as web services.
• External calls go to port 4000.
• List of web services at http://wing.comp.nus.edu.sg/~forecite/
• Calls are handled by the WebServiceServer (WSS) Ruby code.
• Directory for web services currently at /home/forecite/services/
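From a client's point of view, a call to one of these services might be addressed as in the sketch below. Only the port (4000) and the hostname come from the slides. The URL layout (service name as the path, parameters as a query string), the `service_url` helper, and the service name `example_service` are all assumptions for illustration; the actual request format understood by WSS is not described here.

```python
from urllib.parse import quote, urlencode


def service_url(service, params=None, host="wing.comp.nus.edu.sg", port=4000):
    """Build a URL for a web service call (hypothetical URL scheme).

    Parameters are sorted by key so the resulting URL is deterministic.
    """
    url = f"http://{host}:{port}/{quote(service)}"
    if params:
        url += "?" + urlencode(sorted(params.items()))
    return url


# Hypothetical service name and parameter, for illustration only.
example = service_url("example_service", {"q": "digital libraries"})
```

Keeping URL construction in one helper means a change of port or host only has to be made in one place.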
Joomla
• For our website
• Administration by admin@wing, PhD students
Customizations
• Forum integration (phpBB)
– Forum has contact information for all staff
– Forum userdb not yet synched with shadow passwords in LDAP
• RPNLPIR (resource list)
• Blog
Mailing List
• Mailman runs on aye
• lists also run on wing (alias for aye)
• both local and international mailing lists are hosted here
LDAP
• To keep logins/uids/gids synched
• Main server on cte
• Backup on aye
• Needs to be robust in case of failure of LDAP server
• Local root for all machines must be maintained
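The robustness requirement above amounts to trying the main server first and falling back to the backup. A minimal sketch of that selection logic follows. The helper names `tcp_probe` and `first_reachable` are hypothetical, the use of the standard LDAP port 389 is an assumption about this setup, and a plain TCP connect only shows that the port is open, not that the directory is answering queries correctly.

```python
import socket


def tcp_probe(host, port=389, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(servers, probe=tcp_probe):
    """Return the first server that answers the probe, or None.

    Servers are tried in the order given, so the main server should
    come before the backup.
    """
    for host in servers:
        if probe(host):
            return host
    return None
```

With the hosts from the slide, `first_reachable(["cte", "aye"])` would prefer cte and fall back to aye. If it returns `None`, no directory server is reachable and only the local root account remains usable, which is why that account has to be maintained on every machine.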
RPNLPIR (Research Project for NLP / IR)
• Common team account
• Keeps a software repository, mirrored by a web page listing
• Keeps CVS repo in ~/CVSDir
• Keeps git repo in ~/repo
• Accessible to all group members