Web IR / NLP Group (WING) Architecture

Min-Yen Kan, School of Computing, National University of Singapore


Page 1: Web IR / NLP Group (WING) Architecture

Web IR / NLP Group (WING) Architecture

Min-Yen Kan

School of Computing

National University of Singapore

Page 2: Web IR / NLP Group (WING) Architecture


Projects

Funded
• CSIDM (CAS, China): Aobo, CSIDM Interns
• ForeCite (Expires Oct 2010): Kaz, Emma, Thang

Proposed
• Data Cleaning in the Cloud (UCI)
• Text Mining Clinical Articles (Duke-NUS / UCI)
  – Shreyasee, Justin,
• Text Mining Scientific Articles (Global Asia Institute)
• ForeCite2


Page 3: Web IR / NLP Group (WING) Architecture


Research Topics
• Yee Fan Tan – Record Matching in Digital Libraries
• Jin Zhao – Math Equation IR
• Jesse Gozali – Phototaking Behavior
• Ziheng Lin – Rhetorical Discourse Analysis
• Cong Duy Vu Hoang – Related Work Summarization
• Jun Ping Ng – Logic in Question Answering
• Aobo Wang – Crowdsourcing for Machine Translation
• Shihong Huang, Wai Hong Loh – Tooltip translator for Firefox
• Kazunari Sugiyama – Recommender Systems in Digital Libraries
• Minh Thang Luong – ForeCite
• Emma Thuy Dung Nguyen – ForeCite

Incoming Staff (4 UROP, 1 Intern):
• Shomir Wilson (Intern) – Mention Detection in Scientific Articles, w/ Jin
• Shawn Tan (UROP) – Continuing PARCELS, w/ Jesse
• Tamisa Huangsiri, Low Wee Hung (UROP) – CSIDM Firefox, w/ Aobo, Jun Ping
• Yipeng Huang (UROP) – Cloud Data Cleaning, w/ Yee Fan, Jin


[Slide labels: DL, IR/MM/HCI, NLP]

Page 4: Web IR / NLP Group (WING) Architecture


Responsibilities (to be discussed)

• Kaz: Non-CSIDM UROP guidance
• Yee Fan: None (Thesis Writing!!)
• Jin: RPNLPIR / Meeting and Room Bookings
• Ziheng: Publication Page / Joomla / Social
• Jesse: RoR / FC / CSX
• Aobo: RoR / Web System Admin
• Jun Ping: System Admin Lead


Page 5: Web IR / NLP Group (WING) Architecture


Cluster Architecture

Fixed IP
– CTE: RAID drive host, LDAP host, source code repository
– AYE: webserver, mailserver, mailman, virtual host on ECP

DHCP (.ddns.)
– ECP: LDAP backup
– PIE: compute server

Windows Server (.ddns)
– KPE
– KJE
– BKE
– SLE

Systems named after Singapore’s highways
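
The host layout above can be kept as a small lookup table, for example to answer "which machine hosts X?" from an admin script. A minimal sketch in Python: host names, groupings and roles are taken from the slide, while the `Host` record, the role strings and the `hosts_with_role` helper are illustrative assumptions. The slide gives no roles for the Windows servers, so those are left empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """One cluster machine (hypothetical record type, not from the slides)."""
    name: str
    group: str            # grouping as shown on the slide
    roles: tuple = ()     # role labels are our own wording
    virtual_on: str = ""  # physical host, if this machine is a virtual host


CLUSTER = [
    Host("cte", "fixed-ip", ("raid", "ldap", "source-repo")),
    Host("aye", "fixed-ip", ("webserver", "mailserver", "mailman"), virtual_on="ecp"),
    Host("ecp", "dhcp", ("ldap-backup",)),
    Host("pie", "dhcp", ("compute",)),
    Host("kpe", "windows-server"),
    Host("kje", "windows-server"),
    Host("bke", "windows-server"),
    Host("sle", "windows-server"),
]


def hosts_with_role(role, cluster=CLUSTER):
    """Return the names of all hosts that carry the given role label."""
    return [h.name for h in cluster if role in h.roles]


def hosts_in_group(group, cluster=CLUSTER):
    """Return the names of all hosts in one of the slide's groupings."""
    return [h.name for h in cluster if h.group == group]
```

For example, `hosts_with_role("ldap")` returns `["cte"]` and `hosts_in_group("dhcp")` returns `["ecp", "pie"]`.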

Page 6: Web IR / NLP Group (WING) Architecture


OS support

All *nix group machines run CentOS 5
• stable enterprise Linux distribution
• all mount cte’s RAID drive, plus other automounts

Future
• use rsync to sync all binaries across machines
• expand RAID to encompass disks on different machines for more space (more SAN-like)
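
The planned rsync step could look roughly like the sketch below. It only builds and runs a standard `rsync -a --delete` command per host. The source directory `/mnt/rpnlpir-Linux` is the binaries mount named on the RAID slide; the destination `/opt/wing-bin` and the function names are assumptions made for illustration, and the default is a dry run so nothing is changed until that is switched off.

```python
import shlex
import subprocess


def rsync_command(src_dir, host, dest_dir, dry_run=True):
    """Build an rsync invocation that mirrors src_dir onto host:dest_dir."""
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without applying it
    # A trailing slash on the source copies the directory's contents,
    # not the directory itself.
    cmd += [src_dir.rstrip("/") + "/", f"{host}:{dest_dir}"]
    return cmd


def sync_binaries(hosts, src_dir, dest_dir, dry_run=True):
    """Run the mirror command once per target host, stopping on the first error."""
    for host in hosts:
        cmd = rsync_command(src_dir, host, dest_dir, dry_run=dry_run)
        print(shlex.join(cmd))
        subprocess.run(cmd, check=True)


# Example call (hypothetical destination path):
# sync_binaries(["ecp", "pie"], "/mnt/rpnlpir-Linux", "/opt/wing-bin")
```

Note that `--delete` removes files on the target that no longer exist in the source, which is why the dry run is the default here.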

Page 7: Web IR / NLP Group (WING) Architecture


RAID setup
• Currently 5.0 TB in RAID 5?
• ext3, mounted on cte
  – /mnt/homes: home directories
  – /mnt/rpnlpir-indep: machine-independent data (datasets)
  – /mnt/rpnlpir-Linux: binaries
  – /mnt/rpnlpir-Windows: binaries

Future
• DB server coming online for Rails applications
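
Since every *nix machine depends on these mounts, a quick check that they are present can be useful after a reboot. A minimal sketch, assuming the four mount points listed above; the `missing_mounts` helper and its injectable `is_mount` parameter are illustrative, not part of the group's tooling.

```python
import os

# Mount points and descriptions as listed on the slide.
EXPECTED_MOUNTS = {
    "/mnt/homes": "home directories",
    "/mnt/rpnlpir-indep": "machine-independent data (datasets)",
    "/mnt/rpnlpir-Linux": "binaries",
    "/mnt/rpnlpir-Windows": "binaries",
}


def missing_mounts(expected=EXPECTED_MOUNTS, is_mount=os.path.ismount):
    """Return the expected mount points that are not currently mounted."""
    return [path for path in expected if not is_mount(path)]


if __name__ == "__main__":
    for path in missing_mounts():
        print(f"NOT MOUNTED: {path} ({EXPECTED_MOUNTS[path]})")
```

The check is passed in as a parameter so the function can be exercised on a machine that has none of these mounts.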

Page 8: Web IR / NLP Group (WING) Architecture


Webserver (aye.comp.nus.edu.sg)

• Apache
• Virtual hosts (wing.comp, linc.comp, opac.comp)
• Hosts Tomcat for Java servlets
• Hosts gmond (Ganglia monitor)
• Runs webalizer for stats
• Hosts Ruby on Rails apps (Trung’s myror script; to be deprecated soon)
• Hosts web service server (router for web service calls)

Page 9: Web IR / NLP Group (WING) Architecture


Web Services

• Our infrastructure is tuned to deliver many services and demos as web services.
• External calls go to port 4000
• List of web services at http://wing.comp.nus.edu.sg/~forecite/
• Calls are handled by the WebServiceServer (WSS) Ruby code.
• Directory for web services is currently /home/forecite/services/
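
The routing idea can be illustrated with a small sketch. The real WSS is Ruby code whose interface is not shown on the slide, so everything below is an assumption: it supposes one subdirectory per service under the services directory and request paths of the form `/<service>/...`. The service name `demo` is hypothetical.

```python
from pathlib import PurePosixPath

SERVICES_ROOT = PurePosixPath("/home/forecite/services")  # from the slide
WSS_PORT = 4000                                           # from the slide


def resolve_service(request_path, known_services):
    """Map a request path such as '/demo/run' to that service's directory.

    Returns None when the path is empty or names a service that is not
    registered, which also rejects attempts such as '/../etc'.
    """
    parts = [p for p in request_path.split("/") if p]
    if not parts or parts[0] not in known_services:
        return None
    return SERVICES_ROOT / parts[0]


# Example with a hypothetical registered service:
# resolve_service("/demo/run", {"demo"}) gives /home/forecite/services/demo
```

Looking the name up in an explicit set of registered services, rather than joining the raw path onto the directory, keeps unknown or malformed requests from reaching the file system.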


Page 10: Web IR / NLP Group (WING) Architecture


Joomla

• For our website
• Administration by admin@wing, PhD students

Customizations
• Forum integration (phpBB)
  – Forum has contact information for all staff
  – Forum user DB not yet synced with shadow passwords in LDAP
• RPNLPIR (resource list)
• Blog

Page 11: Web IR / NLP Group (WING) Architecture


Mailing List

• mailman runs on aye
• lists also run on wing (alias for aye)
• both local and international mailing lists are hosted here

Page 12: Web IR / NLP Group (WING) Architecture


LDAP

• Keeps logins/uids/gids synced
• Main server on cte
• Backup on aye
• Needs to be robust in case the LDAP server fails
• Local root must be maintained on all machines
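
The robustness requirement amounts to trying the main server first and falling back to the backup. A minimal sketch of that order, assuming the short host names `cte` and `aye` resolve on the group network and that LDAP listens on the standard port 389 (the slide does not state the port). It only tests that the TCP port accepts connections; it does not perform an LDAP bind.

```python
import socket

LDAP_SERVERS = ["cte", "aye"]  # main server, then backup, as on the slide
LDAP_PORT = 389                # standard LDAP port; an assumption here


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name lookup failed
        return False


def first_reachable(servers=LDAP_SERVERS, port=LDAP_PORT, probe=port_open):
    """Return the first server that answers, or None if none do."""
    for host in servers:
        if probe(host, port):
            return host
    return None


if __name__ == "__main__":
    server = first_reachable()
    print(server or "no LDAP server reachable; fall back to local root")
```

The `None` case is the situation the last bullet prepares for: with no directory server reachable, only the local root account can log in.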

Page 13: Web IR / NLP Group (WING) Architecture


RPNLPIR (Research Project for NLP / IR)

• Common team account
• Keeps software repository mirrored by web page listing
• Keeps CVS repo in ~/CVSDir
• Keeps git repo in ~/repo
• Accessible to all group members