Personal information Hung-Hsuan Chen 陳弘軒 PhD, Computer Science and Engineering, the...
Personal information
• Hung-Hsuan Chen 陳弘軒
• PhD, Computer Science and Engineering, the Pennsylvania State University (2008 – 2013)
• BS, MS, Computer Science, National Tsing Hua University (2000 – 2004, 2004 – 2006)
• Recent honors
  • Best paper award, College of Engineering, PSU (2013)
  • Highest F1 score, the Competition of Plagiarism detection, PAN (2013)
  • Invited to the Amazon PhD research symposium to present research work at Amazon; single-digit acceptance rate (2013)
  • Travel awards: SIGMOD 2013, ICHI 2013, SBP 2012
Data Science?
From data scientist Drew Conway: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Research interest in general: data analysis and mining
[Figure: publications arranged in a grid by data type and task.
Data Type: Link Only, Link + Content, Content Only.
Task: Ranking Function, Similarity Search, Link Prediction, Info Prop, P2P Trans., User Analysis, Text Analysis.
Venues shown: DBSocial’13, DMH’13, KDD’12, SAC’13, SBP’12, K-CAP’11, ASONAM’13, JCDL’13, D-Lib’12, JCDL’11 (twice), JCDL’10, WWW’10, MIR’10, ICPADS’05, CLEF’13, MobiDE’09, SIGCOMM’08, TKDD’14, TMIS’14, ECIR’14, AAAI’14, JCDL’14 (twice), WebSci’14, IAAI’14.]
Research experience
• RA, IST, Pennsylvania State University (2008 – now)
  • Large scale text mining + social network analysis
• RA, CSIE, National Taiwan University (08/2011 – 01/2012)
  • Information propagation analysis
• Software engineer intern, Google (05/2010 – 08/2010)
  • Recommender system + user log analysis
• RA, IIS, Academia Sinica (11/2007 – 07/2008)
  • User traffic analysis
• RA, CS, National Tsing Hua University (2004 – 2006)
  • Distributed data stream analysis
Selected recent research
CSSeer
• An open source expert recommender system based on a given digital library
  • Live site (based on CiteSeerX): http://csseer.ist.psu.edu
• The framework is shipped to Dow Chemical
  • Expert discovery based on internal technical reports
• Author disambiguation (by random forest)
  • Wen-Yi: 雯怡? 文溢?
  • “C. Giles” = “C. Lee Giles” = “Lee Giles” = “C. L. Giles”?
  • Google Scholar suffers from a similar problem
• Keyphrase extraction (by naïve Bayes)
• Expert ranking (by naïve Bayes)
How to rank experts?
• Rank authors, not documents
  • Text indexing and PageRank-like methods cannot be directly applied
• Ranking efficiency
  • Aggregating author scores on-the-fly is time consuming
  • Scores for unigrams are computed offline
• Ranking quality
  • What is the probability that a is an expert given a query term q?
  • $P(a \mid q) \propto \sum_{d} P(d)\,P(q \mid d)\,P(a \mid q, d) = \sum_{d} P(d)\,P(q \mid d)\,P(a \mid d)$, dropping the constant factor $1/P(q)$ and assuming a is independent of q given document d
[Figure: a current search engine applies a document ranking function to a query term and returns documents d1–d5 in ranked order (d3, d2, d5, d1, d4).]
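The expert-ranking score on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CSSeer implementation: P(d) is taken as uniform, P(q|d) as the relative frequency of the query term in the document, and P(a|d) as an equal share among the document's authors. The function name `rank_experts` and the toy corpus are hypothetical.

```python
from collections import defaultdict


def rank_experts(docs, query_term):
    """Score authors by sum_d P(d) * P(q|d) * P(a|d).

    docs: list of (authors, text) pairs.
    Assumptions (not from the slides): uniform P(d), unigram
    maximum-likelihood P(q|d), and P(a|d) = 1 / number of authors.
    """
    p_d = 1.0 / len(docs)
    scores = defaultdict(float)
    term = query_term.lower()
    for authors, text in docs:
        tokens = text.lower().split()
        if not tokens or not authors:
            continue
        p_q_given_d = tokens.count(term) / len(tokens)
        p_a_given_d = 1.0 / len(authors)
        for author in authors:
            scores[author] += p_d * p_q_given_d * p_a_given_d
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy corpus.
docs = [
    (["alice", "bob"], "graph mining and link prediction"),
    (["alice"], "link analysis for web search"),
    (["carol"], "protein folding simulation"),
]
ranking = rank_experts(docs, "link")
for author, score in ranking:
    print(f"{author}: {score:.4f}")
```

Because the per-document factors do not depend on the other query terms, the per-unigram author scores could be computed offline and only looked up at query time, which is the efficiency point made above.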
Comparison with other expert recommenders
Simulates Google Scholar
Ground truth obtained from: http://arnetminer.org/lab-datasets/expertfinding/
ASCOS
• Discovering similar objects in a network
• Symmetric vs. asymmetric similarity
  • Similarity can be asymmetric
    • Coauthoring behavior: a young researcher might be more interested in collaborating with a strong researcher than vice versa
  • Asymmetric property may reveal the hierarchical relationship between objects
    • Word association network: “fruit” should be the super-class of “banana” and “apple”, but a sub-class of “food”
• Link prediction
  • ASCOS better predicts future collaborations than SimRank and several other state-of-the-art link prediction algorithms
Intuition of ASCOS
• Similarity from i to j is dependent on the similarity score from i’s neighbors to j

$$s_{ij} = \begin{cases} \dfrac{c}{|N(i)|} \displaystyle\sum_{k \in N(i)} s_{kj} & \text{if } i \neq j \\ 1 & \text{otherwise} \end{cases}$$

  • N(i): the set of neighbors of node i
• Utilize all paths between nodes
• Asymmetric
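The recurrence for s_ij can be evaluated by simple fixed-point iteration. The sketch below is an illustration rather than the authors' code. It assumes an unweighted graph given as an adjacency dictionary, treats c as a constant strictly between 0 and 1 (0.9 here is an arbitrary choice), and uses a hypothetical star graph in which hub `h` connects to three leaves.

```python
def ascos(adj, c=0.9, max_iters=1000, tol=1e-12):
    """Iterate s_ij = c / |N(i)| * sum_{k in N(i)} s_kj for i != j, s_ii = 1.

    adj: dict mapping each node to a list of its neighbors.
    c, max_iters, and tol are illustrative assumptions.
    """
    nodes = list(adj)
    # Start from the identity: each node is only similar to itself.
    s = {i: {j: 1.0 if i == j else 0.0 for j in nodes} for i in nodes}
    for _ in range(max_iters):
        delta = 0.0
        new_s = {}
        for i in nodes:
            row = {}
            for j in nodes:
                if i == j:
                    row[j] = 1.0
                elif adj[i]:
                    row[j] = c / len(adj[i]) * sum(s[k][j] for k in adj[i])
                else:
                    row[j] = 0.0  # isolated node: no neighbors to draw from
                delta = max(delta, abs(row[j] - s[i][j]))
            new_s[i] = row
        s = new_s
        if delta < tol:
            break
    return s


# Hypothetical star graph: hub "h" with leaves "a", "b", "c".
graph = {"h": ["a", "b", "c"], "a": ["h"], "b": ["h"], "c": ["h"]}
scores = ascos(graph)
print(round(scores["a"]["h"], 4), round(scores["h"]["a"], 4))
```

On this toy graph the score from a leaf to the hub is larger than the score from the hub to the leaf, which illustrates the asymmetry.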
Hierarchical structure inference
The score difference between the neighbor words of “instrument” to the word “instrument” (node i)
The score difference between the neighbor words of “fruit” to the word “fruit” (node i)
Future research directions (and several ongoing projects)
Data science
• Hacking skills in handling big data
  • CiteSeerX, CSSeer, CollabSeer
    • 3 million+ documents
    • 1 million+ authors, 300K+ disambiguated authors
    • 3 billion+ log entries to analyze
  • Google ad logs
    • Several TB per day
• Knowledge in math and stats
  • Various data mining techniques
  • Social network analysis
  • Natural language processing
• Interdisciplinary collaboration
  • Collaborated with Dow Chemical, Alcatel-Lucent, etc.
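Data is changing the world
• WhosCall
  • Telephone number crowd-sourcing
  • Reverse lookup and number identification
• Waze
  • Map crowd-sourcing + Google Map info
  • Automatic road update + real time traffic update
• 零時政府 (g0v)
  • Taiwan particulate matter pollution map: already used in TV news weather reports
  • 萌典 (Moedict): Director Lin of the Ministry of Education: “As for applications, the question is no longer whether vendors can build them; these are applications we could not even have imagined.”
• And many others…
  • Netflix, Amazon, Walmart, State Farm, Spotify, Yelp, medical data used in hospitals, etc.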
Potential research projects: IR and DM on MOOC
• MOOC: Massive Open Online Course
• Mining keyphrases from slides + audio lectures
  • Slide texts may not be complete sentences
  • POS taggers may not work
  • Slides provide unique style clues for keyphrase extraction
  • Speaker’s voice, tone, and other features may provide other clues for keyphrase extraction
  • Combining the above heterogeneous perspectives to improve performance
• Automatic course topic clustering or classification
• Rely on students’ interactions with MOOC to predict their learning performance
  • Find talented students and slow learners as early as possible
  • Teach students according to their aptitude (因材施教)
Potential research projects: IR and DM on digital libraries
• Math equation retrieval
  • How to correctly parse equations (from PDF)?
  • How to index equations?
  • Query interface?
• Music score retrieval
  • How to parse music notes?
  • How to index music?
  • Query interface?
• Inferring “meaning” of figures
  • Retrieving x, y labels and the points in a figure could make a search engine more powerful
  • Sample query: what’s the performance of method Y when x = x1?
• “Artificial” paper detection
  • IEEE and Springer withdrew 120 papers
  • Fake papers affect user experience and scientometrics
Teaching
Teaching experience
• Guest lectures in classes
  • Information Retrieval and Search Engines (Spring 2011, Spring 2013, at PSU)
• TA
  • Operating Systems (Fall 2005, at NTHU)
• Guest speaker at various seminars (in addition to conference presentations)
  • Dept of CS, RIT – 2014
  • Amazon – 2013
  • Graduate Exhibitions, PSU – 2012, 2013
  • College of Engineering, PSU – 2013
  • Network Science Seminar, PSU – 2013
Teaching philosophy
• Incorporate research in teaching
  • Students understand the usefulness of what they’ve learned
  • Students understand what’s happening in science
  • Students may bring useful feedback or fresh ideas
• Learning by doing
  • Students may better understand these topics
  • Students usually feel more confident when they implement a concept by themselves
• Competition
  • Online competitions motivate students to think and work hard
Advanced courses I can offer
• Data mining (資料探勘)
  • Overview
  • Evaluation methods
  • Supervised
    • Classification
    • Regression
  • Unsupervised
    • Clustering
    • Density estimation
  • Applications of DM
  • Practical issues
  • Advanced techniques
• Information retrieval and search engines (資料擷取與蒐尋引擎)
  • Overview
  • Retrieval evaluation
  • Concept of documents
  • Text processing
  • Query models and indexing
  • Web crawling and robots.txt
  • Link analysis
Advanced courses I can offer
• Social network analysis (社群網路分析)
  • Overview
  • Graph theory
  • Properties of real social networks
  • Community detection
  • Link prediction
  • Information propagation
  • Heterogeneous social networks
• Large-scale text document analysis (大規模文字資料分析)
  • Word and document representation
  • Text mining pipeline
  • Fundamentals of NLP
  • Association rules
  • MapReduce framework
  • Keyphrase extraction
  • Duplicate detection
  • Recommender systems
Basic courses I can offer
• Linear algebra (線性代數)
  • Systems of linear equations
  • Vectors and matrices
  • Eigenvalues and eigenvectors
  • Determinants
  • When LA meets DM
    • Matrix decomposition vs recommender systems
    • PCA vs dimension reduction
    • Eigenvector vs PageRank
• Probability and statistics (機率與統計)
  • Intro to probability
  • Random variables and Bayes’ theorem
  • Discrete random variables and PMF
  • Continuous random variables and PDF
  • Joint probability distribution
  • Confidence interval
  • Hypothesis testing
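Other courses I am interested in offering
• Computer programming (計算機程式設計)
• Data structures (資料結構)
• Introduction to database systems (資料庫系統概論)
• Web programming (Web 程式設計)
• Open-source software development in practice (開源軟體開發實務)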