Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis
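![Page 1: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/1.jpg)
Search Engine Evaluation and User Behavior Analysis
Peng Tao, 2012-01-07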
![Page 2: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/2.jpg)
Who am I
- Peng Tao
- [email protected]
- Baidu, web search department
  - User behavior analysis team
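![Page 3: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/3.jpg)
- Introduction: search engine evaluation
- PM review
- Crowdsourcing
- AB testing
- Summary
- Appendix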
![Page 4: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/4.jpg)
1. Search engine evaluation
- Search became the number one internet application in 2010 (81.9%)
  - CNNIC, Statistical Report on Internet Development in China, 2011
- Google effects on memory (Sparrow, 2011)
  - The Internet has become a primary form of external or transactive memory, where information is stored collectively outside ourselves.
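![Page 5: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/5.jpg)
1. Search engine evaluation
- Search engine evaluation
  - Query and url pairs
- Metrics
  - MAP
  - DCG
  - nDCG
  - ERR
  - …
- Worked example: six results rated 2, 3, 1, 2, 2, 2, each weighted by 1/rank

$$\frac{1}{1}\times 2 + \frac{1}{2}\times 3 + \frac{1}{3}\times 1 + \frac{1}{4}\times 2 + \frac{1}{5}\times 2 + \frac{1}{6}\times 2 = 5.0667$$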
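The worked example on this slide (ratings 2, 3, 1, 2, 2, 2, each multiplied by 1/rank, total 5.0667) can be reproduced with a short sketch. The function names are hypothetical, and the 1/rank discount is taken from the slide's arithmetic; the more common textbook form of DCG divides by log2(rank + 1) instead. The normalized variant here is an assumption about how nDCG would be built on top of the same discount.

```python
from typing import Sequence


def dcg_reciprocal_rank(ratings: Sequence[float]) -> float:
    """Sum of rating / rank, with ranks starting at 1 (the slide's 1/rank discount)."""
    return sum(rating / rank for rank, rating in enumerate(ratings, start=1))


def ndcg_reciprocal_rank(ratings: Sequence[float]) -> float:
    """DCG divided by the DCG of the same ratings sorted best-first."""
    ideal = dcg_reciprocal_rank(sorted(ratings, reverse=True))
    return dcg_reciprocal_rank(ratings) / ideal if ideal > 0 else 0.0


# Ratings in ranked order, as read off the slide.
slide_ratings = [2, 3, 1, 2, 2, 2]
print(round(dcg_reciprocal_rank(slide_ratings), 4))  # 5.0667
print(ndcg_reciprocal_rank(slide_ratings))
```

Normalizing by the ideal ordering makes scores comparable across queries that have different numbers of good results, which is the usual reason for reporting nDCG next to DCG.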
![Page 7: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/7.jpg)
2. PM review
- Side-by-side evaluation
![Page 8: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/8.jpg)
2. PM review
- Manual evaluation of a search engine change
  - 10000: sample 10000 queries from the log
  - 1000: of the 10000 queries, 1000 show a diff
  - 100: of the 1000 diffs, the PM picks 100 for manual review
    - 30 (good) : 50 (same) : 20 (bad)
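One way to read the 30 (good) : 50 (same) : 20 (bad) outcome of a 100-query PM review is with an exact sign test. The sketch below is an illustration only, not a method the talk states: it drops the "same" judgments and asks how surprising 30 wins against 20 losses would be if the change had no effect. The function name is hypothetical.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """Two-sided exact sign test; ties ("same") are dropped before calling this."""
    n = wins + losses
    k = max(wins, losses)
    # Upper tail P(X >= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


good, same, bad = 30, 50, 20
net_win_rate = (good - bad) / (good + same + bad)
p_value = sign_test_p_value(good, bad)
print(f"net win rate = {net_win_rate:.2f}, p = {p_value:.3f}")
```

With only 100 reviewed diffs, a ten-point net win does not reach the conventional 0.05 threshold, which is one reason a larger pool of judgments is attractive.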
![Page 9: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/9.jpg)
2. PM review
- Limits of PM review
  - Quantity: Baidu's daily search volume vs. the number of queries a PM evaluates
  - Time
![Page 11: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/11.jpg)
3. Crowdsourcing
- Crowdsourcing, with judgments made by evaluators
- WSE: Baidu's search evaluation platform
![Page 12: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/12.jpg)
3. Crowdsourcing
- WSE platform: how an evaluator works
![Page 13: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/13.jpg)
3. Crowdsourcing
- Crowdsourcing, Lesson 1
![Page 14: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/14.jpg)
3. Crowdsourcing
- Crowdsourcing, Lesson 2
![Page 15: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/15.jpg)
3. Crowdsourcing
- Crowdsourcing, Lesson 3
  - Economics
  - The evaluator
![Page 16: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/16.jpg)
3. Crowdsourcing
- WSE today
  - Hundreds of evaluators
- More crowdsourcing examples
  - reCaptcha
  - Amazon Mechanical Turk
  - ESP Game
  - Human computation
![Page 18: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/18.jpg)
4. AB testing
- AB testing, Bucket testing
  - Use real users to verify the effect of a change directly
[Diagram: 100% of traffic split into 50% and 50%]
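The 100% to 50% / 50% split on this slide is commonly implemented by hashing a stable visitor identifier, so that the same visitor always sees the same variant. The sketch below is a minimal illustration under that assumption; the function name, the `cookie_id` and `experiment` parameters, and the use of SHA-256 are all hypothetical choices, not details from the talk.

```python
import hashlib


def assign_bucket(cookie_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a visitor to bucket "A" (control) or "B" (treatment)."""
    digest = hashlib.sha256(f"{experiment}:{cookie_id}".encode("utf-8")).hexdigest()
    # First 8 hex digits give an integer in [0, 2**32); scale it to [0, 1).
    point = int(digest[:8], 16) / 2 ** 32
    return "B" if point < treatment_share else "A"


# Simulated visitors; the identifiers are made up for the demonstration.
counts = {"A": 0, "B": 0}
for i in range(10000):
    counts[assign_bucket(f"user-{i}", "new-ranker")] += 1
print(counts)
```

Including the experiment name in the hashed string keeps the split for one experiment from lining up with the split for another.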
![Page 19: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/19.jpg)
4. AB testing
- Why use AB testing metrics?
![Page 20: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/20.jpg)
4. AB testing
- Why use AB testing metrics?
![Page 21: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/21.jpg)
4. AB testing
- AB testing workflow
  - Traffic splitting
  - Log analysis
![Page 22: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/22.jpg)
4. AB testing
- Log analysis for AB testing, by data size
  - 1T raw data: cubeproducer, disql, hadoop
  - 1G olap data: infobright, mondrian
  - 1M data: ABreport
![Page 23: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/23.jpg)
4. AB testing
- Overall Evaluation Criteria (Crook, 2009)
- Queryrank
![Page 24: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/24.jpg)
4. AB testing
- AA test
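The AA test named on this slide runs the same configuration in two buckets, so any "difference" between them measures noise rather than a real effect. The sketch below shows one common way to check this, a pooled two-proportion z-test; the test choice, the function name, and all counts are assumptions made up for illustration, not figures from the talk.

```python
from math import erf, sqrt


def two_proportion_z_test(clicks_a: int, n_a: int, clicks_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference between two click rates."""
    rate_a, rate_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF via erf, then the two-sided tail probability.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# AA test: both buckets run the same system (illustrative counts).
z_aa, p_aa = two_proportion_z_test(5000, 10000, 5050, 10000)
# AB test: bucket B runs the changed system (illustrative counts).
z_ab, p_ab = two_proportion_z_test(5000, 10000, 5300, 10000)
print(f"AA: z = {z_aa:.2f}, p = {p_aa:.3f}")
print(f"AB: z = {z_ab:.2f}, p = {p_ab:.6f}")
```

If AA comparisons come out "significant" much more often than the chosen threshold allows, the bucketing or the logging is suspect, and AB results from the same setup should not be trusted.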
![Page 25: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/25.jpg)
4. AB testing
![Page 26: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/26.jpg)
4. AB testing
- Traffic
  - 50% vs. 50%
[Diagram: Baidu traffic divided into buckets B1 to B4, a1 to a2, i1 to i4, d1 to d3, u1 to u3]
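The diagram labels on this slide (B1 to B4, a1 to a2, i1 to i4, d1 to d3, u1 to u3) look like several groups of buckets carved out of the same Baidu traffic. One plausible reading, stated here as an assumption and not as the talk's design, is a layered setup in which each visitor receives one bucket per group, so several experiments can share the same traffic. The layer keys, function names, and hashing scheme below are hypothetical.

```python
import hashlib

# Bucket names copied from the slide; grouping them into layers is an assumption.
LAYERS = {
    "B": ["B1", "B2", "B3", "B4"],
    "a": ["a1", "a2"],
    "i": ["i1", "i2", "i3", "i4"],
    "d": ["d1", "d2", "d3"],
    "u": ["u1", "u2", "u3"],
}


def bucket_in_layer(cookie_id: str, layer: str) -> str:
    """Pick one bucket inside a layer, using the layer name as a hash salt."""
    buckets = LAYERS[layer]
    digest = hashlib.sha256(f"{layer}:{cookie_id}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(buckets)
    return buckets[index]


def assign_all_layers(cookie_id: str) -> dict:
    """Give a visitor exactly one bucket in every layer."""
    return {layer: bucket_in_layer(cookie_id, layer) for layer in LAYERS}


assignment = assign_all_layers("user-42")
print(assignment)
```

Salting the hash with the layer name makes a visitor's bucket in one layer effectively unrelated to the bucket in another, so an experiment in one layer is spread evenly across the buckets of the others.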
![Page 28: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/28.jpg)
Summary
- Search engine evaluation
  - Metric: DCG
- Baidu's practice, in three eras
  - PM review
  - Crowdsourcing
  - AB testing
![Page 29: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/29.jpg)
Summary
- Manual evaluation vs. AB testing
![Page 30: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/30.jpg)
![Page 32: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/32.jpg)
Appendix 1: the WSE platform
![Page 33: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/33.jpg)
Appendix 2
[Diagram labels: BWS, M1, M2, M10, N2, Sid=1001, Cookie, User log, internal log]
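![Page 34: Issue 22. Baidu, Peng Tao: Search Engine Evaluation and User Behavior Analysis](https://reader033.fdocuments.net/reader033/viewer/2022061502/5584f9cfd8b42ae71b8b4744/html5/thumbnails/34.jpg)
Follow us: t.baidu-tech.com
Slides download and details: infoq.com/cn/zones/baidu-salon
Planned, organized, and run by InfoQ
Follow us: weibo.com/infoqchina
"Imagine, exchange, debate, gather" is the motto of the Baidu Technology Salon. The Baidu Technology Salon is an offline technical exchange event organized regularly by Baidu and InfoQ China. Its aim is to give mid-level and senior engineers a relatively free platform for exchanging ideas and meeting peers. Each session has two key parts, speaker talks and OpenSpace, and focuses on a single topic. The speaker talks and live Q&A let attendees learn about advanced engineering practice at Baidu and other well-known websites. The OpenSpace part extends and develops the session's theme and provides a platform for free discussion: any participant can raise a topic related to the session's theme and start a discussion.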