Search Engine Dependency Conference

SEARCH ENGINE DEPENDENCY AND ITS INFLUENCE ON DATA QUALITY By Ronan CHARDONNEAU

description

Conference slides about search engine dependency and its influence on data quality

Transcript of Search Engine Dependency Conference

Page 1: Search Engine Dependency Conference

SEARCH ENGINE DEPENDENCY AND ITS INFLUENCE ON DATA QUALITY

By Ronan CHARDONNEAU

Page 2: Search Engine Dependency Conference

Index

I - Introduction to the world of search engines
II - Risks of search engine dependency
III - How to solve the equation?
IV - Future of Google and information research
V - Conclusion

I-Introduction II-Risks III-Solutions IV-Future V-Conclusion

Page 3: Search Engine Dependency Conference

The World of Search engines

Page 4: Search Engine Dependency Conference

Market configuration

Top 10 search websites in the world for August 2007

Target: users aged 15 and over, at home and at work. Source: comScore qSearch 2.0

Page 5: Search Engine Dependency Conference

Leaders per country

Source: map based on data from Alexa, the Web Information Company (2008).

Page 6: Search Engine Dependency Conference

A win or lose market

Page 7: Search Engine Dependency Conference

Approximate share of content available on the Internet, by language

Source: Internet World Stats

Page 8: Search Engine Dependency Conference

What has already been proved

• Studies show that the Internet is the main source of information (at least in Europe and America);
• When surfing the Internet, search engines are the most used websites;
• People trust search engine results;
• When searching on the Internet, people mainly use one single search engine.

Page 9: Search Engine Dependency Conference

Brief summary

• Google is the market leader; its followers are far behind;
• Eight leading search engines, and probably eight "continents" on the Internet;
• A market defined by the adoption of standards (<50%) for search;
• Content is mainly in English; Chinese is important; there is quality content in Japanese, German and Korean;
• Internet users cannot live without search engines and are loyal to a specific one.

Page 10: Search Engine Dependency Conference

Risks of search engine dependency and its influence on data quality

Page 11: Search Engine Dependency Conference

Definition

The behaviour of not questioning the results returned by one single search engine.

It normally starts when you hear sentences such as:

- "Why should I bother using other search engines when I find everything I want with Google?"
- "Do I really run any risk when I use Google?"
- Google ranks in the top 100 websites of every country in the world;
- Google has been recognized as the most powerful brand.

Page 12: Search Engine Dependency Conference

• Who is Google? Well... it is our friend;
• We can carry it everywhere; it is relevant and convenient (quick display, associated services);
• But:
– You have to know how to deal with it;
– You have to know its limits;
– You have to know its potential.

Page 13: Search Engine Dependency Conference

Consequences

• If you don't know how to deal with it:
- You will never use its true capacities;
- You will probably take the first information that is displayed;
• If you don't know its limits:
- When you cannot find the information, you may think that it does not exist;
- You may even think that the technology does not exist elsewhere;
• If you don't know its potential:
- You will not improve at performing searches.

Page 14: Search Engine Dependency Conference

Advertisement

• The economic model of search engines is based on advertising (99% of Google's revenue comes from it);
• However, studies show that some categories of adults (non-Internet generations) do not distinguish between commercial and non-commercial links;
• Some search engines are more commercial than others;
• The better you know a search engine (Google), the better you can practise Search Engine Optimization.

Page 15: Search Engine Dependency Conference

Google is not an isolated case

• Baidu dependency in China and Yandex dependency in Russia;
• Seznam dependency in the Czech Republic;
• Naver dependency in South Korea;
• Yahoo dependency in Japan and many other Asian countries.

Page 16: Search Engine Dependency Conference

Brief summary

• Search engine dependency is comfortable and therefore understandable;
• But for many reasons it favours mass-consumption information (blog phenomenon, advertising…), which is not the best kind;
• In our countries it is Google dependency, but keep in mind that Europe and the Americas are not the center of the world.

Page 17: Search Engine Dependency Conference

How to solve the equation?

Page 18: Search Engine Dependency Conference

First point

• If an answer exists... we should look for it;
• At the moment there is no miracle solution for lazy search;
• But there are ways to get closer to the answer.

Page 19: Search Engine Dependency Conference

Three pillars

• Learn how to use the technology
• Breaking the habits
• Technological awareness

Page 20: Search Engine Dependency Conference

Concrete case: Google

Learn how to use the technology:
• Make advanced searches:
– Simple search operators (" ", link:, define:, ?, *, ~, …);
– Complex request: ?intitle:index.of? "" -filetype:html -filetype:asp -wiki -ringtone -filetype:htm -posts -lyrics -filetype:shtml -filetype:php -filetype:doc -filetype:pdf -filetype:txt mpeg wma avi wmv
– Google Advanced Search;
• Use other Google services such as Google Alerts;
• Use Google's specialised search engines such as Google Scholar;

Breaking the habits:
- Get used to practising what you have learnt, and force yourself to do so;
- Results will come and you will get used to it.
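One way to practise the advanced operators listed on this slide is to assemble queries programmatically. The sketch below is only an illustration: the function names `build_query` and `search_url` and the example terms are hypothetical, and the search URL with a `q` parameter is assumed to be the usual Google entry point. It combines an exact phrase, plain keywords, excluded words and excluded file types into a single request string.

```python
from urllib.parse import urlencode


def build_query(phrase="", keywords=(), exclude_terms=(),
                exclude_filetypes=(), intitle=None):
    """Assemble an advanced search request from its parts (hypothetical helper)."""
    parts = []
    if intitle:
        parts.append(f"intitle:{intitle}")      # word that must appear in the page title
    if phrase:
        parts.append(f'"{phrase}"')             # exact-phrase match
    parts.extend(keywords)                      # plain keywords
    parts.extend(f"-{term}" for term in exclude_terms)               # excluded words
    parts.extend(f"-filetype:{ext}" for ext in exclude_filetypes)    # excluded file types
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """URL-encode the request so it can be pasted into a browser (assumed base URL)."""
    return f"{base}?{urlencode({'q': query})}"


query = build_query(
    phrase="search engine dependency",
    keywords=["thesis"],
    exclude_terms=["wiki", "lyrics"],
    exclude_filetypes=["html", "php"],
)
print(query)
print(search_url(query))
```

Changing the `base` argument lets the same request be tried on another search engine, provided that engine accepts a `q` parameter and understands the same operators, which is not guaranteed.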

Page 21: Search Engine Dependency Conference

Concrete case: Google

Technological awareness:

By getting better at search you will discover new technologies that you will have to learn.

For example: Google Alerts tells you that a new search engine is coming up, and then you try it.

Page 22: Search Engine Dependency Conference

Technological awareness: Google

Google Ads

Google Advanced Search

Do you know iGoogle?

When Google promotes its own technology, chances are good that it is worthwhile.

Page 23: Search Engine Dependency Conference

Technological awareness: How to select the best

• The search engine market is a world of buzz:
• Where every search engine wants to beat Google;
• But are they really providing a technical revolution?

Page 24: Search Engine Dependency Conference

Start to look at what Google does not have

• Real-time information: the Twitter example

When Google starts to take an interest in a technology, it is probably a good one.

Page 25: Search Engine Dependency Conference

Start to look at what Google does not have

• Finding similar websites: Who is like it?

Unfortunately, it works only for popular websites.

Page 26: Search Engine Dependency Conference

Start to look at what Google does not have

Another way of searching for information: social bookmarking

Advantages: you find unindexed websites;

Disadvantages: rubbish websites, advertising?

Page 27: Search Engine Dependency Conference

Start to look at what Google does not have

Graphical display

Page 28: Search Engine Dependency Conference

Start to look where Google is not the best

Look for specialized search engines
- People: 123 People, CV gadget, Pipl…
- Jobs: Indeed, JobiJoba…
- Tutorials: Tutosearch, …
- Torrent: Toorgle, …
- Scientific information: Scirus,…
- Information in a specific language: Yandex for Russian, Baidu for Chinese…

Page 29: Search Engine Dependency Conference

How to improve data quality on the Internet?

• Triangle method: locating three independent sources that point to the same answer;
• Recent events in Tibet showed how important it is to look at different sources of information, even outside your own country.

Source 1: Washington Post
Source 2: Le Parisien
Source 3: AntiCnn.com
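The triangle method can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a procedure given in the slides: the function name `triangulate`, the tuple layout and the "origin" label used to approximate independence are all hypothetical, and the sample data are placeholders rather than real reports. An answer counts as confirmed only when at least three sources with different origins agree on it.

```python
from collections import defaultdict


def triangulate(reports, required=3):
    """Return the answers confirmed by at least `required` independent sources.

    `reports` is a list of (source, origin, answer) tuples. Two sources that
    share an origin (for example, both republishing the same press agency)
    are counted once, because they are not independent of each other.
    """
    origins_by_answer = defaultdict(set)
    for _source, origin, answer in reports:
        # Normalise case and spacing so identical answers are grouped together.
        origins_by_answer[answer.strip().lower()].add(origin)
    return sorted(
        answer
        for answer, origins in origins_by_answer.items()
        if len(origins) >= required
    )


# Placeholder data: three origins agree on one claim, one origin reports another.
reports = [
    ("Source 1", "origin-a", "Claim X"),
    ("Source 2", "origin-b", "claim x"),
    ("Source 3", "origin-c", "Claim X"),
    ("Source 4", "origin-c", "Claim Y"),
]
confirmed = triangulate(reports)
print(confirmed)
```

Grouping by origin rather than by source name is a deliberate choice: three websites repeating the same dispatch would otherwise look like three confirmations.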

Page 30: Search Engine Dependency Conference

Brief summary

• Learn how to use the technology, change your habits, be aware;
• Be curious;
• Think about other ways to look for information;
• Three independent sources of information.

Page 31: Search Engine Dependency Conference

Future of Google and information research

Page 32: Search Engine Dependency Conference

Semantic search

• You get a feed instead of entering your request;
• Everyone is talking about semantic search;
• But it is not mature yet; a world of buzz again (there are not a lot of suggestions);
• Poor results if developed from scratch (poor index) or if developed by huge companies (few suggestions).

Page 33: Search Engine Dependency Conference

Some issues to fix

• How to index pictures well? Are solutions such as Google Image Labeler the best?
• How to index videos?
• How to index sounds?

Page 34: Search Engine Dependency Conference

A Google which will have to change

• Too much information on the Internet;
• A Google which is collapsing and providing more and more sub search engines;
• The development of high-bandwidth connections, which means graphical interfaces;
• A technological awareness which is difficult to transmit.

Page 35: Search Engine Dependency Conference

But a Google more and more present in our life

• Forecasts point in that direction;
• Development of an OS for cell phones, a Web browser, Web software applications (Google slides, Google « excel »…)

Page 36: Search Engine Dependency Conference

The question is just: how will they do it?

Google in 1998 / Google 11 years later

Page 37: Search Engine Dependency Conference

Brief summary

• Google will be with us in the future and we have to get used to it;
• Information searching will be more and more assisted, but you will still lag behind if you do not perform advanced searches;
• In the near future some issues will still be there (indexing of pictures…).

Page 38: Search Engine Dependency Conference

Conclusion

Page 39: Search Engine Dependency Conference

What you have to keep in mind

• If you are dependent, you should at least be skilfully dependent;
• Apply the triangle method;
• Reconsider the information process each time (think differently).

Page 40: Search Engine Dependency Conference

Recommendations

Master thesis about search engine dependency:
- http://www.pandia.com/index.html

List of search engines:
- http://www.pandia.com/powersearch/index.html
- http://www.philb.com/whichengine.htm

To know more about search engines: Pandia search:
- www.pandiasearch.com

Documentaries:
- Google: Behind the screen by IJsbrand van Veelen
http://www.youtube.com/watch?v=TBNDYggyesc&hl=fr
- The Great Firewall of China
http://www.youtube.com/watch?v=IWsXhNJFj78&hl=fr

Page 41: Search Engine Dependency Conference

Thank you for your attention

http://moteurs-de-recherches-alternatifs.blogspot.com