Where does enterprise search end and text analytics begin?
Paul Cleverley
A Chief Information Officer (CIO) of a large organization recently asked me: where does search end and text analytics begin? It is an interesting question, perhaps worth exploring by developing some models, one of which is shown below (Figure 1).
Figure 1 – From ‘traditional search’ (green) to unlocking a wider range of questions we can ask
In Figure 1, the green boxes are the domain of traditional search. The blue boxes illustrate the direction of travel taken by some organizations, which are blending increasing amounts of text analytics and Knowledge Organization Systems (KOS), such as taxonomies and thesauri, into traditional Information Retrieval (IR). Moving from left to right, this allows organizations to go beyond simply finding documents that contain existing ‘knowledge’ to pattern recognition (where the searcher is part of the process), discovering latent ‘knowledge’ directly, sometimes across the whole corpus of available text.
Whether you call this space Knowledge Management, e-business, Artificial Intelligence (AI), smart
machines, deep learning, cognitive computing or plain old search & discovery probably does not
matter.
Ranking by frequency dominates
Text has been converted into some form of numerical representation ever since the first search engine was developed, and statistical approaches to search have generally been dominated by frequency. Although different metadata fields are typically tuned with different ranking weights (e.g. title and tags weighted higher than body text), all things being equal it is statistical frequency that dominates ranking: how many times search terms are mentioned in a document or corpus, how many times people have made certain search queries (for query suggestions as you type), or how many times a web page or document has been referenced or accessed. Even facets (refiners), typically shown on the left-hand side of a search user interface to help narrow a search, are almost always ranked by frequency or popularity, regardless of whether they are created by manual or automatic tagging. Ranking by frequency dominates.
I won’t delve into ‘relevance’ in this article; suffice to say that ranking by statistical frequency has been used as a partial surrogate for relevance.
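To make the frequency point concrete, here is a minimal sketch of field-weighted, frequency-based ranking with frequency-ordered facets. It assumes a toy in-memory collection; the field weights (3 for title, 2 for tags, 1 for body), the `region` facet and the sample documents are all hypothetical, and production engines typically use refined formulas such as TF-IDF or BM25 rather than raw counts.

```python
from collections import Counter

# Hypothetical field weights: matches in the title and tags count more than body text.
FIELD_WEIGHTS = {"title": 3.0, "tags": 2.0, "body": 1.0}


def score(doc, terms):
    """Weighted sum of how often the query terms occur in each field."""
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(doc.get(field, "").lower().split())
        total += weight * sum(counts[t] for t in terms)
    return total


def rank(docs, query):
    """Ids of matching documents, highest score first (ties broken by id)."""
    terms = query.lower().split()
    scored = [(score(d, terms), d["id"]) for d in docs]
    return [doc_id for s, doc_id in sorted(scored, key=lambda x: (-x[0], x[1])) if s > 0]


def facet_counts(doc_ids, docs, facet):
    """Facet values for the result set, ordered by how many documents carry them."""
    by_id = {d["id"]: d for d in docs}
    counts = Counter(v for i in doc_ids for v in by_id[i].get(facet, []))
    return counts.most_common()


# Invented sample documents.
docs = [
    {"id": "d1", "title": "Well completion report", "tags": "completion",
     "body": "completion fluids and completion design", "region": ["North Sea"]},
    {"id": "d2", "title": "Seismic survey", "tags": "seismic",
     "body": "notes on a well completion", "region": ["North Sea", "Norwegian Sea"]},
    {"id": "d3", "title": "Drilling summary", "tags": "drilling",
     "body": "daily drilling notes", "region": ["Gulf of Mexico"]},
]

hits = rank(docs, "completion")
print(hits)                                # ['d1', 'd2']
print(facet_counts(hits, docs, "region"))  # [('North Sea', 2), ('Norwegian Sea', 1)]
```

Note that both the result order and the refiner order come purely from counting, which is the sense in which frequency dominates.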
Linguistics
Linguistic techniques (including authority lists, thesauri/taxonomies and ontologies for handling synonyms and relationships, and Natural Language Processing (NLP)) have been used for decades to improve searching. They are typically used to ensure that the frequency-based methods above operate on ‘concepts’ rather than raw words, mitigating the ‘vocabulary problem’ (people use many different words for the same thing) and improving search recall and precision. Thesauri, taxonomies and ontologies can also help recommend search terms. They are an aid to the statistics: many scholars propose a ‘best of both worlds’ view that hybrid (linguistic and statistical) techniques work best across the range of scenarios in search and auto-classification, although some still argue for one or the other.
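The following is a minimal sketch of how a thesaurus can sit in front of the statistics: each query word is expanded into its concept (the word plus its equivalents), and a document matches if it contains some term from every concept. The thesaurus entries and sample text are hypothetical; real systems would add stemming, phrase handling and relationship types beyond simple equivalence.

```python
# Hypothetical mini-thesaurus: preferred term -> variant terms for the same concept.
THESAURUS = {
    "sandstone": {"sst", "arenite"},
    "well": {"borehole", "wellbore"},
}


def build_lookup(thesaurus):
    """Map every term (preferred or variant) to the full set of terms for its concept."""
    lookup = {}
    for preferred, variants in thesaurus.items():
        concept = {preferred} | variants
        for term in concept:
            lookup[term] = concept
    return lookup


LOOKUP = build_lookup(THESAURUS)


def expand_query(query):
    """One OR-group per query word: the word plus any thesaurus equivalents."""
    return [sorted(LOOKUP.get(word, {word})) for word in query.lower().split()]


def matches(text, groups):
    """True if the text contains at least one term from every OR-group (AND of ORs)."""
    words = set(text.lower().split())
    return all(any(term in words for term in group) for group in groups)


report = "Wellbore logs through the sst interval"
literal = [[w] for w in "borehole sandstone".split()]
expanded = expand_query("borehole sandstone")

print(expanded)                   # [['borehole', 'well', 'wellbore'], ['arenite', 'sandstone', 'sst']]
print(matches(report, literal))   # False: no literal match for either word
print(matches(report, expanded))  # True: found via 'wellbore' and 'sst'
```

The literal query misses the report entirely; the expanded query finds it, which is the recall gain the thesaurus is meant to provide.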
Social cues
Social analytics has been used in search for decades. Examples include recommending or boosting an item in search results because it is viewed or cited very often; suggesting search terms as you type (the “Google type model”); or suggesting another information item that may be of interest (the “Amazon.com type model”, often described as crowdsourcing). These approaches are transactional, based on social cues from people. Enterprise social tools, increasingly hosted in the cloud, perform the same type of analysis, using data such as document or post views and likes, or ‘people who attended online meetings that you attended’, to algorithmically push information to the user in an activity feed (the “Facebook type model”), complementing traditional enterprise search. Personalization through browser cookies has of course been doing this for years, driven by click-through advertising revenue.
It could be argued that search has never been separate from analytics.
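A minimal sketch of two of the transactional models above, assuming simple usage logs: co-view counting for ‘people who viewed this also viewed’, and prefix matching over past queries for suggestions as you type. The session and query data are invented, and raw co-occurrence counts are a simplification of item-to-item collaborative filtering.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_view_counts(sessions):
    """For each item, count how often every other item was viewed in the same session."""
    pairs = defaultdict(Counter)
    for items in sessions:
        for a, b in combinations(sorted(set(items)), 2):
            pairs[a][b] += 1
            pairs[b][a] += 1
    return pairs


def also_viewed(pairs, item, k=3):
    """'People who viewed this also viewed': the k most frequently co-viewed items."""
    ranked = sorted(pairs.get(item, {}).items(), key=lambda x: (-x[1], x[0]))
    return [other for other, _ in ranked[:k]]


def suggest(query_log, prefix, k=3):
    """Suggestions as you type: the most frequent past queries starting with prefix."""
    counts = Counter(q.lower() for q in query_log)
    hits = [(q, n) for q, n in counts.items() if q.startswith(prefix.lower())]
    return [q for q, _ in sorted(hits, key=lambda x: (-x[1], x[0]))[:k]]


# Invented usage logs.
sessions = [["A", "B", "C"], ["A", "B"], ["A", "D"], ["B", "C"]]
query_log = ["porosity", "porosity map", "Porosity", "permeability", "porosity map", "porosity"]

pairs = co_view_counts(sessions)
print(also_viewed(pairs, "A"))    # ['B', 'C', 'D']
print(suggest(query_log, "por"))  # ['porosity', 'porosity map']
```

Neither function looks at the content of the items at all; the ranking comes entirely from counting people's behaviour.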
Content Analytics
As information volumes have grown exponentially, so has analysis of the content inside these ‘containers’ of information, such as web pages, documents and structured databases. Linguistics is still very important, but here it supports the statistical methods used to seek out ‘interesting’ situational context.
Taxonomies and semantic networks are useful, if not essential, for computer systems to help us discover information. However, they may also blind us to new discoveries if we ignore what the data (text) is telling us and only superimpose a priori representations.
Organizations have turned to auto-classification to help with records retention, to reduce file storage costs, and to clean up Redundant, Obsolete and Temporary (ROT) files, typically on shared file systems and email systems. Search is also used for reporting, highlighting information in the corpus that should not be there for legal, privacy or confidentiality reasons; this analysis is often simply a series of phrase queries. Similar techniques have been (and continue to be) used to move emails automatically into spam folders when they contain certain trigger words, or based on disambiguated semantics. Perhaps these approaches are akin to first-generation content analytics.
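A minimal sketch of this first-generation approach, assuming the ‘analysis’ is nothing more than a list of phrase queries per label, matched case-insensitively on whole words. The labels and phrases are hypothetical.

```python
import re

# Hypothetical rules: a document gets a label if it contains any of the label's phrases.
RULES = {
    "confidential": ["strictly confidential", "do not distribute"],
    "personal_data": ["date of birth", "passport number"],
}

# Compile each phrase once as a case-insensitive, whole-word regular expression.
COMPILED = {
    label: [re.compile(r"\b" + re.escape(p) + r"\b", re.IGNORECASE) for p in phrases]
    for label, phrases in RULES.items()
}


def flag(text):
    """Sorted list of labels whose phrases occur in the text (empty if none)."""
    return sorted(label for label, patterns in COMPILED.items()
                  if any(p.search(text) for p in patterns))


print(flag("Attached: STRICTLY CONFIDENTIAL well plan"))           # ['confidential']
print(flag("Please send your passport number and date of birth"))  # ['personal_data']
print(flag("Weekly drilling summary"))                             # []
```

Such rules are transparent and easy to audit, but they only find what someone already thought to write down as a phrase.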
Second-generation analytics could be viewed as more advanced techniques that target business value, wealth creation and risk reduction by surfacing real-world patterns. In addition to frequency, both similarity and discriminatory techniques are increasingly used, and the conversion of text to numerical form is taken to extremes: for example, neural networks can be used to create complex probability distributions approximating each word's relationship to every other word in the entire corpus, yielding word vectors that can be compared, added and subtracted.
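The vector operations can be illustrated with a minimal sketch. The three-dimensional vectors below are invented by hand purely to show the arithmetic (cosine similarity, addition, subtraction); they are not output from a trained neural model, whose vectors would be learned from a corpus and typically have hundreds of dimensions. The query king - man + woman is the well-known textbook illustration of this kind of vector arithmetic.

```python
import math

# Toy 3-dimensional "word vectors", invented by hand for illustration only.
VECTORS = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "report": [0.0, 0.3, 0.3],
}


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(vec, exclude=()):
    """Vocabulary word whose vector is most cosine-similar to vec."""
    candidates = [w for w in VECTORS if w not in exclude]
    return max(candidates, key=lambda w: cosine(vec, VECTORS[w]))


# king - man + woman, then look up the closest remaining word.
target = add(sub(VECTORS["king"], VECTORS["man"]), VECTORS["woman"])
print(nearest(target, exclude={"king", "man", "woman"}))  # queen
```

Excluding the query words themselves is the usual convention when answering such analogy queries.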
Rather than using this analytical information to influence the ranking of documents and web pages in search results, the focus shifts to the associations between concepts and entities within and across those documents and web pages: from documents to entities and concepts. It is still ‘search’, but with a different focus. By analogy, if documents are “atoms”, content analytics smashes them apart to look at the concept and entity “particles” inside and how they behave with respect to one another. You might find a new particle, or a new relationship between particles, that you did not know about before; but you have to look inside first, and it takes imagination and energy to produce the really exciting discoveries.
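A minimal sketch of the shift from documents to entities: extract entities from each document with a naive dictionary (gazetteer) lookup, then count how often each pair of entities appears in the same document. The gazetteer and documents are hypothetical; real pipelines would use trained entity recognizers and finer-grained windows (sentence or paragraph) than whole documents.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical gazetteer: lower-case surface form -> canonical entity name.
GAZETTEER = {
    "north sea": "North Sea",
    "brent": "Brent Field",
    "chalk": "Chalk",
    "waterflood": "Waterflooding",
    "waterflooding": "Waterflooding",
}


def extract_entities(text):
    """Naive lookup: canonical entities whose surface form appears as whole words."""
    lowered = text.lower()
    return {entity for form, entity in GAZETTEER.items()
            if re.search(r"\b" + re.escape(form) + r"\b", lowered)}


def co_occurrence(docs):
    """Count, for every pair of entities, the number of documents mentioning both."""
    pairs = Counter()
    for text in docs:
        for a, b in combinations(sorted(extract_entities(text)), 2):
            pairs[(a, b)] += 1
    return pairs


docs = [
    "Waterflooding trial in the Chalk, North Sea",
    "Chalk reservoir review, North Sea",
    "Brent field production update, North Sea",
]
pairs = co_occurrence(docs)
print(pairs.most_common(1))  # [(('Chalk', 'North Sea'), 2)]
```

The output is no longer a list of documents but a small network of entity relationships, which can then be searched, filtered or visualized in its own right.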
Analogues
Organizations are increasingly interested in how these content analytics techniques can be used to identify complex business analogues. As one scientist commented, “analogues are difficult to search on because you don’t know what they are, so you don’t know what search queries to use!” Examples in the oil and gas industry include looking for analogous geological environments (using the similarity of the words that appear around different entities), or surfacing an activity trend that one company is pursuing in a geological basin but other companies are not. These techniques can transform unstructured information into structured information, which can then be used to return answers automatically (not lists of documents), to stimulate ideas, to visualize results on a map or to store in a database.
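One simple, count-based way to use ‘the similarity of the words that appear around different entities’ is sketched below. It assumes single-token entity names (the hypothetical `basin_a`, `basin_b` and `basin_c`) and an invented three-sentence corpus. Each entity is represented by the words within a fixed window of its mentions, and candidates are ranked by cosine similarity to the target. This is a plain distributional approach, not the neural embeddings mentioned earlier.

```python
import math
from collections import Counter


def context_vector(sentences, entity, window=3):
    """Bag of words appearing within `window` tokens of each mention of `entity`."""
    ctx = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok == entity:
                lo, hi = max(0, i - window), i + window + 1
                ctx.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return ctx


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def analogues(sentences, target, candidates, window=3):
    """Rank candidate entities by how similar their contexts are to the target's."""
    t = context_vector(sentences, target, window)
    scores = {c: cosine(t, context_vector(sentences, c, window)) for c in candidates}
    return sorted(scores, key=lambda c: (-scores[c], c))


# Invented corpus.
sentences = [
    "basin_a has deepwater turbidite sandstone reservoirs",
    "basin_b has deepwater turbidite sandstone channels",
    "basin_c has onshore carbonate reef reservoirs",
]
print(analogues(sentences, "basin_a", ["basin_b", "basin_c"]))  # ['basin_b', 'basin_c']
```

No one had to know in advance which query to type: the analogue is found from the shared context words alone.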
Prediction
Coping with information overload and keeping on top of what is going on around you is becoming increasingly difficult in many areas. Using historical information to calibrate systems may help predict certain types of events before they happen: for example, treating patterns in daily operations reports as clues and alerting engineers to potential impending issues. Such systems present another voice based on the text, akin to “Look, the last time I saw these clues appearing in the operations reports, this happened”. These techniques have been used with quantitative data for many years to predict and prescribe action, but are now increasingly being applied to qualitative data (unstructured text) authored by people.
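A minimal sketch of the ‘last time I saw these clues’ idea, assuming historical reports have been labelled with whether an issue followed. It uses a multinomial naive Bayes classifier, chosen here only because it is short and transparent; the reports, labels and classifier choice are illustrative assumptions, and a real system would need far more data, careful validation and domain review of the alerts.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        # Log prior: share of historical reports carrying each label.
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(text.lower().split())
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        """Label with the highest log posterior; words unseen in training are ignored."""
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            s = self.log_prior[c]
            for w in text.lower().split():
                if w in self.vocab:
                    s += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            if s > best_score:
                best, best_score = c, s
        return best


# Invented history: report text, and whether an equipment failure followed.
reports = [
    "pump vibration high bearing temperature rising",
    "bearing temperature rising vibration alarm",
    "routine inspection completed no issues",
    "normal operations routine maintenance completed",
]
labels = ["failure_followed", "failure_followed", "normal", "normal"]

model = NaiveBayes().fit(reports, labels)
print(model.predict("vibration and bearing temperature rising"))  # failure_followed
print(model.predict("routine inspection completed"))              # normal
```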
Relevance versus Interestingness
In recent research on facilitating serendipity in the search user interface, a scientist in an oil and gas company described a search result refiner as ‘relevant but not interesting’. Clearly, what is interesting to one person may not be interesting to another. Nevertheless, there may be a need to move beyond traditional definitions (and algorithms) of relevance (Figure 2).
Figure 2 – Moving search beyond a text box and ten blue links
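One way to move refiners from ‘relevant’ towards ‘interesting’ is a discriminatory score: rank terms by how over-represented they are in the result set compared with the whole corpus, instead of by raw frequency. The sketch below uses a simple lift ratio with a minimum document frequency; the ratio, the threshold and the sample documents are illustrative assumptions, not the method used in the research mentioned above. It also assumes the results are a subset of the corpus.

```python
from collections import Counter


def doc_freq(docs):
    """Number of documents each term appears in."""
    df = Counter()
    for d in docs:
        df.update(set(d.lower().split()))
    return df


def by_frequency(results):
    """Refiner terms ordered by raw document frequency in the result set."""
    df = doc_freq(results)
    return sorted(df, key=lambda t: (-df[t], t))


def by_lift(results, corpus, min_df=2):
    """Refiner terms ordered by lift: share of result documents containing the term
    divided by share of corpus documents containing it (> 1 means over-represented).
    Assumes every result document is also in the corpus, so corpus counts are non-zero."""
    rdf, cdf = doc_freq(results), doc_freq(corpus)
    lift = {t: (rdf[t] / len(results)) / (cdf[t] / len(corpus))
            for t in rdf if rdf[t] >= min_df}
    return sorted(lift, key=lambda t: (-lift[t], t))


# Invented documents: the first three are the search results.
results = ["well report salt diapir", "well report salt weld", "well report shale"]
corpus = results + ["well report shale", "well report shale", "well report sandstone"]

print(by_frequency(results)[:3])  # ['report', 'well', 'salt']
print(by_lift(results, corpus))   # ['salt', 'report', 'well']
```

Raw frequency puts the ubiquitous ‘report’ and ‘well’ first, which are relevant but arguably uninteresting; lift promotes ‘salt’, which is unusually common in this result set.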
Summary
Returning to the question posed at the beginning, ‘Where does search end and text analytics begin?’, perhaps they are two sides of the same coin. Text analytics has always been essential to basic information retrieval, although its main use has been to help people find the results (containers) they were looking for. Analytics of social cues has been used to good effect to help people locate what they were looking for and also to discover information they were not looking for. However, some argue that this is discovery through the ‘rear-view mirror’, creating a filter bubble that does not encourage us to stray from the beaten path. Combining search with content-based text analytics gives us opportunities to formulate our needs, test hypotheses, predict events, unlock new knowledge and increase the propensity of our user interfaces to stimulate fortuitous information discovery.
This is likely to present challenges for the CIO: how to address the existing complaints and needs of all staff (finding that single web page or document) while also delivering advanced discovery capability to small communities that may wish to mine external and internal information (much more than a simple e-discovery solution), which may lead to leaps in business value that cannot be predicted in advance. Many organizations have already done this organically. It may be a mistake to think this can be achieved with a single technology or user interface.
That brings further challenges with regard to costs, multiple indexing streams and complexity. Business cases and creative architectures may nevertheless exist to meet both requirements, but this will require leadership to set out a clear vision, with careful planning and architecting. These decisions need to be made against a backdrop in which technology vendor propaganda and useful information are intertwined, sometimes making it difficult to form objective and realistic views of what is possible and what the caveats are.
Exponentially growing information volumes, combined with proven techniques published in the public domain, present an opportunity to expand our horizons: ‘to move the goalposts’ with respect to the questions we can ask computer systems and what those systems can suggest to us. After all, we may be asking the wrong questions.
More at: www.paulhcleverley.com
References Addison, V. (2014). Oil, Gas Industry Focuses on Predictive Analytics. Hart Energy October 6th 2014. Online Article
(Accessed February 2015).
Adkins, S (2003). Information Gathering in the Electronic Age: The Hidden Cost of the Hunt. Safari Techbooks, January 2003.
AIIM (2008). Market IQ Report: Findability: The art and science of making content easy to find. Association for Information
and Image Management (AIIM) 2008. Sponsored by OpenText.
Allan, J., Croft, B., Moffat, A., Sanderson, M. (2012). Frontiers, Challenges, and Opportunities for Information Retrieval.
Report from the Second Strategic Workshop on Information Retrieval in Lorne, February 2012, ACM SIGIR Forum,
46(1), 2-32
Alyahyaee, A. (2012). Oil & Gas Data Repository (OGDR), Energistics National Data Repository (NDR) ’11 Update. 21st-
24th October 2012. Kuala Lumpur, Malaysia.
Andersen, E. (2012). Making Enterprise Search Work: From Simple Search Box to Big Data Navigation. Center for
Information Systems Research (CISR) Massachusetts Institute of Technology (MIT) Sloan School Management,
12(11).
Ballard, T., Blaine, A. (2011). User search limiting behaviour in Online Catalogs. Comparing classic catalog use to search
behaviour in next generation catalogs. New Library World, 112(5/6), 261-273.
Bawden, D. (1986). Information-Systems and the Stimulation of Creativity. Journal of Information Science, 12(5), 203-216.
Behounek, S., Casey, K. (2007). EarthSearch=GoogleEarth Enterprise+PetroSearch. Society of Petroleum Engineers (SPE)
Digital Energy Conference and Exhibition, 11-12th April, Houston, Texas, USA. Report ID: SPE-108208-MS
Berger, P.L., Luckmann, T. (1966). The social construction of reality. A treatise in the sociology of knowledge. 1st ed. London:
Penguin.
Bizer, C., Heath, T., Berners-Lee, T. (2009). Linked Data – The Story So Far. Special Issue on Linked Data, International
Journal on Semantic Web and Information Systems (IJSWIS), 5(3), 1-22.
Blackman, S. (2012). Risky business: challenges of deepwater drilling in the North Sea. Offshore Technology, 21st June 2012.
Online Article (Accessed December 2014).
Blei, D, Ng, A., Jordan, M. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research 2003, 3, 993-1022
Broussard, F., Dineen, P., Tushingham, K. (2011). Hart’s E&P Magazine. Digital Oil Field: G&G software accelerates user
productivity. Schlumberger, August 2011.
Brown, J.S., Duguid, P. (1991). Organizational Learning and Communities-of-Practice: Toward a Unified View of Working,
Learning and Innovation. Organizational Science, 2(1), 40-57.
Brown, N. (2014). Fostering Collaboration Using Analytics & Real-time Big Data Search: Insight into Technology Services.
AstraZeneca presentation Enterprise Search Europe, 29-30th May, London, UK.
Bushell, S. (1999). Wiring the Corporate Brain. Chief Information Officer (CIO). Online Article 6th October 1999 (Accessed
October 2014).
Caballero, R, Nuernberg, S. (2014). Building an Enterprise Taxonomy. 18th International Petroleum Data, Integration and Data
Management (PNEC), May 20-22nd 2014, Houston, USA.
Carpineto, C., Romano, G. (2012). A Survey of Automatic Query Expansion in Information Retrieval. ACM Computing
Surveys, 44(1), 1-50.
Chuang, J., Manning, C.D., Heer, J. (2012). “Without the Clutter of Unimportant Words”: Descriptive Keyphrases for Text
Visualization. ACM Transactions on Computer-Human Transactions, 19(3)
Chui, M., Manyika, J., Bughin, J., Dobbs, R., Roxburgh, C., Sarrazin, H., Sands, G., Westergren, M. (2012). The social
economy: Unlocking value and productivity through social technologies. McKinsey Global Institute Report. Online
Article (Accessed January 2015).
Chum, F., Everett, M., Hills, S., Soma, R., Cutler, R. (2011). Realizing the Semantic Web Promise in the Oil & Gas Industry:
Challenges and Experiences. SemTech 2011,, 9th June 2011, San Francisco, USA.
Chum, F. (2009). Semantic Technologies at the Ecosystem Level. Interview by (Morrison, A. and Parker, B.)
PriceWaterhouseCoopers Technology Forecast Spring 2009. Online Article.
Cleverley, P.H. (2012). Improving Enterprise Search in the Upstream Oil and Gas Industry by Automatic Query Expansion
using a Non-Probabilistic Knowledge Representation. International Journal of Applied Information Systems (IJAIS),
1(1), 25-32
Cleverley, P.H. (2014). Towards a causal model for search user satisfaction and sub-optimal task performance in the upstream
oil and gas industry. Doctoral PhD Thesis (work in progress – unpublished), Robert Gordon University, Aberdeen, UK.
Cleverley, P.H., Burnett, S. (2015b). Creating sparks: comparing search results using discriminatory search term word co-
occurrence to facilitate serendipity in the enterprise. Journal of Information and Knowledge Management (JIKM).
Cleverley, P.H., Burnett, S. (2015a). Retrieving haystacks: a data driven information needs model for faceted search. Journal
of Information Science, 41, 97-113
Colleran, J. (2014). Improving Exploration Success through Better Data Management: Maersk Oil Perspective. The Oil and
Gas Industry Conference, 12th June 2014, London, UK.
Coyne, I.T. (1997). Sampling in qualitative research. Purposeful and theoretical sampling; merging or clear boundaries. Journal
of Advanced Nursing, 26, 623-630.
Dale, E. (2013). The importance of constant measurement in search relevance. A longitudinal case study. Ernst & Young.
Enterprise Search Summit 2013, New York, USA.
DeLone, W.H., McLean, E.R. (2002). The DeLone and McLean Model of Information System Success: A Ten Year Update.
Journal of Management Information Systems, 19(4), 9-30.
Delphi (2002). Taxonomy & Content Classification. Market Milestone Report. Online Article (Accessed March 2013).
Demartini, G. (2007). Leveraging Semantic Technologies for Enterprise Search, PIKM November 2009, Lisboa, Portugal.
Dextre Clarke, S.G., Zeng, M.L. (2012). From ISO 2788 to ISO 25964: The Evolution of Thesaurus Standards towards
Interoperability and Data Modeling. Information Standards Quarterly, 24(1), 20-26.
Dillon, T. S., Talevski, A., Potdar, V., & Chang, E. (2009). Web of things as a framework for ubiquitous intelligence and
computing. In Ubiquitous Intelligence and Computing (2-13). Springer Berlin Heidelberg.
Doane, M. (2010). Cost benefit analysis: Integrating an enterprise taxonomy into a SharePoint environment. Journal of Digital
Asset Management, 6(5), 262-278
Duan, L., Xu, L.D. (2012). Business Intelligence for Enterprise Systems: A Survey. IEEE Transactions on industrial
informatics, 8(3), 679-687
Espinosa, J.A., Armour, F. (2010). Enterprise Architecting Process and Coordination. Executive Briefing Series, Center for
Information Technology and the Global Economy (CITGE), Kogod School of Business, 3(3)
Fagan, J.C. (2010). Usability studies of faceted browsing: A literature review. Information Technology and Libraries, 58-66.
Faith, A. (2011). Linguistically Training Automatic Indexing Software for Complex Taxonomies. Semantic Technology &
Business Conference June 2013.
Feldman, S., Sherman, C. (2001). The High cost of not finding information. White Paper International Data Corporation (IDC).
Feldman, S., Marobella, J.R., Duhl, J., Crawford, A. (2005). The Hidden Costs of Information Work. White Paper International
Data Corporation (IDC).
Feldman, S. (2009). IDC Executive Briefings: Information Advantage: Information Access in Tommorow’s Enterprise.
International Data Corporation (IDC).
Foster, A. & Ford, N. (2003) Serendipity and information seeking: an empirical study. Journal of Documentation. 59(3), 321-
340
Friedman, B. (2010). Serendipity is an Explorationists best friend. American Association of Petroleum Geologists (AAPG)
Online Article.
Furnas, G.W., Landauer, T.K., Gomez, L.M., Dumais, S.T. (1987). The vocabulary problem in human-system communication.
Communications of the ACM, 30(11), 964-971
Garbarini, M., Catron, R.E., Pugh, B. (2008). Improvements in the Management of Structured and Unstructured Data. Society
of Petroleum Engineers, Report IPTC12035.
Garbujo, C., Viarigi, P. (2013). ENI E&P Global GIS Project Infoshop. ESRI Petroleum Users Group (EPUG) 14th November
2013, London, UK.
Geggel, L. (2015). Forget Jeopardy: 5 Abilities That Make IBM’s Watson Amazing. Livescience Online Article April 15th,
(Accessed April 2015)
Ghiselin, D. (2010). Serendipity is alive and well at EagleFord. Hart’s E&P Online Article.
Gimmal (2013). Information Governance and Compliance in Oil and Natural Gas Company. Online Article (Accessed January
2015)
Goker, A., Davies, J. (2009). Information Retrieval: Searching in the 21st Century. UK: Wiley & Sons Ltd
Greenberg, J. (2011). Introduction: Knowledge Organization Innovation: Design and Frameworks. Bulletin of the American
Society for Information Science and Technology, April/May 2011, 37(4), 12-14.
Grefenstette, G. (1994). Explorations in Automatic Thesaurus Generation. MA, USA: Kluwer Academic Publishers Norwell
Grimes, S. (2014). Text Analytics Applied. 2nd LIDER Road mapping workshop, May 8th 2014, Madrid, Spain.
Gwizdka, J. (2009). What a difference a tag cloud makes: effects of tasks and cognitive abilities on search results interface
use. Information Research. 14(4)
Halvey, M., Keane, M.T. (2007). An assessment of tag presentation techniques. Proceedings of 16th International World Wide
Web Conference (WWW).
Hamski, J. (2010). Unstructured Geospatial Information for a Competitive Advantage in Resource Exploration. Elsevier,
Online Article, Accessed January 2015.
Hearst, M.A. and Stoica, E. (2009). NLP Support for Faceted Search Navigation in Scholarly Collections. Proceedings of the
2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries, ACL-IJCNLP Suntec, Singapore 7th
August 2009, 62-70
Hedden, H. (2013). Taxonomies for Auto-Tagging Unstructured Content. Text Analytics World, October 1st 2013, Boston
USA.
Heye, D. (2003). Taxonomies and automatic classification at Shell – a case study. ‘Building a Knowledge Framework:
Practical Taxonomy Design and Application Conference, September 29-30th Amsterdam, The Netherlands.
Hills, S. (2014). Why we Want to Implement ISO Metadata: Energy Industry Profile of ISO 19115-1:2014 (“EIP”) V1.0.
Energistics FGDC ISO Metadata Implementation Forum 12th February 2014.
Hjorland, B. (2008). What is Knowledge Organization (KO)? International journal devoted to concept theory, classification,
indexing and knowledge representation, 35(2/3), 86-101
Hodge, G. (2000). Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files. Washington,
USA, First Digital Library Federation and Council on Library and Information Resources.
Hubert, C. (2012). Seamless Collaboration. Enabling Employees to Work Together Across Boundaries. APQC Report K03906,
1-15.
Jacob, E.J. (2004). Classification and categorization: A Difference that Makes a Difference. Library Trends, 52(3), 515-540.
Jacobs, P.S., Rau, L.R. (1990). SCISOR: Extracting information from on-line news. CACM 33, 88-97
Jurka, T.P., Collingwood, L., Boydstun, A.E., Grossman, E., van Atteveldt, W. (2013). RTextTools: A Supervisory Learning
Package for Text Classification. The R Journal, 5(1), 6-12.
Kaizer, J., Hodge, A. (2005): "AquaBrowser Library: Search, Discover, Refine", Library Hi Tech News, 22(10), 9-12
Kastrin, A., Rindflesch, T.C., Hristovski, D. (2014). Large-Scale Structure of a Network of Co-Occuring MeSH Terms:
Statistical Analysis of Macroscopic Properties. PLoS One, 9(7).
Khoo, C.S.G., Luyt, B., Ee, C., Osman, J., Lim, H., Yong, S. (2007). How users organize electronic files on their workstations
in the office environment: a preliminary study of personal information organization behaviour. Information Research,
11(2).
Koenig, M.E.D. (2002). Time saved – a misleading justification for KM. KM World, 11(5)
Krestel, R., Demartini, G., Herder, E. (2011). Visual Interfaces for Stimulating Exploratory Search. JCDL 2011, June 13th-17th
Ottawa, Canada, 393-394.
Landauer, T.K., Dumais, S.T. (1997). A Solution to Platos’ Problem: The Latent Semantic Analysis Theory of Acquisition,
Induction, and Representation of Knowledge. Psychological Review, 104(2), 211-240.
Lennon, A., Alshubi, F., Cleverley, P.H. (2012). Improving Subsurface and Wells Document Management at Qatar Shell. 16th
Annual Petroleum Data Integration Conference. May 15th-17th Houston, USA.
Low, B. (2011). Usability and contemporary user experiences in digital libraries. CIGS Seminar, University of Edinburgh.
Slide 17
Lowe, A., McMahon, C., Culley, S. (2004). Characterising the requirements of engineering information systems. International
Journal of Information Management, 24, 401-422.
Luke, T., Schaer, P., Mayr, P. (2012). Improving Retrieval Results with discipline-specific Query Expansion. Proceedings of
Theory and Practice of Digital Libraries, 2012.
Lund, K., Burgess, C., Atchley, R.A. (1995). Semantic and Associative Priming in High-Dimensional Semantic Space.
Cognitive Science Proceedings, 603-608
Magnuson, D. (2014). Auto Classification and the Holy Grail for Records Managers. IBM Presentation as the Association or
Records Managers and Administrators (ARMA), Houston.
Majid, S., Anwar, M.A., Eisenshitz, T.S. (2000). Information Need and Information Seeking Behavior of Agricultural
Scientists in Malaysia. Library & Information Science Research, 22 (2), 145-163
Manning, C.D., Schutze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, United States of
America, Massachusetts Institute of Technology (MIT) Press.
Manning, C.D., Raghavan, P., Schutze, H. (2009). An Introduction to Information Retrieval. Cambridge, England. Cambridge
University Press.
Marchionini, G. (2006). Exploratory Search: From Finding to Understanding. Communications of the ACM. 49 (4), 41-46
Martela, F. (2015). Fallible Inquiry with Ethical End-in-View: A Pragmatist Philosophy of Science for Organizational
Research. Organizational Studies, 1-27.
Mason, J. (2006). Mixing methods in a qualitative way. Qualitative Research, 6(1), 9-25
Matarazzo, J.M., Pearlstein, T. (2014). Demonstrating the Value of Corporate Libraries. APLIC Meeting, April 29th 2014,
Boston, USA.
McCandless, D. (2012). Information in beautiful, 2nd ed., William Collins, London.
McCay-Peet, L. & Toms, E. (2011). Measuring the dimensions of serendipity in digital environments. Information Research,
16(3)
McDonald, S., Ramscar, M. (2001). Testing the distributional hypothesis: The influence of context on judgements of semantic
similarity. Proceedings of the 23rd Annual Conference of the Cognitive Science Society, 611-616.
McNaughton, N. (2015). Knowledge organization – the great debate! Oil Information Technology Journal, 20(2), 1-11
Microsoft and Accenture (2010). Upstream Oil & Gas Computing Trends Survey (2010). Conducted by PennEnergy Research
and the Oil & Gas Journal Research Centre.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J. (2013). Distributed representations of words and phrases and
their compositionality. Advanced in Neural Information Processing Systems, 3111-3119
Miller, D. (2014). Just the facts: Auto-classification and Taxonomies. ConceptSearching Webinar, Online Article (Accessed
February 2015).
Miller, G.A. (1956). The magical number seven, plus or minus two. Some limits on how our capacity for processing
information. Psychological Review, 63, 81-97.
Miller, G.A., Beckwith, R., Fellbaum, C.D., Gross, D., Miller, K. (1990). WordNet: An online lexical database. International
Journal of Lexicography, 3(4), 235–244
Mindmeter (2011). Mind the Enterprise Search Gap. Report Sponsored by SmartLogic.
Mitchell, T.M., AbuZaki, W., Betteridge, J., Carlson, A., Hruschka, E.R., Kisiel, B., Settles, B., Wang, R. (2009). How Will
We Populate the Semantic Web on a Vast Scale? International Semantic Web Conference (ISWC) 2009.
Morgan, D.L. (1997). Focus Groups as Qualitative Research: Planning and Research Design for Focus Groups. In Sage
Research Methods, 32-46
Munkvold, B.E., Paivarinta, T., Hodne, A.K., Stangeland, E. (2006). Contemporary issues of enterprise content management:
the case of Statoil. Scandinavian Journal of Information Systems, 18(2), 69-100.
Navigli, R., Velardi, P. (2002). Automatic Adaptation of WordNet to Domains. Proceedings of the Third International
Conference on Language Resources and Evaluation (LREC ’02), Canary Islands, Spain.
Nimmagadda, S.L., Dreher, H., Rudra, A. (2014). Integration and Effective Management of Heterogeneous Petroleum Digital
Ecosystems Using Big Data Paradigm. PPDM Data Management Symposium, 6th August 2014, Perth, Australia.
Niu, X., Hemminger, B.M. (2010). Beyond Text Querying and Ranking List: How People are searching through Faceted
Catalogs in Two Library Environments. Proceedings of the 73rd Association for Information Science and Technology
(ASIS&T) Annual Meeting on Navigating Streams in an Information Ecosystem 2010, 47(29)
Noor, A.M., Yassin, C.Z.H. (2006). Issues, Challenges and Constraints in K-Era. Proceedings of the Knowledge Management
International Conference. Kuala Lumpur, Malaysia, 6-8th June 2006.
Norling, K., Boye, J. (2013). 2013 Findability Survey. Findability Day. Findwise, Stokholm May 2013
NSS (2014). National Statistics Service Australia Online Calculator (Accessed September 2014).
Oberle, D. (2014). How ontologies benefit enterprise applications. Semantic Web, 5(6), 473-491.
O’Donnell, M. (2011). Visualizing Patterns in Text: Keynote talk at AESLA (Spanish Association of Applied Linguistics),
University of Salamanca May 4th-6th. (Online Article, accessed September 2014).
Ohly, P.H. (2012). Actas del X Congreso ISKO Capitulo Espanol (Ferrol 2012), 541-551
Oil and Gas UK (2011). Oil and Gas UK. Exploration Economic Report 2011. Online Article (Accessed January 2015).
Olson, T.A. (2007). Utility of a faceted catalog for scholarly research. Library Hi Tech. 25(4), 550-561.
Oracle (2012). From overload to impact: An industry scorecard on big data business challenges. Online Article (Accessed
March 2013).
Outsell (2005). Survey of Knowledge Workers. Online Article (Accessed March 2013).
Painter, K., Dutton, S.J., Owens, E.O., Burgoon, L.D. (2014). Automatic Document Classification for Environmental Risk
Assessment. PeerJ PrePrints,
Palkowsky,B. (2005). A New Approach to Information Discovery – Geography Really Does Matter. Society of Petroleum
Engineers (SPE) Annual Technical Conference and Exhibition, Dallas, Texas, USA, 9-12th October 2015. Report ID:
SPE 96771
Palmer, C.R., Pesenti, J., Valdes-Perez, R.E., Christel, M.G., Hauptmann, A.G., Ng, D., Wactlar, H.D. (2001). Demonstration
of hierarchical document clustering of digital library retrieval results. Proceedings of the 1st ACM/IEEE-CS joint
conference on digital libraries, 451.
Peng, J., He, B., Ounis, I. (2009). Predicting the Usefulness of Collection Enrichment for Enterprise Search. ICTIR 2009, 366-
370.
Preece, A., Flett, A., Sleeman, D., Curry, D., Meany, N., Perry, P. (2001). Better Knowledge Management through Knowledge
Engineering. Knowledge Management IEEE Intelligent Systems, Jan/Feb 2001, 36-42
Prince, V., Roche, M. (2009). Information Retrieval in Biomedicine: Natural Language Processing for Knowledge Integration.
New York, USA, Medical Information Science Reference.
Quaadgras, A., Beath, C.M. (2011). Leveraging unstructured data to capture business value. Center for Information Systems
Research (CISR). MIT, Sloan School of Management, 11(4).
Raskin, R. (2011). National Aeronautical Space Administration (NASA) Semantic Web for Earth and Environmental
Terminology (SWEET) Ontology.
Rasmus, D.W. (2013). How IT Professionals can Embrace the Serendipity Economy. Harvard Business Review, August 19th
2013 Online Article (Accessed January 2013).
Robinson, M.A (2010). An empirical analysis of engineer’s information behaviors. Journal of the American Society for
Information Science and Technology, 61(4), 640-658
Romero, L. (2013). Deloitte: Improving Findability in the Enterprise. APQC Knowledge Management Conference May 3rd
2013, Houston, Texas, USA.
Rose, D.G. (2010). Apache Corporation. The ECM Journey. AIIM Southwest Chapter, May 6th 2010.
Saleem, M., Kamdar, M.R., Iqbal, A, Sampath, S., Deus, H.F., Nyonga, A. (2013). Fostering Serendipity through Big Linked
Data. Semantic Web Challenge (ISWC) 2013.
Salmador Sanchez, M.P., Angeles Palacios, A. (2008). Knowledge-based manufacturing enterprises: evidence from a case
study. Journal of Manufacturing Technology Management, 19(4), 447-468.
Salthe, S.N. (2012). Hierarchical Structures. Axiomathes, 22, 355-383
Sarrafzadeh, B., Vechtomova, O., Jokic, V. (2014). Exploring Knowledge Graphs for Exploratory Search. IIiX August 26th-
29th 2014, Regensburg, Germany.
Sasaki, Y. (2008). Automatic Text Classification. University of Manchester. Online Article (Accessed November 2014).
Schlumberger (2008). Schlumberger Oilfield glossary. Online resource (accessed March 2014).
Shiri, A.A., Revie, C.W., Chowdhury, G. (2002). Thesaurus-assisted search term selection and query expansion: a review of
user-centred studies. Knowledge Organization, 29(1), 1-19.
Skoglund, M., Runeson, P. (2009). Reference-based search strategies in systematic reviews. Proceedings of the 13th
International Conference on Evaluation and Assessment in Software Engineering (EASE). Durham University, 20-21st
April 2009, 31-40.
Smiraglia, R.P., van den Heuvel, C. (2011). Idea Collider: From a Theory of Knowledge Organization to a Theory of
Knowledge Interaction. Bulletin of the American Society for Information Science and Technology, April/May 2011,
37(4), 43-47.
Smith, R. (2012). Implementing Enterprise Information Management at Marathon Oil. Gartner Portals, Content and
Collaboration Summit. Track B: Content and Information Management Session B2, March 12th 2012.
Solskinnsbakk, G., Gulla, J.A. (2008). Ontological Profiles as Semantic Domain Representations. NLDB 2008, LNCS 5039,
67-78.
Spiteri, L.F. (2004). Word Association Testing and Thesaurus Construction. Library and Information Science Research
Electronic Journal (LIBRES), 14(2).
Stamper, R. (1996). Signs, Information Norms and Systems. In Signs of Work: Semiosis and Information Processing in
Organizations, Holmqvist et al. (Eds). Berlin: Walter de Gruyter.
Stenmark, D. (2008). Identifying clusters of user behaviour in Intranet Search Engine log files. Journal of the American Society
for Information Science and Technology, 59(14), 2232-2243.
Steyvers, M., Tenenbaum, J.B. (2005). The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model
of Semantic Growth. Cognitive Science, 29.
Stock, W.G. (2010). Concepts and Semantic Relations in Information Science. Journal of the American Society for Information
Science and Technology, 61(10), 1951-1969.
Tonstad, K., Bjorge, E. (2003). Data Management Metrics in Statoil. SMi Data Management Presentation, London, UK.
Tudhope, D., Alani, H., Jones, C. (2001). Augmenting Thesaurus Relationships: Possibilities for Retrieval. Journal of Digital
Information (JODI), 1(8).
Van Noorden, R. (2014). Scientists may be reaching a peak in reading habits. Nature (International Weekly Journal of
Science), News, 5th February 2014. Online Article (Accessed January 2015).
Velardi, P., Navigli, R., Martinez, S. (2012). A New Method for Evaluating Automatically Learned Terminological
Taxonomies. Proceedings of the 8th Conference on International Language Resources and Evaluation (LREC 2012),
May 21-27th, 2012.
Villena-Roman, J., Collada-Perez, S., Lana-Serrano, S., Gonzalez-Cristobal, J.C. (2011). Hybrid Approach Combining
Machine Learning and a Rule-Based Expert System for Text Categorization. Proceedings of the Twenty-Fourth
International Florida Artificial Intelligence Research Society Conference, 323-328.
W3C (2009). W3C workshop on Semantic Web in Oil and Gas Industry – Report.
Walkup, G.W., Ligon, B.J. (2006). The Good, Bad and Ugly of Stage-Gate Project Management Process as Applied in the Oil
and Gas Industry. Society of Petroleum Engineers (SPE) Annual Technical Conference and Exhibition, 24-27th
September, San Antonio, Texas, USA. Report ID: SPE-102926-MS.
Wei, F., Liu, S., Song, Y., Pan, S., Zhou, M.X., Qian, W., Shi, L., Tan, L., Zhang, Q. (2010). TIARA: A Visual Exploratory
Text Analytic System. Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD),
July 25-28th 2010, Washington DC, USA.
Wessely, J. (2011). Text Analytics and Auto-Categorization in Semantic Web Applications. SemTech 2011. Online
Presentation (Accessed December 2014).
White, M. (2012). Enterprise Search. 1st Edition. California: O’Reilly.
White, M. (2014). Search Strategy A-Z List of Topics. Intranet Focus, September 2014, Online Article.
Wilson, T.D. (2000). Human Information Behavior. Special Issue on Information Science Research, Informing Science, 3(2).
Yu, K., Zhang, J., Chen, M., Xu, X., Suzuki, A., Ilic, K., Tong, W. (2014). Mining hidden knowledge for drug safety
assessment: topic modelling of LiverTox as a case study. BMC Bioinformatics, 15.
Zeeman, D., Jones, R., Dysart, J. (2011). Assessing Innovation in Corporate and Government Libraries. Computers in
Libraries, 31(5).
Zeng, M.L. (2008). Knowledge Organization Systems (KOS). Knowledge Organization, 35(2/3).