Page 1: Introduction to Full-Text Searching in SQL Server 2012 Adolfo J. Socorro, Ph.D. IT Impact, Inc. asocorro@itimpact.com.

Introduction to Full-Text Searching in SQL Server 2012

Adolfo J. Socorro, Ph.D.

IT Impact, Inc.
asocorro@itimpact.com

Page 2:

Outline

• What can we do with FTS?
• How to install FTS
• FTS components
• Creating FTS indexes
• How to query with FTS
• FILESTREAM and FileTable

Page 3:

FTS Basics

FTS allows searching against character-based data stored in columns of these types:

char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, varbinary(max)

Page 4:

Search Functionality

Specific words or phrases
• “hotel” => “hotel”

Prefixes
• “fan” => “fantastic”, “fantasy”
• “local store” => “locally stored”

Inflectional forms
• “minimized” => “minimizing”, “minimise”
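
A minimal sketch of how these three kinds of match can be written with CONTAINS. The table dbo.Documents and its columns (DocId, Title, Body) are hypothetical, and a full-text index on Body is assumed to exist.

```sql
-- Hypothetical table: dbo.Documents(DocId, Title, Body), full-text indexed on Body.

-- Specific word
SELECT DocId, Title FROM dbo.Documents
WHERE CONTAINS(Body, N'hotel');

-- Prefix: the term is quoted and ends in *; in a phrase, every word is treated as a prefix
SELECT DocId, Title FROM dbo.Documents
WHERE CONTAINS(Body, N'"fan*"');

SELECT DocId, Title FROM dbo.Documents
WHERE CONTAINS(Body, N'"local store*"');

-- Inflectional forms of a word, as produced by the stemmer for the column's language
SELECT DocId, Title FROM dbo.Documents
WHERE CONTAINS(Body, N'FORMSOF(INFLECTIONAL, minimized)');
```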

Page 5:

Search Functionality

Proximity
• “search, query” => “query to perform search”

Synonyms
• “folder” => “directory”

Weighted values
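
Proximity and synonym searches might look like the following sketch. It reuses the hypothetical dbo.Documents table; the distance of 5 terms is an arbitrary choice, and the synonym match only works if the thesaurus file for the column's language actually relates "folder" and "directory".

```sql
-- Hypothetical table: dbo.Documents(DocId, Title, Body), full-text indexed on Body.

-- Proximity: both words must occur, close to each other
SELECT DocId, Title FROM dbo.Documents
WHERE CONTAINS(Body, N'search NEAR query');

-- Custom proximity (new in SQL Server 2012): within 5 terms, in any order
SELECT DocId, Title FROM dbo.Documents
WHERE CONTAINS(Body, N'NEAR((search, query), 5)');

-- Synonyms: expands or replaces the term according to the thesaurus file
SELECT DocId, Title FROM dbo.Documents
WHERE CONTAINS(Body, N'FORMSOF(THESAURUS, folder)');
```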

Page 6:

A First Look

Let’s run some simple examples to get a feel for FTS!

Page 7:

LIKE vs FTS

LIKE works on character patterns only

Cannot use the LIKE predicate to query formatted binary data

FTS is much faster against large amounts of unstructured text data
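
As a rough illustration of the difference, the two queries below look for "hotel" in the hypothetical dbo.Documents table (full-text index on Body assumed). They are not equivalent: LIKE matches the character sequence anywhere, while CONTAINS matches the word.

```sql
-- Character pattern: also matches "hotels" or "hotelier"; a leading wildcard
-- generally prevents an index seek, so the column is scanned
SELECT DocId, Title FROM dbo.Documents
WHERE Body LIKE N'%hotel%';

-- Full-text: looks the word up in the full-text index
SELECT DocId, Title FROM dbo.Documents
WHERE CONTAINS(Body, N'hotel');
```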

Page 8:

Supported SQL Server Editions

• Enterprise
• Business Intelligence
• Standard
• Web
• Express with Advanced Services

Available since at least SQL Server 2000

Page 9:

FTS Components

• Word Breaker
• Stemmer
• Stoplists
• Thesaurus
• Filters
• Property Lists

Page 10:

Language Support

50+ languages

Language-specific components
• Word breakers and stemmers
• Stoplists
• Thesaurus files
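
The languages registered on a given instance can be listed from a catalog view; the LCID values it returns are what the LANGUAGE clause of full-text statements accepts.

```sql
-- One row per language available to full-text search on this instance
SELECT lcid, name
FROM sys.fulltext_languages
ORDER BY name;
```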

Page 11:

How to Install
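
FTS is selected as a feature during SQL Server setup. Whether it is already present on an instance can be checked from T-SQL:

```sql
-- 1 = full-text search is installed on this instance, 0 = it is not
SELECT FULLTEXTSERVICEPROPERTY('IsFullTextInstalled') AS IsFullTextInstalled;
```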

Page 12:

Default FTS Language
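
The default is an instance-level setting used when a column is indexed without an explicit LANGUAGE clause. A sketch of inspecting it with sp_configure; the change to 1033 (US English) is only an example value and is left commented out.

```sql
-- 'default full-text language' is an advanced option
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Show the current value (an LCID)
EXEC sp_configure 'default full-text language';

-- Example only: set the default to 1033 (US English)
-- EXEC sp_configure 'default full-text language', 1033;
-- RECONFIGURE;
```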

Page 13:

FTS Indexes

One index per table or indexed view

Must have a unique, single-column, non-nullable index on the table

Grouped within the same database into one or more full-text catalogs (“containers”)

Page 14:

Full-Text Catalogs

A logical construct

A way to manage FT indexes together

Page 15:

Index Population

Population: the addition of data to full-text indexes

Automatic

Manual
• On Request
• Scheduled
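
The population modes map onto ALTER FULLTEXT INDEX roughly as sketched below, assuming a hypothetical table dbo.Documents that already has a full-text index. A scheduled population is typically one of the START statements run from a SQL Server Agent job.

```sql
-- Automatic: changes to the table are propagated to the index in the background
ALTER FULLTEXT INDEX ON dbo.Documents SET CHANGE_TRACKING AUTO;

-- Manual: changes are tracked but applied only when requested
ALTER FULLTEXT INDEX ON dbo.Documents SET CHANGE_TRACKING MANUAL;
ALTER FULLTEXT INDEX ON dbo.Documents START UPDATE POPULATION;

-- On request: rebuild the index contents from scratch
ALTER FULLTEXT INDEX ON dbo.Documents START FULL POPULATION;

-- Population status of the table's full-text index (0 = idle)
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'dbo.Documents'), 'TableFulltextPopulateStatus') AS PopulateStatus;
```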

Page 16:

Steps to Set Up an Index on a Table

Create Full-Text Catalog

For Each Column to Index
• Indicate language
• Indicate document type *

Choose Change-Tracking Mechanism
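
The steps above can be sketched in T-SQL as follows. All object names (dbo.Documents, PK_Documents, DocumentsCatalog) are hypothetical, and 1033 (US English) is an example language.

```sql
-- Hypothetical table; the primary key provides the unique, single-column,
-- non-nullable index that a full-text index requires
CREATE TABLE dbo.Documents
(
    DocId INT           NOT NULL CONSTRAINT PK_Documents PRIMARY KEY,
    Title NVARCHAR(200) NOT NULL,
    Body  NVARCHAR(MAX) NULL
);

-- Step 1: create the full-text catalog
CREATE FULLTEXT CATALOG DocumentsCatalog;

-- Steps 2 and 3: list the columns with their language, then choose change tracking
CREATE FULLTEXT INDEX ON dbo.Documents
(
    Title LANGUAGE 1033,
    Body  LANGUAGE 1033
)
KEY INDEX PK_Documents
ON DocumentsCatalog
WITH (CHANGE_TRACKING = AUTO);

-- A binary column would also name the column holding its document type, e.g.
--   Content TYPE COLUMN FileExtension LANGUAGE 1033
```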

Page 17:

Full-Text Index Wizard

Page 18:

Example: Create Catalog and Index

Page 19:

CONTAINS

Precise or prefix matches to single words and phrases

Proximity matches

Logical operations between conditions: AND, OR, AND NOT

Optional use of inflectional forms and thesaurus
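
A sketch of combining conditions inside one CONTAINS predicate, searching two columns at once. The table and columns are hypothetical and both columns are assumed to be full-text indexed.

```sql
-- Rows that mention "full-text" and either "index" or "catalog", but not "thesaurus"
SELECT DocId, Title
FROM dbo.Documents
WHERE CONTAINS((Title, Body),
               N'"full-text" AND ("index" OR "catalog") AND NOT "thesaurus"');
```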

Page 20:

FREETEXT

Matching the meaning, but not the exact wording, of specified words or phrases

Always uses inflectional forms and thesaurus
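
With FREETEXT the search string is plain text rather than a CONTAINS expression. A minimal sketch against the hypothetical dbo.Documents table:

```sql
-- The string is word-broken, stemmed and run through the thesaurus automatically
SELECT DocId, Title
FROM dbo.Documents
WHERE FREETEXT((Title, Body), N'storing large files in the database');
```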

Page 21:

CONTAINSTABLE and FREETEXTTABLE

Return a relevance ranking value (RANK) and full-text key (KEY) for each row

The actual RANK values are unimportant and typically differ each time the query is run

ISABOUT/WEIGHT influence the ranking in CONTAINSTABLE
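
Because these functions return a table of KEY and RANK, they are joined back to the base table on the full-text key. A sketch with hypothetical names and arbitrary weights:

```sql
-- Top 10 rows by relevance; weights range from 0.0 to 1.0
SELECT TOP (10) d.DocId, d.Title, ft.[RANK]
FROM CONTAINSTABLE(dbo.Documents, Body,
         N'ISABOUT (filestream WEIGHT (0.8), filetable WEIGHT (0.4), backup WEIGHT (0.1))') AS ft
JOIN dbo.Documents AS d
    ON d.DocId = ft.[KEY]   -- KEY holds the value of the full-text key column
ORDER BY ft.[RANK] DESC;
```

Only the relative order of RANK within one result set is meaningful.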

Page 22:

Example: Queries

Page 23:

Stoplists

A mechanism to discard commonly occurring strings that do not help the search

a, is, the, by, and, …
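
A custom stoplist can be created, extended and attached to an index roughly as follows. The stoplist name, the added word and the table are hypothetical.

```sql
-- Start from a copy of the system stoplist
CREATE FULLTEXT STOPLIST DocumentsStoplist FROM SYSTEM STOPLIST;

-- Add a word that carries no meaning in this particular data set
ALTER FULLTEXT STOPLIST DocumentsStoplist ADD N'regards' LANGUAGE 1033;

-- Attach the stoplist to an existing full-text index (this repopulates the index)
ALTER FULLTEXT INDEX ON dbo.Documents SET STOPLIST = DocumentsStoplist;

-- Inspect its contents
SELECT sw.stopword, sw.language
FROM sys.fulltext_stopwords AS sw
JOIN sys.fulltext_stoplists AS sl
    ON sl.stoplist_id = sw.stoplist_id
WHERE sl.name = N'DocumentsStoplist';
```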

Page 24:

Thesaurus

• Nicknames: Robert/Bob
• Common misspellings: calendar/calender
• Homophones: Geoff/Jeff
• Technical terms: proc/procedure

Very powerful if you log searches and learn what users are commonly searching for

Page 25:

Thesaurus

One file per language

Expansions
• “bike” in addition to “bicycle”

Replacements
• “calendar” instead of “calender”
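
The thesaurus itself is an XML file edited outside T-SQL. Once the file for a language has been changed, it can be reloaded and its effect inspected as sketched here. LCID 1033 (US English) is an example, and the result depends on the expansion for "bike" actually being present in that file.

```sql
-- Reload the thesaurus file for US English after editing it
EXEC sys.sp_fulltext_load_thesaurus_file 1033;

-- Show how a thesaurus query is expanded
-- (arguments: query string, LCID, stoplist id 0 = system stoplist, accent sensitivity off)
SELECT keyword, display_term, expansion_type, source_term
FROM sys.dm_fts_parser(N'FORMSOF(THESAURUS, bike)', 1033, 0, 0);
```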

Page 26:

Filters

Extract textual information from the document (removing the formatting)

Send the text to the word-breaker component for the language associated with the column

Need to manually install Office 2010 and PDF filters
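
Which document types an instance can currently filter is visible in a catalog view. After installing a filter pack at the OS level, SQL Server has to be told to load OS-registered components; the sketch below assumes that an instance restart follows.

```sql
-- Document types (file extensions) with a registered filter
SELECT document_type, path, manufacturer
FROM sys.fulltext_document_types
ORDER BY document_type;

-- Allow filters and word breakers registered with the operating system to be loaded
EXEC sp_fulltext_service @action = 'load_os_resources', @value = 1;
```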

Page 27:

Example: FTS Components

Page 28:

Where to Store Large Objects?

Database / File System

• Security
• Manageability
• Recoverability
• Transactional consistency
• Performance

Page 29:

Why Store in the Database?

Integrating unstructured data into the relational database provides significant benefits:
• Integrated storage and data management capabilities (e.g., backup)
• Ease of administration and policy management
• Full-text search

Page 30:

FILESTREAM

A database/file system hybrid

FILESTREAM is an attribute that can be assigned to a varbinary(max) column

Allows storing BLOB data in the file system

Not restricted to the 2 GB limit SQL Server imposes on BLOBs
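
A sketch of a table with a FILESTREAM column. The table and column names are hypothetical, and the database is assumed to already have a FILESTREAM filegroup.

```sql
-- A FILESTREAM table needs a unique, non-null ROWGUIDCOL column
CREATE TABLE dbo.DocumentFiles
(
    FileId   UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    FileName NVARCHAR(260)    NOT NULL,
    Content  VARBINARY(MAX)   FILESTREAM NULL   -- stored as a file in the file system
);

-- From T-SQL the column behaves like any other varbinary(max) column
INSERT INTO dbo.DocumentFiles (FileName, Content)
VALUES (N'hello.txt', CAST('Hello, FILESTREAM' AS VARBINARY(MAX)));
```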

Page 31:

FILESTREAM

SQL Server buffer pool is not used

Isolation semantics are governed by Database Engine transaction isolation levels

Page 32:

Steps to FILESTREAM

Enable at OS level

Configure at instance level

Create a filegroup

Add a file to the filegroup
• Indicate root folder
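
The T-SQL side of these steps might look like this sketch. The database name, filegroup name, logical file name and folder are all hypothetical; the OS-level step is done in SQL Server Configuration Manager rather than in T-SQL.

```sql
-- Instance level: 0 = disabled, 1 = T-SQL access only, 2 = T-SQL and Win32 streaming access
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;

-- Create a filegroup that holds FILESTREAM data
ALTER DATABASE DocumentsDb
ADD FILEGROUP DocumentsFs CONTAINS FILESTREAM;

-- Add a "file" to it; FILENAME is the root folder for the data.
-- C:\SqlData must already exist, the final folder must not.
ALTER DATABASE DocumentsDb
ADD FILE (NAME = N'DocumentsFsData', FILENAME = N'C:\SqlData\DocumentsFs')
TO FILEGROUP DocumentsFs;
```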

Page 33:

OS-level Configuration of FILESTREAM

Page 34:

Instance-level Configuration of FILESTREAM

Page 35:

Example: FILESTREAM

Page 36:

FILESTREAM

All data access must be transactional

Must use specific APIs for file I/O

Do not edit the files directly!
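
For streaming access, a client first asks SQL Server, inside a transaction, for the logical path of the file and a transaction context. A sketch assuming a hypothetical table dbo.DocumentFiles with a FILESTREAM column named Content:

```sql
BEGIN TRANSACTION;

-- The client passes both values to a streaming API such as
-- OpenSqlFilestream (Win32) or SqlFileStream (.NET)
SELECT Content.PathName()                    AS FilePath,
       GET_FILESTREAM_TRANSACTION_CONTEXT()  AS TransactionContext
FROM dbo.DocumentFiles
WHERE FileName = N'hello.txt';

-- ... the client reads or writes the stream here ...

COMMIT TRANSACTION;
```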

Page 37:

When to Use FILESTREAM

Objects that are being stored are, on average, larger than 1 MB
• Store smaller objects in the database

Fast read access is important

You are using a middle tier for application logic

Page 38:

FileTables

A special, fixed-schema kind of table

Builds on top of existing FILESTREAM capabilities

Store files and documents in the database, but access them from Windows applications as if they were stored in the file system (Win32 API)

Page 39:

FileTables

Hierarchical namespace

Includes file system properties as columns

Preserves full file names

Non-transactional access through the FS
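
A sketch of creating a FileTable. The database, directory, table and constraint names are hypothetical, and FILESTREAM is assumed to be enabled with a FILESTREAM filegroup already in place.

```sql
-- Database level: allow non-transactional access and name the database's directory
ALTER DATABASE DocumentsDb
SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'DocumentsDb');

-- The schema is fixed, so no column list is given
CREATE TABLE dbo.DocumentStore AS FILETABLE
WITH
(
    FILETABLE_DIRECTORY = N'DocumentStore',
    FILETABLE_COLLATE_FILENAME = database_default,
    FILETABLE_STREAMID_UNIQUE_CONSTRAINT_NAME = UQ_DocumentStore_StreamId
);

-- File system properties appear as ordinary columns
SELECT name, file_type, cached_file_size, is_directory,
       file_stream.GetFileNamespacePath() AS RelativePath
FROM dbo.DocumentStore;

-- UNC path of the table's root folder, as seen by Windows applications
SELECT FileTableRootPath(N'dbo.DocumentStore') AS RootPath;
```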

Page 40:

FileTables

Calls to create or change a file or directory through the Windows share are intercepted by a SQL Server component and reflected in the corresponding relational data in the FileTable

Page 41:

Example: FTS over FileTables
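
One way such an example might be set up, as a sketch: it assumes a hypothetical FileTable dbo.DocumentStore whose stream_id unique constraint was named UQ_DocumentStore_StreamId when the table was created, and an existing catalog named DocumentsCatalog.

```sql
-- file_type holds the file extension, which selects the filter for each document
CREATE FULLTEXT INDEX ON dbo.DocumentStore
(
    name        LANGUAGE 1033,
    file_stream TYPE COLUMN file_type LANGUAGE 1033
)
KEY INDEX UQ_DocumentStore_StreamId
ON DocumentsCatalog
WITH (CHANGE_TRACKING = AUTO);

-- Search inside the documents that were copied into the FileTable's folder
SELECT name, file_stream.GetFileNamespacePath() AS RelativePath
FROM dbo.DocumentStore
WHERE CONTAINS(file_stream, N'"full-text"');
```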

Page 42:

FileTables vs FILESTREAM

File and directory hierarchy maintained in the database

Windows application compatibility

Relational access to file attributes

Both are available in all editions

Page 43:

Wrap Up

Advanced searching on character-based data, including documents

FTS setup, components, and queries

FILESTREAM

FileTables

Page 44:

Other Topics

• Document-property search
• Semantic search
• Optimizations
• Query plans and execution traces

Page 45:

References

Posts and presentations by Bob Beauchemin: http://www.sqlskills.com/blogs/bobb/

SQL Server FTS Team Blog: http://blogs.msdn.com/b/sqlfts

SQL Server 2012 Books Online: http://msdn.microsoft.com/en-us/library/cc645577(SQL.110).aspx

Page 46:

Filter Packs

Adobe PDF Filter: http://www.adobe.com/support/downloads/thankyou.jsp?ftpID=4025&fileID=3941

Office 2010 Filters: http://www.microsoft.com/en-us/download/details.aspx?id=17062
