Understanding and Applying Cloud Hybrid...

Post on 12-Jul-2020



Understanding and Applying Cloud Hybrid Search

@jefffried

Jeff Fried CTO, BA Insight

we love hybrid search - it's amazing how fast usage is growing

Jeff Teper @jeffteper

About Jeff Fried

Focused on Search and SharePoint since 2004

Longtime Search Nerd

• CTO, BA Insight

• Senior PM, Microsoft

• VP, FAST

• SVP, LingoMotors

Passionate About

• Search

• SharePoint

• Search-driven applications

• Information Strategy

Blog: BAinsight.com/blog

TechNet Column: “A View from the Crawlspace”

jeff.fried@bainsight.com

About BA Insight

– Connectivity

– Applications

– Classification

– Analytics

KCTCS (background)

Search is not stationary

Demo


Why Hybrid SharePoint?

The Evolution of SharePoint:

HYBRID Management Extensibility Experiences

| Server

Experiences Management Extensibility

| Server | Server

HYBRID

Team

Sites

Portals

Enterprise

Content Mngt

BI

Search Provides a Unified View

SharePoint 2013/2016 Search Architecture

Web Service (CEWS)

“Classic” Hybrid Search is Federated

not a single result set OOB

Cloud Hybrid Search

Benefits of Cloud Hybrid Search

1) Simpler, easier, and less costly to run search

2) Makes finding content easy, wherever the content lives

SharePoint Server

(On-premises or Hosted)

Office 365

SharePoint Online Content

OneDrive for Business Content

SharePoint Content

Cloud Hybrid Search

Case Study: Split Users with SharePoint

Support

Sales & Marketing

Knowledge Articles

Fileshares

OneDrive

Support forum

SPO

Search Farm

SP 2013 content

SP 2010 content

On-premises

Office 365

SPO content

SP 2013/2016

Cloud SSA

Setting up Cloud Hybrid Search


Use search verticals with Cloud Hybrid Search

SharePoint Online

Custom result source using Local SharePoint results plus a filter which excludes results from on-premises

TIP: Can be used during validation of hybrid search in the production tenant.

Result source query:

{searchTerms} NOT(IsExternalContent:1)

Result Sources are your friend

The Support Search vertical only searches sites that are relevant to the Support team.

It uses Local SharePoint results plus a filter on which sites to include in the search results

Result source query:

{searchTerms} (

Path:"http://sp2010" OR

Path:"file://fileshare" OR

Path:"http://demohybrid.../../supportforum")
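The two result-source queries above follow one pattern: the `{searchTerms}` placeholder, an optional OR-group of `Path:` restrictions, and an optional filter that drops on-premises (external) items. A minimal sketch of assembling such a query template is shown here. The function name `build_result_source_query` and its parameters are hypothetical helpers for illustration, not part of SharePoint; only `{searchTerms}`, `Path:` and `IsExternalContent` come from the slides.

```python
def build_result_source_query(paths=(), exclude_external=False):
    """Assemble a result-source query template (hypothetical helper).

    paths            -- URL prefixes to include, e.g. "http://sp2010"
    exclude_external -- append the NOT(IsExternalContent:1) filter from the slides
    """
    parts = ["{searchTerms}"]
    if paths:
        for p in paths:
            # A double quote inside a path would break the quoted KQL value.
            if '"' in p:
                raise ValueError(f"path must not contain a double quote: {p!r}")
        clauses = " OR ".join(f'Path:"{p}"' for p in paths)
        parts.append(f"({clauses})")
    if exclude_external:
        parts.append("NOT(IsExternalContent:1)")
    return " ".join(parts)


if __name__ == "__main__":
    # Support vertical: only the listed sources.
    print(build_result_source_query(["http://sp2010", "file://fileshare"]))
    # Online-only vertical: exclude content crawled on-premises.
    print(build_result_source_query(exclude_external=True))
```

The resulting string would still be pasted into the result source's query transform by an administrator; the sketch only keeps the quoting and grouping consistent.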

SharePoint Online Support Search

Demo


Single node topology

VM: Crawler, CPC (unused), APC (unused), Indexer (unused), QPC

Multi-node topology

VM: Crawler, QPC

VM: Crawler, CPC (unused), APC (unused), Indexer (unused), QPC

Reduce your footprint

Volume of content (indexable items)   Pattern               Servers: on-prem search farm   Servers: Cloud Hybrid Search

0-10 million items                    small                 4 App + 2 DB                   1 or 2

10-40 million items                   medium                12 App + 2 DB                  2

40-100 million items                  large                 28 App + 4 DB                  2

400 million items                     XL example (SP2016)   86 App + 4 DB                  2 or 3
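The sizing table above can be read as a simple lookup from item count to server footprint. The sketch below encodes only the three ranged rows; the 400-million "XL example" row is a single data point, so anything above 100 million items returns `None` rather than a guess. Treating each range's upper bound as inclusive is an assumption, and `footprint_for` is a hypothetical name.

```python
# (upper bound in items, pattern, on-prem search farm, Cloud Hybrid Search servers)
# Values copied from the sizing table; inclusive upper bounds are an assumption.
FOOTPRINT_TABLE = [
    (10_000_000, "small", "4 App + 2 DB", "1 or 2"),
    (40_000_000, "medium", "12 App + 2 DB", "2"),
    (100_000_000, "large", "28 App + 4 DB", "2"),
]


def footprint_for(item_count):
    """Return the table row matching item_count, or None if outside the ranges."""
    if item_count < 0:
        raise ValueError("item_count must be non-negative")
    for upper, pattern, on_prem, cloud_hybrid in FOOTPRINT_TABLE:
        if item_count <= upper:
            return {
                "pattern": pattern,
                "on_prem_servers": on_prem,
                "cloud_hybrid_servers": cloud_hybrid,
            }
    return None  # above 100M items the table only gives one example, not a range


if __name__ == "__main__":
    print(footprint_for(25_000_000))
```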

Item Limits and Pricing

Licensing: 1M items of external content in index for every 1TB storage in O365

1TB included by default

+ 0.5 GB per licensed O365 user

No limit on number of items from O365 in the index

Default throttling at 20M external items; current threshold at 25M

2000 users x 0.5 GB = 1TB

+ 1TB default = 2 TB total

-> 2M external items indexed

+ Can also buy the “Office 365 Extra File Storage” Add-on

$0.20/GB/Month = $200/TB/Month = $200/M items/Month

50,000 users x 0.5 GB = 25TB

+ 1TB default = 26 TB total

-> 26M external items indexed
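The licensing arithmetic above can be written as a short calculation. This is a sketch of the rule as stated on the slide (1M external items per 1 TB of pooled storage, 1 TB base plus 0.5 GB per licensed user, 1 TB taken as 1000 GB to match the slide's figures); the function names are hypothetical, and the throttling thresholds mentioned on the slide are not modeled.

```python
def external_item_quota(licensed_users, base_tb=1.0, gb_per_user=0.5,
                        gb_per_tb=1000, items_per_tb=1_000_000):
    """Number of external items the tenant may index under the slide's rule."""
    pooled_tb = base_tb + licensed_users * gb_per_user / gb_per_tb
    return round(pooled_tb * items_per_tb)


def addon_cost_per_month(extra_items, usd_per_gb=0.20,
                         gb_per_tb=1000, items_per_tb=1_000_000):
    """Monthly cost of extra storage needed to index extra_items more items."""
    extra_gb = extra_items / items_per_tb * gb_per_tb
    return extra_gb * usd_per_gb


if __name__ == "__main__":
    print(external_item_quota(2_000))       # slide example: 2 TB total
    print(external_item_quota(50_000))      # slide example: 26 TB total
    print(addon_cost_per_month(1_000_000))  # slide: $200 per million items
```

With the slide's inputs, 2,000 users give 2,000,000 items and 50,000 users give 26,000,000 items, matching the worked examples.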

SharePoint 2016 Hybrid

Cloud Hybrid

Search

User Profiles

Following

Extranet

Compliance (DLP/eDiscovery)

Config

Experience

Built on Search

Advantages

Disadvantages

Cloud SSA Pro/Con versus on-prem

External Content

(on-premises and/or

in the cloud)

SharePoint Server

(On-premises or Hosted)

Office 365

SharePoint Online Content

OneDrive for Business Content

Connectors

SharePoint Content

Adding External Content

Cloud Hybrid Search

Also drives:

• Office Graph (Delve, …)

• Compliance (DLP, …)

Connectors to MANY Enterprise Systems

ERP and Portal Systems

External Content in O365 UX

Unified view across all content: on-premises and online, inside and outside SharePoint

DLP Sensitive Data Search works with hybrid

Search for sensitive data across on-premises and SharePoint Online

All Built-in sensitive types

Identification and export

Extends to data in OneDrive

Sensitive Information type detection through KQL searches

Get instant statistics

Preview & export results
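Sensitive-information detection through KQL relies on a `SensitiveType` property restriction. The sketch below builds such a query string, assuming the documented form `SensitiveType:"<type name>|<count range>"`, with the count range optional. The helper name `sensitive_type_query` is hypothetical, and the type name must match one of the built-in sensitive information types in the tenant.

```python
def sensitive_type_query(type_name, min_count=None, max_count=None):
    """Build a KQL restriction for a sensitive information type (sketch).

    min_count / max_count -- optional bounds on the number of matches per
    document; both must be given together to form a "min..max" range.
    """
    if '"' in type_name or "|" in type_name:
        raise ValueError("type_name must not contain '\"' or '|'")
    if (min_count is None) != (max_count is None):
        raise ValueError("give both min_count and max_count, or neither")
    value = type_name
    if min_count is not None:
        if min_count > max_count:
            raise ValueError("min_count must not exceed max_count")
        value += f"|{min_count}..{max_count}"
    return f'SensitiveType:"{value}"'


if __name__ == "__main__":
    print(sensitive_type_query("Credit Card Number"))
    print(sensitive_type_query("Credit Card Number", 5, 25))
```

Because the hybrid index holds both on-premises and SharePoint Online items, the same query string would apply to content from either side.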

Current Caveats:

1) don’t see thumbnails, just file icons

2) Have to query for it to show up

Case Study: Cloud SSA, external content

Large global company

in materials science

DirSync SP 2007/2010/2013 Fileshares BCS

Cloud SSA

SPO

Search Index


Logical architecture: crawling

Corporate

network

Office 365

3rd Party Connectors

External Content

(on-premises and/or

in the cloud)

Custom

Processing

CEWS

Bottlenecks:

1) Source systems

2) Content Processing

3) Indexer

….

External Content

(on-premises and/or

in the cloud)

Bottlenecks:

1) Uplink

2) Source systems

….


Performance

500K items crawled on an Azure D3

50 DPS 100 DPS

1 hour
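Crawl throughput in documents per second (DPS) translates directly into crawl duration. The sketch below is plain arithmetic under the assumption of a constant sustained rate; it does not reproduce the chart on this slide, whose exact readings are not recoverable from the transcript. The function name `crawl_hours` is hypothetical.

```python
def crawl_hours(item_count, docs_per_second):
    """Hours needed to crawl item_count items at a constant docs_per_second."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return item_count / docs_per_second / 3600


if __name__ == "__main__":
    # The two rates named on the slide, applied to the 500K-item corpus.
    for dps in (50, 100):
        print(f"{dps} DPS: {crawl_hours(500_000, dps):.2f} hours")
```

In practice the sustained rate is bounded by whichever bottleneck listed earlier (uplink or source systems) is slowest, so a measured rate is a better input than a nominal one.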

SCS under the hood

Crawler

Content

Indexing

API

Blob store

Document state table

Work queues

Backend

API

Index/Graph

On-Premises content source

Search farm

Azure

Broker

Crawler

Content

SPO content source

What is pushed to the SCS Endpoint?

SharePoint 2013/ 2016

FileShares

Logical architecture: query

Corporate network

SP 2013

Cloud SSA

Office 365

SPO

Search Index

1) Jaden issues a query from Office 365. Her user token contains her online identity and group memberships.

2a) Jaden issues a query from a site on-premises. This sends her on-premises claims to SPO.

2b) Her user token gets rehydrated with her online claims as she is authenticated against Office 365.

Customizations with Cloud Hybrid Search

Cloud SSA

SUPPORTED

– Custom IFilter

– BCS connectors

– Partner connectors

NOT SUPPORTED

• Content that requires custom security trimming

SCS/O365

SUPPORTED

– Tenant level schema mapping

– Query rules

– Result sources

NOT SUPPORTED

• Site collection level schema mapping

• Custom security trimming

• Custom entity extraction

• Content enrichment web service

Issues with Cloud Hybrid Search (1)

Cloud Hybrid Search "annoyances"

Performance Characteristics

• slower query latency for on-prem queries against Cloud SSA

SharePoint Online Limitations

• no synonyms

• no site-level schema

• no full trust code access

Hybrid Administration Weaknesses

• clunky metadata mapping

• can't remove on-premises search results from Cloud SSA

• trickier to test & debug crawls

• can't reset index from Cloud SSA

Be aware of these & compensate for them

And it’s getting better:

(Fixed in August PU)

(Semi-addressed in June PU)

Should I run index reset?

NO!

DeleteAllCloudHybridSearchContent()

https://blogs.technet.microsoft.com/beyondsharepoint/2016/07/07/cloud-hybrid-search-service-application-removing-items-from-the-office-365-search-index/

Issues with Cloud Hybrid Search (2)


Content Enrichment

• no CEWS

• no Entity Extraction

Security

• no Custom Security Trimming

• can't crawl across multiple domains

• can't crawl SP in Classic Auth Mode

Data Sovereignty

• export-restricted content can't be put in O365 index

Limitations of Cloud SSA

External Content

(on-premises and/or

in the cloud)

SharePoint Server

(On-premises or Hosted)

SPO Content

OneDrive Content

Connectors

SharePoint Content

Connector

Framework

Office 365

AutoClassifier

(app version)

CEWS

Custom

Processing

Case study: Content Enrichment

Content

Cloud SSA

Connector Framework

Indexing

Connectors

Smart Pipeline

AutoClassifier

Custom Stage A

Custom Stage B

Custom Stage C

Online

On-Prem

Cloud Hybrid Search under the covers

Security = identity sync + ACL mapping

Cloud SSA

Crawl

Parse

SCS

ACL Map Process

Blob store

queue

Directory Synchronization

SID S-1-5-21-1212121212-1212121212-1212

jaden@corp.hybridsearch.com

msOnline-OnPremiseSecurityIdentifier

S-1-5-21-1212121212-1212121212-1212

PUID PUID-XXXX-XXXXXXXXXX

Mapping of Access Control Lists

Allow: S-1-5-21-1212121212-1212121212-1212 -> Allow: PUID-XXXX-XXXXXXXXXX

• User SIDs are mapped to PUIDs

• Group SIDs are mapped to Object IDs

• «Everyone» and «Authenticated users» are mapped to «Everyone except external users»

Only AD Users and Groups,

Only from one domain
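The ACL mapping rules above can be sketched as a small translation step. This is an illustration of the three rules on the slide, not the service's actual implementation: the lookup tables stand in for what directory synchronization provides, and `map_acl`, the group SID, and the group object ID are hypothetical. Principals that cannot be mapped (for example, from a second domain) are returned separately, since the slide notes that only AD users and groups from one domain are handled.

```python
EVERYONE_EXCEPT_EXTERNAL = "Everyone except external users"
BROAD_PRINCIPALS = {"Everyone", "Authenticated users"}


def map_acl(on_prem_acl, user_sid_to_puid, group_sid_to_object_id):
    """Translate on-premises ACL entries to online identities (sketch).

    Returns (mapped, unmapped): mapped online principals in first-seen order
    without duplicates, and the on-premises entries that had no mapping.
    """
    mapped, unmapped = [], []
    for principal in on_prem_acl:
        if principal in BROAD_PRINCIPALS:
            target = EVERYONE_EXCEPT_EXTERNAL
        elif principal in user_sid_to_puid:
            target = user_sid_to_puid[principal]        # user SID -> PUID
        elif principal in group_sid_to_object_id:
            target = group_sid_to_object_id[principal]  # group SID -> Object ID
        else:
            unmapped.append(principal)
            continue
        if target not in mapped:
            mapped.append(target)
    return mapped, unmapped


if __name__ == "__main__":
    users = {"S-1-5-21-1212121212-1212121212-1212": "PUID-XXXX-XXXXXXXXXX"}
    groups = {"S-1-5-21-1212121212-1212121212-5555": "hypothetical-object-id"}
    acl = ["S-1-5-21-1212121212-1212121212-1212", "Everyone", "S-1-5-21-9-9-9"]
    print(map_acl(acl, users, groups))
```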

Case Study: Crawling Cross-Domain

A global single index solution

Cloud SSA

Cloud SSA

Cloud SSA

Cloud SSA

Cloud SSA

BUT export-restricted content can’t be in the global index

Issues with Cloud Hybrid Search OOB

Limitations of Cloud SSA

Content Enrichment

• no CEWS

• no Entity Extraction

Security

• no Custom Security Trimming

• can't crawl across multiple domains

• can't crawl SP in Classic Auth Mode

Data Sovereignty

• export-restricted content can't be put in O365 index

BA Insight Solution

Connector Framework

AutoClassifier

Connector Framework

can 'map down' to AD groups

can 'map across' cross-domain

can crawl and map security

Federator

Key Considerations for Hybrid: Workloads, Environment, Data, Customizations

Availability of features Online versus On-Premises on particular workloads

Significant investments in customization of On-Premises workloads

Concerns over global network performance with remote sites

Regulatory considerations

Manageability concerns

Contact: Jeff.Fried@BAinsight.com

www.BAinsight.com

Questions