OPEN SOURCE SEARCH WITH APACHE LUCENE/SOLR · 2018. 11. 30. · // entry object was loaded.}; //...

1
kGuard lockGuard(&d_lock); mInvalidated = static_cast<int>(d_cache.size()); ar(); nvalidated; S <const Calendar> e::lookupCalendar(const char *calendarName) RT(calendarName); kGuard lockGuard(&d_lock); or iter = d_cache.find(calendarName); cache.end()) { TimeOutFlag pired(d_timeOut, iter->second.loadTime())) { ter->second.get(); e.erase(iter); hared_ptr<const Calendar>(); age namespace rprise namespace ------------------------------------------------------------ ‘calendar’ // that was loaded at the specified ‘loadTime’ using the , d_hasTimeOutFlag(false) , d_lock() PackedCalendar packedCalendar; // if (d_loader_p->load(&packedCalend return bsl::shared_ptr<const Calen // Create out-of-place calendar that w Calendar *calendarPtr = new (*d_allo d_a CalendarCache_Entry entry(calendar BSLS_ASSERT(static_cast<bsl::time_t // Insert newly-loaded calendar into c bsls::BslLockGuard lockGuard(&d_loc ConstCacheIterator iter = d_cache.fin // Here, we assume that the time elap // loading of the calendar is insignific // will simply return the entry in the c if (iter != d_cache.end()) { return iter->second.get(); ze()); me) const { ETURN --- License”); License. // MANIPULATORS CalendarCache_Entry& operator=(const CalendarCache_Entry&); const bsls::TimeInterval& timeout, bslma::Allocator *basicAllocator) BSLS_ASSERT(calendarName); bsls::BslLockGuard lockGuard(&d_lock); // -------------------------- // Copyright 2015 Blo // Licensed under the // you may not use thi // You may obtain a co http://www.apach // Unless required by // distributed under th // WITHOUT WARRAN express or implied. // See the License for // limitations under th // -------------------------- // bdlt_calendarcache #include <bdlt_calend #include <bsls_ident.h BSLS_IDENT_RCSID(b #include <bdlt_calend #include <bdlt_date.h #include <bdlt_packe #include <bslma_defa #include <bsls_assert. #include <bsl_climits. #include <bsl_function namespace Bloomber namespace bdlt { // STATIC HELPER FU bool hasExpired(const const bsl::tim // Return ‘true’ if at rd lockGuard(&d_lock); alidated = static_cast<int>(d_cache.size()); dated; nst Calendar> okupCalendar(const char *calendarName) lendarName); rd lockGuard(&d_lock); r = d_cache.find(calendarName); e.end()) { OutFlag d_timeOut, iter->second.loadTime())) { second.get(); se(iter); d_ptr<const Calendar>(); namespace e namespace ----------------------------------------------------- Bloomberg Finance L.P. // that was loaded at the specified ‘loadTime’ using the specified // ‘allocator’. The behavior is undefined unless ‘calendar’ , d_lock() , d_allocator_p(bslma::Default::allocator(basicAllocator)) { // tem if (d_loader_p->load(&packedCalendar, ca return bsl::shared_ptr<const Calendar> // Create out-of-place calendar that will b Calendar *calendarPtr = new (*d_allocato d_alloca CalendarCache_Entry entry(calendarPtr, b BSLS_ASSERT(static_cast<bsl::time_t>(-1) // Insert newly-loaded calendar into cache bsls::BslLockGuard lockGuard(&d_lock); ConstCacheIterator iter = d_cache.find(ca // Here, we assume that the time elapsed // loading of the calendar is insignificant c // will simply return the entry in the cache return iter->second.get(); } Lucene, Solr, Ant, Ivy, ZooKeeper, and JIRA are either trademarks or registered trademarks of their respective owners. CHRISTINE POERSCHKE, BLOOMBERG SHARD 1 REPLICA N SHARD 1 REPLICA 1 SHARD N REPLICA 1 REPLICA 2 REPLICA N SHARD 1 REPLICA 2 DOCUMENTS DOCUMENTS SHARD N SHARD LEADER SHARD N OVERSEER INDEX OF 500 MILLION STORIES 1 .5 MILLION SAVED SEARCHES Alerts in 100ms More. Better. Faster. 325K+ Subscribers 1 Million Stories PUBLISHED EACH DAY 500 Stories PER SECOND 9 Million Searches PER DAY RESPONSE TIME 1 80ms Available for Search in ~100ms OPEN SOURCE SEARCH WITH APACHE LUCENE/SOLR WHAT IS APACHE? • Apache Software Foundation • Established 1999 as a charitable organization • Mission: to provide (free) software for the public good • 350+ open source projects • Meritocracy, collaboration, project independence Apache License used by many open source projects (including non-Apache projects) WHAT ARE LUCENE AND SOLR? Apache Lucene is a search engine library written in Java. Apache Solr is a search platform, built on top of the Lucene library. A PICTURE OF A SOLR CLOUD A Solr Cloud consists of multiple Solr instances on different machines, either virtual machines or ‘bare metal’. The overall state of the cloud (the cluster state) is kept in Apache ZooKeeper™. A Collection in the Cloud is divided into Shards with each shard hosting a subset of Documents. Shards are replicated within the Cloud so that there is more than one copy of each shard’s data. One of the replicas of a given shard will be elected to act as Shard Leader. The Cloud scales via replication and sharding: to handle more search queries, add more replicas on more machines; to have capacity for more documents, split your collection into more shards and house the extra shards on extra machines. NEWS SEARCH AT BLOOMBERG OUR CHALLENGES Load, latency, stability Very spikey flow of news and user load ‘Always on’, no downtime, 24/7/365 Indexing and searching News stories and searches in 40+ languages Content from 125K+ sources, from tweets to 100+ page-long research reports Complex user-specific privileging/ACL model Arbitrarily complex boolean queries Ticker lists e.g. “I want news on all the companies of interest to me.” Custom components Indexing, parsing, searching, postfiltering CONTRIBUTING BACK 15+ Solr/Lucene contributors including 3 committers (I am one of them) Fixes or new features in 15+ releases, such as o Lucene’s SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr (SOLR-5730 and SOLR-8621) o Streaming Expressions (SOLR-7377) o Learning To Rank integration into Solr (SOLR-8542) COMMUNITY User mailing list: 3500+ subscribers, ~250 emails per week Dev mailing list: 800+ subscribers, ~100 emails per day Widely used in all sorts of sites and applications, from A(pple) to Z(appos.com) Many, sometimes competing, companies and individuals Talks and meetup groups in cities around the world New people welcome DEVELOPMENT • Task and issue tracking via Apache JIRA • Source code in Apache Git apache repository Building and dependency management via Apache Ant and Apache Ivy Multi-platform build, continuous integration and other services hosted by Apache EXPECTATIONS It’s impossible to follow and understand everything Community of professionals who volunteer their time are enthusiastic, but busy (just like you) No one expects big patches; big code patches are rare, just start somewhere … o community.apache.org o wiki.apache.org/lucene-java/HowToContribute o wiki.apache.org/solr/HowToContribute LET’S BE REAL Intellectual property Understand your employer’s open source policies and guidelines Respect open source projects’ licenses Balance Work Open source work Life in general (I keep honeybees in my spare time) INSPIRATION Everyone is unique, everyone’s contributions are unique, find or create your place and role techatbloomberg.com/post-topic/open-source RESOURCES lucene.apache.org/solr/quickstart.html • Apache Solr Reference Guide • Solr Community Wiki • Books and blogs SCAN HERE TO DOWNLOAD THIS POSTER.

Transcript of OPEN SOURCE SEARCH WITH APACHE LUCENE/SOLR · 2018. 11. 30. · // entry object was loaded.}; //...

Page 1: OPEN SOURCE SEARCH WITH APACHE LUCENE/SOLR · 2018. 11. 30. · // entry object was loaded.}; // -----// class Calendar-Cache_Entry // -----// CREATORS CalendarCache_Entry::Calendar-Cache_Entry():

// entry object was loaded.};

// ------------------------- // class Calendar-Cache_Entry // -------------------------

// CREATORSCalendarCache_Entry::Calendar-Cache_Entry(): d_ptr(), d_loadTime(0){}

CalendarCache_Entry::Calendar-Cache_Entry(Calendar *calen-dar, bsl::-time_t loadTime, bslma::Al-locator *allocator): d_ptr(calendar, allocator), d_loadTime(loadTime){ BSLS_ASSERT(calendar); BSLS_ASSERT(allocator);

, d_allocator_p(bslma::Default::allo-cator(basicAllocator)){ BSLS_ASSERT(loader); BSLS_ASSERT(bsls::TimeInter-val(0) <= timeout); BSLS_ASSERT(timeout <= bsls::TimeInterval(INT_MAX));}

CalendarCache::~CalendarCache(){}

// MANIPULATORSbsl::shared_ptr<const Calendar>CalendarCache::getCalendar(const char *calendarName){ BSLS_ASSERT(calendarName);

{ bsls::BslLockGuard lockGuard(&d_lock);

CacheIterator iter = d_cache.find(calendarName);

}

} // close unnamed namespace

// ========================= // class Calendar-Cache_Entry // =========================

class CalendarCache_Entry { // This class defines the type of objects that are inserted into the // calendar cache. Each entry contains a shared pointer to a read-only // calendar and the time at which that calendar was loaded. Note that an // explicit allocator is *required* to create a entry object.

// DATA bsl::shared_ptr<const Calendar> d_ptr; // shared pointer to

// MANIPULATORSCalendarCache_Entry& Calendar-Cache_Entry::operator=( const CalendarCache_Entry& rhs){ d_ptr = rhs.d_ptr; d_loadTime = rhs.d_loadTime;

return *this;}

// ACCESSORSbsl::shared_ptr<const Calendar> CalendarCache_Entry::get() const{ return d_ptr;}

bsl::time_t CalendarCache_Entry::-loadTime() const{ return d_loadTime;}

// ------------------- // class Calendar-Cache

PackedCalendar packedCalen-dar; // temporary, so use default allocator

if (d_loader_p->load(&packed-Calendar, calendarName)) { return bsl::shared_ptr<const Calendar>(); // RETURN }

// Create out-of-place calendar that will be managed by ‘bsl::shared_ptr’.

Calendar *calendarPtr = new (*d_allocator_p) Calendar(packed-Calendar, d_allocator_p);

CalendarCache_Entry entry(cal-endarPtr, bsl::time(0), d_alloca-tor_p);

BSLS_ASSERT(static_cast<bsl::-time_t>(-1) != entry.loadTime());

CalendarCache_Entry(const CalendarCache_Entry& original); // Create a cache entry object having the value of the specified // ‘original’ object.

~CalendarCache_Entry(); // Destroy this cache entry object.

// MANIPULATORS CalendarCache_Entry& operator=(const Calendar-Cache_Entry&); // Assign to this cache entry object the value of the specified ‘rhs’ // object, and return a reference providing modifiable access to this // object.

// ACCESSORS bsl::shared_ptr<const Calendar> get() const; // Return a shared pointer providing non-modifiable access to

, d_timeOut(0), d_hasTimeOutFlag(false), d_lock(), d_allocator_p(bslma::Default::allo-cator(basicAllocator)){ BSLS_ASSERT(loader);}

CalendarCache::CalendarCache(-CalendarLoader *loader, const bsls::Ti-meInterval& timeout, bslma::Allocator *basicAllocator)

// We have to supply ‘bsl::less<key>()’ because ‘bsl::map’ does not have a// constructor that takes only an allocator.

: d_cache(bsl::less<bsl::string>(), basicAllocator), d_loader_p(loader), d_timeOut(static_cast<bsl::-time_t>(timeout.seconds())), d_hasTimeOutFlag(true)

// bdlt_calendarcache.cpp -*-C++-*-#include <bdlt_calendarcache.h>

#include <bsls_ident.h>BSLS_IDENT_RCSID(bdlt_calendar-cache_cpp,”$Id$ $CSID$”)

#include <bdlt_calendarloader.h>#include <bdlt_date.h> // for testing only#include <bdlt_packedcalendar.h>

#include <bslma_default.h>

#include <bsls_assert.h>

#include <bsl_climits.h> // ‘INT_MAX’#include <bsl_functional.h>

namespace BloombergLP {namespace bdlt {

namespace {

// STATIC HELPER FUNCTIONSstatic

return entry.get();}

int CalendarCache::invalidate(const char *calendarName){ BSLS_ASSERT(calendarName);

bsls::BslLockGuard lockGuard(&d_lock);

CacheIterator iter = d_cache.find(calendarName);

if (iter != d_cache.end()) { d_cache.erase(iter);

return 1; // RETURN }

return 0;}

int CalendarCache::invalidateAll(){ bsls::BslLockGuard lockGuard(&d_lock);

#include <bsl_functional.h>

namespace BloombergLP {namespace bdlt {

namespace {

// STATIC HELPER FUNCTIONSstaticbool hasExpired(const bsl::time_t& interval, const bsl::time_t& loadTime) // Return ‘true’ if at least the specified time ‘interval’ has elapsed // since the specified ‘loadTime’, and ‘false’ otherwise.{ const bsl::time_t now = bsl::time(0); BSLS_ASSERT(static_cast<bsl::-time_t>(-1) != now);

const bsl::time_t elapsedTime = now - loadTime;

return elapsedTime >= interval ?

CalendarCache_Entry::Calendar-Cache_Entry(Calendar *calen-dar, bsl::-time_t loadTime, bslma::Al-locator *allocator): d_ptr(calendar, allocator), d_loadTime(loadTime){ BSLS_ASSERT(calendar); BSLS_ASSERT(allocator);}

CalendarCache_Entry::Calendar-Cache_Entry(const Calendar-Cache_Entry& original): d_ptr(original.d_ptr), d_loadTime(original.d_loadTime){}

CalendarCache_Entry::~Calendar-Cache_Entry(){}

// MANIPULATORS

{ bsls::BslLockGuard lockGuard(&d_lock);

CacheIterator iter = d_cache.find(calendarName);

if (iter != d_cache.end()) {

if (!d_hasTimeOutFlag || !hasExpired(d_timeOut, iter->second.loadTime())) { return iter->second.get(); // RETURN } else { d_cache.erase(iter); } } }

// Load calendar identified by ‘calendarName’.

PackedCalendar packedCalen-dar; // temporary, so use default allocator

, d_allocator_p(bslma::Default::allocator(basicAllocator)){ BSLS_ASSERT(loader); BSLS_ASSERT(bsls::TimeInterval(0) <= timeout); BSLS_ASSERT(timeout <= bsls::TimeInterval(INT_MAX));}

CalendarCache::~CalendarCache(){}

// MANIPULATORSbsl::shared_ptr<const Calendar>CalendarCache::getCalendar(const char *calendarName){ BSLS_ASSERT(calendarName);

{ bsls::BslLockGuard lockGuard(&d_lock);

CacheIterator iter = d_cache.find(calendarName);

if (iter != d_cache.end()) {

if (!d_hasTimeOutFlag || !hasExpired(d_timeOut, iter->second.loadTime())) { return iter->second.get(); // RETURN } else { d_cache.erase(iter); } } }

// Load calendar identified by ‘calendarName’.

PackedCalendar packedCalendar; // temporary, so use default allocator

if (d_loader_p->load(&packedCalendar, calendarName)) { return bsl::shared_ptr<const Calendar>(); // RETURN }

// Create out-of-place calendar that will be managed by ‘bsl::shared_ptr’.

Calendar *calendarPtr = new (*d_allocator_p) Calendar(packedCalendar, d_allocator_p);

{ bsls::BslLockGuard lockGuard(&d_lock);

const int numInvalidated = static_cast<int>(d_cache.size());

d_cache.clear();

return numInvalidated;}

// ACCESSORSbsl::shared_ptr<const Calendar>CalendarCache::lookupCalendar(const char *calendarName) const{ BSLS_ASSERT(calendarName);

bsls::BslLockGuard lockGuard(&d_lock);

CacheIterator iter = d_cache.find(calendarName);

if (iter != d_cache.end()) {

if (!d_hasTimeOutFlag || !hasExpired(d_timeOut, iter->second.loadTime())) { return iter->second.get(); // RETURN } else { d_cache.erase(iter); } }

return bsl::shared_ptr<const Calendar>();}

} // close package namespace} // close enterprise namespace

// ----------------------------------------------------------------------------// Copyright 2015 Bloomberg Finance L.P.//// Licensed under the Apache License, Version 2.0 (the “License”);// you may not use this file except in compliance with the License.// You may obtain a copy of the License at//// http://www.apache.org/licenses/LICENSE-2.0//// Unless required by applicable law or agreed to in writing, software

}

} // close unnamed namespace

// ========================= // class CalendarCache_Entry // =========================

class CalendarCache_Entry { // This class defines the type of objects that are inserted into the // calendar cache. Each entry contains a shared pointer to a read-only // calendar and the time at which that calendar was loaded. Note that an // explicit allocator is *required* to create a entry object.

// DATA bsl::shared_ptr<const Calendar> d_ptr; // shared pointer to // out-of-place instance

bsl::time_t d_loadTime; // time when calendar was // loaded

public: // CREATORS CalendarCache_Entry(); // Create an empty cache entry object. Note that an empty cache entry // is never actually inserted into the cache.

CalendarCache_Entry(Calendar *calendar, bsl::time_t loadTime, bslma::Allocator *allocator); // Create a cache entry object for managing the specified ‘calendar’ // that was loaded at the specified ‘loadTime’ using the specified // ‘allocator’. The behavior is undefined unless ‘calendar’ uses // ‘allocator’ to obtain memory.

CalendarCache_Entry(const CalendarCache_Entry& original); // Create a cache entry object having the value of the specified // ‘original’ object.

~CalendarCache_Entry(); // Destroy this cache entry object.

// MANIPULATORSCalendarCache_Entry& CalendarCache_Entry::operator=( const CalendarCache_Entry& rhs){ d_ptr = rhs.d_ptr; d_loadTime = rhs.d_loadTime;

return *this;}

// ACCESSORSbsl::shared_ptr<const Calendar> CalendarCache_Entry::get() const{ return d_ptr;}

bsl::time_t CalendarCache_Entry::loadTime() const{ return d_loadTime;}

// ------------------- // class CalendarCache // -------------------

// CREATORSCalendarCache::CalendarCache(CalendarLoader *loader, bslma::Allocator *basicAllocator)

// We have to supply ‘bsl::less<key>()’ because ‘bsl::map’ does not have a// constructor that takes only an allocator.

: d_cache(bsl::less<bsl::string>(), basicAllocator), d_loader_p(loader), d_timeOut(0), d_hasTimeOutFlag(false), d_lock(), d_allocator_p(bslma::Default::allocator(basicAllocator)){ BSLS_ASSERT(loader);}

CalendarCache::CalendarCache(CalendarLoader *loader, const bsls::TimeInterval& timeout, bslma::Allocator *basicAllocator)

// We have to supply ‘bsl::less<key>()’ because ‘bsl::map’ does not have a// constructor that takes only an allocator.

PackedCalendar packedCalendar; // temporary, so use default allocator

if (d_loader_p->load(&packedCalendar, calendarName)) { return bsl::shared_ptr<const Calendar>(); // RETURN }

// Create out-of-place calendar that will be managed by ‘bsl::shared_ptr’.

Calendar *calendarPtr = new (*d_allocator_p) Calendar(packedCalendar, d_allocator_p);

CalendarCache_Entry entry(calendarPtr, bsl::time(0), d_allocator_p);

BSLS_ASSERT(static_cast<bsl::time_t>(-1) != entry.load-Time());

// Insert newly-loaded calendar into cache if another thread hasn’t done so // already.

bsls::BslLockGuard lockGuard(&d_lock);

ConstCacheIterator iter = d_cache.find(calendarName);

// Here, we assume that the time elapsed between the last check and the // loading of the calendar is insignificant compared to the timeout, so we // will simply return the entry in the cache if it has been inserted by // another thread.

if (iter != d_cache.end()) { return iter->second.get(); // RETURN }

d_cache[calendarName] = entry;

return entry.get();}

int CalendarCache::invalidate(const char *calendarName){ BSLS_ASSERT(calendarName);

bsls::BslLockGuard lockGuard(&d_lock);

{ bsls::BslLockGuard lockGuard(&d_lock);

const int numInvalidated = static_cast<int>(d_cache.size());

d_cache.clear();

return numInvalidated;}

// ACCESSORSbsl::shared_ptr<const Calendar>CalendarCache::lookupCalendar(const char *calendarName) const{ BSLS_ASSERT(calendarName);

bsls::BslLockGuard lockGuard(&d_lock);

CacheIterator iter = d_cache.find(calendarName);

if (iter != d_cache.end()) {

if (!d_hasTimeOutFlag || !hasExpired(d_timeOut, iter->second.loadTime())) { return iter->second.get(); // RETURN } else { d_cache.erase(iter); } }

return bsl::shared_ptr<const Calendar>();}

} // close package namespace} // close enterprise namespace

// ----------------------------------------------------------------------------// Copyright 2015 Bloomberg Finance L.P.//// Licensed under the Apache License, Version 2.0 (the “License”);// you may not use this file except in compliance with the License.// You may obtain a copy of the License at//// http://www.apache.org/licenses/LICENSE-2.0//// Unless required by applicable law or agreed to in writing, software// distributed under the License is distributed on an “AS IS” BASIS,// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.// See the License for the specific language governing permissions and// limitations under the License.// ----------------------------- END-OF-FILE ----------------------------------

// bdlt_calendarcache.cpp -*-C++-*-#include <bdlt_calendarcache.h>

#include <bsls_ident.h>BSLS_IDENT_RCSID(bdlt_calendarcache_cpp,”$Id$ $CSID$”)

}

} // close unnamed namespace

// ========================= // class CalendarCache_Entry // =========================

class CalendarCache_Entry { // This class defines the type of objects that are inserted into the // calendar cache. Each entry contains a shared pointer to a read-only // calendar and the time at which that calendar was loaded. Note that an // explicit allocator is *required* to create a entry object.

// DATA bsl::shared_ptr<const Calendar> d_ptr; // shared pointer to // out-of-place instance

bsl::time_t d_loadTime; // time when calendar was // loaded

public: // CREATORS CalendarCache_Entry(); // Create an empty cache entry object. Note that an empty cache entry // is never actually inserted into the cache.

CalendarCache_Entry(Calendar *calendar, bsl::time_t loadTime, bslma::Allocator *allocator); // Create a cache entry object for managing the specified ‘calendar’ // that was loaded at the specified ‘loadTime’ using the specified // ‘allocator’. The behavior is undefined unless ‘calendar’ uses // ‘allocator’ to obtain memory.

CalendarCache_Entry(const CalendarCache_Entry& original); // Create a cache entry object having the value of the specified // ‘original’ object.

~CalendarCache_Entry(); // Destroy this cache entry object.

// MANIPULATORS CalendarCache_Entry& operator=(const CalendarCache_Entry&); // Assign to this cache entry object the value of the specified ‘rhs’ // object, and return a reference providing modifiable access to this // object.

// ACCESSORS bsl::shared_ptr<const Calendar> get() const; // Return a shared pointer providing non-modifiable access to the // calendar referred to by this cache entry object.

bsl::time_t loadTime() const; // Return the time at which the calendar referred to by this cache // entry object was loaded.};

// MANIPULATORSCalendarCache_Entry& CalendarCache_Entry::operator=( const CalendarCache_Entry& rhs){ d_ptr = rhs.d_ptr; d_loadTime = rhs.d_loadTime;

return *this;}

// ACCESSORSbsl::shared_ptr<const Calendar> CalendarCache_Entry::get() const{ return d_ptr;}

bsl::time_t CalendarCache_Entry::loadTime() const{ return d_loadTime;}

// ------------------- // class CalendarCache // -------------------

// CREATORSCalendarCache::CalendarCache(CalendarLoader *loader, bslma::Allocator *basicAllocator)

// We have to supply ‘bsl::less<key>()’ because ‘bsl::map’ does not have a// constructor that takes only an allocator.

: d_cache(bsl::less<bsl::string>(), basicAllocator), d_loader_p(loader), d_timeOut(0), d_hasTimeOutFlag(false), d_lock(), d_allocator_p(bslma::Default::allocator(basicAllocator)){ BSLS_ASSERT(loader);}

CalendarCache::CalendarCache(CalendarLoader *loader, const bsls::TimeInterval& timeout, bslma::Allocator *basicAllocator)

// We have to supply ‘bsl::less<key>()’ because ‘bsl::map’ does not have a// constructor that takes only an allocator.

: d_cache(bsl::less<bsl::string>(), basicAllocator), d_loader_p(loader), d_timeOut(static_cast<bsl::time_t>(timeout.seconds())), d_hasTimeOutFlag(true), d_lock(), d_allocator_p(bslma::Default::allocator(basicAllocator)){ BSLS_ASSERT(loader); BSLS_ASSERT(bsls::TimeInterval(0) <= timeout); BSLS_ASSERT(timeout <= bsls::TimeInterval(INT_MAX));

PackedCalendar packedCalendar; // temporary, so use default allocator

if (d_loader_p->load(&packedCalendar, calendarName)) { return bsl::shared_ptr<const Calendar>(); // RETURN }

// Create out-of-place calendar that will be managed by ‘bsl::shared_ptr’.

Calendar *calendarPtr = new (*d_allocator_p) Calendar(packedCalen-dar, d_allocator_p);

CalendarCache_Entry entry(calendarPtr, bsl::time(0), d_allocator_p);

BSLS_ASSERT(static_cast<bsl::time_t>(-1) != entry.loadTime());

// Insert newly-loaded calendar into cache if another thread hasn’t done so // already.

bsls::BslLockGuard lockGuard(&d_lock);

ConstCacheIterator iter = d_cache.find(calendarName);

// Here, we assume that the time elapsed between the last check and the // loading of the calendar is insignificant compared to the timeout, so we // will simply return the entry in the cache if it has been inserted by // another thread.

if (iter != d_cache.end()) { return iter->second.get(); // RETURN }

d_cache[calendarName] = entry;

return entry.get();}

int CalendarCache::invalidate(const char *calendarName){ BSLS_ASSERT(calendarName);

bsls::BslLockGuard lockGuard(&d_lock);

CacheIterator iter = d_cache.find(calendarName);

if (iter != d_cache.end()) { d_cache.erase(iter);

return 1; // RETURN }

return 0;}

int CalendarCache::invalidateAll()

// ----------------------------------------------------------------------------// Copyright 2015 Bloomberg Finance L.P.//// Licensed under the Apache License, Version 2.0 (the “License”);// you may not use this file except in compliance with the License.// You may obtain a copy of the License at//// http://www.apache.org/licenses/LICENSE-2.0//// Unless required by applicable law or agreed to in writing, software// distributed under the License is distributed on an “AS IS” BASIS,// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.// See the License for the specific language governing permissions and// limitations under the License.// ----------------------------- END-OF-FILE ----------------------------------

// bdlt_calendarcache.cpp -*-C++-*-#include <bdlt_calendarcache.h>

#include <bsls_ident.h>BSLS_IDENT_RCSID(bdlt_calendarcache_cpp,”$Id$ $CSID$”)

#include <bdlt_calendarloader.h>#include <bdlt_date.h> // for testing only#include <bdlt_packedcalendar.h>

#include <bslma_default.h>

#include <bsls_assert.h>

#include <bsl_climits.h> // ‘INT_MAX’#include <bsl_functional.h>

namespace BloombergLP {namespace bdlt {

namespace {

// STATIC HELPER FUNCTIONSstaticbool hasExpired(const bsl::time_t& interval, const bsl::time_t& loadTime) // Return ‘true’ if at least the specified time ‘interval’ has elapsed // since the specified ‘loadTime’, and ‘false’ otherwise.{ const bsl::time_t now = bsl::time(0); BSLS_ASSERT(static_cast<bsl::time_t>(-1) != now);

const bsl::time_t elapsedTime = now - loadTime;

return elapsedTime >= interval ? true : false;}

} // close unnamed namespace

// =========================

{ bsls::BslLockGuard lockGuard(&d_lock);

const int numInvalidated = static_cast<int>(d_cache.size());

d_cache.clear();

return numInvalidated;}

// ACCESSORSbsl::shared_ptr<const Calendar>CalendarCache::lookupCalendar(const char *calendarName) const{ BSLS_ASSERT(calendarName);

bsls::BslLockGuard lockGuard(&d_lock);

CacheIterator iter = d_cache.find(calendarName);

if (iter != d_cache.end()) {

if (!d_hasTimeOutFlag || !hasExpired(d_timeOut, iter->second.loadTime())) { return iter->second.get(); // RETURN } else { d_cache.erase(iter); } }

return bsl::shared_ptr<const Calendar>();}

} // close package namespace} // close enterprise namespace

// ----------------------------------------------------------------------------// Copyright 2015 Bloomberg Finance L.P.//// Licensed under the Apache License, Version 2.0 (the “License”);// you may not use this file except in compliance with the License.// You may obtain a copy of the License at//// http://www.apache.org/licenses/LICENSE-2.0//// Unless required by applicable law or agreed to in writing, software

}

} // close unnamed namespace

// ========================= // class CalendarCache_Entry // =========================

class CalendarCache_Entry { // This class defines the type of objects that are inserted into the // calendar cache. Each entry contains a shared pointer to a read-only // calendar and the time at which that calendar was loaded. Note that an // explicit allocator is *required* to create a entry object.

// DATA bsl::shared_ptr<const Calendar> d_ptr; // shared pointer to // out-of-place instance

bsl::time_t d_loadTime; // time when calendar was // loaded

public: // CREATORS CalendarCache_Entry(); // Create an empty cache entry object. Note that an empty cache entry // is never actually inserted into the cache.

CalendarCache_Entry(Calendar *calendar, bsl::time_t loadTime, bslma::Allocator *allocator); // Create a cache entry object for managing the specified ‘calendar’ // that was loaded at the specified ‘loadTime’ using the specified // ‘allocator’. The behavior is undefined unless ‘calendar’ uses // ‘allocator’ to obtain memory.

CalendarCache_Entry(const CalendarCache_Entry& original); // Create a cache entry object having the value of the specified // ‘original’ object.

~CalendarCache_Entry(); // Destroy this cache entry object.

// MANIPULATORSCalendarCache_Entry& CalendarCache_Entry::operator=( const CalendarCache_Entry& rhs){ d_ptr = rhs.d_ptr; d_loadTime = rhs.d_loadTime;

return *this;}

// ACCESSORSbsl::shared_ptr<const Calendar> CalendarCache_Entry::get() const{ return d_ptr;}

bsl::time_t CalendarCache_Entry::loadTime() const{ return d_loadTime;}

// ------------------- // class CalendarCache // -------------------

// CREATORSCalendarCache::CalendarCache(CalendarLoader *loader, bslma::Allocator *basicAllocator)

// We have to supply ‘bsl::less<key>()’ because ‘bsl::map’ does not have a// constructor that takes only an allocator.

: d_cache(bsl::less<bsl::string>(), basicAllocator), d_loader_p(loader), d_timeOut(0), d_hasTimeOutFlag(false), d_lock(), d_allocator_p(bslma::Default::allocator(basicAllocator)){ BSLS_ASSERT(loader);}

CalendarCache::CalendarCache(CalendarLoader *loader, const bsls::TimeInterval& timeout, bslma::Allocator *basicAllocator)

// We have to supply ‘bsl::less<key>()’ because ‘bsl::map’ does not have a// constructor that takes only an allocator.

PackedCalendar packedCalendar; // temporary, so use default allocator

if (d_loader_p->load(&packedCalendar, calendarName)) { return bsl::shared_ptr<const Calendar>(); // RETURN }

// Create out-of-place calendar that will be managed by ‘bsl::shared_ptr’.

Calendar *calendarPtr = new (*d_allocator_p) Calendar(packedCalendar, d_allocator_p);

CalendarCache_Entry entry(calendarPtr, bsl::time(0), d_allocator_p);

BSLS_ASSERT(static_cast<bsl::time_t>(-1) != entry.load-Time());

// Insert newly-loaded calendar into cache if another thread hasn’t done so // already.

bsls::BslLockGuard lockGuard(&d_lock);

ConstCacheIterator iter = d_cache.find(calendarName);

// Here, we assume that the time elapsed between the last check and the // loading of the calendar is insignificant compared to the timeout, so we // will simply return the entry in the cache if it has been inserted by // another thread.

if (iter != d_cache.end()) { return iter->second.get(); // RETURN }

d_cache[calendarName] = entry;

return entry.get();}

int CalendarCache::invalidate(const char *calendarName){ BSLS_ASSERT(calendarName);

bsls::BslLockGuard lockGuard(&d_lock);

Lucene, Solr, Ant, Ivy, ZooKeeper, and JIRA are either trademarks or registered trademarks of their respective owners.

CHRISTINE POERSCHKE, BLOOMBERG

SHARD 1 REPLICA N

SHARD 1 REPLICA 1 SHARD N REPLICA 1

REPLICA 2

REPLICA N

SHARD 1 REPLICA 2

DOCUMENTS

DOCUMENTS

SHARD N SHARDLEADER

SHARD N

OVERSEER

INDEX OF 500 MILLION STORIES

1.5 MILLION SAVED SEARCHES

Alerts in 100msMore. Better. Faster.

325K+ Subscribers 1 Million Stories PUBLISHED EACH DAY

500 Stories PER SECOND

9 Million Searches PER DAY

RESP

ON

SE T

IME

180m

s

Available for Search

in ~100ms

OPEN SOURCE SEARCH WITH APACHE LUCENE/SOLR

WHAT IS APACHE?• Apache Software Foundation• Established 1999 as a charitable organization• Mission: to provide (free) software for the public good• 350+ open source projects• Meritocracy, collaboration, project independence• Apache License used by many open source projects (including non-Apache projects)

WHAT ARE LUCENE AND SOLR?• Apache Lucene is a search engine library written in Java.

• Apache Solr is a search platform, built on top of the Lucene library.

A PICTURE OF A SOLR CLOUD• A Solr Cloud consists of multiple Solr instances on different

machines, either virtual machines or ‘bare metal’. The overall state of the cloud (the cluster state) is kept in Apache ZooKeeper™.

• A Collection in the Cloud is divided into Shards with each shard hosting a subset of Documents. Shards are replicated within the Cloud so that there is more than one copy of each shard’s data. One of the replicas of a given shard will be elected to act as Shard Leader.

• The Cloud scales via replication and sharding: to handle more search queries, add more replicas on more machines; to have capacity for more documents, split your collection into more shards and house the extra shards on extra machines.

NEWS SEARCH AT BLOOMBERG

OUR CHALLENGESLoad, latency, stability• Very spikey flow of news and user load• ‘Always on’, no downtime, 24/7/365

Indexing and searching• News stories and searches in 40+ languages• Content from 125K+ sources, from tweets to 100+ page-long research reports

• Complex user-specific privileging/ACL model• Arbitrarily complex boolean queries• Ticker lists e.g. “I want news on all the companies of interest to me.”

Custom components• Indexing, parsing, searching, postfiltering

CONTRIBUTING BACK• 15+ Solr/Lucene contributors including 3 committers (I am one of them)• Fixes or new features in 15+ releases, such as

o Lucene’s SortingMergePolicy and EarlyTerminatingSortingCollector configurable in Solr (SOLR-5730 and SOLR-8621)

o Streaming Expressions (SOLR-7377)o Learning To Rank integration into Solr (SOLR-8542)

COMMUNITY• User mailing list: 3500+ subscribers, ~250 emails per week• Dev mailing list: 800+ subscribers, ~100 emails per day• Widely used in all sorts of sites and applications, from A(pple) to Z(appos.com)• Many, sometimes competing, companies and individuals• Talks and meetup groups in cities around the world• New people welcome

DEVELOPMENT• Task and issue tracking via Apache JIRA• Source code in Apache Git apache repository• Building and dependency management via Apache Ant and Apache Ivy

• Multi-platform build, continuous integration and other services hosted by Apache

EXPECTATIONS• It’s impossible to follow and understand everything• Community of professionals who volunteer their time are enthusiastic, but busy (just like you)

• No one expects big patches; big code patches are rare, just start somewhere …o community.apache.orgo wiki.apache.org/lucene-java/HowToContributeo wiki.apache.org/solr/HowToContribute

LET’S BE REALIntellectual property• Understand your employer’s open source policies and guidelines

• Respect open source projects’ licensesBalance• Work• Open source work• Life in general (I keep honeybees in my spare time)

INSPIRATION• Everyone is unique, everyone’s contributions are unique, find or create your place and role

• techatbloomberg.com/post-topic/open-source

RESOURCES• lucene.apache.org/solr/quickstart.html

• Apache Solr Reference Guide

• Solr Community Wiki

• Books and blogs

SCAN HERE TO DOWNLOAD THIS POSTER.