OpenLDAP Scaling Guide

  • 7/31/2019 OpenLDAP Scaling Guide

    Guide to Scaling OpenLDAP

    MySQL Cluster as Data Store for OpenLDAP Directories

    An OpenLDAP Whitepaper by Symas Corporation

    Copyright 2009, Symas Corporation

    Table of Contents

    1 INTRODUCTION
    2 TRADITIONAL OPENLDAP DATA STORES
      2.1 Escalating Database Demands
      2.1 The Cost of Updates
      2.2 Redundant Replicas of the Directory Data Store
        2.2.1 Deployment Complexity
      2.3 The Costs of Database Redundancy
        2.3.1 Database Replication Overhead
        2.3.2 The Hidden Costs of Database Replicas
    3 MYSQL CLUSTER AS A DATA STORE FOR OPENLDAP
      3.1 Maintaining Redundancy
      3.2 MySQL Cluster CGE: Smart Network Database
        3.2.1 MySQL Cluster Architecture
        3.2.1 Efficient Synchronous Replication
        3.2.2 Distributed Data Storage to Reduce Costs
        3.2.3 Geographical Redundancy
        3.2.4 Simplified Design and Deployment
      3.3 Integrating Directories with MySQL Cluster
      3.4 Scaling OpenLDAP with MySQL Cluster Carrier Grade Edition
    4 CONCLUSION
    5 REFERENCES
    6 ABOUT SYMAS

    2 Traditional OpenLDAP Data Stores

    OpenLDAP directory databases have commonly been hosted on the same physical system as the directory server itself. To meet both availability and performance targets, multiple copies (replicas) of the database servers are typically deployed. However, the massive growth in data volumes, coupled with more frequent updates and higher performance demands, presents challenges to this approach for certain classes of directory workloads.

    2.1 Escalating Database Demands

    Since the introduction of the first standard Directory Database Model (the X.500 Data Model), technology has been developed for storing the underlying data using various storage devices and technologies. The approaches range from text-based flat files using the standard data interchange format to implementations built on top of Relational Database Management Systems (RDBMSs). Each storage technique has advantages and disadvantages, depending on the deployment environment.

    Traditional databases used as directory data stores provide very good storage capabilities with "transaction" wrappers that provide high levels of data integrity during additions and updates. However, the basic design of the OpenLDAP directory server associates a dedicated copy of the database with a running instance of the server software, typically hosted on the same system. Many of these databases do not offer the low-level logic to maintain data integrity across requests and updates from multiple OpenLDAP servers, which causes challenges in environments with large directory databases storing and managing dynamic data:

      • Each database server must have all the data for which it may be queried in its local database (referrals across servers are very time-consuming and rarely acceptable)

      • Each database server must process all updates affecting any entries it contains

    To ensure that required performance levels are achieved with these very dynamic workloads, each OpenLDAP directory server typically needs a very large amount of memory (RAM) to hold the directory database in its in-memory cache, which can increase the cost of the system.

    2.1 The Cost of Updates

    For the largest OpenLDAP deployments, there are specific performance and scalability challenges. In terms of processing overhead, it is much more expensive to update (add, delete, or change) a database record than to read it. This is true of any database system where atomicity, consistency, isolation, and durability (ACID) properties are required. ACID properties guarantee that database transactions are processed reliably. In many database applications a transaction involves multiple database updates, and the design principle of a transaction wrapper with ACID properties provides the ability to consistently undo partial updates should the transaction fail.

    The data in a directory database is generally stored in physical structures on storage devices that update multiple physical files when an entry is changed. The storage approaches are all quite different, but even the simplest uses indexes that are independent from the underlying persisted data store. For example, a single update may require the server to modify two (or more) separate files. As a result, directory designers rely, when possible, on database products for the ACID capabilities that wrap the directory's transactions.

    ACID properties represent a layer of necessary overhead that makes writes (updates) more costly than reads (queries). Depending on the complexity of the underlying data mappings, database updates to the OpenLDAP directory may be 3 to 10 times as demanding as database reads. There is really no upper bound on this complexity, as configurations allow unlimited indexing of entries. The challenge this presents is that OpenLDAP directory servers require databases that can handle the increased overhead of updates, with a resulting increase in system cost.
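    The relative cost of updates can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the function name `read_equivalent_load`, the 10,000 requests per second, and the write fractions are assumptions, not measurements. Only the multiplier range of 3 to 10 comes from the text above.

```python
def read_equivalent_load(requests_per_sec, write_fraction, write_cost):
    """Express a mixed workload in read-equivalent operations per second.

    write_cost is how many reads one update is assumed to cost
    (the text gives a range of roughly 3 to 10).
    """
    reads = requests_per_sec * (1 - write_fraction)
    writes = requests_per_sec * write_fraction
    return reads + writes * write_cost


# Hypothetical workload: 10,000 requests/s at several write ratios.
for write_fraction in (0.05, 0.20, 0.50):
    low = read_equivalent_load(10_000, write_fraction, write_cost=3)
    high = read_equivalent_load(10_000, write_fraction, write_cost=10)
    print(f"{write_fraction:.0%} writes: {low:,.0f} to {high:,.0f} read-equivalents/s")
```

    Under these assumptions, even a modest write fraction can dominate the total load, which is why the write path tends to drive the sizing of the database server.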

    2.2 Redundant Replicas of the Directory Data Store

    Most OpenLDAP deployments maintain redundant copies (replicas) of the database. This is driven by two powerful design considerations.

    First, servers and their associated storage systems fail due to hardware, software or configuration faults. Recovery options include either configuring a server and storage system and restoring the directory databases on-line from a backup, or having a warm standby system ready for immediate deployment. Clearly, for a production database servicing a mission-critical LDAP directory, finding a server and restoring from a backup tape is unacceptable. It can take hours or days to load the data from the backup and, in the meantime, applications relying on the LDAP server will be unavailable. The only real option is for users to maintain standby servers containing replicas of the master directory database, maintained in parallel and readily available to assume processing in the event of the primary master server failing.

    The other reason for users to create replicas of OpenLDAP data stores is to meet performance requirements. Despite changing workload requirements driven by on-line applications, typical data stores for OpenLDAP directories process more reads (queries) than writes (updates). These queries are completely independent from one another, and have no impact across replicas. Rather than configure one or two powerful central query servers, it is often more cost effective to distribute less expensive database replicas across the network. Each replica handles its associated load and can be managed independently for capacity and reliability.

    2.2.1 Deployment Complexity

    This approach results in multiple master directory database servers being deployed. To process database updates, each master server accepts requests from users and applications. Database replication mechanisms must be implemented to coordinate updates across these distributed databases, and provide the logic necessary to ensure data integrity. Called Multi-Master Replication, this capability supports both technical and organizational requirements for distributed master server capabilities.

    The larger, more mission-critical, and complex the data store of the OpenLDAP directory, the more likely there are to be numerous replicas of the data store under the control of multiple master directory servers. As a result, the overall cost and management overhead of providing the directory data store services can quickly escalate.

    M = Master. R = Replica. H = Hub. R/O = Read-Only

    Figure 1: Database deployment complexity and cost grow as the OpenLDAP directory data store scales

    2.3 The Costs of Database Redundancy

    While redundant OpenLDAP data stores provide a good solution for some environments, they can present significant challenges for the larger data sets which are increasingly common in carrier infrastructures. The data store replication overhead and cost of the replica servers themselves can outweigh the benefits for more dynamic applications.

    2.3.1 Database Replication Overhead

    When a directory database replica (copy) receives a request to update an entry, the request is forwarded to the designated master server, which is responsible for maintaining the master database. The master OpenLDAP directory server processes the change and updates its master directory database. The change is then propagated (replicated) to all of the subordinate replicas: each replica receives an update request from the master server and applies it to its local copy of the database. There are several mechanisms for propagating these changes from master to replicas, but the goal is to ensure that the changes are made as quickly as possible, while maintaining data integrity across the database replicas.
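    The propagation path described above can be sketched as a toy simulation. This is a minimal in-memory model, not OpenLDAP's replication code: the `DirectoryServer` class, its methods, and the example DN are all hypothetical names chosen for illustration.

```python
class DirectoryServer:
    """Toy directory server holding its own local copy of the database."""

    def __init__(self, name, master=None):
        self.name = name
        self.master = master      # None means this server is the master
        self.replicas = []        # populated on the master only
        self.db = {}              # local database: DN -> attributes
        self.updates_applied = 0  # update transactions processed locally
        if master is not None:
            master.replicas.append(self)

    def write(self, dn, attrs):
        """Client-facing update: replicas forward, the master applies."""
        if self.master is not None:
            return self.master.write(dn, attrs)
        self._apply(dn, attrs)
        for replica in self.replicas:   # propagate to every subordinate
            replica._apply(dn, attrs)

    def _apply(self, dn, attrs):
        self.db[dn] = dict(attrs)
        self.updates_applied += 1

    def read(self, dn):
        """Reads are served from the local copy only."""
        return self.db.get(dn)


master = DirectoryServer("master")
replicas = [DirectoryServer(f"replica{i}", master) for i in range(3)]

# A client sends the update to a replica; it ends up on every server.
replicas[1].write("uid=alice,dc=example,dc=com", {"mail": "alice@example.com"})
print([s.updates_applied for s in [master, *replicas]])
```

    In this model a single client update results in one update transaction on every server, which is the source of the replication overhead discussed in this section.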

    There is a cost associated with the mechanism that propagates these updates to the database replicas. The replication process must ensure that each server receives and updates its local database, in order to deliver data consistency across the OpenLDAP environment. This database replication overhead can reduce the overall throughput of the directory database server infrastructure for the most dynamic applications.

    2.3.2 The Hidden Costs of Database Replicas

    In addition to the replication overhead, there is also the cost of the actual update to each of the replicas. Once the update (add, delete, or modify) is sent via the replication mechanism, the replica database server has to process it. Each update request from the master to a replica is an update transaction. The processing of that update transaction is not radically different from the processing the master server had to do in order to update the master database. As a result, the replica database servers need to be nearly as powerful (and expensive) as the master servers themselves because they have to handle similar levels of update load.

    It also means that distributing the update load across multiple master database servers is rarely a load-balancing solution, because the updates ultimately have to be reflected on each of the masters and all of the replicas anyway. Multi-master solutions can help manage peak loads on particular servers, but the aggregate load must still be supported.
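    A small model makes the point about aggregate load explicit. It is a sketch under stated assumptions: the function name `per_server_load`, the workload figures, and the write cost of 5 read-equivalents are hypothetical values picked from within the range given earlier.

```python
def per_server_load(reads_per_sec, writes_per_sec, servers, write_cost=5):
    """Read-equivalent load on each server in a fully replicated deployment.

    Reads are spread across the servers; every update must be applied
    on every server, so the write term does not shrink.
    """
    return reads_per_sec / servers + writes_per_sec * write_cost


for servers in (1, 2, 4, 8, 16):
    load = per_server_load(20_000, 2_000, servers)
    print(f"{servers:2d} servers: {load:8,.0f} read-equivalents/s per server")
```

    With these numbers the per-server load can never drop below the 10,000 read-equivalents per second contributed by the write term, no matter how many replicas are added.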

    3 MySQL Cluster as a Data Store for OpenLDAP

    On-line applications, especially within the telecommunications industry, demand directory databases that can scale to 100+ million entries with much higher update rates. The challenge confronting many OpenLDAP developers and administrators today is how to maintain the performance and availability benefits of creating redundant replicas of the directory database, while overcoming the challenges of increased performance overhead, management complexity and cost. It is clear that the architectural model of each OpenLDAP directory server managing its own unique database is no longer viable for the emerging set of large and dynamic directory-based applications.

    3.1 Maintaining Redundancy

    To address the challenges of growing directory databases, OpenLDAP directory data can be hosted in a database that is shared over a network and transparently provides directory database services. This can significantly increase scalability and simplify administration, while reducing the costs of redundancy and updates for the most dynamic and write-intensive directory applications.

    Due to the explosion in available bandwidth and CPU power, as well as cheaper and faster storage (RAM, disk, SSD), distributed database solutions have become viable for hosting the data store of an OpenLDAP directory. By using this approach, the number of database replicas can be reduced while lowering the cost of maintaining the database. This solution also provides redundancy features and services that meet the most demanding OpenLDAP directory requirements. This is accomplished by eliminating unnecessary copies of the directory data store and the processing needed to maintain those copies, while delivering on the availability and performance requirements.

    Delegating the management of the OpenLDAP directory data to dedicated, high-availability clustered database technologies addresses the issue of replication overhead and database integrity, freeing up directory servers. All database updates are propagated by extremely efficient and trustworthy mechanisms, at a lower cost and with less overhead than traditional methods.

    3.2 MySQL Cluster CGE: Smart Network Database

    MySQL Cluster¹ is a real-time database that combines the flexibility of a high-availability relational database with the low TCO of open source. It features a shared-nothing distributed architecture with no single point of failure to assure 99.999% availability, allowing users to meet their most demanding mission-critical application requirements. Its flexible design, supporting both in-memory and disk-based data, delivers consistent, millisecond response times with the ability to service tens of thousands of transactions per second. MySQL Cluster supports the ability to perform many administrative tasks online without affecting service, such as scaling processing and data storage, performing back-ups, updating database schemas, and upgrading hardware and software within the cluster.

    ¹ For more information on MySQL Cluster, including datasheets, whitepapers, webinars and case studies, please refer to http://www.mysql.com/products/database/cluster/

    MySQL Cluster eliminates the need for expensive shared storage, and runs on a range of commodity hardware and OS platforms, making it the most open and cost-effective database solution for mission-critical applications anywhere.

    Figure 2: The MySQL Cluster architecture delivers carrier-grade availability and performance, without the traditional carrier-grade price

    3.2.1 MySQL Cluster Architecture

    MySQL Cluster CGE (Carrier Grade Edition) consists of three different types of nodes, each providing specialized services within the cluster.

    Data Nodes are the main nodes of the cluster. They provide the following functionality to the cluster:

      • Data storage and management of both in-memory and disk-based data

      • Automatic and user-defined partitioning of data

      • Synchronous replication of data between data nodes

      • Transactions and data retrieval

      • Automatic failover

      • Resynchronization after failure

    Because data is stored and distributed in a shared-nothing architecture, i.e. without the use of a shared disk, if a data node happens to fail there will always be at least one additional data node storing the same information. This allows requests and transactions to continue to be satisfied without interruption. Data nodes can also be added on-line, allowing for unprecedented scalability of data storage.
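    The data node behavior described above (partitioning, synchronous copies, and failover) can be illustrated with a toy model. This is a conceptual sketch only and does not reflect MySQL Cluster's internals: the `DataNode` and `ToyCluster` classes, the use of SHA-256 to pick a partition, and the group layout are all assumptions made for the example. Resynchronization of a recovered node is deliberately left out.

```python
import hashlib


class DataNode:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.rows = {}


class ToyCluster:
    """Shared-nothing sketch: rows are hash-partitioned across node groups,
    and every node in a group holds a full copy of that group's partition."""

    def __init__(self, node_groups=2, replicas=2):
        self.groups = [
            [DataNode(f"g{g}n{r}") for r in range(replicas)]
            for g in range(node_groups)
        ]

    def _group_for(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.groups[int.from_bytes(digest[:4], "big") % len(self.groups)]

    def put(self, key, value):
        live = [n for n in self._group_for(key) if n.alive]
        if not live:
            raise RuntimeError("no live node holds this partition")
        for node in live:          # synchronous: all live copies updated
            node.rows[key] = value

    def get(self, key):
        for node in self._group_for(key):
            if node.alive:         # fail over to any surviving copy
                return node.rows.get(key)
        raise RuntimeError("no live node holds this partition")


dn = "uid=alice,dc=example,dc=com"
cluster = ToyCluster()
cluster.put(dn, {"cn": "Alice"})
cluster._group_for(dn)[0].alive = False   # simulate a data node failure
print(cluster.get(dn))                    # still served by the surviving copy
```

    Each node stores only its group's share of the data, so adding node groups spreads storage across more machines, while the second copy in each group keeps the data available when one node fails.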

    Application Nodes are the applications connecting to the database. An Application Node can be an application that uses the high-performance NDB API directly, such as an LDAP server connected through a driver to MySQL Cluster. MySQL Servers can also be deployed to act as SQL interfaces to the data stored within a cluster. Thus, applications can simultaneously access the data in MySQL Cluster using a rich set of interfaces, such as SQL, LDAP and web services. Moreover, additional Application Nodes can be added online.

    Management Nodes manage and make cluster configuration information available to other nodes. The Management Nodes are used at startup and when there is a system reconfiguration. Management Nodes can be stopped and restarted without affecting the ongoing execution of the Data and Application Nodes. By default, the Management Node also provides arbitration services in the event of a network failure that leads to a split-brain condition (a network-partitioned cluster).

    With this distributed architecture, where dependencies have been minimized, applications continue to run and data remains consistent, even if any one of the data, application, or management nodes fails.

    3.2.1 Efficient Synchronous Replication

    MySQL Cluster CGE provides an additional layer of intelligence and automation not found in databases that have traditionally been used to store OpenLDAP data. MySQL Cluster stores the database on a cluster of data nodes and transparently propagates all updates to the cluster via its synchronous replication mechanism. It uses an internal, secure, ACID-compliant two-phase commit protocol that is substantially more efficient than traditional database replication. Clusters can also be distributed across geographically disparate sites and kept in sync using an asynchronous replication protocol.
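    For readers unfamiliar with two-phase commit, the general idea can be sketched as follows. This is the textbook form of the protocol, not MySQL Cluster's internal implementation (which, among other differences, handles node failures rather than simply aborting). The `Participant` class and `two_phase_commit` function are hypothetical names.

```python
class Participant:
    """Toy data node taking part in a two-phase commit."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.committed = {}
        self._pending = None

    def prepare(self, key, value):
        """Phase 1: stage the change and vote on whether it can commit."""
        if not self.healthy:
            return False
        self._pending = (key, value)
        return True

    def commit(self):
        """Phase 2, success path: make the staged change visible."""
        key, value = self._pending
        self.committed[key] = value
        self._pending = None

    def abort(self):
        """Phase 2, failure path: discard the staged change."""
        self._pending = None


def two_phase_commit(participants, key, value):
    """Apply the change on all participants or on none of them."""
    votes = [p.prepare(key, value) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


nodes = [Participant("node1"), Participant("node2")]
print(two_phase_commit(nodes, "uid=alice", {"cn": "Alice"}))   # all vote yes

nodes.append(Participant("node3", healthy=False))
print(two_phase_commit(nodes, "uid=bob", {"cn": "Bob"}))       # one votes no
print([sorted(p.committed) for p in nodes])
```

    The property that matters for a directory data store is that a change is either visible on every copy or on none, so readers never observe replicas that disagree about a committed update.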

    Users can therefore deploy MySQL Cluster to host the data store of the OpenLDAP directory and take advantage of the built-in replication mechanisms to maintain multiple copies of the data. As a result, DBAs can implement replication with significantly less effort and lower cost than traditional approaches.

    Figure 3: MySQL Cluster allows simple, fast and secure replication of data updates

    3.2.2 Distributed Data Storage to Reduce Costs

    MySQL Cluster simplifies sharing copies of the data across OpenLDAP servers. Performance is tuned for shared access and users can easily establish the optimum number of physical data nodes needed to support multiple database replicas, with the required levels of redundancy and performance.

    As a real-time database, MySQL Cluster meets the most stringent latency requirements of communications applications by storing data in memory. This serves to minimize the impact of moving data from a local data store co-hosted on a directory server to a centrally accessed networked database.

    Traditional OpenLDAP deployments co-locate the directory and the database on the same server, requiring expensive SMP hardware. As MySQL Cluster can distribute the database across several servers, while maintaining fast access to data storage, the overall memory and system cost can be substantially reduced.

    3.2.3 Geographical Redundancy

    The ability to withstand site failures by replicating the database of the directory across multiple remote locations is an important capability for many deployments. Geographic Replication with conflict detection and resolution is available as an option with MySQL Cluster, allowing OpenLDAP directory databases to be efficiently synchronized across multiple data centers.

    Figure 4: Geographic Replication extends 99.999% database availability across remote locations

    3.2.4 Simplified Design and Deployment

    With traditional OpenLDAP data stores, users must carefully configure and deploy master database servers and their replicas to conform to the update limitations of the database server replication protocol. Applications have to be engineered to write changes to one master database server, while reads can be performed on any of the replica database servers. When MySQL Cluster is used as the data store for the OpenLDAP directory, writes can happen on any OpenLDAP server connected to MySQL Cluster. This significantly boosts the write performance of the directory data store and is of great importance to feature-rich next-generation communications services and networks.

    MySQL Cluster guarantees the integrity of updates, independent of server relationships or network configuration, thereby simplifying the design and deployment of highly available, highly scalable OpenLDAP directories.

    3.3 Integrating Directories with MySQL Cluster

    Using MySQL Cluster as the OpenLDAP directory data store requires no modifications to the OpenLDAP server or to its applications, ensuring compatibility with existing directory services. An interface to MySQL Cluster takes advantage of the directory server features. Furthermore, directory data managed by MySQL Cluster is also accessible to applications through the native NDB and SQL application programming interfaces.

    3.4 Scaling OpenLDAP with MySQL Cluster Carrier Grade Edition

    MySQL Cluster can be used as a data store for directories responsible for the authentication and authorization of devices and subscribers within Communications Service Providers' applications. Target deployments would typically involve OpenLDAP directories demanding frequent look-ups and modification of subscriber data, typically with 100 million+ entries. MySQL Cluster offers:

    1. Seamless scalability upgrade, with no changes to the LDAP applications
    2. High rates of directory lookups (reads)
    3. High rates of directory updates (writes)

    Prior to MySQL Cluster, the only real alternative for these demanding OpenLDAP directories was very large SMP systems with vast memory capacity (RAM) acting as a cache for directory data.

    With MySQL Cluster as the OpenLDAP data store, a distributed cluster of data nodes based on commodity systems can each handle a subset of the directory database. By distributing RAM across nodes, the costs per GB and per system are greatly reduced. These benefits can be achieved without any significant administrative overhead, while maintaining transparency to the directory service's users and applications, and while preserving and enhancing the inherent value of directory services in the enterprise and telco environment. This approach provides very high levels of performance with massive scalability and predictability. It also dramatically reduces the cost of acquisition, deployment and management of these very large OpenLDAP directory databases.
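    The effect of distributing RAM across data nodes can be estimated with simple arithmetic. The sketch below is a rough sizing aid under stated assumptions: the function name `memory_per_data_node`, the average entry size of 2,048 bytes, and the choice of two copies per entry are hypothetical; only the 100 million entry scale comes from the text. Real sizing would also need to account for indexes, buffers and operating overhead.

```python
def memory_per_data_node(entries, bytes_per_entry, copies, data_nodes):
    """RAM each data node needs when the data set is split evenly.

    copies is how many nodes store each entry; data_nodes must be a
    multiple of copies so the nodes form complete groups.
    """
    if data_nodes % copies != 0:
        raise ValueError("data_nodes must be a multiple of copies")
    total_bytes = entries * bytes_per_entry * copies
    return total_bytes / data_nodes


GIB = 1024 ** 3
entries = 100_000_000       # the "100 million+" scale mentioned in the text
bytes_per_entry = 2_048     # assumed average entry size

single = entries * bytes_per_entry / GIB
print(f"one server caching everything: {single:.0f} GiB")
for nodes in (2, 4, 8, 16):
    need = memory_per_data_node(entries, bytes_per_entry, 2, nodes) / GIB
    print(f"{nodes:2d} data nodes, 2 copies: {need:.0f} GiB per node")
```

    Doubling the number of data nodes halves the memory each one needs, which is what allows commodity systems to stand in for a single large SMP machine.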

    Figure 5: Simplified scaling to handle the most demanding OpenLDAP directory database workloads

    MySQL Cluster Carrier Grade Edition's transparent replication and back-up services also extend these benefits from large, dynamic OpenLDAP directory databases to smaller, high-value OpenLDAP directories. It makes a great deal of sense to migrate a production OpenLDAP directory data store off traditional database technologies to MySQL Cluster Carrier Grade Edition long before the growth of the OpenLDAP directory database makes scalability of the data store a major issue. Once that relatively simple conversion is complete, users can grow the directory database using the superior scalability of MySQL Cluster without impacting the directory client applications.

    4 Conclusion

    The growth of on-line services in both enterprise and telecommunications networks is driving a radical change in the way directory servers store and maintain their data. Update rates are increasing and the amount of data being stored for each entry is growing, while availability and performance demands are becoming ever more stringent. This demands different database design and implementation philosophies.

    In many existing environments, the OpenLDAP directory and the database are deployed on the same host. The server has to be equipped with sufficient RAM to act as a cache for the database, thereby supporting response time requirements, and must be powerful enough to process updates quickly. As OpenLDAP directory databases grow in size and updates become more frequent, a higher load is placed on each directory server. Many OpenLDAP directory database environments deploy multiple redundant systems, comprising masters and replicas, in order to meet availability and performance demands. However, database replication overhead is incurred in order to maintain data consistency across database replicas. These conditions cause spiraling hardware requirements, along with increased operational costs and complexity, while reducing business agility.

    Using the OpenLDAP Driver for MySQL Cluster Carrier Grade Edition, the data store of the OpenLDAP directory can be decoupled from the OpenLDAP directory server and presented as a shared resource over the network using the real-time, carrier-grade MySQL Cluster database. Using MySQL Cluster's built-in mechanisms for data replication and its real-time design, users can increase the performance and availability of their database serving OpenLDAP with lower replication overhead, reduced management complexity and savings in hardware costs.

    Developers do not need to concern themselves with database replication technologies or High Availability mechanisms, and their applications continue to work unchanged, providing a seamless upgrade to existing OpenLDAP environments.

    MySQL Cluster Carrier Grade Edition, with associated Professional and Training Services, makes an ideal solution to address the scalability challenges of the most dynamic and fast-growing OpenLDAP applications.

    5 References

    OpenLDAP: http://www.openldap.org/

    Symas: http://www.symas.com/

    MySQL Cluster on the web: http://www.mysql.com/products/database/cluster/

    MySQL Cluster Datasheet: http://www.mysql.com/products/database/cluster/mysql-cluster-datasheet.pdf
