Reddit Architecture and Extension Proposal


Team Narnia, “Winter is Coming”

Brendan Foote, Dan Maas, Maitri Parmar, Caleb Spronk

December 14, 2012

Reddit Phase IV: Architectural Extension


1. Version History

Version Date Author Comments

1 November 20, 2012 Team Narnia Initial version of the Reddit architecture description

2 December 14, 2012 Team Narnia Added extension; made revisions based on feedback

2. Table of Contents

1. Version History
2. Table of Contents
3. Introduction
   3.1 Purpose and Scope
   3.2 Audience
   3.3 Status
   3.4 Architectural Design Approach
4. Glossary
5. System Context
   5.1 System Environment
      5.1.1 Stakeholders
   5.2 Overview of Requirements
      5.2.1 Functional Requirements
      5.2.2 Non-Functional Requirements
   5.3 System Scenarios
      5.3.1 Functional Scenarios
      5.3.2 Quality based Scenarios
6. Architectural Forces
   6.1 Goals
   6.2 Constraints
   6.3 Architectural Principles
7. Architectural Views for Top-Level System
   7.1 Functional View
      7.1.1 Existing Functional Elements
      7.1.2 Reddit with Recommendation Extension: Link Recommendations
      7.1.3 Functional Scenarios
         7.1.3.1 View the Front Page
         7.1.3.2 Vote on an Item
         7.1.3.3 Compute Nearest Neighbors
         7.1.3.4 Compute Recommendations
      7.1.4 System-Wide Processing
   7.2 Information View
      7.2.1 Data Structure
      7.2.2 Data Flow
      7.2.3 Data Ownership
      7.2.4 Information Lifecycles
      7.2.5 Timeliness and Latency
      7.2.6 Archive and Retention
   7.3 Deployment View
      7.3.1 Runtime Platform Model
      7.3.2 Component Descriptions
      7.3.3 Software Dependencies
      7.3.4 Network Model
8. Architectural Views for Caching Subsystem
   8.1 Functional View
      8.1.1 Functional Elements
      8.1.2 Functional Scenarios
         8.1.2.1 View the Front Page
         8.1.2.2 Vote on an Item
      8.1.3 System-Wide Processing
   8.2 Information View
      8.2.1 Data Structure
      8.2.2 Data Flow
      8.2.3 Data Ownership
      8.2.4 Information Lifecycles
      8.2.5 Timeliness and Latency
      8.2.6 Archive and Retention
9. Perspectives
   9.1 Top-Level System
      9.1.1 Scalability
      9.1.2 Evolvability
   9.2 Subsystem
      9.2.1 Availability
      9.2.2 Performance
10. Appendices
   10.1 Appendix A: Decisions and Alternatives
   10.2 Alternative 1: The Application Server Approach
   10.3 Alternative 2: The Incremental Update Approach

3. Introduction

3.1 Purpose and Scope

The internet has entered a stage where it contains such large amounts of information that it can be challenging to find the content you are looking for. Most content that comes back from search engines is not of interest to any particular user. Wouldn’t it be great if there existed a place where large numbers of people could store links for topics that they are interested in?

Reddit is a social news site driven by user-submitted links. It allows users to post links to content that they find interesting and uses an upvote/downvote system for rating those links. There is also a comprehensive nested comment system, in which the comments themselves can be voted on, and a taxonomy of content categories that lets users filter what they see.

Reddit allows users to do the following:

● Easily explore taxonomies of content that users may not already be familiar with
● Search for specific keywords
● Upvote/downvote every link
● View content by popularity, which is calculated from the ratio of upvotes to downvotes over customizable periods of time (a sketch of such a ranking function follows this list)
● Create “Sub-Reddits” for any particular subject they want. Since each sub-reddit is entirely controlled by its original creators (and/or whomever they decide to share control with), the site is run entirely by its users, not by the creators of the site.

The purpose of this document is to delve deep into Reddit’s architecture and find out how it handles such massive load on a day-to-day basis. Reddit is the 3rd largest open-source website in the world, behind Wikipedia and Imgur (which itself was made by a Redditor to have a nice place to host images for Reddit). The scaling of the system to serve its large user base is one of its most interesting aspects. Reddit got over 2 billion page views in December 2011, with over 35 million unique visitors. At a high level, it is a 4-tier architecture; the web user interface, application, service, and database layers work together to serve web pages to users.

Besides the stated Reddit functionality, we’ve decided to propose some additional functionality that would really help advance Reddit as the go-to social news site. We believe that any additional reason for users to register and spend time on Reddit would be a boon to the business, and adding recommended links for registered users based on their voting history seems like one of the most obvious paths to achieving this goal. This document will give suggestions, in detail, on how to add this functionality on top of the existing Reddit system.

3.2 Audience

There are three primary audiences for this document. The first is the stakeholders of the Reddit system; a full list of stakeholders can be found in Section 5.1.1 of this document. The second is the instructor and TA of the SEng 5861 class, for which this document was actually created. The last is anyone who wants to implement our suggested extension to the Reddit system, adding a link recommendation system based on registered users’ vote histories.

3.3 Status

Reddit and its architecture already power one of the most popular sites in the world; Reddit had 3.4 billion page views in the month of August 2012 alone. Beyond that, the team is constantly modifying the architecture and code to scale the system even further as it gains new users over time. This document describes the architecture of Reddit as pulled down from Github in October of 2012 and describes in detail our proposed extension to Reddit on top of that particular snapshot. This will result in:

● Revised glossary
● A reassessment of the architecture design approach, especially the importance of various perspectives
● Additional requirements
● An additional system scenario
● Functional view additions
   ○ a new information flow diagram
   ○ assignment of responsibilities and description of interfaces for two new functional elements
   ○ functional scenarios with sequence diagrams
● Information view additions
   ○ a new element on the logical data structure diagram
● Deployment view additions
   ○ two new batch servers
● Revised architectural perspectives
● Alternative approaches for the recommendation extension

3.4 Architectural Design Approach

To explain to stakeholders how Reddit will fulfill its mission, two system-level architectural views will be defined: the Functional View and the Deployment View, with a quick look at the Information View. One subsystem layer will also be described: the Functional and Information Views of the caching subsystem. Intertwined with these views, we’ll present our suggested architecture modifications to add the recommended-links functionality. Additionally, at the end of this document, we’ll give a few alternatives to the extension that we present throughout the main portion of this Architectural Description.

Given that there are dozens of perspectives, we chose the 6 that are most critical to Reddit.

1 Security Perspective: Medium
○ User credentials need to be secured so that attackers cannot access login and password information. Many people use the same usernames and passwords for multiple sites, so a breach would be a major concern.
○ For the Reddit Gold service, it is important to protect credit card information. Reddit uses third-party payment services like PayPal and Google Checkout for payment.
○ Protect against DoS attack threats, because high availability of the system is highly desired.
○ Roles and privileges need to be adequately secured so that users can’t do things they are not supposed to be able to do.

2 Availability Perspective: High
○ With a worldwide user base, any Reddit downtime will be noticed by someone.
○ Reddit is a victim of its own success: people expect new content to be available to them at all times.
○ Since users are not that invested in the site, downtime might drive them to the competition.
○ Social sharing web sites live and die by their uptime, since nothing can be shared when they are down, and their reputation for reliability is at risk.
○ With so much content (or meta-content) being created, availability includes not losing that data in case of an outage, which will be important to users.
○ In terms of availability and the proposed extension, the proposed changes should never negatively affect the normal availability of the Reddit site as it currently exists. For instance, if any of the changes for the extension go down momentarily, that should not alter the main functionality of the site (the functionality that already exists in Reddit).
○ Once the extension is completed, tested on real users, and deemed to be staying for good, it should meet the same availability numbers that are expected of the rest of the Reddit site.

3 Scalability and Performance Perspective: High
○ Scalability is perhaps the most important perspective due to the massive growth of daily page views since Reddit’s inception in 2005. The latest numbers we could find were for the month of August 2012, in which Reddit had 3.4 billion page views, reaching 39.5 million unique visitors. Going from scrappy startup to 3.4 billion page views in a single month shows how much attention has been paid to scalability so far.
○ According to Alexa.com (the leader in web analytics), Reddit appears to have been growing steadily over the past 2 years and is showing no signs of slowing down. Based on the free data you can get on that site, Reddit was ranked 200th for world web traffic 2 years ago and is currently sitting at 135.
○ Even if Reddit were only holding steady in its traffic rankings, the internet has been growing steadily year over year, showing 14% growth from June 2011 to June 2012.
○ When President Obama posted an Ask Me Anything in September, it was later reported as Reddit’s largest day of traffic ever, and it was definitely apparent: the site was mostly unresponsive from the moment Obama’s Twitter account mentioned he was going on Reddit until about 2 hours after he finished. If they plan on hosting such high-profile Ask Me Anythings in the future, they need to be able to scale even more.
○ Even at peak load, the site should still perform well and should be responsive when submitting links, making comments, and providing up and down votes.
○ Even with the added extension’s functionality, the extension cannot create a noticeable performance hit for the user, and it should not affect the linear scalability of the Reddit system that already exists.

4 Evolution Perspective: Medium
○ Reddit must be flexible enough to compete with the constant evolution of social applications. However, it is not crucial for reddit to replicate the most popular features of other social networking sites. Reddit has a well-defined purpose and should not stray too far from that.
○ An important area where reddit must evolve is its user interface. Sites that are intuitive, easy to use, customizable, and personalizable are more popular with end users than sites that are not. Merely changing the layout of reddit over time may be enough to improve usability. These types of changes are often distinct from the rest of the code base and do not impose large costs.
○ Making the site customizable, though, is a more deeply rooted attribute of a system. Implementers will want to have more and more flexibility to control the features and user interface of the reddit they are installing.
○ Over time, users will want to personalize more and more of their view of reddit. The code base should be designed in such a way that adding personalization features does not incur heavy costs.
○ Reddit should be able to handle eventual changes to how content dealing with links is organized and hosted.
○ In summary, reddit should be able to evolve, but evolvability is not the most important quality that the system needs in order to attract users and keep them interested in the site.
○ The addition of the proposed extension should not hinder the overall evolvability of the Reddit system as a whole. There should never be a time when new functionality is impossible to add due to decisions made for the link recommendation extension.

5 Usability Perspective: High
○ With so much competition in the social news space (e.g., Slashdot, Digg, Hacker News), Reddit must be very intuitive and easy to use.
○ Reddit has a very focused set of functionality, such as posting links, voting, and commenting. These must be very easy to do.
○ The faster users can navigate through the site, the more page views, and the more advertising revenue it can get.

6 Localization Perspective: Medium
○ Reddit is used mainly by English speakers, but the non-English contingent is expected to grow.
○ This is important because there is a non-trivial subset of Reddit users whose native languages are other than English. They should be able to read and contribute to Reddit in their native language.
○ This will also affect the layout, as some languages are written right to left, and so all content should be laid out accordingly.


4. Glossary

Term Definition

Subreddit A community of related links that fall under a specific topic, have their own views (new, top, and controversial), and have their own independent moderators. Anyone can create a new subreddit.

Moderator A person with some administrative privileges for a subreddit. These privileges are used to control the appearance of the subreddit, manage links and comments, and flag spam, among other things. See http://code.reddit.com/wiki/help/moderation for details.

Award The reason someone is being honored -- the category in which they excel(led). For example, "Best Link".

Trophy The digital token placed on the recipient's userpage to commemorate that particular person winning that particular award on that particular day.

Front Page For logged out users, this would contain the top 25 posts from the top 20 subreddits, a list curated by Reddit staff. For logged in users it would contain the top 25 posts from their subscribed subreddits.

Tuple A set of values stored as a list.

Dictionary A set of key-value pairs, where each key has a value that is a list of values.

Neighbor Account For recommending links to users, we are going to calculate the nearest neighbor accounts based on their vote histories. This calculation is done in a batch process, and the results are stored as the new Neighbor Account things in the database.


5. System Context

5.1 System Environment

The system environment can be best described in a context diagram:

Figure 1 - Reddit Context Diagram

While Reddit is mostly a set of Python code running on app servers, there are several systems that it integrates with to serve those cat memes. First, the actual data is hosted in 2 different databases (PostgreSQL and Cassandra) and is also temporarily stored in a Memcache to help with performance. How these systems work together is covered in much more detail in the Subsystem Architecture section (Section 8). The little bit of media content that Reddit is responsible for storing (such as thumbnails) is stored on Amazon Web Services; everything else is accessed via the links submitted to Reddit (the content web sites, imgur.com, etc.). Reddit also uses Amazon’s CloudSearch service, which provides a fully-managed search service for the site. There is a publicly available API for developers to use to access Reddit content (one such 3rd-party tool is the Reddit Enhancement Suite). And last is the actual web browser that’s accessing the Reddit website.

5.1.1 Stakeholders

Reddit’s stakeholders are as follows:

● Venture Capitalists/Sponsors
○ They care because they will be funding the majority of the startup costs, and they want to ensure that they will be able to see a return on their investment.
○ Projects that do not sell an actual product, but rather make money off of subscriptions and advertising, usually need additional startup funds.
● Management
○ They care because they want to ensure that the project is successful and that it meets its goals.
● Developers
○ Developers will be implementing the system and will be responsible for translating requirements into code, and so will want the requirements to be clear and to make sense.
● System administrators
○ They will be responsible for the operational aspects of the system, and have an interest in having an easy-to-maintain system.
● Users
○ reddit.com is driven by the number of users that use the site. By getting user feedback into this project, it will be more likely that users will want to use the site. The more users that use the site, the more advertising revenue will be generated and the more premium services will be ordered.
● Advertisers
○ They want to pay for ads on sites that generate large amounts of traffic.
○ They also want their ads to be prominent enough for people to click on them, but not so prominent that users will avoid the site.

5.2 Overview of Requirements

5.2.1 Functional Requirements

● Users shall be able to view links for a pre-defined, default set of categories (the front page). Users shall be able to sort these links in different ways (e.g., hot, newest, top today, top this month, top all time, controversial).
● Users shall be able to post links in any category. Users must be signed in to post links.
● Users shall be able to log in.
● Users shall be able to log out.
● Users shall be able to vote on all links. Users must sign in to vote.
● Users shall be able to comment on all links, and post reply comments to other users’ comments. Users must be logged in to create comments.
● Users shall be able to subscribe to subreddits, or communities.
○ Each community shall be independent from other communities and moderated by a team of volunteers.
● Users shall be able to view links from the set of subreddits to which they have subscribed, with all the same sorting controls they could apply to the front page.
● Daily awards for various categories shall be given out to encourage user participation.
○ Awards are calculated daily for the previous day and the winners listed on an honor roll.
○ Each winner receives a trophy on their user page as a token of having won the award.
● Each user shall have a private message box, with which they can write and read private messages to and from other reddit users.
● Users shall be able to sign up for premium services (reddit gold). This includes a set of extended features for a reddit account, including but not limited to:
○ A trophy on your userpage
○ The ability to turn off sidebar ads, sponsored links, both, or neither
○ The option of seeing twice as many comments at once without having to click "load more comments"
○ The ability to see up to 100 subscribed subreddits in your front-page listing
○ New comment highlighting: see what’s been posted since the last time you visited a thread
○ The ability to add notes to your friends to help you keep track of them all
○ Seeing your karma broken down by subreddit
● Users shall be able to access their personal profile, where they can:
○ View comments they’ve made
○ View links they’ve submitted
○ View liked links
○ View disliked links
○ View hidden links
○ View saved links
○ View their link and comment karma
● Users shall be able to configure their personal preferences for controlling how they view the website. This includes:
○ Primary language settings
○ Link, comment, content, message, display, and privacy settings
● Users shall be able to share a link through email by clicking a hyperlink and entering an address.
● Users shall be able to report a link (e.g., for being offensive) by clicking a hyperlink.
● Users shall be able to hide a link (i.e., the link no longer shows up on any list views for that user) by clicking a hyperlink.
● Users shall be able to save a link for later viewing on a personal “saved links” page by clicking a hyperlink.
● Users shall be able to get recommended links based on their voting history and that of other users.


5.2.2 Non-Functional Requirements

● Accessibility
○ reddit shall be accessible to persons with disabilities.
○ reddit shall be translated into all languages with internet user bases exceeding 50,000 people.
● Availability
○ reddit shall be on-line and properly functioning at all times, with less than 1 hour of downtime per year.
● Configurability
○ reddit server processes shall be configurable in order to improve the performance of each action a user can take when visiting reddit.
● Documentation
○ User functionality shall be documented to help people use reddit.
○ Administrator functionality shall be documented to help reddit administrators and open-source implementers.
○ Source code quality guidelines shall be documented so that source code changes can be contributed in a consistent manner.
● Extensibility
○ reddit shall be designed in a modular fashion so that new features can easily be added.
○ reddit shall be designed so that open-source contributions can be easily made.
● Open Source
○ The reddit core website shall be open source. Some proprietary aspects that help reddit generate revenue do not need to be open source.
● Performance
○ reddit processes shall respond to all user actions within 1 second.
● Portability
○ reddit shall be designed so that it can be installed on a variety of operating systems, including Linux, Mac, and Windows.
● Privacy
○ User data will not be revealed or sold to third parties.
● Security
○ Usernames, email addresses, and passwords shall be encrypted.
● Usability
○ The site shall provide an intuitive user interface that requires at most 15 minutes of training before a user can start posting links, voting on links, and providing comments.
● Scalability
○ reddit shall be built so that large increases in the volume of users can be handled without a noticeable performance decline.

5.3 System Scenarios

5.3.1 Functional Scenarios (ranked in order of importance, most to least):


Submit a Link
● Overview: How the system handles a user submitting a link.
● System state: The user has an account and is logged in.
● System environment: The deployment environment is operating normally, without problems.
● External stimulus: A user submits a link through the link submission page.
● Required system response: The link should be validated (correct captcha, not a repeat link), then queued up for storage and indexing in the database. Once that has completed, the information from the database transaction should be propagated out to the Memcache.

Vote on a Link
● Overview: How the system handles a user voting on a link or comment.
● System state: The user has an account and is logged in.
● System environment: The deployment environment is operating normally.
● External stimulus: A user clicks on either the up or the down arrow directly to the left of a link or comment.
● Required system response: The system increments the upvote count for an upvote, or decrements the downvote count for a downvote, and displays the updated number immediately to the user. The system keeps track of the user’s vote and stores it in their profile area so that it can be viewed later. Additionally, the user’s link karma and comment karma are updated, depending on the popularity of the link or comment.

View the Front Page (No User Logged In)
● Overview: How the system handles a user who is not logged in viewing the front page.
● System state: The user is not logged in.
● System environment: The deployment environment is operating normally.
● External stimulus: A user goes to the reddit homepage.
● Required system response: The incoming request gets the default 20 subreddits for unregistered users (pics, gaming, worldnews, videos, etc.) and gathers the top 25 “hot” listings for those subreddits. From there, the user is taken to the homepage, with those 25 listings loaded for them.

User Creating an Account on Reddit
● Overview: How a user creates an account (so they can vote on links/comments, submit links/comments, and subscribe to subreddits outside of the 20 default subreddits).
● System state: The user does not already have an account.
● System environment: The deployment environment is operating normally.
● External stimulus: A user follows the link to create a new account.
● Required system response: The system asks the user for a username, password, and email (optional). Before the user can submit their information, they must enter a captcha to verify that they are a real person. Once they submit their data, they are sent a confirmation email (if an email was entered) and are automatically logged in.

Leave a Comment on a Post


● Overview: How the system handles a user leaving a comment.
● System state: The user has an account and is logged in.
● System environment: The deployment environment is operating normally.
● External stimulus: A user posts a comment on a given post (not a reply to a previous comment).
● Required system response: Validate via captcha that it is an actual person leaving the comment. The comment is queued up for storage and indexing in the database. Once that has completed, the change to the database should be propagated out to the Memcache. Also, the page is updated to include the new comment among the existing comments, where it can be upvoted, edited, deleted, or replied to.

Subscribe/unsubscribe to a Subreddit
● Overview: This scenario pertains to how a user subscribes to various feeds or channels, called “subreddits” or just reddits.
● System state: The user is logged in and is viewing their current reddits.
● System environment: The deployment environment is operating normally.
● External stimulus: A user enters the reddit name they wish to subscribe to, or clicks the “unsubscribe” link next to the reddit they no longer wish to subscribe to.
● Required system response: When subscribing, the reddit is added to the list of current reddits on the user’s view of the reddit homepage. When unsubscribing, the reddit is removed from that list.

View the Front Page (Logged In User)
● Overview: How the system handles a registered and logged in user viewing the front page.
● System state: The user must already be logged in.
● System environment: The deployment environment is operating normally.
● External stimulus: A user visits the Reddit homepage (www.reddit.com).
● Required system response: The incoming request first gets the list of subscribed subreddits for the logged in user, then gathers the top 25 “hot” listings for that subset of subreddits. From there, the user is taken to the homepage with those 25 listings loaded for them.

View the Recommended Links Page (Logged In User)
● Overview: How the system handles a registered and logged in user viewing the Recommended Links page.
● System state: The user must already be logged in.
● System environment: The deployment environment is operating normally.
● External stimulus: A user visits the Recommended Links page (www.reddit.com/recommendedlinks).
● Required system response: The incoming request goes to the Cassandra cache to get the latest pre-computed recommended links for the logged in user and displays them on the page (a read-path sketch follows this scenario).

5.3.2 Quality based Scenarios:


Scalability: Traffic Triples in Size
● Overview: How the system deals with rapid increases in traffic.
● System environment: The deployment environment is operating normally.
● Environment changes: User requests go from 500/second to 1,500/second for over an hour.
● Required system behavior: The system should automatically spin up 50% more AWS machines to handle the increased traffic, with the Load Balancer divvying up the traffic between these and the existing app servers.

Availability: An Application Server Fails
● Overview: How the system deals with one of the app servers failing.
● System environment: The deployment environment is operating normally. Users are in the middle of HTTP sessions, and new ones are coming in.
● Environment changes: One of the app servers stops responding to HTTP requests.
● Required system behavior: The system should no longer route new user sessions to the down application server. It should route all traffic to the remaining app servers while the downed server restarts (via an automated process). It should also preserve the sessions that were already in progress.

Modifiability: Add an Attribute to Links
● Overview: How the system deals with additions to the logical data structure, i.e., a new attribute is added to links.
● System environment: The deployment environment is operating normally.
● Environment changes: The developers add a new attribute to one of the entities in the system.
● Required system behavior: The system does not require a change to the database tables. Adding code to handle the new attribute and mapping it to the database, and then recompiling the application, is enough. This assumes that attributes will be stored in the nosql database, not the relational database.

Performance: Make Reddit Responsive
● Overview: 50 million people are logged in to reddit.
● System environment: The deployment environment is operating normally.
● Environment changes: 90% of users begin to post links and comments, and vote on links and comments.
● Required system behavior: Reddit should respond in less than 1 second to each user’s request. This means that critical information is posted back to the user in real time, and information that the user does not see immediately can be queued or processed when resources become available.

6. Architectural Forces

6.1 Goals


Reddit is in the already crowded social news tech space, with many large existing competitors like Digg and Slashdot. However, Reddit differentiates itself from these other websites by focusing entirely on user-driven content. Unlike Digg, which is also built upon user-submitted links, Reddit gives complete control over to its community instead of putting a large amount of sponsored content on the main portions of the site. Ideally, Reddit wants to be your go-to source for all news online (hence the tagline “The Front Page of the Internet”). But besides that content goal, there are many architectural goals that Reddit must achieve in order to be competitive. If any of these are lacking, the users of the site can easily switch to one of its competitors (social news sites do not have high lock-in the way social networks like Facebook do).

To support the business driver, these are the architecture goals that should be fulfilled:

1 Extensibility. Reddit’s architecture should easily allow changes for new feature requests needed to remain competitive.

2 Scalability. As Reddit becomes more popular, it should be able to manage an abundant amount of data. It must be linearly scalable across hardware resources to support an exponentially growing user base.

3 Usability. Reddit should be easy to use and require a minimum learning curve in order to start using the site.

4 Accessibility. Reddit should be accessible to people with physical, visual and auditory impairments.

5 Localizable. Reddit should be able to be translated into all major languages without any loss of functionality to the end user.

6.2 Constraints

To meet the Architectural Goals listed above, Reddit has the following constraints:

● Python’s Global Interpreter Lock makes multithreading performance prohibitive. Reddit cannot use Python’s thread feature, because it is too slow.

● Cython (Python with C types) must be used for the memcache interface libraries because it has significant performance benefits over pure Python.

● Reddit must continue to use a customized version of the Pylons framework because upgrades would break the application.

● Reddit needs to be developed in Python, because all the developers are familiar with Python.

6.3 Architectural Principles

Reddit’s architecture has 4 main principles:

Principle: All data should be cached.

Rationale: One of the primary goals of Reddit is that it should be responsive, otherwise the user will switch to some other site, since the majority of the content available on Reddit consists of links to external sources. At a minimum, the basic user operations like submitting a link, voting, and making comments should be fast (less than 1 second).

Implications:
● Store data in arrays of caches for fast retrieval.
● Use precomputed queries to give the user results quickly, so that no time is wasted collecting data and preparing its representation when the user asks for it. Cache the results of the precomputed queries.
● When the user initiates an action, do the minimal amount of work and respond to the user. Queue all the jobs related to the task and complete them in the background. This reduces the time the user has to wait for a response. (A small sketch of this cache-and-queue pattern follows this list.)
● Cache all rendered pages so that if the same (unaltered) page is requested, it will not be re-rendered.
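A minimal sketch of the cache-and-queue pattern described above. It assumes a memcached-style client (get/set/incr) and an AMQP-style queue; the helper calls, key formats, and function names are illustrative assumptions rather than Reddit's actual code:

    import json

    def get_hot_listing(subreddit, cache, db, ttl=60):
        """Serve a precomputed listing from cache, falling back to the database on a miss."""
        key = "hot:%s" % subreddit
        listing = cache.get(key)
        if listing is None:
            listing = db.query_hot_links(subreddit)   # expensive query, done rarely
            cache.set(key, listing, ttl)
        return listing

    def submit_vote(queue, cache, link_id, direction):
        """Do the minimum synchronously: bump the cached count and enqueue the real work."""
        counter = "ups:%s" % link_id if direction > 0 else "downs:%s" % link_id
        cache.incr(counter)
        queue.publish("register_vote", json.dumps({"link": link_id, "dir": direction}))

The user sees the updated count immediately, while the durable write and any recomputation happen later on a batch server.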

Principle: Reddit should be scalable. To handle the increasing traffic, adding a new app server or database should be easy.

Rationale: Reddit being a social news website, the content is going to increase exponentially day by day. If the system is not scalable, beyond a certain point it is going to crash and burn. Also, Reddit is one of the largest sites in the world (in the top 150 sites), and its traffic has been increasing steadily since its inception in 2005.

Implications:
● Adding more application servers should be easy. If state is kept on each server, resources are wasted having every server notify all the others of its state. To remove this overhead, keep no state on the application servers. Use a caching system with separate servers for the caching; every app server gets its state from the caching servers, which act as the central repository. This makes it easy to add a new application server.
● Separate each entity type onto a different database server, and have multiple servers for one entity if needed. For example, the comment database may be huge and may need more than one server. Separating out entities helps scale each one independently as needed.

Principle: Reddit should be extensible. Adding a new feature to any entity should be easy, without affecting other entities.

Rationale: If we add a new feature to links, it should not require us to change all the other entities like comments, etc. It also should not require a huge migration of the existing data. That can risk significant downtime and may affect the existing data adversely. Users hate it when they lose their data.

Implications:
● Have a database schema such that adding a new feature does not require adding a new column to a table, because adding columns makes data migration difficult.
● Organize tables such that adding an attribute to one entity does not affect other entities.

Principle: Application servers should not need to communicate with each other.

Rationale: This is necessary to allow horizontal scalability. Requests should be handled at the granularity of a single application server. That way, any number of application servers can be deployed/removed/replaced and reddit will continue to function as expected.

Implications: ● A load balancer is needed in front of the application servers to route requests to the appropriate location.

7. Architectural Views for Top-Level System

7.1 Functional View

7.1.1 Existing Functional Elements

Reddit is a web application for users to submit content, and to comment on, view, and vote on content submitted by others. It is written in Python and uses the popular Pylons web framework, an implementation of the Model-View-Controller software architecture, customized to behave more like the lighter-weight framework web.py, which Reddit’s authors originally created for Reddit and later moved away from so they would not have to support it.

When responding to an HTTP request, Reddit does the normal thing of finding the corresponding controller class, which then marshals the model objects and views it needs to create a page. It checks the cache both for the content listings and for the rendered HTML pages. If it doesn’t get a hit, it goes back to the PostgreSQL database for dynamic queries and to the Cassandra database for precomputed queries. The creators originally had each app process use its own cache, but keeping the caches in sync meant an app server had to notify all the others of each new entry, leading to an explosion of server-to-server communication, so a shared cache was swapped in to provide better scalability.

When the user takes actions on the site, these events are queued up and processed by batch operations asynchronously to avoid blocking the user. Simultaneously, when the user takes an action the updates will appear immediately, even though the backend processing of those changes has yet to take place. These jobs insert the new information into the database, whereafter it is pulled out by map reduce jobs to precompute values that are pushed into a cassandra cache. The jobs pulling off the queues also see if they can obviate the need for this map reduction by simply mutating the cache entries in place.


7.1.2 Reddit with Recommendation Extension: Link Recommendations

Figure 2 - Top-level functional view modified for link recommendation extension.

The extension chosen for reddit is a link recommendation extension. An extensive description of this extension and its rationale can be found in Appendix A. Three design alternatives were proposed. The alternatives that were not chosen are also described in Appendix A. Below is a description of the chosen alternative.

The chosen approach, which will hereafter be known as the “batch approach”, takes advantage of distributed computing resources to calculate link recommendations. The component modifications required are shown in Figure 2 above. In this approach, a nearest neighbor batch process runs periodically to determine the N other closest user accounts to each user’s account. The resulting list of nearest neighbors is stored as a relation in PostgreSQL. The link recommendation batch process also runs periodically, and uses the nearest neighbors and their voting history to determine what links should be shown as recommended links in this user’s account. For a visual representation of the data flow, see the data flow diagram in Figure 3.

The positives for this approach are as follows:
1 Cost:
   a Simple to implement, leading to lower cost.
   b Least computing power required, and hence lowest cost, especially since the frequency at which the batch is run can be configured however the administrators need in order to balance cost and data currency.
2 Scalability: Batch servers can be added to the process at any time.
3 Availability: Since the components will be running on dedicated batch servers, their availability will not affect the availability of the application as a whole. Individual batch servers can be restarted, added, and removed without affecting what a user sees when they visit reddit.
4 Evolvability: The batch process is decoupled from the rest of reddit, so the two can evolve without affecting one another.

The negatives for this approach are as follows:
1 Recommendations for all users will be computed, though it is not known whether all of them will be used.
2 Data will only be refreshed on a schedule, meaning it will lose its currency over those intervals.

Overall, the positives and negatives were compared to those of the alternatives in Appendix A, and it was decided that the batch approach was the most reasonable implementation.


Figure 3 - Data flow diagram for batch recommendation extension.

Element Name Nearest Neighbor Batch

Responsibilities Take the entire set of accounts on the site and for each one, find those that are most similar to it based on voting history (both in terms of common votes and similar distribution of votes among subreddits)

Interfaces - Inbound data: accounts table; commands: runnable (e.g., by a scheduler)

Interfaces - Outbound the “Neighbor Account” things in the thingdb

Element Name Link Recommendation Batch

Responsibilities Access the Neighbor Account, Link, and Vote things in ThingDB. Find the links upvoted by the neighbors that don’t have a vote from this user and recommend those links to the user. Update the recommendation results in Cassandra by interacting with the CacheChain.


Interfaces - Inbound the “Neighbor Account”, Link, and Vote things in the thingdb

Interfaces - Outbound Link recommendations for each user

Element Name Routing

Responsibilities The routing class maps requests for specific URLs to the controlling class that will coordinate the rendering of the appropriate page and the action that should be invoked on the controller.

Interfaces - Inbound HTTP GET and POST requests

Interfaces - Outbound Implementations of the WSGIController class

Element Name Controllers

Responsibilities It handles calls from the Routing component (which originate from the client).

Interfaces - Inbound POST functions, e.g., POST_comment

Interfaces - Outbound GET functions, e.g., GET_comment

Element Name Pages

Responsibilities Contain classes for all content areas of all the reddit pages.

Interfaces - Inbound Controller calls for HTML queries, Template calls for render

Interfaces - Outbound media scraper for thumbnails, render initiations for components

Element Name Render Wrapper

Responsibilities Content is set by the content renderer class. Used by the pages to render content.

Interfaces - Inbound render calls from Pages

Interfaces - Outbound Calls to the Cache chain for results


Element Name HTML Template

Responsibilities Contains static HTML and placeholders into which HTML can be dynamically interpolated by Python code.

Interfaces - Inbound Placeholder function calls and script points (denoted by lines starting with % or between <% and %>)

Interfaces - Outbound none

Element Name SearchQuery

Responsibilities Controllers use this component for search requests

Interfaces - Inbound searchQuery function called by controllers

Interfaces - Outbound search and sort functions on Cloudsearch component

Element Name Cloudsearch

Responsibilities Cloudsearch component represents the search query sent to the Amazon cloud search.

Interfaces - Inbound CloudSearchQuery function called by SearchQuery component for searching.

Interfaces - Outbound Amazon CloudSearch, to get the search query results.

Element Name Globals

Responsibilities a container for objects available throughout the life of the application.

One instance of Globals is created by Pylons during application initialization and is available during requests via the 'g' variable.

Interfaces - Inbound a global variable

Interfaces - Outbound database


Element Name ThingDB

Responsibilities Its primary responsibility is to get/put data from the CacheChain and the PostgreSQL database, and to/from the message queue.

Interfaces - Inbound Calls by batch operations to get data from the queue and the CacheChain. Calls by Models to save data to or get data from the CacheChain.

Interfaces - Outbound handle_items: call to the Message Queue interface to pull jobs. cache_get functions to pull from the CacheChain. add_item: call to the Message Queue interface to put jobs in the queue.

Element Name Javascript

Responsibilities It renders JavaScript in the pages that are loaded.

Interfaces - Inbound Calls from HTML template page component.

Interfaces - Outbound None

Element Name Batch Operation

Responsibilities These are kicked off as separate processes. There are sources and sinks of data, and batch jobs move data between them. These servers perform a number of tasks, including: pulling data from the PostgreSQL primary database and mapping the data into Hadoop; pulling jobs from the message queue and processing them; and pulling data from the primary database and dumping it into read-only DB servers used for Google search indexing.

Interfaces - Inbound None

Interfaces - Outbound It uses the ThingDB component to pull/put data from the CacheChain and, in turn, PostgreSQL. It also uses the ThingDB component to get data from the Message Queue interface.

Element Name API Controller

Responsibilities Acts as the controller for almost all AJAX interactions in the web site. Examples include redirecting to the login overlay, validating submissions, fetching additional comments for display.

Interfaces - Inbound HTTP GET and POST requests

Interfaces - Outbound page templates, database

Element Name CacheChain

Responsibilities Queries the various caches in order of priority to get the most accurate information as quickly as possible.

Interfaces - Inbound get, get_multi, simple_get_multi, reset

Interfaces - Outbound get, get_multi, simple_get_multi (implemented by Postgres, Cassandra, Memcache)
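To make the CacheChain's responsibility concrete, here is an illustrative reconstruction of a prioritized cache chain; it is a sketch consistent with the interfaces above, not the actual reddit class:

    class CacheChain:
        """Query caches in priority order; backfill faster caches on a hit from a slower one."""

        def __init__(self, caches):
            self.caches = caches              # e.g., [local_dict_cache, memcache, cassandra]

        def get(self, key):
            for i, cache in enumerate(self.caches):
                value = cache.get(key)
                if value is not None:
                    for faster in self.caches[:i]:   # refresh the layers that missed
                        faster.set(key, value)
                    return value
            return None

        def get_multi(self, keys):
            return {key: self.get(key) for key in keys}

        def set(self, key, value):
            for cache in self.caches:                # write through every layer
                cache.set(key, value)

        def reset(self):
            for cache in self.caches:
                cache.clear()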

Element Name Memcache

Responsibilities Store values that are shared between application servers. Spread the cache across multiple nodes to increase throughput.

Interfaces - Inbound
creating new content in the cache: set, set_multi
getting items out: get, get_multi
updating items: incr, decr, add, append, prepend, replace
deleting items: delete, delete_multi

Interfaces - Outbound none

Element Name Reddit base

Responsibilities provide archetypal controller functionality for other controllers to inherit or invoke

Interfaces - Inbound none

Interfaces - Outbound inheritance and function invocation

Element Name Media Scraper

Responsibilities It gets thumbnails for the link from Amazon S3.

Interfaces - Inbound none

Interfaces - Outbound HTTP requests out to other pages

Element Name Models

Responsibilities Owns data representations of various classes and their business-logic-independent behaviors

Interfaces - Inbound none

Interfaces - Outbound ORM (provided by thing db)

7.1.3 Functional Scenarios

7.1.3.1 View the Front Page

Viewing the front page is the hallmark scenario for Reddit, because it must happen very quickly and efficiently. The user’s request goes through a load balancer to reach a running Python process. There, the web framework’s routing configuration selects the appropriate controller to coordinate the creation of the page -- in this case, the hot-listing controller. The controller uses the database layer to execute its query (“hot” listings), which goes through a prioritized list of caches to see if the result set has already been computed recently enough. Depending on what ended up being the source (the local cache, memcache, Cassandra, or the database itself), it updates the other caches so this query won’t be computed again in the short term. From there, a rendering class is instantiated, which uses an HTML template and interpolates the listing data. It too checks the cache to see if the rendered HTML has been computed recently. The result is then shown to the user. (A condensed sketch of this flow appears below, before the sequence diagram.)
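The sketch below condenses the request flow just described. The function, object, and key names are illustrative stand-ins for the Pylons routing, controller, cache chain, and renderer components, not the literal reddit source:

    def view_front_page(request, routing, cache_chain, db, renderer):
        """Route -> controller query (cache-first) -> cached render -> response."""
        controller = routing.resolve(request.path)          # hot-listing controller in this case
        listing = cache_chain.get("hot:frontpage")
        if listing is None:                                  # miss in every cache layer
            listing = db.query_hot_links(limit=25)           # dynamic query against the database
            cache_chain.set("hot:frontpage", listing)        # backfill for the next request
        html = cache_chain.get("rendered:frontpage")
        if html is None:
            html = renderer.render("front_page", listing=listing)
            cache_chain.set("rendered:frontpage", html)
        return controller.respond(request, html)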


Figure 4 - View the Front Page Sequence Diagram


7.1.3.2 Vote on an Item

Reddit eats its own dog food, so to speak, when it comes to voting on an item. While there is a public-facing API for third-party vendors, the JavaScript on reddit’s own pages uses it as well. The service is provided by a Python process, which takes calls and turns them into items on a message queue. That queue is processed asynchronously by a batch server, also running Python. It instantiates a Vote object to hold all the info it pulls off of the queue and writes that to the database. But there’s also a clever performance hack: whereas a separate batch process used to re-query the database to pre-compute listings it would pump into the cache, the system now cuts out that middle-man and simply mutates the impacted cache entries in place. (A sketch of the queue consumer follows, before the sequence diagram.)
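A minimal sketch of the asynchronous vote processing described above, written as a generic message-queue consumer loop. The Vote stand-in, the queue/message API, and the cache key format are illustrative assumptions:

    import json
    from collections import namedtuple

    Vote = namedtuple("Vote", ["account_id", "link_id", "direction"])  # stand-in for the real model

    def process_vote_queue(queue, db, cache):
        """Drain vote messages: persist each vote, then mutate the cached counts in place."""
        for message in queue.consume("register_vote"):       # blocks, yielding messages one at a time
            payload = json.loads(message.body)
            vote = Vote(payload["account"], payload["link"], payload["dir"])
            db.save(vote)                                     # durable write to the vote database
            counter = "ups:%s" if vote.direction > 0 else "downs:%s"
            cache.incr(counter % vote.link_id)                # mutate cached listing counts directly
            message.ack()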

Figure 5 - Vote On An Item Sequence Diagram


7.1.3.3 Compute Nearest Neighbors

Before recommendations can be computed using collaborative filtering, it is necessary to find accounts with similar voting history profiles, both in terms of the items voted on and the subreddits those items are in. Any clustering algorithm can then be applied. For example, a first filtering is done by comparing the percentage of votes within each subreddit; then, within the resulting group of accounts, vectors are built from their votes and a threshold is set on the dot product of the vectors. An alternative algorithm could be used, such as partitioning the space of accounts using k-nearest-neighbor. These results are turned into neighbor account relationships in the database. (An illustrative sketch follows, before the sequence diagram.)
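An illustrative sketch of the similarity step, using cosine similarity over per-account vote vectors. The data shapes (a dict of account id to {link id: +1/-1}), the threshold, and the neighbor count are assumptions for the example, not the mandated algorithm:

    from math import sqrt

    def cosine(v1, v2):
        """Cosine similarity between two sparse vote vectors ({link_id: +1 or -1})."""
        shared = set(v1) & set(v2)
        dot = sum(v1[k] * v2[k] for k in shared)
        norm = sqrt(sum(x * x for x in v1.values())) * sqrt(sum(x * x for x in v2.values()))
        return dot / norm if norm else 0.0

    def nearest_neighbors(votes_by_account, n=10, threshold=0.2):
        """For each account, keep the top-n other accounts with the most similar vote history."""
        neighbors = {}
        for account, votes in votes_by_account.items():
            scored = [(cosine(votes, other_votes), other)
                      for other, other_votes in votes_by_account.items()
                      if other != account]
            scored = [(s, other) for s, other in scored if s >= threshold]
            scored.sort(reverse=True)
            neighbors[account] = [other for _, other in scored[:n]]
        return neighbors   # persisted as "Neighbor Account" relations by the batch job

In production the pairwise comparison would be restricted to the pre-filtered group of accounts (or done with a proper k-nearest-neighbor index) rather than across all accounts.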

Figure 6 - Compute Nearest Neighbors Sequence Diagram


7.1.3.4 Compute Recommendations

To find links to recommend to a user, look at the links that the user’s nearest neighbor accounts have upvoted but that the user has not voted on (called “missing” upvotes). Put all of these together into a result set and update the cache for later access by the controller modules. (A companion sketch follows, before the sequence diagram.)
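A companion sketch of the recommendation step, under the same assumed data shapes as the previous example; the cache key format and ranking by neighbor-upvote count are illustrative choices:

    def recommend_links(account, neighbors, votes_by_account, cache, limit=25):
        """Collect links the neighbors upvoted that this account has not voted on ("missing" upvotes)."""
        own_votes = votes_by_account.get(account, {})
        candidates = {}
        for neighbor in neighbors.get(account, []):
            for link_id, direction in votes_by_account.get(neighbor, {}).items():
                if direction > 0 and link_id not in own_votes:
                    candidates[link_id] = candidates.get(link_id, 0) + 1   # count neighbor upvotes
        ranked = sorted(candidates, key=candidates.get, reverse=True)[:limit]
        cache.set("recs:%s" % account, ranked)       # read later by the recommended-links controller
        return ranked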

Figure 7 - Compute Recommendations Sequence Diagram

7.1.4 System-Wide Processing

Reddit contains batch operations that continuously pull data from the source PostgreSQL database and place it into secondary data sources. For example, there is a process that continuously pulls precomputed query data from the AMQP queue and assimilates it into the Cassandra database (an illustrative sketch of such a job follows). Another example is a batch job that pulls data from the PostgreSQL database and sends it to a dedicated server that is used only for Google search indexing. System-wide batch jobs are further described in the deployment view.
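As an illustration of one such job, here is a sketch of a consumer that drains precomputed-query messages from an AMQP queue and writes them into a Cassandra-style store. It uses the pika RabbitMQ client; the queue name, message format, and cassandra helper object are assumptions for the example:

    import json
    import pika   # RabbitMQ (AMQP) client

    def run_precompute_consumer(cassandra, queue_name="precomputed_queries"):
        """Continuously move precomputed query results from the AMQP queue into Cassandra."""
        connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
        channel = connection.channel()
        channel.queue_declare(queue=queue_name, durable=True)

        def on_message(ch, method, properties, body):
            job = json.loads(body)
            # e.g., {"key": "hot:pics", "listing": [link ids ...]}
            cassandra.insert("precomputed", key=job["key"], value=job["listing"])
            ch.basic_ack(delivery_tag=method.delivery_tag)

        channel.basic_consume(queue=queue_name, on_message_callback=on_message)
        channel.start_consuming()   # blocks; runs on a dedicated batch server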

7.2 Information View

7.2.1 Data Structure

Even though we chose to do the Functional and Deployment Views for our top-level description of Reddit, we found a few pieces of the Information View extremely valuable in allowing the Reddit architects to achieve their scalability and extensibility goals. The actual database table structure is one of those things:

Figure 8 - Reddit LDS

For such a large website, Reddit has an exceedingly simple data model. They originally mapped each main entity type (account, link, comment, etc.) to its own table in the database, as most any application would, but after several painful migrations and being limited in the features they could quickly add, they decided to change to just the 5 tables you see above. So, instead of having individual tables for Accounts, Links, and Comments, there’s just one Thing table that stores the basic information common to lots of Reddit objects. They use the Data table to store variations on those Things, and the Rel table to store the relationships between those Things. This gives the Reddit architects a large amount of extensibility whenever they want to change the structure of these entities: all they have to do is add another Data record, with no expensive database migration needed.

The tradeoff for such a flexible data model is slower performance, since many joins are necessary to get a result set out. Reddit keeps the user from perceiving this by precomputing the results and storing them in a cache.

Here’s a brief description of each of the 5 tables used in the Reddit Database:

● Thing: A Thing is really a generic Reddit object that has a Thing_Id (primary key), a number of upvotes, a number of downvotes, flags marking it Deleted or Spam, and a Date. Since there are multiple objects in Reddit that share this generic structure, the architects were able to abstract their tables out to this level and use the Data table for any deviations that need to be stored.

● Data: A Data record is used to give extra attributes to a Thing. This way, when an object stored in the Thing table needs a bit more specialized functionality, it can be easily added. The Data record has 4 attributes: a Thing_Id, a Key, a Value, and a Kind. The combination of the Thing_Id and the Key makes up Data’s primary key. An example of what a Data record could be used for would be to store a Link’s URL (the Link being a Thing). (A short sketch of this Thing/Data pattern follows this list.)

● Rel: A Rel record stores the relationship between different Thing entities. A Rel record has 5 attributes: a Rel_Id (the primary key), Thing1_Id, Thing2_Id, a Name, and a Date. The two Thing*_Id attributes are references back to the Thing table that specify which Thing records are being related to each other. Note that one downfall of this structure is that it does not allow for variations on the “Relationship” records, as there is no Rel_Data table. If there were ever a need to store more information about a given relationship, it would need to be placed on one of the related Things.

● Type: A Type is used simply to keep track of all types used in the code base, storing very little information: just its Type_Id and Name.

● Rel Type: A Rel Type is used to store the relationships between Types, giving each relationship an Id and Name field. Used in conjunction with the Type table, it can store the structure of all objects being used in the code base.
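To make the pattern concrete, here is an illustrative sketch of how a single submitted link might be represented in this model; the identifiers and values are invented for the example:

    # One Link stored in the generic schema (values are invented for illustration).
    thing_row = {"thing_id": 12345, "ups": 812, "downs": 34,
                 "deleted": False, "spam": False, "date": "2012-10-01"}

    # Everything link-specific lives in the Data table as key/value rows keyed by thing_id.
    data_rows = [
        {"thing_id": 12345, "key": "title",  "value": "Winter is coming", "kind": "str"},
        {"thing_id": 12345, "key": "url",    "value": "http://example.com/article", "kind": "str"},
        {"thing_id": 12345, "key": "author", "value": "account_987", "kind": "str"},
    ]

    # A vote by account 987 on this link is a row in the Rel table relating two Things.
    rel_row = {"rel_id": 555, "thing1_id": 987, "thing2_id": 12345,
               "name": "vote_account_link_+1", "date": "2012-10-02"}

    # Adding a new attribute later (e.g., a thumbnail) is just one more Data row -- no migration:
    data_rows.append({"thing_id": 12345, "key": "thumbnail",
                      "value": "thumbs/12345.png", "kind": "str"})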

In the current version of Reddit, while there is much flexibility provided by the Data, Type, and Type Relationship tables, there are a fixed number of Things and Relationships between those Things. You can see these in Figure 9 below. Note that we kept the color scheme from the original LDS to help distinguish between Things and Relationships.


Figure 9 - Thing and Relationship Models

As alluded to before, these would have been the actual database tables in the original Reddit schema, but now all of this is stored in just 2 tables (3 if you include Thing’s extra Data attributes). Besides giving an idea of which Things and Relationships the developers can work with, this also provides an easy way to see how to segment the data into multiple databases. For instance, this is the default partitioning of data onto multiple data servers (note that for the actual Reddit production environment, they probably have each object on its own database server):

● Objects stored on Main DB Server:
○ Link, Account, Message, Save Hide, Click, Inbox Account Comment, Inbox Account Message, Moderator Inbox, Report Account Link, Report Account Message, Report Account SubReddit, Jury Account Link, Ad, AdSr, Flair, and Promo Campaign
● Objects stored on Comment DB Server:
○ Comment, Subreddit, SRMember, Friend, and Report Account Comment
● Objects stored on Vote DB Server:
○ Vote Account Link, Vote Account Comment, and Nearest Account
● Objects stored on Award DB Server:
○ Award and Trophy

This partitioning allows the architects and developers to scale more easily and get Reddit performing as fast as possible. One thing the architects mentioned was that they wished they had built database sharding into the architecture when moving to this “Thing Database”; now it is going to be increasingly difficult. We’ll go into this more in the extensions section.
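As a rough illustration of how this partitioning might be expressed, the sketch below maps each object type to the database server that holds it, mirroring the default split listed above. The dictionary keys and the lookup helper are our own illustrative names, not Reddit’s actual configuration code.

```python
# Illustrative mapping of object types to database servers, mirroring the
# default partitioning described above (not Reddit's actual config code).
DB_PARTITIONS = {
    'main_db':    ['Link', 'Account', 'Message', 'Save Hide', 'Click',
                   'Inbox Account Comment', 'Inbox Account Message',
                   'Moderator Inbox', 'Report Account Link',
                   'Report Account Message', 'Report Account SubReddit',
                   'Jury Account Link', 'Ad', 'AdSr', 'Flair',
                   'Promo Campaign'],
    'comment_db': ['Comment', 'Subreddit', 'SRMember', 'Friend',
                   'Report Account Comment'],
    'vote_db':    ['Vote Account Link', 'Vote Account Comment',
                   'Nearest Account'],
    'award_db':   ['Award', 'Trophy'],
}

def db_for(thing_type):
    """Return the database server that stores a given object type."""
    for db_name, types in DB_PARTITIONS.items():
        if thing_type in types:
            return db_name
    raise KeyError('no database configured for %r' % thing_type)

# e.g. db_for('Comment') -> 'comment_db'
```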

In terms of the proposed extension, it’s worth noting that a new relationship was added to the Thing and Relationship diagram above: the Nearest Account relationship. These records will store the results of the Nearest Account Batch process, and will be one of the direct inputs into the Link Recommendation Batch process whenever it runs (note that it can run much later than the Nearest Account Batch did).

7.2.2 Data Flow

As stated at the beginning of the Information View, we won’t be going into detail here since we just wanted to show the unique Data Model of Reddit.

7.2.3 Data Ownership

Not Applicable to Reddit

7.2.4 Information Lifecycles

As stated at the beginning of the Information View, we won’t be going into detail here since we just wanted to show the unique Data Model of Reddit.

7.2.5 Timeliness and Latency

As stated at the beginning of the Information View, we won’t be going into detail here since we just wanted to show the unique Data Model of Reddit.


7.2.6 Archive and Retention

Not Applicable to Reddit

7.3 Deployment View

The Deployment view of Reddit defines the important characteristics of the system’s operational deployment environment. This view includes the details of the processing nodes that the system requires for its installation and proper runtime functionality.

Reddit consists of a highly distributed set of modules, processes, and third-party software systems. The main purpose of most of these systems is to enable an extremely large number of users to use the website at the same time without suffering any performance lag. This is accomplished by scaling each subsystem to handle more load, and by caching data in places where it can be accessed faster than by looking it up in a database.

7.3.1 Runtime Platform Model

The runtime platform model shows the physical systems that comprise Reddit. Lines indicate how the systems connect with each other. Folders indicate groups of physical servers that can be scaled to the desired number of servers. Components indicate where pieces of Reddit code are running to distribute data between systems.


Figure 10 - Reddit Top-Level Deployment Diagram


7.3.2 Component Descriptions

In the table below, functional elements are mapped to the servers on which they reside. Note that the bulk of the application resides on the Python App Servers.


Functional Element | Deployment Node
Search Batch | Search Batch Server(s)
Mapreduce Batch | Mapreduce Batch Server(s)
Precompute Batch | Precompute Server(s)
Everything Else | Python App Server(s)
Nearest Neighbor Batch | Nearest Neighbor Batch Server(s)
Link Recommendation Batch | Link Recommendation Batch Server(s)

HAProxy is the Load Balancer, which distributes the traffic across multiple Python Application Servers.

Memcache is used to store almost everything, including database data, session data, and rendered pages. When a user requests something, the app server checks Memcache to see if the requested data is already available in the cache. Memcache is also used for global locking of nodes. It appears that they want to use ZooKeeper for global locking instead, but this does not seem to be implemented yet. ZooKeeper provides a tree of nodes with a guaranteed order of operations on those nodes.

Cassandra is used to store results of precomputed queries. These queries are picked up from the AMQP message queue. If the data is not available in the Memcache, the request is routed to Cassandra.

Reddit has a cluster of PostgreSQL backend databases. These are the primary database servers.

When a user requests something, the system does the minimal amount of work, tells the user the transaction is done, and then puts a job in the AMQP message queue. The job knows the set of things that need to be updated for it.
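As a sketch of that “acknowledge first, queue the rest” pattern, the snippet below publishes a deferred job onto an AMQP queue using amqplib (which appears in the dependency list in section 7.3.3). The host, credentials, queue name, and payload shape are illustrative assumptions, not Reddit’s actual code.

```python
# Hedged sketch: enqueue the deferred work for a request using amqplib.
# Host, credentials, queue name, and payload shape are assumptions.
import json
from amqplib import client_0_8 as amqp

def enqueue_job(queue_name, payload):
    conn = amqp.Connection(host='localhost:5672', userid='guest',
                           password='guest', virtual_host='/')
    chan = conn.channel()
    chan.queue_declare(queue=queue_name, durable=True, auto_delete=False)
    msg = amqp.Message(json.dumps(payload), delivery_mode=2)  # persistent
    chan.basic_publish(msg, exchange='', routing_key=queue_name)
    chan.close()
    conn.close()

# The web request does the minimal work, responds to the user, then e.g.:
# enqueue_job('vote_q', {'thing_id': 12345, 'to_update': ['hot', 'top']})
```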

Amazon Hadoop is used to run map/reduce jobs on the link database. It is used for precomputing queries, such as the top link listing of the last hour, and the results are stored in Cassandra.
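A precomputation like “top links of the last hour” maps naturally onto map/reduce. The pair of functions below is a simplified, Hadoop-streaming-style illustration of such a job, not Reddit’s actual implementation; the input record layout is an assumption.

```python
# Simplified map/reduce sketch for "top link listing of the last hour".
# Input records and field layout are assumptions, not Reddit's actual data.
import time

ONE_HOUR = 60 * 60

def mapper(lines, now=None):
    """Emit (link_id, 1) for every upvote cast within the last hour."""
    now = now or time.time()
    for line in lines:
        link_id, vote_dir, timestamp = line.rstrip('\n').split('\t')
        if int(vote_dir) > 0 and now - float(timestamp) <= ONE_HOUR:
            yield link_id, 1

def reducer(pairs):
    """Sum the per-link counts emitted by the mapper (input sorted by key)."""
    current_link, total = None, 0
    for link_id, count in pairs:
        if current_link is not None and link_id != current_link:
            yield current_link, total
            total = 0
        current_link = link_id
        total += count
    if current_link is not None:
        yield current_link, total

# The final listing would sort the reducer output by count and store the
# top N rows in Cassandra as a precomputed query.
```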


Google search works in a way that is difficult to cache for, so Reddit has dedicated read-only DB servers used only for Google search indexing.

Amazon CloudSearch is used for searching within Reddit.

Amazon S3 (Simple Storage Service) is used to store static content.

The batch servers are a group of servers that run separate processes (batch jobs). They are used for mapping data from the primary database servers to the read-only DB servers (used for Google search), and for mapping data from the primary database servers to Hadoop. There are also consume-queue batch servers, which pick up jobs from the AMQP queue and process them. For the extension, we are going to add a new Nearest Neighbor batch server, which will calculate nearest neighbor accounts for all users at regular intervals.
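As a sketch of what the Nearest Neighbor batch might compute, the snippet below scores account similarity as the overlap between users’ sets of upvoted links (a simple Jaccard measure). The similarity metric, the “top k” cutoff, and the data structures are our illustrative assumptions, not a committed design.

```python
# Illustrative nearest-neighbor calculation over upvote sets.
# The Jaccard similarity and the top-k cutoff are assumptions.
def jaccard(a, b):
    """Overlap between two sets of upvoted link ids."""
    if not a or not b:
        return 0.0
    return len(a & b) / float(len(a | b))

def nearest_neighbors(upvotes_by_user, user_id, k=10):
    """Return the k accounts whose upvote history most resembles user_id's."""
    mine = upvotes_by_user[user_id]
    scores = [(jaccard(mine, theirs), other)
              for other, theirs in upvotes_by_user.items()
              if other != user_id]
    scores.sort(reverse=True)
    return [other for score, other in scores[:k] if score > 0]

# The batch would run this for every account and write the resulting
# Nearest Account relationships back through the Thing/Rel tables.
```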

The Link Recommendation Batch server will run after the Nearest Neighbor Batch server completes calculating neighbor accounts. This server will fetch data from the PostgreSQL database, calculate the link recommendations for each user, and put the results in Cassandra.
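The following sketch shows how the Link Recommendation batch might turn those neighbor lists into recommendations and store them as a precomputed query in Cassandra via pycassa (listed in section 7.3.3). The keyspace name, the LinkRecommendations column family, and the row layout are assumptions for illustration only.

```python
# Illustrative link-recommendation batch step. The keyspace, the
# 'LinkRecommendations' column family, and the row layout are assumptions.
import pycassa

def recommend_links(user_id, neighbors, upvotes_by_user, max_links=25):
    """Links upvoted by nearest neighbors that the user has not voted on."""
    seen = upvotes_by_user.get(user_id, set())
    candidates = {}
    for neighbor in neighbors:
        for link_id in upvotes_by_user.get(neighbor, set()) - seen:
            candidates[link_id] = candidates.get(link_id, 0) + 1
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    return ranked[:max_links]

def store_recommendations(user_id, links):
    pool = pycassa.ConnectionPool('reddit')               # keyspace is assumed
    cf = pycassa.ColumnFamily(pool, 'LinkRecommendations')
    # One row per user; one column per recommended link, keyed by rank.
    cf.insert(str(user_id), dict((str(i), str(link))
                                 for i, link in enumerate(links)))
```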

7.3.3 Software Dependencies

In this section, software dependencies are listed. These are the Python software modules that are required in order for Reddit to run. Many of these are client libraries that are needed on the Reddit application servers in order to interface with external systems. Dependencies on external software not running on the Reddit application servers have been omitted, as that information is not currently available.


Component | Version
Routes | 1.8
Pylons | 0.9.6.2
webhelpers | 0.6.4
boto | 2.0
pytz | n/a
pycrypto | n/a
Babel | 0.9.1
cython | 0.14
SQLAlchemy | 0.7.4
BeautifulSoup | n/a
cssutils | 0.9.5.1
chardet | n/a
psycopg2 | n/a
pycountry | n/a
pycassa | 1.7.0
PIL | n/a
pycaptcha | n/a
amqplib | n/a
pylibmc | 1.2.1-dev
py-bcrypt | n/a
python-statsd | n/a
snudown | 1.1.0
l2cs | n/a
lxml | n/a
kazoo | n/a

7.3.4 Network Model

Currently, Reddit runs entirely within the cloud. Note that the location where various pieces of Reddit are deployed is subject to change; the important thing to know here is that Reddit is remotely hosted in the cloud. Different aspects of Reddit are hosted on various Amazon cloud services. Amazon EC2 (Elastic Compute Cloud) hosts the dynamic content, memcache servers, PostgreSQL servers, Cassandra servers, and ZooKeeper servers. Amazon S3 (Simple Storage Service) hosts static content. Amazon CloudSearch hosts search indices, such that when you search within Reddit, search results are delivered from CloudSearch. Amazon Elastic MapReduce hosts Hadoop MapReduce. The details of each of these components are beyond the scope of this document. See http://aws.amazon.com/products/ for details on how each element works.


Figure 11 - Network Diagram


8. Architectural Views for Caching Subsystem

Scaling and performance are critical to the success of Reddit. Because of this, we decided to focus on the caching subsystem. The caching subsystem is responsible for setting and getting data to and from Reddit’s data storage devices. As mentioned in the deployment view, Reddit uses a 4-tiered database architecture, in which data is stored in progressively slower components.

Upon a request for data, the caching subsystem decides where a particular data block is located. When data is not found in a particular layer, it is responsible for getting the data from the next layer down and syncing the data into the layer in which it was not found. That means that the next time the data is requested, it will be accessed more quickly. The caching sublayer is also responsible for storing data. Typically, data is saved into higher layers and then eventually saved into the PostgreSQL database. For both fetching and saving data, the concept of cache chains is used.

8.1 Functional View

8.1.1 Functional Elements

Reddit makes extensive use of caching so that it can avoid reading from and writing to the database wherever possible. Reading from and writing to memory is much faster, and Reddit’s goal is to reach a 100% cache hit rate.

In order to accomplish this, they manage a series of cache chain components. Each cache chain is composed of multiple cache modules. When a read data access occurs, each element in the chain is traversed until the relevant data is found. If it is not found in the cache, it is looked up in the database and then synchronized to the relevant caches. When a write data access occurs, data is written to all of the relevant caches and then written to the database.
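That read-through/write-through behavior can be sketched as follows. This is a simplified illustration of the traversal and backfill logic described above, not the actual cache chain classes from the Reddit code base.

```python
# Simplified sketch of cache-chain behavior: read through the caches in
# priority order, backfill on a hit lower down, write through everywhere.
# This is an illustration, not the actual Reddit implementation.
class SimpleCacheChain(object):
    def __init__(self, caches, backend):
        self.caches = caches      # fastest first, e.g. [local, memcache]
        self.backend = backend    # e.g. Cassandra or PostgreSQL accessor

    def get(self, key):
        for i, cache in enumerate(self.caches):
            value = cache.get(key)
            if value is not None:
                # Backfill the faster caches that missed.
                for missed in self.caches[:i]:
                    missed.set(key, value)
                return value
        value = self.backend.get(key)
        if value is not None:
            for cache in self.caches:
                cache.set(key, value)
        return value

    def set(self, key, value):
        # Write to every cache, then persist in the backing store.
        for cache in self.caches:
            cache.set(key, value)
        self.backend.set(key, value)
```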

The process is slightly different for writing and reading to and from the render cache. When Reddit assembles a dynamic page to be sent to the user, the page is written to the render cache chain the first time it is accessed. Upon subsequent requests of the same page, if the data on the page has not changed, then the page can be loaded from the render cache chain. If the data has changed, then the page is re-rendered and written back to the render cache chain.

There are four distinct cache chains used:

● Cassandra Cache Chain defines the order of caches to look in when reading or writing data that is backed by Cassandra.
● Hard Cache Chain defines the order of caches to look in when reading or writing data that is backed by PostgreSQL.
● Stale Cache Chain is the same as the Hard Cache Chain except that it is not backed by a “light” database acting as a cache. It is assumed that items stored here go stale more quickly, and therefore do not need to be retained in the cache as long.
● Render Cache Chain is composed of a series of caches that store rendered html pages that are ready to be sent back to user requests.

There are five distinct cache chain components:

● The local cache is in-memory storage of content directly on the application server.
● The mem cache is in-memory storage of content on the memcache server. The version currently used is CMemcache, a cache access module written in C for better performance.
● The hard cache is a “light” version of the database that is faster to access than the full database, and it is used for only certain operations.
● The cassandra cache is a facade for a Cassandra database column family - it is not actually a cache.
● The lock cache is another mem cache that stores locks on certain write transactions to avoid contention.

Interestingly, one of our sources of documentation said that the lock cache was replaced by Apache ZooKeeper. However, it looks like this change has not yet been effected in the live Reddit code base.

Figure 12 - Caching Component Diagram

Element Name Global Cache Chain Manager


Responsibilities Manages high level configurations of cache chains. Populates cache chain references with real objects at runtime. Allows cache chains to be composed of an arbitrary number and combination of mem caches.

Interfaces - Inbound Global references - provide access to cache chain objects

Interfaces - Outbound Global references - accesses cache chain objects

Element Name Cassandra Cache Chain

Responsibilities Provides a caching layer for Cassandra-based data. Data stored here are mostly pre-computed queries.

Interfaces - Inbound Get/Set Data - Operations for searching the chain for the desired data, and setting data in all elements of the chain.

Interfaces - Outbound Get/Set Data - Operations for setting and getting data to and from specific cache elements.

Element Name Hard Cache Chain

Responsibilities Provides a caching layer for PostgreSQL-based data. Data stored here are not pre-computed and are page, user, or time specific.

Interfaces - Inbound Get/Set Data - Operations for searching the chain for the desired data, and setting data in all elements of the chain.

Interfaces - Outbound Get/Set Data - Operations for setting and getting data to and from specific cache elements.

Element Name Stale Cache Chain

Responsibilities Provides a caching layer for PostgreSQL based data that are more likely to go stale than data in the Hard Cache Chain.

Interfaces - Inbound Get/Set Data - Operations for searching the chain for the desired data, and setting data in all elements of the chain.

Interfaces - Outbound Get/Set Data - Operations for setting and getting data to and from specific cache elements.

Element Name Render Cache Chain

Responsibilities Stores rendered html pages into its element caches.

Interfaces - Inbound Get/Set Data - Operations for searching the chain for the desired html page, and setting html pages in all elements of the chain.

Interfaces - Outbound Get/Set Data - Operations for setting and getting html pages to and from specific cache elements.

Element Name Cassandra Cache

Responsibilities A facade for reading and writing data to Cassandra itself. Stores data that is backed by Cassandra. It is not actually a cache.

Interfaces - Inbound Get/Set Data - Operations for getting and setting data in this specific cache.

Interfaces - Outbound Get/Set Data - Operations for getting data from and setting data into the Cassandra client library.

Element Name Hard Cache

Responsibilities A mem cache that is designed to store data backed by PostgreSQL.

Interfaces - Inbound Get/Set Data - Operations for getting and setting data in this specific cache.

Interfaces - Outbound Get/Set Data - Operations for getting data from and setting data into the memcache client library.

Element Name Lock Cache


Responsibilities A memcache that stores locks on all transactions that are contention-sensitive. It prevents data being written to caches and databases that are currently in the process of being updated by an existing transaction.

Interfaces - Inbound Get/Set Data - Operations for getting and setting locks in this specific cache.

Interfaces - Outbound Get/Set Data - Operations for getting locks from and setting data into the memcache client library.

Element Name Local Cache

Responsibilities A dictionary of data sets that stores its data in memory. It provides the fastest access to its data.

Interfaces - Inbound Get/Set Data - Operations for getting and setting data in this dictionary.

Interfaces - Outbound None

Element Name Memcache (CMemcache)

Responsibilities A memcache implemented in C for faster performance than Python. It manages large volumes of data that are stored in memory on the memcache nodes. It is configurable in that data can be stored in a wide variety of formats. It provides the basis of the hard, render, cassandra, and lock caches.

Interfaces - Inbound Get/Set Data - Operations for getting and setting data in this specific cache.

Interfaces - Outbound Get/Set Data - Operations for getting data from and setting data into the memcache client library.

Element Name Memcache client library (memcache.Client)

Responsibilities Manages the low-level API calls to and from the physical memcache server process.

Interfaces - Inbound Get/Set Data - Operations for getting and setting data in a specific memcache server.


Interfaces - Outbound Get/Set Data - Operations for getting data from and setting data into the requested mem cache server process. It can interact with an almost infinite number of memcache servers.

Element Name Hard Cache Backend

Responsibilities A component that manages setting and getting data to and from a “light” version of the PostgreSQL database that acts as the final level in the hard cache chain.

Interfaces - Inbound Get/Set Data - Operations for getting and setting data in this specific database.

Interfaces - Outbound Get/Set Data - Operations for getting data from and setting data into the database.

A class diagram showing the structure of the caching subsystem has also been included. It shows that there is a clear hierarchy of classes that delegate method calls to subclasses and aggregated classes. Another important thing to notice here is that most of the caches and cache chains inherit shared behavior.


Figure 13 - Caching Layer Class Diagram

8.1.2 Functional Scenarios

8.1.2.1 View the Front Page

Once a front page listing request gets into the hotlisting controller, it is delegated to ThingDB. ThingDB calls get_multi, which is responsible for returning the entire list of front-page links that the user should see. The sole location where front-page listings are stored is the CassandraCacheChain. This is a prioritized list of caches, each of which may or may not contain the data we need. ThingDB generates a key value based on data from the request, which is used to search for the data.

The first cache checked is the LocalCache. This is a dictionary that is stored in-memory on the Python application server(s). If the data is found here, it is returned, but only after the CMemcache and the CassandraCache are synchronized. This ensures that the next time the data is requested, it is uniformly available to all caches. If the data is not found in the LocalCache, the CMemcache is searched. If found, the data is returned after the LocalCache and the CassandraCache are synchronized. Likewise, the last cache to search is the CassandraCache. This is simply an abstraction of the Cassandra database. The data must be located here, as it is the source of truth for all listings. Once fetched, the LocalCache and CMemcache are synchronized so that the next fetch does not need to go all the way out to Cassandra.

At this point, the data is returned to ThingDB, which returns the data to the model object that requested it, which in turn sends it on into the page renderer. While it is possible that the response is delayed because data is synchronized across the members of the cache chain before being returned to the requesting object, the cache components are numerous and fast enough that the performance hit is negligible.

Figure 14 - View the Front Page Sequence Diagram (Caching Sublayer)


8.1.2.2 Vote on an Item

In this scenario, ThingDB gets a request to post a new vote to a Reddit comment or link. First, it takes a lock in the LockCache, so that write contention issues with other user sessions can be avoided. The thing object, in this case a vote, then makes a call to cache itself into the HardCacheChain. The HardCacheChain then loops through the LocalCache, the CMemCache, the HardCache, and the HardCacheBackend to iteratively set the data in each cache. After the data has been set in all of the caches, a call is made to the PostgreSQL database manager to save the vote in its final resting place.
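A simplified version of that write path might look like the following. The lock key format, TTL, helper names, and the SQL (which simply mirrors the generic Rel table from section 7.2) are illustrative assumptions; the lock is taken with memcache’s add operation, which only succeeds if the key does not already exist.

```python
# Illustrative vote write path: take a lock, write through the caches,
# then persist in PostgreSQL. Key names, TTLs, and helpers are assumptions.
import time

def cast_vote(lock_cache, hard_cache_chain, pg_cursor,
              account_id, link_id, direction):
    lock_key = 'vote_lock:%s:%s' % (account_id, link_id)

    # memcache 'add' is atomic: it fails if another session holds the lock.
    while not lock_cache.add(lock_key, '1', time=30):
        time.sleep(0.01)

    try:
        vote_key = 'vote:%s:%s' % (account_id, link_id)
        hard_cache_chain.set(vote_key, direction)          # write-through caches
        pg_cursor.execute(
            "INSERT INTO rel (thing1_id, thing2_id, name, date) "
            "VALUES (%s, %s, %s, now())",
            (account_id, link_id, 'vote_%d' % direction))  # final resting place
        pg_cursor.connection.commit()
    finally:
        lock_cache.delete(lock_key)
```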

Figure 15 - Vote on an Item Sequence Diagram (Caching Sublayer)

8.1.3 System-Wide Processing

Not applicable.

8.2 Information View

8.2.1 Data Structure

Not applicable.

8.2.2 Data Flow

The data flow for the caching sublayer depicts how data flows through the caching elements when ThingDB needs to read and write data. During a read request, ThingDB talks to the appropriate cache chain, which promises to search all of its component caches for the requested data. If the data is found, it is returned to the calling ThingDB object. In the case of dynamic queries, if the data is not found, it is queried in PostgreSQL. Precomputed queries are guaranteed to find their data in the CassandraCache. During a write request, data is written sequentially to all elements of the requisite cache chain. For dynamic queries, data is then also written to PostgreSQL.

Figure 16 - Caching Data Flow Diagram

8.2.3 Data Ownership

Not applicable.

8.2.4 Information Lifecycles

When a new piece of data is added to Reddit, for instance a new link is posted, it is initially uncached. The data is then cached into the appropriate cache, for example, cache A. The data remains cached until it is updated in some other cache, say cache B, in the cache chain. At that point it becomes dirty. This means that the data in cache A can no longer be reliably read and sent back to a user. It therefore becomes uncached. The cache chain will eventually set the data from cache B into cache A, returning A to the cached state.


Figure 17 - Caching State Diagram

8.2.5 Timeliness and Latency

Not applicable.

8.2.6 Archive and Retention

Not applicable.


9. Perspectives

9.1 Top-Level System

9.1.1 Scalability

Reddit has grown massively from being a startup in 2005 to 3.4 billion page views in a month. This shows that scalability is probably the most important perspective; not being able to handle the increasing amount of data could lead to failure.

We should be able to easily add application servers to handle the steadily increasing load. To do that, we are not going to save any state on the application servers. This eliminates the need for each server to share state with all other servers. We will use Memcache to save all state, so if one application server goes down, the load balancer can route the traffic to another available application server, and that server can get all the required state information from Memcache. If required, a new Memcache server is easy to add.
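A minimal illustration of keeping the application servers stateless is to push session state into memcache via pylibmc (already in the dependency list); the key scheme, server names, and TTL below are assumptions, not Reddit’s actual session handling.

```python
# Illustration of stateless app servers: session state lives in memcache,
# so any server behind HAProxy can pick up any request. Key scheme, server
# names, and TTL are assumptions.
import json
import pylibmc

mc = pylibmc.Client(['memcache-1:11211', 'memcache-2:11211'], binary=True)

def load_session(session_id):
    raw = mc.get('session:%s' % session_id)
    return json.loads(raw) if raw else {}

def save_session(session_id, session, ttl=3600):
    mc.set('session:%s' % session_id, json.dumps(session), time=ttl)
```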

The database schema is very simple, such that it is possible to separate each entity object onto a different database server. The Information View in section 7.2 above provides details about how the schema allows for scalability. It makes it possible to have multiple servers for an entity object if needed. For example, the comment database may be a huge one; we can easily add more servers for the comment database to handle the load. Separating out entities will help us scale each one separately as needed.

We host all of Reddit in the cloud. This provides access to the infrastructure needed to scale: it is easier to add additional hardware to meet performance requirements while handling the increasing load. This makes for a comparatively smoother path to scalability because of ready access to resources.

In terms of the link recommendation extension, scalability should not be appreciably affected. The additional nodes required to perform the link recommendation calculations will be able to scale in the same way that the rest of the Reddit architecture scales.

9.1.2 Evolvability

As previously mentioned in section 3.4, the ability to evolve the functionality and features in Reddit is a fairly important goal. This is due to the nature of the competitive social news landscape on the internet today. There are multiple other websites that provide the same basic functionality, so it’s important that Reddit can continue to evolve its feature set to keep up with or stay ahead of the competition.

That said, the biggest way that they planned for feature evolution was the way they designed the Database Table schema. They were originally creating tables for each of the entities, and it was causing them to have very painful data migrations any time they wanted to change the table schema (basically whenever they wanted to add new features). After several of these extremely painful upgrades, they decided that it was worthwhile to radically change the structure of their database to allow for more Evolvability.

To do this, they changed their table structure to only be 5 tables total - Thing, Data, Rel, Type, and Rel Type. This allows the original entity-based database structures to be stored in the first 3 tables (Thing, Data, Rel). A given entity was split up by storing in the Thing table the generic data that is common to most Reddit entities (things like upvotes, downvotes, spam, date, and deleted fields), with any extra fields stored in the Data table. From there, the relationships between those entities were stored in the Rel table (note that this design did not allow relationships to have any custom attributes). Now, when any new “Thing”, any new attribute on an existing Thing, or any new relationship between Things is needed, the table structure doesn’t need to change (hence no need for extremely nasty data migrations). This allows them to push new features at a much faster pace, without having to worry about changing the database structure.

In terms of the link recommendation extension, the evolvability of Reddit will not be appreciably affected. If anything, the system will become slightly more evolvable. This is because “new”, “cool”, and “hot” algorithms should be able to replace the existing nearest neighbor and recommendation engine algorithms while maintaining the same interfaces to other components.

9.2 Subsystem

9.2.1 Availability

Availability of the caching subsystem is defined as ensuring all of the physical components that implement caching of data are online and functioning as expected. This is a very important quality because Reddit relies on caching to allow the site to be responsive, despite tens of thousands of simultaneous requests. The caching subsystem is also an intermediary for storing write data, and it is possible that if the caching subsystem went down, data that a user thought they had saved would not actually be saved.

If the caching subsystem were down, or in some other way degraded, many or all requests for data would have to travel to the backend PostgreSQL database. This would be so slow that many users would see Reddit as completely down, and would see one of the funny pictures that indicate such events. Requests for pre-rendered html would also no longer be served, so all pages would need to be re-rendered. This would also cause an extreme performance degradation.

We already know that the physical caching subsystem consists of instances of Memcache running on dedicated servers to store data from standard queries. It also consists of instances of Cassandra running on dedicated servers to store pre-computed data from complex queries. Since availability of the caching subsystem is crucial, we know that we need to have multiple instances of these elements running to achieve high availability:

1. Balance load across user requests. If we send too much traffic to any one node, then the system will become ineffective.

2. Provide layers of caches so that data can be automatically retrieved from sublayers if not found in the current layer. The calling application should not need to know what is configured in a cache chain, how many nodes it contains, or which ones are up and which ones are down.

3. Distribute keys across different nodes and provide redundancy. If one node goes down, it should not take down all data references; data keys should be spread out. Furthermore, if one node goes down, it should have a set of peers in place ready to fill in.

9.2.2 Performance

The entire motivation for creating a cache is performance: the opportunity to reuse values that have already been computed and would be needlessly expensive to recompute. The first thought is to create an in-memory cache for each application process, but since Reddit achieves availability and scalability by having many application processes, the cache entries would have to be broadcast amongst them to stay in sync and give everyone the benefit, and that itself has a performance impact in addition to being messy and complicated. This led to a decision to create a stand-alone memcache, deployed on its own servers, that is shared by all application instances.

This idea can be expanded to values that were never computed by an application instance. But since we don’t know in advance which data to compute, the number of entries is larger than you’d want to fit in memory. For this reason, an instance of memcachedb was originally put in to store pre-computed values, later replaced by Cassandra (fulfilling exactly the same role) for availability reasons.

As a last resort, the database can always be used to compute values. To abstract away the various tiers of performance offered by these solutions, a cache chain is made available to the application instances that looks through these in order.


10. Appendices

10.1 Appendix A: Decisions and Alternatives

One of the core concepts of Reddit is that all of its information is user-driven content. All links, comments, subreddits, and votes originate from the community of Reddit. Even the collection of subscribed subreddits is entirely up to the user; outside of a few tools to help a user choose from the most popular subreddits, it is entirely up to each user to find the subreddits they would be interested in subscribing to.

One way of making this process much easier on the users would be to create a Link Recommendation Engine that would scrape the vast amounts of user data to recommend specific links that a user might not otherwise find. For instance, Reddit could take all user upvote data, run a daily or weekly job, and report back to each registered user links that aren’t in their subscribed subreddits but that users like them upvoted. Not only that, but paying attention to how the user responds to these recommendations could provide further data to feed back into the Recommendation Engine. So, if a user upvotes one of his recommended links, that means the recommendation engine made a good prediction, and it would further reinforce its assumptions. If the user downvotes the recommended link, it could be used to slightly modify the algorithm. If a user keeps upvoting recommended links from a given subreddit, it’s a strong indication that they might want to actually subscribe to that subreddit. In a roundabout way, this also solves the subreddit discovery problem.

Another positive of this Recommendation Engine would be to further entice user registrations. Normally, even logged-out users can go to each subreddit that they find interesting, but having individual link recommendations might be the tipping point for them to create an account. And with more registered users come more dedicated users and higher revenue from targeted ads.

As for the changes, at a top level, this would most likely require a new Recommendation Engine component in the Top-Level Component Diagram. It could be run at spaced intervals (perhaps once a week) for each user, staggered throughout the user base so it wouldn’t be crunching data for all users at once (computing recommendations for 1/7 of the users each day). This additional component would take information from the Postgres databases (remember, this is not a time-sensitive operation), run its computations, and store the recommendations in a new Thing object, which would store the results as precomputed queries in a set of Cassandra nodes. Although these are the ways we expect the system to change, we realize a more thorough investigation is required.
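The staggering described above can be expressed very simply: each day the batch processes only the slice of users whose id hashes to that weekday. The sketch below is illustrative; the hash scheme and helper name are assumptions.

```python
# Illustrative staggering: recompute recommendations for roughly 1/7 of the
# user base each day, so every user is refreshed about once a week.
import datetime

def users_due_today(all_user_ids, today=None):
    today = today or datetime.date.today()
    weekday = today.weekday()                       # 0..6
    return [uid for uid in all_user_ids if hash(uid) % 7 == weekday]

# The Recommendation Engine batch would then read each due user's data from
# PostgreSQL, compute recommendations, and store them in Cassandra.
```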

Note: for each alternative, the closest neighbor batch design will not change, and so is not described in our alternatives. It is documented in section 7 above.


10.2 Alternative 1: The Application Server Approach

In this alternative, the recommendation controller itself will compute the link recommendations for each user. This will happen on the fly when the user goes into their link recommendations panel. This controller will continue to make use of the closest neighbor calculations to find the recommendations. As shown in figure A.1, the recommendation engine pulls data from links and votes, finds the recommendations based on the closest neighbor batch, and sends the results to the user and to the Cassandra cache chain. Figure A.2 shows how this approach maps to the top-level functional component diagram.

The positives for this approach are as follows:

1. Data currency. Since the link recommendations are computed directly when the user navigates to or refreshes the link recommendation panel, the links are going to be as up-to-date as possible.

2. Cost. Compared to the other alternatives, there is not as much new hardware required, and the functionality will be much simpler to design and implement.

The negatives for this approach are as follows:

1. User-perceived performance. The user will need to wait for the calculations to complete before they can see the contents of the page. Because we expect the calculations to be time-consuming, this will lead to severe page load delays.

2. Scalability. The servers will have to do more processing for this alternative, and the link recommendations are tightly coupled to the remaining functionality. It is therefore harder to scale the application. Additionally, it is likely that any given application server will hit the performance elbow, whereby the server will basically freeze attempting to service its existing requests.

3. Availability. If the recommendation engine goes down, then it is likely that the entire application server will go down. This is more likely to happen with the application servers handling additional burdens. Measured across hundreds of servers, this indicates greatly reduced availability.

4. Evolvability. As mentioned above, since the recommendation functionality will be tightly coupled with the existing Reddit application logic, it will be much more time consuming and technically challenging to implement new algorithms, performance enhancements, and code fixes.

Based upon the above analysis, it is not advisable to choose this option. While it may possibly be quicker to implement and have better data currency, the risks to performance, scalability, availability, and evolvability are too great to justify it.


Figure A.1 - The application server approach dataflow diagram


Figure A.2 - Top-level functional view modified for alternative #1

10.3 Alternative 2: The Incremental Update Approach

This alternative was an attempt at making the link recommendations as updated as possible, while providing better performance than alternative 1. To do that, we will recalculate the link recommendations each time a new vote comes in and store the pre-computed results in Cassandra. As shown in figure A.3, we will be using the Closest Neighbor batch process (mentioned in the accepted proposal), which will find the nearest neighbor accounts for a user. Whenever a nearest neighbor upvotes a link, the vote handler will call the recommendation engine. The recommendation engine will look into the nearest neighbor, link, and vote databases; if the link has been upvoted by a nearest neighbor of a user and has not been voted on by that user, the link will be recommended to the user. The recommendation engine will send this link recommendation to the Vote Handler, which in turn will update the Cassandra Cache Chain. Figure A.4 shows how this approach maps to the top-level functional component diagram.
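A sketch of that incremental path: when a vote arrives, the handler fans the link out to every user who counts the voter among their nearest neighbors and who has not voted on the link themselves. The data structures and the storage helper below are illustrative assumptions, not a committed design.

```python
# Illustrative incremental update (alternative 2): on each upvote, push the
# link to users whose nearest-neighbor set contains the voter and who have
# not voted on the link themselves. Helper names are assumptions.
def on_upvote(voter_id, link_id, neighbors_of, votes_of, store_recommendation):
    for user_id, neighbors in neighbors_of.items():
        if voter_id in neighbors and link_id not in votes_of.get(user_id, set()):
            # Write the recommendation through to the Cassandra cache chain.
            store_recommendation(user_id, link_id)
```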

This alternative has the main advantage of more up-to-date recommendations compared to our accepted proposal. It also has better performance than alternative 1, because we calculate the recommendations each time a user submits a vote, not when the user asks for recommendations (by clicking on the recommendations panel) as in alternative 1. It has better availability than alternative 1, because even if the recommendation engine goes down, the older link recommendations will still be available to the user; also, the unavailability of this component would not hurt the availability of the rest of the components serving other functionality. Scalability: the Vote Handler servers will have to do additional processing for this alternative, as the recommendation engine is tightly coupled to the Vote Handler functionality, so it may be harder to scale the application because the Vote Handler is more likely to hit the performance elbow. On the other hand, we will be adding new recommendation batch servers for the recommendation engine, which are easier to scale. So we would say that scalability is going to be neutral for this alternative.

However, this alternative has two disadvantages:

1. Evolvability: Since we are modifying the existing Vote Handler component, adding new functionality is not going to be easy. Since the recommendation functionality will be tightly coupled with the existing vote handler logic, it will be much more time consuming and technically challenging to implement new algorithms and fix any bugs.

2. Cost: This alternative is more complex and more time consuming, so we will be spending more on developers to implement this functionality than for our accepted proposal.

Even though this alternative has several benefits, it is going to be more time consuming and cost us more than the accepted proposal. This is the main reason for not going with this alternative.


Figure A.3 - The incremental update approach dataflow diagram


Figure A.4 - Top-level functional view modified for alternative #2
