0 to 100 in 18 months

Posted on 15-Aug-2015

MoodleMoot 2013 Grzegorz Dostatni

Agenda

- Introduction
- Organization
- Results
- Architecture and Setup
- Incidents
- What is next?
- Questions

Introduction

In 2010 the University of Alberta was looking for a new LMS. Objectives:

- Reduce licensing costs
- Create a service people want to use
- Improve service reliability
- Collaborate with other institutions across the province and beyond

Organization

Vice-Provost Information Technology - Sponsor

Centre for Teaching and Learning - Application Support and Development

Academic Information and Communication Technologies - System Administration, Database Administration, Networking

And many more...

Organization

Developers
- New features
- Bug fixes
- Frequent updates

(See their talk at 3:30, "Collaboration without Compromising")

VS.

System Admins
- Stability
- Security
- Redundancy

Results: Uptime

99.992%

On Oct 1, 2012:
- 508,681 page views
- 3.8 million Apache hits
- 0.332 s average page return time
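To put the Oct 1, 2012 figures in perspective, the following minimal sketch converts the daily totals into average per-second rates and turns the uptime percentage into a downtime budget. The page-view, hit, and uptime numbers come from the slide; the 365-day window used for the downtime figure is an assumption made only for illustration, since the slide does not state the period over which uptime was measured.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(daily_count):
    """Average number of events per second over a 24-hour day."""
    return daily_count / SECONDS_PER_DAY


def downtime_minutes(uptime_percent, period_hours):
    """Minutes of downtime implied by an uptime percentage over a period."""
    return (1 - uptime_percent / 100) * period_hours * 60


page_views = 508_681     # from the slide
apache_hits = 3_800_000  # from the slide (rounded there to 3.8 million)

print(f"{per_second(page_views):.1f} page views/s on average")
print(f"{per_second(apache_hits):.1f} Apache hits/s on average")
print(f"{apache_hits / page_views:.1f} Apache hits per page view")

# Assumption: a 365-day measurement window, chosen only for illustration.
print(f"{downtime_minutes(99.992, 365 * 24):.0f} min of downtime per year")
```

These are daily averages (roughly 6 page views and 44 Apache hits per second); peak-hour load on a teaching day would be noticeably higher.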

Results: Daily Page Views

Total number of page views: 65 million

Results: Daily Unique Visitors

Total number of unique visitors who have logged in at least once: 54,844

Architecture: Decisions

Infrastructure

Hardware considerations
- 6 physical hosts
- VMware cluster
- Application cluster behind hardware load balancers
- Failover systems in another data centre
- F5 BIG-IP load balancers
- EMC CX-4 fiber-attached storage

Architecture: Decisions

Software

Software decisions
- Ubuntu LTS
- PostgreSQL 9.0
- Database in a VM
- File server replication using LVM and DRBD
- (Nearly) hourly backups with 30-day retention
- Backups happen on standby servers
- eAccelerator
- "Make everything as simple as possible, but not simpler"
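The 30-day retention rule for the hourly backups can be illustrated with a small sketch. This is not the team's actual backup script (those are linked at the end of the talk); the function name and the sample timestamps are hypothetical, and only the 30-day window comes from the slide.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # retention window from the slide


def expired_backups(backup_times, now, retention=RETENTION):
    """Return backup timestamps older than the retention window, oldest first."""
    cutoff = now - retention
    return sorted(t for t in backup_times if t < cutoff)


# Hypothetical data: one backup per hour covering the last 31 days.
now = datetime(2013, 2, 1, 12, 0)
backups = [now - timedelta(hours=h) for h in range(31 * 24)]

to_delete = expired_backups(backups, now)
print(len(backups), "backups,", len(to_delete), "past retention")
```

A backup taken exactly 30 days ago is kept; only strictly older ones are returned for deletion.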

Architecture: All Environments

3 production environments:
- Main production
- Archive (old content)
- CPD (non-credit)

Architecture: Production

Defined scalability paths
- Adding nodes to the cluster
- Increasing performance of the DB

Architecture: Backups

Consistent backups require a snapshot in time of both the DB and the FS. Database point-in-time recovery is used to bring the two into sync.
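One way to line the database up with a file system snapshot is to replay archived WAL up to the moment the snapshot was taken. The sketch below renders a PostgreSQL 9.0 style `recovery.conf` for that purpose. It is an illustration under assumptions, not the procedure used in the talk: the function name, the archive path, and the idea that the snapshot timestamp is recorded somewhere are all hypothetical. `restore_command`, `recovery_target_time`, and `recovery_target_inclusive` are standard recovery settings in that PostgreSQL version.

```python
from datetime import datetime, timezone


def render_recovery_conf(fs_snapshot_time, wal_archive_dir):
    """Build recovery.conf text that stops WAL replay at the FS snapshot time.

    fs_snapshot_time: timezone-aware datetime of the file system snapshot.
    wal_archive_dir:  directory holding archived WAL segments (hypothetical).
    """
    if fs_snapshot_time.tzinfo is None:
        raise ValueError("fs_snapshot_time must be timezone-aware")
    # Express the target in UTC so the server's local time zone cannot shift it.
    target = fs_snapshot_time.astimezone(timezone.utc).strftime(
        "%Y-%m-%d %H:%M:%S+00"
    )
    return (
        f"restore_command = 'cp {wal_archive_dir}/%f \"%p\"'\n"
        f"recovery_target_time = '{target}'\n"
        "recovery_target_inclusive = 'true'\n"
    )


snapshot_taken = datetime(2013, 2, 1, 3, 0, tzinfo=timezone.utc)
print(render_recovery_conf(snapshot_taken, "/srv/wal_archive"))
```

For this to work, the base backup being restored must predate the target time, and the WAL archive must cover the interval between the two.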

Architecture: Monitoring

Catch problems before they turn into outages

All machines are monitored for
- CPU load
- Disk
- Free memory

Database
- PostgreSQL errors
- Long-running processes

File server
- DRBD mirror status

Application
- Number of Apache processes
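A few of the checks listed above can be sketched as small functions that each return a list of problems. The talk does not say which monitoring tool or thresholds were used, so every threshold below is a hypothetical placeholder, and the DRBD check assumes the DRBD 8.x `/proc/drbd` line format (`cs:` for connection state, `ds:` for disk state).

```python
import re

# Hypothetical thresholds; the talk does not give the real values.
MAX_LOAD_PER_CPU = 2.0
MIN_FREE_DISK_FRACTION = 0.10
MAX_APACHE_PROCESSES = 150


def check_load(load_1min, cpu_count):
    """Flag a 1-minute load average that is high relative to the CPU count."""
    if load_1min / cpu_count > MAX_LOAD_PER_CPU:
        return [f"CPU load {load_1min:.2f} is high for {cpu_count} CPUs"]
    return []


def check_disk(total_bytes, free_bytes):
    """Flag a file system whose free space fell below the minimum fraction."""
    if free_bytes / total_bytes < MIN_FREE_DISK_FRACTION:
        return [f"only {free_bytes / total_bytes:.0%} disk space free"]
    return []


def check_drbd(proc_drbd_text):
    """Flag DRBD resources that are not connected and fully up to date."""
    problems = []
    pattern = re.compile(r"^[ \t]*(\d+): cs:(\S+) .*?ds:(\S+)", re.MULTILINE)
    for minor, conn_state, disk_state in pattern.findall(proc_drbd_text):
        if conn_state != "Connected" or disk_state != "UpToDate/UpToDate":
            problems.append(f"drbd{minor}: cs={conn_state} ds={disk_state}")
    return problems


def check_apache(process_count):
    """Flag an unusually large number of Apache processes."""
    if process_count > MAX_APACHE_PROCESSES:
        return [f"{process_count} Apache processes running"]
    return []


# Sample input resembling a degraded mirror (peer disconnected).
sample = " 0: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----\n"
for problem in check_load(1.2, 4) + check_disk(10**12, 5 * 10**10) + check_drbd(sample):
    print("WARNING:", problem)
```

On a live host the inputs could come from `os.getloadavg()`, `os.cpu_count()`, `shutil.disk_usage()`, and the contents of `/proc/drbd`; keeping the checks as pure functions makes them easy to test against recorded samples.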

Incidents: Outages

- 1 hour outage when our UPS failed
- 0.5 hour outage, cause unknown (disk contention?)

Incidents: Problems

- Missing database index after upgrade to 2.2.3
- DRBD replication failure
- Moodle cron issues

What is next?

Suggestions for improvements
- Moodle cron.php
- Application functional testing
- Application monitoring
- Create our own Ubuntu repository
- Scaling up and clustering PostgreSQL

Questions?

For more information, including scripts, documentation, disaster recovery procedures, and installation instructions, please go to http://www.ualberta.ca/~dostatni/moodlemoot2013