0 to 100 in 18 months
MoodleMoot 2013 Grzegorz Dostatni
Agenda
Introduction
Organization
Results
Architecture and Setup
Incidents
What is next?
Questions
Introduction
In 2010, the University of Alberta was looking for a new LMS. Objectives:
- Reduce licensing costs
- Create a service people want to use
- Improve service reliability
- Collaborate with other institutions across the province and beyond
Organization
Vice-Provost Information Technology - Sponsor
Centre for Teaching and Learning - Application Support and Development
Academic Information and Communication Technologies - System Administration, Database Administration, Networking
And many more...
Organization: Developers vs. System Admins
Developers
• New features
• Bug fixes
• Frequent updates
(See their talk at 3:30: "Collaboration without Compromising")
System Admins
• Stability
• Security
• Redundancy
Results: Uptime 99.992%
On Oct 1, 2012:
- 508,681 page views
- 3.8 million Apache hits
- 0.332 s average page return time
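To put these figures in perspective, a quick back-of-envelope calculation helps. This is a sketch under stated assumptions: a 365-day year for the downtime figure, and traffic spread evenly over 24 hours for the rates, which real LMS traffic never is, so the per-second number is an average, not a peak.

```python
# Back-of-envelope figures derived from the numbers on the Results slide.
# Assumptions: 365-day year; traffic spread evenly over the day (averages only).

SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours of downtime per 365-day year implied by an uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


def average_rate_per_second(count_per_day: float) -> float:
    """Average events per second if `count_per_day` were spread evenly."""
    return count_per_day / SECONDS_PER_DAY


page_views = 508_681      # page views on Oct 1, 2012
apache_hits = 3_800_000   # Apache hits on Oct 1, 2012 (rounded on the slide)

print(f"{downtime_hours_per_year(99.992) * 60:.0f} min/year of downtime")
print(f"{average_rate_per_second(page_views):.1f} page views/s on average")
print(f"{apache_hits / page_views:.1f} Apache hits per page view")
```

So 99.992% corresponds to roughly 42 minutes of downtime per year, Oct 1 averaged about 5.9 page views per second, and each page view generated about 7.5 Apache hits (images, stylesheets, scripts and so on).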
Results: Daily Page Views
Total number of page views: 65 million
Results: Daily Unique Visitors
Total number of unique visitors who have logged in at least once: 54,844
Architecture: Decisions
Infrastructure hardware considerations:
- 6 physical hosts
- VMware cluster
- Application cluster behind hardware load balancers
- Failover systems in another data centre
- F5 BIG-IP load balancers
- EMC CX-4 fibre-attached storage
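What the load balancers do for the application cluster can be illustrated with a toy round-robin pool that skips nodes which have failed a health check. This is only a sketch of the general technique: the node names are hypothetical, and an F5 BIG-IP is configured through its own management tools and supports other balancing methods, so this is not its configuration or algorithm.

```python
from itertools import cycle


class RoundRobinPool:
    """Toy round-robin pool that skips nodes marked unhealthy.

    Illustration only: node names are hypothetical, and a real hardware
    load balancer is configured through its own tooling, not code like this.
    """

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)

    def mark_down(self, node):
        # Called when a health check fails; traffic stops going to the node.
        self.healthy.discard(node)

    def mark_up(self, node):
        # Called when a node passes its health check again.
        if node in self.nodes:
            self.healthy.add(node)

    def next_node(self):
        # Walk at most one full lap of the ring looking for a healthy node.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes in pool")
```

With nodes `app1`, `app2`, `app3` and `app2` marked down, successive calls return `app1`, `app3`, `app1`, `app3`. Scaling out by adding nodes to the cluster amounts to adding entries to the pool, which is why it is a comparatively easy scalability path.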
Software decisions:
- Ubuntu LTS
- PostgreSQL 9.0
- Database in a VM
- File server replication using LVM and DRBD
- Nearly hourly backups with 30-day retention
- Backups happen on standby servers
- eAccelerator
- "Make everything as simple as possible, but not simpler"
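The retention rule for the hourly backups can be sketched as a pure function that picks which backups have aged out of the 30-day window. This assumes each backup is identified by the timestamp it was taken; the `hourly_schedule` helper is hypothetical and exists only to generate sample data, and actually deleting the files is left out.

```python
from datetime import datetime, timedelta

# 30-day retention window, as on the slide.
RETENTION = timedelta(days=30)


def backups_to_prune(backup_times, now, retention=RETENTION):
    """Return backup timestamps strictly older than the retention window, oldest first."""
    cutoff = now - retention
    return sorted(t for t in backup_times if t < cutoff)


def hourly_schedule(start, hours):
    """Hypothetical helper: timestamps of `hours` consecutive hourly backups from `start`."""
    return [start + timedelta(hours=h) for h in range(hours)]


if __name__ == "__main__":
    now = datetime(2012, 10, 31, 12, 0)
    backups = hourly_schedule(now - timedelta(days=32), 32 * 24)
    for t in backups_to_prune(backups, now):
        print("would delete backup from", t.isoformat())
```

A backup exactly 30 days old is kept; only older ones are returned. Running the pruning on the standby servers, where the backups are taken, keeps that I/O off the production hosts.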
Architecture: All Environments
3 production environments:
• Main production
• Archive (old content)
• CPD (non-credit)
Architecture: Production
Defined scalability paths:
• Adding nodes to the cluster
• Increasing performance of the DB
Architecture: Backups
Consistent backups require a snapshot in time of both the database (DB) and the file system (FS). We use PostgreSQL point-in-time recovery (PITR) to bring the DB to the same moment as the FS snapshot.
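One way to line the database up with a file system snapshot is to record the snapshot time and then replay archived WAL only up to that moment. The sketch below generates a PostgreSQL 9.0-style `recovery.conf` for that purpose. It assumes WAL archiving is already enabled and a base backup predates the snapshot; the archive directory is hypothetical, and the path must not contain single quotes.

```python
from datetime import datetime, timezone


def recovery_conf(snapshot_time: datetime, wal_archive_dir: str) -> str:
    """Return recovery.conf text (PostgreSQL 9.0 style) that restores
    archived WAL and stops replay at the file system snapshot time."""
    if snapshot_time.tzinfo is None:
        raise ValueError("snapshot_time must be timezone-aware")
    # Normalise to UTC so the recovery target is unambiguous to the server.
    target = snapshot_time.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    lines = [
        # %f = WAL file name, %p = destination path; both are expanded by PostgreSQL.
        f"restore_command = 'cp {wal_archive_dir}/%f \"%p\"'",
        f"recovery_target_time = '{target}'",
        # Stop just before the target time rather than just after it.
        "recovery_target_inclusive = false",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    snap = datetime(2012, 10, 1, 3, 0, tzinfo=timezone.utc)
    print(recovery_conf(snap, "/backup/wal_archive"), end="")
```

Note that PostgreSQL 12 and later no longer read `recovery.conf`; the same settings go in `postgresql.conf` together with a `recovery.signal` file.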
Architecture: Monitoring
Catch problems before they turn into outages.
All machines are monitored for:
- CPU load
- Disk
- Free memory
Database:
- PostgreSQL errors
- Long-running processes
File server:
- DRBD mirror status
Application:
- Number of Apache processes
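The talk does not say which monitoring tool runs these checks, so the following is only a sketch of what the host and file server checks might look like. The thresholds are hypothetical, the code assumes a Linux host (`/proc/meminfo`) and the DRBD 8.x `/proc/drbd` status format, and the database and Apache checks are left out because they need a DB connection or a process listing.

```python
import os
import shutil

# Hypothetical alert thresholds; not values from the talk.
MAX_LOAD_PER_CPU = 2.0
MIN_FREE_DISK_FRACTION = 0.10
MIN_FREE_MEM_KB = 256 * 1024


def threshold_problems(load1, cpus, disk_free_fraction, mem_free_kb):
    """Compare host metrics against the thresholds above."""
    problems = []
    if load1 > MAX_LOAD_PER_CPU * cpus:
        problems.append(f"high load: {load1:.2f} on {cpus} CPUs")
    if disk_free_fraction < MIN_FREE_DISK_FRACTION:
        problems.append(f"low disk: {disk_free_fraction:.0%} free")
    if mem_free_kb < MIN_FREE_MEM_KB:
        problems.append(f"low memory: {mem_free_kb} kB free")
    return problems


def drbd_problems(proc_drbd_text):
    """Flag DRBD resources that are not Connected and UpToDate on both sides.

    Expects status lines like:
      0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    """
    problems = []
    for line in proc_drbd_text.splitlines():
        tokens = dict(t.split(":", 1) for t in line.split() if ":" in t)
        if "cs" not in tokens:
            continue  # version header or counters line, not a resource status line
        device = line.split(":", 1)[0].strip()
        if tokens["cs"] != "Connected":
            problems.append(f"drbd{device}: connection state {tokens['cs']}")
        if tokens.get("ds") != "UpToDate/UpToDate":
            problems.append(f"drbd{device}: disk state {tokens.get('ds')}")
    return problems


def collect_host_metrics(path="/"):
    """Read live metrics on a Linux host."""
    load1 = os.getloadavg()[0]
    usage = shutil.disk_usage(path)
    mem_free_kb = 0
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemFree:"):
                mem_free_kb = int(line.split()[1])
                break
    return load1, os.cpu_count() or 1, usage.free / usage.total, mem_free_kb
```

On a file server, `threshold_problems(*collect_host_metrics())` plus `drbd_problems(open("/proc/drbd").read())` gives a list that is empty when all is well, which is easy to hook into whatever alerting is already in place.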
Incidents: Outages
1-hour outage when our UPS failed
0.5-hour outage, cause unknown (disk contention?)
Incidents: Problems
Missing database index after upgrade to Moodle 2.2.3
DRBD replication failure
Moodle cron issues
What is next?
Suggestions for improvements:
- Moodle cron.php
- Application functional testing
- Application monitoring
- Create our own Ubuntu repository
- Scaling up and clustering PostgreSQL
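Application monitoring and functional testing could start with a synthetic check: fetch a page, time the full response, and confirm the page actually rendered. This is a minimal sketch; the URL and the "Log in" marker text are hypothetical, and the alert threshold (three times the 0.332 s average from the Results slide) is an arbitrary choice.

```python
import time
import urllib.request

# Hypothetical URL and marker text; replace with a page on your own site.
CHECK_URL = "https://moodle.example.edu/login/index.php"
EXPECTED_TEXT = "Log in"
# Arbitrary alert threshold: three times the 0.332 s average page return time.
MAX_SECONDS = 3 * 0.332


def evaluate(status, elapsed, body, expected_text=EXPECTED_TEXT):
    """Turn one probe result into a list of human-readable problems."""
    problems = []
    if status != 200:
        problems.append(f"unexpected HTTP status {status}")
    if elapsed > MAX_SECONDS:
        problems.append(f"slow response: {elapsed:.2f} s")
    if expected_text not in body:
        problems.append(f"marker text {expected_text!r} not found")
    return problems


def probe(url=CHECK_URL, timeout=10.0):
    """Fetch `url` once, time the full response, and evaluate it."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError, HTTPError and timeouts all derive from OSError
        return [f"request failed: {exc}"]
    elapsed = time.monotonic() - start
    return evaluate(status, elapsed, body)
```

Run from cron every few minutes, this catches a site that is up but slow or serving error pages, which process and load checks alone can miss. It does not log in, so it would not exercise course pages or the database as a real functional test would.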
Questions?
For more information, including scripts, documentation, disaster recovery procedures, and installation instructions, please go to http://www.ualberta.ca/~dostatni/moodlemoot2013