SYSTEM ADMINISTRATION
COMPUTER SCIENCE DEPARTMENT

COLORADO STATE UNIVERSITY
COMPUTER SCIENCE DEPARTMENT

WAYNE TRZYNA
FALL 2012

CT 320: Network and System Administration

CT320: Fall Semester 2012

Topics

1. Organization
2. Hardware and Facilities
3. Operating Systems
4. Processes
5. Applications
6. Scripting
7. Networking & Security
8. Wrap-up

9/25/12

Organization

People: 21 faculty, 5 instructors, 13 staff members, 350 undergrads, 200 grad students

Systems: 600+ total, ~450 Linux, ~170 Windows, ~40 Macs, mainly HP workstations and servers

Other: ~17 shared printers, ~33 switches, wired and wireless networks, Linux lab, Windows lab

Staff: Wayne Trzyna (full time, ~35 years), Paul Hansen (half time, ~10 years), 3 half-time GSAs

Users: Password file currently has ~2500 entries, with hundreds of new accounts every semester

Sysadmin (1)

Sysadmin group is responsible for a large, diverse set of hardware and software systems!

Graduate assistants handle first level support: new accounts, password problems, disk quotas, etc.

Graduate assistants train each other and generally work in the organization for 1 to 3 years.

Network security group has another set of servers and systems, and their own sysadmin team.

Contact sysadmin via email ([email protected]) or by walking into the upstairs office.

Sysadmin (2)

Responsibilities include:
Procurement of hardware and software
Installation of hardware and software
Maintenance of hardware and software
Management of periodic processes
Providing support for users
Working with facilities (new building!)
Troubleshooting problems
Software licensing
Strategic planning

Server Room

Locked room with three high-capacity HVAC units (19 tons total cooling), a 50 kVA uninterruptible power supply (good for about 20 minutes), and a 10 Gb fiber optic network link to campus and the internet.

Facilities include rack-mounted and stand-alone production servers, along with extra servers for testing and development, tape loaders for backup, and the network ‘backbone’ switch.

Core services include file service, directory service, database (MySQL), print service (CUPS), web service (Apache), software license service, etc.

Servers run Red Hat rather than Fedora, for stability.

Compute servers include a 78-node system of single-processor, 4-core machines (lattice), and several 4-processor, many-core, large-memory (512 GB) machines.

Operating Systems

Supported operating systems: Fedora, Red Hat, Windows 2000, Windows XP, Windows 7, and Mac OS.

Philosophy is to minimize versions of operating systems and apps. ‘Never ending battle of consolidation.’

OS upgrades often move or rename configuration files, change command parameters, etc.

Many problems caused by dependencies on specific versions of shared libraries, duplicate libraries, etc.

Multiple versions of libraries (and compilers) must often be maintained.

Processes

Cloning and installation of Linux and Windows operating systems.

Monitoring of disk quotas/space, expired accounts/access, file logging/pruning, security.

Automatic backup of user data which is stored on servers and mounted remotely to workstations.

Automatic distribution of passwords, groups, and other account information to workstations.

Primarily depends on cron to implement periodic processes.
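
A minimal sketch of the kind of periodic check cron could run for disk-space monitoring. The function name `over_limit` and the 90% limit are assumptions for illustration, not the department's actual scripts. The function reads `df -P` style output on stdin, so the logic can be tried on sample text.

```shell
#!/bin/sh
# Hypothetical cron helper: list filesystems at or above a usage limit.
# Input is `df -P` style text on stdin: capacity in column 5, mount point in column 6.
# (Mount points containing spaces are not handled in this sketch.)
over_limit() {
    limit="$1"
    awk -v limit="$limit" '
        NR > 1 {                      # skip the df header line
            use = $5
            sub(/%/, "", use)         # "91%" -> "91"
            if (use + 0 >= limit + 0)
                print $6 " " use "%"  # mount point and usage
        }'
}

# Example: report local filesystems that are at least 90% full.
df -P | over_limit 90
```

Run hourly from a crontab entry, a script like this prints nothing when all filesystems are below the limit, so cron only sends mail when there is something to report.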

Processes: OS Updates

Signature process is the continual update of operating system and application software.

Follows major Fedora releases, approximately on a schedule of every six months.

Security requires kernel and other patches to be pushed to systems every week.

Basic strategy is to remote mount user data from servers to avoid hosting storage on workstations.

Avoids the need to back up individual workstations and allows system software to be updated cleanly.

All Fedora clients run the same cloned system image.

Processes: Cloning (1)

1. Must initially produce a source system with the latest software, applications, drivers.

2. Knowledge is maintained in an evolving document that describes the process (1000+ lines).

3. After creation/verification of the system, an image is copied with the dump command.

4. The target system has its disks formatted and partitioned (root, boot, swap, tmp).

5. The restore command is then used to copy the image to the target system.

6. All of this is built on scripted modifications to the Fedora rescue CD.

Processes: Cloning (2)

7. First, the target system is booted from a RAMDISK image on the customized rescue CD.

8. Then the disk is repartitioned, and filesystems and swap-space are initialized.

9. Next the system clone-image is restored onto the root partition.

10. A customization script sets up IP addresses, host names, etc., and creates the boot block for GRUB.

11. Finally the system is rebooted from the new disk image.
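
The dump and restore steps above can be sketched as a dry run that only prints the commands. Device names, partition numbers, the ext4 filesystem type, and all paths are assumptions for illustration. Partitioning and the customization step (IP addresses, host names, GRUB) are left out.

```shell
#!/bin/sh
# Sketch only: prints the clone commands instead of running them.
# Device names, partition layout, filesystem type, and paths are hypothetical.

# Command to capture an image of the verified source system's root filesystem.
image_plan() {
    root_dev="$1"   # root partition of the source system
    image="$2"      # compressed image file to create
    echo "dump -0 -f - ${root_dev} | gzip > ${image}"
}

# Commands to rebuild an already partitioned target disk from that image.
clone_plan() {
    disk="$1"       # target disk
    image="$2"      # compressed dump image
    mnt="$3"        # temporary mount point for the new root
    cat <<EOF
mkfs.ext4 ${disk}1
mkswap ${disk}2
mkfs.ext4 ${disk}3
mkdir -p ${mnt}
mount ${disk}3 ${mnt}
cd ${mnt} && gzip -d < ${image} | restore rf -
rm -f ${mnt}/restoresymtable
EOF
}

image_plan /dev/sda3 /images/lab.dump.gz
clone_plan /dev/sdb /images/lab.dump.gz /mnt/target
```

Printing the plan first makes it easy to review the exact devices before anything is formatted, which matters for steps that destroy the contents of the target disk.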

Processes: Backup

Backup is primarily limited to server machines, since only servers store user data. We also back up one of each unique client clone image.

Servers (bach, parsons, chopin) each have around 5TB of data in multiple partitions.

Backup is via an LTO-4 tape drive with approximately 1.6TB capacity per tape.

Backups use a homebrew system built on dump and restore. (Many organizations use open source or commercial packages: Amanda, Veritas, Tivoli, etc.)

Full backups are performed weekly, incremental backups daily. Fulls and incrementals are interleaved on daily tapes.
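
A rough sketch of the arithmetic and scheduling behind these numbers. The function names and the choice of Sunday as the full-backup day are assumptions. The sketch also assumes a simple two-level scheme (level 0 weekly, level 1 on other days), which may differ from the homebrew system.

```shell
#!/bin/sh
# Tapes needed for one full backup: ceiling of data size over tape capacity.
tapes_needed() {
    data_gb="$1"
    tape_gb="$2"
    echo $(( (data_gb + tape_gb - 1) / tape_gb ))
}

# Dump level for a day of the week (1 = Monday ... 7 = Sunday, as in `date +%u`).
# Assumed schedule: level 0 (full) on FULL_DAY, level 1 (incremental) otherwise.
FULL_DAY="${FULL_DAY:-7}"
dump_level() {
    if [ "$1" -eq "$FULL_DAY" ]; then
        echo 0
    else
        echo 1
    fi
}

tapes_needed 5000 1600      # a 5 TB server on 1.6 TB tapes: prints 4
dump_level "$(date +%u)"    # dump level for today
```

With level 1 every day, each incremental holds everything changed since the last full, so a restore needs only the latest full plus the latest incremental.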

Processes: Miscellaneous

Printing: CUPS server and clients
Accounts: MOAA – Mother of All Accounts
  Maintains an NDBM database of accounts: CSUID, full name, login, expiration, etc.
  Set of C programs developed internally
  Builds system data structures such as the passwd file, group file, etc.
X Windows: Problematic nVidia drivers
Web: Apache server on parsons
Wikis: MediaWiki and other solutions
Virtualization: Limited use within department (but many other organizations have embraced it)
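
To illustrate how an account database can drive system files, here is a small sketch that turns account records into passwd-format lines and skips expired accounts. MOAA itself is a set of C programs over NDBM. The colon-separated record layout, the `/home` path, and the shell below are assumptions made up for this example.

```shell
#!/bin/sh
# Hypothetical record format on stdin: login:uid:gid:full name:expiration(YYYYMMDD)
# Output: passwd-style lines for accounts that have not yet expired.
build_passwd() {
    today="$1"   # current date as YYYYMMDD
    awk -F: -v today="$today" '
        $5 + 0 >= today + 0 {
            printf "%s:x:%s:%s:%s:/home/%s:/bin/bash\n", $1, $2, $3, $4, $1
        }'
}

# Example with two made-up accounts; the second has already expired.
build_passwd "$(date +%Y%m%d)" <<EOF
alice:1001:100:Alice Example:20991231
bob:1002:100:Bob Example:20000101
EOF
```

Generating the file from one authoritative database means that expiring an account is a single record change, and every workstation picks it up on the next distribution run.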

Scripting

Automated processes are implemented as scripts, many of which are home grown.

Scripts are written in csh, bash, or Perl, along with some C programs.

Limited use of Python in house, but some 3rd party scripts use it.

Scripts are documented and must continually be updated as processes and operating systems evolve.

Scripting Example

echo_status "Mounting root filesystem"
status_busy
mkdir ${TGT_DIR}
mount ${CLONE_DEVICE}${ROOT_PARTNO} ${TGT_DIR}
status_done

echo_status "Restoring from clone image"
status_busy
cd ${TGT_DIR}
gzip -d < ${IMAGE_DIR}"/"${CLONE_IMAGE} | restore rf -
status_done

echo_status "Cleaning & Configuring ${CLONE_HOST}"
status_busy
#leftovers from restore
rm -f ${TGT_DIR}"/restoresymtable"
rm -f ${TGT_DIR}"/.autofsck"

Crontab Example

20 * * * * /usr/lib/sendmail -Ac -q
#
# common server stuff starts here:
#
15 4 * * * find /tmp/ -ctime +7 -a -exec rm -f {} \; > /dev/null 2>&1
15 4 * * * find /var/tmp/ -ctime +7 -a -exec rm -f {} \; > /dev/null 2>&1
30 * * * * /s/bach/i/sys/sa/cron/check-xsession-errors
0,30 8-22 * * * /s/bach/i/sys/sa/cron/check-load 4.0
5,15,25,35,45,55 * * * * /s/bach/i/sys/sa/cron/uptime-log
#
# cronjobs specific to chopin start here:
#
#01 0 * * 6 (/usr/bin/find /s/chopin/ -name core -type f -depth -exec /bin/rm -f {} \; ) > /dev/null 2>&1
#01 0 * * 7 (/usr/bin/find /s/chopin/ -name 'core.[0-9]*[0-9]' -type f -depth -exec /bin/rm -f {} \; ) > /dev/null 2>&1
0 1 * * * (cd /usr/local/etc/dumps/bin; ./run.bach )
55 * * * * /s/bach/i/sys/sa/cron/check-dns
40 17 * * 1-5 /bin/csh -c "(cd /s/bach/i/sys/sa/hosts ; ./hosts.update >& ./logfile)"
0 8 * * * /usr/local/etc/dumps/bin/check_logs.pl --config=/usr/local/etc/dumps/lib/check_logs.cfg 1>/dev/null 2>/dev/null
0 0 1 * * /usr/local/etc/log-rotate…
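
The crontab runs a load check with a threshold argument of 4.0. The contents of that script are not shown, so the following is only a guess at what such a check might do: compare the 1-minute load average against the threshold and print a message when it is exceeded. The function names are made up, and reading `/proc/loadavg` is Linux-specific.

```shell
#!/bin/sh
# Succeeds (exit 0) when the first number is greater than the second.
# awk is used because sh arithmetic has no floating point.
load_exceeds() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 > max + 0) }'
}

# Print a warning when the 1-minute load average is above the threshold.
check_load() {
    max="$1"
    load="$(cut -d' ' -f1 /proc/loadavg)"
    if load_exceeds "$load" "$max"; then
        echo "load $load exceeds $max on $(uname -n)"
    fi
}

# Only attempt the check where /proc/loadavg exists.
if [ -r /proc/loadavg ]; then
    check_load 4.0
fi
```

Because the script is silent when the load is normal, cron mails its output only when the threshold is crossed.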

Applications

Must support diverse requirements for applications and tools.

Multiple versions of Firefox, GNU compilers, Java compilers, Eclipse tool, etc.

Operating heuristic is to require a Fedora RPM for each application, if possible.

Avoid 3rd party RPM packages; build directly from source instead.

Applications that are not shipped with Fedora reside in /usr/local.

Examples: android, apache, cuda, ffmpeg, hadoop, lapack, netbeans, python, R, scipy, tau, …

Networking & Security

Outside network connection is managed by ACNS, not the department.

Some level of protection provided by firewalls at ACNS border router and department router interfaces.

Must continually stay one step ahead of hackers by keeping patches up to date.

Homebrew “Tripwire” system monitors system files for unexpected changes.

Require users to select good passwords, enforced by the password command.

Linux is less susceptible to viruses; Windows systems are protected by Symantec.
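
The idea behind a Tripwire-style monitor can be sketched with checksums: record a baseline, then report files whose contents no longer match. This is not the department's homebrew system. The function names are made up, and `sha256sum` from GNU coreutils is assumed to be available.

```shell
#!/bin/sh
# Record SHA-256 checksums for every regular file under a directory.
make_baseline() {
    find "$1" -type f -exec sha256sum {} + | sort -k 2
}

# Print files that fail verification against a baseline.
# Exit status 0 means at least one changed or missing file was reported.
changed_files() {
    sha256sum -c "$1" 2>/dev/null | grep -v ': OK$'
}

# Demonstration in a scratch directory.
dir="$(mktemp -d)"
echo "original" > "$dir/config"
make_baseline "$dir" > "$dir.sums"
echo "tampered" >> "$dir/config"
if changed_files "$dir.sums"; then
    echo "unexpected changes detected"
fi
rm -rf "$dir" "$dir.sums"
```

This sketch catches modified and deleted files but not newly created ones. The baseline also has to be stored where an intruder cannot rewrite it.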

Wrap-up

What are the essential skills required?
Handle pressure well
Good people skills
Broad technical background
Troubleshooting skills
Organization and design skills

What is satisfying about the job?
Immediate feedback
Independence (If you keep things running well, people leave you alone.)
Excitement!

Questions?
