Ganga Core: Status Jakub T. Moscicki ARDA/LHCb LHCb Software Week, September, 2005.
Ganga Core: Status
Jakub T. Moscicki
ARDA/LHCb
LHCb Software Week, September, 2005
Ganga Overview
[Diagram: Ganga4 handles Job objects, connecting applications (scripts, Gaudi, Athena) to backends (AtlasPROD, DIAL, DIRAC, LCG2, gLite, localhost, LSF)]
• submit, kill
• get output, update status
• store & retrieve job definition
• prepare, configure
• + split, merge, monitor
Release Schedule
• Ganga 4, beta series: Jul 8 4.0.0-beta1, Aug 8 4.0.0-beta2, Sep 7 4.0.0-beta3, Sep 23 4.0.0-beta4
  – fully operational, public pre-release
  – bugfixes, testing, missing features
  – stability: config files, repository backwards compatibility
  – audience: tested/used by ~10 users in LHCb, Atlas and outside; everybody is encouraged to try it, no setup needed
• Ganga 4, alpha series: Apr 16 4.0.0-alpha1, May 3 4.0.0-alpha2, May 13 4.0.0-alpha3, May 30 4.0.0-alpha4, Jun 6 4.0.0-alpha5, Jun 10 4.0.0-alpha6, Jun 24 4.0.0-alpha7, Jul 5 4.0.0-alpha8
  – prototype with frequent and incompatible changes
  – audience: internal developers
• Ganga 3: Mar 11 3.0.0, Apr 1 3.0.1
  – discontinued
Testing/stability
• Core testing:
  – automatic test suite (61 test cases)
  – unit tests / invariant tests / integration tests
  – bugfix tests: "a bug report must have a test case" policy
  – subsystem stubs (test submitters, transient repository)
• Extensions testing:
  – use-case tests in preparation (published as LHCb-2005-027 note)
• Release compatibility:
  – automatic repository regression testing
  – GPI/config compatibility policies
Project Structure
• Framework
  – Submission logic
  – Monitoring
  – JobRepository
  – FileWorkspace
  – Utilities (config, logging)
  – Interfaces: interactive shell; command line / scripts; embedding / library
• Plugins
  – Applications
  – Backends
  – Datasets
Project Structure
• Release area: /afs/cern.ch/sw/ganga/install/slc3_gcc323/4.0.0-beta4
  – bin/ganga
  – python/Ganga: core framework; Local, LSF, LCG, gLite backends; Executable application
  – python/GangaLHCb: Gaudi, DIRAC plugins
  – python/GangaAtlas: Athena, ADA plugins
• Config file example:
  [Configuration]
  RUNTIME_PATH = GangaLHCb:/myarea/GangaAtlas
Configuration
• Config file: ~/.ganga4
  – Default template is well documented.
• Configurable features:
  – plugin location
  – hierarchical logger levels
  – polling rate (15 seconds)
  – repository configuration (local/remote)
  – file workspace (job input/output location)
  – VO
  – software versions
  – plugin specific parameters
• Relevant command line options:
  – -c cfgfile
  – -o[Repository]type=Remote
  – -o[Logging]GangaLHCb.Lib.Dirac=DEBUG
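The "-o[Section]key=value" overrides listed above can be pictured as edits applied on top of an INI-style config file. The following is only a sketch under that assumption, written for Python 3 with the standard configparser module; apply_override and the sample "[Repository] type = Local" entry are hypothetical and are not Ganga code.

```python
import configparser
import re

# Hypothetical helper, not part of Ganga: parses "[Section]key=value".
OPTION_RE = re.compile(r"^\[(?P<section>[^\]]+)\](?P<key>[^=]+)=(?P<value>.*)$")

def apply_override(config, expr):
    """Apply one '[Section]key=value' expression to a ConfigParser."""
    match = OPTION_RE.match(expr)
    if match is None:
        raise ValueError("cannot parse option: %r" % expr)
    section = match.group("section")
    if not config.has_section(section):
        config.add_section(section)
    config.set(section, match.group("key").strip(), match.group("value").strip())

config = configparser.ConfigParser()
config.optionxform = str  # keep option names case-sensitive (e.g. RUNTIME_PATH)
config.read_string("""
[Configuration]
RUNTIME_PATH = GangaLHCb:/myarea/GangaAtlas
[Repository]
type = Local
""")

# Command line overrides win over values read from the file.
apply_override(config, "[Repository]type=Remote")
apply_override(config, "[Logging]GangaLHCb.Lib.Dirac=DEBUG")

# A PATH-like value is split on ':' into individual plugin locations.
plugin_paths = config.get("Configuration", "RUNTIME_PATH").split(":")
```

Keeping the override logic separate from file parsing means the same function can serve any number of repeated -o options.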
Command line
ganga -h
*** Welcome to Ganga ***
Version: Ganga-4-0-0-beta4
Documentation and support: http://cern.ch/ganga
Type help() or help('index') for online help.

usage: ganga [options] [script] [args] ...
options:
  --version              show program's version number and exit
  -h, --help             show this help message and exit
  -i                     enter interactive mode after running script
  -cFILE                 read user configuration from FILE (default ~/.ganga4)
  -g, --generate-config  generate a default config file, backup the existing one
  -oEXPR, --option=EXPR  set configuration options, may be repeated multiple times, for example:
                         -o[Logging]Ganga.Lib=DEBUG -oGangaLHCb=INFO -o[Configuration]TextShell = IPython
                         FIXME: PATH-like variables are reset and not appended to
                         (this behaviour is different from config file behaviour)
  --quiet                only ERROR messages are printed
  --very-quiet           only CRITICAL messages are printed
  --debug                all messages including DEBUG are printed
  --no-prompt            never prompt interactively for anything except IPython (FIXME:)
  --no-rexec             rely on existing environment and do not re-exec ganga process
                         to setup runtime plugin modules (affects LD_LIBRARY_PATH)
Interfaces
• Interactive Shell
  – IPython: <TAB>, coloring, history, editing, direct shell access
  – Automatically generated GPI help index
• Scripting
  – ganga script.py
  – interpreter:
    #!/bin/env ganga
    print jobs
• Embedding
    from Ganga.Runtime import GangaProgram
    prog = GangaProgram()
    prog.bootstrap()
    from Ganga.GPI import *
GPI
• Ganga Public Interface: GPI
  – high-level, user-friendly Python API for job manipulation
  – combines the consistency and flexibility of a programming language interface with clarity and ease of use
[Diagram: GUI, CLIP and SCRIPT on top of the GPI, which sits on top of Ganga.Core]
GPI
• Hello World
>>> job = Job()
>>> job.application.exe='/bin/echo'
>>> job.application.parameters=['hello world']
>>> job.submit()
submitting job

>>> outfile = file(job.directory+'/output/std.out')
>>> print outfile.read()
Job started at: Fri Feb 18 14:05:32 2005
Processing input files...
/bin/echo Done
hello world
Application executed with the status code 0
Processing output files...
Exiting...
Job finished at: Fri Feb 18 14:05:32 2005

>>> job2 = job.copy()
>>> job2.backend = "LSF"
>>> job2.submit()
GPI
• Inspecting the jobs
>>> print job.id
5

>>> print jobs
Statistics: 5 jobs
jobs
--------------
ID   status     name
# 1  completed
# 2  new        Job20041231647334751371267881024
# 3  completed  Job2004123165980541363751081024
# 4  submitted  Job2004123182494941363292601024
# 5  completed  Job20041231842266811363443961024

>>> for j in jobs[1:3]:
...     print j.id
1
2
GPI
• Complex scenarios
>>> j = Job()
>>> j.application = DaVinci()
>>> j.application.options = 'my.opts'
>>> j.backend = Glite()
>>> j.backend.requirements = 'other.GlueCEUniqueID == "grid-ce.desy.de:2119/jobmanager-lcgpbs-short"'

>>> for i in range(100):
...     j = Job()
Ganga Tool vs Framework
• Ganga is a lightweight user tool
  – easy to install (pure-python)
  – "designed and optimized" for users
  – GPI has a syntax (users have to judge): j.application, j.backend, j.id, j.submit(), etc.
• But also: Ganga is a developer framework
  – Plugin model: independent and rapid development of handlers (backends, applications)
  – Promote but do not force common GPI abstractions
    • We neither require nor invent abstract base classes which are least common denominators between systems
    • Example: you may implement a very complex application (e.g. ADA) and enable submission to DIAL only, if that is your main use case
    • The design of the framework does not attempt to match all possible applications with all possible backends
  – But: it enables building common tools on top of GPI: GUI, scripts, …
Some Design Principles
• Avoid shared environment
  – example: in the LCG environment LD_LIBRARY_PATH is incompatible with some application environments
  – solution: the LCG backend handler uses a private, cached environment
• Don't force common abstractions upfront
  – application <-> backend are connected via adapters (runtime handlers)
  – in most cases adapters are shared (thus their number is reduced)
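The "private, cached environment" idea can be illustrated as follows: a handler passes its own copy of the environment to the commands it runs and never touches the environment of the main process. This is a minimal Python 3 sketch, not the actual LCG handler; run_with_private_env is a hypothetical name, and DEMO_LIBRARY_PATH stands in for LD_LIBRARY_PATH so that the example is harmless to run.

```python
import os
import subprocess
import sys

def run_with_private_env(argv, overrides):
    """Run a command with a private copy of the environment.

    os.environ of the calling process is never modified, so settings
    needed by one backend cannot leak into other handlers.
    """
    env = dict(os.environ)   # private copy of the current environment
    env.update(overrides)    # backend-specific settings
    result = subprocess.run(argv, env=env, capture_output=True, text=True)
    return result.returncode, result.stdout

# Hypothetical value; a real handler would compute it once and cache it.
private_settings = {"DEMO_LIBRARY_PATH": "/hypothetical/lcg/lib"}

rc, out = run_with_private_env(
    [sys.executable, "-c", "import os; print(os.environ['DEMO_LIBRARY_PATH'])"],
    private_settings,
)
```

The child process sees the private value, while the parent environment is left as it was.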
Adapters: 7 vs 11 vs 20
            ADA   Athena   Gaudi (DaVinci, Gauss, Boole, …)   executable (any script)
LCG/glite   X     6        3
DIAL        7     X        X                                  X
DIRAC       X     X        4                                  X
LSF         X
Localhost   X     5        2                                  1
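One way to read the three numbers in the slide title is: distinct adapters, supported application/backend combinations, and all possible combinations. Under that reading (an assumption), sharing adapters can be sketched as a lookup table in which several pairs point at the same handler object. The handler names and the assignments below are made up for illustration and do not reproduce the table.

```python
class RuntimeHandler:
    """Illustrative adapter between one application type and some backends."""
    def __init__(self, name):
        self.name = name

# Hypothetical handlers; one object may serve several backends.
shared_exec = RuntimeHandler("executable-generic")
gaudi_batch = RuntimeHandler("gaudi-batch")
gaudi_grid = RuntimeHandler("gaudi-grid")

# (application, backend) -> adapter; unsupported pairs are simply absent.
ADAPTERS = {
    ("Executable", "Local"): shared_exec,
    ("Executable", "LSF"): shared_exec,
    ("Gaudi", "Local"): gaudi_batch,
    ("Gaudi", "LSF"): gaudi_batch,
    ("Gaudi", "LCG"): gaudi_grid,
}

def find_adapter(application, backend):
    """Return the adapter for a pair, or fail for unsupported pairs."""
    try:
        return ADAPTERS[(application, backend)]
    except KeyError:
        raise LookupError("no runtime handler for %s on %s" % (application, backend))

applications = {app for app, _ in ADAPTERS}
backends = {bk for _, bk in ADAPTERS}

n_adapters = len({id(handler) for handler in ADAPTERS.values()})  # distinct adapters
n_supported = len(ADAPTERS)                                       # supported pairs
n_possible = len(applications) * len(backends)                    # all pairs
```

In this toy setup the counts are 3 adapters vs 5 supported pairs vs 6 possible pairs, which mirrors the shape of the comparison on the slide.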
Summary
• Factsheet (4-0-0-beta4)
  – size:
    • Ganga base: ~400KB, pure-python (no install)
    • Atlas and LHCb extensions: ~100KB
  – existing functionality:
    • basic job manipulation via GPI
    • easy configuration / extension
    • local and remote registry, local workspace
    • Lib: local host, LSF, LCG2, DIRAC, glite; Gaudi (DaVinci, Gauss,...), Athena, DIAL, Ada
  – future functionality:
    • GUI
    • splitting/merging
    • asynchronous job submission (remote job manager)
http://cern.ch/ganga
Backup Slides
Ganga Architecture
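[Diagram, components shown: GUI and CLIP clients; GPI, with the example j = Job(backend='LSF'); j.submit(); Ganga.Core; Job Repository; File Workspace (IN/OUT SANDBOX); Monitoring; Plugin Modules: applications (Athena, Gaudi) and backends (AtlasPROD, DIAL, DIRAC, LCG2, gLite, localhost, LSF)]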
Ganga Object Model
Gaudi Application Object
class Gaudi(GangaObject):
    _schema = Schema(Version(1,0),{
        'optsfile': FileItem(),
        'version': SimpleItem(None),
        'platform': SimpleItem(None),
        'package': SimpleItem(None),
        'appname': SimpleItem(None),
        'cmt_release_area': SimpleItem(None),
        'cmt_user_path': SimpleItem(None),
        'masterpackage': SimpleItem(None),
        'extraopts': SimpleItem(None)})
    _category='applications'
    _name='Gaudi'

    def _auto__init__(self):
        ...

    def configure(self):
        ...
        extra_cfg=GaudiExtras()
        extra_cfg.flatopts=FileParser.writeString(gaudiopts,"expand")
        return (modified, extra_cfg)

    def list_choices(self,property):
        ...
Job Submit
class GaudiLFSRunTimeHandler:
    def prepare(self,app,extra):
        (algpack,alg,algver)=app.masterpackage.split('/',3)
        # job script template: the ###NAME### markers are filled in below
        script="""#!/usr/bin/env bash
export CMTPATH=###CMTUSERPATH###
export ###THEAPP###_release_area=###CMTRELEASEAREA###
if [ -f ${LHCBHOME}/scripts/ProjectEnv.sh ]; then
  . ${LHCBHOME}/scripts/ProjectEnv.sh ###THEAPP### ###VERSION###
else
  echo "Could not find the ProjectEnv.sh script. Your job will probably fail"
fi
mkdir -p cmttemp/v1/cmt
cat >cmttemp/v1/cmt/requirements <<EOF
use ###ALG### ###ALGVER### ###ALGPACK###
EOF
cmt setup -sh -quiet -pack=cmttemp -version=v1 -path=$PWD >cmttemp/v1/cmt/setup.sh
. cmttemp/v1/cmt/setup.sh
$###THEAPP###_release_area/###APPUPPER###/###APPUPPER###_###VERSION###/###PACKAGE###/###THEAPP###/###VERSION###/###PLATFORM###/###THEAPP###.exe myopts.opts
"""
        script=script.replace('###CMTUSERPATH###',app.cmt_user_path)
        script=script.replace('###THEAPP###',app.appname)
        script=script.replace('###CMTRELEASEAREA###',app.cmt_release_area)
        script=script.replace('###VERSION###',app.version)
        script=script.replace('###ALG###',alg)
        script=script.replace('###ALGVER###',algver)
        script=script.replace('###ALGPACK###',algpack)
        # Python strings have upper(), not toupper()
        script=script.replace('###APPUPPER###',app.appname.upper())
        script=script.replace('###PACKAGE###',app.package)
        script=script.replace('###PLATFORM###',app.platform)

        return {'jobscript': ('myscript',script),
                'inputbox':[('myopts.opts',extra.flatopts)]}
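The chain of replace() calls above fills ###NAME### markers one at a time. The same idea can be sketched as a single helper that substitutes every marker in one pass and fails loudly if a value is missing. This is an illustrative Python 3 alternative, not the handler's actual code; fill_template and the sample path are hypothetical.

```python
import re

# Matches markers such as ###THEAPP### or ###CMTRELEASEAREA###.
PLACEHOLDER = re.compile(r"###([A-Z_]+)###")

def fill_template(template, values):
    """Replace every ###NAME### marker; raise KeyError on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError("no value for placeholder %s" % name)
        return str(values[name])
    return PLACEHOLDER.sub(substitute, template)

template = "export ###THEAPP###_release_area=###CMTRELEASEAREA###\n"
appname = "DaVinci"
script = fill_template(template, {
    "THEAPP": appname,
    "CMTRELEASEAREA": "/hypothetical/release/area",  # made-up path
    "APPUPPER": appname.upper(),                     # unused values are fine
})
```

A single pass also avoids the case where a substituted value itself happens to contain text that a later replace() call would rewrite.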
LSF Submit (1)
def submit(self,jobid, jobconfig):
    inw = FileWorkspace.InputWorkspace()
    outw = FileWorkspace.OutputWorkspace()

    logger.info('LSF: submitting job %d',jobid)

    inw.create(jobid)
    outw.create(jobid)
    scriptpath = self.preparejob(jobid,jobconfig,inw,outw)

    # FIXME: grabbing stdout is done by shell magic and probably should be implemented in python directly
    rc,soutfile = shell_cmd('cd %s; bsub %s' % (inw.getPath(),scriptpath))

    if rc == 0:
        sout = file(soutfile).read()
        import re
        m = re.compile(r"^Job <(?P<id>\d*)> is submitted to (\S*) queue <(?P<queue>\S*)>.", re.M).search(sout)

        if m is None:
            logger.warning('could not match the output and extract the LSF job identifier!')
            logger.warning('command output \n %s ',sout)
        else:
            self.id = m.group('id')
            queue = m.group('queue')
            if self.queue != queue:
                # warn before overwriting self.queue, so the requested queue is still reported
                logger.warning('you requested queue "%s" but the job was submitted to queue "%s"',self.queue,queue)
                logger.warning('command output \n %s ',sout)
                self.queue = queue
        logger.info('job %d submission OK',jobid)

    return rc == 0
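The step that extracts the job identifier and queue from the bsub output can be tried in isolation. The sketch below is written for Python 3 and assumes output lines of the form targeted by the regular expression in the slide; the sample strings are invented, and the pattern is slightly relaxed so that the word before "queue" is optional.

```python
import re

# Accepts both "... submitted to default queue <8nm>." and
# "... submitted to queue <8nm>." (the word before "queue" is optional).
SUBMIT_RE = re.compile(
    r"^Job <(?P<id>\d+)> is submitted to (?:\S+ )?queue <(?P<queue>[^>]+)>",
    re.M,
)

def parse_bsub_output(text):
    """Return (job_id, queue) as strings, or None if nothing matched."""
    match = SUBMIT_RE.search(text)
    if match is None:
        return None
    return match.group("id"), match.group("queue")

# Invented sample output, modelled on the pattern used in the slide.
sample = "Job <12345> is submitted to default queue <8nm>.\n"
parsed = parse_bsub_output(sample)
```

Returning None instead of raising keeps the caller free to log a warning and carry on, which is what the submit method above does.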
LSF Submit (2)
def preparejob(self,jobid,jobconfig,inw,outw):
    appscriptpath = inw.writefile(jobconfig['jobscript'],executable=1)

    # put files into job workdir (also to protect the originals while the job is running)
    sharedinputbox = map(lambda f: inw.writefile(f), jobconfig['inputbox'])
    sharedoutputbox=outw.getPath()
    print sharedoutputbox

    # wrapper script template; os and sys are imported because the script uses them
    text = """#!/usr/bin/env python
import os
import shutil
import sys

sharedinputbox = ###SHAREDINPUTBOX###
sharedoutputbox= ###SHAREDOUTPUTBOX###

for fn in sharedinputbox:
    shutil.copy(fn,'.')

s = os.system('###APPSCRIPTNAME###')

print 'DEBUG: Job finished with exit code: ',s

if s == 0:
    for fn in os.listdir('.'):
        if not os.path.isdir(fn):
            shutil.copy(fn,sharedoutputbox) # FIXME: needs recursive copy

sys.exit(s)
"""

    text = text.replace('###SHAREDINPUTBOX###',repr(sharedinputbox))
    text = text.replace('###APPSCRIPTNAME###',appscriptpath)
    text = text.replace('###SHAREDOUTPUTBOX###',repr(sharedoutputbox))

    return inw.writefile(('__jobscript__',text),executable=1)
Job Submit Sequence
Files/Job Repository
• File Workspace
  ~/__Ganga4__/workspace/input/*
  ~/__Ganga4__/workspace/output/*
• Job Repository
  ~/__Ganga4__/repository/ganga_user
LSF backend object
class LSF(GangaObject):
    _schema = Schema(Version(1,0), {
        'queue' : SimpleItem(defvalue='8nm'),
        'id' : SimpleItem(defvalue=None,protected=1,copyable=0),
        'status' : SimpleItem(defvalue=None,protected=1,copyable=0)
    })
    _category = 'backends'
    _name = 'LSF'

    def __init__(self):
        super(LSF,self).__init__()
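The schema above attaches flags such as copyable=0 to individual attributes. A plausible reading, and it is only an assumption, is that non-copyable attributes (the batch job id and status) are reset to their defaults when an object is copied. The Python 3 sketch below models that reading with stand-in classes; SimpleItem, SchemaObject and LSFSketch are simplified illustrations, not the Ganga classes of the same or similar names.

```python
import copy

class SimpleItem:
    """Stand-in for a schema entry: default value plus two flags."""
    def __init__(self, defvalue=None, protected=0, copyable=1):
        self.defvalue = defvalue
        self.protected = protected
        self.copyable = copyable

class SchemaObject:
    """Stand-in base class: attributes are declared in _schema."""
    _schema = {}

    def __init__(self, **values):
        for name, item in self._schema.items():
            setattr(self, name, copy.deepcopy(item.defvalue))
        for name, value in values.items():
            if name not in self._schema:
                raise AttributeError("unknown attribute: %s" % name)
            setattr(self, name, value)

    def copy(self):
        """Copy only attributes flagged as copyable; others keep defaults."""
        clone = type(self)()
        for name, item in self._schema.items():
            if item.copyable:
                setattr(clone, name, copy.deepcopy(getattr(self, name)))
        return clone

class LSFSketch(SchemaObject):
    _schema = {
        "queue": SimpleItem(defvalue="8nm"),
        "id": SimpleItem(defvalue=None, protected=1, copyable=0),
        "status": SimpleItem(defvalue=None, protected=1, copyable=0),
    }

original = LSFSketch(queue="1nh")
original.id = "12345"          # as if set by a successful submission
duplicate = original.copy()
```

Here the duplicate keeps the chosen queue but starts without a job id, so it can be submitted as a fresh job.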
LSF Monitoring
# static method of the LSF backend class
def updateMonitoringInformation(jobs):
    rc,soutfile = shell_cmd('bjobs -a',allowed_exit=[0,255])
    sout = file(soutfile).read()

    if rc == 0:
        import re
        m1 = re.compile(r"JOBID\s+USER\s+STAT\s+QUEUE").search(sout)
        if not m1:
            logger.warning('problem with understanding the bjobs output:\n%s',sout)
        else:
            items = re.compile(r"^(?P<id>\d+)(\s*)(\S*)(\s*)(\S*)", re.M).findall(sout)
            ids = map(lambda x: x[0], items)

            for j in jobs:
                try:
                    idx = ids.index(j.backend.id)
                    new_status = items[idx][4]

                    if j.backend.status != new_status:
                        logger.info('%d: LSF job status changed to %s',j.id,new_status)
                        j.backend.status = new_status

                    if j.backend.status == 'DONE' or j.backend.status == 'ERROR':
                        j.status = "completed"
                except ValueError:
                    pass

updateMonitoringInformation = staticmethod(updateMonitoringInformation)
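The parsing step of the monitoring loop can be exercised without a batch system. The Python 3 sketch below turns a bjobs-style listing into a mapping from job id to status, using named groups instead of positional indices. The sample listing is invented and only follows the column header checked in the slide (JOBID USER STAT QUEUE); parse_bjobs is a hypothetical helper.

```python
import re

HEADER_RE = re.compile(r"JOBID\s+USER\s+STAT\s+QUEUE")
# One row per job: id, user and status are the first three columns.
ROW_RE = re.compile(r"^(?P<id>\d+)\s+(?P<user>\S+)\s+(?P<stat>\S+)", re.M)

def parse_bjobs(text):
    """Return {job_id: status} parsed from a bjobs-style listing."""
    if HEADER_RE.search(text) is None:
        raise ValueError("unrecognised bjobs output")
    return {m.group("id"): m.group("stat") for m in ROW_RE.finditer(text)}

# Invented sample listing for illustration only.
sample = """JOBID   USER    STAT  QUEUE  FROM_HOST  EXEC_HOST  JOB_NAME  SUBMIT_TIME
1001    alice   RUN   8nm    host1      host2      hello     Sep 23 10:00
1002    alice   DONE  8nm    host1      host3      analysis  Sep 23 10:05
"""
statuses = parse_bjobs(sample)
```

A dictionary lookup such as statuses.get(j.backend.id) would then replace the ids.index(...) search and its ValueError handling.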
Hello CLI
• Hello World:
  # execute hello script locally
  from Ganga.CLI import *
  Job(exe='hello').submit()
• Hello DaVinci:
  # execute DaVinci on the LSF, GRID, ...
  # analysis will start at a worker node somewhere far far away ;-)
  j = Job(name='serious analysis',backend='LSF')
  j.application = DaVinciApplication(version='v12r3')
  j.application.optsfile = "DV-demo.opts"
  j.outputfiles = ["DVNtuples.hbook"]
  j.submit()
Jobs
• Jobs
  # registry of persistent jobs
  jobs()

  Statistics: 2 jobs
  registry
  --------------
  ID   status     name
  # 1  new        serious analysis
  # 2  submitted  hello

  # looping and selecting jobs
  j = jobs()[1]
  for j in jobs(): print j
  for j in jobs()[2:9]: j.name = 'important!'
  important = jobs()['important!']
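Selecting jobs by id, by id range and by name suggests a container with an overloaded indexing operator. The Python 3 sketch below assumes that slices select by job id (the earlier GPI example prints ids 1 and 2 for jobs[1:3]) and that a string selects by name. JobStub and JobRegistry are hypothetical stand-ins, not the Ganga registry.

```python
class JobStub:
    """Stand-in for a job; only id and name are modelled."""
    def __init__(self, id, name=""):
        self.id = id
        self.name = name

class JobRegistry:
    """Container indexed by job id, id range (slice) or job name (str)."""
    def __init__(self):
        self._jobs = {}

    def add(self, job):
        self._jobs[job.id] = job

    def __iter__(self):
        # Iterate in order of increasing job id.
        return iter(self._jobs[i] for i in sorted(self._jobs))

    def __len__(self):
        return len(self._jobs)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Assumed semantics: half-open range over job ids, not positions.
            return [j for j in self
                    if (key.start is None or j.id >= key.start)
                    and (key.stop is None or j.id < key.stop)]
        if isinstance(key, str):
            return [j for j in self if j.name == key]
        return self._jobs[key]

jobs = JobRegistry()
for job_id in range(1, 6):
    jobs.add(JobStub(job_id))

for j in jobs[2:4]:
    j.name = "important!"

selected_ids = [j.id for j in jobs[1:3]]
important_ids = [j.id for j in jobs["important!"]]
```

Selecting by id rather than by list position keeps the selection stable when jobs are removed from the registry.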
Plugin Components
• Applications & Backends
  # list plugin components
  backends()
  ['TestSubmitter', 'Local', 'Glite']
  applications()
  ['DaVinciApplication', 'TestApplication', 'Executable']
  # creating objects
  app = DaVinciApplication(optionsfile='some.opts')
  bk = Local()
  j.application, j.backend = app, bk
  # creating objects by a string name
  j.application = 'DaVinciApplication'
  j.application.optionsfile = 'some.opts'
  j.backend = 'Local'
Templates and Copying
• Copy jobs
  # reuse existing jobs configuration to create new jobs
  j = other_job.copy()
  j = Job(template = other_job)
• Job templates
  # job templates are just like any other jobs
  # except that their sole purpose is to store job configuration
  t = JobTemplate(backend=LSF(queue='8nm'))
  j = Job(template = t)
  # templates are stored in a separate container
  templates()

  Statistics: 1 jobs
  templates
  ID   status    name
  # 1  TEMPLATE  None
Design Principles
• CLI Design Principles
  – Be predictable and follow the Python way of thinking
  – Increase complexity of the interface with complexity of the task:
    • Simple tasks – simple! Complicated tasks – also simple ;) !
  – Try to prevent users from making silent mistakes:
    • job.id = 5 # FAILS: id is a read-only property
    • finished_job.name = 'newname' # FAILS: job is finished so can't modify
  – Hide implementation:
    • job._impl.attrs['id'] = 5
  – Be convenient and guide users:
    • j.application.exe <=> j.exe # ALIASES of properties
    • TAB completion shows properties and hides internals
  – Be flexible: good for writing complex macros/scripts...
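The "prevent silent mistakes" and "hide implementation" principles can be sketched with a thin proxy object that forwards attribute access to an internal store and rejects writes that should fail. This is a Python 3 illustration under assumed rules (id is read-only, a completed job is frozen); JobProxy and ReadOnlyError are hypothetical names, and the internal layout differs from the job._impl.attrs shown on the slide.

```python
class ReadOnlyError(AttributeError):
    """Raised when a write to a protected attribute is attempted."""

class JobProxy:
    _readonly = frozenset(["id"])

    def __init__(self, id, name=""):
        # Bypass our own __setattr__ to install the internal store.
        object.__setattr__(self, "_impl", {"id": id, "name": name, "status": "new"})

    def __getattr__(self, attr):
        # Called only when normal lookup fails: forward to the store.
        impl = object.__getattribute__(self, "_impl")
        try:
            return impl[attr]
        except KeyError:
            raise AttributeError(attr)

    def __setattr__(self, attr, value):
        impl = self._impl
        if attr in self._readonly:
            raise ReadOnlyError("%s is a read-only property" % attr)
        if impl["status"] == "completed":
            raise ReadOnlyError("job is finished so it cannot be modified")
        if attr not in impl:
            raise AttributeError("unknown property: %s" % attr)
        impl[attr] = value

job = JobProxy(id=1)
job.name = "serious analysis"          # allowed while the job is new

errors = []
try:
    job.id = 5                         # rejected: read-only
except ReadOnlyError as exc:
    errors.append(str(exc))

job._impl["status"] = "completed"      # framework-side update, bypassing the proxy
try:
    job.name = "newname"               # rejected: job is finished
except ReadOnlyError as exc:
    errors.append(str(exc))
```

Raising an error at assignment time is what turns a silent mistake into a visible one; a typo in a property name is caught the same way.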
Ganga Architecture
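[Diagram, components shown: Client; GUI and CLIP; GPI, with the example j = Job(backend='LSF'); j.submit(); Ganga.Core; Job Repository; File Workspace (IN/OUT SANDBOX); Monitoring; Plugin Modules: applications (Athena, Gaudi) and backends (AtlasPROD, DIAL, DIRAC, LCG2, gLite, localhost, LSF)]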