Python@Work 2007-Jul-12 PkManager.py Howard Kapustein Director of Technology and Architecture Manhattan Associates


Background

- Director of Technology and Architecture
  - Manhattan Associates, 7 years
  - EPCglobal Reader Protocol 1.0, Co-Chairman [RFID]
  - SMS, Platform Services (Architecture/Subsystems), 12 years
  - Open Source submitter (Jython, among others)
- 20 years experience
  - Acronym Soup: C++, Java, Visual Basic, Windows, Unix, concurrency, i18n, security, RDBMS, GUI, web server, XML, TCP, web services, REST, AJAX, JSON(!), oodles more
  - No COBOL, No Perl
- Python since 1997
  - Switched from AWK
    - Thompson AWK compiler created EXEs
    - Getting 'long in the tooth'
  - Tried to learn TCL: not so much
  - Stumbled over Eric Raymond's "Why Python?" essay
    - Made perfect sense
    - Google for "why python eric raymond"
- Python is Beautiful (even back when it was 1.5)
  - Rich language, Richer library
  - Thank god for py2exe
  - Pywin32 is pretty handy too


Application

- Warehouse Management Open Systems (WMOS)
  - Large C++, CORBA, portable 'enterprise' application
    - >8 million lines of code
    - Borland's Visibroker for C++
    - AIX, HP-UX, Linux, Solaris, Windows
  - 24x7x365, near-realtime 'Execution' system
    - i.e. 1 hour outage = millions of dollars
  - Heavy RF+MHE interaction
    - 99% of activity is high volume, low latency
  - Heavy customization element
    - Routinely modified for every customer
    - Each customer = Forked codebase
    - IOW more variables + post-release
  - Performance, Scalability, Latency, Reliability, Resiliency
    - The 'not negotiable' family


Problem

- CORBA process:
  - Server = EXE: initializes, registers available factories with ORB, responds to requests
  - Client:

        Factory* f = bind("factory");
        Object* o = f->newInstance();
        o->DoStuff();
        release o; release f; // aka delete

  - ORB knows what factories are supposed-to-be and actually-available
  - Borland: if the requested factory is not running, the ORB asks the Object Activation Daemon (OAD) to start it [Just-In-Time Activation]
- Problem: OAD stability is abominable
  - Runs for hours/days, then randomly hangs or crashes for no apparent reason
  - But JIT support made it popular for non-production (test, dev, …)
    - Doesn't mean we didn't regularly see support issues due to folks using the OAD
- Homegrown replacements:
  - PkPad: Unix shell script, pre-starts a list of processes, polls via ps to detect premature death and restart
    - Cons: 30 second sleep between sweeps (or huge perf hit), no JIT, no management
  - PkManager.exe: NT Service, multithreaded, interrupt-driven (no polling)
    - Cons: Windows only (<20% customers), no JIT
- Solution: PkManager.py
  - Superset: JIT + PreStart, interrupt-based (no polling), administration interface
  - And by-god-rock-solid-reliable!


Basic Architecture

- Global variable: timeToExit = threading.Event()
- Thread 1: Main
  - Initialize (parse command line etc.)
  - Start worker threads
  - Main loop

        while not timeToExit.isSet():
            time.sleep(0.1)

- Thread 2: Monitor (Process Manager)

        while not timeToDie.isSet():
            ProcessRequests()
            StartChildren()
            WaitForDeath()
        timeToExit.set()

- Thread 3: API (Web Server)
  - JIT requests
  - Administration Console
  - Web Services
- Thread 4: Uptime (Reporter)

        while not timeToDie.isSet():
            print 'Uptime: %s since %s' % (now-startup, startup)
            timeToDie.wait(n)
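
The thread layout on this slide can be sketched in a few lines. This is a minimal illustration in modern Python 3 syntax, not PkManager's code: the `monitor` body, the `cycles` knob, and the `reports` list are assumptions made so the example terminates on its own.

```python
import threading
import time

time_to_exit = threading.Event()   # global: set when the whole process should stop


def monitor(time_to_die, cycles):
    """Stand-in for the process-manager thread; 'cycles' is a hypothetical knob."""
    done = 0
    while not time_to_die.is_set() and done < cycles:
        time.sleep(0.01)           # placeholder for ProcessRequests/StartChildren/WaitForDeath
        done += 1
    time_to_exit.set()             # ask the rest of the process to shut down


def uptime_reporter(time_to_die, interval, reports):
    """Record uptime every 'interval' seconds until told to die."""
    startup = time.time()
    while not time_to_die.is_set():
        reports.append(time.time() - startup)
        time_to_die.wait(interval)  # interruptible sleep: returns early once set


def main():
    time_to_die = threading.Event()
    reports = []
    workers = [
        threading.Thread(target=monitor, args=(time_to_die, 3)),
        threading.Thread(target=uptime_reporter, args=(time_to_die, 0.01, reports)),
    ]
    for worker in workers:
        worker.start()
    while not time_to_exit.is_set():   # main loop, as on the slide
        time.sleep(0.01)
    time_to_die.set()                  # tell every worker to finish
    for worker in workers:
        worker.join()
    return reports
```

Using `Event.wait(n)` instead of `time.sleep(n)` in the reporter is what lets shutdown take effect immediately rather than after the next sleep expires.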


Configuration (DSL)

- Configuration file = Domain Specific Language (DSL)
  - [[wmosprod.dat]]
- Python dictionaries are sweet!
  - Look ma, it's JSON

        symbols = {'N':'order', 'OnStart':'#prestart', 'JIT':'#ondemand', …}

        config = []
        errors = 0
        lineno = 0
        for line in open('wmosprod.dat'):      # iterate over lines, not characters
            lineno += 1
            line = line.strip()
            try:
                entry = eval(line, {}, symbols)
                config.append(entry)
            except Exception:
                print 'Error line %d' % (lineno)
                errors += 1
        if errors > 0:
            raise UserWarning('Uh-oh…')

- Users see simple and obvious configuration
- Code is maintainable and simple
  - Mostly to 'nicely' handle and report errors
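
To make the eval-based loader concrete, here is a small Python 3 sketch. The entry layout in `SAMPLE` is invented for illustration (the slide does not show the contents of wmosprod.dat), and skipping blank lines is an added assumption. As with the original, this only suits trusted configuration files.

```python
def load_config(lines, symbols):
    """Evaluate each non-blank line as a Python expression; collect entries and errors."""
    config, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            # Empty __builtins__ narrows the namespace; it is not a security sandbox.
            config.append(eval(line, {"__builtins__": {}}, symbols))
        except Exception as exc:
            errors.append((lineno, str(exc)))
    return config, errors


# Bare names in the file resolve through this table, which is what makes it a DSL.
SYMBOLS = {"N": "order", "OnStart": "#prestart", "JIT": "#ondemand"}

# Hypothetical file contents; the third line is deliberately malformed.
SAMPLE = [
    "{'exe': 'alpha', N: 1, 'mode': OnStart}",
    "{'exe': 'beta', N: 2, 'mode': JIT}",
    "{'exe': 'broken', N: }",
]

config, errors = load_config(SAMPLE, SYMBOLS)
```

Collecting `(lineno, message)` pairs rather than printing keeps the "report every bad line, then fail once" behaviour of the slide while leaving presentation to the caller.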


Signals: Ouch!

- TIP: Do this very early

        import signal
        signals = dir(signal)
        if 'SIGBREAK' in signals:
            signal.signal(signal.SIGBREAK, signal.default_int_handler)
        if 'SIGTERM' in signals:
            signal.signal(signal.SIGTERM, signal.default_int_handler)

- Surprises
  - #1: SIGBREAK+SIGTERM not always available
  - #2: Default action is usually terminate
- Now except KeyboardInterrupt will trip
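
The same tip, restated as a Python 3 sketch with the availability check pulled into its own function. The function names are hypothetical; `signal.signal` and `signal.default_int_handler` are the standard-library calls used on the slide.

```python
import signal

CANDIDATES = ("SIGBREAK", "SIGTERM")


def available_signal_names(candidates=CANDIDATES):
    """Return the candidate signal names that this platform's signal module defines."""
    return [name for name in candidates if hasattr(signal, name)]


def install_interrupt_handlers():
    """Route each available signal to KeyboardInterrupt; call from the main thread."""
    installed = []
    for name in available_signal_names():
        signal.signal(getattr(signal, name), signal.default_int_handler)
        installed.append(name)
    return installed
```

Calling `install_interrupt_handlers()` first thing at startup means a SIGTERM (or Ctrl-Break on Windows) surfaces as `KeyboardInterrupt` instead of killing the process outright.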


Threading

- All threads use same basic pattern, e.g.

        process_timeToExit = threading.Event()  # Global

        class Thread_Monitor(threading.Thread):
            def __init__(self, other, parms, …):
                …initialize…
            def run(self):
                try:
                    …setup…
                    while not self.timeToDie.isSet():
                        …do stuff…
                except KeyboardInterrupt:
                    print 'Ctrl-Break detected; terminating…'
                except Exception, e:
                    print FormatException()
                process_timeToExit.set()
            def stop(self, timeout=None):
                self.timeToDie.set()
                threading.Thread.join(self, timeout)

- threading.Event is your friend
  - Global Event to coordinate process termination/cleanup
  - Per-thread communication
    - "Thread, Kill Thyself" = Event.set(); "Time to die?" = Event.isSet()
    - "Thread, Art Thou Dead?" = Thread.join()
  - Alternative, pair of events:

        timeToDie = threading.Event()
        iAmDead = threading.Event()
        def KillThyself(): timeToDie.set()
        def TimeToDie(): return timeToDie.isSet()
        def IAmDead(): iAmDead.set()
        def AreYouDeadYet(): return iAmDead.isSet()
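
A runnable Python 3 version of the per-thread pattern might look like the following. The `work` callable, the `interval` parameter, and the `error` attribute are assumptions added so the class can be exercised in isolation; they are not part of the slide's Thread_Monitor.

```python
import threading


class StoppableWorker(threading.Thread):
    """Worker that loops until asked to die, then flags process-wide exit."""

    def __init__(self, process_exit, work, interval=0.01):
        super().__init__()
        self.time_to_die = threading.Event()
        self.process_exit = process_exit   # shared, process-wide Event
        self.work = work                   # hypothetical callable doing one unit of work
        self.interval = interval
        self.error = None

    def run(self):
        try:
            while not self.time_to_die.is_set():
                self.work()
                self.time_to_die.wait(self.interval)
        except Exception as exc:           # remember the failure for whoever joins us
            self.error = exc
        finally:
            self.process_exit.set()        # a dying worker takes the process down with it

    def stop(self, timeout=None):
        """'Kill thyself', then 'art thou dead?'. Returns True once the thread has ended."""
        self.time_to_die.set()
        self.join(timeout)
        return not self.is_alive()
```

Returning a boolean from `stop()` answers "art thou dead?" directly, since `Thread.join()` with a timeout gives no indication of whether the thread actually finished.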


FormatException()

- Simplify exception reporting

        import sys
        import traceback

        def __function__(nFramesUp=1):
            """Create a string naming the function n frames up on the stack."""
            co = sys._getframe(nFramesUp+1).f_code
            return "%s (%s @ %d)" % (co.co_name, co.co_filename, co.co_firstlineno)

        def FormatException(ei=None):
            if ei is None:
                ei = sys.exc_info()
            info = traceback.format_exception(ei[0], ei[1], ei[2])
            return ''.join(info)

- Typical usage:

        try:
            DoSomething()
        except SomeException:
            print FormatException()

- You never need to catch the exception object, though you can:

        try:
            DoSomething()
        except SomeException, e:
            print FormatException(sys.exc_info())  # ei is an exc_info tuple, not the exception
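
For readers on current Python, the two helpers translate almost unchanged. This is a Python 3 sketch; the names `function_label`, `format_exception`, and `risky` are illustrative, not taken from PkManager.

```python
import sys
import traceback


def function_label(frames_up=0):
    """Name the function 'frames_up' frames above the caller, with file and first line."""
    code = sys._getframe(frames_up + 1).f_code
    return "%s (%s @ %d)" % (code.co_name, code.co_filename, code.co_firstlineno)


def format_exception(exc_info=None):
    """Return the formatted traceback for exc_info, or for the exception being handled."""
    if exc_info is None:
        exc_info = sys.exc_info()
    return "".join(traceback.format_exception(*exc_info))


def risky():
    """Trigger an error and report it without binding the exception object."""
    try:
        return 1 / 0
    except ZeroDivisionError:
        return "%s failed:\n%s" % (function_label(), format_exception())
```

Because `sys.exc_info()` is read inside the helper, call sites stay a single line and never need an `as exc` clause.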


KeyboardInterrupt

- try block necessary per thread
  - Raised on the active thread when detected
- Worse, KeyboardInterrupt derives from StandardError
  - except Exception eats everything
  - Including KeyboardInterrupt and SystemExit!
  - Probably not what you wanted…
- This coupled with SIGBREAK fun was a bear to figure out
- Python 3000 is supposed to 'fix' this
  - Changing the exception hierarchy!
  - Should make porting…fun…
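
The hierarchy change is easy to check on a current interpreter. The snippet below is a small Python 3 illustration (the `classify` helper is hypothetical): in Python 3, KeyboardInterrupt and SystemExit inherit from BaseException rather than Exception, so a plain `except Exception` no longer swallows them.

```python
def classify(exc_type):
    """Report whether a plain 'except Exception' clause would catch exc_type."""
    return "caught" if issubclass(exc_type, Exception) else "propagates"


results = {t.__name__: classify(t) for t in (ValueError, KeyboardInterrupt, SystemExit)}
```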


Web Server

- PkManager predates WSGI's emergence

        class PkManagerWebServer(SocketServer.ThreadingMixIn, BaseHTTPServer.HTTPServer):  #1
            pass

        class PkManagerRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):  #2
            protocol_version = 'HTTP/1.0'  #3
            server_version = 'PkManagerHTTP/' + __version__
            def do_HEAD(self):  #4
                self.do_GET()
            def do_POST(self):  #4
                self.ProcessRequest(self.rfile)
            def do_GET(self):  #4
                requestbody = StringIO()
                requestbody.seek(0)
                self.ProcessRequest(requestbody)
                requestbody.close()
            def ProcessRequest(self, requestbody):
                …parse url…
                name = 'Handler_' + path.replace('/', '_')  #5
                handler = self.__class__.__dict__.get(name)  #6
                if handler is None:
                    if not self.ServeStaticFile():  #7
                        self.ProcessResponse(400)  #8
                    return
                else:
                    result = handler(self)
                if 'Cache-Control' not in headers:
                    headers['Cache-Control'] = 'private, max-age=0'
                self.ProcessResponse(statuscode, body, headers)  #8

- #n = Item of interest
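
The path-to-method dispatch (#5 and #6) is the heart of this slide. Here is a minimal Python 3 sketch of the same idea using `http.server`; the `/ping` endpoint, the `handler_name` helper, and the two-tuple return value are assumptions for illustration, and static-file serving is left out.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit


def handler_name(path):
    """Map a URL path such as '/process/start' to a method name."""
    return "Handler_" + path.replace("/", "_")


class SketchRequestHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.0"

    def do_GET(self):
        path = urlsplit(self.path).path
        handler = getattr(self, handler_name(path), None)   # look the method up by name
        if handler is None:
            status, body = 400, "no handler for %s" % path
        else:
            status, body = handler()
        payload = (body or "").encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.send_header("Cache-Control", "private, max-age=0")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass                                                # keep the sketch quiet

    def Handler__ping(self):
        """Hypothetical endpoint served at /ping."""
        return 200, "pong"


def make_server(port=0):
    """Build (but do not start) a threaded server; port 0 lets the OS pick one."""
    return ThreadingHTTPServer(("127.0.0.1", port), SketchRequestHandler)
```

Adding an endpoint is then just a matter of defining another `Handler_…` method; no routing table needs to be maintained.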


Request Handlers

- PkManagerRequestHandler methods, e.g.

        monitor_requests = Queue.Queue()  # Global variable

        def Handler__process_start(self):  #1
            parms = SplitToKVPairs(self.rfile)  #2
            processname = parms.get('exe')
            if processname is None:
                return (400, 'Missing parameter (exe=<name>)')
            timeout = int(parms.get('wait', TimeoutDefault))
            iamdone = g_EventCache.get(timeout)  #3
            request = (self.effective_path, iamdone, processname)
            monitor_requests.put(request)  #4
            realtimeout = self.TimeoutMSecToRealValue(timeout)  #5
            if iamdone != None:
                iamdone.wait(realtimeout)  #6
                if not iamdone.isSet():  #7
                    return (408, None)
                g_EventCache.put(iamdone)  #8
            return (200, None)  #9

- #1: Method name = 'Handler_' + URL's path component
- #2: Parameters are fundamentally URL query parameters
- #5: Timeout = N or Infinite or NoWait
- #6: Wait up to the timeout
- #7: If timeout, HTTP status = 408 Request Timeout
- #9: Success! HTTP status = 200 OK
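
Steps #4 through #9 amount to "enqueue, wait on an Event, translate the outcome into an HTTP status". A stripped-down Python 3 sketch follows; `submit_and_wait` and `monitor_once` are hypothetical names, and the event cache and parameter parsing are omitted.

```python
import queue
import threading

monitor_requests = queue.Queue()


def submit_and_wait(processname, timeout):
    """Queue a request, then wait up to 'timeout' seconds; mirror the 200/408 results."""
    iamdone = threading.Event()
    monitor_requests.put(("/process/start", iamdone, processname))
    if iamdone.wait(timeout):      # wait() returns True if the event was set in time
        return (200, None)
    return (408, None)


def monitor_once(started):
    """Stand-in monitor: take one request, 'start' the process, signal completion."""
    _path, iamdone, processname = monitor_requests.get()
    started.append(processname)
    iamdone.set()
```

With a monitor thread running, `submit_and_wait("alpha", 5)` comes back with status 200; with nobody servicing the queue it returns 408 once the timeout lapses.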


EventCache

- New Event() per request = Huge Perf Pig
  - Took 3 hours to identify bottleneck
  - Only 20 minutes to solve! I ♥ Python

        class EventCache:
            def __init__(self):
                self.cache = Queue.Queue()
            def get(self, timeout):
                if timeout == Timeout_NoWait:
                    return None
                try:
                    event = self.cache.get_nowait()
                    event.clear()
                    return event
                except Queue.Empty:
                    return threading.Event()
            def put(self, event):
                self.cache.put(event)
            def __len__(self):
                return self.cache.qsize()

        g_EventCache = EventCache()

- Call get(timeout) for a new Event
- Call put(event) to return Event to cache when done
  - Only if done with the Event
  - If errors occurred (e.g. timeout), don't put()
    - Python will clean up the Event object once no longer referenced
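
The cache ports directly to Python 3. In this sketch `TIMEOUT_NO_WAIT` is a hypothetical stand-in for the slide's Timeout_NoWait constant, whose actual value is not shown.

```python
import queue
import threading

TIMEOUT_NO_WAIT = 0   # hypothetical sentinel standing in for Timeout_NoWait


class EventCache:
    """Pool of reusable threading.Event objects."""

    def __init__(self):
        self._cache = queue.Queue()

    def get(self, timeout):
        if timeout == TIMEOUT_NO_WAIT:
            return None                 # caller will not wait, so no Event is needed
        try:
            event = self._cache.get_nowait()
        except queue.Empty:
            return threading.Event()    # pool is empty: create a fresh one
        event.clear()                   # recycled Events come back in the 'set' state
        return event

    def put(self, event):
        self._cache.put(event)

    def __len__(self):
        return self._cache.qsize()
```

The `clear()` on reuse matters: an Event is normally returned to the pool right after it was set, so handing it out unchanged would make the next waiter return immediately.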


Queue.Queue

- All inter-thread-communication via Event and Queue
- Handler creates a tuple to queue
  - (resource, event, …parameters…)
  - Output parameters passed as empty list

        iamdone = Event()
        name = []; age = []; shoesize = []
        request = (self.effective_path, iamdone, name, age, shoesize)
        queue.put(request)
        iamdone.wait()
        print name[0], age[0], shoesize[0]

- Monitor thread pulls requests from queue

        def HandleRequests():
            try:
                while 1:
                    request = queue.get_nowait()
                    path = request[0]
                    name = 'HandleAPIRequest_' + path.replace('/', '_')
                    handler = globals().get(name)
                    assert handler != None
                    handler(request)
            except Queue.Empty, e:
                pass

        def HandleAPIRequest__some_service_entrypoint(request):
            name = request[2]; age = request[3]; shoesize = request[4]
            …do stuff…
            name.append(…); age.append(…); shoesize.append(…)
            iamdone = request[1]
            if iamdone != None:
                iamdone.set()

- So effective I ported Queue to C++
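
The "empty list as output parameter" trick can be shown end to end in Python 3. Two liberties are taken here and should be read as assumptions: handlers are found through an explicit `HANDLERS` dictionary rather than `globals()`, and the `/person/lookup` endpoint and its data are made up.

```python
import queue
import threading

requests = queue.Queue()


def handle_person_lookup(request):
    """Hypothetical service entry point: fills the caller's output lists."""
    _path, iamdone, name, age = request
    name.append("Ada")
    age.append(36)
    if iamdone is not None:
        iamdone.set()                      # wake the waiting handler thread


# Explicit registry, swapped in for the slide's globals() lookup.
HANDLERS = {"Handle_person_lookup": handle_person_lookup}


def handle_requests():
    """Drain the queue, dispatching each request by a name derived from its path."""
    while True:
        try:
            request = requests.get_nowait()
        except queue.Empty:
            return
        HANDLERS["Handle" + request[0].replace("/", "_")](request)


def lookup():
    """Caller side: queue a request with empty output lists, wait, read the results."""
    iamdone, name, age = threading.Event(), [], []
    requests.put(("/person/lookup", iamdone, name, age))
    worker = threading.Thread(target=handle_requests)
    worker.start()
    finished = iamdone.wait(5)
    worker.join()
    return finished, name[0], age[0]
```

Because lists are mutable and shared by reference, the monitor thread can hand results back without a second queue; the Event only signals that the lists are ready to read.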


Internationalization (i18n)

- Initially tried module gettext
  - Standard. Capable. Simple API. Very similar to GNU gettext API
- But…needed simple deployment
  - "Zero-Install": anything else is just a support call (or many…)
  - How to find the message catalogs?
    - localedir/language/LC_MESSAGES/domain.mo
    - Create a 3-level tree, with very fixed names, to drop a bunch of localized text resources?
  - And what about customization?
- Bah. Python to the rescue!
  - [[PkManagerI18N-*.py]]

        import fnmatch
        import os
        import sys

        i18n = {}
        i18nMeta = {}

        def i18nLoad(path):
            sys.path.insert(0, path)
            for root, paths, filenames in os.walk(path):
                for filename in filenames:
                    if fnmatch.fnmatch(filename, 'PkManagerI18N-*.py'):
                        name = os.path.splitext(filename)[0]
                        pathname = os.path.join(root, filename)
                        try:
                            module = __import__(name)
                            text = getattr(module, 'Text', None)
                            if text != None:
                                meta = getattr(module, 'Meta', None)
                                for locale in text.iterkeys():
                                    i18n[locale] = text[locale]
                                    i18nMeta[locale] = meta[locale]
                        except (ImportError, SyntaxError), e:
                            Abort(5, 'Error loading i18n resource %s' % (filename))
            del sys.path[0]
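
A Python 3 take on the loader is sketched below. It deliberately swaps `__import__` plus `sys.path` manipulation for `runpy.run_path`, which executes a file and returns its globals; that substitution, the function names, and the temporary-directory demo are all assumptions for illustration.

```python
import fnmatch
import os
import runpy
import tempfile


def load_i18n(path, pattern="PkManagerI18N-*.py"):
    """Collect Text/Meta dictionaries from every matching file under 'path'."""
    texts, metas = {}, {}
    for root, _dirs, filenames in os.walk(path):
        for filename in sorted(fnmatch.filter(filenames, pattern)):
            namespace = runpy.run_path(os.path.join(root, filename))
            text = namespace.get("Text") or {}
            meta = namespace.get("Meta") or {}
            for locale, table in text.items():
                texts[locale] = table
                metas[locale] = meta.get(locale, {})   # tolerate a missing Meta entry
    return texts, metas


def demo():
    """Write one resource file to a temporary directory and load it back."""
    source = (
        "Text = {'es': {'About': 'Sobre'}}\n"
        "Meta = {'es': {'Name': 'Spanish', 'Display': 'Espa\\u00f1ol'}}\n"
    )
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "PkManagerI18N-es.py")
        with open(target, "w", encoding="utf-8") as handle:
            handle.write(source)
        return load_i18n(folder)
```

Dropping a new `PkManagerI18N-xx.py` file next to the others is all it takes to add a language, which is the "zero-install" property the slide is after.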


Internationalization (i18n), Part Deux

- Simple format

        Text = { 'es': { 'About':'Sobre', 'English':u'Ingl\u00e9s', … } }
        Meta = { 'es': { 'Name':'Spanish', 'Display':u'Espa\u00f1ol' } }

- But what about complex languages?
  - Python source files can use arbitrary encodings!

        # -*- coding: utf8 -*-
        Text = { 'zh': { 'About':u'亸乾些亖亃', … },
                 'jp': { 'About':u'ノキアについて', … },
                 'ar': { 'About':u'عن' } }
        Meta = { 'zh': { 'Name':'Chinese', 'Display':u'中国' },
                 'jp': { 'Name':'Japanese', 'Display':u'日本語' },
                 'ar': { 'Name':'Arabic', 'Display':u'العربية' } }

- One neat trick in module gettext
  - _() is defined as 'lookup-text'. Nifty idea

        print _('About')

        def _(s, locale=None, language=None):
            if locale == None:
                locale = options.locale
            textlist = i18n.get(locale)
            if textlist != None:
                text = textlist.get(s)
                if text != None:
                    return text
            if language != None:
                textlist = i18n.get(language)
                if textlist != None:
                    text = textlist.get(s)
                    if text != None:
                        return text
            if isint(s):
                return s
            else:
                return '[%s]' % (s)
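
The lookup order in `_()` (locale first, then language, then a visible placeholder) can be tried out with a tiny catalog. This Python 3 sketch uses invented catalog contents, and treating `isint()` as "is an integer" is an assumption, since the helper's definition is not on the slide.

```python
I18N = {
    "es": {"About": "Sobre"},
    "es_MX": {"Help": "Ayuda"},
}


def translate(key, locale, language=None, catalog=I18N):
    """Look up 'key' for the locale, then the fallback language, else mark it missing."""
    for name in (locale, language):
        if name is None:
            continue
        text = catalog.get(name, {}).get(key)
        if text is not None:
            return text
    if isinstance(key, int):
        return key              # assumed reading of isint(): integers pass through
    return "[%s]" % key         # bracketed key makes untranslated text easy to spot
```

For example, `translate("About", "es_MX", "es")` falls back to the `es` table, while an unknown key such as `"Quit"` comes back as `[Quit]`.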


py2exe

- Running PkManager.py is natural on Unix
  - Not so much on Windows
- py2exe binds source + runtime into .exe

        # setup.py
        import os
        from distutils.core import setup
        import py2exe
        setup(name='PkManager',
              version=GetVersion(),
              description="WMOS process manager, overseer, care and feederer",
              author='Manhattan Associates',
              url='http://www.manh.com',
              console=[{'script':"PkManager.py", 'icon_resources':[(1, 'PkManager.ico')]}],
              zipfile=None,  # Append to .exe / no separate .zip
              data_files=[('.', [os.path.abspath(r'wmosprod.dat')])],
              options={"py2exe":{"compressed":1, "optimize":2, "xref":0, "includes":[], "dll_excludes":[]}})

- Create the executable

        python -OO setup.py py2exe

- Replace console parameter to compile an NT Service

        service=[{'modules':'PkManager', 'script':"PkManager.py", 'icon_resources':[(1, 'PkManager.ico')]}],


Demo


Questions?

- Blog: http://blog.kapustein.com
- Email: [email protected]