Jason Stowe, Condor Week 2009, April 22nd, 2009. Coming to Condor Week since 2005. Started as a user.


Jason Stowe

Condor Week 2009

April 22nd, 2009

Coming to Condor Week since 2005. Started as a user.

Users hunger for features

AccountingGroups (2004/2005)
Configuration w/Pipes (2005/2006)
GroupResourcesUsed (2006/2007)
Condor in Cloud (2007/2008)
Resource Weights (2008/2009)

Based upon customer requests

Focus on software development for managing Condor at any scale, and provide services that complement the technology

Universities, Fortune 500s, government labs, and small/medium businesses that use Condor

Users like Condor because...
It's open, it works, it's flexible, and (for corporations) there is no API/operating-system lock-in, and...

The community

Today, let's talk about a few challenges and solutions

War Story #1: Compute & Data

Whenever you find or solve a computation problem, you discover a data problem.

"Dark" or latent, unused storage on any OS/device

Empty space dispersed across machines in unusable sizes

"We need more filer space, but we have empty space on all our machines."

So we looked at Hadoop

New type of storage: aggregated or "cloud" storage

Block Store Architecture
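As a rough illustration of how a block store turns scattered free space into one usable pool: files are cut into fixed-size blocks, and each block is replicated on several machines. The sketch below assumes 64 MB blocks and 3 replicas (the HDFS defaults of that era) and a simple most-free-space placement rule; `DataNode` and `place_file` are hypothetical names, not CloudFS or Hadoop APIs.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed values for illustration: HDFS releases of that era defaulted to
# 64 MB blocks and 3 replicas. This is not CloudFS's actual placement logic.
BLOCK_SIZE_MB = 64
REPLICATION = 3


@dataclass
class DataNode:
    name: str
    free_mb: int                       # leftover "dark" space on this machine
    blocks: List[int] = field(default_factory=list)


def place_file(file_mb: int, nodes: List[DataNode],
               block_mb: int = BLOCK_SIZE_MB,
               replication: int = REPLICATION) -> List[List[str]]:
    """Split a file into fixed-size blocks and store each block's replicas
    on distinct nodes, preferring nodes with the most free space."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = -(-file_mb // block_mb)          # ceiling division
    placement = []
    for b in range(num_blocks):
        size = min(block_mb, file_mb - b * block_mb)
        # Stable sort: nodes with equal free space keep their original order.
        targets = sorted(nodes, key=lambda n: n.free_mb, reverse=True)[:replication]
        if targets[-1].free_mb < size:            # least-free target is last
            raise RuntimeError("not enough free space for block %d" % b)
        for node in targets:
            node.free_mb -= size
            node.blocks.append(b)
        placement.append([n.name for n in targets])
    return placement


if __name__ == "__main__":
    # Six machines with 100 MB free each: no single one can hold a 150 MB
    # file, but the aggregated pool can, even with three replicas per block.
    nodes = [DataNode("n%d" % i, 100) for i in range(6)]
    print(place_file(150, nodes))
    # [['n0', 'n1', 'n2'], ['n3', 'n4', 'n5'], ['n0', 'n1', 'n2']]
```

The point of the example is the "unusable sizes" problem from the earlier slide: 100 MB fragments are individually too small for the file, but pooled and replicated they hold it with space to spare.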

But how do we use it?

1.5 years ago: it worked well to access it from Java, but what about mounting it?

So we tried WebDAV

Next up, an open source FUSE driver

Need: Windows/Linux, reliable, large files, scalable, and read/write

Mountable drivers: Linux (FUSE) / Windows (IFS)

CloudFS Architecture

When we rolled it out...

Customers asked for surprising features

HTTP/REST protocols similar to Amazon S3. Reasons:

Installing a mountable driver across servers/workstations is prohibitive

Want a similar interface to various cloud storage providers => internal cloud

FTP interface, because it is simple!

Status Today

Mountable multi-platform drivers. Linux: SUSE 10, RHEL/CentOS 4 & 5; Windows 2k3+; OS X 10.3+

Encryption to avoid snooping of sensitive data

Data nodes built on Java: Linux, Windows, OS X, Solaris

RESTful storage service & FTP interface

Management interface for controlling storage features

(Integrating with CycleServer)

Looking forward to condor_hadoop!

War Story #2: Cloud Calculations

Condor users: peak vs. median usage

Problem

Need for compute power comes up suddenly

Condor users hunger for resources

Condor users balance "We need more servers for big runs" and "Our servers are 40% utilized"

Many ways to solve this problem using EC2

Use cases do exist for adding nodes to a local Condor pool using Amazon EC2

We favored entire pools in the cloud

Data scheduling, performance issues

Run workflows faster using resources you could never buy...

Can test CycleServer at a scale our users have and we don't

Need a 1000-node Condor pool? Wait 15 minutes

Dynamic resources => pool can be sized to the jobs

1 core x 1000 hrs = 1000 cores x 1 hr = ~$200
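The slide's arithmetic can be checked in a few lines. The ~$200 total implies roughly $0.20 per core-hour; that rate is inferred from the slide rather than a quoted EC2 price, and `run_cost` and `cores_needed` are illustrative helpers that assume the work splits into perfectly parallel, independent jobs.

```python
import math

# Rate inferred from the slide ($200 / 1000 core-hours), not a quoted EC2 price.
RATE_PER_CORE_HOUR = 200 / 1000


def run_cost(core_hours: float, rate: float = RATE_PER_CORE_HOUR) -> float:
    """With pay-per-use billing, cost depends only on total core-hours,
    not on how many cores the work is spread across."""
    return core_hours * rate


def cores_needed(core_hours: float, deadline_hours: float) -> int:
    """Size a dynamic pool so the work finishes within the deadline,
    assuming perfect scaling across independent jobs."""
    return math.ceil(core_hours / deadline_hours)


serial = run_cost(1 * 1000)   # 1 core for 1000 hours
wide = run_cost(1000 * 1)     # 1000 cores for 1 hour
print(serial, wide, cores_needed(1000, 1))
```

This is why sizing the pool to the jobs matters: the bill is the same either way, but the wide pool returns results 1000 times sooner.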

Sounds good, but how do we do this for a workflow like BLAST?

From e-Science 2008: for 64x the processors

Hadoop running BLAST: 57x
mpiBLAST: 52.4x

High-CPU Amazon EC2 nodes have the best price/performance

Scalability: 2x CPUs = 1.9825x speed-up
64 CPUs = 60.7x speed-up
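Parallel efficiency (speed-up divided by processor count) puts the figures above on a common scale: roughly 89% for Hadoop running BLAST and 82% for mpiBLAST at 64x the processors, and roughly 99% and 95% for the 2-CPU and 64-CPU scalability numbers. The snippet only reproduces that division using the numbers from the slides; the labels are descriptive, not names from the original study.

```python
# Figures copied from the slides; efficiency = speed-up / processors.
results = {
    "Hadoop running BLAST, 64x processors": (57.0, 64),
    "mpiBLAST, 64x processors": (52.4, 64),
    "Scalability slide, 2 CPUs": (1.9825, 2),
    "Scalability slide, 64 CPUs": (60.7, 64),
}


def efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / processors


for label, (speedup, procs) in results.items():
    print(f"{label}: {efficiency(speedup, procs):.1%} of linear")
```

Efficiency close to 100% is what makes renting many cores for a short time cost about the same as a few cores for a long time.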

Why high throughput leads to efficient computing

Another user: worked with Varian (mass spectrometers)

Other high-tech lab equipment

Problem: coming up on a conference, needed to run a large simulation

Six weeks on an internal Condor pool

Deployed a Condor pool in CycleCloud

Same 6-week job

Ran in < 1 day

War Story #3: Management

The Condor tutorial mentions "Why use a personal Condor?"

i.e., Condor on a few nodes...

Condor on 1 computer gets you policies, fault tolerance, etc.

Similarly, management issues come up even on small pools

Collaborating with U. of W. Madison

Managing configuration files (our Config with Pipes, CW2006)
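Condor can take configuration from a program's output: when an entry in LOCAL_CONFIG_FILE ends with a pipe character, Condor runs it and parses its stdout as configuration (check the manual for your Condor version). A minimal sketch of such a generator, assuming a hypothetical site where dedicated nodes are named compute-* and desktops should only start jobs after 15 minutes without keyboard activity. The script path in a line such as LOCAL_CONFIG_FILE = /opt/condor/bin/gen_config.py | is likewise hypothetical.

```python
#!/usr/bin/env python3
import socket


def build_config(hostname: str) -> str:
    """Return Condor configuration text for one host (hypothetical site policy)."""
    lines = ["# generated for %s by a hypothetical config script" % hostname]
    if hostname.startswith("compute-"):
        # Dedicated cluster nodes: always willing to start jobs.
        lines.append("START = True")
    else:
        # Desktops: only start jobs after 15 minutes of keyboard idle time.
        lines.append("START = KeyboardIdle > 15 * 60")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Condor parses this program's stdout as configuration.
    print(build_config(socket.gethostname()), end="")
```

Keeping policy in one script rather than in per-machine files means a change to the naming rule or idle threshold reaches every node the next time its configuration is read.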

Exploring ClassAds/log files becomes problematic

Visualization, reporting, etc.

Man-decades of development on tools to help run Condor

Have a demo against the Madison pool. Come see me. We'd love more use cases.

Questions? Thank you

For more information go to: http://www.cyclecomputing.com

We constantly see opportunities for talented Condor folks, so please feel free to contact us!

Jason Stowe
jstowe - cyclecomputing.com