Building production IaaS Cloud Software

by Tim Cramer, VP Engineering, Eucalyptus

Challenges and Experiences Building production IaaS Cloud Software

I joined Eucalyptus in December of 2010 as the VP of Engineering and began my amazing adventure in Cloud Computing.  Over the last 1.84110 years I’ve learned quite a bit and we’ve been through both challenging and exhilarating times.  Through our time together, we’ve accomplished great things which I’m excited to share with you.

When I was interviewing for the position there was a fairly small team (including tools, partner, release, quality, and dev), with 2 of the 7 founders serving as directors/engineers and 2 more doing core engineering work. One of the things I concentrated on during the interview process was the company/team culture. I focused specifically on the founders, many of whom had PhDs, to ensure they were interested in creating products instead of doing research. We had a joke while I was at Sun that you never wanted PhDs on your engineering teams because you needed someone to do the work (no offense intended to some of the awesome PhDs who worked their tails off). I was really impressed (we both passed each other’s tests!) and I started quickly.

What I came into was a team and a company getting a foothold with real production deployments. We had a lot of interested companies, mostly doing proofs of concept, and some in more advanced stages needing a few key features to commit to Euca. Puma was a prime example: they needed integrated NetApp SAN support before they could go live. What the team did, by necessity, was create a number of customer-specific branches of the code to satisfy potential customers as quickly as possible. However, this created a situation that was difficult to manage when building one product to serve many customers. We were able to respond quickly, but not with breadth. We also had an interesting open source strategy, where the enterprise and open source offerings, although based on the same source code, weren’t fully compatible (one couldn’t upgrade from the open source to the enterprise).

With some major changes coming down the line (High Availability and Identity Management) that involved core architecture changes, it was time to make a key strategic decision: open source the code and create a plugin model for those bits (VMware broker and SAN support) that would be enterprise only. We had that discussion in the first few weeks of my employment, and it was encouraging to watch our CEO, Marten, lead the discussion, get everyone’s opinions, and then make the call. I felt lucky to be on a management team that together confronted key strategic questions, listened to one another, debated (not endlessly!), and made the decision.

2011 was a tough year for engineering. With 2 very large features with far-reaching architectural implications, we had issues which caused us to slip our release… a lot. We shipped 7 months late, which was beyond painful for our sales, support, and engineering teams specifically and the company in general. However, after we released 3.0, we immediately began massive change: we doubled and restructured the team and their roles, and changed our product management process, our SCM, hosted site, bug/feature tracking system, development process and open source release process. Amazingly, our engineering team was accepting of this level of change and put their full effort behind making it successful. I had to burn the change management handbook 🙂

Our 3.1 release was really just about moving to Git, GitHub, and Jira, settling into new roles (over half the team had a new job description), and creating the plugin model for our code. It also meant game-changing open source software was to be available. We gave our prediction of completion (4 months) and hit the date. We didn’t do it perfectly process-wise (far from it, in fact) but we did hit the date, and started to tweak our Agile development process to work for Euca.

Now we’re on to the 3.2 release. We thought we’d hit November, but some changes were needed and it looks like we’re slipping a couple of weeks into December. We’re working on troubleshootiness (yep, not a word), reporting (completely revamped), an EMC VNX SAN adaptor, and a new User Console. The new User Console is the first time we got to unleash our new User Experience Architect, Jenny Loza (Dr. JLo), and the result is simply amazing. You should check it out and let us know what you think!

We continue to focus on the Agile process (modified fairly heavily) and getting our Jira workflow down pat. Compared to 3.1, it’s an order of magnitude better. The team is enjoying the distributed decision making/responsibility of scrum teams, and this helps facilitate better communication. That said, we continue to focus on improving this area as the team is distributed (50% outside Santa Barbara, including, as of this month, our first developer in China).

We also have some great strengths.  The team has the top minds in cloud computing.  They are very open to change and experimentation. Our culture at Euca is second to none.  The team has very little ego and is open to debate.  We love solving the entire problem and integrating all our features for a complete solution.

We know how to have a great time (the party bus to LA for a night on the town was incredible, we only lost 2 engineers).  I think that could be a blog post on its own, but more likely suitable for 4chan. 🙂

The focus of future blog posts will be to dive into some of the experiences we’ve had and how we’re working on them.  I’d love to hear your advice on how you’ve resolved similar issues, or what you think might be coming up as challenges for the team that we can proactively get a jump on resolving. So… the list:

  • Going from a small team with quality, release, tools, partner and development engineering, to more than double, with over half hired in the last 6 months, half of the original members getting new roles, and encompassing the following new teams: documentation, security, community, user experience, project management, and sustaining.
  • Converting the system we use for tracking features and bugs from Launchpad to Jira
  • Moving our source code and management system from Launchpad/BZR to GitHub/Git
  • Changing our development process from waterfall to a modified version of Agile
  • Revamping how we release our product as open source by creating a plugin model for our enterprise bits.
  • Changing our product lifecycle.  Going to a more structured release/maintenance cycle and keeping our engineers continuously engaged.

Over the next weeks I’ll expand on these changes and provide in-depth insight into what we’re doing in engineering. Let me know what you’d like to hear about first, or other topics you’d like to know more about! You can find us Eucalyptoids on #eucalyptus-devel on freenode. Please join us in the community, ask questions, or suggest some blogging themes. And check out the code on GitHub.

