MesosCon EU 2015

We’ve just returned from the first MesosCon EU, a conference dedicated to the open-source Apache Mesos project in Europe. MesosCon (global) was held earlier this year in Seattle.

Apache Mesos (meaning ‘middle’ or ‘between’ in Greek, as in ‘Mesopotamia’ – ‘between rivers’) is a cluster management system that we’ve been running in production since the beginning of the year. We’re big fans. It’s been gaining a lot of traction recently, and powers infrastructure at places like Twitter, CERN, Apple (for Siri), Airbnb, and many more. Mesos abstracts machines, in our case AWS EC2 instances, into a single pool of resources for efficiently deploying distributed applications, and provides resource isolation and sharing between those applications. We run our cloud-based data science platform, the Model Foundry, on Mesos, with services running in Docker containers under the Marathon framework.
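To give a flavour of what running a Docker service under Marathon involves, here is a minimal sketch of an app definition of the kind that is posted to Marathon's `/v2/apps` endpoint. The app id, image name and resource figures are made up for illustration and are not taken from our setup; the helper `marathon_app` is hypothetical.

```python
import json


def marathon_app(app_id, image, cpus=0.5, mem=256, instances=1):
    """Build a Marathon app definition that runs a Docker image.

    All values are illustrative; Marathon asks Mesos for `cpus` and `mem`
    (in MB) per instance and keeps `instances` copies running.
    """
    return {
        "id": app_id,
        "cpus": cpus,
        "mem": mem,
        "instances": instances,
        "container": {
            "type": "DOCKER",
            "docker": {"image": image, "network": "BRIDGE"},
        },
    }


# Hypothetical service: two instances of a web container.
app = marathon_app("/example/web", "nginx:1.9", instances=2)
print(json.dumps(app, indent=2))
```

The resulting JSON is what would be sent in the body of the request; Marathon then takes care of placing the instances on whatever resources Mesos offers.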

The conference was held at the beautiful Dublin Convention Centre, and was attended by around 175 people, infinitely up from last year (there was no conference – see what we did there?). MesosCon in Seattle was attended by 700, up from 250 last year. Great growth!

Morning Dublin!

An exciting schedule was planned for the day, with talks ranging from how companies are using Mesos to lower-level discussions of new Mesos features. The conference began with an update from the Apache project’s chair and Mesos co-creator Ben Hindman. Ben covered the progress made over the last year, including new frameworks, the addition of modules that bring huge extensibility to Mesos, improvements to security (SSL!), and new primitives for frameworks to leverage. The new primitives include dynamic reservations, persistent volumes, external volumes, quotas and maintenance primitives. These will enable a host of exciting new use cases, for sure. More on those later.

Mesos co-creator Ben Hindman kicking us off.

The first talk was given by Mesos committers Joris Van Remoortere and Michael Park on how to contribute to the Mesos project. Getting started contributing to a large open-source project can be a daunting task, and the guys did a good job of demystifying the process for Mesos. The main takeaway for us was: for success, make connections with current committers and align your goals with theirs.

Andy Petrella then gave a presentation on a big data science platform that he has been working on, based on the familiar stack of Mesos, Marathon and Chronos (a distributed cron framework for Mesos). While there is some overlap with the Sandtable Model Foundry, the platform appears to focus exclusively on machine learning workflows, in particular using Apache Spark, and less on collaboration. It was interesting to see other systems in this space.

This was followed by Kevin Swinney from Twitter, who gave a great introduction to immutable infrastructure: why we should want it, how it can be implemented, and some pitfalls and how best to avoid them. To Kevin, immutable infrastructure must be snapshot-equivalent; we must be able to destroy, recreate and scale it “at will”; there must exist a single specification; and it must be automated. We can implement and package immutable infrastructure in a number of ways: machine images (e.g. AWS AMIs); flattened container images; layered container images (e.g. Docker); application bundles (e.g. WAR; PEX); and layered application bundles (such as POM or requirements.txt). Kevin did a good job of laying out the advantages and disadvantages of each packaging approach. Clearly, the current trend is moving in from both ends of the spectrum towards layered container images (i.e. Docker). He also made some interesting points about the use of ‘configuration services’, which effectively externalise application behaviour and so break immutability. The same goes for mutable labels, such as Docker tags like ‘latest’. These do not provide unique identifiers, since the labels can be changed; it is better to use content-addressed identifiers, such as SHA hashes. Thanks, Kevin; I’ll keep my eyes open.
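The point about mutable labels versus content-addressed identifiers can be illustrated with a small sketch. This is our own toy example, not something shown in the talk: the `content_id` helper and the byte strings standing in for image contents are hypothetical.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return an identifier derived purely from the content itself."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Two hypothetical builds of the same application.
build_1 = b"app build 1"
build_2 = b"app build 2"

id_1 = content_id(build_1)
id_2 = content_id(build_2)

# A mutable label is just a pointer that can be moved at any time.
labels = {"latest": id_1}
labels["latest"] = id_2  # 'latest' now silently means something else

# The content-addressed identifiers never change meaning:
# the same bytes always hash to the same identifier.
print(id_1 == content_id(build_1))
print(labels["latest"] == id_1)
```

Anything deployed by `id_1` is guaranteed to be the first build, whereas anything deployed by the label gets whatever the label happened to point at when it was resolved.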

The next talk was given by Rob Johnson from Yelp, discussing their internal platform-as-a-service, PaaSTA. PaaSTA is built on top of – wait for it – Mesos, Marathon and Chronos, and is used to deploy services into a monitored, highly available application spanning multiple data centres. Rob told a compelling (if familiar) story of Yelp’s transformation from a monolithic application to a service-oriented architecture (microservices?) that has worked well to reduce operational overhead and the operations bottleneck for deployment. Good stuff.

Old pal Jenkins in the house, courtesy of Yelp.

Over lunch we enjoyed three lightning talks. The first, on persistent primitives, provided a brief look into the new primitives in Mesos that are useful for running stateful services, e.g. a database. Without these, when a Mesos task fails or restarts, its state is lost and removed by Mesos. Mesos now includes support for persistent volumes, for saving state, and dynamic reservations, where resources are guaranteed upon failure. ArangoDB, a distributed database, was presented as an example of a framework using these primitives. The next talk highlighted that while Mesos is sold on scale (purportedly up to 50,000 nodes), it is also convenient to use at tiny scales, e.g. 3 nodes. The final lunch talk outlined joint work between ARM and IBM, looking to connect ARM’s existing batch-processing infrastructure and Mesos for running Jenkins (a widely used open-source Continuous Integration server that we use at Sandtable) and Spark jobs. They are investigating two approaches: writing the batch system as a Mesos framework or developing a federation approach. It was interesting to see an example of how a legacy system could integrate with Mesos.

After lunch, the talks split into two tracks. The first talk we attended was given by Gregory Chomatas from HubSpot, who focussed on their Mesos framework, Singularity. On the surface, Singularity appears similar to Apache Aurora, another Mesos framework aimed at running different types of workloads on Mesos, i.e. long-running processes, scheduled jobs and one-off tasks. Gregory gave a great talk, not just about Singularity but also about how HubSpot has implemented their infrastructure and reaped the benefits of using Mesos, again telling a familiar story – reducing operational complexity and deployment friction.

The following talk switched gears, with Szymon Konefal from Intel providing a look at an oversubscription module called Serenity, which is being developed to introduce flexible, custom policies for use with the Mesos oversubscription API. The oversubscription API makes it possible to take advantage of slack, that is, resources allocated to production tasks but not actually being used, by running additional tasks on them. With the right policies in place, it may be possible to schedule ‘best-effort’ tasks, which can be revoked when the production tasks need their resources back. The idea is to drive up overall data centre utilisation even further. Frameworks can subscribe to receive offers for revocable resources and then schedule revocable tasks. Although the oversubscription API has been added to Mesos, this space feels fresh. One to watch.

Storage in Mesos is a hot topic, so we were very interested to attend the talk by Jörg Schad of Mesosphere on Mesos file systems. At the moment, storage is essentially handled out-of-band, that is, outside of the Mesos system, using, for example, HDFS (the Hadoop distributed file system). However, a number of new primitives and modules have been developed to begin handling storage within Mesos. These include dynamic reservations, persistent volumes, external volumes, storage discovery, and the Docker volume driver isolator module (more here). This is a very active space, and we expect much more soon.

We were also keen to attend Marco Massenzio’s talk demonstrating how to develop frameworks for Mesos in Python using the new HTTP API. The new API does not require bindings to the native Mesos libraries, which is good news, and it appears quite straightforward to use. The demo notebook is available here. We look forward to having a play!
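As a rough idea of why no native bindings are needed, here is a minimal sketch (ours, not Marco's demo) of how a framework might prepare its initial subscription call using only the Python standard library. It assumes a master reachable at `localhost:5050` and uses a made-up framework name; the request is only constructed, not sent, and the helper `build_subscribe_request` is hypothetical.

```python
import json
import urllib.request


def build_subscribe_request(master, user, name):
    """Prepare a SUBSCRIBE call for the Mesos scheduler HTTP API.

    `master` is a host:port string; `user` and `name` fill in the
    framework's FrameworkInfo. Nothing is sent over the network here.
    """
    body = {
        "type": "SUBSCRIBE",
        "subscribe": {"framework_info": {"user": user, "name": name}},
    }
    return urllib.request.Request(
        url=f"http://{master}/api/v1/scheduler",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


# Assumed master address and framework name, for illustration only.
req = build_subscribe_request("localhost:5050", "root", "example-framework")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending this request to a running master opens a long-lived streaming response on which the master delivers events (such as resource offers) to the framework; handling that stream is the part a real framework would need to add.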

Finally, we attended a talk given by Naoise Dunne from the Insight Centre for Data Analytics about release management on Mesos. Naoise presented another great case study of the use of Mesos, and how it can be used effectively in the development and deployment of complex analytic applications. In particular, Mesos is useful for handling unscheduled bursts of activity (elasticity), for working with small operations teams, and for teams with a data science focus. There are definitely overlaps with how we are set up here at Sandtable, and we echo many of the points Naoise made.

The conference was very well organised, with the quality of talks consistently high and insightful. We didn’t stay for the hackathon today; maybe next year. Hope you’re having fun, guys!

Thanks again to the sponsors and the organisers; definitely worth the trip over.
