QCon London 2016 – Day 3 & Learnings

In this blog, the final in the series about QCon London 2016, we’ll give an overview of the talks we attended and end with a list of some of the things we learned from the conference.

The theme for the third and final day of QCon was, on the whole, microservices. The final day’s tracks included exciting topics like ‘Microservices for mega-architectures’, ‘Data science & machine learning methods’, ‘Modern native languages’, ‘Full-stack JavaScript’, and ‘Modern agile development’.

We started with the last keynote of the conference by Peter Alvaro (of UCSC) and Kolton Andrus (formerly Netflix), who gave a highly entertaining talk on failure testing at Netflix – a collaboration between industry and academia. As previously discussed, Netflix have invested heavily in building resilient systems – in fact, they do ‘failure as a service’; however, much of this is a manual process. In this keynote, Peter and Kolton discussed a collaboration in which they adapted a theoretical failure-testing approach devised by Peter to Netflix’s infrastructure. The talk was ultimately about what it takes to get ideas out of the lab and into the “real world”. Peter and Kolton showed that it can be done if you meet in the middle, adapt the theory to reality, and remember to work backwards from what you know. Great keynote.

To start the day’s track talks, track organiser Russ Miles spoke about test-driven microservices. Russ discussed the complexity of building and reasoning about (large) loosely coupled, service-oriented architectures with bounded contexts (microservices), in particular when the services can change. It’s difficult to build confidence in such systems to a level that is acceptable to the business. How does one do unit, component, and service testing in this environment? Testing the pieces – i.e. unit and component testing – is good, of course; however, there are apparently organisations running thousands (!) of microservices. How does one reason about such systems? We could use stories; we, as humans, like stories – or what about ‘journeys’? Journeys are stories covering the whole service pipeline. But the pipeline will change: systems will be replaced; contracts will be broken. This is the complexity of using lots of microservices: there are many moving parts. We need ways to simplify and reason about the whole system. Agent-based models, anyone?

In the data science track, Mathieu Bastian, Head of Data Engineering at GetYourGuide and formerly of LinkedIn, gave a great talk on the mechanics of testing large data pipelines. Mathieu started with a compelling motivating example showing how complex data pipelines can become – a measure, really, of how successful and critical they are. Part of the complexity of managing data pipelines is the combination of code and data: code will have bugs, of course, and data will change. The key is to embrace automation: testing of the code and validation of the data. Each offers great benefits. Testing reduces bugs in code (hopefully!), gives confidence to iterate quickly, and scales well to multiple developers; validating data helps reduce manual testing and, ultimately, helps avoid catastrophic failures. However, each has its challenges. For testing, you need realistic data; pipelines often run not locally but in the cloud, which has a cost; and there is a lack of good tooling. As for validation, data sources may be out of our control, and machine learning models can be difficult to test.

An example of a complex data pipeline. (Slide by Mathieu Bastian)

Mathieu went through a number of testing and validation strategies. We’ll briefly mention some here; however, if you are involved with building data pipelines, we would highly recommend going through Mathieu’s slides. For testing, he listed a number of practices: design for testing from the start; pipeline jobs should be functions (output depends only on input); it should be safe to re-run jobs (i.e. they should be idempotent); centralise configuration; version code and timestamp data; unit test locally; build from schemas (Avro was listed as an example); use generators to create data that have the properties of real data; and, finally, run integration tests on realistic sample data.
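
To make two of those practices concrete – jobs as functions and safe re-runs – here is a minimal Python sketch of our own (the function and field names are ours, not from the talk). The job takes everything it needs, including the timestamp, as arguments, so running it twice on the same inputs gives the same result:

```python
import datetime


def dedupe_and_enrich(rows, as_of):
    """A pipeline job written as a function: the output depends only on
    the inputs (note the timestamp is an argument, not read from the
    clock), so re-running the job on the same inputs is safe."""
    seen, out = set(), []
    for row in rows:
        if row["id"] in seen:
            continue  # drop duplicate rows
        seen.add(row["id"])
        out.append(dict(row, as_of=as_of.isoformat()))
    return out


def make_rows(n):
    """Generate synthetic rows that share the shape of real data."""
    return [{"id": i % max(n // 2, 1), "value": i * 0.5} for i in range(n)]


def test_job_is_safe_to_rerun():
    as_of = datetime.date(2016, 3, 9)
    once = dedupe_and_enrich(make_rows(10), as_of)
    twice = dedupe_and_enrich(once, as_of)
    assert once == twice  # idempotent: re-running changes nothing
```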

As for validation strategies, it is necessary to handle different types of failure, from things under our control, like bugs, to more difficult issues such as noisy data and model biases. The strategies differ for input and output validation. For input validation, it is necessary to test for data quality: “garbage in, garbage out”, lest we forget. Data can also change for any number of reasons: data migrations, product changes, or updated data dependencies. Issues here could include missing values, encoding issues, schema changes, duplicate rows, etc. Output validation is also important – the outputs are going to be consumed by another system or by a human. Approaches here include looking for anomalies, for example checking that values are within bounds, and using A/B testing. Some great advice here.
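
As a rough illustration of what such checks might look like in Python (the field names and bounds below are ours, purely for illustration, not from the talk):

```python
def validate_input(rows, required=("id", "value")):
    """Input validation: catch missing values, schema drift, and
    duplicate rows before they flow into the pipeline ("garbage in,
    garbage out")."""
    errors, seen = [], set()
    for i, row in enumerate(rows):
        missing = [field for field in required if row.get(field) is None]
        if missing:
            errors.append("row %d: missing %s" % (i, missing))
        if row.get("id") in seen:
            errors.append("row %d: duplicate id %r" % (i, row.get("id")))
        seen.add(row.get("id"))
    return errors


def validate_output(scores, lower=0.0, upper=1.0):
    """Output validation: sanity-check values before another system
    (or a human) consumes them."""
    return ["score %s outside [%s, %s]" % (s, lower, upper)
            for s in scores if not lower <= s <= upper]
```

In a real pipeline, checks like these would run as gating steps: fail the job on input errors, and alert (or hold back publication) on output anomalies.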

Afterwards, we attended a talk by Peter Bourgon (of Weaveworks, formerly SoundCloud) on successful Golang program design. Peter gave wide-ranging practical advice covering a number of topics, such as the development environment; repository structure; formatting and style; configuration; logging and telemetry; testing; and dependency management. If you’re into Golang, definitely check out Peter’s slides.

At Sandtable, we’re a Python shop, but we’re interested to find out more about newer languages like Golang and Rust (see here for a discussion of Dropbox’s use of both) that are gaining traction. Golang, developed by Google, has been gaining some recent converts as a systems programming language, in particular for implementing microservices. It is also the language in which Docker is implemented.

Back to microservices, Tammer Saleh from Pivotal gave a great talk on microservice antipatterns; or, ‘how not to go down in flames’. Tammer went through a total of 12 antipatterns, a few of which we’ll discuss here.

  • The first (and most important?) antipattern Tammer described was ‘overzealous microservices’: don’t start with microservices; start with a monolith and extract services as required. Only use microservices if you need them, basically. Good start.
  • Getting into more technical antipatterns, Tammer described one concerning spiky loads between services. The recommended approach is to smooth traffic by buffering requests through queues, for example, to amortise the (scaling) costs. This does, however, mean responses must be handled asynchronously (a minimal sketch follows this list).
  • Moving on: as the number of microservices increases, the number of IPs and ports increases too, so how are services supposed to find each other? The antipattern is to hardcode hosts and ports. Instead, it is recommended to use a service discovery tool like Consul or etcd, or a centralised approach like DNS. (Using a service discovery tool – but not ZooKeeper – would definitely be the trendy approach.)
  • Debugging microservices can be difficult, as an error in one service may have been caused by another; tracking down issues can be a nightmare. It is suggested to use correlation IDs across journeys – following requests as they travel through the system (see the second sketch after this list). Although this must be done manually, it will save much pain later.
  • Another interesting antipattern involves testing microservices with mock servers. The antipattern is to allow each consuming team to create its own mocks and stubs of a service, which leads to a proliferation of mocks. One solution is for the service team to own the client used to consume the service and to control mocking through configuration. The downside is that the service team must release clients in the languages required by consumers.
  • Finally, as the number of microservices increases, there will be an increasing need for monitoring and logging – do not fly blind. Be prepared for more and more graphs, alerts, and pages; and since there will also be an operational explosion, automate as much as possible. Some good antipatterns here to be mindful of.
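
To make the ‘spiky loads’ item concrete, here is a minimal in-process sketch in Python – our own illustration, not Tammer’s code. A production system would put a real broker (RabbitMQ, SQS, etc.) between the services, but the shape of the pattern is the same:

```python
import queue
import threading
import time

# A bounded in-process queue as a stand-in for a real message broker.
work = queue.Queue(maxsize=1000)


def accept_request(payload):
    """Fast synchronous edge: enqueue and return immediately. The caller
    must collect the result asynchronously (callback, polling, etc.)."""
    work.put(payload)
    return {"status": "accepted"}


def worker():
    """The consumer drains the queue at a steady rate, so a spike in
    traffic becomes a longer queue rather than a bigger fleet."""
    while True:
        payload = work.get()
        time.sleep(0.01)  # stand-in for real processing of `payload`
        work.task_done()


threading.Thread(target=worker, daemon=True).start()
```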
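
And for the correlation IDs suggested in the debugging item, a sketch of how a service might propagate one (the header name ‘X-Correlation-ID’ is a common convention we have assumed, not something mandated by the talk):

```python
import uuid

import requests  # assumed HTTP client; any client that can set headers will do


def handle(incoming_headers, downstream_url):
    """Propagate a correlation ID so a single journey can be followed
    through the logs of every service it touches."""
    # Reuse the caller's ID if present; otherwise this service starts the journey.
    correlation_id = incoming_headers.get("X-Correlation-ID", str(uuid.uuid4()))
    print("[%s] calling downstream service" % correlation_id)  # real code: structured logging
    return requests.get(downstream_url,
                        headers={"X-Correlation-ID": correlation_id})
```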

Having now had a few weeks to digest the experience, here are a few things we learned.

  • Microservices are increasingly popular and bring great flexibility, particularly when combined with DevOps and continuous delivery.
  • Microservices are not a panacea; they bring their own complexities, not least increasingly complex interactions between services and an operational explosion as the number of services grows.
  • There is an ongoing debate about what microservices are: do they have to be small (as in lines of code)? Or is it more “do one thing and do it well”?
  • Continuous delivery continues to deliver.
  • Software is eating the world, from inside containers.
  • Unikernels are the next logical step after containers, and coming to a Docker toolchain near you soon.
  • Research papers actually get read, sometimes, by some people.
  • Conferences can have good food.

QCon London was a great conference. We really liked the topics of the tracks; the daily introductions to the day’s tracks; the voting system; the breaks between talks; and, of course, the talks themselves. The conference flowed well. The only issues were poor wifi connectivity in busy rooms and, perhaps, a need for more power outlets.

Thanks again to the organisers, track hosts, and speakers. Hope to see you again next year.
