Strata+Hadoop London 2016

Last week we spoke at the Strata+Hadoop London conference, held at the ExCeL centre. We presented as part of the new Data 101 track on Wednesday, which was intended to introduce the principles of data architecture and of doing data science. Our talk was on doing data science for strategic business problems.

We spoke about the issues with building models for human decision-makers (DMs) faced with strategic problems. To start, we discussed what models are. To us, models are simple representations of a system or phenomenon of interest, such as a wax model or a statistical linear model. Models are always approximations of the system and, as such, are never correct or right. You could, as Emanuel Derman eloquently discusses, view models as metaphors: they focus on the aspects of the world you are interested in, and leave out others. Nevertheless, while they are wrong, they can still be useful. There are in fact many ways in which models, and the modelling process, can be useful, as Joshua Epstein argues. You can use models to make predictions (as is common in data science, using, for example, methods from machine learning); to explain; to decide what data to collect; to discover new questions; to illuminate the core dynamics of a system; amongst many others. It’s our experience, however, that for human DMs to actually use models, they must be able to interrogate and understand them.

In 1969, John Little published a protest paper discussing these very issues, and proposed a set of guidelines – six rules – that make models more likely to be used. He said models should be simple to understand; robust to absurdities; easy to control; adaptive to new information; complete on important factors; and easy to communicate with (with quick execution times). We think his guidelines apply today as much as they did back in ’69. As Little put it: “If you want a manager to use a model, we should make it his model, an extension of his ability to think about and analyze his operation.”

The trend in data science is towards complex, black-box predictive models – for example, neural networks (e.g. deep learning approaches) and ensemble methods – that have great predictive capabilities but often poor explanatory power. In the context of strategic decision-making, in which DMs will want to understand models, in particular their relation to the system of interest, the usefulness of black-box models may therefore be limited. For an in-depth review of the differences between explanatory and predictive modelling, see Galit Shmueli’s ‘To Explain or to Predict?’ paper.

At Sandtable, we use agent-based modelling and simulation to build explanatory models that provide effective decision-support tools for DMs. We package models as data products delivered as web-based interactive applications that enable DMs to explore models and simulate potential future scenarios.

To view slides from the talk, see here.

Here are some highlights from the rest of the conference.

Wednesday

On Wednesday, we attended talks in the Hardcore Data Science track. Mikio Braun gave an interesting talk about data science in industry. We’ve faced many of the same issues, and in particular it was interesting to hear about the (cultural) differences between data scientists and developers. We also heard from Matthew Smith, who talked about incorporating domain understanding (expert knowledge) into data science, something we certainly care about. Matthew has built many fascinating models over the years, and the thread of his talk was how to effectively combine process models with data.

Thursday

Joseph Hellerstein of UC Berkeley gave a great keynote on “data relativism”: nowadays there may not be a “single source of truth”, as there was in the heyday of data warehouses; rather, the meaning of data depends on the context in which it is used. Stuart Russell, also of UC Berkeley, of ‘AI: A Modern Approach’ fame, gave a fascinating keynote on the future of AI. Stuart noted that AI is progressing fast, and proposed that we need to reorient the field to build AIs that are aligned with human value systems. Fascinating stuff. This was later followed up with an interesting discussion about the risks of AI, alongside Jaan Tallinn (of Skype fame, and a founder of the Centre for the Study of Existential Risk at Cambridge).

Good AI?

Friday

For something quite different, designer Stefanie Posavec, based in London, talked about (and inspired us with) her year-long project, Dear Data, done in collaboration with Giorgia Lupi, based in New York. For a whole year, each week they sent each other a postcard with data that they had manually gathered and drawn. Following on, Tricia Wang gave a fascinating talk on “thick data” – data brought to light using ethnographic research methods – and on how, in the mire of big data, we should not forget about people, and their emotions and stories. Cat Drew from the UK Policy Lab and Government Data Science Partnership gave a great keynote on how the UK government is increasingly bringing a data, digital and design approach to policy making. Particularly interesting was the recently published ethical framework intended to make sure people are using data in acceptable ways.

Thanks to the organisers and sponsors. We really enjoyed talking at and attending Strata. Finally, thanks again to those who attended our talk.
