ASI Data Science Fellowship Presentations

Over to Cambridge Heath last Thursday the 27th to see presentations from the most recent batch of ASI fellows. ASI’s founder also used the occasion to launch the company’s new data science platform, SherlockML, which reassuringly incorporates some of the ideas we’ve put into practice in our own platform (data lake, Jupyter integration, API for data apps, etc.). You really need something like this if you want a data science team that works as a team.

The ASI fellows were reporting back on their six-week industry placements. Overall, the quality of the presentations was excellent. There were a few stand-outs.

Anton van Pamel crunching GPS data from Arriva buses to predict (and avoid) bunching. A great demonstration of the importance of developing a set of tools to describe the problem before setting out to find the solution. And a great opportunity to answer the eternal question of why you wait half an hour for a bus and three arrive all at once.

Faye Chung who worked with Aimia to study the paths shoppers take as they walk around a supermarket. Paths were characterised as strings of letters, with each letter representing a zone in the supermarket. Levenshtein string edit distance was then used to cluster similar paths. The idea is to use the emerging clusters (good chef / bad chef / wanderer / etc.) as a basis for targeting customers based on their in-store behaviour.
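
For anyone curious how clustering paths by edit distance might look in practice, here is a minimal sketch. It is not the method from the presentation, which wasn't described in that much detail: the zone letters, the example paths, the `cluster_paths` helper and the distance threshold are all made up for illustration, and the clustering is a deliberately simple greedy scheme rather than whatever algorithm was actually used.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def cluster_paths(paths, max_distance):
    """Hypothetical greedy clustering: put each path in the first cluster
    whose first member is within max_distance, else start a new cluster."""
    clusters = []
    for path in paths:
        for cluster in clusters:
            if levenshtein(path, cluster[0]) <= max_distance:
                cluster.append(path)
                break
        else:
            clusters.append([path])
    return clusters


# Invented paths: each letter stands for a zone in an imaginary store.
paths = ["ABCDE", "ABCE", "ABDDE", "EZYXA", "EZYA"]
clusters = cluster_paths(paths, max_distance=2)
print(clusters)
```

With real data you would more likely compute the full pairwise distance matrix and hand it to a proper clustering algorithm, but the greedy version is enough to show why encoding paths as strings makes them easy to compare.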

Some fellows teased us by telling us about the problems they had solved, but not disclosing the solution. Antoine Clais said he’d found a solution to a survey weighting problem. Setrak Balian gave us a glimpse of some of the work he’s done building a simulation of teacher supply in England.

A couple of fellows had been using graph databases as part of their project work. My favourite presentation of the evening was from Sam Short, a particle physicist with over two dozen papers to her name, who had used Neo4j and NLP to build a rogue landlord detection tool for Lambeth Council, using data scraped from Companies House and the Land Registry. Already in use, the tool saved 20 hours of the 100 hours investigators apparently have to spend building a case.
