An Intern Writes…

Week of 25/07/16

I am three weeks into my summer internship at Sandtable. In October I'll head back to Imperial College to start the third year of my theoretical physics degree. Here's a little insight into how things are going!

After a weekend's breather to recover from the mind-boggling insanity of Friday's board games and pizza night, I spent this week acquiring and manipulating data on the US. Well. That, and trying to keep as many limbs as possible on the cycle in from Putney.

Propelled by the success of its forecasting model with a UK retail client, Sandtable is now looking into options for expanding into the US market. Although to a mere human like me the data scientists' wizardry seems unbounded, I was assured that they would still need reliable data sets to do their work.

With a budget of well over a billion dollars, the US Census Bureau was the first port of call. However, university sources turned out to be far better suited to our specialist requirements. When the data failed to open on my computer, Nigel gleefully announced that 'big data' status had been achieved: a momentous occasion in my life.

I spent the first half of the week using Python to extract and sort the necessary data from shapefiles and CSVs, find the centroids of the US census blocks, and match up data at different geographic levels. For privacy reasons, income information is only available at the block-group level. Data frames from pandas (a word I must have mispronounced more than 20 times before James ran out of laughs and kindly corrected me) were particularly useful for this work.
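For anyone curious what "finding centroids and matching levels" looks like in practice, here is a minimal sketch, not the code I actually wrote. It assumes each block's boundary is already available as a list of (x, y) vertices (in real work you would read the shapefile with a library such as geopandas, whose `GeoSeries.centroid` does this for you). It relies on the fact that a 15-digit census block GEOID begins with the 12-digit GEOID of the block group containing it. The column names (`block_geoid`, `bg_geoid`, `median_income`) and all the numbers are made up purely for illustration.

```python
import pandas as pd


def polygon_centroid(coords):
    """Area-weighted centroid of a simple polygon via the shoelace formula.

    coords: list of (x, y) vertices, without repeating the first vertex.
    """
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(coords)
    for i in range(n):
        x0, y0 = coords[i]
        x1, y1 = coords[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3 * area2), cy / (3 * area2)


# Hypothetical toy data: three square census blocks with 15-digit GEOIDs
# (state 2 + county 3 + tract 6 + block 4 digits).
blocks = pd.DataFrame({
    "block_geoid": ["060750101001000", "060750101001001", "060750101002000"],
    "polygon": [
        [(0, 0), (2, 0), (2, 2), (0, 2)],
        [(2, 0), (4, 0), (4, 2), (2, 2)],
        [(0, 2), (2, 2), (2, 4), (0, 4)],
    ],
})

# Hypothetical block-group income table (12-digit GEOIDs, illustrative values).
block_groups = pd.DataFrame({
    "bg_geoid": ["060750101001", "060750101002"],
    "median_income": [50000, 72000],
})

# Compute each block's centroid.
centroids = [polygon_centroid(p) for p in blocks["polygon"]]
blocks["cx"] = [c[0] for c in centroids]
blocks["cy"] = [c[1] for c in centroids]

# The first 12 digits of a block GEOID identify its block group.
blocks["bg_geoid"] = blocks["block_geoid"].str[:12]

# Attach block-group income to every block (left join keeps all blocks).
merged = blocks.merge(block_groups, on="bg_geoid", how="left")
print(merged[["block_geoid", "cx", "cy", "median_income"]])
```

Every block inherits the income figure of its block group, which is as fine-grained as the income data gets.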

That took care of the census information. Next on the agenda was the store data. Again in Python, I used an API, with access codes from Factual Developer, to pull information such as store names, addresses and coordinates.

On top of all this, I got the opportunity to present the previous week's work to Dunnhumby's data innovation manager. It was an investigation into discrepancies between Sandtable's model and actual sales at the individual-store level.

Thanks to enthusiastic help from all the scientists and my fellow intern Faure, my Python skills have come on leaps and bounds. I can't wait for next week!