Eponymous laws of software development in data science

Over the years, a number of eponymous “laws” (or adages) have emerged in software development. In this post, we discuss five well-known ones (Ziv’s, Humphrey’s, Conway’s, Bradley’s, and Brooks’) and explore how they apply to data science (DS) as it continues to evolve, and in particular to Sandtable’s approach as we scale.

To provide some context: at Sandtable, a DS company, we carry out data science projects, working with our clients to address important business problems. We do this through a combination of data analysis, theory and model building, and the exploration of potential (“what-if”) scenarios. The modelling approach we employ is called agent-based modelling. The deliverables of our projects consist of insights gleaned from data analysis and the results of computational simulations, delivered through web-based data applications. Data science projects are undertaken by data scientists working with planners, managers, and application developers, using an in-house cloud-based data science platform. As we continue to grow, we’re looking to scale these activities in both the number of projects and the number of people.

Ziv’s Law

The first is Ziv’s law, or the ‘Uncertainty Principle’, first discussed in a 1997 paper by Ziv, Richardson and Klösch, ‘The Uncertainty Principle in Software Engineering’. The law states that software development is an inherently uncertain process, and hence it can be hard (impossible?) to fully define a specification upfront. Discoveries will be made during the development process, so the plan should be updated to reflect what is learnt. It is because of this uncertainty that traditional project management approaches, like the waterfall model, which require a fully specified sequential plan upfront, do not work well for software development. Although this is perhaps now thought of as obvious, Ziv’s law reminds us that this wasn’t always the case.

Arguably, the uncertainty (and therefore the risk of failure) is higher for DS projects than for software development. There are a number of reasons for this. Firstly, DS projects are often exploratory in nature, to the extent that some are not problem (or hypothesis) driven at all: ‘Here’s some data, what can you do with it?’. Secondly, DS projects often have external dependencies, for example around data, that bring additional uncertainty. There may be unknowns around when data will be received, how it was generated, and even what it is. Thirdly, and finally, there may (initially) be little domain knowledge in the team about a particular business or sector. As with software projects, getting client and user involvement as early and as often as possible can help mitigate these issues.

Humphrey’s Law

The next law we consider is Humphrey’s. This law says that users of a system don’t necessarily know what they want from the system until they see it, or perhaps until they see what they don’t want. A consequence of this is that there is a limit to the value of gathering requirements from users upfront. A better approach might be to get feedback from users as quickly as possible, and iterate. Again, this is possibly taken as obvious in software development today, but the law reminds us that this wasn’t always the case.

However, this is currently a big issue for DS projects: clients and users may not know a lot about DS and, by extension, what it can do for them. This lack of understanding will continue until there is wider adoption of DS. Unfortunately, there is also plenty of hype to fill the knowledge gap. Dealing with these issues requires both educating clients in the possibilities of DS and managing their expectations. Educating clients can be done, for example, through the use of case studies, and by presenting work and getting feedback as often as possible.

Conway’s Law

Let’s turn to Conway’s law, named after Melvin Conway and first described in his 1968 note, ‘How Do Committees Invent?’. As originally stated, the law is: “Any organization that designs a system (defined more broadly here than just information systems) will inevitably produce a design whose structure is a copy of the organization’s communication structure.” Given Conway’s law, if you want to build software with a certain architecture, you should organise your team to reflect that structure. In general, large, monolithic codebases are built by large, tightly coupled teams. If you want to build loosely coupled systems that can adapt and develop more quickly, you should organise teams to be loosely coupled and autonomous. A “team” of one will, in general, also produce tightly coupled software.

As DS evolves and looks to deliver more data products and applications, how software is delivered will become critical to the success of DS projects. This is of course tied to development methodologies and tools but, as Conway’s law reflects, to the structure of the organisation too. Conway’s law forces us to think about how we organise in order to develop the kind of software we want to build. In software development, Agile methodologies such as Scrum are widely used, with small teams iteratively and consistently delivering useful, high-quality software. Given the increasing requirement to deliver software as part of DS projects, it’s worth asking how agile approaches can be adapted and applied, and also thinking about the role that current DS teams (often highly specialised) will play in the delivery of future projects.

Bradley’s Law

While the previous laws speak to the design and building of software, the next law, Bradley’s, which is perhaps less well-known, concerns software once it has been released. It says that in order for software to survive and continue to be useful, it must evolve; that is, it must be adapted and extended, or it will perish. There are a number of forces at play here. The first is that users will learn and adapt to the software, so you should look to understand how your users actually use your software (you might be surprised), and work to improve the user experience. Secondly, if you’re successful, there will likely be competition, so not only must the software evolve, it must do so fast.

For DS projects, there are two aspects to consider with respect to Bradley’s law. Firstly, as with software in general and as discussed above, there is the issue of adapting the software to improve the user experience: remember that users may not know what they actually want (Humphrey’s law), and they will adapt once they are using it. Secondly, and specifically for DS projects that involve building predictive models to drive data products, such as recommendation engines, the success of these projects will be judged by how useful, actionable, and accurate the insights are. On top of this, as the world changes, the performance of models will be affected, and they will therefore require close monitoring and updating. In 2014, Google released a paper discussing technical debt in real-world machine learning systems, which included changes in the external world as one of the key sources of technical debt in such systems. Managing (and servicing) this technical debt will be key to the ongoing successful application of data products and services.
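
To make the idea of “close monitoring” concrete, here is a minimal sketch in Python of one possible check: comparing a model’s accuracy over its most recent predictions against the accuracy measured at release. It is an illustration only, not a description of any particular system; the class name, the window size, and the tolerance are all hypothetical, and a real deployment would likely track more than a single metric.

```python
from collections import deque


class AccuracyMonitor:
    """Track the rolling accuracy of a deployed model against a baseline.

    All names and thresholds are hypothetical, chosen for illustration.
    """

    def __init__(self, baseline: float, window: int = 100, tolerance: float = 0.05):
        self.baseline = baseline    # accuracy measured when the model was released
        self.window = window        # number of recent predictions to keep
        self.tolerance = tolerance  # allowed drop in accuracy before flagging
        self.outcomes = deque(maxlen=window)  # True where a prediction was correct

    def record(self, prediction, actual) -> None:
        """Store whether the latest prediction matched the observed outcome."""
        self.outcomes.append(prediction == actual)

    def rolling_accuracy(self):
        """Accuracy over the stored predictions, or None if there are none."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self) -> bool:
        """Flag the model once a full window falls below baseline - tolerance."""
        if len(self.outcomes) < self.window:
            return False
        return self.rolling_accuracy() < self.baseline - self.tolerance


# A tiny window keeps the example short; real windows would be much larger.
monitor = AccuracyMonitor(baseline=0.90, window=4, tolerance=0.05)
for prediction, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
    monitor.record(prediction, actual)
print(monitor.rolling_accuracy(), monitor.needs_attention())
```

A flag from a check like this is a prompt to investigate (has the world changed, or has the data pipeline?), not an automatic trigger to retrain.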

Brooks’ Law

The last is Brooks’ law, first discussed by Fred Brooks in his classic book, ‘The Mythical Man-Month’, released in 1975. Brooks’ law states that “adding manpower to a late software project makes it later.” Intuitively, one might think that adding people to a project would speed up its delivery. However, since software development is a human-intensive process as well as an engineering one, and given that software development can be considered new product design, this simplistic assumption is often wrong. The reason is that on-boarding new members requires intense communication within the team, which slows the existing members down and so worsens the situation rather than improving it.
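
One way to see why communication costs grow so quickly is to count the possible pairs of people in a team: with n members there are n(n-1)/2 of them. The short Python sketch below simply evaluates that formula for a few team sizes. The function name is ours, and the count is only a rough proxy for communication overhead; it says nothing about ramp-up time or how well the work can be divided.

```python
def communication_channels(team_size: int) -> int:
    """Number of distinct pairs in a team of the given size: n(n-1)/2."""
    if team_size < 0:
        raise ValueError("team_size must be non-negative")
    return team_size * (team_size - 1) // 2


# Compare how the number of pairwise channels grows as people are added.
for size in (3, 5, 8, 12):
    print(f"{size} people -> {communication_channels(size)} possible channels")
```

Going from 5 to 8 people adds three team members but nearly triples the number of possible channels (from 10 to 28).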

Due to the novel nature of many DS projects, and the use of cutting-edge tools and techniques, the effect described by Brooks’ law is arguably worse for DS than for software development. The on-boarding of team members, who may have little applicable experience, can be intensive. To reduce this cost going forward, there is a need to standardise tools and methodologies in order to improve the collaboration, efficiency, quality, and reproducibility of DS projects, much as has happened in software development. Software engineering can offer help here: coding standards, version control, code reviews, and quality control, to name a few.

Finally…

As the field of data science evolves and looks to deliver more value to businesses, it is worth heeding these laws and their implications. In particular, it could be worthwhile to look at how software development has progressed, especially through agile methodologies and tools, to help address some of these issues. We believe that more work should be done to apply software engineering practices in data science.
