10 Books Every Data Practitioner Should Read
Roxana Pamfil, Data Scientist; Jannes Klaas, Data Scientist; Emily Jones, Data Scientist; Roshini Ashokkumar, Data Engineer; and Mareike Herzog, Analytics Engagement Manager
At QuantumBlack, our work as data scientists, data engineers, and analytics managers generally involves code and graphs rather than prose. However, many of us remain avid readers in our spare time. In London, we host a monthly book club, which provides a fantastic opportunity to discuss our latest favourite reads, whether fiction or nonfiction.
While most of these book choices have nothing to do with our day jobs, there are occasionally titles that help us view our work in AI and analytics through a new lens. In this article, we highlight the books that have provided our team with food for thought, from offering predictions on the future of analytics to reminding us of the importance of ethical data practice.
Algorithms to Live By: The Computer Science of Human Decisions
Authors: Brian Christian and Tom Griffiths
What the book is about: A captivating read that connects concepts from computer science to experiences from everyday life. It turns out that a lot of algorithmic thinking goes into deciding whether to try a new restaurant or stick to an old favourite (the explore–exploit trade-off), figuring out how many people to date before settling down (optimal stopping) or planning laundry loads (schedule optimisation).
Why it’s relevant: Even readers familiar with many of the technical concepts presented in this book will gain a new perspective, and perhaps some ideas on how to explain to family and friends what it is that they do for a living.
Factfulness
Authors: Hans Rosling, with Ola Rosling and Anna Rosling Rönnlund
What the book is about: Hans Rosling explains why we so often get the answers to questions about global trends wrong and how, when we take a step back to properly analyse the data, the world is doing much better than we think. He illustrates the contribution of mass media to this distorted perception and lays out how our instincts lead us to exaggerate situations. Rosling outlines how we can combat this type of thinking by embracing facts and shifting our perception.
Why it’s relevant: As data permeates almost all aspects of our day-to-day lives, interpreting this information from an unbiased perspective has never been more important. This book reminds us that taking the time to focus on indisputable facts can help us solve global problems in a more measured and effective manner.
Weapons of Math Destruction
Author: Cathy O’Neil
What the book is about: Algorithms that make predictions about people are everywhere and, without proper scrutiny, have great potential for harm. From resume-screening algorithms that discriminate against women and racial minorities to more subtle harms such as ‘scheduling optimization algorithms’ that punish retail workers, O’Neil takes the reader on a journey through the many cautionary tales of analytics.
Why it’s relevant: This book is a reminder that, as creators of algorithms, we have the responsibility to ensure that they are fair, transparent and used only in the ways they are intended. Getting this wrong can have drastic consequences for people’s lives and even for society itself, which is why the book supports calls for a Data Scientists’ Hippocratic Oath.
The Book of Why
Authors: Judea Pearl and Dana Mackenzie
What the book is about: Pearl received the Turing Award, the highest recognition in computer science, for developing a mathematical approach to causal reasoning. The Book of Why is a friendly introduction to these concepts, including what Pearl calls ‘The Ladder of Causality’. There are many examples along the way, from establishing the link between smoking and lung cancer to determining whether a drug is effective at lowering blood pressure, and even preventable cases of scurvy that could be traced back to physicians not having the right causal model for how citrus fruit prevents the disease.
Why it’s relevant: Causal inference is playing an increasingly significant role in the field of AI. In many domains, knowing why a model makes a certain prediction is just as important as the prediction itself. Pearl’s work proved particularly popular at QuantumBlack during the development of CausalNex, an open-source Python library for causal modelling and ‘what if’ analysis.
Invisible Women
Author: Caroline Criado Perez
What the book is about: An enlightening book that offers insight into some of the root causes of gender discrimination. Criado Perez investigates the link between this discrimination and a gender data gap, highlighting the impact of datasets that either omit women or fail to report numbers in a sex-disaggregated manner. The result is a world that contains a host of biases against women across issues like health, public transport and employment.
Why it’s relevant: Data is being used at an increasing rate to inform decisions, be it in clinical trials, public policies or recruitment processes. It is therefore critical that the data employed to make these decisions is unbiased and representative of the whole population. This book reiterates the importance of vigilance when using data. Moreover, it helps us to identify gaps in our data collection and aggregation processes to avoid potential discrimination in our own projects.
Inspired: How to Create Tech Products Customers Love
Author: Marty Cagan
What the book is about: Marty Cagan is the founder of the Silicon Valley Product Group and one of the gurus of modern product management. In this book, he lays out his vision of product management and a product-driven organisation in a comprehensive and accessible manner. He explores the role of the product manager and shows how an organisation can be set up to create great products.
Why it’s relevant: Putting a product lens on data work can help focus our efforts and eventually achieve better outcomes. Data science and data engineering do not exist in a vacuum and usually serve an end-user. Even if that end-user is internal, such as a corporate decision-maker, the result of our work is still a product that should solve user problems.
Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World
Author: Bruce Schneier
What the book is about: Security expert Bruce Schneier offers his take on some of the most prevalent problems with data security, privacy and surveillance. He outlines a range of historic examples of data exploitation on the part of individuals, corporations and governments, as well as how we can defend ourselves against this sort of exploitation in the future.
Why it’s relevant: The security methodologies and guidance covered in this book can help us design more resilient systems and stay watchful, applying best practices to data collection, storage and use when architecting technical solutions. It also provides a good introduction to some basic but important principles: security, privacy, accountability, transparency and resilience.
The Pragmatic Programmer
Authors: Andy Hunt and Dave Thomas
What the book is about: Originally a textbook on computer programming and software engineering, The Pragmatic Programmer contains a collection of tips and insights into how to write good software. It is the source of a number of popular stories and tricks in software engineering, such as the story of the boiling frog, stone soup and rubber duck debugging. As the title suggests, the book is pragmatic: it acknowledges that there is no one ‘correct’ method and instead emphasises the need for programmers to weigh the pros and cons of various valid methods.
Why it’s relevant: With pipelines becoming bigger and more complex, writing good code has become a crucial prerequisite for today’s data practitioners. This book offers useful insight and takeaways to inform a variety of data specialisms.
An Introduction to Statistical Learning
Authors: Gareth James, Daniela Witten, Trevor Hastie and Rob Tibshirani
What the book is about: The book, commonly abbreviated to ISLR, introduces the main statistical learning techniques, their theory and practical applications, as well as some general learning theory. It is an accessible textbook, relatively short, yet comprehensive. It covers most major algorithms and techniques used in data science and machine learning.
Why it’s relevant: If there is one comprehensive, technical yet accessible guide to data science, this is it. For everyone looking to break into data science, close study of this book provides vital preparation, while also offering a good refresher for experienced practitioners.
Deep Medicine
Author: Eric J. Topol
What the book is about: Renowned M.D. and science communicator Topol explores AI and its potential to revolutionise medicine. The book is accessible to all audiences and gives a broad overview of the promise and pitfalls of applying AI to healthcare. It strikes a fascinating balance, using specific examples of patient cases, research findings and AI applications to illustrate larger points.
Why it’s relevant: The book holds that healthcare, while somewhat late to the party, will inevitably adopt AI. While there has been impressive progress and the potential for improvements to our increasingly ‘uncaring’ healthcare is dramatic, there has also been a considerable amount of hype, and the risk of harm, unintended or not, cannot be ignored. Physicians and patients alike will require more AI literacy to make sure that the next ‘revolution in healthcare’ will improve our lives.
Sometimes hearing a new explanation of a familiar concept is all it takes to spark new ideas and ways of working. For that reason, we’re always looking out for new books to further our understanding of the concepts we work with every day at QuantumBlack. The list above only scratches the surface of suitable reading material, so do let us know your favourite data-related reads in the comment section.