So You Want to Be a Data Engineer?

QuantumBlack, AI by McKinsey
Apr 15, 2021

Sam Hiscox and Balazs Konig, Data Engineers, QuantumBlack

Machine learning, data science, MLOps, deep learning — these disciplines are enabling organisations of every size, from startup to industry titan, to drive greater value. Scaling the impact of analytics technology will remain a major business priority for years to come and, with such enormous amounts of data involved, much of the heavy lifting around scalability will fall to data engineers.

These practitioners will be responsible for deciding what data is useful and what is superfluous, transforming the relevant digital information into fuel for algorithms and ensuring the process is future-proofed. It is therefore no surprise that data engineering has found itself under the spotlight and is becoming an increasingly popular career path.

We both came to data engineering via mechanical engineering, but we know that a significant number of today’s students, graduates and working professionals are gravitating towards the specialism straight away. This article aims to demystify the role and highlight the key skills that will help aspiring data engineers navigate their initial journey.

Value Mindset Over Background

We both experienced moments of imposter syndrome when transitioning into data engineering during our early careers. We questioned whether our qualifications in mechanical engineering were enough of a foundation or whether we were simply out of our depth. Over time, we have found that mindset is more important than background. If you’re the kind of person who likes taking things apart, working out how they do what they do and whether they can be improved, you will likely suit data engineering.

We first encountered data-driven projects while working at automotive companies. We couldn’t read the coding languages that were being used and we had never ‘deployed to the cloud’ before. But we devoted our time to learning how to solve these interesting problems and to improving our technical competency with sites like HackerRank, Udemy and DataCamp.

We would also reassure those already engrossed in a different specialism that we are not the only data engineers who arrived without a background in Computer Science. QuantumBlack currently employs data engineers who studied and worked in fields such as Biology, Chemical Engineering and Economics. Our Global Head of Data Engineering studied Latin and Ancient Greek. Instead of presenting a barrier to entry, these different perspectives across the team have helped us to solve challenging problems in a variety of different domains, from IoT and banking to elite sports and autonomous vehicle development. Experience and qualifications that you see as irrelevant may actually be an asset.

Appreciate — And Write — Production-Ready Code

Aspiring data engineers often ask what technologies they should learn. While we could reel off a list of ten technologies we are currently fans of, the landscape evolves so rapidly that this article would soon be out of date. Perhaps a better question is, what skills will make a great data engineer?

Today’s data engineers are essentially specialised software engineers. We aspire to the same fundamental tenets of good coding: code is read much more frequently than it is written, so the best engineers spend time making their code clear and well documented. Comprehensive testing means fewer bugs. Implementing DevOps processes helps to find issues earlier. Using version control systems like Git provides traceability and transparency, and enables collaboration.

Coding fundamentals, like writing readable, tested code with clear documentation, are independent of language: the only difference between good Python and good Java is syntax. With that in mind, we suggest spending time learning to write tests, adding clear docstrings to any code you write and investigating software engineering design principles like DRY (Don’t Repeat Yourself).
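As a rough illustration of these habits, here is a minimal Python sketch of a documented helper, a second function that reuses it rather than repeating the logic (DRY), and two small tests. The function names and the column headers are made up for this example and are not taken from any real project.

```python
from __future__ import annotations


def normalise_column_name(name: str) -> str:
    """Return a snake_case version of a raw column header.

    Args:
        name: Column header as it appears in the source file.

    Returns:
        The header stripped of surrounding whitespace, lower-cased,
        with each run of inner whitespace replaced by one underscore.
    """
    return "_".join(name.strip().lower().split())


def normalise_columns(names: list[str]) -> list[str]:
    """Normalise every header by reusing the single helper above (DRY)."""
    return [normalise_column_name(name) for name in names]


def test_normalise_column_name() -> None:
    # Hypothetical headers, chosen only to exercise the edge cases.
    assert normalise_column_name("  Engine Temp ") == "engine_temp"
    assert normalise_column_name("VIN") == "vin"


def test_normalise_columns() -> None:
    raw = ["Engine  Temp", " Speed KPH"]
    assert normalise_columns(raw) == ["engine_temp", "speed_kph"]


if __name__ == "__main__":
    test_normalise_column_name()
    test_normalise_columns()
    print("all tests passed")
```

The tests are written as plain functions whose names start with `test_`, so a test runner such as pytest should be able to collect them, but they also run directly as a script.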

Contributing to open source projects is a fantastic way to practise writing production-ready code and using version control systems, to say nothing of acquiring invaluable experience which can be referenced in job interviews. Websites such as Kaggle provide open source datasets to experiment with. Why not try writing a data pipeline using an open source framework like Kedro?
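For readers who have never written a pipeline, the sketch below shows the basic shape using only the Python standard library: small, pure functions chained from raw input to a final result. It is not Kedro code and does not reflect Kedro's API. The sample data, column names and function names are invented for illustration.

```python
import csv
import io
from statistics import mean

# Invented sample data: one row has a missing lap time.
RAW_CSV = """driver,lap_time_s
alice,92.4
bob,
alice,91.0
bob,95.1
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_missing(rows):
    """Keep only rows that have a non-empty lap time."""
    return [row for row in rows if row["lap_time_s"].strip()]


def mean_lap_time(rows):
    """Return the mean lap time in seconds for each driver."""
    times = {}
    for row in rows:
        times.setdefault(row["driver"], []).append(float(row["lap_time_s"]))
    return {driver: mean(values) for driver, values in times.items()}


def run_pipeline(text):
    """Chain the steps: load, clean, then aggregate."""
    return mean_lap_time(drop_missing(load_rows(text)))


if __name__ == "__main__":
    print(run_pipeline(RAW_CSV))
```

Because each step takes data in and returns data out, every function can be tested on its own. Pipeline frameworks build on the same idea of small, composable steps.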

Recognise That No Data Engineer Is An Island

Since joining QuantumBlack, we have had the opportunity to work on cutting-edge data engineering R&D and across QuantumBlack’s largest recommendation-engine project. This has been exhilarating, rewarding work but has also presented complex problems that can only be solved by working as a team.

At QuantumBlack, a data engineer’s typical working day may involve meeting with a business analyst to discuss the nuances of manufacturing data, or working alongside a data scientist to explore which features are required to test a hypothesis. We might need to talk to the technical delivery lead about DevOps, or liaise with a machine learning engineer about how to make model explainability data available to the partner organisation. Building the right solution goes beyond technical ability: communication skills are essential for translating business problems into data engineering tasks that can then be tackled.

As well as translating business needs into a technical brief, communicating how your solution actually works to a variety of audiences is an essential capability. Presenting highly technical content to a room of non-technical professionals is a crucial part of the job — after all, we want business leaders to understand the solutions we’ve created, the impact they enable and how end users should be operating them.

We hope this article offers useful insight into the requisites for becoming a data engineer. We strongly believe that there’s never been a more exciting time to join the specialism and that data engineering’s best days are still to come. We appreciate that the technical skill sets involved may appear to be a barrier but, as outlined above, a problem-solving mindset and the ability to communicate well are tremendous foundations to build upon.

Data Engineering needs a diverse talent pool, not just those with a background in Computer Science or Data Science. If you’re interested in pursuing a career in data engineering, please do check out our latest available opportunities on the QuantumBlack website.
