Data Engineering’s Role Is Scaling Beyond Scope — And That Should Be Celebrated
Saravanakumar Subramaniam, Principal Data Engineer, Matthieu Vautrot, Principal Data Engineer, Evangelos Theodoridis, Principal Data Engineer, QuantumBlack. With contributions from Toby Sykes, Global Head Of Data Engineering, QuantumBlack
Last week marked the publication of the DataIQ 100, the annual list recognising the UK’s leading individuals working across data and analytics — and our own Toby Sykes, QuantumBlack’s Global Head of Data Engineering, was among them.
Toby has spent the last five years shaping and growing QuantumBlack’s worldwide data engineering operations. He’s earned a reputation for applying innovative data engineering to drive commercial success — and he has consistently championed the benefits of using analytics to adapt an organisation’s culture and processes to ultimately drive value. DataIQ recognised Toby’s leadership and commitment in demonstrating the importance of data and analytics, alongside his extensive work in representing QuantumBlack and McKinsey’s global data engineering guild.
It is a well-deserved badge of honour for Toby. It is also an encouraging sign of data engineering’s growing importance. Previous DataIQ lists have focused primarily on data science or on broader, strategic roles such as Head of Data or Chief Data Officer, and last year’s list featured nobody with the words ‘data engineering’ in their title. It may seem a relatively minor development, but the recognition of data engineering shows that the industry as a whole is maturing and realising that successful advanced analytics at scale is much more than just modelling. In fact, 80% of analytics is effectively data engineering. But what does this mean for practitioners and for those organisations aiming to harness analytics?
The 2021 Data Engineer
There are few question marks around advanced analytics these days: people know that it can be applied to drive business value. The open questions now concern fragmented, legacy data landscapes, and therefore how quickly, efficiently, and effectively solutions can be scaled to drive the most value across an organisation. This is where data engineering plays a crucial role.
“There’s certainly more discipline in how analytics solutions are being built, as many are being created with an expectation that the data pipelines eventually scale beyond the initial project scope,” explains Toby. “This means software engineering expertise has become vital, not just in ensuring applications are developed in a robust, scalable way but also in determining speed to development, deployment and faster reuse for future models/data consumption. Traditional software engineering and DevOps have been welcomed into the data science world, enabling a faster path from experimentation to production, more effective reusability of both code and data assets, and more robust, resilient solutions (e.g. DataOps).”
Today’s data engineers are responsible for unlocking data science and analytics in an organisation, as well as building well-curated, accessible data foundations. Responsibilities have increased and expectations are higher than they were even five years ago.
“At QuantumBlack we place an enormous emphasis on the need for our data engineers to mix and learn from colleagues across data science, technical architecture and DevOps,” says Toby. “This is partly to support communication across our multidisciplinary team, but it’s also because today’s data engineers will need to play some of these roles in their specific function. The responsibility horizon for data engineers has expanded massively in recent years and now includes software development lifecycle best practice, data architecture, as well as informing an organisation’s data ethics, governance, and information security.”
Building An Environment For Data Engineering Growth
With high-functioning analytics teams increasingly relying on data engineers, organisations have had to adapt. The race for engineering talent has certainly grown hotter in recent years as companies recognise the business value in iterating faster with data and scaling value delivery in order to continuously learn and drive better organisational performance.
“We’ve significantly expanded our team in a relatively short space of time,” explains Toby. “When I arrived five years ago I was one of three QuantumBlack data engineers, but today we have more than 100 globally. It’s now common to see data engineers making up a far higher proportion of a project team. For consultancy projects you may find the data scientist and data engineer ratio becomes 1:1, simply because they begin in a challenging and technical data environment which must be prepared and ‘cleaned’ before data science work can begin.”
Many organisations emphasise the need for speed when it comes to cleaning and transforming this data. There is significant demand for data engineers to drive down this preparation time and make data available and reliable at the point of need. However, across the analytics industry data architecture and governance are becoming progressively more complex, and many barriers to fast data access remain. This can make for a frustrating experience for fresh data engineering talent, usually trained on the latest Cloud technologies, who find themselves hindered by traditional governance processes. Organisations must enable and encourage these engineers’ test-and-learn mindset by providing faster (while still safe and compliant) access to data, and by leveraging the full power of the Cloud to provision tools and services quickly so that data assets can be developed more easily. Unlocking this, coupled with an approach to code and data reuse, will make for a happy, excited team that can generate real value.
The vast majority of companies have adopted or are adopting Cloud technologies to gain more flexibility and agility in developing data solutions. There has also been a rise in open source tools and frameworks to take advantage of, with the likes of Spark and TensorFlow becoming widespread, and many organisations seeking to minimise vendor or product lock-in. So in addition to refining governance around the availability of and access to tooling, companies have to be careful about which tools they choose:
“Over the last few years there’s been an explosion in the newer frameworks which provide a greater choice in languages to deploy,” says Toby. “However, organisations need to be measured in how much flexibility they allow. Writing data and model pipelines in a variety of languages may not have much impact on whether a solution achieves its initial scope, but it makes scaling later a challenging prospect. Think about the languages that have the highest level of adoption and contribution and ideally pick one for your end-to-end AA solutions.”
With scalability front of mind today, organisations are deciding to shift to a single tech stack. “At QuantumBlack we decided to align on a single end-to-end framework across data engineering and data science (Kedro). This single way to develop pipelines made the entire process much easier and it’s a decision that an increasing number of firms are facing today.”
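To make the idea of a single way to develop pipelines more concrete, here is a minimal sketch in plain Python. It is not Kedro's API and not QuantumBlack's implementation: the `Node` class, the `run_pipeline` function, the step functions and the dataset names are all hypothetical, chosen only to illustrate how data engineering and data science steps can share one language and one pipeline structure.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Node:
    """One pipeline step: a plain function plus the dataset names it reads and writes."""
    func: Callable[..., Any]
    inputs: List[str]
    output: str


def run_pipeline(nodes: List[Node], catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Run every node once its inputs exist, regardless of declaration order."""
    data = dict(catalog)  # copy so the caller's catalog is not mutated
    pending = list(nodes)
    while pending:
        ready = [n for n in pending if all(name in data for name in n.inputs)]
        if not ready:
            missing = sorted(
                {name for n in pending for name in n.inputs if name not in data}
            )
            raise ValueError(f"Unresolvable inputs: {missing}")
        for n in ready:
            data[n.output] = n.func(*(data[name] for name in n.inputs))
            pending.remove(n)
    return data


def clean_readings(raw):
    """Hypothetical data engineering step: drop blanks and cast to float."""
    return [float(r) for r in raw if r not in ("", None)]


def average(cleaned):
    """Hypothetical data science step: a stand-in for a real model."""
    return sum(cleaned) / len(cleaned)


# Nodes are deliberately declared out of order; dependencies come from the names.
PIPELINE = [
    Node(average, ["clean_readings"], "mean_reading"),
    Node(clean_readings, ["raw_readings"], "clean_readings"),
]

result = run_pipeline(PIPELINE, {"raw_readings": ["1.5", "", "2.5", None]})
print(result["mean_reading"])  # prints 2.0
```

Because every step is an ordinary function with named inputs and outputs, the cleaning code and the modelling code can be tested, reused and rearranged in the same way, which is the kind of consistency a shared framework aims to provide.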
As companies continue to transform to be more data-driven, data engineering’s contribution will only grow, and it’s likely that initiatives such as the DataIQ 100 will continue to recognise and celebrate engineers in the years ahead.
“Systems will only become more sophisticated, and this will inevitably require even more complex and powerful infrastructure,” says Toby. “It’s an incredibly exciting time for new data engineers in particular, as they’re joining an industry that has never valued them more highly at a time when that industry is also being relied upon to solve some of the world’s biggest challenges.”
If Toby’s story has inspired you and you are interested in joining his team, we are currently hiring Data Engineers in many locations around the world. Details of open roles and how to apply can be found on our website.