Overview

It is common to hear that ‘data is the new oil,’ and whether you agree or not, there is certainly a lot of untapped value in much of the data that organisations hold. Data is like oil in another way—it flows through pipelines. Many organisations want to put their data to better use, taking it from business processes or IT systems, analysing it and identifying insights that tell them new things about their customers and their operations.

To find these insights, the data has to be regularly, or even continuously, transported from the place where it is generated to a place where it can be analysed.

We like to think of this transportation as a pipeline because:

  1. Data goes in at one end and comes out at another location (or several others).

  2. The volume and speed of the data are limited by the type of pipe you are using.

  3. Pipes can leak: you can lose data if you don’t take care of them.
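
The three properties above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended implementation: the `source`, `pipe` and `run_pipeline` functions, the record shape, and the `warehouse` and `dashboard` destinations are all hypothetical names invented for this example. The batch size stands in for the capacity of the pipe, and comparing the number of records sent with the number received is one simple way to notice a leak.

```python
from typing import Dict, Iterable, Iterator, List


def source() -> Iterator[dict]:
    """Hypothetical source system: yields ten records as they are generated."""
    for i in range(10):
        yield {"id": i, "value": i * 2}


def pipe(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Move records in batches; batch_size stands in for the pipe's capacity."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Flush the final partial batch; forgetting this is a classic leak.
        yield batch


def run_pipeline(batch_size: int = 4) -> dict:
    """Send every record to two destinations and reconcile the counts."""
    sent = 0

    def counted(records: Iterable[dict]) -> Iterator[dict]:
        # Count records as they enter the pipe.
        nonlocal sent
        for record in records:
            sent += 1
            yield record

    # Hypothetical destinations: data comes out at several locations.
    sinks: Dict[str, List[dict]] = {"warehouse": [], "dashboard": []}
    for batch in pipe(counted(source()), batch_size):
        for rows in sinks.values():
            rows.extend(batch)

    received = {name: len(rows) for name, rows in sinks.items()}
    leaked = any(count != sent for count in received.values())
    return {"sent": sent, "received": received, "leaked": leaked}


if __name__ == "__main__":
    print(run_pipeline(batch_size=4))
```

A real pipeline would replace the in-memory lists with actual systems, but the same reconciliation idea (count what goes in, count what comes out) applies.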

The data engineers who create these pipelines are the plumbers of the data world, and they provide a critical service to any organisation that wants to take data analysis seriously. They create the architectures that allow data to flow to the data scientists and business intelligence teams, who generate the insight that leads to business value.

This playbook is for anyone who is involved in designing, implementing or maintaining data pipelines.

We hope this will help you create better pipelines and address the common challenges that can occur, such as:

  • We are just getting started on our data journey. Are we doing the right thing?

  • We have lots of data, but gathering the insights and making decisions takes too long to be useful.

  • We have lots of really useful data, but it is locked away. Only a small number of people can access it.

  • It is difficult for my end-users to access the data they need. They end up creating their own data silos or unreliable temporary fixes.

  • I don’t trust my pipelines to deliver the data reliably.

  • It is difficult to add new data sources to my system. How can I be agile if it takes six months to add a new data source?
