We have found that by considering data ingestion as a pipeline, we can gain a number of benefits:
Replicable patterns - treating data processing as a network of pipelines encourages you to see each individual pipeline as an instance of a pattern in a wider architecture, a pattern that can be reused and repurposed for new data flows.
Faster timeline for integrating new data sources - a shared understanding of, and shared tools for, how data should flow through analytics systems makes it easier to plan the ingestion of new data sources, and reduces the time and cost of integrating them.
Confidence in data quality - thinking of your data flows as pipelines that need to be monitored, and that must remain meaningful to end users, improves the quality of the data and reduces the likelihood of breaks in the pipeline going undetected.
Confidence in the security of the pipeline - repeatable patterns and a shared understanding of tools and architectures mean security is built in from the first pipeline. Good security practices can then be readily reused for new data flows or data sources.
Incremental build - thinking about your data flows as pipelines enables you to grow them incrementally. By starting with a small, manageable slice from a data source to a user, you can start early and gain value quickly.
Flexibility and agility - pipelines provide a framework where you can respond flexibly to changes in the sources or your data users’ needs.
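The replicable-pattern and incremental-build ideas above can be sketched in code: each pipeline stage is a plain function, and a pipeline is just an ordered composition of stages, so the same shape can be reused for a new data source. This is a minimal illustrative sketch; the stage names and the source label are assumptions, not part of the original.

```python
# A minimal sketch of a reusable pipeline pattern: a pipeline is an
# ordered composition of stage functions applied to each record.
def make_pipeline(*stages):
    """Compose stages into a single callable pipeline."""
    def run(record):
        for stage in stages:
            record = stage(record)
        return record
    return run

# Example stages for one small slice of a hypothetical data source.
def parse(raw):
    # Turn a raw "name,value" line into a structured record.
    name, value = raw.split(",")
    return {"name": name.strip(), "value": int(value)}

def validate(record):
    # A monitoring hook: fail loudly rather than let bad data through.
    if record["value"] < 0:
        raise ValueError(f"negative value for {record['name']}")
    return record

def enrich(record):
    record["source"] = "example-feed"  # hypothetical source label
    return record

ingest = make_pipeline(parse, validate, enrich)
print(ingest("sensor-a, 42"))
# → {'name': 'sensor-a', 'value': 42, 'source': 'example-feed'}
```

Integrating a new data source then means writing only the stages that differ (typically parsing) and reusing the rest, which is where the faster-timeline and incremental-build benefits show up in practice.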