Is data lineage important? Many don’t seem to think so. Case in point, TJ’s tweet below.
Fine!! I won’t mention data mesh in this post, TJ 😉. Follow that thread and you’ll see comments saying that data lineage solves provenance and observability, but not much else.
Data Products
Fine, I lied 😬. I’m going to talk about data mesh a little by talking a bit about data products. The easiest way to define data products is by comparing them to the produce you buy at the grocery store. If you care about what you put into your body, then you probably care about how your food is grown and processed. When shopping for produce, people tend to inspect the food they buy at least a little bit. The organic label that comes with your food tells you something about how it was processed, specifically whether it may have been treated with pesticides. If you know what it takes to earn an organic label, then you have some idea of the food’s processing lineage. The label gives you a hint of that lineage. We don’t have this for data. You’ll need data lineage for that.
Data lineage gives consumers of data the assurance that it was properly cleansed, enriched, and secured. We should find a way to provide this information as part of the metadata of the data product. This is especially important for data scientists. They need to explain the insights they provide to the business. If data scientists are unsure of the provenance of a data product and the processes it traveled through, how can they be sure their metrics are safe to use for critical changes in the business? This should resonate especially with those in the healthcare or pharmaceutical industries, where changes could be life-threatening.
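To make the "label on the produce" idea a bit more concrete, here is a minimal sketch of lineage carried as metadata on a data product. Everything in it is an assumption for illustration: the `LineageStep` and `DataProduct` classes, the field names, and the three step kinds (`cleanse`, `enrich`, `secure`) are hypothetical and not taken from any data mesh specification or tool.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class LineageStep:
    """One processing step the data went through (hypothetical schema)."""
    name: str  # e.g. "mask_pii"
    kind: str  # assumed vocabulary: "cleanse", "enrich", or "secure"


@dataclass
class DataProduct:
    """A data product that ships its lineage alongside its other metadata."""
    name: str
    owner: str
    lineage: List[LineageStep] = field(default_factory=list)

    def label(self) -> Set[str]:
        """Summarize the kinds of processing applied, like a label on produce."""
        return {step.kind for step in self.lineage}

    def meets(self, required: Set[str]) -> bool:
        """True if every required kind of processing appears in the lineage."""
        return required <= self.label()


# Hypothetical data product and steps, for illustration only.
orders = DataProduct(
    name="orders_enriched",
    owner="sales-data-team",
    lineage=[
        LineageStep(name="drop_duplicates", kind="cleanse"),
        LineageStep(name="join_customer_region", kind="enrich"),
        LineageStep(name="mask_pii", kind="secure"),
    ],
)

REQUIRED = {"cleanse", "enrich", "secure"}
print(orders.label())
print(orders.meets(REQUIRED))
```

A data scientist could check `meets(...)` before trusting a metric built on the product, much like glancing at the organic label before buying. A real implementation would also need to record who ran each step and when, which this sketch leaves out.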
Streaming Data Lineage
If you’ve tried to build a complete picture of data lineage from source system to sink system, you’ve often had to work with incomplete information. You may have to stitch multiple lineage graphs together (provided by separate tools) to get the full picture from source to sink.
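Here is a rough sketch of what that stitching could look like, assuming each tool can export its lineage as a list of (upstream, downstream) edges. The dataset names and the three edge lists are made up for illustration, and the sketch leans on a big assumption: that every tool uses the same name for the same dataset. In practice, reconciling those names is usually the hard part.

```python
from collections import defaultdict


def stitch(*edge_lists):
    """Merge edge lists from separate tools into one map: node -> direct upstream nodes."""
    upstream = defaultdict(set)
    for edges in edge_lists:
        for src, dst in edges:
            upstream[dst].add(src)
    return upstream


def trace_upstream(upstream, node):
    """Return every dataset that sits upstream of `node` in the stitched graph."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in upstream.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical exports from three separate tools.
connector_edges = [("postgres.orders", "kafka.orders_raw")]
stream_edges = [
    ("kafka.orders_raw", "kafka.orders_clean"),
    ("kafka.customers", "kafka.orders_clean"),
]
warehouse_edges = [("kafka.orders_clean", "warehouse.orders_daily")]

lineage = stitch(connector_edges, stream_edges, warehouse_edges)
ancestors = trace_upstream(lineage, "warehouse.orders_daily")

# Sources are the ancestors that have nothing upstream of them.
sources = {n for n in ancestors if not lineage.get(n)}
print(sorted(ancestors))
print(sorted(sources))
```

No single tool in this example sees the path from `postgres.orders` to `warehouse.orders_daily`. The full picture only appears once the three graphs are merged.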