Data Lineage Analysis: Most Common Use Cases

By Or Hillel
Published in Startups Nation
Aug 25, 2021 · 5 min read


You’re in charge of planning your department’s upcoming team-bonding social event. Which of the following activities do you consider?

  • Escape room
  • Scavenger hunt
  • Cocktail competition
  • Data lineage analysis

We’re 99% positive that you did NOT pick the data lineage project.

Why? Because no matter how much you and your team appreciate the value of data lineage analysis, it will never be a leisure activity you do for FUN.

The data lineage process (i.e. tracking any data point back to its origin or ahead to its destination and seeing what happened to it along the way) is and will always remain a tool: something that you USE for a purpose.

What are the most common purposes for which data lineage analysis is used?

Let’s go through a few.

Pinpointing the source of data-related problems (a.k.a. Root Cause Analysis)

“Why do these reports give contradictory figures?!”

“What sources are involved in landing the data in this report?!”

“Why does the data seem completely off in this table?!”

“Why is this report showing corrupt data?!”

“Why do the business side’s questions always seem to end with a ?! ?!”

To answer these questions (well, maybe not the last one), you and your team need to play detective and track the error to its root using the trusty bloodhound of data lineage. While you could do this manually (and for years manual data lineage was the only option), it typically takes hours, days or longer. Now that automated data lineage lets your team identify in minutes where any given figure came from, an automated data lineage tool is the way to go.

After you have your entire data pathway mapped out, you can use data lineage analysis either to find the error in the pathway or, alternatively, to confirm the figure and provide a reasonable explanation for it.

Data lineage analysis helps you way beyond the specific case you’re investigating. Because you can actually trace the error to its root cause, you have the power to fix and eliminate the cause of the bad data. When you say, “I’m sorry. This won’t happen again,” you’ll be speaking with confidence (instead of with a wish and a prayer).
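
To make the idea concrete, here is a minimal Python sketch of tracing a suspicious report back to its sources. Everything in it is an assumption for illustration: the asset names, the `LINEAGE` map and the helper functions are hypothetical, and a real lineage tool would build this graph automatically from your ETL jobs and queries rather than from a hand-written dictionary.

```python
from collections import deque

# Hypothetical lineage map: each asset -> the assets it reads from directly.
LINEAGE = {
    "revenue_report": ["sales_mart"],
    "sales_mart": ["orders_staging", "fx_rates"],
    "orders_staging": ["crm_orders"],
    "fx_rates": [],
    "crm_orders": [],
}

def trace_upstream(asset, lineage):
    """Return every asset that feeds `asset`, nearest first (breadth-first)."""
    seen = []
    queue = deque(lineage.get(asset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen

def root_sources(asset, lineage):
    """Upstream assets with no inputs of their own: the original sources."""
    return [a for a in trace_upstream(asset, lineage) if not lineage.get(a)]

print(trace_upstream("revenue_report", LINEAGE))
print(root_sources("revenue_report", LINEAGE))
```

Walking the list from nearest to farthest gives your detective a checklist: inspect each hop in order until the figure stops looking right.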

Predicting the future without a crystal ball (a.k.a. Impact Analysis)

There is nothing so dangerous as a change to a report, process or system that is described prior to execution as “oh, it’s only a tiny change; it’s not a big deal.” Inevitably, your team will be burning the midnight oil dealing with the fallout from said “tiny” change.

But what if you could foresee the impact of the change before it’s actually made? What if you could track the potential shock waves both upstream and downstream, and warn the entities that it would impact — in advance?

AH — behold the power of data lineage impact analysis!

Automated data lineage gives you these powers of foresight, enhancing your agility and adaptability, and enabling you to make system or process changes without a long preparatory period before and without unintended fallout after.
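
Impact analysis is the same walk in the opposite direction. The sketch below assumes a hypothetical list of `(source, target)` edges and asks which assets would feel a change to one of them; the names and the `impacted_by` helper are made up for the example and are not any particular tool's API.

```python
from collections import defaultdict

# Hypothetical edges: (source, target) means `target` reads from `source`.
EDGES = [
    ("crm_orders", "orders_staging"),
    ("orders_staging", "sales_mart"),
    ("fx_rates", "sales_mart"),
    ("sales_mart", "revenue_report"),
    ("sales_mart", "forecast_dashboard"),
]

def build_downstream(edges):
    """Index the edges as source -> list of direct consumers."""
    downstream = defaultdict(list)
    for source, target in edges:
        downstream[source].append(target)
    return downstream

def impacted_by(asset, edges):
    """Return the set of assets that depend on `asset`, directly or indirectly."""
    downstream = build_downstream(edges)
    impacted, stack = set(), [asset]
    while stack:
        for child in downstream.get(stack.pop(), []):
            if child not in impacted:
                impacted.add(child)
                stack.append(child)
    return impacted

# Who needs a heads-up before the "tiny" change to fx_rates goes live?
print(sorted(impacted_by("fx_rates", EDGES)))
```

An empty result for an asset means nothing reads from it, which is the one case where "it's only a tiny change" might actually be true.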

Proving your data is what and where you say it is (a.k.a. Regulatory Compliance)

GDPR, HIPAA, FRTB, BCBS, ABCDEF…

Ha! Just checking to see if you were paying attention!

You need to be paying attention to your data, because your clients, industry professionals and compliance auditors are.

If a client contacts your organization and requests the removal of their personal information, you need to be able to dive into your systems, locate all of that PII and wipe it out of existence. If said client (or an auditor) asks for proof that the PII has gone poof, you'd better be able to provide it.

If you're a financial institution that wants to use its own internal financial models under regulatory standards like TRIM and FRTB, you need to be able to prove the veracity of those models and numbers. Even if you simply want to say that an asset is worth $50,000, someone is going to want to see how you got that number, and your reputation relies on you having a data-backed answer.

The reporting capabilities of data lineage analysis are your ticket to strong, reliable, data-backed answers. They are what enable you to look compliance auditors in the eye and say, “You want to know how I got that number? Here’s the exact data that went into it: just take a look!”

Even if they do pass ABCDEF (Act of Better Consumer Data Effective Forthwith), a strong handle on data lineage analysis means you’ll be prepared.

Preparing data for a seamless move (a.k.a. System Migration)

“Hurray — we’re migrating to Snowflake (or Azure Data Factory, or <your dream data system here>)!”

After you’ve raised your toasts and drained your champagne glasses, the grim reality sets in: migrating from a legacy system to a modern, cloud-based one is like packing up your cluttered New Jersey home of 35 years to move to a stunning brand-new condo in Hawaii. It’s ridiculously exciting… and ridiculously overwhelming.

Typical pre-migration prep involves:

  • Identifying redundant, obsolete or unreliable data sources and targets
  • Finding and eliminating processes that simply aren’t needed anymore
  • Assessing the dependencies that need to be created between processes

Doing this manually is a tedious, nit-picking job, requiring endless amounts of combing through logs and job schedules in disparate systems and examining stored procedure code and report definitions. The process usually takes many months. And because the company’s data landscape is naturally always changing, the BI team ends up chasing a moving target.

Automated data lineage quickly creates a visual map of your data systems. Visual data lineage tools clearly show the data, the processes and the relationships between them. Your team can then smoothly put data lineage analysis into play, identifying data sources, processes and targets that are relevant vs. those that are obsolete, questionable or nonexistent. In addition, processes can be identified and analyzed to ensure they are generating the expected output in the right sequence.
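
As a rough illustration of that sorting step, the Python sketch below flags assets that no longer feed any report still in use. The inventory, the edges and the `obsolete_assets` helper are all hypothetical, and the rule it applies ("not upstream of anything in use") is a simplifying assumption; in practice you would also weigh usage logs and ownership before leaving anything behind.

```python
# Hypothetical inventory of everything living in the legacy system.
ALL_ASSETS = {
    "crm_orders", "orders_staging", "sales_mart", "revenue_report",
    "legacy_export", "old_kpi_report", "tmp_backup_2015",
}

# Hypothetical edges: (source, target) means `target` reads from `source`.
EDGES = [
    ("crm_orders", "orders_staging"),
    ("orders_staging", "sales_mart"),
    ("sales_mart", "revenue_report"),
    ("legacy_export", "old_kpi_report"),
]

# Reports the business confirms it still uses (an assumption for the example).
REPORTS_IN_USE = {"revenue_report"}

def obsolete_assets(all_assets, edges, in_use):
    """Assets that do not feed, directly or indirectly, any report in use."""
    upstream = {}
    for source, target in edges:
        upstream.setdefault(target, set()).add(source)
    # Walk backwards from the reports in use, marking everything they need.
    needed, stack = set(in_use), list(in_use)
    while stack:
        for parent in upstream.get(stack.pop(), ()):
            if parent not in needed:
                needed.add(parent)
                stack.append(parent)
    return sorted(all_assets - needed)

# Candidates to leave behind in New Jersey.
print(obsolete_assets(ALL_ASSETS, EDGES, REPORTS_IN_USE))
```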

We’ll understand why you’re breaking out the champagne again.

Put data lineage analysis to good use

Eliminating problems, saving time, maintaining your reputation, keeping the business side off your back… data lineage analysis is the BI superpower.

But all superheroes need a break. So when you plan your department’s team-bonding social event, be sure to leave data lineage out of it.

Or Hillel
Startups Nation

Helps executive teams, marketers and data analysts leverage innovative digital strategies and emerging technologies to outsmart their competitors.