Data Lineage is now generally available in Amazon DataZone and the next generation of Amazon SageMaker

AWS announces the general availability of Data Lineage in Amazon DataZone and the next generation of Amazon SageMaker. This capability automatically captures lineage from AWS Glue and Amazon Redshift and lets users visualize lineage events from source to consumption. Because the feature is OpenLineage-compatible, data producers can augment the automatically captured lineage with lineage events from OpenLineage-enabled systems or events submitted through the API, giving data consumers a comprehensive view of how data moves.
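As a rough illustration of augmenting lineage through the API, the sketch below builds a minimal OpenLineage `RunEvent` (the `eventType`, `eventTime`, `run`, `job`, `inputs`, `outputs`, `producer`, and `schemaURL` fields come from the OpenLineage specification) and sends it with the boto3 DataZone client. It assumes the `post_lineage_event` operation with `domainIdentifier` and `event` parameters; the spec version in `SCHEMA_URL`, the helper names, and all namespaces and dataset names are hypothetical placeholders. Check the current DataZone API reference before relying on it.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumption: the OpenLineage spec version your producer targets; adjust as needed.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/definitions/RunEvent"


def build_run_event(job_namespace, job_name, inputs, outputs, producer,
                    event_type="COMPLETE"):
    """Build a minimal OpenLineage RunEvent as a plain dict (hypothetical helper).

    `inputs` and `outputs` are lists of (namespace, name) tuples identifying datasets.
    """
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},  # OpenLineage expects a UUID run ID
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": n} for ns, n in inputs],
        "outputs": [{"namespace": ns, "name": n} for ns, n in outputs],
        "producer": producer,
        "schemaURL": SCHEMA_URL,
    }


def post_to_datazone(domain_id, event, region_name=None):
    """Send one event to a DataZone domain (hypothetical helper).

    Requires boto3 and AWS credentials; boto3 is imported lazily so the
    event builder above works without it.
    """
    import boto3

    client = boto3.client("datazone", region_name=region_name)
    return client.post_lineage_event(
        domainIdentifier=domain_id,
        event=json.dumps(event).encode("utf-8"),  # the event is sent as a JSON blob
    )
```

For example, `build_run_event("etl", "daily_orders", [("s3://my-bucket", "raw/orders")], [("redshift://my-cluster:5439", "sales.orders")], "https://example.com/my-producer")` produces a dict that can be passed to `post_to_datazone("dzd_example", event)`, where `dzd_example` stands in for your real domain ID. Keeping event construction separate from the network call makes the payload easy to inspect and test.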

This feature automates the capture of lineage for the schemas and transformations of data assets and columns from AWS Glue, Amazon Redshift, and Spark executions in supported tools, which helps maintain consistency and reduce errors. With built-in automation, domain administrators and data producers can automatically capture and store lineage events when data is configured for sharing in the business data catalog. Data consumers can gain confidence in an asset's origin from a comprehensive view of its lineage, while data producers can assess the impact of changes to an asset by understanding how it is consumed. Additionally, the data lineage feature versions lineage with each event, enabling users to visualize lineage at any point in time or compare transformations across an asset's or job's history. This historical lineage provides a deeper understanding of how data has evolved, which is essential for troubleshooting, auditing, and validating the integrity of data assets.
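To make point-in-time lineage and impact analysis concrete, here is a toy in-memory model. It is a conceptual sketch only and makes no claim about how DataZone stores or queries lineage internally; the `VersionedLineage` class and its methods are hypothetical. Each lineage event records a new version of a job's inputs and outputs, `job_at` returns the version in effect at a given time, and `downstream` walks the graph as it existed at that time to find every asset affected by a change.

```python
from bisect import bisect_right
from collections import defaultdict, deque


class VersionedLineage:
    """Hypothetical toy model: each job's lineage is versioned by event time."""

    def __init__(self):
        # job name -> list of (timestamp, inputs, outputs), kept sorted by timestamp
        self._versions = defaultdict(list)

    def record(self, job, timestamp, inputs, outputs):
        """Store a new lineage version for `job` instead of overwriting the old one."""
        versions = self._versions[job]
        versions.append((timestamp, tuple(inputs), tuple(outputs)))
        versions.sort(key=lambda v: v[0])

    def job_at(self, job, timestamp):
        """Return (inputs, outputs) of the latest version at or before `timestamp`, or None."""
        versions = self._versions.get(job, [])
        times = [v[0] for v in versions]
        i = bisect_right(times, timestamp)
        if i == 0:
            return None  # the job had no lineage yet at this time
        _, inputs, outputs = versions[i - 1]
        return inputs, outputs

    def downstream(self, asset, timestamp):
        """Assets reachable from `asset` via jobs as they existed at `timestamp`."""
        # Rebuild the asset graph from each job's version at that time.
        edges = defaultdict(set)
        for job in self._versions:
            state = self.job_at(job, timestamp)
            if state is None:
                continue
            inputs, outputs = state
            for src in inputs:
                edges[src].update(outputs)
        # Breadth-first traversal collects every consumer, direct or indirect.
        seen, queue = set(), deque([asset])
        while queue:
            node = queue.popleft()
            for nxt in edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen
```

For example, if `build_mart` writes only `mart.revenue` at time 2 and also writes `mart.daily` from time 5, then `downstream("raw.orders", 3)` reports `stage.orders` and `mart.revenue`, while the same query at time 6 also includes `mart.daily`. Appending versions rather than overwriting them is what makes both historical views and comparisons across a job's history possible.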

The data lineage feature is generally available in all AWS Regions where Amazon DataZone and the next generation of Amazon SageMaker are available.

To learn more, visit the Amazon DataZone and next generation of Amazon SageMaker product pages.

Source: Amazon AWS