DataHub Releases

Summary

| Version | Release Date | Links |
| --- | --- | --- |
| v0.14.1 | 2024-09-17 | Release Notes, View on GitHub |
| v0.14.0.2 | 2024-08-21 | Release Notes, View on GitHub |
| v0.14.0 | 2024-08-13 | Release Notes, View on GitHub |
| v0.13.3 | 2024-05-23 | View on GitHub |
| v0.13.2 | 2024-04-16 | View on GitHub |
| v0.13.1 | 2024-04-02 | View on GitHub |
| v0.13.0 | 2024-02-29 | View on GitHub |
| v0.12.1 | 2023-12-08 | View on GitHub |
| v0.12.0 | 2023-10-25 | View on GitHub |
| v0.11.0 | 2023-09-08 | View on GitHub |
| v0.10.5 | 2023-08-02 | View on GitHub |
| v0.10.4 | 2023-06-09 | View on GitHub |
| v0.10.3 | 2023-05-25 | View on GitHub |
| v0.10.2 | 2023-04-13 | View on GitHub |
| v0.10.1 | 2023-03-23 | View on GitHub |
| v0.10.0 | 2023-02-07 | View on GitHub |
| v0.9.6.1 | 2023-01-31 | View on GitHub |
| v0.9.6 | 2023-01-13 | View on GitHub |
| v0.9.5 | 2022-12-23 | View on GitHub |
| v0.9.4 | 2022-12-20 | View on GitHub |
| v0.9.3 | 2022-11-30 | View on GitHub |
| v0.9.2 | 2022-11-04 | View on GitHub |
| v0.9.1 | 2022-10-31 | View on GitHub |
| v0.9.0 | 2022-10-11 | View on GitHub |
| v0.8.45 | 2022-09-23 | View on GitHub |
| v0.8.44 | 2022-09-01 | View on GitHub |
| v0.8.43 | 2022-08-09 | View on GitHub |
| v0.8.42 | 2022-08-03 | View on GitHub |
| v0.8.41 | 2022-07-15 | View on GitHub |

v0.14.1

Released on 2024-09-17 by @david-leifker.

DataHub v0.14.1 Release Notes

User Experience

  • Enhanced Data Propagation UI: New features allow viewing propagated column documentation, source information, and asset-level propagation details. This improves visibility into data lineage and enables better understanding of data flow across the organization. (#11047)

  • Improved Search Result Tracking: Added page number to search result click events, enabling better measurement of search ranking performance. This helps users understand and optimize their search experience. (#11151)

  • Fixed Display Issues: Resolved issues with displaying "0" values for last ingested data and improved handling of multilingual characters in descriptions. These fixes ensure more accurate and readable information presentation. (#10840, #10975)

Developer Experience

  • Performance Improvements:

    • Implemented lazy dataLoaders for GraphQL queries, significantly reducing latency for local environments. (#11293)
    • Added option to log slow GraphQL queries, helping identify and address performance bottlenecks. (#11308)
    • Introduced session authorization caching for faster access checks. (#11327)
  • Enhanced Search Capabilities:

    • Added support for custom highlighting fields in GraphQL queries, allowing faster and more customizable data retrieval. (#11339)
    • Implemented new search query functionality to filter by parents/children of Domains or Containers. (#11279)
    • Added support for multiple values in 'CONTAIN', 'START_WITH', and 'END_WITH' operators, enabling more flexible and precise searches. (#11068)
  • API Improvements:

    • Extended throttling to API requests, supporting non-browser ingestion/write requests and manual throttling for better control over system load. (#11325)
    • Added support for 'START_WITH' and 'END_WITH' operators in GraphQL API, enhancing string query capabilities. (#11026)
  • Bug Fixes:

    • Resolved issues with forward slash handling in search queries, empty key-value pairs in Elasticsearch mapping, and support for various data types in object fields. These fixes improve search accuracy and data representation. (#10932, #11004, #11066)
    • Addressed Postgres regression by upgrading the ebean library from version 12.x to 15.x, resolving a read lock NPE issue. (#11379)
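
As a rough illustration of the multi-value support for the 'CONTAIN', 'START_WITH', and 'END_WITH' operators noted above, the sketch below models a condition that matches when any of the supplied values satisfies the operator. The `matches` helper and the any-of semantics are assumptions made for illustration; they are not taken from DataHub's implementation.

```python
from typing import Callable, Dict, List

# Hypothetical model of the string operators named in the release notes.
_OPERATORS: Dict[str, Callable[[str, str], bool]] = {
    "CONTAIN": lambda field, value: value in field,
    "START_WITH": lambda field, value: field.startswith(value),
    "END_WITH": lambda field, value: field.endswith(value),
}


def matches(field: str, condition: str, values: List[str]) -> bool:
    """Return True if at least one value satisfies the condition (assumed OR semantics)."""
    check = _OPERATORS[condition]
    return any(check(field, value) for value in values)


# A single condition can now carry several candidate values.
is_prod_or_dev = matches("prod_sales_daily", "START_WITH", ["dev_", "prod_"])
```

Under this model, one filter with two prefixes replaces two separate single-value filters combined with OR.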

Metadata Ingestion

  • S3 Integration Enhancements:

    • Enhanced partition support for S3 dataset ingestion, improving metadata representation and enabling advanced partition detection. (#11083)
    • Enhanced S3 ingestion process to support reading specific file types, allowing more granular control over data ingestion. (#11177)
  • BigQuery Improvements:

    • Implemented query log extractor for BigQuery, creating "Query" entities with usage statistics, lineage, and operation details. (#10994)
    • Added support for filtering GCP project ingestion based on project labels, enabling more targeted data collection. (#11169)
    • Implemented query job retries for transient errors, improving system robustness. (#11162)
  • Snowflake Updates:

    • Added support for Iceberg tables in Snowflake access history, enhancing lineage capture capabilities. (#10961)
    • Introduced ability to define clustering key formulas for Snowflake datasets. (#11254)
    • Fixed tag exclusion issues in Snowflake ingestion process. (#11250)
  • Other Ingestion Improvements:

    • Added support for MongoDB database ingestion as containers. (#11178)
    • Implemented automatic capturing of Snowflake assets with Pandas I/O Manager in Dagster module. (#11189)
    • Enhanced Fivetran ingestion with destination ID filtering capabilities. (#11277)
    • Added support for browse-only tables in Databricks ingestion. (#10766)

Other Improvements and Fixes

  • Upgraded various dependencies including Kafka, Azure Identity, Acryl-SQLglot, and GraphQL/Spring versions.
  • Improved error handling and logging across multiple components.
  • Enhanced test coverage and reliability.
  • Updated documentation for various features and processes.

Breaking Changes

Notable breaking changes include:

  • Removal of lower method from get_db_name in SQLAlchemySource, affecting URNs of related entities.
  • Changes to default sink mode and aspect handling that require server version 0.14.0+.

See the full details here.
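
To see why dropping the lowercasing step can change URNs, consider the minimal sketch below. It uses a local helper, `build_dataset_urn`, that follows the standard dataset URN layout; the platform, database, and table names are hypothetical, and whether the database name appears in the dataset name depends on the source configuration.

```python
def build_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Local illustrative helper that formats a dataset URN string."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


db_name = "Analytics"    # hypothetical mixed-case database name
table = "public.orders"  # hypothetical schema and table

# Assumed earlier behavior: the database name was lowercased first.
old_urn = build_dataset_urn("postgres", f"{db_name.lower()}.{table}")

# Assumed new behavior: the database name keeps its original casing.
new_urn = build_dataset_urn("postgres", f"{db_name}.{table}")

# The two strings differ, so the catalog treats them as different entities.
urn_changed = old_urn != new_urn
```

If a database name contains uppercase characters, entities ingested before and after the upgrade may therefore end up under different URNs.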

Contributors

We extend our heartfelt thanks to all contributors for their valuable work on this release:

First-Time Contributors

@AaronYang0628, @alexandrebunn, @alisa-aylward-toast, @arpanchakra29, @esselius, @eunseokyang, @ignitz, @milindgupta, @milindgupta9, @Nbagga14, @rohansun, @sakethvarma397, @vignesh-hbk

Repeat Contributors

@deepgarg-visa, @dushayntAW, @feldjay, @filipe-caetano-ovo, @ksrinath, @Masterchen09, @matthew-coudert-cko, @mayurinehate, @nmbryant, @pinakipb2, @prashanthic23, @sagar-salvi-apptware, @siladitya2, @sleeperdeep

DataHub Maintainers

@anshbansal, @asikowitz, @chriscollins3456, @darnaut, @david-leifker, @eboneil, @hsheth2, @jjoyce0510, @maggiehays, @pedro93, @RyanHolstien, @shirshanka, @sid-acryl, @skrydal, @treff7es, @yoonhyejin

Your contributions are invaluable in making DataHub better for everyone. Thank you!

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.14.0.2...v0.14.1

v0.14.0.2

Released on 2024-08-21 by @RyanHolstien.

DataHub v0.14.0.2 Release Notes

User Experience

  • Renamed: Validation --> Quality: The Validation tab has been renamed to Quality to make it more intuitive to end-users that it contains outcomes from data quality checks. [#10935]

  • Data Contract UI: A new Data Contract UI is now available under the Quality Tab, allowing users to handle various data assertion types and add/remove contracts more easily. [#10625]

  • Updates to Customized Search Ranking: By default, explore (*) query results are ranked based on enrichment (tags, terms, owners, description, domains, row/column counts) as well as incident status. [#10774]

  • Custom Dataset Names: Business users can now maintain an editable dataset name separate from default properties, providing more control over dataset identification. [#10608]

  • Documentation Propagation Setting Page: A new settings page has been added to the UI for managing Documentation Propagation, giving users more control over how documentation is shared across the platform. [#11038]

Developer Experience

  • NEW: DataHub Open Assertions Specification:

    • Announcing a universal assertions specification for declaring Data Quality checks and compiling them into artifacts for use by 3rd party Data Quality tools like Great Expectations, dbt tests, and Snowflake via Data Quality DMFs. [#10609]
    • Added ability to define data quality rules using a YAML specification file, enabling users to set assertions like volume metrics and conditions, with the ability to compile and schedule them to run on Snowflake as the assertion backend. [#10602]
  • API and SDK Enhancements:

    • New GraphQL APIs added for managing forms, structured properties, and data contracts. [#10826, #10825, #10632]
    • Updates to Java and Python SDKs to support creating and updating structured properties on assets. [#10823, #10824]
    • Support for conditional write semantics including If-Modified-Since, If-Unmodified-Since, and If-Version-Match in MetadataChangeProposals (MCP) and OpenAPI. [#10868]
  • CLI Improvements:

    • A new check server-config command has been added to test server credentials and retrieve diagnostic information. [#10990]
    • The get command now includes a --details/--no-details flag for more detailed output, facilitating easier issue debugging. [#10815]
    • Update to CLI to optionally display server configuration settings. [#10676]
    • Added the ability to assign actors (users or groups) to forms in the forms YAML API. [#10683]
  • Improved Logging and Monitoring:

    • Unified request logging implemented across GraphQL, OpenAPI, and Restli requests, including additional information like actor, IP address, and API type. [#10802]
  • Performance Optimizations:

    • Implemented throttling for the mce-consumer based on mae-consumer lag. [#10626]
    • Added an ASYNC_BATCH mode to the rest sink for improved performance. [#10733]
    • Improved the performance of read queries in Neo4j by specifying labels, and combined the multiple Neo4j statements within the addEdge function into a single statement. [#10593, #10598]
  • Security Enhancements:

    • Updated encryption and decryption methods with a stronger cryptographic algorithm. [#11059]
    • Optimized regular expressions to prevent potential ReDoS vulnerabilities. [#10315]
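
The conditional write semantics listed above (If-Modified-Since, If-Unmodified-Since, and If-Version-Match) can be pictured as a precondition check that runs before a write is accepted. The sketch below is modeled loosely on HTTP conditional requests; the `write_allowed` function, its arguments, and the exact comparison rules are assumptions for illustration and may differ from how DataHub evaluates these conditions.

```python
from datetime import datetime, timezone
from typing import Optional


def write_allowed(
    current_version: str,
    last_modified: datetime,
    if_version_match: Optional[str] = None,
    if_modified_since: Optional[datetime] = None,
    if_unmodified_since: Optional[datetime] = None,
) -> bool:
    """Return True only if every supplied precondition holds (assumed semantics)."""
    # The stored version must equal the version the writer last saw.
    if if_version_match is not None and current_version != if_version_match:
        return False
    # The aspect must have changed after the given time.
    if if_modified_since is not None and not last_modified > if_modified_since:
        return False
    # The aspect must not have changed after the given time.
    if if_unmodified_since is not None and last_modified > if_unmodified_since:
        return False
    return True


# Hypothetical timestamps: the writer read at `seen_at`, another writer changed the aspect later.
seen_at = datetime(2024, 8, 1, tzinfo=timezone.utc)
changed_at = datetime(2024, 8, 2, tzinfo=timezone.utc)
stale_write_ok = write_allowed("3", changed_at, if_unmodified_since=seen_at)
```

In this model a writer that read the aspect before a concurrent change is rejected, which is the lost-update case conditional writes are typically meant to prevent.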

Metadata Ingestion

  • New Ingestion Sources:

    • Azure Blob Storage: Added as a new ingestion source with support for Path Specs. [#10813]
    • Grafana: New connector to ingest dashboards, providing documentation within DataHub for DevOps members on call. [#10891]
    • IBM DB2: Added support for this platform. [#10601]
  • Snowflake Improvements:

    • Enhanced view lineage parsing without query-based lineage/usage. [#10905]
    • Added support for more than 10k views in a Snowflake database. [#10718]
    • Implemented parallel schema extraction for improved performance. [#10653]
    • Added snowflake-queries source for lineage, usage, queries, and operational metadata to improve performance and configurability. [#10835]
  • BigQuery Enhancements:

    • Refactored and parallelized dataset metadata extraction for better performance. [#10884]
    • Added support for new data types including BIGNUMERIC, NUMERIC, DECIMAL, BIGDECIMAL, FLOAT64, and RANGE. [#10950]
    • Added support for ingesting View labels during ingestion. [#10648]
  • Looker Updates:

    • Ingested explore tags into DataHub. [#10547]
    • Fixed issues related to CLL generation when the view definition language is SQL. [#10542]
    • Added support for including platform instance details in URNs for dashboards and charts. [#10771]
  • Other Improvements:

    • dbt: Enhanced flexibility in lineage generation with the new experimental prefer_sql_parser_lineage flag. [#11039]
    • Airflow: Task ownership info can now be set as a group rather than an individual user. [#10742]
    • Athena: Enhanced profiling capabilities to support column quantiles and medians. [#10723]
    • Fivetran: Improved connector performance for faster ingestion. [#10556]
    • SageMaker: Added stateful ingestion capability to remove deleted assets during ingestion runs. [#10573]
    • Tableau: Support added for ingesting multiple Tableau sites in a single configuration, with sites appearing as containers in DataHub. [#10498]
    • Added support for ingesting schemas from schema registry in the Kafka module. [#10612]
    • Introduced a TagsToTermMapper transformer for mapping specific tags to glossary terms. [#10758]
    • Enhanced the SQL lineage parser with an optional default_dialect parameter for customized dialect selection. [#10830]

Other Improvements and Fixes

  • Fixed high-severity vulnerabilities related to logging of sensitive information. [#11088]
  • Improved error handling and logging across various modules.
  • Enhanced test coverage for new features and existing functionality.

Breaking Changes

  • Protobuf CLI will no longer create binary encoded protoc custom properties by default.
  • Changes to Data flow info and data job info aspects may require a server upgrade.
  • OpenAPI V3 - Creation of aspects now requires wrapping within a value key.
  • Profiling configuration for Glue source has been updated.

For full details on breaking changes, please refer to the updating guide.
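
As a hedged illustration of the OpenAPI V3 change, the sketch below wraps an aspect body under a `value` key before it is sent. The `wrap_aspect` helper, the `datasetProperties` aspect name, and the overall request shape are assumptions for illustration; check the updating guide for the exact payload format.

```python
import json
from typing import Any, Dict


def wrap_aspect(aspect_name: str, aspect: Dict[str, Any]) -> Dict[str, Any]:
    """Nest the aspect body under a 'value' key (assumed V3 request shape)."""
    return {aspect_name: {"value": aspect}}


# Hypothetical aspect body as it might have been sent before the change.
unwrapped = {"description": "Daily order totals"}

# Assumed new shape: the same body, nested under "value".
wrapped = wrap_aspect("datasetProperties", unwrapped)
payload = json.dumps(wrapped, sort_keys=True)
```

Clients that build request bodies by hand would need an equivalent wrapping step; the aspect content itself is unchanged.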

Contributors

Massive shoutout to all of the contributors who made this release possible:

First-Time Contributors

@aabharti-visa, @acrylJonny, @amit-apptware, @AndreasHegerNuritas, @aviv-julienjehannet, @brbrown25, @chardaway, @dragontail, @ipolding-cais, @joelmataKPN, @john-claro-cko, @jordanjeremy, @lima-renan, @nadavgross, @nephtyws, @obaltian, @PeamThom, @pie1nthesky, @pulsar256, @samblackk, @shtephlee, @simaov, @steffengr, @tkdrahn, @TristanHeisler, @wornjs, @xkollar

Repeat Contributors

@ajoymajumdar, @bossenti, @cburroughs, @cccs-eric, @deepgarg-visa, @dushayntAW, @fjmacagno, @githendrik, @haeniya, @jayasimhankv, @k7ragav, @kevin1chun, @ksrinath, @Kunal-kankriya, @looppi, @Masterchen09, @mayurinehate, @ngamanda, @nmbryant, @noggi, @pankajmahato-visa, @PatrickfBraz, @pinakipb2, @Rajasekhar-Vuppala, @rtekal, @sagar-salvi-apptware, @shubhamjagtap639, @siladitya2, @ssilb4, @Sukeerthi31, @sumitappt, @TonyOuyangGit, @walter9388

DataHub Maintainers

@anshbansal, @asikowitz, @chriscollins3456, @darnaut, @david-leifker, @eboneil, @ethan-cartwright, @gabe-lyons, @hsheth2, @jayacryl, @jjoyce0510, @maggiehays, @pedro93, @RyanHolstien, @shirshanka, @sid-acryl, @skrydal, @treff7es, @yoonhyejin

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.13.3...v0.14.0.2

v0.14.0

Released on 2024-08-13 by @RyanHolstien.

Known Issues

The kafka-setup component is missing a script required for new deployments. A hotfix will be released shortly.

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.13.3...v0.14.0

v0.13.3

Released on 2024-05-23 by @david-leifker.

View the release notes for v0.13.3 on GitHub.

v0.13.2

Released on 2024-04-16 by @david-leifker.

View the release notes for v0.13.2 on GitHub.

v0.13.1

Released on 2024-04-02 by @david-leifker.

View the release notes for v0.13.1 on GitHub.

v0.13.0

Released on 2024-02-29 by @RyanHolstien.

View the release notes for v0.13.0 on GitHub.

DataHub v0.12.1

Released on 2023-12-08 by @david-leifker.

View the release notes for DataHub v0.12.1 on GitHub.

v0.12.1rc2

Released on 2023-11-28 by @david-leifker.

View the release notes for v0.12.1rc2 on GitHub.

v0.12.0

Released on 2023-10-25 by @pedro93.

View the release notes for v0.12.0 on GitHub.

v0.11.0

Released on 2023-09-08 by @iprentic.

View the release notes for v0.11.0 on GitHub.

v0.10.5

Released on 2023-08-02 by @david-leifker.

View the release notes for v0.10.5 on GitHub.

v0.10.4

Released on 2023-06-09 by @pedro93.

View the release notes for v0.10.4 on GitHub.

v0.10.3

Released on 2023-05-25 by @iprentic.

View the release notes for v0.10.3 on GitHub.

DataHub v0.10.2

Released on 2023-04-13 by @iprentic.

View the release notes for DataHub v0.10.2 on GitHub.

DataHub v0.10.1

Released on 2023-03-23 by @aditya-radhakrishnan.

View the release notes for DataHub v0.10.1 on GitHub.

DataHub v0.10.0

Released on 2023-02-07 by @david-leifker.

View the release notes for DataHub v0.10.0 on GitHub.

DataHub v0.9.6.1

Released on 2023-01-31 by @david-leifker.

View the release notes for DataHub v0.9.6.1 on GitHub.

DataHub v0.9.6

Released on 2023-01-13 by @maggiehays.

View the release notes for DataHub v0.9.6 on GitHub.

DataHub v0.9.5

Released on 2022-12-23 by @jjoyce0510.

View the release notes for DataHub v0.9.5 on GitHub.

[Known Issues] DataHub v0.9.4

Released on 2022-12-20 by @maggiehays.

View the release notes for [Known Issues] DataHub v0.9.4 on GitHub.

DataHub v0.9.3

Released on 2022-11-30 by @maggiehays.

View the release notes for DataHub v0.9.3 on GitHub.

DataHub v0.9.2

Released on 2022-11-04 by @maggiehays.

View the release notes for DataHub v0.9.2 on GitHub.

DataHub v0.9.1

Released on 2022-10-31 by @maggiehays.

View the release notes for DataHub v0.9.1 on GitHub.

DataHub v0.9.0

Released on 2022-10-11 by @szalai1.

View the release notes for DataHub v0.9.0 on GitHub.

DataHub v0.8.45

Released on 2022-09-23 by @gabe-lyons.

View the release notes for DataHub v0.8.45 on GitHub.

DataHub v0.8.44

Released on 2022-09-01 by @jjoyce0510.

View the release notes for DataHub v0.8.44 on GitHub.

DataHub v0.8.43

Released on 2022-08-09 by @maggiehays.

View the release notes for DataHub v0.8.43 on GitHub.

v0.8.42

Released on 2022-08-03 by @gabe-lyons.

View the release notes for v0.8.42 on GitHub.

v0.8.41

Released on 2022-07-15 by @anshbansal.

View the release notes for v0.8.41 on GitHub.