Suppress failure when reading $partitions system table in get_indexes #426

LittleWat · 2023-11-30T06:24:32Z

Description

This PR attempts to handle the error raised when fetching a Hudi table's schema in Superset, and to close "Error when fetching Hudi tables Schema" (apache/superset#21945).

Presto does support the $partitions table suffix per the release notes but Trino seems not to support this so this should be removed (?)

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/flask_appbuilder/api/__init__.py", line 110, in wraps
    return f(self, *args, **kwargs)
  File "/app/superset/views/base_api.py", line 127, in wraps
    raise ex
  File "/app/superset/views/base_api.py", line 121, in wraps
    duration, response = time_function(f, self, *args, **kwargs)
  File "/app/superset/utils/core.py", line 1454, in time_function
    response = func(*args, **kwargs)
  File "/app/superset/utils/log.py", line 255, in wrapper
    value = f(*args, **kwargs)
  File "/app/superset/databases/api.py", line 794, in table_extra_metadata
    payload = database.db_engine_spec.extra_table_metadata(
  File "/app/superset/db_engine_specs/trino.py", line 66, in extra_table_metadata
    if indexes := database.get_indexes(table_name, schema_name):
  File "/app/superset/models/core.py", line 863, in get_indexes
    return self.db_engine_spec.get_indexes(self, inspector, table_name, schema)
  File "/app/superset/db_engine_specs/base.py", line 1298, in get_indexes
    return inspector.get_indexes(table_name, schema)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/reflection.py", line 605, in get_indexes
    return self.dialect.get_indexes(
  File "/usr/local/lib/python3.9/site-packages/trino/sqlalchemy/dialect.py", line 283, in get_indexes
    partitioned_columns = self._get_columns(connection, f"{table_name}$partitions", schema, **kw)
  File "/usr/local/lib/python3.9/site-packages/trino/sqlalchemy/dialect.py", line 178, in _get_columns
    res = connection.execute(sql.text(query), {"schema": schema, "table": table_name})
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1306, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1498, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1862, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2043, in _handle_dbapi_exception
    util.raise_(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1819, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.9/site-packages/trino/sqlalchemy/dialect.py", line 399, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.9/site-packages/trino/dbapi.py", line 587, in execute
    self._iterator = iter(self._query.execute())
  File "/usr/local/lib/python3.9/site-packages/trino/client.py", line 810, in execute
    self._result.rows += self.fetch()
  File "/usr/local/lib/python3.9/site-packages/trino/client.py", line 830, in fetch
    status = self._request.process(response)
  File "/usr/local/lib/python3.9/site-packages/trino/client.py", line 609, in process
    raise self._process_error(response["error"], response.get("id"))
sqlalchemy.exc.ProgrammingError: (trino.exceptions.TrinoUserError) TrinoUserError(type=USER_ERROR, name=NOT_SUPPORTED, message="Invalid Hudi table name (unknown type 'partitions'): my-table$partitions", query_id=20231129_142219_00171_dk2ne)
[SQL: SELECT
    "column_name",
    "data_type",
    "column_default",
    UPPER("is_nullable") AS "is_nullable"
FROM "information_schema"."columns"
WHERE "table_schema" = ?
  AND "table_name" = ?
ORDER BY "ordinal_position" ASC]
[parameters: ('my-schema', 'my-table$partitions')]

Non-technical explanation

Release notes

(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

* Fix error when fetching Hudi tables schema. ({issue}`https://github.com/apache/superset/issues/21945`)

cla-bot · 2023-11-30T06:24:35Z

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

ebyhr · 2023-11-30T06:46:05Z

trino/sqlalchemy/dialect.py

@@ -280,7 +280,7 @@ def get_indexes(self, connection: Connection, table_name: str, schema: str = Non
        if not self.has_table(connection, table_name, schema):
            raise exc.NoSuchTableError(f"schema={schema}, table={table_name}")

-        partitioned_columns = self._get_columns(connection, f"{table_name}$partitions", schema, **kw)
+        partitioned_columns = self._get_columns(connection, table_name, schema, **kw)


Whether a $partitions table exists depends on the connector's implementation, so we shouldn't have hard-coded it.
We could change this method to either a) vary the behavior based on the connector name or b) suppress the exception. It would be nice to adopt both eventually, as relying on the name isn't perfect, but we can start with b) in my opinion.

Removing the $partitions table suffix doesn't make sense to me, as it may break existing usages.

cc: @hashhar
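As a rough illustration of option a) above, the lookup name could be chosen from the connector name. This is only a sketch: `partitions_table_name` and `CONNECTORS_WITH_PARTITIONS_TABLE` are hypothetical names that do not exist in the client, and the allow-list contains only `hive` as an assumption for the example.

```python
from typing import Optional

# Hypothetical allow-list (an assumption for this sketch, not part of the client):
# connectors assumed to expose a "<table>$partitions" system table.
CONNECTORS_WITH_PARTITIONS_TABLE = frozenset({"hive"})


def partitions_table_name(table_name: str, connector_name: str) -> Optional[str]:
    """Return the $partitions lookup name, or None when the connector is not listed."""
    if connector_name.lower() in CONNECTORS_WITH_PARTITIONS_TABLE:
        return f"{table_name}$partitions"
    return None


print(partitions_table_name("my-table", "hive"))  # my-table$partitions
print(partitions_table_name("my-table", "hudi"))  # None
```

As noted in the comment, relying on the name alone isn't perfect: any connector missing from the list is silently treated as unpartitioned.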

Thank you for sharing your opinion! We're using the Hudi connector, so if I choose pattern a), this should be fixed as follows, right...? (Please feel free to make commits to this branch or create another PR 🙇)

    if connector_name == "hudi":
        partitioned_columns = self._get_columns(connection, table_name, schema, **kw)
    else:
        partitioned_columns = self._get_columns(connection, f"{table_name}$partitions", schema, **kw)

Hudi connector doesn't support $partitions system table for now. We need a different approach, e.g. parse result of SHOW CREATE TABLE, or treat all columns as non-partition columns

@ebyhr thank you for your comment!

but we can start with b) in my opinion.

Following this, I made a commit 2232541

Could you check if this is what you expect...? 🙏

e.g. parse result of SHOW CREATE TABLE,

sorry, I don't understand how to parse this result. 🤦

Here is the result of SHOW CREATE TABLE for the target Hudi table:

SHOW CREATE TABLE using Trino

Create Table
CREATE TABLE "<CATALOG>"."<DB>"."<TABLE>" (
    _hoodie_commit_time varchar COMMENT '',
    _hoodie_commit_seqno varchar COMMENT '',
    _hoodie_record_key varchar COMMENT '',
    _hoodie_partition_path varchar COMMENT '',
    _hoodie_file_name varchar COMMENT '',
    global_intensity double COMMENT '',
    global_reactive_power double COMMENT '',
    city varchar COMMENT '',
    voltage double COMMENT '',
    global_active_power double COMMENT '',
    sub_metering_1 double COMMENT '',
    sub_metering_2 double COMMENT '',
    sub_metering_3 double COMMENT '',
    meter_id varchar COMMENT '',
    location array(double) COMMENT '',
    ts varchar COMMENT ''
)
WITH (
    location = 's3a://<my-bucket>',
    partitioned_by = ARRAY['ts']
)

SHOW CREATE TABLE using Presto

Create Table
CREATE TABLE hudi."data-platform-demo"."hudi-s3-ingest " (
    "_hoodie_commit_time" varchar,
    "_hoodie_commit_seqno" varchar,
    "_hoodie_record_key" varchar,
    "_hoodie_partition_path" varchar,
    "_hoodie_file_name" varchar,
    "global_intensity" double,
    "global_reactive_power" double,
    "city" varchar,
    "voltage" double,
    "global_active_power" double,
    "sub_metering_1" double,
    "sub_metering_2" double,
    "sub_metering_3" double,
    "meter_id" varchar,
    "location" array(double),
    "ts" varchar
)

The difference is that Trino has:

WITH ( location = 's3a://<my-bucket>', partitioned_by = ARRAY['ts'] )

In Superset, fetching Hudi schema works in Presto but it does not work in Trino.
How can we use this information...? 🙇

I think Yuya meant that instead of using $partitions table in case of Hudi you can fire a SHOW CREATE TABLE and use the partitioned_by information from that output instead.

IMO however the simple fix is to do something like:

    partitioned_columns = None
    try:
        partitioned_columns = self._get_columns(connection, f"{table_name}$partitions", schema, **kw)
    except Exception as e:
        logger.debug("Couldn't fetch partition columns for ...")
    if not partitioned_columns:
        return []
    partition_index = dict(
        name="partition",
        column_names=[col["name"] for col in partitioned_columns],
        unique=False
    )
    return [partition_index]

This feature shouldn't have been added to begin with, since there's currently no general-purpose way to figure out whether a table is partitioned. For example, while Hudi/Hive use partitioned_by, Iceberg uses partitioning.
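For reference, the `SHOW CREATE TABLE` idea mentioned in this thread could look roughly like the sketch below. It is not what the PR implements. `parse_partitioned_by` is a hypothetical helper, it only understands the Hive/Hudi-style `partitioned_by = ARRAY[...]` property shown in the Trino output above, and it assumes that text does not also appear inside a column comment.

```python
import re
from typing import List

# Matches e.g.  partitioned_by = ARRAY['ts', 'city']  in the WITH (...) clause.
_PARTITIONED_BY = re.compile(
    r"partitioned_by\s*=\s*ARRAY\s*\[([^\]]*)\]", re.IGNORECASE
)


def parse_partitioned_by(show_create_table_output: str) -> List[str]:
    """Extract partition column names from SHOW CREATE TABLE text.

    Returns an empty list when no partitioned_by property is present.
    """
    match = _PARTITIONED_BY.search(show_create_table_output)
    if match is None:
        return []
    # Pull every single-quoted name out of the ARRAY[...] body.
    return re.findall(r"'([^']*)'", match.group(1))


trino_output = """CREATE TABLE "<CATALOG>"."<DB>"."<TABLE>" ( ts varchar COMMENT '' )
WITH ( location = 's3a://<my-bucket>', partitioned_by = ARRAY['ts'] )"""
presto_output = 'CREATE TABLE hudi."db"."tbl" ( "ts" varchar )'

print(parse_partitioned_by(trino_output))   # ['ts']
print(parse_partitioned_by(presto_output))  # []
```

An Iceberg table would need separate handling, since its `partitioning` property uses a different name, which is the connector-specific divergence pointed out above.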

@hashhar thank you for your comment! I fixed the code to follow your implementation.
I see... Could we use the simple fix for now to make Superset work...? 🙏
When creating the dataset in Superset using Trino+Hudi, this fetching error blocks it. There is a workaround to create the dataset via SQL Lab but we want to use the normal way to create the dataset.

hashhar

This is okay as a first step (but note Trino UI will still show failed queries).

@hovaesco Do you plan to follow-up on this to improve this?

LittleWat · 2023-12-20T08:53:33Z

I can confirm that this patch fixes the issue, thanks!!

I'd be glad if you could merge this so that we don't have to use the custom library. 🙏

ebyhr

Could you squash commits into one and fix the commit title like "Suppress failure when reading $partitions system table in get_indexes"?
https://github.com/trinodb/trino/blob/master/.github/DEVELOPMENT.md#format-git-commit-messages is the guideline for a commit message.

LittleWat · 2023-12-20T22:10:00Z

Thank you for your detailed review and for sharing the doc! Fixed following your suggestion 🙇

Not all connectors have a `$partitions` table. This caused `get_indexes` to fail when called on a non-Hive (or non-partitioned Hive) table. Since Trino engine doesn't have concept of partitions there's no single way to fetch partition columns. One option is to parse the output of `SHOW CREATE TABLE` to identify them but the logic would differ based on what connector is being used. So we just opt to suppress the failure in case of a non-Hive or non-partitioned Hive table instead.
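The behaviour described in this commit message can be exercised without a Trino server. The following is a minimal sketch, not the merged code: `partition_indexes` is a hypothetical standalone function, and `get_columns` stands in for the dialect's `_get_columns` call.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)

Column = Dict[str, Any]


def partition_indexes(
    get_columns: Callable[[str], List[Column]], table_name: str
) -> List[Dict[str, Any]]:
    """Return one pseudo-index over the partition columns, or [] if unavailable."""
    partitioned_columns = None
    try:
        partitioned_columns = get_columns(f"{table_name}$partitions")
    except Exception as e:
        # Connectors without a $partitions table raise here; suppress and log.
        logger.debug("Couldn't fetch partition columns for %s: %s", table_name, e)
    if not partitioned_columns:
        return []
    return [
        dict(
            name="partition",
            column_names=[col["name"] for col in partitioned_columns],
            unique=False,
        )
    ]


def hudi_like(name: str) -> List[Column]:
    # Stand-in for a connector that rejects the $partitions suffix.
    raise RuntimeError(f"Invalid Hudi table name (unknown type 'partitions'): {name}")


def hive_like(name: str) -> List[Column]:
    # Stand-in for a partitioned Hive table with a single partition column.
    return [{"name": "ts"}]


print(partition_indexes(hudi_like, "my-table"))  # []
print(partition_indexes(hive_like, "my-table"))
# [{'name': 'partition', 'column_names': ['ts'], 'unique': False}]
```

Note that the failed lookup still reaches the server, so, as mentioned in the review, the Trino UI will continue to show it as a failed query.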

hashhar · 2023-12-22T09:32:32Z

edited commit message to add some context, merging. Thanks @LittleWat.

LittleWat · 2023-12-23T00:19:05Z

Thank you for updating the commit message and merging this! This helps our project!

LittleWat mentioned this pull request Nov 30, 2023

fix(trino): the error when fetching Hudi tables Schema apache/superset#26131

Closed

ebyhr reviewed Nov 30, 2023

LittleWat marked this pull request as ready for review December 6, 2023 07:56

LittleWat changed the title ~~remove $partitions table suffix~~ Fix the error when fetching Hudi tables Schema Dec 6, 2023

LittleWat changed the title ~~Fix the error when fetching Hudi tables Schema~~ Fix the error when fetching Hudi tables schema Dec 6, 2023

LittleWat changed the title ~~Fix the error when fetching Hudi tables schema~~ Fix the error when fetching Hudi tables schema of Supserset Dec 6, 2023

LittleWat changed the title ~~Fix the error when fetching Hudi tables schema of Supserset~~ Fix the error when fetching Hudi tables schema of Superset Dec 6, 2023

LittleWat changed the title ~~Fix the error when fetching Hudi tables schema of Superset~~ Fix the error when fetching Hudi tables schema in Superset Dec 6, 2023

LittleWat added a commit to LittleWat/trino-python-client that referenced this pull request Dec 18, 2023

fix the exception handling in get_indexes func based on [the review comment](trinodb#426 (comment))

046f56b

cla-bot bot added the cla-signed label Dec 18, 2023

LittleWat force-pushed the remove-partitions-suffix-1 branch from 046f56b to 800303d Compare December 18, 2023 11:41

LittleWat added a commit to LittleWat/trino-python-client that referenced this pull request Dec 18, 2023

fix the exception handling in get_indexes func based on [the review comment](trinodb#426 (comment))

800303d

LittleWat changed the title ~~Fix the error when fetching Hudi tables schema in Superset~~ Handle the error when fetching Hudi tables schema in Superset Dec 18, 2023

LittleWat force-pushed the remove-partitions-suffix-1 branch from 800303d to 338147b Compare December 18, 2023 11:54

LittleWat added a commit to LittleWat/trino-python-client that referenced this pull request Dec 18, 2023

fix the exception handling in get_indexes func based on [the review comment](trinodb#426 (comment))

338147b

LittleWat force-pushed the remove-partitions-suffix-1 branch from 338147b to b1753ed Compare December 18, 2023 11:57

LittleWat added a commit to LittleWat/trino-python-client that referenced this pull request Dec 18, 2023

fix the exception handling in get_indexes func based on [the review comment](trinodb#426 (comment))

b1753ed

hashhar reviewed Dec 18, 2023

LittleWat force-pushed the remove-partitions-suffix-1 branch from b1753ed to 2d39b20 Compare December 18, 2023 12:56

LittleWat added a commit to LittleWat/trino-python-client that referenced this pull request Dec 18, 2023

fix the exception handling in get_indexes func based on [the review comment](trinodb#426 (comment))

2d39b20

LittleWat force-pushed the remove-partitions-suffix-1 branch from 2d39b20 to e6a20f5 Compare December 18, 2023 13:01

LittleWat added a commit to LittleWat/trino-python-client that referenced this pull request Dec 18, 2023

fix the exception handling in get_indexes func based on [the review comment](trinodb#426 (comment))

e6a20f5

ebyhr approved these changes Dec 20, 2023

LittleWat force-pushed the remove-partitions-suffix-1 branch from e6a20f5 to 429595a Compare December 20, 2023 22:07

LittleWat changed the title ~~Handle the error when fetching Hudi tables schema in Superset~~ Suppress failure when reading $partitions system table in get_indexes Dec 20, 2023

hashhar force-pushed the remove-partitions-suffix-1 branch from 429595a to e62f55f Compare December 22, 2023 09:27

hashhar merged commit 9df2cf2 into trinodb:master Dec 22, 2023
12 checks passed

hashhar mentioned this pull request Dec 22, 2023

Error when fetching Hudi tables Schema apache/superset#21945

Closed

hashhar mentioned this pull request Feb 16, 2024

Release notes for 0.328.0 #417

Closed
