mySQL schema data model updates #19472

Merged
merged 13 commits into from
Jan 29, 2025

Conversation

azhou-datadog
Contributor

@azhou-datadog azhou-datadog commented Jan 23, 2025

What does this PR do?

Updating the mySQL schema data model.

I've left columns and foreign keys as is, but partitions and indexes have been re-shaped. I've also removed a few columns that I don't find very useful, listed below. Please check me on these; here is the documentation for indexes and partitions.

Partitions changes

  • Partitions have been re-shaped so that a consumer of this dictionary no longer needs to track subpartition ordinal position values to correlate a subpartition with its parent partition.
  • If subpartitions exist, the partition level fields data_length and table_rows are now aggregates of the same values in the subpartition dictionary. If there are no subpartitions, they behave as normal.
  • Removed partition_comment. We don’t include index comments, but if we want those, we can add those to indexes and keep this one too.
  • Removed index_lengths.
  • Removed data_free.
  • Removed max_data_lengths.
  • Removed tablespace_name.
  • Changed several string typed values to their actual underlying types (bools or ints).
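
To illustrate the nesting and aggregation described above, here is a minimal sketch. It assumes the input rows look like lowercased information_schema.partitions rows, one per partition, or one per subpartition when subpartitions exist. The function name shape_partitions and the output keys are hypothetical and are not taken from the PR's code.

```python
def shape_partitions(rows):
    """Nest subpartition rows under their parent partition (illustrative only).

    `rows` is assumed to be a list of dicts keyed by lowercased
    information_schema.partitions column names.
    """
    partitions = {}
    for row in rows:
        name = row["partition_name"]
        # Create the parent entry the first time this partition is seen.
        partition = partitions.setdefault(
            name,
            {
                "name": name,
                "partition_ordinal_position": int(row["partition_ordinal_position"]),
                "subpartitions": [],
                "table_rows": 0,
                "data_length": 0,
            },
        )
        # Values may arrive as strings, so cast to their underlying int type.
        table_rows = int(row["table_rows"])
        data_length = int(row["data_length"])
        if row.get("subpartition_name") is not None:
            partition["subpartitions"].append(
                {
                    "name": row["subpartition_name"],
                    "subpartition_ordinal_position": int(
                        row["subpartition_ordinal_position"]
                    ),
                    "table_rows": table_rows,
                    "data_length": data_length,
                }
            )
        # With subpartitions this sums across them; without, there is a single
        # row per partition, so the totals equal that row's values.
        partition["table_rows"] += table_rows
        partition["data_length"] += data_length
    return list(partitions.values())
```

With this shape a consumer reads partition["subpartitions"] directly instead of matching ordinal positions across flat rows.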

Indexes changes

  • Indexes have similarly been re-shaped so an index built from multiple columns does not need to keep track of a seq_in_index value.
  • I’ve added the expression column to indexes. This is important for indexes that were built with Functional Key Parts, which were introduced in mySQL 8.0.13 (release notes, information_schema.statistics 8.0.13 docs).
  • Removed seq_in_index from indexes.
  • Changed several string typed values to their actual underlying types (bools or ints).
  • Collation should be associated per column, not the index as a whole.
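
The index re-shaping above can be sketched along the same lines. This assumes rows resembling lowercased information_schema.statistics rows, one per key part. The function name shape_indexes and the output keys are hypothetical. seq_in_index is used only to order the key parts and is then dropped, and collation is stored per column.

```python
def shape_indexes(rows):
    """Group per-key-part rows into one entry per index (illustrative only)."""
    indexes = {}
    # Sort by index name, then by position within the index, so that each
    # index's column list is already in the right order when it is built.
    ordered = sorted(rows, key=lambda r: (r["index_name"], int(r["seq_in_index"])))
    for row in ordered:
        index = indexes.setdefault(
            row["index_name"],
            {
                "name": row["index_name"],
                "index_type": str(row["index_type"]),
                # Cast through int first: bool("0") would be True for a string.
                "non_unique": bool(int(row["non_unique"])),
                "columns": [],
            },
        )
        index["columns"].append(
            {
                # For a functional key part, column_name is NULL and the
                # expression column holds the key part's expression.
                "name": row["column_name"],
                "expression": row["expression"],
                # Collation belongs to the key part, not the index as a whole.
                "collation": row["collation"],
            }
        )
    return list(indexes.values())
```

Because the list order encodes the key-part order, a consumer does not need seq_in_index to reconstruct a multi-column index.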

Updated all unit tests to cover these changes.

Motivation

We want to display mySQL schemas on the schemas explorer page.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

codecov bot commented Jan 23, 2025

Codecov Report

Attention: Patch coverage is 92.15686% with 4 lines in your changes missing coverage. Please review.

Project coverage is 88.07%. Comparing base (5150864) to head (9917e01).
Report is 18 commits behind head on master.

Additional details and impacted files
Flag Coverage Δ
activemq ?
cassandra ?
hive ?
hivemq ?
ignite ?
jboss_wildfly ?
kafka ?
mysql 89.59% <92.15%> (?)
presto ?
solr ?

Flags with carried forward coverage won't be shown.

@azhou-datadog azhou-datadog force-pushed the allen.zhou/mysql-schema branch from ec6776b to 1d748f0 Compare January 27, 2025 17:07

The changelog type "changed" or "removed" was used in this Pull Request, so the next release will bump major version. Please make sure this is a breaking change, or use the "fixed" or "added" type instead.

@azhou-datadog azhou-datadog changed the title from "DRAFT: Allen.zhou/mysql schema" to "mySQL schema data model updates" Jan 28, 2025
@@ -83,7 +83,6 @@ def __init__(self, check, config, connection_args):
self._version_processed = False
self._connection_args = connection_args
self._db = None
self._check = check
Contributor Author

deleting a duplicate, see L81

@@ -385,6 +385,7 @@ files:
hidden: true
description: |
Configure collection of schemas (databases).
Only tables and schemas for which the user has been granted SELECT privileges are collected.
Contributor Author

We will want to address this in a future PR; the current best path forward seems to be setting up prepared statements that can query all schema info.

index_data["cardinality"] = int(row["cardinality"])
index_data["index_type"] = str(row["index_type"])
index_data["non_unique"] = bool(row["non_unique"])
index_data["expression"] = str(row["expression"]).strip().lower() if row["expression"] else None
Contributor

What's the purpose of .strip().lower() ?

Contributor Author

Was thinking about normalizing some of these values at one point, but it's probably not necessary. Removed this.
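
For reference, a minimal sketch of the conversion with the normalization dropped, so the expression keeps its original casing and whitespace (lowercasing would also alter any string literals inside an expression). The helper name convert_index_row is hypothetical, the row is assumed to be a dict keyed by lowercased column names, and the None guard on cardinality is an added assumption rather than something taken from the PR.

```python
def convert_index_row(row):
    """Cast string-typed index fields to their underlying types (illustrative)."""
    return {
        # Assumption: tolerate a missing cardinality instead of raising.
        "cardinality": int(row["cardinality"]) if row["cardinality"] is not None else None,
        "index_type": str(row["index_type"]),
        # Cast through int so that a string "0" does not become True.
        "non_unique": bool(int(row["non_unique"])),
        # No .strip().lower(): the expression is passed through unchanged.
        "expression": str(row["expression"]) if row["expression"] else None,
    }
```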

@azhou-datadog azhou-datadog added this pull request to the merge queue Jan 29, 2025
Merged via the queue into master with commit 3e22c06 Jan 29, 2025
34 checks passed
@azhou-datadog azhou-datadog deleted the allen.zhou/mysql-schema branch January 29, 2025 17:27

3 participants