Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat!: Introduce Model linting #3787

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

Feat!: Introduce Model linting #3787

wants to merge 13 commits into from

Conversation

VaggelisD
Copy link
Contributor

@VaggelisD VaggelisD commented Feb 5, 2025

Fixes #3753

This PR introduces a new interface for a general purpose Model linter:

  • Each Rule checks for a specific lint error; If a violation is found, a RuleViolation is created with the proper context
  • Rules are grouped in RuleSets; Each RuleSet acts as both a set and a mapping (rule class name -> rule type).
  • The Linter applies the RuleSets on the model and outputs warnings / raises errors depending on their severity

The linter configuration along with the default values is the following:

gateways:
 ...

default_gateway: dev

model_defaults:
  ...

linter:
  enabled: False
  
  # Can be overridden with "ALL" or ["rule0", "rule1", ...., "ruleN"]
  rules: []
  warn_rules:
  exclude_rules: ["noselectstar", "ambiguousorinvalidcolumn", "invalidselectstarexpansion"]

This is to preserve the behavior of QueryRenderer today; For this matter, validate_query is deprecated and instead the corresponding checks are implemented as rules in the Linter.

On a model granularity, rules can be ignored by the new ignore_lints attribute:

MODEL (
  name sqlmesh_example.test_model,
  kind FULL,
  ignore_lints ["noselectstar"]
);

SELECT ...;

Users can implement their own lint rules in a linter/ directory by subclassing Rule; These will be automatically picked up at load time:

# linter/user.py

import typing as t

from sqlmesh.core.linter.rule import Rule, RuleViolation, RuleSet
from sqlmesh.core.model import Model

class NoMissingOwner(Rule):
    def check(self, model: Model) -> t.Optional[RuleViolation]:
        return RuleViolation(rule=self, model=model) if not model.owner else None

    @property
    def summary(self) -> str:
        return "Model owner should be set"

@VaggelisD VaggelisD marked this pull request as draft February 5, 2025 15:25
@VaggelisD VaggelisD force-pushed the vaggelisd/query_linter branch from cbdeab5 to 8f456d4 Compare February 11, 2025 17:46
@VaggelisD VaggelisD changed the title Feat!: Introduce SQL linting Feat!: Introduce Model linting Feb 11, 2025
@VaggelisD VaggelisD force-pushed the vaggelisd/query_linter branch from 8f456d4 to e60106e Compare February 11, 2025 17:47
@VaggelisD VaggelisD force-pushed the vaggelisd/query_linter branch from e60106e to c90f22c Compare February 11, 2025 18:05
@VaggelisD VaggelisD force-pushed the vaggelisd/query_linter branch 3 times, most recently from 2c3e92e to 7116a75 Compare February 24, 2025 09:09
@VaggelisD VaggelisD marked this pull request as ready for review February 24, 2025 09:26
@VaggelisD VaggelisD requested a review from a team February 24, 2025 09:47
@VaggelisD
Copy link
Contributor Author

PS: v0071 is also defined here, I suspect Iaroslav's PR will be merged much sooner.

Copy link
Contributor

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting stuff!

@VaggelisD VaggelisD force-pushed the vaggelisd/query_linter branch from 7116a75 to f9c6ae2 Compare February 25, 2025 17:48
@VaggelisD VaggelisD force-pushed the vaggelisd/query_linter branch from f9c6ae2 to 7beea0b Compare February 25, 2025 17:49
@VaggelisD VaggelisD requested a review from a team February 26, 2025 14:32
self._name = name or self.__class__.__name__.lower()

@abc.abstractmethod
def check(self, model: Model) -> t.Optional[RuleViolation]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on this type signature it seems like rules can only check for a single violation within a model, but what if there are multiple violations of the same rule in a model? What gets shown to the end user?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my idea was that a RuleViolation will be indeed singular per rule but if there's multiple violations we can store their context in a collection.

For instance, if we have a rule that detects a certain faulty pattern in a query, I thought it'd look somewhat odd to have N x RuleViolation(..., expr: exp.Expression) instead of a single RuleViolation(..., exprs: t.List[exp.Expression]); The former looks like it's more polluting for now and increases the complexity which we are now trying to avoid based on internal discussions.

How does that sound?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that probably works too.

@etonlels
Copy link
Contributor

etonlels commented Feb 26, 2025

@VaggelisD is it possible to create configurable rules, and if so, how is that done?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Linting rule to PREVENT select * in general
6 participants