Feature/pretty print (#12)
* Restructuring of project

- Move definitions.py to /definitions folder
- Update path in all files using definitions.py
- Add anti-pattern markdown files

* Add basic CLI output setup

* Add argument to render descriptions

* Add markdown descriptions of anti-patterns

* Linting

* Linting

* Add general title to definitions

* Add title from definitions to detector output

* Fix typo

* Add title to printer

* Update docstring

* Add description to detector output

* Linting
leonardomathon authored Mar 15, 2022
1 parent 44caa85 commit a7ea4a9
Showing 23 changed files with 505 additions and 95 deletions.
3 changes: 2 additions & 1 deletion requirements.txt
@@ -1 +1,2 @@
sqlparse==0.4.2
sqlparse==0.4.2
rich==12.0.0
3 changes: 2 additions & 1 deletion requirements_dev.txt
@@ -2,4 +2,5 @@ flake8==4.0.1
tox==3.24.5
pytest==7.0.1
pytest-cov==3.0.0
mypy===0.931
types-setuptools==57.4.10
mypy==0.931
4 changes: 4 additions & 0 deletions setup.cfg
@@ -25,11 +25,14 @@ classifiers =
[options]
packages =
sqleyes
sqleyes.definitions
sqleyes.detector
sqleyes.detector.antipatterns
sqleyes.printer
sqleyes.utils
install_requires =
sqlparse>=0.4.2
rich>=12.0.0
python_requires = >=3.6
package_dir =
=.
@@ -44,6 +47,7 @@ testing =
pytest>=7.0
pytest-cov>=3.0
mypy>=0.910
types-setuptools>=57.4.10
flake8>=4.0
tox>=3.24

14 changes: 12 additions & 2 deletions sqleyes/cli.py
@@ -2,6 +2,7 @@
import argparse

from sqleyes.main import main
from sqleyes.printer.printer import IntroPrinter, OutputPrinter


parser = argparse.ArgumentParser(
@@ -10,12 +11,21 @@
parser.add_argument('-q', '--query', metavar="", type=str, required=True,
help="A raw SQL query to analyze")

parser.add_argument('-d', '--description', action="store_true",
help="Show descriptions of found errors")

parser.set_defaults(description=False)

args = parser.parse_args()


def cli():
print(main(args.query))
IntroPrinter(args.query).print()
output = main(args.query)
OutputPrinter(output).print(args.description)


if __name__ == '__main__':
print(main(args.query))
IntroPrinter(args.query).print()
output = main(args.query)
OutputPrinter(output).print(args.description)
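To see how the two command-line options added in this commit behave, here is a small sketch that rebuilds just those arguments. The `build_parser` helper is hypothetical: the real `cli.py` constructs its parser at import time and passes `ArgumentParser` arguments that are not visible in this diff. Explicit argument lists are parsed instead of `sys.argv` so the result is repeatable.

```python
import argparse


def build_parser():
    # Hypothetical helper that mirrors only the two options visible in the diff.
    parser = argparse.ArgumentParser()
    parser.add_argument('-q', '--query', metavar="", type=str, required=True,
                        help="A raw SQL query to analyze")
    parser.add_argument('-d', '--description', action="store_true",
                        help="Show descriptions of found errors")
    return parser


# Parse explicit lists rather than sys.argv so the example is deterministic.
with_desc = build_parser().parse_args(["-q", "SELECT * FROM purchases", "-d"])
without_desc = build_parser().parse_args(["--query", "SELECT * FROM purchases"])

print(with_desc.description)     # True
print(without_desc.description)  # False
```

Because `action="store_true"` already defaults to `False`, the `parser.set_defaults(description=False)` call in the diff does not change the parsed result.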
Empty file added sqleyes/definitions/__init__.py
20 changes: 20 additions & 0 deletions sqleyes/definitions/antipatterns/ambiguous_groups.md
@@ -0,0 +1,20 @@
### Ambiguous Groups

This anti-pattern occurs when developers misuse the `GROUP BY` clause.

Every column in a query's `SELECT` list must have a single value per row group, a requirement known as the **Single-Value Rule**. For columns named in the `GROUP BY` clause this is guaranteed, because each group has exactly one value for them, regardless of how many rows the group matches. Aggregate functions such as `MAX()`, `MIN()` and `AVG()` also produce a single value for each group, so they satisfy the rule as well.
For any other column listed in the `SELECT` clause, however, the database server cannot guarantee that every row in a group holds the same value. Depending on the DBMS, such a query is either rejected or silently returns a value taken from an arbitrary row of the group, which may produce erroneous results.

#### Example code

```SQL
SELECT CID, PID, MIN(date)
FROM customers JOIN shoppinglists USING (CID)
GROUP BY CID;
```

The code above shows a basic example of this anti-pattern. Because the `shoppinglists` table associates many products with a given customer, there are several distinct product ID values for each customer ID. A grouping query that reduces the result to a single row per customer has no way to represent all of those product IDs.

#### Fix

Always make sure that every column in the `SELECT` clause has a single value per group. This can be achieved by adding the column to the `GROUP BY` clause (grouping over multiple columns) or by wrapping it in an aggregate function.
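The Single-Value Rule can be illustrated with a deliberately simplified check over already-extracted column lists. This is a hypothetical sketch, not the `check_single_value_rule` helper imported by the detector (its implementation is not part of this diff); it assumes only `MAX`, `MIN`, `AVG`, `SUM` and `COUNT` count as aggregates and that column names are compared case-insensitively.

```python
import re

# Simplification (assumption): only these aggregate functions are recognised.
AGGREGATE = re.compile(r"^\s*(MAX|MIN|AVG|SUM|COUNT)\s*\(.*\)\s*$", re.IGNORECASE)


def ambiguous_columns(select_columns, group_by_columns):
    """Return the SELECT columns that break the Single-Value Rule."""
    grouped = {col.strip().lower() for col in group_by_columns}
    return [col for col in select_columns
            if not AGGREGATE.match(col) and col.strip().lower() not in grouped]


# The example query: PID is neither grouped nor aggregated.
print(ambiguous_columns(["CID", "PID", "MIN(date)"], ["CID"]))         # ['PID']
# The fix: group over both columns.
print(ambiguous_columns(["CID", "PID", "MIN(date)"], ["CID", "PID"]))  # []
```

A real detector also has to extract these column lists from raw SQL, which this sketch leaves out.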
17 changes: 17 additions & 0 deletions sqleyes/definitions/antipatterns/fear_of_the_unknown.md
@@ -0,0 +1,17 @@
### Fear of the Unknown

In SQL, values in columns can be left empty, which gives that attribute of the row a `NULL` value. SQL treats `NULL` as a special marker, distinct from zero, false, true, or an empty string. It is therefore not possible to test for `NULL` with standard comparison operators such as `=`, `>=` or `<>`; use `IS NULL` and `IS NOT NULL` instead.

#### Example code

```SQL
SELECT pName, suffix
FROM products
WHERE suffix <> NULL;
```

The query above selects the product name and suffix columns from the products table where the suffix is not equal to `NULL`. One might expect it to return every row that has a suffix, but this is not the case. Any comparison with `NULL` evaluates to _unknown_ rather than true or false, and a `WHERE` clause only keeps rows for which the condition is true. Therefore, this query returns no rows at all.

#### Fix

Use `IS NULL` and `IS NOT NULL` when comparing against `NULL` values.
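The behaviour described above can be reproduced with Python's built-in `sqlite3` module. This is a minimal sketch; the in-memory table and its rows are made up for illustration and only mirror the `products(pName, suffix)` columns used in the example query.

```python
import sqlite3

# Throwaway in-memory table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (pName TEXT, suffix TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [("Cat food", "XL"), ("Dog toy", None), ("Bird seed", "S")])

# suffix <> NULL evaluates to NULL (unknown) for every row, so WHERE drops them all.
wrong = conn.execute(
    "SELECT pName, suffix FROM products WHERE suffix <> NULL").fetchall()

# IS NOT NULL is a proper true/false test.
right = conn.execute(
    "SELECT pName, suffix FROM products WHERE suffix IS NOT NULL").fetchall()

print(wrong)       # []
print(len(right))  # 2
```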
16 changes: 16 additions & 0 deletions sqleyes/definitions/antipatterns/implicit_columns.md
@@ -0,0 +1,16 @@
### Implicit Columns

When writing a query that needs many columns, developers often opt for the SQL wildcard selector `*`. It returns every column of the table(s) in the `FROM` clause, so the column list is implicit instead of explicit. This makes the query more concise, but it comes at a cost: columns that are never used are still read and transferred, so the result set can become large for wide or large tables, which hurts the performance of the query.

#### Example code

```SQL
SELECT *
FROM purchases;
```

Suppose we have the task of finding all customer IDs, store IDs and product IDs, as well as the quantity and price, from the purchases table. The only columns of the purchases table that this task does not need are the purchase ID and date columns. The query above contains the implicit columns anti-pattern: instead of selecting only the columns the task requires, it uses the wildcard `*`, which returns a result that also includes the unneeded purchase ID and date columns.

#### Fix

Always explicitly select the columns you need. Use the wildcard `*` operator with caution.
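The difference between the implicit and explicit column lists can be inspected with `sqlite3` through `cursor.description`. This sketch assumes a hypothetical `purchases` schema; the column names (`purchaseID`, `CID`, `SID`, `PID`, `date`, `quantity`, `price`) are guesses based on the text, not taken from the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: column names are assumptions based on the description.
conn.execute("""CREATE TABLE purchases (
    purchaseID INTEGER PRIMARY KEY, CID INTEGER, SID INTEGER, PID INTEGER,
    date TEXT, quantity INTEGER, price REAL)""")
conn.execute("INSERT INTO purchases VALUES (1, 10, 3, 42, '2022-03-15', 2, 4.99)")


def column_names(sql):
    # cursor.description has one 7-tuple per result column; item 0 is the name.
    return [d[0] for d in conn.execute(sql).description]


implicit = column_names("SELECT * FROM purchases")
explicit = column_names("SELECT CID, SID, PID, quantity, price FROM purchases")

# Columns that SELECT * returns although the task never asked for them.
print(sorted(set(implicit) - set(explicit)))  # ['date', 'purchaseID']
```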
21 changes: 21 additions & 0 deletions sqleyes/definitions/antipatterns/poor_mans_search_engine.md
@@ -0,0 +1,21 @@
### Poor Man's Search Engine

Suppose we want to search for words or sentences in our database. The first thing that comes to mind is a SQL pattern-matching predicate, such as `LIKE` with a pattern or `REGEXP` with a regular expression. Both seem like a good option for full-text searches.

However, the main problem with pattern-matching predicates is their poor performance. A pattern with a leading wildcard, such as `'%cat%'`, or a regular expression cannot use a traditional index, so the database must scan every row of the specified tables. The overall cost of such a table scan is high, since matching a pattern against a column of strings is far more expensive than simpler comparisons such as integer equality.

Another problem with simple pattern matching using `LIKE` or regular expressions is that it can produce unintended matches, making the search results inaccurate. For example, the pattern `'%cat%'` also matches words such as "scatter" or "concatenate".

#### Example code

```SQL
SELECT *
FROM products
WHERE pName LIKE '%cat%';
```

The code above shows how one might search for products that have the word "cat" in their name. This should be avoided if the products table is large.

#### Fix

Use a dedicated full-text search facility instead of pattern matching. Many DBMSs ship one as standard, for example MySQL `FULLTEXT` indexes, PostgreSQL's `tsvector`/`tsquery` types, or SQLite's FTS5 extension.
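The accuracy problem can be shown with `sqlite3`. The product names here are invented. The Python-side whole-word filter is only there to contrast what the user probably meant with what `LIKE` returns; it is not a suggested fix, since the fix is a full-text search facility.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (pName TEXT)")
conn.executemany("INSERT INTO products VALUES (?)",
                 [("cat litter",), ("scatter cushion",),
                  ("concatenated cable",), ("dog bowl",)])

# Leading wildcard: matches "cat" anywhere, including inside other words.
rows = conn.execute(
    "SELECT pName FROM products WHERE pName LIKE '%cat%'").fetchall()
like_matches = sorted(name for (name,) in rows)

# What the search most likely intended: "cat" as a whole word.
word_matches = [name for name in like_matches if "cat" in name.split()]

print(like_matches)  # ['cat litter', 'concatenated cable', 'scatter cushion']
print(word_matches)  # ['cat litter']
```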
17 changes: 17 additions & 0 deletions sqleyes/definitions/antipatterns/random_selection.md
@@ -0,0 +1,17 @@
### Random Selection

When writing a query that needs to select a random row from a table, developers might use `ORDER BY RAND() LIMIT 1`, which uses the `RAND()` function to sort the rows randomly. However, this is not the best solution. With `RAND()` inside an `ORDER BY` clause no index can be used, since no index contains the values returned by the random function. This is a big concern for the query's performance, because an index is one of the most effective ways to speed up sorting. Without one, the database must read the entire table and sort the full result set, which makes the query slow on large tables.

#### Example code

```SQL
SELECT CID
FROM customers
ORDER BY RAND() LIMIT 1;
```

A typical use case for this anti-pattern is selecting a random customer ID from the customers table. The query above shows a typical (faulty) solution.

#### Fix

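Choose a random value by other means. One common approach is to generate a random number between 1 and the greatest primary key value and use it in a `WHERE` clause, for example by taking the first row whose key is greater than or equal to that number (gaps in the key values make this choice less than uniform). Another is to count the total number of rows, generate a random number between 0 and the row count minus one, and use it as the `OFFSET` of a `LIMIT 1` query.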
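A minimal sketch of the row-count approach, using `sqlite3` and a made-up `customers` table. The `random_customer_id` helper is hypothetical. The database still steps over `offset` rows, but ordering by the primary key avoids sorting the whole table by a random value, and gaps in the key range do not skew the choice.

```python
import random
import sqlite3


def random_customer_id(conn, rng=random):
    """Pick one CID uniformly at random without ORDER BY RAND()."""
    (count,) = conn.execute("SELECT COUNT(*) FROM customers").fetchone()
    if count == 0:
        return None
    offset = rng.randrange(count)  # 0 .. count - 1
    (cid,) = conn.execute(
        "SELECT CID FROM customers ORDER BY CID LIMIT 1 OFFSET ?",
        (offset,)).fetchone()
    return cid


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (CID INTEGER PRIMARY KEY, name TEXT)")
# Keys 3, 4, 6, 7 and 8 are missing on purpose.
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (5, "Cem"), (9, "Dee")])

print(random_customer_id(conn))  # one of 1, 2, 5, 9
```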
29 changes: 29 additions & 0 deletions sqleyes/definitions/definitions.py
@@ -0,0 +1,29 @@
DEFINITIONS = {
"anti_patterns": {
"ambiguous_groups": {
"filename": "ambiguous_groups.md",
"title": "Incorrect GROUP BY usage",
"type": "Ambiguous Groups"
},
"fear_of_the_unknown": {
"filename": "fear_of_the_unknown.md",
"title": "Incorrect NULL usage",
"type": "Fear of the Unknown"
},
"implicit_columns": {
"filename": "implicit_columns.md",
"title": "Avoid usage of wildcard selector",
"type": "Implicit Columns"
},
"poor_mans_search_engine": {
"filename": "poor_mans_search_engine.md",
"title": "Avoid pattern matching",
"type": "Poor Man's Search Engine"
},
"random_selection": {
"filename": "random_selection.md",
"title": "Avoid ORDER BY RAND() usage",
"type": "Random Selection"
}
}
}
9 changes: 8 additions & 1 deletion sqleyes/detector/antipatterns/abstract_base_class.py
@@ -1,6 +1,8 @@
"""Abstract anti-pattern detector class"""
from abc import ABC, abstractmethod

from sqleyes.utils.load_file import load_description


class AbstractDetector(ABC):
"""
@@ -16,8 +18,9 @@ class AbstractDetector(ABC):
query : str
The query to be searched for.
"""

filename: str = NotImplemented
type: str = NotImplemented
title: str = NotImplemented

@abstractmethod
def __init__(self, query):
@@ -27,3 +30,7 @@ def __init__(self, query):
@abstractmethod
def check(self):
pass

def get_description(self):
return load_description("sqleyes.definitions", "antipatterns/",
self.filename)
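The pattern introduced here is that the base class declares `filename`, `type` and `title` as `NotImplemented` placeholders, each concrete detector overrides them at class level, and the shared `get_description()` reads `self.filename` at call time. The sketch below is self-contained and hypothetical: a dictionary stands in for `load_description()` and the markdown files, and the toy `check()` is not the project's detection logic.

```python
from abc import ABC, abstractmethod

# Hypothetical stand-in for the markdown files read by load_description().
DESCRIPTIONS = {"implicit_columns.md": "### Implicit Columns"}


class AbstractDetector(ABC):
    # Placeholders that every concrete detector overrides at class level.
    filename: str = NotImplemented
    type: str = NotImplemented
    title: str = NotImplemented

    def __init__(self, query):
        self.query = query

    @abstractmethod
    def check(self):
        pass

    def get_description(self):
        # self.filename resolves to the subclass attribute at call time.
        return DESCRIPTIONS[self.filename]


class ImplicitColumnsDetector(AbstractDetector):
    filename = "implicit_columns.md"
    type = "Implicit Columns"
    title = "Avoid usage of wildcard selector"

    def check(self):
        # Toy check for illustration only.
        return "*" in self.query


detector = ImplicitColumnsDetector("SELECT * FROM purchases")
print(detector.check())            # True
print(detector.get_description())  # ### Implicit Columns
```

Since no detector overrides `get_description()`, the `super().get_description()` calls in the detectors behave the same as `self.get_description()`.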
6 changes: 5 additions & 1 deletion sqleyes/detector/antipatterns/ambiguous_groups.py
@@ -2,7 +2,7 @@
import re

from sqleyes.detector.antipatterns.abstract_base_class import AbstractDetector
from sqleyes.detector.definitions import DEFINITIONS
from sqleyes.definitions.definitions import DEFINITIONS
from sqleyes.detector.detector_output import DetectorOutput
from sqleyes.utils.query_functions import (check_single_value_rule,
get_columns_from_group_by_statement,
@@ -11,7 +11,9 @@

class AmbiguousGroupsDetector(AbstractDetector):

filename = DEFINITIONS["anti_patterns"]["ambiguous_groups"]["filename"]
type = DEFINITIONS["anti_patterns"]["ambiguous_groups"]["type"]
title = DEFINITIONS["anti_patterns"]["ambiguous_groups"]["title"]

def __init__(self, query):
super().__init__(query)
@@ -34,7 +36,9 @@ def check(self):

if not single_values:
return DetectorOutput(certainty="high",
description=super().get_description(),
detector_type=self.detector_type,
title=self.title,
type=self.type)

return None
6 changes: 5 additions & 1 deletion sqleyes/detector/antipatterns/fear_of_the_unknown.py
@@ -1,13 +1,15 @@
"""Fear of the Unknown anti-pattern detector class"""
import re
from sqleyes.detector.antipatterns.abstract_base_class import AbstractDetector
from sqleyes.detector.definitions import DEFINITIONS
from sqleyes.definitions.definitions import DEFINITIONS
from sqleyes.detector.detector_output import DetectorOutput


class FearOfTheUnknownDetector(AbstractDetector):

filename = DEFINITIONS["anti_patterns"]["fear_of_the_unknown"]["filename"]
type = DEFINITIONS["anti_patterns"]["fear_of_the_unknown"]["type"]
title = DEFINITIONS["anti_patterns"]["fear_of_the_unknown"]["title"]

def __init__(self, query):
super().__init__(query)
@@ -19,7 +21,9 @@ def check(self):
for pattern in patterns:
if pattern.search(self.query):
return DetectorOutput(certainty="high",
description=super().get_description(),
detector_type=self.detector_type,
title=self.title,
type=self.type)

return None
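The detector loops over a list of compiled `patterns`, but that list is not part of this diff. As a hypothetical illustration of how such patterns might flag comparisons with `NULL`, the sketch below uses two regular expressions of its own; they are assumptions, not the project's patterns. The `\b` after `NULL` keeps calls such as `NULLIF(...)` from matching.

```python
import re

# Hypothetical patterns; the detector's real `patterns` list is not shown in this diff.
PATTERNS = [
    re.compile(r"(=|<>|!=)\s*NULL\b", re.IGNORECASE),  # col = NULL, col <> NULL
    re.compile(r"\bNULL\s*(=|<>|!=)", re.IGNORECASE),  # NULL = col
]


def compares_with_null(query):
    return any(pattern.search(query) for pattern in PATTERNS)


print(compares_with_null("SELECT pName FROM products WHERE suffix <> NULL"))      # True
print(compares_with_null("SELECT pName FROM products WHERE suffix IS NOT NULL"))  # False
```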
6 changes: 5 additions & 1 deletion sqleyes/detector/antipatterns/implicit_columns.py
@@ -1,13 +1,15 @@
"""Implicit Columns anti-pattern detector class"""
import re
from sqleyes.detector.antipatterns.abstract_base_class import AbstractDetector
from sqleyes.detector.definitions import DEFINITIONS
from sqleyes.definitions.definitions import DEFINITIONS
from sqleyes.detector.detector_output import DetectorOutput


class ImplicitColumnsDetector(AbstractDetector):

filename = DEFINITIONS["anti_patterns"]["implicit_columns"]["filename"]
type = DEFINITIONS["anti_patterns"]["implicit_columns"]["type"]
title = DEFINITIONS["anti_patterns"]["implicit_columns"]["title"]

def __init__(self, query):
super().__init__(query)
@@ -17,7 +19,9 @@ def check(self):

if pattern.search(self.query):
return DetectorOutput(certainty="high",
description=super().get_description(),
detector_type=self.detector_type,
title=self.title,
type=self.type)

return None
6 changes: 5 additions & 1 deletion sqleyes/detector/antipatterns/poor_mans_search_engine.py
@@ -1,13 +1,15 @@
"""Poor Man's Search Engine anti-pattern detector class"""
import re
from sqleyes.detector.antipatterns.abstract_base_class import AbstractDetector
from sqleyes.detector.definitions import DEFINITIONS
from sqleyes.definitions.definitions import DEFINITIONS
from sqleyes.detector.detector_output import DetectorOutput


class PoorMansSearchEngineDetector(AbstractDetector):

filename = DEFINITIONS["anti_patterns"]["poor_mans_search_engine"]["filename"]
type = DEFINITIONS["anti_patterns"]["poor_mans_search_engine"]["type"]
title = DEFINITIONS["anti_patterns"]["poor_mans_search_engine"]["title"]

def __init__(self, query):
super().__init__(query)
@@ -19,7 +21,9 @@ def check(self):
for pattern in patterns:
if pattern.search(self.query):
return DetectorOutput(certainty="medium",
description=super().get_description(),
detector_type=self.detector_type,
title=self.title,
type=self.type)

return None
7 changes: 5 additions & 2 deletions sqleyes/detector/antipatterns/random_selection.py
@@ -1,13 +1,15 @@
"""Random Selection anti-pattern detector class"""
import re
from sqleyes.detector.antipatterns.abstract_base_class import AbstractDetector
from sqleyes.detector.definitions import DEFINITIONS
from sqleyes.definitions.definitions import DEFINITIONS
from sqleyes.detector.detector_output import DetectorOutput


class RandomSelectionDetector(AbstractDetector):

filename = DEFINITIONS["anti_patterns"]["random_selection"]["filename"]
type = DEFINITIONS["anti_patterns"]["random_selection"]["type"]
title = DEFINITIONS["anti_patterns"]["random_selection"]["title"]

def __init__(self, query):
super().__init__(query)
@@ -19,7 +21,8 @@ def check(self):
for pattern in patterns:
if pattern.search(self.query):
return DetectorOutput(certainty="high",
description=super().get_description(),
detector_type=self.detector_type,
title=self.title,
type=self.type)

return None
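The Random Selection detector's regular expressions are also not shown in this diff. The single pattern below is a hypothetical sketch: it covers MySQL's `RAND()` as well as the `RANDOM()` spelling used by PostgreSQL and SQLite. As a simplification, it would miss a random key that is not the first `ORDER BY` term (e.g. `ORDER BY name, RAND()`).

```python
import re

# Hypothetical pattern: MySQL spells it RAND(), PostgreSQL and SQLite RANDOM().
RANDOM_ORDER = re.compile(r"ORDER\s+BY\s+RAND(?:OM)?\s*\(\s*\)", re.IGNORECASE)

queries = [
    "SELECT CID FROM customers ORDER BY RAND() LIMIT 1;",
    "select cid from customers order by random() limit 1",
    "SELECT CID FROM customers ORDER BY CID LIMIT 1 OFFSET 3;",
]
flags = [bool(RANDOM_ORDER.search(q)) for q in queries]
print(flags)  # [True, True, False]
```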
19 changes: 0 additions & 19 deletions sqleyes/detector/definitions.py

This file was deleted.
