-
Notifications
You must be signed in to change notification settings - Fork 442
[WIP] Add preliminary Clickhouse formatter #922
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
a659fd2 to
80f6e1e
Compare
nene
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly in the right direction, but I spotted some issues.
Most glaringly the fact that formatting of SELECT clauses has been completely neglected.
|
I really would encourage you to add tests as early on as possible. You can start with just: import { format as originalFormat, FormatFn } from '../src/sqlFormatter.js';
import behavesLikeSqlFormatter from './behavesLikeSqlFormatter.js';
describe('ClickhouseFormatter', () => {
const language = 'clickhouse';
const format: FormatFn = (query, cfg = {}) => originalFormat(query, { ...cfg, language });
behavesLikeSqlFormatter(format);
// or maybe (I'm unsure how similar exactly Clickhouse is to PostgreSQL)
// behavesLikePostgresqlFormatter(format);
});That will make sure your configuration will work for the basic stuff that should be the same in all SQL dialects. If any of these bahavesLikeSqlFormatter() tests fail, then I would ask you to fix the problem anyway. It's fairly unlikely that there's something in Clickhouse dialect that would necessitate a change to these core tests. |
|
Also, it would be of great help if you went through the wiki and added information there about the Clickhouse dialect. Especially these first few pages about Identifiers, Parameters, ... Comments. Doing that will also help you to get these details right about Clickhouse dialect. |
|
@nene I appreciate you taking the time to look! I'm very aware that it's not nearly ready to land and that it needs quite a bit of love, I just didn't want to invest a ton more time if it's completely off the mark. I have a commit in progress with the tests; I'm trying to aim for nearly all of the example queries from the docs to get handled correctly. I'll ping you when I have something that's ready for review. |
|
Hey @nene, I've been making a lot of progress getting things in order. The code is in a much better place, but I wanted to pause before continuing to ask for your opinion, because I don't know that there's an obvious right answer to what I'm running into. There's two components to this, which are intertwined. The first part is that Clickhouse lets you drop/alter multiple resources at once. For instance: DROP TABLE foo, bar;would be valid according to https://clickhouse.com/docs/sql-reference/statements/drop#drop-table However, with which doesn't seem correct. It would seem to me that the more appropriate choice would be to make This makes it consistent with similar syntax, like with To my knowledge, there's no way to make a clause conditionally tabular, unless I'm missing something. On one hand, this could be made consistent with the other dialects and the feature tests would pass, but the multi-resource case would be pretty unfortunate. On the other hand, I could break convention with the other formatters and use a reserved clause here, and have a slightly less pleasant single-resource case that looks solid with multiple resources. As a note, -- SELECT foo
SELECT
fooSo this isn't completely unprecedented cosmetically. My personal preference is the second option, but I can also appreciate that you'd want the different formatters to behave consistently. The second piece is essentially the same issue. Consider this statement: ALTER ROW POLICY IF EXISTS policy1 ON CLUSTER cluster_name1 ON database1.table1 RENAME TO new_name1, policy2 ON CLUSTER cluster_name2 ON database2.table2 RENAME TO new_name2;
However, this disagrees with the existing built-in feature tests, which would format ALTER TABLE supplier RENAME TO the_one_who_suppliesas where Similar to the first item I noted above, I could break convention with the existing tests and follow the rules that I believe make Clickhouse format in a way that I believe looks the best and with internal consistency (making multi-resource statements into reserved clauses and having One other option (which addresses the tabular one-line clause vs reserved clause discrepancy, but not the I haven't done a lot of research to understand how big of an undertaking this would be, and as far as I can tell, this isn't something that exists already (but if it does and I've missed it, please let me know!) I'd love your thoughts on this. |
|
I think with both of these it's best to consider which is the common way one would use a statement. Like the Similarly with these policies. I would guess it's quite rare that one needs to alter multiple policies at once. So I wouldn't optimize the formatter for these rare cases, but rather for the most common case. In general... The thing is, that this formatter works using heuristics and it doesn't really understand SQL. It just looks for some patterns and then tries to make some best guesses. There's no shortage of cases where it does a poor job. But there's only so much it can do with this sort of limited architecture. To address these fundamental issues I built a new tool: prettier-plugin-sql-cst, which actually parses the SQL and is able to handle this sort of cases. Heh... actually now that I tested it, I discovered it messes up the |
nene
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few quick comments.
|
This pull request is automatically built and testable in CodeSandbox. To see build info of the built libraries, click here or the icon next to each commit SHA. |
|
Hey @nene, apologies for the delay in following up. It's been a busy month here. I think I've addressed all of your comments. Please let me know if there's anything that you'd like changed or that you aren't happy with in the current implementation. |
nene
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spotted a few issues, mostly minor. The main thing being about map literals.
| it('supports the ternary operator', () => { | ||
| // NOTE: Ternary operators have a missing space because | ||
| // ExpressionFormatter's `formatOperator` method special-cases `:`. | ||
| expect(format('SELECT foo?bar: baz;')).toBe('SELECT\n foo ? bar: baz;'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be more consistent to avoid using \n in all the tests and use the dedent utility instead.
| * when they are used as operators/modifiers (not function calls). | ||
| * | ||
| * IN operator: foo IN (1, 2, 3) - IN comes after an identifier/expression | ||
| * IN function: IN(foo, 1, 2, 3) - IN comes at start or after operators/keywords |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IN-function/operator logic was removed, but the comments still talk about it.
| regex: String.raw`\{\s*[^:]+:[^}]+\}`, | ||
| key: v => { | ||
| const match = /\{([^:]+):/.exec(v); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor inconsistency: the \s* is not necessary in the first regex.
| operators: [ | ||
| // Arithmetic | ||
| '%', // modulo | ||
|
|
||
| // Ternary | ||
| '?', | ||
| ':', | ||
|
|
||
| // Lambda creation | ||
| '->', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are missing some operators: ==, <=> and ||.
| export const isToken = { | ||
| ARRAY: testToken({ text: 'ARRAY', type: TokenType.RESERVED_DATA_TYPE }), | ||
| BY: testToken({ text: 'BY', type: TokenType.RESERVED_KEYWORD }), | ||
| IN: testToken({ text: 'IN', type: TokenType.RESERVED_KEYWORD }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added, but is unused.
| reservedKeywords: keywords, | ||
| reservedDataTypes: dataTypes, | ||
| reservedFunctionNames: functions, | ||
| extraParens: ['[]'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ClickHouse also supports map literals, so we should include '{}' in here.
But unfortunately that syntax conflicts with parameter syntax. It seems to me that in map literals the keys have to be quoted. I think we could distinguish these two based on that.
An example test which currently fails:
it('supports map literals', () => {
expect(format(`SELECT {'foo':1,'bar':10,'baz':2,'zap':8};`)).toBe(dedent`
SELECT
{'foo': 1, 'bar': 10, 'baz': 2, 'zap': 8};
`);
});|
Also, please run |
Addresses #614
I've started putting together a Clickhouse formatter. Before I start getting into the weeds with tests and updating all of the docs/tests/playground, I wanted to check that things are directionally correct. Please let me know if there's anything you'd like to see done differently before I proceed.