Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing check for NullInt64.Valid in Generic.Count #374

Open
indradhanush opened this issue Dec 6, 2024 · 7 comments
Open

Missing check for NullInt64.Valid in Generic.Count #374

indradhanush opened this issue Dec 6, 2024 · 7 comments

Comments

@indradhanush
Copy link

We're running kine with mysql and see this error intermittently:

│ time="2024-12-04T18:12:50.283533302Z" level=error msg="error while range on /registry/statefulsets/ /registry/statefulsets/: sql: Scan error on column index 0, name \"prev_revision\": converting NULL to int64 is unsupported"

The error originates when LimitedServer.Range invokes the method list and ultimately invoking Generic.Count.

While the error goes away when we restart kine (thus our hunch being this related to a race condition) but it does show up again after a few days. We haven't yet narrowed down on what triggers this.

In the meantime we would be interested in contributing a fix for this. While the fix in itself is simple - check for NullInt64.Valid, but we're not sure what the error handling should do. Should it return a custom error? I'd love your thoughts before I work on a PR.

Note: The same handling should also be applied to Generic.CountCurrent and perhaps also to the other usage of NullInt64 in sqllog.scan.

@brandond
Copy link
Member

brandond commented Dec 6, 2024

What version of mysql are you using? Can you reproduce this on demand?

I'm curious what the actual query is returning in this case. I don't think it's in generic.Count because that only uses the max revision via SELECT MAX(rkv.id) AS id FROM kine AS rkv, and the error is about the prev_revision column which is only needed when reading an entry.

@indradhanush
Copy link
Author

indradhanush commented Dec 10, 2024

What version of mysql are you using? Can you reproduce this on demand?

We're running MySQL 8.0.25. Unfortunately we haven't been able to reliably reproduce this yet.

I don't think it's in generic.Count because that only uses the max revision via SELECT MAX(rkv.id) AS id FROM kine AS rkv, and the error is about the prev_revision column which is only needed when reading an entry.

I traced the usage of that query which is defined as compactRevSQL and see that:

I don't see another code path for this, so it does appear to be coming from generic.Count, unless I'm missing something?

Question is, do we expect this query to ever return null:

SELECT
  MAX(crkv.prev_revision) AS prev_revision
FROM
  kine AS crkv
WHERE
  crkv.name = 'compact_rev_key'

If yes, then under what conditions?

@brandond
Copy link
Member

Yeah, I am assuming there is an underlying SQL error here that is being obscured by the error from scan. There should never not be a compact_rev_key entry in the db.

@supreeth90
Copy link

@brandond can this happen if the DB is not initialized yet(table kine does not exist yet) ? That can be a race condition ? We have seen this happen only we bring up a new control plane.

@brandond
Copy link
Member

brandond commented Jan 9, 2025

Kine shouldn't even come up until the DB is initialized, which includes creating the compact_rev_key entry. Can you provide steps to reproduce?

@supreeth90
Copy link

thats the tricky part, we have not been able reproduce it ourselves and it is happening in our production environment intermittently.
@brandond Thanks for your quick responses! Just confirming that kine does not create the kine table ? And it has to be done outside before running kine ?

@brandond
Copy link
Member

brandond commented Jan 10, 2025

Just confirming that kine does not create the kine table ? And it has to be done outside before running kine ?

No, kine expects to create the table. We don't document the schema anywhere outside the code so I'm not sure why you'd expect to have to create it manually.

  • `CREATE TABLE IF NOT EXISTS kine
    (
    id BIGINT UNSIGNED AUTO_INCREMENT,
    name VARCHAR(630) CHARACTER SET ascii,
    created INTEGER,
    deleted INTEGER,
    create_revision BIGINT UNSIGNED,
    prev_revision BIGINT UNSIGNED,
    lease INTEGER,
    value MEDIUMBLOB,
    old_value MEDIUMBLOB,
    PRIMARY KEY (id)
    );`,
    `CREATE INDEX kine_name_index ON kine (name)`,
    `CREATE INDEX kine_name_id_index ON kine (name,id)`,
    `CREATE INDEX kine_id_deleted_index ON kine (id,deleted)`,
    `CREATE INDEX kine_prev_revision_index ON kine (prev_revision)`,
    `CREATE UNIQUE INDEX kine_name_prev_revision_uindex ON kine (name, prev_revision)`,

Are you creating it manually ahead of time with unexpected schema or missing indices?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants