feat(ng-kernel): add the persistent kernel #251

skyzh · 2024-11-27T00:20:51Z

As a new starting point for optd, let's build up from an MVP instead of trying to get everything working at once.

We have a naive memo table and a persistent memo table. The memo table doesn't store cost or properties for now; it only finds duplicates. It's async.
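As a rough illustration of what "finds the duplicates" means, here is a minimal sketch of a deduplicating memo table. It is synchronous and in-memory for brevity, whereas the memo table in this PR is async. `NaiveMemo`, `Expr`, and `add_expr` are hypothetical names chosen for this example and are not taken from the diff.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a memoized expression.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ExprId(pub usize);

/// Hypothetical expression: an operator name plus the ids of its children.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Expr {
    pub op: String,
    pub children: Vec<ExprId>,
}

/// A toy memo table that only deduplicates expressions.
/// It stores no cost and no properties.
#[derive(Default)]
pub struct NaiveMemo {
    ids: HashMap<Expr, ExprId>,
}

impl NaiveMemo {
    /// Returns the id of `expr` and whether it was newly inserted.
    /// A structurally identical expression always maps to the same id.
    pub fn add_expr(&mut self, expr: Expr) -> (ExprId, bool) {
        if let Some(&id) = self.ids.get(&expr) {
            return (id, false);
        }
        // Ids are assigned densely in insertion order.
        let id = ExprId(self.ids.len());
        self.ids.insert(expr, id);
        (id, true)
    }
}
```

Adding the same expression twice returns the same id, with the second call reporting that nothing new was inserted.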

On the persistent side, we use sqlx to run queries against an in-memory SQLite database, which eliminates external dependencies when running tests. The persistent memo table currently supports dedup predicates.

Signed-off-by: Alex Chi <[email protected]>

connortsui20 · 2024-11-27T01:41:55Z

optd-ng-kernel/src/cascades.rs

+pub mod memo;
+pub mod naive_memo;
+pub mod optimizer;
+pub mod persistent_memo;


Every file needs module-level documentation. And following the standard of most modern Rust open-source projects, this should be in src/cascades/mod.rs, especially since it is just making its submodules public.

well, most of them were pub(crate) in the old codebase

connortsui20 · 2024-11-27T01:43:59Z

optd-ng-kernel/Cargo.toml

+anyhow = "1"
+async-recursion = "1"
+async-trait = "0.1"
+arrow-schema = "47.0.0"
+tracing = "0.1"
+ordered-float = "4"
+itertools = "0.13"
+serde = { version = "1.0", features = ["derive", "rc"] }
+chrono = "0.4"
+sqlx = { version = "0.8", features = [
+    "runtime-tokio",
+    "sqlite",
+] } # TODO: strip the features, move to another crate
+serde_json = { version = "1" } # TODO: move to another crate


We should strive to make our dependency versioning consistent. Ideally, we follow minimal versions (-Zminimal-versions), but I can understand that might be tedious. At the very least, we should choose something consistent.


connortsui20 · 2024-11-27T01:47:18Z

optd-ng-kernel/src/cascades/optimizer.rs

+#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug, Default, Hash)]
+pub struct GroupId(pub(super) usize);
+
+#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug, Default, Hash)]
+pub struct ExprId(pub usize);
+
+#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug, Default, Hash)]
+pub struct PredId(pub usize);


These all need documentation, even if it is just a single line. Someone reading this might have no idea what PredId actually represents.
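A single-line doc comment per type might look like the sketch below. The wording of each comment is an assumption about what the ids represent (inferred from the names and from the usual Cascades terminology), not something the diff states. The fields are all `pub` here so the snippet compiles on its own; the diff uses `pub(super)` for `GroupId`.

```rust
/// Identifies a group in the memo table, i.e. a set of logically equivalent
/// expressions. (Assumed meaning.)
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug, Default, Hash)]
pub struct GroupId(pub usize);

/// Identifies a single expression stored in the memo table. (Assumed meaning.)
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug, Default, Hash)]
pub struct ExprId(pub usize);

/// Identifies a deduplicated predicate stored in the memo table.
/// (Assumed meaning.)
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug, Default, Hash)]
pub struct PredId(pub usize);
```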

skyzh · 2024-11-27T01:47:42Z

The next steps:

We should split the SQL migration things into a new crate, and potentially use an ORM, though I think an ORM is too heavy in our case -- I don't think we want to worry about schema changes of optd-core for now.

On the memo table side, we can continue adding new functionalities: logical props, winner, etc.

And, with the current memo table, we can already implement 4 out of the 5 Cascades tasks (optimize inputs needs to know and update the winner).
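To make the "4 out of 5" count concrete, here is a small sketch that enumerates the five task kinds usually described for Cascades-style optimizers and marks which one depends on the winner. `TaskKind`, `needs_winner`, and `implementable_without_winner` are hypothetical names for this example, and the winner dependency follows the comment above and not the code in this PR.

```rust
/// The five task kinds commonly described for a Cascades-style optimizer.
/// Hypothetical enum for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TaskKind {
    OptimizeGroup,
    ExploreGroup,
    OptimizeExpression,
    ApplyRule,
    OptimizeInputs,
}

impl TaskKind {
    /// Every task kind, in a fixed order.
    pub const ALL: [TaskKind; 5] = [
        TaskKind::OptimizeGroup,
        TaskKind::ExploreGroup,
        TaskKind::OptimizeExpression,
        TaskKind::ApplyRule,
        TaskKind::OptimizeInputs,
    ];

    /// Whether the task must read or update a group's winner.
    /// Per the comment above, only "optimize inputs" does.
    pub fn needs_winner(self) -> bool {
        matches!(self, TaskKind::OptimizeInputs)
    }
}

/// Task kinds that a memo table without winner support can already serve.
pub fn implementable_without_winner() -> Vec<TaskKind> {
    TaskKind::ALL
        .iter()
        .copied()
        .filter(|t| !t.needs_winner())
        .collect()
}
```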

Plus, I think we can start documenting whatever we have now and cut down the features we need from these crates. That is, we should only use sqlx `Any` or implement the `Connection` interface in the core, and let the CLI decide which database backend to use.

There are also a few optimization opportunities in the memo table -- creating indexes, and a better group merge (can we really do lazy merging in SQL?).

connortsui20 · 2024-11-27T01:48:30Z

optd-ng-kernel/src/cascades/optimizer.rs

+impl Display for GroupId {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        write!(f, "!{}", self.0)
+    }
+}


Is there a reason why there is an exclamation mark here? Shouldn't this just be something like:

impl Display for GroupId {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "GroupId: {}", self.0)
    }
}

If there is a reason, then it should be documented.

the old convention from optd-core that makes dump pretty...

That should be documented! Even if this was in the old repo, it's not obvious that this is the purpose.
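One way to document the convention is a doc comment on the `Display` impl itself, as in the sketch below. The comment text paraphrases the explanation given in this thread (a convention carried over from optd-core to keep plan dumps compact); the `GroupId` field is `pub` here only so the snippet stands alone.

```rust
use std::fmt::{self, Display};

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug, Default, Hash)]
pub struct GroupId(pub usize);

/// Formats a group id as `!<id>`, e.g. `!3`.
///
/// The `!` prefix is a convention carried over from optd-core: it keeps
/// group references short and easy to spot when a plan is dumped as text.
impl Display for GroupId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "!{}", self.0)
    }
}
```

With this impl, `GroupId(7).to_string()` yields `!7`.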

connortsui20 · 2024-11-27T01:49:19Z

optd-ng-kernel/src/nodes.rs

+//! The RelNode is the basic data structure of the optimizer. It is dynamically typed and is
+//! the internal representation of the plan nodes.


This is a good start, but this could be much more specific. What else does this file contain?

connortsui20 · 2024-11-27T01:50:51Z