Vldbss 2025(merge on demand) #581

Open
wants to merge 11 commits into
base: main
1 change: 0 additions & 1 deletion .clang-format
@@ -81,4 +81,3 @@ MacroBlockBegin: "
END_CATCH_ERROR$"
...


2 changes: 1 addition & 1 deletion benchmark/pax_storage_concurrency_test.cpp
@@ -117,7 +117,7 @@ class BenchmarkBase : public Fixture
table_meta_->fields_[1].attr_len_ = 11;
table_meta_->fields_[1].field_id_ = 1;
handler_ = new RecordFileHandler(StorageFormat::PAX_FORMAT);
- rc = handler_->init(*buffer_pool_, log_handler_, table_meta_);
+ rc = handler_->init(*buffer_pool_, log_handler_, table_meta_, nullptr);
if (rc != RC::SUCCESS) {
LOG_WARN("failed to init record file handler. rc=%s", strrc(rc));
throw runtime_error("failed to init record file handler");
2 changes: 1 addition & 1 deletion benchmark/record_manager_concurrency_test.cpp
@@ -106,7 +106,7 @@ class BenchmarkBase : public Fixture
}

handler_ = new RecordFileHandler(StorageFormat::ROW_FORMAT);
- rc = handler_->init(*buffer_pool_, log_handler_, nullptr);
+ rc = handler_->init(*buffer_pool_, log_handler_, nullptr, nullptr);
if (rc != RC::SUCCESS) {
LOG_WARN("failed to init record file handler. rc=%s", strrc(rc));
throw runtime_error("failed to init record file handler");
1 change: 1 addition & 0 deletions build.sh
@@ -78,6 +78,7 @@ function do_init
git -C "deps/3rd/libevent" checkout 112421c8fa4840acd73502f2ab6a674fc025de37 || return
# git submodule update --remote "deps/3rd/libevent" || return
git -C "deps/3rd/jsoncpp" checkout 1.9.6 || return

current_dir=$PWD

MAKE_COMMAND="make --silent"
32 changes: 32 additions & 0 deletions docs/docs/design/miniob-how-to-add-new-datatype.md
@@ -0,0 +1,32 @@
---
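title: How to Add a New Data Type
---

> This document describes how to add a new data type.

MiniOB's data type system uses a layered design. The implementation lives in the [src/observer/common](../../../src/observer/common) directory, and its core components are:
1. The Value class: a unified interface for data operations
Path: src/observer/common/value.h
Purpose: wraps the actual data value and provides type-independent operations
2. Type utility classes: the operations for each specific type
Path: src/observer/common/type/
Purpose: each data type has a corresponding utility class that implements its concrete operations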

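The following example shows how MiniOB handles integer data:
```cpp
// Assume the parser has recognized the integer "1"
int val = 1;
Value value(val); // wrap it in a Value object
// Perform an addition
Value result;
Value::add(value, value, result); // call the addition interface
// Value::add dispatches internally to the utility class that matches the type.
// For the INT type, the code that actually runs is in:
// src/observer/common/type/integer_type.cpp
```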

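# To add a new data type (for example DATE), the following steps are recommended:
1. Add the new type's enum value and its type name in src/observer/common/type/attr_type.h
2. Add an instance of the new type in src/observer/common/type/data_type.cpp
3. In the src/observer/common/type/ directory, implement a DateType utility class, using the existing utility classes as a reference
4. Add handling for the new type to the Value class, so that it can dispatch on the date type and store date values
5. If necessary, also add new lexer rules (lex_sql.l) and grammar rules (yacc_sql.y) to support the new type's keyword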
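
The steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not MiniOB's actual code: the `AttrType`, `DataType` and `DateType` definitions are simplified stand-ins, `parse_date` is a hypothetical helper, and storing a date as an `int32_t` in `YYYYMMDD` form is an assumption made only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Simplified stand-in for the enum in attr_type.h (step 1); DATES is the new entry.
enum class AttrType { UNDEFINED, CHARS, INTS, FLOATS, DATES, MAXTYPE };

// Simplified stand-in for the DataType base class. The real class operates on
// Value objects; plain integers are used here to keep the sketch self-contained.
class DataType
{
public:
  explicit DataType(AttrType type) : type_(type) {}
  virtual ~DataType() = default;
  // Default: comparison not supported for this type.
  virtual int compare(int32_t, int32_t) const { return INT32_MAX; }
  AttrType type() const { return type_; }

private:
  AttrType type_;
};

// Hypothetical DateType utility class (step 3).
class DateType : public DataType
{
public:
  DateType() : DataType(AttrType::DATES) {}
  // YYYYMMDD integers order the same way as the dates they encode.
  int compare(int32_t left, int32_t right) const override
  {
    return (left > right) - (left < right);
  }
};

// Hypothetical helper: parse "YYYY-MM-DD" into a YYYYMMDD integer.
// Returns false when the text is not a valid calendar date.
bool parse_date(const char *s, int32_t &out)
{
  assert(s != nullptr);
  int y = 0, m = 0, d = 0;
  if (std::sscanf(s, "%d-%d-%d", &y, &m, &d) != 3) {
    return false;
  }
  if (y < 1 || y > 9999 || m < 1 || m > 12 || d < 1) {
    return false;
  }
  static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
  const bool leap    = (y % 4 == 0 && y % 100 != 0) || (y % 400 == 0);
  const int  max_day = days[m - 1] + ((m == 2 && leap) ? 1 : 0);
  if (d > max_day) {
    return false;
  }
  out = y * 10000 + m * 100 + d;
  return true;
}
```

In a real implementation the parsed value would be stored through the Value class (step 4), and the lexer and grammar would need a rule that recognizes the date literal (step 5).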
522 changes: 522 additions & 0 deletions docs/docs/design/miniob-realtime-analytic.md

Large diffs are not rendered by default.

169 changes: 169 additions & 0 deletions docs/docs/design/miniob-sql-execution-process.md
@@ -0,0 +1,169 @@
See the diagram: [image](images/miniob-sql-execution-process.png)

The PlantUML sequence diagram was generated with https://www.plantuml.com/plantuml
The source is as follows:
```plantuml
@startuml
title SQL Execution Flow Sequence Diagram\n

skinparam sequence {
ArrowColor #003366
LifeLineBorderColor #003366
LifeLineBackgroundColor #F0F8FF
ParticipantBorderColor #003366
ParticipantBackgroundColor #E6F7FF
ParticipantFontColor #003366
NoteBackgroundColor #FFF2E6
NoteBorderColor #FFA940
BoxPadding 20
}

actor "Client" as Client
participant "Server" as Server
participant "TaskHandler" as TaskHandler
participant "SessionStage" as Session
participant "ParseStage" as Parser
participant "ResolveStage" as Resolver
participant "OptimizeStage" as Optimizer
participant "ExecuteStage" as Executor
participant "ResultHandler" as Result

autonumber "<b>[000]"

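Client -> Server: SQL command (network request)
note right: 1. Client sends SQL to the server port\n - Establish TCP connection\n - Transmit SQL text
activate Server

Server -> TaskHandler: handle_event(communicator)
activate TaskHandler
note right: 2. Task handler initialization\n - Create event object\n - Bind communicator\n - Prepare processing environment

TaskHandler -> Session: handle_request2(event)
activate Session
Session --> TaskHandler: Session ready
deactivate Session
note right Session: 3. Session management\n - Authenticate user\n - Set database context\n - Maintain session state

group "SQL processing core (chain of responsibility pattern)"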
TaskHandler -> Parser: handle_request(sql_event)
activate Parser
group "ParseStage.handle_request"
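Parser -> Parser: parse(sql.c_str())
note right: 4.1 Start SQL parsing\n - Prepare parsing environment
Parser -> Parser: yylex_init_extra()
note right: 4.2 Initialize lexical scanner\n - Allocate scanner resources\n - Set up string buffer
Parser -> Parser: yyparse(scanner)
note right: 4.3 Core parsing\n - Flex lexical analysis\n - Bison syntax analysis\n - Build AST node tree
Parser -> Parser: yylex_destroy()
note right: 4.4 Release scanner resources\n - Clean up lexer state
Parser --> TaskHandler: Return AST
end
deactivate Parser
note right: 5. Parsing complete\n - Output abstract syntax tree (AST)\n - Supports statement structures such as SELECT/INSERT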

TaskHandler -> Resolver: handle_request(sql_event)
activate Resolver
group "ResolveStage.handle_request"
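Resolver -> Resolver: Stmt::create_stmt()
note right: 6.1 Statement conversion entry point\n - Dispatch by AST node type
Resolver -> Resolver: Process nodes recursively
note right: 6.2 Depth-first traversal of the syntax tree\n - Process subexpressions\n - Build the complete statement structure
Resolver -> Resolver: Create concrete Stmt object
note right: 6.3 Generate executable statement\n - SelectStmt\n - InsertStmt\n - UpdateStmt, etc.
Resolver --> TaskHandler: Return Stmt
end
deactivate Resolver
note right: 7. Conversion complete\n - AST → executable Stmt object\n - Semantic analysis done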

TaskHandler -> Optimizer: handle_request(sql_event)
activate Optimizer
group "OptimizeStage.handle_request"
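Optimizer -> Optimizer: create_logical_plan()
note right: 8.1 Create initial logical plan\n - Scan operators\n - Join operators\n - Projection operators
Optimizer -> Optimizer: logical_plan_generator.create()
note right: 8.2 Generate logical operator tree\n - Process Stmt objects recursively

group "Apply rewrite rules"
Optimizer -> Optimizer: ComparisonSimplificationRule.rewrite()
note right: 9.1 Comparison expression simplification\n e.g. 1=1 → true\n Constant folding
Optimizer -> Optimizer: ConjunctionSimplificationRule.rewrite()
note right: 9.2 Conjunction expression simplification\n e.g. false AND expr → false\n Short-circuit logic
Optimizer -> Optimizer: PredicateRewriteRule.rewrite()
note right: 9.3 Predicate rewrite\n Remove always-true expressions\n Simplify condition structure
Optimizer -> Optimizer: PredicatePushdownRewriter.rewrite()
note right: 9.4 Predicate pushdown\n Push filter conditions down to the scan layer\n Reduce intermediate result sets
end

Optimizer -> Optimizer: optimize()
note right: 10. Cost-based optimization\n - Choose the best join order\n - Index selection\n - Access path optimization

Optimizer -> Optimizer: generate_physical_plan()
alt Vectorized model
Optimizer -> Optimizer: physical_plan_generator.create_vec()
note right: 11.1 Vectorized execution plan\n - Batch processing (1024 rows per batch)\n - Columnar memory layout\n - Modern OLAP optimization
else Volcano model
Optimizer -> Optimizer: physical_plan_generator.create()
note right: 11.2 Iterator-style execution plan\n - Row-at-a-time processing\n - next() interface model\n - Traditional OLTP optimization
end
Optimizer --> TaskHandler: Return execution plan
end
deactivate Optimizer
note right: 12. Optimization complete\n - Physical execution plan generated\n - Execution environment prepared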

TaskHandler -> Executor: handle_request(sql_event)
activate Executor
group "ExecuteStage.handle_request"
alt Has physical operator (DML)
Executor -> Executor: handle_request_with_physical_operator()
note right: 13.1 DML execution path\n - SELECT: run query\n - INSERT: insert data\n - DELETE: delete data
Executor --> TaskHandler: DML result
else No physical operator (DDL)
Executor -> Executor: command_executor.execute()
note right: 13.2 DDL execution path\n - CREATE TABLE\n - CREATE INDEX\n - ALTER TABLE
Executor --> TaskHandler: DDL result
end
end
deactivate Executor
note right: 14. Execution complete\n - Data changes take effect\n - Query results ready
end

TaskHandler -> Result: write_result(event)
activate Result

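group "ResultHandler processing"
alt No result set (DDL/update)
Result -> Result: write_result_internal()
note right: 15.1 Simple result handling\n - Operation status (success/failure)\n - Affected row count
Result --> Client: Status code
else Has result set (query)
Result -> Result: sql_result->open()
note right: 15.2 Initialize result set\n - Prepare data buffer\n - Fetch header information
Result -> Result: Print header
note right: 15.3 Output column information\n - Column name\n - Data type\n - Column width

alt Vectorized model
Result -> Result: write_chunk_result()
note right: 16.1 Vectorized result return\n - Transfer data chunks in batches\n - Efficient network utilization\n - Lower serialization overhead
Result --> Client: Batch data
else Volcano model
Result -> Result: write_tuple_result()
note right: 16.2 Iterator-style result return\n - Fetch data row by row\n - Streaming transfer
loop Row-by-row processing
Result --> Client: Single row
end
end
end
end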

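Result --> TaskHandler: Done
deactivate Result

TaskHandler --> Server: Processing complete
deactivate TaskHandler

Server --> Client: Final acknowledgement
deactivate Server
note right: 17. Full flow finished\n - Release all resources\n - Maintain connection state\n - Ready for the next SQL statement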

@enduml
```
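
The diagram describes the SQL processing core as a chain of stages that each expose a `handle_request` method. The sketch below illustrates that control flow only. It assumes a simplified `RC` enum and a hypothetical `SqlEvent`, `Stage`, `NamedStage` and `run_pipeline`; none of these are MiniOB's real definitions, and the real stages do actual parsing, resolving, optimizing and executing rather than recording their names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical return code; the real RC enum has many more values.
enum class RC { SUCCESS, INVALID_ARGUMENT };

// Hypothetical event handed from stage to stage.
struct SqlEvent
{
  std::string              sql;
  std::vector<std::string> trace;  // records which stages ran (illustration only)
};

// Common interface shared by all stages in this sketch.
class Stage
{
public:
  virtual ~Stage() = default;
  virtual RC handle_request(SqlEvent &event) = 0;
};

// Stand-in for the parse stage: rejects empty input, otherwise records itself.
class ParseStage : public Stage
{
public:
  RC handle_request(SqlEvent &event) override
  {
    if (event.sql.empty()) {
      return RC::INVALID_ARGUMENT;
    }
    event.trace.push_back("parse");
    return RC::SUCCESS;
  }
};

// Stand-in for the remaining stages: only records that the stage ran.
class NamedStage : public Stage
{
public:
  explicit NamedStage(std::string name) : name_(std::move(name)) {}
  RC handle_request(SqlEvent &event) override
  {
    event.trace.push_back(name_);
    return RC::SUCCESS;
  }

private:
  std::string name_;
};

// Runs the stages in order and stops at the first one that fails.
RC run_pipeline(SqlEvent &event)
{
  ParseStage parse;
  NamedStage resolve("resolve");
  NamedStage optimize("optimize");
  NamedStage execute("execute");
  Stage *stages[] = {&parse, &resolve, &optimize, &execute};
  for (Stage *stage : stages) {
    assert(stage != nullptr);
    RC rc = stage->handle_request(event);
    if (rc != RC::SUCCESS) {
      return rc;
    }
  }
  return RC::SUCCESS;
}
```

Returning early on the first failure mirrors the idea that a statement which cannot be parsed never reaches the later stages.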
29 changes: 29 additions & 0 deletions docs/docs/how_to_build.md
@@ -160,3 +160,32 @@ git config --global core.autocrlf false
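For more details on this issue, see the [original report](https://ask.oceanbase.com/t/topic/35604437/7).
For further analysis of this issue, see [Environment variables lost when running sudo on Linux](https://zhuanlan.zhihu.com/p/669332689).
Alternatively, the issue can be solved by adding the directory that contains cmake to sudo's PATH; see [Fixing lost environment variables under sudo](https://www.cnblogs.com/xiao-xiaoyang/p/17444600.html).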


### 3. Could not find a package configuration file provided by "Libevent"
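The following error occurs when running the build.sh script:
![cmake error](images/miniob-build-libevent.png)

This is usually caused by the cmake version (possibly too new), which makes libevent fail to build during the init stage.

***Solution:***

In [deps/3rd/libevent/CMakeLists.txt](../../deps/3rd/libevent/CMakeLists.txt), change the minimum cmake version setting from
cmake_minimum_required(VERSION 3.1 FATAL_ERROR)
to
cmake_minimum_required(VERSION 3.1...3.8 FATAL_ERROR)
and then run the following again: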
```bash
sudo bash build.sh init
```

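Once the libevent problem is solved, you will most likely run into another error:
![cmake error](images/miniob-build-jsoncpp.png)
In [deps/3rd/jsoncpp/jsoncppConfig.cmake.in](../../deps/3rd/jsoncpp/jsoncppConfig.cmake.in), change the cmake policy from
cmake_policy(VERSION 3.0)
to
cmake_policy(VERSION 3.0...3.8)
and then run the following again: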
```bash
sudo bash build.sh init
```
Binary file added docs/docs/images/miniob-build-jsoncpp.png
Binary file added docs/docs/images/miniob-build-libevent.png
13 changes: 13 additions & 0 deletions src/common/lang/comparator.cpp
@@ -32,6 +32,19 @@ int compare_int(void *arg1, void *arg2)
}
}

int compare_int64(void *arg1, void *arg2)
{
int64_t v1 = *(int64_t *)arg1;
int64_t v2 = *(int64_t *)arg2;
if (v1 > v2) {
return 1;
} else if (v1 < v2) {
return -1;
} else {
return 0;
}
}
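
A comparator for 64-bit values has to keep both operands in 64-bit variables, because copying them into `int` first can discard the high bits and change the result. The sketch below contrasts the two approaches. The names `compare_int64_full` and `compare_int64_truncating` are hypothetical and used only for illustration, and the behaviour shown for the narrowing version assumes a platform where `int` is 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Compares two int64_t values without narrowing them.
// Returns 1, -1 or 0, like the other comparators.
int compare_int64_full(void *arg1, void *arg2)
{
  assert(arg1 != nullptr && arg2 != nullptr);
  const int64_t v1 = *static_cast<int64_t *>(arg1);
  const int64_t v2 = *static_cast<int64_t *>(arg2);
  return (v1 > v2) - (v1 < v2);
}

// Same logic, but the operands are narrowed to int before the comparison.
// With a 32-bit int, values that differ only above bit 31 compare incorrectly.
int compare_int64_truncating(void *arg1, void *arg2)
{
  assert(arg1 != nullptr && arg2 != nullptr);
  const int v1 = static_cast<int>(*static_cast<int64_t *>(arg1));
  const int v2 = static_cast<int>(*static_cast<int64_t *>(arg2));
  return (v1 > v2) - (v1 < v2);
}
```

For example, with `a = 4294967296` (2^32) and `b = 1`, the full-width version reports `a > b`, while the narrowing version typically sees `a` as 0 and reports `a < b`.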

int compare_float(void *arg1, void *arg2)
{
float v1 = *(float *)arg1;
1 change: 1 addition & 0 deletions src/common/lang/comparator.h
@@ -17,6 +17,7 @@ See the Mulan PSL v2 for more details. */
namespace common {

int compare_int(void *arg1, void *arg2);
int compare_int64(void *arg1, void *arg2);
int compare_float(void *arg1, void *arg2);
int compare_string(void *arg1, int arg1_max_length, void *arg2, int arg2_max_length);

10 changes: 10 additions & 0 deletions src/observer/common/type/attr_type.cpp
@@ -31,3 +31,13 @@ AttrType attr_type_from_string(const char *s)
}
return AttrType::UNDEFINED;
}

bool is_numerical_type(AttrType type)
{
return (type == AttrType::INTS || type == AttrType::FLOATS);
}

bool is_string_type(AttrType type)
{
return (type == AttrType::CHARS);
}
2 changes: 2 additions & 0 deletions src/observer/common/type/attr_type.h
@@ -27,3 +27,5 @@ enum class AttrType

const char *attr_type_to_string(AttrType type);
AttrType attr_type_from_string(const char *s);
bool is_numerical_type(AttrType type);
bool is_string_type(AttrType type);
3 changes: 3 additions & 0 deletions src/observer/common/type/data_type.cpp
@@ -14,6 +14,9 @@ See the Mulan PSL v2 for more details. */
#include "common/type/data_type.h"
#include "common/type/vector_type.h"

// Todo: 实现新数据类型
// your code here

Comment on lines +17 to +19
Copilot AI Jul 19, 2025

The TODO comment '实现新数据类型' (implement new data types) should be in English and should be removed if the implementation is complete.

Suggested change
// Todo: 实现新数据类型
// your code here

array<unique_ptr<DataType>, static_cast<int>(AttrType::MAXTYPE)> DataType::type_instances_ = {
make_unique<DataType>(AttrType::UNDEFINED),
make_unique<CharType>(),
4 changes: 4 additions & 0 deletions src/observer/common/type/data_type.h
@@ -17,13 +17,15 @@ See the Mulan PSL v2 for more details. */
#include "common/type/attr_type.h"

class Value;
class Column;

/**
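* @brief Defines operations on data types, such as comparison and arithmetic
* @defgroup DataType
* @details Arithmetic operations defined on a data type, such as add and subtract, set the type of the final result according to the current data type.
* The operands do not have to be of the same type; whether an operation between different types is supported depends on each type's implementation.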
*/

class DataType
{
public:
@@ -47,6 +49,8 @@ class DataType
*/
virtual int compare(const Value &left, const Value &right) const { return INT32_MAX; }

virtual int compare(const Column &left, const Column &right, int left_idx, int right_idx) const { return INT32_MAX; }

/**
* @brief Computes left + right and stores the outcome in result
*/
11 changes: 10 additions & 1 deletion src/observer/common/type/float_type.cpp
@@ -15,16 +15,25 @@ See the Mulan PSL v2 for more details. */
#include "common/value.h"
#include "common/lang/limits.h"
#include "common/value.h"
#include "storage/common/column.h"

int FloatType::compare(const Value &left, const Value &right) const
{
- ASSERT(left.attr_type() == AttrType::FLOATS, "left type is not integer");
+ ASSERT(left.attr_type() == AttrType::FLOATS, "left type is not float");
ASSERT(right.attr_type() == AttrType::INTS || right.attr_type() == AttrType::FLOATS, "right type is not numeric");
float left_val = left.get_float();
float right_val = right.get_float();
return common::compare_float((void *)&left_val, (void *)&right_val);
}

int FloatType::compare(const Column &left, const Column &right, int left_idx, int right_idx) const
{
ASSERT(left.attr_type() == AttrType::FLOATS, "left type is not float");
ASSERT(right.attr_type() == AttrType::FLOATS, "right type is not float");
return common::compare_float((void *)&((float*)left.data())[left_idx],
(void *)&((float*)right.data())[right_idx]);
}
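
The column overload compares two cells addressed by row index inside contiguous column buffers, instead of first wrapping each cell in a Value. A minimal sketch of that idea follows. `FloatColumn` and `compare_float_at` are hypothetical stand-ins rather than MiniOB's `Column` class or its `common::compare_float`, and the sketch uses plain `<` and `>`, which may differ from how the real comparator treats nearly equal floats.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Simplified stand-in for a column: a contiguous buffer of floats.
class FloatColumn
{
public:
  explicit FloatColumn(std::vector<float> values) : values_(std::move(values)) {}
  const float *data() const { return values_.data(); }
  int          count() const { return static_cast<int>(values_.size()); }

private:
  std::vector<float> values_;
};

// Compares left[left_idx] with right[right_idx]; returns 1, -1 or 0.
int compare_float_at(const FloatColumn &left, const FloatColumn &right, int left_idx, int right_idx)
{
  // Unlike the raw pointer arithmetic in the hunk above, the indexes are checked here.
  assert(left_idx >= 0 && left_idx < left.count());
  assert(right_idx >= 0 && right_idx < right.count());
  const float l = left.data()[left_idx];
  const float r = right.data()[right_idx];
  return (l > r) - (l < r);
}
```

Reading straight from the buffers avoids constructing a temporary Value per row, which is the usual motivation for this kind of overload in a vectorized execution path.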

RC FloatType::add(const Value &left, const Value &right, Value &result) const
{
result.set_float(left.get_float() + right.get_float());
1 change: 1 addition & 0 deletions src/observer/common/type/float_type.h
@@ -23,6 +23,7 @@ class FloatType : public DataType
virtual ~FloatType() = default;

int compare(const Value &left, const Value &right) const override;
int compare(const Column &left, const Column &right, int left_idx, int right_idx) const override;

RC add(const Value &left, const Value &right, Value &result) const override;
RC subtract(const Value &left, const Value &right, Value &result) const override;