
Commit a90535e

Merge pull request mariadb-corporation#2844 from drrtuy/obsidian
An initial Obsidian vault as a knowledge base.
2 parents de7ba85 + 26aa9ab commit a90535e

20 files changed, +305 -0 lines changed

docs/CSEP->JobList translator.md

+7
[[makeJobSteps]]
- [[subquery preprocessing]]
- [[preprocessSelectSubquery]]
- [[preprocessHavingClause]]
- [[parseExecutionPlan]]
- [[makeVtableModeSteps]]

docs/FuncExp high-level overview.md

+105
The following introduction will take this SQL statement as an example:
```SQL
select a+b*c, c from test where a>10;
```

Firstly, according to the link you provided, we can see that the main calculation code starts in BatchPrimitiveProcessor::execute(), here:
```c++
fe2Output.resetRowGroup(baseRid);
fe2Output.getRow(0, &fe2Out);
fe2Input->getRow(0, &fe2In);

for (j = 0; j < outputRG.getRowCount(); j++, fe2In.nextRow())
{
  if (fe2->evaluate(&fe2In))
  {
    applyMapping(fe2Mapping, fe2In, &fe2Out);
    fe2Out.setRid(fe2In.getRelRid());
    fe2Output.incRowCount();
    fe2Out.nextRow();
  }
}
```

The `evaluate` method of `fe2` is called with the `Row` reference `fe2In` as the parameter, and execution enters `FuncExpWrapper`, which contains the current calculation expression. It mainly performs two steps:

1. Filtering the conditions in the `where` statement through the functions defined in the `filters` variable. If the `where` condition is met, the calculation proceeds to the next step; otherwise, it returns directly.
2. Calculating the data in the `row` through `expression` and obtaining the final result.

```c++
bool FuncExpWrapper::evaluate(Row* r)
{
  uint32_t i;

  for (i = 0; i < filters.size(); i++)
    // filter the conditions in the where statement
    if (!fe->evaluate(*r, filters[i].get()))
      return false;

  // calculate the row that meets the where condition
  fe->evaluate(*r, rcs);

  return true;
}
```

That is to say, the function that actually performs the calculation is the `evaluate` method pointed to by `fe`. Its parameters are a `Row` and an array of `ReturnedColumn`s called `expression`.
```c++
void FuncExp::evaluate(rowgroup::Row& row, std::vector<execplan::SRCP>& expression)
{
  bool isNull;

  for (uint32_t i = 0; i < expression.size(); i++)
  {
    isNull = false;

    switch (expression[i]->resultType().colDataType)
    {
      case CalpontSystemCatalog::DATE:
      {
        int64_t val = expression[i]->getIntVal(row, isNull);

        // @bug6061, workaround date_add always return datetime for both date and datetime
        if (val & 0xFFFFFFFF00000000)
          val = (((val >> 32) & 0xFFFFFFC0) | 0x3E);

        if (isNull)
          row.setUintField<4>(DATENULL, expression[i]->outputIndex());
        else
          row.setUintField<4>(val, expression[i]->outputIndex());

        break;
      }

      // ... other data types are handled similarly
    }
  }
}
```

In this function, each expression is looped over, and the calculation result is obtained by calling the corresponding value getter (here, `getIntVal`). Taking the SQL statement mentioned at the beginning as an example, there will be two expressions, representing the columns `a + b * c` and `c`. Next, we will use the `a + b * c` expression as an example to introduce its data type and calculation process.

`a + b * c` is an arithmetic operation column, so its type is `ArithmeticColumn`, which inherits from `ReturnedColumn`. It contains a member variable of type `ParseTree* fExpression`, as well as the implementation of the `getIntVal` method:
```c++
virtual int64_t getIntVal(rowgroup::Row& row, bool& isNull)
{
  return fExpression->getIntVal(row, isNull);
}
```
![](./image-20230321175135576.png)
Here is a graphical representation of the AST for the given arithmetic expression.

After obtaining the structure of `fExpression`, we continue with the calculation process and enter the `getIntVal` method pointed to by `fExpression`:

```c++
// fExpression->getIntVal(row, isNull);
inline int64_t getIntVal(rowgroup::Row& row, bool& isNull)
{
  if (fLeft && fRight)
    return (reinterpret_cast<Operator*>(fData))->getIntVal(row, isNull, fLeft, fRight);
  else
    return fData->getIntVal(row, isNull);
}
```
From the code, we can see that the entire calculation is actually a recursive process. When both the left and right subtrees of the current node are non-empty, the current node is cast to an `Operator` (in this case an `ArithmeticOperator`) and its `getIntVal` method is called recursively; when the node is a leaf, its data is extracted and returned. Taking the binary tree in the image as an example, every inner node triggers another recursive call, and every leaf supplies a column value.

Each row of data goes through this recursive process, which affects calculation performance to some extent. Therefore, here is my implementation plan for the GSoC project:

Since we can obtain the `ParseTree` before the calculation, we can dynamically generate LLVM calculation code from the `ParseTree` using JIT technology. When executing `void FuncExp::evaluate(rowgroup::Row& row, std::vector<execplan::SRCP>& expression);`, it is only necessary to call the generated code, which avoids the recursive calculation for each row of data. In this way, the tree is traversed only once (when dynamically generating the code with the JIT).
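
To make the idea concrete, here is a minimal, self-contained sketch of the compile-once / evaluate-per-row approach. It does not use the real `ParseTree`, `Row`, or LLVM APIs; `ToyNode`, `ToyRow`, and the postfix `Instr` program are hypothetical stand-ins. It only illustrates the control flow the JIT-generated code would have: the tree is walked once to build a flat program, and every row is then evaluated without any tree recursion.

```c++
// Sketch only: hypothetical simplified types, not the MCS ParseTree/Row or the LLVM API.
#include <cstdint>
#include <vector>

struct ToyRow { std::vector<int64_t> cols; };

struct ToyNode            // tiny binary expression tree; a leaf is a column reference
{
  char op = 0;            // 0 for a leaf, '+' or '*' for an operator node
  int colIndex = -1;      // valid only for leaves
  const ToyNode* left = nullptr;
  const ToyNode* right = nullptr;
};

enum class OpCode { PushCol, Add, Mul };
struct Instr { OpCode code; int colIndex; };

// One-time "compilation": the only recursion happens here, not per row.
void compile(const ToyNode* n, std::vector<Instr>& prog)
{
  if (!n->left && !n->right) { prog.push_back({OpCode::PushCol, n->colIndex}); return; }
  compile(n->left, prog);
  compile(n->right, prog);
  prog.push_back({n->op == '+' ? OpCode::Add : OpCode::Mul, -1});
}

// Per-row evaluation: a flat loop over the compiled program, no tree traversal.
int64_t evaluate(const std::vector<Instr>& prog, const ToyRow& row)
{
  std::vector<int64_t> stack;
  for (const Instr& ins : prog)
  {
    switch (ins.code)
    {
      case OpCode::PushCol: stack.push_back(row.cols[ins.colIndex]); break;
      case OpCode::Add: { int64_t r = stack.back(); stack.pop_back(); stack.back() += r; break; }
      case OpCode::Mul: { int64_t r = stack.back(); stack.pop_back(); stack.back() *= r; break; }
    }
  }
  return stack.back();
}
```

With LLVM, the `compile` step would instead emit IR for the same flattened sequence and JIT it to native code, but the per-row cost model is the same: a single call into pre-generated code.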

The above are some thoughts I had after reading this part of the code. I hope you can give me some suggestions. Thank you very much.

docs/GROUP BY High overview.md

+8
Firstly, here is the context for GROUP BY (GB) in MCS in general. The whole facility is spread across multiple compilation units. Speaking in terms of classical SQL engine processing, there is code for the PLAN and PREPARE phases. It is glued together because MCS currently doesn't have a clear distinction between the two. It lives in dbcon/mysql/ and dbcon/joblist. I don't think you will need this part, so I will omit it for simplicity.

There is EXECUTE phase code that you will optimize. It mainly consists of symbols defined:
- [here](https://github.com/mariadb-corporation/mariadb-columnstore-engine/blob/develop/dbcon/joblist/tupleaggregatestep.cpp). The SQL AST is translated into a flat program called a JobList, where jobs are called Steps and closely relate to the nodes of the original AST. TupleAggregateStep is the equivalent of GROUP BY, DISTINCT and their derivatives. This file describes the high-level control of TupleAggregateStep.
- [here](https://github.com/mariadb-corporation/mariadb-columnstore-engine/blob/develop/utils/rowgroup/rowaggregation.cpp) is the aggregate and distinct machinery that operates on rows.
- [here](https://github.com/mariadb-corporation/mariadb-columnstore-engine/blob/develop/utils/rowgroup/rowstorage.cpp) is the hash map abstraction, based, as I said previously, on the RobinHood header-only hash map (a simplified sketch of this kind of hash-based aggregation follows below).
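
To give a feel for the shape of that machinery, here is a minimal sketch of hash-based GROUP BY aggregation. The types are hypothetical simplifications: the real code operates on `RowGroup`/`Row` and keeps its state in the RobinHood-backed RowStorage rather than in `std::unordered_map`.

```c++
// Sketch only: hypothetical simplified types standing in for RowGroup/Row/RowStorage.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct InRow { int64_t groupKey; int64_t value; };
struct Accumulator { int64_t sum = 0; uint64_t count = 0; };

// Roughly: SELECT groupKey, SUM(value), COUNT(*) ... GROUP BY groupKey
std::unordered_map<int64_t, Accumulator> aggregate(const std::vector<InRow>& rows)
{
  std::unordered_map<int64_t, Accumulator> groups;  // group key -> running aggregates
  for (const InRow& r : rows)
  {
    Accumulator& acc = groups[r.groupKey];          // insert-or-find on the hash map
    acc.sum += r.value;
    ++acc.count;
  }
  return groups;                                    // one output row per group
}
```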

docs/How to write MTR tests.md

+9
MTR is a regression test framework for MariaDB/MySQL. It is written in Perl.
[Here](https://github.com/mariadb-corporation/mariadb-columnstore-engine/blob/develop/mysql-test/columnstore/basic/t/mcol271-empty-string-is-not-null.test) is an example of an MTR test. When you write the test, you can ask MTR to produce a golden file automatically like this:
```shell
./mtr --record --extern socket=/run/mysqld/mysqld.sock --suite=columnstore/basic test_name
```

The golden file goes into mysql-test/columnstore/basic/r/test_name.result.

docs/Limit and Order By.md

Whitespace-only changes.

docs/SubQueries TBD.md

Whitespace-only changes.

docs/WFS_checkWindowFunction.md

+14
Statements:
- WFS adds extra aux columns into the projection.
- WFS sorts using the outer ORDER BY key column list()

jobInfo.windowDels is used as temporary storage to pass the unchanged list of jobInfo.deliveredCols (read as the projection list of delivered columns) through WFS if there is no GB/DISTINCT involved.
GB/DISTINCT can clean the list up on their own.

This method might implicitly add a GB column in the case of aggregates over a WF and without an explicit GROUP BY. This is disabled in WFS::AddSimpleColumn:
```SQL
SELECT min(first_value(i) over (partition by i)) from tab1;
```

docs/WFS_initialize.md

+5
This method runs ORDER BY using the built-in sort. That is not great in the case of ORDER BY + LIMIT, but not the end of the world. Remove this. The new approach is to save all ORDER BY key columns in the output RG.

Retain those ReturnedColumns that are not in the WFS columns list but are in orderByColsVec.

!!! Remove ORDER BY duplication in WFS.

docs/WFS_makeWindowFunctionStep.md

+2
WFS adds extra columns into the RG that must be removed after WFS. There is a block here that restores the original deliveredCols only if there is no DISTINCT or GROUP BY, because DISTINCT/GB does similar filtering by modifying jobInfo.distinctColVec.
[[WFS_initialize]]

docs/addProjectStepsToBps.md

+2
There must be 3 projection steps, but the TBPS RowGroup must be set to something else.
psv size is 4 vs 3

docs/adjustLastStep.md

+3
[[WFS_makeWindowFunctionStep]]
[[prepAggregate]]
windowDels -> jobInfo.nonConstDelCols

docs/associateTupleJobSteps.md

+8
- if no tables are involved -> makeNoTableJobStep
- Check if constantBooleanSteps is false => sets jobInfo.constantFalse | constant expression evaluates to false
- [[JOIN-related processing]]
- spanningTreeCheck
- [[combineJobStepsByTable]]
- joinTables
- add JOIN query steps in JOIN order
- [[adjustLastStep]]

docs/combineJobStepsByTable.md

+1
[[addProjectStepsToBps]]

docs/custom-build.md

+87
How to trigger a custom build
=============================

- Open <https://ci.columnstore.mariadb.net>.

- Click the Continue button to log in via GitHub.

- After you have logged in, select the mariadb/mariadb-columnstore-engine repository. Please note that the recipes below do not work for branches in forked repositories. The branch you want to build against should be in the main engine repository.

- Click the New Build button in the top right corner.

- Fill in the Branch field (the branch you want to build).

- Fill in the desired parameters in key-value style.

Supported parameters with some of their values for the develop/develop-6 branches:
| parameter name | develop | develop-6 |
|--|--|--|
| `SERVER_REF` | 10.9 | 10.6-enterprise |
| `SERVER_SHA` | 10.9 | 10.6-enterprise |
| `SERVER_REMOTE` | https://github.com/MariaDB/server | https://github.com/mariadb-corporation/MariaDBEnterprise |
| `REGRESSION_REF` | develop | develop-6 |
| `REGRESSION_TESTS` | `test000.sh,test001.sh` | `test000.sh,test001.sh` |
| `BUILD_DELAY_SECONDS` | 0 | 0 |
| `SMOKE_DELAY_SECONDS` | 0 | 0 |
| `MTR_DELAY_SECONDS` | 0 | 0 |
| `REGRESSION_DELAY_SECONDS` | 0 | 0 |
| `MTR_SUITE_LIST` | `basic,bugfixes` | `basic,bugfixes` |
| `MTR_FULL_SUITE` | `false` | `false` |

The `REGRESSION_TESTS` parameter has an empty value on `cron` (nightly) builds, and it is passed to the build as an argument to the regression script like this:
`./go.sh --tests=${REGRESSION_TESTS}`
So you can set it to `test000.sh,test001.sh`, for example (a comma-separated list).
Build artifacts (packages and test results) will be available [here](https://cspkg.s3.amazonaws.com/index.html?prefix=custom/%5D).

Trigger a build against external packages (built by external CI systems like Jenkins)
=============================

- Start the build just like a regular custom build, but choose the branch `external-packages`.

- Add the `EXTERNAL_PACKAGES_URL` variable. For example, if you want to run tests for packages from the URL `https://es-repo.mariadb.net/jenkins/ENTERPRISE/bb-10.6.9-5-cs-22.08.1-2/a71ceba3a33888a62ee0a783adab8b34ffc9c046/`, you should set
`EXTERNAL_PACKAGES_URL=https://es-repo.mariadb.net/jenkins/ENTERPRISE/10.6-enterprise-undo/d296529db9a1e31eab398b5c65fc72e33d0d6a8a`.

| parameter name | mandatory | default value |
|--|--|--|
| `EXTERNAL_PACKAGES_URL` | true | |
| `REGRESSION_REF` | false | `develop` |

Get into the live build on mtr/regression steps
===============================================

Prerequisites:

- [docker binary](https://docs.docker.com/engine/install/) (we only need the client; there is no need to use the docker daemon)

- [drone cli binary](https://docs.drone.io/cli/install/)

- [your personal drone token](https://ci.columnstore.mariadb.net/account)

- run your custom build with the `MTR_DELAY_SECONDS` or `REGRESSION_DELAY_SECONDS` parameters and note the build number.
Build number example:
![](https://lh4.googleusercontent.com/bUXokNezygP7Xx8KqIAYrJEXzFJua6QqP1aDKkr2LTmb3VXASem8MYSzYfB3K3ZySmJTs6ylfh37oYsnFMp0arVT4iNZonJH4kClFlzja_Un89g9n9En6M8kw-VM4VwF3d_ONI18I00Zdsbard1MTmg)

1. Export environment variables:
```Shell
export DRONE_AUTOSCALER=https://autoscaler.columnstore.mariadb.net
export DRONE_SERVER=https://ci.columnstore.mariadb.net
export DRONE_TOKEN=your-personal-token-from-drone-ui-account-page
```
Note: use https://autoscaler-arm.columnstore.mariadb.net as the ARM autoscaler.
2. Run:
```Shell
for i in $(drone server ls); do eval "$(drone server env $i)" && drone server info $i --format="{{ .Name }}" && docker ps --format="{{ .Image }} {{ .Names }}" --filter=name=5107; done
```
Where 5107 is your build number.
You should see output that looks like this:

![](https://lh5.googleusercontent.com/O5gbs6bHH9PnlqP_R-nUkGUM_V98c9s9AvDhEDcNx0R22Wlpka4O1-G7GkdZCJNxzxmsMLn5rlRKcYjRakOgF4FQkVZrCSVYQueaqxaL8-lmQg45Yc6ZOEIUOUZhiXe4YQNid1L3N4YqlDiNjSq4FfE)

3. Run:
```
eval "$(drone server env agent-A4kVtsDU)"
```
4. Run:
```
docker exec -it regression5107 bash
```

docs/doAggProject.md

+4
Here is a filter that removes all aux columns' ids from JI::returnedColVec.
JI::returnedColVec : map<id, aggOp>, where 0 (noAggOp) means a non-aggregate column.
JI::returnedColVec is used in TAS methods.
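
For illustration only, a minimal sketch of such a filter is below. The container shape ({id, aggOp} pairs) and the `auxColumnIds` input are assumptions; the real JI::returnedColVec and the surrounding code may differ.

```c++
// Sketch only: hypothetical container shape, not the actual JobInfo definition.
#include <algorithm>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

using ReturnedColVec = std::vector<std::pair<uint32_t, int>>;  // {column id, aggOp}; aggOp == 0 means non-agg

void removeAuxColumns(ReturnedColVec& returnedColVec, const std::set<uint32_t>& auxColumnIds)
{
  // erase-remove idiom: drop every entry whose id belongs to an aux column
  returnedColVec.erase(std::remove_if(returnedColVec.begin(), returnedColVec.end(),
                                      [&](const std::pair<uint32_t, int>& col)
                                      { return auxColumnIds.count(col.first) != 0; }),
                       returnedColVec.end());
}
```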

docs/doProject.md

Whitespace-only changes.

docs/makeVtableModeSteps.md

+3
- [[Limit and Order By]]
- [[associateTupleJobSteps]]
- [[numberSteps]]

@@ -0,0 +1,35 @@

- SessionManagerServer has a number of state flags:
  - SS_READY = 1 // Set by dmlProc one time when dmlProc is ready.
  - SS_SUSPENDED = 2 // Set by console when the system has been suspended by the user.
  - SS_SUSPEND_PENDING = 4 // Set by console when the user wants to suspend, but writing is occurring.
  - SS_SHUTDOWN_PENDING = 8 // Set by console when the user wants to shut down, but writing is occurring.
  - SS_ROLLBACK = 16 // In combination with a PENDING flag, force a rollback as soon as possible.
  - SS_FORCE = 32 // In combination with a PENDING flag, force a shutdown without rollback.
  - SS_QUERY_READY = 64 // Set by PrimProc after the ExeMgr thread is up and running.

- The actual state is a combination of flags (see the sketch below).
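
For illustration, here is a small sketch of how such bit flags compose into one state word and how individual flags are tested. The numeric values come from the list above; the particular combination and the surrounding code are an assumed example, not the real SessionManagerServer logic.

```c++
// Sketch only: the flag values are from the list above; the rest is illustrative.
#include <cstdint>
#include <iostream>

constexpr uint32_t SS_READY            = 1;
constexpr uint32_t SS_SUSPENDED        = 2;
constexpr uint32_t SS_SUSPEND_PENDING  = 4;
constexpr uint32_t SS_SHUTDOWN_PENDING = 8;
constexpr uint32_t SS_ROLLBACK         = 16;
constexpr uint32_t SS_FORCE            = 32;
constexpr uint32_t SS_QUERY_READY      = 64;

int main()
{
  // a possible combined state: ready, queries allowed, shutdown pending with rollback requested
  uint32_t state = SS_READY | SS_QUERY_READY | SS_SHUTDOWN_PENDING | SS_ROLLBACK;

  if ((state & SS_SHUTDOWN_PENDING) && (state & SS_ROLLBACK))
    std::cout << "shutdown requested: roll back active transactions first\n";

  state &= ~SS_ROLLBACK;  // clear a flag, e.g. once the rollback has finished
  return 0;
}
```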

- The state of a running cluster resides in a SessionManagerServer instance attribute inside the controllernode process.

- There is cold storage for the state in a file pointed to by SessionManager.TxnIDFile, /var/lib/columnstore/data1/systemFiles/dbrm/SMTxnID. This cold state is loaded when controllernode starts.

- The following FSM diagram demonstrates some transitions. It is not complete yet!

```mermaid
stateDiagram-v2
  ZeroState --> InitState : Controllernode reads cold state
  InitState --> InitState_6 : ExeMgr threads start in PP / SS_QUERY_READY
  InitState_6 --> InitState_6_!1 : DMLProc begins rollbackAll / !SS_READY
  InitState_6_!1 --> InitState_6_!1 : DMLProc fails rollbackAll /
  InitState_6_!1 --> InitState_6_1 : DMLProc finishes rollbackAll successfully / SS_READY
  InitState_6_1 --> InitState_16_8_6_1 : cmapi_gets_shutdown
  InitState_16_8_6_1 --> ZeroState : see rollback_is_ok
  InitState_16_8_6_1 --> InitState_32_16_8_6_1 : see failed_rollback
  InitState_32_16_8_6_1 --> ZeroState : force DMLProc shutdown
```

cmapi_gets_shutdown: CMAPI gets a shutdown request with TO / SS_SHUTDOWN_PENDING + SS_ROLLBACK
rollback_is_ok: DMLProc successfully rolls back active txns within TO and the cluster stops
failed_rollback: DMLProc fails to roll back active txns within TO

docs/parseExecutionPlan.md

+11
- [[walkTree filters]]
- [[optimizeFilterOrder]]
- [[SubQueries TBD]]
- [[WFS_checkWindowFunction]]
- [[TBD checkAggregation]]
- [[JI::havingStepVec -> querySteps]]
- [[iterate querySteps]]
- Filter VARBINARY from querySteps
- pColStep -> pColScanStep translation; fill in seenTableIds
- [[doAggProject]]
- [[doProject]]

docs/prepAggregate.md

+1
creates RowGroups for aggregation
