feat(tdigest): add tdigest algorithm and storage encoding implementations #2741

Open
wants to merge 40 commits into base: unstable
40 commits
3ca2216 add tdigest basic definitions. (LindaSummer, Oct 29, 2024)
94d89c6 add metadata encoding and decoding. (LindaSummer, Oct 30, 2024)
10527f1 add create implementation. (LindaSummer, Oct 31, 2024)
5b62c77 add some utils for tdigest. (LindaSummer, Nov 3, 2024)
b8f568f partly add tdigest core implemention. (LindaSummer, Nov 4, 2024)
675c815 refactor tdigest structure and add `Add` implementation. (LindaSummer, Nov 6, 2024)
d26ffeb refactor tdigestmerge function. (LindaSummer, Nov 8, 2024)
a462b07 wip: add tdigest quantile logic. (LindaSummer, Nov 21, 2024)
f9b24b9 wip: implementing quantiles (LindaSummer, Nov 21, 2024)
9d54b3d wip: implementing encoding function. (LindaSummer, Nov 24, 2024)
ac5d7dd wip: add serialization method. (LindaSummer, Dec 5, 2024)
067766f implement necessary functions. (LindaSummer, Dec 22, 2024)
2e22834 pass compile stage (LindaSummer, Jan 5, 2025)
1b6a18a Add unit tests (LindaSummer, Jan 5, 2025)
da69aa9 Add tests and fix search key encoding issue. (LindaSummer, Jan 19, 2025)
b3a6370 fix crash in unittest. (LindaSummer, Jan 20, 2025)
303a4aa partialy fix metadata min max issue. (LindaSummer, Jan 21, 2025)
d80a98f fix some bugs. (LindaSummer, Jan 25, 2025)
32dc695 add more unit tests. (LindaSummer, Jan 25, 2025)
ac2ea2d cleanup unused code. (LindaSummer, Jan 25, 2025)
d79074e format code for linter. (LindaSummer, Jan 25, 2025)
aa24799 Merge branch 'unstable' into feature/tdigest-for-first-pr (LindaSummer, Jan 25, 2025)
f84f7d3 apply review suggestions. (LindaSummer, Jan 26, 2025)
592da56 Merge branch 'unstable' of https://github.com/apache/kvrocks into fea… (LindaSummer, Jan 26, 2025)
7648868 fix typo of tdigest metadata size check. (LindaSummer, Jan 26, 2025)
1cb7eb6 Merge branch 'unstable' into feature/tdigest-for-first-pr (PragmaTwice, Jan 27, 2025)
d6817d3 add new endlines for each file. (LindaSummer, Jan 27, 2025)
91c760a rename tdigest type name. (LindaSummer, Jan 27, 2025)
5a98607 Merge branch 'unstable' into feature/tdigest-for-first-pr (LindaSummer, Jan 27, 2025)
9d7230b Merge branch 'unstable' into feature/tdigest-for-first-pr (PragmaTwice, Jan 28, 2025)
f7fe96e Merge branch 'unstable' into feature/tdigest-for-first-pr (LindaSummer, Jan 28, 2025)
c4f9554 Merge branch 'unstable' into feature/tdigest-for-first-pr (aleksraiden, Jan 28, 2025)
af5480b Merge branch 'unstable' into feature/tdigest-for-first-pr (LindaSummer, Jan 29, 2025)
9441342 remove const_cast with a copy. (LindaSummer, Jan 30, 2025)
7f6d090 Merge branch 'unstable' into feature/tdigest-for-first-pr (LindaSummer, Jan 30, 2025)
0a6ae83 Merge branch 'unstable' into feature/tdigest-for-first-pr (PragmaTwice, Jan 30, 2025)
c23f106 refactor useless copy to const reference. (LindaSummer, Jan 30, 2025)
f0f2f3d refactor for comments. (LindaSummer, Feb 3, 2025)
0e842bb Merge branch 'unstable' into feature/tdigest-for-first-pr (LindaSummer, Feb 3, 2025)
1cb9882 Merge branch 'unstable' into feature/tdigest-for-first-pr (LindaSummer, Feb 4, 2025)
40 changes: 39 additions & 1 deletion src/storage/redis_metadata.cc
@@ -331,7 +331,8 @@ bool Metadata::ExpireAt(uint64_t expired_ts) const {
 bool Metadata::IsSingleKVType() const { return Type() == kRedisString || Type() == kRedisJson; }

 bool Metadata::IsEmptyableType() const {
-  return IsSingleKVType() || Type() == kRedisStream || Type() == kRedisBloomFilter || Type() == kRedisHyperLogLog;
+  return IsSingleKVType() || Type() == kRedisStream || Type() == kRedisBloomFilter || Type() == kRedisHyperLogLog ||
+         Type() == kRedisTDigest;
 }

 bool Metadata::Expired() const { return ExpireAt(util::GetTimeStampMS()); }
@@ -497,3 +498,40 @@ rocksdb::Status HyperLogLogMetadata::Decode(Slice *input) {

   return rocksdb::Status::OK();
 }
+
+void TDigestMetadata::Encode(std::string *dst) const {
+  Metadata::Encode(dst);
+  PutFixed32(dst, compression);
+  PutFixed32(dst, capacity);
+  PutFixed64(dst, unmerged_nodes);
+  PutFixed64(dst, merged_nodes);
+  PutFixed64(dst, total_weight);
+  PutFixed64(dst, merged_weight);
+  PutDouble(dst, minimum);
+  PutDouble(dst, maximum);
+  PutFixed64(dst, total_observations);
+  PutFixed64(dst, merge_times);
+}
+
+rocksdb::Status TDigestMetadata::Decode(Slice *input) {
+  if (auto s = Metadata::Decode(input); !s.ok()) {
+    return s;
+  }
+
+  if (input->size() < (sizeof(uint32_t) * 2 + sizeof(uint64_t) * 6 + sizeof(double) * 2)) {
+    return rocksdb::Status::InvalidArgument(kErrMetadataTooShort);
+  }
+
+  GetFixed32(input, &compression);
+  GetFixed32(input, &capacity);
+  GetFixed64(input, &unmerged_nodes);
+  GetFixed64(input, &merged_nodes);
+  GetFixed64(input, &total_weight);
+  GetFixed64(input, &merged_weight);
+  GetDouble(input, &minimum);
+  GetDouble(input, &maximum);
+  GetFixed64(input, &total_observations);
+  GetFixed64(input, &merge_times);
+
+  return rocksdb::Status::OK();
+}
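A minimal sketch of the byte layout that the Encode/Decode pair above produces for the t-digest-specific fields, omitting the base Metadata header. The helpers are hypothetical stand-ins: integers are written little-endian, as RocksDB's fixed-width coding does, and doubles are assumed to be stored as their IEEE-754 bit pattern, which may not match kvrocks's PutDouble exactly. `TDigestFields`, `EncodeFields`, and `DecodeFields` are illustrative names. The sketch shows where the 72-byte minimum in Decode comes from (2 * 4 + 6 * 8 + 2 * 8).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>
#include <string_view>

// Hypothetical stand-in writers: little-endian fixed-width integers.
void PutFixed32(std::string *dst, uint32_t v) {
  for (int i = 0; i < 4; ++i) dst->push_back(static_cast<char>((v >> (8 * i)) & 0xff));
}
void PutFixed64(std::string *dst, uint64_t v) {
  for (int i = 0; i < 8; ++i) dst->push_back(static_cast<char>((v >> (8 * i)) & 0xff));
}
// Assumption: a double is stored as its raw 64-bit IEEE-754 pattern.
void PutDouble(std::string *dst, double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof(bits));
  PutFixed64(dst, bits);
}

// Hypothetical readers: consume bytes from the front of `in`.
uint32_t ReadFixed32(std::string_view *in) {
  assert(in->size() >= 4);  // DecodeFields checks the total size up front
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i) v |= static_cast<uint32_t>(static_cast<unsigned char>((*in)[i])) << (8 * i);
  in->remove_prefix(4);
  return v;
}
uint64_t ReadFixed64(std::string_view *in) {
  assert(in->size() >= 8);
  uint64_t v = 0;
  for (int i = 0; i < 8; ++i) v |= static_cast<uint64_t>(static_cast<unsigned char>((*in)[i])) << (8 * i);
  in->remove_prefix(8);
  return v;
}
double ReadDouble(std::string_view *in) {
  uint64_t bits = ReadFixed64(in);
  double d;
  std::memcpy(&d, &bits, sizeof(d));
  return d;
}

// Only the t-digest fields from the diff, with the same defaults.
struct TDigestFields {
  uint32_t compression = 0, capacity = 0;
  uint64_t unmerged_nodes = 0, merged_nodes = 0, total_weight = 0, merged_weight = 0;
  double minimum = std::numeric_limits<double>::max();
  double maximum = std::numeric_limits<double>::lowest();
  uint64_t total_observations = 0, merge_times = 0;
};

// Same expression as the size check in TDigestMetadata::Decode: 72 bytes.
constexpr std::size_t kTDigestFieldsSize = sizeof(uint32_t) * 2 + sizeof(uint64_t) * 6 + sizeof(double) * 2;

// Writes the fields in the same order as TDigestMetadata::Encode.
void EncodeFields(const TDigestFields &f, std::string *dst) {
  PutFixed32(dst, f.compression);
  PutFixed32(dst, f.capacity);
  PutFixed64(dst, f.unmerged_nodes);
  PutFixed64(dst, f.merged_nodes);
  PutFixed64(dst, f.total_weight);
  PutFixed64(dst, f.merged_weight);
  PutDouble(dst, f.minimum);
  PutDouble(dst, f.maximum);
  PutFixed64(dst, f.total_observations);
  PutFixed64(dst, f.merge_times);
}

// Returns false (the analogue of kErrMetadataTooShort) if the payload is truncated.
bool DecodeFields(std::string_view *in, TDigestFields *f) {
  if (in->size() < kTDigestFieldsSize) return false;
  f->compression = ReadFixed32(in);
  f->capacity = ReadFixed32(in);
  f->unmerged_nodes = ReadFixed64(in);
  f->merged_nodes = ReadFixed64(in);
  f->total_weight = ReadFixed64(in);
  f->merged_weight = ReadFixed64(in);
  f->minimum = ReadDouble(in);
  f->maximum = ReadDouble(in);
  f->total_observations = ReadFixed64(in);
  f->merge_times = ReadFixed64(in);
  return true;
}
```

Because the whole payload size is checked once before any field is read, the individual reads need no further bounds checks, and a payload that is short by even one byte is rejected as a unit.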
32 changes: 29 additions & 3 deletions src/storage/redis_metadata.h
@@ -26,6 +26,7 @@
 #include <atomic>
 #include <bitset>
 #include <initializer_list>
+#include <limits>
 #include <string>
 #include <vector>

@@ -51,6 +52,7 @@ enum RedisType : uint8_t {
   kRedisBloomFilter = 9,
   kRedisJson = 10,
   kRedisHyperLogLog = 11,
+  kRedisTDigest = 12,
 };

 struct RedisTypes {
@@ -92,9 +94,9 @@ enum RedisCommand {
   kRedisCmdLMove,
 };

-const std::vector<std::string> RedisTypeNames = {"none", "string", "hash", "list",
-                                                 "set", "zset", "bitmap", "sortedint",
-                                                 "stream", "MBbloom--", "ReJSON-RL", "hyperloglog"};
+const std::vector<std::string> RedisTypeNames = {"none", "string", "hash", "list", "set",
+                                                 "zset", "bitmap", "sortedint", "stream", "MBbloom--",
+                                                 "ReJSON-RL", "hyperloglog", "TDIS-TYPE"};

 constexpr const char *kErrMsgWrongType = "WRONGTYPE Operation against a key holding the wrong kind of value";
 constexpr const char *kErrMsgKeyExpired = "the key was expired";
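The RedisTypeNames entries line up positionally with the RedisType values (kRedisBloomFilter = 9 sits at index 9 as "MBbloom--", and so on), so the new "TDIS-TYPE" entry has to land at index 12 to match kRedisTDigest. A small sketch of that invariant, assuming, as the ordering suggests, that the vector is looked up by enum value; `TypeName` and `NamesMatchEnum` are hypothetical helpers, and only the enumerators visible in this hunk are reproduced.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Only the enumerators shown in the hunk; values 0..8 are implied by position.
enum RedisType : uint8_t {
  kRedisBloomFilter = 9,
  kRedisJson = 10,
  kRedisHyperLogLog = 11,
  kRedisTDigest = 12,
};

const std::vector<std::string> RedisTypeNames = {"none", "string", "hash", "list", "set",
                                                 "zset", "bitmap", "sortedint", "stream", "MBbloom--",
                                                 "ReJSON-RL", "hyperloglog", "TDIS-TYPE"};

// Hypothetical lookup by enum value; out-of-range values fall back to "none".
std::string TypeName(RedisType type) {
  auto index = static_cast<std::size_t>(type);
  return index < RedisTypeNames.size() ? RedisTypeNames[index] : RedisTypeNames[0];
}

// True if each enumerator in the hunk maps to the name registered for it.
bool NamesMatchEnum() {
  return TypeName(kRedisBloomFilter) == "MBbloom--" && TypeName(kRedisJson) == "ReJSON-RL" &&
         TypeName(kRedisHyperLogLog) == "hyperloglog" && TypeName(kRedisTDigest) == "TDIS-TYPE";
}
```

A unit test along these lines would catch a future type being appended to the enum without a matching name, or a name being inserted out of order.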
@@ -337,3 +339,27 @@ class HyperLogLogMetadata : public Metadata {

   EncodeType encode_type = EncodeType::DENSE;
 };
+
+class TDigestMetadata : public Metadata {
Member:
Do we have a validation function here?

Contributor Author:
Hi @mapleFU,

Do you mean validation of the compression property?

I have added a limit on it as suggested.

Best Regards,
Edward
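The reply mentions a limit on `compression`, but the check itself is not part of this hunk. A minimal sketch of what such a validation could look like, assuming zero must be rejected (Delta() computes 1 / compression) and using a hypothetical upper bound `kHypotheticalMaxCompression`; the actual limit, function name, and error text in the PR may differ.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical bound: the PR adds a limit, but its real value is not shown here.
constexpr uint32_t kHypotheticalMaxCompression = 1000;

// Returns an error message if `compression` is unusable, std::nullopt otherwise.
std::optional<std::string> ValidateCompression(uint32_t compression) {
  // Zero would make Delta() divide by zero.
  if (compression == 0) return "compression must be positive";
  // An unbounded value could make the per-key centroid storage arbitrarily large.
  if (compression > kHypotheticalMaxCompression) return "compression is too large";
  return std::nullopt;
}
```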

+ public:
+  uint32_t compression;
+  uint32_t capacity;
+  uint64_t unmerged_nodes = 0;
+  uint64_t merged_nodes = 0;
+  uint64_t total_weight = 0;
+  uint64_t merged_weight = 0;
+  double minimum = std::numeric_limits<double>::max();
+  double maximum = std::numeric_limits<double>::lowest();
+  uint64_t total_observations = 0;
+  uint64_t merge_times = 0;
Comment on lines +353 to +354
Member:
So statistics like merge_times and total_observations are also serialized; are they used for INFO output or just for debugging?

Contributor Author:
Hi @mapleFU,

These two metrics are used by the INFO command for compatibility with Redis.

Best Regards,
Edward
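The reply says merge_times and total_observations back the INFO command. A minimal sketch of how the persisted fields might be flattened into INFO-style key/value pairs; the labels are modeled on Redis Stack's TDIGEST.INFO reply as we recall it, and `TDigestStats`, `InfoFields`, and the unmerged-weight formula (total_weight minus merged_weight) are hypothetical, since the command implementation is not part of this diff.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical subset of TDigestMetadata's counters.
struct TDigestStats {
  uint32_t compression = 0, capacity = 0;
  uint64_t unmerged_nodes = 0, merged_nodes = 0, total_weight = 0, merged_weight = 0;
  uint64_t total_observations = 0, merge_times = 0;
};

std::vector<std::pair<std::string, uint64_t>> InfoFields(const TDigestStats &m) {
  // Assumption: total_weight covers merged and unmerged centroids; guard against
  // unsigned wraparound if the metadata were ever inconsistent.
  uint64_t unmerged_weight = m.total_weight >= m.merged_weight ? m.total_weight - m.merged_weight : 0;
  return {
      {"Compression", m.compression},
      {"Capacity", m.capacity},
      {"Merged nodes", m.merged_nodes},
      {"Unmerged nodes", m.unmerged_nodes},
      {"Merged weight", m.merged_weight},
      {"Unmerged weight", unmerged_weight},
      {"Observations", m.total_observations},
      {"Total compressions", m.merge_times},
  };
}
```

Keeping these counters in the metadata lets such a reply be built from a single metadata read, without touching the centroid data.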


+  explicit TDigestMetadata(uint32_t compression, uint32_t capacity, bool generate_version = true)
+      : Metadata(kRedisTDigest, generate_version), compression(compression), capacity(capacity) {}
+  explicit TDigestMetadata(bool generate_version = true) : TDigestMetadata(0, 0, generate_version) {}
+  void Encode(std::string *dst) const override;
+  rocksdb::Status Decode(Slice *input) override;
+
+  uint64_t TotalNodes() const { return merged_nodes + unmerged_nodes; }
+
+  double Delta() const { return 1. / static_cast<double>(compression); }
mapleFU marked this conversation as resolved.
+};
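The field defaults in TDigestMetadata give an empty digest a minimum larger than its maximum, so the first observed value sets both bounds without a separate emptiness flag. A minimal sketch under that reading; `TDigestRange`, `Observe`, and the one-unmerged-node-per-value bookkeeping are hypothetical simplifications, not the PR's `Add` logic, and `compression = 100` is only an illustrative value.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical subset of TDigestMetadata with the same sentinel defaults.
struct TDigestRange {
  uint32_t compression = 100;  // illustrative value only
  uint64_t merged_nodes = 0, unmerged_nodes = 0;
  double minimum = std::numeric_limits<double>::max();
  double maximum = std::numeric_limits<double>::lowest();

  uint64_t TotalNodes() const { return merged_nodes + unmerged_nodes; }
  double Delta() const { return 1. / static_cast<double>(compression); }

  // The sentinels guarantee the first value replaces both bounds.
  void Observe(const std::vector<double> &values) {
    for (double v : values) {
      minimum = std::min(minimum, v);
      maximum = std::max(maximum, v);
    }
    // Hypothetical simplification: each input becomes one unmerged node.
    unmerged_nodes += values.size();
  }
};
```

`std::numeric_limits<double>::lowest()` is the correct sentinel for the maximum: `min()` is the smallest positive normalized double, so a digest holding only negative values would otherwise report a tiny positive maximum.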