Add GFD mining #465
clang-tidy made some suggestions
There were too many comments to post at once. Showing the first 25 out of 29. Check the log or trigger a new build to see more.
clang-tidy made some suggestions
c3eeecf
to
a59e3cc
Compare
: query_(query_), iso_(iso_), res_(res_) {}

template <typename CorrespondenceMap1To2, typename CorrespondenceMap2To1>
bool operator()(CorrespondenceMap1To2 f, CorrespondenceMap2To1) const {
Second parameter is not used
Boost Graph Library requires this signature from the callback function. Unfortunately, if I change the function signature, the code will not compile.
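The point above can be illustrated with a minimal sketch that does not depend on Boost. `ForEachMatch` is a hypothetical stand-in for a library routine that always calls its callback with two arguments, and `CollectForward` is a hypothetical callback that needs only the first one. Leaving the second parameter unnamed keeps the required two-argument signature while making it explicit that the value is intentionally ignored.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a library routine (for example, a subgraph
// isomorphism search) that always invokes its callback with two arguments.
template <typename Callback>
std::size_t ForEachMatch(std::vector<std::pair<int, int>> const& matches, Callback callback) {
    std::size_t visited = 0;
    for (auto const& [forward, backward] : matches) {
        ++visited;
        // The callback must accept both arguments, whether it uses them or not.
        // Returning false stops the search early.
        if (!callback(forward, backward)) break;
    }
    return visited;
}

// Hypothetical callback that only needs the first argument. The second
// parameter is left unnamed: the two-argument signature is preserved and
// no unused-parameter warning is produced.
struct CollectForward {
    std::vector<int>& result;

    template <typename Map1To2, typename Map2To1>
    bool operator()(Map1To2 f, Map2To1) const {
        result.push_back(f);
        return true;  // keep searching
    }
};
```

Removing the second parameter from `operator()` would make the call inside `ForEachMatch` ill-formed, which matches the compilation failure described in the reply.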
Add PR description
clang-tidy made some suggestions
You are doing great! Please don't forget to notify us when you are done with the requested changes. Also, please mark conversations as resolved once they are.
60aeadc
to
9ce90a3
Compare
Token name_token = Token(i, name.first);
for (auto& value : name.second) {
Token value_token = Token(-1, value);
result.push_back(Literal(name_token, value_token));
result.reserve(name.second.size() * attrs_info.at(label).size())
Unfortunately, name.second can hold a different number of values depending on name.first, so this reserve may not request the correct size.
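One way to reconcile the two comments is to sum the sizes of the individual value lists before reserving, instead of multiplying. The sketch below uses hypothetical simplified types (`AttrValues`, `NameValuePair`) in place of the PR's `Token` and `Literal`. Note that `reserve` only affects capacity, so an inexact estimate costs performance, not correctness.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified types: attribute name -> list of observed values.
using AttrValues = std::map<std::string, std::vector<std::string>>;
using NameValuePair = std::pair<std::string, std::string>;

std::vector<NameValuePair> BuildPairs(AttrValues const& attrs) {
    // The value lists may differ in length, so add up their sizes
    // rather than multiplying one size by the number of names.
    std::size_t total = 0;
    for (auto const& [name, values] : attrs) total += values.size();

    std::vector<NameValuePair> result;
    result.reserve(total);  // exact number of pairs produced below
    for (auto const& [name, values] : attrs) {
        for (auto const& value : values) result.emplace_back(name, value);
    }
    return result;
}
```

The extra pass over the map is cheap compared with repeated reallocation when the number of values is large.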
11d6a2e
to
13b76ae
Compare
clang-tidy made some suggestions
b5fd5d6
to
ea596c9
Compare
9e67961
to
8cb290f
Compare
LGTM
If you fix those minor issues, it will be great. However, I think the code is already good enough to be approved
for (std::size_t i = 0; i < patterns.size(); ++i) {
auto pattern = patterns[i];
I don't know if we have std::views::enumerate support, but if we do, then it's better to use it in cases where we need both an index and an element:
- for (std::size_t i = 0; i < patterns.size(); ++i) {
- auto pattern = patterns[i];
+ for (auto [i, pattern] : std::views::enumerate(patterns)) {
for (std::size_t i = 0; i < patterns.size(); ++i) {
graph_t pattern = patterns.at(i);
Same here if we have std::views::enumerate
As far as I understand, the project only supports C++20
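Since std::views::enumerate is a C++23 facility, a C++20 code base needs a different way to pair an index with an element. One option, sketched below, is a range-based for loop with an init-statement, which C++20 does support. `LabelPatterns` and its string elements are hypothetical placeholders for the PR's pattern loop.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// C++20 alternative to std::views::enumerate: a range-based for loop whose
// init-statement declares the index. The index is incremented manually.
std::vector<std::string> LabelPatterns(std::vector<std::string> const& patterns) {
    std::vector<std::string> labelled;
    labelled.reserve(patterns.size());
    for (std::size_t i = 0; auto const& pattern : patterns) {
        labelled.push_back(std::to_string(i) + ":" + pattern);
        ++i;  // must stay the last statement; a `continue` above would skip it
    }
    return labelled;
}
```

Binding the element as `auto const&` also avoids copying each pattern, which `auto pattern = patterns[i]` does. The plain index loop from the PR remains a reasonable choice when the body contains early `continue` statements.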
I looked at almost nothing besides the tests; the logic was hopefully checked by other reviewers.
I described the most important comment in the test file itself, but I would also like to draw attention to two things:
- Why do we have access to types specific to this primitive (Gfd, Literal, Token) in the algos namespace? Usually the model namespace is used for this. Literal and Token must definitely not be in this namespace, and Gfd is still acceptable, but also not very good.
- Some tests run for quite a long time in debug mode:
https://github.com/Desbordante/desbordante-core/actions/runs/14045967659/job/39326756580?pr=465
Run core-tests on ubuntu-latest with llvm-clang, Debug
270/719 Test #273: GfdMiningTest.TestMovies ..... Passed 13.92 sec
Start 274: GfdMiningTest.TestSymbols
271/719 Test #274: GfdMiningTest.TestSymbols ..... Passed 129.41 sec
Start 275: GfdMiningTest.TestShapes
272/719 Test #275: GfdMiningTest.TestShapes ..... Passed 61.20 sec
Is this normal? Are there known limits that are acceptable? This seems quite long. Such tests are usually marked with an additional prefix (HeavyDatasets) so that overly expensive tests do not run every time.
Add class for GFD miner
Add minimal GFD and GFD with multiple conclusion tests
Added two examples with searching for dependencies in small graphs.
This PR implements an algorithm for mining graph functional dependencies, based on the article "Discovering Graph Functional Dependencies" by Fan Wenfei, Hu Chunming, Liu Xueli, and Lu Ping. Given an input graph, the algorithm returns a set of dependencies satisfied on this graph. The algorithm also has two configurable parameters: k is the maximum number of vertices in the pattern of the mined dependency, and sigma is its minimum frequency.
In addition, the PR implements the ability to run the algorithm in Python and contains examples of its use.
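To make the role of the two parameters concrete, here is a minimal sketch of the constraint they express. It is not the PR's implementation: `Candidate` and `FilterByThresholds` are hypothetical names, frequency is treated as an opaque count, and a real miner would presumably apply these bounds while searching rather than as a filter afterwards.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical summary of a mined dependency; the PR's Gfd class is richer.
struct Candidate {
    std::size_t pattern_vertices;  // number of vertices in the pattern
    std::size_t frequency;         // frequency as measured by the miner (assumed to be a count)
};

// Keep only candidates whose pattern has at most k vertices and whose
// frequency is at least sigma.
std::vector<Candidate> FilterByThresholds(std::vector<Candidate> const& candidates,
                                          std::size_t k, std::size_t sigma) {
    std::vector<Candidate> kept;
    for (Candidate const& candidate : candidates) {
        if (candidate.pattern_vertices <= k && candidate.frequency >= sigma) {
            kept.push_back(candidate);
        }
    }
    return kept;
}
```

Raising k allows larger patterns, and lowering sigma admits rarer ones, so both changes can only enlarge the set of candidates that pass.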