Add GFD mining #465

Open

AntonChern wants to merge 5 commits into Desbordante:main from AntonChern:gfd

Contributor

AntonChern commented Sep 25, 2024 (edited)

This PR implements an algorithm for mining graph functional dependencies (GFDs), based on the article "Discovering Graph Functional Dependencies" by Fan Wenfei, Hu Chunming, Liu Xueli, and Lu Ping. Given an input graph, the algorithm returns a set of dependencies that hold on this graph. It has two configurable parameters: k is the maximum number of vertices in the pattern of a mined dependency, and sigma is the minimum frequency of a mined dependency.
In addition, the PR adds Python bindings for the algorithm and includes examples of its use.

github-actions bot reviewed

github-actions bot left a comment

clang-tidy made some suggestions

There were too many comments to post at once. Showing the first 25 out of 29. Check the log or trigger a new build to see more.

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

AntonChern force-pushed the gfd branch from 0f5bb57 to 415de17 Compare

October 1, 2024 07:52

github-actions bot reviewed

github-actions bot left a comment

clang-tidy made some suggestions

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

AntonChern force-pushed the gfd branch 2 times, most recently from c3eeecf to a59e3cc Compare

October 3, 2024 08:42

xJoskiy suggested changes

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

xJoskiy suggested changes

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

xJoskiy suggested changes

src/python_bindings/gfd/bind_gfd_mining.cpp Outdated (resolved)

src/tests/test_gfd_mining.cpp Outdated (resolved)

src/tests/test_gfd_mining.cpp Outdated (resolved)

src/tests/test_gfd_mining.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.h (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

xJoskiy suggested changes

src/tests/test_gfd_mining.cpp Outdated (resolved)

src/tests/test_gfd_mining.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated

+                      : query_(query_), iso_(iso_), res_(res_) {}
+                  template <typename CorrespondenceMap1To2, typename CorrespondenceMap2To1>
+                  bool operator()(CorrespondenceMap1To2 f, CorrespondenceMap2To1) const {

Contributor

xJoskiy Oct 8, 2024

The second parameter is not used.

Contributor Author

AntonChern Oct 16, 2024

The Boost Graph Library requires this signature for the callback function. Unfortunately, if I change the function signature, the code will not compile.
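For context, leaving the second parameter unnamed (as the snippet above does) is the usual C++ way to satisfy a callback signature that a library dictates while ignoring one argument. The sketch below illustrates the idea without Boost: `ForEachMapping` and `CollectForward` are hypothetical names invented for this illustration, not part of the Boost Graph Library or of this PR.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a library algorithm that reports each result by
// calling a user-supplied callback with exactly two arguments.
template <typename Callback>
std::size_t ForEachMapping(std::vector<std::pair<int, int>> const& mappings,
                           Callback callback) {
    std::size_t reported = 0;
    for (auto const& [forward, backward] : mappings) {
        ++reported;
        // The "library" fixes the call shape: two arguments, bool result.
        if (!callback(forward, backward)) break;
    }
    return reported;
}

// Callback that only needs the first argument. The second parameter keeps its
// type but has no name, so the call still compiles and the compiler does not
// warn about an unused parameter.
struct CollectForward {
    std::vector<int>& out;

    template <typename Forward, typename Backward>
    bool operator()(Forward forward, Backward) const {
        out.push_back(forward);
        return true;  // ask the caller to keep going
    }
};
```

Removing the second parameter entirely would make the two-argument call inside `ForEachMapping` ill-formed, which matches the compile error described in the reply.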

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

Contributor

xJoskiy commented Oct 8, 2024

Add PR description

AntonChern force-pushed the gfd branch from a59e3cc to f998fa9 Compare

October 9, 2024 17:03

github-actions bot reviewed

github-actions bot left a comment

clang-tidy made some suggestions

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/tests/test_gfd_mining.cpp Outdated (resolved)

xJoskiy reviewed

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

Contributor

xJoskiy commented Oct 9, 2024

You are doing great! Please don't forget to notify me when you are done with the requested changes. Also, please mark conversations as resolved once they are.

AntonChern force-pushed the gfd branch 2 times, most recently from 60aeadc to 9ce90a3 Compare

October 16, 2024 15:46

AntonChern requested a review from xJoskiy

October 16, 2024 15:47

xJoskiy suggested changes

src/python_bindings/gfd/bind_gfd_mining.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

xJoskiy suggested changes

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/tests/test_gfd_mining.cpp Outdated (resolved)

xJoskiy suggested changes

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated

+                          Token name_token = Token(i, name.first);
+                          for (auto& value : name.second) {
+                              Token value_token = Token(-1, value);
+                              result.push_back(Literal(name_token, value_token));

Contributor

xJoskiy Oct 27, 2024

result.reserve(name.second.size() * attrs_info.at(label).size())

Contributor Author

AntonChern Nov 5, 2024

Unfortunately, name.second can hold a different number of values depending on name.first, so this estimate may not match the actual number of elements.
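When the inner containers differ in size, one possible way to still reserve exactly is to sum their sizes in a first pass. The sketch below uses simplified, hypothetical types (`AttrValues`, `NameValuePair`, `BuildPairs`) rather than the PR's `Token` and `Literal` classes, and whether the extra pass pays off in the actual code is not something this thread establishes.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified types: attribute name -> set of observed values.
using AttrValues = std::map<std::string, std::set<std::string>>;
using NameValuePair = std::pair<std::string, std::string>;

std::vector<NameValuePair> BuildPairs(AttrValues const& attrs) {
    // First pass: count how many (name, value) pairs will be produced,
    // since each name may have a different number of values.
    std::size_t total = 0;
    for (auto const& [name, values] : attrs) {
        total += values.size();
    }

    std::vector<NameValuePair> result;
    result.reserve(total);  // exact capacity, so push-backs never reallocate

    // Second pass: emit one pair per (name, value) combination.
    for (auto const& [name, values] : attrs) {
        for (auto const& value : values) {
            result.emplace_back(name, value);
        }
    }
    return result;
}
```

A product such as `a.size() * b.size()` is only exact when every inner container has the same size; the sum is exact in all cases.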

xJoskiy suggested changes

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/python_bindings/bindings.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

xJoskiy reviewed

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp (resolved)

xJoskiy reviewed

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

AntonChern force-pushed the gfd branch 2 times, most recently from 11d6a2e to 13b76ae Compare

November 5, 2024 16:38

xJoskiy suggested changes

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

xJoskiy reviewed

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

AntonChern force-pushed the gfd branch from 13b76ae to b610124 Compare

November 13, 2024 16:25

github-actions bot reviewed

github-actions bot left a comment

clang-tidy made some suggestions

src/core/algorithms/gfd/gfd.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/tests/test_gfd_mining.cpp Outdated (resolved)

src/tests/test_gfd_mining.cpp Outdated (resolved)

src/tests/test_gfd_mining.cpp Outdated (resolved)

src/tests/test_gfd_mining.cpp Outdated (resolved)

AntonChern force-pushed the gfd branch from b610124 to 174ee2c Compare

November 13, 2024 17:05

xJoskiy reviewed

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

AntonChern force-pushed the gfd branch 2 times, most recently from b5fd5d6 to ea596c9 Compare

March 16, 2025 14:42

xJoskiy reviewed

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

xJoskiy reviewed

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

xJoskiy reviewed

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

AntonChern force-pushed the gfd branch from ea596c9 to ea30a0c Compare

March 17, 2025 05:39

xJoskiy reviewed

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

xJoskiy reviewed

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

xJoskiy reviewed

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

xJoskiy reviewed

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

AntonChern force-pushed the gfd branch from ea30a0c to d3278fe Compare

March 18, 2025 11:30

xJoskiy reviewed

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

xJoskiy reviewed

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

AntonChern force-pushed the gfd branch 2 times, most recently from 9e67961 to 8cb290f Compare

March 24, 2025 19:05

AntonChern requested a review from xJoskiy

March 24, 2025 19:06

xJoskiy reviewed

src/core/algorithms/gfd/gfd.cpp Outdated (resolved)

AntonChern force-pushed the gfd branch from 8cb290f to 0368697 Compare

March 24, 2025 21:07

AntonChern requested a review from xJoskiy

March 24, 2025 21:21

Contributor

xJoskiy commented Mar 24, 2025

LGTM

xJoskiy approved these changes

ol-imorozko approved these changes

Collaborator

ol-imorozko left a comment

It would be great if you fixed these minor issues. However, I think the code is already good enough to be approved.

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp Outdated (resolved)

src/core/algorithms/gfd/gfd_miner.cpp

Comment on lines +516 to +515

		for (std::size_t i = 0; i < patterns.size(); ++i) {
		auto pattern = patterns[i];

Collaborator

ol-imorozko Mar 25, 2025

I don't know if we have std::views::enumerate support, but if we do, then it's better to use it in cases where we need both an index and an element:

Suggested change

      
-                for (std::size_t i = 0; i < patterns.size(); ++i) {
-                    auto pattern = patterns[i];
+                for (auto [i, pattern] : std::views::enumerate(patterns)) {

src/core/algorithms/gfd/gfd_miner.cpp

Comment on lines +618 to +617

		for (std::size_t i = 0; i < patterns.size(); ++i) {
		graph_t pattern = patterns.at(i);

Collaborator

ol-imorozko Mar 25, 2025

Same here if we have std::views::enumerate

Contributor Author

AntonChern Mar 27, 2025

As far as I understand, the project only supports C++20
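For reference, std::views::enumerate is a C++23 library feature, so it is not available under a C++20-only build. A C++20 alternative that keeps the index scoped to the loop is a range-based for with an init-statement. The sketch below is only an illustration: `LabelPatterns` and the use of strings in place of the PR's graph patterns are assumptions made for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: prefixes each pattern with its position in the input.
std::vector<std::string> LabelPatterns(std::vector<std::string> const& patterns) {
    std::vector<std::string> labeled;
    labeled.reserve(patterns.size());
    // C++20 range-based for with an init-statement: `i` lives only inside the
    // loop, and `pattern` is bound by const reference, so nothing is copied.
    for (std::size_t i = 0; auto const& pattern : patterns) {
        labeled.push_back(std::to_string(i) + ":" + pattern);
        ++i;  // advance the index manually once per element
    }
    return labeled;
}
```

Compared with indexing through `patterns[i]` or `patterns.at(i)` into a by-value variable, binding by reference also avoids copying each element.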

vs9h requested changes

Collaborator

vs9h left a comment

I looked at almost nothing apart from the tests; the logic was hopefully checked by the other reviewers.

I described the most important comment in the test file itself. But I would also like to draw attention to two things:

1. Why are types specific to this primitive (Gfd, Literal, Token) exposed in the algos namespace? The model namespace is usually used for this. Literal and Token definitely must not be in the algos namespace; Gfd is acceptable there, but still not ideal.
2. Some tests run for quite a long time in debug mode:

https://github.com/Desbordante/desbordante-core/actions/runs/14045967659/job/39326756580?pr=465
Run core-tests on ubuntu-latest with llvm-clang, Debug

270/719 Test #273: GfdMiningTest.TestMovies ....   Passed   13.92 sec
        Start 274: GfdMiningTest.TestSymbols
271/719 Test #274: GfdMiningTest.TestSymbols ...   Passed  129.41 sec
        Start 275: GfdMiningTest.TestShapes
272/719 Test #275: GfdMiningTest.TestShapes ....   Passed   61.20 sec

Is this normal? Are there known limits that are acceptable? This seems quite long. Such tests are usually marked with an additional prefix (HeavyDatasets) so that overly expensive tests are not run every time.

test_input_data/graph_data/shapes_graph.dot Outdated (resolved)

src/tests/test_gfd_mining.cpp Outdated (resolved)

src/tests/all_paths.h Outdated (resolved)

src/tests/all_paths.h Outdated (resolved)

AntonChern added 5 commits

March 27, 2025 17:46


Correct changes to previous algorithms (c67487b)

Add GFD mining (2834faa)
Add class for GFD miner

Add tests for GFD mining (a31be5d)
Add minimal GFD and GFD with multiple conclusion tests

Add python bindings for GFD mining (f152564)

Add examples for GFD mining (eaf5f6c)
Added two examples with searching for dependencies in small graphs.

AntonChern force-pushed the gfd branch from 0368697 to eaf5f6c Compare

March 27, 2025 14:48

Reviewers

github-actions[bot] github-actions[bot] left review comments

xJoskiy xJoskiy approved these changes

vs9h vs9h requested changes

ol-imorozko ol-imorozko approved these changes

Requested changes must be addressed to merge this pull request.

Labels

None yet