
Non-Deterministic Results while using binary #198

Closed
LeaveryF opened this issue Nov 20, 2024 · 14 comments

Comments

@LeaveryF

Hello.

When I use Mt-KaHyPar through the C library, I also run into the problem of non-deterministic results, as described in #190 and #189. I then switched from the C library to the binary, but the problem still exists.

This is how I run mtkahypar binary:

./MtKaHyPar \
  -h mt_input_hypergraph.txt \
  --preset-type=deterministic \
  -t 4 \
  -k 8 \
  -e 0.03 \
  -g mt_input_target_graph.txt \
  -o steiner_tree

where mt_input_hypergraph.txt and mt_input_target_graph.txt are generated by my project.

And these are the differing results:

Objectives: 
  steiner_tree         = 1136 (primary objective function) 
  Approximation Factor = 1.00484  
  cut                  = 931  
  km1                  = 1116  
  soed                 = 2047  
  Imbalance            = 0.0288922  
  Partitioning Time    = 0.161263 s  

Partition sizes and weights:  
|block 0| =  103  w( 0 ) = 5247  max( 0 ) = 5597
|block 1| =   37  w( 1 ) = 5591  max( 1 ) = 5597
|block 2| =   88  w( 2 ) = 5389  max( 2 ) = 5597
|block 3| =  100  w( 3 ) = 5381  max( 3 ) = 5597
|block 4| =   87  w( 4 ) = 5391  max( 4 ) = 5597
|block 5| =   86  w( 5 ) = 5400  max( 5 ) = 5597
|block 6| =   39  w( 6 ) = 5482  max( 6 ) = 5597
|block 7| =   60  w( 7 ) = 5589  max( 7 ) = 5597
Objectives: 
  steiner_tree         = 1135 (primary objective function) 
  Approximation Factor = 1.00484  
  cut                  = 931  
  km1                  = 1116  
  soed                 = 2047  
  Imbalance            = 0.0288922  
  Partitioning Time    = 0.162108 s  

Partition sizes and weights:  
|block 0| =   60  w( 0 ) = 5589  max( 0 ) = 5597
|block 1| =   39  w( 1 ) = 5482  max( 1 ) = 5597
|block 2| =  100  w( 2 ) = 5381  max( 2 ) = 5597
|block 3| =   87  w( 3 ) = 5391  max( 3 ) = 5597
|block 4| =   88  w( 4 ) = 5389  max( 4 ) = 5597
|block 5| =  103  w( 5 ) = 5247  max( 5 ) = 5597
|block 6| =   37  w( 6 ) = 5591  max( 6 ) = 5597
|block 7| =   86  w( 7 ) = 5400  max( 7 ) = 5597

Also, it seems that the larger the (hyper)graph file, the more likely the results are to differ, and the more noticeably.

Interestingly, when I use instances from the mtkahypar project, for example ibm01.hgr and target.graph, the problem doesn't occur. So it seems the problem lies in my (hyper)graph files?

I have no idea what I should do. I'm happy to provide my (hyper)graph files if that helps.

@LeaveryF
Author

Just now, I tested ibm01.hgr together with my own target graph file; that is, I replaced the target graph file target.graph from your project with mine. The problem occurred again.

My target graph files are as follows:

4 3 1
2 1 
1 1 3 1 
2 1 4 1 
3 1 

or

8 11 1
2 1 5 1 7 1 
1 1 3 1 6 1 7 1 
2 1 8 1 
8 1 
1 1 6 1 
2 1 5 1 7 1 
1 1 2 1 6 1 8 1 
3 1 4 1 7 1 

Both of them, when used as the target graph for the mapping, show the non-deterministic problem.

Did I write the target graph files incorrectly, or is there some other reason for this?
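(For readers unfamiliar with the file layout: the target graph files above follow the Metis graph format. The header line gives the number of nodes, the number of edges, and a format code where 1 means the edges carry weights; each subsequent line lists one node's neighbors as (neighbor, weight) pairs with 1-based node IDs. A minimal parsing sketch, where the `Edge` struct and `parse_metis` name are illustrative and not part of Mt-KaHyPar:)

```cpp
#include <sstream>
#include <string>
#include <vector>

struct Edge {
  int neighbor;  // 0-based after conversion
  int weight;
};

// Parse a Metis-style graph: header "num_nodes num_edges fmt",
// then one line per node listing "neighbor weight" pairs (fmt = 1).
std::vector<std::vector<Edge>> parse_metis(const std::string& text) {
  std::istringstream in(text);
  std::string line;
  std::getline(in, line);
  std::istringstream header(line);
  int n = 0, m = 0, fmt = 0;
  header >> n >> m >> fmt;
  std::vector<std::vector<Edge>> adj(n);
  for (int i = 0; i < n; ++i) {
    std::getline(in, line);
    std::istringstream iss(line);
    int nb, w;
    while (iss >> nb >> w) {
      adj[i].push_back({nb - 1, w});  // Metis node IDs are 1-based
    }
  }
  return adj;
}
```

Applied to the first file above ("4 3 1" ...), this yields a path 1-2-3-4 with unit edge weights.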

@N-Maas
Collaborator

N-Maas commented Nov 20, 2024

Hi @LeaveryF,

thanks for the detailed report! You probably did nothing wrong; there is actually an issue with the algorithm. I'm afraid support for steiner trees is still a bit experimental, specifically with regard to the deterministic configuration.

From the results you showed, it seems that the main partitioning step is deterministic, however the mapping to the target graph is different in both cases. Therefore, the most likely cause is that some part of the mapping procedure is not deterministic. I will try to investigate it this week and see whether I can fix the problem.

@N-Maas
Collaborator

N-Maas commented Nov 21, 2024

Could you check whether the branch nikolai/deterministic-mapping fixes the problem @LeaveryF ?

@LeaveryF
Author

When mapping to a small target graph, such as the two files above, the results are now stable, but larger target graphs still show the problem.
Here is another graph file with which I encountered the problem:

32 88 1
13 1 15 1 19 1 28 1 30 1 
5 1 9 1 14 1 23 1 29 1 
16 1 18 1 21 1 26 1 32 1 
12 1 20 1 21 1 22 1 25 1 
2 1 7 1 12 1 13 1 
12 1 13 1 21 1 23 1 24 1 30 1 
5 1 11 1 21 1 23 1 26 1 
11 1 14 1 16 1 18 1 23 1 26 1 27 1 
2 1 16 1 21 1 31 1 
18 1 20 1 21 1 28 1 31 1 32 1 
7 1 8 1 12 1 26 1 30 1 32 1 
4 1 5 1 6 1 11 1 22 1 23 1 25 1 28 1 
1 1 5 1 6 1 19 1 25 1 
2 1 8 1 16 1 23 1 25 1 
1 1 20 1 25 1 
3 1 8 1 9 1 14 1 20 1 21 1 24 1 26 1 28 1 
20 1 24 1 26 1 27 1 
3 1 8 1 10 1 24 1 30 1 
1 1 13 1 
4 1 10 1 15 1 16 1 17 1 21 1 24 1 32 1 
3 1 4 1 6 1 7 1 9 1 10 1 16 1 20 1 24 1 27 1 
4 1 12 1 23 1 31 1 
2 1 6 1 7 1 8 1 12 1 14 1 22 1 30 1 
6 1 16 1 17 1 18 1 20 1 21 1 28 1 
4 1 12 1 13 1 14 1 15 1 
3 1 7 1 8 1 11 1 16 1 17 1 30 1 32 1 
8 1 17 1 21 1 
1 1 10 1 12 1 16 1 24 1 29 1 
2 1 28 1 
1 1 6 1 11 1 18 1 23 1 26 1 31 1 
9 1 10 1 22 1 30 1 
3 1 10 1 11 1 20 1 26 1 

It does happen more rarely now. Using test_instance.hgr from your project together with my target graph file may reproduce the problem.

@N-Maas
Collaborator

N-Maas commented Nov 21, 2024

Okay, I found another bug which was a bit more tricky and actually affected the non-deterministic mode, too. Could you try again with the new commit @LeaveryF ?

Also, the issue that I found implies that the steiner trees only work for k <= 64 for now. Is this a problem for your use case? With some additional work it would be possible to allow larger k. Note, however, that the running time might degrade for large k anyway, due to the high computational complexity of the steiner tree metric.

@LeaveryF
Author

It works well in my cases now. Thank you very much.

Also, my use case only needs k <= 64, so that's not a problem for me.

@N-Maas N-Maas closed this as completed Nov 22, 2024
@LeaveryF
Author

LeaveryF commented Nov 23, 2024

Hi, I have another question. It's small and maybe not so important. With the deterministic preset type, do the C library interfaces mt_kahypar_create_hypergraph and mt_kahypar_read_hypergraph_from_file construct the same hypergraph? I have tried both with my data structure and hypergraph files, but the results seem to differ. @N-Maas

@N-Maas
Collaborator

N-Maas commented Nov 25, 2024

Could you specify what you mean by "seem different"? If the data in the file and the data provided to mt_kahypar_create_hypergraph are equivalent and you are using the deterministic preset, they should result in the same hypergraph.

Note that you need to be a bit careful with the input to mt_kahypar_create_hypergraph, specifically you need to provide a sentinel element as the last entry in hyperedge_indices (see example in the header file). At the moment, we don't have checks for the validity of the input.
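(To make the sentinel requirement concrete, here is a minimal sketch of the CSR-style layout described above: hyperedge_indices holds one start offset per hyperedge plus a final sentinel equal to the total pin count. The helper name `build_csr` is illustrative; this uses plain standard containers and no Mt-KaHyPar calls, so the layout itself can be checked in isolation:)

```cpp
#include <cstddef>
#include <vector>

// Build the CSR-style arrays expected by mt_kahypar_create_hypergraph:
// hyperedge_indices has num_edges + 1 entries, where entry e is the
// offset of edge e's first pin in hyperedges, and the final entry is
// the sentinel equal to the total pin count.
void build_csr(const std::vector<std::vector<int>>& edges,
               std::vector<std::size_t>& hyperedge_indices,
               std::vector<int>& hyperedges) {
  hyperedge_indices.clear();
  hyperedges.clear();
  hyperedge_indices.push_back(0);  // edge 0 starts at offset 0
  for (const auto& pins : edges) {
    hyperedges.insert(hyperedges.end(), pins.begin(), pins.end());
    hyperedge_indices.push_back(hyperedges.size());  // end offset of this edge
  }
  // hyperedge_indices.back() is now the sentinel (total number of pins).
}
```

For example, the two edges {0, 1} and {1, 2, 3} produce hyperedge_indices = {0, 2, 5} and hyperedges = {0, 1, 1, 2, 3}; the trailing 5 is the sentinel.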

@LeaveryF
Author

LeaveryF commented Nov 25, 2024

Well, I have some raw hypergraph data, and I have tried to process it in different ways...

The first time, I read the data into my own data structure and passed it to mt_kahypar_create_hypergraph in its expected format to construct the hypergraph. Then I started partitioning.

The second time, I converted the data into an hMetis file and used mt_kahypar_read_hypergraph_from_file to construct the hypergraph. Then I started partitioning.

The third time, I converted the data into an hMetis file and processed it with the mt binary.

All three runs used the same imbalance, seed, and deterministic preset type, along with the same target graph for the steiner tree mapping.

The results (steiner tree metric and final partition) for 2 and 3 are the same, which I think means the problem does not lie in the partitioning process. Run 1 differs from them, so I personally suspect the problem lies in the C library interface.

I think I followed the input format of mt_kahypar_create_hypergraph, and it does have the sentinel element. I will review it again and test with different data later.

@LeaveryF
Author

I found that mapping ./tests/instances/hypergraph_with_node_and_edge_weights.hgr to the target graph ./tests/interface/test_instances/target.graph or ./tests/instances/graph_with_edge_weights.graph causes a segmentation fault during initial partitioning. Is that expected?

These are the parameters:

./build/mt-kahypar/application/MtKaHyPar \
  -h ./tests/instances/hypergraph_with_node_and_edge_weights.hgr \
  --preset-type=deterministic \
  -t 4 \
  -k 8 \
  -e 0.03 \
  -g ./tests/instances/graph_with_edge_weights.graph \
  -o steiner_tree

@N-Maas
Collaborator

N-Maas commented Nov 26, 2024

So there are two problems with the input:
First, the input hypergraph has fewer nodes than the target graph. This caused the segfault. I pushed a fix which should avoid the segfault in this scenario, though whether partitioning makes sense for such an input is unclear.
Second, the target graph contains an isolated (degree-0) node. This causes nonsensical values for the steiner tree metric, since no tree exists that can connect this node to the remaining graph. I updated the implementation so that it checks whether the target graph is connected and raises an error otherwise.

Note that the changes are already on the master branch.

I don't know whether this is related to the non-deterministic results with the C interface, but probably not. For the latter, an example input to reproduce the problem would be helpful.

@LeaveryF
Author

LeaveryF commented Nov 27, 2024

Oh, yes. I ignored the isolated node.

And for the C interface, I wrote the simple program below. Changing the boolean variable use_file produces different results. I would greatly appreciate it if you could find a moment to help me check this. The input hgr is ibm01.hgr, so I didn't handle weights.

#include <fstream>
#include <iostream>
#include <memory>
#include <sstream>
#include <thread>
#include <vector>

#include "libmtkahypar.h"

int main(int argc, char *argv[]) {
  // Initialize thread pool
  mt_kahypar_initialize_thread_pool(4, true);
  // Setup partitioning context
  mt_kahypar_context_t *context = mt_kahypar_context_new();
  mt_kahypar_load_preset(context, DETERMINISTIC);
  // mt_kahypar_configure_context_from_file(context, "config.ini");
  mt_kahypar_set_partitioning_parameters(context, 8, 0.03, KM1);
  // Enable logging
  mt_kahypar_set_context_parameter(context, VERBOSE, "1");
  // Set seed
  // mt_kahypar_set_seed(0);

  mt_kahypar_hypergraph_t hypergraph;
  bool use_file = true;
  std::string hgr_file = "../../examples/ibm01.hgr";
  if (use_file) {
    hypergraph = mt_kahypar_read_hypergraph_from_file(
        hgr_file.c_str(), DETERMINISTIC, HMETIS);
  } else {
    std::ifstream fin(hgr_file);
    int m, n, sum = 0;
    fin >> m >> n;
    std::vector<std::vector<int>> edge_vec(m);
    for (int i = 0; i < m; i++) {
      std::string line;
      std::getline(fin, line);
      std::istringstream iss(line);
      int v;
      while (iss >> v) {
        edge_vec[i].push_back(v - 1);
      }
      sum += edge_vec[i].size();
    }

    std::unique_ptr<size_t[]> hyperedge_indices =
        std::make_unique<size_t[]>(m + 1);
    size_t cnt = 0;
    for (int i = 0; i < m; i++) {
      hyperedge_indices[i] = cnt;
      cnt += edge_vec[i].size();
    }
    hyperedge_indices[m] = cnt;

    std::unique_ptr<mt_kahypar_hyperedge_id_t[]> hyperedges =
        std::make_unique<mt_kahypar_hyperedge_id_t[]>(sum);
    int index = 0;
    for (const auto &v : edge_vec) {
      for (const auto &e : v) {
        hyperedges[index++] = e;
      }
    }

    hypergraph = mt_kahypar_create_hypergraph(
        DETERMINISTIC, n, m, hyperedge_indices.get(), hyperedges.get(), nullptr,
        nullptr);
  }

  // Read target graph file
  mt_kahypar_target_graph_t *target_graph =
      mt_kahypar_read_target_graph_from_file("../../examples/target.graph");

  // Map hypergraph onto target graph
  mt_kahypar_partitioned_hypergraph_t partitioned_hg =
      mt_kahypar_map(hypergraph, target_graph, context);

  // Extract Mapping
  std::unique_ptr<mt_kahypar_partition_id_t[]> mapping =
      std::make_unique<mt_kahypar_partition_id_t[]>(
          mt_kahypar_num_hypernodes(hypergraph));
  mt_kahypar_get_partition(partitioned_hg, mapping.get());

  // Extract Block Weights
  std::unique_ptr<mt_kahypar_hypernode_weight_t[]> block_weights =
      std::make_unique<mt_kahypar_hypernode_weight_t[]>(8);
  mt_kahypar_get_block_weights(partitioned_hg, block_weights.get());

  // Compute Metrics
  const double imbalance = mt_kahypar_imbalance(partitioned_hg, context);
  const double steiner_tree_metric =
      mt_kahypar_steiner_tree(partitioned_hg, target_graph);

  // Output Results
  std::cout << "Partitioning Results:" << std::endl;
  std::cout << "Imbalance           = " << imbalance << std::endl;
  std::cout << "Steiner Tree Metric = " << steiner_tree_metric << std::endl;
  for (size_t i = 0; i < 8; ++i) {
    std::cout << "Weight of Block " << i << "   = " << block_weights[i]
              << std::endl;
  }

  mt_kahypar_free_context(context);
  mt_kahypar_free_hypergraph(hypergraph);
  mt_kahypar_free_partitioned_hypergraph(partitioned_hg);
  mt_kahypar_free_target_graph(target_graph);
}

@N-Maas
Collaborator

N-Maas commented Nov 27, 2024

There is actually an issue with both your code and the C interface.

You have a small bug in the parsing of the hypergraph file. Currently, you prepend an empty hyperedge and ignore the last hyperedge (since the line break remains after parsing the header). You should instead do something like this:

std::string line;
std::ifstream fin(hgr_file);
std::getline(fin, line);  // consume the whole header line, including its line break
std::istringstream iss(line);
int m, n, sum = 0;
iss >> m >> n;
std::vector<std::vector<int>> edge_vec(m);
for (int i = 0; i < m; i++) {
  std::getline(fin, line);  // one hyperedge per line
  std::istringstream iss(line);
  int v;
  while (iss >> v) {
    edge_vec[i].push_back(v - 1);  // convert 1-based pins to 0-based
  }
  sum += edge_vec[i].size();
}

However, the C interface also differs from reading the hypergraph from the file. The difference is that reading from a file will sort the pins of each hyperedge in ascending order, while the C interface leaves the order unchanged. This is unfortunate and we should probably unify the behavior of the two APIs. For the moment, you can use the fix-c-interface-input branch which will also apply the sorting in mt_kahypar_create_hypergraph.

The change won't make it to master for the moment, since doing this properly (specifically, such that it also works for the python interface) is a bit more complicated.
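(Until the two APIs are unified, a possible client-side workaround is to sort each hyperedge's pins before calling mt_kahypar_create_hypergraph, mimicking what the file reader reportedly does. This is an illustrative sketch, not an officially recommended fix, and it assumes that pin order is the only difference between the two code paths:)

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sort the pins of each hyperedge in ascending order, in place.
// hyperedge_indices is the CSR offset array (num_edges + 1 entries,
// last entry = sentinel), hyperedges is the flat pin array.
void sort_pins(const std::vector<std::size_t>& hyperedge_indices,
               std::vector<int>& hyperedges) {
  for (std::size_t e = 0; e + 1 < hyperedge_indices.size(); ++e) {
    std::sort(hyperedges.begin() + hyperedge_indices[e],
              hyperedges.begin() + hyperedge_indices[e + 1]);
  }
}
```

Calling this on the arrays just before mt_kahypar_create_hypergraph would normalize the pin order without changing which pins belong to which hyperedge.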

@LeaveryF
Author

OK, I see it! Thank you.
