Numerical issue of ot.emd with large entries #753

tlacombe · 2021-04-01T14:09:50Z


Describe the bug

It seems ot.emd fails to return an optimal plan (up to numerical precision) if there are large entries in the cost matrix (even if the optimal weight to put on these entries is 0).

To Reproduce

import numpy as np
import ot

M = np.array(
    [
        [2.50275352e02, 3.74653218e02, 2.41352736e03, 1.00000000e32, 1.51751540e-03],
        [2.13082030e02, 3.28812836e02, 2.29487946e03, 1.00000000e32, 1.37109800e-01],
        [1.97333083e02, 3.09175848e02, 2.24250550e03, 1.00000000e32, 2.46506283e00],
        [1.00000000e32, 1.00000000e32, 1.00000000e32, 5.26223432e00, 2.50000000e31],
        [3.84690152e01, 8.09465684e01, 3.33064175e02, 2.50000000e31, 0.00000000e00],
    ]
)

a = np.array([0.125, 0.125, 0.125, 0.125, 0.5])
b = np.array([0.125, 0.125, 0.125, 0.125, 0.5])
P = ot.emd(a=a, b=b, M=M, numItermax=2000000)
Q = np.array(
    [
        [0, 0, 0, 0, 0.125],
        [0, 0, 0, 0, 0.125],
        [0, 0, 0, 0, 0.125],
        [0, 0, 0, 0.125, 0],
        [0.125, 0.125, 0.125, 0, 0.125],
    ]
)
# rows of a plan sum to a (axis=1), columns sum to b (axis=0)
assert (P.sum(axis=1) == a).all()
assert (P.sum(axis=0) == b).all()
assert (Q.sum(axis=1) == a).all()
assert (Q.sum(axis=0) == b).all()
print("my cost matrix:\n", Q)
print("POT matrix:\n", P)
print("POT cost:", np.sum(np.multiply(P, M)))
print("my cost:", np.sum(np.multiply(Q, M)))

returns:

my cost matrix:
 [[0.    0.    0.    0.    0.125]
 [0.    0.    0.    0.    0.125]
 [0.    0.    0.    0.    0.125]
 [0.    0.    0.    0.125 0.   ]
 [0.125 0.125 0.125 0.    0.125]]
POT matrix:
 [[0.    0.125 0.    0.    0.   ]
 [0.125 0.    0.    0.    0.   ]
 [0.    0.    0.125 0.    0.   ]
 [0.    0.    0.    0.125 0.   ]
 [0.    0.    0.    0.    0.5  ]]
POT cost: 354.43787279000003
my cost: 57.54321038317501

Expected behavior

ot.emd should return (up to numerical precision) a transport plan (at least) as good as the Q I manually propose.

Environment (please complete the following information):

OS (e.g. MacOS, Windows, Linux): Ubuntu 20.04
Python version: 3.7
How was POT installed (source, pip, conda): conda

Output of the following code snippet:

>>> import platform; print(platform.platform())
Linux-5.4.0-70-generic-x86_64-with-debian-bullseye-sid
>>> import sys; print("Python", sys.version)
Python 3.7.4 (default, Aug 13 2019, 20:35:49) 
[GCC 7.3.0]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.16.4
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.3.1
>>> import ot; print("POT", ot.__version__)
POT 0.7.0

Additional context

As shown, I set numItermax to 2000000 and did not get any warning (and the code ran fast), so the algorithm does appear to converge.

ncourty · 2021-04-01T21:39:55Z

Maintainer

Hi Theo, thanks for reporting this bug. I suspect there is an overflow somewhere in the solver; it will be hard to identify. Here is a quick workaround, if it suits your needs. Normalizing M by dividing it by its maximum value does not change the solution, and it behaves more consistently:
P = ot.emd(a=a, b=b, M=M/M.max(), numItermax=2000000)
will produce the desired output.


rflamary · 2021-04-02T09:27:42Z

Maintainer

Should we normalize the M matrix like that inside emd, then?

This is weird because it seemed to me that when using np.inf in the M matrix the exact solver works very well.


mglisse · 2021-04-05T19:55:02Z


Note that compiling network_simplex_simple.h with a positive DEBUG_LVL does not work (niter is probably supposed to be iter_number).


tlacombe · 2021-04-19T08:51:48Z

Author

@ncourty I may be missing something, but I tried the same code with normalization in ot.emd and it did not solve the issue: I still get the same (non-optimal) transport plan as the one obtained without normalization.
I also tried replacing the 1e32 values by np.inf, as suggested by @rflamary, but this does not work, in my example at least: the returned transport plan does not satisfy the marginal constraints.

I also tried using ot.emd2 directly. The output is consistent with np.sum(np.multiply(...)): using M or M/M.max() yields the same result, and using M_inf yields NaN.

Code to reproduce (updated version of the previous code):

import numpy as np
import ot

M = np.array(
    [
        [2.50275352e02, 3.74653218e02, 2.41352736e03, 1.00000000e32, 1.51751540e-03],
        [2.13082030e02, 3.28812836e02, 2.29487946e03, 1.00000000e32, 1.37109800e-01],
        [1.97333083e02, 3.09175848e02, 2.24250550e03, 1.00000000e32, 2.46506283e00],
        [1.00000000e32, 1.00000000e32, 1.00000000e32, 5.26223432e00, 2.50000000e31],
        [3.84690152e01, 8.09465684e01, 3.33064175e02, 2.50000000e31, 0.00000000e00],
    ]
)

M_inf = np.array(
    [
        [2.50275352e02, 3.74653218e02, 2.41352736e03, np.inf, 1.51751540e-03],
        [2.13082030e02, 3.28812836e02, 2.29487946e03, np.inf, 1.37109800e-01],
        [1.97333083e02, 3.09175848e02, 2.24250550e03, np.inf, 2.46506283e00],
        [np.inf, np.inf, np.inf, 5.26223432e00, np.inf],
        [3.84690152e01, 8.09465684e01, 3.33064175e02, np.inf, 0.00000000e00],
    ]
)


a = np.array([0.125, 0.125, 0.125, 0.125, 0.5])
b = np.array([0.125, 0.125, 0.125, 0.125, 0.5])

### POT transport plans
P = ot.emd(a=a, b=b, M=M, numItermax=2000000)
P2 = ot.emd(a=a, b=b, M=M/M.max(), numItermax=2000000)
P_inf = ot.emd(a=a,b=b, M=M_inf, numItermax=2000000)

### POT cost computed directly
potc = ot.emd2(a=a, b=b, M=M)
print('cost by pot.emd2 directly:', potc)
potc2 = ot.emd2(a=a,b=b, M=M/M.max()) * M.max()
print("cost by pot.emd2 using M/M.max():", potc2)
potc_inf = ot.emd2(a=a,b=b, M=M_inf)
print("cost by pot.emd2 using M_inf:", potc_inf)

### My transport plan
Q = np.array(
    [
        [0, 0, 0, 0, 0.125],
        [0, 0, 0, 0, 0.125],
        [0, 0, 0, 0, 0.125],
        [0, 0, 0, 0.125, 0],
        [0.125, 0.125, 0.125, 0, 0.125],
    ]
)
# rows of a plan sum to a (axis=1), columns sum to b (axis=0)
assert (P.sum(axis=1) == a).all()
assert (P.sum(axis=0) == b).all()
assert (Q.sum(axis=1) == a).all()
assert (Q.sum(axis=0) == b).all()
### P_inf does not satisfy the marginal constraints
#assert (P_inf.sum(axis=1) == a).all()
#assert (P_inf.sum(axis=0) == b).all()

print("*****")
print("my cost matrix:\n", Q)
print("POT transport plan:\n", P)
print("POT transport plan with M_inf:", P_inf)
print("*****")
print("POT cost:", np.sum(np.multiply(P, M)))
print("POT cost after normalization", np.sum(np.multiply(P2, M)))
print("my cost:", np.sum(np.multiply(Q, M)))

Output:

cost by pot.emd2 directly: 354.43787279000003
cost by pot.emd2 using M/M.max(): 354.43787279000003
cost by pot.emd2 using M_inf: nan
*****
my cost matrix:
 [[0.    0.    0.    0.    0.125]
 [0.    0.    0.    0.    0.125]
 [0.    0.    0.    0.    0.125]
 [0.    0.    0.    0.125 0.   ]
 [0.125 0.125 0.125 0.    0.125]]
POT transport plan:
 [[0.    0.125 0.    0.    0.   ]
 [0.125 0.    0.    0.    0.   ]
 [0.    0.    0.125 0.    0.   ]
 [0.    0.    0.    0.125 0.   ]
 [0.    0.    0.    0.    0.5  ]]
POT transport plan with M_inf: [[0.    0.    0.    0.    0.   ]
 [0.    0.    0.    0.    0.   ]
 [0.    0.    0.    0.    0.   ]
 [0.    0.    0.    0.125 0.   ]
 [0.    0.    0.    0.    0.5  ]]
*****
POT cost: 354.43787279000003
POT cost after normalization 354.43787279000003
my cost: 57.54321038317501


rflamary · 2021-04-21T11:32:28Z

Maintainer

Interesting bug, I will try to reproduce it on my machine and get back to you.


mglisse · 2021-04-21T17:12:42Z


A slightly simplified reproduction:

import numpy as np
import ot

z = 1e16
M = np.array([[1, z, 0], [1, z, 0], [z, 0, z], [0, z, 0]])

b = np.array([1, 1, 3])
a = np.array([1, 1, 1, 2])
P = ot.emd(a=a, b=b, M=M)
Q = np.array([[0, 0, 1], [0, 0, 1], [0, 1, 0], [1, 0, 1]])
assert (P.sum(axis=0) == b).all()
assert (P.sum(axis=1) == a).all()
assert (Q.sum(axis=0) == b).all()
assert (Q.sum(axis=1) == a).all()
print("my cost matrix:\n", Q)
print("POT matrix:\n", P)
print("POT cost:", np.sum(np.multiply(P, M)))
print("my cost:", np.sum(np.multiply(Q, M)))

Output:

my cost matrix:
[[0 0 1]
[0 0 1]
[0 1 0]
[1 0 1]]
POT matrix:
[[1. 0. 0.]
[0. 0. 1.]
[0. 1. 0.]
[0. 0. 2.]]
POT cost: 1.0
my cost: 0.0

Could be related to EPSILON or the precision of double indeed, since 1e16 is roughly the limit where it starts failing.
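
The float64 limit mentioned above can be checked without POT. The following minimal sketch uses only plain Python floats (IEEE 754 doubles). It illustrates the rounding behaviour itself, not where inside the solver the loss actually happens, which this thread does not establish.

```python
import sys

z = 1e16  # the large cost used in the simplified example

# float64 has a 53-bit significand, so above 2**53 (about 9.007e15)
# consecutive doubles are at least 2 apart and adding 1 can be rounded away.
absorbed = (z + 1.0 == z)
recovered = (z + 1.0) - z           # would be 1.0 in exact arithmetic
still_exact = (1e15 + 1.0) - 1e15   # below 2**53 the addition is exact

print(absorbed)     # True
print(recovered)    # 0.0
print(still_exact)  # 1.0
print(sys.float_info.epsilon)  # 2.220446049250313e-16
```

Any intermediate quantity of the form "large cost plus small cost minus large cost" therefore loses the small cost entirely once the large one reaches about 1e16, which matches the scale at which the example starts failing.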


rflamary · 2021-04-22T07:40:04Z

Maintainer

Yes, that's it, good catch! Basically, we cannot solve the problem precisely if there exist two different values m_1 and m_2 in M such that m_1 + m_2 = m_1 due to numerical precision.

I don't think we can call it a bug, or something we can solve, if it is due to the numerical precision of float64.

Maybe we can add a warning when the dynamic range of M is too large. Note that if you want to forbid a link, you just need to set its value to M.max()*1.0001 or something like that instead of a very large value; it will not change the solution or the value.
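
A rough sketch of what such a check could look like on the user side, written outside POT. The names `cost_dynamic_range` and `warn_if_ill_scaled` are hypothetical helpers, not library functions, and the default threshold of 1e14 is only a placeholder assumption. The helpers are written in plain Python so they accept nested lists as well as NumPy arrays.

```python
import math
import warnings


def cost_dynamic_range(M):
    """Ratio of the largest to the smallest nonzero finite |cost| in M.

    Hypothetical helper: M is any 2D nested sequence (list of lists,
    or a 2D NumPy array). Zeros and non-finite entries are ignored.
    """
    vals = [abs(float(c)) for row in M for c in row]
    vals = [v for v in vals if v > 0.0 and math.isfinite(v)]
    if not vals:
        return 1.0
    return max(vals) / min(vals)


def warn_if_ill_scaled(M, threshold=1e14):
    """Warn when the dynamic range of M exceeds `threshold` (assumed value)."""
    ratio = cost_dynamic_range(M)
    if ratio > threshold:
        warnings.warn(
            f"cost matrix dynamic range {ratio:.3g} exceeds {threshold:.3g}; "
            "an exact solver working in float64 may return a sub-optimal plan"
        )
    return ratio


z = 1e16
M_small_example = [[1, z, 0], [1, z, 0], [z, 0, z], [0, z, 0]]
print(warn_if_ill_scaled(M_small_example))  # 1e+16, and a UserWarning is emitted
```

This only flags a suspicious cost matrix before calling the solver; it does not decide whether the returned plan is actually optimal.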

Again thank you for reporting this weird behavior.


mglisse · 2021-04-22T19:17:59Z


> Yes, that's it, good catch! Basically, we cannot solve the problem precisely if there exist two different values m_1 and m_2 in M such that m_1 + m_2 = m_1 due to numerical precision.
>
> I don't think we can call it a bug, or something we can solve, if it is due to the numerical precision of float64.

Well, it isn't impossible. If m_1 is not used in the optimal plan, it is possible to compute the optimal plan without relying on something like m_1+m_2-m_1==m_2. As you mentioned in a previous comment, even a cost of +inf could work (the fact that it can currently output something that isn't even a transport plan is a bit alarming). Depending on the algorithm, arranging the computations to be robust like that can be trivial or very complicated and costly, and the question is whether it is worth the trouble. Maybe it isn't here...

> Maybe we can add a warning when the dynamic range of M is too large. Note that if you want to forbid a link, you just need to set its value to M.max()*1.0001 or something like that instead of a very large value; it will not change the solution or the value.

A value just above the max of the other values may not be sufficient to forbid a link: it could still be used to transport a small mass in the optimal plan.
We will handle truly infinite values separately in Gudhi (it is currently officially unsupported), so hopefully our users won't be tempted to simulate infinity with large values...


rflamary · 2021-04-23T08:42:05Z

Maintainer

POT's exact OT solver is a wrapper around the C++ LEMON solver, and modifying it is beyond what we can do. In addition, it might require some kind of test at each iteration, which could greatly decrease performance.

About the max value, I think it is sufficient if the dynamic range of M is limited (below 1e14, for instance). Adding a cost that is bigger than all the others would then prevent any mass on the link: the solver is a network simplex, so the solution will always be on a face of the polytope, and under the dynamic range condition this face can be improved if there is mass on the largest values of M.


mglisse · 2021-04-23T09:29:01Z


> POT's exact OT solver is a wrapper around the C++ LEMON solver, and modifying it is beyond what we can do.

OK, thanks.

> In addition, it might require some kind of test at each iteration, which could greatly decrease performance.

Yes, although since I am not familiar with the algorithm I have no idea how large the cost would be.

> About the max value, I think it is sufficient if the dynamic range of M is limited (below 1e14, for instance). Adding a cost that is bigger than all the others would then prevent any mass on the link: the solver is a network simplex, so the solution will always be on a face of the polytope, and under the dynamic range condition this face can be improved if there is mass on the largest values of M.

emd(a=[],b=[],M=[[1,10,10],[10,1,10],[10,10,18]]) outputs a diagonal matrix, which in particular puts some mass on the 18. I need something strictly larger than 19 for it to output a different plan. So "bigger than all the others" isn't quite sufficient; it is more like bigger than a path that can replace this link or, to be safe, maybe bigger than the sum of the other costs.
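
The threshold of 19 can be checked by brute force without calling the solver. With uniform weights on both sides (which is what empty `a` and `b` mean for `ot.emd`) and a square cost matrix, an optimal plan can always be taken to be a permutation matrix scaled by 1/n, so it is enough to enumerate permutations. The helper names below are hypothetical, and the costs are reported unscaled (without the 1/n factor).

```python
from itertools import permutations


def best_assignments(M):
    """Return (minimal total cost, sorted list of optimal permutations).

    p[i] is the column matched to row i. Brute force, so only for tiny n.
    """
    n = len(M)
    costs = {
        p: sum(M[i][p[i]] for i in range(n))
        for p in permutations(range(n))
    }
    best = min(costs.values())
    return best, sorted(p for p, c in costs.items() if c == best)


def cost_matrix(x):
    """The 3x3 example from the comment above, with the last entry set to x."""
    return [[1, 10, 10], [10, 1, 10], [10, 10, x]]


for x in (18, 19, 20):
    print(x, best_assignments(cost_matrix(x)))
# 18 (20, [(0, 1, 2)])
# 19 (21, [(0, 1, 2), (0, 2, 1), (2, 1, 0)])
# 20 (21, [(0, 2, 1), (2, 1, 0)])
```

At x = 18 the diagonal is the unique optimum even though 18 is the largest entry. At x = 19 it ties with the two plans that route around the last entry through two links of cost 10, and only for x strictly larger than 19 does the last entry stop being used.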


rflamary · 2021-04-23T12:06:44Z

Maintainer

You are right, good example.

