Description
Hi team,
We have recently found an interesting issue with gVisor. Following is the Python script we ran:
import lightgbm as lgb
import sys
from numpy.random import seed
from numpy.random import randint

def lightgbm_method(num_jobs):
    count = 1000
    seed(1)
    data = []
    for _ in range(count):
        data.append(randint(0, 100, 5))
    labels = randint(0, 100, count)
    clf = lgb.LGBMClassifier(n_jobs=num_jobs)
    clf.fit(data, labels)
    return 0

lightgbm_method(int(sys.argv[1]))
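For context, the timings below can be collected with a small harness along these lines; this is our own sketch (the filename repro.py is a placeholder, not something named in this report):

import subprocess
import time

# Run the reproducer above once per num_jobs value and report wall-clock time.
for n in (1, 2, 4, 7, 8, 9, 10):
    start = time.time()
    subprocess.run(["python3", "repro.py", str(n)], check=True)
    print(f"num_jobs={n}: {time.time() - start:.2f}s")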
For some reason, the runtime depends heavily on the num_jobs value passed to the function. Below are the runtimes of this script for various num_jobs values:
Running on a node with 8 physical cores (c6gd.2xlarge):
| num_jobs | Native Kernel (seconds) | gVisor on systrap (seconds) | gVisor on ptrace (seconds) |
|---|---|---|---|
| 1 | 4.42 | 9.50 | 44.09 |
| 2 | 4.16 | 13.14 | 68.04 |
| 4 | 4.07 | 12.82 | 60.33 |
| 7 | 4.71 | 15.28 | 51.40 |
| 8 | 5.62 | 361.35 | 56.54 |
| 9 | 31.25 | 35.37 | 164.59 |
| 10 | 34.31 | 34.99 | 178.73 |
Running on a node with 16 physical cores (r7gd.4xlarge):
| num_jobs | Native Kernel (seconds) | gVisor on systrap (seconds) | gVisor on ptrace (seconds) |
|---|---|---|---|
| 1 | 3.59 | 8.42 | 33.54 |
| 2 | 3.49 | 11.84 | 51.37 |
| 4 | 3.34 | 11.60 | 48.88 |
| 8 | 4.66 | 13.58 | 49.61 |
| 15 | 26.38 | 189.51 | 170.27 |
| 16 | 75.84 | 272.24 | 220.72 |
| 17 | 76.74 | 67.99 | 248.44 |
The above numbers are very consistent in our environment.
Observations
- There seems to be a pattern: the job takes significantly longer (up to 70 times longer!) to finish when num_jobs is set equal to the number of physical cores on the host.
- When num_jobs is not passed, lgb.LGBMClassifier defaults it to the number of physical cores, so the worst case is also the default case.
- However, when the OMP_THREAD_LIMIT environment variable is set to 1, the job completes very quickly even when num_jobs equals the number of physical cores (see the sketch after this list).
- With the ptrace platform the job generally takes longer to complete, but when num_jobs is close to the number of physical cores, ptrace actually surpasses systrap. This might indicate an issue in systrap.
- There is a known issue with LightGBM and OpenMP where multi-threaded LightGBM can hang. We followed the suggested step of setting num_threads=1 and the hang no longer occurs, but it is still unclear whether the performance degradation is caused by that issue, since we do not observe the same level of degradation with the native kernel.
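For reference, here is a minimal sketch of the two workarounds mentioned above; this is our own illustration (parameter values mirror the 8-core case), not code taken from the report:

# Sketch of the two workarounds described above (our own example).
import os

# Workaround 1: cap OpenMP threads via the environment. OMP_THREAD_LIMIT generally
# has to be set before the OpenMP runtime inside LightGBM initializes, so set it
# before importing lightgbm (or export it when launching the script).
os.environ["OMP_THREAD_LIMIT"] = "1"

import lightgbm as lgb
from numpy.random import seed, randint

seed(1)
data = [randint(0, 100, 5) for _ in range(1000)]
labels = randint(0, 100, 1000)

# Even with n_jobs equal to the number of physical cores (8 on c6gd.2xlarge),
# the fit completes quickly once OMP_THREAD_LIMIT=1 is in effect.
clf = lgb.LGBMClassifier(n_jobs=8)
clf.fit(data, labels)

# Workaround 2: ask LightGBM itself for a single thread (n_jobs maps to LightGBM's
# num_threads) instead of the default of one thread per core.
clf_single = lgb.LGBMClassifier(n_jobs=1)
clf_single.fit(data, labels)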
Could you please help us understand the degradation we are seeing here, especially the case where the host has 8 physical cores and num_jobs is also set to 8? Why would gVisor suddenly become ~70 times slower than the native kernel?
Steps to reproduce
Python script to reproduce: the same script shown in the description above, invoked with the desired num_jobs as its only argument.
runsc version
runsc version release-20241217.0-40-gfe855beceea5-dirty
spec: 1.1.0-rc.1
docker version (if using docker)
uname
Linux ws-uswest2-2-e20c 5.10.215-203.850.amzn2.aarch64 #1 SMP Tue Apr 23 20:32:21 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
kubectl (if using Kubernetes)
repo state (if built from source)
No response
runsc debug logs (if available)
Thanks a lot for the reproducer, I'll see if I can use it.
As to why this is happening: you can look at previous issues we've had on this like #9119. For communicating between the sentry and user processes we try to use the "fast" path as much as possible, which involves spinning; however if we're core-bound this then also prevents other jobs from making progress, which leads to cascading performance losses across all jobs. We have an "intelligent" way of disabling the fast path, but I think we'll have to improve upon it.
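To make the spinning concern concrete, here is a toy sketch of the general spin-then-block pattern; it is not gVisor's implementation (the sentry is written in Go), only an illustration of why a spinning waiter hurts once every core is already busy:

# Toy illustration of a spin-then-block wait (a hypothetical sketch, not gVisor code).
import threading

SPIN_ITERATIONS = 10_000  # arbitrary bound on the "fast path"

def wait_for(event: threading.Event) -> str:
    # Fast path: poll in a tight loop. This is cheap when the other side responds
    # almost immediately, but it burns a core the whole time, so on a fully
    # subscribed machine it competes with the very work it is waiting on.
    for _ in range(SPIN_ITERATIONS):
        if event.is_set():
            return "fast path"
    # Slow path: block in the scheduler, freeing the core for other work.
    event.wait()
    return "slow path"

if __name__ == "__main__":
    ev = threading.Event()
    waiter = threading.Thread(target=wait_for, args=(ev,))
    waiter.start()
    ev.set()
    waiter.join()

Under this reading, when num_jobs matches the physical core count every core is occupied by an OpenMP worker, so any spinning waiter directly steals CPU from the thread it is waiting for, which would be consistent with the cliff in the tables above.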