Per Script Invocation Lua Memory Limits #903

Open
wants to merge 20 commits into
base: main

kevin-montrose (Contributor) commented Jan 7, 2025

Another decent sized one, though hopefully this is the last "big" Lua PR - the rest I can foresee should be smaller.

TODOs:

  • Custom allocators working
  • New config options for allocators and memory limits
  • Update benchmarks
  • Get final benchmark numbers
  • Answer open questions
    • Are memory pressure updates necessary? Have a thread with .NET GC folks for this. Got our answer: they are correct to have here.
    • Behavior when scripts aborted? Redis is weird here. It's reasonable for writes that happened pre-abort to still happen. We can explore rollback if there's a pressing need, but it's non-trivial.

This introduces the ability to specify maximum memory limits for Lua scripts; currently this is a single config (--lua-script-memory-limit). To enable this we also have to introduce custom allocators (--lua-memory-management-mode) for Lua. There are 3 in this PR: Native (the current behavior, where Lua provides the allocator), Tracked (where memory is acquired with NativeMemory and GC pressure is updated), and Managed (where a POH array is pre-allocated and memory is obtained from a freelist punned over that allocation).

In order to gracefully handle Lua OOMs, more of the operation of LuaRunner (things like compilation and the preamble) is hidden behind Lua PCalls. This is a necessary change: the default behavior of Lua is to abort the process in the face of OOMs, and PCalls prevent that.

To make the PCall changes less expensive (and just generally less awful), I introduced some (Strong, not Pinned) GCHandles, function pointers, and trampolines. At the end of this, we're basically just using KeraLua to package Lua and define some constants - none of the .NET code is really running anymore. If we really wanted, we could build Lua ourselves (maybe even drop down to 5.1 to match Redis) and exploit that tight coupling - but I have no intention of doing so at this time.
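A minimal sketch of the handle-plus-trampoline pattern described above, assuming nothing about Garnet's actual types: `RunnerSketch`, `Token`, `TrampolinePtr`, and `OnCall` are hypothetical names used only for illustration. A strong (Normal) GCHandle gives native code an opaque token for the managed object, and a static `[UnmanagedCallersOnly]` method exposed as a function pointer recovers the object from that token. It needs `AllowUnsafeBlocks` to compile.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a runner object; only the handle/trampoline shape matters.
public sealed unsafe class RunnerSketch : IDisposable
{
    // Strong (Normal) handle: keeps the object reachable and yields an opaque,
    // pointer-sized token without pinning the object in place.
    private GCHandle selfHandle;

    public int Calls { get; private set; }

    public RunnerSketch() => selfHandle = GCHandle.Alloc(this, GCHandleType.Normal);

    // Token that native code could carry around (for example as callback user data).
    public nint Token => GCHandle.ToIntPtr(selfHandle);

    // Function pointer suitable for a native API that expects a C callback.
    public static delegate* unmanaged[Cdecl]<nint, int> TrampolinePtr => &Trampoline;

    // Static entry point callable from native code; no delegate is allocated.
    [UnmanagedCallersOnly(CallConvs = new[] { typeof(CallConvCdecl) })]
    private static int Trampoline(nint token)
    {
        // Recover the managed instance from the token and forward the call.
        var self = (RunnerSketch)GCHandle.FromIntPtr(token).Target;
        return self.OnCall();
    }

    private int OnCall() => ++Calls;

    public void Dispose()
    {
        if (selfHandle.IsAllocated) selfHandle.Free();
    }
}
```

In this sketch the function pointer is invoked directly from C# for demonstration; in a real integration it would be handed to the native library instead.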

When improving the Lua OOM RESP error, I also found a bug in a previous PR around buffer management - it is fixed in this commit.

The Allocators

Native

This is the default.

This just uses the built-in Lua allocator, which is a thin shim over malloc. It should perform a bit better than Tracked simply because there isn't any .NET code in the way.

Native does not support memory limits.

Tracked

A thin wrapper over NativeMemory. It supports memory limits, and will fail once total requested bytes exceed the configured limit. Since it cannot see the overhead of NativeMemory, the limit is only softly enforced.

This currently calls GC.(Add|Remove)MemoryPressure, but see Open Questions.
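A minimal sketch of what a Tracked-style allocator could look like, assuming a single alloc/free/resize entry point shaped like Lua's allocator callback. `TrackedAllocatorSketch`, `Realloc`, and `AllocatedBytes` are hypothetical names, not Garnet's actual code. Only requested bytes are counted, which is why the limit is soft: whatever NativeMemory adds per allocation is invisible here.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical sketch; names and structure are illustrative only.
public sealed unsafe class TrackedAllocatorSketch
{
    private readonly long limitBytes; // <= 0 means "no limit"

    public long AllocatedBytes { get; private set; }

    public TrackedAllocatorSketch(long limitBytes) => this.limitBytes = limitBytes;

    // One entry point: allocate (ptr == null), free (newSize == 0), or resize.
    // Returning null on a non-zero request signals failure to the caller.
    public void* Realloc(void* ptr, nuint oldSize, nuint newSize)
    {
        if (ptr == null) oldSize = 0; // nothing was previously allocated

        if (newSize == 0)
        {
            if (ptr != null)
            {
                NativeMemory.Free(ptr);
                Adjust(-(long)oldSize);
            }
            return null;
        }

        long delta = (long)newSize - (long)oldSize;
        if (limitBytes > 0 && delta > 0 && AllocatedBytes + delta > limitBytes)
            return null; // growing would exceed the configured limit

        void* result;
        try { result = NativeMemory.Realloc(ptr, newSize); }
        catch (OutOfMemoryException) { return null; }

        Adjust(delta);
        return result;
    }

    // Keep the running total and the GC's view of native memory in sync.
    // Both GC methods require a strictly positive argument.
    private void Adjust(long delta)
    {
        AllocatedBytes += delta;
        if (delta > 0) GC.AddMemoryPressure(delta);
        else if (delta < 0) GC.RemoveMemoryPressure(-delta);
    }
}
```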

Managed (w/ and w/o Limits)

A really basic free-list based allocator over a POH array. If a limit is configured, it pre-allocates the full limit up front and enforces it strictly, since the allocator's own overhead can be seen.

If a limit is not configured, 2MB (or larger, if the requested size exceeds 2MB) arrays are allocated as needed.
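To make the "free list over a POH array" idea concrete, here is a minimal first-fit sketch. It is an assumption-laden illustration, not the allocator in this PR: `PohFreeListSketch`, the 8-byte header layout, and the lack of coalescing or realloc are all choices made for brevity, and the capacity is assumed to be a multiple of 8 and at least 16. Because block headers live inside the same array, the capacity is a hard ceiling.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

// Hypothetical first-fit free list over a pinned (POH) array; illustrative only.
public sealed class PohFreeListSketch
{
    private const int Header = 8;    // [int blockSize][int nextFreeOffset]
    private const int MinBlock = 16; // header plus the smallest payload
    private readonly byte[] buffer;
    private readonly nint basePtr;
    private int freeHead;            // offset of the first free block, or -1

    public PohFreeListSketch(int capacity)
    {
        // pinned: true places the array on the Pinned Object Heap, so its address is stable.
        buffer = GC.AllocateArray<byte>(capacity, pinned: true);
        basePtr = Marshal.UnsafeAddrOfPinnedArrayElement(buffer, 0);
        WriteBlock(0, capacity, -1); // one free block spanning the whole array
        freeHead = 0;
    }

    // Returns the payload address, or 0 when no free block is large enough.
    public nint Alloc(int size)
    {
        if (size <= 0 || size > buffer.Length) return 0;
        int need = ((size + 7) & ~7) + Header; // round payload up to 8 bytes, add header
        int prev = -1;
        for (int cur = freeHead; cur != -1; prev = cur, cur = Next(cur))
        {
            int blockSize = Size(cur);
            if (blockSize < need) continue;

            int next = Next(cur);
            if (blockSize - need >= MinBlock)
            {
                // Split: the tail takes this block's place on the free list.
                int tail = cur + need;
                WriteBlock(tail, blockSize - need, next);
                WriteBlock(cur, need, -1);
                next = tail;
            }
            if (prev == -1) freeHead = next; else SetNext(prev, next);
            return basePtr + cur + Header;
        }
        return 0;
    }

    // Pushes the block back onto the head of the free list (no coalescing).
    public void Free(nint ptr)
    {
        if (ptr == 0) return;
        int block = (int)(ptr - basePtr) - Header;
        SetNext(block, freeHead);
        freeHead = block;
    }

    private int Size(int off) => BinaryPrimitives.ReadInt32LittleEndian(buffer.AsSpan(off));
    private int Next(int off) => BinaryPrimitives.ReadInt32LittleEndian(buffer.AsSpan(off + 4));
    private void SetNext(int off, int next) =>
        BinaryPrimitives.WriteInt32LittleEndian(buffer.AsSpan(off + 4), next);
    private void WriteBlock(int off, int size, int next)
    {
        BinaryPrimitives.WriteInt32LittleEndian(buffer.AsSpan(off), size);
        SetNext(off, next);
    }
}
```

A production allocator would also want coalescing of adjacent free blocks and an in-place resize path; both are left out here to keep the sketch short.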

We could certainly do a lot better here (I imagine there's something existing in Garnet I could steal or repurpose), but this is mostly a proof we could get Lua 100% onto the managed heap. That said, I couldn't help but profile a little bit, so it shouldn't be awful given Lua's allocation patterns.

Open Questions

Is GC pressure actually needed in the Tracked case?

Docs say:

The AddMemoryPressure and RemoveMemoryPressure methods improve performance only for types that exclusively depend on finalizers to release the unmanaged resources. It's not necessary to use these methods in types that follow the dispose pattern, where finalizers are used to clean up unmanaged resources only in the event that a consumer of the type forgets to call Dispose.

Which makes very little sense to me, as in a container (like a job) with memory limits the presence or lack of a finalizer seems irrelevant to whether the GC needs to be informed of native allocations?

Ultimately the .NET GC folks will just have to answer this one; I've opened a thread with them.

Docs are (somewhat) incorrect here, and will be updated. It is correct, but not strictly necessary, to have these calls in the Tracked case. I'm leaving them in so the GC can respond more promptly to memory pressure.

What is expected behavior when a script is aborted?

This change introduces a case where a script might be aborted, and I expect future changes (timeouts, and potentially SCRIPT|KILL) to add more.

Redis doesn't allow this - you are expected to let Redis crash, or force a shutdown, if a script goes out of control. That's kind of nuts, IMO, especially for any HA service.

However, by deviating from Redis (with this opt-in switch), we do need to define expected behavior.

Right now, the behavior is "any commands that executed in the script, executed". Commands cannot half-execute, but scripts can, basically.

Is this acceptable, or do we need some (presumably configurable) rollback behavior?

With transactions enabled we already know the scope of "needs to be rolled back", but the implementation would be non-trivial.

Decision: No rollbacks

Summarizing some discussion:

If we are running in non-transaction mode, then we view Lua scripts as logically no different from a client issuing a sequence of calls, so persisting the commands that happened pre-abort is the only thing that makes sense.

If we are in transaction mode, it is possible some users expect atomicity - but this is going to be harder as we will not have the "before image" of keys stored anywhere. It is perfectly fine to document that there is no rollback in this situation. We will simply unlock the keys and "succeed" the partial transaction.
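The "no rollback" decision can be modeled with a small sketch. This is a hypothetical model, not Garnet's implementation: `NoRollbackSketch` and `RunScript` are invented names, a dictionary stands in for the store, and an `InvalidOperationException` stands in for a script abort such as hitting the memory limit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the "no rollback" behavior; not Garnet's actual code.
public static class NoRollbackSketch
{
    // Applies each command in order and returns how many fully executed.
    // A thrown InvalidOperationException stands in for a script abort.
    public static int RunScript(
        Dictionary<string, string> store,
        IReadOnlyList<Action<Dictionary<string, string>>> commands)
    {
        int executed = 0;
        try
        {
            foreach (var command in commands)
            {
                command(store); // each command is all-or-nothing
                executed++;
            }
        }
        catch (InvalidOperationException)
        {
            // Script aborted: nothing is undone, so earlier writes stay in the store.
        }
        return executed;
    }
}
```

With three commands where the second one aborts, the first write remains visible and the third never runs, which matches "commands cannot half-execute, but scripts can".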

Benchmarks

I changed ScriptOperations to use LuaParams instead of OperationParams as we were already ignoring most of the operation variants there. Now all Lua-related benchmarks run with different allocators enabled: Native (the old behavior, and current default), Tracked w/ 2M limit, Tracked w/o a limit, Managed w/ 2M limit, and Managed w/o limit.

main results are as of ce21c248f084744e45bbff08d0ecce0a51326cca.
luaMemoryLimits are as of a2996e9ae5f7e9c44a8848e44cc91417ddf418c4.

Broadly speaking, we're giving up a bit of perf for the ability to recover from OOMs (and other runtime errors, technically). There's some work that could be done to claw bits of this back, in theory, but we are actually doing more with this change.

LuaRunnerOperations

Comparing the baseline and the Native,None case, we're giving up a small amount across the board. Worst case ~9%, though these are very fast (ns) already.

main

Method Params Mean Error StdDev Median Allocated
ResetParametersSmall None 102.5 ns 0.51 ns 0.43 ns 102.6 ns -
ResetParametersLarge None 103.4 ns 0.50 ns 0.47 ns 103.4 ns -
ConstructSmall None 97,641.9 ns 609.86 ns 540.63 ns 97,877.1 ns 344 B
ConstructLarge None 99,759.4 ns 1,113.26 ns 1,041.34 ns 99,650.9 ns 3408 B
CompileForSessionSmall None 1,663.9 ns 32.35 ns 57.50 ns 1,689.1 ns -
CompileForSessionLarge None 34,445.3 ns 222.75 ns 208.36 ns 34,498.1 ns -

luaMemoryLimits

Method Params Mean Error StdDev Median Gen0 Gen1 Gen2 Allocated
ResetParametersSmall Managed,Limit 102.15 ns 0.366 ns 0.305 ns 102.16 ns - - - -
ResetParametersLarge Managed,Limit 95.21 ns 0.692 ns 0.614 ns 95.12 ns - - - -
ConstructSmall Managed,Limit 137,379.06 ns 2,712.401 ns 4,456.553 ns 137,299.33 ns 3.6621 3.6621 3.6621 2097606 B
ConstructLarge Managed,Limit 138,607.89 ns 2,755.709 ns 4,753.463 ns 138,323.06 ns 3.6621 3.6621 3.6621 2100672 B
CompileForSessionSmall Managed,Limit 6,339.17 ns 126.666 ns 160.192 ns 6,316.06 ns - - - 99 B
CompileForSessionLarge Managed,Limit 34,215.27 ns 137.625 ns 122.001 ns 34,210.77 ns - - - -
ResetParametersSmall Managed,None 100.34 ns 0.361 ns 0.338 ns 100.28 ns - - - -
ResetParametersLarge Managed,None 100.74 ns 0.475 ns 0.421 ns 100.58 ns - - - -
ConstructSmall Managed,None 143,109.12 ns 2,836.777 ns 5,397.263 ns 142,720.70 ns 3.6621 3.6621 3.6621 2097678 B
ConstructLarge Managed,None 157,441.66 ns 3,137.771 ns 7,697.007 ns 156,534.24 ns 3.6621 3.6621 3.6621 2100740 B
CompileForSessionSmall Managed,None 253,209.01 ns 26,459.785 ns 78,017.278 ns 279,199.30 ns - - - 512 B
CompileForSessionLarge Managed,None 34,322.35 ns 155.958 ns 130.232 ns 34,329.75 ns - - - -
ResetParametersSmall Native,None 99.18 ns 0.490 ns 0.459 ns 99.31 ns - - - -
ResetParametersLarge Native,None 99.61 ns 0.462 ns 0.386 ns 99.53 ns - - - -
ConstructSmall Native,None 106,559.60 ns 1,881.476 ns 1,759.934 ns 107,402.40 ns - - - 328 B
ConstructLarge Native,None 106,600.10 ns 1,103.980 ns 1,032.664 ns 106,645.17 ns - - - 3392 B
CompileForSessionSmall Native,None 2,067.66 ns 38.841 ns 71.995 ns 2,064.22 ns - - - -
CompileForSessionLarge Native,None 34,540.81 ns 391.955 ns 347.458 ns 34,426.06 ns - - - -
ResetParametersSmall Tracked,Limit 98.86 ns 0.596 ns 0.528 ns 98.69 ns - - - -
ResetParametersLarge Tracked,Limit 100.35 ns 0.678 ns 0.601 ns 100.53 ns - - - -
ConstructSmall Tracked,Limit 156,818.87 ns 940.860 ns 880.081 ns 157,031.77 ns 0.2441 0.2441 0.2441 401 B
ConstructLarge Tracked,Limit 161,996.34 ns 928.417 ns 775.270 ns 162,092.98 ns 0.2441 0.2441 0.2441 3466 B
CompileForSessionSmall Tracked,Limit 3,949.95 ns 65.316 ns 61.097 ns 3,948.64 ns 0.0076 0.0076 0.0076 -
CompileForSessionLarge Tracked,Limit 41,873.48 ns 315.093 ns 279.322 ns 41,905.40 ns 0.1221 0.1221 0.1221 -
ResetParametersSmall Tracked,None 105.51 ns 0.626 ns 0.555 ns 105.62 ns - - - -
ResetParametersLarge Tracked,None 100.43 ns 0.649 ns 0.607 ns 100.48 ns - - - -
ConstructSmall Tracked,None 160,488.50 ns 1,080.702 ns 1,010.889 ns 160,509.45 ns 0.2441 0.2441 0.2441 362 B
ConstructLarge Tracked,None 159,200.11 ns 690.706 ns 612.293 ns 159,156.92 ns 0.2441 0.2441 0.2441 3426 B
CompileForSessionSmall Tracked,None 4,056.16 ns 49.709 ns 41.510 ns 4,069.20 ns 0.0076 0.0076 0.0076 -
CompileForSessionLarge Tracked,None 43,021.50 ns 527.114 ns 440.164 ns 42,957.61 ns 0.1221 0.1221 0.1221 -

LuaScriptCacheOperations

Cases where we construct a new LuaRunner are a bit slower, though most of these are in the error bounds.

main

Method Params Mean Error StdDev Median Allocated
LookupHit None 2.855 μs 0.8448 μs 2.464 μs 1.450 μs 688 B
LookupMiss None 2.504 μs 0.6450 μs 1.882 μs 3.150 μs 688 B
LoadOuterHit None 3.472 μs 0.8717 μs 2.543 μs 3.200 μs 688 B
LoadInnerHit None 220.146 μs 9.0449 μs 25.806 μs 213.550 μs 1056 B
LoadMiss None 5.845 μs 0.7413 μs 2.139 μs 6.200 μs 688 B
Digest None 14.450 μs 0.6912 μs 1.994 μs 13.800 μs 688 B

luaMemoryLimits

Method Params Mean Error StdDev Median Allocated
LookupHit Managed,Limit 3.341 μs 0.6181 μs 1.803 μs 3.600 μs 64 B
LookupMiss Managed,Limit 3.071 μs 0.5321 μs 1.560 μs 3.400 μs 688 B
LoadOuterHit Managed,Limit 5.104 μs 0.6970 μs 2.022 μs 5.500 μs 688 B
LoadInnerHit Managed,Limit 208.739 μs 9.3748 μs 27.642 μs 206.650 μs 2098560 B
LoadMiss Managed,Limit 5.648 μs 0.9120 μs 2.631 μs 5.850 μs 688 B
Digest Managed,Limit 14.363 μs 0.4871 μs 1.382 μs 14.300 μs 688 B
LookupHit Managed,None 3.106 μs 0.6947 μs 2.027 μs 2.700 μs 688 B
LookupMiss Managed,None 2.385 μs 0.6353 μs 1.863 μs 1.400 μs 688 B
LoadOuterHit Managed,None 5.884 μs 0.5978 μs 1.734 μs 6.000 μs 688 B
LoadInnerHit Managed,None 207.616 μs 13.4135 μs 38.915 μs 196.600 μs 2098384 B
LoadMiss Managed,None 7.473 μs 0.5251 μs 1.498 μs 7.450 μs 688 B
Digest Managed,None 13.751 μs 0.7736 μs 2.257 μs 14.250 μs 688 B
LookupHit Native,None 3.546 μs 0.6263 μs 1.817 μs 4.150 μs 688 B
LookupMiss Native,None 2.584 μs 0.6604 μs 1.937 μs 1.400 μs 688 B
LoadOuterHit Native,None 5.173 μs 0.6795 μs 1.971 μs 5.600 μs 688 B
LoadInnerHit Native,None 215.667 μs 4.3205 μs 9.927 μs 216.500 μs 1040 B
LoadMiss Native,None 5.934 μs 0.7840 μs 2.262 μs 6.150 μs 688 B
Digest Native,None 12.857 μs 1.0007 μs 2.951 μs 13.050 μs 688 B
LookupHit Tracked,Limit 2.293 μs 0.6791 μs 1.970 μs 1.400 μs 688 B
LookupMiss Tracked,Limit 2.584 μs 0.5875 μs 1.723 μs 1.900 μs 688 B
LoadOuterHit Tracked,Limit 5.420 μs 0.7018 μs 2.036 μs 5.600 μs 688 B
LoadInnerHit Tracked,Limit 247.422 μs 6.8352 μs 19.279 μs 242.900 μs 1072 B
LoadMiss Tracked,Limit 5.975 μs 0.6976 μs 2.013 μs 6.200 μs 688 B
Digest Tracked,Limit 13.461 μs 0.6482 μs 1.849 μs 13.550 μs 688 B
LookupHit Tracked,None 3.379 μs 0.6682 μs 1.939 μs 4.100 μs 976 B
LookupMiss Tracked,None 2.763 μs 0.6014 μs 1.754 μs 3.450 μs 688 B
LoadOuterHit Tracked,None 4.794 μs 0.7989 μs 2.330 μs 5.500 μs 688 B
LoadInnerHit Tracked,None 235.640 μs 5.0250 μs 13.840 μs 234.200 μs 1072 B
LoadMiss Tracked,None 6.243 μs 0.6940 μs 1.980 μs 5.900 μs 688 B
Digest Tracked,None 13.654 μs 0.6362 μs 1.856 μs 13.700 μs 64 B

LuaScripts

Giving up ~32% in the worst case (comparing baseline to Native,None, Script4).
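
For reference, the ~32% figure follows from the Script4 mean times reported in the two tables in this section (main vs. Native,None):

$$\frac{300.5\,\text{ns} - 228.1\,\text{ns}}{228.1\,\text{ns}} \approx 0.317$$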

main

Method Params Mean Error StdDev Gen0 Allocated
Script1 None 109.3 ns 1.09 ns 1.02 ns - -
Script2 None 174.6 ns 1.55 ns 1.38 ns 0.0002 24 B
Script3 None 248.1 ns 1.52 ns 1.35 ns 0.0005 32 B
Script4 None 228.1 ns 2.97 ns 2.78 ns - -

luaMemoryLimits

Method Params Mean Error StdDev Gen0 Allocated
Script1 Managed,Limit 173.4 ns 0.61 ns 0.51 ns - -
Script2 Managed,Limit 216.8 ns 0.94 ns 0.88 ns 0.0002 24 B
Script3 Managed,Limit 300.3 ns 1.38 ns 1.29 ns 0.0005 32 B
Script4 Managed,Limit 289.7 ns 5.54 ns 5.18 ns - -
Script1 Managed,None 149.1 ns 1.52 ns 1.42 ns - -
Script2 Managed,None 215.5 ns 1.13 ns 1.05 ns 0.0002 24 B
Script3 Managed,None 296.5 ns 1.45 ns 1.14 ns 0.0005 32 B
Script4 Managed,None 285.1 ns 5.48 ns 5.13 ns - -
Script1 Native,None 150.0 ns 0.88 ns 0.82 ns - -
Script2 Native,None 215.5 ns 1.00 ns 0.89 ns 0.0002 24 B
Script3 Native,None 298.7 ns 2.31 ns 2.05 ns 0.0005 32 B
Script4 Native,None 300.5 ns 3.51 ns 3.29 ns - -
Script1 Tracked,Limit 149.9 ns 2.69 ns 2.52 ns - -
Script2 Tracked,Limit 222.9 ns 4.27 ns 7.13 ns 0.0002 24 B
Script3 Tracked,Limit 303.6 ns 4.76 ns 4.45 ns 0.0005 32 B
Script4 Tracked,Limit 284.5 ns 1.49 ns 1.32 ns - -
Script1 Tracked,None 148.1 ns 1.96 ns 1.74 ns - -
Script2 Tracked,None 214.6 ns 0.86 ns 0.76 ns 0.0002 24 B
Script3 Tracked,None 301.3 ns 2.02 ns 1.79 ns 0.0005 32 B
Script4 Tracked,None 284.2 ns 2.11 ns 1.76 ns - -

ScriptOperations

This is more of a mixed bag: LargeScript is improved somewhat (~6%), while very basic evaluations like Eval and EvalSha are a bit slower. The loss is probably due to the pcall, and the gains are probably peanut butter improvements in calls to and from Lua from .NET.

main (eliding Params != None)

Method Params Mean Error StdDev Allocated
ScriptLoad None 80.452 μs 0.4009 μs 0.3554 μs 9600 B
ScriptExistsTrue None 18.095 μs 0.2135 μs 0.1893 μs -
ScriptExistsFalse None 17.289 μs 0.0655 μs 0.0547 μs -
Eval None 58.513 μs 0.2955 μs 0.2468 μs -
EvalSha None 24.331 μs 0.4261 μs 0.3986 μs -
SmallScript None 61.024 μs 0.2889 μs 0.2702 μs -
LargeScript None 4,297.098 μs 49.7821 μs 46.5662 μs 4 B
ArrayReturn None 110.093 μs 0.7220 μs 0.6754 μs -

luaMemoryLimits

Method Params Mean Error StdDev Median Gen0 Gen1 Gen2 Allocated
ScriptLoad Managed,Limit 85.92 μs 0.912 μs 0.853 μs 85.77 μs - - - 9600 B
ScriptExistsTrue Managed,Limit 18.13 μs 0.290 μs 0.272 μs 18.28 μs - - - -
ScriptExistsFalse Managed,Limit 16.87 μs 0.082 μs 0.073 μs 16.86 μs - - - -
Eval Managed,Limit 71.51 μs 0.866 μs 0.810 μs 71.33 μs - - - -
EvalSha Managed,Limit 31.83 μs 0.531 μs 0.497 μs 31.63 μs - - - -
SmallScript Managed,Limit 56.39 μs 0.265 μs 0.248 μs 56.34 μs - - - -
LargeScript Managed,Limit 4,964.14 μs 99.011 μs 125.218 μs 4,943.73 μs - - - 8 B
ArrayReturn Managed,Limit 155.47 μs 10.902 μs 32.144 μs 146.56 μs - - - -
ScriptLoad Managed,None 87.75 μs 0.617 μs 0.547 μs 87.82 μs - - - 9600 B
ScriptExistsTrue Managed,None 18.34 μs 0.241 μs 0.226 μs 18.40 μs - - - -
ScriptExistsFalse Managed,None 17.22 μs 0.114 μs 0.101 μs 17.19 μs - - - -
Eval Managed,None 69.74 μs 1.018 μs 0.952 μs 69.44 μs - - - -
EvalSha Managed,None 35.71 μs 0.702 μs 0.721 μs 35.69 μs - - - -
SmallScript Managed,None 59.14 μs 0.189 μs 0.157 μs 59.14 μs - - - -
LargeScript Managed,None 5,035.96 μs 78.300 μs 73.242 μs 5,022.85 μs - - - 8 B
ArrayReturn Managed,None 163.17 μs 10.815 μs 31.888 μs 155.71 μs - - - -
ScriptLoad Native,None 83.49 μs 1.469 μs 1.374 μs 83.17 μs - - - 9600 B
ScriptExistsTrue Native,None 17.62 μs 0.085 μs 0.071 μs 17.61 μs - - - -
ScriptExistsFalse Native,None 17.06 μs 0.080 μs 0.067 μs 17.04 μs - - - -
Eval Native,None 69.78 μs 0.482 μs 0.427 μs 69.74 μs - - - -
EvalSha Native,None 29.04 μs 0.242 μs 0.215 μs 29.09 μs - - - -
SmallScript Native,None 57.33 μs 1.056 μs 0.987 μs 57.88 μs - - - -
LargeScript Native,None 4,028.48 μs 21.690 μs 16.934 μs 4,032.39 μs - - - 8 B
ArrayReturn Native,None 122.22 μs 1.979 μs 1.851 μs 121.50 μs - - - -
ScriptLoad Tracked,Limit 84.09 μs 0.621 μs 0.581 μs 84.02 μs - - - 9600 B
ScriptExistsTrue Tracked,Limit 18.25 μs 0.115 μs 0.102 μs 18.25 μs - - - -
ScriptExistsFalse Tracked,Limit 18.01 μs 0.349 μs 0.327 μs 17.93 μs - - - -
Eval Tracked,Limit 69.00 μs 0.484 μs 0.453 μs 69.08 μs - - - -
EvalSha Tracked,Limit 28.56 μs 0.444 μs 0.416 μs 28.52 μs - - - -
SmallScript Tracked,Limit 58.14 μs 0.663 μs 0.587 μs 58.26 μs - - - -
LargeScript Tracked,Limit 5,133.20 μs 61.766 μs 54.754 μs 5,120.47 μs 15.6250 15.6250 15.6250 23 B
ArrayReturn Tracked,Limit 164.22 μs 1.495 μs 1.398 μs 164.29 μs - - - -
ScriptLoad Tracked,None 83.47 μs 0.620 μs 0.549 μs 83.54 μs - - - 9600 B
ScriptExistsTrue Tracked,None 18.28 μs 0.150 μs 0.133 μs 18.28 μs - - - -
ScriptExistsFalse Tracked,None 17.92 μs 0.157 μs 0.147 μs 17.88 μs - - - -
Eval Tracked,None 69.06 μs 0.665 μs 0.622 μs 69.05 μs - - - -
EvalSha Tracked,None 32.11 μs 0.432 μs 0.383 μs 32.01 μs - - - -
SmallScript Tracked,None 57.22 μs 0.935 μs 0.874 μs 57.11 μs - - - -
LargeScript Tracked,None 4,984.96 μs 29.063 μs 24.269 μs 4,977.49 μs 15.6250 15.6250 15.6250 25 B
ArrayReturn Tracked,None 146.78 μs 1.429 μs 1.267 μs 146.98 μs - - - -

kevin-montrose marked this pull request as ready for review January 8, 2025 15:12
badrishc (Contributor) commented Jan 9, 2025

> LuaScripts BDN - Giving up ~32% in the worst case

This would be the most concerning for the PR. What is causing this drop, and if it is the trampoline, then is there a way to enable an unsafe mode that avoids this overhead?

public SessionScriptCache(StoreWrapper storeWrapper, IGarnetAuthenticator authenticator, ILogger logger = null)
{
    this.storeWrapper = storeWrapper;
    this.logger = logger;

    scratchBufferNetworkSender = new ScratchBufferNetworkSender();
    processor = new RespServerSession(0, scratchBufferNetworkSender, storeWrapper, null, authenticator, false);

    // There's some parsing involved in these, so save them off per-session
    memoryManagementMode = storeWrapper.serverOptions.LuaOptions.MemoryManagementMode;
Contributor

it seems these lines are causing BDN for BasicOperations, ObjectOperations, HashObjectOperations to fail as something (perhaps storeWrapper) is null here:

System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
---> System.NullReferenceException: Object reference not set to an instance of an object.
   at Garnet.server.SessionScriptCache..ctor(StoreWrapper storeWrapper, IGarnetAuthenticator authenticator, ILogger logger) in /_/libs/server/Lua/SessionScriptCache.cs:line 41
   at Garnet.server.RespServerSession..ctor(Int64 id, INetworkSender networkSender, StoreWrapper storeWrapper, SubscribeBroker`3 subscribeBroker, IGarnetAuthenticator authenticator, Boolean enableScripts) in /_/libs/server/Resp/RespServerSession.cs:line 221
   at Embedded.server.EmbeddedRespServer.GetRespSession() in /_/benchmark/BDN.benchmark/Embedded/EmbeddedRespServer.cs:line 41
   at BDN.benchmark.Operations.OperationsBase.GlobalSetup() in /_/benchmark/BDN.benchmark/Operations/OperationsBase.cs:line 80
   at BDN.benchmark.Operations.BasicOperations.GlobalSetup() in /_/benchmark/BDN.benchmark/Operations/BasicOperations.cs:line 20
   at BenchmarkDotNet.Engines.EngineFactory.CreateReadyToRun(EngineParameters engineParameters)
   at BenchmarkDotNet.Autogenerated.Runnable_0.Run(IHost host, String benchmarkName) in /_/benchmark/BDN.benchmark/bin/Release/net8.0/cb61c2e4-da46-43ab-8a17-882e6ff8a654/cb61c2e4-da46-43ab-8a17-882e6ff8a654.notcs:line 177
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InvokeDirectByRefWithFewArgs(Object obj, Span`1 copyOfArgs, BindingFlags invokeAttr)
   --- End of inner exception stack trace ---
   at System.Reflection.MethodBaseInvoker.InvokeDirectByRefWithFewArgs(Object obj, Span`1 copyOfArgs, BindingFlags invokeAttr)
   at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at BenchmarkDotNet.Autogenerated.UniqueProgramName.AfterAssemblyLoadingAttached(String[] args) in /_/benchmark/BDN.benchmark/bin/Release/net8.0/cb61c2e4-da46-43ab-8a17-882e6ff8a654/cb61c2e4-da46-43ab-8a17-882e6ff8a654.notcs:line 57

Example action run: https://github.com/microsoft/garnet/actions/runs/12681127499/job/35344227191

Contributor

When I run the BDN Operations.ScriptOperations - the allocated value for "LargeScript" is now showing 23 bytes when it used to be 12. Is that expected / OK?

Contributor

Just a heads up ... I will push an update to this PR to do the following

  1. Update the BDN CI Action YML so it runs (and charts) Lua.LuaScriptCacheOperations and Lua.LuaRunnerOperations
  2. Update BDN_Benchmark_Config.json with most recent allocated byte numbers. This file is the "ground truth" of what we expect the Allocated value is. Since I don't have history of your new BDN metrics, I will set the "expected" values to what we are seeing currently.

Contributor Author

> When I run the BDN Operations.ScriptOperations - the allocated value for "LargeScript" is now showing 23 bytes when it used to be 12. Is that expected / OK?

In my experience BDN memory tracking that gets down to just a handful of bytes is kinda inherently variable, so 23-vs-12 isn't concerning.

> I am also seeing this in the BDN.benchmark.Lua.LuaRunnerOperations which means the BDN is failing to create the metrics.

I don't see those results in the link you shared? Should I be looking somewhere else?

Contributor

Ok regarding the 23 vs 12. I will update the expected value.
The BDN.benchmark.Lua.LuaRunnerOperations were not part of our test runs. It will be part of my push to the PR

Contributor

I have seen some runs where the allocated value is 668, but when run again it comes back as 1024 or 1312. I have seen this on the same platform, and also across different platforms. For example:
On windows: LookupHit Tracked,None = 668
On Linux: LookupHit Tracked,None = 1024

Our expected value doesn't differentiate platform, so I can just put 1024 and they both will pass.
My question - is it expected that the allocated can vary a bit from run to run (even on same platform)?

darrenge (Contributor), Jan 9, 2025

Pushed my changes.
BDN run with the latest BDN fixes, two new Lua BDNs (Lua.LuaScriptCacheOperations and Lua.LuaRunnerOperations), and my fixes to expected values: https://github.com/microsoft/garnet/actions/runs/12698624593
Reminder - all results log files are at the bottom of the BDN test

Looks like the LuaRunnerOperations BDN test itself is failing, as results are coming back NA.
CompileForSessionSmall | Managed,Limit | NA | NA | NA | NA | NA | NA | NA | NA |

From Results file:
System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
---> Garnet.common.GarnetException: Failed to write to response buffer
   at Garnet.common.GarnetException.Throw(String message, LogLevel logLevel) in /_/libs/common/GarnetException.cs:line 57
   at Garnet.server.LuaRunner.CompileCommon[TResponse](TResponse& resp) in /_/libs/server/Lua/LuaRunner.cs:line 564
   at Garnet.server.LuaRunnerTrampolines.CompileForSession(IntPtr luaState) in /_/libs/server/Lua/LuaRunner.cs:line 1568
   at Garnet.server.NativeMethods.lua_pcallk(IntPtr luaState, Int32 nargs, Int32 nresults, Int32 msgh, IntPtr ctx, IntPtr k)
   at Garnet.server.LuaRunner.CompileForSession(RespServerSession session) in /_/libs/server/Lua/LuaRunner.cs:line 453
   at BDN.benchmark.Lua.LuaRunnerOperations.CompileForSessionSmall() in /_/benchmark/BDN.benchmark/Lua/LuaRunnerOperations.cs:line 218
   at BenchmarkDotNet.Autogenerated.Runnable_4.WorkloadActionUnroll(Int64 invokeCount) in /_/benchmark/BDN.benchmark/bin/Release/net8.0/1e3667a0-3c30-49b8-9ec2-d2045162aeb7/1e3667a0-3c30-49b8-9ec2-d2045162aeb7.notcs:line 1068
   at BenchmarkDotNet.Engines.Engine.Measure(Action`1 action, Int64 invokeCount)
   at BenchmarkDotNet.Engines.Engine.RunIteration(IterationData data)
   at BenchmarkDotNet.Engines.EngineStage.RunIteration(IterationMode mode, IterationStage stage, Int32 index, Int64 invokeCount, Int32 unrollFactor)
   at BenchmarkDotNet.Engines.EngineStage.Run(IStoppingCriteria criteria, Int64 invokeCount, IterationMode mode, IterationStage stage, Int32 unrollFactor)
   at BenchmarkDotNet.Engines.EngineWarmupStage.Run(Int64 invokeCount, IterationMode iterationMode, Int32 unrollFactor, RunStrategy runStrategy)
   at BenchmarkDotNet.Engines.EngineWarmupStage.RunWorkload(Int64 invokeCount, Int32 unrollFactor, RunStrategy runStrategy)
   at BenchmarkDotNet.Engines.Engine.Run()
   at BenchmarkDotNet.Autogenerated.Runnable_4.Run(IHost host, String benchmarkName) in /_/benchmark/BDN.benchmark/bin/Release/net8.0/1e3667a0-3c30-49b8-9ec2-d2045162aeb7/1e3667a0-3c30-49b8-9ec2-d2045162aeb7.notcs:line 951
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InvokeDirectByRefWithFewArgs(Object obj, Span`1 copyOfArgs, BindingFlags invokeAttr)
   --- End of inner exception stack trace ---
   at System.Reflection.MethodBaseInvoker.InvokeDirectByRefWithFewArgs(Object obj, Span`1 copyOfArgs, BindingFlags invokeAttr)
   at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)
   at BenchmarkDotNet.Autogenerated.UniqueProgramName.AfterAssemblyLoadingAttached(String[] args) in /_/benchmark/BDN.benchmark/bin/Release/net8.0/1e3667a0-3c30-49b8-9ec2-d2045162aeb7/1e3667a0-3c30-49b8-9ec2-d2045162aeb7.notcs:line 57
// AfterAll

kevin-montrose and others added 7 commits January 9, 2025 14:26
1) Added a check for NA in results which is an indication that the BDN test failed at run time
2) Added 'Lua.LuaScriptCacheOperations','Lua.LuaRunnerOperations' to BDN Github Action
3) Updated Expected values for the new Lua BDN tests
kevin-montrose (Contributor Author)

> LuaScripts BDN - Giving up ~32% in the worst case
>
> This would be the most concerning for the PR. What is causing this drop, and if it is the trampoline, then is there a way to enable an unsafe mode that avoids this overhead?

Doing some light profiling, it's the extra pcall layer. I'll look at clawing some of this back.
