Skip to content

Commit aa856b3

Browse files
Sort out the mess around Lua error messages and error stats (redis#10329)
This PR fix 2 issues on Lua scripting: * Server error reply statistics (some errors were counted twice). * Error code and error strings returning from scripts (error code was missing / misplaced). ## Statistics a Lua script user is considered part of the user application, a sophisticated transaction, so we want to count an error even if handled silently by the script, but when it is propagated outwards from the script we don't wanna count it twice. on the other hand, if the script decides to throw an error on its own (using `redis.error_reply`), we wanna count that too. Besides, we do count the `calls` in command statistics for the commands the script calls, we we should certainly also count `failed_calls`. So when a simple `eval "return redis.call('set','x','y')" 0` fails, it should count the failed call to both SET and EVAL, but the `errorstats` and `total_error_replies` should be counted only once. The PR changes the error object that is raised on errors. Instead of raising a simple Lua string, Redis will raise a Lua table in the following format: ``` { err='<error message (including error code)>', source='<User source file name>', line='<line where the error happned>', ignore_error_stats_update=true/false, } ``` The `luaPushError` function was modified to construct the new error table as describe above. The `luaRaiseError` was renamed to `luaError` and is now simply called `lua_error` to raise the table on the top of the Lua stack as the error object. The reason is that since its functionality is changed, in case some Redis branch / fork uses it, it's better to have a compilation error than a bug. The `source` and `line` fields are enriched by the error handler (if possible) and the `ignore_error_stats_update` is optional and if its not present then the default value is `false`. If `ignore_error_stats_update` is true, the error will not be counted on the error stats. When parsing Redis call reply, each error is translated to a Lua table on the format describe above and the `ignore_error_stats_update` field is set to `true` so we will not count errors twice (we counted this error when we invoke the command). The changes in this PR might have been considered as a breaking change for users that used Lua `pcall` function. Before, the error was a string and now its a table. To keep backward comparability the PR override the `pcall` implementation and extract the error message from the error table and return it. Example of the error stats update: ``` 127.0.0.1:6379> lpush l 1 (integer) 2 127.0.0.1:6379> eval "return redis.call('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value. script: e471b73f1ef44774987ab00bdf51f21fd9f7974a, on @user_script:1. 127.0.0.1:6379> info Errorstats # Errorstats errorstat_WRONGTYPE:count=1 127.0.0.1:6379> info commandstats # Commandstats cmdstat_eval:calls=1,usec=341,usec_per_call=341.00,rejected_calls=0,failed_calls=1 cmdstat_info:calls=1,usec=35,usec_per_call=35.00,rejected_calls=0,failed_calls=0 cmdstat_lpush:calls=1,usec=14,usec_per_call=14.00,rejected_calls=0,failed_calls=0 cmdstat_get:calls=1,usec=10,usec_per_call=10.00,rejected_calls=0,failed_calls=1 ``` ## error message We can now construct the error message (sent as a reply to the user) from the error table, so this solves issues where the error message was malformed and the error code appeared in the middle of the error message: ```diff 127.0.0.1:6379> eval "return redis.call('set','x','y')" 0 -(error) ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'. +(error) OOM command not allowed when used memory > 'maxmemory' @user_script:1. Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479) ``` ```diff 127.0.0.1:6379> eval "redis.call('get', 'l')" 0 -(error) ERR Error running script (call to f_8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1): @user_script:1: WRONGTYPE Operation against a key holding the wrong kind of value +(error) WRONGTYPE Operation against a key holding the wrong kind of value script: 8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1, on @user_script:1. ``` Notica that `redis.pcall` was not change: ``` 127.0.0.1:6379> eval "return redis.pcall('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value ``` ## other notes Notice that Some commands (like GEOADD) changes the cmd variable on the client stats so we can not count on it to update the command stats. In order to be able to update those stats correctly we needed to promote `realcmd` variable to be located on the client struct. Tests was added and modified to verify the changes. Related PR's: redis#10279, redis#10218, redis#10278, redis#10309 Co-authored-by: Oran Agra <[email protected]>
1 parent 9f30dd0 commit aa856b3

12 files changed

+472
-150
lines changed

src/eval.c

+17-14
Original file line numberDiff line numberDiff line change
@@ -241,11 +241,14 @@ void scriptingInit(int setup) {
241241
" if i and i.what == 'C' then\n"
242242
" i = dbg.getinfo(3,'nSl')\n"
243243
" end\n"
244+
" if type(err) ~= 'table' then\n"
245+
" err = {err='ERR' .. tostring(err)}"
246+
" end"
244247
" if i then\n"
245-
" return i.source .. ':' .. i.currentline .. ': ' .. err\n"
246-
" else\n"
247-
" return err\n"
248-
" end\n"
248+
" err['source'] = i.source\n"
249+
" err['line'] = i.currentline\n"
250+
" end"
251+
" return err\n"
249252
"end\n";
250253
luaL_loadbuffer(lua,errh_func,strlen(errh_func),"@err_handler_def");
251254
lua_pcall(lua,0,0,0);
@@ -387,7 +390,7 @@ sds luaCreateFunction(client *c, robj *body) {
387390
if (luaL_loadbuffer(lctx.lua,funcdef,sdslen(funcdef),"@user_script")) {
388391
if (c != NULL) {
389392
addReplyErrorFormat(c,
390-
"Error compiling script (new function): %s\n",
393+
"Error compiling script (new function): %s",
391394
lua_tostring(lctx.lua,-1));
392395
}
393396
lua_pop(lctx.lua,1);
@@ -398,7 +401,7 @@ sds luaCreateFunction(client *c, robj *body) {
398401

399402
if (lua_pcall(lctx.lua,0,0,0)) {
400403
if (c != NULL) {
401-
addReplyErrorFormat(c,"Error running script (new function): %s\n",
404+
addReplyErrorFormat(c,"Error running script (new function): %s",
402405
lua_tostring(lctx.lua,-1));
403406
}
404407
lua_pop(lctx.lua,1);
@@ -1474,8 +1477,8 @@ int ldbRepl(lua_State *lua) {
14741477
while((argv = ldbReplParseCommand(&argc, &err)) == NULL) {
14751478
char buf[1024];
14761479
if (err) {
1477-
lua_pushstring(lua, err);
1478-
lua_error(lua);
1480+
luaPushError(lua, err);
1481+
luaError(lua);
14791482
}
14801483
int nread = connRead(ldb.conn,buf,sizeof(buf));
14811484
if (nread <= 0) {
@@ -1492,8 +1495,8 @@ int ldbRepl(lua_State *lua) {
14921495
if (sdslen(ldb.cbuf) > 1<<20) {
14931496
sdsfree(ldb.cbuf);
14941497
ldb.cbuf = sdsempty();
1495-
lua_pushstring(lua, "max client buffer reached");
1496-
lua_error(lua);
1498+
luaPushError(lua, "max client buffer reached");
1499+
luaError(lua);
14971500
}
14981501
}
14991502

@@ -1553,8 +1556,8 @@ ldbLog(sdsnew(" next line of code."));
15531556
ldbEval(lua,argv,argc);
15541557
ldbSendLogs();
15551558
} else if (!strcasecmp(argv[0],"a") || !strcasecmp(argv[0],"abort")) {
1556-
lua_pushstring(lua, "script aborted for user request");
1557-
lua_error(lua);
1559+
luaPushError(lua, "script aborted for user request");
1560+
luaError(lua);
15581561
} else if (argc > 1 &&
15591562
(!strcasecmp(argv[0],"r") || !strcasecmp(argv[0],"redis"))) {
15601563
ldbRedis(lua,argv,argc);
@@ -1635,8 +1638,8 @@ void luaLdbLineHook(lua_State *lua, lua_Debug *ar) {
16351638
/* If the client closed the connection and we have a timeout
16361639
* connection, let's kill the script otherwise the process
16371640
* will remain blocked indefinitely. */
1638-
lua_pushstring(lua, "timeout during Lua debugging with client closing connection");
1639-
lua_error(lua);
1641+
luaPushError(lua, "timeout during Lua debugging with client closing connection");
1642+
luaError(lua);
16401643
}
16411644
rctx->start_time = getMonotonicUs();
16421645
rctx->snapshot_time = mstime();

src/function_lua.c

+16-10
Original file line numberDiff line numberDiff line change
@@ -86,8 +86,8 @@ static void luaEngineLoadHook(lua_State *lua, lua_Debug *ar) {
8686
if (duration > LOAD_TIMEOUT_MS) {
8787
lua_sethook(lua, luaEngineLoadHook, LUA_MASKLINE, 0);
8888

89-
lua_pushstring(lua,"FUNCTION LOAD timeout");
90-
lua_error(lua);
89+
luaPushError(lua,"FUNCTION LOAD timeout");
90+
luaError(lua);
9191
}
9292
}
9393

@@ -151,10 +151,13 @@ static int luaEngineCreate(void *engine_ctx, functionLibInfo *li, sds blob, sds
151151
lua_sethook(lua,luaEngineLoadHook,LUA_MASKCOUNT,100000);
152152
/* Run the compiled code to allow it to register functions */
153153
if (lua_pcall(lua,0,0,0)) {
154-
*err = sdscatprintf(sdsempty(), "Error registering functions: %s", lua_tostring(lua, -1));
154+
errorInfo err_info = {0};
155+
luaExtractErrorInformation(lua, &err_info);
156+
*err = sdscatprintf(sdsempty(), "Error registering functions: %s", err_info.msg);
155157
lua_pop(lua, 2); /* pops the error and globals table */
156158
lua_sethook(lua,NULL,0,0); /* Disable hook */
157159
luaSaveOnRegistry(lua, REGISTRY_LOAD_CTX_NAME, NULL);
160+
luaErrorInformationDiscard(&err_info);
158161
return C_ERR;
159162
}
160163
lua_sethook(lua,NULL,0,0); /* Disable hook */
@@ -429,19 +432,19 @@ static int luaRegisterFunction(lua_State *lua) {
429432
loadCtx *load_ctx = luaGetFromRegistry(lua, REGISTRY_LOAD_CTX_NAME);
430433
if (!load_ctx) {
431434
luaPushError(lua, "redis.register_function can only be called on FUNCTION LOAD command");
432-
return luaRaiseError(lua);
435+
return luaError(lua);
433436
}
434437

435438
if (luaRegisterFunctionReadArgs(lua, &register_f_args) != C_OK) {
436-
return luaRaiseError(lua);
439+
return luaError(lua);
437440
}
438441

439442
sds err = NULL;
440443
if (functionLibCreateFunction(register_f_args.name, register_f_args.lua_f_ctx, load_ctx->li, register_f_args.desc, register_f_args.f_flags, &err) != C_OK) {
441444
luaRegisterFunctionArgsDispose(lua, &register_f_args);
442445
luaPushError(lua, err);
443446
sdsfree(err);
444-
return luaRaiseError(lua);
447+
return luaError(lua);
445448
}
446449

447450
return 0;
@@ -475,11 +478,14 @@ int luaEngineInitEngine() {
475478
" if i and i.what == 'C' then\n"
476479
" i = dbg.getinfo(3,'nSl')\n"
477480
" end\n"
481+
" if type(err) ~= 'table' then\n"
482+
" err = {err='ERR' .. tostring(err)}"
483+
" end"
478484
" if i then\n"
479-
" return i.source .. ':' .. i.currentline .. ': ' .. err\n"
480-
" else\n"
481-
" return err\n"
482-
" end\n"
485+
" err['source'] = i.source\n"
486+
" err['line'] = i.currentline\n"
487+
" end"
488+
" return err\n"
483489
"end\n"
484490
"return error_handler";
485491
luaL_loadbuffer(lua_engine_ctx->lua, errh_func, strlen(errh_func), "@err_handler_def");

src/module.c

+1-1
Original file line numberDiff line numberDiff line change
@@ -5658,7 +5658,7 @@ RedisModuleCallReply *RM_Call(RedisModuleCtx *ctx, const char *cmdname, const ch
56585658
errno = ENOENT;
56595659
goto cleanup;
56605660
}
5661-
c->cmd = c->lastcmd = cmd;
5661+
c->cmd = c->lastcmd = c->realcmd = cmd;
56625662

56635663
/* Basic arity checks. */
56645664
if ((cmd->arity > 0 && cmd->arity != argc) || (argc < -cmd->arity)) {

src/multi.c

+2-2
Original file line numberDiff line numberDiff line change
@@ -189,7 +189,7 @@ void execCommand(client *c) {
189189
c->argc = c->mstate.commands[j].argc;
190190
c->argv = c->mstate.commands[j].argv;
191191
c->argv_len = c->mstate.commands[j].argv_len;
192-
c->cmd = c->mstate.commands[j].cmd;
192+
c->cmd = c->realcmd = c->mstate.commands[j].cmd;
193193

194194
/* ACL permissions are also checked at the time of execution in case
195195
* they were changed after the commands were queued. */
@@ -240,7 +240,7 @@ void execCommand(client *c) {
240240
c->argv = orig_argv;
241241
c->argv_len = orig_argv_len;
242242
c->argc = orig_argc;
243-
c->cmd = orig_cmd;
243+
c->cmd = c->realcmd = orig_cmd;
244244
discardTransaction(c);
245245

246246
server.in_exec = 0;

src/networking.c

+64-31
Original file line numberDiff line numberDiff line change
@@ -156,7 +156,7 @@ client *createClient(connection *conn) {
156156
c->argv_len_sum = 0;
157157
c->original_argc = 0;
158158
c->original_argv = NULL;
159-
c->cmd = c->lastcmd = NULL;
159+
c->cmd = c->lastcmd = c->realcmd = NULL;
160160
c->multibulklen = 0;
161161
c->bulklen = -1;
162162
c->sentlen = 0;
@@ -443,8 +443,10 @@ void addReplyErrorLength(client *c, const char *s, size_t len) {
443443
addReplyProto(c,"\r\n",2);
444444
}
445445

446-
/* Do some actions after an error reply was sent (Log if needed, updates stats, etc.) */
447-
void afterErrorReply(client *c, const char *s, size_t len) {
446+
/* Do some actions after an error reply was sent (Log if needed, updates stats, etc.)
447+
* Possible flags:
448+
* * ERR_REPLY_FLAG_NO_STATS_UPDATE - indicate not to update any error stats. */
449+
void afterErrorReply(client *c, const char *s, size_t len, int flags) {
448450
/* Module clients fall into two categories:
449451
* Calls to RM_Call, in which case the error isn't being returned to a client, so should not be counted.
450452
* Module thread safe context calls to RM_ReplyWithError, which will be added to a real client by the main thread later. */
@@ -457,22 +459,30 @@ void afterErrorReply(client *c, const char *s, size_t len) {
457459
return;
458460
}
459461

460-
/* Increment the global error counter */
461-
server.stat_total_error_replies++;
462-
/* Increment the error stats
463-
* If the string already starts with "-..." then the error prefix
464-
* is provided by the caller ( we limit the search to 32 chars). Otherwise we use "-ERR". */
465-
if (s[0] != '-') {
466-
incrementErrorCount("ERR", 3);
467-
} else {
468-
char *spaceloc = memchr(s, ' ', len < 32 ? len : 32);
469-
if (spaceloc) {
470-
const size_t errEndPos = (size_t)(spaceloc - s);
471-
incrementErrorCount(s+1, errEndPos-1);
472-
} else {
473-
/* Fallback to ERR if we can't retrieve the error prefix */
462+
if (!(flags & ERR_REPLY_FLAG_NO_STATS_UPDATE)) {
463+
/* Increment the global error counter */
464+
server.stat_total_error_replies++;
465+
/* Increment the error stats
466+
* If the string already starts with "-..." then the error prefix
467+
* is provided by the caller ( we limit the search to 32 chars). Otherwise we use "-ERR". */
468+
if (s[0] != '-') {
474469
incrementErrorCount("ERR", 3);
470+
} else {
471+
char *spaceloc = memchr(s, ' ', len < 32 ? len : 32);
472+
if (spaceloc) {
473+
const size_t errEndPos = (size_t)(spaceloc - s);
474+
incrementErrorCount(s+1, errEndPos-1);
475+
} else {
476+
/* Fallback to ERR if we can't retrieve the error prefix */
477+
incrementErrorCount("ERR", 3);
478+
}
475479
}
480+
} else {
481+
/* stat_total_error_replies will not be updated, which means that
482+
* the cmd stats will not be updated as well, we still want this command
483+
* to be counted as failed so we update it here. We update c->realcmd in
484+
* case c->cmd was changed (like in GEOADD). */
485+
c->realcmd->failed_calls++;
476486
}
477487

478488
/* Sometimes it could be normal that a slave replies to a master with
@@ -518,7 +528,7 @@ void afterErrorReply(client *c, const char *s, size_t len) {
518528
* Unlike addReplyErrorSds and others alike which rely on addReplyErrorLength. */
519529
void addReplyErrorObject(client *c, robj *err) {
520530
addReply(c, err);
521-
afterErrorReply(c, err->ptr, sdslen(err->ptr)-2); /* Ignore trailing \r\n */
531+
afterErrorReply(c, err->ptr, sdslen(err->ptr)-2, 0); /* Ignore trailing \r\n */
522532
}
523533

524534
/* Sends either a reply or an error reply by checking the first char.
@@ -539,34 +549,57 @@ void addReplyOrErrorObject(client *c, robj *reply) {
539549
/* See addReplyErrorLength for expectations from the input string. */
540550
void addReplyError(client *c, const char *err) {
541551
addReplyErrorLength(c,err,strlen(err));
542-
afterErrorReply(c,err,strlen(err));
552+
afterErrorReply(c,err,strlen(err),0);
553+
}
554+
555+
/* Add error reply to the given client.
556+
* Supported flags:
557+
* * ERR_REPLY_FLAG_NO_STATS_UPDATE - indicate not to perform any error stats updates */
558+
void addReplyErrorSdsEx(client *c, sds err, int flags) {
559+
addReplyErrorLength(c,err,sdslen(err));
560+
afterErrorReply(c,err,sdslen(err),flags);
561+
sdsfree(err);
543562
}
544563

545564
/* See addReplyErrorLength for expectations from the input string. */
546565
/* As a side effect the SDS string is freed. */
547566
void addReplyErrorSds(client *c, sds err) {
548-
addReplyErrorLength(c,err,sdslen(err));
549-
afterErrorReply(c,err,sdslen(err));
550-
sdsfree(err);
567+
addReplyErrorSdsEx(c, err, 0);
551568
}
552569

553-
/* See addReplyErrorLength for expectations from the formatted string.
554-
* The formatted string is safe to contain \r and \n anywhere. */
555-
void addReplyErrorFormat(client *c, const char *fmt, ...) {
556-
va_list ap;
557-
va_start(ap,fmt);
558-
sds s = sdscatvprintf(sdsempty(),fmt,ap);
559-
va_end(ap);
570+
/* Internal function used by addReplyErrorFormat and addReplyErrorFormatEx.
571+
* Refer to afterErrorReply for more information about the flags. */
572+
static void addReplyErrorFormatInternal(client *c, int flags, const char *fmt, va_list ap) {
573+
va_list cpy;
574+
va_copy(cpy,ap);
575+
sds s = sdscatvprintf(sdsempty(),fmt,cpy);
576+
va_end(cpy);
560577
/* Trim any newlines at the end (ones will be added by addReplyErrorLength) */
561578
s = sdstrim(s, "\r\n");
562579
/* Make sure there are no newlines in the middle of the string, otherwise
563580
* invalid protocol is emitted. */
564581
s = sdsmapchars(s, "\r\n", " ", 2);
565582
addReplyErrorLength(c,s,sdslen(s));
566-
afterErrorReply(c,s,sdslen(s));
583+
afterErrorReply(c,s,sdslen(s),flags);
567584
sdsfree(s);
568585
}
569586

587+
void addReplyErrorFormatEx(client *c, int flags, const char *fmt, ...) {
588+
va_list ap;
589+
va_start(ap,fmt);
590+
addReplyErrorFormatInternal(c, flags, fmt, ap);
591+
va_end(ap);
592+
}
593+
594+
/* See addReplyErrorLength for expectations from the formatted string.
595+
* The formatted string is safe to contain \r and \n anywhere. */
596+
void addReplyErrorFormat(client *c, const char *fmt, ...) {
597+
va_list ap;
598+
va_start(ap,fmt);
599+
addReplyErrorFormatInternal(c, 0, fmt, ap);
600+
va_end(ap);
601+
}
602+
570603
void addReplyErrorArity(client *c) {
571604
addReplyErrorFormat(c, "wrong number of arguments for '%s' command",
572605
c->cmd->fullname);
@@ -1086,7 +1119,7 @@ void deferredAfterErrorReply(client *c, list *errors) {
10861119
listRewind(errors,&li);
10871120
while((ln = listNext(&li))) {
10881121
sds err = ln->value;
1089-
afterErrorReply(c, err, sdslen(err));
1122+
afterErrorReply(c, err, sdslen(err), 0);
10901123
}
10911124
}
10921125

src/script.c

+13-9
Original file line numberDiff line numberDiff line change
@@ -505,32 +505,31 @@ void scriptCall(scriptRunCtx *run_ctx, robj* *argv, int argc, sds *err) {
505505
argc = c->argc;
506506

507507
struct redisCommand *cmd = lookupCommand(argv, argc);
508+
c->cmd = c->lastcmd = c->realcmd = cmd;
508509
if (scriptVerifyCommandArity(cmd, argc, err) != C_OK) {
509-
return;
510+
goto error;
510511
}
511512

512-
c->cmd = c->lastcmd = cmd;
513-
514513
/* There are commands that are not allowed inside scripts. */
515514
if (!server.script_disable_deny_script && (cmd->flags & CMD_NOSCRIPT)) {
516515
*err = sdsnew("This Redis command is not allowed from script");
517-
return;
516+
goto error;
518517
}
519518

520519
if (scriptVerifyAllowStale(c, err) != C_OK) {
521-
return;
520+
goto error;
522521
}
523522

524523
if (scriptVerifyACL(c, err) != C_OK) {
525-
return;
524+
goto error;
526525
}
527526

528527
if (scriptVerifyWriteCommandAllow(run_ctx, err) != C_OK) {
529-
return;
528+
goto error;
530529
}
531530

532531
if (scriptVerifyOOM(run_ctx, err) != C_OK) {
533-
return;
532+
goto error;
534533
}
535534

536535
if (cmd->flags & CMD_WRITE) {
@@ -539,7 +538,7 @@ void scriptCall(scriptRunCtx *run_ctx, robj* *argv, int argc, sds *err) {
539538
}
540539

541540
if (scriptVerifyClusterState(c, run_ctx->original_client, err) != C_OK) {
542-
return;
541+
goto error;
543542
}
544543

545544
int call_flags = CMD_CALL_SLOWLOG | CMD_CALL_STATS;
@@ -551,6 +550,11 @@ void scriptCall(scriptRunCtx *run_ctx, robj* *argv, int argc, sds *err) {
551550
}
552551
call(c, call_flags);
553552
serverAssert((c->flags & CLIENT_BLOCKED) == 0);
553+
return;
554+
555+
error:
556+
afterErrorReply(c, *err, sdslen(*err), 0);
557+
incrCommandStatsOnError(cmd, ERROR_COMMAND_REJECTED);
554558
}
555559

556560
/* Returns the time when the script invocation started */

0 commit comments

Comments
 (0)