Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid runtime failure in CopyXLogRecordToWAL with sanitizers enabled #554

Open
wants to merge 1 commit into
base: REL_17_STABLE_neon
Choose a base branch
from

Conversation

alexanderlaw
Copy link

There is a single place in the code, that triggers:

.../vendor/postgres-v17/src/backend/access/transam/xlog.c:1345:3: runtime error: null pointer passed as argument 2, which is declared to never be null
    #0 0x616fe1a47039 in CopyXLogRecordToWAL .../vendor/postgres-v17/src/backend/access/transam/xlog.c:1345
    #1 0x616fe1a452da in XLogInsertRecord .../vendor/postgres-v17/src/backend/access/transam/xlog.c:971
    #2 0x616fe1a8599e in XLogInsert .../vendor/postgres-v17/src/backend/access/transam/xloginsert.c:535
    #3 0x616fe2ce6dd3 in LogLogicalMessage .../vendor/postgres-v17/src/backend/replication/logical/message.c:72
    #4 0x616fe2d91900 in ReplicationSlotDropPtr .../vendor/postgres-v17/src/backend/replication/slot.c:908
    #5 0x616fe2d91521 in ReplicationSlotDropAcquired .../vendor/postgres-v17/src/backend/replication/slot.c:878
    #6 0x616fe2d90f01 in ReplicationSlotDrop .../vendor/postgres-v17/src/backend/replication/slot.c:801
    #7 0x616fe2dd05e5 in DropReplicationSlot .../vendor/postgres-v17/src/backend/replication/walsender.c:1411
    #8 0x616fe2dd3b82 in exec_replication_command .../vendor/postgres-v17/src/backend/replication/walsender.c:2129
    #9 0x616fe3003c7f in PostgresMain .../vendor/postgres-v17/src/backend/tcop/postgres.c:4773
    #10 0x616fe2fe70e6 in BackendMain .../vendor/postgres-v17/src/backend/tcop/backend_startup.c:105
    #11 0x616fe2bdde33 in postmaster_child_launch .../vendor/postgres-v17/src/backend/postmaster/launch_backend.c:277
    #12 0x616fe2bed747 in BackendStartup .../vendor/postgres-v17/src/backend/postmaster/postmaster.c:3594
    #13 0x616fe2be6786 in ServerLoop .../vendor/postgres-v17/src/backend/postmaster/postmaster.c:1674
    #14 0x616fe2be5158 in PostmasterMain .../vendor/postgres-v17/src/backend/postmaster/postmaster.c:1372
    #15 0x616fe25581ee in main .../vendor/postgres-v17/src/backend/main/main.c:237
    #16 0x74322502a1c9 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #17 0x74322502a28a in __libc_start_main_impl ../csu/libc-start.c:360
    #18 0x616fe1482cd4 in _start (...pg_install/v17/bin/postgres+0x3245cd4) (BuildId: 219f2ffde6cb1384f41ccb0201a0cf72c4a24e59)

in several regress tests (test_dropdb_with_subscription, test_layer_bloating, test_logical_replication, ...)

The change proposed is enough to make tests pass with asan/ubsan-enabled postgres build.

See also 46ab07f.

@tristan957
Copy link
Member

Should this be upstreamed?

@alexanderlaw
Copy link
Author

Should this be upstreamed?

I can research this deeper, but if my memory serves me (I had the same idea when I discovered the failure in 2023), I could not reach that memcpy with rdata_data == null on vanilla postgres. From the backtrace shown, I suspect that ReplicationSlotDropPtr is producing such unusual WAL records.

@tristan957
Copy link
Member

Seems like we should probably figure out how we sending NULL to this function then.

@alexanderlaw
Copy link
Author

Seems like we should probably figure out how we sending NULL to this function then.

I see three such places:

LogLogicalMessage(prefix, NULL, 0, false, true);

LogLogicalMessage(prefix, NULL, 0, false, false);

LogLogicalMessage(prefix, NULL, 0, false, true);

I couldn't find such LogLogicalMessage() calls in vanilla PostgreSQL; so tried:

@@ -373,8 +373,9 @@ pg_logical_emit_message_bytea(PG_FUNCTION_ARGS)
        bool            flush = PG_GETARG_BOOL(3);
        XLogRecPtr      lsn;
 
-       lsn = LogLogicalMessage(prefix, VARDATA_ANY(data), VARSIZE_ANY_EXHDR(data),
+       lsn = LogLogicalMessage(prefix, NULL, 0,
                                                        transactional, flush);

and got the same ubsan-detected errors during ' make -s check -C contrib/test_decoding/'.

@tristan957
Copy link
Member

Yeah, then I think this patch makes sense. I don't think it's possible to not pass NULL there.

@alexanderlaw
Copy link
Author

Closing this PR, as the proposed change is included in neondatabase/neon#10473 now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants