@npow npow commented Oct 22, 2025

Problem

Intermittent JSONDecodeError when multiple environments are resolved concurrently during deployment:

json.decoder.JSONDecodeError: Expecting value: line 1 column 86673 (char 86672)

Root Cause

Race condition in FIFO-based IPC between deployer subprocess and parent process:

  1. Writer side: The subprocess writes JSON to the FIFO, but Python's buffered I/O may not flush immediately
  2. Reader side: The parent process reads from the FIFO in non-blocking mode
  3. Race: When the subprocess exits quickly after close(), the reader detects the process exit and breaks on an empty read
  4. Problem: The kernel may still hold buffered pipe data that the reader has not consumed yet
  5. Result: Truncated JSON at arbitrary positions (~86KB in the error case)
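
To illustrate why a short read surfaces as this particular error, here is a minimal sketch that parses a deliberately truncated payload. The payload contents and the `parse_truncated` helper are made up for illustration and are not taken from the deployer code. When the cut lands where a value is expected, the reported position equals the number of characters that were actually received, which is consistent with the char 86672 in the traceback above.

```python
import json


def parse_truncated(payload, cut):
    """Parse only the first `cut` characters, as a reader would after a short read.

    Returns (message, position) from the decode error, or None if parsing succeeds.
    """
    try:
        json.loads(payload[:cut])
    except json.JSONDecodeError as err:
        return err.msg, err.pos
    return None


# Hypothetical payload; the real one is the resolved environment JSON.
full = json.dumps({"env": "a", "packages": ["x", "y"]})

# Cut right before the list, i.e. just after the ': ' where a value is expected.
cut = full.index("[")
result = parse_truncated(full, cut)
```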

Solution

Changed read_from_fifo_when_ready() to use a hybrid approach:

  1. Start in non-blocking mode (existing behavior)
     • Use select.poll() to wait for data
     • Can detect subprocess failures early
     • Can time out if the subprocess hangs
  2. Switch to blocking mode once the first data arrives
     • Use fcntl() to clear the O_NONBLOCK flag
     • Continue with blocking read() calls
     • POSIX guarantee: A blocking read() on a pipe returns EOF (0 bytes) ONLY after every writer has closed its end AND the kernel pipe buffer has been drained
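
The two phases above can be sketched roughly as follows. This is a simplified illustration, not the actual body of read_from_fifo_when_ready(): the names `read_fifo_hybrid` and `demo_roundtrip`, the timeout, and the chunk size are assumptions, and the subprocess liveness check done during polling is omitted. It also assumes Linux FIFO semantics, where poll() keeps waiting while no writer has connected yet.

```python
import fcntl
import json
import os
import select
import tempfile
import threading


def read_fifo_hybrid(path, timeout_ms=5000, chunk_size=65536):
    """Hypothetical helper: poll for the first data, then read blocking until EOF."""
    # O_NONBLOCK lets open() return even if no writer has connected yet.
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        poller = select.poll()
        poller.register(fd, select.POLLIN)
        # Phase 1: wait, with a timeout, for the first event on the FIFO.
        if not poller.poll(timeout_ms):
            raise TimeoutError("no data arrived on the FIFO")

        # Phase 2: clear O_NONBLOCK so read() waits for data or a true EOF.
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)

        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                # b"" here means all writers closed and the pipe is empty.
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


def demo_roundtrip(size=200_000):
    """Send a payload larger than a typical pipe buffer and parse what arrives."""
    payload = json.dumps({"blob": "x" * size})
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "ipc.fifo")
        os.mkfifo(fifo)

        def writer():
            # open() for writing blocks until the reader has opened the FIFO.
            with open(fifo, "w") as f:
                f.write(payload)

        thread = threading.Thread(target=writer)
        thread.start()
        data = read_fifo_hybrid(fifo)
        thread.join()
    return json.loads(data.decode("utf-8"))
```

Because the loop only stops on a zero-byte blocking read, the writer exiting early does not end the read while data is still queued in the pipe.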

@nflx-mf-bot
Collaborator

Netflix internal testing[1398] @ 7594e0f

# All data read, exit main loop
break
else:
if len(events):
Contributor


When does this happen? So we got some event (like file close?) and no data?
