Skip to content

Fix truncation of large payloads#83

Merged
freekmurze merged 1 commit into
spatie:mainfrom
obelloc:fix-payload-truncation
Mar 27, 2026
Merged

Fix truncation of large payloads#83
freekmurze merged 1 commit into
spatie:mainfrom
obelloc:fix-payload-truncation

Conversation

@obelloc

@obelloc obelloc commented Mar 27, 2026

Copy link
Copy Markdown
Contributor

This PR aims to fix the issues #22 and #59, also addressing the concerns of discussion #65.

Problem

When a Task (forked child process) attempts to return a large payload, it gets truncated, which causes the parent process to either continue with a truncated payload (if the payload was of string type), or to fail while unserialize()'ing the payload (if it was an object/array).

Bug

When a child process calls the Connection::write method with its payload, it then calls socket_select to check if there is enough space in the kernel's output buffer for a following write operation not to block. In the first attempt, socket_select always succeeds immediately because the output buffer is always empty for a newly created socket. It then follows to call socket_write. At this point, if the payload is small, it fully fits in the kernel's buffer and everything is good, however, if the payload does not fit the buffer, the function will attempt to write the remaining payload, looping to call socket_select another time. This time though, the kernel's buffer is already full and socket_select has to wait for some free space to show up, but it can only wait for 100us, which is unrealistic, and therefore it times out, returns false, breaks the loop and returns from Connection::write with the payload partially written.

Such a short timeout is unrealistic for this use-case because the parent process, which reads from the connection and is therefore responsible for opening space in the kernel's output buffer, might actually be sleeping for 1ms, which means that while it sleeps, all the children time out.

Solution

The solution is to increase the timeout to a reasonable amount, so that the parent process has time to sleep and go through all the children, reading their connection and opening space in the child's output buffer each time. In this case, the timeout is not the time needed for the child to write its full payload, but rather the time needed for the parent process to read and open any space in the child's buffer, therefore, the timeout should only trigger if the parent process is blocked somewhere for a long time. The value itself is arbitrary but must be generous.

Another important change though is that the current timeout for the Connection class affects both read and write operations. However, the read operation does not need any timeout, or rather, the timeout should be zero (completely nonblocking), because it is only used by the parent process to continuously monitor/read from all the children in a loop. If a particular child has no payload yet, there is no reason to wait at all, it can go straight to the next child and, in the end, it will sleep for 1ms, where it actually waits for all the children to produce payload instead of waiting for each one individually.

This means the timeout only affects write operations and, therefore, can be configured to a very generous value without affecting the performance of the program at all. It should only trigger if the program is already broken somewhere else. Removing the timeout completely would be another option, but I tried to make this PR as small as possible.

I hope this makes sense.

@freekmurze freekmurze merged commit 88436cc into spatie:main Mar 27, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants