Skip to content

Improvements#5

Merged
paolobarbolini merged 12 commits into
mainfrom
fix/review-findings
Jun 10, 2026
Merged

Improvements#5
paolobarbolini merged 12 commits into
mainfrom
fix/review-findings

Conversation

@paolobarbolini

Copy link
Copy Markdown
Member

No description provided.

MeasureManyStream::poll_next_icmp_replies matched Poll::Ready(_), so a
send that failed synchronously (e.g. EACCES for broadcast, ENETUNREACH
when there is no route) was recorded in in_flight even though no reply
can ever arrive. Since the stream only terminates once in_flight is
empty, a single failed send made the stream hang forever.

Only track an address as in-flight when the send succeeded; failed
addresses are consumed and simply yield no measurement.
The background receive task held Arc<InnerPinger>, which contained the
only sender of its own subscription channel. The channel could therefore
never disconnect, so the task never exited and the ICMP socket was never
closed: every Pinger created and then dropped leaked a task and a file
descriptor for the lifetime of the runtime.

Move the sender out of InnerPinger into Pinger itself. Dropping the
Pinger now closes the channel, the task observes the disconnect, returns
and releases the socket.
The receive task ignored Poll::Ready(Err(..)) from the socket. In that
case the socket readiness had already been consumed without registering
a waker, so the task parked with only the subscription channel waker
armed: reply processing silently stopped until the next subscribe or
unsubscribe message happened to wake the task.

Handle the error arm explicitly by scheduling an immediate re-poll, so
the task retries the socket instead of stalling.
RawPinger::poll_recv unwrapped the conversion of the reply's source
address to the pinger's IP version. The DGRAM ICMP socket should never
deliver a packet of the wrong family, but if it ever did, the unwrap
would kill Pinger's background receive task, and from then on every
measure_many call would panic with "Receiver closed".

Treat such packets like any other invalid packet: skip them and poll
the socket again.
tokio's AsyncFd only remembers the waker of the most recent poll per
direction. All MeasureManyStreams of a Pinger share one socket, so when
two streams hit a full send buffer concurrently, only the last one to
poll would be woken on writability; the other - with no pings in flight
yet and thus no other wake source - could stall forever.

Don't rely on the write-readiness waker from the stream: schedule an
immediate re-poll instead. A full ICMP send buffer drains quickly, so
the retry loop is short-lived.
Rounds were identified solely by their u16 wire sequence number, which
wraps after 65536 measure_many calls. A long-lived stream dropped after
the wraparound would unsubscribe whichever newer round had taken over
its sequence number, leaving that round unable to ever receive replies.

Give every round a unique u64 id (the wire sequence number is its lower
16 bits) and only honor an unsubscribe if the slot still belongs to the
unsubscribing round. Also relax the counter ordering from AcqRel to
Relaxed: it is a pure id allocator, no memory is synchronized through
it.
in_flight is keyed by address and replies are matched by source address,
so pinging an address that is still awaiting its reply could never yield
a second measurement; it only clobbered the first ping's start time,
corrupting the reported RTT.

Skip addresses that already have a ping in flight and document the
single-measurement-per-address semantics on measure_many.
Replies were matched purely by sequence number and source address; the
echoed payload was never checked. A late reply to an older round whose
sequence number collided after wraparound (or blindly spoofed
cross-traffic) could be attributed to the current round, producing a
bogus RTT for an address that never actually answered.

Echo replies mirror the request payload, so give each round a single
random payload, have the receive task discard replies that don't carry
it, and as a bonus build the request packet (checksum included) once
per round instead of once per address.
Pinger generated a random identifier and stamped it into every echo
request, but on Linux ping sockets the kernel overwrites the identifier
with the socket's own and uses it to route replies, so the random value
never appeared on the wire and was never checked on receive. It only
suggested an isolation mechanism that doesn't exist at this layer.

Remove it and document the kernel behavior on EchoRequestPacket::new
and EchoReplyPacket::identifier.
The checksum and packet layout code was hand-written when pnet_packet
was removed, but had no unit tests. Regressions there are invisible to
the integration tests because the kernel recomputes the checksum (and
rewrites the identifier) when sending through ping sockets.

Cover the RFC 1071 checksum with reference vectors (known IPv4 header,
empty input, carry folding, odd-length padding) and assert the exact
byte layout of generated v4/v6 echo requests, including the
sums-to-zero property of a correctly checksummed packet.
- README claimed the crate uses RAW sockets; it uses unprivileged
  SOCK_DGRAM ICMP sockets (lib.rs already said so).
- Fix "altough" and "Leaking this method might crate a slowly forever
  growing memory leak" (the leak is also bounded by the pinger's
  lifetime now that dropping it shuts the receive task down).
- Remove a stale SAFETY comment on safe code in RecvFuture and point
  the real one in Socket::poll_read at the right function.
- Document the panics of Pinger::new/measure_many (tokio runtime
  required / gone), the one-task-per-direction polling constraint of
  RawPinger, and the 2048-byte receive truncation.
MeasureManyStream and DualstackMeasureManyStream send pings lazily and
the send/recv futures are inert until polled, so silently dropping them
is almost certainly a bug; mark them must_use (the trait-level must_use
on Future doesn't apply to concrete named types).

Pinger is internally an Arc plus a channel sender, so implement Clone
for it (and derive it for DualstackPinger) instead of forcing users to
wrap pingers in another Arc. The background task keeps running until
the last clone is dropped.
@paolobarbolini paolobarbolini merged commit 8fde5dc into main Jun 10, 2026
6 checks passed
@paolobarbolini paolobarbolini deleted the fix/review-findings branch June 10, 2026 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant