gVisor integration #5

Description

@wrr

Integrate Drop with gVisor as an optional isolation layer between sandboxed apps and the host kernel.

Drop config and the drop run command will support a new isolation option with two values:

  • native: use Linux namespaces alone for sandboxed process isolation. This is the currently supported mode.
  • gvisor: in addition to the Linux namespaces, use the gVisor userspace kernel for added isolation.
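As a rough illustration of how the two values could be validated, here is a minimal Python sketch. The names `Isolation` and `parse_isolation` are hypothetical, and falling back to native when the option is unset is an assumption, not something decided in this issue.

```python
from enum import Enum


class Isolation(Enum):
    """Isolation modes proposed in this issue."""
    NATIVE = "native"  # Linux namespaces only (the currently supported mode)
    GVISOR = "gvisor"  # namespaces plus the gVisor userspace kernel


def parse_isolation(value=None):
    """Map a config or command-line value to an Isolation mode.

    Assumption: an unset option falls back to native.
    """
    if value is None:
        return Isolation.NATIVE
    try:
        return Isolation(value)
    except ValueError:
        allowed = ", ".join(mode.value for mode in Isolation)
        raise ValueError(
            f"invalid isolation {value!r}, expected one of: {allowed}"
        ) from None
```

Rejecting unknown values early keeps a typo in the config from silently running an app with weaker isolation than intended.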

I have an early prototype which demonstrates that this integration should be technically feasible. The following approach looks the most straightforward, with the smallest gVisor-specific footprint:

  • Drop continues to create and set up the user namespace with all the mounts. This is described as 'Method 2: Caller-Configured Userns' in this document: https://gvisor.dev/docs/user_guide/rootless/ and is the gVisor integration method used by Docker. This approach reuses all the existing filesystem setup logic and works around gVisor's lack of support for overlayfs: Drop will do the required overlayfs mounts, and gVisor will work with already assembled directory trees.
  • gVisor uses host networking for connectivity, where the "host" in this case is a separate network namespace set up by Drop with pasta. This reuses all the network namespace setup and shares networking configuration between the gVisor and native paths.
  • The communication protocol between the Drop parent and child processes needs to be updated to pass the pseudoterminal file descriptor via a named Unix socket. gVisor doesn't support passing the descriptor via an unnamed socket pair, which is how Drop currently does it.
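The descriptor-passing change in the last point can be sketched in Python as follows. This is only an illustration of SCM_RIGHTS over a socket bound to a filesystem path, not Drop's actual protocol. The helper names, the socket file name `pty.sock`, and the one-byte payload are all hypothetical.

```python
import os
import socket
import tempfile


def send_fd(sock_path, fd):
    """Connect to a named Unix socket and send one file descriptor."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        # The descriptor travels as ancillary data (SCM_RIGHTS), so at least
        # one byte of ordinary data has to be sent along with it.
        socket.send_fds(client, [b"p"], [fd])


def recv_fd(listener):
    """Accept one connection and return the descriptor it carries."""
    conn, _ = listener.accept()
    with conn:
        _, fds, _, _ = socket.recv_fds(conn, 1, 1)
    return fds[0]


def round_trip(fd):
    """Send fd through a named socket in a temp dir and return the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pty.sock")  # hypothetical socket name
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listener:
            listener.bind(path)
            listener.listen(1)
            # In this single-process demo the connection waits in the listen
            # backlog, so the send can finish before accept() is called.
            send_fd(path, fd)
            return recv_fd(listener)
```

In a real parent/child setup the two sides would run in different processes, and the descriptor would be one end of a pseudoterminal (for example from `os.openpty()`). The received descriptor is a new number that refers to the same open file. `socket.send_fds` and `socket.recv_fds` require Python 3.9 or later.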

So far, we have encountered the following non-blocking issues:

  • When run within a user namespace created by an external caller, gVisor doesn't support running with a UID other than 0. This can be worked around by configuring gVisor to create an additional, nested user namespace.
  • gVisor doesn't create the relevant pseudoterminal entries in the /dev/pts directory, such as /dev/pts/0. The pseudoterminal works correctly, but the tty command outputs "not a tty" and ps aux outputs "?" as the terminal. This is a non-fatal compatibility and UX issue. It will also cause Drop end-to-end tests to fail when testing the gVisor-based sandbox, so we will need some way to disable certain end-to-end tests when gVisor-based isolation is used.
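One possible shape for disabling specific end-to-end tests under gVisor is a skip marker keyed on the isolation mode. The sketch below uses Python's `unittest` purely for illustration. Drop's real test harness may look quite different, and `skip_on_gvisor`, `make_suite`, and the test names are hypothetical.

```python
import unittest


def skip_on_gvisor(isolation, reason):
    """Return a decorator that skips a test when isolation is 'gvisor'."""
    return unittest.skipIf(isolation == "gvisor", reason)


def make_suite(isolation):
    """Build an example suite for the given isolation mode."""

    class TerminalTests(unittest.TestCase):
        @skip_on_gvisor(isolation, "gVisor does not create /dev/pts entries")
        def test_tty_reports_device(self):
            # Placeholder for a check that needs /dev/pts/N to exist.
            pass

        def test_terminal_is_usable(self):
            # Placeholder for a check that should pass in both modes.
            pass

    return unittest.defaultTestLoader.loadTestsFromTestCase(TerminalTests)


def count_skipped(isolation):
    """Run the example suite and return how many tests were skipped."""
    result = unittest.TestResult()
    make_suite(isolation).run(result)
    return len(result.skipped)
```

Attaching the reason to each skipped test keeps the known gVisor gaps visible in test output, so the tests can be re-enabled if gVisor later adds the missing /dev/pts entries.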
