15 changes: 13 additions & 2 deletions docs/advanced_installation/advanced_installation.rst
@@ -1,7 +1,7 @@
Advanced Installation
=====================

`pip <advanced_installation_pip.html>`__ \|\| `uv <advanced_installation_uv.html>`__ \|\| `pixi <advanced_installation_pixi.html>`__ \|\| `conda <advanced_installation_conda.html>`__ \|\| `Spack <advanced_installation_spack.html>`__
`pip <advanced_installation_pip.html>`__ || `uv <advanced_installation_uv.html>`__ || `pixi <advanced_installation_pixi.html>`__ || `conda <advanced_installation_conda.html>`__ || `Spack <advanced_installation_spack.html>`__

libEnsemble can be installed from ``pip``, ``uv``, ``pixi``, ``Conda``, or ``Spack``.

@@ -31,7 +31,18 @@ Further recommendations for selected HPC systems are given in the
Globus Compute
--------------

`Globus Compute`_ may be installed optionally to submit simulation function instances to remote Globus Compute endpoints.
`Globus Compute`_ may be installed optionally to submit simulation function
instances to remote Globus Compute endpoints::

pip install globus-compute-sdk

This is an optional dependency; libEnsemble operates normally without it.
If Globus Compute is not installed and a ``globus_compute_endpoint`` is
configured, libEnsemble will warn and fall back to local execution.
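The warn-and-fall-back behavior for an optional dependency can be sketched as follows. This is a minimal illustration only, not libEnsemble's implementation: ``resolve_endpoint`` and its ``module_name`` parameter are hypothetical names introduced here, and the check relies solely on the standard library.

```python
import importlib.util
import warnings


def resolve_endpoint(endpoint_id, module_name="globus_compute_sdk"):
    """Return endpoint_id if the optional SDK is importable, else None.

    Hypothetical helper: a return value of None means "run locally".
    """
    if endpoint_id is None:
        return None
    # find_spec returns None when a top-level module cannot be located
    if importlib.util.find_spec(module_name) is None:
        warnings.warn(
            f"{module_name} is not installed; ignoring endpoint "
            f"{endpoint_id!r} and running locally.",
            stacklevel=2,
        )
        return None
    return endpoint_id
```

Checking for the module without importing it keeps the probe cheap; the SDK is only imported once remote submission is actually needed.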

See :ref:`Globus Compute - Remote User Functions<globus_compute_ref>` for
usage, and the :doc:`GlobusComputeExecutor API reference</executor/ex_globus_compute>`
for the full executor interface.

.. _Globus Compute: https://www.globus.org/compute
.. _Python: http://www.python.org
6 changes: 3 additions & 3 deletions docs/conf.py
@@ -31,7 +31,7 @@ def __getattr__(cls, name):
return MagicMock()


autodoc_mock_imports = ["ax", "gpcam", "IPython", "matplotlib", "pandas", "scipy", "surmise"]
autodoc_mock_imports = ["ax", "globus_compute_sdk", "gpcam", "IPython", "matplotlib", "pandas", "scipy", "surmise"]

MOCK_MODULES = [
"argparse",
@@ -135,7 +135,7 @@ class AxParameterWarning(Warning): # Ensure it's a real warning subclass
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
# source_suffix = ['.rst', '.md']
# source_suffix = ['.md', '.rst']
source_suffix = ".rst"

# The master toctree document.
@@ -205,7 +205,7 @@ class AxParameterWarning(Warning): # Ensure it's a real warning subclass
html_favicon = "./images/libE_logo_circle.png"
html_title = "libEnsemble"

# Theme options are theme-specific and customize the look and feel of a theme
# Theme options are theme-specific and customize the look and feel of the theme
# further. For a list of options available for each theme, see the
# documentation.
#
57 changes: 57 additions & 0 deletions docs/executor/ex_globus_compute.rst
@@ -0,0 +1,57 @@
Globus Compute Executor
=======================

`Overview <ex_overview.html>`__ || `Base Executor <ex_base.html>`__ || `MPI Executor <ex_mpi.html>`__ || **Globus Compute Executor**

The :class:`GlobusComputeExecutor<libensemble.executors.globus_compute_executor.GlobusComputeExecutor>`
submits Python callables to a remote `Globus Compute`_ endpoint instead of
launching local subprocesses. Like the :doc:`MPI Executor<ex_mpi>`, it can be
used inside simulator functions, where it is retrieved from
``libE_info["executor"]``.

See :ref:`Globus Compute - Remote User Functions<globus_compute_ref>` for an
overview of the two GC integration modes (manager-side GC-only and user-facing
executor).

.. note::

``globus-compute-sdk`` must be installed to use this executor::

pip install globus-compute-sdk

Users must also authenticate via Globus_ and have an active
`Globus Compute endpoint`_ running on the target system.

GlobusComputeExecutor
---------------------

.. autoclass:: libensemble.executors.globus_compute_executor.GlobusComputeExecutor
:members: register_app, submit, set_workerID, set_worker_info
:show-inheritance:

.. automethod:: __init__

GlobusComputeTask
-----------------

Tasks are created and returned by
:meth:`GlobusComputeExecutor.submit()<libensemble.executors.globus_compute_executor.GlobusComputeExecutor.submit>`.
Each task wraps a ``concurrent.futures.Future`` from the Globus Compute SDK
and exposes the same polling interface as other libEnsemble tasks.

.. autoclass:: libensemble.executors.globus_compute_executor.GlobusComputeTask
:members: poll, wait, kill, result, running, done, cancelled

**Task states**: ``RUNNING`` | ``FINISHED`` | ``FAILED`` | ``USER_KILLED``

**Key attributes**:

:task.state: (string) Current task state - one of the values above.
:task.finished: (bool) True once the task has completed (successfully or not).
:task.success: (bool) True if the remote callable returned without raising.
:task.runtime: (float) Elapsed wall-clock seconds since submission.
:task.submit_time: (float) Time since epoch at submission.
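How a future-backed task can map onto the states and attributes listed above is sketched below. This is an assumption-laden illustration, not the ``GlobusComputeTask`` source: ``SketchTask``, ``square``, and ``run_demo`` are hypothetical names, and a local ``ThreadPoolExecutor`` stands in for the Globus Compute SDK so the example runs without an endpoint.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


class SketchTask:
    """Hypothetical future-backed task; not libEnsemble code."""

    def __init__(self, future: Future):
        self.future = future
        self.submit_time = time.time()  # time since epoch at submission
        self.state = "RUNNING"
        self.finished = False
        self.success = False
        self.runtime = 0.0

    def poll(self):
        """Refresh runtime, state, finished, and success from the future."""
        if self.finished:
            return
        self.runtime = time.time() - self.submit_time
        if not self.future.done():
            return
        self.finished = True
        if self.future.cancelled():
            self.state = "USER_KILLED"
        elif self.future.exception() is not None:
            self.state = "FAILED"
        else:
            self.state = "FINISHED"
            self.success = True

    def kill(self):
        """Cancel the future; this only succeeds if it has not started."""
        if self.future.cancel():
            self.state = "USER_KILLED"
            self.finished = True


def square(x):
    return x * x


def run_demo():
    """Submit one succeeding and one failing call, then poll both."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        ok = SketchTask(pool.submit(square, 3))
        bad = SketchTask(pool.submit(square, "a"))  # raises TypeError
        while not (ok.finished and bad.finished):
            ok.poll()
            bad.poll()
            time.sleep(0.01)
    return ok, bad
```

In this sketch ``finished`` becomes true for every terminal state, while ``success`` is set only when the callable returned without raising, which mirrors the attribute descriptions above.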

.. _Globus Compute: https://www.globus.org/compute
.. _Globus: https://www.globus.org/
.. _Globus Compute endpoint: https://globus-compute.readthedocs.io/en/latest/endpoints.html
8 changes: 6 additions & 2 deletions docs/executor/ex_index.rst
@@ -1,6 +1,6 @@
.. _executor_index:

**Overview** \|\| `Base Executor <ex_base.html>`__ \|\| `MPI Executor <ex_mpi.html>`__
**Overview** || `Base Executor <ex_base.html>`__ || `MPI Executor <ex_mpi.html>`__ || `Globus Compute Executor <ex_globus_compute.html>`__

Executors
=========
@@ -14,8 +14,12 @@ portable interface for running and managing user applications.
ex_overview
ex_base
ex_mpi
ex_globus_compute

The **Executor** provides a portable interface for running applications on any system and
any number of compute resources.
any number of compute resources. The :doc:`MPI Executor<ex_mpi>` launches MPI
applications on local resources; the
:doc:`Globus Compute Executor<ex_globus_compute>` submits Python callables to
remote Globus Compute endpoints.

Please select from the sections above or the sidebar navigation to read more.
17 changes: 16 additions & 1 deletion docs/executor/ex_overview.rst
@@ -1,7 +1,7 @@
Overview
========

**Overview** \|\| `Base Executor <ex_base.html>`__ \|\| `MPI Executor <ex_mpi.html>`__
**Overview** || `Base Executor <ex_base.html>`__ || `MPI Executor <ex_mpi.html>`__

The **Executor** provides a portable interface for running applications on any system and
any number of compute resources.
@@ -156,4 +156,19 @@ which partitions resources among workers, ensuring that runs utilize different
resources (e.g., nodes). Furthermore, the ``MPIExecutor`` offers resilience via the
feature of re-launching tasks that fail to start because of system factors.

Remote Execution with Globus Compute
-------------------------------------

The :doc:`GlobusComputeExecutor<ex_globus_compute>` submits Python callables
to remote `Globus Compute`_ endpoints instead of launching local subprocesses.
It exposes the same ``submit()`` / ``poll()`` / ``kill()`` interface as other
libEnsemble executors and can be retrieved from ``libE_info["executor"]``
inside simulator functions.

See :ref:`Globus Compute - Remote User Functions<globus_compute_ref>` for an
overview of all Globus Compute integration modes and the
:doc:`GlobusComputeExecutor API reference<ex_globus_compute>` for the full
interface.

.. _concurrent futures: https://docs.python.org/library/concurrent.futures.html
.. _Globus Compute: https://www.globus.org/compute
145 changes: 145 additions & 0 deletions docs/platforms/globus_compute.rst
@@ -0,0 +1,145 @@
.. _globus_compute_ref:

======================================
Globus Compute - Remote User Functions
======================================

`Globus Compute`_ (formerly funcX) is a distributed, high-performance
function-as-a-service platform. When libEnsemble is running on a resource with
internet access (laptops, login nodes, other servers, etc.), it can offload
simulator calls to remote Globus Compute endpoints:

.. image:: ../images/funcxmodel.png
:alt: running_with_globus_compute
:scale: 50
:align: center

This is useful for running ensembles across machines and heterogeneous resources.
There are **two approaches**, described below.

.. dropdown:: **Caveats**

The following caveats apply to all Globus Compute modes:

1. Simulator functions submitted to Globus Compute must be non-persistent,
since manager-worker communicators cannot be serialized or used by a
remote resource.

2. ``Executor.manager_poll()`` is not available inside remotely executed
functions. Control over remote work is limited to inspecting return
values and exceptions when tasks complete.

3. Globus Compute imposes a `handful of task-rate and data limits`_ on
submitted functions.

4. Users are responsible for authenticating via Globus_ and maintaining their
`Globus Compute endpoints`_ on their target systems.

.. _gc_only_mode:

Manager-side GC (GC-only mode)
-------------------------------

This is the recommended approach for most use cases. When
``globus_compute_endpoint`` is set in :class:`SimSpecs<libensemble.specs.SimSpecs>`
and ``gen_on_worker`` is not set (the default), libEnsemble enters
**GC-only mode**: no local worker processes are launched. The manager
submits simulation work directly to Globus Compute and polls futures for
results. The generator still runs as a local thread on the manager.

``nworkers`` controls the maximum number of simultaneously in-flight
Globus Compute tasks (virtual concurrency). The default is 1.
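The idea of capping in-flight tasks at ``nworkers`` can be sketched with the standard library alone. This is a hedged illustration, not the manager's actual scheduling code: ``run_bounded`` is a hypothetical helper, and a local thread pool stands in for a Globus Compute endpoint.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_bounded(func, inputs, nworkers=1):
    """Evaluate func over inputs with at most nworkers tasks in flight.

    Returns the ordered results and the peak number of pending tasks.
    """
    results = {}
    pending = {}  # maps each future to the index of its input
    peak = 0
    remaining = iter(enumerate(inputs))
    exhausted = False
    # The pool is larger than nworkers, so the cap comes from the loop below
    with ThreadPoolExecutor(max_workers=max(nworkers * 2, 2)) as pool:
        while not exhausted or pending:
            # Top up submissions until the cap is reached or inputs run out
            while not exhausted and len(pending) < nworkers:
                try:
                    idx, item = next(remaining)
                except StopIteration:
                    exhausted = True
                    break
                pending[pool.submit(func, item)] = idx
            peak = max(peak, len(pending))
            if pending:
                # Block until at least one task finishes, then collect it
                done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
                for fut in done:
                    results[pending.pop(fut)] = fut.result()
    return [results[i] for i in range(len(results))], peak
```

A new task is submitted only when a slot frees up, so a larger ``nworkers`` raises throughput without changing the order of the returned results.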

This mode supports both the :ref:`gest-api simulator format<datastruct-sim-specs>`
(``SimSpecs.simulator``) and the legacy ``sim_f`` format.

.. code-block:: python

    from libensemble import Ensemble
    from libensemble.specs import ExitCriteria, GenSpecs, LibeSpecs, SimSpecs


    def my_sim(input_dict: dict, **kwargs) -> dict:
        """gest-api simulator - runs remotely on the GC endpoint."""
        return {"f": input_dict["x"] ** 2}


    # vocs and gen_specs are assumed to be defined earlier in the script
    sim_specs = SimSpecs(
        simulator=my_sim,
        vocs=vocs,
        globus_compute_endpoint="3af6dc24-3f27-4c49-8d11-e301ade15353",
    )

    libE_specs = LibeSpecs(nworkers=4)  # up to 4 concurrent GC tasks

    workflow = Ensemble(
        sim_specs=sim_specs,
        gen_specs=gen_specs,
        libE_specs=libE_specs,
        exit_criteria=ExitCriteria(sim_max=20),
    )
    H, _, _ = workflow.run()

Users can also define ``Executor`` instances within their remote simulator
functions and submit MPI applications normally, as long as libEnsemble and
the target application are accessible on the remote system::

# Within the remote simulator function
from libensemble.executors import MPIExecutor
exctr = MPIExecutor()
exctr.register_app(full_path="/home/user/forces.x", app_name="forces")
task = exctr.submit(app_name="forces", num_procs=64)

.. note::

Both the simulator callable and any VOCS object must be picklable,
as they are serialized and shipped to the remote Globus Compute endpoint.
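A quick local check for picklability, run before submitting, might look like the sketch below. It uses the standard ``pickle`` module as a stand-in; the serializer actually used by the Globus Compute SDK may accept a different set of objects, so treat a passing check as a hint rather than a guarantee. ``is_picklable`` is a hypothetical helper name.

```python
import math
import pickle


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Importable functions and plain data containers serialize by reference/value
print(is_picklable(math.sqrt))
print(is_picklable({"x": [0.0, 1.0]}))

# A lambda has no importable name, so the standard pickler rejects it
print(is_picklable(lambda x: x))
```

Defining simulator callables at module level, rather than as lambdas or nested closures, is the usual way to keep them serializable.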

.. _gc_executor_approach:

GlobusComputeExecutor (user-facing)
------------------------------------

For workflows where the simulation function itself orchestrates remote
calls, such as fanning out to multiple endpoints or mixing local and remote
work, use the
:class:`GlobusComputeExecutor<libensemble.executors.globus_compute_executor.GlobusComputeExecutor>`
directly inside the simulator.

Create and register the executor in the top-level script:

.. code-block:: python

from libensemble.executors import GlobusComputeExecutor

exctr = GlobusComputeExecutor(endpoint_id="3af6dc24-3f27-4c49-8d11-e301ade15353")

Then use it inside the simulator function:

.. code-block:: python

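    import time

    import numpy as np


    def my_sim(H, persis_info, sim_specs, libE_info):
        exctr = libE_info["executor"]

        # my_remote_func is assumed to be a picklable callable defined elsewhere
        task = exctr.submit(func=my_remote_func, app_args=H["x"][0])

        # Poll until the remote task completes or the manager requests a kill
        while not task.finished:
            task.poll()
            if exctr.manager_kill_received():
                task.kill()
                break
            time.sleep(0.1)

        # Allocate the output array; fill its fields from the task's result
        # as appropriate for the fields declared in sim_specs["out"]
        H_o = np.zeros(1, dtype=sim_specs["out"])

        return H_o, persis_info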

See the :doc:`GlobusComputeExecutor API reference<../executor/ex_globus_compute>` for
the full interface including ``register_app``, ``submit``, and
:class:`GlobusComputeTask<libensemble.executors.globus_compute_executor.GlobusComputeTask>` methods.

.. _Globus Compute: https://www.globus.org/compute
.. _Globus Compute endpoints: https://globus-compute.readthedocs.io/en/latest/endpoints.html
.. _Globus: https://www.globus.org/
.. _handful of task-rate and data limits: https://globus-compute.readthedocs.io/en/latest/limits.html
63 changes: 4 additions & 59 deletions docs/platforms/platforms_index.rst
@@ -159,60 +159,9 @@ will better manage simulation and generation functions that contain considerable
computational work or I/O. Therefore the second option is to use Globus Compute
to isolate this work from the workers.

.. _globus_compute_ref:

Globus Compute - Remote User Functions
--------------------------------------

If libEnsemble is running on some resource with
internet access (laptops, login nodes, other servers, etc.), workers can be instructed to
launch generator or simulator user function instances to separate resources from
themselves via `Globus Compute`_ (formerly funcX), a distributed, high-performance function-as-a-service platform:

.. image:: ../images/funcxmodel.png
:alt: running_with_globus_compute
:scale: 50
:align: center

This is useful for running ensembles across machines and heterogeneous resources, but
comes with several caveats:

1. User functions registered with Globus Compute must be *non-persistent*, since
manager-worker communicators can't be serialized or used by a remote resource.

2. Likewise, the ``Executor.manager_poll()`` capability is disabled. The only
available control over remote functions by workers is processing return values
or exceptions when they complete.

3. Globus Compute imposes a `handful of task-rate and data limits`_ on submitted functions.

4. Users are responsible for authenticating via Globus_ and maintaining their
`Globus Compute endpoints`_ on their target systems.

Users can still define Executor instances within their user functions and submit
MPI applications normally, as long as libEnsemble and the target application are
accessible on the remote system::

# Within remote user function
from libensemble.executors import MPIExecutor
exctr = MPIExecutor()
exctr.register_app(full_path="/home/user/forces.x", app_name="forces")
task = exctr.submit(app_name="forces", num_procs=64)

Specify a Globus Compute endpoint in :class:`sim_specs<libensemble.specs.SimSpecs>` via the ``globus_compute_endpoint``
argument. For example::

from libensemble.specs import SimSpecs

sim_specs = SimSpecs(
sim_f = sim_f,
inputs = ["x"],
out = [("f", float)],
globus_compute_endpoint = "3af6dc24-3f27-4c49-8d11-e301ade15353",
)

See the ``libensemble/tests/scaling_tests/globus_compute_forces`` directory for a complete
remote-simulation example.
See :doc:`Globus Compute - Remote User Functions<globus_compute>` for the two
integration approaches (manager-side GC-only mode and the user-facing
``GlobusComputeExecutor``).

Instructions for Specific Platforms
-----------------------------------
@@ -231,9 +180,5 @@ libEnsemble on specific HPC systems.
perlmutter
polaris
srun
globus_compute
example_scripts

.. _Globus Compute: https://www.globus.org/compute
.. _Globus Compute endpoints: https://globus-compute.readthedocs.io/en/latest/endpoints.html
.. _Globus: https://www.globus.org/
.. _handful of task-rate and data limits: https://globus-compute.readthedocs.io/en/latest/limits.html
6 changes: 3 additions & 3 deletions docs/running_libE.rst
@@ -83,9 +83,9 @@ if using an :class:`Ensemble<libensemble.ensemble.Ensemble>` object with
**Reverse-ssh interface**

Set ``comms`` to ``ssh`` to launch workers on remote ssh-accessible systems. This
co-locates workers, functions, and any applications. User
functions can also be persistent, unlike when launching remote functions via
:ref:`Globus Compute<globus_compute_ref>`.
co-locates workers, functions, and any applications. Simulator functions can be
persistent, unlike those submitted to :ref:`Globus Compute<globus_compute_ref>`,
which must be non-persistent.

The remote working directory and Python need to be specified. This may resemble::
