Open-source data integration and analysis platform
OpenHEXA is an open-source data integration and data analysis platform developed by Bluesquare.
Its goal is to facilitate data integration and analysis workflows, in particular in the context of public health projects.
OpenHEXA allows you to:
- Create workspaces to group code, data and users
- Upload and read files from a shared filesystem
- Write and read to a PostgreSQL database
- Use Jupyter notebooks to explore and analyze data
- Run and schedule complex data workflows using data pipelines
- Manage your team members
Please note that this repository does not contain any code: it is a starting point for OpenHEXA users and implementers. Please refer to the technical architecture page of our wiki for more information about the different OpenHEXA components, including the links to the relevant GitHub repositories.
The OpenHEXA documentation lives in our wiki.
To get started, you might be interested in the following pages:
Feel free to reach out in the discussions section if you have questions or suggestions!
Requirements:
- at least Docker 26.1
- Debian bookworm
- Debian packages gettext-base, postgresql (16+), postgresql-<postgresql version>-postgis-3, duplicity (optional, to manage backup and restore)
- yq
- Host port 3100 available for the bundled Forgejo Git server (override with FORGEJO_PORT)
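The Docker requirement above can be checked by hand before going further. The sketch below is not part of the OpenHEXA scripts: version_ge is a hypothetical helper, and it assumes GNU sort (for its -V option) is available, as it is on Debian.

```shell
# Hypothetical helper (not shipped with OpenHEXA):
# succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    # sort -V orders version strings; if the minimum sorts first, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Only query Docker when the client is installed.
if command -v docker >/dev/null 2>&1; then
    docker_version="$(docker version --format '{{.Client.Version}}' 2>/dev/null || true)"
    if [ -n "$docker_version" ] && version_ge "$docker_version" "26.1"; then
        echo "Docker $docker_version is recent enough"
    else
        echo "Docker 26.1 or later is required (found: ${docker_version:-unknown})" >&2
    fi
fi
```

The setup.sh check command described below remains the reference; this only illustrates the kind of comparison involved.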
After cloning this repository and changing your current directory to it, first check your installation by running
./script/setup.sh check
It will tell you that the .env file is missing. That is expected, as creating it is the next step.
Then, you need to set up the environment and the database. To do so, execute the following command:
./script/setup.sh all
This will generate a file in the working directory: .env (see below to know more about the configuration properties).
Then you can prepare the database and environment with
./script/openhexa.sh prepare
Important: the prepare command will create an initial superuser using the credentials in .env (DJANGO_SUPERUSER_USERNAME / DJANGO_SUPERUSER_PASSWORD). On a fresh install, setup.sh auto-generates a random DJANGO_SUPERUSER_PASSWORD. Check .env to retrieve it for the first login, or edit it before running prepare if you want to set your own.
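To retrieve the generated password without opening the file, a small helper can read a single key. This is a minimal sketch, assuming .env uses plain KEY=value lines with optional surrounding quotes; env_get is a hypothetical name, not an OpenHEXA command.

```shell
# Hypothetical helper: print the value of KEY from an env file (default: .env).
env_get() {
    local key="$1" file="${2:-.env}"
    # Take the last KEY=value line, drop the key, then strip optional surrounding quotes.
    grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2- \
        | sed -e 's/^"\(.*\)"$/\1/' -e "s/^'\(.*\)'\$/\1/"
}

# Usage:
#   env_get DJANGO_SUPERUSER_PASSWORD .env
```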
Finally, you can run OpenHEXA with
./script/openhexa.sh start
To stop it, execute
./script/openhexa.sh stop
If you need to purge the configuration and the database after having stopped it, execute the following command:
./script/setup.sh purge
Once installed, you may want to make sure you are running the latest version. You can update OpenHEXA with
./script/openhexa.sh update
To release and build the Debian package, you need to work on a Debian-like Linux distribution with the following packages installed: devscripts, debhelper, build-essential. To install them, run the following command:
sudo apt install devscripts debhelper build-essential
Note that this requires superuser rights (that's what sudo gives you).
If you are not on a Debian-based distribution, you can use Dockerfile.build to build a Debian container that will do the job for you:
docker build --platform linux/amd64 -t openhexa-build -f Dockerfile.build .
docker run -it -v "$(pwd)":/work openhexa-build
You can then follow the instructions below to build the package as usual.
The versions are described in the changelog file. The latest entry is
unreleased and is the one that is published. To manage versions and the
changelog, we use dch, a tool from the devscripts package.
To add a new change, do:
EMAIL="firstname lastname <[email protected]>" dch -a
This will open your favorite editor so you can edit the changelog. Save, commit, push, and GitHub Actions will do the rest.
To release a version, do:
EMAIL="firstname lastname <[email protected]>" dch -rD stable
To add a new unreleased version, do:
EMAIL="firstname lastname <[email protected]>" dch -i -D UNRELEASED -U
When all the requirements are met, run the following script to build the package:
./script/build.sh
The script will check the requirements. Note that it works with your Git working copy, and both the working tree and the staging area need to be clean. So, if you have any changes, commit or stash them before running the script.
The resulting package is available in the parent directory:
../openhexa_1.0-1_amd64.deb.
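Since the build requires a clean Git working copy, it can help to verify that yourself before calling build.sh. The sketch below is an assumption about what "clean" means (no staged, unstaged or untracked changes); git_is_clean is a hypothetical helper and the actual check done by build.sh may differ.

```shell
# Hypothetical pre-flight check: succeeds only when the Git working copy in DIR
# (default: current directory) has no staged, unstaged or untracked changes.
git_is_clean() {
    local dir="${1:-.}" out
    # Fail with status 2 when DIR is not a Git repository.
    out="$(git -C "$dir" status --porcelain 2>/dev/null)" || return 2
    [ -z "$out" ]
}

# Usage:
#   git_is_clean . && ./script/build.sh
```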
Requirements:
- at least Docker 26.1
- Debian bookworm
- Systemd
- yq
- PostgreSQL 16+ (required by OpenHEXA 4.1.0+)
- Host port 3100 free for the bundled Forgejo Git server (override with FORGEJO_PORT in /etc/openhexa/env.conf)
First of all, you need to add our APT repository and GPG public key:
curl -fsSL https://raw.githubusercontent.com/blsq/openhexa/refs/heads/main/pubkey.gpg | sudo gpg --yes --dearmor --output /usr/share/keyrings/openhexa.gpg
echo "deb [signed-by=/usr/share/keyrings/openhexa.gpg] https://viz.bluesquare.org/openhexa/ bookworm main" | sudo tee /etc/apt/sources.list.d/openhexa.list
Check that your locales are correctly set by running locale. A common setup is:
# Set locale
sudo tee -a /etc/default/locale > /dev/null <<EOF
LC_ALL=C.UTF-8
LC_CTYPE=C.UTF-8
LC_MESSAGES=C.UTF-8
LC_COLLATE=C.UTF-8
EOF
source /etc/default/locale
Then, you can update your APT database and install openhexa
sudo apt update
sudo apt install openhexa
If you want to manage backup and restore through our script, you can install it
with its recommended packages: sudo apt install --install-recommends openhexa.
If you have Systemd, OpenHexa runs as the Systemd service openhexa (which you
can then manage with systemctl). If you don't use Systemd, you can still run
the service with /usr/share/openhexa/openhexa.sh -g start.
When installed, the Systemd service OpenHexa is started. If you need to get its
status, stop it, restart it, or start it, you can do it with systemctl.
A command is also installed to ease the interaction with OpenHexa:
/usr/share/openhexa/openhexa.sh. To get its usage documentation, run:
/usr/share/openhexa/openhexa.sh help
If you want to interact with an OpenHexa installed globally on the system,
you'll have to use the option -g, or it'll try to interact with the version
in your current directory. For instance, to get its status, you can execute:
/usr/share/openhexa/openhexa.sh -g status
The installation also sets up the environment, especially the PostgreSQL
database. The configuration is stored in the file /etc/openhexa/env.conf
(see below for more information about the configuration properties). If you
need to change or add a property, edit this file directly, then restart
OpenHexa with sudo systemctl restart openhexa.
If you need to set it up again, check the installation, or purge the environment
(database and configuration), you can use the tool
/usr/share/openhexa/setup.sh. To get its usage documentation, run:
/usr/share/openhexa/setup.sh help
During the setup, the following is done on the PostgreSQL side:
- create 2 databases, hexa-app and hexa-hub. The first one is used by the OpenHexa app, the second to manage the notebooks.
- create 1 superuser hexa-app, owner of hexa-app.
- create 1 superuser hexa-hub, owner of hexa-hub.
- make PostgreSQL listen on the Docker gateway IP address.
- authorize all users to connect to hexa-app from the entire Docker subnetwork with encrypted password authentication.
- authorize hexa-hub to connect to hexa-hub from the entire Docker subnetwork with encrypted password authentication.
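To make the last three steps more concrete, the sketch below prints the kind of postgresql.conf and pg_hba.conf entries they correspond to. It is illustrative only: the gateway 172.17.0.1, the subnet 172.17.0.0/16 and the scram-sha-256 method are assumptions (Docker's default bridge and one of PostgreSQL's encrypted password methods), and pg_access_rules is a hypothetical helper, not what setup.sh runs.

```shell
# Hypothetical sketch: print configuration lines equivalent to the setup steps.
# Gateway, subnet and auth method are assumptions; adapt them to your host.
pg_access_rules() {
    local gateway="${1:-172.17.0.1}" subnet="${2:-172.17.0.0/16}" method="${3:-scram-sha-256}"
    echo "# postgresql.conf"
    echo "listen_addresses = 'localhost,${gateway}'"
    echo "# pg_hba.conf: TYPE DATABASE USER ADDRESS METHOD"
    echo "host hexa-app all ${subnet} ${method}"
    echo "host hexa-hub hexa-hub ${subnet} ${method}"
}

pg_access_rules
```

Comparing this output with the files actually written on your host is a quick way to check what the setup did.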
You can manage your backup and restore directly with OpenHexa. It backs up:
- a pg_dumpall of the PostgreSQL cluster (covers the hexa-app and hexa-hub databases),
- the workspace files at WORKSPACE_STORAGE_LOCATION,
- the Forgejo data directory at FORGEJO_STORAGE_LOCATION (git repositories for static webapps plus Forgejo's SQLite metadata database),
- a snapshot of .env (so the encryption keys needed to read the restored database are kept alongside the data).
This relies on the tool duplicity. Make sure that it is installed if you
haven't installed it yet (if you install OpenHexa with apt, do it with the
recommended packages).
First, you need to set it up:
/usr/share/openhexa/setup.sh backup file:///mylocaldirectory/where/to/do/thebackup/ encryption_passkey
The target directory will contain two duplicity backends side by side:
<LOCATION>/workspaces and <LOCATION>/forgejo.
Depending on user activity, it might be a good idea to stop the service, or simply redirect the website to a maintenance HTML page, while a backup is running.
Once configured, the following commands are available:
| Command | Description |
|---|---|
| /usr/share/openhexa/openhexa.sh backup | Back up the PostgreSQL cluster, workspace files, Forgejo data and .env snapshot. |
| /usr/share/openhexa/openhexa.sh backup-status | Show the duplicity collection-status for both the workspaces and forgejo backends. |
| /usr/share/openhexa/openhexa.sh restore | Restore the latest backup. This requires stopping the services before a full restore. |
After a restore, an openhexa-env.bak file is left next to the workspace data:
compare it with the live .env to make sure ENCRYPTION_KEY, SECRET_KEY and
the JupyterHub/Forgejo secrets match the restored database.
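The comparison described above can be scripted. The following is a minimal sketch, assuming both files use plain KEY=value lines; env_diff_keys is a hypothetical helper, and the path to openhexa-env.bak depends on where your workspace data lives.

```shell
# Hypothetical helper: report which of the given keys differ between two env files.
# Returns 0 when all keys match, 1 otherwise.
env_diff_keys() {
    local live="$1" backup="$2" key status=0
    shift 2
    for key in "$@"; do
        # Compare the last definition of the key in each file.
        if [ "$(grep -E "^${key}=" "$live" | tail -n 1)" != \
             "$(grep -E "^${key}=" "$backup" | tail -n 1)" ]; then
            echo "MISMATCH: ${key}"
            status=1
        fi
    done
    return "$status"
}

# Usage (replace the second path with the location of your openhexa-env.bak):
#   env_diff_keys .env /path/to/openhexa-env.bak ENCRYPTION_KEY SECRET_KEY
```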
restore replays a pg_dumpall produced without --clean, so it expects an empty target cluster (e.g. a fresh install). If the application databases or roles already exist, the CREATE DATABASE / CREATE ROLE statements will fail, leaving the live data effectively untouched.
To restore on top of an existing setup, drop the application objects manually before running restore. Stop the services first so nothing holds open
connections:
# 1. Stop everything that talks to PostgreSQL.
/usr/share/openhexa/openhexa.sh stop
# 2. Drop the OpenHexa databases and roles as the postgres superuser. Replace
# the database/role names below with whatever your `.env` defines (typically
# DATABASE_NAME, JUPYTERHUB_DATABASE_NAME, plus any per-workspace databases
# matching `[a-z0-9]{16}` that you can list with `\l` in psql).
sudo -u postgres psql -p "$DATABASE_PORT" <<'SQL'
DROP DATABASE IF EXISTS "hexa-app";
DROP DATABASE IF EXISTS "hexa-hub";
-- repeat DROP DATABASE for every workspace database
DROP ROLE IF EXISTS "hexa-app";
DROP ROLE IF EXISTS "hexa-hub";
-- repeat DROP ROLE for every workspace role
SQL
# 3. Now run the restore.
/usr/share/openhexa/openhexa.sh restore
Backups taken before the Forgejo upgrade used a single duplicity backend at
<LOCATION> (no workspaces / forgejo sub-prefix) and did not include a
Forgejo data directory or an .env snapshot. openhexa.sh restore won't
recover them as-is: it expects both new sub-prefixes to exist. Restore them
by hand with duplicity:
# Stop the services first
sudo systemctl stop openhexa
# Restore the workspace tree (includes the legacy openhexa-dumpall.sql)
sudo -u openhexa PASSPHRASE='your-passphrase' duplicity restore \
file:///path/to/old/backup/ \
/var/lib/openhexa/workspaces
# Load the PostgreSQL dump
sudo -u postgres psql -f /var/lib/openhexa/workspaces/openhexa-dumpall.sql template1
# Forgejo had no data in the legacy layout: leave FORGEJO_STORAGE_LOCATION
# empty and let `openhexa.sh prepare` bootstrap a fresh Forgejo on next start.
sudo systemctl start openhexa
/usr/share/openhexa/openhexa.sh prepare
Locally, we use Minio to manage the storage. It provides an AWS S3-compatible
API. To access it, you need to provide a key ID and a secret:
WORKSPACE_STORAGE_ENGINE_AWS_ACCESS_KEY_ID and
WORKSPACE_STORAGE_ENGINE_AWS_SECRET_ACCESS_KEY.
Finally, we need the port number where the local PostgreSQL cluster listens:
DB_PORT
In order to be able to send emails to users, you have to provide the following configuration options:
- EMAIL_HOST
- EMAIL_PORT
- EMAIL_HOST_USER
- EMAIL_USE_TLS
- EMAIL_USE_SSL
- EMAIL_HOST_PASSWORD
- DEFAULT_FROM_EMAIL
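As an illustration, the options above could look like the fragment below for an SMTP server using STARTTLS. Every value is a placeholder, and the "true"/"false" spelling of booleans is an assumption based on how TRUST_FORWARDED_PROTO is written elsewhere in this README.

```shell
# Example values only: every value below is a placeholder, not an OpenHEXA default.
EMAIL_HOST="smtp.example.com"
EMAIL_PORT=587
EMAIL_HOST_USER="openhexa@example.com"
EMAIL_HOST_PASSWORD="change-me"
# Django treats these two options as mutually exclusive: enable at most one.
EMAIL_USE_TLS="true"
EMAIL_USE_SSL="false"
DEFAULT_FROM_EMAIL="openhexa@example.com"
```

Port 587 with EMAIL_USE_TLS is the usual pairing for STARTTLS; servers that expect implicit TLS typically use port 465 with EMAIL_USE_SSL instead.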
The workspace.db proxy host doesn't work on local installations of OpenHEXA.
You can override it by setting this ENV variable to the local IP of the server:
OVERRIDE_WORKSPACES_DATABASE_HOST="<LOCAL-IP>"
Since OpenHEXA 5.0.0, the Static Webapps feature is backed by a Forgejo Git
server that runs as a sibling container. The package ships a forgejo
service (image codeberg.org/forgejo/forgejo:14) and a custom entrypoint
at /usr/share/openhexa/forgejo/entrypoint.sh that creates the admin user
on first boot.
The relevant configuration properties:
- GIT_SERVER_ADMIN_USERNAME (default openhexa-admin)
- GIT_SERVER_ADMIN_PASSWORD: auto-generated by setup.sh on first install
- FORGEJO_PORT (default 3100): host port mapped to the Forgejo UI
The Django backend talks to Forgejo over the internal Docker network at
http://forgejo:3000. This is set in compose.yml and does not require
configuration. Forgejo's data lives in the named Docker volume
forgejo_data and is preserved across update/restart.
Set WEBAPPS_DOMAIN=webapps.example.com to serve each public webapp from
its own subdomain (e.g. app1.webapps.example.com). This requires a
wildcard DNS record pointing at this host. Leave the variable empty to keep
webapps on the main backend host.
For custom-domain webapps, list each domain in ADDITIONAL_ALLOWED_HOSTS
and attach it to the corresponding Webapp via the Django admin.
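To see which hostnames fall under a given WEBAPPS_DOMAIN, the sketch below tests whether a host is a direct subdomain of it. It only illustrates the naming scheme from the example above; is_webapp_subdomain is a hypothetical helper and says nothing about how the backend itself routes requests.

```shell
# Hypothetical check: succeeds when HOST is a direct subdomain of DOMAIN,
# e.g. app1.webapps.example.com under webapps.example.com.
is_webapp_subdomain() {
    local host="$1" domain="$2" label
    [ -n "$domain" ] || return 1
    case "$host" in
        *."$domain") label="${host%".$domain"}" ;;
        *) return 1 ;;
    esac
    # Accept exactly one non-empty DNS label in front of the domain.
    case "$label" in
        ""|*.*) return 1 ;;
        *) return 0 ;;
    esac
}

# Usage:
#   is_webapp_subdomain app1.webapps.example.com webapps.example.com && echo "webapp host"
```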
The 5.x series introduces Forgejo as a hard dependency. To upgrade an existing 4.6.0 installation:
sudo systemctl stop openhexa
sudo apt update && sudo apt install --only-upgrade openhexa
# Pull the new app/frontend images and the Forgejo image:
sudo /usr/share/openhexa/openhexa.sh -g update
# Run migrations and bootstrap the Git server admin user:
sudo /usr/share/openhexa/openhexa.sh -g prepare
sudo systemctl start openhexa
The package post-install hook runs update and prepare automatically when
installing for the first time, but on upgrades you should re-run them
explicitly to apply the Django migrations introduced between 4.6.0 and 5.6.2
(custom webapp domains, AI agent tables, scheduled-run version selection,
read-only table protection).
The new GIT_SERVER_ADMIN_PASSWORD is generated only when .env does not
yet exist. On an in-place upgrade, your existing .env will not contain
this variable, so you should add these variables manually:
GIT_SERVER_ADMIN_USERNAME=openhexa-admin
GIT_SERVER_ADMIN_PASSWORD=something-secure
To test whether OpenHexa has been correctly installed, you can run smoke tests that check minimal operation. To learn how to do so, please read the dedicated README.
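For the in-place upgrade case above, the two Git server variables can be appended without touching values that already exist. This is a minimal sketch with hypothetical helpers (env_add_if_missing, random_secret); it assumes the env file ends with a newline and that a 32-character hexadecimal password is acceptable.

```shell
# Hypothetical helper: append KEY=VALUE to an env file only when KEY is not defined yet.
env_add_if_missing() {
    local file="$1" key="$2" value="$3"
    if ! grep -qE "^${key}=" "$file"; then
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Hypothetical helper: 32 random hexadecimal characters read from /dev/urandom.
random_secret() {
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# Usage (adapt the path to your installation):
#   env_add_if_missing .env GIT_SERVER_ADMIN_USERNAME openhexa-admin
#   env_add_if_missing .env GIT_SERVER_ADMIN_PASSWORD "$(random_secret)"
```

Running it twice leaves the file unchanged the second time, so it is safe to include in an upgrade routine.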
We use GitHub Actions to automate the package building and its tests. If you
want to run our workflows locally, you can use act as follows:
act --action-offline-mode push
Warning: Make sure to remove your local .env before running it, as act copies your working copy rather than using the checkout action. When that
happens, it overrides the other environment files provided to the compose
project to configure it (/etc/openhexa/env.conf).
The following setup requires:
- a machine with a public IP address,
- a domain name for which you manage the zone,
- the NGINX service.
Create a file /etc/nginx/sites-available/openhexa with the following content
(replace example.com with your domain name):
map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}
server {
    listen 80;
    server_name example.com;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    location ~ ^/(?<root_path>hub|user)(?<path>/.*)? {
        rewrite ^ /$root_path$path break;
        proxy_pass http://localhost:8001;
        # websocket headers
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header X-Scheme $scheme;
        proxy_buffering off;
    }
    location / {
        proxy_pass http://localhost:3000;
    }
}
Enable and check it:
sudo ln -s /etc/nginx/sites-available/openhexa /etc/nginx/sites-enabled/
sudo nginx -t
You need to update the OpenHexa configuration in /etc/openhexa/env.conf:
TRUST_FORWARDED_PROTO="false"
PROXY_HOSTNAME_AND_PORT=example.com
INTERNAL_BASE_URL=http://app:8000
FRONTEND_PORT=3000
JUPYTERHUB_PORT=8001
Finally, restart NGINX and OpenHexa:
sudo systemctl restart openhexa nginx
You can now browse the OpenHexa app at http://example.com.
To serve OpenHexa over HTTPS, you additionally need a certificate. How you obtain it is up to you. For the rest, follow the same steps, but use the following configuration
in /etc/nginx/sites-available/openhexa:
map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}
server {
    listen 80;
    server_name example.com;
    return 301 https://$server_name$request_uri;
}
server {
    listen 443 ssl;
    server_name example.com;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    ssl_certificate /etc/ssl/certs/nginx-selfsigned.crt;
    ssl_certificate_key /etc/ssl/private/nginx-selfsigned.key;
    ssl_protocols TLSv1.3;
    ssl_prefer_server_ciphers on;
    ssl_ciphers EECDH+AESGCM:EDH+AESGCM;
    ssl_ecdh_curve secp384r1;
    ssl_session_timeout 10m;
    ssl_session_cache shared:SSL:10m;
    ssl_session_tickets off;
    add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";
    add_header X-Frame-Options SAMEORIGIN;
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
    location ~ ^/(?<root_path>hub|user)(?<path>/.*)? {
        rewrite ^ /$root_path$path break;
        proxy_pass http://localhost:8001;
        # websocket headers
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header X-Scheme $scheme;
        proxy_buffering off;
    }
    location / {
        proxy_pass http://localhost:3000;
    }
}
and in /etc/openhexa/env.conf:
TRUST_FORWARDED_PROTO="true"
PROXY_HOSTNAME_AND_PORT=example.com
INTERNAL_BASE_URL=http://app:8000
FRONTEND_PORT=3000
JUPYTERHUB_PORT=8001
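The certificate file names in the NGINX configuration above suggest a self-signed certificate. For a test setup, one could be generated along the lines of the sketch below. This is an assumption about how those files were produced, not a recommendation for production: make_selfsigned_cert is a hypothetical helper, and the RSA 2048 key and 365-day validity are arbitrary choices.

```shell
# Hypothetical helper: create a self-signed certificate for DOMAIN in OUT_DIR.
# For testing only; browsers will warn about an untrusted certificate.
make_selfsigned_cert() {
    local domain="$1" out_dir="$2"
    openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
        -subj "/CN=${domain}" \
        -keyout "${out_dir}/nginx-selfsigned.key" \
        -out "${out_dir}/nginx-selfsigned.crt" 2>/dev/null
}

# Usage (then move the files to the paths referenced in the NGINX configuration):
#   make_selfsigned_cert example.com /tmp
```

For a public deployment, a certificate issued by a trusted authority is the safer choice, especially since the configuration enables HSTS.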