Standalone Cloudera-compatible Hive Metastore for local development, implemented in Kotlin and aligned to the Cloudera Spark stack.
Add the dependency:
dependencies {
    testImplementation("org.openprojectx.cloudera.hms:junit5:<version>")
}
Use ClouderaHiveMetastoreTest.kt:
@ClouderaHiveMetastoreTest(
    databaseType = "postgresql",
    postgresImage = "postgres:14",
    schemaSqlPath = "/hive-schema-3.1.3000.postgres.sql",
    logLevel = "DEBUG",
)
class MyMetastoreTest
Supported annotation attributes:
- databaseType: postgresql by default; use mariadb for MariaDB
- postgresImage: overrides the PostgreSQL Testcontainers image
- mariadbImage: overrides the MariaDB Testcontainers image; defaults to mariadb:10.6.24-ubi9
- schemaSqlPath: accepts either a filesystem path or a classpath resource path
- logLevel: configures the generated HMS server Log4j 2 root level
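Because schemaSqlPath accepts either a filesystem path or a classpath resource path, a resolver has to try both. The sketch below shows one way that lookup could work using only the JDK. The helper openSchemaSql is hypothetical and not part of this project, and the order (filesystem first, then classpath) is an assumption rather than documented behavior.

```kotlin
import java.io.InputStream
import java.nio.file.Files
import java.nio.file.Paths

// Hypothetical helper, not part of the library: opens a schema script that may
// be given as a filesystem path or as a classpath resource path.
// Assumption: an existing regular file wins over a classpath resource.
fun openSchemaSql(path: String): InputStream? {
    val file = Paths.get(path)
    if (Files.isRegularFile(file)) {
        return Files.newInputStream(file)
    }
    // ClassLoader lookups take resource names without a leading slash.
    return ClassLoader.getSystemResourceAsStream(path.removePrefix("/"))
}

fun main() {
    val source = openSchemaSql("/hive-schema-3.1.3000.postgres.sql")
    if (source == null) {
        println("schema not found on the filesystem or classpath")
    } else {
        source.use { println("schema found, ${it.readBytes().size} bytes") }
    }
}
```

A null result means neither lookup matched, which a caller would normally turn into a clear configuration error.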
For MariaDB-backed tests:
@ClouderaHiveMetastoreTest(
    databaseType = "mariadb",
    mariadbImage = "mariadb:10.6.24-ubi9",
    schemaSqlPath = "/hive-schema-3.1.3000.mysql.sql",
)
class MyMariaDbMetastoreTest
Add the dependency:
dependencies {
    testImplementation("org.openprojectx.cloudera.hms:testcontainers:<version>")
}
Use the default PostgreSQL image:
val metastore = ClouderaHiveMetastoreContainer()
    .withDatabaseType("postgresql")
    .withDatabaseName("metastore_db")
    .withDatabaseUser("hive")
    .withDatabasePassword("hive-password")
Use the MariaDB image:
val metastore = ClouderaHiveMetastoreContainer
    .withImage("ghcr.io/openprojectx/cloudera-hms:latest-mariadb")
    .withDatabaseType("mariadb")
    .withDatabaseName("metastore_db")
    .withDatabaseUser("hive")
    .withDatabasePassword("hive-password")
Use a custom image explicitly:
val metastore = ClouderaHiveMetastoreContainer.withImage("my-registry/cloudera-hms:test")
    .withDatabaseType("postgresql")
    .withDatabaseName("metastore_db")
    .withDatabaseUser("hive")
    .withDatabasePassword("hive-password")
Or set CLOUDERA_HMS_TEST_IMAGE and keep using the default constructor. For MariaDB, point it at a -mariadb image tag and set HMS_DATABASE_TYPE through the wrapper:
export CLOUDERA_HMS_TEST_IMAGE=ghcr.io/openprojectx/cloudera-hms:latest-mariadb
Typical JUnit 5 and Testcontainers usage:
import org.junit.jupiter.api.Test
import org.testcontainers.junit.jupiter.Container
import org.testcontainers.junit.jupiter.Testcontainers

@Testcontainers
class MyPostgreSqlMetastoreContainerTest {
    @Container
    private val metastore = ClouderaHiveMetastoreContainer()
        .withDatabaseType("postgresql")
        .withDatabaseName("metastore_db")
        .withDatabaseUser("hive")
        .withDatabasePassword("hive-password")

    @Test
    fun testMetastore() {
        val thriftUri = metastore.thriftUri()
        // Create a HiveMetaStoreClient or your application client here.
    }
}
MariaDB JUnit 5 and Testcontainers usage:
import org.junit.jupiter.api.Test
import org.testcontainers.junit.jupiter.Container
import org.testcontainers.junit.jupiter.Testcontainers

@Testcontainers
class MyMariaDbMetastoreContainerTest {
    @Container
    private val metastore = ClouderaHiveMetastoreContainer
        .withImage("ghcr.io/openprojectx/cloudera-hms:latest-mariadb")
        .withDatabaseType("mariadb")
        .withDatabaseName("metastore_db")
        .withDatabaseUser("hive")
        .withDatabasePassword("hive-password")

    @Test
    fun testMetastore() {
        val thriftUri = metastore.thriftUri()
        // Create a HiveMetaStoreClient or your application client here.
    }
}
Start a local PostgreSQL if you want to run the metastore against a local database:
docker compose up -d
Build the shaded runtime:
GRADLE_USER_HOME=/data/.gradle ./gradlew :runtime:shadowJar
Build both container image variants into the local Docker daemon:
GRADLE_USER_HOME=/data/.gradle ./gradlew :image:jibDockerBuildAll
The PostgreSQL image uses ghcr.io/openprojectx/postgres14-jdk17:latest as its base and keeps the existing tag, for example ghcr.io/openprojectx/cloudera-hms:<version>. The MariaDB image uses ghcr.io/openprojectx/mariadb10.6-jdk17:latest as its base and gets a -mariadb tag suffix, for example ghcr.io/openprojectx/cloudera-hms:<version>-mariadb.
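The tagging rule for the two variants can be written down as a small lookup. This is only an illustrative model: the enum ImageVariant and its members are hypothetical names, and the base images and tag suffix are the ones quoted in this README.

```kotlin
// Illustrative model only; ImageVariant is a hypothetical name.
// Base images and the tag suffix are taken from the README text.
enum class ImageVariant(val baseImage: String, val tagSuffix: String) {
    POSTGRES("ghcr.io/openprojectx/postgres14-jdk17:latest", ""),
    MARIADB("ghcr.io/openprojectx/mariadb10.6-jdk17:latest", "-mariadb");

    // Full image reference for a given version or tag such as "latest".
    fun imageRef(version: String): String =
        "ghcr.io/openprojectx/cloudera-hms:$version$tagSuffix"
}

fun main() {
    for (variant in ImageVariant.values()) {
        println("${variant.name}: ${variant.imageRef("latest")} built on ${variant.baseImage}")
    }
}
```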
Build only one variant when needed:
GRADLE_USER_HOME=/data/.gradle ./gradlew :image:jibDockerBuildPostgres
GRADLE_USER_HOME=/data/.gradle ./gradlew :image:jibDockerBuildMariadb
Run the MariaDB image normally; it is built with HMS_DATABASE_TYPE=mariadb and defaults to /hive-schema-3.1.3000.mysql.sql, org.mariadb.jdbc.Driver, and a jdbc:mariadb://127.0.0.1:3306/metastore_db?useMysqlMetadata=true URL.
Run the PostgreSQL image with Docker:
docker run --rm \
  --name cloudera-hms \
  -p 9083:9083 \
  -p 5432:5432 \
  ghcr.io/openprojectx/cloudera-hms:latest
Run the MariaDB image with Docker:
docker run --rm \
  --name cloudera-hms-mariadb \
  -p 9083:9083 \
  -p 3306:3306 \
  ghcr.io/openprojectx/cloudera-hms:latest-mariadb
Both images expose the Hive Metastore Thrift service on 9083. The PostgreSQL variant also exposes 5432; the MariaDB variant also exposes 3306.
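When the image is started with plain docker run, a caller may want to wait until the Thrift port accepts connections before creating a client. The following is a minimal sketch using only the JDK. The helpers isPortOpen and waitForPort are hypothetical and not part of this project, and an open port does not by itself prove that the metastore has finished initializing its schema.

```kotlin
import java.io.IOException
import java.net.InetSocketAddress
import java.net.Socket

// Hypothetical helper: true if a TCP connection to host:port succeeds in time.
fun isPortOpen(host: String, port: Int, timeoutMillis: Int = 1000): Boolean =
    try {
        Socket().use { it.connect(InetSocketAddress(host, port), timeoutMillis) }
        true
    } catch (e: IOException) {
        false
    }

// Polls until the port accepts connections or the attempts run out.
fun waitForPort(host: String, port: Int, attempts: Int = 30, delayMillis: Long = 1000): Boolean {
    repeat(attempts) {
        if (isPortOpen(host, port)) return true
        Thread.sleep(delayMillis)
    }
    return false
}

fun main() {
    // 9083 is the Thrift port published by the docker run commands above.
    val ready = waitForPort("localhost", 9083, attempts = 5)
    println(if (ready) "metastore reachable at thrift://localhost:9083" else "metastore not reachable")
}
```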
Override the default database credentials when needed:
docker run --rm \
  --name cloudera-hms \
  -p 9083:9083 \
  -e POSTGRES_DB=metastore_db \
  -e POSTGRES_USER=hive \
  -e POSTGRES_PASSWORD=hive-password \
  ghcr.io/openprojectx/cloudera-hms:latest

docker run --rm \
  --name cloudera-hms-mariadb \
  -p 9083:9083 \
  -e MARIADB_DATABASE=metastore_db \
  -e MARIADB_USER=hive \
  -e MARIADB_PASSWORD=hive-password \
  ghcr.io/openprojectx/cloudera-hms:latest-mariadb
For contributor-oriented build commands, version alignment, and verification notes, see CONTRIBUTING.md.
- core: metastore runtime, PostgreSQL/MariaDB schema bootstrap, and configuration helpers
- runtime: shaded standalone runtime jar for launching the metastore
- image: Jib-based container image assembly for a combined database plus Hive metastore runtime
- junit5: annotation-driven JUnit 5 support that provisions PostgreSQL or MariaDB and starts the metastore for tests
- hms-tck-core: reusable Java 11-compatible Hive metastore TCK contract and assertions
- hms-tck: Java 17 TCK implementations for the in-process core and shaded runtime executions
- testcontainers: JDK 11 Testcontainers wrapper for the built metastore image
- spark: Spark-facing TCKs that validate Spark SQL and Iceberg against the metastore
The runtime expects these JVM system properties:
- cloudera.hms.host
- cloudera.hms.port
- cloudera.hms.database.type
- cloudera.hms.warehouse.dir
- cloudera.hms.jdbc.url
- cloudera.hms.jdbc.driver
- cloudera.hms.jdbc.user
- cloudera.hms.jdbc.password
Optional properties:
- cloudera.hms.jdbc.driver
- cloudera.hms.initialize-schema
- cloudera.hms.schema.resource
- cloudera.hms.schema.file
To supply your own schema SQL, set cloudera.hms.schema.file; it takes precedence over cloudera.hms.schema.resource.
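When launching the runtime jar by hand, the required properties listed above have to be passed as -D arguments. The sketch below builds that argument list and fails early if a required key is missing. The helper toJvmArgs is hypothetical, and every value in main (host, warehouse directory, JDBC URL, credentials) is a placeholder chosen for illustration, not a documented default.

```kotlin
// Property names come from the required list above; the helper is hypothetical.
val requiredProperties = listOf(
    "cloudera.hms.host",
    "cloudera.hms.port",
    "cloudera.hms.database.type",
    "cloudera.hms.warehouse.dir",
    "cloudera.hms.jdbc.url",
    "cloudera.hms.jdbc.driver",
    "cloudera.hms.jdbc.user",
    "cloudera.hms.jdbc.password",
)

// Converts a property map into -Dkey=value arguments after checking that
// every required key is present.
fun toJvmArgs(properties: Map<String, String>): List<String> {
    val missing = requiredProperties.filterNot { it in properties }
    require(missing.isEmpty()) { "Missing required properties: $missing" }
    return properties.map { (key, value) -> "-D$key=$value" }
}

fun main() {
    // Placeholder values for illustration only.
    val args = toJvmArgs(
        mapOf(
            "cloudera.hms.host" to "127.0.0.1",
            "cloudera.hms.port" to "9083",
            "cloudera.hms.database.type" to "postgresql",
            "cloudera.hms.warehouse.dir" to "/tmp/hive-warehouse",
            "cloudera.hms.jdbc.url" to "jdbc:postgresql://127.0.0.1:5432/metastore_db",
            "cloudera.hms.jdbc.driver" to "org.postgresql.Driver",
            "cloudera.hms.jdbc.user" to "hive",
            "cloudera.hms.jdbc.password" to "hive-password",
        )
    )
    println(args.joinToString(" "))
}
```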
When starting the server through the Kotlin API, ClouderaHiveMetastoreConfig.kt also exposes:
- extraConfiguration for arbitrary Hive or Hadoop properties that need to exist inside the metastore JVM
- logLevel for generated HMS server logging
- logConfigFile for a complete custom Log4j 2 properties file
Default configuration is defined in ClouderaHiveMetastoreConfig.kt. Server bootstrap happens in HiveMetastoreServerMain.kt.
The junit5 module provides ClouderaHiveMetastoreTest.kt, which starts PostgreSQL or MariaDB plus a metastore process for a test class.
The testcontainers module wraps the built metastore image for integration tests on JDK 11+. The main entry point is ClouderaHiveMetastoreContainer.kt. The module also reuses the shared TCK from hms-tck-core for its own integration coverage.
The image module builds a runnable container image with Jib. The base image must already include:
- one supported database: PostgreSQL or MariaDB
- JDK 17
- the standard PostgreSQL or MariaDB container entrypoint at /usr/local/bin/docker-entrypoint.sh
Build configuration is environment-variable driven:
- CLOUDERA_HMS_BASE_IMAGE
- CLOUDERA_HMS_IMAGE
- CLOUDERA_HMS_IMAGE_TAGS
- CLOUDERA_HMS_IMAGE_VARIANT
Variant tasks are also available:
- :image:jibAll pushes PostgreSQL and MariaDB variants.
- :image:jibDockerBuildAll builds PostgreSQL and MariaDB variants into Docker.
- :image:jibPostgres, :image:jibDockerBuildPostgres, and :image:jibBuildTarPostgres build only the PostgreSQL variant.
- :image:jibMariadb, :image:jibDockerBuildMariadb, and :image:jibBuildTarMariadb build only the MariaDB variant.
Runtime configuration is environment-variable driven. The image supports:
- POSTGRES_DB
- POSTGRES_USER
- POSTGRES_PASSWORD
- POSTGRES_PORT
- MARIADB_DATABASE
- MARIADB_USER
- MARIADB_PASSWORD
- MARIADB_PORT
- MARIADB_RANDOM_ROOT_PASSWORD
- HMS_DATABASE_TYPE
- HMS_HOST
- HMS_PORT
- HMS_WAREHOUSE_DIR
- HMS_JDBC_URL
- HMS_JDBC_USER
- HMS_JDBC_PASSWORD
- HMS_JDBC_DRIVER
- HMS_INITIALIZE_SCHEMA
- HMS_SCHEMA_RESOURCE
- HMS_SCHEMA_FILE
- HMS_EXTRA_CONFIG_FILE
- HMS_EXTRA_CONF
- HMS_LOG_LEVEL
- JAVA_OPTS
Extra Hive or Hadoop properties can be passed either as newline-delimited HMS_EXTRA_CONF entries or as individual HMS_CONF_* environment variables, where _ maps to . and __ maps to -.
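The name mapping for HMS_CONF_* variables is easiest to see in code. The sketch below is not the image's actual entrypoint logic; it only illustrates the stated rule (_ maps to . and __ maps to -). All function names are hypothetical. It further assumes that HMS_EXTRA_CONF lines have the form key=value, that no case conversion is applied, and that individual HMS_CONF_* variables win over HMS_EXTRA_CONF entries; none of these three points is confirmed by the text above.

```kotlin
const val HMS_CONF_PREFIX = "HMS_CONF_"

// Maps an HMS_CONF_* variable name to a property key, or null for other names.
// "__" is replaced first so it is not consumed by the single "_" rule.
fun envNameToProperty(name: String): String? {
    if (!name.startsWith(HMS_CONF_PREFIX)) return null
    return name.removePrefix(HMS_CONF_PREFIX)
        .replace("__", "-")
        .replace("_", ".")
}

// Assumption: each non-empty HMS_EXTRA_CONF line is key=value.
fun parseExtraConf(block: String): Map<String, String> =
    block.lines()
        .map { it.trim() }
        .filter { it.isNotEmpty() && "=" in it }
        .associate { line ->
            val (key, value) = line.split("=", limit = 2)
            key.trim() to value.trim()
        }

// Assumption: individual HMS_CONF_* variables override HMS_EXTRA_CONF entries.
fun collectExtraConf(env: Map<String, String>): Map<String, String> {
    val fromBlock = parseExtraConf(env["HMS_EXTRA_CONF"].orEmpty())
    val fromVars = env.mapNotNull { (name, value) ->
        envNameToProperty(name)?.let { it to value }
    }.toMap()
    return fromBlock + fromVars
}

fun main() {
    val env = mapOf(
        "HMS_EXTRA_CONF" to "hive.metastore.uris=thrift://localhost:9083\nhive.metastore.warehouse.dir=/tmp/warehouse",
        "HMS_CONF_dfs_namenode_rpc__address" to "namenode:8020",
    )
    collectExtraConf(env).forEach { (key, value) -> println("$key=$value") }
}
```

Under this rule HMS_CONF_dfs_namenode_rpc__address becomes dfs.namenode.rpc-address. A run of three underscores is ambiguous, so such names are best avoided.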