Docker container

Introduction

This is the documentation of the official Apache Hop docker container published on:

https://hub.docker.com/r/apache/hop

It’s a Hop Docker image supporting both short-lived and long-lived setups. A short-lived setup executes a pipeline or workflow and stops right after. A long-lived setup starts a Hop server and waits for work.

Operating system

The docker container runs a minimal Linux system called Alpine. OpenJDK version 11 is then used to execute Apache Hop. The Linux user used to execute in the container is hop and the group is hop as well.

Container Folder Structure

Folder Description

Folder	Description
`/opt/hop`	The installation location of the hop client package.
`/files`	This volume has read-write permissions for Linux user `hop`. You can use it for example to mount a folder that contains the hop and project config as well as the workflows and pipelines.
`/home/hop`	The initial working directory location of the docker container.

/opt/hop

The installation location of the hop client package.

/files

This volume has read-write permissions for Linux user hop. You can use it for example to mount a folder that contains the hop and project config as well as the workflows and pipelines.

/home/hop

The initial working directory location of the docker container.

Environment Variables

You can provide values for the following environment variables:

Environment Variable Default Description

Environment Variable	Default	Description
`HOP_LOG_LEVEL`	`Basic`	The log level. Use one of: `None`, `Error`, `Minimal`, `Basic`, `Detailed`, `Debug` or `Rowlevel`.
`HOP_LOG_PATH`	`/opt/hop/hop.err.log`	The file path to the Hop log file.
`HOP_PROJECT_NAME`		Name of the Hop project to create in the container. You also need to specify the `HOP_PROJECT_FOLDER` variable. If you do not set this variable, no project or environment will be created.
`HOP_PROJECT_FOLDER` `HOP_PROJECT_DIRECTORY` (Deprecated)		Path to the home of the Hop project.
`HOP_PROJECT_CONFIG_FILE_NAME`	`project-config.json`	Name of the project config file.
`HOP_ENVIRONMENT_NAME`		The name of the Hop environment to create in the container. If you do not set this variable, no environment will be created. When using an environment a project has to be created too
`HOP_ENVIRONMENT_CONFIG_FILE_NAME_PATHS`		This is a comma separated list of paths to environment config files (including filename and file extension).
`HOP_OPTIONS`	`-XX:+AggressiveHeap`	Any JRE options you want to set. The `-XX:+AggressiveHeap` option tells the container to use all memory assigned to it.
`HOP_SHARED_JDBC_FOLDERS`		Comma separated list to locations where JDBC drivers live (default is /lib/jdbc).
`HOP_CUSTOM_ENTRYPOINT_EXTENSION_SHELL_FILE_PATH`		Optional path to a custom entrypoint extension script file. You can use this for example to fetch Hop project files from S3 or gitlab.
`HOP_SYSTEM_PROPERTIES`		The system properties that should be set during execution. You can specify the properties as a comma separated list, e.g. `PROP1=xxx,PROP2=yyy`

HOP_LOG_LEVEL

Basic

The log level. Use one of: None, Error, Minimal, Basic, Detailed, Debug or Rowlevel.

HOP_LOG_PATH

/opt/hop/hop.err.log

The file path to the Hop log file.

HOP_PROJECT_NAME

Name of the Hop project to create in the container. You also need to specify the HOP_PROJECT_FOLDER variable. If you do not set this variable, no project or environment will be created.

HOP_PROJECT_FOLDER

HOP_PROJECT_DIRECTORY (Deprecated)

Path to the home of the Hop project.

HOP_PROJECT_CONFIG_FILE_NAME

project-config.json

Name of the project config file.

HOP_ENVIRONMENT_NAME

The name of the Hop environment to create in the container. If you do not set this variable, no environment will be created. When using an environment a project has to be created too

HOP_ENVIRONMENT_CONFIG_FILE_NAME_PATHS

This is a comma separated list of paths to environment config files (including filename and file extension).

HOP_OPTIONS

-XX:+AggressiveHeap

Any JRE options you want to set. The -XX:+AggressiveHeap option tells the container to use all memory assigned to it.

HOP_SHARED_JDBC_FOLDERS

Comma separated list to locations where JDBC drivers live (default is /lib/jdbc).

HOP_CUSTOM_ENTRYPOINT_EXTENSION_SHELL_FILE_PATH

Optional path to a custom entrypoint extension script file. You can use this for example to fetch Hop project files from S3 or gitlab.

HOP_SYSTEM_PROPERTIES

The system properties that should be set during execution. You can specify the properties as a comma separated list, e.g. PROP1=xxx,PROP2=yyy

Here are the variables for short-lived containers, to execute a workflow or pipeline:

Environment Variable Required Description

Environment Variable	Required	Description
`HOP_FILE_PATH`	Yes	The path to the Hop workflow or pipeline to execute. If you configured a project you can use a relative path to the project home folder.
`HOP_RUN_CONFIG`	Yes	The name of the Hop run configuration to use to execute your pipeline or workflow.
`HOP_RUN_PARAMETERS`	No	Optionally, you can specify the parameters that should be passed to the pipeline or workflow you are executing. You can specify them as a comma separated list, e.g. `PARAM_1=aaa,PARAM_2=bbb`.
`HOP_RUN_METADATA_EXPORT`	No	You can specify a metadata export file here which will contain all required metadata elements. See also the metadata export option in the Hop Conf tool.

HOP_FILE_PATH

Yes

The path to the Hop workflow or pipeline to execute. If you configured a project you can use a relative path to the project home folder.

HOP_RUN_CONFIG

Yes

The name of the Hop run configuration to use to execute your pipeline or workflow.

HOP_RUN_PARAMETERS

Optionally, you can specify the parameters that should be passed to the pipeline or workflow you are executing. You can specify them as a comma separated list, e.g. PARAM_1=aaa,PARAM_2=bbb.

HOP_RUN_METADATA_EXPORT

You can specify a metadata export file here which will contain all required metadata elements. See also the metadata export option in the Hop Conf tool.

Below are the variables you can use for a long-lived container, running Hop Server:

Environment Variable Default value Description

Environment Variable	Default value	Description
`HOP_SERVER_HOSTNAME`	`0.0.0.0` (listen to anything)	The IP address the server will listen to.
`HOP_SERVER_PORT`	`8080`	The port the server will listen to.
`HOP_SERVER_SHUTDOWNPORT`	`8079`	The port the server shutdown listener will listen to.
`HOP_SERVER_USER`	`cluster`	The username to log into the Hop server.
`HOP_SERVER_PASS`	`cluster`	The password to log into the Hop server
`HOP_SERVER_METADATA_FOLDER`	(optional)	You can point to a folder containing metadata JSON files which are then available to the server.
`HOP_SERVER_MAX_LOG_LINES`	`0` (keep all logging in memory)	The maximum number of log lines kept in memory by the server.
`HOP_SERVER_MAX_LOG_TIMEOUT`	`0` (never clean up log lines)	The time (in minutes) it takes for a log line to be cleaned up in memory.
`HOP_SERVER_MAX_OBJECT_TIMEOUT`	`1440` (a day)	The time (in minutes) it takes for a pipeline or workflow execution to be removed from the server status.
`HOP_SERVER_KEYSTORE`	(optional)	The path to the Java keystore file you want to use to run the Hop server with SSL enabled to support https.
`HOP_SERVER_KEYSTORE_PASSWORD`	(optional)	The password of the Java keystore file you want to use to run the Hop server with SSL enabled to support https
`HOP_SERVER_KEY_PASSWORD`	(optional)	The password of the key if you want to use to run the Hop server with SSL enabled. If both passwords are the same you can omit setting this variable.

HOP_SERVER_HOSTNAME

0.0.0.0 (listen to anything)

The IP address the server will listen to.

HOP_SERVER_PORT

8080

The port the server will listen to.

HOP_SERVER_SHUTDOWNPORT

8079

The port the server shutdown listener will listen to.

HOP_SERVER_USER

cluster

The username to log into the Hop server.

HOP_SERVER_PASS

cluster

The password to log into the Hop server

HOP_SERVER_METADATA_FOLDER

(optional)

You can point to a folder containing metadata JSON files which are then available to the server.

HOP_SERVER_MAX_LOG_LINES

0 (keep all logging in memory)

The maximum number of log lines kept in memory by the server.

HOP_SERVER_MAX_LOG_TIMEOUT

0 (never clean up log lines)

The time (in minutes) it takes for a log line to be cleaned up in memory.

HOP_SERVER_MAX_OBJECT_TIMEOUT

1440 (a day)

The time (in minutes) it takes for a pipeline or workflow execution to be removed from the server status.

HOP_SERVER_KEYSTORE

(optional)

The path to the Java keystore file you want to use to run the Hop server with SSL enabled to support https.

HOP_SERVER_KEYSTORE_PASSWORD

(optional)

The password of the Java keystore file you want to use to run the Hop server with SSL enabled to support https

HOP_SERVER_KEY_PASSWORD

(optional)

The password of the key if you want to use to run the Hop server with SSL enabled. If both passwords are the same you can omit setting this variable.

Updating the Hop docker container image

Make sure to get the latest updates for the Hop image by pulling them:

docker pull apache/hop:<tag>

If you do not specify a value for :<tag> the value latest will be taken. Latest will contain the last officially released version of Apache Hop. You can also specify Development as a tag. That image will contain the last built Development snapshot of Apache Hop. rxq7777

How to run the Container

The most common use case will be that you run a short-lived container to just complete one Hop workflow or pipeline.

The first example below runs the sample switch-case-basic.hpl pipeline from the samples project.

Replace <tag> with latest, Development or a release tag, and replace <HOP_SAMPLE_PROJECT_PATH> with the path to the config/projects/samples folder in your Hop installation to mount that folder as a volume. This will make the samples project folder available as the /files folder in the container.

docker run -it --rm \
  --env HOP_LOG_LEVEL=Basic \
  --env HOP_FILE_PATH='${PROJECT_HOME}/transforms/switch-case-basic.hpl' \
  --env HOP_PROJECT_FOLDER=/files \
  --env HOP_PROJECT_NAME=samples \
  --env HOP_RUN_CONFIG=local \
  --name hop-test-container \
  -v <HOP_SAMPLE_PROJECT_PATH>:/files \
  apache/hop:<tag>

The second example below runs a workflow.

In addition to the most basic example below, this example adds the environment, based on HOP_ENVIRONMENT_NAME, HOP_ENVIRONMENT_CONFIG_FILE_NAME_PATHS and run parameters with HOP_RUN_PARAMETERS

docker run -it --rm \
  --env HOP_LOG_LEVEL=Basic \
  --env HOP_FILE_PATH='${PROJECT_HOME}/pipelines-and-workflows/main.hwf' \
  --env HOP_PROJECT_FOLDER=/files/project \
  --env HOP_PROJECT_NAME=project-a \
  --env HOP_ENVIRONMENT_NAME=project-a-test \
  --env HOP_ENVIRONMENT_CONFIG_FILE_NAME_PATHS=/files/config/project-a-test.json \
  --env HOP_RUN_CONFIG=local \
  --env HOP_RUN_PARAMETERS=PARAM_LOG_MESSAGE=Hello,PARAM_WAIT_FOR_X_MINUTES=1 \
  -v /path/to/local/dir:/files \
  --name my-simple-hop-container \
  apache/hop:<tag>

If you need a long-lived container, this option is also available.

For more information on the long-lived container please also see the Hop Server documentation as it describes what can be done using the Hop Server

Run this command to start a Hop Server in a docker container:

docker run -it --rm \
  --env HOP_SERVER_USER=admin \
  --env HOP_SERVER_PASS=admin \
  --env HOP_SERVER_SHUTDOWNPORT=8080 \
  --env HOP_SERVER_PORT=8181 \
  --env HOP_SERVER_PORT=8180 \
  --env HOP_SERVER_HOSTNAME=0.0.0.0 \
  -p 8080:8080 \
  -p 8181:8181 \
  -p 8180:8180 \
  --name my-hop-server-container \
 apache/hop:<tag>

localhost is a loopback to your machine, which may be the container but not the host (your laptop or server where your run the container). Use 0.0.0.0 instead to listen on all available interfaces.

Hop Server is designed to receive all variables and metadata from executing clients. This means it needs little to no configuration to run.

If you want to use the web-services functionality additional information on how to configure your webserver can be found on the user manual Web Service page. For this to work properly the HOP_SERVER_METADATA_FOLDER variable has to be set too.

When started can then access the hop-server UI from your host at http://0.0.0.0:8181 or http://localhost:8181

Custom Entrypoint Extension Shell Script

To make the Hop Docker image even more flexible, we added a HOP_CUSTOM_ENTRYPOINT_EXTENSION_SHELL_FILE_PATH variable that accepts a path to a custom shell script (that you provide).This shell script will run when you start the container before your Hop project is registered with the container’s Hop config and before your Hop workflow or pipeline gets kicked off. This feature might come in handy when you want to run some custom logic upfront, e.g. source Hop project files from S3 or clone them from GitHub.

The custom shell file can be provided in several ways (this is not a full list):

via the mount point (/files)
You create your own Dockerfile, define this image as the base and then use the COPY instruction to copy your custom shell file in your Docker image.

For the last scenario mentioned, it could be something like this:

We create a simple bash script called clone-git-repo.sh in a sub-folder called resources:

#!/bin/bash
cd /home/hop
git clone ${GIT_REPO_URI}
chown -R hop:hop /home/hop/${GIT_REPO_NAME}

We also make it parameter-driven, so it any other team can use it.We create our custom Dockerfile like so:

FROM apache/hop:Development
ENV GIT_REPO_URI=https://...
# example value: https://github.com/diethardsteiner/apache-hop-minimal-project.git
ENV GIT_REPO_NAME=repo-name
# example value: apache-hop-minimal-project
USER root
RUN apk update \
  && apk add --no-cache git
# copy custom entrypoint extension shell script
COPY --chown=hop:hop ./resources/clone-git-repo.sh /home/hop/clone-git-repo.sh
USER hop

Note that apart from defining the new environment variables (that go in line with the parameters we defined in the clone-git-repo.sh earlier on ), we also COPY the clone-git-repo.sh file to user hop’s home folder.

Next let’s build a small script which builds our custom image and then tests it by spinning up a container and running a workflow:

#!/bin/zsh

DOCKER_IMG_CHECK=$(docker images | grep ds/custom-hop)

if [ ! -z "${DOCKER_IMG_CHECK}" ]; then
  echo "removing existing ds/custom-hop image"
  docker rmi ds/custom-hop:latest
fi

docker build . -f custom.Dockerfile -t ds/custom-hop:latest

echo " ==== TESTING ====="


HOP_DOCKER_IMAGE=ds/custom-hop:latest
PROJECT_DEPLOYMENT_DIR=/home/hop/apache-hop-minimal-project

docker run -it --rm \
  --env HOP_LOG_LEVEL=Basic \
  --env HOP_FILE_PATH='${PROJECT_HOME}/main.hwf' \
  --env HOP_PROJECT_FOLDER=${PROJECT_DEPLOYMENT_DIR} \
  --env HOP_PROJECT_NAME=apache-hop-minimum-project \
  --env HOP_ENVIRONMENT_NAME=dev \
  --env HOP_ENVIRONMENT_CONFIG_FILE_NAME_PATHS=${PROJECT_DEPLOYMENT_DIR}/dev-config.json \
  --env HOP_RUN_CONFIG=local \
  --env HOP_CUSTOM_ENTRYPOINT_EXTENSION_SHELL_FILE_PATH=/home/hop/clone-git-repo.sh \
  --env GIT_REPO_URI=https://github.com/diethardsteiner/apache-hop-minimal-project.git \
  --env GIT_REPO_NAME=apache-hop-minimal-project \
  --name my-simple-hop-container \
  ${HOP_DOCKER_IMAGE}