You can specify an environment or a project when executing a pipeline or a workflow. By doing so you are automatically configuring metadata, variables without too much fuss.
The easiest example is shown by executing the "complex" pipeline from the Apache Beam examples:
-
Windows
-
Linux, macOS
hop-run.bat --project samples --file 'beam/pipelines/complex.hpl' --runconfig Direct
Expected output:
C:\<YOUR_PATH>\hop>echo off
===[Environment Settings - hop-run.bat]===================================
Java identified as "C:\Program Files\Microsoft\jdk-11.0.17.8-hotspot\\bin\java"
HOP_OPTIONS="-Xmx2048m" -DHOP_AUDIT_FOLDER=.\audit -DHOP_PLATFORM_OS=Windows
-DHOP_PLATFORM_RUNTIME=Run -DHOP_AUTO_CREATE_CONFIG=Y
Consolidated parameters to pass to HopRun are
--project samples --file beam/pipelines/complex.hpl --runconfig Direct
Command to start HopRun will be:
"C:\Program Files\Microsoft\jdk-11.0.17.8-hotspot\\bin\java" -classpath lib\core\*;lib\beam\*;lib\swt\win64\*
-Djava.library.path=lib\core;lib\beam "-Xmx2048m" -DHOP_AUDIT_FOLDER=.\audit -DHOP_PLATFORM_OS=Windows
-DHOP_PLATFORM_RUNTIME=Run -DHOP_AUTO_CREATE_CONFIG=Y org.apache.hop.run.HopRun --project samples
--file beam/pipelines/complex.hpl --runconfig Direct
===[Starting HopRun]=========================================================
2022/12/16 14:23:10 - HopRun - Enabling project 'samples'
2022/12/16 14:23:10 - HopRun - Relative path filename specified: config/projects/samples/beam/pipelines/complex.hpl
2022/12/16 14:23:10 - HopRun - Starting pipeline: config/projects/samples/beam/pipelines/complex.hpl
2022/12/16 14:23:21 - General - Created Apache Beam pipeline with name 'complex'
2022/12/16 14:23:21 - General - Handled transform (INPUT) : Customer data
2022/12/16 14:23:21 - General - Handled transform (INPUT) : State data
2022/12/16 14:23:21 - General - Handled Group By (TRANSFORM) : countPerState, gets data from 1 previous transform(s)
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : uppercase state, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:23:21 - General - Handled Merge Join (TRANSFORM) : Merge join
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : Lookup count per state, gets data from 1 previous transform(s), targets=0, infos=1
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : name<n, gets data from 1 previous transform(s), targets=2, infos=0
2022/12/16 14:23:21 - General - Transform Label: N-Z reading from previous transform targeting this one using : name<n - TARGET - Label: N-Z
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : Label: N-Z, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:23:21 - General - Transform Label: A-M reading from previous transform targeting this one using : name<n - TARGET - Label: A-M
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : Label: A-M, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : Switch / case, gets data from 2 previous transform(s), targets=4, infos=0
2022/12/16 14:23:21 - General - Transform CA reading from previous transform targeting this one using : Switch / case - TARGET - CA
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : CA, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:23:21 - General - Transform NY reading from previous transform targeting this one using : Switch / case - TARGET - NY
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : NY, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:23:21 - General - Transform FL reading from previous transform targeting this one using : Switch / case - TARGET - FL
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : FL, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:23:21 - General - Transform Default reading from previous transform targeting this one using : Switch / case - TARGET - Default
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : Default, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : Collect, gets data from 4 previous transform(s), targets=0, infos=0
2022/12/16 14:23:21 - General - Handled transform (OUTPUT) : complex, gets data from Collect
2022/12/16 14:23:21 - General - Executing this pipeline using the Beam Pipeline Engine with run configuration 'Direct' ----
./sh hop-run.sh --project samples --file 'beam/pipelines/complex.hpl' --runconfig Direct
Expected output:
2022/12/16 14:27:37 - HopRun - Enabling project 'samples'
2022/12/16 14:27:37 - HopRun - Relative path filename specified: config/projects/samples/beam/pipelines/complex.hpl
2022/12/16 14:27:37 - HopRun - Starting pipeline: config/projects/samples/beam/pipelines/complex.hpl
2022/12/16 14:27:41 - General - Created Apache Beam pipeline with name 'complex'
2022/12/16 14:27:41 - General - Handled transform (INPUT) : Customer data
2022/12/16 14:27:41 - General - Handled transform (INPUT) : State data
2022/12/16 14:27:41 - General - Handled Group By (TRANSFORM) : countPerState, gets data from 1 previous transform(s)
2022/12/16 14:27:41 - General - Handled generic transform (TRANSFORM) : uppercase state, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:27:41 - General - Handled Merge Join (TRANSFORM) : Merge join
2022/12/16 14:27:41 - General - Handled generic transform (TRANSFORM) : Lookup count per state, gets data from 1 previous transform(s), targets=0, infos=1
2022/12/16 14:27:41 - General - Handled generic transform (TRANSFORM) : name<n, gets data from 1 previous transform(s), targets=2, infos=0
2022/12/16 14:27:41 - General - Transform Label: N-Z reading from previous transform targeting this one using : name<n - TARGET - Label: N-Z
2022/12/16 14:27:41 - General - Handled generic transform (TRANSFORM) : Label: N-Z, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:27:41 - General - Transform Label: A-M reading from previous transform targeting this one using : name<n - TARGET - Label: A-M
2022/12/16 14:27:41 - General - Handled generic transform (TRANSFORM) : Label: A-M, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:27:41 - General - Handled generic transform (TRANSFORM) : Switch / case, gets data from 2 previous transform(s), targets=4, infos=0
2022/12/16 14:27:41 - General - Transform CA reading from previous transform targeting this one using : Switch / case - TARGET - CA
2022/12/16 14:27:42 - General - Handled generic transform (TRANSFORM) : CA, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:27:42 - General - Transform NY reading from previous transform targeting this one using : Switch / case - TARGET - NY
2022/12/16 14:27:42 - General - Handled generic transform (TRANSFORM) : NY, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:27:42 - General - Transform FL reading from previous transform targeting this one using : Switch / case - TARGET - FL
2022/12/16 14:27:42 - General - Handled generic transform (TRANSFORM) : FL, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:27:42 - General - Transform Default reading from previous transform targeting this one using : Switch / case - TARGET - Default
2022/12/16 14:27:42 - General - Handled generic transform (TRANSFORM) : Default, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:27:42 - General - Handled generic transform (TRANSFORM) : Collect, gets data from 4 previous transform(s), targets=0, infos=0
2022/12/16 14:27:42 - General - Handled transform (OUTPUT) : complex, gets data from Collect
2022/12/16 14:27:42 - General - Executing this pipeline using the Beam Pipeline Engine with run configuration 'Direct'
To execute an Apache Beam pipeline a lot of information and metadata is needed. Let’s dive into a few fun information tidbits:
-
By referencing the
project Hop knows where the project is located (samples
)config/projects/samples
-
Since we know the location of the project, we can specify pipelines and workflows with a relative path
-
The project knows where its metadata is stored (
) so it knows where to find theconfig/projects/samples/metadata
pipeline run configuration (Direct
)config/projects/samples/metadata/pipeline-run-configuration/Direct.json
-
This run configuration defines its own pipeline engine specific variables, in this case the output folder :
DATA_OUTPUT=${PROJECT_HOME}/beam/output/
-
The output of the samples is as such written to
config/projects/samples/beam/output
To reference an environment you can execute using
or -e
. The only difference is that you’ll have a number of extra environment variables set while executing.--environment