Hop Conf - The Hop command line configuration tool

Usage

Hop Conf is a command line tool to manage Hop configuration, including projects and lifecycle environments. Run the hop-conf.sh script with the -h flag (./hop-conf.sh -h) to display the available options.

Usage: <main class> [-h] [-ec] [-ed] [-el] [-em] [-ey] [-pc] [-pd] [-pl] [-pm]
                    [-pn] [-py] [-aza=<account>] [-azi=<blockIncrement>]
                    [-azk=<key>] [-cfg=<configFile>]
                    [-dc=<defaultProjectConfigFile>] [-de=<defaultEnvironment>]
                    [-dp=<defaultProject>] [-dv=<describeVariable>]
                    [-e=<environmentName>] [-ep=<environmentProject>]
                    [-eu=<environmentPurpose>] [-fj=<fatJarFilename>]
                    [-gck=<serviceAccountKeyFile>] [-gdc=<credentialsFile>]
                    [-gdt=<tokensFolder>] [-p=<projectName>]
                    [-pa=<projectMetadataBaseFolder>]
                    [-pb=<projectDataSetsCsvFolder>] [-pf=<projectConfigFile>]
                    [-ph=<projectHome>] [-pp=<projectCompany>]
                    [-pr=<projectParent>] [-ps=<projectDescription>]
                    [-pt=<projectDepartment>] [-pu=<projectUnitTestsBasePath>]
                    [-px=<projectEnforceExecutionInHome>]
                    [-sj=<standardProjectsFolder>]
                    [-sp=<standardParentProject>] [-sv=<setVariable>]
                    [-xm=<metadataJsonFilename>] [-cfd=<configDescribeVariables>
                    [,<configDescribeVariables>...]]...
                    [-cfv=<configSetVariables>[,<configSetVariables>...]]...
                    [-eg=<environmentConfigFiles>[,
                    <environmentConfigFiles>...]]... [-pv=<projectVariables>[,
                    <projectVariables>...]]...
      -aza, --azure-account=<account>
                            The account to use for the Azure VFS
      -azi, --azure-block-increment=<blockIncrement>
                            The block increment size for new files on Azure,
                              multiples of 512 only.
      -azk, --azure-key=<key>
                            The key to use for the Azure VFS
      -cfd, --config-file-describe-variables=<configDescribeVariables>[,
        <configDescribeVariables>...]
                            A list of variable=description combinations separated by
                              a comma
      -cfg, --config-file=<configFile>
                            Specify the configuration JSON file to manage
      -cfv, --config-file-set-variables=<configSetVariables>[,
        <configSetVariables>...]
                            A list of variable=value combinations separated by a
                              comma
      -dc, --default-projects-folder=<defaultProjectConfigFile>
                            The standard project configuration filename proposed
                              when creating projects
      -de, --default-environment=<defaultEnvironment>
                            The name of the default environment to use when none is
                              specified
      -dp, --default-project=<defaultProject>
                            The name of the default project to use when none is
                              specified
      -dv, --describe-variable=<describeVariable>
                            Describe a variable, use format VARIABLE=Description
  -e, --environment=<environmentName>
                            The name of the lifecycle environment to manage
      -ec, --environment-create
                            Create a new project lifecycle environment. Also specify
                              its name, purpose, the project name and the
                              configuration files.
      -ed, --environment-delete
                            Delete a lifecycle environment
      -eg, --environment-config-files=<environmentConfigFiles>[,
        <environmentConfigFiles>...]
                            A list of configuration files for this lifecycle
                              environment, comma separated
      -el, --environments-list
                            List the defined lifecycle environments
      -em, --environment-modify
                            Modify a lifecycle environment
      -ep, --environment-project=<environmentProject>
                            The project for the environment
      -eu, --environment-purpose=<environmentPurpose>
                            The purpose of the environment: Development, Testing,
                              Production, CI, ...
      -ey, --environment-mandatory
                            Make it mandatory to reference an environment
      -fj, --generate-fat-jar=<fatJarFilename>
                            Specify the filename of the fat jar to generate from
                              your current software installation
      -gck, --google-cloud-service-account-key-file=<serviceAccountKeyFile>
                            Configure the path to a Google Cloud service account
                              JSON key file
      -gdc, --google-drive-credentials-file=<credentialsFile>
                            Configure the path to a Google Drive credentials JSON
                              file
      -gdt, --google-drive-tokens-folder=<tokensFolder>
                            Configure the path to a Google Drive tokens folder
  -h, --help                Displays this help message and quits.
  -p, --project=<projectName>
                            The name of the project to manage
      -pa, --project-metadata-base=<projectMetadataBaseFolder>
                            The metadata base folder (relative to home)
      -pb, --project-datasets-base=<projectDataSetsCsvFolder>
                            The data sets CSV folder (relative to home)
      -pc, --project-create Create a new project. Also specify the name and its home
      -pd, --project-delete Delete a project
      -pf, --project-config-file=<projectConfigFile>
                            The configuration file relative to the home folder. The
                              default value is project-config.json
      -ph, --project-home=<projectHome>
                            The home directory of the project
      -pl, --projects-list   List the defined projects
      -pm, --project-modify Modify a project
      -pn, --projects-enabled
                            Enable or disable the projects plugin
      -pp, --project-company=<projectCompany>
                            The company
      -pr, --project-parent=<projectParent>
                            The name of the parent project to inherit metadata and
                              variables from
      -ps, --project-description=<projectDescription>
                            The description of the project
      -pt, --project-department=<projectDepartment>
                            The department
      -pu, --project-unit-tests-base=<projectUnitTestsBasePath>
                            The unit tests base folder (relative to home)
      -pv, --project-variables=<projectVariables>[,<projectVariables>...]
                            A list of variable=value combinations separated by a
                              comma
      -px, --project-enforce-execution=<projectEnforceExecutionInHome>
                            Validate before execution that a workflow or pipeline is
                              located in the project home folder or a sub-folder
                              (true/false).
      -py, --project-mandatory
                            Make it mandatory to reference a project
      -sj, --standard-projects-folder=<standardProjectsFolder>
                            GUI: The standard projects folder proposed when creating
                              projects
      -sp, --standard-parent-project=<standardParentProject>
                            The name of the standard project to use as a parent when
                              creating new projects
      -sv, --set-variable=<setVariable>
                            Set a variable, use format VAR=Value
      -xm, --export-metadata=<metadataJsonFilename>
                            Export project metadata to a single JSON file which you
                              can specify with this option. Also specify the -p
                              option.
      -v,  --version        Print version information and exit

The available options are listed below:

Table 1. Hop-conf Options
Short Option | Extended Option | Description
-h | --help | Displays this help message and quits
-v | --version | Print version information and exit
-ec | --environment-create | Create a new project lifecycle environment. Also specify its name, purpose, the project name and the configuration files
-ed | --environment-delete | Delete an environment
-el | --environments-list | List the defined environments
-em | --environment-modify | Modify an environment
-pc | --project-create | Create a new project. Also specify the name and its home
-pd | --project-delete | Delete a project
-pl | --projects-list | List the defined projects
-pm | --project-modify | Modify a project
-dv | --describe-variable=<describeVariable> | Describe a variable, use format VARIABLE=Description
-e | --environment=<environmentName> | The name of the environment to manage
-ep | --environment-project=<environmentProject> | The project for the environment
-eu | --environment-purpose=<environmentPurpose> | The purpose of the environment: Development, Testing, Production, CI, ...
-fj | --generate-fat-jar=<fatJarFilename> | Specify the filename of the fat jar to generate from your current software installation
-xm | --export-metadata=<metadataJsonFilename> | Export project metadata to a single JSON file which you can specify with this option. Also specify the -p option to know which metadata to export
-p | --project=<projectName> | The project name
-pa | --project-metadata-base=<projectMetadataBaseFolder> | The metadata base folder (relative to home)
-pb | --project-datasets-base=<projectDataSetsCsvFolder> | The data sets CSV folder (relative to home)
-pf | --project-config-file=<projectConfigFile> | The configuration file relative to the home folder. The default value is project-config.json
-ph | --project-home=<projectHome> | The home directory of the project
-pp | --project-company=<projectCompany> | The company
-ps | --project-description=<projectDescription> | The description of the project
-pt | --project-department=<projectDepartment> | The department
-pu | --project-unit-tests-base=<projectUnitTestsBasePath> | The unit tests base folder (relative to home)
-px | --project-enforce-execution=<projectEnforceExecutionInHome> | Validate before execution that a workflow or pipeline is located in the project home folder or a sub-folder (true/false)
-sv | --set-variable=<setVariable> | Set a variable, use format VAR=Value. To unset a variable, specify it without a value, e.g. -sv=myvar=
-cfg | --config-file=<configFile> | Specify the configuration JSON file to manage
-cfd | --config-file-describe-variables=<configDescribeVariables>[,<configDescribeVariables>...] | A list of variable=description combinations separated by a comma
-cfv | --config-file-set-variables=<configSetVariables>[,<configSetVariables>...] | A list of variable=value combinations separated by a comma
-eg | --environment-config-files=<environmentConfigFiles>[,<environmentConfigFiles>...] | A list of configuration files for this lifecycle environment, comma separated
-pv | --project-variables=<projectVariables>[,<projectVariables>...] | A list of variable=value combinations separated by a comma

 

Project variables examples

The following examples show how the value of the --project-variables option is parsed.

Normal usage

--project-variables=key1=value1,key2=value2

Result:

Key | Value
key1 | value1
key2 | value2

Spaces in value

--project-variables=key1="This value contains spaces",key2=value2

Result:

Key | Value
key1 | This value contains spaces
key2 | value2

Commas in value

--project-variables=key1=\"value1,value2\"

Result:

Key | Value
key1 | value1,value2

Forcing quotes in value

--project-variables=key1="\"\"String with spaces\"\""

Result:

Key | Value
key1 | "String with spaces"
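The parsing rules illustrated by the four examples above can be modelled with a short Python sketch. This is not Hop's implementation; `parse_project_variables` is a hypothetical helper that assumes three things inferred from the examples: pairs are split on commas that are not inside double quotes, each pair is split on its first `=`, and one pair of surrounding double quotes is removed from the value. The input is the string as hop-conf would receive it after the shell has processed its own quoting.

```python
def parse_project_variables(raw: str) -> dict:
    """Model of the documented behaviour, not Hop's actual parser."""
    pairs, current, in_quotes = [], [], False
    for ch in raw:
        if ch == '"':
            # Track whether we are inside a double-quoted region.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "," and not in_quotes:
            # A comma outside quotes ends the current key=value pair.
            pairs.append("".join(current))
            current = []
        else:
            current.append(ch)
    pairs.append("".join(current))

    result = {}
    for pair in pairs:
        key, _, value = pair.partition("=")  # split on the first '=' only
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]  # drop one layer of surrounding quotes
        result[key] = value
    return result


# The shell turns key1=\"value1,value2\" into key1="value1,value2".
print(parse_project_variables('key1="value1,value2"'))
```

Under these assumptions the sketch reproduces all four result tables, which makes it a convenient way to check a value before passing it on the command line.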

Project Usage and Configuration

Configuration on the command line

The hop-conf script offers many options to edit environment definitions.

Creating an environment

Windows:

hop-conf.bat --environment-create ^
             --environment hop2 ^
             --environment-project hop2 ^
             --environment-purpose=Development ^
             --environment-config-files="C:\<YOUR_ENV_FILE_PATH>\env-variables.json"

Expected output:

C:\<YOUR_PATH>\hop>echo off
===[Environment Settings - hop-conf.bat]===================================
Java identified as "C:\Program Files\Microsoft\jdk-11.0.17.8-hotspot\\bin\java"
HOP_OPTIONS=-Xmx2048m -DHOP_AUDIT_FOLDER=.\audit -DHOP_PLATFORM_OS=Windows -DHOP_PLATFORM_RUNTIME=Conf
-DHOP_AUTO_CREATE_CONFIG=Y
Command to start Hop will be:
"C:\Program Files\Microsoft\jdk-11.0.17.8-hotspot\\bin\java" -classpath lib\core\*;lib\beam\*;lib\swt\win64\*
-Djava.library.path=lib\core;lib\beam -Xmx2048m -DHOP_AUDIT_FOLDER=.\audit -DHOP_PLATFORM_OS=Windows
-DHOP_PLATFORM_RUNTIME=Conf -DHOP_AUTO_CREATE_CONFIG=Y org.apache.hop.config.HopConfig  --environment-create
--environment hop2 --environment-project hop2 --environment-purpose Development
--environment-config-files "C:\<YOUR_ENV_FILE_PATH>\env-variables.json"
===[Starting HopConfig]=========================================================
Creating environment 'hop2'
Environment 'hop2' was created in Hop configuration file C:\<YOUR_PATH>\hop\config\hop-config.json
Warning: referenced project 'hop2' doesn't exist
Found existing environment configuration file: C:\<YOUR_ENV_FILE_PATH>\env-variables.json
Purpose: Development
Project name: hop2
Config file: C:\<YOUR_ENV_FILE_PATH>\env-variables.json

Linux, macOS:

sh hop-conf.sh \
     --environment-create \
     --environment hop2 \
     --environment-project hop2 \
     --environment-purpose=Development \
     --environment-config-files=<YOUR_ENV_FILE_PATH>/env-variables.json

Expected output:

Creating environment 'hop2'
Environment 'hop2' was created in Hop configuration file <YOUR_PATH>/hop/config/hop-config.json
Warning: referenced project 'hop2' doesn't exist
Found existing environment configuration file: <YOUR_ENV_FILE_PATH>/env-variables.json
  hop2
    Purpose: Development
    Project name: hop2
      Config file: <YOUR_ENV_FILE_PATH>/env-variables.json

As you can see from the log, an empty file was created to set variables in:

{ }
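When environments are created from a script, it can help to assemble the arguments as a list rather than a single shell string, which avoids the quoting differences between Windows and Linux. The sketch below is one way to do that in Python; `environment_create_args` is a hypothetical helper, and only the option names come from the commands above.

```python
def environment_create_args(name, project, purpose, config_files):
    """Build the hop-conf arguments for creating a lifecycle environment."""
    return [
        "--environment-create",
        "--environment", name,
        "--environment-project", project,
        "--environment-purpose", purpose,
        # Multiple configuration files are passed as one comma separated value.
        "--environment-config-files", ",".join(config_files),
    ]


args = environment_create_args(
    "hop2", "hop2", "Development", ["/path/to/env-variables.json"]
)
print(args)
# To run it, prepend the script for your platform, for example:
#   subprocess.run(["./hop-conf.sh", *args], check=True)
```

The path `/path/to/env-variables.json` is a placeholder; substitute the location of your own environment file.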

Setting variables in an environment

This command adds a variable to the environment configuration file:

 

Windows:

hop-conf.bat --config-file "C:\<YOUR_ENV_FILE_PATH>\env-variables.json" --config-file-set-variables "DB_HOSTNAME=localhost,DB_PASSWORD=abcd"

Expected output:

C:\<YOUR_PATH>\hop>echo off
===[Environment Settings - hop-conf.bat]===================================
Java identified as "C:\Program Files\Microsoft\jdk-11.0.17.8-hotspot\\bin\java"
HOP_OPTIONS=-Xmx2048m -DHOP_AUDIT_FOLDER=.\audit -DHOP_PLATFORM_OS=Windows -DHOP_PLATFORM_RUNTIME=Conf -DHOP_AUTO_CREATE_CONFIG=Y
Command to start Hop will be:
"C:\Program Files\Microsoft\jdk-11.0.17.8-hotspot\\bin\java" -classpath lib\core\*;lib\beam\*;lib\swt\win64\*
-Djava.library.path=lib\core;lib\beam -Xmx2048m -DHOP_AUDIT_FOLDER=.\audit -DHOP_PLATFORM_OS=Windows
-DHOP_PLATFORM_RUNTIME=Conf -DHOP_AUTO_CREATE_CONFIG=Y org.apache.hop.config.HopConfig
--config-file "C:\<YOUR_ENV_FILE_PATH>\env-variables.json"
--config-file-set-variables "DB_HOSTNAME=localhost,DB_PASSWORD=abcd"
===[Starting HopConfig]=========================================================
Configuration file 'C:\<YOUR_ENV_FILE_PATH>/env-variables.json' was modified.

Linux, macOS:

./hop-conf.sh --config-file <YOUR_ENV_FILE_PATH>/env-variables.json --config-file-set-variables DB_HOSTNAME=localhost,DB_PASSWORD=abcd

Expected output:

Configuration file '<YOUR_ENV_FILE_PATH>/env-variables.json' was modified.

If you look at the file env-variables.json, you’ll see that the variables were added:

{
  "variables" : [ {
    "name" : "DB_HOSTNAME",
    "value" : "localhost",
    "description" : ""
  }, {
    "name" : "DB_PASSWORD",
    "value" : "abcd",
    "description" : ""
  } ]
}
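The file layout shown above is plain JSON, so it is easy to inspect from a script. As a minimal sketch, assuming the structure is exactly as shown (a top-level "variables" list of objects with "name", "value" and "description"), the hypothetical helper `load_variables` turns the file content into a name-to-value dictionary. An empty file such as `{ }` yields an empty dictionary.

```python
import json


def load_variables(text: str) -> dict:
    """Map variable names to values from an environment configuration file."""
    config = json.loads(text)
    # A freshly created file is '{ }' and has no "variables" key yet.
    return {v["name"]: v["value"] for v in config.get("variables", [])}


SAMPLE = """
{
  "variables" : [ {
    "name" : "DB_HOSTNAME",
    "value" : "localhost",
    "description" : ""
  }, {
    "name" : "DB_PASSWORD",
    "value" : "abcd",
    "description" : ""
  } ]
}
"""

print(load_variables(SAMPLE))
```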

Please note that you can add descriptions for the variables as well with the --describe-variable option. Please run hop-conf without options to see all the possibilities.

Deleting an environment

The following example deletes an environment from the Hop configuration file:

 

Windows:

hop-conf.bat -ed --environment hop2

Expected output:

C:\<YOUR_PATH>\hop>echo off
===[Environment Settings - hop-conf.bat]===================================
Java identified as "C:\Program Files\Microsoft\jdk-11.0.17.8-hotspot\\bin\java"
HOP_OPTIONS=-Xmx2048m -DHOP_AUDIT_FOLDER=.\audit -DHOP_PLATFORM_OS=Windows -DHOP_PLATFORM_RUNTIME=Conf -DHOP_AUTO_CREATE_CONFIG=Y
Command to start Hop will be:
"C:\Program Files\Microsoft\jdk-11.0.17.8-hotspot\\bin\java" -classpath lib\core\*;lib\beam\*;lib\swt\win64\* -Djava.library.path=lib\core;lib\beam -Xmx2048m -DHOP_AUDIT_FOLDER=.\audit -DHOP_PLATFORM_OS=Windows -DHOP_PLATFORM_RUNTIME=Conf -DHOP_AUTO_CREATE_CONFIG=Y org.apache.hop.config.HopConfig  -ed --environment hop2
===[Starting HopConfig]=========================================================
Lifecycle environment 'hop2' was deleted from Hop configuration file C:\<YOUR_PATH>\hop\config\hop-config.json

Linux, macOS:

./hop-conf.sh -ed --environment hop2

Expected output:

Lifecycle environment 'hop2' was deleted from Hop configuration file <YOUR_PATH>/hop/config/hop-config.json

Projects Plugin configuration

There are various options to configure the behavior of the Projects plugin itself. The Hop configuration file hop-config.json contains the following options:

{
    "projectMandatory" : true,
    "environmentMandatory" : false,
    "defaultProject" : "default",
    "defaultEnvironment" : null,
    "standardParentProject" : "default",
    "standardProjectsFolder" : "/home/matt/test-stuff/"
}

Option | Description | hop-conf option
projectMandatory | This will prevent anyone from using hop-run without specifying a project | --project-mandatory
environmentMandatory | This will prevent anyone from using hop-run without specifying an environment | --environment-mandatory
defaultProject | The default project to use when none is specified | --default-project
defaultEnvironment | The default environment to use when none is specified | --default-environment
standardParentProject | The standard parent project to propose when creating a new project | --standard-parent-project
standardProjectsFolder | The folder to which you’ll browse by default in the GUI when creating new projects | --standard-projects-folder
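To see which hop-conf option controls each setting in a given configuration, the mapping in the table above can be applied to the JSON directly. The following Python sketch does only that lookup. `OPTION_FOR_KEY` and `options_for` are hypothetical names, the code never calls hop-conf, and it treats the snippet as a standalone JSON object, which may not match how the keys are nested in a real hop-config.json.

```python
import json

# Mapping copied from the table above: hop-config.json key -> hop-conf option.
OPTION_FOR_KEY = {
    "projectMandatory": "--project-mandatory",
    "environmentMandatory": "--environment-mandatory",
    "defaultProject": "--default-project",
    "defaultEnvironment": "--default-environment",
    "standardParentProject": "--standard-parent-project",
    "standardProjectsFolder": "--standard-projects-folder",
}


def options_for(config_text: str) -> dict:
    """Return {hop-conf option: current value} for the known keys."""
    config = json.loads(config_text)
    return {
        OPTION_FOR_KEY[key]: value
        for key, value in config.items()
        if key in OPTION_FOR_KEY
    }


SNIPPET = """
{
    "projectMandatory" : true,
    "environmentMandatory" : false,
    "defaultProject" : "default",
    "defaultEnvironment" : null,
    "standardParentProject" : "default",
    "standardProjectsFolder" : "/home/matt/test-stuff/"
}
"""

for option, value in options_for(SNIPPET).items():
    print(option, "=", value)
```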

Running Workflows and Pipelines

You can specify an environment or a project when executing a pipeline or a workflow. Doing so automatically configures the metadata and variables for the run.

The easiest example is shown by executing the "complex" pipeline from the Apache Beam examples:

 

Windows:

hop-run.bat --project samples --file 'beam/pipelines/complex.hpl' --runconfig Direct

Expected output:

C:\<YOUR_PATH>\hop>echo off
===[Environment Settings - hop-run.bat]===================================
Java identified as "C:\Program Files\Microsoft\jdk-11.0.17.8-hotspot\\bin\java"
HOP_OPTIONS="-Xmx2048m" -DHOP_AUDIT_FOLDER=.\audit -DHOP_PLATFORM_OS=Windows
-DHOP_PLATFORM_RUNTIME=Run -DHOP_AUTO_CREATE_CONFIG=Y
Consolidated parameters to pass to HopRun are
--project samples --file beam/pipelines/complex.hpl --runconfig Direct
Command to start HopRun will be:
"C:\Program Files\Microsoft\jdk-11.0.17.8-hotspot\\bin\java" -classpath lib\core\*;lib\beam\*;lib\swt\win64\*
-Djava.library.path=lib\core;lib\beam "-Xmx2048m" -DHOP_AUDIT_FOLDER=.\audit -DHOP_PLATFORM_OS=Windows
-DHOP_PLATFORM_RUNTIME=Run -DHOP_AUTO_CREATE_CONFIG=Y org.apache.hop.run.HopRun  --project samples
--file beam/pipelines/complex.hpl --runconfig Direct
===[Starting HopRun]=========================================================
2022/12/16 14:23:10 - HopRun - Enabling project 'samples'
2022/12/16 14:23:10 - HopRun - Relative path filename specified: config/projects/samples/beam/pipelines/complex.hpl
2022/12/16 14:23:10 - HopRun - Starting pipeline: config/projects/samples/beam/pipelines/complex.hpl
2022/12/16 14:23:21 - General - Created Apache Beam pipeline with name 'complex'
2022/12/16 14:23:21 - General - Handled transform (INPUT) : Customer data
2022/12/16 14:23:21 - General - Handled transform (INPUT) : State data
2022/12/16 14:23:21 - General - Handled Group By (TRANSFORM) : countPerState, gets data from 1 previous transform(s)
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : uppercase state, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:23:21 - General - Handled Merge Join (TRANSFORM) : Merge join
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : Lookup count per state, gets data from 1 previous transform(s), targets=0, infos=1
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : name<n, gets data from 1 previous transform(s), targets=2, infos=0
2022/12/16 14:23:21 - General - Transform Label: N-Z reading from previous transform targeting this one using : name<n - TARGET - Label: N-Z
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : Label: N-Z, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:23:21 - General - Transform Label: A-M reading from previous transform targeting this one using : name<n - TARGET - Label: A-M
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : Label: A-M, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : Switch / case, gets data from 2 previous transform(s), targets=4, infos=0
2022/12/16 14:23:21 - General - Transform CA reading from previous transform targeting this one using : Switch / case - TARGET - CA
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : CA, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:23:21 - General - Transform NY reading from previous transform targeting this one using : Switch / case - TARGET - NY
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : NY, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:23:21 - General - Transform FL reading from previous transform targeting this one using : Switch / case - TARGET - FL
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : FL, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:23:21 - General - Transform Default reading from previous transform targeting this one using : Switch / case - TARGET - Default
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : Default, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:23:21 - General - Handled generic transform (TRANSFORM) : Collect, gets data from 4 previous transform(s), targets=0, infos=0
2022/12/16 14:23:21 - General - Handled transform (OUTPUT) : complex, gets data from Collect
2022/12/16 14:23:21 - General - Executing this pipeline using the Beam Pipeline Engine with run configuration 'Direct'

Linux, macOS:

./hop-run.sh --project samples --file 'beam/pipelines/complex.hpl' --runconfig Direct

Expected output:

2022/12/16 14:27:37 - HopRun - Enabling project 'samples'
2022/12/16 14:27:37 - HopRun - Relative path filename specified: config/projects/samples/beam/pipelines/complex.hpl
2022/12/16 14:27:37 - HopRun - Starting pipeline: config/projects/samples/beam/pipelines/complex.hpl
2022/12/16 14:27:41 - General - Created Apache Beam pipeline with name 'complex'
2022/12/16 14:27:41 - General - Handled transform (INPUT) : Customer data
2022/12/16 14:27:41 - General - Handled transform (INPUT) : State data
2022/12/16 14:27:41 - General - Handled Group By (TRANSFORM) : countPerState, gets data from 1 previous transform(s)
2022/12/16 14:27:41 - General - Handled generic transform (TRANSFORM) : uppercase state, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:27:41 - General - Handled Merge Join (TRANSFORM) : Merge join
2022/12/16 14:27:41 - General - Handled generic transform (TRANSFORM) : Lookup count per state, gets data from 1 previous transform(s), targets=0, infos=1
2022/12/16 14:27:41 - General - Handled generic transform (TRANSFORM) : name<n, gets data from 1 previous transform(s), targets=2, infos=0
2022/12/16 14:27:41 - General - Transform Label: N-Z reading from previous transform targeting this one using : name<n - TARGET - Label: N-Z
2022/12/16 14:27:41 - General - Handled generic transform (TRANSFORM) : Label: N-Z, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:27:41 - General - Transform Label: A-M reading from previous transform targeting this one using : name<n - TARGET - Label: A-M
2022/12/16 14:27:41 - General - Handled generic transform (TRANSFORM) : Label: A-M, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:27:41 - General - Handled generic transform (TRANSFORM) : Switch / case, gets data from 2 previous transform(s), targets=4, infos=0
2022/12/16 14:27:41 - General - Transform CA reading from previous transform targeting this one using : Switch / case - TARGET - CA
2022/12/16 14:27:42 - General - Handled generic transform (TRANSFORM) : CA, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:27:42 - General - Transform NY reading from previous transform targeting this one using : Switch / case - TARGET - NY
2022/12/16 14:27:42 - General - Handled generic transform (TRANSFORM) : NY, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:27:42 - General - Transform FL reading from previous transform targeting this one using : Switch / case - TARGET - FL
2022/12/16 14:27:42 - General - Handled generic transform (TRANSFORM) : FL, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:27:42 - General - Transform Default reading from previous transform targeting this one using : Switch / case - TARGET - Default
2022/12/16 14:27:42 - General - Handled generic transform (TRANSFORM) : Default, gets data from 1 previous transform(s), targets=0, infos=0
2022/12/16 14:27:42 - General - Handled generic transform (TRANSFORM) : Collect, gets data from 4 previous transform(s), targets=0, infos=0
2022/12/16 14:27:42 - General - Handled transform (OUTPUT) : complex, gets data from Collect
2022/12/16 14:27:42 - General - Executing this pipeline using the Beam Pipeline Engine with run configuration 'Direct'

To execute an Apache Beam pipeline a lot of information and metadata is needed. Let’s dive into a few fun information tidbits:

  • By referencing the samples project Hop knows where the project is located (config/projects/samples)

  • Since we know the location of the project, we can specify pipelines and workflows with a relative path

  • The project knows where its metadata is stored (config/projects/samples/metadata) so it knows where to find the Direct pipeline run configuration (config/projects/samples/metadata/pipeline-run-configuration/Direct.json)

  • This run configuration defines its own pipeline engine specific variables, in this case the output folder: DATA_OUTPUT=${PROJECT_HOME}/beam/output/

  • The output of the samples is therefore written to config/projects/samples/beam/output
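The path resolution described in the list above can be illustrated with a small Python sketch. It only mimics the outcome (a relative file joined to the project home, and `${PROJECT_HOME}` substituted in a variable value) and is not how Hop implements it; `resolve_in_project` and `expand_variables` are hypothetical helpers.

```python
from pathlib import PurePosixPath


def resolve_in_project(project_home: str, relative_file: str) -> str:
    """Join a path given relative to the project with the project home."""
    return str(PurePosixPath(project_home) / relative_file)


def expand_variables(value: str, variables: dict) -> str:
    """Replace ${NAME} placeholders with the corresponding values."""
    for name, replacement in variables.items():
        value = value.replace("${" + name + "}", replacement)
    return value


home = "config/projects/samples"

# The pipeline passed with --file, relative to the project home.
print(resolve_in_project(home, "beam/pipelines/complex.hpl"))

# The Direct run configuration inside the project's metadata folder.
print(resolve_in_project(home, "metadata/pipeline-run-configuration/Direct.json"))

# The output folder defined by the run configuration.
print(expand_variables("${PROJECT_HOME}/beam/output/", {"PROJECT_HOME": home}))
```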

To reference an environment you can execute using -e or --environment. The only difference is that you’ll have a number of extra environment variables set while executing.

Cloud Storage Configuration

Hop Conf can be used to configure access to your AWS, Azure and Google Cloud (Cloud Storage and Drive) accounts through VFS.

Amazon Web Services S3

N/A

Azure

Set the account, the block increment size for new files, and your Azure key:

      -aza, --azure-account=<account>
                            The account to use for the Azure VFS
      -azi, --azure-block-increment=<blockIncrement>
                            The block increment size for new files on Azure,
                              multiples of 512 only.
      -azk, --azure-key=<key>
                            The key to use for the Azure VFS
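Because the block increment only accepts multiples of 512, a script that generates this setting may want to check the value first. A minimal sketch in Python, assuming that only positive values make sense (the help text does not say so explicitly); `is_valid_block_increment` is a hypothetical helper, not part of Hop.

```python
def is_valid_block_increment(value: int) -> bool:
    """True when the value is a positive multiple of 512."""
    return value > 0 and value % 512 == 0


for candidate in (512, 1000, 4096):
    print(candidate, is_valid_block_increment(candidate))
```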

Google

Google Cloud Storage

Set the path to your Google Cloud service account JSON key file

      -gck, --google-cloud-service-account-key-file=<serviceAccountKeyFile>
                            Configure the path to a Google Cloud service account JSON key file

Google Drive

Set the path to your Google Drive credentials JSON file or Google Drive tokens folder.

      -gdc, --google-drive-credentials-file=<credentialsFile>
                            Configure the path to a Google Drive credentials JSON
                              file
      -gdt, --google-drive-tokens-folder=<tokensFolder>
                            Configure the path to a Google Drive tokens folder