Hop Conf - The Hop command line configuration tool
Usage
Hop Conf is a command line tool to manage projects, environments and other Hop configuration options. Run the hop-conf.sh script with the -h flag (./hop-conf.sh -h) to display the available options.
Usage: <main class> [-h] [-ec] [-ed] [-el] [-em] [-ey] [-pc] [-pd] [-pl] [-pm]
[-pn] [-py] [-aza=<account>] [-azi=<blockIncrement>]
[-azk=<key>] [-cfg=<configFile>]
[-dc=<defaultProjectConfigFile>] [-de=<defaultEnvironment>]
[-dp=<defaultProject>] [-dv=<describeVariable>]
[-e=<environmentName>] [-ep=<environmentProject>]
[-eu=<environmentPurpose>] [-fj=<fatJarFilename>]
[-gck=<serviceAccountKeyFile>] [-gdc=<credentialsFile>]
[-gdt=<tokensFolder>] [-p=<projectName>]
[-pa=<projectMetadataBaseFolder>]
[-pb=<projectDataSetsCsvFolder>] [-pf=<projectConfigFile>]
[-ph=<projectHome>] [-pp=<projectCompany>]
[-pr=<projectParent>] [-ps=<projectDescription>]
[-pt=<projectDepartment>] [-pu=<projectUnitTestsBasePath>]
[-px=<projectEnforceExecutionInHome>]
[-sj=<standardProjectsFolder>]
[-sp=<standardParentProject>] [-sv=<setVariable>]
[-xm=<metadataJsonFilename>] [-cfd=<configDescribeVariables>
[,<configDescribeVariables>...]]...
[-cfv=<configSetVariables>[,<configSetVariables>...]]...
[-eg=<environmentConfigFiles>[,
<environmentConfigFiles>...]]... [-pv=<projectVariables>[,
<projectVariables>...]]...
-aza, --azure-account=<account>
The account to use for the Azure VFS
-azi, --azure-block-increment=<blockIncrement>
The block increment size for new files on Azure,
multiples of 512 only.
-azk, --azure-key=<key>
The key to use for the Azure VFS
-cfd, --config-file-describe-variables=<configDescribeVariables>[,
<configDescribeVariables>...]
A list of variable=description combinations separated by
a comma
-cfg, --config-file=<configFile>
Specify the configuration JSON file to manage
-cfv, --config-file-set-variables=<configSetVariables>[,
<configSetVariables>...]
A list of variable=value combinations separated by a
comma
-dc, --default-projects-folder=<defaultProjectConfigFile>
The standard project configuration filename proposed
when creating projects
-de, --default-environment=<defaultEnvironment>
The name of the default environment to use when none is
specified
-dp, --default-project=<defaultProject>
The name of the default project to use when none is
specified
-dv, --describe-variable=<describeVariable>
Describe a variable, use format VARIABLE=Description
-e, --environment=<environmentName>
The name of the lifecycle environment to manage
-ec, --environment-create
Create a new project lifecycle environment. Also specify
its name, purpose, the project name and the
configuration files.
-ed, --environment-delete
Delete a lifecycle environment
-eg, --environment-config-files=<environmentConfigFiles>[,
<environmentConfigFiles>...]
A list of configuration files for this lifecycle
environment, comma separated
-el, --environments-list
List the defined lifecycle environments
-em, --environment-modify
Modify a lifecycle environment
-ep, --environment-project=<environmentProject>
The project for the environment
-eu, --environment-purpose=<environmentPurpose>
The purpose of the environment: Development, Testing,
Production, CI, ...
-ey, --environment-mandatory
Make it mandatory to reference an environment
-fj, --generate-fat-jar=<fatJarFilename>
Specify the filename of the fat jar to generate from
your current software installation
-gck, --google-cloud-service-account-key-file=<serviceAccountKeyFile>
Configure the path to a Google Cloud service account
JSON key file
-gdc, --google-drive-credentials-file=<credentialsFile>
Configure the path to a Google Drive credentials JSON
file
-gdt, --google-drive-tokens-folder=<tokensFolder>
Configure the path to a Google Drive tokens folder
-h, --help Displays this help message and quits.
-p, --project=<projectName>
The name of the project to manage
-pa, --project-metadata-base=<projectMetadataBaseFolder>
The metadata base folder (relative to home)
-pb, --project-datasets-base=<projectDataSetsCsvFolder>
The data sets CSV folder (relative to home)
-pc, --project-create Create a new project. Also specify the name and its home
-pd, --project-delete Delete a project
-pf, --project-config-file=<projectConfigFile>
The configuration file relative to the home folder. The
default value is project-config.json
-ph, --project-home=<projectHome>
The home directory of the project
-pl, --projects-list   List the defined projects
-pm, --project-modify Modify a project
-pn, --projects-enabled
Enable or disable the projects plugin
-pp, --project-company=<projectCompany>
The company
-pr, --project-parent=<projectParent>
The name of the parent project to inherit metadata and
variables from
-ps, --project-description=<projectDescription>
The description of the project
-pt, --project-department=<projectDepartment>
The department
-pu, --project-unit-tests-base=<projectUnitTestsBasePath>
The unit tests base folder (relative to home)
-pv, --project-variables=<projectVariables>[,<projectVariables>...]
A list of variable=value combinations separated by a
comma
-px, --project-enforce-execution=<projectEnforceExecutionInHome>
Validate before execution that a workflow or pipeline is
located in the project home folder or a sub-folder
(true/false).
-py, --project-mandatory
Make it mandatory to reference a project
-sj, --standard-projects-folder=<standardProjectsFolder>
GUI: The standard projects folder proposed when creating
projects
-sp, --standard-parent-project=<standardParentProject>
The name of the standard project to use as a parent when
creating new projects
-sv, --set-variable=<setVariable>
Set a variable, use format VAR=Value
-xm, --export-metadata=<metadataJsonFilename>
Export project metadata to a single JSON file which you
can specify with this option. Also specify the -p
option.
The available options are listed below:

Short Option | Extended Option | Description
---|---|---
-h | --help | Displays this help message and quits.
-ec | --environment-create | Create a new project lifecycle environment. Also specify its name, purpose, the project name and the configuration files.
-ed | --environment-delete | Delete a lifecycle environment
-el | --environments-list | List the defined lifecycle environments
-em | --environment-modify | Modify a lifecycle environment
-pc | --project-create | Create a new project. Also specify the name and its home
-pd | --project-delete | Delete a project
-pl | --projects-list | List the defined projects
-pm | --project-modify | Modify a project
-dv | --describe-variable=<describeVariable> | Describe a variable, use format VARIABLE=Description
-e | --environment=<environmentName> | The name of the environment to manage
-ep | --environment-project=<environmentProject> | The project for the environment
-eu | --environment-purpose=<environmentPurpose> | The purpose of the environment: Development, Testing, Production, CI, …
-fj | --generate-fat-jar=<fatJarFilename> | Specify the filename of the fat jar to generate from your current software installation
-xm | --export-metadata=<metadataJsonFilename> | Export project metadata to a single JSON file which you can specify with this option. Also specify the -p option to know which metadata to export.
-p | --project=<projectName> | The project name
-pa | --project-metadata-base=<projectMetadataBaseFolder> | The metadata base folder (relative to home)
-pb | --project-datasets-base=<projectDataSetsCsvFolder> | The data sets CSV folder (relative to home)
-pf | --project-config-file=<projectConfigFile> | The configuration file relative to the home folder. The default value is project-config.json
-ph | --project-home=<projectHome> | The home directory of the project
-pp | --project-company=<projectCompany> | The company
-ps | --project-description=<projectDescription> | The description of the project
-pt | --project-department=<projectDepartment> | The department
-pu | --project-unit-tests-base=<projectUnitTestsBasePath> | The unit tests base folder (relative to home)
-px | --project-enforce-execution=<projectEnforceExecutionInHome> | Validate before execution that a workflow or pipeline is located in the project home folder or a sub-folder (true/false)
-sv | --set-variable=<setVariable> | Set a variable, use format VAR=Value
-cfg | --config-file=<configFile> | Specify the configuration JSON file to manage
-cfd | --config-file-describe-variables=<configDescribeVariables>[,<configDescribeVariables>…] | A list of variable=description combinations separated by a comma
-cfv | --config-file-set-variables=<configSetVariables>[,<configSetVariables>…] | A list of variable=value combinations separated by a comma
-eg | --environment-config-files=<environmentConfigFiles>[,<environmentConfigFiles>…] | A list of configuration files for this lifecycle environment, comma separated
-pv | --project-variables=<projectVariables>[,<projectVariables>…] | A list of variable=value combinations separated by a comma
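For example, to export the metadata of a project to a single JSON file, the -p and -xm options can be combined. The project name and output path in this sketch are placeholders:

$ sh hop-conf.sh --project samples --export-metadata /tmp/samples-metadata.json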
Project Usage and Configuration
Configuration on the command line
The hop-conf script offers many options to edit environment definitions.
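For example, a new project can be created before any environments are defined for it. In this sketch the project name and home folder are just examples:

$ sh hop-conf.sh \
  --project-create \
  --project hop2 \
  --project-home /home/user/projects/hop2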
Creating an environment
$ sh hop-conf.sh \
--environment-create \
--environment hop2 \
--environment-project hop2 \
--environment-purpose=Development \
--environment-config-files=/home/user/projects/hop2-conf.json
Creating environment 'hop2'
Environment 'hop2' was created in Hop configuration file <path-to-hop>/config/hop-config.json
2021/02/01 16:37:02 - General - ERROR: Configuration file '/home/user/projects/hop2-conf.json' does not exist to read variables from.
Created empty environment configuration file : /home/user/projects/hop2-conf.json
hop2
Purpose: Development
Configuration files:
Project name: hop2
Config file: /home/user/projects/hop2-conf.json
As you can see from the log, an empty file was created to set variables in:
{ }
Setting variables in an environment
This command adds a variable to the environment configuration file:
$ sh hop-conf.sh --config-file /home/user/projects/hop2-conf.json --config-file-set-variables DB_HOSTNAME=localhost,DB_PASSWORD=abcd
Configuration file '/home/user/projects/hop2-conf.json' was modified.
If we look at the hop2-conf.json file, we'll see that the variables were added:
{
"variables" : [ {
"name" : "DB_HOSTNAME",
"value" : "localhost",
"description" : ""
}, {
"name" : "DB_PASSWORD",
"value" : "abcd",
"description" : ""
} ]
}
Please note that you can add descriptions for the variables as well with the --describe-variable option. Run hop-conf without options to see all the possibilities.
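As a sketch, descriptions could be added to the variables set above with the --config-file-describe-variables option; the description texts below are just examples:

$ sh hop-conf.sh --config-file /home/user/projects/hop2-conf.json --config-file-describe-variables 'DB_HOSTNAME=The database hostname,DB_PASSWORD=The database password'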
Projects Plugin configuration
There are various options to configure the behavior of the Projects plugin itself. In the Hop configuration file hop-config.json we can find the following options:
{
"projectMandatory" : true,
"environmentMandatory" : false,
"defaultProject" : "default",
"defaultEnvironment" : null,
"standardParentProject" : "default",
"standardProjectsFolder" : "/home/matt/test-stuff/"
}
Option | Description | hop-conf option
---|---|---
projectMandatory | This will prevent anyone from using hop-run without specifying a project | -py, --project-mandatory
environmentMandatory | This will prevent anyone from using hop-run without specifying an environment | -ey, --environment-mandatory
defaultProject | The default project to use when none is specified | -dp, --default-project
defaultEnvironment | The default environment to use when none is specified | -de, --default-environment
standardParentProject | The standard parent project to propose when creating new projects | -sp, --standard-parent-project
standardProjectsFolder | The folder to which you'll browse by default in the GUI when creating new projects | -sj, --standard-projects-folder
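These options can also be set from the command line instead of editing hop-config.json by hand. As a sketch (the project, environment and folder values below are assumptions):

$ sh hop-conf.sh \
  --default-project default \
  --default-environment hop2 \
  --standard-parent-project default \
  --standard-projects-folder /home/user/projects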
Running Projects and Pipelines
You can specify an environment or a project when executing a pipeline or a workflow. By doing so you automatically configure metadata and variables without too much fuss.
The easiest example is shown by executing the "complex" pipeline from the Apache Beam examples:
$ sh hop-run.sh --project samples --file 'beam/pipelines/complex.hpl' --runconfig Direct
2021/02/01 16:52:15 - HopRun - Enabling project 'samples'
2021/02/01 16:52:25 - HopRun - Relative path filename specified: config/projects/samples/beam/pipelines/complex.hpl
2021/02/01 16:52:26 - General - Created Apache Beam pipeline with name 'complex'
2021/02/01 16:52:27 - General - Handled transform (INPUT) : Customer data
2021/02/01 16:52:27 - General - Handled transform (INPUT) : State data
2021/02/01 16:52:27 - General - Handled Group By (STEP) : countPerState, gets data from 1 previous transform(s)
2021/02/01 16:52:27 - General - Handled transform (STEP) : uppercase state, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Handled Merge Join (STEP) : Merge join
2021/02/01 16:52:27 - General - Handled transform (STEP) : Lookup count per state, gets data from 1 previous transform(s), targets=0, infos=1
2021/02/01 16:52:27 - General - Handled transform (STEP) : name<n, gets data from 1 previous transform(s), targets=2, infos=0
2021/02/01 16:52:27 - General - Transform Label: N-Z reading from previous transform targeting this one using : name<n - TARGET - Label: N-Z
2021/02/01 16:52:27 - General - Handled transform (STEP) : Label: N-Z, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Transform Label: A-M reading from previous transform targeting this one using : name<n - TARGET - Label: A-M
2021/02/01 16:52:27 - General - Handled transform (STEP) : Label: A-M, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Handled transform (STEP) : Switch / case, gets data from 2 previous transform(s), targets=4, infos=0
2021/02/01 16:52:27 - General - Transform CA reading from previous transform targeting this one using : Switch / case - TARGET - CA
2021/02/01 16:52:27 - General - Handled transform (STEP) : CA, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Transform NY reading from previous transform targeting this one using : Switch / case - TARGET - NY
2021/02/01 16:52:27 - General - Handled transform (STEP) : NY, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Transform FL reading from previous transform targeting this one using : Switch / case - TARGET - FL
2021/02/01 16:52:27 - General - Handled transform (STEP) : FL, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Transform Default reading from previous transform targeting this one using : Switch / case - TARGET - Default
2021/02/01 16:52:27 - General - Handled transform (STEP) : Default, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Handled transform (STEP) : Collect, gets data from 4 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Handled transform (OUTPUT) : complex, gets data from Collect
2021/02/01 16:52:27 - General - Executing this pipeline using the Beam Pipeline Engine with run configuration 'Direct'
2021/02/01 16:52:34 - General - Beam pipeline execution has finished.
To execute an Apache Beam pipeline, a lot of information and metadata is needed. Let's dive into a few interesting tidbits:
- By referencing the samples project, Hop knows where the project is located (config/projects/samples).
- Since we know the location of the project, we can specify pipelines and workflows with a relative path.
- The project knows where its metadata is stored (config/projects/samples/metadata), so it knows where to find the Direct pipeline run configuration (config/projects/samples/metadata/pipeline-run-configuration/Direct.json).
- This run configuration defines its own pipeline engine specific variables, in this case the output folder: DATA_OUTPUT=${PROJECT_HOME}/beam/output/
- The output of the samples is as such written to config/projects/samples/beam/output.
To reference an environment you can execute using -e or --environment. The only difference is that you'll have a number of extra environment variables set while executing.
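For example, assuming the hop2 environment created earlier and a pipeline stored in its project (the file path and run configuration name below are placeholders), the command would look like:

$ sh hop-run.sh --environment hop2 --file 'pipelines/my-pipeline.hpl' --runconfig local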
Cloud Storage Configuration
Hop Conf can be used to configure your AWS, Azure and Google Cloud (Cloud Storage and Drive) accounts with Hop through VFS.
Azure
Set the account, the block increment size for new files and your Azure key:
-aza, --azure-account=<account>
The account to use for the Azure VFS
-azi, --azure-block-increment=<blockIncrement>
The block increment size for new files on Azure,
multiples of 512 only.
-azk, --azure-key=<key>
The key to use for the Azure VFS
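As a sketch, all three options can be passed in one call; the account name, key and block increment below are placeholders:

$ sh hop-conf.sh \
  --azure-account mystorageaccount \
  --azure-key '<your-azure-key>' \
  --azure-block-increment 4096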
Google Cloud Storage
Set the path to your Google Cloud service account JSON key file:
-gck, --google-cloud-service-account-key-file=<serviceAccountKeyFile>
Configure the path to a Google Cloud service account JSON key file
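For example (the path to the key file is a placeholder):

$ sh hop-conf.sh --google-cloud-service-account-key-file /home/user/keys/my-service-account.json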
Google Drive
Set the path to your Google Drive credentials JSON file or Google Drive tokens folder.
-gdc, --google-drive-credentials-file=<credentialsFile>
Configure the path to a Google Drive credentials JSON
file
-gdt, --google-drive-tokens-folder=<tokensFolder>
Configure the path to a Google Drive tokens folder
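For example (both paths below are placeholders):

$ sh hop-conf.sh \
  --google-drive-credentials-file /home/user/google-drive/credentials.json \
  --google-drive-tokens-folder /home/user/google-drive/tokens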