Google Dataflow Pipeline (Template)

Apache Hop pipelines can be scheduled and triggered in various ways. In this section we will walk through the steps needed to schedule a pipeline on Google Dataflow using Dataflow Templates. Apache Hop uses a flex template to launch a job on Google Dataflow.

Preparing your environment

Before we can add a new pipeline in the Google Cloud Platform console, we need to create a Google Cloud Storage bucket that contains three types of files.

Hop pipelines

The pipelines you created using the Hop Gui and wish to schedule in Google Dataflow.

Tip

You can also create a Hop project using a Google Storage bucket. This way you can create and edit Hop pipelines directly in Google Storage.
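The bucket creation and pipeline upload can be done with the `gsutil` CLI. A minimal sketch, assuming a bucket named `my-hop-bucket` and a region of `europe-west1` (both placeholders, substitute your own):

```shell
# Create a bucket to hold the pipelines, metadata and template file
# (bucket name and region are placeholders)
gsutil mb -l europe-west1 gs://my-hop-bucket

# Upload the pipelines you created with the Hop GUI
gsutil cp pipelines/*.hpl gs://my-hop-bucket/pipelines/
```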

Hop Metadata

For the pipeline to be able to use Hop metadata objects and other run configurations, we need to generate a Hop metadata.json file. This file can be generated from the GUI under Tools → Export metadata to JSON, or using the export-metadata function of the Hop conf tool.
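From the command line this might look like the sketch below. The exact hop-conf flag spelling is an assumption and may differ between Hop versions, so check `hop-conf.sh --help`; the bucket name is a placeholder:

```shell
# Export all Hop metadata to a single JSON file
# (flag name is assumed -- verify with hop-conf.sh --help for your version)
sh hop-conf.sh --export-metadata metadata.json

# Upload the exported metadata next to the pipelines
gsutil cp metadata.json gs://my-hop-bucket/metadata/
```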

Beam Flex template metadata file

The final part to get everything working is a metadata file used by Dataflow to stitch all the parts together.

{
    "defaultEnvironment": {},
    "image": "apache/hop-dataflow-template:latest",
    "metadata": {
        "description": "This template allows you to start Hop pipelines on dataflow",
        "name": "Template to start a hop pipeline",
        "parameters": [
            {
                "helpText": "Google storage location pointing to the Hop metadata file",
                "label": "Hop Metadata Location",
                "name": "hopMetadataLocation",
                "regexes": [
                    ".*"
                ]
            },
            {
                "helpText": "Google storage location pointing to the pipeline you wish to start",
                "label": "Hop Pipeline Location",
                "name": "hopPipelineLocation",
                "regexes": [
                    ".*"
                ]
            }
        ]
    },
    "sdkInfo": {
        "language": "JAVA"
    }
}
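Before uploading, it can be useful to sanity-check that the file parses and declares the two parameters the Dataflow console will prompt for. A small sketch (in practice you would read the JSON from your local template file rather than an inline string):

```python
import json

# The flex template metadata from above, abridged to the fields checked here;
# in practice, load this from your template file with open(...) instead.
metadata = json.loads("""
{
    "image": "apache/hop-dataflow-template:latest",
    "metadata": {
        "name": "Template to start a hop pipeline",
        "parameters": [
            {"name": "hopMetadataLocation", "label": "Hop Metadata Location", "regexes": [".*"]},
            {"name": "hopPipelineLocation", "label": "Hop Pipeline Location", "regexes": [".*"]}
        ]
    },
    "sdkInfo": {"language": "JAVA"}
}
""")

# Check the fields the Dataflow console relies on before uploading.
param_names = [p["name"] for p in metadata["metadata"]["parameters"]]
assert "hopMetadataLocation" in param_names
assert "hopPipelineLocation" in param_names
assert metadata["sdkInfo"]["language"] == "JAVA"
print("template metadata looks valid:", param_names)
```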
Important

You can change the Docker image used in the metadata file.

Creating a Dataflow pipeline

Now we can go back to the Google Cloud console and click "Create data pipeline".

(Image: the Beam Dataflow template selection in the console)

When you select the Beam Flex template metadata file, the required parameters will show up. You can then add the paths to the Hop metadata and Hop pipeline stored in Cloud Storage.
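The same launch can be done from the command line with `gcloud`. A sketch, assuming the bucket layout and file names used earlier in this section (all placeholders):

```shell
# Launch a Dataflow job from the flex template
# (bucket, region and file names are placeholders)
gcloud dataflow flex-template run "hop-pipeline-$(date +%Y%m%d-%H%M%S)" \
  --template-file-gcs-location gs://my-hop-bucket/template/template.json \
  --region europe-west1 \
  --parameters hopMetadataLocation=gs://my-hop-bucket/metadata/metadata.json \
  --parameters hopPipelineLocation=gs://my-hop-bucket/pipelines/my-pipeline.hpl
```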