The Hop SDK

Initializing

Before you use the Hop API you need to initialize it. Initialization loads the configuration details, searches for and loads plugins, and so on.

You can do this with:

HopEnvironment.init();

Plugin folders

If you need to load plugins from different folders, set the system property HOP_PLUGIN_BASE_FOLDERS. The default value is plugins, the relative path to the plugins folder inside the installation folder. To add your own plugin folders, list them in this property with commas (,) separating the values.
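As a minimal sketch, the property can be set from Java with the standard System.setProperty call. The class name PluginFolders, the helper join() and the folder /opt/my-hop-plugins are hypothetical and only serve as illustration. The sketch assumes the property is set before HopEnvironment.init() is called, since that is when plugins are searched for and loaded.

```java
import java.util.List;

public class PluginFolders {

  /** Joins folder paths into the comma-separated form expected by the property. */
  public static String join(List<String> folders) {
    return String.join(",", folders);
  }

  public static void main(String[] args) {
    // "plugins" is the documented default value.
    // "/opt/my-hop-plugins" is a hypothetical extra folder, used only as an example.
    String value = join(List.of("plugins", "/opt/my-hop-plugins"));

    // Assumption: set this before calling HopEnvironment.init().
    System.setProperty("HOP_PLUGIN_BASE_FOLDERS", value);

    System.out.println(System.getProperty("HOP_PLUGIN_BASE_FOLDERS"));
  }
}
```

The same value can also be passed on the command line with the JVM option -DHOP_PLUGIN_BASE_FOLDERS=..., which avoids any dependency on the order of calls in your code.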

Hop Metadata Providers

Shared metadata in Hop is handled by a HopMetadataProvider. When you use the Hop GUI, this metadata is stored in the metadata/ folder of a project. You can point to such a folder with the class JsonMetadataProvider. You can also serialize a metadata collection into a single JSON string using SerializableMetadataProvider, which is useful for sending metadata to remote servers and locations.

Once you have a provider, you can ask it for a serializer, which lets you add and retrieve all sorts of metadata objects.

Variables

If you want to work with variables in your pipelines and workflows it makes sense to create a top level IVariables object. The easiest way to do this is with:

IVariables variables = Variables.getADefaultVariableSpace();

This method also takes into account the variables that are configured in hop-config.json (if that file can be found).

Pipelines

Loading pipeline metadata from a file

You can get the pipeline metadata from a .hpl XML file using:

PipelineMeta pipelineMeta = new PipelineMeta(
  "path-to-your-filename.hpl",   // The filename
  metadataProvider,             // See above
  true,                        // set internal variables
  variables                   // see above
);

Loading pipeline metadata from an input stream

PipelineMeta pipelineMeta = new PipelineMeta(
  inputStream,         // The stream to load from
  metadataProvider,   // See above
  true,              // set internal variables
  variables         // see above
);

Construct pipeline metadata with the Hop API

You can also start with an empty pipeline and add the transforms and hops you want:

PipelineMeta pipelineMeta = new PipelineMeta();

// Generate 1M empty rows
//
RowGeneratorMeta rowGeneratorMeta = new RowGeneratorMeta();
rowGeneratorMeta.setRowLimit("1000000");

TransformMeta rowGenerator = new TransformMeta("1M", rowGeneratorMeta);
rowGenerator.setLocation(50, 50);
pipelineMeta.addTransform(rowGenerator);

// Just a dummy placeholder for testing
//
DummyMeta dummyMeta = new DummyMeta();
TransformMeta dummy = new TransformMeta("Output", dummyMeta);
dummy.setLocation(250, 50);
pipelineMeta.addTransform(dummy);

// Add a hop between both
//
PipelineHopMeta generatorDummyHop = new PipelineHopMeta(rowGenerator, dummy);
pipelineMeta.addPipelineHop(generatorDummyHop);

Pipeline execution

The way a pipeline is executed depends on the run configuration you specify. To make it easy to get the engine we created a factory for you:

IPipelineEngine pipelineEngine = PipelineEngineFactory.createPipelineEngine(
  variables,           // see above
  "local",            // The name of the run configuration defined in the metadata
  metadataProvider,  // The metadata provider to resolve the run configuration details
  pipelineMeta      // The pipeline metadata
);

// We can now simply execute this engine...
//
pipelineEngine.execute();

// This execution runs in the background but we can wait for it to finish:
//
pipelineEngine.waitUntilFinished();

// When it's done we can evaluate the results:
//
Result result = pipelineEngine.getResult();

Injecting data into a pipeline

You can only inject data into a LocalPipelineEngine. Do so using the addRowProducer() method. Call this method after you run prepareExecution() so that the row producer can be attached to the correct transform copy. After starting the execution of the pipeline, use the RowProducer to put rows into the pipeline with putRow(). Make sure to call setFinished() when you're done feeding rows into the pipeline.

Retrieving rows from a pipeline

Again, this is only supported by the local pipeline engine, LocalPipelineEngine. After prepareExecution() you can add row listeners to the various transforms:

ITransform transform = localPipeline.getTransform("transform-name", 0);
transform.addRowListener(new RowAdapter() {
  @Override
  public void rowWrittenEvent( IRowMeta rowMeta, Object[] row ) throws HopTransformException {
    // A row was written during execution
  }
});

Workflows

Loading workflow metadata from a file

You can get the workflow metadata from a .hwf XML file using:

WorkflowMeta workflowMeta = new WorkflowMeta(
  variables,                     // see above
  "path-to-your-filename.hwf",  // The filename
  metadataProvider             // See above
);

Loading workflow metadata from an input stream

WorkflowMeta workflowMeta = new WorkflowMeta(
  inputStream,                   // The input stream to read the metadata from
  metadataProvider,             // See above
  variables                    // see above
);

Construct workflow metadata with the Hop API

You typically start with an empty workflow and then add the actions and hops you want:

WorkflowMeta workflowMeta = new WorkflowMeta();

// Add the Start action
//
ActionStart actionStart = new ActionStart("Start");
ActionMeta startMeta = new ActionMeta(actionStart);
startMeta.setLocation(50, 50);
workflowMeta.addAction(startMeta);

// Just a dummy placeholder for testing
//
ActionDummy actionDummy = new ActionDummy("Dummy");
ActionMeta dummyMeta = new ActionMeta(actionDummy);
dummyMeta.setLocation(250, 50);
workflowMeta.addAction(dummyMeta);

// Add a hop between both
//
WorkflowHopMeta startDummyHop = new WorkflowHopMeta(startMeta, dummyMeta);
workflowMeta.addWorkflowHop(startDummyHop);

Workflow execution

Workflow engines are also plugins. A Workflow Run Configuration specifies which plugin is used to execute your workflow metadata.

To make it easy to get the engine we created a factory for you:

IWorkflowEngine workflowEngine = WorkflowEngineFactory.createWorkflowEngine(
  variables,           // see above
  "local",            // The name of the run configuration defined in the metadata
  metadataProvider,  // The metadata provider to resolve the run configuration details
  workflowMeta,     // The workflow metadata
  parentLogging    // The parent logging object
);

// We can now execute this engine...
// This execution does not run in the background.
// When you get the result, the execution has completed.
//
Result result = workflowEngine.startExecution();