Script transform

Description

The Script transform allows you to write code in any language supported by the JSR-223 standard.

By default, the Hop project ships with the following language support:

The ECMAScript (JavaScript) engine that used to be bundled with the JVM is no longer part of Java 17. You can still add it in Apache Hop 2.10 and later, as documented in the Adding scripting languages section.

Supported Engines

  • Hop Engine : Supported

  • Spark : Maybe Supported

  • Flink : Maybe Supported

  • Dataflow : Maybe Supported

Options

Option Description

Transform name

Name of the transform. This name has to be unique within a single pipeline.

Script engine

You can choose any of the discovered JSR-223 scripting engines from the drop-down list.

Script

You can add one or more scripts. There are different types of scripts which you can change by right-clicking on the script tab.

  • Transform : The script will be executed after every row that is read.

  • Start : The script will be executed once at the start of the transform.

  • End : The script will be executed after the last row was read.

Fields

Here you can specify the fields to retrieve from the context of the transform script after every row.
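To picture how the Start, Transform and End script types fit together, here is a minimal sketch that mimics their execution order outside Hop. The run_transform helper, the context dictionary and the sample rows are hypothetical stand-ins; inside Hop the transform itself calls each script and keeps the variables between calls.

```python
# Hypothetical model of the script lifecycle; this is not Hop code.
def run_transform(rows, start_script, transform_script, end_script):
    context = {}                          # variables shared between the scripts
    start_script(context)                 # Start: once, before the first row
    for row in rows:
        transform_script(context, row)    # Transform: once per input row
    end_script(context)                   # End: once, after the last row
    return context


def start(ctx):
    ctx["count"] = 0
    ctx["log"] = ["start"]


def transform(ctx, row):
    ctx["count"] += 1
    ctx["log"].append("row %s" % row)


def end(ctx):
    ctx["log"].append("end after %d rows" % ctx["count"])


result = run_transform(["a", "b"], start, transform, end)
print(result["log"])
```

A variable initialised in a Start script, such as count here, is the natural place for counters or running totals that the Transform script updates per row.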

Variable bindings

To allow you to work with the input field values or the surrounding Java ecosystem, Hop exposes a number of variable bindings.

Variable Description

input

All the input fields under their own name. Please note that certain scripting languages have restrictions on the allowed names of variables. Make sure to rename inappropriate field names with a "Select Values" transform.

transform

This is a reference to the parent transform class Script. You can use this to log certain important events or write extra output rows.

pipeline_status

This special variable can be set to any of the following values:

  • CONTINUE_PIPELINE : Process the current row and generate output (the default value)

  • SKIP_PIPELINE : Do not write an output row

  • STOP_PIPELINE : Cause the pipeline to stop processing rows

  • ERROR_PIPELINE : Cause the pipeline to abort with an error

rowNumber

This is the current row number, starting from 1.

rowMeta

The input row metadata. It’s a list of value metadata.

row

An object array containing the current set of field values. Address these values through rowMeta so that the appropriate data conversions take place.

For example, use rowMeta.getString(row, 0) to get the first String value from the input row.

previousRow

The previous row or null if row is the first input row.

transformName

The name of the transform.

pipelineName

The name of the pipeline.
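As an illustration of how several of these bindings can work together, the sketch below skips every row whose first field repeats the previous row's value. It is written to run outside Hop, so FakeRowMeta, the string values of the status constants and the simulate loop are hypothetical stand-ins. Inside Hop you would assign the result to pipeline_status instead of returning it.

```python
# Stand-ins so the sketch runs outside Hop; inside Hop these names are bound for you.
CONTINUE_PIPELINE = "CONTINUE_PIPELINE"   # placeholder value, assumed
SKIP_PIPELINE = "SKIP_PIPELINE"           # placeholder value, assumed


class FakeRowMeta(object):
    """Hypothetical substitute for the rowMeta binding."""

    def getString(self, row, index):
        value = row[index]
        return None if value is None else str(value)


def transform_script(rowMeta, row, previousRow):
    """Body of a Transform script: skip a row whose first field repeats."""
    current = rowMeta.getString(row, 0)
    if previousRow is not None and current == rowMeta.getString(previousRow, 0):
        return SKIP_PIPELINE        # same key as the previous row: no output
    return CONTINUE_PIPELINE        # first row or new key: write the row


def simulate(rows):
    """Rough model of the per-row loop that surrounds the script."""
    rowMeta = FakeRowMeta()
    written = []
    previousRow = None              # null for the first input row
    for rowNumber, row in enumerate(rows, 1):   # rowNumber starts at 1
        status = transform_script(rowMeta, row, previousRow)
        if status == CONTINUE_PIPELINE:
            written.append((rowNumber, row))
        previousRow = row
    return written


written = simulate([["a", 1], ["a", 2], ["b", 3]])
print(written)
```

Reading the key through rowMeta.getString rather than from row directly keeps the comparison on converted String values, in line with the advice given for the row binding.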

Generating rows

Below are scripts that generate 10 output rows using a simple loop, written for 3 different scripting engines. All 3 scripts produce identical output. For the 3 examples you need to define 2 output fields:

  • id : an Integer

  • name : a String

ECMAScript

// Java classes used by the script
var Long = Packages.java.lang.Long;
var RowDataUtil = Packages.org.apache.hop.core.row.RowDataUtil;

var START=100000;
var COUNT=10;
var END=START+COUNT;

// Write one output row per id
for (var id=START;id<END;id++) {
  var outputRow = RowDataUtil.allocateRowData(rowMeta.size());
  outputRow[0] = new Long(id);
  outputRow[1] = "Apache Hop "+id;
  transform.putRow(outputRowMeta, outputRow);
}

// Do not write an output row for the input row itself
pipeline_status=SKIP_PIPELINE;

Groovy

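import org.apache.hop.core.row.RowDataUtil

def COUNT=10;

// First id to write
id = 100000L;
(1..COUNT).each {
 outputRow = RowDataUtil.allocateRowData(rowMeta.size());
 outputRow[0] = id;
 outputRow[1] = "Apache Hop "+id
 transform.putRow(outputRowMeta, outputRow);

 id++;
}

// Do not write an output row for the input row itself
pipeline_status=SKIP_PIPELINE;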

Python

import java.lang.Long as Long
from org.apache.hop.core.row import RowDataUtil

START=100000
COUNT=10
END=START+COUNT

# Write one output row per id
for id in range(START,END):
    outputRow = RowDataUtil.allocateRowData(rowMeta.size())
    outputRow[0] = Long(id)
    outputRow[1] = "Apache Hop "+str(id)
    transform.putRow(outputRowMeta, outputRow)

# Do not write an output row for the input row itself
pipeline_status=SKIP_PIPELINE

Aggregating rows

Below are scripts that aggregate rows over different groups. The data is sorted by the field group and contains a value field which is summed up into field sum. In the start scripts we define variables sum=0 and previousGroup="".

For the 3 examples you need to define 1 output field:

  • sum : an Integer

ECMAScript

if (group!==previousGroup) {
  sum=value;
  previousGroup=group;
} else {
  sum+=value;
}

if (nextGroup==null) {
  pipeline_status=CONTINUE_PIPELINE;
} else {
  pipeline_status=SKIP_PIPELINE;
}

Groovy

if (!group.equalsIgnoreCase(previousGroup)) {
  sum=value;
  previousGroup=group;
} else {
  sum+=value;
}

if (nextGroup==null) {
  pipeline_status=CONTINUE_PIPELINE;
} else {
  pipeline_status=SKIP_PIPELINE;
}

Python

if group!=previousGroup:
  sum=value
  previousGroup=group
else:
  sum=sum+value

if nextGroup is None:
  pipeline_status=CONTINUE_PIPELINE
else:
  pipeline_status=SKIP_PIPELINE
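To check what the sum and previousGroup logic in the scripts above produces, the following self-contained sketch applies it to a small list of sample rows outside Hop. The running_sums function and the sample data are hypothetical, and the sketch covers only the running-sum part, not the pipeline_status handling.

```python
def running_sums(rows):
    """Apply the sum/previousGroup logic to (group, value) pairs sorted by group."""
    total = 0                   # Start script: sum=0
    previous_group = ""         # Start script: previousGroup=""
    result = []
    for group, value in rows:
        if group != previous_group:
            total = value       # new group: restart the sum
            previous_group = group
        else:
            total += value      # same group: keep adding
        result.append((group, total))
    return result


sums = running_sums([("a", 1), ("a", 2), ("b", 5), ("b", 1), ("b", 4)])
print(sums)
```

The last entry for each group carries that group's total, which is why the input has to arrive sorted by group.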

Adding scripting languages

You can add further scripting languages by adding the required libraries to the plugins/transforms/script/lib folder.

For example, to add support for the Ruby scripting language you need to add the following JRuby libraries:

  • jruby-stdlib-<version>.jar

  • jruby-core-<version>.jar

To add JavaScript support, download the Nashorn Core jar:

  • nashorn-core-15.4.jar

After restarting the Apache Hop GUI, you’ll see a Ruby or JavaScript entry in the Script engine drop-down list.