Concepts
Tools
- Hop Conf
Hop Conf is a command line tool to manage various aspects of your Hop configuration: projects, environments, cloud configuration, and more.
- Hop Encrypt
Hop Encrypt is a command line tool that obfuscates or encrypts a plain text password for use in XML, password, or metadata files. Make sure to also copy the password encryption prefix, which indicates the obfuscated nature of the password; Hop uses it to distinguish regular plain text passwords from obfuscated ones.
- Hop Gui
Hop Gui is the visual IDE where Hop data developers create, test, run and manage the life cycle of workflows and pipelines. In addition to functionality for development and life cycle management, Hop Gui contains tools and perspectives to manage projects and environments, to search and manage metadata, to manage and version control a large variety of files, and to explore logging in a Neo4j graph.
- Hop Run
Hop Run is a command line tool to run workflows and pipelines, with options to (list or) specify projects, environments, properties and run configurations.
- Hop Search
Hop Search is a command line tool to search all metadata available in a specific project or environment.
- Hop Server
Hop Server is a web service interface to manage and run workflows and pipelines.
Item types
- Action
An Action is one operation performed in a Workflow. Actions are executed sequentially by default, with parallel execution as a configuration option. An Action returns a true or false exit code, which can be used (or ignored) in the Workflow’s execution.
- Hop
A Hop links Actions in a Workflow or Transforms in a Pipeline. In Workflows, Hops operate based on the exit status of previous Actions; in Pipelines, Hops pass data between Transforms.
- Pipeline
Pipelines are the actual data workers. Operations in a Pipeline read, modify, enrich, clean and write data. Orchestration of Pipelines is done through other Pipelines and/or Workflows.
- Transform
A Transform is a unit of work performed in a Pipeline. Typical Transform operations are reading data from files or databases, performing lookups or joins, enriching, cleaning data and more. All Transforms in a Pipeline are executed in parallel. Transforms process data and move batches of processed data over Hops for processing by subsequent Transforms.
- Workflow
A Workflow is a sequence of operations that are performed sequentially by default (with optional parallel execution). Workflows usually do not operate on the data directly, but perform orchestration tasks. Typical tasks in a Workflow consist of retrieving and archiving data, sending emails, error handling, etc.
Projects and Environments
- Project
Hop Projects are a conceptual grouping of configurations, variables, metadata objects and workflows and pipelines. Projects can inherit metadata from parent projects. A project contains one or more environments where the actual configuration is defined.
Example: a 'Sales' project contains a 'customers' database connection and a number of workflows and pipelines. The runtime configurations, database connection properties, etc. are defined in the 'dev', 'uat' and 'prd' environments.
- Environment
Hop Environments are instances of projects that hold the actual runtime configurations and other metadata objects for a project.
Example: the 'dev' environment for the 'Sales' project specifies to read from host '10.0.0.1' for the 'customers' database connection
Metadata
Hop Metadata is the central storage repository for shared metadata like relational database connections, run configurations, servers, git repositories and so on. Metadata is persisted as JSON and is stored by default in a project’s base folder.
Data types
As a best practice for producing consistent, predictable outcomes when working with your data in Apache Hop, consider how the Apache Hop engine processes different data types and field metadata in pipelines and workflows.
| As a rule, data is never modified by metadata inside of Apache Hop. Data is only modified when Apache Hop writes to files or similar objects, but not to databases. |
Apache Hop data types map internally to Java data types, so the Java behavior of these data types applies to the associated fields, parameters, and variables used in your workflows and pipelines. The following table describes these mappings.
| Apache Hop | Java data type | Description |
|---|---|---|
| BigNumber | BigDecimal | An arbitrary unlimited precision number. |
| Binary | Byte[] | An array of bytes that contain any type of binary data. |
| Boolean | Boolean | A boolean value true or false. |
| Date | Date | A date-time value with millisecond precision. |
| Integer | Long | A signed long 64-bit integer. |
| Internet Address | InetAddress | An Internet Protocol (IP) address. |
| Number | Double | A double precision floating point value. |
| String | String | A variable unlimited length text encoded in UTF-8 (Unicode). |
| Timestamp | Timestamp | Allows the specification of fractional seconds to a precision of nanoseconds. |
| Apache Hop also comes with a number of additional complex data types (e.g. Avro, JSON, Graph) that have no one-to-one mapping to Java data types. These data types only work with specific transforms and can’t be used in general-purpose transforms. |
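To make one of these mappings concrete, here is a minimal Python sketch (illustration only, not Hop code) of what "a signed long 64-bit integer" means for Hop's Integer type:

```python
import struct

# Hop's Integer type maps to a Java long: a signed, 64-bit value.
MAX_LONG = 2**63 - 1   # largest value a Hop Integer can hold
MIN_LONG = -2**63      # smallest value

packed = struct.pack(">q", MAX_LONG)  # ">q" = big-endian signed 64-bit integer
print(MAX_LONG)     # 9223372036854775807
print(len(packed))  # 8 (bytes)
```

Values outside this range cannot be represented by the Integer type; BigNumber (BigDecimal) has no such limit.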
Conversions and comparisons
Nulls and sort order
- Null handling and sort behavior follow Java comparison semantics for the mapped type, unless noted below.
String parsing/formatting
- When converting from a String to another type, Hop uses the format settings defined in the transform metadata (for example, date masks, decimal symbols, and grouping characters).
- When converting from another type to a String, Hop applies the same metadata-based format.
For these reasons, it is important to be explicit about formats when working with String types. A common issue arises with decimal separators, which may differ between environments: an EU locale typically uses a comma (',') as the decimal symbol, while a US locale uses a period ('.'). Using an explicit format avoids these inconsistencies.
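As a plain-Python illustration (not Hop code) of why explicit symbols matter, the same text parses to different numbers under the two conventions; parse_decimal is a hypothetical helper standing in for an explicit format mask:

```python
def parse_decimal(text: str, decimal_symbol: str, grouping_symbol: str) -> float:
    """Parse a number string using explicit decimal/grouping symbols,
    mimicking what an explicit format mask makes unambiguous."""
    normalized = text.replace(grouping_symbol, "").replace(decimal_symbol, ".")
    return float(normalized)

value = "1.234"
us = parse_decimal(value, decimal_symbol=".", grouping_symbol=",")  # 1.234
eu = parse_decimal(value, decimal_symbol=",", grouping_symbol=".")  # 1234.0
print(us, eu)
```

Without an explicit format, the interpretation of "1.234" depends entirely on the environment's locale.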
Timestamp
A Timestamp value can be created from a Date, String, Number, or another Timestamp.
Conversions
- Date → Timestamp: represents the same instant; fractional seconds default to 0.
- String → Timestamp: parsed using the specified date mask (for example, yyyy-MM-dd HH:mm:ss).
- Number → Timestamp: interpreted as nanoseconds since the epoch.
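The Number → Timestamp rule can be sketched in plain Python (illustration only, not Hop code): split a nanosecond epoch value into whole seconds and the sub-second nanosecond remainder:

```python
from datetime import datetime, timezone

# A Number interpreted as nanoseconds since the epoch.
nanos = 1_700_000_000_123_456_789

seconds, frac_nanos = divmod(nanos, 1_000_000_000)
ts = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(ts.isoformat())  # 2023-11-14T22:13:20+00:00
print(frac_nanos)      # 123456789 nanoseconds of sub-second precision
```

Note that a plain Date only keeps millisecond precision, so converting Timestamp → Date discards the finer fraction.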
UUID
UUID values map to the Java UUID object, which stores them as a 16-byte value. Storing a UUID with the String type uses 32 bytes. The difference in storage is noticeable in many situations, such as inserting into a database with native UUID support.
When using MongoDB, Hop writes UUIDs using the STANDARD representation, which corresponds to BSON Binary subtype 4.
Sorting
Databases may order UUID columns by their raw binary value (as is the case in PostgreSQL). Hop, however, compares UUIDs by their two 64-bit halves as signed longs: first the most significant bits (MSB), then the least significant bits (LSB). These ordering methods can differ. To ensure consistent results, perform all sorting either within Hop or entirely in the database.
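The divergence between the two orderings can be demonstrated in plain Python (illustration only; hop_style_compare is a hypothetical re-implementation of the signed-long comparison described above, matching the behavior of java.util.UUID.compareTo):

```python
import uuid

def hop_style_compare(a: uuid.UUID, b: uuid.UUID) -> int:
    """Compare two UUIDs by their 64-bit halves as *signed* longs:
    most significant bits first, then least significant bits."""
    def halves(u: uuid.UUID):
        raw = u.bytes
        msb = int.from_bytes(raw[:8], "big", signed=True)
        lsb = int.from_bytes(raw[8:], "big", signed=True)
        return msb, lsb
    ha, hb = halves(a), halves(b)
    return (ha > hb) - (ha < hb)

a = uuid.UUID("80000000-0000-0000-0000-000000000000")  # MSB is negative as a signed long
b = uuid.UUID("7fffffff-0000-0000-0000-000000000000")  # MSB is positive

print(hop_style_compare(a, b))  # -1: a sorts before b under signed comparison
print(a.bytes > b.bytes)        # True: a sorts after b under raw binary comparison
```

Any UUID whose first byte is 0x80 or higher flips sign under the signed comparison, which is exactly where the two orderings disagree.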
Gotchas
Writing a UUID to a database without native UUID support (e.g. MySQL) will fail. Convert it to a String before writing.
JSON
The JSON type in Hop follows the official JSON standard: https://www.json.org/json-en.html. It is treated as an unordered set of key/value pairs. This means that JSON values may behave differently from their String representations. For example, consider these two JSON objects:
{ "a": 1, "b": 2 }
{ "b": 2, "a": 1 }
When treated as Strings, they are different. When treated as JSON, they are considered equal. Because JSON comparison in Hop performs structural checks, it is generally more robust, but also slightly slower, than string comparison.
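A quick plain-Python illustration (not Hop code) of the difference; Python dictionaries compare structurally, much like Hop's JSON type:

```python
import json

left = '{ "a": 1, "b": 2 }'
right = '{ "b": 2, "a": 1 }'

print(left == right)                          # False: different as strings
print(json.loads(left) == json.loads(right))  # True: equal as JSON objects
```

The string comparison sees different character sequences, while the structural comparison sees the same unordered set of key/value pairs.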
Sorting
The way Hop sorts JSON values resembles how PostgreSQL sorts JSONB values. Keys are sorted alphabetically, and values are compared in the following type order:
NULL < MISSING < BINARY < STRING < NUMBER < BOOLEAN < ARRAY < OBJECT
Arrays are compared element by element; if all compared elements are equal, the longer array is considered greater.
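The array rule can be sketched as follows (illustration only; compare_arrays is a hypothetical helper, not Hop's implementation, and assumes same-type scalar elements for simplicity):

```python
def compare_arrays(a: list, b: list) -> int:
    """Compare element by element; if all shared positions are equal,
    the longer array is considered greater."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return (len(a) > len(b)) - (len(a) < len(b))

print(compare_arrays([1, 2], [1, 2, 3]))  # -1: the shorter array sorts first
print(compare_arrays([1, 3], [1, 2, 3]))  # 1: decided at the second element
```

The first mismatching element decides the order; length only matters when one array is a prefix of the other.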
MongoDB
The JSON type supports all standard JSON values when reading or writing to MongoDB. Values that are part of MongoDB’s extended JSON (for example, Date objects) are not supported. To fully support MongoDB-specific BSON values, use the String type instead.
Various
The following is an alphabetically ordered list of concepts that are used throughout Hop and are mentioned at various locations in the Hop tools and documentation.
- Lazy Loading
If enabled, all data conversions (character decoding, data conversion, trimming, …) for the data being read will be postponed as long as possible, effectively reading the data as binary fields. Enabling lazy conversion can significantly decrease the CPU cost of reading data.
When to avoid: if the data conversion needs to be performed later in the stream anyway, postponing the conversion may slow things down instead of speeding up.
When to use: Lazy Conversion may speed things up when 1) data is read and written to another file without conversion, 2) data needs to be sorted and doesn’t fit in memory; in this case, serialization to disk is faster with lazy conversion because encoding and type conversions are postponed, or 3) data is bulk-loaded into a database without the need for data conversion. Bulk-loading utilities typically read text directly, and generating this text is faster (this does not apply to Table Output).