Data types
For consistent, predictable results when working with your data in Apache Hop, consider how the Hop engine processes different data types and field metadata in pipelines and workflows.
| As a rule, data is never modified by metadata inside of Apache Hop. Data is only modified when Apache Hop writes to files or similar objects, but not to databases. |
Apache Hop data types map internally to Java data types, so the Java behavior of these data types applies to the associated fields, parameters, and variables used in your workflows and pipelines. The following table describes these mappings.
| Apache Hop | Java data type | Description |
|---|---|---|
| BigNumber | BigDecimal | An arbitrary-precision decimal number. |
| Binary | byte[] | An array of bytes that can contain any type of binary data. |
| Boolean | Boolean | A boolean value, true or false. |
| Date | Date | A date-time value with millisecond precision. |
| Integer | Long | A signed 64-bit integer. |
| Internet Address | InetAddress | An Internet Protocol (IP) address. |
| Number | Double | A double-precision floating-point value. |
| String | String | Variable-length text of unlimited length, encoded in UTF-8 (Unicode). |
| Timestamp | Timestamp | A date-time value with fractional seconds to nanosecond precision. |
| Apache Hop also comes with a number of additional complex data types (e.g. Avro, JSON, Graph) that have no one-to-one mapping to Java data types. These data types only work with specific transforms and can't be used in general-purpose transforms. |
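Because the types in the table above are backed by Java types, the arithmetic behavior of those Java types carries over to Hop fields. The following is a minimal sketch in plain Java, not Hop API code; the class and method names are hypothetical and exist only for illustration.

```java
import java.math.BigDecimal;

class HopTypeBehaviorDemo {
    // Hop Number maps to Double: binary floating point, so 0.1 + 0.2 is not exactly 0.3.
    static double sumAsNumber() {
        return 0.1 + 0.2;
    }

    // Hop BigNumber maps to BigDecimal: decimal arithmetic stays exact.
    static BigDecimal sumAsBigNumber() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2"));
    }

    // Hop Integer maps to Long: overflow wraps around silently instead of failing.
    static long increment(long value) {
        return value + 1;
    }

    public static void main(String[] args) {
        System.out.println(sumAsNumber());                         // 0.30000000000000004
        System.out.println(sumAsBigNumber());                      // 0.3
        System.out.println(increment(Long.MAX_VALUE) == Long.MIN_VALUE); // true
    }
}
```

When exact decimal results matter (for example, monetary amounts), BigNumber is usually the safer choice over Number.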
Conversions and comparisons
Nulls and sort order
- Null handling and sort behavior follow Java comparison semantics for the mapped type, unless noted below.
String parsing/formatting
- When converting from a String to another type, Hop uses the format settings defined in the transform metadata (for example, date masks, decimal symbols, and grouping characters).
- When converting from another type to a String, Hop applies the same metadata-based format.
For these reasons, it is important to be explicit about formats when working with String types. A common issue arises with decimal separators, which may differ between environments. For example, many European locales use a comma (,) as the decimal separator, while a US locale uses a period (.). Using an explicit format avoids these inconsistencies.
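The decimal-separator problem can be reproduced with the standard java.text classes. This is a sketch in plain Java rather than Hop's own conversion code; the class name, the parse helper, and the format mask are assumptions made for the example.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

class DecimalSeparatorDemo {
    // Parse text with explicitly chosen decimal and grouping symbols,
    // so the result does not depend on the default locale of the machine.
    static double parse(String text, char decimal, char grouping) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setDecimalSeparator(decimal);
        symbols.setGroupingSeparator(grouping);
        DecimalFormat format = new DecimalFormat("#,##0.###", symbols);
        try {
            return format.parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    public static void main(String[] args) {
        String input = "1.234,56";
        // European-style symbols: comma is the decimal separator.
        System.out.println(parse(input, ',', '.')); // 1234.56
        // US-style symbols applied to the same text silently yield a different number.
        System.out.println(parse(input, '.', ','));
    }
}
```

The second call does not fail; it returns a different value, which is why a mismatched format can go unnoticed.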
Timestamp
A Timestamp value can be created from a Date, String, Number, or another Timestamp.
Conversions
- Date → Timestamp: Represents the same instant; fractional seconds default to 0.
- String → Timestamp: Parsed using the specified date mask (for example, yyyy-MM-dd HH:mm:ss).
- Number → Timestamp: Interpreted as nanoseconds since the epoch.
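The three conversions above can be approximated with java.sql.Timestamp. This is a sketch in plain Java, not Hop's implementation; the class and method names are hypothetical, and the fixed UTC time zone is an assumption made to keep the example deterministic.

```java
import java.sql.Timestamp;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class TimestampConversionDemo {
    // Date -> Timestamp: same instant on the timeline.
    static Timestamp fromDate(Date date) {
        return new Timestamp(date.getTime());
    }

    // String -> Timestamp: parsed with an explicit date mask (UTC assumed here).
    static Timestamp fromString(String text, String mask) {
        SimpleDateFormat format = new SimpleDateFormat(mask);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return new Timestamp(format.parse(text).getTime());
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + text, e);
        }
    }

    // Number -> Timestamp: the value is read as nanoseconds since the epoch.
    static Timestamp fromNanos(long nanos) {
        Timestamp ts = new Timestamp(Math.floorDiv(nanos, 1_000_000L));
        // setNanos replaces the whole fractional-second part (0..999,999,999).
        ts.setNanos((int) Math.floorMod(nanos, 1_000_000_000L));
        return ts;
    }

    public static void main(String[] args) {
        System.out.println(fromDate(new Date(5000L)).getNanos());                               // 0
        System.out.println(fromString("1970-01-01 00:00:01", "yyyy-MM-dd HH:mm:ss").getTime()); // 1000
        System.out.println(fromNanos(1_500_000_123L).getNanos());                               // 500000123
    }
}
```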
UUID
UUID values map to Java's java.util.UUID class, which holds each value as 128 bits (16 bytes). The same UUID stored as a String takes at least 32 bytes (32 hexadecimal characters, or 36 with the hyphens of the canonical form). The difference in storage is noticeable in many situations, such as inserting into a database with native UUID support.
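The size difference can be seen directly in plain Java. The sketch below is illustrative only; the class and helper names are hypothetical and are not part of Hop.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

class UuidStorageDemo {
    // Pack the two 64-bit halves of a UUID into a 16-byte array.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    // Rebuild the UUID from its 16-byte form.
    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long msb = buffer.getLong();
        long lsb = buffer.getLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        System.out.println(toBytes(uuid).length);      // 16
        System.out.println(uuid.toString().length());  // 36
        System.out.println(fromBytes(toBytes(uuid)).equals(uuid)); // true
    }
}
```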
When using MongoDB, Hop writes UUIDs using the STANDARD representation, which corresponds to BSON Binary subtype 4.
Sorting
Databases may order UUID columns by their raw binary value (as is the case in PostgreSQL). Hop, however, compares UUIDs by their two 64-bit halves as signed longs: first the most significant bits (MSB), then the least significant bits (LSB). The two orderings can disagree, for example when the highest bit of a half is set and the signed value is therefore negative. To ensure consistent results, perform all sorting either within Hop or entirely in the database.
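The disagreement between the two orderings is easy to reproduce with java.util.UUID, whose compareTo also compares the halves as signed longs. The sketch below is plain Java; the compareUnsigned helper is a hypothetical stand-in for a byte-wise database ordering, not a Hop or PostgreSQL API.

```java
import java.util.UUID;

class UuidOrderDemo {
    // Byte-wise style ordering: treat both 64-bit halves as unsigned values.
    static int compareUnsigned(UUID a, UUID b) {
        int c = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
        return c != 0
                ? c
                : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        UUID low = UUID.fromString("00000000-0000-0000-0000-000000000001");
        UUID high = UUID.fromString("ffffffff-ffff-ffff-ffff-ffffffffffff");

        // Signed comparison: the MSB of "high" is -1 as a long, so "low" sorts after it.
        System.out.println(low.compareTo(high) > 0);        // true
        // Unsigned comparison: "low" sorts before "high", as in a raw binary ordering.
        System.out.println(compareUnsigned(low, high) < 0); // true
    }
}
```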
Gotchas
Writing a UUID field to a database that has no native UUID type (e.g. MySQL) will fail. Convert the values to String before writing.
JSON
The JSON type in Hop follows the official JSON standard: https://www.json.org/json-en.html. It is treated as an unordered set of key/value pairs. This means that JSON values may behave differently from their String representations. For example, consider these two JSON objects:
{ "a": 1, "b": 2 }
{ "b": 2, "a": 1 }
When treated as String, they are different. When treated as JSON, they are considered equal. Because JSON comparison in Hop performs structural checks, it is generally more robust, but also slightly slower, than string comparison.
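The difference between textual and structural equality for the two objects above can be illustrated with plain Java maps standing in for parsed JSON objects. This is only an analogy: Hop's JSON type is not a java.util.Map, and the class and helper names here are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JsonEqualityAnalogy {
    // Build a two-entry object; LinkedHashMap remembers insertion order,
    // yet its equals() ignores that order and compares entries only.
    static Map<String, Integer> object(String k1, int v1, String k2, int v2) {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put(k1, v1);
        map.put(k2, v2);
        return map;
    }

    public static void main(String[] args) {
        String first = "{ \"a\": 1, \"b\": 2 }";
        String second = "{ \"b\": 2, \"a\": 1 }";

        // Text comparison: key order matters, so the strings differ.
        System.out.println(first.equals(second)); // false
        // Structural comparison: same keys and values, so the objects are equal.
        System.out.println(object("a", 1, "b", 2).equals(object("b", 2, "a", 1))); // true
    }
}
```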
Sorting
The way Hop sorts JSON values resembles how PostgreSQL sorts JSONB values. Keys are sorted alphabetically, and values are compared in the following type order:
NULL < MISSING < BINARY < STRING < NUMBER < BOOLEAN < ARRAY < OBJECT
Arrays are compared element by element; if all compared elements are equal, the longer array is considered greater.
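The type order and the array rule above can be sketched as a comparator over plain Java values. This is not Hop's implementation: the class is hypothetical, MISSING and BINARY are left out because plain Java values do not model them here, comparing strings and numbers by their natural order is an assumption, and object-to-object comparison is deliberately not covered.

```java
import java.util.List;
import java.util.Map;

class JsonOrderSketch {
    // Rank by type: NULL(0) < MISSING(1) < BINARY(2) < STRING(3) < NUMBER(4)
    // < BOOLEAN(5) < ARRAY(6) < OBJECT(7). Ranks 1 and 2 are unused in this sketch.
    static int rank(Object value) {
        if (value == null) return 0;
        if (value instanceof String) return 3;
        if (value instanceof Number) return 4;
        if (value instanceof Boolean) return 5;
        if (value instanceof List) return 6;
        if (value instanceof Map) return 7;
        throw new IllegalArgumentException("Unsupported value: " + value);
    }

    static int compare(Object a, Object b) {
        int byType = Integer.compare(rank(a), rank(b));
        if (byType != 0) return byType;          // different types: type order decides
        if (a == null) return 0;                 // both null
        if (a instanceof String) return ((String) a).compareTo((String) b);
        if (a instanceof Number) {
            return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
        }
        if (a instanceof Boolean) return Boolean.compare((Boolean) a, (Boolean) b);
        if (a instanceof List) {
            List<?> x = (List<?>) a;
            List<?> y = (List<?>) b;
            int shared = Math.min(x.size(), y.size());
            for (int i = 0; i < shared; i++) {   // element by element
                int c = compare(x.get(i), y.get(i));
                if (c != 0) return c;
            }
            return Integer.compare(x.size(), y.size()); // equal prefix: longer is greater
        }
        throw new UnsupportedOperationException("Object comparison is outside this sketch");
    }

    public static void main(String[] args) {
        System.out.println(compare("z", 1) < 0);                             // true
        System.out.println(compare(1, true) < 0);                            // true
        System.out.println(compare(List.of(1, 2), List.of(1, 2, 3)) < 0);    // true
    }
}
```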
MongoDB
The JSON type supports all standard JSON values when reading or writing to MongoDB. Values that are part of MongoDB’s extended JSON (for example, Date objects) are not supported. To fully support MongoDB-specific BSON values, use the String type instead.