Memory Group By

Description

The Memory Group By transform builds aggregates in a group by fashion.

This transform processes all rows within memory and therefore does not require a sorted input. However, it does require all data to fit into memory.

When the number of rows is too large to fit into memory, use a combination of Sort Rows and Group By transforms.

Supported Engines

Hop Engine

Supported

Spark

Supported

Flink

Supported

Dataflow

Supported

Options

Option	Description
Transform name	Name of the transform this name has to be unique in a single pipeline,
Always give back a result row	If you enable this option, the Group By transform will always give back a result row, even if there is no input row.
This can be useful if you want to count the number of rows. Without this option you would never get a count of zero (0).	The field that make up the group
After retrieving fields using the Get Fields button, designate the fields to include in the group. See the Group be transform for more details.	Aggregates

Option

Description

Transform name

Name of the transform this name has to be unique in a single pipeline,

Always give back a result row

If you enable this option, the Group By transform will always give back a result row, even if there is no input row.

This can be useful if you want to count the number of rows. Without this option you would never get a count of zero (0).

The field that make up the group

After retrieving fields using the Get Fields button, designate the fields to include in the group. See the Group be transform for more details.

Aggregates

Metadata Injection Support

All fields of this transform support metadata injection. You can use this transform with ETL Metadata Injection to pass metadata to your pipeline at runtime.