Execution Data Profile

Description

An Apache Hop Execution Data Profile builds data profiles as data flows through pipelines. You can select one or more data samplers and configure them to fine-tune the type and level of detail of the data that is profiled.

Options

Option	Description
Name	The name to be used for this Execution Data Profile
Description	A description to be used for this Execution Data Profile
Data Samplers to use	One or more data samplers to use with this Execution Data Profile. See details below.

Data Samplers

Data Sampler	Description	Options
Data profile output rows	Performs basic data profiling on transform output rows	Sample size: the maximum number of sample rows kept for any discovered profiling result (default: 25); Last transforms only: only perform data profiling on pipeline endpoints, that is, the last transforms (default: true); Minima: store the minimum value for this data profile (default: true); Maxima: store the maximum value for this data profile (default: true); Count nulls: count null values for this data profile (default: true); Count non-nulls: count non-null values for this data profile (default: true); Min length: store the minimum lengths for this data profile (default: true); Max length: store the maximum lengths for this data profile (default: true)
First output rows	Samples the first rows of a transform output	Sample size (default: 100)
Last output rows	Samples the last rows of a transform output	Sample size (default: 100)
Random output rows	Performs reservoir sampling on the output rows of a transform	Sample size (default: 100)
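
To make the statistics gathered by the Data profile output rows sampler more concrete, here is a minimal Python sketch of a streaming profiler that tracks minima, maxima, null and non-null counts, and string lengths per field. It is an illustration only and not Hop's implementation (Hop itself is written in Java): the names `ColumnProfile` and `profile_rows` are hypothetical, rows are assumed to be dictionaries, and the sample rows kept per profiling result are left out.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass
class ColumnProfile:
    """Running statistics for one field (hypothetical structure, not Hop's)."""

    minimum: Optional[Any] = None
    maximum: Optional[Any] = None
    nulls: int = 0
    non_nulls: int = 0
    min_length: Optional[int] = None
    max_length: Optional[int] = None

    def update(self, value: Any) -> None:
        # Null values only affect the null counter.
        if value is None:
            self.nulls += 1
            return
        self.non_nulls += 1
        if self.minimum is None or value < self.minimum:
            self.minimum = value
        if self.maximum is None or value > self.maximum:
            self.maximum = value
        # Assumption: lengths are only tracked for string values.
        if isinstance(value, str):
            length = len(value)
            if self.min_length is None or length < self.min_length:
                self.min_length = length
            if self.max_length is None or length > self.max_length:
                self.max_length = length


def profile_rows(rows: Iterable[Dict[str, Any]]) -> Dict[str, ColumnProfile]:
    """Update one ColumnProfile per field while rows stream past."""
    profiles: Dict[str, ColumnProfile] = {}
    for row in rows:
        for name, value in row.items():
            profiles.setdefault(name, ColumnProfile()).update(value)
    return profiles


rows = [
    {"id": 1, "city": "Ghent"},
    {"id": 2, "city": None},
    {"id": 3, "city": "Antwerp"},
]
result = profile_rows(rows)
print(result["id"])
print(result["city"])
```

Because each statistic is updated one row at a time, a profiler of this shape never needs to hold the full data set in memory, which is what makes profiling feasible while a pipeline is running.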

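
The three row samplers differ only in which rows they retain from a stream. The Python sketch below shows one plausible way to express each strategy, using the classic "Algorithm R" for reservoir sampling. It is not taken from the Hop code base: the function names `first_rows`, `last_rows`, and `random_rows` and the `rng` parameter are hypothetical, and only the default sample size of 100 comes from the options described here.

```python
import random
from collections import deque
from itertools import islice
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def first_rows(rows: Iterable[T], sample_size: int = 100) -> List[T]:
    """Keep the first `sample_size` rows and ignore the rest."""
    return list(islice(rows, sample_size))


def last_rows(rows: Iterable[T], sample_size: int = 100) -> List[T]:
    """Keep the last `sample_size` rows using a bounded buffer."""
    return list(deque(rows, maxlen=sample_size))


def random_rows(
    rows: Iterable[T],
    sample_size: int = 100,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Reservoir sampling (Algorithm R): a uniform sample in one pass."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for index, row in enumerate(rows):
        if index < sample_size:
            # Fill the reservoir with the first rows.
            reservoir.append(row)
        else:
            # Pick a slot in [0, index]; replace only if it falls
            # inside the reservoir, so every row seen so far is kept
            # with probability sample_size / (index + 1).
            slot = rng.randint(0, index)
            if slot < sample_size:
                reservoir[slot] = row
    return reservoir


stream = list(range(1000))
print(first_rows(stream, 5))   # [0, 1, 2, 3, 4]
print(last_rows(stream, 5))    # [995, 996, 997, 998, 999]
print(random_rows(stream, 5, random.Random(42)))
```

All three functions use memory proportional to the sample size rather than to the number of rows, which matters when a transform outputs far more rows than the sample can hold.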