Execution Data Profile

Description

An Apache Hop Execution Data Profile builds data profiles as data flows through pipelines. You can select one or more data samplers and configure them to fine-tune the type and level of detail of the data that is profiled.

Options

Option Description

Name

The name to be used for this Execution Data Profile

Description

A description to be used for this Execution Data Profile

Data Samplers to use

One or more data samplers to use with this Execution Data Profile. See details below.

Data Samplers

Data Sampler Description Options

Data profile output rows

Performs some basic data profiling on transform output rows

  • Sample size: This is the maximum number of sample rows kept for any discovered profiling result (default: 25)

  • Last transforms only: perform data profiling only on pipeline endpoints (the last transforms) (default: true)

  • Minima: store the minimum value for this data profile (default: true)

  • Maxima: store the maximum value for this data profile (default: true)

  • Count nulls: count null values for this data profile (default: true)

  • Count non-nulls: count non-null values for this data profile (default: true)

  • Min length: store the minimum lengths for this data profile (default: true)

  • Max length: store the maximum lengths for this data profile (default: true)
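As an illustration of what these options measure, here is a minimal Python sketch of single-pass profiling over the values of one field. It is not Hop's implementation: `FieldProfile` and `profile_field` are hypothetical names, the length statistics are assumed to apply to string values only, and the retention of sample rows per result (the Sample size option) is left out.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class FieldProfile:
    """Running statistics for one field (hypothetical structure, not Hop's)."""
    minimum: Optional[Any] = None
    maximum: Optional[Any] = None
    nulls: int = 0
    non_nulls: int = 0
    min_length: Optional[int] = None
    max_length: Optional[int] = None


def profile_field(values: Iterable[Any]) -> FieldProfile:
    """Compute minima, maxima, null counts and string lengths in one pass."""
    profile = FieldProfile()
    for value in values:
        if value is None:
            # Null values only affect the null counter.
            profile.nulls += 1
            continue
        profile.non_nulls += 1
        if profile.minimum is None or value < profile.minimum:
            profile.minimum = value
        if profile.maximum is None or value > profile.maximum:
            profile.maximum = value
        # Assumption: length statistics are tracked for strings only.
        if isinstance(value, str):
            length = len(value)
            if profile.min_length is None or length < profile.min_length:
                profile.min_length = length
            if profile.max_length is None or length > profile.max_length:
                profile.max_length = length
    return profile
```

Because each statistic is updated as a row passes by, a profile of this kind never needs to hold the full output of a transform in memory.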

First output rows

Samples the first rows of a transform output

  • Sample size (default: 100)
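Conceptually, sampling the first rows means reading from the row stream until the sample size is reached. The following Python sketch shows the idea; `first_rows` is a hypothetical helper, not part of Hop.

```python
from itertools import islice
from typing import Any, Iterable, List


def first_rows(rows: Iterable[Any], sample_size: int = 100) -> List[Any]:
    """Keep at most sample_size rows from the start of the stream."""
    # islice stops consuming the stream once sample_size rows were taken.
    return list(islice(rows, sample_size))
```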

Last output rows

Samples the last rows of a transform output

  • Sample size (default: 100)
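Sampling the last rows can be pictured as a fixed-size buffer that drops its oldest row whenever a new one arrives. A minimal Python sketch of that idea, with the hypothetical helper `last_rows` (not Hop's code):

```python
from collections import deque
from typing import Any, Iterable, List


def last_rows(rows: Iterable[Any], sample_size: int = 100) -> List[Any]:
    """Keep at most sample_size rows from the end of the stream."""
    # A deque with maxlen discards the oldest entry when it is full.
    return list(deque(rows, maxlen=sample_size))
```

Unlike the first-rows case, the whole stream has to be read before the sample is final.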

Random output rows

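Performs reservoir sampling on the output rows of a transform, keeping a uniformly random sample of fixed size without needing to know the total row count in advance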

  • Sample size (default: 100)
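For readers unfamiliar with reservoir sampling, the sketch below shows the classic variant often called Algorithm R in Python. Whether Hop uses this exact variant is an assumption; `reservoir_sample` and its `rng` parameter are hypothetical names introduced for illustration.

```python
import random
from typing import Any, Iterable, List, Optional


def reservoir_sample(
    rows: Iterable[Any],
    sample_size: int = 100,
    rng: Optional[random.Random] = None,
) -> List[Any]:
    """Return a uniform random sample of up to sample_size rows in one pass."""
    rng = rng or random.Random()
    reservoir: List[Any] = []
    for index, row in enumerate(rows):
        if index < sample_size:
            # Fill the reservoir with the first sample_size rows.
            reservoir.append(row)
        else:
            # Pick a slot in [0, index]; the row is kept only if the slot
            # falls inside the reservoir, i.e. with probability
            # sample_size / (index + 1).
            slot = rng.randint(0, index)
            if slot < sample_size:
                reservoir[slot] = row
    return reservoir
```

Passing a seeded generator, for example `reservoir_sample(rows, 10, random.Random(42))`, makes the sample reproducible, which can help when comparing runs.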