Metadata Injection

Metadata injection inserts data from various sources into a template pipeline at runtime to reduce repetitive tasks.

For example, you might have a simple pipeline that loads transaction data from a supplier, filters specific values, and writes them to a file. If you have more than one supplier, you would need to adjust and run this pipeline once for each supplier. With metadata injection, you can instead reuse this repetitive pipeline by inserting metadata from another pipeline that contains the ETL Metadata Injection transform. This transform coordinates the data values from the various inputs through the metadata you define, which reduces the need to adjust and run the repetitive pipeline for each specific input.

The repetitive pipeline is known as the template pipeline, and it is called by the ETL Metadata Injection transform. You create a separate pipeline that prepares the common values you want to use as metadata and injects these values through the ETL Metadata Injection transform.
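The relationship between the template pipeline and the injected metadata can be sketched outside of Hop. The following Python snippet is only a conceptual illustration and does not use any Hop API: `InjectedMetadata`, `run_template`, `inject_and_run`, and the supplier data are all hypothetical names and values chosen for this example. It shows one template (read, filter, output) reused for two suppliers whose files differ in delimiter and field names.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class InjectedMetadata:
    """Settings the template leaves blank (hypothetical names, not Hop's)."""
    delimiter: str      # field separator used by this supplier's file
    amount_field: str   # name of the column to filter on
    min_amount: float   # rows below this value are dropped


def run_template(source, meta):
    """Template pipeline: read rows, filter on one field, return kept rows."""
    reader = csv.DictReader(source, delimiter=meta.delimiter)
    return [row for row in reader
            if float(row[meta.amount_field]) >= meta.min_amount]


def inject_and_run(inputs):
    """Injecting pipeline: run the same template once per input,
    supplying that input's metadata at run time."""
    return {name: run_template(io.StringIO(data), meta)
            for name, (data, meta) in inputs.items()}


# Illustrative sample data: two suppliers with different file layouts.
suppliers = {
    "supplier_a": ("id,amount\n1,50\n2,150\n",
                   InjectedMetadata(",", "amount", 100.0)),
    "supplier_b": ("ref;total\nx;20\ny;300\n",
                   InjectedMetadata(";", "total", 100.0)),
}

results = inject_and_run(suppliers)
for name, rows in results.items():
    print(name, rows)
```

In this sketch the template itself never changes. Only the metadata passed in differs per supplier, which is the same division of responsibilities that the template pipeline and the ETL Metadata Injection transform have.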

We recommend the following basic procedure for using this transform to inject metadata:

1. Optimize your data for injection, such as preparing folder structures and inputs.
2. Develop pipelines for the repetitive process (the template pipeline), for metadata injection through the ETL Metadata Injection transform, and for handling multiple inputs.

The metadata is injected into the template pipeline through any transform that supports metadata injection.

Supported Transforms

The goal is to add metadata injection support to all transforms. The current status (as of 15 October 2022) is:

Transform

Supports MDI

Abort

Add a checksum

Add constants

Add sequence

Add value fields changing sequence

Add XML

Analytic query

Apache Tika

Append streams

Avro Decode

Avro Encode

Avro File Input

Avro File Output

Azure Event Hubs Listener

Azure Event Hubs Writer

Beam BigQuery Input

Beam BigQuery Output

Beam Bigtable Input

Beam Bigtable Output

Beam File Input

Beam File Output

Beam GCP Pub/Sub : Publish

Beam GCP Pub/Sub : Subscribe

Beam Kafka Consume

Beam Kafka Produce

Beam Kinesis Consume

Beam Kinesis Produce

Beam Timestamp

Beam Window

Block until transforms finish

Blocking transform

Calculator

Call DB procedure

Cassandra input

Cassandra output

Change file encoding

Check if file is locked

Check if webservice is available

Clone row

Closure generator

Coalesce Fields

Column exists

Combination lookup/update

Concat Fields

Copy rows to result

Credit card validator

CSV file input

Data grid

Database join

Database lookup

De-serialize from file

Delay row

Delete

Detect empty stream

Dimension lookup/update

Doris bulk loader

Dummy (do nothing)

Dynamic SQL row

EDI to XML

Email messages input

Enhanced JSON Output

ETL metadata injection

Execute a process

Execute row SQL script

Execute SQL script

Execute Unit Tests

Execution Information

Fake data

File exists

File Metadata

Filter rows

Formula

Fuzzy match

Generate random value

Generate rows

Get data from XML

Get file names

Get files from result

Get files rows count

Get ID from hop server

Get Neo4j Logging Info

Get records from stream

Get rows from result

Get Server Status

Get subfolder names

Get system info

Get table names

Get variables

Group by

HTTP client

HTTP post

Identify last row in a stream

If Null

Injector

Insert / update

Java filter

JavaScript

Join rows (cartesian product)

JSON input

JSON output

Kafka Consumer

Kafka Producer

LDAP input

LDAP output

Load file content in memory

Mail

Mapping Input

Mapping Output

Memory group by

Merge join

Merge rows (diff)

Metadata Input

Metadata structure of stream

Microsoft Access output

Microsoft Excel input

Microsoft Excel writer

MonetDB bulk loader

MongoDB Delete

MongoDB input

MongoDB output

Multiway merge join

Neo4j Cypher

Neo4j Cypher Builder

Neo4j Generate CSVs

Neo4j Graph Output

Neo4j Import

Neo4J Output

Neo4j Split Graph

Null if

Number range

Parquet File Input

Parquet File Output

PGP decrypt stream

PGP encrypt stream

Pipeline executor

Pipeline Logging

Pipeline Probe

PostgreSQL Bulk Loader

Process files

Properties input

Properties output

Regex evaluation

Replace in string

Reservoir sampling

REST client

Row denormaliser

Row flattener

Row normaliser

Rules accumulator

Rules executor

Run SSH commands

Salesforce delete

Salesforce input

Salesforce insert

Salesforce update

Salesforce upsert

Sample rows

SAS Input

Select values

Serialize to file

Set field value

Set field value to a constant

Set files in result

Set variables

Simple Mapping

Snowflake Bulk Loader

Sort rows

Sorted merge

Split field to rows

Split fields

Splunk Input

SQL file output

SSTable output

Standardize phone number

Stream lookup

Stream Schema Merge

String operations

Strings cut

Switch / case

Synchronize after merge

Table compare

Table exists

Table input

Table output

Teradata Fastload bulk loader

Text file input

Text file input (deprecated)

Text file output

Token Replacement

Unique rows

Unique rows (HashSet)

Update

User defined Java class

User defined Java expression

Value mapper

Web services lookup

Workflow executor

Workflow Logging

Write to log

XML input stream (StAX)

XML join

XML output

XSD validator

XSL Transformation

YAML input

Zip file