BigQuery Output

Description

The BigQuery Output transform writes data to a Google Cloud BigQuery table. On the local Hop engine it uses the BigQuery Java client’s streaming insertAll API (batches of up to 500 rows); on Beam engines it uses the Beam-native BigQuery sink.
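
The batching behaviour on the local Hop engine can be pictured with a small sketch. This is not the transform's actual code: the function name `batches` and the constant `MAX_ROWS_PER_BATCH` are hypothetical, and only the 500-row ceiling comes from the description above.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Batch ceiling quoted in the description; the name is an assumption.
MAX_ROWS_PER_BATCH = 500


def batches(rows: Iterable[T], size: int = MAX_ROWS_PER_BATCH) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` rows, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Hypothetical input: 1,234 rows end up in batches of 500, 500 and 234 rows.
batch_sizes = [len(b) for b in batches(range(1234))]
```

Each yielded list would correspond to one streaming insert request; the last batch is simply whatever rows remain.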

Supported Engines

Hop Engine	Supported
Spark	Supported
Flink	Supported
Dataflow	Supported

Options

Option	Description
Transform name	Name of the transform. The name must be unique within a single pipeline.
Project ID	The Google Cloud Platform project. Leave blank to use the application-default project from `GOOGLE_APPLICATION_CREDENTIALS`.
Data set ID	The BigQuery dataset ID. Must already exist.
Table ID	The BigQuery table ID.
Create table if needed	Create the table if it does not yet exist. The schema is derived from the input row metadata (Hop type → BigQuery `STANDARD_SQL` type). Default: true.
Truncate table	Empty the table before writing. On the Hop engine this runs a `TRUNCATE TABLE` DDL statement, which is free, consumes no DML quota, and preserves the schema, partitioning, and clustering.
Fail if the table is not empty	Refuse to run if the target table already contains rows. Useful together with `Truncate table` for idempotent loads.
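
To make the schema derivation behind `Create table if needed` concrete, here is a minimal sketch of a Hop type to BigQuery type lookup. The mapping table is an assumption for illustration and is not taken from the transform's source; the real mapping may differ, in particular for `Date`. The names `HOP_TO_BIGQUERY` and `derive_schema` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed mapping from Hop value types to BigQuery standard SQL types.
# Illustrative only: the transform's real mapping may differ (especially Date).
HOP_TO_BIGQUERY: Dict[str, str] = {
    "String": "STRING",
    "Integer": "INT64",
    "Number": "FLOAT64",
    "BigNumber": "NUMERIC",
    "Boolean": "BOOL",
    "Date": "DATETIME",
    "Timestamp": "TIMESTAMP",
    "Binary": "BYTES",
}


def derive_schema(row_meta: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (field name, Hop type) pairs into BigQuery-style column definitions."""
    schema = []
    for name, hop_type in row_meta:
        if hop_type not in HOP_TO_BIGQUERY:
            raise ValueError(f"No BigQuery type assumed for Hop type {hop_type!r}")
        schema.append({"name": name, "type": HOP_TO_BIGQUERY[hop_type]})
    return schema


# Hypothetical input row metadata with three fields.
schema = derive_schema([("id", "Integer"), ("name", "String"), ("amount", "Number")])
```

Raising on an unknown type, rather than silently falling back to `STRING`, is a design choice of this sketch; it keeps a surprising input type from producing a table with the wrong column types.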

BigQuery streaming inserts have their own quotas, separate from DML quotas: 10 000 rows and 10 MB per insertAll request, and 100 000 rows per second per project by default. Rows are visible to SELECT almost immediately but stay in the streaming buffer for up to about 90 minutes before they migrate to managed storage. Until then, DML against those rows is rejected, but DDL (including the TRUNCATE TABLE issued by the Truncate table option) is fine.
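
The per-request limits quoted above can be checked before sending a batch. The following is a rough sketch under stated assumptions: "10 MB" is read as 10 MiB, the payload size is estimated from a JSON encoding of the rows rather than from a real request body, and the function name `fits_in_one_request` is hypothetical.

```python
import json
from typing import Any, Dict, List

# Per-request limits as quoted in the paragraph above.
MAX_ROWS_PER_REQUEST = 10_000
MAX_BYTES_PER_REQUEST = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB


def fits_in_one_request(rows: List[Dict[str, Any]]) -> bool:
    """Return True if the rows stay within both per-request limits.

    The size is estimated from the UTF-8 JSON encoding of the rows, which
    only approximates the size of a real insertAll request body.
    """
    if len(rows) > MAX_ROWS_PER_REQUEST:
        return False
    payload = json.dumps(rows, ensure_ascii=False).encode("utf-8")
    return len(payload) <= MAX_BYTES_PER_REQUEST


# Hypothetical batches: one within limits, one with too many rows, one too large.
small = [{"id": i, "name": "x"} for i in range(500)]
too_many = [{"id": i} for i in range(10_001)]
too_big = [{"blob": "a" * (11 * 1024 * 1024)}]
```

A 500-row batch of narrow rows passes easily; very wide rows can exceed the size limit long before the row limit is reached, which is why both checks are made.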