Apache Beam Flink Pipeline Engine
Beam Flink
The Flink runner supports two modes: Local Direct Flink Runner and Flink Runner.
The Flink Runner and Flink are suitable for large scale, continuous jobs, and provide:
-
A streaming-first runtime that supports both batch processing and data streaming programs
-
A runtime that supports very high throughput and low event latency at the same time
-
Fault-tolerance with exactly-once processing guarantees
-
Natural back-pressure in streaming programs
-
Custom memory management for efficient and robust switching between in-memory and out-of-core data processing algorithms
-
Integration with YARN and other components of the Apache Hadoop ecosystem
Check the Apache Beam Flink runner docs for more information.
Options
Option | Description | Default |
---|---|---|
The Flink master | Address of the Flink Master where the Pipeline should be executed. Can either be of the form "host:port" or one of the special values [local], [collection] or [auto] | auto |
Parallelism | The pipeline wide maximum degree of parallelism to be used. The maximum parallelism specifies the upper limit for dynamic scaling and the number of key groups used for partitioned state. | -1 |
Checkpointing interval | The interval in milliseconds at which to trigger checkpoints of the running pipeline. | No checkpointing, -1 |
Checkpointing timeout (ms) | The maximum time in milliseconds that a checkpoint may take before being discarded. | -1 |
Minimum pause between checkpoints | The minimal pause in milliseconds before the next checkpoint is triggered | -1 |
Fail on checkpointing errors | Sets the expected behaviour for tasks in case that they encounter an error in their checkpointing procedure. If this is set to true, the task will fail on checkpointing error. If this is set to false, the task will only decline a the checkpoint and continue running | true |
Number of execution retries | Sets the number of times that failed tasks are re-executed. A value of zero effectively disables fault tolerance. A value of -1 indicates that the system default value (as defined in the configuration) should be used. | -1 |
Execution retry delay (ms) | Sets the delay in milliseconds between executions. A value of {@code -1} indicates that the default value should be used. | -1 |
Object re-use | Sets the behavior of reusing objects. Enabling the object reuse mode will instruct the runtime to reuse user objects for better performance. | false |
Disable metrics | Disable Beam metrics in Flink Runner | -1 |
Retain externalized checkpoints on cancellation | Sets the behavior of externalized checkpoints on cancellation. | false |
Maximum bundle size | The maximum number of elements in a bundle. | 1000 |
Maximum bundle time (ms) | The maximum time to wait before finalising a bundle (in milliseconds). | 1000 |
Shutdown sources on final watermark | Shuts down sources which have been idle for the configured time of milliseconds. Once a source has been shut down, checkpointing is not possible anymore. Shutting down the sources eventually leads to pipeline shutdown (=Flink job finishes) once all input has been processed. Unless explicitly set, this will default to Long.MAX_VALUE when checkpointing is enabled and to 0 when checkpointing is disabled. See FLINK-2491 for progress on this issue. | |
Latency tracking interval | Interval in milliseconds for sending latency tracking marks from the sources to the sinks. Interval value ⇐ 0 disables the feature. | 0 |
Auto watermark interval | The interval in milliseconds for automatic watermark emission. | |
Batch execution mode | Flink mode for data exchange of batch pipelines. Reference {@link org.apache.flink.api.common.ExecutionMode}. Set this to BATCH_FORCED if pipelines get blocked, see FLINK-10672 | P |
User agent | A user agent string as per RFC2616, describing the pipeline to external services. | |
Temp location | Path for temporary files. | |
Plugins to stage (, delimited) | Comma separated list of plugins. | |
Transform plugin classes | List of transform plugin classes. | |
XP plugin classes | List of extensions point plugins. | |
Streaming Hop transforms flush interval (ms) | The amount of time after which the internal buffer is sent completely over the network and emptied. | |
Hop streaming transforms buffer size | The internal buffer size to use. | |
Fat jar file location | Fat jar location. |