Get Data From XML

Description

The Get Data From XML transform provides the ability to read data from any type of XML file using XPath specifications.

Get Data From XML can read data from three kinds of sources (files, streams, and URLs) in two modes: files and URLs can be defined statically in the transform, or supplied dynamically through a field of the input stream.

Supported Engines

Hop Engine

Supported

Spark

Maybe Supported

Flink

Maybe Supported

Dataflow

Maybe Supported

Options

Files Tab

The files tab is where you define the location of the XML files from which you want to read. The table below contains options associated with the Files tab.

Option Description

Transform name

Name of the transform.

XML Source from field

  • XML source is defined in a field : the previous transform supplies the XML data itself in a field of the input stream.

  • XML source is a filename : the previous transform supplies filenames in a field of the input stream. These files are read.

  • Read source as URL : the previous transform supplies URLs in a field of the input stream. These URLs are read.

  • Get XML source from a field : specifies the field from which to read the XML, filename, or URL.

File or directory

Specifies the location and/or name of the input XML file. Note: Click Add to add the file/directory/wildcard combination to the list of selected files (grid) below.

Regular expression

Specifies the regular expression you want to use to select the files in the directory specified in the previous option.

Selected Files

Contains a list of selected files (or wildcard selections) and a property specifying whether each file is required. If a required file is not found, an error is generated; otherwise, the file name is skipped.

Show filename(s)…​

Displays a list of all files that will be loaded based on the currently selected file definitions.
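The way a directory, a regular expression, and the "required" property combine can be illustrated with a small sketch. This is not Hop's implementation: `select_files` is a hypothetical helper, and it assumes the regular expression is matched against the whole file name, which may differ from how Hop applies it.

```python
import re
from pathlib import Path


def select_files(directory, pattern, required=False):
    """Return the files in `directory` whose names match `pattern`.

    Hypothetical helper that mimics one row of the Selected Files grid.
    Assumption: the regular expression must match the complete file name.
    """
    base = Path(directory)
    regex = re.compile(pattern)
    matches = []
    if base.is_dir():
        matches = sorted(
            p for p in base.iterdir()
            if p.is_file() and regex.fullmatch(p.name)
        )
    # A required entry that selects nothing is an error;
    # a non-required entry is simply skipped.
    if not matches and required:
        raise FileNotFoundError(f"No file in {directory} matches {pattern!r}")
    return matches
```

Calling `select_files("/data/in", r".*\.xml")` (the path is only an example) would list the XML files in that directory, which is roughly the information the Show filename(s) button presents.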

Content Tab

Option Description

Settings

  • Loop XPath : One row of data is output for every location in the XML file(s) that matches the Loop XPath. This is the main specification used to flatten the XML file(s). Use the "Get XPath nodes" button to search for possible repeating nodes in the XML document. For large XML documents this search can take a while.

  • Encoding : the encoding of the XML file, used when none is specified in the XML document itself (such documents still exist).

  • Namespace aware : check this to make the XML document namespace aware.

  • Ignore comments : Ignore all comments in the XML document while parsing.

  • Validate XML : Validate the XML prior to parsing.

  • Use token : Check this to dynamically replace a token in an XPath field value. A token is written between @_ and - (for example, @_fieldname-). Tokens are a Hop feature and are not related to XML parsing. See Example 1 for how this works.

  • Ignore empty file : an empty file is not a valid XML document. Check this if you want to ignore such files altogether.

  • Do not raise an error if no file : Do not raise an error when no files are found.

  • Limit : Limits the number of rows to this number (zero (0) means all rows).

  • Prune path to handle large files : takes almost the same value as the "Loop XPath" property, with some exceptions; see Get Data from XML - Handling Large Files for more details. Note that you can use this parameter to avoid multiple HTTP URL requests.
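The effect of the Loop XPath and Limit settings can be sketched outside Hop. The example below uses Python's `xml.etree.ElementTree`, which supports only a subset of XPath, so the loop path is written relative to the root element (`./order`), where the equivalent Loop XPath would presumably be `/orders/order`. The sample document and the `loop_rows` function are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document: two orders, three order lines in total.
SAMPLE = """\
<orders>
  <order id="1"><line>pen</line><line>ink</line></order>
  <order id="2"><line>pad</line></order>
</orders>
"""


def loop_rows(xml_text, loop_path, limit=0):
    """Return one value per node matched by `loop_path`.

    Each matched node stands for one output row. `limit=0` means
    all rows, mirroring the Limit option described above.
    """
    nodes = ET.fromstring(xml_text).findall(loop_path)
    if limit > 0:
        nodes = nodes[:limit]
    # Use the concatenated text of each matched node as the row value.
    return ["".join(node.itertext()).strip() for node in nodes]


print(len(loop_rows(SAMPLE, "./order")))         # 2
print(loop_rows(SAMPLE, "./order/line"))         # ['pen', 'ink', 'pad']
print(loop_rows(SAMPLE, "./order/line", limit=2))  # ['pen', 'ink']
```

Choosing the deeper path yields one row per order line instead of one row per order, which is why the loop path is the main decision when flattening a document.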

Additional fields

  • Include filename in output? : Allows you to specify a field name to include the file name (String) in the output of this transform.

  • Rownum in output? : Allows you to specify a field name to include the row number (Integer) in the output of this transform.

Add to result filename

  • Add files to result filename : Adds the names of the XML files read to the result of this pipeline. A unique list is kept in memory and can be used by the next action in a workflow, for example by another pipeline.

Fields Tab

Option Description

Name

The name of the output field

XPath

The path to the element node or attribute to read

Element

The element type to read: Node or Attribute

Type

The data type to convert to

Format

The format or conversion mask to use in the data type conversion

Length

The length of the output data type

Precision

The precision of the output data type

Currency

The currency symbol to use during data type conversion

Decimal

The numeric decimal symbol to use during data type conversion

Group

The numeric grouping symbol to use during data type conversion

Trim type

The type of trimming to use during data type conversion

Repeat

Repeat the column value of the previous row if the column value is empty (null)
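As a rough illustration of how the field options interact, the sketch below reads each field relative to the loop node, distinguishes Node from Attribute, converts the value to a type, and applies Repeat. It is not Hop code: the sample data, the `FIELDS` layout, and `read_rows` are hypothetical, values are always trimmed on both sides, and `xml.etree.ElementTree` supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Made-up sample: the second item has an empty <group>, the third has none.
SAMPLE = """\
<items>
  <item sku="A1"><group>tools</group><price>4.50</price></item>
  <item sku="A2"><group></group><price>9.00</price></item>
  <item sku="B7"><price>1.25</price></item>
</items>
"""

# (name, path relative to the loop node, element kind, converter, repeat)
FIELDS = [
    ("sku", "sku", "attribute", str, False),
    ("group", "group", "node", str, True),
    ("price", "price", "node", float, False),
]


def read_rows(xml_text, loop_path, fields):
    """Build one dict per loop node from the field definitions."""
    rows = []
    previous = {}
    for node in ET.fromstring(xml_text).findall(loop_path):
        row = {}
        for name, path, kind, convert, repeat in fields:
            if kind == "attribute":
                raw = node.get(path)        # Element = Attribute
            else:
                raw = node.findtext(path)   # Element = Node
            raw = (raw or "").strip() or None  # empty or missing -> null
            if raw is None:
                # Repeat: reuse the previous row's value when this one is empty.
                row[name] = previous.get(name) if repeat else None
            else:
                row[name] = convert(raw)
        rows.append(row)
        previous = row
    return rows


for row in read_rows(SAMPLE, "./item", FIELDS):
    print(row)
```

With this sample, the second and third rows carry the group value "tools" forward from the first row, while a field without Repeat would stay null.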

Metadata Injection Support

All fields of this transform support metadata injection. You can use this transform with ETL Metadata Injection to pass metadata to your pipeline at runtime.