JSON Input

Description

The JSON Input transform reads data from JSON structures, files, or incoming fields using a JSONPath expression to extract data and output rows.

JSONPath expressions can use either dot notation or square bracket notation.

Supported Engines

Hop Engine

Supported

Spark

Maybe Supported

Flink

Maybe Supported

Dataflow

Maybe Supported

Options

General Tab

Option	Description
Source is from a previous transform	Select to retrieve the source from a previously defined field. When selected, the following fields are available: Select field Use field as file names Read source as URL Do not pass field downstream When this option is cleared, the following fields are available: File or directory Regular Expression Exclude Regular Expression Selected files
Select field	Specify the field name to use as a source from a previous transform.
Use field as file names	Select to indicate the source is a filename.
Read source as URL	Select to indicate if the source should be accessed as a URL.
Do not pass field downstream	Select to remove the source field from the output stream. This action improves performance and memory utilization with large JSON fields.
File or directory	Specify the source location if the source is not defined in a field. Click Browse to navigate to your source file or directory. Click Add to include the source in the Selected files table.
Regular expression	Specify a regular expression to match filenames within a specified directory.
Exclude regular expression	Specify a regular expression to exclude filenames within a specified directory.
File/Directory	The source location indicated by clicking Add after specifying it in File or directory.
Wildcard (RegExp)	Wildcards as specified in Regular expression.
Exclude wildcard	Excluded wildcards as specified in Exclude regular expression.
Required	Required source location for input.
Include subfolders	Whether subfolders are included within the source location.
Delete	Remove a source from the table
Edit	Remove a source from the table and return it back to the File or directory option.
Show filename(s)	Display the file names of the sources successfully connected to the JSON Input transform.

Option

Description

Source is from a previous transform

Select to retrieve the source from a previously defined field. When selected, the following fields are available:

Select field
Use field as file names
Read source as URL
Do not pass field downstream

When this option is cleared, the following fields are available:

File or directory
Regular Expression
Exclude Regular Expression
Selected files

Select field

Specify the field name to use as a source from a previous transform.

Use field as file names

Select to indicate the source is a filename.

Read source as URL

Select to indicate if the source should be accessed as a URL.

Do not pass field downstream

Select to remove the source field from the output stream. This action improves performance and memory utilization with large JSON fields.

File or directory

Specify the source location if the source is not defined in a field.

Click Browse to navigate to your source file or directory.
Click Add to include the source in the Selected files table.

Regular expression

Specify a regular expression to match filenames within a specified directory.

Exclude regular expression

Specify a regular expression to exclude filenames within a specified directory.

File/Directory

The source location indicated by clicking Add after specifying it in File or directory.

Wildcard (RegExp)

Wildcards as specified in Regular expression.

Exclude wildcard

Excluded wildcards as specified in Exclude regular expression.

Required

Required source location for input.

Include subfolders

Whether subfolders are included within the source location.

Delete

Remove a source from the table

Edit

Remove a source from the table and return it back to the File or directory option.

Show filename(s)

Display the file names of the sources successfully connected to the JSON Input transform.

Content Tab

The Content tab contains the following options for configuring which data to retrieve:

Option	Description
Ignore empty file	Select to skip empty files. When cleared, empty files will cause the process to fail and stop.
Do not raise an error if no files	Select to continue when no files are available to process.
Ignore missing path	Select to continue processing files when an error occurs that (1) no fields match the JSON path or (2) that all the values are null. When cleared, no further rows are processed when an error occurs.
Default path leaf to null	Select to return a null value for missing paths.
Limit	Specify a limit on the number of records generated from the tra. Results are not limited when set to zero.
Include filename in output	Select to add a string field with the filename in the result.
Rownum in output	Select to add an integer field with the row number in the result.
Add filenames to result	Select to add processed files to the result file list.

Option

Description

Ignore empty file

Select to skip empty files. When cleared, empty files will cause the process to fail and stop.

Do not raise an error if no files

Select to continue when no files are available to process.

Ignore missing path

Select to continue processing files when an error occurs that (1) no fields match the JSON path or (2) that all the values are null. When cleared, no further rows are processed when an error occurs.

Default path leaf to null

Select to return a null value for missing paths.

Limit

Specify a limit on the number of records generated from the tra. Results are not limited when set to zero.

Include filename in output

Select to add a string field with the filename in the result.

Rownum in output

Select to add an integer field with the row number in the result.

Add filenames to result

Select to add processed files to the result file list.

Fields Tab

The Fields tab displays field definitions to extract values from the JSON structure. The table in this tab contain the following columns:

Option	Description
Name	Name of field that maps to the corresponding field in the JSON input stream.
Path	Complete path of the field name in the JSON input stream. All records can be retrieved by adding the asterisk * in the path. For example, $.mydata.*
Type	Data type of the input field.
Format	An optional mask for converting the format of the original field. See Common Formats for information on common valid date and numeric formats you can use in this transform.
Length	Length of the field.
Precision	Number of floating point digits for number-type fields.
Currency	Currency symbol ($ or €, for example).
Decimal	A decimal point can be a . (5,000.00 for example) or , (5.000,00 for example).
Group	A grouping can be a , (10,000.00 for example) or . (5.000,00 for example).
Trim type	The trim method to apply to a string.
Repeat	The corresponding value from the last row repeated if a row is empty.
Get fields	Populate the table with fields derived from the source file.

Option

Description

Name

Name of field that maps to the corresponding field in the JSON input stream.

Path

Complete path of the field name in the JSON input stream. All records can be retrieved by adding the asterisk * in the path. For example, $.mydata.*

Type

Data type of the input field.

Format

An optional mask for converting the format of the original field. See Common Formats for information on common valid date and numeric formats you can use in this transform.

Length

Length of the field.

Precision

Number of floating point digits for number-type fields.

Currency

Currency symbol ($ or €, for example).

Decimal

A decimal point can be a . (5,000.00 for example) or , (5.000,00 for example).

Group

A grouping can be a , (10,000.00 for example) or . (5.000,00 for example).

Trim type

The trim method to apply to a string.

Repeat

The corresponding value from the last row repeated if a row is empty.

Get fields

Populate the table with fields derived from the source file.

Select fields

Click Get Fields in the Fields tab to open the Select Fields window. Select the checkbox next to each field in your source file that you want include in your output. All the fields selected in this transform are added to the table. You can search for a field name by entering the field name in the Search box.

Additional output fields tab

The Additional output fields tab contains the following options to specify additional information about the file to process:

Option	Description
Short filename field	Specify the field that contains the filename without path information but with an extension.
Extension field	Specify the field that contains the extension of the filename.
Path field	Specify the field that contains the path in operating system format.
Size field	Specify the field that contains the size of the data.
Is hidden field	Specify the field indicating if the file is hidden or not (Boolean).
Last modification field	Specify the field indicating the date of the last time the file was modified.
Uri field	Specify the field that contains the URI.
Root uri field	Specify the field that contains only the root part of the URI.

Option

Description

Short filename field

Specify the field that contains the filename without path information but with an extension.

Extension field

Specify the field that contains the extension of the filename.

Path field

Specify the field that contains the path in operating system format.

Size field

Specify the field that contains the size of the data.

Is hidden field

Specify the field indicating if the file is hidden or not (Boolean).

Last modification field

Specify the field indicating the date of the last time the file was modified.

Uri field

Specify the field that contains the URI.

Root uri field

Specify the field that contains only the root part of the URI.

Considerations

Normally, while processing input JSON files, if a field contains null then the whole record will be omitted from the output.

For example if we have a JSON file like this

{
  "persons" : [
    {
      "id": "1",
      "name": "name 1"
    },
    {
      "id": "2",
      "name": "name 2"
    },
    {
      "id": "3",
      "name": null
    },
    {
      "id": "4",
      "name": "name 4"
    }
  ]
}

When extracting the fields id and Name using the following field definition:

Field name Json Path

Field name	Json Path
id	`$.persons.*.id`
Name	`$.persons.*.name`

$.persons.*.id

Name

$.persons.*.name

given the default behavior, the output will be

id;Name
1;Name 1
2;Name 2
3;null
4;Name 4

Now let’s only select the name field and see what happens

Name
Name 1
Name 2
Name 4

You will notice that you only have 3 rowst retuned in this case ( the null line is omited from the result)

To change Hop’s behavior regarding Null values in Json files, change the following configuration variable:

HOP_JSON_INPUT_INCLUDE_NULLS = Y

When restarting and re-running the pipeline you will have 4 result rows containing the null value.

Metadata Injection Support

All fields of this transform support metadata injection. You can use this transform with ETL Metadata Injection to pass metadata to your pipeline at runtime.