Get filenames transform Icon Get filenames

Description

The Get File Names transform allows you to get information associated with file names on the file system.

The information about the retrieved files is added as rows onto the stream.

Supported Engines

Hop Engine

Supported

Spark

Supported

Flink

Supported

Dataflow

Supported

Usage

The output fields for this transform are:

  • filename - the complete filename, including the path (/tmp/hop/somefile.txt)

  • short_filename - only the filename, without the path (somefile.txt)

  • path - only the path (/tmp/hop/)

  • type

  • exists

  • ishidden

  • isreadable

  • iswriteable

  • lastmodifiedtime

  • size

  • extension

  • uri

  • rooturi

File tab

This tab defines the location of the files you want to retrieve filenames for. For more information about specifying file locations, see section "Selecting file using Regular Expressions" on the Text File Input transform.

Example: You have a static directory of c:\temp where you expect files with an extension of .dat to be placed. Under file/directory you would specify c:\temp and under Wildcard you would have a RegEx with something like .*\.dat$

Filters

The filters tab allows you to filter the retrieved file names based on:

  • All files and folders

  • Files only

  • Folders only

It also gives you:

  • The ability to include a row number in the output

  • The ability to limit the number of rows returned. The limit parameter will act on the total number of rows returned and not only on the number of files returned.

  • The ability to add the filename(s) to the result list

  • The ability to print an error message in case no files/folders are found without stop processing

  • The ability to raise an exception and stop processing in case no files/folders are found.

Details on how exceptions are raised when no files are found

As described above, if you enable the switch in the Filter tab, Hop will raise an exception in case no files are found and will stop the process.

In this case, we need to be aware about two different ways the exceptions are raised depending on the way you choose to identify the set of files you are looking for.

  • If you specified the files (or the inclusion/exclusion expressions) as a set in the Selected files table, the files retrieval is performed by considering the entire set specified (all at once) and the exception is raised in no files are found.

  • If you specified the files by going through the Filenames from field option, the details about the files that we are going to retrieve comes in the incoming rows. That means that the retrieval is done row by row and as soon as one of the details used to specify the searched files fails the exception is raised. Therefore, in this case, the evaluation to fire the exception is done after the evaluation of each single incoming row and not evaluating the overall dataset returned and generated by the details found in the incoming rows.