Microsoft Excel input
Description
The Microsoft Excel Input transform provides you with the ability to read data from Microsoft Excel. The following sections describe each of the available features for configuring this transform.
The default spreadsheet type (engine) is set to Excel XLSX, XLS. When you are reading other file types like OpenOffice ODS and using special functions like protected worksheets, you need to change the Spread sheet type (engine) in the Content tab accordingly.
Options
Files Tab
Option | Description |
---|---|
transform Name | Name of the transform; the name has to be unique in a single transform. |
Spread sheet type (engine) | This field allows you to specify the spreadsheet type. Currently the following are supported:
|
File or directory | Specifies the location and/or name of the input text file. Note: Click Add to add the file/directory/wildcard combination to the list of selected files (grid) below. |
Regular expression | Specify the regular expression you want to use to select the files in the directory specified in the previous option. For example, you want to process all files that have a .txt extension. (For further details see the Text File Input transform and the section "Selecting file using Regular Expressions") |
Exclude Regular Expression | Excludes all files (in a given location) that meet the criteria specified by this regular expression. |
Selected Files | Contains a list of selected files (or wildcard selections) and a property specifying if file is required or not. If a file is required and it is not found, an error is generated;otherwise, the file name is skipped. |
Accept filenames from previous transforms | Allows you to read in file names from a previous transform in the transform. You must also specify which transform you are importing from, and the input field in that transform from which you will retrieve the filename data |
Show filenames(s)… | Displays a list of all files that will be loaded based on the current selected file definitions |
Preview rows | Click Preview to examine the contents of the specified Excel file |
Sheets
In this tab you can specify the names of the sheets in the Excel workbook to read. For each of the sheet names you can specify the row and column to start at.
Content
Option | Description |
Header | Enable if the sheets specified contain a header row to skip |
No empty rows | Enable if you don’t want empty rows in the output of this transform |
Stop on empty row | Makes the transform stop reading the current sheet of a file when a empty line is encountered |
Limit | Limits the number of rows to this number (zero (0) means all rows). |
Encoding | Specifies the text file encoding to use. Leave blank to use the default encoding on your system. To use Unicode, specify UTF-8 or UTF-16. On first use, Spoon searches your system for available encodings.) |
Error handling
Option | Description |
Strict types? | If checked, Hop will report data type errors in the input. |
Ignore errors? | Enable if you want to ignore errors during parsing |
Skip error lines? | If checked, Hop will skip lines that contain errors. These lines can be dumped to a separate file by specifying a path in the Failing line numbers files directory field below. If this is not checked, lines with errors will appear as NULL values in the output. |
Warnings file directory | When warnings are generated, they are placed in this directory. The name of that file is <warning dir>/filename.<date_time>.<warning extension> |
Error files directory | When errors occur, they are placed in this directory. The name of that file is <errorfile_dir>/filename.<date_time>.<errorfile_extension> |
Failing line numbers files directory | When a parsing error occurs on a line, the line number is placed in this directory. The name of that file is <errorline dir>/filename.<date_time>.<errorline extension> |
Fields tab
The fields tab is for specifying the fields that must be read from the Excel files. Use Get fields from header row to fill in the available fields if the sheets have a header row automatically.
The Type column performs type conversions for a given field. For example, if you want to read a date and you have a String value in the Excel file, specify the conversion mask. Note: In the case of Number to Date conversion (for example, 20051028-→ October 28th, 2005) specify the conversion mask yyyyMMdd because there will be an implicit Number to String conversion taking place before doing the String to Date conversion.
Option | Description |
Name | The name of the field. |
Type | The field’s data type; String, Date or Number. |
Length | The length option depends on the field type. Number: total number of significant figures in a number; String: total length of a string; Date: determines how much of the date string is printed or recorded. |
Precision | The precision option depends on the field type, but only Number is supported; it returns the number of floating point digits. |
Trim type | Truncates the field (left, right, both) before processing. Useful for fields that have no static length. |
Repeat | If set to Y, will repeat this value if the field in the next row is empty. |
Format | The format mask (number type). See Text File Input transform and the section "Number Formats" for a complete description of format symbols. |
Currency | Symbol used to represent currencies. |
Decimal | A decimal point; this is either a dot or a comma. |
Grouping | A method of separating units of thousands in numbers of four digits or larger. This is either a dot or a comma. |
Additional output fields tab
This tab retrieves custom metadata fields to add to the transform’s output. The purpose of each field is defined in its name, but you can use these fields for whatever you want. Each item defines an output field that will contain the following information. Some of these are missing.
Option | Description |
Full filename field | The full file name plus the extension. |
Sheetname field | The worksheet name you’re using. |
Sheet row nr field | The current sheet row number. |
Row nr written field | Number of rows written |
Short filename field | The field name that contains the filename without path information but with an extension. |
Extension field | The field name that contains the extension of the filename. |
Path field | The field name that contains the path in operating system format. |
Size field | The field name that contains the size of the file, in bytes. |
Is hidden field | The field name that contains if the file is hidden or not (boolean). |
Uri field | The field name that contains the URI. |
Root uri field | The field name that contains only the root part of the URI. |