Import Kettle (PDI) Projects in Apache Hop

As stated in the Q&A, Apache Hop used Kettle (aka Pentaho Data Integration or PDI) as a starting point in late 2019. Both Apache Hop and Pentaho Data Integration have changed considerably since then.

Compatibility with Kettle/PDI was never a goal for Apache Hop. However, because many organizations have invested heavily in Kettle/PDI project development, the Apache Hop community provides a way to import Kettle/PDI code into Hop and convert it to the Hop way of working.

Imported Items

  • jobs: convert to Workflows (kjb to hwf), job entries to actions

  • transformations: convert to Pipelines (ktr to hpl), steps to transforms

  • kettle.properties: import to project variables

  • shared.xml: extract relational database connections to Hop relational database connection metadata objects

  • jdbc.properties: extract JNDI (simple-jndi) relational database connections to Hop relational database connection metadata objects

  • connections in jobs and transformations are extracted and converted to Hop relational database connection metadata objects

  • import jobs, transformations and other files into a Hop project (selected or bootstrapped in specified folder)

  • repository references are extracted and converted to file references
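The file renaming implied by the list above (kjb to hwf, ktr to hpl) can be sketched in Python. This is only an illustration of the naming convention, not Hop's import code; the function name `hop_file_name` and the sample paths are hypothetical, and the sketch assumes that files with any other extension keep their name.

```python
from pathlib import PurePosixPath

# Extension mapping taken from the list above.
# Assumption: every other file type keeps its original name.
EXTENSION_MAP = {".kjb": ".hwf", ".ktr": ".hpl"}


def hop_file_name(kettle_path: str) -> str:
    """Return the file name a Kettle/PDI file would be expected to get in Hop."""
    path = PurePosixPath(kettle_path)
    new_suffix = EXTENSION_MAP.get(path.suffix.lower())
    if new_suffix is None:
        # Not a job or transformation: leave the path untouched.
        return str(path)
    return str(path.with_suffix(new_suffix))


if __name__ == "__main__":
    for name in ["etl/load_sales.kjb", "etl/clean_customers.ktr", "data/input.csv"]:
        print(name, "->", hop_file_name(name))
```

A helper like this can be handy for checking, before an import, which references to .kjb and .ktr files in scripts or schedulers will need updating.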

Known limitations

  • no connection cleanup: when several database connections share the same name but have different configurations, only one copy is kept.

  • no metastore import
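Because only one copy of same-named connections is kept, it can be useful to look for such conflicts before importing. The following Python sketch is one possible pre-check, not part of Hop. It assumes that each job or transformation file is XML whose root element has direct `<connection>` children with child elements such as `<name>`, `<server>`, `<type>`, `<database>`, `<port>` and `<username>`; that layout, the field list, and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed child elements that describe a connection's configuration.
FIELDS = ("server", "type", "database", "port", "username")


def connection_signatures(xml_text):
    """Map connection name -> configuration tuple for one job/transformation."""
    root = ET.fromstring(xml_text)
    signatures = {}
    # findall("connection") only matches direct children of the root,
    # so connection *references* nested inside steps or entries are ignored.
    for conn in root.findall("connection"):
        name = (conn.findtext("name") or "").strip()
        if not name:
            continue
        signatures[name] = tuple((conn.findtext(f) or "").strip() for f in FIELDS)
    return signatures


def find_conflicts(files):
    """Return the names of connections defined with more than one configuration.

    `files` maps a file name to its XML content.
    """
    seen = defaultdict(set)
    for xml_text in files.values():
        for name, signature in connection_signatures(xml_text).items():
            seen[name].add(signature)
    return {name for name, sigs in seen.items() if len(sigs) > 1}
```

Any name this returns is a connection worth reviewing by hand after the import, since only one of its configurations will be kept.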

Usage

To import your Kettle/PDI projects into Hop, select File → Import from Kettle/PDI or press CTRL-i.


Add your import sources and target in the pop-up dialog that appears:

Import Dialog

The options in this dialog are:

| Option | Description | Optional |
|---|---|---|
| Import From | The folder to import Kettle/PDI jobs and transformations from | No |
| Import in existing project | Check to import into an existing project, uncheck to import into a folder | No |
| Import in project | Dropdown list of available projects to import the Kettle/PDI project into | Conditional |
| Import to folder | Path to import the Kettle/PDI project to. All imported items will be imported into a Hop project in this folder. | Conditional |
| Path to kettle.properties | Path to a kettle.properties file. All properties in this file will be imported as variables in the Hop project. | Yes |
| Path to shared.xml | Path to a shared.xml file. All database connections in this file will be imported as Hop relational database connection metadata objects in the specified Hop project or folder. | Yes |
| Path to jdbc.properties | Path to a jdbc.properties file. All Kettle/PDI JNDI database connections in this file will be imported as Hop (generic) relational database connection metadata objects in the specified Hop project or folder. | Yes |
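To get a feel for which variables a kettle.properties file would contribute to the Hop project, the file can be read as plain key/value pairs. The Python sketch below is a simplified reader written for illustration, not the parser Hop uses: it assumes one property per line, treats lines starting with `#` or `!` as comments, and ignores Java properties features such as line continuations and escape sequences. The function name `parse_properties` and the sample values are hypothetical.

```python
def parse_properties(text):
    """Parse simple 'key=value' / 'key:value' lines into a dict of variables."""
    variables = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#!":
            continue
        # Split on whichever separator ('=' or ':') comes first.
        positions = [p for p in (line.find("="), line.find(":")) if p != -1]
        if not positions:
            variables[line] = ""  # a key without a value
            continue
        cut = min(positions)
        variables[line[:cut].strip()] = line[cut + 1:].strip()
    return variables


if __name__ == "__main__":
    sample = """
    # connection settings (made-up values)
    DB_HOST=localhost
    DB_PORT = 5432
    DB_URL=jdbc:postgresql://localhost/dwh
    """
    for key, value in parse_properties(sample).items():
        print(f"{key} = {value}")
```

Comparing this list with the variable count in the migration summary is a quick sanity check after the import.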

After entering your import details, click the 'Import' button.

The import usually takes only a couple of seconds, even for large projects. When it finishes, you’ll be presented with a migration summary:

Import Report

The migration summary shows:

  • number of jobs

  • number of transformations

  • number of other files

  • number of variables

  • number of database connections

Only migrated items will be shown. Items that were not available in the specified folders or files for this import will not be shown.

When multiple database connections with the same name but different configurations were found (see 'Known limitations'), a connections.csv file will be created in the project folder. This file contains a list of all jobs and transformations, with the connections they use.

Import from the CLI

The hop-import.sh/bat CLI tool lets you import Kettle/PDI projects from the command line.

Check the hop-import page for details.