Fusion 3.0.0: How to Use The JDBC Lookup Index Pipeline Stage


Database lookups in the middle of ingesting content: reasonable request or abhorrent behavior? The cat tries to vote by breaking the radioactive vial, but isn’t alive long enough to vote.

In the meantime, there are 2 ways to make a database call from either the index pipeline or the query pipeline. While the process is very very (that’s 2 verys) similar I wouldn’t assume the logic works exactly the same until you can put a quantum lock on future behavior.

You would use the JDBC Lookup Index Pipeline stage or the JDBC Lookup Query Pipeline stage. The cat prefers the index pipeline. We’ll discuss that one.

The Short Story

  1. Manually copy the appropriate JDBC driver jar file to 2 folders in Fusion:
    • $FUSION/apps/jetty/api/lib/ext
    • $FUSION/apps/jetty/connectors/lib/ext
  2. Go to your index pipeline and add the JDBC Lookup stage
  3. Enter a SQL statement into the PreparedStatement field with question marks if you need the SQL statement to use a field from the inbound document
  4. Enter one or more PreparedStatementKeys if you have any question marks in the above entered SQL statement.

The Longer Story

A great example needs a great story. I didn’t have one so I settled on using 2 CSV files instead.

For the purposes of this exercise we will use these 2 CSV files for the following:

  • One for our input (so we can trigger the JDBC stage at index time)
  • Another CSV file that we can read using a JDBC driver.

What? Use a JDBC driver to read a CSV file? Heresy! Well, maybe not heresy since I haven’t burned anyone at the stake since the cat gave me up to the Protocol Police for not using SSL.

Please download the JDBC driver for CSV files from Source Forge (the CsvJdbc – CSV file JDBC Driver). Store it somewhere you can get to in a few lines.

Create 2 CSV files:
input.csv:

name,rank_number
Steve Rogers,1
Billy Batson,2

table/rank-table.csv:

rank_number,rank
1,Captain
2,Captain Marvel

We will join the inbound doc (input.csv) with database table (rank-table.csv) using the following SQL:

select rank from rank-table where rank_number=[rank_number from the current inbound document]

Let’s start with a clean slate:

Create a collection (Lucidworks -> DevOps -> Collection Manager -> New) and call it jdbc-index-test

Figure 1 – Create a Collection

Create a Local Filesystem datasource (Lucidworks -> Search -> jdbc-index-test -> Datasources -> Add -> Local Filesystem):

  • Datasource ID: jdbc-index-test-ds
  • Pipeline ID:  jdbc-index-test-default
  • Parser: No Parser
  • StartLinks: [path to where you have input.csv]

Don’t worry, we’ll mess with those in a second. Press Save.

Figure 2 – Create a datasource

Time for configuration hygiene:

Click Go to JDBC-INDEX-TEST-DEFAULT to open the Index Pipeline panel. Delete the following stages:

  • Field Mapping
  • Solr Dynamic Field Name Mapping

You do that by selecting the stage and pressing Remove Stage. Your pipeline will now contain only 1 stage: the Solr Indexer. Nothing better than an uncluttered pipeline.

Figure 3 – An Uncluttered Pipeline (the best kind)

Close the Index Pipeline panel (press the X over at the top right hand corner of the panel).

Next, from the Parser drop down list select default. Press Open Default Parser.

We are not going to change the default parser. We are going to create a new one and use the new one with our datasource.

In the Parser panel press Add +. Enter:

Parser ID: csv-parser

Delete all the parsers except CSV.

Figure 4 – The Lonely CSV Parser in csv-parser

Press Save on the Parser panel to save your new Parser chain.

Close the Parser panel (press the X in the upper right hand corner of the panel).

Select the csv-parser from the Parser drop down list.

Press Save to save the new configuration.

Figure 5 – The New! Improved! jdbc-index-test-ds datasource configuration

At this point, if you wanted to (I will not force you to do things you don’t want to do), you could run the datasource and you should see the 2 docs from the input.csv file in the index through the Query Workbench. If you do that, and you are happy with the results, please press Clear Datasource. But you don’t need to do that.

Figure 6 – Something you should probably do, but don’t need to do

All of that preparation was so we could finally add the JDBC Index stage and configure it to do our bidding. After all of the above prep, we can now jump to the Short Story section of the evening.

Somewhere you downloaded the CsvJdbc driver. Copy it to the following 2 folders (where $FUSION is where you installed Fusion):

  • $FUSION/apps/jetty/api/lib/ext
  • $FUSION/apps/jetty/connectors/lib/ext

Restart Fusion or else Fusion will not recognize the jar files.

Open the Index Pipeline panel, and select your jdbc-index-test-default pipeline:

Press Add a New Pipeline Stage and select the JDBC Lookup stage

For the purposes of this exercise we are using the CsvJdbc driver so the following configuration reflects that:

  • JDBC Driver: org.relique.jdbc.csv.CsvDriver
  • Connection URI: jdbc:relique:csv:/home/search/Downloads/content/jdbc-test/table
  • Username: [enter anything you want]
  • Password: [enter anything you want]
  • SQL Prepared Statement: select rank from rank-table where rank_number=?
  • Join with Document: check
  • Prepared Statement Keys: rank_number

Press Save to save the configuration you have slaved over.

Figure 7: Configuration of the JDBC Lookup stage

So what does all this mean?

The JDBC driver name is the full name of the CsvJdbc driver class.

The connection URL is to a folder where the CSV file can be found. Don’t use a specific file name of the driver will not work.

Since you are not really connecting to a database you don’t need a username/password, but Fusion still wants one so either enter whatever, or just enter a blank space in each field.

The SQL Prepared Statement table name (jdbc) is the name of the CSV file without the extension.

The SQL statement uses the name of the column from the CSV file that we want to add to the document (in this case rank).

The Prepared Statement Keys is the field whose value we want substituted in the Prepared Statement where the question mark is (more about JDBC Prepared Statements here). In this case, we want to search using rank_number from the inbound documents.

Run the datasource. Make sure the SQL is correct. Make sure the field names are correct. Hell, make sure all of the above configurations are correct.

Look for a field called rank_1. It will contain the information from the CSV file.

Figure 8 – Steve Rogers and Billy Batson get their ranks

Okay, so we probably did too many things at once. A Best PracticeTM would be:

  1. Configure the datasource to ingest input.csv
  2. Run the datasource and confirm that it was indexed
  3. Configure the JDBC Index stage is configured properly
  4. Run the datasource again and confirm that not only was input.csv indexed, but the new rank field was added to the document with the proper value
  5. Have a party

In fact, that is how i wrote this blog. One step at a time because I am a ‘fraidy cat.

If you were so industrious that you decided to monitor connectors.log (and, really, who doesn’t?) you might have seen (okay, you would have seen):

WARN [com.lucidworks.connectors.ConnectorJob, id=jdbc-index-test-ds:ConnectorsCallback@66] - {pipelineId=jdbc-index-test-default, stageId=k41yg7cilg8yf1or} - Error in jdbc-index-lookup, docId=null
java.sql.SQLException: Unable to bind PreparedStatement variables: 1 key: rank_number

The log file is pretty much lying to you as the bind worked and the inbound docs have the field you requested in the SQL properly joined with the PreparedStatementKey. Just sayin’.

The cat approves.

Disclosures

Carlos Valcarcel is a full time employee of LucidWorks, but lives in New York as he prefers hurricanes to earthquakes. Having worked at IBM, Microsoft, and Fast Search and Transfer the only thing he is sure of is that the font editor he wrote on his Atari 800 was the coolest program he has ever written. While questions can be a drag he admits that answers will be harder to give without them.

The cat isn’t real, but then neither are you. Enjoy your search responsibly.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s