LWS: Using SolrXML To Crassly Manipulate Solr


As I mentioned in the previous post, SolrXML isn’t just for sending documents into Solr. It is also for sending messages to Solr about things you would like it to do: add one or more documents, update one or more documents, or delete one or more documents (there are a few other messages you can send, but why quibble over details?).

Let’s look at these in turn.

If you want to follow along then do the following (otherwise jump down to Adding one or more documents):

  • Create a test-content folder in a location that you can find at a moment’s notice (just in case the police come in).
  • Create 3 folders under test-content: add, update, delete.
    • In add copy add-solr.xml (located below)
    • In update copy add-solr.xml and update-solr.xml (located below)
    • In delete copy add-solr.xml and delete-solr.xml (located below)
  • Create 3 collections and call them add-solrxml, update-solrxml, and delete-solrxml
    • In add-solrxml:
      • Create a SolrXML data source called add-solrxml and point it to the add folder (leave the Include paths alone).
    • In update-solrxml:
      • Create a SolrXML data source called add-solrxml and point it to the add-solr.xml file (NOT the folder. Also leave the Include paths alone).
      • Create a SolrXML data source called update-solrxml and point it to the update-solr.xml file (NOT the folder. Also leave the Include paths alone).
    • In delete-solrxml:
      • Create a SolrXML data source called add-solrxml and point it to the add-solr.xml file (NOT the folder. Also leave the Include paths alone).
      • Create a SolrXML data source called delete-solrxml and point it to the delete-solr.xml file (NOT the folder. Also leave the Include paths alone).
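If you would rather script the folder part of that list, here is a minimal sketch. It assumes you have already saved the XML files from The Files section into a single source directory; the name build_test_content and its two parameters are my own invention and have nothing to do with LucidWorks Search itself. The collections and data sources still have to be created by hand.

```python
import shutil
from pathlib import Path

# Which files go into which folder, per the steps above.
LAYOUT = {
    "add": ["add-solr.xml"],
    "update": ["add-solr.xml", "update-solr.xml"],
    "delete": ["add-solr.xml", "delete-solr.xml"],
}


def build_test_content(source_dir, root="test-content"):
    """Create root/add, root/update and root/delete and copy the XML files in.

    source_dir is assumed to hold the XML files. Files that are missing
    are skipped and returned as a list instead of raising an error.
    """
    source_dir = Path(source_dir)
    root = Path(root)
    missing = []
    for folder, names in LAYOUT.items():
        target = root / folder
        target.mkdir(parents=True, exist_ok=True)
        for name in names:
            src = source_dir / name
            if src.is_file():
                shutil.copy(src, target / name)
            else:
                missing.append(f"{folder}/{name}")
    return missing
```

Calling build_test_content(".") from the directory that holds the XML files creates test-content next to them and returns the list of any files it could not find.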

Adding one or more documents

So the basic template looks like this:

<add overwrite="[true | false]" commitWithin="[millisecond value]">
  <doc boost="[float value]">
    <field name="[field name]" update="[set | add]" boost="[float value]">[field value]</field>
    ...
  </doc>
  ...
</add>
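If typing angle brackets is not your idea of fun, the template can be filled in programmatically. This is a rough sketch using Python's standard xml.etree.ElementTree module; the helper name build_add and its parameters are hypothetical, and it leaves out the boost and update attributes to stay short.

```python
import xml.etree.ElementTree as ET


def build_add(docs, overwrite=None, commit_within=None):
    """Return an <add> message as a string.

    docs is a list of dicts mapping field name to field value.
    overwrite and commit_within are only written when they are given.
    """
    add = ET.Element("add")
    if overwrite is not None:
        add.set("overwrite", "true" if overwrite else "false")
    if commit_within is not None:
        add.set("commitWithin", str(commit_within))
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


message = build_add(
    [{"id": 1, "title": "This is the permanent title"}],
    overwrite=True,
    commit_within=5000,
)
print(message)
```

ElementTree escapes characters such as & and < in field values, which is one less thing to get wrong by hand.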

Go to the dashboard for the add-solrxml collection. Start the crawl.

You should expect to see the same sort of results we saw in the last blog post: the 3 documents go in and all is right with the world.

1-add-results

Simple. Too painfully obviously incredibly simple. Something has to be done about that.

Updating one or more documents

So the basic template looks like this (do you feel like you are in Groundhog Day?):

<add overwrite="[true | false]" commitWithin="[millisecond value]">
  <doc boost="[float value]">
    <field name="[field name]" update="[set | add]" boost="[float value]">[field value]</field>
    ...
  </doc>
  ...
</add>

Go to the dashboard for the update-solrxml collection. Start the add-solrxml crawl and then start the update-solrxml crawl.

Check the results. They should differ from the first example: the update-solr.xml file changes the second document’s body (don’t let the order of the results fool you).
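Since the add and update templates look identical, it may help to see what the update attribute changes. The following is a toy in-memory model of the behavior described in this post, not a description of how Solr does it internally: apply_add and the dict-based index are hypothetical, and the model only understands the id field plus the set and add values.

```python
import xml.etree.ElementTree as ET


def apply_add(index, message):
    """Apply an <add> message to index, a dict of id -> {field: value}."""
    for doc in ET.fromstring(message).iter("doc"):
        fields = doc.findall("field")
        doc_id = next(f.text for f in fields if f.get("name") == "id")
        # A doc is treated as a partial update if any field has update="...".
        partial = any(f.get("update") for f in fields)
        # Partial update: start from the stored doc. Plain add: start empty.
        stored = dict(index.get(doc_id, {})) if partial else {}
        for f in fields:
            name, mode = f.get("name"), f.get("update")
            if mode == "add" and name in stored:
                # "add" appends to whatever value is already there.
                old = stored[name]
                stored[name] = (old if isinstance(old, list) else [old]) + [f.text]
            else:
                # "set", or no update attribute: overwrite the field.
                stored[name] = f.text
        index[doc_id] = stored
    return index


index = {}
apply_add(index, """<add><doc>
  <field name="id">2</field>
  <field name="title">This second title will not change</field>
  <field name="body">This body will change.</field>
</doc></add>""")
apply_add(index, """<add><doc>
  <field name="id">2</field>
  <field name="body" update="set">I have a better body now that I have been updated.</field>
</doc></add>""")
print(index["2"])
```

In this model the title survives the second message because only body carries an update attribute; without that attribute the second message would replace the whole document and the title would be gone.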

2-update-results

Deleting one or more documents

So the basic template looks like this (do you feel like you are in Groundhog Day?):

<delete>
  <id>[id value]</id>
  ...
  <!-- AND/OR -->
  <query>[query string]</query>
  ...
</delete>
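The delete template can be generated the same way. Another small sketch with xml.etree.ElementTree; build_delete is a made-up helper name, and the query string title:Doomed is only an illustration against the sample documents in this post.

```python
import xml.etree.ElementTree as ET


def build_delete(ids=(), queries=()):
    """Return a <delete> message with one <id> or <query> child per entry."""
    delete = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(delete, "id").text = str(doc_id)
    for query in queries:
        ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


message = build_delete(ids=[3], queries=["title:Doomed"])
print(message)
# prints: <delete><id>3</id><query>title:Doomed</query></delete>
```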

Go to the dashboard for the delete-solrxml collection. Start the add-solrxml crawl and then start the delete-solrxml crawl.

Check the results. There should only be two documents left.

3-delete-results

Adding and deleting (gasp!) at the same time

So the basic template looks like this (do you feel like you are in Groundhog Day?):

<update>
  <add>
  ...
  </add>
  <delete>
  ...
  </delete>
</update>

Yes, I left out the prep steps for this one. It involves creating a folder and placing add-single-solr.xml and add-delete-solr.xml in it, creating a collection, creating a data source that points to that folder, and running the crawl.

4-add-delete-results

There should only be 3 documents in that collection. This is left as an exercise for the reader.
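If you want to check the expected count before doing the exercise for real, here is a toy model of the two files applied in order. It is a sketch under loud assumptions: apply_message and the dict index are hypothetical, the documents are trimmed to their id field, delete-by-query and the update attribute are ignored, and nothing here talks to Solr.

```python
import xml.etree.ElementTree as ET


def apply_message(index, message):
    """Apply an <add>, <delete> or wrapping <update> message to a dict index."""
    root = ET.fromstring(message)
    # Treat a bare <add> or <delete> the same as one wrapped in <update>.
    commands = list(root) if root.tag == "update" else [root]
    for command in commands:
        if command.tag == "add":
            for doc in command.iter("doc"):
                fields = {f.get("name"): f.text for f in doc.findall("field")}
                index[fields["id"]] = fields
        elif command.tag == "delete":
            for doc_id in command.findall("id"):
                index.pop(doc_id.text, None)
    return index


# Trimmed stand-ins for add-single-solr.xml and add-delete-solr.xml.
ADD_SINGLE = '<add><doc><field name="id">4</field></doc></add>'
ADD_DELETE = """<update>
  <add>
    <doc><field name="id">1</field></doc>
    <doc><field name="id">2</field></doc>
    <doc><field name="id">3</field></doc>
  </add>
  <delete><id>4</id></delete>
</update>"""

index = {}
apply_message(index, ADD_SINGLE)
apply_message(index, ADD_DELETE)
print(sorted(index))
```

The model ends up holding ids 1, 2 and 3: document 4 goes in with the first file and is removed by the delete in the second.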

The cat is satisfied. Time for a donut.

The Files

add-solr.xml

<!-- add-solr.xml -->
<add>
  <doc>
    <field name="id">1</field>
    <field name="title">This is the permanent title</field>
    <field name="body">This body will never change.</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="title">This second title will not change</field>
    <field name="body">This body will change.</field>
  </doc>
  <doc>
    <field name="id">3</field>
    <field name="title">This Document is Doomed</field>
    <field name="body">This body is going away.</field>
  </doc>
</add>

update-solr.xml

<!-- update-solr.xml -->
<add>
  <doc>
    <field name="id">2</field>
    <field name="body" update="set">I have a better body now that I have been updated.</field>
  </doc>
</add>

delete-solr.xml

<!-- delete-solr.xml -->
<delete>
  <id>3</id>
</delete>

add-single-solr.xml

<!-- add-single-solr.xml -->
<add>
  <doc>
    <field name="id">4</field>
    <field name="title">The Lost Book</field>
    <field name="body">Please don't delete me!</field>
  </doc>
</add>

add-delete-solr.xml

<!-- add-delete-solr.xml -->
<update>
  <add>
    <doc>
      <field name="id">1</field>
      <field name="title">This is the permanent title</field>
      <field name="body">This body will never change.</field>
    </doc>
    <doc>
      <field name="id">2</field>
      <field name="title">This second title will not change</field>
      <field name="body">This body will change.</field>
    </doc>
    <doc>
      <field name="id">3</field>
      <field name="title">This Document is Doomed</field>
      <field name="body">This body is going away.</field>
    </doc>
  </add>
  <delete>
    <id>4</id>
  </delete>
</update>

References

XML Messages for Updating a Solr Index

Disclosures

Carlos Valcarcel is a full-time employee of LucidWorks, but lives in New York as he prefers hurricanes to earthquakes. Having worked at IBM, Microsoft, and Fast Search and Transfer, the only thing he is sure of is that the font editor he wrote on his Atari 800 was the coolest program he has ever written. While questions can be a drag, he admits that answers will be harder to give without them.

The cat isn’t real, but then neither are you. Enjoy your search responsibly.
