Solr4.0
Introduction
In many cases, documents have relationships between them and it is too expensive to denormalize them. Thus, a join operation is needed. Preserving the document relationship allows documents to be updated independently without having to reindex large numbers of denormalized documents.
Input Parameters
Joins are processed using Solr's LocalParams syntax. The query typically looks like: q={!join from=manu_id_s to=id}ipod
The {!join} prefix selects the join QueryParser(Plugin). The from and to parameters then name the fields to join on, which defines the foreign key relationship.
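As a minimal sketch of how a client might assemble such a request, the Python snippet below builds the q parameter from the from field, the to field, and the inner query, then URL-encodes it. The helper name join_query and the host and path in SELECT_URL are illustrative assumptions, not part of Solr.

```python
from urllib.parse import urlencode

# Assumed location of a local Solr instance; adjust to your setup.
SELECT_URL = "http://localhost:8983/solr/select"


def join_query(from_field, to_field, inner_query):
    """Hypothetical helper: wrap inner_query in {!join} LocalParams."""
    return "{!join from=%s to=%s}%s" % (from_field, to_field, inner_query)


# Same query as in the text: products matching "ipod", joined to manufacturers.
q = join_query("manu_id_s", "id", "ipod")

# urlencode escapes the braces, "!", "=" and spaces so the URL is safe to send.
url = SELECT_URL + "?" + urlencode({"q": q})

print(q)
print(url)
```

Encoding the parameter this way avoids hand-escaping the LocalParams syntax, which is easy to get wrong when the query is pasted directly into a URL.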
Examples
In the example data, all documents have a unique "id" field, but documents modeling products also have a "manu_id_s" which is essentially a "foreign key" to the "id" of the associated manufacturer doc.
- Find all product docs matching "ipod", then join them against (manufacturer) docs and return the list of manufacturers that make those products
- Find all manufacturer docs named "belkin", then join them against (product) docs and return the list of products produced by that manufacturer
- Find all manufacturer docs named "belkin", then join them against (product) docs and filter that list to only products with a price less than 12 dollars
- Find all products matching ipod (sorted by score) and filter that by the set of products produced by joining manufacturers named "Belkin" or "Apple"
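The four cases above might translate into requests along the following lines. This is a sketch only: the host and path in SELECT_URL, the manufacturer name field compName_s, and the price field are assumptions about the example data, and the helper select_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed location of a local Solr instance; adjust to your setup.
SELECT_URL = "http://localhost:8983/solr/select"


def select_url(**params):
    """Hypothetical helper: build a URL-encoded select request."""
    return SELECT_URL + "?" + urlencode(params)


example_urls = [
    # 1. Products matching "ipod" -> manufacturers that make them.
    select_url(q="{!join from=manu_id_s to=id}ipod"),
    # 2. Manufacturer named Belkin -> products it produces.
    #    compName_s is an assumed field holding the manufacturer name.
    select_url(q="{!join from=id to=manu_id_s}compName_s:Belkin"),
    # 3. Same join, filtered by price. The range below has an inclusive
    #    upper bound, so it approximates "less than 12".
    select_url(
        q="{!join from=id to=manu_id_s}compName_s:Belkin",
        fq="price:[* TO 12]",
    ),
    # 4. Products matching "ipod", sorted by score, with the join used
    #    as a filter on the manufacturer.
    select_url(
        q="ipod",
        fl="*,score",
        sort="score desc",
        fq="{!join from=id to=manu_id_s}compName_s:(Belkin Apple)",
    ),
]

for url in example_urls:
    print(url)
```

In the last case the join sits in fq rather than q, so the main query still produces relevance scores for sorting while the join only restricts the result set.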
Compared To SQL
For people who are used to SQL, it's important to note that Joins in Solr are not really equivalent to SQL Joins because no information about the table being joined "from" is carried forward into the final result. A more appropriate SQL analogy would be an "inner query".
This Solr request...
/solr/collection1/select ? fl=xxx,yyy & q={!join from=inner_id to=outer_id}zzz:vvv
Is comparable to this SQL statement...
SELECT xxx, yyy FROM collection1 WHERE outer_id IN (SELECT inner_id FROM collection1 WHERE zzz = 'vvv')
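To make the "inner query" behaviour concrete, the following Python sketch emulates the join over an in-memory list of documents. It illustrates the semantics only and is not how Solr implements the join; the function name join and the sample documents are invented for this example.

```python
def join(docs, from_field, to_field, matches):
    """Emulate {!join from=from_field to=to_field} over a list of dicts.

    matches is a predicate standing in for the inner query.
    """
    # Step 1: run the inner query and collect the "from" field values.
    keys = {d[from_field] for d in docs if from_field in d and matches(d)}
    # Step 2: return the documents whose "to" field is one of those values.
    # Nothing from the "from" documents is carried into the result.
    return [d for d in docs if d.get(to_field) in keys]


# Invented sample data: two manufacturers and three products.
docs = [
    {"id": "apple", "name": "Apple Computer Inc."},
    {"id": "belkin", "name": "Belkin International"},
    {"id": "IPOD1", "name": "iPod 60GB", "manu_id_s": "apple"},
    {"id": "CABLE1", "name": "iPod dock cable", "manu_id_s": "belkin"},
    {"id": "HDD1", "name": "hard drive", "manu_id_s": "maxtor"},
]

# Products whose name contains "ipod", joined to their manufacturers.
result = join(docs, "manu_id_s", "id", lambda d: "ipod" in d["name"].lower())
print([d["id"] for d in result])  # prints ['apple', 'belkin']
```

The result contains only manufacturer documents, even though the match was made on product documents, which mirrors the `IN (SELECT ...)` form of the SQL statement.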
Limitations
- Fields or other properties of the documents being joined "from" are not available for use in processing of the resulting set of "to" documents (i.e., you cannot return fields in the "from" documents as if they were a multivalued field on the "to" documents)
- The Join query produces constant scores for all documents that match -- scores computed by the nested query for the "from" documents are not available to use in scoring the "to" documents
- In a DistributedSearch environment, you cannot Join across cores on multiple nodes. If, however, you have a custom sharding approach, you could join across cores on the same node.
Quick Start
NOTE: The described additions to the "browse" screen are currently dependent on SOLR-2502
Follow the Tutorial at http://lucene.apache.org/solr/tutorial.html to get set up
Point your browser at http://localhost:8983/solr/browse?&queryOpts=join
- Fill in your query and the names of two fields to join on, for example From: manu_id and To: id (join between the manu_id on the products and the id on the manufacturers)
- Submit -- Notice the results are of the manufacturers who make those items and not of the products themselves even though the match is on the products