Showing posts with label Solr-net. Show all posts

Monday, June 10, 2013

Solution for making Long GET Request to Solr via SolrNet

Solr exposes REST-style APIs for running various searches against indexed documents. The client generally issues GET requests to Solr with different parameters (fields, rows, facets, etc.) set. Since GET requests are typically subject to size and query-length limits (imposed by the container, OS, etc.), Solr also allows the same queries to be issued to the Solr RequestHandlers as POST requests.

We ran into one such issue with long GET requests to Solr from SolrNet and made a few changes to solve it.

Solr Side Changes:
First, we increased the headerBufferSize of the application server, as explained on SO here, and increased the maxBooleanClauses parameter in solrconfig.xml. This allowed the Solr side to respond to much longer GET requests. The problem, however, wasn't solved: the client was a .NET application running within IIS, which had additional length limitations imposed by Windows and the .NET Framework.

SolrNet Side Changes:
In round two, we went for a better fix and switched over to POST requests in place of long GET requests. The solution is largely the same as mentioned on the SolrNet group here & here. The difference is that we switch over to a POST request from within the Get() method of the SolrConnection.cs class whenever the request string is longer than a configurable threshold value.
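The switching logic can be sketched independently of SolrNet. The snippet below is Python rather than C#, and it is not the actual SolrConnection.cs change; the function name, the URL and the threshold of 2000 characters are assumptions made for illustration only.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical default; the post only says the threshold is configurable.
MAX_GET_LENGTH = 2000


def build_select_request(base_url, params, max_get_length=MAX_GET_LENGTH):
    """Build a GET request for short queries and a form-encoded POST otherwise."""
    query_string = urlencode(params, doseq=True)
    if len(query_string) <= max_get_length:
        # Short enough: keep the parameters in the URL.
        return Request(f"{base_url}?{query_string}", method="GET")
    # Too long for a URL: send the same parameters in the request body.
    return Request(
        base_url,
        data=query_string.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"},
        method="POST",
    )
```

The request object is only built here, not sent. Passing it to urllib.request.urlopen would issue the call against a running Solr instance.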


Update: PostSolrConnection.cs class has made it to the head branch of SolrNet.  

Monday, May 27, 2013

SolrNet Separate Highlighting Query - hl.q

Solr allows highlighting of matched sections in field values.  There are several parameters that can be set by the caller to adjust the highlighting behaviour.

SolrNet, a library to connect to Solr from .NET applications, also exposes HighlightingParameters in its core library. However, only a small subset of the available parameters is currently exposed.

We recently needed to use the hl.q parameter to issue a separate, more specific highlighting query to Solr. The workaround was to use the ExtraParams option from the base CommonQueryOptions class.

The same approach could be used for any of the other parameters not exposed by SolrNet, such as hl.boundaryScanner, per-field highlighting, maxScan, etc.; essentially all the 3.5x onward features mentioned on the Solr Highlighting wiki.
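At the HTTP level, an extra parameter such as hl.q ends up as one more key in the query string next to the regular highlighting parameters. Here is a minimal Python sketch of that merge; the helper name, the field names and the query values are made up for illustration and do not come from SolrNet.

```python
from urllib.parse import urlencode


def with_extra_params(params, extra_params):
    """Return a copy of the base parameters with pass-through parameters merged in."""
    merged = dict(params)
    merged.update(extra_params)
    return merged


# Hypothetical main query with highlighting on a "content" field.
base_params = {"q": "title:solr", "hl": "true", "hl.fl": "content"}
# Parameters the client library does not expose directly.
extra_params = {"hl.q": "content:highlighting"}

query_string = urlencode(with_extra_params(base_params, extra_params))
print(query_string)
```

Any other parameter that is not exposed could be passed through the same dictionary.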

Friday, March 1, 2013

Atomic Updates via SolrNet

As of today, the SolrNet API doesn't offer a way to issue atomic updates to a running Solr server. While SolrNet is supposed to offer this feature sometime in the future, the following alternative can be used in the interim.

1. Build a custom atomic update XML message:


(See: http://wiki.apache.org/solr/UpdateXmlMessages for more details)

2. Get hold of the connection object (via ServiceLocator):


3. Issue a call to Solr via the connection object:


Sample code snippets will be added soon.
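In the meantime, steps 1 and 3 can be sketched at the HTTP level. The snippet is Python, not SolrNet code, so step 2 (getting the connection via ServiceLocator) has no counterpart here. The function names, the field names and the commit=true parameter are assumptions; the update="set" attribute is the modifier syntax Solr uses for atomic updates in XML messages.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request


def build_atomic_update(doc_id, changes, id_field="id"):
    """Build an <add> message that modifies only the given fields.

    `changes` maps a field name to a (modifier, value) pair, where the
    modifier is one of Solr's atomic update operations such as "set",
    "add" or "inc".
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    id_element = ET.SubElement(doc, "field", name=id_field)
    id_element.text = str(doc_id)
    for field_name, (modifier, value) in changes.items():
        field = ET.SubElement(doc, "field", name=field_name, update=modifier)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_update_request(solr_url, xml_message):
    """Wrap the XML message in a POST request to the update handler."""
    return Request(
        f"{solr_url}/update?commit=true",
        data=xml_message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical document id and field.
message = build_atomic_update("123", {"price": ("set", 99)})
request = build_update_request("http://localhost:8983/solr", message)
```

In SolrNet, the equivalent of the last step would be posting the same XML string through the connection object obtained in step 2.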

Friday, February 15, 2013

Solr Cell, Tika And Pages

With Solr Cell, which is built on Tika, you can index content from within a wide range of digital files such as PDF, Office and text files.

Tika, however, doesn't offer any demarcation of page boundaries out of the box. So you can search for content matches within a file, but not for specific pages within that file.

Among the several ways to solve this problem, one is to index each page of the file as a separate document in Solr and apply field collapsing/result grouping to the search results, grouping by a common file identifier shared by all pages of the file.
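As a rough Python sketch, the grouped search could be expressed with Solr's result grouping parameters (group, group.field, group.limit). The function name, the field names text and file-id, and the limit of 3 pages per file are assumptions for illustration.

```python
from urllib.parse import urlencode


def grouped_page_query(text, file_id_field="file-id", pages_per_file=3):
    """Build parameters that search page documents and group the hits by file."""
    return {
        "q": f"text:{text}",
        "group": "true",                 # turn on result grouping
        "group.field": file_id_field,    # one group per file
        "group.limit": pages_per_file,   # top matching pages returned per file
    }


params = grouped_page_query("abc")
query_string = urlencode(params)
```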

Since result grouping can carry a performance overhead, another way is to index the combined file as one Solr document (of type Combined) and each page as a separate Solr document (of type Page), all sharing a common file identifier. The search is then performed first against the combined documents (type:combined AND text:abc) to identify the files that match, and then against the corresponding page documents (type:page AND file-id:123 AND text:abc) to identify the pages.
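The two-step lookup can be sketched in Python as follows. The query strings follow the examples above; the function names and the search callable (anything that takes a query string and returns a list of matching documents as dictionaries) are hypothetical stand-ins for a real Solr client.

```python
def combined_query(text):
    """Query for step one: whole-file documents that match."""
    return f"type:combined AND text:{text}"


def page_query(file_id, text):
    """Query for step two: matching pages within one file."""
    return f"type:page AND file-id:{file_id} AND text:{text}"


def find_pages(search, text):
    """Return a mapping of file id to the page documents that match `text`.

    `search` is a caller-supplied function: query string in, list of
    document dictionaries out.
    """
    pages_by_file = {}
    for doc in search(combined_query(text)):
        file_id = doc["file-id"]
        pages_by_file[file_id] = search(page_query(file_id, text))
    return pages_by_file
```

This issues one extra query per matching file, so in practice the first step would probably be limited to the files shown on the current results page.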