Monday, September 17, 2012

Workaround for Copy Command from WebHDFS

At the moment the WebHDFS api doesn't offer the Copy command. As a result, the client ends up having to download the file to the local disk and re-upload the files via the Create command. Since this ends up being a lot of round trips all the way to the client (typically a non Java based client) the following workaround can be set up to partly alleviates the problem.

Set up a HDFS Webdav server on one of the DN or NN boxes. Issue the Copy command to the Webdav server via a REST call. Free up the client application, while letting the Webdav server with much better connectivity & proximity to the HDFS complete the Copy command request.

Wednesday, September 12, 2012

Remotely Debug Solr Cloud in Eclipse Using JPDA, JDWP, JVMTI & JDI


The acronym's first:
JPDA - Java Platform Debug Architecture
JDWP - Java Debug Wire Protocol
JVMTI - JVM Tool Interface
JDI - Java Debug Interface

To debug any of the open source Java projects such as Solr using Eclipse, rely on the JDWP feature available within any standard JVM. You can get a lot more info about the terms and architecture here.

At a high level the concept is that there is a JVM to be debugged (Solr) & a client side JVM debuggee (Eclipse). The two communicate over the JDWP. Thanks to a standardized wire protocol the client may even be a non JVM application which subscribes to the protocol.

One of the two JVMs acts as the debugging server (the one that waits for the client to connect). The other JVM acts as the debugging client which connects to the debugger server, to start the debugging process.

In our case, to keep things simple let Solr be the debugger server, while Eclipse can be the debugger client. The configurations then are as follows.

On Solr side (assuming Solr Cloud):

java -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=8000 -Djetty.port=7200 -Dhost=myhost -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -Djava.util.logging.config.file=etc/logging.properties -DnumShards=3 -DzkHost=zk1:2171 -jar start.jar

Note: Since we have set suspend = y, Solr side will stay suspended until the Eclipse debugger client has connected

On Eclipse side:
Go to Run > Debug Configurations > Remote Java Application
Then choose Standard Socket Attach. Host: localhost (or IP). Port: 8000 (the same as set above)

Also in Eclipse you should have checked out the Solr source code from Solr trunk as a project. This will allow you to put break points at appropriate location to help with the debugging. So go on give this a shot and happy debugging!