Friday, August 10, 2012

REST-based integration with HDFS via WebHDFS

WebHDFS offers a perfectly good set of REST APIs that any application can use to integrate with HDFS. This can be particularly advantageous for applications written in languages and frameworks other than Java, such as Rails, .NET and so on.
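To give a feel for what such a call looks like, here is a minimal Python sketch that builds a WebHDFS URL and reads a file with the OPEN operation. The host name, port 50070 and file path are placeholders and not values from our setup, and the helper names webhdfs_url and read_file are made up for illustration. It assumes WebHDFS is enabled on the cluster and that no authentication is needed.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host, port, path, op, **params):
    """Build a URL of the form http://host:port/webhdfs/v1/<path>?op=<OP>&..."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def read_file(host, port, path, chunk_size=64 * 1024):
    """Read a whole HDFS file via op=OPEN.

    The NameNode answers with a redirect to a DataNode; urlopen follows it.
    chunk_size is only the client-side read size (an arbitrary default here).
    """
    url = webhdfs_url(host, port, path, "OPEN")
    data = bytearray()
    with urlopen(url) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            data.extend(chunk)
    return bytes(data)


if __name__ == "__main__":
    # Placeholder host, port and path: adjust to your own cluster.
    print(webhdfs_url("namenode.example", 50070, "/user/demo/file.txt", "OPEN"))
```

Any language with an HTTP client can do the same thing, which is the whole appeal: nothing Hadoop-specific has to be installed on the client.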

Within our LAN, on commodity desktop-class boxes with 2.5 GHz processors and 8 GB of RAM, and with a replication factor of 2, we saw read/write speeds of about 27 Mbps via WebHDFS. This was only a shade slower than the 30 Mbps that we were getting via raw file transfers between the same DataNodes (DN).

Another observation was that our best transfer rates were achieved by setting the buffer size to 22K. We played around with several other buffer size values, but found 22K to be the magic number. We are still hoping to find some logical explanation for this observation.
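For anyone who wants to repeat the experiment, a buffer size sweep can be sketched roughly as follows. This is not the harness we used; the function names are made up for illustration, and transfer stands for any callable of your own that moves a file using the given buffer size and returns the number of bytes moved. The rate is computed in megabits per second; drop the factor of 8 if you prefer megabytes per second.

```python
import time


def measure_mbps(transfer, buffer_size):
    """Time one call of transfer(buffer_size) and return megabits per second.

    transfer must return the number of bytes it moved.
    """
    start = time.perf_counter()
    n_bytes = transfer(buffer_size)
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        return float("inf")
    return (n_bytes * 8) / (elapsed * 1_000_000)


def sweep_buffer_sizes(transfer, sizes, repeats=3):
    """Return {buffer_size: best rate in Mbps seen over `repeats` runs}."""
    results = {}
    for size in sizes:
        results[size] = max(measure_mbps(transfer, size) for _ in range(repeats))
    return results


if __name__ == "__main__":
    # Stand-in transfer that only pretends to move 1 MB; replace it with a
    # real WebHDFS read or write that honours the buffer size.
    def fake_transfer(buffer_size):
        time.sleep(0.01)
        return 1_000_000

    for size, rate in sweep_buffer_sizes(fake_transfer, [4096, 22528, 65536]).items():
        print(size, round(rate, 1))
```

With WebHDFS, the buffer size can be passed as the buffersize query parameter on the OPEN, CREATE and APPEND operations, so a real transfer callable would only need to add that parameter to its request URL. Repeating each size a few times helps separate a genuine sweet spot from network noise.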