Saturday, October 26, 2013

Be Hands On

For as long as you can. Think of specialists (surgeons, pilots, etc.) who get better by clocking more hours at their art - doing, practicing, persevering.

Sunday, October 20, 2013

General Availability (GA) for Hadoop 2.x

The Hadoop 2.x GA is nothing less than a big leap forward. Most of the features released, such as YARN (a pluggable resource management framework), NameNode HA, and HDFS Federation, were long awaited. As per the official mail to the community, this release includes:

"To recap, this release has a number of significant highlights compared to Hadoop 1.x:
        • YARN - A general purpose resource management system for Hadoop to allow MapReduce and other data processing frameworks and services
        • High Availability for HDFS
        • HDFS Federation
        • HDFS Snapshots
        • NFSv3 access to data in HDFS
        • Support for running Hadoop on Microsoft Windows
        • Binary Compatibility for MapReduce applications built on hadoop-1.x
        • Substantial amount of integration testing with rest of projects in the ecosystem

 Please see the Hadoop 2.2.0 Release Notes for details."


Also as per the official email to the community, users are encouraged to move forward to the 2.x branch, which is more stable and backward compatible.

Tuesday, October 1, 2013

Need Support to Lift with Confidence

Brace yourself, some terminology is coming your way...

Support: A measure of the prevalence of an event x in a given set of N data points, i.e. the fraction of the N data points in which x occurs. Support is effectively a first-level indicator of something occurring frequently enough (say more than 10% of the time) to be of interest.

In the case of two correlated events x & y,

Confidence: A measure of how predictably two events occur together: of the data points in which x occurs, the fraction in which y also occurs. Once confidence is above a certain threshold (say 70%), the two events show up together often enough to be used for rules, decision making, etc.

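Lift: A measure of the strength of association between two events. It answers the question: how much more likely is event y to occur once it is known that event x has occurred, compared with how often y occurs overall? A lift above 1 means that knowing x has occurred makes y more likely.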
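
To make the three terms concrete, here is a minimal Python sketch. The toy transactions and the function names (support, confidence, lift) are made up for illustration, and the formulas are the commonly used ones: confidence(x -> y) = support(x and y) / support(x), and lift(x -> y) = confidence(x -> y) / support(y).

```python
from fractions import Fraction

# Hypothetical toy data: each set is one data point (e.g. one shopping basket).
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "butter"},
]

def support(events, data):
    """Fraction of the N data points in which all the given events occur."""
    events = set(events)
    hits = sum(1 for row in data if events <= row)
    return Fraction(hits, len(data))

def confidence(x, y, data):
    """support(x and y) / support(x): how often y shows up when x does.

    Assumes x occurs at least once; otherwise the division fails.
    """
    return support({x, y}, data) / support({x}, data)

def lift(x, y, data):
    """confidence(x -> y) / support(y): above 1 means x makes y more likely.

    Assumes y occurs at least once; otherwise the division fails.
    """
    return confidence(x, y, data) / support({y}, data)

print(float(support({"bread"}, transactions)))               # 0.8
print(float(confidence("bread", "butter", transactions)))    # 0.75
print(float(lift("bread", "butter", transactions)))          # 1.25
```

In this toy data, butter appears in 60% of the data points overall but in 75% of those containing bread, hence a lift of 1.25.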