Friday, July 20, 2012

Managing Hierarchies and Hierarchical Data in Databases and Solr

Databases - Materialzed Paths, Nested Sets, Nested Intervals, Closure Sets

Solr - Data flattening, Hierarchical Faceting

To be updated..

Sunday, June 10, 2012

Triangular Matrix

There are two special kinds of square matrices (n X n matrix) known as the Upper Triangular Matrix & the Lower Triangular Matrix.

Upper Triangular Matrix - Elements below the main diagonal are all zero.
     i.e. M[i,j] = 0, for i > j

Lower Triangular Matrix - Elements above the main diagonal are all zero.
     i.e. M[i,j] = 0, for i < j

where,
Main diagonal - The diagonal running through M[1,1] to M[n,n] for a nXn square matrix.

Sunday, May 20, 2012

Java Classloaders


Key things about Java classloaders:
1. There is a hierarchy among classloaders:
Bootstrap <---|
                      Extension <--|
                                          System <---|
Custom

- Child classloaders (typically) delegate class loading to parent classloaders. child class loader’s findClass() method is not called if the parent classloader can load the class.
- Custom classloaders can override the default delegation chain to a certain extent.
- Due to the delegation chain of classloaders, ensure classes to be loaded by custom classloaders are not present on the system class path, boot class path, or extension class path.

2. Bootstrap classloader is a special classloader included with the JVM, written in native code. Bootstrap classloader is tasked with loading all core classes part of the jre.
None of the other classloaders can override the Bootstrap classloader's behvaiour.

3. All other classloaders are written in Java.

4. Extension classloader loads classes from the extension directories: JAVA_HOME/jre/lib/ext/

5. A more popular alternative to using the Extension classloader, is to use the System classloader which loads classes from the CLASSPATH environment variable location.

6. Finally, Custom classloaders can be written to override certain defaults like delegating classloading to parents, etc. Custom classloaders is commonly used by Application servers (such as Tomcat).

7. Separate Namespaces per Classloader:
Same class loaded by two different classloaders, are considered different. Trying to cast an object of one class (loaded by classloader 1) to a reference of the other (loaded by classloader 2, though identical in terms of its fully qualified class name) will result in a ClassCastException.

8. Lazy loading and Caching of Classes:
Classloaders load classes lazily. Once loaded classloaders cache all previously loaded classes for the duration of the JVM.

Key Methods of Classloaders:
To be detailed..

Dynamic Reloading of Classes:
Due to the non-overridable behaviour of caching of classes by classloaders, reloading of classes within a running JVM poses problems. To reload a class dynamically (a common use case for app. servers), a new instance of the classloader itself needs to be created.
Once the earlier classsloader is orphaned/ garbage, classes loaded & cached by it (and reachable only via the now GC'd classloader) also become garbage, which can then be collected by the GC.

Sunday, May 13, 2012

Reflexive, Symmetric, Transitive, Associate, Commutative, Distributive Laws and Operators

A refresher:

Reflexive: a = a
(Every element is mapped back to itself)

Symmetric:  a = b,  => b = a

Transitive: a = b  & b = c, => a = c

Associative: (x # y) # z = x # (y # z)
(The operator is immune to different grouping of the operands)

Commutative: x $ y = y $ x
(The operator is immune to changing order of operands)

Distributive: x @ (y ^ z) = x @ y ^  x @ z,
(Outer Operand x can be distributed over the other two y & z without affecting the results)

Monday, April 2, 2012

NP, NP-Hard & NP-Complete


NP: One that has a non-deterministic, polynomial time solution. (Mind you, it is still polynomial time). The other way to define it is, given a solution (certificate), one can verify correctness of the solution using a deterministic Turing Machine (TM) in polynomial time.

NP-Hard: A hard problem - at least as hard as the the hardest problem known (& unsolvable) so far.

NP-Complete: ( NP ) <intersection> ( NP-Hard ).

NP & NP-Hard are different, and overlap only in certain cases when they are NP-Complete. Many NP-Hard as a result, are not NP-Complete. 

Monday, March 5, 2012

Rails Cheat Sheet

This is a living doc. with my notes on getting off the ground and surviving around a Rails apps. My experience with Rails is rather clunky & hardly anything to speak of Ruby.

Set up
Ubuntu OS, Rails 2.0, Ruby 1.8 running on Mongrel behind an Apache web server

Books
A good starting point would be "Four Days on Rails" by John McCreesh.

Installing on Ubuntu:

- Install the following through Synaptic
Ruby 1.8
Gem 1.8
mysql client/ server (should be installed to connect & run mysql server locally)

- Further install all essential gems (you can go about this lazily, installing what you need):
sudo gem install -v=2.0.2 rails

Gems installed on my dev:
gem list 
    actionmailer (2.1.0, 2.0.2)
    actionpack (2.1.0, 2.0.2)
    activerecord (2.1.0, 2.0.2)
    activeresource (2.1.0, 2.0.2)
    activesupport (2.1.0, 2.0.2)
    cgi_multipart_eof_fix (2.5.0)
    daemons (1.0.10)
    fastthread (1.0.1)
    gem_plugin (0.2.3)
    localization_generator (1.0.8)
    mongrel (1.1.5)
    mongrel_cluster (1.0.5)
    mysql (2.7)
    rails (2.0.2)
    rake (0.8.1)
    salted_login_generator (2.0.2)


Creating your first app
    rails <Application Name>
    edit config/database.yml for correct database settings
    ./script/generate model <modelName>
    ./script/generate controller <controllerName>
    edit db/migrate/<record name> for the appropriate insert and delete scripts
    rake db:migrate
    rake db:migrate RAILS_ENV=production
    Scaffold the controller (put the text active_scaffold : <modelName>

Starting the server
  Within the project folder type:
 ./script/server

Hit http://localhost:9000/ to test. Monitor the console outputs for any issues. Once you find things are working, you can turn this into a daemon (adding & to the script).
./script/server --env=production &

Sunday, February 26, 2012

Phantom References in Java

Depending upon the tenacity of the grasp of the reference object, Java language has defined four different kinds of reference types - Strong, Soft, Weak & Phantom.

A simple search leads you to good explanations on the topic. Particularly the article by Ethan, this SO discussionthe official Java docs from Oracle, and this article on reachability are useful.

The focus of this article is to explain the working of the example in KDGregory's post on Phantom References.

You must understand the call sequence shown above and the purpose of the participating classes. The key points are:

(i) PooledConnection: class extends Connection class.

(ii) Client Applications: only interact with instances of  the wrapper PooledConnection class. Applications get hold of new connections via the getConnection() for the ConnectionPool.

(iii) ConnectionPool: has a fixed number of Connections (base).

(iv) IdentityHashMap: Strong references to currently assigned/ in use Connections are maintained in two IdentityHashMaps - ref2Cxt & cxt2Ref. The references in the two HashMaps is the PhantomReference to the newly created PooledConnection.

The code is able to handle the following cases with regard to releasing of connections:

Case 1: A well behaving client, which calls PooledConnection.close():
This in turn calls the ConnectionPool.releaseConnection(Connection), passing in the associated the Connection object, leading to returning of the associated Connection object to the Connection Pool.

Case 2: Misbehaving client, Crashed client, etc. which does NOT call PooledConnection.close():
Once the client application stops using the PooledConnection object, the PooledConnection object goes out of scope. At this stage the PhantomReference associated with the PooledConnection object (see wrappedConnection()), gets enqueued in the associated ReferenceQueue.

In the ConnectionPool.getConnection() method, the call to ReferenceQueue.remove(), returns the PhantomReference to the particular  PooledConnection. Passing in this PhantomReference to ConnectionPool.releaseConnection(Reference), locates the associated Connection object from the _ref2Cxt, IdentityHashMap, & returns the Connection back to the Connection Pool.

This automatic reclaiming of the precious (yet orphaned) Connection is the crux of KD Gregory's Phantom Reference example.

Sunday, January 22, 2012

Java Generics PECS & Get-Put Principle, Non-Reifiable Types & Erasure, Covariance & Contravariance


Joshua Bloch's in the book Effective Java has introduced the PECS (Producer Extends Consumer Super) mnemonic for managing type hierarchies via Java Generics, i.e. Covariance & Contravariance for Generics.

Covariance & Contravariance

Covariance is a subtyping principle where a child/ sub-type can be used in place of parent/ super-type. In Java land this is seen with Arrays, Method Overriding (Java 5 onwards) & Generics Extends.

Contravariance is the reverse, where a parent/ super-type can be used in place of a sub-type. In Java, contravariance is seen with Generics Super.

PECS

Now coming back to PECS, also referred to as the Get-Put, principle of Generics. The key thing about PECS is that it is defined from the persepective of the Collection object in focus, and not the caller/ client using the Collection object.

Case 1: When the Collection is being used to retrieve existing (previously added) data, from the Collection's perspective it is a Producer of data. As per PECS it needs to Extend.
The caller in this case can be sure that results have objects who's parent is MyParent. So the caller can safely iterate over results as an immutable collection of MyParent objects.

However, any kind of addition into the Collection <? extends MyParent> is unsafe. MyParent can have any number of subtypes (MyChild1, MyChild2, etc.) and there is no way for the caller application to know this specific subtype. So at this stage any addition is not allowed.

Case 2: When the Collection is being used to store new data, from the Collection's perspective it is a Consumer of data. As per PECS it needs to define Super.

Type Erasure & Non-Reifiable Nature of Generics

Java Generics is implemented through erasure entirely at the compiler level. The type info. is discarded from within the bytecodes (i.e. List<String> & List have identical bytecodes). This also causes the non-reifiable behaviour of Generic types, i.e. the inability to determine the type info. from the bytecode at runtime. (Note that Arrays are reifiable as shown in E.g. 2 above).

The PECS rule is therefore to make the compiler operate defensively.  Once type info has been associated with a given Collection via Generics, the PECS restriction is there to enable compile time detection of potential unsafe usage.

Friday, January 20, 2012

SEDA - Staged Event Driven Architecture

Welsh, Culler, Brewer's paper at SOSP-01 introduces the concepts around SEDA lucidly. SEDA, in my experience, has been a good architectural choice for building scalable back-end systems. SEDA based systems, not only scale well, but also have trackability, failure recovery, load balancing, etc. introduced very easily/ naturally into the system along the stage boundaries. 

Monday, January 2, 2012

Intent

The intent of this blog is to show a way forward around the complex world of algorithms. Will also be delving into other areas of computer science, distributed systems, architecture, FOSS every now and then. So happy reading.