Thursday, October 25, 2012

Amdahl's Law for Max Utilization and Speedup With Parallelization


There's a lot of talk these days about constraints introduced by the CAP theorem. One other equally relevant law for Parallel and Distributed systems is the Amdahl's law. Amdahl's law talks about the amount of speedup that can be achieved when a given single processor (or threaded) task is split and handed over to N-processors (or threads) to be executed in parallel.

To take an example let's work with the typical entrance examination problem: "if one person takes 2 hours to eat up a cake, how long would four people take to eat the same cake?".

Simple each person eats up one-fourth of the cake. So time taken = time-for-1-person/ N = 2/ 4 = 0.5 hours. Right? Ya, well, unless it's the very same cake that we were referring to, the one that got eaten ;)

What Amdahl's law says is that if the single processor task has sub-tasks or steps (unlike the cake eating example above) not every sub-task/ step can be parallelized. There is some percentage of sub-tasks (F%) that need to be run sequentially. As a result the speedup is not N-times but less computed as follows.



- To go back to our cake eating example,

i.e. 4 times speedup. Indeed 4 people took 1/4 th the time, i.e. 30 mins, as compared to 2 hours by one person.

- Let's now add some sequential steps before the cake eating. First you got to pay for it at the cashier & then take delivery from the delivery counter.
These have to be done in sequence & only by one person (why pay twice?). Thanks to the monopoly & popularity of the cake vendor, there's invariably a long queue at the cashier. It takes 15 mins to pay and another 15 mins to get the delivery of the cake.


So you see, due to the 20% sequential tasks the speedup has dropped from 4 times to 2.5 times.

Wednesday, October 10, 2012

Brewer's CAP Theorem


Brewer's CAP theorem talks about Consistency (C), Availability (A), Partition (P) tolerance, as the constraints that primarily govern the design of all distributed systems. There's a lot of literature available online explaining the theorem. The summary is that given that network partitions (P) will happen, pick one of the other two - Consistency (C) or Availability (A) for designing your system on a case by case basis (since you can't have all three)!

A partition could be caused by the failure of some kind of component - hardware (routers, gateway, cables, physical boxes/ nodes, disks, etc.) and/or software. When that happens:

- If you pick Consistency (C) => All your systems, processing, etc. is blocked/ held up until the failed component(s) recovers.

This has been the default with traditional RDBMS (thanks to their being ACID compliant). For financial & banking applications this normally has to be the choice.

- On the other hand, if you pick Availability (A) => All systems, other than the currently partitioned/ failed systems, continue to function as is within their own partitions. Seems good? Well not quite, cause this obviously results in inconsistencies across the two (or more) partitioned sections.

Systems thus designed with Availability (A) as their selection (over C), must be able to live with inconsistencies across different partitions. Such systems also have some automated way to later get back to consistent state (eventual consistency) once the partitioned/ failed systems have recovered.

This is mostly the design choice with the NoSqls. Also with services such as Amazon AWS where eventual consistency within some reasonable time window (of a few seconds to a minutes) is acceptable.