Algorithms, Design, Code and more

Thursday, June 3, 2021

Installing Canon LBP2900 Printer on Ubuntu 20.04

The instructions for installing Canon LBP2900 Printer on Ubuntu 20.04 remain pretty much the same as given in an earlier post about Installing Canon LBP2900 Printer on Ubuntu 16.04. The few differences seen this time around are the following:

1) Newer version of LINUX-CAPT-DRV-v271-UKEN: Download link as mentioned on the Canon Support page.

The tar ball includes 64-bit installers: cndrvcups-common_3.21-1_amd64.deb & cndrvcups-capt_2.71-1_amd64.deb (also includes rpm files & 32-bit versions).

2) Depdency on libglade2-0:

$ sudo apt-get install libglade2-0

3) Workaround for CAPT 64-bit OS issues linking to 32-bit libraries:

As mentioned in the earlier post, there are certain dependencies for CAPT running in 64-bit OS in linking to 32-bit libraries. Even though, the two deb files (mentioned above) get installed without the dependencies, peculiar behaviour/ error message is seen when the dependecies are missing. On the final step to View Status of printer on captstatusui, an error message is shown: "Check the DevicePath of /etc/ccpd.conf".

The solution to the problem is to simply check the missing dependencies:
(Check for error messages such as not executable, not found, etc.)

$ ldd /usr/bin/captfilter

$ ldd /usr/bin/capt* | sort | uniq | grep "not found"

In case of error install the missing dependencies:

$ sudo apt-get install libc6:i386 libpopt0:i386

$ sudo apt-get install zlib1g:i386 libxml2:i386 libstdc++6:i386

4) Follow the remaining instructions of the earlier post:

=> Install CAPT printer driver (downloaded .deb files mentioned above)

=> Add printer to the system: (Command line lpadmin or system-config-printer or UI System Settings > Printers):

=> Add printer to ccpadmin

=> View status of printer on captstatusui (should show "Ready to Print" if installed correctly)

=> Finally, print test page to finish!

Wednesday, March 31, 2021

Flip side to Technology - Extractivism, Exploitation, Inequality, Disparity, Ecological Damage

Anatomy of an AI system is a real eye-opener. This helps us to get a high level view of the enormous complexity and scale of the supply chains, manufacturers, assemblers, miners, transporters and other links that collaborate at a global scale to help commercialize something like an Amazon ECHO device.

The authors explain how extreme exploitation of human labour, environment and resources that happen at various levels largely remain unacknowledged and unaccounted for. Right from mining of rare elements, to smelting and refining, to shipping and transportation, to component manufacture and assembly, etc. these mostly happen under in-human conditions with complete disregard for health, well-being, safety of workers who are given miserable wages. These processes also cause irreversible damage to the ecology and environment at large.

Though Amazon Echo as an AI powered self-learning device connected to cloud-based web-services opens up several privacy, safety, intrusion and digital exploitation concerns for the end-user, yet focusing solely on Echo would amount to missing the forest for the trees! Most issues highlighted here would be equally true of technologies from many other traditional and non-AI, or not-yet-AI, powered sectors like automobiles, electronics, telecom, etc. Time to give a thought to these issues and bring a stop to the irreversible damage to humans lives, well-being, finances, equality, and to the environment and planetary resources!

Monday, March 29, 2021

Doing Better Data Science

In the article titled "Field Notes #1 - Easy Does It" author Will Kurt highlights a key aspect of doing good Data Science - Simplicity. This includes first and foremost getting a good understanding of the problem to be solved. Later among the hypothesis & possible solutions/ models to favour the simpler ones. Atleast giving the simpler ones a fair/ equal chance at proving their worth in tests employing standardized performance metrics.

Another article of relevance for Data Scientists is from the allied domain of Stats titled "The 10 most common mistakes with statistics, and how to avoid them". The article based on the paper in eLife by the authors Makin and Orban de Xivry lists out the ten most common statistical mistakes in scientific research. The paper also includes tips for both the Reviewers to detect such mistakes and for Researchers (authors) to avoid them.

Many of the issues listed are linked to the p-value computations which is used to establish significance of statistical tests & draw conclusions from it. However, its incorrect usage, understanding, corrections, manipulation, etc. results in rendering the test ineffective and insignificant results getting reported. Issues of Sampling and adequate Control Groups along with faulty attempts by authors to establish Causation where none exists are also common in scientific literature.

As per the authors, these issues typically happen due to ineffective experimental designs, inappropriate analyses and/or flawed reasoning. A strong publication bias & pressure on researchers to publish significant results as opposed to correct but failed experiments makes matters worse. Moreover senior researchers entrusted to mentor juniors are often unfamiliar with fundamentals and prone to making these errors themselves. Their aversion to taking criticism becomes a further roadblock to improvement.

While correct mentoring of early stage researchers will certainly help, change can also come in by making science open access. Open science/ research must include details on all aspects of the study and all the materials involved such as data and analysis code. On the other hand, at the institutions and funders level incentivizing correctness over productivity can also prove beneficial.

Saturday, February 20, 2021

Debit Card to Bank Account Transfer

A feature missing from the Bank/ FinTech value chain is Domestic Debit Card (DC) to Domestic Bank A/c (BA) transfer. With wide proliferation of debit cards, payment gateways and POS vendors all provide C2B payments through these channels. Debit card to Bank A/c transfer doesn't exist, which would instead be a C2C/ P2P provision (via regulated payment intermediaries).

There ought to be very valid reasons for the same such as the regulator disallowing it, security & fraud considerations, transaction fees & gateway charges, error handling & reversal mechanism, and so on.

Alternatives such as bank transfers provisions such as NEFT, RTGS, etc. and UPI mobile-apps based P2P transactions exist. Even for these modes all the issues mentioned above hold true yet the solutions were allowed to run and mature over time. So why not DC to BA?

If ever such a feature were to be rolled out by Banks/ FinTechs then all that customers would need is a single web page on the service provider/ Bank website to capture:

Debit Card details (Card no, Validity, Name, etc.)
Recipient Details (A/c No, IFSC, Name)
Amount to Transfer

Next, the back-end gateway systems would:

Verify the details, authenticate the transactions, perform due handshakes for processing, etc.
If all ok, proceed to standard OTP based validation for the DC holder
Finally, debit the sender's DC (linked Bank A/c) & credit the receiver's Bank A/c

That's about it!

Monday, February 15, 2021

Parental Controls for Securing Online Usage by Children

As explained in the past, various safety features such as family shield filters from providers like OpenDNS , Cloudflare and others, DNS Over Https (DoH), HTTP Strict Transport Security (HSTS) can be used for a hassle free safe browsing across devices for members of the family. To additionally secure and regulate the usage for young kids Parental Control features and tools can be employed on devices and networks being accessed by children.

Parental Controls are available from day one across most device operating systems (OS) such as Android, iOS, and so on. All that the parent then needs to do, is to log in to the device using his/ her credentials and indicate to the device (OS) that the user of the device is a child and switch ON parental controls. Once that's done, the parental controls will get activated and only allow specific apps to run (apps white listed by the parent) while disallowing all others, and also filter out potentially harmful content from various sites and resources online.

Conceptually, that's pretty much all that there is to Parental Controls! For more info you can check out online resources such as these by Vodafone, VI and Google for a better understanding and setting-up parental controls to protect your kids online.

Monday, July 20, 2020

Inclusive Online Education

In today's times maintaining physical/ social distancing is the new way of living. While most of our activities will have to be done from home re-adjustments will be required to a few to shift them over to the digital channels.

Education is one key human endeavor which will progressively be done online. An assessment of the current state of online education here in India shows that while some resources do exist there's a lot that needs to be done to make online education viable and effective for students, particularly for those from the marginalized sections of the society.

Various government bodies connected with education in India such as CBSE, NCERT, AICTE, UGC, NIOS, CIET, CEC, MHRD, etc. have over the years made attempts towards providing distance education, e-learning & MOOCs, digitization of books and materials, content delivery via various digital plaforms & tv , youtube, video conferencing and so on. These freely available resources can be good starting points for aggregating and rolling out well thought out standardized content/ tools for the students. Parallely, availability of content in regional languages and localization can be fast paced.

On the other side are the new world EduTech startup companies that are making lots of progress in the technology driven online education space. Companies such as Byjus, Vedantu, Khan Academy, TOI etc. are now household names thanks to their big advertising push. The target student pool though seems to be the well to do convent/ public school student with the means to pay for the services.

A good grasp over language and access to internet & good mobile and computing devices are pre-requisites for using these platforms well. While absence of such novelties in the lives of students from the poor and marginalized sections of society make the platforms out of reach of such students. Perhaps, it's time for socially conscious EduTech startups to come-forth to bridge the digital divide!

Update 1 (29-Mar-21):

- Check out the discussion on EduTech at India Economic Conclave 21 between
Vineet Nayar(Founder & Chairman, Sampark Foundation) & Ashish Jhalani (CMO (Global), Square Panda)

- Also the Fireside Chat with Tennis Legend Andre Agassi of The Andre Agassi Foundation for Education & Chairman of Board, Square Panda

Thursday, May 7, 2020

Ffmpeg - Swiss Army Knife of Video/ Audio Editting

Ffmpeg is a fantastic video and audio converter to edit & create video & audio files. As is typical of *nix command line tools, ffmpeg has several options that need to be correctly configured to be able to use the tool properly to edit videos or audios.

Architecturally, ffmpeg works with streams of video, audio, images or other data files that are passed through various reader/ writer (demuxer/ muxer) and encoder/ decoder layers for the editing and creating video and audio files:

Image Credit: Official Ffmpeg Linux Manual

The command prompt may be a little overwhelming to start off, but a little playing with the tool shows reveals its immense potential. The official documentation page & Linux manual has a few examples to get you started.

Beyond this there are several online resources, blogs and articles such as this, this, this & this, etc. which have listed down the different ffmpeg commands with options. On the other hand, for those averse to the shell prompt, there are several GUI tools written on top of ffmpeg which can be explored.

Friday, April 17, 2020

Analysis of Deaths Registered In Delhi Between 2015 - 2018

The Directorate of Economics and Statistics & Office of Chief Registrar (Births & Deaths), Government of National Capital Territory (NCT) of Delhi annually publishes its report on registrations of births and deaths that have taken place within the NCT of Delhi. The report, an overview of the Civil Registration System (CRS) in the NCT of Delhi, is a source of very useful stats on birth, deaths, infant mortality and so on within the Delhi region.

The detailed reports can be downloaded in the form of pdf files from the website of the Department of Economics and Statistics, Delhi Government. Anonymized, cleaned data is made available in the form of tables in Section Titled "STATISTICAL TABLES" in the pdf files. The births and deaths data is aggregated by attributes like age, profession, gender, etc.

Approach

In this article, an analysis has been done of tables D-4 (DEATHS BY SEX AND MONTH OF OCCURRENCE (URBAN)), D-5 (DEATHS BY TYPE OF ATTENTION AT DEATH (URBAN)), & D-8 (DEATHS BY AGE, OCCUPATION AND SEX (URBAN)) from the above pdfs. Data from for the four years 2015-18 (presently downloadable from the department's website) has been used from these tables for evaluating mortality trends in Delhi for the three most populous Urban districts of North DMC, South DMC & East DMC for the period 2015-18.

Analysis

1) Cyclic Trends: Data for absolute death counts for period Jan-2015 to Dec-2018 is plotted in table "T1: Trends 2015-18". Another view of the same data is as monthly percentage of annual shown in table "T-2: Month/ Year_Total %".

Both tables clearly show that there is a spike in the number of deaths in the colder months of Dec to Feb. About 30% of all deaths in Delhi happen within these three months. The percentages are fairly consistent for both genders and across all 3 districts of North, South & East DMCs.

As summer sets in from March the death percentages start dropping. Reaching the lowest points below 7% monthly for June & July as the monsoons set in. Towards the end of monsoons, a second spike is seen around Aug/ Sep followed by a dip in Oct/ Nov before the next winters when the cyclic trends repeat.

Trends reported above are also seen with moving averages, plotted in Table "T-3: 3-Monthly Moving Avg", across the three districts and genders. Similar trends, though not plotted here, are seen in the moving averages of other tenures (such as 2 & 4 months).

2) Gender Differences: In terms of differences between genders, far more deaths of males as compared to females were noted during the peak winters on Delhi between 2015-18. This is shown in table "T4: Difference Male & Female".

From a peak gap of about 1000 in the colder months it drops to about 550-600 range in the summer months, particularly for the North & South DMCs. A narrower gap is seen the East DMC, largely attributable to its smaller population size as compared to the other two districts.

Table "T5: Percentage Male/ Female*100" plots the percentage of male deaths to females over the months. The curves of the three districts though quite wavy primarily stay within the rough band of 1.5 to 1.7 times male deaths as compared to females. The spike of the winter months is clearly visible in table T5 as well.

3) Cross District Differences in Attention Type: Table "T6: Percentage Attention Type" plots the different form of Attention Type (hospital, non-institutional, doctor/ nurse, family, etc.) received by the person at the time of death.

While in East DMC, over 60% people were in institutional care the same is almost 20% points lower for North & South DMCs. For the later two districts the percentage for No Medical Attention received has remained consistently high, the South DMC being particularly high over 40%.

4) Vulnerable Age: Finally, a plot of the vulnerable age groups is shown in table "T7: Age 55 & Above". A clear spike in death rates is seen in the 55-64 age group, perhaps attributable to the act of retirement from active profession & subsequent life style changes. The gender skewness within the 55-64 age group may again be due to the inherent skewness in the workforce, having far higher number of male workers, who would be subjected to the effects of retirement. This aspect could be probed further from other data sources.

Age groups in-between 65-69 show far lower mortality rates as they are perhaps better adjusted and healthier. Finally, a spike is seen in the number of deaths in the super senior citizens aged 70 & above, which must be largely attributable to their advancing age resulting in frail health.

Conclusion

The analysis in this article was done using data published by the Directorate of Economics and Statistics & Office of Chief Registrar (Births & Deaths), Government of National Capital Territory (NCT) of Delhi annually on registrations of births and deaths within the NCT of Delhi. Data of mortality from the three most populous districts of North DMC, South DMC and East DMC of Delhi were analysed. Some specific monthly, yearly and age group related trends are reported here.

The analysis can be easily performed over the other districts of Delhi, as well as for data from current years as and when those are made available by the department. The data may also be used for various modeling and simulation purposes and training machine learning algorithms. A more real-time sharing of raw (anonymized, aggregated) data by the department via api's or other data feeds may be looked at in the future. These may prove beneficial for the research and data science community who may put the data to good use for public health and welfare purposes.

Resouces:

Downloadable Datasheets For Analysis:

Friday, February 28, 2020

Defence R&D Organisation Young Scientists Lab (DYSL)

Recently there was quite a lot of buzz in the media about the launch of DRDO Young Scientists Lab (DYSL). 5 such labs have been formed by DRDO each headed by a young director under the age of 35! Each lab has its own specialized focus area from among fields such as AI, Quantum Computing, Cognitive Technologies, Asymmetric Technologies and Smart Materials.

When trying to look for specifics on what these labs are doing, particularly the AI lab, there is very little to go by for now. While a lot of information about the vintage DRDO Centre of AI and Robotics (CAIR) lab is available on the DRDO website, there's practically nothing there regarding the newly formed DRDO Young Scientists Lab on AI (DYSL-AI). Neither are the details available anywhere else in the public domain, till end-Feb 2020 atleast. While these would certainly get updated soon for now there are just these interviews with the directors of the DYSL labs:

Doordarshan's Y-Factor Interview with the 5 DYSL Directors Mr. Parvathaneni Shiva Prasad, Mr. Manish Pratap Singh, Mr. Ramakrishnan Raghavan, Mr. Santu Sardar, Mr. Sunny Manchanda

Rajya Sabha TV Interview with DYSL-AI Director Mr. Sunny Manchanda

Wednesday, February 26, 2020

Sampling Plan for Binomial Population with Zero Defects

Rough notes on sample size requirement calculations for a given confidence interval for a Binomial Population - having a probability p of success & (1 – p) of failure. The first article of relevance is Binomial Confidence Interval which lists out the different approaches to be taken when dealing with:

Large n (> 15), large p (>0.1) => Normal Approximation
Large n (> 15), small p (<0.1) => Poisson Approximation
Small n (< 15), small p (<0.1) => Binomial Table

On the other side, there are derivatives of the Bayes Success Run theorem such as Acceptance Sampling, Zero Defect Sampling, etc. used to work out statistically valid sampling plans. These approaches are based on a successful run of n tests, in which either zero or a an upper bounded k-failures are seen.

These approaches are used in various industries like healthcare, automotive, military, etc. for performing inspections, checks and certifications of components, parts and devices. The sampling could be single sampling (one sample of size n with confidence c), or double sampling (a first smaller sample n1 with confidences c1 & a second larger sample n2 with confidence c2 to be used if test on sample n1 shows more than c1 failures), and other sequential sampling versions of it. A few rule of thumb approximations have also emerged in practice based on the success run techique:

Rule of 3s: That provides a bound for p=3/n, with a 95% confidence for a given success run of length n, with zero defects.

Success run sample size (n) using Confidence Interval (C) & Reliability (R = 1 -p), when sampling with replacement (sampled item replaced and maybe selected again) taking a Binomial Distribution:

probability of R^n, for a successful run of length n with zero defects

case when a maximum of k defects/ failures is acceptable

For a small population and when sampling without replacement (sampled item not replaced), the Hypergeometric Distribution is used for sampling n items from a population having size N with D defects. Accordingly, Operating Characteristic (OC) curves for the given problem are prepared and used to get values of C & R for a given n.

Footnote on Distributions:

Poisson distribution is used to model events within time/ space that are rare (small p) but show up large number of times (large n) & occur independent of the time since last event. Inter arrival times is an iid exponential random variables.

Poisson can be derived as a limiting case to Binomial when n is large (tends to infinity) while the expected no. of successes (or failures) remains fixed, also referred to as the law of rare event

Poisson confidence interval is derived from Gamma Distribution - which is defined using the two-parameters shape & scale. Exponential, Erlang & Chi-Squared are all special cases of Gamma Distrubtion. Gamma distribution is used in areas such as prediction of wait time, insurance claims, wireless communication signal power fading, age distribution of cancer events, inter-spike intervals, genomics. Gamma is also the conjugate prior of Bayesian statistics & exponential distribution.

Bayesian Success Run can be derived using the Beta Distribution which is the conjugate prior for Binomial. Beta Distribution is defined via two shape parameters. Beta Distribution applications is found in order statistics (selection of k-th smallest from Uniform distribution), subjective logic, wavelet analysis, project management (PERT), etc.

Wednesday, October 23, 2019

Over-Smart World

BlackMirror BlackMirror On the Wall

Who's the Smartest of them All?

Smart smart Alexa,

Smart smart TV,

Smart smart Watch,

Smart smart This,

And smart smart That

They never sleep?

Keep always an eye (& ear) on me.

BlackMirror BlackMirror On the Wall

Who's the Smartest of them All?

Thursday, October 3, 2019

Firefox Normandy

Firefox through the Normandy feature provides an option for unsolicited/ automagic updates to default values of a Firefox (targetted) instance. For more on the risk this poses take a look at the ycombinator threads.

To turn off Normandy in Firefox use the advanced settings route: about:config > app.normandy.enabled = false.

Update 1 (23-Oct-19):
- Principally Mozilla (Firefox) have always been in favour of user privacy.

Saturday, September 21, 2019

Last Petrol Car

In the year 2024 my present BS-III Hyundai petrol (BS-III) hatchback would reach its end of life, 15 years after its first drive out of the showroom. Given all the buzz from the Electric Vehicle (EV) space, this would very likely be my last petrol car. At some level, most of us have next to zero attachment with the fuel that powers the vehicle under the hood (petrol, cng, electricity, etc.). What we care about is that the new vehicle shouldn't be a downgrade in terms of reliability, comfort, features, looks, pricing, drivability, power, pickup, etc and an increase in terms of purchase & running costs.

Battery operated EVs seem to be getting better by the day. There's good traction seen in the three-wheelers (battery operated autos/ totos) space. Two- & four-wheelers are likely to hit mass markets soon, with pricing that would be lucrative (perhaps tax incentivized). Further, widespread infrastructural & service support need to be introduced to give people the confidence to switch to EVs.

Yet, at the moment, EV technologies - battery, chargers, fire & safety protocols, instrumentation, cabling & connectors, etc. - are at early-to-mid maturity level. Driving range per charge is about 100 Kms for the entry segment cars which is not enough. It's quite common for people to drive ~150 Kms daily for work. On highways, the range could be much more. So a sub-300 Km range would simply not do!

At the same time, the mass market pricing levels (INR 3 to 6 lacs) should not be breached in any way. The existing coverage of mechanics & service centres of various manufacturers (Maruti, Hyundai, Mahindra, Tata, etc.) needs to be upgraded to support EVs as well.

Reliable electricity remains a constraint in most cities including the metros. On the generation side, renewables would need a wider push. Residential solar rooftop set-ups could be one area of focus. Through such set-ups, individual households & complexes could achieve self-sufficiency for their growing energy needs, including the EV burden/ load (@20-30 Units for full charge per vehicle X 30 days = 600-900 units per vehicle per month). Standard practices to popularize rooftop solar set-ups employed the world over such as PayGo models, incentives/ tax breaks, quality controls, support & maintenance, etc. should be introduced here as well. If possible, it would be great to have the EVs themselves equipped with solar panels on the body to auto-charge whenever required under direct sunlight. Eagerly waiting for these clean green technologies to evolve and make inroads very soon!

Update 1 (09-Oct-19):
- An assessment of the current state of EV adoption in India by Business Standard.

Update 2 (23-Oct-19):
- Bajaj Chetak to be relaunched in an Electric avatar.
- Blu-Smart all electric cabs visible on Delhi roads.

Thursday, September 19, 2019

Renewable Energy In India

India holds great potential in the renewable energies space. We have ample opportunities to generate all our present and future energy needs from sources like solar, wind, water and biomass.

From an energy generation capacity from renewables pegged at ~60 GW (in 2017) we are targetting to reach about 175 GW (100 GW Solar, 60 GW wind, 10 GW biomass, 5 GW small hydro power) by 2022. Which would be close to 50% of our entire energy needs. With ground work for mass adoption of Electric Vehicles (EV) getting traction, our demands for power and generation from renewables will need to scale up even further. To the extent that we may become energy surplus one day and be able to export to the neigbourhood. For a sneak peak into the state of the art from the world of renewables, head over to the Renewable Energy India (REI) Expo 2019 currently underway at the Knowledge Park II, Greater Noida.

The REI-2019 has exhibitors from leaders in the renewables space such as China, Bangladesh, France, Germany, India, Israel, Netherlands, Saudi Arabia, Singapore, Slovakia, South Korea, Taiwan, Tunisia, UK, USA, Vietnam, etc. They are showcasing their product portfolios from solar & wind power devices to installations on floating & permanent structures, from switching & grid apparatus to connectors, from inverters & batteries to EVs, and more. Expo timings are from 10 am to 6 pm. Walk-in as well as online registrations are allowed. Go see the future!

Update 1 (21-Sep-19):
- Listen to what Greta Thrunberg has to say & check out her zero-carbon boat

Update 2 (23-Oct-19):
- Coal to continue powering India's energy requirements for decades - Swaminomics

Wednesday, September 18, 2019

Sim Swap Behind Twitter CEO's Account Hack

There was a lot of buzz about the recent hacking incident of the Twitter CEO, Jack Dorsey's account. The key thing to note is that the hack was effected by a sim swap fraud, wherein a fraudster tricks a mobile carrier into transferring a number. Your mobile being the key to your digital life & hard earned money gets completely compromised through a fraud like sim swap.

SIM swap fraud can be done by some form of social engineering and stealing/ illegally sharing personal data of user used to authenticate with the telecom operator. The other way is by malware or virus infected app or hardware taking over the user's device, or by plain old manipulation of personnel of the telecom company through pressure tactics, bribes, etc.

In order to limit cases of frauds DOT India has brought in a few mandatory checks into the process of swapping/ upgrading sim cards to be followed by all telecom operators. These include IVRS based confirmation call to the subscriber on current working sim, confirmation SMS to current working sim, and blocking of SMS features for 24 hours after swapping of sim.

The window of 24 hours is reasonably sized to allow the actual owner to react in case of a fraud thanks to these checks. Once they realize that their phone has mysteriously gone completely out of network coverage for long, and doesn't seem to work even after restarting and switching to a location known to have good coverage alarm bells ought to go off. Immediately they should contact the telecom operator's helpline number/ visit the official store.

At the same time, the window of 24 hours is not excessively long to discomfort a genuine user wanting to swap/ upgrade. Since SMS services remains disabled, SMS based OTP authentication for apps, banking etc. do not work within this period of time, thereby preventing misuse by fraudsters.

Perhaps, telecom regulators & players elsewhere need to follow suit. Twitter meanwhile has chosen to apply a band-aid solution by turning off their tweet via SMS feature post the hack. Clearly a lot more needs to be done to put an end to the menace.

Thursday, August 29, 2019

What? A Man In The Middle!

Well yes, there could be somone intercepting all your digital & online traffic, unless proper precautions to secure them are in place. The focus of the article is not about how to be the man-in-the-middle (mitm), but to prevent getting snooped on by him. Here are some basic online hygiene techniques to follow to remain safe, as far as possible.

To begin with let's look at the high level components that are a part of the digital traffic:

Device: Phone, pad or desktop
App: Running one the device (Whatsapp, Fb, Gmail, Browser, etc.)
Server: Server components of the service provider, organization, etc. that is listening to & providing some service to the app
Network: Wired, wireless, hybrid channel through which the digital packets (bits) travel between the device & the server

Of course, there are many other components in play, but for now we'll keep things simple.

Device & Apps
The user's device is the first & most common point of vulnerability in the chain. These get infected by viruses or malwares. Some defences include:

Being particular about not installing any untrusted, unverified software. Installing only reputed apps and software that are actively maintained & updated that patch/ resolve existing vulnerabilities inherent in its components or dependent libraries. App developers themselves must be well conversant with standards (secure cookie, etc.) and industry best practices such as OWASP Top 10 and so on, to avoid building poor quality and vulnerable apps/ software.

Keeping devices updated. Staying up to date offers the best defence against recently detected vulnerabilities, which the manufacturers & software vendors rush to fix.

By not clicking on unverified links or downloads.

Making use of conservative settings for all apps, with absolutely minimal privileges. Company provided default permissions are found to be too lax & liberal in many cases. So review what permissions are present & change them to more minimal settings. For instance why the hell would a SMS messages app need to access phones camera?

In order to avoid crashing your phone, make piece-meal changes to the app settings & test. If it works great. If not, make a note & revert! Later check the privileges that you felt were unnecessary and caused problems.

Too much work? Well, for the moment until the device's operating system software undergo major privacy focussed revisions, there doesn't seem to be much of an alternative.

Sticking only to the manufacturer specified software repositories for updates.

For Windows based/ similar systems installing an updated anti-virus is mandatory. Use the free (for personal use) Avast anti-virus if not anything else. Better still switch to a more robust *nix based OS.

If you are a traditionalist using browsers, Mozilla Firefox set up with conservative & minimal privacy settings scores significantly over its competitors, that are mostly data capturing ad machines. If possible, contribute to help keep Mozilla, a non-profit, afloat.

Physically secure your device with a password/ pin & do not allow any unknown person to use the same. In case temporary access is to be provided specially on desktops create guest credentials for the user with limited privileges.

Server
This is the where the real action to process the user's request takes place. Whether it is an info about the weather, sending emails, getting chat notifications, doing banking transactions, uploading photos, etc. the user sends the request along with the data to the server to perform the necessary action. The server itself being a device (mostly a collection of devices database, web-server, load-balancer, cloud service, etc.) is vulnerable to all the above set of devices & apps risks plus many others that sever engineers & operation teams work to harden against.

Standards to be employed, learnings & best practices are shared widely by most of the leaders working in server side technologies via blogs, articles, conferences, journals, communities, etc. The cloud vendors (Amazon AWS, Microsoft Azure, Google Cloud, Rackspace, and so on) are specially active in this regard. They are busy pushing the bar higher with improvements to the various server technologies being rolled out regularly.

There are some open source tools available to check the different aspects of the server set-up. For instance the Owasp Test for HSTS (HTTP Strict Transport Security ) & SslLabs Server Rating Guide provides details on the requirements for the server's SSL certificate used to encrypt data. SslLabs also has an online tool to test & rate the set up of any publicly accessible server & highlight potential weaknesses.

Network
Between the user's device and the server lies the network through which the data and instructions flow. The network may include wired, wireless or a combination of components (routers, hubs, gateways, etc.). The best form of defence against the man-in-the-middle attack is to ensure that only strongly encrypted data is sent over the network (end-to-end (e2e) encryption).

The communication between the user device & server takes place via a secure HTTPS protocol using a signed SSL certificate issued via reputed certificate authority. This ensures that as long as the certificate's private key (known only to the server) remains secure the end-to-end (e2e) encryption between user's device & server works.

Yet, there are ways in which a server set-up for HTTPS communication might end up downgrading to an insecure HTTP protocol or being compromised (SslLabs Server Rating Guide). The best defence against this is to set-up the server to solely work over HTTPS, by setting it up to work with the HTTP Strict Transport Security (HSTS) protocol.

Once HSTS is enabled on the server, any non-secure HTTP requests to the server is either rejected or redirected to the secure HTTPS channel. All insecure HTTP requests from the user's end to the server are automatically switched over to HTTPS & connection between client and server dropped in case of a problem with the server's certificate. So HSTS protects against the various man-in-the-middle attack scenarios such as protocol downgrade (to insecure HTTP) & session hijacking attack.

Beyond e2e encryption & HSTS, the server address lookup process done by the user's device could also get manipulated (by ARP spoofing within LAN & DNS spoofing). In place of the genuine address, user data could be directed to a fake server's address. Performing address lookup securely via DNSSEC provides a good mitigation strategy for DNS vulnerability.

These basic requirements are essential for managing safety of user's data. Yet, in this eternal tussle between the yin and yang of security a lot more needs to be done & certainly the end goal hasn't been reached. As new threats emerge we can only hope to collectively strengthen our defences and stay alert & updated to remain secure.

Monday, August 26, 2019

Dconf, Gsettings, Gnome Files/ Nautilus Refresher

Dconf is the Linux key-based configuration system that provides the back end to Gsettings to store configurations. Dconf settings can be updated via dconf-editor and/ or via the gsettings command line. Gnome Files/ Nautilus settings for instance is Dconf based & can be accessed/ updated with these tools.

Tuesday, April 30, 2019

996.ICU

About time the revolution reaches our shores. Concerns from the 996.ICU movement most certainly apply here. There's additionally the grind from the 3+ hours of daily travel time between workplace & home in the typically over-crowded metros.

The demands for a well balanced work life (wbl) are totally justified. 955 schedule, flexi-hours, work from home, are not concepts to be just spoken of but to be put to practice. Let's work smart, live healthy & be happy!

Tuesday, March 26, 2019

Opinions On A Topic

Media agencies of the day are busy flooding us with news - wanted, unwanted, real, fake, good, bad, ugly, whatever. Yet, for the user the challenge to stay truly updated has never been this tough. Sifting the hay from the chaff is both computationally & practically hard!

There's a real need to automatically detect, flag & block misleading information from propagating. Though at the moment the technology doesn't exist, offerings are very likely to come up soon & get refined over time to nail the problem well enough. While we await breakthroughs on that front, for now the best bet is to depend on traditional human judgment.

- Make use of a set (not one or two) of trusted media sources, that employ professionals & expert journalists. Rely on their expertise to do the job of collecting & presenting the facts correctly. Assuming (hopefully) that these people/ organizations behave professionally, the information that gets through to these sources would be far better.

- Fact check details across the entire set of sources. This helps mitigate against a temporary (or permanent) deliberate/ inadvertent faltering, manipulation, influence, etc. of one odd sources. Use the set as a weak quorum that collectively highlights & prevents propagation of misinformation. Even if a few members there falter, unlikely that all would. The majority would not allow the fakes to make it into their respective channels.

- Challenging part being if a certain piece shows up as a breaking news on one channel & not the others. Could default to labeling it as fake/ unverified, with the following considerations for the news piece:

Case 1: Turns out fake, doesn't show up on the other sources
     => Remains Correctly Marked Fake

Case 2: Turns out to be genuine & eventually shows up on other/ majority sources
    => Gets Correctly Marked True

Case 3: Is genuine, but acquired via some form of journalistic brilliance (expose, criminal/ undercover journalism, etc.) that can't be re-run, or is about a region/ issue largely ignored by the mainstream media unwilling to do the verification, or for some other reason can't be verified
    => Remains Incorrectly Marked Fake

Case 3 is obviously the toughest to crack. While some specifics maybe impossible to verify, other allied details could be easier to access & verify. Once some other media groups (beyond the one that reported) get involved in the secondary verification there is some likelihood of true facts emerging.

For those marginalized there are social groups & organizations, governmental & non-governmental that have some reports published on issues from ground zero. At the same time, as connectivity improves, citizens themselves would be able to bring forth local issues onto national & international platforms. In the interim, these will have to be relied upon until commercial interests & mainstream media eventually bring the marginalized into the folds. Nonetheless, much more thought & effort is needed to check the spread of misinformation.

Finally, here's a little script 'op-on.sh' / 'op-on.py' (works/ tested on *nix desktop), to look up opinions (buzz) on any given topic across a set of media agencies, of repute. Alternatively, a bookmarklet could be added to the browser, which would enable looking up the opinions across the sites. The op-on bookmarklet (tested on Firefox & Chrome) can be installed by right clicking & adding as a bookmark in the browser (or by copying the script into the url of a new bookmark). Pop-up blockers in the browser will need to be temporarily disabled (e.g. by clicking allow pop-ups in Firefox) for the script to work.

The set of media agencies that these scripts look up include groups like TOI, IE, India Today, Times Now, WION, Ndtv, Hindu, HT, Print, Quint, Week, Reuters, BBC, and so on. This might help the curious human reader to look up all those sources for opinions on any topic of interest.

Update 1 (16-Sep-19): Some interesting developments:

Google giving priority to original news article above the ones that are derived/ sourced from the original. A step that may prove beneficial to the spirit of original fearless journalism. (Sep'19)
Economist's model revealing the underlying biases, or lack of it, with the Google news reporting algorithm. The title captures the crux of the article "Google rewards reputable reporting, not left-wing politics". (Jun'19)
Ravish Kumar's speech on winning the Magasaysay award for 2019. He touches upon the various challenges, including issues like fake news, propaganda, among many others confronting the professional journalist in todays times, particularly out here in the largest democracy. (Sep'19)
CSDS report on the impact of social media & others (paper, tv) on the voting behaviour of the Indian voter based on a survey of 24K+ voters from 211 constituencies. (Jun'19)
NGO Digital Empowerment Foundation report (Fighting Fake News, WHOSE RESPONSIBILITY IS IT? ) on the awareness of fake news, disinformation, misuse of technology & so on among Indians from Tier II & Tier III cities. Study's based on a sample of 3K users across 11 states. Good staring point for those looking to solve the problem, though sample set is small. Also behaviour would differ vastly across regions, languages, metros, villages, etc. One size fits all approach is not likely to work. (Apr'19)
BBC World Service study titled Duty, Identity, Credibility: 'Fake News' and the Ordinary Citizen in India, under the Beyond Fake News programme (Nov'18).
Time to make Hopi language like features widespread. Hopi language has grammatical markers that specify whether you witnessed the event yourself, heard about it from someone else, or consider it to be an unchanging truth. Hopi speakers are forced by Hopi grammar to habitually frame all descriptions of reality in terms of the source and reliability of their information. (Source: Chapter 1: The Promise and Politics of the Mother Tongue from the book "The Horse, The Wheel & Language")

Friday, March 8, 2019

Secure DNS

Domain Name System (DNS) is one of the backbones of the internet. DNS helps translate a URL (e.g. blahblah.com) to its corresponding IP address (i.e. 10.02.93.54). Thanks to the DNS human's can access the internet via human friendly URL, than having to remember & punch in numeric IP. So much simpler to say "look it up on Google", than saying "look it up on 172.168...".

Working of DNS

The working of the DNS involves looking up DNS servers spread out over the internet. When a user enters a URL in the browser, the address resolver in their system looks up the DNS servers configured at their system (router/ network, ISP, etc.) for the corresponding IP address. The resolver recursively looks up the Root DNS server, then the top level domain (.com, .in), then second level domain (Google, Yahoo, etc.) (the Authoritative server for the domain), & from it finally the sub-domain (www, mail, etc.) to arrive at the corresponding IP address.

DNS requests are typically made in plain text via UDP or TCP. In addition to destination URL, these requests also carry enough source identifiable information with them. Together with the recursive nature of the lookups via several intermediaries, this makes DNS requests vulnerable to being observed & tracked. The response could even be spoofed via a malicious intermediary that changes the IP address & direct the user to a scam site.

DNS over HTTPS (DoH)

A very recent development has been the introduction of DNS over HTTPS (DoH) in Firefox. HTTPS is the standard protocol used for end-to-end encryption of traffic over the internet. This prevents eavesdropping of the traffic between the client & the server by any intermediary.

To further secure the DNS request, DoH also brings in the concept of Trusted Recursive Resolvers (TRR). The TRR is trusted & of repute, & provides guarantees of privacy & security to the user. The default for Firefox is Cloudflare, though other TRRs are available for the user to choose from. Sadly though, OpenDNS isn't onboard with DoH or TRR, instead has its own offerings called DNSCrypt. Hope to see more convergence as adoption of these technologies improves in the future.

Setting-up DoH with Firefox (ver. 65.0) requires going to Preferences > Network Setting & checking "Enable DNS over HTTPS", with the default Cloudflare TRR. Alternatively, the flags "network.trr.mode" & "network.trr.uri" could be set-up via the about:config.

To confirm if the set-up is correct, navigate to the Cloudflare test my browser page & validate. This should result in successful check marks in green against "Secure DNS" & "TLS 1.3". Some further set-ups may be needed in case the other two checks fail.

For DNSSEC a DNSSEC compatible DNS server will need to be added. Pick Cloudflare DNS, Google DNS or any other from the DNS severs list. On the other hand, for Encrypted SNI indicator, the flag "network.security.esni.enabled" can be enabled. Since ESNI is still at an experimental stage, there could be changes (or bugs) that get uncovered & resolved in the future.

Enabling at a Global Level

The DoH setting discussed here is limited to Firefox. DNS lookups done outside of Firefox from any other browser, application or OS is unable to leverage DoH. DoH at the global/ OS level could be set-up via proxies. Given that DoH is over HTTPS, primarily a high level protocol for secure transfer of Hyper Text Documents, it maybe preferable securing DNS directly over TLS protocol.

In this regard DNS over TLS (DoT) is being developed. Ubuntu ver.18.0 & some Linux flavours offer DoT support experimentally. While DoT has some catching up to do viz-a-vis DoH, raging debates are continuing regarding the merits & demerits of the two options for securing DNS requests. Over time we can hope for the gaps & issues to be resolved, & far better privacy & security offered to the end-user.

Update 1 (25-Feb-21)

Reference links regarding enabling DoH across devices.

Enabling In Android:
- https://android.stackexchange.com/questions/214574/how-do-i-enable-dns-over-https-on-firefox-for-android

(In addition to the various network.trr.* settings to use OpenDns Family Shield DoH, additionally lookup the IP for the domain name "doh.familyshield.opendns.com" & set that value to network.trr.uri)
- https://blog.cloudflare.com/enable-private-dns-with-1-1-1-1-on-android-9-pie/

DoH Providers:
- https://github.com/curl/curl/wiki/DNS-over-HTTPS (Cloudflare also offers a Family shield/ filter)
- https://support.opendns.com/hc/en-us/articles/360038086532-Using-DNS-over-HTTPS-DoH-with-OpenDNS
- https://support.mozilla.org/en-US/kb/dns-over-https-doh-faqs

Tuesday, March 5, 2019

Human Design

"What a blessing (mercy) it would be if we could open and shut our ears as easily as we open and shut our eyes. - Georg C. Lichtenberg"

So true. Many an offensive situations could be diffused by simply dropping down the earlids. In a hyper-noisy nation like ours where the chatter never dies down, earmarked (sic!) noise free zones (around hospitals, schools, etc.) wouldn't exist. There could even be earlid-downed marches to protest against the high decibel rants pushed at us from all nooks & corners of the planet.

Perhaps Kikazaru/ Mikazaru, the first macaque who prescribed to us hear no evil, would be seen jumping around like never before. Only to be reminded the very next minute by his two wise buddies of its futility. And how their respective advices have been largely ignored despite there being lids for the eyes & the mouth. Finally, we would perhaps be able to truly experience the world in the way that people who can't hear experience it, even today. So yes, I agree with Mr. Lichtenberg that it would be a real blessing!

In that same spirit, we could also do with another design change, one that might already exist in a parallel universe somewhere. Would be nice to shift humans from a 4-hourly hunger cycle to a more pragmatic 4-monthly one. No getting hungry every few hours, no snacking, no gorging, no fun (seriously)?

There'd instead be a triannual feasting day for the individual. That would be the day to celebrate, bigger than any birthday or anniversary combined. The person concerned would probably down a few hundred kilos of their favorite gourmets. Gastronomic desires fulfilled like there's no tomorrow. There really wouldn't be one for the next four months. Guests meanwhile, would be making merry - singing, dancing, & everything else - awaiting their day of feasting.

There are stories about Indian mystics & sadhus who achieved a state of being, or were just built differently, where they didn't need any food for days together. But they seem to have gone extinct, save for some hunger artists. On the other hand, many animal species are known to feed in cycles with long fasting breaks in between. The camel for instance carries a special biological organ (the hump) to store food (fat) reserves, & can go without food & water for weeks together. In nature the concept is not so rare, a few hundred genes at play that's all.

Yet, the impact from a triannual feeding cycle to our social structures would be unimaginable. For instance the movie screenplay where the protagonist is complaining about the paapi pet (evil stomach) would simply be gone. Hunger, malnourishment, perhaps even poverty would be over. Or is that taking it too far? Newer enterprises would no doubt emerge that would work their way to profitability around the altered version of this fundamental base human need for food. In any case, there would be a paradigm shift on our social, economic & policy frameworks all over. Our entire existence would be markedly different, & hopefully better.

Monday, March 4, 2019

AB Testing To Establish Causation

A/B testing is a form of testing performed to compare the effectiveness of different versions of a product with randomly distributed (i.i.d.) end-user groups. Each group gets to use only one version of the product. Assignment of any particular user to a specific group is done at random, without any biases, etc. User composition of the different groups are assumed to be similar, to the extent that switching the version of the products between any two groups at random would make no difference to the overall results of the test.

A/B testing is an example of a simple randomized control trial . This sort of tests help establish causal relationship between a particular element of change & the measured outcome. The element of change could be something like change of location of certain elements of a page, adding/ removing a feature of the product, conversion rate, and so on. The outcome could be to measure the impact on additional purchase, clicks, time of engagement, etc.

During the testing period, users are at randomly assigned to the different groups. The key aspect of the test is the random assignment of users being done in real-time of otherwise similar users, so that no other unknown confounding factors (demographics, seasonality, tech. competence, background, etc.) have no impact on the test objective. When tested with a fairly large number of users, almost every group will end-up with a good sample of users that are representative of the underlying population.

One of the groups (the control group) is shown the baseline version (maybe an older version of an existing product) of the product against which the alternate versions are compared. For every group the proportions of users that fulfilled the stated objective (purchased, clicked, converted, etc.) is captured.

The proportions (pi) are then used to compute the test statistics Z-value (assuming a large normally distributed user base), confidence intervals, etc. The null hypothesis being that the proportions (pi) are all similar/ not significantly different from the proportion of the control group (pc).

For the two version A/B test scenario

   Null hypothesis H0(p1 = pc) vs. the alternate hypothesis H1(p1 != pc).

   p1 = X1/ N1 (Test group)
   pc = Xc/ Nc (Control group)
   p_total = (X1 + Xc)/(N1 + Nc) (For the combined population) ,
            where X1, Xc: Number of users from groups test & control that fulfilled the objective,
                & N1, Nc: Total number of users from test & control group

Z = Observed_Difference / Standard_Error
    = (p1 - pc)/ sqrt(p_total * (1 - p_total) * (1/N1 + 1/Nc))

The confidence level for the computed Z value is looked up in a normal table. Depending upon whether it is greater than 1.96 (or 2.56) the null hypothesis can be rejected with a confidence level of 95% (or 99%, or higher). This would indicate that the behavior of the test group is significantly different from the control group, the likely cause for which being the element of change brought in by the alternate version of the product. On the other hand, if the Z value is less than 1.96, the null hypothesis is not rejected & the element of change not considered to have made any significant impact on fulfillment of the stated objective.

Sunday, March 3, 2019

Mendelian Randomization

The second law of Mendelian inheritance is about independent assortment of alleles at the time of gamete (sperm & egg cells) formation. Therefore within the population of any given species, genetic variants are likely to be distributed at random, independent of any external factors. This insight forms the basis of Mendelian Randomization (MR) technique, typically applied in studies of epidemiology.

Studies of epidemiology try to establish the causal link (given some known association) between a particular risk factor & a disease. For e.g. smoking to cancer, blood pressure to stroke, etc. The association in many cases is found to be non-causal, or reverse causal, etc. Establishing the true effect becomes challenging due to the presence of confounding factors such as social, behavioral, environmental, physiological, etc. MR helps to tackle the confounding factors in such situations.

In MR, genetic variants (polymorphism) or genotype that have an effect similar to the risk factor/ exposure are identified. An additional constraint being that the genotype must not have any direct influence on the disease. Existence of genotype in the population is random, independent of any external influence. So presence (or absence) of disease within the population possessing the genotype, establishes (or refutes) that the risk factor/ effect is actually the cause for the disease. Several researches based on Mendelian randomization have been done successfully.

Example 1: There could be a study to establish the causal relationship (given observed association) between raised cholesterol levels & chronic heart disease (CHD). Given the presence of several confounding factors such as age, physiology, smoking/ drinking habits, reverse causation (CHD causing raised cholesterol), etc., MR approach would be beneficial.

The approach would be to identify a genotype/ gene variant that is known to be linked to an increase in total cholesterol levels (but has no direct bearing on CHD). The propensity for CHD is tested for all subjects having the particular genotype, which if/ when found much higher than the general population (not possessing the gene variant) establishes that raised cholesterol levels have a causal relationship with CHD.

Instrumental Variables

MR is an application of the statistical technique of instrumental variables (IV) estimation. IV technique is also used to establish causal relationships in the presence of confounding factors.

When applied to regression models, IV technique is particularly beneficial when the explanatory variable (covariates) are correlated with the error term & give biased results. The choice of IV is such that it only induces changes in the explanatory variables, without having any independent effect on dependent variables. The IV must not be correlated to the error term. Selecting an IV that fulfills these criterias is largely done through an analytical process supported by some observational data, & by leveraging relevant priors about the field of study.

Equating MR to IV

Risk Factor/ Effect = Explanatory Variable,
Disease = Dependent Variable
Genotype = Instrument Variable

Selection of genotype (IV) is based on prior knowledge of genes, from existing sources, literature, etc.