Monday, July 20, 2020

Inclusive Online Education

In today's times, maintaining physical/ social distancing is the new way of living. While most of our activities will have to be done from home, a few will need re-adjustment to shift them over to digital channels.

Education is one key human endeavor that will progressively be done online. An assessment of the current state of online education here in India shows that, while some resources do exist, a lot needs to be done to make online education viable and effective for students, particularly for those from the marginalized sections of society.

Various government bodies connected with education in India such as CBSE, NCERT, AICTE, UGC, NIOS, CIET, CEC, MHRD, etc. have over the years made attempts towards providing distance education, e-learning & MOOCs, digitization of books and materials, content delivery via various digital platforms, TV, YouTube, video conferencing and so on. These freely available resources can be good starting points for aggregating and rolling out well thought out, standardized content/ tools for the students. In parallel, the availability of content in regional languages and localization can be fast-tracked.

On the other side are the new-world EduTech startups that are making a lot of progress in the technology-driven online education space. Companies such as Byjus, Vedantu, Khan Academy, TOI etc. are now household names thanks to their big advertising push. The target student pool, though, seems to be the well-to-do convent/ public school student with the means to pay for the services.

A good grasp of language and access to the internet and to good mobile and computing devices are prerequisites for using these platforms well. The absence of such resources in the lives of students from the poor and marginalized sections of society puts the platforms out of their reach. Perhaps it's time for socially conscious EduTech startups to come forth and bridge the digital divide!
 
Update 1 (29-Mar-21): 
 
- Check out the discussion on EduTech at India Economic Conclave 21 between Vineet Nayar (Founder & Chairman, Sampark Foundation) & Ashish Jhalani (CMO (Global), Square Panda)

Thursday, May 7, 2020

Ffmpeg - Swiss Army Knife of Video/ Audio Editing

Ffmpeg is a fantastic video and audio converter for editing & creating video & audio files. As is typical of *nix command line tools, ffmpeg has several options that need to be configured correctly to use the tool properly for editing video or audio.

Architecturally, ffmpeg works with streams of video, audio, images or other data files that are passed through various reader/ writer (demuxer/ muxer) and decoder/ encoder layers to edit and create video and audio files:


Image Credit: Official Ffmpeg Linux Manual

The command prompt may be a little overwhelming to start off with, but a little playing with the tool reveals its immense potential. The official documentation page & Linux manual have a few examples to get you started.
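As a rough illustration of how the options map onto the reader/ writer and encoder/ decoder layers described above, here is a minimal Python sketch that only builds an ffmpeg command line. The file names and the helper `build_transcode_command` are made up for this example, and the codec choices (`libx264`, `aac`) assume an ffmpeg build that includes those encoders.

```python
import shlex


def build_transcode_command(src, dst, video_codec="libx264", audio_codec="aac"):
    """Build an ffmpeg argument list that re-encodes src into dst.

    Hypothetical helper for illustration; the codecs are assumptions
    and must be available in the local ffmpeg build.
    """
    return [
        "ffmpeg",
        "-i", src,             # input file, opened by a demuxer (reader)
        "-c:v", video_codec,   # encoder to use for the video stream
        "-c:a", audio_codec,   # encoder to use for the audio stream
        dst,                   # output file; the muxer (writer) is guessed from the extension
    ]


cmd = build_transcode_command("input.mkv", "output.mp4")
print(shlex.join(cmd))
# To actually run it (requires ffmpeg on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.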

Beyond this there are several online resources, blogs and articles that list the different ffmpeg commands along with their options. On the other hand, for those averse to the shell prompt, there are several GUI tools written on top of ffmpeg which can be explored.


Friday, April 17, 2020

Analysis of Deaths Registered In Delhi Between 2015 - 2018

The Directorate of Economics and Statistics & Office of Chief Registrar (Births & Deaths), Government of National Capital Territory (NCT) of Delhi annually publishes its report on registrations of births and deaths that have taken place within the NCT of Delhi. The report, an overview of the Civil Registration System (CRS) in the NCT of Delhi, is a source of very useful stats on births, deaths, infant mortality and so on within the Delhi region.

The detailed reports can be downloaded as pdf files from the website of the Department of Economics and Statistics, Delhi Government. Anonymized, cleaned data is made available as tables in the section titled "STATISTICAL TABLES" in the pdf files. The births and deaths data is aggregated by attributes like age, profession, gender, etc.

Approach

In this article, an analysis has been done of tables D-4 (DEATHS BY SEX AND MONTH OF OCCURRENCE (URBAN)), D-5 (DEATHS BY TYPE OF ATTENTION AT DEATH (URBAN)), & D-8 (DEATHS BY AGE, OCCUPATION AND SEX (URBAN)) from the above pdfs. Data from these tables for the four years 2015-18 (presently downloadable from the department's website) has been used to evaluate mortality trends in the three most populous urban districts of Delhi: North DMC, South DMC & East DMC.

Analysis




1) Cyclic Trends: Data for absolute death counts for the period Jan-2015 to Dec-2018 is plotted in table "T1: Trends 2015-18". Another view of the same data, with each month expressed as a percentage of the annual total, is shown in table "T-2: Month/ Year_Total %".




Both tables clearly show that there is a spike in the number of deaths in the colder months of Dec to Feb. About 30% of all deaths in Delhi happen within these three months. The percentages are fairly consistent for both genders and across all 3 districts of North, South & East DMCs.

As summer sets in from March the death percentages start dropping, reaching their lowest points, below 7% monthly, in June & July as the monsoons set in. Towards the end of the monsoons a second spike is seen around Aug/ Sep, followed by a dip in Oct/ Nov before the next winter, when the cycle repeats.
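The "month as a percentage of annual total" view is a simple calculation, sketched below in Python. The monthly counts here are made-up placeholder numbers chosen only to illustrate the arithmetic; they are not taken from the Delhi reports.

```python
def monthly_share(counts):
    """Return each month's count as a percentage of the annual total."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]


# Hypothetical Jan..Dec death counts for one district and one year (not real data).
sample_year = [1100, 1000, 850, 780, 760, 690, 680, 800, 820, 760, 800, 960]

shares = monthly_share(sample_year)

# Share of the three colder months: Dec (index 11), Jan (0) and Feb (1).
winter_share = shares[11] + shares[0] + shares[1]
print(round(winter_share, 1))
```

Applying the same function to each year and district separately makes the percentages comparable across districts of different population sizes.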


  


The trends reported above are also seen in the moving averages, plotted in table "T-3: 3-Monthly Moving Avg", across the three districts and genders. Similar trends, though not plotted here, are seen in moving averages over other window lengths (such as 2 & 4 months).
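A trailing moving average of this kind can be sketched as follows. This is a minimal illustration, assuming a simple unweighted average over the current month and the preceding months in the window; the function name and input values are made up for the example.

```python
def moving_average(values, window=3):
    """Trailing simple moving average; the first value covers the first `window` points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical monthly counts (not real data).
print(moving_average([3, 6, 9, 12], window=3))
```

Changing `window` to 2 or 4 gives the other window lengths mentioned above.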

2) Gender Differences: In terms of differences between genders, far more deaths of males than of females were noted during the peak winters in Delhi between 2015-18. This is shown in table "T4: Difference Male & Female".




From a peak of about 1000 in the colder months, the gap drops to the 550-600 range in the summer months, particularly for the North & South DMCs. A narrower gap is seen in the East DMC, largely attributable to its smaller population size as compared to the other two districts.






Table "T5: Percentage Male/ Female*100" plots male deaths as a percentage of female deaths over the months. The curves of the three districts, though quite wavy, stay largely within a rough band of 150% to 170%, i.e. 1.5 to 1.7 times as many male deaths as female deaths. The spike of the winter months is clearly visible in table T5 as well.
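The two gender measures used above (absolute gap and male deaths as a percentage of female deaths) can be sketched as below. The monthly counts are made-up placeholder values, not figures from the reports.

```python
def gender_gap(male, female):
    """Absolute difference (male minus female) for each month."""
    return [m - f for m, f in zip(male, female)]


def male_to_female_percent(male, female):
    """Male deaths as a percentage of female deaths for each month."""
    return [100.0 * m / f for m, f in zip(male, female)]


# Hypothetical counts for a winter month and a summer month (not real data).
male = [2600, 1500]
female = [1600, 1000]

print(gender_gap(male, female))
print(male_to_female_percent(male, female))
```

The absolute gap depends on district population, while the percentage does not, which is why the two views are read together.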

3) Cross District Differences in Attention Type: Table "T6: Percentage Attention Type" plots the different forms of Attention Type (hospital, non-institutional, doctor/ nurse, family, etc.) received by the person at the time of death.




While in East DMC over 60% of people were in institutional care, the figure is almost 20 percentage points lower for North & South DMCs. For the latter two districts the percentage for No Medical Attention received has remained consistently high, with South DMC being particularly high at over 40%.

4) Vulnerable Age: Finally, a plot of the vulnerable age groups is shown in table "T7: Age 55 & Above". A clear spike in death rates is seen in the 55-64 age group, perhaps attributable to retirement from active professional life & the subsequent lifestyle changes. The gender skew within the 55-64 age group may again be due to the inherent skew in the workforce, which has a far higher number of male workers who would be subject to the effects of retirement. This aspect could be probed further using other data sources.







The 65-69 age group shows far lower mortality, perhaps because its members are better adjusted and healthier. Finally, a spike is seen in the number of deaths among super senior citizens aged 70 & above, which is most likely attributable to advancing age and the resulting frail health.

Conclusion

The analysis in this article was done using data published by the Directorate of Economics and Statistics & Office of Chief Registrar (Births & Deaths), Government of National Capital Territory (NCT) of Delhi annually on registrations of births and deaths within the NCT of Delhi. Data of mortality from the three most populous districts of North DMC, South DMC and East DMC of Delhi were analysed. Some specific monthly, yearly and age group related trends are reported here.

The analysis can easily be repeated for the other districts of Delhi, as well as for data from more recent years as and when it is made available by the department. The data may also be used for various modeling and simulation purposes and for training machine learning algorithms. A more real-time sharing of raw (anonymized, aggregated) data by the department via APIs or other data feeds may be looked at in the future. This may prove beneficial for the research and data science community, who may put the data to good use for public health and welfare purposes.

Resources:

Downloadable Datasheets For Analysis:

Friday, February 28, 2020

Defence R&D Organisation Young Scientists Lab (DYSL)


Recently there was quite a lot of buzz in the media about the launch of the DRDO Young Scientists Labs (DYSL). Five such labs have been formed by DRDO, each headed by a young director under the age of 35! Each lab has its own specialized focus area from among fields such as AI, Quantum Computing, Cognitive Technologies, Asymmetric Technologies and Smart Materials.

When trying to look for specifics on what these labs are doing, particularly the AI lab, there is very little to go by for now. While a lot of information about the vintage DRDO Centre of AI and Robotics (CAIR) lab is available on the DRDO website, there's practically nothing there regarding the newly formed DRDO Young Scientists Lab on AI (DYSL-AI). Neither are the details available anywhere else in the public domain, till end-Feb 2020 at least. While these would certainly get updated soon, for now there are just these interviews with the directors of the DYSL labs:

  • Doordarshan's Y-Factor Interview with the 5 DYSL Directors Mr. Parvathaneni Shiva Prasad, Mr. Manish Pratap Singh, Mr. Ramakrishnan Raghavan, Mr. Santu Sardar, Mr. Sunny Manchanda







  • Rajya Sabha TV Interview with DYSL-AI Director Mr. Sunny Manchanda





Wednesday, February 26, 2020

Sampling Plan for Binomial Population with Zero Defects

Rough notes on sample size requirement calculations for a given confidence interval for a Binomial Population - having a probability p of success & (1 – p) of failure. The first article of relevance is Binomial Confidence Interval which lists out the different approaches to be taken when dealing with:

  • Large n (> 15), large p (>0.1) => Normal Approximation
  • Large n (> 15), small p (<0.1) => Poisson Approximation
  • Small n (< 15), small p (<0.1) => Binomial Table
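For the first case in the list above (large n, large p), the normal approximation interval can be sketched as follows. This is a minimal illustration of the standard Wald interval, p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n); the function name and the example numbers are made up, and z = 1.96 corresponds to roughly 95% confidence.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a binomial proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    # Clip to [0, 1] since a proportion cannot fall outside that range.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


low, high = wald_interval(50, 100)
print(round(low, 3), round(high, 3))
```

This approximation degrades for small n or for p close to 0 or 1, which is why the other two cases in the list call for different approaches.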

On the other side, there are derivatives of the Bayes Success Run theorem such as Acceptance Sampling, Zero Defect Sampling, etc. used to work out statistically valid sampling plans. These approaches are based on a successful run of n tests, in which either zero failures or at most an upper bound of k failures are seen.

These approaches are used in various industries like healthcare, automotive, military, etc. for performing inspections, checks and certifications of components, parts and devices. The sampling could be single sampling (one sample of size n with acceptance number c), or double sampling (a first smaller sample n1 with acceptance number c1 & a second larger sample n2 with acceptance number c2, to be used if the test on sample n1 shows more than c1 failures), or other sequential sampling versions of it. A few rule of thumb approximations have also emerged in practice based on the success run technique:

  • Rule of 3s: Provides an upper bound of p = 3/n on the failure probability, at 95% confidence, for a success run of length n with zero defects.
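Both the zero-defect sample size and the Rule of 3s follow from the same relation: if n independent trials all pass, then (1 - p)^n <= 1 - C bounds the failure probability p at confidence C. A minimal sketch, with function names made up for illustration:

```python
import math


def zero_defect_sample_size(p_max, confidence=0.95):
    """Smallest n such that n failure-free trials show p <= p_max at the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_max))


def zero_defect_upper_bound(n, confidence=0.95):
    """Exact upper bound on p after n trials with zero failures."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


n_required = zero_defect_sample_size(0.01)   # demonstrate p <= 1% at 95% confidence
exact_bound = zero_defect_upper_bound(100)   # bound after 100 clean trials
rule_of_three = 3 / 100                      # rule-of-thumb bound for the same run

print(n_required, exact_bound, rule_of_three)
```

The rule of thumb works because -ln(0.05) is approximately 3, so for small p the exact bound is close to, and slightly below, 3/n.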

Footnote on Distributions:
  • The Poisson confidence interval is derived from the Gamma distribution, which is defined using two parameters: shape & scale. Exponential, Erlang & Chi-Squared are all special cases of the Gamma distribution. The Gamma distribution is used in areas such as prediction of wait time, insurance claims, wireless communication signal power fading, age distribution of cancer events, inter-spike intervals and genomics. In Bayesian statistics, Gamma is also the conjugate prior for the rate parameter of the Poisson & exponential distributions.