Anatomy of an AI system is a real eye-opener. This helps us to get a high level view of the enormous complexity and scale of the supply chains, manufacturers, assemblers, miners, transporters and other links that collaborate at a global scale to help commercialize something like an Amazon ECHO device.
The authors explain how extreme exploitation of human labour, environment and resources that happen at various levels largely remain unacknowledged and unaccounted for. Right from mining of rare elements, to smelting and refining, to shipping and transportation, to component manufacture and assembly, etc. these mostly happen under in-human conditions with complete disregard for health, well-being, safety of workers who are given miserable wages. These processes also cause irreversible damage to the ecology and environment at large.
Though Amazon Echo as an AI powered self-learning device connected to cloud-based web-services opens up several privacy, safety, intrusion and digital exploitation concerns for the end-user, yet focusing solely on Echo would amount to missing the forest for the trees! Most issues highlighted here would be equally true of technologies from many other traditional and non-AI, or not-yet-AI, powered sectors like automobiles, electronics, telecom, etc. Time to give a thought to these issues and bring a stop to the irreversible damage to humans lives, well-being, finances, equality, and to the environment and planetary resources!
Insights on Java, Big Data, Search, Cloud, Algorithms, Data Science, Machine Learning...
Showing posts with label machine learning. Show all posts
Showing posts with label machine learning. Show all posts
Wednesday, March 31, 2021
Flip side to Technology - Extractivism, Exploitation, Inequality, Disparity, Ecological Damage
Friday, April 17, 2020
Analysis of Deaths Registered In Delhi Between 2015 - 2018
The Directorate of Economics and Statistics & Office of Chief Registrar (Births & Deaths), Government of National Capital Territory (NCT) of Delhi annually publishes its report on registrations of births and deaths that have taken place within the NCT of Delhi. The report, an overview of the Civil Registration System (CRS) in the NCT of Delhi, is a source of very useful stats on birth, deaths, infant mortality and so on within the Delhi region.
The detailed reports can be downloaded in the form of pdf files from the website of the Department of Economics and Statistics, Delhi Government. Anonymized, cleaned data is made available in the form of tables in Section Titled "STATISTICAL TABLES" in the pdf files. The births and deaths data is aggregated by attributes like age, profession, gender, etc.
Approach
In this article, an analysis has been done of tables D-4 (DEATHS BY SEX AND MONTH OF OCCURRENCE (URBAN)), D-5 (DEATHS BY TYPE OF ATTENTION AT DEATH (URBAN)), & D-8 (DEATHS BY AGE, OCCUPATION AND SEX (URBAN)) from the above pdfs. Data from for the four years 2015-18 (presently downloadable from the department's website) has been used from these tables for evaluating mortality trends in Delhi for the three most populous Urban districts of North DMC, South DMC & East DMC for the period 2015-18.
Analysis
1) Cyclic Trends: Data for absolute death counts for period Jan-2015 to Dec-2018 is plotted in table "T1: Trends 2015-18". Another view of the same data is as monthly percentage of annual shown in table "T-2: Month/ Year_Total %".
Both tables clearly show that there is a spike in the number of deaths in the colder months of Dec to Feb. About 30% of all deaths in Delhi happen within these three months. The percentages are fairly consistent for both genders and across all 3 districts of North, South & East DMCs.
As summer sets in from March the death percentages start dropping. Reaching the lowest points below 7% monthly for June & July as the monsoons set in. Towards the end of monsoons, a second spike is seen around Aug/ Sep followed by a dip in Oct/ Nov before the next winters when the cyclic trends repeat.
Trends reported above are also seen with moving averages, plotted in Table "T-3: 3-Monthly Moving Avg", across the three districts and genders. Similar trends, though not plotted here, are seen in the moving averages of other tenures (such as 2 & 4 months).
2) Gender Differences: In terms of differences between genders, far more deaths of males as compared to females were noted during the peak winters on Delhi between 2015-18. This is shown in table "T4: Difference Male & Female".
From a peak gap of about 1000 in the colder months it drops to about 550-600 range in the summer months, particularly for the North & South DMCs. A narrower gap is seen the East DMC, largely attributable to its smaller population size as compared to the other two districts.
Table "T5: Percentage Male/ Female*100" plots the percentage of male deaths to females over the months. The curves of the three districts though quite wavy primarily stay within the rough band of 1.5 to 1.7 times male deaths as compared to females. The spike of the winter months is clearly visible in table T5 as well.
3) Cross District Differences in Attention Type: Table "T6: Percentage Attention Type" plots the different form of Attention Type (hospital, non-institutional, doctor/ nurse, family, etc.) received by the person at the time of death.
While in East DMC, over 60% people were in institutional care the same is almost 20% points lower for North & South DMCs. For the later two districts the percentage for No Medical Attention received has remained consistently high, the South DMC being particularly high over 40%.
4) Vulnerable Age: Finally, a plot of the vulnerable age groups is shown in table "T7: Age 55 & Above". A clear spike in death rates is seen in the 55-64 age group, perhaps attributable to the act of retirement from active profession & subsequent life style changes. The gender skewness within the 55-64 age group may again be due to the inherent skewness in the workforce, having far higher number of male workers, who would be subjected to the effects of retirement. This aspect could be probed further from other data sources.
Age groups in-between 65-69 show far lower mortality rates as they are perhaps better adjusted and healthier. Finally, a spike is seen in the number of deaths in the super senior citizens aged 70 & above, which must be largely attributable to their advancing age resulting in frail health.
Conclusion
The analysis in this article was done using data published by the Directorate of Economics and Statistics & Office of Chief Registrar (Births & Deaths), Government of National Capital Territory (NCT) of Delhi annually on registrations of births and deaths within the NCT of Delhi. Data of mortality from the three most populous districts of North DMC, South DMC and East DMC of Delhi were analysed. Some specific monthly, yearly and age group related trends are reported here.
The analysis can be easily performed over the other districts of Delhi, as well as for data from current years as and when those are made available by the department. The data may also be used for various modeling and simulation purposes and training machine learning algorithms. A more real-time sharing of raw (anonymized, aggregated) data by the department via api's or other data feeds may be looked at in the future. These may prove beneficial for the research and data science community who may put the data to good use for public health and welfare purposes.
Resouces:
Downloadable Datasheets For Analysis:
The detailed reports can be downloaded in the form of pdf files from the website of the Department of Economics and Statistics, Delhi Government. Anonymized, cleaned data is made available in the form of tables in Section Titled "STATISTICAL TABLES" in the pdf files. The births and deaths data is aggregated by attributes like age, profession, gender, etc.
Approach
In this article, an analysis has been done of tables D-4 (DEATHS BY SEX AND MONTH OF OCCURRENCE (URBAN)), D-5 (DEATHS BY TYPE OF ATTENTION AT DEATH (URBAN)), & D-8 (DEATHS BY AGE, OCCUPATION AND SEX (URBAN)) from the above pdfs. Data from for the four years 2015-18 (presently downloadable from the department's website) has been used from these tables for evaluating mortality trends in Delhi for the three most populous Urban districts of North DMC, South DMC & East DMC for the period 2015-18.
Analysis
1) Cyclic Trends: Data for absolute death counts for period Jan-2015 to Dec-2018 is plotted in table "T1: Trends 2015-18". Another view of the same data is as monthly percentage of annual shown in table "T-2: Month/ Year_Total %".
Both tables clearly show that there is a spike in the number of deaths in the colder months of Dec to Feb. About 30% of all deaths in Delhi happen within these three months. The percentages are fairly consistent for both genders and across all 3 districts of North, South & East DMCs.
As summer sets in from March the death percentages start dropping. Reaching the lowest points below 7% monthly for June & July as the monsoons set in. Towards the end of monsoons, a second spike is seen around Aug/ Sep followed by a dip in Oct/ Nov before the next winters when the cyclic trends repeat.
Trends reported above are also seen with moving averages, plotted in Table "T-3: 3-Monthly Moving Avg", across the three districts and genders. Similar trends, though not plotted here, are seen in the moving averages of other tenures (such as 2 & 4 months).
2) Gender Differences: In terms of differences between genders, far more deaths of males as compared to females were noted during the peak winters on Delhi between 2015-18. This is shown in table "T4: Difference Male & Female".
From a peak gap of about 1000 in the colder months it drops to about 550-600 range in the summer months, particularly for the North & South DMCs. A narrower gap is seen the East DMC, largely attributable to its smaller population size as compared to the other two districts.
Table "T5: Percentage Male/ Female*100" plots the percentage of male deaths to females over the months. The curves of the three districts though quite wavy primarily stay within the rough band of 1.5 to 1.7 times male deaths as compared to females. The spike of the winter months is clearly visible in table T5 as well.
3) Cross District Differences in Attention Type: Table "T6: Percentage Attention Type" plots the different form of Attention Type (hospital, non-institutional, doctor/ nurse, family, etc.) received by the person at the time of death.
While in East DMC, over 60% people were in institutional care the same is almost 20% points lower for North & South DMCs. For the later two districts the percentage for No Medical Attention received has remained consistently high, the South DMC being particularly high over 40%.
4) Vulnerable Age: Finally, a plot of the vulnerable age groups is shown in table "T7: Age 55 & Above". A clear spike in death rates is seen in the 55-64 age group, perhaps attributable to the act of retirement from active profession & subsequent life style changes. The gender skewness within the 55-64 age group may again be due to the inherent skewness in the workforce, having far higher number of male workers, who would be subjected to the effects of retirement. This aspect could be probed further from other data sources.
Age groups in-between 65-69 show far lower mortality rates as they are perhaps better adjusted and healthier. Finally, a spike is seen in the number of deaths in the super senior citizens aged 70 & above, which must be largely attributable to their advancing age resulting in frail health.
Conclusion
The analysis in this article was done using data published by the Directorate of Economics and Statistics & Office of Chief Registrar (Births & Deaths), Government of National Capital Territory (NCT) of Delhi annually on registrations of births and deaths within the NCT of Delhi. Data of mortality from the three most populous districts of North DMC, South DMC and East DMC of Delhi were analysed. Some specific monthly, yearly and age group related trends are reported here.
The analysis can be easily performed over the other districts of Delhi, as well as for data from current years as and when those are made available by the department. The data may also be used for various modeling and simulation purposes and training machine learning algorithms. A more real-time sharing of raw (anonymized, aggregated) data by the department via api's or other data feeds may be looked at in the future. These may prove beneficial for the research and data science community who may put the data to good use for public health and welfare purposes.
Resouces:
Downloadable Datasheets For Analysis:
Wednesday, February 26, 2020
Sampling Plan for Binomial Population with Zero Defects
Rough notes on sample size requirement calculations for a given confidence interval for a Binomial Population - having a probability p of success & (1 – p) of failure. The first article of relevance is Binomial Confidence Interval which lists out the different approaches to be taken when dealing with:
On the other side, there are derivatives of the Bayes Success Run theorem such as Acceptance Sampling, Zero Defect Sampling, etc. used to work out statistically valid sampling plans. These approaches are based on a successful run of n tests, in which either zero or a an upper bounded k-failures are seen.
These approaches are used in various industries like healthcare, automotive, military, etc. for performing inspections, checks and certifications of components, parts and devices. The sampling could be single sampling (one sample of size n with confidence c), or double sampling (a first smaller sample n1 with confidences c1 & a second larger sample n2 with confidence c2 to be used if test on sample n1 shows more than c1 failures), and other sequential sampling versions of it. A few rule of thumb approximations have also emerged in practice based on the success run techique:
- Large n (> 15), large p (>0.1) => Normal Approximation
- Large n (> 15), small p (<0.1) => Poisson Approximation
- Small n (< 15), small p (<0.1) => Binomial Table
On the other side, there are derivatives of the Bayes Success Run theorem such as Acceptance Sampling, Zero Defect Sampling, etc. used to work out statistically valid sampling plans. These approaches are based on a successful run of n tests, in which either zero or a an upper bounded k-failures are seen.
These approaches are used in various industries like healthcare, automotive, military, etc. for performing inspections, checks and certifications of components, parts and devices. The sampling could be single sampling (one sample of size n with confidence c), or double sampling (a first smaller sample n1 with confidences c1 & a second larger sample n2 with confidence c2 to be used if test on sample n1 shows more than c1 failures), and other sequential sampling versions of it. A few rule of thumb approximations have also emerged in practice based on the success run techique:
- Rule of 3s: That provides a bound for p=3/n, with a 95% confidence for a given success run of length n, with zero defects.
- Success run sample size (n) using Confidence Interval (C) & Reliability (R = 1 -p), when sampling with replacement (sampled item replaced and maybe selected again) taking a Binomial Distribution:
n = ln(1-C)/ln( R), where ln is the natural log, based on a probability of R^n, for a successful run of length n with zero defects. This can be further extended to the case when a maximum of k defects/ failures is acceptable.
For a small population and when sampling without replacement (sampled item not replaced), the Hypergeometric Distribution is used for sampling n items from a population having size N with D defects. Accordingly, Operating Characteristic (OC) curves for the given problem are prepared and used to get values of C & R for a given n.
Footnote on Distributions:
- Poisson distribution is used to model events within time/ space that are rare (small p) but show up large number of times (large n) & occur independent of the time since last event. Inter arrival times is an iid exponential random variables.
- Poisson confidence interval is derived from Gamma Distribution - which is defined using the two-parameters shape & scale. Exponential, Erlang & Chi-Squared are all special cases of Gamma Distrubtion. Gamma distribution is used in areas such as prediction of wait time, insurance claims, wireless communication signal power fading, age distribution of cancer events, inter-spike intervals, genomics. Gamma is also the conjugate prior of Bayesian statistics & exponential distribution.
- Bayesian Success Run can be derived using the Beta Distribution which is the conjugate prior for Binomial. Beta Distribution is defined via two shape parameters. Beta Distribution applications is found in order statistics (selection of k-th smallest from Uniform distribution), subjective logic, wavelet analysis, project management (PERT), etc.
Tuesday, December 3, 2013
Real-time Face Reading
The machines getting better and better at face reading. Ancient mystics have another reason to worry. Won't be long before recommendation engines of various kinds get built that leverage this sort of technology.
More about algorithms in this space to follow..
More about algorithms in this space to follow..
Subscribe to:
Posts (Atom)