Wednesday, March 31, 2021

Flip side to Technology - Extractivism, Exploitation, Inequality, Disparity, Ecological Damage

Anatomy of an AI System is a real eye-opener. It gives a high-level view of the enormous complexity and scale of the supply chains, manufacturers, assemblers, miners, transporters and other links that collaborate globally to commercialize something like an Amazon Echo device.

The authors explain how the extreme exploitation of human labour, the environment and natural resources at every stage remains largely unacknowledged and unaccounted for. From the mining of rare elements, to smelting and refining, to shipping and transportation, to component manufacture and assembly, these activities mostly happen under inhuman conditions, with complete disregard for the health, well-being and safety of workers who are paid miserable wages. These processes also cause irreversible damage to the ecology and the environment at large.

Amazon Echo, as an AI-powered, self-learning device connected to cloud-based web services, opens up several privacy, safety, intrusion and digital-exploitation concerns for the end user, yet focusing solely on Echo would amount to missing the forest for the trees! Most issues highlighted here are equally true of technologies from many other traditional, non-AI (or not-yet-AI) sectors like automobiles, electronics and telecom. It is time to give these issues serious thought and put a stop to the irreversible damage to human lives, well-being, finances and equality, and to the environment and planetary resources!

Monday, March 29, 2021

Doing Better Data Science

In the article titled "Field Notes #1 - Easy Does It", author Will Kurt highlights a key aspect of doing good Data Science: simplicity. This means, first and foremost, getting a good understanding of the problem to be solved, and then, among the candidate hypotheses and possible solutions/models, favouring the simpler ones, or at least giving them a fair and equal chance to prove their worth in tests that use standardized performance metrics.
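To make that concrete, here is a minimal sketch (assuming a scikit-learn-style workflow and using its bundled diabetes dataset purely for illustration) of giving a trivial baseline and a simple linear model exactly the same cross-validated evaluation, on the same splits and metric, as a more complex model:

```python
# A minimal sketch: give a simple baseline a fair, equal chance against a
# more complex model by using identical cross-validation splits and metric.
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

X, y = load_diabetes(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # same splits for every model

models = {
    "mean baseline": DummyRegressor(strategy="mean"),
    "linear regression": LinearRegression(),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
    print(f"{name:>20}: MAE = {-scores.mean():.1f} +/- {scores.std():.1f}")
```

If the complex model cannot clearly beat the baseline or the linear model under this shared protocol, the simpler option wins by default.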

Another article of relevance for Data Scientists comes from the allied domain of statistics: "The 10 most common mistakes with statistics, and how to avoid them". The article, based on the eLife paper by Makin and Orban de Xivry, lists the ten most common statistical mistakes in scientific research. The paper also includes tips for reviewers on detecting such mistakes and for researchers (authors) on avoiding them.

Many of the issues listed are linked to the p-value, which is used to establish the significance of statistical tests and to draw conclusions from them. Incorrect usage and understanding of p-values, missing corrections for multiple comparisons, and outright manipulation render tests ineffective and lead to insignificant results being reported as findings. Issues with sampling and inadequate control groups, along with faulty attempts to establish causation where none exists, are also common in the scientific literature.
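The danger of uncorrected multiple comparisons, in particular, is easy to demonstrate with a small simulation. The sketch below (using NumPy and SciPy, with an arbitrary choice of 100 tests and 30 samples per group) runs t-tests on pure noise and shows how a simple Bonferroni correction removes the spurious "significant" results:

```python
# A small simulation of the multiple-comparisons problem:
# 100 t-tests on pure noise (no real effect), with and without correction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_tests, alpha = 100, 0.05

p_values = []
for _ in range(n_tests):
    a = rng.normal(loc=0.0, scale=1.0, size=30)  # both groups drawn from the
    b = rng.normal(loc=0.0, scale=1.0, size=30)  # same distribution: no effect
    p_values.append(stats.ttest_ind(a, b).pvalue)
p_values = np.array(p_values)

# Uncorrected, roughly alpha * n_tests tests look "significant" by chance alone.
print("uncorrected 'significant' results:", (p_values < alpha).sum())
# Bonferroni: compare each p-value against alpha / n_tests instead.
print("Bonferroni-corrected results:     ", (p_values < alpha / n_tests).sum())
```

Bonferroni is only one of several correction procedures; the point is simply that some correction is needed whenever many tests are run on the same data.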

According to the authors, these issues typically arise from ineffective experimental designs, inappropriate analyses and/or flawed reasoning. A strong publication bias, and the pressure on researchers to publish significant results rather than correct but failed experiments, makes matters worse. Moreover, senior researchers entrusted with mentoring juniors are often unfamiliar with the fundamentals and prone to making these errors themselves, and their aversion to criticism becomes a further roadblock to improvement.

While proper mentoring of early-stage researchers will certainly help, change can also come from making science open and accessible. Open science/research should include details on all aspects of the study and all the materials involved, such as the data and the analysis code. At the level of institutions and funders, incentivizing correctness over productivity can also prove beneficial.