Anatomy of an AI System is a real eye-opener. It gives a high-level view of the enormous complexity and scale of the supply chains: the manufacturers, assemblers, miners, transporters and other links that collaborate at a global scale to commercialize something like an Amazon Echo device.
The authors explain how the extreme exploitation of human labour, environment and resources at various levels remains largely unacknowledged and unaccounted for. From the mining of rare elements, to smelting and refining, to shipping and transportation, to component manufacture and assembly, these stages mostly happen under inhuman conditions, with complete disregard for the health, well-being and safety of workers who are paid miserable wages. The same processes also cause irreversible damage to the ecology and the environment at large.
Though the Amazon Echo, as an AI-powered, self-learning device connected to cloud-based web services, opens up several privacy, safety, intrusion and digital-exploitation concerns for the end user, focusing solely on the Echo would amount to missing the forest for the trees! Most issues highlighted here are equally true of technologies from many other traditional and non-AI (or not-yet-AI) sectors like automobiles, electronics, telecom, etc. Time to give these issues some thought and put a stop to the irreversible damage to human lives, well-being, finances and equality, and to the environment and planetary resources!
Wednesday, March 31, 2021
Flip side to Technology - Extractivism, Exploitation, Inequality, Disparity, Ecological Damage
Monday, February 15, 2021
Parental Controls for Securing Online Usage by Children
As explained in the past, various safety features such as family-shield filters from providers like OpenDNS, Cloudflare and others, DNS over HTTPS (DoH) and HTTP Strict Transport Security (HSTS) can be used for hassle-free, safe browsing across devices for members of the family. To additionally secure and regulate usage by young kids, Parental Control features and tools can be employed on the devices and networks that children access.
Parental Controls are available out of the box across most device operating systems (OS) such as Android, iOS, and so on. All that the parent then needs to do is log in to the device using his/her credentials, indicate to the OS that the user of the device is a child, and switch ON parental controls. Once that's done, the parental controls get activated and only allow specific apps to run (apps whitelisted by the parent) while disallowing all others, and also filter out potentially harmful content from various sites and resources online.
Conceptually, that's pretty much all there is to Parental Controls! For more info you can check out online resources such as these by Vodafone, VI and Google for a better understanding and for setting up parental controls to protect your kids online.
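For the network-level filtering mentioned at the start, pointing the home resolver at a family-filtering DNS service is often enough. A sketch of what that looks like on a Linux box (on most home routers the same addresses go into the DNS fields of the admin page instead; the file path shown is the usual Linux location):

```
# /etc/resolv.conf (excerpt)
# OpenDNS FamilyShield resolvers: pre-configured to block adult content at the DNS level
nameserver 208.67.222.123
nameserver 208.67.220.123
```

Setting this on the router rather than per-device covers every gadget on the home network in one go.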
Friday, February 28, 2020
Defence R&D Organisation Young Scientists Lab (DYSL)
Recently there was quite a lot of buzz in the media about the launch of the DRDO Young Scientists Labs (DYSL). Five such labs have been formed by DRDO, each headed by a young director under the age of 35! Each lab has its own specialized focus area from among fields such as AI, Quantum Computing, Cognitive Technologies, Asymmetric Technologies and Smart Materials.
When trying to look for specifics on what these labs are doing, particularly the AI lab, there is very little to go by for now. While a lot of information about the long-established DRDO Centre for Artificial Intelligence and Robotics (CAIR) lab is available on the DRDO website, there's practically nothing there regarding the newly formed DRDO Young Scientists Lab on AI (DYSL-AI). Nor are the details available anywhere else in the public domain, till end-Feb 2020 at least. While these would certainly get updated soon, for now there are just these interviews with the directors of the DYSL labs:
- Doordarshan's Y-Factor Interview with the 5 DYSL Directors Mr. Parvathaneni Shiva Prasad, Mr. Manish Pratap Singh, Mr. Ramakrishnan Raghavan, Mr. Santu Sardar, Mr. Sunny Manchanda
- Rajya Sabha TV Interview with DYSL-AI Director Mr. Sunny Manchanda
Thursday, October 3, 2019
Firefox Normandy
To turn off Normandy in Firefox, use the advanced settings route: about:config > app.normandy.enabled = false.
Update 1 (23-Oct-19):
- In principle, Mozilla (Firefox) has always been in favour of user privacy.
Saturday, September 21, 2019
Last Petrol Car
Battery-operated EVs seem to be getting better by the day. There's good traction in the three-wheeler (battery-operated autos/ totos) space. Two- & four-wheelers are likely to hit mass markets soon, with pricing that would be lucrative (perhaps tax-incentivized). Further, widespread infrastructural & service support needs to be introduced to give people the confidence to switch to EVs.
Yet, at the moment, EV technologies (battery, chargers, fire & safety protocols, instrumentation, cabling & connectors, etc.) are at an early-to-mid maturity level. The driving range per charge is about 100 km for entry-segment cars, which is not enough. It's quite common for people to drive ~150 km daily for work. On highways, the distances could be much more. So a sub-300 km range would simply not do!
At the same time, the mass-market pricing levels (INR 3 to 6 lakh) should not be breached in any way. The existing coverage of mechanics & service centres of the various manufacturers (Maruti, Hyundai, Mahindra, Tata, etc.) needs to be upgraded to support EVs as well.
Reliable electricity remains a constraint in most cities, including the metros. On the generation side, renewables would need a wider push. Residential rooftop solar set-ups could be one area of focus. Through such set-ups, individual households & complexes could achieve self-sufficiency for their growing energy needs, including the EV burden/ load (at 20-30 units for a full charge per vehicle x 30 days = 600-900 units per vehicle per month). Standard practices employed the world over to popularize rooftop solar set-ups, such as PayGo models, incentives/ tax breaks, quality controls, support & maintenance, etc., should be introduced here as well. If possible, it would be great to have the EVs themselves equipped with solar panels on the body, to auto-charge whenever required under direct sunlight. Eagerly waiting for these clean, green technologies to evolve and make inroads very soon!
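The per-vehicle load estimate above is easy to sanity-check. A tiny Python sketch (the 20-30 units per full charge and one charge per day are the assumptions from the paragraph, not measured figures):

```python
def monthly_ev_load(units_per_full_charge: float, charges_per_month: int = 30) -> float:
    """Rough monthly household energy burden (in units/kWh) of charging one EV."""
    return units_per_full_charge * charges_per_month

# at 20-30 units per charge, a daily full charge works out to 600-900 units a month
low_estimate = monthly_ev_load(20)   # 600 units
high_estimate = monthly_ev_load(30)  # 900 units
```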
Update 1 (09-Oct-19):
- An assessment of the current state of EV adoption in India by Business Standard.
Update 2 (23-Oct-19):
- Bajaj Chetak to be relaunched in an Electric avatar.
- Blu-Smart all electric cabs visible on Delhi roads.
Thursday, September 19, 2019
Renewable Energy In India
From an energy generation capacity from renewables pegged at ~60 GW (in 2017), we are targeting about 175 GW (100 GW solar, 60 GW wind, 10 GW biomass, 5 GW small hydro power) by 2022, which would be close to 50% of our entire energy needs. With the groundwork for mass adoption of Electric Vehicles (EVs) gaining traction, our demand for power and for generation from renewables will need to scale up even further, to the extent that we may become energy-surplus one day and be able to export to the neighbourhood. For a sneak peek into the state of the art from the world of renewables, head over to the Renewable Energy India (REI) Expo 2019, currently underway at Knowledge Park II, Greater Noida.
REI 2019 has exhibitors from leaders in the renewables space such as China, Bangladesh, France, Germany, India, Israel, the Netherlands, Saudi Arabia, Singapore, Slovakia, South Korea, Taiwan, Tunisia, the UK, the USA, Vietnam, etc. They are showcasing product portfolios ranging from solar & wind power devices to installations on floating & permanent structures, from switching & grid apparatus to connectors, and from inverters & batteries to EVs, and more. Expo timings are 10 am to 6 pm. Walk-in as well as online registrations are allowed. Go see the future!
Update 1 (21-Sep-19):
- Listen to what Greta Thunberg has to say & check out her zero-carbon boat
Update 2 (23-Oct-19):
- Coal to continue powering India's energy requirements for decades - Swaminomics
Wednesday, September 18, 2019
Sim Swap Behind Twitter CEO's Account Hack
SIM-swap fraud can be carried out through some form of social engineering, by stealing or illegally sharing the personal data used to authenticate the user with the telecom operator. The other routes are malware or a virus-infected app or hardware taking over the user's device, or plain old manipulation of the telecom company's personnel through pressure tactics, bribes, etc.
In order to limit such frauds, DoT India has brought in a few mandatory checks into the process of swapping/ upgrading SIM cards, to be followed by all telecom operators. These include an IVRS-based confirmation call to the subscriber on the current working SIM, a confirmation SMS to the current working SIM, and the blocking of SMS services for 24 hours after the SIM swap.
Thanks to these checks, the 24-hour window is reasonably sized to allow the actual owner to react in case of a fraud. Once they realize that their phone has mysteriously gone completely out of network coverage for long, and doesn't work even after restarting or moving to a location known to have good coverage, alarm bells ought to go off. They should immediately contact the telecom operator's helpline number or visit the official store.
At the same time, the 24-hour window is not so long as to discomfort a genuine user wanting to swap/ upgrade. Since SMS services remain disabled, SMS-based OTP authentication for apps, banking, etc. does not work within this period, thereby preventing misuse by fraudsters.
Perhaps telecom regulators & players elsewhere need to follow suit. Twitter, meanwhile, has chosen to apply a band-aid solution by turning off its tweet-via-SMS feature after the hack. Clearly a lot more needs to be done to put an end to the menace.
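The 24-hour block boils down to a simple time comparison on the operator's side. A hypothetical Python sketch of the rule (function and parameter names are illustrative, not DoT's actual implementation):

```python
from datetime import datetime, timedelta

def sms_enabled(swap_time: datetime, now: datetime, block_hours: int = 24) -> bool:
    """SMS (and hence SMS-based OTPs) stays blocked for block_hours after a SIM swap."""
    return now - swap_time >= timedelta(hours=block_hours)

# a swap at 10:00 keeps SMS dark until 10:00 the next day
swap = datetime(2019, 9, 18, 10, 0)
```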
Thursday, August 29, 2019
What? A Man In The Middle!
To begin with, let's look at the high-level components that are a part of the digital traffic:
- Device: Phone, pad or desktop
- App: Running on the device (WhatsApp, Fb, Gmail, browser, etc.)
- Server: Server components of the service provider, organization, etc. that listen to & provide some service to the app
- Network: The wired, wireless or hybrid channel through which the digital packets (bits) travel between the device & the server
Device & Apps
The user's device is the first & most common point of vulnerability in the chain. Devices get infected by viruses or malware. Some defences include:
- Being particular about not installing any untrusted, unverified software. Install only reputed apps and software that are actively maintained & updated, so that known vulnerabilities in their components or dependent libraries get patched. App developers themselves must be well conversant with standards (secure cookies, etc.) and industry best practices such as the OWASP Top 10, to avoid building poor-quality, vulnerable apps/ software.
- Keeping devices updated. Staying up to date offers the best defence against recently detected vulnerabilities, which the manufacturers & software vendors rush to fix.
- By not clicking on unverified links or downloads.
- Making use of conservative settings for all apps, with absolutely minimal privileges. Vendor-provided default permissions are too lax & liberal in many cases, so review what permissions are present & change them to more minimal settings. For instance, why would an SMS messaging app need access to the phone's camera?
In order to avoid breaking your phone, make piecemeal changes to the app settings & test. If it works, great. If not, make a note & revert! Later, recheck the privileges that you felt were unnecessary but whose removal caused problems.
Too much work? Well, until device operating systems undergo major privacy-focussed revisions, there doesn't seem to be much of an alternative.
- Sticking only to the manufacturer specified software repositories for updates.
- For Windows-based or similar systems, installing an updated anti-virus is mandatory. Use the free (for personal use) Avast anti-virus if nothing else. Better still, switch to a more robust *nix-based OS.
- If you are a traditionalist using browsers, Mozilla Firefox set up with conservative & minimal privacy settings scores significantly over its competitors, which are mostly data-capturing ad machines. If possible, contribute to help keep Mozilla, a non-profit, afloat.
- Physically secure your device with a password/ PIN & do not allow any unknown person to use it. In case temporary access must be provided, especially on desktops, create guest credentials with limited privileges for that user.
Server
This is where the real action to process the user's request takes place. Whether it is information about the weather, sending emails, getting chat notifications, doing banking transactions or uploading photos, the user sends the request along with the data to the server to perform the necessary action. The server itself being a device (mostly a collection of devices: database, web server, load balancer, cloud service, etc.) is vulnerable to all of the device & app risks above, plus many others that server engineers & operations teams work to harden against.
Standards, learnings & best practices are shared widely by most of the leaders working in server-side technologies via blogs, articles, conferences, journals, communities, etc. The cloud vendors (Amazon AWS, Microsoft Azure, Google Cloud, Rackspace, and so on) are especially active in this regard. They are busy pushing the bar higher with improvements to the various server technologies being rolled out regularly.
There are open-source tools available to check different aspects of a server set-up. For instance, the OWASP test for HSTS (HTTP Strict Transport Security) & the SSL Labs Server Rating Guide provide details on the requirements for the server's SSL certificate used to encrypt data. SSL Labs also has an online tool to test & rate the set-up of any publicly accessible server & highlight potential weaknesses.
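As a small illustration of what such checks look at, here's a minimal Python sketch that parses a Strict-Transport-Security header value into its directives, roughly the way a browser would before enforcing the policy (parse_hsts is an illustrative name, not a library function):

```python
def parse_hsts(header: str) -> dict:
    """Split an HSTS header value like 'max-age=31536000; includeSubDomains'
    into a directive dictionary (directive names lower-cased)."""
    directives = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            name, _, value = part.partition("=")
            directives[name.strip().lower()] = value.strip().strip('"')
        else:
            # valueless directives such as includeSubDomains
            directives[part.lower()] = True
    return directives
```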
Network
Between the user's device and the server lies the network through which the data and instructions flow. The network may include wired components, wireless components or a combination of the two (routers, hubs, gateways, etc.). The best form of defence against the man-in-the-middle attack is to ensure that only strongly encrypted data is sent over the network (end-to-end (e2e) encryption).
The communication between the user's device & the server takes place over the secure HTTPS protocol, using a signed SSL certificate issued by a reputed certificate authority. This ensures that as long as the certificate's private key (known only to the server) remains secure, the end-to-end (e2e) encryption between the user's device & the server holds.
Yet there are ways in which a server set up for HTTPS communication might end up downgrading to the insecure HTTP protocol or being compromised (see the SSL Labs Server Rating Guide). The best defence against this is to set up the server to work solely over HTTPS, by enabling the HTTP Strict Transport Security (HSTS) policy.
Once HSTS is enabled on the server, any non-secure HTTP request to the server is either rejected or redirected to the secure HTTPS channel. All insecure HTTP requests from the user's end are automatically switched over to HTTPS, & the connection between client and server is dropped in case of a problem with the server's certificate. HSTS thus protects against various man-in-the-middle attack scenarios such as protocol downgrade (to insecure HTTP) & session hijacking.
Beyond e2e encryption & HSTS, the server-address lookup done by the user's device could also get manipulated (by ARP spoofing within a LAN & by DNS spoofing), so that user data gets directed to a fake server's address in place of the genuine one. Performing the address lookup securely via DNSSEC provides a good mitigation for the DNS vulnerability.
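Concretely, enabling HSTS is typically a one-line response-header change on the server. An nginx sketch (the one-year max-age shown is a common choice, not a mandate):

```
# nginx: send the HSTS header on every HTTPS response
# max-age is in seconds (31536000 = one year);
# includeSubDomains extends the policy to all subdomains
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
```

Browsers that have seen this header once will refuse plain-HTTP connections to the site until the max-age expires.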
These basic requirements are essential for managing the safety of users' data. Yet, in this eternal tussle between the yin and yang of security, a lot more needs to be done & the end goal certainly hasn't been reached. As new threats emerge, we can only hope to collectively strengthen our defences and stay alert & updated to remain secure.
Monday, August 26, 2019
Dconf, Gsettings, Gnome Files/ Nautilus Refresher
Thursday, April 26, 2018
Biometric Authentication
Starting off by calling out known facts & assumptions about thumb-prints (an example biometric):
- Thumb-prints are globally unique to every human being.
(Counter: Enough people don't have a thumb, or lose their thumb-prints for some other reason. Also, partial thumb-prints of two individuals, taken of a portion of the thumb due to faults at the time of scanning, etc., may match.)
- Thumb-prints stay consistent over the lifetime of an individual (adult).
(Counter: May not be true due to physical changes in the human body, external injuries, growths, etc.)
- Computers are basically binary machines. So whether it's a document (pdf, doc), an image file (jpg, gif, etc.), a video (mp4), a music file (wav), a Java program or a Linux operating system, all of the data, instructions, etc. get encoded into strings of bits (0s & 1s).
- The thumb-print scan of an individual is similar to an image file (following a standard protocol), encoded as a string of bytes.
- The thumb-print scans of two different individuals will result in two different strings of bytes, unique to each individual.
- Subsequent scans of the thumb-print of the same individual will result in exactly the same string of bytes over time.
That's enough background for a rough evaluation. A thumb-print scan of a certain size, say 10 kilobits, is just a string of 10,000 bits of 0s & 1s. This is unique to an individual & stays the same over the individual's lifetime.
A 4-digit PIN, on the other hand, is a combination of four integers. If each integer is encoded as a typical 32-bit string, the 4-digit PIN becomes a 4 x 32-bit = 128-bit string. The PIN normally stays the same unless explicitly changed (which is rather infrequent).
In simplistic terms, when a request to authenticate an individual is made to a computer, it reads the incoming string of bits (from the PIN or the thumb-print) & matches it against a database of known/ all existing strings (1-to-1 or 1-to-N matches). To the computer, other than the difference in length between the two encoded strings, thumb-print (10,000-bit) & PIN (128-bit), there's not much difference between the two.
On the other hand, the PIN fares much better than the thumb-print if it were ever to get compromised through a breach or a malicious app or something. The PIN can simply be changed, & a new 128-bit string replaces the earlier one going forward. But in the case of the thumb-print there's really nothing that can be done, as the individual's thumb-print scan stays the same over time!
Yet another alternative for authentication is the One Time Password (OTP). The OTP is also a 4-digit number (a 128-bit string in the same encoding), but it is re-issued each time over a separate out-of-band channel (such as SMS), is short-lived, & is valid for just one use. These features make the OTP far more robust & immune to breaches & compromise.
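For the curious, one standard way OTPs are generated on the server side is the HOTP algorithm of RFC 4226: an HMAC over a moving counter, truncated to a few digits. A minimal Python sketch (this shows the general mechanism, not any particular bank's implementation):

```python
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over a big-endian 8-byte counter,
    dynamically truncated to a short decimal code."""
    msg = struct.pack(">Q", counter)
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                     # low nibble picks the slice
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# each new counter value yields a fresh, single-use code
print(hotp(b"12345678901234567890", 0))  # RFC 4226 test vector: 755224
```

Because the counter moves on every use, a captured code is worthless for replay, which is exactly the property argued for above.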
What is a biometric to the human being is just another string of bits to the machine, very similar to the string of bits of a PIN or an OTP. From the standpoint of safety though, the OTP is far superior to the other two. As is common practice, it may be OK to use biometric authentication within environments such as government offices, airports, etc., where the network is tightly regulated & monitored. For end-user authentication, however, such as within phone apps, internet payments or other channels where the network or device is orders of magnitude more insecure & vulnerable, biometrics are not ideal. In general, OTPs should be the top pick & biometrics the last option in such cases:
OTP > Pin > Biometrics
Wednesday, March 7, 2018
Ubuntu 16.04 32-bit Display Issue/ Corruption on load/ boot
Followed this up with the standard routine of creating a bootable USB using the downloaded ISO, booting with the USB, plugging in the LAN cable (wifi drivers are initially unavailable & get downloaded later), formatting the disk & doing a fresh install (choosing the option to download all third-party drivers, etc.). All this went off smoothly & the laptop was ready to reboot.
After restarting, however, the display was corrupted. Practically the entire screen, from left to right, was covered with brightly coloured stripes, dots & squares, rendering it unusable. After a bit of fiddling around, found a work-around: close the lid & reopen it, forcing the laptop into standby, then exit standby by pressing the power button. This did restore the display, but there had to be a better solution.
A few suggestions online were to downgrade the Ubuntu kernel to an older version, 4.12 or lower. Further searching revealed the actual bug in kernel 4.13: Intel Mobile Graphics 945 shows 80% black screen. The work-around of suspend/ resume, as well as the proper solution of setting GRUB_GFXPAYLOAD_LINUX=text in the grub file, are mentioned there.
Setting the GRUB_GFXPAYLOAD_LINUX variable to text makes Linux boot up in normal text mode, typically to avoid display problems in the early boot sequence. As instructed, made the addition to the /etc/default/grub file, ran sudo update-grub, rebooted, & the display issue was gone!
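For reference, the change amounts to one line in the grub defaults file:

```
# /etc/default/grub (excerpt)
# boot in plain text mode to dodge early-boot display corruption
GRUB_GFXPAYLOAD_LINUX=text
```

Run sudo update-grub after editing, so the change gets written into the generated grub.cfg before the next reboot.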
Monday, February 26, 2018
Metro Train
Peak-Hour Rush, 100+% Occupancy
Interestingly, during peak hours not all coaches get equally packed. Certain coaches, typically the ones close to the staircases, are much more congested. Now, if only the passengers were notified in advance about the occupancy across the coaches of upcoming trains, they could move a little along the platform & board a less congested one.
This could be achieved via existing sensors on the train that capture weight, footfall, etc., or via video feeds from the on-board cameras within the coaches (see references below). The feed just needs to be relayed in real time to a screen/ dashboard on the platform (& to an app) visible to the customer. These feeds needn't be super accurate; a reasonable estimate (Low, Medium, High, Very High) of the occupancy should do. This data could also reveal other interesting insights on occupancy across days of the week, events, festivals, seasonality, etc.
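The coarse banding suggested above could be as simple as thresholding the estimated headcount per coach. A hypothetical Python sketch (the capacity figure and band thresholds are illustrative assumptions, not metro specifications):

```python
def occupancy_band(estimated_count: int, capacity: int = 300) -> str:
    """Map a rough per-coach headcount (from weight/footfall sensors or a
    camera-based people counter) to a coarse band for platform displays."""
    ratio = estimated_count / capacity
    if ratio < 0.5:
        return "Low"
    if ratio < 0.8:
        return "Medium"
    if ratio <= 1.0:
        return "High"
    return "Very High"   # over-capacity, i.e. the 100+% peak-hour case
```

Because the output is a band rather than an exact count, noisy sensor estimates are perfectly usable.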
Fig 1: Occupancy Across Coaches
Another observation is that low-to-medium-occupancy trains typically trail the high-occupancy ones. Perhaps there's a general tendency among people to board the first available train that shows up. On the other hand, if the feed could also show occupancy stats along with the arrival timings of the next two or three trains, passengers might wait a few minutes & board a less congested one.
Fig 2: Occupancy & Arrival Timings
Surprisingly, the expected arrival timings of the next two or three trains, fairly common elsewhere (like the Singapore MRT), are not available on the monitors here. This should be easy to introduce right away, even if the occupancy indicator takes time.
Optimizing Number of Coaches
The current logic of plying 6-coach trains in place of 8-coach ones should also be improved in the future. Presumably, 6-coach trains are run during off-peak hours to cut running costs to roughly 75% (6/8) of an 8-coach train's. Invariably though, back-to-back 6-coach trains show up during peak hours, leading to overcrowding inside the trains & long spiralling queues at the stations.
Working out the right moment to switch between the 6-coach & 8-coach (or other smaller) variants looks like a running-cost minimization problem subject to maximizing users' comfort. Key factors being peak-hour timings, occupancy levels, end-to-end runtime of the train, cost per km to ply the trains, time to hook/ unhook additional coaches, available parking space for spare coaches and so on. Very much worth a look by data-science folks.
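As a toy version of that optimization, a hypothetical Python sketch that picks the shortest (cheapest-to-run) train still covering the expected demand for a time slot (capacity per coach and the coach options are illustrative assumptions):

```python
def choose_train_length(expected_passengers: int,
                        capacity_per_coach: int = 300,
                        options: tuple = (6, 8)) -> int:
    """Return the fewest coaches whose total capacity covers expected demand;
    fall back to the longest variant when even it would be over-capacity."""
    for n in sorted(options):
        if n * capacity_per_coach >= expected_passengers:
            return n
    return max(options)

# off-peak demand fits a 6-coach train; peak demand forces the 8-coach one
```

A real scheduler would add the switching costs (time to hook/ unhook coaches, parking for spares) the post lists, but the core trade-off is this comparison.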
Beyond the 8-Coaches Barrier
Adding coaches to existing trains could probably work. These coaches would have to be attached at the ends of the train. Since they would be positioned beyond the platform limits, they would have to be door-less; entry/ exit would be through the adjacent coaches that sit on the platform and have doors. Doable in theory, though the additional movement across the train aisles, etc. will pose new engineering & security challenges.
References:
- RIVA/ VCA Counting
- People Counting Demonstration
- Stable Multi-Target Tracking in Real-Time Surveillance Video (CVPR 2011)
- Motion-Based Multiple Object Tracking
- Crowd Size Estimation
- Counting in Extremely Dense Crowd Images
- Algorithm to count people in a crowd
- People detection from above
- Which Algorithm is used to count the number of people in a video?
Saturday, February 10, 2018
Erring On The Side Of Caution
We are at a point where not just all transactions are done online; our interfacing with the banking & financial institutions is likely to become all virtual too. It's therefore important to start thinking about how this virtual world functions. Given that there's hardly any awareness programme for the nouveau digital customer, we are left to fend for ourselves, for now at least. Here are some of my ideas which, though half-baked, might help get your grey cells activated in the right direction.
Convenience Vs. Caution
We are all for convenience these days. With long queues starting to disappear, 24x7 banking turning into a reality & cheques heading towards obsolescence, we are gearing up for the inevitable fully digitized era. Yet we shouldn't throw caution to the wind. Be aware that the keys to your hard-earned money are now the cell phone & laptop in your hands. Don't allow them to be misused.
Liabilities
But then, as they say, you can't be too careful, can you? So it's important to also know what to do when things go wrong. What exactly are the liabilities of the banks? Where do the banks draw the line & what do they label as the customer's fault? Knowing things like how soon you need to report a fraud, or what happens if it took place overseas, in some god-forsaken currency, becomes important.
Investigation
The next question then is how banks investigate financial frauds: who, how, where, when, & what means do they employ, especially for frauds cutting across regional and international borders? For the investigating authorities, already cracking under humongous backlogs, how easy is it to investigate? Are there stats on how well they've been doing? Not to mention the other aspects around competence, intent, knowledge, effort, etc., all equally problematic. The best bet, therefore, is to be safe & steer clear of all this hassle.
Customization/ Personalization
Banks have this tendency to deal with all customers alike. At most they'll label you a standard or a premium category customer, more as a marker of your net worth than of your tech/ digital competence, though it's the latter kind of categorization that's more relevant.
There's a whole spectrum of people out there, from digital novices at one end to pros at the other. Why not segregate accordingly and personalize the handling? The novices need a lot more hand-holding: the systems should double-check all their transactions, allow them to keep all their limits (daily transactions, max value per transaction, etc.) low, and ensure that they don't make mistakes. The pros, on the other hand, can be allowed to operate with few or no checks.
Explain the implications of each digital category to the customer & allow them to label themselves as appropriate. And please let this be at the account level; a pro here might still be a novice there! Allow customers the option to customize their limits & features. At the moment, all limits are mostly set to one fixed value for all customers of a particular bank category or card type, etc., which needs to be made flexible for the customer. There may be people who require high limits on their cards while others don't, so give customers the option to set & change the limits at their convenience. At the same time, customers with low limits might temporarily require higher ones, which they could set for a specific duration (day, week, etc.) via one of the bank channels such as net banking, phone banking, ATM, etc.
Another aspect is to strongly differentiate between the mechanisms for getting informational/ read-only statements & data about your accounts vs. the transactionally enabled systems. Once email & mobile numbers are registered with the bank, customers should be able to easily request balance information, statements, notifications, etc., all read-only/ non-transactional information about their accounts (reasonably well supported even today).
However, what typically happens is that once a customer is activated for the informational service, other transactional services (fund transfer, bill pay, etc.) also get activated by default. That shouldn't be the case. Bank systems must differentiate between the two kinds of services (read-only information vs. transactional) & allow customers to select either as per their convenience. At the same time, the transactional systems should allow customizable limits & validation via multi-factor authentication.
Two-factor/ Multi-factor Authentication
Two-factor & multi-factor authentication are commonly heard terms, that work very well in practice. A user's identity is confirmed with 2 or more factors based on something they have (such as an ATM card) & something they know (a Pin). The general idea being that there's a very low probability of two (or more) factors getting compromised at the same time together. You may loose your card or your phone but not both together, at the same time. A chance of one in several million or so, & therefore considered safe.Any possibility to bypass the multi-factor authentication is a certain recipe for disaster. Double check with your bank if their digital access & interfacing points between you, the vendor & the bank are all multi-factor based.
While the ATM card + Pin is a perfect 2-factor example in the real/ physical world, the picture changes slightly when doing digital transactions online. In this case, the 1st factor is the Card No + Expiry Date + CVV No combination. That's right all 3 combined make up for the 1st factor. Why? Think of what happens if you were to loose the card, the finder has access to all of them. So whether you are asked to enter 3 details or a 100 details printed on that same card, that's still just 1-factor!
The 2nd factor then, is the Pin that you have to enter, similar to the ATM case. However, one major difference between when you are doing transactions online over the internet vs. when using the ATM case, is that inherently your home network is orders of magnitude more unsafe than the bank's network over which information from the ATM gets routed. There's a much higher likelihood of your computer, phone or network being hacked & someone (virus, man-in-middle, etc.) capturing all the card information & your Pin. These can then be used later to do fraudulent transactions or launch a Replay Attack.
Of course, the banks have known/ thought of this, & therefore allowed you an alternative in the form of One Time Password (OTP). An OTP is much better than the Pin, since they are regenerated each time, delivered to your phone (over a separate out-of-band SMS channel), & can be used just once. So even if they were to be replayed, the subsequent transactions would fail!
Perhaps a less heard of/ used device for the same one time password generation is the Security Token, sometimes called a dongle. A small standalone device that's immune to viruses, hacks, etc., it can work wonders for securing your digital transactions. Transactions get fulfilled only once you enter the temporary pin/ password flashing on the specific security token linked to your account. There are a whole bunch of variants out there, & it's about time the security token became a mainstay device in our banking & financial sector.
Interestingly, the old SMS based OTP mentioned earlier is a pretty good substitute for the security token - with one caveat, that the OTP should probably not be sent to a smart phone running apps with data connectivity. That's because most apps (good & malicious ones) can very easily detect/ access SMS & thereby form a self-fulfilling loop, violating the 2-factor authentication. (For payment apps, the valid 2nd factor is just the Pin that you know, & it should be changed often over a separate channel other than your smart phone, such as the ATM, phone-banking, etc.)
About the 1st factor (Card No, Expiry, CVV recycling)
You now know that either Pins or OTPs make up the 2nd factor & why OTPs are always better: essentially, they are short lived & one time use. So wouldn't it help to make the 1st factor, the details printed on the card, short lived as well? Yes, certainly - if the cards could be re-issued often. Though it may not be feasible given the printing/ shipping costs & other reasons; banks tend to issue cards with validities that span several years. Could they instead issue temporary one time use cards (similar to OTPs) sent virtually (no printed cards needed)? Well perhaps, but then the temporary one time card details can't be delivered via SMS (or netbanking or email), otherwise it would be using the same channel as the OTP & would violate the 2-factor requirements. Other ways that could possibly work are phone banking, two separate phone nos., or the security token (aha) - better ideas welcome.
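A bank-side sketch of such a temporary one time card might look like the following - a hypothetical illustration (the `VirtualCardIssuer` name & its methods are invented here, not any real banking API): issue a random short-lived card number, & consume it on first use so a second use is rejected.

```java
import java.security.SecureRandom;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of bank-side issuance of a single-use virtual card number.
public class VirtualCardIssuer {
    private final SecureRandom random = new SecureRandom();
    private final Map<String, Instant> active = new HashMap<>();

    // Issue a random 16-digit card number, valid only for a short window (24h here).
    public String issue() {
        StringBuilder sb = new StringBuilder(16);
        for (int i = 0; i < 16; i++) sb.append(random.nextInt(10));
        String number = sb.toString();
        active.put(number, Instant.now().plusSeconds(24 * 3600));
        return number;
    }

    // Redeeming consumes the number: a second use (or an expired one) is rejected.
    public boolean redeem(String number) {
        Instant expiry = active.remove(number);
        return expiry != null && Instant.now().isBefore(expiry);
    }
}
```

The one-time-use property mirrors the OTP: even if the virtual card details leak in transit, they are worthless after the first legitimate transaction.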
Phone Number Recycling
Yet another thing that seems weird is this phenomenon of allowing phone nos. to get recycled. Things may have been somewhat ok in the past, but now it's absolutely wrong to allow the telecom vendor to cancel a Mr. Sharma's phone for x, y, z reasons & issue it later to a Mr. Verma after 180 days or whatever. As things stand today:
Phone no. recycling = Exposing bank a/c, personal Id, etc., & this needs to stop! Phone companies could still block & disable a no., but shouldn't reissue it to anybody, other than maybe immediate family.
Legacy vs. Digital Bank
Just as we discussed that from the bank's perspective there are different sorts of customers out there, tech savvy to novices, from the customer's perspective as well it makes sense to hold accounts with different banks. Use only one or two of those online, & use the rest in a legacy/ offline mode to keep things safe. For the legacy offline mode to continue, cheques or something similar will need to survive. Though cheques have been in existence for aeons, in their current form they seem vulnerable in terms of security. Cheques involve a long winding offline fulfilment loop for the payout, & a kind of good faith delayed payout understanding between the payer & payee. There's a physical instrument (the cheque) issued by the bank in the possession of the payer (= something you have, reasonably safe, though cheque numbers ought to be randomized), a signature uniquely known & reproducible by the payer (= something you know, unsafe & publicly exposed), a transportation of the cheque from the payer to the bank by the payee (rather unsafe, as the cheque might move through the hands of several intermediaries), verification of the payer's details & signature by the payer/ payee bank (safe, online), & finally the payout if all's well.
As mentioned earlier, cheque numbers are typically issued in sequence, making them prone to hacks/ fakes, & should definitely be replaced with randomly generated numbers. Beyond that, there could be a mechanism to uniquely generate a limited validity (30 days perhaps), one time signature for the cheque after entering the amount & payee details. The signature could be generated on the bank's site using a card (with multi-factor authentication), or via some other offline mechanism (such as phone banking), or via the security token, & shared with the payee/ written in place of the signature. The generated signature could also be partly human readable (for the benefit of the payee) & look like:
<AMOUNT>-<GENERATED_ALPHA_NUMERIC_KEY>
At the verification leg, the banks simply need to verify the combination of the cheque number, payee name, amount & the one time signature - no differently from what's done today. This should make this legacy instrument somewhat safer to use, if it survives into the future.
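The one time signature scheme sketched above could be realized as an HMAC computed by the bank over the cheque number, payee & amount, rendered in the <AMOUNT>-<GENERATED_ALPHA_NUMERIC_KEY> format. A minimal sketch, assuming a bank-held secret key (the names & key lengths here are illustrative; a real scheme would also encode the validity window):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch of a one time cheque signature: an HMAC over cheque no., payee & amount,
// rendered as <AMOUNT>-<GENERATED_ALPHA_NUMERIC_KEY> so the amount stays human readable.
public class ChequeSignature {
    public static String sign(String bankKey, String chequeNo, String payee, long amount) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(bankKey.getBytes(), "HmacSHA256"));
            byte[] digest = mac.doFinal((chequeNo + "|" + payee + "|" + amount).getBytes());
            // Hex-encode & truncate to a short alphanumeric key.
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return amount + "-" + hex.substring(0, 12);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

At the verification leg, the bank recomputes the same HMAC from the presented cheque number, payee & amount, & accepts the cheque only if the signature matches - tampering with any of the three changes the key entirely.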
Artificial Intelligence (AI)
Finally, in the not so distant future, the next generation of digital technology & AI would act as our sentinels. These AI powered machines, devices, algorithms and apps would detect, block, defer, or double confirm transactions on a case by case basis, to find that sweet spot between the customer's convenience & safety. Till then, be safe, be happy!
Monday, January 22, 2018
Streaming Solutions
Another very popular programming methodology in recent times is Reactive programming. In some sense this is a special case of event driven programming, with the focus on a data change (as the event) & the reactive step of making other downstream data changes (as the handlers).
A whole bunch of frameworks for streaming solutions have emerged from the Big Data ecosystem such as Storm, Spark Streaming, Flink, etc. These allow for quick development of streaming solutions using high level abstractions. Even Solr has a streaming expression support now for building distributed streaming search solutions.
Outside of these frameworks, Akka Streams seems promising. It's built on top of Akka's robust Actor model & the Reactive Streams API. Solutions such as Gear Pump can provide a sense of the ground-up solutions possible with Akka Streams.
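For a dependency-free taste of the Reactive Streams idea, the JDK itself ships the `java.util.concurrent.Flow` API (Java 9+), which standardizes the same publisher/ subscriber contract that Akka Streams implements. A minimal sketch: data changes are published as events, & a subscriber reacts with a downstream transformation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

// Minimal reactive example on the JDK's Reactive Streams API (java.util.concurrent.Flow):
// each published data change triggers a downstream transformation in the subscriber.
public class ReactiveSketch {
    public static List<Integer> doubledChanges(List<Integer> changes) {
        List<Integer> results = new ArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<Integer>() {
                private Flow.Subscription subscription;
                public void onSubscribe(Flow.Subscription s) { subscription = s; s.request(1); }
                // React to each data change: here, a downstream doubling.
                public void onNext(Integer item) { results.add(item * 2); subscription.request(1); }
                public void onError(Throwable t) { done.countDown(); }
                public void onComplete() { done.countDown(); }
            });
            changes.forEach(publisher::submit);
        } // closing the publisher signals onComplete to the subscriber
        try {
            done.await();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return results;
    }
}
```

Note the `request(1)` calls: this is the back-pressure half of the Reactive Streams contract - the subscriber pulls items at its own pace rather than being flooded by the publisher.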
Tuesday, December 24, 2013
Mechanical Sympathy
More details to follow soon on this topic; for the moment you could refer to Martin Fowler's post.
Friday, November 2, 2012
Using Pentaho Kettle to Index Data in Solr
Solr, on the other hand, is a rich and powerful production grade search engine built on top of Lucene. So how would it be to get the two to function in tandem - to use Kettle to load data into Solr for indexing?
The data load phase for indexing in Solr is very similar to an ETL process. The data is sourced (Extract) from a relational database (MySql, Postgres, etc.). This data is denormalized and transformed to a Solr compatible document (Transform). Finally, the transformed data is streamed to Solr for indexing (Load). Kettle excels in performing each of these steps!
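The Transform step is the interesting one: denormalizing relational rows into a flat, Solr-style document of field → value pairs. A hedged sketch of what such a step might do (the field names & the product/ category shape are illustrative, not from any particular schema):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the Transform step: denormalize a relational row plus its
// joined child rows into a flat, Solr-style document (field -> value map).
public class SolrDocTransform {
    public static Map<String, Object> toSolrDoc(Map<String, Object> productRow,
                                                List<String> categories) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", productRow.get("product_id"));   // Solr's unique key field
        doc.put("name", productRow.get("product_name"));
        doc.put("category", categories);               // joined rows become a multi-valued field
        return doc;
    }
}
```

In Kettle this mapping would be expressed with its built-in join & field-rename steps (or a Custom Java Code step); the resulting documents are then posted to Solr in batches for the Load leg.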
A Kettle ETL job to load data into Solr for indexing is a good alternative to Solr's very own Data Import Handler (DIH). Since the DIH typically runs off the same Solr setup (with a few common dependencies), there's some intermixing of concerns between what Solr is good at (search & indexing) and what the DIH is built to do (import documents). The DIH also competes for resources (CPU, IO) with Solr. Kettle has no such drawbacks and can be run off a different set of physical boxes.
There are additional benefits of using Kettle, such as the availability of stable implementations for working across data sources, querying, bulk loading, and setting up staged workflows with configurable queues & worker threads. Kettle's exception handling, retry mechanism, REST/ WS client, JSON serializer, custom Java code extension, and several handy transformation capabilities all add up in its favour.
On the cons side, given that the call to Solr would be via a standard REST client from Kettle, the set-up would not be SolrCloud or ZooKeeper (ZK) aware, and so unable to do any smart routing of documents. One option to solve this could be to use the Custom Java Code step in Kettle and delegate the call to Solr via SolrJ's CloudSolrServer client (which is SolrCloud/ ZK aware).