The 3rd AI Revolution
The availability of larger data sets (“big data”) together with more powerful computers led to remarkable advances in the construction and calibration of models with an increasing number of parameters (“machine learning”). This spawned amazing technological breakthroughs in:
- image recognition (around 2012-2015), enabling autonomous cars and autonomous drones;
- natural language processing (NLP) and speech recognition, enabling virtual assistants like Siri, Google Now, Cortana, and Alexa (2010-2014), and general-purpose pre-trained language "transformer models" like ELMo, BERT and GPT-3 (2018-2020); and
- efficient knowledge representation and fast retrieval, enabling contest-winning demonstrations such as IBM Watson’s victory in Jeopardy! (2011), AlphaGo’s defeats of Lee Sedol (2016) and Ke Jie (2017) in the game of Go, and Libratus (2017) and Pluribus (2019) beating top professionals in no-limit Texas hold’em.
Eventually, general-purpose models were trained to understand images, videos and language in combination, e.g. Aleph Alpha's MAGMA and DeepMind's Perceiver.
The Need for Quality Standards and Quality Assurance
On the other hand, systems employing machine-learned models failed, sometimes spectacularly, for example:
- On May 7, 2016, a Tesla Model S failed to activate its automatic emergency brake in a fatal crash, after mistaking a semi-trailer truck for an overhead road sign. A very similar fatal crash occurred on March 1, 2019 with a Tesla Model 3.
- On March 18, 2018, an Uber self-driving car killed a pedestrian after it had repeatedly re-classified her as various different objects between 5.6 and 1.2 seconds before impact, without reducing speed. The system then alerted the safety driver and waited one second for the driver to take control. When the driver finally did – 0.2 seconds before impact – it was much too late.
Some machine learning models are easy to fool:
- Adding two white rectangles and two black rectangles to a stop sign caused an AI image recognition system to misclassify it as a “speed limit 45” sign.
- Image-classification networks can be re-programmed to perform a completely different task than they were trained to do, by adding a single adversarial perturbation to the input.
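The core mechanism behind such attacks can be illustrated with a minimal sketch. The code below uses a toy linear classifier as a stand-in for an image-recognition network (the model, the feature dimension and the perturbation budget are all illustrative assumptions, not taken from the attacks above): because the attacker knows the gradient of the score with respect to the input, a small, targeted perturbation of every input component is enough to flip the prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an image classifier: a fixed linear model
# that predicts class 1 when the score w.x + b is positive.
d = 100
w = rng.normal(size=d)
b = 0.0

def predict(x):
    return int(w @ x + b > 0)

# Take a random input and make sure it is classified as class 1.
x = rng.normal(size=d)
if predict(x) == 0:
    x = -x

# Gradient of the score w.r.t. the input is simply w, so moving each
# component a little against sign(w) lowers the score. Choose the step
# size just large enough to push the score below the decision boundary.
eps = (w @ x + b + 1.0) / np.abs(w).sum()
x_adv = x - eps * np.sign(w)

print(predict(x))      # 1: original input, classified correctly
print(predict(x_adv))  # 0: tiny per-component change flips the label
print(round(eps, 3))   # the per-component perturbation is small
```

This is the fast-gradient-sign idea in its simplest form; real attacks on deep networks work the same way, except that the gradient is obtained by backpropagation through the network rather than read off a weight vector.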
Some machine learning models are deceptive in the sense that - like 'Clever Hans' - they produce the correct predictions on certain inputs, but for the wrong reasons:
- An AI image classifier learned a spurious correlation between a watermark and pictures of horses. When the watermark is manually removed from the picture of a horse, the classifier is no longer able to recognize the horse; when the watermark is added to a picture of a car, the classifier labels the car a horse, see “Unmasking Clever Hans Predictors and Assessing What Machines Really Learn”. More broadly, AI classifiers have identified ships by the presence of water, trains by the presence of tracks, and wolves by the presence of snow.
- A medical AI image classifier learned spurious correlations and failed to generalize to hospitals that had not contributed training and validation data. It turned out that “the system had unexpectedly learned to identify metal tokens seen on the training and assessment images”, see “Detecting Spurious Correlations With Sanity Tests for Artificial Intelligence Guided Radiology Systems”.