Rtasya - Challenges

Is Validation of Complex AI Systems Feasible?

Concerns have been voiced that future systems employing "machine learning" might be so complex that humans will be unable to test or validate their reliability. The key to maintaining control over complex systems is to distinguish models that can be validated from models that cannot.

On the one hand, there are seemingly simple machines (e.g. Pokemon Yellow) that are "accidentally Turing-complete". For such machines it can be proven that even very simple questions about their behavior are undecidable, so performance characteristics certainly cannot be guaranteed in general. A closely related insight is that the language used to instruct a machine should be suitably restricted, preferably to a regular or context-free language. Otherwise it is difficult to provide security guarantees for the machine, and for machines that must interpret a Turing-complete language, no general security guarantee can be given.
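As a minimal sketch of what a restricted input language can look like, the example below accepts only commands from a small regular language and rejects everything else before any further processing. The command grammar (`GET`, `OFF`, `SET <number>`) and the function name `parse_command` are hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical command grammar (an assumption for illustration):
#   GET | OFF | SET <one or two decimal digits>
# The pattern uses only regular constructs (no backreferences or lookarounds),
# so the accepted language is a regular language.
COMMAND = re.compile(r"(GET|OFF|SET [0-9]{1,2})")

def parse_command(text: str):
    """Return (verb, argument) if text is in the language, else None."""
    if COMMAND.fullmatch(text) is None:
        return None  # reject anything outside the restricted language
    verb, _, arg = text.partition(" ")
    return (verb, int(arg)) if arg else (verb, None)

print(parse_command("SET 21"))          # a well-formed command
print(parse_command("SET 21; reboot"))  # trailing input is not in the language
```

Because the set of acceptable inputs is fully described by the pattern, it can be enumerated and tested exhaustively up to a given length, which is not possible for a Turing-complete input language.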

On the other hand, there is the class PSPACE of problems solvable with a polynomial amount of space, whose solutions can be efficiently validated by Interactive Proof Systems (IP), even though the class contains "hard" combinatorial problems. In other words, Adi Shamir's theorem

PSPACE = IP,

is the reason that the validation of solutions to very hard problems is possible, if the problem class is suitably constrained.
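The flavor of this result can be illustrated with the sum-check protocol, a standard building block of interactive proofs. The sketch below is a toy and not part of the original argument: the polynomial `g`, the field modulus `P` and all function names are assumptions chosen for illustration. The prover does the exponential work of summing `g` over all 0/1 inputs, while the verifier only checks a few low-degree polynomials and evaluates `g` once at a random point.

```python
import random
from itertools import product

P = 2**61 - 1           # prime modulus of the finite field (assumption)
N_VARS, DEGREE = 3, 2   # g has 3 variables and degree <= 2 in each of them

def g(x):
    """Toy polynomial whose sum over {0,1}^3 the prover claims to know."""
    x1, x2, x3 = x
    return (2 * x1 * x1 * x2 + x2 * x3 + x1 + 3) % P

def honest_prover_round(prefix):
    """Evaluations at X = 0..DEGREE of the partial sum with the leading
    variables fixed to `prefix` and the remaining ones summed over {0,1}."""
    rest = N_VARS - len(prefix) - 1
    return [
        sum(g(tuple(prefix) + (t,) + tail)
            for tail in product((0, 1), repeat=rest)) % P
        for t in range(DEGREE + 1)
    ]

def interpolate(evals, r):
    """Lagrange-evaluate the polynomial through (i, evals[i]) at the point r."""
    total = 0
    for i, yi in enumerate(evals):
        num, den = 1, 1
        for j in range(len(evals)):
            if j != i:
                num = num * (r - j) % P
                den = den * (i - j) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def verify(claimed_sum, prover_round, rng):
    """Verifier side of sum-check: one consistency check per variable,
    then a single evaluation of g at the random point."""
    claim, prefix = claimed_sum % P, []
    for _ in range(N_VARS):
        evals = prover_round(prefix)
        if (evals[0] + evals[1]) % P != claim:
            return False               # message inconsistent with the claim
        r = rng.randrange(P)           # random challenge
        claim = interpolate(evals, r)  # new claim for the next round
        prefix.append(r)
    return g(tuple(prefix)) == claim

rng = random.Random(0)
true_sum = sum(g(x) for x in product((0, 1), repeat=N_VARS)) % P
print(verify(true_sum, honest_prover_round, rng))      # honest claim is accepted
print(verify(true_sum + 1, honest_prover_round, rng))  # inflated claim fails the first check
```

A prover that also falsifies its round messages is caught only with high probability rather than with certainty; that probabilistic guarantee is what interactive proofs provide.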

The Goal of Independent Validation

The goal is to improve communication among humans about the strengths and weaknesses of the automated system and about how it evolves with new training data. This includes

refining the often only vaguely described desirable system behaviors – “trustworthy”, “reliable”, “ethical”, “unbiased”, “privacy-preserving” – into a complete set of measurable and actionable quality criteria,
describing the strengths of the automated system by identifying those input ranges for which the model meets certain quality criteria,
describing the weaknesses of the automated system by finding and displaying the “corner cases” or “adversarial inputs”, which lead to undesirable outputs, and
describing the explicit or implicit assumptions needed to maintain the desirable properties of the system, if the calibration of the model is updated with new training data.
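One way to make "input ranges for which the model meets certain quality criteria" concrete is to split a single input feature into bins and report the error rate per bin. The following is only a sketch under simplifying assumptions: a one-dimensional input, a classification error rate as the quality criterion, and hypothetical names (`acceptable_ranges`, `predict`, `edges`, `max_error_rate`).

```python
def acceptable_ranges(samples, predict, edges, max_error_rate):
    """Report, per input bin [lo, hi), whether the error rate is acceptable.

    samples: list of (x, label) pairs from a held-out test set
    predict: the model under validation, called as predict(x)
    edges:   increasing bin boundaries for the input x
    Returns a list of (lo, hi, error_rate, ok) tuples.
    """
    report = []
    for lo, hi in zip(edges, edges[1:]):
        in_bin = [(x, y) for x, y in samples if lo <= x < hi]
        if not in_bin:
            # No test data in this range, so no strength can be claimed.
            report.append((lo, hi, None, False))
            continue
        errors = sum(predict(x) != y for x, y in in_bin)
        rate = errors / len(in_bin)
        report.append((lo, hi, rate, rate <= max_error_rate))
    return report

# Toy data: the labels follow the rule x >= 5 except for x >= 8.
samples = [(x, (x >= 5) if x < 8 else False) for x in range(10)]
for row in acceptable_ranges(samples, lambda x: x >= 5, [0, 4, 8, 12], 0.1):
    print(row)
```

Bins that pass describe a strength of the system, bins that fail point to candidate corner cases, and empty bins mark input ranges about which the validation says nothing.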

The Meta-Schema for Independent Validation

Abstracting from the specific tools and methods needed for independent validation, the key questions that need to be asked and answered in the dialogue between the independent validators and the maintainers of a machine-learned system are:

Are the optimality criteria (i.e. the desirable system behavior) as well as the limits and constraints (i.e. the allowable system behavior) defined precisely enough that they can be measured and acceptability can be decided? Are the quality criteria and outputs of the system aligned with how the system is used?
Do the quality criteria properly account for rare but potentially severe losses due to undesirable system behavior?
Are the minimum requirements, which define the model’s allowable behaviors, fulfilled? How well does the model meet the objectives, which define the desirable behavior?
In the presence of randomness in the problem domain: Is the degree of uncertainty in the model output quantified? If the model implicitly uses forecasts of probability distributions, is the quality of these probability distribution forecasts quantified? Is the degree of selection bias or survivorship bias in the training data quantified?
Was the system subjected to adversarial tests? Are the boundaries of the input data defined within which the system’s allowable behaviors and a certain quality of the desirable behaviors can be guaranteed?
Are the sensitivities of model outputs to inputs and model parameters computed and graphically summarized? Are these sensitivities aligned with the allowable and desirable system behaviors? (Domain knowledge by human experts is needed to answer these questions.)
Were causality tests applied? To what extent do “spurious correlations” influence the system’s behavior? Does the system depend on spurious correlations for its response (“Clever Hans”)?
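Several of these questions can be turned into numbers. The sketch below shows three common measures, chosen here as illustrations rather than prescribed by the schema: expected shortfall for rare but severe losses, the Brier score for the quality of probability forecasts, and central finite differences for input sensitivities. All function names and default parameters are assumptions.

```python
def expected_shortfall(losses, tail_fraction=0.05):
    """Mean of the worst `tail_fraction` of the losses (at least one sample)."""
    ordered = sorted(losses, reverse=True)
    k = max(1, round(len(ordered) * tail_fraction))
    return sum(ordered[:k]) / k

def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.
    Lower is better; 0.0 means every forecast was certain and correct."""
    pairs = list(zip(probabilities, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

def sensitivities(model, x, step=1e-4):
    """Central finite-difference estimate of d model / d x[i] for each input."""
    grads = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += step
        down[i] -= step
        grads.append((model(up) - model(down)) / (2 * step))
    return grads

# Rare but severe losses: the mean hides what the tail measure exposes.
losses = [0.0] * 95 + [10.0] * 5
print(sum(losses) / len(losses), expected_shortfall(losses))

# Probability forecasts: uninformative 50/50 forecasts versus perfect ones.
print(brier_score([0.5, 0.5], [1, 0]), brier_score([1.0, 0.0], [1, 0]))

# Sensitivities of a toy model f(v) = 3*v0 + v1**2 at the point (1, 2).
print(sensitivities(lambda v: 3 * v[0] + v[1] ** 2, [1.0, 2.0]))
```

Whether the resulting numbers are acceptable is not a purely technical matter; as noted above, judging the sensitivities in particular requires domain knowledge from human experts.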