Unexpected Behaviors in AI Systems

Victoria Krakovna has created a public Google spreadsheet tracking examples of AI systems that engaged in unexpected behavior, typically because the objective the system was supposed to accomplish was not properly specified. Krakovna refers to these as instances of “specification gaming,” in which the AI ends up “generating a solution that literally satisfies the stated objective but fails to solve the problem according to the human designer’s intent.”

For example:

A robotic arm trained to slide a block to a target position on a table achieves the goal by moving the table itself (a sketch of the misspecified reward follows after these examples).

. . .


A cooperative GAN architecture for converting images from one genre to another (e.g., horses ↔ zebras) has a loss function that rewards accurate reconstruction of an image from its transformed version; CycleGAN turns out to partially solve the task by, in addition to the cross-domain analogies it learns, steganographically hiding autoencoder-style data about the original image invisibly inside the transformed image to assist the reconstruction of details.
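
The robotic-arm case is a pure reward-misspecification story, and it is easy to sketch. What follows is a minimal, hypothetical reconstruction (simplified NumPy, invented positions, not the actual experiment's code): the reward only measures the block's distance to the target in world coordinates, so shoving the table under the block scores exactly as well as sliding the block.

```python
import numpy as np

def reward(block_pos_world, target_pos_world):
    # Misspecified objective: negative distance between block and target,
    # both measured in *world* coordinates. Nothing requires the block
    # to move relative to the table.
    return -np.linalg.norm(block_pos_world - target_pos_world)

target = np.array([0.5, 0.0])

# Intended solution: the arm slides the block across the table.
print(reward(np.array([0.5, 0.0]), target))   # -0.0, maximal reward

# Gamed solution: the arm moves the table itself; the block rides
# along and still ends up at the target in world coordinates.
block_riding_table = np.array([0.0, 0.0]) + np.array([0.5, 0.0])
print(reward(block_riding_table, target))     # -0.0, the same maximal reward

# Defining both the block's position and the target relative to the
# table (say, a marked spot on its surface) would close this loophole,
# but only if the designer anticipated it when writing the objective.
```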

That last quoted one, the CycleGAN result, is a fairly “clever” example; a sketch of the loss it games follows below.
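
The loss being gamed is CycleGAN’s cycle-consistency term, which only checks that an image survives the round trip from one domain to the other and back. Here is a minimal sketch of that term (simplified PyTorch; the toy G and F below are placeholder stand-ins for the real generator networks):

```python
import torch

def cycle_consistency_loss(x, G, F):
    # CycleGAN's cycle-consistency term: translate x from domain A to
    # domain B with G, translate back with F, and penalize the L1
    # difference from the original. Nothing constrains *how* G(x)
    # carries the information F needs, so G can hide a near-invisible
    # high-frequency encoding of x inside the translated image and F
    # can learn to decode it, satisfying the loss even while the
    # visible translation discards detail.
    return torch.mean(torch.abs(F(G(x)) - x))

# Toy stand-ins for the two generators (the real ones are deep
# convolutional networks):
G = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)  # domain A -> B
F = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)  # domain B -> A

x = torch.randn(1, 3, 64, 64)  # a batch of one 64x64 RGB image
print(cycle_consistency_loss(x, G, F).item())
```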
