GLOSSARY
PERTURBATION
A small, often imperceptible modification made intentionally to input data in order to mislead a machine learning model.
Much like you ‘nudge’ the weights and biases in a particular direction to ‘train’ a network, you ‘perturb’ the input in such a way as to induce misclassification.
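Below is a minimal sketch of this idea using the Fast Gradient Sign Method (FGSM), a classic first-order attack. `model`, `x`, and `y_true` are assumed placeholders for a PyTorch classifier, an input batch, and its true labels; `epsilon` controls how far the input is nudged.

```python
import torch

def fgsm_perturb(model, x, y_true, epsilon=0.03):
    """Nudge x in the direction that increases the model's loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y_true)
    loss.backward()
    # Step along the sign of the input gradient -- the adversarial 'nudge'.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0, 1).detach()  # keep inputs in a valid range
```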
BLACK BOX ATTACKS vs. WHITE BOX ATTACKS
In a black box attack, the attacker has no knowledge of the internals of the model and may only have access to its input-output behavior. (The black box / white box terminology is borrowed from cybersecurity.)
In a white box attack, the attacker has full knowledge of the model, including its architecture, weights & biases, and sometimes even the training data.
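The practical difference is what the attacker's code can touch. A rough sketch, assuming a hypothetical PyTorch `model`: the white-box attacker can backpropagate to get exact input gradients, while the black-box attacker can only submit queries and read outputs.

```python
import torch

def white_box_view(model, x):
    # Full access: architecture, weights, and exact gradients via autograd.
    x = x.clone().detach().requires_grad_(True)
    model(x).sum().backward()
    return x.grad                # the true input gradient

def black_box_view(model, x):
    # Query access only: observe outputs; weights and gradients stay hidden.
    with torch.no_grad():
        return model(x)          # input-output pairs are all you get
```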
TARGETED vs. NON-TARGETED
Targeted: The goal is to cause the model to output a specific, incorrect response to certain inputs.
Non-Targeted: The objective is to cause any incorrect output, without specifying what the incorrect output should be.
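The distinction shows up directly in the loss an attacker optimizes. A minimal sketch (hypothetical PyTorch `logits`, `y_true`, `y_target`): a targeted attack descends toward the chosen class, while a non-targeted attack simply ascends away from the true class.

```python
import torch.nn.functional as F

def attack_loss(logits, y_true, y_target=None):
    if y_target is not None:
        # Targeted: drive the model toward the specific wrong class t.
        return F.cross_entropy(logits, y_target)
    # Non-targeted: drive the model away from the true class
    # (any misclassification counts).
    return -F.cross_entropy(logits, y_true)
```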
TRAINING TIME vs. INFERENCE TIME
Training: These attacks happen during the training phase of the model, e.g. by corrupting the training data or process (data poisoning).
Inference: These attacks happen after training, at the time of model inference or deployment, by manipulating the inputs fed to the model.
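A common training-time attack is data poisoning. The sketch below, with assumed NumPy label arrays, flips a small fraction of training labels before the model ever sees them; an inference-time attack would instead perturb inputs to an already-trained model, as in the FGSM sketch above.

```python
import numpy as np

def flip_labels(y, num_classes, fraction=0.05, seed=0):
    """Poison a training set by reassigning a fraction of its labels."""
    rng = np.random.default_rng(seed)
    y = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    # Shift each poisoned label by a nonzero offset so the class changes.
    y[idx] = (y[idx] + rng.integers(1, num_classes, size=len(idx))) % num_classes
    return y
```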
ZEROTH ORDER vs. FIRST ORDER vs. SECOND ORDER ATTACKS
Zeroth Order: No access to gradient information; relies solely on the model's observed outputs.
First Order: Utilizes the first derivative (gradient) of the function.
Second Order: Involves the second derivative or the Hessian (a matrix of second-order partial derivatives). This yields more precise perturbations but is more computationally expensive.
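A zeroth-order attacker can still estimate a gradient from outputs alone via finite differences. A sketch, assuming `f` is a scalar loss the attacker can evaluate by querying the model (note the cost: two queries per input dimension, which is why these attacks are query-hungry):

```python
import numpy as np

def zeroth_order_grad(f, x, h=1e-4):
    """Estimate the gradient of f at x using only function evaluations."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = h
        # Central difference along coordinate i: (f(x+h*e_i) - f(x-h*e_i)) / 2h
        grad.flat[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad
```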
ADVERSARIAL ATTACKS
Most adversarial attacks rest on the following observation: given an input x and any target classification t, it is possible to find a new input x′ that is similar to x but is classified as t. This can be formalized as an optimization problem, sketched below.
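One common way to write this formally (following the optimization view of Szegedy et al. and Carlini & Wagner; the norm p and the [0, 1] input domain are modeling choices):

```latex
\begin{aligned}
\min_{x'} \quad & \lVert x' - x \rVert_p \\
\text{s.t.} \quad & f(x') = t, \\
                  & x' \in [0, 1]^n
\end{aligned}
```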