This project explores generating adversarial examples and fine-tuning the ResNet18 model on the MNIST dataset. The main focus is evaluating model performance under clean and adversarial conditions using FGSM (Fast Gradient Sign Method) and Projected Gradient Descent (PGD).
Adversarial examples are inputs designed to deceive machine learning models by introducing small, imperceptible changes to the data. These changes cause models to misclassify the input, exposing weaknesses in model robustness.
- FGSM (Fast Gradient Sign Method): This method generates adversarial examples by perturbing the input in the direction of the sign of the gradient of the loss with respect to the input. The perturbation is scaled by a factor (epsilon) that controls the strength of the attack; a minimal code sketch follows this list.
- Projected Gradient Descent (PGD): This is an iterative version of FGSM. It applies the gradient updates multiple times and projects the perturbations back into the allowed input space after each step, making it a more powerful adversarial attack.
- Other Techniques: Other methods for generating adversarial examples include the Carlini-Wagner attack and DeepFool. These can be more effective at bypassing defenses, but this project focuses on FGSM and PGD.
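For illustration, here is a minimal FGSM sketch in PyTorch. It is not the project's exact implementation; the `fgsm_attack` name and the assumption that pixels lie in [0, 1] are illustrative choices.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon):
    """One-step FGSM: move each pixel by epsilon along the sign of the loss gradient."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adv = images + epsilon * images.grad.sign()
    # Clamp to the valid input range (assumed to be [0, 1] for unnormalized MNIST pixels).
    return torch.clamp(adv, 0.0, 1.0).detach()
```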
The ResNet18 model was fine-tuned on the MNIST dataset to evaluate performance on clean data and under adversarial conditions. Below are the results after fine-tuning for 1 epoch:
- Clean Accuracy: 97.91%
- FGSM Accuracy: 12.51%
- FGSM + Gaussian Accuracy: 97.92%
As the results show, FGSM drastically reduced the model's accuracy, illustrating how vulnerable the fine-tuned ResNet18 is to adversarial examples. However, when Gaussian noise was added on top of the FGSM perturbations, accuracy recovered to 97.92%, suggesting that random noise can disrupt the precisely crafted FGSM perturbations and act as a simple defense.
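The report does not show how the "FGSM + Gaussian" inputs were produced; one plausible reading is that zero-mean Gaussian noise is added on top of the FGSM examples before evaluation. A minimal sketch under that assumption (the `sigma` value is illustrative):

```python
import torch

def add_gaussian_noise(adv_images, sigma=0.1):
    """Overlay zero-mean Gaussian noise on adversarial inputs; sigma=0.1 is an assumed value."""
    noisy = adv_images + sigma * torch.randn_like(adv_images)
    return torch.clamp(noisy, 0.0, 1.0)  # keep pixels in the assumed [0, 1] range
```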
PGD is an iterative adversarial attack that refines adversarial examples over multiple steps. By applying small perturbations repeatedly and projecting the result back into a feasible range (an epsilon-ball around the original input) after each step, PGD produces stronger adversarial examples than FGSM, making it a common benchmark for robustness and a standard choice for adversarial training.
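A minimal PGD sketch in the same style as the FGSM helper above; `epsilon`, the step size `alpha`, and the number of steps are assumptions rather than the project's exact settings.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, images, labels, epsilon=0.3, alpha=0.01, steps=40):
    """Iterative FGSM with projection back into the L-infinity epsilon-ball around the originals."""
    originals = images.clone().detach()
    adv = originals.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        # Small signed step, then project onto the epsilon-ball and the valid pixel range.
        adv = adv.detach() + alpha * grad.sign()
        adv = originals + torch.clamp(adv - originals, -epsilon, epsilon)
        adv = torch.clamp(adv, 0.0, 1.0)
    return adv.detach()
```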
For completeness, the raw output from fine-tuning and evaluating the ResNet18 model on MNIST is reproduced below:
    Fine-tuning pretrained ResNet on MNIST for 1 epoch...
    Epoch 1, Loss: 0.06693653437562896
    Fine-tuning complete. Model saved as 'finetuned_resnet18_mnist.pth'.
    Evaluation on 10000 MNIST samples (ResNet18):
    Clean Accuracy          : 9791/10000 = 97.91%
    FGSM Accuracy           : 1251/10000 = 12.51%
    FGSM + Gaussian Accuracy: 9792/10000 = 97.92%
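For reference, here is a minimal sketch of how such a one-epoch fine-tuning run could be set up in PyTorch. The channel adaptation, optimizer, learning rate, and batch size below are assumptions, not the project's recorded configuration.

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

# Standard MNIST loader (preprocessing and batch size are assumed, not taken from the project).
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor()),
    batch_size=64, shuffle=True)

# Adapt ImageNet-pretrained ResNet18 to 1-channel, 10-class MNIST.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(model.fc.in_features, 10)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One epoch of fine-tuning, mirroring the log above.
model.train()
for images, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

torch.save(model.state_dict(), "finetuned_resnet18_mnist.pth")
```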
- ResNet18 Performance: The model achieved a clean accuracy of 97.91% on the MNIST test set, but its accuracy dropped significantly to 12.51% under FGSM adversarial attack. The model's performance improved to 97.92% when Gaussian noise was added to the adversarial examples.
- Adversarial Defense: Incorporating noise or using advanced defense techniques like adversarial training can help improve the model's resilience to attacks like FGSM and PGD.
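As one concrete example of the defenses mentioned above, a minimal adversarial-training sketch is shown below. It reuses the hypothetical `fgsm_attack` helper and the `model`, `optimizer`, `criterion`, and `train_loader` objects from the earlier sketches; the epsilon value is an assumption.

```python
# Adversarial training (sketch): optimize on a mix of clean and FGSM-perturbed batches.
model.train()
for images, labels in train_loader:
    adv_images = fgsm_attack(model, images, labels, epsilon=0.25)  # epsilon is an assumed value
    optimizer.zero_grad()
    loss = criterion(model(images), labels) + criterion(model(adv_images), labels)
    loss.backward()
    optimizer.step()
```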
- ResNet18: Pretrained model used for classification tasks on MNIST.
- FGSM and PGD: Adversarial attack techniques for generating adversarial examples.
- Python: Programming language used for model development and optimization.
- PyTorch: Framework used for training the ResNet18 model and implementing the adversarial attacks.
- NumPy: For numerical computations during the model training and evaluation process.
- Integrate more sophisticated adversarial defense mechanisms like adversarial training and defensive distillation to improve model robustness.
- Experiment with more complex models (e.g., ResNet50, DenseNet) to test their resistance to adversarial attacks.
- Explore other adversarial attack methods like DeepFool or Carlini-Wagner to better understand model vulnerabilities and enhance defenses.