Imperceptible Perturbations support for TextAttack #817

vlwk · 2025-04-17T11:55:53Z

Summary

This PR adds the Bad Characters: Imperceptible NLP attack. It introduces a new dimension to attacks: perturbations that are invisible (on some rendering systems). Full details can be found in textattack.attack_recipes.BadCharacters2021. It uses a combination of the differential evolution search algorithm, four different transformations (invisible characters, homoglyphs, deletions, reorderings) and various goal functions.

Notes:

I suspect this PR might need multiple rounds of reviews and I am happy to make any requested changes at any time.
It's a really big PR. We could explore merging some bits of it first if you prefer.
There are many commits and some of them, especially the earlier ones, were either carelessly named or not atomic. It will probably be very hard to follow commit by commit. Probably best to squash before merging.
I really enjoyed using TextAttack, it is really well built. Thank you!

Additions

Attack recipe:

Added BadCharacters2021 recipe as textattack.attack_recipes.BadCharacters2021.

Tests:

Added a folder tests.badcharacters2021 which contains a notebook badcharacters.ipynb and a requirements file requirements.txt. You may run the entire notebook from start to end. Flags at the top of the notebook control whether the downloaded temp files and the results are saved or deleted automatically. The perturbation type can also be chosen. Each of the five experiments in the paper is replicated in this notebook, including the custom model wrappers each experiment needs.

Docs:

Added detailed docs. Used sphinx-apidoc -f -o apidoc -d 6 -E -T -M ../textattack to generate the content in apidoc. Note: this seemed to make minor modifications to every single file in apidoc. Not sure if that is intended behaviour.

Transformations:

Added WordSwapDifferentialEvolution in textattack.transformations
Added WordSwapInvisibleCharacters, WordSwapDeletions, WordSwapReorderings in textattack.transformations, which extend WordSwapDifferentialEvolution.
Added intentional_homoglyphs.txt to textattack.shared.
WordSwapHomoglyphSwap was modified.
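
To make the perturbation types concrete, here is a minimal sketch of two of them (invisible characters and homoglyphs) on plain strings. It is illustrative only: the real transformations operate on AttackedText objects and read their mappings from intentional_homoglyphs.txt, whereas the tiny HOMOGLYPHS table and the helper names below are assumptions made up for this example.

```python
# Illustrative only: plain-string versions of two perturbation types.
ZERO_WIDTH_SPACE = "\u200b"  # renders as nothing in many fonts
# Hypothetical mini-table of Latin -> Cyrillic look-alikes (not the PR's file).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}


def inject_invisible(text: str, position: int) -> str:
    """Insert a zero-width space before the character at `position`."""
    return text[:position] + ZERO_WIDTH_SPACE + text[position:]


def swap_homoglyph(text: str, position: int) -> str:
    """Replace the character at `position` with a look-alike, if one is known."""
    replacement = HOMOGLYPHS.get(text[position], text[position])
    return text[:position] + replacement + text[position + 1:]


original = "good movie"
invisible = inject_invisible(original, 2)   # one extra, non-rendering code point
homoglyph = swap_homoglyph(original, 1)     # same length, different code point
```

Both results look like the original on many rendering systems, yet a model's tokenizer sees different code points.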

Search methods:

Added DifferentialEvolution in textattack.search_methods
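
For readers unfamiliar with the algorithm, the following is a dependency-free sketch of a generic differential evolution loop (the common rand/1/bin variant). It is not the code in textattack.search_methods; the function name, parameters, and defaults are assumptions chosen for illustration, and the PR's search method may use a different variant or library.

```python
import random


def differential_evolution(objective, bounds, pop_size=10, generations=30,
                           mutation=0.8, crossover=0.7, seed=0):
    """Minimise `objective` over box `bounds` with the rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initial population: uniform samples inside the bounds.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in population]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover:
                    value = population[a][d] + mutation * (population[b][d] - population[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip back into the bounds
                else:
                    value = population[i][d]
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                population[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return population[best], scores[best]


def sphere(vector):
    """Toy objective standing in for a goal-function score."""
    return sum(x * x for x in vector)


best_vector, best_score = differential_evolution(sphere, [(-5, 5), (-5, 5)])
```

In the attack setting, the vector being evolved would be the perturbation vector and the objective would come from the goal function rather than a toy function.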

Goal functions:

Added new folder textattack.goal_functions.custom
Added LogitSum, NamedEntityRecognition, TargetedBonus, TargetedStrict in textattack.goal_functions.custom
Added MaximizeLevenshtein in textattack.goal_functions.text
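
As a rough illustration of the quantity a goal like MaximizeLevenshtein would work with, here is a pure-Python edit distance. This is a sketch under assumptions: the PR depends on the Levenshtein package rather than this function, and how the distance is turned into a goal score is not shown here.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]
```

A larger distance between the model's output on the original input and its output on the perturbed input would indicate a more disruptive perturbation.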

Goal function results:

Added new folder textattack.goal_function_results.custom
Added LogitSumGoalFunctionResult, NamedEntityRecognitionGoalFunctionResult, TargetedBonusGoalFunctionResult, TargetedStrictGoalFunctionResult in textattack.goal_function_results.custom

Validators:

Added transformation_consists_of_word_swaps_differential_evolution in shared.validators. This is used to check that the DifferentialEvolution search method is used with a compatible transformation, which must subclass WordSwapDifferentialEvolution.
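
The compatibility rule described above amounts to a subclass check. The sketch below uses empty stand-in classes and a hypothetical helper name; it does not reproduce the validator's actual signature in shared.validators.

```python
class WordSwapDifferentialEvolution:
    """Stand-in for the PR's base class."""


class WordSwapInvisibleCharacters(WordSwapDifferentialEvolution):
    """Stand-in for one of the compatible transformations."""


class SomeOtherTransformation:
    """Stand-in for a transformation that does not subclass the base class."""


def is_compatible_with_differential_evolution(transformation) -> bool:
    # Hypothetical helper: accept only instances of the DE-aware base class.
    return isinstance(transformation, WordSwapDifferentialEvolution)
```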

AttackArgs:

Added the relevant arguments.

Requirements:

Added Levenshtein.

Changes

I tried to minimise changes to existing files.

Added a flag allow_skip in textattack.goal_functions.GoalFunction, which defaults to True. When set to False, the attack will still continue even if the initial_result already meets the goal. This was needed to replicate the experiments in the paper.
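
The effect of the flag can be sketched as follows. ToyGoalFunction and initial_status are hypothetical names for illustration, not TextAttack's GoalFunction API; only the allow_skip name and its default come from this PR.

```python
from dataclasses import dataclass


@dataclass
class ToyGoalFunction:
    """Toy stand-in showing how an allow_skip flag could gate skipping."""

    allow_skip: bool = True

    def initial_status(self, goal_already_met: bool) -> str:
        # Skip only when the goal is already met AND skipping is allowed.
        if goal_already_met and self.allow_skip:
            return "SKIPPED"
        return "SEARCHING"


default_status = ToyGoalFunction().initial_status(goal_already_met=True)
forced_status = ToyGoalFunction(allow_skip=False).initial_status(goal_already_met=True)
```

With allow_skip=False the search still runs on inputs whose initial result already satisfies the goal.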

Design choices

I did not implement the custom model wrappers for each experiment in the textattack.models.wrappers folder, instead opting to leave them in the notebook at tests.badcharacters2021. This mirrors the tutorials listed on TextAttack's documentation. If the maintainers would like, this notebook can be transferred to the tutorials section.
The key parts of the attack are the use of the DifferentialEvolution search method and the various Transformations implemented. It made sense for these to be added to the textattack.search_methods and textattack.transformations folders.
For the transformations, I realised that the existing _get_transformations and _get_replacement_words functionality was insufficient for two reasons:
1. the search method itself generates the "perturbation instructions" for an input string (encoded in a "perturbation vector") before passing them to the transformation, which then manipulates the sentence.
2. the replacements are character-level, not word-level.
- To address these issues, I introduced a new base class: textattack.transformations.WordSwapDifferentialEvolution. This class provides a clean interface for my needs. Subclasses are required to implement two methods:
  - get_bounds_and_precomputed(current_text)
    Returns the bounds used by Differential Evolution to sample the perturbation vector, and any precomputed data needed to efficiently apply perturbations (e.g., homoglyph maps).
  - apply_perturbation(current_text, perturbation_vector, precomputed)
    Applies a perturbation vector to an input AttackedText object and returns the modified AttackedText.
- This design avoids redundant computation by allowing precomputed data to be passed from get_bounds_and_precomputed to apply_perturbation, rather than recalculating it on every call.
- For compatibility, I still implemented _get_replacement_words for all of the new transformations.
- The DifferentialEvolution search method also checks for transformation compatibility. The transformation must be an instance of WordSwapDifferentialEvolution.
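
The two-method interface can be sketched on plain strings as below. Everything here is an assumption made for illustration: the class name, the use of str instead of AttackedText, the list of invisible characters, and the vector encoding (pairs of position and character index, with a negative position meaning "do nothing") are not taken from the PR's code.

```python
from typing import List, Tuple

# Hypothetical character set: zero-width space, non-joiner, joiner.
INVISIBLE_CHARS = ["\u200b", "\u200c", "\u200d"]


class ToyInvisibleCharacters:
    """String-based stand-in for a WordSwapDifferentialEvolution subclass."""

    def __init__(self, max_perturbations: int = 2):
        self.max_perturbations = max_perturbations

    def get_bounds_and_precomputed(self, current_text: str):
        # Each perturbation uses two genes: an insertion position and a character index.
        bounds: List[Tuple[float, float]] = []
        for _ in range(self.max_perturbations):
            bounds.append((-1, len(current_text)))        # -1 means "no insertion"
            bounds.append((0, len(INVISIBLE_CHARS) - 1))
        precomputed = INVISIBLE_CHARS  # built once, reused for every candidate vector
        return bounds, precomputed

    def apply_perturbation(self, current_text: str, perturbation_vector, precomputed) -> str:
        chars = list(current_text)
        genes = [int(round(g)) for g in perturbation_vector]
        pairs = list(zip(genes[0::2], genes[1::2]))
        # Insert from right to left so earlier positions stay valid.
        for position, char_index in sorted(pairs, reverse=True):
            if position < 0:
                continue
            chars.insert(position, precomputed[char_index])
        return "".join(chars)


toy = ToyInvisibleCharacters()
bounds, precomputed = toy.get_bounds_and_precomputed("hello")
perturbed = toy.apply_perturbation("hello", [1, 0, -1, 2], precomputed)
```

The search method would sample vectors within bounds, call apply_perturbation for each candidate, and score the result with the goal function, reusing precomputed across calls.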
I did also decide to implement the custom goal functions in the textattack.goal_functions folder, mainly because I didn't want to make the attack recipe too bloated. At first I placed them under the classification and text subfolders, but there were some things that didn't match. For example, one of my attacks required as input an array of logits that did not sum to 1, so this required me to override _process_model_outputs. Another was for a Named Entity Recognition task which output one score/label per input token, instead of one score per input sentence. Most required slightly different _get_score functions as well.
For each of the custom goal functions, I thus decided to implement a separate GoalFunctionResult as well. I could have used TextToTextGoalFunctionResult, because I just wanted to override get_colored_output, but decided that wouldn't be fully accurate.

Checklist

The title of your pull request should be a summary of its contribution.
Please write detailed description of what parts have been newly added and what parts have been modified. Please also explain why certain changes were made.
Make sure existing tests pass.
Add relevant tests. No quality testing = no merge.
All public methods must have informative docstrings that work nicely with sphinx. For new modules/files, please add/modify the appropriate .rst file in TextAttack/docs/apidoc.

review-notebook-app · 2025-04-30T23:46:58Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

vlwk added 30 commits January 15, 2025 13:55

Implemented imperceptible transformations

9fc1c8a

Reorderings, deletions, invisible characters

Create differential_evolution.py

98b8021

Update __init__.py

f4db2a1

Working DE implementation with homoglyphs

bc2bec3

Implemented homoglyphs attack recipe

4176f0f

(wont run) homo+invi, translation+toxic, big reorg, todo script to do…

b6a6c42

…wnload datasets

refactor DE search, added validator, deletion and reordering logic

cc4685b

refactoring, bug fixes, moved Toxic back to texttotext temporarily

d017bb6

added emotion tests and pipeline model wrapper

902bcc8

Added support for mnli and ner

7efa2a5

update mnli load

74173d4

added toxic loads

778c1bf

uncommented some lines in toxic/load.py

b303012

slurm_toxic

1061152

parallel toxic

af1855b

fixed bug in toxic_attack_parallel

fef4e95

working toxic batch

73c4c1e

added toxic graph.py

d7c6f5e

added comments for goal functions and DE search method

2fc4161

Implemented _get_replacement_words for transformations and added typing

bc0fec9

big reorganization

90293da

removed need for get_goal_result in imperceptiblede

0e57421

Removed imperceptible_experiments dir for PR

7088c70

edited gitignore

aa35eaf

Implemented _get_replacement_words in invisible characters

2944ac0

Delete maximize_bleu

c5ce00a

Removed MaximizeBleu from init

67f4893

Replaced MaximizeBleu with MinimizeBleu in recipe

358723b

Fix maximize_levenshtein

9b51f50

repr keys imperceptible de

b44b32b

vlwk added 14 commits April 29, 2025 04:43

Attack recipe

40a97b5

Attack args

1c91e72

skip logic, typo

7428c63

Readme, attack args

38a32a0

Readme

ee7b515

docs

b07bf60

fixes

f513d90

fixes

3ca8940

Docs

22fcd04

fix classification_goal_function _get_displayed_output bug

b5a281b

Refactored goal functions

df36063

imperceptible tests wip

be9b43e

Tests

e0c478a

docs

d187e0a

vlwk added 14 commits May 1, 2025 00:50

revert gitignore

4adbb59

rename test notebook to badcharacters.ipynb

ea1931c

revert

dabb8f6

random_one for imperceptible word swaps

60e84ce

docs for recipe

972d9b3

revert attacks4components

7c09eb9

readded changes to attacks4components

11c27e9

revert changes attacks4components

24db54f

attacks4components auto reformatting

dc02162

downloaded homoglyph txt file

7126f22

docs

d35e1e3

Removed comment in homoglyphswap

890de71

Removed requests in requirements

c7ad00f

Added requests to tests/badcharacters2021

872f3af

vlwk marked this pull request as ready for review May 1, 2025 01:16
