Imperceptible Perturbations support for TextAttack #817

Open: wants to merge 64 commits into base: master

@vlwk vlwk commented Apr 17, 2025

Summary

This PR adds the attack from "Bad Characters: Imperceptible NLP Attacks". It introduces a new dimension to attacks: perturbations that are invisible on some rendering systems. Full details can be found in textattack.attack_recipes.BadCharacters2021. The attack combines the differential evolution search algorithm, four transformations (invisible characters, homoglyphs, deletions, reorderings), and several goal functions.
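As a rough illustration of the four perturbation types named above, here is a minimal standalone sketch. It is not the code in this PR: the helper names are hypothetical, it works on plain strings rather than AttackedText, and the specific code points (zero-width space, Cyrillic "а", backspace, right-to-left override) are assumptions chosen as one representative per category.

```python
# Illustrative only: these helpers are hypothetical and are not TextAttack APIs.
ZERO_WIDTH_SPACE = "\u200b"    # invisible character
CYRILLIC_A = "\u0430"          # visually similar to Latin "a"
BACKSPACE = "\u0008"           # deletion control character
RLO, PDF = "\u202e", "\u202c"  # right-to-left override / pop directional formatting


def inject_invisible(text: str, index: int) -> str:
    """Insert a zero-width space before position `index`."""
    return text[:index] + ZERO_WIDTH_SPACE + text[index:]


def swap_homoglyph(text: str, index: int) -> str:
    """Replace a Latin 'a' at `index` with a Cyrillic look-alike."""
    if text[index] != "a":
        return text
    return text[:index] + CYRILLIC_A + text[index + 1:]


def inject_deletion(text: str, index: int, decoy: str = "x") -> str:
    """Insert a decoy character followed by a backspace.

    A renderer that honours backspace may hide the decoy, while a model
    still receives both code points.
    """
    return text[:index] + decoy + BACKSPACE + text[index:]


def inject_reordering(text: str, start: int, end: int) -> str:
    """Store text[start:end] reversed inside a right-to-left override.

    A bidi-aware renderer may display the span in its original order even
    though the underlying code points are reversed.
    """
    return text[:start] + RLO + text[start:end][::-1] + PDF + text[end:]


original = "bank"
perturbed = {
    "invisible": inject_invisible(original, 2),
    "homoglyph": swap_homoglyph(original, 1),
    "deletion": inject_deletion(original, 4),
    "reordering": inject_reordering(original, 0, 4),
}
for name, value in perturbed.items():
    # repr() exposes the hidden code points that a renderer might not show.
    print(f"{name}: {value!r} (differs from original: {value != original})")
```

In each case the encoded string differs from the original even where a renderer might display the two identically, which is the property the attack relies on.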

Notes:

  • I suspect this PR might need multiple rounds of reviews and I am happy to make any requested changes at any time.
  • It's a really big PR. We could explore merging some bits of it first if you prefer.
  • There are many commits and some of them, especially the earlier ones, were either carelessly named or not atomic. It will probably be very hard to follow commit by commit. Probably best to squash before merging.
  • I really enjoyed using TextAttack, it is really well built. Thank you!

Additions

Attack recipe:

  • Added BadCharacters2021 recipe as textattack.attack_recipes.BadCharacters2021.

Tests:

  • Added a folder tests.badcharacters2021 which contains a notebook badcharacters.ipynb and a requirements file requirements.txt. You may run the entire notebook from start to end. Flags at the top of the notebook control whether the downloaded temp files and the results are saved or deleted automatically. The perturbation type can also be chosen there. Each of the five experiments in the paper is replicated in this notebook, including the custom model wrappers each one needs.

Docs:

  • Added detailed docs. Used sphinx-apidoc -f -o apidoc -d 6 -E -T -M ../textattack to generate the content in apidoc. Note: this seemed to make minor modifications to every single file in apidoc. Not sure if that is intended behaviour.

Transformations:

  • Added WordSwapDifferentialEvolution in textattack.transformations
  • Added WordSwapInvisibleCharacters, WordSwapDeletions, WordSwapReorderings in textattack.transformations, which extend WordSwapDifferentialEvolution.
  • Added intentional_homoglyphs.txt to textattack.shared.
  • WordSwapHomoglyphSwap was modified.

Search methods:

  • Added DifferentialEvolution in textattack.search_methods

Goal functions:

  • Added new folder textattack.goal_functions.custom
  • Added LogitSum, NamedEntityRecognition, TargetedBonus, TargetedStrict in textattack.goal_functions.custom
  • Added MaximizeLevenshtein in textattack.goal_functions.text

Goal function results:

  • Added new folder textattack.goal_function_results.custom
  • Added LogitSumGoalFunctionResult, NamedEntityRecognitionGoalFunctionResult, TargetedBonusGoalFunctionResult, TargetedStrictGoalFunctionResult in textattack.goal_function_results.custom

Validators:

  • Added transformation_consists_of_word_swaps_differential_evolution in shared.validators. This is used to check that the DifferentialEvolution search method is used with a compatible transformation, which must subclass WordSwapDifferentialEvolution.
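The kind of check this validator performs can be sketched as follows. This is a simplified standalone illustration, not the code in this PR: the classes are empty stand-ins that only mirror the names above, and the function names `consists_of_de_word_swaps` and `check_search_method_compatibility` are hypothetical.

```python
# Stand-in classes for illustration; the real ones live in textattack.transformations.
class WordSwapDifferentialEvolution:
    """Stand-in for the new base class."""


class WordSwapInvisibleCharacters(WordSwapDifferentialEvolution):
    """Stand-in for a compatible transformation."""


class SomeOtherTransformation:
    """Stand-in for a transformation that does not subclass the base class."""


def consists_of_de_word_swaps(transformation) -> bool:
    """Return True only if the transformation subclasses the DE base class."""
    return isinstance(transformation, WordSwapDifferentialEvolution)


def check_search_method_compatibility(transformation) -> None:
    """Raise early instead of failing midway through a search."""
    if not consists_of_de_word_swaps(transformation):
        raise ValueError(
            f"{type(transformation).__name__} is not compatible with "
            "DifferentialEvolution; it must subclass WordSwapDifferentialEvolution."
        )


check_search_method_compatibility(WordSwapInvisibleCharacters())  # passes silently
try:
    check_search_method_compatibility(SomeOtherTransformation())
except ValueError as err:
    print(err)
```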

AttackArgs:

  • Added relevant.

Requirements:

  • Added Levenshtein.

Changes

I tried to minimise changes to existing files.

  • Added a flag allow_skip in textattack.goal_functions.GoalFunction, which defaults to True. When set to False, the attack will still continue even if the initial_result already meets the goal. This was needed to replicate the experiments in the paper.
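The intended effect of the flag can be sketched with a toy, untargeted classification goal. This is only an illustration under assumptions: `ToyGoalFunction`, `Status`, and `initial_status` are hypothetical names, not the TextAttack classes modified in this PR.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Status(Enum):
    SEARCHING = auto()  # the attack proceeds to the search method
    SKIPPED = auto()    # the attack is skipped for this example


@dataclass
class ToyGoalFunction:
    """Hypothetical stand-in for a goal function with an allow_skip flag."""

    allow_skip: bool = True

    def initial_status(self, predicted_label: int, ground_truth: int) -> Status:
        # For an untargeted attack, the goal is already met when the model
        # misclassifies the unperturbed input.
        goal_already_met = predicted_label != ground_truth
        if goal_already_met and self.allow_skip:
            return Status.SKIPPED
        # With allow_skip=False the attack continues even if the goal is met.
        return Status.SEARCHING


print(ToyGoalFunction().initial_status(predicted_label=0, ground_truth=1))
print(ToyGoalFunction(allow_skip=False).initial_status(predicted_label=0, ground_truth=1))
```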

Design choices

  • I did not implement the custom model wrappers for each experiment in the textattack.models.wrappers folder, instead opting to leave them in the notebook at tests.badcharacters2021. This mirrors the tutorials listed on TextAttack's documentation. If the maintainers would like, this notebook can be transferred to the tutorials section.
  • The key parts of the attack are the use of the DifferentialEvolution search method and the various Transformations implemented. It made sense for these to be added to the textattack.search_methods and textattack.transformations folders.
  • For the transformations, I realised that the existing _get_transformations and _get_replacement_words functionality was insufficient for two reasons:
    1. the search method itself generates the "perturbation instructions" for an input string (encoded in a "perturbation vector") before passing it to the transformation to manipulate the sentence.
    2. the replacements are character-level, not word-level.
    • To address these issues, I introduced a new base class: textattack.transformations.WordSwapDifferentialEvolution. This class provides a clean interface for my needs. Subclasses are required to implement two methods:
      • get_bounds_and_precomputed(current_text)
        Returns the bounds used by Differential Evolution to sample the perturbation vector, and any precomputed data needed to efficiently apply perturbations (e.g., homoglyph maps).
      • apply_perturbation(current_text, perturbation_vector, precomputed)
        Applies a perturbation vector to an input AttackedText object and returns the modified AttackedText.
    • This design avoids redundant computation by allowing precomputed data to be passed from get_bounds_and_precomputed to apply_perturbation, rather than recalculating it on every call.
    • For compatibility, I still implemented _get_replacement_words for all of the new transformations.
    • The DifferentialEvolution search method also checks for transformation compatibility. The transformation must be an instance of WordSwapDifferentialEvolution.
  • I also decided to implement the custom goal functions in the textattack.goal_functions folder, mainly because I didn't want to make the attack recipe too bloated. At first I placed them under the classification and text subfolders, but some of them did not fit those categories. For example, one of my attacks required as input an array of logits that did not sum to 1, so I had to override _process_model_outputs. Another was for a Named Entity Recognition task, which outputs one score/label per input token instead of one score per input sentence. Most required slightly different _get_score functions as well.
  • For each of the custom goal functions, I thus decided to implement a separate GoalFunctionResult as well. I could have used TextToTextGoalFunctionResult, because I just wanted to override get_colored_output, but decided that wouldn't be fully accurate.
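The two-method interface described above can be sketched as follows. This is a simplified standalone illustration, not the code in this PR: it operates on plain strings instead of AttackedText, `DEWordSwap` and `ToyInvisibleCharacters` are hypothetical names, and the layout of the perturbation vector (pairs of insertion position and character index, with a negative position meaning "no change") is an assumption made for the example.

```python
import random
from abc import ABC, abstractmethod
from typing import Any, List, Sequence, Tuple

Bounds = List[Tuple[int, int]]


class DEWordSwap(ABC):
    """Hypothetical stand-in for the WordSwapDifferentialEvolution interface."""

    @abstractmethod
    def get_bounds_and_precomputed(self, current_text: str) -> Tuple[Bounds, Any]:
        """Return per-dimension bounds and any data worth computing once."""

    @abstractmethod
    def apply_perturbation(
        self, current_text: str, perturbation_vector: Sequence[float], precomputed: Any
    ) -> str:
        """Apply one perturbation vector and return the modified text."""


class ToyInvisibleCharacters(DEWordSwap):
    def __init__(self, num_perturbations: int = 2):
        self.num_perturbations = num_perturbations

    def get_bounds_and_precomputed(self, current_text):
        invisible = ["\u200b", "\u200c", "\u200d"]  # computed once, reused on every call
        bounds: Bounds = []
        for _ in range(self.num_perturbations):
            bounds.append((-1, len(current_text)))  # insertion position; -1 means no-op
            bounds.append((0, len(invisible) - 1))  # which invisible character to insert
        return bounds, invisible

    def apply_perturbation(self, current_text, perturbation_vector, precomputed):
        values = [int(round(v)) for v in perturbation_vector]
        pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
        result = current_text
        # Insert from right to left so earlier insertions do not shift later positions.
        for position, char_index in sorted(pairs, reverse=True):
            if position < 0:
                continue
            result = result[:position] + precomputed[char_index] + result[position:]
        return result


transformation = ToyInvisibleCharacters(num_perturbations=2)
text = "hello world"
bounds, precomputed = transformation.get_bounds_and_precomputed(text)

# A search method would evolve a population of such vectors; here we draw one at random.
rng = random.Random(0)
candidate = [rng.randint(low, high) for low, high in bounds]
print(repr(transformation.apply_perturbation(text, candidate, precomputed)))
```

The search method only ever sees the bounds and the numeric vector, so the same search loop can drive any transformation that implements these two methods.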

Checklist

  • The title of your pull request should be a summary of its contribution.
  • Please write a detailed description of what parts have been newly added and what parts have been modified. Please also explain why certain changes were made.
  • Make sure existing tests pass.
  • Add relevant tests. No quality testing = no merge.
  • All public methods must have informative docstrings that work nicely with sphinx. For new modules/files, please add/modify the appropriate .rst file in TextAttack/docs/apidoc.

@vlwk vlwk marked this pull request as ready for review May 1, 2025 01:16