ASAFind/ASAFind-2


Download ASAFind to install locally

For local installation, a command line version of ASAFind can be downloaded from our GitHub repository:

https://github.com/ASAFind/ASAFind-2

Installation steps

You can either download the files from GitHub as a zip archive or clone the repository using the provided URL. ASAFind requires a Unix-based operating system such as Linux (e.g. Ubuntu) or macOS. After downloading the latest version, follow these installation steps in a command-line shell:

  • Change into the directory where you want to install ASAFind, e.g.:
    • cd /home/marta/asafind
  • Clone the GitHub repository (if git is installed: "git clone https://github.com/ASAFind/ASAFind-2.git"), or place the content of the downloaded zip archive in the folder.
  • Run the following commands:
    • python3 -m venv asafind_command_line
    • . asafind_command_line/bin/activate
    • pip install --upgrade pip
    • pip install -r requirements.txt
  • You are now in a virtual environment named asafind_command_line. Here you can run the scripts or ask for help, e.g.:
    • python3 S0_ASAFind.py --help
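
Before creating the virtual environment, it can help to confirm that the interpreter and platform meet the requirements stated in this README (Python >= 3.10, Unix-based operating system). The following is a minimal sketch only; `check_prerequisites` is a hypothetical helper name and is not part of ASAFind.

```python
import os
import sys


def check_prerequisites(version=None, os_name=None):
    """Return a list of problems; an empty list means the basic requirements are met.

    Hypothetical helper, not part of ASAFind. By default it inspects the
    running interpreter; explicit arguments make it easy to test.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    os_name = os.name if os_name is None else os_name

    problems = []
    if version < (3, 10):
        problems.append(
            f"Python >= 3.10 is required, found {version[0]}.{version[1]}"
        )
    # os.name is "posix" on Linux and macOS, and "nt" on Windows.
    if os_name != "posix":
        problems.append(
            f"A Unix-based operating system is required, found os.name={os_name!r}"
        )
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Run with the same `python3` that will be used for `python3 -m venv`, the script prints nothing when both requirements are satisfied.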

ASAFind structure

The installation procedure will create the environment asafind_command_line, activate it, install all required packages, and create the subdirectories temp and output.

ASAFind 2.0

The script S1_ASAFind.py in the environment root directory performs the actual prediction. It is called from the script S0_ASAFind.py, which handles the options and generates the optional graphical output. The input is a FASTA file and a companion TargetP v.2.0 short-format tabular output file that includes the complete TargetP header (two lines starting with '#').

Some versions of SignalP/TargetP truncate the sequence names: SignalP 3.0 truncates them to 20 characters, and SignalP 4.0 and 4.1 to 58 characters. ASAFind therefore only considers the corresponding number of leading characters of the FASTA name (the first 90 in the case of TargetP 2.0), and these must be unique within the file. Any part of the FASTA name beyond that length is ignored. Additionally, the FASTA name must not contain '-' or '|', because SignalP/TargetP converts special characters in sequence names (e.g. '-' is changed to '_'). ASAFind requires at least 7 aa upstream and 22 aa downstream of the cleavage site suggested by SignalP/TargetP.

The basic output of ASAFind is a tab-delimited table containing the results for each sequence in the FASTA input file. The results table, the log files and, if requested, the graphical output are zipped and can be found in the folder 'output'. In the current version, graphical output can only be generated for predictions based on TargetP 2 output. Please save the results in a different location; each new run of ASAFind will overwrite the content of the folder 'output'. Python >= 3.10 is required.
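
The naming and length constraints above can be checked before a run. The sketch below is not ASAFind's own code: the function names, the dictionary keys and the convention that the cleavage position equals the number of residues before the cleavage site are all assumptions made for illustration. Only the numeric limits (20, 58, 90, 7, 22) and the forbidden characters come from the description above.

```python
# Name-length limits taken from the text; the keys are illustrative labels only.
NAME_LIMITS = {"signalp3": 20, "signalp4": 58, "targetp2": 90}
FORBIDDEN_CHARS = "-|"
MIN_UPSTREAM = 7     # residues required before the cleavage site
MIN_DOWNSTREAM = 22  # residues required after the cleavage site


def check_names(names, predictor="targetp2"):
    """Return a list of human-readable problems found in the FASTA names.

    Hypothetical helper: flags forbidden characters and names that are not
    unique within the number of leading characters kept for the predictor.
    """
    limit = NAME_LIMITS[predictor]
    problems = []
    seen = {}  # truncated name -> first full name that produced it
    for name in names:
        bad = [c for c in FORBIDDEN_CHARS if c in name]
        if bad:
            problems.append(
                f"{name!r} contains forbidden character(s): {' '.join(bad)}"
            )
        key = name[:limit]
        if key in seen:
            problems.append(
                f"{name!r} and {seen[key]!r} share the same first {limit} characters"
            )
        else:
            seen[key] = name
    return problems


def has_enough_flanking(seq_length, cleavage_pos):
    """Check the 7 aa upstream / 22 aa downstream requirement.

    Assumption: cleavage_pos is the number of residues located before the
    predicted cleavage site.
    """
    upstream = cleavage_pos
    downstream = seq_length - cleavage_pos
    return upstream >= MIN_UPSTREAM and downstream >= MIN_DOWNSTREAM
```

For example, `check_names(["prot-1", "prot_2"])` reports one problem (the '-' in the first name), and a 30-residue sequence with a cleavage position of 20 fails `has_enough_flanking` because only 10 residues remain downstream.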

python S0_ASAFind.py --help

usage: S0_ASAFind.py [-h] -f FASTA_FILE -p SIGNALP_FILE [-s SIMPLE_SCORE_CUTOFF] [-t FASTA_FILE_WITH_MOTIFS] [-w] [-v1] [-ppc] [-s_ppc SCORE_CUTOFF_PPC] [-t_ppc FASTA_FILE_WITH_MOTIFS_PPC] [-l] [-my_org MY_ORGANISM] [-v]
-h, --help Show this help message and exit
-f FASTA_FILE, --fasta_file FASTA_FILE Specify the input fasta FILE.
-p SIGNALP_FILE, --signalp_file SIGNALP_FILE Specify the input TargetP/SignalP FILE.
-s SIMPLE_SCORE_CUTOFF, --simple_score_cutoff SIMPLE_SCORE_CUTOFF Optionally, specify an explicit score cutoff, rather than using ASAFind's default algorithm, not compatible with option -v1. The score given here will not be normalized and therefore should be obtained from a distribution of normalized scores.
-t FASTA_FILE_WITH_MOTIFS, --fasta_file_with_motifs FASTA_FILE_WITH_MOTIFS Optionally, specify a custom scoring table. The scoring table will be normalized with the maximum score, which allows for processing of non-normalized as well as normalized scoring tables.
-w, --web_output Format output for web display. This is mostly useful when called by a web app.
-v1, --reproduce_ASAFind_1 Reproduce ASAFind 1.x scores and results (non-normalized scores, if no custom scoring table is specified, the original default scoring table generated without small sample size correction will be used, not compatible with option -s).
-ppc, --include_ppc_prediction Include prediction of proteins that might be targeted to the periplastidic compartment.
-t SCORE_TABLE_FILE, --score_table_file SCORE_TABLE_FILE Optionally, specify a custom scoring table. The scoring table will be normalized with the maximum score, which allows for processing of non-normalized as well as normalized scoring tables.
-o OUT_FILE, --out_file OUT_FILE Specify the path and name of the output file you wish to create. Default will be the same as the fasta_file, but with a ".tab" suffix.
-s_ppc SCORE_CUTOFF_PPC, --score_cutoff_ppc SCORE_CUTOFF_PPC Optionally, specify an explicit score cutoff for the ppc protein prediction, if given, ppc protein prediction will be included. The score given here will not be normalized and therefore should be obtained from a distribution of normalized scores.
-t_ppc SCORE_TABLE_FILE_PPC, --score_table_file_ppc SCORE_TABLE_FILE_PPC Optionally, specify a custom scoring table for the ppc protein prediction, if given, ppc protein prediction will be included. The scoring table will be normalized with the maximum score, which allows for processing of non-normalized as well as normalized scoring tables.
-l, --logomaker If chosen, the program will generate graphical output using logomaker, in .png and .svg formats. They will be included into the output compressed package.
-my_org MY_ORGANISM, --my_organism MY_ORGANISM Specify the name of organism.
-v, --version Show program's version number and exit.
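
When ASAFind is driven from another Python program, the options listed above can be assembled into an argument list. The following is a sketch under stated assumptions: `build_asafind_command` and its parameter names are hypothetical, only flags that appear in the help text are used, and the only rule enforced is the documented incompatibility between -s and -v1.

```python
import sys


def build_asafind_command(fasta_file, signalp_file, score_cutoff=None,
                          reproduce_v1=False, include_ppc=False,
                          logomaker=False, script="S0_ASAFind.py"):
    """Return an argument list suitable for subprocess.run().

    Hypothetical wrapper, not part of ASAFind. It only uses flags shown in
    the help text: -f, -p, -s, -v1, -ppc and -l.
    """
    # The help text states that -s and -v1 are not compatible.
    if score_cutoff is not None and reproduce_v1:
        raise ValueError("Options -s and -v1 cannot be combined")

    cmd = [sys.executable, script, "-f", str(fasta_file), "-p", str(signalp_file)]
    if score_cutoff is not None:
        cmd += ["-s", str(score_cutoff)]
    if reproduce_v1:
        cmd.append("-v1")
    if include_ppc:
        cmd.append("-ppc")
    if logomaker:
        cmd.append("-l")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root while the virtual environment is active; using `sys.executable` keeps the call inside the same interpreter that runs the wrapper.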

Usage

A test run can be started with the supplied example files using the following command:

  • python S0_ASAFind.py -f example.fasta -p example_summary.targetp2 -l

The results will be zipped and can be found in the folder 'output'. Example SignalP output files are also provided for SignalP 4.1 and SignalP 3; these can be used analogously.
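
Because each new run overwrites the content of the folder 'output', results should be copied elsewhere after every run. A minimal sketch of one way to do this is shown below; the function name `archive_output` and the default destination `asafind_results` are hypothetical choices, not something ASAFind provides.

```python
import shutil
import time
from pathlib import Path


def archive_output(output_dir="output", archive_root="asafind_results"):
    """Copy the contents of output_dir into a new time-stamped folder.

    Hypothetical helper, not part of ASAFind. Returns the path of the copy.
    """
    source = Path(output_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"No such folder: {source}")

    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(archive_root) / stamp
    # Avoid clobbering an archive created within the same second.
    suffix = 1
    while target.exists():
        target = Path(archive_root) / f"{stamp}_{suffix}"
        suffix += 1

    # copytree creates the target folder and any missing parent folders.
    shutil.copytree(source, target)
    return target
```

Calling `archive_output()` from the ASAFind directory right after a run leaves the original 'output' folder untouched and places a copy under `asafind_results/`.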

If you use ASAFind in your research please cite our publication (Gruber et al., 2025, https://doi.org/10.1111/tpj.70138) as well as the appropriate publications for SignalP or TargetP:

Further information on the biological background and on strategies for pre-sequence identification can be found in the following publications:
