Commit cca6c17

Merge pull request #110 from cokelaer/main
add a download utility function
2 parents 1ac37fc + f8c2263 commit cca6c17

6 files changed: 95 additions and 24 deletions
README.rst

Lines changed: 9 additions & 12 deletions
@@ -65,7 +65,6 @@ pipelines, and pipeline management tools into a single library (Sequana) as illu
 in **Fig 1** below.

 .. figure:: https://raw.githubusercontent.com/sequana/sequana_pipetools/main/doc/veryold.png
-    :scale: 40%

     **Figure 1** Old Sequana framework will all pipelines and Sequana library in the same
     place including pipetools (this library).
@@ -74,7 +73,6 @@ Despite maintaining an 80% test coverage, whenever changes were introduced to th


 .. figure:: https://raw.githubusercontent.com/sequana/sequana_pipetools/main/doc/old.png
-    :scale: 40%

     **Figure 2** v0.8 of Sequana moved the Snakemake pipelines in independent
     repositories. A `cookie cutter <https://github.com/sequana/sequana_pipeline_template>`_
@@ -86,7 +84,6 @@ Nevertheless, certain tools, including those utilized for user interface and inp


 .. figure:: https://raw.githubusercontent.com/sequana/sequana_pipetools/main/doc/new.png
-    :scale: 40%

     **Figure 3** New Sequana framework. The new Sequana framework comprises the core library
     and bioinformatics tools, which are now separate from the pipelines. Moreover, the
@@ -96,7 +93,6 @@ Nevertheless, certain tools, including those utilized for user interface and inp
 As a final step, we separated the rules originally available in Sequana to create an independent package featuring a collection of Snakemake wrappers. These wrappers can be accessed at https://github.com/sequana/sequana-wrappers and offer the added benefit of being rigorously tested through continuous integration.

 .. figure:: https://raw.githubusercontent.com/sequana/sequana_pipetools/main/doc/wrappers.png
-    :scale: 40%

     **Figure 3** New Sequana framework 2021. The library itself with the core, the
     bioinformatics tools is now fully independent of the pipelines.
@@ -202,9 +198,9 @@ Python module (the last two lines is where the magic happens)::
     # create a function for a given option (here --method)
     def fill_method():
         # any extra sanity checks
-        cfg['method'] = options['method']
+        cfg["method"] = options["method"]

-    if options['from-project']:
+    if options["from-project"]:
         # in --from-project, we fill the method is --method is provided only (since already pre-filled)
         if "--method" in sys.argv
             fill_method()
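
Review note on the hunk above: the README snippet only overwrites a pre-filled config value when the matching flag was given explicitly on the command line (and, as shown, its `if "--method" in sys.argv` line has no trailing colon). A minimal sketch of that idea in plain Python follows. The helper `apply_cli_overrides`, its `flags` mapping and the sample values are hypothetical names introduced for illustration; they are not part of sequana_pipetools.

```python
import sys


def apply_cli_overrides(cfg, options, flags, argv=None):
    """Copy options into cfg only for flags that appear explicitly in argv.

    Hypothetical helper: `flags` maps a command-line flag (e.g. "--method")
    to the key shared by `options` and `cfg`.
    """
    argv = sys.argv[1:] if argv is None else argv
    # Accept both "--method value" and "--method=value" spellings.
    seen = {arg.split("=", 1)[0] for arg in argv if arg.startswith("--")}
    for flag, key in flags.items():
        if flag in seen:
            cfg[key] = options[key]
    return cfg


# Config pre-filled from an existing project (made-up values).
cfg = {"method": "bwa", "threads": 4}
# Parsed options, including defaults the user never typed.
options = {"method": "minimap2", "threads": 1}

updated = apply_cli_overrides(
    cfg,
    options,
    {"--method": "method", "--threads": "threads"},
    argv=["--method", "minimap2"],
)
print(updated)
```

Only `method` changes here because `--threads` was never passed, so the project's pre-filled value is kept rather than being replaced by a parser default.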
@@ -235,9 +231,9 @@ For FastQ files (paired ot not), The config file should look like::

     sequana_wrappers: "v0.15.1"

-    input_directory: '.'
+    input_directory: "."
     input_readtag: "_R[12]_"
-    input_pattern: '*fastq.gz'
+    input_pattern: "*fastq.gz"


     apptainers:
@@ -317,6 +313,7 @@ Changelog
 ========= ======================================================================
 Version   Description
 ========= ======================================================================
+1.0.4     * add utility function to download and untar a tar.gz file
 1.0.3     * add levenshtein function. some typo corrections.
 1.0.2     * add the dot2png command. pin docutils <0.21 due to pip error
 1.0.1     * hot fix in the profile creation (regression)
@@ -355,7 +352,7 @@ Version Description
           * --from-project not funtcional (example in multitax pipeline)
           * Click checks that input-directoyr is a directory indeed
 0.16.1    * Fix/rename error_report into onerror to be included in the Snakemake
-            onerror section. added 'slurm' in slurm output log file in the
+            onerror section. added *slurm* in slurm output log file in the
             profile
 0.16.0    * scripts now use click instead of argparse
           * All Options classes have now an equivalent using click.
@@ -379,8 +376,8 @@ Version Description
 0.14.X    * Module now returns the list of requirements. SequanaManager
             creates a txt file with all standalones from the requirements.
 0.13.0    * switch to pyproject and fixes #64
-0.12.X    * automatically populater 'wrappers' in PipelineManager' based on the
-            config entry 'sequana_wrappers'.
+0.12.X    * automatically populater *wrappers* in PipelineManager based on the
+            config entry *sequana_wrappers*.
           * Fix the singularity arguments by (i) adding -e and (ii) bind the
             /home. Indeed, snakemake sets --home to the current directory.
             Somehow the /home is lost. Removed deprecated function
@@ -413,7 +410,7 @@ Version Description
           * Move all modules related to pipelines from sequana into
             sequana_pipetools
 0.5.X     * feature removed in sequana to deal with adapter removal and
-            changes updated in the package (removed the 'design' option
+            changes updated in the package (removed the *design* option
             from the cutadapt rules and needed); add TrimmingOptions.
 0.4.X     * add FeatureCounts options and slurm status utility
 0.4.0     * stable version

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ build-backend = "poetry.core.masonry.api"
 #maintainer ?#maintainer email
 [tool.poetry]
 name = "sequana_pipetools"
-version = "1.0.3"
+version = "1.0.4"
 description = "A set of tools to help building or using Sequana pipelines"
 authors = ["Sequana Team"]
 license = "BSD-3"

sequana_pipetools/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ def get_package_version(package_name):

 logger = colorlog.getLogger(logger.name)

-from .misc import url2hash, levenshtein_distance
+from .misc import download_and_extract_tar_gz, levenshtein_distance, url2hash
 from .sequana_manager import SequanaManager  # , get_pipeline_location
 from .snaketools import (
     Pipeline,

sequana_pipetools/misc.py

Lines changed: 69 additions & 6 deletions
@@ -11,11 +11,70 @@
 # Contributors: https://github.com/sequana/sequana/graphs/contributors
 ##############################################################################
 import hashlib
+import os
 import sys
+import tarfile
+
+import colorlog
+import requests
+from tqdm import tqdm

 from sequana_pipetools import get_package_version

-__all__ = ["Colors", "print_version", "error", "url2hash", "levenshtein_distance"]
+logger = colorlog.getLogger(__name__)
+
+
+__all__ = ["Colors", "print_version", "error", "url2hash", "levenshtein_distance", "download_and_extract_tar_gz"]
+
+
+def download_and_extract_tar_gz(url, extract_to):
+    """
+    Downloads a .tar.gz file from a given URL and extracts it to the specified directory.
+
+    :param url: URL of the .tar.gz file
+    :param extract_to: Directory where the contents will be extracted
+    """
+    # Get the file name from the URL
+    filename = url.split("/")[-1]
+    file_path = os.path.join(extract_to, filename)
+
+    # create the directory
+    os.makedirs(extract_to, exist_ok=True)
+
+    # Download the file
+    logger.info(f"Downloading {filename}...")
+    response = requests.get(url, stream=True)
+    response.raise_for_status()  # Raise an exception for HTTP errors
+    total_size = int(response.headers.get("content-length", 0))
+
+    # Write the downloaded content to a file
+    # Download with a progress bar
+    with open(file_path, "wb") as file, tqdm(
+        desc=f"Downloading {filename}",
+        total=total_size,
+        unit="B",
+        unit_scale=True,
+        unit_divisor=1024,
+    ) as bar:
+        for chunk in response.iter_content(chunk_size=8192):
+            file.write(chunk)
+            bar.update(len(chunk))
+
+    logger.info(f"Downloaded {filename} to {file_path}")
+
+    # Extract the tar.gz file
+    if tarfile.is_tarfile(file_path):
+        logger.info(f"Extracting {filename}...")
+
+        with tarfile.open(file_path, "r:gz") as tar:
+            tar.extractall(path=extract_to)
+        logger.info(f"Extracted to {extract_to}")
+    else:
+        logger.info(f"{file_path} is not a valid tar.gz file.")
+
+    # Optionally, you can delete the .tar.gz file after extraction
+    os.remove(file_path)
+    logger.info("Process completed.")


 def levenshtein_distance(token1: str, token2: str) -> int:
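
Review note on `download_and_extract_tar_gz`: the function passes the downloaded archive straight to `tar.extractall`, reports an invalid archive only at `info` level, and deletes the file either way. The Python documentation warns against extracting archives from untrusted sources without inspecting them first. One possible hardening, which is not part of this commit, is to check every member path before extraction and to raise on an invalid archive. In the sketch below `safe_extract_tar_gz` and `_make_archive` are hypothetical helpers written for illustration, and the demo builds its archives locally instead of downloading anything.

```python
import io
import os
import tarfile
import tempfile


def safe_extract_tar_gz(archive_path, extract_to):
    """Extract a .tar.gz, refusing members that would land outside extract_to.

    Hypothetical helper for illustration; it is not part of sequana_pipetools.
    """
    if not tarfile.is_tarfile(archive_path):
        raise ValueError(f"{archive_path} is not a valid tar archive")

    root = os.path.realpath(extract_to)
    os.makedirs(root, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = os.path.realpath(os.path.join(root, member.name))
            # commonpath equals root only when target lies inside root.
            if os.path.commonpath([root, target]) != root:
                raise ValueError(f"unsafe path in archive: {member.name}")
            if member.issym() or member.islnk():
                raise ValueError(f"links are not extracted: {member.name}")
        tar.extractall(path=root, members=members)
    return [m.name for m in members]


def _make_archive(path, member_name, payload=b"hello\n"):
    """Write a one-file .tar.gz used only by the demo below."""
    info = tarfile.TarInfo(name=member_name)
    info.size = len(payload)
    with tarfile.open(path, "w:gz") as tar:
        tar.addfile(info, io.BytesIO(payload))


with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "out")

    good = os.path.join(tmp, "good.tar.gz")
    _make_archive(good, "pkg/readme.txt")
    extracted = safe_extract_tar_gz(good, out_dir)
    good_exists = os.path.exists(os.path.join(out_dir, "pkg", "readme.txt"))

    bad = os.path.join(tmp, "bad.tar.gz")
    _make_archive(bad, "../escape.txt")
    try:
        safe_extract_tar_gz(bad, out_dir)
        rejected = False
    except ValueError:
        rejected = True

print(extracted, good_exists, rejected)
```

Raising instead of logging lets a caller distinguish a failed extraction from a successful one; whether that fits the intended use of the utility is a design decision for the maintainers.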
@@ -59,14 +118,18 @@ def levenshtein_distance(token1: str, token2: str) -> int:
             if token1[t1 - 1] == token2[t2 - 1]:
                 distances[t1][t2] = distances[t1 - 1][t2 - 1]
             else:
-                distances[t1][t2] = min(
-                    distances[t1][t2 - 1],  # Insertion
-                    distances[t1 - 1][t2],  # Deletion
-                    distances[t1 - 1][t2 - 1]  # Substitution
-                ) + 1
+                distances[t1][t2] = (
+                    min(
+                        distances[t1][t2 - 1],  # Insertion
+                        distances[t1 - 1][t2],  # Deletion
+                        distances[t1 - 1][t2 - 1],  # Substitution
+                    )
+                    + 1
+                )

     return distances[len1][len2]

+
 def url2hash(url):
     md5hash = hashlib.md5()
     md5hash.update(url.encode())
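
Review note on the `levenshtein_distance` hunk: the change is formatting only, and the diff shows just the inner recurrence. For readers without the full file, here is a self-contained sketch of the same dynamic-programming recurrence. The table initialisation is outside the diff, so the version below assumes the textbook setup (first row and column hold the prefix lengths); the name `levenshtein` is chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic dynamic-programming table."""
    rows, cols = len(a) + 1, len(b) + 1
    # dist[i][j] holds the distance between a[:i] and b[:j].
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        dist[0][j] = j  # insert all j characters of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                # Matching characters cost nothing extra.
                dist[i][j] = dist[i - 1][j - 1]
            else:
                dist[i][j] = 1 + min(
                    dist[i][j - 1],  # insertion
                    dist[i - 1][j],  # deletion
                    dist[i - 1][j - 1],  # substitution
                )
    return dist[-1][-1]


print(levenshtein("kitten", "sitting"))  # 3, the value asserted in tests/test_misc.py
```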

sequana_pipetools/sequana_manager.py

Lines changed: 1 addition & 2 deletions
@@ -473,8 +473,7 @@ def teardown(self, check_schema=True, check_input_files=True):
                 msg = (
                     "The version {} of your completion file for the {} pipeline seems older than the installed"
                     " pipeline itself ({}). Please, consider updating the completion file {}"
-                    " using the following command: \n\t sequana_pipetools --completion {}\n"
-                    "available in the sequana_pipetools package (pip install sequana_completion)"
+                    " using the following command: \n\t sequana_pipetools --completion {}\n\n"
                 )
                 msg = msg.format(version, self.name, self._get_package_version(), completion, self.name)
                 logger.info(msg)

tests/test_misc.py

Lines changed: 14 additions & 2 deletions
@@ -1,6 +1,18 @@
-from sequana_pipetools.misc import Colors, print_version, error, url2hash, levenshtein_distance
+from sequana_pipetools.misc import (
+    Colors,
+    download_and_extract_tar_gz,
+    error,
+    levenshtein_distance,
+    print_version,
+    url2hash,
+)


+def test_download_and_extract_tar_gz(tmpdir):
+
+    url = "https://github.com/sequana/sequana_pipetools/archive/refs/tags/v1.0.3.tar.gz"
+    download_and_extract_tar_gz(url, tmpdir)
+

 def test_levenshtein():
     assert levenshtein_distance("kitten", "sitting") == 3
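
Review note on `test_download_and_extract_tar_gz`: the test fetches a release tarball from GitHub and asserts nothing about the result, so it depends on network access and passes as long as no exception is raised. One way to keep such a test offline is to serve a small fixture archive from a local HTTP server and check what was extracted. The sketch below uses only the standard library; `fetch_and_extract` is a stand-in written for this example (the real function uses `requests`), and `make_fixture`, `serve_directory` and `QuietHandler` are hypothetical helper names.

```python
import functools
import http.server
import io
import os
import shutil
import tarfile
import tempfile
import threading
import urllib.request


def make_fixture(directory, name="fixture.tar.gz"):
    """Create a tiny .tar.gz containing fixture/hello.txt; return its file name."""
    payload = b"hello\n"
    info = tarfile.TarInfo(name="fixture/hello.txt")
    info.size = len(payload)
    with tarfile.open(os.path.join(directory, name), "w:gz") as tar:
        tar.addfile(info, io.BytesIO(payload))
    return name


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    def log_message(self, format, *args):
        pass  # keep test output clean


def serve_directory(directory):
    """Serve `directory` on a free localhost port from a background thread."""
    handler = functools.partial(QuietHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_and_extract(url, extract_to):
    """Stand-in for the function under test: download, extract, delete archive."""
    os.makedirs(extract_to, exist_ok=True)
    archive = os.path.join(extract_to, url.split("/")[-1])
    # Bypass any configured proxy so the request stays on localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as response, open(archive, "wb") as fh:
        shutil.copyfileobj(response, fh)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(path=extract_to)  # trusted fixture built above
    os.remove(archive)


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    name = make_fixture(src)
    server = serve_directory(src)
    try:
        port = server.server_address[1]
        fetch_and_extract(f"http://127.0.0.1:{port}/{name}", dst)
        extracted = sorted(os.listdir(os.path.join(dst, "fixture")))
        archive_removed = not os.path.exists(os.path.join(dst, name))
    finally:
        server.shutdown()
        server.server_close()

print(extracted, archive_removed)
```

In the project's own test suite the same fixture server could be pointed at `download_and_extract_tar_gz` directly; that wiring is left out here because it would require `requests` and `tqdm` to be installed.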
@@ -9,7 +21,7 @@ def test_levenshtein():

 def test_url2hash():
     md5 = url2hash("https://zenodo.org/record/7822910/files/samtools_1.17_minimap2_2.24.0.img")
-    assert md5 == 'c3e4a8244ce7b65fa873ebda134fea7f'
+    assert md5 == "c3e4a8244ce7b65fa873ebda134fea7f"


 def test_colors():
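
Review note on `test_url2hash`: the diff shows only the first two lines of `url2hash` in `misc.py`, so its return statement is not visible here. The test compares the result with a 32-character hexadecimal string, which is consistent with returning the MD5 hex digest of the URL. A minimal sketch under that assumption follows; `url_to_md5` is a name chosen for this example and is not the library function.

```python
import hashlib


def url_to_md5(url: str) -> str:
    """Return the hexadecimal MD5 digest of a URL string (UTF-8 encoded)."""
    md5hash = hashlib.md5()
    md5hash.update(url.encode())
    # Assumption: the real url2hash ends with an equivalent hexdigest() call.
    return md5hash.hexdigest()


url = "https://zenodo.org/record/7822910/files/samtools_1.17_minimap2_2.24.0.img"
digest = url_to_md5(url)
print(len(digest), digest == url_to_md5(url))
```

The digest is deterministic, so the same URL always maps to the same fixed-length string.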
