Commit 8bb3863

fixed tar/gtar problem with MacOS, added OS detection
1 parent 99bc6a5 commit 8bb3863

2 files changed: +48 -23 lines changed

Diff for: README.md

+14 -5
@@ -6,7 +6,6 @@
 
 **fast5_fetcher** is a tool for fetching nanopore fast5 files to save time and simplify downstream analysis.
 
-
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1413903.svg)](https://doi.org/10.5281/zenodo.1413903)
 
 ## Contents
@@ -21,9 +20,11 @@ Reducing the number of fast5 files per folder in a single experiment was a welco
 
 Following a self imposed guideline, most things written to handle nanopore data or bioinformatics in general, will use as little 3rd party libraries as possible, aiming for only core libraries, or have all included files in the package.
 
-In the case of `fast5_fetcher.py` and `batch_tater.py`, only core python libraries are used. So as long as **Python 2.7+** is present, everything should work with no extra steps. (Python 3 compatibility is coming in the next update)
+In the case of `fast5_fetcher.py` and `batch_tater.py`, only core python libraries are used. So as long as **Python 2.7+** is present, everything should work with no extra steps. (Python 3 compatibility is coming in the next big update)
+
+##### Operating system:
 
-There is one catch. Everything is written primarily for use with **Linux**. Due to **MacOS** running on Unix, so long as the GNU tools are installed, there should be minimal issues running it. **Windows 10** however may require more massaging to work with the new Linux integration.
+There is one catch. Everything is written primarily for use with **Linux**. Due to **MacOS** running on Unix, so long as the GNU tools are installed (see below), there should be minimal issues running it. **Windows 10** however may require more massaging to work with the new Linux integration.
 
 # Getting Started
 
@@ -187,6 +188,14 @@ Download the repository:
 
 git clone https://github.com/Psy-Fer/fast5_fetcher.git
 
+If using MacOS, and NOT using homebrew, install it here:
+
+https://brew.sh/
+
+then install gnu-tar with:
+
+brew install gnu-tar
+
 ### Quick start
 
 Basic use on a local computer
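
The README change above asks MacOS users to install `gnu-tar`. As a minimal sketch (not part of this commit), a script could check that the resulting binary is on `PATH` before trying to extract. The helper name `gnu_tar_available` is hypothetical. The sketch also assumes that Homebrew exposes GNU tar as `gtar`, which matches the command used in `fast5_fetcher.py`, and that Python 3.3+ is available for `shutil.which`.

```python
import shutil


def gnu_tar_available(binary="gtar"):
    """Return True if the named executable can be found on PATH.

    Hypothetical helper: the default name "gtar" is an assumption about
    how Homebrew's gnu-tar package installs GNU tar on MacOS.
    """
    return shutil.which(binary) is not None


if not gnu_tar_available():
    # Printed on systems without gtar, for example a typical Linux install.
    print("gtar not found; on MacOS try: brew install gnu-tar")
```

On Python 2.7, which the tool currently targets, `distutils.spawn.find_executable` would be the closest standard-library substitute.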
@@ -355,13 +364,13 @@ echo $CMD && $CMD
 
 ## Acknowledgements
 
-I would like to thank the rest of my lab (Shaun Carswell, Kirston Barton) in Genomic Technologies team from the [Garvan Institute](https://www.garvan.org.au/) for their feedback on the development of this tool.
+I would like to thank the rest of my lab (Shaun Carswell, Kirston Barton, Kai Martin) in Genomic Technologies team from the [Garvan Institute](https://www.garvan.org.au/) for their feedback on the development of this tool.
 
 ## Cite
 
 [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1413903.svg)](https://doi.org/10.5281/zenodo.1413903)
 
-James M. Ferguson, & Martin A. Smith. (2018, September 12). Psy-Fer/fast5_fetcher: Initial release of fast5_fetcher (Version v1.0). Zenodo. http://doi.org/10.5281/zenodo.1413903
+James M. Ferguson, & Martin A. Smith. (2018, September 12). Psy-Fer/fast5_fetcher: Initial release of fast5_fetcher (Version v1.0). Zenodo. <http://doi.org/10.5281/zenodo.1413903>
 
 ## License

Diff for: fast5_fetcher.py

+34 -18
@@ -5,6 +5,7 @@
 import subprocess
 import traceback
 import argparse
+import platform
 from functools import partial
 '''
 
@@ -17,17 +18,17 @@
 It takes 3 files as input: fastq/paf/flat, sequencing_summary, index
 
 --------------------------------------------------------------------------------------
-version 0.0 - initial
-version 0.2 - added argparser and buffered gz streams
-version 0.3 - added paf input
-version 0.4 - added read id flat file input
-version 0.5 - pppp print output instead of extracting
-version 0.6 - did a dumb. changed x in s to set/dic entries O(n) vs O(1)
-version 0.7 - cleaned up a bit to share and removed some hot and steamy features
-version 0.8 - Added functionality for un-tarred file structures and seq_sum only
-version 1.0 - First release
-version 1.1 - refactor with dicswitch and batch_tater updates
-
+version 0.0 - initial
+version 0.2 - added argparser and buffered gz streams
+version 0.3 - added paf input
+version 0.4 - added read id flat file input
+version 0.5 - pppp print output instead of extracting
+version 0.6 - did a dumb. changed x in s to set/dic entries O(n) vs O(1)
+version 0.7 - cleaned up a bit to share and removed some hot and steamy features
+version 0.8 - Added functionality for un-tarred file structures and seq_sum only
+version 1.0 - First release
+version 1.1 - refactor with dicswitch and batch_tater updates
+version 1.1.1 - Bug fix on --transform method, added OS detection
 
 TODO:
 - Python 3 compatibility
@@ -82,8 +83,8 @@ def main():
                        help="paf alignment file for read ids")
     group.add_argument("-f", "--flat",
                        help="flat file of read ids")
-    # parser.add_argument("-b", "--fast5",
-    # help="fast5.tar path to extract from - individual")
+    parser.add_argument("--OSystem", default=platform.system(),
+                        help="running operating system - leave default unless doing odd stuff")
     parser.add_argument("-s", "--seq_sum",
                         help="sequencing_summary.txt.gz file")
     parser.add_argument("-i", "--index",
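
The hunk above makes the detected platform the default for a new `--OSystem` option. The following standalone sketch shows how that default and an explicit override behave. Only the `--OSystem` argument is taken from the diff; the `build_parser` wrapper and the description string are assumptions made for illustration.

```python
import argparse
import platform


def build_parser():
    """Build a parser holding only the --OSystem option from the diff."""
    parser = argparse.ArgumentParser(description="OS detection sketch")
    # platform.system() returns names such as "Linux", "Darwin" or "Windows".
    parser.add_argument("--OSystem", default=platform.system(),
                        help="running operating system - leave default unless doing odd stuff")
    return parser


# With no arguments, the value falls back to the detected platform.
detected = build_parser().parse_args([]).OSystem

# An explicit value overrides detection, e.g. to force the gtar code path.
forced = build_parser().parse_args(["--OSystem", "Darwin"]).OSystem
```

Because the default is computed when the parser is built, downstream code can read `args.OSystem` without calling `platform` again.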
@@ -128,7 +129,7 @@ def main():
                 continue
             else:
                 try:
-                    extract_file(p, f, args.output)
+                    extract_file(args, p, f)
                 except:
                     traceback.print_exc()
                     print >> sys.stderr, "Failed to extract:", p, f
@@ -312,17 +313,32 @@ def get_paths(index_file, filenames, f5=None):
     return paths
 
 
-def extract_file(path, filename, save_path):
+def extract_file(args, path, filename):
     '''
     Do the extraction.
     I was using the tarfile python lib, but honestly, it sucks and was too volatile.
     if you have a better suggestion, let me know :)
-    That --transform hack is awesome btw. Blows away all the leading folders. use it
+    That --transform hack is awesome btw. Blows away all the leading folders. use
     cp for when using untarred structures. Not recommended, but here for completeness.
+
+    --transform not working on MacOS. Need to use gtar
+    Thanks to Kai Martin for picking that one up!
+
     '''
+    OSystem = ""
+    OSystem = args.OSystem
+    save_path = args.output
     if path.endswith('.tar'):
-        cmd = "tar -xf {} --transform='s/.*\///' -C {} {}".format(
-            path, save_path, filename)
+        if OSystem in ["Linux", "Windows"]:
+            cmd = "tar -xf {} --transform='s/.*\///' -C {} {}".format(
+                path, save_path, filename)
+        elif OSystem == "Darwin":
+            cmd = "gtar -xf {} --transform='s/.*\///' -C {} {}".format(
+                path, save_path, filename)
+        else:
+            print >> sys.stderr, "Unsupported OSystem, trying Tar anyway, OS:", OSystem
+            cmd = "tar -xf {} --transform='s/.*\///' -C {} {}".format(
+                path, save_path, filename)
     else:
         cmd = "cp {} {}".format(filename, os.path.join(
             save_path, filename.split('/')[-1]))
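
The three `tar` branches in `extract_file` differ only in the binary name. As a hedged sketch of one way to express the same selection (this is not the commit's code), the binary can be looked up in a table and the command built as an argument list. The names `TAR_BY_OS`, `tar_extract_cmd` and `strip_leading_dirs` are hypothetical. The second helper only illustrates what the `s/.*\///` expression passed to `--transform` does to a member path.

```python
import re

# Mirrors the branches in the diff: plain tar on Linux and Windows, gtar on MacOS.
TAR_BY_OS = {"Linux": "tar", "Windows": "tar", "Darwin": "gtar"}


def tar_extract_cmd(os_name, tar_path, save_path, member):
    """Build the extraction command as an argument list (hypothetical helper)."""
    # Unknown platforms fall back to plain tar, as the else branch in the diff does.
    binary = TAR_BY_OS.get(os_name, "tar")
    # The transform expression is the sed-style s/.*\/// from the diff.
    return [binary, "-xf", tar_path, "--transform=s/.*\\///",
            "-C", save_path, member]


def strip_leading_dirs(member):
    """Drop everything up to and including the last slash, as --transform does."""
    return re.sub(r".*/", "", member)


cmd = tar_extract_cmd("Darwin", "reads.tar", "out", "run1/0/read_42.fast5")
flat_name = strip_leading_dirs("run1/0/read_42.fast5")
```

An argument list like this can be handed to `subprocess.call` without `shell=True`, which avoids quoting problems for paths containing spaces. How the command string is actually executed lies outside the lines shown in this diff.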
