Commit f673be2

committed
Compression
1 parent be74b72 commit f673be2

File tree

8 files changed

+267
-263
lines changed


7zip.md

+13
@@ -0,0 +1,13 @@
1+
# 7zip
2+
3+
Free and open source archiver by Igor Pavlov, originally written for Windows. Not a Microsoft product.
4+
5+
Can do lots of formats:
6+
7+
- 7z format
8+
- RAR with `p7zip-rar` installed
9+
- zip
10+
11+
But *use only for 7z*, which it was made for.
12+
13+
With 7zip, you can open `.exe` files to extract their inner data.

README.md

+9-1
@@ -118,7 +118,6 @@ Media video, games, etc.) file types, viewers, editors, capture, synthesizers:
118118

119119
- [Audio](audio/): audio, music, sound.
120120
- [Book](book.md): PDF, DJVU.
121-
- [Compression](compression.md): Zip, tar, gzip, 7z.
122121
- [Dictionary](dictionary.md): dictionary formats.
123122
- [Game](game.md): games, emulation.
124123
- [Image](image/): images, photos.
@@ -133,6 +132,15 @@ File sharing:
133132
- [NFS](nfs.md) (WIP)
134133
- [LDAP](ldap.md) (WIP)
135134

135+
[Compression](compression.md):
136+
137+
- [7zip](7zip.md)
138+
- [File Roller](file-roller.md)
139+
- [RAR](rar.md)
140+
- [gzip](gzip.md)
141+
- [tar](tar.md)
142+
- [zip](zip.md)
143+
136144
User operations:
137145

138146
- [id](id.md)

compression.md

+4-262
@@ -1,8 +1,6 @@
11
# Compression
22

3-
File compression formats and utilities.
4-
5-
The performance parameters are:
3+
The main performance parameters to consider when choosing the compression method are:
64

75
- compression ratio
86
- compression time
@@ -11,263 +9,7 @@ The performance parameters are:
119
- ability to break into chunks
1210
- ability to keep file metadata such as permissions, the hidden flag (Windows), etc.
1311

14-
## ZIP
15-
16-
Most widely supported format.
17-
18-
Compression ratio is not very high.
19-
20-
Easy to view and extract single files.
21-
22-
Compresses a directory file by file: each member is compressed individually.
23-
24-
ZIP file or directory:
25-
26-
zip -r "$F".zip "$F"
27-
28-
`-r`: add dir recursively. Otherwise, adds only the top dir and not its contents.
29-
30-
Using it on a directory will keep the top directory in the ZIP. To avoid that and keep only the files in the directory, do:
31-
32-
cd dir
33-
zip -r ../dir.zip .
34-
35-
If you don't have hidden files on the top level:
36-
37-
zip -r dir.zip dir/*
38-
39-
Note that:
40-
41-
zip -r dir.zip dir/.*
42-
43-
will not work by default for hidden files, since `.*` will also expand to `.` and `..` with default `bash` options.
44-
45-
`-e`: encrypt:
46-
47-
zip -er "$F".zip "$F"
48-
49-
You can still list the filenames, but you cannot extract the files without the password!
50-
51-
List all files in zip file
52-
53-
unzip -l "$F".zip
54-
55-
Extract files from zip:
56-
57-
unzip "$F".zip
58-
59-
If the archive has a password, `unzip` asks for it.
60-
61-
To a dir:
62-
63-
unzip "$F".zip -d out
64-
65-
for F in *; do echo "$F"; echo "$F".zip; zip "$F".zip "$F"; done
66-
67-
The loop above zips every file in the current directory into its own `<file>.zip`.
68-
69-
## tb2
70-
71-
## tgz
72-
73-
## tar
74-
75-
Name origin: `Tape ARchive`.
76-
77-
The tar *format* is specified by POSIX 7 together with the `pax` utility: <http://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html#tag_20_92>
78-
79-
The `tar` utility itself is not specified by POSIX 7. GNU tar is the implementation most commonly found on Linux.
80-
81-
tar only packs a directory into a single file; it does no compression.
82-
83-
It is a popular option to transform directories into files on *nix systems, as the format natively stores and preserves ext filesystem metadata such as:
84-
85-
- ownership
86-
- permissions
87-
- symlinks
88-
- timestamps
89-
90-
Since tar offers no compression, it is often coupled with `gz` and `bz2`, which are single-file compressors.
91-
92-
tar and compression are used together so commonly that shorthand extensions exist for the combinations:
93-
94-
- `tgz` == `tar.gz`
95-
- `txz` == `tar.xz`
96-
- `tb2` == `tbz` == `tar.bz2`
97-
98-
General GNU interface:
99-
100-
- single char options may be given without a leading hyphen
101-
- every single letter option has a corresponding double hyphen multi char version
102-
103-
Create tar:
104-
105-
tar vcf "$F".tar "$F"
106-
tar vczf "$F".tgz "$F"
107-
tar vcjf "$F".tbz "$F"
108-
tar vcJf "$F".txz "$F"
109-
110-
- `c`: create
111-
- `f`: set output file to next argument. If not given, outputs to stdout.
112-
- `z`: `gzip`
113-
- `j`: `bzip2`
- `J`: `xz`
114-
- `v`: verbose
115-
116-
If the output file exists, it is overwritten.
117-
118-
Create from tar with multiple files:
119-
120-
tar vcf a.tar f1 f2
121-
122-
`r`: append file to existing tar, or create new tar:
123-
124-
tar rf a.tar f
125-
126-
Extract:
127-
128-
tar vxf "$F".tar
129-
tar vxzf "$F".tgz
130-
tar vxjf "$F".tbz
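
The create, append and list operations above can be sketched together as follows. This is a minimal illustration, assuming GNU or BSD `tar` in a POSIX shell; the scratch directory and the file names `f1`, `f2` and `a.tar` are made up for the example. Appending with `r` is shown on an uncompressed archive, since GNU tar does not update compressed ones.

```shell
set -eu
work=$(mktemp -d)          # scratch directory (hypothetical layout)
cd "$work"
printf 'one\n' > f1
printf 'two\n' > f2

tar cf a.tar f1            # c: create an uncompressed archive holding f1
tar rf a.tar f2            # r: append f2 to the existing archive
listing=$(tar tf a.tar)    # t: list member names, one per line
echo "$listing"
```

Note that `r` is used without `c`: the two are separate operation modes, and only one mode can be given per invocation.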
131-
132-
## zlib
133-
134-
## gzip
135-
136-
## gunzip
137-
138-
## gz
139-
140-
Extension: `gz`.
141-
142-
Library name: zlib. The library is not a GNU project; the `gzip` utility is.
143-
144-
Popular wrapper: `gzip`, and `gunzip` to unzip.
145-
146-
Vs zip:
147-
148-
- Completely different file types.
149-
150-
- Both use the DEFLATE algorithm: <https://en.wikipedia.org/wiki/DEFLATE>, and therefore have very similar compression ratios and speeds.
151-
152-
- The gzip format does not deal with directories: only single files. For this reason it is commonly used together with `tar`, which only packs directories into a file. For convenience, however, the `tar` executable can handle `.tgz` files directly through its `z` option.
153-
154-
- gzip seems to have one dominant implementation, GNU gzip, so gzip can refer to either the utility or the format used by that utility.
155-
156-
zip has many implementations: WinRAR and WinZip are closed source on Windows, Info-ZIP and libzip are open source, Info-ZIP being the default one present on Ubuntu 12.04. Therefore the term zip usually refers only to the file format.
157-
158-
If a file is only `.gz` but not `.tgz` you cannot use tar to extract it.
159-
160-
Create `a.txt.gz` and `rm` `a.txt`:
161-
162-
gzip a.txt
163-
164-
Extract `a.gz` and erase it if successful:
165-
166-
gunzip a.gz
167-
168-
Compress every file under the given directory recursively, one `.gz` per file, removing each original:
169-
170-
gzip -r .
171-
172-
### gz file format
173-
174-
IETF standardized format: <https://www.ietf.org/rfc/rfc1952.txt>
175-
176-
### Hardlinks
177-
178-
### Keep original
179-
180-
In-place compression does not work if the file has any other hard links, probably because replacing the file would break the hard link without freeing any disk space. AKA: tries to be too smart and annoys us to hell!
181-
182-
Workaround: keep the original on the operations: <http://unix.stackexchange.com/questions/46786/how-to-tell-gzip-to-keep-original-file>
183-
184-
Workarounds: `-c` outputs to stdout:
185-
186-
gzip -c a > a.gz
187-
188-
Read input from stdin:
189-
190-
gzip < a > a.gz
191-
192-
And finally, `gzip` 1.6 (2013) has the `-k, --keep` option:
193-
194-
gzip -k a
195-
gzip -kr .
196-
197-
## RAR
198-
199-
Proprietary `Roshal ARchive` format, named after its creator Eugene Roshal.
200-
201-
Supports split (multipart) archives.
202-
203-
Split archive extensions match the following Perl regexes:
204-
205-
- `\.part\d+\.rar`
206-
- `\.r\d+`
207-
208-
Extract contents of `a.rar` to `./`
209-
210-
unrar x a.rar
211-
212-
Before / after:
213-
214-
a.rar
215-
/dir1/
216-
/dir1/f1
217-
/dir1/f2
218-
219-
===============
220-
221-
a.rar
222-
dir1/f1
223-
dir1/f2
224-
225-
Extract to the `./out/` directory, creating it if necessary:
226-
227-
unrar x a.rar out/
228-
229-
Extract multipart RAR:
230-
231-
unrar x a.r00
232-
unrar x a.part1.rar
233-
234-
Extract every file in `a.rar` into the current directory under its basename only, discarding the stored paths. Name conflicts are possible:
235-
236-
unrar e a.rar
237-
238-
Sample output:
239-
240-
a.rar
241-
/dir1/
242-
/dir1/f1
243-
/dir1/f2
244-
245-
===============
246-
247-
a.rar
248-
f1
249-
f2
250-
251-
### Create RAR
252-
253-
`a` for add:
254-
255-
rar a dir.rar dir
256-
257-
## 7zip
258-
259-
Free and open source archiver by Igor Pavlov, originally written for Windows. Not a Microsoft product.
260-
261-
Can do lots of formats:
262-
263-
- 7z format
264-
- RAR with `p7zip-rar` installed
265-
- zip
266-
267-
But *use only for 7z*, which it was made for.
268-
269-
With 7zip, you can open `.exe` files to extract their inner data.
270-
271-
## File roller
12+
If you don't have very strict constraints, default to:
27213

273-
Very good GUI app to view inside multiple archive formats and extract them.
14+
- `zip` if you want OS portability
15+
- `tar.gz` if you want to maintain Linux filesystem metadata intact: permissions, symlinks, etc.
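
A minimal sketch of the `tar.gz` choice, assuming GNU or BSD `tar` in a POSIX shell. The directory `demo`, the script `run.sh` and the symlink `link` are hypothetical names used only for illustration. It packs a directory and extracts it elsewhere, so that the executable bit and the symlink can be inspected afterwards.

```shell
set -eu
work=$(mktemp -d)          # scratch directory (hypothetical layout)
cd "$work"
mkdir demo
printf 'echo hi\n' > demo/run.sh
chmod 755 demo/run.sh      # executable bit we expect to survive
ln -s run.sh demo/link     # symlink we expect to survive

tar czf demo.tgz demo      # c: create, z: gzip, f: output file
mkdir out
tar xzf demo.tgz -C out    # x: extract, -C: change to this directory first

ls -l out/demo             # inspect permissions and the symlink
```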

file-roller.md

+3
@@ -0,0 +1,3 @@
1+
# File Roller
2+
3+
Very good GUI app to view inside multiple archive formats and extract them.

gzip.md

+43
@@ -0,0 +1,43 @@
1+
# gzip
2+
3+
# zlib
4+
5+
# gunzip
6+
7+
# gz
8+
9+
Extension: `gz`.
10+
11+
Library name: zlib. The library is not a GNU project; the `gzip` utility is.
12+
13+
Popular wrapper: `gzip`, and `gunzip` to unzip.
14+
15+
Vs zip:
16+
17+
- Completely different file types.
18+
19+
- Both use the DEFLATE algorithm: <https://en.wikipedia.org/wiki/DEFLATE>, and therefore have very similar compression ratios and speeds.
20+
21+
- The gzip format does not deal with directories: only single files. For this reason it is commonly used together with `tar`, which only packs directories into a file. For convenience, however, the `tar` executable can handle `.tgz` files directly through its `z` option.
22+
23+
- gzip seems to have one dominant implementation, GNU gzip, so gzip can refer to either the utility or the format used by that utility.
24+
25+
zip has many implementations: WinRAR and WinZip are closed source on Windows, Info-ZIP and libzip are open source, Info-ZIP being the default one present on Ubuntu 12.04. Therefore the term zip usually refers only to the file format.
26+
27+
If a file is only `.gz` but not `.tgz` you cannot use tar to extract it.
28+
29+
Create `a.txt.gz` and `rm` `a.txt`:
30+
31+
gzip a.txt
32+
33+
Extract `a.gz` and erase it if successful:
34+
35+
gunzip a.gz
36+
37+
Compress every file under the given directory recursively, one `.gz` per file, removing each original:
38+
39+
gzip -r .
40+
41+
### gz file format
42+
43+
IETF standardized format: <https://www.ietf.org/rfc/rfc1952.txt>
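
As a small sketch of the format, assuming `gzip`, `gunzip`, `od` and `cmp` are available in a POSIX shell (the file names `a.txt` and `b.txt` are made up for the example): RFC 1952 defines the first two bytes of a gzip member as the ID bytes `1f 8b`, which can be checked directly. The example writes through stdout with `-c`, so the original file is kept.

```shell
set -eu
work=$(mktemp -d)                 # scratch directory (hypothetical layout)
cd "$work"
printf 'hello\n' > a.txt

gzip -c a.txt > a.txt.gz          # -c writes to stdout, so a.txt is kept
gunzip -c a.txt.gz > b.txt        # decompress to stdout, keeping a.txt.gz

# Read the first two bytes of the compressed file as hex.
magic=$(od -An -tx1 -N2 a.txt.gz | tr -d ' \n')
echo "$magic"

# The decompressed copy should be byte-identical to the original.
cmp a.txt b.txt && echo "round trip ok"
```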
