Programming For Biology 2017
============================
__Instructors__
Simon Prochnik
Sofia Robb
---
__Table of Contents__
* [Big Picture](#big-picture)
* [Why?](#why)
* [Helpful Tips](#helpful-tips)
* [Unix](#unix)
* [Unix 1](#unix-1)
* [Unix Overview](#unix-overview)
* [What is the Command-Line?](#what-is-the-command-line)
* [The Basics](#the-basics)
* [Logging into Your Workstation](#logging-into-your-workstation)
* [Bringing up the Command-Line](#bringing-up-the-command-line)
* [OK. I've Logged in. What Now?](#ok-ive-logged-in--what-now)
* [Command-Line Prompt](#command-line-prompt)
* [Issuing Commands](#issuing-commands)
* [Command-Line Editing](#command-line-editing)
* [Wildcards](#wildcards)
* [Home Sweet Home](#home-sweet-home)
* [Getting Around](#getting-around)
* [Essential Unix Commands](#essential-unix-commands)
* [Getting Information About Commands](#getting-information-about-commands)
* [Arguments and Command Line Switches](#arguments-and-command-line-switches)
* [Spaces and Funny Characters](#spaces-and-funny-characters)
* [Useful Commands](#useful-commands)
* [Manipulating Directories](#manipulating-directories)
* [Networking](#networking)
* [Standard I/O and Redirection](#standard-io-and-redirection)
* [Redirection Meta-Characters](#redirection-meta-characters)
* [Filters, Filenames and Standard Input](#filters-filenames-and-standard-input)
* [Standard I/O and Pipes](#standard-io-and-pipes)
* [More Pipe Idioms](#more-pipe-idioms)
* [Advanced Unix](#advanced-unix)
* [Link to Unix 1 Problem Set](#link-to-unix-1-problem-set)
* [Unix 2](#unix-2)
* [Text Editors](#text-editors)
* [Git for Beginners](#git-for-beginners)
* [The Big Picture.](#the-big-picture)
* [Collaboration](#collaboration)
* [Storing Versions](#storing-versions)
* [Restoring Previous Versions](#restoring-previous-versions)
* [Backup](#backup)
* [The Details](#the-details)
* [The Basics](#the-basics-1)
* [Creating a new repository](#creating-a-new-repository)
* [Cloning a Repository](#cloning-a-repository)
* [Links to <em>Slightly</em> less basic topics](#links-to-slightly-less-basic-topics)
* [Link To Unix 2 Problem Set](#link-to-unix-2-problem-set)
* [Python](#python)
* [Python 1](#python-1)
* [Python Overview](#python-overview)
* [Running Python](#running-python)
* [Interactive Interpreter](#interactive-interpreter)
* [Running Python Scripts](#running-python-scripts)
* [Python Script](#python-script)
* [Syntax](#syntax)
* [Python Identifiers](#python-identifiers)
* [Naming conventions for Python Identifiers](#naming-conventions-for-python-identifiers)
* [Reserved Words](#reserved-words)
* [Lines and Indentation](#lines-and-indentation)
* [Comments](#comments)
* [Blank Lines](#blank-lines)
* [Python Options](#python-options)
* [Data Types and Variables](#data-types-and-variables)
* [Numbers and Strings](#numbers-and-strings)
* [Lists](#lists)
* [Command line parameters: A Special Built-in List](#command-line-parameters-a-special-built-in-list)
* [Tuple](#tuple)
* [Dictionary](#dictionary)
* [Type Conversion](#type-conversion)
* [Link to Python 1 Problem Set](#link-to-python-1-problem-set)
* [Python 2](#python-2)
* [Operators](#operators)
* [Arthmetic Operators](#arthmetic-operators)
* [Assignment Operators](#assignment-operators)
* [Comparison Operators](#comparison-operators)
* [Logical Operators](#logical-operators)
* [Membership Operators](#membership-operators)
* [Operator Precedence](#operator-precedence)
* [Truth](#truth)
* [Use bool() to test for truth](#use-bool-to-test-for-truth)
* [Logic: Control Statements](#logic-control-statements)
* [If Statement](#if-statement)
* [if/elif](#ifelif)
* [Numbers](#numbers)
* [integer](#integer)
* [floating point number](#floating-point-number)
* [complex number](#complex-number)
* [Conversion functions](#conversion-functions)
* [Numeric Functions](#numeric-functions)
* [Comparing two numbers](#comparing-two-numbers)
* [Link to Python 2 Problem Set](#link-to-python-2-problem-set)
* [Python 3](#python-3)
* [Sequences](#sequences)
* [What functions go with my object?](#what-functions-go-with-my-object)
* [Strings](#strings)
* [Quotation Marks](#quotation-marks)
* [Strings and the print() function](#strings-and-the-print-function)
* [Errors and Printing](#errors-and-printing)
* [Special/Escape Characters](#specialescape-characters)
* [Concatenation](#concatenation)
* [Determine the length of a string](#determine-the-length-of-a-string)
* [Changing String Case](#changing-string-case)
* [Find and Count](#find-and-count)
* [Find and Replace](#find-and-replace)
* [Extracting a Substring, or Slicing](#extracting-a-substring-or-slicing)
* [Locate and Report](#locate-and-report)
* [Other String Methods](#other-string-methods)
* [String Formatting](#string-formatting)
* [The format() mini-language](#the-format-mini-language)
* [Summary of special formatting symbols so far](#summary-of-special-formatting-symbols-so-far)
* [What's the point?](#whats-the-point)
* [Lists and Tuples](#lists-and-tuples)
* [Lists](#lists-1)
* [Accessing Values in Lists](#accessing-values-in-lists)
* [Changing Values in a List](#changing-values-in-a-list)
* [Exracting a Subset of a List, or Slicing](#exracting-a-subset-of-a-list-or-slicing)
* [List Operators](#list-operators)
* [List Functions](#list-functions)
* [List Methods](#list-methods)
* [Building a List one Value at a Time](#building-a-list-one-value-at-a-time)
* [Link to Python 3 Problem Set](#link-to-python-3-problem-set)
* [Python 4](#python-4)
* [Loops](#loops)
* [while loop](#while-loop)
* [While Loop Syntax](#while-loop-syntax)
* [While/Else](#whileelse)
* [For Loops](#for-loops)
* [For Loop Syntax](#for-loop-syntax)
* [For/Else](#forelse)
* [Loop Control](#loop-control)
* [Loop Control: Break](#loop-control-break)
* [Loop Control: Continue](#loop-control-continue)
* [Iterators](#iterators)
* [List Comprehension](#list-comprehension)
* [Dictionaries](#dictionaries)
* [Creating a Dictionary](#creating-a-dictionary)
* [Accessing Values in Dictionaries](#accessing-values-in-dictionaries)
* [Changing Values in a Dictionary](#changing-values-in-a-dictionary)
* [Building a Dictionary one Key/Value at a Time](#building-a-dictionary-one-keyvalue-at-a-time)
* [Checking That Dictionary Keys Exist](#checking-that-dictionary-keys-exist)
* [Sorting Dictionary Keys](#sorting-dictionary-keys)
* [Dictionary Functions](#dictionary-functions)
* [Dictionary Methods](#dictionary-methods)
* [Sets](#sets)
* [Set Operators](#set-operators)
* [Set Functions](#set-functions)
* [Set Methods](#set-methods)
* [Link to Python 4 Problem Set](#link-to-python-4-problem-set)
* [Python 5](#python-5)
* [I/O and Files](#io-and-files)
* [Writing to the Screen](#writing-to-the-screen)
* [Reading input from the keyboard](#reading-input-from-the-keyboard)
* [Reading from a File](#reading-from-a-file)
* [Open a File](#open-a-file)
* [Reading the contents of a file](#reading-the-contents-of-a-file)
* [Writing to a File](#writing-to-a-file)
* [Building a Dictionary from a File](#building-a-dictionary-from-a-file)
* [Link to Python 5 Problem Set](#link-to-python-5-problem-set)
* [Python 6](#python-6)
* [Regular Expressions](#regular-expressions)
* [Individual Characters](#individual-characters)
* [Character Classes](#character-classes)
* [Anchors](#anchors)
* [Quantifiers](#quantifiers)
* [Variables and Patterns](#variables-and-patterns)
* [Either Or](#either-or)
* [Subpatterns](#subpatterns)
* [Using Subpatterns Inside the Regular Expression Match](#using-subpatterns-inside-the-regular-expression-match)
* [Subpatterns and Greediness](#subpatterns-and-greediness)
* [Using Subpatterns Outside the Regular Expression Match](#using-subpatterns-outside-the-regular-expression-match)
* [Practical Example: Codons](#practical-example-codons)
* [Truth and Regular Expression Matches](#truth-and-regular-expression-matches)
* [Using Regular expressions in substitutions](#using-regular-expressions-in-substitutions)
* [Using subpatterns in the replacement](#using-subpatterns-in-the-replacement)
* [<a href="https://github.com/srobb1/pfb2017/blob/master/problemsets/Python_06_problemset.md">Link to Python 6 Problem Set</a>](#link-to-python-6-problem-set)
* [Python 7](#python-7)
* [Functions](#functions)
* [Creating/Defining a Funtion to Find AT Content:](#creatingdefining-a-funtion-to-find-at-content)
* [Using/Running/Calling Your function:](#usingrunningcalling-your-function)
* [The details](#the-details-1)
* [Naming Arguments](#naming-arguments)
* [Keyword Arguments](#keyword-arguments)
* [Default Values for Arguments](#default-values-for-arguments)
* [lambda](#lambda)
* [Scope](#scope)
* [Local Variables](#local-variables)
* [Global](#global)
* [Modules](#modules)
* [os.path](#ospath)
* [os.system](#ossystem)
* [subprocess](#subprocess)
* [Capturing output from a shell pipeline](#capturing-output-from-a-shell-pipeline)
* [Capturing output the long way (for a single command)](#capturing-output-the-long-way-for-a-single-command)
* [sys](#sys)
* [re](#re)
* [copy](#copy)
* [math](#math)
* [random](#random)
* [statistics](#statistics)
* [glob](#glob)
* [argparse](#argparse)
* [Many more modules that do many things](#many-more-modules-that-do-many-things)
* [Link to Python 7 Problem Set](#link-to-python-7-problem-set)
* [Python 8](#python-8)
* [Exception Handling](#exception-handling)
* [try/except/else/finally](#tryexceptelsefinally)
* [Getting more information about an exception](#getting-more-information-about-an-exception)
* [Raising an Exception](#raising-an-exception)
* [Datastructures](#datastructures)
* [Two-demensional lists](#two-demensional-lists)
* [Lists of dictionaries](#lists-of-dictionaries)
* [Dictionaries of lists](#dictionaries-of-lists)
* [Dictionaries of dictionaries](#dictionaries-of-dictionaries)
* [Link to Python 8 Problem Set](#link-to-python-8-problem-set)
* [Python 9](#python-9)
* [BioPython](#biopython)
* [BioPython Overview](#biopython-overview)
* [BioPython Subtopic 1](#biopython-subtopic-1)
* [BioPython Subtopic 2](#biopython-subtopic-2)
* [Bioinformatic Analysis and Tools](#bioinformatic-analysis-and-tools)
* [Bioinformatic Analysis and Tools Overview](#bioinformatic-analysis-and-tools-overview)
* [Sequence Search and Alignments](#sequence-search-and-alignments)
* [Assembly](#assembly)
* [DNA](#dna)
* [RNA](#rna)
* [NGS](#ngs)
* [Ontology](#ontology)
***
## Big Picture
## Why?
Why is it important for **biologists** to learn to program?
You probably already know the answer to this question since you are here.
We firmly believe that knowing how to program is just as essential as knowing how to run a gel or set up a PCR reaction. The data we now get from a single experiment can be overwhelming. This data often needs to be reformatted, filtered, and analyzed in unique ways. Programming allows you to perform these tasks in an **efficient** and **reproducible** way.
## Helpful Tips
What are our tips for having a successful programming course?
1. Practice, practice, practice. Please spend as much time as possible actually coding.
2. Write only a line or two of code, then test it. If you write too many lines, it becomes more difficult to debug if there is an error.
3. Errors are not failures. Every error you get is a learning opportunity. Every single error you debug is a major success. Fixing errors is how you will cement what you have learned.
4. Don't spend too much time trying to figure out a problem. While it's a great learning experience to try to solve an issue on your own, it's not fun getting frustrated or spending a lot of time stuck. We are here to help you, so please ask us whenever you need help.
5. Lectures are important, but practice is more important.
6. Review sessions are important, but practice is more important.
7. Our key goal is to slowly, but surely, teach you how to solve problems on your own.
---
# Unix
## Unix 1
### Unix Overview
#### What is the Command-Line?
Underlying the pretty Mac OSX GUI is a powerful command-line operating system. The command-line gives you access to the internals of the OS, and is also a convenient way to write custom software and scripts.
Many bioinformatics tools are written to run on the command-line and have no graphical interface. In many cases, a command-line tool is more versatile than a graphical tool, because you can easily combine command-line tools into automated scripts that accomplish tasks without human intervention.
In this course, we will be writing Python scripts that are completely command-line based.
### The Basics
#### Logging into Your Workstation
Your workstation is an iMac. To log into it, provide the following information:
_Your username:_ admin
_Your password:_ cshl
#### Bringing up the Command-Line
To bring up the command-line, use the Finder to navigate to _Applications->Utilities_ and double-click on the _Terminal_ application. This will bring up a window like the following:
![OSX Terminal](https://raw.githubusercontent.com/srobb1/pfb2017/master/images/terminal_screenshot.png)
You can open several Terminal windows at once. This is often helpful.
You will be using this application a lot, so I suggest that you drag the Terminal icon into the shortcuts bar at the bottom of your screen.
#### OK. I've Logged in. What Now?
The terminal window is running a **shell** called "bash". The shell is a loop that:
1. Prints a prompt
2. Reads a line of input from the keyboard
3. Parses the line into one or more commands
4. Executes the commands (which usually print some output to the terminal)
5. Goes back to step 1.
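
The loop described above can be sketched in a few lines of Python. This is only a toy: the two built-in commands (`echo` and `count`), the `toy_shell` function and the prompt string are all made up for illustration, and a real shell such as **bash** launches external programs and does far more.

```python
import shlex

# Hypothetical built-in commands for this toy shell; a real shell would
# search for and launch external programs instead.
def cmd_echo(args):
    return " ".join(args)

def cmd_count(args):
    return str(len(args))

COMMANDS = {"echo": cmd_echo, "count": cmd_count}

def toy_shell(input_lines, prompt="(~) % "):
    """Run the read-parse-execute loop over an iterable of input lines."""
    transcript = []
    for line in input_lines:              # steps 1-2: prompt, read a line
        words = shlex.split(line)         # step 3: parse the line into words
        if not words:
            continue                      # empty line: just prompt again
        name, args = words[0], words[1:]
        if name == "exit":
            break
        handler = COMMANDS.get(name)
        if handler is None:
            output = f"{name}: command not found"
        else:
            output = handler(args)        # step 4: execute the command
        transcript.append(f"{prompt}{line}")
        transcript.append(output)
    return transcript                     # step 5 is the for loop itself

for row in toy_shell(["echo hello world", "count a b c", "frobnicate", "exit"]):
    print(row)
```

Feeding the lines in as a list, rather than reading the keyboard, keeps the sketch easy to test.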
There are many different shells with bizarre names like **bash**, **sh**, **csh**, **tcsh**, **ksh**, and **zsh**. The "sh" part means shell. Each shell has different and somewhat confusing features. We have set up your accounts to use **bash**. Stay with **bash** and you'll get used to it, eventually.
#### Command-Line Prompt
Most of bioinformatics is done with command-line software, so you should take some time to learn to use the shell effectively.
This is a command-line prompt:
```
bush202>
```
This is another:
```
(~) 51%
```
This is another:
```
srobb@bush202 1:12PM>
```
What you get depends on how the system administrator has customized your login. You can customize it yourself when you know how.
The prompt tells you the shell is ready to accept a command. When a long-running command is going, the prompt will not reappear until the system is ready to deal with your next request.
#### Issuing Commands
Type in a command and press the \<Enter\> key. If the command has output, it will appear on the screen. Example:
```
(~) 53% ls -F
GNUstep/ cool_elegans.movies.txt man/
INBOX docs/ mtv/
INBOX~ etc/ nsmail/
Mail@ games/ pcod/
News/ get_this_book.txt projects/
axhome/ jcod/ public_html/
bin/ lib/ src/
build/ linux/ tmp/
ccod/
(~) 54%
```
The command here is `ls -F` which produces a listing of files and directories in the current directory (more on that later). Below its output, the command prompt appears again.
Some programs will take a long time to run. After you issue their command names, you won't recover the shell prompt until they're done. You can either launch a new shell (from Terminal's File menu), or run the command in the background by adding an ampersand (`&`) after the command:
```
(~) 54% long_running_application &
(~) 55%
```
> The command will now run in the background until it is finished. If it has any output, the output will be printed to the terminal window. You may wish to capture the output in a file (called redirection). We'll describe this later.
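
Python scripts can start background jobs too. Here is a minimal sketch using the standard `subprocess` module, assuming a stand-in "long-running" command (a child Python process that sleeps for half a second); the message it prints is invented for this example.

```python
import subprocess
import sys

# Stand-in for a long-running program: a child Python process that sleeps
# briefly and then prints a message (an assumption made for this example).
slow_command = [sys.executable, "-c",
                "import time; time.sleep(0.5); print('analysis finished')"]

# Popen returns immediately, much like ending a shell command with `&`.
job = subprocess.Popen(slow_command, stdout=subprocess.PIPE, text=True)

still_running = job.poll() is None   # poll() returns None while the job runs
print("job still running:", still_running)

# ... other work could happen here ...

output, _ = job.communicate()        # wait for the job and collect its output
print("job output:", output.strip())
print("exit status:", job.returncode)
```

Capturing the job's output with `stdout=subprocess.PIPE` plays roughly the role that redirection to a file plays in the shell.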
#### Command-Line Editing
Most shells offer command-line editing. Up until the moment you press \<Enter\>, you can go back over the command-line and edit it using the keyboard. Here are the most useful keystrokes:
- _Backspace_: Delete the previous character and back up one.
- _Left arrow, right arrow_: Move the text insertion point (cursor) one character to the left or right.
- _control-a (^a)_: Move the cursor to the beginning of the line. (Mnemonic: A is the first letter of the alphabet.)
- _control-e (^e)_: Move the cursor to the end of the line. (Mnemonic: E for End; ^z was already taken, since it suspends a command.)
- _control-d (^d)_: Delete the character currently under the cursor. D=Delete.
- _control-k (^k)_: Delete the entire line from the cursor to the end. k=kill. The line isn't actually deleted, but put into a temporary holding place called the "kill buffer". This is like cutting text.
- _control-y (^y)_: Paste the contents of the kill buffer onto the command-line starting at the cursor. y=yank. This is like paste.
- _Up arrow, down arrow_: Move up and down in the command history. This lets you reissue previous commands, possibly after modifying them.
There are also some useful shell commands you can issue:
- `history` Show all the commands that you have issued recently, nicely numbered.
- `!<number>` Reissue an old command, based on its number (which you can get from `history`).
- `!!` Reissue the immediate previous command.
- `!<partial command string>`: Reissue the previous command that began with the indicated letters. For example, `!l` (the letter el, not a number 1) would reissue the `ls -F` command from the earlier example.
**bash** offers automatic command completion and spelling correction. If you type part of a command and then press the tab key, it will prompt you with all the possible completions of the command. For example:
```
(~) 51% fd<tab><tab>
(~) 51% fd
fd2ps fdesign fdformat fdlist fdmount fdmountd fdrawcmd fdumount
(~) 51%
```
> If you hit tab after typing a command, but before pressing \<Enter\>, **bash** will prompt you with a list of file names. This is because many commands operate on files.
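
At its core, completion is a prefix search. The sketch below shows the idea in Python, assuming a hard-coded list of command names (the eight from the example above plus two extras); the `completions` function is hypothetical and is not how **bash** itself is implemented.

```python
import os

def completions(prefix, names):
    """Return the names that start with prefix, in sorted order."""
    return sorted(name for name in names if name.startswith(prefix))

# Command names from the example above, plus two extras added for contrast.
commands = ["fd2ps", "fdesign", "fdformat", "fdlist", "fdmount",
            "fdmountd", "fdrawcmd", "fdumount", "find", "fold"]

matches = completions("fdm", commands)
print(matches)
# With several candidates, a shell can still fill in the part they share.
print(os.path.commonprefix(matches))
```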
#### Wildcards
You can use wildcards when referring to files. `*` stands for zero or more characters. `?` stands for any single character. For example, to list all files with the extension ".txt", run `ls` with the wildcard pattern `*.txt`:
```
(~) 56% ls -F *.txt
final_exam_questions.txt genomics_problem.txt
genebridge.txt mapping_run.txt
```
There are several more advanced types of wildcard patterns that you can read about in the **bash** manual page. For example, if you want to match files that begin with the characters "f" or "g" and end with ".txt", you can use a range of characters inside square brackets `[f-g]` as part of the wildcard pattern. Here's an example:
```
(~) 57% ls -F [f-g]*.txt
final_exam_questions.txt genebridge.txt genomics_problem.txt
```
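
The same wildcard patterns can be tried out in Python with the standard `fnmatch` module. This is a sketch on a hard-coded list of names (the four `.txt` files from the listings above plus `INBOX`), so nothing on disk is touched; `match_names` is a helper invented for this example.

```python
from fnmatch import fnmatchcase

def match_names(pattern, names):
    """Return the names matching a shell-style wildcard pattern, sorted."""
    return sorted(name for name in names if fnmatchcase(name, pattern))

# File names from the listings above, plus one non-matching extra.
names = ["final_exam_questions.txt", "genomics_problem.txt",
         "genebridge.txt", "mapping_run.txt", "INBOX"]

print(match_names("*.txt", names))            # any name ending in .txt
print(match_names("[f-g]*.txt", names))       # starts with f or g
print(match_names("?apping_run.txt", names))  # ? matches exactly one character
```

One difference to keep in mind: in the shell, `*` skips hidden files (names starting with `.`) by default, while `fnmatch` treats a leading dot like any other character.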
#### Home Sweet Home
When you first log in, you'll be placed in a part of the system that is your personal directory, called the _home directory_. You are free to do with this area what you will: in particular you can create and delete files and other directories. In general, you cannot create files elsewhere in the system.
Your home directory lives somewhere in the filesystem. On our iMacs, it is a directory with the same name as your login name, located in `/Users`. The full directory path is therefore `/Users/username`. Since this is a pain to write, the shell allows you to abbreviate it as `~username` (where "username" is your user name), or simply as `~`. The weird character (called "tilde" or "twiddle") is usually hidden at the upper left corner of your keyboard.
To see what is in your home directory, issue the command `ls -F`:
```
(~) % ls -F
INBOX Mail/ News/ nsmail/ public_html/
```
This shows one file ("INBOX") and four directories ("Mail", "News", and so on). (The `-F` in the command turns on fancy mode, which appends special characters to directory listings to tell you more about what you're seeing. `/` at the end of a filename means that file is a directory.)
In addition to the files and directories shown with `ls -F`, there may be one or more hidden files. These are files and directories whose names start with a `.` (called the "dot" character). To see these hidden files, add an `a` to the options sent to the `ls` command:
```
(~) % ls -aF
./ .cshrc .login Mail/
../ .fetchhost .netscape/ News/
.Xauthority .fvwmrc .xinitrc* nsmail/
.Xdefaults .history .xsession@ public_html/
.bash_profile .less .xsession-errors
.bashrc .lessrc INBOX
```
> Whoa! There's a lot of hidden stuff there. But don't go deleting dot files. Many of them are essential configuration files for commands and other programs. For example, the `.bash_profile` file contains configuration information for the **bash** shell. You can peek into it and see all of **bash**'s many options. You can edit it (when you know what you're doing) in order to change things like the command prompt and command search path.
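
The rule for what counts as hidden is simple enough to write down in Python. The sketch below works on a hard-coded list of names taken from the listing above rather than a real directory, and `split_hidden` is a helper made up for this example.

```python
def split_hidden(names):
    """Separate names starting with '.' from those a plain `ls` would show."""
    hidden = [n for n in names if n.startswith(".")]
    visible = [n for n in names if not n.startswith(".")]
    return visible, hidden

# A few names taken from the `ls -aF` listing above (decorations removed).
listing = [".", "..", ".bashrc", ".bash_profile", ".history",
           "INBOX", "Mail", "News", "nsmail", "public_html"]

visible, hidden = split_hidden(listing)
print("ls   :", visible)
print("ls -a:", hidden + visible)
```

To look at a real directory from Python, `os.listdir()` returns hidden names along with the rest, but leaves out the special entries `.` and `..`.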
#### Getting Around
You can move around from directory to directory using the `cd` command. Give the name of the directory you want to move to, or give no name to move back to your home directory. Use the `pwd` command to see where you are (or rely on the prompt, if configured):
```
(~/docs/grad_course/i) 56% cd
(~) 57% cd /
(/) 58% ls -F
bin/ dosc/ gmon.out mnt/ sbin/
boot/ etc/ home@ net/ tmp/
cdrom/ fastboot lib/ proc/ usr/
dev/ floppy/ lost+found/ root/ var/
(/) 59% cd ~/docs/
(~/docs) 60% pwd
/usr/home/lstein/docs
(~/docs) 62% cd ../projects/
(~/projects) 63% ls
Ace-browser/ bass.patch
Ace-perl/ cgi/
Foo/ cgi3/
Interface/ computertalk/
Net-Interface-0.02/ crypt-cbc.patch
Net-Interface-0.02.tar.gz fixer/
Pts/ fixer.tcsh
Pts.bak/ introspect.pl*
PubMed/ introspection.pm
SNPdb/ rhmap/
Tie-DBI/ sbox/
ace/ sbox-1.00/
atir/ sbox-1.00.tgz
bass-1.30a/ zhmapper.tar.gz
bass-1.30a.tar.gz
(~/projects) 64%
```
> Each directory contains two special hidden directories named `.` and `..`. The first, `.`, always refers to the current directory. The second, `..`, refers to the parent directory. This lets you move upward in the directory hierarchy like this:
```
(~/docs) 64% cd ..
```
and to do arbitrarily weird things like this:
```
(~/docs) 65% cd ../../lstein/docs
```
> The latter command moves upward two levels, and then into a directory named `docs` inside a directory called `lstein`.
If you get lost, the `pwd` command prints out the full path to the current directory:
```
(~) 56% pwd
/Users/lstein
```
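
The way `.` and `..` combine with the current directory can be checked in Python without changing directory at all. This sketch uses `posixpath` so it behaves the same on any operating system; `resolve` is a hypothetical helper, and the starting directory is the one from the examples above.

```python
import posixpath

def resolve(current_dir, target):
    """Work out where `cd target` would land, starting from current_dir."""
    # join() keeps absolute targets as-is; normpath() collapses . and ..
    return posixpath.normpath(posixpath.join(current_dir, target))

here = "/Users/lstein/docs"
print(resolve(here, ".."))                  # up one level
print(resolve(here, "../projects"))         # sibling directory
print(resolve(here, "../../lstein/docs"))   # up two levels, then back down
print(resolve(here, "/"))                   # an absolute path ignores `here`
```

Note that `normpath()` works purely on the text of the path. It does not look at the filesystem, so it will not notice symbolic links or missing directories.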
#### Essential Unix Commands
With the exception of a few commands that are built directly into the shell, all Unix commands are standalone executable programs. When you type the name of a command, the shell will search through all the directories listed in the PATH environment variable for an executable of the same name. If found, the shell will execute the command. Otherwise, it will give a "command not found" error.
Most commands live in `/bin`, `/usr/bin`, or `/usr/local/bin`.
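
The PATH search described above can be sketched in Python as follows. `find_command` is a made-up helper written to mirror the description; for real scripts, the standard library's `shutil.which()` does the same job.

```python
import os

def find_command(name, search_path):
    """Return the first executable called name in search_path, else None."""
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        # Must be a regular file that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

path = os.environ.get("PATH", "")
print(find_command("ls", path))               # full path to ls, if it is found
print(find_command("no_such_command", path))  # nothing found on a typical system
```

Because the directories are searched in order, the first match wins. That is why the order of directories in PATH matters when two programs share a name.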
#### Getting Information About Commands
The `man` command will give a brief synopsis of a command. Let's get information about the command `wc`:
```
(~) 76% man wc
Formatting page, please wait...
WC(1) WC(1)
NAME
wc - print the number of bytes, words, and lines in files
SYNOPSIS
wc [-clw] [--bytes] [--chars] [--lines] [--words] [--help]
[--version] [file...]
DESCRIPTION
This manual page documents the GNU version of wc. wc
counts the number of bytes, whitespace-separated words,
...
```
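
To make the manual page's description concrete, here is a rough Python sketch of the three counts. It assumes the text is encoded as UTF-8 and is not a replacement for the real `wc`; `count_text` is a name invented for this example.

```python
def count_text(text):
    """Return (lines, words, bytes) for text, in the spirit of `wc`."""
    lines = text.count("\n")                 # wc counts newline characters
    words = len(text.split())                # whitespace-separated words
    n_bytes = len(text.encode("utf-8"))      # assumes UTF-8 encoding
    return lines, words, n_bytes

sample = "one two\nthree\n"
print(count_text(sample))
```

Because lines are counted by their newline characters, text whose last line does not end in a newline reports one line fewer than you might expect.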
#### Finding Out What Commands are on Your Computer
The `apropos` command will search for commands matching a keyword or phrase. Here's an example that looks for commands related to 'column':
```
(~) 100% apropos column
showtable (1) - Show data in nicely formatted columns
colrm (1) - remove columns from a file
column (1) - columnate lists
fix132x43 (1) - fix problems with certain (132 column) graphics
modes
```
#### Arguments and Command Line Switches
Many commands take arguments. Arguments are often the names of one or more files to operate on. Most commands also take command-line "switches" or "options", which fine-tune what the command does. Some commands recognize "short switches" that consist of a minus sign `-` followed by a single character, while others recognize "long switches" consisting of two minus signs `--` followed by a whole word.
The `wc` (word count) program is an example of a command that recognizes both long and short options. You can pass it the `-c`, `-w` and/or `-l` options to count the characters, words and lines in a text file, respectively. Or you can use the longer but more readable `--chars`, `--words` or `--lines` options. Both these examples count the number of characters and lines in the text file `/var/log/messages`:
```
(~) 102% wc -c -l /var/log/messages
23 941 /var/log/messages
(~) 103% wc --chars --lines /var/log/messages
23 941 /var/log/messages
```
You can cluster short switches by concatenating them together, as shown in this example:
```
(~) 104% wc -cl /var/log/messages
23 941 /var/log/messages
```
Many commands will give a brief usage summary when you call them with the `-h` or `--help` switch.
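
Python's standard `argparse` module follows the same conventions, so your own scripts can accept short, long and clustered switches. A minimal sketch, assuming a made-up program name `wc_sketch` that only parses its switches and never opens the file:

```python
import argparse

def build_parser():
    """Build a parser with wc-like switches (a sketch, not the real wc)."""
    parser = argparse.ArgumentParser(prog="wc_sketch")
    parser.add_argument("-c", "--chars", action="store_true",
                        help="count characters")
    parser.add_argument("-w", "--words", action="store_true",
                        help="count words")
    parser.add_argument("-l", "--lines", action="store_true",
                        help="count lines")
    parser.add_argument("files", nargs="*", help="files to read")
    return parser

parser = build_parser()
separate = parser.parse_args(["-c", "-l", "/var/log/messages"])
long_form = parser.parse_args(["--chars", "--lines", "/var/log/messages"])
clustered = parser.parse_args(["-cl", "/var/log/messages"])
print(separate)
print(separate == long_form == clustered)   # all three spellings agree
```

`argparse` also adds `-h` and `--help` automatically, which print a usage summary built from the `help` strings.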
#### Spaces and Funny Characters
The shell uses whitespace (spaces, tabs and other non-printing characters) to separate arguments. If you want to embed whitespace in an argument, put single quotes around it. For example:
```
mail -s 'An important message' 'Bob Ghost <[email protected]>'
```
This will send an e-mail to the fictitious person Bob Ghost. The `-s` switch takes an argument, which is the subject line for the e-mail. Because the desired subject contains spaces, it has to have quotes around it. Likewise, the recipient's name and e-mail address, which contain embedded spaces, must also be quoted in this way.
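
You can see how quoting changes the way a command line is split into arguments by using Python's standard `shlex` module, which follows similar rules to the shell. In this sketch the e-mail address is a made-up placeholder, and no mail is sent.

```python
import shlex

# The address below is a made-up placeholder for this example.
command = "mail -s 'An important message' 'Bob Ghost <bob@example.org>'"
parsed = shlex.split(command)
print(parsed)          # four arguments; the quoted ones keep their spaces

# Without the quotes, every space starts a new argument.
unquoted = shlex.split("mail -s An important message")
print(unquoted)

# shlex.quote() adds quotes only when an argument needs them.
print(shlex.quote("An important message"))
print(shlex.quote("messages"))
```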
Certain special non-printing characters have _escape codes_ associated with them:
| Escape Code | Description |
| ----------- | ---------------------------------------- |
| \\n | new line character |
| \\t | tab character |
| \\r | carriage return character |
| \\a | bell character (ding! ding!) |
| \\nnn | the character whose ASCII code is **nnn** |
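
Python strings understand the same escape codes, which makes it easy to check what each one stands for. A small sketch follows; note that in Python the digits in `\nnn` are read as an octal (base 8) number.

```python
# Each escape code stands for a single character, shown here by its
# numeric ASCII code.
pairs = [("\\n", "\n"), ("\\t", "\t"), ("\\r", "\r"), ("\\a", "\a")]
codes = {name: ord(char) for name, char in pairs}
for name, code in codes.items():
    print(name, "->", code)

# In Python the nnn in \nnn is octal: 101 octal is 65 decimal, the letter A.
letter = "\101"
print(letter)

# A raw string leaves the backslash alone, so r"\n" is two characters.
lengths = (len("\n"), len(r"\n"))
print(lengths)
```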
#### Useful Commands
Here are some commands that are used extremely frequently. Use `man` to learn more about them. Some of these commands may be useful for solving the problem set ;-)
#### Manipulating Directories
| Command | Description |
| ------- | ---------------------------------------- |
| `ls` | Directory listing. Most frequently used as `ls -F` (decorated listing), `ls -l` (long listing), `ls -a` (list all files). |
| `mv` | Rename or move a file or directory. |
| `cp` | Copy a file. |
| `rm` | Remove (delete) a file. |
| `mkdir` | Make a directory |
| `rmdir` | Remove a directory |
| `ln` | Create a symbolic or hard link. |
| `chmod` | Change the permissions of a file or directory. |
| Command | Description |
| ----------------- | ---------------------------------------- |
| `cat` | Concatenate program. Can be used to concatenate multiple files together into a single file, or, much more frequently, to view the contents of a file or files in the terminal. |
| `echo` | Print a copy of some text to the screen. E.g. `echo 'Hello World!'` |
| `more` | Scroll through a file page by page. Very useful when viewing large files. Works even with files that are too big to be opened by a text editor. |
| `less` | A version of `more` with more features. |
| `head` | View the first few lines of a file. You can control how many lines to view. |
| `tail` | View the end of a file. You can control how many lines to view. You can also use `tail -f` to view a file that you are writing to. |
| `wc` | Count words, lines and/or characters in one or more files. |
| `tr` | Substitute one character for another. Also useful for deleting characters. |
| `sort` | Sort the lines in a file alphabetically or numerically. |
| `uniq` | Remove duplicated lines in a file. |
| `cut` | Remove columns from each line of a file or files. |
| `fold` | Wrap each input line to fit in a specified width. |
| `grep` | Filter a file for lines matching a specified pattern. Can also be reversed to print out lines that don't match the specified pattern. |
| `gzip` (`gunzip`) | Compress (uncompress) a file. |
| `tar` | Archive or unarchive an entire directory into a single file. |
| `emacs` | Run the Emacs text editor (good for experts). |
| `vi` | Run the vi text editor (better for experts). |
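
A minimal sketch of a few of these filters, run on a small made-up table. The file `genes.txt` and its three tab-separated columns (gene name, chromosome, length) are assumptions for illustration only.

```shell
cd "$(mktemp -d)"
# Three columns separated by tabs: gene name, chromosome, length.
printf 'geneB\tchr2\t300\ngeneA\tchr1\t1200\ngeneC\tchr1\t45\n' > genes.txt

head -n 2 genes.txt          # first two lines
tail -n 1 genes.txt          # last line
cut -f 1,3 genes.txt         # keep only columns 1 and 3
sort -k 3,3n genes.txt       # sort numerically on column 3
tr 'a-z' 'A-Z' < genes.txt   # upper-case every letter
grep -v 'chr1' genes.txt     # lines that do NOT mention chr1

# Capture two results so they can be inspected or checked.
shortest=$(sort -k 3,3n genes.txt | head -n 1 | cut -f 1)
n_chrom=$(cut -f 2 genes.txt | sort | uniq | wc -l | tr -d ' ')
echo "shortest gene: $shortest"
echo "distinct chromosomes: $n_chrom"
```

The last two commands chain several filters together with `|`, which is covered in the pipes material later in these notes.
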
#### Networking
| Command | Description |
| ---------------------- | ---------------------------------------- |
| `ssh` | A secure (encrypted) way to log into machines. |
| `scp` | A secure way to copy (cp) files to and from remote machines. |
| `ping` | See if a remote host is up. |
| `ftp`/ `sftp` (secure) | Transfer files using the File Transfer Protocol. |
#### Standard I/O and Redirection
Unix commands communicate via the command-line interface. They can print information out to the terminal for you to see, and accept input from the keyboard (that is, from _you_!)
Every Unix program starts out with three connections to the outside world. These connections are called "streams", because they act like a stream of information (metaphorically speaking):
| Stream Type | Description |
| --------------- | ---------------------------------------- |
| standard input | This is a communications stream initially attached to the keyboard. When the program reads from standard input, it reads whatever text you type in. |
| standard output | This stream is initially attached to the terminal. Anything the program prints to this channel appears in your terminal window. |
| standard error | This stream is also initially attached to the terminal. It is a separate channel intended for printing error messages. |
The word "initially" might lead you to think that standard input, output and error can somehow be detached from their starting places and reattached somewhere else. And you'd be right. You can attach
one or more of these three streams to a file, a device, or even to another program. This sounds esoteric, but it is actually very useful.
#### A Simple Example
The `wc` program counts lines, words and characters in data sent to its standard input. You can use it interactively like this:
```
(~) 62% wc
Mary had a little lamb,
little lamb,
little lamb.

Mary had a little lamb,
whose fleece was white as snow.
^D
6 20 107
```
In this example, I ran the `wc` program. It waited for me to type in a little poem. When I was done, I typed the END-OF-FILE character, control-d (`^D` for short). `wc` then printed out three numbers indicating the number of lines, words and characters in the input.
More often, you'll want to count the number of lines in a big file; say a file filled with DNA sequences. You can do this by _redirecting_ the contents of a file to the standard input of `wc`. This uses
the `<` symbol:
```
(~) 63% wc < big_file.fasta
2943 2998 419272
```
If you wanted to record these counts for posterity, you could redirect standard output as well using the `>` symbol:
```
(~) 64% wc < big_file.fasta > count.txt
```
Now if you `cat` the file _count.txt_, you'll see that the data has been recorded. `cat` works by taking its standard input and copying it to standard output. We redirect standard input from the _count.txt_ file, and leave standard output at its default, attached to the terminal:
```
(~) 65% cat < count.txt
2943 2998 419272
```
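To try the same steps without a real sequence file, here is a small reproducible sketch. The file `tiny.fasta` is a made-up stand-in for `big_file.fasta`.

```shell
cd "$(mktemp -d)"
# A tiny stand-in for big_file.fasta: two records, four lines in total.
printf '>seq1\nacgtgatttgcaa\n>seq2\nttgacc\n' > tiny.fasta

wc < tiny.fasta               # counts appear in the terminal
wc < tiny.fasta > count.txt   # same counts, saved to a file instead
cat < count.txt               # show what was recorded

# wc -l prints only the line count; tr strips the padding some systems add.
lines=$(wc -l < tiny.fasta | tr -d ' ')
echo "$lines"
```

Because `wc` reads from standard input here, it never learns the file name, so the output contains only the numbers.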
#### Redirection Meta-Characters
Here are the most commonly used redirection operators in `bash`:
| Redirect command | Description |
| ------------------- | ---------------------------------------- |
| `< myfile.txt` | Redirect the contents of the file to standard input |
| `> myfile.txt` | Redirect standard output to file |
| `>> logfile.txt` | Append standard output to the end of the file |
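| `1> myfile.txt` | Redirect just standard output to file (same as `> myfile.txt`). There must be no space between the `1` and the `>` |
| `2> myfile.txt` | Redirect just standard error to file. There must be no space between the `2` and the `>` |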
| `> myfile.txt 2>&1` | Redirect both stdout and stderr to file |
These can be combined. For example, this command redirects standard input from the file named `/etc/passwd`, writes its results into the file `search.out`, and writes its error messages (if any) into a file named `search.err`. What does it do? It searches the password file for a user named "root" and returns all lines that refer to that user.
```
(~) 66% grep root < /etc/passwd > search.out 2> search.err
```
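The difference between the two output streams is easier to see with a command that writes to both. The sketch below uses `report`, a hypothetical shell function invented for this illustration, and the special file `/dev/null`, which discards anything written to it.

```shell
cd "$(mktemp -d)"
# report is a made-up function that writes one line to each output stream.
report() {
  echo "result line"         # goes to standard output
  echo "warning line" >&2    # goes to standard error
}

report > out.txt 2> err.txt     # split the two streams into separate files
report >> out.txt 2> /dev/null  # append stdout again, throw stderr away
report > both.txt 2>&1          # send both streams to the same file

cat out.txt
cat err.txt
cat both.txt
```

After this runs, `out.txt` holds two result lines, `err.txt` holds one warning, and `both.txt` holds one of each.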
#### Filters, Filenames and Standard Input
Many Unix commands act as filters, taking data from a file or standard input, transforming the data, and writing the results to standard output. Most filters are designed so that if they are called with one or more filenames on the command-line, they will use those files as input. Otherwise they will act on standard input. For example, these two commands are equivalent:
```
(~) 66% grep 'gatttgc' < big_file.fasta
(~) 67% grep 'gatttgc' big_file.fasta
```
Both commands use the `grep` command to search for the string "gatttgc" in the file `big_file.fasta`. The first one searches standard input, which happens to be redirected from the file. The second command is explicitly given the name of the file on the command line.
Sometimes you want a filter to act on a series of files, one of which happens to be standard input. Many commands let you use `-` on the command-line as an alias for standard input. Example:
```
(~) 68% grep 'gatttgc' big_file.fasta bigger_file.fasta -
```
This example searches for "gatttgc" in three places. First it looks in file `big_file.fasta`, then in `bigger_file.fasta`, and lastly in standard input (which, since it isn't redirected, will come from the keyboard).
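Here is a non-interactive sketch of the same idea, with standard input redirected from a file instead of the keyboard. The three file names are made up, and the `-h` option (supported by the GNU and BSD versions of `grep`) is used only to keep file names out of the output.

```shell
cd "$(mktemp -d)"
printf '>seq1\nacgtgatttgcaa\n' > first.fasta
printf '>seq2\nttgacc\n' > second.fasta
printf 'ccgatttgctt\n' > typed.txt   # plays the role of keyboard input

# The third input, "-", means standard input, which is redirected from typed.txt.
matches=$(grep -h 'gatttgc' first.fasta second.fasta - < typed.txt)
echo "$matches"
```

Two lines should match: one from `first.fasta` and one from standard input.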
#### Standard I/O and Pipes
The coolest thing about the Unix shell is its ability to chain commands together into pipelines. Here's an example:
```
(~) 65% grep gatttgc big_file.fasta | wc -l
22
```
There are two commands here. `grep` searches a file or standard input for lines containing a particular string and prints the matching lines to standard output. `wc` is the familiar word count program, which counts lines, words and characters in a file or standard input; the `-l` command-line option instructs it to print out just the line count. The `|` character, which is known as a "pipe", connects the two commands together so that the standard output of `grep` becomes the standard input of `wc`. Think of a pipe as plumbing that carries a stream of data from one program to the next.
What does this pipe do? It prints out the number of lines in which the string "gatttgc" appears in the file `big_file.fasta`.
#### More Pipe Idioms
Pipes are very powerful. Here are some common command-line idioms.
**Count the Number of Times a Pattern does NOT Appear in a File**
The pipe example above showed you how to count the number of lines in which a particular string pattern appears in a file. What if you want to count the number of lines in which a pattern does **not** appear?
Simple. Reverse the test with the `-v` switch:
```
(~) 65% grep -v gatttgc big_file.fasta | wc -l
2921
```
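A quick way to convince yourself that the two counts are complementary is to add them up. This sketch uses a made-up file, `tiny.fasta`, whose last sequence line contains the pattern twice.

```shell
cd "$(mktemp -d)"
printf '>seq1\nacgtgatttgcaa\n>seq2\nttgacc\n>seq3\ngatttgcgatttgc\n' > tiny.fasta

with=$(grep gatttgc tiny.fasta | wc -l)         # lines containing the pattern
without=$(grep -v gatttgc tiny.fasta | wc -l)   # lines that do not
total=$(wc -l < tiny.fasta)                     # every line in the file

echo "with=$((with)) without=$((without)) total=$((total))"
```

Note that `grep` counts lines, not occurrences: the final line contains "gatttgc" twice but contributes only one to the count.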
**Uniquify Lines in a File**
If you have a long list of names in a text file, and you want to weed out the duplicates:
```
(~) 66% sort long_file.txt | uniq > unique.out
```
This works by sorting all the lines alphabetically and piping the result to the `uniq` program, which removes duplicate lines that occur one after another. That's why you need to sort first. The output is placed in a file named `unique.out`.
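The sketch below shows what happens when the sort is skipped. The file `names.txt` and its contents are made up for illustration.

```shell
cd "$(mktemp -d)"
printf 'smith\njones\nsmith\nlee\njones\n' > names.txt

uniq names.txt > unsorted.out        # no duplicates are adjacent, so nothing is removed
sort names.txt | uniq > unique.out   # sorting puts duplicates side by side first

cat unique.out
```

Here `unsorted.out` still has all five lines, while `unique.out` has three. The POSIX `sort` command also accepts `-u`, which sorts and removes duplicates in one step.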
**Concatenate Several Lists and Remove Duplicates**
If you have several lists that might contain repeated entries among them, you can combine them into a single unique list by concatenating them together, then sorting and uniquifying them as before:
```
(~) 67% cat file1 file2 file3 file4 | sort | uniq
```
**Count Unique Lines in a File**
If you just want to know how many unique lines there are in the file, add a `wc` to the end of the pipe:
```
(~) 68% sort long_file.txt | uniq | wc -l
```
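The last two idioms can be combined. In this sketch, `file1`, `file2` and `file3` are small made-up lists with some names in common.

```shell
cd "$(mktemp -d)"
printf 'smith\njones\n' > file1
printf 'jones\nlee\n' > file2
printf 'lee\nsmith\nnguyen\n' > file3

cat file1 file2 file3 | sort | uniq   # the merged list without duplicates

# Same pipeline with wc -l on the end; tr strips padding that some systems add.
n_unique=$(cat file1 file2 file3 | sort | uniq | wc -l | tr -d ' ')
echo "$n_unique unique names"
```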
**Page Through a Really Long Directory Listing**
Pipe the output of `ls` to the `more` program, which shows a page at a time. If you have it, the `less` program is even better:
```
(~) 69% ls -l | more
```
**Monitor a Growing File for a Pattern**
Pipe the output of `tail -f` (which monitors a growing file and prints out the new lines) to `grep`. For example, this will monitor the `/var/log/syslog` file for the appearance of e-mails addressed to 'mzhang':
```
(~) 70% tail -f /var/log/syslog | grep mzhang
```
### Advanced Unix
Here are a few more advanced Unix commands that are very useful. Investigate them further when you have time. The page numbers refer to the Internet Version (v3) of 'The Linux Command Line' by William Shotts.
- `awk`
- `sed` (p.295)
- `perl` one-liners
- `for` loops (p. 453)
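
To give a taste of three of these tools, here is a small sketch. The file `lengths.txt` and its two space-separated columns are made up for illustration, and each command shows only the simplest possible use.

```shell
cd "$(mktemp -d)"
printf 'geneA 1200\ngeneB 300\n' > lengths.txt

# awk: print the first column of every line whose second column is over 500.
long_genes=$(awk '$2 > 500 { print $1 }' lengths.txt)

# sed: replace the first "gene" on each line with "locus".
renamed=$(sed 's/gene/locus/' lengths.txt)

# for loop: repeat a command once for each item in a list.
for name in geneA geneB; do
  echo "processing $name"
done

echo "$long_genes"
echo "$renamed"
```
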
---
### [Link to Unix 1 Problem Set](https://github.com/srobb1/pfb2017/blob/master/problemsets/problemsets/Unix_01_problemset.md)
---
# Unix 2
### Text Editors
It is often necessary to create and write to a file while using the terminal. This makes it essential to use a terminal text editor. There are many text editors out there. Some of our favorites are Emacs and vim. We are going to start you out with a simple text editor called `nano`.
The way you use nano to create a file is simply by typing the command _nano_ followed by the name of the file you wish to create.
```
(~) 71% nano firstFile.txt
```
This is what you will see:
![Create a new file with nano.](https://raw.githubusercontent.com/srobb1/pfb2017/master/images/nano_new.png)
![Modified and not saved. In the top right corner it says "Modified"](https://raw.githubusercontent.com/srobb1/pfb2017/master/images/nano-modifided.png)
Things to notice:
- At the top
  - the name of the program (nano) and its version number
  - the name of the file you’re editing
  - whether the file has been modified since it was last saved
- In the middle
  - either a blank area or the text you have typed
- At the bottom
  - a listing of keyboard commands such as Save (control + o) and Exit (control + x)
Keyboard commands are the only way to interact with the editor. You cannot use your mouse or trackpad.
Find more commands by using `control g`:
![The help menu displays a listing of useful commands.](https://raw.githubusercontent.com/srobb1/pfb2017/master/images/nano-help.png)
The Meta key is \<esc\>. To use a Meta+key command, hit \<esc\>, release it, then hit the following key.
Helpful commands:
- Jump to a specific line:
  - control + _ then line number
- Copy a block of highlighted text:
  - control + ^ then move your cursor to highlight the block you want to copy
  - Meta + ^ to end your highlight block
- Paste:
  - control + u
Nano is a beginner's text editor. vi and Emacs are better choices once you become a bit more comfortable using the terminal. These editors do cool stuff like syntax highlighting.
## Git for Beginners
Git is a tool for managing files and versions of files. It is a _Version Control System_. It allows you to keep track of changes. You are going to be using Git to manage your course work and keep your copy of the lecture notes and files up to date. Git can help you do very complex tasks with files. We are going to keep it simple.
### The Big Picture
A Version Control System is good for Collaborations, Storing Versions, Restoring Previous Versions, and Managing Backups.
#### Collaboration
Using a Version Control System makes it possible to edit a document with others without the fear of overwriting someone else's changes, even if more than one person is working on the same part of the document. All the changes can be merged into one document. These documents are all stored in one place.
#### Storing Versions
A Version Control System allows you to save versions of your files and to attach notes to each version. Each save will contain information about the lines that were added or altered.
#### Restoring Previous Versions
Since you are keeping track of versions, it is possible to revert all the files in a project or just one file to a previous version.
#### Backup
A Version Control System lets you work locally and sync your work remotely. This means you will have one copy of your project on your computer and another on the Version Control System Server you are using.
#### The Details
Git is the Version Control System we will be using for tracking changes in our files.
[GitHub](https://github.com/) is the Version Control System Server we will be using. It provides free accounts for all public projects.
### The Basics
#### Creating a new repository
A repository is a project that contains all of the project files, and stores each file's revision history. Repositories can have multiple collaborators. Repositories usually have two components, one remote and one local.
Let's Do It!
Follow Steps 1 and 2 to create the remote repository. Follow Step 3 to create your local repository and link it to the remote.
1. Navigate to GitHub --> Create Account / Log In --> Go To Repositories --> Click 'New'
![To create a new repository click the 'New' Button in the top right corner.](https://raw.githubusercontent.com/srobb1/pfb2017/master/images/github-newRepoButton.png)
2. Add a name (e.g., PFB2017_problemsets) and a description (e.g., Solutions for PFB2017 Problem Sets) and click "Create Repository"
![Fill in the form and click the 'Create Repository Button'](https://raw.githubusercontent.com/srobb1/pfb2017/master/images/github-newRepoForm.png)
3. Create a directory on your computer and follow the instructions provided.
![Create a directory on your computer and follow these instructions.](https://raw.githubusercontent.com/srobb1/pfb2017/master/images/github-newRepoInstructions.png)
   - Open your terminal and navigate to the location where you want to put a directory for your problem sets.
   - Create a new directory (e.g., PFB2017_problemsets).
   - Follow the instructions provided when you created your repository. These are my instructions; yours will be different.
```
echo "# PFB2017_problemsets" >> README.md
git init
git add README.md
git commit -m "first commit"
git remote add origin https://github.com/srobb1/PFB2017_problemsets.git
git push -u origin master
```
You now have a repository!
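
If you want to rehearse these steps before touching GitHub, the sketch below runs the same sequence against a local stand-in. Everything outside the original instructions is an assumption made for practice: the scratch directory, the bare repository `remote.git` that plays the role of the GitHub URL, the placeholder identity passed with `-c`, and the `branch` variable (used because the default branch may be called `master` or `main` depending on your Git configuration).

```shell
# Scratch area; remote.git is a local stand-in for the GitHub repository.
workdir=$(mktemp -d)
git init --quiet --bare "$workdir/remote.git"

mkdir "$workdir/PFB2017_problemsets"
cd "$workdir/PFB2017_problemsets"

echo "# PFB2017_problemsets" >> README.md
git init --quiet
git add README.md
# -c supplies a placeholder identity for this one command only.
git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit --quiet -m "first commit"
git remote add origin "$workdir/remote.git"

# Push whichever branch git created (master or main).
branch=$(git rev-parse --abbrev-ref HEAD)
git push --quiet -u origin "$branch"

# Count the commits that arrived in the stand-in remote.
pushed=$(git -C "$workdir/remote.git" rev-list --count "$branch")
echo "$pushed commit(s) on $branch in the remote"
```

With a real GitHub repository, the URL you were given replaces the local path and the rest of the sequence stays the same.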
Let's back up a bit and talk more about git and about these commands. For basic git use, these are almost all the commands you will need to know.
Every git repository has three main elements called _trees_:
1. _The Working Directory_ contains your files
2. _The Index_ is the staging area
3. _The HEAD_ points to the last commit you made.
> There are a few new words here. We will explain them as we go.
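
One way to watch a file move through the three trees is to ask `git status --short` for a one-line summary after each step. This is only a sketch: the file `notes.txt`, the scratch directory and the placeholder identity are made up for illustration.

```shell
cd "$(mktemp -d)"
git init --quiet

echo "draft" > notes.txt
state_new=$(git status --short)         # file exists only in the working directory

git add notes.txt
state_staged=$(git status --short)      # the change is now recorded in the index

git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit --quiet -m "add notes"
state_committed=$(git status --short)   # HEAD now points at a commit containing the file

echo "new:       $state_new"
echo "staged:    $state_staged"
echo "committed: $state_committed"
```

In the short format, `??` marks a file git is not tracking yet, `A` marks a file added to the index, and an empty result means the working directory, the index and HEAD all agree.
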
| command | description |
| --------------------------------------- | ---------------------------------------- |
| `git init` | Creates your new local repository with the three trees (local machine) |
| `git remote add remote-name URL` | Links your local repository to a remote repository that is often named _origin_ and is found at the given URL. |
| `git add filename` | Propose changes and add file(s) with changes to the index or staging area (local machine) |