
Commit c751d35

various fixes so examples run
1 parent ba283bd commit c751d35

File tree: 5 files changed (+72, -57 lines)


intro.html (+28, -22)
@@ -90,6 +90,8 @@ <h1 id="outline">Outline</h1>
 <li>Basic use of standard software: Python and R
 <ul>
 <li>Jupyter notebooks</li>
+<li>Parallelization in Python with ipyparallel</li>
+<li>Parallelization in R with foreach</li>
 <li>Dask for parallelization in Python</li>
 </ul></li>
 <li>More information
@@ -209,7 +211,7 @@ <h1 id="software-modules">Software modules</h1>
 <pre><code>module list # what&#39;s loaded?
 module avail # what&#39;s available</code></pre>
 <p>One thing that tricks people is that the modules are arranged in a hierarchical (nested) fashion, so you only see some of the modules as being available <em>after</em> you load the parent module (e.g., MKL, FFT, and HDF5/NetCDF software is nested within the gcc module). Here's how we see and load MPI.</p>
-<pre><code>module load openmpi
+<pre><code>module load openmpi # this fails if gcc not yet loaded
 module load gcc
 module avail
 module load openmpi</code></pre>
@@ -221,19 +223,22 @@ <h1 id="submitting-jobs-accounts-and-partitions">Submitting jobs: accounts and p
 <pre><code>sacctmgr -p show associations user=SAVIO_USERNAME</code></pre>
 <p>Here's an example of the output for a user who has access to an FCA, a condo, and a special partner account:</p>
 <pre><code>Cluster|Account|User|Partition|Share|GrpJobs|GrpTRES|GrpSubmit|GrpWall|GrpTRESMins|MaxJobs|MaxTRES|MaxTRESPerNode|MaxSubmit|MaxWall|MaxTRESMins|QOS|Def QOS|GrpTRESRunMins|
-brc|co_stat|paciorek|savio2_gpu|1||||||||||||savio_lowprio|savio_lowprio||
+brc|co_stat|paciorek|savio2_1080ti|1||||||||||||savio_lowprio|savio_lowprio||
+brc|co_stat|paciorek|savio2_knl|1||||||||||||savio_lowprio|savio_lowprio||
+brc|co_stat|paciorek|savio2_bigmem|1||||||||||||savio_lowprio|savio_lowprio||
+brc|co_stat|paciorek|savio2_gpu|1||||||||||||savio_lowprio,stat_gpu2_normal|stat_gpu2_normal||
 brc|co_stat|paciorek|savio2_htc|1||||||||||||savio_lowprio|savio_lowprio||
 brc|co_stat|paciorek|savio|1||||||||||||savio_lowprio|savio_lowprio||
 brc|co_stat|paciorek|savio_bigmem|1||||||||||||savio_lowprio|savio_lowprio||
-brc|co_stat|paciorek|savio2|1||||||||||||savio_lowprio,stat_normal|stat_normal||
+brc|co_stat|paciorek|savio2|1||||||||||||savio_lowprio,stat_savio2_normal|stat_savio2_normal||
+brc|fc_paciorek|paciorek|savio2_1080ti|1||||||||||||savio_debug,savio_normal|savio_normal||
+brc|fc_paciorek|paciorek|savio2_knl|1||||||||||||savio_debug,savio_normal|savio_normal||
+brc|fc_paciorek|paciorek|savio2_gpu|1||||||||||||savio_debug,savio_normal|savio_normal||
+brc|fc_paciorek|paciorek|savio2_htc|1||||||||||||savio_debug,savio_long,savio_normal|savio_normal||
+brc|fc_paciorek|paciorek|savio2_bigmem|1||||||||||||savio_debug,savio_normal|savio_normal||
 brc|fc_paciorek|paciorek|savio2|1||||||||||||savio_debug,savio_normal|savio_normal||
 brc|fc_paciorek|paciorek|savio|1||||||||||||savio_debug,savio_normal|savio_normal||
-brc|fc_paciorek|paciorek|savio_bigmem|1||||||||||||savio_debug,savio_normal|savio_normal||
-brc|ac_scsguest|paciorek|savio2_htc|1||||||||||||savio_debug,savio_normal|savio_normal||
-brc|ac_scsguest|paciorek|savio2_gpu|1||||||||||||savio_debug,savio_normal|savio_normal||
-brc|ac_scsguest|paciorek|savio2|1||||||||||||savio_debug,savio_normal|savio_normal||
-brc|ac_scsguest|paciorek|savio_bigmem|1||||||||||||savio_debug,savio_normal|savio_normal||
-brc|ac_scsguest|paciorek|savio|1||||||||||||savio_debug,savio_normal|savio_normal||</code></pre>
+brc|fc_paciorek|paciorek|savio_bigmem|1||||||||||||savio_debug,savio_normal|savio_normal||</code></pre>
 <p>If you are part of a condo, you'll notice that you have <em>low-priority</em> access to certain partitions. For example I am part of the statistics condo <em>co_stat</em>, which owns some Savio2 nodes and Savio2_gpu and therefore I have normal access to those, but I can also burst beyond the condo and use other partitions at low-priority (see below).</p>
 <p>In contrast, through my FCA, I have access to the savio, savio2, and big memory partitions.</p>
 <h1 id="submitting-a-batch-job">Submitting a batch job</h1>
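As an aside, the pipe-delimited format produced by `sacctmgr -p` is easy to post-process. A minimal sketch (the header and sample row are taken from the output above; the parsing code is ours, not part of the commit):

```python
# Parse one row of `sacctmgr -p show associations` output into a dict,
# so we can look up which QoS a user may submit under for a partition.
header = ("Cluster|Account|User|Partition|Share|GrpJobs|GrpTRES|GrpSubmit|"
          "GrpWall|GrpTRESMins|MaxJobs|MaxTRES|MaxTRESPerNode|MaxSubmit|"
          "MaxWall|MaxTRESMins|QOS|Def QOS|GrpTRESRunMins|")
row = "brc|fc_paciorek|paciorek|savio2|1||||||||||||savio_debug,savio_normal|savio_normal||"

fields = header.rstrip("|").split("|")   # drop the trailing delimiter
assoc = dict(zip(fields, row.split("|")))

print(assoc["Partition"], assoc["QOS"], assoc["Def QOS"])
```

Here the QOS column is itself comma-separated, so a further `assoc["QOS"].split(",")` gives the individual QoS choices.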
@@ -265,16 +270,16 @@ <h1 id="submitting-a-batch-job">Submitting a batch job</h1>
 <pre><code>sacct -j &lt;JOB_ID&gt; --format=JobID,JobName,MaxRSS,Elapsed</code></pre>
 <p>MaxRSS will show the maximum amount of memory that the job used in kilobytes.</p>
 <p>You can also login to the node where you are running and use commands like <em>top</em> and <em>ps</em>:</p>
-<p>``` srun --jobid=<JOB_ID> --pty /bin/bash</p>
-<p>Note that except for the <em>savio2_htc</em> and <em>savio2_gpu</em> partitions, all jobs are given exclusive access to the entire node or nodes assigned to the job (and your account is charged for all of the cores on the node(s).</p>
+<pre><code>srun --jobid=&lt;JOB_ID&gt; --pty /bin/bash</code></pre>
+<p>Note that except for the <em>savio2_htc</em> and <em>savio2_gpu</em> partitions, all jobs are given exclusive access to the entire node or nodes assigned to the job (and your account is charged for all of the cores on the node(s)).</p>
 <h1 id="parallel-job-submission">Parallel job submission</h1>
 <p>If you are submitting a job that uses multiple nodes, you'll need to carefully specify the resources you need. The key flags for use in your job script are:</p>
 <ul>
 <li><code>--nodes</code> (or <code>-N</code>): indicates the number of nodes to use</li>
 <li><code>--ntasks-per-node</code>: indicates the number of tasks (i.e., processes) one wants to run on each node</li>
 <li><code>--cpus-per-task</code> (or <code>-c</code>): indicates the number of cpus to be used for each task</li>
 </ul>
-<p>In addition, in some cases it can make sense to use the <code>--ntasks</code> (or <code>-n</code>) option to indicate the total number of tasks and let the scheduler determine how many nodes and tasks per node are needed. In general <code>--cpus-per-task</code> will be 1 except when running threaded code.</p>
+<p>In addition, in some cases it can make sense to use the <code>--ntasks</code> (or <code>-n</code>) option to indicate the total number of tasks and let the scheduler determine how many nodes and tasks per node are needed. In general <code>--cpus-per-task</code> will be one except when running threaded code.</p>
 <p>Here's an example job script for a job that uses MPI for parallelizing over multiple nodes:</p>
 <div class="sourceCode"><pre class="sourceCode bash"><code class="sourceCode bash"><span class="co">#!/bin/bash</span>
 <span class="co"># Job name:</span>
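The arithmetic behind those three flags is worth spelling out: the total core count a job requests is nodes × ntasks-per-node × cpus-per-task. A minimal sketch (the helper is ours, not Slurm's; the 24-core limit assumes savio2-style nodes, consistent with the `--ntasks-per-node=24` used in the examples):

```python
def total_cores(nodes, ntasks_per_node, cpus_per_task=1):
    """Cores requested via --nodes/--ntasks-per-node/--cpus-per-task.

    Hypothetical helper for illustration; Slurm does this accounting itself.
    """
    per_node = ntasks_per_node * cpus_per_task
    if per_node > 24:  # assumed per-node core count for a savio2 node
        raise ValueError(f"{per_node} cores per node exceeds the assumed 24")
    return nodes * per_node

# e.g., 2 nodes with 24 single-cpu MPI tasks each:
print(total_cores(2, 24))  # 48
```

The check also shows why threaded code is the exception: with `--cpus-per-task` above one, fewer tasks fit on each node.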
@@ -340,7 +345,7 @@ <h1 id="alternatives-to-the-htc-partition-for-collections-of-serial-jobs">Altern
 <li>using <a href="https://github.com/berkeley-scf/tutorial-parallel-basics">single-node parallelism</a> and <a href="https://github.com/berkeley-scf/tutorial-parallel-distributed">multiple-node parallelism</a> in Python, R, and MATLAB
 <ul>
 <li>parallel R tools such as <em>foreach</em>, <em>parLapply</em>, and <em>mclapply</em></li>
-<li>parallel Python tools such as <em>IPython parallel</em>, and <em>Dask</em></li>
+<li>parallel Python tools such as <em>ipyparallel</em> and <em>Dask</em></li>
 <li>parallel functionality in MATLAB through <em>parfor</em></li>
 </ul></li>
 </ul>
@@ -356,7 +361,7 @@ <h1 id="monitoring-jobs-and-the-job-queue">Monitoring jobs and the job queue</h1
 <pre><code>scancel YOUR_JOB_ID</code></pre>
 <p>For more information on cores, QoS, and additional (e.g., GPU) resources, here's some syntax:</p>
 <pre><code>squeue -o &quot;%.7i %.12P %.20j %.8u %.2t %.9M %.5C %.8r %.3D %.20R %.8p %.20q %b&quot; </code></pre>
-<p>We provide some <a href="http://research-it.berkeley.edu/services/high-performance-computing/tips-using-brc-savio-cluster">tips about monitoring your job</a>.</p>
+<p>We provide some <a href="http://research-it.berkeley.edu/services/high-performance-computing/running-your-jobs">tips about monitoring your jobs</a>. (Scroll down to the &quot;Monitoring jobs&quot; section.)</p>
 <h1 id="example-use-of-standard-software-ipython-and-r-notebooks-through-jupyterhub">Example use of standard software: IPython and R notebooks through JupyterHub</h1>
 <p>Savio allows one to <a href="http://research-it.berkeley.edu/services/high-performance-computing/using-jupyter-notebooks-and-jupyterhub-savio">run Jupyter-based notebooks via a browser-based service called Jupyterhub</a>.</p>
 <p>Let's see a brief demo of an IPython notebook:</p>
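Each `%`-token in that `squeue -o` string selects a column with an optional `.width` (e.g., `%i` is the job id, `%P` the partition, per the squeue man page). A small decoder sketch (`describe` is a hypothetical helper, and only a handful of codes are mapped here):

```python
import re

# A few squeue format codes; see `man squeue` for the full list.
CODES = {"i": "JOBID", "P": "PARTITION", "j": "NAME", "u": "USER",
         "t": "STATE", "M": "TIME", "C": "CPUS", "D": "NODES", "b": "GRES"}

def describe(fmt):
    """Return (column, width) for each %[.width]<code> token in an squeue -o string."""
    return [(CODES.get(code, "?" + code), int(width) if width else None)
            for width, code in re.findall(r"%\.?(\d*)(\w)", fmt)]

print(describe("%.7i %.12P %.5C %b"))
```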
@@ -370,7 +375,7 @@ <h1 id="example-use-of-standard-software-ipython-and-r-notebooks-through-jupyter
 <h1 id="example-use-of-standard-software-python">Example use of standard software: Python</h1>
 <p>Let's see a basic example of doing an analysis in Python across multiple cores on multiple nodes. We'll use the airline departure data in <em>bayArea.csv</em>.</p>
 <p>Here we'll use <em>IPython</em> for parallel computing. The example is a bit contrived in that a lot of the time is spent moving data around rather than doing computation, but it should illustrate how to do a few things.</p>
-<p>First we'll install a Python package not already available as a module.</p>
+<p>First we'll install a Python package (pretending it is not already available via the basic python/3.6 module).</p>
 <pre><code>cp bayArea.csv /global/scratch/paciorek/. # remember to do I/O off scratch
 # install Python package
 module unload python
@@ -384,6 +389,7 @@ <h1 id="example-use-of-standard-software-python">Example use of standard softwar
 sleep 10
 srun ipengine &amp;
 sleep 20 # wait until all engines have successfully started
+cd /global/scratch/paciorek
 ipython</code></pre>
 <p>If we were doing this on a single node, we could start everything up in a single call to <em>ipcluster</em>:</p>
 <pre><code>module load python/3.6
@@ -402,7 +408,7 @@ <h1 id="example-use-of-standard-software-python">Example use of standard softwar
 lview.block = True

 import pandas
-dat = pandas.read_csv(&#39;bayArea.csv&#39;, header = None)
+dat = pandas.read_csv(&#39;bayArea.csv&#39;, header = None, encoding = &#39;latin1&#39;)
 dat.columns = (&#39;Year&#39;,&#39;Month&#39;,&#39;DayofMonth&#39;,&#39;DayOfWeek&#39;,&#39;DepTime&#39;,
 &#39;CRSDepTime&#39;,&#39;ArrTime&#39;,&#39;CRSArrTime&#39;,&#39;UniqueCarrier&#39;,&#39;FlightNum&#39;,
 &#39;TailNum&#39;,&#39;ActualElapsedTime&#39;,&#39;CRSElapsedTime&#39;,&#39;AirTime&#39;,&#39;ArrDelay&#39;,
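The `encoding = 'latin1'` addition matters when a CSV contains bytes that are not valid UTF-8, which makes `read_csv` fail with its default decoder. A self-contained illustration with a toy headerless CSV (our own data, standing in for bayArea.csv):

```python
import io
import pandas

# 0xe9 ('é' in latin-1) is not valid UTF-8 on its own, so reading these
# bytes without encoding='latin1' would raise UnicodeDecodeError.
raw = "2008,3,22,SFO,San José\n2008,3,23,OAK,Berkeley\n".encode("latin1")

dat = pandas.read_csv(io.BytesIO(raw), header=None, encoding="latin1")
dat.columns = ("Year", "Month", "DayofMonth", "Origin", "City")
print(dat["City"][0])  # San José
```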
@@ -442,13 +448,13 @@ <h1 id="example-use-of-standard-software-r">Example use of standard software: R<
 <div class="sourceCode"><pre class="sourceCode bash"><code class="sourceCode bash"><span class="co"># remember to do I/O off scratch</span>
 <span class="kw">cp</span> bayArea.csv /global/scratch/paciorek/.

-<span class="kw">srun</span> -A co_stat -p savio2 --nodes=3 --ntasks-per-node=24 -t 30:0 --pty bash
-<span class="kw">module</span> load gcc openmpi r/3.4.2 r-packages
+<span class="kw">srun</span> -A co_stat -p savio2 --nodes=2 --ntasks-per-node=24 -t 30:0 --pty bash
+<span class="kw">module</span> load r/3.4.2 r-packages
 <span class="kw">mpirun</span> R CMD BATCH --no-save parallel-multi.R parallel-multi.Rout <span class="kw">&amp;</span></code></pre></div>
 <p>Now here's the R code (see <em>parallel-multi.R</em>) we're running:</p>
 <pre><code>library(doMPI)

-cl = startMPIcluster() # by default will start one fewer slave
+cl = startMPIcluster() # by default will start one fewer slave, using one for master
 registerDoMPI(cl)
 clusterSize(cl) # just to check

@@ -505,17 +511,17 @@ <h1 id="how-to-get-additional-help">How to get additional help</h1>
 <li>For questions about computing resources in general, including cloud computing:
 <ul>

-<li>office hours: Tues. 10:00 - 12:00, Wed. 1:30-3:30, Thur. 9:30-11:30 here in AIS</li>
+<li>office hours: Tues. 10:00-12:00, Wed. 1:30-3:30, Thur. 9:30-11:30 here in AIS</li>
 </ul></li>
 <li>For questions about data management (including HIPAA-protected data):
 <ul>

-<li>office hours: Tues. 10:00 - 12:00, Wed. 1:30-3:30, Thur. 9:30-11:30 here in AIS</li>
+<li>office hours: Tues. 10:00-12:00, Wed. 1:30-3:30, Thur. 9:30-11:30 here in AIS</li>
 </ul></li>
 </ul>
 <h1 id="upcoming-events">Upcoming events</h1>
 <ul>
-<li><a href="http://research-it.berkeley.edu/services/cloud-computing-support/cloud-working-group">Savio installation workshop</a>, October XX.</li>
+<li>Savio hands-on installation workshop, mid-late October or early November.</li>
 </ul>
 </body>
 </html>

intro.md (+14, -11)
@@ -329,8 +329,10 @@ You can also login to the node where you are running and use commands like *top*

 ```
 srun --jobid=<JOB_ID> --pty /bin/bash
+```
+
+Note that except for the *savio2_htc* and *savio2_gpu* partitions, all jobs are given exclusive access to the entire node or nodes assigned to the job (and your account is charged for all of the cores on the node(s)).

-Note that except for the *savio2_htc* and *savio2_gpu* partitions, all jobs are given exclusive access to the entire node or nodes assigned to the job (and your account is charged for all of the cores on the node(s).

 # Parallel job submission

@@ -340,7 +342,7 @@ If you are submitting a job that uses multiple nodes, you'll need to carefully s
 - `--ntasks-per-node`: indicates the number of tasks (i.e., processes) one wants to run on each node
 - `--cpus-per-task` (or `-c`): indicates the number of cpus to be used for each task

-In addition, in some cases it can make sense to use the `--ntasks` (or `-n`) option to indicate the total number of tasks and let the scheduler determine how many nodes and tasks per node are needed. In general `--cpus-per-task` will be 1 except when running threaded code.
+In addition, in some cases it can make sense to use the `--ntasks` (or `-n`) option to indicate the total number of tasks and let the scheduler determine how many nodes and tasks per node are needed. In general `--cpus-per-task` will be one except when running threaded code.

 Here's an example job script for a job that uses MPI for parallelizing over multiple nodes:

@@ -437,7 +439,7 @@ Here are some options:
 - using [Savio's HT Helper tool](http://research-it.berkeley.edu/services/high-performance-computing/user-guide/hthelper-script) to run many computational tasks (e.g., thousands of simulations, scanning tens of thousands of parameter values, etc.) as part of single Savio job submission
 - using [single-node parallelism](https://github.com/berkeley-scf/tutorial-parallel-basics) and [multiple-node parallelism](https://github.com/berkeley-scf/tutorial-parallel-distributed) in Python, R, and MATLAB
 - parallel R tools such as *foreach*, *parLapply*, and *mclapply*
-- parallel Python tools such as *IPython parallel*, and *Dask*
+- parallel Python tools such as *ipyparallel* and *Dask*
 - parallel functionality in MATLAB through *parfor*

 # Monitoring jobs and the job queue
@@ -465,7 +467,7 @@ For more information on cores, QoS, and additional (e.g., GPU) resources, here's
 squeue -o "%.7i %.12P %.20j %.8u %.2t %.9M %.5C %.8r %.3D %.20R %.8p %.20q %b"
 ```

-We provide some [tips about monitoring your job](http://research-it.berkeley.edu/services/high-performance-computing/tips-using-brc-savio-cluster).
+We provide some [tips about monitoring your jobs](http://research-it.berkeley.edu/services/high-performance-computing/running-your-jobs). (Scroll down to the "Monitoring jobs" section.)

 # Example use of standard software: IPython and R notebooks through JupyterHub

@@ -486,7 +488,7 @@ Let's see a basic example of doing an analysis in Python across multiple cores o

 Here we'll use *IPython* for parallel computing. The example is a bit contrived in that a lot of the time is spent moving data around rather than doing computation, but it should illustrate how to do a few things.

-First we'll install a Python package not already available as a module.
+First we'll install a Python package (pretending it is not already available via the basic python/3.6 module).

 ```
 cp bayArea.csv /global/scratch/paciorek/. # remember to do I/O off scratch
@@ -510,6 +512,7 @@ ipcontroller --ip='*' &
 sleep 10
 srun ipengine &
 sleep 20 # wait until all engines have successfully started
+cd /global/scratch/paciorek
 ipython
 ```

@@ -536,7 +539,7 @@ lview = c.load_balanced_view()
 lview.block = True

 import pandas
-dat = pandas.read_csv('bayArea.csv', header = None)
+dat = pandas.read_csv('bayArea.csv', header = None, encoding = 'latin1')
 dat.columns = ('Year','Month','DayofMonth','DayOfWeek','DepTime',
 'CRSDepTime','ArrTime','CRSArrTime','UniqueCarrier','FlightNum',
 'TailNum','ActualElapsedTime','CRSElapsedTime','AirTime','ArrDelay',
@@ -586,8 +589,8 @@ We'll do this interactively though often this sort of thing would be done via a
 # remember to do I/O off scratch
 cp bayArea.csv /global/scratch/paciorek/.

-srun -A co_stat -p savio2 --nodes=3 --ntasks-per-node=24 -t 30:0 --pty bash
-module load gcc openmpi r/3.4.2 r-packages
+srun -A co_stat -p savio2 --nodes=2 --ntasks-per-node=24 -t 30:0 --pty bash
+module load r/3.4.2 r-packages
 mpirun R CMD BATCH --no-save parallel-multi.R parallel-multi.Rout &
 ```

@@ -596,7 +599,7 @@ Now here's the R code (see *parallel-multi.R*) we're running:
 ```
 library(doMPI)

-cl = startMPIcluster() # by default will start one fewer slave
+cl = startMPIcluster() # by default will start one fewer slave, using one for master
 registerDoMPI(cl)
 clusterSize(cl) # just to check

@@ -661,10 +664,10 @@ results

 - For questions about computing resources in general, including cloud computing:

-- office hours: Tues. 10:00 - 12:00, Wed. 1:30-3:30, Thur. 9:30-11:30 here in AIS
+- office hours: Tues. 10:00-12:00, Wed. 1:30-3:30, Thur. 9:30-11:30 here in AIS
 - For questions about data management (including HIPAA-protected data):

-- office hours: Tues. 10:00 - 12:00, Wed. 1:30-3:30, Thur. 9:30-11:30 here in AIS
+- office hours: Tues. 10:00-12:00, Wed. 1:30-3:30, Thur. 9:30-11:30 here in AIS


 # Upcoming events
