<p>One thing that tricks people is that the modules are arranged in a hierarchical (nested) fashion, so you only see some of the modules as being available <em>after</em> you load the parent module (e.g., MKL, FFT, and HDF5/NetCDF software is nested within the gcc module). Here's how we see and load MPI.</p>
<pre><code>module load openmpi # this fails if gcc not yet loaded
module load gcc
module avail
module load openmpi</code></pre>
<h1 id="submitting-jobs-accounts-and-partitions">Submitting jobs: accounts and partitions</h1>
<pre><code>sacctmgr -p show associations user=SAVIO_USERNAME</code></pre>
<p>Here's an example of the output for a user who has access to an FCA, a condo, and a special partner account:</p>
<p>If you are part of a condo, you'll notice that you have <em>low-priority</em> access to certain partitions. For example, I am part of the statistics condo <em>co_stat</em>, which owns some savio2 and savio2_gpu nodes, so I have normal access to those partitions, but I can also burst beyond the condo and use other partitions at low priority (see below).</p>
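<p>As a rough sketch (not the tutorial's own example), bursting beyond the condo just means pointing the job at the condo account, a partition the condo does not own, and the low-priority QoS; the partition and QoS names below are assumptions to be checked against your own <em>sacctmgr</em> output:</p>
<pre><code>#SBATCH --account=co_stat
#SBATCH --partition=savio2_bigmem   # a partition the condo does not own (illustrative)
#SBATCH --qos=savio_lowprio         # low-priority QoS (name assumed)</code></pre>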
<p>In contrast, through my FCA, I have access to the savio, savio2, and big memory partitions.</p>
<h1id="submitting-a-batch-job">Submitting a batch job</h1>
@@ -265,16 +270,16 @@ <h1 id="submitting-a-batch-job">Submitting a batch job</h1>
<p>MaxRSS will show the maximum amount of memory that the job used in kilobytes.</p>
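<p>As a minimal illustration using standard <em>sacct</em> options (not necessarily the exact command shown elsewhere in this tutorial), you can pull that field out by name for a given job:</p>
<pre><code>sacct -j YOUR_JOB_ID --format=JobID,JobName,MaxRSS,Elapsed,State</code></pre>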
<p>You can also login to the node where you are running and use commands like <em>top</em> and <em>ps</em>:</p>
<pre><code>srun --jobid=&lt;JOB_ID&gt; --pty /bin/bash</code></pre>
<p>Note that except for the <em>savio2_htc</em> and <em>savio2_gpu</em> partitions, all jobs are given exclusive access to the entire node or nodes assigned to the job (and your account is charged for all of the cores on the node(s)).</p>
<h1 id="parallel-job-submission">Parallel job submission</h1>
<p>If you are submitting a job that uses multiple nodes, you'll need to carefully specify the resources you need. The key flags for use in your job script are:</p>
<ul>
<li><code>--nodes</code> (or <code>-N</code>): indicates the number of nodes to use</li>
<li><code>--ntasks-per-node</code>: indicates the number of tasks (i.e., processes) one wants to run on each node</li>
<li><code>--cpus-per-task</code> (or <code>-c</code>): indicates the number of cpus to be used for each task</li>
</ul>
<p>In addition, in some cases it can make sense to use the <code>--ntasks</code> (or <code>-n</code>) option to indicate the total number of tasks and let the scheduler determine how many nodes and tasks per node are needed. In general <code>--cpus-per-task</code> will be one except when running threaded code.</p>
<p>Here's an example job script for a job that uses MPI for parallelizing over multiple nodes:</p>
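<p>A minimal sketch of such a script follows; the account name, core count, and executable are placeholders rather than the tutorial's actual values:</p>
<pre><code>#!/bin/bash
#SBATCH --job-name=test
#SBATCH --account=fc_XXXXX          # placeholder FCA account name
#SBATCH --partition=savio2
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24        # assumed to match the cores per node on the chosen partition
#SBATCH --cpus-per-task=1
#SBATCH --time=00:30:00

module load gcc openmpi
mpirun ./myMPIexecutable            # placeholder executable name</code></pre>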
<p>There are a number of other ways to parallelize your work on Savio. Here are some options:</p>
<ul>
<li>using <a href="http://research-it.berkeley.edu/services/high-performance-computing/user-guide/hthelper-script">Savio's HT Helper tool</a> to run many computational tasks (e.g., thousands of simulations, scanning tens of thousands of parameter values, etc.) as part of a single Savio job submission</li>
<li>using <a href="https://github.com/berkeley-scf/tutorial-parallel-basics">single-node parallelism</a> and <a href="https://github.com/berkeley-scf/tutorial-parallel-distributed">multiple-node parallelism</a> in Python, R, and MATLAB
<ul>
<li>parallel R tools such as <em>foreach</em>, <em>parLapply</em>, and <em>mclapply</em></li>
<li>parallel Python tools such as <em>ipyparallel</em> and <em>Dask</em></li>
<li>parallel functionality in MATLAB through <em>parfor</em></li>
</ul></li>
</ul>
<h1 id="monitoring-jobs-and-the-job-queue">Monitoring jobs and the job queue</h1>
<p>To cancel a job:</p>
<pre><code>scancel YOUR_JOB_ID</code></pre>
<p>For more information on cores, QoS, and additional (e.g., GPU) resources, here's some syntax:</p>
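<p>As a rough illustration using standard <em>squeue</em> format options (the specific format string here is an assumption, not necessarily the tutorial's exact syntax):</p>
<pre><code># %q = QoS, %C = cores requested, %b = generic resources such as GPUs
squeue -u $USER -o "%.18i %.12P %.8q %.10M %.5C %.6D %b %R"</code></pre>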
<p>We provide some <ahref="http://research-it.berkeley.edu/services/high-performance-computing/running-your-jobs">tips about monitoring your jobs</a>. (Scroll down to the "Monitoring jobs" section.)</p>
<h1id="example-use-of-standard-software-ipython-and-r-notebooks-through-jupyterhub">Example use of standard software: IPython and R notebooks through JupyterHub</h1>
<p>Savio allows one to <a href="http://research-it.berkeley.edu/services/high-performance-computing/using-jupyter-notebooks-and-jupyterhub-savio">run Jupyter-based notebooks via a browser-based service called JupyterHub</a>.</p>
<p>Let's see a brief demo of an IPython notebook:</p>
<h1id="example-use-of-standard-software-python">Example use of standard software: Python</h1>
<p>Let's see a basic example of doing an analysis in Python across multiple cores on multiple nodes. We'll use the airline departure data in <em>bayArea.csv</em>.</p>
<p>Here we'll use <em>IPython</em> for parallel computing. The example is a bit contrived in that a lot of the time is spent moving data around rather than doing computation, but it should illustrate how to do a few things.</p>
<p>First we'll install a Python package (pretending it is not already available via the basic python/3.6 module).</p>
<pre><code>cp bayArea.csv /global/scratch/paciorek/. # remember to do I/O off scratch
# install Python package
module unload python
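# (the remaining setup is a sketch, not the tutorial's exact commands)
module load python/3.6
pip install --user SOME_PACKAGE     # hypothetical package name; --user installs under your home directory
# then, from within a multi-node job, start the controller before launching the engines
ipcontroller --ip='*' &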
sleep 10
srun ipengine &
sleep 20 # wait until all engines have successfully started
cd /global/scratch/paciorek
ipython</code></pre>
<p>If we were doing this on a single node, we could start everything up in a single call to <em>ipcluster</em>:</p>
<pre><code>module load python/3.6
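# a sketch of the single-node startup; the engine count and the follow-on ipython call are assumptions
ipcluster start -n $SLURM_CPUS_ON_NODE &
ipython</code></pre>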
<p>Within IPython, we then work through a load-balanced view (<em>lview</em>) onto the engines and read in the dataset:</p>
<pre><code>lview.block = True

import pandas
dat = pandas.read_csv('bayArea.csv', header = None, encoding = 'latin1')</code></pre>
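<p>For context, here is a minimal self-contained sketch of how those pieces typically fit together with <em>ipyparallel</em>; the client setup and the <em>map</em> call at the end are illustrative assumptions, not the tutorial's exact code:</p>
<pre><code>from ipyparallel import Client
import pandas

# connect to the controller and engines started above
c = Client()
print(c.ids)                          # engine IDs that successfully registered

# a load-balanced view dispatches tasks to whichever engine is free
lview = c.load_balanced_view()
lview.block = True                    # wait for results rather than returning immediately

dat = pandas.read_csv('bayArea.csv', header = None, encoding = 'latin1')

# hypothetical example: apply a function to chunks of the data in parallel
def count_rows(df):
    return df.shape[0]

chunks = [dat.iloc[i::4] for i in range(4)]   # four arbitrary chunks
results = lview.map(count_rows, chunks)
print(sum(results))</code></pre>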