| ... | ... | @@ -333,7 +333,7 @@ This is something I could (hopefully easily) whip up and have it be useful -- ru |
|
|
|
|
|
|
|
I might try this out right now on Nautilus... let me go ahead and set up a slightly larger volume.
|
|
|
|
|
|
|
|
- [ ] Talk with John/Dima about how large of a volume and how many pods I can set up for future benchmarking on Nautilus
|
|
|
|
- [x] ~~Talk with John/Dima about how large of a volume and how many pods I can set up for future benchmarking on Nautilus~~
|
|
|
|
|
|
|
|
---
|
|
|
|
|
| ... | ... | @@ -346,3 +346,156 @@ One thing I'm not fully sure the importance of or how to use is specifying a fil |
|
|
|
|
|
|
|
> ###### [c57f6dd](ssh///https://gitlab-ssh.nautilus.optiputer.net/30622/parkeraddison/filesystem-benchmarks/commit/c57f6dd8feb60eedf5a0c0ae809f30e3ef91f111)
|
|
|
|
> "Minimal IOR test script; Repo organization" | HEAD -> main | 2021-03-16
|
|
|
|
|
|
|
|
###### 03/21
|
|
|
|
|
|
|
|
When I spoke with John he was interested in the IO500 leaderboard and FIO benchmark. He mentioned a few cool things:
|
|
|
|
1. I can create a namespace to run the benchmarks on Nautilus
|
|
|
|
2. They have used FIO a lot before!
|
|
|
|
3. I should talk to Igor about the benchmarks/IO500
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
It's about time that I run a job on Pleiades! Then I'll try to run a minimal IOR and FIO run.
|
|
|
|
|
|
|
|
## Pleiades PBS Hello World
|
|
|
|
|
|
|
|
Alright, let's give this a go.
|
|
|
|
|
|
|
|
Log in to the enclave (secure front-end), then log in to a Pleiades front-end:
|
|
|
|
```
ssh sfe
ssh pfe
```
|
|
|
|
|
|
|
|
Explanation of PBS on the HECC knowledgebase: https://www.nas.nasa.gov/hecc/support/kb/portable-batch-system-(pbs)-overview_126.html
|
|
|
|
> Batch jobs run on compute nodes, not the front-end nodes. A PBS scheduler allocates blocks of compute nodes to jobs to provide exclusive access. You will submit batch jobs to run on one or more compute nodes using the qsub command from an interactive session on one of Pleiades front-end systems (PFEs).
|
|
|
|
>
|
|
|
|
> Normal batch jobs are typically run by submitting a script. A "jobid" is assigned after submission. When the resources you request become available, your job will execute on the compute nodes. When the job is complete, the PBS standard output and standard error of the job will be returned in files available to you.
|
|
|
|
>
|
|
|
|
> When porting job submission scripts from systems outside of the NAS environment or between the supercomputers, be careful to make changes to your existing scripts to make them work properly on these systems.
|
|
|
|
|
|
|
|
A job is submitted to PBS using `qsub`. Typing `man qsub` gives a nice description of the expected job script format and capabilities. Here are some useful parts:
|
|
|
|
- The script can run Python, Sh, Csh, Batch, Perl
|
|
|
|
- A script consists of: 1) An optional shell specification, 2) PBS directives, 3) User tasks, programs, commands, applications, 4) Comments
|
|
|
|
- A shebang can be used to specify the shell, or the `-S` command line option can be used
|
|
|
|
- E.g. Python can be used by having the first line of the script as `#!/usr/bin/python3`
|
|
|
|
|
|
|
|
### PBS Directives
|
|
|
|
|
|
|
|
These are needed in a job script, and are written as `#PBS`-prefixed lines at the top of the script file, or can be passed in as arguments to the `qsub` command. It's probably best to include them in the script though!
|
|
|
|
|
|
|
|
With that said, the shell could be specified with `#PBS -S`, too.
|
|
|
|
|
|
|
|
Common directives can be found here: https://www.nas.nasa.gov/hecc/support/kb/commonly-used-qsub-command-options_175.html. And other directives (options) can be seen with `man qsub`.
|
|
|
|
|
|
|
|
### Hello World
|
|
|
|
|
|
|
|
Here's a basic script from the man pages. I changed `print` to `echo`, since `print` is not a valid command here!
|
|
|
|
```bash
#!/bin/sh
#PBS -l select=1:ncpus=1:mem=1gb
#PBS -N HelloJob
echo "Hello"
```
|
|
|
|
The script is executed by the shell named in the first-line shebang. The PBS `-l` directive specifies resources: here it asks for 1 'chunk' of resources with 1 CPU and 1 GB of memory. This is also where we could specify the compute node type we want (`model=`), the number of MPI processes we want (`mpiprocs=`), and the filesystem (?). See `man pbs_resources`. Finally, the PBS `-N` directive specifies the job name.
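Since directives can also be passed as `qsub` arguments, here is a rough sketch of what the command-line form of the same two directives might look like. The variable names are just for illustration, and the command is only printed (not run) so it can be tried off-cluster.

```bash
# Command-line equivalent of the #PBS lines in the script above (a sketch).
# -S, -N and -l are the qsub options matching the shebang and the two directives.
job_name="HelloJob"
resources="select=1:ncpus=1:mem=1gb"

# Build the command as a string; on a Pleiades front-end it could be run directly.
cmd="qsub -S /bin/sh -N $job_name -l $resources hello-job.sh"
echo "$cmd"
```

Keeping the directives in the script is still probably the better habit, since the script then records exactly what was requested.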
|
|
|
|
|
|
|
|
Let's try running it!
|
|
|
|
```bash
qsub hello-job.sh
```
|
|
|
|
|
|
|
|
Alright, it was rejected because the node model was not specified. I'll specify Pleiades Sandy Bridge with `model=san` in the resource line.
|
|
|
|
|
|
|
|
Also worth noting that there is a [**Pleiades development queue**](https://www.nas.nasa.gov/hecc/support/kb/pleiades-devel-queue_290.html) that I think this work would fall under (testing the commands that is, not the final benchmarks!).
|
|
|
|
|
|
|
|
- [ ] I should ask Henry about the billing and mission shares.
|
|
|
|
|
|
|
|
I just added `-q devel` and `-l model=san` to the script and am trying again.
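For reference, a sketch of what the revised script probably looks like. Putting `model=san` inside the existing `select` statement is my assumption about how the two resource requests combine; the heredoc is only there so the snippet recreates `hello-job.sh` in one step.

```bash
# Recreate hello-job.sh with the devel queue and Sandy Bridge model added.
# Assumption: model=san is appended to the select chunk rather than given
# on a separate -l line.
cat > hello-job.sh <<'EOF'
#!/bin/sh
#PBS -l select=1:ncpus=1:mem=1gb:model=san
#PBS -q devel
#PBS -N HelloJob
echo "Hello"
EOF
```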
|
|
|
|
|
|
|
|
```bash
qsub hello-job.sh
```
|
|
|
|
Output: `10791518.pbspl1.nas.nasa.gov`
|
|
|
|
|
|
|
|
Running `qstat -u paddison` lists the jobs I've submitted. This is a short job on a fast-turnaround queue, so it finishes quickly, but three quick runs of that command still caught the job in three different states: `Q` (queued), `R` (running), and `E` (exiting). The fourth time I ran `qstat` the output was empty, meaning the job was complete.
|
|
|
|
<details>
|
|
|
|
```
paddison@pfe24:~> qstat -u paddison
                                                Req'd   Elap
JobID           User     Queue Jobname  TSK Nds wallt S wallt Eff
--------------- -------- ----- -------- --- --- ----- - ----- ---
10791518.pbspl1 paddison devel HelloJob   1   1 02:00 Q 00:00  --
```
```
paddison@pfe24:~> qstat -u paddison
                                                Req'd   Elap
JobID           User     Queue Jobname  TSK Nds wallt S wallt Eff
--------------- -------- ----- -------- --- --- ----- - ----- ---
10791518.pbspl1 paddison devel HelloJob   1   1 02:00 R 00:00 50%
```
```
paddison@pfe24:~> qstat -u paddison
                                                Req'd   Elap
JobID           User     Queue Jobname  TSK Nds wallt S wallt Eff
--------------- -------- ----- -------- --- --- ----- - ----- ---
10791518.pbspl1 paddison devel HelloJob   1   1 02:00 E 00:00 50%
```
|
|
|
|
</details>
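Instead of re-running `qstat` by hand, the state letter could be pulled out of that output with a small helper. This is a minimal sketch: `job_state` is a made-up name, and it assumes the ten-column layout shown above, where the `S` column is the 8th whitespace-separated field.

```bash
# Print the state letter (the "S" column) for one job, reading
# `qstat -u <user>` output from stdin. $1 is the numeric job ID prefix.
job_state() {
    awk -v id="$1" 'index($1, id) == 1 { print $8 }'
}

# Example using a captured line rather than a live qstat call:
echo "10791518.pbspl1 paddison devel HelloJob 1 1 02:00 Q 00:00 --" | job_state 10791518
```

On a front-end this would be used as `qstat -u paddison | job_state 10791518`; empty output means the job is no longer listed.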
|
|
|
|
|
|
|
|
Two files are now present in the directory where I ran `qsub`.
|
|
|
|
|
|
|
|
<details>
|
|
|
|
<summary>HelloJob.o10791518</summary>
|
|
|
|
|
|
|
|
```
Job 10791518.pbspl1.nas.nasa.gov started on Sun Mar 21 20:12:29 PDT 2021
The job requested the following resources:
mem=1gb
ncpus=1
place=scatter:excl
walltime=02:00:00

PBS set the following environment variables:
FORT_BUFFERED = 1
TZ = PST8PDT

On r325i3n2:
Current directory is /home6/paddison
Hello

____________________________________________________________________
Job Resource Usage Summary for 10791518.pbspl1.nas.nasa.gov

CPU Time Used : 00:00:02
Real Memory Used : 2732kb
Walltime Used : 00:00:02
Exit Status : 0

Memory Requested : 1gb
Number of CPUs Requested : 1
Walltime Requested : 02:00:00

Execution Queue : devel
Charged To : g1119

Job Stopped : Sun Mar 21 20:12:36 2021
____________________________________________________________________
```
|
|
|
|
|
|
|
|
</details>
|
|
|
|
|
|
|
|
The `e` file was empty. Here is that file from a previous run where an invalid command was used.
|
|
|
|
<details>
|
|
|
|
<summary>HelloJob.e10791410</summary>
|
|
|
|
```
/var/spool/pbs/mom_priv/jobs/10791410.pbspl1.nas.nasa.gov.SC: line 5: print: command not found
```
|
|
|
|
</details>
|
|
|
|
|
|
|
|
The job summary and output are shown in the `o` file, and it appears that stderr is written to the `e` file.
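For later benchmark runs it might be handy to check the exit status without opening the `o` file. A minimal sketch, assuming the summary keeps the `Exit Status : N` line format seen in the output above; `job_exit_status` is a made-up helper name.

```bash
# Print the value from the "Exit Status" line of a PBS output (.o) file.
# $1 is the path to the file; whitespace around the value is stripped.
job_exit_status() {
    awk -F':' '/Exit Status/ { gsub(/[[:space:]]/, "", $2); print $2 }' "$1"
}

# Only runs if the output file from the job above is in the current directory.
if [ -f HelloJob.o10791518 ]; then
    job_exit_status HelloJob.o10791518
fi
```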
|
|
|
|
|
|
|
|
Nice!
|
|
|
|
|
|
|
|
> ###### [6e9d944](ssh///https://gitlab-ssh.nautilus.optiputer.net/30622/parkeraddison/filesystem-benchmarks/commit/6e9d9445847ef3dba7e80e79885bbd54cadb122b)
|
|
|
|
> "Hello World PBS job run on Pleiades" | HEAD -> main | %as