|
|
* [Darshan on NAS](#darshan-on-nas)
* [What MPI module to use](#what-mpi-module-to-use)
* [Conducting initial parameter sweeps](#conducting-initial-parameter-sweeps)
* [Initial sweeps on /nobackupp12](#initial-sweeps-on-nobackupp12)
* [Lustre stripe counts](#lustre-stripe-counts)
* [IOR Lustre directives](#ior-lustre-directives)
* [IOR Lustre striping workaround](#ior-lustre-striping-workaround)
* [MPI on PRP](#mpi-on-prp)

<!-- vim-markdown-toc -->
|
|
|
|
|
|
|
|
|
|
|
|
|
**Additional IOR documentation can be found at https://github.com/hpc/ior/blob/main/doc/USER_GUIDE**. It includes some things that aren't on the website. Based on this, I could have written Python code to generate IOR script files and then had the PBS job script run those, rather than executing commands from within Python. Oh well, maybe I will change to that in the future.
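For the record, that alternative would look something like the sketch below: generate an IOR script file offline (IOR's `-f` option reads a script of tests), then have a bare-bones PBS job run it. The resource selection, walltime, module name, and the `sweep.ior` filename here are placeholders, not my actual setup.

```bash
#PBS -S /bin/bash
#PBS -l select=2:ncpus=28:model=bro
#PBS -l walltime=00:30:00

# Sketch only: the PBS job just runs a pre-generated IOR script (which a
# Python helper could write out ahead of time) instead of driving IOR from
# inside Python. Node selection, walltime, and file names are placeholders.
module load mpi-hpe/mpt
cd $PBS_O_WORKDIR
mpiexec ~/benchmarks/ior/ior-3.3.0/src/ior -f sweep.ior
```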
|
|
|
|
|
|
|
|
|
|
###### 04/13
|
|
|
|
|
|
|
|
|
I've gone ahead and done the parameter sweeps. The results are plotted and commented on in the `1_Parameter_sweeps.ipynb` notebook (on NAS pfe). Most notably, there was an interesting dip in performance at a transferSize of 4 MiB, and performance decreased as more nodes were added.
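(For context, each sweep dimension is essentially a loop over one IOR parameter at a time; below is a hedged sketch of what a transferSize sweep looks like. The sizes, process count, and output path are illustrative, not the exact configuration from the notebook.)

```bash
# Illustrative transferSize sweep: -t transfer size, -b block size per task,
# -i repetitions, -o output file. Values and path are placeholders.
TESTFILE=/nobackupp18/your_nobackup_dir/testFile   # placeholder path
for t in 1m 2m 4m 8m 16m; do
    mpiexec -np 8 ~/benchmarks/ior/ior-3.3.0/src/ior \
        -a MPIIO -t "$t" -b 256m -i 5 -o "$TESTFILE"
done
```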
|
|
|
|
|
|
|
|
It's important to figure out whether that behavior is consistent and, if so, what is causing it. The hardware? The network topology? The software, like Lustre stripe sizes?
|
|
|
|
|
|
|
|
|
I ran all of the benchmarks on /nobackupp18 but supposedly that filesystem is not fully set up yet. It also has different hardware (SSDs) than /nobackupp12. I will attempt to run the same set of tests on /nobackupp12 and compare the results.
|
|
|
|
|
|
|
> ###### [a10b7d7](ssh///https://gitlab-ssh.nautilus.optiputer.net/30622/parkeraddison/filesystem-benchmarks/commit/a10b7d71f4ffb1420f45d916b72968b6174236e4)
|
|
|
|
|
> "Initial parameter sweeps; Configurable sweeps; Parsing/plotting" | HEAD -> main | 2021-04-14
|
|
> "Initial parameter sweeps; Configurable sweeps; Parsing/plotting" | HEAD -> main | 2021-04-14
|
|
|
|
|
|
|
|
|
###### 04/16
|
|
|
|
|
|
|
|
## Initial sweeps on /nobackupp12
|
|
|
|
|
|
|
|
### Lustre stripe counts
|
|
|
|
|
|
|
|
Henry warned me that the progressive Lustre striping on /nobackupp12 is broken, and I should make sure that a fixed stripe count is being used instead. To see what stripe count is currently being used, I can run `lfs getstripe [options] path`. So for instance, I ran a small test with the `keepFile` directive enabled so I could see what striping is being done on the written `testFile`.
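(The small test was essentially a plain IOR write with the keep-file flag; a rough sketch, with the process count and the default `testFile` name as placeholders:)

```bash
# Write-only run that keeps the output file (-k) so its striping can be
# inspected afterwards. Process count and file name are placeholders.
mpiexec -np 2 ~/benchmarks/ior/ior-3.3.0/src/ior -a MPIIO -w -k -o testFile
```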
|
|
|
|
|
|
|
|
```bash
lfs getstripe testFile
```
|
|
|
|
This confirms that progressive striping is taking place. Whereas if I specify a new file with a fixed stripe count (or size):
|
|
|
|
```bash
lfs setstripe -c 2 testFile2
cp testFile testFile2
lfs getstripe testFile2
```
|
|
|
|
I see that fixed number! Fortunately, I should be able to specify the stripe count (and size) by using IOR's directive options!
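What I have in mind is roughly the sketch below, assuming the `lustreStripeCount` directive name from the IOR user guide (passed with `-O`); I haven't verified the exact spelling against my build.

```bash
# Hoped-for approach: let IOR itself apply a fixed Lustre stripe count to
# the test file via a directive. Directive name taken from the user guide.
mpiexec -np 2 ~/benchmarks/ior/ior-3.3.0/src/ior -a MPIIO -k \
    -O lustreStripeCount=2
```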
|
|
|
|
|
|
|
|
### IOR Lustre directives
|
|
|
|
|
|
|
|
Huh... when I tried to run IOR with a Lustre-specific directive it complained:
|
|
|
|
```
ior ERROR: ior was not compiled with Lustre support, errno 34, Numerical result out of range (parse_options.c:248)
```
|
|
|
|
|
|
|
|
I compiled this version of IOR with the mpi-hpe module... I'll try `./configure` again to see whether Lustre shows up as supported. This time around I ran `./configure --with-lustre`, then `make`. Let's see if it works. I suppose if it doesn't, I can always just add an explicit `lfs setstripe` command before each test.
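(For reference, the rebuild was just the usual autotools steps, run from the IOR source tree -- the path below matches where the binary lives elsewhere in this log:)

```bash
# Re-configure with Lustre support explicitly requested, then rebuild.
cd ~/benchmarks/ior/ior-3.3.0
./configure --with-lustre
make
```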
|
|
|
|
|
|
|
|
Didn't work. Maybe I need to compile it on a Lustre filesystem? Like, move it to a /nobackup and then re-configure/compile?
|
|
|
|
|
|
|
|
Maybe this is related: https://github.com/hpc/ior/issues/189
|
|
|
|
|
|
|
|
Shucks, as a workaround I tried an explicit `lfs setstripe` on `testFile` before running IOR, but the `getstripe` afterwards showed that it didn't work. I think this is because IOR deletes the file before writing it.
|
|
|
|
|
|
|
|
---
|
|
|
|
|
|
|
|
Here are some great resources about Lustre striping, IO benchmarks, etc:
|
|
|
|
- https://www.nics.tennessee.edu/computing-resources/file-systems/lustre-striping-guide
|
|
|
|
- https://www.nics.tennessee.edu/computing-resources/file-systems/io-lustre-tips
|
|
|
|
|
|
|
|
These explain that performance greatly benefits from **stripe alignment**, in which OST contention is minimized by ensuring each process requests parts of the file from different OSTs -- this can be done by setting the number of stripes equal to the number of processes, for instance. Performance is also helped by choosing a stripe size similar to the transfer size.
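As a concrete sketch of that advice (with placeholder numbers): for a run with 4 processes each doing 4 MiB transfers, you'd give the file 4 stripes of 4 MiB each.

```bash
# Stripe alignment sketch: stripe count (-c) ~= number of processes,
# stripe size (-S) ~= transfer size. Values and file name are placeholders.
lfs setstripe -c 4 -S 4m aligned_testFile
```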
|
|
|
|
|
|
|
|
Honestly, these documents have some incredible tips and insight. NICS is co-located on the ORNL campus and so has ties to the DOE.
|
|
|
|
|
|
|
|
### IOR Lustre striping workaround
|
|
|
|
|
|
|
|
Ah! It looks like the IOR Lustre options not working is potentially a known issue: https://github.com/hpc/ior/issues/353
|
|
|
|
|
|
|
|
Perhaps this is a workaround to pre-stripe and *keep* the file: https://github.com/hpc/ior/issues/273 -- basically, use the `-E` (existing file) option :) And that works!
|
|
|
|
|
|
|
|
```bash
# Pre-stripe the (default-named) test file, then tell IOR to reuse the
# existing file (-E) and keep it afterwards (-k) so the striping survives.
lfs setstripe -c 2 testFile
mpiexec -np 2 ~/benchmarks/ior/ior-3.3.0/src/ior -a MPIIO -E -k
lfs getstripe testFile
```
|
|
|
|
|
|
|
|
So we can explicitly run `lfs setstripe` and create `testFile` beforehand, *as long as we also make sure to use the existing-file flag*!
|
|
|
|
|
|
|
|
Woohoo, let's run those tests again!
|
|
|
|
|
|
|
|
###### 04/17
|
|
|
|
|
|
|
|
I'm taking a closer look at some more test runs... I think perhaps part of the reason for the high variance is that the data sizes are relatively small? I'm not sure... but the variation between two consecutive repetitions can be *huge*. For instance, I ran another transfer size test and the first repetition's read time was 1.3 seconds -- then the next was 0.3 seconds.
|
|
|
|
|
|
|
|
Actually, I've looked at all of the individual tests now (not just the summary) and it looks like the first repetition *always* takes considerably (~3x) longer than the rest of the repetitions. The next repetition or two are usually the best, then read time starts to climb again.
|
|
|
|
|
|
|
|
This is not true for writes -- though there is a lot of variation in write time... I'm not sure why there would be.
|
|
|
|
|
|
|
|
Perhaps there is truly some caching effect happening when I run the repetitions?
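One way to probe that (a sketch I haven't run yet): crank up the repetition count and use IOR's task-reordering option so each rank reads back data written by a different rank, which should keep the client page cache from serving the reads. Sizes and the file name below are placeholders.

```bash
# -i 10 gives more repetitions, to see whether only the first one is slow;
# -C reorders tasks for readback so a rank's reads can't be served from the
# page cache of the rank that wrote that data. Values are placeholders.
mpiexec -np 4 ~/benchmarks/ior/ior-3.3.0/src/ior \
    -a MPIIO -t 4m -b 256m -i 10 -C -o testFile
```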
|
|
|
|
|
|
|
|
###### 04/19
|
|
|
|
|
|
|
|
## MPI on PRP
|
|
|
|
|
|
|
|
Looking into running an MPI job (IOR) across multiple nodes.
|
|
|
|
|
|
|
|
> ###### [1aeb895](ssh///https://gitlab-ssh.nautilus.optiputer.net/30622/parkeraddison/filesystem-benchmarks/commit/1aeb8957bb17f24494cbff25ebb03d371422054e)
|
|
|
|
> "sync changes" | HEAD -> main | 2021-04-24
|
|
|
|
|