| ... | @@ -426,7 +426,7 @@ Alright, it was rejected because the node model was not specified. I'll specify |
... | @@ -426,7 +426,7 @@ Alright, it was rejected because the node model was not specified. I'll specify |
|
|
|
|
|
|
|
Also worth noting that there is a [**Pleiades development queue**](https://www.nas.nasa.gov/hecc/support/kb/pleiades-devel-queue_290.html) that I think this work would fall under (testing the commands that is, not the final benchmarks!).
|
|
Also worth noting that there is a [**Pleiades development queue**](https://www.nas.nasa.gov/hecc/support/kb/pleiades-devel-queue_290.html) that I think this work would fall under (testing the commands that is, not the final benchmarks!).
|
|
|
|
|
|
|
|
- [ ] I should ask Henry about the billing and mission shares.
|
|
- [x] ~~I should ask Henry about the billing and mission shares.~~
|
|
|
|
|
|
|
|
I just added `-q devel` and `-l model=san` to the script, trying again.
|
|
I just added `-q devel` and `-l model=san` to the script, trying again.
|
|
|
|
|
|
| ... | @@ -480,7 +480,7 @@ PBS set the following environment variables: |
... | @@ -480,7 +480,7 @@ PBS set the following environment variables: |
|
|
FORT_BUFFERED = 1
|
|
FORT_BUFFERED = 1
|
|
|
TZ = PST8PDT
|
|
TZ = PST8PDT
|
|
|
|
|
|
|
|
On r325i3n2:
|
|
On *****:
|
|
|
Current directory is /home6/paddison
|
|
Current directory is /home6/paddison
|
|
|
Hello
|
|
Hello
|
|
|
|
|
|
| ... | @@ -497,7 +497,7 @@ Job Resource Usage Summary for 10791518.pbspl1.nas.nasa.gov |
... | @@ -497,7 +497,7 @@ Job Resource Usage Summary for 10791518.pbspl1.nas.nasa.gov |
|
|
Walltime Requested : 02:00:00
|
|
Walltime Requested : 02:00:00
|
|
|
|
|
|
|
|
Execution Queue : devel
|
|
Execution Queue : devel
|
|
|
Charged To : g1119
|
|
Charged To : *****
|
|
|
|
|
|
|
|
Job Stopped : Sun Mar 21 20:12:36 2021
|
|
Job Stopped : Sun Mar 21 20:12:36 2021
|
|
|
____________________________________________________________________
|
|
____________________________________________________________________
|
| ... | @@ -656,7 +656,7 @@ PBS set the following environment variables: |
... | @@ -656,7 +656,7 @@ PBS set the following environment variables: |
|
|
FORT_BUFFERED = 1
|
|
FORT_BUFFERED = 1
|
|
|
TZ = PST8PDT
|
|
TZ = PST8PDT
|
|
|
|
|
|
|
|
On r327i7n6:
|
|
On *****:
|
|
|
Current directory is /home6/paddison
|
|
Current directory is /home6/paddison
|
|
|
|
|
|
|
|
____________________________________________________________________
|
|
____________________________________________________________________
|
| ... | @@ -672,7 +672,7 @@ Job Resource Usage Summary for 10799410.pbspl1.nas.nasa.gov |
... | @@ -672,7 +672,7 @@ Job Resource Usage Summary for 10799410.pbspl1.nas.nasa.gov |
|
|
Walltime Requested : 02:00:00
|
|
Walltime Requested : 02:00:00
|
|
|
|
|
|
|
|
Execution Queue : devel
|
|
Execution Queue : devel
|
|
|
Charged To : g1119
|
|
Charged To : *****
|
|
|
|
|
|
|
|
Job Stopped : Mon Mar 22 15:52:58 2021
|
|
Job Stopped : Mon Mar 22 15:52:58 2021
|
|
|
____________________________________________________________________
|
|
____________________________________________________________________
|
| ... | @@ -715,9 +715,9 @@ To make figuring this out easier, we can run the PBS job in interactive! This is |
... | @@ -715,9 +715,9 @@ To make figuring this out easier, we can run the PBS job in interactive! This is |
|
|
After loading the mpi-sgi, mpi-hpcx, and comp-intel modules, here's what running ior shows:
|
|
After loading the mpi-sgi, mpi-hpcx, and comp-intel modules, here's what running ior shows:
|
|
|
|
|
|
|
|
```
|
|
```
|
|
|
PBS r301i5n3:~> cd ior-3.3.0/
|
|
PBS *****:~> cd ior-3.3.0/
|
|
|
PBS r301i5n3:~/ior-3.3.0> ./src/ior
|
|
PBS *****:~/ior-3.3.0> ./src/ior
|
|
|
[r301i5n3:20237:0:20237] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xe5)
|
|
[*****:20237:0:20237] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xe5)
|
|
|
==== backtrace ====
|
|
==== backtrace ====
|
|
|
0 /nasa/hpcx/2.4.0_mt/ucx/install/lib/libucs.so.0(+0x1d98c) [0x2aaabb7a498c]
|
|
0 /nasa/hpcx/2.4.0_mt/ucx/install/lib/libucs.so.0(+0x1d98c) [0x2aaabb7a498c]
|
|
|
1 /nasa/hpcx/2.4.0_mt/ucx/install/lib/libucs.so.0(+0x1dbfb) [0x2aaabb7a4bfb]
|
|
1 /nasa/hpcx/2.4.0_mt/ucx/install/lib/libucs.so.0(+0x1dbfb) [0x2aaabb7a4bfb]
|
| ... | @@ -730,8 +730,8 @@ Segmentation fault (core dumped) |
... | @@ -730,8 +730,8 @@ Segmentation fault (core dumped) |
|
|
```
|
|
```
|
|
|
|
|
|
|
|
```
|
|
```
|
|
|
PBS r301i5n3:~/ior-3.3.0/src> mpirun -n 1 ./ior
|
|
PBS *****:~/ior-3.3.0/src> mpirun -n 1 ./ior
|
|
|
[r301i5n3:20556:0:20556] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xe5)
|
|
[*****:20556:0:20556] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xe5)
|
|
|
==== backtrace ====
|
|
==== backtrace ====
|
|
|
0 /nasa/hpcx/2.4.0_mt/ucx/install/lib/libucs.so.0(+0x1d98c) [0x2aaabb7a498c]
|
|
0 /nasa/hpcx/2.4.0_mt/ucx/install/lib/libucs.so.0(+0x1d98c) [0x2aaabb7a498c]
|
|
|
1 /nasa/hpcx/2.4.0_mt/ucx/install/lib/libucs.so.0(+0x1dbfb) [0x2aaabb7a4bfb]
|
|
1 /nasa/hpcx/2.4.0_mt/ucx/install/lib/libucs.so.0(+0x1dbfb) [0x2aaabb7a4bfb]
|
| ... | @@ -745,7 +745,7 @@ Primary job terminated normally, but 1 process returned |
... | @@ -745,7 +745,7 @@ Primary job terminated normally, but 1 process returned |
|
|
a non-zero exit code. Per user-direction, the job has been aborted.
|
|
a non-zero exit code. Per user-direction, the job has been aborted.
|
|
|
--------------------------------------------------------------------------
|
|
--------------------------------------------------------------------------
|
|
|
--------------------------------------------------------------------------
|
|
--------------------------------------------------------------------------
|
|
|
mpirun noticed that process rank 0 with PID 0 on node r301i5n3 exited on signal 11 (Segmentation fault).
|
|
mpirun noticed that process rank 0 with PID 0 on node ***** exited on signal 11 (Segmentation fault).
|
|
|
--------------------------------------------------------------------------
|
|
--------------------------------------------------------------------------
|
|
|
```
|
|
```
|
|
|
|
|
|
| ... | @@ -786,8 +786,37 @@ Right now I have some hard-coded values in the Python script itself for the para |
... | @@ -786,8 +786,37 @@ Right now I have some hard-coded values in the Python script itself for the para |
|
|
|
|
|
|
|
To more easily validate and see the effect of the different parameters, it would help to have code that converts the IOR outputs into tables then graphs them!
|
|
To more easily validate and see the effect of the different parameters, it would help to have code that converts the IOR outputs into tables then graphs them!
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
> ###### [fddf049](ssh///https://gitlab-ssh.nautilus.optiputer.net/30622/parkeraddison/filesystem-benchmarks/commit/fddf04911b8fb2af9f03f07a1a1f5c6dbd49cdc7)
|
|
> ###### [fddf049](ssh///https://gitlab-ssh.nautilus.optiputer.net/30622/parkeraddison/filesystem-benchmarks/commit/fddf04911b8fb2af9f03f07a1a1f5c6dbd49cdc7)
|
|
|
> "Facilitate running IOR with a parameter sweep" | HEAD -> main | 2021-03-30
|
|
> "Facilitate running IOR with a parameter sweep" | HEAD -> main | 2021-03-30
|
|
|
|
|
|
|
|
|
###### 04/04
|
|
|
|
|
|
|
|
Following https://www.nas.nasa.gov/hecc/support/kb/secure-setup-for-using-jupyter-notebook-on-nas-systems_622.html and https://www.nas.nasa.gov/hecc/support/kb/using-jupyter-notebook-for-machine-learning-development-on-nas-systems_576.html, I can do my visualization work in a Jupyter notebook running on a compute node.
|
|
|
|
|
|
|
|
When going through the setup steps, I used the pyt1_8 environment -- I'm not sure if 'pyt' stands for anything besides 'Python' or what the number denotes; I imagine that the tf... environments are for TensorFlow. But regardless, I checked and pyt1_8 has jupyter and Python 3.9, along with pandas, numpy, scipy, and matplotlib, so it'll work well!
|
|
|
|
|
|
|
|
###### 04/05
|
|
|
|
|
|
|
|
Okay, yesterday I ended up giving up on NAS Jupyter because I kept running into SSL errors when trying to actually go to localhost and access the lab. After multiple attempts today, trying different environments, following all steps again, etc, I've realized **the problem was Chrome** -- after *switching to Firefox to view Jupyter* all is fine.
|
|
|
|
|
|
|
|
Also, now that I can finally use Jupyter for development, it's worth remembering the following helpful CSS rule, which can be injected to add an 80ch ruler to the JupyterLab code editor:
|
|
|
|
```css
.CodeMirror-line::after {
    content: '';
    position: absolute;
    left: 88ex;
    border-left: 1px dashed gray;
}
```
|
|
|
|
|
|
|
|
Here are the descriptions of the different NAS ML conda environments: https://www.nas.nasa.gov/hecc/support/kb/machine-learning-overview_572.html. Looks like 'pyt' stands for PyTorch (d'oh). It also looks like the /nasa `jupyterlab` environment doesn't have matplotlib. The machine learning environments do, however. So in the future I'll start up the lab from that environment. I could also go ahead and create my own virtual environment probably... but I really don't need to! Having PyTorch or TensorFlow is overkill, but that is fine by me ;)
|
|
|
|
|
|
|
|
It turns out IOR has a few different output formats, including JSON and CSV, which make life a lot easier. I had been trying to parse the human-readable output but ran into issues with whitespace delimiting. The JSON output looks like the best option (in my opinion), since it's easy to access exactly what you need and it doesn't hide any information. Side note - I wonder if YAML will ever take over JSON's place in society...
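For reference, here is a rough sketch of how a sweep script might ask IOR for JSON output. To my knowledge IOR 3.3 accepts `-O summaryFormat=JSON` and `-O summaryFile=<path>`, but that is worth confirming against `ior -h` on the installed build. The helper name `build_ior_command` and the example paths are hypothetical, not taken from the actual script.

```python
from typing import List, Optional


def build_ior_command(ior_path: str, summary_file: str,
                      extra_args: Optional[List[str]] = None) -> List[str]:
    """Build an argument list asking IOR to write its summary as JSON.

    The two -O options are assumed to be supported by the installed IOR
    build; check `ior -h` before relying on them.
    """
    command = [
        ior_path,
        "-O", "summaryFormat=JSON",
        "-O", f"summaryFile={summary_file}",
    ]
    # Any sweep-specific options (transfer size, block size, ...) go last.
    command.extend(extra_args or [])
    return command


if __name__ == "__main__":
    # Hypothetical paths and sizes, for illustration only.
    args = build_ior_command("./src/ior", "out.json", ["-t", "1m", "-b", "16m"])
    print(" ".join(args))
```

Returning a list instead of a single string means the result can be passed straight to `subprocess.run` without worrying about shell quoting.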
|
|
|
|
|
|
|
|
###### 04/06
|
|
|
|
|
|
|
|
Now that I'm using the JSON output from IOR, everything is much more straightforward when it comes to parsing. I polished up a file to parse and plot outputs. I think it's time now to actually do some larger-scale runs so we can make sure what we're getting as a result makes sense.
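A minimal sketch of what the parsing step can look like, assuming the JSON document has a top-level `summary` list whose entries carry keys such as `operation`, `bwMeanMIB`, `bwStdMIB`, and `numTasks`. Those key names are my assumption about IOR's JSON layout and should be checked against a real output file. The function names are hypothetical, and the sample values are made up purely to exercise the code.

```python
import json
from typing import Any, Dict, List

# Keys ASSUMED to appear in each entry of IOR's JSON "summary" list.
ASSUMED_KEYS = ("operation", "bwMeanMIB", "bwStdMIB", "numTasks")


def summary_rows(ior_output: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the "summary" entries into plain rows, one per operation.

    Missing keys become None so that every row has the same columns.
    """
    rows = []
    for entry in ior_output.get("summary", []):
        rows.append({key: entry.get(key) for key in ASSUMED_KEYS})
    return rows


def load_summary(path: str) -> List[Dict[str, Any]]:
    """Read one IOR JSON output file and return its summary rows."""
    with open(path) as handle:
        return summary_rows(json.load(handle))


if __name__ == "__main__":
    # Made-up sample, NOT real benchmark output.
    sample = json.loads(
        '{"summary": [{"operation": "write", "bwMeanMIB": 100.0,'
        ' "bwStdMIB": 5.0, "numTasks": 4}]}'
    )
    for row in summary_rows(sample):
        print(row)
```

A list of uniform dicts like this can be handed directly to `pandas.DataFrame(rows)` for tabulating and plotting.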
|
|
|
|
|
|
|
|
> ###### [9aa37d7](ssh///https://gitlab-ssh.nautilus.optiputer.net/30622/parkeraddison/filesystem-benchmarks/commit/9aa37d75a8cb70b06023251e065a64795cb58569)
|
|
|
|
> "Output parsing and plotting functions complete" | HEAD -> main | 2021-04-06
|
|
|
|
|