- [Understanding performance](#understanding-performance)
- [Darshan on PRP](#darshan-on-prp)
- [Darshan to observe an ML application](#darshan-to-observe-an-ml-application)
- [I/O Behavior of Flight Anomaly Detection](#io-behavior-of-flight-anomaly-detection)

<!-- vim-markdown-toc -->
Turns out, Darshan *was* working, but there are a few things to consider:

- The environment variable `DARSHAN_ENABLE_NONMPI` needs to be set (it can be empty)
- I think UTC is used for the log dates, so sometimes you need to look under the next day's log directory

```bash
env DARSHAN_ENABLE_NONMPI= LD_PRELOAD=/usr/local/lib/libdarshan.so python script.py
```
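On the UTC point, a quick way to figure out which day's directory to look under. The log root path and the `<year>/<month>/<day>` layout here are assumptions about how darshan-runtime was configured in this image, so adjust as needed:

```bash
# Log root is a placeholder -- use whatever --with-log-path was set to when
# darshan-runtime was built
LOGROOT=/darshan-logs

# Per-day directories seem to be dated in UTC; %-m/%-d drops zero padding
# (GNU date) -- tweak if your layout is padded or organized differently
ls "$LOGROOT/$(date -u +%Y/%-m/%-d)"
```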
> ###### ddb3cc2587ee8151dd88fa956d9d00a37c81a64f
> "Prompt image build" | HEAD -> main | 2021-05-24

###### 05/25
Alright, I've figured out the images: I have a deployment with PyTorch and Darshan running, and I've copied over the flight anomaly code and data. Let's run it once to make sure it does indeed run.

```bash
python main_CCLP.py -e 1 -v 1
```

Ha! It does!
Okie dokes, now time to try monitoring it with Darshan.

```bash
env DARSHAN_ENABLE_NONMPI= LD_PRELOAD=/usr/local/lib/libdarshan.so python main_CCLP.py -e 1
```

Sweet, now to examine the Darshan logs.
### I/O Behavior of Flight Anomaly Detection

I can create a human-readable text dump of the log with `darshan-parser`, but I should also have PyDarshan installed in this image, so let's try to use it! Hmm, `import darshan` complained. When installing darshan-util, I should run `./configure` with `--enable-pydarshan --enable-shared`.
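As an aside, the `darshan-parser` route is just a couple of darshan-util commands (the log filename below is a placeholder):

```bash
# Log filename is a placeholder -- grab the real one from the log directory
darshan-parser my_run.darshan > my_run_dump.txt

# Produces the standard Darshan PDF report (needs gnuplot and a LaTeX
# toolchain in the image, if I remember right)
darshan-job-summary.pl my_run.darshan
```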
Then I can read in a report in a Python shell as follows, telling it to read in all records (POSIX, MPI-IO, STDIO, etc.):

```python
import darshan

# Open the log without eagerly parsing every record...
report = darshan.DarshanReport('filename', read_all=False)

# ...then pull in the records from every captured module
report.read_all_generic_records()
```
Within the report there are multiple records. We can see what records we have with `report.info()`, access them through `report.records['API']`, and then run things like `record.info(plot=True)`.
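Spelled out, that access pattern looks roughly like this -- the `'POSIX'` key and whether `info()` hangs off the collection or an individual record may vary by PyDarshan version, so treat it as a paraphrase of the notes above rather than gospel:

```python
# See which modules/records the log actually captured
report.info()

# 'POSIX' is assumed to be one of the module keys listed by info()
posix_records = report.records['POSIX']

# As described above; plot=True is what drags in the IPython display dependency
posix_records.info(plot=True)
```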
However, this relies on an implied IPython environment, since it uses `display`. I'll try installing jupyter into this image. Seems to be working like a charm!

```bash
pip install jupyter # Pod
jupyter notebook

k port-forward podname 8888:8888 # Local
```

Oh my, I'm always reminded just how much I *absolutely love* working in Jupyter notebooks :')

Great! This is awesome. I have the data and can play around with it.
Okay, now to do a run on multiple epochs. Also it's worth noting this is just the *training* process we're monitoring -- the preprocessing stage is entirely separate.

I'd be really interested in seeing **how much of the total runtime was spent waiting for I/O**.
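I haven't found a ready-made aggregator for that yet, so here's a rough sketch of my own: sum the POSIX timing counters out of a `darshan-parser` text dump and divide by the wall time. The column layout and counter names are taken from the Darshan docs, so double-check them against an actual dump; the wall time would come from the `run time` line in the dump header.

```python
# Rough estimate of the fraction of wall time spent in I/O for a
# single-process run, from a darshan-parser text dump.
# Assumes the usual darshan-parser record layout:
#   <module> <rank> <record id> <counter> <value> <file> <mount pt> <fs type>
import sys

IO_TIME_COUNTERS = {"POSIX_F_READ_TIME", "POSIX_F_WRITE_TIME", "POSIX_F_META_TIME"}

def io_fraction(dump_path, wall_time_seconds):
    io_seconds = 0.0
    with open(dump_path) as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue  # skip header/comment lines
            parts = line.split()
            if len(parts) >= 5 and parts[3] in IO_TIME_COUNTERS:
                io_seconds += float(parts[4])
    return io_seconds / wall_time_seconds

if __name__ == "__main__":
    # e.g. python io_fraction.py my_run_dump.txt 142.7
    print(f"{io_fraction(sys.argv[1], float(sys.argv[2])):.1%} of runtime in I/O")
```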
Looks like I'll want to use their experimental aggregators: https://www.mcs.anl.gov/research/projects/darshan/docs/pydarshan/api/pydarshan/darshan.experimental.aggregators.html. They don't return plots (so actually I guess we don't need jupyter, but I'm still going to use it), so we'll want to write some plotting code to visualize.

```python
darshan.enable_experimental(True)

# IO Size Histogram, given the API ('module')
report.mod_agg_iohist('POSIX')

# Cumulative number of operations
report.agg_ioops()
```
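For the access-size part, a minimal matplotlib sketch like the following should do -- assuming `mod_agg_iohist('POSIX')` hands back a flat dict-like mapping of size-bin labels to counts (I haven't pinned down the exact return shape, so massage it into that form first if it's nested):

```python
import matplotlib.pyplot as plt

# Assumption: a flat mapping of access-size bin label -> operation count
hist = report.mod_agg_iohist('POSIX')

labels = list(hist.keys())
counts = [hist[label] for label in labels]

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(range(len(labels)), counts)
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=45, ha='right')
ax.set_ylabel('Number of operations')
ax.set_title('POSIX access sizes')
fig.tight_layout()
fig.savefig('posix_access_sizes.png')  # or just let Jupyter render it inline
```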
It seems like I can basically call them all using `.summarize()`, then access the results with `.summary`:

```python
report.summarize()
report.summary
```
Plotting a hist/bar of access sizes is easy enough. How about the timeline? Here are the plots I want to replicate: https://www.mcs.anl.gov/research/projects/darshan/docs/ssnyder_ior-hdf5_id3655016_9-23-29011-12333993518351519212_1.darshan.pdf
> ###### 241c061caced2a176cd50628d9c4954a267152a8
> "Fix PyDarshan installation" | HEAD -> main | 2021-05-25