@@ -42,6 +42,9 @@
- [Shared volume](#shared-volume)
- [Node selection](#node-selection)
- [PRP SeaweedFS](#prp-seaweedfs)
- [Single node CephFS parameter sweep](#single-node-cephfs-parameter-sweep)
- [Attach VSCode to kubernetes pod](#attach-vscode-to-kubernetes-pod)
- [IOR scripts and outputs](#ior-scripts-and-outputs)

<!-- vim-markdown-toc -->

@@ -1264,6 +1267,63 @@ Dima mentioned there are some issues with CephFS at the moment and heavy usage i

https://pacificresearchplatform.org/userdocs/storage/seaweedfs/

> ###### ab474f1db0cedc44951f190a509d34a7194f1c76
> "SeaweedFS volume" | HEAD -> main | 2021-04-27

---

Running into some issues with SeaweedFS. I created the PVC, but when I created a deployment, the pod failed to mount it.

Later in the day I tried again, and this time the PVC itself failed to provision:

```
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ExternalProvisioning 2m46s (x26 over 8m46s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "seaweedfs-csi-driver" or manually created by system administrator
Normal Provisioning 15s (x10 over 8m46s) seaweedfs-csi-driver_csi-seaweedfs-controller-0_7da00a20-3339-4cce-a620-44a28c9b6d7d External provisioner is provisioning volume for claim "usra-hpc/shared-seaweedfs"
Warning ProvisioningFailed 15s (x10 over 8m46s) seaweedfs-csi-driver_csi-seaweedfs-controller-0_7da00a20-3339-4cce-a620-44a28c9b6d7d failed to provision volume with StorageClass "seaweedfs-storage": rpc error: code = Unknown desc = Error setting bucket metadata: mkdir /buckets/pvc-439d990a-501a-4801-99b8-d5163aedbdf8: CreateEntry: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 10.98.219.214:18888: connect: connection refused"
```

Then, after a while of waiting, it magically worked. The deployment then failed to mount again, and after another wait that managed to work too...

```
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> Successfully assigned usra-hpc/filebench-seaweedfs-85cf7589b5-6f4pd to suncave-11
Normal SuccessfulAttachVolume 7m39s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-439d990a-501a-4801-99b8-d5163aedbdf8"
Warning FailedMount 3m45s (x9 over 7m13s) kubelet, suncave-11 MountVolume.SetUp failed for volume "pvc-439d990a-501a-4801-99b8-d5163aedbdf8" : rpc error: code = Internal desc = Timeout waiting for mount
Warning FailedMount 3m19s (x2 over 5m36s) kubelet, suncave-11 Unable to attach or mount volumes: unmounted volumes=[shared-seaweedfs], unattached volumes=[shared-seaweedfs default-token-nqkfj]: timed out waiting for the condition
Normal Pulling 101s kubelet, suncave-11 Pulling image "localhost:30081/parkeraddison/kube-openmpi-ior"
Normal Pulled 100s kubelet, suncave-11 Successfully pulled image "localhost:30081/parkeraddison/kube-openmpi-ior" in 1.209529308s
Normal Created 100s kubelet, suncave-11 Created container filebench-seaweedfs
Normal Started 100s kubelet, suncave-11 Started container filebench-seaweedfs
```
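
Since both the provisioning and the mount eventually succeeded after a wait, one option is to poll the claim until it reports `Bound` before creating the deployment. This is a minimal sketch and not what was actually run: `pvc_phase` and `wait_until` are hypothetical helpers, the timeout and interval are arbitrary, and it assumes `kubectl` is on the path and configured for the cluster.

```python
import subprocess
import time


def pvc_phase(name, namespace):
    """Return the claim's status.phase (e.g. "Pending" or "Bound") via kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "pvc", name, "-n", namespace,
         "-o", "jsonpath={.status.phase}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_until(check, timeout=600.0, interval=10.0, sleep=time.sleep):
    """Call check() every `interval` seconds until it returns True.

    Returns True on success, or False once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

For the claim in the events above, this could be called as `wait_until(lambda: pvc_phase("shared-seaweedfs", "usra-hpc") == "Bound")`.
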

Both times I eventually ran into an issue where any filesystem operation (e.g. `ls`) would hang. Sometimes these operations completed after a while; other times I got impatient and killed the process, which seemed to un-hang things. Really not sure what's going on there.
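
When probing a mount that may hang, it can help to bound how long a command is allowed to block. The following is a minimal sketch with a hypothetical helper, `run_with_timeout`, and an arbitrary time limit. A process stuck in uninterruptible I/O may not respond to being killed, so this is a convenience and not a fix for the underlying problem.

```python
import subprocess


def run_with_timeout(cmd, seconds):
    """Run cmd and return its stdout, or None if it exceeds `seconds`.

    subprocess.run kills the child process when the timeout expires.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        return None
    return result.stdout
```

For example, `run_with_timeout(["ls", "/path/to/mount"], 30)` would return `None` if the listing had not finished after 30 seconds (the mount path here is a placeholder).
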

## Single node CephFS parameter sweep

When doing my single node parameter sweep, I came across a bunch of helpful things to keep in mind for the future:

### Attach VSCode to kubernetes pod

This is very useful for editing files inside a pod without needing to install vim on that pod. With the VS Code Kubernetes extension installed, we can:

- Command palette: `View: Show Kubernetes`
- Kubernetes cluster panel: `nautilus > Workloads > Pods`
- Right-click the pod name: `Attach Visual Studio Code`

This opens a new window, takes care of all the port forwarding, and lets you open remote folders and files just as you would on any other remote SSH host!

The only caveat I've noticed so far: the integrated terminal doesn't handle text wrapping well. As always, I recommend using a separate terminal window in general.

### IOR scripts and outputs

Rather than using the Python parameter_sweep script, I just whipped up a very tiny amount of code to populate an [IOR script](https://github.com/hpc/ior/blob/0ed2dc318724846b33237080442eac9fae528e04/doc/USER_GUIDE#L394) and then ran that via `ior -f path/to/script`. This is similar to how it's done in Glenn Lockwood's [TOKIO-ABC](https://github.com/NERSC/tokio-abc). **Using IOR scripts is the way to go**. Multiple tests with different parameters can be defined at once in a portable file and then shared between systems without a Python dependency.
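
To illustrate what populating an IOR script can look like, here is a minimal sketch; it is not the actual code used for these runs. The helper name `build_ior_script`, the parameter values, and the test file path are made up for illustration. The `IOR START` / `RUN` / `IOR STOP` layout follows the script format described in the IOR user guide linked above.

```python
from itertools import product


def build_ior_script(common, sweep):
    """Return IOR script text with one RUN per combination of swept values.

    common: dict of directives shared by every test.
    sweep: dict mapping a directive name to the list of values to try.
    """
    lines = ["IOR START"]
    lines += [f"    {key}={value}" for key, value in common.items()]
    names = list(sweep)
    for values in product(*(sweep[name] for name in names)):
        # Re-state every swept directive so each test is explicit.
        lines += [f"    {name}={value}" for name, value in zip(names, values)]
        lines.append("    RUN")
    lines.append("IOR STOP")
    return "\n".join(lines) + "\n"


# Hypothetical sweep: two transfer sizes by two block sizes gives four tests.
script = build_ior_script(
    common={"api": "POSIX", "testFile": "/path/to/testfile", "repetitions": 3},
    sweep={"transferSize": ["64k", "1m"], "blockSize": ["16m", "64m"]},
)
print(script)
```

The resulting text can be written to a file and handed to `ior -f`, so the generator is only needed where the script is authored, not on the system where it runs.
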

On NAS at the moment I still need to use the Python orchestration in order to set the Lustre stripe sizes/counts before each run... at least until the IOR Lustre options are fixed.
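
That orchestration roughly amounts to running `lfs setstripe` on the output directory before each IOR invocation. The following is a minimal sketch under stated assumptions: the helper names, the directory, and the stripe settings are all hypothetical, and the commands are only constructed here. On a Lustre system each one could be passed to `subprocess.run`, with the IOR command wrapped in whatever MPI launcher the site uses.

```python
def setstripe_command(directory, stripe_count, stripe_size):
    """Build an `lfs setstripe` command for new files created in `directory`.

    -c sets the stripe count and -S sets the stripe size (e.g. "1m").
    """
    return ["lfs", "setstripe", "-c", str(stripe_count), "-S", stripe_size, directory]


def plan_runs(directory, script_path, stripe_settings):
    """Yield (setstripe command, ior command) pairs, one per stripe setting."""
    for count, size in stripe_settings:
        yield setstripe_command(directory, count, size), ["ior", "-f", script_path]


# Hypothetical settings: (stripe count, stripe size) pairs.
for stripe_cmd, ior_cmd in plan_runs("/path/to/output", "path/to/script", [(1, "1m"), (4, "4m")]):
    print(" ".join(stripe_cmd), "&&", " ".join(ior_cmd))
```
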

- [ ] ==NOTE== **I'm curious... does my previous Lustre striping workaround still work when there are multiple repetitions? I don't recall checking...**

Also, **latency output (JSON) is measured in seconds**.
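
Since the unit is easy to forget when plotting, one option is to convert right after loading the results. This is a minimal sketch: the flat record layout and the field names are hypothetical and are not IOR's actual JSON schema.

```python
def latency_to_ms(record, fields=("latency",)):
    """Return a copy of record with the given second-valued fields in milliseconds."""
    converted = dict(record)
    for field in fields:
        converted[field] = record[field] * 1000.0
    return converted


# Made-up example record; the latency value is in seconds.
row = {"access": "write", "latency": 0.0025}
print(latency_to_ms(row))
```
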

> ###### e5bc2cf2836df6c51b96bbd18bfe828595154d51
> "Run using IOR scripts; PRP ceph and seaweed" | HEAD -> main | 2021-04-28