Migrate to FileStorage (RWX) for Imaging Kubernetes deployments


Applicable when upgrading to ≥ 3.6.0-funcrel

Overview

This guide describes how to convert the pvc-shared-datadir PVC used by the analysis-node from DiskStorage (ReadWriteOnce/RWO) to FileStorage (ReadWriteMany/RWX) on an existing CAST Imaging deployment on Kubernetes. This conversion is required as part of updating to CAST Imaging 3.6.x-funcrel when your deployment currently uses AnalysisNodeFS.enable=false. Once complete, continue with the standard Cloud services via Kubernetes update process.

Applicability

This procedure applies only if all of the following are true:

  • You are upgrading from CAST Imaging < 3.6.0-funcrel to 3.6.0-funcrel or later
  • Your current deployment uses AnalysisNodeFS.enable=false

It applies whether or not you plan to use multiple analysis nodes in the future.

Important note

During this procedure, the analysis-node will be shut down and no analyses will be able to run. imaging-services, dashboards, and imaging-viewer will remain available throughout.

Prerequisites

  • A local machine with the kubectl tool installed and remote access to the Kubernetes cluster running your CAST Imaging instance
  • The examples below assume the namespace is castimaging-v3 - replace this with your actual namespace

Conversion procedure

Step 1 - Prepare the migration

Run helm upgrade with the following values:

prepareMigrationToAnalysisNodeFS: true
AnalysisNodeReplicaCount: 0
AnalysisNodeFS:
  enable: false
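As an illustration, the invocation might look like the following; the release name castimaging and the chart reference are placeholders for your actual deployment, and --reuse-values is one way to keep your other existing settings unchanged:

```shell
# Hypothetical example - substitute your actual release name and chart reference.
# --reuse-values preserves the other settings from the current release.
helm upgrade castimaging <your-chart-reference> \
  -n castimaging-v3 \
  --reuse-values \
  --set prepareMigrationToAnalysisNodeFS=true \
  --set AnalysisNodeReplicaCount=0 \
  --set AnalysisNodeFS.enable=false
```

The same pattern applies to the helm upgrade commands in the later steps, with the values adjusted accordingly.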

This will:

  • Scale down the analysis-node
  • Create a new pvc-shared-datadir-mig volume (FileStorage)
  • Run a job to copy the current volume content to pvc-shared-datadir-mig
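You can confirm that the new volume was created by listing the PVCs; once bound, pvc-shared-datadir-mig should show the RWX access mode:

```shell
# pvc-shared-datadir-mig should show STATUS Bound and ACCESS MODES RWX
kubectl get pvc -n castimaging-v3
```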

Step 2 - Verify the copy job succeeded

Check the status of the save-pvc-data-xxxxx pod - it should show Completed. To find its name:

kubectl get pods -n castimaging-v3

To view the log:

kubectl logs save-pvc-data-xxxxx -n castimaging-v3
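Instead of polling the pod status manually, you can block until the job finishes; the timeout below is an assumption - size it to your data volume:

```shell
# Waits until the copy job reports completion; adjust the timeout to your data size.
kubectl wait --for=condition=complete job/save-pvc-data \
  -n castimaging-v3 --timeout=60m
```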

If the job failed (i.e. the pod does not show Completed), follow these recovery steps:

a) Examine the pod status, log file, and cluster events to identify and fix the issue. One possible cause is a File Storage provisioning problem - check cluster events and provisioner logs.
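Typical diagnostic commands for this step (replace xxxxx with the pod suffix found in Step 2):

```shell
# Inspect the failed pod's events and container statuses
kubectl describe pod save-pvc-data-xxxxx -n castimaging-v3
# Review recent cluster events, most recent last
kubectl get events -n castimaging-v3 --sort-by=.lastTimestamp
```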

b) Once the issue is resolved, delete the failed job, the migration PVC, and the storage class:

kubectl delete job save-pvc-data -n castimaging-v3
kubectl delete pvc pvc-shared-datadir-mig -n castimaging-v3
kubectl delete sc castimaging-fs

c) Re-run helm upgrade with the same values as in Step 1. If the job fails again, repeat from step a) above until it completes successfully.

Step 3 - Delete the old PVC

Once the save-pvc-data-xxxxx job has succeeded and any warnings/failures in the log have been resolved, delete the job and the old PVC:

kubectl delete job save-pvc-data -n castimaging-v3
kubectl delete pvc pvc-shared-datadir -n castimaging-v3

Step 4 - Restart the analysis-node

Run helm upgrade with the following values:

prepareMigrationToAnalysisNodeFS: true
AnalysisNodeReplicaCount: 1
AnalysisNodeFS:
  enable: true

This will restart the analysis-node using the new FileStorage PVC and copy the saved data from pvc-shared-datadir-mig to pvc-shared-datadir via an init-container.
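To follow the restore, you can watch the pod start and read the logs of all its containers, including the init-container (its exact name depends on the chart, so --all-containers avoids having to guess it):

```shell
# Watch the analysis-node pod come up (Init:* phases indicate the data copy)
kubectl get pods -n castimaging-v3 -w
# Once the pod exists, read logs from every container, including the init-container
kubectl logs <analysis-node-pod-name> -n castimaging-v3 --all-containers=true
```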

Step 5 - Global health check

Check pod statuses, open CAST Imaging, and confirm all services are up.
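One quick way to spot unhealthy pods is to filter out the Running ones; an empty result (apart from Succeeded entries, which are finished jobs) means all services are up:

```shell
# Lists only pods that are not in the Running phase
kubectl get pods -n castimaging-v3 --field-selector=status.phase!=Running
```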

Step 6 - Cleanup

Run helm upgrade with the following values:

prepareMigrationToAnalysisNodeFS: false
AnalysisNodeReplicaCount: 1
AnalysisNodeFS:
  enable: true

This will delete the pvc-shared-datadir-mig PVC and perform a final restart of the analysis-node.
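You can verify the cleanup by listing the PVCs again; only the new pvc-shared-datadir should remain:

```shell
# pvc-shared-datadir-mig should no longer be listed
kubectl get pvc -n castimaging-v3
```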


The FileStorage migration is now complete. Continue with the standard Cloud services via Kubernetes update process to finish updating to the new release.

Appendix

Cancelling the FileStorage migration process

When reviewing the save-pvc-data-xxxxx job logs as described in Step 2, if an issue is reported in the log that you cannot immediately resolve and you need more time to decide how to proceed, you can cancel the conversion process and attempt it again later. To do so:

  • Run helm upgrade with the following values (this will bring the analysis node back online and restore the initial state):
prepareMigrationToAnalysisNodeFS: false
AnalysisNodeReplicaCount: 1
AnalysisNodeFS:
  enable: false
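After this rollback, you can confirm that the analysis-node pod is back and that the original volume is untouched:

```shell
kubectl get pods -n castimaging-v3   # analysis-node pod should return to Running
kubectl get pvc -n castimaging-v3    # pvc-shared-datadir should still be Bound
```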