This guide walks through creating an Amazon EFS file system and connecting it to your EKS cluster. The EFS CSI Driver was already installed as an addon via eksctl.yaml during cluster creation. Now we need to create the actual file system and make it available to Kubernetes workloads.
This file system will be used by Dynamo to store shared model weights and a compilation cache across nodes.
- EKS cluster created following the README
- Environment variables set:
export AWS_REGION="us-east-1"
export CLUSTER_NAME="ai-dynamo"
export DYNAMO_NAMESPACE="dynamo-system"
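Before creating the file system, you can verify the driver is present. This check assumes it was installed as the EKS managed add-on named aws-efs-csi-driver (the standard add-on name when installed via eksctl):
aws eks describe-addon \
--cluster-name $CLUSTER_NAME \
--addon-name aws-efs-csi-driver \
--region $AWS_REGION \
--query "addon.status" \
--output text
The output should be ACTIVE.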
Get the VPC ID associated with your EKS cluster:
export VPC_ID=$(aws eks describe-cluster \
--name $CLUSTER_NAME \
--region $AWS_REGION \
--query "cluster.resourcesVpcConfig.vpcId" \
--output text)
Get the CIDR range for the VPC (used for the security group rule):
export VPC_CIDR=$(aws ec2 describe-vpcs \
--vpc-ids $VPC_ID \
--query "Vpcs[0].CidrBlock" \
--output text)
Create a security group that allows NFS traffic (port 2049) from within the VPC:
export EFS_SG_ID=$(aws ec2 create-security-group \
--group-name dynamo-efs-sg \
--description "Security group for EFS access from EKS" \
--vpc-id $VPC_ID \
--region $AWS_REGION \
--query "GroupId" \
--output text)
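If you are re-running this guide and the security group already exists, the create call above will fail. An optional alternative is to look up the existing group by name instead (a sketch, assuming the same group name dynamo-efs-sg):
export EFS_SG_ID=$(aws ec2 describe-security-groups \
--filters Name=group-name,Values=dynamo-efs-sg Name=vpc-id,Values=$VPC_ID \
--region $AWS_REGION \
--query "SecurityGroups[0].GroupId" \
--output text)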
Add an inbound rule to allow NFS traffic from the VPC CIDR:
aws ec2 authorize-security-group-ingress \
--group-id $EFS_SG_ID \
--protocol tcp \
--port 2049 \
--cidr $VPC_CIDR \
--region $AWS_REGION
Create an encrypted EFS file system with elastic throughput:
export EFS_FS_ID=$(aws efs create-file-system \
--performance-mode generalPurpose \
--throughput-mode elastic \
--encrypted \
--region $AWS_REGION \
--tags Key=Name,Value=dynamo-efs \
--query "FileSystemId" \
--output text)
Wait for the file system to become available:
aws efs describe-file-systems \
--file-system-id $EFS_FS_ID \
--region $AWS_REGION \
--query "FileSystems[0].LifeCycleState" \
--output text
You should see available before proceeding.
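If you prefer to poll until the file system is ready instead of re-running the command, a minimal optional sketch:
while [ "$(aws efs describe-file-systems --file-system-id $EFS_FS_ID --region $AWS_REGION --query "FileSystems[0].LifeCycleState" --output text)" != "available" ]; do
echo "Waiting for EFS file system $EFS_FS_ID to become available..."
sleep 10
done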
Mount targets allow your EKS nodes to access the EFS file system. You need one mount target per subnet where your nodes run.
Get the subnet IDs used by your EKS cluster:
export SUBNET_IDS=$(aws eks describe-cluster \
--name $CLUSTER_NAME \
--region $AWS_REGION \
--query "cluster.resourcesVpcConfig.subnetIds[]" \
--output text)
echo "Subnet IDs: $SUBNET_IDS"Create a mount target in each subnet:
for SUBNET_ID in $(echo "$SUBNET_IDS" | tr '\t' '\n'); do
echo "Creating mount target in subnet: $SUBNET_ID"
aws efs create-mount-target \
--file-system-id $EFS_FS_ID \
--subnet-id $SUBNET_ID \
--security-groups $EFS_SG_ID \
--region $AWS_REGION 2>/dev/null || echo " Mount target already exists or subnet is in a duplicate AZ (this is OK)"
done
Note: EFS allows only one mount target per Availability Zone. If multiple subnets are in the same AZ, the command will fail for the duplicates, which is expected and safe to ignore.
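To see up front which subnets share an Availability Zone (and therefore which of the create-mount-target calls are expected to fail), you can optionally list the subnets with their AZs:
aws ec2 describe-subnets \
--subnet-ids $SUBNET_IDS \
--query "Subnets[*].{SubnetId:SubnetId,AZ:AvailabilityZone}" \
--output table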
Verify mount targets are available:
aws efs describe-mount-targets \
--file-system-id $EFS_FS_ID \
--region $AWS_REGION \
--query "MountTargets[*].{SubnetId:SubnetId,AZ:AvailabilityZoneName,State:LifeCycleState}" \
--output table
Wait until all mount targets show available in the State column before proceeding.
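As with the file system, you can optionally poll until every mount target reports available; a minimal sketch:
while [ "$(aws efs describe-mount-targets --file-system-id $EFS_FS_ID --region $AWS_REGION --query "length(MountTargets[?LifeCycleState!='available'])" --output text)" != "0" ]; do
echo "Waiting for all mount targets to become available..."
sleep 10
done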
Create a StorageClass that uses the EFS CSI driver with dynamic provisioning:
kubectl apply -f - << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc-dynamic
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: "${EFS_FS_ID}"
  directoryPerms: "777"
  uid: "1000"
  gid: "1000"
EOF
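You can confirm the StorageClass was created:
kubectl get storageclass efs-sc-dynamic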
We create three separate PVCs because different Dynamo recipe examples reference each one individually:
- model-cache stores downloaded model weights (e.g. from HuggingFace).
- compilation-cache stores vLLM/TRT-LLM compilation artifacts.
- perf-cache stores benchmark traces and performance results.
# Create the namespace for Dynamo if it does not already exist
kubectl create namespace ${DYNAMO_NAMESPACE}
# Create PVCs
kubectl apply -f - << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-cache
  namespace: ${DYNAMO_NAMESPACE}
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: "efs-sc-dynamic"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: compilation-cache
  namespace: ${DYNAMO_NAMESPACE}
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: "efs-sc-dynamic"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: perf-cache
  namespace: ${DYNAMO_NAMESPACE}
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: "efs-sc-dynamic"
EOF
Note: EFS is elastic; the storage value in the PVC is required by Kubernetes but does not limit the actual storage. EFS will grow and shrink automatically.
Confirm the PVCs are bound:
kubectl get pvc -n ${DYNAMO_NAMESPACE}
You should see output similar to:
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
compilation-cache Bound pvc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx 5Gi RWX efs-sc-dynamic <unset> 41s
model-cache Bound pvc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx 5Gi RWX efs-sc-dynamic <unset> 42s
perf-cache Bound pvc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx 5Gi RWX efs-sc-dynamic <unset> 41s
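As an optional end-to-end check, you can launch a short-lived pod that mounts one of the PVCs and writes a file to it. This is only a sketch; the pod name, image, and test path are illustrative and not part of the Dynamo recipes:
kubectl apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: efs-write-test
  namespace: ${DYNAMO_NAMESPACE}
spec:
  restartPolicy: Never
  containers:
  - name: writer
    image: busybox:1.36
    command: ["sh", "-c", "echo hello-from-efs > /cache/test.txt && cat /cache/test.txt"]
    volumeMounts:
    - name: model-cache
      mountPath: /cache
  volumes:
  - name: model-cache
    persistentVolumeClaim:
      claimName: model-cache
EOF
# Give the pod a few seconds to complete, then check its output and clean up
kubectl logs efs-write-test -n ${DYNAMO_NAMESPACE}
kubectl delete pod efs-write-test -n ${DYNAMO_NAMESPACE}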
To delete the EFS resources when no longer needed:
# Delete the Kubernetes resources
kubectl delete pvc model-cache compilation-cache perf-cache -n ${DYNAMO_NAMESPACE}
kubectl delete storageclass efs-sc-dynamic
# Delete mount targets
for MT_ID in $(aws efs describe-mount-targets --file-system-id $EFS_FS_ID --region $AWS_REGION --query "MountTargets[*].MountTargetId" --output text); do
aws efs delete-mount-target --mount-target-id $MT_ID --region $AWS_REGION
done
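# Mount target deletion is asynchronous; optionally wait until none remain,
# since the file system cannot be deleted while mount targets still exist
while [ "$(aws efs describe-mount-targets --file-system-id $EFS_FS_ID --region $AWS_REGION --query "length(MountTargets)" --output text)" != "0" ]; do
echo "Waiting for mount targets to be deleted..."
sleep 10
done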
# Delete the EFS file system
aws efs delete-file-system --file-system-id $EFS_FS_ID --region $AWS_REGION
# Delete the security group
aws ec2 delete-security-group --group-id $EFS_SG_ID --region $AWS_REGION