The CPI/CSI providers themselves are generic. There are a few differences I have found, due to the additional taints RKE applies and the fact that all RKE components need to run in containers. To start off, when you create a cluster, edit the cluster.yaml in Rancher / RKE with the following tweaks to the kubelet:
kubelet:
  fail_swap_on: false
  generate_serving_certificate: false
  extra_binds:
    - /var/lib/csi/sockets/pluginproxy/csi.vsphere.vmware.com:/var/lib/csi/sockets/pluginproxy/csi.vsphere.vmware.com:rshared
    - /csi:/csi:rshared
  extra_args:
    cloud-provider: external
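If the cluster is managed with standalone RKE rather than through the Rancher UI, the edited cluster.yml is typically pushed out with rke up, for example:

rke up --config cluster.yml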
Now, when nodes are provisioned via Rancher, you will see the additional taints before the CPI is installed.
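A quick way to see them (assuming kubectl access to the cluster) is:

kubectl describe nodes | egrep "Name:|Taints:"

Freshly provisioned nodes should show the node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule taint until the CPI initializes them.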
Now you can use the CPI install instructions. A minor tweak is needed to the CPI DaemonSet manifest to allow it to tolerate the RKE taints.
tee $HOME/cloud-provider.yaml > /dev/null << EOF
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloud-controller-manager
  namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vsphere-cloud-controller-manager
  namespace: kube-system
  labels:
    k8s-app: vsphere-cloud-controller-manager
spec:
  selector:
    matchLabels:
      k8s-app: vsphere-cloud-controller-manager
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        k8s-app: vsphere-cloud-controller-manager
    spec:
      nodeSelector:
        node-role.kubernetes.io/controlplane: "true"
      securityContext:
        runAsUser: 0
      tolerations:
        - key: node.cloudprovider.kubernetes.io/uninitialized
          value: "true"
          effect: NoSchedule
        - key: node-role.kubernetes.io/controlplane
          value: "true"
          effect: NoSchedule
        - key: node-role.kubernetes.io/etcd
          value: "true"
          effect: NoExecute
      serviceAccountName: cloud-controller-manager
      containers:
        - name: vsphere-cloud-controller-manager
          image: gcr.io/cloud-provider-vsphere/cpi/release/manager:latest
          args:
            - --v=2
            - --cloud-provider=vsphere
            - --cloud-config=/etc/cloud/vsphere.conf
          volumeMounts:
            - mountPath: /etc/cloud
              name: vsphere-config-volume
              readOnly: true
          resources:
            requests:
              cpu: 200m
      hostNetwork: true
      volumes:
        - name: vsphere-config-volume
          configMap:
            name: cloud-config
---
apiVersion: v1
kind: Service
metadata:
  labels:
    component: cloud-controller-manager
  name: vsphere-cloud-controller-manager
  namespace: kube-system
spec:
  type: NodePort
  ports:
    - port: 43001
      protocol: TCP
      targetPort: 43001
  selector:
    component: cloud-controller-manager
---
EOF
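The manifest can then be applied in the usual way, for example:

kubectl apply -f $HOME/cloud-provider.yaml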
Once the cloud controller is installed, you will see that the taints have been removed.
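One way to confirm this is to list the taints again, for example:

kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}'

The node.cloudprovider.kubernetes.io/uninitialized taint should no longer be present.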
A similar tweak is needed to the CSI controller manifest to allow it to handle the RKE taints.
tee csi-controller.yaml >/dev/null <<'EOF'
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: vsphere-csi-controller
  namespace: kube-system
spec:
  serviceName: vsphere-csi-controller
  replicas: 1
  updateStrategy:
    type: "RollingUpdate"
  selector:
    matchLabels:
      app: vsphere-csi-controller
  template:
    metadata:
      labels:
        app: vsphere-csi-controller
        role: vsphere-csi
    spec:
      serviceAccountName: vsphere-csi-controller
      nodeSelector:
        node-role.kubernetes.io/controlplane: "true"
      tolerations:
        - key: node-role.kubernetes.io/controlplane
          value: "true"
          effect: NoSchedule
        - key: node-role.kubernetes.io/etcd
          value: "true"
          effect: NoExecute
      dnsPolicy: "Default"
      containers:
        - name: csi-attacher
          image: quay.io/k8scsi/csi-attacher:v1.1.1
          args:
            - "--v=4"
            - "--timeout=300s"
            - "--csi-address=$(ADDRESS)"
          env:
            - name: ADDRESS
              value: /csi/csi.sock
          volumeMounts:
            - mountPath: /csi
              name: socket-dir
        - name: vsphere-csi-controller
          image: gcr.io/cloud-provider-vsphere/csi/release/driver:v1.0.1
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "rm -rf /var/lib/csi/sockets/pluginproxy/csi.vsphere.vmware.com"]
          args:
            - "--v=4"
          imagePullPolicy: "Always"
          env:
            - name: CSI_ENDPOINT
              value: unix:///var/lib/csi/sockets/pluginproxy/csi.sock
            - name: X_CSI_MODE
              value: "controller"
            - name: VSPHERE_CSI_CONFIG
              value: "/etc/cloud/csi-vsphere.conf"
          volumeMounts:
            - mountPath: /etc/cloud
              name: vsphere-config-volume
              readOnly: true
            - mountPath: /var/lib/csi/sockets/pluginproxy/
              name: socket-dir
          ports:
            - name: healthz
              containerPort: 9808
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 10
            timeoutSeconds: 3
            periodSeconds: 5
            failureThreshold: 3
        - name: liveness-probe
          image: quay.io/k8scsi/livenessprobe:v1.1.0
          args:
            - "--csi-address=$(ADDRESS)"
          env:
            - name: ADDRESS
              value: /var/lib/csi/sockets/pluginproxy/csi.sock
          volumeMounts:
            - mountPath: /var/lib/csi/sockets/pluginproxy/
              name: socket-dir
        - name: vsphere-syncer
          image: gcr.io/cloud-provider-vsphere/csi/release/syncer:v1.0.1
          args:
            - "--v=2"
          imagePullPolicy: "Always"
          env:
            - name: FULL_SYNC_INTERVAL_MINUTES
              value: "30"
            - name: VSPHERE_CSI_CONFIG
              value: "/etc/cloud/csi-vsphere.conf"
          volumeMounts:
            - mountPath: /etc/cloud
              name: vsphere-config-volume
              readOnly: true
        - name: csi-provisioner
          image: quay.io/k8scsi/csi-provisioner:v1.2.2
          args:
            - "--v=4"
            - "--timeout=300s"
            - "--csi-address=$(ADDRESS)"
            - "--feature-gates=Topology=true"
            - "--strict-topology"
          env:
            - name: ADDRESS
              value: /csi/csi.sock
          volumeMounts:
            - mountPath: /csi
              name: socket-dir
      volumes:
        - name: vsphere-config-volume
          secret:
            secretName: vsphere-config-secret
        - name: socket-dir
          hostPath:
            path: /var/lib/csi/sockets/pluginproxy/csi.vsphere.vmware.com
            type: DirectoryOrCreate
---
apiVersion: storage.k8s.io/v1beta1
kind: CSIDriver
metadata:
  name: csi.vsphere.vmware.com
spec:
  attachRequired: true
  podInfoOnMount: false
EOF
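As with the CPI, this manifest is applied with kubectl once the remaining objects from the upstream CSI install instructions are in place (the vsphere-csi-controller service account and its RBAC, and the secret holding csi-vsphere.conf, referenced above as vsphere-config-secret):

kubectl apply -f csi-controller.yaml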
The node drivers don't need any tweaks, as they are deployed as a standard DaemonSet.
After this, you should be able to configure a StorageClass and consume it in your workloads.
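As a minimal sketch, a StorageClass backed by the csi.vsphere.vmware.com driver and a PVC consuming it could look like the following; the storagepolicyname value and the object names are illustrative and depend on your vSphere environment:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-csi-sc
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "vSAN Default Storage Policy"  # hypothetical SPBM policy - replace with your own
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  storageClassName: vsphere-csi-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi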
