Shouldn't the Kubernetes deployment documentation use a headless service?

Been fleshing out my CrateDB deployment and I noticed that the CrateDB on Kubernetes doc doesn't use a headless service for discovery via DNS.

This seems like a misconfiguration to me. Should the docs be updated to account for this?


Below I've included a deployment manifest with a headless service I named crate-pods, as well as output from a debug pod illustrating the difference between the SRV records for a headless and a non-headless service.

This is the debug pod.

$ kubectl run -n databases --rm -it debug-pod --image alpine -- sh
If you don't see a command prompt, try pressing enter.
/ # apk add -U bind-tools curl jq
... truncated
Executing ca-certificates-20211220-r0.trigger
OK: 16 MiB in 34 packages


/ # dig srv _transport._tcp.crate-pods.databases.svc.cluster.local +short
0 33 4300 10-42-0-230.crate-pods.databases.svc.cluster.local.
0 33 4300 10-42-1-190.crate-pods.databases.svc.cluster.local.
0 33 4300 10-42-2-111.crate-pods.databases.svc.cluster.local.


/ # dig srv _transport._tcp.crate.databases.svc.cluster.local +short
0 100 4300 crate.databases.svc.cluster.local.

This is my manifest YAML.

---
kind: Namespace
apiVersion: v1
metadata:
  name: databases

---
kind: Service
apiVersion: v1
metadata:
  namespace: databases
  name: crate-pods
  labels:
    app: crate
spec:
  clusterIP: None
  ports:
  - port: 4300
    name: transport
  selector:
    app: crate

---
kind: Service
apiVersion: v1
metadata:
  namespace: databases
  name: crate
  labels:
    app: crate
spec:
  type: ClusterIP
  ports:
  - port: 4300
    name: transport
  - port: 4200
    name: http
  - port: 5432
    name: pgsql
  selector:
    app: crate

---
kind: StatefulSet
apiVersion: "apps/v1"
metadata:
  namespace: databases
  name: crate
spec:
  serviceName: "crate"
  podManagementPolicy: "Parallel"
  replicas: 3
  selector:
    matchLabels:
      app: crate
  template:
    metadata:
      labels:
        app: crate
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - crate
                topologyKey: kubernetes.io/hostname

      initContainers:
      - name: init-sysctl
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
          
      containers:
      - name: crate
        image: crate:4.7
        args:
          - -Cnode.name=${POD_NAME}
          - -Ccluster.name=${CLUSTER_NAME}
          - -Ccluster.initial_master_nodes=crate-0,crate-1,crate-2
          - -Cdiscovery.seed_providers=srv
          - -Cdiscovery.srv.query=_transport._tcp.crate-pods.${NAMESPACE}.svc.cluster.local
          - -Cgateway.recover_after_nodes=2
          - -Cgateway.expected_nodes=${EXPECTED_NODES}
          - -Cpath.data=/data
          # Ignore these, it's a WIP. Mostly just stuck on where I want that darn trust to live... ConfigMap or elsewhere... idk yet.
          #
          # When it is time to do hardening, perhaps this is the way to start working on host_based access?
          #     
          #
          #- -Cauth.host_based.enabled=true
          #
          # For node-to-node communication a cert needs to be used maybe?
          #     https://crate.io/docs/crate/reference/en/4.7/admin/auth/hba.html#node-to-node-communication
          #
          #- -Cssl.transport.mode=on
          #- -Cauth.host_based.config.0.protocol=transport
          #- -Cauth.host_based.config.0.ssl=on
          #- -Cauth.host_based.config.0.method=cert
          # 
          # Locking down the crate user?
          #     https://crate.io/docs/crate/reference/en/4.7/admin/auth/hba.html#authentication-against-cratedb
          #
          #- -Cauth.host_based.config.1.user=crate
          #- -Cauth.host_based.config.1.address=127.0.0.1
          #- -Cauth.host_based.config.1.method=trust
        volumeMounts:
            - mountPath: /data
              name: data
        resources:
          limits:
            memory: 512Mi
        livenessProbe:
          httpGet:
            path: "/"
            port: crate-web
            scheme: HTTP
          initialDelaySeconds: 300
          periodSeconds: 5
        ports:
        - containerPort: 4300
          name: crate-internal
        - containerPort: 4200
          name: crate-web
        - containerPort: 5432
          name: postgres
        env:
          # Heap-size detected by cratedb
        - name: CRATE_HEAP_SIZE
          value: "256m"
          # command-line options
        - name: EXPECTED_NODES
          value: "3"
        - name: CLUSTER_NAME
          value: "crate-cluster"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace

  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi

@protosam

Fair point, I think I'd generally agree with you.
I will ask the team if there was a reason to use a ClusterIP service instead.

I would think there should be 3 services for this to work right:

  • one headless service for cluster discovery
  • one ClusterIP service for load balanced access intra-cluster
  • one LoadBalancer service for external access (see the sketch below)

At least that's how I'm setting up my CrateDB cluster in my Kubernetes cluster right now.
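
For the external piece, something like the following is what I have in mind. This is just a minimal sketch reusing the app: crate selector and the databases namespace from the manifest above; the name crate-external and the choice to expose only the HTTP and PostgreSQL ports are my own assumptions, not anything from the docs.

---
kind: Service
apiVersion: v1
metadata:
  namespace: databases
  name: crate-external    # hypothetical name, pick whatever fits your setup
  labels:
    app: crate
spec:
  type: LoadBalancer
  ports:
  - port: 4200
    name: http
  - port: 5432
    name: pgsql
  selector:
    app: crate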

In theory, a single headless service would be sufficient if CrateDB were only accessed from within the k8s cluster. A ClusterIP service also works, as CrateDB will still discover all nodes after some time thanks to repeated DNS queries, and it shares the pods' IP addresses through the cluster state.

  • one ClusterIP service for load balanced access intra-cluster

What is this for? CrateDB balances load on its own, independent of k8s. In fact the (headless) service is only used for discovering new nodes.

Using a non-headless service absolutely works when testing with 3 to 5 nodes, but in larger deployments that take longer to start up, it's theoretically possible for a node to end up isolated from the rest of the cluster.

When connecting to any service in a cluster, it should be done via service-name.namespace.svc.cluster.local.

In the case of a headless service, this results in DNS-based routing/load balancing. There are tons of reasons that's bad (I'm kinda being lazy here and just going to say it's bad), and in Kubernetes the statefulness of the pods adds a few extra reasons. For discovery, however, headless services are wonderful.

For access to the service, the ClusterIP service should be used for routing non-CrateDB applications inside the cluster to CrateDB, without leaving the cluster network. Yes, CrateDB distributes its workloads and knows about its own health, but for consistency in Kubernetes you should rely on the Service to decide where you get routed; bypassing the cluster's health checks should be considered bad practice.
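
To make that concrete, here's roughly how the two services differ when poked from the same debug pod as above. This is a sketch, not captured output: the headless service should resolve to one A record per pod, the ClusterIP service to a single virtual IP, and the curl call is only an illustrative query (the /_sql endpoint and sys.nodes table are standard CrateDB, but the statement itself is just an example).

/ # dig crate-pods.databases.svc.cluster.local +short   # headless: one A record per pod
/ # dig crate.databases.svc.cluster.local +short        # ClusterIP: the single service IP
/ # curl -s 'http://crate.databases.svc.cluster.local:4200/_sql' \
>     -H 'Content-Type: application/json' \
>     -d '{"stmt": "SELECT name FROM sys.nodes"}' | jq .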

Think that kinda sums up my thought process and where I’m going with my own deployment.


For access to the service, the ClusterIP service should be used for routing non-CrateDB applications inside the cluster to CrateDB, without leaving the cluster network

Yes, absolutely agree 🙂
I just misread “intra-cluster” as “intra-crate-cluster”, not “intra-k8s-cluster”


Think we might need to update the docs a bit

Made a comment on your PR about it. I think the documentation can be expanded a bit.