How to Resolve Kubernetes Autoscaler Error “Unable to read all metrics”

I was exploring GKE, or Google Kubernetes Engine, recently. Google Cloud has a pretty good UI where you can deploy workloads & services without having to define your own YAML files, so as a Kubernetes beginner you can skip the kubectl console at first. I deployed an application to a GKE cluster from the UI. Everything was fine: three pods got deployed successfully & they were in the Running state. But while exploring the deployment OVERVIEW tab, I noticed one thing. The K8s autoscaler was not working. The Autoscaler section showed the message "Status: Unable to read all metrics".

But I could see that GKE was tracking the pods' CPU metrics just fine.

So what could be the problem? How could the Kubernetes autoscaler fail to read metrics that the cluster itself was tracking & displaying properly?

After a bit of investigation, I found the problem in the YAML file generated by GKE: in the deployment, the resources field was empty. When the Google Cloud UI took inputs for the new deployment, it should have asked for the details to populate this field. The resources field declares how much CPU & memory each pod requests, and the autoscaler uses those requests as its baseline: it measures current usage as a percentage of the requested amount and creates new pods when the target percentage is exceeded. Part of the YAML file looked like below:

    spec:
      containers:
      - image: test/image1:latest
        imagePullPolicy: Always
        name: catalogue-1
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File

Because no resource request was set in the deployment YAML, the autoscaler had no baseline to compare the measured CPU usage against. Without a request, "percentage utilization" is undefined, so the autoscaler reports that it can't read the metrics & never knows when to scale the replica set up or down. It is as simple as that.
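To make the relationship concrete: the HPA target is expressed as a percentage of the pod's request, so the request is what turns a raw CPU reading into a utilization number. Below is a minimal sketch of an HPA manifest targeting a deployment like this one; the names, replica bounds & 80% target are illustrative assumptions, not values taken from my cluster:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: catalogue-hpa        # hypothetical name
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: catalogue          # hypothetical deployment name
      minReplicas: 3
      maxReplicas: 6             # illustrative bounds
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 80   # 80% of the *requested* CPU, e.g. 400m of a 500m request

Without a CPU request in the target deployment, that 80% has nothing to be 80% of, which is exactly what the "Unable to read all metrics" status is telling you.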

To fix this, I just set a CPU request of 500m & updated the deployment YAML file. Now the same section looked similar to below:

    spec:
      containers:
      - image: test/image1:latest
        imagePullPolicy: Always
        name: carts-1
        resources:
          requests:
            cpu: 500m
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File

Once the updated workload got redeployed, the autoscaler started working: its status went from yellow to green.
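In a real setup you would usually go a step further and set a memory request plus explicit limits alongside the CPU request. The sketch below uses made-up values (the memory & limit numbers are assumptions, not what my workload actually needed), but the shape is:

    resources:
      requests:
        cpu: 500m          # baseline the autoscaler computes utilization against
        memory: 256Mi      # illustrative value
      limits:
        cpu: "1"           # hard caps enforced at runtime; values are assumptions
        memory: 512Mi

The autoscaler only needs the requests, but setting limits as well keeps a misbehaving pod from starving its neighbours on the node.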

So if you face the same kind of problem with your Horizontal Pod Autoscaler (HPA), just check the steps above & make sure your deployment actually sets resource requests.
