Easier Troubleshooting of cert-manager Certificates

[Editor’s note: This post was written by Haoxiang Zhou who was a work placement student at Jetstack for the past four months. We are grateful to Haoxiang for adding this very useful feature, and all his other contributions, and wish him all the best with his final year of study.]

This post will explore the newest addition to the kubectl plugin of cert-manager, kubectl cert-manager status certificate, a command designed to make the troubleshooting experience of cert-manager problems easier. The command was hugely improved in the recent v1 release. Jump to the bottom for more information on how to get involved and start contributing!

Why do we need the command?

Troubleshooting cert-manager has always been a recurring topic in the community, that is why the cert-manager team has created the Troubleshooting guide along with extra help for ACME Certificates to assist users to identify the most common causes of errors. Even if the user is unable to find the solution themselves, these guides enable them to narrow down the problem and make seeking help more effective.

We have received a lot of positive feedback from users saying that the guides were very helpful. As a small bonus point, it made our team members’ lives easier as well. But we wanted to take it a step further.

The usual advice that we give users and which is documented in the Troubleshooting guides is to use kubectl describe on various resources in order to pinpoint where the error is happening.

For example, to find out why an ACME Certificate is not ready, in the worst case, the user has to run kubectl describe on the Certificate resource, copy the name of the CertificateRequest then kubectl describe on the CertificateRequest. Then copy the name of the Issuer to do kubectl describe on the Issuer. Then again for the Order, and again for the Challenges. In the example below I omitted parts of the actual output for the sake of readability.

$ kubectl get certificate
NAME               READY   SECRET     AGE
acme-certificate   False   acme-tls   66s

$ kubectl describe certificate acme-certificate
[...]
Events:
  Type    Reason     Age   From          Message
  ----    ------     ----  ----          -------
  Normal  Issuing    90s   cert-manager  Issuing certificate as Secret does not exist
  Normal  Generated  90s   cert-manager  Stored new private key in temporary Secret resource "acme-certificate-tr8b2"
  Normal  Requested  89s   cert-manager  Created new CertificateRequest resource "acme-certificate-qp5dm"

$ kubectl describe certificaterequest acme-certificate-qp5dm
[...]
Events:
  Type    Reason        Age    From          Message
  ----    ------        ----   ----          -------
  Normal  OrderCreated  7m17s  cert-manager  Created Order resource default/acme-certificate-qp5dm-1319513028

$ kubectl describe order acme-certificate-qp5dm-1319513028
[...]
Events:
  Type    Reason   Age    From          Message
  ----    ------   ----   ----          -------
  Normal  Created  7m51s  cert-manager  Created Challenge resource "acme-certificate-qp5dm-1319513028-1825664779" for domain "example-domain.net"

$ kubectl describe challenge acme-certificate-qp5dm-1319513028-1825664779
[...]
Status:
  Presented:   false
  Processing:  true
  Reason:      error getting clouddns service account: secret "clouddns-accoun" not found
  State:       pending
Events:
  Type     Reason        Age                    From          Message
  ----     ------        ----                   ----          -------
  Normal   Started       8m56s                  cert-manager  Challenge scheduled for processing
  Warning  PresentError  3m52s (x7 over 8m56s)  cert-manager  Error presenting challenge: error getting clouddns service account: secret "clouddns-accoun" not found

While it helps us achieve what we want, speaking out of personal experience, that can quickly become tedious. That is why when the team discussed building a command-line tool that outputs all the information needed for troubleshooting as an addition to our kubectl CLI plugin, I gladly took over the task.

What does the command do?

To run the command, you need to have the cert-manager kubectl CLI plugin installed. You can either follow these steps in the cert-manager docs, or use krew, the package manager for kubectl plugins, to install the plugin.

Once the plugin is ready, you can run kubectl cert-manager status certificate <name-of-cert>. That will then look for the Certificate with the name <name-of-cert> in the specified/default namespace and any related resources like CertificateRequest, Secret, Issuer, as well as Order and Challenges if it is an ACME Certificate. The command outputs information about the resources, including Conditions, Events and resource specific fields like Key Usages and Extended Key Usages of the Secret or Authorizations of the Order.

Taking the example from above:

$ kubectl cert-manager status certificate acme-certificate
Name: acme-certificate
Namespace: default
Created at: 2020-08-21T16:44:13+02:00
Conditions:
  Ready: False, Reason: DoesNotExist, Message: Issuing certificate as Secret does not exist
  Issuing: True, Reason: DoesNotExist, Message: Issuing certificate as Secret does not exist
DNS Names:
- example-domain.net
Events:
  Type    Reason     Age   From          Message
  ----    ------     ----  ----          -------
  Normal  Issuing    18m   cert-manager  Issuing certificate as Secret does not exist
  Normal  Generated  18m   cert-manager  Stored new private key in temporary Secret resource "acme-certificate-tr8b2"
  Normal  Requested  18m   cert-manager  Created new CertificateRequest resource "acme-certificate-qp5dm"
Issuer:
  Name: acme-issuer
  Kind: Issuer
  Conditions:
    Ready: True, Reason: ACMEAccountRegistered, Message: The ACME account was registered with the ACME server
  Events:  <none>
error when finding Secret "acme-tls": secrets "acme-tls" not found
Not Before: <none>
Not After: <none>
Renewal Time: <none>
CertificateRequest:
  Name: acme-certificate-qp5dm
  Namespace: default
  Conditions:
    Ready: False, Reason: Pending, Message: Waiting on certificate issuance from order default/acme-certificate-qp5dm-1319513028: "pending"
  Events:
    Type    Reason        Age   From          Message
    ----    ------        ----  ----          -------
    Normal  OrderCreated  18m   cert-manager  Created Order resource default/acme-certificate-qp5dm-1319513028
Order:
  Name: acme-certificate-qp5dm-1319513028
  State: pending, Reason: 
  Authorizations:
    URL: https://acme-staging-v02.api.letsencrypt.org/acme/authz-v3/12345678, Identifier: example-domain.net, Initial State: pending, Wildcard: false
Challenges:
- Name: acme-certificate-qp5dm-1319513028-1825664779, Type: DNS-01, Token: exampleToken, Key: exampleKey, State: pending, Reason: error getting clouddns service account: secret "clouddns-accoun" not found, Processing: true, Presented: false

Conclusion

With the command, the user can just dump all the information about a Certificate resource if something is not going well, making it much simpler to find out what is causing trouble. It also makes it easier for others to help spot any issues, e.g. when seeking advice on Slack, the output of the status certificate command is a great starting point for others to assist in the troubleshooting process, rather than just “My certificate is stuck in pending state” or pasting a page of logs which are usually far too verbose to be useful in this context, especially in production environments.

Finally, I encourage everybody to use this command as well as other commands of the plugin. Feedback, bug reports, feature requests or even PRs are very much welcome.

To get involved, sign up for the mailing list which we use to send out calendar invites. Once you have joined, you’ll receive invites to our bi-weekly development meeting (held every other Wednesday at 5pm UK time). The #cert-manager channel on the Kubernetes Slack is also a great place to start and get chatting about ideas or questions you have. Please drop by and say hello!

Of course the above does not only apply to the cert-manager kubectl plugin, but to anything cert-manager related! We always welcome new participants, in whatever capacity.