In this article, part three of the IoT Security Foundations series, we examine issues related to certificate authentication and the complexities around its use in the Internet of Things.
Many security issues that plague the Internet of Things are directly caused by insecure password authentication. We reviewed these issues and possible solutions in the previous article. Certificate authentication provides a stronger alternative: unlike passwords, it does not rely on a short token memorized by a human operator. Instead, it uses public key cryptography, which brings larger storage and processing requirements and more advanced protocols, but also better security guarantees. Certificate-based authentication is common in the Internet of Things: outside of regular client-server communication, it is used in such areas as firmware updates and local access. This article should be useful to IoT manufacturers and service providers looking for the right way to design their certificate management.
What is a Certificate?
Put simply, a certificate is a signed digital document. The most intuitive analogy is a human passport, which contains personal data (ID number, name, date of birth), and can be verified as genuine by a visual inspection of all the stamps and watermarks that should be present. A passport also contains a stamp by the issuing authority and an expiration date.
Similarly, a digital certificate contains a list of fields called "attributes" identifying its owner (called subject) and issuer. Certificates also contain the subject's name, organization name, and an expiration date. Here is where the similarity ends, however. The two most important fields of a digital certificate are cryptographic: one is the public key belonging to the subject, and the other is a digital signature, which is applied by the issuer to the entire certificate. The public key can be used to establish secure communications with the subject, and the signature proves that the subject's identity has been verified by the issuer. The subject also has a private key which matches the public key - but that key is kept secret and is not part of the certificate.
On the modern Internet, the issuer is typically a Certificate Authority (CA), and the most common digital certificate standard is X.509. The X.509 certificate format is supported in most standard communication protocols, such as SSL, TLS, SSH, IPsec, and so on.
Reading a certificate's descriptive fields is straightforward. Validating its cryptographic parts is more complex; on the web, this is performed by the browser as part of establishing the session.
The major advantage of certificates over passwords is evident from their use on the web: as a user, you don't need to have the authentication details for every website before you visit them. As long as you trust a CA, you can navigate to any website whose certificate is signed by that CA; given that each user may easily browse hundreds of websites a week, imagine the burden the CA model lifts by replacing the need for directly authenticating each website.
We should note that certificates are not necessary to benefit from the advantages of public key cryptography. Authentication with raw keys - cryptographic tokens lacking the identifying fields and the signatures present in a certificate - is not only possible but widely supported as part of common protocols such as SSH and CoAP. However, the CA model is more powerful precisely because it leverages metadata to establish trust relationships. One example is certificate chaining: a "root" issuer can sign a certificate for an intermediate authority, which in turn can sign another certificate and so on. This enables indirect trust relationships and reduces the number of root certificates that any machine or device needs to store. We review certificate chaining in detail below.
Certificate verification is a multi-step process. It includes checks that the certificate is correctly signed by a trusted authority, that it has not expired, that it contains all the necessary fields, and more. When a communication session is established, the certificate owner proves their identity by using their private key, which only they are supposed to have. In one classic form of this proof, the verifier picks a random challenge message and encrypts it using the owner's public key. The owner of the certificate receives the encrypted challenge, decrypts it with the private key, and sends it back for verification. Many protocols achieve the same proof by having the owner sign the challenge with the private key instead.
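The challenge-response exchange described above can be sketched in a few lines of Python. This is a toy illustration, not a real protocol: it uses textbook RSA without padding, a key built from two small Mersenne primes, and hypothetical function names (`make_challenge`, `answer_challenge`, `verify_answer`) that do not come from any library.

```python
import secrets

# Toy key pair built from two well-known Mersenne primes.
# Far too small for real use, and real protocols add padding.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                            # public modulus (would be in the certificate)
E = 65537                            # public exponent (would be in the certificate)
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent, never leaves the owner


def make_challenge() -> tuple[int, int]:
    """Verifier: pick a random challenge and encrypt it with the public key."""
    challenge = secrets.randbelow(N)
    return challenge, pow(challenge, E, N)


def answer_challenge(encrypted: int, private_exponent: int) -> int:
    """Owner: decrypt the challenge with the private key."""
    return pow(encrypted, private_exponent, N)


def verify_answer(challenge: int, answer: int) -> bool:
    """Verifier: the answer must equal the challenge that was originally chosen."""
    return challenge == answer


if __name__ == "__main__":
    challenge, encrypted = make_challenge()
    print(verify_answer(challenge, answer_challenge(encrypted, D)))
```

Only a party holding `D` can turn the encrypted value back into the original challenge, which is what ties the session to the certificate's subject.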
An important part of certificate verification is the check for certificate revocation. The CA model relies on trust along the chain: the root CA relies on the proper conduct of intermediate CAs, and those rely on the proper conduct of bottom-level CAs. Every link in the chain, including the certificate's eventual subject, must also keep its private key secret. Should this trust be broken, or should private keys be leaked, malicious entities could make use of compromised certificates, since cryptographically they remain valid. A mechanism is needed to notify potential recipients of the certificates of their compromise; this is called revocation, and we review it in detail below.
Certificates have many additional fields, which we will not review here in detail. Refer to RFC 5280 for the full specification. We will mention only extensions: additional fields that the application layer can use to pass extra information.
Certificate chaining creates a hierarchy, in which the owners of upper-level certificates sign the lower-level certificates, possibly repeating in several stages, with root CAs signing certificates for lower-level CAs and so on until the final subject certificate on the bottom level. In a properly designed hierarchy, the root CA is only rarely used, and so it can be stored in a highly secured environment or even stay offline for most of its existence.
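As a rough illustration of how a device might walk such a hierarchy, here is a minimal Python sketch. It is a toy model under loud assumptions: `ToyKey`, `ToyCert`, `issue`, and `verify_chain` are hypothetical names, the signatures are textbook RSA over small Mersenne primes, and validity dates and revocation are left out entirely. Real devices would use X.509 parsing and a vetted cryptographic library.

```python
import hashlib
from dataclasses import dataclass

E = 65537  # toy public exponent shared by all keys


def _digest(message: bytes, modulus: int) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus


@dataclass(frozen=True)
class ToyKey:
    """Textbook RSA key from two primes. Illustrative only, not secure."""
    p: int
    q: int

    @property
    def n(self) -> int:
        return self.p * self.q

    def sign(self, message: bytes) -> int:
        d = pow(E, -1, (self.p - 1) * (self.q - 1))
        return pow(_digest(message, self.n), d, self.n)


@dataclass(frozen=True)
class ToyCert:
    subject: str
    issuer: str
    public_n: int    # the subject's public modulus
    signature: int   # issuer's signature over the fields above

    def body(self) -> bytes:
        return f"{self.subject}|{self.issuer}|{self.public_n}".encode()


def issue(subject: str, subject_key: ToyKey, issuer: str, issuer_key: ToyKey) -> ToyCert:
    """Issuer signs the subject's name and public key."""
    body = f"{subject}|{issuer}|{subject_key.n}".encode()
    return ToyCert(subject, issuer, subject_key.n, issuer_key.sign(body))


def verify_chain(chain: list[ToyCert], trusted_roots: dict[str, int]) -> bool:
    """chain[0] is the end certificate; each following entry is its issuer.
    The last certificate must be signed by a root stored on the device."""
    for index, cert in enumerate(chain):
        if index + 1 < len(chain):
            parent = chain[index + 1]
            if parent.subject != cert.issuer:
                return False
            issuer_n = parent.public_n
        else:
            issuer_n = trusted_roots.get(cert.issuer)
            if issuer_n is None:
                return False
        if pow(cert.signature, E, issuer_n) != _digest(cert.body(), issuer_n):
            return False
    return True


if __name__ == "__main__":
    root_key = ToyKey(2**61 - 1, 2**89 - 1)
    inter_key = ToyKey(2**31 - 1, 2**107 - 1)
    leaf_key = ToyKey(2**13 - 1, 2**17 - 1)
    inter_cert = issue("intermediate", inter_key, "root", root_key)
    leaf_cert = issue("developer", leaf_key, "intermediate", inter_key)
    print(verify_chain([leaf_cert, inter_cert], {"root": root_key.n}))
```

Note that the device's trust store holds only the root's public key; the intermediate and developer certificates arrive with the connection and are accepted purely on the strength of their signatures.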
Certificate chaining is useful in many use cases on the Internet of Things. It lets software developers selected by the manufacturer obtain the manufacturer's signature on their certificates. Based on the signature, devices can then accept application installations or firmware updates. In other cases, manufacturers can use the same mechanism to authorize service providers to perform invasive debugging on devices deployed in the field. As with web communications, the device does not need to know the developer's identity in advance; instead, the trust relationship is established by checking that their certificate chains up to a root CA associated with the device.
Certificates stored on the device occasionally need to be renewed, when they expire or become compromised. This does not concern lower levels in the certificate chaining hierarchy, whose certificates are accepted if properly signed by a known CA, without prior association, but the device must be able to update the certificate store for root CAs. On the traditional web, this is achieved by browser software updates, but in the Internet of Things, fully-featured browsers and browser certificate stores are not to be taken for granted. Device manufacturers must take explicit steps to enable certificate renewal.
The simplest way to enable certificate renewal is to provision several extra root certificates in advance. The device can switch between them as necessary if one of them becomes invalid. The device can even save space by storing the hashes of the certificates, rather than the entire certificate bodies. In that case, when the device encounters a new root CA certificate, it can compute the certificate's hash and accept the certificate if the hash matches an entry in its store.
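The hash-based store can be sketched as follows. This is a minimal illustration with assumed names (`fingerprint`, `PinnedRootStore`) and an assumed choice of SHA-256 over the encoded certificate bytes; the placeholder byte strings stand in for real DER-encoded certificates.

```python
import hashlib


def fingerprint(cert_bytes: bytes) -> bytes:
    """SHA-256 over the encoded certificate (32 bytes instead of the full body)."""
    return hashlib.sha256(cert_bytes).digest()


class PinnedRootStore:
    """Holds only the hashes of the root certificates provisioned at manufacture."""

    def __init__(self, pinned_hashes: set[bytes]) -> None:
        self._pinned = set(pinned_hashes)

    def accepts(self, cert_bytes: bytes) -> bool:
        """Accept a presented root certificate only if its hash was provisioned."""
        return fingerprint(cert_bytes) in self._pinned

    def retire(self, cert_bytes: bytes) -> None:
        """Stop trusting a root, for example after expiry or compromise."""
        self._pinned.discard(fingerprint(cert_bytes))


if __name__ == "__main__":
    active_root = b"placeholder bytes for the active root certificate"
    spare_root = b"placeholder bytes for the spare root certificate"
    store = PinnedRootStore({fingerprint(active_root), fingerprint(spare_root)})
    store.retire(active_root)          # the active root becomes invalid
    print(store.accepts(spare_root))   # the spare is still trusted
```

The trade-off is that the device must receive the full root certificate from elsewhere (for example, as part of the connection) before it can use the public key inside it.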
Of course, the private keys for the different certificates should be stored separately, in different hardware modules or even in different geographic locations, so that a compromise would not invalidate them all at once.
Revocation is a central security concept in certificate management. Trust relationships between a CA and its customer, the subject of a certificate, can be broken for any reason, or a certificate's private keys may be stolen, allowing anyone with the private key to impersonate the original subject. Devices holding the public part of the compromised certificate then need to be explicitly notified, so that they will distrust connections or messages using that certificate. This is called revocation, and several alternative mechanisms exist as outlined below.
In the Internet of Things, certificate compromise may occur in a scenario like this:
- A server keeps a certificate to identify itself to devices. The server can be part of a cloud infrastructure or web service, or it can be a gateway or even another device.
- Devices have copies of the server's public certificate and trust the server. Trust can mean accepting commands from it, submitting data to it, using it as a gateway to communicate through it to other hosts, or installing firmware updates signed by it.
- An attacker breaks into the server and steals the server certificate's private key. The attacker can now impersonate the server, connect to the devices, and effectively command them or expose their data.
- The server's owner detects the attack, plugs the security hole which allowed it to take place, and resets the server. The server's owner updates the server certificate, and if a renewal mechanism is in place as discussed earlier, the server's owner can contact the devices securely. Unfortunately, at this point, the attacker can still impersonate the server using the old certificate.
- The owner now needs to revoke the server's old certificate trusted by the devices.
Not all vendors plan ahead for certificate revocation and renewal, resulting in a costly and difficult disaster recovery process in case of compromise, involving device recalls, and threatening business continuity.
Several alternative methods for revocation are supported by modern clients and servers, with different pros and cons for each.
Certificate Revocation Lists (CRL)
CRL-based revocation is a mechanism supported by TLS clients as part of the X.509 standard. To check if a certificate is revoked, the device reads a certificate field called the CDP (short for CRL Distribution Point), which contains a web address. The device then has to issue a separate HTTP(S) query to that address, which usually, but not necessarily, points to the same CA server that issued the certificate. In response, the device receives a Certificate Revocation List, a digital list of invalidated certificates, and searches it to see if the server certificate in question is listed. For the solution to be robust, the device also needs to cryptographically authenticate the CRL.
The CRL model is costly because of the extra communication involved: the device spends more time and consumes more power when creating a connection, and on the server side, when dealing with large device populations, the load on the CA can be considerable, necessitating a scalable design. Another issue is the need for the device to parse the CRL, which in some cases can grow quite large over time, especially when compared to small available memory sizes in embedded devices.
Most critically, the CRL may be unavailable to the device, either due to network issues or intentional attacker meddling. The device then faces an unpleasant dilemma: distrust the certificate and drop the connection, or go ahead in spite of the risk. Different client communication stacks and libraries make different choices. In fact, both options are bad: an attacker capable of manipulating the network can either keep working with the device despite a revoked certificate, or effectively block the device from communicating altogether just by blocking the CRL.
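The dilemma can be made concrete with a small sketch. The names here (`ToyCrl`, `OnUnavailable`, `is_acceptable`) are hypothetical, the CRL is modelled as a set of serial numbers with an expiry time, and the signature check on the CRL itself is omitted for brevity. The two policies correspond to what is often called "hard fail" and "soft fail".

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Callable, Optional


@dataclass(frozen=True)
class ToyCrl:
    revoked_serials: frozenset[int]   # serial numbers of revoked certificates
    next_update: datetime             # after this time the list is stale


class OnUnavailable(Enum):
    HARD_FAIL = "drop the connection"
    SOFT_FAIL = "proceed despite the risk"


def is_acceptable(
    serial: int,
    fetch_crl: Callable[[], Optional[ToyCrl]],
    policy: OnUnavailable,
    now: datetime,
) -> bool:
    """Return True if the device should continue with this certificate."""
    crl = fetch_crl()   # returns None when the distribution point is unreachable
    if crl is None or crl.next_update <= now:
        # No usable list: the outcome is decided by policy, not by evidence.
        return policy is OnUnavailable.SOFT_FAIL
    return serial not in crl.revoked_serials


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    blocked = lambda: None   # an attacker blocks the CRL download
    print(is_acceptable(42, blocked, OnUnavailable.SOFT_FAIL, now))  # revoked cert still works
    print(is_acceptable(42, blocked, OnUnavailable.HARD_FAIL, now))  # device is cut off
```

With the download blocked, the soft-fail device keeps talking to a server whose certificate may have been revoked, while the hard-fail device cannot communicate at all.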
The OCSP protocol, below, attempts to solve the issues with CRL.
Online Certificate Status Protocol (OCSP)
By itself, OCSP is just a simplified and concise protocol for a device to perform a revocation check. The device can query a CA about a particular server certificate and receive an answer on whether it has been revoked. That would be more efficient than downloading and parsing CRLs but still requires the device to perform an online query. What makes this protocol truly useful is OCSP stapling.
When both the server and the device support the TLS extension called "OCSP stapling", the server sends the client a fresh OCSP response along with its certificate. The OCSP response is signed by the relevant CA server, which proves that the certificate had not been revoked, at least at the time of the signature. For the device, this removes the need to initiate a separate connection to verify the certificate. The burden of communicating with the CA is now on the server, and the solution can be quite scalable because the server can cache and reuse an OCSP response within its validity window. For example, defining a validity window of 24 hours means that each server only needs to contact the CA once per day. Compare that to multiple daily requests from a fleet of hundreds of thousands of devices in the CRL model, and the benefit is clear.
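The server-side caching that makes stapling scale can be sketched as below. This is a simplified model, not a real OCSP client: `ToyOcspResponse`, `StapleCache`, and the `query_ca` callback are assumed names, and the CA's signature on the response is not modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class ToyOcspResponse:
    status: str              # "good" or "revoked"
    produced_at: datetime    # when the CA signed the response
    next_update: datetime    # end of the validity window


class StapleCache:
    """Server-side cache: contact the CA only when the stapled response expires."""

    def __init__(self, query_ca: Callable[[datetime], ToyOcspResponse]) -> None:
        self._query_ca = query_ca
        self._cached: Optional[ToyOcspResponse] = None
        self.ca_queries = 0

    def response_for_handshake(self, now: datetime) -> ToyOcspResponse:
        """Return the response to staple into a handshake happening at `now`."""
        if self._cached is None or now >= self._cached.next_update:
            self._cached = self._query_ca(now)
            self.ca_queries += 1
        return self._cached


if __name__ == "__main__":
    def fake_ca(now: datetime) -> ToyOcspResponse:
        # Stand-in for the CA, issuing responses valid for 24 hours.
        return ToyOcspResponse("good", now, now + timedelta(hours=24))

    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    cache = StapleCache(fake_ca)
    for second in range(0, 24 * 3600, 60):   # one device handshake per minute
        cache.response_for_handshake(start + timedelta(seconds=second))
    print(cache.ca_queries)   # CA contacted once for 1440 handshakes
```

However many devices connect during the window, the number of CA queries depends only on the number of servers and the length of the validity window.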
The problem with OCSP stapling is that it is an optional TLS extension (the Certificate Status Request extension), and support for it in IoT connectivity stacks is uneven. It is often missing from older stacks supporting only TLS 1.1 and earlier. Some other implementations support basic TLS 1.2 but lack the OCSP stapling extension. This may be an obstacle to the wide adoption of OCSP stapling.
Short-Lived Certificates
Another alternative is to use short-lived server certificates, making explicit revocation unnecessary. Under this model, a server identifies itself by a very short-lived certificate, but the CA certificate signing it can be long-term. The device only stores the CA certificate and does not have to refresh it often, but the server has to contact the CA often to have its short-lived certificate refreshed.
Because the root CA certificate is long-lived, its private key should be stored securely (for example, in a Hardware Security Module) to reduce the risk of compromise. Otherwise having the private key stolen could result in a severe impact, especially if no revocation and renewal mechanisms are in place.
The CA signs a server certificate with a short validity period, for example, 24 hours. When a device receives the server certificate, it can conclude from the CA signature and the issue date that the CA still trusted the server at most one validity period ago. The confidence level depends directly on the validity period: the shorter the expiration window, the higher the confidence. On the other hand, the server has to obtain a new certificate every time the old one expires, so shorter validity periods cause more communication between the server and the CA. However, this extra communication on the cloud side is usually negligible.
Since the short-lived certificate will expire within 24 hours, its usefulness to an attacker who steals its private key is limited. If the certificate is compromised, its owner can simply let it expire and request the next one with a new key. These security guarantees should be enough to replace the need for a revocation mechanism altogether.
For this scheme to work properly, the limited server certificate lifespan must be actively enforced by the CA or the device, preferably both. Otherwise, a compromised server could obtain a long-lived certificate and subvert this mechanism. This requires software support in both the CA and the device.
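On the device side, the enforcement described above amounts to checking the certificate's total lifespan in addition to the usual expiry check. The sketch below assumes the validity timestamps have already been extracted from the certificate; `accept_short_lived`, `MAX_LIFETIME`, and `CLOCK_SKEW` are hypothetical names, and the 24-hour limit and 5-minute skew are assumed policy values.

```python
from datetime import datetime, timedelta, timezone

MAX_LIFETIME = timedelta(hours=24)   # assumed policy, matching the 24-hour example
CLOCK_SKEW = timedelta(minutes=5)    # assumed tolerance for device clock drift


def accept_short_lived(
    not_before: datetime,
    not_after: datetime,
    now: datetime,
    max_lifetime: timedelta = MAX_LIFETIME,
    skew: timedelta = CLOCK_SKEW,
) -> bool:
    """Accept only certificates that are currently valid AND short-lived."""
    if not_after - not_before > max_lifetime:
        # A long-lived certificate would bypass the scheme, so reject it
        # even though it has not expired.
        return False
    return not_before - skew <= now <= not_after + skew


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
    now = issued + timedelta(hours=3)
    print(accept_short_lived(issued, issued + timedelta(hours=24), now))   # short-lived
    print(accept_short_lived(issued, issued + timedelta(days=365), now))   # long-lived
```

A device that relies on this check also needs a reasonably accurate clock, since the whole scheme rests on comparing timestamps.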
Overall, this scheme incurs a slight communication burden on the CA and the server, which is usually negligible. Unlike OCSP stapling, it adds no OCSP response to the handshake between the server and the device, and no extensions to the TLS protocol are involved. However, the device and the CA need some logic to enforce the short certificate lifetime, which can be difficult with some TLS libraries.
Short-lived certificates have received some support in the industry, and an effort to standardize them is underway. Although there is no technical obstacle to implementing short-lived certificates in the Internet of Things, so far they are not in wide use. Perhaps that will change over time, or perhaps the dominant solution will be to upgrade to TLS 1.2 and support OCSP stapling instead.
Certificate authentication can provide robust and scalable solutions in the Internet of Things, as it already does in the traditional internet and the web. It takes planning to deploy and maintain certificates securely. IoT manufacturers should design their certificate management mechanisms carefully while addressing such concerns as certificate renewal and revocation.
Written by Leo Dorrendorf and Anna Schnaiderman