In part 1 of this blog, we looked at Kubernetes from the perspective of understanding how it works and how it differs from a traditional software deployment environment. It is important to have that knowledge before deciding what technology to use to secure your cluster.
In this part, we will consider two of the previously identified concerns in more detail: image health and anomaly detection. We will also explain some of the common attack vectors for Kubernetes.
Image scanning
Every time the orchestrator needs to spin up a new container, it reads the deployment manifest associated with that container, retrieves the associated image from the image repository, and starts a new container based on that image.
An image is the base of a running container. You create an image specific to your containers by deploying your application code, along with associated libraries and other dependencies, on top of a ‘parent image’. As stated in the Docker documentation, “A parent image is the image that your image is based on.” These images are usually ‘light’ versions of common operating systems like Ubuntu, Red Hat and Alpine Linux, but they can also be platform images for databases like MySQL, PostgreSQL, Redis etc.
If the parent image or the framework libraries retrieved during your image creation contain vulnerabilities, they are inherited by your containers and will be deployed in the cluster when spinning up your new containers.
A good practice is to protect your code base by following secure development practices, including code review, static code scanning and vulnerability scanning, before building an image. Once the container image is built, it’s also good practice to run an automated vulnerability scan on it. When that is done and testing is complete, the security team can approve the image as suitable for running in production environments.
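As a rough illustration of such a gate in a build pipeline, the sketch below decides whether an image may be promoted based on the findings of a scan. The `Finding` record, the severity scale and the placeholder identifiers are my own assumptions for illustration; real scanners each have their own report formats and scoring.

```python
from dataclasses import dataclass

# Severity ranking assumed for this sketch; real scanners define their own scales.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Finding:
    """One vulnerability reported for an image (hypothetical record shape)."""
    cve_id: str
    package: str
    severity: str  # one of the keys in SEVERITY_RANK


def blocking_findings(findings, threshold="high"):
    """Return the findings whose severity is at or above the threshold."""
    limit = SEVERITY_RANK[threshold]
    return [f for f in findings if SEVERITY_RANK[f.severity] >= limit]


def may_promote(findings, threshold="high"):
    """An image may be promoted only if no finding reaches the threshold."""
    return not blocking_findings(findings, threshold)


if __name__ == "__main__":
    # Placeholder identifiers, not real vulnerabilities.
    report = [
        Finding("CVE-0000-0001", "example-lib", "medium"),
        Finding("CVE-0000-0002", "example-parent-image", "critical"),
    ]
    for finding in blocking_findings(report):
        print(f"blocking: {finding.cve_id} in {finding.package} ({finding.severity})")
    print("promote" if may_promote(report) else "do not promote")
```

The threshold is a policy decision: a stricter pipeline might block on ‘medium’, while a team with a backlog of inherited parent-image findings might start at ‘critical’ and tighten over time.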
You should also follow good security practices when promoting your images to production to protect their integrity, including image signing, integrity checks and continuous vulnerability scans in production. There is a range of products that can help you monitor your vulnerability status, including Aqua Sec Platform, NeuVector, Qualys and Twistlock.
These tools have their own distinctive features and ways to manage the findings, but there are basic features common to most of them. Typically, the solution deploys an agent in your cluster that scans your images every time you start a new container and provides you with a view of the current state of the images that are running in your environment.
The information will look similar to the screenshots in Figures 1 and 2.
Figure 1: Images and vulnerabilities (source: Twistlock)
Figure 2: Vulnerabilities scores and coverage (source: Twistlock)
I’d like to stress that the agent scans the images from which containers are created and not the actual containers themselves. This means that if you fix problems in the running instances of your containers rather than in the images, the report will still list the same vulnerabilities for your container image after re-running a scan. This is expected behaviour, as the scanning agent sits with the orchestrator and scans images when a new container is created, not at runtime. We need to remember that when that container is destroyed or recreated, the substitute will be built anew from the vulnerable container image. All the changes made manually in a running instance will be lost.
The right way to fix security problems is to go back upstream and resolve them in your delivery pipeline at the build stage, following the established route for change management, including all the stages for validating and promoting the new image into the production repository. This fixes the problem in the image itself and ensures consistency, versioning and developer discipline.
In Kubernetes, after you rebuild and tag the image, you update the deployment manifest to reference the new tag. A change to the deployment manifest makes the orchestrator recreate all the containers affected by that change. This happens only when the orchestrator detects a difference in the image name, tag or other configuration in the container deployment manifest.
A good practice to achieve that is to maintain versioning in the image tagging and promote that version into your manifest rather than using the generic tag ‘:latest’ for a newly built image.
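A check like this is easy to automate before a manifest is applied. The following is a minimal sketch, assuming the manifest has already been loaded into a Python dictionary with the usual Deployment layout (‘spec.template.spec.containers’); the function names are mine. It treats a reference as mutable when it has no tag (which resolves to ‘latest’) or uses ‘:latest’ explicitly, and treats a reference pinned by digest as immutable.

```python
def split_image_reference(reference):
    """Split 'name[:tag][@digest]' into (name, tag, digest); missing parts are None."""
    name, _, digest = reference.partition("@")
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    # A colon before the last slash belongs to a registry port, not to a tag.
    if last_colon > last_slash:
        name, tag = name[:last_colon], name[last_colon + 1:]
    else:
        tag = None
    return name, tag, digest or None


def is_mutable_reference(reference):
    """True if the reference can silently point at different image content later."""
    _, tag, digest = split_image_reference(reference)
    if digest:
        return False  # pinned by content digest
    return tag is None or tag == "latest"


def mutable_images(deployment):
    """List the container images in a Deployment manifest that use mutable references."""
    containers = deployment["spec"]["template"]["spec"]["containers"]
    return [c["image"] for c in containers if is_mutable_reference(c["image"])]


if __name__ == "__main__":
    deployment = {
        "spec": {"template": {"spec": {"containers": [
            {"name": "web", "image": "alpine:3.9"},
            {"name": "helper", "image": "alpine:latest"},
        ]}}}
    }
    print(mutable_images(deployment))
```

Running the example reports only the ‘helper’ container, since ‘alpine:3.9’ names a specific version.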
Activity monitoring for anomaly detection
Containers are designed to be disposable. To achieve efficient lifecycle management, container platforms use several layers of abstraction to ensure portability, scalability and, of course, the clean removal of containers once they have been terminated.
Logs, data and configuration are also all present at different layers of abstraction. When we evict a container, the data and logs stored by that container will also be deleted. Yet, many of the cluster’s activities are not visible to traditional monitoring systems, which means that traditional log analysis is likely to be unsuitable or at least incomplete for use with container clusters.
In Kubernetes, logging is available at either the node or cluster level. Application logs will normally be directed to ‘stderr’ and ‘stdout’, and for them to be captured persistently, a logging agent must be specified to manage the logs.
In contrast, system components in a container always log into ‘/var/log’. So, it’s sensible to use persistent volumes to map a container’s ‘/var/log’ directory to persistent storage outside the container, since this allows you to retain the logs after pod eviction.
Kubernetes cluster-level logging uses three main delivery mechanisms:
- Use of node-level logging agents (running on each node)
- Dedicated sidecar containers for logging in an application pod
- Pushing log messages directly to a back end from within the application
Each of these logging approaches has its benefits and constraints, and you must select one that suits your case best. You can find more details and examples of implementing each mechanism on the Kubernetes website or on vendor websites.
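To make the first mechanism more concrete from the application’s side, here is a minimal sketch of an application that writes one JSON object per line to ‘stdout’, leaving collection and shipping to whatever node-level agent the cluster runs. The field names and the logger name are my own choices, not a format required by Kubernetes or by any particular agent.

```python
import json
import logging
import sys


class JsonLineFormatter(logging.Formatter):
    """Format each log record as a single JSON object on one line."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def build_logger(name, stream=None):
    """Return a logger that writes JSON lines to the given stream (stdout by default)."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]  # replace any handlers from earlier configuration
    logger.propagate = False     # avoid duplicate lines via the root logger
    return logger


if __name__ == "__main__":
    log = build_logger("orders")  # hypothetical service name
    log.info("order %s accepted", 42)
    log.warning("payment provider slow to respond")
```

Because the application only writes to its standard streams, the same container image works unchanged whichever of the three delivery mechanisms the cluster uses later.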
In the previous part of this article, we discussed an example where an attacker had been able to compromise an entire cluster by exploiting a Struts 2 vulnerability via a legitimate access protocol. To establish the full chain of events in a traditional environment, the forensic team would require a full set of logs and an unaltered image of the host. As mentioned, traditional methods for logging and monitoring are not sufficient for the security monitoring of containerised platforms like Kubernetes.
What if the attacker, after establishing a legitimate access route to elevate their privileges (like becoming a privileged user in the administration dashboard), removed the compromised container or evicted the pod? Removing that compromised container would trigger Kubernetes self-healing and result in the cluster running a brand new, clean version of the same container without any traces of compromise.
There are a number of tools (like NeuVector and Twistlock) that will collect the audit information for you and store it in a separate volume, even if the pods have been recreated and the old containers cease to exist. This means you will still be able to access audit logs and shell commands that were run inside the terminated containers. The log is collected in real time and stored in a separate persistent volume or delivered to a remote location. You can use visualisation features to analyse the data or transport the log to an external tool for security monitoring and alerting. Tools like these are valuable because even after a compromise and container rebuild, you will still have access to logs and be able to recreate some parts of the attack.
Many activity monitoring solutions have some type of machine learning capability to recognise utilisation patterns and use them as a baseline to assess observed behaviour for anomalies. These solutions can usually run in passive mode (watch and report) as well as active mode (stop activities that are deemed malicious).
The screenshot in Figure 3 shows an example of one such tool in action, illustrating how it applies learning mode to newly created services.
Figure 3: Learning mode (source: Twistlock)
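The idea behind learning mode can be illustrated with a deliberately simple sketch. It is not how any of the vendors implement it, and it is not machine learning: it is a plain allow-list that records which processes each service runs while learning and flags anything unseen afterwards. The class, method and service names are hypothetical.

```python
from collections import defaultdict


class ProcessBaseline:
    """Learn which processes each service normally runs, then flag deviations."""

    def __init__(self):
        self.learning = True
        self._known = defaultdict(set)  # service name -> set of process names

    def observe(self, service, process):
        """Record the process while learning; otherwise return True if it is anomalous."""
        if self.learning:
            self._known[service].add(process)
            return False
        return process not in self._known.get(service, set())

    def enforce(self):
        """Leave learning mode; later observations are checked against the baseline."""
        self.learning = False


if __name__ == "__main__":
    baseline = ProcessBaseline()
    for process in ("nginx", "nginx-worker"):
        baseline.observe("web", process)
    baseline.enforce()
    for process in ("nginx", "sh", "nmap"):
        status = "ANOMALY" if baseline.observe("web", process) else "ok"
        print(f"web/{process}: {status}")
```

In passive mode a tool would only report the flagged events; in active mode it would block or kill the offending process. The quality of the baseline depends on how representative the learning period was, which is why a shell session or a scan started after learning stands out so clearly.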
Let’s examine one of these activity monitoring solutions in more detail. In my lab, I deployed a monitoring solution from one of the vendors. In parallel, I ran a free tool to perform a configuration scan of my environment to identify cluster misconfigurations and weaknesses. Since that was not a ‘common’ activity on my cluster, the monitoring solution immediately detected the activity as suspicious and alerted the administrator. The same happened when I tried to violate good security practices and accessed a container via direct shell access. The tool provided a full log of commands executed during the scan, and in the case of shell access, I was able to see the shell command history, which would allow further investigation of these suspicious incidents.
The screenshots in Figures 4 and 5 show how the activity from these examples is presented via the user interface of the tool I used.
Figure 4: Detection of suspicious activity (source: Twistlock)
Figure 5: Reporting on suspicious activity (source: Twistlock)
The purpose of these examples is not to promote or endorse any particular tool but rather to illustrate the usefulness of this class of monitoring tool. As you can see, they provide a lot of insight into the activity in a Kubernetes cluster.
To conclude this point, remember that your Kubernetes cluster probably doesn’t come with all the tools and mechanisms to allow you to have full visibility of what is happening internally. You must be aware of the capabilities provided by the platform and extend those or add the technology that will help you gain full control, visibility and, most importantly, actionable insights in case of suspicious activity or a newly detected exploitable vulnerability.
Kubernetes attack vectors
Attacks on any computer system stem from the motivation to compromise one or a combination of the three pillars of security, known as the CIA triad: confidentiality, integrity and availability. Kubernetes is no different, although the concrete mechanisms and techniques used to attack a Kubernetes cluster might differ from those used against more traditional systems.
Different sources describe the threats differently, and some even try to rank threats in order of severity. But in reality, a relatively small number of attack types account for the vast majority of attempts to compromise Kubernetes clusters. The most common of these are described below:
1. Container compromise
An application misconfiguration or vulnerability enables the attacker to break into a container and start probing for weaknesses in the network, process controls or file system.
2. Unauthorised connections between pods
Compromised containers can attempt to connect to other pods running on the same host or other accessible hosts to probe for vulnerabilities or launch an attack. Although Layer 3 network controls, such as whitelisting pod IP addresses, can offer some protection, attacks over trusted IP addresses can only be detected with Layer 7 network filtering.
3. Data exfiltration from a pod
Stealing data is often achieved using a combination of techniques, which can include a reverse shell in a pod connecting to a command-and-control server, combined with network tunnelling to hide confidential data.
4. Compromised container running malicious process
Containers generally run a limited and well-defined set of processes, but a compromised container can start malware such as crypto-mining or suspicious processes like network port scanning.
5. Container file system compromised
If an attacker can access the file system freely or compromise its security controls, they can install vulnerable libraries or packages to exploit the container or change sensitive files within the container. Once exploited, a privilege escalation to root or other similar breakouts can be attempted.
6. Compromised worker node
Just as a container can be compromised, a skilled attacker may be able to compromise the underlying host machine, making anything running on that host vulnerable. For example, the Dirty COW Linux kernel vulnerability (CVE-2016-5195) enabled an unprivileged local user to escalate to root privilege.
7. Attacks on the Kubernetes infrastructure itself
In order to disable or disrupt applications or gain access to secrets, resources or containers, hackers can also attempt to compromise Kubernetes resources such as the API server or kubelets.
8. Attack on CI/CD pipeline
An attacker can use a poorly configured CI/CD pipeline to get access to the source code and inject malicious code and backdoors into the application that will eventually be deployed and run in your Kubernetes cluster.
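To show what a Layer 3/4 control against the second attack type (unauthorised connections between pods) can look like, the sketch below builds two Kubernetes NetworkPolicy objects as Python dictionaries and prints them as JSON, which ‘kubectl apply -f’ accepts as well as YAML. The namespace, labels, port and policy names are placeholders of my own, and the policies only take effect if the cluster’s network plugin supports NetworkPolicy. Note that this does not provide the Layer 7 filtering mentioned above.

```python
import json


def default_deny_ingress(namespace):
    """Policy selecting every pod in the namespace and allowing no inbound traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {},           # empty selector = all pods in the namespace
            "policyTypes": ["Ingress"],  # no ingress rules listed = nothing allowed
        },
    }


def allow_ingress_from(namespace, target_labels, client_labels, port):
    """Policy allowing only pods with client_labels to reach target pods on one TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-selected-clients", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": client_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder namespace and labels: only 'api' pods may reach 'db' pods on 5432.
    policies = [
        default_deny_ingress("shop"),
        allow_ingress_from("shop", {"app": "db"}, {"app": "api"}, 5432),
    ]
    for policy in policies:
        print(json.dumps(policy, indent=2))
```

With both policies applied, a compromised pod that does not carry the ‘app: api’ label can no longer open connections to the database pods, which narrows what an attacker can probe from inside a single container.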
Key security practices for Kubernetes
Taking all we have discussed in this article into consideration, I have listed some of the key security practices below that you might find useful when building a Kubernetes cluster. Of course, this is not a complete and exhaustive list of all the security practices you may need, but it provides a starting point from which you can assess your situation and identify the practices needed to mitigate your security threats and meet your security objectives.
- Use namespaces; use separate namespaces for each of your applications.
- Use Kubernetes Network Policies to restrict ingress and egress traffic for your pods.
- Use quotas to limit resource utilisation and avoid draining resources from other services in case of a container compromise or unexpected behaviour.
- Choose the minimal and most secure base image for your container (like Alpine, BusyBox, Debian); harden the base image that you choose; refer to images by version (‘alpine:3.9’) and not by mutable labels (‘alpine:latest’) to avoid unexpected changes published by the image provider.
- Implement vulnerability scanning for your container images.
- Consider implementing a supply chain vulnerabilities firewall to prevent vulnerable third-party code from becoming part of your application (like Nexus Firewall).
- Keep your secrets in Secret objects. Store the secrets according to a defined secret management strategy (for example with a secret management tool).
- Use TLS certificates at your ingress resources (like nginx-ingress).
- Run your containers in least-privilege mode.
- Back up your persistent volumes to allow easier recovery after a security incident.
- Implement log rotation and file integrity monitoring.
- Implement log analysis solutions (such as a SIEM, the ELK stack or Splunk).
- Remember: “If nobody is there to hear it, a falling tree does not make a sound.” Define roles, responsibilities and notification channels to receive and react to events in your cluster.
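Some of the practices above, in particular least-privilege mode and resource limits, can be checked mechanically before a workload is deployed. The following is a minimal sketch, assuming the pod spec has been loaded into a Python dictionary. It inspects only container-level ‘securityContext’ fields and ignores pod-level defaults, and the function name and message wording are my own; it is a starting point rather than a complete policy.

```python
def least_privilege_findings(pod_spec):
    """Return human-readable findings for containers that are not locked down.

    Only container-level settings are inspected; pod-level securityContext
    defaults are ignored in this sketch.
    """
    findings = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext", {})
        if ctx.get("privileged", False):
            findings.append(f"{name}: runs as a privileged container")
        if ctx.get("allowPrivilegeEscalation", True):
            findings.append(f"{name}: allowPrivilegeEscalation is not set to false")
        if not ctx.get("runAsNonRoot", False):
            findings.append(f"{name}: runAsNonRoot is not set to true")
        if not ctx.get("readOnlyRootFilesystem", False):
            findings.append(f"{name}: root file system is writable")
        if "limits" not in container.get("resources", {}):
            findings.append(f"{name}: no resource limits defined")
    return findings


if __name__ == "__main__":
    pod_spec = {
        "containers": [
            {
                "name": "web",  # hypothetical container names and values
                "securityContext": {
                    "privileged": False,
                    "allowPrivilegeEscalation": False,
                    "runAsNonRoot": True,
                    "readOnlyRootFilesystem": True,
                },
                "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
            },
            {"name": "debug", "securityContext": {"privileged": True}},
        ]
    }
    for finding in least_privilege_findings(pod_spec):
        print(finding)
```

A check like this fits naturally into the same pipeline stage as image scanning, so that a manifest asking for more privilege than it needs is questioned before it reaches the cluster.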
Conclusion
In this article, we considered some important practices to address Kubernetes security. I hope the concepts, practices and tools discussed help you to direct your thinking in this area and to consider the possible threats and attack scenarios that you need to be aware of when designing or operating a complex Kubernetes cluster. The approach to designing, building and protecting your environment will vary between different situations, but the understanding of what is ‘under the hood’ will help you to navigate through your technology and make the right decisions regarding the security controls and tools to use.
Technology and how it is used to solve business problems is evolving rapidly. Kubernetes is becoming a commodity, and there is a plethora of tools, technology and approaches to help get the most value from its use. However, it’s rare that an out-of-the-box method or a theoretical approach can be applied to your specific situation without any adjustments. At the end of the day, even if you are trying to solve a common problem, your situation is almost certainly unique in some ways. To meet this challenge, we should always follow proven practices but equally remember to validate and customise them to our specific situation.
Sources and acknowledgement
- Kubernetes official website (https://kubernetes.io)
- NeuVector blogs (https://neuvector.com/container-security/blog/)
- Twistlock website and Twistlock evaluation license (https://twistlock.com)
- Aqua Security website and AquaSec open-source projects (https://aquasec.com)
- Qualys website (https://qualys.com)
- Sonatype website (https://www.sonatype.com)