<img height="1" width="1" style="display:none;" alt="" src="https://px.ads.linkedin.com/collect/?pid=4958233&amp;fmt=gif">
 
RSS Feed

Architecture | Vlad Calmic | 09 June 2020


Part 2: Security Technology in Kubernetes 

INTRODUCTION 

In Part 1 of this blog, we looked at Kubernetes from the perspective of understanding how it works and how different it is from a traditional software deployment environment. It is important for a designer or security specialist to have that knowledge before deciding which technology to use to secure a cluster.

In Part 2 we will consider two of the previously identified concerns in more detail: Image health and anomaly detection. We will also explain some of the common attack vectors for Kubernetes.

IMAGE SCANNING 

Every time the orchestrator needs to spin up a new container, it reads the deployment manifest associated with that container, looks up the referenced image in the image repository, retrieves it, and starts the container from that image.

An image is the base of a running container. You create an image specific to your containers by deploying your application code, along with associated libraries and other dependencies, on top of a ‘parent image’. As stated in the Docker documentation, “A parent image is the image that your image is based on”. They are usually ‘light’ versions of common operating systems like Ubuntu, Red Hat and Alpine Linux, but they can also be platform images such as those for databases like MySQL, PostgreSQL, Redis, etc.

If the parent image or the framework libraries retrieved during your image creation contain vulnerabilities, they are inherited by your containers and will be deployed in the cluster when spinning up your new containers.

A good practice is to protect your code base by following secure development practices, including code review, static code analysis, and vulnerability scanning prior to building an image. When building your container image, it is also good practice to run a vulnerability scan on the newly built image using an automated tool. Once that is done, and after testing is complete, the image can be approved by the security team as suitable for running in production environments. Follow good security practice when promoting your images to production to protect their integrity, including image signing, integrity checks, and continuous vulnerability scanning in production.
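
For illustration, a pipeline step like the following (a minimal sketch, assuming a GitLab CI pipeline and Aqua’s open-source Trivy scanner; the job name, stage, and image reference variables are hypothetical) would fail the build if the newly built image contains high or critical vulnerabilities:

    # Illustrative GitLab CI job: scan the freshly built image with Trivy and
    # fail the pipeline if high or critical vulnerabilities are found.
    scan-image:
      stage: test
      image:
        name: aquasec/trivy:latest   # pin to a specific release in practice
        entrypoint: [""]
      script:
        - trivy image --exit-code 1 --severity HIGH,CRITICAL "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"

The same idea applies to any CI system and scanner: run the scan as a gating step so that a vulnerable image never reaches the production repository.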

There is a range of products on the market that can help you monitor your vulnerability status, including (in alphabetical order): Aqua Security Platform, NeuVector, Qualys, and Twistlock.

Each of these tools has its own distinctive features and ways of managing the findings, but there are basic features common to most of them. All of these solutions deploy an agent in your cluster that scans your images every time you start a new container and provides you with a view of the current state of the images running in your environment.

The information will look similar to the screenshots in Figure 1 and Figure 2.

Figure 1 - Images and vulnerabilities (source: Twistlock)

Figure 2 - Vulnerabilities scores and coverage (source: Twistlock)

It is worth stressing that the agent scans the images from which containers are created, not the running instances (that is, the containers themselves). This means that if you fix problems in a running container rather than in its image, the next scan will still report the same vulnerabilities for that image. This is expected behaviour: the scanning agent sits with the orchestrator and examines images when a new container is created, not at runtime. Remember that when a container is destroyed or recreated, its replacement is built afresh from the vulnerable image, and any changes made manually in the running instance are lost.

The right way to fix security problems is to go back upstream and resolve them in your delivery pipeline at the build stage, following the established route for change management, including all the stages for validation and promotion of the new image into the production repository. This fixes the problem in the image and ensures consistency, versioning, and developer discipline.

In Kubernetes, when you rebuild the image and tag it, you can also update the deployment manifest to reference the new tag. An update to the deployment manifest makes the orchestrator recreate all the containers affected by that change. This happens only when the orchestrator detects a difference in the image name, tag, or other configuration in the container deployment manifest.

A good practice to achieve this is to use explicit version numbers in your image tags and promote that version into your manifest, rather than using the generic ‘:latest’ tag for a newly built image.
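
As a minimal sketch (the application name, registry, and version below are hypothetical), the relevant part of a Deployment manifest with an explicitly versioned image looks like this; bumping the version in the manifest and re-applying it is what triggers the orchestrator to roll out new containers:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: payments-api                  # hypothetical application name
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: payments-api
      template:
        metadata:
          labels:
            app: payments-api
        spec:
          containers:
          - name: payments-api
            # Pinned, immutable version tag rather than the mutable ':latest'
            image: registry.example.com/payments-api:1.4.2

Promoting a new build then becomes an explicit, auditable change: update the tag (for example from 1.4.2 to 1.4.3) in the manifest held in version control and re-apply it, and the orchestrator performs a rolling replacement of the affected pods.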

ACTIVITY MONITORING FOR ANOMALY DETECTION 

Containers are designed to be disposable. To achieve efficient lifecycle management, container platforms use several layers of abstraction to provide portability, scalability and, of course, soft eviction of containers once they have been terminated.

Logs, data, and configuration are also present at different layers of abstraction. When we evict a container, all of the data and logs stored by that container are deleted with it. In addition, many of the cluster’s activities are not visible to traditional monitoring systems, which means that traditional log analysis is likely to be unsuitable, or at least incomplete, for container clusters.

In Kubernetes, logging is available at either the node level or the cluster level. Application logs are normally directed to ‘stdout’ and ‘stderr’, and for them to be captured persistently, a logging agent must be configured to manage them.

In contrast, system components running in containers always log to ‘/var/log’, so it is sensible to mount persistent volumes that map a container’s ‘/var/log’ directory to persistent storage outside the container in order to collect these logs. This allows retention of the logs after pod eviction.
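
A minimal sketch of this idea (the pod, image, and claim names are hypothetical, and it assumes a persistent volume claim has already been provisioned) maps the container’s ‘/var/log’ directory onto a persistent volume:

    apiVersion: v1
    kind: Pod
    metadata:
      name: worker                        # hypothetical pod name
    spec:
      containers:
      - name: worker
        image: registry.example.com/worker:2.0.1
        volumeMounts:
        - name: varlog
          mountPath: /var/log             # anything written here lands on the persistent volume
      volumes:
      - name: varlog
        persistentVolumeClaim:
          claimName: worker-logs          # hypothetical, pre-provisioned PVC

Because the volume outlives the pod, the logs remain available for inspection after the pod has been evicted or recreated.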

Kubernetes cluster-level logging uses three main delivery mechanisms:

  • Use of node-level logging agents (running on each node)
  • Dedicated sidecar containers for logging in an application pod
  • Pushing log messages directly to a backend from within the application

Each of these logging approaches has its own benefits and constraints, and you must select the one that is most suited to your case. You can find more details and examples of implementing each mechanism on the Kubernetes website or on vendor websites.
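
As an example of the second approach, here is a minimal sketch (image names are hypothetical) that runs a log-forwarding sidecar next to the application container, sharing the log directory through a common volume:

    apiVersion: v1
    kind: Pod
    metadata:
      name: webapp
    spec:
      containers:
      - name: app
        image: registry.example.com/webapp:1.0.0
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app         # the application writes its log files here
      - name: log-forwarder               # sidecar that ships the same files to a logging backend
        image: registry.example.com/log-forwarder:1.0.0   # e.g. a fluentd or fluent-bit based image
        volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
          readOnly: true
      volumes:
      - name: app-logs
        emptyDir: {}                      # shared between the two containers for the pod's lifetime

Note that an ‘emptyDir’ volume is deleted with the pod, so the sidecar must ship the logs off the node (or to a mounted persistent volume) rather than simply store them locally.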

In the previous part of this blog, we discussed an example in which an attacker was able to compromise an entire cluster by exploiting a Struts 2 vulnerability via a legitimate access protocol. In order to establish the full chain of events in a traditional environment, the forensic team would require a full set of logs and an unaltered image of the host. As we mentioned earlier, traditional methods of logging and monitoring are not sufficient for the security monitoring of containerised platforms like Kubernetes.

What if the attacker, after using a legitimate access route to elevate their privileges (e.g. becoming a privileged user in the administration dashboard), removed the compromised container or evicted the pod? Removing that compromised container would trigger Kubernetes self-healing and result in the cluster running a brand new, clean version of the same container without any trace of the compromise.

There are a number of tools on the market (e.g. NeuVector and Twistlock) that will collect this audit information for you and store it in a separate volume, even after the pods have been recreated and the old containers no longer exist. This means you will still be able to access the audit logs and the shell commands that were run inside the terminated containers. The log is collected in real time and stored in a separate persistent volume or delivered to a remote location. You can use the visualisation features of the tool to analyse the data, or you can transport the log to an external tool for security monitoring and alerting. A tool like this is valuable because, even after a compromise and container rebuild, you will still have access to the logs and will be able to reconstruct parts of the attack.

Many activity monitoring solutions include some type of machine learning capability to recognise utilisation patterns and use them as a baseline against which observed behaviour is assessed for anomalies. These solutions can usually run in passive mode (watch and report) as well as in active mode, where they stop activities that are deemed to be malicious.

The screenshot in Figure 3 shows an example tool in action, illustrating how such tools apply a learning mode to newly created services.

Figure 3 - Learning mode (source: Twistlock)

Let’s examine one of these activity monitoring solutions in action. In my lab, I deployed a monitoring solution from one of the vendors. In parallel, I ran a free tool that performs a configuration scan of the environment to identify cluster misconfigurations and weaknesses. Since that was not a ‘common’ activity on my cluster, the monitoring solution immediately flagged it as suspicious and alerted the administrator. The same thing happened when I violated good security practice and accessed a container via direct shell access. The tool provided a full log of the commands executed during the scan, and in the case of shell access, I was able to see the shell command history, which would allow further investigation of these suspicious incidents.

The screenshots in Figure 4 and Figure 5 show how the activity from these examples is presented via the user interface of the tool we used.

Figure 4 - Detection of suspicious activity (source: Twistlock)

Figure 5 - Reporting on suspicious activity (source: Twistlock)

The purpose of these examples is not to promote or endorse any particular tool, but rather to illustrate the usefulness of this class of monitoring tool. As you can see, they provide a lot of insight into the activity going on within a Kubernetes cluster.

To conclude this point, remember that your Kubernetes cluster probably doesn’t come with all the tools and mechanisms needed to give you full visibility of what is happening internally. As a designer, you must be aware of the capabilities provided by the platform and extend them, or add technology that gives you full control, visibility and, most importantly, actionable insight in the case of suspicious activity or a newly detected exploitable vulnerability.

KUBERNETES ATTACK VECTORS 

Attacks on any computer system stem from the motivation to compromise one, or a combination, of the three pillars of security known as the CIA triad: Confidentiality, Integrity and Availability.

Kubernetes is no different, although the mechanisms and techniques used to attack a Kubernetes cluster might differ from those used against more traditional systems.

Different sources describe the threats differently, and some even try to rank them in order of severity. In reality, a relatively small number of attack types account for the vast majority of attempts to compromise Kubernetes clusters. The most common of these are described below:

[V1]: CONTAINER COMPROMISE 

An application misconfiguration or vulnerability enables the attacker to break into a container and start probing for weaknesses in the network, process controls, or file system.

[V2]: UNAUTHORIZED CONNECTIONS BETWEEN PODS 

Compromised containers can attempt to connect to other pods running on the same host or other accessible hosts to probe for vulnerabilities or launch an attack. Although Layer 3 network controls such as whitelisting pod IP addresses can offer some protection, attacks over trusted IP addresses can only be detected with Layer 7 network filtering.

[V3]: DATA EXFILTRATION FROM A POD  

Stealing data is often achieved using a combination of techniques, which can include a reverse shell in a pod connecting to a command and control server, combined with network tunnelling to hide confidential data.

[V4]: COMPROMISED CONTAINER RUNNING MALICIOUS PROCESS 

Containers generally run a limited and well-defined set of processes, but a compromised container can start malware such as crypto mining or suspicious processes like network port scanning.

[V5]: CONTAINER FILE SYSTEM COMPROMISED 

If an attacker can access the file system freely or compromise its security controls, they can install vulnerable libraries or packages in order to exploit the container, or change sensitive files within it. From there, privilege escalation to root or similar breakouts can be attempted.

[V6]: COMPROMISED WORKER NODE 

Just as a container can be compromised, a skilled attacker may be able to compromise the underlying host machine, making anything running on that host vulnerable. For example, the Dirty Cow Linux kernel vulnerability enabled a user to escalate to root privilege.

[V7]: ATTACKS ON THE KUBERNETES INFRASTRUCTURE ITSELF 

In order to disable or disrupt applications, or gain access to secrets, resources, or containers, hackers can also attempt to compromise Kubernetes resources such as the API Server or kubelets.

[V8]: ATTACK ON CI/CD PIPELINE 

An attacker can use a poorly configured CI/CD pipeline to gain access to the source code and inject malicious code and backdoors into the application that will eventually be deployed and run in your Kubernetes cluster.

KEY SECURITY PRACTICES FOR KUBERNETES 

Taking all we have discussed in this article into consideration, I have assembled some of the key security practices into the list below that you might find useful when building a Kubernetes cluster. Of course, this is not a complete and exhaustive list of all of the security practices you may need, but it provides a starting point from which you can assess your situation and identify the practices needed to mitigate your security threats and meet your security objectives.

  • Use namespaces; use separate namespaces for each of your applications.
  • Use Kubernetes Network Policies to restrict ingress and egress traffic for your pods (a minimal example follows this list).
  • Use quotas to limit resource utilisation and avoid draining resources from other services in case of a container compromise or unexpected behaviour.
  • Choose the minimal and most secure base image for your container (e.g. Alpine, BusyBox, Debian); harden the base image that you choose; and refer to images by version (e.g. ‘alpine:3.9’) and not by mutable labels (e.g. ‘alpine:latest’) to avoid unexpected changes published by the image provider.
  • Implement a solution for image vulnerability scanning in your container images.
  • Consider implementing a supply chain vulnerability firewall to prevent vulnerable third-party code from becoming part of your application (e.g. Nexus Firewall).
  • Keep your secrets in Kubernetes Secret objects, and store them according to a defined secret management strategy (e.g. using a dedicated secret management tool).
  • Use TLS certificates at your Ingress resources (e.g. nginx-ingress).
  • Run your containers in least privilege mode.
  • Back up your persistent volumes to allow easier recovery after a security incident.
  • Implement log rotation and File Integrity Monitoring.
  • Implement log analysis solutions (e.g. SIEM, ELK, Splunk, etc.).
  • Remember: “If nobody is there to hear it – a falling tree does not make a sound”. Define roles and responsibilities and notification channels to receive and react to the events in your cluster.
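
To make the network policy recommendation concrete, here is a minimal sketch (the namespace, labels, and port are hypothetical): the first policy denies all ingress traffic to the pods in a namespace by default, and the second explicitly allows the application to be reached only from its front-end pods:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-ingress
      namespace: payments                 # hypothetical application namespace
    spec:
      podSelector: {}                     # applies to every pod in the namespace
      policyTypes:
      - Ingress                           # no ingress rules listed, so all inbound traffic is denied
    ---
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-frontend-to-api
      namespace: payments
    spec:
      podSelector:
        matchLabels:
          app: payments-api
      policyTypes:
      - Ingress
      ingress:
      - from:
        - podSelector:
            matchLabels:
              app: payments-frontend
        ports:
        - protocol: TCP
          port: 8080

Remember that NetworkPolicy objects are only enforced if the cluster’s network plugin supports them (for example Calico or Cilium); on a plugin without policy support they are silently ignored.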

CONCLUSION 

In this article we considered a few important practices for addressing Kubernetes security. I hope the concepts, practices, and tools discussed here have helped to direct your thinking in this area and to highlight the possible threats and attack scenarios you need to be aware of when designing or operating a complex, and probably unique, Kubernetes cluster. The approach to designing, building, and protecting your environment will vary between situations, but understanding what is ‘under the hood’ will help you navigate your technology and make the right decisions about the security controls and tools to use.

Current technology, and how it is used to solve business problems, is evolving rapidly. Kubernetes is becoming a commodity, and there is a plethora of tools, technologies, and approaches to help you get the most value from its use. However, it is rare that an out-of-the-box method or a theoretical approach can be applied without any adjustment to your specific situation. At the end of the day, even if you are trying to solve a common problem, your situation is almost certainly still unique in some ways. To meet this challenge, we should always follow proven practices, but equally we must remember to validate and customise them for our specific situation.

Sources and acknowledgment: 

  1. Kubernetes official web site (https://kubernetes.io)
  2. NeuVector Blogs (https://neuvector.com/container-security/blog/)
  3. Twistlock web site and Twistlock evaluation license (https://twistlock.com)
  4. Aqua Security web site and AquaSec Open Source Projects (https://aquasec.com)
  5. Qualys web site (https://qualys.com)
  6. Sonatype web site (https://www.sonatype.com)

Vlad Calmic

VP Security & Crypto

Vlad is a passionate security professional with over 20 years of hands-on experience in designing and delivering complex IT solutions for e-commerce and financial services. He likes experimenting with new technologies and pushing them to their limits to get the most out of them. He believes that focusing on getting the small details right leads to an excellent big picture. He also believes that a good coffee in the morning will supercharge you for the whole day, especially when you get to have it with the people you love. When he isn’t getting ready to ensure the security of our client’s platforms and applications, he might be right next to you on a motorway, driving to his next big adventure.

