
Architecture | Vlad Calmic | 05 November 2019

PART 1: KUBERNETES AT A GLANCE THROUGH A SECURITY SPECIALIST'S EYES

Introduction

Kubernetes has gained a lot of popularity amongst developers in a relatively short period of time as an easy and straightforward way to quickly develop and deploy containerised applications. It simplifies change management: applying a change becomes a straightforward matter of retiring the compromised running copy of your application and letting the Kubernetes engine do the rest, spinning up a shiny new copy of the fixed instance. The important question, and the focus of this two-part blog series, is to assess this from a security specialist's point of view: does Kubernetes give a security expert enough control over the environment and the ability to assess its state?

What is a container management platform

In order to understand Kubernetes, and the role it plays, we first need to look at the factors that led to the development of such a platform. The age of virtualisation and the rise of container-based application development gave birth to a need for administration and operations groups to manage the provisioning, configuration and deployment of those applications in a controlled, consistent and transparent manner. The development of continuous integration and continuous delivery, and DevOps ways of working, drove widespread automation of the deployment process (through the philosophy of 'everything as code').

Mainstream use of containers for application packaging really began with the first release of the Docker engine in 2013, and since then a comprehensive set of tools has developed around it to manage the definition of a Docker application service and the management of its configuration options. Docker has created its own set of tools and deployment definition frameworks, for example docker-compose, Docker Swarm and docker-machine.

Of course, Google was running containers long before this using their in-house platform called Borg, an internal project tightly tied to their own proprietary technologies. Due to their experience with large-scale container deployment, Google added a lot of features to Borg to make deploying and operating large containerised applications easier. However, its dependency on their in-house technology meant that it wasn't something that other organisations could use. This was the reason that in 2014 Google decided to create a new open source project called Kubernetes: a Borg-like container management platform, but one independent of their internal technology, for everyone to use.

With the rise of Kubernetes, container management and orchestration is no longer a proprietary technology and is slowly moving into the space of commodity utilities. The expectation is that it will be seen as a typical technology to support managed services and will become the standard way of managing containerised platforms. Already we are seeing that it has been adopted by leading cloud providers, which have wrapped it into various services to help delivery teams easily deploy, manage and scale containerised applications.

Under the hood

So, now that we know where container management technologies came from, let's move deeper into the mechanics and focus on a part that often gets less attention – securing the cluster. As a security specialist, I admit the term 'securing' is pretty vague, so for the purpose of this blog I will define it as the security controls that help us protect the confidentiality, integrity and availability of the cluster and the resources it manages.

The purpose of Kubernetes is to abstract the underlying container runtime and infrastructure, be that containers or virtual machines. The definition of the cluster and its policies is achieved using a relatively simple declarative language (YAML specifically) that is interpreted by the engine to provision the necessary resources, deploy our applications and monitor their health.
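
To illustrate the declarative style, a minimal deployment definition might look like the sketch below; the resource name and image reference are purely illustrative:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3                          # Kubernetes keeps three copies running, replacing any that fail
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: myregistry/my-app:1.0   # illustrative image reference
        ports:
        - containerPort: 8080

Applying this file (for example with kubectl apply) is all it takes for the engine to schedule the pods and keep them healthy.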

Kubernetes makes it easy for a developer to define a new configuration and have it up and running in seconds, without being fully aware of the ways in which each element communicates with the others in the cluster. To be fair, that was the concept of Kubernetes in the first place, when Google developers made it easy for their applications to communicate. Although that solves one of the main business problems - 'we have a service and it runs' - there is a certain degree of scepticism from security specialists, as the tendency is to build everything in privileged mode and let everything talk to everything within your cluster.

A common mistake that many people make is to assume that a containerised environment like a Kubernetes cluster is 'secure by default'. This is actually true to some extent, but only in a limited way.

A Kubernetes deployment is secure in some ways, because the virtualisation engine isolates the processes and abstracts the underlying infrastructure and network, which allows you to create virtual environments and control their access to resources. Container platforms have also implemented built-in security and isolation in the container runtime.

This means that containerised platforms provide all the pre-requisites and mechanisms to create a robust and secure environment to deploy your applications, but we all must be aware that it does not come “for free”; you must define your policies and configure their enforcement in your deployment scripts.

Let’s consider some of the specific security factors that are important to consider in a Kubernetes environment.

Network


In a future blog post on this topic, I will introduce a list of attack vectors and the resulting risks that a container-based platform is exposed to. However, there are two specific risks that I would like to discuss in more detail in this section, which are:

- container isolation problems (misconfiguration of that isolation); and
- deployment and running of components with known vulnerabilities.

Let’s consider container isolation problems first. Our focus here is the possible misconfiguration of the pod isolation in Kubernetes, rather than the more general possible problem of vulnerabilities in the container isolation mechanisms.

Firstly, what are isolated and non-isolated pods?

The concept of pod isolation is similar to the concept of network access control policies in software defined networks. In the context of Kubernetes, the difference between an isolated pod and a non-isolated pod is that a pod becomes isolated when there is a network policy that selects it.

Based on the definition on the kubernetes.io website - “A network policy is a specification of how groups of pods are allowed to communicate with each other and other network endpoints. NetworkPolicy resources use labels to select pods and define rules which specify what traffic is allowed to the selected pods.” You can find the full spec of a network policy resource on the Kubernetes website, but an example might look like:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - ipBlock:
        cidr: 172.17.0.0/16
        except:
        - 172.17.1.0/24
    - namespaceSelector:
        matchLabels:
          project: myproject
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 6379
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.0.0/24
    ports:
    - protocol: TCP
      port: 5978

The default network policy behaviour in Kubernetes is 'allow all', which means that a pod will accept connections from any source. Network policies define the ingress and/or egress restrictions associated with a pod or a selection of pods (their enforcement is provided by the network plugin, as discussed below). Once you have selected a pod in a network policy, the only traffic allowed to and from that pod is the set defined in the relevant policy objects; all other traffic is denied. This is the classic 'deny by default, allow by exception' approach: you deny all traffic except that which you have explicitly allowed.

Where there are multiple policies applicable to a pod, they are combined using an OR condition, meaning traffic will be allowed if any of the policies allows it. A good practice is to define a 'deny all' network policy with an empty 'podSelector' attribute, thus enforcing a 'deny all' stance by default, and then start adding network policies to allow the connections needed.
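
A minimal 'deny all' policy for the default namespace might look like the sketch below (the policy name is illustrative); the empty podSelector selects every pod in the namespace, and listing both policy types with no rules blocks all ingress and egress traffic:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all   # illustrative name
  namespace: default
spec:
  podSelector: {}           # empty selector matches every pod in the namespace
  policyTypes:
  - Ingress
  - Egress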

You can define network policies using YAML and apply them to your cluster. Network policies are namespaced resources: a policy only applies to pods within its own namespace, so a restriction that must extend to the whole cluster needs an equivalent policy in every namespace (or a CNI-specific cluster-wide policy type). It is also important to know that network policies will only be enforced if there is a Container Network Interface (CNI) network plugin installed that supports network policy enforcement.

To understand this, let's consider a simple example of a Kubernetes two-node network topology (shown in fig. 1). The nodes are logically segregated, and each contains pods. Each pod has its own IP address that is shared among the containers in that pod, although usually one pod runs one container. All the IPs are routable by default, because in a default deployment pods are not isolated, which means they accept connections from any source. This point is worth remembering, as it is a call to action for your designer to look for network plugins that will support network policies in your cluster. As we've already discussed, you isolate your pods by specifying the ingress and egress rules for a specific pod or selection of pods.

An example of a YAML ingress rule looks like the code fragment below:

...
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          project: myproject
    - podSelector:
        matchLabels:
          role: frontend
...

In the example above, the rule allows ingress traffic from any pod in a namespace labelled 'project=myproject', and from any pod labelled 'role=frontend' in the policy's own namespace (the two selectors are separate entries in the 'from' list, so either is sufficient); all other ingress traffic will be rejected.

Network policies are supported in Kubernetes by a number of CNI networking plugins such as Calico, Romana, Weave Net, Cilium and Contiv, to list just a few.

Fig 1. Kubernetes nodes and the overlay network. (source: NeuVector)

Running components with known vulnerabilities

Now that we have talked about the security implications of network topology, network policies and isolation, let's see how the two risks we listed above (known vulnerabilities and misconfiguration) are related.

You are probably aware of the risk of known vulnerabilities in software, particularly given that the exploitation of known vulnerabilities has accounted for a number of recent high-profile security breaches. In this context we are interested in exploitable vulnerabilities that would allow an attacker to compromise the confidentiality, integrity or availability of your service. A distinctive attribute of this sort of attack is that it is carried out via a legitimate protocol, allowed by the access policies. For the sake of this article it does not matter how that vulnerability ended up in the code: programming error, intentional backdoor, supply chain or any other ways you can think of.

For example, a well-known vulnerability that made the headlines (CVE-2017-5638, a Struts 2 RCE vulnerability) enables remote code execution and is exploitable via a legitimate access protocol (HTTP). There are examples on the web of how easy it is to compromise an entire cluster once the attacker has gained access to one of the containers by mounting a reverse shell attack as a result of this Struts 2 exploit. In one of the examples, once inside, the attacker was able to elevate privileges, run services in privileged mode, move laterally in the network and ultimately take over the control plane and therefore the whole cluster.

Running containers in privileged mode

We’ve probably all made the mistake at some point of running an administrative console in privileged mode. It just makes things so much easier, by removing all the barriers, so that everything talks to everything and it all works like magic. In this context, if you take one thing from this article, STOP THAT PRACTICE! If you build an image in privileged mode, open a bash session in any of your containers and run the 'ps aux' command, you will notice that all the processes inside it run as root. Running 'whoami' will confirm that you are the root user too! How cool is that? (If I were an attacker.) Once an initial compromise is achieved, the attacker can run a package manager to install any tool they need and start any process on the machine, such as creating a virtual jump box to launch an internal attack. I know what you are thinking - containers are ephemeral... they can vanish very quickly, so the risk must be low? That is true, but even a short time can be enough to move laterally to a process with a longer lifetime, or to gain privileged access and compromise the entire cluster.

The solution is the same as it has always been - harden your base image, run your applications in least-privilege mode, patch your images and application code, watch the activity inside your containers.

A Kubernetes-specific action to take is to define the run mode and privilege escalation settings in the securityContext of your pod or container specs (runAsNonRoot: true, allowPrivilegeEscalation: false); with these settings in place, Kubernetes will refuse to run containers that attempt to start as root or to escalate their privileges when they were not meant to.
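
A minimal sketch of such a spec might look like the following (the pod name and image are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: hardened-app                   # illustrative name
spec:
  containers:
  - name: app
    image: myregistry/myapp:1.0        # illustrative image reference
    securityContext:
      runAsNonRoot: true               # refuse to start the container if its process would run as root
      allowPrivilegeEscalation: false  # block setuid binaries and similar escalation paths
      privileged: false                # never request the privileged mode discussed above
      readOnlyRootFilesystem: true     # optional extra hardening: the attacker cannot modify the filesystem at runtime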

Our next question is how to understand and monitor what is going on in our environment. Knowing what is happening inside an abstracted, highly dynamic environment can be tricky.

The attack I mentioned above included all of the components of a kill chain, using elevation of privileges and a misconfigured container deployment to take control of the cluster. Yet the attack leaves tracks that can be detected and analysed in real time or by an external intrusion detection tool. This leads us neatly into the next topic, which is how you can watch what is going on inside your cluster and, in certain situations, stop certain activity or alert the operations teams to any suspicious behaviour.

Cluster monitoring

There are many solutions on the market that can help protect your cluster from the security threats that I have described in this article. They differ in details of their implementation and the level of insights that they provide, as well as the type of cluster that they are designed for (i.e. managed or hosted clusters). However, none of the tools will provide a “silver bullet” solution that will solve all of your cluster security problems. Similar to any software purchase, you will need to work out what threats you might be exposed to, what you need to monitor and then select an appropriate technology to meet those needs.

There are standard types of monitoring that are likely to be of interest to your development and operations groups:

- Resource utilisation (CPU, RAM, Storage, Network)
- DevOps statistics (number of resources created, destroyed during a period of time)
- Host monitoring (CPU, RAM, swap, storage)
- Container health - number of running containers, their performance and network I/O
- API access - Application endpoints and control plane endpoints, utilisation patterns (see the audit policy sketch after this list)
- Image health - integrity and vulnerabilities status of your base images
- Anomaly detection - traffic exchange (host, cluster, pod), resource utilisation, internal services and commands
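
For example, on a self-managed cluster, one way to capture control plane API access patterns is the API server audit log, configured with an audit policy file passed to kube-apiserver via --audit-policy-file; a minimal sketch is shown below (managed offerings typically expose equivalent logging through their own services):

apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
  - "RequestReceived"     # do not log the request-received stage, only completed calls
rules:
  - level: Metadata       # record who accessed which resource, without request or response bodies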

You probably don’t need to worry about a good number of the standard performance and resource management indicators, as most of the container service providers provide these, either by allowing you to query the cluster control plane API or by providing an administrative visualisation tool for that purpose.

The key takeaway from this article is that it is easy for a developer to spin up a Kubernetes cluster and deploy applications. The difficult part is to understand the foundational concepts that make all the moving parts inside the cluster work together, and to leverage that knowledge to configure your cluster with security principles in mind. In this article we introduced two of these concepts: traffic restriction using network policies and the basics of cluster monitoring.

In Part 2 of this series we will address some of the concerns related to image health, activity monitoring and anomaly detection. We will also outline a list of attack vectors and good practices that will guide the designer to address security early and design the cluster with security concerns in mind.

Watch this space: Part 2 is being readied for publication!

Vlad Calmic

VP Security & Crypto

Vlad is a passionate security professional with over 20 years of hands-on experience in designing and delivering complex IT solutions for e-commerce and financial services. He likes experimenting with new technologies and pushing them to their limits to get the most out of them. He believes that focusing on getting the small details right leads to an excellent big picture. He also believes that a good coffee in the morning will supercharge you for the whole day, especially when you get to have it with the people you love. When he isn’t getting ready to ensure the security of our client’s platforms and applications, he might be right next to you on a motorway, driving to his next big adventure.

 
