As part of our daily operations, our team and our customers use the company-wide Splunk application. Splunk is used to search through application logs and check the status of file transfers. So naturally, when we were moving our application to OpenShift, one of the main prerequisites was to forward all container logs to Splunk. This required a bit of digging: normally the Splunk team would install an agent on our Linux hosts and configure it to pick up the logs we want to add. In the world of pods, however, where containers can live very briefly, installing such agents would never work. Luckily for us, we could rely on Splunk Connect for Kubernetes.
The concept of this project is rather simple: you generate a Helm chart whose values let you tweak the Splunk-specific forwarders. Out of the box, three components are available:
- splunk-kubernetes-logging – this chart simply forwards all container (stdout) logs to Splunk. Usually, this is the only one you’ll really need. Installing this chart results in a daemonset of forwarders (each node gets one), and all CRI-O logs from that node are forwarded to Splunk.
- splunk-kubernetes-objects – this chart uploads all Kubernetes objects, such as the creation of projects, deployments, etc. Installing this chart results in a single objects pod which talks to the API.
- splunk-kubernetes-metrics – a dedicated metrics chart, just in case you’d rather use Splunk metrics instead of the built-in Grafana. Installing this chart also creates a daemonset.
For each of these charts you can set the Splunk host, port, certificate, HEC token and index. This means you can use different indexes for each component (e.g. a logging index for users and an objects index for the OPS team). This blog assumes that your Splunk team has created the required HEC tokens and indexes, which will be used in the Helm chart.
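To give an idea of what you will be filling in later on, here is a minimal sketch of the global part of such a values file. All values are placeholders and the exact key names depend on the chart version, so always check them against the values.yaml you will generate below:

global:
  splunk:
    hec:
      host: splunk-hec.example.com                  # placeholder HEC endpoint
      port: 8088
      protocol: https
      token: 00000000-0000-0000-0000-000000000000   # placeholder HEC token
      indexName: my_default_index                   # placeholder index
  kubernetes:
    clusterName: my-openshift-cluster               # placeholder cluster name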
To start, create a new project where all Splunk forwarders live. This is quite simple:
oc new-project splunk-logging --description="log forwarder to Company-wide Splunk"
If you work with infranodes on your cluster and have adjusted the default cluster scheduler to ignore them, no Splunk forwarders will be installed there by default. This might be exactly what you want, but if you also want Splunk forwarders on these nodes, type:
oc edit project splunk-logging
and add
openshift.io/node-selector: ""
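After the edit, the relevant part of the project should look roughly like this (only the annotation matters here):

apiVersion: project.openshift.io/v1
kind: Project
metadata:
  name: splunk-logging
  annotations:
    openshift.io/node-selector: ""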
The next part is a bit scary: the Splunk logging forwarders simply look at the filesystem of the OpenShift worker (specifically at /var/log/containers/), as this is the default location where CRI-O logs are stored in OpenShift. There is no ‘sidecar’ approach here (an additional container on each of your pods) to push logs to Splunk.
It is a straightforward approach, but of course pods are not allowed to access the worker filesystem out of the box. We’ll need to create a security context constraint (SCC) to allow this inside the splunk-logging namespace.
--- contents of scc.yaml ---
kind: SecurityContextConstraints
apiVersion: security.openshift.io/v1
metadata:
  name: scc-splunk-logging
allowPrivilegedContainer: true
allowHostDirVolumePlugin: true
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
volumes:
- "*"
Note the ‘allowPrivilegedContainer: true’ and ‘allowHostDirVolumePlugin: true’, which allow the (privileged) Splunk pods to read the worker filesystem. Setting up the SCC is only half the puzzle though: you’ll also need to create a service account and map it to the security context constraint.
oc apply -f ./scc.yaml
oc create sa splunk-logging
oc adm policy add-scc-to-user scc-splunk-logging -z splunk-logging
Next, get the helm binary and run
helm repo add splunk https://splunk.github.io/splunk-connect-for-kubernetes
If your bastion host cannot reach splunk.github.io, for example due to a firewall policy, you can download the splunk-connect-for-kubernetes repository here in .tar.gz format and use:
helm install my-first-splunk-repo splunk-connect-for-kubernetes-1.4.9.tgz
Great, now you’ll need a values file to tweak. To generate it from the chart, type:
helm show values splunk/splunk-connect-for-kubernetes > values.yaml
Note: the values.yaml is generated based on the repository. You can tweak this file as much as you want, but please know that the values are based on the repository version you are currently using. This means that the accepted values in the helm chart might change over time. Always generate a vanilla values file after upgrading the splunk-connect-for-kubernetes repository and compare your own values file to the new template.
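One way to do that comparison after pulling in a newer chart version (the file names are just examples):

helm repo update
helm show values splunk/splunk-connect-for-kubernetes > values-new.yaml
diff values-new.yaml your-values-file.yaml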
Your values.yaml will have four sections: a ‘global’ section and the three component sections listed above. In the global section you can set generic values such as the Splunk host, Splunk port, caFile and OpenShift cluster name. In each component section you can set the appropriate HEC token and Splunk index. The helm chart is too large to discuss here, but some words of advice (a combined sketch follows this list):
- To disable a section, simply set enabled: false, e.g.
splunk-kubernetes-objects:
  enabled: false
- Your pods will need to run privileged. Set this in each of the components; it doesn’t work in the ‘global’ section of the helm chart:
# this used to be: openshift: true
securityContext: true
- You’ve already created the service account with the mapped SCC, so make sure Helm uses it:
serviceAccount:
  create: false
  name: splunk-logging
- The default log location of OpenShift is /var/log/containers, so you’ll need to set this in each section:
fluentd:
  path: /var/log/containers/*.log
containers:
  path: /var/log
  pathDest: /var/log/containers
  logFormatType: cri
  logFormat: "%Y-%m-%dT%H:%M:%S.%N%:z"
- If you don’t want all the stdout logs of the openshift-* pods (which can be a HUGE amount of logs), exclude them like this:
exclude_path:
  - /var/log/containers/*-splunk-kubernetes-logging*
  - /var/log/containers/downloads-*openshift-console*
  - /var/log/containers/tekton-pipelines-webhook*
  - /var/log/containers/node-ca-*openshift-image-registry*
  - /var/log/containers/ovs-*openshift-sdn*
  - /var/log/containers/network-metrics-daemon-*
  - /var/log/containers/sdn*openshift-sdn*
- If you don’t want all the etcd and apiserver logs, comment out the master toleration so no forwarder pods are installed on the master nodes:
tolerations:
# - key: node-role.kubernetes.io/master
#   effect: NoSchedule
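Putting the snippets above together, the splunk-kubernetes-logging section of a values file might look roughly like this. It is a sketch, not a complete section: the token and index are placeholders, and the key names should always be checked against the values.yaml you generated yourself:

splunk-kubernetes-logging:
  splunk:
    hec:
      token: 00000000-0000-0000-0000-000000000000   # placeholder HEC token
      indexName: my_logging_index                   # placeholder index
  securityContext: true        # run the forwarder pods privileged
  serviceAccount:
    create: false
    name: splunk-logging       # the service account mapped to the SCC
  fluentd:
    path: /var/log/containers/*.log
  containers:
    path: /var/log
    pathDest: /var/log/containers
    logFormatType: cri
    logFormat: "%Y-%m-%dT%H:%M:%S.%N%:z"
  exclude_path:
    - /var/log/containers/*-splunk-kubernetes-logging*
  tolerations:
  # - key: node-role.kubernetes.io/master
  #   effect: NoSchedule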
If you’re happy with the helm chart (which can be very trial-and-error), simply type:
helm install my-first-splunk-repo -f your-values-file.yaml splunk/splunk-connect-for-kubernetes

or, in an offline environment:

helm install my-first-splunk-repo -f your-values-file.yaml splunk-connect-for-kubernetes-1.4.9.tgz
If your chart is valid, you’ll see something like:
Splunk Connect for Kubernetes is spinning up in your cluster.
After a few minutes, you should see data being indexed in your Splunk
To see the daemonset pods spinning up, simply type:
watch oc get pods -n splunk-logging
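If no data shows up in Splunk after a while, the logs of the forwarder pods usually reveal connection or token problems. The daemonset name below is an example based on the release name used in this blog; check oc get ds for the actual name:

oc get ds -n splunk-logging
oc logs -n splunk-logging ds/my-first-splunk-repo-splunk-kubernetes-logging | grep -i error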
In case you want to make more changes to the helm chart (for example to add more filters), you can always modify your values.yaml and then hit:
helm upgrade my-first-splunk-repo -f your-values-file.yaml splunk/splunk-connect-for-kubernetes
Helm will detect the change and only modify the affected parts. For example, if you’ve added more logs to the exclude_path, helm will update the configmap containing the Fluentd config and then replace the daemonset pods one by one.
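To verify what is actually applied after an upgrade, something like this helps (again, the daemonset name is an example based on the release name):

helm get values my-first-splunk-repo
oc rollout status ds/my-first-splunk-repo-splunk-kubernetes-logging -n splunk-logging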
That’s it for now. In the next blog I’ll show you how to add a filter that prevents Java stack traces from becoming multiple Splunk events!