(Apache) Big data ❤ Kubernetes
Flokkr is a containerization project for Apache Flink, Kafka, Ozone, Spark and other big data projects, which runs them in Kubernetes with a GitOps-based approach.
It provides:
1. ready-to-use containers for various big data projects
2. a framework to generate Kubernetes resources with any type of customization
3. various configuration sets for different use cases (Kerberos, HA, TDE, etc.)
4. helper tools to use the projects in cloud-native environments
Please note that this project is NOT an official Apache project.
Containers
name | version (latest) | Example K8s deployments | Kubernetes test status |
flink | [1.12.20 1.11.3] | Kubernetes example | |
hadoop | [3.3.0 3.2.2 3.1.4 2.7.3 2.7.0] | Kubernetes example | |
hbase | [2.4.2 2.3.5] | Kubernetes example | |
kafka | [2.8.0 2.7.0] | Kubernetes example | |
ozone | [1.1.0] | Kubernetes example | |
spark | [3.0.0 2.4.5 2.4.6] | Kubernetes example | |
zookeeper | [3.6.2] | Kubernetes example | |
Note: there are some other containers on Docker Hub, but they are either experimental or retired. The latest versions in this table are tested with CI.
Getting started
Flokkr provides tools and building blocks to create your own cluster. It’s based on Flekszible, which is a highly flexible Kubernetes resource generator.
To start, install Flekszible and register Flokkr sources:
> flekszible source search
Available flekszible repositories:
+------------------------------------+-----------------------------------------------------------------------------+
| name | description |
+------------------------------------+-----------------------------------------------------------------------------+
| github.com/flokkr/k8s | Flekszible based kubernetes manfiest templates for Apache bigdata projects. |
| github.com/flokkr/infra-flekszible | Flekszible based Kubernetes recipes for logging/monitoring/ci |
| github.com/elek/ozone-flekszible | Apache Hadoop Ozone deployment definitions with flekszible |
+------------------------------------+-----------------------------------------------------------------------------+
Add flekszible topic to your repository to show your repository here.
Register the main Flokkr repositories. The repositories contain the Kubernetes resource definitions (together with optional transformations) for the specific projects.
> flekszible source add github.com/flokkr/k8s
> flekszible source add github.com/elek/ozone-flekszible
> flekszible source add github.com/flokkr/infra-flekszible
Now you can list the available components:
> flekszible app search
INFO[0000] Input dir: /tmp, output dir: /tmp
INFO[0000] Reading resources from /tmp/resources
+--------------------------+------------------------------------------------------------------+
| path | description |
+--------------------------+------------------------------------------------------------------+
| flink | Apache Flink |
| grafana | Grafana dashboard server |
| hdfs | Apache Hadoop HDFS base setup |
| hdfs-ha | Apache Hadoop HDFS, HA setup |
| jaeger | Jaeger tracing server |
| kafka | Apache Kafka |
| kafka-demo | Simple console producer / consumer for Kafka |
| krb5-dev | Unsecure MIT kerberos server for DEVELOPMENT only |
| krb5-dev/getkeystore | Sidecar definition to import java trust/keystore from vault |
| monitor | K8s level monitoring |
| pv-test | Nginx example deployment with persistent volume claim. |
| zookeeper | Scalable Apache Zookeeper setup |
| ozone | Apache Hadoop Ozone |
| ozone/freon | Load test tool for Apache Hadoop Ozone |
| anonymous-proxy | permission to access proxy url by anonymous users |
| cadvisor | CAdvisor node level container metrics |
| grafana | Grafana dashboard server |
| jaeger | Jaeger tracing server |
| kube-dashboard/fulladmin | Full admin privilege for kube-dashboard |
| kube-state-metrics | Kubernetes metrics exporter |
| kubernetes-monitoring | prometheus instance to be configured for k8s cluster monitoring. |
| loki | loki based log collector |
| minio | Simple MINIO S3 server |
| node-exporter | Prometheus Node Exporter |
| prometheus | Prometheus monitoring |
| sleep | Forever sleeping test containers |
+--------------------------+------------------------------------------------------------------+
Then add whatever you need:
> flekszible app add zookeeper
> flekszible app add flink
> flekszible app add kafka
Next, generate the Kubernetes resource files:
> flekszible generate
This generates all the required YAML files.
> ls -lah
.rwxr-xr-x 194 elek 15 Dec 9:00 Flekszible
.rw-r-xr-x 177 elek 15 Dec 9:02 flink-config-configmap.yaml
.rw-r-xr-x 230 elek 15 Dec 9:02 flink-jobmanager-service.yaml
.rw-r-xr-x 627 elek 15 Dec 9:02 flink-jobmanager-statefulset.yaml
.rw-r-xr-x 233 elek 15 Dec 9:02 flink-taskmanager-service.yaml
.rw-r-xr-x 634 elek 15 Dec 9:02 flink-taskmanager-statefulset.yaml
.rw-r-xr-x 174 elek 15 Dec 9:02 kafka-broker-service.yaml
.rw-r-xr-x 723 elek 15 Dec 9:02 kafka-broker-statefulset.yaml
.rw-r-xr-x 331 elek 15 Dec 9:02 kafka-config-configmap.yaml
.rw-r-xr-x 506 elek 15 Dec 9:02 zookeeper-config-configmap.yaml
.rw-r-xr-x 177 elek 15 Dec 9:02 zookeeper-service.yaml
.rw-r-xr-x 766 elek 15 Dec 9:02 zookeeper-statefulset.yaml
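Before installing, it can help to see how many resources of each kind were generated. The following is a minimal sketch that relies only on the `<name>-<kind>.yaml` file naming visible in the listing above; `summarize_manifests` is a hypothetical helper introduced here and is not part of Flekszible.

```shell
# Sketch: count generated manifests per resource kind, based on file names only.
# summarize_manifests is a hypothetical helper; its optional argument is the
# directory that holds the generated files (default: current directory).
summarize_manifests() (
  set -eu
  cd "${1:-.}"
  for f in *.yaml; do
    [ -e "$f" ] || continue        # no YAML files: the glob stays literal, skip it
    kind="${f##*-}"                # keep the part after the last dash
    printf '%s\n' "${kind%.yaml}"  # drop the .yaml extension
  done | sort | uniq -c
)
```

For the listing above, this would be expected to report 3 configmap, 4 service and 4 statefulset files.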
Finally you can install it:
> kubectl apply -f .
For more customization, check the documentation of Flekszible.
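The getting-started steps above can be collected into a small shell function. This is only a sketch: `flokkr_bootstrap` and the `FLEKSZIBLE` override variable are hypothetical names introduced here, and it assumes Flekszible is already installed and on the PATH. It uses only the subcommands shown earlier (`source add`, `app add`, `generate`).

```shell
# Sketch: replay the getting-started steps for a list of apps.
# FLEKSZIBLE is a hypothetical override for the binary name, e.g. FLEKSZIBLE=echo
# prints the commands instead of running them.
flokkr_bootstrap() (
  set -eu
  bin="${FLEKSZIBLE:-flekszible}"
  # Register the source repositories used in this README.
  for src in github.com/flokkr/k8s \
             github.com/elek/ozone-flekszible \
             github.com/flokkr/infra-flekszible; do
    "$bin" source add "$src"
  done
  # Add every app passed as an argument, then render the manifests.
  for app in "$@"; do
    "$bin" app add "$app"
  done
  "$bin" generate
)
```

A typical call would be `flokkr_bootstrap zookeeper flink kafka`, followed by `kubectl apply -f .` once the generated files have been reviewed.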
Next steps
The next step after creating the first cluster is customization. You can customize the resources by adding your own transformations or by reusing ready-to-use ones.
> flekszible transformation search
+---------------------+--------------------------------------------------------------------------------------------+
| name | description |
+---------------------+--------------------------------------------------------------------------------------------+
| Namespace | Use explicit namespace |
| Pipe | Transform content with external shell command. |
| Remove | Remove yaml fragment from an existing k8s resources |
| ozone/emptydir | Add empty dir based ephemeral persistence |
| ozone/onenode | remove scheduling rules to make it possible to run multiple datanode on the same k8s node. |
| ozone/persistence | Add real PVC based persistence |
| ozone/profiler | Enable profiler endpoint. |
| Add | Extends yaml fragment to an existing k8s resources |
| Image | Replaces the docker image definition |
| Prefix | Add same prefix to all the k8s names |
| ozone/devtracing | Enable jaeger tracing for ALL the requests (100% sampling) |
| ozone/grafana | Enable grafana for ozone dashboards |
| ozone/memdisk | Use memdisks for empty dirs |
| ozone/ozonefs | copy ozonefs jar file to a temporary emptydir volume |
| Change | Replace existing value literal in the yaml struct |
| ConfigHash | Add labels to the k8s resources with the hash of the used configmaps |
| DaemonToStatefulSet | Converts daemonset to statefulset |
| K8sWriter | Internal transformation to print out k8s resources as yaml |
| PublishService | Creates additional service for internal services |
| Replace | Replace a yaml subtree with an other one. |
| ozone/tracing | Enable jaeger tracing |
| PublishStatefulSet | Creates additional NodeType service for StatefulSet internal services |
| zookeeper/scale | Set the number of the zookeeper replicas. |
| ozone/prometheus | Enable prometheus monitoring in Ozone |
+---------------------+--------------------------------------------------------------------------------------------+
The prefixed transformations (like zookeeper/scale) are usually combined, pre-defined transformations. You can apply them from the command line (flekszible transformation add ozone/emptydir) or by editing the Flekszible descriptor.
(Note: all the previous commands just modified this descriptor.)
In this example we import the Ozone app with a transformation that adds emptyDir-based persistence:
Content of Flekszible:
import:
  - path: ozone
    transformations:
      - type: ozone/emptydir
In the next example we import kafka, kafka-demo and flink, but the flink resources are transformed to add a custom imagePullPolicy:
Content of Flekszible:
import:
  - path: kafka
  - path: kafka-demo
  - path: flink
    transformations:
      - type: add
        path:
          - spec
          - template
          - spec
          - containers
          - ".*"
        value:
          imagePullPolicy: IfNotPresent
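Because the descriptor is a plain YAML file, it can also be written by hand or from a script instead of through `flekszible app add`. The sketch below writes the descriptor from the last example into a directory; `write_flokkr_descriptor` is a hypothetical helper, and the indentation is an assumption based on the description above (the transformation is nested under the flink import).

```shell
# Sketch: write the Flekszible descriptor from the example above by hand.
# write_flokkr_descriptor is a hypothetical helper; its optional argument is
# the target directory (default: current directory).
write_flokkr_descriptor() (
  set -eu
  dir="${1:-.}"
  mkdir -p "$dir"
  # The transformation is nested under the flink import, so only the
  # flink resources get the extra imagePullPolicy.
  cat > "$dir/Flekszible" <<'EOF'
import:
  - path: kafka
  - path: kafka-demo
  - path: flink
    transformations:
      - type: add
        path:
          - spec
          - template
          - spec
          - containers
          - ".*"
        value:
          imagePullPolicy: IfNotPresent
EOF
)
```

After `write_flokkr_descriptor ./cluster`, running `flekszible generate` inside `./cluster` should render the resources, assuming the sources have been registered as described in the getting-started steps.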
News
Presentations
- 4 ways to Dockerize Apache bigdata project (Docker meetup, Budapest)
- From docker to Kubernetes: Running Hadoop in a cloud-native way (Berlin Buzzwords 2018)
- Apache Hadoop Ozone in the cloud-native world: Use Hadoop as a Kubernetes persistent storage provider (Apache Roadshow EU, 2018)