(Apache) Big data ❤ Kubernetes

Flokkr is a containerization project for Apache Flink, Kafka, Ozone, Spark and other big data projects that runs them in Kubernetes with a GitOps-based approach.

It provides:

1. ready-to-use containers for various big data projects
2. a framework to generate Kubernetes resources with any type of customization
3. various configuration sets for different use cases (Kerberos, HA, TDE, etc.)
4. helper tools to use the projects in cloud-native environments

Please note that this project is NOT an official Apache project.

Containers

name	version (latest)	Example K8s deployments	Kubernetes test status
flink	[1.12.20 1.11.3]	Kubernetes example
hadoop	[3.3.0 3.2.2 3.1.4 2.7.3 2.7.0]	Kubernetes example
hbase	[2.4.2 2.3.5]	Kubernetes example
kafka	[2.8.0 2.7.0]	Kubernetes example
ozone	[1.1.0]	Kubernetes example
spark	[3.0.0 2.4.5 2.4.6]	Kubernetes example
zookeeper	[3.6.2]	Kubernetes example

Note: there are some other containers on Docker Hub, but they are either experimental or retired. The latest versions in this table are tested with CI.

Getting started

Flokkr provides tools and building blocks to create your own cluster. It is based on Flekszible, a highly flexible Kubernetes resource generator.

To start, install Flekszible and register Flokkr sources:

> flekszible source search
Available flekszible repositories:

+------------------------------------+-----------------------------------------------------------------------------+
| name                               | description                                                                 |
+------------------------------------+-----------------------------------------------------------------------------+
| github.com/flokkr/k8s              | Flekszible based kubernetes manfiest templates for Apache bigdata projects. |
| github.com/flokkr/infra-flekszible | Flekszible based Kubernetes recipes for logging/monitoring/ci               |
| github.com/elek/ozone-flekszible   | Apache Hadoop Ozone deployment definitions with flekszible                  |
+------------------------------------+-----------------------------------------------------------------------------+


Add flekszible topic to your repository to show your repository here.

Register the main Flokkr repositories. These repositories contain the Kubernetes resource definitions (together with optional transformations) for the specific projects.

> flekszible source add github.com/flokkr/k8s
> flekszible source add github.com/elek/ozone-flekszible
> flekszible source add github.com/flokkr/infra-flekszible

Now you can list the available components:

> flekszible app search
INFO[0000] Input dir: /tmp, output dir: /tmp
INFO[0000] Reading resources from /tmp/resources
+--------------------------+------------------------------------------------------------------+
| path                     | description                                                      |
+--------------------------+------------------------------------------------------------------+
| flink                    | Apache Flink                                                     |
| grafana                  | Grafana dashboard server                                         |
| hdfs                     | Apache Hadoop HDFS base setup                                    |
| hdfs-ha                  | Apache Hadoop HDFS, HA setup                                     |
| jaeger                   | Jaeger tracing server                                            |
| kafka                    | Apache Kafka                                                     |
| kafka-demo               | Simple console producer / consumer for Kafka                     |
| krb5-dev                 | Unsecure MIT kerberos server for DEVELOPMENT only                |
| krb5-dev/getkeystore     | Sidecar definition to import java trust/keystore from vault      |
| monitor                  | K8s level monitoring                                             |
| pv-test                  | Nginx example deployment with persistent volume claim.           |
| zookeeper                | Scalable Apache Zookeeper setup                                  |
| ozone                    | Apache Hadoop Ozone                                              |
| ozone/freon              | Load test tool for Apache Hadoop Ozone                           |
| anonymous-proxy          | permission to access proxy url by anonymous users                |
| cadvisor                 | CAdvisor node level container metrics                            |
| grafana                  | Grafana dashboard server                                         |
| jaeger                   | Jaeger tracing server                                            |
| kube-dashboard/fulladmin | Full admin privilege for kube-dashboard                          |
| kube-state-metrics       | Kubernetes metrics exporter                                      |
| kubernetes-monitoring    | prometheus instance to be configured for k8s cluster monitoring. |
| loki                     | loki based log collector                                         |
| minio                    | Simple MINIO S3 server                                           |
| node-exporter            | Prometheus Node Exporter                                         |
| prometheus               | Prometheus monitoring                                            |
| sleep                    | Forever sleeping test containers                                 |
+--------------------------+------------------------------------------------------------------+

Then add everything you need:

> flekszible app add zookeeper
> flekszible app add flink
> flekszible app add kafka

Now you can generate the Kubernetes resource files:

> flekszible generate

This generates all the required YAML files:

> ls -lah
.rwxr-xr-x   194 elek 15 Dec  9:00  Flekszible
.rw-r-xr-x   177 elek 15 Dec  9:02  flink-config-configmap.yaml
.rw-r-xr-x   230 elek 15 Dec  9:02  flink-jobmanager-service.yaml
.rw-r-xr-x   627 elek 15 Dec  9:02  flink-jobmanager-statefulset.yaml
.rw-r-xr-x   233 elek 15 Dec  9:02  flink-taskmanager-service.yaml
.rw-r-xr-x   634 elek 15 Dec  9:02  flink-taskmanager-statefulset.yaml
.rw-r-xr-x   174 elek 15 Dec  9:02  kafka-broker-service.yaml
.rw-r-xr-x   723 elek 15 Dec  9:02  kafka-broker-statefulset.yaml
.rw-r-xr-x   331 elek 15 Dec  9:02  kafka-config-configmap.yaml
.rw-r-xr-x   506 elek 15 Dec  9:02  zookeeper-config-configmap.yaml
.rw-r-xr-x   177 elek 15 Dec  9:02  zookeeper-service.yaml
.rw-r-xr-x   766 elek 15 Dec  9:02  zookeeper-statefulset.yaml

Finally you can install it:

> kubectl apply -f .

For more customization options, check the Flekszible documentation.

Next steps

After creating your first cluster, the next step is customization. You can write your own custom transformations or reuse the ready-made ones:

> flekszible transformation search
+---------------------+--------------------------------------------------------------------------------------------+
| name                | description                                                                                |
+---------------------+--------------------------------------------------------------------------------------------+
| Namespace           | Use explicit namespace                                                                     |
| Pipe                | Transform content with external shell command.                                             |
| Remove              | Remove yaml fragment from an existing k8s resources                                        |
| ozone/emptydir      | Add empty dir based ephemeral persistence                                                  |
| ozone/onenode       | remove scheduling rules to make it possible to run multiple datanode on the same k8s node. |
| ozone/persistence   | Add real PVC based persistence                                                             |
| ozone/profiler      | Enable profiler endpoint.                                                                  |
| Add                 | Extends yaml fragment to an existing k8s resources                                         |
| Image               | Replaces the docker image definition                                                       |
| Prefix              | Add same prefix to all the k8s names                                                       |
| ozone/devtracing    | Enable jaeger tracing for ALL the requests (100% sampling)                                 |
| ozone/grafana       | Enable grafana for ozone dashboards                                                        |
| ozone/memdisk       | Use memdisks for empty dirs                                                                |
| ozone/ozonefs       | copy ozonefs jar file to a temporary emptydir volume                                       |
| Change              | Replace existing value literal in the yaml struct                                          |
| ConfigHash          | Add labels to the k8s resources with the hash of the used configmaps                       |
| DaemonToStatefulSet | Converts daemonset to statefulset                                                          |
| K8sWriter           | Internal transformation to print out k8s resources as yaml                                 |
| PublishService      | Creates additional service for internal services                                           |
| Replace             | Replace a yaml subtree with an other one.                                                  |
| ozone/tracing       | Enable jaeger tracing                                                                      |
| PublishStatefulSet  | Creates additional NodeType service for StatefulSet internal services                      |
| zookeeper/scale     | Set the number of the zookeeper replicas.                                                  |
| ozone/prometheus    | Enable prometheus monitoring in Ozone                                                      |
+---------------------+--------------------------------------------------------------------------------------------+

The prefixed transformations (such as zookeeper/scale) are usually predefined combinations of other transformations. You can apply one from the command line (for example, flekszible transformation add ozone/emptydir) or by editing the Flekszible descriptor directly.

(Note: all the previous commands simply modified this descriptor.)
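
As an illustration of that note, after the three flekszible app add commands from the getting-started section, the descriptor would be expected to contain roughly the following imports. This is a sketch inferred from the descriptor examples in this README, not captured output; any entries written by flekszible source add are omitted here.

```yaml
# Sketch of the Flekszible descriptor after:
#   flekszible app add zookeeper
#   flekszible app add flink
#   flekszible app add kafka
# (assumed layout; entries for registered sources are left out)
import:
 - path: zookeeper
 - path: flink
 - path: kafka
```

Because the descriptor is a plain file, it can be committed to version control alongside the generated resources, which is what makes the GitOps-style workflow possible.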

In the following example, the Ozone app is imported with a transformation that adds emptyDir-based persistence:

Content of Flekszible:

import:
 - path: ozone
   transformations:
   - type: ozone/emptydir
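
Since transformations is a YAML list, more than one entry can presumably be attached to the same import. The sketch below adds ozone/prometheus, which appears in the transformation search output above, next to ozone/emptydir. Whether ozone/prometheus needs additional apps or parameters is not covered here, so treat this as an assumption to verify against the Flekszible documentation.

```yaml
# Sketch: two predefined transformations on one import.
# Both names come from the `flekszible transformation search` listing;
# combining them this way is an assumption, not a tested configuration.
import:
 - path: ozone
   transformations:
   - type: ozone/emptydir
   - type: ozone/prometheus
```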

In the next example, kafka, kafka-demo and flink are imported, and the flink resources are transformed to add a custom imagePullPolicy:

Content of Flekszible:

import:
 - path: kafka
 - path: kafka-demo
 - path: flink
   transformations:
   - type: add
     path:
       - spec
       - template
       - spec
       - containers
       - ".*"
     value:
       imagePullPolicy: IfNotPresent
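
The same add transformation can in principle target other parts of a resource by changing the path. The sketch below attaches a label to the imported zookeeper resources. The label key and value (team: data-platform) are hypothetical, and the example assumes that add merges the value into metadata.labels in the same way it extends the container definitions above.

```yaml
# Sketch: reuse the `add` transformation with a different path.
# The label `team: data-platform` is a made-up example value.
import:
 - path: zookeeper
   transformations:
   - type: add
     path:
       - metadata
       - labels
     value:
       team: data-platform
```

After editing the descriptor, run flekszible generate again and inspect the resulting YAML files to confirm the change before applying them.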

News

Presentations

4 ways to Dockerize Apache bigdata project (Docker meetup, Budapest)
From docker to Kubernetes: Running Hadoop in a cloud-native way (Berlin Buzzwords 2018)
Apache Hadoop Ozone in the cloud-native world: Use Hadoop as a Kubernetes persistent storage provider (Apache Roadshow EU, 2018)