1. Introduction

1.1. Eskimo

A state of the art Big Data Infrastructure and Management Web Console to build, manage and operate Big Data 2.0 Analytics clusters

50

Eskimo is in a certain way the Operating System of your Big Data Cluster:

  • A plug and play, working out of the Box, Big Data Analytics platform fulfilling enterprise environment requirements.

  • A state of the art Big Data 2.0 platform

    • based on Kubernetes, Docker and Systemd

    • packaging Gluster, Spark, Kafka, Flink and ElasticSearch

    • with all the administration and management consoles such as Cerebro, Kibana, Zeppelin, Kafka-Manager, Grafana, Prometheus and of course the _Kubernetes Dashboard:.

  • An Administration Application aimed at drastically simplifying the deployment, administration and operation of your Big Data Cluster

  • A Data Science Laboratory and Production environment where Data Analytics is both

    • developed and

    • operated in production

Eskimo is as well:

  • a collection of ready to use docker container images packaging fine-tuned and highly customized plug and play services with all the nuts and bolts required to make them work perfectly together.

  • a framework for building and deploying Big Data and NoSQL software components on Kubernetes as well as SystemD for host native components.

800

1.2. Key Features

Eskimo key features are as follows:

60

Abstraction of Location

Just pick up the services you need, define where and how you want to run them and let eskimo take care of everything.

Move native services between nodes, change their configuration or install new services in just a few clicks.

Don’t bother remembering where and how things run, Eskimo wraps everything up in a single and coherent User Interface.

60

Eskimo Web User Interface

Eskimo’s tip of the iceberg is its flagship Web User Interface.

The Eskimo User Interface is the single and only entry point to all your cluster operations, from services installation to accessing Kibana, Zeppelin and other UI applications.

The Eskimo User Interface also provides SSH consoles, File browser access and monitoring to your cluster.

60

Services Framework

Eskimo is a Big Data Components service development and integration framework based on Kubernetes, Docker and Systemd.

Eskimo provides out of the box ready-to use components such as Spark, Flink, ElasticSearch, Kafka, Kubernetes Dashboard, Zeppelin, etc.

Eskimo also enables developers and administrators to implement and package their own services very easily.

1.3. The Service Developer Guide

This documentation is related to the last of the key features presented above : The Services Development Framework.
It presents all concetps and aspects a developer needs to understand and know to implement his own services and let eskimo distribute and operate them, or extend current services.

2. Introducing the Service Development Framework

The Service Development framework is actually composed by two distinct parts:

2.1. Principle schema

The whole services development framework can be represented as follows:

800
  • The Docker Images Development Framework is used to develop and build Container templates images that are later used to build the actual docker containers images that are fine-tuned to Eskimo deployed on Kubernetes or natively on individual Eskimo cluster nodes.

  • The Services Installation Framework which creates the actual container image from the template image and adapts it to the specific situation of the eskimo node or kube cluster on which it is being deployed.

The different scripts involved in the different stages are presented on the schema above along with their responsibilities and the environment in which they are executed (outside of the container - on the eskimo host machine or the eskimo cluster node - or from within the docker container).

2.2. Core principles

The core principles on which both the Docker Images Development Framework and the Services Installation Framework are built are as follows:

  • A service is eventually two things

    • a docker image packaging the software component and its dependencies (also, rarely, it can be an archive of native programs directly deployed on the cluster node, e.g. for the Kubernetes stack).

    • a set of shell scripts aimed at installing and setting up the service

    • plus essentially:

      • either A kubernetes service deployment YAML file describing the components to deploy on kubernetes

      • or a systemd unit configuration file aimed to operate it on nodes

  • Everything - from building the docker image to installing it on cluster nodes - is done using simple shell scripts. With Eskimo, a system administrator desiring to leverage on eskimo to implement his own services in order to integrate additional software components on Eskimo doesn’t need to learn any new and fancy technology, just plain old shell scripting, docker, and aiether systemd or perhaps a bit of kubernetes configuration; that’s it.

  • Eskimo leverages on unix standards. Software components are installed in /usr/local, log files are in sub-folders of /var/log, persistent data is in sub-folders of /var/lib, etc.

2.3. A note on images' template download.

At eskimo first start, the user (administrator) needs to choose whether he wants to build the services images (docker container templates images) locally or download them from eskimo.sh.

  • Buidling the images locally means creating each and every docker template image using the build.sh script. Only the vanilla software packages are downloaded from internet (such as the elasticsearch distribution, the spark or flink, archive, etc.).
    This can take a significant time, several dozens of minutes or more.

  • Downloading the images from internet means downloading ready-to-use docker template images from eskimo.sh

The whole setup process when downloading the images from internet can be represented this way:

800

3. Docker Images Development Framework

Eskimo manages services (e.g. Flink, Kafka, ElasticSearch, Kibana, etc.) as Docker containers. A docker container is the individual and single unit representing a service to be managed by Eskimo.
Docker containers packaging eskimo services should be most of the time downloaded from the Eskimo online package archive. But an administrator can very well decide to build his own versions of Eskimo provided service container images or event implement his own service packages.
This is described in this section.

The Docker Images Development Framework provides tools and defines standards to build the docker images deployed on the eskimo cluster nodes.

Eskimo is leveraging on docker to deploy services either on Kubernetes or natively across the cluster nodes and to ensure independence from host OSes and specificities as well as proper isolation between services.

Again, Docker images can be downloaded from internet (or the internal corporation network) or build locally on the machine running the Eskimo backend.

3.1. Requirements

In order for one to be able to either build the eskimo software components packages locally (or have eskimo building them locally), there are a few requirements:

  • Docker must be installed and available on the user local computer or the eskimo host machine in case Eskimo builds them.

  • At least 10Gb of hard drive space on the partition hosting /tmp must be available.

  • At least 15Gb of hard drive space on the partition hosting the eskimo installation folder (Eskimo backend) must be available.

3.2. Principle

The principle is pretty straightforward:

  • Every docker image (or package) is built by calling build.sh from the corresponding package folder, i.e. a sub-folder of this very folder packages_dev

  • That build.sh script is free to perform whatever it wants as long as the result is a docker image with the expected name put in the folder packages_distrib in the parent folder and packaging the target software component.

Build files are provided for each and every service pre-packaged within eskimo.

The user is welcome to modify them in order to fine tune everything the way he wants or implement his own packages for other software components not yet bundled with eskimo.
Again, the only requirement is that at the end of the build process a corresponding image is put in packages_distrib as well as that some conventions are properly followed as explained below.

Internet is usually required on the eskimo machine to build or re-build the eskimo provided pre-packages images since the target software component is downloaded from Internet.

3.3. identifying required images in services.json

Eskimo needs a way during initial setup time to know which packages need to be built or downloaded. The list of docker images to be built is composed by the conjunction of two lists:

  • The images referenced by services in services.json (See Configuration file services.json )

  • The additional packages from configuration property system.additionalPackagesToBuild in the configuration file eskimo.properties, such as, for instance: base-eskimo

default system.packagesToBuild property
system.packagesToBuild=base-eskimo

The property system.additionalPackagesToBuild is required to identify 3rd party or additional images that are required when building service images.
Only service images can be declared in services.json.
By default, only the image base-eskimo being used by all the orher package images needs to be built in addition to the ones actually in use by services and declared in services.json.

3.4. Standards and conventions over requirements

There are no requirements when building an eskimo package docker image. The image developer is free to install the software package within the image the way he wants and no specific requirement is enforced by eskimo.

As long as eventually, The Service Installation framework for the new software package provides a systemd unit configuration file enabling the eskimo framework to manipulate the service, a service developer has all the freedom to design his docker container the way he wants.

However, adhering to some conventions eases a lot the implementation and maintenance of these images.
These standard conventions are as follows:

  • All docker images are based on the base-eskimo image which is itself based on a lightweight debian stretch image.

  • A software component named X with version Y should be installed in /usr/local/lib/x-y (if and only if it is not available in apt repositories and installed using standard debian apt-get install x).

    • In this case, a simlink preventing the further Services Installation framework from the need to mess with version numbers should be created : /usr/local/lib/x pointing to /usr/local/lib/x-version.

    • This simlink is pretty important. In the second stage, when installing services on eskimo cluster nodes, it is important that setup and installation scripts can be prevented from knowing about the actual version of the software component being installed. Hence the need for that simlink.

  • The presence of a single build.sh script is a requirement. That script is called by eskimo to build the package if it is not yet available (either built or downloaded)

  • A helper script ../common/common.sh can be linked in the new package build folder using ln -s ../common/common.sh. This common script provides various helper functions to build the docker container, save it to the proper location after building, etc.

  • The script performing the in container installation of the software component X is usually called installX.sh.

  • Software versions to be downloaded and installed within docker images are coded once and for all in the file ../common/common.sh. This is actually a requirement since most of the time software version for common components such as ElasticSearch or scala are used in several different packages. Defining version numbers of software components to be downloaded and installed in a common place helps to enforce consistency.

An eskimo component package developer should look at pre-packaged services to get inspired and find out how eskimo packaged services are usually implemented.

3.5. Typical build.sh process

3.5.1. Operations performed

The build process implemented as a standard in the build.sh script has two different stages:

  1. The container preparation and all the container setup operated from outside the container

  2. The software installation done from inside the container

As a convention, installation of all dependencies is performed from outside the container but then the installation of the software packaged in the container is performed from a script called within the container (script installX.sh for package X).

The build process thus typically looks this way:

  1. From outside the container:

    • Creation of the container from the base eskimo image (debian based)

    • Installation of the prerequisites (Java JDK, Scala, python, etc.) using docker exec …​

    • Calling of the software installation script : docker exec -it …​ installX.sh

  2. From inside the container:

    • Downloading of the software from internet

    • Installation in a temporary folder

    • Moving of the installation software to /usr/local/lib/X-Version or else

    • symlinking the software from /usr/local/lib/X (without version number)

And that’s it.

The package installation is limited to this, all the customizations and fine tuning the image for its use by eskimo is done at a later stage, during the Service Installation on Kubernetes or directly on eskimo cluster nodes.

3.6. Look for examples and get inspired

Look at the eskimo pre-packaged component packages development scripts for examples and the way they are built to get inspired for implementing your own packages.

3.7. Building the Kubernetes archive

Building the Kubernetes archive is different than building the component images as presented above. As a matter of fact, Kubernetes is installed natively on Eskimo cluster nodes and not through docker.

The build.sh script in the binary_k8s folder behaves differently. It’s objective is to package all of the Kubernetes runtime components software within a single archive. These components, from etcd to the different kube master and slave processes are packages together and extracted natively on Eskimo cluster nodes for native execution during the node base installation process.

This section presents different important notes related to some specific services shipped with Eskimo building aspects.

3.8.1. Zeppelin building

Zeppelin can be built from a checkout of the latest git repository master or from an official release.

The file common/common.sh defines a variable ZEPPELIN_IS_SNAPSHOT which, when defined to true, causes the build system to work from git and rebuild zeppelin from the sources instead of downloading a released package.

export ZEPPELIN_IS_SNAPSHOT="false" # set to "true" to build zeppelin from zeppelin git master

Importantly, zeppelin will be build in the folder /tmp/ of the host machine running eskimo (using a docker container though) which maps /tmp to its own /tmp).
At least 20 GB of storage space needs to be available in /tmp of the machine running eskimo for the build to succeed.

It is recommended though to use a binary archive of Zeppelin since the build from the source rarely succeeds out of the box (Zeppelin build system is fairly complicated and the eskimo team maintains the "build from source" feature only when the last binary release of Zeppelin suffers from impacting bugs).

3.9. Setting up a remote packages repository

When running eskimo, software packages - either service docker images or the Kube binary archive - can be either built or downloaded from a remote packages repository.

Setting up a remote packages repository is extremely simple:

  • The software packages need to be downloadable from internet at a specific URL using either HTTP or HTTPS.

  • at the same location where packages are downloaded, a meta-data file should be downloadable and present the various available packages.

For instance the following layout should be available from internet or the local network:

A software package should be named as follows:

  • docker_template_[software]_[software_version]_[eskimo_version].tar.gz for service docker images

  • eskimo_kube_[software_version]_[eskimo_version].tar.gz for kube archive

The file eskimo_packages_versions.json describes the repository of packages and the available packages.

Example eskimo_packages_versions.json
{
  "base-eskimo" : {
    "software" : "0.2",
    "distribution" : "1"
  },
  "cerebro": {
    "software": "0.8.4",
    "distribution": "1"
  },
  "elasticsearch" : {
    "software": "6.8.3",
    "distribution": "1"
  },
  "flink" : {
    "software" : "1.9.1",
    "distribution": "1"
  },
  "gluster": {
    "software" : "debian_09_stretch",
    "distribution": "1"
  },
  ...
  "kube": {
    "software": "1.23.5",
    "distribution": "1"
  }
}

It’s content should be aligned with the following properties from the configuration file eskimo.properties:

  • system.additionalPackagesToBuild and images declared on services.json giving the set of docker images for packages to be or downloaded

  • system.kubePackages giving the name of the kubernetes package to built or downloaded

4. Services Installation Framework

The Services Installation Framework provides tools and standards to install the packaged docker images containing the target software component on Kubernetes or natively on the eskimo cluster nodes.

Eskimo is leveraging on Kubernetes to run and operate most of the services and the couple docker and SystemD for the few of them - such as gluster, ntp, etc - that need to run natively on Eskimo cluster nodes.

  • An eskimo package has to be a docker image

  • An eskimo package has to provide

    • either a systemd unit configuration file to enable eskimo to operate the component natively on the cluster nodes.

    • or a Kubernetes YAML configuration file to delegate the deployment and operation to Kubernetes.

The Eskimo Web User Interface takes care of installing the services defined in the services.json configuration file and copies them over to the nodes where they are intended to run (native node or Kubernetes). After the proper installation, eskimo relies either on plain old systemctl or kubectl commands to operate (deploy / start / stop / restart / query / etc.) the services.

4.1. Principle

The principle is pretty straightforward:

  • Whenever a service serviceX is to be deployed, eskimo makes an archive of the folder services_setup/serviceX, copies that archive over to the target node and extracts it in a subfolder of /tmp.

  • Then eskimo calls the script setup.sh from within that folder. The script setup.sh can do whatever it wants but has to respect an important constraint.

  • After that setup.sh script is properly executed, the service should be

    • either installed natively on the node along with a systemd system unit file with name serviceX.service which is used to control the serviceX service lifecycle through commands such as systemctl start serviceX,

    • or properly deployed in Kubernetes and executing a POD name prefixed by the service name and a kube service matching it.

Aside from the above, nothing is enforced and service developers can implement services the way they want.

4.1.1. Gluster share mounts

Many Eskimo services can leverage on gluster to share data accross cluster nodes.
SystemD services rely on the host to mount gluster shares and then mount the share to the gluster container from the host mount.
The way to do this is as follows:

  • The service setup.sh script calls the script /usr/local/sbin/gluster_mount.sh [SHARE_NAME] [SHARE_PATH] [OWNER_USER]
    This script will take care of registering the gluster mount with SystemD, fstab, etc.

  • The service SystemD unit file should define a dependency on the SystemD mount by using the following statements
    After=gluster.service
    After=[SHARE_PATH_HYPHEN-SEPARATED].mount

Using the host to mount gluster shares is interesting since it enables Eskimo users to see the content of the gluster share using the Eskimo File Manager.

The approach is very similar for Kubernetes services, except they can’t be relying on SystemD (which is not available to Kube containers)
So Kubernetes services actually mount the gluster share directly from inside the docker container.
The way to do this is as follows:

  • The container startup script calls the script inContainerMountGluster.sh [SHARE_NAME] [SHARE_PATH] [OWNER_USER]

4.1.2. OS System Users creation

OS system users required to execute Kubernetes and native services are required to be created on every node of the Eskimo cluster nodes with consistent user IDs across the cluster . For this reason, the linux system users to be created on every node are not created in the individual services setup.sh scripts. They are created by a specific script /usr/local/sbin/eskimo-system-checks.sh generated at installation time by the eskimo base system installation script install-eskimo-base-system.sh.

4.2. Standards and conventions over requirements

There are no requirements when setting up a service on a node aside from the constraints mentioned above. Services developers can set up services on nodes the way then want and no specific requirement is enforced by eskimo.

However, adhering to some conventions eases a lot the implementation and maintenance of these services.
These standard conventions are as follows (illustrated for a service called serviceX).

  • Data persistency

    • Cluster node native Services should put their persistent data (to be persisted between two docker container restart) in /var/lib/serviceX which shozld be mounted from the host by the called to docker in the SystemD unit file

    • Kubernetes services should either rely on Kubernetes provided persistent storage or use a gluster share.

  • Services should put their log files in /var/log/serviceX which is mounted from the runtime host.

  • If the service requires a file to track its PID, that file should be stored under /var/run/serviceX to be mounted from the runtime host as well.

  • Whenever a service serviceX requires a subfolder of /var/log/serviceX to be shared among cluster nodes, a script setupServiceXGlusterSares.sh should be defined that calls the common helper script (define at eskimo base system installation on every node) /usr/local/sbin/gluster_mount.sh in the following way, for instance to define the flink data share : /usr/local/sbin/gluster_mount.sh flink_data /var/lib/flink/data flink

  • The approach is the same from within a container, but the name if the script to call is different: /usr/local/sbin/inContainerMountGlusterShare.sh.

At the end of the day, it’s really plain old Unix standards. The only challenge comes from the use of docker and/or Kubernetes which requires to play with docker mounts a little.
Just look at eskimo pre-packaged services to see examples.

4.3. Typical setup.sh process

4.3.1. Operations performed

The setup process implemented as a standard in the setup.sh script has three different stages:

  1. The container instantiation from the pre-packaged image performed from outside the container

  2. The software component setup performed from inside the container

    • The registration of the service to SystemD or Kubernetes

  3. The software component configuration applied at runtime, i.e. at the time the container starts, re-applied everytime.

The fourth phase is most of the time required to apply configurations depending on environment dynamically at startup time and not statically at setup time.
The goal is to address situations where, for instance, master services are moved to another node (native deployment) or moved around by Kubernetes. In this case, applying the master setup configuration at service startup time instead of statically enables to simply restart a slave service whenever the master node is moved to another node instead of requiring to entirely re-configure them.

The install and setup process thus typically looks this way:

  1. From outside the container:

    • Perform required configurations on host OS (create /var/lib subfolder, required system user, etc.)

    • Run docker container that will be used to create the set up image

    • Call in container setup script

  2. From inside the container:

    • Create the in container required folders and system user, etc.

    • Adapt configuration files to eskimo context (static configuration only !)

  3. At service startup time:

And that’s it.

Again, the most essential configuration, the adaptation to the cluster topology is not done statically at container setup time but dynamically at service startup time.

4.3.2. Standard and conventions

While nothing is really enforced as a requirement by eskimo (aside of SystemD / Kubernetes and the name of the setup.sh script, there are some standards that should be followed (illustrated for a service named serviceX:

  • The "in container" setup script is usually called inContainerSetupServiceX.sh

  • The script taking care of the dynamic configuration and the starting of the service - the one actually called by systemd upon service startup - is usually called inContainerStartServiceX.sh

  • The systemd system configuration file is usually limited to stopping and starting the docker container

  • The Kubernetes deployment file usually create a deployment (for replicaset) or a statefulset along with all services required to reach the software component.

4.3.3. Look for examples and get inspired

Look at examples and the way the packages provided with eskimo are set up and get inspired for implementing your own packages.

4.4. Eskimo services configuration

Creating the service setup folder and writing the setup.sh script is unfortunately not sufficient for eskimo to be able to operate the service.
A few additional steps are required, most importantly, defining the new service in the configuration file services.json.

4.4.1. Configuration file services.json

In order for a service to be understood and operable by eskimo, it needs to be declared in the services configuration file services.json.

A service declaration in services.json for instance for serviceX would be defined as follows:

ServiceX declaration in services.json
"serviceX" : {

  "config": {

    ## [mandatory] giving the column nbr in status table
    "order": [0-X],

    ## [optional] whether or not it has to be instaled on every node
    ## Default value is false.##
    "mandatory": [true,false],

    ## [unique] whether the service is a unique service (singpe instance) or multiple
    "unique": [true,false],

    ## [unique] whether the service is managed through Kubernetes (true) or natively
    ## on nodes with SystemD (false)
    "kubernetes": [true,false],

    ## [optional] name of the group to associate it in the status table
    "group" : "{group name}",

    ## [mandatory] name of the service. must be consistent with service under
    ## 'service_setup'
    "name" : "{service name},

    ## [mandatory] name of the image. must be consistent with docker image name under
    ## 'packages_dev'
    ## Most of the time, this is the same as {service name}
    "imageName" : "{image name},

    ## [mandatory] where to place the service in 'Service Selection Window'
    "selectionLayout" : {
      "row" : [1 - X],
      "col" : [1 - X]
    },

    ## memory to allocate to the service
    ## (negligible means the service is excluded from the memory allocation policy
    ##  Kubernetes services are accounted specifically:
    ##  - services running on all nodes are account as native services
    ##  - services running as replicaSet are accounted globally and their total
    ##    required memory is divided amongst all nodes.
    ## )
    "memory": "[negligible|small|medium|large|verylarge]",

    ## [mandatory] The logo to use whenever displaying the service in the UI is
    ##     required
    ## Use "images/{logo_file_name}" for resources packaged within eskimo web app
    ## Use "static_images/{logo_file_name}" for resources put in the eskimo
    ##    distribution folder "static_images"
    ## (static_images is configurable in eskimo.properties with property
    ##    eskimo.externalLogoAndIconFolder)
    "logo" : "[images|static_images]/{logo_file_name}"

    ## [mandatory] The icon to use ine the menu for the service
    ## Use "images/{icon_file_name}" for resources packaged within eskimo web app
    ## Use "static_images/{icon_file_name}" for resources put in the eskimo
    ##    distribution folder "static_images"
    ## (static_images is configurable in eskimo.properties with property
    ##    eskimo.externalLogoAndIconFolder)
    "icon" : "[images|static_images]/{icon_file_name}"

    # The specific Kubernetes configuration for kubernetes=true services
    "kubeConfig": {

      # the resource request to be made by PODs
      "request": {

        # The number of CPUs to be allocated to the POD(s) by Kubernetes
        # Format : X for X cpus, can have decimal values
        "cpu": "{number of CPU}, # e.g. 0.5

        # The amount of RAM to be allocated to the POD(s) by Kubernetes
        # Format: X[k|m|g|p] where k,m,g,p are multipliers (kilo, mega, etc.)
        "ram": "{amount of RAM}, # e.g. 1600m

      }
    }
  },

  ## [optional] configuration of the serice web console (if anym)
  "ui": {

    ## [optional] (A) either URL template should be configured ...
    "urlTemplate": "http://{NODE_ADDRESS}:{PORT}/",

    ## [optional] (B) .... or proxy configuration in case the service has
    ## to be proxied by eskimo
    "proxyTargetPort" : {target port},

    ## [mandatory] the time  to wait for the web console to initialize before
    ## making it available
    "waitTime": {1000 - X},

    ## [mandatory] the name of the menu entry
    "title" : "{menu name}",

    ## [mandatory] the role that the logged in user needs to have to be able
    ## to see and use the service (UI)
    ## Possible values are :
    ##  - "*" for any role (open access)
    ## - "ADMIN" to limit usage to administrators
    ## - "USER" to limit usage to users (makes little sense)
    "role" : "[*|ADMIN|USER]",

    ## [optional] the title to use for the link to the service on the status page
    "statusPageLinktitle" : "{Link Title}",

    ## [optional] Whether standard rewrite rules need to be applied to this
    ## service
    ## (Standard rewrite rules are documented hereunder)
    ## (default is true)
    "applyStandardProxyReplacements": [true|false],

    ## [optional] List of custom rewrite rules for proxying of web consoles
    "proxyReplacements" : [

      ## first rewrite rule. As many as required can be declared
      {

        ## [mandatory] Type of rwrite rule. At the moment only PLAIN is supported
        ## for full text search and replace.
        ## In the future REGEXP type shall be implemented
        "type" : "[PLAIN]",

        ## [optional] a text searched in the URL. this replacement is applied only
        ## if the text is found in the URL
        "urlPattern" : "{url_pattern}", ## e.g. controllers.js

        ## [mandatory] source text to be replaced
        "source" : "{source_URL}", ## e.g. "/API"

        ## [mandatory] replacement text
        "target" : "{proxied_URL}" ## e.g. "/eskimo/kibana/API"
      }
    ],

     ## [optional] List of page scripter
     ## Page scripts are added to the target resource just aboce the closing 'body'
     ## tag
    "pageScripters" : [
      {

        # [mandatory] the target resource where the script should be added
        "resourceUrl" : "{relative path to target resource}",

        # [mandatpry] content of the 'script' tag to be added
        "script": "{javascript script}"
      }
    ],

    ## [optional] list of URL in headers (e.g. for redirects) that should be
    ## rewritten
    "urlRewriting" : [
      {

        # [mandatory] the start pattern of the URL to be searched in returned headers
        "startUrl" : "{searched prefix}" ## e.g. "{APP_ROOT_URL}/history/",

        # [mandatory] the replacement for that pattern
        "replacement" : "{replacement}" ## e.g.
                                   ## "{APP_ROOT_URL}/spark-console/history/"
      }
    ]

  },

  ## [optional] array of dependencies that need to be available and configured
  "dependencies": [

    ## first dependency. As many as required can be declared
    {

      ## [mandatory] For services not operated by kubernetes, this is
      ## essential: it defines how the master service is determined.
      "masterElectionStrategy": "[NONE|FIRST_NODE|SAME_NODE_OR_RANDOM|RANDOM|RANDOM_NODE_AFTER|SAME_NODE|ALl_NODES]"

      ## the service relating to this dependency
      "masterService": "{master service name}",

      ## The number of master expected
      "numberOfMasters": [1-x],

      ## whether that dependency is mandatory or not
      "mandatory": [true|false],

      ## whether or not the dependent service (parent JSON definition) should be
      ## restarted in case an operation affects this service
      "restart": [true|false],
    }
  ]

  ## [optional] array of configuration properties that should be editable using the
  ## Eskimo UI. These configuration properties are injected
  "editableConfigurations": [

    ## first editable configuration. As many as required can be declared
    {

      ## the name of the configuration file to search for in the software
      ## installation directory (and sub-folders)
      "filename": "{configuration file name}", ## e.g. "server.properties"

      ## the name of the service installation folder under /usr/local/lib
      ## (eskimo standard installation path)
      "filesystemService": "{folder name}", ## e.g. "kafka"

      ## the type of the property syntax
      ##  - "variable" for a simple approach where a variable declaration of the
      ##    expected format is searched for
      ##  - "regex" for a more advanced approach where the configuration is searched
      ##    and replaces using the regex given in format
      "propertyType": "variable",

      ## The format of the property definition in the configuration file
      ## Supported formats are:
      ##  - "{name}: {value}" or
      ##  - "{name}={value}" or
      ##  - "{name} = s{value} or"
      ##  - "REXG with {name} and {value} as placeholders"
      "propertyFormat": "property format", ## e.g. "{name}={value}"

      ## The prefix to use in the configuration file for comments
      "commentPrefix": "#",

      ## The list of properties to be editable by administrators using the eskimo UI
      "properties": [

        ## first property. As many as required can be declared
        {

          ## name of the property
          "name": "{property name}", ## e.g. "num.network.threads"

          ## the description to show in the UI
          "comment": "{property description}",

          ## the default value to use if undefined by administrators
          "defaultValue": "{default property value}" ## e.g. "3"
        }
      ]
    }
  ],

  ## [optional] array of custom commands that are made available from the context
  ## menu on the System Status Page (when clicking on services status (OK/KO/etc.)
  "commands" : [
    {

      ## ID of the command. Needs to be a string with only [a-zA-Z_]
      "id" : "{command_id}", ## e.g. "show_log"

      ## Name of the command. This name is displayed in the menu
      "name" : "{command_name}", ## e.g. "Show Logs"

      ## The System command to be called on the node running the service
      "command": "{system_command}", ## e.g. "cat /var/log/ntp/ntp.log"

      ## The font-awesome icon to be displayed in the menu
      "icon": "{fa-icon}" ## e.g. "fa-file"
    }
  ],

  ## Additional environment information to be generated in eskimo_topology.sh
  ## This can contain multiple values, all possibilities are listed underneath as
  ## example
  "additionalEnvironment": {

    # Create an env var that lists all nodes where serviceX is installed
    "ALL_NODES_LIST_serviceX",

    # Create a env var that gives the number for this service, in a consistent and
    # persistent way (can be 0 or 1 based
    "SERVICE_NUMBER_[0|1]_BASED",

    # Give in evnv var the context path under which the eskimo Wen Use Interface is
    # deployed
    "CONTEXT_PATH"

  }
}

(Bear in mind that since json actually doesn’t support such thing as comments, the example above is actually not a valid JSON snippet - comments starting with '##' would need to be removed.)

Everything is pretty straightforward and one should really look at the services pre-packaged within eskimo to get inspiration when designing a new service to be operated by eskimo.

4.4.2. Eskimo Topology and dependency management

As stated above, the most essential configuration property in a service definition is the masterElectionStrategy of a dependency.
The whole master / slave topology management logic as well as the whole dependencies framework of eskimo relies on it.

This is especially important for non-kubernetes services since most of the time the notion of "master" (in the eskimo sense) is replaced by the usage of a kubernetes service to reach the software component deployed on Kubernetes.

4.4.3. Master Election strategy

Let’s start by introducing what are the supported values for this masterElectionStrategy property:

  • NONE : This is the simplest case. This enables a service to define as requiring another service without bothering where it should be installed. It just has to be present somewhere on the cluster and the first service doesn’t care where.
    It however enforces the presence of that dependency service somewhere and refuses to validate the installation if the dependency is not available somewhere on the eskimo nodes cluster.

  • FIRST_NODE : This is used to define a simple dependency on another service. In addition, FIRST_NODE indicates that the service where it is declared wants to know about at least one node where the dependency service is available.
    That other node should be the first node found where that dependency service is available.
    First node means that the nodes are processed by their order of declaration. The first node than runs the dependency service will be given as dependency to the declaring service.

  • SAME_NODE_OR_RANDOM : This is used to define a simple dependency on another service. In details, SAME_NODE_OR_RANDOM indicates that the first service wants to know about at least one node where the dependency service is available.
    In the case of SAME_NODE_OR_RANDOM, eskimo tries to find the dependency service on the very same node than the one running the declaring service if that dependent service is available on that very same node.
    If no instance of the dependency service is not running on the very same node, then any other random node running the dependency service is used as dependency. (This is only possible for native nodes SystemD services)

  • RANDOM : This is used to define a simple dependency on another service. In details, RANDOM indicates that the first service wants to know about at least one node where the dependency service is available. That other node can be any other node of the cluster where the dependency service is installed.

  • RANDOM_NODE_AFTER : This is used to define a simple dependency on another service. In details, RANDOM_NODE_AFTER indicates that the first service wants to know about at least one node where that dependency service is available.
    That other node should be any node of the cluster where the second service is installed yet with a node number (internal eskimo node declaration order) greater than the current node where the first service is installed.
    This is useful to define a chain of dependencies where every node instance depends on another node instance in a circular way - pretty nifty for instance for elasticsearch discovery configuration. (This is only possible for native nodes SystemD services)

  • SAME_NODE : This means that the dependency service is expected to be available on the same node than the first service, otherwise eskimo will report an error during service installation. (This is only possible for native nodes SystemD services)

  • ALL_NODES : this meands that every service defining this dependency will receive the full list of nodes running the master service in an topology variable.

The best way to understand this is to look at the examples in eskimo pre-packaged services declared in the bundled services.json.

For instance:

  • Etcd wants to use the co-located instance of gluster. Since gluster is expected to be available from all nodes of the eskimo cluster, this dependency is simply expressed as:

etcd dependency on gluster
    "dependencies": [
      {
        "masterElectionStrategy": "SAME_NODE",
        "masterService": "gluster",
        "numberOfMasters": 1,
        "mandatory": false,
        "restart": true
      }
    ]
  • kube-slave services needs to reach the first node where kube-master is available (only one in Eskimo Community edition in anyway), so the dependency is defined as follows:

kube-slave dependency on first kube-master
    "dependencies": [
      {
        "masterElectionStrategy": "FIRST_NODE",
        "masterService": "kube-master",
        "numberOfMasters": 1,
        "mandatory": true,
        "restart": true
      },
  • kafka-manager needs to reach any random instance of kafka running on the cluster, so the dependency is expressed as simply as:

kafka-manager dependency on kafka:
    "dependencies": [
      {
        "masterElectionStrategy": "FIRST_NODE",
        "masterService": "zookeeper",
        "numberOfMasters": 1,
        "mandatory": true,
        "restart": true
      },
      {
        "masterElectionStrategy": "RANDOM",
        "masterService": "kafka",
        "numberOfMasters": 1,
        "mandatory": true,
        "restart": false
      }

Look at other examples to get inspired.

4.4.4. Memory allocation

Another pretty important property in a service configuration in services.json is the memory consumption property: memory.

Services memory configuration

The possible values for that property are as follows :

  • negligible : the service is not accounted in memory allocation

  • small : the service gets a single share of memory

  • medium : the service gets two shares of memory

  • large : the service gets three shares of memory

The system then works by computing the sum of shares for all nodes and then allocating the available memory on the node to every service by dividing it amongst shares and allocating the corresponding portion of memory to every service.
Of course, the system first removes from the available memory a significant portion to ensure some room for kernel and filesystem cache.

Also, Kubernetes services deployed as statefulSet on every node are accounted on every node; while unique kubernetes services are accounted only partially, with a ratio corresponding to the amount of memory it would take divided by the number of nodes.
Since unique Kubernetes services are spread among nodes, this works well in practice and is realistic.

Examples of memory allocation

Let’s imagine the following services installed on a cluster node, along with their memory setting:

Native services :

  • ntp - negligible

  • prometheus - negligible

  • gluster - negligible

  • zookeeper - small

Kubernetes services :

  • elasticsearch - large

  • logstash - small

  • kafka - large

  • kibana - medium

  • zeppelin - very large

The following table gives various examples in terms of memory allocation for three different total RAM size values on the cluster node running these services.
The different columns gives how much memory is allocated to the different services in the different rows for various size of total RAM.

Node total RAM Nbr. parts 8 Gb node 16 Gb node 20 Gb node

ntp

0

-

-

-

prometheus

0

-

-

-

gluster

0

-

-

-

zookeeper

1

525m

1125m

1425m

elasticsearch

3

1575m

3375m

4275m

logstash

1

525m

1125m

1425m

kafka

3

1575m

3375m

4275m

kibana

2/3*

350m

750m

950m

zeppelin

5/3*

875m

1875m

2375m

Filesystem cache reserve

3

1575m

3375m

4275m

OS reserve

-

1000m

1000m

1000m

(*For 3 nodes)

The services Kibana and Zeppelin are unique services running on Kubernetes, this example above accounts that there would be 3 nodes in the clzster, hence their memory share is split by 3 on each node.

Kubernetes services memory configuration

The memory configures above is injected directly in the services themselves, without any consideration for the memory requested by the corresponding Kubernetes POD. One should take that into account and declare a comparable amount of memory when declaring the requested POD memory for Kubernetes Services. In fact, one should declare a little more memory as Kubernetes requested memory for POD accounting for overhead.

Custom memory allocation

Every Eskimo service provides a mean to administrator to specify the memory the service process should be using in the Eskimo Service Settings Configuration page.

4.4.5. Topology file on cluster nodes

Every time the cluster nodes / services configuration is changed. Eskimo will verify the global services topology and generate for every node of the cluster a "topology definition file".

That topology definition file defines all the dependencies and where to find them (using the notion of MASTER) for every service running on every node. It also gives indications about the last known services installation status along with kubernetes memory and cpu requests, etc.

The "topology definition file" can be fond on nodes in /etc/eskimo_topology.sh.

4.5. Proxying services web consoles

Many services managed by eskimo have web consoles used to administer them, such as the kubernetes dashboard, cerebro, kafka-manager, etc. Some are even only web consoles used to administer other services or perform Data Science tasks, such as Kibana, Zeppelin or EGMI, etc.

With Eskimo, these consoles, either running natively or managed by kubernetes, are reach from within Eskimo and can be completely isolated from the client network.
Eskimo provides these Graphical User Interfaces in its own UI and proxies the backend call through SSH tunnels to the actual service.

Proxying is however a little more complicated to set up since eskimo needs to perform a lot of rewriting on the text resources (javascript, html and json) served by the proxied web console to rewrite served URLs to make them pass through the proxy.

Eskimo provides a powerful rewrite engine that one can use to implement the rewrite rules defined in the configuration as presented above.

The minimum configuration that needs to be given to put in place a proxy for a service is to give a value to the property [serviceName].ui.proxyTargetPort indicating the target port where to find the service (either on the cluster npdes where it runs or through the Kubernetes proxy.).

The different possibilities to configure rewrite rules and replacements are presented above in the section Configuration file services.json.

4.5.1. Source text replacements

Proxying web consoles HTTP flow means that a lot of the text resources served by the individual target web consoles need to be processed in such a way that absolute URLs are rewritten. This is unfortunately tricky and many different situations can occur, from URL build dynamically in javascript to static resources URLs in CSS files for instance.

An eskimo service developer needs to analyze the application, debug it and understand every pattern that needs to be replaced and define a proxy replacement for each of them.

Standard replacements

A set of standard proxy replacements are implemented once and for all by the eskimo HTTP proxy for all services. By default these standard rewrite rules are enabled for a service unless the service config declares "applyStandardProxyReplacements": false in which case they are not applied to that specific service.
This is useful when a standard rule is actually harming a specific web console behaviour.

The standard replacements are as follows:

Standard replacements
{
  "type" : "PLAIN",
  "source" : "src=\"/",
  "target" : "src=\"/{PREFIX_PATH}/"
},
{
  "type" : "PLAIN",
  "source" : "action=\"/",
  "target" : "action=\"/{PREFIX_PATH}/"
},
{
  "type" : "PLAIN",
  "source" : "href=\"/",
  "target" : "href=\"/{PREFIX_PATH}/"
},
{
  "type" : "PLAIN",
  "source" : "href='/",
  "target" : "href='/{PREFIX_PATH}/"
},
{
  "type" : "PLAIN",
  "source" : "url(\"/",
  "target" : "url(\"/{PREFIX_PATH}/"
},
{
  "type" : "PLAIN",
  "source" : "url('/",
  "target" : "url('/{PREFIX_PATH}/"
},
{
  "type" : "PLAIN",
  "source" : "url(/",
  "target" : "url(/{PREFIX_PATH}/"
},
{
  "type" : "PLAIN",
  "source" : "/api/v1",
  "target" : "/{PREFIX_PATH}/api/v1"
},
{
  "type" : "PLAIN",
  "source" : "\"/static/",
  "target" : "\"/{PREFIX_PATH}/static/"
},
Custom replacements

In addition to the standard rewrite rules - that can be used or not by a service web console - an eskimo service developer can define as many custom rewrite rules as he wants in the service configuration in services.json as presented above.

Some patterns can be used in both the source and target strings that will be replaced by the framework before they are searched, respectively injected, in the text stream:

  • CONTEXT_PATH will be resolved by the context root at which the eskimo web application is deployed, such as for instance eskimo

  • PREFIX_PATH will be resolved by the specific context path of the service web console context, such as for instance for kibana {CONTEXT_PATH}/kibana, e.g. eskimo/kibana or kibana if no context root is used.

  • APP_ROOT_URL will be resolved to the full URL used to reach eskimo, e.g. http://localhost:9191/eskimo

4.5.2. URL rewriting

URL rewriting is another mechanism available to fine tune eskimo proxying.
Sometimes, a service backend sends a redirect (HTTP code 302 or else) to an absolute URL. In such cases, the absolute URL needs to be replaced by the corresponding sub-path in the eskimo context.

This is achieved using URL rewriting rules.

URL rewriting rule example for spark-console
      "urlRewriting" : [
        {
          "startUrl" : "{APP_ROOT_URL}/history/",
          "replacement" : "{APP_ROOT_URL}/spark-console/history/"
        }

The spark history servre uses such redirect when it is loading a spark log file for as long as the spark log file is being loaded. The rule above takes care or replacing such URL used in the HTTP redirect.

4.5.3. Page scripters

Page scripters form a third mechanism aimed at customizing the behaviour of proxied application. They consists of declaring a javascript snippet that is injected at the bottom of the body tag in the referenced HTML document.

Eskimo is Copyright 2019 - 2022 eskimo.sh / https://www.eskimo.sh - All rights reserved.
Author : eskimo.sh / https://www.eskimo.sh

Eskimo is available under a dual licensing model : commercial and GNU AGPL.
If you did not acquire a commercial licence for Eskimo, you can still use it and consider it free software under the terms of the GNU Affero Public License. You can redistribute it and/or modify it under the terms of the GNU Affero Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Compliance to each and every aspect of the GNU Affero Public License is mandatory for users who did no acquire a commercial license.

Eskimo is distributed as a free software under GNU AGPL in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero Public License for more details.

You should have received a copy of the GNU Affero Public License along with Eskimo. If not, see https://www.gnu.org/licenses/ or write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA, 02110-1301 USA.

You can be released from the requirements of the license by purchasing a commercial license. Buying such a commercial license is mandatory as soon as :

  • you develop activities involving Eskimo without disclosing the source code of your own product, software, platform, use cases or scripts.

  • you deploy eskimo as part of a commercial product, platform or software.

For more information, please contact eskimo.sh at https://www.eskimo.sh

The above copyright notice and this licensing notice shall be included in all copies or substantial portions of the Software.