Getting started with BigTop

What is this?

This is a simple docker images hosting mixql-platform and BigTop. It can be used to easily spin up Hadoop cluster and test mixql queries.

Quick Start

Clone mixql-test-env to /mixql and start cluster:

git clone https://github.com/ntlegion/mixql-test-env.git
cd mixql-test-env
docker compose up -d

Expected reuslt

docker compose up -d
[+] Running 8/8
 ⠿ Network mixql_mixql-net        Created                                                                          0.0s
 ⠿ Volume "mixql_mixql-app"       Created                                                                          0.0s
 ⠿ Volume "mixql_mixql-host-app"  Created                                                                          0.0s
 ⠿ Volume "mixql_mixql-src"       Created                                                                          0.0s
 ⠿ Volume "mixql_mixql-samples"   Created                                                                          0.0s
 ⠿ Container mixql-node1-1        Started                                                                          1.2s
 ⠿ Container mixql-main-1         Started                                                                          1.1s
 ⠿ Container mixql-demo-1         Started                                                                          1.0s

Listing containers must show containers running and the port mapping as below:

docker compose ps

Expected reuslt

NAME                COMMAND                  SERVICE             STATUS              PORTS
mixql-demo-1        "/mixql/entrypoint.s…"   demo                exited (0)
mixql-main-1        "/mixql/entrypoint.s…"   main                running             0.0.0.0:8088->8081/tcp, :::8088->8081/tcp, 0.0.0.0:18017->8088/tcp, :::18017->8088/tcp, 0.0.0.0:57017->50070/tcp, :::57017->50070/tcp, 0.0.0.0:57517->50075/tcp, :::57517->50075/tcp
mixql-node1-1       "/mixql/entrypoint.s…"   node1               running             0.0.0.0:8087->8081/tcp, :::8087->8081/tcp, 0.0.0.0:18016->8088/tcp, :::18016->8088/tcp, 0.0.0.0:57016->50070/tcp, :::57016->50070/tcp, 0.0.0.0:57516->50075/tcp, :::57516->50075/tcp

At the first start all dependencies will be installed in the hadoop containers (main + node1) by masterless puppet within 10 minutes. Next time container will be started faster (1-2 min).

Live logs from container: docker compose logs -f main.

Check hadoop services

docker compose exec main bash
root@main:/mixql# hdfs dfs -ls /
hdfs dfs -ls /
Found 7 items
drwxr-xr-x   - hdfs  hadoop          0 2023-03-28 10:58 /apps
drwxrwxrwx   - hdfs  hadoop          0 2023-03-28 10:58 /benchmarks
drwxr-xr-x   - hbase hbase           0 2023-03-28 10:58 /hbase
drwxr-xr-x   - solr  solr            0 2023-03-28 10:58 /solr
drwxrwxrwt   - hdfs  hadoop          0 2023-03-28 10:59 /tmp
drwxr-xr-x   - hdfs  hadoop          0 2023-03-28 10:58 /user
drwxr-xr-x   - hdfs  hadoop          0 2023-03-28 10:58 /var

root@main:/mixql# spark-shell
...
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.3
      /_/

Using Scala version 2.12.15, OpenJDK 64-Bit Server VM, 1.8.0_362

root@main:/mixql# beeline -u jdbc:hive2://localhost:10000/default -n testuser -e "show databases"
...
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (1.001 seconds)
Beeline version 3.1.3 by Apache Hive

root@main:/mixql# oozie admin -version
Oozie server build version: {"build.version":"5.2.1","vc.url":"git@github.com:apache\/oozie.git","vc.revision":"branch-5.2@8f0e5ee09","build.time":"2022.12.25-07:25:51GMT -Dvc.revision=unavailable -Dvc.url=unavailable -DgenerateDocs","build.user":"jenkins"}

Start demo app

Start precompiled mixql-platform-demo from console with default configuration:

docker compose exec main bash
cd $MIXQL_CLUSTER_BASE_PATH
./mixql-platform-demo

or by /mixql/entrypoint.sh:

docker compose exec main entrypont.sh run

Start mixql-platform-demo with particular script and database from container console:

docker compose exec main bash
cd $MIXQL_CLUSTER_BASE_PATH
./mixql-platform-demo  \
    --sql-file /mixql/src/mixql-platform/mixql-platform-demo/src/test/resources/test_simple_func.sql \
    -Dmixql.org.engine.sqlight.db.path="jdbc:sqlite:/mixql-host/samples/db/titanic.db"

or by /mixql/entrypoint.sh:

docker compose exec \
    -e SCRIPT=/mixql/src/mixql-platform/mixql-platform-demo/src/test/resources/test_simple_func.sql \
    -e DB=/mixql-host/samples/db/titanic.db
    main entrypont.sh run

Compile and run

Source must be placed in parent folder like this:

/mixql
|mixql-platform  <- source code
|mixql-test-env  <- test environment

Create folders and get the code:

mkdir mixql && cd mixql
git clone https://github.com/ntlegion/mixql-test-env.git
git clone https://github.com/mixql/mixql-platform.git
cd mixql-test-env

Compile app from host source

You can change source code on the host and compile it by container demo

docker compose run -it --rm demo compile

Generated app will be placed in /mixql-host/app/ and available from hadoop containers.

Run compiled demo-app with SQLite dababase

Change $MIXQL_CLUSTER_BASE_PATH to new path to /mixql-host/app/ and then manually run

docker compose exec main bash
export MIXQL_CLUSTER_BASE_PATH="/mixql-host/app/mixql-platform-demo-$MIXQL_APP_VERSION/bin"
cd $MIXQL_CLUSTER_BASE_PATH
./mixql-platform-demo  \
    --sql-file /mixql/src/mixql-platform/mixql-platform-demo/src/test/resources/test_simple_func.sql \
    -Dmixql.org.engine.sqlight.db.path="jdbc:sqlite:/mixql-host/samples/db/titanic.db"

Run compiled app with sqlite database and script from host:

docker compose exec   \
    -e SCRIPT=/mixql-host/samples/scripts/test-titanic.sql \
    -e DB=/mixql-host/samples/db/titanic.db \
    main entrypoint.sh run-host

Shut down

  • Stop and ready to quick start

    Stop services: docker compose stop.

    After this containers and volumes are not destroyed and will be run quickly next time.

    Start cluster again: docker compose start.

  • Stop and remove containers

    Turn off the cluster after work (only delete containers): docker compose down.

  • Stop and full cleaning

    Clean volumes and images used by services (stops containers and removes containers, networks, volumes, and images created by up):

    docker compose  down --volumes --rmi all

Check system: docker system df

For developers

Folder structure

/mixql                                        <-- preinstalled files in container
|-- app                                       <-- precompiled app from git
|   `-- mixql-platform-demo-0.3.0-SNAPSHOT
|       |-- bin                               <-- execution script
|       |-- lib                               <-- compiled libs
|-- samples
|   |-- db                                    <-- sample dataset in container
|   `-- scripts
|-- src
|    `-- mixql-platform                       <-- source code from git
|
/mixql-host/                                  <-- shared folders from host
|-- app                                       <-- compiled app from /mixql-host/src/
|-- samples                                   <-- samples from host
`-- src                                       <-- source code from host to compile

Environment variables

Variable

Usage

SCRIPT

Specifies a file containing a mixql script to be executed by entrypoint.sh

DB

Specifies the database file SQLite should use entrypoint.sh or create if it not exists in

JAVA_VERSION

Version to install (mixql/demo image)

SCALA_VERSION

Version to install (mixql/demo image)

SBT_VERSION

Version to install (mixql/demo image)

MIXQL_APP_VERSION

Published version for path (mixql/demo image)

MIXQL_GIT_COMMIT

Git version of mixql to checkout (mixql/demo image)

BIGTOP_GIT_COMMIT

Git version of BigTop to checkout (mixql/bigtop image)

Rebuild images

To build only Hadoop image: docker compose build main.

To rebuild all image: docker compose build --no-cache

Prepared commands for hadoop container entrypoint

docker compose exec main entrypoint.sh COMMAND

COMMAND:

  • puppet - install Hadoop services by puppet

  • run - run precompiled app from /mixql/app;

  • run-host - run compiled app from /mixql-host/app. Be sure app is ready to use (is compiled);

  • bash - run bash shell

  • wait - infinite loop

  • any other string will be executed by exec

Use word bash after main command (run, puppet…​) to stay in bash after execution:

docker compose exec main entrypoint.sh run-host bash