Azkaban Architecture

Azkaban Architecture Diagram

Azkaban Executor Server Responsibilities

Dispatch

Orchestration

Flow Management

Log Management

Performance and Resource Tuning

The amount of CPU and Memory to assign to each Executor Server will depend on a number of factors:

Run some sample jobs with only a few concurrent jobs/tasks and monitor CPU and memory usage. This will give you an idea of how to scale out.

If some job types require more memory and/or CPU, consider creating dedicated Executors for those job types, and assign an appropriate tag to both the executor(s) and the jobs to ensure they run on the appropriate executor.

As a general rule, each Task in a RED Job runs in its own thread, and each RED task can consume between 100 MB and 300 MB on average.

If you try to run more jobs (or tasks) in parallel than your Executors can safely handle, you are likely to get out-of-memory errors on one or more of your Executor Servers. To mitigate this, restrict job and task parallelism using the configuration settings described in the following sections.
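As a rough sizing sketch (using the 100-300 MB per task figure above, which will vary by job type and environment), the worst-case memory demand on an Executor can be estimated from the parallelism settings described in the next section:

worst-case Executor memory ≈ concurrent flows (executor.flow.threads) x tasks per flow (flow.num.job.threads) x 300 MB per task

Keeping either factor low, or creating dedicated Executors for heavy job types, keeps this figure within the Executor Server's available memory.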

Azkaban Executor Server Configuration

Each RED Job is a single Flow in Azkaban.

These settings can be found and/or applied in azkaban.local.properties for each Executor Server installation.

Parameter: executor.flow.threads
Description: The number of simultaneous flows that can be run. Since each Job in RED equates to a Flow in Azkaban, leaving this setting at its default could, depending on server resources and task parallelism, lead to out-of-memory failures on the Executor.
Tip: Currently this setting is not present in the properties file in WhereScape's Azkaban, but it can be added to limit the number of parallel jobs. If you had 5 jobs running, each with 5 parallel threads, you could potentially hit 25 active threads, each consuming 100-300 MB on average, which might require up to 7.5 GB of memory.
Default: 30

Parameter: job.log.chunk.size
Description: For rolling job logs. The chunk size for each roll over.
Default: 5MB

Parameter: job.log.backup.index
Description: The number of log chunks to keep. The maximum size of each log is then index * chunk size.
Default: 4

Parameter: flow.num.job.threads
Description: The number of concurrently running jobs within each flow. Each Task in a RED Job makes up a thread, and each RED task can consume between 100 MB and 300 MB on average. Control this value for each job via the RED UI and keep it as low as possible to avoid exhausting Executor resources.
Tip: This value is overridden by the 'Maximum Threads' setting in the New/Edit Job screen for each job in the RED UI. Only Jobs started from RED or triggered via a schedule honor this setting; for direct execution via the Azkaban Dashboard you must set it via a Flow Property Override.
Default: 50 (to match the theoretical maximum threads in the RED New/Edit Job screen)

Parameter: job.max.Xms
Description: The maximum initial amount of memory each job can request. If a job requests more than this, the Azkaban server will not launch the job.
Default: 1GB

Parameter: job.max.Xmx
Description: The maximum amount of memory each job can request. If a job requests more than this, the Azkaban server will not launch the job.
Default: 2GB

Parameter: azkaban.server.flow.max.running.minutes
Description: The maximum time, in minutes, that a flow is allowed to live inside Azkaban after being executed. If a flow runs longer than this, it will be killed. If less than or equal to 0, there is no restriction on running time.
Default: -1
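As a reference, a minimal azkaban.local.properties sketch applying the settings above might look like the following. The values shown are the defaults described in the table; adjust them to suit your Executor Server's resources, and note that executor.flow.threads may need to be added manually if it is not already present in the file.

# Limit the number of simultaneous flows (RED Jobs) on this Executor
executor.flow.threads=30

# Rolling job logs: maximum log size = chunk size x backup index
job.log.chunk.size=5MB
job.log.backup.index=4

# Maximum number of concurrently running jobs (RED tasks) within a flow
flow.num.job.threads=50

# Memory limits a job may request before Azkaban refuses to launch it
job.max.Xms=1GB
job.max.Xmx=2GB

# Kill flows that run longer than this many minutes (<= 0 means no limit)
azkaban.server.flow.max.running.minutes=-1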


Flow Property Override

When running WhereScape Jobs (Flows) directly from Azkaban, you can override global properties for a job.

When running Jobs directly from the Azkaban Dashboard, the Maximum Threads setting in RED is not used. Instead, tasks that can run in parallel will be started until the maximum thread count specified for the Executor (the flow.num.job.threads property in azkaban.local.properties) is reached. This could overload your scheduler if the job contains many tasks that can run concurrently.

The workaround is to add a Flow Property Override for the flow.num.job.threads property in the Flow Parameters when running the Job directly, restricting the thread count down from the default maximum, as in the example below.
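For example (the exact dialog layout may vary with your Azkaban version), when executing the Flow from the Azkaban Dashboard, open the Flow Parameters section of the Execute Flow dialog and add a name/value pair such as:

flow.num.job.threads=5

This would restrict that single execution to 5 concurrent tasks instead of the Executor-wide default of 50.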