simple_gpu_scheduler
A simple scheduler for running commands on multiple GPUs.
A simple scheduler to run your commands on individual GPUs. Following the KISS principle, this script simply accepts commands via stdin and executes them on a specific GPU by setting the CUDA_VISIBLE_DEVICES variable.
The commands read from stdin are executed using the login shell, so redirections (>), pipes (|) and all other kinds of bash magic can be used.
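Conceptually, the scheduler is a pool of workers, one per GPU, pulling commands from a shared queue and running each with its GPU id in CUDA_VISIBLE_DEVICES. The following is a minimal Python sketch of that idea, not the package's actual implementation; the names `worker` and `schedule` are made up for illustration:

```python
import os
import subprocess
from queue import Queue
from threading import Thread


def worker(gpu_id, jobs):
    """Run commands from the shared queue, one at a time, on one GPU."""
    while True:
        command = jobs.get()
        if command is None:  # sentinel: no more work for this worker
            break
        # Restrict the command to this worker's GPU.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
        # shell=True so redirections and pipes in the command work.
        subprocess.run(command, shell=True, env=env)


def schedule(commands, gpus):
    """Distribute commands over the given GPU ids and wait for completion."""
    jobs = Queue()
    threads = [Thread(target=worker, args=(gpu, jobs)) for gpu in gpus]
    for t in threads:
        t.start()
    for command in commands:
        jobs.put(command)
    for _ in gpus:          # one sentinel per worker
        jobs.put(None)
    for t in threads:
        t.join()
```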
Installation
The package can simply be installed from PyPI:
$ pip install simple_gpu_scheduler
Example
To show how this generally works, we will create jobs that simply output a job id and the value of CUDA_VISIBLE_DEVICES:
for i in {0..10}; do echo "echo job_id=$i device=\$CUDA_VISIBLE_DEVICES && sleep 3"; done | simple_gpu_scheduler --gpus 0,1,2
which results in the following output:
Processing command `echo job_id=0 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 2
Processing command `echo job_id=1 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 1
Processing command `echo job_id=2 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 0
job_id=0 device=2
job_id=1 device=1
job_id=2 device=0
--- 3 seconds no output ---
Processing command `echo job_id=3 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 2
Processing command `echo job_id=4 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 1
Processing command `echo job_id=5 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 0
job_id=3 device=2
job_id=4 device=1
job_id=5 device=0
--- 3 seconds no output ---
Processing command `echo job_id=6 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 2
Processing command `echo job_id=7 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 1
Processing command `echo job_id=8 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 0
job_id=6 device=2
job_id=7 device=1
job_id=8 device=0
--- 3 seconds no output ---
Processing command `echo job_id=9 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 2
Processing command `echo job_id=10 device=$CUDA_VISIBLE_DEVICES && sleep 3` on gpu 0
job_id=9 device=2
job_id=10 device=0
This is equivalent to creating a file commands.txt with the following content:
echo job_id=0 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=1 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=2 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=3 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=4 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=5 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=6 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=7 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=8 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=9 device=$CUDA_VISIBLE_DEVICES && sleep 3
echo job_id=10 device=$CUDA_VISIBLE_DEVICES && sleep 3
and running
simple_gpu_scheduler --gpus 0,1,2 < commands.txt
Simple scheduler for jobs
Combined with some basic command line tools, one can set up a simple scheduler that waits for new jobs to be "submitted" and executes them in order of submission.
Set up and start the scheduler in the background or in a separate permanent session (using, for example, tmux):
touch gpu.queue
tail -f -n 0 gpu.queue | simple_gpu_scheduler --gpus 0,1,2
The command tail -f -n 0 follows the end of the gpu.queue file; thus, anything written into gpu.queue prior to the execution of the command will not be passed to simple_gpu_scheduler.
Submitting commands then boils down to appending text to the gpu.queue file:
echo "my_command_with | and stuff > logfile" >> gpu.queue
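The same submission step can also be done from Python as a plain append to the queue file. A hypothetical helper (the name `submit` is not part of the package) could look like:

```python
def submit(command, queue_file="gpu.queue"):
    """Append one command as a single line to the queue file.

    The scheduler reads one command per line, so the trailing
    newline is what actually "submits" the job.
    """
    with open(queue_file, "a") as f:
        f.write(command.rstrip("\n") + "\n")
```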
Hyperparameter search
In order to allow user-friendly utilization of the scheduler in the common scenario of hyperparameter search, a convenience script simple_hypersearch is included in the package.
simple_hypersearch -h
usage: simple_hypersearch [-h] [--sampling-mode {shuffled_grid,grid}]
[--n-samples N_SAMPLES] [--seed SEED]
[-p NAME [VALUES ...]]
command_pattern
Convenience tool to generate hyperparameter search commands from a command pattern and parameter ranges.
positional arguments:
command_pattern Command pattern where placeholders with {parameter_name} should be replaced.
optional arguments:
-h, --help show this help message and exit
--sampling-mode {shuffled_grid,grid}
Determine how to sample commands. Either in the grid order [grid]
or in a shuffled order [shuffled_grid, default].
--n-samples N_SAMPLES
Number of samples to draw. If not provided use all possible combinations.
  --seed SEED           Random seed to ensure reproducibility when using randomized order of the grid.
-p NAME [VALUES ...], --parameter NAME [VALUES ...]
Name of parameter followed by values that should be considered for hyperparameter search.
Example: `-p lr 0.01 0.001 0.0001`
Usage example:
simple_hypersearch "my_program --param1 {param1} --param2 {param2}" -p param1 0 1 -p param2 2 3
will generate the output:
my_program --param1 0 --param2 2
my_program --param1 0 --param2 3
my_program --param1 1 --param2 2
my_program --param1 1 --param2 3
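The grid expansion behind this can be sketched in a few lines of Python using itertools.product; the function name `expand_pattern` is made up for illustration and is not the package's API:

```python
from itertools import product


def expand_pattern(pattern, parameters):
    """Generate one command per parameter combination, in grid order.

    `parameters` maps each placeholder name (e.g. "param1") to its
    list of candidate values; placeholders in `pattern` use the
    str.format syntax, e.g. "{param1}".
    """
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield pattern.format(**dict(zip(names, values)))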
This makes it easy to perform hyperparameter searches over a grid of values or over uniform samples of the grid (depending on the setting of sampling-mode). The output can be piped directly into simple_gpu_scheduler or appended to the "queue file" (see Simple scheduler for jobs).
Here are some more concrete examples:
Grid of all possible parameter configurations in random order:
simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 | simple_gpu_scheduler --gpus 0,1,2
5 uniformly sampled parameter configurations:
simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" --n-samples 5 -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 | simple_gpu_scheduler --gpus 0,1,2
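The shuffled_grid mode with --n-samples and --seed can be approximated as: build the full grid, shuffle it with a seeded RNG, and keep the first n commands. A sketch under that assumption (the name `sample_grid` is hypothetical, not the package's API):

```python
import random
from itertools import product


def sample_grid(pattern, parameters, n_samples=None, seed=None):
    """Shuffle the full grid with a seeded RNG, keep the first n_samples."""
    names = list(parameters)
    grid = [pattern.format(**dict(zip(names, values)))
            for values in product(*(parameters[name] for name in names))]
    # A fixed seed makes the sampled subset reproducible across runs.
    random.Random(seed).shuffle(grid)
    return grid if n_samples is None else grid[:n_samples]
```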
TODO
- Multi-line jobs (possibly we would then need a submission script after all)
- Stop, but let running commands finish, when receiving a defined signal
- Tests would be nice; until now the project is still very small, but if it grows, tests should be added