schola.scripts.ray.settings.ResourceSettings
Class Definition
class schola.scripts.ray.settings.ResourceSettings(num_gpus=0, num_cpus=1, num_learners=0, num_cpus_for_main_process=1, num_cpus_per_learner=1, num_gpus_per_learner=0)
Bases: object
Dataclass holding the resource settings for the RLlib training process. It defines how computational resources, such as GPUs and CPU cores, are allocated to the training job. These settings directly affect performance and training time, and they matter most when running in a cluster or other distributed environment.
Parameters
num_gpus
Type: int | None
num_cpus
Type: int | None
num_learners
Type: int | None
num_cpus_for_main_process
Type: int | None
num_cpus_per_learner
Type: int | None
num_gpus_per_learner
Type: int | None
Attributes
name
Type: str
num_cpus
Type: int | None
Default: 1
The total number of CPU cores available to the RLlib training job. Allocating more cores allows the training process to be parallelized across them, which can shorten training times.
num_cpus_for_main_process
Type: int | None
Default: 1
The number of CPU cores allocated to the main process that manages the training job. Reserving cores here ensures the main process has enough resources to coordinate the learner processes effectively.
num_cpus_per_learner
Type: int | None
Default: 1
The number of CPU cores allocated to each individual learner process, ensuring every learner has enough resources to process its share of the training data efficiently.
num_gpus
Type: int | None
Default: 0
The number of GPUs available to the RLlib training job. If set to 0, training runs on the CPU only; a positive value enables GPU acceleration for faster training where hardware is available.
num_gpus_per_learner
Type: int | None
Default: 0
The number of GPUs to allocate for each learner process. This specifies how many GPUs will be allocated to each individual learner process that is used for training.
num_learners
Type: int | None
Default: 0
The number of parallel learner processes used for the training job. Each learner processes a portion of the training data and updates the model weights independently, which can speed up training by leveraging multiple CPU cores or GPUs.
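The defaults above can be illustrated with a minimal stand-in that mirrors the documented dataclass fields (a sketch so the defaults can be shown without a schola installation; in practice you would import `ResourceSettings` from `schola.scripts.ray.settings` directly):

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in mirroring the ResourceSettings fields and defaults documented above.
@dataclass
class ResourceSettings:
    num_gpus: Optional[int] = 0
    num_cpus: Optional[int] = 1
    num_learners: Optional[int] = 0
    num_cpus_for_main_process: Optional[int] = 1
    num_cpus_per_learner: Optional[int] = 1
    num_gpus_per_learner: Optional[int] = 0

# All defaults: CPU-only training, no separate learner processes.
defaults = ResourceSettings()

# A single-GPU job with two CPU-only learner processes, two cores each.
gpu_job = ResourceSettings(num_gpus=1, num_cpus=8,
                           num_learners=2, num_cpus_per_learner=2)
```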
Methods
__init__
__init__(num_gpus=0, num_cpus=1, num_learners=0, num_cpus_for_main_process=1, num_cpus_per_learner=1, num_gpus_per_learner=0)
Return type: None
populate_arg_group
classmethod populate_arg_group(args_group)
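A `populate_arg_group`-style classmethod typically registers each dataclass field as a command-line flag on an `argparse` argument group. The sketch below shows the general pattern; the actual schola implementation, flag names, and help text may differ:

```python
import argparse
from dataclasses import dataclass, fields

@dataclass
class ResourceSettings:
    num_gpus: int = 0
    num_cpus: int = 1
    num_learners: int = 0

    @classmethod
    def populate_arg_group(cls, args_group):
        # Register one --flag per dataclass field, reusing each field's
        # default so omitted flags fall back to the dataclass defaults.
        for f in fields(cls):
            args_group.add_argument(f"--{f.name.replace('_', '-')}",
                                    type=int, default=f.default)

parser = argparse.ArgumentParser()
group = parser.add_argument_group("Resource Settings")
ResourceSettings.populate_arg_group(group)
args = parser.parse_args(["--num-gpus", "1", "--num-learners", "2"])
```

With this pattern the parsed namespace exposes the same names as the dataclass fields (argparse converts dashes back to underscores), so the settings object can be reconstructed from `args` afterwards.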