schola.scripts.ray.settings.ResourceSettings
Class Definition
class schola.scripts.ray.settings.ResourceSettings(num_gpus=0, num_cpus=1, num_learners=0, num_cpus_for_main_process=1, num_cpus_per_learner=1, num_gpus_per_learner=0)
Bases: object
Dataclass holding the resource settings for the RLlib training process. It defines how computational resources, such as GPUs and CPU cores, are allocated to the training job. These settings directly affect performance and training time, and they matter most when running in a cluster or other distributed environment.
Parameters
num_gpus
Type: int | None
num_cpus
Type: int | None
num_learners
Type: int | None
num_cpus_for_main_process
Type: int | None
num_cpus_per_learner
Type: int | None
num_gpus_per_learner
Type: int | None
Attributes
name
Type: str
num_cpus
Type: int | None
Default: 1
The total number of CPU cores available to the RLlib training job. Allocating more cores allows the training process to be parallelized across them, which can shorten training times.
num_cpus_for_main_process
Type: int | None
Default: 1
The number of CPU cores allocated to the main process that manages the training job. Reserving cores here ensures the main process has enough resources to coordinate the learner processes effectively.
num_cpus_per_learner
Type: int | None
Default: 1
The number of CPU cores allocated to each individual learner process, ensuring every learner has enough resources to process its share of the training data efficiently.
num_gpus
Type: int | None
Default: 0
The number of GPUs available to the RLlib training job. If set to 0, training runs on the CPU only; a positive value enables GPU acceleration for faster training where hardware is available.
num_gpus_per_learner
Type: int | None
Default: 0
The number of GPUs to allocate for each learner process. This specifies how many GPUs will be allocated to each individual learner process that is used for training.
num_learners
Type: int | None
Default: 0
The number of parallel learner processes used for the training job. Each learner processes a portion of the training data and updates the model weights independently, which can speed up training by leveraging multiple CPU cores or GPUs.
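The defaults above can be illustrated with a minimal stand-in that mirrors the documented dataclass fields (a sketch so the defaults can be shown without a schola installation; in practice you would import `ResourceSettings` from `schola.scripts.ray.settings` directly):

```python
from dataclasses import dataclass
from typing import Optional

# Stand-in mirroring the ResourceSettings fields and defaults documented above.
@dataclass
class ResourceSettings:
    num_gpus: Optional[int] = 0
    num_cpus: Optional[int] = 1
    num_learners: Optional[int] = 0
    num_cpus_for_main_process: Optional[int] = 1
    num_cpus_per_learner: Optional[int] = 1
    num_gpus_per_learner: Optional[int] = 0

# All defaults: CPU-only training, no separate learner processes.
defaults = ResourceSettings()

# A single-GPU job with two CPU-only learner processes, two cores each.
gpu_job = ResourceSettings(num_gpus=1, num_cpus=8,
                           num_learners=2, num_cpus_per_learner=2)
```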
Methods
__init__
__init__(num_gpus=0, num_cpus=1, num_learners=0, num_cpus_for_main_process=1, num_cpus_per_learner=1, num_gpus_per_learner=0)
Return type: None
populate_arg_group
classmethod populate_arg_group(args_group)
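A `populate_arg_group`-style classmethod typically registers each dataclass field as a command-line flag on an `argparse` argument group. The sketch below shows the general pattern; the actual schola implementation, flag names, and help text may differ:

```python
import argparse
from dataclasses import dataclass, fields

@dataclass
class ResourceSettings:
    num_gpus: int = 0
    num_cpus: int = 1
    num_learners: int = 0

    @classmethod
    def populate_arg_group(cls, args_group):
        # Register one --flag per dataclass field, reusing each field's
        # default so omitted flags fall back to the dataclass defaults.
        for f in fields(cls):
            args_group.add_argument(f"--{f.name.replace('_', '-')}",
                                    type=int, default=f.default)

parser = argparse.ArgumentParser()
group = parser.add_argument_group("Resource Settings")
ResourceSettings.populate_arg_group(group)
args = parser.parse_args(["--num-gpus", "1", "--num-learners", "2"])
```

With this pattern the parsed namespace exposes the same names as the dataclass fields (argparse converts dashes back to underscores), so the settings object can be reconstructed from `args` afterwards.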