
BaseSb3AlgorithmArgs

Full path: schola.scripts.sb3.settings.BaseSb3AlgorithmArgs

BaseSb3AlgorithmArgs(learning_rate: 'Annotated[float, Parameter(validator=validators.Number(gt=0.0))]' = 0.0003, n_steps: 'Annotated[int, Parameter(validator=validators.Number(gte=1))]' = 2048, batch_size: 'Annotated[int, Parameter(validator=validators.Number(gte=1))]' = 64)

Bases: object

Methods

__init__: Initialize the arguments with learning_rate, n_steps, and batch_size.

Attributes

batch_size: Minibatch size for each update.
learning_rate: Learning rate for the optimizer.
n_steps: Number of steps to run for each environment per update.

Parameters

learning_rate (Annotated[float, Parameter(validator=Number(gt=0))]) — must be greater than 0

n_steps (Annotated[int, Parameter(validator=Number(gte=1))]) — must be at least 1

batch_size (Annotated[int, Parameter(validator=Number(gte=1))]) — must be at least 1

__init__

__init__(learning_rate=0.0003, n_steps=2048, batch_size=64)

Parameters

learning_rate (Annotated[float, Parameter(validator=Number(gt=0))])

n_steps (Annotated[int, Parameter(validator=Number(gte=1))])

batch_size (Annotated[int, Parameter(validator=Number(gte=1))])

Returns

None
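To illustrate how these defaults and validator constraints behave, here is a minimal stand-in sketch (a plain dataclass, not the real Schola class, which uses Parameter/validator annotations instead of a `__post_init__` check):

```python
from dataclasses import dataclass


@dataclass
class Sb3AlgorithmArgsSketch:
    """Minimal stand-in mirroring the documented fields and constraints."""

    learning_rate: float = 0.0003  # documented validator: gt=0
    n_steps: int = 2048            # documented validator: gte=1
    batch_size: int = 64           # documented validator: gte=1

    def __post_init__(self):
        # Enforce the same bounds the Annotated validators declare.
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be > 0")
        if self.n_steps < 1:
            raise ValueError("n_steps must be >= 1")
        if self.batch_size < 1:
            raise ValueError("batch_size must be >= 1")


# Defaults match the documented signature.
args = Sb3AlgorithmArgsSketch()
print(args.learning_rate, args.n_steps, args.batch_size)  # 0.0003 2048 64
```

With a non-positive learning rate (or an n_steps/batch_size below 1), construction fails at validation time rather than mid-training.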


batch_size

batch_size: Annotated[int, Parameter(validator=Number(gte=1))] = 64

Minibatch size for each update. This is the number of timesteps used in each batch for training the policy. Must be a divisor of n_steps.
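The divisibility requirement can be checked up front. A small helper sketch (the function name is hypothetical, not part of Schola):

```python
def check_batch_size(n_steps: int, batch_size: int) -> None:
    """Raise if batch_size does not evenly divide n_steps."""
    if n_steps % batch_size != 0:
        raise ValueError(
            f"batch_size={batch_size} must be a divisor of n_steps={n_steps}"
        )


check_batch_size(2048, 64)  # OK: 2048 / 64 = 32 minibatches per epoch
```

With the defaults, each rollout of 2048 steps splits cleanly into 32 minibatches of 64; a batch_size of, say, 100 would leave a remainder and should be rejected.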


learning_rate

learning_rate: Annotated[float, Parameter(validator=Number(gt=0))] = 0.0003

Learning rate for the optimizer.


n_steps

n_steps: Annotated[int, Parameter(validator=Number(gte=1))] = 2048

Number of steps to run for each environment per update. This is the number of timesteps collected before updating the policy.
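For context, Stable Baselines3's on-policy algorithms collect n_steps timesteps from each parallel environment before an update, so the rollout buffer holds n_steps × n_envs samples. A quick arithmetic sketch (function name hypothetical, for illustration only):

```python
def minibatches_per_epoch(n_steps: int, n_envs: int, batch_size: int) -> int:
    """Minibatches per optimization epoch: the rollout buffer holds
    n_steps * n_envs timesteps, which is split into chunks of batch_size."""
    rollout_size = n_steps * n_envs
    return rollout_size // batch_size


print(minibatches_per_epoch(2048, 1, 64))  # 32 with the documented defaults
print(minibatches_per_epoch(2048, 4, 64))  # 128 with four parallel environments
```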