PolicyIteration

class safe_learning.PolicyIteration(policy, dynamics, reward_function, value_function, gamma=0.98)

A class for policy iteration.

Parameters:
policy : callable

The policy that maps states to actions.

dynamics : callable

A function that can be called with states and actions as inputs and returns future states.

reward_function : callable

A function that takes the state, action, and next state as input and returns the reward corresponding to this transition.

value_function : instance of DeterministicFunction

The function approximator for the value function. It is used to evaluate the value function at states.

gamma : float

The discount factor for reinforcement learning.
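
A minimal construction sketch follows. All of the callables and the value-function approximator below are illustrative placeholders, not part of safe_learning; in practice value_function has to be a DeterministicFunction instance (e.g. a piecewise linear approximator), whose construction is problem specific and omitted here.

    import safe_learning

    # Placeholder one-dimensional dynamics: x' = x + u.
    def dynamics(states, actions):
        return states + actions

    # Placeholder reward: negative quadratic cost for the transition.
    def reward_function(states, actions, next_states):
        return -(states ** 2 + 0.1 * actions ** 2)

    # Placeholder policy: proportional state feedback.
    def policy(states):
        return -0.5 * states

    # Placeholder: a DeterministicFunction instance (construction omitted).
    value_function = ...

    rl = safe_learning.PolicyIteration(policy, dynamics, reward_function,
                                       value_function, gamma=0.98)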

Methods

bellmann_error(self, states) Compute the squared Bellman error.
discrete_policy_optimization(self, action_space) Optimize the policy for a given value function.
future_values(self, states[, policy, …]) Return the value at the current states.
optimize_value_function(self, **solver_options) Optimize the value function using cvxpy.
value_iteration(self) Perform one step of value iteration.
bellmann_error(self, states)

Compute the squared Bellman error.

Parameters:
states : array

The states at which to compute the Bellman error.

Returns:
error : float

The squared Bellman error.
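
Conceptually, the Bellman error measures the mismatch between the current value estimate and the one-step look-ahead values returned by future_values; how it is aggregated over the states is an implementation detail. A hedged usage sketch, continuing the construction example above:

    import numpy as np

    # Illustrative state grid of shape (n_states, state_dim).
    states = np.linspace(-1., 1., 101)[:, None]

    # Roughly: error ~ || V(states) - future_values(states) ||^2
    error = rl.bellmann_error(states)
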
discrete_policy_optimization(self, action_space, constraint=None)

Optimize the policy for a given value function.

Parameters:
action_space : ndarray

The discrete set of candidate parameter values (actions) to evaluate for each policy parameter. This is geared towards piecewise linear function approximators.

constraint : callable

A function that can be called with a policy and returns the slack of the safety constraint for each state. A policy is safe if the slack is >= 0 for all constraints.
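
A usage sketch, continuing the example above; the action grid and the constraint callable are illustrative assumptions rather than part of the API:

    import numpy as np

    # Discretized candidate action values (assumed shape: (n_actions, action_dim)).
    action_space = np.linspace(-1., 1., 21)[:, None]

    # Hypothetical safety constraint: non-negative slack at every state
    # means the corresponding policy is safe.
    def constraint(policy):
        return np.zeros((101, 1))  # placeholder slack values

    rl.discrete_policy_optimization(action_space, constraint=constraint)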

future_values(self, states, policy=None, actions=None, lyapunov=None, lagrange_multiplier=1.0)

Return the value at the current states.

Parameters:
states : ndarray

The states at which to compute future values.

policy : callable, optional

The policy to evaluate. Defaults to self.policy. This argument is ignored if actions is not None.

actions : array or tensor, optional

The actions to be taken for the states.

lyapunov : instance of Lyapunov, optional

A Lyapunov function that acts as a constraint for the optimization.

lagrange_multiplier : float, optional

A scaling factor for the slack of the optimization problem.

Returns:
The expected long-term reward when taking an action according to the policy: the immediate reward plus the discounted value of self.value_function at the resulting next states.
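
A usage sketch, continuing the example above; passing explicit actions overrides the policy:

    import numpy as np

    states = np.linspace(-1., 1., 101)[:, None]

    # One-step look-ahead values under the stored policy ...
    v_policy = rl.future_values(states)

    # ... or under explicit actions (zero actions here, purely for illustration).
    v_zero = rl.future_values(states, actions=np.zeros_like(states))
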
optimize_value_function(self, **solver_options)

Optimize the value function using cvxpy.

Parameters:
solver_options : kwargs, optional

Additional solver options passed to cvxpy.Problem.solve.

Returns:
assign_op : tf.Tensor

An assign operation that updates the value function.
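
Since the method returns a TensorFlow assign operation, it only takes effect once it is run; a sketch assuming the TensorFlow 1.x session style used by the library:

    import tensorflow as tf

    # Build the cvxpy problem and the corresponding assign operation.
    assign_op = rl.optimize_value_function()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Running the op writes the optimized parameters into the value function.
        sess.run(assign_op)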

value_iteration(self)

Perform one step of value iteration.
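
Repeating such single steps gives the usual value-iteration loop; in a full policy-iteration scheme one would typically alternate them with policy updates such as discrete_policy_optimization. An illustrative loop:

    # Apply a fixed number of value-iteration steps (illustrative only).
    # Depending on the backend, the update may instead be returned as a
    # TensorFlow operation that has to be run in a session.
    for _ in range(100):
        rl.value_iteration()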