PolicyIteration

class safe_learning.PolicyIteration(policy, dynamics, reward_function, value_function, gamma=0.98)

A class for policy iteration.
Parameters: - policy : callable
The policy that maps states to actions.
- dynamics : callable
A function that can be called with states and actions as inputs and returns future states.
- reward_function : callable
A function that takes the state, action, and next state as input and returns the reward corresponding to this transition.
- value_function : instance of DeterministicFunction
The function approximator for the value function. It is used to evaluate the value function at states.
- gamma : float
The discount factor for reinforcement learning.
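For orientation, a minimal construction sketch follows; the policy, dynamics, and reward below are hypothetical stand-ins for a 1-D system, and the value-function construction is omitted because it depends on the chosen DeterministicFunction approximator.

    # Hypothetical components for a 1-D system; only the call signatures
    # (states -> actions, (states, actions) -> next states,
    # (states, actions, next states) -> rewards) follow the descriptions above.

    def policy(states):
        # Simple linear feedback law mapping states to actions.
        return -0.5 * states

    def dynamics(states, actions):
        # Deterministic transition model: returns the future states.
        return states + 0.1 * actions

    def reward_function(states, actions, next_states):
        # Quadratic cost, expressed as a negative reward for the transition.
        return -(states ** 2 + 0.1 * actions ** 2)

    # value_function must be an instance of DeterministicFunction; its
    # construction is approximator-specific and omitted here.
    # rl = safe_learning.PolicyIteration(policy, dynamics, reward_function,
    #                                    value_function, gamma=0.98)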
Methods

bellmann_error(self, states)
    Compute the squared Bellman error.
discrete_policy_optimization(self, action_space[, constraint])
    Optimize the policy for a given value function.
future_values(self, states[, policy, …])
    Return the value at the current states.
optimize_value_function(self, **solver_options)
    Optimize the value function using cvxpy.
value_iteration(self)
    Perform one step of value iteration.
bellmann_error(self, states)

Compute the squared Bellman error.
Parameters: - states : array
The states at which to compute the Bellman error.
Returns: - error : float
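As a sketch of the quantity involved, the standard one-step squared Bellman residual is shown below; that the implementation sums (rather than averages) over states and assumes deterministic dynamics are assumptions here.

    import numpy as np

    # Minimal sketch of a squared Bellman error; argument names mirror the
    # constructor parameters above, and the summation over states is assumed.
    def squared_bellmann_error(states, policy, dynamics, reward_function,
                               value_function, gamma=0.98):
        actions = policy(states)
        next_states = dynamics(states, actions)
        target = (reward_function(states, actions, next_states)
                  + gamma * value_function(next_states))
        residual = target - value_function(states)
        return np.sum(residual ** 2)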
discrete_policy_optimization(self, action_space, constraint=None)

Optimize the policy for a given value function.
Parameters: - action_space : ndarray
The candidate action values to evaluate (for each policy parameter). This is geared towards piecewise linear function approximators.
- constraint : callable, optional
A function that can be called with a policy and returns the slack of the safety constraint at each state. A policy is safe if the slack is >= 0 for all constraints.
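A hypothetical call is sketched below; both the action grid and the safety constraint are illustrative placeholders, and rl is the PolicyIteration instance sketched earlier.

    import numpy as np

    # A grid of 21 candidate scalar actions in [-1, 1]; purely illustrative.
    action_space = np.linspace(-1.0, 1.0, 21)[:, None]

    def constraint(policy):
        # Placeholder: should return the safety-constraint slack for each
        # state; the policy is safe wherever the slack is >= 0.
        raise NotImplementedError

    # Unconstrained policy improvement:
    # rl.discrete_policy_optimization(action_space)
    # Improvement restricted to safe policies:
    # rl.discrete_policy_optimization(action_space, constraint=constraint)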
future_values(self, states, policy=None, actions=None, lyapunov=None, lagrange_multiplier=1.0)

Return the value at the current states.
Parameters: - states : ndarray
The states at which to compute future values.
- policy : callable, optional
The policy to evaluate. Defaults to self.policy. This argument is ignored if actions is not None.
- actions : array or tensor, optional
The actions to be taken for the states.
- lyapunov : instance of Lyapunov, optional
A Lyapunov function that acts as a constraint for the optimization.
- lagrange_multiplier : float
A scaling factor for the slack of the optimization problem.
Returns: - The expected long-term reward when taking an action according to the policy and then using the value of self.value_function at the resulting states.
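A minimal sketch of the returned quantity for the unconstrained case, assuming deterministic dynamics; when lyapunov is given, the slack of the safety constraint additionally enters the objective scaled by lagrange_multiplier, which is omitted below since its exact form is not specified here.

    # Sketch of the unconstrained case; names mirror the constructor above.
    def future_values_sketch(states, policy, dynamics, reward_function,
                             value_function, gamma=0.98):
        # One-step reward plus the discounted approximate value at the
        # next states.
        actions = policy(states)
        next_states = dynamics(states, actions)
        return (reward_function(states, actions, next_states)
                + gamma * value_function(next_states))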
optimize_value_function(self, **solver_options)

Optimize the value function using cvxpy.
Parameters: - solver_options : kwargs, optional
Additional solver options passed to cvxpy.Problem.solve.
Returns: - assign_op : tf.Tensor
An assign operation that updates the value function.
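Because the result is a TensorFlow assign operation, the value function only changes once the op is run in a session. A hypothetical usage, where rl is the instance sketched earlier and the solver keyword is an illustrative option forwarded to cvxpy.Problem.solve:

    import tensorflow as tf

    # The solver keyword is illustrative; solver options are forwarded to
    # cvxpy.Problem.solve.
    assign_op = rl.optimize_value_function(solver='SCS')

    with tf.Session() as sess:
        sess.run(assign_op)  # the update takes effect only when the op runs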
value_iteration(self)

Perform one step of value iteration.
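Since each call performs a single step, value iteration is typically repeated until the value function stops changing; the fixed iteration count below is an illustrative placeholder.

    # Run repeated value-iteration steps on the sketched instance; the step
    # count and the absence of a convergence check are assumptions.
    for _ in range(100):
        rl.value_iteration()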