ANN_Policy(int n_states,
           int n_actions,
           int n_hidden = 0,
           real alpha = 0.1,
           real gamma = 0.8,
           real lambda = 0.8,
           bool eligibility = false,
           bool softmax = false,
           real randomness = 0.1,
           real init_eval = 0.0,
           bool separate_actions = false);
bool confidence
Confidence estimates option.
virtual bool useConfidenceEstimates(bool confidence, real zeta=0.01)
Enable or disable confidence estimates for action selection; zeta sets the variance smoothing rate.
virtual void Reset()
Reset eligibility traces.
real * eval
Evaluation of the current action.
virtual int SelectAction(real *s, real r, int forced_a=-1)
Select an action, given a vector of real numbers which represents the state.
ANN * J
Evaluation network.
int n_actions
Number of actions.
real * ps
Previous state vector.
ANN_Policy(int n_states, int n_actions, int n_hidden=0, real alpha=0.1, real gamma=0.8, real lambda=0.8, bool eligibility=false, bool softmax=false, real randomness=0.1, real init_eval=0.0, bool separate_actions=false)
Make a new policy.
bool eligibility
Eligibility traces option.
A type of discrete action policy using a neural network for function approximation.
int n_states
Number of states.
real lambda
Eligibility trace decay.
real * JQs
Placeholder for the evaluation vector (used when separate_actions is set).
real gamma
Future discount parameter.
virtual real getLastActionValue()
Return the last action value.
bool separate_actions
Single/separate evaluation option.
real J_ps_pa
Evaluation of last action.
Discrete policies with reinforcement learning.
real * delta_vector
Scratch vector for TD error.
ANN management structure.
ANN ** Ja
Evaluation networks (used in the separate_actions case).
virtual real * getActionProbabilities()
Return the action selection probabilities.
real zeta
Confidence smoothing.