TORCS  1.3.9
The Open Racing Car Simulator
ANN_Policy Class Reference

A type of discrete action policy using a neural network for function approximation. More...

#include <ann_policy.h>

Inheritance diagram for ANN_Policy:
Collaboration diagram for ANN_Policy:

Public Member Functions

 ANN_Policy (int n_states, int n_actions, int n_hidden=0, real alpha=0.1, real gamma=0.8, real lambda=0.8, bool eligibility=false, bool softmax=false, real randomness=0.1, real init_eval=0.0, bool separate_actions=false)
 Make a new policy. More...
 
virtual ~ANN_Policy ()
 
virtual int SelectAction (real *s, real r, int forced_a=-1)
 Select an action, given a vector of real numbers which represents the state. More...
 
virtual void Reset ()
 Reset eligibility traces. More...
 
virtual real getLastActionValue ()
 Return the last action value. More...
 
virtual real * getActionProbabilities ()
 
virtual bool useConfidenceEstimates (bool confidence, real zeta=0.01)
 Set to use confidence estimates for action selection, with variance smoothing zeta. More...
 
- Public Member Functions inherited from DiscretePolicy
 DiscretePolicy (int n_states, int n_actions, real alpha=0.1, real gamma=0.8, real lambda=0.8, bool softmax=false, real randomness=0.1, real init_eval=0.0)
 Create a new discrete policy. More...
 
virtual ~DiscretePolicy ()
 Kill the agent and free everything. More...
 
virtual void setLearningRate (real alpha)
 Set the learning rate. More...
 
virtual real getTDError ()
 Get the temporal difference error of the previous action. More...
 
virtual int SelectAction (int s, real r, int forced_a=-1)
 Select an action a, given state s and reward from previous action. More...
 
virtual void loadFile (char *f)
 Load policy from a file. More...
 
virtual void saveFile (char *f)
 Save policy to a file. More...
 
virtual void setQLearning ()
 Set the algorithm to QLearning mode. More...
 
virtual void setELearning ()
 Set the algorithm to ELearning mode. More...
 
virtual void setSarsa ()
 Set the algorithm to SARSA mode. More...
 
virtual bool useConfidenceEstimates (bool confidence, real zeta=0.01, bool confidence_eligibility=false)
 Set to use confidence estimates for action selection, with variance smoothing zeta. More...
 
virtual void setForcedLearning (bool forced)
 Set forced learning (force-feed actions) More...
 
virtual void setRandomness (real epsilon)
 Set randomness for action selection. Does not affect confidence mode. More...
 
virtual void setGamma (real gamma)
 Set the gamma of the sum to be maximised. More...
 
virtual void setPursuit (bool pursuit)
 Use Pursuit for action selection. More...
 
virtual void setReplacingTraces (bool replacing)
 Use replacing instead of accumulating eligibility traces. More...
 
virtual void useSoftmax (bool softmax)
 Set action selection to softmax. More...
 
virtual void setConfidenceDistribution (enum ConfidenceDistribution cd)
 Set the distribution for direct action sampling. More...
 
virtual void useGibbsConfidence (bool gibbs)
 Add Gibbs sampling for confidences. More...
 
virtual void useReliabilityEstimate (bool ri)
 Use the reliability estimate method for action selection. More...
 
virtual void saveState (FILE *f)
 Save the current evaluations in text format to a file. More...
 

Protected Attributes

ANN * J
 Evaluation network. More...
 
ANN ** Ja
 Evaluation networks (for separate_actions case) More...
 
real * ps
 Previous state vector. More...
 
real * JQs
 Placeholder for evaluation vector (separate_actions) More...
 
real J_ps_pa
 Evaluation of last action. More...
 
real * delta_vector
 Scratch vector for TD error. More...
 
bool eligibility
 eligibility option More...
 
bool separate_actions
 Single/separate evaluation option. More...
 
- Protected Attributes inherited from DiscretePolicy
enum LearningMethod learning_method
 learning method to use More...
 
int n_states
 number of states More...
 
int n_actions
 number of actions More...
 
real ** Q
 state-action evaluation More...
 
real ** e
 eligibility trace More...
 
real * eval
 evaluation of current action More...
 
real * sample
 sampling output More...
 
real pQ
 previous Q More...
 
int ps
 previous state More...
 
int pa
 previous action More...
 
real r
 reward More...
 
real temp
 scratch More...
 
real tdError
 temporal difference error More...
 
bool smax
 softmax option More...
 
bool pursuit
 pursuit option More...
 
real ** P
 pursuit action probabilities More...
 
real gamma
 Future discount parameter. More...
 
real lambda
 Eligibility trace decay. More...
 
real alpha
 learning rate More...
 
real expected_r
 Expected reward. More...
 
real expected_V
 Expected state return. More...
 
int n_samples
 number of samples used for the expected r and V above More...
 
int min_el_state
 min state ID to search for eligibility More...
 
int max_el_state
 max state ID to search for eligibility More...
 
bool replacing_traces
 Replacing instead of accumulating traces. More...
 
bool forced_learning
 Force agent to take supplied action. More...
 
bool confidence
 Confidence estimates option. More...
 
bool confidence_eligibility
 Apply eligibility traces to confidence. More...
 
bool reliability_estimate
 reliability estimates option More...
 
enum ConfidenceDistribution confidence_distribution
 Distribution to use for confidence sampling. More...
 
bool confidence_uses_gibbs
 Additional gibbs sampling for confidence. More...
 
real zeta
 Confidence smoothing. More...
 
real ** vQ
 variance estimate for Q More...
 

Additional Inherited Members

- Protected Member Functions inherited from DiscretePolicy
int confMax (real *Qs, real *vQs, real p=1.0)
 Confidence-based Gibbs sampling. More...
 
int confSample (real *Qs, real *vQs)
 Directly sample from action value distribution. More...
 
int softMax (real *Qs)
 Softmax Gibbs sampling. More...
 
int eGreedy (real *Qs)
 e-greedy sampling More...
 
int argMax (real *Qs)
 Get ID of maximum action. More...
 

Detailed Description

A type of discrete action policy using a neural network for function approximation.

The constructor offers the additional option separate_actions, which is useful when eligibility traces are enabled: it allows the traces of non-selected actions to be cleared cleanly, since a separate approximator is maintained for each action rather than a single approximator with many outputs.

The class has essentially the same interface as DiscretePolicy. The main difference is that the state is supplied as a vector of real numbers rather than as a discrete state index.

Note that combining Q-learning with eligibility traces in this class is not guaranteed to converge, and can diverge in theory.

Definition at line 35 of file ann_policy.h.
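The call pattern can be illustrated with a minimal stand-in class (ToyPolicy and its fixed linear scoring are invented for this example; they are not the TORCS implementation): SelectAction receives the continuous state vector together with the reward for the previous action, and Reset() is called between episodes.

```cpp
#include <cassert>

typedef double real;  // TORCS typedefs `real`; assumed double here

// Illustrative stand-in showing the ANN_Policy interface shape only.
class ToyPolicy {
public:
    ToyPolicy(int n_states, int n_actions)
        : n_states_(n_states), n_actions_(n_actions), steps_(0) {}

    // Greedy selection over a fixed per-action score, just to show
    // the shape: a continuous state vector in, an action index out.
    int SelectAction(const real* s, real r) {
        (void)r;  // a real policy would use r to update its values
        int best = 0;
        real best_v = score(s, 0);
        for (int a = 1; a < n_actions_; ++a) {
            real v = score(s, a);
            if (v > best_v) { best_v = v; best = a; }
        }
        ++steps_;
        return best;
    }

    void Reset() { steps_ = 0; }  // clear per-episode state (traces)
    int steps() const { return steps_; }

private:
    real score(const real* s, int a) const {
        real v = 0;
        for (int i = 0; i < n_states_; ++i) v += s[i] * (a + 1);
        return v;
    }
    int n_states_, n_actions_, steps_;
};
```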

Constructor & Destructor Documentation

◆ ANN_Policy()

ANN_Policy::ANN_Policy ( int  n_states,
int  n_actions,
int  n_hidden = 0,
real  alpha = 0.1,
real  gamma = 0.8,
real  lambda = 0.8,
bool  eligibility = false,
bool  softmax = false,
real  randomness = 0.1,
real  init_eval = 0.0,
bool  separate_actions = false 
)

Make a new policy.

Definition at line 17 of file ann_policy.cpp.


◆ ~ANN_Policy()

ANN_Policy::~ANN_Policy ( )
virtual

Definition at line 60 of file ann_policy.cpp.


Member Function Documentation

◆ getActionProbabilities()

virtual real* ANN_Policy::getActionProbabilities ( )
inlinevirtual

Definition at line 58 of file ann_policy.h.

◆ getLastActionValue()

virtual real ANN_Policy::getLastActionValue ( )
inlinevirtual

Return the last action value.

Reimplemented from DiscretePolicy.

Definition at line 56 of file ann_policy.h.

◆ Reset()

void ANN_Policy::Reset ( )
virtual

Reset eligibility traces.

Reimplemented from DiscretePolicy.

Definition at line 175 of file ann_policy.cpp.


◆ SelectAction()

int ANN_Policy::SelectAction ( real *  s,
real  r,
int  forced_a = -1 
)
virtual

Select an action, given a vector of real numbers which represents the state.

Definition at line 75 of file ann_policy.cpp.
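Internally, a SelectAction step amounts to computing a temporal-difference error and taking a gradient step on the evaluation network. A hedged sketch with a linear approximator follows — the exact update in ann_policy.cpp may differ, and the helper names here are invented for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// TD error for SARSA-style updates:
// delta = r + gamma * Q(s', a') - Q(s, a)
double td_error(double r, double gamma, double q_next, double q_prev) {
    return r + gamma * q_next - q_prev;
}

// One gradient step on a linear evaluation Q(s) = w . s, where the
// gradient with respect to w is simply the state vector s:
// w <- w + alpha * delta * s
void td_update(double* w, const double* s, std::size_t n,
               double alpha, double delta) {
    for (std::size_t i = 0; i < n; ++i)
        w[i] += alpha * delta * s[i];
}
```

With eligibility traces, the gradient would instead be accumulated into a decaying trace vector before the weight update, which is where the separate_actions option matters.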


◆ useConfidenceEstimates()

bool ANN_Policy::useConfidenceEstimates ( bool  confidence,
real  zeta = 0.01 
)
virtual

Set to use confidence estimates for action selection, with variance smoothing zeta.

Definition at line 188 of file ann_policy.cpp.
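One plausible reading of "variance smoothing zeta" — an assumption about the update, not taken from the TORCS source — is an exponentially smoothed estimate of the squared TD error, maintained per action as vQ and used when sampling actions by confidence:

```cpp
#include <cassert>
#include <cmath>

// Exponential smoothing of the squared TD error as a variance
// estimate: vQ <- (1 - zeta) * vQ + zeta * delta^2.
// Small zeta means slow, stable variance tracking.
double smooth_variance(double vq, double delta, double zeta) {
    return (1.0 - zeta) * vq + zeta * delta * delta;
}
```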


Member Data Documentation

◆ delta_vector

real* ANN_Policy::delta_vector
protected

Scratch vector for TD error.

Definition at line 43 of file ann_policy.h.

◆ eligibility

bool ANN_Policy::eligibility
protected

eligibility option

Definition at line 44 of file ann_policy.h.

◆ J

ANN* ANN_Policy::J
protected

Evaluation network.

Definition at line 38 of file ann_policy.h.

◆ J_ps_pa

real ANN_Policy::J_ps_pa
protected

Evaluation of last action.

Definition at line 42 of file ann_policy.h.

◆ Ja

ANN** ANN_Policy::Ja
protected

Evaluation networks (for separate_actions case)

Definition at line 39 of file ann_policy.h.

◆ JQs

real* ANN_Policy::JQs
protected

Placeholder for evaluation vector (separate_actions)

Definition at line 41 of file ann_policy.h.

◆ ps

real* ANN_Policy::ps
protected

Previous state vector.

Definition at line 40 of file ann_policy.h.

◆ separate_actions

bool ANN_Policy::separate_actions
protected

Single/separate evaluation option.

Definition at line 45 of file ann_policy.h.


The documentation for this class was generated from the following files:

ann_policy.h
ann_policy.cpp