bool forced_learning
Force agent to take supplied action.
int softMax(real *Qs)
Softmax Gibbs sampling.
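The softmax (Gibbs/Boltzmann) rule draws each action with probability proportional to the exponential of its value. A minimal self-contained sketch of the idea follows; the temperature parameter `tau` and the use of `std::rand` are illustrative assumptions, not part of the class interface above.

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

// Illustrative softmax (Gibbs) action sampling: P(a) ~ exp(Q[a] / tau).
// 'tau' (temperature) controls exploration: high tau -> near-uniform,
// low tau -> near-greedy. Not the class's actual implementation.
int softmaxSample(const std::vector<double>& Q, double tau) {
    std::vector<double> p(Q.size());
    double sum = 0.0;
    for (std::size_t a = 0; a < Q.size(); ++a) {
        p[a] = std::exp(Q[a] / tau);   // unnormalised Gibbs weight
        sum += p[a];
    }
    // Draw a uniform variate in [0, sum) and walk the cumulative weights.
    double u = sum * (std::rand() / (RAND_MAX + 1.0));
    double acc = 0.0;
    for (std::size_t a = 0; a < Q.size(); ++a) {
        acc += p[a];
        if (u < acc) return static_cast<int>(a);
    }
    return static_cast<int>(Q.size()) - 1;  // guard against rounding
}
```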
virtual void setPursuit(bool pursuit)
Use Pursuit for action selection.
bool confidence_uses_gibbs
Additional Gibbs sampling for confidence.
bool confidence
Confidence estimates option.
real tdError
temporal difference error
real * sample
sampling output
real ** P
pursuit action probabilities
virtual ~DiscretePolicy()
Kill the agent and free everything.
int max_el_state
max state ID to search for eligibility
real * eval
evaluation of current action
LearningMethod
Types of learning methods.
virtual void setSarsa()
Set the algorithm to SARSA mode.
int eGreedy(real *Qs)
e-greedy sampling
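Epsilon-greedy selection picks a uniformly random action with probability epsilon and the greedy (maximum-value) action otherwise. A minimal sketch, again using `std::rand` as an assumed RNG rather than the class's own machinery:

```cpp
#include <cstdlib>
#include <vector>

// Illustrative e-greedy sampling: explore with probability 'epsilon',
// otherwise return the argmax of Q. Not the class's actual code.
int eGreedySample(const std::vector<double>& Q, double epsilon) {
    double u = std::rand() / (RAND_MAX + 1.0);
    if (u < epsilon)
        return std::rand() % static_cast<int>(Q.size());  // explore
    int best = 0;                                         // exploit
    for (std::size_t a = 1; a < Q.size(); ++a)
        if (Q[a] > Q[best]) best = static_cast<int>(a);
    return best;
}
```

With `epsilon = 0` this reduces to the pure greedy rule provided by `argMax`.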
virtual void saveFile(char *f)
Save policy to a file.
bool pursuit
pursuit option
virtual void Reset()
Use at the end of every episode, after agent has entered the absorbing state.
int n_actions
number of actions
real expected_r
Expected reward.
real ** vQ
variance estimate for Q
virtual real getTDError()
Get the temporal difference error of the previous action.
bool replacing_traces
Replacing instead of accumulating traces.
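The difference between the two trace styles can be shown in a small sketch of the tabular eligibility update. The decay-then-mark structure below is a standard formulation under assumed member layouts (`e[s][a]`), not the class's private code:

```cpp
#include <vector>

// Illustrative eligibility-trace update. All traces decay by
// gamma * lambda each step; the visited pair is then either set to 1
// (replacing traces) or incremented by 1 (accumulating traces).
void updateTrace(std::vector<std::vector<double>>& e,
                 int s, int a, double gamma, double lambda,
                 bool replacing) {
    for (auto& row : e)
        for (auto& x : row)
            x *= gamma * lambda;       // decay every trace
    if (replacing)
        e[s][a] = 1.0;                 // replacing traces
    else
        e[s][a] += 1.0;                // accumulating traces
}
```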
virtual void setELearning()
Set the algorithm to ELearning mode.
int n_samples
number of samples used for the expected reward and expected state return estimates above
virtual void setLearningRate(real alpha)
Set the learning rate.
virtual void useSoftmax(bool softmax)
Set action selection to softmax.
virtual void setForcedLearning(bool forced)
Set forced learning (force-feed actions).
virtual void loadFile(char *f)
Load policy from a file.
enum ConfidenceDistribution confidence_distribution
Distribution to use for confidence sampling.
int n_states
number of states
virtual real getLastActionValue()
Get the value of the last action taken.
enum LearningMethod learning_method
learning method to use
virtual void useGibbsConfidence(bool gibbs)
Add Gibbs sampling for confidences.
virtual void setGamma(real gamma)
Set the gamma of the sum to be maximised.
ConfidenceDistribution
Types of confidence distributions.
int confMax(real *Qs, real *vQs, real p=1.0)
Confidence-based Gibbs sampling.
virtual void setConfidenceDistribution(enum ConfidenceDistribution cd)
Set the distribution for direct action sampling.
bool confidence_eligibility
Apply eligibility traces to confidence.
virtual void setQLearning()
Set the algorithm to QLearning mode.
int argMax(real *Qs)
Get ID of maximum action.
real lambda
Eligibility trace decay.
virtual void setRandomness(real epsilon)
Set randomness for action selection. Does not affect confidence mode.
virtual void useReliabilityEstimate(bool ri)
Use the reliability estimate method for action selection.
real gamma
Future discount parameter.
virtual bool useConfidenceEstimates(bool confidence, real zeta=0.01, bool confidence_eligibility=false)
Set to use confidence estimates for action selection, with variance smoothing zeta.
int confSample(real *Qs, real *vQs)
Directly sample from action value distribution.
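One common way to sample directly from a per-action value distribution is to draw one value per action from a Gaussian with mean `Q[a]` and variance `vQ[a]` and take the argmax (a Thompson-sampling-style rule). The Gaussian assumption and the `<random>` machinery below are illustrative; the class may use a different distribution (see `setConfidenceDistribution`):

```cpp
#include <cmath>
#include <random>
#include <vector>

// Illustrative direct sampling from an assumed Gaussian action-value
// distribution: draw v_a ~ N(Q[a], vQ[a]) for each action, return the
// argmax. Not necessarily the distribution the class uses.
int confidenceSample(const std::vector<double>& Q,
                     const std::vector<double>& vQ,
                     std::mt19937& rng) {
    int best = 0;
    double bestVal = -1e300;
    for (std::size_t a = 0; a < Q.size(); ++a) {
        std::normal_distribution<double> d(Q[a], std::sqrt(vQ[a]));
        double v = d(rng);             // one draw per action
        if (v > bestVal) { bestVal = v; best = static_cast<int>(a); }
    }
    return best;
}
```

When the variance estimates `vQ` shrink, this rule converges toward the greedy choice; large variances induce exploration.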
real ** e
eligibility trace
virtual int SelectAction(int s, real r, int forced_a=-1)
Select an action a, given state s and reward from previous action.
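Internally, a call to `SelectAction(s, r)` in Q-learning mode performs a temporal-difference update on the previously taken state-action pair before choosing the next action. A minimal sketch of that update, with illustrative names rather than the class's private members:

```cpp
#include <vector>

// Illustrative tabular Q-learning update for the transition
// (s_prev, a_prev) --r--> s. Returns the TD error, which the class
// exposes via getTDError(). Names are assumptions for illustration.
double qLearningUpdate(std::vector<std::vector<double>>& Q,
                       int s_prev, int a_prev, double r, int s,
                       double alpha, double gamma) {
    double maxQ = Q[s][0];
    for (double q : Q[s])
        if (q > maxQ) maxQ = q;                        // max_a' Q(s, a')
    double td = r + gamma * maxQ - Q[s_prev][a_prev];  // TD error
    Q[s_prev][a_prev] += alpha * td;                   // learning step
    return td;
}
```

In SARSA mode (`setSarsa`) the `maxQ` term is replaced by the value of the action actually selected next.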
real expected_V
Expected state return.
Discrete policies with reinforcement learning.
real ** Q
state-action evaluation
virtual void saveState(FILE *f)
Save the current evaluations in text format to a file.
int min_el_state
min state ID to search for eligibility
DiscretePolicy(int n_states, int n_actions, real alpha=0.1, real gamma=0.8, real lambda=0.8, bool softmax=false, real randomness=0.1, real init_eval=0.0)
Create a new discrete policy.
bool reliability_estimate
reliability estimates option
virtual void setReplacingTraces(bool replacing)
Use replacing instead of accumulating eligibility traces.
real zeta
Confidence smoothing.