Fitted Value Iteration
- Python Automation and Machine Learning for ICs -
- An Online Book -



=================================================================================

Fitted Value Iteration (VI) is a term used in approximate dynamic programming and reinforcement learning (RL). In RL, fitted VI provides an approximation to the optimal value function V*. It is a method that combines elements of dynamic programming and function approximation to estimate the value function iteratively:

i) Value Iteration: 

     In traditional value iteration, you iteratively update the value function for each state using the Bellman equation until convergence. This method is exact but can be computationally expensive for large state spaces; a minimal tabular sketch is given after this list.

ii) Function Approximation: 

     In many real-world problems with large state spaces, it's infeasible to store the exact values for every state. Instead, function approximation techniques, such as using parameterized function approximators like neural networks or linear functions, are employed. 

iii) Fitted Value Iteration: 

     Fitted Value Iteration combines value iteration with function approximation. Instead of exactly updating the values for all states, it uses a function approximator to estimate the value function. The term "fitted" indicates that the function approximator is fitted to the observed data.

     The fitted value iteration process involves collecting samples of state transitions and rewards from the environment, using these samples to update the function approximator parameters, and iteratively refining the approximation. 

iv) Challenges and Considerations: 

     Fitted value iteration introduces challenges related to generalization and stability. The function approximator needs to generalize well to unseen states, and care must be taken to ensure stable and reliable learning.
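
As a reference for the value iteration step in i), the following is a minimal sketch of tabular value iteration on a hypothetical three-state, two-action MDP; the transition probabilities P, rewards R, and discount factor below are illustrative values only and are not taken from the text above.

import numpy as np

# Hypothetical toy MDP: P[s, a, s'] is the transition probability, R[s, a] the reward.
n_states, n_actions = 3, 2
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0]],
    [[0.0, 0.6, 0.4], [0.3, 0.0, 0.7]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],   # state 2 is absorbing
])
R = np.array([
    [0.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
])
gamma = 0.9                                # discount factor

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman optimality backup: V(s) = max_a [R(s, a) + gamma * sum_s' P(s'|s, a) V(s')]
    Q_table = R + gamma * (P @ V)          # shape (n_states, n_actions)
    V_new = Q_table.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:   # stop when the backup has converged
        V = V_new
        break
    V = V_new

print("Converged state values:", V)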

Fitted Value Iteration in RL typically involves using function approximation to represent the value function, and it underlies several related algorithms (for example, fitted Q-iteration and deep Q-learning). One common approach is to use a parameterized function, often represented by a neural network or another type of model, to approximate the value function. The general idea is to update the parameters of this function approximator based on observed samples, incorporating elements of value iteration.
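
As a minimal illustration of such a parameterized approximator (a sketch, not the book's implementation), the snippet below defines a linear model Q(s, a; θ) = θᵀφ(s, a) with one-hot (state, action) features; the feature map and the state/action counts are assumptions made only for this example.

import numpy as np

n_states, n_actions = 3, 2

def phi(s, a):
    # One-hot feature vector phi(s, a); any richer feature map could be used instead.
    x = np.zeros(n_states * n_actions)
    x[s * n_actions + a] = 1.0
    return x

theta = np.zeros(n_states * n_actions)     # parameters of the function approximator

def Q(s, a, theta):
    # Linear approximation: Q(s, a; theta) = theta^T phi(s, a)
    return theta @ phi(s, a)

print(Q(1, 0, theta))                      # 0.0 before any parameters are learned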

In fitted value iteration, the value iteration update rule is modified to incorporate function approximation:

i)  Approximate value function: 

     Represent the value function using a parameterized function, denoted as Q(s, a; θ), where θ are the parameters of the function approximator.

ii)  Bellman update:

     Update the parameters θ based on a sample of transitions (s, a, r, s′) using a loss function that reflects the Bellman equation; a gradient-step sketch is given after this list.

           θ ← θ + α [r + γ max_a′ Q(s′, a′; θ) − Q(s, a; θ)] ∇_θ Q(s, a; θ) ------------------------------- [3669a]

where,

          α is the learning rate.

          r is the immediate reward.

          γ is the discount factor.

          s′ is the next state.

          ∇_θ Q(s, a; θ) is the gradient of the approximated value function with respect to its parameters θ.

iii) Repeat: 

         Collect more samples and repeat the update process iteratively.  
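
Putting the steps together, the sketch below applies an update of the form in Equation [3669a] with a linear approximator like the one above (redefined here so the example is self-contained); the toy sampler sample_transition() and the hyperparameter values are hypothetical placeholders, not part of the original text.

import numpy as np

n_states, n_actions = 3, 2
gamma, alpha = 0.9, 0.1                    # discount factor and learning rate

def phi(s, a):
    # One-hot (state, action) features; for a linear Q, this is also grad_theta Q(s, a; theta).
    x = np.zeros(n_states * n_actions)
    x[s * n_actions + a] = 1.0
    return x

def Q(s, a, theta):
    return theta @ phi(s, a)

rng = np.random.default_rng(0)

def sample_transition():
    # Stand-in for interacting with a real environment: returns one sample (s, a, r, s').
    s = int(rng.integers(n_states))
    a = int(rng.integers(n_actions))
    r = 1.0 if (s == 1 and a == 1) else 0.0
    s_next = int(rng.integers(n_states))
    return s, a, r, s_next

theta = np.zeros(n_states * n_actions)
for _ in range(5000):
    s, a, r, s_next = sample_transition()
    # TD target from the Bellman equation: r + gamma * max_a' Q(s', a'; theta)
    target = r + gamma * max(Q(s_next, a_next, theta) for a_next in range(n_actions))
    td_error = target - Q(s, a, theta)
    theta += alpha * td_error * phi(s, a)  # gradient step on the squared Bellman error

print("Learned parameters theta:", theta.round(3))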

Table 3669. Applications of fitted value iteration.

Applications                Details
Reinforcement Learning      page4321

============================================

=================================================================================