Model types, Algorithms and approaches, Function approximation, Deep reinforcement-learning, Deep Multi-agent Reinforcement-learning

Hello, I am Nitsan Soffair, a Deep RL researcher at BGU.

In my Deep reinforcement-learning course, you will gain up-to-date, state-of-the-art knowledge of Deep reinforcement-learning.

You will do the following:

1. Get state-of-the-art knowledge regarding:

   1. Model types

   2. Algorithms and approaches

   3. Function approximation

   4. Deep reinforcement-learning

   5. Deep Multi-agent Reinforcement-learning

2. Validate your knowledge by answering short and very short quizzes for each lecture.

3. Be able to complete the course in about 2 hours.

Syllabus

1. Model types

1. Markov decision process (MDP)

A discrete-time stochastic control process.
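To make the definition concrete, here is a minimal sketch of a finite MDP, assuming a tabular representation in which P[s][a] is a list of (probability, next_state, reward) triples. The class name TabularMDP, the method step, and the two-state example are hypothetical, chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class TabularMDP:
    """A finite MDP with states 0..n_states-1 and actions 0..n_actions-1.

    P[s][a] is a list of (probability, next_state, reward) triples that
    describes the stochastic transition from state s under action a.
    """
    n_states: int
    n_actions: int
    P: list
    gamma: float = 0.9  # discount factor; part of the MDP tuple, unused by step()

    def step(self, s, a, rng=random):
        """Sample (next_state, reward) for taking action a in state s."""
        outcomes = self.P[s][a]
        weights = [p for p, _, _ in outcomes]
        _, s_next, r = rng.choices(outcomes, weights=weights, k=1)[0]
        return s_next, r


# Hypothetical two-state example: in state 0, action 1 reaches state 1
# with probability 0.8 and pays reward 1; every other transition
# returns to state 0 with reward 0.
mdp = TabularMDP(
    n_states=2,
    n_actions=2,
    P=[
        [[(1.0, 0, 0.0)], [(0.8, 1, 1.0), (0.2, 0, 0.0)]],
        [[(1.0, 0, 0.0)], [(1.0, 0, 0.0)]],
    ],
)
```

Each call to step advances the process by one discrete time step, and the agent's choice of action a is the "control".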

2. Partially observable Markov decision process (POMDP)

A generalization of an MDP in which the agent cannot directly observe the underlying state; instead, it receives observations that depend on the state.
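Because the state is hidden, a POMDP agent typically keeps a belief (a probability distribution over states) and updates it with Bayes' rule after each action a and observation o: b'(s') is proportional to O(o | s', a) * sum over s of T(s' | s, a) * b(s). The sketch below assumes small tabular models T[a][s][s2] and O[a][s2][o]; the function name update_belief and the "listen" example numbers are hypothetical.

```python
def update_belief(belief, a, o, T, O):
    """Bayes-filter update of a belief over hidden states.

    belief[s]   : current probability of being in state s
    T[a][s][s2] : probability of moving from s to s2 under action a
    O[a][s2][o] : probability of observing o in s2 after action a
    """
    n = len(belief)
    # Predict: push the current belief through the transition model.
    predicted = [sum(T[a][s][s2] * belief[s] for s in range(n)) for s2 in range(n)]
    # Correct: weight each state by how likely it makes the observation.
    unnormalized = [O[a][s2][o] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]


# Hypothetical example: a single "listen" action (a = 0) that leaves the
# hidden state unchanged and reports the true state with 85% accuracy.
T = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[[0.85, 0.15], [0.15, 0.85]]]
b = update_belief([0.5, 0.5], a=0, o=0, T=T, O=O)
```

Starting from a uniform belief, one observation of o = 0 moves the belief to [0.85, 0.15]; repeating the same observation sharpens it further.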

3. Decentralized Partially observable Markov decision process (Dec-POMDP)

A generalization of a POMDP to multiple decentralized agents, each acting on its own local observations.

2. Algorithms and approaches

1. Bellman equations

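A necessary condition for optimality associated with dynamic programming: the value of a state equals the expected immediate reward plus the discounted value of the successor state.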
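For reference, the Bellman optimality equations for a finite MDP can be written as follows, assuming transition probabilities P(s' | s, a), rewards R(s, a, s'), and a discount factor gamma in [0, 1) (this notation is chosen here for illustration):

```latex
V^*(s) = \max_{a} \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^*(s') \right]

Q^*(s, a) = \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma \max_{a'} Q^*(s', a') \right]
```

Value iteration turns the first equation into an update rule, and Q-learning uses sampled versions of the second.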

2. Model-free

A model-free algorithm is an algorithm that does not use the transition probability distribution (or the reward function) of the MDP; it learns directly from sampled experience.

3. Off-policy

An off-policy algorithm is an algorithm that learns about one policy (the target policy) while acting in the environment with a different policy (the behavior policy).

4. Exploration-exploitation

A trade-off in Reinforcement-learning between exploring new actions or policies, which may turn out better, and exploiting the existing policy, which is currently believed to be best.
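A common way to manage this trade-off is epsilon-greedy action selection: with probability epsilon the agent explores by picking a random action, otherwise it exploits the action with the highest current value estimate. A minimal sketch, with epsilon_greedy as a hypothetical helper name:

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index given a list of estimated action values.

    With probability epsilon, explore (uniformly random action);
    otherwise exploit (an action with the highest estimate).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties randomly so equally good actions all get tried.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])
```

Setting epsilon = 0 gives a purely greedy (exploiting) policy, while epsilon = 1 explores uniformly at random.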

5. Value-iteration

An iterative algorithm that repeatedly applies the Bellman optimality backup to every state until the value function converges.
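A minimal tabular sketch of value iteration, assuming the MDP is given as P[s][a], a list of (probability, next_state, reward) triples. The function name, the tolerance, and the two-state example are hypothetical. This version updates V in place (each state's new value is used immediately by later states in the same sweep), which also converges for a discount below 1.

```python
def value_iteration(P, gamma=0.9, tol=1e-10):
    """Value iteration for a finite MDP.

    P[s][a] is a list of (probability, next_state, reward) triples.
    Returns the converged state values and a greedy policy.
    """
    n_states = len(P)
    V = [0.0] * n_states

    def backup(s, a):
        # One-step lookahead: expected reward plus discounted next-state value.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup: best one-step lookahead over actions.
            new_v = max(backup(s, a) for a in range(len(P[s])))
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            break
    policy = [max(range(len(P[s])), key=lambda a: backup(s, a)) for s in range(n_states)]
    return V, policy


# Hypothetical MDP: in state 0, action 0 stays put (reward 0) and action 1
# moves to state 1 (reward 1); from state 1, both actions return to state 0.
P = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],
    [[(1.0, 0, 0.0)], [(1.0, 0, 0.0)]],
]
V, policy = value_iteration(P)
```

Solving by hand, V(1) = 0.9 V(0) and V(0) = 1 + 0.9 V(1), so V(0) = 1 / 0.19 (about 5.263), and the greedy action in state 0 is action 1.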

6. SARSA

An on-policy algorithm for learning a Markov decision process policy.
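SARSA moves Q(s, a) toward r + gamma * Q(s', a'), where a' is the action the agent actually takes next; the name comes from the tuple (s, a, r, s', a'). A minimal tabular sketch of one update, assuming Q is a list of per-state lists; the function name and the step size alpha and discount gamma values are hypothetical.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9, done=False):
    """Apply one SARSA update to the tabular action-value table Q in place.

    The target uses Q[s_next][a_next], the value of the action the
    current policy actually selected in the next state (on-policy).
    """
    target = r if done else r + gamma * Q[s_next][a_next]
    td_error = target - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error


Q = [[0.0, 0.0], [0.5, 2.0]]
# The next action was 0 even though action 1 has the higher estimate
# (for example, because an epsilon-greedy policy explored).
sarsa_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=0.9)
```

The target here is 1 + 0.9 * 0.5 = 1.45, so Q[0][1] becomes 0.725; Q-learning would instead bootstrap from the maximum, 2.0.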

7. Q-learning

A model-free reinforcement learning algorithm to learn the value of an action in a particular state.
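Q-learning moves Q(s, a) toward r + gamma * max over a' of Q(s', a'), bootstrapping from the greedy action no matter which action is taken next, which makes it off-policy. The sketch below is a minimal tabular version on a hypothetical four-state corridor (move left or right; reward 1 for reaching the right end). To highlight the off-policy property, it acts with a uniformly random behavior policy yet learns the greedy target policy. The environment, hyperparameters, and names are assumptions for illustration.

```python
import random

N_STATES = 4          # states 0..3; state 3 is terminal
LEFT, RIGHT = 0, 1


def corridor_step(s, a):
    """Deterministic corridor: returns (next_state, reward, done)."""
    s_next = max(0, s - 1) if a == LEFT else s + 1
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice([LEFT, RIGHT])          # behavior policy: uniform random
            s_next, r, done = corridor_step(s, a)
            # Off-policy target: value of the greedy action in s_next.
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q


Q = q_learning()
greedy_policy = [max((LEFT, RIGHT), key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
```

Even though the agent never acted greedily, the learned greedy policy moves right in every non-terminal state, and Q(0, RIGHT) approaches 0.9 ** 2 = 0.81.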

3. Function approximation

1. Function approximators

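A function approximation problem asks us to select a function from a well-defined class that closely matches ("approximates") a target function in a task-specific way; in reinforcement learning, the target is typically a value function or a policy.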
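As a minimal sketch, assume a linear approximator v_hat(s) = w . x(s) over hand-chosen feature vectors x(s), trained with the semi-gradient TD(0) update w <- w + alpha * (r + gamma * v_hat(s') - v_hat(s)) * x(s). The feature vectors and function names below are hypothetical.

```python
def v_hat(w, x):
    """Linear value estimate: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def td0_update(w, x, r, x_next, alpha=0.1, gamma=0.9, done=False):
    """Semi-gradient TD(0): the gradient of w . x with respect to w is x."""
    target = r if done else r + gamma * v_hat(w, x_next)
    td_error = target - v_hat(w, x)
    return [wi + alpha * td_error * xi for wi, xi in zip(w, x)]


w = [0.0, 0.0]
x_s, x_next = [1.0, 0.5], [0.0, 1.0]   # hypothetical feature vectors
w = td0_update(w, x_s, r=1.0, x_next=x_next, alpha=0.1)
```

After one update, w is [0.1, 0.05]. Because the two states share the second feature, the estimate for the next state also moves from 0 to 0.05 even though it was not the state being updated; this generalization across states is the point of function approximation.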

2. Value-based, policy-based, actor-critic, policy-gradient, and softmax policy

3. REINFORCE
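REINFORCE is a Monte Carlo policy-gradient method: after an episode it raises the log-probability of each action taken in proportion to the return G that followed, theta <- theta + alpha * G * grad log pi(a | s; theta). Below is a minimal sketch with a softmax policy on a hypothetical two-armed bandit (one-step episodes, so G is just the reward). For softmax preferences, the gradient of log pi(a) with respect to theta_k is 1[k == a] - pi(k). The arm rewards, step size, and function names are assumptions for illustration.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_rewards=(0.0, 1.0), episodes=1000, alpha=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * len(arm_rewards)          # softmax policy parameters
    for _ in range(episodes):
        probs = softmax(theta)
        a = rng.choices(range(len(theta)), weights=probs, k=1)[0]
        G = arm_rewards[a]                    # one-step episode: return = reward
        for k in range(len(theta)):
            # Gradient of log softmax: 1[k == a] - pi(k)
            grad_log_pi = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += alpha * G * grad_log_pi
    return softmax(theta)


probs = reinforce_bandit()
```

The policy learns to pick the rewarding arm with high probability. Practical implementations usually subtract a baseline from G to reduce variance; this sketch omits it.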

4. Deep reinforcement-learning

1. Deep Q-Network (DQN)

A deep reinforcement-learning algorithm using experience replay and fixed Q-targets.
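The two ideas named above can be sketched without a deep-learning framework. A hypothetical ReplayBuffer stores transitions and samples random minibatches (experience replay), and a hypothetical dqn_targets computes the regression targets y = r + gamma * (1 - done) * max over a' of Q_target(s', a') from a separate, frozen target network (fixed Q-targets). The networks are stood in for by plain callables; in a real DQN they are neural networks and the online network is trained on the squared error (y - Q(s, a))^2.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest transitions drop out first

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling breaks the correlation between consecutive steps.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def dqn_targets(batch, q_target, gamma=0.99):
    """TD targets computed with the frozen target network q_target.

    q_target(s) must return a list of action values for state s.
    """
    return [r + gamma * (0.0 if done else max(q_target(s_next)))
            for _, _, r, s_next, done in batch]
```

In the full algorithm, the target network's weights are copied from the online network every fixed number of steps, so the targets stay fixed in between.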

2. Deep Recurrent Q-Learning (DRQN)

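A deep reinforcement-learning algorithm for POMDPs that extends DQN with an LSTM layer, so that its Q-values depend on the history of observations rather than on a single observation.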

3. Optimistic Exploration with Pessimistic Initialization (OPIQ)

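A deep reinforcement-learning algorithm for MDPs, based on DQN, that keeps exploring optimistically by adding count-based bonuses to pessimistically initialized Q-values.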
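As we understand the OPIQ paper, its key ingredient is a count-based optimistic bonus added to the Q-values: Q+(s, a) = Q(s, a) + C / (N(s, a) + 1) ** M, where N(s, a) is a visit count and C and M are hyperparameters; the paper applies this optimism both when selecting actions and when bootstrapping. The tabular sketch below shows only action selection; the function names, hyperparameter values, and example counts are hypothetical.

```python
def optimistic_q(q, n, C=1.0, M=2.0):
    """Q+(s, a) = Q(s, a) + C / (N(s, a) + 1) ** M for each action a."""
    return [qa + C / (na + 1) ** M for qa, na in zip(q, n)]


def select_action(q, n, C=1.0, M=2.0):
    """Greedy action with respect to the optimistic values Q+."""
    q_plus = optimistic_q(q, n, C, M)
    return max(range(len(q_plus)), key=q_plus.__getitem__)


# Hypothetical values: action 0 looks better but has been tried 10 times,
# while action 1 has never been tried, so its bonus dominates.
q_values, counts = [0.5, 0.0], [10, 0]
```

Here Q+ is about [0.508, 1.0], so the untried action is chosen even though its (pessimistic) estimate is lower. As N(s, a) grows, the bonus vanishes and the learned Q-values take over.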

4. Value Decomposition Networks (VDN)

A multi-agent deep reinforcement-learning algorithm for Dec-POMDPs that represents the joint action-value as a sum of per-agent action-values.
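VDN assumes the joint action-value decomposes additively, Q_tot(tau, u) = sum over agents i of Q_i(tau_i, u_i), where each Q_i depends only on agent i's own action-observation history. Because a sum is maximized by maximizing each term separately, every agent can act greedily on its own Q_i during decentralized execution. A minimal sketch with hypothetical per-agent values and a hypothetical helper vdn_q_tot, checking that the decentralized choice matches a brute-force centralized search:

```python
from itertools import product


def vdn_q_tot(per_agent_q, joint_action):
    """Q_tot for a joint action: sum of each agent's Q for its own action."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


# Hypothetical utilities for 2 agents with 3 actions each.
per_agent_q = [[1.0, 3.0, 2.0], [0.5, -1.0, 4.0]]

# Decentralized: each agent maximizes its own Q_i.
decentralized = tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)

# Centralized: brute-force search over all joint actions.
centralized = max(product(*(range(len(q)) for q in per_agent_q)),
                  key=lambda u: vdn_q_tot(per_agent_q, u))
```

Both searches return the joint action (1, 2) with Q_tot = 7.0, but the decentralized one scales linearly with the number of agents instead of exponentially.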

5. QMIX

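A multi-agent deep reinforcement-learning algorithm for Dec-POMDPs that combines per-agent action-values with a monotonic mixing network conditioned on the global state.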
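QMIX replaces VDN's sum with a mixing network whose weights are generated from the global state by hypernetworks and kept non-negative, so Q_tot never decreases when any agent's Q_i increases, and per-agent greedy actions stay consistent with the joint greedy action. The sketch below shows only the monotonic mixing step in plain Python; the raw weights W1, b1, w2, b2 stand in for hypernetwork outputs and are hypothetical, and taking absolute values enforces non-negativity.

```python
import math


def elu(x):
    """Exponential linear unit; monotonically increasing."""
    return x if x > 0 else math.exp(x) - 1.0


def qmix_mix(agent_qs, W1, b1, w2, b2):
    """Monotonic mixing: Q_tot = |w2| . elu(|W1|^T q + b1) + b2.

    W1[i][j] connects agent i to hidden unit j. The absolute values keep
    all mixing weights non-negative, so Q_tot is monotonic in each Q_i.
    """
    n_hidden = len(b1)
    hidden = [
        elu(sum(abs(W1[i][j]) * q for i, q in enumerate(agent_qs)) + b1[j])
        for j in range(n_hidden)
    ]
    return sum(abs(w2[j]) * hidden[j] for j in range(n_hidden)) + b2


# Hypothetical raw (possibly negative) hypernetwork outputs: 2 agents, 2 hidden units.
W1 = [[0.5, -1.0], [-0.3, 0.8]]
b1 = [0.1, -0.2]
w2 = [-1.2, 0.7]
b2 = 0.05
```

Using the raw weights without abs would let Q_tot decrease in some Q_i, which can break the agreement between decentralized and centralized greedy action selection.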

6. QTRAN

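A multi-agent deep reinforcement-learning algorithm for Dec-POMDPs that transforms the joint action-value into an easily factorizable one with the same optimal actions, avoiding the additivity and monotonicity constraints of VDN and QMIX.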

7. Weighted QMIX

A deep multi-agent reinforcement-learning algorithm for Dec-POMDPs that extends QMIX by weighting its training loss to emphasize better joint actions.
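As we understand the Weighted QMIX paper, it keeps QMIX's monotonic mixing but trains it with a weighted squared TD loss. In its optimistic weighting variant (OW-QMIX), a sample gets weight 1 when Q_tot underestimates the target y and a smaller constant alpha otherwise. A minimal sketch of that loss with hypothetical inputs; the paper computes the targets with an additional unrestricted joint action-value network, which this sketch leaves out.

```python
def ow_weighted_loss(q_tot, targets, alpha=0.1):
    """Optimistically weighted squared TD loss (OW-QMIX style).

    Samples where Q_tot underestimates the target get weight 1;
    all others are down-weighted to alpha (0 < alpha <= 1).
    """
    total = 0.0
    for q, y in zip(q_tot, targets):
        w = 1.0 if q < y else alpha
        total += w * (q - y) ** 2
    return total / len(q_tot)
```

For example, with Q_tot = [0.0, 2.0], targets [1.0, 1.0], and alpha = 0.1, the underestimate counts fully and the overestimate is down-weighted, giving (1 + 0.1) / 2 = 0.55. With alpha = 1 this reduces to the ordinary mean squared TD error.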

Resources

• Wikipedia

• David Silver’s Reinforcement-learning course

Modern reinforcement learning with deep learning

