For my MSc thesis, I studied the field of Inverse Reinforcement Learning, focusing on Bayesian methods and later extending them to multi-agent cases.

Reinforcement Learning is a field of Machine Learning that optimises the behaviour of agents in an environment by maximising a reward signal. Agents interact with the environment through actions and navigate between its states. In the simplest cases these problems are modelled as Markov Decision Processes (MDPs), and the environment's model is fully known: the reward function and the transition probabilities are given explicitly. A major challenge in applying these algorithms is that, in real environments, these quantities are often not known.
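To make the fully-known case concrete, here is a minimal value-iteration sketch on a toy MDP. Everything in it (the two states, two actions, and all the numbers) is invented for the example; it just shows what "the reward function and transition probabilities are given explicitly" means in practice:

```python
import numpy as np

# Toy MDP with an explicit model: two states (0, 1), two actions (0 = stay, 1 = move).
P = np.array([                      # P[s, a, s'] = transition probability
    [[1.0, 0.0], [0.2, 0.8]],
    [[0.0, 1.0], [0.9, 0.1]],
])
R = np.array([                      # R[s, a] = expected immediate reward
    [0.0, 1.0],
    [1.0, 0.0],
])
gamma = 0.9                         # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    """Repeated Bellman backups until the value function converges."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)     # Q[s, a] = R[s, a] + gamma * E[V(s')]
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

V, policy = value_iteration(P, R, gamma)
print(V, policy)                    # optimal state values and greedy policy
```

With the model in hand, solving for the optimal behaviour is mechanical; IRL, discussed next, deals with the situation where the reward part of this model is the unknown.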

Inverse Reinforcement Learning (IRL) tries to fill this gap. Instead of optimising an agent's behaviour, we start from behaviour that is already optimal (e.g. recordings of humans) and look for the reward function that would generate it. Take self-driving, for instance. The reward function would be extremely complicated, since it would have to be the composition of thousands of rules (stop at red lights, stay on the road, do not hit pedestrians or other cars, etc.), each perfectly balanced against the others. Writing this by hand would be *impossible*, and IRL algorithms attempt to do it for us.

Another application is modelling an agent to discover its preferences, i.e. what its “internal” reward function is, which can be used in marketing to suggest products.

Bayesian IRL applies a Bayesian approach to this problem: the reward function is treated as a random variable, and a posterior distribution over it is inferred from the observed behaviour. In my thesis the algorithms are also multi-agent, meaning the algorithm can distinguish different agents just by observing them and recover each agent's reward function.
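As a rough single-agent illustration of the idea (a sketch of the general approach, not the specific algorithms from the thesis), the posterior over rewards can be sampled with random-walk Metropolis, scoring demonstrations with a Boltzmann (softmax) action likelihood. The chain MDP, the demonstrations, and all parameters below are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state chain MDP: action 0 moves left, action 1 moves right
# (deterministically, clipped at the ends). All numbers are invented.
n_states, n_actions, gamma, beta = 3, 2, 0.9, 5.0
P = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    P[s, 0, max(s - 1, 0)] = 1.0
    P[s, 1, min(s + 1, n_states - 1)] = 1.0

def q_values(r, iters=200):
    """Q-values for a per-state reward vector r, by value iteration."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = r[:, None] + gamma * (P @ V)
        V = Q.max(axis=1)
    return Q

def log_likelihood(r, demos):
    """Boltzmann (softmax) likelihood of the demonstrated (state, action) pairs."""
    z = beta * q_values(r)
    z -= z.max(axis=1, keepdims=True)               # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return sum(logp[s, a] for s, a in demos)

# Demonstrations: from every state the "expert" moves right, towards state 2,
# so the posterior should assign higher reward to state 2 than to state 0.
demos = [(0, 1), (1, 1), (2, 1)]

# Random-walk Metropolis over r, with a standard-normal prior on each entry.
r = np.zeros(n_states)
ll = log_likelihood(r, demos)
samples = []
for _ in range(3000):
    r_new = r + 0.3 * rng.standard_normal(n_states)
    ll_new = log_likelihood(r_new, demos)
    log_alpha = (ll_new - ll) + 0.5 * (r @ r - r_new @ r_new)  # likelihood + prior
    if np.log(rng.random()) < log_alpha:
        r, ll = r_new, ll_new
    samples.append(r.copy())

posterior_mean = np.mean(samples[1000:], axis=0)    # discard burn-in
print(posterior_mean)
```

The multi-agent setting adds a layer on top of this: demonstrations come from several agents with different reward functions, and the inference has to attribute each trajectory to an agent while recovering every reward function.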