One of the most frustrating situations I encountered as a student and hobbyist in artificial intelligence was the lack of testing grounds. Oftentimes I would find myself with friends, all excited about some new TensorFlow feature, a new algorithm, or simply a new idea, and the biggest obstacle would be the lack of interesting or suitable environments to test it in. Sure, we could always code a simple sandbox game, but that would take time, probably lead to basic solutions, and on top of that we’d also have to write some sort of rendering, because where’s the fun in looking at numbers only?
Then, two weeks ago, I came across a podcast featuring Danny Lange (VP of AI at Unity), in which he described a framework they had created in-house that provides everything needed to create environments and agents in Unity¹, along with a Python API. This piqued my curiosity; I found myself binge-watching his talks on a Sunday evening, and the next thing I knew, I was downloading Unity and all the necessary packages.
Now I had a piece of software and a library, both completely unknown to me, and as many programmers will know, the best way to learn a new skill is to pick a project and just do it. I had been thinking about self-driving cars for a long time, so I went ahead and started working on a basic environment to generate circuits and drive a car. The next two weeks were pretty much eat-sleep-work-Unity, with the occasional partying (a necessary evil), until I finally finished: it was time to teach my agent to drive!
Here is how it works:
This is an example of a circuit, generated using an open-source library, Path Creator.
The library generates a Bézier curve, which also provides a set of vertices along the path; these vertices are used to assign the rewards.
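As a rough sketch of the idea (all names, radii, and reward values here are my own illustration, not the project's actual code), rewarding the car each time it passes the next vertex along the generated curve could look something like this:

```python
# Hypothetical sketch: give a small reward whenever the car reaches
# the next vertex of the path, in order, so reward tracks progress.
import math


class WaypointReward:
    def __init__(self, vertices, pass_radius=1.0, reward_per_vertex=0.1):
        self.vertices = vertices            # ordered points along the Bezier curve
        self.pass_radius = pass_radius      # how close counts as "reached"
        self.reward_per_vertex = reward_per_vertex
        self.next_idx = 0                   # index of the next vertex to reach

    def step(self, car_position):
        """Return the reward for this step given the car's position."""
        if (self.next_idx < len(self.vertices)
                and math.dist(car_position, self.vertices[self.next_idx]) < self.pass_radius):
            self.next_idx += 1
            return self.reward_per_vertex
        return 0.0
```

Tracking only the *next* vertex (rather than the nearest one) keeps the reward tied to forward progress, so the agent can't farm reward by circling a single point.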
The car has only 4 inputs: its current speed, and the distances to the wall in the three directions shown by the green rays (-45, 0, and +45 degrees). The allowed actions are turn left, turn right, accelerate, and brake.
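In code, that state and action space is tiny. Here is a minimal sketch of how I think of it (the names are mine, not the actual Unity/ML-Agents code):

```python
# Illustrative sketch of the agent's interface: a 4-feature observation
# vector and a discrete set of 4 actions.
from enum import Enum


class Action(Enum):
    TURN_LEFT = 0
    TURN_RIGHT = 1
    ACCELERATE = 2
    BRAKE = 3


def make_observation(speed, ray_left_45, ray_front, ray_right_45):
    """Pack the agent's state: current speed plus the three wall
    distances measured along rays at -45, 0 and +45 degrees."""
    return [speed, ray_left_45, ray_front, ray_right_45]
```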
With all of that ready to go, I used Proximal Policy Optimization (PPO) to teach the car to drive. PPO trains a neural network to approximate the agent’s optimal policy, where a policy is essentially a function that says which action to take given the current state (the aforementioned 4 features). I let it run while dreaming of Nobel Prizes, and to my surprise I woke up to a car that could drive better than I can!
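To give a flavor of what PPO optimizes (this is a bare-bones sketch of the idea, not the ML-Agents implementation), the policy outputs action probabilities from the state, and updates maximize a *clipped* objective so each update can't move too far from the previous policy:

```python
# Sketch of the two core pieces of PPO: a policy mapping state to action
# probabilities, and the clipped surrogate objective used for updates.
import numpy as np


def softmax_policy(observation, weights):
    """Map the 4-feature state to a probability distribution
    over the 4 actions via a linear layer + softmax."""
    logits = weights @ np.asarray(observation)
    exp = np.exp(logits - logits.max())   # subtract max for numerical stability
    return exp / exp.sum()


def ppo_clipped_objective(new_probs, old_probs, advantages, eps=0.2):
    """L_CLIP = mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)),
    where r = pi_new(a|s) / pi_old(a|s)."""
    ratio = new_probs / old_probs
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))
```

The clipping is the key trick: even if the new policy assigns a much higher probability to an action with a positive advantage, the objective stops rewarding it beyond the `1 + eps` ratio, which keeps training stable.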
With that done, my basic project is over, but there’s a lot of room for improvement. I aim to improve the physics by using car models instead of a cuboid, possibly add random obstacles, and generate random validation circuits during training. Furthermore, the idea of curriculum learning is very interesting, so I’ll probably add that too at some point.
I hope you enjoyed this little experiment. If there’s anything else you think would be interesting to test, let me know, and if you want to know more about the tools used, here are a few resources:
- The video of Danny Lange that inspired me: https://www.youtube.com/watch?v=sRyZ-XwmgnE
- The ML Agents page and repository: https://unity3d.com/machine-learning
- Proximal Policy Optimization: https://openai.com/blog/openai-baselines-ppo/
¹ Unity is a 2D/3D game engine mostly used for creating games, but it is also commonly used to create simulation environments.