I am doing research for a project of mine and was greatly interested in this article from OpenAI and UC Berkeley. It is about one-shot learning for robots and, more generally, one-shot learning of policies (I knew one-shot learning was used in computer vision, but had no idea it was also applied in this kind of domain).
The idea is to make our computer generalize movements, so that it can learn a new move after seeing it only once. This kind of method is called meta-learning, because the machine "learns to learn" instead of learning one precise task. Here, we want the machine to learn how to build stacks of cubes, and to build the same exact stacks as the ones we demonstrated.
The tricky part is that we show the execution only once, and the blocks are not laid out the same way for the computer. To be more precise, say there are six blocks on a table, each with a different color or a different letter. The computer gets the same blocks with the same colors (or letters), but in different initial positions. This forces it to concentrate on generalizing the movement itself, rather than memorizing a transition from one set of coordinates to another.
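To make the setup concrete, here is a minimal sketch (in no way the paper's actual neural architecture — every name and dimension below is a made-up illustration) of what "one-shot" means at the interface level: the policy takes both a full demonstration and the current observation of the re-shuffled scene, and outputs an action.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_shot_policy(demo, observation, weights):
    """Toy demonstration-conditioned policy (hypothetical): summarize the
    demo into an embedding, concatenate it with the current observation,
    and map the result linearly to an action."""
    demo_embedding = demo.mean(axis=0)           # crude summary of the demo
    features = np.concatenate([demo_embedding, observation])
    return weights @ features                    # action vector

# A demo is a sequence of T observations; at test time the same policy
# sees a scene where the blocks start in new positions.
demo = rng.normal(size=(50, 12))     # 50 timesteps, 12-dim observations
observation = rng.normal(size=12)    # current (re-shuffled) scene
weights = rng.normal(size=(4, 24))   # maps 24-dim features to a 4-dim action
action = one_shot_policy(demo, observation, weights)
print(action.shape)  # (4,)
```

The point is only the signature: the demonstration is an input to the policy at test time, not something the policy is retrained on.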
As always, I will not go too deep into technical details, but what is interesting is that they tested different models: one using behavioral cloning (BC — in a word, feeding your demonstration data directly to the machine so it can train on it), and three using the DAgger algorithm (maybe a future article!). Of those three, one is conditioned on entire demonstrations, another only on the end result (it knows the initial situation, and we give it only the final frame, where all the stacks are done). The last one is a bit more complex: it receives a collection of snapshots, that is, the last frame of each stage of the demonstration trajectory (when you stack block B on block C, then when you put A on top of B, and so on). The drawback is that this adds the requirement of providing those snapshots even during the test phase, which the other models do not need.
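The three conditioning variants differ only in what slice of the demonstration they receive. A tiny sketch (toy data, hypothetical stage boundaries) makes the difference obvious:

```python
import numpy as np

demo = np.arange(60).reshape(10, 6)   # toy demo: 10 frames, 6-dim each
stage_ends = [3, 6, 9]                # hypothetical last frame of each stacking stage

full_trajectory = demo                # variant 1: condition on every frame
final_state_only = demo[-1:]          # variant 2: only the final frame
snapshots = demo[stage_ends]          # variant 3: last frame of each stage

print(full_trajectory.shape, final_state_only.shape, snapshots.shape)
# (10, 6) (1, 6) (3, 6)
```

Seen this way, the snapshot variant sits between the other two in how much information it gets, which is why it was surprising that full trajectories still won.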
The authors were surprised that the model conditioned on complete demonstrations outperformed even the one with the snapshot collections. However, the behavioral cloning method achieved about the same results as DAgger with complete demonstrations. The BC method needs some noise injected at the start, but requires less supervision during training, so it is a nice alternative.
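One common way to read "injecting noise" in a BC setting (a general sketch of the idea, not necessarily the paper's exact procedure) is to perturb the recorded states while keeping the expert actions, so the cloned policy also sees slightly off-trajectory states and learns to recover from them:

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_bc_dataset(expert_states, expert_actions, noise_scale=0.1):
    """Perturb expert states with Gaussian noise while keeping the
    corresponding actions, so the training distribution covers states
    slightly off the expert trajectory (a common BC trick)."""
    noisy_states = expert_states + rng.normal(scale=noise_scale,
                                              size=expert_states.shape)
    return noisy_states, expert_actions

# Toy usage: 100 recorded (state, action) pairs from the expert.
states = rng.normal(size=(100, 12))
actions = rng.normal(size=(100, 4))
noisy_states, kept_actions = noisy_bc_dataset(states, actions)
print(noisy_states.shape, kept_actions.shape)  # (100, 12) (100, 4)
```

Unlike DAgger, which queries the expert on the states the learner actually visits, this only needs the recorded demonstrations plus noise, which is the "less supervised" trade-off the post mentions.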
Independently of the method, however, the results were better for a small number of tasks, and the errors grow as the number of tasks (or their complexity) grows. This can be explained, at least partly, by the fact that with many stacking tasks, the arm more often hits an already-built stack while trying to build another one. A more obvious reason is that with more operations, the machine has more chances to miss one.
This subject is not that common, so there isn't as much literature as I would like. Nevertheless, here is the link to the paper: https://arxiv.org/pdf/1703.07326.pdf, and here is the article OpenAI wrote when combining One-Shot Imitation Learning and Domain Randomization (this is where the GIF is from): https://openai.com/blog/robots-that-learn/. Thanks for reading and see you next time!