Vladimir Steiner

Adapting a Robot’s Linguistic Style Based on Socially-Aware Reinforcement Learning

Updated: Oct 12, 2022

Hi everyone! Today we are back with a paper on understanding human interest in a conversation!


This article is about creating a robot whose linguistic style changes depending on its interlocutor. The goal is to adjust the robot's expressiveness to match what the user prefers. The authors describe that variation as leaning towards introversion or extraversion.

The robot describes Alice in Wonderland characters, giving the main facts about a character before asking the user which character they want to learn about next. During each character description, a microphone and a Microsoft Kinect 2 gather information about the user and extract social signals. The discussion is a pretext for estimating the user's level of engagement: the microphone analyses the user's answers, while the Kinect tracks their position, head movements, and so on. Engagement is estimated by a Bayesian network (see the simplified view below).


The engagement at time t, written E(t), is defined on the interval [-2, +2]. If E(t) is positive, the user is engaged in the conversation; if it is negative, they are uninterested or unhappy with it. During the test, the evolution of engagement is tracked through its variation dE = E(t) - E(t-1). If dE is positive, the listener is getting more interested, and vice versa. If the variation is equal to 0, the robot still gets a small reward, for managing to keep the user's interest.
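The reward scheme above can be sketched in a few lines. The small reward for keeping engagement stable matches the 0.5 value the paper reports for an unchanged behaviour; using dE directly as the reward otherwise is my assumption, not the paper's exact formula.

```python
def reward(e_prev, e_curr):
    """Reward from the change in engagement dE = E(t) - E(t-1).

    The 0.5 reward for stable engagement is taken from the paper;
    passing dE through directly otherwise is an assumption.
    """
    d_e = e_curr - e_prev
    if d_e == 0:
        return 0.5   # small reward for managing to keep the user's interest
    return d_e       # positive if engagement rose, negative if it dropped
```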

The other dimension of the state space is X, also defined on the interval [-2, 2]. It represents the expressiveness of the robot's linguistic style. At -2, the robot uses very few words and tempers its use of positive ones (talking about Alice, it would say she is "somewhat imaginative", for example). At +2, on the contrary, it is much more talkative and uses more positive words, even repeating them for emphasis. X is the dimension in which actions are represented: at each learning step, the agent can either modify X by +/- 1 or keep it unchanged. This keeps the variability from becoming excessive and prevents the user from noticing a drastic difference in the robot's speech from one step to the next.

The model follows the Q-learning algorithm with epsilon-greedy exploration. Here, epsilon = 0.2, which means that one time in five the action taken is random, helping the model explore new paths. During learning, epsilon usually decays until it is effectively 0. The learning sequence is composed of 30 learning steps, during which the simulated user has a specific preference for the robot's expressiveness. This preference changes twice, at steps 15 and 26, to see how quickly the robot manages to modify its behaviour to match what the user likes.
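For readers unfamiliar with the setup, here is a minimal sketch of tabular Q-learning with epsilon-greedy action selection. Epsilon = 0.2 comes from the paper; the learning rate and discount factor are placeholder values of my own, and I keep epsilon constant rather than decaying it.

```python
import random

ALPHA, GAMMA = 0.1, 0.9   # learning rate and discount: assumed, not from the paper
EPSILON = 0.2             # from the paper: one action in five is random
ACTIONS = (-1, 0, +1)     # change in expressiveness X

def choose_action(q, state):
    """Epsilon-greedy: explore with probability EPSILON, otherwise exploit."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))

def q_update(q, state, action, reward, next_state):
    """Standard tabular Q-learning update rule."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
```

In the paper's setting, the state would combine the engagement signal with the current expressiveness level, and the reward would come from the variation dE described earlier.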

We can see that without noise, the model is robust and quickly adapts to the two preference changes described above. Its reward quickly averages at 0.5, which is the reward given when the model does not change its behaviour. Even with noise, the reward is rarely negative, which means the model seldom changes its behaviour in the wrong direction.


This article shows that the robot seems able to capture users' linguistic preferences and to act accordingly. However, the action space is quite narrow, certainly not complex enough to capture every subtlety of a person's character. I also regret that the authors did not detail the workings of their simulated users, which would have helped us better understand the training process. Furthermore, what the authors describe as introversion and extraversion is quite simplistic; anyone wanting to dig deeper would need a more specific model. The experiment is still a really interesting and quite innovative one. I personally find it surprising not to see more papers on the subject of non-verbal communication.
