Adaptation

Our robot learns to track a specific target using supervisory feedback from a teacher. During training, the teacher watches the robot's video input and corrects its behavior whenever it makes a mistake. The teacher corrects the robot by indicating where the target is located in the visual field using a graphical user interface. This action prompts the learning algorithm to adjust the weights of the convolutional neural network in order to better track the object at that location.

The algorithm uses the supervisory signals to adjust the kernels W_Zand scalar weights c_Z of the neural network. The neural network is updated whenever the maximally salient location of the neural network (i_m,j_m) does not correspond to the desired object's true position (i_n,j_n) as identified by the teacher. Objective functions proportional to the sum squared error terms at the maximal location and at the new desired location are used for training the network:

e_m²	=	\|g_m-g[X^s_m(i_m,j_m)\|² ,	(5)
e_n²	=	$\displaystyle \min_s \vert g_n-g[X^{s}(i_n,j_n)\vert^2 .$	(6)

For each correction that is provided by the teacher, the algorithm tries to minimize the errors in these objective functions. First, the gradients of Eqs. 5-6 are computed. These gradient terms are then backpropagated through the convolutional network [Nowlan,LeCun], resulting in the following rules for adaptation:

$\displaystyle \Delta c_Z$	=	$\displaystyle \eta\,e_m g'(X_m)g[\bar{Z}(i_m,j_m)] +$
		$\displaystyle \eta\,e_n g'(X_n) g[\bar{Z}(i_n,j_n)] ,$	(7)
$\displaystyle \Delta W_Z$	=	$\displaystyle \eta\,e_m g'(X_m) g'(\bar{Z}_m) c_Z Z_m +$
		$\displaystyle \eta\,e_n g'(X_n) g'(\bar{Z}_n) c_Z Z_n .$	(8)

In typical applications of neural networks where a large set of the training examples can be considered simultaneously, the learning rate $\eta$ is set to some small positive number. However in our case, it is desirable for the robot to learn to track an object in a new environment as quickly as possible. Thus, rapid adaptation of the weights during even a single training example is needed. A natural way of doing this is to use a fairly large learning rate, and to repeatedly apply the update rules in Eqs. 7-8 until the calculated maximally salient location is very close to the actual desired position.