Tracking a Swinging Target with a Robot Manipulator using Visual Sensing

In this paper we develop a method for loading parts onto a swinging target using an industrial robot. The orientation of the target is estimated by a particle filter using camera images as measurements. Robust and accurate tracking is achieved by using an accurate dynamic model of the target. The dynamic model is also used to compensate for the time delay between the acquisition of images and the motion response of the robot. The target dynamics are modeled as a spherical pendulum. To ensure robust visual tracking, the position of the target's center of mass is estimated. The method is experimentally validated in a laboratory loading station with a swinging conveyor trolley as the target, a type of trolley commonly used in industry.


Introduction
Robot vision in industrial applications is typically used where work objects are static or move at constant velocity, such as when picking from a conveyor belt. A more demanding task is loading objects onto a swinging conveyor trolley, as illustrated in Figure 1. In the usual industrial solution, objects are loaded onto the trolleys manually because the trolleys swing freely; they are suspended this way to avoid excessive forces and accelerations. This paper presents a method to perform this task automatically, by real-time control of an industrial robot manipulator using an estimate of the trolley orientation computed from camera images with a particle filter.
The controller interfaces of industrial robots are designed to operate at fixed update rates, ranging from 125 Hz (Universal Robots) and 250 Hz (KUKA) up to the kHz level. Cameras used in computer vision applications, however, typically have lower update rates, limited both by the camera hardware itself and by the bandwidth available for image transfer. The images are often transferred over Ethernet or USB, which by default provide no guarantees of real-time performance. These limitations are described in Corke and others (1996) as the "dynamics of visual sensing". For a fixed-rate robot controller to work with the varying frame rates delivered by cameras, some method must therefore be employed to compensate for delays and to interpolate between images. Wang et al. (2013) proposed a dual-rate Kalman filter to compensate for these effects, using a dynamic model to predict the target motion at the time instants required for robot control. Wang et al. (2015) and Lin et al. (2013) proposed identifying the parameters of this dynamic model using Expectation-Maximization.
Since the work of Isard and Blake (1998), many authors have explored the use of particle filters for visual object tracking. For robotic visual servoing, the work on tracking rigid bodies in Cartesian space is especially relevant. Particle filter based tracking on the SE(3) group has been investigated under different assumptions on the underlying dynamical model in Kwon et al. (2007), Choi et al. (2011) and Choi and Christensen (2012). In particular, these particle filters were designed to account for the properties of SE(3), while the kinetics were modeled as a random walk or with an autoregressive dynamic model. Going further, we believe that it may be an advantage to use a kinetic model based on the physical equations of motion. This was done in our previous work, Myhre and Egeland (2015), where the dynamics of a spherical pendulum were used. A potential benefit of a physical model is that the assumed noise level in the particle filter can be significantly reduced. One of the challenges in visual tracking is to achieve accuracy despite occlusions and cluttered scenes; a possible solution is to use multiple cameras, as in Lippiello et al. (2007) and Kermorgant and Chaumette (2011).
In recent years, researchers have demonstrated that particle filter based visual tracking can be used in the robot feedback loop, as in Ibarguren et al. (2014) and Chitchian et al. (2013), even though it is considered a computationally heavy method. The parallel nature of the filter makes it possible to run it on commodity Graphics Processing Units (GPUs) with good performance, as shown in Choi and Christensen (2013), Concha et al. (2014) and Pauwels et al. (2013).
A dynamical model based on physical principles needs accurate parameters to be useful, and these can be obtained by parameter estimation. An overview of methods for parameter estimation based on particle filtering can be found in Kantas et al. (2009). The two main on-line approaches are Expectation-Maximization, Schön et al. (2011), and gradient ascent, Poyiadjis et al. (2011). In our previous work, Myhre and Egeland (2015), we used the approach of Poyiadjis et al. (2011) to find accurate parameters of a spherical pendulum during visual tracking. This approach is developed further in the following.
In this paper we propose a method to perform the task illustrated in Figure 1, namely to control a robot manipulator that tracks a swinging target using computer vision. The proposed method uses a model based on physical principles with estimated parameters, namely the center of mass position, which enables the robot to track the target accurately even as it accelerates. This is demonstrated experimentally in Section 5. The method has the same goal as that of Lin et al. (2013), which is to compensate for the visual sensing dynamics using a model of the motion of the target object. However, whereas Lin et al. (2013) use a general dynamic model, we propose a physically based dynamic model of a spherical pendulum in order to achieve high performance in this specific application.
In this paper, XYZ Euler angles were chosen as the kinematic representation of rotation. The main motivation is to simplify and improve the parameter estimation algorithm. The benefits of a coordinate-invariant representation in a particle filter were discussed thoroughly in Kwon et al. (2007). In the use case presented in this paper, swinging motions with amplitudes larger than 10° are unrealistic, so the benefits of a coordinate-invariant representation are not as great as in the general case. In addition, the kinematic convention chosen here has the benefit that the center of mass position can be described naturally by two angular offsets and one linear offset. This enables the parameter estimator to identify the two angular offsets accurately even when the target is hanging without motion (stationary), in which case the linear offset cannot be identified. For a stationary target, only the two angular offsets are required for visual servoing. A coordinate-free version may be the topic of future research.
This paper presents a novel method for accurate tracking of a swinging target with an industrial robot. Its main features are:
1. The proposed method can handle both stationary and moving targets.
2. A two-stage control system is proposed, where one stage runs at the rate at which the cameras can deliver images, while the other runs at the rate required by the robot motion controller. The two stages are connected by a prediction module based on an accurate dynamical model of the swinging target.
3. Experiments are performed to demonstrate that the method can be used for automatic loading of parts onto the swinging target, using a laboratory version of a loading station found in industry.
4. The experiments are performed using standard commercial equipment.
The paper is structured as follows. In Section 2 we discuss the preliminaries of particle filtering and parameter estimation; in Section 3 we present the dynamic model and the observation model used for tracking; and in Section 4 we propose a method for using the state estimate to control an industrial robot manipulator in real time. Experiments that validate the proposed method are presented in Section 5.

Preliminaries
In this paper we consider a non-linear system with additive Gaussian noise,
$$x_k = F_d(x_{k-1}, \theta) + v_k, \tag{1}$$
where $x_k \in \mathbb{R}^n$ is the state vector at time step k, θ is a vector of static parameters and $v_k \sim \mathcal{N}(0, \Sigma)$ is a noise vector. The probability density of (1) is known as the transition density and, in the case of additive Gaussian noise, can be written as
$$f(x_k|x_{k-1}) = \frac{1}{\sqrt{(2\pi)^n|\Sigma|}} \exp\left(-\tfrac{1}{2}\left(x_k - F_d(x_{k-1},\theta)\right)^T \Sigma^{-1} \left(x_k - F_d(x_{k-1},\theta)\right)\right). \tag{2}$$
An observation is made at each time step k by a camera taking an image, denoted by $I_k$. The relation between $I_k$ and $x_k$ is given by the observation density $g(I_k|x_k)$, which is given in Section 3.2. In this section we first present a particle filter for estimating the state $x_k$, and then a method for estimating the vector of static parameters θ from the sequence of camera images.

Particle Filter
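Inferences about the state vector at time step k can be made using the prediction equation
$$p(x_k|I_{1:k-1}) = \int f(x_k|x_{k-1})\, p(x_{k-1}|I_{1:k-1})\, dx_{k-1} \tag{3}$$
and the update equation
$$p(x_k|I_{1:k}) = \frac{g(I_k|x_k)\, p(x_k|I_{1:k-1})}{\int g(I_k|x_k)\, p(x_k|I_{1:k-1})\, dx_k}. \tag{4}$$
These equations are intractable in general, but particle filters are efficient methods for computing numerical approximations
$$p(x_k|I_{1:k}) \approx \sum_{i=1}^{N} w_k^{(i)}\, \delta\left(x_k - x_k^{(i)}\right), \tag{5}$$
where δ(·) is the Dirac delta function, $w_k^{(i)}$ are scalar weights and $x_k^{(i)}$ are the N particles. The specific particle filter used in this paper is known as Sequential Importance Sampling with Resampling, which is described in detail in, e.g., Cappé et al. (2007) and Doucet and Johansen (2011).
A numerical approximation to the expected value of $p(x_k|I_{1:k})$ can be found as
$$\hat{x}_k = \sum_{i=1}^{N} w_k^{(i)} x_k^{(i)}. \tag{6}$$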
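The steps (3) to (6) can be sketched as a single bootstrap SIR iteration in pure Python. This is a minimal illustration, not the authors' GPU implementation: it assumes resampling at every step and uses systematic resampling (one common choice; the resampling scheme is an assumption here), and the names `sir_step`, `propagate` and `log_likelihood` are hypothetical placeholders for sampling from $f$ and evaluating $\log g$.

```python
import math
import random


def systematic_resample(weights, rng):
    """Return N ancestor indices drawn by systematic resampling."""
    n = len(weights)
    u0 = rng.random() / n
    indices, cum, j = [], weights[0], 0
    for i in range(n):
        u = u0 + i / n
        # Advance j until the cumulative weight covers u (guard against round-off).
        while u > cum and j < n - 1:
            j += 1
            cum += weights[j]
        indices.append(j)
    return indices


def sir_step(particles, propagate, log_likelihood, rng):
    """One bootstrap SIR step: predict, weight, estimate the mean (6), resample."""
    # Prediction (3): sample x_k^(i) ~ f(. | x_{k-1}^(i)).
    predicted = [propagate(x, rng) for x in particles]
    # Update (4): weights proportional to g(I_k | x_k^(i)), normalized in log space.
    log_w = [log_likelihood(x) for x in predicted]
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    # Weighted mean (6), computed before resampling.
    dim = len(predicted[0])
    mean = [sum(wi * x[d] for wi, x in zip(w, predicted)) for d in range(dim)]
    # Resampling: afterwards every particle carries weight 1/N.
    resampled = [list(predicted[i]) for i in systematic_resample(w, rng)]
    return resampled, w, mean
```

Working in the log domain before normalizing avoids underflow when the observation density is sharply peaked, which is typical for image-based likelihoods.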

Estimation of Static Parameters
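Methods for sequential estimation of static parameters using particle filters have recently been developed in Kantas et al. (2014). An online gradient ascent method is used here, where
$$\theta_k = \theta_{k-1} + \eta_k \nabla_\theta \log p(I_k|I_{1:k-1}) \tag{7}$$
and $\eta_k$ is a step size. Using the approach presented in Poyiadjis et al. (2011), a set of vectors $\alpha_k^{(i)}$ is given by the recursive expression
$$\alpha_k^{(i)} = \frac{\sum_{j=1}^{N} w_{k-1}^{(j)} f(x_k^{(i)}|x_{k-1}^{(j)}) \left(\alpha_{k-1}^{(j)} + \nabla_\theta \log f(x_k^{(i)}|x_{k-1}^{(j)})\right)}{\sum_{j=1}^{N} w_{k-1}^{(j)} f(x_k^{(i)}|x_{k-1}^{(j)})} + \nabla_\theta \log g(I_k|x_k^{(i)}),$$
from which the gradient in (7) is estimated as $\sum_i w_k^{(i)}\alpha_k^{(i)} - \sum_i w_{k-1}^{(i)}\alpha_{k-1}^{(i)}$. Here $\nabla_\theta \log f(x_k|x_{k-1})$ and $\nabla_\theta \log g(I_k|x_k)$ are the gradients of the transition (2) and observation (29) densities, respectively, as developed in the next section.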
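The α recursion of Poyiadjis et al. (2011) and the resulting gradient step can be sketched as follows. This is a simplified illustration assuming a scalar parameter instead of the three-element θ, strictly positive weights, and a toy scalar autoregressive model $x_k = \theta x_{k-1} + v_k$ in place of the pendulum; all function names are hypothetical. Since θ does not enter the observation density used in this paper, `grad_log_g` can simply return zero.

```python
import math


def update_alpha(x_prev, w_prev, alpha_prev, x_new, log_f, grad_log_f, grad_log_g):
    """O(N^2) recursion for alpha_k^(i) (scalar theta for brevity)."""
    alpha_new = []
    for xi in x_new:
        # log of w_{k-1}^(j) f(x_k^(i) | x_{k-1}^(j)) for every ancestor j
        logs = [math.log(wj) + log_f(xi, xj) for xj, wj in zip(x_prev, w_prev)]
        m = max(logs)
        mix = [math.exp(lj - m) for lj in logs]  # rescaled to avoid underflow
        num = sum(mj * (aj + grad_log_f(xi, xj))
                  for mj, aj, xj in zip(mix, alpha_prev, x_prev))
        alpha_new.append(num / sum(mix) + grad_log_g(xi))
    return alpha_new


def gradient_step(theta, eta, w_new, alpha_new, w_prev, alpha_prev):
    """theta_k = theta_{k-1} + eta * (sum w_k alpha_k - sum w_{k-1} alpha_{k-1})."""
    score_inc = (sum(w * a for w, a in zip(w_new, alpha_new))
                 - sum(w * a for w, a in zip(w_prev, alpha_prev)))
    return theta + eta * score_inc


def ar1_model(theta, sigma):
    """Toy model x_k = theta * x_{k-1} + v_k, v_k ~ N(0, sigma^2), for illustration."""
    def log_f(x, xp):
        r = x - theta * xp
        return -0.5 * (r / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))

    def grad_log_f(x, xp):
        # d/dtheta of log_f
        return (x - theta * xp) * xp / sigma ** 2

    return log_f, grad_log_f
```

Each call to `update_alpha` costs $O(N^2)$ evaluations of $f$, so the cost grows quickly with the number of particles.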

Kinematic and Dynamic Modeling
The dynamics of the system are modeled as a spherical pendulum with one additional degree of freedom describing the rotation about the pendulum axis. The configuration can then be described by the XYZ Euler angles $\phi_x$, $\phi_y$ and $\phi_z$. When a workpiece is attached to the hanger, the center of mass is shifted, and the equilibrium position of the hanger has an unknown offset. To account for this uncertainty, we include offset angles $\theta_1$ and $\theta_2$ about the x and y axes, so that the Euler angles become $\Phi_x = \phi_x + \theta_1$, $\Phi_y = \phi_y + \theta_2$ and $\phi_z$. Here $\theta_1$ and $\theta_2$ are constant unknown parameters to be identified. As shown in Figure 2, the world frame is denoted W and the body-fixed frame is denoted B. The rotation matrix from W to B is then given by
$$R_{WB} = R_x(\Phi_x)\, R_y(\Phi_y)\, R_z(\phi_z),$$
where $R_x$, $R_y$ and $R_z$ are the elementary rotations about the x, y and z axes. In the stationary position of the pendulum we have
$$\phi_x + \theta_1 = 0, \qquad \phi_y + \theta_2 = 0. \tag{11}$$
The equations of motion are derived using the Euler-Lagrange equations applied to the Lagrangian of the pendulum, in which $\theta_3$ is the unknown constant distance from the pendulum attachment point to the center of mass, and $r_3$ is the last column of $R_{WB} = [r_1\ r_2\ r_3]$. The distance $\theta_3$ is the third unknown parameter to be identified.
The resulting equations of motion are
$$\ddot{\phi}_x = 2\dot{\phi}_x\dot{\phi}_y \cdots \tag{13}$$
where $g = 9.81$ m s$^{-2}$ is the acceleration of gravity.
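The state vector of the system (1) is
$$x = [\phi_x\ \ \phi_y\ \ \phi_z\ \ \dot{\phi}_x\ \ \dot{\phi}_y\ \ \dot{\phi}_z]^T, \tag{14}$$
while the vector of unknown parameters is
$$\theta = [\theta_1\ \ \theta_2\ \ \theta_3]^T. \tag{15}$$
The velocity components in (14) are affected by additive noise whose components are samples from Gaussian distributions.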
A continuous state-space model is found from equations (13), (14) and (15) as
$$\dot{x} = F(x, \theta). \tag{17}$$
The model is discretized in time using the first-order Euler method, giving $F_d(x_{k-1}, \theta)$ in the system (1).
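As an illustration only, the sketch below uses equations of motion derived under assumptions that go beyond the text: a point mass m located at $-\theta_3 r_3$ relative to the attachment point, gravity along the negative world z axis, and no dynamics about the pendulum axis ($\ddot\phi_z = 0$). Under these assumptions $\|\dot r_3\|^2 = \dot\phi_x^2\cos^2\Phi_y + \dot\phi_y^2$, the Lagrangian is $\mathcal{L} = \tfrac{1}{2}m\theta_3^2(\dot\phi_x^2\cos^2\Phi_y + \dot\phi_y^2) + m g\theta_3\cos\Phi_x\cos\Phi_y$, and the Euler-Lagrange equations give
$$\ddot\phi_x = 2\dot\phi_x\dot\phi_y\tan\Phi_y - \frac{g}{\theta_3}\frac{\sin\Phi_x}{\cos\Phi_y}, \qquad \ddot\phi_y = -\dot\phi_x^2\sin\Phi_y\cos\Phi_y - \frac{g}{\theta_3}\cos\Phi_x\sin\Phi_y .$$
The first equation shares the leading term $2\dot\phi_x\dot\phi_y$ of (13), and the equilibrium $\Phi_x = \Phi_y = 0$ matches the stationary condition stated above. The state ordering, the Euler sub-stepping with the 0.0002 s step from Section 5.1, and the function names are likewise assumptions.

```python
import math

G = 9.81  # acceleration of gravity [m s^-2]


def pendulum_rhs(x, theta):
    """xdot = F(x, theta) for the ASSUMED point-mass spherical pendulum.

    x     = [phi_x, phi_y, phi_z, dphi_x, dphi_y, dphi_z]  (assumed ordering)
    theta = [theta_1, theta_2, theta_3]: angular offsets [rad], distance to mass center [m]
    """
    _, _, _, dphx, dphy, dphz = x
    Phx = x[0] + theta[0]  # Phi_x = phi_x + theta_1
    Phy = x[1] + theta[1]  # Phi_y = phi_y + theta_2
    w0 = G / theta[2]      # squared small-angle natural frequency
    ddphx = 2.0 * dphx * dphy * math.tan(Phy) - w0 * math.sin(Phx) / math.cos(Phy)
    ddphy = -dphx ** 2 * math.sin(Phy) * math.cos(Phy) - w0 * math.cos(Phx) * math.sin(Phy)
    ddphz = 0.0            # assumption: free, undamped spin about the pendulum axis
    return [dphx, dphy, dphz, ddphx, ddphy, ddphz]


def F_d(x_prev, theta, dt, h=2e-4):
    """Forward-Euler transition over one camera interval dt, in sub-steps of about h."""
    n = max(1, round(dt / h))
    x = list(x_prev)
    for _ in range(n):
        f = pendulum_rhs(x, theta)
        x = [xi + (dt / n) * fi for xi, fi in zip(x, f)]
    return x
```

For example, `F_d(x, theta, 1.0 / 35.0)` would advance the state over one frame of a 35 Hz camera.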
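Since the transition density is Gaussian, its log-gradient is
$$\nabla_\theta \log f(x_k|x_{k-1}) = S_k^T \Sigma^{-1} \left(x_k - F_d(x_{k-1}, \theta)\right),$$
where
$$S_k = \frac{\partial F_d(x_{k-1}, \theta)}{\partial \theta}$$
is the sensitivity with respect to the parameters θ. The sensitivity describes how variations in the parameter θ affect $F_d(x_{k-1}, \theta)$.
Following Khalil (2002), the sensitivity is computed by taking derivatives of (17) with respect to the parameters θ and solving the resulting ODE
$$\dot{S}(t) = \frac{\partial F}{\partial x}\big(x(t), \theta\big)\, S(t) + \frac{\partial F}{\partial \theta}\big(x(t), \theta\big), \qquad S(t_{k-1}) = 0, \tag{20}$$
over the interval $[t_{k-1}, t_k]$, so that $S_k = S(t_k)$.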
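A minimal sketch of integrating the state jointly with the sensitivity $S = \partial x/\partial\theta$ as in (20), followed by the Gaussian log-gradient $S^T\Sigma^{-1}(x_k - F_d)$, is given below. It assumes a diagonal Σ passed as a list of variances, Jacobians $\partial F/\partial x$ and $\partial F/\partial\theta$ supplied by the caller (analytically or by finite differences), and forward Euler sub-steps as in Section 5.1; the function names are hypothetical. Starting from $S = 0$ reflects that $x_{k-1}$ is held fixed when differentiating $F_d(x_{k-1},\theta)$.

```python
def propagate_with_sensitivity(x0, theta, F, dF_dx, dF_dtheta, dt, h=2e-4):
    """Integrate xdot = F(x, theta) and Sdot = (dF/dx) S + dF/dtheta from S = 0.

    Returns x(t_k) = F_d(x_{k-1}, theta) and the n x p sensitivity S_k.
    """
    n_steps = max(1, round(dt / h))
    step = dt / n_steps
    n, p = len(x0), len(theta)
    x = list(x0)
    S = [[0.0] * p for _ in range(n)]  # S(t_{k-1}) = 0: x_{k-1} does not depend on theta
    for _ in range(n_steps):
        A = dF_dx(x, theta)      # n x n Jacobian at the current x
        B = dF_dtheta(x, theta)  # n x p Jacobian at the current x
        f = F(x, theta)
        S = [[S[i][j] + step * (sum(A[i][m] * S[m][j] for m in range(n)) + B[i][j])
              for j in range(p)] for i in range(n)]
        x = [x[i] + step * f[i] for i in range(n)]
    return x, S


def grad_log_transition(x_k, x_pred, S, sigma2):
    """S^T Sigma^{-1} (x_k - F_d) for a diagonal Sigma given by the variances sigma2."""
    r = [(a - b) / s2 for a, b, s2 in zip(x_k, x_pred, sigma2)]
    return [sum(S[i][j] * r[i] for i in range(len(r))) for j in range(len(S[0]))]
```

Updating S with the same Euler step as x makes S the exact derivative of the discrete Euler map, which keeps the gradient consistent with the discretized model $F_d$.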

Image Model
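It is assumed that the object is a rigid body with orientation given by the rotation matrix $R_{WB}$, as illustrated in Fig. 2. The transformation from the frame W to C is given by
$$T_{CW} = \begin{bmatrix} R_{CW} & t_{CW} \\ 0 & 1 \end{bmatrix}.$$
The image is a two-dimensional array of pixel intensities $I(p)$, where $p = [u\ \ v]^T$ and u and v are pixel coordinates in the image plane. The camera calibration matrix is
$$K = \begin{bmatrix} k_x & 0 & k_u \\ 0 & k_y & k_v \\ 0 & 0 & 1 \end{bmatrix},$$
where $k_x$, $k_y$, $k_u$ and $k_v$ are intrinsic camera calibration parameters. Given a point $p^C = [p_x\ \ p_y\ \ p_z]^T$ in the camera frame, its coordinates in the image plane are given by
$$u = k_x \frac{p_x}{p_z} + k_u, \qquad v = k_y \frac{p_y}{p_z} + k_v,$$
or $\tilde{p} = \frac{1}{p_z} K p^C$ with $\tilde{p} = [u\ \ v\ \ 1]^T$ in vector form.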
The methods used to find the intrinsic camera parameters k x , k y , k u and k v and the extrinsic camera parameters R C W and t C W are described in Section 5.
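The image model can be sketched as a world-to-camera transform followed by the pinhole projection with K. This is a minimal illustration assuming an ideal pinhole camera without lens distortion; the function names are hypothetical.

```python
def world_to_camera(p_W, R_CW, t_CW):
    """p^C = R_CW p^W + t_CW (3x3 rotation as nested lists)."""
    return [sum(R_CW[i][j] * p_W[j] for j in range(3)) + t_CW[i] for i in range(3)]


def project(p_C, kx, ky, ku, kv):
    """Pinhole projection u = kx px/pz + ku, v = ky py/pz + kv."""
    px, py, pz = p_C
    if pz <= 0.0:
        raise ValueError("point is not in front of the camera")
    return (kx * px / pz + ku, ky * py / pz + kv)
```

A point on the optical axis projects to the principal point $(k_u, k_v)$, which is a quick sanity check on calibration values.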

Trolley Frame Visual Model
The visual model of the trolley is given by $N_g$ line segments. Each line segment j is specified by its endpoints $p^B_j$ and $q^B_j$. For each particle i, the rotation matrix $R_{WB}$ is computed from the state vector $x_k$. The two points in the image plane corresponding to $p^B_j$ and $q^B_j$ are found in homogeneous pixel coordinates as
$$\tilde{p}_j = K \begin{bmatrix} I_3 & 0 \end{bmatrix} T_{CW} T_{WB} \begin{bmatrix} p^B_j \\ 1 \end{bmatrix}, \qquad \tilde{q}_j = K \begin{bmatrix} I_3 & 0 \end{bmatrix} T_{CW} T_{WB} \begin{bmatrix} q^B_j \\ 1 \end{bmatrix}.$$
The projected line segment runs from $\tilde{p}_j$ to $\tilde{q}_j$ in pixel coordinates, and the line through it can be described by the homogeneous vector $\tilde{\ell}_j = [a\ \ b\ \ c]^T$, found from the cross product, Hartley and Zisserman (2003),
$$\tilde{\ell}_j = \gamma S(\tilde{q}_j)\, \tilde{p}_j, \tag{28}$$
where $S(\tilde{q}_j)$ is the skew-symmetric form of $\tilde{q}_j$. A scaling factor γ is used to ensure that the two-dimensional vector $n_j = [a\ \ b]^T$ is a unit vector. Note that in this description $n_j$ is the normal vector to the line segment in image coordinates.
The observation density (29) proposed for the particle filter is expressed in terms of $p_j$, the two-dimensional version of the homogeneous vector $\tilde{p}_j$, the normal vector $n_j$ to line segment j, and a parameter λ, which was set to λ = 5, giving a distance of approximately 5 pixels between the points.
The parameter vector θ does not enter the observation density, which means that
$$\nabla_\theta \log g(I_k|x_k) = 0.$$
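The line construction (28) can be sketched as below. It covers only the geometry, not the observation density (29) itself, and the function names are hypothetical. Once $(a, b)$ is normalized by γ, the quantity $a u + b v + c$ is the signed pixel distance from $(u, v)$ to the projected line, which is why the normalization is convenient.

```python
import math


def cross(a, b):
    """Cross product a x b of two 3-vectors, i.e. S(a) b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def image_line(p_h, q_h):
    """Homogeneous line through p~ and q~, scaled so that n = (a, b) has unit norm."""
    line = cross(q_h, p_h)            # S(q~) p~
    norm = math.hypot(line[0], line[1])
    if norm == 0.0:
        raise ValueError("degenerate segment: endpoints coincide in the image")
    gamma = 1.0 / norm
    return [gamma * c for c in line]


def signed_distance(line, u, v):
    """Signed distance in pixels from (u, v) to a line with unit normal (a, b)."""
    a, b, c = line
    return a * u + b * v + c
```

Because homogeneous points are defined only up to scale, `image_line` gives the same normalized line whether or not $\tilde p_j$ and $\tilde q_j$ have been divided by their third component (as long as that component is positive).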

Control System
In this section we describe a control system used for automatic loading of parts on the swinging trolley. The control system structure is shown in Figure 3. The "State and parameter estimator" block was described in Sections 2.1 and 2.2, and the mean of the state estimate is computed at each time step k using (6). The contents of the remaining blocks are described in the following.

Visual Sensing Dynamics Compensation
The control system illustrated in Figure 3 is logically divided into two parts running at different rates. The modules inside the red dotted polygon run at the fixed rate required by the robot controller, while the modules inside the blue dotted rectangle run at the camera rate. Because cameras typically come without real-time guarantees, and their frame rate is typically lower than that of the robot motion control system, there is a mismatch between the robot controller rate and the camera rate. To bridge the gap between the modules running at different rates, the "Predictor" module uses the most recent state estimate $\hat{x}_k$ and parameter estimate $\hat{\theta}$ to predict the trolley state $x(t)$ at the time instants required to compute set-points for the robot motion controller. The predictor module predicts the trolley state by integrating the system (17),
$$x(t) = \hat{x}_k + \int_{t_k}^{t} F\big(x(\tau), \hat{\theta}\big)\, d\tau,$$
where $t_k$ is the acquisition time of image $I_k$. The result is the predicted transform $T_{WB}$ based on the predicted state $x(t)$.
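The predictor can be sketched as forward Euler integration from the acquisition time of the latest image to the current robot time. This is a minimal illustration; the function name, the generic right-hand side `rhs(x, theta)` and the fixed step (taken from the 0.0002 s used in Section 5.1) are assumptions.

```python
def predict_state(x_hat, t_hat, t, rhs, theta, h=2e-4):
    """Integrate xdot = rhs(x, theta) with forward Euler from the estimate time t_hat to t."""
    if t < t_hat:
        raise ValueError("prediction time precedes the estimate time")
    x = list(x_hat)
    remaining = t - t_hat
    while remaining > 1e-12:
        step = min(h, remaining)  # last step is shortened to land exactly on t
        f = rhs(x, theta)
        x = [xi + step * fi for xi, fi in zip(x, f)]
        remaining -= step
    return x
```

In use, each controller tick (every 8 ms at 125 Hz) would call `predict_state` with the current robot time, so that both the image latency and the time since the last frame are covered by the integration. With the clocks synchronized as described in Section 5.1, $t - t_k$ can be computed directly from the image timestamps.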

Robot Program and Motion Compensation
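In order to move the end-effector smoothly between two reference frames $T^B_1$ and $T^B_2$, with rotations $R^B_1$, $R^B_2$ and positions $p^B_1$, $p^B_2$ relative to the trolley frame B, we define the end-effector reference trajectory
$$T^B(s) = \begin{bmatrix} R^B_1 \exp\left(\beta(s) \log\left((R^B_1)^T R^B_2\right)\right) & p^B_1 + \beta(s)\left(p^B_2 - p^B_1\right) \\ 0 & 1 \end{bmatrix}, \tag{33}$$
where exp(·) is the exponential map so(3) → SO(3) and log(·) is its inverse, as defined in Murray et al. (1994). A monotonic function β(s) ∈ [0, 1] is used for interpolation between the frames so that excessive acceleration is avoided. We use a linear function of the logistic function σ(s),
$$\beta(s) = a\,\sigma(s) + b,$$
where a and b are chosen so that β(0) = 0 and β(1) = 1, that is, $a = 1/(\sigma(1) - \sigma(0))$ and $b = -a\,\sigma(0)$. The motion of the end-effector is computed using the predicted transform,
$$T_{WE}(t) = T_{WB}(t)\, T^B\big(s(t)\big). \tag{36}$$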
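The interpolation between $T^B_1$ and $T^B_2$ and its composition with the predicted trolley pose can be sketched in pure Python. Assumptions: the logistic steepness `k = 10` and its centering at s = 0.5 are illustrative choices, the translation is interpolated linearly with the same β(s), and the rotation logarithm below is valid only for relative rotations smaller than π. All function names are hypothetical; `so3_exp` uses Rodrigues' formula.

```python
import math


def logistic(s, k=10.0):
    """Logistic function centred at s = 0.5; the steepness k is an assumed value."""
    return 1.0 / (1.0 + math.exp(-k * (s - 0.5)))


def beta(s, k=10.0):
    """beta(s) = a * logistic(s) + b with beta(0) = 0 and beta(1) = 1."""
    l0, l1 = logistic(0.0, k), logistic(1.0, k)
    a = 1.0 / (l1 - l0)
    return a * logistic(s, k) - a * l0


def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(3)) for j in range(3)] for i in range(3)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def so3_exp(w):
    """Rotation matrix for the rotation vector w (Rodrigues' formula)."""
    th = math.sqrt(sum(c * c for c in w))
    I = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if th < 1e-12:
        return I
    kx, ky, kz = (c / th for c in w)
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]
    K2 = matmul(K, K)
    s, one_minus_cos = math.sin(th), 1.0 - math.cos(th)
    return [[I[i][j] + s * K[i][j] + one_minus_cos * K2[i][j] for j in range(3)]
            for i in range(3)]


def so3_log(R):
    """Rotation vector of R; valid for rotation angles below pi."""
    cos_th = max(-1.0, min(1.0, (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0))
    th = math.acos(cos_th)
    if th < 1e-12:
        return [0.0, 0.0, 0.0]
    f = th / (2.0 * math.sin(th))
    return [f * (R[2][1] - R[1][2]), f * (R[0][2] - R[2][0]), f * (R[1][0] - R[0][1])]


def interpolate_pose(R1, p1, R2, p2, s, k=10.0):
    """Pose between (R1, p1) and (R2, p2) in the trolley frame at normalized time s."""
    b = beta(s, k)
    w = [b * c for c in so3_log(matmul(transpose(R1), R2))]
    R = matmul(R1, so3_exp(w))
    p = [x1 + b * (x2 - x1) for x1, x2 in zip(p1, p2)]
    return R, p


def compose(R_WB, p_WB, R_B, p_B):
    """World pose of the end-effector reference: T_WB composed with T^B(s)."""
    R = matmul(R_WB, R_B)
    p = [sum(R_WB[i][j] * p_B[j] for j in range(3)) + p_WB[i] for i in range(3)]
    return R, p
```

The logistic profile has zero-approaching slope at both ends for large k, so the reference starts and stops gently; a smaller k makes β(s) closer to linear.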

Reflexxes and Inverse Kinematics
The Reflexxes Motion Libraries, Kröger (2011), were used in Cartesian space to filter low-amplitude, high-frequency noise in the particle filter estimate. Set-points for the robot joint controller were then computed using an inverse kinematics procedure.

Laboratory set-up
As shown in Figure 1 a laboratory set-up was built to perform the experiments described in this section.
The set-up consisted of two Prosilica GC 1020 Ethernet cameras streaming images to a computer at approximately 35 Hz, which is the fastest they can deliver images at full resolution (1024 × 768 pixels).

Figure 4: Timing of the state estimates $x_k$ at the camera rate (blue box) and of the desired joint variables $q_j$ at the robot controller rate (red box).
The computer had an Intel i7-3820 CPU, 16 GB of RAM and an Nvidia Titan graphics card, and ran Ubuntu Linux 14.10. The Precision Time Protocol was used to synchronize the clocks of the two cameras with the clock of the computer controlling the robot.
A chessboard was mounted on the robot end-effector in order to find the camera calibration parameters required in Section 3.2.For each camera 26 pictures were taken of the chessboard with the robot end-effector in different poses.The intrinsic camera parameters k x , k y , k u and k v were found using standard camera calibration methods provided by the OpenCV library described in Bradski and Kaehler (2008).The extrinsic camera parameters R C W and t C W were found using the method described in Park and Martin (1994).The distance from each of the cameras to the trolley frame was approximately 2.2 m.
The differential equations ( 13) and (20) were discretized using the Euler method.To achieve an accurate estimate of the parameter vector θ it is important that the sensitivity estimate in (20) is accurate.Therefore the step-size for integration was set to 0.0002 s.The two most computationally intensive parts of the particle filter are the observation model and the dynamical model, which therefore were implemented in CUDA in order to run them on the GPU of the Nvidia Titan graphics card.

Visual Sensing Dynamics Compensation
The experiment in this section was performed to validate the visual sensing dynamics compensation described in Section 4.1. A 5 s sequence of the state estimate was recorded while the target was swinging, and the resulting $\phi_x$ component is shown in Figure 5. The graphs show that the output from the predictor module provides a smoother and more accurate estimate of the target state than the estimate coming directly from the state estimator.

Case Study: Part Loading
In this section we present the experimental validation of the proposed method. The target was a trolley hanging from an overhead conveyor, of the type used in industrial loading stations that are commonly operated with manual labour. We chose to validate the method by synchronizing the motion of the robot with that of the trolley and then using the robot to place a hollow cylinder in the loading position. Instead of releasing its grip on the cylinder, the robot then removed the cylinder from the loading position. In this motion the cylinder is very close to the trolley, so if the robot can perform it without the cylinder touching the trolley, the synchronization is accurate to within the difference between the size of the hole in the cylinder and the size of the attachment hook on the trolley; in this case the documented accuracy is 6 mm. The experiment was also designed to demonstrate that the proposed method can handle situations where the position of the mass center changes as objects are attached to the trolley, and that this can be achieved with both moving and stationary targets. To this end, an object was loaded onto the trolley so that the center of mass moved to an unknown position, which was then estimated by the parameter estimation algorithm.
The motions that were performed in the experiments are shown in Figure 7.The cylindrical object carried by the robot had inner diameter 22 mm.The trolley frame was welded from steel bars with a square cross section (8 mm×8 mm).The robot program is described in Figure 6.
Step 2) Set the time variable t = 0 and start tracking using the estimated parameter vector θ.
Step 3) Move the end-effector from the initial position to the pose $T^B_1$ for t ∈ [0, 15) s.
Step 4) Move the end-effector according to (36).
Figure 6: Robot program.
The program was first executed with a stationary target. The results are shown in Figure 8a. In Step 1, the parameters $\theta_1$ and $\theta_2$ converged after 20 s. The parameter $\theta_3$ did not converge, because the stationary target provided no excitation that could be used to identify its value in (7). Estimation was stopped after 25 s. In Step 4, the robot moved the cylindrical object to the loading position on the trolley and back. The states in Figure 8a show that the target remained stationary, and thus that the robot did not come into contact with it.
The program was then executed with a moving target. The results are shown in Figure 8b. In Step 1, the parameters $\theta_1$, $\theta_2$ and $\theta_3$ converged after 20 s, and estimation was stopped after 25 s. In Step 4, the robot moved the cylindrical object to the loading position on the trolley and back. The states in Figure 8b show that the target motion remained smooth, indicating that the robot did not come into contact with the target.

Discussion
The results from the experiment in Section 5.3 show that:
• The mean values of $\phi_x$ and $\phi_y$ were $-\theta_1$ and $-\theta_2$, which is consistent with (11).
• The estimated distance to the mass center $\theta_3$ converged when the target was in motion, but not when it was stationary. This behaviour motivated the choice of kinematic convention, since the angular offsets $\theta_1$ and $\theta_2$ could still be identified for the stationary target.
• The state vector showed no disturbance from external forces during the loading sequence in Step 4, which means that there was no collision between the robot and the target.
• The proposed method for visual servoing worked both with a stationary and a moving target, which is one of the main contributions of this paper.

Conclusion
This paper presented a method for accurate tracking of a swinging target using an industrial robot. A dynamical model of the swinging target was used, and a method for estimating the parameters describing the mass center position was presented. The model was used to predict the motion of the swinging target, both to compensate for the visual sensing dynamics and within the particle filter. The experiments in Section 5 demonstrate that the proposed method achieves accurate tracking of a swinging target with a robot manipulator.
The method was demonstrated on the industrial task of part loading on swinging conveyor trolleys, where the tolerance was less than 6 mm. The method was demonstrated on moving and stationary targets.

Figure 1: Manual loading of objects onto a swinging conveyor trolley is a common task in industry.The proposed automatic solution is to control the trajectory of the robot manipulator in real-time using an estimate of the trolley motion.An estimate of the trolley motion is found using particle filter based visual tracking.

Figure 2: The body reference frame B is rotating relative to the inertial reference frame W. The geometric trolley model is described by a list of line segments in R 3 .

Figure 3: The modules comprising the control system.The modules in the blue box are running at the frame rate determined by the cameras.The modules in the red box are running at the fixed rate required by the robot manipulator (125 Hz for UR5).

Figure 5: The state estimate of $\phi_x$ (in blue), which is updated at the camera rate, and the predicted state (in red), which is updated at the faster robot rate of 125 Hz.
(a) Illustration of the experiment performed in Figure 8a. (b) Illustration of the experiment performed in Figure 8b.

Figure 7: The same experiment is performed twice, first with a heavy load, then without a heavy load.During the experiment the end-effector moves from the pose on the left (T B 1 ), to the pose on the right (T B 2 ) and finally back (to T B 1 ).
(a) Part loading experiment with stationary target. An additional load was placed on the upper right loading position in order to shift the center of mass, as shown in Figure 7a. (b) Part loading experiment with moving target. The additional load was removed, as illustrated in Figure 7b.

Figure 8: The graphs illustrate the result of the robot program (Figure 6). The results from Step 1 are shown as green graphs, which display the evolution of the parameter vector θ. The final values are shown as horizontal black dashed lines. In Step 4 the end-effector was controlled according to (36), and in the interval from 15 s to 30 s the cylinder was put in the loading position on the trolley. The recorded state vector is shown in blue. The states in both Figure 8a and 8b are smooth and show no sign of collision.