Unscented Multi-point Smoother for Fusion of Delayed Displacement Measurements : Application to Agricultural Robots

Visual Odometry (VO) is increasingly a useful tool for robotic navigation in a variety of applications, including weed removal for agricultural robotics. The methods of evaluating VO are often computationally expensive and can cause the VO measurements to be significantly delayed with respect to a compass, wheel odometry, and GPS measurements. In this paper we present a Bayesian formulation of fusing delayed displacement measurements. We implement solutions to this problem based on the unscented Kalman filter (UKF), leading to what we term an unscented multi-point smoother. The proposed methods are tested in simulations of an agricultural robot. The simulations show improvements in the localization RMS error when including the VO measurements with a variety of latencies.


INTRODUCTION
In recent years Agricultural Robotics has been an increasingly important research topic, and there are numerous publications presenting unmanned ground vehicles and robot platforms, (Bogue, 2016;Biber et al., 2012;Jørgensen et al., 2007;Molstad et al., 2014;Grim-stad et al., 2015Grim-stad et al., , 2016)).Most agricultural platforms rely heavily on GPS for navigation.RTK GPS systems may provide localization accuracy of ±0.02 m under ideal conditions.The cost of an RTK GPS system may hinder the adaption of mobile robots in agriculture, and the signal conditions often reduce the position accuracy.
Visual-aided navigation may provide several benefits for agricultural robots, and lessen the dependence on expensive RTK GPS systems.The work presented here has sprung out from a research project by Adigo AS, in cooperation with the Norwegian University of Science and Technology and the Norwegian Institute of Bioeconomy Research (NIBIO).
An agricultural robot for high precision drop-ondemand herbicide application for row crops is under development.The robot uses a downward facing camera to identify weeds and a nozzle array applies the herbicide as the robot passes over them, (Urdal et al., 2014;Utstumo and Gravdahl, 2013;Utstumo et al., 2015).
Weed control is a vital part of agriculture, and autonomous robotic weed control has become an important research area.The review by Slaughter et al. (2008) illustrates the potential of robotic weed control, and presents several similar systems to the one which is the focus of this paper.
The same images used for plant classification may also be used to compute visual odometry (VO) measurements.In this paper, VO measurements are assumed to be frame-to-frame rotation and displacements that the robot has undergone between two overlapping images.
The VO technique considered finds a set of matching features in two subsequent images.These features are used to reconstruct the movement of the camera by minimization of the transformation that matches one set to the other.This is similar to wheel odometry (WO) which uses the wheel encoders to reconstruct the movement of the robot.Unlike the WO measurements, VO will be unaffected by any skidding and sliding of the wheels.Both VO and WO are relative displacements with respect to a previous state of the system.Absolute measurements, such as GPS, provide measurements with respect to a known coordinate frame.
The feature identification and matching algorithms introduces a latency between when a picture is taken, and when the measurement becomes available.We refer to this time as processing delay.This processing delay may be more than one second in non-optimized implementations, to an order of milliseconds in implementations as described in Forster et al. (2014), or with dedicated hardware and tight coupling of inertial and VO measurements as in Goldberg and Matthies (2011).
The processing delay for the VO measurement is characteristically different from the other available sensors such as compass, GPS, wheel encoders, which are considered instantaneous in this paper.The process of detecting and matching feature points, and solving for the displacement measurement, is not necessarily fixed.It may vary with the number of feature points processed.
The camera is triggered by a hardware GPIO line.This allows us to know when a picture is taken with the same time reference as the GPS, compass and other measurements.
For the fusion of relative displacement measurements, a method called stochastic cloning (Mourikis et al., 2007) was considered.This method "clones" the state estimates when a measurement should have arrived, augmenting the state vector used by the filter, thus maintaining the cross-covariances between the current state of the system and the time when a measurement should have arrived.Practical examples of stochastic cloning can be found in Romanovas et al. (2013) for visual-inertial/magnetic data fusion, in Van der Merwe and Wan (2004) under the name latency compensation, and in Mourikis et al. (2009) for spacecraft entry, descent and landing.
Similar to Van der Merwe and Wan (2004), we use the UKF as opposed to the EKF.The EKF often suffers from providing covariance estimates that are lower than the actual covariance, something which can be very detrimental in precision agriculture where the decision to spray or not may be directly based on the covariance estimate.Instead we use the Unscented Kalman filter (UKF), which through the unscented transform (UT) deals more directly with non-linearities by choosing a strategic set of sample points (Julier and Uhlmann, 1997).
The main contributions of this paper are • A presentation of a Bayesian framework for delayed displacement fusion, which build upon the method of stochastic cloning.
• An implementation of a novel fixed-point smoother termed unscented multi-point smoother.
• A description of the relation between stochastic cloning and the proposed framework.
The remainder of this paper is structured as follows: in Section 2 the Bayesian formulation of delayed displacement measurement fusion is presented.The dependencies in the basic model and the delayed displacement scenario are presented with Hidden Markov models.These are used to formulate a Bayesian filter capable of fusing the delayed displacement measurement.In Section 3 the unscented multi-point smoother is implemented.First Gaussian assumptions are applied to the filter algorithm, such that the algorithm is described using an augmented state vector for a multivariate Gaussian distribution.Then the unscented transform is briefly described for evaluating the necessary expectations and covariances.In Section 4 the robot simulation setup is described, followed by simulation results.In Section 5 potential future work is outlined.

BAYESIAN FORMULATION
In this section we present the Bayesian formulation of fusion with delayed displacement measurements, using graphical models to indicate stochastic dependencies.

Basic System
The basic underlying system considered in this paper is where x ∈ R L is the state vector of pose, forward velocity, and rotational velocity, z are the measurements from the fast sensors, compass, GPS, and WO.The process noise w k and measurement v k is considered additive Gaussian.In Fig. 2 a Hidden Markov model is presented where there is no delay on any measurements, and there are no displacement measurements.The green arrows indicate the transition model, (1), showing how the next state is only conditioned on the previous state.The red arrow indicates the measurement model, (2), and how the likelihood of the measurement is only conditioned on the current state.We desire to find the probability density function (pdf) of the current state given the measurements, p(x k |z 0:k ).For the basic system this can be done with a recursive Bayesian filter (Thrun et al., 2005).

Delayed Displacement Measurements
The delayed displacement measurement model is where d n is the delayed displacement measurement dependent on two previous states x l and x m , and ṽn is Gaussian noise.The Hidden Markov model of the system is given in Fig. 3.At time t l the first picture is taken, and at time t m the second picture is taken.The displacement measurement becomes available at time t n .We desire to find the distribution p(x n |z 0:n , d n ).As the delayed displacement measurement is dependent on two previous states, optimal fusion of d n requires a joint distribution of x l , x m and x n .
In reality, the displacement measurements are dependent on the underlying features' poses relative to the camera's pose at time t l and t m .The simplification of the displacement measurement as a function of the states is based on Mourikis et al. (2007).
With an initial distribution p(x 0 ), from t 0 until t l , the measurements z 0:l are fused iteratively using the recursive Bayes filter up to p(x l |z l ).From there on the filter must maintain x l .This can be done by augmenting the state vector by a "clone" x c l of x l at time t l .This leads to a joint density as where p(x c l |x l ) is a Dirac delta distribution.Thus, even if p(x l |z l ) is Gaussian, the joint density will not be Gaussian unless a regularization constant is used.Under Gaussian assumptions, A just as straightforward, and in principle more general, solution to the delayed fusion problem follows from "extending" the state vector when the first prediction after time t l is done: which under Gaussian assumptions is This is referred to as extending, and occurs as an alternative to prediction at timestep t l+1 .In this spirit, we can make further extensions as required.In particular, we also extend at time t m+1 to arrive at the joint distribution p(x l , x m , x m+1 |z m ), which, when iteratively fusing z with a recursive Bayes filter leads to p(x l , x m , x n |z n ).When a picture has been taken at a timestep, the next predict step in the filter is replaced with an extend step.A system where the camera is not synchronized with the filter will face the additional challenge of estimating the clock synchronization and trigger time.
With the joint distribution p(x l , x m , x n |z n ), the delayed displacement measurement d n is fused using Bayes theorem according to where µ is the normalization constant and p(d n |x l , x m ) is the likelihood of the displacement measurement.
Note that the likelihood is specified conditional on both x l and x m , but not on x n .If no other displacement measurements depend on the states x l and x m , the states are marginalized from the joint distribution by which under Gaussian assumptions is simply done by omitting the parts of the expectation vector and covariance matrix associated with x l and x m .In Alg.1 the delayed displacement fusing filter algorithm is outlined.This algorithm is a Bayesian ondemand smoother.When a picture has been taken, the state vector is extended with the predicted state.And after updating with a displacement measurement, any unnecessary old states are marginalized.The list of state indices when the pictures have been taken is denoted K.The set N contains the current index i only if a displacement measurement is available.The augmented state vector x S contains all the states maintained in the joint distribution.
The purpose of the list S is to keep track of the corresponding state indices.In the algorithm ∪ is used to append an index, and \ is used to remove indices.The function ind(d) returns the indices associated with a displacement measurement d (e.g.l and m) that can be marginalized.The function ind and the list S are used to handle the case when a picture is used for more than one displacement measurement.This is generally the case, as an image generates displacement Algorithm 1 Bayesian delayed displacement fusion Require: p(x 0 ), K, N 1: S ← {0} 2: for i = 1 to i = ∞ do 3: Predict: S ← S ∪ {i} Update: end if 21: end for measurements both together with the preceeding and the succeeding image.
K is updated by the camera triggering before each iteration and N is updated before the update step by the VO module indicating a displacement measurement being ready.The filter acts similar to a fixed-point smoother with the capability of formulating the points to be smoothed on-demand, a "multi-point" smoother.

UNSCENTED MULTI-POINT SMOOTHER
In this section we look at an unscented multi-point smoother based on Alg.1.First we describe more thoroughly what the Gaussian assumptions means for our augmented state vector x S , then we present important properties of the unscented transformation, and show how the method in this paper relates to stochastic cloning.

Gaussian Assumption
To implement Alg.1 with a UKF as the underlying filter, one must define how the extension method, (10), under Gaussian assumptions, behaves for the augmented state vector x S .Consider the case where we are extending at time Var(x k ) . (13) The notation Cov(x S , x k ) is shorthand for where S 1 indicates the oldest index in the list of state indices S. The newest index is always k − 1.The mean and covariance of x S is maintained by the filter.We must evaluate the parts associated with x k .In the following section we describe E(x k ) and Var(x k ) with the unscented transform.For the cross-covariances between x S and x k , ( 14), consider S + to be a list of state indices containing the same indices as S, with the last index, k −1 replaced with k.Then the cross-covariance between x S and x S + is by definition Hence, evaluating the cross-covariance between x S and x S + , and extracting the rightmost columns of size L, gives us ( 14).This augmented system keeps all but the last state the same between each iteration. x The augmented version of the displacement measurement model h must know which states in the augmented state vector x S correspond to the imminent displacement measurement d n = ha (x S ) + ṽn . (18)

Unscented Transform
In this section we briefly introduce the unscented transform and the properties that need to be evaluated for Alg.1.
The basic premise of the unscented transform is that it is easier to approximate a probability distribution than a nonlinear function (Julier and Uhlmann, 2002).This is done by propagating a set of deterministically chosen sigma-points through the nonlinear function.For a nonlinear transformation where x ∈ R L is a Gaussian random variable of dimension L. Sigma-points can be defined by the selection scheme for h = 1, ..., L, i = 1, ..., L, and j = L + 1, ..., 2L + 1, where λ is a tuning parameter defining the spread of the sigma-points around the expected value ( Van der Merwe, 2004).
The subscript i of the matrix Var(x) denotes its ith column vector.From Van der Merwe (2004), under the Gaussian assumption, the mean, covariance and crosscovariance are constructed from the sigma-points by These methods allow us to evaluate the expectation, covariance, and cross-covariance of the augmented system described in the previous section, and the unscented multi-point smoother can be constructed.

Remark on Stochastic Cloning
If the underlying filter is an EKF, then f (x k ) ≈ F k x k .The cloning procedure of (7) without a regularization constant constructs the augmented expectation state vector xS|k , and covariance matrix P S k|k of the filter according to xS|k = xk|k xk|k ( 28) This does not describe a Gaussian distribution as the covariance matrix is singular.When performing the prediction step, an UKF will fail as it relies on the Cholesky factorization of the state covariance matrix, which is only unique on nonsingular matrices.On the other hand, an EKF can readily perform the prediction step with an augmented transition model giving the expectation state vector and covariance matrix given by and which is identical to the extension method described in this paper.It is interesting to note that by Schur's complement, we require non-zero process noise to ensure that the distribution remains Gaussian through an extend step.

Remark on Smoothing
The filter algorithm, Alg.1, performs optimal fixedpoint smoothing of the augmented states under the assumption that the model can be described by ( 1), ( 2) and ( 5).If the transformations are highly nonlinear, the unscented transform will also encounter problems.As such, it is debatable whether the smoothing procedure is beneficial.When absolute measurements are available from e.g.GPS or a compass, it was observed that the covariance associated with an old augmented state diminished to the point where the delayed displacement measurement was assumed to be more accurate than it actually was.To remedy this, for the simulation with high process delay and low camera framerate, the covariance associated with the oldest augmented state had a lower threshold it could not decrease below.This was used on the pose of the robot as these were the states affected by the absolute measurements.

System
The robot is modeled as a unicycle-like robot with noslip conditions.The kinematic model for the system is based on the model by Cruz and Carelli (2006), with a change where the castor wheel is at the rear end of the robot, where x and y is the robot's east-north position, ψ is yaw, u is the forward speed, and ω is the yaw rate.
The kinematic model ( 32) was implemented by using the forward Euler method and Gaussian process noise was added: We assume that the onboard sensors are not delayed, except for the visual odometry measurements, when setting up the observation mapping h(x k ), (2).We model the onboard RTK-GPS, compass, and wheel odometry, as direct measurements of the position, (x, y), the heading (ψ), and linear and angular velocities (u, ω) respectively.The measurements are assumed to be affected by additive Gaussian measurement noise.
The measurement d n is rotated by the oldest picture frame, where R(•) is the 2D rotation matrix.Using the simulink models and controller of Martins (2013) the robot was simulated with a timestep of 0.05 s, following a figure eight trajectory.As the robot is heavy and slow moving, the process noise is negligible.The acronym JUKF is used for the delayed displacement fusing UKF in reference to the joint distribution.The kinematic controller has an upper bound of 1.5 m/s in these simulations.
For the simulations, the errors of the filter estimates are

Simulation Results
In Test 1 the time between images, τ C , is 0.25 s, and the processing delay, τ D is 0.2 s.The RMS errors are given in Tab. 1.The JUKF fusing the delayed displacement measurements has a lower RMS error than the UKF not fusing the delayed displacement measurement.
In Test 2 the time between images is 2 s, and the processing delay is 1 s.The RMS errors are given in Tab. 2. The JUKF fusing the delayed displacement In Test 3 the time between images is 3 s, and the processing delay is 2 s.The RMS errors are given in Tab. 3. See Fig. 4 for the error in pose with respect to time, and Fig. 5 for error in velocities with respect to time.The purple line is the UKF, the green is the JUKF.Note that whenever a VO measurement is fused, the error in JUKF decreases, causing a sawtooth effect.
In Test 4 the time between images is 5 s, and the processing delay is 3 s.The RMS errors are given in Tab. 4. In this test a covariance threshold was applied to the covariances associated with the oldest x, y and ψ.Without the covariance threshold, the JUKF did not have lower RMS errors than the UKF.At this frame rate, the pictures are not expected to overlap for the robot under typical field conditions.This test was included to show that the algorithm show improvement in the localization RMS error for cases beyond the worst expected latencies.

Future Work and Discussion
The algorithm proposed in this paper handles a mild case of out-of-sequence measurements, in the sense that displacement measurements are only processed after several GPS and compass measurements have been processed, i.e. out-of-sequence.The key to handling this is the ind function, declaring which augmented states are to be marginalized.
In the simulations presented here, we have only considered GPS and compass measurements in addition to VO.In practice, most platforms will also have inertial  sensors available, which may be included in the measurement model.When using MEMS sensors, it would also be natural to include the gyro and accelerometer biases in the filter.
Finally, this work employed a full-state representation as opposed to an error-state representation, which has been used for visual odometry as described in Mourikis and Roumeliotis (2007).In continuation of this research, it may be interesting to investigate how error-state smoothing relates to the Bayesian paradigm.

Conclusion
This paper has proposed a Bayesian framework for the fusion of delayed displacement measurements.This led to a multi-point smoother capable of defining fixedpoints to be smoothed on demand according to the GPS, WO, and compass measurements.This filtering technique may prove useful in other scenarios where fixed-points to be smoothed are formulated while the filter is running.
By using the unscented transform, the filter was able to use the smoothing effects of the fusion method for its benefits.The filter maintains estimates of the state the robot was in when a picture was taken, smoothed by the absolute measurements.The VO fusing methods improved the localization RMS error for a variety of latencies.

Figure 1 :
Figure 1: A prototype of the Asterix robot.

Figure 3 :
Figure 3: Hidden Markov model showing a delayed displacement measurement d n being dependent on the states x l and x m , but arriving at t n .Green arrows represent (1), red arrows represent (2), and blue arrows represent (5).

Figure 4 :Figure 5 :
Figure 4: Simulation of the robot showing pose error with respect to time.Simulated with τ C = 3s and τ D = 2s.Purple indicates UKF, green indicates JUKF.