Preconditioned Conjugate Gradient for Reduced Camera System in SLAM
1. Introduction
This post tries to reproduce the article Pushing the Envelope of Modern Methods for Bundle Adjustment.
The main contributions of that article are:
- Using BLAS to rewrite the optimization algorithms.
- Novel embedded point iterations (EPIs).
- Using block-based preconditioned conjugate gradients.
1.1 Linear algebra software
I recommend reading the slides from the Stanford convex optimization course EE364B website.
- BLAS offers three levels of basic matrix operations.
- LAPACK offers higher-level linear algebra algorithms.
- Both can be called from C++ (our main development language).
- Eigen is another matrix computation library (it has a better interface, but is a little bit slower). See their Benchmark, and also How does Eigen compare to BLAS/LAPACK?
Personally, I prefer Eigen, as it is easier to use and offers NEON acceleration (I focus on Android cellphone applications).
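As a quick illustration of the three BLAS levels mentioned above, here is a small NumPy sketch (NumPy dispatches these operations to an underlying BLAS; the array values are arbitrary toy data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
x, y = rng.standard_normal(n), rng.standard_normal(n)
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))

v1 = 2.0 * x + y   # Level 1: vector-vector ops (AXPY: y <- a*x + y)
v2 = A @ x         # Level 2: matrix-vector ops (GEMV)
v3 = A @ B         # Level 3: matrix-matrix ops (GEMM)
```

Level 3 is where optimized BLAS implementations gain the most, since matrix-matrix products have high arithmetic intensity per memory access.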
1.2 Preconditioned Conjugate Gradient
See my blog post on Conjugate Gradient for more details.
- In that post, I tested solving the SLAM Hessian matrix with PCG, but it did not work well.
- But we now know (from this article) that PCG accelerates the system when it is applied to the RCS (reduced camera system), the linear system obtained after the Schur complement (which marginalizes the feature-point parameters). So we will go in this direction.
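To make the Schur-complement step concrete, here is a minimal NumPy sketch on a toy Gauss-Newton system (the block sizes and the names `H_cc`, `H_pp`, etc. are my own illustration, not the paper's code). Marginalizing the point block yields the RCS:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Gauss-Newton system H * dx = g with a camera block (c) and a point block (p).
nc, np_ = 6, 9   # 1 camera (6 DOF), 3 points (3 DOF each)
J = rng.standard_normal((40, nc + np_))
H = J.T @ J + 1e-3 * np.eye(nc + np_)   # SPD Hessian approximation
g = rng.standard_normal(nc + np_)

H_cc, H_cp = H[:nc, :nc], H[:nc, nc:]
H_pc, H_pp = H[nc:, :nc], H[nc:, nc:]
g_c, g_p = g[:nc], g[nc:]

# Schur complement: marginalize the points. In a real BA problem H_pp is
# block-diagonal (3x3 blocks), so this inverse is cheap there.
H_pp_inv = np.linalg.inv(H_pp)
S = H_cc - H_cp @ H_pp_inv @ H_pc       # reduced camera system (RCS)
b = g_c - H_cp @ H_pp_inv @ g_p

dx_c = np.linalg.solve(S, b)            # camera update (what PCG would solve)
dx_p = H_pp_inv @ (g_p - H_pc @ dx_c)   # back-substituted point update
```

Solving `S dx_c = b` and back-substituting gives exactly the same answer as solving the full system, which is easy to verify on this toy example.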
1.3 Prepare real SLAM data
- We will extract real SLAM data from the ORB_SLAM2 system. Code on gitee.
- One small RCS with 33 camera frames. (data on gitee)
(Left image is the original data; on the right we show the sparsity pattern, obtained by assigning all non-zero elements to "1".)
- Another larger RCS with 230 camera frames. (data on gitee)
- An even larger RCS with 710 camera frames. (In my later tests I found the two sets above were too small for my PC, and I don't know how to limit my MATLAB CPU usage, so I made a much larger set to better show the acceleration.)
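The sparsity-pattern view above (all non-zero elements set to "1", i.e. `spones` in MATLAB) is straightforward to reproduce; a NumPy sketch on a small hypothetical stand-in matrix (the real matrices come from the gitee dumps):

```python
import numpy as np

# Hypothetical stand-in for an RCS Hessian block.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 2.0]])

pattern = (A != 0).astype(int)   # like spones(A): non-zeros become 1
fill = pattern.sum() / A.size    # fraction of non-zero entries
```

Plotting `pattern` (e.g. `spy` in MATLAB) shows the block structure regardless of the actual values.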
2. PCG test
2.1 PCG Algorithm
- The PCG is the same as in my last blog post; the MATLAB code can be found here (it has been shown to be faster than MATLAB's official version).
- Preconditioning with a block-diagonal matrix (block size chosen to be 6, the DOF of a camera frame).
time_start = cputime;
n = size(A, 1);
A_approx = sparse(n, n);
m = 6;  % block size = camera-frame DOF
% Copy the 6x6 diagonal blocks of A into the preconditioner.
for i = 1:n/m
    idx = ((i-1)*m+1):(i*m);
    A_approx(idx, idx) = A(idx, idx);
end
L = chol(A_approx)';  % lower-triangular factor, A_approx = L*L'
time_end = cputime;
tchol = time_end - time_start;
fprintf('\nCholesky factorization of A approx. Time taken: %e\n', tchol);
fprintf('\nStarting Mine PCG ...\n');
time_start = cputime;
[x,res,resvec] = mine_pcg(A,b,L, L', 1e-4,200);
time_end = cputime;
fprintf('Mine PCG done. With %d iterations\nTime taken: %e\n', ...
length(res),time_end - time_start);
fprintf('\nTotal time includes Cholesky factorization: %e\n', tchol+time_end - time_start);
semilogy(resvec/norm(b), '.--'); hold on;
set(gca,'FontSize', 16, 'FontName', 'Times');
xlabel('cgiter'); ylabel('relres');
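For readers without MATLAB, here is a rough Python sketch of the same block-Jacobi idea (my own toy implementation, not a port of `mine_pcg`): factor each 6x6 diagonal block once, then apply the two triangular solves inside a standard PCG loop.

```python
import numpy as np

def block_jacobi_pcg(A, b, m=6, tol=1e-4, maxiter=200):
    """PCG with a block-diagonal preconditioner of block size m.

    A is assumed symmetric positive definite with size divisible by m."""
    n = len(b)
    # Factor each m x m diagonal block once, up front.
    Ls = [np.linalg.cholesky(A[i:i + m, i:i + m]) for i in range(0, n, m)]

    def apply_Minv(r):
        # Preconditioner solve: one forward + one backward triangular
        # solve per diagonal block.
        z = np.empty_like(r)
        for k, L in enumerate(Ls):
            i = k * m
            y = np.linalg.solve(L, r[i:i + m])
            z[i:i + m] = np.linalg.solve(L.T, y)
        return z

    x = np.zeros(n)
    r = b - A @ x
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for it in range(maxiter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) / np.linalg.norm(b) < tol:
            break
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, it + 1

# Demo on a random SPD system (toy data, not the ORB_SLAM2 matrices).
rng = np.random.default_rng(3)
J = rng.standard_normal((120, 60))
A_demo = J.T @ J + np.eye(60)
b_demo = rng.standard_normal(60)
x_demo, iters = block_jacobi_pcg(A_demo, b_demo)
```

Since the blocks are factored once and reused every iteration, the per-iteration cost of the preconditioner is just 2(n/m) small triangular solves.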
2.2 Direct solve
fprintf('\nStarting dense direct solve ...\n');
time_start = cputime;
x_star_dense = A\b;
time_end = cputime;
relres_dense = norm(A*x_star_dense - b)/norm(b);
fprintf('Relative residual: %e\n', relres_dense);
fprintf('Dense direct solve done.\nTime taken: %e\n',...
time_end - time_start);
2.3 CG
- My CG MATLAB implementation can be found here.
- It was tested to take exactly the same time as MATLAB's official version.
- Different from the paper, I used a Cholesky factorization.
fprintf('\nStarting Mine CG ...\n');
time_start = cputime;
[x,res,resvec] = mine_cg(A,b,1e-4,200);
time_end = cputime;
fprintf('Mine CG done. With %d iterations\nTime taken: %e\n',...
length(res),time_end - time_start);
semilogy(resvec/norm(b), '.--'); hold on;
set(gca,'FontSize', 16, 'FontName', 'Times');
xlabel('cgiter'); ylabel('relres');
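CG failing here is consistent with classical theory: without preconditioning, the iteration count grows roughly with the square root of the condition number, via the bound ||e_k||_A <= 2 ((sqrt(k)-1)/(sqrt(k)+1))^k ||e_0||_A. A small NumPy sanity check on a synthetic ill-conditioned SPD matrix (my own toy, not the SLAM data):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
d = np.logspace(0, 6, n)                  # eigenvalues from 1 to 1e6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(d) @ Q.T                  # SPD, condition number ~1e6

kappa = d.max() / d.min()
rate = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
# CG error bound: ||e_k||_A <= 2 * rate**k * ||e_0||_A,
# so reaching a 1e-4 reduction needs roughly:
k_needed = int(np.ceil(np.log(1e-4 / 2.0) / np.log(rate)))
```

For a condition number around 1e6 the bound predicts thousands of iterations, far beyond the 200-iteration cap used above, which matches the observed failure. A good block preconditioner shrinks the effective condition number and hence this count.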
2.4 Final Output
Algorithm | Full Time (s)
---|---
Direct solve | 1.2656
CG | Failed
PCG | 0.125
In short, we get a roughly 10x speedup over the direct solve!