Paper link: https://ieeexplore.ieee.org/document/10970072
Video link: https://ieeexplore.ieee.org/document/10970072/media#media
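Existing object-level SLAM methods process geometric features and semantic information separately and simply stack the results, overlooking the intrinsic connections and mutual constraints between the two and leaving them disconnected. For 3D object detection: (1) most existing methods rely on prior information or massive amounts of training data, so they struggle to adapt to complex task scenarios; (2) existing methods depend heavily on 2D bounding boxes and score candidate models using geometric features alone, so they cannot establish a good correspondence between geometric features and the object model, and the estimated 3D object models rarely reach high accuracy. For object-level SLAM: existing methods optimize camera poses using geometric features alone, while semantic information contributes only a simple intersection-over-union (IoU) constraint. This ignores the correspondence between geometric features and semantic information, greatly reduces their consistency within the SLAM system, and makes it difficult to provide sufficient and effective constraints for robot localization and mapping.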
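For readers unfamiliar with the IoU constraint mentioned above, the following is a minimal sketch of how such a term is commonly computed between a projected object box and a detected 2D bounding box. It is not taken from the paper: the names `box_iou` and `iou_residual`, the `(x_min, y_min, x_max, y_max)` box layout, and the `1 - IoU` residual form are all assumptions made for illustration.

```python
from typing import Tuple

# Assumed box layout (hypothetical, not from the paper):
# (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned 2D boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Degenerate boxes with zero area give an IoU of 0.
    return inter / union if union > 0.0 else 0.0


def iou_residual(projected: Box, detected: Box) -> float:
    """Illustrative residual: 0 when the boxes coincide, 1 when disjoint."""
    return 1.0 - box_iou(projected, detected)
```

A term of this kind depends only on how much the two boxes overlap. It carries no information about which geometric features belong to which object, which is the limitation the paragraph above points out.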
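To address these problems, this paper proposes TiMoSLAM, a semantic-geometric tightly coupled monocular visual object-level SLAM system. It considers the strict correspondence between semantic information and geometric features throughout 3D object representation, detection, data association, and joint optimization, thereby ensuring semantic-geometric consistency and achieving accurate 3D object detection and object-level SLAM.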
W. Zhu, J. Yuan*, X. Zhang and F. Chen, Bridging the Gap Between Semantics and Geometry in SLAM: A Semantic-Geometric Tight-Coupling Monocular Visual Object SLAM System. IEEE Transactions on Robotics (T-RO), doi: 10.1109/TRO.2025.3562440.
Abstract
Existing object-level SLAM methods often overlook the correspondence between semantic information and geometric features, resulting in a significant gap between them within SLAM frameworks. To tackle this issue, this paper proposes TiMoSLAM, a semantic-geometric tight-coupling monocular visual object SLAM system, which considers a rigorous correspondence between semantics and geometry across all steps of SLAM. Initially, a general Semantic Relation Graph (SRG) is developed to consistently represent semantic information alongside geometric features. Detailed analyses on complete constraints of the geometric feature combinations on estimation of 3D cuboid model are performed. Subsequently, a Compound Hypothesis Tree (CHT) is proposed to incrementally construct the object-specific SRG and concurrently estimate the 3D cuboid model of an object, ensuring semantic-geometric consistency in object representation and estimation. Special attention is given to the matching errors between geometric features and objects during the optimization of camera poses and object parameters. The effectiveness of this method is validated on various datasets, as well as in real-world environments.