Multimodal aerial vehicle object detection network based on wavelet transform
李洪玉,韩晶,吕学强
Abstract:
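Differences between the visible and infrared modalities cause spatial and semantic misalignment in vehicle detection under low-light conditions; in addition, the limited resolution of infrared images makes it difficult to extract features of small vehicle targets. To address these problems, a wavelet transform-based multimodal aerial vehicle detection network (WAVDNet) for the unmanned aerial vehicle (UAV) perspective is proposed. First, a wavelet transform-based feature enhancement module is designed, which uses high-frequency information to strengthen feature extraction and, on that basis, selects key feature vectors to suppress interference from redundant information across modalities. Second, a deformable attention module is designed, which uses high-frequency information to adaptively adjust the sampling points, resolving the spatial and semantic alignment problems between modalities and adaptively fusing visible and infrared features at multiple semantic levels. Finally, experiments on two benchmark datasets, DroneVehicle and VEDAI, show that the proposed algorithm reaches an mAP@0.5 of 81.7% and 90.7%, respectively, outperforming a range of current state-of-the-art algorithms and exceeding the second-best algorithm by 1.4 and 3.5 percentage points, respectively, which verifies its effectiveness.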
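To make the notion of "high-frequency information" concrete, the following is a minimal sketch of a single-level 2-D Haar wavelet decomposition and a per-location high-frequency energy map. It is an illustration under stated assumptions, not the WAVDNet module itself: the abstract does not specify the wavelet basis, the number of decomposition levels, or how the selection of key features is scored, so the Haar basis and the names `haar_dwt2` and `high_frequency_energy` are hypothetical choices made here.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2-D Haar transform (illustrative only).

    x must be a 2-D array with even height and width. Returns four
    half-resolution subbands: the low-frequency approximation and three
    high-frequency detail bands.
    """
    a = x[0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0       # low-frequency content
    row_detail = (a + b - c - d) / 2.0   # change between the two rows
    col_detail = (a - b + c - d) / 2.0   # change between the two columns
    diag_detail = (a - b - c + d) / 2.0  # diagonal change
    return approx, row_detail, col_detail, diag_detail


def high_frequency_energy(x):
    """Sum of squared detail coefficients at each half-resolution location.

    Assumption: a map like this could serve as a crude saliency score for
    edges and small objects; the paper's actual scoring is not given here.
    """
    _, row_detail, col_detail, diag_detail = haar_dwt2(x)
    return row_detail**2 + col_detail**2 + diag_detail**2


if __name__ == "__main__":
    image = np.zeros((4, 4))
    image[:, 1:] = 1.0  # vertical step edge between columns 0 and 1
    print(high_frequency_energy(image))
```

In this toy input the energy is nonzero only in the 2x2 blocks that straddle the step edge, while flat regions score zero, which is the property a high-frequency-guided module would rely on to emphasize object boundaries.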
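Keywords: visible-infrared; multimodal fusion; wavelet transform; feature selection; deformable attention
Foundation: National Natural Science Foundation of China (62171043); Beijing Natural Science Foundation (4254096); General Science and Technology Project of the Scientific Research Program of the Beijing Municipal Education Commission (KM202311232003)
Author: 李洪玉,韩晶,吕学强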
DOI: 10.16508/j.cnki.11-5866/n.2025.06.007