Contrastive learning-based similarity calculation for mechanical patent drawings
孙潇岳,吕学强,韩晶,游新冬
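Abstract:
In the automated evaluation of mechanical patents, comparing the semantic similarity of patent drawings is a key step. Existing deep learning methods adapt poorly to the distinctive data characteristics of mechanical patent drawings. To address this, a contrastive learning-based method for computing the similarity of mechanical patent drawings, ViT-Mech, is proposed. The PatchMix method is introduced to construct image patch sequences: training samples and soft labels are generated by mixing multiple images, which simulates the similarity relationships among complex structures and strengthens the model's understanding of semantic information. By combining a spatial transformer network (STN) with DINOv2 (distillation with no labels version 2), a Spatial DINO module is proposed for feature extraction. It uses the self-distilled vision Transformer (ViT) weights of DINOv2 to handle the visual features of mechanical drawings, which differ from those of real-world images, and uses the STN to improve the model's robustness to different drawing styles. A small-sample validation set annotated with four levels of similarity is constructed to evaluate model performance. Experimental results show that ViT-Mech reaches an accuracy of 70.0% on the similarity evaluation task, 2.0 percentage points higher than DINOv2.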
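The patch-mixing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `patch_mix`, the cyclic choice of source images, and the even split of patches between sources are all assumptions made for illustration. Each mixed sample keeps its patch positions but draws every patch from one of several source images, and its soft label records the fraction of patches contributed by each source.

```python
import random


def patch_mix(patch_seqs, num_sources=2, seed=0):
    """Mix patch sequences across images and return soft labels (illustrative only).

    patch_seqs: list of B images, each a list of P patches (any objects).
    Returns (mixed, soft_labels): mixed[i] is a list of P patches, and
    soft_labels[i][j] is the fraction of patches in mixed[i] taken from image j.
    """
    rng = random.Random(seed)
    batch = len(patch_seqs)
    num_patches = len(patch_seqs[0])
    mixed, soft_labels = [], []
    for i in range(batch):
        # Assumption: image i and the following images (cyclically) are the sources.
        sources = [(i + k) % batch for k in range(num_sources)]
        # Assign each patch position to a source as evenly as possible, then shuffle.
        owners = [sources[p % num_sources] for p in range(num_patches)]
        rng.shuffle(owners)
        # Position p of the mixed sample is filled with position p of its owner image.
        mixed.append([patch_seqs[owners[p]][p] for p in range(num_patches)])
        # Soft label: share of patches that came from each image in the batch.
        label = [0.0] * batch
        for owner in owners:
            label[owner] += 1.0 / num_patches
        soft_labels.append(label)
    return mixed, soft_labels


# Toy usage: 3 "images", each split into 4 placeholder patches.
images = [[f"img{j}_p{p}" for p in range(4)] for j in range(3)]
mixed, labels = patch_mix(images, num_sources=2, seed=0)
print(labels[0])  # [0.5, 0.5, 0.0]
```

In a contrastive setup, soft labels of this kind would replace the usual one-hot positive/negative targets, so that a mixed sample is treated as partially similar to each of its source images.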
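Keywords: computer vision; contrastive learning; mechanical patent drawings; feature extraction; similarity calculation
Foundation: National Natural Science Foundation of China (62171043); Beijing Natural Science Foundation (4254096); General Science and Technology Project of the Beijing Municipal Education Commission Scientific Research Plan (KM202311232003)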
Author: 孙潇岳, 吕学强, 韩晶, 游新冬
DOI: 10.16508/j.cnki.11-5866/n.2025.06.008