A study on the adaptability of large language models to answering English grammar questions in the Chinese Gaokao
SHEN Wei, ZHANG Yangsen
Abstract:
To explore how well large language models adapt to answering grammar questions in the Chinese Gaokao English exam, three progressive tasks were designed around Gaokao grammar items: answer matching, explanation, and grammatical error correction, and large language models of different parameter scales were evaluated on them. The results show that small models such as Qwen-7B perform acceptably on basic grammar tasks but fall clearly short when handling complex grammatical structures, whereas large models such as GPT-4 perform excellently, reaching 90.68% zero-shot error-correction accuracy; even after fine-tuning, small models still lag behind large models, indicating that parameter scale has a significant impact on model performance. Human evaluation further shows that explanation quality is highly correlated with answer correctness, and that large models come closer to human performance but exhibit an "answer-driven" bias and lack hierarchical reasoning.
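As a rough illustration of the answer-matching task described in the abstract, the sketch below shows how a zero-shot evaluation loop over multiple-choice Gaokao grammar items might be organized: each item is turned into a prompt, the model's free-form reply is parsed for an option letter, and accuracy is the proportion of items whose parsed choice matches the answer key. The `ask_model` stub, the prompt wording, and the sample item are placeholder assumptions, not the paper's actual evaluation code or data.

```python
import re

def ask_model(prompt: str) -> str:
    # Stand-in for a real chat-completion call to the model under test
    # (e.g. GPT-4 or Qwen-7B). It returns a fixed reply here so the
    # sketch runs end to end without any API credentials.
    return "The correct option is B."

# One hypothetical Gaokao-style single-choice grammar item (not from the paper's dataset).
items = [
    {
        "stem": "Not until he got home ___ that he had left his keys at the office.",
        "options": {"A": "he realized", "B": "did he realize",
                    "C": "he did realize", "D": "realized he"},
        "gold": "B",
    },
]

def build_prompt(item: dict) -> str:
    # Zero-shot prompt: no worked examples, only the item plus an answer-format instruction.
    opts = "\n".join(f"{key}. {text}" for key, text in item["options"].items())
    return (
        "Choose the option that makes the sentence grammatically correct. "
        "Reply with a single letter (A, B, C or D).\n"
        f"{item['stem']}\n{opts}"
    )

def extract_choice(reply: str) -> str | None:
    # Pull the first standalone option letter out of the model's free-form reply.
    match = re.search(r"\b([ABCD])\b", reply)
    return match.group(1) if match else None

def answer_match_accuracy(items: list[dict]) -> float:
    # Answer matching: the model counts as correct only when its parsed
    # choice equals the official answer key.
    correct = sum(extract_choice(ask_model(build_prompt(it))) == it["gold"] for it in items)
    return correct / len(items)

if __name__ == "__main__":
    print(f"answer-match accuracy: {answer_match_accuracy(items):.2%}")
```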
Keywords: large language models; Gaokao English grammar; grammatical error correction; large language model evaluation; adaptability
Foundation: Beijing Natural Science Foundation - Xiaomi Innovation Joint Fund project (L233008)
Author: SHEN Wei, ZHANG Yangsen
DOI: 10.16508/j.cnki.11-5866/n.2025.05.004