References

1. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Advances in Neural Information Processing Systems. 2017;30.
2. Runyon C. Using large language models (LLMs) to apply analytic rubrics to score post-encounter notes. Medical Teacher. Published online 2025:1-9.
3. Sennrich R, Haddow B, Birch A. Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909. Published online 2015.
4. Radford A, Wu J, Child R, et al. Language models are unsupervised multitask learners. OpenAI Blog. 2019;1(8):9.
5. Firth J. A synopsis of linguistic theory, 1930-1955. Studies in Linguistic Analysis. Published online 1957:10-32.
6. Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. Published online 2013.
7. Pennington J, Socher R, Manning CD. GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014:1532-1543.
8. Ouyang L, Wu J, Jiang X, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems. 2022;35:27730-27744.
9. Ethayarajh K. How contextual are contextualized word representations? Comparing the geometry of BERT, ELMo, and GPT-2 embeddings. arXiv preprint arXiv:1909.00512. Published online 2019.
10. Reimers N, Gurevych I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084. Published online 2019.