References
1.
Vaswani A, Shazeer N, Parmar N, et al.
Attention is all you need. Advances in neural information processing
systems. 2017;30.
2.
Runyon C. Using large language models (LLMs) to
apply analytic rubrics to score post-encounter notes. Medical
Teacher. Published online 2025:1-9.
3.
Sennrich R, Haddow B, Birch A. Neural machine
translation of rare words with subword units. arXiv preprint
arXiv:150807909. Published online 2015.
4.
Radford A, Wu J, Child R, et al. Language
models are unsupervised multitask learners. OpenAI blog.
2019;1(8):9.
5.
Firth J. A synopsis of linguistic theory,
1930-1955. Studies in linguistic analysis. Published online
1957:10-32.
6.
Mikolov T, Chen K, Corrado G, Dean J. Efficient
estimation of word representations in vector space. arXiv preprint
arXiv:13013781. Published online 2013.
7.
Pennington J, Socher R, Manning CD. Glove:
Global vectors for word representation. In: Proceedings of the 2014
Conference on Empirical Methods in Natural Language Processing
(EMNLP). 2014:1532-1543.
8.
Ouyang L, Wu J, Jiang X, et al. Training
language models to follow instructions with human feedback. Advances
in neural information processing systems.
2022;35:27730-27744.
9.
Ethayarajh K. How contextual are contextualized
word representations? Comparing the geometry of BERT, ELMo, and GPT-2
embeddings. arXiv preprint arXiv:190900512. Published online
2019.
10.
Reimers N, Gurevych I. Sentence-bert: Sentence
embeddings using siamese bert-networks. arXiv preprint
arXiv:190810084. Published online 2019.