Text Encoders
UNDER CONSTRUCTION
CLIP (Contrastive Language-Image Pre-training)
Contrary to popular usage, "CLIP" is not a single model but a training procedure that teaches a pair of encoders to embed text and images into the same embedding space. When used to refer to a specific model, it usually means OpenAI's original CLIP release, or OpenCLIP, the open-source reimplementation (which is not an OpenAI project).
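The training objective behind CLIP is a symmetric contrastive (InfoNCE) loss over a batch of matched image-text pairs: each image's embedding should be most similar to its own caption's embedding, and vice versa. A minimal numpy sketch of that loss; the function name and temperature value are illustrative, not from any particular implementation:

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched image-text pairs.

    image_emb, text_emb: (batch, dim) arrays where row i of each is a matched pair.
    """
    # L2-normalize so the dot product is cosine similarity
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by a learned temperature in real CLIP
    logits = image_emb @ text_emb.T / temperature
    n = logits.shape[0]

    def cross_entropy(l):
        # Row i's correct "class" is column i (the matched pair)
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the image->text and text->image directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```

With perfectly aligned, mutually orthogonal pairs the loss approaches zero; for random embeddings it sits near log(batch_size).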
LLM-As-Encoders
Modern models have largely shifted to using off-the-shelf LLMs such as Gemma, Qwen, etc. as text encoders, the idea being that they are trained on a massive corpus, which should give them a deep understanding of language.
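One common way to turn a decoder-only LLM into an encoder is to mean-pool its final-layer hidden states over the non-padding tokens (some setups instead take the last token's hidden state). A minimal numpy sketch of the pooling step, with shapes and names chosen for illustration rather than matching any particular library:

```python
import numpy as np

def mean_pool(hidden_states, attention_mask):
    """Collapse per-token LLM activations into one sentence embedding.

    hidden_states:  (seq_len, dim) last-layer activations from the LLM
    attention_mask: (seq_len,)     1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, None].astype(hidden_states.dtype)
    # Sum only the real tokens, then divide by how many there are
    summed = (hidden_states * mask).sum(axis=0)
    return summed / mask.sum()
```

The masking matters: padding positions still carry nonzero activations, and averaging over them would skew the embedding depending on sequence length.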