Text Encoders

UNDER CONSTRUCTION

CLIP (Contrastive Language-Image Pre-training)

Contrary to popular usage, "CLIP" is not a single model but a training procedure that teaches a pair of encoders to embed text and images into the same embedding space. When "CLIP" is used as the name of one model, it usually refers to one of OpenAI's original CLIP releases (e.g. ViT-L/14); OpenCLIP is the open-source reimplementation from mlfoundations, with weights trained by LAION, not an OpenAI product.
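The core of the procedure is a symmetric contrastive loss: matched image/text pairs are pulled together and mismatched pairs pushed apart. A minimal NumPy sketch of that loss (function name and temperature value are illustrative, not taken from any specific implementation):

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    # L2-normalize so the dot product equals cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    # Pairwise similarity logits for every image against every caption
    logits = img @ txt.T / temperature
    n = logits.shape[0]
    labels = np.arange(n)  # matched pairs sit on the diagonal

    def xent(l):
        # Numerically stable log-softmax, then pick the diagonal entries
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Symmetric: image-to-text and text-to-image directions
    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
txt = img + 0.01 * rng.normal(size=(4, 8))  # near-matched pairs -> low loss
loss = clip_loss(img, txt)
```

After training with this objective, either encoder can be used alone, which is why the text tower shows up as a standalone text encoder in downstream models.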

LLM-As-Encoders

Modern models have largely shifted to using off-the-shelf LLMs such as Gemma, Qwen, etc. as encoders, the idea being that they are trained on a massive corpus, which should give them a deep understanding of language.
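Since a decoder-only LLM emits per-token hidden states rather than a single vector, a common recipe is to run the prompt through the frozen LLM and pool the last-layer hidden states over the real (non-padding) tokens. A minimal sketch of that pooling step, with toy arrays standing in for the model's actual output:

```python
import numpy as np

def mean_pool(hidden_states, attention_mask):
    # hidden_states: (seq_len, dim) last-layer activations from the LLM
    # attention_mask: (seq_len,) 1 for real tokens, 0 for padding
    mask = attention_mask[:, None].astype(float)
    # Average only over real tokens so padding doesn't dilute the embedding
    return (hidden_states * mask).sum(axis=0) / mask.sum()

# Toy stand-in: 3 tokens of a 2-dim model, last one is padding
h = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [9.0, 9.0]])
emb = mean_pool(h, np.array([1, 1, 0]))  # -> [2.0, 3.0]
```

Variants exist (last-token pooling, learned query tokens), and the resulting vector is typically passed through a small projection layer to match the downstream model's dimensionality.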