LLMs (Large Language Models)
Language models + deep learning + huge Internet corpora = LLMs
Large Language Models
A large language model is a deep neural network trained on a large corpus of text data to perform various natural language processing (NLP) tasks, such as language translation, text generation, and text summarization.
The defining characteristic of a large language model is its huge number of parameters, which allows the model to capture a wide range of patterns and relationships in the text data. These models can also handle large vocabularies and generate high-quality text.
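To make "large number of parameters" concrete, here is a back-of-the-envelope count for a GPT-2-style decoder-only transformer. The layer breakdown is an assumption based on the public GPT-2 "small" configuration (token and position embeddings, QKV plus output projection in attention, a 4x MLP, and two LayerNorms per layer), not an official formula.

```python
# Rough parameter count for a GPT-2-like decoder-only transformer.
# The breakdown below is an assumption matching GPT-2 "small".

def transformer_param_count(vocab_size, d_model, n_layers, n_ctx):
    """Count learnable parameters of a GPT-2-like model."""
    embeddings = vocab_size * d_model + n_ctx * d_model  # token + position
    per_layer = (
        4 * d_model * d_model + 4 * d_model      # attention: QKV + output proj
        + 8 * d_model * d_model + 5 * d_model    # MLP: d -> 4d and 4d -> d
        + 2 * 2 * d_model                        # two LayerNorms (scale + bias)
    )
    final_ln = 2 * d_model                       # final LayerNorm
    return embeddings + n_layers * per_layer + final_ln

# GPT-2 "small" configuration: ~124M parameters
print(transformer_param_count(50257, 768, 12, 1024))  # 124439808
```

Scaling `d_model` and `n_layers` by an order of magnitude is what pushes models like GPT-3 into the hundreds of billions of parameters.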
One notable property of LLMs is what we call emergent abilities. It turns out that, at sufficient scale, these models learn how to solve new tasks they were never explicitly trained for, such as some reasoning skills, arithmetic problem solving, and few-shot learning. This was quite surprising at first, because LLMs were not designed with these objectives in mind.
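Few-shot learning means the model picks up a task from a handful of examples placed directly in the prompt, with no weight updates. A minimal sketch of how such a prompt is assembled (the `Input:`/`Output:` format here is an illustrative assumption, not a fixed standard):

```python
# Few-shot prompting sketch: the task is "taught" purely through examples
# embedded in the prompt; the model is expected to continue the pattern.

def build_few_shot_prompt(examples, query):
    """Assemble an in-context prompt from (input, output) example pairs."""
    blocks = []
    for text, label in examples:
        blocks.append(f"Input: {text}\nOutput: {label}")
    blocks.append(f"Input: {query}\nOutput:")  # the model completes from here
    return "\n\n".join(blocks)

examples = [
    ("The movie was fantastic", "positive"),
    ("I wasted two hours", "negative"),
]
print(build_few_shot_prompt(examples, "A delightful surprise"))
```

Sent to a sufficiently large model, a prompt like this typically elicits the correct label for the final input, even though sentiment classification was never an explicit training objective.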
Examples of large language models are GPT, BERT, RoBERTa, T5, and BLOOM. GPT (Generative Pre-trained Transformer) is a family of models developed by OpenAI that is behind huge neural networks such as GPT-3.
Additional fine-tuning processes, supervised and even using RLHF (Reinforcement Learning from Human Feedback), allow these models to be adapted to specific tasks and domains, and can reach very high accuracy on particular tasks.
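RLHF typically begins by training a reward model on pairs of responses ranked by humans. A common choice is a pairwise (Bradley-Terry style) loss that pushes the score of the human-preferred response above the rejected one. A minimal sketch, with illustrative scores rather than outputs of a real reward model:

```python
import math

# Pairwise preference loss used when training an RLHF reward model:
# -log sigmoid(r_chosen - r_rejected). The loss is small when the
# preferred response is scored well above the rejected one.

def preference_loss(score_chosen, score_rejected):
    """Bradley-Terry style loss on a single human preference pair."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the reward model ranks the preferred answer higher.
print(preference_loss(2.0, 0.0))   # small: correct ranking, clear margin
print(preference_loss(0.0, 2.0))   # large: rejected answer scored higher
```

Once the reward model is trained, the LLM itself is optimized (e.g. with a policy-gradient method) to produce responses that score highly under it.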