Pre-trained models
A pre-trained model is a saved network that was previously trained on a large dataset, typically for a large-scale task. You can use the pre-trained model as is, or customize it for your own task with transfer learning.
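As a minimal sketch of both options (assuming the Hugging Face `transformers` library and the publicly released `gpt2` checkpoint, neither of which is prescribed above):

```python
# Minimal sketch: load a pre-trained language model and (1) use it as-is,
# (2) set it up for transfer learning (fine-tuning on a smaller dataset).
# Assumes the Hugging Face `transformers` library and the public "gpt2" checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # download pre-trained tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")   # download pre-trained weights

# (1) Use the model as-is: generate a continuation for a prompt.
inputs = tokenizer("Pre-trained models are useful because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# (2) Transfer learning: keep the pre-trained weights as the starting point and
# continue training on task-specific data. Only the optimizer setup is shown here;
# the training loop itself is standard PyTorch.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
```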
Training big models (with huge numbers of parameters) from scratch is expensive (a rough cost estimate follows this list):
- It requires a lot of computing power.
- It involves a lot of training time.
- It needs large training corpora.
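To make "expensive" concrete, here is a back-of-the-envelope estimate using the common approximation that training compute is about 6 × parameters × tokens, with Chinchilla's figures from the table below. The A100 throughput figure is an assumption used only for scale:

```python
# Back-of-the-envelope training cost using the common approximation
# C ≈ 6 * N * D floating-point operations (N = parameters, D = training tokens).
# Figures are Chinchilla's row from the table below: 70B parameters, 1,400B tokens.
N = 70e9             # parameters
D = 1400e9           # training tokens
flops = 6 * N * D
print(f"{flops:.2e} FLOPs")                              # ~5.9e+23 FLOPs

# At the 312 TFLOPS bf16 peak of one NVIDIA A100, ignoring all efficiency losses,
# that is roughly 5.9e23 / 3.12e14 ≈ 1.9e9 GPU-seconds ≈ 60 GPU-years.
gpu_seconds = flops / 312e12
print(f"{gpu_seconds / (3600 * 24 * 365):.0f} GPU-years")  # ~60
```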
Let's take a look at the parameters of some of the most powerful language models...
| Model | Lab | Parameters (B) | Tokens trained (B) | Ratio T:P (Chinchilla scaling) | Training dataset | Announced | Public? | Released | Notes |
|---|---|---|---|---|---|---|---|---|---|
| GPT-4 | OpenAI | TBA | | | 🆆 📚 ⬆ 🕸 🌋 | | | | |
| | Inflection | TBA | | | 🕸 | | | | Devs from DeepMind |
| LaMDA 2 | Google AI | | | | ⬆ 🕸 👥 | May/2022 | 🟡 | TBA | Chatbot with tiny walled-garden demo TBA |
| Fairseq | Meta AI | 13 & 1100 | | | 🆆 📚 ⬆ 🕸 🕸 🕸 | Dec/2021 | 🟢 | Dec/2021 | |
| GLaM | Google Inc | 1200 | | | 🆆 📚 ⬆ 🕸 👥 | Dec/2021 | 🔴 | N/A | |
| PaLM | Google Research | 540 | 780 | 2:1 | 🆆 📚 ⬆ 🕸 👥 | Apr/2022 | 🔴 | N/A | |
| MT-NLG | Microsoft/NVIDIA | 530 | 270 | 1:1 | 🆆 📚 ⬆ 🌋 🕸 🕸 | Oct/2021 | 🔴 | N/A | |
| BERT-480 | Google Research | 480 | | | 🆆 📚 🕸 | Nov/2021 | 🔴 | N/A | |
| Gopher | DeepMind | 280 | 300 | 2:1 | 🆆 📚 ⬆ 🕸 🌋 | Dec/2021 | 🔴 | N/A | |
| Luminous | Aleph Alpha | | | | 🕸 | Nov/2021 | 🟢 | Apr/2022 | Devs from EleutherAI |
| Jurassic-1 | AI21 | 178 | 300 | 2:1 | 🆆 📚 ⬆ 🕸 | Aug/2021 | 🟢 | Aug/2021 | Emulated GPT-3 dataset |
| BLOOMZ | BigScience | 176 | 366 | 3:1 | ⬆ 🕸 | Nov/2022 | 🟢 | Nov/2022 | Fine-tuned |
| OPT-IML | Meta AI | 175 | 300 | 2:1 | 🆆 📚 ⬆ 🕸 | Dec/2022 | 🟢 | Dec/2022 | Instruct |
| ChatGPT | OpenAI | 175 | 300 | 2:1 | 🆆 📚 ⬆ 🕸 | Nov/2022 | 🟢 | Nov/2022 | Instruct with strict policies ("extremely limited") |
| BlenderBot 3 | Meta AI | 175 | | | 🆆 📚 ⬆ 🕸 | Aug/2022 | 🟢 | Aug/2022 | |
| GPT-3 | OpenAI | 175 | 300 | 2:1 | 🆆 📚 ⬆ 🕸 | May/2020 | 🟢 | Nov/2021 | Popular: 3.1M wpm |
| FLAN | Google | 137 | | | ⬆ 🕸 👥 | Sep/2021 | 🔴 | N/A | Fine-tuned LaMDA |
| LaMDA | Google AI | 137 | 168 | 2:1 | ⬆ 🕸 👥 | Jun/2021 | 🔴 | N/A | Chatbot |
| GLM-130B | Tsinghua & Zhipu | 130 | 400 | 4:1 | 🆆 📚 ⬆ 🕸 | Aug/2022 | 🟢 | Aug/2022 | 50% English (200B tokens), so included here |
| Galactica | Meta AI | 120 | 450 | 4:1 | 📚 | Nov/2022 | 🟢 | Nov/2022 | Scientific only |
| YaLM 100B | Yandex | 100 | 300 | 3:1 | 🆆 📚 ⬆ 🕸 | Jun/2022 | 🟢 | Jun/2022 | |
| Sparrow | DeepMind | 70 | 1400 | 20:1 | 🆆 📚 ⬆ 🕸 🌋 | Sep/2022 | 🔴 | N/A | Chatbot, a fine-tuned version of Chinchilla 70B |
| Chinchilla | DeepMind | 70 | 1400 | 20:1 | 🆆 📚 ⬆ 🕸 🌋 | Mar/2022 | 🔴 | N/A | First to double tokens per size increase |
| NLLB | Meta AI | 54.5 | | | 🌋 | Jul/2022 | 🟢 | Jul/2022 | 54.5B MoE, 3.3B dense. 200+ languages |
| xlarge | Cohere | 52.4 | | | 📚 🕸 | Sep/2021 | 🟢 | Nov/2021 | Stealth 'ebooks and webpages'. 52B: https://crfm.stanford.edu/helm/v1.0/?models=1 |
| RL-CAI | Anthropic | 52 | | | 🆆 📚 ⬆ 🕸 👥 | Dec/2022 | 🔴 | N/A | RLAIF = reinforcement learning from AI feedback |
| AlexaTM 20B | Amazon Alexa AI | 20 | 1000 | 50:1 | 🆆 🕸 | Aug/2022 | 🟢 | TBA | Wikipedia and mC4 only; seq2seq |
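The "Ratio T:P" column is simply tokens trained divided by parameters. Chinchilla's result is that roughly 20 tokens per parameter is compute-optimal, which is why most earlier models look under-trained. A quick sanity check against a few rows of the table (the table shows these rounded to whole-number ratios):

```python
# Recompute the Ratio T:P column for a few rows of the table above.
# Parameters and tokens are both in billions, exactly as listed.
rows = {
    "GPT-3":       (175, 300),
    "PaLM":        (540, 780),
    "Chinchilla":  (70, 1400),
    "AlexaTM 20B": (20, 1000),
}
for name, (params_b, tokens_b) in rows.items():
    print(f"{name:<12} {tokens_b / params_b:5.1f} tokens per parameter")
# GPT-3 ~1.7, PaLM ~1.4, Chinchilla 20.0, AlexaTM 20B 50.0
# Only the Chinchilla-era models reach the ~20:1 compute-optimal ratio.
```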
Key:
🆆 Wikipedia
👥 Dialogue
📚 Books
🆀🅰 Questions and answers
⬆ Reddit outbound
🌋 Special
🕸 Common Crawl
🇫🇷 French
🟢 Public
🟡 Limited access
🔴 Not public
Additional notes:
BERT-480: submission to benchmarks. Original dataset was BookCorpus + Wikipedia.
YaLM 100B: Megatron-LM clone, Russian/English.