Pre-trained models
What is a pre-trained model?
A pre-trained model is a saved network that was previously trained on a large dataset, typically for a large-scale task. It can be used as is, or customized for a new task via transfer learning, as sketched below.
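A minimal sketch of both options, assuming the Hugging Face transformers library (with PyTorch) is installed; the checkpoint name bert-base-uncased and the two-label head are illustrative choices, not part of these notes:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Reuse the pre-trained BERT weights and add a fresh classification head;
# fine-tuning this model on a downstream task is transfer learning.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Option 1: use the pre-trained backbone as is (feature extraction):
# freeze it and train only the new classification head.
for param in model.bert.parameters():
    param.requires_grad = False

# Option 2: leave everything trainable and fine-tune the whole network.

inputs = tokenizer("Pre-trained models save compute.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # torch.Size([1, 2])
```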
Why use a pre-trained model?
Training big models (with huge numbers of parameters) from scratch is expensive:
It requires a lot of computing power
It involves a lot of training time
It needs very large training corpora
Let's take a look at the parameters of some of the most powerful language models...
Deep learning neural network parameters until June 2021
Neural network parameters until December 2022
Some of the biggest language models we have nowadays
| Model | Lab | Selected playgrounds | Parameters (B) | Tokens trained (B) | Ratio T:P (Chinchilla scaling) | Training dataset | Announced | Public? | Released | Paper/Repo | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4 | OpenAI | | TBA | | | 🆆 📚 ⬆ 🕸 🌋 | | | | | |
| BERT-480 | Google Research | | 480 | | | 🆆 📚 🕸 | Nov/2021 | 🔴 | N/A | | Submission to benchmarks. Original dataset was BookCorpus + Wikipedia: https://arxiv.org/pdf/1810.04805.pdf |
| OPT-IML | Meta AI | | 175 | 300 | 2:1 | 🆆 📚 ⬆ 🕸 | Dec/2022 | 🟢 | Dec/2022 | | Instruct |
| ChatGPT | OpenAI | | 175 | 300 | 2:1 | 🆆 📚 ⬆ 🕸 | Nov/2022 | 🟢 | Nov/2022 | | Instruct with strict policies ("extremely limited") |
| GLM-130B | Tsinghua & Zhipu | | 130 | 400 | 4:1 | 🆆 📚 ⬆ 🕸 | Aug/2022 | 🟢 | Aug/2022 | | 50% English (200B tokens), so included here |
| YaLM 100B | Yandex | GitHub (train/deploy) | 100 | 300 | 3:1 | 🆆 📚 ⬆ 🕸 | Jun/2022 | 🟢 | Jun/2022 | | Megatron-LM clone, Russian/English: https://medium.com/yandex/yandex-publishes-yalm-100b-its-the-largest-gpt-like-neural-network-in-open-source-d1df53d0e9a6 |
| xlarge | Cohere | | 52.4 | | | 📚 🕸 | Sep/2021 | 🟢 | Nov/2021 | | Stealth 'ebooks and webpages'. 52B: https://crfm.stanford.edu/helm/v1.0/?models=1 |
| AlexaTM 20B | Amazon Alexa AI | GitHub (train/deploy) | 20 | 1000 | 50:1 | 🆆 🕸 | Aug/2022 | 🟢 | TBA | | Wikipedia and mC4 only. seq2seq |
Key:
🆆 Wikipedia
👥 Dialogue
📚 Books
🆀🅰 Questions and answers
⬆ Reddit outbound
🌋 Special
🕸 Common Crawl
🇫🇷 French
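The Ratio T:P column in the table above is simply tokens trained divided by parameters (both in billions); for reference, the Chinchilla scaling results suggest roughly 20 tokens per parameter for compute-optimal training. A quick sanity check of the table's rounded ratios (a small illustrative Python snippet, not part of the original table):

```python
# Recompute the "Ratio T:P" column from the table's own numbers
# (tokens trained and parameters, both in billions).
rows = {
    "OPT-IML":     (300, 175),   # ~1.7, shown as 2:1
    "ChatGPT":     (300, 175),   # ~1.7, shown as 2:1
    "GLM-130B":    (400, 130),   # ~3.1, shown as 4:1 in the table
    "YaLM 100B":   (300, 100),   # 3.0,  shown as 3:1
    "AlexaTM 20B": (1000, 20),   # 50.0, shown as 50:1
}
for name, (tokens_b, params_b) in rows.items():
    print(f"{name}: {tokens_b / params_b:.1f} tokens per parameter")
```

The table appears to round these values to whole ratios.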