Pre-trained models

What is a pre-trained model?

A pre-trained model is a saved network that was previously trained on a large dataset, typically for a large-scale task. It can be used as it is, or customized for a new task through transfer learning.
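
As a concrete illustration, here is a minimal transfer-learning sketch using the Hugging Face transformers and datasets libraries (the model name "bert-base-uncased", the "imdb" dataset, and the hyperparameters are illustrative choices, not part of the original material): the pre-trained weights are reused and fine-tuned for a two-class sentiment task.

```python
# Minimal transfer-learning sketch (illustrative): reuse pre-trained BERT
# weights and fine-tune them on a small slice of the IMDB sentiment dataset.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Pre-trained encoder plus a freshly initialised 2-class classification head.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256,
                     padding="max_length")

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-imdb", num_train_epochs=1,
                           per_device_train_batch_size=8),
    # A small subset keeps the example cheap to run.
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
)
trainer.train()
```

Using the model "as it is" is even simpler: for example, transformers.pipeline("sentiment-analysis") downloads already fine-tuned weights and runs inference with no training at all.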

Why use pre-trained models?

Training big models (with huge numbers of parameters) from scratch is expensive (see the back-of-the-envelope sketch after this list):

  • It needs a lot of computing power

  • It involves a lot of training time

  • It requires large training corpora
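
As a rough illustration of the cost (my own back-of-the-envelope numbers, not from the original slides), the sketch below estimates how much memory the weights alone occupy for a few of the models listed later in this section:

```python
# Back-of-the-envelope estimate (illustrative): memory needed just to hold the
# weights in 16-bit floats, ignoring gradients, optimizer state and activations.
def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    return params_billion * 1e9 * bytes_per_param / 1e9

for name, params_b in [("Chinchilla", 70), ("GPT-3", 175), ("PaLM", 540)]:
    print(f"{name}: ~{weight_memory_gb(params_b):,.0f} GB of fp16 weights")
```

Even before training starts, a 175B-parameter model therefore needs on the order of 350 GB just for its fp16 weights, far more than a single GPU can hold.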

Let's take a look at the parameters of some of the most powerful language models...

Chart: deep learning neural network parameters, up to June 2021

Chart: neural network parameters, up to December 2022

Some of the biggest language models available today:

| Model | Lab | Selected playgrounds | Parameters (B) | Tokens trained (B) | Ratio T:P (Chinchilla scaling) | Training dataset | Announced | Public? | Released | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4 | OpenAI | | TBA | | | 🆆 📚 ⬆ 🕸 🌋 | - | | | |
| | Inflection | | TBA | | | 🕸 | | | | Devs from DeepMind |
| LaMDA 2 | Google AI | YouTube (video only) | | | | ⬆ 🕸 👥 | May/2022 | 🟡 | TBA | Chatbot with tiny walled garden demo TBA |
| Fairseq | Meta AI | | 13 & 1100 | | | 🆆 📚 ⬆ 🕸 🕸 🕸 | Dec/2021 | 🟢 | Dec/2021 | |
| GLaM | Google Inc | | 1200 | | | 🆆 📚 ⬆ 🕸 👥 | Dec/2021 | 🔴 | N/A | |
| PaLM | Google Research | | 540 | 780 | 2:1 | 🆆 📚 ⬆ 🕸 👥 | Apr/2022 | 🔴 | N/A | |
| MT-NLG | Microsoft/NVIDIA | | 530 | 270 | 1:1 | 🆆 📚 ⬆ 🌋 🕸 🕸 | Oct/2021 | 🔴 | N/A | |
| BERT-480 | Google Research | | 480 | | | 🆆 📚 🕸 | Nov/2021 | 🔴 | N/A | Submission to benchmarks. Original dataset was BookCorpus + Wikipedia: https://arxiv.org/pdf/1810.04805.pdf |
| Gopher | DeepMind | | 280 | 300 | 2:1 | 🆆 📚 ⬆ 🕸 🌋 | Dec/2021 | 🔴 | N/A | |
| Luminous | Aleph Alpha | | | | | 🕸 | Nov/2021 | 🟢 | Apr/2022 | Devs from EleutherAI |
| Jurassic-1 | AI21 | | 178 | 300 | 2:1 | 🆆 📚 ⬆ 🕸 | Aug/2021 | 🟢 | Aug/2021 | Emulated GPT-3 dataset |
| BLOOMZ | BigScience | | 176 | 366 | 3:1 | ⬆ 🕸 | Nov/2022 | 🟢 | Nov/2022 | Fine-tuned |
| OPT-IML | Meta AI | | 175 | 300 | 2:1 | 🆆 📚 ⬆ 🕸 | Dec/2022 | 🟢 | Dec/2022 | Instruct |
| ChatGPT | OpenAI | | 175 | 300 | 2:1 | 🆆 📚 ⬆ 🕸 | Nov/2022 | 🟢 | Nov/2022 | Instruct with strict policies ("extremely limited") |
| BlenderBot 3 | Meta AI | blenderbot.ai (US only) | 175 | | | 🆆 📚 ⬆ 🕸 | Aug/2022 | 🟢 | Aug/2022 | |
| GPT-3 | OpenAI | | 175 | 300 | 2:1 | 🆆 📚 ⬆ 🕸 | May/2020 | 🟢 | Nov/2021 | Popular: 3.1M wpm |
| FLAN | Google | | 137 | | | ⬆ 🕸 👥 | Sep/2021 | 🔴 | N/A | Fine-tuned LaMDA |
| LaMDA | Google AI | YouTube (video only) | 137 | 168 | 2:1 | ⬆ 🕸 👥 | Jun/2021 | 🔴 | N/A | Chatbot |
| GLM-130B | Tsinghua & Zhipu | | 130 | 400 | 4:1 | 🆆 📚 ⬆ 🕸 | Aug/2022 | 🟢 | Aug/2022 | 50% English (200B tokens), so included here |
| Galactica | Meta AI | | 120 | 450 | 4:1 | 📚 | Nov/2022 | 🟢 | Nov/2022 | Scientific only |
| YaLM 100B | Yandex | GitHub (train/deploy) | 100 | 300 | 3:1 | 🆆 📚 ⬆ 🕸 | Jun/2022 | 🟢 | Jun/2022 | |
| Sparrow | DeepMind | | 70 | 1400 | 20:1 | 🆆 📚 ⬆ 🕸 🌋 | Sep/2022 | 🔴 | N/A | Chatbot as a fine-tuned version of Chinchilla 70B |
| Chinchilla | DeepMind | | 70 | 1400 | 20:1 | 🆆 📚 ⬆ 🕸 🌋 | Mar/2022 | 🔴 | N/A | First to double tokens per size increase |
| NLLB | Meta AI | GitHub (train/deploy) | 54.5 | | | 🌋 | Jul/2022 | 🟢 | Jul/2022 | 54.5B MoE, 3.3B dense. 200+ languages |
| xlarge | Cohere | | 52.4 | | | 📚 🕸 | Sep/2021 | 🟢 | Nov/2021 | Stealth 'ebooks and webpages'. 52B: https://crfm.stanford.edu/helm/v1.0/?models=1 |
| RL-CAI | Anthropic | | 52 | | | 🆆 📚 ⬆ 🕸 👥 | Dec/2022 | 🔴 | N/A | RLAIF = reinforcement learning with AI feedback |
| AlexaTM 20B | Amazon Alexa AI | GitHub (train/deploy) | 20 | 1000 | 50:1 | 🆆 🕸 | Aug/2022 | 🟢 | TBA | Wikipedia and mC4 only; seq2seq |

Key:

🆆 Wikipedia

👥 Dialogue

📚 Books

🆀🅰 Questions and answers

⬆ Reddit outbound

🌋 Special

🕸 Common Crawl

🇫🇷 French
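
The "Ratio T:P (Chinchilla scaling)" column refers to the Chinchilla finding that compute-optimal training uses roughly 20 training tokens per model parameter. A small sketch (my own, illustrative, with a few rows taken from the table above) to reproduce that ratio:

```python
# Chinchilla-style check (illustrative): the heuristic suggests roughly
# 20 training tokens per parameter for compute-optimal training.
def tokens_per_param(tokens_billion: float, params_billion: float) -> float:
    return tokens_billion / params_billion

models = {
    "Chinchilla": (1400, 70),   # (tokens trained in B, parameters in B)
    "GPT-3": (300, 175),
    "PaLM": (780, 540),
}
for name, (tokens_b, params_b) in models.items():
    ratio = tokens_per_param(tokens_b, params_b)
    print(f"{name}: {ratio:.1f} tokens per parameter "
          f"({'near' if ratio >= 20 else 'below'} the ~20:1 Chinchilla target)")
```

Most of the models in the table sit well below the ~20:1 target, which is exactly the point the Chinchilla work made: many large models are under-trained for their size.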
