Pre-trained models

What is a pre-trained model?

A pre-trained model is a saved network that was previously trained on a large dataset, typically for a large-scale task. You can use the pre-trained model as it is, or customize it for a new task through transfer learning.
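A minimal sketch of both options, assuming the Hugging Face transformers library (not prescribed by these notes); the distilbert-base-uncased checkpoint and the two-label classification head are illustrative choices:

```python
# Two ways to use a pre-trained model (sketch; checkpoint name is illustrative).
from transformers import AutoTokenizer, AutoModel, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased"           # network pre-trained on a large corpus
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# 1) Use the pre-trained network as it is, e.g. as a frozen feature extractor.
encoder = AutoModel.from_pretrained(checkpoint)
inputs = tokenizer("Pre-trained models save a lot of compute.", return_tensors="pt")
features = encoder(**inputs).last_hidden_state   # contextual embeddings, no training needed

# 2) Transfer learning: reuse the pre-trained body and fine-tune it on a new task
#    (here, a 2-class classification head is added on top and trained on your data).
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
for param in model.base_model.parameters():      # optionally freeze the pre-trained encoder
    param.requires_grad = False
# ...then train `model` on labelled examples with a standard PyTorch loop
# or the transformers Trainer.
```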

Why use a pre-trained model?

Training big models (with huge numbers of parameters) from scratch is expensive:

  • It requires a lot of computing power (a rough estimate follows this list)

  • It takes a long time to train

  • It needs very large training corpora
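To make the computing-power point concrete, here is a back-of-the-envelope sketch using the common scaling-law rule of thumb that training a dense transformer costs roughly 6 × parameters × tokens floating-point operations; the GPT-3 and Chinchilla figures are taken from the table further down, and the helper function is purely illustrative:

```python
# Back-of-the-envelope training-compute estimate using the common
# scaling-law approximation: total FLOPs ≈ 6 * parameters * tokens.
def training_flops(params_billion: float, tokens_billion: float) -> float:
    """Approximate training compute in FLOPs for a dense transformer."""
    return 6 * (params_billion * 1e9) * (tokens_billion * 1e9)

# Parameter and token counts from the table below.
for name, params_b, tokens_b in [("GPT-3", 175, 300), ("Chinchilla", 70, 1400)]:
    print(f"{name}: ~{training_flops(params_b, tokens_b):.2e} FLOPs")
# GPT-3: ~3.15e+23 FLOPs, Chinchilla: ~5.88e+23 FLOPs
```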

Let's take a look at the parameters of some of the most powerful language models...

Deep learning neural network parameters until June 2021

Neural network parameters until December 2022

Some of the biggest language models available today:

| Model | Lab | Selected playgrounds | Parameters (B) | Tokens trained (B) | Ratio T:P (Chinchilla scaling) | Training dataset | Announced | Public? | Released | Paper/Repo | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GPT-4 | OpenAI | | TBA | | | πŸ†† πŸ“š ⬆ πŸ•Έ πŸŒ‹ | | | | | |
| - | Inflection | | TBA | | | πŸ•Έ | | | | | Devs from DeepMind |
| LaMDA 2 | Google AI | YouTube (video only) | | | | ⬆ πŸ•Έ πŸ‘₯ | May/2022 | 🟑 | TBA | | Chatbot with tiny walled garden demo TBA |
| Fairseq | Meta AI | | 13 & 1100 | | | πŸ†† πŸ“š ⬆ πŸ•Έ πŸ•Έ πŸ•Έ | Dec/2021 | 🟒 | Dec/2021 | | |
| GLaM | Google Inc | | 1200 | | | πŸ†† πŸ“š ⬆ πŸ•Έ πŸ‘₯ | Dec/2021 | πŸ”΄ | N/A | | |
| PaLM | Google Research | | 540 | 780 | 2:1 | πŸ†† πŸ“š ⬆ πŸ•Έ πŸ‘₯ | Apr/2022 | πŸ”΄ | N/A | | |
| MT-NLG | Microsoft/NVIDIA | | 530 | 270 | 1:1 | πŸ†† πŸ“š ⬆ πŸŒ‹ πŸ•Έ πŸ•Έ | Oct/2021 | πŸ”΄ | N/A | | |
| BERT-480 | Google Research | | 480 | | | πŸ†† πŸ“š πŸ•Έ | Nov/2021 | πŸ”΄ | N/A | | Submission to benchmarks. Original dataset was BookCorpus + Wikipedia: https://arxiv.org/pdf/1810.04805.pdf |
| Gopher | DeepMind | | 280 | 300 | 2:1 | πŸ†† πŸ“š ⬆ πŸ•Έ πŸŒ‹ | Dec/2021 | πŸ”΄ | N/A | | |
| Luminous | Aleph Alpha | | | | | πŸ•Έ | Nov/2021 | 🟒 | Apr/2022 | | Devs from EleutherAI |
| Jurassic-1 | AI21 | | 178 | 300 | 2:1 | πŸ†† πŸ“š ⬆ πŸ•Έ | Aug/2021 | 🟒 | Aug/2021 | | Emulated GPT-3 dataset |
| BLOOMZ | BigScience | | 176 | 366 | 3:1 | ⬆ πŸ•Έ | Nov/2022 | 🟒 | Nov/2022 | | Fine-tuned |
| OPT-IML | Meta AI | | 175 | 300 | 2:1 | πŸ†† πŸ“š ⬆ πŸ•Έ | Dec/2022 | 🟒 | Dec/2022 | | Instruct |
| ChatGPT | OpenAI | | 175 | 300 | 2:1 | πŸ†† πŸ“š ⬆ πŸ•Έ | Nov/2022 | 🟒 | Nov/2022 | | Instruct with strict policies ("extremely limited") |
| BlenderBot 3 | Meta AI | blenderbot.ai (US only) | 175 | | | πŸ†† πŸ“š ⬆ πŸ•Έ | Aug/2022 | 🟒 | Aug/2022 | | |
| GPT-3 | OpenAI | | 175 | 300 | 2:1 | πŸ†† πŸ“š ⬆ πŸ•Έ | May/2020 | 🟒 | Nov/2021 | | Popular: 3.1M wpm |
| FLAN | Google | | 137 | | | ⬆ πŸ•Έ πŸ‘₯ | Sep/2021 | πŸ”΄ | N/A | | Fine-tuned LaMDA |
| LaMDA | Google AI | YouTube (video only) | 137 | 168 | 2:1 | ⬆ πŸ•Έ πŸ‘₯ | Jun/2021 | πŸ”΄ | N/A | | Chatbot |
| GLM-130B | Tsinghua & Zhipu | | 130 | 400 | 4:1 | πŸ†† πŸ“š ⬆ πŸ•Έ | Aug/2022 | 🟒 | Aug/2022 | | 50% English (200B tokens), so included here |
| Galactica | Meta AI | | 120 | 450 | 4:1 | πŸ“š | Nov/2022 | 🟒 | Nov/2022 | | Scientific only |
| YaLM 100B | Yandex | GitHub (train/deploy) | 100 | 300 | 3:1 | πŸ†† πŸ“š ⬆ πŸ•Έ | Jun/2022 | 🟒 | Jun/2022 | | |
| Sparrow | DeepMind | | 70 | 1400 | 20:1 | πŸ†† πŸ“š ⬆ πŸ•Έ πŸŒ‹ | Sep/2022 | πŸ”΄ | N/A | | Chatbot as a fine-tuned version of Chinchilla 70B |
| Chinchilla | DeepMind | | 70 | 1400 | 20:1 | πŸ†† πŸ“š ⬆ πŸ•Έ πŸŒ‹ | Mar/2022 | πŸ”΄ | N/A | | First to double tokens per size increase |
| NLLB | Meta AI | GitHub (train/deploy) | 54.5 | | | πŸŒ‹ | Jul/2022 | 🟒 | Jul/2022 | | 54.5B MoE, 3.3B dense. 200+ languages |
| xlarge | Cohere | | 52.4 | | | πŸ“š πŸ•Έ | Sep/2021 | 🟒 | Nov/2021 | | Stealth 'ebooks and webpages'. 52B: https://crfm.stanford.edu/helm/v1.0/?models=1 |
| RL-CAI | Anthropic | | 52 | | | πŸ†† πŸ“š ⬆ πŸ•Έ πŸ‘₯ | Dec/2022 | πŸ”΄ | N/A | | RLAIF = reinforcement learning with AI feedback |
| AlexaTM 20B | Amazon Alexa AI | GitHub (train/deploy) | 20 | 1000 | 50:1 | πŸ†† πŸ•Έ | Aug/2022 | 🟒 | TBA | | Wikipedia and mC4 only. seq2seq |

Key:

πŸ†† Wikipedia

πŸ‘₯ Dialogue

πŸ“š Books

πŸ†€πŸ…° Questions and answers

⬆ Reddit outbound

πŸŒ‹ Special

πŸ•Έ Common Crawl

πŸ‡«πŸ‡· French
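The "Ratio T:P (Chinchilla scaling)" column in the table simply divides the tokens trained by the parameter count; the Chinchilla result suggests roughly 20 training tokens per parameter for compute-optimal training. A minimal sketch of that calculation, using three rows from the table above (the ratios in the table are rounded):

```python
# Tokens-to-parameters ratio, as used in the "Ratio T:P (Chinchilla scaling)"
# column. Chinchilla's result suggests ~20 tokens per parameter is compute-optimal.
def tokens_per_parameter(tokens_billion: float, params_billion: float) -> float:
    return tokens_billion / params_billion

# (tokens trained B, parameters B), taken from the table above.
examples = {
    "GPT-3":      (300, 175),
    "PaLM":       (780, 540),
    "Chinchilla": (1400, 70),
}
for name, (tokens_b, params_b) in examples.items():
    print(f"{name}: {tokens_per_parameter(tokens_b, params_b):.1f} tokens per parameter")
# GPT-3 ~1.7, PaLM ~1.4, Chinchilla 20.0 -- of these three, only Chinchilla
# reaches the ~20:1 Chinchilla-optimal ratio.
```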
