Model Catalog
v3.0.0 · 2026-05-10
Browse browser-compatible AI models, check compatibility, estimate performance, and analyze your browser's readiness.
17 models
all-MiniLM-L6-v2
The standard lightweight sentence embedding model for browser use. 384-dim output, excellent speed-to-quality ratio. Best pick for local semantic search.
Low risk
Text Embedding
Transformers.js
FP32
23 MB · ~85 MB RAM · 2+ GB recommended
embedding · semantic-search · lightweight · recommended
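For local semantic search, the usual flow is to embed the query and each document, then rank documents by cosine similarity. A minimal ranking sketch in plain JavaScript (`cosine` and `rank` are illustrative helpers, not Transformers.js APIs; embeddings are assumed to be equal-length numeric arrays, e.g. 384-dim MiniLM outputs):

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank document embeddings by similarity to a query embedding, highest first.
function rank(queryVec, docVecs) {
  return docVecs
    .map((vec, index) => ({ index, score: cosine(queryVec, vec) }))
    .sort((x, y) => y.score - x.score);
}
```

For a few thousand documents this brute-force scan is fast enough in a browser; an index only becomes worthwhile at much larger corpus sizes.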
BGE Small EN v1.5
BAAI's BGE-small-en-v1.5 — competitive quality vs MiniLM at nearly the same size. 384-dim, strong on MTEB benchmarks for a sub-30 MB model.
Low risk
Text Embedding
Transformers.js
FP32
24 MB · ~90 MB RAM · 2+ GB recommended
embedding · semantic-search · lightweight · baai
BGE Base EN v1.5
BAAI BGE-base-en-v1.5 — higher quality than the small variant. 768-dim, noticeably better retrieval on long documents. Worth the extra 85 MB.
Low risk
Text Embedding
Transformers.js
FP32
109 MB · ~280 MB RAM · 2+ GB recommended
embedding · semantic-search · baai
nomic-embed-text-v1
Nomic AI's open embedding model. 768-dim with an 8,192-token context window — significantly longer than MiniLM. Good for embedding long documents.
Low risk
Text Embedding
Transformers.js
FP32
8,192 ctx
137 MB · ~320 MB RAM · 2+ GB recommended
embedding · long-context · nomic
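With an 8,192-token window, most documents fit in a single pass, but anything longer still needs chunking. A rough sketch using a ~4-characters-per-token heuristic (both the heuristic and the `chunkForEmbedding` name are assumptions; the real limit depends on the model's tokenizer):

```javascript
// Split a long document into chunks that fit an embedding model's
// context window, using a rough chars-per-token heuristic (~4 chars
// per English token).
function chunkForEmbedding(text, maxTokens = 8192, charsPerToken = 4) {
  const maxChars = maxTokens * charsPerToken;
  const chunks = [];
  for (let start = 0; start < text.length; start += maxChars) {
    chunks.push(text.slice(start, start + maxChars));
  }
  return chunks;
}
```

In practice, splitting on paragraph or sentence boundaries near the cut point gives better embeddings than hard character cuts.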
MobileViT Small
Apple's MobileViT-small — a hybrid CNN + Vision Transformer at 22 MB. Excellent for real-time browser image classification. Top pick for mobile and low-memory devices.
Low risk
Image Classification
Transformers.js
FP32
22 MB · ~70 MB RAM · 2+ GB recommended
vision · classification · lightweight · recommended
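Classification heads return raw logits; turning them into ranked label probabilities is a softmax plus a sort. A minimal sketch in plain JavaScript (`topK` is an illustrative helper, not a Transformers.js API):

```javascript
// Convert raw classification logits into the top-k
// (label index, probability) pairs via a numerically stable softmax.
function topK(logits, k = 3) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps
    .map((e, index) => ({ index, prob: e / sum }))
    .sort((a, b) => b.prob - a.prob)
    .slice(0, k);
}
```

Subtracting the max logit before exponentiating avoids overflow on large logits without changing the resulting probabilities.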
ViT-Base/16
Google's Vision Transformer base model. Higher accuracy than MobileViT at 15× the size. Suitable when quality matters more than speed.
Medium risk
Image Classification
Transformers.js
FP32
330 MB · ~750 MB RAM · 4+ GB recommended
vision · classification · vit
YOLOS-Tiny
A 6M-parameter detection transformer based on DeiT-Tiny. At 12 MB it is the lightest serious object detector available for browser inference as of May 2026.
Low risk
Object Detection
Transformers.js
FP32
12 MB · ~55 MB RAM · 2+ GB recommended
vision · detection · lightweight · recommended
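Detection transformers like YOLOS emit many overlapping candidate boxes, so outputs are usually filtered with non-max suppression. A plain-JavaScript sketch, assuming boxes as `[x1, y1, x2, y2]` pixel coordinates (`iou` and `nms` are illustrative helpers, not library functions):

```javascript
// Intersection-over-union of two boxes given as [x1, y1, x2, y2].
function iou(a, b) {
  const ix = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const iy = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = ix * iy;
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  return inter / (areaA + areaB - inter);
}

// Greedy non-max suppression: keep the highest-scoring box, drop
// overlapping boxes above the IoU threshold, repeat.
function nms(detections, threshold = 0.5) {
  const sorted = [...detections].sort((a, b) => b.score - a.score);
  const kept = [];
  for (const det of sorted) {
    if (kept.every((k) => iou(k.box, det.box) < threshold)) kept.push(det);
  }
  return kept;
}
```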
DETR ResNet-50
Facebook's DETR with ResNet-50 backbone. Higher accuracy than YOLOS-Tiny but 14× larger. Expect noticeable latency in the browser; prefer YOLOS-Tiny for real-time use.
Medium risk
Object Detection
Transformers.js
FP32
165 MB · ~500 MB RAM · 4+ GB recommended
vision · detection · detr
Whisper Tiny (English)
OpenAI Whisper Tiny, English-only. At 75 MB it is the smallest Whisper variant and the fastest to run in a browser. Expect 5–15× slower than real time without a GPU.
Medium risk
Speech-to-Text
Transformers.js
FP32
75 MB · ~310 MB RAM · 4+ GB recommended
audio · speech · whisper · english-only
Whisper Base (multilingual)
OpenAI Whisper Base — 74M parameters, supports 99 languages. Better accuracy than Tiny at ~2× the size. Slow in browsers without WebGPU acceleration.
Medium risk
Speech-to-Text
Transformers.js
FP32
145 MB · ~480 MB RAM · 4+ GB recommended
audio · speech · whisper · multilingual
SmolLM2 135M Instruct
HuggingFace's SmolLM2 at 135M parameters — the smallest instruction-tuned LLM with usable output quality. Available in q4f16 via Transformers.js v3. Expect 80–180 tokens/min on GPU.
Medium risk
Text Generation
Transformers.js
INT4
2,048 ctx
90 MB · ~350 MB RAM · 4+ GB recommended
llm · text-generation · tiny · recommended
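The throughput figures above make it easy to sanity-check whether a tiny LLM fits a use case. A hypothetical helper (`estimateSeconds` is not part of any listed runtime) that converts a tokens-per-minute range into best- and worst-case response times:

```javascript
// Estimate how long a response of `numTokens` tokens will take, given a
// throughput range in tokens per minute (as listed in this catalog,
// e.g. 80–180 tokens/min for SmolLM2 135M on GPU).
function estimateSeconds(numTokens, minTokPerMin, maxTokPerMin) {
  return {
    best: (numTokens / maxTokPerMin) * 60,
    worst: (numTokens / minTokPerMin) * 60,
  };
}
```

At 80–180 tokens/min, even a short 90-token reply takes 30–68 seconds, which is why streaming tokens to the UI matters for these models.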
SmolLM2 360M Instruct
SmolLM2 at 360M parameters — noticeably better coherence than the 135M variant. q4f16 via Transformers.js v3. Expect 40–100 tokens/min on GPU.
Medium risk
Text Generation
Transformers.js
INT4
2,048 ctx
210 MB · ~650 MB RAM · 4+ GB recommended
llm · text-generation · tiny
Qwen2.5 0.5B Instruct
Alibaba's Qwen2.5 at 0.5B parameters. Strong reasoning and instruction-following for its size. Available via onnx-community in q4f16. A step up from SmolLM2 in output quality.
Medium risk
Text Generation
ONNX Runtime Web
INT4
4,096 ctx
350 MB · ~900 MB RAM · 4+ GB recommended
llm · text-generation · qwen · recommended
Gemma 3 1B Instruct
Google's Gemma 3 at 1B parameters — released March 2025. Excellent instruction-following for a 1B model. Available in ONNX web format with q4f16. Requires WebGPU for practical speed.
Medium risk
Text Generation
ONNX Runtime Web
INT4
8,192 ctx
680 MB · ~1.6 GB RAM · 6+ GB recommended
llm · text-generation · google · gemma
Llama 3.2 1B Instruct
Meta's Llama 3.2 1B — released September 2024, widely supported in ONNX format. Practical upper limit for browser LLM inference as of May 2026. Requires a capable GPU and 6+ GB RAM.
High risk
Text Generation
ONNX Runtime Web
INT4
131,072 ctx
700 MB · ~1.7 GB RAM · 6+ GB recommended
llm · text-generation · meta · llama
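The larger LLM entries gate on WebGPU and device RAM, which a browser exposes via `navigator.gpu` and `navigator.deviceMemory`. The sketch below takes a navigator-like object as a parameter so the logic is testable outside a browser; the 6 GB threshold mirrors this catalog's "6+ GB recommended" figure and is an assumption, not a hard requirement:

```javascript
// Screen a browser for heavy in-browser LLM inference. `nav` is a
// navigator-like object ({ gpu, deviceMemory }); deviceMemory is in GB
// and capped at 8 by the spec, so treat it as a lower bound.
function readiness(nav, minRamGb = 6) {
  const hasWebGPU = typeof nav.gpu !== 'undefined';
  const ramOk = typeof nav.deviceMemory === 'number'
    ? nav.deviceMemory >= minRamGb
    : null; // unknown on browsers that hide deviceMemory
  return { hasWebGPU, ramOk, ok: hasWebGPU && ramOk !== false };
}
```

In a page you would call `readiness(navigator)` before starting a multi-hundred-megabyte download, and fall back to a smaller model when `ok` is false.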
DeepSeek-R1-Distill-Qwen 1.5B
DeepSeek's R1 reasoning model distilled into Qwen 1.5B — released January 2025. Includes chain-of-thought reasoning. Heavy for browsers: ~1 GB download, WebGPU required, 8+ GB RAM recommended.
High risk
Text Generation
ONNX Runtime Web
INT4
4,096 ctx
1.0 GB · ~2.3 GB RAM · 8+ GB recommended
llm · text-generation · reasoning · deepseek
SAM ViT-Base
Meta's Segment Anything Model (SAM) with ViT-Base backbone. Click-based image segmentation — impressive quality, but the 375 MB download makes it slow to load on first use.
Medium risk
Image Processing
Transformers.js
FP32
375 MB · ~950 MB RAM · 4+ GB recommended
vision · segmentation · sam