Model Catalog
v3.0.0 · 2026-05-10
Browse browser-compatible AI models, check compatibility, estimate performance, and analyze your browser's readiness.
17 models
all-MiniLM-L6-v2
The standard lightweight sentence embedding model for browser use. 384-dim output, excellent speed-to-quality ratio. Best pick for local semantic search.
Low risk
Text Embedding
Transformers.js
FP32
23 MB · ~85 MB RAM · 2+ GB recommended
embedding · semantic-search · lightweight · recommended
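For local semantic search, the usual flow is to embed the query and each document, then rank documents by cosine similarity. A minimal ranking sketch in plain JavaScript (`cosine` and `rank` are illustrative helpers, not Transformers.js APIs; embeddings are assumed to be equal-length numeric arrays, e.g. 384-dim MiniLM outputs):

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank document embeddings by similarity to a query embedding, highest first.
function rank(queryVec, docVecs) {
  return docVecs
    .map((vec, index) => ({ index, score: cosine(queryVec, vec) }))
    .sort((x, y) => y.score - x.score);
}
```

For a few thousand documents this brute-force scan is fast enough in a browser; an index only becomes worthwhile at much larger corpus sizes.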
BGE Small EN v1.5
BAAI's BGE-small-en-v1.5 — competitive quality vs MiniLM at nearly the same size. 384-dim, strong on MTEB benchmarks for a sub-30 MB model.
Low risk
Text Embedding
Transformers.js
FP32
24 MB · ~90 MB RAM · 2+ GB recommended
embedding · semantic-search · lightweight · baai
BGE Base EN v1.5
BAAI BGE-base-en-v1.5 — higher quality than the small variant. 768-dim, noticeably better retrieval on long documents. Worth the extra 85 MB.
Low risk
Text Embedding
Transformers.js
FP32
109 MB · ~280 MB RAM · 2+ GB recommended
embedding · semantic-search · baai
nomic-embed-text-v1
Nomic AI's open embedding model. 768-dim with an 8,192-token context window — significantly longer than MiniLM. Good for embedding long documents.
Low risk
Text Embedding
Transformers.js
FP32
8,192 ctx
137 MB · ~320 MB RAM · 2+ GB recommended
embedding · long-context · nomic
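With an 8,192-token window, most documents fit in a single pass, but anything longer still needs chunking. A rough sketch using a ~4-characters-per-token heuristic (both the heuristic and the `chunkForEmbedding` name are assumptions; the real limit depends on the model's tokenizer):

```javascript
// Split a long document into chunks that fit an embedding model's
// context window, using a rough chars-per-token heuristic (~4 chars
// per English token).
function chunkForEmbedding(text, maxTokens = 8192, charsPerToken = 4) {
  const maxChars = maxTokens * charsPerToken;
  const chunks = [];
  for (let start = 0; start < text.length; start += maxChars) {
    chunks.push(text.slice(start, start + maxChars));
  }
  return chunks;
}
```

In practice, splitting on paragraph or sentence boundaries near the cut point gives better embeddings than hard character cuts.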
MobileViT Small
Apple's MobileViT-small — a hybrid CNN + Vision Transformer at 22 MB. Excellent for real-time browser image classification. Top pick for mobile and low-memory devices.
Low risk
Image Classification
Transformers.js
FP32
22 MB · ~70 MB RAM · 2+ GB recommended
vision · classification · lightweight · recommended
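Classification heads return raw logits; turning them into ranked label probabilities is a softmax plus a sort. A minimal sketch in plain JavaScript (`topK` is an illustrative helper, not a Transformers.js API):

```javascript
// Convert raw classification logits into the top-k
// (label index, probability) pairs via a numerically stable softmax.
function topK(logits, k = 3) {
  const max = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps
    .map((e, index) => ({ index, prob: e / sum }))
    .sort((a, b) => b.prob - a.prob)
    .slice(0, k);
}
```

Subtracting the max logit before exponentiating avoids overflow on large logits without changing the resulting probabilities.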
ViT-Base/16
Google's Vision Transformer base model. Higher accuracy than MobileViT at 15× the size. Suitable when quality matters more than speed.
Medium risk
Image Classification
Transformers.js
FP32
330 MB · ~750 MB RAM · 4+ GB recommended
vision · classification · vit
YOLOS-Tiny
A 6M-parameter detection transformer based on DeiT-Tiny. At 12 MB it is the lightest serious object detector available for browser inference as of May 2026.
Low risk
Object Detection
Transformers.js
FP32
12 MB · ~55 MB RAM · 2+ GB recommended
vision · detection · lightweight · recommended
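Detection transformers like YOLOS emit many overlapping candidate boxes, so outputs are usually filtered with non-max suppression. A plain-JavaScript sketch, assuming boxes as `[x1, y1, x2, y2]` pixel coordinates (`iou` and `nms` are illustrative helpers, not library functions):

```javascript
// Intersection-over-union of two boxes given as [x1, y1, x2, y2].
function iou(a, b) {
  const ix = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const iy = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = ix * iy;
  const areaA = (a[2] - a[0]) * (a[3] - a[1]);
  const areaB = (b[2] - b[0]) * (b[3] - b[1]);
  return inter / (areaA + areaB - inter);
}

// Greedy non-max suppression: keep the highest-scoring box, drop
// overlapping boxes above the IoU threshold, repeat.
function nms(detections, threshold = 0.5) {
  const sorted = [...detections].sort((a, b) => b.score - a.score);
  const kept = [];
  for (const det of sorted) {
    if (kept.every((k) => iou(k.box, det.box) < threshold)) kept.push(det);
  }
  return kept;
}
```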
DETR ResNet-50
Facebook's DETR with ResNet-50 backbone. Higher accuracy than YOLOS-Tiny but 14× larger. Expect noticeable latency in the browser; prefer YOLOS-Tiny for real-time use.
Medium risk
Object Detection
Transformers.js
FP32
165 MB · ~500 MB RAM · 4+ GB recommended
vision · detection · detr
Whisper Tiny (English)
OpenAI Whisper Tiny, English-only. At 75 MB it is the smallest Whisper variant and the fastest to run in a browser. Expect 5–15× slower than real time without a GPU.
Medium risk
Speech-to-Text
Transformers.js
FP32
75 MB · ~310 MB RAM · 4+ GB recommended
audio · speech · whisper · english-only
Whisper Base (multilingual)
OpenAI Whisper Base — 74M parameters, supports 99 languages. Better accuracy than Tiny at ~2× the size. Slow in browsers without WebGPU acceleration.
Medium risk
Speech-to-Text
Transformers.js
FP32
145 MB · ~480 MB RAM · 4+ GB recommended
audio · speech · whisper · multilingual
SmolLM2 135M Instruct
HuggingFace's SmolLM2 at 135M parameters — the smallest instruction-tuned LLM with usable output quality. Available in q4f16 via Transformers.js v3. Expect 80–180 tokens/min on GPU.
Medium risk
Text Generation
Transformers.js
INT4
2,048 ctx
90 MB · ~350 MB RAM · 4+ GB recommended
llm · text-generation · tiny · recommended
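The throughput figures above make it easy to sanity-check whether a tiny LLM fits a use case. A hypothetical helper (`estimateSeconds` is not part of any listed runtime) that converts a tokens-per-minute range into best- and worst-case response times:

```javascript
// Estimate how long a response of `numTokens` tokens will take, given a
// throughput range in tokens per minute (as listed in this catalog,
// e.g. 80–180 tokens/min for SmolLM2 135M on GPU).
function estimateSeconds(numTokens, minTokPerMin, maxTokPerMin) {
  return {
    best: (numTokens / maxTokPerMin) * 60,
    worst: (numTokens / minTokPerMin) * 60,
  };
}
```

At 80–180 tokens/min, even a short 90-token reply takes 30–68 seconds, which is why streaming tokens to the UI matters for these models.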
SmolLM2 360M Instruct
SmolLM2 at 360M parameters — noticeably better coherence than the 135M variant. q4f16 via Transformers.js v3. Expect 40–100 tokens/min on GPU.
Medium risk
Text Generation
Transformers.js
INT4
2,048 ctx
210 MB · ~650 MB RAM · 4+ GB recommended
llm · text-generation · tiny
Qwen2.5 0.5B Instruct
Alibaba's Qwen2.5 at 0.5B parameters. Strong reasoning and instruction-following for its size. Available via onnx-community in q4f16. A step up from SmolLM2 in output quality.
Medium risk
Text Generation
ONNX Runtime Web
INT4
4,096 ctx
350 MB · ~900 MB RAM · 4+ GB recommended
llm · text-generation · qwen · recommended
Gemma 3 1B Instruct
Google's Gemma 3 at 1B parameters — released March 2025. Excellent instruction-following for a 1B model. Available in ONNX web format with q4f16. Requires WebGPU for practical speed.
Medium risk
Text Generation
ONNX Runtime Web
INT4
8,192 ctx
680 MB · ~1.6 GB RAM · 6+ GB recommended
llm · text-generation · google · gemma
Llama 3.2 1B Instruct
Meta's Llama 3.2 1B — released September 2024, widely supported in ONNX format. Practical upper limit for browser LLM inference as of May 2026. Requires a capable GPU and 6+ GB RAM.
High risk
Text Generation
ONNX Runtime Web
INT4
131,072 ctx
700 MB · ~1.7 GB RAM · 6+ GB recommended
llm · text-generation · meta · llama
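The larger LLM entries gate on WebGPU and device RAM, which a browser exposes via `navigator.gpu` and `navigator.deviceMemory`. The sketch below takes a navigator-like object as a parameter so the logic is testable outside a browser; the 6 GB threshold mirrors this catalog's "6+ GB recommended" figure and is an assumption, not a hard requirement:

```javascript
// Screen a browser for heavy in-browser LLM inference. `nav` is a
// navigator-like object ({ gpu, deviceMemory }); deviceMemory is in GB
// and capped at 8 by the spec, so treat it as a lower bound.
function readiness(nav, minRamGb = 6) {
  const hasWebGPU = typeof nav.gpu !== 'undefined';
  const ramOk = typeof nav.deviceMemory === 'number'
    ? nav.deviceMemory >= minRamGb
    : null; // unknown on browsers that hide deviceMemory
  return { hasWebGPU, ramOk, ok: hasWebGPU && ramOk !== false };
}
```

In a page you would call `readiness(navigator)` before starting a multi-hundred-megabyte download, and fall back to a smaller model when `ok` is false.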
DeepSeek-R1-Distill-Qwen 1.5B
DeepSeek's R1 reasoning model distilled into Qwen 1.5B — released January 2025. Includes chain-of-thought reasoning. Heavy for browsers: ~1 GB download, WebGPU required, 8+ GB RAM recommended.
High risk
Text Generation
ONNX Runtime Web
INT4
4,096 ctx
1.0 GB · ~2.3 GB RAM · 8+ GB recommended
llm · text-generation · reasoning · deepseek
SAM ViT-Base
Meta's Segment Anything Model (SAM) with ViT-Base backbone. Click-based image segmentation — impressive quality, but the 375 MB download makes it slow to load on first use.
Medium risk
Image Processing
Transformers.js
FP32
375 MB · ~950 MB RAM · 4+ GB recommended
vision · segmentation · sam