Edge-first model selection

Find Your Perfect VLM in 30 Seconds

Stop guessing. Input your constraints — latency, memory, hardware — and get ranked model recommendations with trade-off explanations.

Step 1

Select Your Use Case

Choose the primary vision-language task for your application.

Step 2

Set Your Constraints

Define the hard limits for your deployment environment.

Latency: 20ms (real-time) to 500ms (batch)
Memory: 500MB to 16GB
Power: 3W (mobile) to 50W (desktop GPU)
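
The three limits above can be read as hard upper bounds, with each model metric checked against them one at a time. The following is a minimal Python sketch of that check, not the tool's actual implementation. The names `Constraints` and `check` are hypothetical, and the limit values (200 ms, 8 GB, 20 W) are assumptions picked from within the slider ranges, since the page does not state which values were selected. The sample figures come from the Qwen2-VL-7B card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    """Hard upper limits for a deployment target (illustrative values only)."""
    max_latency_ms: float
    max_memory_gb: float
    max_power_w: float


def _verdict(value, limit):
    """A metric passes when it does not exceed its limit."""
    return "Pass" if value <= limit else "Fail"


def check(latency_ms, memory_gb, power_w, limits):
    """Return a Pass/Fail verdict for each metric."""
    return {
        "Latency": _verdict(latency_ms, limits.max_latency_ms),
        "Memory": _verdict(memory_gb, limits.max_memory_gb),
        "Power": _verdict(power_w, limits.max_power_w),
    }


# Assumed limits; the page does not say which values were chosen.
limits = Constraints(max_latency_ms=200, max_memory_gb=8, max_power_w=20)

# Figures from the Qwen2-VL-7B card: 180ms, 12.0GB, 35W.
result = check(180, 12.0, 35, limits)
print(result)
```

Under these assumed limits the sketch gives the same verdicts as that card: latency passes, while memory and power fail.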
Step 3

Ranked Recommendations

7 models match all your constraints.

Top Pick | Edge Ready

Florence-2-base

Microsoft

Accuracy: 78

Unified vision foundation model excelling at detection and captioning tasks.

Latency: 28ms (Pass)
Memory: 1.2GB (Pass)
Power: 5W (Pass)
Hardware: 0.23B parameters (Pass)

Strengths

  • + Very small
  • + Fastest inference
  • + Runs anywhere

Trade-offs

  • - Weak on reasoning tasks
  • - Limited VQA capability

Edge Ready

PaliGemma 3B

Google

Accuracy: 74

Versatile VLM built on SigLIP and Gemma. Strong across multiple vision tasks.

Latency: 85ms (Pass)
Memory: 4.5GB (Pass)
Power: 12W (Pass)
Hardware: 2.92B parameters (Pass)

Strengths

  • + Multi-task capable
  • + Good accuracy-size ratio
  • + Fine-tunable

Trade-offs

  • - Requires GPU for real-time
  • - Moderate memory usage

Edge Ready

InternVL2-2B

OpenGVLab

Accuracy: 70

Competitive small VLM with strong multi-task vision performance.

Latency: 68ms (Pass)
Memory: 3.6GB (Pass)
Power: 10W (Pass)
Hardware: 2.21B parameters (Pass)

Strengths

  • + Competitive at small scale
  • + Multi-task versatile
  • + Active development

Trade-offs

  • - Requires GPU
  • - Mid-range accuracy

Edge Ready

Phi-3.5-Vision

Microsoft

Accuracy: 72

Efficient multimodal model balancing capability with deployability.

Latency: 110ms (Pass)
Memory: 7.2GB (Pass)
Power: 18W (Pass)
Hardware: 4.15B parameters (Pass)

Strengths

  • + Good balance of size and capability
  • + Strong reasoning for size
  • + Efficient architecture

Trade-offs

  • - Needs decent GPU
  • - Mid-range on detection

Edge Ready

Qwen2-VL-2B

Alibaba

Accuracy: 68

Compact multimodal model with strong OCR and document understanding.

Latency: 72ms (Pass)
Memory: 3.8GB (Pass)
Power: 10W (Pass)
Hardware: 2.21B parameters (Pass)

Strengths

  • + Best-in-class OCR
  • + Good document understanding
  • + Multi-language support

Trade-offs

  • - Higher memory than alternatives
  • - GPU recommended

Edge Ready

MobileVLM-3B

Meituan

Accuracy: 66

Purpose-built for mobile and edge deployment with optimized architecture.

Latency: 55ms (Pass)
Memory: 3.2GB (Pass)
Power: 9W (Pass)
Hardware: 2.96B parameters (Pass)

Strengths

  • + Mobile-optimized
  • + Low power consumption
  • + Fast on-device

Trade-offs

  • - Lower accuracy ceiling
  • - Limited reasoning

Edge Ready

Moondream2

vikhyatk

Accuracy: 62

Tiny but capable VLM optimized for edge deployment. Excellent latency-to-accuracy ratio.

Latency: 45ms (Pass)
Memory: 2.8GB (Pass)
Power: 8W (Pass)
Hardware: 1.86B parameters (Pass)

Strengths

  • + Extremely lightweight
  • + Fast inference
  • + Runs on CPU

Trade-offs

  • - Limited complex reasoning
  • - Lower accuracy on OCR

Qwen2-VL-7B

Alibaba

Accuracy: 81

Powerful multimodal model with state-of-the-art performance on vision-language benchmarks.

Latency: 180ms (Pass)
Memory: 12.0GB (Fail)
Power: 35W (Fail)
Hardware: 7.61B parameters (Pass)

Strengths

  • + Top-tier accuracy
  • + Excellent OCR
  • + Strong reasoning

Trade-offs

  • - Large model size
  • - High memory requirement
  • - Not edge-friendly

LLaVA-1.6-7B

LLaVA Team

Accuracy: 70

Popular open-source VLM with strong visual conversation and reasoning abilities.

Latency: 165ms (Pass)
Memory: 11.5GB (Fail)
Power: 32W (Fail)
Hardware: 7.06B parameters (Pass)

Strengths

  • + Strong conversational ability
  • + Good visual reasoning
  • + Large community

Trade-offs

  • - Large footprint
  • - Weaker on detection tasks

IDEFICS2-8B

Hugging Face

Accuracy: 65

Open multimodal model with strong document and chart understanding.

Latency: 200ms (Pass)
Memory: 13.5GB (Fail)
Power: 38W (Fail)
Hardware: 8.36B parameters (Pass)

Strengths

  • + Excellent document understanding
  • + Strong chart/table parsing
  • + Open weights

Trade-offs

  • - Very large
  • - Slow inference
  • - Not edge-deployable
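
The recommendation step can be approximated as a filter followed by a sort. The sketch below is a rough Python illustration, not the tool's actual logic. It copies the accuracy, latency, memory, and power figures from the ten cards above, applies assumed limits (200 ms, 8 GB, 20 W, which the page does not state), and orders the matches by accuracy. The name `recommend` and the sort key are hypothetical choices.

```python
# (name, accuracy, latency_ms, memory_gb, power_w), copied from the cards above.
MODELS = [
    ("Florence-2-base", 78, 28, 1.2, 5),
    ("PaliGemma 3B", 74, 85, 4.5, 12),
    ("InternVL2-2B", 70, 68, 3.6, 10),
    ("Phi-3.5-Vision", 72, 110, 7.2, 18),
    ("Qwen2-VL-2B", 68, 72, 3.8, 10),
    ("MobileVLM-3B", 66, 55, 3.2, 9),
    ("Moondream2", 62, 45, 2.8, 8),
    ("Qwen2-VL-7B", 81, 180, 12.0, 35),
    ("LLaVA-1.6-7B", 70, 165, 11.5, 32),
    ("IDEFICS2-8B", 65, 200, 13.5, 38),
]

# Assumed limits; the page does not say which values were selected.
MAX_LATENCY_MS = 200
MAX_MEMORY_GB = 8
MAX_POWER_W = 20


def recommend(models):
    """Keep models within every limit, then sort by accuracy, highest first."""
    matches = [
        m for m in models
        if m[2] <= MAX_LATENCY_MS and m[3] <= MAX_MEMORY_GB and m[4] <= MAX_POWER_W
    ]
    return sorted(matches, key=lambda m: m[1], reverse=True)


ranked = recommend(MODELS)
for name, accuracy, *_ in ranked:
    print(f"{name}: accuracy {accuracy}")
```

With these assumed limits, seven models pass and Florence-2-base ranks first, in line with the page. A pure accuracy sort would place Phi-3.5-Vision (72) ahead of InternVL2-2B (70), whereas the page lists them the other way round, so the tool's real ranking presumably weighs other factors as well.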