2x NVIDIA GeForce RTX 3090 · llama.cpp / IQ4_NL · 4/25/2026
Run ID
cmodt5m380002l204yqwh7l4a
Model
unsloth/Qwen3.6-27B
Display name
Qwen3.6-27B
Revision
main
Family
Qwen
Parameters
28B
Active params
n/a
MoE
no
Output tok/s
42.1
Prefill tok/s
n/a
Total tok/s
n/a
TTFT
228.8ms
Peak VRAM
n/a
Prompt tokens
25
Output tokens
256
Prefill tokens
n/a
Context length
131,712
Batch size
1
Hardware class
DISCRETE_GPU
Hardware
2x NVIDIA GeForce RTX 3090
GPU slots
n/a
GPU count
2
VRAM
48GB
Chip vendor
n/a
Chip family
n/a
Chip variant
n/a
Unified memory
n/a
NPU TOPS
n/a
CPU
AMD 7790
RAM
96GB
OS
Windows 11
Power
n/a
Engine
llama.cpp
Engine version
n/a
Quantization
IQ4_NL
Backend
n/a
Tensor parallel
n/a
Pipeline parallel
n/a
GPU layers
n/a
Split mode
n/a
KV cache dtype
n/a
KV cache size
n/a
Prefix caching
n/a
Attention backend
n/a
Flash attention
n/a
Chunked prefill
n/a
Prefill chunk
n/a
Continuous batching
n/a
CPU offload
n/a
CPU layers
n/a
Rope scaling
n/a
Rope scale
n/a
Yarn ext factor
n/a
Engine quant
n/a
SGLang quant
n/a
GPU mem util
n/a
Max running seqs
n/a
Scheduler delay
n/a
Num parallel
n/a
Concurrency
n/a
Spec decoding
no
Spec method
n/a
Spec model
n/a
Spec draft model
n/a
Spec tokens
n/a
Spec ngram
n/a
Spec draft TP
n/a
MTP enabled
no
MTP draft layers
n/a
Temperature
n/a
Top P
n/a
Top K
n/a
Min P
n/a
Repeat penalty
n/a
Mirostat
n/a
Command
Extra flags
n/a
Notes
LM Studio on Windows 11, 96GB RAM, dual RTX 3090 (48GB VRAM total), flash_attention=true, parallel=2 instances
Submitted
4/25/2026, 3:56:57 AM
Last edited
n/a
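
Total tok/s is listed as n/a for this run, but a rough figure can be derived from the reported fields. The sketch below is only an estimate: it assumes that Output tok/s covers the decode phase alone and that TTFT covers the whole prefill, neither of which the run record confirms. The function and field names are hypothetical and not part of any benchmark tooling.

```python
from dataclasses import dataclass


@dataclass
class RunTiming:
    decode_s: float      # estimated time spent generating output tokens
    end_to_end_s: float  # TTFT plus decode time
    total_tok_s: float   # (prompt + output tokens) / end-to-end time


def estimate_timing(prompt_tokens: int, output_tokens: int,
                    output_tok_s: float, ttft_ms: float) -> RunTiming:
    """Estimate wall-clock timing from the reported run fields.

    Assumption (not confirmed by the run record): output_tok_s is measured
    over the decode phase only, and TTFT accounts for all prefill work.
    """
    decode_s = output_tokens / output_tok_s
    end_to_end_s = ttft_ms / 1000.0 + decode_s
    total_tok_s = (prompt_tokens + output_tokens) / end_to_end_s
    return RunTiming(decode_s, end_to_end_s, total_tok_s)


if __name__ == "__main__":
    # Values taken from this run: 25 prompt tokens, 256 output tokens,
    # 42.1 output tok/s, 228.8 ms TTFT.
    timing = estimate_timing(25, 256, 42.1, 228.8)
    print(f"decode ~{timing.decode_s:.2f} s, "
          f"end-to-end ~{timing.end_to_end_s:.2f} s, "
          f"total ~{timing.total_tok_s:.1f} tok/s")
```

With a 25-token prompt the prefill contribution is small, so the estimate is dominated by decode time. The Notes field mentions parallel=2 instances, and the record does not say whether 42.1 tok/s is per instance or aggregate, so treat the result as indicative only.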