Jeffrey Morgan | f8241bfba3 | gpu: report system free memory instead of 0 (#5521) | 6 months ago
Daniel Hiltgen | 6f351bf586 | review comments and coverage | 7 months ago
Daniel Hiltgen | fc37c192ae | Refine CPU load behavior with system memory visibility | 7 months ago
Daniel Hiltgen | 30a7d7096c | Bump VRAM buffer back up | 8 months ago
Michael Yang | 4736391bfb | llm: add minimum based on layer size | 8 months ago
Jeffrey Morgan | f0c454ab57 | gpu: add 512MiB to darwin minimum, metal doesn't have partial offloading overhead (#4068) | 8 months ago
Daniel Hiltgen | 34b9db5afc | Request and model concurrency | 10 months ago
Michael Yang | 26df674785 | scale graph based on gpu count | 9 months ago
Michael Yang | 41a272de9f | darwin: no partial offloading if required memory greater than system | 9 months ago
Michael Yang | 7e33a017c0 | partial offloading | 9 months ago
Daniel Hiltgen | be330174dd | Allow setting max vram for workarounds | 10 months ago
peanut256 | a189810df6 | Determine max VRAM on macOS using `recommendedMaxWorkingSetSize` (#2354) | 11 months ago
Daniel Hiltgen | 7427fa1387 | Fix up the CPU fallback selection | 1 year ago
Daniel Hiltgen | 39928a42e8 | Always dynamically load the llm server library | 1 year ago
Daniel Hiltgen | d88c527be3 | Build multiple CPU variants and pick the best | 1 year ago
Jeffrey Morgan | c336693f07 | calculate overhead based number of gpu devices (#1875) | 1 year ago
Jeffrey Morgan | 08f1e18965 | Offload layers to GPU based on new model size estimates (#1850) | 1 year ago
Jeffrey Morgan | c7ea8f237e | set `num_gpu` to 1 only by default on darwin arm64 (#1771) | 1 year ago
Daniel Hiltgen | a2ad952440 | Fix windows system memory lookup | 1 year ago
Daniel Hiltgen | d966b730ac | Switch windows build to fully dynamic | 1 year ago
Daniel Hiltgen | 7555ea44f8 | Revamp the dynamic library shim | 1 year ago
Daniel Hiltgen | 6558f94ed0 | Fix darwin intel build | 1 year ago
Daniel Hiltgen | 35934b2e05 | Adapted rocm support to cgo based llama.cpp | 1 year ago