Michael Yang
|
76bdebbadf
decode ggla
|
10 months ago |
Patrick Devine
|
2c017ca441
Convert Safetensors to an Ollama model (#2824)
|
10 months ago |
Michael Yang
|
949d7b1c48
add gguf file types (#2532)
|
11 months ago |
Michael Yang
|
eaed6f8c45
add max context length check
|
1 year ago |
Michael Yang
|
2bb2bdd5d4
fix lint
|
1 year ago |
Jeffrey Morgan
|
08f1e18965
Offload layers to GPU based on new model size estimates (#1850)
|
1 year ago |
Bruce MacDonald
|
811b1f03c8
deprecate ggml
|
1 year ago |
Jeffrey Morgan
|
d9a250e9b5
seek to end of file when decoding older model formats
|
1 year ago |
Jeffrey Morgan
|
944519ed16
seek to eof for older model binaries
|
1 year ago |
Michael Yang
|
72e7a49aa9
seek instead of copyn
|
1 year ago |
Michael Yang
|
2cb0fa7d40
split from into one or more models
|
1 year ago |
Michael Yang
|
b2816bca67
unnecessary ReadSeeker for DecodeGGML
|
1 year ago |
Michael Yang
|
125d0a013a
ggufv3
|
1 year ago |
Michael Yang
|
c02c0cd483
starcoder
|
1 year ago |
Bruce MacDonald
|
86279f4ae3
unbound max num gpu layers (#591)
|
1 year ago |
Bruce MacDonald
|
4cba75efc5
remove tmp directories created by previous servers (#559)
|
1 year ago |
Bruce MacDonald
|
66003e1d05
subprocess improvements (#524)
|
1 year ago |
Bruce MacDonald
|
2540c9181c
support for packaging in multiple cuda runners (#509)
|
1 year ago |
Michael Yang
|
7dee25a07f
fix falcon decode
|
1 year ago |
Bruce MacDonald
|
09dd2aeff9
GGUF support (#441)
|
1 year ago |
Michael Yang
|
b1cececb8e
add 34b model type
|
1 year ago |
Michael Yang
|
a894cc792d
model and file type as strings
|
1 year ago |
Michael Yang
|
6ed991c8e2
ggml: fix off by one error
|
1 year ago |
Michael Yang
|
fccf8d179f
partial decode ggml bin for more info
|
1 year ago |