Michael Yang | 2cb0fa7d40 | split from into one or more models | 1 year ago
Michael Yang | 199941cd15 | fix: gguf int type | 1 year ago
Michael Yang | c5e1bbabda | instead of static number of parameters for each model family, get the real number from the tensors (#1022) | 1 year ago
Michael Yang | 125d0a013a | ggufv3 | 1 year ago
Michael Yang | c02c0cd483 | starcoder | 1 year ago
Bruce MacDonald | 86279f4ae3 | unbound max num gpu layers (#591) | 1 year ago
Bruce MacDonald | 4cba75efc5 | remove tmp directories created by previous servers (#559) | 1 year ago
Bruce MacDonald | 66003e1d05 | subprocess improvements (#524) | 1 year ago
Bruce MacDonald | 2540c9181c | support for packaging in multiple cuda runners (#509) | 1 year ago
Michael Yang | 0c5a454361 | fix model type for 70b | 1 year ago
Michael Yang | 7dee25a07f | fix falcon decode | 1 year ago
Bruce MacDonald | 09dd2aeff9 | GGUF support (#441) | 1 year ago