Note: this example exposes a public endpoint and does not configure authentication. Use with care.
Log in to Fly.io
fly auth login
Create a new Fly app
fly launch --name <name> --image ollama/ollama --internal-port 11434 --vm-size shared-cpu-8x --now
Pull and run orca-mini:3b
OLLAMA_HOST=https://<name>.fly.dev ollama run orca-mini:3b
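Because the app just serves Ollama's HTTP API on port 11434, the model can also be called without the ollama CLI. The sketch below assumes the standard Ollama REST endpoint `POST /api/generate` with a non-streaming request. The function names `ollama_generate_body` and `ollama_generate` are hypothetical helpers, not part of Ollama or flyctl. As the note at the top says, the endpoint has no authentication, so anyone with the URL can send the same request.

```shell
#!/bin/sh
# Hypothetical helpers for calling Ollama's REST API (POST /api/generate)
# on the Fly app directly, without the ollama CLI.

# Build a non-streaming /api/generate request body.
# Assumption: the model name and prompt contain no double quotes,
# backslashes, or newlines, because nothing here JSON-escapes them.
ollama_generate_body() {
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$1" "$2"
}

# Send the request.
# $1: base URL, e.g. https://<name>.fly.dev (a trailing slash is stripped)
# $2: model name, e.g. orca-mini:3b
# $3: prompt text
ollama_generate() {
  curl -sS "${1%/}/api/generate" -d "$(ollama_generate_body "$2" "$3")"
}
```

For example, `ollama_generate https://<name>.fly.dev orca-mini:3b 'Why is the sky blue?'` should print a single JSON object whose `response` field holds the generated text. The model must already be pulled on the app (the `ollama run` step above does this), otherwise Ollama returns an error.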
shared-cpu-8x is a free-tier eligible machine type. For better performance, switch to a performance or dedicated machine type or attach a GPU for hardware acceleration (see below).
By default, Fly Machines use ephemeral storage, which is a problem if you want to reuse the same model across restarts without pulling it again. Create and attach a persistent volume to store the downloaded models:
Create the Fly Volume
fly volume create ollama
Update fly.toml and add [mounts]
[mounts]
source = "ollama"
destination = "/mnt/ollama/models"
Update fly.toml and add [env]
[env]
OLLAMA_MODELS = "/mnt/ollama/models"
Deploy your app
fly deploy
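The two fly.toml edits above can be scripted so they are safe to re-run. This is a minimal sketch, assuming fly.toml writes its tables as plain `[mounts]` and `[env]` headers (not the `[[mounts]]` array form). The function `add_ollama_volume_config` is a hypothetical helper, not a flyctl command. It skips any section that already exists, because repeating a TOML table makes the file invalid, and it refuses to guess when an existing `[env]` table lacks `OLLAMA_MODELS`.

```shell
#!/bin/sh
# Hypothetical helper (not part of flyctl): add the [mounts] and [env]
# settings from the steps above to an existing fly.toml.
# Assumes plain [mounts]/[env] table headers; existing sections are left
# alone rather than duplicated, since a repeated TOML table is invalid.
add_ollama_volume_config() {
  local toml="${1:-fly.toml}"
  if [ ! -f "$toml" ]; then
    echo "not found: $toml" >&2
    return 1
  fi
  # Mount the "ollama" volume at the directory that will hold the models.
  if ! grep -q '^\[mounts\]' "$toml"; then
    printf '\n[mounts]\nsource = "ollama"\ndestination = "/mnt/ollama/models"\n' >> "$toml"
  fi
  # Point Ollama's model directory at the mounted volume.
  if grep -q '^\[env\]' "$toml"; then
    if ! grep -q 'OLLAMA_MODELS' "$toml"; then
      echo "[env] already exists in $toml; add OLLAMA_MODELS to it by hand" >&2
      return 1
    fi
  else
    printf '\n[env]\nOLLAMA_MODELS = "/mnt/ollama/models"\n' >> "$toml"
  fi
}
```

Run `add_ollama_volume_config` from the app directory (or pass a path to fly.toml), review the result, then run `fly deploy`.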
Fly.io GPUs are currently available only through a waitlist. Sign up for the waitlist: https://fly.io/gpu
Once you've been accepted, create the app with the additional flags --vm-gpu-kind a100-pcie-40gb or --vm-gpu-kind a100-pcie-80gb.
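To avoid typos when switching between the CPU and GPU variants, the fly launch command can be assembled by a small dry-run helper that only prints it. This is a sketch: `fly_launch_cmd` is a hypothetical function, and it accepts only the two GPU kinds named above. It follows the instructions literally by appending the GPU flag to the same command; whether Fly pairs a GPU with the shared-cpu-8x size or expects a different size is not covered here, so check flyctl's output before relying on it.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the fly launch command from this
# README, optionally with one of the two GPU kinds it mentions.
# $1: app name (must be unique on Fly.io, since it becomes <name>.fly.dev)
# $2: optional GPU kind: a100-pcie-40gb or a100-pcie-80gb
fly_launch_cmd() {
  local name="$1"
  local gpu_kind="${2:-}"
  local cmd="fly launch --name $name --image ollama/ollama --internal-port 11434 --vm-size shared-cpu-8x --now"
  case "$gpu_kind" in
    "") ;;  # CPU-only deployment
    a100-pcie-40gb|a100-pcie-80gb) cmd="$cmd --vm-gpu-kind $gpu_kind" ;;
    *)
      echo "unsupported GPU kind: $gpu_kind" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$cmd"
}
```

For example, `fly_launch_cmd my-ollama a100-pcie-40gb` (where `my-ollama` is a placeholder name) prints the full command, which you can then review and run yourself.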