@victormustar
llama.cpp now has Ollama-style model management.
• Auto-discover GGUFs from cache
• Load on first request
• Each model runs in its own process
• Route by `model` (OpenAI-compatible API)
• LRU unload at `--models-max`
https://t.co/yfmfHL7zzj
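
A minimal sketch of what routing by `model` looks like from the client side, assuming `llama-server` is running locally on its default port 8080 with the new model-management mode enabled; the model names below are placeholders for GGUFs it discovered in your cache, not real identifiers from the announcement:

```python
# Sketch: talk to a local llama.cpp server through its OpenAI-compatible API.
# Assumptions: default port 8080, placeholder model names, no API key required.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# The first request naming a model loads it on demand; later requests reuse
# the running process. Per the announcement, least-recently-used models are
# unloaded once the `--models-max` limit is reached.
for model in ["qwen2.5-7b-instruct-q4_k_m", "llama-3.2-3b-instruct-q4_k_m"]:
    resp = client.chat.completions.create(
        model=model,  # the server routes the request by this field
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(model, "->", resp.choices[0].message.content)
```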