@_philschmid
Deploy your local GGUF models to the cloud with just one click. 🤯 Excited to share that @huggingface Inference Endpoints now natively supports llama.cpp, enabling one-click deployment of your local models to the cloud (AWS/Azure/GCP) with an @OpenAI-compatible endpoint.

TL;DR:
💡 Optimized llama.cpp container for Hugging Face Inference Endpoints
🦙 Supports all popular open models in GGUF format, like @AIatMeta Llama, @GoogleDeepMind Gemma, @MistralAI …
📈 Seamless transition from local to cloud deployment
🛠️ OpenAI-compatible endpoint for easy integration
📚 Multi-cloud support (@awscloud, @Azure, @googlecloud) using GPUs
💰 The llama.cpp team directly benefits from deployments

We're actively collaborating with @ggerganov and the llama.cpp team to improve this functionality. Expect more features, broader hardware support, and improved performance in the future. 🤝
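Since the endpoint is OpenAI-compatible, existing client code should work with only a URL and token swap. A minimal sketch of building such a request with the Python standard library — the endpoint URL, token, and model response are placeholders, not real values:

```python
import json
import urllib.request

# Placeholders -- substitute your own Inference Endpoint URL and HF token.
ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud/v1/chat/completions"
HF_TOKEN = "hf_xxx"  # placeholder Hugging Face access token

# Request body following the OpenAI chat-completions schema.
payload = {
    "messages": [{"role": "user", "content": "What is llama.cpp?"}],
    "max_tokens": 128,
    "stream": False,
}
headers = {
    "Authorization": f"Bearer {HF_TOKEN}",
    "Content-Type": "application/json",
}
body = json.dumps(payload).encode("utf-8")

# To actually send the request (requires a live endpoint):
# req = urllib.request.Request(ENDPOINT_URL, data=body, headers=headers)
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])

print(json.loads(body)["messages"][0]["role"])  # -> user
```

The same swap works with the official `openai` Python client by pointing `base_url` at the endpoint and passing the HF token as the API key.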