No need to use Ollama. LLama.cpp has its own OpenAI-compatible server[0] and it ...

citizenpaul · on Oct 31, 2024

Thanks didn't know that.

Do you happen to know the reason to use ollama rather than the built in server? How much work is required to get similar functionality? looks like just downloading the models? I find it odd that ollama took off so quickly if LLamma.cpp had the same built in functionality.

PhilippGille · on Oct 30, 2024

Yes I'm aware. I was contrasting the general use of an inference server vs calling llama.cpp directly (not via HTTP request).

And among servers Ollama seems to be more popular, so it's worth mentioning when talking about support for local LLMs.