The 'Llama as a Service' infrastructure provides AO users with an easy-to-use, fully decentralized LLM inference environment: send a single AO message to the Llama-Herder, and you will receive a response from one of its herded inference workers.
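Concretely, the request/response flow might look like the following from inside an aos (Lua) process. This is a hedged sketch: the herder process ID, the action tags ("Inference", "Inference-Response"), and the message fields are illustrative assumptions, not the confirmed Llama-Herder interface.

```lua
-- Sketch of requesting inference from inside an aos process.
-- LLAMA_HERDER, the tag names, and field names are assumptions for
-- illustration; consult the actual Llama-Herder docs for the real API.

LLAMA_HERDER = "herder-process-id-here"  -- placeholder process ID

-- Send the prompt to the herder as a plain AO message.
Send({
  Target = LLAMA_HERDER,
  Action = "Inference",          -- assumed action tag
  Data = "Why is the sky blue?"  -- the prompt
})

-- Handle the reply routed back from whichever worker served the request.
Handlers.add(
  "inference-response",
  Handlers.utils.hasMatchingTag("Action", "Inference-Response"),  -- assumed tag
  function(msg)
    print("Worker reply: " .. msg.Data)
  end
)
```

The point of the design is visible even in this sketch: the caller never picks a worker. It addresses one well-known herder process, and the response arrives as an ordinary AO message from whichever inference worker was scheduled.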
Behind the scenes, Llama-Herder offers the services of AOS-Llama, a port of Llama.cpp. AOS-Llama lets users run Meta's Llama models and Microsoft's Phi models, among many others, in AO's