Why
Running LLMs locally can be useful for a number of reasons:
- Privacy: You don’t have to send your data to a server
- Speed: You don’t have to wait for the server to respond
- Faster than searching: Asking the model directly can be quicker than digging through search results
- Offline: You can use it without an internet connection
- Different models: You can use models that are not available online
Why not
Running LLMs locally can be resource intensive. Without a capable GPU, larger models in particular can be slow.
Installation
Download it from https://ollama.com/download/
On Linux, you can install it with:
curl -fsSL https://ollama.com/install.sh | sh
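To confirm the install worked (assuming ollama ended up on your PATH), you can ask the CLI for its version:
ollama --version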
Models
You can get your models from the Ollama library
Pull a model with:
ollama pull <name>
I recommend:
- phi3, a lightweight model
- llama3.1, a state-of-the-art model from Meta
- llama2-uncensored, an uncensored Llama 2 model by George Sung and Jarrad Hope
Note that you’d need a beefier system for the larger models.
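For example, to grab the lightweight phi3 model from the list above and confirm it shows up locally:
ollama pull phi3   # download the model
ollama list        # show models available on this machine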
Usage
(Pull and) run the model with:
ollama run <name> [prompt]
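For example, assuming phi3 has been pulled, you can pass a one-off prompt or drop into an interactive chat (type /bye to exit):
ollama run phi3 "Why is the sky blue?"
ollama run phi3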
For more options:
ollama --help
API
If you want the Ollama API to be accessible to other systems on your network, add the following to the [Service] section of the config file /etc/systemd/system/ollama.service:
Environment="OLLAMA_HOST=0.0.0.0:11434"
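After editing the unit file, systemd needs to pick up the change, so reload and restart the service (standard systemd procedure):
sudo systemctl daemon-reload
sudo systemctl restart ollama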
You can check out the API documentation here.
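As a quick sanity check, here is a minimal request against the generate endpoint, assuming the server is listening on the default port 11434 and phi3 has been pulled:
curl http://localhost:11434/api/generate -d '{
  "model": "phi3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'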
UIs
You can use a UI for a more user-friendly experience.
Uninstall
sudo systemctl stop ollama
sudo systemctl disable ollama
sudo rm /etc/systemd/system/ollama.service
sudo rm $(which ollama)
sudo rm -r /usr/share/ollama
sudo userdel ollama
sudo groupdel ollama