Using ngrok with Ollama
Ollama is a local AI model runner that lets you download and execute large language models (LLMs) on your own machine, which makes it a natural pairing for ngrok. By combining Ollama with ngrok, you can give your local Ollama instance an endpoint on the internet, enabling remote access and integration with other applications.
What you'll need
- The ngrok agent installed on your system
- An Ollama instance running on the same system
- An ngrok account
1. Connect the agent to your ngrok account
Sign up for an account at ngrok.com and run:
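A typical command, assuming `$YOUR_AUTHTOKEN` stands in for the authtoken shown in your ngrok dashboard:

```bash
ngrok config add-authtoken $YOUR_AUTHTOKEN
```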
2. Install and run Ollama on your machine
Download Ollama by following the instructions on the Ollama website and search for a model you'd like to use.
Pull the model you've chosen to your instance:
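For example, assuming you chose `gemma3` (the model used in the examples later in this guide):

```bash
ollama pull gemma3
```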
Start the Ollama server:
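If Ollama isn't already running as a background service, start it directly:

```bash
ollama serve
```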
By default, Ollama will start on http://localhost:11434.
3. Create an endpoint for your Ollama server
In a new terminal window, start an ngrok tunnel to your local Ollama port:
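A minimal sketch, assuming Ollama is listening on its default port 11434; the `--host-header` flag rewrites the `Host` header so forwarded requests look local to Ollama:

```bash
ngrok http 11434 --host-header="localhost:11434"
```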
ngrok will generate a public forwarding URL like https://abcd1234.ngrok.app.
This URL now provides public access to your local Ollama instance.
4. Use your Ollama instance from anywhere
You can now send requests to your Ollama server from anywhere using the ngrok URL.
For example, run the `curl` command below, replacing `abcd1234.ngrok.app` with your domain name and `gemma3` with a model you've pulled, to prompt your LLM.
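A sketch using Ollama's `/api/generate` endpoint with an example prompt, setting `stream` to `false` so the response arrives as a single JSON object:

```bash
curl https://abcd1234.ngrok.app/api/generate \
  -d '{
    "model": "gemma3",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'
```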
Keep in mind, however, that you now have a public endpoint for your Ollama instance, which means anyone on the internet could find and use it.
5. Protect your Ollama instance with basic auth
You may not want everyone to be able to access your LLM. ngrok can quickly add authentication to your endpoint without any changes to Ollama itself. Explore Traffic Policy to understand all the ways ngrok can protect your endpoint.
Create a new `traffic-policy.yml` file and paste in the policy below, which uses the `basic-auth` Traffic Policy action to only allow visitors with the credentials `user:password1` or `admin:password2` to access your app.
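A minimal sketch of such a policy:

```yaml
on_http_request:
  - actions:
      - type: basic-auth
        config:
          credentials:
            - user:password1
            - admin:password2
```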
Start the agent again with the `--traffic-policy-file` flag:
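Assuming the policy file sits in your current directory:

```bash
ngrok http 11434 --host-header="localhost:11434" --traffic-policy-file traffic-policy.yml
```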
You can test your traffic policy by sending the same LLM prompt to Ollama's API with the `Authorization: Basic` header and a base64-encoded version of `user:password1`.
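For example, generating the header value with `base64` on the fly (replace the URL and model as before):

```bash
curl https://abcd1234.ngrok.app/api/generate \
  -H "Authorization: Basic $(echo -n 'user:password1' | base64)" \
  -d '{
    "model": "gemma3",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'
```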
If you send the same request without the `Authorization` header, you should receive a `401 Unauthorized` response.
Your personal LLM is now locked down to only accept authenticated users.