How I Managed to Reduce LLM API Costs By 85%

· 3 min read · Wingman Protocol


As a developer who frequently uses large language models (LLMs) for prototyping and small-scale applications, I found myself hitting a wall with my OpenAI API costs. What started as a small experiment quickly turned into a recurring monthly expense that I couldn’t justify for side projects.

---

The Problem

I was using OpenAI’s GPT-3.5 model (gpt-3.5-turbo) for a variety of tasks: summarizing text, generating code snippets, and even handling basic natural language processing (NLP) tasks. While the model was powerful and easy to use, the cost kept climbing.

For example, in a single month I used around 10 million tokens. At OpenAI’s rate of $0.002 per 1,000 tokens, that came out to about $20. And as the number of requests increased, so did the cost. I realized that for my use case, the model was overkill—especially since many of my tasks didn’t require high accuracy or complex reasoning.
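The arithmetic behind that bill is simple enough to sanity-check yourself (the function name here is just for illustration):

```python
def monthly_cost(tokens: int, price_per_1k: float = 0.002) -> float:
    """Return the API cost in dollars for a given token count
    at a per-1,000-token price."""
    return tokens / 1000 * price_per_1k

# ~10 million tokens at $0.002 per 1K tokens
print(round(monthly_cost(10_000_000), 2))  # 20.0
```

The important property is that the cost grows linearly with usage, which is exactly what makes a flat-rate server attractive past a certain volume.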

The pain points were:

- Costs scaled linearly with usage, so every new feature made the bill bigger.
- The model was more capable (and more expensive) than most of my tasks required.
- My side projects were fully dependent on a paid third-party API.

I needed a more cost-effective and flexible solution.

---

The Solution

After some research, I decided to switch to a self-hosted model using Ollama, a lightweight LLM server that runs locally. I deployed it on a $20/month Hetzner VPS (4 GB RAM, 2 vCPU, 50 GB SSD) running Ubuntu 22.04.

Ollama makes it easy to run models like Llama2 locally, and it supports various models through its API. This allowed me to replace OpenAI’s API entirely with a local instance, significantly cutting costs.

---

Step-by-Step Implementation

Here’s how I set it up:

#### 1. Provision the VPS

I used Hetzner’s Cloud Console to spin up an Ubuntu 22.04 instance with the following specs:

- 4 GB RAM
- 2 vCPU
- 50 GB SSD
I then connected via SSH:

```shell
ssh root@your-vps-ip
```

#### 2. Install Ollama

Ollama provides a simple install script:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```

After installation, start the Ollama service:

```shell
sudo systemctl enable ollama
sudo systemctl start ollama
```

Check the status:

```shell
systemctl status ollama
```

#### 3. Pull a Model

I chose Llama2 for its balance of performance and size. You can pull it with:

```shell
ollama pull llama2
```

This downloads the model to your VPS and makes it available via the API.

#### 4. Configure the API

By default, Ollama listens only on localhost:11434. To access it from outside, I configured a reverse proxy using Nginx. (One caveat: Ollama’s API has no built-in authentication, so if you expose it publicly, restrict access with a firewall or add auth at the proxy.)

Install Nginx:

```shell
sudo apt update
sudo apt install nginx
```

Create a new Nginx config file at /etc/nginx/sites-available/ollama:

```nginx
server {
    listen 80;

    location / {
        proxy_pass http://localhost:11434;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Enable the site and restart Nginx:

```shell
sudo ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl restart nginx
```

#### 5. Update Your Application

I modified my application code to point to the new API endpoint. For example, in Python:

```python
import requests

# Ollama's generate endpoint; "stream": False returns one JSON object
# instead of a stream of partial responses.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": "Explain quantum computing in simple terms.",
        "stream": False,
    },
)

print(response.json()["response"])
```
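In practice I found it cleaner to wrap the call in one helper so the rest of the codebase doesn’t care which backend is behind it. A minimal sketch (the `generate` name and the injectable `post` parameter are my own additions; injecting the transport lets you exercise the function without a running server):

```python
def generate(prompt: str, model: str = "llama2",
             base_url: str = "http://localhost:11434", post=None) -> str:
    """Send a non-streaming generate request to an Ollama server and
    return the response text. `post` defaults to requests.post but can
    be swapped out (e.g. with a fake) in tests."""
    if post is None:
        import requests  # third-party; only needed when no transport is injected
        post = requests.post
    resp = post(
        f"{base_url}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,  # local models on a small VPS can take a while
    )
    resp.raise_for_status()
    return resp.json()["response"]
```

With this in place, switching models (or moving back to a hosted API) is a one-line change instead of a scattered find-and-replace.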

---

The Results

After a month of running the new setup, the variable API bill was gone: the flat $20/month VPS covers unlimited requests, and my total spend dropped by roughly 85% compared to what the API was on track to cost. I also gained more control over the environment, like the ability to cache responses and run multiple models without extra charges.
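The response cache mentioned above can be as simple as memoizing on an exact (model, prompt) pair. A minimal sketch (the names are mine, and the `fetch` callable stands in for the actual API call so the logic is testable on its own):

```python
import hashlib
import json

_cache: dict[str, str] = {}

def cached_generate(prompt: str, fetch, model: str = "llama2") -> str:
    """Memoize LLM responses keyed on an exact (model, prompt) match.
    `fetch(model, prompt)` performs the real API call on a cache miss."""
    key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = fetch(model, prompt)
    return _cache[key]
```

Exact-match caching only pays off when identical prompts recur (e.g. templated summaries), but when they do, repeated requests cost nothing at all.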

---

Lessons Learned

1. Understand your use case – Not every project needs a high-end model. Llama2 is more than sufficient for many tasks and is far cheaper to run locally.

2. Leverage open-source tools – Ollama is a game-changer for developers looking to reduce dependency on paid APIs. It’s fast, simple, and well-documented.

3. Optimize infrastructure – A small VPS can handle a lot of LLM workloads. Choose hardware that matches your usage pattern and don’t overpay for capabilities you don’t need.
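As a rough sizing sanity check (ballpark figures, not measurements): weight memory is roughly parameters × bits-per-weight ÷ 8, which is why a 4-bit-quantized 7B model just about fits in the 4 GB instance above, while an unquantized fp16 copy would not come close.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone
    (ignores KV cache and runtime overhead)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

print(round(weight_memory_gb(7, 4), 1))   # 3.5 (GB at 4-bit)
print(round(weight_memory_gb(7, 16), 1))  # 14.0 (GB at fp16)
```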

---

Switching to a self-hosted model was a no-brainer once I saw the cost savings and performance gains. It’s not perfect for every scenario, but for my use case, it’s been a huge win. If you’re looking to cut down on API costs, I highly recommend giving Ollama and a VPS a try.

Written by the Wingman Protocol team — developers building with AI APIs, cloud infrastructure, and automation tools daily. Our guides are based on hands-on experience running production systems.

