> ## Documentation Index
> Fetch the complete documentation index at: https://docs.nadles.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Billing for AI tokens usage

<Info>
  TL;DR:

  * Nadles natively supports automatic AI and LLM tokens usage billing.
  * All you need is to add one line to billable metric configuration.
  * Nadles Gateway processes the response automatically and records the tokens usage.
  * Nadles Billing aggregates the usage and initiates payments for your customers.
</Info>

## How it works

<Steps>
  <Step title="Create a product">
    This is a plan that your customers will subscribe to.
  </Step>

  <Step title="Add billable metrics">
    For example, "Input tokens", "Output tokens", "Total tokens", etc.
  </Step>

  <Step title="Configure token metering">
    Let Nadles know the response format for each endpoint, such as OpenAI Completions API, OpenAI Responses API, Anthropic APIs, Ollama, Gemini, etc.
  </Step>

  <Step title="Set a price per token">
    For each billable metric.
  </Step>

  <Step title="Set limits on the number of tokens used">
    Set limits on the number of tokens used for each billable metric.
  </Step>

  <Step title="Done">
    You've now set up billing for AI tokens usage. [Set up the user portal](/user-portal/getting-started) to let your customers pay and start using your API.
  </Step>
</Steps>

## Prerequisites

* You've [added an API](/api-management/add-api#adding-a-new-api)

<Frame>
  <img src="https://mintcdn.com/nadles/2ll2yZ-QPfDKqfVS/images/products/llm/llm-add-api.png?fit=max&auto=format&n=2ll2yZ-QPfDKqfVS&q=85&s=13228b1cbe46592868cc564aa6ae6706" style={{maxWidth: "550px", height: "auto", borderRadius: "10px"}} alt="Adding a new API" width="1144" height="476" data-path="images/products/llm/llm-add-api.png" />
</Frame>

* You've added the [LLM endpoint](/api-management/add-api#adding-endpoints)

<Frame>
  <img src="https://mintcdn.com/nadles/2ll2yZ-QPfDKqfVS/images/products/llm/llm-add-endpoint.png?fit=max&auto=format&n=2ll2yZ-QPfDKqfVS&q=85&s=5f88eba14a35037ab7687797f337ce48" alt="Adding an LLM endpoint" width="2370" height="1406" data-path="images/products/llm/llm-add-endpoint.png" />
</Frame>

## Create product

Navigate to **Products** in the left menu and click **Add new product**.
Make sure to select the endpoint you want to bill for.

<Frame caption="Adding a new product">
  <img src="https://mintcdn.com/nadles/2ll2yZ-QPfDKqfVS/images/products/llm/llm-add-product.png?fit=max&auto=format&n=2ll2yZ-QPfDKqfVS&q=85&s=4a2534b2264d1e0f9139f97c3a973dad" alt="Adding a new product" width="2466" height="1058" data-path="images/products/llm/llm-add-product.png" />
</Frame>

## Add billable metrics

Once the product is created, you can add billable metrics to it.
Billable metrics tell Nadles what you want to charge for.
For example, "Input tokens", "Output tokens", "Total tokens", etc.
Navigate to **Billable metrics** in the left menu and click **Add new metric**.

<Frame>
  <img src="https://mintcdn.com/nadles/2ll2yZ-QPfDKqfVS/images/products/llm/llm-add-billable-metric-form.png?fit=max&auto=format&n=2ll2yZ-QPfDKqfVS&q=85&s=ffb4c4aa7d6054c60667b6a632d649f4" alt="Adding billable metrics" width="2404" height="1550" data-path="images/products/llm/llm-add-billable-metric-form.png" />
</Frame>

<Note>
  If you need to later associate more endpoints with the billable metrics for the same product, you can do so by navigating to the **Usage configuration** tab and clicking **Add endpoint**.
</Note>

### Charge for input and output tokens separately

If you want to charge for input and output tokens separately, you can add two billable metrics: **Input tokens** and **Output tokens**.
Make sure to select the endpoint you want to bill for. You should associate the same endpoint with both billable metrics.

<Frame>
  <img src="https://mintcdn.com/nadles/2ll2yZ-QPfDKqfVS/images/products/llm/llm-billable-metric-list.png?fit=max&auto=format&n=2ll2yZ-QPfDKqfVS&q=85&s=b95ae9cbe6e69813ff987fee449e1713" alt="Billable metrics list" width="2444" height="1090" data-path="images/products/llm/llm-billable-metric-list.png" />
</Frame>

Navigate to the **Usage configuration** tab and configure the usage for each billable metric.

Set the **Quantity used by a single call** to:

* `openai_completions_usage().prompt_tokens` for **Input tokens**
* `openai_completions_usage().completion_tokens` for **Output tokens**

This will tell Nadles to record the usage of input and output tokens separately.

<Frame>
  <img src="https://mintcdn.com/nadles/2ll2yZ-QPfDKqfVS/images/products/llm/llm-billable-metric-usage-config.png?fit=max&auto=format&n=2ll2yZ-QPfDKqfVS&q=85&s=18acdb7a7648ec1f94e34f2a2bc77e73" alt="Billable metric usage configuration" width="2406" height="1816" data-path="images/products/llm/llm-billable-metric-usage-config.png" />
</Frame>

<Info>
  The `openai_completions_usage()` function returns the usage of the input and output tokens for the OpenAI Completions API.
  If you are using a different API, you can use the following functions:

  * openai\_responses\_usage
  * anthropic\_messages\_usage
  * ollama\_usage
  * openai\_completions\_usage
  * deepseek\_usage
  * mistral\_usage
  * gemini\_usage

  See full list of functions and their return values [below](#supported-response-formats).
</Info>

### Charge for total tokens

If you want to charge for total tokens, you can add one billable metric: **Total tokens**.
Make sure to select the endpoint you want to bill for.

<Frame>
  <img src="https://mintcdn.com/nadles/2ll2yZ-QPfDKqfVS/images/products/llm/llm-billable-metric-list-total-tokens.png?fit=max&auto=format&n=2ll2yZ-QPfDKqfVS&q=85&s=8ab854c09391733263dbff38b6a869ff" alt="Billable metrics list for total tokens" width="2428" height="1160" data-path="images/products/llm/llm-billable-metric-list-total-tokens.png" />
</Frame>

Navigate to the **Usage configuration** tab and configure the usage for the billable metric.

Set the **Quantity used by a single call** to:

```js theme={null}
openai_completions_usage().prompt_tokens + openai_completions_usage().completion_tokens
```

This will tell Nadles to record the total usage of input and output tokens.

<Info>
  The `openai_completions_usage()` function returns the usage of the input and output tokens for the OpenAI Completions API.
  If you are using a different API, you can use the following functions:

  * openai\_responses\_usage
  * anthropic\_messages\_usage
  * ollama\_usage
  * openai\_completions\_usage
  * deepseek\_usage
  * mistral\_usage
  * gemini\_usage

  See full list of functions and their return values [below](#supported-response-formats).
</Info>

## Add prices

Navigate to the **Prices** in the left menu and click **Add recurring price**.

<Note>
  This is a basic example of a price. You can configure more complex pricing models [here](/pricing/overview).
</Note>

Enter price name (e.g., "Input tokens", "Output tokens", "Total tokens", etc.) and select the billable metric you want to price.

For pay per use, check the checkbox **Usage is metered**. Or for unmetered, don't check the checkbox and set the quantity as a hard limit (e.g., 1000 tokens).

Set the price per token. For convenience, you can also set a price per N tokens, e.g. \$0.5 per 1000 tokens or \$5 per 1M tokens.

Set the billing period (e.g., monthly, yearly, etc.).

Click on **Submit**.

<Frame>
  <img src="https://mintcdn.com/nadles/jB3yFH2iSLY0A40y/images/products/llm/llm-billable-metric-prices.png?fit=max&auto=format&n=jB3yFH2iSLY0A40y&q=85&s=3ae4b35962580bae7d26f091c5fb1dca" alt="Billable metric prices" width="2416" height="1428" data-path="images/products/llm/llm-billable-metric-prices.png" />
</Frame>

## What's next

With this setup, Nadles will meter the usage of tokens, calculate the payment amount based on the actual usage and initiate payments for customers subscribed to this product.

[Set up the user portal](/user-portal/getting-started) to let your customers pay and start using your API.

## Force usage reporting

To make sure usage is reported by the LLM API if the response is streaming, you may want Nadles API Gateway to modify the request body to include the option to report usage.
Use [request transformation](/api-management/transformations#body) to achieve that.

Usually that is done by adding the following to request JSON body:

```json theme={null}
{
    "stream_options": {
        "include_usage": true
    }
}
```

Nadles can add this parameter automatically for you.

Navigate to your API on Nadles and click on **Transformations**.

Click on **Add new request transformation**.

Choose **Target:** `Body`, **Action:** `Replace`, **Value is**: `Jq expression`.

Enter the following expression as a value:

```js theme={null}
.request.body | fromjson | if .stream == true then .stream_options = {"include_usage": true} else . end
```

<Frame>
  <img src="https://mintcdn.com/nadles/2ll2yZ-QPfDKqfVS/images/products/llm/llm-request-body-transformation.png?fit=max&auto=format&n=2ll2yZ-QPfDKqfVS&q=85&s=0f10a2e97758cb370f2c5cb14d8da65e" alt="Force usage reporting" width="2718" height="1782" data-path="images/products/llm/llm-request-body-transformation.png" />
</Frame>

Click on **Submit**.

Now, Nadles will automatically add the `{ stream_options: { include_usage: true }}` parameter to the request body if the response is streaming.

## Supported response formats

Nadles supports the following response formats, both streaming and non-streaming:

### OpenAI Completions

**Used by:** OpenAI Completions API, DeepSeek, Mistral. Also, Ollama supports this format.

**Function:** `openai_completions_usage()`

**Return value:**

```json theme={null}
{
	"prompt_tokens":     100,
	"completion_tokens": 1500,
	"total_tokens":      1600
}
```

### OpenAI Responses

**Used by:** OpenAI Responses API. Also, Ollama supports this format.

**Function:** `openai_responses_usage()`

**Return value:**

```json theme={null}
{
	"input_tokens":          100,
	"input_tokens_details":  { 
        "cached_tokens": 0 
    },
	"output_tokens":         1500,
	"output_tokens_details": { 
        "reasoning_tokens": 0 
    },
	"total_tokens":          1600
}
```

### Anthropic models

**Used by:** Anthropic APIs.

**Function:** `anthropic_messages_usage()`

**Return value:**

```json theme={null}
{
    "input_tokens":  0,
	"output_tokens": 0,
}
```

### Ollama

In addition to OpenAI and Anthropic-compatible formats, Ollama uses its own response structure.
Its streaming responses differ by using NDJSON (newline-delimited JSON) instead of Server-Sent Events (SSE).
To enable Nadles to correctly process Ollama responses, use the following built-in function:

```js theme={null}
ollama_usage()
```

It returns the following object:

```json theme={null}
{
    "prompt_eval_count": 100, // number of input tokens
    "eval_count": 100 // number of output tokens
}
```

### Gemini

**Used by:** Gemini API.

**Function:** `gemini_usage()`

**Return value:**

```json theme={null}
{
	"promptTokenCount":     0,
	"candidatesTokenCount": 0,
	"totalTokenCount":      0,
}
```
