LLM Cost dashboard
Large language models (LLMs) are a key component of AI-powered applications, so understanding the costs associated with their usage is important for optimizing resources and managing budgets. Most LLMs use token-based pricing, where the cost depends on the number of tokens exchanged between the user and the model (input prompts and generated responses).
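As a minimal sketch of how token-based pricing typically works, the following example computes total tokens, total cost, and cost per token. The per-token prices and token counts are hypothetical values chosen for illustration, not rates from any specific provider.

```python
# Hypothetical token-based pricing example (prices are illustrative only).
input_price_per_token = 0.000003   # assumed price per input (prompt) token, in USD
output_price_per_token = 0.000015  # assumed price per output (response) token, in USD

input_tokens = 1_200    # tokens sent to the model in the prompt
output_tokens = 450     # tokens generated by the model in the response

total_tokens = input_tokens + output_tokens
total_cost = (input_tokens * input_price_per_token
              + output_tokens * output_price_per_token)
cost_per_token = total_cost / total_tokens

print(f"Total tokens: {total_tokens}")
print(f"Total cost:   ${total_cost:.6f}")
print(f"Cost/token:   ${cost_per_token:.8f}")
```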
The LLM Cost dashboard helps you track the cost and usage of LLM models. It provides metrics to analyze the following parameters:
- Tokens and their cost
- Retrieval Augmented Generation (RAG) latency and score
- Graphics Processing Unit (GPU) usage
To view the dashboard:
1. From the navigation menu, click Dashboards.
2. Search for the AIOps Observability folder and select it.
3. Click LLM Cost.
The dashboard is displayed.
Metrics in the LLM Cost dashboard
LLM Usage
| Panel | Description |
| --- | --- |
| Total Tokens | Displays the total number of tokens processed by the model during a given operation. |
| Cost Per Token | Displays the cost incurred per token for using the LLM during a given operation. |
| Total Cost | Displays the total cost incurred for using the LLM during a given operation. |
| Latency | Displays the time required by the LLM to process a request and return a response. |
| RAG Documents Retrieved | Displays the number of documents retrieved by the Retrieval-Augmented Generation (RAG) system while using the LLM. |
| RAG Latency | Displays the latency (response time) of the RAG system while using the LLM. |
| RAG Relevance Score | Displays a relevance score that indicates how relevant the retrieved information is to the query in the RAG system. |
| Top 5 GenAI Models by Token Usage | Displays a bar chart of the top five models by token usage. |
| Latency Trend | Displays the trend of the time required by the model to process a request and return a response over a selected period. |
| Avg Token Consumption vs Avg Usage Cost | Displays a comparison of the average number of tokens consumed and the average cost of token usage. |
| RAG Latency Trend | Displays the latency trend of the RAG system while using the LLM over a selected period. |
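The panels above are aggregations over individual LLM calls. The sketch below shows the kind of per-request record such panels can be derived from, assuming the application logs one entry per call; the field names and values are hypothetical and do not reflect the dashboard's actual data model.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical per-request usage record; field names are illustrative only.
@dataclass
class LlmCallRecord:
    model: str
    total_tokens: int          # feeds the Total Tokens panel
    cost: float                # feeds Total Cost; cost / tokens gives Cost Per Token
    latency_s: float           # feeds the Latency panel
    rag_docs_retrieved: int    # feeds RAG Documents Retrieved
    rag_latency_s: float       # feeds RAG Latency
    rag_relevance_score: float # feeds RAG Relevance Score

records = [
    LlmCallRecord("model-a", 1650, 0.0090, 1.8, 4, 0.35, 0.82),
    LlmCallRecord("model-b", 900, 0.0040, 1.1, 3, 0.28, 0.77),
]

# Aggregations of the kind the panels above display.
total_tokens = sum(r.total_tokens for r in records)
total_cost = sum(r.cost for r in records)
print(f"Total tokens: {total_tokens}, total cost: ${total_cost:.4f}")
print(f"Avg token consumption: {mean(r.total_tokens for r in records):.0f}")
print(f"Avg usage cost: ${mean(r.cost for r in records):.4f}")
print(f"Avg RAG latency: {mean(r.rag_latency_s for r in records):.2f}s")
```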
LLM GPU Usage
| Panel | Description |
| --- | --- |
| GPU Power Usage | Displays the power usage (in watts) of the Graphics Processing Unit (GPU) at a given moment. |
| GPU Temperature | Displays the temperature (in degrees Celsius) of the GPU. |
| GPU Memory Used | Displays the GPU memory (in MB) that is currently in use. |
| CPU Memory Utilization | Displays the percentage of CPU memory that is used for data transfers. |
| GPU Utilization | Displays the percentage of GPU capacity in use at a given moment. This metric indicates how much of the GPU compute resources (cores and processing units) are being used for tasks such as computations, rendering, or machine learning operations. |
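Metrics like those in the GPU panels above are typically exposed by NVIDIA's nvidia-smi tool. The sketch below shows one way to read such values on a host with an NVIDIA GPU; it is only an illustration of where these metrics commonly originate, not the dashboard's own collection mechanism.

```python
import subprocess

# Query GPU power, temperature, memory, and utilization with nvidia-smi.
query_fields = "power.draw,temperature.gpu,memory.used,utilization.gpu"
result = subprocess.run(
    ["nvidia-smi", f"--query-gpu={query_fields}", "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
)

# One line of comma-separated values is returned per GPU.
for line in result.stdout.strip().splitlines():
    power_w, temp_c, mem_used_mb, gpu_util_pct = [v.strip() for v in line.split(",")]
    print(f"Power: {power_w} W, Temp: {temp_c} °C, "
          f"Memory used: {mem_used_mb} MiB, GPU utilization: {gpu_util_pct} %")
```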