Back

AI / LLM System

AI Agent with Local LLM

Built an AI agent using a locally hosted LLM to interact with the internal helpdesk system.

Next.js
FastAPI
LangGraph
LangChain
vLLM
Docker
BM25
Vector Search

Overview

An AI agent that can answer questions about helpdesk tickets, look up asset information, and perform basic actions in the helpdesk system — all running on local hardware with no external API calls.

Architecture

The system has three main layers:

Frontend — A Next.js chat interface where users type queries in natural language.

Agent — A LangGraph-based agent orchestrating multi-step reasoning. It decides which tools to call, handles tool output, and composes a final answer.

Retrieval — A hybrid retrieval system combining BM25 keyword search and vector similarity search. BM25 handles exact term matches well (ticket IDs, names); vector search handles semantic similarity. Results are fused before being passed to the model.

What I Built

  • Hosted a fine-tuned model with vLLM for inference, exposing an OpenAI-compatible API
  • Built MCP (Model Context Protocol) connectors to give the agent access to live helpdesk data
  • Implemented long-term memory using a summarization approach: older conversation turns are condensed and stored, keeping the context window manageable
  • Added a context window budget manager to prevent token overflow on long sessions
  • Fine-tuned a base model on internal helpdesk examples to improve domain-specific response quality
  • Containerized all services with Docker Compose for consistent local deployment

Why Local?

Privacy requirements made external APIs a non-starter for this use case. Running everything locally also eliminated per-token costs and allowed fine-tuning on internal data without sending it to a third party.