AI / LLM System

AI Agent with Local LLM

Built an AI agent using a locally hosted LLM to interact with the internal helpdesk system.

Next.js

FastAPI

LangGraph

LangChain

vLLM

Docker

BM25

Vector Search

Overview

The agent answers questions about helpdesk tickets, looks up asset information, and performs basic actions in the helpdesk system. Everything runs on local hardware with no external API calls.

Architecture

The system has three main layers:

Frontend — A Next.js chat interface where users type queries in natural language.

Agent — A LangGraph-based agent orchestrating multi-step reasoning. It decides which tools to call, handles tool output, and composes a final answer.
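As a rough illustration of the decide / call tool / compose cycle described above, here is a minimal sketch in plain Python. It deliberately does not use the LangGraph API. The tool names (lookup_ticket, lookup_asset), the TCK-/AST- ID prefixes, and the rule-based decide function are all hypothetical stand-ins; in the real agent the model makes the tool choice.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical tools; a real agent would query the helpdesk system here.
def lookup_ticket(ticket_id: str) -> str:
    return f"Ticket {ticket_id}: status=open"


def lookup_asset(asset_tag: str) -> str:
    return f"Asset {asset_tag}: assigned to IT"


TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_ticket": lookup_ticket,
    "lookup_asset": lookup_asset,
}


@dataclass
class AgentState:
    query: str
    observations: List[str] = field(default_factory=list)


def decide(state: AgentState) -> Optional[Tuple[str, str]]:
    """Stand-in for the LLM's tool choice.

    Returns (tool_name, argument), or None when no further tool call is needed.
    """
    if state.observations:
        return None  # one lookup is enough in this toy example
    for word in state.query.replace("?", "").split():
        if word.startswith("TCK-"):  # assumed ticket ID format
            return ("lookup_ticket", word)
        if word.startswith("AST-"):  # assumed asset tag format
            return ("lookup_asset", word)
    return None


def run_agent(query: str, max_steps: int = 5) -> str:
    """Loop: pick a tool, record its output, stop when nothing is left to do."""
    state = AgentState(query=query)
    for _ in range(max_steps):  # hard cap so the loop always terminates
        action = decide(state)
        if action is None:
            break
        name, arg = action
        state.observations.append(TOOLS[name](arg))
    if not state.observations:
        return "No tool was needed for this query."
    # Stand-in for the final answer composition step.
    return " ".join(state.observations)


print(run_agent("What is the status of TCK-1042?"))
```

The step cap is a design choice worth keeping in any version of this loop: it bounds cost when the model keeps requesting tools.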

Retrieval — A hybrid retrieval system combining BM25 keyword search and vector similarity search. BM25 handles exact term matches well (ticket IDs, names); vector search handles semantic similarity. Results are fused before being passed to the model.
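The fusion method is not specified here. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than a description of the actual implementation. The document IDs are made up, and k = 60 is the conventional default for RRF, not a tuned value.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked ID lists into one.

    Each document scores 1 / (k + rank) per list it appears in (rank starts at 1),
    so items ranked well by both retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers, best match first.
bm25_hits = ["T-17", "T-03", "T-42"]
vector_hits = ["T-03", "T-99", "T-17"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)  # ['T-03', 'T-17', 'T-99', 'T-42']
```

RRF uses only ranks, so it avoids having to normalize BM25 scores and cosine similarities onto a common scale.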

What I Built

Hosted a fine-tuned model with vLLM for inference, exposing an OpenAI-compatible API
Built MCP (Model Context Protocol) connectors to give the agent access to live helpdesk data
Implemented long-term memory using a summarization approach: older conversation turns are condensed and stored, keeping the context window manageable
Added a context window budget manager to prevent token overflow on long sessions
Fine-tuned a base model on internal helpdesk examples to improve domain-specific response quality
Containerized all services with Docker Compose for consistent local deployment
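The summarization memory and the context budget manager from the list above can work together roughly as follows. This is a minimal sketch under stated assumptions: tokens are counted by whitespace splitting (a real system would use the model's tokenizer), placeholder_summarize stands in for an LLM summarization call, and the budget covers only the verbatim turns, so the summary would need its own allowance. All function names are hypothetical.

```python
from typing import Callable, List, Tuple


def count_tokens(text: str) -> int:
    """Crude whitespace token count; an assumption, not the model's tokenizer."""
    return len(text.split())


def placeholder_summarize(turns: List[str]) -> str:
    """Placeholder for an LLM call: keeps the first five words of each turn."""
    return " | ".join(" ".join(turn.split()[:5]) for turn in turns)


def fit_to_budget(
    summary: str,
    turns: List[str],
    budget: int,
    summarize: Callable[[List[str]], str] = placeholder_summarize,
) -> Tuple[str, List[str]]:
    """Evict the oldest turns until the rest fit, folding evicted turns into the summary.

    Returns the updated summary and the turns kept verbatim.
    The most recent turn is always kept, even if it alone exceeds the budget.
    """
    recent = list(turns)
    evicted: List[str] = []
    while len(recent) > 1 and sum(count_tokens(t) for t in recent) > budget:
        evicted.append(recent.pop(0))
    if evicted:
        prior = [summary] if summary else []
        summary = summarize(prior + evicted)
    return summary, recent


turns = [
    "user: my laptop will not boot after the update",
    "agent: try holding the power button for ten seconds",
    "user: that worked thanks",
]
summary, recent = fit_to_budget("", turns, budget=15)
print(summary)  # user: my laptop will not
print(recent)   # the last two turns, kept verbatim
```

Folding the previous summary back into the summarizer input keeps a single running summary rather than a growing list of them.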

Why Local?

Privacy requirements made external APIs a non-starter for this use case. Running everything locally also eliminated per-token costs and allowed fine-tuning on internal data without sending it to a third party.