AI Infrastructure

NexusAI Gateway

A corporate AI gateway built with Spring AI. It routes LLM requests through a secure pipeline with DLP masking, RAG injection, multi-tenancy, and SHA-256 audit logs, and switches between Ollama and OpenAI with a single environment variable.

Spring AI 1.0 · pgvector RAG · DLP Masking · Multi-tenant


Request Flow

7-Stage Security Pipeline

Every request passes through the full pipeline before reaching the LLM. Data matching the DLP patterns (CPF, email, phone, and credit card numbers) is masked before the model sees the prompt.

JWT Auth

Spring Security validates the JWT; the tenant is extracted and stored in a ThreadLocal.

Rate Limiter

Bucket4j enforces per-tenant rate limits based on each tenant's subscription plan.

DLP Masking

Regex patterns replace CPF numbers, email addresses, phone numbers, and credit card numbers with placeholders before the prompt reaches the LLM.

Cache Check

A Caffeine in-memory cache returns the stored response for an identical prompt without calling the LLM (skipped when RAG is enabled).

RAG Injection

A pgvector similarity search over the tenant's uploaded documents retrieves the most relevant chunks and injects them into the prompt as context.

LLM Call

Spring AI abstracts the provider, so switching between Ollama and OpenAI takes a single environment variable (AI_PROVIDER).

Async Audit

A SHA-256 hash of the masked prompt is stored asynchronously; the raw text is never persisted.
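Bucket4j implements the token-bucket algorithm. As a rough sketch of what the Rate Limiter stage does, assuming each plan maps to a capacity and a refill period, here is a hand-rolled token bucket in plain Java. It is not Bucket4j's API; the Plan values (FREE, PRO), the class names, and the limits are hypothetical, and the clock is injected so the refill behavior can be tested.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical subscription plans; the real plans and limits are not given on this page.
enum Plan {
    FREE(10, 60),  // 10 requests, refilled evenly over 60 seconds
    PRO(100, 60);  // 100 requests, refilled evenly over 60 seconds

    final long capacity;
    final long refillPeriodSeconds;

    Plan(long capacity, long refillPeriodSeconds) {
        this.capacity = capacity;
        this.refillPeriodSeconds = refillPeriodSeconds;
    }
}

// Plain-Java token bucket (the algorithm Bucket4j implements), not Bucket4j itself.
final class TokenBucket {
    private final long capacity;
    private final double tokensPerNano;
    private final LongSupplier clock; // returns nanoseconds
    private double tokens;
    private long lastRefill;

    TokenBucket(Plan plan, LongSupplier clock) {
        this.capacity = plan.capacity;
        this.tokensPerNano = (double) plan.capacity / (plan.refillPeriodSeconds * 1_000_000_000L);
        this.clock = clock;
        this.tokens = plan.capacity; // a new bucket starts full
        this.lastRefill = clock.getAsLong();
    }

    synchronized boolean tryConsume() {
        long now = clock.getAsLong();
        // Add tokens for the elapsed time, never exceeding capacity.
        tokens = Math.min(capacity, tokens + (now - lastRefill) * tokensPerNano);
        lastRefill = now;
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }
}

// One bucket per tenant, created lazily from the tenant's plan on its first request.
final class TenantRateLimiter {
    private final Map<String, TokenBucket> buckets = new ConcurrentHashMap<>();
    private final LongSupplier clock;

    TenantRateLimiter(LongSupplier clock) {
        this.clock = clock;
    }

    boolean allow(String tenantId, Plan plan) {
        return buckets.computeIfAbsent(tenantId, id -> new TokenBucket(plan, clock)).tryConsume();
    }
}
```

In a request filter, `allow(tenantId, plan)` would run after JWT authentication, and a `false` result would typically be answered with HTTP 429 (Too Many Requests). Production code would pass `System::nanoTime` as the clock.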
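The Cache Check stage can be sketched as a lookup keyed on the already-masked prompt that is bypassed when RAG is enabled. Two assumptions here: the tenant ID is part of the key so tenants never share cached answers, and a `ConcurrentHashMap` stands in for Caffeine (it has none of Caffeine's size bounds or expiry). `CacheKey` and `ResponseCache` are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache key: tenant ID plus the already-masked prompt (tenant in the key is an assumption).
record CacheKey(String tenantId, String maskedPrompt) {}

final class ResponseCache {
    // Stand-in for a Caffeine cache; unlike Caffeine, this map never evicts or expires entries.
    private final Map<CacheKey, String> entries = new ConcurrentHashMap<>();

    String respond(String tenantId, String maskedPrompt, boolean ragEnabled,
                   Function<String, String> llm) {
        if (ragEnabled) {
            // Mirrors the documented rule: skip the cache entirely when RAG is enabled.
            return llm.apply(maskedPrompt);
        }
        // Calls the LLM only on a miss; identical prompts from the same tenant reuse the answer.
        return entries.computeIfAbsent(new CacheKey(tenantId, maskedPrompt),
                key -> llm.apply(key.maskedPrompt()));
    }
}
```

With Caffeine itself, the same lookup would be `cache.get(key, mappingFunction)` on a cache built with `Caffeine.newBuilder().maximumSize(...).expireAfterWrite(...).build()`.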


Multi-tenancy

Row-level isolation via tenant_id

Every registered company gets its own tenant_id. Every table carries a tenant_id column, and every query filters on it. The tenant is extracted from the JWT and stored in a ThreadLocal for the duration of each request.
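A minimal sketch of the ThreadLocal tenant holder described above. The class and method names are hypothetical. Clearing the value in a `finally` block is our assumption (the page doesn't say how cleanup is done), but it matters because servlet containers reuse threads across requests, so a stale tenant could otherwise leak into the next request.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Holds the current request's tenant ID; each thread sees only its own value.
final class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private TenantContext() {}

    static void set(String tenantId) {
        CURRENT.set(tenantId);
    }

    // Repositories would read this to bind the "WHERE tenant_id = ?" parameter.
    static Optional<String> get() {
        return Optional.ofNullable(CURRENT.get());
    }

    // Removes the value so a pooled thread starts the next request clean.
    static void clear() {
        CURRENT.remove();
    }

    // Hypothetical helper showing the set / finally-clear pattern a request filter would follow.
    static <T> T runAs(String tenantId, Supplier<T> work) {
        set(tenantId);
        try {
            return work.get();
        } finally {
            clear();
        }
    }
}
```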

Ollama

AI_PROVIDER=ollama · runs locally

OpenAI

AI_PROVIDER=openai · hosted API

Data Privacy

DLP Masking

Four regex patterns are applied in order before the prompt reaches the LLM. The credit card pattern runs first so the phone regex cannot partially match digits inside a card number.

Brazilian CPF

Input

Customer CPF 123.456.789-00 wants a refund

After masking

Customer CPF [CPF_REDACTED] wants a refund

Credit Card

Input

Card 4111 1111 1111 1111 was declined

After masking

Card [CARD_REDACTED] was declined

Email Address

Input

Contact user@company.com for details

After masking

Contact [EMAIL_REDACTED] for details

Brazilian Phone

Input

Call (11) 98765-4321 to confirm

After masking

Call [PHONE_REDACTED] to confirm
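The four rules can be sketched as an ordered list of compiled patterns. The exact regexes the project uses are not shown on this page; the ones below are assumptions chosen to reproduce the four examples above, with the credit card rule first as described. `DlpMasker` is a hypothetical name.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DlpMasker {
    // One masking rule: a pattern and the placeholder that replaces each match.
    private record Rule(Pattern pattern, String placeholder) {}

    // Order matters: the card rule runs first so the phone rule cannot grab
    // ten digits out of an unseparated 16-digit card number.
    private static final List<Rule> RULES = List.of(
            new Rule(Pattern.compile("\\b(?:\\d{4}[ -]?){3}\\d{4}\\b"), "[CARD_REDACTED]"),
            new Rule(Pattern.compile("\\b\\d{3}\\.\\d{3}\\.\\d{3}-\\d{2}\\b"), "[CPF_REDACTED]"),
            new Rule(Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"), "[EMAIL_REDACTED]"),
            new Rule(Pattern.compile("\\(?\\d{2}\\)?\\s?9?\\d{4}-?\\d{4}"), "[PHONE_REDACTED]"));

    private DlpMasker() {}

    static String mask(String prompt) {
        String result = prompt;
        for (Rule rule : RULES) {
            // quoteReplacement keeps '$' or '\' in a placeholder from being read as group references.
            result = rule.pattern().matcher(result)
                    .replaceAll(Matcher.quoteReplacement(rule.placeholder()));
        }
        return result;
    }
}
```

Running the phone rule first on `Card 4111111111111111 was declined` would mask only the first ten digits and leave the remaining six in the prompt; with the card rule first, the whole number is replaced.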

SHA-256 Audit Logs

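The audit log stores a SHA-256 hash of the already-masked prompt, never the raw text. The hash still lets an auditor confirm whether a given prompt was sent (by masking and hashing the candidate prompt and comparing the result), without the log itself becoming a store of sensitive content. Token counts and latency are also recorded for billing and performance tracking.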
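A minimal sketch of the audit write, assuming the masked prompt is hashed on the request thread and the row is persisted on a separate executor. `AuditEntry`, `AuditLogger`, and their fields are hypothetical; the in-memory list stands in for the audit table, and the executor stands in for whatever async mechanism the project actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;

// Audit row: the prompt hash plus billing and latency data, never the prompt text.
record AuditEntry(String tenantId, String promptSha256, int promptTokens,
                  int completionTokens, long latencyMillis) {}

final class AuditLogger {
    // Stand-in for the audit table.
    private final List<AuditEntry> store = new CopyOnWriteArrayList<>();
    private final ExecutorService executor;

    AuditLogger(ExecutorService executor) {
        this.executor = executor;
    }

    // Lowercase hex SHA-256 of the UTF-8 bytes of the masked prompt.
    static String sha256Hex(String maskedPrompt) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(maskedPrompt.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Hashes immediately so the prompt text is never captured by the async task.
    CompletableFuture<Void> log(String tenantId, String maskedPrompt,
                                int promptTokens, int completionTokens, long latencyMillis) {
        AuditEntry entry = new AuditEntry(tenantId, sha256Hex(maskedPrompt),
                promptTokens, completionTokens, latencyMillis);
        return CompletableFuture.runAsync(() -> store.add(entry), executor);
    }

    List<AuditEntry> entries() {
        return List.copyOf(store);
    }
}
```

The request path does not need to wait on the returned future; it is exposed here mainly so a caller or test can confirm the write completed.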

Architecture

Key Concepts

Chat Pipeline — Full Flow

DLP masks sensitive data, RAG injects document context when enabled, Spring AI calls the LLM, and the audit log stores SHA-256 hashes asynchronously, never raw text.

DLP — Regex Masking

The credit card pattern runs first to avoid partial phone matches; the CPF, email, and phone patterns follow in sequence, all before the LLM sees the prompt.

RAG — pgvector Search

JdbcTemplate queries pgvector for similar document chunks, scoped to the tenant. Hibernate 6 lacks native vector support, so raw SQL is the correct tool here.
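In the project this ranking happens inside PostgreSQL: pgvector orders chunks by embedding distance and JdbcTemplate runs the query. The in-memory sketch below only illustrates what such a query computes, assuming cosine distance; the `Chunk` record, the table and column names in the comment, and the prompt template are hypothetical, and how embeddings are produced is out of scope.

```java
import java.util.Comparator;
import java.util.List;

// A stored document chunk with its embedding (hypothetical shape).
record Chunk(String tenantId, String content, float[] embedding) {}

final class RagRetriever {
    private RagRetriever() {}

    static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // In-memory equivalent of a query like (names hypothetical):
    //   SELECT content FROM document_chunks WHERE tenant_id = ?
    //   ORDER BY embedding <=> ?::vector LIMIT ?
    // pgvector's <=> operator is cosine distance, i.e. 1 - cosine similarity.
    static List<String> topK(List<Chunk> chunks, String tenantId, float[] query, int k) {
        return chunks.stream()
                .filter(c -> c.tenantId().equals(tenantId)) // tenant isolation
                .sorted(Comparator.comparingDouble((Chunk c) -> cosineSimilarity(c.embedding(), query)).reversed())
                .limit(k)
                .map(Chunk::content)
                .toList();
    }

    // Injects the retrieved chunks ahead of the user's already-masked question.
    static String buildPrompt(List<String> context, String maskedQuestion) {
        return "Answer using this context:\n" + String.join("\n---\n", context)
                + "\n\nQuestion: " + maskedQuestion;
    }
}
```

Doing the ranking in SQL keeps it next to the pgvector index and avoids loading every chunk into the JVM, which is why the project queries through JdbcTemplate rather than filtering in Java.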

Tech Stack

Java 21 · Spring Boot 3.2 · Spring AI 1.0 · pgvector · PostgreSQL 16 · Flyway 10 · Spring Security 6 · Caffeine Cache · Bucket4j · Ollama · OpenAI · Testcontainers

Engineering Insights

Spring AI abstractions are genuinely useful

Switching between Ollama and OpenAI is a single env var change. The ChatClient abstraction handles the rest.

pgvector + JdbcTemplate is the right call

Hibernate 6 doesn't support custom vector types natively, so raw SQL via JdbcTemplate wasn't a workaround; it was the correct tool.

Filter double-registration gotcha

A filter annotated with @Component is auto-registered by Spring Boot as a servlet filter. If you also add it to the Spring Security chain, it is registered twice unless you explicitly disable one of the registrations.

SHA-256 for audit logs: less obvious than it sounds

Storing raw prompts creates a data liability. Hashing them preserves auditability without keeping the sensitive content.
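For the filter double-registration gotcha, one common fix is to keep the filter as a bean but disable its automatic servlet registration with a `FilterRegistrationBean`, so it runs only where it is added to the security chain. This is a configuration sketch for Spring Boot 3 with Spring Security 6 and needs those libraries on the classpath; `TenantFilter`, `SecurityConfig`, and the position after `UsernamePasswordAuthenticationFilter` are assumptions, and the JWT configuration is omitted.

```java
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import java.io.IOException;
import org.springframework.boot.web.servlet.FilterRegistrationBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.web.SecurityFilterChain;
import org.springframework.security.web.authentication.UsernamePasswordAuthenticationFilter;
import org.springframework.stereotype.Component;

// Hypothetical filter. Because it is a @Component, Spring Boot would also
// register it with the servlet container unless told otherwise.
@Component
class TenantFilter implements Filter {
    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Tenant extraction from the authenticated JWT would happen here.
        chain.doFilter(request, response);
    }
}

@Configuration
class SecurityConfig {
    // Turns off the automatic servlet-container registration of the bean.
    @Bean
    FilterRegistrationBean<TenantFilter> tenantFilterRegistration(TenantFilter filter) {
        FilterRegistrationBean<TenantFilter> registration = new FilterRegistrationBean<>(filter);
        registration.setEnabled(false);
        return registration;
    }

    // The only place the filter now runs: inside the Spring Security chain.
    @Bean
    SecurityFilterChain securityFilterChain(HttpSecurity http, TenantFilter filter) throws Exception {
        http.addFilterAfter(filter, UsernamePasswordAuthenticationFilter.class);
        return http.build();
    }
}
```

The alternative is to drop `@Component` and construct the filter only inside the security configuration; either way, the filter ends up with exactly one registration.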