NexusAI Gateway
A corporate AI gateway built with Spring AI. It routes LLM requests through a security pipeline: DLP masking, RAG injection, multi-tenancy, and SHA-256 audit logs. Switch between Ollama and OpenAI with a single environment variable.
7-Stage Security Pipeline
Every request passes through the full pipeline before reaching the LLM, so CPF numbers, emails, phone numbers, and card numbers are masked before the model sees the prompt.
1. Spring Security validates the JWT. The tenant is extracted and stored in a ThreadLocal.
2. Bucket4j enforces per-tenant rate limits based on the tenant's subscription plan.
3. Regex patterns replace CPF numbers, emails, phone numbers, and credit card numbers before the prompt reaches the LLM.
4. A Caffeine in-memory cache returns the stored response for an identical prompt (skipped when RAG is enabled).
5. A pgvector similarity search over uploaded documents injects relevant context into the prompt.
6. Spring AI abstracts the provider. Switch between Ollama and OpenAI via a single environment variable.
7. A SHA-256 hash of the prompt is stored asynchronously. The raw text is never persisted.
Multi-tenancy
Row-level isolation via tenant_id
Every registered company gets its own tenant_id. All tables filter by it. The tenant is extracted from the JWT and stored in a ThreadLocal for the duration of each request.
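As a rough illustration of the ThreadLocal part, the holder below is a minimal sketch in plain Java. The class name TenantContext and the runAs helper are hypothetical and not taken from the project; in the gateway the value would presumably be set by the security filter after JWT validation.

```java
import java.util.function.Supplier;

/** Hypothetical holder for the current request's tenant; all names are illustrative. */
final class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private TenantContext() {}

    static void set(String tenantId) { CURRENT.set(tenantId); }

    static String get() { return CURRENT.get(); }

    static void clear() { CURRENT.remove(); }

    /** Binds the tenant for the duration of the task and always clears it afterwards. */
    static <T> T runAs(String tenantId, Supplier<T> task) {
        set(tenantId);
        try {
            return task.get();
        } finally {
            clear(); // pooled threads are reused, so a stale tenant must not leak
        }
    }
}
```

Clearing in a finally block matters because servlet containers reuse worker threads: without it, the next request handled by the same thread could see the previous tenant.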
Ollama
AI_PROVIDER=ollama · runs locally
OpenAI
AI_PROVIDER=openai · env switch
DLP Masking
Four regex patterns are applied in order before the prompt reaches the LLM. The credit card pattern runs first so that the phone regex cannot partially match a card number.
Input
Customer CPF 123.456.789-00 wants a refund
After masking
Customer CPF [CPF_REDACTED] wants a refund
Input
Card 4111 1111 1111 1111 was declined
After masking
Card [CARD_REDACTED] was declined
Input
Contact user@company.com for details
After masking
Contact [EMAIL_REDACTED] for details
Input
Call (11) 98765-4321 to confirm
After masking
Call [PHONE_REDACTED] to confirm
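The four examples above can be reproduced with a sketch like the following. The regexes are assumptions written for illustration (16-digit cards in groups of four, formatted CPFs, Brazilian-style phone numbers), not the project's actual patterns, and the class name DlpMasker is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Illustrative masker; the patterns are assumptions, not the gateway's real ones. */
final class DlpMasker {
    // LinkedHashMap iterates in insertion order, which fixes the order rules are applied.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();

    static {
        // Card first: an unspaced 16-digit number would otherwise be partly eaten by the phone rule.
        RULES.put(Pattern.compile("\\b\\d{4}[ -]?\\d{4}[ -]?\\d{4}[ -]?\\d{4}\\b"), "[CARD_REDACTED]");
        RULES.put(Pattern.compile("\\b\\d{3}\\.\\d{3}\\.\\d{3}-\\d{2}\\b"), "[CPF_REDACTED]");
        RULES.put(Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"), "[EMAIL_REDACTED]");
        RULES.put(Pattern.compile("\\(?\\d{2}\\)?\\s?9?\\d{4}-?\\d{4}"), "[PHONE_REDACTED]");
    }

    private DlpMasker() {}

    /** Applies every rule in order and returns the masked prompt. */
    static String mask(String prompt) {
        String masked = prompt;
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            masked = rule.getKey().matcher(masked).replaceAll(rule.getValue());
        }
        return masked;
    }
}
```

With these patterns, a card written without separators such as 4111111111111111 would match the phone rule on its first ten digits if the phone rule ran first, which is the partial-match problem the ordering avoids.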
SHA-256 Audit Logs
The audit log stores a SHA-256 hash of the (already masked) prompt — never the raw text. This preserves auditability without creating a data liability. Token counts and latency are also recorded for billing and performance tracking.
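A minimal sketch of the hashing step, using only the JDK's MessageDigest. The class and method names are hypothetical, and the asynchronous write to the audit table is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative helper: hashes an already-masked prompt for the audit log. */
final class PromptHasher {
    private PromptHasher() {}

    /** Returns the SHA-256 digest of the input as 64 lowercase hex characters. */
    static String sha256Hex(String maskedPrompt) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(maskedPrompt.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

Because the same masked prompt always produces the same digest, an auditor who holds a prompt can confirm it was sent by recomputing the hash, while the log itself contains no readable text.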
Key Concepts
Chat Pipeline — Full Flow
DLP masks sensitive data, RAG injects document context when enabled, Spring AI calls the LLM, and the audit step logs a SHA-256 hash asynchronously, never the raw text.
DLP — Regex Masking
The credit card pattern runs first to avoid partial phone matches. The CPF, email, and phone patterns are then applied in sequence before the LLM sees the prompt.
RAG — pgvector Search
JdbcTemplate queries pgvector for similar document chunks per tenant. Hibernate 6 does not map pgvector's vector type out of the box, so raw SQL is the simpler tool here.
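The query side might look roughly like the sketch below. The table and column names (document_chunks, tenant_id, content, embedding) are assumptions, as is the choice of pgvector's cosine distance operator <=>. The helper only builds the text form of a vector that pgvector accepts, so it runs without a database.

```java
/** Illustrative SQL and helper for a per-tenant similarity search; the schema is hypothetical. */
final class VectorSearchSql {
    // <=> is pgvector's cosine distance operator; smaller means more similar.
    static final String FIND_SIMILAR =
            "SELECT content FROM document_chunks "
          + "WHERE tenant_id = ? "
          + "ORDER BY embedding <=> ?::vector "
          + "LIMIT ?";

    private VectorSearchSql() {}

    /** Formats an embedding as pgvector's text literal, e.g. [0.5,1.0,-2.0]. */
    static String toVectorLiteral(float[] embedding) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < embedding.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(embedding[i]);
        }
        return sb.append(']').toString();
    }

    // Possible call site, assuming a configured JdbcTemplate named jdbcTemplate:
    //   jdbcTemplate.queryForList(FIND_SIMILAR, String.class,
    //           tenantId, toVectorLiteral(queryEmbedding), topK);
}
```

Passing the embedding as a bound string and casting it with ::vector keeps the statement parameterized, and the tenant_id predicate keeps the search inside one tenant's documents.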
Engineering Insights
Spring AI abstractions are genuinely useful
Switching between Ollama and OpenAI is a single env var change. The ChatClient abstraction handles the rest.
pgvector + JdbcTemplate is the right call
Hibernate 6 does not map pgvector's vector type out of the box. Raw SQL via JdbcTemplate was not a workaround; it was the correct tool.
Filter double-registration gotcha
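A filter annotated with @Component is auto-registered by Spring Boot as a servlet filter. If it is also added to the Spring Security chain, it runs twice per request unless one registration is explicitly disabled, for example with a FilterRegistrationBean whose enabled flag is set to false.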
SHA-256 for audit logs: less obvious than it sounds
Storing raw prompts creates a data liability. Hashing them preserves auditability without keeping the sensitive content.