Advanced

Architecture

How a request travels through Bodhi App: from the wire to the inference engine and back

Security Model

What Bodhi App protects, what it relies on the deployment to provide, and how to harden a self-hosted installation

Inference Stack

How Bodhi App invokes llama.cpp: variants, GGUF resolution, runtime arguments, and the keep-alive timer

Performance Tuning

Choosing variants, quantization, context window, and concurrency to match Bodhi App to your hardware

Observability

Logs, settings introspection, the background queue, and an honest account of today’s observability gaps in Bodhi App