Architectural Breakdown | Go Memory Docs

📄️ Dual Memory System

Why Go Memory has a dual memory architecture

📄️ Storage Architecture

We use Redis for storing Core Memories and Qdrant (a vector database) for storing General Memories.
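As a rough illustration of the split described above, the sketch below routes each memory to one of two backends by kind. It is a minimal sketch under stated assumptions, not Go Memory's actual code: `Memory`, `Store`, `Router`, and `memStore` are hypothetical names, and the in-memory `memStore` stands in for the Redis and Qdrant clients so the example stays self-contained.

```go
package main

import (
	"fmt"
	"sync"
)

// Kind distinguishes the two memory classes named in the text.
type Kind int

const (
	Core Kind = iota
	General
)

// Memory is a hypothetical record shape, not Go Memory's actual schema.
type Memory struct {
	UserID string
	Kind   Kind
	Text   string
}

// Store is a hypothetical abstraction over a storage backend.
type Store interface {
	Put(m Memory) error
	List(userID string) []Memory
}

// memStore is an in-memory stand-in for a real backend client.
type memStore struct {
	mu   sync.Mutex
	data map[string][]Memory
}

func newMemStore() *memStore {
	return &memStore{data: make(map[string][]Memory)}
}

func (s *memStore) Put(m Memory) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[m.UserID] = append(s.data[m.UserID], m)
	return nil
}

// List returns a copy so callers cannot mutate the store's slice.
func (s *memStore) List(userID string) []Memory {
	s.mu.Lock()
	defer s.mu.Unlock()
	return append([]Memory(nil), s.data[userID]...)
}

// Router sends each memory to the store that matches its kind.
type Router struct {
	core    Store // assumed to be Redis-backed in a real deployment
	general Store // assumed to be Qdrant-backed in a real deployment
}

func (r *Router) Put(m Memory) error {
	switch m.Kind {
	case Core:
		return r.core.Put(m)
	case General:
		return r.general.Put(m)
	default:
		return fmt.Errorf("unknown memory kind %d", m.Kind)
	}
}
```

Keeping both backends behind one small interface means the routing rule can be tested without either service running.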

A lot of thought went into engineering a robust memory insertion process, because each memory inserted for a user can contain information that contradicts one of that user's existing memories. This memory pruning and update protocol is at the heart of what makes Go Memory so effective.
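The protocol itself is not spelled out on this page. As a toy illustration of the general idea only, the sketch below assumes each memory can be reduced to a hypothetical subject key, and that a newer insert with a different value for the same subject prunes the older one. `Fact`, `UserMemory`, and the timestamp rule are assumptions for illustration, not Go Memory's actual design.

```go
package main

// Fact is a hypothetical, simplified memory: one value per subject.
type Fact struct {
	Subject   string // hypothetical key, e.g. "home_city"
	Value     string
	UpdatedAt int64 // Unix seconds; larger means newer
}

// UserMemory holds the current facts for one user.
type UserMemory struct {
	facts map[string]Fact
}

func NewUserMemory() *UserMemory {
	return &UserMemory{facts: make(map[string]Fact)}
}

// Insert stores f and reports which existing fact, if any, it pruned.
// An insert that is older than the stored fact is ignored.
func (u *UserMemory) Insert(f Fact) (pruned Fact, replaced bool) {
	old, exists := u.facts[f.Subject]
	if exists && f.UpdatedAt < old.UpdatedAt {
		return Fact{}, false // stale insert: keep the newer memory
	}
	u.facts[f.Subject] = f
	if exists && old.Value != f.Value {
		return old, true // the contradicted memory is pruned
	}
	return Fact{}, false
}

// Get returns the current fact for a subject, if one exists.
func (u *UserMemory) Get(subject string) (Fact, bool) {
	f, ok := u.facts[subject]
	return f, ok
}
```

For example, inserting "home_city = Paris" and later "home_city = Berlin" leaves only Berlin stored and returns the Paris fact as pruned. A real system would need a way to detect contradictions between free-text memories, which this exact-key match does not attempt.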

📄️ Memory Retrieval Process (sub-100ms CPU, sub-50ms GPU)

One of Go Memory's biggest competitive advantages is retrieval speed: its memory retrieval time is the fastest of any memory layer in the business that maintains acceptable retrieval quality. Go Memory's retrieval quality rivals that of mem0, while its retrieval is far faster. Accuracy and quality are expected to improve further and surpass mem0 in v2 of Go Memory, which will introduce concurrent scraping... allowing longer messages to be used for memory retrieval.

📄️ Embedding Microservice

Embedding generation is a fundamental operation in any RAG pipeline, and Go Memory is no exception. With some amount of pain, we discovered that a Go monolith with embedding generation coupled into it would be unreasonable, architecturally unstable, and suboptimal.

📄️ How Go Memory Saves You API Tokens

At their core, Large Language Models are stateless. To make an AI agent feel "smart" and personalized, developers historically resorted to the easiest method available: dumping the entire conversation history into the context window for every single prompt.
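To make the cost of resending full history concrete, the sketch below compares the cumulative tokens sent over a conversation under the two approaches. It is a back-of-the-envelope model under loud assumptions: tokens are approximated as whitespace-separated words, only user messages are counted, and `memoryBudget` is a hypothetical fixed number of tokens of retrieved memories attached to each prompt.

```go
package main

import "strings"

// estimateTokens is a rough proxy: one token per whitespace-separated word.
// Real tokenizers will give different counts.
func estimateTokens(s string) int {
	return len(strings.Fields(s))
}

// cumulativeFullHistory returns the total tokens sent when every turn
// resends all earlier messages plus the new one.
func cumulativeFullHistory(messages []string) int {
	total, running := 0, 0
	for _, m := range messages {
		running += estimateTokens(m) // the context grows with every turn
		total += running
	}
	return total
}

// cumulativeWithMemories returns the total tokens sent when each turn
// carries only the new message plus a fixed budget of retrieved memories.
func cumulativeWithMemories(messages []string, memoryBudget int) int {
	total := 0
	for _, m := range messages {
		total += estimateTokens(m) + memoryBudget
	}
	return total
}
```

With four three-word messages, the full-history approach sends 3 + 6 + 9 + 12 = 30 tokens in total, while a memory budget of 2 tokens per turn sends 4 × (3 + 2) = 20. The first total grows quadratically with the number of turns and the second grows linearly, so the gap widens as conversations get longer.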

📄️ System Integrity & Optimizations

Go Memory started out as a pet project, but once its usefulness became clear, we decided to make it open source. Bringing anything good to the public requires careful thought about how to optimize the system, ensure data integrity, keep it from breaking under load, and avoid wasting compute.