System Integrity & Optimizations
Go Memory started out as a pet project, but once its usefulness became clear, we decided to make it open source. Bringing anything good to the public requires a lot of thought about optimizing the system, ensuring data integrity, making sure it doesn't break under load, and conserving resources so that valuable compute power is not wasted.
In Go Memory, we have made many such optimizations and additions to safeguard the integrity of the system while keeping it highly performant.
In this section, we discuss the aforementioned challenges and how Go Memory handles the other systems and functions integral to a production-ready memory layer.
1. User Account Creation
It is absolutely essential for a user to first sign up via the user endpoint. Currently, in the latest release, since Go Memory is designed to be self-hosted, we haven't added explicit authentication using email or phone numbers, but that is planned for a future release.
Users (or developers, for now) can simply hit the /create/user API endpoint or use the create_user function in the Python SDK.
The Account Creation endpoint creates an empty memory list in Redis for the user, setting the key as the userId (which is returned to the client). This means that if a request queries for memories using a userId that doesn't have a corresponding key in Redis, it triggers an error which is then returned to the client.
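The flow above can be sketched in a few lines. This is a minimal illustration, not the actual implementation: a plain dict stands in for Redis, and the function names (`create_user`, `get_memories`) are hypothetical stand-ins for the endpoint handlers.

```python
import uuid

# A plain dict stands in for Redis here; keys are userIds,
# values are the user's (initially empty) memory list.
store = {}

def create_user():
    """Sketch of POST /create/user: allocate a userId and an empty memory list."""
    user_id = str(uuid.uuid4())
    store[user_id] = []   # empty memory list for the new user
    return user_id        # the userId is returned to the client

def get_memories(user_id):
    """Sketch of a memory query: unknown userIds trigger an error."""
    if user_id not in store:
        raise KeyError(f"user {user_id} not found")  # surfaced to the client
    return store[user_id]
```

In production the list would live in Redis rather than a process-local dict; the sketch only illustrates the control flow of key creation and the missing-key error path.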
2. Request Creation and Status Polling
Memory jobs are extremely important and also compute-intensive. One memory job (if actual memory insertion is required) necessitates two LLM API calls, at least one embedding generation request, and potentially more depending on how many inserts there are. It takes around 30-35 seconds for a full memory job to be completed.
Given this nature, we cannot afford to:
- Lose a memory job.
- Process the same memory job more than once.
Sometimes, because of network jitter, server reloads, or connection errors, the same POST request for the same memory job can be sent to the Go Memory server. In this case, we need to make sure that the memory job is processed exactly once and is not lost.
To ensure that memory jobs are not lost even during server outages or crashes, we use NATS-JetStream for reliability. It saves the pending memory jobs on disk, which can then be safely processed when the server comes back online.
To handle the second challenge, ensuring that the same memory job is not processed twice, we compute a hash of the memory insertion request together with the userId. We then store a hashmap in Redis whose key is that request hash and whose value tracks the request's status, with a TTL (Time To Live) of 24 hours.
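A minimal sketch of this deduplication scheme, with a dict standing in for the Redis hashmap. The choice of SHA-256 over canonical JSON is an assumption for illustration; the real system may hash differently.

```python
import hashlib
import json
import time

TTL_SECONDS = 24 * 60 * 60
# Stands in for Redis: request hash -> (status record, expiry timestamp).
status_store = {}

def request_hash(payload: dict, user_id: str) -> str:
    # Canonical JSON keeps the hash stable regardless of key ordering.
    body = json.dumps(payload, sort_keys=True) + user_id
    return hashlib.sha256(body.encode()).hexdigest()

def register_job(payload: dict, user_id: str) -> str:
    """Accept a memory job once; reject duplicates within the 24h TTL."""
    key = request_hash(payload, user_id)
    now = time.time()
    entry = status_store.get(key)
    if entry is not None and entry[1] > now:
        raise ValueError("duplicate request")  # same job, TTL not yet expired
    status_store[key] = ({"request_id": key, "status": "pending"}, now + TTL_SECONDS)
    return key
```

The same payload from a different user hashes to a different key, so per-user duplicates are caught without blocking identical content across users.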
Here is the Request Status object structure:
{
  "request_id": "string",
  "status": "pending | processing | success | failure"
}
Users can poll the GET /get_status/{id} endpoint (where the id is the requestId) to know the status of the request. All statuses exist in Redis for 24 hours.
- If a request has a status of Failure, the user can retry their memory request immediately.
- If a user retries the exact same request, and it has a status other than Failure while its TTL hasn't expired, a duplicate error will be triggered.
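On the client side, polling until a terminal status might look like the following sketch. The `get_status` callable, interval, and timeout are all hypothetical; only the endpoint name and the status values come from the description above.

```python
import time

def poll_status(get_status, request_id, interval=2.0, timeout=60.0):
    """Poll GET /get_status/{id} until the job reaches a terminal status.

    `get_status` is any callable that fetches the Request Status object
    for a requestId (e.g. an HTTP GET wrapper).
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = get_status(request_id)["status"]
        if status in ("success", "failure"):
            return status  # terminal: caller may retry only on "failure"
        time.sleep(interval)
    raise TimeoutError(f"request {request_id} did not complete in {timeout}s")
```

Because statuses live in Redis for 24 hours, a client can resume polling after a disconnect without resubmitting (and thereby duplicating) the job.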
As a second layer of defense, we simply leverage what NATS already offers: the Nats-Msg-Id of each memory job is set to the requestId. NATS JetStream deduplicates messages within a rolling two-minute window, so any duplicate memory request wouldn't even be entertained by NATS itself.
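The effect of JetStream's rolling deduplication window can be mimicked in a few lines. This pure-Python simulation is only illustrative of the semantics; in reality NATS performs this check server-side based on the Nats-Msg-Id header.

```python
import time

DEDUP_WINDOW = 120.0  # seconds, matching JetStream's two-minute window
_seen = {}            # msg_id -> timestamp when first accepted

def publish(msg_id: str, now=None) -> bool:
    """Return True if accepted, False if the same msg_id arrived
    within the rolling dedup window (JetStream would drop it)."""
    now = time.time() if now is None else now
    first = _seen.get(msg_id)
    if first is not None and now - first < DEDUP_WINDOW:
        return False
    _seen[msg_id] = now
    return True
```

Together with the Redis hash check, this gives two independent layers: NATS catches rapid-fire duplicates within two minutes, and the 24-hour TTL in Redis catches slower retries.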
3. Prompt-Engineering Optimization
Memory IDs in the database are stored as unique UUIDs. However, when performing the continual memory update protocol, the memory lists provided to the LLM use simple integer IDs.
This is done specifically to prevent hallucinations when the LLM has to generate target_memory_ids in the JSON response schema. Long, random UUIDs are much harder for the LLM to reproduce correctly than simple integer IDs.
During memory fetching and prompt injection, the UUIDs are mapped to integer IDs starting from 1. Once the JSON response is generated by the LLM, these integer IDs are mapped back to their respective UUIDs before they are processed further for deletion in Qdrant.
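The mapping in both directions can be sketched as follows; the helper names are hypothetical, but the 1-based integer scheme matches the description above.

```python
import uuid

def to_llm_ids(memory_uuids):
    """Map stored UUIDs to simple 1-based integer IDs for prompt injection."""
    return {i: u for i, u in enumerate(memory_uuids, start=1)}

def resolve_target_ids(target_memory_ids, id_map):
    """Map the LLM's integer target_memory_ids back to UUIDs for Qdrant."""
    return [id_map[i] for i in target_memory_ids]
```

Because the map is built per request, the integer IDs are only meaningful within a single prompt/response round trip; the UUIDs remain the sole stable identifiers in storage.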