Most agent codebases I have read collapse two distinct ideas into one folder called memory/. Sometimes it is called state/, sometimes context/, but the symptom is the same: a single store that holds everything the agent has ever seen, retrieved, written, or guessed.
This works until it doesn't. The two ideas want different lifecycles, different schemas, and different reads.
The two ideas
Agent memory is what the agent learned about a user, a workflow, or a goal. It is shaped by the agent's behaviour and is meant to influence future runs. It is small, structured, and read by the model when planning.
Agentic storage is the data the agent operates on. Files, documents, transactions, search indices. It is shaped by the world and read by tools, not the model directly.
A coding agent's memory is "this user prefers TypeScript over JavaScript and wants tests in the same file." Its storage is the actual repository.
A trading agent's memory is "this user has rejected high-slippage routes three times this week." Its storage is the order book, position state, and price feeds.
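The split behind these examples can be sketched as two unrelated schemas. This is a hypothetical sketch, not a prescribed layout; all field names here are invented for illustration:

```python
from dataclasses import dataclass

# Memory: a small typed record per user, written by the agent,
# read by the planner. Fields are hypothetical examples.
@dataclass
class MemoryEntry:
    user_id: str
    key: str          # e.g. "lang_preference"
    value: str        # e.g. "typescript"
    source_run: str   # which agent run wrote this entry
    confidence: float

# Storage: whatever the tools need. Here, a stub interface for a
# coding agent's repository; the planner never touches this directly.
class RepoStorage:
    def read_file(self, path: str) -> str:
        raise NotImplementedError
    def search(self, query: str) -> list[str]:
        raise NotImplementedError
```

Note that the two types share nothing: no common base class, no common table, no common identifier scheme.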
Mixing the two creates predictable bugs.
What goes wrong when they are mixed
Three failure modes show up consistently.
- The memory store grows unbounded. When everything goes in, nothing leaves. The model sees too much, planning gets slower, and embeddings become noise.
- Eviction policies stop making sense. Document chunks should expire under different rules from user preferences. One TTL per store is the wrong abstraction.
- Privacy boundaries blur. Storage is often per-tenant. Memory is often per-user. The same row can mean both, and that is how data leaks across contexts.
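The eviction problem is concrete enough to sketch. With separate stores, each kind of record can carry its own policy; a single shared store with one TTL cannot express this. The record kinds and lifetimes below are hypothetical examples:

```python
from datetime import timedelta

# Per-kind eviction rules. A shared store with a single TTL would
# force all of these onto one lifetime.
EVICTION = {
    "doc_chunk": timedelta(days=7),        # storage: refreshed on re-index
    "session_observation": timedelta(hours=24),
    "user_preference": None,               # memory: kept until the user edits it
}

def is_expired(kind: str, age: timedelta) -> bool:
    ttl = EVICTION[kind]
    return ttl is not None and age > ttl
```

A `None` lifetime for preferences is the point: memory entries leave when the user or the agent deletes them, not when a clock runs out.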
A simple separation
The fix is unglamorous. Two stores, two schemas, two access patterns.
Memory is a small typed table per user, written by the agent and read by the planner. It is inspectable, editable by the user, and small enough not to need vector search.
Storage is whatever your tools need: a vector index for documents, a relational DB for transactions, a file system for code. It is read by tools, not the planner.
The planner can ask storage for facts through tools. It does not browse storage directly.
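The access pattern can be sketched in a few lines: memory is inlined into the planner's prompt, while storage is reachable only through named tool calls. The function and class names here are assumptions for illustration, not a real framework:

```python
# Storage sits behind tools. The planner never holds a storage handle;
# it can only request facts through a tool interface like this one.
class Tools:
    def __init__(self, storage):
        self._storage = storage

    def search_docs(self, query: str) -> list[str]:
        return self._storage.search(query)

# Memory goes straight into the planning prompt, because it is small
# and structured enough to read in full.
def build_planner_prompt(memory_entries: list[str], task: str) -> str:
    facts = "\n".join(f"- {e}" for e in memory_entries)
    return f"Known about this user:\n{facts}\n\nTask: {task}"
```

The asymmetry is the design: memory is cheap to read whole, so it rides along with every plan; storage is unbounded, so it is paid for one tool call at a time.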
Knock-on benefits
Once the line is drawn, three things get easier.
- Evals. Memory becomes testable. You can fix the agent's memory state, run a planner step, and assert on the plan. Storage stays out of the test.
- Debugging. Traces show which memory entries shaped the plan. Storage queries appear as tool calls, with their own logs.
- User trust. Users can read their memory and edit it. They cannot meaningfully read your vector index, and they should not need to.
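The eval pattern in the first bullet is worth a sketch: pin the memory state, run one planner step, assert on the plan, with storage never entering the test. `plan_step` here is a hypothetical stand-in for a real model-backed planner:

```python
# Stand-in planner for illustration: a real implementation would
# build a prompt from memory and call the model.
def plan_step(memory: list[str], task: str) -> list[str]:
    plan = ["write code"]
    if any("tests in the same file" in m for m in memory):
        plan.append("add tests alongside code")
    return plan

def test_memory_shapes_plan():
    # Fixed memory state: no storage, no tools, no network.
    memory = ["prefers TypeScript", "wants tests in the same file"]
    plan = plan_step(memory, "implement parser")
    assert "add tests alongside code" in plan
```

Because memory is a small typed input rather than a shared store, the test needs no fixtures beyond a list.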
When the line bends
Some agents need the model to reason over storage directly. Long-context retrieval, code review, document analysis. In those cases the model is consuming storage as input, not writing memory through it. The store is read-only at planning time, and the read goes through a tool.
Memory still stays small, structured, and per-user.
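Making the store read-only at planning time can be enforced rather than assumed. One sketch, assuming write methods follow a naming convention, is a wrapper that rejects mutating calls; the prefixes are a hypothetical convention, not a general rule:

```python
# Wrap a storage object so any write-like attribute access fails at
# planning time. Assumes mutating methods start with these prefixes.
class ReadOnly:
    _FORBIDDEN = ("write", "delete", "put", "update")

    def __init__(self, store):
        self._store = store

    def __getattr__(self, name):
        if name.startswith(self._FORBIDDEN):
            raise PermissionError(f"{name} is not allowed at planning time")
        return getattr(self._store, name)
```

Handing the planner `ReadOnly(storage)` instead of `storage` turns the convention into a runtime guarantee.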
The shorter version
Memory is what the agent knows about you. Storage is what the agent operates on. They want different schemas, different lifecycles, and different access. Building them as one store costs more than separating them.