Key-Value Store

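Overview and Use Cases

The search engine's key-value store caches query results to improve response time and reduce the load on backend services. Popular searches can be served directly from the cache instead of being processed from scratch each time.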

Figure: Search engine query flow with a key-value cache

Sources: solutions/system_design/query_cache/README.md61-69 solutions/system_design/query_cache/README.md81-91

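Key Constraints and Requirements

For a search engine query cache, we need to consider:

Fast lookups: search queries must be served quickly, ideally within milliseconds
Limited memory: we cannot store the results of every possible query
Non-uniform access patterns: popular queries should almost always be cached
Cache staleness: results become outdated as web page content changes
Scalability: the system needs to handle millions of unique queries

Sources: solutions/system_design/query_cache/README.md24-33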

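Core Components

Cache Data Structure

The key-value store implements a least recently used (LRU) cache to manage limited memory efficiently. When the memory limit is reached, this eviction policy removes the queries that have gone longest without being accessed.

Figure: LRU cache implementation components

Sources: solutions/system_design/query_cache/query_cache_snippets.py25-90 solutions/system_design/query_cache/README.md92-152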

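The key components of the LRU cache implementation are:

Node: represents a single cache entry, holding the query string and its search results
LinkedList: maintains the order of cache entries by access time
Cache: the main class that implements the key-value storage and lookup operations

The cache uses a hash table for O(1) lookups and a doubly linked list to manage LRU eviction. When the cache reaches capacity, the least recently used item (at the tail of the list) is removed.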
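
To make the eviction order concrete, here is a minimal sketch of the same policy. It is not the Node/LinkedList/Cache implementation from the source files: it swaps in Python's collections.OrderedDict, which keeps keys in order and supports cheap reordering. The class name LRUSketch and the capacity parameter are assumptions chosen for illustration.

```python
from collections import OrderedDict


class LRUSketch:
    """Illustrative LRU cache (hypothetical name); front = most recently used."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries = OrderedDict()  # query -> results

    def get(self, query):
        if query not in self.entries:
            return None
        # A hit marks the entry as most recently used.
        self.entries.move_to_end(query, last=False)
        return self.entries[query]

    def set(self, query, results):
        if query in self.entries:
            self.entries[query] = results
            self.entries.move_to_end(query, last=False)
            return
        if len(self.entries) >= self.capacity:
            # The tail holds the least recently used entry, so evict from there.
            self.entries.popitem(last=True)
        self.entries[query] = results
        self.entries.move_to_end(query, last=False)


cache = LRUSketch(2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" becomes the most recently used entry
cache.set("c", 3)  # the cache is full, so "b" (least recently used) is evicted
```

With a capacity of two, reading "a" before inserting "c" is what saves it from eviction; "b" is dropped instead.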

Query Processing

The QueryApi class processes queries and interacts with the cache.

Figure: Query processing flow

Sources: solutions/system_design/query_cache/query_cache_snippets.py4-22 solutions/system_design/query_cache/README.md67-91
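
The interaction between QueryApi and the cache might look roughly like the sketch below. Only the class name QueryApi comes from the text above; the method names parse_query and process_query, the search_backend callable, the normalization rule, and the DictCache stand-in are all assumptions made to keep the example runnable.

```python
class QueryApi:
    """Sketch of a query front end that consults a cache before the backend."""

    def __init__(self, cache, search_backend):
        self.cache = cache                    # any object with get(query) / set(query, results)
        self.search_backend = search_backend  # hypothetical callable: query -> results

    def parse_query(self, query):
        # Assumed normalization so that equivalent queries share one cache key.
        return " ".join(query.lower().split())

    def process_query(self, query):
        query = self.parse_query(query)
        results = self.cache.get(query)
        if results is None:
            # Cache miss: run the expensive search, then remember the answer.
            results = self.search_backend(query)
            self.cache.set(query, results)
        return results


class DictCache:
    """Unbounded stand-in cache, used only to make the sketch runnable."""

    def __init__(self):
        self.store = {}

    def get(self, query):
        return self.store.get(query)

    def set(self, query, results):
        self.store[query] = results


calls = []  # records every query that actually reaches the backend


def fake_backend(query):
    calls.append(query)
    return ["result for " + query]


api = QueryApi(DictCache(), fake_backend)
first = api.process_query("Hello   World")  # miss: goes to the backend
second = api.process_query("hello world")   # hit: served from the cache
```

One caveat of this sketch: a backend that legitimately returns None would be treated as a miss on every call, so a real implementation may want a distinct sentinel for "not cached".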

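Implementation Details

Cache Operations

The cache implements two main operations:

get(query): retrieves the results for a query and moves the entry to the front of the LRU list
set(query, results): adds or updates a query-results pair in the cache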

Cache.get(query):
1. Look up the query in the hash table
2. If found, move the node to the front of the linked list (mark as recently used)
3. Return the associated results or None if not found

Cache.set(query, results):
1. If the query exists in the cache, update its results and move to front of list
2. If the query is new:
   a. If cache is at capacity, remove the least recently used item (from tail)
   b. Create a new node with the query and results
   c. Add the node to the front of the linked list
   d. Add the query-node mapping to the hash table

Sources: solutions/system_design/query_cache/query_cache_snippets.py56-90 solutions/system_design/query_cache/README.md152-196
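
The get and set steps listed above can be sketched in Python as follows. This is an illustrative reconstruction built from those steps, not a copy of the referenced source file: the method names on LinkedList (append_to_front, remove, move_to_front) and the attributes max_size, lookup, and linked_list are assumptions.

```python
class Node:
    """One cache entry: a query string and its search results."""

    def __init__(self, query, results):
        self.query = query
        self.results = results
        self.prev = None
        self.next = None


class LinkedList:
    """Doubly linked list; head = most recently used, tail = least recently used."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append_to_front(self, node):
        node.prev = None
        node.next = self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node

    def remove(self, node):
        # Relink the neighbours, updating head/tail when the node is at an end.
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        node.prev = node.next = None

    def move_to_front(self, node):
        if node is not self.head:
            self.remove(node)
            self.append_to_front(node)


class Cache:
    def __init__(self, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self.lookup = {}  # query -> Node, for O(1) lookups
        self.linked_list = LinkedList()

    def get(self, query):
        node = self.lookup.get(query)
        if node is None:
            return None
        self.linked_list.move_to_front(node)  # mark as recently used
        return node.results

    def set(self, query, results):
        node = self.lookup.get(query)
        if node is not None:
            # Existing query: update in place and mark as recently used.
            node.results = results
            self.linked_list.move_to_front(node)
            return
        if len(self.lookup) >= self.max_size:
            # At capacity: evict the least recently used entry from the tail.
            oldest = self.linked_list.tail
            self.linked_list.remove(oldest)
            del self.lookup[oldest.query]
        node = Node(query, results)
        self.linked_list.append_to_front(node)
        self.lookup[query] = node


cache = Cache(max_size=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")      # "a" moves to the front, leaving "b" at the tail
cache.set("c", 3)   # evicts "b"
cache.set("a", 10)  # updates "a" in place and moves it to the front
```

Every operation touches only the hash table and a constant number of list pointers, which is what keeps both get and set at O(1).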

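Cache Update Strategies

When the underlying data changes, the cache needs to be refreshed. Common update strategies include:

Strategy	Description	Pros	Cons
Time to live (TTL)	Cache entries expire after a set time	Simple to implement	May return stale data until an entry expires
Cache-aside	The application updates the cache when data changes	The application controls freshness	More complex application logic
Write-through	The cache is updated whenever the database is updated	Keeps the cache consistent with the database	Slower writes
Write-behind	Updates are queued and processed asynchronously	Better write performance	Temporary data inconsistency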
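
As one example of the strategies in the table, a TTL policy can be sketched in a few lines. Everything here is an assumption for illustration: the class name TTLCache, the ttl_seconds parameter, and the injectable clock argument (which defaults to time.monotonic and exists mainly so that expiry can be tested without sleeping). The sketch leaves out LRU eviction to keep the expiry logic visible.

```python
import time


class TTLCache:
    """Illustrative TTL cache (hypothetical name): entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries = {}  # query -> (results, expiry timestamp)

    def set(self, query, results):
        self.entries[query] = (results, self.clock() + self.ttl_seconds)

    def get(self, query):
        entry = self.entries.get(query)
        if entry is None:
            return None
        results, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry so the caller refetches fresh results.
            del self.entries[query]
            return None
        return results


now = [0.0]  # fake clock value, advanced by hand
ttl_cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])
ttl_cache.set("q", ["r"])
fresh = ttl_cache.get("q")    # still within the TTL
now[0] = 30.0
expired = ttl_cache.get("q")  # TTL has elapsed, so this is a miss
```

The trade-off from the table shows up directly: between set and expiry, get returns whatever was stored, even if the underlying pages have changed in the meantime.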