Buffa DESIGN.md: A Section-by-Section Reading

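This article walks through DESIGN.md in the order of its sections, alternating the English original with annotations. The original comes from github.com/anthropics/buffa under the Apache-2.0 license. buffa was open-sourced by Anthropic, and its README is marked "Written by Claude ❣️".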

What This File Really Is

Before going through it section by section, it helps to understand what this file actually does.

On the surface it is a technical design document. But buffa's CLAUDE.md says:

See [DESIGN.md](DESIGN.md) for the architectural overview.

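Claude Code loads CLAUDE.md automatically at the start of every session, so every new session is pointed to this file. The real identity of DESIGN.md is the AI agent's long-term memory layer: it compensates for the AI's lack of persistent memory across sessions, letting each new Claude session "restore" its architectural picture of the whole project.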

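That explains why the file is far denser than a typical technical document: it is written not for "a human who skims it occasionally" but for "an AI that starts from zero every time yet must make correct decisions immediately."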

Title and Opening

Buffa: Design Document

A pure Rust Protocol Buffers implementation with first-class editions support.

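Note: A one-line positioning statement. "Pure Rust" and "editions-first" are the two core constraints, and every later design decision is made within them. For an AI, the sentence settles certain choices in advance: faced with "should we pull in a C dependency for speed?" or "should we support proto3 first and add editions later?", the answer is already here.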

Motivation

The Rust protobuf ecosystem has a gap:

Library	Pure Rust	Editions	Maintained	Unknown Fields	Reflection
prost v0.13	Yes	No	Passive	No	No
Google protobuf v4	No (upb/C++)	Yes	Active	Yes	Yes
rust-protobuf v3	Yes	No	Maintenance only	Yes	Yes
quick-protobuf	Yes	No	Low	No	No
micropb	Yes	No	Active (niche)	No	No

No actively maintained, pure-Rust protobuf library supports protobuf editions.

Buffa fills this gap: a pure Rust implementation designed from the ground up with editions as the core abstraction.

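Note: This table is not background for human readers; it is a decision anchor for the AI. Without it, an AI asked to "investigate whether we can fork prost to support editions" might spend many tokens on analysis. With it, the answer is immediate: prost is marked as "Passive" maintenance and lacks editions support, so forking it is not a promising route.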

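More subtle is the choice of columns: Pure Rust / Editions / Maintained / Unknown Fields / Reflection. These five dimensions map directly onto buffa's competitive positioning: the first two explain why the project exists, the last three how far it goes.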

Design Principles

Pure Rust, zero C dependencies. Builds with cargo build, nothing else.

Editions-first. Proto2 and proto3 are understood as feature presets within the editions model, not as separate code paths. The internal model is always editions-based.

Correct by default. Unknown fields are preserved. UTF-8 is validated. Conformance tests pass.

Idiomatic Rust API. Generated code uses plain structs, proper Rust enums, MessageField<T> for singular message fields, and derive the traits you'd expect (Clone, Debug, PartialEq, Default).

Zero-copy read path. Two-tier owned/borrowed model: MyMessage for building and storage, MyMessageView<'a> for zero-copy deserialization.

Linear-time serialization. Cached encoded sizes prevent the exponential blowup that affects prost with deeply nested messages.

no_std capable. The core runtime works without std (requires alloc).

Descriptor-centric. The code generator operates on google.protobuf.FileDescriptorProto — the standard descriptor format that protoc and buf both produce. Buffa does not need its own .proto parser; protoc is the de-facto standard and buf is an ergonomic alternative.

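Note: Eight principles, one line each. Together they form a set of hard constraints for the AI, with an implicit priority order. Some examples of that ordering, as this article reads it:

Principle 1 > Principle 5: if zero-copy required a C dependency (for example, binding to upb), don't do it.
Principle 3 > Principle 6: if linear-time serialization conflicts with correctness (for example, an optimization that skips encoding unknown fields), correctness wins.
Principle 4 > Principle 7: if the idiomatic API needs std::collections::HashMap (for example, for map fields), provide it behind the std feature instead of sacrificing the API for no_std.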

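Principle 8 deserves special attention: buffa does not write its own .proto parser. This is a capability it deliberately gives up. For an AI, it means "do not propose adding proto parsing to buffa"; that would violate the descriptor-centric principle.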

Crate Descriptions

buffa — Core Runtime

The runtime library that generated code depends on. Contains:

Message trait: The central trait for owned message types, with two-pass compute_size() / write_to() serialization.

MessageView trait: The trait for borrowed/zero-copy message views.

OwnedView<V>: Self-referential container that pairs a Bytes buffer with a decoded view, producing a 'static + Send + Sync type suitable for async and RPC frameworks.

MessageField<T>: Ergonomic wrapper for optional message fields that dereferences to a default instance when unset.

CachedSize: Per-message cached encoded size for linear-time serialization.

EnumValue<T>: Type-safe wrapper for open enum fields that preserves unknown values.

Wire format codec: Varint, fixed-width, length-delimited, and group encoding/decoding using bytes::{Buf, BufMut}.

Unknown field storage: Preserves unknown fields for round-trip fidelity.

Edition feature types: Rust types representing edition features (FieldPresence, EnumType, RepeatedFieldEncoding, etc.) used by generated code and runtime logic.

The runtime is no_std + alloc by default, with an optional std feature for std::io integration.

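Note: This list tells the AI which types the buffa runtime contains, but more importantly what it leaves out: no codegen, no proto parsing, no RPC. When asked to "add gRPC support to the buffa crate", the AI should know that this belongs in a different crate.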

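Note the phrase "Self-referential container" in the OwnedView<V> description. It is a key mental model: OwnedView achieves self-reference through transmute plus a manual Drop impl (spelled out in §2), and any AI modifying that code must preserve this invariant.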

buffa-types — Well-Known Types

Pre-generated Rust types for Google's well-known .proto files...

No build-time code generation. The WKT Message impls are checked in at src/generated/ (regenerated via task gen-wkt-types when buffa-codegen output format changes). This means consumers depend only on the buffa runtime — not protoc, not buffa-build, not buffa-codegen. It also means buffa-types cross-compiles to bare-metal targets.

The WKT wire format is completely vanilla — two varints for Timestamp, etc. What's special about WKTs is:

Their proto3-JSON representations (RFC3339 string for Timestamp, "3.000001s" for Duration, type-URL dispatch for Any) — hand-written in *_ext.rs.

Their stdlib affinity (SystemTime, std::time::Duration) — hand-written From/TryFrom impls, also in *_ext.rs.

Both layer on top of the generated Message impl via include!() + sibling modules; the checked-in code and the hand-written extensions coexist cleanly.

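Note: This passage is extremely dense. For an AI it carries three key instructions:

Do not add a build.rs to buffa-types: the WKTs are pre-generated and checked in, not generated at build time.
Wire-format code lives in src/generated/, while JSON and stdlib conversions live in *_ext.rs, which tells the AI which file to edit when changing WKT behavior. To change RFC3339 formatting, edit timestamp_ext.rs; src/generated/ is refreshed with task gen-wkt-types rather than edited by hand (and the WKT wire format is plain protobuf anyway).
Keep the include!() + sibling-module coexistence pattern: the AI should not move hand-written code into the generated files, or the other way around.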
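To make the JSON side of the WKT split concrete, here is a minimal sketch of the proto3 JSON rule behind "3.000001s", assuming a valid Duration (seconds and nanos share a sign, and |nanos| is below 10^9). The function name duration_to_json is hypothetical. This is not the code in buffa-types' *_ext.rs, only an illustration of the kind of hand-written logic that lives there rather than in src/generated/.

```rust
/// Hypothetical helper (not buffa's *_ext.rs code): formats a
/// google.protobuf.Duration as proto3 JSON. Assumes a valid Duration:
/// seconds and nanos share a sign and |nanos| < 1_000_000_000.
pub fn duration_to_json(seconds: i64, nanos: i32) -> String {
    let sign = if seconds < 0 || nanos < 0 { "-" } else { "" };
    let secs = seconds.unsigned_abs();
    let nanos = nanos.unsigned_abs();
    // The JSON mapping emits 0, 3, 6, or 9 fractional digits, whichever suffices.
    let frac = if nanos == 0 {
        String::new()
    } else if nanos % 1_000_000 == 0 {
        format!(".{:03}", nanos / 1_000_000)
    } else if nanos % 1_000 == 0 {
        format!(".{:06}", nanos / 1_000)
    } else {
        format!(".{:09}", nanos)
    };
    format!("{sign}{secs}{frac}s")
}
```

Because precision is rounded up to 3, 6, or 9 digits, 1.5 seconds formats as "1.500s" rather than "1.5s".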

buffa-codegen — Shared Code Generation Logic

The code generation library, shared between protoc-gen-buffa and buffa-build. Takes protobuf descriptors (from protoc's FileDescriptorProto) and emits Rust source code.

This is a library crate with no binary — it doesn't know how descriptors were produced (protoc or buf). It just takes descriptors in and produces Rust out.

...

The code generator always works with resolved edition features — it never branches on "is this proto2 or proto3?" because protoc resolves edition features in the FileDescriptorProto itself.

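Note: The last sentence is a core architectural decision. There is no if proto2 {} else if proto3 {} branching inside codegen; everything is expressed as edition feature values (FieldPresence::Explicit vs Implicit). So when modifying codegen, the AI should never add syntax-version checks and should query the resolved features instead.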

protoc-gen-buffa — Protoc Plugin

Bootstrapping: The CodeGeneratorRequest and CodeGeneratorResponse messages are themselves protobuf — we decode/encode them using buffa's own generated descriptor and compiler types (checked into buffa-codegen/src/generated/), eliminating any external protobuf library dependency from the build graph.

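Note: This is a classic bootstrapping problem. A protoc plugin must parse protobuf messages (CodeGeneratorRequest), yet buffa is itself the protobuf library. The solution is to check pre-generated descriptor types into the codegen crate. For the AI, this means that changing codegen's output format may require two generation passes: first gen-bootstrap-types (to update codegen's own descriptor types), then gen-wkt-types (to update the WKTs in buffa-types). The corresponding rule in CLAUDE.md addresses exactly this.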

buffa-build — Build Script Integration

Descriptor back-ends:

protoc (default): the de-facto standard...

buf: an ergonomic alternative...

Escape hatch — .descriptor_set(path): The Config::descriptor_set method accepts a pre-built FileDescriptorSet file, so users can obtain descriptors through any means... and pass them directly, bypassing the protoc invocation layer entirely.

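Note: The escape hatch is aimed at advanced users, but for the AI the more important point is that it confirms buffa-build's single responsibility: obtain descriptors and hand them to codegen. However the descriptors are produced (protoc, buf, or a pre-built file), codegen's input is always FileDescriptorProto. The AI should not put any Rust code generation logic in buffa-build.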

Custom Type Implementations

For types that need a custom Rust representation while remaining wire-compatible with a .proto definition, implement the Message trait by hand and use extern_path to map the proto type to your custom implementation. This is rare...

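Note: The three words "This is rare" matter a great deal. They tell the AI that when asked to "give some proto type a custom Rust representation", the default response should be "are you sure you need it? In most cases the codegen output plus extension methods is enough." Only take the custom-impl route when the user explicitly insists.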

Core Design Decisions

§1 Editions as the Internal Model

All .proto files—regardless of declared syntax—are normalized to the editions model during compilation:
proto2 file → proto2 feature defaults
proto3 file → proto3 feature defaults
edition N file → edition N defaults + file-level feature overrides
This means:

The code generator has one code path, parameterized by resolved features.

Adding support for future editions (2024, 2025, ...) is a matter of adding new default feature values..., not new edition-specific code paths.

Proto2 and proto3 files can be imported into edition files and vice versa seamlessly.

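Note: The far-reaching consequence of this decision is that buffa's codegen will never contain a branch like match syntax { Proto2 => ..., Proto3 => ... }. All behavioral differences come from different combinations of resolved feature values. When the AI needs to support a new edition (say, 2025), it only has to add the default feature values in editions.rs, and the codegen code barely changes.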

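This "parameterize, don't branch" architecture is especially friendly to AI coding: the AI does not need to understand the historical differences between proto2 and proto3, only what each feature value means.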
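The "parameterize, don't branch" idea fits in a few lines. Everything below is hypothetical: the type names echo the feature names listed later in DESIGN.md but are not buffa's editions.rs, and mapping explicit presence to Option<...> is an assumption made only for illustration. The point is that enum_field_type reads resolved features and never asks which syntax the file declared; proto2, proto3, and edition 2023 differ only in the feature values passed in.

```rust
/// Hypothetical subset of resolved edition features.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FieldPresence { Explicit, Implicit, LegacyRequired }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EnumType { Open, Closed }

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ResolvedFeatures {
    pub field_presence: FieldPresence,
    pub enum_type: EnumType,
}

// Feature presets that files normalize to (subset shown).
pub const PROTO2: ResolvedFeatures =
    ResolvedFeatures { field_presence: FieldPresence::Explicit, enum_type: EnumType::Closed };
pub const PROTO3: ResolvedFeatures =
    ResolvedFeatures { field_presence: FieldPresence::Implicit, enum_type: EnumType::Open };
pub const EDITION_2023: ResolvedFeatures =
    ResolvedFeatures { field_presence: FieldPresence::Explicit, enum_type: EnumType::Open };

/// One code path: the Rust type of a singular enum field is derived from
/// features only. There is no `match syntax { ... }` anywhere.
pub fn enum_field_type(enum_name: &str, f: ResolvedFeatures) -> String {
    let base = match f.enum_type {
        EnumType::Open => format!("EnumValue<{enum_name}>"),
        EnumType::Closed => enum_name.to_string(),
    };
    match f.field_presence {
        // Assumption for this sketch: explicit presence becomes Option<...>.
        FieldPresence::Explicit => format!("Option<{base}>"),
        FieldPresence::Implicit | FieldPresence::LegacyRequired => base,
    }
}
```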

§2 Two-Tier Owned/Borrowed Model

For each protobuf message, buffa generates two Rust types:

Owned type (MyMessage) — heap-allocated fields, used for building, storing, and mutating messages:

pub struct Person {
    pub name: String,
    pub id: i32,
    pub avatar: Vec<u8>,
    pub tags: Vec<String>,
    pub address: buffa::MessageField<Address>,
}

Borrowed view type (PersonView<'a>) — zero-copy from the input buffer:

pub struct PersonView<'a> {
    pub name: &'a str,
    pub id: i32,
    pub avatar: &'a [u8],
    pub tags: buffa::RepeatedView<'a, &'a str>,
    pub address: buffa::MessageFieldView<AddressView<'a>>,
}

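Note: These two snippets are not a tutorial; they are code-shape templates for the AI. When the AI changes codegen output, the generated code must follow this shape. Notice the details:

String → &'a str (zero-copy strings)
Vec<u8> → &'a [u8] (zero-copy bytes)
Vec<String> → RepeatedView<'a, &'a str> (repeated fields use a dedicated view container)
MessageField<T> → MessageFieldView<TView<'a>> (sub-messages nest views)
i32 → i32 (scalars are unchanged: varint decoding has to happen, so there is nothing to borrow)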
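To see why strings and bytes can borrow while scalars cannot, consider a minimal hand-written view decoder. It assumes a two-field Person (name is field 1, id is field 2) and skips unknown-field handling; PersonView here is a simplified stand-in, not buffa's generated type. After decoding, name points into the input buffer, while id has to be varint-decoded into an owned i32.

```rust
/// Simplified view: name = field 1 (string), id = field 2 (int32).
#[derive(Debug, Default, PartialEq)]
pub struct PersonView<'a> {
    pub name: &'a str,
    pub id: i32,
}

/// Decodes one base-128 varint from the front of `buf`, advancing it.
fn read_varint(buf: &mut &[u8]) -> Option<u64> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let (&byte, rest) = buf.split_first()?;
        *buf = rest;
        value |= u64::from(byte & 0x7F) << shift;
        if byte < 0x80 {
            return Some(value);
        }
    }
    None // longer than 10 bytes: malformed
}

impl<'a> PersonView<'a> {
    /// Zero-copy decode: `name` borrows from `buf`; `id` must be decoded.
    pub fn decode_view(mut buf: &'a [u8]) -> Option<Self> {
        let mut view = PersonView::default();
        while !buf.is_empty() {
            match read_varint(&mut buf)? {
                // Field 1, wire type 2 (length-delimited).
                0x0A => {
                    let len = usize::try_from(read_varint(&mut buf)?).ok()?;
                    if buf.len() < len {
                        return None;
                    }
                    let (bytes, rest) = buf.split_at(len);
                    view.name = std::str::from_utf8(bytes).ok()?; // borrow, no allocation
                    buf = rest;
                }
                // Field 2, wire type 0 (varint); int32 truncates the 64-bit varint.
                0x10 => view.id = read_varint(&mut buf)? as i32,
                _ => return None, // unknown fields omitted from this sketch
            }
        }
        Some(view)
    }
}
```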

OwnedView<V> — views across async boundaries:

The scoped 'a lifetime on MyMessageView<'a> prevents it from satisfying 'static bounds, which tower services, BoxFuture<'static, _>, and tokio::spawn all require. OwnedView<V> solves this by storing the bytes::Bytes buffer alongside the decoded view in a self-referential struct. Internally it extends the view's lifetime to 'static via transmute, which is sound because Bytes is reference-counted (its heap data pointer is stable across moves), immutable, and a manual Drop impl ensures the view is dropped before the buffer.

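Note: This is the most important safety argument in the entire file. It answers the core question: "How dare you transmute a lifetime-bound reference to 'static?" The answer is a threefold guarantee:

Bytes is reference-counted → its heap data pointer is stable across moves
Bytes is immutable → the data the view borrows is never modified
A manual Drop impl → the view is always dropped before the buffer

If the AI modifies any aspect of OwnedView, it must re-verify that all three conditions still hold. That is why this argument lives in DESIGN.md and not only in a code comment: comments can be overlooked, whereas DESIGN.md, as "long-term memory", is surfaced to every session.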
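The three conditions can be seen in a minimal, non-generic sketch. To stay dependency-free it uses std::sync::Arc<[u8]> in place of bytes::Bytes (both keep their heap data at a stable address when the handle moves). NameView and OwnedNameView are hypothetical names; buffa's real OwnedView<V> is generic over the view type. The get method re-attaches the view to &self's lifetime so the internal 'static never escapes.

```rust
use std::mem::ManuallyDrop;
use std::str::Utf8Error;
use std::sync::Arc;

/// Hypothetical view: borrows a UTF-8 name from the buffer.
pub struct NameView<'a> {
    pub name: &'a str,
}

/// Hypothetical non-generic analogue of OwnedView<V>; Arc<[u8]> stands in for Bytes.
pub struct OwnedNameView {
    view: ManuallyDrop<NameView<'static>>,
    #[allow(dead_code)] // held only to keep the borrowed allocation alive
    buf: Arc<[u8]>,     // never mutated or replaced after construction
}

impl OwnedNameView {
    pub fn decode(buf: Arc<[u8]>) -> Result<Self, Utf8Error> {
        let view = NameView { name: std::str::from_utf8(&buf)? };
        // SAFETY: the view borrows the Arc's heap allocation, which (1) stays put
        // when the Arc moves, (2) is never mutated through `buf`, and (3) outlives
        // the view because Drop below drops the view before `buf`.
        let view: NameView<'static> = unsafe { std::mem::transmute(view) };
        Ok(OwnedNameView { view: ManuallyDrop::new(view), buf })
    }

    /// Shortens the lifetime back to `&self`, so no 'static borrow leaks out.
    pub fn get(&self) -> &NameView<'_> {
        &self.view
    }
}

impl Drop for OwnedNameView {
    fn drop(&mut self) {
        // SAFETY: `view` is dropped exactly once, here, while `buf` is still alive;
        // the `buf` field is dropped automatically after this method returns.
        unsafe { ManuallyDrop::drop(&mut self.view) };
    }
}
```

Because the struct owns its buffer and holds no non-'static borrow, it is 'static + Send + Sync and can be moved into a spawned thread or task, which is exactly the property the quoted paragraph is after.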

§3 MessageField<T>

Prost uses Option<Box<M>> for optional message fields, which creates unwrapping ceremony everywhere:
let name = msg.address.as_ref().unwrap().street.as_ref().unwrap();
Buffa uses a wrapper type MessageField<T>, which dereferences to a default instance when unset:
// Buffa: just works
let name = &msg.address.street;

// Check if actually set
if msg.address.is_set() { ... }

Note: This is buffa's most direct "API taste" win over prost. DESIGN.md first shows prost's pain (two layers of .as_ref().unwrap()), then buffa's elegance (plain .address.street).

For an AI, this snippet establishes an inviolable API principle: MessageField's Deref must be transparent. Any future change that forces users to call .unwrap() to reach sub-message fields is a regression.

MessageField<T> is heap-allocated (Option<Box<T>> internally) so the struct size stays small, but the Deref impl provides transparent read access through a lazily-initialized &'static T default singleton.

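Note: This sentence implies the existence of a DefaultInstance trait (named explicitly in §5) and its safety contract: the default singleton must be 'static and, once published, must never be mutated. This is the underlying mechanism that lets MessageField Deref even when the field is unset.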
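A minimal sketch of that mechanism, assuming a DefaultInstance trait shaped like the one §5 describes (a per-type, lazily initialized static). MessageField, Address, and Person here are simplified stand-ins, not buffa's actual definitions. Note that the static OnceLock<Address> only compiles because Address is Sync, the same constraint that §5 says ruled out Cell<u32>.

```rust
use std::ops::Deref;
use std::sync::OnceLock;

/// Hypothetical stand-in for buffa's DefaultInstance trait.
pub trait DefaultInstance: 'static {
    fn default_instance() -> &'static Self;
}

/// Simplified MessageField: boxed when set, shared default when unset.
pub struct MessageField<T> {
    inner: Option<Box<T>>,
}

impl<T> MessageField<T> {
    pub fn none() -> Self { MessageField { inner: None } }
    pub fn some(value: T) -> Self { MessageField { inner: Some(Box::new(value)) } }
    pub fn is_set(&self) -> bool { self.inner.is_some() }
}

impl<T: DefaultInstance> Deref for MessageField<T> {
    type Target = T;
    fn deref(&self) -> &T {
        match &self.inner {
            Some(boxed) => &**boxed,
            None => T::default_instance(), // no unwrap, no allocation
        }
    }
}

#[derive(Debug, Default, PartialEq)]
pub struct Address {
    pub street: String,
}

impl DefaultInstance for Address {
    fn default_instance() -> &'static Self {
        // A `static` requires Address: Sync; initialized once, never mutated.
        static INSTANCE: OnceLock<Address> = OnceLock::new();
        INSTANCE.get_or_init(Address::default)
    }
}

pub struct Person {
    pub address: MessageField<Address>,
}
```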

§4 EnumValue<T>

Prost represents all enum fields as i32, losing type safety. Buffa generates Rust enums and wraps open-enum fields in EnumValue<T>:
pub enum EnumValue<T: Enumeration> {
    Known(T),
    Unknown(i32),
}
For open enums (default in editions), the field type is EnumValue<PhoneType> — preserving unknown values for round-tripping while giving match ergonomics for known variants.

For closed enums, the field type is PhoneType directly, and unknown values are routed to unknown fields during decoding.

Note: This is the editions-first principle applied directly to enums. Proto2's closed enums and proto3/editions' open enums are not two code paths but two values (Open vs Closed) of the same enum_type feature, from which codegen picks EnumValue<E> or a bare E.

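A detail that is easy to miss: "unknown values are routed to unknown fields during decoding". When a closed enum receives an unknown value, the whole field is stored in __buffa_unknown_fields instead of being silently dropped. This is Principle 3 ("Correct by default") in action.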
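The open/closed split can be sketched as follows. EnumValue mirrors the definition quoted above; the Enumeration methods (from_i32, to_i32) and the decode_closed helper are hypothetical, and unknown-field storage is simplified to a Vec of (field number, raw value) pairs.

```rust
/// Hypothetical trait for generated enums.
pub trait Enumeration: Copy {
    fn from_i32(value: i32) -> Option<Self>;
    fn to_i32(self) -> i32;
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnumValue<T: Enumeration> {
    Known(T),
    Unknown(i32),
}

impl<T: Enumeration> From<i32> for EnumValue<T> {
    fn from(value: i32) -> Self {
        // One known-variant lookup per decoded open-enum field.
        match T::from_i32(value) {
            Some(known) => EnumValue::Known(known),
            None => EnumValue::Unknown(value),
        }
    }
}

impl<T: Enumeration> EnumValue<T> {
    /// Wire value for re-encoding; unknown values round-trip unchanged.
    pub fn to_i32(self) -> i32 {
        match self {
            EnumValue::Known(known) => known.to_i32(),
            EnumValue::Unknown(raw) => raw,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhoneType { Mobile = 0, Home = 1, Work = 2 }

impl Enumeration for PhoneType {
    fn from_i32(value: i32) -> Option<Self> {
        match value {
            0 => Some(PhoneType::Mobile),
            1 => Some(PhoneType::Home),
            2 => Some(PhoneType::Work),
            _ => None,
        }
    }
    fn to_i32(self) -> i32 { self as i32 }
}

/// Hypothetical closed-enum decode: unknown values go to unknown-field storage.
pub fn decode_closed(field_number: u32, raw: i32, unknown: &mut Vec<(u32, i32)>) -> Option<PhoneType> {
    let known = PhoneType::from_i32(raw);
    if known.is_none() {
        unknown.push((field_number, raw)); // preserved, not dropped
    }
    known
}
```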

§5 Cached Encoded Size

Prost recomputes message sizes at every nesting level during serialization, leading to potentially exponential time for deeply nested messages. Buffa fixes this with CachedSize:
pub struct CachedSize {
    size: AtomicU32,  // Relaxed ordering — free on all major platforms
}

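Note: This problem has sat in prost's issue tracker for years. buffa's solution is textbook: each message struct embeds a CachedSize field, compute_size() caches sizes bottom-up, and write_to() reuses the cached values. Both passes are O(n).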

AtomicU32 over Cell<u32>: An earlier design used Cell<u32> on the assumption that avoiding atomics would be faster, since serialization is single-threaded. In practice, Relaxed-ordered atomic load/store compiles to identical machine instructions as a plain memory access on every major platform (x86/x86_64 TSO, ARM64, ARM32, RISC-V) — the only difference is a compiler reordering barrier, which has zero runtime cost. Switching to AtomicU32 makes messages Sync, enabling Arc<Message> and read-sharing across threads, at no measurable overhead. The DefaultInstance trait requires T: Sync for its static lazy-initializer pattern; !Sync messages made it impossible to compile the generated DefaultInstance impl, which was the decisive factor.

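Note: This is the most illuminating piece of "decision history" in the file. Its structure:

We once used Cell<u32> (it looked more efficient)
The rationale was "avoid atomic overhead"
But in practice, Relaxed atomic loads and stores compile to the same instructions as plain memory accesses on all major platforms
Switching to AtomicU32 made messages Sync (so they can be shared through Arc)
And the DefaultInstance trait requires Sync: with Cell, the generated code simply did not compile

The core job of this passage is to stop the AI from regressing. Without this history, an AI seeing AtomicU32 might "optimize" it back to Cell<u32>; serialization is single-threaded, after all. The passage tells it that this road has already been tried and is closed, and not just for performance reasons: it breaks compilation.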

Serialization must still be sequenced: compute_size() and write_to() must be called in order without interleaving from another thread. merge() requires &mut self, so mutation is still exclusive. Sync enables shared read access to a fully-built message, not concurrent serialization.

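Note: This final paragraph draws the semantic boundary of Sync: shared reads are fine, concurrent serialization is not. It keeps the AI from assuming, once it sees Sync, that several threads may run compute_size() + write_to() on the same message at the same time.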
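A minimal sketch of the two-pass scheme, using a hypothetical self-nesting message (field 1 is a uint64, field 2 a child of the same type). compute_size() visits each node once and stores its size in a CachedSize; write_to() then reads each child's cached size for the length prefix instead of recomputing the subtree. The method names follow DESIGN.md; the bodies are illustrative, not buffa's generated code.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

#[derive(Debug, Default)]
pub struct CachedSize {
    size: AtomicU32, // Relaxed is enough: serialization calls are sequenced
}

impl CachedSize {
    pub fn get(&self) -> u32 { self.size.load(Ordering::Relaxed) }
    pub fn set(&self, size: u32) { self.size.store(size, Ordering::Relaxed) }
}

fn varint_len(mut value: u64) -> u32 {
    let mut len = 1;
    while value >= 0x80 {
        value >>= 7;
        len += 1;
    }
    len
}

fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Hypothetical message: field 1 = uint64 `value`, field 2 = nested `child`.
#[derive(Debug, Default)]
pub struct Node {
    pub value: u64,
    pub child: Option<Box<Node>>,
    cached_size: CachedSize,
}

impl Node {
    /// Pass 1: bottom-up, each node visited exactly once.
    pub fn compute_size(&self) -> u32 {
        let mut size = 1 + varint_len(self.value); // tag 0x08 + value
        if let Some(child) = &self.child {
            let child_size = child.compute_size();
            size += 1 + varint_len(u64::from(child_size)) + child_size; // tag + len + body
        }
        self.cached_size.set(size);
        size
    }

    /// Pass 2: must follow compute_size(); reuses cached child sizes.
    pub fn write_to(&self, out: &mut Vec<u8>) {
        out.push(0x08);
        encode_varint(self.value, out);
        if let Some(child) = &self.child {
            out.push(0x12);
            encode_varint(u64::from(child.cached_size.get()), out);
            child.write_to(out);
        }
    }
}
```

Since write_to() trusts the cached values, it must run after compute_size() on the same message, which is the sequencing the quoted paragraph requires.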

§6 Unknown Field Preservation

Buffa preserves unknown fields by default:
pub struct UnknownFields {
    fields: Vec<UnknownField>,
}
This ensures round-trip fidelity: decoding a message with a newer schema and re-encoding it preserves fields the current schema doesn't know about. This is especially important for middleware/proxy use cases.

Default: on. The trade-off for most usages is memory, not throughput: when no unknown fields appear on the wire (the common case for schema-aligned services) the decode-loop fallthrough arm simply never fires, so the cost is the 24-byte Vec header per message, not a per-field penalty. Opting out via .preserve_unknown_fields(false) is worth considering for memory-constrained targets or large in-memory collections of small messages — not as a general throughput optimization.
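The "fallthrough arm" in the paragraph above is easiest to see in a toy decode loop. In this sketch, a hypothetical Msg whose schema knows only field 1 stores any other field's tag and raw bytes, and re-emits them on encode; fixed-width and group wire types are left out for brevity. When the input contains only known fields, the fallthrough code never runs and unknown_fields stays an empty Vec that has never allocated.

```rust
/// One preserved field: its tag plus the raw wire bytes that followed it.
#[derive(Debug, Clone, PartialEq)]
pub struct UnknownField {
    pub tag: u64,
    pub data: Vec<u8>,
}

/// Hypothetical message whose schema knows only field 1 (uint64 id).
#[derive(Debug, Default, PartialEq)]
pub struct Msg {
    pub id: u64,
    pub unknown_fields: Vec<UnknownField>,
}

fn decode_varint(buf: &mut &[u8]) -> Option<u64> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let (&byte, rest) = buf.split_first()?;
        *buf = rest;
        value |= u64::from(byte & 0x7F) << shift;
        if byte < 0x80 {
            return Some(value);
        }
    }
    None
}

fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

impl Msg {
    pub fn decode(mut buf: &[u8]) -> Option<Msg> {
        let mut msg = Msg::default();
        while !buf.is_empty() {
            let tag = decode_varint(&mut buf)?;
            if tag == 0x08 {
                msg.id = decode_varint(&mut buf)?;
                continue;
            }
            // Fallthrough arm: only reached for fields this schema lacks.
            let start = buf;
            match tag & 7 {
                0 => {
                    decode_varint(&mut buf)?;
                }
                2 => {
                    let len = usize::try_from(decode_varint(&mut buf)?).ok()?;
                    buf = buf.get(len..)?;
                }
                _ => return None, // fixed-width and group wire types omitted
            }
            let consumed = start.len() - buf.len();
            msg.unknown_fields.push(UnknownField { tag, data: start[..consumed].to_vec() });
        }
        Some(msg)
    }

    pub fn encode(&self) -> Vec<u8> {
        let mut out = vec![0x08];
        encode_varint(self.id, &mut out);
        for field in &self.unknown_fields {
            encode_varint(field.tag, &mut out);
            out.extend_from_slice(&field.data);
        }
        out
    }
}
```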

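Note: The "Default: on" paragraph anticipates a mistake an AI is likely to make and corrects it up front.

The AI's mistaken belief might be: "disabling unknown fields improves throughput."

In reality the cost is 24 bytes of memory per message (the empty Vec header, on 64-bit targets), not a per-field CPU cost. In most scenarios the decode loop's fallthrough arm never runs at all, because the schema is aligned and no unknown fields arrive.

The phrase "not as a general throughput optimization" tells the AI directly: do not advise users to turn preservation off for performance; only memory-constrained scenarios justify it.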

§7 Feature Resolution Pipeline

Edition features are resolved by protoc (or buf) and encoded directly in the FileDescriptorProto that buffa-codegen receives. The runtime never needs to interpret edition features — the generated code already embodies the correct behaviour, and buffa-codegen reads the resolved features straight from the descriptor.

.proto file(s)
    │
    ▼
┌──────────────────────────────────────────┐
│  protoc / buf                            │
│  (parse, resolve, edition feature        │
│   resolution baked into descriptors)     │
└───────────┬──────────────────────────────┘
            │ FileDescriptorSet (binary proto)
            ▼
┌─────────────────────────┐
│  buffa-build /          │
│  protoc-gen-buffa       │
│  (decode + dispatch)    │
└───────────┬─────────────┘
            │ FileDescriptorProto (per file)
            ▼
┌─────────────────────────┐
│  buffa-codegen          │
│  (Rust code generation) │
│  (owned + view types)   │
└─────────────────────────┘

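Note: This ASCII diagram gives the AI a data-flow mental model: data starts as .proto files, is parsed by protoc/buf (buffa is not involved), becomes FileDescriptorProto, and codegen then reads the feature values and emits Rust code.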

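The key constraint: buffa does not parse .proto files. It relies on protoc/buf for both parsing and feature resolution. This is the graphical form of Principle 8 (descriptor-centric). When asked to "support a new proto syntax", the right move for the AI is to wait for protoc to support it, not to implement parsing in buffa.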

§8-§10 (summary)

8. Configurable Recursion Limits

Default remains 100 for compatibility.

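Note: 100 matches the default recursion limit of Google's C++ and Java protobuf implementations. The AI should not change this default casually.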

9. no_std Support

The buffa runtime crate is no_std compatible with alloc.

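Note: This confirms how Principle 7 is implemented: no_std + alloc, not no_std without alloc. In core runtime code (outside the optional std feature), the AI may use alloc::vec::Vec, alloc::string::String, and similar types, but not std::io or other std-only APIs.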

10. Serde Integration

The canonical protobuf JSON mapping is non-trivial and cannot be satisfied by plain derive(Serialize, Deserialize) alone.

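Note: This sentence prevents a common mistake: an AI might suggest "just derive Serialize" for JSON support. But the protobuf JSON mapping has many special rules (int64 serialized as a string, bytes as base64, enums by name rather than number, and so on), so it needs hand-written serde helper modules.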

Owned decode: intentional throughput trade-offs

Owned decode (Message::decode_from_slice) benchmarks within roughly ±10% of prost in most cases. The costs are intentional and attributable to specific features:

Feature	Decode cost	Why
Unknown-field preservation (default-on)	Fallthrough arm does decode_unknown_field + Vec::push per unknown tag; 24 B/message for the Vec header	Round-trip fidelity for proxies and schema-skewed services. Disable with .preserve_unknown_fields(false) when not needed.
EnumValue<E> wrapper	EnumValue::from(i32) branches on known-variant lookup per enum field	Typed open-enum semantics instead of raw i32 (prost's approach).
Arithmetic-limit decode (merge_to_limit)	One extra buf.remaining() > limit comparison per decode-loop iteration vs buf.take(len)	Supports recursive message types (google.protobuf.Struct ↔ Value) without Take<Take<Take<…>>> type explosion (E0275). prost cannot compile these without manual Box indirection.
Box<T> per nested message	Heap allocation per sub-message vs upb's arena bump-allocator	Standard Rust ownership model. protobuf-v4's decode lead on deeply-nested messages (+90% on AnalyticsEvent) comes from upb batching all sub-messages into one arena.

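Note: This table is the AI's "do not optimize away" list.

When asked to "optimize decode performance", an AI's first instinct may be to eliminate these costs. But the table states plainly that every one of them is intentional and pays for a feature that will not be given up.

Take the last row in particular: allocating a Box<T> per nested message is why protobuf-v4 decodes AnalyticsEvent about 90% faster than buffa. A gap that size would normally look like something to fix, but DESIGN.md gives both the cause (upb batches all sub-messages into one arena) and the reason for the choice (the standard Rust ownership model). The AI should therefore not propose "adding an arena allocator to buffa": an arena need not involve C code, but arena-owned messages would cut against Principle 4 (idiomatic Rust API) and the owned-type design.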
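The merge_to_limit row of the table is the least self-explanatory, so here is a minimal sketch of an arithmetic limit, assuming a hypothetical recursive Value message whose field 2 nests another Value. Every nesting level reuses the same Cursor type and only computes a smaller limit, whereas wrapping a generic buffer in buf.take(len) at each level yields a new type per level, which is where the Take<Take<…>> explosion comes from. The sketch omits the recursion limit (§8) and unknown fields.

```rust
/// Hypothetical recursive message: field 1 = uint64 number, field 2 = repeated Value.
#[derive(Debug, Default, PartialEq)]
pub struct Value {
    pub number: u64,
    pub children: Vec<Value>,
}

/// One concrete cursor type for every nesting level (no Take<Take<...>> wrapping).
pub struct Cursor<'a> {
    buf: &'a [u8],
}

impl<'a> Cursor<'a> {
    fn remaining(&self) -> usize {
        self.buf.len()
    }

    fn read_varint(&mut self) -> Option<u64> {
        let mut value = 0u64;
        for shift in (0..64).step_by(7) {
            let buf = self.buf;
            let (&byte, rest) = buf.split_first()?;
            self.buf = rest;
            value |= u64::from(byte & 0x7F) << shift;
            if byte < 0x80 {
                return Some(value);
            }
        }
        None
    }
}

impl Value {
    /// Decodes fields until the cursor's remaining length drops to `limit`.
    fn merge_to_limit(&mut self, cur: &mut Cursor<'_>, limit: usize) -> Option<()> {
        while cur.remaining() > limit {
            match cur.read_varint()? {
                0x08 => self.number = cur.read_varint()?,
                0x12 => {
                    let len = usize::try_from(cur.read_varint()?).ok()?;
                    // The child's end is just arithmetic on the same cursor.
                    let child_limit = cur.remaining().checked_sub(len)?;
                    if child_limit < limit {
                        return None; // child would overrun its parent
                    }
                    let mut child = Value::default();
                    child.merge_to_limit(cur, child_limit)?;
                    self.children.push(child);
                }
                _ => return None,
            }
        }
        // A field that straddled the limit leaves remaining() below it.
        (cur.remaining() == limit).then_some(())
    }

    pub fn decode(buf: &[u8]) -> Option<Value> {
        let mut cur = Cursor { buf };
        let mut value = Value::default();
        value.merge_to_limit(&mut cur, 0)?;
        Some(value)
    }
}
```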

The view decode path (MessageView::decode_view) sidesteps the allocation cost entirely — no Box, borrows strings/bytes from the input buffer — and is the recommended fast path for read-only request handling.

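Note: This sentence is the "right answer": if a user wants faster decoding, recommend the view path instead of trying to optimize away the intentional costs of the owned path.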

Rejected: Pre-scan capacity reservation for view Vecs

During connect-rust integration, pprof profiling showed allocation overhead from Vec growth in RepeatedView and MapView during view decoding. We investigated pre-scanning the wire bytes before the main decode loop to count repeated field occurrences and reserve() exact capacity.

Two approaches were benchmarked:

Per-field scanning (count_field_occurrences called once per repeated/map field): O(N × buf.len()) where N is the number of repeated fields. Resulted in 20-97% regressions across all message sizes.

Single-pass multi-field counting (count_fields scanning all field numbers in one pass): O(buf.len()) regardless of field count. Still showed 5-40% regressions.

Even the single-pass approach was slower than Vec's amortized doubling because: (1) the scan touches every byte of the buffer doing varint decode + skip, which is comparable in cost to the actual decode pass, and (2) Vec's doubling strategy produces at most log2(n) allocations, and for typical protobuf maps/repeated fields (2-20 entries), that's only 2-5 allocations of small arrays — cheaper than a full buffer scan.

Vec already grows by powers of 2 (capacity doubles on realloc), which is the optimal amortized strategy. A fixed initial capacity (e.g., with_capacity(4)) was considered but rejected because it would allocate for every RepeatedView/MapView in every message, including fields that are usually empty.

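Note: This is the most important "what not to do" section in the file.

Pre-scanning looks like an "obviously right" optimization: count the elements first, then allocate once. Any experienced systems programmer (or AI) might propose it. But the data says no: both implementations were slower.

The explanation is precise: a pre-scan has to walk the whole buffer doing varint decoding and skipping, which costs about as much as the real decode pass. It amounts to decoding twice in order to save Vec's 2-5 reallocations (for typical small repeated fields).

The last paragraph blocks a third possible "optimization": with_capacity(4) was rejected because it would allocate for every RepeatedView/MapView in every message, including fields that are usually empty. This tells the AI not to add preallocation on the default path.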
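The Vec-growth argument can be checked with a small experiment: push n borrowed strings into an empty Vec and count how often its capacity changes. Vec's exact growth policy is an unspecified implementation detail of std, so this sketch (growth_allocations is a hypothetical helper) only asserts coarse bounds. It also confirms that Vec::new() allocates nothing, which is why with_capacity(4) would be a pessimization for usually-empty fields.

```rust
/// Pushes `n` borrowed strings (like RepeatedView entries) into an empty Vec
/// and counts capacity changes, i.e. the (re)allocations growth performs.
pub fn growth_allocations(n: usize) -> usize {
    let mut v: Vec<&str> = Vec::new(); // capacity 0: an empty field never allocates
    let mut capacity = v.capacity();
    let mut allocations = 0;
    for _ in 0..n {
        v.push("entry");
        if v.capacity() != capacity {
            allocations += 1;
            capacity = v.capacity();
        }
    }
    allocations
}
```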

Profile-guided decode optimizations

Three optimizations were applied based on pprof data from connect-rust's LogRecord view-decode benchmark (~350 string fields, ~450 varints per request). Each is a small, commented change that preserves readability.

encode_varint unbounded loop (encoding.rs). An earlier refactor had changed loop { ... return } to for _ in 0..10 { ... return } for explicit bounds. LLVM cannot prove the inner return always fires before the counter bound, so it keeps loop-counter machinery alive. Since value >>= 7 monotonically decreases, termination is already guaranteed; the unbounded loop lets LLVM see that. Impact: ~40% encode throughput recovery.

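Note: The story behind this optimization is instructive. A refactor replaced an unbounded loop with a bounded for _ in 0..10 (which looks safer), and encode throughput dropped; restoring the unbounded loop recovered about 40%. LLVM could not prove that the inner return always fires before the counter bound, so it kept the loop-counter machinery alive. The lesson: do not add artificial bounds to loops whose termination is already guaranteed.

This passage pre-empts a typical AI "improvement": rewriting loop as for _ in 0..10, since an AI may consider bounded loops "safer".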
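For reference, the unbounded form looks like this. It is a minimal sketch assuming u64 input, not copied from buffa's encoding.rs. Termination follows from value >>= 7 shrinking value toward the value < 0x80 exit, so no explicit iteration counter is needed.

```rust
/// Encodes `value` as a protobuf base-128 varint, appending to `out`.
pub fn encode_varint(mut value: u64, out: &mut Vec<u8>) {
    // Deliberately unbounded: `value >>= 7` strictly shrinks `value`, so the
    // `value < 0x80` exit is always reached (after at most 10 bytes for u64).
    loop {
        if value < 0x80 {
            out.push(value as u8);
            return;
        }
        out.push((value as u8) | 0x80); // low 7 bits plus the continuation bit
        value >>= 7;
    }
}
```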

Tag::decode one-byte fast path (encoding.rs). Field numbers 1–15 with any wire type encode as a single byte. decode_varint already has a one-byte fast path, but with plain #[inline] LLVM often declines to inline it into the per-field decode loop... Impact: +12–29% view decode, +9–16% owned.

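Note: Field numbers 1-15 are especially common in protobuf (the protobuf documentation recommends reserving them for frequently occurring fields, because their tags encode in a single byte), so a one-byte fast path in Tag::decode pays off significantly.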
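A minimal sketch of the fast path, with a hypothetical Tag type and a generic slow-path varint decoder. DESIGN.md's account of how buffa gets LLVM to inline its fast path is elided above, so this only shows the shape: a single comparison handles every tag byte below 0x80 (field numbers 1-15 with any wire type), and only larger tags take the general loop.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Tag {
    pub field_number: u32,
    pub wire_type: u8,
}

/// General multi-byte varint decode (the slow path).
fn decode_varint_slow(buf: &mut &[u8]) -> Option<u64> {
    let mut value = 0u64;
    for shift in (0..64).step_by(7) {
        let (&byte, rest) = buf.split_first()?;
        *buf = rest;
        value |= u64::from(byte & 0x7F) << shift;
        if byte < 0x80 {
            return Some(value);
        }
    }
    None
}

impl Tag {
    pub fn decode(buf: &mut &[u8]) -> Option<Tag> {
        let first = *buf.first()?;
        let raw = if first < 0x80 {
            // Fast path: field numbers 1-15 fit in one byte with any wire type.
            *buf = &buf[1..];
            u64::from(first)
        } else {
            decode_varint_slow(buf)?
        };
        let field_number = u32::try_from(raw >> 3).ok().filter(|&n| n != 0)?;
        Some(Tag { field_number, wire_type: (raw & 0x7) as u8 })
    }
}
```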

strict_utf8_mapping opt-in (codegen). core::str::from_utf8 was 11% of decode CPU... The codegen flag maps utf8_validation = NONE string fields to Vec<u8> / &[u8]; the caller explicitly chooses from_utf8 (checked) or from_utf8_unchecked (trusted-input) at the use site. Default-off because proto2's default is NONE — automatic mapping would break all proto2 string fields. Impact: ~2× RPS in connect-rust's trusted-input server...

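Note: This optimization comes directly from profiling connect-rust (its LogRecord view-decode benchmark). UTF-8 validation accounted for 11% of decode CPU, but Rust's &str carries a type-level UTF-8 invariant: you cannot skip validation and keep the &str type. The solution maps such fields to &[u8] and lets the caller choose between a checked conversion and an unsafe trusted one.

Note the rationale for "Default-off": proto2's default is NONE, so enabling the mapping automatically would change the type of every proto2 string field. This cautious "new options default off" stance is typical of library-level design.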
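What the caller's choice looks like at the use site, as a sketch: LogRecordView and both functions are hypothetical stand-ins for code consuming a bytes-mapped string field. Only std::str::from_utf8 and std::str::from_utf8_unchecked are the real APIs the paragraph refers to.

```rust
use std::str::Utf8Error;

/// Hypothetical view whose `utf8_validation = NONE` string field was mapped to bytes.
pub struct LogRecordView<'a> {
    pub message: &'a [u8],
}

/// Untrusted input: validate at the use site.
pub fn message_len_checked(record: &LogRecordView<'_>) -> Result<usize, Utf8Error> {
    let text = std::str::from_utf8(record.message)?;
    Ok(text.chars().count())
}

/// Trusted input: skip validation.
///
/// # Safety
/// The caller must guarantee that `record.message` is valid UTF-8
/// (for example, it came from a trusted peer that only sends UTF-8).
pub unsafe fn message_len_trusted(record: &LogRecordView<'_>) -> usize {
    let text = unsafe { std::str::from_utf8_unchecked(record.message) };
    text.chars().count()
}
```

The unsafe keyword now sits in the caller's code, where the trust decision is actually made, instead of being hidden inside the decoder.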

Readability line we hold: fast-path/slow-path splits with a "why" comment are fine. Manual unrolling, #[inline(always)] sprinkled defensively, SIMD intrinsics, or likely()/unlikely() workarounds are not. The test: can a new contributor read the code, understand the fast path, and safely modify the slow path?

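Note: This is the AI's taste boundary: which optimizations are allowed and which are not.

Allowed: fast-path/slow-path splits with a "why" comment. Not allowed: manual loop unrolling, defensive #[inline(always)], SIMD intrinsics, likely()/unlikely() tricks.

The criterion is not "does it make the code faster?" but "can a new contributor read the code, understand the fast path, and safely modify the slow path?" AI assistants tend to push for aggressive optimization without weighing readability, and this rule pulls them back into human-maintainable territory.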

Proto Syntax Supported

Edition 2023 / 2024

Runtime types for all edition features exist in editions.rs. Editions 2023 and 2024 are fully supported with feature-driven codegen...

field_presence: EXPLICIT, IMPLICIT, LEGACY_REQUIRED

enum_type: OPEN, CLOSED

repeated_field_encoding: PACKED, EXPANDED

utf8_validation: VERIFY, NONE

message_encoding: LENGTH_PREFIXED, DELIMITED

json_format: ALLOW, LEGACY_BEST_EFFORT

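Note: This is the list of codegen's parameterization axes. Every proto element (field, enum, message) has a set of resolved features, and codegen generates Rust code from the combination of their values. When modifying codegen, the AI should consult this list and confirm that new code covers every relevant feature combination.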

Proto2

Full proto2 support:

optional, required, repeated

Closed enums... Remaining gap: view packed-repeated (no per-element span to borrow) and map values (spec requires the entire entry to go to unknown fields — needs re-encode).

Custom default values via [default = ...] annotations on required fields...

Extensions: fully supported.

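Note: This passage lists the known gaps in proto2 support: handling unknown closed-enum values for packed-repeated fields and for map values in view mode. For the AI, these are possible directions for contribution; the passage also tells it that the gaps are known and documented, with stated causes, rather than bugs waiting to be found.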

Extensions

Typed extension access is layered on top of unknown-field storage — extension values are decoded lazily on each extension() call rather than stored in dedicated fields. This matches protobuf-es and avoids the registration-timing footgun in protobuf-go's eager model, where an extension registered after decode is silently ignored by both Get and JSON encode. With lazy decode, registration timing is irrelevant — the unknown-field record is always there.

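Note: This passage shows how buffa chooses between implementations. Lazy decode (the protobuf-es approach) versus eager decode (the protobuf-go approach): lazy wins because eager has a "registration-timing trap" (register an extension too late and it is silently ignored). Cross-ecosystem comparisons like this are highly valuable decision context for an AI.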
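Lazy extension access can be sketched as a lookup over unknown-field storage at call time. Message, UnknownField, and Extension are hypothetical, varint-only simplifications; the point is that the Extension handle can be created after the message was decoded and still finds its data, because nothing was consumed at decode time.

```rust
/// Simplified unknown-field record (varint payloads only, for brevity).
#[derive(Debug, Clone, PartialEq)]
pub struct UnknownField {
    pub field_number: u32,
    pub varint: u64,
}

/// Hypothetical message: extension data simply stays in unknown-field storage.
#[derive(Debug, Default)]
pub struct Message {
    pub unknown_fields: Vec<UnknownField>,
}

/// Hypothetical typed handle for a varint-encoded extension field.
pub struct Extension {
    pub field_number: u32,
}

impl Message {
    /// Decodes lazily on every call by scanning the unknown fields. For a
    /// singular field the last occurrence on the wire wins, as in protobuf merging.
    pub fn extension(&self, ext: &Extension) -> Option<u64> {
        self.unknown_fields
            .iter()
            .rev()
            .find(|f| f.field_number == ext.field_number)
            .map(|f| f.varint)
    }
}
```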

Versioning and Compatibility

Crate Versioning

All workspace crates share a version and are released together. This avoids the compatibility matrix problems that plague split-version ecosystems.

Note: A single shared version number is a deliberate simplification. When editing Cargo.toml, the AI should not give different crates different versions.

API Stability

The Message trait and core types are designed for stability. The generated code shape is part of the public API contract—changing it requires a major version bump.

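Note: This sentence carries a lot of weight: the shape of codegen output is part of the public API. If the AI changes generated struct field names, method signatures, or trait impls while modifying codegen, that is a breaking change requiring a major version bump.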

What Buffa is Not

Not a gRPC framework. RPC support is provided by separate crates (e.g., connect-rust for ConnectRPC)...

Not a protoc replacement. Buffa does not ship its own .proto parser...

Not backwards-compatible with prost. The generated code and trait system are different. Migration from prost will require updating generated code and call sites. A migration guide is provided.

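Note: These last three items are defensive statements telling the AI (and humans) where buffa's boundaries lie. The most important is the first: when someone asks to "add gRPC support to buffa", the right answer is "look at connect-rust", not an implementation inside buffa.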

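Summary: Five Kinds of Functional Content in DESIGN.md

Looking back over the whole file, it contains five kinds of information, each helping the AI in a different way:

Content type	Rough share (estimate)	Role for the AI	Representative sections
Competitive comparison	~5%	Keeps the AI from suggesting "use an existing library" or "fork X"	Motivation table
Design principles	~5%	Ranked hard constraints that resolve decision conflicts	The eight principles
Module boundaries	~30%	Tells the AI what each file is for and what not to change where	Crate descriptions
Decision history	~35%	Keeps the AI from regressing to rejected designs	AtomicU32 vs Cell, pre-scan rejection
Performance cause and effect	~25%	Keeps the AI from "optimizing away" intentional costs	Trade-offs table, profile-guided optimizations

The first three appear in any good design document. The last two, decision history and benchmark data for rejected approaches, read as content added specifically for AI collaboration. They answer not "what does the code look like?" but "why isn't the code some other way?" That is exactly where an AI is most likely to go wrong: it can see the current code, but not the code's past or the alternatives that were abandoned.

At its core, DESIGN.md compresses a project's decision history into text that can be restored in every AI session. It is not documentation; it is the AI's memory.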

ZhangHanDong/buffa-design.md