Progressive JSON

May 31, 2025

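Do you know about Progressive JPEGs? Here's a nice explanation of what a Progressive JPEG is. The idea is that instead of loading the image top to bottom, the image starts out blurry and then progressively becomes crisp.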

What if we applied the same idea to transferring JSON?

Suppose you have a JSON tree with some data:

{
  header: 'Welcome to my blog',
  post: {
    content: 'This is my article',
    comments: [
      'First comment',
      'Second comment',
      // ...
    ]
  },
  footer: 'Hope you like it'
}

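Now imagine you want to transfer it over the wire. Because the format is JSON, you won't have a valid object tree until the last byte loads. You have to wait for the entire thing to load, then call JSON.parse, and only then process it.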
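To make the all-or-nothing behavior concrete, here is a small check. The strings and the `tryParse` helper are illustrative, not part of any library; the point is only that `JSON.parse` either receives the whole document or throws.

```javascript
// Illustrative only: JSON.parse has no notion of "partial" input.
const full =
  '{"header":"Welcome to my blog","post":{"content":"This is my article"},"footer":"Hope you like it"}';
const partial = full.slice(0, 40); // pretend only the first 40 characters have arrived

// Hypothetical helper: return the parsed value, or the error JSON.parse threw.
function tryParse(text) {
  try {
    return JSON.parse(text);
  } catch (err) {
    return err;
  }
}

console.log(tryParse(full).header); // Welcome to my blog
console.log(tryParse(partial) instanceof SyntaxError); // true
```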

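The client can't do anything with the JSON until the server sends the last byte. If some part of the JSON was slow to generate on the server (for example, loading comments required a slow database query), the client can't start any work until the server finishes all of its work.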

Would you call that good engineering? And yet it's the status quo: that's how 99.9999%* of apps send and process JSON. Do we dare to improve on that?

* I made up that number

Streaming JSON

We can try to improve this by implementing a streaming JSON parser. A streaming JSON parser can produce an object tree from incomplete input:

{
  header: 'Welcome to my blog',
  post: {
    content: 'This is my article',
    comments: [
      'First comment',
      'Second comment'

If you asked for the result at this point, a streaming parser would hand you this:

{
  header: 'Welcome to my blog',
  post: {
    content: 'This is my article',
    comments: [
      'First comment',
      'Second comment'
      // (The rest of the comments are missing)
    ]
  }
  // (The footer property is missing)
}
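The post doesn't say how such a parser is built. One naive way to get this behavior is to track which brackets are still open, close them, and hand the repaired text to `JSON.parse`. The sketch below assumes the input was cut right after a complete value or inside a string; `parsePartialJSON` is a hypothetical name, and real streaming parsers work incrementally rather than re-parsing the whole prefix each time.

```javascript
// Hypothetical, naive "streaming" parse: close whatever is still open and re-parse.
// Assumption: the text was cut right after a complete value or inside a string.
// A cut right after a key or a colon (e.g. `{"footer":`) is NOT handled.
function parsePartialJSON(text) {
  const closers = []; // expected closing brackets, innermost last
  let inString = false;
  let escaped = false;
  for (const ch of text) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') closers.push('}');
    else if (ch === '[') closers.push(']');
    else if (ch === '}' || ch === ']') closers.pop();
  }
  let repaired = text;
  if (inString) {
    if (escaped) repaired = repaired.slice(0, -1); // drop a dangling backslash
    repaired += '"'; // terminate the half-received string
  } else {
    repaired = repaired.replace(/[\s,]+$/, ''); // drop a dangling comma
  }
  return JSON.parse(repaired + closers.reverse().join(''));
}

const partial =
  '{"header":"Welcome to my blog","post":{"content":"This is my article",' +
  '"comments":["First comment","Second comment"';
console.log(JSON.stringify(parsePartialJSON(partial)));
// {"header":"Welcome to my blog","post":{"content":"This is my article","comments":["First comment","Second comment"]}}
```

Note that the result simply has no `footer` key, and nothing in it says whether more comments will follow.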

However, this approach has problems of its own.

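One downside of this approach is that the resulting objects are malformed. For example, the top-level object was supposed to have three properties (header, post, and footer), but footer is missing because it hasn't appeared in the stream yet. The post was supposed to have three comments, but you can't actually tell whether more comments are coming.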

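In a way, this is inherent to streaming (didn't we want incomplete data?), but **it makes it very difficult to actually use this data on the client.** None of the types match up because fields are missing. We don't know what's complete and what isn't. That's why streaming JSON isn't popular outside of niche use cases. Application logic usually assumes that the types are correct and that "ready" means "complete", and streaming JSON makes those assumptions hard to uphold.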

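In the JPEG analogy, this naive approach to streaming matches the default "top-down" loading mechanism. The picture you see is crisp, but you only see the top 10% of it. So despite the high fidelity, you can't actually tell what the picture shows.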

Curiously, this is also how streaming HTML works by default. If you load an HTML page on a slow connection, it is streamed in document order:

<html>
  <body>
    <header>Welcome to my blog</header>
    <article>
      <p>This is my article</p>
      <ul class="comments">
        <li>First comment</li>
        <li>Second comment</li>

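This has some upsides (the browser can display the page partially), but it has the same issues. The cutoff point is arbitrary, which can be visually jarring or even break the page layout. You can't tell whether more content is coming. Whatever comes below, like the footer, is cut off, even if it was ready on the server and could have been sent earlier. When we stream data in order, one slow part delays everything after it.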

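Let's repeat that: when we stream things in the order they appear, a single slow part delays everything that comes after it. Can you think of a way to fix this?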

Progressive JSON

There is another way to approach streaming.

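So far we've been sending things depth-first. We send the top-level object's properties, descend into its post property, then into post's comments property, and so on. If something is slow, everything else gets held up.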

However, we could also send data breadth-first.

Suppose we send the top-level object like this:

{
  header: "$1",
  post: "$2",
  footer: "$3"
}

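Here, "$1", "$2", and "$3" refer to pieces of information that haven't been sent yet. They are placeholders that can be filled in progressively later in the stream.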
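Here is one way a server could produce such placeholders: a breadth-first walk over the tree in which every child value is replaced by a `"$N"` reference and emitted later as row N. This is a hypothetical sketch (`serializeBreadthFirst` and its row numbering are assumptions). It outlines every value, even strings, to mirror the walkthrough, and it assumes JSON-compatible values; a real format would also need an escaping rule for genuine strings that begin with `$`.

```javascript
// Hypothetical breadth-first serializer: every child value becomes a "$N"
// placeholder and is emitted later as its own row, so rows[N] holds "$N".
function serializeBreadthFirst(root) {
  const rows = [];
  const queue = [root]; // row 0 is the root
  let nextId = 1;
  const placeholder = value => {
    queue.push(value); // ids follow push order, so rows come out in id order
    return '$' + nextId++;
  };
  while (queue.length > 0) {
    const value = queue.shift();
    let row;
    if (Array.isArray(value)) {
      row = value.map(v => placeholder(v));
    } else if (value !== null && typeof value === 'object') {
      row = Object.fromEntries(Object.entries(value).map(([k, v]) => [k, placeholder(v)]));
    } else {
      row = value; // primitives are leaves
    }
    rows.push(JSON.stringify(row));
  }
  return rows;
}

const exampleRows = serializeBreadthFirst({
  header: 'Welcome to my blog',
  post: {
    content: 'This is my article',
    comments: ['This is the first comment', 'This is the second comment', 'This is the third comment'],
  },
  footer: 'Hope you like it',
});
console.log(exampleRows[0]); // {"header":"$1","post":"$2","footer":"$3"}
```

Rows come out in strict breadth-first order ($1, $2, $3, ...). A server driven by which data is ready first could emit them in any order, as the next example does.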

For example, suppose the server sends a few more rows to the stream:

{
  header: "$1",
  post: "$2",
  footer: "$3"
}
/* $1 */
"Welcome to my blog"
/* $3 */
"Hope you like it"

Notice that we're not obligated to send the rows in any particular order. In the example above, we've sent both $1 and $3, but $2 still hasn't arrived!
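If the stream were written literally in this shape, with a `/* $N */` marker before each row, a client could split whatever text it has received into rows keyed by id, regardless of arrival order. `splitRows` is a hypothetical helper; it assumes every row is valid JSON (quoted keys, unlike the abbreviated snippets above), that only complete rows are passed in, and that no string value contains the marker text. A real wire format would frame rows more robustly.

```javascript
// Hypothetical: split the received text on "/* $N */" markers.
// The text before the first marker is the root (row 0).
function splitRows(text) {
  // With a capturing group, split() keeps the ids: [root, id1, row1, id2, row2, ...]
  const parts = text.split(/\/\* \$(\d+) \*\//);
  const rows = { 0: JSON.parse(parts[0]) };
  for (let i = 1; i < parts.length; i += 2) {
    rows[parts[i]] = JSON.parse(parts[i + 1]);
  }
  return rows;
}

const received = [
  '{"header":"$1","post":"$2","footer":"$3"}',
  '/* $1 */',
  '"Welcome to my blog"',
  '/* $3 */',
  '"Hope you like it"',
].join('\n');
const rows = splitRows(received);
console.log(Object.keys(rows)); // [ '0', '1', '3' ]
```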

If the client tried to reconstruct the tree at this point, it could look like this:

{
  header: "Welcome to my blog",
  post: new Promise(/* ... not yet resolved ... */),
  footer: "Hope you like it"
}

We represent the parts that haven't loaded yet as Promises.
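A minimal sketch of that reconstruction, assuming the rows received so far are kept in a plain object keyed by id (row `0` being the root) and that the tree has no cycles. `reconstruct` and `inline` are hypothetical names. Missing rows become pending Promises, matching the snapshots shown here; a real client would rather keep one Promise per slot and resolve it when the row arrives than rebuild the whole tree each time.

```javascript
// Hypothetical snapshot: rebuild the tree from the rows received so far.
// Any string of the form "$N" refers to row N. Assumes no cycles.
function reconstruct(rows, id = '0') {
  if (!(id in rows)) return new Promise(() => {}); // not yet resolved
  return inline(rows[id], rows);
}

function inline(value, rows) {
  if (typeof value === 'string' && /^\$\d+$/.test(value)) {
    return reconstruct(rows, value.slice(1));
  }
  if (Array.isArray(value)) return value.map(v => inline(v, rows));
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(Object.entries(value).map(([key, v]) => [key, inline(v, rows)]));
  }
  return value;
}

const receivedRows = {
  0: { header: '$1', post: '$2', footer: '$3' },
  1: 'Welcome to my blog',
  3: 'Hope you like it',
};
const snapshot = reconstruct(receivedRows);
console.log(snapshot.header); // Welcome to my blog
console.log(snapshot.post instanceof Promise); // true
```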

Then suppose the server streams in a few more rows:

{
  header: "$1",
  post: "$2",
  footer: "$3"
}
/* $1 */
"Welcome to my blog"
/* $3 */
"Hope you like it"
/* $2 */
{
  content: "$4",
  comments: "$5"
}
/* $4 */
"This is my article"

From the client's perspective, this "fills in" some of the missing pieces:

{
  header: "Welcome to my blog",
  post: {
    content: "This is my article",
    comments: new Promise(/* ... not yet resolved ... */),
  },
  footer: "Hope you like it"
}

The Promise for post has now resolved to an object, but comments is still a Promise.

Finally, the comments stream in:

{
  header: "$1",
  post: "$2",
  footer: "$3"
}
/* $1 */
"Welcome to my blog"
/* $3 */
"Hope you like it"
/* $2 */
{
  content: "$4",
  comments: "$5"
}
/* $4 */
"This is my article"
/* $5 */
["$6", "$7", "$8"]
/* $6 */
"This is the first comment"
/* $7 */
"This is the second comment"
/* $8 */
"This is the third comment"

Now, from the client's perspective, the whole tree is complete:

{
  header: "Welcome to my blog",
  post: {
    content: "This is my article",
    comments: [
      "This is the first comment",
      "This is the second comment",
      "This is the third comment"
    ]
  },
  footer: "Hope you like it"
}

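By sending data breadth-first in chunks, we let the client process it incrementally. As long as the client can deal with some parts being "not ready" (represented as Promises) and process the rest, this is an improvement!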
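A different client design keeps exactly one Promise per `"$N"` slot and resolves it when row N arrives, so code can await just the part it needs while the rest is still in flight. This is a hypothetical sketch, not how any particular library does it: `createClient` and its `receive(id, json)` method are assumptions, each row is assumed to be a valid JSON string, and row `0` is the root. Unlike the snapshots above, every reference here stays a Promise even after it resolves.

```javascript
// Hypothetical client: one Promise per "$N" slot, resolved when row N arrives.
function createClient() {
  const slots = new Map(); // id -> { promise, resolve }
  const slot = id => {
    if (!slots.has(id)) {
      let resolve;
      const promise = new Promise(r => {
        resolve = r;
      });
      slots.set(id, { promise, resolve });
    }
    return slots.get(id);
  };
  // Replace every "$N" string with the (possibly still pending) Promise for slot N.
  const revive = value => {
    if (typeof value === 'string' && /^\$\d+$/.test(value)) {
      return slot(Number(value.slice(1))).promise;
    }
    if (Array.isArray(value)) return value.map(v => revive(v));
    if (value !== null && typeof value === 'object') {
      return Object.fromEntries(Object.entries(value).map(([k, v]) => [k, revive(v)]));
    }
    return value;
  };
  return {
    root: slot(0).promise,
    receive(id, json) {
      slot(id).resolve(revive(JSON.parse(json)));
    },
  };
}

const client = createClient();
client.receive(0, '{"header":"$1","post":"$2","footer":"$3"}');
client.receive(2, '{"content":"$4","comments":"$5"}');
client.receive(4, '"This is my article"');
// Rows 1, 3 and 5 have not arrived, but the article body is already usable:
client.root
  .then(page => page.post)
  .then(post => post.content)
  .then(content => console.log(content)); // This is my article
```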

Inlining

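Now that we have the basic mechanism, we can tune it for more efficient output. Let's look at the entire streaming sequence from the last example again: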

{
  header: "$1",
  post: "$2",
  footer: "$3"
}
/* $1 */
"Welcome to my blog"
/* $3 */
"Hope you like it"
/* $2 */
{
  content: "$4",
  comments: "$5"
}
/* $4 */
"This is my article"
/* $5 */
["$6", "$7", "$8"]
/* $6 */
"This is the first comment"
/* $7 */
"This is the second comment"
/* $8 */
"This is the third comment"

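We may have gone a little too far with streaming here. Unless generating some part is actually slow, we gain nothing by sending it as a separate row.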

Suppose we have two slow operations: loading the post and loading the post's comments. In that case, it makes sense to send three chunks in total.

First, we send the outer shell:

{
  header: "Welcome to my blog",
  post: "$1",
  footer: "Hope you like it"
}

On the client, this immediately becomes:

{
  header: "Welcome to my blog",
  post: new Promise(/* ... not yet resolved ... */),
  footer: "Hope you like it"
}

Then we send the post data (but without the comments):

{
  header: "Welcome to my blog",
  post: "$1",
  footer: "Hope you like it"
}
/* $1 */
{
  content: "This is my article",
  comments: "$2"
}

From the client's perspective:

{
  header: "Welcome to my blog",
  post: {
    content: "This is my article",
    comments: new Promise(/* ... not yet resolved ... */),
  },
  footer: "Hope you like it"
}

Finally, we send all the comments in a single chunk:

{
  header: "Welcome to my blog",
  post: "$1",
  footer: "Hope you like it"
}
/* $1 */
{
  content: "This is my article",
  comments: "$2"
}
/* $2 */
[
  "This is the first comment",
  "This is the second comment",
  "This is the third comment"
]

That gives the client the whole tree:

{
  header: "Welcome to my blog",
  post: {
    content: "This is my article",
    comments: [
      "This is the first comment",
      "This is the second comment",
      "This is the third comment"
    ]
  },
  footer: "Hope you like it"
}

This is more compact and achieves the same purpose.

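In general, this format gives us the freedom to decide when to send things as separate chunks and when to combine them. As long as the client tolerates chunks arriving out of order, the server can choose different batching and chunking heuristics.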
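A sketch of that freedom, assuming the server can tell which values deserve a chunk of their own: the hypothetical `serializeWithOutlining` below inlines everything except values for which a caller-supplied `shouldOutline` predicate returns true. In a real server the decision would come from which data is still loading; here it is just a function you pass in. The sketch assumes a plain tree (no shared objects or cycles).

```javascript
// Hypothetical serializer: inline by default, outline only what the caller picks.
// rows[N] holds the value referenced as "$N"; rows[0] is the root.
function serializeWithOutlining(root, shouldOutline) {
  const rows = [];
  const queue = [root];
  let nextId = 1;
  const encode = (value, isRow) => {
    if (!isRow && shouldOutline(value)) {
      queue.push(value); // ids follow push order, so rows come out in id order
      return '$' + nextId++;
    }
    if (Array.isArray(value)) return value.map(v => encode(v, false));
    if (value !== null && typeof value === 'object') {
      return Object.fromEntries(Object.entries(value).map(([key, v]) => [key, encode(v, false)]));
    }
    return value;
  };
  while (queue.length > 0) {
    rows.push(JSON.stringify(encode(queue.shift(), true)));
  }
  return rows;
}

// Usage: treat the post and its comments as the two slow parts.
const comments = ['This is the first comment', 'This is the second comment', 'This is the third comment'];
const post = { content: 'This is my article', comments };
const page = { header: 'Welcome to my blog', post, footer: 'Hope you like it' };
const chunks = serializeWithOutlining(page, value => value === post || value === comments);
console.log(chunks.join('\n'));
```

Passing `() => true` instead would outline every value and reproduce the fully split stream from the previous section.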

Outlining

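One interesting side effect of this approach is that it gives us a natural way to reduce repetition in the output stream. If we're serializing an object we've already seen, we can outline it as a separate row and reuse it.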

For example, suppose we have an object tree like this:

const userInfo = { name: 'Dan' };
 
[
  { type: 'header', user: userInfo },
  { type: 'sidebar', user: userInfo },
  { type: 'footer', user: userInfo }
]

If we serialized it to plain JSON, { name: 'Dan' } would be repeated:

[
  { type: 'header', user: { name: 'Dan' } },
  { type: 'sidebar', user: { name: 'Dan' } },
  { type: 'footer', user: { name: 'Dan' } }
]

With progressive JSON, however, we can choose to outline it:

[
  { type: 'header', user: "$1" },
  { type: 'sidebar', user: "$1" },
  { type: 'footer', user: "$1" }
]
/* $1 */
{ name: "Dan" }

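We could also pursue a more balanced strategy: for example, inline objects by default (for compactness) until we see an object used two or more times, then emit it separately and dedupe it in the rest of the stream.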

This also means that, unlike plain JSON, we can support serializing cyclic objects. A cyclic object just has a property that points to its own stream "row".
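A hypothetical sketch of the balanced strategy (`serializeDeduped` is an illustrative name): count the incoming references to each object, inline the objects referenced once, and give every shared object its own row. Because any cycle reachable from the root must contain at least one object with two or more incoming references, outlining shared objects also breaks cycles. Deserializing a cycle would additionally require the client to resolve a reference like `"$0"` to the very object it is building.

```javascript
// Hypothetical: inline by default, outline objects with more than one reference.
function serializeDeduped(root) {
  // Pass 1: count incoming references to each object (the root starts at 1).
  const counts = new Map();
  const visit = value => {
    if (value === null || typeof value !== 'object') return;
    const seen = counts.get(value) ?? 0;
    counts.set(value, seen + 1);
    if (seen === 0) Object.values(value).forEach(v => visit(v)); // descend once, so cycles terminate
  };
  visit(root);

  // Pass 2: shared objects get a row of their own and are referenced as "$N".
  const ids = new Map([[root, 0]]);
  const queue = [root];
  const encode = (value, isRow) => {
    if (value === null || typeof value !== 'object') return value;
    if (!isRow && counts.get(value) > 1) {
      if (!ids.has(value)) {
        ids.set(value, queue.length); // index of this object's future row
        queue.push(value);
      }
      return '$' + ids.get(value);
    }
    if (Array.isArray(value)) return value.map(v => encode(v, false));
    return Object.fromEntries(Object.entries(value).map(([k, v]) => [k, encode(v, false)]));
  };
  const rows = [];
  for (let i = 0; i < queue.length; i++) {
    rows.push(JSON.stringify(encode(queue[i], true)));
  }
  return rows;
}

const userInfo = { name: 'Dan' };
console.log(serializeDeduped([
  { type: 'header', user: userInfo },
  { type: 'sidebar', user: userInfo },
  { type: 'footer', user: userInfo },
]).join('\n'));
```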

Streaming Data vs Streaming UI

The approach described above is essentially how React Server Components (RSC) work.

Suppose you write a page with React Server Components:

function Page() {
  return (
    <html>
      <body>
        <header>Welcome to my blog</header>
        <Post />
        <footer>Hope you like it</footer>
      </body>
    </html>
  );
}
 
async function Post() {
  const post = await loadPost();
  return (
    <article>
      <p>{post.text}</p>
      <Comments />
    </article>
  );
}
 
async function Comments() {
  const comments = await loadComments();
  return <ul>{comments.map(c => <li key={c.id}>{c.text}</li>)}</ul>;
}

React will serve the output of Page as a progressive JSON stream. On the client, it will be reconstructed as a progressively loaded React tree.

Initially, the React tree on the client might look like this:

<html>
  <body>
    <header>Welcome to my blog</header>
    {new Promise(/* ... not yet resolved ... */)}
    <footer>Hope you like it</footer>
  </body>
</html>

Then, as loadPost resolves on the server, more content streams in:

<html>
  <body>
    <header>Welcome to my blog</header>
    <article>
      <p>This is my post</p>
      {new Promise(/* ... not yet resolved ... */)}
    </article>
    <footer>Hope you like it</footer>
  </body>
</html>

Finally, when loadComments resolves on the server, the client receives the rest:

<html>
  <body>
    <header>Welcome to my blog</header>
    <article>
      <p>This is my post</p>
      <ul>
        <li key="1">This is the first comment</li>
        <li key="2">This is the second comment</li>
        <li key="3">This is the third comment</li>
      </ul>
    </article>
    <footer>Hope you like it</footer>
  </body>
</html>

But here's the kicker.

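You don't actually want the page to jump around arbitrarily as data streams in. For example, maybe you never want to show the page without the post's content.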

This is why React doesn't display "holes" for pending Promises. Instead, it displays the closest declarative loading state, indicated by <Suspense>.

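In the example above, there are no <Suspense> boundaries in the tree. This means that, although React will receive the data as a stream, it won't actually show a "jumping" page to the user. It will wait for the entire page to be ready.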

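However, you can opt into a progressively revealed loading state by wrapping part of the UI tree in <Suspense>. This doesn't change how the data is sent (it's still streamed as much as possible), but it changes when React reveals it to the user.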

For example:

import { Suspense } from 'react';
 
function Page() {
  return (
    <html>
      <body>
        <header>Welcome to my blog</header>
        <Post />
        <footer>Hope you like it</footer>
      </body>
    </html>
  );
}
 
async function Post() {
  const post = await loadPost();
  return (
    <article>
      <p>{post.text}</p>
      <Suspense fallback={<CommentsGlimmer />}>
        <Comments />
      </Suspense>
    </article>
  );
}
 
async function Comments() {
  const comments = await loadComments();
  return <ul>{comments.map(c => <li key={c.id}>{c.text}</li>)}</ul>;
}

Now the user will perceive the loading sequence in two stages:

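First, the post "pops in" together with the header, the footer, and a glimmer (skeleton placeholder) for the comments. The header and footer never appear on their own.
Then, the comments "pop in" by themselves.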

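In other words, the stages in which the UI is revealed are decoupled from how the data arrives. The data is streamed as it becomes available, but we only reveal things to the user according to intentionally designed loading states.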

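In a way, you can think of the Promises in the React tree as acting almost like a throw, while <Suspense> acts almost like a catch. The data arrives as fast as it can, in whatever order the server is ready to send it, but React takes care to present the loading sequence gracefully and lets the developer control the visual reveal.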
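To make the throw/catch analogy concrete, here is a toy model, not React's actual algorithm. UI nodes are plain objects, a pending hole is a hypothetical `PENDING` symbol standing in for an unresolved Promise, and `reveal` literally throws when it reaches one. The nearest `{ type: 'Suspense' }` node catches the throw and shows its `fallback`; with no boundary at all, `revealPage` (also hypothetical) shows nothing.

```javascript
// Hypothetical stand-in for a Promise that hasn't resolved yet.
const PENDING = Symbol('pending');

// Walk a plain-object UI tree. A pending hole "throws" up to the nearest
// Suspense node, which "catches" it by rendering its fallback instead.
function reveal(node) {
  if (node === PENDING) throw PENDING;
  if (Array.isArray(node)) return node.map(child => reveal(child));
  if (node !== null && typeof node === 'object') {
    if (node.type === 'Suspense') {
      try {
        return reveal(node.children);
      } catch (thrown) {
        if (thrown === PENDING) return reveal(node.fallback);
        throw thrown;
      }
    }
    return { ...node, children: reveal(node.children) };
  }
  return node; // strings, numbers, undefined
}

// Without any boundary, a single pending hole hides the whole page.
function revealPage(tree) {
  try {
    return reveal(tree);
  } catch (thrown) {
    if (thrown === PENDING) return null;
    throw thrown;
  }
}

const withBoundary = {
  type: 'body',
  children: [
    { type: 'header', children: 'Welcome to my blog' },
    {
      type: 'article',
      children: [
        { type: 'p', children: 'This is my post' },
        { type: 'Suspense', fallback: { type: 'CommentsGlimmer' }, children: PENDING },
      ],
    },
    { type: 'footer', children: 'Hope you like it' },
  ],
};
console.log(revealPage(withBoundary).children[1].children[1].type); // CommentsGlimmer
```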

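Note that none of what I've described has anything to do with "SSR" or HTML. I was describing a general mechanism for streaming a UI tree represented as JSON. You can turn that JSON tree into progressively revealed HTML (and React can do that), but the idea is broader than HTML and applies to SPA-like navigations too.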

In Conclusion

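In this post, I've sketched out one of the core innovations of RSC. Instead of sending data as one big chunk, it sends the props for your component tree progressively, from the outside in. As a result, as soon as there's an intentional loading state to display, React can display it while the rest of the data for your page is still streaming in.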

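I'd like to challenge more tools to adopt progressive streaming of data. Whenever the client can't start doing something until the server has finished doing something, that's a clear example of where streaming can help. If a single slow thing can slow down everything after it, that's another warning sign.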

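As I showed in this post, streaming alone is not enough: you also need a programming model that can take advantage of streaming and gracefully handle incomplete information. React solves that with intentional <Suspense> loading states. If you know of systems that solve this differently, I'd love to hear about them!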
