feat: allow HTML streaming #28

Open
aralroca opened this issue Feb 19, 2024 · 1 comment


Nice library! I have been trying out set-dom and it works very well. However, when I tried to use it with HTML streaming, it stopped working. I think this is because the diff is implemented breadth-first (BFS), whereas streamed HTML arrives in depth-first (DFS) order. I tried to adapt the code to make it work, but I ran into some trouble.

Do you think it would be feasible to support this, or is there a restriction that requires BFS?
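To make the ordering mismatch concrete, here is a minimal sketch that uses a hypothetical plain `TreeNode` type in place of real DOM nodes. It only illustrates the two visiting orders and makes no claim about how set-dom is actually implemented.

```typescript
// Hypothetical minimal tree type standing in for DOM nodes.
interface TreeNode {
  name: string;
  children: TreeNode[];
}

// Depth-first (pre-order): the order in which opening tags appear in an
// HTML stream, since a node's descendants precede its next sibling.
function depthFirstOrder(root: TreeNode): string[] {
  const order: string[] = [root.name];
  for (const child of root.children) order.push(...depthFirstOrder(child));
  return order;
}

// Breadth-first: visits every node of one level before descending.
function breadthFirstOrder(root: TreeNode): string[] {
  const order: string[] = [];
  const queue: TreeNode[] = [root];
  let current: TreeNode | undefined;
  while ((current = queue.shift()) !== undefined) {
    order.push(current.name);
    queue.push(...current.children);
  }
  return order;
}

// Models <body><main><h1></h1><p></p></main><footer></footer></body>
const exampleTree: TreeNode = {
  name: "body",
  children: [
    {
      name: "main",
      children: [
        { name: "h1", children: [] },
        { name: "p", children: [] },
      ],
    },
    { name: "footer", children: [] },
  ],
};

console.log(depthFirstOrder(exampleTree)); // body, main, h1, p, footer
console.log(breadthFirstOrder(exampleTree)); // body, main, footer, h1, p
```

In the streamed (depth-first) order, `footer` only shows up after everything inside `main`, so a diff that wants all children of `body` before descending would have to wait for most of the document.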

To transform chunks into nodes, I use this:

const START_CHUNK_SELECTOR = "S-C";
const START_CHUNK_COMMENT = `<!--${START_CHUNK_SELECTOR}-->`;
const decoder = new TextDecoder();
const parser = new DOMParser();

/**
 * Create a generator that extracts nodes from a stream of HTML.
 *
 * This is useful to work with the RPC response stream and
 * transform the HTML into a stream of nodes to use in the
 * diffing algorithm.
 */
export default async function* parseHTMLStream(
  streamReader: ReadableStreamDefaultReader<Uint8Array>,
  ignoreNodeTypes: Set<number> = new Set(),
  text = "",
): AsyncGenerator<Node> {
  const { done, value } = await streamReader.read();

  if (done) return;

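  // Append the new chunk to the text with a marker.
  // This marker is necessary because without it, we
  // can't know where the new chunk starts and ends.
  // `stream: true` keeps multi-byte characters that are split
  // across two chunks from being decoded as replacement characters.
  text = `${text.replace(START_CHUNK_COMMENT, "")}${START_CHUNK_COMMENT}${decoder.decode(value, { stream: true })}`;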

  // Find the start chunk node
  function startChunk() {
    return document
      .createTreeWalker(
        parser.parseFromString(text, "text/html"),
        128 /* NodeFilter.SHOW_COMMENT */,
        {
          acceptNode: (node) =>
            node.textContent === START_CHUNK_SELECTOR
              ? 1 /* NodeFilter.FILTER_ACCEPT */
              : 2 /* NodeFilter.FILTER_REJECT */,
        },
      )
      .nextNode();
  }

  // Iterate over the chunk nodes
  for (
    let node = getNextNode(startChunk());
    node;
    node = getNextNode(node)
  ) {
    if (!ignoreNodeTypes.has(node.nodeType)) yield node;
  }

  // Continue reading the stream. `yield*` delegates to the async
  // generator directly, so no `await` is needed here.
  yield* parseHTMLStream(streamReader, ignoreNodeTypes, text);
}

/**
 * Get the next node in the tree.
 * It uses depth-first search in order to work with the streamed HTML.
 */
export function getNextNode(
  node: Node | null,
  deeperDone?: boolean,
): Node | null {
  if (!node) return null;
  if (node.childNodes.length && !deeperDone) return node.firstChild;
  return node.nextSibling ?? getNextNode(node.parentNode, true);
}
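The marker step in `parseHTMLStream` can be tried in isolation, without a DOM. The sketch below pulls it into a hypothetical `appendChunk` helper (the name is not part of the code above) and feeds it already-decoded strings, to show how the marker moves to the start of the newest chunk on every read.

```typescript
const START_CHUNK_SELECTOR = "S-C";
const START_CHUNK_COMMENT = `<!--${START_CHUNK_SELECTOR}-->`;

// Hypothetical helper: drop the previous marker, then place a fresh
// one immediately before the newly received chunk.
function appendChunk(text: string, chunk: string): string {
  return `${text.replace(START_CHUNK_COMMENT, "")}${START_CHUNK_COMMENT}${chunk}`;
}

let accumulated = "";

accumulated = appendChunk(accumulated, "<ul><li>a</li>");
console.log(accumulated); // <!--S-C--><ul><li>a</li>

accumulated = appendChunk(accumulated, "<li>b</li></ul>");
console.log(accumulated); // <ul><li>a</li><!--S-C--><li>b</li></ul>
```

One caveat, as far as I can tell: this assumes chunk boundaries fall between nodes. If a chunk ended in the middle of a tag, the marker would land inside that tag and would probably not be parsed as a comment.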

Then I can consume the streamed nodes directly:

// `res` is assumed to be a fetch `Response` whose body is the streamed HTML.
const reader = res.body.getReader();

for await (const node of parseHTMLStream(reader)) {
  console.log(node);
}

aralroca commented Apr 4, 2024

@DylanPiercey I implemented this at https://github.com/aralroca/diff-dom-streaming and it works fine with streaming. I used set-dom as a reference (noted in the code comments). It would be great to get some feedback from you, thanks!
