modules/html/treebuilder.zzm

html-0.0.2 documentation

Package

Name:       html
Version:    0.0.2
Uploaded:   2026-06-12 23:25:02
Repository: https://github.com/tobyink/zuzu-html
Metadata:   zuzu-distribution.json

NAME

html/treebuilder - HTML tree construction framework.

SYNOPSIS

  from html/treebuilder import HTMLTreeBuilder;

  let result := new HTMLTreeBuilder(
    _input: "<!doctype html><title>Example</title>",
  ).parse();

NOTE

This module is not normally useful to end users; use html/parser instead.

DESCRIPTION

This module implements the tree-builder layer for html/parser. It connects the tokenizer to the html/dom classes and covers the following insertion modes: initial, before html, before head, in head, text, after head, in body, table, select, template, frameset, after body, and after after body, as well as insertion-mode setup for fragment parsing.

It also routes SVG and MathML foreign content through namespace-aware insertion, handling adjusted SVG/MathML names, foreign XLink/XML/XMLNS attributes, HTML/MathML integration points, and foreign CDATA sections.

It deliberately does not implement script execution, file load/dump helpers, or the html5lib .dat harness.

EXPORTS

Classes

  • HTMLTreeBuilder

    Tree-construction engine. Most applications should use HTML.parse or HTMLParser; this class is exported for tests, diagnostics, and tools which need direct access to the tree-building layer.

    Pass _input to the constructor to provide the source text. parse() parses a full document and returns an HTMLTreeConstructionResult. parseFragment parses a context-sensitive fragment and returns an HTMLTreeConstructionResult that holds both a staging document and a fragment.

    Useful public accessors are tokenizer, document, fragment, errors, parseErrors, insertionMode, and currentNode. errors() returns both the tokenizer and the tree-construction parse errors collected during the latest parse.

    Lower-level stack, scope, insertion, and mode methods are exposed by the class because the implementation is pure ZuzuScript, but they are not part of the stable application API. Prefer the parser facade unless a test or tool needs exact tree-builder state.

  • HTMLTreeConstructionResult

    Result object returned by HTMLTreeBuilder.parse and HTMLTreeBuilder.parseFragment. document() returns the parsed or staging HTMLDocument. fragment() returns the HTMLDocumentFragment for fragment parses and null for full documents. errors() returns a copy of parse errors, and parseErrors() is an alias for errors().

  • HTMLTreeTestSerializer

    Serializer for html5lib tree-construction tests. The static serialize(node) method returns the tree-test representation used by tests/html/tree-construction.zzs. It serializes document and fragment children, element namespaces, sorted attributes, comments, doctypes, text nodes, and template content in the shape expected by the vendored fixtures.
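
As a rough sketch of how the exported classes fit together, the following reuses the SYNOPSIS input and calls only the methods described above. The variable names are illustrative, and the static-call form HTMLTreeTestSerializer.serialize(...) is an assumption based on how this page writes method names; it has not been checked against the distribution source.

  from html/treebuilder import HTMLTreeBuilder;
  from html/treebuilder import HTMLTreeTestSerializer;

  let result := new HTMLTreeBuilder(
    _input: "<!doctype html><title>Example</title>",
  ).parse();

  let doc    := result.document();
  let frag   := result.fragment();
  let errors := result.errors();
  let dump   := HTMLTreeTestSerializer.serialize(doc);

Because this is a full-document parse, frag is expected to be null. Per the class descriptions, errors holds a copy of the parse errors and dump holds the tree-test representation of the document.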

LIMITATIONS

This module implements the tree-construction behaviour claimed by the distribution tests, not every edge case in the WHATWG algorithm. Known html5lib expected failures are tracked in tests/html/tree-construction-xfails.zzm and summarized in the distribution README.

Script execution during parsing is not implemented. The scripting flag affects noscript parsing decisions but does not run scripts or allow parser-time script DOM mutation.

COPYRIGHT AND LICENCE

html/treebuilder is copyright Toby Inkster.

It is free software; you may redistribute it and/or modify it under the terms of either the Artistic License 1.0 or the GNU General Public License version 2.