parseHtml

Parse HTML strings into a traversable virtual DOM.

Interface

this.parseHtml(htmlString)

Parameters

htmlString - HTML string to parse

Returns

Root VirtualNode of the parsed tree

Description

The parseHtml service converts an HTML string into a tree of VirtualNode objects that can be traversed and manipulated. It is useful for content processing, URL extraction, and HTML transformation.

VirtualNode API

node.type          // Node type: tag name for elements ('div'), or '#text', '#comment', '#fragment'
node.attributes    // Object of element attributes
node.children      // Array of child VirtualNodes
node.text          // Concatenated text content of all descendants
node.traverse(fn)  // Recursively call fn on all nodes
node.toString()    // Serialize back to HTML
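
The properties listed above can be illustrated with a small hand-built stand-in. This is a minimal sketch, not the service's implementation: `makeNode` and its `value` parameter are hypothetical names, and the parent-before-children visiting order is an assumption the documentation does not state.

```javascript
// Hypothetical stand-in for a VirtualNode; NOT the parseHtml implementation.
// Assumes text nodes carry their content in `value` (an illustrative name).
function makeNode(type, attributes = {}, children = [], value = '') {
    return {
        type,
        attributes,
        children,
        // Concatenated text content of this node and all its descendants.
        get text() {
            if (type === '#text') return value;
            return children.map(child => child.text).join('');
        },
        // Assumed order: visit the node itself, then each child depth-first.
        traverse(fn) {
            fn(this);
            children.forEach(child => child.traverse(fn));
        }
    };
}

const tree = makeNode('#fragment', {}, [
    makeNode('div', { class: 'container' }, [
        makeNode('#text', {}, [], 'Hello '),
        makeNode('b', {}, [makeNode('#text', {}, [], 'world')])
    ])
]);

// Record the type of every node in the order traverse reaches it.
const visited = [];
tree.traverse(node => visited.push(node.type));
```

In this sketch `tree.text` is the text of every descendant joined together, and `visited` lists the root fragment first, followed by its descendants in document order.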

Examples

Basic Parsing

const node = this.parseHtml('<div class="container">Hello</div>');

node.children[0].type              // 'div'
node.children[0].attributes.class  // 'container'
node.text                          // 'Hello'

Extract Content

const html = await this.renderMarkdown(body);
let title; // Holds the text of the last h1 found, or stays undefined

this.parseHtml(html).traverse(node => {
    if (node.type === 'h1') {
        title = node.text;
    }
});
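
The loop above keeps whichever h1 is visited last. If only the first heading is wanted, one possible approach is sketched below. `firstHeading` is a hypothetical helper, and `stubRoot` is a hand-built stand-in for a parsed tree. The sketch assumes traverse visits nodes in document order; since no early exit from traverse is documented, it ignores later matches rather than stopping.

```javascript
// Returns the text of the first h1, or null when there is none.
// `root` is any object exposing traverse(fn), such as a parseHtml result.
function firstHeading(root) {
    let title = null;
    root.traverse(node => {
        // No early exit is documented for traverse, so later matches are skipped.
        if (title === null && node.type === 'h1') {
            title = node.text;
        }
    });
    return title;
}

// Tiny hand-built stand-in for a parsed tree, for demonstration only.
const stubRoot = {
    nodes: [
        { type: 'h1', text: 'First' },
        { type: 'p', text: 'Body' },
        { type: 'h1', text: 'Second' }
    ],
    traverse(fn) { this.nodes.forEach(node => fn(node)); }
};

const heading = firstHeading(stubRoot);
```

With a real tree the call would be `firstHeading(this.parseHtml(html))`.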

Find All Links

const links = [];
const virtualDom = this.parseHtml(htmlContent);

virtualDom.traverse(node => {
    if (node.type === 'a' && node.attributes.href) {
        links.push({
            href: node.attributes.href,
            text: node.text
        });
    }
});

Extract URLs

const urls = [];
const virtualDom = this.parseHtml(html);

virtualDom.traverse(({ attributes }) => {
    ['src', 'href'].forEach(name => {
        if (attributes[name]) {
            urls.push(attributes[name]);
        }
    });
});
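
Extracted values are often relative paths and may repeat. The following sketch shows one way to resolve them against a base URL and drop duplicates using the standard `URL` class. `collectUrls`, `stubTree`, and the example base URL are hypothetical, and the guard on `attributes` is a precaution in case some node types carry no attributes object.

```javascript
// Collects src/href values from a traversable tree, resolves them against
// baseUrl, and removes duplicates. `collectUrls` is a hypothetical helper.
function collectUrls(root, baseUrl) {
    const seen = new Set();
    root.traverse(({ attributes }) => {
        for (const name of ['src', 'href']) {
            // Guard in case a node has no attributes object.
            const value = attributes && attributes[name];
            if (!value) continue;
            try {
                seen.add(new URL(value, baseUrl).href);
            } catch {
                // Skip values the URL parser rejects.
            }
        }
    });
    return [...seen];
}

// Tiny hand-built stand-in for a parsed tree, for demonstration only.
const stubTree = {
    nodes: [
        { type: 'a', attributes: { href: '/docs' } },
        { type: 'img', attributes: { src: 'logo.png' } },
        { type: 'a', attributes: { href: 'https://example.com/docs' } },
        { type: '#text', attributes: {} }
    ],
    traverse(fn) { this.nodes.forEach(node => fn(node)); }
};

const resolved = collectUrls(stubTree, 'https://example.com/blog/post');
```

Using a Set keeps insertion order, so the result lists each absolute URL once, in the order it was first seen.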

Notes

Handles self-closing tags (img, br, input)
Preserves comments and doctypes
Decodes HTML entities in attributes and text
Gracefully handles malformed HTML