@giancosta86/wiki-transform

1.3.0 • Public • Published

wiki-transform

Stream transforming raw XML into wiki pages


Overview

wiki-transform provides a WikiTransform hybrid stream for Node.js: it takes XML chunks and outputs WikiPage objects.

It is designed to be fast: internally, it combines a SAX parser with a deliberately minimal algorithm.

Last but not least, WikiTransform is a standard stream, so you can use it in pipelines, or you can manually control it via the usual stream methods.

Installation

npm install @giancosta86/wiki-transform

or

yarn add @giancosta86/wiki-transform

The public API entirely resides in the root package index, so you shouldn't reference specific modules.

Usage

Just create a new instance of WikiTransform, optionally passing options. You will then be able to:

  • add it to a pipeline - via a chain of .pipe() method calls, or via the pipeline() function provided by Node.js

  • call its standard methods - like .write(), .end(), .on() and .once()
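As a minimal sketch of the manual approach, the helper below drives any Transform through .write(), .end(), .on() and .once(), collecting whatever the stream emits. The name collectOutput is hypothetical and not part of this package; the helper also ignores backpressure, which is acceptable only for small inputs.

```typescript
import { Transform } from "node:stream";

// Hypothetical helper (not part of wiki-transform): feeds chunks to a
// Transform by hand and resolves with every item it emits.
export function collectOutput<T>(
  transform: Transform,
  chunks: readonly string[]
): Promise<T[]> {
  return new Promise<T[]>((resolve, reject) => {
    const items: T[] = [];

    // Attaching a "data" listener switches the readable side to flowing mode
    transform.on("data", (item: T) => items.push(item));
    transform.once("end", () => resolve(items));
    transform.once("error", reject);

    // The return value of write() is ignored here: no backpressure handling
    for (const chunk of chunks) {
      transform.write(chunk);
    }

    transform.end();
  });
}
```

Assuming WikiTransform behaves like any other Transform, as stated in the overview, a call such as collectOutput(new WikiTransform(), xmlChunks) should resolve with the extracted pages.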

Supported format

WikiTransform will create a WikiPage object whenever it encounters the following XML pattern:

<page>
  <title>The title</title>
  <text>The text</text>
</page>

with the following rules:

  • The order of the subfields is ignored

  • Additional subfields are ignored

  • Ancestor nodes are ignored

  • Whitespace is ignored

  • XML entities like &gt; are substituted with their actual characters

  • CDATA blocks within significant fields are correctly parsed, and can be freely mixed with non-CDATA text

  • Instead of <page>, the enclosing tag can be something else - just pass the tag name (without angle brackets) to the pageTag constructor option

Please note: this library does NOT support nested tags within the <text> element! To handle them, you should instead rely on dedicated SAX parsing.
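To make the rules above more concrete, here is a hypothetical input that combines several of them: an ancestor node, subfields in reverse order, an additional subfield, an XML entity and a CDATA block mixed with plain text. Reading the rules, it should yield a single page whose text merges the CDATA and non-CDATA segments. The splitIntoChunks helper is an illustrative assumption, not part of this package: it only simulates a source that delivers XML in pieces that do not line up with page boundaries.

```typescript
// Hypothetical sample document exercising the rules listed above
export const sampleXml = `
<wiki>
  <page>
    <text><![CDATA[Alpha & ]]>Beta &gt; Gamma</text>
    <id>90</id>
    <title>First page</title>
  </page>
</wiki>`;

// Hypothetical helper: cuts a document into fixed-size chunks, as a network
// connection or a file reader might do
export function splitIntoChunks(text: string, chunkSize: number): string[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }

  const chunks: string[] = [];

  for (let offset = 0; offset < text.length; offset += chunkSize) {
    chunks.push(text.slice(offset, offset + chunkSize));
  }

  return chunks;
}
```

For instance, splitIntoChunks(sampleXml, 16) produces pieces that can be written to the stream one at a time.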

Example

This basic but fairly general-purpose function:

  • extracts wiki pages from any source stream that emits XML chunks - for example, an HTTP connection, or a file

  • outputs such WikiPage objects to the given target stream

import { Readable, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";
import { WikiTransform } from "@giancosta86/wiki-transform";

export async function extractWikiPages(
  source: Readable,
  target: Writable
): Promise<void> {
  const wikiTransform = new WikiTransform();

  return pipeline(source, wikiTransform, target);
}
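Since the target receives objects rather than bytes, it has to be a Writable in object mode. The following sketch shows one possible target that simply stores what it receives in an array; createCollector and collectAll are hypothetical names introduced here for illustration, not functions of this package.

```typescript
import { Readable, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical helper: an object-mode Writable that stores every item it
// receives into an array
export function createCollector<T>(): { target: Writable; items: T[] } {
  const items: T[] = [];

  const target = new Writable({
    objectMode: true,
    write(item: T, _encoding, callback) {
      items.push(item);
      callback();
    }
  });

  return { target, items };
}

// Hypothetical helper: drains an object-mode source into an array
export async function collectAll<T>(source: Readable): Promise<T[]> {
  const { target, items } = createCollector<T>();

  await pipeline(source, target);

  return items;
}
```

With the function above, one could pass the collector as the target, for example extractWikiPages(createReadStream("dump.xml"), target), and read the items array once the returned promise resolves; the file name is just a placeholder.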

Constructor parameters

  • pageTag: if present, defines the tag opening each page, without angle brackets. Default: "page"

  • logger: an object implementing the Logger interface exported by unified-logging. Default: no logger

  • highWaterMark: if present, passed to the base constructor

  • signal: if present, passed to the base constructor
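The last two options follow the standard Node.js stream semantics. The sketch below uses a plain Transform as a stand-in, because it only aims to show what the base constructor does with highWaterMark and signal; createStandIn is a hypothetical name, and the assumption is that WikiTransform forwards both options unchanged, as the list above states.

```typescript
import { Transform } from "node:stream";

// Hypothetical stand-in for WikiTransform: a plain Transform receiving the
// same two base-constructor options
export function createStandIn(
  signal: AbortSignal,
  highWaterMark: number
): Transform {
  return new Transform({
    readableObjectMode: true,
    highWaterMark,
    signal,
    transform(chunk, _encoding, callback) {
      callback(null, { size: chunk.length });
    }
  });
}

const controller = new AbortController();
const standIn = createStandIn(controller.signal, 8);

// Aborting destroys the stream with an error, so a listener is required
// to prevent the "error" event from crashing the process
standIn.on("error", () => {
  // Cleanup or logging would go here
});

// Destroys the stream, even while it sits in the middle of a pipeline
controller.abort();
```

Passing a signal is therefore a convenient way to stop a long-running extraction, for example when the consumer has already found the page it was looking for.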

Additional notes

As a convenience utility, especially for testing, the package also provides a wikiPageToXml() function, which converts a WikiPage to XML - using a CDATA block in every field.
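To give an idea of what such a conversion involves, here is a minimal sketch written from scratch: it is NOT the implementation of wikiPageToXml(), and both the WikiPageLike shape (title and text fields) and the pageToXmlSketch name are assumptions made for illustration.

```typescript
// Assumed shape of a page: the actual WikiPage type may differ
export interface WikiPageLike {
  title: string;
  text: string;
}

// Wraps a value in a CDATA block. The sequence "]]>" cannot appear inside
// CDATA, so it gets split across two adjacent blocks.
function toCdata(value: string): string {
  return `<![CDATA[${value.split("]]>").join("]]]]><![CDATA[>")}]]>`;
}

// Hypothetical counterpart of wikiPageToXml(), for illustration only
export function pageToXmlSketch(page: WikiPageLike, pageTag = "page"): string {
  return (
    `<${pageTag}>` +
    `<title>${toCdata(page.title)}</title>` +
    `<text>${toCdata(page.text)}</text>` +
    `</${pageTag}>`
  );
}
```

Using CDATA in every field means that characters such as & and < need no escaping, which keeps test fixtures readable.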

Further reference

For additional examples, please consult the unit tests in the source code repository.


License

MIT
