wordsoap

Version 0.2.0

Clean up dirty HTML output from Microsoft Word

Usage

command line

$ npm install -g wordsoap
$ cat msword_garbage.html | wordsoap

module

$ npm install --save wordsoap

var wordsoap = require('wordsoap')
 
var dirty = "<p class=MsoNormal style='font-size:12pt'>Text</p>"
var clean = wordsoap(dirty) // <p>Text</p>
 
// access individual regex strings
wordsoap.regexes.msoAttributes // <(\w+)(?: (?:class|lang|style|size|face|[ovwxp]))=(?:'[^']*'|""[^""]*""|[^\s>]+)(?:[^>]*)>
 
// access individual regexes compiled with 'gi' flags
wordsoap.regexesCompiled.msoAttributes // <(\w+)(?: (?:class|lang|style|size|face|[ovwxp]))=(?:'[^']*'|""[^""]*""|[^\s>]+)(?:[^>]*)>
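
To illustrate what a single pattern does on its own, here is a minimal sketch that applies the msoAttributes pattern with String.prototype.replace. It uses a local copy of the pattern string shown above instead of requiring the package, so it runs standalone. The helper name stripMsoAttributes and the '<$1>' replacement are assumptions made for illustration; wordsoap itself may apply the pattern differently.

```javascript
// Local copy of the pattern string shown above for wordsoap.regexes.msoAttributes.
// Assumption: copied from this README, not read from the installed package.
const msoAttributes = String.raw`<(\w+)(?: (?:class|lang|style|size|face|[ovwxp]))=(?:'[^']*'|""[^""]*""|[^\s>]+)(?:[^>]*)>`

// Compile with the 'gi' flags, as described for wordsoap.regexesCompiled.
const msoAttributesCompiled = new RegExp(msoAttributes, 'gi')

// Hypothetical helper: replace each matching opening tag with a bare tag,
// keeping only the tag name captured in group 1.
function stripMsoAttributes(html) {
  return html.replace(msoAttributesCompiled, '<$1>')
}

const sample = "<p class=MsoNormal style='font-size:12pt'>Text</p>"
console.log(stripMsoAttributes(sample)) // <p>Text</p>
```

Tags without a matching attribute, such as a plain `<b>`, are left unchanged by this pattern.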

License

ISC © Raine Lourie
