wordsoap
Clean up dirty HTML output from Microsoft Word
Usage
command line
$ npm install -g wordsoap
$ cat msword_garbage.html | wordsoap
module
$ npm install --save wordsoap
var wordsoap = require('wordsoap')
var dirty = "<p class=MsoNormal style='font-size:12pt'>Text</p>"
var clean = wordsoap(dirty) // <p>Text</p>

// access individual regex strings
wordsoap.regexes.msoAttributes // <(\w+)(?: (?:class|lang|style|size|face|[ovwxp]))=(?:'[^']*'|""[^""]*""|[^\s>]+)(?:[^>]*)>

// access individual regexes compiled with 'gi' flags
wordsoap.regexesCompiled.msoAttributes // <(\w+)(?: (?:class|lang|style|size|face|[ovwxp]))=(?:'[^']*'|""[^""]*""|[^\s>]+)(?:[^>]*)>
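To illustrate the general idea of a regex string compiled with the 'gi' flags, here is a minimal, self-contained sketch that does not use the package. The pattern `attrPattern` and the helper `stripWordAttributes` are hypothetical and much simpler than the msoAttributes regex; they only handle class, lang, and style attributes, and they are not how wordsoap itself is implemented.

```javascript
// Hypothetical, simplified pattern for illustration only (not wordsoap's own regex).
// Matches a single class/lang/style attribute with a single-quoted,
// double-quoted, or unquoted value, including the leading space.
var attrPattern = " (?:class|lang|style)=(?:'[^']*'|\"[^\"]*\"|[^\\s>]+)"

// Compile the regex string with the 'gi' flags (global, case-insensitive).
var attrRegex = new RegExp(attrPattern, 'gi')

// Hypothetical helper: remove every matching attribute from the input.
function stripWordAttributes(html) {
  return html.replace(attrRegex, '')
}

var dirty = "<p class=MsoNormal style='font-size:12pt'>Text</p>"
var clean = stripWordAttributes(dirty)
console.log(clean) // <p>Text</p>
```

Because this simplified pattern is not anchored to a tag, it would also remove matching text outside of tags. A pattern like msoAttributes, which starts with `<(\w+)`, avoids that by matching only inside an opening tag.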
License
ISC © Raine Lourie