ukoloff/valid-8


valid-8

Pure JavaScript implementation of UTF-8 validation.

It is intended as a drop-in replacement for utf-8-validate.

Most of the time and effort went into developing an extensive test suite (over 18k assertions).

Testing

Tests are run with mocha using the usual command:

npm test

Many non-obvious aspects of UTF-8 validation are tested, including:

  • UTF surrogates
  • long sequences
  • overlong sequences
  • incomplete sequences

Testing other libraries

To test other UTF-8 validation libraries, first install them:

cd test/others
npm install
cd ../..

and then run the tests against a single library, e.g.:

npm test --lib=utf-8-validate

or:

npm test --lib=is-utf8

Speed

Validation speed is measured during the test run. So far, this validator is the fastest of the libraries tested (this is not a joke!):

  • valid-8: 300 Mb/s (pure JavaScript)
  • utf-8-validate: 260 Mb/s (C++)
  • is-utf8: 110 Mb/s (also pure JavaScript)
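A throughput figure like the ones above can be estimated with a simple timing loop. The sketch below is hypothetical: measureThroughput is not part of valid-8 or its test suite, and the project's own measurement method may differ. It reports megabytes (10^6 bytes) per second and uses a TextDecoder-based check as a stand-in validator.

```javascript
// Hypothetical benchmark helper: runs `validator` over `buf` several times
// and returns the throughput in megabytes (10^6 bytes) per second.
function measureThroughput(validator, buf, rounds = 20) {
  validator(buf); // warm-up call so the JIT can optimize the validator first
  const start = process.hrtime.bigint();
  for (let i = 0; i < rounds; i++) {
    if (!validator(buf)) throw new Error('benchmark input must be valid UTF-8');
  }
  // Clamp to 1 ns so a very fast run cannot divide by zero.
  const elapsedSeconds = Math.max(Number(process.hrtime.bigint() - start), 1) / 1e9;
  return (buf.length * rounds) / 1e6 / elapsedSeconds;
}

// Stand-in validator built on Node's TextDecoder (not valid-8).
const decoder = new TextDecoder('utf-8', { fatal: true });
const viaTextDecoder = (b) => {
  try {
    decoder.decode(b);
    return true;
  } catch {
    return false;
  }
};

// Roughly 1.1 MB of mixed ASCII and CJK text.
const text = Buffer.from('Hello, 你好,世界! '.repeat(50000));
console.log(measureThroughput(viaTextDecoder, text).toFixed(1), 'MB/s');
```

Results depend heavily on the machine, the Node version and the mix of ASCII and multi-byte text in the input.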

API

Validation is simple:

const valid8 = require('valid-8')

if (!valid8(Buffer.from('你好,世界!')))
{
  // input is not valid UTF-8: handle it here
}

For compatibility with utf-8-validate, an alias is provided: valid8.Validation.isValidUTF8 === valid8.
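Such an alias is simply the same function object attached under a second name. Below is a minimal sketch of that pattern. The validator body is a TextDecoder-based stand-in, not valid-8's implementation.

```javascript
// Stand-in validator built on Node's TextDecoder; valid-8 has its own code.
const decoder = new TextDecoder('utf-8', { fatal: true });

function valid8(buf) {
  try {
    decoder.decode(buf);
    return true;
  } catch {
    return false;
  }
}

// The alias is the very same function object under a second name, so code
// written against utf-8-validate's Validation.isValidUTF8 keeps working.
valid8.Validation = { isValidUTF8: valid8 };
// A module would then export it, e.g. module.exports = valid8

console.log(valid8.Validation.isValidUTF8 === valid8); // true
```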

By default, valid8 rejects UTF surrogates (0xD800-0xDFFF) and code points higher than 0x10FFFF, as the UTF-8 specification requires.
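In code, this default rule is a check applied to each decoded code point. The helper below (isUnicodeScalarValue is a hypothetical name) sketches it. For example, the 4-byte sequence F4 90 80 80 decodes to U+110000 and fails the check.

```javascript
// Hypothetical helper: the default acceptance rule for a decoded code point.
function isUnicodeScalarValue(cp) {
  const isSurrogate = cp >= 0xd800 && cp <= 0xdfff; // UTF-16 surrogate range
  return cp >= 0 && cp <= 0x10ffff && !isSurrogate;
}

console.log(isUnicodeScalarValue(0x4f60));   // true  ('你')
console.log(isUnicodeScalarValue(0xd800));   // false (surrogate)
console.log(isUnicodeScalarValue(0x110000)); // false (decoded from F4 90 80 80)
```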

To let UTF surrogates pass validation, set valid8.surrogates = true.
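Note that a surrogate cannot be produced from a JavaScript string when trying out this option. Node's UTF-8 encoder replaces a lone surrogate with U+FFFD, and a proper surrogate pair becomes a single 4-byte sequence. Raw surrogate bytes must be written by hand, as this plain-Node snippet (no valid-8 involved) shows.

```javascript
// A lone surrogate in a JavaScript string is replaced by U+FFFD on encoding.
const fromString = Buffer.from('\ud800', 'utf8');
console.log(fromString.toString('hex'));   // efbfbd

// A valid surrogate pair (U+1F600) becomes one ordinary 4-byte sequence.
const pair = Buffer.from('\ud83d\ude00', 'utf8');
console.log(pair.toString('hex'));         // f09f9880

// To exercise surrogate handling, write the bytes of U+D800 directly.
const rawSurrogate = Buffer.from([0xed, 0xa0, 0x80]);
console.log(rawSurrogate.toString('hex')); // eda080
```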

To allow long sequences (5 or 6 bytes), set valid8.maxBytes to 5 or 6; 7-byte sequences are always rejected. By default valid8.maxBytes = 4, and it can also be set to 1, 2 or 3. For example, set valid8.maxBytes = 2 to reject Chinese ideographs (and many other symbols), which take at least 3 bytes in UTF-8.
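The next example is a sketch of a byte-by-byte validator with equivalents of the surrogates and maxBytes settings. It was written to illustrate this README and is not valid-8's source. One part is an assumption: the U+10FFFF limit is lifted once maxBytes exceeds 4. Without that, 5- and 6-byte sequences (which encode values of at least U+200000) could never pass.

```javascript
// Hypothetical validator sketch (not valid-8's source). `surrogates` and
// `maxBytes` play the role of valid8.surrogates and valid8.maxBytes.
function validateUtf8(buf, { surrogates = false, maxBytes = 4 } = {}) {
  // Smallest code point that legitimately needs a sequence of each length.
  const MIN_FOR_LENGTH = [0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000];
  let i = 0;
  while (i < buf.length) {
    const lead = buf[i];
    if (lead < 0x80) { i += 1; continue; } // ASCII byte, always allowed

    // Sequence length announced by the lead byte.
    let len;
    if ((lead & 0xe0) === 0xc0) len = 2;
    else if ((lead & 0xf0) === 0xe0) len = 3;
    else if ((lead & 0xf8) === 0xf0) len = 4;
    else if ((lead & 0xfc) === 0xf8) len = 5;
    else if ((lead & 0xfe) === 0xfc) len = 6;
    else return false; // stray continuation byte, or 0xFE/0xFF (never accepted)

    if (len > maxBytes) return false;       // longer than currently allowed
    if (i + len > buf.length) return false; // incomplete sequence at the end

    let cp = lead & (0x7f >> len); // payload bits carried by the lead byte
    for (let k = 1; k < len; k++) {
      const b = buf[i + k];
      if ((b & 0xc0) !== 0x80) return false; // expected a continuation byte
      cp = cp * 64 + (b & 0x3f);             // append 6 payload bits
    }

    if (cp < MIN_FOR_LENGTH[len]) return false; // overlong encoding
    if (!surrogates && cp >= 0xd800 && cp <= 0xdfff) return false;
    // Assumption: the U+10FFFF ceiling only applies while maxBytes <= 4.
    if (maxBytes <= 4 && cp > 0x10ffff) return false;

    i += len;
  }
  return true;
}

console.log(validateUtf8(Buffer.from('你好,世界!')));                 // true
console.log(validateUtf8(Buffer.from('你好,世界!'), { maxBytes: 2 })); // false
```

Each failure returns as soon as it is detected, so invalid input is rejected at the first offending byte.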

Rivals

See also