open-korean-text-node
A Node.js binding for open-korean-text via the node-java interface.
Dependency
Currently wraps open-korean-text 2.2.0.
Requirement
Because it uses Java code compiled with Java 8, make sure you have both the Java 8 JDK and JRE installed.
For more details about installing the Java interface, see the installation notes at the links below.
Installation
npm install --save open-korean-text-node
Usage
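import OpenKoreanText from 'open-korean-text-node';
// or
const OpenKoreanText = require('open-korean-text-node').default;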
- See the API section for more information.
Examples
API
OpenKoreanText
Tokenizing
OpenKoreanText.tokenize(text: string): Promise<IntermediaryTokens>;
OpenKoreanText.tokenizeSync(text: string): IntermediaryTokens;
- text: a target string to tokenize
Detokenizing
OpenKoreanText.detokenize(tokens: IntermediaryTokensObject): Promise<string>;
OpenKoreanText.detokenize(words: string[]): Promise<string>;
OpenKoreanText.detokenize(...words: string[]): Promise<string>;
OpenKoreanText.detokenizeSync(tokens: IntermediaryTokensObject): string;
OpenKoreanText.detokenizeSync(words: string[]): string;
OpenKoreanText.detokenizeSync(...words: string[]): string;
- tokens: an intermediary token object from tokenize
- words: an array of words to detokenize
Phrase Extracting
OpenKoreanText.extractPhrases(tokens: IntermediaryTokens, options?: ExcludePhrasesOptions): Promise<KoreanToken[]>;
OpenKoreanText.extractPhrasesSync(tokens: IntermediaryTokens, options?: ExcludePhrasesOptions): KoreanToken[];
- tokens: an intermediary token object from tokenize or stem
- options: an object of phrase extraction options, where
  - filterSpam: a flag to filter spam tokens. Defaults to true
  - includeHashtag: a flag to include hashtag tokens. Defaults to false
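The documented defaults can be pictured with a small sketch. The ExcludePhrasesOptions shape below is inferred from the two flags listed above, and resolveOptions is a hypothetical helper written only for illustration; the binding presumably applies its defaults internally and exposes no such function.

```typescript
// Option shape inferred from this README; not imported from the package.
interface ExcludePhrasesOptions {
  filterSpam?: boolean;
  includeHashtag?: boolean;
}

// Defaults as documented above: filterSpam = true, includeHashtag = false.
const DEFAULT_OPTIONS: Required<ExcludePhrasesOptions> = {
  filterSpam: true,
  includeHashtag: false,
};

// Hypothetical helper: fill in any flag the caller left out.
function resolveOptions(
  options: ExcludePhrasesOptions = {}
): Required<ExcludePhrasesOptions> {
  return {
    filterSpam: options.filterSpam ?? DEFAULT_OPTIONS.filterSpam,
    includeHashtag: options.includeHashtag ?? DEFAULT_OPTIONS.includeHashtag,
  };
}

console.log(resolveOptions());
console.log(resolveOptions({ includeHashtag: true }));
```

Passing only includeHashtag leaves filterSpam at its documented default, which is the behaviour a caller would expect when omitting a flag.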
Normalizing
OpenKoreanText.normalize(text: string): Promise<string>;
OpenKoreanText.normalizeSync(text: string): string;
- text: a target string to normalize
Sentence Splitting
OpenKoreanText.splitSentences(text: string): Promise<Sentence[]>;
OpenKoreanText.splitSentencesSync(text: string): Sentence[];
- text: a target string to split into sentences
- returns an array of Sentence, each of which includes:
  - text: string - the sentence's text
  - start: number - the sentence's start position in the original string
  - end: number - the sentence's end position in the original string
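As a rough illustration of how start and end relate to the input, the sketch below slices the original string with hand-written sample data. The Sentence interface is re-declared locally from the fields above, sliceSentences is a hypothetical helper, and treating end as exclusive is an assumption that is not confirmed by this README.

```typescript
// Local re-declaration of the documented Sentence fields.
interface Sentence {
  text: string;
  start: number;
  end: number;
}

// Recover each sentence from the original string via its offsets.
// Assumption: `end` is exclusive, matching String.prototype.slice.
function sliceSentences(original: string, sentences: Sentence[]): string[] {
  return sentences.map((s) => original.slice(s.start, s.end));
}

// Hand-written sample data, not real library output.
const originalText = "안녕하세요. 반갑습니다.";
const sampleSentences: Sentence[] = [
  { text: "안녕하세요.", start: 0, end: 6 },
  { text: "반갑습니다.", start: 7, end: 13 },
];

console.log(sliceSentences(originalText, sampleSentences));
```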
Custom Dictionary
OpenKoreanText.addNounsToDictionary(...words: string[]): Promise<void>;
OpenKoreanText.addNounsToDictionarySync(...words: string[]): void;
- words: words to add to the dictionary
toJSON
OpenKoreanText.tokensToJsonArray(tokens: IntermediaryTokensObject, keepSpace?: boolean): Promise<KoreanToken[]>;
OpenKoreanText.tokensToJsonArraySync(tokens: IntermediaryTokensObject, keepSpace?: boolean): KoreanToken[];
- tokens: an intermediary token object from tokenize or stem
- keepSpace: a flag that controls whether 'Space' tokens are kept in the output. Defaults to false
IntermediaryToken object
An intermediate token object required for internal processing.
It provides convenience wrapper functions for processing text without going through the processor object.
tokens.extractPhrases(options?: ExcludePhrasesOptions): Promise<KoreanToken[]>;
tokens.extractPhrasesSync(options?: ExcludePhrasesOptions): KoreanToken[];
tokens.detokenize(): Promise<string>;
tokens.detokenizeSync(): string;
tokens.toJSON(): KoreanToken[];
- NOTE: the tokens.toJSON() method is equivalent to OpenKoreanText.tokensToJsonArraySync(tokens, false)
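The calls documented so far can be chained as normalize, then tokenize, then toJSON. The sketch below shows that chain using the Sync variants. To keep it runnable without Java, it declares minimal local interfaces (TokenJson, TokensLike, SyncProcessorLike) and a stand-in processor; all of these names are hypothetical, and the stand-in only trims and splits on whitespace, so its output says nothing about real tokenization.

```typescript
// Minimal structural types covering only the calls used below.
// They mirror signatures listed in this README and are not
// imported from the package.
interface TokenJson {
  text: string;
  pos: string;
}

interface TokensLike {
  toJSON(): TokenJson[];
}

interface SyncProcessorLike {
  normalizeSync(text: string): string;
  tokenizeSync(text: string): TokensLike;
}

// Normalize, tokenize, then convert to plain JSON tokens.
function analyzeSync(processor: SyncProcessorLike, text: string): TokenJson[] {
  const normalized = processor.normalizeSync(text);
  const tokens = processor.tokenizeSync(normalized);
  return tokens.toJSON();
}

// Stand-in processor so the sketch runs without the Java binding.
// It trims the input and splits on whitespace; real output will differ.
const fakeProcessor: SyncProcessorLike = {
  normalizeSync: (text) => text.trim(),
  tokenizeSync: (text) => ({
    toJSON: () =>
      text
        .split(/\s+/)
        .filter((w) => w.length > 0)
        .map((w) => ({ text: w, pos: "Unknown" })),
  }),
};

console.log(analyzeSync(fakeProcessor, "  한국어 처리  "));
```

With the real binding, the imported OpenKoreanText object would presumably be passed in place of fakeProcessor, provided it exposes normalizeSync and tokenizeSync as documented above.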
KoreanToken object
A JSON output object which contains:
- text: string - the token's text
- stem: string - the token's stem
- pos: string - the type of token. Possible entries are:
  - Word level POS: Noun, Verb, Adjective, Adverb, Determiner, Exclamation, Josa, Eomi, PreEomi, Conjunction, NounPrefix, VerbPrefix, Suffix, Unknown
  - Chunk level POS: Korean, Foreign, Number, KoreanParticle, Alpha, Punctuation, Hashtag, ScreenName, Email, URL, CashTag
  - Functional POS: Space, Others
- offset: number - position in the original string
- length: number - length of the text
- isUnknown: boolean
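A common use of this output is to select tokens by their pos value or to map a token back to the input via offset and length. The sketch below does both on hand-written sample tokens. The KoreanToken interface is re-declared locally from the fields above, with stem marked optional as an assumption so tokens without a stem can be written; textsByPos and tokenSpan are hypothetical helpers, and offset is assumed to count from 0 in JavaScript string indices.

```typescript
// Local re-declaration of the documented KoreanToken fields.
// Assumption: `stem` is optional for tokens that have no stem.
interface KoreanToken {
  text: string;
  stem?: string;
  pos: string;
  offset: number;
  length: number;
  isUnknown: boolean;
}

// Collect the text of every token with the given POS tag.
function textsByPos(tokens: KoreanToken[], pos: string): string[] {
  return tokens.filter((t) => t.pos === pos).map((t) => t.text);
}

// Map a token back to the original string.
// Assumption: `offset` is a 0-based index into the JavaScript string.
function tokenSpan(original: string, token: KoreanToken): string {
  return original.slice(token.offset, token.offset + token.length);
}

// Hand-written sample data, not real library output.
const source = "한국어를 처리";
const sampleTokens: KoreanToken[] = [
  { text: "한국어", pos: "Noun", offset: 0, length: 3, isUnknown: false },
  { text: "를", pos: "Josa", offset: 3, length: 1, isUnknown: false },
  { text: "처리", pos: "Noun", offset: 5, length: 2, isUnknown: false },
];

console.log(textsByPos(sampleTokens, "Noun"));
console.log(tokenSpan(source, sampleTokens[1]));
```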