Package for Bulgarian Natural Language Processing (NLP)

Project description

bgnlp: Model-first approach to Bulgarian NLP

pip install bgnlp

Package functionalities

Part-of-speech (PoS) tagging
Lemmatization
Named Entity Recognition
Keyword Extraction
Commatization

Note: a model is downloaded only the first time you run one of these operations, so the first run may take longer.

Part-of-speech (PoS) tagging

from bgnlp import pos


print(pos("Това е библиотека за обработка на естествен език."))

[{
    "word": "Това",
    "tag": "PDOsn",
    "bg_desc": "местоимение",
    "en_desc": "pronoun"
}, {
    "word": "е",
    "tag": "VLINr3s",
    "bg_desc": "глагол",
    "en_desc": "verb"
}, {
    "word": "библиотека",
    "tag": "NCFsof",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": "за",
    "tag": "R",
    "bg_desc": "предлог",
    "en_desc": "preposition"
}, {
    "word": "обработка",
    "tag": "NCFsof",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": "на",
    "tag": "R",
    "bg_desc": "предлог",
    "en_desc": "preposition"
}, {
    "word": "естествен",
    "tag": "Asmo",
    "bg_desc": "прилагателно име",
    "en_desc": "adjective"
}, {
    "word": "език",
    "tag": "NCMsom",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": ".",
    "tag": "U",
    "bg_desc": "препинателен знак",
    "en_desc": "punctuation"
}]
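Because the result is a plain list of dictionaries, it can be post-processed with ordinary Python. The following is a minimal sketch, not part of bgnlp: it groups words by their `en_desc` value, using a hard-coded, abbreviated sample in the same shape as the output above. The helper name `group_by_pos` is hypothetical.

```python
from collections import defaultdict

# Abbreviated sample in the same shape as the pos() output shown above.
tagged = [
    {"word": "Това", "tag": "PDOsn", "bg_desc": "местоимение", "en_desc": "pronoun"},
    {"word": "е", "tag": "VLINr3s", "bg_desc": "глагол", "en_desc": "verb"},
    {"word": "библиотека", "tag": "NCFsof", "bg_desc": "съществително име", "en_desc": "noun"},
    {"word": "език", "tag": "NCMsom", "bg_desc": "съществително име", "en_desc": "noun"},
    {"word": ".", "tag": "U", "bg_desc": "препинателен знак", "en_desc": "punctuation"},
]


def group_by_pos(tokens):
    """Map each English PoS description to the words carrying it, in order."""
    groups = defaultdict(list)
    for token in tokens:
        groups[token["en_desc"]].append(token["word"])
    return dict(groups)


groups = group_by_pos(tagged)
print(groups["noun"])
```

In real use, `tagged` would be replaced by the return value of `pos(...)`.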

Lemmatization

from bgnlp import lemmatize


text = "Добре дошли!"
print(lemmatize(text))

[{'word': 'Добре', 'lemma': 'Добре'}, {'word': 'дошли', 'lemma': 'дойда'}, {'word': '!', 'lemma': '!'}]

# Generating a string of lemmas.
print(lemmatize(text, as_string=True))

Добре дойда!
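The list form is convenient when lemmas feed into further processing. As a sketch under stated assumptions, the snippet below counts lemmas while skipping punctuation-only tokens. It runs on a hard-coded copy of the output above rather than calling bgnlp, and `lemma_counts` is a hypothetical helper, not a library function.

```python
from collections import Counter

# Copy of the lemmatize() output shown above.
lemmatized = [
    {"word": "Добре", "lemma": "Добре"},
    {"word": "дошли", "lemma": "дойда"},
    {"word": "!", "lemma": "!"},
]


def lemma_counts(tokens):
    """Count lowercased lemmas, skipping tokens that contain no letters or digits."""
    lemmas = [
        token["lemma"].lower()
        for token in tokens
        if any(ch.isalnum() for ch in token["lemma"])
    ]
    return Counter(lemmas)


print(lemma_counts(lemmatized))
```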

Named Entity Recognition (NER) tagging

Currently, the available NER tags are:

PER - Person
ORG - Organization
LOC - Location

from bgnlp import ner


text = "Барух Спиноза е роден в Амстердам"

print(f"Input: {text}")
print("Result:", ner(text))

Input: Барух Спиноза е роден в Амстердам
Result: [{'word': 'Барух Спиноза', 'entity_group': 'PER'}, {'word': 'Амстердам', 'entity_group': 'LOC'}]
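A common next step is to bucket the recognized entities by tag. The sketch below assumes only the result shape shown above (`word` and `entity_group` keys) and works on a hard-coded copy of it. `entities_by_type` is a hypothetical helper name.

```python
# Copy of the ner() result shown above.
entities = [
    {"word": "Барух Спиноза", "entity_group": "PER"},
    {"word": "Амстердам", "entity_group": "LOC"},
]


def entities_by_type(items):
    """Group entity strings under their NER tag, preserving input order."""
    grouped = {}
    for item in items:
        grouped.setdefault(item["entity_group"], []).append(item["word"])
    return grouped


print(entities_by_type(entities))
```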

Keyword Extraction

from pprint import pprint

from bgnlp import extract_keywords


# Read the text from a file, since it may be too long to include inline here.
# The input used here is the text (no HTML) of this Bulgarian news article:
# https://novini.bg/sviat/eu/781622
with open("input_text.txt", "r", encoding="utf-8") as f:
    text = f.read()

# Extract keywords with a probability of at least 0.5.
keywords = extract_keywords(text, threshold=0.5)
print("Keywords:")
pprint(keywords)

Keywords:
[{'keyword': 'Еманюел Макрон', 'score': 0.8759163320064545},
 {'keyword': 'Г-7', 'score': 0.5938143730163574},
 {'keyword': 'Япония', 'score': 0.607077419757843}]
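In the sample output above, the keywords are not ordered by score. If a ranking is needed, it can be done after the fact. This is a minimal sketch on a hard-coded copy of that output, and `top_keywords` is a hypothetical helper, not part of bgnlp.

```python
# Copy of the extract_keywords() output shown above.
keywords = [
    {"keyword": "Еманюел Макрон", "score": 0.8759163320064545},
    {"keyword": "Г-7", "score": 0.5938143730163574},
    {"keyword": "Япония", "score": 0.607077419757843},
]


def top_keywords(items, n=2):
    """Return the n highest-scoring keyword strings, best first."""
    ranked = sorted(items, key=lambda item: item["score"], reverse=True)
    return [item["keyword"] for item in ranked[:n]]


print(top_keywords(keywords))
```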

Commatization

from pprint import pprint

from bgnlp import commatize


text = "Човекът искащ безгрижно писане ме помоли да създам този модел."

print("Without metadata:")
print(commatize(text))

print("\nWith metadata:")
pprint(commatize(text, return_metadata=True))

Without metadata:
Човекът, искащ безгрижно писане, ме помоли да създам този модел.

With metadata:
('Човекът, искащ безгрижно писане, ме помоли да създам този модел.',
 [{'end': 12,
   'score': 0.9301406145095825,
   'start': 0,
   'substring': 'Човекът, иск'},
  {'end': 34,
   'score': 0.93571537733078,
   'start': 24,
   'substring': ' писане, м'}])
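In the sample above, each metadata entry's `start` and `end` slice the returned (commatized) string to give its `substring`. Assuming that holds in general, which is inferred from this one sample rather than from documented behavior, the positions of the inserted commas can be recovered as follows. `comma_positions` is a hypothetical helper.

```python
# Copy of the commatize(text, return_metadata=True) result shown above.
commatized = "Човекът, искащ безгрижно писане, ме помоли да създам този модел."
spans = [
    {"start": 0, "end": 12, "score": 0.9301406145095825, "substring": "Човекът, иск"},
    {"start": 24, "end": 34, "score": 0.93571537733078, "substring": " писане, м"},
]


def comma_positions(text, metadata):
    """Return the index in `text` of the first comma inside each metadata span.

    Assumes start/end are character offsets into the commatized text.
    """
    positions = []
    for span in metadata:
        window = text[span["start"]:span["end"]]
        offset = window.find(",")
        if offset != -1:
            positions.append(span["start"] + offset)
    return positions


print(comma_positions(commatized, spans))
```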

Project details

Release history

0.5.3 (Jan 27, 2024), this version
0.5.2 (Jan 27, 2024)
0.5.1 (Dec 30, 2023)
0.5.0 (Dec 30, 2023)
0.4.1 (Dec 22, 2023)
0.4.0 (Dec 22, 2023)
0.3.1 (May 17, 2023)
0.3.0 (May 16, 2023)
0.2.0 (May 8, 2023)
0.1.4 (Apr 7, 2023)
0.1.3 (Apr 7, 2023)
0.1.2 (Apr 7, 2023)
0.1.1 (Apr 7, 2023)
0.1.0 (Apr 7, 2023)
0.0.13 (Mar 21, 2023)
0.0.12 (Mar 20, 2023)
0.0.11 (Mar 18, 2023)
0.0.10 (Mar 18, 2023)
0.0.9 (Mar 18, 2023)
0.0.8 (Mar 9, 2023)
0.0.7 (Mar 9, 2023)
0.0.6 (Mar 9, 2023)
0.0.5 (Jan 27, 2023)
0.0.4 (Jan 21, 2023)
0.0.3 (Jan 21, 2023)
0.0.2 (Jan 18, 2023)
0.0.1 (Jan 13, 2023)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bgnlp-0.5.3.tar.gz (52.0 kB)

Uploaded Jan 27, 2024 Source

Built Distribution

bgnlp-0.5.3-py3-none-any.whl (50.9 kB)

Uploaded Jan 27, 2024 Python 3

Hashes for bgnlp-0.5.3.tar.gz

Algorithm	Hash digest
SHA256	`96e67221583538fb013fa7e6ae6f585ce89f4e1b191cc84c015116518b1581db`
MD5	`0e9056a1004147a27bcdf8fba0c183f6`
BLAKE2b-256	`f631f50bd56638760e395c5998696190403fcca6370845ee3678f773d755e6eb`

Hashes for bgnlp-0.5.3-py3-none-any.whl

Algorithm	Hash digest
SHA256	`7d9e82108bffe74e1ffa1d5f22f6ea2c5372ef22da32587ff6bb765be3212fc6`
MD5	`9adc7dea55413e353f011d9f80cb78e1`
BLAKE2b-256	`0f1e4da61a314656bceff8d3b348a0e898a1fe1ab50fc21d2ba0a846423485b7`