Extract Biber features from a document parsed and annotated by spaCy.
Project description
The pybiber package aggregates the lexicogrammatical and functional features described by Biber (1988) and widely used for text-type, register, and genre classification tasks.
The package uses spaCy part-of-speech tagging and dependency parsing to summarize and aggregate patterns.
Because feature extraction builds on the outputs of probabilistic taggers, the accuracy of the resulting counts depends on the accuracy of those models. Thus, texts with irregular spellings, non-normative punctuation, etc., will likely produce unreliable outputs unless the taggers are tuned specifically for those purposes.
See the documentation for a description of the package’s full functionality.
See pseudobibeR for the R implementation.
Installation
You can install the released version of pybiber from PyPI:
pip install pybiber
Install a spaCy model:
python -m spacy download en_core_web_sm
Usage
To use the pybiber package, you must first import spaCy and initialize an instance of a model. You will also need to create a corpus. The biber function expects a polars DataFrame with a doc_id column and a text column. This follows the conventions used by readtext and for corpus processing with quanteda in R.
import spacy
import pybiber as pb
from pybiber.data import micusp_mini
The pybiber package requires a model that will carry out part-of-speech tagging and dependency parsing.
nlp = spacy.load("en_core_web_sm", disable=["ner"])
To process the corpus, use spacy_parse. Processing the micusp_mini corpus should take between 20 and 30 seconds.
df_spacy = pb.spacy_parse(micusp_mini, nlp)
After parsing the corpus, features can then be aggregated using biber.
df_biber = pb.biber(df_spacy)
License
Code licensed under Apache License 2.0. See LICENSE file.