Skip to main content

Python library to parse Apertium stream format

Project description

Apertium Streamparser

Build Status Coverage Status PyPI PyPI - Python Version PyPI - Implementation

Python 3 library to parse Apertium stream format, generating LexicalUnits.

Installation

Streamparser is available through PyPi:

$ pip install apertium-streamparser
$ apertium-streamparser
$^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$
[[SReading(baseform='vino', tags=['n', 'm', 'sg'])], [SReading(baseform='venir', tags=['vblex', 'ifi', 'p3', 'sg'])]]

Installation through PyPi will also install the streamparser module.

Usage

As a library

With string input

>>> from streamparser import parse
>>> lexical_units = parse('^hypercholesterolemia/*hypercholesterolemia$\[\]\^\$[^ignoreme/yesreally$]^a\/s/a\/s<n><nt>$^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$.eefe^dímelo/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><nt>/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><m><sg>$')
>>> for lexical_unit in lexical_units:
        print('%s (%s) → %s' % (lexical_unit.wordform, lexical_unit.knownness, lexical_unit.readings))
hypercholesterolemia (<class 'streamparser.unknown'>) → [[SReading(baseform='*hypercholesterolemia', tags=[])]]
a\/s (<class 'streamparser.known'>) → [[SReading(baseform='a\\/s', tags=['n', 'nt'])]]
vino (<class 'streamparser.known'>) → [[SReading(baseform='vino', tags=['n', 'm', 'sg'])], [SReading(baseform='venir', tags=['vblex', 'ifi', 'p3', 'sg'])]]
dímelo (<class 'streamparser.known'>) → [[SReading(baseform='decir', tags=['vblex', 'imp', 'p2', 'sg']), SReading(baseform='me', tags=['prn', 'enc', 'p1', 'mf', 'sg']), SReading(baseform='lo', tags=['prn', 'enc', 'p3', 'nt'])], [SReading(baseform='decir', tags=['vblex', 'imp', 'p2', 'sg']), SReading(baseform='me', tags=['prn', 'enc', 'p1', 'mf', 'sg']), SReading(baseform='lo', tags=['prn', 'enc', 'p3', 'm', 'sg'])]]

With file input

>>> from streamparser import parse_file
>>> lexical_units = parse_file(open('~/Downloads/analyzed.txt'))
>>> for lexical_unit in lexical_units:
        print('%s (%s) → %s' % (lexical_unit.wordform, lexical_unit.knownness, lexical_unit.readings))
Høgre (<class 'streamparser.known'>) → [[SReading(baseform='Høgre', tags=['np'])], [SReading(baseform='høgre', tags=['n', 'nt', 'sp'])], [SReading(baseform='høg', tags=['un', 'sint', 'sp', 'comp', 'adj'])], [SReading(baseform='høgre', tags=['f', 'n', 'ind', 'sg'])], [SReading(baseform='høgre', tags=['f', 'n', 'ind', 'sg'])], [SReading(baseform='høgre', tags=['sg', 'nt', 'ind', 'posi', 'adj'])], [SReading(baseform='høgre', tags=['mf', 'sg', 'ind', 'posi', 'adj'])], [SReading(baseform='høgre', tags=['un', 'ind', 'pl', 'posi', 'adj'])], [SReading(baseform='høgre', tags=['un', 'def', 'sp', 'posi', 'adj'])]]
kolonne (<class 'streamparser.known'>) → [[SReading(baseform='kolonne', tags=['m', 'n', 'ind', 'sg'])], [SReading(baseform='kolonne', tags=['m', 'n', 'ind', 'sg'])]]
Grunnprinsipp (<class 'streamparser.known'>) → [[SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'ind', 'sg'])], S[Reading(baseform='grunnprinsipp', tags=['n', 'nt', 'pl', 'ind'])], [SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'ind', 'sg'])], [SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'pl', 'ind'])]]
7 (<class 'streamparser.known'>) → [[SReading(baseform='7', tags=['qnt', 'pl', 'det'])]]
px (<class 'streamparser.unknown'>) → []

From the terminal

With standard input

$ bzcat ~/corpora/nnclean2.txt.bz2 | apertium-deshtml | lt-proc -we /usr/share/apertium/apertium-nno/nno.automorf.bin | python3 streamparser.py
[[SReading(baseform='Høgre', tags=['np'])],
 [SReading(baseform='høgre', tags=['n', 'sp', 'nt'])],
 [SReading(baseform='høg', tags=['un', 'sp', 'adj', 'comp', 'sint'])],
 [SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
 [SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'nt', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'mf', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'un', 'pl'])],
 [SReading(baseform='høgre', tags=['posi', 'def', 'sp', 'adj', 'un'])]]
[[SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])],
 [SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])]]
...

With file input in terminal

$ bzcat ~/corpora/nnclean2.txt.bz2 | apertium-deshtml | lt-proc -we /usr/share/apertium/apertium-nno/nno.automorf.bin > analyzed.txt
$ python3 streamparser.py analyzed.txt
[[SReading(baseform='Høgre', tags=['np'])],
 [SReading(baseform='høgre', tags=['n', 'sp', 'nt'])],
 [SReading(baseform='høg', tags=['un', 'sp', 'adj', 'comp', 'sint'])],
 [SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
 [SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'nt', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'mf', 'sg'])],
 [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'un', 'pl'])],
 [SReading(baseform='høgre', tags=['posi', 'def', 'sp', 'adj', 'un'])]]
[[SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])],
 [SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])]]
...

Contributing

Streamparser uses TravisCI for continous integration. Locally, use make test to run the same checks it does. Use pip install -r requirements.txt to install the requirements required for development, e.g. linters.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

apertium-streamparser-5.0.2.tar.gz (19.3 kB view hashes)

Uploaded Source

Built Distribution

apertium_streamparser-5.0.2-py3-none-any.whl (5.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page