
yabci - yet another beancount importer

yabci (yet another beancount importer) is a flexible & extensible importer for beancount (v2), aiming to replace any standard importer without the need to write custom python code.

Its goal is to support as many import formats as possible, while giving you complete control over the conversion into beancount transactions. The conversion is driven by a declarative config, eliminating the need to write custom Python code (though Python can still be used for complex cases).

Motivation

There are a lot of beancount importers available. Most of them are tailored to the specific format of a certain bank or payment provider, and, depending on the author's needs, they map import data to beancount transactions in a certain way. Any additional data from your import files is discarded. If you want to fine-tune the generated transactions or use more advanced features like tags, you are often out of luck.

yabci tries to fill this gap: yabci is format-agnostic regarding input files (everything the underlying benedict supports, which means CSV, JSON, and more). On the beancount side, yabci supports all transaction properties, postings & balances (from the basic ones like date, payee, narration to tags, meta data & links).

The only thing the end user has to do is tell yabci which input fields shall be mapped to which beancount fields. yabci takes care of the rest, like parsing dates from strings, parsing numbers with currencies, duplicate detection, etc.

Features:

  • supports any input file format
    • a lot of formats out of the box, such as CSV & JSON (anything that the fantastic benedict supports)
    • anything else can be handled by implementing a custom python function to convert the input file into a nested dict
  • complete control: you can decide specifically how your input data gets transformed into a beancount transaction
    • support for all beancount transaction properties (date, flag, payee, narration, tags, links)
    • support for all posting properties (account, amount, cost, price, flag)
    • support for transaction & posting meta data
    • support for multiple postings per transaction
    • any field can be transformed while importing it, giving you total control over the output
  • conversion of data types: no more custom date or number parsing
  • duplication detection (optionally using existing identifiers in your input data)

Getting started with beancount importers

If you already know beancount importers, you can skip to "Getting started with yabci".

To import external data into beancount, beancount uses so-called importers. You can install them from pip or write your own. If you are reading this, you probably want to use yabci to create one yourself.

To tell beancount about your importers, you have to create an importer config. This is a Python file (with the .py extension) containing the necessary importer code. While importers can easily become complicated (see the example at https://github.com/beancount/beancount/blob/v2/examples/ingest/office/importers/utrade/utrade_csv.py), importers using yabci look a lot simpler.

If you have your importer ready, you can run the beancount command bean-extract on your import files. bean-extract will use your importer to generate beancount transactions, which you can paste / redirect into your *.beancount files.
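
For orientation, this is roughly what such a config file looks like for beancount v2: a module-level CONFIG list of importer instances. SomeImporter and its import path are placeholders for whatever importer you actually use (e.g. a yabci.Importer, as shown below):

config.py

# bean-extract looks for a module-level variable named CONFIG,
# which must be a list of importer instances
from myimporters import SomeImporter  # placeholder, not a real package

CONFIG = [
    SomeImporter(),
]

$ bean-extract config.py path/to/your/downloads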

Getting started with yabci

(if you want to see some real world code, check the repository's examples folder)

Say you have the following CSV from your bank and want to import it into beancount:

bank-foo.csv

"ID","Datetime","Note","Type","From","To","Amount"
"2394198259925614643","2017-04-25T03:15:53","foo service","Payment","Brian Taylor","Foo company","-220"
"9571985041865770691","2017-06-05T23:25:11","by debit card-OTHPG 063441 bar service","Charge","Brian Taylor","Bar restaurant","-140"

Or maybe you have the data as JSON (yabci treats both input formats the same):

bank-foo.json

{
    "values": [
        {
            "ID": "2394198259925614643",
            "Datetime": "2017-04-25T03:15:53",
            "Note": "foo service",
            "Type": "Payment",
            "From": "Brian Taylor",
            "To": "Foo company",
            "Amount": "-220"
        },
        {
            "ID": "9571985041865770691",
            "Datetime": "2017-06-05T23:25:11",
            "Note": "by debit card-OTHPG 063441 bar service",
            "Type": "Charge",
            "From": "Brian Taylor",
            "To": "Bar restaurant",
            "Amount": "-140"
        }
    ]
}

You want to import that data into beancount, with the following requirements:

  • transaction date shall obviously be taken from "Datetime"
  • payee shall be taken from "To"
  • description shall be a combination of "Type" and "Note"
  • flag shall always be *
  • transaction meta data shall contain the value of "ID"
  • transaction shall be tagged with #sampleimporter
  • you want one posting for the account Assets:FooBank:Account1 containing "Amount" as €
  • you want another posting for the account Expenses:Misc

With a suitable yabci config (see below), beancount can import & map your data like this:

$ bean-extract config.py bank-foo.csv

2017-04-25 * "Foo company" "(Payment): foo service" #sampleimporter
  id: "2394198259925614643"
  Assets:FooBank:Account1  -220 EUR
  Expenses:Misc

2017-06-05 * "Bar restaurant" "(Charge): by debit card-OTHPG 063441 bar service" #sampleimporter
  id: "9571985041865770691"
  Assets:FooBank:Account1  -140 EUR
  Expenses:Misc

Now how does this work?

As with any beancount importer, you have to specify how the data in the bank's export files should be mapped to beancount transactions.

The following yabci config produces the results above:

config.py

import yabci

CONFIG = [
    yabci.Importer(
        target_account="Assets:FooBank:Account1",

        # where to find the list of transactions (csv files can use "values")
        mapping_transactions="values",

        mapping_transaction={

            # regular str: use the value of "Datetime" in the input data
            "date": "Datetime",
            "payee": "To",

            # if you want a fixed string, use type bytes (since regular strings
            # would be interpreted as dict key)
            "flag": b"*",

            # for more complex cases, you can use lambda functions. The function
            # receives the (complete) raw input dict as single argument
            "narration": lambda data: "(%s): %s" % (data.get("Type"), data.get("Note")),

            # if you pass a dict, the dict itself will be mapped again (with the
            # same logic as above)
            "meta": {
                "id": "ID",
            },

            # same goes for sets
            "tags": {b"sampleimporter"},

            # same goes for lists of dicts: each dict will be mapped again
            "postings": [
                {
                    "amount": lambda data: [data.get("Amount"), "EUR"],
                },
                {
                    "account": b"Expenses:Misc",
                },
            ],
        }
    ),
]

Notes:

  • "date" only accepts datetime.date. If a string is passed, yabci tries to convert it via dateutil.parser
  • "amount" must be a 2-element list, containing numeric amount & currency

More advanced features

benedict arguments

If you need to pass special parameters to benedict (for example how your CSV is formatted), you can use the config entry benedict_kwargs. This dict gets passed to benedict and determines how your input file is parsed. See the benedict documentation for available options.

Example for passing options about CSV format:

CONFIG = [
    yabci.Importer(
        benedict_kwargs={"delimiter": ";"},
        # ...
    )
]

date parsing

Transaction dates are parsed using dateutil. If you need to pass certain options to parse(), you can use parse_date_options:

Example for European dates: "01.06.2023" is parsed as "January 6th" by default; if you want it interpreted as "June 1st", you have to pass the dayfirst option:

CONFIG = [
    yabci.Importer(
        parse_date_options={"dayfirst": True},
        # ...
    )
]

Unsupported input data formats

If you want to import data from formats which are not supported by benedict, you can define a prepare_data method. This method should transform the input file into a (nested) dictionary which benedict can then parse. Since you can use arbitrary python code here, you should be able to use yabci for virtually any file format.

.json inside a zip file (as found in moneywallet)

moneywallet .mwbx backup files are zip files which contain a database.json file. You can support this format:

def read_json_from_zip(filename, pattern):
    import zipfile
    import json
    import re
    with zipfile.ZipFile(filename, "r") as z:
        # return the content of the first archive member matching the pattern
        for member in z.namelist():
            if re.match(pattern, member):
                with z.open(member) as f:
                    return json.loads(f.read())

CONFIG = [
    yabci.Importer(
        prepare_data=lambda filename: read_json_from_zip(filename, r".*database\.json"),
        # ...
    )
]

CSVs with special encoding

benedict reads CSVs using the system's default encoding (usually UTF-8) and will choke on anything else. If your CSV uses a different encoding, you have to read it into a dict explicitly:

# https://docs.python.org/3/library/csv.html
def read_csv_windows1252(filename):
    import csv
    values = []
    with open(filename, newline='', encoding='windows-1252') as f:
        reader = csv.DictReader(f)
        for row in reader:
            values.append(row)

    return {"values": values}


CONFIG = [
    yabci.Importer(
        prepare_data=read_csv_windows1252,
        # ...
    )
]

Detecting duplicate transactions

If your input data contains some form of unique id, you can use it to prevent importing the same transaction twice.

To do so, import the unique id into a meta field and let yabci know that it should be used to identify duplicates. Beancount will then not re-import these transactions.

config.py

import yabci
from beancount.ingest.scripts_utils import ingest

CONFIG = [yabci.Importer({
    # ...
    "duplication_key": "meta.duplication_key",

    "mapping": {
        # ...
        "transaction": {
            # ...
            "meta": {
                # use the value of "transaction_id"
                "duplication_key": "transaction_id",
            },
        },
    },
})]

# beancount uses its own duplicate detection by default, which interferes with
# yabci's approach, so we disable it here. The variable `HOOKS` is needed to
# disable it within fava as well, see
# https://github.com/beancount/fava/issues/1197 and
# https://github.com/beancount/fava/issues/1184

HOOKS = []

if __name__ == "__main__":
    ingest(CONFIG, hooks=[])

This creates transactions with meta data duplication_key:

2023-01-01 * "foo transaction"
  duplication_key: "8461dd69-e9eb-4deb-9014-b5ffd082ede0"
  ...

2023-01-02 * "bar transaction"
  duplication_key: "be8595a1-c0af-496f-87ac-7ff67e6d757b"
  ...

The next time you try to import the same transactions, beancount will identify them as duplicates and comment them out, so they will not be imported a second time.

; 2023-01-01 * "foo transaction"
;   duplication_key: "8461dd69-e9eb-4deb-9014-b5ffd082ede0"
;   ...

; 2023-01-02 * "bar transaction"
;   duplication_key: "be8595a1-c0af-496f-87ac-7ff67e6d757b"
;   ...

Duplicate detection without a suitable identifier field

If your input data contains no suitable field, you can also fall back to hashing the complete raw transaction data:

"duplication_key": lambda data: yabci.utils.hash_str(data.dump())
