A tool for linking two DataJoint tables located on different database servers

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

:link: datajoint-link

A tool for convenient and integrity-preserving data sharing between database servers.

:floppy_disk: Installation

Only users interacting with the destination of the data need to install the datajoint-link package:

pip install datajoint-link

:wrench: Setup

Source

Datajoint-link requires access to the database server from which data will be pulled. It is recommended to create a new user for this purpose:

CREATE USER 'djlink'@'%' IDENTIFIED BY 'secret-password';

The user needs to have certain privileges on the table from which data will be pulled:

GRANT SELECT, REFERENCES ON `source_schema`.`source_table` TO 'djlink'@'%';

Each table from which data will be pulled also needs an additional helper table, on which the user needs full privileges:

GRANT ALL PRIVILEGES ON `helper_schema`.`helper_table` TO 'djlink'@'%';

To preserve data integrity across the link, regular users must not have any privileges on this helper table.
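When several source tables are shared, the statements in this section can be generated instead of typed by hand. The following is a minimal sketch: `build_setup_statements` and its parameters are hypothetical and not part of datajoint-link, and the output simply mirrors the three statements shown above.

```python
def build_setup_statements(user, password, source, helper, host="%"):
    """Return the SQL statements from this section for one source table.

    `source` and `helper` are (schema, table) pairs. This helper is
    illustrative only and is not part of datajoint-link. Names are
    interpolated without escaping, so pass trusted values only.
    """
    account = f"'{user}'@'{host}'"
    source_schema, source_table = source
    helper_schema, helper_table = helper
    return [
        f"CREATE USER {account} IDENTIFIED BY '{password}';",
        f"GRANT SELECT, REFERENCES ON `{source_schema}`.`{source_table}` TO {account};",
        f"GRANT ALL PRIVILEGES ON `{helper_schema}`.`{helper_table}` TO {account};",
    ]


statements = build_setup_statements(
    "djlink",
    "secret-password",
    source=("source_schema", "source_table"),
    helper=("helper_schema", "helper_table"),
)
for statement in statements:
    print(statement)
```

The CREATE USER statement only needs to be run once per server; the two GRANT statements are repeated for every source table and its helper table.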

Destination

Datajoint-link needs to be configured with the username and password of the user created in the previous section. This is accomplished via environment variables:

LINK_USER=djlink
LINK_PASS=secret-password
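It can be convenient to confirm that both variables are set before working with linked tables. A minimal sketch using only the standard library; `missing_link_credentials` is a hypothetical helper and not part of datajoint-link. Only the variable names LINK_USER and LINK_PASS come from this section.

```python
import os

REQUIRED_VARIABLES = ("LINK_USER", "LINK_PASS")


def missing_link_credentials(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


# Examples with explicit mappings instead of the real environment.
print(missing_link_credentials({"LINK_USER": "djlink"}))
print(missing_link_credentials({"LINK_USER": "djlink", "LINK_PASS": "secret-password"}))
```

Calling the function without arguments checks the environment of the current process.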

:computer: Usage

The destination table is created by passing information about where to find the source table to the link decorator:

from link import link

@link(
    "databaseserver.com", 
    "source_schema", 
    "helper_schema", 
    "helper_table", 
    "destination_schema"
)
class Table:
    """Some table present in the source schema on the source database server."""

Note that the name of the declared class must match the name of the table from which the data will be pulled.

The class returned by the decorator behaves like a regular table with some added functionality. For one, it allows browsing the rows present in the source:

Table().source

All the rows can be pulled like so:

Table().source.pull()  # Hint: Pass display_progress=True to get a progress bar

That said, usually we only want to pull rows that match certain criteria:

(Table().source & "foo = 1").pull()

The deletion of already pulled rows works the same as for any other table:

(Table() & "foo = 1").delete()

The deletion of certain rows from the destination can also be requested by flagging them in the corresponding helper table:

row = (Helper() & "foo = 1").fetch1()
(Helper() & row).delete()
row["is_flagged"] = "TRUE"
Helper().insert1(row)

The flagged attribute makes the deletion of flagged rows from the destination table convenient:

(Table() & Table().source.flagged).delete()

Deleting a flagged row automatically updates its corresponding row in the helper table:

assert (Helper() & "foo = 1").fetch1("is_deprecated") == "TRUE" # No error!

Now it is safe to delete the row from the source table as well!
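The deletion workflow in this section moves a row through a fixed order of steps: flag in the helper table, delete from the destination, then delete from the source. The toy model below uses plain dictionaries in place of the destination and helper tables to make that order explicit. It illustrates the sequence described here and is not datajoint-link's implementation: the function names, the in-memory data structures, and the initial "FALSE" values are assumptions, and only the attribute names is_flagged and is_deprecated come from the examples above.

```python
# Toy stand-ins keyed by primary key; these are not real DataJoint tables.
destination = {1: {"foo": 1}}  # rows that were already pulled
helper = {1: {"foo": 1, "is_flagged": "FALSE", "is_deprecated": "FALSE"}}


def flag(key):
    """Source side: request deletion of a pulled row."""
    helper[key]["is_flagged"] = "TRUE"


def delete_flagged():
    """Destination side: delete flagged rows and mark them as deprecated."""
    for key, row in helper.items():
        if row["is_flagged"] == "TRUE" and key in destination:
            del destination[key]
            row["is_deprecated"] = "TRUE"


def safe_to_delete_from_source(key):
    """A row may leave the source once the destination no longer holds it."""
    return key not in destination


flag(1)
assert not safe_to_delete_from_source(1)  # still present in the destination
delete_flagged()
assert helper[1]["is_deprecated"] == "TRUE"
assert safe_to_delete_from_source(1)
```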

:package: External Storage

Data stored in a source table that refers to one (or more) external stores can be stored in different store(s) after pulling:

@link(
    ...,
    stores={"source_store": "destination_store"}
)
class Table:
    ...

Note that all stores mentioned in the dictionary need to be configured via dj.config.
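A quick consistency check can catch a store that appears in the mapping but is missing from the configuration. The sketch below works on plain dictionaries so that it runs without a database. `unconfigured_stores` is a hypothetical helper, and the store settings (a file protocol with placeholder locations) are assumptions for illustration; with DataJoint installed, a dictionary of this shape would be assigned to dj.config["stores"].

```python
# Assumed example settings; adapt protocol and location to your setup.
stores_config = {
    "source_store": {"protocol": "file", "location": "/tmp/source_store"},
    "destination_store": {"protocol": "file", "location": "/tmp/destination_store"},
}

# Mapping as passed to the link decorator: source store -> destination store.
store_mapping = {"source_store": "destination_store"}


def unconfigured_stores(mapping, configured):
    """Return every store named in the mapping that lacks a configuration."""
    mentioned = set(mapping) | set(mapping.values())
    return sorted(mentioned - set(configured))


print(unconfigured_stores(store_mapping, stores_config))
print(unconfigured_stores({"source_store": "other_store"}, stores_config))
```

An empty result means every store on both sides of the mapping has an entry in the configuration.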

:white_check_mark: Tests

Clone this repository and run the following command from within the cloned repository to run all tests:

docker compose run functional_tests tests

Release history

Version	Released
0.11	Feb 28, 2024 (this version)
0.10	Nov 9, 2023
0.9	Oct 9, 2023
0.8	Sep 28, 2023
0.7	Sep 27, 2023
0.6	Sep 8, 2023
0.5	Aug 25, 2023
0.4	Mar 30, 2023
0.3	Mar 12, 2021
0.2	Oct 6, 2020
0.1	Oct 6, 2020

Download files

Source Distribution

datajoint_link-0.11.tar.gz (48.4 kB)

Uploaded Feb 28, 2024 Source

Built Distribution

datajoint_link-0.11-py3-none-any.whl (38.7 kB)

Uploaded Feb 28, 2024 Python 3

Hashes for datajoint_link-0.11.tar.gz

Algorithm	Hash digest
SHA256	`29ff75f408f03cca89d596bb6465cb473a84a0a5e7e77b353e4498f3fb446db4`
MD5	`8c1799ab50f57260366e1d70ef7333ef`
BLAKE2b-256	`8c7da690eddcbb46897e50deb7d773ab942d09fb5e1eb02fe30b575619800c59`

Hashes for datajoint_link-0.11-py3-none-any.whl

Algorithm	Hash digest
SHA256	`428d396d0ce886a8e72252e401332ebc58bbca11e5cfec617afc02d9785bebae`
MD5	`005ad8f872eedbd43fde749811d7238f`
BLAKE2b-256	`dce0473c03eae4ff42baee997409cb0aa00c86b93630d68d5f3df74f6bfda7b1`