Skip to main content

A utility for packaging objects and validating metadata for FAIRSCAPE

Project description

fairscape-cli

A utility for packaging objects and validating metadata for FAIRSCAPE.


Documentation: https://fairscape.github.io/fairscape-cli/

Features

fairscape-cli provides a Command Line Interface (CLI) that allows the client side to create:

  • RO-Crate - a light-weight approach to packaging research data with their metadata. The CLI allows users to:
    • Create Research Object Crates (RO-Crates)
    • Add (transfer) digital objects to the RO-Crate
    • Register metadata of the objects
    • Describe the schema of tabular dataset objects as metadata and perform validation.

Requirements

Python 3.8+

Installation

$ pip install fairscape-cli

Minimal example

Basic commands

  • Show all commands, arguments, and options
$ fairscape-cli --help
  • Create an RO-Crate in a specified directory
$ fairscape-cli rocrate create \
  --name "test rocrate" \
  --description "Example RO Crate for Tests" \
  --organization-name "UVA" \
  --project-name "B2AI"  \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  "./test_rocrate"
  • Create an RO-Crate in the current working directory
$ fairscape-cli rocrate init \
  --name "test rocrate" \
  --description "Example RO Crate for Tests" \
  --organization-name "UVA" \
  --project-name "B2AI"  \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS"
  • Add a dataset to the RO-Crate
$ fairscape-cli rocrate add dataset \
  --name "AP-MS embeddings" \
  --author "Krogan lab (https://kroganlab.ucsf.edu/krogan-lab)" \
  --version "1.0" \
  --date-published "2021-04-23" \
  --description "Affinity purification mass spectrometer (APMS) embeddings for each protein in the study,  generated by node2vec predict." \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  --data-format "CSV" \
  --source-filepath "./tests/data/APMS_embedding_MUSIC.csv" \
  --destination-filepath "./test_rocrate/APMS_embedding_MUSIC.csv" \
  "./test_rocrate"
  • Add a software to the RO-Crate
$ fairscape-cli rocrate add software \
  --name "calibrate pairwise distance" \
  --author "Qin, Y." \
  --version "1.0" \
  --description "script written in python to calibrate pairwise distance." \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  --file-format "py" \
  --source-filepath "./tests/data/calibrate_pairwise_distance.py" \
  --destination-filepath "./test_rocrate/calibrate_pairwise_distance.py" \
  --date-modified "2021-04-23" \
  "./test_rocrate"
  • Register a computation to the RO-Crate
$ fairscape-cli rocrate register computation \
  --name "calibrate pairwise distance" \
  --run-by "Qin, Y." \
  --date-created "2021-05-23" \
  --description "Average the predicted proximities" \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  "./test_rocrate"
  • Create a schema
$ fairscape-cli schema create-tabular \
    --name 'APMS Embedding Schema' \
    --description 'Tabular format for APMS music embeddings from PPI networks from the music pipeline from the B2AI Cellmaps for AI project' \
    --separator ',' \
    --header False \
    ./schema_apms_music_embedding.json
  • Add a string property
$ fairscape-cli schema add-property string \
    --name 'Experiment Identifier' \
    --index 0 \
    --description 'Identifier for the APMS experiment responsible for generating the raw PPI used to create this embedding vector' \
    --pattern '^APMS_[0-9]*$' \
    ./schema_apms_music_embedding.json
  • Add annother string property
$ fairscape-cli schema add-property string \
    --name 'Gene Symbol' \
    --index 1 \
    --description 'Gene Symbol for the APMS bait protien' \
    --pattern '^[A-Za-z0-9\-]*$' \
    --value-url 'http://edamontology.org/data_1026' \
    ./schema_apms_music_embedding.json
  • Add an array property
$ fairscape-cli schema add-property array \
    --name 'MUSIC APMS Embedding' \
    --index '2::' \
    --description 'Embedding Vector values for genes determined by running node2vec on APMS PPI networks. Vector has 1024 values for each bait protien' \
    --items-datatype 'number' \
    --unique-items False \
    --min-items 1024 \
    --max-items 1024 \
    ./schema_apms_music_embedding.json
  • Show a successful validation of the schema against the dataset
$ fairscape-cli schema validate \
    --data ./examples/schemas/MUSIC_embedding/APMS_embedding_MUSIC.csv  \
    --schema ./examples/schemas/MUSIC_embedding/music_apms_embedding_schema.json
  • Show an unsuccessful validation of the schema against the dataset
$ fairscape-cli schema validate \
    --data examples/schemas/MUSIC_embedding/APMS_embedding_corrupted.csv \
    --schema examples/schemas/MUSIC_embedding/music_apms_embedding_schema.json
  • Validate using default schemas
# validate imageloader files
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/imageloader/samplescopy.csv" \
        --schema "ark:59852/schema-cm4ai-imageloader-samplescopy" 
    
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/imageloader/uniquecopy.csv" \
        --schema "ark:59852/schema-cm4ai-imageloader-uniquecopy"
       
# validate image embedding outputs
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/image_embedding/image_emd.tsv" \
        --schema "ark:59852/schema-cm4ai-image-embedding-image-emd"
     
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/image_embedding/labels_prob.tsv" \
        --schema "ark:59852/schema-cm4ai-image-embedding-labels-prob"

# validate apsm loader input
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/apmsloader/ppi_gene_node_attributes.tsv" \
        --schema "ark:59852/schema-cm4ai-apmsloader-gene-node-attributes"

$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/apmsloader/ppi_edgelist.tsv" \
        --schema "ark:59852/schema-cm4ai-apmsloader-ppi-edgelist"

# validate apms embedding 
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/apms_embedding/ppi_emd.tsv" \
        --schema "ark:59852/schema-cm4ai-apms-embedding"    

# validate coembedding 
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/coembedding/coembedding_emd.tsv" \
        --schema "ark:59852/schema-cm4ai-coembedding"

Contribution

If you'd like to request a feature or report a bug, please create a GitHub Issue using one of the templates provided.

License

This project is licensed under the terms of the MIT license.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fairscape-cli-0.2.0.tar.gz (32.6 kB view hashes)

Uploaded Source

Built Distribution

fairscape_cli-0.2.0-py3-none-any.whl (38.4 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page