Time Series Diffusion in the Frequency Domain

This repository implements time series diffusion in the frequency domain. For more details, please read our paper: Time Series Diffusion in the Frequency Domain.
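
As a rough illustration of what "frequency domain" means here, the sketch below maps a short time series to its spectrum and back with a discrete Fourier transform. It assumes an orthonormal DFT and uses only the standard library; the helper names `dft` and `idft` are hypothetical and are not taken from this repository, whose actual transform may differ.

```python
import cmath
import math


def dft(x):
    """Orthonormal discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Inverse transform, using the same orthonormal scaling as dft."""
    n = len(X)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n))
        for t in range(n)
    ]


# A toy series: one period of a sine wave sampled at 8 points.
series = [math.sin(2 * math.pi * t / 8) for t in range(8)]
spectrum = dft(series)

# Transforming back recovers the original samples up to rounding error.
recovered = [value.real for value in idft(spectrum)]
```

With the orthonormal scaling, the transform preserves the total energy of the series, so the time-domain and frequency-domain representations carry the same information.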

1. Install

From repository:

Clone the repository.
Create and activate a new environment with conda (with Python 3.10 or newer).

conda create -n fdiff python=3.10
conda activate fdiff

Install the requirements.

pip install -e .

If you intend to train models, make sure that wandb is correctly configured on your machine by following this guide.
Some of the datasets are automatically downloaded by our scripts via kaggle API. Make sure to create a kaggle token as explained here.
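
Before launching a run, it can help to check that credentials for both services are in place. The sketch below looks in the locations these tools commonly read: the `WANDB_API_KEY` variable or an `api.wandb.ai` entry in `~/.netrc` for wandb, and the `KAGGLE_USERNAME`/`KAGGLE_KEY` variables or a `kaggle.json` file for kaggle. The function names are hypothetical, the checks are not part of this repository, and other setups (for example a differently named netrc file on Windows) are not covered.

```python
import os
from pathlib import Path


def kaggle_configured(env=None, home=None):
    """Return True if kaggle credentials are found in a common location."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    # Credentials passed through environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True
    # Otherwise look for the token file, honouring KAGGLE_CONFIG_DIR if set.
    config_dir = Path(env.get("KAGGLE_CONFIG_DIR", home / ".kaggle"))
    return (config_dir / "kaggle.json").is_file()


def wandb_configured(env=None, home=None):
    """Return True if a wandb API key is found in a common location."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if env.get("WANDB_API_KEY"):
        return True
    # `wandb login` usually stores the key as a netrc entry for api.wandb.ai.
    netrc = home / ".netrc"
    return netrc.is_file() and "api.wandb.ai" in netrc.read_text()


if __name__ == "__main__":
    print("kaggle credentials found:", kaggle_configured())
    print("wandb credentials found:", wandb_configured())
```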

When the packages are installed, you are ready to train diffusion models!

2. Use

2.1 Train

In order to train models, you can simply run the following command:

python cmd/train.py

By default, this command will train a score model in the time domain with the ecg dataset. In order to modify this behaviour, you can use hydra override syntax. The following hyperparameters can be modified to retrain all the models appearing in the paper:

Hyperparameter	Description	Values
fourier_transform	Whether or not to train a diffusion model in the frequency domain.	true, false
datamodule	Name of the dataset to use.	ecg, mimiciii, nasa, nasdaq, usdroughts
datamodule.subdataset	For the NASA dataset only. Selects between the charge and discharge subsets.	charge, discharge
datamodule.smoother_width	For the ECG dataset only. Width of the Gaussian kernel smoother applied in the frequency domain.	$\mathbb{R}^+$
score_model	The backbone to use for the score model.	default, lstm
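
To make the override syntax concrete, the sketch below builds one `python cmd/train.py` command per combination of the values listed in the table. The full grid is an assumption: the table does not say which combinations appear in the paper, and `datamodule.smoother_width` is left at its default. The function name `build_train_commands` is hypothetical.

```python
from itertools import product

# Values copied from the hyperparameter table above.
FOURIER = ["false", "true"]
DATAMODULES = ["ecg", "mimiciii", "nasa", "nasdaq", "usdroughts"]
SCORE_MODELS = ["default", "lstm"]
NASA_SUBSETS = ["charge", "discharge"]


def build_train_commands():
    """Return one command string per configuration in the grid."""
    commands = []
    for fourier, data, model in product(FOURIER, DATAMODULES, SCORE_MODELS):
        base = [
            "python",
            "cmd/train.py",
            f"fourier_transform={fourier}",
            f"datamodule={data}",
            f"score_model={model}",
        ]
        if data == "nasa":
            # The subdataset override only applies to the NASA dataset.
            for subset in NASA_SUBSETS:
                commands.append(" ".join(base + [f"datamodule.subdataset={subset}"]))
        else:
            commands.append(" ".join(base))
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for command in build_train_commands():
        print(command)
```

Each printed line can be pasted into a shell as-is; for instance, one of them trains an LSTM score model in the frequency domain on the NASA discharge subset.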

At the end of training, your model is stored in the lightning_logs directory, in a folder named after the current run_id. You can find the run_id in the logs of the training and in the wandb dashboard if you have correctly configured wandb.

2.2 Sample

In order to sample from a trained model, you can simply run the following command:

python cmd/sample.py model_id=XYZ

where XYZ is the run_id of the model you want to sample from. At the end of sampling, the samples are stored in the lightning_logs directory, in a folder named after the current run_id.
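
To check what a run produced, one can list the contents of its folder. The sketch below assumes only the layout described above (a folder named after the run_id inside lightning_logs); the helper names are hypothetical, and the files a run actually writes are not specified here.

```python
from pathlib import Path


def run_directory(run_id, log_root="lightning_logs"):
    """Path of the folder expected to hold the outputs of a given run."""
    return Path(log_root) / run_id


def list_run_files(run_id, log_root="lightning_logs"):
    """Return the sorted relative paths of all files under the run folder.

    Returns an empty list if the folder does not exist.
    """
    root = run_directory(run_id, log_root)
    if not root.is_dir():
        return []
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # "XYZ" is the placeholder run_id used in the command above.
    for name in list_run_files("XYZ"):
        print(name)
```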

You can then reproduce the plots in the paper by adding the run_id to the run_list list in this notebook and running all cells.

3. Contribute

If you wish to contribute, please make sure that your code is compliant with our tests and coding conventions. To do so, you should install the required testing packages with:

pip install -e ".[test]"

Then, you can run the tests with:

pytest

Before any commit, please make sure that your staged code is compliant with our coding conventions by running:

pre-commit

4. Cite us

If you use this code, please acknowledge our work by citing

@misc{crabbé2024time,
      title={Time Series Diffusion in the Frequency Domain}, 
      author={Jonathan Crabbé and Nicolas Huynh and Jan Stanczuk and Mihaela van der Schaar},
      year={2024},
      eprint={2402.05933},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Development Status
- 3 - Alpha
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3.10
Topic
- Scientific/Engineering :: Artificial Intelligence

This version

0.1.0

Feb 9, 2024

Download files

Source Distribution

freqdiff-0.1.0.tar.gz (34.1 kB)

Uploaded Feb 9, 2024 Source

Hashes for freqdiff-0.1.0.tar.gz

Algorithm	Hash digest
SHA256	`9be5fafd56a15609341517c0edebbfc3c0117f993892d55d2b239e7ce1c4bfc2`
MD5	`0dd4fd4bd2d9dbeb966516ee8008870a`
BLAKE2b-256	`b63c6e4ce9461de0dec213f16eafa8558311cca7984bd2eb570e1972b7c45564`