
DyGyS is a package for Maximum Entropy regression models with gravity specification for undirected and directed network data. It can solve the models, generate the graph ensemble, compute several network statistics, calculate model selection measures such as AIC and BIC, and quantify how accurately the models reproduce topological and weighted properties.


License: GPLv3

DyGyS: DYadic GravitY regression models with Soft constraints

DyGyS is a package developed in Python 3 for Maximum Entropy regression models with gravity specification for undirected and directed network data.

DyGyS provides a large number of models, described in their undirected formulation in articles 1 and 2, comprising both econometric and statistical-physics-inspired models. The use of soft constraints enables the user to explicitly constrain network properties such as the number of links, the degree sequence (degree centrality for undirected networks and out-degree/in-degree centralities for directed networks), and the total weight (for a small number of viable models).

Furthermore, it is possible not only to solve the model and extract the parameters, but also to generate the ensemble, compute a number of network statistics, compute model selection measures such as AIC and BIC, and quantify the reproduction accuracy of:

  • Topology, using measures such as True Positive Rate, Specificity, Precision, Accuracy, Balanced Accuracy and F1 score;
  • Weights, measuring the fraction of weights inside the percentile confidence interval (CI) extracted from the ensemble of graphs;
  • Network statistics, measuring the fraction of nodes for which the network statistics are inside the chosen percentile CI extracted from the ensemble of graphs.

To explore Maximum-Entropy modeling on networks, check out Maximum Entropy Hub.

When using the module for your scientific research please consider citing:

    @article{PhysRevResearch.4.033105,
      title = {Gravity models of networks: Integrating maximum-entropy and econometric approaches},
      author = {Di Vece, Marzio and Garlaschelli, Diego and Squartini, Tiziano},
      journal = {Phys. Rev. Research},
      volume = {4},
      issue = {3},
      pages = {033105},
      numpages = {19},
      year = {2022},
      month = {Aug},
      publisher = {American Physical Society},
      doi = {10.1103/PhysRevResearch.4.033105},
      url = {https://link.aps.org/doi/10.1103/PhysRevResearch.4.033105}
    }

and

    @article{DIVECE2023112958,
      title = {Reconciling econometrics with continuous maximum-entropy network models},
      author = {Marzio {Di Vece} and Diego Garlaschelli and Tiziano Squartini},
      journal = {Chaos, Solitons \& Fractals},
      volume = {166},
      pages = {112958},
      year = {2023},
      issn = {0960-0779},
      doi = {10.1016/j.chaos.2022.112958},
      url = {https://www.sciencedirect.com/science/article/pii/S0960077922011377},
      keywords = {Shannon entropy, Network reconstruction, Econophysics, Econometrics, Trade, Gravity}
    }


Currently Available Models

DyGyS contains models for network data with both continuous and discrete-valued non-negative weights. The available models for discrete count data are described in article 1 and consist of:

  • POIS: Poisson Model
  • ZIP: Zero-Inflated Poisson Model
  • NB2: Negative Binomial Model
  • ZINB: Zero-Inflated Negative Binomial Model
  • L-CGeom: L-constrained Conditional Geometric Model, noted as TSF in the paper.
  • k-CGeom: k-constrained Conditional Geometric Model, noted as TS in the paper.
  • L-IGeom: L-constrained Integrated Geometric Model, noted as H(1) in the paper.
  • k-IGeom: k-constrained Integrated Geometric Model, noted as H(2) in the paper.

The analogous models for continuous-valued data are described in article 2 and consist of:

  • L-CExp: L-constrained Conditional Exponential Model, the L-constrained variant of C-Exp in the paper.
  • k-CExp: k-constrained Conditional Exponential Model, noted as CExp in the paper.
  • L-IExp: L-constrained Integrated Exponential Model, the L-constrained variant of I-Exp in the paper.
  • k-IExp: k-constrained Integrated Exponential Model, noted as IExp in the paper.
  • L-CGamma: L-constrained Conditional Gamma Model, the L-constrained variant of C-Gamma in the paper.
  • k-CGamma: k-constrained Conditional Gamma Model, noted as CGamma in the paper.
  • L-CPareto: L-constrained Conditional Pareto Model, the L-constrained variant of C-Pareto in the paper.
  • k-CPareto: k-constrained Conditional Pareto Model, noted as CPareto in the paper.
  • L-CLognormal: L-constrained Conditional Lognormal Model, the L-constrained variant of C-Lognormal in the paper.
  • k-CLognormal: k-constrained Conditional Lognormal Model, noted as CLognormal in the paper.

Please refer to the papers for further details.

Installation

DyGyS can be installed via pip. From your terminal, run:

    $ pip install DyGyS

If you have already installed the package and want to upgrade it, run:

    $ pip install DyGyS --upgrade

Dependencies

DyGyS uses the following dependencies:

  • scipy for optimization and root solving;
  • numba for fast computation of network statistics and criterion functions;
  • numba-scipy for fast computation of special functions such as gammaincinv and erfinv.

They can be installed via pip by typing:

    $ pip install scipy
    $ pip install numba
    $ pip install numba-scipy

How-to Guidelines

The module contains two classes, namely UndirectedGraph and DirectedGraph. An undirected graph is defined as a network where weights are reciprocal, i.e., $w_{ij} = w_{ji}$, where $w_{ij}$ is the network weight from node $i$ to node $j$. If weights are not reciprocal, please use the DirectedGraph class.
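
Before choosing a class it can help to check whether the weight matrix is actually symmetric. The following is a minimal sketch using NumPy; the helper name `is_reciprocal` and the tolerance are illustrative assumptions and are not part of DyGyS.

```python
import numpy as np


def is_reciprocal(W, tol=1e-12):
    """Return True if the square matrix W satisfies w_ij == w_ji within tol.

    Hypothetical helper for illustration; not a DyGyS function.
    """
    W = np.asarray(W, dtype=float)
    if W.ndim != 2 or W.shape[0] != W.shape[1]:
        raise ValueError("W must be a square 2-D array")
    return bool(np.allclose(W, W.T, rtol=0.0, atol=tol))


W_sym = np.array([[0.0, 2.0], [2.0, 0.0]])
W_dir = np.array([[0.0, 2.0], [3.0, 0.0]])

print(is_reciprocal(W_sym))  # reciprocal weights: UndirectedGraph applies
print(is_reciprocal(W_dir))  # non-reciprocal weights: use DirectedGraph
```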

Class Instance and Empirical Network Statistics

To initialize an UndirectedGraph or DirectedGraph instance you can type:

    G = UndirectedGraph(adjacency=Wij)

or

    G = DirectedGraph(adjacency=Wij)

where Wij is the weighted adjacency matrix in 1-D (dense) or 2-D numpy array format.

After initialization you can already explore core network statistics such as (out-)degree, in-degree, average neighbor degree, binary clustering coefficient, (out-)strength, in-strength, average neighbor strength and weighted clustering coefficient. These are available through the respective attributes:

    G.degree, G.degree_in, G.annd, G.clust, G.strength, G.strength_in, G.anns, G.clust_w
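
To make these quantities concrete, the sketch below computes degree, strength and average neighbor degree directly from a small weighted adjacency matrix with NumPy, using the standard textbook definitions. It is not DyGyS code, the data are made up, and the package's own conventions (for example for isolated nodes) may differ.

```python
import numpy as np

# Toy undirected weighted network with 3 nodes (hypothetical data).
W = np.array([[0.0, 2.0, 0.0],
              [2.0, 0.0, 5.0],
              [0.0, 5.0, 0.0]])

A = (W > 0).astype(float)   # binary adjacency matrix
degree = A.sum(axis=1)      # number of links per node: 1, 2, 1
strength = W.sum(axis=1)    # total weight per node: 2, 7, 5

# Average nearest-neighbor degree: mean degree of each node's neighbors.
# Isolated nodes (degree 0) are assigned 0 here, by assumption.
annd = np.divide(A @ degree, degree,
                 out=np.zeros_like(degree), where=degree > 0)

print(degree, strength, annd)
```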

Solving the models

You can explore the currently available models using

    G.implemented_models

Use the model names exactly as they appear in this list to avoid error messages.

To solve the models you need to define a regressor matrix $X_w$ of dimension $N_{obs} \times k$, where $N_{obs} = N^2$ is the number of observations (the square of the number of nodes $N$) and $k$ is the number of exogenous variables introduced in the gravity specification. For L-constrained Conditional models and Zero-Inflated models you must also define a regressor matrix $X_t$ for the first-stage (or topological) optimization, and you can choose to fix some of the first-stage parameters.
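
As an illustration of the $N_{obs} \times k$ shape, the sketch below builds a two-column regressor matrix from node GDPs and pairwise distances, as in a typical gravity specification. The function name, the toy data, the row-major ordering of node pairs and the placeholder used on the diagonal are all assumptions of this example; check the documentation for the layout DyGyS actually expects.

```python
import numpy as np


def build_gravity_regressors(gdp, distance):
    """Stack log(GDP_i * GDP_j) and log(d_ij) into an (N*N, 2) matrix.

    Hypothetical helper for illustration; not a DyGyS function.
    """
    gdp = np.asarray(gdp, dtype=float)
    distance = np.asarray(distance, dtype=float)
    n = gdp.size
    log_gdp = np.log(gdp)
    log_gdp_pair = log_gdp[:, None] + log_gdp[None, :]  # entry (i, j)

    # log(d_ij) is undefined on the diagonal (d_ii = 0); it is set to 0
    # here as a placeholder, which is an assumption of this sketch.
    log_dist = np.zeros((n, n))
    off_diag = ~np.eye(n, dtype=bool)
    log_dist[off_diag] = np.log(distance[off_diag])

    # Row-major flattening: the pair (i, j) ends up in row i * n + j.
    return np.column_stack([log_gdp_pair.ravel(), log_dist.ravel()])


gdp = [1.0, 2.0, 4.0]
distance = [[0.0, 10.0, 20.0],
            [10.0, 0.0, 5.0],
            [20.0, 5.0, 0.0]]
X_w = build_gravity_regressors(gdp, distance)
print(X_w.shape)  # N_obs = 3**2 = 9 rows, k = 2 columns
```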

When ready, you can choose one of the aforementioned models and solve for its parameters using

    G.solve(model=<chosen model>, exogenous_variables=X_w, selection_variables=X_t,
            fixed_selection_parameters=<chosen fixed selection parameters>)

Once the model is solved, several other attributes become available and the measures that depend solely on the criterion functions are computed. These include the log-likelihood, the Jacobian, the infinity norm of the Jacobian, AIC, binary AIC and BIC, available through the attributes:

    G.ll, G.jacobian, G.norm, G.aic, G.aic_binary, G.bic
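
For reference, the textbook definitions of AIC and BIC in terms of the maximized log-likelihood can be written as below. This is a sketch of the standard formulas only; how DyGyS counts parameters and observations internally is not stated here, so the arguments `n_params` and `n_obs` are assumptions of the example.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical values: log-likelihood -100 with 3 parameters, 100 observations.
print(aic(-100.0, 3))
print(bic(-100.0, 3, 100))
```

Lower values indicate a better trade-off between goodness of fit and model complexity.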

For further details on the .solve method please see the documentation.

Generating the network ensemble

To generate the network ensemble, type:

    G.gen_ensemble(n_ensemble=<wanted number of graphs>)

The graphs are produced using the "default_rng" method for discrete-valued models or using Inverse Transform Sampling for continuous-valued models.
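
Inverse Transform Sampling draws a uniform number $u$ and maps it through the inverse cumulative distribution function. The sketch below shows the idea for an exponential distribution, whose inverse CDF is $-\ln(1-u)/\lambda$. It uses only the Python standard library; the rate value and function name are illustrative assumptions, and DyGyS's own sampler may be implemented differently.

```python
import math
import random


def sample_exponential(rate, rng):
    """Draw one exponential variate by inverting the CDF F(x) = 1 - exp(-rate * x)."""
    u = rng.random()                   # uniform in [0, 1)
    return -math.log(1.0 - u) / rate   # 1 - u lies in (0, 1], so the log is finite


rng = random.Random(42)                # fixed seed for reproducibility
samples = [sample_exponential(2.0, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean)                     # should be close to 1 / rate = 0.5
```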

This method stores its result in the attribute

    G.w_ensemble_matrix

which is an $N_{obs} \times N_{ensemble}$ matrix containing all of the $N_{ensemble}$ adjacency matrices in the ensemble. The method behaves well for networks of up to $N=200$ nodes with $10^{4}$ ensemble graphs. No test has been done for larger networks, where the size of G.w_ensemble_matrix could be limited by the available RAM.
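
A quick back-of-the-envelope estimate shows why memory becomes the bottleneck. The sketch below assumes 8 bytes per entry (64-bit floats), which is an assumption about the storage type rather than a documented property of DyGyS.

```python
def ensemble_matrix_bytes(n_nodes, n_ensemble, bytes_per_entry=8):
    """Size of an (N**2 x n_ensemble) matrix, assuming a fixed entry size."""
    return n_nodes ** 2 * n_ensemble * bytes_per_entry


size_bytes = ensemble_matrix_bytes(200, 10_000)
print(f"{size_bytes / 1e9:.1f} GB")  # 3.2 GB
```

Since the size grows with $N^2$, doubling the number of nodes at fixed ensemble size quadruples the memory needed.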

Computing relevant measures

Let's start by showing how to compute topology-related measures. You can type:

    G.classification_measures(n_ensemble=<wanted number of graphs>, percentiles=(inf_p, sup_p), stats=[<list of wanted statistics>])

This method does not need G.w_ensemble_matrix, so you can use it without generating the ensemble of weighted networks. The statistics you can compute are listed in G.implemented_classifier_statistics. Once you define the number of networks, the ensemble percentiles and the statistics of interest, it makes the following attributes available:

    G.avg_*, G.std_*, G.percentiles_*, G.array_*

where "avg" stands for the ensemble average, "std" for the ensemble standard deviation, "array" for the values of the measure on each ensemble graph, "percentiles" is a tuple containing the inf_p-percentile (default 2.5) and sup_p-percentile (default 97.5) in the ensemble, and * is the statistic of interest, written as in G.implemented_classifier_statistics.
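
The topological measures are standard confusion-matrix scores that compare the observed links with the links of a generated graph. A minimal pure-Python sketch for a single pair of binary adjacency matrices follows; the function name, the dictionary keys and the choice to skip the diagonal are assumptions of this example, and DyGyS computes these quantities over the whole ensemble with its own implementation.

```python
def confusion_scores(observed, predicted):
    """Score a predicted binary adjacency matrix against the observed one.

    Both arguments are square lists of lists with 0/1 entries.
    Self-loops (the diagonal) are skipped, by assumption.
    """
    tp = fp = tn = fn = 0
    n = len(observed)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            a, p = observed[i][j], predicted[i][j]
            if a and p:
                tp += 1
            elif p:
                fp += 1
            elif a:
                fn += 1
            else:
                tn += 1

    def ratio(num, den):
        return num / den if den else 0.0  # guard against empty classes

    tpr = ratio(tp, tp + fn)   # true positive rate
    spc = ratio(tn, tn + fp)   # specificity
    ppv = ratio(tp, tp + fp)   # precision
    return {
        "tpr": tpr,
        "spc": spc,
        "ppv": ppv,
        "acc": ratio(tp + tn, tp + tn + fp + fn),
        "bacc": (tpr + spc) / 2,
        "f1": ratio(2 * ppv * tpr, ppv + tpr),
    }


observed = [[0, 1, 0], [1, 0, 1], [0, 0, 0]]
predicted = [[0, 1, 1], [1, 0, 0], [0, 0, 0]]
scores = confusion_scores(observed, predicted)
print(scores)
```

In this toy case there are 2 true positives, 1 false positive, 1 false negative and 2 true negatives, so every score equals 2/3.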

To compute network statistics you can type:

    G.netstats_measures(percentiles=(inf_p, sup_p), stats=[<list of wanted statistics>])

This method requires G.w_ensemble_matrix to have been computed beforehand. It computes the average, standard deviation, percentiles and ensemble arrays of the network statistics of interest, which are listed in G.implemented_network_statistics. The results are available as:

    G.avg_*, G.std_*, G.percentiles_*, G.array_*
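
The ensemble summaries can be pictured as simple reductions over the ensemble axis. The NumPy sketch below uses a made-up array with one row per node and one column per ensemble graph; this layout and the variable names are assumptions of the example, not the documented shape of the G.array_* attributes.

```python
import numpy as np

# Hypothetical statistic measured on 2 nodes across 5 ensemble graphs.
array_stat = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                       [10.0, 10.0, 10.0, 10.0, 10.0]])

avg_stat = array_stat.mean(axis=1)   # ensemble average per node
std_stat = array_stat.std(axis=1)    # ensemble standard deviation per node

# Lower and upper percentiles per node (NumPy's default linear interpolation).
inf_p, sup_p = 2.5, 97.5
low, high = np.percentile(array_stat, [inf_p, sup_p], axis=1)

print(avg_stat, std_stat, low, high)
```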

To compute the reproduction accuracy for the network statistics (introduced in article 2) you can type:

    G.reproduction_accuracy_s(percentiles=(inf_p, sup_p), stats=[<list of wanted statistics>])

This method requires G.w_ensemble_matrix to have been computed beforehand. It computes the fraction of nodes for which the network measure is inside a percentile CI extracted from the graph ensemble. The result is stored in

    G.RA_s

i.e., a list of reproduction accuracies, one for each network statistic passed in the stats list, in the same order.

Finally, you can compute the reproduction accuracy for the weights (introduced in article 2) using:

    G.reproduction_accuracy_w(percentiles=(inf_p, sup_p))

This method requires G.w_ensemble_matrix to have been computed beforehand. It computes the fraction of empirical weights that fall inside the CI delimited by the inf_p-percentile and sup_p-percentile extracted from the graph ensemble, and stores it as the attribute

    G.RA_w
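
The reproduction accuracy described above amounts to a coverage check: for each entry, test whether the empirical value lies between the two ensemble percentiles, then average. The NumPy sketch below shows this on made-up data; the function name, the inclusive bounds and the row-per-observation layout are assumptions, and DyGyS may restrict the computation differently (for example to a subset of the weights).

```python
import numpy as np


def reproduction_accuracy(empirical, ensemble, percentiles=(2.5, 97.5)):
    """Fraction of empirical values inside the ensemble percentile interval.

    empirical: shape (n_obs,); ensemble: shape (n_obs, n_ensemble).
    Hypothetical helper for illustration; not a DyGyS function.
    """
    empirical = np.asarray(empirical, dtype=float)
    ensemble = np.asarray(ensemble, dtype=float)
    low, high = np.percentile(ensemble, percentiles, axis=1)
    inside = (empirical >= low) & (empirical <= high)
    return float(inside.mean())


# Three observations, five ensemble draws each (hypothetical data).
ensemble = [[1, 2, 3, 4, 5],
            [10, 10, 10, 10, 10],
            [0, 0, 1, 1, 2]]
empirical = [3, 12, 1]
ra_w = reproduction_accuracy(empirical, ensemble)
print(ra_w)  # two of the three empirical values fall inside their interval
```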

Documentation

The complete documentation of the DyGyS library is available in the project documentation.

Credits

Author:

Marzio Di Vece (a.k.a. MarsMDK)

Acknowledgments: The module was developed under the supervision of Diego Garlaschelli and Tiziano Squartini. It was developed at IMT School for Advanced Studies Lucca and supported by the Italian ‘Programma di Attività Integrata’ (PAI) project ‘Prosociality, Cognition and Peer Effects’ (Pro.Co.P.E.), funded by IMT School for Advanced Studies.
