A Python library for knowledge graph representation learning (graph embedding)
Emgraph
Emgraph is a Python toolkit for graph embedding.
| Number | Algorithm |
|---|---|
| 1 | TransE |
| 2 | ComplEx |
| 3 | HolE |
| 4 | DistMult |
| 5 | ConvE |
| 6 | ConvKB |
| 7 | RandomBaseline |
Installation
Install the latest version of Emgraph:
$ pip install emgraph
Documentation
Documentation is coming soon.
Simple example
Embedding the WordNet11 (WN11) graph using the TransE model:
from sklearn.metrics import brier_score_loss
from scipy.special import expit
from emgraph.datasets import BaseDataset, DatasetType
from emgraph.models import TransE


def train_transe():
    # Load the WN11 dataset (train and test splits plus test labels)
    X = BaseDataset.load_dataset(DatasetType.WN11)
    model = TransE(batches_count=64, seed=0, epochs=20, k=100, eta=20,
                   optimizer='adam', optimizer_params={'lr': 0.0001},
                   loss='pairwise', verbose=True, large_graphs=False)
    model.fit(X['train'])
    # Score the test triples, then map scores to probabilities with expit
    scores = model.predict(X['test'])
    print("Scores: ", scores)
    print("Brier score loss:", brier_score_loss(X['test_labels'], expit(scores)))


# Executing the function
if __name__ == '__main__':
    train_transe()
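To make the last step of the example concrete, here is a minimal plain-Python sketch of what `expit` and `brier_score_loss` compute: `expit` is the logistic sigmoid, and the Brier score is the mean squared difference between predicted probabilities and 0/1 labels. The scores and labels below are made-up illustrative values, not Emgraph output, and the helper names `sigmoid` and `brier_score` are hypothetical stand-ins for the SciPy and scikit-learn functions.

```python
import math


def sigmoid(x):
    """Logistic sigmoid; the same mapping as scipy.special.expit."""
    return 1.0 / (1.0 + math.exp(-x))


def brier_score(labels, probabilities):
    """Mean squared difference between 0/1 labels and predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(labels, probabilities)) / len(labels)


# Made-up raw model scores and ground-truth labels, for illustration only.
raw_scores = [2.0, -1.0, 0.0, -3.0]
labels = [1, 0, 1, 0]

# Squash each raw score into (0, 1) so it can be read as a probability.
probabilities = [sigmoid(s) for s in raw_scores]
print("Probabilities:", [round(p, 3) for p in probabilities])
print("Brier score:", round(brier_score(labels, probabilities), 4))
```

A lower Brier score means the probabilities are closer to the true labels; a score of 0 corresponds to perfectly confident, correct predictions.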
Evaluating the ComplEx model after training:
import numpy as np
from emgraph.datasets import BaseDataset, DatasetType
from emgraph.models import ComplEx
from emgraph.evaluation import evaluate_performance


def complex_performance():
    X = BaseDataset.load_dataset(DatasetType.WN18)
    model = ComplEx(batches_count=10, seed=0, epochs=20, k=150, eta=1,
                    loss='nll', optimizer='adam')
    # Train on the union of the training and validation triples
    model.fit(np.concatenate((X['train'], X['valid'])))
    # Known true triples, filtered out of the corruptions during ranking
    filter_triples = np.concatenate((X['train'], X['valid'], X['test']))
    # Rank only the first five test triples to keep the example fast
    ranks = evaluate_performance(X['test'][:5], model=model,
                                 filter_triples=filter_triples,
                                 corrupt_side='s+o',
                                 use_default_protocol=False)
    return ranks


# Executing the function
if __name__ == '__main__':
    ranks = complex_performance()
    print("ranks {}".format(ranks))
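Raw ranks are usually summarized with aggregate metrics. The sketch below computes three common ones (mean rank, mean reciprocal rank, and Hits@N) in plain Python. It assumes `ranks` is a flat sequence of positive integers where 1 is the best rank; the helper names are hypothetical and are not taken from Emgraph's API, and the example ranks are made up.

```python
def mean_rank(ranks):
    """Average rank of the true triples (lower is better)."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank (higher is better, maximum 1.0)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_n(ranks, n):
    """Fraction of true triples ranked within the top n."""
    return sum(1 for r in ranks if r <= n) / len(ranks)


# Made-up ranks for five test triples, for illustration only.
example_ranks = [1, 3, 2, 10, 50]

print("Mean rank:", mean_rank(example_ranks))
print("MRR:", round(mean_reciprocal_rank(example_ranks), 4))
print("Hits@3:", hits_at_n(example_ranks, 3))
print("Hits@10:", hits_at_n(example_ranks, 10))
```

Mean rank is sensitive to a few badly ranked triples, which is why MRR and Hits@N are often reported alongside it.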
Call for Contributions
The Emgraph project welcomes your expertise and enthusiasm!
Ways to contribute to Emgraph:
- Write code
- Review pull requests
- Develop tutorials, presentations, and other educational materials
- Translate the documentation and README contents
Issues
If you encounter any issue in the code, please report it here. Better still, fork the repository on GitHub and open a pull request with a fix.
More examples
Embedding the WordNet11 (WN11) graph using the DistMult model:
from sklearn.metrics import brier_score_loss
from scipy.special import expit
from emgraph.datasets import BaseDataset, DatasetType
from emgraph.models import DistMult


def train_dist_mult():
    X = BaseDataset.load_dataset(DatasetType.WN11)
    # Pairwise (margin-based) loss with a margin of 5
    model = DistMult(batches_count=1, seed=555, epochs=20, k=10, loss='pairwise',
                     loss_params={'margin': 5})
    model.fit(X['train'])
    # Score the test triples, then map scores to probabilities with expit
    scores = model.predict(X['test'])
    print("Scores: ", scores)
    print("Brier score loss:", brier_score_loss(X['test_labels'], expit(scores)))


# Executing the function
if __name__ == '__main__':
    train_dist_mult()
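For intuition about what this configuration optimizes, the following plain-Python sketch shows the DistMult scoring function (a trilinear product of subject, relation, and object embeddings) together with a margin-based pairwise loss. It assumes the standard textbook formulations; Emgraph's internal implementation may differ in details, and the function names and toy embeddings here are hypothetical.

```python
def distmult_score(subject, relation, obj):
    """DistMult score: sum over dimensions of s_i * r_i * o_i."""
    return sum(s * r * o for s, r, o in zip(subject, relation, obj))


def pairwise_margin_loss(positive_score, negative_score, margin=5.0):
    """Hinge loss that is zero once the positive triple outscores
    the negative (corrupted) triple by at least `margin`."""
    return max(0.0, margin + negative_score - positive_score)


# Toy 3-dimensional embeddings (k=3), made up for illustration only.
subject = [1.0, 2.0, 0.5]
relation = [0.5, 1.0, 2.0]
true_object = [2.0, 1.0, 1.0]
corrupted_object = [0.0, -1.0, 1.0]

positive = distmult_score(subject, relation, true_object)
negative = distmult_score(subject, relation, corrupted_object)

print("Positive score:", positive)
print("Negative score:", negative)
print("Pairwise loss:", pairwise_margin_loss(positive, negative))
```

Because the product is symmetric in the subject and object embeddings, DistMult assigns the same score to a triple and its reverse, which is one reason models such as ComplEx are used for asymmetric relations.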
Future work
- Modularize the functions
- Add more algorithms
- Run on CUDA cores
- Speed up computation through vectorization and similar optimizations
- Add more preprocessors
- Add dataset, graph, and dataframe manipulations
- Unify and reconstruct the architecture and eliminate redundancy
If you find Emgraph helpful, please give us a :star:.
License
Released under the BSD license.