GraphMLDatasets

GraphMLDatasets.Cora
GraphMLDatasets.OGBNArxiv
GraphMLDatasets.OGBNProducts
GraphMLDatasets.OGBNProteins
GraphMLDatasets.PPI
GraphMLDatasets.Planetoid
GraphMLDatasets.QM7b
GraphMLDatasets.Reddit
GraphMLDatasets.alldata
GraphMLDatasets.edge_features
GraphMLDatasets.graphdata
GraphMLDatasets.metadata
GraphMLDatasets.node_features
GraphMLDatasets.node_labels
GraphMLDatasets.rawdata
GraphMLDatasets.test_indices
GraphMLDatasets.testdata
GraphMLDatasets.train_indices
GraphMLDatasets.traindata
GraphMLDatasets.valid_indices
GraphMLDatasets.validdata

Usage

using GraphMLDatasets

# Planetoid datasets are selected by name
graph = graphdata(Planetoid(), :cora)
train_X, train_y = traindata(Planetoid(), :cora)
test_X, test_y = testdata(Planetoid(), :cora)

# OGB datasets
graph = graphdata(OGBNProteins())
ef = edge_features(OGBNProteins())
nl = node_labels(OGBNProteins())

APIs

GraphMLDatasets.traindata — Function

traindata(dataset)

Returns training data for dataset.

GraphMLDatasets.validdata — Function

validdata(dataset)

Returns validation data for dataset.

GraphMLDatasets.testdata — Function

testdata(dataset)

Returns testing data for dataset.

GraphMLDatasets.train_indices — Function

train_indices(dataset)

Returns indices of training data for dataset.

GraphMLDatasets.valid_indices — Function

valid_indices(dataset)

Returns indices of validation data for dataset.

GraphMLDatasets.test_indices — Function

test_indices(dataset)

Returns indices of testing data for dataset.

GraphMLDatasets.graphdata — Function

graphdata(dataset)

Returns the graph for dataset in the form of JuliaGraphs objects.

GraphMLDatasets.rawdata — Function

rawdata(dataset)

Returns the raw data for dataset.

GraphMLDatasets.alldata — Function

alldata(dataset)

Returns the whole dataset for dataset.

GraphMLDatasets.metadata — Function

metadata(dataset)

Returns the auxiliary data about dataset.

GraphMLDatasets.node_features — Function

node_features(dataset)

Returns all the node features for dataset.

GraphMLDatasets.edge_features — Function

edge_features(dataset)

Returns all the edge features for dataset.

GraphMLDatasets.node_labels — Function

node_labels(dataset)

Returns all the node labels for dataset.
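
The index accessors pair naturally with node_features and node_labels: each split is obtained by indexing the full arrays with the corresponding index vector. The following is a minimal sketch that uses small stand-in arrays instead of a downloaded dataset. The features × nodes layout, the array sizes, and the helper name take_split are assumptions made for illustration and are not part of the package.

```julia
# Stand-in data. With the package, these would come from calls such as
# node_features(dataset), node_labels(dataset), and train_indices(dataset).
num_nodes = 10
X = reshape(Float32.(1:3num_nodes), 3, num_nodes)  # ASSUMED layout: features × nodes
y = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]                 # one class label per node

train_idx = [1, 2, 3, 4, 5, 6]
valid_idx = [7, 8]
test_idx  = [9, 10]

# Hypothetical helper: select the columns (nodes) and labels of one split.
take_split(X, y, idx) = (X[:, idx], y[idx])

train_X, train_y = take_split(X, y, train_idx)
valid_X, valid_y = take_split(X, y, valid_idx)
test_X,  test_y  = take_split(X, y, test_idx)

println(size(train_X), " ", size(valid_X), " ", size(test_X))
```

If the real feature matrix is stored as nodes × features instead, the indexing becomes X[idx, :].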

Available datasets

Planetoid dataset

GraphMLDatasets.Planetoid — Type

Planetoid()

The Planetoid dataset contains three citation networks: Cora, CiteSeer, and PubMed. Nodes represent documents and edges represent citation links.

Implements: graphdata, traindata, testdata, alldata, rawdata, metadata

Cora dataset

GraphMLDatasets.Cora — Type

Cora()

The Cora dataset contains the full Cora citation network. Nodes represent documents and edges represent citation links.

Implements: graphdata, alldata, rawdata, metadata

PPI dataset

GraphMLDatasets.PPI — Type

PPI()

The PPI dataset contains protein-protein interaction networks. Nodes represent proteins, and edges indicate that two proteins interact with each other. Positional gene sets, motif gene sets, and immunological signatures are used as features (50 in total), and gene ontology sets are used as labels (121 in total).

Implements: traindata, validdata, testdata

Reddit dataset

GraphMLDatasets.Reddit — Type

Reddit()

The Reddit dataset contains a network of Reddit posts. Reddit is a large online discussion forum where users post and comment in communities; this dataset covers posts from 50 of them. Nodes represent posts, and an edge connects two posts if the same user comments on both. The task is to predict the community that each post belongs to.

Implements: graphdata, alldata, rawdata, metadata

QM7b dataset

GraphMLDatasets.QM7b — Type

QM7b()

The QM7b dataset contains molecular structure graphs and is a subset of the GDB-13 database. It contains stable and synthetically accessible organic molecules. Nodes represent atoms in a molecule, and edges represent chemical bonds between atoms. The 3D Cartesian coordinates of the stable conformation are given as features. The task is to predict electronic properties. The dataset contains 7,211 molecules with 14 regression targets.

Implements: rawdata

OGB Node Property Prediction

OGBNProteins dataset

GraphMLDatasets.OGBNProteins — Type

OGBNProteins()

The OGBNProteins dataset contains a protein-protein interaction network. The task is to predict the presence of protein functions, a multi-label binary classification problem. Training/validation/test splits are given by node indices.

Description

Graph: undirected, weighted, and typed (according to species) graph.
Node: proteins.
Edge: different types of biologically meaningful associations between proteins, e.g., physical interactions, co-expression or homology.

References

Damian Szklarczyk, Annika L Gable, David Lyon, Alexander Junge, Stefan Wyder, Jaime Huerta-Cepas, Milan Simonovic, Nadezhda T Doncheva, John H Morris, Peer Bork, et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1):D607–D613, 2019.
Gene Ontology Consortium. The gene ontology resource: 20 years and still going strong. Nucleic Acids Research, 47(D1):D330–D338, 2018.

Implements: graphdata, train_indices, valid_indices, test_indices, edge_features, node_labels
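
In a multi-label binary classification task, every node carries one yes/no target per function, and each target is predicted independently. The sketch below illustrates this on a tiny hypothetical example; the 3 × 4 matrices, the 0.5 threshold, and the per-function accuracy metric are assumptions chosen for illustration and do not reflect the actual dataset shape or its official evaluation metric.

```julia
using Statistics: mean

# Hypothetical stand-in: 3 protein functions (rows) × 4 proteins (columns).
labels = Bool[1 0 1 0;
              0 0 1 1;
              1 1 0 0]

# Hypothetical model outputs: one independent score per (function, protein).
scores = [0.9 0.2 0.6 0.4;
          0.1 0.3 0.8 0.7;
          0.7 0.4 0.2 0.1]

# Each label is thresholded on its own; a protein may have several functions.
predictions = scores .> 0.5

# Fraction of proteins classified correctly, computed separately per function.
per_function_accuracy = vec(mean(predictions .== labels, dims=2))

println(per_function_accuracy)
```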

OGBNProducts dataset

GraphMLDatasets.OGBNProducts — Type

OGBNProducts()

The OGBNProducts dataset contains an Amazon product co-purchasing network. The task is to predict the category of a product, a multi-class classification problem. Training/validation/test splits are given by node indices.

Description

Graph: undirected and unweighted graph.
Node: products sold on Amazon.
Edge: two products are connected if they are purchased together.

References

http://manikvarma.org/downloads/XC/XMLRepository.html

Implements: graphdata, train_indices, valid_indices, test_indices, node_features, node_labels

OGBNArxiv dataset

GraphMLDatasets.OGBNArxiv — Type

OGBNArxiv()

The OGBNArxiv dataset contains the citation network between all Computer Science (CS) arXiv papers indexed by the Microsoft Academic Graph (MAG). The task is to predict the primary category of each arXiv paper among 40 subject areas, a multi-class classification problem. Training/validation/test splits are given by node indices.

Description

Graph: directed graph.
Node: arXiv paper.
Edge: each directed edge indicates that one paper cites another one.

References

Kuansan Wang, Zhihong Shen, Chiyuan Huang, Chieh-Han Wu, Yuxiao Dong, and Anshul Kanakia. Microsoft academic graph: When experts are not enough. Quantitative Science Studies, 1(1):396–413, 2020.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (NeurIPS), pp. 3111–3119, 2013.

Implements: graphdata, train_indices, valid_indices, test_indices, node_features, node_labels
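
Because the edges are directed, the two endpoints of an edge play different roles: one paper cites and the other is cited. The sketch below shows the distinction on a hypothetical four-paper edge list; the (citing, cited) tuple order and all variable names are assumptions for illustration and say nothing about how the package stores its graph.

```julia
# Hypothetical citation edges, written as (citing paper, cited paper).
edges = [(1, 2), (1, 3), (2, 3), (4, 3)]
num_papers = 4

citations_received = zeros(Int, num_papers)  # in-degree of each paper
references_made    = zeros(Int, num_papers)  # out-degree of each paper

for (citing, cited) in edges
    references_made[citing] += 1
    citations_received[cited] += 1
end

println("citations received: ", citations_received)
println("references made:    ", references_made)
```

In an undirected graph the two counts would coincide, which is why direction matters when building features such as citation counts.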
