GraphMLDatasets
GraphMLDatasets.Cora
GraphMLDatasets.OGBNArxiv
GraphMLDatasets.OGBNProducts
GraphMLDatasets.OGBNProteins
GraphMLDatasets.PPI
GraphMLDatasets.Planetoid
GraphMLDatasets.QM7b
GraphMLDatasets.Reddit
GraphMLDatasets.alldata
GraphMLDatasets.edge_features
GraphMLDatasets.graphdata
GraphMLDatasets.metadata
GraphMLDatasets.node_features
GraphMLDatasets.node_labels
GraphMLDatasets.rawdata
GraphMLDatasets.test_indices
GraphMLDatasets.testdata
GraphMLDatasets.train_indices
GraphMLDatasets.traindata
GraphMLDatasets.valid_indices
GraphMLDatasets.validdata
Usage
using GraphMLDatasets

# Planetoid datasets (here: Cora)
graph = graphdata(Planetoid(), :cora)
train_X, train_y = traindata(Planetoid(), :cora)
test_X, test_y = testdata(Planetoid(), :cora)

# OGB datasets
graph = graphdata(OGBNProteins())
ef = edge_features(OGBNProteins())
nl = node_labels(OGBNProteins())
APIs
GraphMLDatasets.traindata - Function
traindata(dataset)
Returns the training data of the given dataset.
GraphMLDatasets.validdata - Function
validdata(dataset)
Returns the validation data of the given dataset.
GraphMLDatasets.testdata - Function
testdata(dataset)
Returns the test data of the given dataset.
GraphMLDatasets.train_indices - Function
train_indices(dataset)
Returns the indices of the training split of the given dataset.
GraphMLDatasets.valid_indices - Function
valid_indices(dataset)
Returns the indices of the validation split of the given dataset.
GraphMLDatasets.test_indices - Function
test_indices(dataset)
Returns the indices of the test split of the given dataset.
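The split functions above return indices rather than the data itself. The sketch below, in plain Julia with toy arrays, shows one way such indices can be used to slice per-node features and labels. It assumes features are stored one column per node, labels one entry per node, and 1-based indices; the actual layout returned by node_features and node_labels may differ, and train_idx, valid_idx and test_idx are hypothetical stand-ins for the outputs of train_indices, valid_indices and test_indices.

```julia
# Toy stand-ins (hypothetical) for node_features / node_labels output.
# Assumed layout: one column per node (features × nodes), one label per node.
X = Float32[1 2 3 4 5 6;
            7 8 9 10 11 12]   # 2 features × 6 nodes
y = [0, 1, 1, 0, 2, 2]        # 6 node labels

# Hypothetical 1-based split indices, as the *_indices functions might return
train_idx = [1, 2, 3]
valid_idx = [4]
test_idx  = [5, 6]

# Select the nodes (columns) and labels belonging to each split
train_X, train_y = X[:, train_idx], y[train_idx]
valid_X, valid_y = X[:, valid_idx], y[valid_idx]
test_X,  test_y  = X[:, test_idx],  y[test_idx]

# A well-formed split never assigns the same node to two subsets
@assert isempty(intersect(train_idx, valid_idx))
@assert isempty(intersect(train_idx, test_idx))
@assert isempty(intersect(valid_idx, test_idx))
```

Storing one node per column keeps each node's features contiguous in memory, which matches Julia's column-major arrays.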
GraphMLDatasets.graphdata - Function
graphdata(dataset)
Returns the graph of the given dataset in the form of a JuliaGraphs object.
GraphMLDatasets.rawdata - Function
rawdata(dataset)
Returns the raw data of the given dataset.
GraphMLDatasets.alldata - Function
alldata(dataset)
Returns the complete data of the given dataset.
GraphMLDatasets.metadata - Function
metadata(dataset)
Returns auxiliary information (metadata) about the given dataset.
GraphMLDatasets.node_features - Function
node_features(dataset)
Returns all node features of the given dataset.
GraphMLDatasets.edge_features - Function
edge_features(dataset)
Returns all edge features of the given dataset.
GraphMLDatasets.node_labels - Function
node_labels(dataset)
Returns all node labels of the given dataset.
Available datasets
Planetoid dataset
GraphMLDatasets.Planetoid - Type
Planetoid()
The Planetoid dataset contains three citation networks: Cora, CiteSeer and PubMed. Nodes represent documents and edges represent citation links.
Implements: graphdata, traindata, testdata, alldata, rawdata, metadata
Cora dataset
GraphMLDatasets.Cora - Type
Cora()
The Cora dataset contains the full Cora citation network. Nodes represent documents and edges represent citation links.
PPI dataset
GraphMLDatasets.PPI - Type
PPI()
The PPI dataset contains protein-protein interaction networks. Nodes represent proteins, and an edge indicates that two proteins interact with each other. Node features are positional gene sets, motif gene sets and immunological signatures (50 in total), and node labels are gene ontology sets (121 in total).
Reddit dataset
GraphMLDatasets.Reddit - Type
Reddit()
The Reddit dataset contains a network of Reddit posts. Reddit is a large online discussion forum where users post and comment in communities; the dataset covers posts from 50 communities. Nodes represent posts, and an edge connects two posts if the same user commented on both. The task is to predict the community each post belongs to.
QM7b dataset
GraphMLDatasets.QM7b - Type
QM7b()
The QM7b dataset contains molecular structure graphs and is a subset of the GDB-13 database of stable and synthetically accessible organic molecules. Nodes represent atoms in a molecule, and an edge indicates a chemical bond between two atoms. The 3D Cartesian coordinates of the stable conformation are given as features. The task is to predict electronic properties: the dataset contains 7,211 molecules with 14 regression targets.
Implements: rawdata
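Because each molecule has several regression targets on different scales, it can help to measure the error separately for each target. The sketch below computes a per-target mean absolute error in plain Julia. The predictions and targets are hypothetical toy values, the layout (one row per molecule, one column per target) is an assumption rather than the format returned by rawdata, and only three of the 14 targets are shown.

```julia
# Hypothetical predictions and targets: one row per molecule, one column per target
# (QM7b has 14 targets; only 3 are used here to keep the example small)
y_true = [1.0 2.0 3.0;
          4.0 5.0 6.0]
y_pred = [1.5 2.0 2.0;
          4.5 5.0 7.0]

# Mean absolute error, averaged over molecules separately for each target (column)
mae_per_target = vec(sum(abs.(y_pred .- y_true); dims=1)) ./ size(y_true, 1)
```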
OGB Node Property Prediction
OGBNProteins dataset
GraphMLDatasets.OGBNProteins - Type
OGBNProteins()
The OGBNProteins dataset contains a protein-protein interaction network. The task is to predict the presence of protein functions as a multi-label binary classification problem. Training/validation/test splits are given by node indices.
Description
- Graph: undirected, weighted, and typed (according to species) graph.
- Node: proteins.
- Edge: different types of biologically meaningful associations between proteins, e.g., physical interactions, co-expression or homology.
References
- Damian Szklarczyk, Annika L Gable, David Lyon, Alexander Junge, Stefan Wyder, Jaime Huerta-Cepas, Milan Simonovic, Nadezhda T Doncheva, John H Morris, Peer Bork, et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Research, 47(D1):D607–D613, 2019.
- Gene Ontology Consortium. The gene ontology resource: 20 years and still going strong. Nucleic Acids Research, 47(D1):D330–D338, 2018.
Implements: graphdata, train_indices, valid_indices, test_indices, edge_features, node_labels
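In a multi-label binary task, each node carries a vector of independent yes/no labels, so a protein may have several functions at once. The sketch below, in plain Julia with hypothetical scores and labels, thresholds each label independently and reports a per-label accuracy. The one-row-per-label layout and the 0.5 threshold are assumptions; note that OGB itself evaluates ogbn-proteins with ROC-AUC, and accuracy is used here only to keep the sketch short.

```julia
# Hypothetical predicted probabilities: one row per label (task), one column per protein
scores = [0.9 0.2 0.7;
          0.1 0.6 0.4]
# Hypothetical ground truth with the same shape; a protein may have several labels
labels = Bool[1 0 1;
              0 1 1]

# Multi-label: each label is an independent yes/no decision, thresholded at 0.5
preds = scores .>= 0.5

# Fraction of proteins classified correctly, computed separately for each label (row)
per_label_acc = vec(sum(preds .== labels; dims=2)) ./ size(labels, 2)
```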
OGBNProducts dataset
GraphMLDatasets.OGBNProducts - Type
OGBNProducts()
The OGBNProducts dataset contains an Amazon product co-purchasing network. The task is to predict the category of a product as a multi-class classification problem. Training/validation/test splits are given by node indices.
Description
- Graph: undirected and unweighted graph.
- Node: products sold on Amazon.
- Edge: the two products are purchased together.
References
- http://manikvarma.org/downloads/XC/XMLRepository.html
Implements: graphdata, train_indices, valid_indices, test_indices, node_features, node_labels
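In contrast to the multi-label setting of OGBNProteins, a multi-class task assigns exactly one category to each node. The sketch below, in plain Julia, picks the highest-scoring class per node and computes accuracy, which is also the metric OGB uses for ogbn-products. The score matrix, its one-row-per-class layout and the 1-based class ids are hypothetical toy assumptions, not output of this package.

```julia
# Hypothetical class scores: one row per class, one column per product (node)
scores = [0.1 0.7 0.3;
          0.8 0.2 0.3;
          0.1 0.1 0.4]
y = [2, 3, 3]  # hypothetical true category of each node (1-based class ids)

# Multi-class: exactly one predicted category per node, the highest-scoring row
pred = [argmax(col) for col in eachcol(scores)]

# Fraction of nodes whose predicted category matches the true one
accuracy = count(pred .== y) / length(y)
```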
OGBNArxiv dataset
GraphMLDatasets.OGBNArxiv - Type
OGBNArxiv()
The OGBNArxiv dataset contains the citation network between all Computer Science (CS) arXiv papers indexed by the Microsoft Academic Graph (MAG). The task is to predict the primary category of each arXiv paper among 40 subject areas as a multi-class classification problem. Training/validation/test splits are given by node indices.
Description
- Graph: directed graph.
- Node: arXiv paper.
- Edge: each directed edge indicates that one paper cites another one.
References
- Kuansan Wang, Zhihong Shen, Chiyuan Huang, Chieh-Han Wu, Yuxiao Dong, and Anshul Kanakia. Microsoft academic graph: When experts are not enough. Quantitative Science Studies, 1(1):396–413, 2020.
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (NeurIPS), pp. 3111–3119, 2013.
Implements: graphdata, train_indices, valid_indices, test_indices, node_features, node_labels
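Because OGBNArxiv is a directed graph, the direction of an edge carries meaning: a paper's in-degree counts how often it is cited, while its out-degree counts how many references it makes. The sketch below derives both from a small edge list in plain Julia. The edge list is hypothetical, and the (citing, cited) orientation is an assumption based on the description above, not a statement about how graphdata stores edges.

```julia
# Hypothetical directed edge list over 4 papers; each pair is (citing, cited)
edges = [(1, 2), (1, 3), (2, 3), (4, 3)]
n = 4  # number of papers (nodes)

indeg = zeros(Int, n)    # times each paper is cited
outdeg = zeros(Int, n)   # references each paper makes
for (src, dst) in edges
    outdeg[src] += 1
    indeg[dst] += 1
end
```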