
Commit d8bc455

Merge pull request #254 from FluxML/examples

Fix GCN examples

2 parents 27c2ce4 + 38af46c

25 files changed: +761 -216 lines

Project.toml

Lines changed: 4 additions & 4 deletions

```diff
@@ -27,13 +27,13 @@ Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
 CUDA = "3"
 ChainRulesCore = "1.7"
 DataStructures = "0.18"
-FillArrays = "0.12"
+FillArrays = "0.12 - 0.13"
 Flux = "0.12"
 GraphMLDatasets = "0.1"
 GraphSignals = "0.3"
-Graphs = "1.4"
-NNlib = "0.7"
-NNlibCUDA = "0.1"
+Graphs = "1"
+NNlib = "0.7 - 0.8"
+NNlibCUDA = "0.1 - 0.2"
 Reexport = "1.1"
 Word2Vec = "0.5"
 Zygote = "0.6"
```

docs/make.jl

Lines changed: 5 additions & 0 deletions

```diff
@@ -21,6 +21,11 @@ makedocs(
          "Building layers" => "basics/layers.md",
          "Graph passing" => "basics/passgraph.md"],
      "Cooperate with Flux layers" => "cooperate.md",
+     "Tutorials" =>
+         [
+             "Semi-supervised learning with GCN" => "tutorials/semisupervised_gcn.md",
+             "GCN with Fixed Graph" => "tutorials/gcn_fixed_graph.md",
+         ],
      "Abstractions" =>
          ["Message passing scheme" => "abstractions/msgpass.md",
           "Graph network block" => "abstractions/gn.md"],
```

docs/src/tutorials/gcn_fixed_graph.md

Lines changed: 109 additions & 0 deletions
# GCN with Fixed Graph

In the semi-supervised learning tutorial, variable graphs are fed to the GNN through `FeaturedGraph` objects, each of which bundles a graph with its node features. Different `FeaturedGraph` objects can carry different graphs and different node features, and all of them can be trained on the same GNN model. However, a variable graph is not stored in the form a GNN layer actually requires, so it has to be converted on every call, which makes training and inference inefficient. When the graph structure stays the same, GeometricFlux provides a fixed-graph strategy for training a GNN model against one shared graph.

## Fixed Graph

A fixed graph is bound to a layer with `WithGraph`. `WithGraph` wraps a `FeaturedGraph` object and a GNN layer as its first and second arguments, respectively.

```julia
fg = FeaturedGraph(graph)
WithGraph(fg, GCNConv(1024=>256, relu))
```

This way, we can bind a different graph to each layer, and the layer specializes the graph into whatever form it requires. For example, a `GCNConv` layer needs the graph as a normalized adjacency matrix, so once the graph is bound to a `GCNConv` layer, it is transformed into a normalized adjacency matrix and stored inside the `WithGraph` object. This speeds up training and inference because the transformation is not recomputed on every call. Note that any node features held by the `FeaturedGraph` inside a `WithGraph` are not used for training or inference.

## Array in, Array out

With this approach, a GNN layer wrapped in `WithGraph` works on plain feature arrays: it takes an array as input and returns an array, just like a regular deep learning layer.
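
Below is a minimal sketch of this array-in, array-out behavior; the 3-node adjacency matrix, the feature sizes, and the random input are made up purely for illustration.

```julia
using GeometricFlux, GraphSignals, Flux

# toy fully-connected graph on 3 nodes, given as an adjacency matrix
A = [0 1 1; 1 0 1; 1 1 0]
fg = FeaturedGraph(A, :adjm)

layer = WithGraph(fg, GCNConv(1024=>256, relu))

X = rand(Float32, 1024, 3)  # feature array: input_dim × num_nodes
Y = layer(X)                # output array:  256 × num_nodes
```
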
## Batch Learning

Since the features are plain arrays, they can be stacked along a third dimension and batched up for batch learning; a short sketch of the batched shapes follows, and the remaining steps demonstrate the full workflow.
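
Reusing the hypothetical `layer` from the sketch above, a batch is just a trailing batch dimension on the feature array (the batch size here is arbitrary):

```julia
Xb = rand(Float32, 1024, 3, 8)  # input_dim × num_nodes × batch_size
Yb = layer(Xb)                  # 256 × num_nodes × batch_size
```
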
## Step 1: Load Dataset

Unlike the semi-supervised learning example, here we use `alldata` to load the dataset for supervised learning, and pass `padding=true` so that the features of the labeled nodes are padded out to the full node set. The padded feature matrix contains zeros for the nodes that are not meant to be trained on. Here `dataset` is the sub-dataset symbol, e.g. `:cora`.

```julia
train_X, train_y = map(x -> Matrix(x), alldata(Planetoid(), dataset, padding=true))
```

We need the graph and the node indices for training as well.

```julia
g = graphdata(Planetoid(), dataset)
train_idx = 1:size(train_X, 2)
```
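
As a quick sanity check of the padded shapes (assuming, as in the semi-supervised tutorial, that `graphdata` returns a Graphs.jl graph, so `nv` gives its number of nodes):

```julia
using Graphs: nv

@assert size(train_X, 2) == nv(g)  # padded features cover every node in the graph
```
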
## Step 2: Batch up Features and Labels

To make batch learning possible, we keep the graph separate from the node features and do not take a subgraph here. Since the Planetoid dataset has no built-in batch setting, the node features are batched up by simply repeating them for demonstration purposes. Different repeat counts can be specified for the training and test sets (e.g. via `train_repeats` and `test_repeats`).

```julia
fg = FeaturedGraph(g)
train_data = (repeat(train_X, outer=(1,1,train_repeats)), repeat(train_y, outer=(1,1,train_repeats)))
```
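
The batched arrays can then be wrapped in a `Flux.Data.DataLoader`; this is only a sketch of how the `train_loader` used in Step 5 could be built, with the batch size taken from `args.batch_size` as an assumption:

```julia
using Flux.Data: DataLoader

train_loader = DataLoader(train_data, batchsize=args.batch_size, shuffle=true)
```
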
## Step 3: Build a GCN model

Now we build the GCN model. The model is built just like a regular Flux model, except that each `GCNConv` layer is wrapped in `WithGraph`.

```julia
model = Chain(
    WithGraph(fg, GCNConv(args.input_dim=>args.hidden_dim, relu)),
    Dropout(0.5),
    WithGraph(fg, GCNConv(args.hidden_dim=>args.target_dim)),
)
```
## Step 4: Loss Functions and Accuracy

Almost all of the code is the same as in the semi-supervised learning example, except that node indices are needed to slice the relevant nodes out of the model output when computing the loss.

```julia
l2norm(x) = sum(abs2, x)

function model_loss(model, λ, X, y, idx)
    loss = logitcrossentropy(model(X)[:,idx,:], y[:,idx,:])
    loss += λ*sum(l2norm, Flux.params(model[1]))
    return loss
end
```

The accuracy measurement needs the indices as well.

```julia
function accuracy(model, X::AbstractArray, y::AbstractArray, idx)
    return mean(onecold(softmax(cpu(model(X))[:,idx,:])) .== onecold(cpu(y)[:,idx,:]))
end

accuracy(model, loader::DataLoader, device, idx) = mean(accuracy(model, X |> device, y |> device, idx) for (X, y) in loader)
```
## Step 5: Training GCN Model

```julia
train_loader, test_loader, fg, train_idx, test_idx = load_data(:cora, args.batch_size)

# optimizer
opt = ADAM(args.η)

# parameters
ps = Flux.params(model)

# training
train_steps = 0
@info "Start Training, total $(args.epochs) epochs"
for epoch = 1:args.epochs
    @info "Epoch $(epoch)"

    for (X, y) in train_loader
        grad = gradient(() -> model_loss(model, args.λ, X |> device, y |> device, train_idx |> device), ps)
        Flux.Optimise.update!(opt, ps, grad)
        train_steps += 1
    end
end
```
Now we can train the GCN model directly!
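
After training, the same `accuracy` helpers can be used for evaluation; a short sketch reusing the loaders, `device`, and index ranges returned by `load_data` above:

```julia
train_acc = accuracy(model, train_loader, device, train_idx)
test_acc  = accuracy(model, test_loader, device, test_idx)
@info "Accuracy" train=train_acc test=test_acc
```
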
docs/src/tutorials/semisupervised_gcn.md

Lines changed: 135 additions & 0 deletions
# Semi-supervised Learning with Graph Convolution Networks (GCN)

Graph convolution networks (GCN) are often regarded as the first step into graph neural networks (GNN). This example goes through how to train a vanilla GCN.

## Semi-supervised Learning in Graph Neural Networks

In the semi-supervised setting, features and labels are available for only part of the nodes in a graph. We train the model on the labeled subset of nodes and test it on a different subset of nodes in the same graph.

## Node Classification Task

In this task, we learn a model that predicts a label for each node in the graph: the GCN takes node features as input and outputs node labels.
## Step 1: Load Dataset

GeometricFlux provides the Planetoid dataset in `GeometricFlux.Datasets`, backed by GraphMLDatasets. The Planetoid dataset has three sub-datasets, Cora, Citeseer and PubMed; this example uses Cora. `traindata` loads training data from various datasets: the dataset is specified by the first argument and the sub-dataset by the second.

```julia
using GeometricFlux.Datasets

train_X, train_y = traindata(Planetoid(), :cora)
```

`traindata` returns pre-defined training features and labels; the features are node features. We collect them into plain matrices:

```julia
train_X, train_y = map(x->Matrix(x), traindata(Planetoid(), :cora))
```

The graph can be loaded with `graphdata`, which returns it preprocessed into a `SimpleGraph` from Graphs.

```julia
g = graphdata(Planetoid(), :cora)
train_idx = train_indices(Planetoid(), :cora)
```

We also need node indices to index a subgraph out of the original graph; `train_indices` gives the node indices used for training.
## Step 2: Wrapping Graph and Features into `FeaturedGraph`

`FeaturedGraph` is a container that holds a graph together with node, edge and global features. It is provided by GraphSignals. To wrap a graph and node features into a `FeaturedGraph`, pass the graph `g` as the first argument and the node features through the `nf` keyword.

```julia
using GraphSignals

FeaturedGraph(g, nf=train_X)
```

To take a subgraph of a `FeaturedGraph` object, call `subgraph` and provide the node indices `train_idx` as the second argument.

```julia
subgraph(FeaturedGraph(g, nf=train_X), train_idx)
```
## Step 3: Build a GCN model

The GCN model is composed of two `GCNConv` layers, with `relu` as the activation of the first layer and a `Dropout` layer in between. Since `Dropout` is a regular Flux layer, it is wrapped in `GraphParallel` with `node_layer=Dropout(0.5)` so that it is applied to the node features.

```julia
model = Chain(
    GCNConv(input_dim=>hidden_dim, relu),
    GraphParallel(node_layer=Dropout(0.5)),
    GCNConv(hidden_dim=>target_dim),
    node_feature,
)
```
Since the model input is a `FeaturedGraph` object, each GNN layer also outputs a `FeaturedGraph` object. At the end of the model, `node_feature` extracts the node features out of the final `FeaturedGraph`.
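
A minimal sketch of a forward pass, reusing the pieces defined above (the variable names are only for illustration):

```julia
fg_train = subgraph(FeaturedGraph(g, nf=train_X), train_idx)
ŷ = model(fg_train)  # target_dim × (number of nodes in the subgraph) array of class scores
```
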
## Step 4: Loss Functions and Accuracy

Since this is a node classification task, the model loss is `logitcrossentropy` plus an L2 regularization term. As in the vanilla GCN, only the first layer is L2-regularized, and the strength can be adjusted through the hyperparameter `λ`.

```julia
l2norm(x) = sum(abs2, x)

function model_loss(model, λ, batch)
    loss = 0.f0
    for (x, y) in batch
        loss += logitcrossentropy(model(x), y)
        loss += λ*sum(l2norm, Flux.params(model[1]))
    end
    return loss
end
```

Accuracy measures are provided for a single batch and for a whole data loader.

```julia
function accuracy(model, batch::AbstractVector)
    return mean(mean(onecold(softmax(cpu(model(x)))) .== onecold(cpu(y))) for (x, y) in batch)
end

accuracy(model, loader::DataLoader, device) = mean(accuracy(model, batch |> device) for batch in loader)
```
## Step 5: Training GCN Model

We train the model with the same process as training a Flux model.

```julia
train_loader, test_loader = load_data(:cora, args.batch_size)

# optimizer
opt = ADAM(args.η)

# parameters
ps = Flux.params(model)

# training
train_steps = 0
@info "Start Training, total $(args.epochs) epochs"
for epoch = 1:args.epochs
    @info "Epoch $(epoch)"

    for batch in train_loader
        grad = gradient(() -> model_loss(model, args.λ, batch |> device), ps)
        Flux.Optimise.update!(opt, ps, grad)
        train_steps += 1
    end
end
```
This completes a basic tutorial for training a GCN model!

For the complete example, please check the script `examples/semisupervised_gcn.jl`.
## Acceleration by Pre-computing Normalized Adjacency Matrix

The training process can be slow in this example. Because the graph and the features are packed together in `FeaturedGraph` objects, `GCNConv` has to compute a normalized adjacency matrix during training, which leads to long training times. We can accelerate training by pre-computing the normalized adjacency matrix for the `FeaturedGraph` objects before training. Calling the following function does exactly that for `fg`.

```julia
GraphSignals.normalized_adjacency_matrix!(fg)
```

Since `GCNConv` works on the normalized adjacency matrix, pre-computing it for `GCNConv` is safe. If a layer does not operate on a normalized adjacency matrix, this step will lead to an error.
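
For batched training data, the same pre-computation can be applied to every sample. A sketch, assuming the samples are stored as a vector of `(FeaturedGraph, label)` pairs (an assumption about the example script, not shown above):

```julia
# `train_samples` is a hypothetical Vector of (FeaturedGraph, label) pairs
for (fg_i, _) in train_samples
    GraphSignals.normalized_adjacency_matrix!(fg_i)
end
```
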

examples/gae.jl

Lines changed: 7 additions & 3 deletions

```diff
@@ -1,4 +1,5 @@
 using GeometricFlux
+using GraphSignals
 using Flux
 using Flux: throttle
 using Flux.Losses: logitbinarycrossentropy
@@ -9,6 +10,8 @@ using SparseArrays
 using Graphs.SimpleGraphs
 using CUDA
 
+CUDA.allowscalar(false)
+
 @load "data/cora_features.jld2" features
 @load "data/cora_graph.jld2" g
 
@@ -20,14 +23,15 @@ target_catg = 7
 epochs = 200
 
 ## Preprocessing data
-fg = FeaturedGraph(g) |> gpu
+fg = FeaturedGraph(g)  # pass to gpu together in model layers
 train_X = Matrix{Float32}(features) |> gpu  # dim: num_features * num_nodes
-train_y = fg  # dim: num_nodes * num_nodes
+train_y = fg |> GraphSignals.adjacency_matrix |> gpu  # dim: num_nodes * num_nodes
 
 ## Model
 encoder = Chain(GCNConv(fg, num_features=>hidden1, relu),
                 GCNConv(fg, hidden1=>hidden2))
-model = Chain(GAE(encoder, σ)) |> gpu
+model = Chain(GAE(encoder, σ)) |> gpu;
+# do not show model architecture, showing CuSparseMatrix will trigger errors
 
 ## Loss
 loss(x, y) = logitbinarycrossentropy(model(x), y)
```

examples/gat.jl

Lines changed: 9 additions & 8 deletions

```diff
@@ -2,10 +2,10 @@ using GeometricFlux
 using Flux
 using Flux: onehotbatch, onecold, logitcrossentropy, throttle
 using Flux: @epochs
-using Flux.Data: DataLoader
 using JLD2
 using Statistics: mean
 using SparseArrays
+using LinearAlgebra
 using Graphs.SimpleGraphs
 using Graphs: adjacency_matrix
 using CUDA
@@ -24,29 +24,30 @@ epochs = 10
 ## Preprocessing data
 train_X = Matrix{Float32}(features) |> gpu  # dim: num_features * num_nodes
 train_y = Matrix{Float32}(labels) |> gpu  # dim: target_catg * num_nodes
-adj_mat = Matrix{Float32}(adjacency_matrix(g)) |> gpu
+A = Matrix{Int}((adjacency_matrix(g) + I) .≥ 1)
+fg = FeaturedGraph(A, :adjm)
 
 ## Model
-model = Chain(GATConv(g, num_features=>hidden, heads=heads),
+model = Chain(GATConv(fg, num_features=>hidden, heads=heads),
               Dropout(0.6),
-              GATConv(g, hidden*heads=>target_catg, heads=heads, concat=false)
+              GATConv(fg, hidden*heads=>target_catg, heads=heads, concat=false)
              ) |> gpu
 # test model
-# @show model(train_X)
+@show model(train_X)
 
 ## Loss
 loss(x, y) = logitcrossentropy(model(x), y)
 accuracy(x, y) = mean(onecold(cpu(model(x))) .== onecold(cpu(y)))
 
 # test loss
-# @show loss(train_X, train_y)
+@show loss(train_X, train_y)
 
 # test gradient
-# @show gradient(X -> loss(X, train_y), train_X)
+@show gradient(()->loss(train_X, train_y), Flux.params(model))
 
 ## Training
 ps = Flux.params(model)
-train_data = DataLoader(train_X, train_y, batchsize=num_nodes)
+train_data = Flux.Data.DataLoader((train_X, train_y), batchsize=num_nodes)
 opt = ADAM(0.01)
 evalcb() = @show(accuracy(train_X, train_y))
```