nnx: experimental 'nn' components

The original neural network package from Torch7, nn, contains stable and widely used modules. 'nnx' contains more experimental and unproven modules and optimizations. Modules that become stable and prove useful make their way into 'nn' (some already have).

Library Documentation

This section documents the following modules and criteria:

SoftMaxTree

A hierarchy of parameterized log-softmaxes. Used for computing the likelihood of a leaf class. This Module should be used in conjunction with the TreeNLLCriterion. Using this for large vocabularies (100,000 and more) greatly accelerates training and evaluation of neural network language models (NNLM). A vocabulary hierarchy is provided via the dp package's BillionWords DataSource.

The constructor takes 2 mandatory and 4 optional arguments:

  • inputSize : the number of units in the input embedding representation;
  • hierarchy : a Tensor mapping one parent_id to many child_id (a tree);
  • rootId : a number identifying the root node in the hierarchy. Defaults to -1;
  • accUpdate : when the intent is to use backwardUpdate or accUpdateGradParameters, set this to true to save memory. Defaults to false;
  • static : when true (the default), returns parameters with keys that don't change from batch to batch;
  • verbose : when true, prints some additional information concerning the hierarchy during construction.

The forward method returns a 1D output Tensor, while backward returns a table {gradInput, gradTarget}. The second element is just a Tensor of zeros, so that the targets can be propagated through Containers like ParallelTable.

> input = torch.randn(5,10)
> target = torch.IntTensor{20,24,27,10,12}
> gradOutput = torch.randn(5)
> root_id = 29
> input_size = 10   
> hierarchy = {
>>    [29]=torch.IntTensor{30,1,2}, [1]=torch.IntTensor{3,4,5}, 
>>    [2]=torch.IntTensor{6,7,8}, [3]=torch.IntTensor{9,10,11},
>>    [4]=torch.IntTensor{12,13,14}, [5]=torch.IntTensor{15,16,17},
>>    [6]=torch.IntTensor{18,19,20}, [7]=torch.IntTensor{21,22,23},
>>    [8]=torch.IntTensor{24,25,26,27,28}
>> }
> smt = nn.SoftMaxTree(input_size, hierarchy, root_id)
> smt:forward{input, target}
-3.5186
-3.8950
-3.7433
-3.3071
-3.0522
[torch.DoubleTensor of dimension 5]
> smt:backward({input, target}, gradOutput)
{
  1 : DoubleTensor - size: 5x10
  2 : IntTensor - size: 5
}

TreeNLLCriterion

Measures the negative log-likelihood (NLL) for SoftMaxTrees. Used for maximizing the likelihood of SoftMaxTree outputs. The SoftMaxTree Module outputs a column Tensor representing the log-likelihood of each target in the batch, and already requires the targets as input. This Criterion therefore only computes the negative of those outputs, as well as the corresponding gradients.
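
A minimal usage sketch, reusing the smt, input and target variables from the SoftMaxTree example above; it simply follows the standard nn Criterion interface (forward and backward with input and target):

> tnll = nn.TreeNLLCriterion()
> -- smt:forward{input, target} returns the per-example log-likelihoods
> err = tnll:forward(smt:forward{input, target}, target)
> -- gradients w.r.t. the SoftMaxTree output, to be fed back through smt
> gradOutput = tnll:backward(smt.output, target)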

PushTable (and PullTable)

PushTable and PullTable work together. The first can be put earlier in a digraph of Modules such that it can communicate with a PullTable located later in the graph. PushTable:forward(input) forwards an input table of Tensors to the output, excluding one, the index of which is specified by the index argument of the PushTable(index) constructor. The Tensor identified by this index is communicated to one or many PullTables created via the PushTable:pull(index) factory method. These can be inserted later in the digraph such that a call to PullTable:forward(input), where input is a table or a Tensor, will output a table with the previously pushed Tensor inserted at index index.

An example utilizing the above SoftMaxTree Module and a Linear Module demonstrates how the PushTable can be used to forward the target Tensor without any other Table Modules:

> mlp = nn.Sequential()
> linear = nn.Linear(10,10) -- matches the 5x10 input and smt's inputSize of 10
> push = nn.PushTable(2)
> pull = push:pull(2)
> mlp:add(push)
> mlp:add(nn.SelectTable(1))
> mlp:add(linear)
> mlp:add(pull)
> mlp:add(smt) --smt is a SoftMaxTree instance
> mlp:forward{input, target} -- input and target are defined above
-3.5186
-3.8950
-3.7433
-3.3071
-3.0522
[torch.DoubleTensor of dimension 5]
> mlp:backward({input, target}, gradOutput) -- so is gradOutput
{
  1 : DoubleTensor - size: 5x10
  2 : IntTensor - size: 5
}

The above code is equivalent to the following:

> mlp2 = nn.Sequential()
> para = nn.ParallelTable()
> para:add(linear)
> para:add(nn.Identity())
> mlp2:add(para)
> mlp2:add(smt)
> mlp2:forward{input, target}
-3.5186
-3.8950
-3.7433
-3.3071
-3.0522
[torch.DoubleTensor of dimension 5]
> mlp2:backward({input, target}, gradOutput)
{
  1 : DoubleTensor - size: 5x10
  2 : IntTensor - size: 5
}

In some cases, this can simplify the digraph of Modules. Note that a PushTable can be associated with many PullTables, but each PullTable is associated with only one PushTable.

CTCCriterion

criterion = nn.CTCCriterion()

Creates a Criterion based on Baidu's warp-ctc implementation. This Criterion measures the loss between a 3D output of size (batch x time x inputdim) and a target, without needing alignment of inputs and labels. Requires warp-ctc, which can be installed via luarocks:

luarocks install http://raw.githubusercontent.com/baidu-research/warp-ctc/master/torch_binding/rocks/warp-ctc-scm-1.rockspec

Supports CUDA via:

criterion = nn.CTCCriterion():cuda()

Example:

output = torch.Tensor({{{1,2,3,4,5},{6,7,8,9,10}}}) -- Tensor of size 1x2x5 (batch x time x inputdim).
label = {{1,3}}
sizes = torch.Tensor({2}) -- Size of each sequence (sequence-length) in the batch as a tensor
ctcCriterion = nn.CTCCriterion()

err = ctcCriterion:forward(output,label,sizes)
gradOut = ctcCriterion:backward(output,label)
print("----CPU----")
print("Error : " .. err)
print("Gradients :")
print(gradOut)

ctcCriterion = ctcCriterion:cuda() -- Switch to cuda implementation.
output = output:cuda()

err = ctcCriterion:forward(output,label,sizes)
gradOut = ctcCriterion:backward(output,label)
print("----GPU----")
print("Error : " .. err)
print("Gradients :")
print(gradOut)

gives the output:

----CPU---- 
Error : 4.9038286209106 
Gradients : 
(1,.,.) = 
  0.0117 -0.9683  0.0861  0.2341  0.6364
  0.0117  0.0317  0.0861 -0.7659  0.6364
[torch.FloatTensor of size 1x2x5]

----GPU---- 
Error : 4.9038290977478 
Gradients : 
(1,.,.) = 
  0.0117 -0.9683  0.0861  0.2341  0.6364
  0.0117  0.0317  0.0861 -0.7659  0.6364
[torch.CudaTensor of size 1x2x5]

MultiSoftMax

This Module takes 2D or 3D input and performs a softmax over the last dimension. It reuses the existing SoftMax CUDA/C code, so the Module can be used on both GPU and CPU. This can be useful for keypoint detection.
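
A minimal usage sketch, assuming a parameterless constructor and a 3D input of shape batch x keypoint x location (the shapes and variable names are illustrative):

msm = nn.MultiSoftMax()
input = torch.randn(2, 3, 4)   -- batch x keypoint x location
output = msm:forward(input)    -- softmax taken over the last dimension
print(output:sum(3))           -- each softmax slice sums to (approximately) 1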

SpatialReSampling

Applies a 2D re-sampling over an input image composed of several input planes (or channels, or colors). The input tensor in forward(input) is expected to be a 3D or 4D tensor of size [batchSize x] nInputPlane x height x width. The number of output planes will be the same as the number of input planes.

The re-sampling is done using bilinear interpolation. For simple nearest-neighbor upsampling, use nn.SpatialUpSampling(), and for simple average-based down-sampling, use nn.SpatialDownSampling().

If the input image is a 3D tensor of size nInputPlane x height x width, the output image size will be nInputPlane x oheight x owidth where owidth and oheight are given to the constructor.

Instead of owidth and oheight, one can provide rwidth and rheight, such that owidth = iwidth*rwidth and oheight = iheight*rheight.
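
For instance, a relative re-sampling that halves both dimensions might look like the following brief sketch (the random img tensor is just a stand-in for a real image):

require 'nnx'
img = torch.randn(3, 128, 128)                        -- nInputPlane x iheight x iwidth
halve = nn.SpatialReSampling{rwidth=0.5, rheight=0.5}
smaller = halve:forward(img)                          -- 3 x 64 x 64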

As an example, we can run the following code on the famous Lenna image:

require 'image'                                                           
require 'nnx'
input = image.loadPNG('doc/image/Lenna.png')
l = nn.SpatialReSampling{owidth=150,oheight=150}
output = l:forward(input)
image.save('doc/image/Lenna-150x150-bilinear.png', output)

The input:

Lenna

The re-sampled output:

Lenna re-sampled

QDRiemaNNLinear

The Quasi-Diagonal Riemannian Neural Network Linear (QDRiemaNNLinear) module implements the quasi-diagonal reduction of metrics used for Riemannian gradient descent. The algorithm is defined in Riemannian metrics for neural networks I: feedforward networks by Yann Ollivier (http://arxiv.org/abs/1303.0818), and an efficient implementation is described in Practical Riemannian Neural Networks by Yann Ollivier and Gaetan Marceau-Caron (http://arxiv.org/abs/1602.08007). To use this module, simply replace nn.Linear(ninput,noutput) with nnx.QDRiemaNNLinear(ninput,noutput). As always, the step-size must be chosen accordingly. Two additional arguments are also possible:

  • gamma (default=0.01): determines the update rate of the metric in a minibatch setting, i.e., (1-gamma) * oldMetric + gamma * newMetric. Smaller minibatches require a smaller gamma. A reasonable value that accounts for the minibatch size is gamma = 1. - torch.pow(1. - 1./nTraining, miniBatchSize), where nTraining is the number of training examples in the dataset and miniBatchSize is the number of training examples per minibatch.
  • qdFlag (default=true): Whether to use the quasi-diagonal reduction (true) or only the diagonal (false). The former should be better.

This module is a straightforward implementation of the outer product gradient descent.
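
A brief sketch of the drop-in replacement described above (the layer sizes and the surrounding network are arbitrary choices, not part of the module's API):

require 'nnx'
mlp = nn.Sequential()
mlp:add(nnx.QDRiemaNNLinear(784, 256))  -- in place of nn.Linear(784, 256)
mlp:add(nn.Tanh())
mlp:add(nnx.QDRiemaNNLinear(256, 10))   -- in place of nn.Linear(256, 10)
mlp:add(nn.LogSoftMax())
-- train as usual with a Criterion such as nn.ClassNLLCriterion();
-- only the step-size may need re-tuning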

Requirements

Installation

  • Install Torch7 (refer to its own documentation).
  • Clone this project into the dev directory of Torch7.
  • Rebuild Torch; it will include the new project too.

Use the library

First run torch, and load nnx:

$ torch
> require 'nnx'

Once loaded, tab-completion will help you navigate through the library (note that most functions are added directly to nn):

> nnx. + TAB
...
> nn. + TAB

In particular, it's good to verify that all the provided modules pass their tests:

> nnx.test_all()
> nnx.test_omp()

Recurrent

DEPRECATED July 6th, 2015. Use rnn instead.