Package org.diffkt.model
Types
An affine transform. Multiplies by one tensor and then adds another. Like a Dense layer, except that where a dense layer performs a matmul, this one performs an element-wise multiplication.
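To make the distinction concrete, here is a minimal plain-Kotlin sketch of the two operations on a 1-D input; the names (affine, dense, scale, shift, weights) are illustrative, not the DiffKt API.

    // Affine transform: element-wise multiply by `scale`, then add `shift`.
    fun affine(x: FloatArray, scale: FloatArray, shift: FloatArray): FloatArray =
        FloatArray(x.size) { i -> x[i] * scale[i] + shift[i] }

    // Dense layer: matmul with `weights` (outDim x inDim), then add `bias`.
    fun dense(x: FloatArray, weights: Array<FloatArray>, bias: FloatArray): FloatArray =
        FloatArray(weights.size) { o ->
            var acc = bias[o]
            for (i in x.indices) acc += weights[o][i] * x[i]
            acc
        }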
A training version of batch normalization provided for compatibility with existing code.
A trainable Batch Normalization transform, as described in https://arxiv.org/abs/1502.03167. When training is complete, use its inferenceMode property to get the computed affine transform. This version maintains exponential moving averages of the sum of the samples, the sum of the squared samples, and the sample count, which are used to estimate the population mean and variance.
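A minimal sketch of the bookkeeping this version describes, in plain Kotlin rather than the DiffKt API; the class, names, and momentum convention are assumptions for illustration.

    class EmaMoments(private val momentum: Float = 0.1f) {
        private var emaSum = 0f
        private var emaSumSq = 0f
        private var emaCount = 0f

        fun update(batch: FloatArray) {
            var sum = 0f
            var sumSq = 0f
            for (v in batch) { sum += v; sumSq += v * v }
            // Exponential moving averages of the sum, sum of squares, and count.
            emaSum = (1 - momentum) * emaSum + momentum * sum
            emaSumSq = (1 - momentum) * emaSumSq + momentum * sumSq
            emaCount = (1 - momentum) * emaCount + momentum * batch.size
        }

        // Population estimates recovered from the averaged moments.
        val mean: Float get() = emaSum / emaCount
        val variance: Float get() = emaSumSq / emaCount - mean * mean
    }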
A trainable Batch Normalization transform, as described in https://arxiv.org/abs/1502.03167. When training is complete, use its inferenceMode property to get the computed affine transform. This version is provided to imitate the behavior of V1, the previous implementation, in that it calculates a running mean and running variance rather than gathering the raw input to compute the mean and variance. It applies Bessel's correction (https://en.wikipedia.org/wiki/Bessel%27s_correction) to the sample variance of each batch to estimate the population variance, and uses an exponential moving average of those estimates as the population variance when inferenceMode is applied.
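For contrast with the version above, a plain-Kotlin sketch of the V1-style running statistics: a per-batch mean and Bessel-corrected variance, smoothed with an exponential moving average. The names and momentum convention are again assumptions, not the DiffKt API.

    class RunningStats(private val momentum: Float = 0.1f) {
        var runningMean = 0f
        var runningVar = 1f

        fun update(batch: FloatArray) {
            val n = batch.size
            val mean = batch.sum() / n
            var ss = 0f
            for (v in batch) ss += (v - mean) * (v - mean)
            // Bessel's correction: divide by n - 1 to estimate population variance.
            val correctedVar = ss / (n - 1)
            runningMean = (1 - momentum) * runningMean + momentum * mean
            runningVar = (1 - momentum) * runningVar + momentum * correctedVar
        }
    }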
Densely-connected layer
A trainable embedding table with size vocabSize x embeddingSize
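A hypothetical sketch of what such a table does, in plain Kotlin rather than the DiffKt API: each token id selects one embeddingSize-long row.

    class EmbeddingSketch(vocabSize: Int, embeddingSize: Int) {
        // vocabSize x embeddingSize table; real code would initialize it
        // randomly and train it by gradient descent.
        private val table = Array(vocabSize) { FloatArray(embeddingSize) }

        fun lookup(tokenIds: IntArray): Array<FloatArray> =
            Array(tokenIds.size) { i -> table[tokenIds[i]] }
    }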
Flattens the input. Does not affect batch size.
A gated recurrent unit (GRU). To make the GRU you desire, see invoke in the companion object, or use the GRUEncoder or GRUDecoder helpers.
Linear-after-reset GRU
Linear-before-reset GRU
Stochastic gradient descent optimizer with optional weight decay regularization and momentum parameters.
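A sketch of the textbook update such an optimizer performs on a single scalar parameter, in plain Kotlin; the names and the update order are assumptions for illustration, not DiffKt's implementation.

    class SgdSketch(
        private val learningRate: Float,
        private val weightDecay: Float = 0f,
        private val momentum: Float = 0f
    ) {
        private var velocity = 0f

        fun step(param: Float, grad: Float): Float {
            val g = grad + weightDecay * param   // weight-decay (L2) regularization
            velocity = momentum * velocity + g   // momentum accumulation
            return param - learningRate * velocity
        }
    }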
Functions
Computes the average over each (poolHeight x poolWidth) pool in x, with a stride of (poolHeight, poolWidth). Requires that the H dimension of x be divisible by poolHeight and that the W dimension be divisible by poolWidth.
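An illustrative plain-Kotlin version of that arithmetic for a single-channel H x W input; the real op works on tensors, so the name and the 2-D array representation are assumptions.

    fun avgPool(x: Array<FloatArray>, poolHeight: Int, poolWidth: Int): Array<FloatArray> {
        val h = x.size
        val w = x[0].size
        require(h % poolHeight == 0 && w % poolWidth == 0) {
            "H must be divisible by poolHeight and W by poolWidth"
        }
        return Array(h / poolHeight) { i ->
            FloatArray(w / poolWidth) { j ->
                var sum = 0f
                for (di in 0 until poolHeight)
                    for (dj in 0 until poolWidth)
                        sum += x[i * poolHeight + di][j * poolWidth + dj]
                sum / (poolHeight * poolWidth)
            }
        }
    }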
The batchNorm op used for training
Returns the current value (this) updated with the new value (new), scaled by momentum.
Returns the current tensor (this) updated with the new tensor (new), scaled by momentum.
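One common reading of these two functions is a standard exponential-moving-average step; the exact weighting DiffKt uses is not stated here, so treat this plain-Kotlin formula as an assumption.

    // Hypothetical EMA step: blend in the new value scaled by `momentum`,
    // keeping (1 - momentum) of the current value.
    fun Float.updatedBy(new: Float, momentum: Float): Float =
        (1 - momentum) * this + momentum * new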