
Tensors

DiffKt provides many different types of differentiable tensors. A tensor is a multi-dimensional array: a float scalar is a 0D tensor, a vector is a 1D tensor, a 2D array is a 2D tensor, a 3D array is a 3D tensor, and so on.

DTensor is the interface for all differentiable tensors in DiffKt. A differentiable tensor can be a scalar, a 1D tensor, a 2D tensor, a 3D tensor, or have even more dimensions. Scalars also inherit from DTensor. The interface defines a number of properties, functions, and extensions. The properties of DTensor discussed here are size, rank, shape, isScalar, and indexing.

A tensor has a size, which is the number of elements in the tensor.

A tensor has a rank, which indicates the number of dimensions: rank 0 - scalar, rank 1 - 1D tensor, rank 2 - 2D tensor, rank 3 - 3D tensor, and so on.

A tensor has a shape, which indicates the number of axes and the length of each axis of the tensor.

A tensor has a Boolean property, isScalar, that indicates whether it is a scalar.

To retrieve an element of a tensor, use indexing. The indices indicate the location of the element, such as [0,0] to get the first element of a 2D array.
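The relationship between shape, rank, size, and indexing can be sketched in plain Kotlin. The `SimpleTensor` class below is a hypothetical stand-in written for this illustration. It is not DiffKt's DTensor, and the row-major element layout is an assumption made for the example.

```kotlin
// Hypothetical stand-in for illustration only; this is not DiffKt's DTensor.
class SimpleTensor(val shape: List<Int>, private val data: FloatArray) {
    init {
        require(data.size == shape.fold(1) { acc, n -> acc * n }) {
            "data length must equal the product of the axis lengths"
        }
    }

    // Number of axes: 0 for a scalar, 1 for a vector, 2 for a matrix, and so on.
    val rank: Int get() = shape.size

    // Total number of elements (the empty product is 1, so a scalar has size 1).
    val size: Int get() = data.size

    val isScalar: Boolean get() = rank == 0

    // Row-major lookup (an assumption of this sketch): the last index varies fastest.
    operator fun get(vararg indices: Int): Float {
        require(indices.size == rank) { "expected $rank indices" }
        var offset = 0
        for (axis in indices.indices) {
            require(indices[axis] in 0 until shape[axis]) { "index out of bounds on axis $axis" }
            offset = offset * shape[axis] + indices[axis]
        }
        return data[offset]
    }
}

fun main() {
    // Six values arranged as 2 rows by 3 columns.
    val t = SimpleTensor(listOf(2, 3), floatArrayOf(1f, 2f, 3f, 4f, 5f, 6f))
    println("rank=${t.rank} size=${t.size} isScalar=${t.isScalar}")
    println(t[0, 0]) // first element of the 2D array
    println(t[1, 2]) // last element of the 2D array
}
```

In this sketch the shape is the only stored structural information. Rank and size are both derived from it, which is why a shape of `[2, 3]` implies rank 2 and size 6.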

FloatTensor is an abstract class that implements DTensor for floating point numbers. There are multiple types of implementations, such as scalar, dense, and sparse tensors.

DScalar is the interface for all differentiable scalars.

FloatScalar is an implementation of the interfaces DScalar and FloatTensor.

tensorOf is a factory function that creates a FloatTensor from a set of float numbers. The initial tensor is a 1D array. After creating a tensor with tensorOf, you may need to reshape the tensor to the shape you want.

Tensor Operations

The DTensor interface has many operations that can be applied to a tensor. See the Extensions tab in the Kotlin docs of DTensor for all the operations. Some of the operations support traditional arithmetic notation through operator overloading. The examples below look at a few of the operations:

'+' or plus,

'-' or minus,

'*' or times,

'/' or div,

pow,

sin,

cos,

matmul,

sum, and

innerProduct

Calculating the Derivative of a Scalar Function

There are two different algorithms for calculating the derivative of a function over a DScalar variable: the forward derivative algorithm and the reverse derivative algorithm. The forward derivative algorithm is more efficient when a function has more output variables than input variables. The reverse derivative algorithm is more efficient when a function has more input variables than output variables. For most situations of optimizing a scalar function, where the output of the function is a single variable, the reverse derivative algorithm is more efficient.

When calling the functions below, you pass the scalar variable to be differentiated, a DScalar, and a lambda for the function of that variable. In Kotlin, if you declare the function as fun f(x), then the function reference ::f can be passed as the lambda.

forwardDerivative calculates the derivative of a function over a DScalar evaluated at the DScalar x using the forward derivative algorithm.

reverseDerivative calculates the derivative of a function over a DScalar evaluated at the DScalar x using the reverse derivative algorithm.

In many cases it is more efficient to calculate the original scalar function and its derivative at the same time. The functions below return a Pair<DTensor, DTensor>. The first value is called the primal, which is the value of the function evaluated at x, where x is a tensor. The second value is called the tangent, which is the derivative of the function evaluated at x.

primalAndForwardDerivative calculates a function over a DScalar and its derivative evaluated at the DScalar x using the forward derivative algorithm.

primalAndReverseDerivative calculates a function over a DScalar and its derivative evaluated at the DScalar x using the reverse derivative algorithm.
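The idea of computing the primal and the tangent together with the forward algorithm can be illustrated with dual numbers. This is a minimal sketch in plain Kotlin and does not show how DiffKt is implemented. The `Dual` class and the `primalAndForwardDerivativeSketch` function are hypothetical names introduced only for this example.

```kotlin
// Hypothetical dual number for illustration; not a DiffKt type.
// It carries a value (the primal) together with its derivative (the tangent).
data class Dual(val primal: Double, val tangent: Double) {
    // Sum rule: (u + v)' = u' + v'
    operator fun plus(other: Dual) = Dual(primal + other.primal, tangent + other.tangent)

    // Product rule: (uv)' = u'v + uv'
    operator fun times(other: Dual) =
        Dual(primal * other.primal, tangent * other.primal + primal * other.tangent)
}

// Example function f(x) = x^3 + x, whose derivative is 3x^2 + 1.
fun f(x: Dual): Dual = x * x * x + x

// Seeding the tangent with 1.0 means "differentiate with respect to x".
// One evaluation of f yields both the function value and the derivative.
fun primalAndForwardDerivativeSketch(x: Double, f: (Dual) -> Dual): Pair<Double, Double> {
    val result = f(Dual(x, 1.0))
    return Pair(result.primal, result.tangent)
}

fun main() {
    val (primal, tangent) = primalAndForwardDerivativeSketch(2.0, ::f)
    println("f(2) = $primal, f'(2) = $tangent") // prints "f(2) = 10.0, f'(2) = 13.0"
}
```

Because every arithmetic step updates the value and the derivative side by side, the primal comes at no extra cost. That is the motivation for returning both as a pair.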

Derivatives of a Function over a Tensor

The symbol nabla, $\nabla$, is an inverted Greek symbol $\Delta$. The gradient of a function over a vector of variables is $\nabla f(\mathbf x)$, and is the partial derivatives of the function with respect to each variable. The Jacobian of a vector valued function, either $J(\mathbf f(\mathbf x))$ or $\mathbf \nabla \mathbf f(\mathbf x)$, is the gradient of each vector component of the function, or the partial derivatives of each vector component of the function with respect to each variable.

The partial derivatives of a function with N inputs and 1 output at a point $\mathbf x$, where $\mathbf x$ is a vector of size N, or a function $f(\mathbf x): R^N \rightarrow R^1$, is the gradient of the function, which is a function $\nabla f(\mathbf x): R^N \rightarrow R^N$. The gradient of a function of N variables, where $\mathbf x = \left[ x_1, x_2, \cdots, x_n \right]$, is

$$\nabla f(\mathbf x) = \left[ \frac{\partial f(\mathbf x)}{\partial x_1}, \frac{\partial f(\mathbf x)}{\partial x_2}, \cdots, \frac{\partial f(\mathbf x)}{\partial x_n} \right]^T.$$

For example, if $f(x,y) = 4x^2 + 2y$ then $\nabla f(x, y) = \left[ 8x, 2 \right]^T$, where $\nabla f(x,y) = \left[ \frac{\partial f(x,y)}{\partial x}, \frac{\partial f(x,y)}{\partial y} \right]^T$.
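The gradient in this example can also be obtained numerically with a small reverse-mode sketch. The code below is plain Kotlin written under stated assumptions and does not show DiffKt's implementation. The `Node` class is a hypothetical helper. For simplicity it walks every path from the output back to the inputs, whereas a practical implementation would visit each node once in reverse topological order.

```kotlin
// Hypothetical reverse-mode node for illustration; not a DiffKt type.
// Each node remembers its inputs and the local partial derivative with respect to each.
class Node(val value: Double, private val inputs: List<Pair<Node, Double>> = emptyList()) {
    var grad = 0.0
        private set

    operator fun plus(other: Node) =
        Node(value + other.value, listOf(this to 1.0, other to 1.0))

    operator fun times(other: Node) =
        Node(value * other.value, listOf(this to other.value, other to value))

    // Chain rule, applied from the output back toward the inputs.
    // Contributions arriving along different paths are summed into grad.
    fun backward(upstream: Double = 1.0) {
        grad += upstream
        for ((input, local) in inputs) input.backward(upstream * local)
    }
}

// f(x, y) = 4x^2 + 2y, the example from the text.
fun f(x: Node, y: Node): Node = Node(4.0) * x * x + Node(2.0) * y

fun main() {
    val x = Node(3.0)
    val y = Node(5.0)
    val out = f(x, y)
    out.backward() // a single backward pass fills in both partial derivatives
    println("f = ${out.value}, gradient = [${x.grad}, ${y.grad}]")
}
```

At the point (3, 5) the sketch yields the gradient [24, 2], which agrees with $\left[ 8x, 2 \right]^T$ evaluated at $x = 3$. A single backward pass produces the partial derivative for every input, which is why the reverse algorithm suits functions with many inputs and one output.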

The partial derivatives of a function with N inputs and M outputs at a point $\mathbf x$, where $\mathbf x$ is of size N, or a function $\mathbf f(\mathbf x): R^N \rightarrow R^M$, is the Jacobian of the function, or $\mathbf \nabla \mathbf f(\mathbf x): R^N \rightarrow R^{M \times N}$. The point $\mathbf x$ is a vector of variables, $\mathbf x = \left[ x_1, x_2, \cdots, x_n \right]$. The function $\mathbf f(\mathbf x)$ is a vector of functions evaluated at $\mathbf x$, $\mathbf f(\mathbf x) = \left[ f_1(\mathbf x), f_2(\mathbf x), \cdots, f_m(\mathbf x) \right]^T$.

The Jacobian of a function is the matrix of partial derivatives of each component function with respect to each variable.

$$\mathbf \nabla \mathbf f(\mathbf x) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix}.$$

For example, if $\mathbf f(x,y) = \left[ 4x^2 + 2y, 2x + 4y^2 \right]$ then the Jacobian is

$$\mathbf \nabla \mathbf f(x,y) = \begin{bmatrix} 8x & 2 \\ 2 & 8y \end{bmatrix}.$$
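As a quick check of this example, the function and its Jacobian can be evaluated at a concrete point. The point $(x, y) = (1, 2)$ is chosen arbitrarily for illustration.

```latex
% Evaluate the example function at (x, y) = (1, 2).
\mathbf f(1, 2) = \left[ 4(1)^2 + 2(2),\; 2(1) + 4(2)^2 \right] = \left[ 8,\; 18 \right]

% Substitute the same point into the Jacobian.
\mathbf \nabla \mathbf f(1, 2) =
\begin{bmatrix} 8(1) & 2 \\ 2 & 8(2) \end{bmatrix} =
\begin{bmatrix} 8 & 2 \\ 2 & 16 \end{bmatrix}
```

Each row holds the gradient of one component function: the first row belongs to $4x^2 + 2y$ and the second to $2x + 4y^2$.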

forwardDerivative calculates the derivative of a function over a tensor, evaluated at the tensor x, using the forward derivative algorithm.

reverseDerivative calculates the derivative of a function over a tensor, evaluated at the tensor x, using the reverse derivative algorithm. When the result is a Jacobian or 2D tensor, the reverse derivative algorithm returns the transpose of the result returned by the forward derivative algorithm.

In many cases it is more efficient to calculate the original function and its partial derivatives at the same time. The functions below return a Pair<DTensor, DTensor>. The first value is called the primal, which is the value of the function evaluated at x, where x is a tensor. The second value is called the tangent, which is the derivative of the function evaluated at x.

primalAndForwardDerivative calculates a function over a tensor x and its derivative, evaluated at the tensor x, using the forward derivative algorithm.

primalAndReverseDerivative calculates a function over a tensor x and its derivative, evaluated at the tensor x, using the reverse derivative algorithm. When the result is a Jacobian or 2D tensor, the reverse derivative algorithm returns the transpose of the result returned by the forward derivative algorithm.