mini_jit::kernels

mini_jit::kernels::matmul

void mini_jit::kernels::matmul::matmul_br_m_n_k(mini_jit::Kernel &kernel, int m, int n, int k, int br_size)

Kernel for batch-reduce matrix multiplication.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in A and C.

  • n – number of columns in B and C.

  • k – number of columns in A and rows in B.

  • br_size – batch-reduce size.
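For orientation, the batch-reduce operation can be written as a scalar reference loop that accumulates br_size products A_i * B_i into one C. This is only a sketch for checking results, not the instructions the JIT emits. The name brgemm_reference, the column-major layout with tight leading dimensions, and the back-to-back placement of the br_size blocks are assumptions that this reference does not state.

```cpp
#include <cstddef>
#include <vector>

// Scalar reference for a batch-reduce GEMM: C += sum over batch of A_b * B_b.
// Assumptions (not from the kernel documentation): column-major storage,
// tight leading dimensions, and the br_size A/B blocks stored back to back.
void brgemm_reference(const std::vector<float> &a, const std::vector<float> &b,
                      std::vector<float> &c, int m, int n, int k, int br_size)
{
    for (int batch = 0; batch < br_size; ++batch) {
        // Start of the batch-th A block (m x k) and B block (k x n).
        const float *a_blk = a.data() + static_cast<std::size_t>(batch) * m * k;
        const float *b_blk = b.data() + static_cast<std::size_t>(batch) * k * n;
        for (int col = 0; col < n; ++col)
            for (int p = 0; p < k; ++p)
                for (int row = 0; row < m; ++row)
                    c[col * m + row] += a_blk[p * m + row] * b_blk[col * k + p];
    }
}
```

With br_size = 1 the loop reduces to an ordinary matrix multiplication that accumulates into C.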

void mini_jit::kernels::matmul::matmul_m_n_k(mini_jit::Kernel &kernel, int m, int n, int k)

Kernel for matrix multiplication.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in A and C.

  • n – number of columns in B and C.

  • k – number of columns in A and rows in B.

mini_jit::kernels::matmul::subkernels

void mini_jit::kernels::matmul::subkernels::matmul_16_6_1(mini_jit::Kernel &kernel)

Subkernel for matrix multiplication with fixed sizes m = 16, n = 6, and k = 1.

Parameters:
  • kernel – Kernel object to be filled with instructions.

void mini_jit::kernels::matmul::subkernels::matmul_16_6_k(mini_jit::Kernel &kernel, int k)

Subkernel for matrix multiplication with fixed sizes m = 16 and n = 6.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • k – number of columns in A and rows in B.

void mini_jit::kernels::matmul::subkernels::matmul_m_1_k(mini_jit::Kernel &kernel, int m, int k)

Subkernel for matrix multiplication with a fixed n = 1.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in A and C.

  • k – number of columns in A and rows in B.

void mini_jit::kernels::matmul::subkernels::matmul_m_2_k(mini_jit::Kernel &kernel, int m, int k)

Subkernel for matrix multiplication with a fixed n = 2.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in A and C.

  • k – number of columns in A and rows in B.

void mini_jit::kernels::matmul::subkernels::matmul_m_3_k(mini_jit::Kernel &kernel, int m, int k)

Subkernel for matrix multiplication with a fixed n = 3.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in A and C.

  • k – number of columns in A and rows in B.

void mini_jit::kernels::matmul::subkernels::matmul_m_4_k(mini_jit::Kernel &kernel, int m, int k)

Subkernel for matrix multiplication with a fixed n = 4.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in A and C.

  • k – number of columns in A and rows in B.

mini_jit::kernels::unary

void mini_jit::kernels::unary::decrement(mini_jit::Kernel &kernel, u_int32_t m, u_int32_t n)

Kernel that decrements the input by one and stores it into the output.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

void mini_jit::kernels::unary::decrement_trans(mini_jit::Kernel &kernel, int m, int n)

Kernel for performing the decrement operation on a matrix while transposing the output.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

void mini_jit::kernels::unary::fast_sigmoid(mini_jit::Kernel &kernel, u_int32_t m, u_int32_t n)

Kernel that applies the fast sigmoid activation function to a matrix.

Specifically, it computes the function: f(x) = 0.5 * (x / (1 + abs(x)) + 1) for each element in the matrix.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.
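The formula above can be checked against a scalar reference. The helper name fast_sigmoid_reference is hypothetical and only mirrors the documented expression; it says nothing about how the generated kernel vectorizes the computation.

```cpp
#include <cmath>

// Scalar reference for the documented formula
// f(x) = 0.5 * (x / (1 + abs(x)) + 1).
// The function name is hypothetical and not part of mini_jit.
float fast_sigmoid_reference(float x)
{
    return 0.5f * (x / (1.0f + std::fabs(x)) + 1.0f);
}
```

For example, f(0) = 0.5, f(1) = 0.75, and f(-1) = 0.25. The function stays strictly between 0 and 1 and approaches those limits more slowly than the exact sigmoid.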

void mini_jit::kernels::unary::identity(mini_jit::Kernel &kernel, uint32_t m, uint32_t n)

Kernel for performing the identity operation on a matrix.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

void mini_jit::kernels::unary::identity_trans(mini_jit::Kernel &kernel, int m, int n)

Kernel for performing the identity operation on a matrix while transposing the output.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.
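The transposing variants (identity_trans and the other *_trans kernels) can be compared against a scalar reference like the one below. It is a sketch under assumptions this reference does not state: the input is an m x n column-major matrix, the output is the n x m column-major transpose, and unary_trans_reference is a hypothetical helper name.

```cpp
#include <cstddef>
#include <vector>

// Applies op to every element of an m x n column-major matrix and writes the
// result transposed, as an n x m column-major matrix.
// Layout and helper name are assumptions, not part of mini_jit.
template <typename Op>
std::vector<float> unary_trans_reference(const std::vector<float> &in, int m, int n, Op op)
{
    std::vector<float> out(in.size());
    for (int c = 0; c < n; ++c)
        for (int r = 0; r < m; ++r)
            // Input element (r, c) becomes output element (c, r).
            out[static_cast<std::size_t>(r) * n + c] = op(in[static_cast<std::size_t>(c) * m + r]);
    return out;
}
```

Passing a lambda that returns its argument models identity_trans; a lambda such as x + 1.0f or x * x would model increment_trans or square_trans in the same way.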

void mini_jit::kernels::unary::increment(mini_jit::Kernel &kernel, u_int32_t m, u_int32_t n)

Kernel that increments the input by one and stores it into the output.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

void mini_jit::kernels::unary::increment_trans(mini_jit::Kernel &kernel, int m, int n)

Kernel for performing the increment operation on a matrix while transposing the output.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

void mini_jit::kernels::unary::reciprocal(mini_jit::Kernel &kernel, u_int32_t m, u_int32_t n)

Kernel that computes the element-wise reciprocal of the input and stores it into the output.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

void mini_jit::kernels::unary::reciprocal_trans(mini_jit::Kernel &kernel, int m, int n)

Kernel for performing the reciprocal operation on a matrix while transposing the output.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

void mini_jit::kernels::unary::relu(mini_jit::Kernel &kernel, u_int32_t m, u_int32_t n)

Kernel that applies the ReLU activation function to a matrix.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

void mini_jit::kernels::unary::relu_trans(mini_jit::Kernel &kernel, int m, int n)

Kernel that applies the ReLU activation function to a matrix while transposing the output.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

void mini_jit::kernels::unary::sigmoid_interpolation(mini_jit::Kernel &kernel, u_int32_t m, u_int32_t n)

Kernel that applies the sigmoid activation function to the input and stores it into the output. Uses linear interpolation for fast SIMD computation.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.
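The idea of approximating the sigmoid by linear interpolation can be illustrated with a small lookup table. Everything specific in this sketch is an assumption: the table range [-8, 8], the 32 intervals, the clamping outside the range, and the helper names are all hypothetical, and the actual kernel may use a different table and a different lookup scheme.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

constexpr int kIntervals = 32;  // hypothetical table resolution
constexpr float kLo = -8.0f;    // hypothetical lower end of the table
constexpr float kHi = 8.0f;     // hypothetical upper end of the table

// Precomputes exact sigmoid values at kIntervals + 1 evenly spaced nodes.
std::array<float, kIntervals + 1> make_sigmoid_table()
{
    std::array<float, kIntervals + 1> table{};
    const float step = (kHi - kLo) / kIntervals;
    for (int i = 0; i <= kIntervals; ++i)
        table[i] = 1.0f / (1.0f + std::exp(-(kLo + i * step)));
    return table;
}

// Clamps x to the table range and interpolates linearly between the two
// neighbouring nodes.
float sigmoid_interpolation_reference(float x, const std::array<float, kIntervals + 1> &table)
{
    const float step = (kHi - kLo) / kIntervals;
    const float clamped = std::min(std::max(x, kLo), kHi);
    const float pos = (clamped - kLo) / step;  // position in table units
    const int idx = std::min(static_cast<int>(pos), kIntervals - 1);
    const float frac = pos - static_cast<float>(idx);
    return table[idx] + frac * (table[idx + 1] - table[idx]);
}
```

A table lookup followed by one multiply-add per element maps well onto SIMD lanes, which is presumably why an interpolation-based variant is offered next to the polynomial one.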

void mini_jit::kernels::unary::sigmoid_taylor(mini_jit::Kernel &kernel, u_int32_t m, u_int32_t n)

Kernel that applies the sigmoid activation function to the input and stores it into the output. Uses the fifth-order Taylor polynomial σ(x) ≈ 0.5 + 0.25*x - 0.020833*x^3 + 0.002083*x^5, which is accurate for inputs in [-2, 2].

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.
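The documented polynomial can be evaluated in Horner form and compared with the exact sigmoid. The helper names below are hypothetical, and the Horner arrangement is just one convenient way to evaluate the polynomial; the generated kernel may order the operations differently.

```cpp
#include <cmath>

// Exact sigmoid, used only as a comparison baseline.
float sigmoid_exact(float x)
{
    return 1.0f / (1.0f + std::exp(-x));
}

// Horner evaluation of 0.5 + 0.25*x - 0.020833*x^3 + 0.002083*x^5.
// The function name is hypothetical and not part of mini_jit.
float sigmoid_taylor_reference(float x)
{
    const float x2 = x * x;
    return 0.5f + x * (0.25f + x2 * (-0.020833f + x2 * 0.002083f));
}
```

With these coefficients the approximation is within about 0.001 of the exact sigmoid at x = ±1 and off by roughly 0.02 at x = ±2. Outside [-2, 2] the polynomial diverges quickly, so inputs should stay within that interval.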

void mini_jit::kernels::unary::square(mini_jit::Kernel &kernel, u_int32_t m, u_int32_t n)

Kernel that squares the input and stores it into the output.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

void mini_jit::kernels::unary::square_trans(mini_jit::Kernel &kernel, int m, int n)

Kernel for performing the square operation on a matrix while transposing the output.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

void mini_jit::kernels::unary::zero(mini_jit::Kernel &kernel, uint32_t m, uint32_t n, uint32_t trans_b)

Kernel for zeroing out a matrix using NEON and the EOR instruction.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

  • trans_b – 0 if B is stored in column-major order, 1 if B is stored in row-major order.

void mini_jit::kernels::unary::zero_xzr(mini_jit::Kernel &kernel, u_int32_t m, u_int32_t n, u_int32_t trans_b)

Kernel for zeroing out a matrix using the XZR register.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

  • trans_b – 0 if B is stored in column-major order, 1 if B is stored in row-major order.

mini_jit::kernels::binary

void mini_jit::kernels::binary::add(mini_jit::Kernel &kernel, u_int32_t m, u_int32_t n)

Kernel that adds two matrices element-wise.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

void mini_jit::kernels::binary::div(mini_jit::Kernel &kernel, u_int32_t m, u_int32_t n)

Kernel that divides two matrices element-wise.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

void mini_jit::kernels::binary::max(mini_jit::Kernel &kernel, u_int32_t m, u_int32_t n)

Kernel that computes the element-wise maximum of two matrices.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

void mini_jit::kernels::binary::min(mini_jit::Kernel &kernel, u_int32_t m, u_int32_t n)

Kernel that computes the element-wise minimum of two matrices.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

void mini_jit::kernels::binary::mul(mini_jit::Kernel &kernel, u_int32_t m, u_int32_t n)

Kernel that multiplies two matrices element-wise.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.

void mini_jit::kernels::binary::sub(mini_jit::Kernel &kernel, u_int32_t m, u_int32_t n)

Kernel that subtracts two matrices element-wise.

Parameters:
  • kernel – Kernel object to be filled with instructions.

  • m – number of rows in the matrix.

  • n – number of columns in the matrix.
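The six binary kernels above all follow the same element-wise pattern, which a scalar reference can capture in one function. This is a sketch for validating results only. The enum BinaryOp, the helper binary_reference, and the operand order (first input on the left of sub and div) are assumptions that this reference does not confirm.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical selector for the six documented binary operations.
enum class BinaryOp { Add, Sub, Mul, Div, Min, Max };

// Element-wise reference: out[i] = a[i] (op) b[i].
// Assumes both inputs have the same number of elements (m * n).
std::vector<float> binary_reference(const std::vector<float> &a,
                                    const std::vector<float> &b, BinaryOp op)
{
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        switch (op) {
        case BinaryOp::Add: out[i] = a[i] + b[i]; break;
        case BinaryOp::Sub: out[i] = a[i] - b[i]; break;
        case BinaryOp::Mul: out[i] = a[i] * b[i]; break;
        case BinaryOp::Div: out[i] = a[i] / b[i]; break;
        case BinaryOp::Min: out[i] = std::min(a[i], b[i]); break;
        case BinaryOp::Max: out[i] = std::max(a[i], b[i]); break;
        }
    }
    return out;
}
```

Because the operations are element-wise, the storage order does not matter for this reference as long as both inputs and the output use the same one.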