Vector and Matrix Classes and Handlers

Background

This page helps new developers understand how Re::Solve separates data containers, operation handlers, backend workspaces, and memory spaces. In particular, it explains how the vector and matrix classes differ from the VectorHandler and MatrixHandler classes that operate on them.

The main distinction is that vector and matrix classes store data, while handler classes perform operations on that data. This distinction is important when writing code that must run on different backends (e.g., CPU, CUDA, and HIP).

This separation allows solver logic to remain independent of backend-specific vector and matrix operations.

The main questions this page is meant to answer are:

  • What object stores the data?

  • Where does the data live?

  • What object performs the operation?

  • What backend resources does the operation need?

  • What is the difference between a vector or matrix class and a vector or matrix handler?

  • What needs to happen when data is loaded on the host but used on the device?

This page is not meant to document every method in detail. It is meant to give a practical mental model for reading and writing backend-capable Re::Solve code.

Core Design

The main design idea is that Re::Solve separates storage, operations, and backend resources.

The major pieces are:

  • vector::Vector objects store vector data.

  • Matrix objects, such as matrix::Csr, store sparse matrix data.

  • VectorHandler objects perform vector operations.

  • MatrixHandler objects perform matrix operations.

  • LinAlgWorkspace objects provide backend-specific resources for the handlers.

  • memory::HOST describes data stored in host-accessible memory.

  • memory::DEVICE describes data stored in device-accessible memory.

This means that a vector or matrix object does not by itself determine whether an operation runs on the CPU or on a GPU. The data object stores the values. The handler performs the operation. The workspace gives the handler the backend resources it needs.

This separation helps the same solver path run with different backend implementations.

Vector Objects

A vector::Vector object represents vector data. The vector object is a data container. It stores the size of the vector and the data associated with that vector. Before a vector is used, it must be allocated in a memory space.

Simplified example:

vector::Vector* x = new vector::Vector(n);
x->allocate(memory::HOST);

If the vector is loaded or initialized on the host but later used by a GPU backend, the data may need to be synchronized to the device.

Simplified example:

// Copy the host data to the device only when the computation runs there.
if (memspace == memory::DEVICE)
{
  x->syncData(memory::DEVICE);
}

The important distinction is that allocation and operation are separate steps. Allocating the vector controls where the data is stored. Calling a handler method controls what operation is performed on the data.

This is useful because the same vector object may be part of a CPU test path or a GPU test path, depending on how it is allocated, synchronized, and passed to backend-specific operations.
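
The split between allocating and synchronizing can be illustrated with a toy model. The sketch below is not the Re::Solve vector class: MirroredVector, MemorySpace, setHostValue, deviceIsCurrent, and deviceValue are hypothetical names, and the "device" is simulated with a second host buffer so the example runs anywhere. It assumes, as the example above suggests, that synchronizing to the device creates the device copy when needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: "device" memory is simulated with a second host buffer.
// Method names echo the text, but this is NOT the Re::Solve implementation.
enum class MemorySpace { HOST, DEVICE };

class MirroredVector
{
public:
  explicit MirroredVector(std::size_t n) : n_(n) {}

  // Reserve zero-initialized storage in one memory space; no data is copied.
  void allocate(MemorySpace space)
  {
    if (space == MemorySpace::HOST)
      host_.assign(n_, 0.0);
    else
      device_.assign(n_, 0.0);
  }

  // Write an entry on the host. Any device copy is now out of date.
  void setHostValue(std::size_t i, double value)
  {
    assert(i < host_.size());
    host_[i] = value;
    device_current_ = false;
  }

  // Bring the requested space up to date. Only host-to-device is modeled.
  void syncData(MemorySpace target)
  {
    if (target != MemorySpace::DEVICE) return;
    assert(host_.size() == n_); // the host copy must exist before it can be copied
    device_ = host_;            // stands in for a host-to-device transfer
    device_current_ = true;
  }

  bool deviceIsCurrent() const { return device_current_; }

  // Read from the simulated device copy; only valid after a sync.
  double deviceValue(std::size_t i) const
  {
    assert(device_current_ && i < device_.size());
    return device_[i];
  }

private:
  std::size_t n_;
  std::vector<double> host_;
  std::vector<double> device_;
  bool device_current_ = false;
};
```

In this model, writing on the host leaves the device copy stale until syncData(MemorySpace::DEVICE) is called, which is the situation the synchronization step in the text guards against.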

Matrix Objects

Matrix objects represent matrix data. Like vector objects, matrix objects are data containers. They store or describe the matrix data, but they do not perform matrix operations by themselves.

Sparse matrices are commonly stored in compressed sparse formats, such as CSR (compressed sparse row) and CSC (compressed sparse column). These formats store only the nonzero values of a sparse matrix along with index information that describes where those values belong.
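
A small worked example may make the CSR layout concrete. The sketch below uses a hypothetical CsrData struct with illustrative field names (not the Re::Solve class) to store a 3 x 4 matrix with five nonzeros, and a helper entryAt that looks up a single entry.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative CSR container; field names are hypothetical, not Re::Solve's.
struct CsrData
{
  std::size_t num_rows;
  std::size_t num_cols;
  std::vector<std::size_t> row_ptr; // size num_rows + 1; row i spans [row_ptr[i], row_ptr[i + 1])
  std::vector<std::size_t> col_idx; // size nnz; column of each stored value
  std::vector<double> values;       // size nnz; stored values, row by row
};

// Build the CSR form of the 3 x 4 matrix
//   [ 10  0  0 20 ]
//   [  0 30  0  0 ]
//   [  0  0 40 50 ]
inline CsrData makeExample()
{
  return CsrData{3, 4, {0, 2, 3, 5}, {0, 3, 1, 2, 3}, {10.0, 20.0, 30.0, 40.0, 50.0}};
}

// Look up entry (i, j). Entries that are not stored are zero.
inline double entryAt(const CsrData& A, std::size_t i, std::size_t j)
{
  assert(i < A.num_rows && j < A.num_cols);
  for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
  {
    if (A.col_idx[k] == j) return A.values[k];
  }
  return 0.0;
}
```

Only five values are stored for twelve matrix positions; the row-pointer array records where each row starts, so row 2 occupies positions 3 and 4 of the column and value arrays.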

In Re::Solve, matrix objects such as matrix::Csr store sparse matrix data. A CSR matrix stores the matrix dimensions, the nonzero count, and three arrays: row pointers, column indices, and nonzero values. Like vectors, a matrix object must be allocated in a memory space before it is used.

Simplified example:

matrix::Csr* A = new matrix::Csr(num_rows, num_cols, nnz);
A->allocateMatrixData(memory::HOST);

In file-loading paths, matrix data may need to be loaded into host memory first. For example, Matrix Market file readers write into host-accessible memory. If the test is running on a GPU backend, the matrix can then be synchronized to device memory.

SCCG test path example:

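// File readers write into host memory, so allocate and load on the host first.
matrix::Csr* h = new matrix::Csr(2278, 2278, 11304, true, false);
h->allocateMatrixData(memory::HOST);
io::updateMatrixFromFile(h_file, h);

// For GPU backends, copy the loaded matrix to device memory.
if (memspace_ == memory::DEVICE)
{
  h->syncData(memory::DEVICE);
}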

This pattern matters because the memory space used for loading data is not always the same as the memory space used for computation.

Vector Handlers

A VectorHandler performs operations on vector::Vector objects. It does not replace the vector class. Instead, it provides backend-specific operations that act on existing vector data.

A VectorHandler may perform operations such as:

  • dot

  • scal

  • axpy

A useful way to think about the difference is:

  • vector::Vector stores the vector data.

  • VectorHandler performs vector operations on that data.

For example, a vector object may hold the entries of a residual vector, while a vector handler may compute a dot product (dot), scale the vector (scal), or add a scaled vector to another vector (axpy).
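
The three operations follow the usual BLAS level-1 meaning. The sketch below gives host-only reference versions on std::vector so the arithmetic is unambiguous. The free-function signatures are assumptions made for illustration; the actual VectorHandler methods take Re::Solve vector objects and may take additional arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only reference semantics. Function names follow the operation names in
// the text, but these signatures are assumptions for illustration.

// dot: returns the sum over i of x[i] * y[i].
inline double dot(const std::vector<double>& x, const std::vector<double>& y)
{
  assert(x.size() == y.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
  return sum;
}

// scal: overwrites x with alpha * x.
inline void scal(double alpha, std::vector<double>& x)
{
  for (double& xi : x) xi *= alpha;
}

// axpy: overwrites y with alpha * x + y.
inline void axpy(double alpha, const std::vector<double>& x, std::vector<double>& y)
{
  assert(x.size() == y.size());
  for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
}
```

Note that scal and axpy modify one of their arguments in place, while dot only reads its inputs and returns a scalar.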

Matrix Handlers

A MatrixHandler performs operations on matrix objects such as matrix::Csr. It does not replace the matrix class. Instead, it provides backend-specific matrix operations that act on existing matrix data.

A MatrixHandler may perform operations such as:

  • matvec

  • transpose

A useful way to separate the roles is that matrix::Csr stores the data, while MatrixHandler performs matrix operations on that data. For example, a matrix object may hold the CSR representation of a sparse matrix, while a matrix handler may perform a sparse matrix-vector product or construct a transpose.
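
As a rough host-only illustration of what these two operations compute, the sketch below implements a CSR matrix-vector product and a CSR transpose. CsrData and both function signatures are hypothetical and chosen for brevity; they are not the MatrixHandler interface, and a GPU backend would typically delegate this work to a vendor sparse library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative CSR container; field names are hypothetical, not Re::Solve's.
struct CsrData
{
  std::size_t num_rows;
  std::size_t num_cols;
  std::vector<std::size_t> row_ptr; // size num_rows + 1
  std::vector<std::size_t> col_idx; // size nnz
  std::vector<double> values;       // size nnz
};

// matvec: returns y = A * x.
inline std::vector<double> matvec(const CsrData& A, const std::vector<double>& x)
{
  assert(x.size() == A.num_cols);
  std::vector<double> y(A.num_rows, 0.0);
  for (std::size_t i = 0; i < A.num_rows; ++i)
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      y[i] += A.values[k] * x[A.col_idx[k]];
  return y;
}

// transpose: returns the CSR form of A^T using a counting pass over columns.
inline CsrData transpose(const CsrData& A)
{
  const std::size_t nnz = A.values.size();
  CsrData T{A.num_cols, A.num_rows,
            std::vector<std::size_t>(A.num_cols + 1, 0),
            std::vector<std::size_t>(nnz),
            std::vector<double>(nnz)};

  // Count the entries in each column of A, then prefix-sum into row starts of T.
  for (std::size_t c : A.col_idx) ++T.row_ptr[c + 1];
  for (std::size_t j = 0; j < A.num_cols; ++j) T.row_ptr[j + 1] += T.row_ptr[j];

  // Scatter each entry of A into the next free slot of its row in T.
  std::vector<std::size_t> next(T.row_ptr.begin(), T.row_ptr.end() - 1);
  for (std::size_t i = 0; i < A.num_rows; ++i)
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
    {
      const std::size_t dst = next[A.col_idx[k]]++;
      T.col_idx[dst] = i;
      T.values[dst] = A.values[k];
    }
  return T;
}
```

The transpose of a num_rows x num_cols matrix has num_cols rows, so a vector accepted by matvec on the original matrix generally has a different length from one accepted by matvec on its transpose.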

Handler Setup

Handlers are created using a workspace for the selected backend. A simplified setup pattern is:

WorkspaceType workspace;
workspace.initializeHandles();

MatrixHandler matrix_handler(&workspace);
VectorHandler vector_handler(&workspace);

The handler uses the workspace that was created for the selected backend. This is why backend-capable solver code should generally receive the correct handlers from the caller instead of creating a hard-coded CPU, CUDA, or HIP handler internally.

Workspaces

Workspace classes provide the backend-specific resources needed by handlers. A CPU workspace, CUDA workspace, and HIP workspace may initialize different backend handles or library resources.

The general setup is:

  1. Create the workspace for the selected backend.

  2. Initialize the workspace handles.

  3. Create matrix and vector handlers using that workspace.

  4. Pass those handlers into the solver or test fixture.

Simplified SCCG setup example:

WorkspaceType workspace;
workspace.initializeHandles();

MatrixHandler matrix_handler(&workspace);
VectorHandler vector_handler(&workspace);

HykktSchurComplementConjugateGradientTests test(memspace,
                                                matrix_handler,
                                                vector_handler);

This keeps the solver or test fixture from being tied to only one backend.
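
One way to picture the relationship between a workspace and a handler is a handler whose constructor is overloaded on the workspace type. The sketch below is a toy under that assumption: WorkspaceSketch, CpuWorkspaceSketch, GpuWorkspaceSketch, and HandlerSketch are hypothetical names, no real backend handles are created, and the actual Re::Solve classes may be organized differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; a real workspace would own backend library handles.
struct WorkspaceSketch
{
  bool initialized = false;
  void initializeHandles() { initialized = true; } // placeholder for handle creation
};
struct CpuWorkspaceSketch : WorkspaceSketch {};
struct GpuWorkspaceSketch : WorkspaceSketch {};

// The handler records which backend it was built for and keeps a pointer to
// the workspace that provides its resources.
class HandlerSketch
{
public:
  explicit HandlerSketch(CpuWorkspaceSketch* ws) : workspace_(ws), backend_("cpu")
  {
    assert(ws != nullptr);
  }
  explicit HandlerSketch(GpuWorkspaceSketch* ws) : workspace_(ws), backend_("gpu")
  {
    assert(ws != nullptr);
  }

  // Operations should only run once the workspace handles exist.
  bool ready() const { return workspace_->initialized; }
  const std::string& backend() const { return backend_; }

private:
  const WorkspaceSketch* workspace_;
  std::string backend_;
};
```

The backend is decided once, where the workspace is created. Code that later receives the handler does not need to know which workspace type was used.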

Principle of Operation

The basic flow for backend-capable code is:

  1. Create or load vector and matrix data.

  2. Allocate that data in the correct memory space.

  3. If data is loaded on the host and used on the device, synchronize it to the device.

  4. Create the backend workspace.

  5. Create handlers from that workspace.

  6. Pass the handlers into the solver or test path.

  7. Use the handlers to perform vector and matrix operations.

This flow keeps the data, operation, and backend setup separate. It also makes it easier to identify whether a problem is caused by data storage, memory movement, backend setup, or the solver algorithm itself.

Re::Solve Context

Re::Solve examples are designed around repeated linear solver use cases. The public Re::Solve documentation describes examples that emulate a nonlinear solver calling the linear solver repeatedly. This matters because repeated solver calls can make setup cost, memory movement, and backend resource management important.

The public HyKKT documentation describes HyKKT as a solver for Karush-Kuhn-Tucker systems that can use hardware accelerators efficiently. The HyKKT description also explains that the solver uses block reduction and conjugate gradient on the Schur complement.

This background is useful for understanding why the SCCG path needs careful handling of matrix dimensions, memory spaces, and backend-specific handlers.

SCCG Example

SCCG stands for Schur Complement Conjugate Gradient. The SCCG test path is a useful example because it uses vector objects, matrix objects, vector handlers, matrix handlers, workspaces, and memory spaces together.

In the SCCG test path, the matrices are represented with matrix::Csr objects. This makes SCCG a useful example of how data containers and operation handlers work together in a backend-capable solver path.

SCCG uses a Schur complement structure. In the test path, the matrices do not all have the same dimensions, and this is expected.

The main matrices are:

  • H: a square matrix used in the inner solve.

  • Jc: a rectangular matrix.

  • Jc_tr: the transpose of Jc.

A simplified operation chain is:

  1. Multiply by Jc_tr.

  2. Solve with H.

  3. Multiply by Jc.

Because of this structure, not every temporary vector has the same size. Some vectors match the outer system dimension. Other vectors match the inner solve dimension. The important requirement is that each matrix and vector matches the operation being performed.

The same requirement applies beyond dimensions: the memory spaces and handlers must also match the part of the solver path where they are used.
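
The dimension bookkeeping in the operation chain can be sketched with small dense matrices. Everything here is an illustrative assumption: Dense, multiply, solveDiagonal, and applySchurOperator are hypothetical names, H is taken to be diagonal so that the inner solve is a simple division, and dense storage replaces matrix::Csr to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Dense row-major matrix, used only to keep the sketch short.
struct Dense
{
  std::size_t rows;
  std::size_t cols;
  std::vector<double> a; // size rows * cols
};

// Returns M * x; the input length must equal the number of columns.
inline Vec multiply(const Dense& M, const Vec& x)
{
  assert(x.size() == M.cols);
  Vec y(M.rows, 0.0);
  for (std::size_t i = 0; i < M.rows; ++i)
    for (std::size_t j = 0; j < M.cols; ++j)
      y[i] += M.a[i * M.cols + j] * x[j];
  return y;
}

// Assumption for brevity: H is diagonal, so solving H z = rhs is a division.
inline Vec solveDiagonal(const Vec& h_diag, const Vec& rhs)
{
  assert(h_diag.size() == rhs.size());
  Vec z(rhs.size());
  for (std::size_t i = 0; i < rhs.size(); ++i) z[i] = rhs[i] / h_diag[i];
  return z;
}

// Applies the chain from the text: multiply by Jc_tr, solve with H, multiply by Jc.
inline Vec applySchurOperator(const Dense& Jc, const Dense& Jc_tr,
                              const Vec& h_diag, const Vec& x)
{
  assert(Jc_tr.rows == Jc.cols && Jc_tr.cols == Jc.rows);
  assert(h_diag.size() == Jc.cols);
  Vec t = multiply(Jc_tr, x);       // outer dimension -> inner dimension
  Vec z = solveDiagonal(h_diag, t); // stays in the inner dimension
  return multiply(Jc, z);           // inner dimension -> outer dimension
}
```

With a 2 x 3 Jc, the input and output vectors have length 2 while the two temporaries have length 3, which is why temporary vectors in such a path cannot all share one size.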

Important Implementation Detail

One important detail in the SCCG test path is that the Matrix Market file readers write into host-accessible memory. This means the test data should be loaded into memory::HOST first.

For GPU backends, the data should then be synchronized to memory::DEVICE. This avoids trying to load file data directly into device memory when the file reader expects host-accessible memory.

The pattern is:

  1. Allocate in memory::HOST.

  2. Load the file data.

  3. If the computation runs in memory::DEVICE, synchronize the data to device memory.

This applies to both matrix and vector test data.

Why Solver Paths Receive Handlers

Solver paths that support multiple backends should receive backend-specific handlers from the caller because the caller knows which backend is being used. If a solver creates its own handler internally, it can accidentally create a handler for the wrong backend.

The safer design is:

  • The caller or test runner selects the backend.

  • The caller or test runner creates the correct workspace.

  • The caller or test runner creates the correct matrix and vector handlers.

  • The solver receives and uses those handlers.

In the SCCG path, this allows the same solver code to work with CPU, CUDA, and HIP backends.
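
The design can be illustrated with a small conjugate gradient routine that only ever talks to handler interfaces passed in by the caller. This is a minimal sketch under stated assumptions, not the SCCG implementation: VectorOps, MatrixOps, HostVectorOps, HostMatrixOps, DenseMatrix, and conjugateGradient are hypothetical names, the matrix is dense and symmetric positive definite, and only a host backend is provided.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the Re::Solve classes.
using Vec = std::vector<double>;

struct DenseMatrix
{
  std::size_t n;         // square dimension
  std::vector<double> a; // row-major entries, size n * n
};

// Operations the solver needs. A backend-specific handler would implement
// these with CPU, CUDA, or HIP kernels.
struct VectorOps
{
  virtual ~VectorOps() = default;
  virtual double dot(const Vec& x, const Vec& y) const = 0;
  virtual void scal(double alpha, Vec& x) const = 0;               // x = alpha * x
  virtual void axpy(double alpha, const Vec& x, Vec& y) const = 0; // y = alpha * x + y
};

struct MatrixOps
{
  virtual ~MatrixOps() = default;
  virtual void matvec(const DenseMatrix& A, const Vec& x, Vec& y) const = 0; // y = A * x
};

// Host reference backend.
struct HostVectorOps : VectorOps
{
  double dot(const Vec& x, const Vec& y) const override
  {
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) sum += x[i] * y[i];
    return sum;
  }
  void scal(double alpha, Vec& x) const override
  {
    for (double& xi : x) xi *= alpha;
  }
  void axpy(double alpha, const Vec& x, Vec& y) const override
  {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
  }
};

struct HostMatrixOps : MatrixOps
{
  void matvec(const DenseMatrix& A, const Vec& x, Vec& y) const override
  {
    assert(x.size() == A.n && y.size() == A.n);
    for (std::size_t i = 0; i < A.n; ++i)
    {
      y[i] = 0.0;
      for (std::size_t j = 0; j < A.n; ++j) y[i] += A.a[i * A.n + j] * x[j];
    }
  }
};

// The solver receives its handlers from the caller and never constructs one.
// Returns the iteration at which the residual norm was seen below tol,
// or max_iter if that was never observed.
inline int conjugateGradient(const MatrixOps& mh, const VectorOps& vh,
                             const DenseMatrix& A, const Vec& b, Vec& x,
                             double tol, int max_iter)
{
  Vec Ap(b.size(), 0.0);
  Vec r = b;
  mh.matvec(A, x, Ap);
  vh.axpy(-1.0, Ap, r); // r = b - A * x
  Vec p = r;
  double rs = vh.dot(r, r);
  for (int k = 0; k < max_iter; ++k)
  {
    if (std::sqrt(rs) < tol) return k;
    mh.matvec(A, p, Ap);
    const double alpha = rs / vh.dot(p, Ap);
    vh.axpy(alpha, p, x);   // x = x + alpha * p
    vh.axpy(-alpha, Ap, r); // r = r - alpha * A * p
    const double rs_new = vh.dot(r, r);
    vh.scal(rs_new / rs, p); // p = beta * p
    vh.axpy(1.0, r, p);      // p = r + beta * p
    rs = rs_new;
  }
  return max_iter;
}
```

Because conjugateGradient depends only on the two interfaces, a caller that selected a different backend could pass different implementations without any change to the solver body.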

Inputs and Outputs

The main inputs to this code pattern are:

  • Matrix and vector data.

  • A selected memory space, such as memory::HOST or memory::DEVICE.

  • A backend workspace.

  • Matrix and vector handlers.

  • Solver-specific data, such as matrix dimensions and solver tolerance.

The main outputs are:

  • Correctly allocated and synchronized data.

  • Backend-specific matrix and vector operations.

  • A solver path that can run on more than one backend.

  • A clearer separation between storage, computation, and backend resources.

Common Details to Watch For

The following points may not be clear when first reading this part of the code:

  • File readers may require host-accessible memory.

  • Loading data and using data may happen in different memory spaces.

  • A vector::Vector or matrix::Csr object stores data, while a handler performs an operation.

  • A workspace provides backend-specific resources for handlers.

  • A solver that supports multiple backends should receive backend-specific handlers from the caller instead of creating a hard-coded backend handler internally.

  • Rectangular matrices can be expected in SCCG because the Schur complement path uses different inner and outer dimensions.

  • For GPU tests, loading into memory::HOST first and then synchronizing to memory::DEVICE may be necessary.

  • A test that passes on CPU may still expose memory-space or backend-handler issues on CUDA or HIP.

Checklist for Backend-Capable Code

When writing or reviewing code that should work on CPU and GPU backends, check the following:

  • Is the object allocated before it is used?

  • Is the object allocated in the memory space expected by the next operation?

  • If data was loaded on the host, is it synchronized to the device before GPU operations?

  • Are the matrix and vector dimensions consistent with the operation chain?

  • Are the handlers created from the correct backend workspace?

  • Is the solver receiving backend-specific handlers from the caller?

Suggested Validation

When changing code that uses these classes and handlers, it is useful to test the relevant CPU and GPU paths when the local environment supports them. For an SCCG-related change, this may include building the CPU and CUDA configurations and running the SCCG test executable.

Example commands may vary by environment, but the basic checks are:

cmake --build build-cpu
./build-cpu/tests/unit/hykkt/hykkt_sccg_test

cmake --build build-cuda
./build-cuda/tests/unit/hykkt/hykkt_sccg_test

System Analysis

The main purpose of this structure is to make backend-capable solver code easier to reason about. The vector and matrix classes provide the data storage. The handlers provide the operations. The workspace provides backend resources. The memory space describes where the data lives and where operations should occur.

This separation is especially useful for solver code that needs to work across CPU, CUDA, and HIP. It reduces the chance that solver code will accidentally use a CPU-specific handler inside a GPU path. It also makes the memory movement more explicit when data is loaded on the host and then used on the device.

In the SCCG test path, this structure helps explain why the test loads data into host memory first, why it synchronizes to device memory for GPU backends, and why SCCG receives matrix and vector handlers from the caller.

This design also fits the larger Re::Solve and HyKKT motivation. Public ORNL and Re::Solve materials describe GPU-resident linear solvers as useful in scientific computing and optimization workflows where linear solves can dominate runtime. In those workflows, keeping data movement and backend operations organized is part of making the solver path practical on modern CPU and GPU systems.

Further Reading