Vector and Matrix Classes and Handlers¶
Background¶
The purpose of this page is to help new developers understand how Re::Solve
separates data containers, operation handlers, backend workspaces, and memory
spaces. In particular, it explains the difference between vector and matrix
classes and the VectorHandler and MatrixHandler classes that operate on
them.
The main distinction is that vector and matrix classes store data, while handler classes perform operations on that data. This distinction is important when writing code that must run on different backends (e.g., CPU, CUDA, and HIP).
This separation allows solver logic to remain independent of backend-specific vector and matrix operations.
The main questions this page is meant to answer are:
What object stores the data?
Where does the data live?
What object performs the operation?
What backend resources does the operation need?
What is the difference between a vector or matrix class and a vector or matrix handler?
What needs to happen when data is loaded on the host but used on the device?
This page is not meant to document every method in detail. It is meant to give a practical mental model for reading and writing backend-capable Re::Solve code.
Core Design¶
The main design idea is that Re::Solve separates storage, operations, and backend resources.
The major pieces are:
vector::Vector objects store vector data.
Matrix objects, such as matrix::Csr, store sparse matrix data.
VectorHandler objects perform vector operations.
MatrixHandler objects perform matrix operations.
LinAlgWorkspace objects provide backend-specific resources for the handlers.
memory::HOST describes data stored in host-accessible memory.
memory::DEVICE describes data stored in device-accessible memory.
This means that a vector or matrix object is not, by itself, tied to CPU or GPU execution. The data object stores the values. The handler performs the operation. The workspace gives the handler the backend resources it needs.
This separation helps the same solver path run with different backend implementations.
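The three roles can be illustrated with a small, self-contained mock. The types in the sketch namespace below (Vector, Workspace, VectorHandler) are hypothetical stand-ins written for this page. They are not the Re::Solve classes, their signatures are assumptions, and they only model a host backend.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch
{
  // Data container: owns values and performs no arithmetic itself.
  struct Vector
  {
    explicit Vector(std::size_t n) : data(n, 0.0) {}
    std::vector<double> data;
  };

  // Backend resources. A real workspace would hold library handles;
  // this mock only tracks whether setup has happened.
  struct Workspace
  {
    bool initialized = false;
    void initializeHandles() { initialized = true; }
  };

  // Operation provider: acts on existing Vector data through a workspace.
  class VectorHandler
  {
  public:
    explicit VectorHandler(Workspace* workspace) : workspace_(workspace)
    {
      assert(workspace_ != nullptr);
    }

    // Scales x in place. Returns false if the workspace is not ready.
    bool scal(double alpha, Vector& x) const
    {
      if (!workspace_->initialized)
      {
        return false;
      }
      for (double& value : x.data)
      {
        value *= alpha;
      }
      return true;
    }

  private:
    Workspace* workspace_;
  };
} // namespace sketch
```

In this mock, the vector never changes unless a handler is asked to act on it, and the handler refuses to act until its workspace has been initialized.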
Vector Objects¶
A vector::Vector object represents vector data. The vector object is a data
container. It stores the size of the vector and the data associated with that
vector. Before a vector is used, it must be allocated in a memory space.
Simplified example:
vector::Vector* x = new vector::Vector(n);
x->allocate(memory::HOST);
If the vector is loaded or initialized on the host but later used by a GPU backend, the data may need to be synchronized to the device.
Simplified example:
if (memspace == memory::DEVICE)
{
  x->syncData(memory::DEVICE);
}
The important distinction is that allocation and operation are separate steps. Allocating the vector controls where the data is stored. Calling a handler method controls what operation is performed on the data.
This is useful because the same vector object may be part of a CPU test path or a GPU test path, depending on how it is allocated, synchronized, and passed to backend-specific operations.
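The allocate-then-synchronize pattern can be modeled without a GPU. The sketch below is a hypothetical mock, not the Re::Solve implementation: the "device" buffer is an ordinary std::vector, the sketch::Vector class and the fillOnHostThenSync helper are invented for illustration, and the fill loop stands in for a file reader.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch
{
  enum class MemorySpace { HOST, DEVICE };

  // Mock vector with one buffer per memory space.
  class Vector
  {
  public:
    explicit Vector(std::size_t n) : n_(n) {}

    // Reserve storage for n entries in the requested space.
    void allocate(MemorySpace space) { buffer(space).assign(n_, 0.0); }

    // Copy the data held in the other space into `target`.
    void syncData(MemorySpace target)
    {
      const MemorySpace source = (target == MemorySpace::DEVICE)
                                     ? MemorySpace::HOST
                                     : MemorySpace::DEVICE;
      assert(buffer(source).size() == n_); // source must be allocated
      buffer(target) = buffer(source);
    }

    std::vector<double>& data(MemorySpace space) { return buffer(space); }

  private:
    std::vector<double>& buffer(MemorySpace space)
    {
      return (space == MemorySpace::HOST) ? host_ : device_;
    }

    std::size_t         n_;
    std::vector<double> host_;
    std::vector<double> device_; // simulated device memory
  };

  // Fill on the host, then make the data available where it will be used.
  inline void fillOnHostThenSync(Vector& x, MemorySpace memspace)
  {
    x.allocate(MemorySpace::HOST);
    std::vector<double>& host = x.data(MemorySpace::HOST);
    for (std::size_t i = 0; i < host.size(); ++i)
    {
      host[i] = static_cast<double>(i + 1); // stands in for a file reader
    }
    if (memspace == MemorySpace::DEVICE)
    {
      x.syncData(MemorySpace::DEVICE);
    }
  }
} // namespace sketch
```

When the selected memory space is the host, the mock device buffer stays empty, which mirrors the point that synchronization is only needed when the data will be used on the device.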
Matrix Objects¶
Matrix objects represent matrix data. Like vector objects, matrix objects are data containers. They store or describe the matrix data, but they do not perform matrix operations by themselves.
Sparse matrices are commonly stored in compressed sparse formats, such as CSR (compressed sparse row) and CSC (compressed sparse column). These formats store only the nonzero values of a sparse matrix along with index information that describes where those values belong.
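As a concrete illustration of the CSR layout, the sketch below stores a small 3x3 matrix in three arrays. The struct and field names (CsrMatrix, row_ptr, col_idx, values) and the entry helper are hypothetical and chosen for readability; they are not the member names of matrix::Csr.

```cpp
#include <cassert>
#include <vector>

namespace sketch
{
  // CSR storage for the 3x3 matrix
  //   [ 10   0  20 ]
  //   [  0  30   0 ]
  //   [ 40   0  50 ]
  struct CsrMatrix
  {
    int                 num_rows = 3;
    int                 num_cols = 3;
    // row_ptr[i] .. row_ptr[i + 1] is the range of stored entries in row i.
    std::vector<int>    row_ptr  = {0, 2, 3, 5};
    // Column index of each stored entry, row by row.
    std::vector<int>    col_idx  = {0, 2, 1, 0, 2};
    // Value of each stored entry, in the same order as col_idx.
    std::vector<double> values   = {10.0, 20.0, 30.0, 40.0, 50.0};
  };

  // Look up entry (i, j); entries that are not stored are zero.
  inline double entry(const CsrMatrix& A, int i, int j)
  {
    assert(i >= 0 && i < A.num_rows && j >= 0 && j < A.num_cols);
    for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
    {
      if (A.col_idx[k] == j)
      {
        return A.values[k];
      }
    }
    return 0.0;
  }
} // namespace sketch
```

Only five of the nine entries are stored, and the row pointer array has one more element than the number of rows so that the last row's range is well defined.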
In Re::Solve, matrix objects such as matrix::Csr store sparse matrix data.
A CSR matrix stores the matrix dimensions, nonzero count, and sparse matrix
data. Like vectors, a matrix object must be allocated in a memory space before
it is used.
Simplified example:
matrix::Csr* A = new matrix::Csr(num_rows, num_cols, nnz);
A->allocateMatrixData(memory::HOST);
In file-loading paths, matrix data may need to be loaded into host memory first. For example, Matrix Market file readers write into host-accessible memory. If the test is running on a GPU backend, the matrix can then be synchronized to device memory.
SCCG test path example:
matrix::Csr* h = new matrix::Csr(2278, 2278, 11304, true, false);
h->allocateMatrixData(memory::HOST);
io::updateMatrixFromFile(h_file, h);
if (memspace_ == memory::DEVICE)
{
  h->syncData(memory::DEVICE);
}
This pattern matters because the memory space used for loading data is not always the same as the memory space used for computation.
Vector Handlers¶
A VectorHandler performs operations on vector::Vector objects. It does not replace
the vector class. Instead, it provides backend-specific operations
that act on existing vector data.
A VectorHandler may perform operations such as:
dot
scal
axpy
A useful way to separate the roles is that vector::Vector stores the data, while VectorHandler performs operations on that data.
For example, a vector object may hold the entries of a residual vector, while a
vector handler may compute a dot product, scale the vector, or add one vector
to another.
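For readers unfamiliar with the operation names, the sketch below gives host-only reference versions using the standard BLAS meanings of dot, scal, and axpy. These free functions act on std::vector and are assumptions for illustration; the actual VectorHandler method signatures may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch
{
  using Vec = std::vector<double>;

  // dot: returns the sum of x[i] * y[i].
  inline double dot(const Vec& x, const Vec& y)
  {
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
      sum += x[i] * y[i];
    }
    return sum;
  }

  // scal: x <- alpha * x.
  inline void scal(double alpha, Vec& x)
  {
    for (double& value : x)
    {
      value *= alpha;
    }
  }

  // axpy: y <- alpha * x + y.
  inline void axpy(double alpha, const Vec& x, Vec& y)
  {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i)
    {
      y[i] += alpha * x[i];
    }
  }
} // namespace sketch
```

A backend-specific handler would provide the same mathematical results while running the loops on the selected device.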
Matrix Handlers¶
A MatrixHandler performs operations on matrix objects such as matrix::Csr. It does not replace
the matrix class. Instead, it provides backend-specific matrix operations that
act on existing matrix data.
A MatrixHandler may perform operations such as:
matvec
transpose
A useful way to separate the roles is that matrix::Csr stores the data, while MatrixHandler performs matrix operations on that data.
For example, a matrix object may hold the CSR representation of a sparse
matrix, while a matrix handler may perform a sparse matrix-vector product or
construct a transpose.
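The two operations named above can be sketched on the host for a CSR matrix. The CsrMatrix struct and the matvec and transpose functions below are hypothetical reference versions written for this page; they are not the MatrixHandler API, and a GPU backend would typically delegate this work to a vendor sparse library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch
{
  struct CsrMatrix
  {
    int                 num_rows;
    int                 num_cols;
    std::vector<int>    row_ptr;
    std::vector<int>    col_idx;
    std::vector<double> values;
  };

  // matvec: returns y = A * x.
  inline std::vector<double> matvec(const CsrMatrix& A, const std::vector<double>& x)
  {
    assert(x.size() == static_cast<std::size_t>(A.num_cols));
    std::vector<double> y(static_cast<std::size_t>(A.num_rows), 0.0);
    for (int i = 0; i < A.num_rows; ++i)
    {
      for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      {
        y[i] += A.values[k] * x[A.col_idx[k]];
      }
    }
    return y;
  }

  // transpose: builds the CSR form of A^T with a counting pass and a scatter.
  inline CsrMatrix transpose(const CsrMatrix& A)
  {
    CsrMatrix At{};
    At.num_rows = A.num_cols;
    At.num_cols = A.num_rows;
    At.row_ptr.assign(static_cast<std::size_t>(At.num_rows) + 1, 0);
    At.col_idx.resize(A.values.size());
    At.values.resize(A.values.size());

    // Count the entries in each column of A (each row of A^T).
    for (int c : A.col_idx)
    {
      ++At.row_ptr[c + 1];
    }
    // Prefix sum turns the counts into row start offsets.
    for (int r = 0; r < At.num_rows; ++r)
    {
      At.row_ptr[r + 1] += At.row_ptr[r];
    }
    // Scatter each entry of A into the next free slot of its new row.
    std::vector<int> next(At.row_ptr.begin(), At.row_ptr.end() - 1);
    for (int i = 0; i < A.num_rows; ++i)
    {
      for (int k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      {
        const int dest   = next[A.col_idx[k]]++;
        At.col_idx[dest] = i;
        At.values[dest]  = A.values[k];
      }
    }
    return At;
  }
} // namespace sketch
```

Neither function modifies the input matrix, which matches the idea that the matrix object holds data and the operation is supplied separately.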
Handler Setup¶
Handlers are created using a workspace for the selected backend. A simplified setup pattern is:
WorkspaceType workspace;
workspace.initializeHandles();
MatrixHandler matrix_handler(&workspace);
VectorHandler vector_handler(&workspace);
The handler uses the workspace that was created for the selected backend. This is why backend-capable solver code should generally receive the correct handlers from the caller instead of creating a hard-coded CPU, CUDA, or HIP handler internally.
Workspaces¶
Workspace classes provide the backend-specific resources needed by handlers. A CPU workspace, CUDA workspace, and HIP workspace may initialize different backend handles or library resources.
The general setup is:
Create the workspace for the selected backend.
Initialize the workspace handles.
Create matrix and vector handlers using that workspace.
Pass those handlers into the solver or test fixture.
Simplified SCCG setup example:
WorkspaceType workspace;
workspace.initializeHandles();
MatrixHandler matrix_handler(&workspace);
VectorHandler vector_handler(&workspace);
HykktSchurComplementConjugateGradientTests test(memspace,
                                                matrix_handler,
                                                vector_handler);
This keeps the solver or test fixture from being tied to only one backend.
Principle of Operation¶
The basic flow for backend-capable code is:
Create or load vector and matrix data.
Allocate that data in the correct memory space.
If data is loaded on the host and used on the device, synchronize it to the device.
Create the backend workspace.
Create handlers from that workspace.
Pass the handlers into the solver or test path.
Use the handlers to perform vector and matrix operations.
This flow keeps the data, operation, and backend setup separate. It also makes it easier to identify whether a problem is caused by data storage, memory movement, backend setup, or the solver algorithm itself.
Re::Solve Context¶
Re::Solve examples are designed around repeated linear solver use cases. The public Re::Solve documentation describes examples that emulate a nonlinear solver calling the linear solver repeatedly. This matters because repeated solver calls can make setup cost, memory movement, and backend resource management important.
The public HyKKT documentation describes HyKKT as a solver for Karush-Kuhn-Tucker systems that can use hardware accelerators efficiently. The HyKKT description also explains that the solver uses block reduction and conjugate gradient on the Schur complement.
This background is useful for understanding why the SCCG path needs careful handling of matrix dimensions, memory spaces, and backend-specific handlers.
SCCG Example¶
SCCG stands for Schur Complement Conjugate Gradient. The SCCG test path is a useful example because it uses vector objects, matrix objects, vector handlers, matrix handlers, workspaces, and memory spaces together.
In the SCCG test path, the matrices are represented with matrix::Csr
objects. This makes SCCG a useful example of how data containers and operation
handlers work together in a backend-capable solver path.
SCCG uses a Schur complement structure. In the test path, the matrices do not all have the same dimensions, and this is expected.
The main matrices are:
H: a square matrix used in the inner solve.
Jc: a rectangular matrix.
Jc_tr: the transpose of Jc.
A simplified operation chain is:
Multiply by Jc_tr.
Solve with H.
Multiply by Jc.
Because of this structure, not every temporary vector has the same size. Some vectors match the outer system dimension. Other vectors match the inner solve dimension. The important requirement is that each matrix and vector matches the operation being performed.
The matrix dimensions, memory spaces, and handlers all need to match the stage of the solver path in which they are used.
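The way vector sizes change along the operation chain can be shown with a small host-only sketch. It rests on several assumptions that are not taken from the SCCG code: Jc is taken to have m rows and n columns, matrices are stored densely to keep the example short, and H is assumed diagonal so that the inner solve becomes an elementwise division. The names Dense, multiply, and applySchurOperator are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch
{
  using Vec = std::vector<double>;

  // Dense row-major matrix, used only to keep the sketch short.
  struct Dense
  {
    std::size_t rows;
    std::size_t cols;
    Vec         a; // rows * cols entries
  };

  inline Vec multiply(const Dense& M, const Vec& x)
  {
    assert(M.a.size() == M.rows * M.cols);
    assert(x.size() == M.cols);
    Vec y(M.rows, 0.0);
    for (std::size_t i = 0; i < M.rows; ++i)
    {
      for (std::size_t j = 0; j < M.cols; ++j)
      {
        y[i] += M.a[i * M.cols + j] * x[j];
      }
    }
    return y;
  }

  // Applies y = Jc * H^{-1} * Jc_tr * x for a diagonal H given by h_diag.
  inline Vec applySchurOperator(const Dense& Jc,
                                const Dense& Jc_tr,
                                const Vec&   h_diag,
                                const Vec&   x)
  {
    assert(Jc_tr.rows == Jc.cols && Jc_tr.cols == Jc.rows);
    assert(h_diag.size() == Jc.cols);

    Vec t = multiply(Jc_tr, x); // outer size m -> inner size n
    for (std::size_t i = 0; i < t.size(); ++i)
    {
      t[i] /= h_diag[i];        // inner solve on a vector of size n
    }
    return multiply(Jc, t);     // inner size n -> outer size m
  }
} // namespace sketch
```

The temporary vector t has the inner dimension n, while the input and output have the outer dimension m, which is why the temporaries in such a chain cannot all share one size.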
Important Implementation Detail¶
One important detail in the SCCG test path is that the Matrix Market file
readers write into host-accessible memory. This means the test data should be
loaded into memory::HOST first.
For GPU backends, the data should then be synchronized to memory::DEVICE.
This avoids trying to load file data directly into device memory when the file
reader expects host-accessible memory.
The pattern is:
Allocate in memory::HOST.
Load the file data.
If running on memory::DEVICE, synchronize to device memory.
This applies to both matrix and vector test data.
Why Solver Paths Receive Handlers¶
Solver paths that support multiple backends should receive backend-specific handlers from the caller because the caller knows which backend is being used. If a solver creates its own handler internally, it can accidentally create a handler for the wrong backend.
The safer design is:
The caller or test runner selects the backend.
The caller or test runner creates the correct workspace.
The caller or test runner creates the correct matrix and vector handlers.
The solver receives and uses those handlers.
In the SCCG path, this allows the same solver code to work with CPU, CUDA, and HIP backends.
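The design choice can be sketched with a solver that accepts a handler from its caller. Everything in this example is hypothetical: the abstract VectorHandler interface, the two handler classes, and the Solver class are illustrative only, and the Re::Solve handlers are not necessarily organized through virtual functions. The "fake device" handler runs on the host and simply counts calls so that the handler in use can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch
{
  using Vec = std::vector<double>;

  // Hypothetical handler interface; each backend supplies its own version.
  class VectorHandler
  {
  public:
    virtual ~VectorHandler() = default;
    virtual std::string backend() const = 0;
    virtual double      dot(const Vec& x, const Vec& y) = 0;
  };

  class HostVectorHandler : public VectorHandler
  {
  public:
    std::string backend() const override { return "cpu"; }

    double dot(const Vec& x, const Vec& y) override
    {
      assert(x.size() == y.size());
      double sum = 0.0;
      for (std::size_t i = 0; i < x.size(); ++i)
      {
        sum += x[i] * y[i];
      }
      return sum;
    }
  };

  // Pretend device backend: same arithmetic on the host, plus a call counter
  // so a test can confirm which handler the solver used.
  class FakeDeviceVectorHandler : public HostVectorHandler
  {
  public:
    std::string backend() const override { return "fake-device"; }

    double dot(const Vec& x, const Vec& y) override
    {
      ++calls;
      return HostVectorHandler::dot(x, y);
    }

    int calls = 0;
  };

  // Solver code names no backend; it uses the handler it was given.
  class Solver
  {
  public:
    explicit Solver(VectorHandler& vector_handler)
      : vector_handler_(vector_handler)
    {
    }

    double normSquared(const Vec& r) { return vector_handler_.dot(r, r); }

    std::string backend() const { return vector_handler_.backend(); }

  private:
    VectorHandler& vector_handler_;
  };
} // namespace sketch
```

Because the Solver class never constructs a handler, switching backends in this sketch requires changing only the caller.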
Inputs and Outputs¶
The main inputs to this code pattern are:
Matrix and vector data.
A selected memory space, such as memory::HOST or memory::DEVICE.
A backend workspace.
Matrix and vector handlers.
Solver-specific data, such as matrix dimensions and solver tolerance.
The main outputs are:
Correctly allocated and synchronized data.
Backend-specific matrix and vector operations.
A solver path that can run on more than one backend.
A clearer separation between storage, computation, and backend resources.
Common Details to Watch For¶
The following points may not be clear when first reading this part of the code:
File readers may require host-accessible memory.
Loading data and using data may happen in different memory spaces.
A vector::Vector or matrix::Csr object stores data, while a handler performs an operation.
A workspace provides backend-specific resources for handlers.
A solver that supports multiple backends should receive backend-specific handlers from the caller instead of creating a hard-coded backend handler internally.
Rectangular matrices can be expected in SCCG because the Schur complement path uses different inner and outer dimensions.
For GPU tests, loading into memory::HOST first and then synchronizing to memory::DEVICE may be necessary.
A test that passes on CPU may still expose memory-space or backend-handler issues on CUDA or HIP.
Checklist for Backend-Capable Code¶
When writing or reviewing code that should work on CPU and GPU backends, check the following:
Is the object allocated before it is used?
Is the object allocated in the memory space expected by the next operation?
If data was loaded on the host, is it synchronized to the device before GPU operations?
Are the matrix and vector dimensions consistent with the operation chain?
Are the handlers created from the correct backend workspace?
Is the solver receiving backend-specific handlers from the caller?
Suggested Validation¶
When changing code that uses these classes and handlers, it is useful to test the relevant CPU and GPU paths when the local environment supports them. For an SCCG-related change, this may include building the CPU and CUDA configurations and running the SCCG test executable.
Example commands may vary by environment, but the basic checks are:
cmake --build build-cpu
./build-cpu/tests/unit/hykkt/hykkt_sccg_test
cmake --build build-cuda
./build-cuda/tests/unit/hykkt/hykkt_sccg_test
System Analysis¶
The main purpose of this structure is to make backend-capable solver code easier to reason about. The vector and matrix classes provide the data storage. The handlers provide the operations. The workspace provides backend resources. The memory space describes where the data lives and where operations should occur.
This separation is especially useful for solver code that needs to work across CPU, CUDA, and HIP. It reduces the chance that solver code will accidentally use a CPU-specific handler inside a GPU path. It also makes the memory movement more explicit when data is loaded on the host and then used on the device.
In the SCCG test path, this structure helps explain why the test loads data into host memory first, why it synchronizes to device memory for GPU backends, and why SCCG receives matrix and vector handlers from the caller.
This design also fits the larger Re::Solve and HyKKT motivation. Public ORNL and Re::Solve materials describe GPU-resident linear solvers as useful in scientific computing and optimization workflows where linear solves can dominate runtime. In those workflows, keeping data movement and backend operations organized is part of making the solver path practical on modern CPU and GPU systems.
Further Reading¶
Shaked Regev, Preconditioning Techniques for Sparse Linear Systems
Katarzyna Swirydowicz, Strategies for Recycling Krylov Subspace Methods and Bilinear Form Estimation
ORNL publication page on GPU-resident sparse direct linear solvers for ACOPF
OSTI paper, Iterative Methods in GPU-Resident Linear Solvers for Nonlinear Constrained Optimization