Amesos2: Reindexing of matrix #13205

maxfirmbach · 2024-07-03T14:48:45Z

Question

Dear Amesos2-developers,

currently I'm trying to switch our in-house code from Amesos to Amesos2.
As, let's call it pre-processing step, we do a reindexing of our linear problem to have a continous GID numbering from 0 ... n, using EpetraExt (which in our case is necessary as we have GID jumps). Is there a similar feature implemented in Xpetra or Tpetra?

I've seen that Amesos2 has a reindex input parameter, but looking into the code, it seems to do ... basically nothing? From documentation:

We plan in the future to support the following parameters:

    "Reindex" : { true | false }. Put the matrix row and column indices into the range [0..n].

Are there any intentions to implement this feature properly into Amesos2?

Best regards,
Max

@trilinos/amesos2
@mayrmt

The text was updated successfully, but these errors were encountered:

csiefer2 · 2024-07-03T14:56:14Z

@ndellingwood @iyamazaki

MalachiTimothyPhillips · 2024-07-03T17:08:16Z

As an aside, I don't think you are required to have continuous GIDs so long as you set the IsContiguous setting to false. For our application code, we've never had to remap our discontinuous GIDs to continuous GIDs.

See, for example: https://github.com/trilinos/Trilinos/blob/master/packages/amesos2/test/solvers/Superlu_UnitTests.cpp#L372.

The one big caveat with that is if you dump the matrix to matrix-market format, you'll need to remap those GIDs to be 1-based index to be in compliance with the matrix market format.
Here's a completely un-tested, un-supported, provided as-is without any warranty Python script that can accomplish that as a post-processing step:

Script:

#!/usr/bin/env python3
import os.path
import scipy.io
import numpy as np
import sys
from itertools import (takewhile, repeat)
import glob
import pathlib
import argparse

def warn_on_missing_file(fileName):
  errString = f"File {fileName} does not exist!"
  print(errString)
  return


def warn_on_existing_file(fileName):
  errString = f"Cowardly refusing to overwrite file {fileName}.\n"
  errString += "Please use --overwrite to allow overwriting to occur, or alter the output matrix filename.\n"
  print(errString)
  return

def draw_progress_bar(percentage, process="", num_bins=20):
  num_draw = int(percentage * num_bins)
  sys.stdout.write('\r')
  sys.stdout.write(
      f"{process} : [{'='*num_draw}{' ' * (num_bins-num_draw)}] {int(100 * percentage)}%")
  sys.stdout.flush()
  return

def grab_line_count(filename):
  f = open(filename, 'rb')
  bufgen = takewhile(lambda x: x, (f.raw.read(1024*1024)
                     for _ in repeat(None)))
  return sum(buf.count(b'\n') for buf in bufgen)


def read_matrix_file(matrixFileName):
  gids = {}
  A_coo = {}
  readSizeLine = False
  update_line_freq = 1000
  num_lines = grab_line_count(matrixFileName)
  header_content = ""

  with open(matrixFileName, "r") as matrixFile:
    for line_num, line in enumerate(matrixFile):
      if line_num % update_line_freq == 0:
        draw_progress_bar((line_num+1)/num_lines, "reading matrix file")
      if "%" in line:
        header_content += line
        continue
      if not readSizeLine:
        header_content += line
        readSizeLine = True
        continue
      i, j, A_ij = line.split()
      i, j = int(i), int(j)
      A_coo[(i, j)] = A_ij
      gids[i] = None
  draw_progress_bar(1, "reading matrix file")
  print()

  rows = {}
  row = 1
  for gid in gids.keys():
    rows[gid] = row
    row = row + 1
  return rows, A_coo, header_content


def write_matrix_file(rows, A_coo, header, outputMatrixFileName):
  update_line_freq = 1000
  newFileContents = header
  num_matrix_entries = len(A_coo)
  for mat_entry, ((gid_i, gid_j), A_ij) in enumerate(A_coo.items()):
    if mat_entry % update_line_freq == 0:
      draw_progress_bar((mat_entry+1)/num_matrix_entries,
                        "creating new matrix file")
    i = rows[gid_i]
    j = rows[gid_j]
    newFileContents += f"{i} {j} {A_ij}\n"
  draw_progress_bar(1, "creating new matrix file")
  print()

  with open(outputMatrixFileName, "w") as newMatrixFile:
    newMatrixFile.write(newFileContents)
  return


def reassign_matrix_rows(matrixFileName, newMatrixFileName):
  rows, A_coo, header = read_matrix_file(matrixFileName)
  write_matrix_file(rows, A_coo, header, newMatrixFileName)
  return

if __name__ == "__main__":
  parser = argparse.ArgumentParser(
      description='Re-assign row ids in matrix market file to force them to be contiguous and 1-base indexed')
  parser.add_argument('-i', '--input-matrix',
                      help='Input matrix-market file', required=True)
  parser.add_argument('-o', '--output-matrix',
                      help='Output matrix-market file', required=True)
  parser.add_argument('--overwrite', action='store_true')
  args = vars(parser.parse_args())
  inputMatrix = args['input_matrix']
  outputMatrix = args['output_matrix']
  
  allGood = True
  if not os.path.isfile(inputMatrix):
    warn_on_missing_file(inputMatrix)
    print("Please check your --input-matrix/-i option.")
    allGood = False

  if os.path.isfile(outputMatrix) and not overwrite:
    warn_on_existing_file(outputMatrix)
    allGood = False

  reassign_matrix_rows(inputMatrix, outputMatrix)

maxfirmbach · 2024-07-04T08:41:11Z

@MalachiTimothyPhillips Interesting! I will try to set the IsContiguous parameter to false, maybe that already solves the problem.

maxfirmbach · 2024-07-09T09:30:30Z

@MalachiTimothyPhillips I checked the flag, but it's not helping in our case ... I still get the same memory errors, if I don't reindex the problem ...

cgcgcg · 2024-07-09T13:38:50Z

@maxfirmbach Can you reproduce the issue with a matrix market file of your matrix and some Trilinos executable?

MalachiTimothyPhillips · 2024-07-09T14:47:57Z

@maxfirmbach Can you reproduce the issue with a matrix market file of your matrix and some Trilinos executable?

@cgcgcg I'm pretty sure the Tpetra matrix market readers require contiguous GID numbering at the moment. That said, @maxfirmbach could dump the (non-contiguously numbered) matrix, use the conversion script to use 1-based contiguous GID numbering, read in the matrix, and then alter the GID ordering to something like GID = 10 * old_GID + 3.

@maxfirmbach Do you have a stack-trace or something for the error you are hitting? We've never had to re-index for our application code, and we have some very non-contiguous GIDs.

maxfirmbach · 2024-07-09T15:58:30Z

@cgcgcg @MalachiTimothyPhillips

Seems like with the respective flag set to false I get something like this:

#0 0x55b124a14643 in Tpetra::Map<int, int, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::getComm() const
#1 0x55b1032bccb9 in Teuchos::RCP<Tpetra::Map<int, int, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const> Tpetra::Details::computeGatherMap<Tpetra::Map<int, int, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > >(Teuchos::RCP<Tpetra::Map<int, int, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const>, Teuchos::RCP<Teuchos::basic_FancyOStream<char, std::char_traits<char> > > const&, bool)
#2 0x55b123dbbd55 in Teuchos::RCP<Tpetra::Map<int, int, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const> const Amesos2::Util::getDistributionMap<int, int, unsigned long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >(Amesos2::EDistribution, unsigned long, Teuchos::RCP<Teuchos::Comm<int> const> const&, int, Teuchos::RCP<Tpetra::Map<int, int, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const> const&)
#3 0x55b123f77d16 in Amesos2::Umfpack<Epetra_CrsMatrix, Epetra_MultiVector>::loadA_impl(Amesos2::EPhase)
...

Maybe let's clarify what contiguous GID numbering means ...does it mean to have something like [0, 1, 2, 6, 7, 8, 12, 13, 14, ...] ? Btw, I'm using Umfpack as direct solver, not sure if this is relevant.

maxfirmbach · 2024-07-09T16:04:29Z

It's weird though ... a Tpetra error ... despite the fact I'm using Epetra ... Might also be that something major is going wrong in our code, but with the reindexing there's no problem ;)

mayrmt · 2024-07-09T17:43:34Z

@maxfirmbach Can you pinpoint the exact location of the error inside the Amesos2 source code? Maybe some #ifdef is wrong and you accidentally end up in a Tpetra code path.

MalachiTimothyPhillips · 2024-07-09T18:25:43Z

@maxfirmbach

Contiguous GID numbering as in [0, 1, 2, .., nrow-1] as opposed to, e.g., [0, 3, 6, ..., 3*(nrow-1)]. At least, that's my understanding.

I should clarify that I've only ever tested using Amesos2 + Tpetra, so maybe the discontinuous numbering does not work as expected with Epetra?

iyamazaki · 2024-07-10T05:19:25Z

You are right. I do not see testing for non-contiguous GIDs with Epetra. I'll take a look.

maxfirmbach · 2024-07-10T05:25:24Z

@iyamazaki @MalachiTimothyPhillips Could be that there's a problem with Epetra ... I know this is a edge case, but a working version with Epetra would make my/our life easier switching to Tpetra without having to port all packages at once.

maxfirmbach · 2024-07-10T05:33:16Z

@maxfirmbach Can you pinpoint the exact location of the error inside the Amesos2 source code? Maybe some #ifdef is wrong and you accidentally end up in a Tpetra code path.

@mayrmt Yes, I can track it down till getDistributionMap() in Amesos2_Util.{hpp,cpp}. The respective function arguments are always Tpetra::Map.

iyamazaki · 2024-07-10T05:39:02Z

I can reproduce the error, so hopefully I can fix it. I think we create Tpetra::Map, but then convert it to Epetra::Map for Epetra backend.

iyamazaki · 2024-07-10T19:23:05Z

Basically, the issue is

Trilinos/packages/amesos2/src/Amesos2_EpetraRowMatrix_AbstractMatrixAdapter_def.hpp

Lines 305 to 317 in 9b42dd2

    
             template <class DerivedMat> 
        
             const RCP<const Tpetra::Map<MatrixTraits<Epetra_RowMatrix>::local_ordinal_t, 
        
                                         MatrixTraits<Epetra_RowMatrix>::global_ordinal_t, 
        
                                         MatrixTraits<Epetra_RowMatrix>::node_t> > 
        
             AbstractConcreteMatrixAdapter<Epetra_RowMatrix, DerivedMat>::getMap_impl() const 
        
             { 
        
               // Should Map() be used (returns Epetra_BlockMap) 
        
               /* 
        
               printf("Amesos2_EpetraRowMatrix_AbstractMatrixAdapter: Epetra does not support a 'getMap()' method. Returning rcp(Teuchos::null). \ 
        
                   If your map contains non-contiguous GIDs please use Tpetra instead of Epetra. \n"); 
        
               */ 
        
               return( Teuchos::null ); 
        
             }

We could return rowMap as a patch (which lets me pass my very limited test), but I am not sure the whole effects of that. We are wondering how critical it is to support non-contiguous GIDs with Epetra. @ndellingwood

maxfirmbach added the type: question label Jul 3, 2024

mayrmt added the pkg: Amesos2 label Jul 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Amesos2: Reindexing of matrix #13205

Amesos2: Reindexing of matrix #13205

maxfirmbach commented Jul 3, 2024 •

edited by mayrmt

Loading

csiefer2 commented Jul 3, 2024

MalachiTimothyPhillips commented Jul 3, 2024 •

edited

Loading

maxfirmbach commented Jul 4, 2024

maxfirmbach commented Jul 9, 2024

cgcgcg commented Jul 9, 2024

MalachiTimothyPhillips commented Jul 9, 2024

maxfirmbach commented Jul 9, 2024 •

edited

Loading

maxfirmbach commented Jul 9, 2024 •

edited

Loading

mayrmt commented Jul 9, 2024

MalachiTimothyPhillips commented Jul 9, 2024

iyamazaki commented Jul 10, 2024

maxfirmbach commented Jul 10, 2024

maxfirmbach commented Jul 10, 2024

iyamazaki commented Jul 10, 2024

iyamazaki commented Jul 10, 2024

Amesos2: Reindexing of matrix #13205

Amesos2: Reindexing of matrix #13205

Comments

maxfirmbach commented Jul 3, 2024 • edited by mayrmt Loading

Question

csiefer2 commented Jul 3, 2024

MalachiTimothyPhillips commented Jul 3, 2024 • edited Loading

maxfirmbach commented Jul 4, 2024

maxfirmbach commented Jul 9, 2024

cgcgcg commented Jul 9, 2024

MalachiTimothyPhillips commented Jul 9, 2024

maxfirmbach commented Jul 9, 2024 • edited Loading

maxfirmbach commented Jul 9, 2024 • edited Loading

mayrmt commented Jul 9, 2024

MalachiTimothyPhillips commented Jul 9, 2024

iyamazaki commented Jul 10, 2024

maxfirmbach commented Jul 10, 2024

maxfirmbach commented Jul 10, 2024

iyamazaki commented Jul 10, 2024

iyamazaki commented Jul 10, 2024

maxfirmbach commented Jul 3, 2024 •

edited by mayrmt

Loading

MalachiTimothyPhillips commented Jul 3, 2024 •

edited

Loading

maxfirmbach commented Jul 9, 2024 •

edited

Loading

maxfirmbach commented Jul 9, 2024 •

edited

Loading