Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Amesos2: Reindexing of matrix #13205

Open
maxfirmbach opened this issue Jul 3, 2024 · 15 comments
Open

Amesos2: Reindexing of matrix #13205

maxfirmbach opened this issue Jul 3, 2024 · 15 comments

Comments

@maxfirmbach
Copy link
Contributor

maxfirmbach commented Jul 3, 2024

Question

Dear Amesos2-developers,

currently I'm trying to switch our in-house code from Amesos to Amesos2.
As, let's call it pre-processing step, we do a reindexing of our linear problem to have a continous GID numbering from 0 ... n, using EpetraExt (which in our case is necessary as we have GID jumps). Is there a similar feature implemented in Xpetra or Tpetra?

I've seen that Amesos2 has a reindex input parameter, but looking into the code, it seems to do ... basically nothing? From documentation:

We plan in the future to support the following parameters:

    "Reindex" : { true | false }. Put the matrix row and column indices into the range [0..n]. 

Are there any intentions to implement this feature properly into Amesos2?

Best regards,
Max

@trilinos/amesos2
@mayrmt

@csiefer2
Copy link
Member

csiefer2 commented Jul 3, 2024

@ndellingwood @iyamazaki

@MalachiTimothyPhillips
Copy link

MalachiTimothyPhillips commented Jul 3, 2024

As an aside, I don't think you are required to have continuous GIDs so long as you set the IsContiguous setting to false. For our application code, we've never had to remap our discontinuous GIDs to continuous GIDs.

See, for example: https://github.com/trilinos/Trilinos/blob/master/packages/amesos2/test/solvers/Superlu_UnitTests.cpp#L372.

The one big caveat with that is if you dump the matrix to matrix-market format, you'll need to remap those GIDs to be 1-based index to be in compliance with the matrix market format.
Here's a completely un-tested, un-supported, provided as-is without any warranty Python script that can accomplish that as a post-processing step:

Script:
#!/usr/bin/env python3
import os.path
import scipy.io
import numpy as np
import sys
from itertools import (takewhile, repeat)
import glob
import pathlib
import argparse

def warn_on_missing_file(fileName):
  errString = f"File {fileName} does not exist!"
  print(errString)
  return


def warn_on_existing_file(fileName):
  errString = f"Cowardly refusing to overwrite file {fileName}.\n"
  errString += "Please use --overwrite to allow overwriting to occur, or alter the output matrix filename.\n"
  print(errString)
  return

def draw_progress_bar(percentage, process="", num_bins=20):
  num_draw = int(percentage * num_bins)
  sys.stdout.write('\r')
  sys.stdout.write(
      f"{process} : [{'='*num_draw}{' ' * (num_bins-num_draw)}] {int(100 * percentage)}%")
  sys.stdout.flush()
  return

def grab_line_count(filename):
  f = open(filename, 'rb')
  bufgen = takewhile(lambda x: x, (f.raw.read(1024*1024)
                     for _ in repeat(None)))
  return sum(buf.count(b'\n') for buf in bufgen)


def read_matrix_file(matrixFileName):
  gids = {}
  A_coo = {}
  readSizeLine = False
  update_line_freq = 1000
  num_lines = grab_line_count(matrixFileName)
  header_content = ""

  with open(matrixFileName, "r") as matrixFile:
    for line_num, line in enumerate(matrixFile):
      if line_num % update_line_freq == 0:
        draw_progress_bar((line_num+1)/num_lines, "reading matrix file")
      if "%" in line:
        header_content += line
        continue
      if not readSizeLine:
        header_content += line
        readSizeLine = True
        continue
      i, j, A_ij = line.split()
      i, j = int(i), int(j)
      A_coo[(i, j)] = A_ij
      gids[i] = None
  draw_progress_bar(1, "reading matrix file")
  print()

  rows = {}
  row = 1
  for gid in gids.keys():
    rows[gid] = row
    row = row + 1
  return rows, A_coo, header_content


def write_matrix_file(rows, A_coo, header, outputMatrixFileName):
  update_line_freq = 1000
  newFileContents = header
  num_matrix_entries = len(A_coo)
  for mat_entry, ((gid_i, gid_j), A_ij) in enumerate(A_coo.items()):
    if mat_entry % update_line_freq == 0:
      draw_progress_bar((mat_entry+1)/num_matrix_entries,
                        "creating new matrix file")
    i = rows[gid_i]
    j = rows[gid_j]
    newFileContents += f"{i} {j} {A_ij}\n"
  draw_progress_bar(1, "creating new matrix file")
  print()

  with open(outputMatrixFileName, "w") as newMatrixFile:
    newMatrixFile.write(newFileContents)
  return


def reassign_matrix_rows(matrixFileName, newMatrixFileName):
  rows, A_coo, header = read_matrix_file(matrixFileName)
  write_matrix_file(rows, A_coo, header, newMatrixFileName)
  return

if __name__ == "__main__":
  parser = argparse.ArgumentParser(
      description='Re-assign row ids in matrix market file to force them to be contiguous and 1-base indexed')
  parser.add_argument('-i', '--input-matrix',
                      help='Input matrix-market file', required=True)
  parser.add_argument('-o', '--output-matrix',
                      help='Output matrix-market file', required=True)
  parser.add_argument('--overwrite', action='store_true')
  args = vars(parser.parse_args())
  inputMatrix = args['input_matrix']
  outputMatrix = args['output_matrix']
  
  allGood = True
  if not os.path.isfile(inputMatrix):
    warn_on_missing_file(inputMatrix)
    print("Please check your --input-matrix/-i option.")
    allGood = False

  if os.path.isfile(outputMatrix) and not overwrite:
    warn_on_existing_file(outputMatrix)
    allGood = False

  reassign_matrix_rows(inputMatrix, outputMatrix)

@maxfirmbach
Copy link
Contributor Author

@MalachiTimothyPhillips Interesting! I will try to set the IsContiguous parameter to false, maybe that already solves the problem.

@maxfirmbach
Copy link
Contributor Author

@MalachiTimothyPhillips I checked the flag, but it's not helping in our case ... I still get the same memory errors, if I don't reindex the problem ...

@cgcgcg
Copy link
Contributor

cgcgcg commented Jul 9, 2024

@maxfirmbach Can you reproduce the issue with a matrix market file of your matrix and some Trilinos executable?

@MalachiTimothyPhillips
Copy link

@maxfirmbach Can you reproduce the issue with a matrix market file of your matrix and some Trilinos executable?

@cgcgcg I'm pretty sure the Tpetra matrix market readers require contiguous GID numbering at the moment. That said, @maxfirmbach could dump the (non-contiguously numbered) matrix, use the conversion script to use 1-based contiguous GID numbering, read in the matrix, and then alter the GID ordering to something like GID = 10 * old_GID + 3.

@maxfirmbach Do you have a stack-trace or something for the error you are hitting? We've never had to re-index for our application code, and we have some very non-contiguous GIDs.

@maxfirmbach
Copy link
Contributor Author

maxfirmbach commented Jul 9, 2024

@cgcgcg @MalachiTimothyPhillips

Seems like with the respective flag set to false I get something like this:

#0 0x55b124a14643 in Tpetra::Map<int, int, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >::getComm() const
#1 0x55b1032bccb9 in Teuchos::RCP<Tpetra::Map<int, int, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const> Tpetra::Details::computeGatherMap<Tpetra::Map<int, int, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > >(Teuchos::RCP<Tpetra::Map<int, int, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const>, Teuchos::RCP<Teuchos::basic_FancyOStream<char, std::char_traits<char> > > const&, bool)
#2 0x55b123dbbd55 in Teuchos::RCP<Tpetra::Map<int, int, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const> const Amesos2::Util::getDistributionMap<int, int, unsigned long, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> >(Amesos2::EDistribution, unsigned long, Teuchos::RCP<Teuchos::Comm<int> const> const&, int, Teuchos::RCP<Tpetra::Map<int, int, Tpetra::KokkosCompat::KokkosDeviceWrapperNode<Kokkos::Serial, Kokkos::HostSpace> > const> const&)
#3 0x55b123f77d16 in Amesos2::Umfpack<Epetra_CrsMatrix, Epetra_MultiVector>::loadA_impl(Amesos2::EPhase)
...

Maybe let's clarify what contiguous GID numbering means ...does it mean to have something like [0, 1, 2, 6, 7, 8, 12, 13, 14, ...] ? Btw, I'm using Umfpack as direct solver, not sure if this is relevant.

@maxfirmbach
Copy link
Contributor Author

maxfirmbach commented Jul 9, 2024

It's weird though ... a Tpetra error ... despite the fact I'm using Epetra ... Might also be that something major is going wrong in our code, but with the reindexing there's no problem ;)

@mayrmt
Copy link
Member

mayrmt commented Jul 9, 2024

@maxfirmbach Can you pinpoint the exact location of the error inside the Amesos2 source code? Maybe some #ifdef is wrong and you accidentally end up in a Tpetra code path.

@MalachiTimothyPhillips
Copy link

@maxfirmbach

Contiguous GID numbering as in [0, 1, 2, .., nrow-1] as opposed to, e.g., [0, 3, 6, ..., 3*(nrow-1)]. At least, that's my understanding.

I should clarify that I've only ever tested using Amesos2 + Tpetra, so maybe the discontinuous numbering does not work as expected with Epetra?

@iyamazaki
Copy link
Contributor

You are right. I do not see testing for non-contiguous GIDs with Epetra. I'll take a look.

@maxfirmbach
Copy link
Contributor Author

@iyamazaki @MalachiTimothyPhillips Could be that there's a problem with Epetra ... I know this is a edge case, but a working version with Epetra would make my/our life easier switching to Tpetra without having to port all packages at once.

@maxfirmbach
Copy link
Contributor Author

@maxfirmbach Can you pinpoint the exact location of the error inside the Amesos2 source code? Maybe some #ifdef is wrong and you accidentally end up in a Tpetra code path.

@mayrmt Yes, I can track it down till getDistributionMap() in Amesos2_Util.{hpp,cpp}. The respective function arguments are always Tpetra::Map.

@iyamazaki
Copy link
Contributor

I can reproduce the error, so hopefully I can fix it. I think we create Tpetra::Map, but then convert it to Epetra::Map for Epetra backend.

@iyamazaki
Copy link
Contributor

Basically, the issue is

template <class DerivedMat>
const RCP<const Tpetra::Map<MatrixTraits<Epetra_RowMatrix>::local_ordinal_t,
MatrixTraits<Epetra_RowMatrix>::global_ordinal_t,
MatrixTraits<Epetra_RowMatrix>::node_t> >
AbstractConcreteMatrixAdapter<Epetra_RowMatrix, DerivedMat>::getMap_impl() const
{
// Should Map() be used (returns Epetra_BlockMap)
/*
printf("Amesos2_EpetraRowMatrix_AbstractMatrixAdapter: Epetra does not support a 'getMap()' method. Returning rcp(Teuchos::null). \
If your map contains non-contiguous GIDs please use Tpetra instead of Epetra. \n");
*/
return( Teuchos::null );
}

We could return rowMap as a patch (which lets me pass my very limited test), but I am not sure the whole effects of that. We are wondering how critical it is to support non-contiguous GIDs with Epetra. @ndellingwood

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants