Amaro Taylor-Weiner edited this page Oct 17, 2018 · 4 revisions

Description of inputs

--data: Data matrix. Columns are samples and rows are count categories. The matrix can include headers and index labels.
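The layout described for --data can be sketched as follows. This is only an illustration: the tab delimiter, the helper name build_labeled_matrix, and the category and sample names are assumptions and are not taken from the tool itself.

```python
import csv
import io


def build_labeled_matrix(counts, categories, samples):
    """Return delimited text with one column per sample and one row per count category.

    The first line holds the sample names (column headers) and the first
    field of every later line holds the category name (index label).
    The tab delimiter is an assumption made for this sketch.
    """
    if len(counts) != len(categories):
        raise ValueError("need exactly one row of counts per category")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + list(samples))  # header row: empty corner cell, then sample names
    for name, row in zip(categories, counts):
        if len(row) != len(samples):
            raise ValueError("need exactly one count per sample in every row")
        writer.writerow([name] + list(row))
    return buf.getvalue()


# Hypothetical example: 3 count categories (rows) by 2 samples (columns).
categories = ["category_1", "category_2", "category_3"]
samples = ["sample_A", "sample_B"]
counts = [[12, 0], [3, 7], [5, 1]]
text = build_labeled_matrix(counts, categories, samples)
print(text)
```

A matrix written with headers and index labels like this one would be passed together with --labeled.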

--K0: Initial number of active clusters/signatures. Defaults to the number of rows.

--max_iter: Maximum number of ARD-NMF iterations to run. Default is 10,000.

--tolerance: Early stopping condition based on the max lambda entry. Default is 1e-5.

--a: Hyperparameter. We recommend trying various values of a. Smaller values will give sparser results. A good starting point might be a = log(F+N).
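A minimal sketch of computing the suggested starting point for a. It assumes that F is the number of rows (count categories) and N the number of columns (samples) of the data matrix, and that log is the natural logarithm. Neither is stated on this page, so check both against the paper before relying on them. The function name suggested_a and the scaling factors are hypothetical.

```python
import math


def suggested_a(n_rows, n_cols):
    """Starting value a = log(F + N), assuming F = rows, N = columns, natural log."""
    if n_rows <= 0 or n_cols <= 0:
        raise ValueError("matrix dimensions must be positive")
    return math.log(n_rows + n_cols)


# Hypothetical matrix with 96 count categories and 100 samples.
a0 = suggested_a(96, 100)

# Since trying several values is recommended, scan a few around the starting point.
# The factors below are arbitrary choices for illustration.
candidates = [a0 * scale for scale in (0.5, 1.0, 2.0)]
print(candidates)
```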

--prior_on_W: Prior on W matrix "L1" (exponential) or "L2" (half-normal). Defaults to L1.

--prior_on_H: Prior on H matrix "L1" (exponential) or "L2" (half-normal). Defaults to L1.

--objective: Defines the data objective. Choose either "poisson" or "gaussian". Defaults to Poisson.

--phi: Dispersion parameter. See the paper for a discussion of choosing phi. We default to the settings recommended in Tan and Fevotte 2012.

--b: Default used is as recommended in Tan and Fevotte 2012.

--output_file: Output file name. If run in array mode, this corresponds to the output directory.

--labeled: Pass this argument if the data matrix has row and column labels/headers.

--report_frequency: Number of iterations between progress reports.

Description of outputs

Each of these is also summarized in the output parameters table if run in job array mode.

W matrix: [output_file name or label column from parameters file]_W.txt. Contains signature or cluster activations.

H matrix: [output_file name or label column from parameters file]_H.txt. Contains patient/sample signature or cluster activities.

Number of active clusters / signatures: [output_file name or label column from parameters file]_n_signatures.txt

Objective function value: [output_file name or label column from parameters file]_objective_function.txt
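The naming scheme above can be sketched as a small helper that lists the files to expect for a given prefix. The function name expected_outputs and the prefix run1 are hypothetical, and the W suffix is assumed to be _W.txt, in line with the other outputs.

```python
# Suffixes taken from the list of outputs above; "_W.txt" is assumed for the W matrix.
OUTPUT_SUFFIXES = {
    "W": "_W.txt",
    "H": "_H.txt",
    "n_signatures": "_n_signatures.txt",
    "objective_function": "_objective_function.txt",
}


def expected_outputs(prefix):
    """Map each output to its expected file name for the given prefix.

    The prefix stands for the output_file name, or for the label column
    from the parameters file when run in job array mode.
    """
    return {name: prefix + suffix for name, suffix in OUTPUT_SUFFIXES.items()}


outputs = expected_outputs("run1")
for name, filename in outputs.items():
    print(name, filename)
```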
