Cannot coerce class ‘"function"’ to a data.frame #158

linusjf · 2020-04-28T03:01:50Z

Brief description of the problem:

I try to run the following code and receive the following error message:

 cannot coerce class ‘"function"’ to a data.frame

#!/usr/bin/env Rscript


suppressPackageStartupMessages(library(olsrr))

main <- function(argv) {
  
  data <- read.table(cement.txt,
    header = TRUE, as.is = TRUE
  )
  

  model <- lm(y ~ ., data)
  k <- ols_step_all_possible(model)
  plot(k)
  return(0)
}

if (identical(environment(), globalenv())) {
  quit(status = main(commandArgs(trailingOnly = TRUE)))
}

cement.txt

    y     x1     x2     x3     x4
 78.5      7     26      6     60
 74.3      1     29     15     52
104.3     11     56      8     20
 87.6     11     31      8     47
 95.9      7     52      6     33
109.2     11     55      9     22
102.7      3     71     17      6
 72.5      1     31     22     44
 93.1      2     54     18     22
115.9     21     47      4     26
 83.8      1     40     23     34
113.3     11     66      9     12
109.4     10     68      8     12

Assistance will be appreciated.

The text was updated successfully, but these errors were encountered:

aravindhebbali · 2020-04-28T04:48:10Z

Hi @Fernal73 correct me if I am wrong but looks like this error is generated only when executed from the command line. I will look into this but might need your help as my command line knowledge is limited.

linusjf · 2020-04-28T05:45:26Z

I'm running my scripts on the command line on ArchLinux on Termux.

I see no reason why they wouldn't run otherwise.

I have not encountered any problems running my scripts specifically from the command line. None of them employ shiny.

I'd appreciate it if you could assist since your package is quite comprehensive in covering step wise regression using different selection criteria.

R version 3.6.2 (2019-12-12) -- "Dark and Stormy Night"

olsrr version: 0.5.3

aravindhebbali · 2020-04-28T05:57:05Z

Sure.. I will get back to you at the earliest

linusjf · 2020-04-28T06:10:37Z

That's fine, Aravind.

Else, I simply will have to write my own if I can't find any other package or script that does the same. Why reinvent the wheel?

Am I missing something obvious? I find it hard to believe that a package like yours would have an obvious showstopper after being used extensively by your users. That's why I hesitated to write up this issue report.

I've used leaps, MASS and step using AIC and written my own function for p_value as criteria.

aravindhebbali · 2020-04-28T06:16:59Z

I think it should work if you can read the data as anything other than data. I have used cement instead of data and it did not throw that error. I ran this in RStudio and seems to work. Can you check and let me know

main <- function(argv) {

  cement <- read.table("cement.txt", header = TRUE, as.is = TRUE)

  model <- lm(y ~ ., cement)
  k <- ols_step_all_possible(model)
  plot(k)
  return(0)
}

if (identical(environment(), globalenv())) {
  quit(status = main(commandArgs(trailingOnly = TRUE)))
}

aravindhebbali · 2020-04-28T06:22:05Z

The R session gets aborted when I run that last part of the code

if (identical(environment(), globalenv())) {
  quit(status = main(commandArgs(trailingOnly = TRUE)))
}

linusjf · 2020-04-28T07:33:50Z

I tried it with table as the variable name but encountered the same error.
Your code is finding a function with the same name instead and using it.

The variable name 'cement' worked as you pointed out. plot(k) provides no graphical output, however.

The script can be found at:

https://github.com/Fernal73/LearnR/blob/development/Stats462/Lesson11/cementolsrr.R

aravindhebbali · 2020-04-28T07:44:27Z

Yes.. looks like that. Let me debug this and get back to you.

aravindhebbali · 2020-04-28T08:51:17Z

Hi @Fernal73 The error was due to the way we were extracting data from the model: eval(model$call$data). I have added the option to explicitly specify the data used in the model. Please download the development version from GitHub:

# install.packages("devtools")
devtools::install_github("rsquaredacademy/olsrr@develop")

and let me know if the below code works (the plots were printed in RStudio after I removed return(0)

main <- function(argv) {
  data  <- read.table("cement.txt", header = TRUE, as.is = TRUE)
  model <- lm(y ~ ., data)
  k     <- ols_step_all_possible(model, data = data)
  plot(k)
}

if (identical(environment(), globalenv())) {
  quit(status = main(commandArgs(trailingOnly = TRUE)))
}

linusjf · 2020-04-28T09:19:19Z

You're returning a list of ggplot objects in k.

To output that in a non-interactive environment, you would have to print the plots in a loop using print() or
grid.draw() if you're using grob objects.

You might wish to check out the method print_chart in the following script if you need assistance:

https://github.com/Fernal73/LearnR/blob/development/CoronaVirus/worldanalysis.R

aravindhebbali · 2020-04-28T09:33:19Z

The plots are printed by default i.e. the print_plot argument is set to TRUE. The user can set it to FALSE if they want the function to just return the ggplot2 objects.

https://olsrr.rsquaredacademy.com/reference/ols_step_all_possible.html

linusjf · 2020-04-28T09:39:06Z

print_plot does not work in a non-interactive environment when writing to a pdf document.

I can work with the list of plots you return and output them sequentially. That works for me.

However, your code must work in both modes. The graphical output especially when print_plot is TRUE must happen irrespective.

You are using print() elsewhere for ggplot objects.

#135

Thanks for your assistance.

As for installing from Github, I'd rather wait till you release your next version since it impacts my installer.R and I'd rather not have a longer install time than necessary.

aravindhebbali · 2020-04-28T10:37:35Z

Thanks for bringing this to my notice @Fernal73.. it will help other end users very much. Will keep you posted about the next release.

linusjf · 2020-04-28T12:25:52Z

https://online.stat.psu.edu/stat462/node/196/

I've been referring to the above for step wise regression using p-values, Alpha-To-Remove and Alpha-To-Enter.

The ols_step_both_p function does not use the above methodology.

Is there any intention to provide the same?

aravindhebbali · 2020-04-28T15:17:24Z

It does follow the methodology mentioned in the link. I have reproduced the example mentioned below:

> k <- ols_step_both_p(model, pent = 0.15, prem = 0.15)
> k

                             Stepwise Selection Summary                               
-------------------------------------------------------------------------------------
                     Added/                   Adj.                                       
Step    Variable    Removed     R-Square    R-Square      C(p)        AIC       RMSE     
-------------------------------------------------------------------------------------
   1       x4       addition       0.675       0.645    120.2970    97.7440    8.9639    
   2       x1       addition       0.972       0.967      3.9370    67.6341    2.7343    
   3       x2       addition       0.982       0.976      2.0180    63.8663    2.3087    
   4       x4       removal        0.979       0.974      1.4710    64.3124    2.4063    
-------------------------------------------------------------------------------------
> k$model

Call:
lm(formula = paste(response, "~", paste(preds, collapse = " + ")), 
    data = l)

Coefficients:
(Intercept)           x1           x2  
    52.5773       1.4683       0.6623

linusjf · 2020-04-28T15:30:33Z

Ok. I seem to have overlooked that. The default prem = 0.3. So that changes the result. https://www.rdocumentation.org/packages/olsrr/versions/0.5.3/topics/ols_step_both_p

aravindhebbali · 2020-04-28T15:51:49Z

Yes.

linusjf · 2020-04-29T06:25:33Z

Another question wrt to the data.frame returned by ols_step_all_possible:
You seem to be losing data i.e., the response variable.
Adding another column for the response variable is one solution. The other would be to replace the predictors column with the model formula.
What do you think? Or do you believe that any calling method should retain its own formula or model object to pass on with the above output? Let me know if I need to write up a separate issue for this along with the one for print().

aravindhebbali · 2020-04-29T07:17:00Z

Yeah.. we can replace the predictors column with model formula. How about returning lm() objects in a list? I just came across this question on Stack Overflow

https://stackoverflow.com/questions/33627179/how-do-i-store-lm-object-in-a-data-frame-in-r

You know that way, users can directly use the lm() object by referring to the index number mentioned in the data.frame.

linusjf · 2020-04-29T09:49:10Z

The simplest thing to do is add an extra column for the response variable. (Yes, it's redundant over multiple rows but sometimes adding redundancy is the best solution). Changing the predictors to formula will upset your existing users if they depend on the existing structure of the field. I did not consider returning lm objects. I'd like my return value to be as lightweight as possible. If you still want to do that, you can add an option to the method call such as fill.models = TRUE. Returning a list (including the data frame and models) or your original data frame based on a parameter such as above makes it more flexible.

aravindhebbali · 2020-04-29T09:56:21Z

Thanks for your thoughtful inputs @Fernal73. I will think it over and implement the changes.

linusjf · 2020-05-01T09:44:48Z

I'd be chary of returning models over all possible combinations of variables in the model. It would be fine in the case of best_subsets which is a much smaller number. Either way, providing a choice such as fill.models is wise.

linusjf · 2020-05-01T14:50:15Z

Hi @Fernal73 The error was due to the way we were extracting data from the model: eval(model$call$data). I have added the option to explicitly specify the data used in the model. Please download the development version from GitHub:
# install.packages("devtools")
devtools::install_github("rsquaredacademy/olsrr@develop")
and let me know if the below code works (the plots were printed in RStudio after I removed return(0)
main <- function(argv) {
  data  <- read.table("cement.txt", header = TRUE, as.is = TRUE)
  model <- lm(y ~ ., data)
  k     <- ols_step_all_possible(model, data = data)
  plot(k)
}

if (identical(environment(), globalenv())) {
  quit(status = main(commandArgs(trailingOnly = TRUE)))
}

I finally managed to get around to installing your package from Github.
I'm running R version 4 now.

However, I'm still receiving the same error.

Additionally, if I use a different name say datum or iqsize, I receive the following error message:

"Error in eval(model$call$data) : object 'datum' not found
Calls: quit ... ols_step_best_subset.default -> mod_sel_data -> eval -> eval
Execution halted"

The script is:

https://github.com/Fernal73/LearnR/blob/development/Stats462/Lesson11/iqsizesubsets.R

The exact same code in the following script works as expected:
https://github.com/Fernal73/LearnR/blob/development/Stats462/Lesson11/cementsubsets.R

The only change is the variable name.

So is the eval finding your data set 'cement' instead?

You need to pass in the parent environment to eval to have it find the right dataset.
Any particular reason why lm$model is not used to return the data frame?

aravindhebbali · 2020-05-01T17:27:20Z

Hi @Fernal73.. I am unable to reproduce the error. I tried importing the data using different names and it did not throw the error. Not sure why I used eval(model$call$data) over model$model though .

I used model$model and ran the code again and it seems to work. No need to explicitly specify the data name in ols_step_all_possible(). I have pushed the new code to the develop branch. Let me know if it works.

Again, thanks a ton for your detailed feedback :)

linusjf · 2020-05-01T18:06:29Z

cdedb9a#commitcomment-38881126

Are you expecting your users to run your code only in the RStudio environment?

I've run your code with ols_step_all_possible. It works. Those changes have to be in ols_step_best_subset as well. That is the best way to subset the models and then refine the selection process further.

Another point:

You are setting the class for the returned data.frame object as per each function name. A better , more sensible convention would be to set each differently based on the structure of the data.frrame or object returned. That's more important to the user of your API.

https://github.com/Fernal73/LearnR/blob/development/Stats462/Lib/libstep.R

linusjf · 2020-05-02T05:33:19Z

Hi @Fernal73.. I am unable to reproduce the error. I tried importing the data using different names and it did not throw the error. Not sure why I used eval(model$call$data) over model$model though .

A user might wish to specify a starting model, other than all the fields in a dataset.
There ought to be a way to specify certain columns always present and certain always absent. lmsubsets has an API which supports that.

aravindhebbali · 2020-05-02T07:08:36Z

cdedb9a#commitcomment-38881126

Are you expecting your users to run your code only in the RStudio environment?

I've run your code with ols_step_all_possible. It works. Those changes have to be in ols_step_best_subset as well. That is the best way to subset the models and then refine the selection process further.

Another point:

You are setting the class for the returned data.frame object as per each function name. A better , more sensible convention would be to set each differently based on the structure of the data.frrame or object returned. That's more important to the user of your API.

https://github.com/Fernal73/LearnR/blob/development/Stats462/Lib/libstep.R

We do expect the code to run in environments other than RStudio but can't assure the same. Regarding setting of class, we do that whenever a print method is defined. Not all the variable selection methods return just a data.frame but will appreciate your feedback on this.

We will replace eval(model$call$data) with model$model across the package as well as in a couple of other packages as well.

aravindhebbali · 2020-05-02T07:16:05Z

Hi @Fernal73.. I am unable to reproduce the error. I tried importing the data using different names and it did not throw the error. Not sure why I used eval(model$call$data) over model$model though .

A user might wish to specify a starting model, other than all the fields in a dataset.
There ought to be a way to specify certain columns always present and certain always absent. lmsubsets has an API which supports that.

Yup.. we hope to implement it in the next release. Will look into the API of lmSubsets.

linusjf · 2020-05-02T07:46:38Z

The data.frame structure returned in the above two method calls is identical. I'd expect the class for the data.frame to be the same. I don't want to accept a generic data.frame as the parameter to my functions since the implementation depends on the column names in your data.frame object. Rather than have my method crash, I'd rather stop processing if it isn't the class type expected.

Fair.. we do need to review the output from different functions in olsrr so that the end user can use it for further analysis in a seamless manner. We will do this on a priority basis in the next release. Thanks for your input again :)

linusjf · 2020-05-02T07:51:52Z

We do expect the code to run in environments other than RStudio but can't assure the same.

I expect my code to work from the command-line on all R environments.

linusjf · 2020-05-02T10:05:42Z

Understand that and we try to ensure that the package works in different environments but sometimes it just doesn't and we do not have the bandwidth or the infra to test the package in multiple environments.

This appears as though I've made this comments, not you. Not the right way to edit comments. It can be misleading. Yes, I agree. We don't have access to all environments. We do our best.

aravindhebbali · 2020-05-02T10:10:35Z

Sorry @Fernal73 .. that was not intentional. I have removed that comment now but not sure how to add it back in the right place.

linusjf · 2020-05-02T10:10:36Z

Fair.. we do need to review the output from different functions in olsrr so that the end user can use it for further analysis in a seamless manner. We will do this on a priority basis in the next release. Thanks for your input again :)

Ditto about the comment again. I'd rather have a 'base class' added for the data.frame structure that returns all the criteria values such as Rsquare, adjr etc....
The existing class names can be retained for backward compatibility until removed after warning users for a version or two.

aravindhebbali · 2020-05-02T10:16:16Z

But will that still work if the end user would want to return the model objects for further analysis? My initial thought was to return a list which would include the data.frame and the model objects.

linusjf · 2020-05-02T10:19:31Z

@aravindhebbali R is not a strongly typed language. You can return two different class objects if they have different structures. Is there anything in R best practices that prevent you doing that as long as it's well documented?

At some point of time, breaking backwards compatibility is the only way to go forward. As I said earlier, it can be staggered out over a couple of releases with adequate warning to users. Do you want to support multiple class objects in perpetuity?

aravindhebbali · 2020-05-02T10:21:21Z

@aravindhebbali R is not a strongly typed language. You can return two different class objects if they have different structures. Is there anything in R best practices that prevent you doing that as long as it's well documented?

Not as far as I know.

linusjf · 2020-05-02T12:00:20Z

All said and done, when do you think will the changes about lm$model be propagated to all the methods using model_data?

aravindhebbali · 2020-05-02T12:24:36Z

I have pushed the changes to the develop branch.

linusjf · 2020-05-02T12:33:53Z

I have pushed the changes to the develop branch.

https://github.com/rsquaredacademy/olsrr/blob/master/R/ols-best-subsets-regression.R

But not to the above...

aravindhebbali · 2020-05-02T12:36:19Z

I have a few more changes to make.. will merge them into the master branch later.

linusjf · 2020-05-03T22:29:47Z

It does follow the methodology mentioned in the link. I have reproduced the example mentioned below:

> k <- ols_step_both_p(model, pent = 0.15, prem = 0.15)
> k

                             Stepwise Selection Summary                               
-------------------------------------------------------------------------------------
                     Added/                   Adj.                                       
Step    Variable    Removed     R-Square    R-Square      C(p)        AIC       RMSE     
-------------------------------------------------------------------------------------
   1       x4       addition       0.675       0.645    120.2970    97.7440    8.9639    
   2       x1       addition       0.972       0.967      3.9370    67.6341    2.7343    
   3       x2       addition       0.982       0.976      2.0180    63.8663    2.3087    
   4       x4       removal        0.979       0.974      1.4710    64.3124    2.4063    
-------------------------------------------------------------------------------------
> k$model

Call:
lm(formula = paste(response, "~", paste(preds, collapse = " + ")), 
    data = l)

Coefficients:
(Intercept)           x1           x2  
    52.5773       1.4683       0.6623

Does this work with any other dataset other than the 'cement' one provided by olsrr? It does not.

aravindhebbali · 2020-05-04T04:38:28Z

Please file a separate issue with a reproducible example.

linusjf · 2020-05-04T06:27:33Z

Aravind, all these errrors stem from the use of eval in model_data. Replace that with model$model or use the parent.frame environment as a parameter to eval to avoid the problem. I wouldn't recommend the use of eval unless it's absolutely necessary. If you can control the call to lm which fills the model$model with the source data.frame, that's the way to go. I am unaware of why you're specifically using eval. I'm not an R expert, not yet. The advice about eval holds. You don't need a new issue for this. The original issue originates from the same problem.

aravindhebbali · 2020-05-04T06:38:32Z

Hi @Fernal73 We will not be using eval() to extract data anymore. Will replace it with model$model as you had suggested and push the changes today.

linusjf · 2020-05-04T07:28:07Z

Yeah.. we can replace the predictors column with model formula. How about returning lm() objects in a list? I just came across this question on Stack Overflow

https://stackoverflow.com/questions/33627179/how-do-i-store-lm-object-in-a-data-frame-in-r

You know that way, users can directly use the lm() object by referring to the index number mentioned in the data.frame.

I can raise an issue for this.

linusjf · 2020-05-04T07:30:20Z

print_plot does not work in a non-interactive environment when writing to a pdf document.

I can work with the list of plots you return and output them sequentially. That works for me.

However, your code must work in both modes. The graphical output especially when print_plot is TRUE must happen irrespective.

You are using print() elsewhere for ggplot objects.

#135

Thanks for your assistance.

As for installing from Github, I'd rather wait till you release your next version since it impacts my installer.R and I'd rather not have a longer install time than necessary.

And this...

aravindhebbali · 2020-05-04T08:04:21Z

Correct me if I an wrong but the below modification will return ggplot objects irrespective of the value assigned to print_plot and that is what you are suggesting

Change from

if (print_plot) {
    print(p)
  } else {
    return(p)
  }

to the below:

# print plots if set to TRUE
if (print_plot) {
    print(p)
  }   

# always return ggplot objects
 return(p)

linusjf · 2020-05-04T09:49:06Z

Correct me if I an wrong but the below modification will return ggplot objects irrespective of the value assigned to print_plot and that is what you are suggesting

Change from
if (print_plot) {
    print(p)
  } else {
    return(p)
  }
to the below:
# print plots if set to TRUE
if (print_plot) {
    print(p)
  }   

# always return ggplot objects
 return(p)

Yes, it can be returned invisibly.
invisible(p)

linusjf · 2020-05-05T01:48:06Z

Hi @Fernal73 We will not be using eval() to extract data anymore. Will replace it with model$model as you had suggested and push the changes today.

The default is for lm to return the dataframe in lm$model as the parameter model is set to TRUE by default. As I said, if you can control the call to lm, this behaviour is fine. If on the other hand, lm is invoked with model = FALSE, the data.frame will not be returned.

linusjf · 2020-05-05T04:19:01Z

Hi @Fernal73 We will not be using eval() to extract data anymore. Will replace it with model$model as you had suggested and push the changes today.

The default is for lm to return the dataframe in lm$model as the parameter model is set to TRUE by default. As I said, if you can control the call to lm, this behaviour is fine. If on the other hand, lm is invoked with model = FALSE, the data.frame will not be returned.

The way I see it, these are the options.

The function accepts a formula and a data.frame and constructs the model internally.
The function accepts a data.frame and constructs a formula and model internally using DF2formula and lm respectively in that order.
The function accepts a model and an optional data.frame. If the model$model is NULL, it looks up the data.frame parameter and uses its value if not NULL. Else an error is flagged and processing halts.

Is there a fourth option? Should the call object be investigated further? I'm not familiar with its dynamics.

linusjf · 2020-05-05T06:03:15Z

Is there a fourth option? Should the call object be investigated further? I'm not familiar with its dynamics.

As far as I can see, there isn't one.
The only way to , perhaps, do it is to traverse the environment tree and find the environment that has 'data' as symbol and a data.frame object. It's too messy and could be error-prone.

http://adv-r.had.co.nz/Environments.html

aravindhebbali self-assigned this Apr 28, 2020

linusjf mentioned this issue May 4, 2020

Methods returning data.frame of decision criteria values such as RSquare, Adjr, AIC etc must also return the equivalent regression model object on-demand. #160

Open

linusjf mentioned this issue May 4, 2020

ggplot (or grob) object must be printed when print_plot is TRUE #161

Open

mwm-cbus mentioned this issue Mar 7, 2021

using ols_step_all_possible with Model created from dynamic function leads to "Error in eval(model$call$data) . . . not found" #176

Closed

Cannot coerce class ‘"function"’ to a data.frame #158

Cannot coerce class ‘"function"’ to a data.frame #158

Comments

linusjf commented Apr 28, 2020

aravindhebbali commented Apr 28, 2020

linusjf commented Apr 28, 2020 • edited Loading

aravindhebbali commented Apr 28, 2020

linusjf commented Apr 28, 2020

aravindhebbali commented Apr 28, 2020

aravindhebbali commented Apr 28, 2020 • edited Loading

linusjf commented Apr 28, 2020

aravindhebbali commented Apr 28, 2020

aravindhebbali commented Apr 28, 2020 • edited Loading

linusjf commented Apr 28, 2020

aravindhebbali commented Apr 28, 2020

linusjf commented Apr 28, 2020 • edited Loading

aravindhebbali commented Apr 28, 2020

linusjf commented Apr 28, 2020

aravindhebbali commented Apr 28, 2020

linusjf commented Apr 28, 2020 via email • edited Loading

aravindhebbali commented Apr 28, 2020

linusjf commented Apr 29, 2020

aravindhebbali commented Apr 29, 2020

linusjf commented Apr 29, 2020 via email • edited Loading

aravindhebbali commented Apr 29, 2020

linusjf commented May 1, 2020

linusjf commented May 1, 2020 • edited Loading

aravindhebbali commented May 1, 2020

linusjf commented May 1, 2020 • edited Loading

linusjf commented May 2, 2020

aravindhebbali commented May 2, 2020 • edited Loading

aravindhebbali commented May 2, 2020

linusjf commented May 2, 2020 • edited by aravindhebbali Loading

linusjf commented May 2, 2020 • edited by aravindhebbali Loading

linusjf commented May 2, 2020

aravindhebbali commented May 2, 2020

linusjf commented May 2, 2020

aravindhebbali commented May 2, 2020

linusjf commented May 2, 2020 • edited Loading

aravindhebbali commented May 2, 2020

linusjf commented May 2, 2020

aravindhebbali commented May 2, 2020

linusjf commented May 2, 2020

aravindhebbali commented May 2, 2020

linusjf commented May 3, 2020

aravindhebbali commented May 4, 2020

linusjf commented May 4, 2020 via email • edited Loading

aravindhebbali commented May 4, 2020

linusjf commented May 4, 2020

linusjf commented May 4, 2020

aravindhebbali commented May 4, 2020

linusjf commented May 4, 2020

linusjf commented May 5, 2020

linusjf commented May 5, 2020

linusjf commented May 5, 2020

linusjf commented Apr 28, 2020 •

edited

Loading

aravindhebbali commented Apr 28, 2020 •

edited

Loading

aravindhebbali commented Apr 28, 2020 •

edited

Loading

linusjf commented Apr 28, 2020 •

edited

Loading

linusjf commented Apr 28, 2020 via email •

edited

Loading

linusjf commented Apr 29, 2020 via email •

edited

Loading

linusjf commented May 1, 2020 •

edited

Loading

linusjf commented May 1, 2020 •

edited

Loading

aravindhebbali commented May 2, 2020 •

edited

Loading

linusjf commented May 2, 2020 •

edited by aravindhebbali

Loading

linusjf commented May 2, 2020 •

edited by aravindhebbali

Loading

linusjf commented May 2, 2020 •

edited

Loading

linusjf commented May 4, 2020 via email •

edited

Loading