File is locked after using tabulizer on it #115

Open
2 of 3 tasks
sikaiser opened this issue Jan 24, 2020 · 2 comments

sikaiser commented Jan 24, 2020

Please specify whether your issue is about:

  • a possible bug

  • a question about package functionality

  • a suggested code or documentation change, improvement to the code, or feature request

  • The example below uses extract_tables(), but the same happens with locate_areas() (which, as far as I know, does not call extract_tables()).

  • If this is intended behavior, does anyone have a suggestion for how the file can be unlocked manually? Thank you!

Put your code here:

## rJava loads successfully
# install.packages("rJava")
library("rJava")

## load package
library("tabulizer")

## code goes here
download.file("https://www.sjsu.edu/faculty/gerstman/StatPrimer/t-table.pdf", "example.pdf", mode = "wb")
file.rename("example.pdf", to = "renamed.pdf")
# [1] TRUE
tables <- extract_tables("renamed.pdf")
file.rename("renamed.pdf", to = "extracted.pdf")
# [1] FALSE

## session info for your system
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] rJava_0.9-11    tabulizer_0.2.2

loaded via a namespace (and not attached):
[1] tabulizerjars_1.0.1 compiler_3.6.1      tools_3.6.1        
[4] png_0.1-7   
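
To check for the lock without permanently renaming anything, a small base-R probe can rename the file and immediately rename it back. This is only a sketch: `is_renamable()` is a hypothetical helper, not part of tabulizer, and it assumes that a failed `file.rename()` means another handle still holds the file.

```r
# Hypothetical helper (not part of tabulizer): probe whether a file can be
# renamed by moving it to a temporary name and straight back.
is_renamable <- function(path) {
  stopifnot(file.exists(path))
  probe <- paste0(path, ".probe")
  # file.rename() returns FALSE and warns when the rename fails,
  # so the warning is suppressed and only the logical result is used.
  ok <- suppressWarnings(file.rename(path, probe))
  if (ok) {
    file.rename(probe, path)  # restore the original name
  }
  ok
}

# Usage on a scratch file that nothing else has open
f <- tempfile(fileext = ".pdf")
writeLines("placeholder", f)
is_renamable(f)
```

In the session above, calling `is_renamable("renamed.pdf")` before and after `extract_tables()` would be expected to show the change from renamable to locked.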

sikaiser commented Jan 25, 2020

Update with workaround: using copy = TRUE fixes it for extract_tables and for locate_areas, but not for extract_areas.

So:

# locks file, even with copy = TRUE
tables <- extract_areas("example.pdf", copy = TRUE)

# does not lock file (workaround)
areas  <- locate_areas("example.pdf", copy = TRUE)
tables <- extract_tables("example.pdf", area = areas, guess = FALSE, copy = TRUE)
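
Another possible stopgap is to make the temporary copy by hand, so that any lingering lock lands on the copy rather than on the original. A minimal sketch, assuming the lock only affects the file that is actually passed to tabulizer; `with_temp_copy()` is a hypothetical helper, not a tabulizer function.

```r
# Hypothetical helper (not part of tabulizer): run `fun` on a temporary copy
# of `file`, so the original path is never handed to `fun`.
with_temp_copy <- function(file, fun, ...) {
  stopifnot(file.exists(file))
  tmp <- tempfile(fileext = paste0(".", tools::file_ext(file)))
  if (!file.copy(file, tmp)) {
    stop("could not copy ", file, " to a temporary location")
  }
  # The copy stays in tempdir() and is not deleted here,
  # since it may still be locked after `fun` returns.
  fun(tmp, ...)
}

# Intended use (interactive, requires tabulizer):
# tables <- with_temp_copy("example.pdf", tabulizer::extract_areas)
```

Because the helper takes the function as an argument, the same wrapper can be tried with `extract_tables()` or `locate_areas()` as well.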

@sikaiser

I think a way to fix this would be to pass on the value of copy to extract_tables in the function definition of extract_areas (in locate_area.R).

extract_areas <- function(file,
                          pages = NULL,
                          guess = FALSE,
                          copy = FALSE,
                          ...) {
    areas <- locate_areas(file = file, pages = pages, copy = copy)
    extract_tables(file = file,
                   pages = pages,
                   area = areas,
                   guess = guess,
                   copy = copy, # proposed change
                   ...)
}

I'm happy to make a pull request for this, but I haven't been able to test it, and I'm not sure whether this is a bug or intended behavior.
