
fread buffer overflow with version 1.9.4 #956

Closed
heraldb opened this issue Nov 16, 2014 · 14 comments

@heraldb

heraldb commented Nov 16, 2014

Hi!

I just ran into a bug in fread (version 1.9.4) when reading the file "UCI HAR Dataset/test/X_test.txt" from the archive at
https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip

*** buffer overflow detected ***: /usr/lib64/R/bin/exec/R terminated
======= Backtrace: =========
/lib64/libc.so.6[0x3b92675a4f]
/lib64/libc.so.6(__fortify_fail+0x37)[0x3b92706947]
/lib64/libc.so.6[0x3b92704b20]
/lib64/libc.so.6[0x3b92704029]
/lib64/libc.so.6(_IO_default_xsputn+0xbc)[0x3b9267907c]
/lib64/libc.so.6(_IO_vfprintf+0x3190)[0x3b9264ab70]
/lib64/libc.so.6(__vsprintf_chk+0x88)[0x3b927040b8]
/lib64/libc.so.6(__sprintf_chk+0x7d)[0x3b9270400d]
/home/herald/R/x86_64-redhat-linux-gnu-library/3.1/data.table/libs/datatable.so(readfile+0x20bb)[0x7f08a2228cab]
/usr/lib64/R/lib/libR.so[0x32e3698386]
/usr/lib64/R/lib/libR.so[0x32e36d0469]
/usr/lib64/R/lib/libR.so(Rf_eval+0x260)[0x32e36d8030]
/usr/lib64/R/lib/libR.so(Rf_applyClosure+0x42c)[0x32e36d969c]
/usr/lib64/R/lib/libR.so(Rf_eval+0x336)[0x32e36d8106]
/usr/lib64/R/lib/libR.so[0x32e36db9ee]
/usr/lib64/R/lib/libR.so(Rf_eval+0x558)[0x32e36d8328]
/usr/lib64/R/lib/libR.so[0x32e36da6d3]
/usr/lib64/R/lib/libR.so(Rf_eval+0x558)[0x32e36d8328]
/usr/lib64/R/lib/libR.so(Rf_applyClosure+0x42c)[0x32e36d969c]
/usr/lib64/R/lib/libR.so(Rf_eval+0x336)[0x32e36d8106]
/usr/lib64/R/lib/libR.so(Rf_ReplIteration+0x252)[0x32e3700cd2]
/usr/lib64/R/lib/libR.so[0x32e3701021]
/usr/lib64/R/lib/libR.so(run_Rmainloop+0x44)[0x32e37010b4]
/usr/lib64/R/bin/exec/R(main+0x1b)[0x4007fb]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x3b92621d65]
/usr/lib64/R/bin/exec/R[0x40082d]
======= Memory map: ========

Let me know if you need any extra information.

Thanks,
Herald

@arunsrinivasan
Member

Did you read Contributing.md and, more importantly, the 2nd and 3rd points under the checklist on the "How to file a bug report" wiki page?

@heraldb
Author

heraldb commented Nov 16, 2014

On 16-11-14 at 17:13, Arun wrote:

Did you read this
https://github.com/Rdatatable/data.table/blob/master/Contributing.md
and more importantly the 2nd and 3rd point under checklist here
https://github.com/Rdatatable/data.table/wiki/How-to-file-a-bug-report?

I just read README.md and noticed that there are bug fixes for fread,
but none of them seem related to the bug I see (e.g. there are no quotes
in the data file).

I tried to install 1.9.5 following the instructions (using the devtools
package), but afterwards the version reported (when loading the
package and when running packageVersion('data.table')) was still 1.9.4.
I am not sure whether it is nevertheless 1.9.5. In any case, when running
the test again, the problem is still there.

Maybe you can give it a shot?

Thanks,
Herald



@jangorecki
Member

packageVersion('data.table') reporting 1.9.4 means you are still on 1.9.4; your package installation might have failed.
Before running any tests, make sure packageVersion('data.table') returns 1.9.5.
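A small check along these lines can be run before re-testing, so a stale install is caught up front. `require_version()` is a hypothetical helper written for illustration, not part of data.table; it only relies on base R's `utils::packageVersion()` and the fact that version objects compare against character strings such as "1.9.5".

```r
# Hypothetical helper: stop with a clear message if the installed version
# of `pkg` is older than `min`; otherwise return the version invisibly.
require_version <- function(pkg, min) {
  v <- utils::packageVersion(pkg)
  if (v < min) {
    stop(sprintf("%s %s is installed; need %s or later",
                 pkg, as.character(v), min), call. = FALSE)
  }
  invisible(v)
}

# Usage before re-running the fread() test:
# require_version("data.table", "1.9.5")
```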

@heraldb
Author

heraldb commented Nov 16, 2014

Well, I tried again by following the instructions on
https://github.com/Rdatatable/data.table/wiki/Installation

library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)

remove.packages("data.table") # revert back to CRAN
install.packages("data.table")

While doing that I did not see any error messages. But the version
remains 1.9.4.

Any suggestions?

On 16-11-14 at 21:39, Jan Gorecki wrote:

packageVersion('data.table') means you are on 1.9.4. Your package
installation might have failed.
Before any tests, ensure packageVersion('data.table') returns 1.9.5.



@jangorecki
Member

Try running only the first two lines:

library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)

@heraldb
Author

heraldb commented Nov 16, 2014

Yes, in that case I get version 1.9.5. So should the instructions be corrected?

With version 1.9.5 I tried again, and the buffer overflow is gone. However, fread now fails in a different way:

Error in fread(input = file_data, header = FALSE) :
Not positioned correctly after testing format of header row. ch=' '
Calls: rbindlist -> read_data -> fread
Execution halted

It strikes me that the code tests the format of the header row even though there is no header row (header = FALSE).

FWIW, the file is space delimited (one or more spaces), each line has 561 numeric values (e.g. "-6.6768331e-001"), and every line starts with one or more spaces. The function read.table() has no problem with it. The workaround I use now is "data <- as.data.table(read.table(....))"
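A minimal sketch of that workaround on a tiny file with the same layout (leading blanks, one or more spaces between values, scientific notation). The two lines of data are made up for illustration and are not taken from X_test.txt. The sketch relies on read.table()'s default sep = "", which treats any run of white space as a single separator; the data.table conversion only runs if the package is installed.

```r
# Illustrative two-line file in the same layout as X_test.txt:
# leading blanks, one or more spaces between values, scientific notation.
# (These values are made up for the example.)
fn <- tempfile(fileext = ".txt")
writeLines(c("  1.0000000e-001 -6.6768331e-001",
             "  2.0000000e-001  3.0000000e-001"), fn)

# read.table() with its default sep = "" splits on any run of white space
# and ignores leading blanks, so both lines parse into two numeric columns.
df <- read.table(fn, header = FALSE)
str(df)

# Convert to a data.table, if data.table is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  DT <- data.table::as.data.table(df)
  print(DT)
}
```

This is slower than fread on a file of this size, but it sidesteps the header-detection step that fails here.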

@arunsrinivasan
Member

Thanks, I've updated installation instructions to make it clearer.

@arunsrinivasan
Member

@heraldb It would be great if you could go through the "What should the report contain?" part and update your post. The purpose of these instructions is to avoid this kind of back and forth.

  1. Please format your post using markdown so that files and code are easier to read.
  2. Please also provide the code you ran, and comment out the lines that are output from the code or from segfaults/errors.
  3. Provide a minimal example if possible (a link to just the file is sufficient; the entire directory is not needed).
  4. It's great that you included the output from your current version (1.9.4), but more importantly we'd like to know what happens in 1.9.5. Please run it with the verbose=TRUE option (point 5 in the link provided) and paste the output along with the error you get.
  5. Pasting sessionInfo() output for bug reports is generally helpful.

In any case, I've managed to reproduce the error, and marked it as a bug.


require(data.table) ## 1.9.5
DT = fread("X_test.txt", verbose=TRUE)
# Input contains no \n. Taking this to be a filename to open
# File opened, filesize is 0.024641 GB.
# Memory mapping ... ok
# Detected eol as \r\n (CRLF) in that order, the Windows standard.
# Positioned on line 1 after skip or autostart
# This line is the autostart and not blank so searching up for the last non-blank ... line 1
# Detecting sep ... ' '
# Detected 561 columns. Longest stretch was from line 1 to line 30
# Starting data input on line 1 (either column names or first row of data). First 10 characters:   2.571777
# Error in fread("~/Downloads/X_test.txt", verbose = TRUE) :
#   Not positioned correctly after testing format of header row. ch=' '

sessionInfo()
# R version 3.1.2 (2014-10-31)
# Platform: x86_64-apple-darwin13.4.0 (64-bit)

# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base

# other attached packages:
# [1] data.table_1.9.5

# loaded via a namespace (and not attached):
# [1] chron_2.3-45

@heraldb
Author

heraldb commented Nov 17, 2014

Yes, the instructions to install a development version are clear now!
And yes, next time I'll do my best to create a better/more detailed bug report.

Thanks a lot!

@arunsrinivasan
Member

👍

@heraldb
Author

heraldb commented Nov 17, 2014

A minimal dataset that reproduces this error:

library(data.table)

packageVersion('data.table')
#[1] ‘1.9.5’

#create minimal data set
fn = 'data-956.txt'
write(file = fn, " 2 3")

# now read it with fread()
dt = fread(fn, header = FALSE, verbose = TRUE)
#Input contains no \n. Taking this to be a filename to open
#File opened, filesize is 0.000000 GB.
#Memory mapping ... ok
#Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
#Positioned on line 1 after skip or autostart
#This line is the autostart and not blank so searching up for the last non-blank ... line 1
#Detecting sep ... ' '
#Detected 2 columns. Longest stretch was from line 1 to line 1
#Starting data input on line 1 (either column names or first row of data). First 10 characters:  2 3
#Error in fread(fn, header = FALSE, verbose = TRUE) : 
#  Not positioned correctly after testing format of header row. ch=' '
sessionInfo()
#R version 3.1.1 (2014-07-10)
#Platform: x86_64-redhat-linux-gnu (64-bit)
#
#locale:
# [1] LC_CTYPE=nl_NL.utf8       LC_NUMERIC=C             
# [3] LC_TIME=nl_NL.utf8        LC_COLLATE=nl_NL.utf8    
# [5] LC_MONETARY=nl_NL.utf8    LC_MESSAGES=nl_NL.utf8   
# [7] LC_PAPER=nl_NL.utf8       LC_NAME=C                
# [9] LC_ADDRESS=C              LC_TELEPHONE=C           
#[11] LC_MEASUREMENT=nl_NL.utf8 LC_IDENTIFICATION=C      
#
#attached base packages:
#[1] stats     graphics  grDevices utils     datasets  base     
#
#other attached packages:
#[1] data.table_1.9.5
#
#loaded via a namespace (and not attached):
#[1] chron_2.3-45  methods_3.1.1

Same result when removing "header = FALSE" from the fread() call.

By changing the write() statement above, we can try other file layouts.

fread() gives the same error when a header is added as well, like this:

write(file = fn, ncolumns = 1, c(" a b", " 2 3"))

When the leading space is removed from the first line, the fread() error goes away, e.g.:

write(file = fn, ncolumns = 1, c("a b", " 2 3"))

Removing both the header line and the space before the first column also makes the error go away:

write(file = fn, "2 3")

So leading spaces seem to play a role, but that's not the whole story. fread() also gets confused when multiple spaces are used as the separator, which some formats do to align columns by reserving room for a minus sign (so there is one space between "2" and "-2" but two spaces between "2" and "2").

Using a double space between the columns, like this:

write(file = fn, "2  3")

makes fread() produce the same error again.

In short, the problem seems to involve leading spaces and multiple spaces between columns.
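To re-run all of the variants above in one go, something like the following could be used. `variants` and `try_read()` are hypothetical names for this sketch; writeLines() is used instead of write() and writes each string on its own line. The `reader` argument defaults to `data.table::fread`, so the same loop can also be pointed at read.table() for comparison. No expected fread() output is given, since it depends on the installed data.table version.

```r
# File layouts from the report above; each element becomes one line of the file.
variants <- list(
  leading_space           = " 2 3",
  header_leading_space    = c(" a b", " 2 3"),
  header_no_leading_space = c("a b", " 2 3"),
  single_space            = "2 3",
  double_space            = "2  3"
)

# Write `lines` to a temporary file and try to read it with `reader`.
# Returns "ok" on success, otherwise the error message.
try_read <- function(lines, reader = data.table::fread) {
  fn <- tempfile(fileext = ".txt")
  on.exit(unlink(fn))
  writeLines(lines, fn)
  tryCatch({
    reader(fn)
    "ok"
  }, error = function(e) conditionMessage(e))
}

# fread() results, if data.table is installed
if (requireNamespace("data.table", quietly = TRUE)) {
  print(sapply(variants, try_read))
}

# read.table() baseline for comparison
print(sapply(variants, try_read, reader = function(f) read.table(f, header = FALSE)))
```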

@arunsrinivasan
Member

Awesome report! Appreciate it very much. Thanks.

@heiderich

I can confirm the bug, also using version 1.9.4 and the same file as in the original post.

arunsrinivasan self-assigned this Sep 16, 2015
arunsrinivasan added this to the v1.9.6 milestone Sep 16, 2015
@arunsrinivasan
Member

Fixed with commit 0e7a835. Please upgrade and test.

4 participants