From: Duncan Murdoch <murdoch.duncan <at> gmail.com>
Subject: Re: Issue with Control-Z in a text file on Windows - readLines() appears to truncate
Newsgroups: gmane.comp.lang.r.devel
Date: Wednesday 10th April 2013 19:47:32 UTC
On 10/04/2013 10:20 AM, Sean O'Riordain wrote:
> Working on Windows I have had to deal with CSV files that,
> unfortunately, contain embedded Control-Zs, i.e. ASCII character 26 in
> decimal, and the readLines() function in R on Windows (2.15.2 and
> 3.0.0) appears to truncate at the control-Z.  There is no problem at
> all on Ubuntu Linux with R 3.0.0.
>
> Am I mistaken or is this genuine?

Ctrl-Z is the old text file EOF marker from MS-DOS.  readLines() normally 
reads files in text mode using the Microsoft Visual C libraries, so I 
wouldn't be surprised if they respect Ctrl-Z as EOF.
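
To check whether a file really contains a Ctrl-Z byte (0x1A), and where, the file can be scanned in raw form. This is only a minimal sketch: the scratch file and the names path, raw_bytes and ctrl_z_at are made up for illustration.

```r
# Hypothetical scratch file: "abc", a Ctrl-Z byte, then "def" and a newline.
path <- tempfile(fileext = ".txt")
writeBin(c(charToRaw("abc"), as.raw(26), charToRaw("def\n")), path)

# Read every byte untranslated, then locate the 0x1A bytes.
raw_bytes <- readBin(path, what = "raw", n = file.info(path)$size)
ctrl_z_at <- which(raw_bytes == as.raw(26))   # 1-based byte offsets
print(ctrl_z_at)
```

If ctrl_z_at is non-empty, a text-mode read on Windows may stop at the first offset it reports.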

A simpler workaround than the one you used is to read the file in binary 
mode, e.g.

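f <- file("h3.txt", "rb")   # open in binary mode so Ctrl-Z is not treated as EOF
readLines(f)                # read all lines from the open connection
close(f)                    # always close connections you open explicitly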

See the ?file help topic for a discussion of the limitations this may 
impose on you.
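
A self-contained variant of the above, as a rough sketch only: it writes its own test bytes with writeBin() so that the file contents are the same on every platform, then reads them back through a binary-mode connection. The temporary file and the names path, bytes, con and lines are assumptions for illustration, not part of the original report.

```r
# Hypothetical test file: one CSV-like line with an embedded Ctrl-Z (0x1A)
# and a Windows-style CRLF line ending.
path  <- tempfile(fileext = ".txt")
bytes <- c(charToRaw('1,34,"A'), as.raw(26), charToRaw('A",99'),
           as.raw(c(13, 10)))
writeBin(bytes, path)

# Binary mode: the C runtime does no text translation, so Ctrl-Z is just data.
con   <- file(path, "rb")
lines <- readLines(con)
close(con)

# Confirm the Ctrl-Z byte survived the round trip.
has_ctrl_z <- as.raw(26) %in% charToRaw(lines[1])
print(has_ctrl_z)
```

readLines() accepts LF, CRLF or CR as the end-of-line marker whatever mode the connection was opened in, so the trailing carriage return should not end up in the result.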

Duncan Murdoch

>
> # Create a small file with embedded Control-Z
> h3 <- paste('1,34,44.4,"', rawToChar(as.raw(c(65, 26, 65))), '",99')
> h3
> #  "1,34,44.4,\" A\032A \",99"
> writeLines(h3, 'h3.txt')
>
> # now attempt to read the file back in
> h3a <- readLines('h3.txt')
> # but with R 2.15.2 and 3.0.0 on Windows I get the message
> #Warning message:
> #In readLines("h3.txt") : incomplete final line found on 'h3.txt'
> h3a
> # [1] "1,34,44.4,\" A"
> # so it drops from the Control-Z onwards
>
> ####
> # The following is my rough and ready workaround - I'm sure there is a
> # cleaner way
> fnam <- 'h3.txt'
> tmp.bin <- readBin(fnam, raw(), size=1, n=max(2*file.info(fnam)$size, 100))
> tmp.char <- rawToChar(tmp.bin)
> txt <- unlist(strsplit(tmp.char, '\r\n', fixed=TRUE))
> txt
> # [1] "1,34,44.4,\" A\032A \",99"
>
> This was on 64-bit R on 64-bit Windows 7, but it also appears to be
> the case in 32-bit R 2.15.2 on 32-bit Windows 7 inside a
> VirtualBox.
>
> Kind regards,
> Sean O'Riordain
> Trinity College
> Dublin
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
 