Subject: Re: popen read and write?
Date: Tuesday 9th October 2007 05:06:09 UTC
Michal Kolodziejczyk wrote:
>
> Hello,
> I can see that popen() can be used either to read or to write data
> from/to the process. Is there any hope it would be able to read and
> write data at the same time?
> How do you use lua when trying to write and read to/from the same
> process? (under linux if this matters)

This question comes up again and again, and there are always people who show how easy it is to implement. Unfortunately, nobody yet has talked about the problems that such a bidirectional popen (let's call it popen2) has.

There is a very good reason why such a popen2 isn't part of the POSIX standard: it's _very_very_hard_ to use correctly. (I would even go as far as saying that it's impossible to get right with stdio.) The trivial cases seem to work well, but real work results in a deadlock.

Just take the simplest filter: cat. It echoes everything it reads from stdin to stdout. Let's assume our popen2 uses a single file handle for reading and writing (using two handles doesn't really change the problems) and that it's using stdio (which gives even more problems, but still doesn't change the inherent difficulties). You think this sequence is ok?

  fp = popen2("cat")
  fp:write("Hello World!")
  x = fp:read("*a")
  fp:close()

No, it's broken - not even this trivial example works. It hangs. Why? Because fp is buffered - you need to flush the output buffer, otherwise "cat" gets nothing to echo back to your read(). Ok, let's add a flush:

  fp:write("Hello World!")
  fp:flush_output()
  x = fp:read("*a")

Heck, still no go. It hangs in read() again - still something wrong. What's going on? Easy: cat will not write back such a small amount of data (12 chars). Even if it were in line-buffered mode, there's no \n at the end of "Hello World!". It needs a buffer full of data (whatever that is) or an EOF before it processes the data. So, give it an EOF:

  fp:write("Hello World!")
  fp:flush_output()
  fp:close_output()
  x = fp:read("*a")

Fine. This trivial example works. Kind of.
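The sequence above can be sketched in Python (subprocess.Popen stands in for the hypothetical popen2; the original examples are Lua, Python is used here only so the sketch is runnable as-is). Note that it only works because we flush and close stdin, sending the EOF, before reading:

```python
import subprocess

# Naive bidirectional pipe to "cat". Safe ONLY because the payload
# is tiny and we send EOF before reading.
p = subprocess.Popen(["cat"],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE)

p.stdin.write(b"Hello World!")
p.stdin.flush()       # flush the output buffer, or cat sees nothing
p.stdin.close()       # EOF: tells cat the input is complete
x = p.stdout.read()   # would deadlock for large payloads (see below)
p.wait()
```

Without the flush() and close() the read() hangs, exactly as described above.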
It may work on your system but not on others. Actually, it will work on most systems, but only because the implicit assumptions made by that code (namely those about buffer sizes and the behaviour of "cat") are satisfied on most systems. To see what's wrong, let's shove bigger chunks of data into the filter:

  fp:write(about_64k_of_data)
  fp:flush_output()
  fp:close_output()
  x = fp:read("*a")

64k should be big enough to fill all buffers between the parent and the filter process (the actual value differs greatly between systems). What happens? The program hangs again! Why? Because the parent is still trying to write data to the filter, but the filter is no longer reading it - it is trying to pass already-processed data back to the parent. Deadlock!

Trying to circumvent the deadlock by guessing proper chunk sizes is fruitless. Even if you know the buffer sizes (of the system and of the filter in use!), you may not know in advance how much data the filter will produce, and then you're lost.

There are usually two ways to handle bidirectional popens:

First: use two different processes/tasks/threads, one for sending data to the filter and one for reading data from it. Of course, these two processes shouldn't block each other through an additional communication channel, or the deadlock may come back. This method is the one usually used for typical filters in Unix pipes. It works very well and the implementation poses no big problems. It is used when the reader and the writer process are independent of each other - a simple producer/filter/consumer relationship.

Second: use non-blocking file handles together with select/poll-like system calls to dispatch reading and writing yourself. That prohibits the use of stdio. The stdio routines are not designed for non-blocking access, and all kinds of magic things happen. Even if you think you can get away with select and blocking I/O (you're wrong, btw), stdio won't play nice with you.
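The first method can be sketched in Python (again standing in for Lua, since the point is language-independent): a separate writer thread feeds the filter while the main thread reads, so neither side blocks the other even when the payload exceeds all the pipe buffers:

```python
import subprocess
import threading

# Big enough to overflow typical pipe buffers (64k, as in the text).
data = b"x" * (64 * 1024)

p = subprocess.Popen(["cat"],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE)

def writer():
    # May block while the pipe is full - that's fine, the reader
    # below keeps draining cat's output, so no deadlock.
    p.stdin.write(data)
    p.stdin.close()       # EOF: tell the filter we're done

t = threading.Thread(target=writer)
t.start()
out = p.stdout.read()     # main thread drains the filter's output
t.join()
p.wait()
```

Doing the same with a single thread - write everything, then read - hangs exactly as the 64k example above does.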
You can't really control when and how stdio performs I/O on the relevant file pointers, and you can bet that it works differently in a different implementation. You can't even query whether there's still something in the input buffer! This method is preferred when a single app really has to communicate with the "filter". You have to implement your own buffering with overflow and underflow handling, timeouts, etc. A generic implementation for all kinds of "filters" gives a non-trivial API - probably not easier than the POSIX non-blocking I/O API.

I didn't test Python's popen2 family of functions, but I bet that they won't pass the "cat" test. The users of these functions usually ignore the deadlock problem. They pass tiny packets of data back and forth and hope that it works. It does - until one of the packets exceeds some hard-to-tell size or the filter does something unexpected.

Reliable bidirectional communication between two processes isn't trivial. Getting it right is hard. Don't let people think it's easy.

Back to the original poster: if possible, redirect one end of the stream (input or output) to a file. I.e.:

  fp = io.popen("foo >/tmp/unique", "w")
  fp:write(anything)
  fp:close()
  fp = io.open("/tmp/unique")
  x = fp:read("*a")
  fp:close()

If you want to process it further, pipe it into another Lua instance:

  fp = io.popen("foo | lua part2.lua", "w")
  ...

For reliable bidirectional communication there's nothing at the moment in Lua and, afaik, none of the present extension libraries provide enough functionality to implement it (the multiple-processes method, maybe).

Ciao, ET.

Regarding the hard-to-tell size: Linux some time back changed pipe buffers from a single page-sized buffer to something like 8 page-sized buffers. But each one can be only partially full, as every write consumes at least one buffer. Worst case: a pipe could buffer only 8 bytes if you perform single-byte writes!

Even ssh had this kind of deadlock - too much data on stderr that no one was reading: hang.
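The second method described above - non-blocking handles dispatched by select(), bypassing stdio - can be sketched in Python using raw file descriptors (the descriptor-level os.read/os.write calls here are the Python spelling of the POSIX calls; no stdio-style buffering is involved):

```python
import os
import select
import subprocess

data = b"y" * (64 * 1024)   # again enough to fill the pipe buffers

p = subprocess.Popen(["cat"],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE)
wfd = p.stdin.fileno()
rfd = p.stdout.fileno()
os.set_blocking(wfd, False)  # writes must never block us

sent = 0
chunks = []
while True:
    # Only ask select about the write end while we still have data.
    wlist = [wfd] if sent < len(data) else []
    rready, wready, _ = select.select([rfd], wlist, [])
    if wready:
        sent += os.write(wfd, data[sent:])   # writes what fits
        if sent == len(data):
            p.stdin.close()                  # EOF once all is written
    if rready:
        chunk = os.read(rfd, 65536)
        if not chunk:
            break                            # filter closed its output
        chunks.append(chunk)

out = b"".join(chunks)
p.wait()
```

This is the overflow/underflow bookkeeping the text mentions, in its most stripped-down form; a generic version (timeouts, stderr, partial shutdown) is considerably more code.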