Description
_io.FileIO is implemented mostly on top of FileStream to access files in the OS file system. Unfortunately, this class does not work well when there are multiple simultaneous writers. This is possibly a Win32 legacy: according to the documentation, simultaneous writes to a file may cause an exception during a write through another handle. I have not observed exceptions, but I have noticed that simultaneous writes overwrite each other. This is not POSIX behaviour, which safely allows multiple writes through the same descriptor, a duplicate descriptor, or another descriptor opened on the same file, if appropriate file mode flags are used (e.g. O_APPEND).
Consider the following example:
// Test code that accesses one file opened in Append mode simultaneously on two threads
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

string filePath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.UserProfile), "testfile.txt");
if (File.Exists(filePath)) {
    File.Delete(filePath);
}

// Number of writes per task
const int ndata = 100200300;

Task task1 = Task.Run(() => WriteToFile(filePath, Encoding.ASCII.GetBytes("xxxxxxxxx\n")));
Task task2 = Task.Run(() => WriteToFile(filePath, Encoding.ASCII.GetBytes("zzzzzzzzz\n")));
Task.WaitAll(task1, task2);

// Each task opens its own FileStream in Append mode and shares the file for writing
void WriteToFile(string name, byte[] data) {
    using (var fs = new FileStream(name, FileMode.Append, FileAccess.Write, FileShare.Write)) {
        for (int i = 0; i < ndata; i++) {
            fs.Write(data, 0, data.Length);
        }
    }
}
This snippet uses two tasks to perform 100200300 writes each, every write 10 bytes long, so each task produces 1002003000 bytes. Two such tasks should produce a file twice that size, that is, 2004006000 bytes. However, the file created is only 1002003000 bytes long (sometimes a bit more), containing a mixture of x's and z's, clearly a sign of the tasks overwriting the data from each other.
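To make the size arithmetic and the "mixture of x's and z's" observation easy to check, here is a minimal diagnostic sketch. The helper name summarize and the returned dictionary layout are my own, hypothetical choices, not part of IronPython or of the test above; it only assumes the two 10-byte records used in the example.

```python
import os

RECORD_X = b"xxxxxxxxx\n"
RECORD_Z = b"zzzzzzzzz\n"

def summarize(path, writes_per_task):
    """Compare the actual file size with the expected one and classify lines.

    Hypothetical helper: 'mixed' counts lines that are neither a pure x-record
    nor a pure z-record, which is what overwritten data tends to look like.
    """
    counts = {"x": 0, "z": 0, "mixed": 0}
    with open(path, "rb") as f:
        for line in f:
            if line == RECORD_X:
                counts["x"] += 1
            elif line == RECORD_Z:
                counts["z"] += 1
            else:
                counts["mixed"] += 1
    return {
        # two tasks, each writing writes_per_task records of 10 bytes
        "expected_size": 2 * writes_per_task * len(RECORD_X),
        "actual_size": os.path.getsize(path),
        **counts,
    }
```

For the test above, summarize(file_path, 100200300) would report an expected size of 2004006000 bytes; a correct run should show equal x and z counts and no lost records.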
For comparison, here is the equivalent example in Python:
import os
import threading

file_path = os.path.join(os.path.expanduser("~"), "testfile.txt")
if os.path.exists(file_path):
    os.remove(file_path)

# Number of writes
ndata = 100200300

def write_to_file(file_path, data):
    with open(file_path, 'ab') as f:
        for _ in range(ndata):
            f.write(data)

thread1 = threading.Thread(target=write_to_file, args=(file_path, b"xxxxxxxxx\n"))
thread2 = threading.Thread(target=write_to_file, args=(file_path, b"zzzzzzzzz\n"))
thread1.start()
thread2.start()
thread1.join()
thread2.join()
This code, when run with CPython on Linux or macOS (not Windows), correctly produces a file that is 2004006000 bytes long. IronPython, obviously, does not.
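The difference can also be reproduced at the descriptor level, without threads. The following is a small sketch, assuming a POSIX system; the function name write_interleaved and its parameters are hypothetical. It opens the same file twice and alternates writes between the two descriptors, once with O_APPEND and once without.

```python
import os

def write_interleaved(path, append, n=1000):
    """Alternate n 10-byte writes between two independently opened descriptors.

    Returns the resulting file size. With O_APPEND every write lands at the
    current end of file; without it each descriptor keeps its own offset,
    so the second descriptor overwrites what the first one wrote.
    """
    extra = os.O_APPEND if append else 0
    fd1 = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | extra, 0o644)
    fd2 = os.open(path, os.O_WRONLY | extra)
    try:
        for _ in range(n):
            os.write(fd1, b"xxxxxxxxx\n")
            os.write(fd2, b"zzzzzzzzz\n")
    finally:
        os.close(fd1)
        os.close(fd2)
    return os.path.getsize(path)
```

With append=True the file ends up holding all 2 * n records; with append=False it holds only n records, which is the same halving seen with FileStream.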
I am considering the following possible solutions:
- In place of System.IO.FileStream, use Mono.Unix.UnixStream (which operates directly on the file descriptor) for all file access in IronPython when run on POSIX OSes. However:
  - UnixStream is unbuffered, which changes the runtime profile of IronPython. This may actually not be a bad thing, since at this level FileIO is supposed to provide "raw" (unbuffered) access to the file. Nevertheless, it's a change, and let's hope that the buffered wrappers above it do a good job of buffering.
  - The OS errors inside UnixStream are translated to native CLR exceptions, as much as possible. This is not desirable for IronPython which, to match CPython, should produce OSError with an appropriate errno code.
  - UnixStream does not support the efficient ReadOnlySpan<byte> interfaces of .NET.

  All three concerns can be addressed in various ways (proxy class, exception unpacking etc.)
- Write our own dedicated stream class that makes low-level OS calls to perform IO operations (e.g. using Mono.Unix.Native). Such a class can be easily integrated into the rest of the IronPython runtime.
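To illustrate the shape of the second option, here is a rough sketch written in Python rather than C#, so it is only an analogy for the design and not the proposed IronPython code. The class name AppendOnlyRawFile is hypothetical. It shows the three properties discussed above: one OS-level write per call with no user-space buffer, OS errors surfacing as OSError with the original errno, and acceptance of any bytes-like object without copying.

```python
import io
import os

class AppendOnlyRawFile(io.RawIOBase):
    """Hypothetical raw stream that calls the OS directly for every write."""

    def __init__(self, path):
        super().__init__()
        self._fd = -1  # stays -1 if os.open fails, so close() is still safe
        # os.open raises OSError with the OS errno preserved (e.g. ENOENT)
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def writable(self):
        return True

    def fileno(self):
        return self._fd

    def write(self, data):
        # os.write accepts any bytes-like object (bytes, bytearray, memoryview)
        # and returns the number of bytes actually written
        return os.write(self._fd, data)

    def close(self):
        if self.closed:
            return
        try:
            if self._fd >= 0:
                os.close(self._fd)
        finally:
            self._fd = -1
            super().close()
```

Buffering, where wanted, can be layered on top, for example with io.BufferedWriter(AppendOnlyRawFile(path)), which mirrors the split between the raw FileIO level and the buffered wrappers above it.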