
FileStream not suitable for FileIO on POSIX systems #1846

Open
@BCSharp


_io.FileIO is implemented mostly on top of FileStream to access files in the OS file system. Unfortunately, this class does not work well when there are multiple simultaneous writers. This is possibly a Win32 legacy: according to the documentation, writing to a file while it is also being written through another handle may throw an exception. I have not observed exceptions, but I have noticed that simultaneous writes overwrite each other. This is not POSIX behaviour. POSIX safely allows multiple writes through the same descriptor, a duplicated descriptor, or another descriptor opened on the same file, provided appropriate file mode flags are used (e.g. O_APPEND).
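The O_APPEND guarantee can be illustrated with a short Python sketch. The helper name `append_from_threads` is hypothetical, and the sketch assumes a POSIX system. POSIX specifies that with O_APPEND the file offset is set to the end of the file before each write, with no intervening modification between that positioning and the write. Two threads writing through separate descriptors should therefore end up with exactly the sum of their outputs.

```python
import os
import threading


def append_from_threads(path, payloads, count):
    """Each thread opens its OWN descriptor with O_APPEND and issues `count` write(2) calls.

    With O_APPEND the kernel positions every write at the current end of file,
    so the writers do not overwrite each other's data.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND

    def worker(data):
        fd = os.open(path, flags, 0o644)
        try:
            for _ in range(count):
                os.write(fd, data)  # one unbuffered write(2) per record
        finally:
            os.close(fd)

    threads = [threading.Thread(target=worker, args=(data,)) for data in payloads]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return os.path.getsize(path)
```

With two 10-byte payloads and `count=2000`, the returned size should be 40000 bytes, twice what a single writer produces.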

Consider the following example:

// Test code that accesses one file opened in Append mode simultaneously on two threads
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

string filePath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.UserProfile), "testfile.txt");

// Start from a fresh file
if (File.Exists(filePath)) {
    File.Delete(filePath);
}
// Number of writes per task
const int ndata = 100200300;

// Two tasks append different 10-byte records to the same file
Task task1 = Task.Run(() => WriteToFile(filePath, Encoding.ASCII.GetBytes("xxxxxxxxx\n")));
Task task2 = Task.Run(() => WriteToFile(filePath, Encoding.ASCII.GetBytes("zzzzzzzzz\n")));

Task.WaitAll(task1, task2);

// Each call opens its own FileStream in Append mode; FileShare.Write lets the other task open the file as well
void WriteToFile(string name, byte[] data) {
    using (var fs = new FileStream(name, FileMode.Append, FileAccess.Write, FileShare.Write)) {
        for (int i = 0; i < ndata; i++) {
            fs.Write(data, 0, data.Length);
        }
    }
}

This snippet uses two tasks, each performing 100200300 writes of 10 bytes, so each task produces 1002003000 bytes. Together they should produce a file twice that size, that is, 2004006000 bytes. However, the resulting file is only 1002003000 bytes long (sometimes slightly more) and contains a mixture of x's and z's, a clear sign that the tasks overwrite each other's data.
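One plausible model of the observed overwriting can be checked in a few lines. It is offered as an assumption, not as a description of FileStream internals: each stream determines the end of the file once, when it is opened in FileMode.Append, and then writes at its own remembered position. The Python sketch below uses a hypothetical helper, `simulate_private_offsets`, and is POSIX-only because it relies on `os.pwrite` and `os.pread`. It replays a fixed interleaving of two such writers deterministically and reproduces both symptoms on a small scale: the file ends up the size of one writer's output, and its content mixes both payloads.

```python
import os
import tempfile


def simulate_private_offsets(path, payloads, schedule):
    """Replay `schedule` (a list of writer names) against one file.

    Hypothetical model, not taken from the .NET sources: each writer looks up
    the end of file once, when it is opened, and afterwards writes at its own
    cached offset with pwrite(2) instead of at the current end of file.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = os.fstat(fd).st_size  # "append" position, read only once (0 here)
        offsets = {name: start for name in payloads}
        for name in schedule:
            data = payloads[name]
            os.pwrite(fd, data, offsets[name])  # ignores what the other writer did
            offsets[name] += len(data)
        size = os.fstat(fd).st_size
        return size, os.pread(fd, size, 0)
    finally:
        os.close(fd)


if __name__ == "__main__":
    payloads = {"x": b"xxxxxxxxx\n", "z": b"zzzzzzzzz\n"}
    path = os.path.join(tempfile.mkdtemp(), "model.txt")
    size, content = simulate_private_offsets(path, payloads, ["x", "z", "z", "x", "x", "z"])
    print(size)     # 30
    print(content)  # b'zzzzzzzzz\nxxxxxxxxx\nzzzzzzzzz\n'
```

With three writes per writer, true appending would give 60 bytes. The model leaves 30 bytes, ordered z, x, z.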

For comparison, here is the equivalent example in Python:

import os
import threading

file_path = os.path.join(os.path.expanduser("~"), "testfile.txt")
if os.path.exists(file_path):
    os.remove(file_path)

# Number of writes
ndata = 100200300

def write_to_file(file_path, data):
    with open(file_path, 'ab') as f:
        for _ in range(ndata):
            f.write(data)

thread1 = threading.Thread(target=write_to_file, args=(file_path, b"xxxxxxxxx\n"))
thread2 = threading.Thread(target=write_to_file, args=(file_path, b"zzzzzzzzz\n"))

thread1.start()
thread2.start()

thread1.join()
thread2.join()

When run with CPython on Linux or macOS (not Windows), this code correctly produces a file that is 2004006000 bytes long. IronPython does not.
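A result can be checked by size and by character counts, which distinguishes the two outcomes without inspecting a 2 GB file by eye. The helper below, `verify_append_result`, is a hypothetical name, and it reads the file in chunks. It counts the first byte of each payload rather than whole lines. That keeps the check valid even when a buffered writer, such as the one behind `open(file_path, 'ab')`, flushes a chunk that ends in the middle of a record. It assumes each payload's first byte does not occur in the other payloads, which holds for the x and z records used here.

```python
import os


def verify_append_result(path, payloads, count, chunk_size=1 << 20):
    """Compare a result file with what `count` appends of each payload should give.

    Returns (ok, actual_size, expected_size). The file is read in chunks, so
    multi-gigabyte results do not need to fit in memory.
    """
    markers = [p[:1] for p in payloads]  # e.g. b"x" and b"z"
    counts = {m: 0 for m in markers}
    size = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            size += len(chunk)
            for m in markers:
                counts[m] += chunk.count(m)
    expected_size = count * sum(len(p) for p in payloads)
    expected_counts = {p[:1]: count * p.count(p[:1]) for p in payloads}
    ok = size == expected_size and counts == expected_counts
    return ok, size, expected_size
```

For the full-size run, `verify_append_result(file_path, [b"xxxxxxxxx\n", b"zzzzzzzzz\n"], ndata)` should report an expected size of 2004006000 bytes.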

I am considering the following possible solutions:

  1. In place of System.IO.FileStream, use Mono.Unix.UnixStream (which operates directly on the file descriptor) for all file access in IronPython when running on POSIX OSes. However:
    1. UnixStream is unbuffered, which changes the runtime profile of IronPython. This may actually not be a bad thing, since at this level FileIO is supposed to provide raw (unbuffered) access to the file. Nevertheless, it is a change, and we would have to rely on the buffered wrappers above it to do the buffering well.
    2. UnixStream translates OS errors into native CLR exceptions where possible. This is not desirable for IronPython, which, to match CPython, should raise OSError with the appropriate errno code.
    3. UnixStream does not support the efficient ReadOnlySpan<byte> interfaces of .NET.
      All three concerns can be addressed in various ways (a proxy class, exception unpacking, etc.).
  2. Write our own dedicated stream class that makes low-level OS calls to perform I/O operations (e.g. using Mono.Unix.Native). Such a class can be easily integrated into the rest of the IronPython runtime.
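The shape of option 2 can be sketched in Python terms, purely as an illustration. The real class would be C# on top of Mono.Unix.Native, and the name `PosixRawFile` is hypothetical. The sketch shows the properties discussed above. Each request maps to exactly one OS call, with no buffering at this level. Failures surface as OSError carrying the errno of the failing call. Buffering is left to a standard wrapper layered on top (`io.BufferedWriter` here).

```python
import io
import os


class PosixRawFile(io.RawIOBase):
    """Hypothetical raw (unbuffered) file object working directly on a POSIX descriptor."""

    def __init__(self, path, flags, mode=0o666):
        super().__init__()
        self._fd = -1  # so close() is safe even if os.open fails
        # os.open raises OSError with the errno set by open(2), e.g. ENOENT
        self._fd = os.open(path, flags, mode)
        # O_RDONLY is 0, so a descriptor is readable unless O_WRONLY is set
        self._readable = not (flags & os.O_WRONLY)
        self._writable = bool(flags & (os.O_WRONLY | os.O_RDWR))

    def fileno(self):
        return self._fd

    def readable(self):
        return self._readable

    def writable(self):
        return self._writable

    def seekable(self):
        return True

    def seek(self, offset, whence=os.SEEK_SET):
        return os.lseek(self._fd, offset, whence)

    def readinto(self, buffer):
        # Exactly one read(2) call per request; no buffering at this level
        data = os.read(self._fd, len(buffer))
        buffer[:len(data)] = data
        return len(data)

    def write(self, data):
        # Exactly one write(2) call; with O_APPEND the kernel appends at end of file
        return os.write(self._fd, data)

    def close(self):
        if not self.closed:
            try:
                if self._fd >= 0:
                    os.close(self._fd)
                    self._fd = -1
            finally:
                super().close()
```

Opening with `os.O_WRONLY | os.O_CREAT | os.O_APPEND` gives the POSIX append semantics described at the top of the issue. Errors from `os.open`, `os.read` and `os.write` already carry `errno`, so this sketch needs no exception translation layer.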
