Modify the fsize program to print the other information contained in the inode entry.
This exercise took me 3 weeks and was a wild rabbit hole into Windows and POSIX differences.
My development environment, GCC 8.1.0 built with mingw-w64 on Windows 10, has served me well up until now, but it seems it doesn't cover the POSIX-specific functionality I needed for this exercise.
I knew I could probably find a solution because I got fsize working by using mingw-w64 <dirent.h>:
- replaced the custom opendir, readdir and closedir functions
- replaced Dirent *dp with struct dirent *dp;
- used dp->d_name instead of dp->name
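For reference, here is a minimal sketch of what the directory loop looks like once the custom types are swapped for the ones in <dirent.h>. The helper name count_entries is made up for illustration; it relies only on opendir, readdir and closedir, which both POSIX systems and mingw-w64 declare.

```c
#include <dirent.h>
#include <string.h>

/* count_entries: hypothetical helper, counts the entries in dir
   (excluding "." and ".."); returns -1 if dir cannot be opened */
int count_entries(const char *dir)
{
    DIR *dfd;
    struct dirent *dp;
    int n = 0;

    if ((dfd = opendir(dir)) == NULL)
        return -1;
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->d_name, ".") == 0 || strcmp(dp->d_name, "..") == 0)
            continue;               /* skip self and parent */
        n++;
    }
    closedir(dfd);
    return n;
}
```

Calling count_entries(".") should give the number of files and subdirectories in the current directory.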
So began a 3 week deep dive reading articles, header files, watching videos and much chatgpt-ing...
- rewrote opendir, readdir and closedir with MSVCRT functions _findfirst, _findnext and _findclose
- used values from stat()
Firstly I had another pass at trying to understand the boundaries between WSL, Cygwin, MSYS, MinGW and MinGW-W64.
My understanding is:
Cygwin – Full POSIX-like environment on Windows; heavy, complex, and programs depend on the Cygwin DLL rather than being fully native.
MinGW – GNU build tools for Windows; compiles native programs.
MinGW-w64 – Updated MinGW with 32-bit & 64-bit support; now the standard.
MSYS – Minimal Cygwin-derived shell environment to support MinGW development on Windows.
MSYS2 – Improved Cygwin-based version with a package manager, built around MinGW-w64; now the standard.
WSL1 – Runs Linux binaries through a translation layer on top of the Windows kernel; no VM overhead, but no real Linux kernel either.
WSL2 – Full Linux kernel running in a lightweight Hyper-V VM; now the standard.
Some other readings:
- I found video and video helpful on understanding MSYS.
- article explaining how gcc compiles hello world with MinGW.
- Stack Exchange answer on what POSIX is
- Windows doesn't follow the 'Everything is a file' POSIX approach
- Some old chat about dirent.h with mingw
- (unrelated but cool) Super cool timeline diagram of the evolution of Unix. (Also for Windows and languages)
- (unrelated but cool) Transcribed audio interviews with people who developed UNIX.
Some functions in MinGW seem to be implemented for compatibility rather than full functionality and don’t work as expected.
MinGW maps open, read and close to msvcrt.dll Windows calls, which work with files but not directories.
For instance if I run:
#include <stdio.h>
#include <string.h>   /* strerror */
#include <io.h>
#include <fcntl.h>
#include <errno.h>

int main(void)
{
    int fd = open("some_directory", O_RDONLY);

    printf("%d, errno: %d, strerror: %s\n", fd, errno, strerror(errno));
    return 0;
}
The result is:
-1, errno: 13, strerror: Permission denied
MinGW provides POSIX-like functions opendir, readdir and closedir, but they only partially work.
The dirent structure exists, but certain values are always empty (from MinGW-w64 dirent.h):
struct dirent
{
    long d_ino;               /* Always zero. */
    unsigned short d_reclen;  /* Always zero. */
    unsigned short d_namlen;  /* Length of name in d_name. */
    char d_name[260];         /* [FILENAME_MAX] */ /* File name. */
};
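Since the directory entry itself carries so little, the remaining inode-style information has to come from stat(). Below is a rough sketch of collecting the numeric fields into one line. The helper name format_stat and the output layout are my own invention, and on Windows several of these fields may simply come back as zero.

```c
#include <stdio.h>
#include <sys/stat.h>

/* format_stat: hypothetical helper; writes the numeric stat fields of path
   into buf. Returns 0 on success, -1 if stat() fails or buf is too small. */
int format_stat(const char *path, char *buf, size_t size)
{
    struct stat st;
    int n;

    if (stat(path, &st) == -1)
        return -1;
    /* the casts keep the format string valid whatever the field widths are */
    n = snprintf(buf, size,
                 "ino=%lu mode=%lo nlink=%lu uid=%lu gid=%lu size=%ld",
                 (unsigned long) st.st_ino, (unsigned long) st.st_mode,
                 (unsigned long) st.st_nlink, (unsigned long) st.st_uid,
                 (unsigned long) st.st_gid, (long) st.st_size);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```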
From Microsoft Docs (MSDN) on _stat()
"The inode, and therefore st_ino, has no meaning in the FAT, HPFS, or NTFS file systems."
Documentation for standard C is pretty straightforward, but when you start looking into UNIX-like functions, it gets murky whether they exist and how they work.
I had trouble finding solid documentation. I guess it's some combination of reading all the layers and then a bit of trial and error...
- ISO C - C Standard Library - cppreference
- POSIX - The Open Group docs (SUSv2, SUSv3)
- MinGW-w64 - No centralized documentation, source code is best reference
- GCC - GNU Compiler Docs
- Windows CRT MSVCRT/UCRT Docs
- Windows DLLs - Microsoft Docs (MSDN)
- WindowsNT DLLs - Undocumented
This got me on to trying to see what the compilation process is actually doing, specifically at the linker stage.
I had a deeper look into how gcc works, trying to understand what the macros end up creating, which headers are being included and which DLLs are being linked.
I found these commands quite useful:
gcc -M               List header dependencies (as a make rule)
gcc -H               Print each header as it is included, with paths
gcc -E               Preprocess only
gcc -dD -E           Preprocess + dump defines
gcc -v               Verbose output of what gcc is doing
gcc -dumpspecs       Baked-in defaults gcc follows if you give it no options
gcc -Wl,--verbose    Print out what the linker is doing (no space after the comma)
nm a.exe             List the symbol names embedded in a compiled executable
ar                   Build and inspect archives (static libraries); I honestly don't quite understand this tool yet, but it seems useful
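A small file like the following is handy to run these commands against, because its result depends entirely on which macros the compiler predefines (gcc -dM -E lists those). The function name toolchain_name and the returned strings are just my own labels for illustration.

```c
/* toolchain_name: hypothetical helper; reports which environment compiled
   this file, based on macros the compiler predefines */
const char *toolchain_name(void)
{
#if defined(__CYGWIN__)
    return "Cygwin";
#elif defined(__MINGW64__)              /* only set for 64-bit MinGW-w64 targets */
    return "MinGW-w64 (64-bit target)";
#elif defined(__MINGW32__)              /* set by MinGW and MinGW-w64 alike */
    return "MinGW or MinGW-w64 (32-bit target)";
#elif defined(_WIN32)
    return "other Windows compiler";
#elif defined(__linux__)
    return "Linux";
#else
    return "unknown";
#endif
}
```

Running gcc -E on this file shows the function body reduced to the single return statement that survived preprocessing.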
Found this semi-complete, beginner-friendly guide for working with gcc; it seems the author actually works on gcc now.
Also found this summary of an article on GCC internals pretty straightforward to follow along with.
My understanding of how GCC works:
FROM CHATGPT:
1. Preprocessor (cpp - part of cc1.exe) → Expands macros & includes headers.
2. Compiler (cc1.exe for C, cc1plus.exe for C++) → Translates C/C++ code into assembly.
3. Assembler (as.exe) → Converts assembly into machine code (.o files).
4. Linker (ld/collect2.exe) → Combines .o files and libraries into an executable.
If you want to find where these internal components are you can use a command like:
gcc -print-prog-name=cc1
I found two implementations of dirent.c for Windows which I used as reference:
- MinGW version of dirent.c
- Stack Overflow answer that provided a link to an old implementation of dirent.c
The MinGW version maps functions to probably the correct Win32 API combinations:
opendir - GetFileAttributes
readdir - FindFirst, FindNext
closedir - FindClose
But I couldn't get access to GetFileAttributes() from MinGW, so I went with the other implementation, which maps:
opendir - FindFirst
readdir - FindNext
closedir - FindClose
MinGW gives access to this Win32 functionality through the MSVCRT functions _findfirst, _findnext and _findclose, declared in io.h, which use a struct _finddata_t.
The Win32 API also seems to require you to add /* to the end of the filename if you want a directory's contents, so some_dir becomes some_dir/*
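Building that search pattern is a good place to check the buffer size. A minimal sketch follows; the helper name make_pattern is hypothetical and it uses snprintf rather than strcpy/strcat so an over-long name is rejected instead of overflowing.

```c
#include <stdio.h>

/* make_pattern: hypothetical helper; writes dir followed by the wildcard
   suffix into out. Returns 0 on success, -1 if out is too small. */
int make_pattern(const char *dir, char *out, size_t size)
{
    int n = snprintf(out, size, "%s/*", dir);

    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

With a large enough buffer, make_pattern("some_dir", buf, sizeof buf) leaves the pattern for some_dir in buf, ready to pass to _findfirst.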
- C was designed to be an OS-independent language, but POSIX is not so independent
- Straying from the pure C standard library gets murky on documentation and portability on Windows
- GCC and the GNU tools were built on POSIX; there's only so much MinGW can do on Windows
- Might be better to work with a different compiler if doing a lot of Windows filesystem stuff
It seems there are even some differences between Linux distros in how these directory operations work.
That's not relevant to me as I'm not using Unix.
#include <stdint.h>             /* intptr_t */

#define NAME_MAX 259            /* longest filename component */
                                /* system dependent */
typedef struct {                /* portable directory entry: */
    long ino;                   /* inode number */
    char name[NAME_MAX+1];      /* name + '\0' terminator */
} Dirent;

typedef struct {                /* minimal DIR: no buffering, etc */
    intptr_t fd;                /* search handle from _findfirst (pointer-sized) */
    Dirent d;                   /* the directory entry */
} DIR;

DIR *opendir(char *dirname);
Dirent *readdir(DIR *dfd);
void closedir(DIR *dfd);
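#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>     /* strcmp, strlen, strcpy, strcat, strncpy */
#include <io.h>         /* _findfirst, _findnext, _findclose */
#include <sys/stat.h>
#include <time.h>
#include "dirent.h"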
void fsize(char *);

/* print file sizes */
int main(int argc, char **argv)
{
    if (argc == 1)      /* default: current directory */
        fsize(".");
    else
        while (--argc > 0)
            fsize(*++argv);
    return 0;
}
void dirwalk(char *, void (*fcn)(char *));

/* fsize: print times and size of file "name" */
void fsize(char *name)
{
    struct stat stbuf;

    if (stat(name, &stbuf) == -1) {
        fprintf(stderr, "fsize: can't access %s\n", name);
        return;
    }
    if ((stbuf.st_mode & S_IFMT) == S_IFDIR)
        dirwalk(name, fsize);

    /* st_ctime: creation time on Windows (status-change time on POSIX) */
    struct tm *c_time = localtime(&stbuf.st_ctime);
    char c_time_buf[100];
    strftime(c_time_buf, sizeof(c_time_buf), "%Y-%m-%d %H:%M %p", c_time);
    printf("%20s", c_time_buf);

    /* st_mtime: last modification */
    struct tm *m_time = localtime(&stbuf.st_mtime);
    char m_time_buf[100];
    strftime(m_time_buf, sizeof(m_time_buf), "%Y-%m-%d %H:%M %p", m_time);
    printf("%20s", m_time_buf);

    /* st_atime: last access */
    struct tm *a_time = localtime(&stbuf.st_atime);
    char a_time_buf[100];
    strftime(a_time_buf, sizeof(a_time_buf), "%Y-%m-%d %H:%M %p", a_time);
    printf("%20s", a_time_buf);

    /* cast so the %ld conversion matches whatever width st_size has */
    printf("%8ld %s\n", (long) stbuf.st_size, name);
}
#define MAX_PATH 1024

/* dirwalk: apply fcn to all files in dir */
void dirwalk(char *dir, void (*fcn)(char *))
{
    char name[MAX_PATH];
    Dirent *dp;
    DIR *dfd;

    if ((dfd = opendir(dir)) == NULL) {
        fprintf(stderr, "dirwalk: can't open %s\n", dir);
        return;
    }
    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->name, ".") == 0 ||
            strcmp(dp->name, "..") == 0)
            continue;   /* skip self and parent */
        if (strlen(dir)+strlen(dp->name)+2 > sizeof(name))
            fprintf(stderr, "dirwalk: name %s/%s too long\n", dir, dp->name);
        else {
            sprintf(name, "%s/%s", dir, dp->name);
            (*fcn)(name);
        }
    }
    closedir(dfd);
}
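/* opendir: open a directory for readdir calls */
DIR *opendir(char *dirname)
{
    struct _finddata_t info;
    DIR *dp;
    char dirname_w32[MAX_PATH+3];   /* dirname plus the wildcard suffix and '\0' */

    if (strlen(dirname) > MAX_PATH) /* pattern would overflow the buffer */
        return NULL;
    strcpy(dirname_w32, dirname);
    strcat(dirname_w32, "/*");
    if ((dp = (DIR *) malloc(sizeof(DIR))) == NULL)
        return NULL;
    if ((dp->fd = _findfirst(dirname_w32, &info)) == -1) {
        free(dp);
        return NULL;
    }
    /* _findfirst has already fetched the first entry: keep it for readdir */
    dp->d.ino = 0;
    strncpy(dp->d.name, info.name, NAME_MAX);
    dp->d.name[NAME_MAX] = '\0';
    return dp;
}

/* readdir: read directory entries in sequence */
Dirent *readdir(DIR *dfd)
{
    struct _finddata_t dirbuf;
    static Dirent d;

    if (dfd->d.name[0] != '\0') {   /* first entry, saved by opendir */
        d = dfd->d;
        dfd->d.name[0] = '\0';      /* hand it out only once */
        return &d;
    }
    if (_findnext(dfd->fd, &dirbuf) == -1)
        return NULL;                /* no more entries */
    d.ino = 0;                      /* no inode number available here */
    strncpy(d.name, dirbuf.name, NAME_MAX);
    d.name[NAME_MAX] = '\0';        /* ensure termination */
    return &d;
}

/* closedir: close directory opened by opendir */
void closedir(DIR *dp)
{
    if (dp) {
        _findclose(dp->fd);
        free(dp);
    }
}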
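The three near-identical timestamp blocks in fsize could be folded into one helper. This is only a sketch of that idea, not something the code above does. The name format_time is hypothetical, and it keeps the same format string that fsize passes to strftime.

```c
#include <time.h>

/* format_time: hypothetical helper; formats t as local time into buf.
   Returns 0 on success, -1 if t cannot be converted or does not fit. */
int format_time(time_t t, char *buf, size_t size)
{
    struct tm *tm = localtime(&t);

    if (tm == NULL)
        return -1;
    return strftime(buf, size, "%Y-%m-%d %H:%M %p", tm) == 0 ? -1 : 0;
}
```

In fsize each block would then shrink to a call such as format_time(stbuf.st_mtime, buf, sizeof buf) followed by a single printf("%20s", buf), with the added benefit that a NULL result from localtime is handled.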