There's a feature added in Linux 6.9 that I think people should become more aware of: there's finally an identifier for processes that doesn't wrap around as easily as UNIX pid_t PIDs do. The pidfd file descriptors have been moved onto their own proper file system (pidfs), which at the same time gave them unique inode numbers.
-
Lennart Poettering replied to Lennart Poettering
These inode numbers are (at least on 64bit archs, i.e. anything modern) unique during the entire runtime of a system. And that's fantastic: there's finally a way to reference a process race-free, with the ability to pass the reference around over any form of IPC, without risking that it suddenly starts to refer to some unintended other process.
-
To query the inode number from a pidfd, you use a simple fstat() call, and look at the .st_ino field.
There's currently no way to get from a pidfd inode number directly to a process, however. Hence, for now you always have to pass around a combination of classic PID and the new pidfd inode number. This pair can be safely and correctly turned into a pidfd: 1. First acquire a pidfd from the PID via pidfd_open(). 2. Then fstat() the fd, and check that .st_ino matches the expected value. If it doesn't match, the PID has been reused by a different process.
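The two steps for turning a (PID, pidfd inode) pair back into a pidfd could look roughly like this in C. This is only a sketch: the `sketch_*` function names are made up for illustration, the raw `syscall()` is used on the assumption that your libc may not ship a `pidfd_open()` wrapper, and returning `ESRCH` on an inode mismatch is my own choice, not something the kernel does for you.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: raw syscall, since older glibc has no pidfd_open() wrapper. */
int sketch_pidfd_open(pid_t pid) {
    return (int) syscall(SYS_pidfd_open, pid, 0U);
}

/* Hypothetical helper: read the inode number of a pidfd via fstat(). */
int sketch_pidfd_inode(int pidfd, uint64_t *ret) {
    struct stat st;

    if (fstat(pidfd, &st) < 0)
        return -1;
    *ret = (uint64_t) st.st_ino;
    return 0;
}

/* Turn a (PID, inode) pair back into a pidfd. Returns the fd, or -1 with
 * errno set. ESRCH is also reported (by this sketch, as an assumption) when
 * the PID exists but its inode differs, i.e. the PID was reused. */
int sketch_pidfd_open_verified(pid_t pid, uint64_t expected_ino) {
    uint64_t ino;
    int fd = sketch_pidfd_open(pid);

    if (fd < 0)
        return -1;
    if (sketch_pidfd_inode(fd, &ino) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    if (ino != expected_ino) {
        close(fd);
        errno = ESRCH;
        return -1;
    }
    return fd;
}
```

The returned fd is then safe to use with the usual pidfd calls, because it is pinned to the process whose inode number you checked. The caller is responsible for close()ing it.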
-
If you want a world-wide unique identifier for a process it makes sense to combine the pair of pid_t and pidfd inode number with the system's boot ID (i.e. /proc/sys/kernel/random/boot_id). This triplet is awesome, because for the first time we can uniquely identify a Linux process, globally in this universe.
In systemd we are making use of this heavily now: internally we always store a triplet of pid, pidfd, pidfd inode for referencing processes we manage and…
-
… when we pass around information about processes via IPC we have started to do so via the triplet pid, pidfd inode, boot id.
And I'd recommend everyone dealing with low-level process management to do the same.
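For illustration, the (boot id, pid, pidfd inode) triplet for IPC could be carried around in something like the following. This is a sketch under assumptions: the struct, the `sketch_*` names and the `bootid:pid:inode` string format are all invented here, and this is not how systemd serializes it.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical container for the globally unique process reference. */
struct sketch_proc_ref {
    pid_t pid;
    uint64_t pidfd_ino;
    char boot_id[37]; /* 36-character UUID string plus NUL */
};

/* Read the boot ID from /proc/sys/kernel/random/boot_id. Returns 0 or -1. */
int sketch_read_boot_id(char out[37]) {
    FILE *f = fopen("/proc/sys/kernel/random/boot_id", "r");
    char *r;

    if (!f)
        return -1;
    r = fgets(out, 37, f); /* reads at most 36 characters */
    fclose(f);
    if (!r || strlen(out) != 36)
        return -1;
    return 0;
}

/* Format the triplet as "bootid:pid:inode" (an assumed, illustrative format). */
int sketch_proc_ref_format(const struct sketch_proc_ref *ref, char *buf, size_t len) {
    int n = snprintf(buf, len, "%s:%ld:%" PRIu64,
                     ref->boot_id, (long) ref->pid, ref->pidfd_ino);

    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

A receiver would compare the boot id against its own first, and only then resolve the pid and check the pidfd inode as described earlier.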
-
I think the pair of PID and pidfd inode number would be great to support in the various tools that currently deal with PIDs. For example, I filed an RFE bug against util-linux' kill tool to add just that:
-
It took a long time, but thanks to @brauner after all those years the limitations of UNIX pid_t are addressed! Thanks, Christian!
-
Two caveats though: the concept is not universal: it's a Linux thing, and it requires kernel 6.9 or newer and a 64bit architecture. On 32bit the inode number range is too small to provide unique IDs.
To properly check if the feature is available, allocate a pidfd and check that statfs() on it (i.e. fstatfs() on the fd) reports an .f_type field of 0x50494446. Also verify that sizeof(ino_t) is >= 8.
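That availability check might be sketched like this. The function name and the macro name are hypothetical; only the magic value 0x50494446 and the sizeof(ino_t) test come from the text above. The pidfd is taken on our own PID simply because that one always exists.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/vfs.h>
#include <unistd.h>

/* Magic value from the text above; the macro name is made up for this sketch. */
#define SKETCH_PIDFS_MAGIC 0x50494446UL

/* Returns 1 if pidfd inode numbers can be used as unique process IDs, else 0. */
int sketch_pidfd_inode_ids_supported(void) {
    struct statfs sfs;
    int fd, r;

    /* On 32bit the inode number range is too small to be unique. */
    if (sizeof(ino_t) < 8)
        return 0;

    /* Raw syscall, assuming libc may lack a pidfd_open() wrapper. */
    fd = (int) syscall(SYS_pidfd_open, getpid(), 0U);
    if (fd < 0)
        return 0;

    r = fstatfs(fd, &sfs);
    close(fd);
    if (r < 0)
        return 0;

    /* A pidfd that lives on pidfs reports the pidfs magic here. */
    return (unsigned long) sfs.f_type == SKETCH_PIDFS_MAGIC;
}
```

It makes sense to run this once at startup and cache the result, falling back to plain PIDs when it returns 0.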
-
@pid_eins wtf, the kernel tracks this unique 64 bit number on 32 bit systems but won't let you see it. Infuriating.
This would be a basically perfect use case for name_to_handle_at (and maybe open_by_handle_at?)...
-
@erincandescent cgroupfs actually exposes the cgroup ID via name_to_handle_at() (ntha) and open_by_handle_at() (obha). So yes, there's prior art for doing the same in pidfs. But it's a bit weird because, unlike cgroupfs, pidfs is not an fs you can mount, hence you don't really have anything to invoke obha() on. You'd probably have to get a pidfd on your own PID first, and then pass that to obha() to get to the pidfd you actually want.
-
Erin 👽✨ replied to Lennart Poettering
@pid_eins I was about to say “we’re getting PIDFD_SELF and you could use that in the same way as e.g. AT_FDCWD” except both PIDFD_SELF and AT_FDCWD are defined as -100. This sucks. Maybe we could get them made into different numbers before 6.13 drops? ;_;
-
@pid_eins My perhaps controversial opinion is that from userland the magic file descriptors should Just Work like actual file descriptors in every regard except that you can’t close() them or dup2() to them, but I guess that ship has sailed anyway.
(i.e. you should be able to stuff PIDFD_SELF into an SCM_RIGHTS control message and out of the other end pops a pidfd for your process; no need to pidfd_open(getpid()) and close() it)
-
@pid_eins Since saying this I discovered that newer versions of the patch use different integers. Phew.
Anyway, I sent ntha/obha support for pidfs to lkml, let's see how it goes…