
Procfs

From dankwiki
Revision as of 23:56, 26 June 2023 by Dank (talk | contribs)

procfs began life in 1984's UNIX V8 as a virtual filesystem dedicated to exporting process information and supporting ptrace (another decades-old mess), presumably after Tom Killian lost a bet to a VAX. Over time, it has grown substantially on any number of operating systems. It is pretty much a required interface on Linux, and strongly recommended on Solaris and FreeBSD.

proc/PID

Each entity associated with a non-zero PID (this includes most kernel threads) has a corresponding top-level procfs directory named by its PID. For example, when using systemd as init and mounting procfs at /proc, systemd's primary process is described by /proc/1. A process only appears in procfs mounts within the same PID namespace. One of the entries is proc/PID/task, a directory containing one entry per thread of the process, each named by its TID:

[schwarzgerat](1) $ grep ^Threads: /proc/`pidof rtorrent`/status
Threads:	3
[schwarzgerat](0) $ ls /proc/`pidof rtorrent`/task
4282  4283  4309
[schwarzgerat](0) $ 

Since time immemorial, tools like ps have enumerated processes by walking procfs's dentries when possible.
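A minimal Python sketch of that walk, assuming procfs is mounted at /proc. The helper names list_pids and list_tids are made up for illustration; real tools such as ps do considerably more, and have to cope with processes exiting mid-walk.

```python
import os

def _is_id(name):
    # PID and TID directory names consist solely of ASCII digits.
    return name.isascii() and name.isdigit()

def list_pids(proc_root="/proc"):
    """Return the PIDs visible under proc_root, sorted numerically.

    list_pids is a hypothetical helper, not part of any library.
    """
    with os.scandir(proc_root) as entries:
        return sorted(int(e.name) for e in entries
                      if _is_id(e.name) and e.is_dir())

def list_tids(pid, proc_root="/proc"):
    """Return the TIDs under proc_root/PID/task, or [] if the process is gone."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    try:
        return sorted(int(n) for n in os.listdir(task_dir) if _is_id(n))
    except FileNotFoundError:
        # The process exited between enumeration and this call.
        return []
```

Calling list_tids(pid) for each pid returned by list_pids() gives roughly the view that the ls of the task directory above shows for a single process.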

procfs since Linux 3.3 accepts a mount option hidepid, taking one of three values:

  • 0: everyone may access all proc/PID directories
  • 1: users can only access their own proc/PID directories
  • 2: users can only *see* their own proc/PID directories

Linux 3.3 also introduced the gid parameter, which specifies a group ID. Members of this group are exempted from hidepid restrictions.
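To check which of these options a given procfs mount is using, one can parse the mount table. The sketch below assumes the usual "device mountpoint fstype options ..." line layout of /proc/self/mounts; procfs_mount_options is a hypothetical helper name. Option values are kept as strings, since I would not assume every kernel reports hidepid as a bare digit.

```python
def procfs_mount_options(mounts_text, mount_point="/proc"):
    """Return a dict of mount options for the procfs at mount_point, or None.

    mounts_text is the contents of a mount table such as /proc/self/mounts.
    Flag-style options (e.g. "rw") map to an empty string.
    """
    for line in mounts_text.split("\n"):
        fields = line.split()
        # Expected layout: device mountpoint fstype options dump pass
        if len(fields) >= 4 and fields[1] == mount_point and fields[2] == "proc":
            options = {}
            for opt in fields[3].split(","):
                key, _, value = opt.partition("=")
                options[key] = value
            return options
    return None
```

With the text of /proc/self/mounts as input, options.get("hidepid") and options.get("gid") return the values described above when they are set, and None when the mount does not list them.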

Note that if you start a process as one user, and change to another user using e.g. setuid(2), you will generally no longer be able to access your own /proc/PID.

proc/PID/loginuid

Note that this file is only present when CONFIG_AUDIT is enabled in the kernel configuration. Auditing doesn't have to be turned on at runtime, and auditd doesn't need to be running, but without that option, no loginuid for you.
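A reader therefore has to tolerate the file being absent. A small sketch, with read_loginuid as a hypothetical helper name; it also treats 4294967295, i.e. (uid_t)-1, as "no login UID set", and deliberately folds both cases into None.

```python
import os

# (uid_t)-1: the value reported when no login UID has been assigned.
UNSET_LOGINUID = 4294967295

def read_loginuid(pid="self", proc_root="/proc"):
    """Return the login UID of pid as an int, or None.

    None covers both "file missing" (no CONFIG_AUDIT, or the process is
    gone) and "login UID unset". read_loginuid is a hypothetical helper.
    """
    path = os.path.join(proc_root, str(pid), "loginuid")
    try:
        with open(path) as f:
            value = int(f.read().strip())
    except FileNotFoundError:
        return None
    return None if value == UNSET_LOGINUID else value
```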

proc/PID/stat sucks

/proc/PID/stat contains a mélange of data, though exactly which data depends on the precise kernel version. The proc(5) manpage describes the fields in terms of scanf conversion operators, an approach that lacks rigor because C numeric types have different ranges on different implementations. This would all be gross, but fundamentally acceptable.

No, the real shit on this sandwich is that it is provably impossible to rigorously delineate the second field, even with a pushdown automaton:

  (2) comm  %s
      The filename of the executable, in parentheses. This is
      visible whether or not the  executable is swapped out.

This field contains the exact contents of proc/PID/comm in matched parens, faithfully reproducing even newlines:

[schwarzgerat](0) $ for i in `pidof 'an
n)' '(an))' '\(an\n\)' ` ; do cat /proc/$i/comm /proc/$i/stat ; done
an
n)
16575 (an
n)) S 26490 16575 26490 34821 16575 4194304 83 0 4 0 0 0 0 0 20 0 1 0 7180301 2322432 190 18446744073709551615 94120227446784 94120227451341 140723167495248 0 0 0 0 0 0 1 0 0 17 2 0 0 0 0 0 94120227462632 94120227463216 94120252813312 140723167503219 140723167503227 140723167503227 140723167506416 0
(an))
19009 ((an))) S 18466 19009 18466 34824 19009 4194304 88 0 0 0 0 0 0 0 20 0 1 0 7196127 2322432 191 18446744073709551615 94409545498624 94409545503181 140728365407536 0 0 0 0 0 0 1 0 0 17 2 0 0 0 0 0 94409545514472 94409545515056 94409563447296 140728365413235 140728365413243 140728365413243 140728365416432 0
\(an\n\)
19506 (\(an\n\)) S 19091 19506 19091 34825 19506 4194304 87 0 0 0 0 0 0 0 20 0 1 0 7199808 2322432 174 18446744073709551615 94706604249088 94706604253645 140726402607088 0 0 0 0 0 0 1 0 0 17 13 0 0 0 0 0 94706604264936 94706604265520 94706610049024 140726402610026 140726402610037 140726402610037 140726402613229 0
[schwarzgerat](0) $ 

We start at the lparen following whitespace following digits. But where do we end? There is no valid delimiter. Even if we don't care about the name of the process, and just want to know its state ('S'leeping in all three cases above), we can't identify where the next token begins. Any potential delimiter can be encoded into a filename under this scheme. Worse, because the set of fields has varied over the years, one must lex backwards from the right to find the end of comm, even though the common (and most important) fields are at the front.
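Lexing from the right can be sketched as follows. This is a workaround under a stated assumption, not a rigorous parser: it treats the last ')' in the buffer as the end of comm, which only holds if no field after comm can ever contain a ')' (in the samples above they are all numeric, apart from the one-character state). split_stat is a hypothetical helper name.

```python
def split_stat(stat_text):
    """Split the contents of /proc/PID/stat into (pid, comm, rest).

    ASSUMPTION: the last ')' in stat_text closes the comm field, i.e. no
    later field contains a ')'. rest is the list of whitespace-separated
    fields after comm, so rest[0] is the state character.
    """
    lparen = stat_text.index("(")    # first '(' opens comm
    rparen = stat_text.rindex(")")   # last ')' is taken to close it
    pid = int(stat_text[:lparen])
    comm = stat_text[lparen + 1:rparen]
    rest = stat_text[rparen + 1:].split()
    return pid, comm, rest
```

Applied to the three samples above, this recovers "an\nn)", "(an))" and "\(an\n\)" as comm, with 'S' as the first remaining field each time. Since comm need not be valid UTF-8, it is safer to read the file as bytes or with a lenient error handler before handing it to a function like this.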

What, you ask? Couldn't we read proc/PID/comm and thus know the exact match to expect? Unfortunately, a process (or even a thread) can change its "comm value". According to the proc(5) man page, stat's second field reproduces the filename of the executable; if that were true, comm and stat would disagree for any process that had changed its comm value, and reading comm would be no help in lexing stat. The documentation is incorrect:

[schwarzgerat](0) $ cat > p
#!/usr/bin/env bash
suck () {
  echo "$$ c $(cat /proc/$$/comm) e $(sed -n -e 's/.*\((.*)\).*/\1/p' < /proc/$$/stat)"
}
suck
echo "$1" > /proc/self/task/$$/comm
echo "$$ Changed my name to $1!"
suck
[schwarzgerat](0) $ chmod 755 p && ./p bigdumbsackofshit
30819 c bash e (bash)
30819 Changed my name to bigdumbsackofshit!
30819 c bigdumbsackofsh e (bigdumbsackofsh)
[schwarzgerat](0) $ 

Good, I guess? Gotta love that silent truncation to TASK_COMM_LEN - 1 bytes (TASK_COMM_LEN is 16 as of Linux 5.3.5 and includes the terminating NUL, hence the 15 characters above).

To remedy this and other problems, proc/PID/status also exists, reproducing *most* of the information from stat using newline-delimited key-value pairs. The process name here is backslash-escaped. Let's return to our three friendly processes:

[schwarzgerat](0) $ for i in `pidof 'an
n)' '(an))' '\(an\n\)' ` ; do cat /proc/$i/comm && head -1 /proc/$i/status ; done
an
n)
Name:	an\nn)
(an))
Name:	(an))
\(an\n\)
Name:	\\(an\\n\\)
[schwarzgerat](0) $ 

Unfortunately, proc/PID/status cheerfully leaves out several fields from stat, including flags and all the timing information, so what was the point of that, exactly? Ugh.
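For the fields status does carry, parsing is straightforward. The sketch below assumes each line has the form "Key:<tab>value" and that only newline and backslash are escaped in Name, which is all the transcript above demonstrates; parse_status and unescape_name are hypothetical helper names.

```python
def parse_status(status_text):
    """Parse /proc/PID/status contents into a dict of raw string values."""
    fields = {}
    # Split on "\n" only: str.splitlines() would also break on \v, \f, etc.
    for line in status_text.split("\n"):
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank or malformed line
        # Drop the single tab that follows the colon, nothing else.
        fields[key] = value[1:] if value.startswith("\t") else value
    return fields

def unescape_name(escaped):
    """Undo the backslash escaping of the Name field.

    ASSUMPTION: only "\\n" (newline) and "\\\\" (backslash) are escaped;
    any other backslash sequence is passed through unchanged.
    """
    out = []
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch == "\\" and i + 1 < len(escaped):
            nxt = escaped[i + 1]
            if nxt == "n":
                out.append("\n")
                i += 2
                continue
            if nxt == "\\":
                out.append("\\")
                i += 2
                continue
        out.append(ch)
        i += 1
    return "".join(out)
```

Running unescape_name on the three Name values above gives back exactly what proc/PID/comm printed for each process.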

See Also