Procfs
Latest revision as of 04:21, 27 June 2023
procfs began life in 1984's UNIX V8 as a virtual filesystem dedicated to exporting process information and supporting ptrace (another decades-old mess), presumably after Tom Killian lost a bet to a VAX. Over time, it has grown substantially on any number of operating systems. It is pretty much a required interface on Linux, and strongly recommended on Solaris and FreeBSD.
proc/PID
Each entity associated with a non-zero PID (this includes most kernel threads) has a corresponding toplevel procfs directory named by its PID. For example, when using systemd as init and mounting procfs at /proc, systemd's primary process is described by /proc/1. The process only appears in procfs mounts within the same PID namespace. One of the entries is proc/PID/task, a directory which contains the threads making up the process, using the TID as name:
    [schwarzgerat](1) $ grep ^Threads: /proc/`pidof rtorrent`/status
    Threads: 3
    [schwarzgerat](0) $ ls /proc/`pidof rtorrent`/task
    4282 4283 4309
    [schwarzgerat](0) $
Since time immemorial, tools like ps have enumerated processes via walking procfs's dentries when possible.
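A minimal sketch of that walk, in Python: the all-digit entries under the procfs mount are the PIDs, and the entries of proc/PID/task are the TIDs. The function names and the `proc_root` parameter are made up for illustration (the latter only so the mount point isn't hardcoded); this is not how any particular `ps` does it.

```python
import os

def list_pids(proc_root="/proc"):
    """Return the sorted PIDs that have a directory under proc_root."""
    pids = []
    for entry in os.listdir(proc_root):
        # Per-process directories are the entries whose names are all digits.
        if entry.isdigit() and os.path.isdir(os.path.join(proc_root, entry)):
            pids.append(int(entry))
    return sorted(pids)

def list_tids(pid, proc_root="/proc"):
    """Return the sorted thread IDs listed under proc_root/PID/task."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())
```

Calling `list_tids(os.getpid())` on a live system should include the caller's own PID, since the main thread's TID equals the PID. A process can exit between the two calls, so real code should expect `FileNotFoundError` from `list_tids`.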
procfs since Linux 3.3 accepts a mount option hidepid, taking one of four values:
- 0: everyone may access all proc/PID directories
- 1: users can only access their own proc/PID directories
- 2: users can only *see* their own proc/PID directories
- 4: users can only see directories of processes they can ptrace (this value arrived later than the others, in Linux 5.8)
Linux 3.3 also introduced the gid parameter, which specifies a group ID. Members of this group are exempted from hidepid restrictions.
Note that if you start a process as one user, and change to another user using e.g. setuid(2), you might no longer be able to access your own /proc/PID.
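To see which of these options a given procfs mount is actually using, one can pick them out of the mount table. The following is a sketch only: `proc_mount_options` is a hypothetical helper, it assumes the usual /proc/mounts layout (device, mount point, filesystem type, comma-separated options), and it returns option values as uninterpreted strings, since how the kernel prints the `hidepid` value may vary by version.

```python
def proc_mount_options(mounts_text):
    """Map each procfs mount point to a dict of its mount options.

    mounts_text is the content of a mount table in the /proc/mounts layout:
    device, mount point, fstype, comma-separated options, two trailing numbers.
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "proc":
            continue  # malformed line, or not a procfs mount
        options = {}
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            options[key] = value  # flag-style options such as "rw" map to ""
        result[fields[1]] = options
    return result
```

Fed a made-up line such as `proc /proc proc rw,relatime,hidepid=2,gid=1001 0 0`, the result for `/proc` contains `hidepid` mapped to `"2"` and `gid` mapped to `"1001"`. On a live system the input would come from reading /proc/mounts.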
proc/PID/loginuid
Note that this is only present when CONFIG_AUDIT is enabled in the kernel configuration. Audit doesn't have to be enabled, and auditd doesn't need to be running, but without that option, no loginuid for you.
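Code that wants loginuid therefore has to tolerate the file not existing. A small sketch, with the hypothetical helper `read_loginuid` and a `proc_root` parameter added only for illustration:

```python
import os

def read_loginuid(pid, proc_root="/proc"):
    """Return the loginuid of pid as an int, or None if the file is absent."""
    path = os.path.join(proc_root, str(pid), "loginuid")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        # Either the kernel was built without CONFIG_AUDIT, or the process
        # no longer exists; the two cases are not distinguished here.
        return None
```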
proc/PID/stat sucks
/proc/PID/stat contains a mélange of data, though exactly which data depends on the precise kernel version. The proc(5) manpage describes the fields in terms of scanf conversion operators, which generally lacks rigor due to C numeric types having different ranges on different implementations. This would all be gross, but fundamentally acceptable.
No, the real shit on this sandwich is that it is provably impossible to rigorously delineate the second field, even with a pushdown automaton:
    (2) comm  %s
        The filename of the executable, in parentheses. This is visible whether or not the executable is swapped out.
This field contains the exact contents of proc/PID/comm in matched parens, faithfully reproducing even newlines:
    [schwarzgerat](0) $ for i in `pidof 'an
    n)' '(an))' '\(an\n\)' ` ; do cat /proc/$i/comm /proc/$i/stat ; done
    an
    n)
    16575 (an
    n)) S 26490 16575 26490 34821 16575 4194304 83 0 4 0 0 0 0 0 20 0 1 0 7180301 2322432 190 18446744073709551615 94120227446784 94120227451341 140723167495248 0 0 0 0 0 0 1 0 0 17 2 0 0 0 0 0 94120227462632 94120227463216 94120252813312 140723167503219 140723167503227 140723167503227 140723167506416 0
    (an))
    19009 ((an))) S 18466 19009 18466 34824 19009 4194304 88 0 0 0 0 0 0 0 20 0 1 0 7196127 2322432 191 18446744073709551615 94409545498624 94409545503181 140728365407536 0 0 0 0 0 0 1 0 0 17 2 0 0 0 0 0 94409545514472 94409545515056 94409563447296 140728365413235 140728365413243 140728365413243 140728365416432 0
    \(an\n\)
    19506 (\(an\n\)) S 19091 19506 19091 34825 19506 4194304 87 0 0 0 0 0 0 0 20 0 1 0 7199808 2322432 174 18446744073709551615 94706604249088 94706604253645 140726402607088 0 0 0 0 0 0 1 0 0 17 13 0 0 0 0 0 94706604264936 94706604265520 94706610049024 140726402610026 140726402610037 140726402610037 140726402613229 0
    [schwarzgerat](0) $
We start at the lparen following whitespace following digits. But where do we end? There is no valid delimiter. Even if we don't care about the name of the process, and just want to know its state ('S'leeping in all three cases above), we can't identify where the next token begins when reading left to right. Any potential delimiter can be encoded into a filename under this scheme. Nor can one simply count fields back from the end of the line, since the set of trailing fields has varied over the years. One must lex backwards from the right until reaching the final rparen, and only then go forwards, since the common (and most important) fields are in the front.
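The lex-from-the-right approach can be sketched as follows. This leans on an assumption worth stating loudly: none of the fields after the second one contains an rparen (the documented ones are a state character followed by integers), so the last rparen on the line is taken to close the name. `parse_stat` is a hypothetical helper, not anything procps ships.

```python
def parse_stat(raw):
    """Split the contents of /proc/PID/stat into (pid, comm, state, rest).

    Assumes no field after comm contains ')', so the LAST ')' in the
    input is treated as the end of the comm field.
    """
    lparen = raw.index("(")    # first lparen opens the comm field
    rparen = raw.rindex(")")   # last rparen is assumed to close it
    pid = int(raw[:lparen])
    comm = raw[lparen + 1:rparen]
    fields = raw[rparen + 1:].split()
    # fields[0] is the state; whatever follows depends on kernel version.
    return pid, comm, fields[0], fields[1:]
```

Applied to the first process above, this yields PID 16575, a comm of `an`, newline, `n)`, and state `S`. When reading the real file, note that comm is arbitrary bytes rather than guaranteed UTF-8, so opening it in binary mode or with `errors="surrogateescape"` is the safer choice.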
What, you ask? Couldn't we read proc/PID/comm and thus know the exact match to expect? Unfortunately, a process (or even a thread) can change its "comm value". According to the proc(5) man page, stat's second field reproduces the name of the executable; were that true, comm and stat would disagree for any process which had changed its comm value, and reading comm would not facilitate sane lexing. The documentation is incorrect, and stat follows comm:
    [schwarzgerat](0) $ cat > p
    #!/usr/bin/env bash
    suck () {
      echo "$$ c $(cat /proc/$$/comm) e $(sed -n -e 's/.*\((.*)\).*/\1/p' < /proc/$$/stat)"
    }
    suck
    echo "$1" > /proc/self/task/$$/comm
    echo "$$ Changed my name to $1!"
    suck
    [schwarzgerat](0) $ chmod 755 p && ./p bigdumbsackofshit
    30819 c bash e (bash)
    30819 Changed my name to bigdumbsackofshit!
    30819 c bigdumbsackofsh e (bigdumbsackofsh)
    [schwarzgerat](0) $
Good, I guess? Gotta love that silent truncation to TASK_COMM_LEN bytes (16 as of Linux 5.3.5, counting the terminating NUL, which is why only 15 characters survive above).
To remedy this and other problems, proc/PID/status also exists, reproducing *most* of the information from stat using newline-delimited key-value pairs. The process name here is backslash-escaped. Let's return to our three friendly processes:
    [schwarzgerat](0) $ for i in `pidof 'an
    n)' '(an))' '\(an\n\)' ` ; do cat /proc/$i/comm && head -1 /proc/$i/status ; done
    an
    n)
    Name: an\nn)
    (an))
    Name: (an))
    \(an\n\)
    Name: \\(an\\n\\)
    [schwarzgerat](0) $
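Since status is line-oriented, lexing it is straightforward. The sketch below splits each line at the first colon and undoes the escaping of the Name field. Two assumptions, inferred from the output above rather than taken from any specification: key and value are separated by a colon and a single tab, and only newline and backslash are escaped. `parse_status` and `unescape_name` are hypothetical names.

```python
def unescape_name(value):
    """Undo the escaping of the Name field (assumed: only \\n and \\\\)."""
    out = []
    i = 0
    while i < len(value):
        pair = value[i:i + 2]
        if pair == "\\n":        # backslash + 'n' encodes a newline
            out.append("\n")
            i += 2
        elif pair == "\\\\":     # doubled backslash encodes one backslash
            out.append("\\")
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)

def parse_status(raw):
    """Parse /proc/PID/status contents into a dict of key -> string value."""
    fields = {}
    # Split on "\n" only: str.splitlines() would also break on characters
    # such as "\r" that may legitimately appear inside a name.
    for line in raw.split("\n"):
        key, sep, value = line.partition(":")
        if not sep:
            continue
        # Drop only the single separator tab, so names that begin or end
        # with whitespace survive intact.
        fields[key] = value[1:] if value.startswith("\t") else value
    if "Name" in fields:
        fields["Name"] = unescape_name(fields["Name"])
    return fields
```

Values are left as strings; converting something like `Threads` to an integer is up to the caller.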
Unfortunately, proc/PID/status cheerfully leaves out several fields from stat, including flags and all the timing information, so it's like what was the point of that exactly. Ugh.