From zblaxell@myrus.com  Sat May 25 02:54:45 1996
From: Zygo Blaxell <zblaxell@myrus.com>
Message-Id: <199605241652.MAA11095@myrus.com>
Subject: Unix Interface Considered Harmful
To: best-of-security@suburbia.net
Date: Fri, 24 May 1996 12:52:21 -0400 (EDT)

(Thought the BOS crowd might like this.  It's an edited reply to one
of the dozens of replies to my earlier 'don't put "find /tmp -exec rm
-f..." in root's crontab' posting.  It's available at:

http://www.ultratech.net/~zblaxell/unix-interface-considered-harmful.txt

Permission granted for unlimited distribution of this document without
modification and with attribution.)

* Unix Interfaces Considered Harmful

Things would be a lot easier if Unix and C had been a bit better
designed.  Once you've removed a few bits of brain-damage from the
standard C/Unix library, people have to think for themselves, and 
that can result in better code.  Not always, but it helps.

Compare an interface that requires a buffer address and maximum length
of the buffer (like fgets) with an interface that only requires a buffer
address and leaves the user responsible for ensuring that the maximum
length of input will be less than the size of the buffer (like gets).
Which is more likely to suffer from buffer-overrun bugs?  
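
As a rough illustration of the difference, here is a minimal sketch of a
line reader built on fgets.  The name read_line and its return
convention are made up for this example; the point is only that the
caller states the buffer size and the library respects it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (capacity size) without
 * ever writing past the end of buf.
 * Returns 0 if a whole line was stored, 1 if the line was too long and
 * was truncated (the rest of the line is discarded), and -1 on EOF or
 * read error. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return -1;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';        /* complete line: strip the newline */
        return 0;
    }

    /* No newline stored: either the input ended without one, or the
     * line did not fit.  Discard whatever is left of the line. */
    int c, truncated = 0;
    while ((c = getc(in)) != EOF && c != '\n')
        truncated = 1;
    return truncated;
}
```

A caller would use it as 'char buf[256]; read_line(stdin, buf, sizeof
buf);' and treat a return of 1 as bad input rather than silently
carrying on.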

Documentation is the first step in combatting the problem.  The GNU
gets() man page contains this disclaimer:

       Because  it is impossible to tell without knowing the data
       in advance how  many  characters  gets()  will  read,  and
       because  gets() will continue to store characters past the
       end of the buffer, it is extremely dangerous to  use.   It
       has  been  used  to  break computer security.  Use fgets()
       instead.

I'm beginning to get the impression we should have some sort of
'security-lint' program that just looks for stuff like 'gets' and
screams about it.  Unfortunately every third program, even the bloody
*shells*, does stupid things like this:

	strace -f sh -c 'cat <<foo;^Jdata^Jfoo^J'
[...snip...]
open("/tmp/t10841-sh", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
	               ^^^^^^^^^^^^^^^^^^^^^^^^ where's O_EXCL?  D'oh!

Exploit:  create symlinks from '/tmp/tXXXXX-sh' to your victim's
favorite files, where 'XXXXX' is one of the next few hundred process
IDs.  Wait for your victim to run a shell script with a << here-document;
the shell follows the symlink and truncates the target with the victim's
privileges.  Works better if you know that root has one in a crontab.

It seems almost all of the security problems I've seen recently come
from these major causes:

	1.  Race conditions in the filesystem.  Everything from 'lpr -s
-r' to 'rm -rf /home/eviluser' to 'find -exec rm -f' simply doesn't do
enough checking.  If any directory in a filename is insecure, any
filename that goes through that directory is insecure as well.
There are relatively few cases where this is a problem, but the few
cases are implemented at *many* sites.

	2.  Omitting O_EXCL from open() flags for creating files that
aren't supposed to already exist.  Any program that uses mktemp() or
fopen(file,"w") probably has this problem lurking somewhere.  There are
a *lot* of these.

	3.  Buffer overrun and data parsing problems.  CGI scripts are
notorious for this, but everything from syslog() to fax-receiving
software has had this sort of problem.  The biggest cause of this
problem seems to be either the lack of a secure interface or the
existence of an insecure one.

Programs ported from one OS to another, where the behavior of system
calls changes, add further cases to categories #1-3.  The behavior of
'chown(symlink,...)' is very different on systems that have 'lchown()'
and on systems that don't.  Striving for the lowest common denominator
means you will have either no security or no functionality.

	4.  Executing dangerous commands in input data by default, when
relatively few benign pieces of input data require them.  nroff and
PostScript "documents" can execute arbitrary programs and read arbitrary
files (as can Microsoft Word, Excel, etc...).  Text editors seem to want
to look all over the place to find configuration files, including the
current directory (nvi at least does one sanity check:  you have to own
the candidate configuration file, or it is ignored).  This problem only
gets worse as the WWW infiltrates computer networks, and there are more
and more documents in these formats floating around and fewer and fewer
paranoid users in proportion to the user population.

	5.  "Real" bugs, where the author went out of his or her way to do
something, usually a complicated operation that does not rely very
heavily on system libraries, and did it wrong.  These are actually quite
rare in my experience; most bugs that I find are due to simple
naivete (so naive that you can grep code for it), not to mistakes made
by people who know what they're doing.  Weak entropy-generators for
cryptographic systems are a common example of this type of problem in
Unix.  The number of Unix programs that need strong entropy generators
is fairly small, and most such programs have already had dozens of
security evaluations before they hit the masses.  Still, within the last
six months Kerberos IV and Netscape have both had these problems
actively exploited.

A common technique used in MS-DOS programs, where their idea of a "data
file format" is equivalent to "the contents of memory", is to dump the
contents of memory to a data file, without completely initializing the
memory first; in the process, the previous contents of the memory are
"leaked".  Often these files are given to someone else as part of data
interchange, and that someone else can recover the leaked data.  This is
rare in Unix because you usually have to explicitly convert the contents
of memory to a portable representation of some kind before writing a
file, or the data file will be non-portable and inherently useless for
data interchange purposes.  
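
If a program does dump a structure straight to a file, the leak can at
least be closed by clearing the memory first.  The sketch below assumes
a made-up 'struct record' and function write_record; it is only meant
to show where the stale bytes would come from (the unused tail of the
name buffer) and how memset removes them.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical on-disk record, written as a raw memory image. */
struct record {
    char name[16];
    int  id;
};

/* Write one record to out.  Returns 0 on success, -1 on error. */
int write_record(FILE *out, const char *name, int id)
{
    struct record r;

    /* Without this, the bytes of r.name after the terminator would be
     * whatever happened to be on the stack, and would end up in the
     * file. */
    memset(&r, 0, sizeof r);

    /* Copy at most 15 bytes; the memset above guarantees termination. */
    strncpy(r.name, name, sizeof r.name - 1);
    r.id = id;

    return fwrite(&r, sizeof r, 1, out) == 1 ? 0 : -1;
}
```

The resulting file is still non-portable (byte order, sizes, and layout
depend on the machine), which is the reason Unix programs rarely do
this in the first place.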

Category #4 problems are the hardest to fix, and only made harder by the
fact that some package authors consider these security problems
"features" and are reluctant to correct or even to properly document
them.  While these features are useful, they are also rather dangerous
and should not be enabled unless a user explicitly requests them, using an
obvious option name like '--allow-shell-commands' or
'--allow-write-access' or even simply '--unsafe'.

One could probably detect most of the category #3 problems (at least in
C programs) by removing functions from the C library, or replacing them
with stubs that generate run-time warning messages.  Without strcpy (use
strncpy), gets (use fgets), sprintf (use snprintf), popen (use only with
constant parameters and never after running set[ug]id; or use pipe() and
execve() instead), and system (ditto), the vast majority of problems in
category #3 are those created by program authors themselves.  Without
these functions, you have to shoot yourself in the foot instead of
letting the system library do it for you; laziness takes care of the
rest.
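
A small sketch of what the bounded replacements look like in use.  The
function join_path is invented for this example.  Note that the bounded
calls still need their results checked: snprintf reports truncation
through its return value, and strncpy does not add a terminating NUL
when the source fills the whole buffer.

```c
#include <stdio.h>

/* Hypothetical helper: build "<dir>/<name>" in out (capacity size).
 * Returns 0 on success, or -1 if the result would not fit.  Either way
 * nothing is written past out[size - 1]. */
int join_path(char *out, size_t size, const char *dir, const char *name)
{
    /* snprintf returns the length the full result would have had, so a
     * value >= size means the output was truncated. */
    int n = snprintf(out, size, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

The sprintf version of the same function would compile just as happily
and overrun 'out' the first time someone passed it a long filename.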

Of course, one probably wants to use this pruned system library only for
auditing purposes, as it will break ANSI C compatibility and 9 out of 10
existing C programs.  The Perl language has 'tainting' checks and better
data types that find many similar problems, although certainly not all
of them.  Shell scripts are almost always beyond hope.

Removing mktemp() and changing the behavior of fopen(file,"w") will help
category #2 problems.  Always use mkstemp() instead of mktemp().  Instead
of fopen(file,"w"), programs should create a new file under a different
name with open(newfile,O_WRONLY|O_CREAT|O_EXCL,mode), write it, and
rename the new file to clobber
the old file.  As a bonus, software actually ends up being more robust;
in the event of disaster (like power failure or software abort),
existing data is not necessarily lost.  Of course, this changes a lot of
things about the file, like permissions and ownership, and you need
write access to the directory; the old clobber-the-existing-file
algorithm can be used in these cases, but it shouldn't be the first
choice.
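
The create-write-rename sequence might look roughly like this.  It is a
sketch under assumptions: the function name replace_file, the ".new"
suffix, and the 0600 mode are all invented for the example, and a
production version would want to retry interrupted writes and perhaps
use mkstemp() to pick the temporary name.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: replace the contents of path with data[0..len)
 * without ever truncating the existing file in place.
 * Returns 0 on success, -1 on error (the old file is left untouched). */
int replace_file(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.new", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    /* O_EXCL: refuse to reuse an existing name or follow a symlink. */
    int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += w;
        left -= (size_t)w;
    }

    /* Make sure the new data is on disk before it replaces the old. */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* rename() swaps the new file in as a single step. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

If the machine dies at any point before the rename, the old file is
still there in one piece; the worst case is a leftover ".new" file.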

Category #1 can be "easily" fixed (in quotes because it's not really so
easy) by allowing a process to set an inheritable flag in its state that
causes any system call on a filename whose lookup traverses a symlink to
return EINVAL.  This means that the sanity checks in 'find' and friends
will be unnecessary, since the kernel (or filesystem driver) is now
doing the symlink sanity checking.

An OS with 'fchdir' could implement symlink-aware 'open', 'unlink', and
friends in the C library.  A properly symlink-aware 'lopen' would be nice;
I hate having to keep pasting code into programs to do the
lstat/open/fstat tests to ensure that the file I have opened is the same
as the one I expected it to be.
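
For reference, the lstat/open/fstat test is roughly the following.  The
name open_checked is invented here, and returning EINVAL for a rejected
file is an assumption borrowed from the flag proposed above.  The sketch
only protects the final path component; a symlink in one of the parent
directories is not detected.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open an existing regular file read-only and
 * verify that the file opened is the one that was examined.
 * Returns a descriptor, or -1 with errno set. */
int open_checked(const char *path)
{
    struct stat before, after;

    /* lstat() does not follow a symlink in the last component. */
    if (lstat(path, &before) != 0)
        return -1;
    if (!S_ISREG(before.st_mode)) {     /* symlink, directory, device... */
        errno = EINVAL;
        return -1;
    }

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* If someone swapped the name between lstat() and open(), the
     * device and inode numbers will not match. */
    if (fstat(fd, &after) != 0 ||
        before.st_dev != after.st_dev ||
        before.st_ino != after.st_ino) {
        close(fd);
        errno = EINVAL;
        return -1;
    }
    return fd;
}
```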


-- 
Zygo Blaxell.  Former Unix/soft/hardware guru, U of Waterloo Computer Science 
Club.  Current sysadmin for Myrus Design, Inc.  10th place, ACM Intl Collegiate
Programming Contest Finals, 1994.  Administer Linux nets for food, clothing, 
and anime.  "I gave up $1000 to avoid working on windoze... *sigh*" - Amy Fong

