README for L5 vers 1.1, 950207, *Hobbit*

After examining Tripwire and deciding that it was *way* overkill for my own
purposes, I decided to cobble together my own minimalist solution to the
unix file integrity problem. I call it "L5", for a variety of reasons, and
have decided to present it to the community as a Useful Hack. For all I
know it may have already been done elsewhere, but I haven't yet seen such a
thing mentioned, despite the simple underlying concept.

L5 simply walks down Unix or DOS filesystems, sort of like "ls -R" or "find"
would, generating listings of anything it finds there. It tells you
everything it can about a file's status, and adds on the MD5 hash of it.
Its output is rather "numeric", but it is a very simple format and is
designed to be post-treated by scripts that call L5. Here are some of its
other features:

    Filenames come first, making sorting easier.

    Filenames are delimited in a non-[unix]-spoofable way; ending in "//".
    A single character after "//" indicates the file type.

    Scanning stops at device boundaries, so L5 doesn't go slogging through
    random NFS trees or "tmpfs"es unless you tell it to.

    You can tell it not to walk any directories lower than the one[s] you
    handed it as arguments. [It always walks one level of its given
    arguments.]

    You can tell it to only print the filenames.

    You can specify a file to use as a "timestamp", and L5 will only print
    files that have been changed since the timestamp. Useful for automated
    backup scripts; see below.

    If a file looks like a script of some kind, it is shown as type "K"
    instead of "F". Useful for finding those setuid shell scripts...

    MD5 hashing can be output in hex, Tripwire's radix64 format, or not at
    all, as you specify. The hex hash for a given file is the same as that
    of the CERT "md5check".

    You can feed it a list of files or directories to check as its standard
    input.

    You can have it do its hash *on* standard input.
    This feature is useful for doing things like "l5 /critical/files | l5"
    to get a small but secure summary hash.

    It is small and reasonably fast.

Building is straightforward. The Makefile's primary targets are system
types, so "make" one of those. If you don't see your system listed, try
"generic" or check out generic.h to see how to build an appropriate section
for your own. If you do build a section for a new system, please send me a
copy to add to my collection! And yes, I know about GNU autoconf already.

After building, you should try these two tests. First, doing
"echo foofoo | ./l5" [on unix] should produce

    -STANDARD INPUT-//X - - -/- 7 - 1vnGam6fDhYM5zgofZB2Ei

Next, do "./l5 /dev/ttyp0" and make sure it shows the same major and minor
device numbers as "ls -l /dev/ttyp0" does. [It may be /dev/pts/something or
/devices/pseudo/pts@god-knows-what on your system. Do the right thing.]

Some of L5 is based on code from Tripwire, but it doesn't use a DBM database
and only offers one hash method. The MD5 code, in particular, is the
endian-independent version from Tripwire, which builds almost anywhere.
Selection of files to ignore certain changes in is undoubtedly less
versatile than Tripwire's, but you can always filter the output through
further scripts before, for example, diffing your "old" system snapshot
against your "new" system snapshot. Unlike Tripwire itself, this is NOT a
complete toolkit -- one is expected to use it as a small, reliable part of a
larger system.

I generally run the output of L5 through "sed" scripts -- here are some
example regexes to select certain criteria to "p" or "d", for instance.

    # set[ug]id files; also handles extra "tcb" mode bits on AIX
    /\/\/. [0-9]* [10]*[2-7]... /
    # world-writeable plain files
    /\/\/. [0-9]* 10...[2367] /
    # world-writeable directories
    /\/\/D [0-9]* 4...[2367] /
    # low-numbered major devices, such as mem/kmem
    /\/\/[CB].* [0-9],/
    # logfiles in /usr/adm, /var/log, etc that are always changing anyway
    /\/[al][do][mg]\/.*log\/\/F /
    # You get the idea...

Output is in the following format:

    /file/name//T inode mode links uid/gid size mtime extra

where:

    T       is the file type [see below]
    inode   inode number, a la "ls -i"
    mode    whole mode in octal
    links   number of hardlinks
    uid     owner UID
    gid     owner GID
    size    in decimal
    mtime   in long hex
    extra   varies

File types are as follows:

    F       plain file. "extra" is the MD5 hash, or "-".
    K       plain file that looks like a script [#!/bin/foo].
    L       symlink. "extra" is where it points to.
    D       directory. "extra" is the device it lives on.
    C       character device special. "extra" is major,minor.
    B       block device special. "extra" is major,minor.
    P       FIFO. No "extra".
    S       socket. No "extra".
    X       unknown, or standard-input.

Some generic examples:

    /tmp//D 2 43777 6 0/0 512 2e71fccc 703
    /etc/passwd//F 194 100644 1 0/10 10927 2e70ea30 0LOyVbfQFCUvq64c3XePV5
    /dev/console//C 3876 20622 1 0/0 0 2e71fc14 0,0
    /diskless/root/dev/fd0b//B 169025 60666 1 0/1 0 2e0a2ea3 16,1
    /dev/rst8//C 1693 20666 1 0/1 0 2e0a2eea 18,8
    /tools/src/localbuild.csh//K 134417 100644 1 433/100 288 2d90c589 -

L5 should build on DOS machines unmodified. The creaky old compiler I have
doesn't have a simple readdir() equivalent, so I've supplied one. Newer
compilers may have library functions to handle this. L5 still has a little
trouble with pathnames involving "\" and "C:", and stat() on DOS is somewhat
incomplete, but it's still useful for detecting changes in a list of paths
you give it. The MD5 algorithm is, however, dismally slow in x86 real mode,
especially if you have disabled your machine's memory caching.

It could conceivably be ported to other system types like VMS, if
appropriate directory-walking handlers were supplied.
There's probably already plenty of example code for that in the GNU stuff,
for instance. I don't have ready access to a VMS machine at the moment; if
you go to do a VMS port please try to keep the output format the same, i.e.
translate from [FOO.BAR]BAZ.XXX to /FOO/BAR/BAZ.XXX format, and make sure
you send me the diffs and extra code!!

L5 prints the mtime rather than the ctime, which some may consider to be
insufficiently "sensitive" due to the ease with which mtimes can be changed.
However, I've run across some systems where a simple file access [like
"cat"ing it] updates the ctime, and if someone has run a backup or
something, you're going to get a *lot* of new output. That one was really
disturbing until I figured out what was going on. Besides, if you observe a
critical system file change in no way but the hash value, that's MUCH more
suspicious than if other attributes changed too. If you want ctimes anyway,
change

    statp->st_size, statp->st_mtime, op);

to

    statp->st_size, statp->st_ctime, op);

in the "big printf" around line 380 in l5.c [and beware the "offconv" hack].

If L5 is given a timestamp file, its functionality changes somewhat. First,
it takes the timestamp from the *mtime* of the given file, and compares it
against the *ctime* of any files encountered in its travels. Second, only
newer files are printed, directories are always printed, and everything else
like devices and links is ignored. This is designed to pipe into
"cpio -pvadm" as an automatic backup handler that duplicates a tree into
another place. There is a problem here that this functionality works
around: if cpio is only handed files, with the -d switch meaning "create
needed directories", the resulting dirs are all owned by root, and possibly
mode 700. If directory names are fed into cpio too, the ownership and modes
of them are preserved. Thus, if you want users to be able to grab their own
files out of the backup tree, this needs to print directories.
If you want to be fascist about the backup tree, by all means, take out the
directory check around line 420 in l5.c. You could always use something
other than "cpio", too, although nothing immediately comes to mind that's
any better. An effective "online backup" script can be as simple as

    touch timestampfile.NEW
    l5 -q -t timestampfile.OLD /various /file /systems | \
        cpio -pvadm /backuptree 2>> backup.log
    mv timestampfile.NEW timestampfile.OLD

Note carefully how L5 is invoked, so that only new-file and directory names
get printed; otherwise cpio will be *most* confused. The ctime is compared,
rather than the mtime, so the file gets backed up again if someone changed
its modes, or name, or whatever. The standard error output from cpio is
collected to backup.log so one can see that it ran correctly.

The name "L5" seemed appropriate for a number of slightly silly reasons:

    It does most of what "ls" does, but more.

    It does MD *5* checksums.

    Shady characters might spell "ls" as "l5" to be |<00L.

    The "L5" point in space is a point of gravitational stability -- if
    you're there, you don't have to worry about drifting away. I think
    that's what it is, anyway...

You are hereby WARNED that I sometimes tend to write rather, um, expressive
code and comments. If you don't like observing the occasional obscenity,
don't read it. This is supplied AS IS, by a hacker for hackers, and you're
expected to be able to deal with any quirks or deficiencies. I have tried
to make it as solid and portable as I could, while retaining the freedom to
take swipes at well-known vendor stupidities. If you make improvements,
send them to me so I can update the "master copy".

The major/minor bitshift appears to be 16 on AIX, 18 on Solaris, and 8 in
most other places. This is gross, and I've sort of sleazed around it so
that AIX and Solaris work now. Search l5.c for "sysmacros" if you want to
gaze at the wreckage.
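
To make the bitshift business a little more concrete, here is a minimal
sketch of how a raw device number splits into the major,minor pair that L5
prints for "C" and "B" entries. This is NOT the code from l5.c. The name
split_dev() is made up for illustration, and the layout it assumes (minor
number in the low "shift" bits, major in everything above) is an assumption
on my part; real systems hide the details behind their own macros.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper, not taken from l5.c: split a raw device number
 * into major and minor parts, given the platform's bitshift.
 * ASSUMPTION: the minor number occupies the low "shift" bits and the
 * major number is whatever sits above them. */
static void split_dev(unsigned long dev, unsigned int shift,
                      unsigned long *dmaj, unsigned long *dmin)
{
    /* shifting by the full width of the type is undefined in C */
    assert(shift > 0 && shift < sizeof(unsigned long) * CHAR_BIT);

    *dmaj = dev >> shift;                  /* high bits: major */
    *dmin = dev & ((1UL << shift) - 1UL);  /* low bits: minor  */
}
```

With a shift of 8, calling split_dev((18UL << 8) | 8UL, 8, &dmaj, &dmin)
hands back 18 and 8, the same pair shown for /dev/rst8 in the examples. A
number packed with a shift of 16 only comes apart correctly if you unpack it
with 16 as well, which is why the shift has to be right for each system.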

You cannot use this in any commercial product; it contains bits of code from
Tripwire, which is free in the first place. To avoid having myself, Spaf,
*and* PKP hunting you down to make sure you never work in the field again,
don't go trying to sell this.

_H*