Discussion:
finding orphaned files
G***@aotx.uscourts.gov
2008-07-15 15:08:43 UTC
Hi.

So I'm in the process of writing a script that does a few things:

1: creates a report of all RPM installed files whose md5 has changed [1]
2: creates a report of all non-rpm owned files (orphans)

I've actually got the script working. The first step takes about 9 minutes
due to the md5 hashing, and the second step takes all of about 15s. Times
will vary by system, of course. Anyway, there is a problem with the second
report. Many RPMs report that they own a directory instead of a file. That
isn't a bad thing, except that not all RPMs, when removed, will actually
remove the directory and its contents, depending on how the directory was
defined as owned. My original thought was to just check whether the parent
directory was owned, but I don't know how conclusive that would be, or
whether it would hide some of the files I really need to see.
Does anyone have any suggestions?

-greg


[1] I'm using a method similar to this:
http://people.redhat.com/pnasrat/rpm-python/rpm-python-slides/foil15.html
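
For readers following along, the parent-directory idea can be sketched in
shell. This is only an illustration under stated assumptions, not Greg's
script: list_orphans is a hypothetical helper, owned.lst and fs.lst are
made-up file names holding one path per line, and paths are assumed to
contain no newlines. It prints every path that is missing from the owned
list and tags it with whether its parent directory is owned, so nothing is
dropped from the report.

```shell
# list_orphans OWNED_LIST FS_LIST   (hypothetical helper, for illustration)
# OWNED_LIST: paths known to the RPM database, one per line
# FS_LIST:    paths found on disk, one per line
# Output:     "<parent-owned|parent-unowned><TAB><path>" for each unowned path
list_orphans() {
    awk '
        # First file: remember every path the RPM database knows about.
        FILENAME == ARGV[1] { owned[$0] = 1; next }
        # Second file: report paths that are not in the owned set.
        !($0 in owned) {
            parent = $0
            sub("/[^/]*$", "", parent)      # strip the last path component
            if (parent == "") parent = "/"
            flag = "parent-unowned"
            if (parent in owned) flag = "parent-owned"
            print flag "\t" $0
        }
    ' "$1" "$2"
}

# Possible usage on an RPM-based system (not executed here):
#   rpm -qal | sort -u > owned.lst
#   find / -xdev \( -type f -o -type l \) > fs.lst
#   list_orphans owned.lst fs.lst
```

Tagging the parent-owned entries instead of discarding them keeps the full
list available; the tagged lines can be filtered afterwards with grep if
they turn out to be noise.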
Carter Sanders
2008-07-21 21:06:15 UTC
Hi Greg-

First a question back at you - it seems like your first step
replicates the --verify option to some extent. Is there some info not
provided by --verify that you need for step 1?

About "2:" - I find when I set up the spec file for an rpm, it's
best to only define %dir for directories that I'm pretty sure won't already
exist. For example, I would define it for /opt/mypackage, but not /opt. If
you have control over the spec files for the rpms you're talking about, you
could trim these unneeded %dir entries to reduce the orphaned
directories output from your script.

Also, I think you'll find that orphaned files are sometimes config
files, which are retained if they have been edited before rpm removal.

-Carter
_______________________________________________
Rpm-list mailing list
https://www.redhat.com/mailman/listinfo/rpm-list
G***@aotx.uscourts.gov
2008-07-22 16:03:43 UTC
Post by Carter Sanders
Hi Greg-
First a question back at you - it seems like your first step
replicates the --verify option to some extent. Is there some info not
provided by --verify that you need for step 1?
Programmatic access. It doesn't check all the other bits that -V does, and
it also lets me manipulate the list without needing to parse the
output of rpm -Va.
Post by Carter Sanders
About "2:" - I find when I set up the spec file for an rpm, it's
best to only define %dir for directories that I'm pretty sure won't already
exist. For example, I would define it for /opt/mypackage, but not /opt. If
you have control over the spec files for the rpms you're talking about, you
could trim these unneeded %dir entries to reduce the orphaned
directories output from your script.
Also, I think you'll find that orphaned files are sometimes config
files, which are retained if they have been edited before rpm removal.
Fortunately, I am not responsible for all of the Red Hat RPMs.
Unfortunately, I'm only responsible for a handful of our own RPMs. Even
more unfortunately, some of the modifications on the systems (additional
apps and such) aren't always added with RPM. It's this last scenario that I
am trying to provide the enumeration for. The question was more about
being able to tell which way an rpm owned a directory and its contents with
only access to the RPM database.

-greg
Carter Sanders
2008-07-22 17:58:37 UTC
OK. I'm not sure what you mean by "which way". If you haven't already
done so, you might want to investigate the --dump query option to get
detailed info on directory ownership and permissions.
G***@aotx.uscourts.gov
2008-07-22 19:05:26 UTC
Post by Carter Sanders
OK. I'm not sure what you mean by "which way". If you haven't already
done so, you might want to investigate the --dump query option to get
detailed info on directory ownership and permissions.
By "which way" I mean that you can list a directory in the %files section a
handful of ways:

%dir /path/to/dir
/path/to/dir
/path/to/dir/*

I'm not saying that they are all the Right Way, but that doesn't mean that
someone hasn't used each of them at one point or another. Where this makes a
difference is in how the directory gets removed upon package removal. Please
correct me if I'm wrong, but don't the different methods of declaring the
directory in the %files section behave differently upon rpm removal? I want
to say that they do, but it has been a while since I built an RPM with the
purpose of testing them.

The --dump option is helpful and probably what I need to figure out how to
access from inside Python, but it also illustrates my concern.

Let's take the files in /boot/grub and the grub RPM (I'll shorten the output
some for the sake of brevity):

My script tells me that the following files are not owned by an RPM:

/boot/grub/device.map
/boot/grub/e2fs_stage1_5
/boot/grub/fat_stage1_5
/boot/grub/ffs_stage1_5
/boot/grub/grub.conf
/boot/grub/iso9660_stage1_5
/boot/grub/jfs_stage1_5
/boot/grub/menu.lst
/boot/grub/minix_stage1_5
/boot/grub/reiserfs_stage1_5
/boot/grub/stage1
/boot/grub/stage2
/boot/grub/ufs2_stage1_5
/boot/grub/vstafs_stage1_5
/boot/grub/xfs_stage1_5

So a quick wrapper script verifies that (assuming the above list is inside
a file called orphans.lst):

[***@hawk ~]# for x in `cat orphans.lst`; do rpm -q --whatprovides $x; done
file /boot/grub/device.map is not owned by any package
file /boot/grub/e2fs_stage1_5 is not owned by any package
file /boot/grub/fat_stage1_5 is not owned by any package
...
file /boot/grub/ufs2_stage1_5 is not owned by any package
file /boot/grub/vstafs_stage1_5 is not owned by any package
file /boot/grub/xfs_stage1_5 is not owned by any package

Yep, none of them are owned. So let's look at the shared directory path,
/boot/grub.

[***@hawk ~]# rpm -q --whatprovides /boot/grub
grub-0.97-19

Now taking a --dump of grub:
[***@hawk ~]# rpm -q --dump grub
/boot/grub 4096 1190319671 00000000000000000000000000000000 040755 root root 0 0 0 X
/sbin/grub 239780 1190319672 643711e5865afb872322eefd1dfbf2de 0100755 root root 0 0 0 X
/sbin/grub-install 17645 1190319670 de58a44fb9850640aed1df09d2e62c82 0100755 root root 0 0 0 X
...
/usr/share/man/man8/grub.8.gz 766 1190319671 788ab166e49665d586e5b20552d88b67 0100644 root root 0 1 0 X

The only entry related to the /boot/grub directory and its contents was
specifically /boot/grub, and the 3rd from last value supposedly specifies
"isdoc"*. This value is not set to true (which I would assume would be a
1). This should mean that it was defined as '/boot/grub' not '%dir
/boot/grub'. (On a side note, any idea why the grub.8.gz man page file
would be considered a directory? Is the man page wrong?) So when I remove
that rpm, what happens to the directory and the files?

-greg




* per man page of rpm the column explanation is:
path size mtime md5sum mode owner group isconfig isdoc rdev symlink
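
A note on reading those columns: the fifth field is the mode, and its
leading octal digits carry the file type (04 for a directory, 010 for a
regular file, 012 for a symlink), while isdoc marks documentation such as
man pages. That would be consistent with /boot/grub showing 040755 and
grub.8.gz showing isdoc set to 1. The sketch below is an illustration only:
classify_dump is a hypothetical helper name, and it assumes one --dump entry
per line, the 11-column layout quoted above, and no whitespace in paths or
symlink targets.

```shell
# classify_dump: read `rpm -q --dump` entries on stdin (one per line) and
# print "<type><TAB>doc=<isdoc><TAB><path>" for each entry.
classify_dump() {
    awk '
        NF >= 11 {
            kind = substr($5, 1, length($5) - 4)   # drop the 4 permission digits
            sub("^0+", "", kind)                    # drop leading zeros
            if (kind == "4") type = "dir"
            else if (kind == "10") type = "file"
            else if (kind == "12") type = "symlink"
            else type = "other"
            printf "%s\tdoc=%s\t%s\n", type, $9, $1
        }
    '
}

# Possible usage on an RPM-based system (not executed here):
#   rpm -q --dump grub | classify_dump
```

As far as I know, rpm removes only the paths recorded in its database when a
package is erased and leaves a directory in place if it still contains other
files, so the unowned files under /boot/grub would be expected to survive.
That is worth confirming on a test system.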
devzero2000
2008-07-23 08:15:54 UTC
Post by G***@aotx.uscourts.gov
Fortunately, I am not responsible for all of the RedHat RPMs.
Unfortunately, I'm only responsible for a handful of our own RPMs. Even
more unfortunately, some of the modifications on the systems (additional
apps and such) aren't always added with RPM. It's this last scenario that I
am trying to provide the enumeration for. The question was more towards
being able to tell which way an rpm owned a directory and its contents with
only access to the RPM database.
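This simple script shows which rpm owns which dir.

# Work in a private temp directory and collect the report in one file.
_t_d=$(mktemp -d -p /tmp)
_f_f=$(mktemp -p "${_t_d}" XXXXXXX)
# For every installed package, record what rpm reports for its dirnames tag.
for _pkg in $(rpm -qa --qf '%{name}-%{version}\n' | sort | uniq)
do
    rpm -q --qf '%{name} owns %{dirnames}\n' "${_pkg}" | uniq >> "${_f_f}"
done
echo "wrote ${_f_f}"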

FYI, rpm5 has an implicit dependency on the parent dir or filelinkto (so it
is not possible to have an orphan dir, for example; this resolves many
problems on package removal or upgrade). IMHO, orphan files aren't an issue
if they live in /var or in a volatile filesystem.

hth
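
Two caveats about the script above may be worth keeping in mind. As far as I
know, an array tag such as %{dirnames} used outside a [...] iterator prints
only its first element, and dirnames lists the directories that contain a
package's files rather than the directories the package itself owns. The
sketch below is an illustration under those assumptions: index_by_dir is a
hypothetical helper that turns "pkg owns dir" lines into one line per
directory, so directories referenced by several packages stand out.

```shell
# index_by_dir: read "<pkg> owns <dir>" lines on stdin and print
# "<dir>: <pkg> <pkg> ..." sorted by directory.
index_by_dir() {
    awk '
        $2 == "owns" && NF == 3 {
            key = $3 SUBSEP $1
            if (key in seen) next           # skip repeated pkg/dir pairs
            seen[key] = 1
            if ($3 in pkgs) pkgs[$3] = pkgs[$3] " " $1
            else pkgs[$3] = $1
        }
        END { for (d in pkgs) print d ": " pkgs[d] }
    ' | sort
}

# Possible usage on an RPM-based system (not executed here); the iterator
# form walks every dirnames entry, and %{=name} repeats the scalar name:
#   rpm -qa --qf '[%{=name} owns %{dirnames}\n]' | index_by_dir
```

Grouping by directory rather than by package makes it easier to see which
paths are shared, which is the case where removal behaviour matters most.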
G***@aotx.uscourts.gov
2008-07-25 19:36:03 UTC
Post by devzero2000
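This simple script shows which rpm owns which dir.
# Work in a private temp directory and collect the report in one file.
_t_d=$(mktemp -d -p /tmp)
_f_f=$(mktemp -p "${_t_d}" XXXXXXX)
# For every installed package, record what rpm reports for its dirnames tag.
for _pkg in $(rpm -qa --qf '%{name}-%{version}\n' | sort | uniq)
do
    rpm -q --qf '%{name} owns %{dirnames}\n' "${_pkg}" | uniq >> "${_f_f}"
done
echo "wrote ${_f_f}"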
Thanks. I'm still not sure I found exactly what I was looking for, but I'm
close enough at this point.
Post by devzero2000
FYI, rpm5 has an implicit dependency on the parent dir or filelinkto (so it
is not possible to have an orphan dir, for example; this resolves many
problems on package removal or upgrade). IMHO, orphan files aren't an issue
if they live in /var or in a volatile filesystem.
As far as I know we have no plans to go to a different version of RPM than
that provided with RHEL, but that is good to know. It's a nice addition.


-greg
devzero2000
2008-07-28 07:50:19 UTC
Post by G***@aotx.uscourts.gov
As far as I know we have no plans to go to a different version of RPM than
that provided with RHEL, but that is good to know. It's a nice addition.
Heh, only one of many. Anyway, perhaps you are interested in the orphan
directories present in RHEL 5.1. rpm5 can catch them via a popt alias:

rpm -Va --orphandirs

It is very efficient at discovering them.

If you want the list this command produces on RHEL, I can give it to you;
just ask.

Regards
G***@aotx.uscourts.gov
2008-07-28 14:05:01 UTC
Post by devzero2000
Heh, only one of many. Anyway, perhaps you are interested in the orphan
directories present in RHEL 5.1. rpm5 can catch them via a popt alias:
rpm -Va --orphandirs
It is very efficient at discovering them.
If you want the list this command produces on RHEL, I can give it to you;
just ask.
It occurs to me that I should have explained that my purpose in this was to
enumerate RHEL3 systems for a migration to RHEL5.

-greg
