Discussion:
SUG: Automatic RPM database verification and repair
Tony Nelson
2006-11-22 20:54:37 UTC
The RPM database should be verified more often than it is now. I've posted
on this topic to the fedora-devel list.

With FC6 there has been a spate of RPM database corruption. It happened to
me: though there may have been incipient corruption in my FC5, after
--rebuilddb and upgrading successfully I found more corruption later. This
brings up that the RPM database is just assumed to work, but isn't being
checked until it falls over.

I propose that the RPM database should be verified on a regular basis. I
have written a utility, rpm_verify_db, to automatically verify and repair
the RPM database, via a daily cron job. Reports of errors are syslog'd,
emailed to root, and shown by logwatch. It could be incorporated into the
RPM package, or even Yum. It can be found at
<http://georgeanelson.com/rpm-verifydb.htm>.
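Below is a minimal sketch of the reporting half of such a daily job, in Python. It assumes the verification step has already produced a list of problem strings. The function name report_problems is hypothetical and not taken from Tony's script. Under cron, any output is mailed to the crontab owner (root for a system job). The syslog entries are left for logwatch to pick up, if it is configured to do so.

```python
import sys
import syslog


def report_problems(problems, ident="rpm_verify_db"):
    """Log each problem to syslog and stderr; return an exit code for cron.

    Returns 0 when there is nothing to report, 1 otherwise.
    """
    if not problems:
        return 0  # stay silent so cron sends no mail on a clean run
    syslog.openlog(ident=ident, logoption=syslog.LOG_PID,
                   facility=syslog.LOG_USER)
    for problem in problems:
        syslog.syslog(syslog.LOG_ERR, problem)          # persistent record for logwatch
        print(f"{ident}: {problem}", file=sys.stderr)   # cron mails this output
    syslog.closelog()
    return 1
```

A cron wrapper would end with sys.exit(report_problems(problems)), so the job's exit status also reflects the result.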

I propose that Anaconda should check the RPM database before starting an
upgrade to an existing installation. Checking takes under a minute on my
system, so it should not be objectionable. Anaconda should offer to repair
a damaged RPM database (if the Package file is OK) before proceeding with
the installation.

I suggest that the --verifydb command should not be undocumented in RPM and
its manpage. This seems to be on purpose, but I think it is a mistake.

I would like some feedback about these proposals. If they are acceptable I
will file RFE bugs on them.

My knowledge of things RPM is superficial. It would be a good idea to have
my proposed verification and repair methods criticised by authentic RPM
developers.

I will be away for a few days, starting tomorrow, Thursday, US Thanksgiving
holiday.
--
____________________________________________________________________
TonyN.:' The Great Writ <mailto:***@georgeanelson.com>
' is no more. <http://www.georgeanelson.com/>
Nerazzurri
2006-11-24 11:52:10 UTC
>
> Subject: SUG: Automatic RPM database verification and repair
> From: Tony Nelson <***@georgeanelson.com>
> Date: Wed, 22 Nov 2006 15:54:37 -0500
> To: rpm-***@redhat.com
>
> The RPM database should be verified more often than it is now. I've posted
> on this topic to the fedora-devel list.
>
> With FC6 there has been a spate of RPM database corruption. It happened to
> me: though there may have been incipient corruption in my FC5, after
> --rebuilddb and upgrading successfully I found more corruption later. This
> brings up that the RPM database is just assumed to work, but isn't being
> checked until it falls over.
>
> I propose that the RPM database should be verified on a regular basis. I
> have written a utility, rpm_verify_db, to automatically verify and repair
> the RPM database, via a daily cron job. Reports of errors are syslog'd,
> emailed to root, and shown by logwatch. It could be incorporated into the
> RPM package, or even Yum. It can be found at
> <http://georgeanelson.com/rpm-verifydb.htm>.
>
> I propose that Anaconda should check the RPM database before starting an
> upgrade to an existing installation. Checking takes under a minute on my
> system, so it should not be objectionable. Anaconda should offer to repair
> a damaged RPM database (if the Package file is OK) before proceeding with
> the installation.
>
> I suggest that the --verifydb command should not be undocumented in RPM and
> its manpage. This seems to be on purpose, but I think it is a mistake.
>
> I would like some feedback about these proposals. If they are acceptable I
> will file RFE bugs on them.
>
> My knowledge of things RPM is superficial. It would be a good idea to have
> my proposed verification and repair methods criticised by authentic RPM
> developers.


Your idea is good.

But did you test the scripts? It seems that the return value of "rpm
--verifydb" has a problem.

Whether your rpmdb is broken or not, the return value is the same (always 0),
so the scripts will not work well.

I removed my rpmdb directory (rm -rf /var/lib/rpm), and the return value was
still 0, the same as with a good rpmdb.

>
> I will be away for a few days, starting tomorrow, Thursday, US Thanksgiving
> holiday.
Tony Nelson
2006-11-27 03:48:24 UTC
At 7:52 PM +0800 11/24/06, Nerazzurri wrote:
>> From: Tony Nelson <***@georgeanelson.com>
...
>> I propose that the RPM database should be verified on a regular basis. I
>> have written a utility, rpm_verify_db, to automatically verify and repair
>> the RPM database, via a daily cron job. Reports of errors are syslog'd,
>> emailed to root, and shown by logwatch. It could be incorporated into the
>> RPM package, or even Yum. It can be found at
>> <http://georgeanelson.com/rpm-verifydb.htm>.
>>
>> I propose that Anaconda should check the RPM database before starting an
>> upgrade to an existing installation. Checking takes under a minute on my
>> system, so it should not be objectionable. Anaconda should offer to repair
>> a damaged RPM database (if the Package file is OK) before proceeding with
>> the installation.
>>
>> I suggest that the --verifydb command should not be undocumented in RPM and
>> its manpage. This seems to be on purpose, but I think it is a mistake.
>>
>> I would like some feedback about these proposals. If they are acceptable I
>> will file RFE bugs on them.
>>
>> My knowledge of things RPM is superficial. It would be a good idea to have
>> my proposed verification and repair methods criticised by authentic RPM
>> developers.
>
>
>Your idea is good.
>
>But did you test the scripts?

Yes.

>It seems that the return value of "rpm
>--verifydb" has a problem.
>
>Whether your rpmdb is broken or not, the return value is the same (always 0),
>so the scripts will not work well.
>
>I removed my rpmdb directory (rm -rf /var/lib/rpm), and the return value was
>still 0, the same as with a good rpmdb.

Rpm does something odd when it is given a non-existent or incomplete path to
an RPM database: it creates a new RPM database there. If it is given a full
path (or a proper relative path) to a broken RPM database, it will report
the errors. For proper operation, the script requires a full path to a
real RPM database. Arguably this is a deficiency in rpm's "--verifydb"
option.
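Here is a sketch of the guard Tony describes. It assumes rpm's global --dbpath option is used to point --verifydb at the database. Before running rpm, it refuses a relative path or a directory without a Packages file. Nerazzurri reports that the exit status stays 0, so as a conservative assumption it also counts any stderr output as a problem. verify_rpmdb and its run parameter are illustrative names. run is injectable so the logic can be exercised without touching a real database.

```python
import os
import subprocess


def verify_rpmdb(dbpath, run=subprocess.run):
    """Return a list of problems found in the RPM database at dbpath."""
    # rpm creates a fresh, empty database at a bad path and then "passes",
    # so check the path ourselves before handing it over.
    if not os.path.isabs(dbpath):
        return [f"{dbpath}: not an absolute path"]
    if not os.path.isfile(os.path.join(dbpath, "Packages")):
        return [f"{dbpath}: no Packages file; refusing to verify"]

    proc = run(["rpm", "--verifydb", "--dbpath", dbpath],
               capture_output=True, text=True)
    problems = []
    if proc.returncode != 0:
        problems.append(f"rpm --verifydb exited with status {proc.returncode}")
    # Assumption: any stderr output signals a problem, even when the exit status is 0.
    problems.extend(line for line in proc.stderr.splitlines() if line.strip())
    return problems
```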
Jeff Johnson
2006-11-24 13:48:20 UTC
On 11/22/06, Tony Nelson <***@georgeanelson.com> wrote:
> The RPM database should be verified more often than it is now. I've posted
> on this topic to the fedora-devel list.
>

Perhaps.

Using --verifydb (or equivalently, running /usr/lib/rpm/rpmdb_verify, the
preferred solution these days) does not do data checks, only structural
checks on the database.

(aside) rpm-4.0.2 used to do --verifydb on every close. I don't
recall any problem that was meaningfully detected and solved
by verifying more often. FWIW, the macros to verify a database
on every close are likely still present and functional in current
rpm.

Note that running --verifydb can/will create locks, which is really
the source of recent reported problems, and running --verifydb
is likelier to create more, rather than fewer, problem reports imho.

The data contained in Packages is signature/digest checked. A digest check
is about the strongest data integrity check possible.

The only other data in an rpmdb is join keys and index keys, and
that data is regenerated whenever --rebuilddb is run.

> With FC6 there has been a spate of RPM database corruption. It happened to
> me: though there may have been incipient corruption in my FC5, after
> --rebuilddb and upgrading successfully I found more corruption later. This
> brings up that the RPM database is just assumed to work, but isn't being
> checked until it falls over.
>

No, there has not been a "spate of RPM database corruption".

Yes there have been a number of reports of stale locks and cache
(not database) incoherency. And there are a few other segfaults
being reported against rpm.

But feel free to describe rpm problems however you wish.

> I propose that the RPM database should be verified on a regular basis. I
> have written a utility, rpm_verify_db, to automatically verify and repair
> the RPM database, via a daily cron job. Reports of errors are syslog'd,
> emailed to root, and shown by logwatch. It could be incorporated into the
> RPM package, or even Yum. It can be found at
> <http://georgeanelson.com/rpm-verifydb.htm>.
>

rpm used to do --verifydb, stopped because no problems were prevented or
solved by verifying.

> I propose that Anaconda should check the RPM database before starting an
> upgrade to an existing installation. Checking takes under a minute on my
> system, so it should not be objectionable. Anaconda should offer to repair
> a damaged RPM database (if the Package file is OK) before proceeding with
> the installation.
>

Anaconda used to do a --rebuilddb, stopped because no problems were
being usefully solved.

> I suggest that the --verifydb command should not be undocumented in RPM and
> its manpage. This seems to be on purpose, but I think it is a mistake.
>

The option is undocumented because using /usr/lib/rpm/rpmdb_verify (which
is exactly the same operation) is the preferred means of database repair.

rpmdb_verify is exactly the Sleepycat distributed utility linked against rpm
rather than system libraries, and is quite well documented at sleepycat.com.

> I would like some feedback about these proposals. If they are acceptable I
> will file RFE bugs on them.
>
> My knowledge of things RPM is superficial. It would be a good idea to have
> my proposed verification and repair methods criticised by authentic RPM
> developers.
>

None of the above should be considered criticism, just history.

There are two needs if one wants to protect against rpmdb data loss.

The most important is saving a copy of /var/lib/rpm/Packages routinely.
All other information in an rpmdb can be regenerated from a reasonably
recent copy of Packages. And in most cases a depsolver like yum/smart/apt/poldek
will reinstall the packages that have changed since the last copy of Packages
was saved.
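The following sketches routine Packages backups with simple rotation. It assumes a dedicated backup directory and a retention count; backup_packages, backup_dir and keep are illustrative names. Copying a Berkeley DB file while rpm or yum is writing to it can capture an inconsistent image, so this should run when no transaction is in progress.

```python
import os
import shutil
import time


def backup_packages(dbpath, backup_dir, keep=7, now=None):
    """Copy dbpath/Packages to a timestamped file; keep only the newest `keep`.

    Returns the names of the copies kept, oldest first.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    os.makedirs(backup_dir, exist_ok=True)
    # UTC avoids ordering surprises around daylight-saving changes.
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(now))
    shutil.copy2(os.path.join(dbpath, "Packages"),
                 os.path.join(backup_dir, f"Packages-{stamp}"))
    # Zero-padded timestamps sort lexically in chronological order.
    copies = sorted(n for n in os.listdir(backup_dir) if n.startswith("Packages-"))
    for old in copies[:-keep]:
        os.remove(os.path.join(backup_dir, old))
    return copies[-keep:]
```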

The other important need is to have a find-like utility to reconstruct an rpmdb
using only md5 digests of installed files for those people who are not saving
a copy of Packages ;-)
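Here is a rough sketch of the find-like idea. It walks the filesystem, takes the MD5 of each regular file, and counts matches per package against a catalog of known file digests. The catalog is hypothetical here. In practice it would have to be built from the file digests recorded in candidate package headers. Files whose contents are shared by several packages (empty files, for example) cannot be attributed this way, so the counts are evidence, not proof, that a package is installed.

```python
import hashlib
import os
from collections import Counter


def md5_of(path, bufsize=1 << 20):
    """MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()


def guess_installed(root, catalog):
    """Count, per package, how many files under root match the catalog by MD5.

    catalog: hypothetical {md5 hexdigest: package name} mapping.
    """
    hits = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, devices, sockets, etc.
            try:
                digest = md5_of(path)
            except OSError:
                continue  # unreadable file: no evidence either way
            pkg = catalog.get(digest)
            if pkg is not None:
                hits[pkg] += 1
    return hits
```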

73 de Jeff
Tony Nelson
2006-11-27 03:48:28 UTC
At 8:48 AM -0500 11/24/06, Jeff Johnson wrote:
>On 11/22/06, Tony Nelson <***@georgeanelson.com> wrote:
>> The RPM database should be verified more often than it is now. I've posted
>> on this topic to the fedora-devel list.
>>
>
>Perhaps.
>
>Using --verifydb (or equivalently, running /usr/lib/rpm/rpmdb_verify, the
>preferred solution these days) does not do data checks, only structural
>checks on the database.

True. It would be a good thing if RPM did such checks.


>(aside) rpm-4.0.2 used to do --verifydb on every close. I don't
>recall any problem that was meaningfully detected and solved
>by verifying more often. FWIW, the macros to verify a database
>on every close are likely still present and functional in current
>rpm.

"More often" --> "that often"?


>Note that running --verifydb can/will create locks, which is really
>the source of recent reported problems, and running --verifydb
>is likelier to create more, rather than fewer, problem reports imho.

I hope rpm-verifydb will create more problem reports. Currently, problems
in the RPM database can go unnoticed for a long time, until disaster
strikes.

I chose "rpm --verifydb" over some Berkeley DB tool as I expect that rpm
already does proper locking of its database during checks (modulo any
kernel bugs).


>The data contained in Packages is signature/digest checked. A digest check
>is about the strongest data integrity check possible.
>
>The only other data in an rpmdb is join keys and index keys, and
>that data is regenerated whenever --rebuilddb is run.

I take it that all the data used by "--rebuilddb" is in the Packages file,
and that the Packages file is carefully checked by "--rebuilddb"?


>> With FC6 there has been a spate of RPM database corruption. It happened to
>> me: though there may have been incipient corruption in my FC5, after
>> --rebuilddb and upgrading successfully I found more corruption later. This
>> brings up that the RPM database is just assumed to work, but isn't being
>> checked until it falls over.
>>
>
>No, there has not been a "spate of RPM database corruption".

How do you know? 8-) After all, who's checking now?


>Yes there have been a number of reports of stale locks and cache
>(not database) incoherency. And there are a few other segfaults
>being reported against rpm.
>
>But feel free to describe rpm problems however you wish.

OK.


>> I propose that the RPM database should be verified on a regular basis. I
>> have written a utility, rpm_verify_db, to automatically verify and repair
>> the RPM database, via a daily cron job. Reports of errors are syslog'd,
>> emailed to root, and shown by logwatch. It could be incorporated into the
>> RPM package, or even Yum. It can be found at
>> <http://georgeanelson.com/rpm-verifydb.htm>.
>>
>
>rpm used to do --verifydb, stopped because no problems were prevented or
>solved by verifying.
>
>> I propose that Anaconda should check the RPM database before starting an
>> upgrade to an existing installation. Checking takes under a minute on my
>> system, so it should not be objectionable. Anaconda should offer to repair
>> a damaged RPM database (if the Package file is OK) before proceeding with
>> the installation.
>>
>
>Anaconda used to do a --rebuilddb, stopped because no problems were
>being usefully solved.

Hmm, that would have solved the problem I reported in RH BZ 215127, and
probably the OP's problem as well. I don't know how to find out if it
would not have helped anyone else.


>> I suggest that the --verifydb command should not be undocumented in RPM and
>> its manpage. This seems to be on purpose, but I think it is a mistake.
>>
>
>The option is undocumented because using /usr/lib/rpm/rpmdb_verify (which
>is exactly the same operation) is the preferred means of database repair.

Where is rpmdb_verify documented? "man rpmdb_verify" doesn't find
anything. "apropos rpm | grep verify" doesn't find anything either. It
doesn't seem to be in "man rpm".


>rpmdb_verify is exactly the Sleepycat distributed utility linked against rpm
>rather than system libraries, and is quite well documented at sleepycat.com.

RPM users would benefit from having that documented somewhere in RPM's own
documentation (other than this list). "man rpm" is where they would expect
to find it.


>> I would like some feedback about these proposals. If they are acceptable I
>> will file RFE bugs on them.
>>
>> My knowledge of things RPM is superficial. It would be a good idea to have
>> my proposed verification and repair methods criticised by authentic RPM
>> developers.
>>
>
>None of the above should be considered criticism, just history.
>
>There are two needs if one wants to protect against rpmdb data loss.
>
>The most important is saving a copy of /var/lib/rpm/Packages routinely.
>All other information in an rpmdb can be regenerated from a reasonably
>recent copy of Packages. And in most cases a depsolver like
>yum/smart/apt/poldek will reinstall the packages that have changed since
>the last copy of Packages was saved.

OK, though Packages is too large to save every day, and just keeping the
last one would mean that the sysadmin would need to notice the problem
before that last good copy of Packages was overwritten. (I have an RPM
database instance that has corruption but functioned normally most of the
time, though not during an Anaconda upgrade to FC6. See RH BZ 215127.) It
would seem that it would be better to only save good copies of the Packages
file, and to report to a responsible sysadmin that there is an issue (if
only there were such a sysadmin for most systems). I don't know how to do
that with "rpm --rebuilddb", but I do know how to do that with "rpm
--verifydb".


>The other important need is to have a find-like utility to reconstruct an
>rpmdb using only md5 digests of installed files for those people who are
>not saving a copy of Packages ;-)

/No one/ saves a copy of Packages, as such an implementation detail is
RPM's responsibility, and it does not do it. Don't blame the users for
omissions!
Michael Jennings
2006-11-30 19:43:58 UTC
On Sunday, 26 November 2006, at 22:48:28 (-0500),
Tony Nelson wrote:

> /No one/ saves a copy of Packages, as such an implementation detail
> is RPM's responsibility, and it does not do it.

You are on crack. Backups are the user's responsibility, not the
software's. Filesystems don't back up your data for you. Databases
don't back up your tables for you. Editors don't back up your files
for you (at least not the new versions). SCM systems don't back up
your repositories for you.

If you want backups, YOU have to make them. Stop trying to blame
everybody else and take responsibility for your own (in)actions!

Michael

--
Michael Jennings (a.k.a. KainX) http://www.kainx.org/ <***@kainx.org>
n + 1, Inc., http://www.nplus1.net/ Author, Eterm (www.eterm.org)
-----------------------------------------------------------------------
"Without the hope that things will get better, that our inheritors
will know a world that is fuller and richer than our own, life is
pointless, and evolution is vastly overrated."
-- Mira Furlan (Ambassador Delenn), Babylon Five
Jeff Johnson
2006-11-30 20:05:49 UTC
On Nov 30, 2006, at 2:43 PM, Michael Jennings wrote:

> On Sunday, 26 November 2006, at 22:48:28 (-0500),
> Tony Nelson wrote:
>
>> /No one/ saves a copy of Packages, as such an implementation detail
>> is RPM's responsibility, and it does not do it.
>
> You are on crack. Backups are the user's responsibility, not the
> software's. Filesystems don't back up your data for you. Databases
> don't back up your tables for you. Editors don't back up your files
> for you (at least not the new versions). SCM systems don't back up
> your repositories for you.
>
> If you want backups, YOU have to make them. Stop trying to blame
> everybody else and take responsibility for your own (in)actions!
>

Peace, please ;-)

Automatic db_verify, triggered by an explicit DB_RUNRECOVERY returned from
an explicit dbenv->failchk environment verification, was implemented this
weekend.

That makes this entire thread history afaict.

73 de Jeff
Tony Nelson
2006-11-30 22:01:35 UTC
At 3:05 PM -0500 11/30/06, Jeff Johnson wrote:
>On Nov 30, 2006, at 2:43 PM, Michael Jennings wrote:
>
>> On Sunday, 26 November 2006, at 22:48:28 (-0500),
>> Tony Nelson wrote:
>>
>>> /No one/ saves a copy of Packages, as such an implementation detail
>>> is RPM's responsibility, and it does not do it.
>>
>> You are on crack. Backups are the user's responsibility, not the
>> software's. Filesystems don't back up your data for you. Databases
>> don't back up your tables for you. Editors don't back up your files
>> for you (at least not the new versions). SCM systems don't back up
>> your repositories for you.
>>
>> If you want backups, YOU have to make them. Stop trying to blame
>> everybody else and take responsibility for your own (in)actions!
>>
>
>Peace, please ;-)
>
>Automatic db_verify, triggered by an explicit DB_RUNRECOVERY returned from
>an explicit dbenv->failchk environment verification, was implemented this
>weekend.

I don't quite follow you, does that mean RPM will try to recover when it
notices database corruption? Or just do a verify?

BTW, my personal database that started me on this seemed fine in use, but
it fell over when the correct item was accessed. Updating to FC6 did that,
as did querying the item specifically, and the database failed --verify,
but most other operations worked, including updates. If it matters.


>That makes this entire thread history afaict.

Uhh, OK?
Michael Jennings
2006-11-30 22:28:54 UTC
On Thursday, 30 November 2006, at 17:01:35 (-0500),
Tony Nelson wrote:

> BTW, my personal database that started me on this seemed fine in
> use, but it fell over when the correct item was accessed. Updating
> to FC6 did that, as did querying the item specifically, and the
> database failed --verify, but most other operations worked,
> including updates. If it matters.

So far, everyone I've heard of having problems mentioned FC6. Those
of us not using Fedora don't seem to be having problems. Perhaps
you're barking up the wrong tree.

Michael

--
Michael Jennings (a.k.a. KainX) http://www.kainx.org/ <***@kainx.org>
n + 1, Inc., http://www.nplus1.net/ Author, Eterm (www.eterm.org)
-----------------------------------------------------------------------
"Believe what you will...until experience changes your mind."
-- Mira Furlan (Ambassador Delenn), Babylon Five
James Olin Oden
2006-11-30 18:52:20 UTC
> >The other important need is to have a find-like utility to reconstruct an
> >rpmdb using only md5 digests of installed files for those people who are
> >not saving a copy of Packages ;-)
>
> /No one/ saves a copy of Packages, as such an implementation detail is
> RPM's responsibility, and it does not do it. Don't blame the users for
> omissions!
Hmmm... I guess no one ever saves copies of their MySQL tables, or
PostgreSQL tables if you like (BTW, I completely understand that to do so
one needs the database in a quiescent state)? That there might be a
backup option for rpm is reasonable, but when it should get called is
a completely different question (again we have entered policy land; if
you don't believe me, we can make autorollback the default behavior).

...james
Nerazzurri
2006-11-27 07:19:52 UTC
>
> None of the above should be considered criticism, just history.
>
> There are two needs if one wants to protect against rpmdb data loss.
>
> The most important is saving a copy of /var/lib/rpm/Packages routinely.
> All other information in an rpmdb can be regenerated from a reasonably
> recent copy of Packages. And in most cases a depsolver like
> yum/smart/apt/poldek will reinstall the packages that have changed since
> the last copy of Packages was saved.
>

But how can I know which version of "/var/lib/rpm/Packages" is correct and
works well? If I back up a corrupted "Packages", the backup will be
pointless, won't it? :-)


> The other important need is to have a find-like utility to reconstruct
> an rpmdb using only md5 digests of installed files for those people who
> are not saving a copy of Packages ;-)
>
> 73 de Jeff
> _______________________________________________
> Rpm-list mailing list
> Rpm-***@redhat.com
> https://www.redhat.com/mailman/listinfo/rpm-list
Tony Nelson
2006-11-27 15:43:18 UTC
At 3:19 PM +0800 11/27/06, Nerazzurri wrote:
>>
>> None of the above should be considered criticism, just history.
>>
>> There are two needs if one wants to protect against rpmdb data loss.
>>
>> The most important is saving a copy of /var/lib/rpm/Packages routinely.
>> All other information in an rpmdb can be regenerated from a reasonably
>> recent copy of Packages. And in most cases a depsolver like
>> yum/smart/apt/poldek will reinstall the packages that have changed since
>> the last copy of Packages was saved.
>>
>
>But how can I know which version of "/var/lib/rpm/Packages" is correct and
>works well? If I back up a corrupted "Packages", the backup will be
>pointless, won't it? :-)

The "right" way would be to make a copy of the Packages file, check it, and
only save it if the check passes. According to Jeff, the proper check is
to do a "rpm --rebuilddb" with that Packages file and see if it works, but
I haven't tried that method.
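Here is a sketch of that copy, check, then save sequence. Like Tony, it leaves the check untested: it assumes the check is "rpm --rebuilddb --dbpath" run on a scratch copy, as relayed above. save_if_good is an illustrative name. The candidate snapshot is kept separate from the copy that the rebuild may rewrite. The last good backup is replaced atomically with os.replace, so a failed check never overwrites it.

```python
import os
import shutil
import subprocess
import tempfile


def save_if_good(dbpath, dest, run=subprocess.run):
    """Snapshot dbpath/Packages, test the snapshot, and save it to dest only if it passes.

    Returns True if dest was updated; on failure dest is left untouched.
    """
    with tempfile.TemporaryDirectory() as scratch:
        candidate = os.path.join(scratch, "Packages.candidate")
        testdb = os.path.join(scratch, "testdb")
        os.mkdir(testdb)
        # Take one snapshot; the test copy may be rewritten by the rebuild.
        shutil.copy2(os.path.join(dbpath, "Packages"), candidate)
        shutil.copy2(candidate, os.path.join(testdb, "Packages"))

        proc = run(["rpm", "--rebuilddb", "--dbpath", testdb],
                   capture_output=True, text=True)
        # Conservative assumption: any stderr output also counts as failure.
        if proc.returncode != 0 or proc.stderr.strip():
            return False

        tmp = dest + ".tmp"
        shutil.copy2(candidate, tmp)
        os.replace(tmp, dest)  # atomic on POSIX; tmp and dest share a directory
        return True
```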
Tony Nelson
2006-11-29 23:15:10 UTC
At 10:43 AM -0500 11/27/06, Tony Nelson wrote:
>At 3:19 PM +0800 11/27/06, Nerazzurri wrote:
>>>
>>> None of the above should be considered criticism, just history.
>>>
>>> There are two needs if one wants to protect against rpmdb data loss.
>>>
>>> The most important is saving a copy of /var/lib/rpm/Packages routinely.
>>> All other information in an rpmdb can be regenerated from a reasonably
>>> recent copy of Packages. And in most cases a depsolver like
>>> yum/smart/apt/poldek will reinstall the packages that have changed since
>>> the last copy of Packages was saved.
>>>
>>
>>But how can I know which version of "/var/lib/rpm/Packages" is correct and
>>works well? If I back up a corrupted "Packages", the backup will be
>>pointless, won't it? :-)
>
>The "right" way would be to make a copy of the Packages file, check it, and
>only save it if the check passes. According to Jeff, the proper check is
>to do a "rpm --rebuilddb" with that Packages file and see if it works, but
>I haven't tried that method.

Great. /Now/ these messages come through. Three messages to rpm-list seem
to have been stuck at my server (?). No other messages to other places
appear to have had this issue.

>Received: from dc2-web23.assortedinternet.com (dc2-web23.assortedinternet.com
> [66.36.233.11])
> by mx3.redhat.com (8.13.1/8.13.1) with ESMTP id kATMrqQo020733
> for <rpm-***@redhat.com>; Wed, 29 Nov 2006 17:54:02 -0500
>Received: from pool-68-239-53-64.bos.east.verizon.net ([68.239.53.64]
> helo=[192.168.123.162])
> by dc2-web23.assortedinternet.com with esmtpa (Exim 4.52)
> id 1GoicC-00031b-H0
> for rpm-***@redhat.com; Mon, 27 Nov 2006 10:41:52 -0500
Tony Nelson
2006-11-29 04:32:35 UTC
Permalink
I haven't seen this come through.

At 8:48 AM -0500 11/24/06, Jeff Johnson wrote:
>On 11/22/06, Tony Nelson <***@georgeanelson.com> wrote:
>> The RPM database should be verified more often than it is now. I've posted
>> on this topic to the fedora-devel list.
>>
>
>Perhaps.
>
>Using --verifydb (or equivalently, running /usr/lib/rpm/rpmdb_verify, the
>preferred solution these days) does not do data checks, only structural
>checks on the database.

True. It would be a good thing if RPM did such checks.


>(aside) rpm-4.0.2 used to do --verifydb on every close. I don't
>recall any problem that was meaningfully detected and solved
>by verifying more often. FWIW, the macros to verify a dabase
>on every close are likely still present and functional in current
>rpm.

"More often" --> "that often"?


>Note that running --verifydb can/will create locks, which is really
>the source of recent reported problems, and running --verifydb
>is likelier to create more, rather than fewer, problem reports imho.

I hope rpm-verifydb will create more problem reports. Currently, problems
in the RPM database can go unnoticed for a long time, until disaster
strikes.

I chose "rpm --verifydb" over some Berkely DB tool as I expect that rpm
already does proper locking of its database during checks (modulo any
kernel bugs).


>The data contained in Packages is signature/digest checked. A digest check
>is about the strongest data integrity check possible.
>
>The only other data in an rpmdb is join keys and index keys, and
>that data is regenerated whenever --rebuilddb is run.

I take it that all the data used by "--rebuilddb" is in the Packages file,
and that the Packages file is carefully checked by "--rebuilddb"?


>> With FC6 there has been a spate of RPM database corruption. It happened to
>> me: though there may have been incipient corruption in my FC5, after
>> --rebuilddb and upgrading successfully I found more corruption later. This
>> brings up that the RPM database is just assumed to work, but isn't being
>> checked until it falls over.
>>
>
>No there has not been a "spate of RPM database corruption".

How do you know? 8-) After all, who's checking now?


>Yes there have been a number of reports of stale locks and cache
>(not database) incoherency. And there are a few other segfaults
>being reported against rpm.
>
>But feel free to describe rpm problems however you wish.

OK.


>> I propose that the RPM database should be verified on a regular basis. I
>> have written a utility, rpm_verify_db, to automatically verify and repair
>> the RPM database, via a daily cron job. Reports of errors are syslog'd,
>> emailed to root, and shown by logwatch. It could be incorporated into the
>> RPM package, or even Yum. It can be found at
>> <http://georgeanelson.com/rpm-verifydb.htm>.
>>
>
>rpm used to do --verifydb, stopped because no problems were prevented or
>solved by verifying.
>
>> I propose that Anaconda should check the RPM database before starting an
>> upgrade to an existing installation. Checking takes under a minute on my
>> system, so it should not be objectionable. Anaconda should offer to repair
>> a damaged RPM database (if the Package file is OK) before proceeding with
>> the installation.
>>
>
>Anaconda used to do a --rebuilddb, stopped because no problems were
>being usefully solved.

Hmm, that would have solved the problem I reported in RH BZ 215127, and
probably the OP's problem as well. I don't know how to find out if it
would not have helped anyone else.


>> I suggest that the --verifydb command should not be undocumented in RPM and
>> its manpage. This seems to be on purpose, but I think it is a mistake.
>>
>
>The option is undocumented because using /usr/lib/rpm/rpmdb_verify (which
>is exactly the same operation) is the preferred means of database repair.

Where is rpmdb_verify documented? "man rpmdb_verify" doesn't find
anything. "apropos rpm" | grep 'verify'" doesn't find anything either. It
doesn't seem to be in "man rpm".


>rpmdb_verify is exactly the Sleepycat distributed utility linked against rpm
>rather than system libraries, and is quite well documented at sleepycat.com.

RPM users would benefit from having that documented some place RPM (other
than this list). "man rpm" is where they would expect to find it.


>> I would like some feedback about these proposals. If they are acceptable I
>> will file RFE bugs on them.
>>
>> My knowlege of things RPM is superficial. It would be a good idea to have
>> my proposed verification and repair methods criticised by authentic RPM
>> developers.
>>
>
>None of the above should be considered criticism, just history.
>
>There are two needs if one wants to protect against rpmdb data loss.
>
>The most important is saving a copy of /var/lib/rpm/Packages routinely.
>All other information in an rpmdb can be regenerated from a reasonably
>recent copy of Packages. And in most cases a depsolver like
>yum/smart/apt/poldek
>will reinstall the packages that have changed since the last copy of Packages
>was saved.

OK, though Packages is too large to save every day, and just keeping the
last one would mean that the sysadmin would need to notice the problem
before that last good copy of Packages was overwritten. (I have an RPM
database instance that has corruption but functioned normally most of the
time, though not during an Anaconda upgrade to FC6. See RH BZ 215127.) It
would seem better to save only good copies of the Packages file, and to
report to a responsible sysadmin that there is an issue (if only there
were such a sysadmin for most systems). I don't know how to do that with
"rpm --rebuilddb", but I do know how to do that with "rpm --verifydb".


>The other important need is to have a find-like utility to reconstruct an
>rpmdb using only md5 digests of installed files for those people who are
>not saving a copy of Packages ;-)

/No one/ saves a copy of Packages, as such an implementation detail is
RPM's responsibility, and it does not do it. Don't blame the users for
omissions!
--
____________________________________________________________________
TonyN.:' The Great Writ <mailto:***@georgeanelson.com>
' is no more. <http://www.georgeanelson.com/>
Stanley, Jon
2006-11-29 04:54:38 UTC
>/No one/ saves a copy of Packages, as such an implementation detail is
>RPM's responsibility, and it does not do it. Don't blame the users for
>omissions!

No one? For one particularly large customer, one of my daily cron jobs
backs up /var/lib/rpm (in its entirety). I've dealt with the aftermath of
far too many RPM database corruptions (about one in 1,000). For this
customer, it's worth the extra disk space.
Tony Nelson
2006-11-29 04:32:27 UTC
I haven't seen this come through.

At 7:52 PM +0800 11/24/06, Nerazzurri wrote:
>> From: Tony Nelson <***@georgeanelson.com>
...
>> I propose that the RPM database should be verified on a regular basis. I
>> have written a utility, rpm_verify_db, to automatically verify and repair
>> the RPM database, via a daily cron job. Reports of errors are syslog'd,
>> emailed to root, and shown by logwatch. It could be incorporated into the
>> RPM package, or even Yum. It can be found at
>> <http://georgeanelson.com/rpm-verifydb.htm>.
>>
>> I propose that Anaconda should check the RPM database before starting an
>> upgrade to an existing installation. Checking takes under a minute on my
>> system, so it should not be objectionable. Anaconda should offer to repair
>> a damaged RPM database (if the Package file is OK) before proceeding with
>> the installation.
>>
>> I suggest that the --verifydb command should not be undocumented in RPM and
>> its manpage. This seems to be on purpose, but I think it is a mistake.
>>
>> I would like some feedback about these proposals. If they are acceptable I
>> will file RFE bugs on them.
>>
>> My knowledge of things RPM is superficial. It would be a good idea to have
>> my proposed verification and repair methods criticised by authentic RPM
>> developers.
>
>
>Your idea is good.
>
>But did you test the scripts?

Yes.

>It seems that the return value of "rpm
>--verifydb" has a problem.
>
>Whether your rpmdb is broken or not, the return value is the same (always
>0), so the scripts will not work well.
>
>
>I removed my rpmdb directory (rm -rf /var/lib/rpm), and the return value
>was still 0, the same as with a good rpmdb.

Rpm does something odd when it is given a non-existent or non-full path to
an RPM database: it creates a new RPM database there. If it is given a full
path (or a proper relative path) to a broken RPM database, it will report
the errors. For proper operation, the script requires a full path to a
real RPM database. Arguably this is a deficiency in rpm's "--verifydb"
option.
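A minimal sketch of that guard, in Python with a hypothetical helper name (verify_rpmdb). It refuses to run unless the path is absolute and already holds a Packages file, so rpm cannot lazily create an empty database and then report success. Refusing relative paths is stricter than necessary and is an assumption of the sketch. Because the exit status was reported as 0 even for a broken database, it also treats any output on stderr as a failure. That is an assumption about how rpm reports problems, not documented behavior.

```python
import os
import subprocess


def verify_rpmdb(dbpath, run=subprocess.run):
    """Hypothetical wrapper around `rpm --verifydb --dbpath DBPATH`.

    Returns (ok, messages). Assumptions, not rpm's documented contract:
    - a relative or missing dbpath is refused up front, because rpm
      lazily creates a fresh, empty database at a path that does not exist;
    - since the exit status was reported to be 0 even for a broken
      database, any non-blank stderr line also counts as a failure.
    """
    if not os.path.isabs(dbpath):
        return False, [f"refusing relative dbpath: {dbpath!r}"]
    if not os.path.isfile(os.path.join(dbpath, "Packages")):
        return False, [f"no Packages file in {dbpath!r}; rpm would create a new db"]
    result = run(
        ["rpm", "--verifydb", "--dbpath", dbpath],
        capture_output=True,
        text=True,
    )
    # Keep every non-blank stderr line as a reported problem.
    messages = [line for line in result.stderr.splitlines() if line.strip()]
    ok = result.returncode == 0 and not messages
    return ok, messages
```

The run parameter exists only so the rpm call can be replaced in testing; normally it is left at subprocess.run.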
Stanley, Jon
2006-11-29 05:03:32 UTC
>
>Rpm does something odd when it is given a non-existent or non-full path to
>an RPM database: it creates a new RPM database there. If it is given a full
>path (or a proper relative path) to a broken RPM database, it will report
>the errors. For proper operation, the script requires a full path to a
>real RPM database. Arguably this is a deficiency in rpm's "--verifydb"
>option.

Correct - if given a database path that does not exist, I would expect
rpm to generate an error on --verifydb. I would submit it as a bug if
it creates a new database in that path instead.
Tony Nelson
2006-11-29 04:33:02 UTC
I haven't seen this come through.

At 3:19 PM +0800 11/27/06, Nerazzurri wrote:
>>
>> None of the above should be considered criticism, just history.
>>
>> There are two needs if one wants to protect against rpmdb data loss.
>>
>> The most important is saving a copy of /var/lib/rpm/Packages routinely.
>> All other information in an rpmdb can be regenerated from a reasonably
>> recent copy of Packages. And in most cases a depsolver like
>> yum/smart/apt/poldek will reinstall the packages that have changed
>> since the last copy of Packages was saved.
>>
>
>But how can I know which copy of "/var/lib/rpm/Packages" is correct and
>works well? If I back up a corrupted "Packages", the backup is
>pointless, isn't it? :-)

The "right" way would be to make a copy of the Packages file, check it, and
only save it if the check passes. According to Jeff, the proper check is
to do a "rpm --rebuilddb" with that Packages file and see if it works, but
I haven't tried that method.
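A sketch of the "copy, check, then save" approach, assuming it runs while no rpm or yum transaction is active so the copy is consistent. The check is a caller-supplied function. Running "rpm --verifydb --dbpath" against the scratch directory is one plausible choice, but this thread does not confirm that it is the right check for a lone Packages file. Working on a copy means the live database is never touched, and dated copies are rotated so a failed day never overwrites the last good one. All names here are hypothetical.

```python
import datetime
import os
import shutil
import tempfile


def save_verified_packages(dbpath, backup_dir, check, keep=7, today=None):
    """Copy DBPATH/Packages to a scratch dir, run `check` on the copy, and
    keep it in backup_dir as Packages-YYYYMMDD only if the check passes.

    `check(scratch_dir) -> bool` is supplied by the caller (an assumption,
    e.g. a wrapper that runs `rpm --verifydb --dbpath scratch_dir`).
    At most `keep` (>= 1) dated copies are retained. Returns the saved
    path, or None if the check failed.
    """
    today = today or datetime.date.today()
    os.makedirs(backup_dir, exist_ok=True)
    with tempfile.TemporaryDirectory() as scratch:
        copy = os.path.join(scratch, "Packages")
        shutil.copy2(os.path.join(dbpath, "Packages"), copy)
        if not check(scratch):
            return None  # leave the existing good backups untouched
        dest = os.path.join(backup_dir, f"Packages-{today:%Y%m%d}")
        shutil.move(copy, dest)
    # Dated names sort chronologically; drop the oldest beyond `keep`.
    saved = sorted(n for n in os.listdir(backup_dir) if n.startswith("Packages-"))
    for old in saved[:-keep]:
        os.remove(os.path.join(backup_dir, old))
    return dest
```

Keeping several verified copies rather than one addresses the worry that a single copy could be overwritten before anyone notices a problem.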
Stanley, Jon
2006-11-29 04:57:06 UTC
>
>The "right" way would be to make a copy of the Packages file, check it,
>and only save it if the check passes. According to Jeff, the proper check
>is to do a "rpm --rebuilddb" with that Packages file and see if it works,
>but I haven't tried that method.

Is there some way to verify that a --rebuilddb *would* work against it?
I have an aversion to modifying production data, for obviously good
reason.
Jeff Johnson
2006-11-29 12:27:59 UTC
On Nov 28, 2006, at 11:33 PM, Tony Nelson wrote:

> I haven't seen this come through.
>
> At 3:19 PM +0800 11/27/06, Nerazzurri wrote:
>>>
>>> None of the above should be considered criticism, just history.
>>>
>>> There are two needs if one wants to protect against rpmdb data loss.
>>>
>>> The most important is saving a copy of /var/lib/rpm/Packages routinely.
>>> All other information in an rpmdb can be regenerated from a reasonably
>>> recent copy of Packages. And in most cases a depsolver like
>>> yum/smart/apt/poldek will reinstall the packages that have changed
>>> since the last copy of Packages was saved.
>>>
>>
>> But how can I know which copy of "/var/lib/rpm/Packages" is correct and
>> works well? If I back up a corrupted "Packages", the backup is
>> pointless, isn't it? :-)
>
> The "right" way would be to make a copy of the Packages file, check it,
> and only save it if the check passes. According to Jeff, the proper check
> is to do a "rpm --rebuilddb" with that Packages file and see if it works,
> but I haven't tried that method.

Sorry for not replying.

Doing --rebuilddb is not the "proper" check, nor is --rebuilddb the
only check.

What I tried to say is

1) Doing --verifydb does not verify the data in an rpmdb, but only
verifies the Berkeley DB structural elements.

2) All the essential data is in Packages; the indices can be rebuilt by
--rebuilddb whenever needed.

3) The integrity of headers contained in Packages is verified when the
header is read from an rpmdb.

Very few cases of damaged headers in Packages have been reported to me in
the last 2 years, and none that were not easily detected and fixed.

73 de Jeff
Tony Nelson
2006-11-29 20:48:10 UTC
At 7:27 AM -0500 11/29/06, Jeff Johnson wrote:
>On Nov 28, 2006, at 11:33 PM, Tony Nelson wrote:
>
>> I haven't seen this come through.
>>
>> At 3:19 PM +0800 11/27/06, Nerazzurri wrote:
>>>>
>>>> None of the above should be considered criticism, just history.
>>>>
>>>> There are two needs if one wants to protect against rpmdb data loss.
>>>>
>>>>The most important is saving a copy of /var/lib/rpm/Packages routinely.
>>>>All other information in an rpmdb can be regenerated from a reasonably
>>>>recent copy of Packages. And in most cases a depsolver like
>>>>yum/smart/apt/poldek will reinstall the packages that have changed
>>>>since the last copy of Packages was saved.
>>>>
>>>
>>>But how can I know which copy of "/var/lib/rpm/Packages" is correct and
>>>works well? If I back up a corrupted "Packages", the backup is
>>>pointless, isn't it? :-)
>>
>>The "right" way would be to make a copy of the Packages file, check it,
>>and only save it if the check passes. According to Jeff, the proper check
>>is to do a "rpm --rebuilddb" with that Packages file and see if it works,
>>but I haven't tried that method.
>
>Sorry for not replying.

No, I didn't see my own post come through, so I don't know how anyone could
have replied.


>Doing --rebuilddb is not the "proper" check,

How do I get --rebuilddb to tell me if there are problems with my RPM database?

>nor is --rebuilddb the only check.

So far you have mentioned only --verifydb, which does a check, and
--rebuilddb, which does not check the existing database, but only attempts
to build a new database from the Packages file. What other check is there?


>What I tried to say is
>
> 1) Doing --verifydb does not verify the data in an rpmdb, but
> only verifies the Berkeley DB structural elements.
>
> 2) All the essential data is in Packages; the indices can be
> rebuilt by --rebuilddb whenever needed.
>
> 3) The integrity of headers contained in Packages is verified
> when the header is read from an rpmdb.

Yes, you said that, and I understood it. Since RPM normally uses the other
files, their integrity is important for normal use. "rpm --verifydb"
checks the format of those files. "rpm --rebuilddb" replaces them without
checking them. Apparently nothing checks their referential integrity, so
the best available check is "rpm --verifydb". Is there any check that can
determine that --rebuilddb is needed, other than either --verifydb or rpm
puking?


>Very few cases of damaged headers in Packages have been reported to me
>in the last 2 years, and none that were not easily detected and fixed.

Normally very few bugs get reported at all. Most people using Fedora don't
report bugs, and don't even use fedora-list. Newcomers to fedora-list
frequently say it is too much trouble when asked to file a bug report, and
then they go away again. The lack of new bug reports is only mild
reassurance.

I may have been using a "broken" RPM database for quite a while before my
attempt to upgrade FC5 to FC6 failed because of it. Many operations on the
RPM database still worked (e.g. yum updates and installs). Only fiddling
around with the "problem" package caused errors from RPM, though "rpm
--verifydb" also reported the errors. I don't really know if the damage to
the RPM database had any real effect (until the failed upgrade), and it
would be very hard to find out. Thus, even I did not report such a
problem, until now. For most users, if an upgrade to FCx fails, they will
do a fresh install of some OS, possibly Fedora, and never report anything.
Therefore there can be errors in the RPM database that aren't in the
Packages file and that can be detected by "rpm --verifydb"; see RH BZ
215127 for possibly two examples.

Running "rpm --rebuilddb" does not diagnose errors in the RPM database, as
it replaces the old indices without checking whether they were wrong.

As you believe that "rpm --verifydb" will not help anyone, I understand
that you will not be documenting it or encouraging the checking of users'
RPM databases. As I think that there are corrupt RPM databases out there,
I will promote my rpm_verifydb package on fedora-devel-list and
fedora-list. You may chime in there if you wish to correct any perceived
misinformation I am promoting.
Jeff Johnson
2006-11-29 13:40:36 UTC
On 11/28/06, Tony Nelson <***@georgeanelson.com> wrote:
>
> Rpm does something odd when it is given a non-existent or non-full path to
> an RPM database: it creates a new RPM database there. If it is given a full
> path (or a proper relative path) to a broken RPM database, it will report
> the errors. For proper operation, the script requires a full path to a
> real RPM database. Arguably this is a deficiency in rpm's "--verifydb"
> option.

RPM creates the dbpath lazily if it does not exist. Whether that is a feature
or a bug depends on who you are talking to.

The current behavior (creating the path lazily) makes creating chroots
easier, one does not have to do the explicit
mkdir -p /path/to/chroot/var/lib/rpm

OTOH, invocations of rpm with random dbpaths can/will litter the end-point
with lazily created empty directories.

73 de Jeff

>
> _______________________________________________
> Rpm-list mailing list
> Rpm-***@redhat.com
> https://www.redhat.com/mailman/listinfo/rpm-list
>
Tony Nelson
2006-11-29 20:48:14 UTC
At 8:40 AM -0500 11/29/06, Jeff Johnson wrote:
>On 11/28/06, Tony Nelson <***@georgeanelson.com> wrote:
>>
>> Rpm does something odd when it is given a non-existent or non-full path to
>> an RPM database: it creates a new RPM database there. If it is given a full
>> path (or a proper relative path) to a broken RPM database, it will report
>> the errors. For proper operation, the script requires a full path to a
>> real RPM database. Arguably this is a deficiency in rpm's "--verifydb"
>> option.
>
>RPM creates the dbpath lazily if it does not exist. Whether that is a feature
>or a bug depends on who you are talking to.
>
>The current behavior (creating the path lazily) makes creating chroots
>easier, one does not have to do the explicit
> mkdir -p /path/to/chroot/var/lib/rpm
>
>OTOH, invocations of rpm with random dbpaths can/will litter the end-point
>with lazily created empty directories.

Yes, as will doing a rebuild on a missing database.

RPM should mention this creation when doing a "--verifydb" or a "--rebuilddb".
Jeff Johnson
2006-11-29 14:05:01 UTC
On 11/28/06, Tony Nelson <***@georgeanelson.com> wrote:
> I haven't seen this come through.
>
> At 8:48 AM -0500 11/24/06, Jeff Johnson wrote:
> >On 11/22/06, Tony Nelson <***@georgeanelson.com> wrote:
> >> The RPM database should be verified more often than it is now. I've posted
> >> on this topic to the fedora-devel list.
> >>
> >
> >Perhaps.
> >
> >Using --verifydb (or equivalently, running /usr/lib/rpm/rpmdb_verify, the
> >preferred solution these days) does not do data checks, only structural
> >checks on the database.
>
> True. It would be a good thing if RPM did such checks.
>

We disagree, at least on the frequency of checking and the amount of automation.

But there is nothing stopping you from using --verifydb, or rpmdb_verify, or
configuring rpm to verify every index on every close if you desire.

As always in rpm, choosing the configured default behavior is where most
of the disagreement lies.

>
> >(aside) rpm-4.0.2 used to do --verifydb on every close. I don't
> >recall any problem that was meaningfully detected and solved
> >by verifying more often. FWIW, the macros to verify a database
> >on every close are likely still present and functional in current
> >rpm.
>
> "More often" --> "that often"?
>

Yes, that is the sense of my reply.

>
> >Note that running --verifydb can/will create locks, which is really
> >the source of recent reported problems, and running --verifydb
> >is likelier to create more, rather than fewer, problem reports imho.
>

So create problem reports. The current problem reports have to do
with stale locks and __db cache corruption, not with Packages or --verifydb.

> I hope rpm-verifydb will create more problem reports. Currently, problems
> in the RPM database can go unnoticed for a long time, until disaster
> strikes.
>

If a tree falls in the forest, who hears the noise?

> I chose "rpm --verifydb" over some Berkeley DB tool as I expect that rpm
> already does proper locking of its database during checks (modulo any
> kernel bugs).
>

RPM with concurrent access uses exactly Berkeley DB locks; there really
is no meaningful difference between --verifydb and rpmdb_verify.

The external executable is preferred to simplify rpm options (there are
far too many) and to provide better doco (of which there is far too little)
for verifying an rpmdb.

>
> >The data contained in Packages is signature/digest checked. A digest check
> >is about the strongest data integrity check possible.
> >
> >The only other data in an rpmdb is join keys and index keys, and
> >that data is regenerated whenever --rebuilddb is run.
>
> I take it that all the data used by "--rebuilddb" is in the Packages file,
> and that the Packages file is carefully checked by "--rebuilddb"?
>

I have no idea what you mean by "carefully". If you ask more specifically,
I might be able to say what is and isn't checked.

>
> >> With FC6 there has been a spate of RPM database corruption. It happened to
> >> me: though there may have been incipient corruption in my FC5, after
> >> --rebuilddb and upgrading successfully I found more corruption later. This
> >> brings up that the RPM database is just assumed to work, but isn't being
> >> checked until it falls over.
> >>
> >
> >No there has not been a "spate of RPM database corruption".
>
> How do you know? 8-) After all, who's checking now?
>

Ultimately, no one knows.

Meanwhile, I have years of experience diagnosing and fixing rpmdb
problems. I am not seeing or hearing that headers were damaged in
databases (which is my definition of "corruption"). YMMV as always.

>
> >Yes there have been a number of reports of stale locks and cache
> >(not database) incoherency. And there are a few other segfaults
> >being reported against rpm.
> >
> >But feel free to describe rpm problems however you wish.
>
> OK.
>
>
> >> I propose that the RPM database should be verified on a regular basis. I
> >> have written a utility, rpm_verify_db, to automatically verify and repair
> >> the RPM database, via a daily cron job. Reports of errors are syslog'd,
> >> emailed to root, and shown by logwatch. It could be incorporated into the
> >> RPM package, or even Yum. It can be found at
> >> <http://georgeanelson.com/rpm-verifydb.htm>.
> >>
> >
> >rpm used to do --verifydb, stopped because no problems were prevented or
> >solved by verifying.
> >
> >> I propose that Anaconda should check the RPM database before starting an
> >> upgrade to an existing installation. Checking takes under a minute on my
> >> system, so it should not be objectionable. Anaconda should offer to repair
> >> a damaged RPM database (if the Package file is OK) before proceeding with
> >> the installation.
> >>
> >
> >Anaconda used to do a --rebuilddb, stopped because no problems were
> >being usefully solved.
>
> Hmm, that would have solved the problem I reported in RH BZ 215127, and
> probably the OP's problem as well. I don't know how to find out if it
> would not have helped anyone else.
>

Then ask anaconda (or yum) to run rpmdb_verify or --verifydb, or to
call the python method ts.verifyDB(), as needed. When to verify an rpmdb
is not something rpmlib can solve
(except by verifying on every close; been there, done that, dinna help).
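A sketch of calling the Python bindings mentioned here, with a hypothetical function name. It assumes ts.verifyDB() returns 0 on success, and that iterating ts.dbMatch() with no arguments walks every header in Packages, where, per the earlier message, header integrity is checked as each header is read. The sketch cannot confirm how a damaged header would surface through the bindings, whether as an error message, an exception, or a skipped record.

```python
def check_rpmdb_with_bindings(ts):
    """Run both kinds of check discussed in the thread on an rpm
    TransactionSet-like object `ts` (assumed API: verifyDB, dbMatch).

    - ts.verifyDB(): structural (Berkeley DB) check only; assumed to
      return 0 on success.
    - iterating ts.dbMatch(): reads every header from Packages, which
      is when rpm checks each header's integrity.
    Returns (verify_rc, headers_read).
    """
    verify_rc = ts.verifyDB()
    headers_read = sum(1 for _ in ts.dbMatch())
    return verify_rc, headers_read
```

On a system with rpm-python installed, it would be called as check_rpmdb_with_bindings(rpm.TransactionSet()), probably as root.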

>
> >> I suggest that the --verifydb command should not be undocumented in RPM and
> >> its manpage. This seems to be on purpose, but I think it is a mistake.
> >>
> >
> >The option is undocumented because using /usr/lib/rpm/rpmdb_verify (which
> >is exactly the same operation) is the preferred means of database repair.
>
> Where is rpmdb_verify documented? "man rpmdb_verify" doesn't find
> anything. "apropos rpm | grep verify" doesn't find anything either. It
> doesn't seem to be in "man rpm".
>

Look for "db_verify" in the Berkeley DB utilities doco, which is also
included in the db4 package last I checked.

>
> >rpmdb_verify is exactly the Sleepycat distributed utility linked against rpm
> >rather than system libraries, and is quite well documented at sleepycat.com.
>
> RPM users would benefit from having that documented somewhere in RPM's own
> documentation (other than this list). "man rpm" is where they would expect
> to find it.
>

"expect" is in the eye of the beholder.

>
> >> I would like some feedback about these proposals. If they are acceptable I
> >> will file RFE bugs on them.
> >>
> >> My knowledge of things RPM is superficial. It would be a good idea to have
> >> my proposed verification and repair methods criticised by authentic RPM
> >> developers.
> >>
> >
> >None of the above should be considered criticism, just history.
> >
> >There are two needs if one wants to protect against rpmdb data loss.
> >
> >The most important is saving a copy of /var/lib/rpm/Packages routinely.
> >All other information in an rpmdb can be regenerated from a reasonably
> >recent copy of Packages. And in most cases a depsolver like
> >yum/smart/apt/poldek will reinstall the packages that have changed since
> >the last copy of Packages was saved.
>
> OK, though Packages is too large to save every day, and just keeping the
> last one would mean that the sysadmin would need to notice the problem
> before that last good copy of Packages was overwritten. (I have an RPM
> database instance that has corruption but functioned normally most of the
> time, though not during an Anaconda upgrade to FC6. See RH BZ 215127.) It
> would seem better to save only good copies of the Packages file, and to
> report to a responsible sysadmin that there is an issue (if only there
> were such a sysadmin for most systems). I don't know how to do that with
> "rpm --rebuilddb", but I do know how to do that with "rpm --verifydb".
>

Too large is a different problem; try "man rdiff" if you want an easy
incremental backup.

>
> >The other important need is to have a find-like utility to reconstruct an
> >rpmdb using only md5 digests of installed files for those people who are
> >not saving a copy of Packages ;-)
>
> /No one/ saves a copy of Packages, as such an implementation detail is
> RPM's responsibility, and it does not do it. Don't blame the users for
> omissions!

We disagree. If you want to protect your data, you need to take precautions.

RPM cannot do reliable backups for all possible users under all cases.

Meanwhile, there is nothing stopping you or anyone else from doing whatever
you wish to achieve reliability.

73 de Jeff
Tony Nelson
2006-11-29 20:48:17 UTC
At 9:05 AM -0500 11/29/06, Jeff Johnson wrote:
>On 11/28/06, Tony Nelson <***@georgeanelson.com> wrote:

>> True. It would be a good thing if RPM did such checks.

>We disagree, at least on the frequency of checking and the amount of
>automation.

OK.


>So create problem reports. The current problem reports have to do
>with stale locks and __db cache corruption, not with Packages or --verifydb.

RH BZ 215127.


>> I hope rpm-verifydb will create more problem reports. Currently, problems
>> in the RPM database can go unnoticed for a long time, until disaster
>> strikes.

>If a tree falls in the forest, who hears the noise?

Only the end-user it falls on.


>> I chose "rpm --verifydb" over some Berkeley DB tool as I expect that rpm
>> already does proper locking of its database during checks (modulo any
>> kernel bugs).
>>
>
>RPM with concurrent access uses exactly Berkeley DB locks; there really
>is no meaningful difference between --verifydb and rpmdb_verify.

Nor did I claim that there was. "rpmdb_verify" is part of RPM, not part of
Berkeley DB.


>The external executable is preferred to simplify rpm options (there are
>far too many) and to provide better doco (of which there is far too little)
>for verifying an rpmdb.

There is no man page for rpmdb_verify in FC6.


>>>>With FC6 there has been a spate of RPM database corruption. It happened
>>>>to me: though there may have been incipient corruption in my FC5, after
>>>>--rebuilddb and upgrading successfully I found more corruption later.
>>>>This brings up that the RPM database is just assumed to work, but isn't
>>>>being checked until it falls over.
>> >>
>> >
>> >No there has not been a "spate of RPM database corruption".
>>
>> How do you know? 8-) After all, who's checking now?
>>
>
>Ultimately, no one knows.
>
>Meanwhile, I have years of experience diagnosing and fixing rpmdb
>problems. I am not seeing or hearing that headers were damaged in
>databases (which is my definition of "corruption"). YMMV as always.

I suggest that you supplement your years of experience with actual hard
data, gathered by a daily cron task that checks the RPM database for
corruption and reports that corruption to the user, and to someone at RH or
Fedora -- "yum update" seems a reasonable reporting channel. (While there
is much angst over counting Fedora installations, counting broken RPM
databases is another matter. If you are correct, no reports will be sent.)


>Then ask anaconda (or yum) to run rpmdb_verify or --verifydb, or to
>call the python method ts.verifyDB(), as needed.

That does seem like the course of action that is most likely to produce
benefit.

>When to verify an rpmdb is not something rpmlib can solve
>(except by verifying on every close; been there, done that, dinna help).

It is solvable in the RPM /package/, by installing a cron task much as my
rpm_verifydb package does.
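A sketch of what the body of such a cron task might look like. It is not the actual rpm_verifydb code, and the function name and defaults are hypothetical. The verify argument is any function returning (ok, messages), for example a wrapper around "rpm --verifydb --dbpath /var/lib/rpm". Problems go to syslog, where logwatch can pick them up, and to stdout, which cron normally mails to root for /etc/cron.daily jobs. A clean run prints nothing, so no mail is sent.

```python
import syslog


def daily_check(verify, dbpath="/var/lib/rpm", log=syslog.syslog, out=print):
    """Hypothetical cron.daily body: run `verify(dbpath) -> (ok, messages)`
    and report any problems to syslog and stdout.

    Returns a shell-style exit status (0 = clean, 1 = problems found).
    """
    ok, messages = verify(dbpath)
    if ok:
        return 0  # stay silent so cron sends no mail
    log(syslog.LOG_ERR, f"rpmdb check failed for {dbpath}")
    for msg in messages:
        log(syslog.LOG_ERR, msg)
    # Anything printed here ends up in the mail cron sends to root.
    out(f"RPM database problems in {dbpath}:")
    for msg in messages:
        out("  " + msg)
    return 1
```

Keeping the check and the reporting separate lets the same reporting code sit behind "rpm --verifydb", rpmdb_verify, or the Python bindings.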


>> Where is rpmdb_verify documented? "man rpmdb_verify" doesn't find
>> anything. "apropos rpm | grep verify" doesn't find anything either. It
>> doesn't seem to be in "man rpm".

>Look for "db_verify" in the Berkeley DB utilities doco, which is also
>included in the db4 package last I checked.

So "rpmdb_verify" is not documented anywhere? It should be documented in
RPM's documentation, which could state that it is the same as "db_verify"
in the Berkeley DB docs.


>> RPM users would benefit from having that documented somewhere in RPM's own
>> documentation (other than this list). "man rpm" is where they would expect
>> to find it.
>>
>
>"expect" is in the eye of the beholder.

Exactly. RPM's users are RPM's customers. RPM should try to serve them
well, and this includes putting the documentation for RPM in RPM's own
documentation.


>> OK, though Packages is too large to save every day, and just keeping the
>> last one would mean that the sysadmin would need to notice the problem
>> before that last good copy of Packages was overwritten. (I have an RPM
>> database instance that has corruption but functioned normally most of the
>> time, though not during an Anaconda upgrade to FC6. See RH BZ 215127.) It
>> would seem better to save only good copies of the Packages file, and to
>> report to a responsible sysadmin that there is an issue (if only there
>> were such a sysadmin for most systems). I don't know how to do that with
>> "rpm --rebuilddb", but I do know how to do that with "rpm --verifydb".
>>
>
>Too large is a different problem; try "man rdiff" if you want an easy
>incremental backup.

I see that rdiff is in Extras, and that it is based on rsync. Possibly I
will add that capability to my rpm_verifydb package.


>> /No one/ saves a copy of Packages, as such an implementation detail is
>> RPM's responsibility, and it does not do it. Don't blame the users for
>> omissions!
>
>We disagree. If you want to protect your data, you need to take precautions.

It disappoints me that you think it is not RPM's responsibility to protect
its database.


>RPM cannot do reliable backups for all possible users under all cases.
>
>Meanwhile, there is nothing stopping you or anyone else from doing whatever
>you wish to achieve reliability.

I understand completely.
Michael Jennings
2006-11-29 20:58:36 UTC
On Wednesday, 29 November 2006, at 15:48:17 (-0500),
Tony Nelson wrote:

> It disappoints me that you think it is not RPM's responsibility to
> protect its database.

It is RPM's responsibility to provide database backups as much as it
is the kernel's responsibility to back up your ext2 filesystems.

Michael

--
Michael Jennings (a.k.a. KainX) http://www.kainx.org/ <***@kainx.org>
n + 1, Inc., http://www.nplus1.net/ Author, Eterm (www.eterm.org)
-----------------------------------------------------------------------
"May God stand between you and harm in all the empty places where you
must walk." -- Old Arabic Blessing
seth vidal
2006-11-29 21:04:49 UTC
On Wed, 2006-11-29 at 15:58 -0500, Michael Jennings wrote:
> On Wednesday, 29 November 2006, at 15:48:17 (-0500),
> Tony Nelson wrote:
>
> > It disappoints me that you think it is not RPM's responsibility to
> > protect its database.
>
> It is RPM's responsibility to provide database backups as much as it
> is the kernel's responsibility to back up your ext2 filesystems.
>

However, if it were possible to corrupt the ext2 filesystem by
performing frequent reads and writes we would consider that a bug in
ext2, not in the program making the writes.

-sv
Michael Jennings
2006-11-29 21:08:50 UTC
On Wednesday, 29 November 2006, at 16:04:49 (-0500),
seth vidal wrote:

> However, if it were possible to corrupt the ext2 filesystem by
> performing frequent reads and writes

It is.

> we would consider that a bug in ext2, not in the program making the
> writes.

Not necessarily. You're making a rather large ASSumption that the
problem doesn't lie with the program, or the VFS layer, or some other
component.

In any event...what's your point? I have yet to see a scenario where
frequent reads from and writes to the RPM DB, which were allowed to
complete successfully without interruption, caused any corruption in
the DB structure.

yum's psychotic handling of transactions notwithstanding, of course.

Michael

seth vidal
2006-11-29 21:16:45 UTC
On Wed, 2006-11-29 at 16:08 -0500, Michael Jennings wrote:
> On Wednesday, 29 November 2006, at 16:04:49 (-0500),
> seth vidal wrote:
>
> > However, if it were possible to corrupt the ext2 filesystem by
> > performing frequent reads and writes
>
> It is.
>
> > we would consider that a bug in ext2, not in the program making the
> > writes.
>
> Not necessarily. You're making a rather large ASSumption that the
> problem doesn't lie with the program, or the VFS layer, or some other
> component.
>
> In any event...what's your point? I have yet to see a scenario where
> frequent reads from and writes to the RPM DB, which were allowed to
> complete successfully without interruption, caused any corruption in
> the DB structure.
>
> yum's psychotic handling of transactions notwithstanding, of course.

It's psychotic b/c it makes lots of consecutive reads? Interesting.

-sv
Matthew Miller
2006-11-29 21:18:20 UTC
On Wed, Nov 29, 2006 at 04:16:45PM -0500, seth vidal wrote:
> > yum's psychotic handling of transactions notwithstanding, of course.
> It's psychotic b/c it makes lots of consecutive reads? Interesting.

Frankly, I don't care how psychotic it may or may not be -- it shouldn't
matter.

--
Matthew Miller ***@mattdm.org <http://mattdm.org/>
Boston University Linux ------> <http://linux.bu.edu/>
James Olin Oden
2006-11-29 22:28:25 UTC
On 11/29/06, Matthew Miller <***@mattdm.org> wrote:
> On Wed, Nov 29, 2006 at 04:16:45PM -0500, seth vidal wrote:
> > > yum's psychotic handling of transactions notwithstanding, of course.
> > It's psychotic b/c it makes lots of consecutive reads? Interesting.
>
> Frankly, I don't care how psychotic it may or may not be -- it shouldn't
> matter.
>
If what you're saying is the case, then there is no need for MySQL to
support transactions anymore. That would be cool, because the code
could be greatly simplified.

Again, no emotion; I'm just trying to follow your logical assertions.

...james
Michael Jennings
2006-11-29 21:27:30 UTC
On Wednesday, 29 November 2006, at 16:16:45 (-0500),
seth vidal wrote:

> > yum's psychotic handling of transactions notwithstanding, of course.
>
> It's psychotic b/c it makes lots of consecutive reads? Interesting.

There you go with your ASSumptions again. That's going to start
chafing if you're not careful.



On Wednesday, 29 November 2006, at 16:18:20 (-0500),
Matthew Miller wrote:

> Frankly, I don't care how psychotic it may or may not be -- it
> shouldn't matter.

Ah, I see. "The RPM DB should be robust enough to withstand any and
all manner or means of stupidity which may be applied against it."
Gotcha.

If you need me, I'll be outside pouring Diet Coke into my gas tank and
filing a repair claim against my warranty.

Michael

--
Michael Jennings (a.k.a. KainX) http://www.kainx.org/ <***@kainx.org>
n + 1, Inc., http://www.nplus1.net/ Author, Eterm (www.eterm.org)
-----------------------------------------------------------------------
"If the President knowingly lies to the American people, he should
immediately resign." -- Bill Clinton in 1974
Matthew Miller
2006-11-29 21:31:08 UTC
On Wed, Nov 29, 2006 at 04:27:30PM -0500, Michael Jennings wrote:
> Ah, I see. "The RPM DB should be robust enough to withstand any and
> all manner or means of stupidity which may be applied against it."
> Gotcha.

By your clear tone, I'm pretty sure that not only do you not "gotcha" at all,
but you willfully don't want to, and further discussion is futile.


> If you need me, I'll be outside pouring Diet Coke into my gas tank and
> filing a repair claim against my warranty.

What a stupid analogy.

Michael Jennings
2006-11-29 21:37:36 UTC
On Wednesday, 29 November 2006, at 16:31:08 (-0500),
Matthew Miller wrote:

> By your clear tone, I'm pretty sure that not only do you not "gotcha" at
> all, but you willfully don't want to, and further discussion is futile.

Feel free to jump to any conclusion you like. Or you can correct my
restatement of your message to show me where I erred. Your choice.

> > If you need me, I'll be outside pouring Diet Coke into my gas tank
> > and filing a repair claim against my warranty.
>
> What a stupid analogy.

I don't think it's stupid at all. But since all you have to offer is
adjectives rather than a basis for your claims, I cannot refute your
statement. I can only offer a retort of equal logical merit:

Your momma.

Michael

--
Michael Jennings (a.k.a. KainX) http://www.kainx.org/ <***@kainx.org>
n + 1, Inc., http://www.nplus1.net/ Author, Eterm (www.eterm.org)
-----------------------------------------------------------------------
"If everything is coming your way, you're in the wrong lane."
-- fortune
Matthew Miller
2006-11-29 21:12:30 UTC
On Wed, Nov 29, 2006 at 04:08:50PM -0500, Michael Jennings wrote:
> Not necessarily. You're making a rather large ASSumption that the
> problem doesn't lie with the program, or the VFS layer, or some other
> component.

A VFS layer or some other bug maybe, but you're seriously suggesting that
it's not a bug in something kernelspace if a bug in a userspace program
can corrupt your filesystem?


Michael Jennings
2006-11-29 21:39:25 UTC
On Wednesday, 29 November 2006, at 16:12:30 (-0500),
Matthew Miller wrote:

> A VFS layer or some other bug maybe, but you're seriously suggesting
> that it's not a bug in something kernelspace if a bug in a userspace
> program can corrupt your filesystem?

Yes.

dd if=/dev/urandom of=/dev/hda

Michael

--
Michael Jennings (a.k.a. KainX) http://www.kainx.org/ <***@kainx.org>
n + 1, Inc., http://www.nplus1.net/ Author, Eterm (www.eterm.org)
-----------------------------------------------------------------------
"Know that I love you, and no matter what, I'll see you again."
-- Brian Sweeney, passenger on a hijacked airliner, to his wife
Jeff Johnson
2006-11-29 21:38:04 UTC
On 11/29/06, seth vidal <***@linux.duke.edu> wrote:
> On Wed, 2006-11-29 at 15:58 -0500, Michael Jennings wrote:
> > On Wednesday, 29 November 2006, at 15:48:17 (-0500),
> > Tony Nelson wrote:
> >
> > > It disappoints me that you think it is not RPM's responsibility to
> > > protect its database.
> >
> > It is RPM's responsibility to provide database backups as much as it
> > is the kernel's responsibility to back up your ext2 filesystems.
> >
>
> However, if it were possible to corrupt the ext2 filesystem by
> performing frequent reads and writes we would consider that a bug in
> ext2, not in the program making the writes.
>

OK, lets carry your ext2 example to its end-point.

Let's say you identify the physical block in a file, verify that the
block indeed has exactly the content you expect by reading the physical
device, and you save that physical block number somewhere for later use.

<time passes, file is changed/copied>

Now you take your physical block no and go read the data (again)
from the physical device.

Why are you surprised that the contents at the physical blkno location
have changed? The change is certainly not due to ext2 corruption.

That is essentially what yum is doing, taking a join key (aka header instance)
out of an iterator locking context by opening and closing a Berkeley DB
environment repeatedly, i.e. releasing locks. With concurrent access (and no
persistent locking), other rpmdb accesses can/will open the rpmdb, thereby
changing the contents.

I will attempt a reproducer for the yum problem shortly. Should not be
very hard.

FWIW, there's another kernel problem I saw last night with mmap randomization
(which is used by Berkeley DB) in the FC6 kernel that I should be able
to identify a reproducer for shortly.

73 de Jeff
James Olin Oden
2006-11-29 22:26:19 UTC
On 11/29/06, seth vidal <***@linux.duke.edu> wrote:
> On Wed, 2006-11-29 at 15:58 -0500, Michael Jennings wrote:
> > On Wednesday, 29 November 2006, at 15:48:17 (-0500),
> > Tony Nelson wrote:
> >
> > > It disappoints me that you think it is not RPM's responsibility to
> > > protect its database.
> >
> > It is RPM's responsibility to provide database backups as much as it
> > is the kernel's responsibility to back up your ext2 filesystems.
> >
>
> However, if it were possible to corrupt the ext2 filesystem by
> performing frequent reads and writes we would consider that a bug in
> ext2, not in the program making the writes.
>
Honestly, in simple terms with no emotion, Jeff is saying that you're
doing updates to a database that belong in the same "transaction"
across multiple "transactions", thus losing all your locks. If you
want transactional semantics, you have to do all the updates and reads
within the same transaction. Right? Or am I smoking crack?

Cheers...james
seth vidal
2006-11-29 22:32:28 UTC
On Wed, 2006-11-29 at 17:26 -0500, James Olin Oden wrote:
> On 11/29/06, seth vidal <***@linux.duke.edu> wrote:
> > On Wed, 2006-11-29 at 15:58 -0500, Michael Jennings wrote:
> > > On Wednesday, 29 November 2006, at 15:48:17 (-0500),
> > > Tony Nelson wrote:
> > >
> > > > It disappoints me that you think it is not RPM's responsibility to
> > > > protect its database.
> > >
> > > It is RPM's responsibility to provide database backups as much as it
> > > is the kernel's responsibility to back up your ext2 filesystems.
> > >
> >
> > However, if it were possible to corrupt the ext2 filesystem by
> > performing frequent reads and writes we would consider that a bug in
> > ext2, not in the program making the writes.
> >
> Honestly, in simple terms with no emotion, Jeff is saying that you're
> doing updates to a database that belong in the same "transaction"
> across multiple "transactions", thus losing all your locks. If you
> want transactional semantics, you have to do all the updates and reads
> within the same transaction. Right? Or am I smoking crack?

but we're not doing updates to a database in that way.

The only time yum is doing the quick open-read-closes of the rpmdb is
when it is reading in the info from the rpmdb or getting prco info from
a package. Yum switched from:
open rpmdb ro, and use that for all ro interactions
to
open rpmdb ro, get the index number of the header for a package, save
the index number into a dict, close the rpmdb (repeat)


There's no updating going on.

-sv
Jeff Johnson
2006-11-29 23:10:57 UTC
On 11/29/06, seth vidal <***@linux.duke.edu> wrote:
> but we're not doing updates to a database in that way.
>
> The only time yum is doing the quick open-read-closes of the rpmdb is
> when it is reading in the info from the rpmdb or getting prco info from
> a package. Yum switched from:
> open rpmdb ro, and use that for all ro interactions
> to
> open rpmdb ro, get the index number of the header for a package, save
> the index number into a dict, close the rpmdb (repeat)
>

The index number is not guaranteed to remain constant just because yum opened
an rpmdb ro and then closed.

>
> There's no updating going on.
>

Yep. Index retrieved, saved in dict, rpmdb closed. No update there.

The next question is:
    What does yum do to guarantee that the index number from the dict
    is meaningful when it actually *does* get around to doing an update?

And if the answer is that
    Yum has its own locks.
well, that ain't good enough in the real world. /bin/rpm certainly does
not pay attention to yum locks, for one easy reproducer.

73 de Jeff
seth vidal
2006-11-29 23:20:18 UTC
On Wed, 2006-11-29 at 18:10 -0500, Jeff Johnson wrote:
> On 11/29/06, seth vidal <***@linux.duke.edu> wrote:
> > but we're not doing updates to a database in that way.
> >
> > The only time yum is doing the quick open-read-closes of the rpmdb is
> > when it is reading in the info from the rpmdb or getting prco info from
> > a package. Yum switched from:
> > open rpmdb ro, and use that for all ro interactions
> > to
> > open rpmdb ro, get the index number of the header for a package, save
> > the index number into a dict, close the rpmdb (repeat)
> >
>
> The index number is not guaranteed to remain constant just because yum opened
> an rpmdb ro and then closed.
>
> >
> > There's no updating going on.
> >
>
> Yep. Index retrieved, saved in dict, rpmdb closed. No update there.
>
> The next question is:
>     What does yum do to guarantee that the index number from the dict
>     is meaningful when it actually *does* get around to doing an update?
>
> And if the answer is that
>     Yum has its own locks.
> well, that ain't good enough in the real world. /bin/rpm certainly does
> not pay attention to yum locks, for one easy reproducer.
>

I'm not saying yum does guarantee those. I'm asking why does the above
cause the rpmdb to have errors?

-sv
Jeff Johnson
2006-11-29 23:24:28 UTC
On Nov 29, 2006, at 6:20 PM, seth vidal wrote:

>
> I'm not saying yum does guarantee those. I'm asking why does the above
> cause the rpmdb to have errors?
>

Dunno (yet), there are likely several intersecting causes.

No matter what, yum should go back to last known good. It's easier
debugging, and seemed to work at some point in time.

The only reason that I've heard for the open-extract-close change is
to handle signals within yum, and that can easily be achieved in other
ways. Nasrat has details.

As I tried to tell you in 2004 ...

... but you stopped listening.

73 de Jeff
Tony Nelson
2006-12-01 02:53:13 UTC
At 6:24 PM -0500 11/29/06, Jeff Johnson wrote:
>On Nov 29, 2006, at 6:20 PM, seth vidal wrote:
>
>>
>> I'm not saying yum does guarantee those. I'm asking why does the above
>> cause the rpmdb to have errors?
>>
>
>Dunno (yet), there are likely several intersecting causes.
>
>No matter what, yum should go back to last known good. It's easier
>debugging, and seemed to work at some point in time.
>
>The only reason that I've heard for the open-extract-close change is
>to handle signals within yum, and that can easily be achieved in other
>ways. Nasrat has details.

If handling of Ctl-C is the main reason for yum's new repeated RPM database
opens / closes, I have a suggestion or two.

RPM wants to catch the signals so it can be sure to close the RPM database
properly in all cases. Yum also tries to close the RPM database properly
in all instances. It should be enough if yum does it.

1) Yum could steal back the SIGINT handler, as I do in my Stablemirror yum
plugin <http://georgeanelson.com/stablemirror.htm> (for FC5 yum, not really
needed anymore with the mirrorlist improvements). It calls

signal.signal(signal.SIGINT, signal.default_int_handler)

in an override of _mirror_try() in a subclass of MirrorGroup, to repeatedly
steal back the SIGINT signal. Yum could do the same right after opening
the RPM database, but also in other places, as I think that RPM will
sometimes take the signals again later just to be "safe". This can be done
now.
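A minimal sketch of the "steal back SIGINT" idea, assuming a library has already replaced the SIGINT handler. `library_handler` is a hypothetical Python stand-in for such a library (librpm's real handlers are installed in C, not like this). Once `signal.default_int_handler` is reinstalled, Ctrl-C arrives as `KeyboardInterrupt` again. `signal.signal` only works in the main thread, so this must run there.

```python
import os
import signal


def library_handler(signum, frame):
    # Hypothetical stand-in for a library that grabs SIGINT and only
    # records it, so the application never sees KeyboardInterrupt.
    library_handler.caught = True


library_handler.caught = False


def steal_back_sigint():
    """Reinstall Python's default handler, which raises KeyboardInterrupt."""
    signal.signal(signal.SIGINT, signal.default_int_handler)


signal.signal(signal.SIGINT, library_handler)  # library opens its database
steal_back_sigint()                            # application takes SIGINT back

interrupted = False
try:
    os.kill(os.getpid(), signal.SIGINT)   # simulate the user pressing Ctrl-C
    for _ in range(1000):                 # give the interpreter a chance to run the handler
        pass
except KeyboardInterrupt:
    interrupted = True

print(interrupted, library_handler.caught)   # True False
```

Note the window between the two `signal.signal` calls: a Ctrl-C that lands there is handled by the library, not by the application.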

2) RPM's developers could be asked for a new API to tell RPM not to take
certain signals. This could be done for FC7.
--
____________________________________________________________________
TonyN.:' The Great Writ <mailto:***@georgeanelson.com>
' is no more. <http://www.georgeanelson.com/>
Jeff Johnson
2006-12-01 11:55:30 UTC
On Nov 30, 2006, at 9:53 PM, Tony Nelson wrote:

> At 6:24 PM -0500 11/29/06, Jeff Johnson wrote:
>> On Nov 29, 2006, at 6:20 PM, seth vidal wrote:
>>
>>>
>>> I'm not saying yum does guarantee those. I'm asking why does the above
>>> cause the rpmdb to have errors?
>>>
>>
>> Dunno (yet), there are likely several intersecting causes.
>>
>> No matter what, yum should go back to last known good. It's easier
>> debugging, and seemed to work at some point in time.
>>
>> The only reason that I've heard for the open-extract-close change is
>> to handle signals within yum, and that can easily be achieved in other
>> ways. Nasrat has details.
>
> If handling of Ctl-C is the main reason for yum's new repeated RPM
> database opens / closes, I have a suggestion or two.
>
> RPM wants to catch the signals so it can be sure to close the RPM
> database properly in all cases. Yum also tries to close the RPM database
> properly in all instances. It should be enough if yum does it.
>
> 1) Yum could steal back the SIGINT handler, as I do in my Stablemirror
> yum plugin <http://georgeanelson.com/stablemirror.htm> (for FC5 yum, not
> really needed anymore with the mirrorlist improvements). It calls
>
> signal.signal(signal.SIGINT, signal.default_int_handler)
>
> in an override of _mirror_try() in a subclass of MirrorGroup, to
> repeatedly steal back the SIGINT signal. Yum could do the same right
> after opening the RPM database, but also in other places, as I think that
> RPM will sometimes take the signals again later just to be "safe". This
> can be done now.
>

What does signal.default_int_handler do? Exit?

If so, that explains why your database is in need of so much
verification.

> 2) RPM's developers could be asked for a new API to tell RPM not to
> take certain signals. This could be done for FC7.

You ought to make the suggestion on <fedora-devel> and on <rpm-devel>
mailing lists.

While you're at it, why don't you create bugzilla reports for every
Red Hat, SuSE and Mandriva product as well.

73 de Jeff
Tony Nelson
2006-12-01 16:47:26 UTC
At 6:55 AM -0500 12/1/06, Jeff Johnson wrote:
>On Nov 30, 2006, at 9:53 PM, Tony Nelson wrote:
>
>> At 6:24 PM -0500 11/29/06, Jeff Johnson wrote:
>>> On Nov 29, 2006, at 6:20 PM, seth vidal wrote:
>>>
>>>>I'm not saying yum does guarantee those. I'm asking why does the above
>>>>cause the rpmdb to have errors?
>>>
>>> Dunno (yet), there are likely several intersecting causes.
>>>
>>>No matter what, yum should go back to last known good. It's easier
>>>debugging, and seemed to work at some point in time.
>>>
>>>The only reason that I've heard for the open-extract-close change is to
>>>handle signals within yum, and that can easily be achieved in other
>>>ways. Nasrat has details.
>>
>>If handling of Ctl-C is the main reason for yum's new repeated RPM
>>database opens / closes, I have a suggestion or two.
>>
>>RPM wants to catch the signals so it can be sure to close the RPM
>>database properly in all cases. Yum also tries to close the RPM database
>>properly in all instances. It should be enough if yum does it.
>>
>>1) Yum could steal back the SIGINT handler, as I do in my Stablemirror
>>yum plugin <http://georgeanelson.com/stablemirror.htm> (for FC5 yum, not
>>really needed anymore with the mirrorlist improvements). It calls
>>
>> signal.signal(signal.SIGINT, signal.default_int_handler)
>>
>>in an override of _mirror_try() in a subclass of MirrorGroup, to
>>repeatedly steal back the SIGINT signal. Yum could do the same right
>>after opening the RPM database, but also in other places, as I think that
>>RPM will sometimes take the signals again later just to be "safe". This
>>can be done now.
>
>What does signal.default_int_handler do? Exit?

It does the "Python thing" of raising a Python Exception. SIGINT produces
a KeyboardInterrupt exception. Any unhandled exception will terminate a
python program, but Yum handles KeyboardInterrupt, and either does
something useful, such as moving to the next download mirror (that happens
in a library), or it terminates gracefully, closing the RPM database
cleanly.
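A sketch of the two behaviours described above, with hypothetical names (`fetch`, `FakeDB` and the mirror host names are illustrations, not yum code): a `KeyboardInterrupt` during a download skips to the next mirror, and a `finally` clause closes the database handle on any Python-level exit. A `finally` clause only runs while the interpreter is alive; it does nothing for a segfault.

```python
class FakeDB:
    """Hypothetical stand-in for an open package database handle."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def fetch(mirror):
    # Hypothetical download. A "slow" mirror simulates the user pressing
    # Ctrl-C while it stalls, which Python delivers as KeyboardInterrupt.
    if mirror.startswith("slow"):
        raise KeyboardInterrupt
    return "data from " + mirror


def download_with_fallback(mirrors):
    for mirror in mirrors:
        try:
            return fetch(mirror)
        except KeyboardInterrupt:
            continue        # Ctrl-C during a download means "try the next mirror"
    raise RuntimeError("no mirror completed")


def run(db, mirrors):
    try:
        return download_with_fallback(mirrors)
    finally:
        db.close()          # runs on normal return and on any escaping exception


db = FakeDB()
print(run(db, ["slow.example.org", "fast.example.org"]))  # data from fast.example.org
print(db.closed)                                          # True
```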

My understanding is that the changes to yum to repeatedly open and close
the RPM database are to work around RPM's seizing of SIGINT. I believe
this proposal is a cleaner, safer, and more effective workaround for that
issue. (More effective in that fewer SIGINTs are eaten by RPM.) I have
not received any complaints about Stablemirror from its (very few) users
other than that it doesn't work at all on FC6, where it is really no longer
needed.


>If so, that explains why your database is in need of so much
>verification.

No, yum is careful about closing the database when it exits. My reading of
what RPM does on receipt of a signal and what yum does was that they are
equivalent.

And it is an ASSumption ;-b that my RPM database needs more care than
others. I'm merely taking better care of it than most. 8-)


>> 2) RPM's developers could be asked for a new API to tell RPM not to
>> take certain signals. This could be done for FC7.
>
>You ought to make the suggestion on <fedora-devel> and on <rpm-devel>
>mailing lists.

Well, at the moment, it seems that the two people most involved, you and
Seth, are communicating here. If my idea is acceptable, you and Seth will
say so, and if Seth requests me to I will post elsewhere and file an RFE
against RPM. If not, my idea won't be used, so a request to enhance RPM
would be a waste of your time. Still, if you ask, I'll do those things.

Seth doesn't always respond to my posts (even if they are in reply to a
reply of his to one of my posts). When he does respond, it is usually to
say that whatever I've proposed is an unclean hack. He may think that it
is cleaner for yum to repeatedly open and close the RPM database than for
it to steal back the SIGINT signal, but I don't know.


>While you're at it, why don't you create bugzilla reports for every
>Red Hat, SuSE and Mandriva product as well.

That would be inappropriate, as I don't use those products.
Jeff Johnson
2006-12-01 17:12:30 UTC
On Dec 1, 2006, at 11:47 AM, Tony Nelson wrote:

> At 6:55 AM -0500 12/1/06, Jeff Johnson wrote:

...

>>
>> What does signal.default_int_handler do? Exit?
>
> It does the "Python thing" of raising a Python Exception. SIGINT produces
> a KeyboardInterrupt exception. Any unhandled exception will terminate a
> python program, but Yum handles KeyboardInterrupt, and either does
> something useful, such as moving to the next download mirror (that happens
> in a library), or it terminates gracefully, closing the RPM database
> cleanly.
>


Heh, we are having vocabulary problems.

There's lots more (and lots more critical code) to fielding a signal
than just raising an exception.

That's the land of rpmlib, in C, not python.

>
> My understanding is that the changes to yum to repeatedly open and close
> the RPM database are to work around RPM's seizing of SIGINT. I believe
> this proposal is a cleaner, safer, and more effective workaround for that
> issue. (More effective in that fewer SIGINTs are eaten by RPM.) I have
> not received any complaints about Stablemirror from its (very few) users
> other than that it doesn't work at all on FC6, where it is really no longer
> needed.
>

Free the SIGINT! Bad RPM for preventing yum (and FLOSS) from handling
^C!

(aside) That's a joke, son, perhaps too dry for your taste.

>
>> If so, that explains why your database is in need of so much
>> verification.
>
> No, yum is careful about closing the database when it exits. My reading of
> what RPM does on receipt of a signal and what yum does was that they are
> equivalent.
>

And what happens if yum does reach the point where "it exits"? That state
is (ahem) less than graceful, and is happening on every segfault, leaving
stale locks; users get annoyed and start trying to verify rpm databases.

> And it is an ASSumption ;-b that my RPM database needs more care than
> others. I'm merely taking better care of it than most. 8-)
>
>
>>> 2) RPM's developers could be asked for a new API to tell RPM not to
>>> take certain signals. This could be done for FC7.
>>
>> You ought to make the suggestion on <fedora-devel> and on <rpm-devel>
>> mailing lists.
>
> Well, at the moment, it seems that the two people most involved, you and
> Seth, are communicating here. If my idea is acceptable, you and Seth will
> say so, and if Seth requests me to I will post elsewhere and file an RFE
> against RPM. If not, my idea won't be used, so a request to enhance RPM
> would be a waste of your time. Still, if you ask, I'll do those things.
>

FWIW, rpm has a bit mask; a bit is set for every signal received.

That set of bits has been in rpmlib for years. All that yum needs to do
is test the bit, and exit if a signal is received.
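The deferred-signal pattern described here can be modelled in a few lines of Python: the handler only sets a bit in a mask, and the main loop tests that bit at a safe point between units of work. `caught_mask`, `record_signal` and `is_caught` are hypothetical names for this sketch, not rpmlib functions. Python itself works similarly internally: its C-level handler only records the signal, and the Python-level handler runs later in the main thread.

```python
import os
import signal

# Hypothetical stand-in for a "caught signals" bit mask; a toy, not rpmlib's API.
caught_mask = 0


def record_signal(signum, frame):
    """Signal handler: only set a bit; the real work happens at a safe point."""
    global caught_mask
    caught_mask |= 1 << signum


def is_caught(signum):
    return bool(caught_mask & (1 << signum))


signal.signal(signal.SIGINT, record_signal)

processed = []
for item in ["pkg-a", "pkg-b", "pkg-c"]:
    if item == "pkg-b":
        os.kill(os.getpid(), signal.SIGINT)   # simulate Ctrl-C mid-loop
    processed.append(item)                     # this unit of work still completes
    # Safe point: test the bit between units of work and stop cleanly.
    if is_caught(signal.SIGINT):
        break

signal.signal(signal.SIGINT, signal.default_int_handler)  # restore Python's default
print(processed)   # ['pkg-a', 'pkg-b']
```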

Reopening a Berkeley DB repeatedly in order to handle ^C et al is nuts.

And that API has been there, and I've pointed out its use to Seth, and
to other yum developers, repeatedly, since 2004. To no avail.

> Seth doesn't always respond to my posts (even if they are in reply to a
> reply of his to one of my posts). When he does respond, it is usually to
> say that whatever I've proposed is an unclean hack. He may think that it
> is cleaner for yum to repeatedly open and close the RPM database than for
> it to steal back the SIGINT signal, but I don't know.
>

Free the SIGINT!!! Sigh ...

>
>> While you're at it, why don't you create bugzilla reports for every
>> Red Hat, SuSE and Mandriva product as well.
>
> That would be inappropriate, as I don't use those products.

But a solution for RPM (and perhaps for yum) needs to work on linux,
not just for Fedora.

But feel free to do whatever with rpm and yum in Fedora. Have fun!

73 de Jeff
James Olin Oden
2006-12-01 19:43:10 UTC
>
> My understanding is that the changes to yum to repeatedly open and close
> the RPM database are to work around RPM's seizing of SIGINT. I believe
> this proposal is a cleaner, safer, and more effective workaround for that
> issue. (More effective in that fewer SIGINTs are eaten by RPM.)
And racy. It's not really solving the problem. In general it's been my
experience that signal handlers in libraries are a bad thing. That
said, librpm is IMO a very special library. The issue is that there
is no way for rpm to know (today, but maybe not tomorrow) what the
consumer's needs are in the event of a signal. At the same time, there
should be no way for a consumer to know what needs to be done by librpm
in the event a signal is caught (the particulars, that is). So what do
you do?

Probably the right thing to do is provide one of two things (or both):

- Have librpm provide a method for a consumer to register a callback to be
  used in the event a signal is caught (probably passing which signal it was
  and what rpm's action will be, such as exiting or not).
- Have rpm provide a single method to call in the event that the program is
  exiting. This would require rpm to internally track all transactions in
  existence in a process, or at least all DB connections.
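Both proposals can be sketched together in a hypothetical `ToyLib` class (not librpm's API): the library keeps a list of every handle it has opened, lets a consumer register callbacks that receive the signal number and whether the library intends to exit, and offers a single `close_all()` for shutdown. Which signals count as "exit" signals is an assumption of the sketch.

```python
import signal


class ToyLib:
    """Hypothetical library sketching both proposals; not librpm's API."""

    EXIT_SIGNALS = (signal.SIGINT, signal.SIGTERM)   # assumption for the sketch

    def __init__(self):
        self._open_dbs = []     # every handle opened in this process
        self._callbacks = []    # consumer callbacks for caught signals

    def open_db(self, path):
        handle = {"path": path, "open": True}
        self._open_dbs.append(handle)
        return handle

    # Proposal 1: the consumer registers a callback.
    def register_signal_callback(self, fn):
        self._callbacks.append(fn)

    def on_signal(self, signum):
        """Called by the library at a safe point after a signal was caught."""
        will_exit = signum in self.EXIT_SIGNALS
        # Tell each consumer which signal arrived and what the library will do.
        for fn in self._callbacks:
            fn(signum, will_exit)
        if will_exit:
            self.close_all()
        return will_exit

    # Proposal 2: a single call that closes everything the library has open.
    def close_all(self):
        for handle in self._open_dbs:
            handle["open"] = False
        self._open_dbs.clear()


lib = ToyLib()
db = lib.open_db("/var/lib/rpm")
seen = []
lib.register_signal_callback(lambda signum, will_exit: seen.append((signum, will_exit)))
lib.on_signal(signal.SIGINT)
print(db["open"])   # False
```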

So who is going to write the patch to really fix this problem?

In all universes you're still going to have issues with SIGCHLD, since
rpm clearly needs this when running scriptlets.

> I have not received any complaints about Stablemirror from its (very few)
> users other than that it doesn't work at all on FC6, where it is really no
> longer needed.
>
>
> >If so, that explains why your database is in need of so much
> >verification.
>
> No, yum is careful about closing the database when it exits.
YUM should not have to know there is a database to close, really. This
is not a yum criticism, necessarily. RPM hides access to the database
behind an rpmts (transaction), and an rpmtsUnlink() of the transaction
will automatically close the database. But what you really need is
better cooperation between yourself and rpm without adding a very tight
coupling between yum and librpm.

> My reading of
> what RPM does on receipt of a signal and what yum does was that they are
> equivalent.
>
You shouldn't assume that, at least if you're trying to treat librpm
like a black box.

> And it is an ASSumption ;-b that my RPM database needs more care than
> others. I'm merely taking better care of it than most. 8-)
>
Please keep it professional and mature. Together with all parties
showing mutual respect we can solve problems.

Cheers...james
Jeff Johnson
2006-12-01 21:02:09 UTC
On 12/1/06, James Olin Oden <***@gmail.com> wrote:
> >
> > My understanding is that the changes to yum to repeatedly open and close
> > the RPM database are to work around RPM's seizing of SIGINT. I believe
> > this proposal is a cleaner, safer, and more effective workaround for that
> > issue. (More effective in that fewer SIGINTs are eaten by RPM.)
> And racy. It's not really solving the problem. In general it's been my
> experience that signal handlers in libraries are a bad thing. That
> said, librpm is IMO a very special library. The issue is that there
> is no way for rpm to know (today, but maybe not tomorrow) what the
> consumer's needs are in the event of a signal. At the same time, there
> should be no way for a consumer to know what needs to be done by librpm
> in the event a signal is caught (the particulars, that is). So what do
> you do?
>
> Probably the right thing to do is provide one of two things (or both):
>
> - Have librpm provide a method for a consumer to register a callback to be
>   used in the event a signal is caught (probably passing which signal it was
>   and what rpm's action will be, such as exiting or not).

I can add the callback if necessary, but there's a fair amount of overhead
that would be needed ...


> - Have rpm provide a single method to call in the event that the program is
>   exiting. This would require rpm to internally track all transactions in
>   existence in a process, or at least all DB connections.
>

... rpmlib is already keeping track of the list of all opened rpmdb's,
no biggie ...

> So who is going to write the patch to really fix this problem?
>

I am. I'll salt in checks of rpmsqCaught on exiting method exits,
and throw the keyboard exception when any of the ^C usual candidates
occur.
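A rough Python-level sketch of "check the caught-signal set on method exit and raise `KeyboardInterrupt`", assuming a hypothetical `caught` set that stands in for rpmlib's record of caught signals. The decorator and `ToyTransactionSet` are illustrations only, not the rpm Python bindings, and the set of "^C usual candidates" is an assumption. The wrapped call is allowed to finish before the exception is raised, so the library's own work is never cut off halfway.

```python
import functools
import signal

# Hypothetical stand-in for the library's record of caught signals.
caught = set()

# Assumed "please stop" signals; which ones a real library checks may differ.
INTERRUPT_SIGNALS = {signal.SIGINT, signal.SIGQUIT, signal.SIGHUP, signal.SIGTERM}


def check_signals_on_exit(method):
    """Run the method to completion, then raise KeyboardInterrupt if a ^C-like signal was caught."""
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        result = method(*args, **kwargs)
        if caught & INTERRUPT_SIGNALS:
            caught.clear()            # report each batch of signals only once
            raise KeyboardInterrupt
        return result
    return wrapper


class ToyTransactionSet:
    """Hypothetical wrapper class; not the rpm Python bindings."""

    @check_signals_on_exit
    def match(self, name):
        if name == "interrupt-me":
            caught.add(signal.SIGINT)   # pretend SIGINT arrived mid-call
        return [name]


ts = ToyTransactionSet()
print(ts.match("bash"))                 # ['bash']
try:
    ts.match("interrupt-me")
except KeyboardInterrupt:
    print("interrupted after match() finished")
```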

> In all universes you're still going to have issues with SIGCHLD, since
> rpm clearly needs this when running scriptlets.
>

Sssshhh! Lest "Free the SIGCHLD!" become the next droning chant ...
There are already yum bug reports being dumped on rpm because scripts,
indeed, fail, with output to stderr, and so it's an rpm, not a yum, problem
to deal with the messed-up progress bars.

> > I have not received any complaints about Stablemirror from its (very few)
> > users other than that it doesn't work at all on FC6, where it is really no
> > longer needed.
> >
> >
> > >If so, that explains why your database is in need of so much
> > >verification.
> >
> > No, yum is careful about closing the database when it exits.
> YUM should not have to know there is a database to close, really. This
> is not a yum criticism, necessarily. RPM hides access to the database
> behind an rpmts (transaction), and an rpmtsUnlink() of the transaction
> will automatically close the database. But what you really need is
> better cooperation between yourself and rpm without adding a very tight
> coupling between yum and librpm.
>

Which is why I removed all db methods from the rpmlib API, and
tried to indicate that Ya really don't have to call ts.CloseDB() in
python __doc__ strings. Been there for years ...

> > My reading of
> > what RPM does on receipt of a signal and what yum does was that they are
> > equivalent.
> >
> You shouldn't assume that, at least if you're trying to treat librpm
> like a black box.
>
> > And it is an ASSumption ;-b that my RPM database needs more care than
> > others. I'm merely taking better care of it than most. 8-)
> >
> Please keep it professional and mature. Together with all parties
> showing mutual respect we can solve problems.
>

Thanks for the help, you're much nicer than I am ...

73 de Jeff
Tony Nelson
2006-12-03 04:12:16 UTC
At 4:02 PM -0500 12/1/06, Jeff Johnson wrote:
>On 12/1/06, James Olin Oden <***@gmail.com> wrote:
>> >
>> > My understanding is that the changes to yum to repeatedly open and close
>> > the RPM database are to work around RPM's seizing of SIGINT. I believe
>> > this proposal is a cleaner, safer, and more effective workaround for that
>> > issue. (More effective in that fewer SIGINTs are eaten by RPM.)
>> And racy. It's not really solving the problem. In general it's been my
>> experience that signal handlers in libraries are a bad thing. That
>> said, librpm is IMO a very special library. The issue is that there
>> is no way for rpm to know (today, but maybe not tomorrow) what the
>> consumer's needs are in the event of a signal. At the same time, there
>> should be no way for a consumer to know what needs to be done by librpm
>> in the event a signal is caught (the particulars, that is). So what do
>> you do?
>>
>> Probably the right thing to do is provide one of two things (or both):
>>
>> - Have librpm provide a method for a consumer to register a
>>   callback to be used in the event a signal is caught (probably
>>   passing which signal it was and what rpm's action will be, such
>>   as exiting or not).

This sounds like a reasonable way to cope with RPM's justified invasive
paranoia. I know that it is hard to do much in a signal handler; what
Python does is defer the actual handling until task time, much as RPM does.
I suppose this callback would also be called at task time, and not in the
signal handler?

As possible use cases:

In yum, a user may press Ctl-C to get yum to give up on a sluggish mirror
and move on to the next, hopefully faster, mirror. This works more or less
well with different versions of yum, or Python (2.5 and 2.4.4 are good), or
Fedora, and mostly depends on the possibly accidental behavior of a library
module that yum uses.

More hypothetically, yum and other programs start subprocesses and might
also want the SIGCHLD in such cases, for reasons as good as RPM's.

[ My patch to Python <http://python.org/sf/1519025>, which fixes Python's
Ctl-C handling for sockets with timeouts, was accepted and ships in Python
2.5 and 2.4.4. It helps yum quite a bit in responding to Ctl-C. The rest
of making Ctl-C work well during the downloading phase of yum was done in
my Stablemirror yum plugin by stealing back the SIGINT signal, in a
subclass of that library module. Should the user choose to exit or the
KeyboardInterrupt exception somehow not be handled, yum's existing code for
the purpose would handle closing the RPM database. I was content to let
RPM (and yum) have its way during the actual installation phases of yum, as
that seemed safest -- even if the RPM database is properly closed, with
transactions either all in or all out, just having only some of a dependent
set of packages installed seems like a bad idea. ]
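The general idea of taking SIGINT back for the download phase can be sketched with the standard `signal` module. This is not the Stablemirror plugin's code (which works in a subclass of a library module); `reclaim_sigint` is a hypothetical context manager. Inside the block, Python's own `signal.default_int_handler` is reinstalled so Ctl-C raises `KeyboardInterrupt`; on exit, the previous Python-level handler is put back. One caveat: `signal.getsignal()` only knows about handlers set through Python, so a handler installed directly by C code is invisible to it, and this sketch cannot hand SIGINT back to such a library.

```python
import signal
from contextlib import contextmanager

@contextmanager
def reclaim_sigint():
    """Temporarily restore Python's default Ctl-C behaviour (main thread only)."""
    previous = signal.getsignal(signal.SIGINT)
    signal.signal(signal.SIGINT, signal.default_int_handler)
    try:
        yield
    finally:
        if previous is not None:
            # Give SIGINT back to whatever Python-level handler owned it.
            signal.signal(signal.SIGINT, previous)
        # else: the old handler was not installed from Python and cannot be
        # reinstalled from here, so default_int_handler is left in place.
```

A download loop would then run as `with reclaim_sigint(): ...`, letting `KeyboardInterrupt` propagate to yum's existing exit handling.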


>I can add the callback if necessary, but there's a fair amount of overhead
>that would be needed ...

Darn. :-( I sort of thought that it would just be called from
rpmdbCheckSignals().


>> - Have rpm have a single method to call in the event that the
>> program is exiting. This would require rpm to internally
>> track all transactions in existence in a process, or at
>> least all DB connections.
>
>... rpmlib is already keeping track of the list of all opened rpmdb's,
>no biggie ...
>
>> So who is going to write the patch to really fix this problem?
>>
>
>I am. I'll salt in checks of rpmsqCaught on exiting method exits,
>and throw the keyboard exception when any of the ^C usual candidates
>occur.

Does this mean that when a SIGINT signal is received, rpmlib would
gracefully stop whatever it was doing and then tell the application that it
got a SIGINT, or that it would gracefully exit the application? Please
pardon my ignorance. What happens when it is client code that is
interrupted by the signal, not a call to rpmlib? It appears to me that
usually rpmlib has been opened up and has set its signal handlers, and then
the application does things and calls rpmlib from time to time, and that a
signal can be received when application code is active, not rpmlib code. I
may just not be following things. I might try to trace through it all, but
my gdb-fu is not strong and tracing Python code to rpmlib code seems a bit
much for me.


>> In all universes you're still going to have issues with SIGCHLD since
>> rpm clearly needs this when running scriptlets.
>>
>
>Sssshhh! Lest "Free the SIGCHLD!" becomes the next droning chant ...
>There are already yum bug reports being dumped on rpm because scripts,
>indeed, fail, with output to stderr, and so it's an rpm, not a yum, problem
>to deal with the messed-up progress bars.

Rpmlib opened up a can of worms when it stole the signals in the first
place. Sure, it needs something like that in order to be robust in the
face of stupidity or carelessness, but re-canning still takes a bigger can.

If James' first proposal were adopted, then rpmlib clients would also get
the SIGCHLD and could decide what to do about it.


Note that I'm not yum's author (Seth Vidal), and I hope I haven't
represented myself as such.
--
____________________________________________________________________
TonyN.:' The Great Writ <mailto:***@georgeanelson.com>
' is no more. <http://www.georgeanelson.com/>
Jeff Johnson
2006-12-03 04:49:07 UTC
On Dec 2, 2006, at 11:12 PM, Tony Nelson wrote:

>
> Rpmlib opened up a can of worms when it stole the signals in the first
> place. Sure, it needs something like that in order to be robust in the
> face of stupidity or carelessness, but re-canning still takes a bigger
> can.
>

Guy, you don't have a clue ...

rpm-3.0.x had *CATASTROPHIC* database failures.

Yes, whole machine, reinstall, scrub to bare disk.

Which is why the switch was made to Berkeley DB.

And a concurrent access model was chosen so the widdle
python script kiddies would not have to learn anything whatsoever
about database programming.

And you want to tell me that rpmlib "opened a can of worms"
because some idiot python programmers have decided to
reopen a Berkeley DB for every header instance in order
to handle ^C??????

This code was *STABLE* even if not perfect until a dweeb made a
change to yum on September 3, 2006.

I haven't even begun to rip up yum yet. Taking a header instance
and saving it in a dictionary *RELEASING ALL LOCKS* is psychotic.

And delivering FC6 (and soon RHEL5) with this degree of instability,
refusing to upgrade rpm, blabbering on about forking rpm, and accusing
me of sabotaging the source code, when I am not part of FC, nor
@redhat.com, is plain and simply not my problem.

Whatever I'm doing, you and FC users are highly unlikely to gain any
benefit whatsoever.

73 de Jeff
Tony Nelson
2006-12-03 20:49:21 UTC
At 11:49 PM -0500 12/2/06, Jeff Johnson wrote:
>On Dec 2, 2006, at 11:12 PM, Tony Nelson wrote:
>
>>
>> Rpmlib opened up a can of worms when it stole the signals in the first
>> place. Sure, it needs something like that in order to be robust in
>> the face of stupidity or carelessness, but re-canning still takes a
>> bigger can.
>
>Guy, you don't have a clue ...
>
>rpm-3.0.x had *CATASTROPHIC* database failures.
>
>Yes, whole machine, reinstall, scrub to bare disk.
>
>Which is why the switch was made to Berkeley DB.
>
>And a concurrent access model was chosen so the widdle python script
>kiddies would not have to learn anything whatsoever about database
>programming.
>
>And you want to tell me that rpmlib "opened a can of worms"
>because some idiot python programmers have decided to
>reopen a Berkeley DB for every header instance in order
>to handle ^C??????

No, I did not say that. I said that a library that takes signal handling
away from its client application has opened a can of worms, and that RPM
needed to do something like that in order to be safe. I did not say
anything about Berkeley DB, or about earlier versions of RPM. I most
certainly did not say, nor do I believe, that the particular method used by
yum to regain the use of Ctl-C was correct or workable.

I have been promoting a different way to regain the use of Ctl-C.
According to my reading of yum, my way is safe for the RPM database, as all
the exit paths from yum are equivalent, so if any are safe all are safe.
My reason for taking back the SIGINT signal is that RPM gets them even when
not in a call to RPM, so they can be delayed for /minutes/ and then pop up
after yum's package downloading completes. My understanding is that RPM
doesn't need any special action taken on SIGINT; it just needs its database
to be closed properly no matter how the application exits, and yum seems to
be doing that. I may be wrong about any of this, but I am not a proponent
of what yum is currently doing.

In any event, the "bigger can" involves something along the lines of
letting the client application know in a timely fashion about the signals
whether RPM has captured them or not. James' first suggestion looks good
to me, but insufficient. I think a function to get the current
signal-received flags would also be needed. I don't know how much work
James's callback would be, and I defer to your judgement. Maybe a function
to get the flags would be enough, if there is a reasonable place to use it
from yum; I will look into that and make a formal RFE for it if I can come
up with a simple patch to yum that would do the job.


>This code was *STABLE* even if not perfect until a dweeb made a
>change to yum on September 3, 2006.
>
>I haven't even begun to rip up yum yet. Taking a header instance
>and saving it in a dictionary *RELEASING ALL LOCKS* is psychotic.

I do not recall defending doing that. Please show me where I have done
that so I can apologise for it.

I am not the author of yum, I have not written any of the code in yum, and
I don't claim to have done so. Writing a yum plugin does not change yum,
and my plugin /does not ship/ with yum. I make it available only at my web
site (see sig for URL).

Umm, in order to help me when I try to read yum to understand what it is
doing, are you saying that yum is copying some data from RPM and then
assuming that it will not change across unlocking / relocking the database?
That is, it uses possibly stale data, but not invalid iterators? And
please pardon my ignorance of Berkeley DB; I will work to learn about it.


>And delivering FC6 (and soon RHEL5) with this degree of instability,
>refusing to upgrade rpm, blabbering on about forking rpm, and accusing
>me of sabotaging the source code, when I am not part of FC, nor
>@redhat.com, is plain and simply not my problem.

I do not maintain RPM or its Fedora package. I am not part of Fedora
Project, nor have I said that I was.

I posted on this list to ask for assistance with my RPM database-verifying
package. I received that assistance from you and others, but I also
received a lot of unwarranted and incorrectly aimed abuse.

I do not recall "[accusing you] of sabotaging the source code". Please
show me where I did that so I can apologise for it.


>Whatever I'm doing, you and FC users are highly unlikely to gain any
>benefit whatsoever.

That seems strange. In any event, you have been somewhat helpful to me and
you are clearly doing work to make RPM more robust in the face of both
happenstance and stupidity, and I thank you.
--
____________________________________________________________________
TonyN.:' The Great Writ <mailto:***@georgeanelson.com>
' is no more. <http://www.georgeanelson.com/>
Michael Jennings
2006-12-03 21:05:24 UTC
On Sunday, 03 December 2006, at 15:49:21 (-0500),
Tony Nelson wrote:

> >Whatever I'm doing, you and FC users are highly unlikely to gain any
> >benefit whatsoever.
>
> That seems strange. In any event, you have been somewhat helpful to
> me and you are clearly doing work to make RPM more robust in the
> face of both happenstance and stupidity, and I thank you.

The point is that the Fedora Deities have a political beef with Jeff
and thus are unwilling to follow his tree or adopt the new rpm
4.4.7/4.4.8 features. So anything Jeff does is largely worthless for
you Fedora users.

Michael

--
Michael Jennings (a.k.a. KainX) http://www.kainx.org/ <***@kainx.org>
n + 1, Inc., http://www.nplus1.net/ Author, Eterm (www.eterm.org)
-----------------------------------------------------------------------
"Your best? Losers always whine about [doing] their best. Winners
go home and f*** the prom queen." -- Sean Connery, "The Rock"
Jeff Johnson
2006-12-03 21:07:17 UTC
On Dec 3, 2006, at 3:49 PM, Tony Nelson wrote:

>
> No, I did not say that. I said that a library that takes signal handling
> away from its client application has opened a can of worms, and that RPM
> needed to do something like that in order to be safe. I did not say
> anything about Berkeley DB, or about earlier versions of RPM. I most
> certainly did not say, nor do I believe, that the particular method used by
> yum to regain the use of Ctl-C was correct or workable.
>
> I have been promoting a different way to regain the use of Ctl-C.
> According to my reading of yum, my way is safe for the RPM database, as all
> the exit paths from yum are equivalent, so if any are safe all are safe.
> My reason for taking back the SIGINT signal is that RPM gets them even when
> not in a call to RPM, so they can be delayed for /minutes/ and then pop up
> after yum's package downloading completes. My understanding is that RPM
> doesn't need any special action taken on SIGINT; it just needs its database
> to be closed properly no matter how the application exits, and yum seems to
> be doing that. I may be wrong about any of this, but I am not a proponent
> of what yum is currently doing.
>
> In any event, the "bigger can" involves something along the lines of
> letting the client application know in a timely fashion about the signals
> whether RPM has captured them or not. James' first suggestion looks good
> to me, but insufficient. I think a function to get the current
> signal-received flags would also be needed. I don't know how much work
> James's callback would be, and I defer to your judgement. Maybe a function
> to get the flags would be enough, if there is a reasonable place to use it
> from yum; I will look into that and make a formal RFE for it if I can come
> up with a simple patch to yum that would do the job.
>

Look -- in case you can't tell -- the problem of ^C handling wrt yum
has been around for years.

And furthermore -- just like with --verifydb handling -- all the
pieces have been in place for years.

The issue is getting yum to use what is already implemented.

It's already possible to set signal handlers outside of the signal
handler mechanism that is already implemented in rpmlib.

I do not need someone who has not looked at the code, nor knows the
history, to tell me what needs to be done. Much of the implementation
already exists in rpmlib, and has been there for years.

>
>> This code was *STABLE* even if not perfect until a dweeb made a
>> change to yum on September 3, 2006.
>>
>> I haven't even begun to rip up yum yet. Taking a header instance
>> and saving it in a dictionary *RELEASING ALL LOCKS* is psychotic.
>
> I do not recall defending doing that. Please show me where I have done
> that so I can apologise for it.
>
> I am not the author of yum, I have not written any of the code in yum, and
> I don't claim to have done so. Writing a yum plugin does not change yum,
> and my plugin /does not ship/ with yum. I make it available only at my web
> site (see sig for URL).
>
> Umm, in order to help me when I try to read yum to understand what it is
> doing, are you saying that yum is copying some data from RPM and then
> assuming that it will not change across unlocking / relocking the database?
> That is, it uses possibly stale data, but not invalid iterators? And
> please pardon my ignorance of Berkeley DB; I will work to learn about it.
>

Don't take it too personally, just look at the code some, and think a
bit more, before answering. ;-)

>
>> And delivering FC6 (and soon RHEL5) with this degree of instability,
>> refusing to upgrade rpm, blabbering on about forking rpm, and accusing
>> me of sabotaging the source code, when I am not part of FC, nor
>> @redhat.com, is plain and simply not my problem.
>
> I do not maintain RPM or its Fedora package. I am not part of Fedora
> Project, nor have I said that I was.
>

Neither do I. I find the situation quite ironic, and the lack of
stability in FC6 (and perhaps RHEL5) disturbing.

Unfortunately my RPM professional reputation is coupled to FC's and
RHAT's, so I *have* to care.

> I posted on this list to ask for assistance with my RPM database-verifying
> package. I received that assistance from you and others, but I also
> received a lot of unwarranted and incorrectly aimed abuse.
>
> I do not recall "[accusing you] of sabotaging the source code". Please
> show me where I did that so I can apologise for it.
>

You did not, others have.

>
>> Whatever I'm doing, you and FC users are highly unlikely to gain any
>> benefit whatsoever.
>
> That seems strange. In any event, you have been somewhat helpful to me and
> you are clearly doing work to make RPM more robust in the face of both
> happenstance and stupidity, and I thank you.

Thank you for thank you, the same to you. I wouldn't have
internalized --verifydb if you hadn't shown up ;-)

73 de Jeff
Tony Nelson
2006-12-04 04:14:57 UTC
At 4:07 PM -0500 12/3/06, Jeff Johnson wrote:

>Look -- in case you can't tell -- the problem of ^C handling wrt yum
>has been around for years.

I started with FC3. In FC3 it worked pretty well, kind of by accident, and
I came to like it. In FC5 it didn't work at all well. I fixed the part
that was a bug in Python, and made the rest of the fix a part of my hackish
Stablemirror yum plugin. Possibly my activity prompted yum's current
badness, though yum is doing it in a completely different way, or more
likely it just percolated to the top of the list and erupted now.

...
>Don't take it too personally, just look at the code some, and think a
>bit more, before answering. ;-)

OK.

...
>...I find the situation quite ironic, and the lack of
>stability in FC6 (and perhaps RHEL5) disturbing.

Well, ISTM from reading fedora-devel that they do as well. They don't seem
to be blaming RPM exactly, either.


>Unfortunately my RPM professional reputation is coupled to FC's and
>RHAT's, so I *have* to care.

Yes.


>> I do not recall "[accusing you] of sabotaging the source code".
>> Please show me where I did that so I can apologise for it.
>
>You did not, others have.

Thank you for the clarification.


>Thank you for thank you, the same to you. I wouldn't have
>internalized --verifydb if you hadn't shown up ;-)

You're welcome. I will try to be less irritating in the future.
--
____________________________________________________________________
TonyN.:' The Great Writ <mailto:***@georgeanelson.com>
' is no more. <http://www.georgeanelson.com/>
Jeff Johnson
2006-12-03 21:41:49 UTC
On Dec 3, 2006, at 3:49 PM, Tony Nelson wrote:

>
> Umm, in order to help me when I try to read yum to understand what it is
> doing, are you saying that yum is copying some data from RPM and then
> assuming that it will not change across unlocking / relocking the database?
> That is, it uses possibly stale data, but not invalid iterators? And
> please pardon my ignorance of Berkeley DB; I will work to learn about it.
>

This code from yum/rpmUtils/transaction.py is what all the fuss is about:

1.24 (mjs 03-Sep-06):         self.open = True
1.24 (mjs 03-Sep-06):
1.24 (mjs 03-Sep-06):     def __del__(self):
1.24 (mjs 03-Sep-06):         # Automatically close the rpm transaction when the reference is lost
1.24 (mjs 03-Sep-06):         self.close()
1.24 (mjs 03-Sep-06):
1.24 (mjs 03-Sep-06):     def close(self):
1.24 (mjs 03-Sep-06):         if self.open:
1.24 (mjs 03-Sep-06):             self.ts.closeDB()
1.24 (mjs 03-Sep-06):             self.ts = None
1.24 (mjs 03-Sep-06):             self.open = False

That overloads a transaction object with a lazy open/close
of an rpmdb in order to handle ^C.

While I think the code is insanely clever, and I'm quite pleased
that rpm is surviving as well as it is in spite of unanticipated
and bizarre uses of the implementation, the code is solving
entirely the wrong problem, and triggering a lot of instability.
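To see what the annotated fragment does in isolation, here is a self-contained reconstruction. The rest of the class body is an assumption (only `__del__` and `close()` appear above), and `FakeTransactionSet` is a hypothetical stand-in for `rpm.TransactionSet` so the sketch runs without rpm-python installed. In CPython, dropping the last reference to the wrapper runs `__del__`, which closes the database.

```python
class FakeTransactionSet:
    """Hypothetical stand-in for rpm.TransactionSet; counts opens and closes."""
    opens = 0
    closes = 0

    def __init__(self) -> None:
        FakeTransactionSet.opens += 1

    def closeDB(self) -> None:
        FakeTransactionSet.closes += 1

class TransactionWrapper:
    def __init__(self) -> None:
        self.ts = FakeTransactionSet()
        self.open = True

    def __del__(self) -> None:
        # Automatically close the rpm transaction when the reference is lost
        self.close()

    def close(self) -> None:
        if self.open:
            self.ts.closeDB()
            self.ts = None
            self.open = False
```

Because of the `self.open` guard, an explicit `close()` followed by the destructor still closes the database only once.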

Here was the first indication I received:
https://lists.dulug.duke.edu/pipermail/rpm-devel/2006-November/001857.html

Here is part of the history (which goes back even further):

http://www.archivesat.com/RPM_internals_development_and_distro_coordination/thread158194.htm

And there is another recent post by Panu (which I can't find, from
11/2006) identifying the overloading as a cause of worse performance
in yum (even though the claim is that it's faster!).

The slowdown is rather easy to understand: reopening a Berkeley DB is
not exactly a cheap operation.
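The performance point can be made concrete without timing anything, simply by counting opens. The sketch below uses a hypothetical `OpenCounter.open_db()` in place of a real Berkeley DB environment open, with toy data; it makes no claim about actual costs, only that one open per lookup scales with the number of lookups while a persistent handle opens once.

```python
from typing import Dict, List

class OpenCounter:
    """Hypothetical stand-in for an expensive database open."""

    def __init__(self) -> None:
        self.opens = 0

    def open_db(self) -> Dict[str, str]:
        self.opens += 1
        return {"pkg-a": "1.0", "pkg-b": "2.0"}   # toy package -> version map

def per_query(counter: OpenCounter, names: List[str]) -> List[str]:
    # The "del ts" style: a fresh open for every lookup.
    return [counter.open_db().get(n, "?") for n in names]

def persistent(counter: OpenCounter, names: List[str]) -> List[str]:
    # A long-lived handle: open once, reuse it for every lookup.
    db = counter.open_db()
    return [db.get(n, "?") for n in names]
```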

hth

73 de Jeff
Tony Nelson
2006-12-04 04:12:44 UTC
At 4:41 PM -0500 12/3/06, Jeff Johnson wrote:
>On Dec 3, 2006, at 3:49 PM, Tony Nelson wrote:
>
>>
>>
>>Umm, in order to help me when I try to read yum to understand what it is
>>doing, are you saying that yum is copying some data from RPM and then
>>assuming that it will not change across unlocking / relocking the
>>database? That is, it uses possibly stale data, but not invalid
>>iterators? And please pardon my ignorance of Berkeley DB; I will work to
>>learn about it.
>>
>
>This code from yum/rpmUtils/transaction.py is what all the fuss is about

Thank you for pointing me to the offending part of yum. It will help me
focus on the problem, and possibly come up with a solution acceptable to
yum's developers.


>1.24 (mjs 03-Sep-06):         self.open = True
>1.24 (mjs 03-Sep-06):
>1.24 (mjs 03-Sep-06):     def __del__(self):
>1.24 (mjs 03-Sep-06):         # Automatically close the rpm transaction when the reference is lost
>1.24 (mjs 03-Sep-06):         self.close()
>1.24 (mjs 03-Sep-06):
>1.24 (mjs 03-Sep-06):     def close(self):
>1.24 (mjs 03-Sep-06):         if self.open:
>1.24 (mjs 03-Sep-06):             self.ts.closeDB()
>1.24 (mjs 03-Sep-06):             self.ts = None
>1.24 (mjs 03-Sep-06):             self.open = False
>
>That overloads a transaction object with a lazy open/close
>of an rpmdb in order to handle ^C.

Destructors are really hard to get right in Python, because if there is a
reference loop the garbage collector zaps the objects in it in arbitrary
order. This one looks OK on the surface, as it only releases a resource,
but I'll look harder. I'd actually have expected the TransactionSet would
already do this on its own when its last reference disappeared; I'll look
at its code later.
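The destructor behaviour can be checked directly. In CPython, `__del__` runs as soon as the last reference disappears, which is what the wrapper relies on. If the object sits in a reference cycle, nothing happens until the cyclic garbage collector runs. Since Python 3.4 (PEP 442) such cycles are collected and their `__del__` methods are called in an unspecified order, whereas the Python 2 interpreters of this thread's era left cycles containing `__del__` methods uncollected in `gc.garbage`. A minimal demonstration with hypothetical names:

```python
import gc

closed = []

class Resource:
    def __init__(self, name: str) -> None:
        self.name = name
        self.partner = None

    def __del__(self) -> None:
        closed.append(self.name)       # stands in for closing a database

def demo():
    """Return the 'closed' list after each step: plain del, cycle del, gc.collect()."""
    closed.clear()
    gc.disable()                       # keep the collector from running early
    try:
        plain = Resource("plain")
        del plain                      # refcount hits zero: __del__ runs now
        step1 = list(closed)

        a, b = Resource("a"), Resource("b")
        a.partner, b.partner = b, a    # reference cycle
        del a, b                       # refcounts never reach zero
        step2 = list(closed)

        gc.collect()                   # Python >= 3.4 finalizes the cycle
        step3 = sorted(closed)         # finalizer order is unspecified
    finally:
        gc.enable()
    return step1, step2, step3
```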


>While I think the code is insanely clever, and I'm quite pleased
>that rpm is surviving as well as it is in spite of unanticipated
>and bizarre uses of the implementation, the code is solving
>entirely the wrong problem, and triggering a lot of instability.

I don't quite see that this code is a problem. It only runs at task time
(not at signal time), when the object is being reclaimed by the Python
interpreter. It should be like any other call to close the database. It
can't happen inside a call to rpmlib (unless there is more than one
thread). Am I misunderstanding the issue?


>Here was the first indication I received:
> https://lists.dulug.duke.edu/pipermail/rpm-devel/2006-November/001857.html

I've read this thread; I will try to understand it better with time.


>Here is part of the history (which goes back even further):
>
> http://www.archivesat.com/RPM_internals_development_and_distro_coordination/thread158194.htm

OK.


>And there is another recent post by Panu (which I can't find, from
>11/2006) identifying the overloading as a cause of worse performance in
>yum (even though the claim is faster!).
>
>The slowdown is rather easy to understand: reopening a Berkeley DB is
>not exactly a cheap operation.

AIUI, yum does that less often now than did the version in FC6 test
releases, and Yum's developers realize that it is a trigger for the
problem. Whether or not it is a bug in RPM or Berkeley DB or yum that is
being triggered, it is not necessary to do it, or even particularly helpful
in order to get SIGINT to work usefully. I hope to develop a solution that
is acceptable to RPM and to Yum. We'll see what I manage.


>hth

I think it does, thanks.
--
____________________________________________________________________
TonyN.:' The Great Writ <mailto:***@georgeanelson.com>
' is no more. <http://www.georgeanelson.com/>
Jeff Johnson
2006-12-04 05:37:15 UTC
On Dec 3, 2006, at 11:12 PM, Tony Nelson wrote:

>
>> While I think the code is insanely clever, and I'm quite pleased
>> that rpm is surviving as well as it is in spite of unanticipated
>> and bizarre uses of the implementation, the code is solving
>> entirely the wrong problem, and triggering a lot of instability.
>
> I don't quite see that this code is a problem. It only runs at task time
> (not at signal time), when the object is being reclaimed by the Python
> interpreter. It should be like any other call to close the database. It
> can't happen inside a call to rpmlib (unless there is more than one
> thread). Am I misunderstanding the issue?
>

The intent of
ts.closeDB() # <-- this is unnecessary btw
del ts
is to keep rpm from "stealing" signal handlers; on the last
rpmdb close, the original signal handlers are lazily restored.

I have heard no other reason for the "del ts" change; transaction.py
used to have a persistent transaction which kept an rpmdb
open as long as needed.

That was higher performing in some yum version; I still can't find
Panu's measurements, though.

There are other means to throw an exception on a signal.
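The lazy restore described here (handlers installed while an rpmdb is open, originals put back on the last close) can be modeled in Python as a reference count around `signal.signal()`. This is an illustration, not rpmlib's C code; `HandlerGuard`, `acquire` and `release` are hypothetical names, and like any use of `signal.signal()` it must run in the main thread.

```python
import signal
from typing import Dict, Iterable

class HandlerGuard:
    """Install a handler on first acquire; restore the originals on last release."""

    def __init__(self, signums: Iterable[int], handler) -> None:
        self.signums = list(signums)
        self.handler = handler
        self.refcount = 0
        self.saved: Dict[int, object] = {}

    def acquire(self) -> None:                 # e.g. when an rpmdb is opened
        if self.refcount == 0:
            for signum in self.signums:
                self.saved[signum] = signal.signal(signum, self.handler)
        self.refcount += 1

    def release(self) -> None:                 # e.g. when an rpmdb is closed
        if self.refcount == 0:
            raise RuntimeError("release() without matching acquire()")
        self.refcount -= 1
        if self.refcount == 0:
            for signum, old in self.saved.items():
                # None means the old handler was not installed from Python and
                # cannot be reinstalled; fall back to the default action.
                signal.signal(signum, old if old is not None else signal.SIG_DFL)
            self.saved.clear()
```

The `del ts` trick amounts to driving the refcount to zero on purpose so that the original handlers come back.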

73 de Jeff
Panu Matilainen
2006-12-04 08:22:00 UTC
On Mon, 4 Dec 2006, Jeff Johnson wrote:
>
> On Dec 3, 2006, at 11:12 PM, Tony Nelson wrote:
>
>>
>>> While I think the code is insanely clever, and I'm quite pleased
>>> that rpm is surviving as well as it is in spite of unanticipated
>>> and bizarre uses of the implementation, the code is solving
>>> entirely the wrong problem, and triggering a lot of instability.
>>
>> I don't quite see that this code is a problem. It only runs at task time
>> (not at signal time), when the object is being reclaimed by the Python
>> interpreter. It should be like any other call to close the database. It
>> can't happen inside a call to rpmlib (unless there is more than one
>> thread). Am I misunderstanding the issue?
>>
>
> The intent of
> ts.closeDB() # <-- this is unnecessary btw
> del ts
> is to keep rpm from "stealing" signal handlers, on last
> rpmdb close the original signal handlers are lazily restored.
>
> I have heard no other reason for the "del ts" change, transaction.py
> used to have a persistent transaction which kept an rpmdb
> open as long as needed.
>
> Which was higher performing in some yum version, I still can't find
> Panu's measurements though.

Here's the timing info (by Terje Rosten, not me):
https://lists.dulug.duke.edu/pipermail/yum-devel/2006-November/002870.html

- Panu -
Tony Nelson
2006-12-04 18:56:41 UTC
At 12:37 AM -0500 12/4/06, Jeff Johnson wrote:
>On Dec 3, 2006, at 11:12 PM, Tony Nelson wrote:
>>
>>> While I think the code is insanely clever, and I'm quite pleased
>>> that rpm is surviving as well as it is in spite of unanticipated
>>> and bizzarre uses of the implementation, the code is solving
>>> entirely the wrong problem, and triggering a lot of instability.
>>
>>I don't quite see that this code is a problem. It only runs at task time
>>(not at signal time), when the object is being reclaimed by the Python
>>interpreter. It should be like any other call to close the database. It
>>can't happen inside a call to rpmlib (unless there is more than one
>>thread). Am I misunderstanding the issue?
>
>The intent of
> ts.closeDB() # <-- this is unnecessary btw
> del ts
>is to keep rpm from "stealing" signal handlers, on last
>rpmdb close the original signal handlers are lazily restored.
>
>I have heard no other reason for the "del ts" change, transaction.py
>used to have a persistent transaction which kept an rpmdb
>open as long as needed.

You may well be right about how it's being used, but the TransactionWrapper
class /says/ its purpose is to allow easy instrumentation of
rpm.Transaction. It appears to be the "owner" of one of them, which it
takes pains to close when it goes away. (If it really "owns" a
TransactionSet when it returns it to use as an iterator.) (There's also a
global ts which isn't being used for anything?) If it is being (mis)used
to do what you say, that would be by the code that holds a reference to it
choosing to forget that reference.

Or are you saying that "del ts" would not do the ts.closeDB()? I'm
assuming that it would close it, which makes the destructor unnecessary.


>Which was higher performing in some yum version, I still can't find
>Panu's measurements though.

That's OK, I accept that already (I also see Panu's reply). I read
(somewhere) that performance suffers quite a bit with the new opens /
closes, but that an earlier yum that had more of them was even slower.


>There are other means to throw an exception on a signal.

Well, yes. ;) Umm, I will look for the ones you prefer.
--
____________________________________________________________________
TonyN.:' The Great Writ <mailto:***@georgeanelson.com>
' is no more. <http://www.georgeanelson.com/>
Jeff Johnson
2006-12-04 19:44:37 UTC
On Dec 4, 2006, at 1:56 PM, Tony Nelson wrote:

> At 12:37 AM -0500 12/4/06, Jeff Johnson wrote:
>> On Dec 3, 2006, at 11:12 PM, Tony Nelson wrote:
>>>
>>>> While I think the code is insanely clever, and I'm quite pleased
>>>> that rpm is surviving as well as it is in spite of unanticipated
>>>> and bizarre uses of the implementation, the code is solving
>>>> entirely the wrong problem, and triggering a lot of instability.
>>>
>>> I don't quite see that this code is a problem. It only runs at task
>>> time (not at signal time), when the object is being reclaimed by the
>>> Python interpreter. It should be like any other call to close the
>>> database. It can't happen inside a call to rpmlib (unless there is
>>> more than one thread). Am I misunderstanding the issue?
>>
>> The intent of
>> ts.closeDB() # <-- this is unnecessary btw
>> del ts
>> is to keep rpm from "stealing" signal handlers, on last
>> rpmdb close the original signal handlers are lazily restored.
>>
>> I have heard no other reason for the "del ts" change, transaction.py
>> used to have a persistent transaction which kept an rpmdb
>> open as long as needed.
>
> You may well be right about how it's being used, but the
> TransactionWrapper class /says/ its purpose is to allow easy
> instrumentation of rpm.Transaction. It appears to be the "owner" of one
> of them, which it takes pains to close when it goes away. (If it really
> "owns" a TransactionSet when it returns it to use as an iterator.)
> (There's also a global ts which isn't being used for anything?) If it is
> being (mis)used to do what you say, that would be by the code that holds
> a reference to it choosing to forget that reference.
>
> Or are you saying that "del ts" would not do the ts.closeDB()? I'm
> assuming that it would close it, which makes the destructor
> unnecessary.
>

The ts.closeDB() is unnecessary; "del ts" will accomplish the same call
to rpmdbClose() when a transaction is freed.

Way down underneath the hood, rpmlib is maintaining a linked list of
open rpmdb databases, and associating already open rpmdb's with
new transactions, maintaining a refcount on used rpmdb objects, so that
users of rpm-python can freely create/destroy transactions without
having to worry about the underlying attached rpmdb database
environment.
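That bookkeeping (a list of open databases, reuse of an already-open one by new transactions, and a reference count) can be sketched as a simplified Python model. The assumptions are loud ones: databases are keyed by root directory, `DbPool`, `attach` and `detach` are hypothetical names, and rpmlib's real structures are C linked lists rather than a dict.

```python
from typing import Dict

class Rpmdb:
    def __init__(self, root: str) -> None:
        self.root = root
        self.nrefs = 0
        self.is_open = True

class DbPool:
    def __init__(self) -> None:
        self.open_dbs: Dict[str, Rpmdb] = {}

    def attach(self, root: str) -> Rpmdb:
        """A new transaction needs a database: reuse it if already open."""
        db = self.open_dbs.get(root)
        if db is None:
            db = Rpmdb(root)              # the expensive open happens only here
            self.open_dbs[root] = db
        db.nrefs += 1
        return db

    def detach(self, db: Rpmdb) -> None:
        """A transaction is freed: close the database on the last reference."""
        db.nrefs -= 1
        if db.nrefs == 0:
            db.is_open = False
            del self.open_dbs[db.root]
```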

The code snippet forces the ts <-> rpmdb relation to be one-to-one,
which is okay. I worry about the use of ts.closeDB(), which has
other side-effects, namely that any database that is manually closed
using ts.closeDB() will never be lazily opened again.

With all the other mess of maintaining linked lists and refcounts and
associating already existing rpmdb's with newly created ts objects,
I'd rather *NOT* also have to worry about raciness on a
manually closed rpmdb that prevents lazy reopens.
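The side effect of an explicit close can be modeled as a small state machine. In this hypothetical sketch (not rpm-python's implementation; the method is merely named after `closeDB()`), a handle opens itself lazily on first use, but an explicit `closeDB()` also marks it so that later use does not reopen it. Dropping the last reference instead never enters that state.

```python
class LazyDb:
    """Hypothetical model: opens on demand unless explicitly closed."""

    def __init__(self) -> None:
        self.is_open = False
        self.manually_closed = False
        self.opens = 0

    def _ensure_open(self) -> None:
        if self.is_open:
            return
        if self.manually_closed:
            raise RuntimeError("database was closed explicitly; no lazy reopen")
        self.is_open = True               # the lazy open
        self.opens += 1

    def match(self, name: str) -> str:
        self._ensure_open()
        return "match for " + name

    def closeDB(self) -> None:            # named after the rpm-python method
        self.is_open = False
        self.manually_closed = True
```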

AFAIK (and I have looked), the
ts.closeDB()
is unnecessary to yum; the same action will be performed by
del ts

Meanwhile, the eventual bestest fix is going to be to handle
signals through other means and to make the ts object more
persistent, thereby preserving the database environment's
concurrency guarantee of "multiple readers or single writer"
and performing better as well.
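"Multiple readers or single writer" is the classic readers-writer rule. Berkeley DB enforces it with its own locking subsystem; the sketch below is only a generic Python illustration of the rule, with non-blocking `try_*` methods (hypothetical names) so the behaviour is easy to check.

```python
import threading

class ReadersWriterLock:
    """Many concurrent readers, or exactly one writer, never both."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()    # protects the two fields below
        self._readers = 0
        self._writer = False

    def try_acquire_read(self) -> bool:
        with self._mutex:
            if self._writer:
                return False
            self._readers += 1
            return True

    def release_read(self) -> None:
        with self._mutex:
            self._readers -= 1

    def try_acquire_write(self) -> bool:
        with self._mutex:
            if self._writer or self._readers:
                return False
            self._writer = True
            return True

    def release_write(self) -> None:
        with self._mutex:
            self._writer = False
```

A persistent ts keeps holding its share of such locks for as long as it lives, which is why the lifetime of the transaction object matters.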

Got that?

73 de Jeff
Tony Nelson
2006-12-05 01:50:25 UTC
At 2:44 PM -0500 12/4/06, Jeff Johnson wrote:
>On Dec 4, 2006, at 1:56 PM, Tony Nelson wrote:
>
>> At 12:37 AM -0500 12/4/06, Jeff Johnson wrote:
>>> On Dec 3, 2006, at 11:12 PM, Tony Nelson wrote:
>>>>
>>>>> While I think the code is insanely clever, and I'm quite pleased
>>>>> that rpm is surviving as well as it is in spite of unanticipated
>>>>> and bizarre uses of the implementation, the code is solving
>>>>> entirely the wrong problem, and triggering a lot of instability.
>>>>
>>>>I don't quite see that this code is a problem. It only runs at task
>>>>time (not at signal time), when the object is being reclaimed by the
>>>>Python interpreter. It should be like any other call to close the
>>>>database. It can't happen inside a call to rpmlib (unless there is
>>>>more than one thread). Am I misunderstanding the issue?
>>>
>>> The intent of
>>> ts.closeDB() # <-- this is unnecessary btw
>>> del ts
>>> is to keep rpm from "stealing" signal handlers, on last
>>> rpmdb close the original signal handlers are lazily restored.
>>>
>>> I have heard no other reason for the "del ts" change, transaction.py
>>> used to have a persistent transaction which kept an rpmdb
>>> open as long as needed.
>>
>> You may well be right about how it's being used, but the
>> TransactionWrapper class /says/ its purpose is to allow easy
>> instrumentation of rpm.Transaction. It appears to be the "owner" of one
>> of them, which it takes pains to close when it goes away. (If it really
>> "owns" a TransactionSet when it returns it to use as an iterator.)
>> (There's also a global ts which isn't being used for anything?) If it is
>> being (mis)used to do what you say, that would be by the code that holds
>> a reference to it choosing to forget that reference.
>>
>> Or are you saying that "del ts" would not do the ts.CloseDB()? I'm
>> assuming that it would close it, which makes the destructor
>> unnecessary.
>>
>
>The ts.CloseDB() is unnecessary, "del ts" will accomplish the same call
>to rpmdbClose() when a transaction is freed.
>
>Way down underneath the hood, rpmlib is maintaining a link list of
>open rpmdb databases, and associating already open rpmdb's with
>new transactions, maintaining a refcount on used rpmdb objects, so that
>users of rpm-python can freely create/destroy transactions without
>having to worry about the underlying attached rpmdb database
>environment.
>
>The code snippet forces the ts <-> rpmdb relation to be one-to-one,
>which is okay. I worry about the use of ts.CloseDB(), which has
>other side-effects, namely any database that is manually closed
>using ts.CloseDB() will never be lazily opened again.
>
>With all the other mess of maintaining link lists and refcounts and
>associating already existing rpmdb's with newly created ts objects,
>I'd rather *NOT* also have to worry about raciness on a
>manually closed rpmdb that prevents lazy reopens.
>
>AFAIK (and I have looked), the
> ts.CloseDB()
>is unnecessary to yum, the same action will be performed by
> del ts

And I'm saying further that (by my reading of Transaction.py) the whole
destructor (__del__(self): self.close()) is unnecessary and pointless,
since whenever it would be called, ts would have been released and closed
itself normally without it.
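Tony's point about the destructor can be checked with a small sketch. `FakeTransactionSet` is a hypothetical stand-in for `rpm.TransactionSet` that records when it is freed, and `Wrapper` owns it without defining `__del__`. Assuming the wrapper holds the only reference, CPython's reference counting frees the wrapped object, and runs its cleanup, as soon as the wrapper goes away; other Python implementations may finalize later.

```python
class FakeTransactionSet:
    """Hypothetical stand-in for rpm.TransactionSet; records when it is freed."""

    freed = []

    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Per the thread, freeing the real ts is what triggers rpmdbClose().
        FakeTransactionSet.freed.append(self.name)


class Wrapper:
    """Owns a transaction set and deliberately has no __del__ of its own."""

    def __init__(self, name):
        self.ts = FakeTransactionSet(name)
```

Deleting a `Wrapper` whose `ts` is referenced nowhere else frees that `ts` immediately; a `__del__` calling `self.close()` would add nothing in that case.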

>Meanwhile, the eventual bestest fix is going to be to handle
>signals through other means and making the ts object more
>persistent (thereby preserving the database environment's
>concurrency guarantee of "multiple readers or single writer",
>and being higher performing as well).
>
>Got that?

Yes. It's what I'm working toward. Note that I have no influence over yum
other than to work up a patch and then argue for it.

At the moment, I can't find how to get the currently received signals from
rpmlib. I only see two places where rpmsqCaught is used, in rpmsqAction()
where they're set, and rpmdbCheckSignals(), where they're read. I don't
see an available function that just polls them, though rpmsqCaught is a
global and such a thing could be added. Was the suggestion that
rpmsqCaught could be used to make such a polling function?

Once I have a polling routine, it looks like I can either sprinkle such polling
into yum, or use a Python threading.Timer() to poll a couple of times a
second (started automatically by the rpm Python package). (That would poll
from a different thread. My understanding is that sigismember() and
sigdelset() are safe across threads and CPUs.)
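The `threading.Timer()` idea might look like the sketch below. `threading.Timer` fires only once, so this hypothetical `RepeatingPoller` re-arms a new timer after each call until `stop()`. The `check` callback is an assumption: it would call whatever polling function ends up existing and, for example, set a flag the main code looks at.

```python
import threading


class RepeatingPoller:
    """Call `check()` every `interval` seconds on a background thread."""

    def __init__(self, interval, check):
        self.interval = interval
        self.check = check
        self._lock = threading.Lock()
        self._timer = None
        self._stopped = False

    def _run(self):
        self.check()
        self._schedule()               # a Timer fires once, so re-arm it

    def _schedule(self):
        with self._lock:
            if self._stopped:
                return
            self._timer = threading.Timer(self.interval, self._run)
            self._timer.daemon = True  # don't keep the process alive
            self._timer.start()

    def start(self):
        self._schedule()

    def stop(self):
        with self._lock:
            self._stopped = True
            if self._timer is not None:
                self._timer.cancel()   # a check already running may finish
```

Polling "a couple of times a second" would be `RepeatingPoller(0.5, check).start()`. Because `check` runs on the timer thread, anything it touches must be safe to use from two threads.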
--
____________________________________________________________________
TonyN.:' The Great Writ <mailto:***@georgeanelson.com>
' is no more. <http://www.georgeanelson.com/>
Jeff Johnson
2006-12-05 01:59:32 UTC
On Dec 4, 2006, at 8:50 PM, Tony Nelson wrote:

>
> And I'm saying further that (by my reading of Transaction.py) the whole
> destructor (__del__(self): self.close()) is unnecessary and pointless,
> since whenever it would be called, ts would have been released and closed
> itself normally without it.
>

;-) Perhaps. Even though I rewrote most of rpm-python, I'm a C, not a
Python, programmer.

>> Meanwhile, the eventual bestest fix is going to be to handle
>> signals through other means and making the ts object more
>> persistent (thereby preserving the database environment's
>> concurrency guarantee of "multiple readers or single writer",
>> and being higher performing as well).
>>
>> Got that?
>
> Yes. It's what I'm working toward. Note that I have no influence over yum
> other than to work up a patch and then argue for it.
>

Very cool.

> At the moment, I can't find how to get the currently received signals from
> rpmlib. I only see two places where rpmsqCaught is used, in rpmsqAction()
> where they're set, and rpmdbCheckSignals(), where they're read. I don't
> see an available function that just polls them, though rpmsqCaught is a
> global and such a thing could be added. Was the suggestion that
> rpmsqCaught could be used to make such a polling function?
>

Yes, rpmsqCaught is the bit mask of received signals, and a
rpmdbCheckSignals() exit is what is needed to gracefully close cursors
and databases, thereby freeing locks.
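What such an exit does can be modelled roughly as below. This is a hypothetical sketch, not the body of `rpmdbCheckSignals()`: `ToyDatabase`, `open_databases`, and the default set of exit signals are assumptions, and the exit status is arbitrary. The point it shows is the order: close every open handle, releasing its lock, and only then exit.

```python
import signal


class ToyDatabase:
    """Hypothetical open database handle that holds a lock while open."""

    def __init__(self, name):
        self.name = name
        self.locked = True

    def close(self):
        self.locked = False


open_databases = []  # hypothetical registry of open handles, oldest first


def check_signals(caught, exit_signals=(signal.SIGINT, signal.SIGTERM)):
    """If an exit-worthy signal is pending, close everything, then exit.

    `exit_signals` is an assumed set, not necessarily the one rpmlib uses.
    """
    if not any(sig in caught for sig in exit_signals):
        return
    while open_databases:
        open_databases.pop().close()   # newest first; each close frees a lock
    raise SystemExit(1)                # arbitrary status for this sketch
```

With nothing pending the call is a no-op; with SIGINT pending, every handle is closed before `SystemExit` propagates.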

I can/will add methods to rpm-python for any reasonable API; that's
pretty easy stuff.

> Once I have a polling routine, it looks like I can either sprinkle such
> polling into yum, or use a Python threading.Timer() to poll a couple of
> times a second (started automatically by the rpm Python package). (That
> would poll from a different thread. My understanding is that sigismember()
> and sigdelset() are safe across threads and CPUs.)

Sprinkling into existing rpm-python method exits may be needed if yum
resists changing.

Thanks for the effort. Not having some python-head express a wish for
what is needed is what has stopped me from doing the coding.

73 de Jeff
Tony Nelson
2006-12-05 03:09:59 UTC
At 8:59 PM -0500 12/4/06, Jeff Johnson wrote:
>On Dec 4, 2006, at 8:50 PM, Tony Nelson wrote:

I see you trimmed some unneeded context, good. I've been leaving it in as
you probably get lots more unrelated messages to respond to than I do.


>>And I'm saying further that (by my reading of Transaction.py) the whole
>>destructor (__del__(self): self.close()) is unnecessary and pointless,
>>since whenever it would be called, ts would have been released and closed
>>itself normally without it.
>>
>
>;-) Perhaps. Even though I rewrote most of rpm-python, I'm a C, not a
>python, programmer.

OK.


>>> Meanwhile, the eventual bestest fix is going to be to handle
>>> signals through other means and making the ts object more
>>> persistent (thereby preserving the database environment's
>>> concurrency guarantee of "multiple readers or single writer",
>>> and being higher performing as well).
>>>
>>> Got that?
>>
>> Yes. It's what I'm working toward. Note that I have no influence
>> over yum other than to work up a patch and then argue for it.
>
>Very cool.

Thank you.


>>At the moment, I can't find how to get the currently received signals
>>from rpmlib. I only see two places where rpmsqCaught is used, in
>>rpmsqAction() where they're set, and rpmdbCheckSignals(), where they're
>>read. I don't see an available function that just polls them, though
>>rpmsqCaught is a global and such a thing could be added. Was the
>>suggestion that rpmsqCaught could be used to make such a polling function?
>
>Yes, rpmsqCaught is the bit mask of received signals, and a
>rpmdbCheckSignals() exit is what is needed to gracefully close cursors and
>databases, thereby freeing locks.

I think we are talking at cross-purposes here. I would expect that "a
rpmdbCheckSignals() exit" is needed when it is RPM code that is
interrupted, which won't ever happen from this Python code.

From Python code, all that could be needed is the normal unwinding that the
code already does. It's no different than if the code had decided to exit
for any other reason. If it's in a loop iterating over something, exit the
loop as if finished, dispose of anything that would normally be disposed,
exit that function, and so on. If it's downloading packages and just has a
ts open that it won't be using again until it's finished, there's nothing
RPM needs to (un)do, and no benefit from it suddenly quitting much later
because a handled SIGINT is still set as a flag. I'm pretty sure that any
time Python code can see the SIGINT flag set, RPM can not have anything
open that needs a rpmdbCheckSignals() exit.
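On the Python side, "exit the loop as if finished" needs nothing from rpmlib. A minimal sketch with hypothetical names: `interrupt_requested` would be whatever flag or poll function is eventually provided, and `FakeResource` stands in for an object the code already disposes of normally. An interrupt takes the same `finally` path as any other exit.

```python
class FakeResource:
    """Stand-in for something the code already disposes of on a normal exit."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def download_all(packages, interrupt_requested, fetch):
    """Fetch each package, stopping early (but cleanly) on an interrupt.

    `interrupt_requested` is a hypothetical zero-argument callable, e.g. a
    poll of the SIGINT flag; `fetch` does the per-package work.
    """
    resource = FakeResource()
    fetched = []
    try:
        for pkg in packages:
            if interrupt_requested():
                break                  # leave the loop as if it had finished
            fetched.append(fetch(pkg))
    finally:
        resource.close()               # the same disposal as any other exit
    return fetched, resource
```

An interrupted run simply returns fewer results, with the same cleanup as a completed one.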

I also see no harm in extremely occasionally dropping a second SIGINT onto
the floor, so I don't mind introducing such a possible race condition by
having two unlocked atomic ops, sigismember() then sigdelset(). SIGINT is
special this way, as it normally is a request from a user. I wouldn't want
to treat SIGSEGV this way.


>I can/will add methods to rpm-python for any reasonable API, that's
>pretty easy stuff.

That would save me some work! I'll ask you when I'm prepared. I'll hack
something first, and then let you do it right. :)


>>Once I have a polling routine, it looks like I can either sprinkle such polling
>>into yum, or use a Python threading.Timer() to poll a couple of times a
>>second (started automatically by the rpm Python package). (That would
>>poll from a different thread. My understanding is that sigismember() and
>>sigdelset() are safe across threads and CPUs.)
>
>Sprinkling into existing rpm-python method exits may be needed if yum
>resists changing.

The use case for yum having SIGINT handling is that it /isn't/ in any RPM
code, so I don't think that will be needed.

If I can make the whole thing automatic, so that all yum would need is to
get rid of the extra closes / opens, that would probably be the easiest
thing to get accepted.


>Thanks for the effort. Not having some python-head express a wish for
>what is needed is what has stopped me from doing the coding.

You're welcome. Just asking for something is not of any use; what's needed
is the effort of an acceptable patch. It just takes a while to find out
what that would be.

I expect it will be a couple of days before I'm back. I'll probably start
a new thread on rpm-devel when I do have something.
--
____________________________________________________________________
TonyN.:' The Great Writ <mailto:***@georgeanelson.com>
' is no more. <http://www.georgeanelson.com/>
Jeff Johnson
2006-12-05 03:11:51 UTC
On Dec 4, 2006, at 8:50 PM, Tony Nelson wrote:

>
> At the moment, I can't find how to get the currently received signals from
> rpmlib. I only see two places where rpmsqCaught is used, in rpmsqAction()
> where they're set, and rpmdbCheckSignals(), where they're read. I don't
> see an available function that just polls them, though rpmsqCaught is a
> global and such a thing could be added. Was the suggestion that
> rpmsqCaught could be used to make such a polling function?
>

Here's a no-brainer patch that should apply most everywhere to add a
getter method for rpmsqCaught, and a poll method for rpmdbCheckSignals().

If it helps, I'll add to the FC6 rpm-4.4.2, build, and make the
packages available.

Feel free to suggest better. Subclassing on the existing Python signal
methods to unify rpmsqCaught with whatever Python is doing would be the
most Pythonic.

Alternatively, I can attempt sub-typing instead of sub-classing within
rpm-python.
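One way to approximate the unification Jeff mentions, by chaining rather than subclassing, is to install a Python-level handler that records the signal in a shared set and then defers to whatever handler was installed before. Everything here (`pending`, `chain_handler`) is a hypothetical sketch, and `signal.signal()` must be called from the main thread.

```python
import signal

pending = set()  # hypothetical shared record of received signals


def chain_handler(signum):
    """Record `signum` in `pending`, then call the previously installed handler.

    Must be called from the main thread. Returns the new handler.
    """
    previous = signal.getsignal(signum)

    def handler(sig, frame):
        pending.add(sig)
        if callable(previous):
            previous(sig, frame)   # e.g. default_int_handler raises KeyboardInterrupt
        # SIG_DFL, SIG_IGN and None are not callable and are not emulated here.

    signal.signal(signum, handler)
    return handler
```

With Python's default SIGINT handler underneath, a Ctrl-C is both recorded and still raised as `KeyboardInterrupt`, so existing `except KeyboardInterrupt` code keeps working.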

73 de Jeff
James Olin Oden
2006-11-29 22:19:23 UTC
> Exactly. RPM's users are RPM's customers. RPM should try to serve them
> well, and this includes putting the documentation for RPM in RPM's own
> documentation.
>
RPM is an open source package. Anyone is capable of jumping in and
supplying documentation; the basic policy has always been "patches
gleefully accepted".

Furthermore, the only ones you have any right to complain to about the
lack of documentation are:

a) The distro vendor you pay for a support contract.
b) Yourself.

a) because if you did pay, you are their customer. b) because this is
open source and a community development model is what open source
software is based on.

Most of the developers of rpm, Jeff being the principal one, do not
get paid by you to support it, so technically you are not our
customer. On the other hand, I would be happy for you to be our peer
and submit patches to document the things that you feel sorely need
documenting (actually I do not disagree with you that they need
documenting, but those of us who work on rpm typically work on the
things that we care about and our employers care about... Jeff is
honestly the only one that I know who does this out of pure, insane love
for the application).

OTOH, if you want a more traditional customer-vendor relationship, then
this is not the proper forum. That would be through Red Hat's paid
support or the vendor of your choice.

>
> >> OK, though Packages is too large to save every day, and just keeping the
> >> last one would mean that the sysadmin would need to notice the problem
> >> before that last good copy of Packages was overwritten. (I have an RPM
> >> database instance that has corruption but functioned normally most of the
> >> time, though not during an Anaconda upgrade to FC6. See RH BZ 215127.) It
> >> would seem that it would be better to only save good copies of the Packages
> >> file, and to report to a responsible sysadmin that there is an issue (if
> >> only there were such a sysadmin for most systems). I don't know how to do
> >> that with "rpm --rebuilddb", but I do know how to do that with "rpm
> >> --verifydb".
> >>
> >
> >Too large is a different problem; try "man rdiff" if you want an easy
> >incremental backup.
>
> I see rdiff is in extras. I see it uses rsync. Possibly I will add that
> capability to my rpm_verifydb package.
>
>
> >> /No one/ saves a copy of Packages, as such an implementation detail is
> >> RPM's responsibility, and it does not do it. Don't blame the users for
> >> omissions!
> >
> >We disagree. If you want to protect your data, you need to take precautions.
>
> It disappoints me that you think it is not RPM's responsibility to protect
> its database.
That's not what he's saying. If you knew Jeff, you would know that he
wishes to harden the database. What you're missing is that it is not his
job, or that of any open source developer in the purest sense, to see to
the integrity of your data. If systems go bad in the field where I work,
my management comes to me, and then I figure out what went wrong and
try to keep it from happening again. I do not point fingers, because the
system's integrity was my responsibility.

Could rpm be configured to back up its data automatically based on some
event, or could one write a script to do this on a regular schedule?
Sure. But I think the key is that we are in the area of policy here,
and by default I probably don't want rpm databases being backed up
every time a write transaction occurs against them (I don't know about
others). That being said, patches are gleefully accepted if you want
to provide the mechanism such that the policy can be turned on. Or
perhaps it does not need to be part of rpm proper at all. Maybe just
provide a package that supplies a cron job? Either way, a little
coding on your side goes a long way.

Cheers...james
Tony Nelson
2006-11-29 20:45:59 UTC
At 10:57 PM -0600 11/28/06, Stanley, Jon wrote:
>>
>>The "right" way would be to make a copy of the Packages file, check it, and
>>only save it if the check passes. According to Jeff, the proper check is
>>to do a "rpm --rebuilddb" with that Packages file and see if it works, but
>>I haven't tried that method.
>>--
>
>Is there some way to verify that a --rebuilddb *would* work against it?
>I have an aversion to modifying production data, for obviously good
>reason.

See "make a copy of the Packages file", above. If the rebuild works on the
copy, then it will work on the original. Unfortunately, it won't tell you if
the new database has any
significant difference from the old one, so if "rpm --verifydb" says the
database is OK and this works, you don't really know if there was a problem
previously unless RPM already puked. (My bad database, acceptable to RPM
for most operations, did fail the "rpm --verifydb" check.)
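The "copy, check, keep only if good" procedure can be sketched as a small script suitable for a daily cron job. Assumptions: the live file path is passed in (typically /var/lib/rpm/Packages), the default check runs `rpm --dbpath <scratch> --rebuilddb` as root on a throwaway copy and trusts its exit status, and the function names are hypothetical. A wrapper around `rpm --verifydb` could be passed as `check` instead. Keeping the newest few good copies addresses the worry that a single saved copy could be overwritten before anyone notices a problem.

```python
import shutil
import subprocess
import tempfile
import time
from pathlib import Path


def rebuild_check(dbdir):
    """Assumed check: rebuild the throwaway copy in `dbdir`; True on success."""
    result = subprocess.run(["rpm", "--dbpath", str(dbdir), "--rebuilddb"],
                            capture_output=True)
    return result.returncode == 0


def backup_if_good(packages, backup_dir, check=rebuild_check, keep=7):
    """Snapshot Packages, check a copy of the snapshot, keep it only if good.

    Returns the saved path, or None if the check failed (older good
    backups are left untouched).
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    snapshot = backup_dir / "Packages.pending"
    shutil.copy2(packages, snapshot)           # never modify the live file
    ok = False
    try:
        with tempfile.TemporaryDirectory() as scratch:
            shutil.copy2(snapshot, Path(scratch) / "Packages")
            ok = check(Path(scratch))          # check may rewrite this copy
    finally:
        if not ok:
            snapshot.unlink()                  # discard the unverified snapshot
    if not ok:
        return None
    saved = backup_dir / time.strftime("Packages-%Y%m%d-%H%M%S")
    snapshot.replace(saved)
    # Timestamped names sort chronologically; drop all but the newest `keep`.
    for old in sorted(backup_dir.glob("Packages-*"))[:-keep]:
        old.unlink()
    return saved
```

Copying Packages while another rpm process is writing it can capture an inconsistent file, so such a job is best run when no transaction is in progress.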
--
____________________________________________________________________
TonyN.:' The Great Writ <mailto:***@georgeanelson.com>
' is no more. <http://www.georgeanelson.com/>