jbd: fix error handling for checkpoint io
authorHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Wed, 22 Oct 2008 21:15:00 +0000 (14:15 -0700)
committerLinus Torvalds <torvalds@linux-foundation.org>
Thu, 23 Oct 2008 15:55:01 +0000 (08:55 -0700)
commit4afe978530702c934dfdb11f54073136818b2119
tree5f7fb9539b46c0b390157f55c84017e14b7f605c
parent66f50ee3cee4c9d98eea0add6f439e6e5e0ca4a5
jbd: fix error handling for checkpoint io

When a checkpointing IO fails, current JBD code doesn't check the error
and continue journaling.  This means latest metadata can be lost from both
the journal and filesystem.

This patch leaves the failed metadata blocks in the journal space and
aborts journaling in the case of log_do_checkpoint().  To achieve this, we
need to do:

1. don't remove the failed buffer from the checkpoint list where in
   the case of __try_to_free_cp_buf() because it may be released or
   overwritten by a later transaction
2. log_do_checkpoint() is the last chance, remove the failed buffer
   from the checkpoint list and abort the journal
3. when checkpointing fails, don't update the journal super block to
   prevent the journaled contents from being cleaned.  For safety,
   don't update j_tail and j_tail_sequence either
4. when checkpointing fails, notify this error to the ext3 layer so
   that ext3 don't clear the needs_recovery flag, otherwise the
   journaled contents are ignored and cleaned in the recovery phase
5. if the recovery fails, keep the needs_recovery flag
6. prevent cleanup_journal_tail() from being called between
   __journal_drop_transaction() and journal_abort() (a race issue
   between journal_flush() and __log_wait_for_space()

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Acked-by: Jan Kara <jack@suse.cz>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fs/jbd/checkpoint.c
fs/jbd/journal.c
fs/jbd/recovery.c
include/linux/jbd.h