Fault Tolerance Interface
postreco.c File Reference

Post recovery functions for the FTI library.

#include "interface.h"

Functions

int FTI_Decode (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt, int *erased)
 It recovers a set of ckpt. files using RS decoding.
 
int FTI_RecoverL1 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
 It checks that all L1 ckpt. files are present.
 
int FTI_SendCkptFileL2 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_checkpoint *FTI_Ckpt, int destination, int ptner)
 It sends a checkpoint file.
 
int FTI_RecvCkptFileL2 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_checkpoint *FTI_Ckpt, int source, int ptner)
 It receives a checkpoint file.
 
int FTI_RecoverL2 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
 It recovers L2 ckpt. files using the partner copy.
 
int FTI_RecoverL3 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
 It recovers L3 ckpt. files by invoking the RS decoding algorithm.
 
int FTI_RecoverL4 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
 It recovers L4 ckpt. files from the PFS.
 
int FTI_RecoverL4Posix (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
 It recovers L4 ckpt. files from the PFS using POSIX.
 
int FTI_RecoverL4Mpi (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
 It recovers L4 ckpt. files from the PFS using MPI-I/O.
 
int FTI_RecoverL4Sionlib (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
 It recovers L4 ckpt. files from the PFS using SIONlib.
 

Detailed Description

Post recovery functions for the FTI library.

Copyright (c) 2017 Leonardo A. Bautista-Gomez All rights reserved

FTI - A multi-level checkpointing library for C/C++/Fortran applications

Revision 1.0 : Fault Tolerance Interface (FTI)

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Date
October, 2017

Function Documentation

int FTI_Decode (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt, int *erased)

It recovers a set of ckpt. files using RS decoding.

Parameters
FTI_Conf  Configuration metadata.
FTI_Exec  Execution metadata.
FTI_Topo  Topology metadata.
FTI_Ckpt  Checkpoint metadata.
erased    The array of erasures.
Returns
integer FTI_SCES if successful.

This function tries to recover the missing L3 ckpt. files using RS decoding.

int FTI_RecoverL1 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)

It checks that all L1 ckpt. files are present.

Parameters
FTI_Conf  Configuration metadata.
FTI_Exec  Execution metadata.
FTI_Topo  Topology metadata.
FTI_Ckpt  Checkpoint metadata.
Returns
integer FTI_SCES if successful.

This function detects all the erasures for L1. If there is at least one, L1 is considered unrecoverable.
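
The all-or-nothing rule above can be illustrated with a minimal sketch. The helper names and the layout of the erasure array (one flag per process, non-zero meaning the file is missing) are assumptions made for this example and are not part of the FTI API.

```c
#include <assert.h>
#include <stddef.h>

/* Count the flagged (non-zero) entries of an erasure array.
 * Assumed layout: erased[i] != 0 means the L1 file of process i is missing. */
int count_l1_erasures(const int *erased, size_t n)
{
    assert(erased != NULL || n == 0);
    int missing = 0;
    for (size_t i = 0; i < n; i++) {
        if (erased[i]) {
            missing++;
        }
    }
    return missing;
}

/* L1 rule from the text: a single erasure makes the level unrecoverable.
 * Returns 1 if recoverable, 0 otherwise. */
int l1_is_recoverable(const int *erased, size_t n)
{
    return count_l1_erasures(erased, n) == 0;
}
```

With four processes and one missing file, l1_is_recoverable returns 0, so a caller would have to fall back to another checkpoint level.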

int FTI_RecoverL2 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)

It recovers L2 ckpt. files using the partner copy.

Parameters
FTI_Conf  Configuration metadata.
FTI_Exec  Execution metadata.
FTI_Topo  Topology metadata.
FTI_Ckpt  Checkpoint metadata.
Returns
integer FTI_SCES if successful.

This function tries to recover the missing L2 ckpt. files using the partner copy. If a ckpt. file and its copy are both missing, then we consider this checkpoint unavailable.
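
The availability condition for L2 can be sketched as follows. The two input arrays and the function name are hypothetical; the sketch only models the rule stated above and does not reproduce how FTI stores or exchanges erasure information.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every process can restore its L2 checkpoint, 0 otherwise.
 * own_missing[i]  != 0: the local ckpt. file of process i is missing.
 * copy_missing[i] != 0: the partner copy of that file is missing.
 * Both arrays are hypothetical inputs for this sketch. */
int l2_is_recoverable(const int *own_missing, const int *copy_missing, size_t n)
{
    assert(own_missing != NULL && copy_missing != NULL);
    for (size_t i = 0; i < n; i++) {
        if (own_missing[i] && copy_missing[i]) {
            return 0; /* the file and its copy are both gone */
        }
    }
    return 1;
}
```

A lost file whose partner copy survives, or a lost copy whose original survives, leaves the checkpoint available; only a double loss for the same process makes it unavailable.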

int FTI_RecoverL3 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)

It recovers L3 ckpt. files by invoking the RS decoding algorithm.

Parameters
FTI_Conf  Configuration metadata.
FTI_Exec  Execution metadata.
FTI_Topo  Topology metadata.
FTI_Ckpt  Checkpoint metadata.
Returns
integer FTI_SCES if successful.

This function tries to recover the missing L3 ckpt. files using RS decoding. If too many files are missing in the group, then we consider this checkpoint unavailable.
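
What "too many" means can be illustrated with the general property of Reed-Solomon codes: a group of k data files protected by m encoded files can be rebuilt as long as at most m of the k + m files are erased. The sketch below applies that property; the function name, the array layout (k ckpt. files followed by m encoded files) and the choice of k and m are assumptions, not values taken from FTI.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if a group can be rebuilt by RS decoding, 0 otherwise.
 * Assumed layout: erased[0..k-1] flag the ckpt. files and
 * erased[k..k+m-1] flag the encoded files; non-zero means missing.
 * An RS code with m encoded files tolerates at most m erasures. */
int rs_group_is_recoverable(const int *erased, size_t k, size_t m)
{
    assert(erased != NULL);
    size_t missing = 0;
    for (size_t i = 0; i < k + m; i++) {
        if (erased[i]) {
            missing++;
        }
    }
    return missing <= m;
}
```

For example, with k = 4 and m = 4, four erasures anywhere in the group are still decodable, while a fifth makes the checkpoint unavailable.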

int FTI_RecoverL4 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)

It recovers L4 ckpt. files from the PFS.

Parameters
FTI_Conf  Configuration metadata.
FTI_Exec  Execution metadata.
FTI_Topo  Topology metadata.
FTI_Ckpt  Checkpoint metadata.
Returns
integer FTI_SCES if successful.

This function tries to recover the ckpt. files using the L4 ckpt. files stored in the PFS. If at least one ckpt. file is missing in the PFS, we consider this checkpoint unavailable.

int FTI_RecoverL4Mpi (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)

It recovers L4 ckpt. files from the PFS using MPI-I/O.

Parameters
FTI_Conf  Configuration metadata.
FTI_Exec  Execution metadata.
FTI_Topo  Topology metadata.
FTI_Ckpt  Checkpoint metadata.
Returns
integer FTI_SCES if successful.

This function tries to recover the ckpt. files using the L4 ckpt. files stored in the PFS. If at least one ckpt. file is missing in the PFS, we consider this checkpoint unavailable.

int FTI_RecoverL4Posix (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)

It recovers L4 ckpt. files from the PFS using POSIX.

Parameters
FTI_Conf  Configuration metadata.
FTI_Exec  Execution metadata.
FTI_Topo  Topology metadata.
FTI_Ckpt  Checkpoint metadata.
Returns
integer FTI_SCES if successful.

This function tries to recover the ckpt. files using the L4 ckpt. files stored in the PFS. If at least one ckpt. file is missing in the PFS, we consider this checkpoint unavailable.
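
A presence check of this kind could be written with the POSIX access() call, as in the sketch below. The function name and the idea of passing an explicit list of paths are assumptions for illustration; the sketch does not show how FTI locates its files on the PFS.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Returns the number of paths that the calling process cannot read.
 * A result of 0 corresponds to "all ckpt. files are present";
 * any other value means the checkpoint is treated as unavailable. */
int count_unreadable_files(const char *const *paths, size_t n)
{
    assert(paths != NULL || n == 0);
    int missing = 0;
    for (size_t i = 0; i < n; i++) {
        if (access(paths[i], R_OK) != 0) {
            missing++; /* absent or not readable */
        }
    }
    return missing;
}
```

Note that access() only reports whether a file can be opened for reading at the time of the call; it does not verify that the content is complete or uncorrupted.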

int FTI_RecoverL4Sionlib (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)

It recovers L4 ckpt. files from the PFS using SIONlib.

Parameters
FTI_Conf  Configuration metadata.
FTI_Exec  Execution metadata.
FTI_Topo  Topology metadata.
FTI_Ckpt  Checkpoint metadata.
Returns
integer FTI_SCES if successful.

This function tries to recover the ckpt. files using the L4 ckpt. files stored in the PFS. If at least one ckpt. file is missing in the PFS, we consider this checkpoint unavailable.

int FTI_RecvCkptFileL2 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_checkpoint *FTI_Ckpt, int source, int ptner)

It receives a checkpoint file.

Parameters
FTI_Conf  Configuration metadata.
FTI_Exec  Execution metadata.
FTI_Ckpt  Checkpoint metadata.
source    Source group rank.
ptner     0 if receiving Ckpt, 1 if PtnerCkpt.
Returns
integer FTI_SCES if successful.

This function receives a Ckpt or PtnerCkpt file from the partner process.

int FTI_SendCkptFileL2 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_checkpoint *FTI_Ckpt, int destination, int ptner)

It sends a checkpoint file.

Parameters
FTI_Conf     Configuration metadata.
FTI_Exec     Execution metadata.
FTI_Ckpt     Checkpoint metadata.
destination  Destination group rank.
ptner        0 if sending Ckpt, 1 if PtnerCkpt.
Returns
integer FTI_SCES if successful.

This function sends a Ckpt or PtnerCkpt file to the partner process.
