Fault Tolerance Interface
Post recovery functions for the FTI library.
#include "interface.h"
Functions
int FTI_Decode (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt, int *erased)
It recovers a set of ckpt. files using Reed-Solomon (RS) decoding.
int FTI_RecoverL1 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
It checks that all L1 ckpt. files are present.
int FTI_SendCkptFileL2 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_checkpoint *FTI_Ckpt, int destination, int ptner)
It sends a checkpoint file.
int FTI_RecvCkptFileL2 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_checkpoint *FTI_Ckpt, int source, int ptner)
It receives a checkpoint file.
int FTI_RecoverL2 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
It recovers L2 ckpt. files using the partner copy.
int FTI_RecoverL3 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
It recovers L3 ckpt. files by invoking the RS decoding algorithm.
int FTI_RecoverL4 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
It recovers L4 ckpt. files from the parallel file system (PFS).
int FTI_RecoverL4Posix (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
It recovers L4 ckpt. files from the PFS using POSIX.
int FTI_RecoverL4Mpi (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
It recovers L4 ckpt. files from the PFS using MPI-I/O.
int FTI_RecoverL4Sionlib (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
It recovers L4 ckpt. files from the PFS using SIONlib.
Copyright (c) 2017 Leonardo A. Bautista-Gomez All rights reserved
FTI - A multi-level checkpointing library for C/C++/Fortran applications
Revision 1.0 : Fault Tolerance Interface (FTI)
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
int FTI_Decode (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt, int *erased)
It recovers a set of ckpt. files using Reed-Solomon (RS) decoding.
Parameters:
FTI_Conf: Configuration metadata.
FTI_Exec: Execution metadata.
FTI_Topo: Topology metadata.
FTI_Ckpt: Checkpoint metadata.
erased: The array of erasures (missing files).
This function tries to recover the missing L3 ckpt. files using RS decoding.
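A full RS decoder over a Galois field does not fit in a short example. As a hedged illustration of the same erasure-recovery idea, the sketch below swaps in single XOR parity, the simplest erasure code, which can rebuild only one missing block per group (RS is used because it tolerates more). The function `xor_recover` and its buffer layout are hypothetical; `erased` plays the same role as the `erased` argument above (1 marks a missing block), but FTI's actual array layout is not assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, NOT FTI's RS decoder: rebuild at most one missing
 * block from k data blocks plus one XOR parity block.
 * blocks[0..k-1] are data blocks, blocks[k] is the parity block.
 * erased[i] == 1 marks block i as missing; its buffer is overwritten.
 * Returns 0 on success, -1 if more than one block is missing. */
int xor_recover(unsigned char *blocks[], const int *erased, int k, size_t len)
{
    assert(blocks != NULL && erased != NULL && k > 0);
    int missing = -1;
    for (int i = 0; i <= k; i++) {
        if (erased[i]) {
            if (missing != -1)
                return -1;          /* XOR parity tolerates one erasure only */
            missing = i;
        }
    }
    if (missing == -1)
        return 0;                   /* nothing to rebuild */

    /* Data or parity alike: the missing block is the XOR of all survivors. */
    memset(blocks[missing], 0, len);
    for (int i = 0; i <= k; i++) {
        if (i == missing)
            continue;
        for (size_t b = 0; b < len; b++)
            blocks[missing][b] ^= blocks[i][b];
    }
    return 0;
}
```

An RS code generalizes this: with m encoded blocks instead of one parity block, any m erasures can be rebuilt, at the cost of Galois-field arithmetic and a matrix inversion during decoding.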
int FTI_RecoverL1 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
It checks that all L1 ckpt. files are present.
Parameters:
FTI_Conf: Configuration metadata.
FTI_Exec: Execution metadata.
FTI_Topo: Topology metadata.
FTI_Ckpt: Checkpoint metadata.
This function detects all the erasures (missing ckpt. files) for L1. If there is at least one, L1 is not considered recoverable.
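A minimal sketch of a presence check, assuming an erasure simply means that a file cannot be found. The function `count_l1_erasures` and its arguments are hypothetical, and FTI may verify more than existence (for example file size), which is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical sketch: mark each path as erased when stat() fails or the
 * path is not a regular file, report it on stderr, and return the count. */
int count_l1_erasures(const char *const paths[], int n, int *erased)
{
    assert(n >= 0 && (n == 0 || (paths != NULL && erased != NULL)));
    int count = 0;
    for (int i = 0; i < n; i++) {
        struct stat st;
        erased[i] = (stat(paths[i], &st) != 0 || !S_ISREG(st.st_mode));
        if (erased[i])
            fprintf(stderr, "L1 erasure: %s\n", paths[i]);
        count += erased[i];
    }
    return count;
}
```

In a parallel run, each rank's count would presumably be summed across ranks (for example with `MPI_Allreduce`), and L1 treated as recoverable only when the global total is zero.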
int FTI_RecoverL2 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
It recovers L2 ckpt. files using the partner copy.
Parameters:
FTI_Conf: Configuration metadata.
FTI_Exec: Execution metadata.
FTI_Topo: Topology metadata.
FTI_Ckpt: Checkpoint metadata.
This function tries to recover the missing L2 ckpt. files using the partner copies. If both a ckpt. file and its partner copy are missing, we consider this checkpoint unavailable.
int FTI_RecoverL3 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
It recovers L3 ckpt. files by invoking the RS decoding algorithm.
Parameters:
FTI_Conf: Configuration metadata.
FTI_Exec: Execution metadata.
FTI_Topo: Topology metadata.
FTI_Ckpt: Checkpoint metadata.
This function tries to recover the missing L3 ckpt. files using RS decoding. If too many files are missing in the group, we consider this checkpoint unavailable.
int FTI_RecoverL4 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
It recovers L4 ckpt. files from the PFS.
Parameters:
FTI_Conf: Configuration metadata.
FTI_Exec: Execution metadata.
FTI_Topo: Topology metadata.
FTI_Ckpt: Checkpoint metadata.
This function tries to recover the ckpt. files using the L4 ckpt. files stored in the PFS. If at least one ckpt. file is missing in the PFS, we consider this checkpoint unavailable.
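Since this page lists POSIX, MPI-I/O and SIONlib variants next to FTI_RecoverL4, a plausible reading is that the generic function selects one of them from the configured I/O mode. The sketch below shows such a dispatch under that assumption; the enum `io_mode_t`, its constants, and the `recover_l4*` functions are hypothetical stand-ins, not FTI's constants or functions.

```c
#include <assert.h>

/* Hypothetical I/O modes; FTI's actual configuration constants may differ. */
typedef enum { IO_POSIX, IO_MPIIO, IO_SIONLIB } io_mode_t;

/* Stand-in backends. Real ones would read the L4 files from the PFS; these
 * only return distinct codes so the selected path is observable. */
static int recover_l4_posix(void)   { return 1; }
static int recover_l4_mpiio(void)   { return 2; }
static int recover_l4_sionlib(void) { return 3; }

/* Dispatch to the backend matching the configured I/O mode. Omitting a
 * default case lets compilers warn (-Wswitch) if a mode is left unhandled. */
int recover_l4(io_mode_t mode)
{
    switch (mode) {
    case IO_POSIX:   return recover_l4_posix();
    case IO_MPIIO:   return recover_l4_mpiio();
    case IO_SIONLIB: return recover_l4_sionlib();
    }
    assert(0 && "unknown L4 I/O mode");  /* debug builds stop here */
    return -1;                           /* release builds report failure */
}
```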
int FTI_RecoverL4Mpi (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
It recovers L4 ckpt. files from the PFS using MPI-I/O.
Parameters:
FTI_Conf: Configuration metadata.
FTI_Exec: Execution metadata.
FTI_Topo: Topology metadata.
FTI_Ckpt: Checkpoint metadata.
This function tries to recover the ckpt. files using the L4 ckpt. files stored in the PFS. If at least one ckpt. file is missing in the PFS, we consider this checkpoint unavailable.
int FTI_RecoverL4Posix (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
It recovers L4 ckpt. files from the PFS using POSIX.
Parameters:
FTI_Conf: Configuration metadata.
FTI_Exec: Execution metadata.
FTI_Topo: Topology metadata.
FTI_Ckpt: Checkpoint metadata.
This function tries to recover the ckpt. files using the L4 ckpt. files stored in the PFS. If at least one ckpt. file is missing in the PFS, we consider this checkpoint unavailable.
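A minimal sketch of the POSIX path for one file, assuming recovery amounts to copying the L4 ckpt. file from the PFS back to local storage with buffered stdio. The function `copy_ckpt_file` and any paths passed to it are hypothetical; a missing source file is reported as failure, matching the rule that a single missing file makes the checkpoint unavailable.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch: copy one ckpt. file from src_path (on the PFS) to
 * dst_path (local storage). Returns 0 on success, -1 if the source is
 * missing or any read, write, or close fails. */
int copy_ckpt_file(const char *src_path, const char *dst_path)
{
    assert(src_path != NULL && dst_path != NULL);
    FILE *src = fopen(src_path, "rb");
    if (src == NULL)
        return -1;                      /* missing in the PFS: unavailable */
    FILE *dst = fopen(dst_path, "wb");
    if (dst == NULL) {
        fclose(src);
        return -1;
    }
    char buf[64 * 1024];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, src)) > 0) {
        if (fwrite(buf, 1, n, dst) != n) {
            rc = -1;                    /* short write: disk full or I/O error */
            break;
        }
    }
    if (ferror(src))
        rc = -1;
    fclose(src);
    if (fclose(dst) != 0)               /* flush errors surface at close */
        rc = -1;
    return rc;
}
```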
int FTI_RecoverL4Sionlib (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
It recovers L4 ckpt. files from the PFS using SIONlib.
Parameters:
FTI_Conf: Configuration metadata.
FTI_Exec: Execution metadata.
FTI_Topo: Topology metadata.
FTI_Ckpt: Checkpoint metadata.
This function tries to recover the ckpt. files using the L4 ckpt. files stored in the PFS. If at least one ckpt. file is missing in the PFS, we consider this checkpoint unavailable.
int FTI_RecvCkptFileL2 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_checkpoint *FTI_Ckpt, int source, int ptner)
It receives a checkpoint file.
Parameters:
FTI_Conf: Configuration metadata.
FTI_Exec: Execution metadata.
FTI_Ckpt: Checkpoint metadata.
source: Source group rank.
ptner: 0 if receiving Ckpt, 1 if PtnerCkpt.
This function receives a Ckpt or PtnerCkpt file from the partner process.
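The `ptner` flag decides which local file the incoming data belongs to. The sketch below shows one way such a flag could select the target file name; the function `l2_target_name` and both name patterns are hypothetical and are not FTI's actual file names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch: ptner == 0 selects the member's own ckpt. file,
 * ptner == 1 selects the partner copy. Name patterns are illustrative only.
 * Returns 0 on success, -1 if the name does not fit in buf. */
int l2_target_name(char *buf, size_t len, int ckpt_id, int rank, int ptner)
{
    assert(buf != NULL && (ptner == 0 || ptner == 1));
    int n;
    if (ptner == 0)
        n = snprintf(buf, len, "Ckpt%d-rank%d.fti", ckpt_id, rank);
    else
        n = snprintf(buf, len, "Ckpt%d-partner%d.fti", ckpt_id, rank);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;   /* -1 on truncation */
}
```

Checking the `snprintf` return value guards against silently truncated paths, which would otherwise make the received file land under the wrong name.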
int FTI_SendCkptFileL2 (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_checkpoint *FTI_Ckpt, int destination, int ptner)
It sends a checkpoint file.
Parameters:
FTI_Conf: Configuration metadata.
FTI_Exec: Execution metadata.
FTI_Ckpt: Checkpoint metadata.
destination: Destination group rank.
ptner: 0 if sending Ckpt, 1 if PtnerCkpt.
This function sends a Ckpt or PtnerCkpt file to the partner process.
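Files larger than a single message are usually streamed in fixed-size chunks, one send per chunk; this sketch assumes such a scheme with a hypothetical transfer size `block` (not an FTI parameter). The helpers `chunk_count` and `chunk_size` are hypothetical and compute how many messages a file needs and how large each one is.

```c
#include <assert.h>

/* Hypothetical sketch: number of fixed-size chunks needed for a file. */
long long chunk_count(long long file_size, long long block)
{
    assert(file_size >= 0 && block > 0);
    return (file_size + block - 1) / block;     /* ceiling division */
}

/* Size in bytes of chunk i (0-based); only the last chunk may be short. */
long long chunk_size(long long file_size, long long block, long long i)
{
    assert(i >= 0 && i < chunk_count(file_size, block));
    long long remaining = file_size - i * block;
    return remaining < block ? remaining : block;
}
```

Sender and receiver must agree on the chunk count, so the receiver would either be told the file size in advance or inspect each message's length (for example with `MPI_Get_count`).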