Fault Tolerance Interface
Checkpointing functions for the FTI library.
Macros
#define _POSIX_C_SOURCE 200809L
Functions
int FTI_UpdateIterTime (FTIT_execution *FTI_Exec)
It updates the local and global mean iteration time.
int FTI_WriteCkpt (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt, FTIT_dataset *FTI_Data)
It writes the checkpoint data in the target file.
int FTI_PostCkpt (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
Decides which action to start depending on the checkpoint level.
int FTI_Listen (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
It listens for checkpoint notifications.
int FTI_HandleCkptRequest (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
Handles checkpoint requests from application ranks (if head).
int FTI_WritePosix (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt, FTIT_dataset *FTI_Data)
Writes the checkpoint to the PFS using POSIX.
int FTI_WriteMPI (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_dataset *FTI_Data)
Writes the checkpoint to the PFS using MPI I/O.
int FTI_WriteSionlib (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_dataset *FTI_Data)
Writes the checkpoint to the PFS using SIONlib.
Checkpointing functions for the FTI library.
Copyright (c) 2017 Leonardo A. Bautista-Gomez All rights reserved
FTI - A multi-level checkpointing library for C/C++/Fortran applications
Revision 1.0 : Fault Tolerance Interface (FTI)
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#define _POSIX_C_SOURCE 200809L
int FTI_HandleCkptRequest (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
Handles checkpoint requests from application ranks (if head).
FTI_Conf | Configuration metadata.
FTI_Exec | Execution metadata.
FTI_Topo | Topology metadata.
FTI_Ckpt | Checkpoint metadata.
int FTI_Listen (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
It listens for checkpoint notifications.
FTI_Conf | Configuration metadata.
FTI_Exec | Execution metadata.
FTI_Topo | Topology metadata.
FTI_Ckpt | Checkpoint metadata.
int FTI_PostCkpt (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
Decides which action to start depending on the checkpoint level.
FTI_Conf | Configuration metadata.
FTI_Exec | Execution metadata.
FTI_Topo | Topology metadata.
FTI_Ckpt | Checkpoint metadata.
This function launches the required action depending on the checkpoint level. It does so for each group (application process in the node) if executed by the head, or only locally if executed by an application process. The parameter pr determines whether the for loops run for a single iteration or for as many iterations as there are application processes. The group parameter helps determine the groupID in both cases.
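The head-versus-application-process loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not FTI's implementation: the names post_ckpt, level_action, and note_group are hypothetical, the number of levels is passed in rather than taken from FTI, and the mapping from loop index to group ID is a guess made only so the example is complete.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-level action: receives a group ID, returns 0 on success. */
typedef int (*level_action)(int group_id);

/* Example action that only records how it was called. */
static int calls_made = 0;
static int last_group = -1;

static int note_group(int group_id)
{
    calls_made += 1;
    last_group = group_id;
    return 0;
}

/* Runs the action for `level` once per application process when called by
 * the head, or exactly once when called by an application process. */
static int post_ckpt(int level, int is_head, int app_procs, int own_group,
                     const level_action *actions, int num_levels)
{
    assert(actions != NULL && level >= 1 && level <= num_levels);
    assert(app_procs >= 1);

    /* pr is the loop bound: number of app processes for the head, else 1. */
    int pr = is_head ? app_procs : 1;

    for (int i = 0; i < pr; i++) {
        /* Assumed mapping: the head derives the group ID from the loop
         * index, an application process uses its own group ID. */
        int group_id = is_head ? i + 1 : own_group;
        if (actions[level - 1](group_id) != 0)
            return -1;
    }
    return 0;
}
```

Called with is_head set and four application processes, the selected action runs four times; called by an application process, it runs once with that process's own group ID.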
int FTI_UpdateIterTime (FTIT_execution *FTI_Exec)
It updates the local and global mean iteration time.
FTI_Exec | Execution metadata.
This function updates the local and global mean iteration time. It also recomputes the checkpoint interval in iterations and corrects the next checkpointing iteration based on the observed mean iteration duration.
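The bookkeeping described above can be illustrated with a short sketch. It assumes a checkpoint interval given in seconds and a running mean of the iteration duration; the struct iter_stats, its fields, and update_iter_time are hypothetical names that do not mirror FTIT_execution. Only the local mean is shown; the global mean would presumably require a reduction across processes, which is left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping; field names are illustrative assumptions. */
typedef struct {
    double        mean_iter_sec; /* running mean of iteration duration  */
    unsigned long iters_seen;    /* iterations measured so far          */
    unsigned long ckpt_every;    /* checkpoint interval, in iterations  */
    unsigned long last_ckpt;     /* iteration of the last checkpoint    */
    unsigned long next_ckpt;     /* iteration of the next checkpoint    */
} iter_stats;

/* Folds one measured iteration into the mean, then recomputes the
 * interval in iterations and the next checkpointing iteration. */
static void update_iter_time(iter_stats *s, double last_iter_sec,
                             double ckpt_interval_sec, unsigned long cur_iter)
{
    assert(s != NULL && last_iter_sec > 0.0 && ckpt_interval_sec > 0.0);

    /* Incremental mean: no need to store every sample. */
    s->iters_seen += 1;
    s->mean_iter_sec +=
        (last_iter_sec - s->mean_iter_sec) / (double)s->iters_seen;

    /* Interval in iterations = interval in seconds / mean iteration time. */
    unsigned long every =
        (unsigned long)(ckpt_interval_sec / s->mean_iter_sec);
    if (every < 1)
        every = 1;
    s->ckpt_every = every;

    /* Correct the next checkpoint; never schedule it in the past. */
    s->next_ckpt = s->last_ckpt + every;
    if (s->next_ckpt <= cur_iter)
        s->next_ckpt = cur_iter + 1;
}
```

With a 10-second interval, a first iteration of 2 seconds gives an interval of 5 iterations; a second iteration of 3 seconds raises the mean to 2.5 seconds and shortens the interval to 4 iterations.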
int FTI_WriteCkpt (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt, FTIT_dataset *FTI_Data)
It writes the checkpoint data in the target file.
FTI_Conf | Configuration metadata.
FTI_Exec | Execution metadata.
FTI_Topo | Topology metadata.
FTI_Ckpt | Checkpoint metadata.
FTI_Data | Dataset metadata.
This function checks whether the checkpoint needs to be local or remote, opens the target file, and writes the checkpoint data dataset by dataset. Finally, it flushes and closes the checkpoint file.
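The sequence above (choose the target, open, write each dataset, flush, close) might look roughly like the following sketch using standard C I/O. The dataset struct, write_ckpt, and the directory parameters are hypothetical stand-ins, not FTI types or signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical protected-data descriptor: a pointer and a size in bytes. */
typedef struct {
    const void *ptr;
    size_t      size;
} dataset;

/* Writes `count` datasets back to back into <dir>/<name>, where <dir> is
 * the remote or the local directory. Returns 0 on success, -1 on error. */
static int write_ckpt(const char *local_dir, const char *remote_dir,
                      int is_remote, const char *name,
                      const dataset *data, size_t count)
{
    assert(local_dir != NULL && remote_dir != NULL && name != NULL);
    assert(data != NULL || count == 0);

    /* Step 1: decide between the local and the remote target. */
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s",
                     is_remote ? remote_dir : local_dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    /* Step 2: open the target file. */
    FILE *fd = fopen(path, "wb");
    if (fd == NULL)
        return -1;

    /* Step 3: write dataset by dataset. */
    for (size_t i = 0; i < count; i++) {
        if (data[i].size > 0 &&
            fwrite(data[i].ptr, 1, data[i].size, fd) != data[i].size) {
            fclose(fd);
            return -1;
        }
    }

    /* Step 4: flush, then close. */
    if (fflush(fd) != 0) {
        fclose(fd);
        return -1;
    }
    return fclose(fd) == 0 ? 0 : -1;
}
```

Checking the return value of every write and of the final close matters here, since a checkpoint that was only partially written is of no use at recovery time.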
int FTI_WriteMPI (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_dataset *FTI_Data)
Writes the checkpoint to the PFS using MPI I/O.
FTI_Conf | Configuration metadata.
FTI_Exec | Execution metadata.
FTI_Topo | Topology metadata.
FTI_Data | Dataset metadata.
This function takes into account that in MPI I/O the count parameter of both MPI_Type_contiguous and MPI_File_write_at is an integer type. The checkpoint data is therefore split into chunks of at most (MAX_INT-1)/2 elements to form contiguous data types. Experience showed that larger sizes may lead to problems.
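The chunking rule above can be sketched without MPI by isolating the splitting logic. The sketch assumes that MAX_INT refers to the largest value of int and uses INT_MAX from <limits.h> for it; write_in_chunks, chunk_writer, tally, and tally_chunk are hypothetical names. In real code the callback would presumably wrap the MPI_Type_contiguous and MPI_File_write_at calls for one chunk.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Largest chunk, in elements, following the (MAX_INT-1)/2 rule. */
#define MAX_CHUNK_ELEMS ((size_t)((INT_MAX - 1) / 2))

/* Hypothetical callback that writes `count` elements starting at element
 * `offset`; returns 0 on success. */
typedef int (*chunk_writer)(size_t offset, int count, void *ctx);

/* Example callback that only tallies what it is given. */
typedef struct {
    size_t chunks;
    size_t elems;
} tally;

static int tally_chunk(size_t offset, int count, void *ctx)
{
    tally *t = ctx;
    (void)offset;
    t->chunks += 1;
    t->elems += (size_t)count;
    return 0;
}

/* Splits `total` elements into chunks of at most `max_chunk` elements so
 * that every count passed to the callback fits in an int. */
static int write_in_chunks(size_t total, size_t max_chunk,
                           chunk_writer write_chunk, void *ctx)
{
    assert(write_chunk != NULL);
    assert(max_chunk > 0 && max_chunk <= MAX_CHUNK_ELEMS);

    size_t offset = 0;
    while (offset < total) {
        size_t remaining = total - offset;
        int count = (int)(remaining < max_chunk ? remaining : max_chunk);
        if (write_chunk(offset, count, ctx) != 0)
            return -1;
        offset += (size_t)count;
    }
    return 0;
}
```

Passing max_chunk as a parameter keeps the splitting logic easy to exercise with small numbers; production code would pass MAX_CHUNK_ELEMS.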
int FTI_WritePosix (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt, FTIT_dataset *FTI_Data)
Writes the checkpoint to the PFS using POSIX.
FTI_Conf | Configuration metadata.
FTI_Exec | Execution metadata.
FTI_Topo | Topology metadata.
FTI_Ckpt | Checkpoint metadata.
FTI_Data | Dataset metadata.
int FTI_WriteSionlib (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_dataset *FTI_Data)
Writes the checkpoint to the PFS using SIONlib.
FTI_Conf | Configuration metadata.
FTI_Exec | Execution metadata.
FTI_Topo | Topology metadata.
FTI_Data | Dataset metadata.