Fault Tolerance Interface
Metadata functions for the FTI library.
#include "interface.h"
Functions
int FTI_GetChecksums (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt, char *checksum, char *ptnerChecksum, char *rsChecksum)
    It gets the checksums from metadata.
int FTI_WriteRSedChecksum (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt, int rank, char *checksum)
    It writes the RSed file checksum to metadata.
int FTI_LoadTmpMeta (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
    It gets the temporary metadata.
int FTI_LoadMeta (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
    It gets the metadata to recover the data after a failure.
int FTI_LoadL4CkptMetaData (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
    Loads relevant data from level 4 (L4) checkpoint metadata.
int FTI_LoadCkptMetaData (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
    Loads relevant data from checkpoint metadata.
int FTI_WriteCkptMetaData (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
    Creates or updates checkpoint metadata.
int FTI_WriteMetadata (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, long *fs, long mfs, char *fnl, char *checksums, int *allVarIDs, long *allVarSizes)
    It writes the metadata to recover the data after a failure.
int FTI_CreateMetadata (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt, FTIT_dataset *FTI_Data)
    It writes the metadata to recover the data after a failure.
Metadata functions for the FTI library.
Copyright (c) 2017 Leonardo A. Bautista-Gomez. All rights reserved.
FTI - A multi-level checkpointing library for C/C++/Fortran applications
Revision 1.0: Fault Tolerance Interface (FTI)
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
int FTI_CreateMetadata(FTIT_configuration *FTI_Conf,
                       FTIT_execution *FTI_Exec,
                       FTIT_topology *FTI_Topo,
                       FTIT_checkpoint *FTI_Ckpt,
                       FTIT_dataset *FTI_Data)
It writes the metadata to recover the data after a failure.
Parameters:
  FTI_Conf  Configuration metadata.
  FTI_Exec  Execution metadata.
  FTI_Topo  Topology metadata.
  FTI_Ckpt  Checkpoint metadata.
  FTI_Data  Dataset metadata.
This function gathers information about the checkpoint files in the group (names and sizes) and creates the metadata file used for recovery in case of failure.
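As a rough illustration of the gathering step, the sketch below packs the checkpoint file names of all group members into one flat buffer and computes the largest file size in the group. It assumes the names are stored in fixed-width slots, as the flat `char *fnl` parameter of FTI_WriteMetadata suggests; the slot width `NAME_SLOT` and the helper `pack_group_info` are hypothetical, and the MPI communication that actually collects names and sizes from each process is omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed width of one name slot in the flat name list;
 * the buffer size FTI actually uses may differ. */
#define NAME_SLOT 256

/* Packs the checkpoint file names of all group members into one flat
 * buffer (slot i starts at fnl + i * NAME_SLOT) and returns the largest
 * file size in the group, i.e. a candidate value for `mfs`.
 * `fnl` must hold at least groupSize * NAME_SLOT bytes. */
long pack_group_info(int groupSize, const char *const names[],
                     const long sizes[], char *fnl)
{
    long mfs = 0;

    assert(groupSize > 0);
    /* Zero the whole buffer so the unused bytes of every slot are defined. */
    memset(fnl, 0, (size_t)groupSize * NAME_SLOT);
    for (int i = 0; i < groupSize; i++) {
        /* snprintf truncates overly long names and always NUL-terminates. */
        snprintf(fnl + (size_t)i * NAME_SLOT, NAME_SLOT, "%s", names[i]);
        if (sizes[i] > mfs) {
            mfs = sizes[i];
        }
    }
    return mfs;
}
```

Fixed-width slots let the whole list travel as a single contiguous buffer, which is convenient when it has to be gathered from several processes.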
int FTI_GetChecksums(FTIT_configuration *FTI_Conf,
                     FTIT_execution *FTI_Exec,
                     FTIT_topology *FTI_Topo,
                     FTIT_checkpoint *FTI_Ckpt,
                     char *checksum,
                     char *ptnerChecksum,
                     char *rsChecksum)
It gets the checksums from metadata.
Parameters:
  FTI_Conf       Configuration metadata.
  FTI_Exec       Execution metadata.
  FTI_Topo       Topology metadata.
  FTI_Ckpt       Checkpoint metadata.
  checksum       Pointer to the buffer that receives the checkpoint checksum.
  ptnerChecksum  Pointer to the buffer that receives the partner file checksum.
  rsChecksum     Pointer to the buffer that receives the RS file checksum.
This function reads the metadata file created during checkpointing and recovers the checkpoint checksum, the partner file checksum, and the RS (Reed-Solomon) file checksum. If there is no RS file, rsChecksum is set to an empty string (length 0).
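A minimal sketch of the lookup side, assuming a plain-text metadata file with one `key = value` entry per line. The helper `get_meta_value` and the line format are hypothetical, not FTI's actual parser; the point is the empty-string convention: a missing entry leaves the output buffer empty, so a caller can test `rsChecksum[0] == '\0'` to detect that no RS file exists.

```c
#include <assert.h>
#include <string.h>

/* Looks up `key` in a hypothetical "key = value" text block (one entry per
 * line) and copies the value into `out`, truncating it to fit. If the key is
 * absent, `out` is left as an empty string (length 0).
 * Returns 1 if the key was found, 0 otherwise. */
int get_meta_value(const char *meta, const char *key, char *out, size_t outSize)
{
    size_t keyLen = strlen(key);
    const char *line = meta;

    assert(outSize > 0);
    out[0] = '\0';
    while (line != NULL && *line != '\0') {
        const char *end = strchr(line, '\n');
        size_t lineLen = end ? (size_t)(end - line) : strlen(line);

        /* The line must start with the exact key followed by " = ". */
        if (lineLen >= keyLen + 3 && strncmp(line, key, keyLen) == 0 &&
            strncmp(line + keyLen, " = ", 3) == 0) {
            size_t valLen = lineLen - keyLen - 3;
            if (valLen >= outSize) {
                valLen = outSize - 1; /* leave room for the terminator */
            }
            memcpy(out, line + keyLen + 3, valLen);
            out[valLen] = '\0';
            return 1;
        }
        line = end ? end + 1 : NULL; /* advance to the next line */
    }
    return 0;
}
```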
int FTI_LoadCkptMetaData(FTIT_configuration *FTI_Conf,
                         FTIT_execution *FTI_Exec,
                         FTIT_topology *FTI_Topo,
                         FTIT_checkpoint *FTI_Ckpt)
Loads relevant data from checkpoint metadata.
Parameters:
  FTI_Conf  Configuration metadata.
  FTI_Exec  Execution metadata.
  FTI_Topo  Topology metadata.
  FTI_Ckpt  Checkpoint metadata.
int FTI_LoadL4CkptMetaData(FTIT_configuration *FTI_Conf,
                           FTIT_execution *FTI_Exec,
                           FTIT_topology *FTI_Topo,
                           FTIT_checkpoint *FTI_Ckpt)
Loads relevant data from level 4 (L4) checkpoint metadata.
Parameters:
  FTI_Conf  Configuration metadata.
  FTI_Exec  Execution metadata.
  FTI_Topo  Topology metadata.
  FTI_Ckpt  Checkpoint metadata.
int FTI_LoadMeta(FTIT_configuration *FTI_Conf,
                 FTIT_execution *FTI_Exec,
                 FTIT_topology *FTI_Topo,
                 FTIT_checkpoint *FTI_Ckpt)
It gets the metadata to recover the data after a failure.
Parameters:
  FTI_Conf  Configuration metadata.
  FTI_Exec  Execution metadata.
  FTI_Topo  Topology metadata.
  FTI_Ckpt  Checkpoint metadata.
This function reads the metadata file created during checkpointing and recovers the checkpoint file name, the file size, the partner file size, and the size of the largest file in the group (used for padding, if necessary, during decoding).
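To show why the size of the largest file matters, the sketch below copies one process's data into a block of length `mfs` and zero-fills the remainder, so every member of the group contributes a block of the same length to a block-wise decoder. Zero padding and the helper name `pad_to_max` are assumptions made for illustration; how FTI pads internally is not specified here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated block of `mfs` bytes whose first `fs` bytes are
 * a copy of `data` and whose remaining bytes are zero padding.
 * Returns NULL if allocation fails; the caller must free the block. */
unsigned char *pad_to_max(const unsigned char *data, long fs, long mfs)
{
    unsigned char *block;

    assert(mfs > 0 && fs >= 0 && fs <= mfs);
    block = calloc((size_t)mfs, 1); /* calloc zero-fills the whole block */
    if (block == NULL) {
        return NULL;
    }
    memcpy(block, data, (size_t)fs);
    return block;
}
```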
int FTI_LoadTmpMeta(FTIT_configuration *FTI_Conf,
                    FTIT_execution *FTI_Exec,
                    FTIT_topology *FTI_Topo,
                    FTIT_checkpoint *FTI_Ckpt)
It gets the temporary metadata.
Parameters:
  FTI_Conf  Configuration metadata.
  FTI_Exec  Execution metadata.
  FTI_Topo  Topology metadata.
  FTI_Ckpt  Checkpoint metadata.
This function reads the temporary metadata file created during checkpointing and recovers the checkpoint file name, the file size, the partner file size, and the size of the largest file in the group (used for padding, if necessary, during decoding).
int FTI_WriteCkptMetaData(FTIT_configuration *FTI_Conf,
                          FTIT_execution *FTI_Exec,
                          FTIT_topology *FTI_Topo,
                          FTIT_checkpoint *FTI_Ckpt)
Creates or updates checkpoint metadata.
Parameters:
  FTI_Conf  Configuration metadata.
  FTI_Exec  Execution metadata.
  FTI_Topo  Topology metadata.
  FTI_Ckpt  Checkpoint metadata.
Writes the checkpoint metadata to the checkpoint metadata file.
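The "creates or updates" behavior amounts to an upsert: an existing key is overwritten and a missing key is appended. The sketch below models the metadata file as a small in-memory table of key/value pairs and serializes it as `key = value` lines; `MetaTable`, `meta_set`, `meta_write`, the capacity limits, and the file format are all hypothetical and only illustrate the pattern.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 16 /* hypothetical capacity */
#define FIELD_LEN 64   /* hypothetical maximum key/value length */

/* Hypothetical in-memory view of a metadata file. */
typedef struct {
    char key[FIELD_LEN];
    char value[FIELD_LEN];
} MetaEntry;

typedef struct {
    MetaEntry entries[MAX_ENTRIES];
    int count;
} MetaTable;

/* Updates `key` if it already exists, otherwise creates it.
 * Returns 0 on success, -1 if the table is full. */
int meta_set(MetaTable *t, const char *key, const char *value)
{
    assert(t != NULL && key != NULL && value != NULL);
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].key, key) == 0) {
            snprintf(t->entries[i].value, FIELD_LEN, "%s", value); /* update */
            return 0;
        }
    }
    if (t->count == MAX_ENTRIES) {
        return -1;
    }
    snprintf(t->entries[t->count].key, FIELD_LEN, "%s", key); /* create */
    snprintf(t->entries[t->count].value, FIELD_LEN, "%s", value);
    t->count++;
    return 0;
}

/* Writes every entry as a "key = value" line. */
void meta_write(const MetaTable *t, FILE *out)
{
    for (int i = 0; i < t->count; i++) {
        fprintf(out, "%s = %s\n", t->entries[i].key, t->entries[i].value);
    }
}
```

Setting the same key twice leaves a single entry holding the most recent value, which is what lets the same routine serve both the first checkpoint and later ones.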
int FTI_WriteMetadata(FTIT_configuration *FTI_Conf,
                      FTIT_execution *FTI_Exec,
                      FTIT_topology *FTI_Topo,
                      long *fs,
                      long mfs,
                      char *fnl,
                      char *checksums,
                      int *allVarIDs,
                      long *allVarSizes)
It writes the metadata to recover the data after a failure.
Parameters:
  FTI_Conf     Configuration metadata.
  FTI_Exec     Execution metadata.
  FTI_Topo     Topology metadata.
  fs           Pointer to the list of checkpoint file sizes.
  mfs          The maximum checkpoint file size.
  fnl          Pointer to the list of checkpoint file names.
  checksums    Checksums array.
  allVarIDs    IDs of the variables from all processes in the group.
  allVarSizes  Sizes of the variables from all processes in the group.
This function should be executed by only one process per group. It writes the metadata file used for recovery in case of failure.
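The sketch below illustrates the one-writer-per-group rule together with the inputs FTI_WriteMetadata receives (fs, mfs, fnl, checksums): only the process with group rank 0 formats one section per group member. It assumes fnl and checksums are flat arrays of fixed-width strings indexed by group rank; the slot widths, the section layout, the key names, and the helper `format_group_metadata` are illustrative assumptions rather than FTI's confirmed file format. Writing to a caller-supplied buffer instead of a file keeps the example self-contained.

```c
#include <assert.h>
#include <stdio.h>

#define NAME_SLOT 256 /* hypothetical width of one entry in fnl */
#define SUM_SLOT 33   /* hypothetical width of one entry in checksums */

/* Formats one "[i]" section per group member into `out`.
 * Only the group head (groupRank 0) writes; other ranks leave `out` empty
 * and return 0. Returns the number of characters written, or -1 if `out`
 * is too small to hold the full result. */
int format_group_metadata(int groupRank, int groupSize, const long *fs,
                          long mfs, const char *fnl, const char *checksums,
                          char *out, size_t outSize)
{
    size_t used = 0;

    assert(outSize > 0);
    out[0] = '\0';
    if (groupRank != 0) {
        return 0; /* not the group head: nothing to write */
    }
    for (int i = 0; i < groupSize; i++) {
        int n = snprintf(out + used, outSize - used,
                         "[%d]\nCkpt_file_name = %s\nCkpt_file_size = %ld\n"
                         "Ckpt_file_maxs = %ld\nCkpt_checksum = %s\n",
                         i, fnl + (size_t)i * NAME_SLOT, fs[i], mfs,
                         checksums + (size_t)i * SUM_SLOT);
        if (n < 0 || (size_t)n >= outSize - used) {
            return -1; /* output would be truncated */
        }
        used += (size_t)n;
    }
    return (int)used;
}
```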
int FTI_WriteRSedChecksum(FTIT_configuration *FTI_Conf,
                          FTIT_execution *FTI_Exec,
                          FTIT_topology *FTI_Topo,
                          FTIT_checkpoint *FTI_Ckpt,
                          int rank,
                          char *checksum)
It writes the RSed file checksum to metadata.
Parameters:
  FTI_Conf  Configuration metadata.
  FTI_Exec  Execution metadata.
  FTI_Topo  Topology metadata.
  FTI_Ckpt  Checkpoint metadata.
  rank      Global rank of the process.
  checksum  Pointer to the checksum.
This function should be executed by only one process per group. It writes the checksum of the RS-encoded (RSed) file to the metadata file.