Fault Tolerance Interface
meta.c File Reference

Metadata functions for the FTI library.

#include "interface.h"

Functions

int FTI_GetChecksums (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt, char *checksum, char *ptnerChecksum, char *rsChecksum)
 It gets the checksums from metadata.
 
int FTI_WriteRSedChecksum (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt, int rank, char *checksum)
 It writes the RSed file checksum to metadata.
 
int FTI_LoadTmpMeta (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
 It gets the temporary metadata.
 
int FTI_LoadMeta (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
 It gets the metadata to recover the data after a failure.
 
int FTI_LoadL4CkptMetaData (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
 Loads relevant data from checkpoint metadata.
 
int FTI_LoadCkptMetaData (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
 Loads relevant data from checkpoint metadata.
 
int FTI_WriteCkptMetaData (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt)
 Creates or updates checkpoint metadata.
 
int FTI_WriteMetadata (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, long *fs, long mfs, char *fnl, char *checksums, int *allVarIDs, long *allVarSizes)
 It writes the metadata to recover the data after a failure.
 
int FTI_CreateMetadata (FTIT_configuration *FTI_Conf, FTIT_execution *FTI_Exec, FTIT_topology *FTI_Topo, FTIT_checkpoint *FTI_Ckpt, FTIT_dataset *FTI_Data)
 It writes the metadata to recover the data after a failure.
 

Detailed Description

Metadata functions for the FTI library.

Copyright (c) 2017 Leonardo A. Bautista-Gomez All rights reserved

FTI - A multi-level checkpointing library for C/C++/Fortran applications

Revision 1.0 : Fault Tolerance Interface (FTI)

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Date
October, 2017

Function Documentation

int FTI_CreateMetadata ( FTIT_configuration *  FTI_Conf,
FTIT_execution *  FTI_Exec,
FTIT_topology *  FTI_Topo,
FTIT_checkpoint *  FTI_Ckpt,
FTIT_dataset *  FTI_Data
)

It writes the metadata to recover the data after a failure.

Parameters
FTI_Conf  Configuration metadata.
FTI_Exec  Execution metadata.
FTI_Topo  Topology metadata.
FTI_Ckpt  Checkpoint metadata.
FTI_Data  Dataset metadata.
Returns
integer FTI_SCES if successful.

This function gathers information about the checkpoint files in the group (names and sizes) and creates the metadata file used to recover in case of failure.


int FTI_GetChecksums ( FTIT_configuration *  FTI_Conf,
FTIT_execution *  FTI_Exec,
FTIT_topology *  FTI_Topo,
FTIT_checkpoint *  FTI_Ckpt,
char *  checksum,
char *  ptnerChecksum,
char *  rsChecksum 
)

It gets the checksums from metadata.

Parameters
FTI_Conf       Configuration metadata.
FTI_Exec       Execution metadata.
FTI_Topo       Topology metadata.
FTI_Ckpt       Checkpoint metadata.
checksum       Pointer to the buffer that receives the checkpoint checksum.
ptnerChecksum  Pointer to the buffer that receives the partner file checksum.
rsChecksum     Pointer to the buffer that receives the RS file checksum.
Returns
integer FTI_SCES if successful.

This function reads the metadata file created during checkpointing and recovers the checkpoint checksum, the partner file checksum and the RS file checksum. If there is no RS file, the rsChecksum string has length 0.
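The empty-string convention for rsChecksum can be checked by the caller once the buffers have been filled. The following is a minimal sketch of that check only; has_rs_checksum is a hypothetical helper introduced for illustration and is not part of FTI.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of FTI): reports whether the rsChecksum
 * buffer holds a checksum. Per the description above, a string of
 * length 0 means there is no RS file. Returns 1 if present, 0 if not. */
int has_rs_checksum(const char *rsChecksum)
{
    assert(rsChecksum != NULL);
    return strlen(rsChecksum) > 0;
}
```

A caller would typically run this test on the rsChecksum buffer before trying to compare it against a checksum computed from an RS file.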


int FTI_LoadCkptMetaData ( FTIT_configuration *  FTI_Conf,
FTIT_execution *  FTI_Exec,
FTIT_topology *  FTI_Topo,
FTIT_checkpoint *  FTI_Ckpt
)

Loads relevant data from checkpoint metadata.

Parameters
FTI_Conf  Configuration metadata.
FTI_Exec  Execution metadata.
FTI_Topo  Topology metadata.
FTI_Ckpt  Checkpoint metadata.


int FTI_LoadL4CkptMetaData ( FTIT_configuration *  FTI_Conf,
FTIT_execution *  FTI_Exec,
FTIT_topology *  FTI_Topo,
FTIT_checkpoint *  FTI_Ckpt
)

Loads relevant data from checkpoint metadata.

Parameters
FTI_Conf  Configuration metadata.
FTI_Exec  Execution metadata.
FTI_Topo  Topology metadata.
FTI_Ckpt  Checkpoint metadata.


int FTI_LoadMeta ( FTIT_configuration *  FTI_Conf,
FTIT_execution *  FTI_Exec,
FTIT_topology *  FTI_Topo,
FTIT_checkpoint *  FTI_Ckpt
)

It gets the metadata to recover the data after a failure.

Parameters
FTI_Conf  Configuration metadata.
FTI_Exec  Execution metadata.
FTI_Topo  Topology metadata.
FTI_Ckpt  Checkpoint metadata.
Returns
integer FTI_SCES if successful.

This function reads the metadata file created during checkpointing and recovers the checkpoint file name, file size, partner file size and the size of the largest file in the group (for padding if necessary during decoding).
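The role of the largest file size in padding can be illustrated with simple arithmetic. The sketch below assumes that padding means extending a file to the size of the largest file in the group; padding_bytes is a hypothetical helper introduced for illustration, not an FTI function, and the actual padding logic in FTI may differ.

```c
#include <assert.h>

/* Hypothetical helper (not part of FTI): number of padding bytes needed
 * to bring a checkpoint file of size fs up to mfs, the size of the
 * largest file in the group. Assumes 0 <= fs <= mfs. */
long padding_bytes(long fs, long mfs)
{
    assert(fs >= 0 && mfs >= fs);
    return mfs - fs;
}
```

For example, a 1000-byte file in a group whose largest file is 1024 bytes would need 24 bytes of padding under this assumption, and the largest file itself would need none.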


int FTI_LoadTmpMeta ( FTIT_configuration *  FTI_Conf,
FTIT_execution *  FTI_Exec,
FTIT_topology *  FTI_Topo,
FTIT_checkpoint *  FTI_Ckpt
)

It gets the temporary metadata.

Parameters
FTI_Conf  Configuration metadata.
FTI_Exec  Execution metadata.
FTI_Topo  Topology metadata.
FTI_Ckpt  Checkpoint metadata.
Returns
integer FTI_SCES if successful.

This function reads the temporary metadata file created during checkpointing and recovers the checkpoint file name, file size, partner file size and the size of the largest file in the group (for padding if necessary during decoding).


int FTI_WriteCkptMetaData ( FTIT_configuration *  FTI_Conf,
FTIT_execution *  FTI_Exec,
FTIT_topology *  FTI_Topo,
FTIT_checkpoint *  FTI_Ckpt
)

Creates or updates checkpoint metadata.

Parameters
FTI_Conf  Configuration metadata.
FTI_Exec  Execution metadata.
FTI_Topo  Topology metadata.
FTI_Ckpt  Checkpoint metadata.

Writes the following checkpoint metadata to the checkpoint metadata file:

  • timestamp
  • level
  • number of processes participating in the checkpoint
  • I/O mode
  • dCP enabled/disabled
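
The fields listed above could be stored as simple key/value lines. The following is a rough sketch under stated assumptions: the demo_ckpt_meta struct, the demo_write_ckpt_meta function, the key names and the file layout are all hypothetical and chosen for illustration; they do not reproduce FTI's actual metadata format or API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record of the fields listed above; not FTI's own struct. */
typedef struct {
    long timestamp;   /* time of the checkpoint, seconds since the epoch */
    int  level;       /* checkpoint level */
    int  nprocs;      /* processes participating in the checkpoint */
    int  io_mode;     /* I/O mode identifier */
    int  dcp_enabled; /* 1 if dCP is enabled, 0 otherwise */
} demo_ckpt_meta;

/* Writes the record to path as "key = value" lines, replacing any
 * existing file. Key names are illustrative assumptions.
 * Returns 0 on success, -1 on failure. */
int demo_write_ckpt_meta(const char *path, const demo_ckpt_meta *m)
{
    assert(path != NULL && m != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        return -1;
    }
    int ok = fprintf(f,
                     "timestamp = %ld\nlevel = %d\nnprocs = %d\n"
                     "io_mode = %d\ndcp_enabled = %d\n",
                     m->timestamp, m->level, m->nprocs,
                     m->io_mode, m->dcp_enabled) > 0;
    if (fclose(f) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}
```

Opening the file in "w" mode covers both the create and the update case by rewriting the whole record, which is one simple way to keep the file consistent with the most recent checkpoint.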


int FTI_WriteMetadata ( FTIT_configuration *  FTI_Conf,
FTIT_execution *  FTI_Exec,
FTIT_topology *  FTI_Topo,
long *  fs,
long  mfs,
char *  fnl,
char *  checksums,
int *  allVarIDs,
long *  allVarSizes 
)

It writes the metadata to recover the data after a failure.

Parameters
FTI_Conf     Configuration metadata.
FTI_Exec     Execution metadata.
FTI_Topo     Topology metadata.
fs           Pointer to the list of checkpoint sizes.
mfs          The maximum checkpoint file size.
fnl          Pointer to the list of checkpoint names.
checksums    Checksums array.
allVarIDs    IDs of variables from all processes in the group.
allVarSizes  Sizes of variables from all processes in the group.
Returns
integer FTI_SCES if successful.

This function should be executed only by one process per group. It writes the metadata file used to recover in case of failure.
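
The relationship between the fs and mfs parameters can be sketched as follows, assuming mfs is simply the largest entry in the fs list. max_file_size is a hypothetical helper introduced for illustration; it is not part of FTI, and the library may derive mfs differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of FTI): returns the largest entry of fs,
 * a list of n checkpoint file sizes, i.e. a candidate value for mfs.
 * Requires n > 0. */
long max_file_size(const long *fs, int n)
{
    assert(fs != NULL && n > 0);
    long mfs = fs[0];
    for (int i = 1; i < n; i++) {
        if (fs[i] > mfs) {
            mfs = fs[i];
        }
    }
    return mfs;
}
```

Under this assumption, the single process that writes the metadata for its group would compute the maximum once over the sizes gathered from all group members.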


int FTI_WriteRSedChecksum ( FTIT_configuration *  FTI_Conf,
FTIT_execution *  FTI_Exec,
FTIT_topology *  FTI_Topo,
FTIT_checkpoint *  FTI_Ckpt,
int  rank,
char *  checksum 
)

It writes the RSed file checksum to metadata.

Parameters
FTI_Conf  Configuration metadata.
FTI_Exec  Execution metadata.
FTI_Topo  Topology metadata.
FTI_Ckpt  Checkpoint metadata.
rank      Global rank of the process.
checksum  Pointer to the checksum.
Returns
integer FTI_SCES if successful.

This function should be executed only by one process per group. It writes the RSed checksum to the metadata file.
