SN_OFFLINE_DURATION

Section: FSCLI (1)
Updated: October 2018


NAME

sn_offline_duration – Scan TAC logs to produce a duration-frequency data file of days between truncation and retrieval actions for all managed files

SYNOPSIS

sn_offline_duration [-c CSV-report-file [-n #days]] [-e mm:dd:yyyy:hh:mm:ss] [-f] [-i intermediate-file] [-l mm:dd:yyyy:hh:mm:ss] [-o intermediate-file] [-r] [-t mm:dd:yyyy:hh:mm:ss] [-w] [logfiles…]

DESCRIPTION

The sn_offline_duration program helps evaluate the effectiveness of truncation-policy settings. It scans TAC log messages for truncation and retrieval operations and correlates them to determine how many days elapsed between these operations. It produces data for a duration-frequency report: for each number of days between truncation and retrieval, the number of occurrences. For example, when the report shows a large number of retrieves at a particular day value, truncation can be delayed by that many days to reduce the need for retrieves. Delaying truncation increases the need for disk space, so it may be necessary to go through a series of tuning steps to find the optimal settings for your workload.

The time window for the report is controlled by specifying the earliest truncation log message and the latest retrieval log message to be considered. The report does not distinguish between policies or file systems because truncation and retrieval log messages do not provide this information. The report file is in comma-separated-values (CSV) format.

The program operates in three modes: 1) ingest of information from an optional intermediate file and specified log files, 2) optional output of an intermediate file for use in a future run of the program, 3) optional output of a duration-frequency report file in CSV format that is filtered by time and retrieval-type options. The ingest mode takes in information from every truncation and retrieval log message. The output intermediate file stores information from every truncation and retrieval log message. The duration-frequency report file can be limited to certain truncation and retrieval information by filtering options.

Reports with different filtering options can be run on the same intermediate file because it contains all the available information. Each run of the program can start with an old intermediate file, collect more information from new log files, and create a new intermediate file that includes all the information. This saves processing time by eliminating the need to rescan log files.
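The incremental workflow described above can be sketched as a small Python helper that only assembles argument lists for successive runs. It does not execute anything. The option letters come from this page; the helper name build_command and every file name shown are hypothetical placeholders.

```python
def build_command(logfiles, previous_state=None, new_state=None,
                  report=None, flags=()):
    """Assemble an argv list for one sn_offline_duration run (not executed)."""
    argv = ["sn_offline_duration"]
    if report is not None:
        argv += ["-c", report]          # CSV duration-frequency report
    argv += list(flags)                 # e.g. "-f", "-r", "-w"
    if previous_state is not None:
        argv += ["-i", previous_state]  # reuse data scanned by an earlier run
    if new_state is not None:
        argv += ["-o", new_state]       # must not exist yet, so use a new name
    argv += list(logfiles)
    return argv


# Run 1: scan the first batch of logs and save everything that was found.
first_run = build_command(["rolled-log-1.gz"], new_state="state.1")

# Run 2: start from the saved data and add only the newly rolled logs.
second_run = build_command(["rolled-log-2.gz"],
                           previous_state="state.1", new_state="state.2")

# Run 3: no log files at all; report read-triggered retrieves from saved data.
report_only = build_command([], previous_state="state.2",
                            report="reads.csv", flags=["-r"])
```

On a host where the command is installed, each list could be passed to subprocess.run(argv, check=True). Because the program refuses to overwrite an existing output file, the sketch uses a fresh intermediate-file name for every run.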

Each row of the report file contains a duration in days and a count of the number of times files were retrieved after being offline for that many days. Spreadsheet programs can analyze and format the data into histograms and other chart types.
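As an alternative to a spreadsheet, the report can be examined with a short script. The following is a minimal sketch that assumes each data row holds exactly two integer fields, days then count; the exact column layout and the presence of a header line are assumptions, so lines that do not parse are skipped. The function names are hypothetical.

```python
import csv
import io


def load_report(stream):
    """Read (days, count) pairs; skip headers and malformed lines."""
    rows = []
    for record in csv.reader(stream):
        try:
            rows.append((int(record[0]), int(record[1])))
        except (ValueError, IndexError):
            continue
    return sorted(rows)


def smallest_delay_covering(rows, percent):
    """Smallest day value at which `percent` of all retrieves have occurred.

    If the answer is the last row, treat it as a lower bound, because the
    last bucket also collects every longer duration.
    """
    total = sum(count for _, count in rows)
    if total == 0:
        return None
    running = 0
    for days, count in rows:
        running += count
        if running * 100 >= percent * total:  # integer math, no rounding
            return days
    return None


# Made-up sample data, for illustration only.
sample = io.StringIO("days,count\n0,40\n1,30\n2,20\n3,10\n")
rows = load_report(sample)
candidate_delay = smallest_delay_covering(rows, 75)
```

With the made-up sample, 75 percent of retrieves happen within two days of truncation, which suggests a truncation delay of about that length as a starting point for tuning.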

Three types of retrieves can be selected to be counted in the report file. Retrieves caused by the fsretrieve command are counted by specifying the -f command-line option. Retrieves caused by read operations are counted by specifying the -r command-line option. Retrieves caused by write operations are counted by specifying the -w command-line option. The report may be limited to a beginning and ending time window by using the -e earliest-log-message and -l latest-log-message options. The number of day-duration buckets can be specified with the -n option. The last bucket includes all occurrences that are longer than the highest bucket day number.
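The bucket scheme can be illustrated with a short sketch. It is not the program's actual code. It assumes that durations are truncated to whole days, that the per-day buckets are numbered from 0, and that any duration reaching the requested bucket count falls into the one extra overflow bucket. The exact boundary the program uses is not specified here.

```python
def bucket_durations(durations_in_days, buckets=100):
    """Count durations per whole day, with one extra overflow bucket."""
    counts = [0] * (buckets + 1)           # final slot is the overflow bucket
    for duration in durations_in_days:
        index = min(int(duration), buckets)  # long durations share the last slot
        counts[index] += 1
    return counts


# Made-up offline durations, in days, with three per-day buckets.
histogram = bucket_durations([0.5, 1.2, 1.9, 3.0, 10.0], buckets=3)
```

In this example the durations of 3.0 and 10.0 days both land in the overflow bucket, which is why a large count in the last row of a report usually means that the bucket count is too small for the time window.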

The program is designed to use TAC log files that have been rolled, which adds time information to the file name and compresses the files. The program depends on the encoded time in the file name to determine a year value, which is not included in log-message timestamps. Compressed files are automatically expanded within the program by passing them through the zcat command. Uncompressed files may also be used. When the file name does not have a timestamp that is recognizable to the program (tac_00, for example), the current system time is used as the reference for determining the year for log messages in the file.
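One plausible way to derive a year from a reference time is sketched below. The rule shown is an assumption made for illustration, not the program's documented algorithm: it presumes every message in a file was logged at or before the reference time and less than a year earlier. The function name infer_year is hypothetical.

```python
from datetime import datetime


def infer_year(message_month, reference):
    """Guess the year of a log message whose timestamp lacks one.

    `reference` is the roll time taken from the file name or, failing that,
    the current system time. A message month later in the calendar than the
    reference month is taken to belong to the previous year.
    """
    if message_month > reference.month:
        return reference.year - 1
    return reference.year


rolled_at = datetime(2018, 1, 5)               # made-up roll time
december_year = infer_year(12, rolled_at)      # message logged in December
january_year = infer_year(1, rolled_at)        # message logged in January
```

Under this assumption, a file rolled in January 2018 that still contains December messages has those messages assigned to 2017. It also shows why an unrolled file scanned long after it was written could be assigned the wrong year.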

STATISTICS

A few statistics filtered by the time window are written to stdout at the end of each run. These are:

Files truncated and retrieved
The number of distinct files involved in truncation-retrieval cycles.
Files having a truncation
The number of distinct files having at least one logged truncation action.
Files having a retrieval
The number of distinct files having at least one logged retrieve action of the selected types.
Files having a truncation or retrieval
The number of distinct files having at least one logged truncation or retrieve action.
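The relationship between the four statistics can be sketched with sets. This is an illustration under assumptions, not the program's implementation: events are taken to be time-ordered pairs of a file identifier and an action name, and a file counts as truncated and retrieved once a retrieve follows an earlier truncation of the same file. The identifiers and action strings are hypothetical.

```python
def summarize(events):
    """Compute the four end-of-run counts from time-ordered events."""
    truncated, retrieved, cycled = set(), set(), set()
    for file_id, action in events:
        if action == "truncate":
            truncated.add(file_id)
        elif action == "retrieve":
            retrieved.add(file_id)
            if file_id in truncated:      # completes a truncation-retrieval cycle
                cycled.add(file_id)
    return {
        "truncated_and_retrieved": len(cycled),
        "truncated": len(truncated),
        "retrieved": len(retrieved),
        "truncated_or_retrieved": len(truncated | retrieved),
    }


# Made-up events: file "a" completes a cycle, "b" is only truncated,
# and "c" is only retrieved (its truncation fell outside the window).
stats = summarize([("a", "truncate"), ("a", "retrieve"),
                   ("b", "truncate"), ("c", "retrieve")])
```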

OPTIONS

-c duration-frequency-report-file
The relative or absolute pathname where the duration-frequency report file is to be created. As a safeguard, the program will quit with an error if the file exists.
-e timestamp
Sets a limit for the earliest log message to include in the report. Otherwise, there is no limit. When evaluating the impact of a new class-policy truncation setting, this option can be used to exclude activity that preceded the change. The option-argument format is mm:dd:yyyy:hh:mm:ss, which mimics the timestamp format used in log-file names. It is: month (mm), month-day (dd), year (yyyy), hour (hh), minute (mm), and second (ss).
-f
Causes retrieves initiated by the fsretrieve command to be included in the report.
-i intermediate-input-file
Identifies a relative or absolute pathname to a file that was previously written with the intermediate-file output option. The input and output intermediate-file options used from one run to the next provide a way to reuse scanned information across multiple runs of the program. The intermediate file contains the full set of scanned data unlimited by any command-line options.
-l timestamp
Sets a limit for the latest log message to include in the report. Otherwise, there is no limit. When testing a new class-policy truncation setting, this option can be used to exclude activity that followed the change. This option-argument’s format is mm:dd:yyyy:hh:mm:ss, which mimics the timestamp format used in log-file names. It is: month (mm), month-day (dd), year (yyyy), hour (hh), minute (mm), and second (ss).
-n #buckets
Specifies the number of discrete per-day buckets to use in producing the report. The number should be equal to the length of the time window in days. Any positive integer is valid. One additional bucket is added to accumulate all the durations that are longer than the number specified here. This option cannot be used without the -c option. The default is 100.
-o intermediate-output-file
Provides a relative or absolute pathname to a file where intermediate output is written. As a safeguard, the program will quit with an error if the file exists. This option provides a way to reuse the scanned information across runs of the program as described above for the -i option. The intermediate file receives the full set of scanned data unlimited by any command-line options.
-r
Causes read-event retrieves to be included in the report.
-t timestamp
Sets a limit for the earliest log data to be ingested from the intermediate file. This allows old data to be trimmed off so that uninteresting data do not slow down the program or consume intermediate file space. This option-argument’s format is mm:dd:yyyy:hh:mm:ss, which mimics the timestamp format used in log-file names. It is: month (mm), month-day (dd), year (yyyy), hour (hh), minute (mm), and second (ss).
-w
Causes write-event retrieves to be included in the report.
logfiles
The absolute or relative pathnames of a set of TAC log files. When no pathname is specified, the command’s output is derived entirely from intermediate-file data. All the logfiles pass through the zcat command, which expands files that have been compressed with gzip or passes non-compressed files without modification. The timestamp-encoded file names of rolled TAC log files provide necessary information for the year and month when the file was rolled, which is used to determine the correct year to add to log-message timestamps. When recognizable time information is not available in the file name, the year and month at the time of running this command are used. Note that TAC log files are written to a local file system per MDC. The logs from both MDCs of a high-availability (HA) system must be processed to get an accurate report. An intermediate file is a convenient way to pass the essential information between MDCs. Processing a log message more than once degrades performance, but does not affect report results. Redundant logs are automatically discarded.
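The mm:dd:yyyy:hh:mm:ss argument format used by the -e, -l, and -t options can be produced and checked with standard date formatting. The sketch below uses Python's datetime module; the constant and function names are hypothetical.

```python
from datetime import datetime

# month:day:year:hour:minute:second, as described for -e, -l, and -t
OPTION_FORMAT = "%m:%d:%Y:%H:%M:%S"


def format_option_timestamp(moment):
    """Render a datetime as an option argument."""
    return moment.strftime(OPTION_FORMAT)


def parse_option_timestamp(text):
    """Parse an option argument; raises ValueError if it is malformed."""
    return datetime.strptime(text, OPTION_FORMAT)


earliest = format_option_timestamp(datetime(2018, 3, 5, 7, 8, 9))
parsed = parse_option_timestamp("10:01:2018:00:00:00")
```

Validating a timestamp this way before building a command line catches the easy mistake of swapping the month and day fields when the day is greater than 12.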

SEE ALSO

$FS_HOME/logs/tac/tac_00, $FS_HOME/logs/tac/tac_00.*.gz, fsloglevel(1)