Navigating results
Once all tasks have completed, the data is uploaded into a database. Currently, both SQLite and PostgreSQL have been tested; MySQL should work in principle as well.
The data is organised into a simple collection of tables. A benchmark consists of a single run, which groups together several instances. Each instance is a combination of a metric, tool, input data and options. Each metric outputs a tab-separated table that is uploaded into its own database table, and runtime performance is recorded in tables whose names end in _timings. A typical collection of tables looks like this:
> .tables
# Maintenance tables
run
instance
# Timing tables
metric_timings
tool_timings
# metric tables
bedtools_stats_allele_frequency
bedtools_jaccard
bcftools_stats_depth_distribution
bcftools_stats_indel_context_length
bcftools_stats_indel_context_summary
bcftools_stats_indel_distribution
bcftools_stats_quality
bcftools_stats_singleton_stats
bcftools_stats_substitution_types
bcftools_stats_summary_numbers
vcftools_tstv_by_count
vcftools_tstv_summary
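The same listing can be obtained programmatically. The following is a minimal sketch for an SQLite database that uses only Python's standard sqlite3 module. The helper names list_tables and group_tables are hypothetical, the in-memory database stands in for the real results database (whose file name depends on your configuration), and treating tags and arvados_job as maintenance tables is an assumption.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def group_tables(names):
    """Split table names into maintenance, timing and metric tables."""
    # Assumption: these four names are the only non-metric, non-timing tables.
    maintenance = {"run", "instance", "tags", "arvados_job"}
    groups = {"maintenance": [], "timings": [], "metrics": []}
    for name in names:
        if name in maintenance:
            groups["maintenance"].append(name)
        elif name.endswith("_timings"):
            groups["timings"].append(name)
        else:
            groups["metrics"].append(name)
    return groups


# Stand-in database for illustration only; connect to your own results
# database instead, e.g. sqlite3.connect("path/to/results.db").
conn = sqlite3.connect(":memory:")
for table in ("run", "instance", "metric_timings", "tool_timings",
              "bedtools_jaccard"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER)")

groups = group_tables(list_tables(conn))
for kind, tables in groups.items():
    print(kind, tables)
```

For PostgreSQL the catalogue query differs (information_schema.tables rather than sqlite_master), so only group_tables carries over unchanged.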
Table overview
- run
Information about a benchmark run. Columns:
- id
- Identification number of this run
- author
- The user name of the person running the pipeline
- created
- Date the benchmark run was created
- pipeline_name
- The name of the pipeline
- pipeline_version
- The pipeline version, typically the current git commit.
- config
- The benchmark configuration file in json format
- title
- The title of the benchmark run, see Configuring a benchmark
- description
- The description of the benchmark run, see Configuring a benchmark
- instance
An instantiation of a combination of a particular metric, tool, input data and options.
- id
- Identification number of this instance
- run_id
- Reference to run
- completed
- Time that computation was completed
- input
- Input data
- metric_name
- Name of the metric
- metric_version
- Version of the metric
- metric_options
- Options supplied to the metric
- tool_name
- Name of the tool
- tool_version
- Tool version
- tool_options
- Options supplied to the tool
- meta_data
- Other environment variables
- timings
Timing information
- instance_id
- Reference to instance
- host
- Execution host
- started
- Time that job was submitted
- completed
- Time that job was completed
- total_t
- Total time of job, including waiting in the queue
- wall_t
- Time spent in user/system in total
- user_t
- Time spent in user in job script
- sys_t
- Time spent in system in job script
- child_user_t
- Time spent in user in child processes. This is typically the tool/metric being executed
- child_sys_t
- Time spent in system in child processes. This is typically the tool/metric being executed
- statement
- Command line statement
- tags
List of tags
- run_id
- Reference to the run
- tag
- A tag associated with the run
- arvados_job
Arvados job information. This table is only present if --engine=arvados has been used.
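The references between these tables (instance.run_id to run.id, and instance_id in the timing tables to instance.id) make it possible to combine them in a single query. The sketch below builds a tiny in-memory SQLite database with a reduced version of the schema and made-up rows, then lists the wall-clock time per tool for one run. It assumes that tool_timings carries the columns described under timings above; the sample values are purely illustrative.

```python
import sqlite3

# Reduced, illustrative schema: only the columns needed for the join.
# Assumption: tool_timings has the columns listed under "timings".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE run (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE instance (
    id INTEGER PRIMARY KEY,
    run_id INTEGER,
    tool_name TEXT,
    metric_name TEXT
);
CREATE TABLE tool_timings (instance_id INTEGER, host TEXT, wall_t REAL);

-- Made-up example rows.
INSERT INTO run VALUES (1, 'demo run');
INSERT INTO instance VALUES (1, 1, 'tool_a', 'metric_x');
INSERT INTO instance VALUES (2, 1, 'tool_b', 'metric_x');
INSERT INTO tool_timings VALUES (1, 'node1', 12.5);
INSERT INTO tool_timings VALUES (2, 'node2', 7.0);
""")

# Follow the references: timings -> instance -> run.
QUERY = """
SELECT r.title, i.tool_name, t.wall_t
FROM tool_timings AS t
JOIN instance AS i ON i.id = t.instance_id
JOIN run AS r ON r.id = i.run_id
WHERE r.id = ?
ORDER BY t.wall_t DESC
"""

rows = conn.execute(QUERY, (1,)).fetchall()
for title, tool_name, wall_t in rows:
    print(title, tool_name, wall_t)
```

The same join pattern should apply to the metric tables if they also carry an instance_id column, which is worth checking against your own database before relying on it.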