How Task Concurrency Limits Increase Storage Costs in AWS HealthOmics Workflow Runs
When running genomic workflows on AWS HealthOmics, you might use Run Groups with maxCpus to control task concurrency — for example, to manage costs or share capacity across teams. But there's an important side effect: limiting concurrency can actually increase your total run cost, specifically through storage charges.
We ran a controlled experiment to measure this effect and verify the underlying billing mechanics.
TL;DR
- Throttling a 50-task workflow from full parallelism down to 2 concurrent tasks increased run duration by 5.67x
- Storage cost increased by 5.67x ($0.049 → $0.278), tracking duration exactly
- Compute cost stayed flat ($0.220 → $0.217, just -1.2%)
- Total cost increased by 84% ($0.269 → $0.496)
Background: How HealthOmics Billing Works
HealthOmics run costs have two independent components with fundamentally different billing models:
Compute Cost (per task)
Each task is billed based on its actual execution time (start to stop), with a 60-second minimum, at the per-second rate of the assigned instance type. Whether tasks run in parallel or sequentially, each task still executes for the same duration — so total compute cost is the same regardless of concurrency.
compute_cost = Σ max(task_runtime_seconds, 60) × instance_rate_per_second
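This billing model can be sketched as a short function (the function name and example runtimes are illustrative, not part of any HealthOmics API):

```python
def compute_cost(task_runtimes_s, rate_per_hour):
    """Sum per-task compute cost with the 60-second minimum billing floor."""
    rate_per_second = rate_per_hour / 3600.0
    return sum(max(t, 60) for t in task_runtimes_s) * rate_per_second

# Short tasks are billed as 60s each; concurrency never enters the formula,
# so running these three tasks in parallel or back-to-back costs the same.
print(round(compute_cost([10, 10, 120], 0.1148), 4))
```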
Storage Cost (per run)
Run storage is provisioned when the run enters the "Running" state and persists until the run transitions to "Stopping". For STATIC storage, you pay for the full provisioned capacity (minimum 1,200 GiB) for the entire wall-clock duration:
storage_cost = $0.0001918/GB-hour × provisioned_GiB × run_duration_hours
This means: anything that extends run duration — including concurrency throttling — directly increases storage cost.
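The STATIC storage model is equally simple to express (rates from the pricing table later in this post; the 1,200 GiB floor is applied regardless of how little capacity a run requests):

```python
STATIC_RATE = 0.0001918  # $/GB-hour, us-east-1
MIN_STATIC_GIB = 1200

def static_storage_cost(run_duration_hours, provisioned_gib=MIN_STATIC_GIB):
    """Storage is billed on wall-clock run duration, not on task activity."""
    return STATIC_RATE * max(provisioned_gib, MIN_STATIC_GIB) * run_duration_hours

# Doubling run duration doubles storage cost, with compute unchanged.
print(round(static_storage_cost(1.0), 4))
```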
Experiment Design
Workflow
We created a custom WDL workflow with 50 scatter tasks. Each task:
- Requests 2 CPUs and 4 GB memory (maps to `omics.c.large`)
- Generates a 100 MB random file (simulating storage I/O)
- Runs ~120 seconds of CPU-bound work (`md5sum` checksums)
A final AggregateResults task collects all outputs.
```wdl
version 1.0

workflow ScatterStorageCostTest {
  input {
    Int num_tasks = 50
    Int task_duration_seconds = 120
    Int file_size_mb = 100
  }

  scatter (i in range(num_tasks)) {
    call ComputeTask {
      input:
        task_id = i,
        duration_seconds = task_duration_seconds,
        file_size_mb = file_size_mb
    }
  }

  call AggregateResults {
    input:
      results = ComputeTask.result_file
  }

  output {
    File final_report = AggregateResults.report
  }
}

task ComputeTask {
  input {
    Int task_id
    Int duration_seconds
    Int file_size_mb
  }

  command <<<
    echo "Task ~{task_id} starting at $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    # Write a random file to generate storage I/O
    dd if=/dev/urandom of=output_~{task_id}.bin bs=1M count=~{file_size_mb} 2>/dev/null
    # Busy-loop checksumming the file for the requested duration
    end_time=$((SECONDS + ~{duration_seconds}))
    while [ $SECONDS -lt $end_time ]; do
      md5sum output_~{task_id}.bin > /dev/null 2>&1
    done
    echo "Task ~{task_id} completed at $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "task_~{task_id}: duration=~{duration_seconds}s, file_size=~{file_size_mb}MB" > result_~{task_id}.txt
  >>>

  runtime {
    docker: "<ecr-uri>/ubuntu:20.04"
    cpu: 2
    memory: "4 GB"
  }

  output {
    File result_file = "result_~{task_id}.txt"
  }
}

task AggregateResults {
  input {
    Array[File] results
  }

  command <<<
    echo "=== Scatter Storage Cost Test Results ==="
    echo "Aggregation started at $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "Number of tasks: $(ls ~{sep=' ' results} | wc -w)"
    echo ""
    cat ~{sep=' ' results}
    echo ""
    echo "Aggregation completed at $(date -u +%Y-%m-%dT%H:%M:%SZ)"
  >>>

  runtime {
    docker: "<ecr-uri>/ubuntu:20.04"
    cpu: 1
    memory: "2 GB"
  }

  output {
    File report = stdout()
  }
}
```
Two Runs, One Variable
We executed two identical runs with the only difference being the concurrency constraint:
| Parameter | Run A (High Concurrency) | Run B (Low Concurrency) |
|---|---|---|
| Run Group | None (default limits) | maxCpus=4 |
| Effective Parallelism | ~50 tasks (all at once) | 2 tasks at a time |
| Storage Type | STATIC (1,200 GiB) | STATIC (1,200 GiB) |
| Workflow & Parameters | Identical | Identical |
| Region | us-east-1 | us-east-1 |
With maxCpus=4 and each task requesting 2 CPUs, Run B could only execute 2 tasks simultaneously. Both runs started within 10 seconds of each other.
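The scheduling arithmetic behind Run B's parallelism is simple integer division — a sketch of the expected behavior, not HealthOmics' actual scheduler:

```python
import math

def effective_parallelism(max_cpus, cpus_per_task):
    """How many tasks can run at once under a Run Group maxCpus cap."""
    return max_cpus // cpus_per_task

def num_batches(num_tasks, max_cpus, cpus_per_task):
    """How many sequential waves the scatter serializes into."""
    return math.ceil(num_tasks / effective_parallelism(max_cpus, cpus_per_task))

print(effective_parallelism(4, 2))  # 2 tasks at a time
print(num_batches(50, 4, 2))        # 25 batches
```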
Results
Duration
| | Run A (High) | Run B (Low) | Ratio |
|---|---|---|---|
| Total Duration | 12 min 48 sec (768.6s) | 1 hr 12 min 35 sec (4,355.0s) | 5.67x |
Cost Breakdown
| Cost Component | Run A (High) | Run B (Low) | Ratio |
|---|---|---|---|
| Compute Cost | $0.2198 | $0.2171 | 0.99x (-1.2%) |
| Storage Cost | $0.0491 | $0.2784 | 5.67x (+467%) |
| Total Cost | $0.2690 | $0.4955 | 1.84x (+84%) |
Storage as Percentage of Total
| | Run A (High) | Run B (Low) |
|---|---|---|
| Storage | 18.3% | 56.2% |
| Compute | 81.7% | 43.8% |
At low concurrency, storage became the dominant cost component — flipping from less than one-fifth to more than half of total cost.
Verifying the Math
We verified the reported costs against the billing formulas using actual task start/stop timestamps from the run manifests.
Compute Cost Verification
Each ComputeTask (2 CPUs, 4 GiB) maps to an omics.c.large instance at $0.1148/hour ($0.00003188/second).
| | Run A | Run B |
|---|---|---|
| Total tasks | 51 | 51 |
| Total billed seconds | 6,893.7s | 6,806.9s |
| Avg ComputeTask runtime | 136.7s | 134.9s |
| Calculated compute cost | $0.2197 | $0.2170 |
| Reported compute cost | $0.2198 | $0.2171 |
The total billed seconds differ by only -1.3% between runs. Run B tasks ran marginally faster (~1.8s less per task) due to reduced resource contention with only 2 concurrent tasks, but this effect is negligible.
Conclusion: Compute cost depends on the sum of individual task execution times, not on overall run duration. Parallelism does not change this sum.
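This check can be reproduced from the manifest totals in the table above (the per-second rate is derived from the $0.1148/hr omics.c.large rate):

```python
RATE_PER_SECOND = 0.1148 / 3600  # omics.c.large, $/second

# Total billed seconds come from summing max(runtime, 60) across all 51 tasks
for run, billed_seconds in [("Run A", 6893.7), ("Run B", 6806.9)]:
    print(f"{run}: ${billed_seconds * RATE_PER_SECOND:.4f}")
```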
Storage Cost Verification
Storage cost = $0.0001918/GB-hr × 1,200 GiB × run_duration_hours
| | Run A | Run B |
|---|---|---|
| Run duration | 768.6s (0.2135 hr) | 4,355.0s (1.2097 hr) |
| Calculated storage cost | $0.0491 | $0.2784 |
| Reported storage cost | $0.0491 | $0.2784 |
Both match to the penny. Storage cost tracks wall-clock duration exactly.
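The storage check is the same one-line formula applied to each run's wall-clock duration:

```python
STATIC_RATE = 0.0001918  # $/GB-hour, us-east-1
PROVISIONED_GIB = 1200

for run, seconds in [("Run A", 768.6), ("Run B", 4355.0)]:
    hours = seconds / 3600
    print(f"{run}: {hours:.4f} hr -> ${STATIC_RATE * PROVISIONED_GIB * hours:.4f}")
```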
Pricing Rates Used
| Component | Rate | Source |
|---|---|---|
| omics.c.large (2 vCPU, 4 GiB) | $0.1148/hour | Derived from omics.c.4xlarge ($0.9180/hr for 16 vCPU) |
| STATIC Run Storage | $0.0001918/GB-hour | AWS HealthOmics Pricing (us-east-1) |
| DYNAMIC Run Storage | $0.0004110/GB-hour | AWS HealthOmics Pricing (us-east-1) |
| Minimum task billing | 60 seconds | AWS HealthOmics Pricing |
Task Execution Patterns
Run A: All Tasks in Parallel
All 50 ComputeTask instances launched within a ~70-second window and completed within ~4 minutes. The AggregateResults task ran immediately after.
```
Time (min from start)  0    1    2    3    4    5    6    7
                       |----|----|----|----|----|----|----|
Task 00-49             [========================================]  (all 50 parallel)
AggregateResults                                            [==]
```
Run B: Serialized in Batches of 2
Tasks executed in 25 sequential batches, each taking ~2.5 minutes (135s execution + ~20s scheduling overhead). Total task execution spanned ~58 minutes.
```
Time (min from start)  0    5    10   15   20   ...  55   60   65
                       |----|----|----|----|----...--|----|----|
Batch 1  (T43,T39)     [====]
Batch 2  (T22,T35)          [====]
Batch 3  (T44,T49)               [====]
...
Batch 25 (T31)                                       [====]
AggregateResults                                           [=]
```
Generalized Formula
From the verified results, we can derive a general model for storage cost overhead:
Hourly storage burn rate (STATIC 1,200 GiB):
$0.0001918/GB-hr × 1,200 GiB = $0.2302/hour
Storage cost multiplier ≈ duration_throttled / duration_full_concurrency
Every additional hour of run duration from concurrency throttling
adds ~$0.23 in storage cost.
The actual multiplier in our test (5.67x) was less than the theoretical maximum (50 tasks / 2 concurrent = 25x) because:
- Run A still had ~70s of scheduling overhead spreading tasks across staggered launches
- Run B had ~20s of scheduling gaps between each batch
- These overheads compress the ratio compared to the naive `N / C_max` estimate
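These overheads can be folded into a simple duration model. The sketch below uses this post's rough per-batch estimates (~135s execution, ~20s scheduling gap) as inputs; it deliberately omits the aggregation step and launch tail, so it slightly underestimates the observed 5.67x:

```python
import math

def throttled_multiplier(num_tasks, concurrency, task_s, gap_s, full_run_s):
    """Estimate the duration ratio of a throttled run vs. an observed
    full-concurrency run, modeling the scatter as sequential batches."""
    batches = math.ceil(num_tasks / concurrency)
    throttled_s = batches * (task_s + gap_s)
    return throttled_s / full_run_s

# 25 batches x ~155s ≈ 3,875s of batch time vs. the 768.6s unthrottled run
print(round(throttled_multiplier(50, 2, 135, 20, 768.6), 2))
```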
Recommendations
- **Ensure sufficient concurrency headroom.** Set your Run Group's `maxCpus` high enough to avoid serializing scatter tasks. The HealthOmics default limits (3,000 concurrent CPUs, 10 concurrent tasks per run) are generally adequate, but custom Run Groups with a low `maxCpus` can trigger this issue.
- **Consider DYNAMIC storage for shorter runs.** If your workflow completes in under 2 hours, DYNAMIC storage ($0.0004110/GB-hr) charges only for the actual GB-hours consumed rather than 1,200 GiB of provisioned capacity, potentially reducing the storage cost impact.
- **Monitor run duration as a cost signal.** An unexpectedly long run may indicate concurrency throttling. Comparing expected vs. actual duration can help detect storage cost inflation early.
- **Use the Run Analyzer for diagnostics.** The HealthOmics Run Analyzer (available via `aws-healthomics-tools`) provides detailed cost breakdowns, including the storage vs. compute split, making concurrency-related cost inflation easy to identify.
Conclusion
Our experiment confirms that task concurrency limits directly increase HealthOmics run storage costs by extending run duration, while compute costs remain constant. The storage cost scales linearly with wall-clock duration: a 5.67x longer run resulted in a 5.67x higher storage cost and an 84% increase in total cost.
For production genomic workflows with more tasks and longer runtimes, this effect becomes even more significant. Understanding the two distinct billing models — per-task compute vs per-duration storage — is key to optimizing HealthOmics workflow costs.
Tested on AWS HealthOmics in us-east-1 using a custom WDL workflow. Cost analysis performed using the HealthOmics Run Analyzer.