

AWS HealthOmics VPC-Connected Workflows: Accessing the Internet and Cross-Region S3 from Workflows

Overview

In March 2026, AWS HealthOmics introduced VPC-Connected Workflows. This feature allows HealthOmics workflows to route network traffic through a customer-managed VPC, removing the network restrictions of the default RESTRICTED mode.

This post covers how the feature works, real-world test results, and how to set up the infrastructure using CDK.


Limitations of RESTRICTED Mode

The default networking mode for HealthOmics workflows is RESTRICTED. In this mode, workflow tasks can only access S3 and ECR within the same region — all other network communication is blocked.

This rules out several common bioinformatics workflow scenarios:

  • No access to public databases: Cannot download reference data from NCBI, Ensembl, or other public bioinformatics databases
  • No external API calls: Cannot connect to license servers, REST APIs, notification webhooks, etc.
  • No cross-region S3 access: Cannot access genomic datasets stored in S3 buckets in other AWS regions

For cross-region S3 specifically, RESTRICTED mode validates S3 bucket regions at StartRun API call time — if any S3 URI references a different region, the workflow won't even start.


VPC Mode: How It Works

In VPC mode, HealthOmics creates ENIs (Elastic Network Interfaces) for workflow tasks in the customer's VPC private subnets. Traffic flows through the VPC's NAT gateway to reach the internet or AWS services in other regions.

┌──────────────────────────────────────────────────────┐
│  VPC (10.0.0.0/16)                                   │
│                                                      │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐  │
│  │ Private Sub-a│ │ Private Sub-b│ │ Private Sub-c│  │
│  │  HealthOmics │ │  HealthOmics │ │  HealthOmics │  │
│  │  ENIs        │ │  ENIs        │ │  ENIs        │  │
│  └──────┬───────┘ └──────┬───────┘ └──────┬───────┘  │
│         └────────────────┼────────────────┘          │
│                          │                           │
│                 ┌────────▼────────┐                  │
│                 │   NAT Gateway   │                  │
│                 │ (Public Subnet) │                  │
│                 └────────┬────────┘                  │
│                          │                           │
│  S3 Gateway Endpoint ─── Same-region S3 (no cost)    │
│  Security Group: Outbound HTTPS 443 only             │
└──────────────────────────┼───────────────────────────┘
                           │
          ┌────────────────┼────────────────┐
          ▼                ▼                ▼
   ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
   │  Public      │ │ Cross-region │ │ External     │
   │  Internet    │ │ S3 Buckets   │ │ REST APIs    │
   │ (NCBI, etc.) │ │ (us-east-1)  │ │ (GitHub etc.)│
   └──────────────┘ └──────────────┘ └──────────────┘

Traffic path summary:

  • Same-region S3: Direct access via S3 Gateway endpoint (no data transfer cost)
  • Internet/Cross-region: Private subnet → NAT gateway → Internet gateway
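The routing decision above can be modeled in a few lines of TypeScript. This is only a conceptual sketch: in a real VPC the decision is made by the private subnet's route table (the S3 gateway endpoint route plus a default 0.0.0.0/0 route to the NAT gateway), and the `Destination` type and `egressPath` function are hypothetical names used for illustration.

```typescript
// Hypothetical destination descriptor used only for this illustration.
type Destination =
  | { kind: "s3"; region: string }
  | { kind: "internet"; host: string };

type EgressPath = "s3-gateway-endpoint" | "nat-gateway";

// Models the two routes in a private subnet's route table:
//  1. the S3 gateway endpoint route, which only covers S3 in the VPC's own region
//  2. the default route (0.0.0.0/0), which points at the NAT gateway
function egressPath(dest: Destination, vpcRegion: string): EgressPath {
  if (dest.kind === "s3" && dest.region === vpcRegion) {
    return "s3-gateway-endpoint";
  }
  // Cross-region S3, public databases, and external APIs all fall through to NAT.
  return "nat-gateway";
}

console.log(egressPath({ kind: "s3", region: "ap-northeast-2" }, "ap-northeast-2")); // s3-gateway-endpoint
console.log(egressPath({ kind: "s3", region: "us-east-1" }, "ap-northeast-2"));      // nat-gateway
console.log(egressPath({ kind: "internet", host: "ftp.ncbi.nlm.nih.gov" }, "ap-northeast-2")); // nat-gateway
```

A practical consequence is that only the cross-region and internet traffic passes through the NAT gateway and incurs its data-processing charge.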

Viewing in the Console

You can see the difference between the two modes in the HealthOmics console Run summary.

RESTRICTED Mode Run

When running in RESTRICTED mode, the Networking mode field shows Restricted. No Configuration field is displayed, and the workflow runs in HealthOmics' default network environment. Only same-region S3 and ECR are accessible; external internet connectivity is not available.

VPC Mode Run

When running in VPC mode, the Networking mode field shows Virtual Private Cloud (VPC), with an additional Configuration field below it. This displays the linked VPC Configuration name (e.g., tutorial-vpc-config) as a clickable link, leading to the Configuration details (VPC ID, subnets, security groups, etc.).

Both runs used the same workflow (vpc-connectivity-test-v3, WDL), with similar execution times (RESTRICTED: 5m 49s, VPC: 5m 21s).


Test Results

We ran the same WDL workflow in both RESTRICTED and VPC modes in the ap-northeast-2 (Seoul) region to compare connectivity.

Test Cases

Category A — Internet Access:

  • A1: curl https://checkip.amazonaws.com (outbound HTTPS + NAT public IP verification)
  • A2: wget https://ftp.ncbi.nlm.nih.gov/robots.txt (NCBI public resource download)
  • A3: curl https://api.github.com (external REST API access)

Category B — Cross-Region S3 Access (us-east-1 → ap-northeast-2):

  • B1: aws s3 cp (cross-region file download)
  • B2: aws s3 ls (cross-region bucket listing)
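The workflow's task definitions are not reproduced in this post. As a rough sketch of what the five checks might execute, the TypeScript below builds the command lines. The `connectivityTests` helper, the explicit timeouts, and the local output file names are assumptions, chosen so that a blocked connection fails within seconds instead of hanging.

```typescript
interface ConnectivityTest {
  id: string;
  command: string[];
}

// Hypothetical helper: builds the A1-A3 and B1-B2 commands.
// `objectUri` and `bucket` point at the test data stored in the other region.
function connectivityTests(objectUri: string, bucket: string, bucketRegion: string): ConnectivityTest[] {
  const timeoutSeconds = "10"; // assumption: fail fast instead of waiting on a blocked socket
  return [
    // A1: outbound HTTPS; the response is the public IP seen by the server (the NAT IP in VPC mode)
    { id: "A1", command: ["curl", "-sS", "--max-time", timeoutSeconds, "https://checkip.amazonaws.com"] },
    // A2: download a small public file from NCBI, with a single attempt
    { id: "A2", command: ["wget", "-q", "--tries=1", `--timeout=${timeoutSeconds}`, "-O", "robots.txt",
                          "https://ftp.ncbi.nlm.nih.gov/robots.txt"] },
    // A3: external REST API; prints only the HTTP status code
    { id: "A3", command: ["curl", "-sS", "--max-time", timeoutSeconds, "-o", "/dev/null",
                          "-w", "%{http_code}", "https://api.github.com"] },
    // B1/B2: cross-region S3, with the bucket's region passed explicitly
    { id: "B1", command: ["aws", "s3", "cp", objectUri, "./data.txt", "--region", bucketRegion] },
    { id: "B2", command: ["aws", "s3", "ls", `s3://${bucket}/`, "--region", bucketRegion] },
  ];
}

for (const t of connectivityTests("s3://bucket-in-other-region/data.txt", "bucket-in-other-region", "us-east-1")) {
  console.log(`${t.id}: ${t.command.join(" ")}`);
}
```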

Results Comparison

Test                          RESTRICTED Mode  VPC Mode
----------------------------  ---------------  ---------------------------
A1: checkip.amazonaws.com     FAIL (timeout)   PASS (NAT IP returned)
A2: NCBI robots.txt download  FAIL (timeout)   PASS (file downloaded)
A3: GitHub API call           FAIL (timeout)   PASS (HTTP 200)
B1: Cross-region S3 download  Blocked at API   PASS (file downloaded)
B2: Cross-region S3 listing   Blocked at API   PASS (object list returned)

In RESTRICTED mode, all internet tests fail with timeouts, and cross-region S3 is rejected at the StartRun API call with ValidationException: S3 bucket not located in ap-northeast-2 region.

In VPC mode, all 5 tests pass.


Infrastructure Setup with CDK

To use VPC mode, you need to create a VPC meeting HealthOmics requirements and a HealthOmics Configuration. The HealthOmicsVpc CDK L3 Construct simplifies this process.

This CDK Construct creates everything with a single cdk deploy:

  • VPC: Public/private subnets automatically placed in HealthOmics-supported AZs
  • NAT Gateway: Development mode (a single NAT gateway, lower cost) or production mode (one NAT gateway per AZ)
  • Security Group: Least privilege principle (outbound HTTPS 443 only)
  • S3 Gateway Endpoint: No data transfer cost for same-region S3 access
  • VPC Flow Logs: Automatically sent to CloudWatch Logs
  • HealthOmics Configuration: Automatically created and lifecycle-managed through a CloudFormation custom resource
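The least-privilege claim for the security group is easy to check mechanically. The sketch below uses a simplified, hypothetical rule shape (not the EC2 API's `IpPermission` type) and returns true only when every egress rule is TCP 443; a check like this could be run against the rules read back from the deployed security group.

```typescript
// Simplified egress rule; field names are illustrative, not the EC2 API's.
interface EgressRule {
  protocol: string; // e.g. "tcp", "udp", or "-1" for all protocols
  fromPort: number;
  toPort: number;
  cidr: string;
}

// True only if there is at least one rule and every rule allows nothing but TCP 443.
// An empty rule set would block all outbound traffic, which is treated as a misconfiguration here.
function isHttpsOnlyEgress(rules: EgressRule[]): boolean {
  return rules.length > 0 &&
    rules.every((r) => r.protocol === "tcp" && r.fromPort === 443 && r.toPort === 443);
}

const httpsOnly: EgressRule[] = [{ protocol: "tcp", fromPort: 443, toPort: 443, cidr: "0.0.0.0/0" }];
const allowAll: EgressRule[] = [{ protocol: "-1", fromPort: 0, toPort: 65535, cidr: "0.0.0.0/0" }];

console.log(isHttpsOnlyEgress(httpsOnly)); // true
console.log(isHttpsOnlyEgress(allowAll));  // false
```

One consequence of an HTTPS-only rule set: plain HTTP (port 80) and `ftp://` downloads are still blocked in VPC mode, so public data sources must be fetched over HTTPS, as in test A2.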

Usage Example

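import * as cdk from 'aws-cdk-lib';
import { HealthOmicsVpc } from './lib';

// Illustrative stack; env comes from the variables exported in the deployment steps.
const app = new cdk.App();
const stack = new cdk.Stack(app, 'HealthOmicsVpcStack', {
  env: { account: process.env.AWS_ACCOUNT_ID, region: process.env.AWS_REGION },
});

new HealthOmicsVpc(stack, 'HealthOmicsVpc', {
  networkingConfigurationName: 'my-vpc-config',
  deploymentMode: 'development',    // 1 NAT GW (cost savings)
  vpcEndpoints: ['s3'],             // S3 Gateway endpoint
});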

Deployment Steps

# Set environment variables
export AWS_REGION=us-east-1
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

# CDK deployment
cd healthomics-vpc-cdk-main
npm install
npx cdk bootstrap aws://${AWS_ACCOUNT_ID}/${AWS_REGION}
npx cdk deploy --require-approval never

Once the Configuration reaches ACTIVE status (approximately 5 minutes), you can run workflows in VPC mode.
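If you script the deployment, you may want to block until the Configuration is ready before calling StartRun. This post does not name the API that returns the Configuration status, so `getStatus` below is a placeholder you would wire to that call yourself; `waitForActive` is a hypothetical helper, and the polling interval and timeout defaults are assumptions sized around the roughly 5-minute provisioning time.

```typescript
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

interface WaitOptions {
  intervalMs?: number;
  timeoutMs?: number;
  sleep?: Sleep; // injectable so the loop can be tested without real delays
}

// Polls `getStatus` (a placeholder for your own status lookup) until it returns "ACTIVE".
async function waitForActive(getStatus: () => Promise<string>, options: WaitOptions = {}): Promise<void> {
  const intervalMs = options.intervalMs ?? 15_000;    // assumption: poll every 15 s
  const timeoutMs = options.timeoutMs ?? 10 * 60_000; // assumption: give up after 10 min
  const sleep = options.sleep ?? defaultSleep;

  for (let waited = 0; ; waited += intervalMs) {
    const status = await getStatus();
    if (status === "ACTIVE") return;
    if (waited >= timeoutMs) {
      throw new Error(`Configuration still ${status} after ${waited} ms`);
    }
    await sleep(intervalMs);
  }
}
```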


Running Workflows

RESTRICTED Mode (Default)

aws omics start-run \
    --workflow-id <WORKFLOW_ID> \
    --role-arn <ROLE_ARN> \
    --output-uri s3://my-bucket/output/ \
    --parameters '{"output_s3_uri": "s3://my-bucket/report.json"}'

VPC Mode

aws omics start-run \
    --workflow-id <WORKFLOW_ID> \
    --role-arn <ROLE_ARN> \
    --output-uri s3://my-bucket/output/ \
    --networking-mode VPC \
    --configuration-name my-vpc-config \
    --parameters '{
        "output_s3_uri": "s3://my-bucket/report.json",
        "cross_region_s3_uri": "s3://bucket-in-other-region/data.txt"
    }'

Apart from the extra cross_region_s3_uri parameter used by the test workflow, the only difference is the addition of --networking-mode VPC and --configuration-name.
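Because the two invocations differ only in those flags, a launcher script can derive both from one set of options. The `startRunArgs` helper below is hypothetical; it uses exactly the flags shown above and adds `--networking-mode VPC` and `--configuration-name` together, since the VPC example always passes them as a pair.

```typescript
interface RunOptions {
  workflowId: string;
  roleArn: string;
  outputUri: string;
  parameters: Record<string, string>;
  // Set to run in VPC mode; leave undefined for the default RESTRICTED mode.
  vpcConfigurationName?: string;
}

// Builds the argument list for `aws omics start-run` (hypothetical helper).
function startRunArgs(o: RunOptions): string[] {
  const args = [
    "omics", "start-run",
    "--workflow-id", o.workflowId,
    "--role-arn", o.roleArn,
    "--output-uri", o.outputUri,
  ];
  if (o.vpcConfigurationName !== undefined) {
    // The two VPC-mode flags are always added together.
    args.push("--networking-mode", "VPC", "--configuration-name", o.vpcConfigurationName);
  }
  args.push("--parameters", JSON.stringify(o.parameters));
  return args;
}

const base = {
  workflowId: "<WORKFLOW_ID>",
  roleArn: "<ROLE_ARN>",
  outputUri: "s3://my-bucket/output/",
  parameters: { output_s3_uri: "s3://my-bucket/report.json" },
};
console.log(startRunArgs(base).join(" "));
console.log(startRunArgs({ ...base, vpcConfigurationName: "my-vpc-config" }).join(" "));
```

Passing the array to Node's `child_process.execFileSync("aws", args)`, rather than assembling a shell string, avoids quoting problems with the JSON in `--parameters`.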


Things to Know

ECR Image Configuration

HealthOmics only supports images from ECR private repositories; Docker Hub and ECR Public images cannot be referenced directly. In addition to the IAM role permissions, you must attach a repository policy to the ECR repository itself that grants access to the omics.amazonaws.com service principal.

aws ecr set-repository-policy \
    --repository-name my-repo \
    --policy-text '{
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "omics.amazonaws.com"},
            "Action": [
                "ecr:BatchGetImage",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchCheckLayerAvailability"
            ]
        }]
    }'
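When the same policy is applied to several repositories, generating it from code keeps the action list in one place. The sketch below builds the document used in the command above and adds a simplified `grantsOmicsPull` check (a hypothetical helper that ignores Deny statements, conditions, and any wildcard other than `ecr:*`) that could flag repositories missing the grant.

```typescript
const OMICS_PULL_ACTIONS = [
  "ecr:BatchGetImage",
  "ecr:GetDownloadUrlForLayer",
  "ecr:BatchCheckLayerAvailability",
];

interface Statement {
  Effect: string;
  Principal?: { Service?: string | string[] };
  Action: string | string[];
}

interface PolicyDocument {
  Version: string;
  Statement: Statement[];
}

const toArray = (v: string | string[] | undefined): string[] =>
  v === undefined ? [] : Array.isArray(v) ? v : [v];

// Same repository policy as in the set-repository-policy command above.
function omicsPullPolicy(): PolicyDocument {
  return {
    Version: "2012-10-17",
    Statement: [{
      Effect: "Allow",
      Principal: { Service: "omics.amazonaws.com" },
      Action: [...OMICS_PULL_ACTIONS],
    }],
  };
}

// Simplified check: every pull action is allowed to omics.amazonaws.com by some statement.
function grantsOmicsPull(policy: PolicyDocument): boolean {
  return OMICS_PULL_ACTIONS.every((action) =>
    policy.Statement.some((s) =>
      s.Effect === "Allow" &&
      toArray(s.Principal?.Service).includes("omics.amazonaws.com") &&
      (toArray(s.Action).includes(action) || toArray(s.Action).includes("ecr:*"))));
}

console.log(JSON.stringify(omicsPullPolicy())); // value for --policy-text
console.log(grantsOmicsPull(omicsPullPolicy())); // true
```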

S3 Parameter Validation

HealthOmics validates S3 URIs in workflow parameters at StartRun call time:

  • Referenced S3 objects must already exist (for output paths, create an empty placeholder object first)
  • In RESTRICTED mode, S3 buckets must be in the same region
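A pre-flight check along these lines can catch both failures before StartRun is called. In the sketch below, `preflightS3Params` is a hypothetical name, and `objectExists` and `bucketRegion` are placeholders injected by the caller (in practice they might wrap the S3 `HeadObject` and `GetBucketLocation` calls). Every `s3://` string in the parameters is treated as an object that must exist, which follows the rule above but may be stricter than HealthOmics itself for prefix-style values.

```typescript
type NetworkingMode = "RESTRICTED" | "VPC";

interface S3Location {
  bucket: string;
  key: string;
}

// Splits "s3://bucket/key" into its parts; returns undefined for anything else.
function parseS3Uri(uri: string): S3Location | undefined {
  const m = /^s3:\/\/([^\/]+)\/(.*)$/.exec(uri);
  return m ? { bucket: m[1], key: m[2] } : undefined;
}

// Returns human-readable problems; an empty array means the parameters look valid.
function preflightS3Params(
  params: Record<string, string>,
  mode: NetworkingMode,
  runRegion: string,
  objectExists: (loc: S3Location) => boolean,
  bucketRegion: (bucket: string) => string,
): string[] {
  const problems: string[] = [];
  for (const [name, value] of Object.entries(params)) {
    const loc = parseS3Uri(value);
    if (loc === undefined) continue; // not an S3 URI
    if (!objectExists(loc)) {
      problems.push(`${name}: ${value} does not exist (create a placeholder for outputs)`);
    }
    const region = bucketRegion(loc.bucket);
    if (mode === "RESTRICTED" && region !== runRegion) {
      problems.push(`${name}: bucket ${loc.bucket} is in ${region}, not ${runRegion}`);
    }
  }
  return problems;
}

// Stubbed lookups standing in for real S3 calls.
const existing = new Set(["my-bucket/report.json", "bucket-in-other-region/data.txt"]);
const regions: Record<string, string> = { "my-bucket": "ap-northeast-2", "bucket-in-other-region": "us-east-1" };
const params = { output_s3_uri: "s3://my-bucket/report.json", cross_region_s3_uri: "s3://bucket-in-other-region/data.txt" };
const exists = (l: S3Location) => existing.has(`${l.bucket}/${l.key}`);
const regionOf = (b: string) => regions[b] ?? "unknown";

console.log(preflightS3Params(params, "RESTRICTED", "ap-northeast-2", exists, regionOf)); // flags the us-east-1 bucket
console.log(preflightS3Params(params, "VPC", "ap-northeast-2", exists, regionOf));        // []
```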

Performance Impact

VPC mode adds approximately 30–60 seconds to startup time due to ENI provisioning. There is no significant difference in workflow task execution time itself.

Cost Considerations

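Resource             Cost                                      Notes
-------------------  ----------------------------------------  --------------------------------------------
NAT Gateway          ~$0.045/hr per gateway + data processing  Largest cost component; rates vary by region
S3 Gateway Endpoint  Free                                      Same-region S3 access
VPC Flow Logs        CloudWatch Logs ingestion cost            Useful for troubleshooting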

For testing or development, use development mode (1 NAT gateway) and tear the stack down as soon as testing is finished (for example, with npx cdk destroy) to avoid unnecessary NAT gateway charges.
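The hourly part of the NAT gateway bill is simple arithmetic. The helper below is a rough estimator, not a pricing tool: the ~$0.045/hr figure comes from the table above, while the per-GB data-processing rate is left as an input because it, like the hourly rate, varies by region. Only traffic that actually leaves through the NAT gateway counts as processed data; same-region S3 traffic uses the free gateway endpoint.

```typescript
// Rough NAT gateway estimate: hourly charge per gateway plus data processed.
// Rates are inputs because they differ by region; check current pricing.
function natGatewayCost(
  natCount: number,
  hours: number,
  hourlyRate: number,
  gbProcessed: number,
  perGbRate: number,
): number {
  const hourly = natCount * hours * hourlyRate;
  const processing = gbProcessed * perGbRate; // charged on total GB, not per gateway
  return hourly + processing;
}

const HOURS_PER_MONTH = 730;
// Hourly component only (gbProcessed = 0), at the ~$0.045/hr rate from the table.
console.log(natGatewayCost(1, HOURS_PER_MONTH, 0.045, 0, 0).toFixed(2)); // development (1 NAT): 32.85
console.log(natGatewayCost(3, HOURS_PER_MONTH, 0.045, 0, 0).toFixed(2)); // production, 3 AZs: 98.55
console.log(natGatewayCost(1, 2, 0.045, 0, 0).toFixed(2));               // a 2-hour test: 0.09
```

The gap between the first two lines is why development mode plus prompt cleanup matters for short experiments.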

Supported Regions

Region                      AZ Count
--------------------------  --------
us-east-1 (N. Virginia)     4
us-west-2 (Oregon)          3
eu-west-1 (Ireland)         3
eu-west-2 (London)          3
eu-central-1 (Frankfurt)    3
ap-southeast-1 (Singapore)  3
ap-northeast-2 (Seoul)      3
il-central-1 (Tel Aviv)     3
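Combining this table with the two deployment modes gives the number of NAT gateways to expect. The mapping below copies the AZ counts from the table. The assumption that production mode places a NAT gateway in every listed AZ is our reading of "1 per AZ" above, not confirmed construct behavior, and `natGatewayCount` is a hypothetical helper.

```typescript
type DeploymentMode = "development" | "production";

// AZ counts copied from the table above.
const SUPPORTED_AZ_COUNT: Record<string, number> = {
  "us-east-1": 4,
  "us-west-2": 3,
  "eu-west-1": 3,
  "eu-west-2": 3,
  "eu-central-1": 3,
  "ap-southeast-1": 3,
  "ap-northeast-2": 3,
  "il-central-1": 3,
};

// Assumption: production mode creates one NAT gateway in every supported AZ.
function natGatewayCount(region: string, mode: DeploymentMode): number {
  const azCount: number | undefined = SUPPORTED_AZ_COUNT[region];
  if (azCount === undefined) {
    throw new Error(`${region} is not in the supported-region list`);
  }
  return mode === "development" ? 1 : azCount;
}

console.log(natGatewayCount("ap-northeast-2", "production")); // 3
console.log(natGatewayCount("us-east-1", "production"));      // 4
console.log(natGatewayCount("us-east-1", "development"));     // 1
```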

Summary

VPC-Connected Workflows significantly expand the network accessibility of HealthOmics workflows. This feature is particularly useful for bioinformatics pipelines that require access to external databases, API calls, or cross-region data. Using the CDK L3 Construct, you can complete the complex VPC infrastructure setup with a single command.