AWS HealthOmics VPC-Connected Workflows: Accessing the Internet and Cross-Region S3 from Workflows
Overview
In March 2026, AWS HealthOmics introduced VPC-Connected Workflows. This feature allows HealthOmics workflows to route network traffic through a customer-managed VPC, removing the network restrictions of the default RESTRICTED mode.
This post covers how the feature works, real-world test results, and how to set up the infrastructure using CDK.
Limitations of RESTRICTED Mode
The default networking mode for HealthOmics workflows is RESTRICTED. In this mode, workflow tasks can only access S3 and ECR within the same region — all other network communication is blocked.
This rules out several common bioinformatics workflow scenarios:
- No access to public databases: Cannot download reference data from NCBI, Ensembl, or other public bioinformatics databases
- No external API calls: Cannot connect to license servers, REST APIs, notification webhooks, etc.
- No cross-region S3 access: Cannot access genomic datasets stored in S3 buckets in other AWS regions
For cross-region S3 specifically, RESTRICTED mode validates S3 bucket regions at StartRun API call time — if any S3 URI references a different region, the workflow won't even start.
VPC Mode: How It Works
In VPC mode, HealthOmics creates elastic network interfaces (ENIs) for workflow tasks in the private subnets of your VPC. Same-region S3 traffic goes through the S3 gateway endpoint. Traffic to the internet or to AWS services in other regions leaves through the VPC's NAT gateway.
┌────────────────────────────────────────────────────────────┐
│ VPC (10.0.0.0/16)                                          │
│ Security Group: Outbound HTTPS 443 only                    │
│ S3 Gateway Endpoint ─── Same-region S3 (no cost)           │
│                                                            │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐   │
│  │ Private Sub-a │  │ Private Sub-b │  │ Private Sub-c │   │
│  │ HealthOmics   │  │ HealthOmics   │  │ HealthOmics   │   │
│  │ ENIs          │  │ ENIs          │  │ ENIs          │   │
│  └───────┬───────┘  └───────┬───────┘  └───────┬───────┘   │
│          └──────────────────┼──────────────────┘           │
│                             │                              │
│                    ┌────────▼────────┐                     │
│                    │   NAT Gateway   │                     │
│                    │ (Public Subnet) │                     │
│                    └────────┬────────┘                     │
│                             │                              │
└─────────────────────────────┼──────────────────────────────┘
                              │
            ┌─────────────────┼─────────────────┐
            ▼                 ▼                 ▼
     ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
     │ Public       │  │ Cross-region │  │ External     │
     │ Internet     │  │ S3 Buckets   │  │ REST APIs    │
     │ (NCBI, etc.) │  │ (us-east-1)  │  │ (GitHub etc.)│
     └──────────────┘  └──────────────┘  └──────────────┘
Traffic path summary:
- Same-region S3: Private subnet → S3 gateway endpoint (the endpoint is free, and this traffic avoids NAT gateway data processing charges)
- Internet/Cross-region: Private subnet → NAT gateway → Internet gateway
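Here is a minimal sketch of the routing decision above. It is written in TypeScript, the language of the CDK construct used later in this post. It assumes exactly the setup in the diagram:

- a security group that allows only outbound TCP 443
- an S3 gateway endpoint for the run's region
- a NAT gateway as the default route

The type and function names are hypothetical. The code models the path and does not inspect a real VPC.

```typescript
// Hypothetical model of VPC-mode egress, assuming the setup in the diagram:
// security group = outbound TCP 443 only, S3 gateway endpoint for the VPC's
// region, and a NAT gateway as the default route. It models, not inspects.

type Destination =
  | { kind: "s3"; region: string } // an S3 bucket and the region it lives in
  | { kind: "internet"; host: string }; // any public endpoint (NCBI, GitHub, etc.)

type EgressPath =
  | "s3-gateway-endpoint"
  | "nat-gateway-then-internet-gateway"
  | "blocked-by-security-group";

function egressPath(dest: Destination, port: number, vpcRegion: string): EgressPath {
  // The security group filters first: anything other than HTTPS is dropped at the ENI.
  if (port !== 443) {
    return "blocked-by-security-group";
  }
  // Same-region S3 matches the gateway endpoint's prefix-list route,
  // which is more specific than the default route.
  if (dest.kind === "s3" && dest.region === vpcRegion) {
    return "s3-gateway-endpoint";
  }
  // Cross-region S3 and public hosts fall through to the default route via NAT.
  return "nat-gateway-then-internet-gateway";
}

const runRegion = "ap-northeast-2";
console.log(egressPath({ kind: "s3", region: runRegion }, 443, runRegion)); // s3-gateway-endpoint
console.log(egressPath({ kind: "s3", region: "us-east-1" }, 443, runRegion)); // nat-gateway-then-internet-gateway
console.log(egressPath({ kind: "internet", host: "ftp.ncbi.nlm.nih.gov" }, 21, runRegion)); // blocked-by-security-group
```

Under this model, the security group drops a plain FTP (port 21) or plain HTTP (port 80) download before routing matters. That is one reason to prefer HTTPS endpoints such as the NCBI URL used in the tests below.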
Viewing in the Console
You can see the difference between the two modes in the HealthOmics console Run summary.
RESTRICTED Mode Run
When running in RESTRICTED mode, the Networking mode field shows Restricted. No Configuration field is displayed, and the workflow runs in HealthOmics' default network environment. Only same-region S3 and ECR are accessible; external internet connectivity is not available.
VPC Mode Run
When running in VPC mode, the Networking mode field shows Virtual Private Cloud (VPC), with an additional Configuration field below it. This displays the linked VPC Configuration name (e.g., tutorial-vpc-config) as a clickable link, leading to the Configuration details (VPC ID, subnets, security groups, etc.).
Both runs used the same workflow (vpc-connectivity-test-v3, WDL), with similar execution times (RESTRICTED: 5m 49s, VPC: 5m 21s).
Test Results
We ran the same WDL workflow in both RESTRICTED and VPC modes in the ap-northeast-2 (Seoul) region to compare connectivity.
Test Cases
Category A (Internet Access):
- A1: curl https://checkip.amazonaws.com (outbound HTTPS; verifies the NAT gateway's public IP)
- A2: wget https://ftp.ncbi.nlm.nih.gov/robots.txt (NCBI public resource download)
- A3: curl https://api.github.com (external REST API access)
Category B (Cross-Region S3 Access; buckets in us-east-1, workflow running in ap-northeast-2):
- B1: aws s3 cp (cross-region file download)
- B2: aws s3 ls (cross-region bucket listing)
Results Comparison
| Test | RESTRICTED Mode | VPC Mode |
|---|---|---|
| A1 — checkip.amazonaws.com | FAIL (timeout) | PASS (NAT IP returned) |
| A2 — NCBI robots.txt download | FAIL (timeout) | PASS (file downloaded) |
| A3 — GitHub API call | FAIL (timeout) | PASS (HTTP 200) |
| B1 — Cross-region S3 download | Blocked at API | PASS (file downloaded) |
| B2 — Cross-region S3 listing | Blocked at API | PASS (object list returned) |
In RESTRICTED mode, all internet tests fail with timeouts, and cross-region S3 is rejected at the StartRun API call with ValidationException: S3 bucket not located in ap-northeast-2 region.
In VPC mode, all 5 tests pass.
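The pattern in the table follows from two behaviors described in this post. The sketch below encodes them so you can predict the result of a new test before running it. It is a hypothetical summary model, not HealthOmics code. It only covers cross-region S3 URIs passed as run parameters, which is the case that was tested.

```typescript
// Hypothetical summary model of the results table above; not HealthOmics code.
// Rule 1 (RESTRICTED): S3 URIs in run parameters must be in the run's region.
//   This is checked at StartRun, so the run never starts.
// Rule 2 (RESTRICTED): other outbound connections from tasks time out.
// VPC mode: both kinds of traffic leave through the customer VPC's NAT gateway.

type NetworkingMode = "RESTRICTED" | "VPC";
type TestTarget = "internet" | "cross-region-s3-parameter";
type Outcome = "PASS" | "FAIL (timeout)" | "Blocked at StartRun";

function expectedOutcome(mode: NetworkingMode, target: TestTarget): Outcome {
  if (mode === "VPC") {
    return "PASS";
  }
  if (target === "cross-region-s3-parameter") {
    return "Blocked at StartRun"; // ValidationException before any task runs
  }
  return "FAIL (timeout)";
}

// The five test cases from this post, printed as "RESTRICTED | VPC".
const cases: Array<[string, TestTarget]> = [
  ["A1", "internet"],
  ["A2", "internet"],
  ["A3", "internet"],
  ["B1", "cross-region-s3-parameter"],
  ["B2", "cross-region-s3-parameter"],
];
for (const [id, target] of cases) {
  console.log(`${id}: ${expectedOutcome("RESTRICTED", target)} | ${expectedOutcome("VPC", target)}`);
}
```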
Infrastructure Setup with CDK
To use VPC mode, you need to create a VPC meeting HealthOmics requirements and a HealthOmics Configuration. The HealthOmicsVpc CDK L3 Construct simplifies this process.
This CDK Construct creates everything with a single cdk deploy:
- VPC: Public/private subnets automatically placed in HealthOmics-supported AZs
- NAT Gateway: Choose development (1) or production (1 per AZ) mode
- Security Group: Least privilege principle (outbound HTTPS 443 only)
- S3 Gateway Endpoint: Same-region S3 traffic bypasses the NAT gateway, avoiding its data processing charges
- VPC Flow Logs: Automatically sent to CloudWatch Logs
- HealthOmics Configuration: Automatically created and lifecycle-managed via Custom Resource
Usage Example
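import { App, Stack } from 'aws-cdk-lib';
import { HealthOmicsVpc } from './lib'; // construct shipped in the healthomics-vpc-cdk repository
const app = new App();
const stack = new Stack(app, 'HealthOmicsVpcStack', {
  env: { account: process.env.CDK_DEFAULT_ACCOUNT, region: process.env.CDK_DEFAULT_REGION },
});
new HealthOmicsVpc(stack, 'HealthOmicsVpc', {
  networkingConfigurationName: 'my-vpc-config', // later passed to --configuration-name
  deploymentMode: 'development', // development: 1 NAT GW (cost savings); production: 1 per AZ
  vpcEndpoints: ['s3'], // S3 Gateway endpoint
});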
Deployment Steps
# Set environment variables
export AWS_REGION=us-east-1
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
# CDK deployment
cd healthomics-vpc-cdk-main
npm install
npx cdk bootstrap aws://${AWS_ACCOUNT_ID}/${AWS_REGION}
npx cdk deploy --require-approval never
Once the Configuration reaches ACTIVE status (approximately 5 minutes), you can run workflows in VPC mode.
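Instead of sleeping a fixed five minutes, a deployment script can poll until the Configuration reports ACTIVE. The sketch below is a generic poller.

- getStatus is a hypothetical callback. You would implement it with the HealthOmics API or CLI call that returns the Configuration's status; that call is not shown because this post does not name it.
- Every status other than ACTIVE is treated as still in progress until the timeout.
- The intermediate status string in the usage example is made up for the simulation.

```typescript
// Generic polling sketch: wait until a HealthOmics Configuration is ACTIVE.
// Assumption: getStatus is a callback you implement with the HealthOmics API or
// CLI call that returns the Configuration's status; it is injected here.

type Sleep = (ms: number) => Promise<void>;

const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

interface WaitOptions {
  intervalMs?: number; // time between polls
  timeoutMs?: number; // give up after this much waiting
  sleep?: Sleep; // injectable so tests need not wait
}

async function waitForActive(
  getStatus: () => Promise<string>,
  { intervalMs = 15000, timeoutMs = 15 * 60 * 1000, sleep = realSleep }: WaitOptions = {},
): Promise<number> {
  let waitedMs = 0;
  for (;;) {
    const status = await getStatus();
    if (status === "ACTIVE") {
      return waitedMs; // total time spent sleeping before ACTIVE was seen
    }
    // Any other status is treated as "still in progress" until the timeout.
    if (waitedMs >= timeoutMs) {
      throw new Error(`Configuration not ACTIVE after ${waitedMs} ms (last status: ${status})`);
    }
    await sleep(intervalMs);
    waitedMs += intervalMs;
  }
}

// Usage with a simulated status sequence (the non-ACTIVE value is hypothetical).
const simulated = ["CREATING", "CREATING", "ACTIVE"];
let call = 0;
waitForActive(async () => simulated[Math.min(call++, simulated.length - 1)], {
  intervalMs: 10,
  sleep: async () => {},
}).then((waited) => console.log(`ACTIVE after ${waited} ms of simulated waiting`)); // 20
```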
Running Workflows
RESTRICTED Mode (Default)
aws omics start-run \
--workflow-id <WORKFLOW_ID> \
--role-arn <ROLE_ARN> \
--output-uri s3://my-bucket/output/ \
--parameters '{"output_s3_uri": "s3://my-bucket/report.json"}'
VPC Mode
aws omics start-run \
--workflow-id <WORKFLOW_ID> \
--role-arn <ROLE_ARN> \
--output-uri s3://my-bucket/output/ \
--networking-mode VPC \
--configuration-name my-vpc-config \
--parameters '{
"output_s3_uri": "s3://my-bucket/report.json",
"cross_region_s3_uri": "s3://bucket-in-other-region/data.txt"
}'
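Apart from the workflow parameters, the only difference is the addition of --networking-mode VPC and --configuration-name. The configuration name is the HealthOmics Configuration created by the CDK stack.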
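When a script launches the runs, building the argument list in one place keeps the two modes from drifting apart. Below is a minimal sketch that uses only the flags shown above. The RunOptions interface and startRunArgs function are hypothetical.

```typescript
// Sketch: build the "aws omics start-run" argument list for either mode.
// Only flags shown in this post are used; RunOptions/startRunArgs are hypothetical.

interface RunOptions {
  workflowId: string;
  roleArn: string;
  outputUri: string;
  parameters: Record<string, string>;
  vpcConfigurationName?: string; // when set, the run uses VPC mode
}

function startRunArgs(o: RunOptions): string[] {
  const args = [
    "omics", "start-run",
    "--workflow-id", o.workflowId,
    "--role-arn", o.roleArn,
    "--output-uri", o.outputUri,
  ];
  if (o.vpcConfigurationName !== undefined) {
    // The only additions VPC mode needs.
    args.push("--networking-mode", "VPC", "--configuration-name", o.vpcConfigurationName);
  }
  // Serialize parameters once, as a single argument.
  args.push("--parameters", JSON.stringify(o.parameters));
  return args;
}

const base: RunOptions = {
  workflowId: "<WORKFLOW_ID>",
  roleArn: "<ROLE_ARN>",
  outputUri: "s3://my-bucket/output/",
  parameters: { output_s3_uri: "s3://my-bucket/report.json" },
};
console.log(startRunArgs(base).join(" "));
console.log(startRunArgs({ ...base, vpcConfigurationName: "my-vpc-config" }).join(" "));
```

You can pass the array to a process launcher such as Node's child_process.execFile("aws", args, callback). The --parameters JSON then travels as a single argument and needs no shell quoting.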
Things to Know
ECR Image Configuration
HealthOmics only supports ECR private repository images. Docker Hub and Public ECR images cannot be used. In addition to IAM role permissions, you must set an access policy on the ECR repository itself for the omics.amazonaws.com service principal.
aws ecr set-repository-policy \
--repository-name my-repo \
--policy-text '{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"Service": "omics.amazonaws.com"},
"Action": [
"ecr:BatchGetImage",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchCheckLayerAvailability"
]
}]
}'
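Only private ECR images can be pulled, so it can help to check a workflow's image references before registering it, for example the docker values in WDL runtime sections. Below is a minimal sketch.

- The regex is an assumption. It covers the standard <account>.dkr.ecr.<region>.amazonaws.com/<repository> form and nothing else.
- The account ID in the usage example is a placeholder.

```typescript
// Sketch: flag container image references that are not private ECR URIs
// (Docker Hub names, public.ecr.aws, etc.). The regex covers the standard
// commercial-partition ECR hostname form only; it is an assumption, not a spec.

interface PrivateEcrImage {
  accountId: string;
  region: string;
  repository: string;
}

const PRIVATE_ECR =
  /^(\d{12})\.dkr\.ecr\.([a-z0-9-]+)\.amazonaws\.com\/([a-z0-9._\/-]+)(?::[\w.-]+)?(?:@sha256:[a-f0-9]{64})?$/;

function parsePrivateEcr(image: string): PrivateEcrImage | null {
  const m = PRIVATE_ECR.exec(image);
  if (m === null) {
    return null; // not a private ECR reference
  }
  return { accountId: m[1], region: m[2], repository: m[3] };
}

for (const img of [
  "123456789012.dkr.ecr.ap-northeast-2.amazonaws.com/vpc-test:latest", // placeholder account
  "ubuntu:22.04", // Docker Hub
  "public.ecr.aws/docker/library/ubuntu:22.04", // Public ECR
]) {
  const parsed = parsePrivateEcr(img);
  console.log(img, "->", parsed ? `private ECR repo ${parsed.repository} in ${parsed.region}` : "push to a private ECR repository first");
}
```

The repository name the parser returns is the repository that needs the omics.amazonaws.com repository policy shown above.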
S3 Parameter Validation
HealthOmics validates S3 URIs in workflow parameters at StartRun call time:
- Referenced S3 objects must exist (for an output path passed as a parameter, upload an empty placeholder object to that key first)
- In RESTRICTED mode, S3 buckets must be in the same region
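You can reproduce both checks locally before calling StartRun. The sketch below walks the parameter JSON, including nested arrays and objects, collects every s3:// string, and applies the two rules above. Its assumptions:

- You have already looked up bucket regions and object existence yourself and pass them in as plain data.
- Every URI is treated as an object key, so directory-style prefixes would need different handling.
- All names are hypothetical.

```typescript
// Local pre-flight sketch mirroring the two StartRun validations above.
// Assumption: bucket regions and existing object keys are supplied by you
// (e.g. from your own S3 lookups); nothing here calls AWS.

type RunMode = "RESTRICTED" | "VPC";

interface S3Facts {
  bucketRegion: Record<string, string>; // bucket -> region
  existingObjects: Set<string>; // "bucket/key"
}

// Recursively collect every string that looks like an S3 URI.
function collectS3Uris(value: unknown, out: string[] = []): string[] {
  if (typeof value === "string") {
    if (value.startsWith("s3://")) out.push(value);
  } else if (Array.isArray(value)) {
    for (const v of value) collectS3Uris(v, out);
  } else if (value !== null && typeof value === "object") {
    for (const v of Object.values(value)) collectS3Uris(v, out);
  }
  return out;
}

function preflight(params: unknown, mode: RunMode, runRegion: string, facts: S3Facts): string[] {
  const problems: string[] = [];
  for (const uri of collectS3Uris(params)) {
    const path = uri.slice("s3://".length); // "bucket/key"
    const bucket = path.split("/")[0];
    if (!facts.existingObjects.has(path)) {
      problems.push(`${uri}: object does not exist (create a placeholder for output paths)`);
    }
    const region = facts.bucketRegion[bucket];
    if (mode === "RESTRICTED" && region !== undefined && region !== runRegion) {
      problems.push(`${uri}: bucket is in ${region}; RESTRICTED mode requires ${runRegion}`);
    }
  }
  return problems;
}

const params = {
  output_s3_uri: "s3://my-bucket/report.json",
  cross_region_s3_uri: "s3://bucket-in-other-region/data.txt",
};
const facts: S3Facts = {
  bucketRegion: { "my-bucket": "ap-northeast-2", "bucket-in-other-region": "us-east-1" },
  existingObjects: new Set(["my-bucket/report.json", "bucket-in-other-region/data.txt"]),
};
console.log(preflight(params, "RESTRICTED", "ap-northeast-2", facts)); // one cross-region problem
console.log(preflight(params, "VPC", "ap-northeast-2", facts)); // []
```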
Performance Impact
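VPC mode adds approximately 30–60 seconds to startup time due to ENI provisioning. The execution time of the workflow tasks themselves shows no significant difference. Normal run-to-run variation can also mask the added startup time: in the tests above, the VPC-mode run finished about 30 seconds sooner overall (5m 21s vs. 5m 49s).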
Cost Considerations
| Resource | Cost | Notes |
|---|---|---|
| NAT Gateway | ~$0.045/hr per gateway + per-GB data processing (rates vary by region) | Largest cost component |
| S3 Gateway Endpoint | Free | Same-region S3 access |
| VPC Flow Logs | CloudWatch Logs ingestion cost | Useful for troubleshooting |
For testing or development, use development mode (1 NAT gateway) and clean up resources immediately after testing to avoid unnecessary costs.
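A back-of-envelope estimate makes the cleanup advice concrete. This sketch uses the ~$0.045/hr figure from the table. This post does not give the per-GB data processing rate, so it is a parameter; fill it in from the NAT gateway pricing page for your region. The function name is hypothetical.

```typescript
// Back-of-envelope NAT gateway cost using the ~$0.045/hr figure from the table.
// Assumption: perGbUsd (data processing) comes from your region's pricing page;
// this post does not state it.

function natCostUsd(
  natGateways: number,
  hours: number,
  gbProcessed: number,
  perGbUsd: number,
  hourlyUsd = 0.045,
): number {
  return natGateways * hours * hourlyUsd + gbProcessed * perGbUsd;
}

const HOURS_PER_MONTH = 730;
console.log(natCostUsd(1, HOURS_PER_MONTH, 0, 0).toFixed(2)); // "32.85" (development, idle month)
console.log(natCostUsd(3, HOURS_PER_MONTH, 0, 0).toFixed(2)); // "98.55" (production, 3 AZs, idle month)
console.log(natCostUsd(1, 2, 0, 0).toFixed(2)); // "0.09" (two-hour test session)
```

By this estimate, a single development-mode NAT gateway left running for a month accrues about $32.85 in hourly charges alone. Deleting it after a two-hour test keeps the hourly charges near $0.09.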
Supported Regions
| Region | AZ Count |
|---|---|
| us-east-1 (N. Virginia) | 4 |
| us-west-2 (Oregon) | 3 |
| eu-west-1 (Ireland) | 3 |
| eu-west-2 (London) | 3 |
| eu-central-1 (Frankfurt) | 3 |
| ap-southeast-1 (Singapore) | 3 |
| ap-northeast-2 (Seoul) | 3 |
| il-central-1 (Tel Aviv) | 3 |
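The table also determines how many NAT gateways the construct's production mode implies. Below is a minimal sketch. It assumes the construct places one NAT gateway in every supported AZ listed above; the source describes production mode only as "1 per AZ". The names are hypothetical.

```typescript
// Sketch: validate a region against the table above and derive the NAT gateway
// count per deployment mode. Assumption: production mode uses every listed AZ.

const SUPPORTED_AZ_COUNT: Record<string, number> = {
  "us-east-1": 4,
  "us-west-2": 3,
  "eu-west-1": 3,
  "eu-west-2": 3,
  "eu-central-1": 3,
  "ap-southeast-1": 3,
  "ap-northeast-2": 3,
  "il-central-1": 3,
};

type DeploymentMode = "development" | "production";

function natGatewayCount(region: string, mode: DeploymentMode): number {
  const azs = SUPPORTED_AZ_COUNT[region];
  if (azs === undefined) {
    throw new Error(`${region} is not in the supported-region table`);
  }
  // development: a single shared NAT gateway; production: one per AZ.
  return mode === "development" ? 1 : azs;
}

console.log(natGatewayCount("ap-northeast-2", "production")); // 3
console.log(natGatewayCount("us-east-1", "production")); // 4
console.log(natGatewayCount("us-east-1", "development")); // 1
```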
Summary
VPC-Connected Workflows significantly expand the network accessibility of HealthOmics workflows. This feature is particularly useful for bioinformatics pipelines that require access to external databases, API calls, or cross-region data. Using the CDK L3 Construct, you can complete the complex VPC infrastructure setup with a single command.