AWS networks grow organically. A VPC here for production, one there for staging, a Transit Gateway to connect them, a peering added in a hurry during an incident. A year later, you have seven VPCs and no clear picture of what can reach what.

We built a representative 7-VPC environment — the kind of setup most teams have after a year or two of growth — and ran one Netway scan on it. We found four risks. None of them were visible in the AWS console.


The Environment

Seven VPCs across three environment groups:

Production

prod-api (10.0.0.0/16), prod-data (10.1.0.0/16), prod-shared (10.2.0.0/16) — all connected via Transit Gateway

Staging

staging-api (10.3.0.0/16), staging-data (10.4.0.0/16) — also on the TGW, but on a separate TGW route table to keep them isolated from production

Development

dev (10.0.0.0/16), dev-tools (10.5.0.0/16) — no TGW, no connectivity

The TGW setup looks clean on paper. Separate route tables for prod and staging. No cross-environment propagations. Everything appears isolated.

One scan later, it wasn't.


The Four Findings

1. Isolation Breach (CRITICAL)

The problem: A VPC peering connection between staging-api and prod-api that bypasses the Transit Gateway entirely.

The TGW route tables correctly separate production and staging at the TGW layer. But someone added a direct VPC peering between staging-api and prod-api, probably months ago during a debugging session, and never removed it. Routes through the peering were added to a route table on each side:

  • prod-api private route table: 10.3.0.0/16 → VpcPeeringConnectionId
  • staging-api route table: 10.0.0.0/16 → VpcPeeringConnectionId

The TGW isolation is irrelevant. The peering is a direct path.
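To make the check concrete, here is a minimal sketch of how routes like these could be spotted in an exported route list. It is not how Netway works internally. The one-route-per-line format, the `peering_routes` helper, and the `pcx-EXAMPLE` / `tgw-EXAMPLE` IDs are all assumptions made for illustration; the only AWS fact it relies on is that peering connection IDs start with `pcx-`.

```shell
# Sample route dump, one route per line: <route-table> <destination> <target>.
# The IDs are made-up placeholders, not real resources.
routes='prod-api-private 10.3.0.0/16 pcx-EXAMPLE
prod-api-private 10.1.0.0/16 tgw-EXAMPLE
staging-api 10.0.0.0/16 pcx-EXAMPLE
staging-api 10.4.0.0/16 tgw-EXAMPLE'

# Print every route whose target is a VPC peering connection.
peering_routes() {
  printf '%s\n' "$1" | awk '$3 ~ /^pcx-/ { print $1 ": " $2 " -> " $3 }'
}

peering_routes "$routes"
```

On the sample data this prints the two peering routes listed above and ignores the TGW routes. Any peering route whose destination belongs to a different environment deserves a closer look.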

Why it matters: prod-api carries PCI scope. A direct network path from a non-PCI staging environment into a PCI-scoped production VPC is an automatic PCI-DSS finding under requirements 1.3.1 and 1.3.2. Your QSA will flag it. The audit will be delayed.

Netway traced the exact path: prod-api → peering connection → staging-api → staging-data. Every hop is logged, timestamped, and included in the signed compliance report.

The fix

Delete the VPC peering connection. If the debugging use case was legitimate, use SSM Session Manager with proper role scoping — not a persistent network path between environments.

2. CIDR Conflict (HIGH)

The problem: The dev VPC has CIDR block 10.0.0.0/16 — identical to prod-api.

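This is a silent failure waiting to happen. Right now, dev has no connectivity to anything: no TGW, no peering, no IGW. The conflict is harmless in isolation. The moment someone tries to connect dev to the rest of the network, things break. A peering request to prod-api is rejected outright because the CIDRs overlap. A TGW attachment is accepted, but a Transit Gateway cannot route between VPCs with identical CIDRs: dev's route is not propagated, and a static route added to work around that can send traffic intended for prod-api to dev instead.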

Why it matters: CIDR conflicts are invisible until you try to do something with the conflicting VPC. By then, the fix requires re-IPing one of the VPCs — a significant operation that touches every subnet, every route table, every security group reference, and every workload. The earlier you catch it, the cheaper it is.
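Catching the overlap early does not need special tooling. The sketch below is one way to compare every pair of VPC CIDRs in plain shell, using the seven VPCs from this environment. The helper names (`ip_to_int`, `cidr_range`, `cidrs_overlap`, `find_conflicts`) and the `name=cidr` list format are assumptions made for this example, and it handles IPv4 only.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Print "first last" (as integers) for a CIDR block such as 10.0.0.0/16.
cidr_range() (
  base=$(ip_to_int "${1%/*}")
  bits=${1#*/}
  size=$(( 1 << (32 - bits) ))
  first=$(( base / size * size ))
  echo "$first $(( first + size - 1 ))"
)

# Succeed (exit 0) if the two CIDR blocks share at least one address.
cidrs_overlap() (
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
)

# Compare every unordered pair in a space-separated list of name=cidr entries.
find_conflicts() (
  set -- $1
  while [ $# -gt 1 ]; do
    a=$1; shift
    for b in "$@"; do
      if cidrs_overlap "${a#*=}" "${b#*=}"; then
        echo "CONFLICT: ${a%%=*} (${a#*=}) overlaps ${b%%=*} (${b#*=})"
      fi
    done
  done
)

vpcs='prod-api=10.0.0.0/16 prod-data=10.1.0.0/16 prod-shared=10.2.0.0/16 staging-api=10.3.0.0/16 staging-data=10.4.0.0/16 dev=10.0.0.0/16 dev-tools=10.5.0.0/16'
find_conflicts "$vpcs"
```

For this environment the only pair reported is prod-api and dev. Running a check like this before any new attachment or peering request is far cheaper than re-IPing a VPC afterwards.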

The fix

Re-CIDR dev to a non-overlapping block (e.g., 10.10.0.0/16) before it ever gets connected to anything.

3. Orphaned VPCs (MEDIUM)

The problem: Neither dev nor dev-tools has connectivity to anything. No Internet Gateway. No NAT Gateway. No TGW attachment. No peering. They're islands.

This is less urgent than the first two — orphaned VPCs don't create a security risk on their own. But they signal one of two things: either these VPCs were created and then abandoned (wasted spend on whatever is running inside), or they're intended to be connected but the setup was never completed.

In the second case, the CIDR conflict in dev makes any future connection dangerous.
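The definition of "orphaned" used here is simple enough to sketch: a VPC with no gateway, attachment, or peering of any kind. The example below assumes a hand-written inventory built only from the connectivity described in this post; the `orphaned_vpcs` helper and the two-column `<vpc> <kind>` format are hypothetical, not an AWS or Netway output format.

```shell
# Every VPC in the environment.
all_vpcs='prod-api prod-data prod-shared staging-api staging-data dev dev-tools'

# Known connectivity, one "<vpc> <kind>" pair per line (taken from this post).
attachments='prod-api tgw
prod-api peering
prod-api nat
prod-data tgw
prod-shared tgw
staging-api tgw
staging-api peering
staging-data tgw'

# Print each VPC that has no entry at all in the attachment list.
orphaned_vpcs() (
  for vpc in $1; do
    printf '%s\n' "$2" | grep -q "^$vpc " || echo "$vpc"
  done
)

orphaned_vpcs "$all_vpcs" "$attachments"
```

This prints dev and dev-tools. The trailing space in the pattern matters: without it, a lookup for dev would also match a dev-tools entry.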

The fix

For genuinely abandoned VPCs, delete them. For VPCs that should be connected, complete the setup — and for dev, re-CIDR first.

4. S3 via NAT Gateway (HIGH)

The problem: An EC2 instance in prod-api's private subnet is sending S3 traffic through the NAT Gateway.

The NAT Gateway and the EC2 instance are in different AZs, so traffic also crosses an AZ boundary on its way to the NAT. The private route table sends 0.0.0.0/0 to the NAT. There is no S3 VPC Gateway Endpoint. Every S3 operation travels:

  1. EC2 instance (private subnet)
  2. NAT Gateway — $0.045/GB NAT processing
  3. Out to the public S3 endpoint
  4. Back the same route

For a data pipeline writing terabytes to S3 monthly, this adds up silently: at $0.045/GB, every terabyte costs roughly $45 in NAT processing alone, so a few terabytes a month means hundreds of dollars. Netway identified the specific instance, VPC, and region, and extrapolated the monthly cost from the flow log traffic volume.
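The arithmetic is easy to reproduce. The sketch below multiplies a monthly S3 volume by the $0.045/GB processing rate quoted above; the `nat_processing_cost` helper and the 5,000 GB figure are assumptions for illustration, and the result leaves out the NAT Gateway's hourly charge and any cross-AZ transfer charges.

```shell
# NAT Gateway data-processing rate quoted above, in USD per GB.
NAT_RATE_PER_GB=0.045

# Print the monthly NAT processing cost in USD for a given volume in GB.
nat_processing_cost() {
  awk -v gb="$1" -v rate="$NAT_RATE_PER_GB" 'BEGIN { printf "%.2f\n", gb * rate }'
}

# Hypothetical pipeline moving 5,000 GB (about 5 TB) through the NAT per month.
nat_processing_cost 5000   # prints 225.00
```

Swap in the volume from your own flow logs to see what the endpoint would save.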

The fix

# 1. Create a free S3 Gateway Endpoint in prod-api
aws ec2 create-vpc-endpoint \
  --vpc-id <prod-api-vpc-id> \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-2.s3 \
  --route-table-ids <prod-api-private-route-table-id>

# 2. Put the NAT Gateway in the same AZ as the EC2 instance (us-east-2b).
#    A NAT Gateway cannot be moved, so create a new one in a us-east-2b public
#    subnet and repoint the private route table at it
#    (or move EC2 to us-east-2a; pick the direction that fits your architecture)

S3 Gateway Endpoints are free. Once one is in place, S3 traffic to buckets in the same region routes privately inside AWS, and the NAT data-processing charge for that traffic disappears entirely.


What One Scan Found

| Finding | Severity | Risk |
|---|---|---|
| Isolation breach: prod-api ↔ staging-api via VPC peering | CRITICAL | PCI-DSS audit finding (1.3.1, 1.3.2) |
| CIDR conflict: dev shares 10.0.0.0/16 with prod-api | HIGH | Silent routing failure if dev is ever connected |
| Orphaned VPCs: dev and dev-tools have no connectivity | MEDIUM | Wasted spend, incomplete setup |
| S3 via NAT + NAT in wrong AZ: prod-api EC2 | HIGH | Hundreds of dollars/month at production data volumes |

None of these were visible in the AWS console. The VPC console shows the peering as "active". The route tables look correct. The CIDR overlap is not flagged until you try to connect the conflicting VPC. The S3 cost shows up as an aggregate line item on the bill with no resource attribution.

Why These Patterns Are So Common

The isolation breach came from a VPC peering added during debugging and left in place. This happens constantly — the peering solved an immediate problem, nobody wrote it down, nobody removed it.

The CIDR conflict came from dev being created by copying a VPC configuration without changing the CIDR. A 30-second mistake that's invisible until it causes a routing failure months later.

The orphaned VPCs are the result of VPC creation being easier than VPC deletion. The setup was started, deprioritized, and never completed.

The S3-via-NAT pattern is the default. A private subnet with a NAT Gateway and no S3 endpoint — that's what you get when you follow the standard "private subnet" architecture without the optimization step. Most teams set it up once and move on.

These aren't mistakes made by teams who don't know what they're doing. They're the natural result of infrastructure growing faster than documentation and faster than audits.


Running This Yourself

  1. Register at netway.basavytix.com
  2. Run the CloudFormation deploy command shown in your dashboard
  3. Run the scan
  4. See what's actually in your network
