VPC Flow Logs are among the most valuable and least used data sources in most AWS accounts. They record metadata about nearly every network flow in your VPC (source, destination, port, protocol, byte count), and they are the most direct way to answer questions like "what is my NAT Gateway actually processing?" or "which instance is generating all this S3 egress traffic?"

Most teams enable Flow Logs for compliance reasons, store them in S3, and never look at them again. The cost signals they contain go unread month after month.

This post explains what Flow Logs contain, shows how to query them with Athena, and walks through four SQL queries that find the traffic patterns driving avoidable charges.


What VPC Flow Logs Capture

A VPC Flow Log record summarizes one flow on a network interface over an aggregation interval, which is 10 minutes by default and can be set to 1 minute. Each record contains:

Field                 What it tells you
srcaddr               Source IP address
dstaddr               Destination IP address
srcport / dstport     Source and destination ports
protocol              Protocol number (6=TCP, 17=UDP, 1=ICMP)
bytes                 Total bytes transferred in the flow
packets               Total packets
action                ACCEPT or REJECT
flow-direction        ingress or egress
pkt-src-aws-service   AWS service at the source, if applicable (e.g. S3, AMAZON)
pkt-dst-aws-service   AWS service at the destination, if applicable; key for identifying S3-via-NAT

The pkt-src-aws-service and pkt-dst-aws-service fields are particularly useful for cost analysis: they identify when traffic is going to or from a specific AWS service, making it possible to detect patterns like S3 traffic routing through NAT.

What Flow Logs do not capture: the contents of network traffic. Flow Logs record metadata only — IP addresses, ports, byte counts. They cannot see the data inside your connections.


Enabling VPC Flow Logs with the Extended Format

Flow Logs can be enabled at the VPC, subnet, or network interface level. For cost analysis, VPC-level logging gives the broadest coverage. The extended log format including pkt-src-aws-service and pkt-dst-aws-service is essential — the default format omits these fields.

Enable VPC Flow Logs → S3
# Keep the --log-format value on a single line: inside single quotes a trailing
# backslash is literal text, not a line continuation.
aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids <vpc-id> \
  --traffic-type ALL \
  --log-destination-type s3 \
  --log-destination arn:aws:s3:::<your-flow-log-bucket> \
  --log-format '${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${flow-direction} ${pkt-srcaddr} ${pkt-dstaddr} ${pkt-src-aws-service} ${pkt-dst-aws-service}'
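
To make the record layout concrete, here is a minimal Python sketch that splits one space-delimited line in the field order of the log format above. The function name and the sample record are hypothetical, made up for illustration; real records will carry your own account, interface, and address values.

```python
# Field order mirrors the --log-format string in the create-flow-logs command.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes", "start", "end",
    "action", "flow_direction", "pkt_srcaddr", "pkt_dstaddr",
    "pkt_src_aws_service", "pkt_dst_aws_service",
]
INT_FIELDS = {"version", "srcport", "dstport", "protocol",
              "packets", "bytes", "start", "end"}


def parse_flow_record(line: str) -> dict:
    """Split one space-delimited flow log line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    for name in INT_FIELDS:
        # Flow Logs write "-" when a value is not available for a record.
        record[name] = None if record[name] == "-" else int(record[name])
    return record


# Hypothetical record: a NAT Gateway interface sending 1.5 MB to an S3 address.
SAMPLE = (
    "5 123456789012 eni-0abc1234567890def 10.0.0.220 52.216.0.10 "
    "43210 443 6 120 1500000 1750000000 1750000060 "
    "ACCEPT egress 10.0.0.220 52.216.0.10 - S3"
)
record = parse_flow_record(SAMPLE)
print(record["bytes"], record["pkt_dst_aws_service"])  # prints: 1500000 S3
```

Note that srcaddr in the sample is a private address: Flow Logs report the private IPv4 address of the interface rather than any public address mapped to it.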

Setting Up Athena

Athena needs a table definition before it can query the Flow Log objects in S3. Create a partitioned table so that each query reads only the days it needs, which keeps queries fast and cost-effective:

Create Athena table
CREATE EXTERNAL TABLE vpc_flow_logs (
  version      int,
  account_id   string,
  interface_id string,
  srcaddr      string,
  dstaddr      string,
  srcport      int,
  dstport      int,
  protocol     bigint,
  packets      bigint,
  bytes        bigint,
  -- start and end are backtick-quoted because they collide with SQL keywords
  `start`      bigint,
  `end`        bigint,
  action               string,
  flow_direction       string,
  pkt_srcaddr          string,
  pkt_dstaddr          string,
  pkt_src_aws_service  string,
  pkt_dst_aws_service  string
)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LOCATION 's3://<bucket>/AWSLogs/<account-id>/vpcflowlogs/<region>/'
TBLPROPERTIES ('skip.header.line.count'='1');
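
A partitioned table returns no rows until its partitions are registered. The sketch below generates one ALTER TABLE ... ADD PARTITION statement per day, assuming the default Flow Logs S3 layout in which objects land under <region>/YYYY/MM/DD/. The function name is hypothetical, and the bucket path is the same placeholder used in the table definition.

```python
from datetime import date, timedelta


def add_partition_statements(table, base_location, first_day, last_day):
    """Return one ADD PARTITION statement per day in [first_day, last_day]."""
    base = base_location.rstrip("/")
    statements = []
    day = first_day
    while day <= last_day:
        y, m, d = f"{day:%Y}", f"{day:%m}", f"{day:%d}"
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (year='{y}', month='{m}', day='{d}') "
            f"LOCATION '{base}/{y}/{m}/{d}/';"
        )
        day += timedelta(days=1)
    return statements


# Placeholder location: substitute the LOCATION used in CREATE EXTERNAL TABLE.
for statement in add_partition_statements(
    "vpc_flow_logs",
    "s3://<bucket>/AWSLogs/<account-id>/vpcflowlogs/<region>/",
    date(2026, 6, 1),
    date(2026, 6, 3),
):
    print(statement)
```

Each generated statement is run as its own Athena query. For long-lived tables, partition projection may be a lower-maintenance alternative, though it is not covered here.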

Query 1 — S3 Traffic Routing Through NAT

Find S3 traffic routing through NAT Gateway

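Traffic matching this query leaves your NAT Gateway bound for S3, which means it paid the NAT data processing charge on the way. Any result here represents avoidable spend at $0.045/GB (the us-east-1 rate; other regions differ), because an S3 gateway endpoint carries the same traffic at no additional charge. Flow Logs record the private IPv4 address of a network interface, not its Elastic IP, so set the srcaddr filter to your NAT Gateway private IPs. The query counts bytes sent toward S3; to measure downloads, mirror it with pkt_src_aws_service = 'S3' and the NAT IPs in dstaddr.

SELECT
  srcaddr,
  dstaddr,
  pkt_dst_aws_service,
  SUM(bytes)                                          AS total_bytes,
  ROUND(SUM(bytes) / 1073741824.0, 3)              AS total_gb,
  ROUND(SUM(bytes) / 1073741824.0 * 0.045, 2)       AS estimated_cost_usd
FROM vpc_flow_logs
WHERE
  year = '2026' AND month = '06'
  AND pkt_dst_aws_service = 'S3'
  AND action = 'ACCEPT'
  AND srcaddr IN ('<nat-gateway-private-ip>')  -- your NAT Gateway private IP(s)
GROUP BY srcaddr, dstaddr, pkt_dst_aws_service
ORDER BY total_bytes DESC
LIMIT 50;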

Query 2 — Cross-AZ Traffic

Find high-volume flows between private IP ranges

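Cross-AZ traffic costs $0.01/GB per direction. This query surfaces the highest-volume private-to-private flows; cross-reference the IP addresses against your subnet AZ assignments to identify cross-AZ pairs. The estimated cost uses $0.02/GB to account for both directions. Two details matter: the flow_direction filter counts each flow once, since a VPC-level log records it on both the sending and the receiving interface, and the address pattern covers the whole 172.16.0.0/12 block (172.16 through 172.31), not only 172.16.x.x.

SELECT
  srcaddr,
  dstaddr,
  SUM(bytes)                                          AS total_bytes,
  ROUND(SUM(bytes) / 1073741824.0, 3)              AS total_gb,
  ROUND(SUM(bytes) / 1073741824.0 * 0.02, 2)        AS estimated_cost_usd
FROM vpc_flow_logs
WHERE
  year = '2026' AND month = '06'
  AND action = 'ACCEPT'
  AND flow_direction = 'egress'  -- sender side only, so bytes are not counted twice
  -- RFC 1918 private ranges: 10/8, 172.16/12, 192.168/16
  AND regexp_like(srcaddr, '^(10\.|172\.(1[6-9]|2[0-9]|3[01])\.|192\.168\.)')
  AND regexp_like(dstaddr, '^(10\.|172\.(1[6-9]|2[0-9]|3[01])\.|192\.168\.)')
GROUP BY srcaddr, dstaddr
ORDER BY total_bytes DESC
LIMIT 100;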
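
The cross-referencing step can be sketched in Python once the query results are exported. The subnet-to-AZ map below is hypothetical, as are the function names; in practice the map would be built from your own subnet inventory.

```python
import ipaddress

# Hypothetical subnet-to-AZ assignments; replace with your own subnets.
SUBNET_AZ = {
    "10.0.1.0/24": "us-east-1a",
    "10.0.2.0/24": "us-east-1b",
}
_NETWORKS = [(ipaddress.ip_network(cidr), az) for cidr, az in SUBNET_AZ.items()]


def az_for(ip):
    """Return the AZ of the subnet containing ip, or None if no subnet matches."""
    addr = ipaddress.ip_address(ip)
    for network, az in _NETWORKS:
        if addr in network:
            return az
    return None


def cross_az_pairs(rows):
    """Keep (srcaddr, dstaddr, total_bytes) rows whose endpoints are in different known AZs."""
    result = []
    for src, dst, total_bytes in rows:
        src_az, dst_az = az_for(src), az_for(dst)
        if src_az and dst_az and src_az != dst_az:
            result.append((src, dst, total_bytes, src_az, dst_az))
    return result


rows = [
    ("10.0.1.5", "10.0.2.9", 5_000_000_000),  # 1a -> 1b: cross-AZ
    ("10.0.1.5", "10.0.1.7", 9_000_000_000),  # same AZ: dropped
]
print(cross_az_pairs(rows))
```

Rows with an address outside every known subnet are dropped rather than guessed at, so an incomplete map understates cross-AZ traffic instead of overstating it.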

Query 3 — Internet Egress by Instance

Top sources of outbound traffic to public IPs

Finds the instances generating the most internet egress. The estimated cost uses $0.09/GB — the standard internet egress rate for the first 10 TB/month. Actual rates vary by region.

SELECT
  srcaddr,
  dstaddr,
  SUM(bytes)                                          AS total_bytes,
  ROUND(SUM(bytes) / 1073741824.0, 3)              AS total_gb,
  ROUND(SUM(bytes) / 1073741824.0 * 0.09, 2)        AS estimated_cost_usd
FROM vpc_flow_logs
WHERE
  year = '2026' AND month = '06'
  AND action = 'ACCEPT'
  AND flow_direction = 'egress'
  -- exclude private (10/8, 172.16/12, 192.168/16) and link-local (169.254/16) destinations
  AND NOT regexp_like(dstaddr, '^(10\.|172\.(1[6-9]|2[0-9]|3[01])\.|192\.168\.|169\.254\.)')
GROUP BY srcaddr, dstaddr
ORDER BY total_bytes DESC
LIMIT 100;
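
When post-processing results outside Athena, the same public-versus-private split can be done with Python's ipaddress module. This is a minimal sketch with a hypothetical function name; it treats the three RFC 1918 blocks and the link-local range as non-internet destinations. Note that 172.16.0.0/12 runs from 172.16.0.0 through 172.31.255.255, which a prefix match on 172.16. alone would miss.

```python
import ipaddress

# Ranges that are not internet destinations for this analysis (IPv4 only).
NON_INTERNET = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16")
]


def is_internet_destination(ip):
    """True if ip falls outside the private and link-local IPv4 ranges."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in network for network in NON_INTERNET)


for ip in ("172.16.4.4", "172.31.4.4", "172.32.0.1", "8.8.8.8"):
    print(ip, is_internet_destination(ip))
```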

Query 4 — NAT Gateway Processing by Day

Daily NAT Gateway traffic volume and cost

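Gives a daily breakdown of NAT Gateway traffic, useful for identifying spikes and trending the cost over time. Replace the IP list with your NAT Gateway private IPs, since Flow Logs record private addresses rather than Elastic IPs. The filter matches dstaddr only: every byte the gateway processes arrives at its interface once and leaves once, so matching srcaddr as well would count it twice.

SELECT
  year, month, day,
  SUM(bytes)                                          AS total_bytes,
  ROUND(SUM(bytes) / 1073741824.0, 2)              AS total_gb,
  ROUND(SUM(bytes) / 1073741824.0 * 0.045, 2)       AS estimated_nat_cost_usd
FROM vpc_flow_logs
WHERE
  year = '2026' AND month = '06'  -- partition filter; widen it to trend across months
  AND dstaddr IN ('<nat-private-ip-1>', '<nat-private-ip-2>')
GROUP BY year, month, day
ORDER BY year, month, day;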

Why Doing This Manually Doesn't Scale

These queries work — but running them in practice is harder than it looks:

Scale

A busy VPC generates hundreds of millions of flow log records per day. Even with Athena and partitioning, a full-month query across a large environment can take minutes and scan gigabytes of data.

NAT Gateway IP identification

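The queries above require you to know the IP addresses of your NAT Gateways, and because Flow Logs record private addresses, that means each gateway's private IP rather than its Elastic IP. In multi-VPC environments with multiple NAT Gateways, keeping this list current is its own maintenance burden.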
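
One way to keep that list current is to derive it from the output of aws ec2 describe-nat-gateways. The sketch below works on the parsed JSON response and returns both the private IP and the Elastic IP of each available gateway; the function names and sample values are hypothetical. Flow Log srcaddr and dstaddr carry the private address.

```python
def nat_gateway_addresses(response):
    """Return (gateway_id, private_ip, public_ip) for each available NAT Gateway address."""
    addresses = []
    for gateway in response.get("NatGateways", []):
        if gateway.get("State") != "available":
            continue  # skip pending, deleting, deleted, or failed gateways
        for address in gateway.get("NatGatewayAddresses", []):
            addresses.append(
                (gateway.get("NatGatewayId"),
                 address.get("PrivateIp"),
                 address.get("PublicIp"))
            )
    return addresses


def sql_in_list(values):
    """Format values as the body of a SQL IN (...) list."""
    return ", ".join(f"'{value}'" for value in values)


# Hypothetical response, trimmed to the keys used here.
SAMPLE_RESPONSE = {
    "NatGateways": [
        {"NatGatewayId": "nat-0aaa", "State": "available",
         "NatGatewayAddresses": [{"PrivateIp": "10.0.0.220", "PublicIp": "203.0.113.10"}]},
        {"NatGatewayId": "nat-0bbb", "State": "deleted",
         "NatGatewayAddresses": [{"PrivateIp": "10.0.9.9", "PublicIp": "203.0.113.11"}]},
    ]
}
private_ips = [private for _, private, _ in nat_gateway_addresses(SAMPLE_RESPONSE)]
print(sql_in_list(private_ips))  # prints: '10.0.0.220'
```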

Joining with resource metadata

Flow Logs contain IP addresses, not instance IDs or VPC names. To answer "which instance is generating this traffic?" you need to join Flow Log data with your EC2 network interface inventory — a separate API call and data merge step.
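
The merge step might look like the following sketch, which indexes the parsed JSON from aws ec2 describe-network-interfaces by private IP and attaches the owning interface to each query result row. The function names, row shape, and sample values are assumptions made for illustration.

```python
def build_ip_index(response):
    """Map each private IP to details of the network interface that owns it."""
    index = {}
    for eni in response.get("NetworkInterfaces", []):
        info = {
            "interface_id": eni.get("NetworkInterfaceId"),
            # InstanceId is absent for service-managed interfaces such as NAT Gateways.
            "instance_id": eni.get("Attachment", {}).get("InstanceId"),
            "availability_zone": eni.get("AvailabilityZone"),
            "vpc_id": eni.get("VpcId"),
            "description": eni.get("Description"),
        }
        for private in eni.get("PrivateIpAddresses", []):
            index[private["PrivateIpAddress"]] = info
    return index


def enrich(rows, index):
    """Attach interface details to each row under the 'source' key (None if unknown)."""
    return [dict(row, source=index.get(row["srcaddr"])) for row in rows]


# Hypothetical inventory and query result, trimmed to the keys used here.
INVENTORY = {
    "NetworkInterfaces": [
        {"NetworkInterfaceId": "eni-0123", "VpcId": "vpc-0abc",
         "AvailabilityZone": "us-east-1a", "Description": "",
         "Attachment": {"InstanceId": "i-0456"},
         "PrivateIpAddresses": [{"PrivateIpAddress": "10.0.1.5"}]},
    ]
}
rows = [{"srcaddr": "10.0.1.5", "total_bytes": 42}, {"srcaddr": "10.0.7.7", "total_bytes": 1}]
for row in enrich(rows, build_ip_index(INVENTORY)):
    print(row["srcaddr"], row["source"] and row["source"]["instance_id"])
```

Private IPs are reassigned as instances come and go, so an inventory snapshot is only reliable for flow records from roughly the same time window.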

Ongoing maintenance

Traffic patterns change. New services get deployed. New VPCs get created. Running these queries once gives a point-in-time answer. The patterns need to be evaluated continuously to catch new waste as it appears.


How Netway Automates This

Netway runs these analyses automatically on a schedule — without you writing or maintaining any SQL.

A Lambda function deployed in your account queries your VPC Flow Logs via Athena, enriches the results with resource metadata from the EC2 API, and identifies the specific traffic patterns generating avoidable charges. For each finding you get the VPC, the instance, the pattern, the monthly cost, and the exact fix command.

Your flow log data never leaves your AWS account. Netway's Lambda runs inside your account and queries your own Athena workgroup. Only aggregated findings — VPC IDs, pattern names, cost estimates — are sent to the Netway dashboard.

Getting Started

1. Register at netway.basavytix.com
2. Run the CloudFormation deploy command shown in your dashboard
3. Run the scan
4. Cost findings appear in the Cost tab with monthly estimates and fix commands