July 2, 2026 | 10 min read | AWS | By Arunkumar Ganesan

AWS Lambda in Production: Triggers, Streams, Edge, and the Traps Teams Learn Late

Lambda is excellent when work is short, event driven, and easy to isolate. It gets expensive, noisy, or fragile when teams treat it like a tiny container that can ignore state, retries, downstream limits, and deployment safety.

Trigger first: The event source decides retry behavior, batch shape, and user latency.
State last: Reuse clients, but never trust memory for user or business truth.
Concurrency is pressure: Every fast scale up can become a database or API incident downstream.
Deploy by alias: Canary and blue green are traffic routing problems, not zip upload rituals.


Picture an order platform. Mobile clients submit orders through API Gateway. Payment providers call back asynchronously. Fulfillment events land on Kinesis. Receipt images arrive in S3. A few CloudFront redirects need to happen before the request reaches origin. Lambda can participate in all of that, but each path has a different failure model.

How Lambda usually enters the system

HTTP command: API Gateway, a Function URL, an ALB, or VPC Lattice invokes the function synchronously.
Service event: S3, SNS, EventBridge, Cognito, or Step Functions pushes an event to the function.
Queue batch: An event source mapping polls SQS and invokes Lambda with a batch of messages.
Stream batch: An event source mapping reads Kinesis or DynamoDB Streams shards and invokes Lambda with ordered batches of records, subject to checkpoint rules.
AWS Lambda: Each execution environment handles one invocation at a time.
Edge request: CloudFront can run Lambda at Edge for viewer and origin events.

Start with the trigger, not the function

Starting with the trigger is how I check whether a Lambda design makes ownership, failure handling, and rollback clearer.

Synchronous HTTP paths care about user latency, status codes, payload size, and client retry behavior. EventBridge and SNS paths care about decoupling and asynchronous retry. SQS paths care about visibility timeout, batch failure, and dead letter queues. Stream paths care about ordering, shard pressure, iterator age, and poison records. Edge paths care about global propagation and CloudFront restrictions.

This is why "we use Lambda" is too vague. The production design starts with the event source and then works inward to handler code, timeout, memory, concurrency, alarms, and deployment style.
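To make "start with the trigger" concrete, here is a small heuristic sketch that guesses the trigger kind from the payload shape and maps it to the default failure behavior described above. The keys checked are the usual ones for each source. The labels and the FAILURE_MODEL table are illustrative names, not an AWS API. Cognito, Step Functions, and VPC Lattice payloads are not covered and fall through to "unknown".

```python
def describe_trigger(event: dict) -> str:
    """Guess the kind of trigger from the event shape (heuristic sketch only)."""
    records = event.get("Records")
    if isinstance(records, list) and records:
        first = records[0]
        if "cf" in first:
            return "edge"  # CloudFront viewer or origin event for Lambda at Edge
        # SNS records capitalize EventSource; SQS, Kinesis, DynamoDB, and S3 do not.
        source = first.get("eventSource") or first.get("EventSource")
        return {
            "aws:sqs": "queue",
            "aws:kinesis": "stream",
            "aws:dynamodb": "stream",
            "aws:s3": "async",
            "aws:sns": "async",
        }.get(source, "unknown")
    if "requestContext" in event:
        return "http"  # API Gateway, Function URL, or ALB
    if "detail-type" in event and "source" in event:
        return "async"  # EventBridge invokes Lambda asynchronously
    return "unknown"


# What a failure turns into by default for each trigger kind.
FAILURE_MODEL = {
    "http": "the caller receives the error and decides whether to retry",
    "async": "Lambda retries up to twice by default, then an on-failure destination or dead letter queue",
    "queue": "the message reappears after the visibility timeout, then a dead letter queue if a redrive policy is set",
    "stream": "the batch is retried in order and blocks the shard until it succeeds, exhausts retries, or expires",
    "edge": "CloudFront returns an error response to the viewer",
    "unknown": "decide explicitly before launch",
}
```

In production the trigger should come from configuration, not inference. The point is that each branch implies a different retry, alarm, and dead letter plan.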

Lambda as a microservice boundary

When a function is meant to be a microservice boundary, I recommend writing its operational cost beside the architecture.

A Lambda function becomes a microservice only when it owns a meaningful boundary: contract, validation, data write, alarms, deployment, rollback, and operational responsibility. A random function named send_email is not automatically a service. An OrderCommand function behind API Gateway can be a microservice entry point if it owns order creation rules and protects the write path.

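import base64
import json
import os
import uuid
from datetime import datetime, timezone
from decimal import Decimal

import boto3
from botocore.exceptions import ClientError

# Clients are created once per execution environment and reused by warm invocations.
dynamodb = boto3.resource("dynamodb")
events = boto3.client("events")

orders = dynamodb.Table(os.environ["ORDER_TABLE"])
event_bus = os.environ["EVENT_BUS_NAME"]


def response(status_code, body):
    return {
        "statusCode": status_code,
        "headers": {"content-type": "application/json"},
        "body": json.dumps(body),
    }


def lambda_handler(event, context):
    # Header casing depends on the integration, so normalize names to lowercase.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    idempotency_key = headers.get("idempotency-key")

    if not idempotency_key:
        return response(400, {"message": "Idempotency-Key header is required"})

    raw_body = event.get("body") or "{}"
    if event.get("isBase64Encoded"):
        raw_body = base64.b64decode(raw_body).decode("utf-8")

    try:
        # The DynamoDB resource API rejects Python floats, so parse numbers as Decimal.
        body = json.loads(raw_body, parse_float=Decimal)
    except json.JSONDecodeError:
        return response(400, {"message": "body must be valid JSON"})

    if not isinstance(body, dict):
        return response(400, {"message": "body must be a JSON object"})

    customer_id = body.get("customerId")
    items = body.get("items", [])

    if not customer_id or not items:
        return response(400, {"message": "customerId and items are required"})

    # Without a client supplied orderId, derive one from the idempotency key so a
    # retried request hits the same item instead of minting a new random UUID.
    order_id = body.get("orderId") or str(
        uuid.uuid5(uuid.NAMESPACE_URL, f"{customer_id}:{idempotency_key}")
    )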
    now = datetime.now(timezone.utc).isoformat()

    item = {
        "pk": f"ORDER#{order_id}",
        "sk": "ORDER",
        "orderId": order_id,
        "customerId": customer_id,
        "items": items,
        "status": "ACCEPTED",
        "idempotencyKey": idempotency_key,
        "createdAt": now,
    }

    try:
        orders.put_item(
            Item=item,
            ConditionExpression="attribute_not_exists(pk)",
        )
    except ClientError as error:
        if error.response["Error"]["Code"] != "ConditionalCheckFailedException":
            raise

        existing = orders.get_item(
            Key={"pk": f"ORDER#{order_id}", "sk": "ORDER"},
            ConsistentRead=True,
        ).get("Item")

        if existing and existing.get("idempotencyKey") == idempotency_key:
            return response(200, {"orderId": order_id, "status": existing["status"]})

        return response(409, {"message": "orderId already exists for another request"})

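    result = events.put_events(
        Entries=[
            {
                "EventBusName": event_bus,
                "Source": "orders.command",
                "DetailType": "OrderAccepted",
                "Detail": json.dumps({"orderId": order_id, "customerId": customer_id}),
            }
        ]
    )

    # PutEvents reports rejected entries in the response instead of raising.
    # The order is already stored at this point, which is the gap an outbox closes.
    if result.get("FailedEntryCount", 0) > 0:
        raise RuntimeError(f"OrderAccepted was not published for order {order_id}")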

    return response(202, {"orderId": order_id, "status": "ACCEPTED"})
This Python handler uses the idempotency key to make retries safe. The conditional write stops duplicate order creation, and the EventBridge event is emitted only after the write succeeds. For a stricter production flow, put the event in a transactional outbox beside the order and let a separate publisher drain it; that is the same reliability idea discussed in the CQRS article on this site.
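Here is a minimal sketch of the outbox write. It assumes the single table from the handler above and a hypothetical OUTBOX# sort key. Both items go into one DynamoDB TransactWriteItems request, so the order and its pending event are stored together or not at all. The order's items list, the hypothetical outboxState attribute's lifecycle, and the publisher are left out.

```python
import json
from datetime import datetime, timezone


def order_with_outbox(table_name: str, order_id: str, customer_id: str, idempotency_key: str) -> list:
    """Build TransactItems that store an order and its pending event atomically.

    Uses the low level DynamoDB attribute format. The OUTBOX# sort key and the
    outboxState attribute are hypothetical; a separate publisher would read
    PENDING outbox items, send them to EventBridge, and then mark them published.
    """
    now = datetime.now(timezone.utc).isoformat()
    pk = {"S": f"ORDER#{order_id}"}

    order_item = {
        "pk": pk,
        "sk": {"S": "ORDER"},
        "orderId": {"S": order_id},
        "customerId": {"S": customer_id},
        "status": {"S": "ACCEPTED"},
        "idempotencyKey": {"S": idempotency_key},
        "createdAt": {"S": now},
    }
    outbox_item = {
        "pk": pk,
        "sk": {"S": "OUTBOX#OrderAccepted"},
        "outboxState": {"S": "PENDING"},
        "detailType": {"S": "OrderAccepted"},
        "detail": {"S": json.dumps({"orderId": order_id, "customerId": customer_id})},
        "createdAt": {"S": now},
    }

    # Both puts succeed together, or the whole transaction is cancelled.
    return [
        {"Put": {"TableName": table_name, "Item": item, "ConditionExpression": "attribute_not_exists(pk)"}}
        for item in (order_item, outbox_item)
    ]
```

The result would be passed to `boto3.client("dynamodb").transact_write_items(TransactItems=...)`. If the order already exists, DynamoDB cancels the transaction with a ConditionalCheckFailed cancellation reason, which maps to the same replay path as the conditional put above.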

Lambda for stream processing

Stream processing with Lambda is convenient, but the batch checkpoint rules matter. With Kinesis and DynamoDB Streams, Lambda checkpoints a batch only after it succeeds. If one record fails and the function throws, the whole batch is retried. That replays records that already succeeded, and iterator age grows while the shard waits on the failing batch. AWS supports partial batch responses, enabled by setting ReportBatchItemFailures in the event source mapping's FunctionResponseTypes. With that setting, the function can report the failed sequence number instead of treating the whole batch as failed.

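import base64
import json
import os
from decimal import Decimal

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
processed = dynamodb.Table(os.environ["PROCESSED_EVENT_TABLE"])


class RetryableRecordError(Exception):
    pass


def decode_kinesis_record(record):
    payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
    # Parse numbers as Decimal because the DynamoDB resource API rejects floats.
    return json.loads(payload, parse_float=Decimal)


def mark_processed(event_id, amount):
    # The conditional write fails if this eventId was already recorded.
    processed.put_item(
        Item={"eventId": event_id, "amount": amount},
        ConditionExpression="attribute_not_exists(eventId)",
    )


def process_record(record):
    event = decode_kinesis_record(record)
    event_id = event["eventId"]
    amount = event["amount"]

    if amount < 0:
        # This record fails on every retry, so the event source mapping should cap
        # retries and send it to an on-failure destination rather than block the shard.
        raise RetryableRecordError("negative amount must be corrected upstream")

    try:
        mark_processed(event_id, amount)
    except ClientError as error:
        if error.response["Error"]["Code"] == "ConditionalCheckFailedException":
            # Duplicate delivery: the event was already processed, so skip it.
            return
        raise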


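def lambda_handler(event, context):
    for record in event["Records"]:
        sequence_number = record["kinesis"]["sequenceNumber"]

        try:
            process_record(record)
        except RetryableRecordError:
            # Stop at the first failure to keep shard order. Lambda checkpoints the
            # records before this one and retries from this sequence number, but only
            # when the event source mapping enables ReportBatchItemFailures.
            return {
                "batchItemFailures": [
                    {"itemIdentifier": sequence_number}
                ]
            }

    return {"batchItemFailures": []}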
This stream handler records the business event id before treating the record as complete. If a retry delivers the same event again, the conditional write makes the duplicate harmless. When a record cannot be processed, the handler returns its Kinesis sequence number. Lambda then checkpoints the records before it and retries from the failed record onward, instead of replaying the records that already succeeded.
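The partial batch response only takes effect when the event source mapping asks for it. As a sketch, these are keyword arguments for the Lambda CreateEventSourceMapping API (boto3 `create_event_source_mapping`). The batch size and retry count are assumed values, not recommendations from this article. The on-failure destination is whatever queue or topic the team already uses for poison records.

```python
def stream_mapping_kwargs(function_name: str, stream_arn: str, on_failure_arn: str) -> dict:
    """Keyword arguments for boto3 lambda.create_event_source_mapping on a Kinesis stream."""
    return {
        "FunctionName": function_name,
        "EventSourceArn": stream_arn,
        "StartingPosition": "LATEST",
        "BatchSize": 100,  # assumed value; tune from record size and duration
        # Lambda only honors the batchItemFailures response when this is set.
        "FunctionResponseTypes": ["ReportBatchItemFailures"],
        # Cap retries so one poison record cannot block the shard until it expires.
        "MaximumRetryAttempts": 5,  # assumed value
        # Details of records that exhaust retries are sent here, for example an SQS queue.
        "DestinationConfig": {"OnFailure": {"Destination": on_failure_arn}},
    }
```

For an existing mapping, `update_event_source_mapping` accepts FunctionResponseTypes, MaximumRetryAttempts, and DestinationConfig along with the mapping UUID. It does not accept EventSourceArn or StartingPosition.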

Lambda at the edge is a different product shape

The lesson I learnt at the edge is that a clean diagram is not enough if the failure path is vague.

Lambda at Edge is useful when CloudFront needs code during viewer or origin events. Good examples are origin selection, security header logic, request normalization that needs libraries, or body access that CloudFront Functions cannot provide. It is not the same as regional Lambda.

The practical gotchas are sharp: the function must use a numbered version, not $LATEST or an alias; it must be created in US East (N. Virginia), us-east-1; it cannot use VPC access, layers, X-Ray, provisioned concurrency, or custom environment variables beyond reserved variables. If all you need is a small redirect, header rewrite, or cache key normalization, CloudFront Functions is often a better fit.
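For the origin selection case, a minimal origin request handler might look like the following sketch. It assumes the distribution's origin is S3 and that a second bucket, the hypothetical RECEIPTS_ORIGIN_DOMAIN, is readable by the distribution. Lambda at Edge cannot read custom environment variables, so the value is bundled as a constant.

```python
# Hypothetical bucket domain. Lambda at Edge has no custom environment variables,
# so configuration like this ships inside the deployment package.
RECEIPTS_ORIGIN_DOMAIN = "example-receipts-bucket.s3.us-east-1.amazonaws.com"


def lambda_handler(event, context):
    """Origin request trigger: route /receipts/ paths to a second S3 origin."""
    request = event["Records"][0]["cf"]["request"]

    s3_origin = request.get("origin", {}).get("s3")
    if request["uri"].startswith("/receipts/") and s3_origin is not None:
        s3_origin["domainName"] = RECEIPTS_ORIGIN_DOMAIN
        # S3 resolves the bucket from the Host header, so keep it consistent.
        request["headers"]["host"] = [{"key": "Host", "value": RECEIPTS_ORIGIN_DOMAIN}]

    # Returning the request (modified or not) lets CloudFront continue to the origin.
    return request
```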

Settings that change production behavior

| Setting | Production meaning | What to watch |
| --- | --- | --- |
| Memory | Memory also controls CPU share. More memory can make CPU bound functions faster and cheaper if duration drops enough. | Duration, billed duration, max memory used, cost per request. |
| Timeout | Do not set every function to 15 minutes. Set it from downstream latency and upstream timeout budgets. | Timeouts, retries, API Gateway integration timeout, SQS visibility timeout. |
| Reserved concurrency | Caps and reserves capacity for a function. It protects dependencies from a sudden invocation flood. | Throttles, database connections, third party rate limits. |
| Provisioned concurrency | Keeps execution environments initialized for latency sensitive paths. | Cold starts, provisioned concurrency spillover, cost during quiet hours. |
| SnapStart | For supported runtimes, Lambda snapshots initialized state when a version is published and resumes from that snapshot. | Initialization uniqueness, stale connections, runtime support, version cleanup. |
| Ephemeral storage | The /tmp directory can be sized from 512 MB to 10,240 MB for workloads like PDFs, image transforms, or model files. | Disk pressure, cleanup between invokes, sensitive temp data. |
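Memory also sets CPU share, so it helps to compare settings by GB-seconds rather than by memory alone. In the sketch below, the durations in the usage example are hypothetical measurements. The per GB-second price is left out because it varies by Region and architecture. The per request charge is the same for both settings, so it is ignored.

```python
import math


def gb_seconds(memory_mb: int, duration_ms: float) -> float:
    """GB-seconds for one invocation: configured memory in GB times billed duration."""
    billed_ms = math.ceil(duration_ms)  # Lambda bills duration in 1 ms increments
    return (memory_mb / 1024) * (billed_ms / 1000)


def cheaper(option_a: tuple, option_b: tuple) -> tuple:
    """Return the (memory_mb, duration_ms) option with fewer GB-seconds per invocation."""
    return option_a if gb_seconds(*option_a) <= gb_seconds(*option_b) else option_b
```

With those assumed numbers, 512 MB at 800 ms uses 0.4 GB-seconds and 1,024 MB at 350 ms uses 0.35. In that case the larger setting would be both faster and cheaper per invocation.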

Limits worth knowing before the design review

| Limit | Current AWS documented value | Why the team should care |
| --- | --- | --- |
| Maximum invocation time | 900 seconds, or 15 minutes. | This rules out long running workers unless the job is split, checkpointed, or moved to Fargate, Batch, or Step Functions. |
| Memory | 128 MB to 10,240 MB. AWS documents about one vCPU at 1,769 MB. | Memory tuning is CPU tuning. A higher memory setting can lower duration and sometimes lower total cost. |
| Default regional concurrency | 1,000 concurrent executions per account per Region, adjustable by quota request. | A noisy function can consume the regional pool unless critical functions use reserved concurrency. |
| Function scaling rate | 1,000 new execution environments every 10 seconds per function. | Sudden spikes may still see throttling while Lambda ramps, especially on synchronous APIs. |
| Invocation payload | 6 MB for synchronous request and response. 1 MB for asynchronous invocation. | Move large payloads through S3 and pass object references through events. |
| Ephemeral storage | 512 MB to 10,240 MB in /tmp. | Useful for file transforms, but it is not durable storage and should not hold business truth. |
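One common way to respect payload limits is a claim check: store the body in S3 and send only a reference. This sketch takes a put_object callable, for example a wrapper around boto3's `s3.put_object(Bucket=..., Key=..., Body=...)`, so it can be exercised without AWS. The 256 KB threshold and the payloads/ key prefix are assumptions.

```python
import json
import uuid
from typing import Callable

# Assumed conservative threshold: well under the 1 MB asynchronous invocation
# limit, leaving room for envelopes and for smaller limits on other hops.
INLINE_LIMIT_BYTES = 256 * 1024


def to_event_detail(
    payload: dict,
    put_object: Callable[[str, bytes], None],
    bucket: str,
    inline_limit: int = INLINE_LIMIT_BYTES,
) -> dict:
    """Inline small payloads; store large ones and pass an S3 reference instead."""
    body = json.dumps(payload).encode("utf-8")
    if len(body) <= inline_limit:
        return {"inline": payload}

    key = f"payloads/{uuid.uuid4()}.json"  # hypothetical key layout
    put_object(key, body)
    return {"s3Ref": {"bucket": bucket, "key": key, "sizeBytes": len(body)}}
```

The consumer does the reverse: if it sees s3Ref, it fetches the object before processing. That fetch is one more dependency to alarm on.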

Concurrency is where Lambda stops being magic

The useful formula is simple: concurrency = requests per second * average duration in seconds. If an API receives 2,000 requests per second and each request takes 250 ms, the function needs about 500 concurrent executions. If one dependency slows the function to 1 second, the same traffic now needs about 2,000 concurrent executions. That is how a database wobble becomes Lambda throttling.
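The same arithmetic, which is Little's law applied to Lambda, fits in a tiny helper. The throttled_share function is a rough steady state estimate that ignores the scaling ramp. Treat it as a sketch for design reviews, not a capacity model.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_ms: float) -> int:
    """Little's law: concurrency = arrival rate * average time spent in the function."""
    return math.ceil(requests_per_second * avg_duration_ms / 1000)


def throttled_share(required: int, available: int) -> float:
    """Rough steady state share of requests that find no free execution environment."""
    if required <= available:
        return 0.0
    return (required - available) / required
```

With the numbers above, 2,000 requests per second at 250 ms needs 500 concurrent executions, and at 1 second needs 2,000. Against the default regional quota of 1,000, roughly half of the slow traffic would be throttled.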

AWS documents a default regional concurrency quota of 1,000 for Lambda, and a per function scaling rate of 1,000 new execution environments every 10 seconds. Those numbers are not the only limits in the system. API Gateway, DynamoDB, RDS, Redis, partner APIs, NAT gateways, and identity providers all have their own limits. Reserved concurrency is often less about making Lambda faster and more about making sure Lambda does not flatten something slower.
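To size reserved concurrency from the dependency instead of from Lambda, invert the relationship. If every execution environment holds its own database connection, peak connections track concurrency. In this sketch, the 80 percent share and one connection per environment are assumptions. apply_reserved_concurrency expects a boto3 Lambda client, which is passed in.

```python
import math


def reserved_concurrency_cap(max_db_connections: int, connections_per_environment: int = 1, share: float = 0.8) -> int:
    """Largest reserved concurrency that keeps this function inside a connection budget.

    Peak connections are roughly concurrency * connections_per_environment.
    `share` (an assumption) is the portion of the database this function may use.
    """
    return max(1, math.floor(max_db_connections * share / connections_per_environment))


def apply_reserved_concurrency(lambda_client, function_name: str, cap: int) -> None:
    """Set the cap; lambda_client is expected to be boto3.client("lambda")."""
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=cap,
    )
```

With a 500 connection database and the defaults, the cap is 400. Lambda keeps at least 100 units of unreserved concurrency in the account, so reservations must fit inside the regional quota with that margin.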

Anti patterns, and why they are anti patterns

One function with forty routes: It becomes a mini monolith with one deployment blast radius, one IAM role, noisy logs, shared cold starts, and unclear ownership. Use this only when the routes are truly one bounded service.
Synchronous Lambda to Lambda chains: Each hop adds timeout risk, retries, tracing complexity, and concurrency multiplication. If the call is part of one user request, consider a direct service call or Step Functions. If it is not part of the user request, publish an event.
No idempotency on queues or streams: Event source mappings process records at least once. Duplicate records are normal, not weird. Without idempotency, retries can double charge, double email, double reserve inventory, or corrupt projections.
Opening database connections per invocation: Lambda scales faster than many relational databases accept connections. Reuse clients outside the handler and consider RDS Proxy when relational connection management is the real bottleneck.
Using Lambda for long running background jobs: Lambda has a 15 minute invocation limit. Jobs that need hours, local daemons, long lived sockets, or complex worker lifecycle usually fit ECS Fargate or AWS Batch better.
Ignoring payload and temp storage limits: Large files should move through S3, not through invocation payloads. Payload limits and /tmp limits become production incidents when the happy path was tested only with tiny samples.
Treating Lambda at Edge like normal Lambda: The restrictions are different: no VPC, no layers, no provisioned concurrency, and numbered versions only. A regional Lambda deployment habit can break at the edge.
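For the queue anti pattern, partial batch responses on SQS look different from the stream handler earlier. A standard queue has no shard order to protect, so the handler keeps going and reports every failed messageId. This is a sketch that assumes a standard (not FIFO) queue, ReportBatchItemFailures enabled on the mapping, and an idempotent process function supplied by the caller.

```python
import json
from typing import Callable


def make_sqs_handler(process: Callable[[dict], None]):
    """Build an SQS handler that reports failed messages individually."""

    def lambda_handler(event, context):
        failures = []
        for record in event["Records"]:
            try:
                process(json.loads(record["body"]))
            except Exception:
                # Only this message stays on the queue and reappears after the
                # visibility timeout; Lambda deletes the messages that succeeded.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}

    return lambda_handler
```

On a FIFO queue this loop would be wrong: after the first failure, the remaining messages in the group should also be reported so that order is preserved.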

Lambda versus ECS Fargate

Lambda is usually better when

The workload is event driven, short lived, bursty, or idle most of the day. Lambda also wins when the trigger is already an AWS event source and the team wants less service scaling machinery.

Fargate is usually better when

The workload is a long running service, has steady traffic, needs a familiar container process model, uses long lived connections, or does not fit within Lambda duration and runtime limits.

The benefit of Lambda over Fargate is not that it is "more serverless." Fargate also removes EC2 fleet management. Lambda removes task count management, task placement, load balancer target health, idle capacity, and much of the scaling code. The tradeoff is control. With Fargate, you keep the container and process model. With Lambda, you accept the function invocation model.

Canary and blue green deployments

I judge canary and blue green deployments by whether they help on a bad production day, not only in a design review.

Production Lambda deployments should use versions and aliases. Publish an immutable version, point an alias such as live to the current version, and shift a small percentage of traffic to the new version. Watch errors, duration, throttles, business metrics, and dependency metrics. Roll forward by moving the alias to 100 percent, or roll back by pointing it to the old version.
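The alias shift itself is a small API call. These helpers sketch roughly what a canary step does under the hood: they build keyword arguments for the Lambda UpdateAlias API (boto3 `update_alias`). The function name, alias, and version numbers in the test are hypothetical. Both versions must be published numbered versions.

```python
def canary_alias_update(function_name: str, alias: str, stable_version: str, canary_version: str, canary_weight: float) -> dict:
    """Keyword arguments for UpdateAlias that send a share of traffic to a new version."""
    if not 0.0 <= canary_weight < 1.0:
        raise ValueError("canary_weight must be in [0.0, 1.0)")
    return {
        "FunctionName": function_name,
        "Name": alias,
        # The alias keeps pointing at the stable version...
        "FunctionVersion": stable_version,
        # ...and routes this fraction of invocations to the canary version.
        "RoutingConfig": {"AdditionalVersionWeights": {canary_version: canary_weight}},
    }


def promote_alias(function_name: str, alias: str, version: str) -> dict:
    """Roll forward or back: point the alias at one version and clear extra weights."""
    return {
        "FunctionName": function_name,
        "Name": alias,
        "FunctionVersion": version,
        "RoutingConfig": {"AdditionalVersionWeights": {}},
    }
```

Rolling forward uses promote_alias with the new version. Rolling back uses it with the old one. Either way, the change is one alias update, with no new upload.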

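Transform: AWS::Serverless-2016-10-31

# The alarms and hook functions referenced below are assumed to be defined
# elsewhere in this template.
Resources:
  OrderApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/  # hypothetical path to the handler code
      Runtime: python3.12
      Handler: app.lambda_handler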
      AutoPublishAlias: live
      MemorySize: 1024
      Timeout: 10
      DeploymentPreference:
        Type: Canary10Percent10Minutes
        Alarms:
          - !Ref OrderApiAliasErrorsAlarm
          - !Ref OrderApiP95DurationAlarm
        Hooks:
          PreTraffic: !Ref OrderApiPreTrafficCheck
          PostTraffic: !Ref OrderApiPostTrafficCheck
This SAM configuration publishes a version on each deployment, routes production traffic through the live alias, and shifts 10 percent of traffic to the new version for 10 minutes before moving the rest. The deployment is not trusted just because the zip uploaded. It is trusted after pre traffic checks, alarms, and post traffic checks keep the new version inside the error and latency budget. If one of the listed alarms fires during the deployment, CodeDeploy shifts the alias back to the previous version.
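A PreTraffic hook is an ordinary Lambda function that reports a result back to CodeDeploy. This sketch injects the CodeDeploy client and a hypothetical smoke_test callable so the flow can be tested. In a real template, the handler would be built once at module load with `boto3.client("codedeploy")`, and smoke_test might invoke the new version with a synthetic order.

```python
from typing import Callable


def make_pre_traffic_hook(codedeploy, smoke_test: Callable[[], bool]):
    """Build a CodeDeploy lifecycle hook handler around a caller supplied check."""

    def lambda_handler(event, context):
        try:
            passed = bool(smoke_test())
        except Exception:
            passed = False
        # CodeDeploy waits for this callback. "Failed" fails the deployment
        # before any production traffic shifts to the new version.
        codedeploy.put_lifecycle_event_hook_execution_status(
            deploymentId=event["DeploymentId"],
            lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
            status="Succeeded" if passed else "Failed",
        )
        return {"passed": passed}

    return lambda_handler
```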

Gotchas teams should know before launch

I list these gotchas before launch so the tradeoffs are explicit before the implementation spreads.

Set the SQS visibility timeout from the real function timeout (AWS recommends at least six times the function timeout), or messages can become visible again while the first invocation is still running. Keep SDK clients outside the handler, but keep request data inside the handler. Use structured logs and CloudWatch metrics, for example the embedded metric format, instead of making synchronous metric calls from the hot path. Alarm on throttles, errors, duration, concurrent executions, iterator age for streams, and dead letter queue depth. Load test the full path, not just the function, because upstream and downstream quotas rarely scale in the same way.
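Two of these checks are easy to encode. This sketch applies AWS's six times guidance for SQS visibility timeout. It also builds a CloudWatch embedded metric format record that the handler can print instead of calling PutMetricData. The namespace and metric names in the test are hypothetical.

```python
import json
import time

# AWS guidance for SQS event sources: a visibility timeout of at least six times
# the function timeout, which leaves room for retries when the function is throttled.
VISIBILITY_MULTIPLIER = 6


def min_visibility_timeout(function_timeout_s: int) -> int:
    """Smallest SQS visibility timeout (seconds) this sketch accepts."""
    return VISIBILITY_MULTIPLIER * function_timeout_s


def emf_record(namespace: str, function_name: str, metric: str, value: float, unit: str = "Count") -> str:
    """One CloudWatch embedded metric format (EMF) record as a JSON string.

    Printed to stdout, it lands in CloudWatch Logs, which extracts the metric
    asynchronously; the handler itself makes no PutMetricData call.
    """
    return json.dumps(
        {
            "_aws": {
                "Timestamp": int(time.time() * 1000),
                "CloudWatchMetrics": [
                    {
                        "Namespace": namespace,
                        "Dimensions": [["FunctionName"]],
                        "Metrics": [{"Name": metric, "Unit": unit}],
                    }
                ],
            },
            "FunctionName": function_name,
            metric: value,
        }
    )
```

Inside a handler, `print(emf_record("Orders", "OrderApiFunction", "OrdersAccepted", 1))` is enough. The metric appears without an extra network call in the request path.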

Finally, delete old versions and layers intentionally. Lambda version cleanup is not glamorous, but unused versions consume code storage and can also keep SnapStart snapshots active in ways the team forgets about.
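Version cleanup can be scripted, but carefully. This sketch picks numbered versions that no alias references, including versions carried in an alias's weighted routing. It keeps the newest few as rollback targets. It only sees aliases, so versions referenced by qualified ARNs elsewhere, such as CloudFront associations for Lambda at Edge, must be excluded separately. The inputs are assumed to come from the ListVersionsByFunction and ListAliases APIs.

```python
def versions_safe_to_delete(versions: list, aliases: list, keep_latest: int = 3) -> list:
    """Pick published versions that no alias references, keeping the newest few.

    versions: version strings (the "Version" field from ListVersionsByFunction).
    aliases: alias dicts from ListAliases, each with FunctionVersion and an
    optional RoutingConfig.AdditionalVersionWeights map.
    """
    in_use = set()
    for alias in aliases:
        in_use.add(alias["FunctionVersion"])
        weights = alias.get("RoutingConfig", {}).get("AdditionalVersionWeights", {})
        in_use.update(weights.keys())

    numbered = sorted((v for v in versions if v != "$LATEST"), key=int)
    # Keep recent versions as quick rollback targets even if no alias points at them.
    candidates = numbered[:-keep_latest] if keep_latest > 0 else numbered
    return [v for v in candidates if v not in in_use]
```

Each returned version could then be removed with the DeleteFunction API, passing the version number as Qualifier.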

My production test for Lambda is simple: can the team explain what happens when the same event arrives twice, the downstream dependency slows down, and the new version fails after 10 percent of traffic sees it? If that answer is clear, Lambda is usually a good fit.

What I learnt using Lambda is that the function code is usually the easy part. The hard part is being honest about retries, concurrency, and the service that gets hit after Lambda scales.

#AWS #Lambda #Serverless #Python #CloudArchitecture #StreamProcessing #LambdaAtEdge #ECSFargate

Sources