I like Lambda when the work is short, event driven, and easy to isolate. I do not recommend treating it like a tiny container that can ignore state, retries, downstream limits, and deployment safety.
Picture an order platform. Mobile clients submit orders through API Gateway. Payment providers call back asynchronously. Fulfillment events land on Kinesis. Receipt images arrive in S3. A few CloudFront redirects need to happen before the request reaches origin. Lambda can participate in all of that, but each path has a different failure model.
How Lambda usually enters the system
Start with the trigger, not the function
Starting with the trigger makes ownership, failure handling, and rollback clearer, because each event source brings its own retry and failure contract.
Synchronous HTTP paths care about user latency, status codes, payload size, and client retry behavior. EventBridge and SNS paths care about decoupling and asynchronous retry. SQS paths care about visibility timeout, batch failure, and dead letter queues. Stream paths care about ordering, shard pressure, iterator age, and poison records. Edge paths care about global propagation and CloudFront restrictions.
This is why "we use Lambda" is too vague. The production design starts with the event source and then works inward to handler code, timeout, memory, concurrency, alarms, and deployment style.
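One way to make the "trigger first" habit concrete is to classify an incoming event by its shape and look up what that path needs watched. The sketch below is illustrative only: `classify_event_source` and `WATCH_LIST` are hypothetical names, the detection is best effort, and the watch list simply restates the paragraph above.

```python
# What each path cares about, restated from the text above.
WATCH_LIST = {
    "http": "user latency, status codes, payload size, client retries",
    "eventbridge": "decoupling, asynchronous retry",
    "sns": "decoupling, asynchronous retry",
    "sqs": "visibility timeout, batch failure, dead letter queues",
    "kinesis": "ordering, shard pressure, iterator age, poison records",
    "dynamodb": "ordering, shard pressure, iterator age, poison records",
    "cloudfront": "global propagation, CloudFront restrictions",
}


def classify_event_source(event: dict) -> str:
    """Best-effort guess of which trigger produced this event payload."""
    records = event.get("Records")
    if isinstance(records, list) and records:
        first = records[0]
        if "cf" in first:
            return "cloudfront"  # Lambda@Edge events wrap everything in "cf"
        # SNS capitalizes the key ("EventSource"); SQS, Kinesis, S3, and
        # DynamoDB Streams use "eventSource".
        source = first.get("eventSource") or first.get("EventSource") or ""
        return source.removeprefix("aws:") or "unknown"
    if "detail-type" in event and "source" in event:
        return "eventbridge"
    if "requestContext" in event:
        return "http"  # API Gateway style proxy event
    return "unknown"


def what_to_watch(event: dict) -> str:
    return WATCH_LIST.get(classify_event_source(event), "unknown trigger, review before launch")
```

In practice each function usually has exactly one trigger, so this kind of lookup is more useful in a design checklist or a shared logging helper than in handler code.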
Lambda as a microservice boundary
My recommendation is to write the operational cost beside the architecture: who owns the contract, the alarms, the deployment, and the rollback.
A Lambda function becomes a microservice only when it owns a meaningful boundary: contract, validation, data write, alarms, deployment, rollback, and operational responsibility. A random function named send_email is not automatically a service. An OrderCommand function behind API Gateway can be a microservice entry point if it owns order creation rules and protects the write path.
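```python
import json
import os
import uuid
from datetime import datetime, timezone
from decimal import Decimal

import boto3
from botocore.exceptions import ClientError

# Create SDK clients once per execution environment, outside the handler.
dynamodb = boto3.resource("dynamodb")
events = boto3.client("events")
orders = dynamodb.Table(os.environ["ORDER_TABLE"])
event_bus = os.environ["EVENT_BUS_NAME"]


def response(status_code, body):
    """Build an API Gateway proxy integration response."""
    return {
        "statusCode": status_code,
        "headers": {"content-type": "application/json"},
        "body": json.dumps(body),
    }


def lambda_handler(event, context):
    # Header names are case-insensitive, so normalize before the lookup.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    idempotency_key = headers.get("idempotency-key")
    if not idempotency_key:
        return response(400, {"message": "Idempotency-Key header is required"})

    try:
        # DynamoDB rejects Python floats, so parse fractional numbers as Decimal.
        body = json.loads(event.get("body") or "{}", parse_float=Decimal)
    except json.JSONDecodeError:
        return response(400, {"message": "Request body must be valid JSON"})
    if not isinstance(body, dict):
        return response(400, {"message": "Request body must be a JSON object"})

    customer_id = body.get("customerId")
    items = body.get("items", [])
    if not customer_id or not items:
        return response(400, {"message": "customerId and items are required"})

    # Derive the default order ID from the idempotency key so that a client
    # retry maps to the same item instead of creating a second order.
    order_id = body.get("orderId") or str(
        uuid.uuid5(uuid.NAMESPACE_URL, f"{customer_id}:{idempotency_key}")
    )
    now = datetime.now(timezone.utc).isoformat()
    item = {
        "pk": f"ORDER#{order_id}",
        "sk": "ORDER",
        "orderId": order_id,
        "customerId": customer_id,
        "items": items,
        "status": "ACCEPTED",
        "idempotencyKey": idempotency_key,
        "createdAt": now,
    }

    try:
        # The conditional write rejects a second insert of the same order.
        orders.put_item(
            Item=item,
            ConditionExpression="attribute_not_exists(pk)",
        )
    except ClientError as error:
        if error.response["Error"]["Code"] != "ConditionalCheckFailedException":
            raise
        existing = orders.get_item(
            Key={"pk": f"ORDER#{order_id}", "sk": "ORDER"},
            ConsistentRead=True,
        ).get("Item")
        if existing and existing.get("idempotencyKey") == idempotency_key:
            # Same request replayed: return the stored result.
            return response(200, {"orderId": order_id, "status": existing["status"]})
        return response(409, {"message": "orderId already exists for another request"})

    result = events.put_events(
        Entries=[
            {
                "EventBusName": event_bus,
                "Source": "orders.command",
                "DetailType": "OrderAccepted",
                "Detail": json.dumps({"orderId": order_id, "customerId": customer_id}),
            }
        ]
    )
    # put_events reports per-entry failures in the response instead of raising.
    # Note that a retry after this point hits the replay branch above and does
    # not republish, so the write and the event are not atomic.
    if result.get("FailedEntryCount"):
        raise RuntimeError("OrderAccepted event was not published")

    return response(202, {"orderId": order_id, "status": "ACCEPTED"})
```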
Lambda for stream processing
Stream processing with Lambda is convenient, but the batch checkpoint rules matter. With Kinesis and DynamoDB Streams, Lambda checkpoints a batch after success. If one record fails and the function throws, the whole batch is retried. That can replay records that already succeeded and increase iterator age. AWS supports partial batch responses, enabled with ReportBatchItemFailures on the event source mapping, so the function can report the failed sequence number instead of treating the whole batch as failed. Lambda then retries from that record rather than from the start of the batch.
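```python
import base64
import json
import os
from decimal import Decimal

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
processed = dynamodb.Table(os.environ["PROCESSED_EVENT_TABLE"])


class RetryableRecordError(Exception):
    """Raised when a record should be reported as a batch item failure."""


def decode_kinesis_record(record):
    payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
    # DynamoDB rejects Python floats, so parse fractional numbers as Decimal.
    return json.loads(payload, parse_float=Decimal)


def mark_processed(event_id, amount):
    # The conditional write makes a replay of an already processed event fail fast.
    processed.put_item(
        Item={"eventId": event_id, "amount": amount},
        ConditionExpression="attribute_not_exists(eventId)",
    )


def process_record(record):
    event = decode_kinesis_record(record)
    event_id = event["eventId"]
    amount = event["amount"]
    if amount < 0:
        raise RetryableRecordError("negative amount must be corrected upstream")
    try:
        mark_processed(event_id, amount)
    except ClientError as error:
        # Check the structured error code instead of matching on the message text.
        if error.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return  # duplicate delivery, already processed
        raise


def lambda_handler(event, context):
    # Requires ReportBatchItemFailures on the event source mapping.
    for record in event["Records"]:
        sequence_number = record["kinesis"]["sequenceNumber"]
        try:
            process_record(record)
        except RetryableRecordError:
            # Lambda retries from this sequence number, so stop at the first failure.
            return {"batchItemFailures": [{"itemIdentifier": sequence_number}]}
    return {"batchItemFailures": []}
```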
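The SQS path mentioned earlier uses the same response shape with one difference worth seeing side by side: there is no ordering checkpoint on a standard queue, so the handler keeps going and reports every failed message by its messageId. This is a minimal sketch, assuming ReportBatchItemFailures is enabled on the SQS event source mapping; `handle_message` is a hypothetical stand-in for real business logic.

```python
import json


def handle_message(body: dict) -> None:
    """Hypothetical business logic; raises when the message cannot be processed."""
    if "orderId" not in body:
        raise ValueError("orderId is required")


def lambda_handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            handle_message(json.loads(record["body"]))
        except Exception:
            # Report only this message. Messages not listed here are treated
            # as successful and removed from the queue.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Failed messages become visible again after the visibility timeout and eventually move to the dead letter queue if one is configured, which is why the handler still needs to be idempotent.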
Lambda at the edge is a different product shape
What I learnt here is that a clean diagram is not enough if the failure path is vague.
Lambda@Edge is useful when CloudFront needs code during viewer or origin events. Good examples are origin selection, security header logic, request normalization that needs libraries, or body access that CloudFront Functions cannot provide. It is not the same as regional Lambda.
The practical gotchas are sharp: the function must use a numbered version, not $LATEST or an alias; it must be created in US East (N. Virginia), us-east-1; and it cannot use VPC access, layers, X-Ray, provisioned concurrency, or environment variables other than the reserved ones. If all you need is a small redirect, header rewrite, or cache key normalization, CloudFront Functions is often a better fit.
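As a rough illustration of origin selection, here is a sketch of an origin request handler that routes some viewers to a different custom origin. The country set and the domain name are made up for the example, and they are constants in code because custom environment variables are not available. It also assumes the distribution is configured to forward the CloudFront-Viewer-Country header and that the default origin is a custom origin rather than S3.

```python
# Hypothetical values for this sketch; hardcoded because Lambda@Edge does not
# allow custom environment variables.
EU_COUNTRIES = {"DE", "FR", "IE", "NL"}
EU_ORIGIN_DOMAIN = "eu.origin.example.com"


def lambda_handler(event, context):
    # CloudFront wraps the request in Records[0].cf for origin request events.
    request = event["Records"][0]["cf"]["request"]
    headers = request["headers"]

    # Header values arrive as lists of {"key": ..., "value": ...} objects.
    country_header = headers.get("cloudfront-viewer-country") or [{}]
    country = country_header[0].get("value", "")

    custom_origin = request.get("origin", {}).get("custom")
    if country in EU_COUNTRIES and custom_origin is not None:
        custom_origin["domainName"] = EU_ORIGIN_DOMAIN
        # The Host header has to match the origin the request is sent to.
        headers["host"] = [{"key": "Host", "value": EU_ORIGIN_DOMAIN}]

    # Returning the request tells CloudFront to continue to the origin.
    return request
```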
Settings that change production behavior
| Setting | Production meaning | What to watch |
|---|---|---|
| Memory | Memory also controls CPU share. More memory can make CPU bound functions faster and cheaper if duration drops enough. | Duration, billed duration, max memory used, cost per request. |
| Timeout | Do not set every function to 15 minutes. Set it from downstream latency and upstream timeout budgets. | Timeouts, retries, API Gateway integration timeout, SQS visibility timeout. |
| Reserved concurrency | Caps and reserves capacity for a function. It protects dependencies from a sudden invocation flood. | Throttles, database connections, third party rate limits. |
| Provisioned concurrency | Keeps execution environments initialized for latency sensitive paths. | Cold starts, provisioned concurrency spillover, cost during quiet hours. |
| SnapStart | For supported runtimes, Lambda snapshots initialized state at publish time and resumes from that snapshot. | Initialization uniqueness, stale connections, runtime support, version cleanup. |
| Ephemeral storage | The /tmp directory can be sized from 512 MB to 10,240 MB for workloads like PDFs, image transforms, or model files. | Disk pressure, cleanup between invokes, sensitive temp data. |
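The Timeout row is easier to act on with a budget in code. The sketch below derives a per-call timeout for a downstream request from the time the invocation has left, using the context object's get_remaining_time_in_millis method. The helper name, the 500 ms reserve, and the 3 second ceiling are assumptions chosen for illustration, not recommended values.

```python
def downstream_timeout_seconds(context, reserve_ms: int = 500, ceiling_s: float = 3.0) -> float:
    """Return how long a downstream call may take within this invocation.

    reserve_ms leaves time to log, clean up, and return a response before
    Lambda kills the invocation; ceiling_s caps a single call regardless of
    how much time is left.
    """
    remaining_ms = context.get_remaining_time_in_millis()
    budget_s = max((remaining_ms - reserve_ms) / 1000.0, 0.0)
    return min(budget_s, ceiling_s)
```

The returned value can be passed to whatever client makes the call, for example as the timeout argument of an HTTP request. A budget of 0.0 is a signal to fail fast instead of starting a call that cannot finish.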
Limits worth knowing before the design review
| Limit | Current AWS documented value | Why the team should care |
|---|---|---|
| Maximum invocation time | 900 seconds, or 15 minutes. | This rules out long running workers unless the job is split, checkpointed, or moved to Fargate, Batch, or Step Functions. |
| Memory | 128 MB to 10,240 MB. AWS documents about one vCPU at 1,769 MB. | Memory tuning is CPU tuning. A higher memory setting can lower duration and sometimes lower total cost. |
| Default regional concurrency | 1,000 concurrent executions per account per Region, adjustable by quota request. | A noisy function can consume the regional pool unless critical functions use reserved concurrency. |
| Function scaling rate | 1,000 new execution environments every 10 seconds per function. | Sudden spikes may still see throttling while Lambda ramps, especially on synchronous APIs. |
| Invocation payload | 6 MB for synchronous request and response. 1 MB for asynchronous invocation. | Move large payloads through S3 and pass object references through events. |
| Ephemeral storage | 512 MB to 10,240 MB in /tmp. | Useful for file transforms, but it is not durable storage and should not hold business truth. |
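The payload row suggests passing object references instead of large bodies. A minimal sketch of that pattern follows. The function name, the detail shape, and the 256 KB default threshold are assumptions for this example; `put_object` is expected to be something like the boto3 S3 client's put_object method, passed in so the logic can be exercised without AWS.

```python
import json


def build_event_detail(payload: dict, put_object, bucket: str, key: str,
                       max_inline_bytes: int = 256 * 1024) -> dict:
    """Inline small payloads; store large ones in S3 and pass a reference."""
    encoded = json.dumps(payload).encode("utf-8")
    if len(encoded) <= max_inline_bytes:
        return {"inline": payload}
    # Too large to send through the event: write it once, reference it after.
    put_object(Bucket=bucket, Key=key, Body=encoded)
    return {"s3": {"bucket": bucket, "key": key, "bytes": len(encoded)}}
```

A caller might use it as `build_event_detail(order, s3.put_object, "my-bucket", f"orders/{order_id}.json")`, where `s3 = boto3.client("s3")` and the bucket name is a placeholder. Consumers then need read access to the bucket and a lifecycle rule to expire old objects.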
Concurrency is where Lambda stops being magic
The useful formula is simple: concurrency = requests per second * average duration in seconds. If an API receives 2,000 requests per second and each request takes 250 ms, the function needs about 500 concurrent executions. If one dependency slows the function to 1 second, the same traffic now needs about 2,000 concurrent executions. That is how a database wobble becomes Lambda throttling.
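The arithmetic above is small enough to keep as a helper in a capacity notebook. This sketch only restates the formula from the paragraph; the function name is arbitrary.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """concurrency = requests per second * average duration in seconds."""
    return math.ceil(requests_per_second * avg_duration_seconds)


# The two cases from the text: healthy dependency, then a slow one.
print(required_concurrency(2000, 0.25))  # 500
print(required_concurrency(2000, 1.0))   # 2000
```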
AWS documents a default regional concurrency quota of 1,000 for Lambda, and a per function scaling rate of 1,000 new execution environments every 10 seconds. Those numbers are not the only limits in the system. API Gateway, DynamoDB, RDS, Redis, partner APIs, NAT gateways, and identity providers all have their own limits. Reserved concurrency is often less about making Lambda faster and more about making sure Lambda does not flatten something slower.
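One way to pick a reserved concurrency value is to work backward from the slowest dependency. The sketch below sizes the cap from a downstream connection limit and applies it with the Lambda API's put_function_concurrency call. The helper names and the 80 percent headroom are assumptions for illustration; `lambda_client` is expected to be `boto3.client("lambda")`.

```python
def reserved_concurrency_for(max_downstream_connections: int,
                             connections_per_environment: int = 1,
                             headroom_percent: int = 80) -> int:
    """Cap concurrency so Lambda cannot exhaust a downstream connection limit.

    headroom_percent leaves room for other clients of the same dependency.
    """
    usable = max_downstream_connections * headroom_percent // 100
    return max(usable // connections_per_environment, 1)


def apply_reserved_concurrency(lambda_client, function_name: str, limit: int):
    # Invocations beyond this limit are throttled instead of reaching the dependency.
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )
```

For example, a database that accepts 200 connections with one connection per execution environment gives a cap of 160. Keep in mind that reserved concurrency is also subtracted from the shared regional pool, so large reservations reduce what other functions can use.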
Anti patterns, and why they are anti patterns
/tmp limits become production incidents when the happy path was tested only with tiny samples.
Lambda versus ECS Fargate
Lambda is usually better when
The workload is event driven, short lived, bursty, or idle most of the day. Lambda also wins when the trigger is already an AWS event source and the team wants less service scaling machinery.
Fargate is usually better when
The workload is a long running service, has steady traffic, needs a familiar container process model, uses long lived connections, or does not fit within Lambda duration and runtime limits.
The benefit of Lambda over Fargate is not that it is "more serverless." Fargate also removes EC2 fleet management. Lambda removes service count management, task placement, load balancer target health, idle capacity, and much of the scaling code. The tradeoff is control. With Fargate, you keep the container and process model. With Lambda, you accept the function invocation model.
Canary and blue green deployments
I judge a deployment pattern by whether it helps on a bad production day, not only in a design review.
Production Lambda deployments should use versions and aliases. Publish an immutable version, point an alias such as live to the current version, and shift a small percentage of traffic to the new version. Watch errors, duration, throttles, business metrics, and dependency metrics. Roll forward by moving the alias to 100 percent, or roll back by pointing it to the old version.
```yaml
Resources:
  OrderApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.12
      Handler: app.lambda_handler
      AutoPublishAlias: live
      MemorySize: 1024
      Timeout: 10
      DeploymentPreference:
        Type: Canary10Percent10Minutes
        Alarms:
          - !Ref OrderApiAliasErrorsAlarm
          - !Ref OrderApiP95DurationAlarm
        Hooks:
          PreTraffic: !Ref OrderApiPreTrafficCheck
          PostTraffic: !Ref OrderApiPostTrafficCheck
```
With this template, SAM publishes a new version on each deploy, updates the live alias, and shifts 10 percent of traffic before moving the rest. The deployment is not trusted just because the zip uploaded. It is trusted after pre traffic checks, alarms, and post traffic checks keep the new version inside the error and latency budget.
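For teams that want to see the mechanism without SAM, weighted alias routing can also be driven directly through the Lambda API. This is a hand-rolled sketch, not what SAM or CodeDeploy runs internally. The helper names are made up, and `lambda_client` is expected to be `boto3.client("lambda")`.

```python
def start_canary(lambda_client, function_name: str, alias: str,
                 stable_version: str, canary_version: str, weight: float = 0.10):
    """Keep the alias on the stable version and send a share of traffic to the canary."""
    return lambda_client.update_alias(
        FunctionName=function_name,
        Name=alias,
        FunctionVersion=stable_version,
        RoutingConfig={"AdditionalVersionWeights": {canary_version: weight}},
    )


def point_alias_at(lambda_client, function_name: str, alias: str, version: str):
    """Send 100 percent of traffic to one version and clear any weighted routing.

    Use it with the new version to roll forward, or the old one to roll back.
    """
    return lambda_client.update_alias(
        FunctionName=function_name,
        Name=alias,
        FunctionVersion=version,
        RoutingConfig={"AdditionalVersionWeights": {}},
    )
```

The part this sketch leaves out is the important one: something still has to watch the alarms during the canary window and call the rollback, which is the job the deployment preference hands to CodeDeploy.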
Gotchas teams should know before launch
Make each of these tradeoffs explicit before the implementation spreads.
Set the SQS visibility timeout well above the function timeout (AWS recommends at least six times the function timeout), or messages can return to the queue while the first invocation is still running. Keep SDK clients outside the handler, but keep request data inside the handler. Use structured logs and CloudWatch metrics instead of making synchronous metric calls from the hot path. Alarm on throttles, errors, duration, concurrent executions, iterator age for streams, and dead letter queue depth. Load test the full path, not just the function, because upstream and downstream quotas rarely scale in the same way.
Finally, delete old versions and layers intentionally. Lambda version cleanup is not glamorous, but unused versions consume code storage and can also keep SnapStart snapshots active in ways the team forgets about.
My production test for Lambda is simple: can the team explain what happens when the same event arrives twice, the downstream dependency slows down, and the new version fails after 10 percent of traffic sees it? If that answer is clear, Lambda is usually a good fit.
What I learnt using Lambda is that the function code is usually the easy part. The hard part is being honest about retries, concurrency, and the service that gets hit after Lambda scales.
Sources
- AWS Lambda quotas
- Understanding Lambda function scaling
- Best practices for AWS Lambda functions
- Lambda event source mappings
- Kinesis partial batch response with Lambda
- CloudFront Functions and Lambda@Edge differences
- Lambda@Edge restrictions
- Lambda SnapStart
- CodeDeploy deployment configurations
- AWS SAM DeploymentPreference
- AWS Fargate for Amazon ECS