
Python in Production: Why Your “Best Practices” Are Killing My On-Call Rotation

It was 3:15 AM on a Tuesday in 2019. I was staring at a Grafana dashboard that looked like a heart attack. Our main API, a Python service handling checkout flows for a major e-commerce client, was throwing 504s across three availability zones. The CPU usage was flatlined at 12%, but the memory was climbing at a perfect, terrifying 45-degree angle. We were being OOM-killed every six minutes. The culprit? A “senior” dev had implemented a custom caching decorator using a global dictionary without a TTL or a maximum size. They thought they were being clever. They thought they were following “Python best” practices for performance. They were wrong. I spent four hours manually killing pods and scaling the replica set to 50 just to keep the site alive while we reverted the commit. That was the day I stopped trusting “clean code” tutorials and started caring about how Python actually behaves when the shit hits the fan.

Most “Python best practices” lists are written by people who have never had to debug a race condition in a production environment at scale. They talk about list comprehensions and PEP 8. I don’t care about your trailing commas. I care about your signal handling, your dependency resolution, and why your Docker image is 1.2GB. If you want to write Python that survives a traffic spike on a Friday afternoon, you need to stop thinking like a coder and start thinking like a systems engineer. Python is a beautiful, high-level language that hides a lot of complexity. That complexity is exactly what will bite you in the ass when you’re running 500 nodes in a Kubernetes cluster.

The Dependency Lie: Stop Using requirements.txt

If I see a requirements.txt file in a root directory without a corresponding lockfile, I assume the project is broken. It’s just a matter of time. pip install -r requirements.txt is non-deterministic. You might get requests==2.31.0 today, but if a sub-dependency releases a breaking change tomorrow, your CI/CD pipeline will explode. Or worse, it will pass, and your production environment will start behaving erratically because of a transitive dependency conflict.

The “Python best” approach here isn’t just “use a tool.” It’s “use a tool that generates a cryptographic lockfile.” I used to advocate for Poetry, but it’s become bloated and slow. These days, I’m all in on uv. It’s written in Rust, it’s insanely fast, and it handles virtual environments without making me want to throw my laptop out the window. If you aren’t using uv or at least pip-compile from pip-tools, you aren’t doing production Python.

  • Deterministic Builds: Your lockfile must include hashes. If the hash of the downloaded wheel doesn’t match the lockfile, the build should fail. This prevents supply-chain attacks and ensures that what you tested in staging is exactly what runs in prod.
  • Separation of Concerns: Keep your development dependencies (like pytest, black, and mypy) separate from your runtime dependencies. Your production Docker image should not contain a test runner.
  • The pyproject.toml Standard: Stop using setup.py. It’s 2024. Use the PEP 621 standard. It’s declarative, readable, and doesn’t require executing arbitrary Python code just to install a package.
  • Avoid “Latest”: Never, ever use package>=1.0.0 in a production environment without a lockfile. You are begging for a breaking change to ruin your weekend.
# Example of a modern pyproject.toml using uv
[project]
name = "payment-processor"
version = "0.1.0"
dependencies = [
    "fastapi==0.110.0",
    "pydantic==2.6.3",
    "sqlalchemy[asyncio]==2.0.28",
    "structlog==24.1.0",
    "uvicorn[standard]==0.27.1",
]

[tool.uv]
dev-dependencies = [
    "pytest==8.0.2",
    "mypy==1.8.0",
    "httpx==0.27.0",
]

Pro-tip: When using uv, you can run uv pip compile pyproject.toml -o requirements.txt to generate a pinned, hashed file if your legacy deployment scripts still require a requirements.txt. It gives you the speed of modern tools with the compatibility of the old ones.
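A minimal sketch of that workflow, using flags that exist in current uv and pip; adapt the paths to your own repo:

# Generate a fully pinned file with SHA256 hashes for every wheel
uv pip compile pyproject.toml -o requirements.txt --generate-hashes

# In CI or the Docker build, refuse anything that doesn't match those hashes
pip install --require-hashes --no-deps -r requirements.txt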

Typing is Not Optional (If You Value Your Sanity)

I’ve heard the argument a thousand times: “Python is dynamic, adding types just makes it Java.” This is a lazy take. In a large codebase, types are documentation that the compiler (or in this case, mypy) can actually verify. I once spent two days chasing a bug where a function expected a UUID object but received a str. In a dynamic world, it failed five layers deep inside a database driver with a cryptic AttributeError. With mypy, that’s a 2-second fix during local development.
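To make that concrete, here’s a minimal sketch of that exact bug class; fetch_user and the literal UUID are made up for illustration:

from uuid import UUID

def fetch_user(user_id: UUID) -> dict:
    # Downstream, the database driver expects a real UUID object, not its string form
    return {"id": user_id}

raw_id = "8f2b6a1c-0c2d-4f5e-9a7b-3d1e2c4b5a69"  # fresh off the wire, still a str

fetch_user(raw_id)        # mypy flags this: incompatible type "str"; expected "UUID"
fetch_user(UUID(raw_id))  # fine: convert once, at the boundary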

But don’t just add types for the sake of it. Use Pydantic for data validation at the boundaries. If you’re ingesting JSON from an external API like Stripe or GitHub, you cannot trust that data. Pydantic forces you to define a schema and validates it at runtime. If the data is wrong, it fails early and loudly, rather than letting a NoneType error propagate into your business logic.

from pydantic import BaseModel, Field, field_validator
from uuid import UUID
from datetime import datetime

class StripeWebhook(BaseModel):
    event_id: str = Field(..., alias="id")
    created_at: datetime = Field(..., alias="created")
    user_id: UUID
    amount_cents: int

    @field_validator("amount_cents")
    @classmethod
    def must_be_positive(cls, v: int) -> int:
        if v <= 0:
            raise ValueError("We aren't giving money away.")
        return v

def process_payment(data: dict):
    # This is where the magic happens. If data is malformed, 
    # Pydantic raises a ValidationError immediately.
    payment = StripeWebhook(**data)
    print(f"Processing {payment.amount_cents} for {payment.user_id}")

Notice I used UUID and datetime. These aren’t just strings. Pydantic handles the conversion for you. This is “Python best” practice because it reduces the cognitive load on the next developer. They don’t have to guess what data contains; the code tells them exactly what to expect.
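A quick usage sketch of the model above; the payload values are invented, but the coercion is exactly what you get at the boundary:

from pydantic import ValidationError

payload = {
    "id": "evt_123",                                    # maps to event_id via the alias
    "created": 1712000000,                              # Unix timestamp -> datetime
    "user_id": "8f2b6a1c-0c2d-4f5e-9a7b-3d1e2c4b5a69",  # str -> UUID
    "amount_cents": 4999,
}
process_payment(payload)

try:
    process_payment({**payload, "amount_cents": -1})
except ValidationError as exc:
    print(exc)  # fails loudly at the boundary, not five layers deep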

The Asyncio Foot-Gun

Everyone wants to use asyncio because it sounds fast. “It’s non-blocking!” they shout. Sure, it’s non-blocking until you accidentally call a synchronous library like requests inside an async def function. Now you’ve blocked the entire event loop, and your “high-performance” service is processing exactly one request at a time while the others time out. I’ve seen entire clusters grind to a halt because someone used time.sleep() instead of await asyncio.sleep().
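If you’re stuck with a synchronous client somewhere deep in an async path, push it onto a worker thread rather than blocking the loop. A minimal sketch; legacy_blocking_call is a stand-in for whatever sync library you can’t replace yet:

import asyncio
import time

def legacy_blocking_call() -> str:
    # Pretend this is requests.get() or a sync DB driver
    time.sleep(1)
    return "done"

async def handler() -> str:
    # Runs the blocking call in the default thread pool; the event loop keeps serving
    return await asyncio.to_thread(legacy_blocking_call)

print(asyncio.run(handler()))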

If you are doing heavy CPU work—image processing, heavy math, or even just massive JSON parsing—asyncio is the wrong tool. Python’s Global Interpreter Lock (GIL) still exists (mostly, until 3.13’s free-threading matures). For CPU-bound tasks, you need multiprocessing. For I/O-bound tasks, asyncio is great, but you have to be disciplined. You need to audit every single library you use to ensure it’s async-compatible. Using psycopg2? Stop. Use psycopg (version 3) or asyncpg. Using requests? Switch to httpx or aiohttp.
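For the CPU-bound case, the same escape hatch with processes instead of threads; crunch is a stand-in workload:

import asyncio
from concurrent.futures import ProcessPoolExecutor

def crunch(n: int) -> int:
    # Stand-in for image processing or heavy parsing: pure CPU, holds the GIL
    return sum(i * i for i in range(n))

async def main() -> None:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # The event loop stays responsive while a separate process burns the CPU
        result = await loop.run_in_executor(pool, crunch, 10_000_000)
    print(result)

if __name__ == "__main__":
    asyncio.run(main())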

One of the biggest “gotchas” in asyncio is task management. People love asyncio.gather(*tasks), but it’s dangerous. If one task fails, the others keep running, often in a “zombie” state, leaking resources. In Python 3.11+, we finally got TaskGroup, which handles this properly. If one task in a group fails, the others are cancelled. This is how you write resilient code.

import asyncio
import httpx

async def fetch_service_status(client: httpx.AsyncClient, url: str):
    response = await client.get(url, timeout=5.0)
    response.raise_for_status()
    return response.json()

async def monitor_system():
    urls = [
        "https://api.stripe.com/health",
        "https://api.github.com/status",
        "http://localhost:8080/health"
    ]

    async with httpx.AsyncClient() as client:
        try:
            async with asyncio.TaskGroup() as tg:
                tasks = [tg.create_task(fetch_service_status(client, url)) for url in urls]

            results = [t.result() for t in tasks]
            print(f"All systems operational: {results}")
        except ExceptionGroup as eg:
            # Handle multiple failures gracefully
            print(f"System failures detected: {eg.exceptions}")

# Pro-tip: Use uvloop for a faster event loop implementation in production.
# import uvloop
# asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

If you’re still on Python 3.9 or 3.10, you’re living in the past. Upgrade. The performance improvements in 3.11 and 3.12 alone are worth the migration effort. We saw a 15% drop in p99 latency just by bumping the runtime version without changing a single line of code.

Logging is Not for Humans

If your logs look like this: INFO: User 12345 logged in, you are making my life miserable. When I’m trying to aggregate logs from 50 different microservices in an ELK stack or Datadog, I don’t want to write complex regex patterns to extract a user ID. I want JSON. Structured logging is the only way to maintain observability in a distributed system.

The standard logging module in Python is a nightmare of global state and confusing configurations. I prefer structlog. It allows you to attach context to a logger. For example, when a request comes in, you can bind a request_id to the logger. Every subsequent log message—even those deep in your business logic—will automatically include that request_id. This makes tracing a single request across your entire stack trivial.

  • Contextual Binding: Bind user_id, correlation_id, and environment at the top level.
  • Level Discipline: DEBUG is for local dev. INFO is for high-level flow. WARNING is for “this is weird but we can recover.” ERROR is for “someone needs to look at this.” CRITICAL is for “the database is gone.”
  • No PII: Never, ever log passwords, credit card numbers, or PII. I once had to scrub 4TB of logs because a dev logged the entire kwargs of a create_user function.
import structlog
from uuid import uuid4

# Configure structlog to output JSON for production
structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        # Render exc_info into an "exception" field so tracebacks survive JSON output
        structlog.processors.format_exc_info,
        structlog.processors.JSONRenderer()
    ]
)

logger = structlog.get_logger()

def handle_request(user_id: str):
    # Bind request-specific context
    log = logger.bind(request_id=str(uuid4()), user_id=user_id)

    log.info("processing_request", action="payment_initiate")

    try:
        # Simulate business logic
        1 / 0
    except Exception as e:
        log.error("request_failed", error=str(e), exc_info=True)

handle_request("user_8821")

The output is a clean JSON object. My log aggregator can index user_id and request_id automatically. I can now query for all errors associated with user_8821 in seconds. That’s how you reduce Mean Time To Resolution (MTTR).
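And on the PII point: structlog processors are plain callables that take and return the event dict, so a scrubber is cheap insurance. A minimal sketch; the key names are assumptions about your own payloads:

SENSITIVE_KEYS = {"password", "card_number", "cvv", "token"}  # adjust to your payloads

def redact_sensitive(logger, method_name, event_dict):
    # Called for every log event; mask known-sensitive fields before rendering
    for key in SENSITIVE_KEYS & event_dict.keys():
        event_dict[key] = "[REDACTED]"
    return event_dict

# Register it in structlog.configure(...) anywhere before JSONRenderer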

The Docker Trap: Alpine is Not Your Friend

There is a common “Python best” tip that says you should use Alpine Linux for your Docker images to keep them small. This is terrible advice for Python. Python relies heavily on glibc. Alpine uses musl. Most pre-compiled Python wheels (the stuff you download from PyPI) are built for glibc. When you try to install them on Alpine, pip can’t find a compatible wheel, so it tries to compile the package from source.

Now your “small” image needs gcc, make, and a bunch of system headers just to install pandas or cryptography. Your build time goes from 30 seconds to 15 minutes, and your final image is actually *larger* than if you had just used a slim Debian-based image. Use python:3.12-slim-bookworm. It’s stable, it’s compatible with almost everything, and it’s maintained by people who know what they’re doing.

Also, stop running your containers as root. It’s a massive security risk. If someone finds an exploit in your web framework, they have root access to the container. Use a non-privileged user.

# Use a specific version, never 'latest'
FROM python:3.12-slim-bookworm AS builder

# Set environment variables to make Python behave in Docker
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    UV_PROJECT_ENVIRONMENT=/venv

# Install uv (pin this tag in real builds instead of riding :latest)
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

WORKDIR /app
COPY pyproject.toml uv.lock ./

# Install third-party dependencies into the virtualenv; the project code itself is copied in the final stage
RUN uv sync --frozen --no-dev --no-install-project

# Final stage
FROM python:3.12-slim-bookworm

WORKDIR /app
COPY --from=builder /venv /venv
COPY ./src ./src

# Create a non-root user
RUN groupadd -g 999 python && \
    useradd -r -u 999 -g python python
USER python

ENV PATH="/venv/bin:$PATH"

# Use exec form for CMD to handle signals correctly
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]

Using the exec form (["executable", "param1"]) for your CMD is crucial. If you use the shell form (CMD uvicorn src.main:app), Python runs as a child of /bin/sh. When Kubernetes sends a SIGTERM to stop the pod, the shell doesn’t forward it to Python. Your app will just sit there until K8s loses patience and SIGKILLs it after 30 seconds. This means no graceful shutdowns, no closing DB connections, and potentially corrupted data.
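If you run your own long-lived worker instead of handing the lifecycle to uvicorn, you still have to catch that SIGTERM yourself. A minimal sketch of a graceful shutdown for an asyncio worker; the drain step is whatever cleanup your app actually needs:

import asyncio
import signal

async def worker() -> None:
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()

    # When Kubernetes sends SIGTERM, flip a flag instead of dying mid-request
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, stop.set)

    while not stop.is_set():
        await asyncio.sleep(1)  # stand-in for "pull a job, process it"

    # Drain in-flight work, close DB pools, flush logs, then exit cleanly
    print("graceful shutdown complete")

asyncio.run(worker())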

The Real World: Managing State and Connections

In a production environment, your Python app is rarely a standalone entity. It’s a node in a graph. It talks to Postgres, Redis, RabbitMQ, and third-party APIs. The way you manage these connections determines whether your app is resilient or brittle.

One of the most common mistakes is not setting timeouts on *everything*. If you use the requests library, the default timeout is None. This means if a third-party API hangs, your worker thread will hang forever. Eventually, all your workers are stuck waiting for a response that will never come, and your service goes down. Always set a connect timeout and a read timeout.
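Concretely, with httpx (the URL is a placeholder), split the budget so that "can't connect" fails fast and "connected but slow to answer" gets its own explicit limit:

import httpx

# Every phase gets an explicit budget; nothing defaults to "wait forever"
timeout = httpx.Timeout(connect=3.0, read=10.0, write=10.0, pool=5.0)

with httpx.Client(timeout=timeout) as client:
    response = client.get("https://api.example.com/health")
    response.raise_for_status()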

Database connection pooling is another area where “Python best” practices often fail. People either open too many connections and crash Postgres, or they don’t use a pool and spend half their request time doing TCP handshakes. If you’re using SQLAlchemy, use the QueuePool. If you’re in a serverless environment like AWS Lambda, you need a proxy like RDS Proxy or PgBouncer because Lambda will exhaust your connection limit in seconds.

from sqlalchemy.ext.asyncio import create_async_engine

# Production-ready engine configuration
engine = create_async_engine(
    "postgresql+asyncpg://user:pass@localhost/dbname",
    pool_size=20,          # Maximum number of permanent connections
    max_overflow=10,       # Allow 10 extra connections during spikes
    pool_timeout=30,       # Give up after 30 seconds of waiting for a connection
    pool_recycle=1800,     # Close connections after 30 mins to avoid stale links
    pool_pre_ping=True     # Check if the connection is alive before using it
)

The pool_pre_ping=True is a lifesaver. It handles the “The network went away for a second” scenario by transparently reconnecting if the connection is dead. Without it, your first few requests after a DB restart will all fail with a BrokenPipeError.

The “Gotcha”: Circular Imports and the Global Scope

As your project grows, you will eventually hit a circular import. You have models.py that needs utils.py, and utils.py that needs models.py. The “Python best” way to fix this isn’t to put the import inside a function (though that works in a pinch). It’s to refactor. Circular imports are a sign of tight coupling. Move the shared logic to a third module, or use type hinting strings ("ModelName") to avoid importing the class at runtime.
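When the import really is only needed for annotations, the standard escape hatch looks like this; the module and attribute names are hypothetical:

# utils.py -- needs models.User only for type checking, never at runtime
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from models import User  # mypy sees this; the interpreter never imports it

def display_name(user: User) -> str:
    # With future annotations, "User" is just a string at runtime: no circular import
    return f"{user.first_name} {user.last_name}"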

Also, be extremely careful with what you put in the global scope. Anything at the top level of a module is executed when the module is imported. If you have a db_connection = connect_to_db() at the top of a file, that connection will be created during the build process or when your unit tests run. This is a nightmare for testing. Use dependency injection or a factory pattern. Keep your side effects inside functions or classes that are instantiated at runtime.
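A minimal sketch of keeping that side effect out of import time, using a cached factory instead of a module-level connection; connect_to_db stands in for whatever your real setup function is:

from functools import lru_cache

def connect_to_db():
    # Hypothetical stand-in for an expensive, side-effectful connection setup
    print("opening connection")
    return object()

@lru_cache(maxsize=1)
def get_db_connection():
    # Nothing runs at import time; the first caller pays the cost,
    # and tests can patch connect_to_db or call get_db_connection.cache_clear()
    return connect_to_db()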

I once saw a codebase where a global variable was used to store a configuration fetched from an AWS Parameter Store. Every time a developer ran a unit test, the test suite would try to connect to AWS. If they were on a plane without Wi-Fi, the tests wouldn’t even start. Don’t be that person. Use pydantic-settings to manage your config and load it explicitly.

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    db_url: str
    api_key: str
    debug: bool = False

    model_config = SettingsConfigDict(env_file=".env")

# Instantiate once at the entry point of your app
settings = Settings()

The Wrap-up

Stop chasing the latest “clean code” trends and start looking at your telemetry. Python is a tool, and like any tool, it has failure modes that only appear under pressure. Focus on deterministic dependencies, strict typing at the boundaries, disciplined concurrency, and structured observability. If you can’t explain how your app handles a SIGTERM or what happens when your database latency triples, you haven’t followed “Python best” practices; you’ve just written code that works on your machine. Build for the failure, not the happy path.
