Python Best Practices - Guide

Timestamp: 03:14:22 UTC. The cluster started bleeding, and it was all because someone thought they were too good for type hints.

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/fastapi/routing.py", line 299, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/fastapi/routing.py", line 210, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/api/v2/endpoints/processor.py", line 84, in process_data
    result = compute_weighted_average(payload.items)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/logic/math_utils.py", line 12, in compute_weighted_average
    return sum(item.value * item.weight for item in items) / sum(item.weight for item in items)
               ^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'value'

I’ve been awake for 48 hours. My keyboard is sticky with spilled espresso and my monitors are burning a hole through my retinas. The cooling fan in the server rack next to my desk is screaming at 5000 RPM, which is a fitting soundtrack for the absolute wreckage I’ve been digging through. We had a 99.99% uptime SLA. We had a reputation. Then Kevin—our “rockstar” junior who thinks documentation is for people who can’t read code—pushed a “minor optimization” to the data processing pipeline. He didn’t use type hints. He didn’t use a linter. He just pushed. And because our CI/CD pipeline was apparently configured by a toddler, it let his unmitigated disaster of a PR slide right into production.

The error above is the smoking gun. It’s a Python 3.12.1 runtime failure that should have been caught at the IDE level, or at the very least, during a pre-commit hook. But no. We’re running Python 3.12.1, taking advantage of the new f-string parsing improvements and the per-interpreter GIL work, yet we’re writing code like it’s 2005 and we’re trying to build a guestbook for a hobby site. This isn’t just about style; it’s about the fundamental Python best practices that keep the lights on when the traffic spikes at 3:00 AM.

The Anatomy of a Runtime Catastrophe

The failure started in the compute_weighted_average function. In Python 3.12.1, the interpreter is faster, sure, but it’s not a mind reader. Kevin decided that passing a list of objects was “too heavy” and instead passed a list of dictionaries, but he forgot to update the attribute access to key access. Or rather, he mixed them up. In a statically typed language, the compiler would have laughed him out of the room. In Python, without mypy 1.8.0 or pyright 1.1.350, the code just sits there like a landmine, waiting for a specific payload to trigger a TypeError or an AttributeError.

This is the cost of “clever” code. Clever code is code that bypasses safety checks to save three lines of boilerplate. Clever code is what causes SREs to lose their weekends. If you want to claim you follow Python best practices, you start by acknowledging that your brain is a faulty biological machine and you need the machine to check your work. The items variable was supposed to be a list[Item], where Item is a Pydantic v2.6.1 model. Instead, it was a raw list[dict]. The sum() function started iterating, hit a dictionary, tried to access .value, and the whole service collapsed under the weight of its own incompetence.
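To make the failure mode concrete, here is a minimal sketch of the mismatch. The Item class below is a stdlib dataclass standing in for the Pydantic model (an assumption, so the snippet runs without dependencies); the function body mirrors the line from the traceback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Stand-in for the Pydantic Item model (assumed fields: value, weight)."""
    value: float
    weight: float


def compute_weighted_average(items: list[Item]) -> float:
    """Same expression as the failing line in the traceback."""
    return sum(item.value * item.weight for item in items) / sum(item.weight for item in items)


good = [Item(value=10.0, weight=1.0), Item(value=20.0, weight=3.0)]
bad = [{"value": 10.0, "weight": 1.0}]

average = compute_weighted_average(good)  # (10*1 + 20*3) / (1 + 3)

# The annotation is not enforced at runtime: the call itself is accepted,
# and the failure only appears once the body touches `.value`.
try:
    compute_weighted_average(bad)  # type: ignore[arg-type]
except AttributeError as exc:
    failure = str(exc)
```

Nothing in the interpreter objects to the second call until the generator expression runs, which is why a static checker has to be the one to complain.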

The Dependency Hellscape and the Death of Determinism

When I finally got into the pod to see why the rollback was failing, I found the next layer of the nightmare. The requirements.txt file. I hate requirements.txt. It is a relic of a simpler, stupider time. It’s a flat list of lies. Kevin had added requests without pinning a version.

# Raw terminal output: pip list on the failing pod
Package            Version
------------------ ---------
annotated-types    0.6.0
anyio              4.2.0
click              8.1.7
fastapi            0.109.0
h11                0.14.0
idna               3.6
pydantic           2.6.1
pydantic_core      2.16.2
requests           2.31.0
sniffio            1.3.0
starlette          0.35.1
typing_extensions  4.9.0
uvicorn            0.27.0.post1

Look at that. requests 2.31.0. Yesterday it was 2.28.1. Somewhere between one build and the next, a sub-dependency shifted, and because we aren’t using a lockfile, the build system just grabbed whatever was newest on PyPI. This is why pyproject.toml is not optional. If you aren’t using poetry 1.7.1 or pdm 2.12.0 to generate a deterministic lockfile, you aren’t running a production environment; you’re running a gambling ring.

The best way to handle dependency management in Python is to use pyproject.toml with strict pins and a lockfile that records package hashes. I spent four hours just trying to reconstruct the exact environment that existed before the “optimization” push because the pip freeze output from the dev environment didn’t match the staging environment. We are using Python 3.12.1, which has excellent support for modern packaging standards, yet we’re still acting like we’re installing packages on a shared hosting provider in 2010.
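As a stopgap while a requirements.txt still exists, a check like the following could flag unpinned entries before they reach a build. This is a rough sketch, not a replacement for a lockfile: find_unpinned is a hypothetical helper, the regex only recognizes simple name==version lines, and an exact pin still says nothing about transitive dependencies.

```python
import re

# Matches an exact pin such as "requests==2.31.0" (optionally with extras).
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^=\s]+")


def find_unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned with `==`."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned


sample = """\
fastapi==0.109.0
pydantic==2.6.1
requests
uvicorn>=0.27
"""
print(find_unpinned(sample))  # ['requests', 'uvicorn>=0.27']
```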

Type Hinting as a Survival Mechanism

I’m tired of hearing that type hints make Python “look like Java.” You know what Java has? Fewer 3:00 AM outages caused by NoneType errors. In Python 3.12.1, we have the new type parameter syntax and type statement (PEP 695), which make generics actually readable. We have Annotated for metadata. We have zero excuses.

When I ran mypy against the “optimized” code, the output was a wall of red text that looked like a crime scene.

# Raw terminal output: mypy --strict logic/math_utils.py
logic/math_utils.py:12: error: "dict[Any, Any]" has no attribute "value"  [attr-defined]
logic/math_utils.py:12: error: "dict[Any, Any]" has no attribute "weight"  [attr-defined]
logic/math_utils.py:15: error: Argument 1 to "compute_weighted_average" has incompatible type "list[dict[str, float]]"; expected "list[Item]"  [arg-type]
Found 3 errors in 1 file (checked 1 source file)

Three errors. It took mypy 0.4 seconds to find what took me 4 hours of log aggregation to isolate. This is why I insist on mypy 1.8.0 being a blocking step in the CI. If the types don’t check out, the code doesn’t exist. I don’t care if it “works” on your local machine. Your local machine isn’t handling 50,000 requests per second. Your local machine isn’t running in a container with limited cgroups memory.

The best way to write a Python function is to define the contract: def compute_weighted_average(items: list[Item]) -> float. It’s not a suggestion. It’s a contract. When Kevin changed Item (a Pydantic model) to a dict, he broke the contract. Because he didn’t update the type hints, the IDE didn’t complain. Because he didn’t run mypy, the CI didn’t complain. And because he’s a “rockstar,” he didn’t think he needed to check.
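One way to keep that contract honest is to convert raw payload entries into typed objects at the boundary, so bad input fails early with a readable message instead of deep inside the arithmetic. The sketch below uses a stdlib dataclass as a stand-in for the Pydantic model; parse_items and the zero-weight guard are illustrative assumptions, not the original code.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Item:
    """Stand-in for the Pydantic Item model (assumed fields: value, weight)."""
    value: float
    weight: float


def parse_items(raw_items: list[Any]) -> list[Item]:
    """Convert untrusted payload entries to Item, failing loudly and early."""
    items = []
    for index, raw in enumerate(raw_items):
        if not isinstance(raw, dict):
            raise TypeError(f"items[{index}] must be a mapping, got {type(raw).__name__}")
        try:
            items.append(Item(value=float(raw["value"]), weight=float(raw["weight"])))
        except KeyError as exc:
            raise ValueError(f"items[{index}] is missing field {exc}") from exc
    return items


def compute_weighted_average(items: list[Item]) -> float:
    """Weighted average over typed items; rejects a zero total weight."""
    total_weight = sum(item.weight for item in items)
    if total_weight == 0:
        raise ValueError("total weight must be non-zero")
    return sum(item.value * item.weight for item in items) / total_weight


payload = [{"value": 10, "weight": 1}, {"value": 20, "weight": 3}]
result = compute_weighted_average(parse_items(payload))
```

With the conversion isolated in one place, everything downstream can rely on list[Item] and the type checker can verify it.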

The Ruff Mandate: Linting or Chaos

We used to use flake8, isort, and black. It was a mess of different config files and slow execution times. Now we have Ruff 0.2.1. It’s written in Rust. It’s fast. It’s exhaustive. And yet, Kevin had disabled the Ruff pre-commit hook because it was “slowing down his workflow.”

His workflow. His precious seconds of saved time cost the company thousands of dollars in lost revenue and cost me my sanity.

If you aren’t using Ruff, you are failing. The best approach to Python code quality is an automated, uncompromising linter that runs on every save. Ruff catches things that mypy misses: unused imports that bloat the namespace, mutable default arguments that create persistent state across function calls, and the use of eval(), which is basically an open invitation for a remote code execution exploit.
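For anyone who has not been bitten by the mutable default argument yet, a small sketch of the problem and the usual fix follows. The function names are made up for illustration; B006 is the flake8-bugbear rule that Ruff uses for this pattern.

```python
from typing import Optional


def add_tag_shared(tag: str, tags: list[str] = []) -> list[str]:  # noqa: B006
    """Buggy on purpose: the default list is created once and reused."""
    tags.append(tag)
    return tags


def add_tag(tag: str, tags: Optional[list[str]] = None) -> list[str]:
    """Safe variant: a fresh list is created on each call when none is given."""
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


first = add_tag_shared("a")
second = add_tag_shared("b")  # same list object as `first`, now ['a', 'b']

fresh_one = add_tag("a")  # ['a']
fresh_two = add_tag("b")  # ['b']
```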

I had to go back and fix his try...except blocks. He was using a bare except:, which catches everything, including KeyboardInterrupt and SystemExit, and then (this is the best part) he was using pass. He silenced the errors. He literally buried the evidence of his own code’s failure. If Ruff had been running, it would have flagged E722 (do not use bare except) and, with the flake8-bandit rules enabled, S110 (try-except-pass).
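A short sketch of why swallowing exceptions is so dangerous: a bare except traps SystemExit and KeyboardInterrupt along with everything else, while a handler that names the exception it can deal with lets the rest propagate. The function names and the log message are hypothetical.

```python
import logging

logger = logging.getLogger("math_utils")


def swallow_everything(action):
    """Anti-pattern: a bare except also traps SystemExit and KeyboardInterrupt."""
    try:
        action()
    except:  # noqa: E722  (kept bare on purpose for the demonstration)
        pass
    return "swallowed"


def handle_expected(action):
    """Catch only the expected failure, log it with its traceback, re-raise."""
    try:
        return action()
    except ZeroDivisionError:
        logger.exception("weighted average failed: total weight is zero")
        raise


def request_shutdown():
    """Simulates a shutdown request arriving mid-call."""
    raise SystemExit(1)


lost = swallow_everything(request_shutdown)  # SystemExit never reaches the caller
```

Calling handle_expected(request_shutdown) instead lets the SystemExit through untouched, which is what a process supervisor expects.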

Structured Logging: A Love Letter to Sanity

When the system started failing, I went to the logs. What did I find?

ERROR: Something went wrong in the math function.

That’s it. That was the log entry. No stack trace (because he caught the exception and passed), no request ID, no user ID, no payload snippet. Just a vague sense of dread in string format.

We are using structlog 24.1.0 for a reason. We need JSON-formatted logs that can be ingested by Elasticsearch and queried with precision. The best way to handle logging in Python is to treat logs as data, not as a diary. A good log entry should look like this:

{
  "event": "calculation_failed",
  "level": "error",
  "timestamp": "2024-05-20T03:14:22.123Z",
  "request_id": "a8f3-4b92-91c1",
  "exception": "AttributeError: 'dict' object has no attribute 'value'",
  "input_payload_summary": {"item_count": 42, "type": "list[dict]"}
}

Instead, I got a string that told me nothing. I had to inject sys.settrace into a running process like some kind of digital surgeon just to see what the hell was happening inside the event loop. In Python 3.12.1, the sys module and the new monitoring API (PEP 669) actually make this easier, but I shouldn’t have to use debugger-level tools to find a basic logic error.

Logging is not an afterthought. It is the only window you have into the black box of production. If you don’t log with context, you are flying blind in a storm. I’ve spent the last six hours rewriting the logging middleware for FastAPI 0.109.0 to ensure that every single request carries a trace ID through the entire stack, from the load balancer to the database driver.
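The idea of logs as data with a request ID attached can be sketched with the standard library alone. This is not the structlog configuration or the FastAPI middleware described above; JsonFormatter, request_id_var, and the "context" extra key are assumptions made for illustration.

```python
import contextvars
import json
import logging
from datetime import datetime, timezone

# Hypothetical context variable; middleware would set it once per request.
request_id_var = contextvars.ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "event": record.getMessage(),
            "level": record.levelname.lower(),
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "request_id": request_id_var.get(),
        }
        # Structured fields passed as extra={"context": {...}} by the caller.
        entry.update(getattr(record, "context", {}))
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("processor")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

request_id_var.set("a8f3-4b92-91c1")
try:
    getattr({"value": 10}, "value")  # raises the AttributeError from the incident
except AttributeError:
    logger.exception(
        "calculation_failed",
        extra={"context": {"input_payload_summary": {"item_count": 1, "type": "list[dict]"}}},
    )
```

Because the request ID lives in a context variable, any code running in the same request context picks it up without it being threaded through every function signature.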

Memory Leaks and the CPython 3.12.1 Garbage Collector

As if the logic errors weren’t enough, the “optimization” also introduced a massive memory leak. Kevin thought it would be a good idea to cache the results of the weighted average calculation in a global dictionary. He didn’t use an LRU cache. He didn’t use functools.lru_cache. He just kept adding entries to a global dict.

In Python 3.12.1, the garbage collector is efficient, but it can’t collect things that are still referenced. By the time I got the alerts, the RSS (Resident Set Size) of the worker processes was climbing at a 45-degree angle.

# Raw terminal output: pytest traceback from the memory leak reproduction script
_________________________ test_memory_growth __________________________
def test_memory_growth():
    initial_mem = get_process_memory()
    for _ in range(10000):
        process_request({"items": [{"value": 10, "weight": 1}]})
    final_mem = get_process_memory()
>   assert final_mem < initial_mem * 1.1
E   AssertionError: assert 1024.5 < 110.0
E   Note: Memory grew from 100MB to 1GB in 10,000 iterations.

This wasn’t even a reference cycle for the generational collector to break: every cached result stayed reachable from a global variable, so nothing was ever eligible for collection. This is basic memory management. This is “Python 101,” yet here we are. The best practice for caching in Python is to use a dedicated store like Redis 7.2 or, if it must be in-memory, a bounded cache with a clear eviction policy.

I had to kill the worker processes manually because the OOM (Out Of Memory) killer was taking too long and the swap space was starting to thrash. When a Linux kernel starts thrashing swap, the whole node becomes unresponsive. I couldn’t even SSH in for ten minutes. I had to hard-reboot the instances through the AWS console. All of this because someone didn’t want to use a standard library decorator.
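For reference, the standard library decorator in question looks roughly like this in use. The function name, the tuple-of-pairs input shape, and the maxsize value are illustrative assumptions; the point is only that the cache stops growing once it reaches its bound.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)  # illustrative bound, not a tuned recommendation
def cached_weighted_average(items: tuple[tuple[float, float], ...]) -> float:
    """Weighted average over (value, weight) pairs; tuples keep the key hashable."""
    total_weight = sum(weight for _, weight in items)
    return sum(value * weight for value, weight in items) / total_weight


# 10,000 distinct inputs, mirroring the loop in the reproduction test.
for i in range(10_000):
    cached_weighted_average(((float(i), 1.0),))

info = cached_weighted_average.cache_info()
print(info.currsize)  # 1024: least recently used entries were evicted
```

lru_cache needs hashable arguments, which is why the sketch takes tuples rather than lists or dicts.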

The CI/CD Gatekeeper

The final failure wasn’t Kevin’s code; it was our culture. We allowed a PR to be merged without a green light from the static analysis tools. We allowed “clever” to beat “stable.”

From now on, the rules are changing. I don’t care if the feature is needed for a demo to the CEO. I don’t care if it’s a “one-line fix.”

Strict Typing: Every function must have type hints. No Any. No ignore. If mypy 1.8.0 isn’t happy, the build fails.
Ruff Enforcement: The Ruff linter will run with the ALL rule set. Any violation is a build failure.
Pydantic Everywhere: No raw dictionaries will be passed between architectural layers. We use Pydantic v2.6.1 for all data validation. The performance overhead is negligible compared to the cost of a 48-hour War Room.
Lockfiles: requirements.txt is banned. We use poetry.lock.
Testing: 100% coverage is a myth, but 0% coverage on new logic is a fireable offense. pytest 8.0.0 must run and pass.
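The rules above could be wired into a single blocking step along these lines. This is a sketch under assumptions: the command list, paths, and the run_gate helper are hypothetical, and a real pipeline would more likely express the same thing in its CI configuration.

```python
import subprocess
import sys

# Assumed commands; adjust paths and flags to the repository layout.
GATE_COMMANDS = [
    ["ruff", "check", "--select", "ALL", "."],
    ["mypy", "--strict", "."],
    ["pytest"],
]


def run_gate(commands: list[list[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for command in commands:
        print("running:", " ".join(command))
        completed = subprocess.run(command, check=False)
        if completed.returncode != 0:
            print("gate failed:", " ".join(command))
            return completed.returncode
    return 0


# In CI the entry point would be: sys.exit(run_gate(GATE_COMMANDS))
```

Stopping at the first failure keeps the feedback short: the build log ends at the tool that objected.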

I’m going to sleep now. If the cluster starts bleeding again, don’t call me. Call Kevin. Tell him to check his types. Tell him to read the CPython source code for gcmodule.c until he understands why global dictionaries are a bad idea. Tell him that “clever” is the enemy of “reliable.”

I’ve written this post-mortem not just to document the failure, but to serve as a warning. Python is a powerful, sophisticated language in its 3.12.1 iteration. It deserves better than the hacky, unoptimized garbage that was pushed this weekend. Stability is not a feature; it is a prerequisite. If you can’t write code that survives the night, you shouldn’t be writing code at all.

The next time I see a PR without type hints, I’m not just going to reject it. I’m going to delete the branch and revoke the committer’s access. We are SREs. We are the thin line between a functioning platform and a pile of smoking silicon. We don’t have time for “clever.” We only have time for what works.

Go fix your code. Use Ruff. Use Mypy. Use Pydantic. Follow the Python best practices I’ve laid out, or find another profession where your “creativity” doesn’t wake me up at 3:00 AM. My coffee is cold, my head hurts, and I’m done.
