simonw
2 hours ago
This article would benefit from a date. It looks like it's recent (Internet Archive first grabbed it on May 29th) but it's the kind of information that can quickly become stale as models and agents improve.
(I've been getting solid results recently from simply telling Claude Code and Codex "Test with uv run pytest, use red/green TDD".)
__mharrison__
34 minutes ago
Here's a portion of my AGENTS.md from this week (playing FDE, implementing a custom workflow for a client that 20x their productivity).
# Python Tooling
- Use `uv` to manage Python environments and dependencies.
- Use `uv run` to execute Python scripts and commands.
- Use `pytest` for testing your code.
- Use the `hypothesis` library for property-based testing when you have complex input spaces or need to test edge cases.
- Don't edit `pyproject.toml` directly. Instead, use `uv add` and `uv add --dev` to manage dependencies.
- Use ruff, ty, prek, wily for code quality and linting.
- Don't use excessive casting. If you find yourself needing to cast types frequently, consider refactoring your code to use more appropriate types. Casting should only be done in boundary layers where you are interfacing with external systems.
- Run appropriate tooling after making changes to your code to ensure it meets quality standards.
- When you come across a bug or regression, think hard about writing a test and also how to create code that will prevent this from happening again in the future.
- When creating a command line interface, add `--verbose` flag that provides logging output useful for debugging issues.
- Before creating code, brainstorm 5 different approaches to solve the problem and sort them by their probable effectiveness. Then, choose the best approach and implement it.
- Use Test Driven Development (TDD) for all code you write. Write tests before writing the implementation code.
- Collect pytest fixtures in a `conftest.py` file to avoid duplication
- Prefer testing real code where possible. Use doubles and `monkeypatch` when absolute necessary. Try to avoid mocking as much as possible.
- Favor pytest monkeypatch to mock.
- When a test fails, run the last failed test first using `uv run pytest --last-failed`
- Use numpy-style docstrings for all functions and classes you create.
- Include doctests in the docstrings of your functions to provide examples
- Use type hints for all function parameters and return types.
- Use logging to provide insight into failures. Don't use print for debugging. Don't use logging to hide stack traces.porphyra
an hour ago
A lot of prompt engineering goes out of date quickly. Nobody nowadays goes "you are an expert software engineer. make no mistakes" lol.
As a personal anecdote, I find that a lot of big prompts and skills use up context window budget and in many cases agents will eagerly try to use a skill even if it isn't super relevant or necessary for the current task. So when I have too many skills I have to spend a bunch of time toggling the checkboxes to figure out which ones are needed for the task at hand before starting...
disgruntledphd2
2 hours ago
Me too, although I dislike the fact that it over-focuses on mocks (which I accept is over-represented in the training data).
galsapir
an hour ago
sometimes I also feel it tries to optimise for "per line coverage" over more "real, complex use cases" type tests