Test AI-Written Code — From Jason's Forge

01

Write the PRD Before You Write Any Code

The spec that stops Claude from guessing

🔭 2nd Order Effect: Claude fails because it makes assumptions about your environment. A PRD replaces those assumptions with your actual facts. Every assumption Claude has to make is a potential failure point.

🔓 Unlocks: Claude gets the context it can't guess on its own

Before you ask Claude to write any code, answer these 5 questions. Open a new Claude chat and paste this template:

PRD Template — Paste Into Claude First

I need you to build a script. Before writing any code,
confirm you understand these requirements:

ENVIRONMENT:
- OS: [Mac / Linux Ubuntu / Windows]
- Python version: [run: python3 --version and paste result]
- Full Python path: [run: which python3 and paste result]
- Virtual environment: [yes/no: did you create one (e.g. python3 -m venv) and install tweepy inside it?]
- Where the script will live: [e.g. /home/joe/scripts/]

TASK:
- Posts a tweet to X (Twitter) on a schedule
- Schedule: [every day at 9am / every hour / etc.]

CREDENTIALS:
- X API credentials stored in a .env file (NOT in the code)
- Full absolute path to .env: [e.g. /home/joe/scripts/.env]
- X app has write permissions enabled: [yes/no — check at developer.twitter.com]

OUTPUT:
- Log file at: [e.g. /home/joe/scripts/logs/post.log]
- Log must capture: start time, success/failure, error message

ON FAILURE:
- Should it: log and exit / retry once / send me a notification?

STOP before writing code. Tell me:
1. What assumptions you're still making
2. What I need to verify before we start
3. What could break in MY specific environment
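
If you'd rather not run each ENVIRONMENT command by hand, a short Python snippet can collect most of the same facts. This is a minimal sketch, not part of the template: the function name describe_environment is made up for illustration, and the path it reports is the interpreter you ran it with, so run it with the same python3 you plan to schedule.

```python
import platform
import sys


def describe_environment() -> dict:
    """Collect the facts the ENVIRONMENT section of the PRD asks for."""
    return {
        "os": platform.system(),                      # e.g. "Linux", "Darwin", "Windows"
        "python_version": platform.python_version(),  # same number python3 --version prints
        "python_path": sys.executable,                # interpreter running this snippet
        # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Paste the printed lines straight into the ENVIRONMENT block of the template.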

⚠️ X/Twitter API Note: Before building anything, verify two things at developer.twitter.com: (1) Your app has Read and Write permissions enabled. Read-only is the default, and with it the API rejects the post (typically with a 403 Forbidden error) that you will never see unless the script logs it. (2) You're on a tier that allows posting. The free tier post-2023 has rate limits that can block your script even if the code is perfect.

✓ I know my OS and Python version (python3 --version)
✓ I ran which python3 and have the full path
✓ I know whether tweepy was installed inside a virtual environment
✓ My X app has Write permissions enabled at developer.twitter.com
✓ I pasted the PRD into Claude and got its assumptions back

Next: Make a second AI audit the first →

02

Use a Second AI to Audit the First

The pre-flight check that catches silent failures

🔭 2nd Order Effect: A fresh Claude prompt with critic framing produces different, more skeptical output than the same window that built the code. You catch failure modes before touching your server, not after.

🔓 Unlocks: Failure modes identified before you run anything

Once Claude gives you a script, open a brand new chat. Paste the code and this prompt. The "new chat" part matters — fresh context means no attachment to the output.

Code Critic Prompt — New Chat

You are a code critic. NOT a code writer.

This script is meant to run via cron and post to X/Twitter.

Find every way this FAILS. Check specifically:
1. Will it work when cron runs it with no environment loaded?
2. Are ALL file paths absolute (starting with /)?
3. Is the .env file loaded with an absolute path?
4. Is auth validated BEFORE attempting to post?
5. Does it log errors or crash silently?
6. Does it use the correct Python path for this user's setup?
7. What happens when it fails — does it notify anyone?

For each issue: exact line number and the fix.

[PASTE CODE HERE]

What to look for: The critic should find at least 2–3 issues on any first draft. If it says "looks fine" — push back: "What would break if zero environment variables were loaded?"
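
Checks 2 and 3 in the critic prompt (absolute paths everywhere) can also be roughed out locally before you paste anything. The sketch below is an assumption-heavy helper, not a replacement for the critic: find_relative_paths is a made-up name, it only inspects string literals, it guesses what "looks like a path" from a few prefixes and file extensions, and it assumes Unix-style paths.

```python
import ast


def find_relative_paths(source: str) -> list:
    """Return (line, literal) pairs for string literals that look like non-absolute paths."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            text = node.value
            # Heuristic only: relative prefixes, or common file extensions.
            looks_like_path = text.startswith(("./", "../", "~")) or text.endswith(
                (".env", ".log", ".txt", ".json")
            )
            if looks_like_path and not text.startswith("/"):
                hits.append((node.lineno, text))
    return sorted(hits)
```

Call it with the script's text, for example find_relative_paths(open("/home/joe/scripts/post.py").read()). An empty list only means this heuristic found nothing, so still run the critic prompt.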

✓ I opened a NEW Claude chat (not the same window that wrote the code)
✓ I pasted the critic prompt + the script
✓ I got at least 1 issue back; if zero, I pushed back
✓ I sent the issues to the original Claude and got a revised script

Next: Wrap every script in a safety net →

03

Add the Logging Wrapper

So failures scream instead of disappear

🔭 2nd Order Effect: Without a log, your script fails at 9am and you have no idea why. With a log, you open a file and see the exact error. Every future failure becomes a one-line diagnosis you can paste directly into Claude. That feedback loop is the real leverage.

🔓 Unlocks: Visibility into what happened when you weren't watching

Every auto-scheduled script needs this wrapper. Ask Claude to add it if you're not comfortable editing directly.

Python Logging Wrapper

import logging
import sys
from dotenv import load_dotenv

# ── LOAD ENV VARS — USE ABSOLUTE PATH ──
load_dotenv('/home/joe/scripts/.env')  # ← change this

# ── LOGGING — USE ABSOLUTE PATH ──
logging.basicConfig(
    filename='/home/joe/scripts/logs/post.log',  # ← change this
    level=logging.DEBUG,
    format='%(asctime)s | %(levelname)s | %(message)s'
)

logging.info("=== Script started ===")

try:
    # ── YOUR CODE GOES HERE ──

    logging.info("Post successful")

except Exception as e:
    # logging.exception records the message plus the full traceback
    logging.exception(f"FAILED: {e}")
    sys.exit(1)  # non-zero exit status marks the run as failed

Two paths to change: the .env path and the .log path. Both must be absolute — starting from /, not ./ or ~/. Not sure of your path? Run pwd in the folder where your script lives.
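
A wrong .env path usually shows up as missing credentials, so it helps to check for them at the top of the try block and fail with a clear log line. A minimal sketch, assuming your .env defines the four variables below. Those names are hypothetical; use whatever keys your file actually contains.

```python
import os

# Hypothetical variable names; match them to the keys in your own .env file.
REQUIRED_VARS = ("X_API_KEY", "X_API_SECRET", "X_ACCESS_TOKEN", "X_ACCESS_TOKEN_SECRET")


def missing_credentials(env=os.environ, required=REQUIRED_VARS) -> list:
    """Return the names of required variables that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]
```

Inside the wrapper's try block, something like `missing = missing_credentials()` followed by `if missing: raise RuntimeError(f"Missing credentials: {missing}")` sends the problem to the log before any API call is attempted. It logs the variable names only, never the values.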

⚠️ Don't forget .gitignore: If you ever push code to GitHub, add .env to your .gitignore file. Otherwise your API keys go public. This is the second half of the "credentials in .env" rule that most people miss until it's too late.

✓ Logging wrapper is in my script
✓ Both paths are absolute (start with /)
✓ The logs/ folder exists (create it if not: mkdir -p /path/to/logs)
✓ .env is in my .gitignore (if using GitHub)
✓ I ran the script manually and the log file was created

Next: Test in the exact cron environment →

04

Simulate the Cron Environment

The test that exposes the hidden failure most people skip

🔭 2nd Order Effect: Your terminal and cron are different environments. Running this test reveals that gap before the scheduled job ever runs. Pass this test and you know the script works in the stripped-down environment the scheduler gives it.

🔓 Unlocks: Real confidence — not just "it worked in my terminal"

Find the correct Python path first. This is critical if you installed tweepy inside a virtual environment, because the system python3 cannot see packages installed there:

Step 1: Find Your Python Path

which python3

Use that exact path in both the simulation command and in your cron job. Then simulate how cron will run your script:

Step 2: Cron Simulation Test

# Replace with the path from 'which python3' above
env -i /usr/bin/python3 /absolute/path/to/your_script.py

# If tweepy is installed in a virtual environment, use that environment's interpreter:
env -i /home/joe/myproject/venv/bin/python3 /absolute/path/to/your_script.py

Check the log immediately after:

Step 3: Read the Log Live

tail -f /absolute/path/to/logs/post.log

If the simulation command fails but python3 script.py works normally, you found the environment gap. Paste the log error into Claude and ask for the fix.
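
The same simulation can be driven from Python if you want the exit code and any stray output in one place. This is a sketch assuming Linux or macOS; run_like_cron is a made-up helper name. Like env -i, it passes an empty environment, which is slightly stricter than real cron (cron still sets a few variables such as HOME and a minimal PATH).

```python
import subprocess
import sys


def run_like_cron(python_path: str, script_path: str, timeout: int = 60):
    """Run script_path with an empty environment, similar to `env -i`."""
    return subprocess.run(
        [python_path, script_path],
        env={},               # nothing inherited: no PATH, no HOME, no exported API keys
        capture_output=True,  # keep stdout/stderr so they can be inspected afterwards
        text=True,
        timeout=timeout,
    )
```

For example, result = run_like_cron("/usr/bin/python3", "/absolute/path/to/your_script.py"), then look at result.returncode and result.stderr alongside the log file. A non-zero return code here with a clean run in your terminal points at the environment gap.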

ℹ️ macOS Users: crontab -e is still available on macOS, but Apple has progressively restricted cron and recommends launchd for scheduled jobs. If your cron job won't run after setup, you may need to use launchd instead. Ask Claude: "Convert this cron job to a macOS launchd plist file" and test the result the same way you tested the script.

✓ I ran which python3 and have my exact Python path
✓ I ran the cron simulation command using that path
✓ I checked the log for errors
✓ If errors appeared: I pasted the log into Claude and got the fix
✓ Simulation passes with "Post successful" in the log

Next: Declare what success looks like before it runs →

05

Declare Your Expected Outcome First

The smoke test that makes success binary

🔭 2nd Order Effect: Most people check if something "ran", which is vague. Pre-declaring what success looks like lets you compare actual vs expected. That comparison catches subtle failures: API accepted the post but tweet was filtered, auth expired, wrong account. Binary check = zero ambiguity.

🔓 Unlocks: The ability to verify results, not just assume they happened

Before the cron job runs for real, write down what you expect. Literally write it down:

Smoke Test Template

SMOKE TEST — [Date] [Time]

EXPECTED IMMEDIATE:
- Script runs at [time]
- Log shows: "=== Script started ===" then "Post successful"
- Tweet appears on X within 2 minutes
- No error lines in the log

EXPECTED 2ND ORDER (still true 24 hrs later):
- No duplicate posts
- Auth still valid (no expired token errors in log)
- Log file under 1MB
- Script runs again at next scheduled time without manual restart

ACTUAL (fill in after it runs):
- Log result:
- Tweet appeared: YES / NO
- Any errors:
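
The log half of the EXPECTED IMMEDIATE list can be checked mechanically. The sketch below assumes the log uses the wrapper's format (timestamp | LEVEL | message) and the two marker strings from this guide; check_log is a made-up name. It cannot tell you whether the tweet actually appeared on X, so that line stays a manual check.

```python
def check_log(log_text: str) -> dict:
    """Compare a log excerpt against the EXPECTED IMMEDIATE log items."""
    lines = log_text.splitlines()
    started = [i for i, line in enumerate(lines) if "=== Script started ===" in line]
    succeeded = [i for i, line in enumerate(lines) if "Post successful" in line]
    errors = [line for line in lines if "| ERROR |" in line or "| CRITICAL |" in line]
    return {
        "started": bool(started),
        # Success only counts if it was logged after the most recent start marker.
        "succeeded_after_start": bool(
            started and succeeded and succeeded[-1] > started[-1]
        ),
        "error_lines": errors,
    }
```

Feed it the contents of post.log after the run and copy the result into the ACTUAL section. Anything other than started and succeeded_after_start both True with an empty error_lines list counts as a failed smoke test.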

✓ I've written my expected immediate outcome
✓ I've written the 2nd order expected outcome
✓ I have a place to fill in actual results after it runs

Next: Set the real cron and monitor →

06

Set the Cron Job and Verify the First Run

The last step — and how to feed failures back to AI

🔭 2nd Order Effect: Log → paste → fix is the compound feedback loop. Every failure you process this way makes the next one faster. You stop debugging with your brain. You debug with evidence. Logs beat descriptions every time, because a log records what actually happened and a description records what you remember.

🔓 Unlocks: A self-sustaining feedback loop — log is your reporter, Claude is your mechanic

Simulation passed. Now set the real cron job. Open terminal:

Open Cron Editor

crontab -e

Add your schedule. Format: minute hour day-of-month month day-of-week /full/path/to/python3 /full/path/to/script.py (a * in any schedule field means "every")

Cron Schedule Examples

# Every day at 9:00 AM — use the python path from 'which python3'
0 9 * * * /usr/bin/python3 /home/joe/scripts/post.py

# Every hour
0 * * * * /usr/bin/python3 /home/joe/scripts/post.py

# Weekdays at 8:30 AM
30 8 * * 1-5 /usr/bin/python3 /home/joe/scripts/post.py
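
Before saving the crontab, a quick sanity check on the line can catch the most common slip: a bare python3 or a relative script path. This is a rough sketch with a made-up helper name, check_cron_line. It assumes the plain five-field format shown above followed by an interpreter and a script, and it does not validate the schedule values themselves or shortcuts like @daily.

```python
def check_cron_line(line: str) -> list:
    """Return a list of problems found in one crontab entry (empty list = none found)."""
    parts = line.split()
    if len(parts) < 7:
        return ["expected 5 schedule fields, an interpreter path, and a script path"]
    interpreter, script = parts[5], parts[6]
    problems = []
    if not interpreter.startswith("/"):
        problems.append(f"interpreter path is not absolute: {interpreter}")
    if not script.startswith("/"):
        problems.append(f"script path is not absolute: {script}")
    return problems
```

check_cron_line("0 9 * * * python3 post.py") reports both paths as not absolute, while the examples above come back with an empty list.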

ℹ️ macOS: Use launchd if cron doesn't work. On modern macOS, cron jobs may not run due to system restrictions. If your job doesn't fire, ask Claude: "Convert this cron schedule to a macOS launchd plist" and it will generate the .plist file and the commands to load it.

When the first run happens, check the log and compare to your smoke test. If it fails:

Failure Feedback Prompt — Paste Log Into Claude

Here is my error log. Here is the script.

Tell me:
1. The exact line that failed
2. Why it failed
3. The corrected script only — no explanation needed

LOG:
[paste log here]

SCRIPT:
[paste script here]
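
If you end up doing this more than once, the prompt can be assembled for you. A minimal sketch, assuming you pass in the log and script as text; build_failure_prompt and the 50-line default are illustrative choices, not part of the original prompt. It keeps only the tail of the log, since the most recent run is what matters.

```python
def build_failure_prompt(log_text: str, script_text: str, max_log_lines: int = 50) -> str:
    """Fill the failure feedback prompt with the tail of the log and the full script."""
    log_tail = "\n".join(log_text.splitlines()[-max_log_lines:])
    return (
        "Here is my error log. Here is the script.\n\n"
        "Tell me:\n"
        "1. The exact line that failed\n"
        "2. Why it failed\n"
        "3. The corrected script only, no explanation needed\n\n"
        f"LOG:\n{log_tail}\n\n"
        f"SCRIPT:\n{script_text}\n"
    )
```

Read the two files, print the result, and paste it into Claude. Skim the log tail first to make sure no credentials were written into it.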

✓ Cron job set with full absolute Python path
✓ Waited for the first scheduled run
✓ Compared log output to my smoke test declaration
✓ If failed: pasted log + script into Claude and got the fix
✓ Tweet posted correctly and the log shows "Post successful"

You built a self-monitoring automation

How to Test AI-Written Code
Before It Burns You