
Getting ChatGPT to Write Shell Scripts That Handle Failures Correctly

June 19, 2026 10 min read

You asked ChatGPT to write a shell script, it gave you something that looks reasonable, you ran it in production, and now a partial operation has left your filesystem half-configured. The script exited zero even though a critical step failed. It happens constantly, and it is almost always a prompting problem, not a ChatGPT capability problem.

Shell scripts fail silently by default. Bash is famous for continuing past errors unless you explicitly tell it not to. ChatGPT, unless you guide it, mirrors the bad habits that dominate the shell scripts it was trained on. The good news: a few targeted additions to your prompt consistently produce scripts that fail loudly, clean up after themselves, and leave your system in a predictable state.

What You'll Learn

  • Why ChatGPT defaults to fragile, happy-path shell scripts and how to change that
  • Which flags and patterns to require in every prompt (set -euo pipefail, traps, exit codes)
  • How to prompt for cleanup and rollback logic in multi-step scripts
  • How to review AI-generated scripts for the silent failure patterns ChatGPT still misses

Prerequisites

This guide assumes you are writing Bash scripts for Linux or macOS and that you have a working understanding of the command line. You do not need to be a shell scripting expert — that is partly why you are using ChatGPT — but you should know what a function, a variable, and an exit code are. Examples use Bash 4+.

Why ChatGPT's Default Shell Scripts Are Fragile

Ask ChatGPT to "write a shell script that backs up a directory to S3" without any additional context, and you will almost certainly get something like this:

#!/bin/bash
aws s3 sync /data/app s3://my-bucket/backups/
echo "Backup complete"

This works on the happy path. If aws s3 sync fails because credentials have expired, or the bucket name is wrong, the script still prints "Backup complete" and exits with code 0. Anything watching that script — a cron job, a CI pipeline, a monitoring tool — sees success. The backup did not happen.
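The silent success can be reproduced without touching a real bucket. The sketch below is only an illustration: fake_sync is a hypothetical stand-in for aws s3 sync that always fails, and the two wrapper functions are invented names for the unchecked and checked versions of the same backup.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `aws s3 sync` that always fails,
# as the real command would with expired credentials.
fake_sync() {
  echo "sync: credentials expired" >&2
  return 1
}

# Same shape as the fragile script: the sync result is never checked.
fragile_backup() {
  fake_sync
  echo "Backup complete"
}

# Checked variant: report the failure and return non-zero.
checked_backup() {
  if ! fake_sync; then
    echo "[ERROR] Backup failed" >&2
    return 1
  fi
  echo "Backup complete"
}

fragile_status=0
fragile_backup 2>/dev/null || fragile_status=$?
checked_status=0
checked_backup 2>/dev/null || checked_status=$?

echo "fragile exit status: $fragile_status"
echo "checked exit status: $checked_status"
```

The fragile variant reports status 0 because the last command it ran was echo, which is exactly what a cron job or CI pipeline would see.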

ChatGPT is not being lazy. It is producing what statistically looks like a shell script based on its training data, and most shell scripts in the wild do not have proper error handling. Your job is to make the error handling requirements explicit before it writes a single line.

The Baseline Prompt Problem: Happy-Path Only

The typical prompt for a shell script goes something like: "Write a bash script that deploys my app by pulling the latest Docker image, stopping the old container, and starting a new one." That sentence contains zero information about what should happen when any of those steps fails.

ChatGPT fills the gap with the simplest possible assumption: each command runs and succeeds, then the next one runs. There is no guidance to add error handling, so it does not add error handling — or it adds it superficially, checking the exit code of the last command but not of intermediate ones.

The fix is to include error handling requirements directly in your initial prompt, not as an afterthought. Treating robustness as a constraint from the start produces structurally better scripts than asking ChatGPT to "add error handling" to code it already wrote.

Setting the Stage: Tell ChatGPT Your Error Handling Requirements

Before describing the script's task, open your prompt with a set of non-negotiable requirements. Here is a template you can adapt:

Write a Bash script that does the following. Requirements that apply to every script you write for me: use set -euo pipefail at the top; define a cleanup function and register it with trap to run on EXIT, ERR, INT, and TERM; use meaningful exit codes (not just 0/1); print all errors to stderr with a prefix like [ERROR]; and never silently swallow command failures. Now, here is the task: [your task description].

This single block up front shifts ChatGPT's output dramatically. It stops treating error handling as optional decoration and starts building it into the structure of the script. You are essentially writing your team's shell scripting standards into the prompt itself — similar to how you would prompt ChatGPT to write idiomatic code for your specific stack rather than generic boilerplate.

Using set -euo pipefail as a Non-Negotiable Starting Point

These three options do different things and all of them matter:

  • set -e: exit immediately when a command returns a non-zero status. Commands tested in an if or while condition, or on the left of && or ||, are exempt.
  • set -u: treat a reference to an unset variable as an error. Prevents rm -rf $DIR/ from becoming rm -rf / when DIR was never set. It does not catch a variable that is set but empty; use "${DIR:?}" for that.
  • set -o pipefail: a pipeline fails if any command in it fails, not just the last one. Without this, cat missing_file | sort exits 0 because sort exited 0.
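A quick way to see the three options in action is to run tiny one-line scripts in a child bash and look at whether they succeed. This is a minimal sketch; NO_SUCH_VAR_DEMO is a hypothetical variable name assumed to be unset in your environment.

```shell
#!/usr/bin/env bash
# Each case runs in a child bash so a failure there does not stop this demo.

# Without -e the script keeps going after `false`; with -e it stops.
bash -c 'false; echo "no -e: still running"'
bash -c 'set -e; false; echo "still running"' || echo "set -e stopped the script"

# -u turns a reference to an unset variable into a fatal error.
bash -c 'set -u; echo "dir is: $NO_SUCH_VAR_DEMO"' 2>/dev/null \
  || echo "set -u caught the unset variable"

# pipefail makes the pipeline report the failure of `false`
# instead of the success of `cat`.
bash -c 'false | cat' && echo "without pipefail: pipeline reported success"
bash -c 'set -o pipefail; false | cat' || echo "with pipefail: pipeline reported failure"
```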

Tell ChatGPT explicitly: "Start the script with set -euo pipefail and explain in a comment what each flag does." Asking for the comment serves two purposes: it makes the script self-documenting, and it forces ChatGPT to reason about the flags rather than paste them mechanically, which reduces the chance it then writes code that inadvertently disables them.

Watch out for one common ChatGPT pattern: it sometimes wraps a block with set +e to handle an expected failure, then forgets to restore set -e afterward. Review every occurrence of set +e in generated scripts and make sure a matching set -e follows.
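One way to avoid the set +e problem altogether is to record the status of the command that is allowed to fail, so errexit is never switched off. The sketch below assumes a hypothetical may_fail command that returns status 3.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical command that is allowed to fail with status 3.
may_fail() { return 3; }

# `|| status=$?` records the failure without disabling errexit,
# so there is no `set +e` that could be left switched off.
status=0
may_fail || status=$?

if (( status != 0 )); then
  echo "may_fail exited with $status; continuing with fallback" >&2
fi

# errexit is still active here: $- contains the letter e.
case $- in
  *e*) echo "errexit still enabled" ;;
  *)   echo "errexit was disabled" ;;
esac
```

If you prefer this form, it can be requested in the prompt: "Never use set +e; capture the status of an expected failure with || status=$? instead."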

Prompting for Meaningful Exit Codes and Error Messages

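By convention, exit code 0 means success and any value from 1 to 255 means failure. But "non-zero" is not enough information for a calling process or a monitoring system. Bash itself uses 126 (command not executable), 127 (command not found), and values above 128 (killed by a signal), so keep custom codes below those. Prompt ChatGPT to define a small exit code table at the top of the script: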

#!/bin/bash
set -euo pipefail

# Exit codes
readonly E_OK=0
readonly E_MISSING_DEPENDENCY=1
readonly E_INVALID_ARGS=2
readonly E_UPLOAD_FAILED=3
readonly E_CLEANUP_FAILED=4

err() {
  echo "[ERROR] $*" >&2
}

info() {
  echo "[INFO] $*"
}

To get this output, add to your prompt: "Define named exit code constants at the top of the script. Write err() and info() helper functions that route to stderr and stdout respectively. Use named constants instead of bare numbers when calling exit."

The helper functions matter because they make it obvious in the code where errors are being reported and whether they are going to the right file descriptor. Scripts that mix echo and echo >&2 randomly are much harder to debug when something goes wrong at 2 AM.
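As an illustration of how the constants and helpers are used together, here is a minimal sketch of an argument check. The validate_args function and the backup.sh usage string are hypothetical; only E_INVALID_ARGS and err() come from the snippet above.

```shell
#!/usr/bin/env bash
set -euo pipefail

readonly E_INVALID_ARGS=2

err() { echo "[ERROR] $*" >&2; }

# Hypothetical argument check: expects exactly one argument, a source directory.
validate_args() {
  if (( $# != 1 )); then
    err "usage: backup.sh <source-dir>"
    return "$E_INVALID_ARGS"
  fi
  if [[ ! -d "$1" ]]; then
    err "not a directory: $1"
    return "$E_INVALID_ARGS"
  fi
}

# In a real script this would be: validate_args "$@" || exit $?
status=0
validate_args 2>/dev/null || status=$?
echo "validate_args with no arguments returned $status"
```

A caller or monitoring system that sees status 2 can tell a usage mistake apart from a failed upload without parsing log text.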

Getting ChatGPT to Add Cleanup Traps

A trap registers a function to run when the script exits for any reason — normal exit, an error, or a signal like Ctrl-C. Without a trap, temporary files, half-written configs, and acquired locks can be left behind when a script dies mid-run.

Explicitly ask for this pattern in your prompt: "Register a cleanup() function with trap cleanup EXIT ERR INT TERM. The cleanup function should remove any temporary files or directories created during the script." Here is what a well-prompted ChatGPT should produce:

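#!/bin/bash
set -euo pipefail

info() { echo "[INFO] $*"; }

# Not named TMPDIR: mktemp and other tools read that environment variable.
WORK_DIR=""

cleanup() {
  local exit_code=$?              # capture before any other command overwrites it
  trap - EXIT ERR INT TERM        # disarm so cleanup does not run a second time
  if [[ -n "$WORK_DIR" && -d "$WORK_DIR" ]]; then
    rm -rf "$WORK_DIR"
    info "Cleaned up temporary directory: $WORK_DIR"
  fi
  exit "$exit_code"
}

trap cleanup EXIT ERR INT TERM

WORK_DIR=$(mktemp -d)
info "Working in $WORK_DIR"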

Notice that the cleanup function captures $? at its very start. By the time Bash runs the trap, $? holds the exit code of the command that triggered the exit. If you do not capture it immediately, subsequent commands inside cleanup() will overwrite it, and you lose the original failure code.

If ChatGPT's generated trap does not capture $? first, point this out in a follow-up: "Your cleanup function does not preserve the original exit code. Capture $? as the first line and pass it to exit at the end of the cleanup function."
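To check that a generated trap really preserves the exit code, you can run the script with a simulated failure and inspect what it leaves behind. The sketch below does this with an inline child script; MARKER_FILE is a hypothetical helper used only to learn the temporary directory's name from outside, and (exit 7) stands in for a failing step.

```shell
#!/usr/bin/env bash
# Runs a child script that creates a temporary directory and then fails
# with status 7, and checks what the cleanup trap left behind.
MARKER_FILE=$(mktemp)

status=0
MARKER_FILE="$MARKER_FILE" bash -c '
  set -euo pipefail
  WORK_DIR=""
  cleanup() {
    local exit_code=$?            # capture before anything overwrites it
    trap - EXIT ERR INT TERM      # do not run cleanup twice
    if [[ -n "$WORK_DIR" && -d "$WORK_DIR" ]]; then rm -rf "$WORK_DIR"; fi
    exit "$exit_code"
  }
  trap cleanup EXIT ERR INT TERM
  WORK_DIR=$(mktemp -d)
  echo "$WORK_DIR" > "$MARKER_FILE"
  (exit 7)                        # simulated failing step
  echo "never reached"
' || status=$?

work_dir=$(cat "$MARKER_FILE")
rm -f "$MARKER_FILE"

echo "child exit status: $status"
if [[ -d "$work_dir" ]]; then
  echo "temp dir left behind"
else
  echo "temp dir removed"
fi
```

Because the same function is registered for several conditions, the trap - line inside cleanup keeps it from running a second time when exit triggers the EXIT trap.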

Prompting for Rollback Logic on Multi-Step Scripts

Deployment and migration scripts often make a series of changes where a failure halfway through is worse than not starting at all. ChatGPT can write rollback logic, but only if you describe your rollback semantics explicitly. Vague prompts produce vague rollbacks.

A prompt that works: "Write a deployment script that pulls a new Docker image, stops the running container, starts the new one, and runs a health check. If any step after stopping the old container fails, roll back by restarting the previous container image. Store the previous image tag before making any changes."

That gives ChatGPT enough information to structure rollback correctly. The key phrase is "store the previous image tag before making any changes" — it tells the model to capture the rollback state before it is destroyed. Without that hint, ChatGPT often writes rollback logic that tries to recover information that is no longer available after the failure.

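# Record the image of the running container before changing anything.
# Caveat: if this is a mutable tag such as :latest, a later pull can repoint it.
PREVIOUS_IMAGE=$(docker inspect --format='{{.Config.Image}}' my-app 2>/dev/null || echo "")

rollback() {
  if [[ -n "$PREVIOUS_IMAGE" ]]; then
    err "Rolling back to $PREVIOUS_IMAGE"
    # Remove whatever container currently holds the name, or docker run fails with a name conflict.
    docker rm -f my-app >/dev/null 2>&1 || true
    docker run -d --name my-app "$PREVIOUS_IMAGE" || err "Rollback also failed; manual intervention required"
  fi
}

trap rollback ERR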

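After getting this output, verify that the rollback trap only fires on ERR, not on normal EXIT. If it fires on EXIT unconditionally, you will roll back every successful deployment. ChatGPT sometimes conflates the two, so always check the trap registration line. Also check where the steps run: Bash does not pass an ERR trap into functions, command substitutions, or subshells unless errtrace is enabled with set -E, so a script that wraps its steps in functions needs set -Eeuo pipefail for the rollback to fire.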
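Whether an ERR trap fires for a failure inside a function depends on the errtrace option (set -E), which is easy to overlook in review. The sketch below compares the two settings without needing Docker; deploy_step is a hypothetical step that always fails and rollback only prints a message.

```shell
#!/usr/bin/env bash
# Compares an ERR trap with and without errtrace (-E) when the failing
# command sits inside a function.
run_case() {
  # $1 is the option string passed to `set` in the child script.
  # The trailing `|| true` is deliberate: the child is expected to fail.
  bash -c "
    set $1
    rollback() { echo 'rollback ran'; }
    trap rollback ERR
    deploy_step() { false; }
    deploy_step
  " || true
}

without_errtrace=$(run_case "-euo pipefail")
with_errtrace=$(run_case "-Eeuo pipefail")

echo "without -E: ${without_errtrace:-<no rollback>}"
echo "with -E:    ${with_errtrace:-<no rollback>}"
```

If your generated deployment script puts its steps in functions, adding "use set -Eeuo pipefail" to the prompt is a reasonable precaution.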

For scripts that modify database state or files, describe the inverse operation for each step. ChatGPT cannot infer that the inverse of "copy file A to location B" is "delete B" unless you say so. This is the same principle behind prompting ChatGPT for accurate data migration scripts — the more precise your description of the desired outcome and its failure cases, the more useful the output.

Common Pitfalls When Reviewing AI-Generated Shell Scripts

Even with a well-constructed prompt, generated scripts deserve a review pass. Here are the patterns that slip through most often:

Unquoted variables

ChatGPT frequently writes rm -rf $DIR instead of rm -rf "$DIR". An unquoted variable whose value contains a space splits into multiple arguments, and any *, ?, or [ in it is expanded as a filename pattern. Always check that variables are double-quoted, especially in file operations. Run the script through shellcheck, which catches this class of bug automatically.
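The effect of a missing pair of quotes is easy to demonstrate by counting arguments. In this small sketch, count_args is a hypothetical helper and the path is an invented example containing a space.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

dir="/tmp/my backups"

# Intentionally unquoted to show the word splitting (ShellCheck reports SC2086 here).
unquoted=$(count_args $dir)
quoted=$(count_args "$dir")

echo "unquoted: $unquoted argument(s)"
echo "quoted:   $quoted argument(s)"
```

With rm -rf in place of count_args, the unquoted form would try to delete /tmp/my and a relative path named backups instead of the intended directory.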

Error checking after command substitution

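A plain assignment such as OUTPUT=$(some_command) does take the exit status of some_command, so set -e stops the script when it fails. The status is lost when the substitution is combined with a declaration or embedded in another command: local OUTPUT=$(some_command), export OUTPUT=$(some_command), and echo "$(some_command)" all report the status of local, export, or echo, which is 0. ChatGPT writes the combined form often, especially inside functions. Prompt for the safe form explicitly: "Declare variables with local, export, or readonly on one line and assign the result of a command substitution on a separate line, so that a failing command stops the script." shellcheck reports the combined form as SC2155.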
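The difference between declaring and assigning in one step versus two is easiest to see side by side. The sketch below runs both forms in child scripts under set -euo pipefail; failing_lookup is a hypothetical command that always fails.

```shell
#!/usr/bin/env bash
# Form 1: declaration and substitution on one line.
masked_status=0
bash -c '
  set -euo pipefail
  failing_lookup() { return 1; }
  main() {
    local value=$(failing_lookup)   # status of `local` (0) hides the failure
    echo "masked: reached the next line"
  }
  main
' || masked_status=$?

# Form 2: declare first, assign second.
separate_status=0
bash -c '
  set -euo pipefail
  failing_lookup() { return 1; }
  main() {
    local value
    value=$(failing_lookup)         # plain assignment: failure stops the script
    echo "separate: reached the next line"
  }
  main
' || separate_status=$?

echo "masked exit status:   $masked_status"
echo "separate exit status: $separate_status"
```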

Pipelines that hide failures

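pipefail helps, but some patterns still swallow errors. A while read loop fed by a pipe is one: in some_command | while read -r line; do ...; done the loop runs in a subshell. Variables set inside the loop, such as an error counter or a failed flag, are discarded when the loop ends, and an exit inside the loop leaves only the subshell. A script that checks such a flag after the loop always sees its initial value. If ChatGPT generates pipeline loops, review them carefully and test failure paths explicitly.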
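A short experiment shows what the subshell does to state set inside a piped loop. This sketch counts "bad" lines in a hypothetical three-line input, once with a pipe and once with a redirect, assuming Bash's default settings (lastpipe off).

```shell
#!/usr/bin/env bash
# Hypothetical three-line input with two "bad" entries.
input=$'ok\nbad\nbad'

# Pipe into the loop: the loop body runs in a subshell, so the counter
# incremented there is gone when the loop ends.
piped_count=0
printf '%s\n' "$input" | while read -r line; do
  if [[ $line == bad ]]; then piped_count=$((piped_count + 1)); fi
done

# Redirect into the loop instead: the loop runs in the current shell.
direct_count=0
while read -r line; do
  if [[ $line == bad ]]; then direct_count=$((direct_count + 1)); fi
done <<< "$input"

echo "piped loop counted:  $piped_count"
echo "direct loop counted: $direct_count"
```

When the input comes from a command, done < <(some_command) also keeps the loop in the current shell, but the exit status of some_command is then not reported to the script, so it needs its own check.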

Missing dependency checks

Scripts that depend on external tools (jq, aws, docker) should verify those tools exist before doing any real work. Prompt for this explicitly: "At the start of the script, check that all required external commands are available using command -v and exit with a clear error message if any are missing."
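A dependency check along these lines might look like the following sketch. The require_commands function is a hypothetical name, and no-such-command-for-demo is an invented command assumed not to exist on your system.

```shell
#!/usr/bin/env bash
set -euo pipefail

err() { echo "[ERROR] $*" >&2; }

# Returns non-zero and names every missing command,
# rather than stopping at the first one.
require_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      err "required command not found: $cmd"
      missing=1
    fi
  done
  return "$missing"
}

# In a real script: require_commands jq aws docker || exit "$E_MISSING_DEPENDENCY"
# Demo with one command that exists and one that should not.
status=0
require_commands ls no-such-command-for-demo 2>/dev/null || status=$?
echo "require_commands returned $status"
```

Reporting all missing tools in one run saves the operator from fixing them one failed attempt at a time.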

Reviewing AI-generated code for silent failure modes is a skill that applies beyond shell scripts. The same discipline of checking edge cases and failure paths is covered in more depth in the guide on debugging ChatGPT code suggestions that silently break edge cases.

Overly broad error suppression

Watch for patterns like command || true used to silence an error that should actually stop execution. ChatGPT sometimes adds || true to commands it thinks might fail harmlessly, when in your context the failure is actually critical. Question every || true in generated output.
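Where a command has one exit status that really is harmless, a narrower alternative to || true is to accept only that status. The sketch below uses grep, which returns 1 for "no match" and a higher value for a real error; check_pattern and has_match are hypothetical names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeeds whether or not the pattern is found, but still fails on a real
# grep error such as a missing file. Sets has_match for the caller.
check_pattern() {
  local pattern=$1 file=$2 rc=0
  grep -q -- "$pattern" "$file" 2>/dev/null || rc=$?
  case $rc in
    0) has_match=yes ;;
    1) has_match=no ;;          # "no match" is the expected, harmless case
    *) return "$rc" ;;          # anything else is a genuine failure
  esac
}

log_file=$(mktemp)
echo "service started" > "$log_file"

check_pattern "started" "$log_file"; echo "started: $has_match"
check_pattern "panic" "$log_file";   echo "panic: $has_match"

missing_status=0
check_pattern "panic" "$log_file.does-not-exist" || missing_status=$?
echo "missing file status: $missing_status"
rm -f "$log_file"
```

With grep ... || true, the missing file would have been indistinguishable from a clean log.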

If you want a second opinion on what the generated script is actually doing, you can use the same approach described in getting ChatGPT to explain someone else's code without surface-level summaries — paste the generated script back in and ask for a line-by-line analysis that specifically calls out any place where a failure could be silently ignored.

Wrapping Up: Next Steps

Shell scripts that handle failures correctly are not complicated to write — they just require being explicit about what "correct" means before you ask for the code. ChatGPT is good at following constraints when those constraints are stated clearly upfront.

Here are the concrete actions to take from here:

  1. Create a prompt template for your team that includes set -euo pipefail, a cleanup trap requirement, named exit codes, and stderr error helpers. Paste it at the start of every shell script request.
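  2. Install shellcheck in your local environment and run every AI-generated script through it before committing. It catches most quoting mistakes and several error-handling gaps, such as a masked return value or an unchecked cd, but it cannot tell whether your rollback logic is correct.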
  3. Write a failure test for every generated script. Deliberately break one dependency or pass an invalid argument and confirm the script exits non-zero, prints a clear error to stderr, and cleans up after itself.
  4. Review every || true, set +e, and unquoted variable in generated output as a checklist step before merging.
  5. For multi-step scripts, describe rollback semantics explicitly in the prompt — what state needs to be captured before changes begin, and what the inverse of each step is.
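The failure test from step 3 does not need a framework. The sketch below writes a hypothetical script under test to a temporary directory, runs it with a deliberately missing dependency, and checks two of the three conditions: a non-zero exit and an error on stderr. REQUIRED_TOOL and no-such-tool-for-demo are invented for the example.

```shell
#!/usr/bin/env bash
# Writes a hypothetical script under test, then runs it with a missing dependency.
test_dir=$(mktemp -d)
cat > "$test_dir/under_test.sh" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
if ! command -v "$REQUIRED_TOOL" >/dev/null 2>&1; then
  echo "[ERROR] missing dependency: $REQUIRED_TOOL" >&2
  exit 1
fi
echo "all good"
EOF

# Capture stderr only; stdout is discarded.
status=0
stderr_output=$(REQUIRED_TOOL=no-such-tool-for-demo \
  bash "$test_dir/under_test.sh" 2>&1 >/dev/null) || status=$?

failures=0
(( status != 0 )) || { echo "FAIL: script exited 0"; failures=$((failures + 1)); }
[[ $stderr_output == "[ERROR]"* ]] || { echo "FAIL: no error on stderr"; failures=$((failures + 1)); }

echo "failure-path checks failed: $failures"
rm -rf "$test_dir"
```

A check for leftover temporary files can be added in the same style once you know where the script under test creates them.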

Frequently Asked Questions

Why does ChatGPT write shell scripts that don't check for errors?

ChatGPT generates scripts based on patterns in its training data, and most shell scripts in the wild skip proper error handling. Unless you explicitly require error-handling patterns like set -euo pipefail and traps in your prompt, ChatGPT defaults to happy-path code that assumes every command succeeds.

What does set -euo pipefail do in a bash script and should I always use it?

set -e exits when a command fails outside of a condition or an && / || list, set -u treats unset variables as errors, and set -o pipefail makes the whole pipeline fail if any command in it fails. You should use all three at the top of almost every non-trivial Bash script, because together they catch the most common silent failure patterns.

How do I get ChatGPT to add cleanup logic that runs even when a script crashes?

Tell ChatGPT to define a cleanup() function and register it with trap cleanup EXIT ERR INT TERM. This ensures the cleanup runs whether the script exits normally, hits an error, or is interrupted by the user. Make sure the cleanup function captures $? on its first line to preserve the original exit code.

How can I make ChatGPT generate rollback logic for a deployment script?

Describe the rollback semantics explicitly in your prompt — specify what state should be captured before any changes are made and what the inverse operation is for each step. ChatGPT cannot infer rollback logic from a task description alone; you need to tell it what to undo and in what order.

Is shellcheck useful for reviewing AI-generated shell scripts?

Yes, shellcheck is one of the most effective ways to catch common issues in AI-generated shell scripts, including unquoted variables, missing error checks, and problematic pipeline patterns. Run it on every generated script before committing it to catch the gaps that even a careful prompt cannot always prevent.
