scripts/lambda-bootstrap.sh: drop the cd into repo/ before launching trainer

The previous version ran `(cd repo && "${cmd[@]}")` to "cd into repo
for module imports." But PYTHONPATH was already set to $PWD/repo at
the top of the script, so the cd was redundant for imports and it
broke relative paths: the trainer expects to find
data/processed/validation_v1.parquet relative to $HOME/cis490, not
relative to $HOME/cis490/repo/.

Symptom: every training job failed immediately with

    FileNotFoundError: data/processed/validation_v1.parquet

Drop the cd; PYTHONPATH already does the import work.

Found while running on the A100 today. The trainer was relaunched
manually in place via a stand-in bootstrap2.sh; this commit makes the
next bundle clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
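
The relative-path half of the failure can be reproduced without the trainer. The following is a minimal sketch, not the real bootstrap script: it assumes a scratch directory standing in for $HOME/cis490 and uses `test -f` as a hypothetical stand-in for the trainer's file lookup. Only the directory and file names come from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical scratch layout standing in for $HOME/cis490.
workdir="$(mktemp -d)"
mkdir -p "$workdir/data/processed" "$workdir/repo"
: > "$workdir/data/processed/validation_v1.parquet"

cd "$workdir" || exit 1

# Stand-in for the trainer: it only checks the relative path it would open.
cmd=(test -f data/processed/validation_v1.parquet)

# Old launch: the subshell's working directory becomes repo/, so the
# relative path points at repo/data/processed/..., which does not exist.
if (cd repo && "${cmd[@]}"); then old_result="found"; else old_result="missing"; fi

# New launch: the working directory stays at the top level, so the
# relative path resolves.
if "${cmd[@]}"; then new_result="found"; else new_result="missing"; fi

echo "old launch (cd repo): $old_result"
echo "new launch (no cd):   $new_result"

cd / && rm -rf "$workdir"
```

The subshell form never changes the parent script's directory, which is why the breakage only showed up inside the launched command.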
parent 997c399cf9
commit ed7e3db035
1 changed file with 4 additions and 4 deletions

@@ -175,10 +175,10 @@ for entry in "${JOBS[@]}"; do
     cmd+=("${extra_args[@]}")
   fi
 
-  # Launch in background. Each subshell cd's into repo/ for module
-  # imports; output redirected to per-job log; trainer + torch handle
-  # multi-process CUDA OK on a single A100.
-  (cd repo && "${cmd[@]}") > "$log" 2>&1 &
+  # Launch in background. PYTHONPATH is set to $PWD/repo at the top
+  # of this script so we DO NOT cd into repo/ — relative paths to
+  # data/processed/* must resolve from $HOME/cis490, not from repo/.
+  "${cmd[@]}" > "$log" 2>&1 &
   pid=$!
   PIDS+=("$pid")
   PID_TO_LABEL[$pid]="$job_label"
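
For context, the launch pattern in this hunk (background the command, record `$!`, map the PID to a label) can be sketched end to end as below. This is an illustration under assumptions, not the rest of lambda-bootstrap.sh: the `launch` helper, the two toy jobs, the temporary log directory, and the final `wait` loop are all hypothetical, and an indexed array keyed by the numeric PID is used here where the real script may declare an associative array.

```shell
#!/usr/bin/env bash
# Hypothetical bookkeeping arrays, named after the ones in the diff.
PIDS=()
PID_TO_LABEL=()          # sparse indexed array keyed by numeric PID (assumption)
logdir="$(mktemp -d)"    # hypothetical per-job log location

# Hypothetical helper: run a command in the background from the current
# directory (no cd), with stdout and stderr sent to a per-job log.
launch() {
  local job_label="$1"; shift
  local cmd=("$@")
  local log="$logdir/$job_label.log"
  "${cmd[@]}" > "$log" 2>&1 &
  local pid=$!
  PIDS+=("$pid")
  PID_TO_LABEL[$pid]="$job_label"
}

# Two toy jobs standing in for trainer runs: one succeeds, one fails.
launch ok-job sh -c 'echo training; exit 0'
launch bad-job sh -c 'echo FileNotFoundError >&2; exit 1'

# Wait on each PID and collect the labels of jobs that exited non-zero.
failed=()
for pid in "${PIDS[@]}"; do
  if ! wait "$pid"; then
    failed+=("${PID_TO_LABEL[$pid]}")
  fi
done

echo "failed jobs: ${failed[*]}"
rm -rf "$logdir"
```

Keeping the PID-to-label map is what lets a failure like the FileNotFoundError above be reported by job name instead of by bare PID.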