Thursday, 15 August 2013

File can't be found in small number of jobs

I'm trying to run a large array of batch jobs (18,792 tasks) on a RHEL5
cluster that uses a Lustre file system. Roughly 1% of the jobs fail with a
strange error: they can't find a text file that all of the jobs read for
steering. The script I submit looks like this:

#!/usr/bin/env bash
#PBS -t 1-18792
#PBS -l mem=4gb,walltime=30:00
#PBS -l nodes=1:ppn=1
#PBS -q hep
#PBS -o output/fit/out.txt
#PBS -e output/fit/error.txt
cd $PBS_O_WORKDIR
mkdir -p output/fit
echo 'submitted from: ' $PBS_O_WORKDIR
files=($(ls ./*.txt | sort)) # this is historical, not used here
cat batch/fits/fit-paths.txt

For a small fraction of jobs, the error stream looks like this:

cat: batch/fits/fit-paths.txt: No such file or directory

Any idea why some jobs wouldn't be able to find this file?
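
To narrow this down, the failing jobs could record where they ran and what they saw. The following is only a sketch, not a diagnosis: wait_for_file is a hypothetical helper (not part of PBS or Lustre), and the retry count and delay are arbitrary. It rests on the guess that the file is either briefly invisible on some nodes or that the job is not in the directory it expects, for example because the cd failed and the script carried on regardless.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait for a file to become readable, logging the
# node name and working directory on every failed attempt.
# Usage: wait_for_file PATH [TRIES] [DELAY_SECONDS]
wait_for_file() {
    local path=$1 tries=${2:-5} delay=${3:-2}
    local i
    for ((i = 1; i <= tries; i++)); do
        if [[ -r $path ]]; then
            return 0
        fi
        # uname -n prints the node's hostname
        echo "attempt $i/$tries: $path not readable on $(uname -n) in $PWD" >&2
        sleep "$delay"
    done
    return 1
}
```

In the job script this would replace the bare cd and cat with something like: cd "$PBS_O_WORKDIR" || exit 1, then wait_for_file batch/fits/fit-paths.txt || exit 1, then the cat. If the logged failures cluster on a few node names, or the logged directory is not the submission directory, that points at those nodes or at the cd rather than at the file itself.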
