Hi Matthias,
I can't speak to GridEngine's specific behavior because I haven't used it in a long time, but it's not surprising that jobs "disappear" as soon as they've exited. Unfortunately, Galaxy polls job status periodically rather than waiting on completion. Waiting would mean creating a thread per submitted job, unless you can still get job exit details by looping over the jobs and calling wait with a timeout.
You can gain some control over how Galaxy handles InvalidJobException with the DRMAA job runner plugin params. See here:
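As I understand it, those params amount to a retry-then-give-up policy. I believe they're named along the lines of invalidjobexception_retries and invalidjobexception_state, but treat those names as an assumption and check the docs. The Python below is a hypothetical model of the idea only, not Galaxy's actual code: tolerate a set number of InvalidJobException polls per job, then assign a fixed final state.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InvalidJobPolicy:
    """Hypothetical model of retry-then-give-up handling for InvalidJobException.

    retries: how many InvalidJobException polls to tolerate per job.
    final_state: state assigned once retries are exhausted ("ok" or "error").
    """
    retries: int = 0
    final_state: str = "ok"
    _seen: Dict[str, int] = field(default_factory=dict, init=False, repr=False)

    def on_invalid_job(self, job_id: str) -> Optional[str]:
        """Return the terminal state to assign, or None to keep polling."""
        count = self._seen.get(job_id, 0)
        if count < self.retries:
            self._seen[job_id] = count + 1
            return None  # tolerate it; the job may show up again next poll
        self._seen.pop(job_id, None)  # give up and forget this job
        return self.final_state

    def on_status_ok(self, job_id: str) -> None:
        """A normal status poll succeeded, so forget earlier failures."""
        self._seen.pop(job_id, None)


policy = InvalidJobPolicy(retries=2, final_state="error")
for _ in range(3):
    print(policy.on_invalid_job("42"))
```

The limitation is visible in the model: final_state is the same for every job, whatever its real exit status was.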
However, if jobs that finish normally also raise InvalidJobException, that probably won't help. Alternatively, you could create a DRMAAJobRunner subclass for GridEngine, as we've done for Slurm, that does some digging to learn more about jobs in a terminal state.
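For GridEngine, one place to dig could be the accounting data from `qacct -j <job_id>`, which records fields such as failed and exit_status once a job has finished. The sketch below assumes accounting is enabled and qacct is on the Galaxy server's PATH; parse_qacct, terminal_outcome and lookup_job are hypothetical helpers. In a real subclass you'd call something like this from the hook the Slurm runner overrides for terminal jobs (I believe that's _complete_terminal_job, but treat the name as an assumption). Accounting records can lag behind job exit, so lookup_job returns None when there's nothing yet.

```python
import subprocess
from typing import Dict, Optional


def parse_qacct(text: str) -> Dict[str, str]:
    """Parse `qacct -j` output into a field -> value dict.

    qacct prints one "name   value" pair per line; separator lines made of
    '=' characters are skipped. If a job has several records, later ones win.
    """
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or set(line) == {"="}:
            continue
        parts = line.split(None, 1)
        info[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return info


def terminal_outcome(info: Dict[str, str]) -> str:
    """Classify a finished job from its accounting record."""
    if "exit_status" not in info:
        return "unknown"
    # Values may carry annotations, e.g. "100 : assumedly after job".
    failed = info.get("failed", "0").split()[0]
    exit_status = int(info["exit_status"].split()[0])
    return "error" if failed != "0" or exit_status != 0 else "ok"


def lookup_job(job_id: str) -> Optional[Dict[str, str]]:
    """Run qacct for one job; None if it has no accounting record (yet)."""
    result = subprocess.run(["qacct", "-j", job_id],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_qacct(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it easy to test against captured qacct output without a live cluster.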
--nate