From: Wasef Masood (masood@mcs.anl.gov)
Date: Wed Apr 10 2002 - 13:43:27 PDT
Lew and Qian,
All of the April 4 gram*.log files show error code 73 failures. You said
"sometimes it works and sometimes fails". Do you also have the logs from
the runs where the job manager succeeded in writing to the client?
Also let me know what ports are open on the hotel and gat.com firewalls.
Wasef Masood
masood@mcs.anl.gov
Distributed Systems Laboratory
Mathematics and Computer Science Division
Argonne National Laboratory
On Wed, 10 Apr 2002, Lew Randerson wrote:
>
> Wasef,
>
> AAA_OLDGLOBUSLOGS.tgz is a tar ball containing all the jobmanager
> logs to account pshare on termita.pppl.gov or bluebeat.pppl.gov.
> bluebeat_globus-gatekeeper.log is a copy of the globus gatekeeper
> log on bluebeat.
> termita_globus-gatekeeper.log is a copy of the globus gatekeeper
> log on termita.
>
> These types of failures were seen:
>
> 1) In both the termita and bluebeat globus-gatekeeper logs, you will
> notice a connection from 63.218.77.133. This is the address to which
> the hotel firewall changed my host's private IP address.
>
> The gatekeepers' firewall had been changed to allow this address.
> Nothing could be done to the hotel's firewall.
>
> globusrun jobs that wrote back to the client were submitted from this
> host to termita and/or bluebeat and failed. This is documented in
> gram_job_mgr_12368.log and gram_job_mgr_12388.log.
>
> gsincftp could connect successfully to termita/bluebeat.
>
> 2) The rest of the gram_job_mgr_*.log files contain all the other failures.
> Initially some of the failures were because the firewall was not
> open. The failures documented on April 4 issued from pm-port10,
> pm-port1, and pm-port5 were definitely not firewall issues. These
> log files are included in
>
> gram_job_mgr_11848.log gram_job_mgr_11934.log gram_job_mgr_11949.log
> gram_job_mgr_12049.log gram_job_mgr_12057.log gram_job_mgr_12062.log
> gram_job_mgr_12229.log gram_job_mgr_12236.log gram_job_mgr_12260.log
> gram_job_mgr_5820.log gram_job_mgr_5935.log gram_job_mgr_6017.log
> gram_job_mgr_6191.log gram_job_mgr_6193.log gram_job_mgr_6216.log
> gram_job_mgr_6218.log gram_job_mgr_6220.log gram_job_mgr_6266.log
> gram_job_mgr_6734.log gram_job_mgr_6996.log gram_job_mgr_22296.log
>
> The mystery problems to us are the ones documented in #2.
>
> --Lew
>
>
>