SC Demo for the NFC Project: Keahey, Papka

 

Description

 

Fusion experiments operate in a pulsed mode, with a new pulse roughly every 15-20 minutes. Between experimental pulses, Fusion scientists run analysis and simulation codes to evaluate the progress of the experiment and determine parameter adjustments for the next pulse. With the adoption of computational Grids in the Fusion community, such codes can be run remotely as long as they can be guaranteed to finish in the prescribed time; this requires combining time-critical execution of network transfer, resource reservation, and job execution.

 

The demo will showcase the use of computational Grids and visualization technologies during a real-time Fusion experiment. Specifically, we will use the Globus Toolkit 3 (GT3) infrastructure to run time-critical remote Fusion codes, and Access Grid technologies to create an environment where time-critical jobs can be launched and monitored, the results visualized, and the ongoing experiment discussed. The demo viewer will have the opportunity to passively participate in an ongoing Fusion experiment, see how the interactions during such an experiment advance Fusion science, and experience first-hand how the technologies developed under the NFC project enhance those interactions. This experiment has never been carried out before in any Fusion control room; in addition to being a demonstration, it will therefore mark a milestone for the Fusion community.

 

 

Globus part

 

The infrastructure combining time-critical execution of network transfer, resource reservation, and job execution has been partially developed. The experiment will involve reading data from MDSplus, using a data transfer service based on GridFTP to transfer the data to the remote site where job execution will take place, executing the job on a reserved resource, transferring the results back using the data transfer service, and writing them into MDSplus. These actions will be carried out automatically, orchestrated by a broker executing a previously written workflow.
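The sequence of actions described above can be sketched as a small timed workflow. This is only an illustration under stated assumptions: the function `run_workflow`, the stage names, and the placeholder stage bodies are all hypothetical and do not call MDSplus, GridFTP, or GT3; a real broker would replace each placeholder with a call to the corresponding service.

```python
import time
from typing import Callable, Dict, List, Tuple

# A stage is a (name, action) pair; each action takes and returns a payload dict.
Stage = Tuple[str, Callable[[Dict], Dict]]


def run_workflow(stages: List[Stage], payload: Dict, deadline_s: float,
                 clock: Callable[[], float] = time.monotonic):
    """Run stages in order, timing each one and checking an overall deadline."""
    start = clock()
    timings = []
    for name, action in stages:
        stage_start = clock()
        payload = action(payload)
        finished = clock()
        timings.append((name, finished - stage_start))
        # The check happens after each stage, so an overrun is detected,
        # not prevented.
        if finished - start > deadline_s:
            raise TimeoutError(f"deadline exceeded after stage '{name}'")
    return payload, timings


# Placeholder stages (hypothetical): real versions would call MDSplus, the
# GridFTP-based data transfer service, and the job execution service.
def read_from_store(p):
    return {**p, "data": [1.0, 2.0, 3.0]}

def transfer_to_remote(p):
    return {**p, "location": "remote"}

def execute_job(p):
    return {**p, "result": sum(p["data"])}

def transfer_back(p):
    return {**p, "location": "local"}

def write_to_store(p):
    return {**p, "stored": True}

STAGES = [
    ("read", read_from_store),
    ("transfer_out", transfer_to_remote),
    ("execute", execute_job),
    ("transfer_back", transfer_back),
    ("write", write_to_store),
]
```

Calling `run_workflow(STAGES, {}, deadline_s=900.0)` returns the final payload together with a per-stage timing list, which is the kind of progress information the broker could report back to the client. The 900-second value is illustrative only.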

 

From the perspective of the fusion scientist, the interaction takes place in the following stages. Prior to the experiment, the scientist asks the broker which codes and arguments can be executed in the required time. The broker provides that information by requesting and combining information from the data transfer service and the job execution service. These services base their answers on historical execution data as well as network prediction. Based on this information, the client makes an agreement for service execution with certain arguments to be available during the time of the experiment. The broker uses the agreement to make the requisite resource reservations. At the time of the experiment, the scientist requests the execution of a certain code. The broker carries out the execution in the requested time, informing the client of its progress and of the completion of each stage (data transfer time, execution time, etc.). This interaction is carried out through GUIs.
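The pre-experiment inquiry can be illustrated with a minimal feasibility check. This sketch makes several assumptions that are not in the plan itself: it uses only historical timings (no network prediction), it takes the slowest observed time for each stage as a conservative estimate, and the names `StageHistory`, `estimate_total_s`, and `feasible_codes` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StageHistory:
    """Past timings in seconds for one code/argument combination (illustrative)."""
    transfer_in_s: List[float]
    execution_s: List[float]
    transfer_out_s: List[float]


def estimate_total_s(history: StageHistory) -> float:
    # Conservative assumption: take the slowest observed time for every stage.
    return (max(history.transfer_in_s)
            + max(history.execution_s)
            + max(history.transfer_out_s))


def feasible_codes(histories: Dict[str, StageHistory], budget_s: float) -> List[str]:
    """Return the names whose worst-case estimate fits within the time budget."""
    return sorted(name for name, h in histories.items()
                  if estimate_total_s(h) <= budget_s)
```

Given a between-pulse budget of 15 minutes, `feasible_codes(histories, budget_s=900)` would list the combinations the broker could offer in an agreement. A worst-case rule is deliberately pessimistic; a percentile or a prediction model could be swapped in without changing the interface.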

 

What is required: I have an infrastructure that some of my students have been developing, which I used for a similar (but much simpler, and not real-life-ready) demo last year. The infrastructure is flaky, and I can't rely on student effort to provide it for the demo. I cannot do it myself due to other time commitments (although I would like to be involved in development to the extent possible). What is needed is support and troubleshooting for the infrastructure, the addition of some specific features required for the demo, work with scientists to put concrete codes into the framework, and help in running the demo. I currently estimate at least 1/2 FTE for 2 months, but this might change after I discuss the exact codes with my collaborators.

Benefits: the resulting infrastructure would constitute a Fusion deliverable. In addition, since it is similar to OGSI-Agreement and has partially driven its development, it would be another "test drive" of the ideas (if not an actual implementation of OGSI-Agreement; there is no time for this). It would also constitute a major step in converting the Fusion infrastructure to GT3 (getting them to run GT3 in a real experiment obviously makes a statement about GT3). If successful, it would also demonstrate how remote Grid resources can be used for time-critical calculations during an experiment.

 

Access Grid Infrastructure

The Access Grid (AG) will leverage its long history of aiding groups collaborating at a distance. The demo builds on the recent 2.0 release, which incorporates all the functionality of earlier versions of the AG environment, enhanced by the use of standard Grid middleware provided by the Globus Toolkit. The new AG environment provides the ability to incorporate Grid-based services, including the starting and monitoring of jobs on the Grid. The results can then be stored within the AG Virtual Venue or in an external datastore.

 

As part of the National Fusion Collaboratory's Supercomputing 2003 demonstration, the collaboration and visualization efforts will integrate with the distributed computing efforts to provide an integrated solution for Fusion scientists that enhances the way they do science. This will tie together time-critical computing, the ability to consult with colleagues who are not collocated with the experiment, and the ability to share visualizations as part of the analysis process.

 

From the perspective of the fusion scientist, the interaction takes place in the following stages. Prior to the experiment, the scientist logs in to the Access Grid Fusion venue, where they can ask the ?WHAT BROKER? broker which codes and arguments can be executed in the required time. The broker provides that information by requesting and combining information from the data transfer service and the job execution service, and presents it to the scientist. At the time of the experiment, the scientist returns to the Fusion venue, where they can monitor the broker as it requests the execution of a certain code. As the broker carries out the execution in the requested time, it informs the client of its progress and of the completion of each stage (data transfer time, execution time, etc.), and the results are monitored within the Fusion venue. During this time the fusion scientist can also meet with remote colleagues to discuss results of the experiment and simulation that have already been received. The fusion scientist has full access to data already stored in MDSplus via the AG-based client, as well as access to shared analysis tools such as ReviewPlus and Jscope.

Issues to resolve

 

1. What remote, time-critical Fusion codes are we going to run during the demo, and what analysis tools are needed to analyze the results of these codes? The codes need to be relevant to the science that is going on (so that it is clear that Grid computing can help in this setting), and they obviously need to be short enough to run in the between-pulses timeframe. Basically, we need to demonstrate a need as well as a solution.
2. Resource assignment on the Globus side; this will be determined based on what codes we run (i.e., how much interaction with the codes is needed).
3. Demo logistics: a typical SC slot is ½ hour every day for 2 or 3 days. Do we want to change that and have the demo run for, say, an hour or two on one day only? The real question is how much the GA facility will be available. Limiting the demo to a single 2-3 hour session will limit who can come to see it.
4. Does it make sense to "stage" the demo with a preparation phase and an execution phase? If so, how do we communicate that this has happened?
5. What is our fallback plan if the network fails or certain Grid resources fail? What is the plan for a booth-only demo?
6. Obviously, we also need schedules, etc.
