Workers: Proposal


Proposal of Globus capabilities for Fusion renewal proposal from ANL


 

Work Items

 

  1. Support (0.6 to 1.5 FTE)

 

The use of the Globus Toolkit throughout the Fusion Collaboratory, and the various enhancements planned in Phase II, will require technical support from the Globus team. These proposals address requirements specified in points 1(a&b), 2(d,g,i), 4, and 6(a).

 

    1. Basic support: technical support in the form of bug fixes, answering technical questions, etc. Estimated cost: 1/5 FTE per year = 0.6 FTE over three years.
    2. Option: enhanced support. This includes technical support, bug fixes, and answering questions, plus small feature additions such as adding new errors, credential refresh, and extended documentation for requested features. Estimated cost: 1/2 FTE per year = 1.5 FTE over three years.

 

  1. Data Management (1.1 to 1.3 FTE)

 

Data management is important for the Fusion Collaboratory, especially as we begin to work more with the simulation community. Integrating the new fast XIO transport with MDSplus will allow it to work at much higher speeds, and leveraging the GridFTP infrastructure that achieved 8 Gbit/sec transfers at SC'03 could bring substantial benefits to the general Fusion community. All the proposals below address requirements described in point 5.

 

    1. Integrating XIO with MDSplus (0.1 FTE = ~5 weeks): support for integration, debugging, documentation, ironing out whatever new issues arise in this context, etc.
    2. XIO drivers: 0.2 FTE per driver. The XIO drivers of interest are (a) mode-e, which allows an XIO user to leverage the parallelism aspects of GridFTP, and (b) the GridFTP driver for access to remote files (altogether 0.4 FTE = 20 weeks). Driver development includes design, prototyping, development, testing and integration, release cycle (alpha through final), documentation and integration with community users.
    3. XIO for Windows: under the current design we have provided an alpha version of Globus I/O for Windows and a design for XIO for Windows. More funding will be needed to provide a reliable implementation. Estimated effort: 0.5 FTE = 6 months. The work will include prototype, development, development of a test suite for Windows, development of packaging for Windows, testing and integration, release cycle, documentation and integration with community users.
    4. Introducing GridFTP to a new community of users (NIMROD): estimated effort 0.3 FTE = ~ 15 weeks. Work includes 2 site visits by 2 people, help with installation, debugging site setup, performance tuning, tutorials, etc. Also, interactions with teams designing translations into the HDF5 format.
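The effort arithmetic behind the data management estimate can be checked with a short sketch. It assumes roughly 50 working weeks per FTE, which matches the "0.3 FTE = ~15 weeks" style of conversion used in this proposal; the constant and function names are illustrative only.

```python
from fractions import Fraction

# Assumption: 1 FTE is about 50 working weeks, consistent with the
# "0.3 FTE = ~15 weeks" conversion used in this proposal.
WEEKS_PER_FTE = 50

# Data management items and their estimated effort in FTE (from the list above).
DATA_MANAGEMENT_ITEMS = {
    "XIO integration with MDSplus": Fraction(1, 10),
    "XIO drivers (mode-e and GridFTP)": Fraction(4, 10),
    "XIO for Windows": Fraction(5, 10),
    "GridFTP for NIMROD": Fraction(3, 10),
}


def fte_to_weeks(fte):
    """Convert an FTE fraction to working weeks under the 50-week assumption."""
    return fte * WEEKS_PER_FTE


def total_fte(items):
    """Sum the FTE estimates of all items exactly (no floating-point drift)."""
    return sum(items.values(), Fraction(0))


if __name__ == "__main__":
    for name, fte in DATA_MANAGEMENT_ITEMS.items():
        print(f"{name}: {float(fte):.1f} FTE")
    total = total_fte(DATA_MANAGEMENT_ITEMS)
    print(f"Total: {float(total):.1f} FTE, about {float(fte_to_weeks(total)):.0f} weeks")
```

Summing all four items gives the upper end of the range quoted in the heading for this work item.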

 

  1. Resource Management and Enforcement (0.6 to 1.9 FTE)

 

Even modest success of the Fusion Collaboratory will result in overloading compute, network, and storage resources, and also increase the demands on sites in terms of authorization and account management. The work that we propose here will provide the essential machinery needed for automated resource management and enforcement.

 

    1. Simple gateway enforcement: estimated effort (0.3 FTE = 15 weeks). This is essentially an extension of what we have right now, but would cover new authorization policies. The work will include design, prototyping, development, testing and integration, release cycle (alpha through final), documentation and integration with community users. This proposal addresses requirement from point 2 (a).
    2. Dynamic accounts: estimated effort 0.3 FTE = 15 weeks. The design and prototype were developed under the current proposal. Future work would include integration with an authorization service (authorizing creation and management of those accounts) and adjustments to different setups (the current implementation modifies the /etc/passwd file, but a cluster, for example, would need yppasswd). Integration with WS-Agreement and putting this through a release cycle would take another 0.3 FTE = 15 weeks. This proposal addresses requirements from point 2 (f and g).
    3. Sandboxes and/or VM: replace simple dynamic accounts with a sandbox capable of enforcing properties such as CPU share, disk usage share, memory usage, etc. This is a better alternative to the simple gateway enforcement described in item 1 above, but at this point would require significant R&D. The work is open-ended, and different levels of involvement would require different levels of effort; it is perhaps something for year 3. Estimate: 1.0 FTE = 12 months. This proposal addresses requirements from point 2 (a and g).
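To make the dynamic accounts idea concrete, here is a minimal sketch of leasing local accounts from a pool to authorized grid identities. It is not the existing prototype: the class name, the `authorize` callback standing in for the authorization service, and the lease length are all assumptions made for illustration, and the sketch keeps its state in memory rather than touching /etc/passwd or yppasswd.

```python
import time


class DynamicAccountPool:
    """Leases local accounts from a fixed pool to authorized grid identities.

    Hypothetical illustration only; a real deployment would create and
    remove accounts through the site's account management mechanism.
    """

    def __init__(self, account_names, authorize, lease_seconds=3600, clock=time.time):
        self._free = list(account_names)   # accounts not currently leased
        self._leases = {}                  # subject -> (account, expiry time)
        self._authorize = authorize        # stand-in for the authorization service
        self._lease_seconds = lease_seconds
        self._clock = clock

    def _reclaim_expired(self):
        """Return accounts whose lease has run out to the free list."""
        now = self._clock()
        for subject, (account, expires) in list(self._leases.items()):
            if expires <= now:
                del self._leases[subject]
                self._free.append(account)

    def acquire(self, subject):
        """Lease (or renew) an account for an authorized subject."""
        if not self._authorize(subject):
            raise PermissionError(f"{subject} is not authorized for a dynamic account")
        self._reclaim_expired()
        if subject in self._leases:
            account, _ = self._leases[subject]   # renew the existing lease
        elif self._free:
            account = self._free.pop(0)
        else:
            raise RuntimeError("no free dynamic accounts")
        self._leases[subject] = (account, self._clock() + self._lease_seconds)
        return account

    def release(self, subject):
        """Give the subject's account back to the pool."""
        account, _ = self._leases.pop(subject)
        self._free.append(account)
```

A site could call `acquire` with the user's certificate subject when a job arrives and `release` when it finishes; the lease expiry is a safety net for jobs that never report completion.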

 

  1. Agreement-based Infrastructure (1.1 to 1.6 FTE)

 

The work we propose here complements that proposed under Resource Management and Enforcement above, leveraging emerging WS-Agreement standards under development in GGF and OASIS to produce a standards-based solution to resource management and enforcement.

 

    1. WS-Agreement-based advance resource reservations. The initial work on the spec and first prototype were implemented under the current proposal. Further work includes specification development, prototype adjustments, backending to different technologies (PBS, single machine, others?), better integration with GRAM, GGF standards interaction, release cycle, and work with users. Estimated effort: 0.5 FTE = 6 months. This proposal addresses requirements from points 2 (a and g) and 6 (b).
    2. WS-Agreement-based network infrastructure: we have a prototype of rate-limiting-based transfer plus prediction. We could conceivably do a slightly better integration here, but not much more than that. At this point, 0.1 FTE = 5 weeks would make sense for integration of basic prediction into RFT. For the third year we could propose something more sophisticated, but with a rather vague effort level: estimate 0.4 FTE = 20 weeks for year 3. This proposal addresses requirements from point 6 (c).
    3. WS-Agreement based workflow. This would be an infrastructure that makes agreements for an end-to-end fusion service composed of multiple services: data transfer, execution, database access, etc. The terms of this agreement would be derived based on the terms of the subsidiary services. Claiming would entail claiming all subsidiary services. Under the current proposal, we developed a hardwired service that does that for one application; the proposed work would involve making this flexible, work on specification, prototype, testing with different fusion applications, working with users to deploy, some support. Estimated effort: 0.5 FTE = 6 months. This proposal addresses requirements from point 6 (d).
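The workflow idea in the last item (composite terms derived from subsidiary services, and claiming that claims every subsidiary service) can be sketched as follows. This is not the WS-Agreement specification or the existing hardwired service: the class names, the single `max_seconds` term, the assumption that the steps run sequentially, and the rollback on a failed claim are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class ServiceAgreement:
    """Agreement with one subsidiary service (hypothetical model)."""
    name: str
    max_seconds: float        # agreed upper bound on this step's duration
    available: bool = True
    claimed: bool = False

    def claim(self):
        if not self.available or self.claimed:
            raise RuntimeError(f"cannot claim {self.name}")
        self.claimed = True

    def release(self):
        self.claimed = False


@dataclass
class WorkflowAgreement:
    """End-to-end agreement composed of subsidiary agreements."""
    steps: list

    @property
    def max_seconds(self):
        # Assumption: steps run one after another, so the bounds add up.
        return sum(step.max_seconds for step in self.steps)

    def claim(self):
        """Claim every subsidiary service, or none if any claim fails."""
        claimed = []
        try:
            for step in self.steps:
                step.claim()
                claimed.append(step)
        except RuntimeError:
            # Roll back so a partial claim never holds resources.
            for step in reversed(claimed):
                step.release()
            raise
```

For example, a workflow built from a 20-second data transfer, a 120-second execution, and a 5-second database access would advertise a 145-second end-to-end bound under these assumptions.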

 

  1. Overall (0.6 FTE)

 

Coordination is an important part of making sure that everything runs smoothly and involves reaching out to fusion sites, documenting technical requirements, developing plans, and organizing technical demonstrations for meetings such as SC. This work should continue. There is no explicit requirement noted for this but there should be: without this work none of the other work will get done.

 

    1. Liaison activities: discussing and understanding requirements, evaluating capabilities, coordination with the Globus team (writing campaigns, etc.) and the larger Grid community, and outreach activities (fusion-related talks, participation in fusion events, etc.). I think it makes sense to reduce this effort to, say, 0.2 FTE per year = 0.6 FTE over three years.

 

Workplan

 

Year 1 (mid 2004 to mid 2005):

 

1)    Development of XIO mode-e driver, plus integration with MDSplus (estimated effort: 10 weeks). The mode-e driver allows an XIO user to leverage the parallelism aspects of GridFTP. Driver development includes design, prototyping, development, testing and integration, release cycle (alpha through final), documentation and integration with community users. This addresses requirements defined in point 5.

2)    Introduction of GridFTP to a new community of users (NIMROD). The support here could range from very extensive (2 site visits by 2 people, help with installation, debugging site setup, performance tuning, tutorials) to more scaled down (debugging site setup, performance tuning, maybe also a tutorial or installation help). In either case we need to include interactions with teams designing translations into the HDF5 format (estimated effort: ~ 15 weeks to 4 weeks spread over the first year). This addresses requirements defined in point 5.

3)    Support for agreement-based between-pulse interaction. The initial work on the spec and first prototype were implemented under the current proposal. Further work includes specification development, prototype adjustments, backending to different technologies (PBS, single machine, others?), GGF standards interaction, and work with users. This proposal addresses requirements from points 2 (a and g) and 6 (b).

a.     Further development of resource management prototype (different back-end technologies) and integration with users (estimated effort: 10 weeks)

b.     Development of a specification of WS-Agreement for resource management, translating it into GGF-based standards, interaction with teams working on similar technologies  and possible partial adoption (deliverable: specification for resource management)  (estimated effort: 5 weeks)

c.     Initial development of a workflow prototype (estimated effort: 10 weeks)

4)    Support for dynamic accounts. The design and prototype were developed under the current proposal. Future work would include integration with an authorization service (authorizing creation and management of those accounts) and adjustments to different setups (the current implementation modifies the /etc/passwd file, but a cluster, for example, would need yppasswd). Estimated effort this year: ~7 weeks.

5)    Simple gateway enforcement: (estimated effort this year 7 weeks). This is essentially an extension of what we have right now, but would cover new authorization policies and possible modes of working with Akenti. The work will include design, prototyping, development, testing and integration, release cycle (alpha through final), documentation and integration with community users. This proposal addresses requirement from point 2 (a).

6)    Technical support (estimated effort: 10 weeks this year)

7)    Coordination (estimated effort: 10 weeks this year)

 

Year 2 (mid-2005 to mid-2006):

 

1)    XIO for Windows: under the current design we have provided an alpha version of Globus I/O for Windows and a design for XIO for Windows. More funding will be needed to provide a reliable implementation. Estimated effort: 0.5 FTE = 6 months. The work will include prototype, development, development of a test suite for Windows, development of packaging for Windows, testing and integration, release cycle, documentation and integration with community users.

2)    GridFTP driver, plus integration with MDSplus (estimated effort: 10 weeks). The GridFTP driver allows for access to remote files. Driver development includes design, prototyping, development, testing and integration, release cycle (alpha through final), documentation and integration with community users.

3)    Support for agreement-based between-pulse interaction.

a.     Integration of basic transfer prediction into RFT (estimated effort: 5 weeks)

b.     Delivery of WS-Agreement infrastructure fully integrated with GT3 GRAM. This work includes the release cycle, development of a test suite, testing, documentation and integration with users (estimated effort: 7 weeks)

c.     Development of a specification of WS-Agreement for workflow, translating it into GGF-based standards (deliverable: workflow specification)  (estimated effort: 3 weeks)

d.     Development of a workflow prototype, testing with different fusion applications, working with users to deploy, some support. (estimated effort: 10 weeks)

4)    Support for dynamic accounts; continuation of work from year 1. Integration with an authorization service (authorizing creation and management of those accounts) and adjustments to different setups (the current implementation modifies the /etc/passwd file, but a cluster, for example, would need yppasswd). Estimated effort: ~7 weeks.

5)    Simple gateway enforcement (estimated effort this year: 7 weeks). This is essentially an extension of what we have right now, but would cover new authorization policies and possible modes of working with Akenti. The work will include design, prototyping, development, testing and integration, release cycle (alpha through final), documentation and integration with community users. This proposal addresses requirement from point 2 (a).

6)    Technical support (estimated effort: 10 weeks this year)

7)    Coordination (estimated effort: 10 weeks this year)

 

 

 

Year 3 (mid-2006 to mid-2007):

 

1)    Improve network reservations: integrate rate limiting and predictions, deploy as part of Globus (estimated effort: 1/2 FTE). This addresses the requirements of point 6.

2)    Release dynamic accounts as part of the Globus Toolkit. Integration with WS-Agreement and putting this through a release cycle would take another 0.3 FTE = 15 weeks. This proposal addresses requirements from point 2 (f and g).

3)    Sandboxes and/or VM: replace simple dynamic accounts with a sandbox capable of enforcing properties such as CPU share, disk usage share, memory usage, etc. This is a better alternative to simple gateway enforcement, but at this point would require significant R&D. The work is open-ended, and different levels of involvement would require different levels of effort. Estimate: 1.0 FTE = 12 months. This proposal addresses requirements from point 2 (a and g).

4)    Integrate the workflow service into the Globus Toolkit. Integration with WS-Agreement and putting this through a release cycle would take another 0.3 FTE = 15 weeks.



Last modified 12/28/03.