12.4 Case Study: The NASA ECS Project

We now apply the CBAM to a real-world system as an example of the method in action.

The Earth Observing System is a constellation of NASA satellites that gathers data for the U.S. Global Change Research Program and other scientific communities worldwide. The Earth Observing System Data Information System (EOSDIS) Core System (ECS) collects data from various satellite downlink stations for further processing. ECS's mission is to process the data into higher-form information and make it available to scientists in searchable form. The goal is to provide both a common way to store (and hence process) data and a public mechanism to introduce new data formats and processing algorithms, thus making the information widely available.

The ECS processes an input stream of hundreds of gigabytes of raw environment-related data per day. The computation of 250 standard "products" results in thousands of gigabytes of information that is archived at eight data centers in the United States. The system has important performance and availability requirements. The long-term nature of the project also makes modifiability important.

The ECS project manager had a limited annual budget to maintain and enhance his current system. From a prior analysis, in this case an ATAM exercise, a large set of desirable changes to the system was elicited from the system stakeholders, resulting in a large set of architectural strategies. The problem was to choose a (much) smaller subset for implementation, as only 10% to 20% of what was being proposed could actually be funded. The manager used the CBAM to make a rational decision based on the economic criterion of return on investment.

In the execution of the CBAM described next, we concentrated on analyzing the Data Access Working Group (DAWG) portion of the ECS.

STEP 1: COLLATE SCENARIOS

Scenarios from the ATAM were collated with a set of new scenarios elicited from the assembled ECS stakeholders. Because the stakeholders had been through an ATAM exercise, this step was relatively straightforward.

A subset of the raw scenarios put forward by the DAWG team is shown in Table 12.1. Note that they are not yet well formed and that some of them do not have defined responses. These issues are resolved in step 2, when the number of scenarios is reduced.[1]

[1] In the presentation of the DAWG case study, we only show the reduced set of scenarios.

Table 12.1. Collected Scenarios in Priority Order

Scenario | Scenario Description
---------+---------------------
1        | Reduce data distribution failures that result in hung distribution requests requiring manual intervention.
2        | Reduce data distribution failures that result in lost distribution requests.
3        | Reduce the number of orders that fail on the order submission process.
4        | Reduce order failures that result in hung orders that require manual intervention.
5        | Reduce order failures that result in lost orders.
6        | There is no good method of tracking ECSGuest failed/canceled orders without much manual intervention (e.g., spreadsheets).
7        | Users need more information on why their orders for data failed.
8        | Because of limitations, there is a need to artificially limit the size and number of orders.
9        | Small orders result in too many notifications to users.
10       | The system should process a 50-GB user request in one day, and a 1-TB user request in one week.

STEP 2: REFINE SCENARIOS

The scenarios were refined, paying particular attention to precisely specifying their stimulus-response measures. The worst-case, current-case, desired-case, and best-case response goals for each scenario were elicited and recorded, as shown in Table 12.2.

Table 12.2. Response Goals for Refined Scenarios
 

         | Response Goals
Scenario | Worst               | Current             | Desired              | Best
---------+---------------------+---------------------+----------------------+---------------------
1        | 10% hung            | 5% hung             | 1% hung              | 0% hung
2        | > 5% lost           | < 1% lost           | 0% lost              | 0% lost
3        | 10% fail            | 5% fail             | 1% fail              | 0% fail
4        | 10% hung            | 5% hung             | 1% hung              | 0% hung
5        | 10% lost            | < 1% lost           | 0% lost              | 0% lost
6        | 50% need help       | 25% need help       | 0% need help         | 0% need help
7        | 10% get information | 50% get information | 100% get information | 100% get information
8        | 50% limited         | 30% limited         | 0% limited           | 0% limited
9        | 1/granule           | 1/granule           | 1/100 granules       | 1/1,000 granules
10       | < 50% meet goal     | 60% meet goal       | 80% meet goal        | > 90% meet goal

STEP 3: PRIORITIZE SCENARIOS

In voting on the refined representation of the scenarios, the close-knit team deviated slightly from the method. Rather than vote individually, they chose to discuss each scenario and arrive at its weight by consensus. The votes allocated to the entire set of scenarios were constrained to total 100, as shown in Table 12.3. Although the stakeholders were not required to make the votes multiples of 5, they felt that this was a reasonable resolution and that more precision was neither needed nor justified.

Table 12.3. Refined Scenarios with Votes
   

         |       | Response Goals
Scenario | Votes | Worst               | Current             | Desired              | Best
---------+-------+---------------------+---------------------+----------------------+---------------------
1        | 10    | 10% hung            | 5% hung             | 1% hung              | 0% hung
2        | 15    | > 5% lost           | < 1% lost           | 0% lost              | 0% lost
3        | 15    | 10% fail            | 5% fail             | 1% fail              | 0% fail
4        | 10    | 10% hung            | 5% hung             | 1% hung              | 0% hung
5        | 15    | 10% lost            | < 1% lost           | 0% lost              | 0% lost
6        | 10    | 50% need help       | 25% need help       | 0% need help         | 0% need help
7        | 5     | 10% get information | 50% get information | 100% get information | 100% get information
8        | 5     | 50% limited         | 30% limited         | 0% limited           | 0% limited
9        | 10    | 1/granule           | 1/granule           | 1/100 granules       | 1/1,000 granules
10       | 5     | < 50% meet goal     | 60% meet goal       | 80% meet goal        | > 90% meet goal

STEP 4: ASSIGN UTILITY

In this step the utility for each scenario was determined by the stakeholders, again by consensus. A utility score of 0 represented no utility; a score of 100 represented the most utility possible. The results of this process are given in Table 12.4.

Table 12.4. Scenarios with Votes and Utility Scores
   

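         |       | Utility Scores
Scenario | Votes | Worst | Current | Desired | Best
---------+-------+-------+---------+---------+-----
1        | 10    | 10    | 80      | 95      | 100
2        | 15    | 0     | 70      | 100     | 100
3        | 15    | 25    | 70      | 100     | 100
4        | 10    | 10    | 80      | 95      | 100
5        | 15    | 0     | 70      | 100     | 100
6        | 10    | 0     | 80      | 100     | 100
7        | 5     | 10    | 70      | 100     | 100
8        | 5     | 0     | 20      | 100     | 100
9        | 10    | 50    | 50      | 80      | 90
10       | 5     | 0     | 70      | 90      | 100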

STEP 5: DEVELOP ARCHITECTURAL STRATEGIES FOR SCENARIOS AND DETERMINE THEIR EXPECTED QUALITY ATTRIBUTE RESPONSE LEVELS

Based on the requirements implied by the preceding scenarios, a set of 10 architectural strategies was developed by the ECS architects. Recall that an architectural strategy may affect more than one scenario. To account for these complex relationships, the expected quality attribute response level that each strategy is predicted to achieve had to be determined with respect to each relevant scenario.

The set of architectural strategies, along with the determination of the scenarios they address, is shown in Table 12.5. For each architectural strategy/scenario pair, the response levels expected to be achieved with respect to that scenario are shown (along with the current response, for comparison purposes).

Table 12.5. Architectural Strategies and Scenarios Addressed

Strategy | Name | Description | Scenarios Affected | Current Response | Expected Response
---------+------+-------------+--------------------+------------------+------------------
1  | Order persistence on submission | Store an order as soon as it arrives in the system. | 3 | 5% fail | 2% fail
   |  |  | 5 | < 1% lost | 0% lost
   |  |  | 6 | 25% need help | 0% need help
2  | Order chunking | Allow operators to partition large orders into multiple small orders. | 8 | 30% limited | 15% limited
3  | Order bundling | Combine multiple small orders into one large order. | 9 | 1 per granule | 1 per 100
   |  |  | 10 | 60% meet goal | 55% meet goal
4  | Order segmentation | Allow an operator to skip items that cannot be retrieved due to data quality or availability issues. | 4 | 5% hung | 2% hung
5  | Order reassignment | Allow an operator to reassign the media type for items in an order. | 1 | 5% hung | 2% hung
6  | Order retry | Allow an operator to retry an order or items in an order that may have failed due to temporary system or data problems. | 4 | 5% hung | 3% hung
7  | Forced order completion | Allow an operator to override an item's unavailability due to data quality constraints. | 1 | 5% hung | 3% hung
8  | Failed order notification | Ensure that users are notified only when part of their order has truly failed and provide detailed status of each item; user notification occurs only if operator okays notification; the operator may edit notification. | 6 | 25% need help | 20% need help
   |  |  | 7 | 50% get information | 90% get information
9  | Granule-level order tracking | An operator and user can determine the status for each item in their order. | 6 | 25% need help | 10% need help
   |  |  | 7 | 50% get information | 95% get information
10 | Links to user information | An operator can quickly locate a user's contact information. Server will access SDSRV information to determine any data restrictions that might apply and will route orders/order segments to appropriate distribution capabilities, including DDIST, PDS, external subsetters and data processing tools, etc. | 7 | 50% get information | 60% get information

STEP 6: DETERMINE THE UTILITY OF THE "EXPECTED" QUALITY ATTRIBUTE RESPONSE LEVELS BY INTERPOLATION

Once the expected response level of every architectural strategy has been characterized with respect to a set of scenarios, the utility of each expected response can be calculated by consulting the utility scores for each scenario's current and desired responses for all of the affected attributes. Using these scores, we may calculate, via interpolation, the utility of the expected quality attribute response levels for the architectural strategy/scenario pair applied to the DAWG of ECS.

Table 12.6. Architectural Strategies and Their Expected Utility

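Strategy | Name                            | Scenarios Affected | Current Utility | Expected Utility
---------+---------------------------------+--------------------+-----------------+-----------------
1        | Order persistence on submission | 3                  | 70              | 90
         |                                 | 5                  | 70              | 100
         |                                 | 6                  | 80              | 100
2        | Order chunking                  | 8                  | 20              | 60
3        | Order bundling                  | 9                  | 50              | 80
         |                                 | 10                 | 70              | 65
4        | Order segmentation              | 4                  | 80              | 90
5        | Order reassignment              | 1                  | 80              | 92
6        | Order retry                     | 4                  | 80              | 85
7        | Forced order completion         | 1                  | 80              | 87
8        | Failed order notification       | 6                  | 80              | 85
         |                                 | 7                  | 70              | 90
9        | Granule-level order tracking    | 6                  | 80              | 90
         |                                 | 7                  | 70              | 95
10       | Links to user information       | 7                  | 70              | 75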

The results of this calculation are shown in Table 12.6, for the architectural strategy/scenario pairs presented in Table 12.5.
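To make the interpolation concrete, the following is a minimal sketch that assumes a straight line between two known (response, utility) points taken from Tables 12.2 and 12.4. The function name `interpolate_utility` is ours, not part of the CBAM. For strategy 2 the straight line reproduces the value of 60 listed in Table 12.6. For strategy 4 it yields 91.25 where the table lists 90, so the tabulated scores should not be read as the output of this exact formula.

```python
def interpolate_utility(expected, known_a, known_b):
    """Linearly interpolate a utility score between two (response, utility) points.

    Assumption: utility varies in a straight line between the two known points.
    """
    (resp_a, util_a), (resp_b, util_b) = known_a, known_b
    if resp_a == resp_b:
        raise ValueError("the two known responses must differ")
    fraction = (expected - resp_a) / (resp_b - resp_a)
    return util_a + fraction * (util_b - util_a)


# Scenario 8 (% of orders limited): current 30% has utility 20, desired 0% has utility 100.
# Strategy 2 (order chunking) is expected to reach 15% limited.
chunking = interpolate_utility(15, (30, 20), (0, 100))

# Scenario 4 (% of orders hung): current 5% has utility 80, desired 1% has utility 95.
# Strategy 4 (order segmentation) is expected to reach 2% hung.
segmentation = interpolate_utility(2, (5, 80), (1, 95))

print(chunking)      # 60.0, the value listed in Table 12.6
print(segmentation)  # 91.25, whereas Table 12.6 lists 90
```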

STEP 7: CALCULATE THE TOTAL BENEFIT OBTAINED FROM AN ARCHITECTURAL STRATEGY

Based on the information collected, as represented in Table 12.6, the total benefit of each architectural strategy can now be calculated, following the equation on page 313. This equation calculates total benefit as the sum of the benefit that accrues to each scenario, normalized by the scenario's relative weight. The total benefit scores for each architectural strategy are given in Table 12.7.

Table 12.7. Total Benefit of Architectural Strategies

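Strategy | Scenario Affected | Scenario Weight | Raw Architectural Strategy Benefit | Normalized Architectural Strategy Benefit | Total Architectural Strategy Benefit
---------+-------------------+-----------------+------------------------------------+-------------------------------------------+-------------------------------------
1        | 3                 | 15              | 20                                 | 300                                       |
1        | 5                 | 15              | 30                                 | 450                                       |
1        | 6                 | 10              | 20                                 | 200                                       | 950
2        | 8                 | 5               | 40                                 | 200                                       | 200
3        | 9                 | 10              | 30                                 | 300                                       |
3        | 10                | 5               | -5                                 | -25                                       | 275
4        | 4                 | 10              | 10                                 | 100                                       | 100
5        | 1                 | 10              | 12                                 | 120                                       | 120
6        | 4                 | 10              | 5                                  | 50                                        | 50
7        | 1                 | 10              | 7                                  | 70                                        | 70
8        | 6                 | 10              | 5                                  | 50                                        |
8        | 7                 | 5               | 20                                 | 100                                       | 150
9        | 6                 | 10              | 10                                 | 100                                       |
9        | 7                 | 5               | 25                                 | 125                                       | 225
10       | 7                 | 5               | 5                                  | 25                                        | 25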
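The arithmetic behind Table 12.7 can be sketched as follows. The raw benefit of a strategy for a scenario is its expected utility minus the current utility (Table 12.6), the normalized benefit is that difference multiplied by the scenario's votes (Table 12.3), and the total is the sum over the scenarios the strategy affects. The names `WEIGHTS`, `UTILITIES`, and `total_benefit` are illustrative choices of ours.

```python
# Scenario weights: the votes from Table 12.3.
WEIGHTS = {1: 10, 2: 15, 3: 15, 4: 10, 5: 15, 6: 10, 7: 5, 8: 5, 9: 10, 10: 5}

# strategy -> {scenario: (current utility, expected utility)}, from Table 12.6.
UTILITIES = {
    1: {3: (70, 90), 5: (70, 100), 6: (80, 100)},
    2: {8: (20, 60)},
    3: {9: (50, 80), 10: (70, 65)},
    4: {4: (80, 90)},
    5: {1: (80, 92)},
    6: {4: (80, 85)},
    7: {1: (80, 87)},
    8: {6: (80, 85), 7: (70, 90)},
    9: {6: (80, 90), 7: (70, 95)},
    10: {7: (70, 75)},
}


def total_benefit(strategy):
    """Sum the weighted utility gains over every scenario the strategy affects."""
    return sum(
        (expected - current) * WEIGHTS[scenario]
        for scenario, (current, expected) in UTILITIES[strategy].items()
    )


benefits = {strategy: total_benefit(strategy) for strategy in UTILITIES}
for strategy, benefit in benefits.items():
    print(strategy, benefit)
```

Note that a strategy can carry a negative contribution: order bundling (strategy 3) lowers the utility of scenario 10 by 5 points, which subtracts 25 from its total.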

STEP 8: CHOOSE ARCHITECTURAL STRATEGIES BASED ON ROI VALUE SUBJECT TO COST CONSTRAINTS

To complete the analysis, the team estimated the cost of each architectural strategy, based on experience with the system, and calculated a return on investment for each one. Using the ROI, we were able to rank the strategies, as shown in Table 12.8. Not surprisingly, the ranks roughly follow the order in which the strategies were proposed: strategy 1 has the highest rank and strategy 3 the second highest. Strategy 9 has the lowest rank, and strategies 6 and 10 tie for the second lowest. This simply validates the stakeholders' intuition about which architectural strategies were going to be of the greatest benefit. For the ECS these were the ones proposed first.

Table 12.8. ROI of Architectural Strategies

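Strategy | Cost | Total Strategy Benefit | Strategy ROI | Strategy Rank
---------+------+------------------------+--------------+--------------
1        | 1200 | 950                    | 0.79         | 1
2        | 400  | 200                    | 0.5          | 3
3        | 400  | 275                    | 0.69         | 2
4        | 200  | 100                    | 0.5          | 3
5        | 400  | 120                    | 0.3          | 7
6        | 200  | 50                     | 0.25         | 8
7        | 200  | 70                     | 0.35         | 6
8        | 300  | 150                    | 0.5          | 3
9        | 1000 | 225                    | 0.22         | 10
10       | 100  | 25                     | 0.25         | 8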
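The ranking in Table 12.8 can be reproduced with a short sketch. ROI is the total benefit divided by the cost, and tied strategies share a rank. The selection step is an assumption on our part: the case study does not state the budget or the exact selection rule, so `select_within_budget` uses a simple greedy pass in ROI order, and the budget of 2000 cost units is purely hypothetical.

```python
from fractions import Fraction

# strategy -> (cost, total benefit), from Table 12.8.
STRATEGIES = {
    1: (1200, 950), 2: (400, 200), 3: (400, 275), 4: (200, 100), 5: (400, 120),
    6: (200, 50), 7: (200, 70), 8: (300, 150), 9: (1000, 225), 10: (100, 25),
}


def roi_and_ranks(strategies):
    """Return exact ROI per strategy and a rank in which ties share a position."""
    rois = {s: Fraction(benefit, cost) for s, (cost, benefit) in strategies.items()}
    # Rank = 1 + number of strategies with a strictly higher ROI.
    ranks = {s: 1 + sum(other > roi for other in rois.values())
             for s, roi in rois.items()}
    return rois, ranks


def select_within_budget(strategies, rois, budget):
    """Greedy pick in descending ROI order (an assumed rule, not the book's)."""
    chosen, remaining = [], budget
    for s in sorted(strategies, key=lambda s: rois[s], reverse=True):
        cost = strategies[s][0]
        if cost <= remaining:
            chosen.append(s)
            remaining -= cost
    return chosen


rois, ranks = roi_and_ranks(STRATEGIES)
for s in STRATEGIES:
    print(s, float(rois[s]), ranks[s])

# Hypothetical budget, chosen only to illustrate the cost constraint.
print(select_within_budget(STRATEGIES, rois, budget=2000))
```

Using `Fraction` keeps the ROI comparison exact, so the three strategies with an ROI of 0.5 and the two with 0.25 are recognized as ties rather than being separated by floating-point noise.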


