some answers to your questions about data handling at the pole

Gene,

Jodi Lamoureux and I have generated some answers to your questions. Many of the answers are in slide form on the web page http://aether.lbl.gov/www/pplogistics/band/index.htm

This includes the calculations and plots of the trigger (raw) data rates and the mean delivered persistent data rate as a function of detector growth (in number of optical modules). You can translate that into time via the schedule, which I think is too optimistic, since the engineering still needs to be done and that will take management and money.

Additional answers:

1. Raw data/day as a function of the growth of the detector.

SEE PLOTS on WEB PAGE; these are preliminary and may be updated. The rates are affected by the technical question of local coincidence versus more complicated trigger geometries, as well as by the topology of the growing detector.
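For a rough back-of-the-envelope feel (the actual curves are the ones on the web page), here is a minimal Python sketch of how the raw volume scales with module count. The per-module trigger rate, hits per event, and bytes per hit below are hypothetical placeholders, not the numbers behind the plots:

SECONDS_PER_DAY = 86400

def raw_gbytes_per_day(n_modules,
                       trigger_rate_per_module_hz=0.2,   # hypothetical: ~4800 modules -> ~1000 triggers/sec
                       hits_per_event=50,                 # hypothetical mean hits per triggered event
                       bytes_per_hit=40):                 # hypothetical size of one digitized hit record
    """Rough raw (triggered) data volume per day for n_modules optical modules."""
    trigger_rate_hz = n_modules * trigger_rate_per_module_hz
    bytes_per_event = hits_per_event * bytes_per_hit
    return trigger_rate_hz * bytes_per_event * SECONDS_PER_DAY / 1e9

for n in (600, 1200, 2400, 4800):   # illustrative deployment stages, not the real schedule
    print(f"{n:5d} modules -> ~{raw_gbytes_per_day(n):6.1f} GB/day raw")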

2. I also understand that a computer farm will be installed at the Pole so that the data can be processed and filtered. Processed data/day as a function of the growth of the detector. A few words on how much processing would be great, e.g. discard noise events, insert calibration, etc.

Optimistic Answer: Algorithms developed in AMANDA lead us to believe that there are quick methods of determining "up-" and "down-"ness as well as "round-" and "sausage-"ness in the events. These can be used to quickly select the cascades and up-going muons with reasonable efficiency. The algorithms are executed on hits that are "causally" connected (clustered) in the event. The calibration and algorithms require ~2 msec/event to execute.
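A minimal sketch of what the "causally connected" clustering might look like; the connection criterion, the speed of light in ice, and the time window below are illustrative assumptions, not the AMANDA production definition:

C_ICE_M_PER_NS = 0.3 / 1.33    # rough speed of light in ice, metres per nanosecond (assumed)
TIME_WINDOW_NS = 100.0         # assumed slack for scattering and timing jitter

def causally_connected(hit_a, hit_b):
    """True if two hits (t, x, y, z) could belong to the same light front."""
    (t1, x1, y1, z1), (t2, x2, y2, z2) = hit_a, hit_b
    dist = ((x1 - x2)**2 + (y1 - y2)**2 + (z1 - z2)**2) ** 0.5
    return abs(t1 - t2) <= dist / C_ICE_M_PER_NS + TIME_WINDOW_NS

def largest_causal_cluster(hits):
    """Greedily group hits into causally connected clusters and return the biggest."""
    clusters = []
    for hit in sorted(hits):                      # time-order the hits
        for cluster in clusters:
            if any(causally_connected(hit, other) for other in cluster):
                cluster.append(hit)
                break
        else:
            clusters.append([hit])
    return max(clusters, key=len) if clusters else []

The fast classifiers would then run only on the hits returned by the clustering step.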

The speed of the AMANDA code is not so good. Most of the time is devoted to reading and calibrating the event for each algorithm applied. There are a number of no-nos in the structure of the code which cannot be overcome; these have convinced us to design a new software framework for IceCube. We therefore cannot get useful time estimates from the existing code. Fast I/O has been studied by the Mainz group, with estimates of ~5 msec/event. Fast calibration is not so well known, since the calibration of digital hits differs significantly from analog. In a digital system, hit times will be coarsely (or perhaps finely) calibrated in the DAQ, putting little or no requirement on the online filter farm. Quick calibration in AMANDA takes 11 msec/event, which is a conservative estimate.

Processor speeds will vary as well. The current estimates are for 400 MHz processors. If we assume 1 GHz processors, then the online requirement will be:

(20 msec/event) * (400 MHz/1 GHz) * (1000 triggered events/sec) = 8 CPUs

We have given ourselves a margin of error here because of all the uncertainties involved; a small error could end up being a factor of 2 or 3. Hence the 25-node cluster at the Pole.
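The arithmetic behind the 8-CPU figure and the factor-of-2-to-3 contingency can be written out as a short sketch; the per-event timings are the ones quoted above, and the breakdown into I/O, calibration, and algorithms is indicative only:

io_ms           = 5.0    # fast I/O per event (Mainz estimate above)
calibration_ms  = 11.0   # quick calibration per event (conservative AMANDA number above)
algorithms_ms   = 2.0    # up/down and round/sausage selection per event
per_event_ms    = 20.0   # rounded total used in the estimate (~ io + calibration + algorithms)

reference_mhz   = 400.0  # processor speed the timings refer to
assumed_mhz     = 1000.0 # processors assumed for the Pole farm
trigger_rate_hz = 1000.0 # triggered events per second

scaled_ms   = per_event_ms * reference_mhz / assumed_mhz   # 8 ms/event on a 1 GHz CPU
cpus_needed = scaled_ms * 1e-3 * trigger_rate_hz           # 8 CPUs kept busy full time
print(f"baseline: {cpus_needed:.0f} CPUs")

for margin in (2, 3):
    print(f"factor-{margin} contingency: {cpus_needed * margin:.0f} CPUs")

A factor of 3 on the baseline lands at 24 CPUs, which is where the 25-node request comes from.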

We assume that a system manager will be wintering over at the Pole each year and can maintain these machines. Since our share of the computing is small, we assume that Raytheon will be able to provide some system support during the winter months. Installation and upgrades have been factored into the IceCube budget.

(See comment by Gerry Przybylski.)

3. Desired data transmission/day as a function of the growth of the detector.

We anticipate transmitting all the persistent data out of the Pole over the satellite link. Backups onto tape will be used to cover transmission failures. It may seem overzealous to transmit all the data during the winter. Our reasons are:

1. It is hard to convince collaborators to work fervently on a small fraction of the data. Monitoring as well as analysis is severely impacted when data is not expected for 9 months.

2. Nine-month delays in analysis are large. If the delay were 3X shorter and production processing were already occurring, a case might be made for waiting. However, a year is a significant portion of a graduate student's or postdoc's career.

3. The signals do not require a larger bandwidth. All that can be gained by sending more is larger background contamination. Since the persistent data will have to be stored for many years to come, it is not cost-effective to inflate the persistent data volume.

The IceCube proposal estimates that 13 Gbytes/day is needed. We believe that 17 Gbytes/day will be needed if the satellite transfers have a 30% failure rate.
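As a sanity check on those two numbers, here is a short sketch assuming each failed transfer costs one extra attempt; the 30% failure rate is the figure quoted above:

persistent_gb_per_day = 13.0   # the proposal's estimate
failure_rate = 0.30            # assumed fraction of satellite transfers that fail

one_retry_per_failure = persistent_gb_per_day * (1 + failure_rate)   # ~16.9 GB/day
retry_until_success   = persistent_gb_per_day / (1 - failure_rate)   # ~18.6 GB/day upper bound

print(f"one retry per failed transfer: {one_retry_per_failure:.1f} GB/day")
print(f"retry until success:           {retry_until_success:.1f} GB/day")

Either way the requirement stays below 20 Gbytes/day.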

My other question is: does physics require that you transmit all the data (or processed data) back once a day, or can you ship parts of it back and leave the rest for shipping when spring comes?

The cost of the transmission infrastructure depends on your needs. Please specify them. It would be nice if Berkeley, which is putting the farm together, concurs.

Our transmission needs are less than 20 Gbytes/day, which isn't outrageous. If we transferred all triggered events out of the Pole, we would need many times this bandwidth. If we stored all triggered events and shipped them home, we would have to store them permanently, as well as process them with a 9-month delay. Currently, processing a year of AMANDA triggered events takes several months of preparation time to define the analysis streams and calibration, and then several months of processing time to load all the events off of tape and through the programs. This whole procedure is streamlined if the lion's share of the data can be transmitted daily. It's more than a 9-month savings.

It makes sense to perform the first-level processing at the Pole and save the storage, reprocessing, etc. That does require a well-designed system, in that the instrument must calibrate itself to the level that 95 to 99% of the raw triggers can be processed, summarized, and discarded, with only the enriched sample sent onward for further processing and eventual data analysis.
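A short sketch of what that rejection buys, using the same hypothetical raw volume as the earlier back-of-the-envelope estimate, shows why the filtered stream fits comfortably under the 20 Gbytes/day target while the unfiltered stream does not:

raw_gb_per_day = 170.0              # hypothetical raw triggered volume, full detector (see earlier sketch)
satellite_budget_gb_per_day = 20.0  # the "less than 20 Gbytes/day" target above

print(f"unfiltered: {raw_gb_per_day:.0f} GB/day "
      f"(~{raw_gb_per_day / satellite_budget_gb_per_day:.0f}x the satellite budget)")

for reject in (0.95, 0.99):
    kept = raw_gb_per_day * (1 - reject)
    print(f"reject {reject:.0%} at the Pole -> ~{kept:.1f} GB/day of enriched events to transmit")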

George Smoot