Version 3 of PINOCCHIO


Bug report (May 27th, 2013):

Version 3.0.1 fixes a bug present in the original 3.0 version: when the power spectrum was provided to the code through a file, the code incorrectly tilted the spectrum by k**(1-PowerSpectrum). In addition, the documentation file now gives more precise instructions on how to provide the power spectrum through an input file.

Main features:

General description

The latest version, V3.0, is fully parallel, is written in Fortran 90 + C, and uses the Message Passing Interface (MPI) for communications. It has been designed to run on hundreds, if not thousands, of cores of a massively parallel super-computer. The two separate codes have been merged into a single one and no out-of-core strategy is adopted, so the amount of memory needed rises by a factor of three with respect to the previous version. The computation of collapse times is performed as in Version 2. Fragmentation is performed by dividing the box into sub-volumes and distributing one sub-volume to each MPI task. The tasks do not communicate during this process, so each sub-volume must extend the calculation to a boundary layer, where the reconstruction is affected by the boundaries. From our tests, and for a standard cosmology, the reconstruction of the largest objects converges when the boundary layer is wider than about 30 Mpc. This strategy minimizes the number of communications among tasks, but the boundary layer implies an overhead that is typically of order of some tens of per cent for large cosmological boxes. For small boxes at very high resolution this overhead would become dominant; in that case the serial code of Version 1 (on a large shared-memory machine) or the parallel code of Version 2 would be preferable.
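As a rough illustration of how the boundary-layer overhead scales with the box size, the sketch below (not part of PINOCCHIO; it assumes, for simplicity, a cubic domain decomposition, an illustrative number of tasks, and the 30 Mpc boundary quoted above) estimates the fraction of extra volume each task has to process:

/* Rough estimate of the fragmentation boundary-layer overhead.
   Illustration only, not PINOCCHIO code: the box of side L (Mpc) is
   assumed to be split into ntasks equal cubic sub-volumes, each
   extended on all sides by a boundary layer of width b (Mpc). */
#include <stdio.h>
#include <math.h>

static double overhead(double L, int ntasks, double b)
{
    double lsub = L / cbrt((double) ntasks);   /* side of one sub-volume   */
    double vnom = lsub * lsub * lsub;          /* nominal sub-volume       */
    double vext = pow(lsub + 2.0 * b, 3.0);    /* including boundary layer */
    return vext / vnom - 1.0;                  /* fractional extra volume  */
}

int main(void)
{
    const double b = 30.0;   /* boundary layer width in Mpc */
    printf("L = 3000 Mpc, 64 tasks: overhead = %.0f per cent\n",
           100.0 * overhead(3000.0, 64, b));
    printf("L =  100 Mpc, 64 tasks: overhead = %.0f per cent\n",
           100.0 * overhead(100.0, 64, b));
    return 0;
}

With these illustrative numbers the extra volume stays at some tens of per cent for the large box and becomes overwhelming for the small one, which is why the older codes remain preferable in the latter regime.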

To generate the initial linear density field in Fourier space, we have merged PINOCCHIO with part of the code taken from N-GenIC by V. Springel. Besides a few technical improvements with respect to the original PINOCCHIO code, this has the advantage of being able to faithfully reproduce a simulation run from initial conditions generated with N-GenIC, or with 2LPTic, the second-order LPT version by R. Scoccimarro, just from knowledge of the assumed cosmology and the random number seed.
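The role of the random seed can be illustrated with a minimal sketch of the standard recipe for realizing a Gaussian linear density field in Fourier space: each mode receives a Rayleigh-distributed amplitude set by the power spectrum and a uniform random phase, so the same seed always reproduces the same field. Everything below (grid size, box size, the power_spectrum() placeholder, the use of the C library random number generator) is an assumption made for the example and not the actual N-GenIC implementation, which among other things also enforces the Hermitian symmetry needed for a real field:

/* Minimal sketch of a Gaussian random field realized in Fourier space
   from a power spectrum and a seed (illustration only).              */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <complex.h>

#define N 32                                 /* small demo grid */

static double power_spectrum(double k)
{
    /* hypothetical placeholder: a simple power law P(k) ~ k^ns */
    const double ns = 0.96;
    return (k > 0.0) ? pow(k, ns) : 0.0;
}

int main(void)
{
    const double pi      = acos(-1.0);
    const double boxsize = 100.0;            /* illustrative, in Mpc  */
    const unsigned int seed = 12345;         /* fixes the realization */
    srand(seed);

    double kfund = 2.0 * pi / boxsize;       /* fundamental frequency */
    double complex *delta = malloc(sizeof(double complex) * N * N * N);

    for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++)
        for (int l = 0; l < N; l++) {
            /* map grid indices to signed wavenumbers */
            int ii = (i <= N/2) ? i : i - N;
            int jj = (j <= N/2) ? j : j - N;
            int ll = (l <= N/2) ? l : l - N;
            double k = kfund * sqrt((double)(ii*ii + jj*jj + ll*ll));

            /* Rayleigh-distributed amplitude and uniform phase */
            double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
            double u2 = (double) rand() / RAND_MAX;
            double amp   = sqrt(-log(u1) * power_spectrum(k));
            double phase = 2.0 * pi * u2;

            delta[(i*N + j)*N + l] = amp * cexp(I * phase);
        }

    /* an inverse FFT of delta would give the real-space density field */
    printf("filled %d Fourier modes with seed %u\n", N * N * N, seed);
    free(delta);
    return 0;
}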

The code has also been extended to cover a wider range of cosmologies, including a generic, redshift-dependent equation of state of the quintessence. However, the computation of collapse times based on ellipsoidal collapse still relies on the assumption that the dependence on cosmology factorizes out of the dynamical evolution when the growing mode D(t) is used as a clock; this approximation should be tested before the code is used for more general cosmologies. Displacements of groups to their final positions are still computed with the Zeldovich approximation.
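To give an idea of what using the growing mode D(t) as a clock involves, the sketch below (again an illustration, not PINOCCHIO code) integrates the standard linear-growth equation, D'' + (3/a + dlnE/da) D' = 1.5 Omega_m(a) D / a^2 with ' = d/da, for a flat cosmology with matter plus a quintessence component; the constant equation of state w and the parameter values are assumptions made for the example:

/* Linear growing mode D(a) for a flat matter + quintessence model,
   obtained by RK4 integration of the growth ODE (illustration only). */
#include <stdio.h>
#include <math.h>

static const double Om0 = 0.30;   /* matter density today (assumed)      */
static const double w   = -0.9;   /* quintessence eq. of state (assumed) */

static double E2(double a)        /* E(a)^2 = H(a)^2 / H0^2              */
{
    return Om0 / (a*a*a) + (1.0 - Om0) * pow(a, -3.0 * (1.0 + w));
}

static double dlnE_da(double a)   /* numerical derivative of ln E(a)     */
{
    double h = 1e-6 * a;
    return (log(sqrt(E2(a + h))) - log(sqrt(E2(a - h)))) / (2.0 * h);
}

/* right-hand side of the first-order system y = (D, dD/da) */
static void rhs(double a, const double y[2], double dyda[2])
{
    double Om_a = Om0 / (a*a*a) / E2(a);
    dyda[0] = y[1];
    dyda[1] = -(3.0 / a + dlnE_da(a)) * y[1] + 1.5 * Om_a * y[0] / (a*a);
}

int main(void)
{
    /* start deep in matter domination, where D is proportional to a */
    double a = 1e-3, y[2] = { 1e-3, 1.0 }, da = 1e-4;

    while (a < 1.0) {             /* classical fourth-order Runge-Kutta */
        double k1[2], k2[2], k3[2], k4[2], yt[2];
        rhs(a, y, k1);
        for (int i = 0; i < 2; i++) yt[i] = y[i] + 0.5 * da * k1[i];
        rhs(a + 0.5 * da, yt, k2);
        for (int i = 0; i < 2; i++) yt[i] = y[i] + 0.5 * da * k2[i];
        rhs(a + 0.5 * da, yt, k3);
        for (int i = 0; i < 2; i++) yt[i] = y[i] + da * k3[i];
        rhs(a + da, yt, k4);
        for (int i = 0; i < 2; i++)
            y[i] += da * (k1[i] + 2.0*k2[i] + 2.0*k3[i] + k4[i]) / 6.0;
        a += da;
    }
    printf("D(a=1) / D(a=0.001) = %.1f\n", y[0] / 1e-3);
    return 0;
}

Once D(t) is available, the Zeldovich approximation maps a particle from its Lagrangian position q to x = q + D(t) S(q), with S(q) the displacement field computed from the gradient of the initial potential.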

Previous versions

The original scalar code (latest release: V1.1) was written in Fortran 77 and designed to run on a simple PC. It could perform runs of 256^3 particles on a 450 MHz Pentium III machine with 512 Mbyte of RAM in nearly 6 hours, a remarkable achievement that made it possible to obtain reasonable statistics of merger histories without access to a supercomputer. Because memory is the limiting factor in this case, the code has an out-of-core design: it keeps in memory only one component of the derivatives of the potential at a time, while the other components are saved on the disc. The most time- and memory-consuming part is the computation of collapse times; fragmentation takes less than 10 per cent of the time.
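The out-of-core pattern can be sketched as follows (illustration only, not the actual V1.1 code; the grid size, scratch file names and the fill_component() helper are made up for the example). Only one component lives in memory at any time; the others are parked in scratch files on disc and read back when they are needed again:

/* Schematic out-of-core pattern: hold one derivative component of the
   potential in memory at a time, keep the others in scratch files.   */
#include <stdio.h>
#include <stdlib.h>

#define NGRID 64                              /* small demo grid */
#define NCELL ((size_t) NGRID * NGRID * NGRID)

/* hypothetical stand-in for the FFT-based computation of one component */
static void fill_component(float *buf, int comp)
{
    for (size_t i = 0; i < NCELL; i++)
        buf[i] = (float) comp;
}

int main(void)
{
    float *buf = malloc(NCELL * sizeof(float));  /* one component only */
    char name[32];

    /* pass 1: compute each component in turn and park it on disc */
    for (int comp = 0; comp < 3; comp++) {
        sprintf(name, "deriv_%d.tmp", comp);
        fill_component(buf, comp);
        FILE *f = fopen(name, "wb");
        fwrite(buf, sizeof(float), NCELL, f);
        fclose(f);
    }

    /* pass 2: read the components back one at a time, so peak memory
       never exceeds a single component */
    for (int comp = 0; comp < 3; comp++) {
        sprintf(name, "deriv_%d.tmp", comp);
        FILE *f = fopen(name, "rb");
        size_t nread = fread(buf, sizeof(float), NCELL, f);
        fclose(f);
        printf("component %d: read %zu cells\n", comp, nread);
    }

    free(buf);
    return 0;
}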

In 2005, P. Monaco and T. Theuns wrote the parallel (MPI) version 2 of PINOCCHIO (latest release: V2.3), which was circulated among interested researchers. It is written in Fortran 90 and uses the FFTW package to compute Fast Fourier Transforms. While parallelizing the computation of collapse times is straightforward (FFTW takes care of most communications), the fragmentation code was parallelized rather inefficiently, with one task performing the fragmentation and the other tasks acting as storage; fragmentation is so quick that even this parallelization gives reasonable running times. Memory requirements were still minimized with an out-of-core strategy. This code is suitable for running on tens of cores and requires fast access to the disc; as the number of cores increases, reading from and writing to the disc becomes the limiting factor.