*****************************************************************
*                         PINOCCHIO V4.1                        *
*  (PINpointing Orbit-Crossing Collapsed HIerarchical Objects)  *
*****************************************************************

This code was written by
Pierluigi Monaco
Dipartimento di Fisica, Universita` di Trieste
Copyright (C) 2016

web page: http://adlibitum.oats.inaf.it/monaco/pinocchio.html

The original code was developed with:
Tom Theuns         Institute for Computational Cosmology, University of Durham
Giuliano Taffoni   INAF - Osservatorio Astronomico di Trieste

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

*****************************************************************

This file contains information on how to run PINOCCHIO-4.1

*****************************************************************

This documentation is meant to give technical support in running the
PINOCCHIO code. It is not meant to explain the scientific and
cosmological background of the code, and the user is assumed to be
familiar with the formalism that has been used to develop PINOCCHIO.
Complete information can be found in the papers quoted at the end of
this file and on the PINOCCHIO web-site:
http://adlibitum.oats.inaf.it/monaco/pinocchio.html

*****************************************************************

INDEX

0. V4.1.2
1. Differences with previous versions
2. The package - V4.1
3. The parameter file
4. Pre-compiler directives
5. Outputs
6. Designing a run
7. Special behaviour
8. Scale-dependent growth rate
9. PINOCCHIO papers

*****************************************************************

0. V4.1.2

This version has been completely re-written in C. I strongly
recommend using it rather than the previous versions. Moreover, it
offers a number of new features, presented in Munari et al. (2016)
and Rizzo et al. (2016):

- particle displacements can be computed with Lagrangian Perturbation
  Theory (LPT) up to third order;
- the code is interfaced with V3 of the fftw library;
- parameter calibration has been improved;
- the user has much better control of memory usage, so as to best
  exploit a given machine;
- the code can output all particles in a Gadget2-format snapshot;
- it can work with scale-dependent growth factors.

With respect to V4.0, version 4.1 fixes an important bug in the
production of halos in the past light cone, and improves the writing
of results as Gadget snapshots. With respect to V4.1, version 4.1.2
fixes further bugs in the PLC reconstruction, allows the user to
visualize the tiling of the box replications used to sample the past
light cone, and further improves the writing of snapshots.

*****************************************************************

1. DIFFERENCES WITH PREVIOUS VERSIONS

V1 of pinocchio is written in fortran 77 and is not parallel. It
adopts an out-of-core strategy to minimize memory requirements, so it
needs fast access to the disc.

V2 has been re-written in fortran 90.
The fmax code is fully parallel but maintains the out-of-core design,
so it keeps memory requirements low but needs fast disc access. It
uses fftw-2.1.5 to compute FFTs, in place of code based on Numerical
Recipes. The fragment code is provided in two versions, a scalar one
and a parallel one. The parallelization has no restrictions in
validity but does not have good scaling properties, so the scalar
code on a shared-memory machine is preferable whenever possible.

V3.0 is fully parallel, is written in fortran90 + C, and uses the
Message Passing Interface (MPI) for communications. It has been
designed to run on hundreds if not thousands of cores of a massively
parallel super-computer. Main features:

- Parameters are passed to the code with a Gadget-like parameter
  file.
- The two separate codes (fmax and fragment) have been merged and no
  out-of-core strategy is adopted, so i/o is very limited but the
  amount of needed memory rises by a factor of three. This version of
  the code needs ~110 bytes per particle.
- Fragmentation is performed by dividing the box into sub-volumes and
  distributing one sub-volume to each MPI task. The tasks do not
  communicate during fragmentation, so each sub-volume needs to
  extend the calculation to a boundary layer, where the
  reconstruction is affected by boundaries. For small boxes at very
  high resolution the overhead implied by the boundary layer would
  become dominant; in this case the fragmentation code of version 2
  would be preferable.
- We have merged PINOCCHIO with the generator of a Gaussian density
  field embedded in the N-GenIC code by V. Springel.
- The code has also been extended to consider a wider range of
  cosmologies, including a generic, redshift-dependent equation of
  state of the quintessence.
- The code can generate a full-sky catalog of halos in the
  past-light-cone up to a redshift specified by the user. In this
  case the box is replicated, using periodic boundary conditions, to
  fill the needed cosmological volume.

A complete description of the code is provided in Monaco et al.
(2013); see Section 9 below.

V4 has been rewritten in C. It uses MPI for communications; an
extension to OpenMP is planned and may be released in a future
version. These are the main changes:

- Particle displacements are computed with Zeldovich, 2LPT or 3LPT
  (without the rotational term). The LPT order is decided at
  compilation time. Higher-order displacements are used to compute
  the position of a group in an output catalog, but groups can be
  reconstructed with lower-order LPT if required.
- The code is interfaced with fftw3; the obsolete fftw-2.1.5 library
  is not used any more.
- Parameters are calibrated so as to reproduce the analytic mass
  function of Watson et al. (2013) in a wide range of box sizes and
  mass resolutions.
- Memory usage has been improved by performing the fragmentation of
  collapsed particles (the procedure used to create halo catalogs) in
  slices of the original box. The number of slices is computed by
  requiring that the needed memory stays below a limit, provided by
  the user in units of bytes per particle. A separate scalar code
  simulates the memory requirements of a given run, so as to best
  design big runs.
- The code can output positions and velocities of all particles as a
  snapshot in Gadget2 format. Particle positions are computed as
  follows: halo particles are distributed around the halo center as
  Navarro-Frenk-White spheres, while uncollapsed particles and
  particles in filaments are simply displaced from their initial
  position.
  Alternatively, the code can write an LPT output of all particles,
  again in Gadget2 format, at the order specified at compilation.
  This can be used to generate initial conditions for a simulation;
  this option is presently limited to second-order LPT.
- In the merger history file, trees are given consecutively
  (previously, all halos belonging to some tree were given together,
  and trees had to be extracted from the complete halo list) and the
  quantities provided are slightly different from previous versions.

V4.1 fixes a bug in the production of halos in the past light cone,
and improves the writing of results as Gadget snapshots.

*****************************************************************

2. THE PACKAGE - V4.1

This software has been tested successfully on several machines. You
are most warmly encouraged to use this code and report any problem to
monaco@oats.inaf.it.

The src/ directory of the PINOCCHIO-4.1 package contains source
files, headers, the Makefile, the main code pinocchio.c, and these
further codes:

- memorytest.c, to test whether a given configuration satisfies the
  memory requirements;
- read_pinocchio_binary.c, read_pinocchio_binary.f90 and
  ReadPinocchio.py, to read the catalogs in binary format and, if
  needed, rewrite them in ascii (C version only);
- PlotPLCGeometry.py, to visualize the tiling of boxes used to sample
  the past light cone (when the PLC directive is present).

The example/ directory contains the results of an example run; see
below and the INSTALLATION file for more details.
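As a quick end-to-end check, the example run can be repeated by
launching the code on the example parameter file. The number of MPI
tasks below is arbitrary, and the path to the pinocchio.x executable
depends on where the code was built:

> cd example/
> mpirun -np 4 ../src/pinocchio.x parameter_file

*****************************************************************

3. THE PARAMETER FILE

Parameters are passed to the code through a parameter file. The file
named "parameter_file" in the example/ directory gives a list of all
allowed parameters. Each line starting with # or % is interpreted as
a comment. There are two kinds of keywords in this file: those that
require an argument and MUST be present, and logical keywords that
require no argument and, if absent, are assumed to be FALSE. As an
exception, the PLCCenter and PLCAxis parameters must be present only
if the logical keyword PLCProvideConeData is specified.

The example parameter file is reported here for clarity. Empty lines
and lines starting with "%" or "#" are ignored.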
****************************** parameter_file **********************************

# This is an example parameter file for the Pinocchio 4.1 code

# run properties
RunFlag                example      % name of the run
OutputList             outputs      % name of file with required output redshifts
BoxSize                500.0        % physical size of the box in Mpc
BoxInH100                           % specify that the box is in Mpc/h
GridSize               200          % number of grid points per side
RandomSeed             999          % random seed for initial conditions

# cosmology
Omega0                 0.25         % Omega_m (total matter)
OmegaLambda            0.75         % Omega_Lambda
OmegaBaryon            0.044        % Omega_b (baryonic matter)
Hubble100              0.70         % little h
Sigma8                 0.8          % sigma8; if 0, it is computed from the provided P(k)
PrimordialIndex        0.96         % n_s
DEw0                   -1.0         % w0 of parametric dark energy equation of state
DEwa                   0.0          % wa of parametric dark energy equation of state
TabulatedEoSfile       no           % equation of state of dark energy tabulated in a file
FileWithInputSpectrum  no           % P(k) tabulated in a file; "no" means that
                                    % the fit of Eisenstein & Hu is used

# from N-GenIC
InputSpectrum_UnitLength_in_cm 0    % units of tabulated P(k), or 0 if it is in h/Mpc
WDM_PartMass_in_kev    0.0          % WDM cut following Bode, Ostriker & Turok (2001)

# control of memory requirements
BoundaryLayerFactor    1.0          % width of the boundary layer for fragmentation
MaxMem                 3600         % max available memory to an MPI task in Mbyte
MaxMemPerParticle      300          % max available memory in bytes per particle

# output
CatalogInAscii                      % catalogs are written in ascii and not in binary format
OutputInH100                        % units are in H=100 instead of the true H value
NumFiles               1            % number of files in which each catalog is written
MinHaloMass            10           % smallest halo that is given in output
AnalyticMassFunction   9            % form of analytic mass function given in the .mf.out files

# output options:
% WriteSnapshot                     % writes a Gadget2 snapshot as an output
% DoNotWriteCatalogs                % skips the writing of full catalogs (including PLC)
% DoNotWriteHistories               % skips the writing of merger histories

# for debugging or development purposes:
% WriteFmax                         % writes the values of the Fmax field, particle by particle
% WriteVmax                         % writes the values of the Vmax field, particle by particle
% WriteRmax                         % writes the values of the Rmax field, particle by particle
% WriteDensity                      % writes the linear density, particle by particle

# past light cone
StartingzForPLC        0.3          % starting (highest) redshift for the past light cone
LastzForPLC            0.0          % final (lowest) redshift for the past light cone
PLCAperture            30           % cone aperture for the past light cone
% PLCProvideConeData                % read vertex and direction of cone from parameter file
% PLCCenter 0. 0. 0.                % cone vertex in the same coordinates as the BoxSize
% PLCAxis   1. 1. 0.                % un-normalized direction of the cone axis

****************************** parameter_file **********************************

Outputs will be given at all the redshifts specified in a file, whose
name is given in the parameter file (OutputList keyword). This file
should contain one redshift per line, in descending (chronological)
order. Empty lines and lines starting with "%" or "#" will be
ignored. The last redshift in this file determines the final redshift
of the run, so at least one valid line must be present. Please
remember that, because pinocchio constructs merger histories and
light cones on the fly, you will rarely need to write out a large
number of outputs.
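For instance, a file named "outputs" (as in the example parameter
file) requesting catalogs at redshifts 2, 1 and 0, with the run
ending at z=0, would simply read:

********************************** outputs *************************************
# output redshifts, in descending (chronological) order
2.0
1.0
0.0
********************************** outputs *************************************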
Pinocchio uses the analytic fit of Eisenstein & Hu (1998) to compute
the power spectrum, but the user can provide a tabulated power
spectrum to the code. The name of the file containing the power
spectrum is given as the argument of the FileWithInputSpectrum
keyword. The file is supposed to provide two columns with k and P(k),
in units of h/Mpc and (Mpc/h)^3; if the P(k) is provided in other
units, you can specify them using the InputSpectrum_UnitLength_in_cm
keyword. In any case, units are assumed to be for H=100. A P(k)
generated by the CAMB code can be provided directly.

The Sigma8 parameter gives the value of sigma8 used to renormalize
the power spectrum. If the normalization of the power spectrum
provided in a file is already correct (say, it has been generated by
CAMB), you can set Sigma8 to 0.0; in this case the code will not
compute the normalization constant of the power spectrum. The value
of sigma8 is written to standard output, so it is easy to check its
correctness.

*****************************************************************

4. PRE-COMPILER DIRECTIVES

These are the pre-compiler directives that can be specified in the
Makefile.

TWO_LPT: activates the computation of displacements with second-order
LPT.

THREE_LPT: activates the computation of displacements with
third-order LPT.

These directives determine the order at which particles and groups
are displaced when a catalog is written. However, groups can be
constructed using a lower-order displacement; indeed, it is
demonstrated in Munari et al. (2016) that 3rd order is very
convenient for displacements but not necessarily for constructing
groups. The order used to construct groups and displace particles is
specified in fragment.h, and by default it is:

#define ORDER_FOR_GROUPS 2
#define ORDER_FOR_CATALOG 3

If the corresponding directives (TWO_LPT or THREE_LPT) are not
active, the order used for the computations will be limited to the
highest one available in that configuration.

PLC: used to generate past-light-cone catalogs. The code chooses a
position in the box to locate the observer (default: a random
position in the box), defines a survey volume as a cone of aperture
PLCAperture (in degrees) directed toward a given direction (default:
1,1,1), replicates the box (with periodic boundary conditions) to
fill the survey volume, and stores the properties of all the groups
that cross the PLC in each of the replications. The catalog is
written at the end. The corresponding parameters (StartingzForPLC,
LastzForPLC, PLCAperture) MUST be present in the parameter file even
when the PLC directive is not active, in which case their value is
irrelevant. When PLC is active, specifying a negative value for
StartingzForPLC switches off the construction of the PLC. The
position of the PLC vertex and the direction of the cone axis can be
specified with the keywords PLCProvideConeData, PLCCenter and
PLCAxis.

ROTATE_BOX: N-GenIC and 2LPTic use a different ordering of data in
Fourier space; as a consequence, when pinocchio is run with the same
random seed as one of these two codes, it produces a box whose
large-scale structure is rotated with respect to the one produced by
those codes. This directive tells the code to rotate the axes back,
so as to produce exactly the same large-scale structure. When no
comparison with an N-body simulation is planned it has no practical
use, though it does not introduce any significant overhead.

WHITENOISE: the code can read a white noise field generated by the
GRAFIC2 code; it must be contained in a file called WhiteNoise.
Please ask monaco@oats.inaf.it for details, or have a look at the
ReadWhiteNoise.c file.

NO_RANDOM_MODULES: if active, the modulus of the k-space modes of the
density field is fixed to the mean value instead of being randomly
distributed; phases remain random. This is a numerical trick to
suppress sample variance in single realizations, which can be very
useful in some contexts. Random numbers for the moduli are generated
anyway, so two runs with the same seed, with and without
NO_RANDOM_MODULES, will have the same phases and give roughly the
same large-scale structure.
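To illustrate the idea, here is a toy one-dimensional numpy sketch
(not the code's actual generator): the fixed-modulus field carries
exactly the mean power of the Gaussian one, while the random phases,
and hence the large-scale pattern, are the same in both cases.

import numpy as np

rng = np.random.default_rng(999)
k = np.arange(1, 65)
Pk = 1.0 / k                                     # a toy power spectrum
phases = rng.uniform(0.0, 2.0 * np.pi, k.size)   # phases random in both cases

# standard Gaussian field: Rayleigh-distributed moduli with rms sqrt(P(k))
moduli_random = rng.rayleigh(scale=np.sqrt(Pk / 2.0))
# NO_RANDOM_MODULES: moduli fixed to the rms value, suppressing sample variance
moduli_fixed = np.sqrt(Pk)

delta_random = moduli_random * np.exp(1j * phases)
delta_fixed = moduli_fixed * np.exp(1j * phases)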
SCALE_DEPENDENT_GROWTH: if active, the code computes the growth
factors (for density and velocity) from a set of power spectra,
generated by the CAMB code over a range of redshifts. See Section 8
for further explanations.

*****************************************************************

5. OUTPUTS

The code produces the following files. All the outputs (excluding the
mass function) are split in a number of files equal to the value of
the NumFiles parameter. If this is >1, the files will carry a further
extension, an integer from 0 to NumFiles-1.

- pinocchio.[output redshift].[run name].mf.out

  For the specified output redshift, this file gives the mass
  function of all halos with at least MinHaloMass particles. It is an
  ascii file with the following format (as specified in the file
  header, here for OutputInH100):

  # Mass function for redshift 0.000000
  # 1) mass (Msun/h)
  # 2) n(m) (Mpc^-3 Msun^-1 h^4)
  # 3) upper +1-sigma limit for n(m) (Mpc^-3 Msun^-1 h^4)
  # 4) lower -1-sigma limit for n(m) (Mpc^-3 Msun^-1 h^4)
  # 5) number of halos in the bin
  # 6) analytical n(m) from ****

  Errorbars give only the Poisson error on the number of halos. The
  analytic mass function is chosen through the AnalyticMassFunction
  parameter:

  AnalyticMassFunction = 0: Press & Schechter (1974)
  AnalyticMassFunction = 1: Sheth & Tormen (2001)
  AnalyticMassFunction = 2: Jenkins et al. (2001)
  AnalyticMassFunction = 3: Warren et al. (2006)
  AnalyticMassFunction = 4: Reed et al. (2007)
  AnalyticMassFunction = 5: Crocce et al. (2010)
  AnalyticMassFunction = 6: Tinker et al. (2008)
  AnalyticMassFunction = 7: Courtin et al. (2010)
  AnalyticMassFunction = 8: Angulo et al. (2012)
  AnalyticMassFunction = 9: Watson et al. (2013)
  AnalyticMassFunction =10: Crocce et al. (2010), with forced universality

- pinocchio.[output redshift].[run name].catalog.out

  For the specified output redshift, this file gives the catalog of
  all halos with at least MinHaloMass particles. It contains the
  following information (as described in the file header, in case
  CatalogInAscii is specified - here we assume OutputInH100):

  # Group catalog for redshift [redshift] and minimal mass of 10 particles
  # All quantities are relative to H0=100 km/s/Mpc
  # 1) group ID
  # 2) group mass (Msun/h)
  # 3- 5) initial position (Mpc/h)
  # 6- 8) final position (Mpc/h)
  # 9-11) velocity (km/s)
  # 12) number of particles

  If CatalogInAscii is not specified, the same quantities are
  provided in fortran binary format (see below). In this case
  floating-point quantities are in double precision.

- pinocchio.[run name].histories.out

  This file is generated at the final output redshift and gives the
  merger histories of all halos. At each merging event the largest
  halo maintains its ID-number, while the other groups are labelled
  as non-existing and are not updated any more. Only halos with at
  least MinHaloMass particles are considered in the merger trees. The
  content of the file is illustrated by the header obtained for the
  ascii output:

  # Merger histories for halos with minimal mass of 10 particles
  # 1) group ID
  # 2) index within the tree
  # 3) linking list
  # 4) merged with
  # 5) mass of halo at merger (particles)
  # 6) mass of main halo it merges with, at merger (particles)
  # 7) merger redshift
  # 8) redshift of peak collapse
  # 9) redshift at which the halo overtakes the minimal mass

  Differently from previous versions, trees are given consecutively.
  The first number in a non-commented line gives the number of slices
  used to fragment the box; then, for each slice, the total number of
  trees and branches is provided first, and the trees are given one
  by one.

  Halos (both trees and branches) are numbered in two ways. The group
  ID (an unsigned long long integer) is the id-number of the halo,
  and corresponds to the ID of the peak of Fmax from which the halo
  has been created; if (i,j,k) are the coordinates of that peak in
  grid units, with 0<=i,j,k<GridSize, the ID is constructed from
  these coordinates. The second numbering is the index of the halo
  within its tree; the "linking list" and "merged with" fields refer
  to this second numbering.

The binary files can be read with the read_pinocchio_binary codes
listed in Section 2. Once compiled, run:

> ./read_pinocchio_binary RunFlag redshift [ascii]

The code reads the catalog of the run named RunFlag at the specified
redshift, which is treated as a string and not as a float number - so
give exactly the number you find in the filename. If redshift takes
the value "plc" or "hist", the plc or histories file is read instead.
The code, which is provided just as an example, prints out the
information of the first ten and the last lines; this is true for
both the fortran and the C code. If the further argument "ascii" is
provided to the C code, it translates the binary file into ascii,
producing a file very similar to the one that PINOCCHIO would produce
with the CatalogInAscii keyword.

ReadPinocchio.py contains three python classes for reading catalogs,
past-light cones and histories. Basic instructions are given in the
file itself.
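For ascii catalogs, plain numpy is enough. A minimal sketch follows;
the file name refers to a hypothetical z=0 output of the example run,
and the exact redshift string must be copied from the file name:

import numpy as np

# columns: ID, mass, initial position (3), final position (3),
#          velocity (3), number of particles
data = np.loadtxt("pinocchio.0.0000.example.catalog.out")
mass = data[:, 1]      # group mass in Msun/h (with OutputInH100)
pos = data[:, 5:8]     # final positions in Mpc/h
vel = data[:, 8:11]    # velocities in km/s
print(f"{len(mass)} halos; most massive: {mass.max():.3e} Msun/h")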
*****************************************************************

6. DESIGNING A RUN

The main limit of pinocchio is memory, so the largest run that can be
performed on a machine is set by the amount of available RAM. To
design a run it is necessary to take into account the following
factors.

Each particle requires at least ~112 bytes of memory for Zeldovich,
~144 bytes for 2LPT and ~227 bytes for 3LPT. However, the exact
amount of memory depends on the overhead due to the boundary layers
of the sub-volumes used during fragmentation.

The computation of collapse times is performed using fftw, which
distributes memory in planes, so that each MPI task holds a few
planes.

Grouping of collapsed particles into halos (fragmentation) is
performed in sub-volumes, but to avoid border effects each task must
process a boundary layer around its sub-volume. This overhead in
memory allows fragmentation to proceed with minimal communications.
The width of the boundary layer is computed as the Lagrangian radius
of the largest halo that is expected to be present in the box at the
latest redshift, times the BoundaryLayerFactor parameter, which thus
regulates this important quantity. Values >=1 are recommended for
this parameter, but distributing a relatively small box (<500 Mpc/h)
over many tasks requires a very large overhead in memory, so it may
be necessary to adopt lower values of BoundaryLayerFactor in these
cases.
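As a back-of-the-envelope check of these numbers, here is a sketch
based on the figures quoted above; it ignores the boundary-layer
overhead, so the true requirement will be somewhat higher, and the
grid size and task count are just example values:

# baseline memory budget for a hypothetical 2048^3 run with 3LPT
grid = 2048                 # GridSize
bytes_per_particle = 227    # ~112 Zeldovich, ~144 2LPT, ~227 3LPT
ntasks = 1024
total_gb = grid**3 * bytes_per_particle / 1024**3
per_task_mb = total_gb * 1024 / ntasks
print(f"total: {total_gb:.0f} GB; "
      f"per task: {per_task_mb:.0f} MB (compare with MaxMem)")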
To limit the overhead due to boundary layers, the code performs the
fragmentation of the box in a certain number of slices. This
increases the overhead by itself, but allows the code to divide the
box into a larger number of sub-volumes and to load only a part of
them at a time, decreasing the total memory requirement at the cost
of repeating the fragmentation process several times. However,
fragmentation is a quick process, so this time overhead is typically
small. The parameter that controls the slicing of the box is
MaxMemPerParticle, which specifies the highest allowed amount of
memory per particle, in bytes: the code seeks a sub-division of the
volume that minimizes the overhead and keeps the number of bytes per
particle below that limit. Clearly, MaxMemPerParticle should not be
below the minimum amount of memory stated above, which depends on the
LPT order used; typical values are 120 for Zeldovich, 150 for 2LPT
and 300 for 3LPT.

Each MPI task must accommodate a given number of planes in memory.
The MaxMem parameter (in Mbytes) should be set to the largest amount
of memory available to an MPI task; the code stops if the memory to
be allocated exceeds this quantity. Beware that, if the number of
planes is not a multiple of the number of MPI tasks, fftw will
allocate the planes in an uneven way.

Memory distribution by planes is also the main limit to running boxes
with a very large number of particles: an MPI task must keep at least
one plane, and this sets a minimum amount of memory per task that in
large runs can exceed the available memory per core. In this case a
mixed MPI-OpenMP design (in progress) can mitigate the problem,
though an fft solver with pencil-based distribution of memory (in
project) is the best solution.

To help in the design of a big pinocchio run, we provide the code
memorytest.c, which can be compiled as explained in the INSTALLATION
file. To run it:

> ./memorytest [parameter file] [number of tasks]

This scalar code, which needs to know how many tasks will be used for
the pinocchio run, reads the parameter file, initializes the
cosmology and computes the optimal sub-division of the box for
fragmentation, checking that a configuration exists that passes all
tests. It will fail if:

- There is no way to sub-divide the box so as to obtain an amount of
  memory per particle below MaxMemPerParticle. In this case either
  this parameter is increased or BoundaryLayerFactor is decreased.
- The memory requested per task is higher than MaxMem. In this case
  it is necessary either to increase MaxMem (if possible) or to
  increase the number of tasks; the latter will however increase the
  overhead, because the box will be fragmented into more sub-volumes.
- Enough memory is available in total, but the tasks cannot load the
  required number of planes because these require too much memory.
  This can happen in two cases. A very big run on a machine with a
  limited amount of RAM per core will easily run into this problem; a
  (very inefficient) solution is to run on fewer cores of each
  shared-memory node. The second case is when the box is large and
  the distribution of planes is uneven, so that in practice some
  tasks must keep in memory more than the total needed memory divided
  by the number of tasks. The solution is to make the number of
  planes a multiple of the number of tasks (see the sketch below).
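A toy calculation of this last effect, assuming the usual fftw slab
decomposition in which the first (planes mod tasks) tasks receive one
extra plane; the numbers are examples:

# 200 planes (GridSize) distributed over 48 MPI tasks
grid, ntasks = 200, 48
planes = [grid // ntasks + (1 if t < grid % ntasks else 0)
          for t in range(ntasks)]
print(max(planes), min(planes))  # 5 vs 4: the fullest tasks
                                 # need ~25% more memory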
*****************************************************************

7. SPECIAL BEHAVIOUR

The code can be run with a further command-line argument; if this
takes the value 1 or 2, a special behaviour is obtained.

> mpirun -np * pinocchio.x parameter_file 1

will write the linear density field to the file
Data/density.grid0.[run name].d.

> mpirun -np * pinocchio.x parameter_file 2

will compute LPT displacements for all particles and write a gadget-2
snapshot with particles displaced to the last redshift specified in
the OutputList file. This can be used to generate initial conditions
for a simulation, or an LPT density field at any redshift. This
option presently works only for 2LPT; if the code is compiled with
the THREE_LPT directive, no output will be produced. 3LPT
displacements can be obtained by compiling the code with the
ONLY_LPT_DISPLACEMENTS directive and using the WriteSnapshot keyword
in the parameter file; this requires the whole code to be run. For
Zeldovich or 2LPT, this route gives the same result as the special
behaviour, but with a much longer computing time.

*****************************************************************

8. SCALE-DEPENDENT GROWTH RATE

Compiling the code with the SCALE_DEPENDENT_GROWTH directive allows
the user to deal with a scale-dependent growth factor, such as the
one characterizing the growth of matter perturbations in a massive
neutrino cosmology or in modified gravity theories. This is
illustrated in the paper by Rizzo et al. (2016), submitted to JCAP.

Growth functions are computed by reading power spectra generated by
the CAMB code (http://camb.info) on a grid of redshifts. The
following keywords must be added to the parameter file (parameter
values are an example):

CAMBMatterFileTag    matterpower    % label for matter power spectrum files
CAMBTransferFileTag  transfer_out   % label for transfer function files
CAMBRunName          nilcdm_0.3eV   % name of CAMB run
CAMBRedsfhitsFile    inputredshift  % list of redshifts for which the power spectrum is available
CAMBReferenceOutput  149            % total number of CAMB outputs
CAMBReferenceScale   100            % reference output for the linear density field

To use the code in this configuration, please contact
monaco@oats.inaf.it; a working example can be provided.

*****************************************************************

9. PINOCCHIO PAPERS

The PINOCCHIO algorithm: pinpointing orbit-crossing collapsed
hierarchical objects in a linear density field,
Pierluigi Monaco, Tom Theuns & Giuliano Taffoni, 2002, MNRAS, 331, 587

Predicting the Number, Spatial Distribution and Merging History of
Dark Matter Haloes,
Pierluigi Monaco, Tom Theuns, Giuliano Taffoni, Fabio Governato,
Tom Quinn & Joachim Stadel, 2002, ApJ, 564, 8

PINOCCHIO and the Hierarchical Build-Up of Dark-Matter Haloes,
Giuliano Taffoni, Pierluigi Monaco & Tom Theuns, 2002, MNRAS, 333, 623

An accurate tool for the fast generation of dark matter halo catalogs,
Pierluigi Monaco, Emiliano Sefusatti, Stefano Borgani, Martin Crocce,
Pablo Fosalba, Ravi Sheth & Tom Theuns, 2013, MNRAS, 433, 2389

nIFTy cosmology: Galaxy/halo mock catalogue comparison project on
clustering statistics,
C.-H. Chuang, C. Zhao, F. Prada et al., 2015, MNRAS, 452, 686

Improving the prediction of dark matter halo clustering with
higher orders of Lagrangian Perturbation Theory,
E. Munari, P. Monaco, E. Sefusatti, E. Castorina, F.G. Mohammad,
S. Anselmi & S. Borgani, 2017, MNRAS, 465, 4658

Simulating cosmologies beyond LambdaCDM with Pinocchio,
L.A. Rizzo, F. Villaescusa-Navarro, P. Monaco, E. Munari, S. Borgani,
E. Castorina & E. Sefusatti, 2017, JCAP, 01, 008 (arXiv:1610.07624)