- using the List containers, rather than their low-level data_bytes()
and size_bytes() methods, is more convenient and allows future
adjustments to be centralized
ENH: trivial intptr_t wrapper for MPI_Win
STYLE: minor adjustments to mpirunDebug
- 'if constexpr (...)'
* instead of std::enable_if
* terminate template recursion
* compile-time elimination of code
- use C++14 '_t' and C++17 '_v' versions,
eg, std::is_integral_v<T> instead of std::is_integral<T>::value
- std::begin, std::end, std::void_t instead of prev stdFoam versions
- provide is_contiguous_v<..> as short form of is_contiguous<..>::value
with the additional benefit of removing any cv qualifiers.
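A minimal sketch of the points above, assuming C++17. The is_contiguous
trait here is a hypothetical stand-in with two hand-written
specializations (not the OpenFOAM definition), and packedSize() and
totalSize() are illustrative names only.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical trait: only exactly 'int' and 'double' are specialized
template<class T> struct is_contiguous : std::false_type {};
template<> struct is_contiguous<int> : std::true_type {};
template<> struct is_contiguous<double> : std::true_type {};

// Short form that strips cv qualifiers before the lookup
template<class T>
inline constexpr bool is_contiguous_v =
    is_contiguous<std::remove_cv_t<T>>::value;

// One function body instead of two std::enable_if overloads
template<class T>
std::size_t packedSize(const T& value)
{
    if constexpr (is_contiguous_v<T>)
    {
        return sizeof(T);      // raw bytes
    }
    else
    {
        return value.size();   // only compiled for non-contiguous types
    }
}

// Template recursion terminated by 'if constexpr'
template<class T, class... Rest>
std::size_t totalSize(const T& first, const Rest&... rest)
{
    if constexpr (sizeof...(rest) == 0)
    {
        return packedSize(first);
    }
    else
    {
        return packedSize(first) + totalSize(rest...);
    }
}
```

Without the remove_cv_t step, is_contiguous<const int>::value would
pick the primary template and yield false.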
ENH: include is_rotational_vectorspace trait
- tests for vector-space and nComponents > 1 (ie, not sphericalTensor)
ENH: improve robustness of pTraits_.. tests by removing cv qualifiers
ENH: eliminate unnecessary duplicate communicator
- globalMeshData previously had a comm_dup hack to avoid clashes
with deltaCoeffs calculations. However, this was largely due to a
manual implementation of reduce() that used point-to-point
communication. This has since been updated to use an MPI_Allreduce
and now an MPI_Allgather, neither of which need this hack.
- construct Map/HashTable from key/value lists.
- invertToMap() : like invert() but returns a Map<label>,
which is useful for sparse numbering
- inplaceRenumber() : taking a Map<label> for the mapper
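The idea can be sketched with standard containers, using
std::unordered_map as a stand-in for Map<label>. Skipping negative
values and leaving unmapped values untouched are assumptions of this
sketch, not confirmed behaviour of the real functions.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Inverse lookup: value -> original index. Suits sparse numbering,
// where a dense inverse list would be mostly unused.
std::unordered_map<int, int> invertToMap(const std::vector<int>& values)
{
    std::unordered_map<int, int> inverse;
    inverse.reserve(values.size());

    for (std::size_t i = 0; i < values.size(); ++i)
    {
        if (values[i] >= 0)  // assumption: negative entries are ignored
        {
            const bool inserted =
                inverse.emplace(values[i], static_cast<int>(i)).second;
            assert(inserted && "values must be unique to be invertible");
            (void)inserted;
        }
    }
    return inverse;
}

// Renumber in place; values without a mapping are left unchanged
void inplaceRenumber
(
    const std::unordered_map<int, int>& mapper,
    std::vector<int>& values
)
{
    for (int& val : values)
    {
        const auto iter = mapper.find(val);
        if (iter != mapper.end())
        {
            val = iter->second;
        }
    }
}
```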
ENH: construct/reset CStringList for list of C-strings
- can reduce communication by only sending non-zero data (especially
when using NBX for size exchanges), but proper synchronisation with
multiply-connected processor/processor patches (eg, processorCyclic)
may still require speculative sends.
Can now set up PstreamBuffers 'registered' sends to avoid
ad hoc bookkeeping within the caller.
- replace Map with a List or DynamicList to reduce the number of
operations and allocations within the loops.
Use polyBoundaryMesh::nProcessorPatches() for initial capacity
to avoid reallocations.
- the changes introduced in f215ad15d1 aim to reduce unnecessary
point-to-point communication. However, if there are also
processorCyclic boundaries involved, there are multiple connections
between any two processors, so simply skipping empty sends will cause
synchronization problems.
Eg,
On the send side:
patch0to1_a is zero (doesn't send) and patch0to1_b does send
(to the same processor).
On the receive side:
patch1to0_a receives the data intended for patch1to0_b !
Remedy
======
Simply stream all of the send data into PstreamBuffers
(regardless of whether it is empty or non-empty) but track the sends
as a bit operation: empty (0) or non-empty (1).
Reset the buffer slots that were only sent empty data.
This adds an additional local overhead but avoids communication
as much as possible.
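A rough sketch of this remedy, using a hypothetical SendBuffers struct
rather than the PstreamBuffers API. The size header written in front of
each payload is an assumption, added so that an empty record still
occupies a place in the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for per-processor send buffers
struct SendBuffers
{
    std::vector<std::string> slots;  // one byte buffer per destination
    std::vector<bool> nonEmpty;      // 0: only empty data, 1: real data

    explicit SendBuffers(std::size_t nProcs)
    :
        slots(nProcs),
        nonEmpty(nProcs, false)
    {}

    // Always stream the record (even if empty) so that several patches
    // going to the same processor stay in step on the receive side
    void send(std::size_t proci, const std::string& payload)
    {
        assert(proci < slots.size());
        slots[proci] += std::to_string(payload.size()) + ':' + payload;
        if (!payload.empty())
        {
            nonEmpty[proci] = true;
        }
    }

    // Reset the slots that were only sent empty data: nothing to transmit
    void finalise()
    {
        for (std::size_t proci = 0; proci < slots.size(); ++proci)
        {
            if (!nonEmpty[proci])
            {
                slots[proci].clear();
            }
        }
    }
};
```

With patch0to1_a empty and patch0to1_b non-empty, the slot for
processor 1 keeps both records, whereas a slot that only saw empty
records is cleared and costs no communication.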
- bundles frequently used 'gather/scatter' patterns more consistently.
- combineAllGather -> combineGather + broadcast
- listCombineAllGather -> listCombineGather + broadcast
- mapCombineAllGather -> mapCombineGather + broadcast
- allGatherList -> gatherList + scatterList
- reduce -> gather + broadcast (ie, allreduce)
- The allGatherList currently wraps gatherList/scatterList, but may be
replaced with a different algorithm in the future.
STYLE: PstreamCombineReduceOps.H is mostly unneeded now
- PstreamBuffers nProcs() and allProcs() methods to recover the rank
information consistent with the communicator used for construction
- allowClearRecv() methods for more control over buffer reuse
For example,
    pBufs.allowClearRecv(false);
    forAll(particles, particlei)
    {
        pBufs.clear();
        fill...
        read via IPstream(..., pBufs);
    }
This preserves the receive buffers memory allocation between calls.
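The allocation-retention idea on its own can be sketched as follows.
RecvBuffer is a hypothetical class, and how its clear() reacts to the
flag is an assumption for illustration; it is not the PstreamBuffers
implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical receive buffer with a switch for releasing its storage
class RecvBuffer
{
    std::vector<char> data_;
    bool allowClear_ = true;

public:

    void allowClearRecv(bool on) { allowClear_ = on; }

    // Stand-in for receiving n bytes
    void fill(std::size_t n) { data_.assign(n, 'x'); }

    std::size_t size() const { return data_.size(); }
    std::size_t capacity() const { return data_.capacity(); }

    void clear()
    {
        if (allowClear_)
        {
            std::vector<char>().swap(data_);  // release the allocation
        }
        else
        {
            data_.clear();  // drop the contents, keep the capacity
        }
    }
};
```

Keeping the capacity avoids a fresh allocation on every pass of a loop
that repeatedly fills and clears the same buffers.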
- finishedNeighbourSends() method as compact wrapper for
finishedSends() when send/recv ranks are identical
(eg, neighbours)
- hasSendData()/hasRecvData() methods for PstreamBuffers.
Can be useful for some situations to skip reading entirely.
For example,
    pBufs.finishedNeighbourSends(neighProcs);
    if (!returnReduce(pBufs.hasRecvData(), orOp<bool>()))
    {
        // Nothing to do
        continue;
    }
    ...
On an individual basis:
    for (const int proci : pBufs.allProcs())
    {
        if (pBufs.hasRecvData(proci))
        {
            ...
        }
    }
Also conceivable to do the following instead (nonBlocking only):
    if (!returnReduce(pBufs.hasSendData(), orOp<bool>()))
    {
        // Nothing to do
        pBufs.clear();
        continue;
    }
    pBufs.finishedNeighbourSends(neighProcs);
    ...
- the data front for isoAdvection can be particularly sparse and at
higher processor counts there is an advantage to avoiding all-to-all
communication for the PstreamBuffers exchange
Based on code changes from T.Aoyagi(RIST), A.Azami(RIST)
- simply adds in the reinterpret_cast, which simplifies coding for
binary data movement.
Name complements the size_bytes() method for contiguous data
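A minimal sketch of how such a byte accessor pairs with size_bytes().
SimpleList and copyBytes() are hypothetical names, and the accessor is
assumed to be called data_bytes(), as used elsewhere in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical minimal container (not the OpenFOAM List)
template<class T>
class SimpleList
{
    static_assert
    (
        std::is_trivially_copyable_v<T>,
        "raw byte access needs trivially copyable data"
    );

    std::vector<T> v_;

public:

    explicit SimpleList(std::size_t n) : v_(n) {}

    T& operator[](std::size_t i) { return v_[i]; }
    const T& operator[](std::size_t i) const { return v_[i]; }

    // The reinterpret_cast lives here, once, instead of at each call site
    char* data_bytes()
    {
        return reinterpret_cast<char*>(v_.data());
    }
    const char* data_bytes() const
    {
        return reinterpret_cast<const char*>(v_.data());
    }

    // Number of bytes of contiguous content
    std::size_t size_bytes() const { return v_.size()*sizeof(T); }
};

// Binary data movement without casts in the caller
template<class T>
void copyBytes(const SimpleList<T>& src, SimpleList<T>& dst)
{
    assert(src.size_bytes() == dst.size_bytes());
    std::memcpy(dst.data_bytes(), src.data_bytes(), src.size_bytes());
}
```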
STYLE: move container IO.C files into main headers for better visibility
STYLE: include CompactListList.H in polyTopoChange
- avoids future mismatches if the CompactListList template signature
changes
GIT: relocate CompactListList into CompactLists/ directory
- for use when the is_contiguous check has already been done outside
the loop. Naming as per std::span.
STYLE: use data/cdata instead of begin
ENH: replace random_shuffle with shuffle, fix OSX int64 ambiguity
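For reference, std::random_shuffle was deprecated in C++14 and removed
in C++17; std::shuffle takes an explicit random generator instead. A
small sketch, where shuffledIdentity() and the choice of std::mt19937
are illustrative assumptions:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Return 0..n-1 in a random order that is reproducible for a given seed
std::vector<int> shuffledIdentity(std::size_t n, unsigned seed)
{
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);

    std::mt19937 gen(seed);  // explicit generator, no hidden global state
    std::shuffle(order.begin(), order.end(), gen);

    return order;
}
```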
- returns a range of `int` values that can be iterated across.
For example,
    for (const int proci : Pstream::subProcs()) { ... }
instead of
    for
    (
        int proci = Pstream::firstSlave();
        proci <= Pstream::lastSlave();
        ++proci
    )
    {
        ...
    }
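What makes the range-for form possible is a small start/size range with
an integer iterator. The sketch below uses a hypothetical IntRange
class and a free function subProcs(nProcs), and assumes that the master
is rank 0.

```cpp
#include <cassert>

// Hypothetical minimal integer range
class IntRange
{
    int start_;
    int size_;

public:

    // Iterator that simply counts upwards
    class const_iterator
    {
        int val_;

    public:

        explicit const_iterator(int val) : val_(val) {}

        int operator*() const { return val_; }
        const_iterator& operator++() { ++val_; return *this; }
        bool operator!=(const const_iterator& rhs) const
        {
            return val_ != rhs.val_;
        }
    };

    IntRange(int start, int size)
    :
        start_(start),
        size_(size < 0 ? 0 : size)  // negative sizes become empty ranges
    {}

    const_iterator begin() const { return const_iterator(start_); }
    const_iterator end() const { return const_iterator(start_ + size_); }
};

// Ranks 1..nProcs-1, ie all ranks except the master
inline IntRange subProcs(int nProcs)
{
    assert(nProcs >= 1);
    return IntRange(1, nProcs - 1);
}
```

A serial run (nProcs == 1) gives an empty range, so the loop body is
skipped without a separate check.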
- make read construct from Istream explicit
BUG: sph(const SymmTensor2D<Cmpt>&)
- had incorrect constant, but the 2D routines still need more attention
(#1575)
- nBoundaryFaces() is often used and is identical to
(nFaces() - nInternalFaces()).
- forward the mesh nInternalFaces() and nBoundaryFaces() to
polyBoundaryMesh as start() and nFaces() respectively,
for use when operating on a polyBoundaryMesh.
STYLE:
- use identity() function with starting offset when creating boundary maps.
    labelList map
    (
        identity(mesh.nBoundaryFaces(), mesh.nInternalFaces())
    );
vs.
    labelList map(mesh.nBoundaryFaces());
    forAll(map, i)
    {
        map[i] = mesh.nInternalFaces() + i;
    }
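In standard C++ terms, an identity map with a starting offset is a
std::iota fill. A stand-in sketch, where this identity() only mirrors
the (length, start) argument order used above:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// The values start, start+1, ..., start+len-1
std::vector<int> identity(std::size_t len, int start = 0)
{
    std::vector<int> map(len);
    std::iota(map.begin(), map.end(), start);
    return map;
}
```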
- allows for simpler unpacking of a full list, or list range into any
sufficiently large integral type.
For example,
    processorPolyPatch pp = ...;
    UOPstream toNbr(pp.neighbProcNo(), pBufs);
    toNbr << faceValues.unpack<char>(pp.range());
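A simplified sketch of unpacking a sub-range of bits into an integral
type. It uses a hypothetical free function on std::vector<bool> rather
than the member function shown above; clamping the range to the
addressable size is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <type_traits>
#include <vector>

// Expand the bits in [start, start+size) to one integer value each
template<class IntType>
std::vector<IntType> unpack
(
    const std::vector<bool>& bits,
    std::size_t start,
    std::size_t size
)
{
    static_assert(std::is_integral_v<IntType>, "Integral type required");

    std::vector<IntType> out;
    if (start >= bits.size())
    {
        return out;
    }

    // Clamp the range to what is addressable
    const std::size_t end = std::min(start + size, bits.size());
    out.reserve(end - start);

    for (std::size_t i = start; i < end; ++i)
    {
        out.push_back(bits[i] ? IntType(1) : IntType(0));
    }
    return out;
}
```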
- The bitSet class replaces the old PackedBoolList class.
The redesign provides better block-wise access and reduced method
calls. This helps both in cases where the bitSet may be relatively
sparse, and in cases where advantage can be taken of contiguous
operations. This makes it easier to work with a bitSet as a top-level object.
In addition to the previously available count() method to determine
if a bitSet is being used, now have simpler queries:
- all() - true if all bits in the addressable range are set
- any() - true if any bits are set at all.
- none() - true if no bits are set.
These are faster than count() and allow early termination.
The new test() method tests the value of a single bit position and
returns a bool without any ambiguity caused by the return type
(like the get() method), nor the const/non-const access (like
operator[] has). The name corresponds to what std::bitset uses.
The new find_first(), find_last(), find_next() methods provide a faster
means of searching for bits that are set.
This can be especially useful when using a bitSet to control a
conditional:
OLD (with macro):
    forAll(selected, celli)
    {
        if (selected[celli])
        {
            sumVol += mesh_.cellVolumes()[celli];
        }
    }
NEW (with const_iterator):
    for (const label celli : selected)
    {
        sumVol += mesh_.cellVolumes()[celli];
    }
or manually
    for
    (
        label celli = selected.find_first();
        celli != -1;
        celli = selected.find_next(celli)
    )
    {
        sumVol += mesh_.cellVolumes()[celli];
    }
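The manual search loop can be sketched with a hypothetical SimpleBitSet
built on std::vector<bool>. This version scans bit by bit, so it only
illustrates the interface (with -1 as the 'not found' value); a
block-wise implementation could skip whole empty blocks.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a bit set with search methods
struct SimpleBitSet
{
    std::vector<bool> bits;

    // Position of the first set bit at or after 'pos', or -1 if none
    long find_from(std::size_t pos) const
    {
        for (std::size_t i = pos; i < bits.size(); ++i)
        {
            if (bits[i])
            {
                return static_cast<long>(i);
            }
        }
        return -1;
    }

    long find_first() const { return find_from(0); }

    // The next set bit strictly after 'pos', or -1 if none
    long find_next(long pos) const
    {
        return (pos < 0) ? -1 : find_from(static_cast<std::size_t>(pos) + 1);
    }
};
```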
- When marking up contiguous parts of a bitset, an interval can be
represented more efficiently as a labelRange of start/size.
For example,
OLD:
    if (isA<processorPolyPatch>(pp))
    {
        forAll(pp, i)
        {
            ignoreFaces.set(pp.start() + i);
        }
    }
NEW:
    if (isA<processorPolyPatch>(pp))
    {
        ignoreFaces.set(pp.range());
    }
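Setting an interval in one call can be sketched as below. Range and
setRange() are hypothetical names in the spirit of a start/size
labelRange, and growing the set to fit the range is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical start/size interval
struct Range
{
    std::size_t start;
    std::size_t size;
};

// Mark the whole interval at once instead of bit by bit
void setRange(std::vector<bool>& bits, const Range& range)
{
    if (range.size == 0)
    {
        return;
    }

    const std::size_t end = range.start + range.size;
    if (bits.size() < end)
    {
        bits.resize(end, false);  // grow to fit (assumption)
    }

    std::fill
    (
        bits.begin() + static_cast<std::ptrdiff_t>(range.start),
        bits.begin() + static_cast<std::ptrdiff_t>(end),
        true
    );
}
```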
- eliminate iterators from PackedList since they were unused, had
lower performance than direct access and added unneeded complexity.
- eliminate auto-vivify for the PackedList '[]' operator.
The set() method provides any required auto-vivification and
removing this ability from the '[]' operator allows for a lower
overhead when accessing the values. Replaced the previous cascade of
iterators with a simpler reference class.
PackedBoolList:
- (temporarily) eliminate logic and addition operators since
these contained partially unclear semantics.
- the new test() method tests the value of a single bit position and
returns a bool without any ambiguity caused by the return type
(like the get() method), nor the const/non-const access (like
operator[] has). The name corresponds to what std::bitset uses.
- more consistent use of PackedBoolList test(), set(), unset() methods
for fewer operations and clearer code. Eg,
    if (list.test(index)) ...   |  if (list[index]) ...
    if (!list.test(index)) ...  |  if (list[index] == 0u) ...
    list.set(index);            |  list[index] = 1u;
    list.unset(index);          |  list[index] = 0u;
- deleted the operator=(const labelUList&) and replaced with a setMany()
method for more clarity about the intended operation and to avoid any
potential inadvertent behaviour.