- directory discovery was originally designed for a sub-dir location
(eg, etc/openfoam) but failed if called from within the sub-dir
itself.
Now simply assume it is located in the project directory or the etc/
sub-dir, so that it can also be relocated into the project directory
in the future (pending changes to RPM and Debian packaging)
- for querying all outstanding requests:
if (UPstream::finishedRequests(startRequest)) ...
if (UPstream::finishedRequests(startRequest, -1)) ...
- for querying slice of outstanding requests:
if (UPstream::finishedRequests(startRequest, 10)) ...
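For context, a minimal usage sketch (hedged: the non-blocking
transfers themselves are elided, and UPstream::nRequests() is assumed
as the usual way of recording the start of the request range):

    const label startRequest = UPstream::nRequests();

    // ... issue non-blocking sends/receives here ...

    // Overlap local work with the communication
    while (!UPstream::finishedRequests(startRequest))
    {
        // ... do other local work ...
    }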
- simplifies communication structuring with intra-host communication.
Can be used for IO only, or for specialised communication.
Demand-driven construction. Gathers the SHA1 of host names when
determining the connectivity. Internally uses an MPI_Gather of the
digests and a MPI_Bcast of the unique host indices.
NOTE:
does not use MPI_Comm_split or MPI_Comm_split_type since these
return MPI_COMM_NULL on non-participating processes, which does not
easily fit into the OpenFOAM framework.
Additionally, if using the caching version of
UPstream::commInterHost() and UPstream::commIntraHost()
the topology is determined simultaneously
(ie, equivalent or potentially lower communication).
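A rough sketch of the gather/broadcast idea in plain MPI (not the
actual UPstream implementation; rank/nProcs come from MPI_Comm_rank/
MPI_Comm_size, digest is assumed to hold the 20-byte SHA1 of the host
name, and <mpi.h>, <map>, <string>, <vector> are assumed):

    std::vector<char> allDigests;           // master only
    if (rank == 0) allDigests.resize(20*nProcs);

    // Collect all host-name digests on the master rank
    MPI_Gather
    (
        digest, 20, MPI_CHAR,
        allDigests.data(), 20, MPI_CHAR,
        0, MPI_COMM_WORLD
    );

    std::vector<int> hostIndex(nProcs, 0);
    if (rank == 0)
    {
        // Assign consecutive indices to previously unseen digests
        std::map<std::string, int> seen;
        for (int proci = 0; proci < nProcs; ++proci)
        {
            std::string key(&allDigests[20*proci], 20);
            auto iter = seen.emplace(key, int(seen.size())).first;
            hostIndex[proci] = iter->second;
        }
    }

    // Everyone learns the unique host index of every rank
    MPI_Bcast(hostIndex.data(), nProcs, MPI_INT, 0, MPI_COMM_WORLD);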
- make sizing of commsStruct List demand-driven as well
for more robustness, fewer unneeded allocations.
- fix potential latent bug with allBelow/allNotBelow for proc 0
(linear communication).
ENH: remove unused/unusable UPstream::communicator optional parameter
- had a constructor option to avoid constructing the MPI backend,
but this is not useful and is inconsistent with what the reset or
destructor expect.
STYLE: local use of UPstream::communicator
- automatically frees communicator when it leaves scope
- these are primarily of use when encountering sparse (eg, inter-host)
communicators. Additional UPstream convenience methods:
is_rank(comm)
=> True if the process corresponds to a rank in the communicator.
Can be a master rank or a sub-rank.
is_parallel(comm)
=> True if parallel algorithm or exchange is used on the process.
same as
(parRun() && (nProcs(comm) > 1) && is_rank(comm))
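For illustration, a hedged sketch of how these guards might be used
with a sparse communicator (interComm and value are illustrative
names only):

    if (UPstream::is_rank(interComm))
    {
        // This process holds a rank in the sparse communicator
        if (UPstream::is_parallel(interComm))
        {
            // More than one rank involved: a real exchange is needed
            Pstream::broadcast(value, interComm);
        }
        // else: single-rank communicator, nothing to exchange
    }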
- for robustness with small edges (which can occur with snappy meshes),
the Le() and magLe() values are limited to SMALL (commit a0f1e98d24).
Now use a factor of sqrt(1/3) for each component, so that the fallback
vector retains a magnitude of 1 (all three components equal to
sqrt(1/3) gives a unit vector).
ENH: add fvMesh::unitSf() and faMesh::unitLe() methods
- simple wrappers around Sf()/magSf() and Le()/magLe() but with
the potential for additional/alternative corrections.
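As a usage sketch (assuming the wrappers return the normalised field
in the same form as the Sf()/magSf() expression they replace):

    // Previously: explicit normalisation of the face area vectors
    surfaceVectorField unitNormals(mesh.Sf()/mesh.magSf());

    // Now: wrapper, with room for additional/alternative corrections
    surfaceVectorField unitNormals(mesh.unitSf());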
STYLE: thisDb() in faMesh code to simplify future changes in storage
ENH: do not register finite-area geometric fields
- consistent with finite-volume treatment
- replace the "one-size-fits-all" approach of tensor field inv()
with individual 'failsafe' inverts.
The inv() field function historically just checked the first entry
to detect 2D cases and adjusted/readjusted *all* tensors accordingly
(to avoid singular tensors and/or noisy inversions).
This seems to have worked reasonably well with 3D volume meshes, but
breaks down for 2D area meshes, which can be axis-aligned
differently on different sections of the mesh.
- with (nPollProcInterfaces < 0) it does the following:
- loop, waiting for some requests to finish
- for each out-of-date interface, check if its associated
requests have now finished (ie, the ready() check).
- if ready() -> call updateInterfaceMatrix()
In contrast to (nPollProcInterfaces > 0), which loops a specified
number of times with several calls to MPI_Test each time, the
(nPollProcInterfaces < 0) variant relies on internal MPI looping
within MPI_Waitsome to progress communication.
The actual dispatch still remains non-deterministic (ie, waiting for
some requests to finish does not mean that any particular interface
is eligible for update, or in any particular order). However, using
Waitsome places the tight looping into the MPI layer, which results
in fewer calls and eliminates behaviour dependent on the value of
nPollProcInterfaces. A sketch of the dispatch loop is given below.
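A conceptual sketch of that dispatch loop (illustrative only, not the
actual lduMatrix code; 'interfaces', 'outstanding' and the elided
updateInterfaceMatrix() arguments are placeholders):

    DynamicList<int> indices;   // scratch space, avoids reallocation

    // Block in MPI until some requests finish, then dispatch any
    // interfaces whose requests are now complete
    while (UPstream::waitSomeRequests(startRequest, &indices))
    {
        forAll(interfaces, inti)
        {
            if (outstanding[inti] && interfaces[inti].ready())
            {
                interfaces[inti].updateInterfaceMatrix(/*...*/);
                outstanding[inti] = false;
            }
        }
    }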
TUT: add polling to windAroundBuildings case (for testing purposes)
- fewer calls, potentially more consistent
ENH: update sendRequest state after recvRequest wait
- previously had this type of code:
// Treat send as finished when recv is done
UPstream::waitRequest(recvRequest_);
recvRequest_ = -1;
sendRequest_ = -1;
Now refined as follows:
// Require receive data. Update the send request state.
UPstream::waitRequest(recvRequest_);
recvRequest_ = -1;
if (UPstream::finishedRequest(sendRequest_)) sendRequest_ = -1;
Could potentially investigate requiring both,
but this may be over-constrained.
Example,
// Require receive data, but also wait for sends too
UPstream::waitRequestPair(recvRequest_, sendRequest_);
- checks requests for completion, returning true when some requests
have completed and false when there are no active requests.
This allows it to be used in a polling loop to progress MPI
and then respond as requests become satisfied.
When used as part of a dispatch loop, waitSomeRequests() is
probably more efficient than calling waitAnyRequest() and can help
avoid biasing which client requests are serviced.
Takes an optional return parameter, to retrieve the indices,
but more importantly to avoid inner-loop reallocations.
Example,
DynamicList<int> indices;
while (UPstream::waitSomeRequests(startRequest, &indices))
{
// Dispatch something ....
}
// Reset list of outstanding requests with 'Waitall' for safety
UPstream::waitRequests(startRequest);
---
If only dealing with single items and an index is required for
dispatching, it can be better to use a list of UPstream::Request
instead.
Example,
List<UPstream::Request> requests = ...;
label index = -1;
while ((index = UPstream::waitAnyRequest(requests)) >= 0)
{
// Do something at index
}
ENH: pair-wise wrappers for MPI_Test or MPI_Wait
- for send/recv pairs of requests, can bundle both together and use a
single MPI_Testsome and MPI_Waitall instead of two individual
calls.
- previously had an additional stack, freedRequests_, which was
used to 'remember' locations into the list of outstandingRequests_
that were handled by 'waitRequest()'.
This was principally done for sanity checks on shutdown,
but we now just test for any outstanding requests that
are *not* MPI_REQUEST_NULL instead (much simpler).
The framework with freedRequests_ also had a provision to 'recycle'
them by popping from that stack, but this is rather fragile since it
would only be triggered by some collectives
(MPI_Iallreduce, MPI_Ialltoall, MPI_Igather, MPI_Iscatter)
with no guarantee that these will all be properly removed again.
There was also no pruning of extraneous indices.
ENH: consolidate internal reset/push of requests
- replace duplicate code with inline functions
reset_request(), push_request()
ENH: null out trailing requests
- extra safety (paranoia) for the UPstream::Request versions
of finishedRequests(), waitAnyRequest()
CONFIG: document nPollProcInterfaces in etc/controlDict
- still experimental, but at least make the keyword known
- mechanism has been unused for at least a decade or more
(or was never used). Message tags are assigned on an ad hoc basis
locally when collision avoidance is necessary.
- not currently used, but it is possible that communicator allocation
modifies the list of sub-ranks. Ensure that the correct size is used
when (re)initialising the linear/tree structures.
STYLE: adjust MPI test applications
- remove some clutter and unneeded grouping.
Some ideas for host-only communicators
- allow reporting even when profiling is suspended
- consolidate reporting into profilingPstream itself
(avoids code scatter).
Example of possible advanced use for timing only one section of
code:
====
// Profile local operations
profilingPstream::enable();
... do something
// Don't profile elsewhere
profilingPstream::suspend();
====
- separate broadcast times from reduce/gather/scatter time
- separate wait times from all-to-all time
- support invocation counts, split off requests time/count
from others to avoid flooding the counts
- support 'detail' switch to increase the output information.
Format may change in the future
- attributes such as assignable(), coupled() etc
- common patchField types: calculatedType(), zeroGradientType() etc.
This simplifies reference to these types without actually needing a
typed patchField version.
ENH: add some basic patchField types to fieldTypes namespace
- allows more general use of the names
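A speculative usage sketch (hedged: the names calculatedType() and
zeroGradientType() are from the description above, but the exact
class or namespace providing them is assumed here, as is the
patch field reference 'pf'):

    // Compare against a common type name without needing the
    // templated patchField class
    if (pf.type() == fvPatchFieldBase::zeroGradientType())
    {
        // treat as zero-gradient
    }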
ENH: set extrapolated/calculated from patchInternalField directly
- avoids intermediate tmp
- with the current handling of small edges (finite-area), the LSQ
vectors can result in singular/2D tensors. However, the regular
2D handling in field inv() only detects 2D cases based on the first
element.
Provide a 'failsafe' inv() method for symmTensor and tensor that
follows a similar logic for avoiding zero determinants, but applied
on a per-element basis instead of deciding based on the first field
element (a sketch of the idea follows below).
The symmTensor::inv(bool) and tensor::inv(bool) methods have a
fairly modest additional overhead.
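A rough sketch of the per-element idea (illustrative only; the
function name and the fixed z-direction compensation are assumptions
for brevity, the real logic detects the empty direction per tensor):

    symmTensor failsafeInv(const symmTensor& st)
    {
        if (mag(det(st)) > SMALL)
        {
            return inv(st);     // Regular 3D inversion
        }

        // Effectively 2D content: compensate along the (assumed)
        // empty direction, invert, then remove the compensation
        const symmTensor pad(0, 0, 0, 0, 0, 1);   // sqr(vector(0,0,1))

        return inv(st + pad) - pad;
    }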
- unroll the field inv() function to avoid creating an intermediate
field. Reduce the number of operations when adjusting/re-adjusting
the diagonal.
- for cases where a 3D tensor is being used to represent 2D content,
the determinant is zero. Can use inv2D(excludeDirection) to compensate
and invert as if it were only 2D.
ENH: consistent definitions for magSqr of symmTensors, diagSqr() norm
COMP: return scalar not component type for magSqr
- had inconsistent definitions with SymmTensor returning the component
type and Tensor returning scalar. Only evident with complex.
- when only a partial stacktrace is desirable.
ENH: add stack trace decorators
- the 0-th frame is always printStack(), so skip that and emit
some headers/footers instead. Eg,
[stack trace]
=============
#1 Foam::SymmTensor<double> Foam::inv<double>(...)
#2 Foam::inv(Foam::UList<Foam::SymmTensor<double>> const&) ...
...
=============
- data_bytes(), size_bytes() methods to support broadcasting or
gather/scatter content. Additional construct from raw bytes
to support transmitting content.
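For illustration, a hedged sketch of the intended use ('obj' stands
for whichever class gained these methods, and the character-level
UPstream::broadcast overload is assumed):

    // Broadcast the raw bytes of the object from the master rank
    UPstream::broadcast
    (
        obj.data_bytes(),
        obj.size_bytes(),
        UPstream::worldComm
    );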
- missed consistency in a few places.
- return nullptr (with automatic conversion to tmp) on failures
instead of tmp<....>(nullptr), for cleaner coding.
INT: add support for an 'immovable' tmp pointer
- this idea is from openfoam.org, to allow creation of a tmp that is
protected from having its memory reclaimed in field operations
ENH: tmp NewImmovable factory method, forwards as immovable/movable
- no-op implementations, but makes the call to
GeometricBoundaryField::evaluate() less dependent on PatchField type
- add updated()/manipulatedMatrix() methods to faePatchField,
fvsPatchField etc. These are mostly no-ops, but provide name
compatible with fvPatchField etc.
- similar to UPstream::parRun(), the setter returns the previous value.
The accessors are prefixed with 'comm':
Eg, commGlobal(), commWarn(), commWorld(), commSelf().
This distinguishes them from any existing variables (eg, worldComm)
and is arguably more similar to MPI_COMM_WORLD etc.
If demand-driven communicators are added in the future, the function
call syntax can help encapsulate that.
Previously:
const label oldWarnComm = UPstream::warnComm;
const label oldWorldComm = UPstream::worldComm;
UPstream::warnComm = myComm;
UPstream::worldComm = myComm;
...
UPstream::warnComm = oldWarnComm;
UPstream::worldComm = oldWorldComm;
Now:
const label oldWarnComm = UPstream::commWarn(myComm);
const label oldWorldComm = UPstream::commWorld(myComm);
...
UPstream::commWarn(oldWarnComm);
UPstream::commWorld(oldWorldComm);
STYLE: check (warnComm >= 0) instead of (warnComm != -1)
- constructing with valueRequired as a bool is still supported,
but now also support more refined requirements
(eg, NO_READ, MUST_READ, LAZY_READ)
- continue with LAZY_READ for finite-area fields