Previously we added heal trigger when bit-rot checks
failed, now extend that to support heal when parts
are not found either. This healing gets only triggered
if we can successfully decode the object i.e read
quorum is still satisfied for the object.
This PR refactors the way we use buffers for O_DIRECT and
to re-use those buffers for messagepack reader writer.
After some extensive benchmarking found that not all objects
have this benefit, and only objects smaller than 64KiB see
this benefit overall.
Benefits are seen from almost all objects from
1KiB - 32KiB
Beyond this no objects see benefit with bulk call approach
as the latency of bytes sent over the wire v/s streaming
content directly from disk negate each other with no
remarkable benefits.
All other optimizations include reuse of msgp.Reader,
msgp.Writer using sync.Pool's for all internode calls.
Add context to all (non-trivial) calls to the storage layer.
Contexts are propagated through the REST client.
- `context.TODO()` is left in place for the places where it needs to be added to the caller.
- `endWalkCh` could probably be removed from the walkers, but no changes so far.
The "dangerous" part is that now a caller disconnecting *will* propagate down, so a
"delete" operation will now be interrupted. In some cases we might want to disconnect
this functionality so the operation completes if it has started, leaving the system in a cleaner state.
- Heal if the part.1 is truncated from its original size
- Heal if the part.1 fails while being verified in between
- Heal if the part.1 fails while being at a certain offset
Other cleanups include make sure to flush the HTTP responses
properly from storage-rest-server, avoid using 'defer' to
improve call latency. 'defer' incurs latency avoid them
in our hot-paths such as storage-rest handlers.
Fixes#8319