This patch uses a technique where in a retryable storage
before object layer initialization has a higher delay
and waits for longer period upto 4 times with time unit
of seconds.
And uses another set of configuration after the disks
have been formatted, i.e use a lower retry backoff rate
and retrying only once per 5 millisecond.
Network IO error count is reduced to a lower value i.e 256
before we reject the disk completely. This is done so that
combination of retry logic and total error count roughly
come to around 2.5secs which is when we basically take the
disk offline completely.
NOTE: This patch doesn't fix the issue of what if the disk
is completely dead and comes back again after the initialization.
Such a mutating state requires a change in our startup sequence
which will be done subsequently. This is an interim fix to alleviate
users from these issues.
Implement a storage rpc specific rpc client,
which does not reconnect unnecessarily.
Instead reconnect is handled at a different
layer for storage alone.
Rest of the calls using AuthRPC automatically
reconnect, i.e upon an error equal to `rpc.ErrShutdown`
they dial again and call the requested method again.
Attempt a reconnect also if disk not found.
This is needed since any network operation error
is converted to disk not found but we also need
to make sure if disk is really not available.
Additionally we also need to retry more than
once because the server might be in startup
sequence which would render other servers to
wrongly think that the server is offline.
This change brings in changes at multiple places
- Reuse buffers at almost all locations ranging
from rpc, fs, xl, checksum etc.
- Change caching behavior to disable itself
under low memory conditions i.e < 8GB of RAM.
- Only objects cached are of size 1/10th the size
of the cache for example if 4GB is the cache size
the maximum object size which will be cached
is going to be 400MB. This change is an
optimization to cache more objects rather
than few larger objects.
- If object cache is enabled default GC
percent has been reduced to 20% in lieu
with newly found behavior of GC. If the cache
utilization reaches 75% of the maximum value
GC percent is reduced to 10% to make GC
more aggressive.
- Do not use *bytes.Buffer* due to its growth
requirements. For every allocation *bytes.Buffer*
allocates an additional buffer for its internal
purposes. This is undesirable for us, so
implemented a new cappedWriter which is capped to a
desired size, beyond this all writes rejected.
Possible fix for #3403.
setGlobalsFromContext() is added to set global variables after parsing
command line arguments. Thus, global flags will be honored wherever
they are placed in minio command.
Ref #3229
After review with @abperiasamy we decided to remove all the unnecessary options
- MINIO_BROWSER (Implemented as a security feature but now deemed obsolete
since even if blocking access to MINIO_BROWSER, s3 API port is open)
- MINIO_CACHE_EXPIRY (Defaults to 72h)
- MINIO_MAXCONN (No one used this option and we don't test this)
- MINIO_ENABLE_FSMETA (Enable FSMETA all the time)
Remove --ignore-disks option - this option was implemented when XL layer
would initialize the backend disks and heal them automatically to disallow
XL accidentally using the root partition itself this option was introduced.
This behavior has been changed XL no longer automatically initializes
`format.json` a HEAL is controlled activity, so ignore-disks is not
useful anymore. This change also addresses the problems of our documentation
going forward and keeps things simple. This patch brings in reduction of
options and defaulting them to a valid known inputs. This patch also
serves as a guideline of limiting many ways to do the same thing.
- Adds an interface to update in-memory bucket metadata state called
BucketMetaState - this interface has functions to:
- update bucket notification configuration,
- bucket listener configuration,
- bucket policy configuration, and
- send bucket event
- This interface is implemented by `localBMS` a type for manipulating
local node in-memory bucket metadata, and by `remoteBMS` a type for
manipulating remote node in-memory bucket metadata.
- The remote node interface, makes an RPC call, but the local node
interface does not - it updates in-memory bucket state directly.
- Rename mkPeersFromEndpoints to makeS3Peers and refactored it.
- Use arrayslice instead of map in s3Peers struct
- `s3Peers.SendUpdate` now receives an arrayslice of peer indexes to
send the request to, with a special nil value slice indicating that
all peers should be sent the update.
- `s3Peers.SendUpdate` now returns an arrayslice of errors, representing
errors from peers when sending an update. The array positions
correspond to peer array s3Peers.peers
Improve globalS3Peers:
- Make isDistXL a global `globalIsDistXL` and remove from s3Peers
- Make globalS3Peers an array of (address, bucket-meta-state) pairs.
- Fix code and tests.
For command line arguments we are currently following
- <node-1>:/path ... <node-n>:/path
This patch changes this to
- http://<node-1>/path ... http://<node-n>/path
* Implements a Peer RPC router that sends info to all Minio servers in the cluster.
* Bucket notifications are propagated to all nodes via this RPC router.
* Bucket listener configuration is persisted to separate object layer
file (`listener.json`) and peer RPCs are used to communicate changes
throughout the cluster.
* When events are generated, RPC calls to send them to other servers
where bucket listeners may be connected is implemented.
* Some bucket notification tests are now disabled as they cannot work in
the new design.
* Minor fix in `funcFromPC` to use `path.Join`
- Servers do not exit for invalid credentials instead they print and wait.
- Servers do not exit for version mismatch instead they print and wait.
- Servers do not exit for time differences between nodes they print and wait.
These messages based on our prep stage during XL
and prints more informative message regarding
drive information.
This change also does a much needed refactoring.
* The user is required to specify a table name and database connection
information in the configuration file.
* INSERTs and DELETEs are done via prepared statements for speed.
* Assumes a table structure, and requires PostgreSQL 9.5 or above due to
the use of UPSERT.
* Creates the table if it does not exist with the given table name using
a query like:
CREATE TABLE myminio (
key varchar PRIMARY KEY,
value JSONB
);
* Vendors some required libraries.
- Instrumentation for locks.
- Detailed test coverage.
- Adding RPC control handler to fetch lock instrumentation.
- RPC control handlers suite tests with a test RPC server.
This API is precursor before implementing `minio lambda` and `mc` continous replication.
This new api is an extention to BucketNofication APIs.
// Request
```
GET /bucket?notificationARN=arn:minio:lambda:us-east-1:10:minio HTTP/1.1
...
...
```
// Response
```
{"Records": ...}
...
...
...
{"Records": ...}
```
* Implement basic S3 notifications through queues
Supports multiple queues and three basic queue types:
1. NilQueue -- messages don't get sent anywhere
2. LogQueue -- messages get logged
3. AmqpQueue -- messages are sent to an AMQP queue
* api: Implement bucket notification.
Supports two different queue types
- AMQP
- ElasticSearch.
* Add support for redis
The object cache implementation is XL cache, which defaults
to 8GB worth of read cache. Currently GetObject() transparently
writes to this cache upon first client read and then subsequently
serves reads from the same cache.
Currently expiration is not implemented.
This patch brings in the removal of debug logging altogether, instead
we bring in the functionality of being able to trace the errors properly
pointing back to the origination of the problem.
To enable tracing you need to enable "MINIO_TRACE" set to "1" or "true"
environment variable which would print back traces whenever there is an
error which is unhandled or at the handler layer.
By default this tracing is turned off and only user level logging is
provided.
When server is run with multiple disks which uses xl interface where
order and count of disks are important, this patch saves such disks
configuration and compares in next run if there is a mismatch.
Fixes#1458