minio

Commit Graph

Author	SHA1	Message	Date
Anis Elleuch	e9ac7b0fb7	heal: Remove empty directories (#11354) Since the introduction of __XLDIR__, an empty directory no longer has any meaning in erasure mode. Healing now removes it wherever it finds one.	4 years ago
Harshavardhana	1debd722b5	rename last remaining Zone->Pool	4 years ago
Anis Elleuch	00cff1aac5	audit: per object send pool number, set number and servers per operation (#11233)	4 years ago
Harshavardhana	9cdd981ce7	fix: expire locks only on participating lockers (#11335) Additionally, add a new ForceUnlock API to allow forcibly unlocking locks where possible.	4 years ago
Anis Elleuch	bd8020aba8	heal: Decode object name in healing result (#11348) The user can see the __XLDIR__ prefix in `mc admin heal` output when the command heals an empty object with a trailing slash. This commit decodes the object name before sending it back to the upper level.	4 years ago
Harshavardhana	a6c146bd00	validate storage class across pools when setting config (#11320) `mc admin config set alias/ storage_class standard=EC:3` should succeed only if the parity ratio is valid for all server pools; otherwise it should fail proactively. This PR also brings in other changes needed to cater for variadic drive counts per pool. Bonus: also fixes various bugs reproduced with GetObjectWithPartNumber(), CopyObjectPartWithOffsets(), CopyObjectWithMetadata(), and PutObjectPart/PutObject with truncated streams.	4 years ago
Harshavardhana	1ad2b7b699	fix: add stricter validation for erasure server pools (#11299) During expansion we need to validate that a new deployment is expanded with newer constraints, that an existing deployment is expanded with older constraints, and that multiple server pools are rejected if they have different deploymentID and distribution algo.	4 years ago
Ritesh H Shukla	b4add82bb6	Updated Prometheus metrics (#11141) Add metrics for nodes online and offline; add cluster capacity metrics; introduce v2 metrics.	4 years ago
Harshavardhana	f903cae6ff	Support variable server pools (#11256) The current implementation requires server pools to have the same erasure stripe sizes, to facilitate the same SLA and expectations. This PR allows server pools to be variadic, i.e. they do not have to have the same erasure stripe sizes; instead they must meet the SLA for parity ratio. If the parity ratio cannot be guaranteed by the new server pool, the deployment is rejected, i.e. server pool expansion is not allowed.	4 years ago
Harshavardhana	44dff36ff7	listing with prefix prefixed with '/' should be ignored (#11268) fixes #11265	4 years ago
Harshavardhana	5c52d5ffc7	fix: treat errVolumeNotFound as EOF error in listPathRaw (#11238)	4 years ago
Harshavardhana	b5d291ea88	fix: rename remaining zone -> pool (#11231)	4 years ago
Harshavardhana	e7ae49f9c9	fix: calculate prometheus disks_offline/disks_total correctly (#11215) fixes #11196	4 years ago
Anis Elleuch	2ecaab55a6	admin: ServerInfo returns info without object layer initialized (#11142)	4 years ago
Harshavardhana	e5d378931d	fix: delimiter based listing was broken without marker (#11136) With nextMarker missing in delimiter based listing, top level prefixes beyond 4500 or the max-keys value wouldn't be sent back for the client to ask for the next batch. Reproduced at a customer deployment; create prefixes as shown below: `for year in $(seq 2017 2020); do for month in {01..12}; do for day in {01..31}; do mc -q cp file myminio/testbucket/dir/day_id=$year-$month-$day/; done; done; done` Then perform `aws s3api --profile minio --endpoint-url http://localhost:9000 list-objects --bucket testbucket --prefix dir/ --delimiter / --max-keys 1000` You will see a missing NextMarker; this prevents listing beyond the max-keys requested and also prevents more than 4500 (maxKeyObjectList) prefixes from being listed, because the client wouldn't know the NextMarker available. This PR addresses the situation by making the implementation more spec compatible, i.e. NextMarker can in fact be either an object or a prefix with delimiter, depending on the input operation. This issue was introduced after the list caching changes and has been present for a while.	4 years ago
Harshavardhana	c606c76323	fix: prioritized latest buckets for crawler to finish the scans faster (#11115) The crawler should call ListBuckets only once, not once per serverPool: buckets are the same across all pools and sets, and ListBuckets always returns a unified view. Once ListBuckets returns, sort the result by creation time to scan the latest buckets first, on the assumption that the latest buckets hold less content than older buckets, so they can be scanned faster and provide a view closer to the latest state.	4 years ago
Klaus Post	e7d3b49a20	metacache: Make very small requests transient (#11109)	4 years ago
Harshavardhana	8368ab76aa	fix: remove the requirement for healing buckets in ListBucketsHeal (#11098) With the new refactor of bucket healing, a bucket is healed automatically along with its metadata, so there is no need to redundantly heal buckets in ListBucketsHeal as well; remove it.	4 years ago
Harshavardhana	2eb52ca5f4	fix: heal bucket metadata right before healing bucket (#11097) Optimization mainly to avoid listing the entire `.minio.sys/buckets/.minio.sys` directory, which can get really huge and get in the way of startup routines; the contents of `.minio.sys/buckets/.minio.sys` are rather transient and do not need to be healed.	4 years ago
Harshavardhana	db7890660e	fix: a crash when disk is nil, safe access on erasureDisks (#11089) fixes #11088	4 years ago
Harshavardhana	9c53cc1b83	fix: heal multiple buckets in bulk (#11029) Makes server startup orders of magnitude faster with a large number of buckets.	4 years ago
Harshavardhana	4ec45753e6	rename server sets to server pools	4 years ago
Klaus Post	e6ea5c2703	crawler: Missing folder heal check per set (#10876)	4 years ago
Harshavardhana	790833f3b2	Revert "Support variable server sets (#10314)" This reverts commit `aabf053d2f`.	4 years ago
Harshavardhana	bdd094bc39	fix: avoid sending errors on missing objects on locked buckets (#10994) Make sure multi-object delete returns errors that are AWS S3 compatible.	4 years ago
Harshavardhana	aabf053d2f	Support variable server sets (#10314)	4 years ago
Poorna Krishnamoorthy	251c1ef6da	Add support for replication of object tags, retention metadata (#10880)	4 years ago
Klaus Post	b5a3d79bce	listobjectversions: Add shortcut for Veeam blocks (#10893) Add shortcut for `APN/1.0 Veeam/1.0 Backup/10.0`. It requests unique blocks with a specific prefix. We skip scanning the parent directory for more objects matching the prefix.	4 years ago
Harshavardhana	aa158228f9	fix: simplify healing metadata objects per set (#10867)	4 years ago
Klaus Post	8747834c69	DeletedObjects: Return objects on lock failure (#10874 ) Return objects when locking fails. <details> <summary>Panic</summary> ``` : 2020/11/10 04:15:55 http: panic serving 10.10.62.153:44858: runtime error: index out of range [0] with length 0 : goroutine 363537270 [running]: : net/http.(conn).serve.func1(0xc019232780) : net/http/server.go:1801 +0x147 : panic(0x1cadd60, 0xc001719260) : runtime/panic.go:975 +0x47a : github.com/minio/minio/cmd.criticalErrorHandler.ServeHTTP.func1(0xc0121d1200, 0x210cda0, 0xc0141940e0) : github.com/minio/minio/cmd/generic-handlers.go:781 +0x1a8 : panic(0x1cadd60, 0xc001719260) : runtime/panic.go:969 +0x1b9 : github.com/minio/minio/cmd.objectAPIHandlers.DeleteMultipleObjectsHandler(0x1e71ce8, 0x1e71cc8, 0x2108420, 0xc0192328c0, 0xc0121d1400) : github.com/minio/minio/cmd/bucket-handlers.go:465 +0x2490 : net/http.HandlerFunc.ServeHTTP(...) : net/http/server.go:2042 : github.com/minio/minio/cmd.httpTraceAll.func1(0x2108420, 0xc0192328c0, 0xc0121d1400) : github.com/minio/minio/cmd/handler-utils.go:353 +0x158 : net/http.HandlerFunc.ServeHTTP(...) : net/http/server.go:2042 : github.com/minio/minio/cmd.collectAPIStats.func1(0x2108420, 0xc019232820, 0xc0121d1400) : github.com/minio/minio/cmd/handler-utils.go:380 +0xed : net/http.HandlerFunc.ServeHTTP(...) 
: net/http/server.go:2042 : github.com/minio/minio/cmd.maxClients.func1(0x2108420, 0xc019232820, 0xc0121d1400) : github.com/minio/minio/cmd/handler-api.go:132 +0x33b : net/http.HandlerFunc.ServeHTTP(0xc00271d590, 0x2108420, 0xc019232820, 0xc0121d1400) : net/http/server.go:2042 +0x44 : github.com/minio/minio/cmd.redirectHandler.ServeHTTP(0x20e2180, 0xc00271d590, 0x2108420, 0xc019232820, 0xc0121d1400) : github.com/minio/minio/cmd/generic-handlers.go:192 +0x156 : github.com/minio/minio/cmd.customHeaderHandler.ServeHTTP(0x20e1060, 0xc0141a22b0, 0x21083e0, 0xc01814d2e0, 0xc0121d1400) : github.com/minio/minio/cmd/generic-handlers.go:751 +0x162 : github.com/minio/minio/cmd.securityHeaderHandler.ServeHTTP(0x20e0fc0, 0xc0141a22c0, 0x21083e0, 0xc01814d2e0, 0xc0121d1400) : github.com/minio/minio/cmd/generic-handlers.go:766 +0x1d6 : github.com/minio/minio/cmd.bucketForwardingHandler.ServeHTTP(0xc0121c7a40, 0x20e1120, 0xc0141a22d0, 0x21083e0, 0xc01814d2e0, 0xc0121d1400) : github.com/minio/minio/cmd/generic-handlers.go:624 +0xbf : github.com/minio/minio/cmd.requestValidityHandler.ServeHTTP(0x20e0f20, 0xc01814d280, 0x21083e0, 0xc01814d2e0, 0xc0121d1400) : github.com/minio/minio/cmd/generic-handlers.go:608 +0x42a : github.com/minio/minio/cmd.httpStatsHandler.ServeHTTP(0x20e10c0, 0xc0141a2300, 0x210cda0, 0xc0141940e0, 0xc0121d1400) : github.com/minio/minio/cmd/generic-handlers.go:536 +0xe4 : github.com/minio/minio/cmd.requestSizeLimitHandler.ServeHTTP(0x20e0fe0, 0xc0141a2310, 0x50004000000, 0x210cda0, 0xc0141940e0, 0xc0121d1400) : github.com/minio/minio/cmd/generic-handlers.go:68 +0xd4 : github.com/minio/minio/cmd.requestHeaderSizeLimitHandler.ServeHTTP(0x20e10a0, 0xc01814d2a0, 0x210cda0, 0xc0141940e0, 0xc0121d1400) : github.com/minio/minio/cmd/generic-handlers.go:93 +0x1b7 : github.com/minio/minio/cmd.crossDomainPolicy.ServeHTTP(0x20e1080, 0xc0141a2320, 0x210cda0, 0xc0141940e0, 0xc0121d1400) : github.com/minio/minio/cmd/crossdomain-xml-handler.go:51 +0x82 : 
github.com/minio/minio/cmd.browserRedirectHandler.ServeHTTP(0x20e0fa0, 0xc0141a2330, 0x210cda0, 0xc0141940e0, 0xc0121d1400) : github.com/minio/minio/cmd/generic-handlers.go:276 +0x68 : github.com/minio/minio/cmd.minioReservedBucketHandler.ServeHTTP(0x20e0f00, 0xc0141a2340, 0x210cda0, 0xc0141940e0, 0xc0121d1400) : github.com/minio/minio/cmd/generic-handlers.go:344 +0xb8 : github.com/minio/minio/cmd.cacheControlHandler.ServeHTTP(0x20e1020, 0xc0141a2350, 0x210cda0, 0xc0141940e0, 0xc0121d1400) : github.com/minio/minio/cmd/generic-handlers.go:303 +0x1ce : github.com/minio/minio/cmd.timeValidityHandler.ServeHTTP(0x20e0f40, 0xc0141a2360, 0x210cda0, 0xc0141940e0, 0xc0121d1400) : github.com/minio/minio/cmd/generic-handlers.go:414 +0x3ca : github.com/minio/minio/cmd.resourceHandler.ServeHTTP(0x20e1160, 0xc0141a2370, 0x210cda0, 0xc0141940e0, 0xc0121d1400) : github.com/minio/minio/cmd/generic-handlers.go:516 +0xab : github.com/minio/minio/cmd.authHandler.ServeHTTP(0x20e1100, 0xc0141a2380, 0x210cda0, 0xc0141940e0, 0xc0121d1400) : github.com/minio/minio/cmd/auth-handler.go:502 +0x2e7 : github.com/minio/minio/cmd.sseTLSHandler.ServeHTTP(0x20e0ee0, 0xc0141a2390, 0x210cda0, 0xc0141940e0, 0xc0121d1400) : github.com/minio/minio/cmd/generic-handlers.go:802 +0x79 : github.com/minio/minio/cmd.reservedMetadataHandler.ServeHTTP(0x20e1140, 0xc0141a23a0, 0x210cda0, 0xc0141940e0, 0xc0121d1400) : github.com/minio/minio/cmd/generic-handlers.go:139 +0x1b7 : github.com/gorilla/mux.(Router).ServeHTTP(0xc00073fb00, 0x210cda0, 0xc0141940e0, 0xc0121d1200) : github.com/gorilla/mux@v1.8.0/mux.go:210 +0xd3 : github.com/rs/cors.(Cors).Handler.func1(0x210cda0, 0xc0141940e0, 0xc0121d1200) : github.com/rs/cors@v1.7.0/cors.go:219 +0x1b9 : net/http.HandlerFunc.ServeHTTP(0xc0009aece0, 0x210cda0, 0xc0141940e0, 0xc0121d1200) : net/http/server.go:2042 +0x44 : github.com/minio/minio/cmd.criticalErrorHandler.ServeHTTP(0x20e2180, 0xc0009aece0, 0x210cda0, 0xc0141940e0, 0xc0121d1200) : 
github.com/minio/minio/cmd/generic-handlers.go:784 +0x85 : github.com/minio/minio/cmd/http.(Server).Start.func1(0x210cda0, 0xc0141940e0, 0xc0121d1200) : github.com/minio/minio/cmd/http/server.go:101 +0x258 : net/http.HandlerFunc.ServeHTTP(0xc000dc4080, 0x210cda0, 0xc0141940e0, 0xc0121d1200) : net/http/server.go:2042 +0x44 : net/http.serverHandler.ServeHTTP(0xc000764c60, 0x210cda0, 0xc0141940e0, 0xc0121d1200) : net/http/server.go:2843 +0xa3 : net/http.(conn).serve(0xc019232780, 0x2114720, 0xc03381f6c0) : net/http/server.go:1925 +0x8ad : created by net/http.(Server).Serve : net/http/server.go:2969 +0x36c ``` </details>	4 years ago
Klaus Post	2294e53a0b	Don't retain context in locker (#10515) Use the context for internal timeouts, but disconnect it from outgoing calls so we always receive the results and cancel it remotely.	4 years ago
Harshavardhana	ad382799b1	use list cache for Walk() with webUI and quota (#10814) Bring list cache optimizations to web UI object listing, and route FIFO quota enforcement through the list cache as well.	4 years ago
Harshavardhana	68de5a6f6a	fix: IAM store fallback to list users and policies from disk (#10787) Bonus fixes: remove package retry, as it is harder to get right; also remove the managed context so that we no longer rely on it, and use a simple jitter retry instead.	4 years ago
Harshavardhana	4ea31da889	fix: move list quorum ENV to config (#10804)	4 years ago
Harshavardhana	b686bb9c83	fix: replaced drive properly by healing the entire drive (#10799) Bonus fixes: we do not need to reload the format anymore; as the replaced drive is healed locally, we only need to ensure that drive heal reloads the drive properly. We preserve the UUID of the original order, which means that the replacement in `format.json` no longer requires the drive to be reloaded into memory. fixes #10791	4 years ago
Harshavardhana	5e5cdc581d	remove unnecessary logging and move to log once (#10798) The current master logs way too much when a node is down; instead, log once and move on.	4 years ago
Klaus Post	a982baff27	ListObjects Metadata Caching (#10648) Design: https://gist.github.com/klauspost/025c09b48ed4a1293c917cecfabdf21c Gist of improvements: Cross-server caching and listing will use the same data across servers and requests. Lists can be arbitrarily resumed at a constant speed. Metadata for all files scanned is stored for streaming retrieval. The existing bloom filters controlled by the crawler are used for validating caches. Concurrent requests for the same data (or parts of it) will not spawn additional walkers. Listing a subdirectory of an existing recursive cache will use the cache. All listing operations are fully streamable, so the number of objects in a bucket no longer dictates the amount of memory. Listings can be handled by any server within the cluster. Caches are cleaned up when out of date or superseded by a more recent one.	4 years ago
Harshavardhana	734f258878	fix: slow down auto healing more aggressively (#10730) Bonus fixes: logging improvements to ensure that we don't use `go logger.LogIf`, to avoid runtime.Caller missing the function name, and log where necessary; remove unused code at erasure sets.	4 years ago
Harshavardhana	ad726b49b4	rename zones to serverSets to avoid terminology conflict (#10679) We are bringing in availability zones, so we should avoid using zones for the server expansion concept.	4 years ago
Harshavardhana	71b97fd3ac	fix: connect disks pre-emptively during startup (#10669) Connect disks pre-emptively upon startup to ensure that enough disks are connected at startup rather than waiting for them. We need this to avoid long wait times for the server to come online when servers come up in rolling upgrade fashion.	4 years ago
Harshavardhana	2760fc86af	Bump default idleConnsPerHost to control conns in time_wait (#10653) This PR fixes a hang that occurs quite commonly at higher concurrency, through the following changes: allowing fewer connections in time_wait allows faster socket opens; a lower idle connection timeout ensures that we let the kernel reclaim time_wait connections quickly; somaxconn is increased to 4096 from 2048 to allow larger TCP SYN backlogs. fixes #10413	4 years ago
Harshavardhana	6484453fc6	optionally allow strict quorum listing (#10649) `export MINIO_API_LIST_STRICT_QUORUM=on` would enable listing in quorum if necessary.	4 years ago
Harshavardhana	736e58dd68	fix: handle concurrent lockers with multiple optimizations (#10640) Select lockers which are non-local and online, to have affinity towards remote servers for lock contention; optimize the lock retry interval to avoid sending too many messages during lock contention, which reduces average CPU usage as well; when deleteObject fails, make sure setPutObjHeaders() honors lifecycle only if the bucket name is set; fix top locks to always list the oldest lockers, avoiding getting bogged down by the map's unordered nature.	4 years ago
Harshavardhana	18063bf25c	fix: cleanup old directory handling code (#10633) We don't need it anymore; remove legacy code.	4 years ago
Harshavardhana	23e8390997	fix: Allow Walk to honor load balanced drives (#10610)	4 years ago
Harshavardhana	2b4eb87d77	pick disks which are common maximally used (#10600) Further optimization to ensure that good disks are always used for listing; other than for healing, we only use disks that are maximally used.	4 years ago
Harshavardhana	00eb6f6bc9	cache DiskInfo at storage layer for performance (#10586) `mc admin info` on busy setups will not move HDD heads unnecessarily for repeated calls, which provides better responsiveness for the call overall. Bonus change: allow listTolerancePerSet to be N-1 for good entries, to avoid skipping entries when for some reason one of the disks went offline.	4 years ago
Harshavardhana	66174692a2	add '.healing.bin' for tracking currently healing disk (#10573) Add a hint on the disk to allow tracking a fresh disk being healed, to allow restartable heals, and also use this as a way to track and remove disks. There are more pending changes where we should move all the disk formatting logic to backend drives; this PR doesn't deal with that refactor, but instead makes it easier to track healing in the future.	4 years ago
Harshavardhana	ca989eb0b3	avoid ListBuckets returning quorum errors when node is down (#10555) Also, revamp the way ListBuckets works and make a few portions of the healing logic parallel: walk objects for healing disks in parallel; collect the list of buckets in parallel across drives; provide a consistent view for listBuckets().	4 years ago
Krishna Srinivas	230fc0d186	Support for "directory" objects (#10499)	4 years ago

10 Commits (e019f21bdaec7f1e2649a324c32c031a59a67b5b)