minio

Commit Graph

Author	SHA1	Message	Date
Klaus Post	adca28801d	feat: disable Parquet by default (breaking change) (#9920 ) I have built a fuzz test and it crashes heavily in seconds and will OOM shortly after. It seems like supporting Parquet is basically a completely open way to crash the server if you can upload a file and run s3 select on it. Until Parquet is more hardened it is DISABLED by default since hostile crafted input can easily crash the server. If you are in a controlled environment where it is safe to assume no hostile content can be uploaded to your cluster you can safely enable Parquet. To enable Parquet set the environment variable `MINIO_API_SELECT_PARQUET=on` while starting the MinIO server. Furthermore, we guard parquet by recover functions.	4 years ago
Bruce Wang	e464a5bfbc	Fix bug with fields that contain trimming spaces (#10079 ) String x might contain trimming spaces. And it needs to be trimmed. For example, in csv files, there might be trimming spaces in a field that ought to meet a query condition that contains the value without trimming spaces. This applies to both intCast and floatCast functions.	4 years ago
Anis Elleuch	778e9c864f	Move dependency from minio-go v6 to v7 (#10042 )	4 years ago
Klaus Post	2e338e84cb	fix owhanging crashes in parquet S3 select (#9921 )	4 years ago
Klaus Post	2d0f65a5e3	Add archived parquet as int. package (#9912 ) Since github.com/minio/parquet-go is archived add it as internal package.	4 years ago
Harshavardhana	1bc32215b9	enable full linter across the codebase (#9620 ) enable linter using golangci-lint across codebase to run a bunch of linters together, we shall enable new linters as we fix more things the codebase. This PR fixes the first stage of this cleanup.	5 years ago
Frank Wessels	086be07bf5	Fix ndjson unsupported (#9500 )	5 years ago
Klaus Post	e4900b99d7	s3 select: Infer types for comparison (#9438 )	5 years ago
Anis Elleuch	9902c9baaa	sql: Add support of escape quote in CSV (#9231 ) This commit modifies csv parser, a fork of golang csv parser to support a custom quote escape character. The quote escape character is used to escape the quote character when a csv field contains a quote character as part of data.	5 years ago
Klaus Post	8d98662633	re-implement data usage crawler to be more efficient (#9075 ) Implementation overview: https://gist.github.com/klauspost/1801c858d5e0df391114436fdad6987b	5 years ago
Anis Elleuch	35ecc04223	Support configurable quote character parameter in Select (#8955 )	5 years ago
Harshavardhana	603cf2a8bb	fix: broken gzip handling with Select API (#9128 ) This PR fixes a regression introduced in `a1c7c9ea73`	5 years ago
Aditya Manthramurthy	cec8cdb35e	S3Select: Handle array selection in from clause (#9076 )	5 years ago
ebozduman	a1c7c9ea73	Matches s3 invalid compression format error for 'mc sql' (#9067 )	5 years ago
Klaus Post	e4020fb41f	SIMDJSON S3 select input (#8401 )	5 years ago
Anis Elleuch	de924605a1	Import CSV parser library (#8927 ) The CSV library code is imported from Go 1.13.6	5 years ago
Bruce Wang	c476b27a65	Comment typo "index max" to "index map" (#8700 )	5 years ago
Klaus Post	bf3a97d3aa	S3 Select: Concurrent LINES delimited json parsing (#8610 ) The speedup is ~5x on a 6 core CPU	5 years ago
Klaus Post	f1e2e1cc9e	S3 Select: Mismatched types don't match (#8608 ) When comparing for equality, if types cannot be matched, they don't match.	5 years ago
Harshavardhana	5d3d57c12a	Start using error wrapping with fmt.Errorf (#8588 ) Use fatih/errwrap to fix all the code to use error wrapping with fmt.Errorf()	5 years ago
Klaus Post	1c90a6bd49	S3 Select: Convert CSV data to JSON (#8464 )	5 years ago
Klaus Post	26e760ee62	Fix JSON Close data race. (#8486 ) The JSON stream library has no safe way of aborting while Since we cannot expect the caller to safely handle "Read" and "Close" calls we must handle this. Also any Read error returned from upstream will crash the server. We preserve the errors and instead always return io.EOF upstream, but send the error on Close. `readahead v1.3.1` handles Read after Close better. Updates to `progressReader` are mostly to ensure safety. Fixes #8481	5 years ago
Klaus Post	38e6d911ea	S3 Select: Detect full object (#8456 ) Check if select is `SELECT s.* from S3Object s` and forward it to All Fixes #8371 and makes this case run significantly faster.	5 years ago
Klaus Post	51456e6adc	Select: Support Square Bracket Lists (#8457 ) Allows for S3 compatible `SELECT * from s3object s WHERE id IN [3,2]` Fixes #8422	5 years ago
Harshavardhana	d48fd6fde9	Remove unused params and functions (#8399 )	5 years ago
Klaus Post	002ac82631	S3 Select: Add parser support for lists. (#8329 )	5 years ago
Klaus Post	c1a17c2561	S3 Select: Aggregate AVG/SUM as float (#8326 ) Force sum/average to be calculated as a float. As noted in #8221 > run SELECT AVG(CAST (Score as int)) FROM S3Object on ``` Name,Score alice,80 bob,81 ``` > AWS S3 gives 80.5 and MinIO gives 80. This also makes overflows much more unlikely.	5 years ago
Klaus Post	1c5b05c130	S3 select: Fix output conversion on select * (#8303 ) Fixes #8268	5 years ago
Klaus Post	be313f1758	S3 Select: Workaround java buffer size (#8312 ) Updates #7475 The Java implementation has a 128KB buffer and a message must be emitted before that is used. #7475 therefore limits the message size to 128KB. But up to 256 bytes are written to the buffer in each call. This means we must emit a message before shorter than 128KB. Therefore we change the limit to 128KB minus 256 bytes.	5 years ago
Klaus Post	520552ffa9	S3 select: flush when reaching limit (#8279 ) Add missing flush when reaching select limit.	5 years ago
Klaus Post	dac1cf5a9a	S3 Select: Parsing tweaks (#8261 ) * Don't output empty lines. * Trim whitespace from byte to int/float/bool conversions.	5 years ago
Klaus Post	c9b8bd8de2	S3 Select: optimize output (#8238 ) Queue output items and reuse them. Remove the unneeded type system in sql and just use the Go type system. In best case this is more than an order of magnitude speedup: ``` BenchmarkSelectAll_1M-12 1 1841049400 ns/op 274299728 B/op 4198522 allocs/op BenchmarkSelectAll_1M-12 14 84833400 ns/op 169228346 B/op 3146541 allocs/op ```	5 years ago
Klaus Post	017456df63	Wait clearing the close channel (#8250 ) Close channel should not be nilled before goroutines have exited. Fixes potential hang on closing.	5 years ago
Klaus Post	ddea0bdf11	Concurrent CSV parsing and reduce S3 select allocations (#8200 ) ``` CSV parsing, BEFORE: BenchmarkReaderBasic-12 2842 407533 ns/op 397860 B/op 957 allocs/op BenchmarkReaderReplace-12 2718 429914 ns/op 397844 B/op 957 allocs/op BenchmarkReaderReplaceTwo-12 2718 435556 ns/op 397855 B/op 957 allocs/op BenchmarkAggregateCount_100K-12 171 6798974 ns/op 16667102 B/op 308077 allocs/op BenchmarkAggregateCount_1M-12 19 65657411 ns/op 168057743 B/op 3146610 allocs/op BenchmarkSelectAll_10M-12 1 20882119900 ns/op 2758799896 B/op 41978762 allocs/op CSV parsing, AFTER: BenchmarkReaderBasic-12 3721 312549 ns/op 101920 B/op 338 allocs/op BenchmarkReaderReplace-12 3776 318810 ns/op 101993 B/op 340 allocs/op BenchmarkReaderReplaceTwo-12 3610 330967 ns/op 102012 B/op 341 allocs/op BenchmarkAggregateCount_100K-12 295 4149588 ns/op 3553623 B/op 103261 allocs/op BenchmarkAggregateCount_1M-12 30 37746503 ns/op 33827931 B/op 1049435 allocs/op BenchmarkSelectAll_10M-12 1 17608495800 ns/op 1416504040 B/op 21007082 allocs/op ~ benchcmp old.txt new.txt benchmark old ns/op new ns/op delta BenchmarkReaderBasic-12 407533 312549 -23.31% BenchmarkReaderReplace-12 429914 318810 -25.84% BenchmarkReaderReplaceTwo-12 435556 330967 -24.01% BenchmarkAggregateCount_100K-12 6798974 4149588 -38.97% BenchmarkAggregateCount_1M-12 65657411 37746503 -42.51% BenchmarkSelectAll_10M-12 20882119900 17608495800 -15.68% benchmark old allocs new allocs delta BenchmarkReaderBasic-12 957 338 -64.68% BenchmarkReaderReplace-12 957 340 -64.47% BenchmarkReaderReplaceTwo-12 957 341 -64.37% BenchmarkAggregateCount_100K-12 308077 103261 -66.48% BenchmarkAggregateCount_1M-12 3146610 1049435 -66.65% BenchmarkSelectAll_10M-12 41978762 21007082 -49.96% benchmark old bytes new bytes delta BenchmarkReaderBasic-12 397860 101920 -74.38% BenchmarkReaderReplace-12 397844 101993 -74.36% BenchmarkReaderReplaceTwo-12 397855 102012 -74.36% BenchmarkAggregateCount_100K-12 16667102 3553623 -78.68% 
BenchmarkAggregateCount_1M-12 168057743 33827931 -79.87% BenchmarkSelectAll_10M-12 2758799896 1416504040 -48.66% ``` ``` BenchmarkReaderHuge/97K-12 2200 540840 ns/op 184.32 MB/s 1604450 B/op 687 allocs/op BenchmarkReaderHuge/194K-12 1522 752257 ns/op 265.04 MB/s 2143135 B/op 1335 allocs/op BenchmarkReaderHuge/389K-12 1190 947858 ns/op 420.69 MB/s 3221831 B/op 2630 allocs/op BenchmarkReaderHuge/778K-12 806 1472486 ns/op 541.61 MB/s 5201856 B/op 5187 allocs/op BenchmarkReaderHuge/1557K-12 426 2575269 ns/op 619.36 MB/s 9101330 B/op 10233 allocs/op BenchmarkReaderHuge/3115K-12 286 4034656 ns/op 790.66 MB/s 12397968 B/op 16099 allocs/op BenchmarkReaderHuge/6230K-12 172 6830563 ns/op 934.05 MB/s 16008416 B/op 26844 allocs/op BenchmarkReaderHuge/12461K-12 100 11409467 ns/op 1118.39 MB/s 22655163 B/op 48107 allocs/op BenchmarkReaderHuge/24922K-12 66 19780395 ns/op 1290.19 MB/s 35158559 B/op 90216 allocs/op BenchmarkReaderHuge/49844K-12 34 37282559 ns/op 1369.03 MB/s 60528624 B/op 174497 allocs/op ```	5 years ago
Yao Zongyou	18fedc67d5	friendly prompt for s3select MalformedXML error (#8171 ) partly fix #7911	5 years ago
Yao Zongyou	ec9bfd3aef	speed up the performance of s3select on csv (#7945 )	5 years ago
Kanagaraj M	12353caf35	Fix: Support Unicode delimiters in s3 select (#7931 )	5 years ago
Yao Zongyou	c4f480a839	fix csv read bug (#7885 )	5 years ago
Yao Zongyou	60831e3299	aggregation functions' argument may already have been cast to numeric (#7876 )	5 years ago
Yao Zongyou	037319066f	fix unicode support related bugs in s3select (#7877 )	5 years ago
Ryan Tam	bd56f80250	Fix ignored alias for aggregate result in S3 Select (#7849 ) The SQL parser as it stands right now ignores alias for aggregate result, e.g. `SELECT COUNT(*) AS thing FROM s3object` doesn't actually return record like `{"thing": 42}`, it returns a record like `{"_1": 42}`. Column alias for aggregate result is supported in AWS's S3 Select, so this commit fixes that by respecting the `expr.As` in the expression. Also improve test for S3 select On top of testing a simple `SELECT` query, we want to test a few more "advanced" queries (e.g. aggregation). Convert existing tests into table driven tests[1], and add the new test cases with "advanced" queries into them. [1] - https://github.com/golang/go/wiki/TableDrivenTests	5 years ago
Yao Zongyou	941fed8e4a	s3Select: call Close on error to release the read lock (#7830 )	5 years ago
Yao Zongyou	55092bede1	add timestamp compare support (#7832 )	5 years ago
Yao Zongyou	90a3b830f4	fix typo and the string representation of the time.Time value (#7831 )	5 years ago
Yao Zongyou	23b9df0694	Fix s3select TRIM function's nil pointer dereference bug (#7817 )	5 years ago
Joe Stevens	a19cf063b5	Fixes for multiplatform dev and testing from forks (#7734 ) Add support for correct dependency URLs on all platforms only build mountinfo.go on linux make testfile path relative to support fork work	6 years ago
kannappanr	5ecac91a55	Replace Minio refs in docs with MinIO and links (#7494 )	6 years ago
Aditya Manthramurthy	b1b1d77893	Set S3 Select record message length to 128KiB (#7475 ) - Previously this limit was a little more than 1MiB, and it broke compatibility with AWS SDK Java causing a buffer overflow error.	6 years ago
Kirill Motkov	3d29ab4059	Rewrite if-else chains to switch statements (#7382 )	6 years ago
Harshavardhana	91d85a0d53	Fix stale locks held by SelectParquet API (#7364 ) Vendorize upstream parquet-go to fix this issue.	6 years ago
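
Commit c1a17c2561 (#8326) describes AVG over integer-cast scores 80 and 81 returning 80 where AWS S3 returns 80.5. The sketch below illustrates the difference between accumulating as an integer and accumulating as a float. It is not MinIO's code: `avgInt` and `avgFloat` are hypothetical helper names made up for this illustration.

```go
package main

import "fmt"

// avgInt is a hypothetical helper: it averages with integer arithmetic,
// so the fractional part of the result is truncated.
func avgInt(vals []int64) int64 {
	if len(vals) == 0 {
		return 0
	}
	var sum int64
	for _, v := range vals {
		sum += v
	}
	return sum / int64(len(vals))
}

// avgFloat is a hypothetical helper: it accumulates as float64, which
// keeps the fractional part and does not wrap around on large sums.
func avgFloat(vals []int64) float64 {
	if len(vals) == 0 {
		return 0
	}
	var sum float64
	for _, v := range vals {
		sum += float64(v)
	}
	return sum / float64(len(vals))
}

func main() {
	scores := []int64{80, 81} // the Score column from the commit message
	fmt.Println(avgInt(scores))   // 80
	fmt.Println(avgFloat(scores)) // 80.5
}
```

The float accumulator trades exactness on very large integers for a result that matches the 80.5 cited in the commit message.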


80 Commits (e4a44f6224c02fc2e178f7ba550e64c35840ed03)