Klaus Post
02aecb2fc1
select: Check if CSV is valid utf8 ( #10991 )
...
Check if first block of data is valid utf8.
Fixes #10970
4 years ago
Anis Elleuch
9902c9baaa
sql: Add support of escape quote in CSV ( #9231 )
...
This commit modifies csv parser, a fork of golang csv
parser to support a custom quote escape character.
The quote escape character is used to escape the quote
character when a csv field contains a quote character
as part of data.
5 years ago
Anis Elleuch
35ecc04223
Support configurable quote character parameter in Select ( #8955 )
5 years ago
Anis Elleuch
de924605a1
Import CSV parser library ( #8927 )
...
The CSV library code is imported from Go 1.13.6
5 years ago
Bruce Wang
c476b27a65
Comment typo "index max" to "index map" ( #8700 )
5 years ago
Harshavardhana
d48fd6fde9
Remove unusued params and functions ( #8399 )
5 years ago
Klaus Post
017456df63
Wait clearing the close channel ( #8250 )
...
Close channel should not be nilled before goroutines have exited.
Fixes potential hang on closing.
5 years ago
Klaus Post
ddea0bdf11
Concurrent CSV parsing and reduce S3 select allocations ( #8200 )
...
```
CSV parsing, BEFORE:
BenchmarkReaderBasic-12 2842 407533 ns/op 397860 B/op 957 allocs/op
BenchmarkReaderReplace-12 2718 429914 ns/op 397844 B/op 957 allocs/op
BenchmarkReaderReplaceTwo-12 2718 435556 ns/op 397855 B/op 957 allocs/op
BenchmarkAggregateCount_100K-12 171 6798974 ns/op 16667102 B/op 308077 allocs/op
BenchmarkAggregateCount_1M-12 19 65657411 ns/op 168057743 B/op 3146610 allocs/op
BenchmarkSelectAll_10M-12 1 20882119900 ns/op 2758799896 B/op 41978762 allocs/op
CSV parsing, AFTER:
BenchmarkReaderBasic-12 3721 312549 ns/op 101920 B/op 338 allocs/op
BenchmarkReaderReplace-12 3776 318810 ns/op 101993 B/op 340 allocs/op
BenchmarkReaderReplaceTwo-12 3610 330967 ns/op 102012 B/op 341 allocs/op
BenchmarkAggregateCount_100K-12 295 4149588 ns/op 3553623 B/op 103261 allocs/op
BenchmarkAggregateCount_1M-12 30 37746503 ns/op 33827931 B/op 1049435 allocs/op
BenchmarkSelectAll_10M-12 1 17608495800 ns/op 1416504040 B/op 21007082 allocs/op
~ benchcmp old.txt new.txt
benchmark old ns/op new ns/op delta
BenchmarkReaderBasic-12 407533 312549 -23.31%
BenchmarkReaderReplace-12 429914 318810 -25.84%
BenchmarkReaderReplaceTwo-12 435556 330967 -24.01%
BenchmarkAggregateCount_100K-12 6798974 4149588 -38.97%
BenchmarkAggregateCount_1M-12 65657411 37746503 -42.51%
BenchmarkSelectAll_10M-12 20882119900 17608495800 -15.68%
benchmark old allocs new allocs delta
BenchmarkReaderBasic-12 957 338 -64.68%
BenchmarkReaderReplace-12 957 340 -64.47%
BenchmarkReaderReplaceTwo-12 957 341 -64.37%
BenchmarkAggregateCount_100K-12 308077 103261 -66.48%
BenchmarkAggregateCount_1M-12 3146610 1049435 -66.65%
BenchmarkSelectAll_10M-12 41978762 21007082 -49.96%
benchmark old bytes new bytes delta
BenchmarkReaderBasic-12 397860 101920 -74.38%
BenchmarkReaderReplace-12 397844 101993 -74.36%
BenchmarkReaderReplaceTwo-12 397855 102012 -74.36%
BenchmarkAggregateCount_100K-12 16667102 3553623 -78.68%
BenchmarkAggregateCount_1M-12 168057743 33827931 -79.87%
BenchmarkSelectAll_10M-12 2758799896 1416504040 -48.66%
```
```
BenchmarkReaderHuge/97K-12 2200 540840 ns/op 184.32 MB/s 1604450 B/op 687 allocs/op
BenchmarkReaderHuge/194K-12 1522 752257 ns/op 265.04 MB/s 2143135 B/op 1335 allocs/op
BenchmarkReaderHuge/389K-12 1190 947858 ns/op 420.69 MB/s 3221831 B/op 2630 allocs/op
BenchmarkReaderHuge/778K-12 806 1472486 ns/op 541.61 MB/s 5201856 B/op 5187 allocs/op
BenchmarkReaderHuge/1557K-12 426 2575269 ns/op 619.36 MB/s 9101330 B/op 10233 allocs/op
BenchmarkReaderHuge/3115K-12 286 4034656 ns/op 790.66 MB/s 12397968 B/op 16099 allocs/op
BenchmarkReaderHuge/6230K-12 172 6830563 ns/op 934.05 MB/s 16008416 B/op 26844 allocs/op
BenchmarkReaderHuge/12461K-12 100 11409467 ns/op 1118.39 MB/s 22655163 B/op 48107 allocs/op
BenchmarkReaderHuge/24922K-12 66 19780395 ns/op 1290.19 MB/s 35158559 B/op 90216 allocs/op
BenchmarkReaderHuge/49844K-12 34 37282559 ns/op 1369.03 MB/s 60528624 B/op 174497 allocs/op
```
5 years ago
Yao Zongyou
ec9bfd3aef
speed up the performance of s3select on csv ( #7945 )
5 years ago
Yao Zongyou
c4f480a839
fix csv read bug ( #7885 )
5 years ago
Yao Zongyou
037319066f
fix unicode support related bugs in s3select ( #7877 )
5 years ago
kannappanr
5ecac91a55
Replace Minio refs in docs with MinIO and links ( #7494 )
6 years ago
Harshavardhana
2520e535a0
Allow lazyQuotes for certain types of CSV ( #7278 )
...
Set lazyQuotes to true, to allow a quote to appear
in an unquote field and a non-doubled quote may
appear in a quoted field.
6 years ago
Bala FA
b0deea27df
Refactor s3select to support parquet. ( #7023 )
...
Also handle pretty formatted JSON documents.
6 years ago