Mirror of https://github.com/sqlite/sqlite.git (last synced 2025-11-21 09:00:59 +03:00)
Commit Graph

38 Commits

Author SHA1 Message Date
dan
32ca0dbcdf Have fts5 tables delay initializing the tokenizer until it is first used in all cases where the tokenizer is not "trigram".
FossilOrigin-Name: ca4fdcb8ae95d2a61236b949f852d2bf25ea2dbbff7eedafbd8eb84e8fd96687
2024-05-14 17:16:09 +00:00
dan
7b0fd0c564 Fix a problem with amalgamation builds on this branch.
FossilOrigin-Name: 8f046c82c9cf51fc349674577c68d3d2499ee37009deacbf937d711d9930fd49
2023-11-02 18:10:22 +00:00
dan
e186fe20f5 Add the "remove_diacritics" option to the fts5 trigram tokenizer.
FossilOrigin-Name: 83da80135b6105f47d1de560232449562ae8ac176c8011a6f75589f62bc9b1db
2023-11-02 17:31:06 +00:00
drh
d36f588f31 Fix harmless compiler warnings about unused function parameters.
FossilOrigin-Name: 25d067c270966d9506db8bedf280883e32b69050b14bdbbeda4bb2d9a362619c
2020-11-25 16:28:04 +00:00
dan
95dca8d0cf FTS5 does not handle tokens that contain embedded nul characters. Prevent the trigram tokenizer from returning such tokens. Fix for [2ba5930b2].
FossilOrigin-Name: b1d048748c054575425a4bebf0c5d09962f9329d5ce6a978cf54e508b238584c
2020-10-03 14:36:06 +00:00
dan
ccf578d435 Add tests for the trigram tokenizer. Fix minor issues.
FossilOrigin-Name: 897ced99b44085012aa44d3264940dcbd4c77b295a894a1b58fb2c03a0f7fee8
2020-10-01 16:10:22 +00:00
dan
33a99fad08 Add experimental unicode-aware trigram tokenizer to fts5. And support for LIKE and GLOB optimizations for fts5 tables that use said tokenizer.
FossilOrigin-Name: 0d7810c1aea93c0a3da1ccc4911dbce8a1b6e1dbfe1ab7e800289a0c783b5985
2020-09-30 20:35:37 +00:00
drh
3b574e4ea9 Use the 64-bit memory allocator interfaces in extensions, whenever possible.
FossilOrigin-Name: 07ee06fd390bfebebc014b47583d489747b0423bb96c810bed5c605ce0e3be71
2019-04-13 04:38:32 +00:00
drh
2d77d80a65 Use 64-bit math to compute the sizes of memory allocations in extensions.
FossilOrigin-Name: ca67f2ec0e294384c397db438605df1b47aae5f348a8de94f97286997625d169
2019-01-08 20:02:48 +00:00
drh
f9231c34eb Fix harmless compiler warnings.
FossilOrigin-Name: b57c545a384ab5d62becf3164945b32b1e108b2fb4c8dbd939a1706c2079e18b
2018-12-31 21:43:55 +00:00
dan
eefc72d12f Avoid an undefined left-shift operation in fts5 caused by malformed utf-8 text.
FossilOrigin-Name: c3a3a11194586bef80a9d7ca54caae8af30d4e7b464b8bb3d257ba2d2ec4791f
2018-12-28 14:33:55 +00:00
dan
b163b57212 Fix problems in fts5 found by ASAN.
FossilOrigin-Name: c564bf870106faef297594a51995619c80311d06bd5f8a0c7644f666f22ba576
2018-12-28 07:37:22 +00:00
dan
e89feee5c3 Add the "remove_diacritics=2" option to the unicode61 tokenizer in both FTS5 and FTS3/4.
FossilOrigin-Name: 06177f3f114b5d804b84c27ac843740282e2176fdf0f7a999feda0e1b624adec
2018-12-03 16:14:49 +00:00
dan
b80bb6ce88 Add the "categories" option to the unicode61 tokenizer in fts5.
FossilOrigin-Name: 80d2b9e635e3100f90cffdcffa5b5038da6fbbfccc9f5777c59a4ae760d4cb62
2018-07-13 19:52:43 +00:00
dan
22e8356368 Handle parser stack overflow when parsing fts5 query expressions. Fix some compiler warnings in fts5 code.
FossilOrigin-Name: bc3f7900d5a06829d123814a5ac7b951bcfc1560
2016-02-11 17:01:32 +00:00
dan
e9eb1593f5 Fix an fts5 problem with using both xPhraseFirst() and xPhraseFirstColumn() within a single statement in detail=col mode.
FossilOrigin-Name: 72d53699bf0dcdb9d2a22e229989d7435f061399
2016-01-23 18:51:59 +00:00
dan
3e6a141130 Fix some harmless gcc compiler warnings. Mostly in fts5, but also two in the core code.
FossilOrigin-Name: 5d44d4a6cf5c6b983cbd846d9bc34251df8f4bc5
2015-12-23 16:42:27 +00:00
mistachkin
b9becaa268 Fix even more harmless compiler warnings.
FossilOrigin-Name: 1d0e6aa119da8e15d35508f5d75ffc729979da92
2015-12-16 23:30:30 +00:00
mistachkin
cdabd7bd50 Fix harmless compiler warnings.
FossilOrigin-Name: 1c46c194a2da24fe613d77b5a8d727cc2fc9faa4
2015-10-14 20:34:57 +00:00
dan
9c671b741c Further tests to raise coverage of fts5 synonym code to 100%. Fix a dropped error code in the same.
FossilOrigin-Name: bdedd838bb3028c586bcc9f643852ce1364adb49
2015-09-02 19:48:55 +00:00
dan
ee0c0a8de3 Another change to the fts5 tokenizer API.
FossilOrigin-Name: fc71868496f45f9c7a79ed2bf2d164a7c4718ce1
2015-08-29 15:44:27 +00:00
dan
57e0add3f9 Change the fts5 tokenizer API to allow more than one token to occupy a single position within a document.
FossilOrigin-Name: 90b85b42f2b2dd3e939b129b7df2b822a05e243d
2015-08-28 19:56:47 +00:00
dan
79e2347fdf Fix a bug in the fts5 porter tokenizer preventing it from passing xCreate() arguments through to its parent tokenizer.
FossilOrigin-Name: c3c672af97edf2ae5d793f6fa47364370aa4f4ec
2015-07-31 14:43:02 +00:00
dan
3f09beda45 Remove "#ifdef SQLITE_ENABLE_FTS5" from individual fts5 source files. Add a single "#if !defined(SQLITE_CORE) || defined(SQLITE_ENABLE_FTS5)" to fts5.c.
FossilOrigin-Name: 7819002ed85497bbd0f9cf4d39df641573324436
2015-07-02 15:52:21 +00:00
dan
3f3074e0c1 Remove the "#include sqlite3Int.h" from fts5Int.h.
FossilOrigin-Name: e008c3c8e29c843ec945ddad54b9688bbf2bdb44
2015-05-30 11:49:58 +00:00
dan
21b7d2a9b8 Improve test coverage of fts5_unicode2.c.
FossilOrigin-Name: fea8a4db9d8c7b9a946017a0dc984cbca6ce240e
2015-05-22 06:08:25 +00:00
dan
8c1f46de50 Improve test coverage of fts5_tokenize.c.
FossilOrigin-Name: 0e91a6a520f040b8902da6a1a4d9107dc66c0ea3
2015-05-20 09:27:51 +00:00
dan
b10210ea1b Fix a memory leak that could follow an OOM condition in fts5.
FossilOrigin-Name: de9f8ef6ebf036df5a558cd78fb4927da2d83ce8
2015-05-19 11:32:01 +00:00
dan
7b2ec1ae41 Improve fts5 tests.
FossilOrigin-Name: c1f07a3aa98eac87e2747527d15e5e5562221ceb
2015-04-29 20:54:08 +00:00
dan
f5fab92d82 Add an optimization to the fts5 unicode tokenizer code.
FossilOrigin-Name: f5db489250029678fce845dfb2b1109fde46bea5
2015-03-11 14:51:39 +00:00
dan
47c467c80e Fix a couple of build problems.
FossilOrigin-Name: a5d5468c0509d129e198bf9432190ee07cedb7af
2015-03-04 08:29:24 +00:00
dan
57fec54b53 Fix some problems with building fts5 and fts3 together using the amalgamation.
FossilOrigin-Name: fb10bbb9f9c4481e6043d323a3018a4ec68eb0ff
2015-02-02 11:32:20 +00:00
dan
2656167f6e Improve the performance of the fts5 porter tokenizer implementation.
FossilOrigin-Name: 96ea600440de05ee663e71c3f0d0de2c64108bf9
2015-01-17 17:48:10 +00:00
dan
73f7d6ed75 Optimize the unicode61 tokenizer so that it handles ascii text faster. Make it the default tokenizer. Change the name of the simple tokenizer to "ascii".
FossilOrigin-Name: f22dbccad9499624880ddd48df1b07fb42b1ad66
2015-01-12 17:58:04 +00:00
dan
aacf3d1a3b Remove the iPos parameter from the tokenizer callback. Fix the "tokenchars" and "separators" options on the simple tokenizer.
FossilOrigin-Name: 65f0262fb82dbfd9f80233ac7c3108e2f2716c0a
2015-01-06 19:08:26 +00:00
dan
6024772ba2 Add a version of the unicode61 tokenizer to fts5.
FossilOrigin-Name: d09f7800cf14f73ea86d037107ef80295b2c173a
2015-01-01 16:46:10 +00:00
dan
5fa3acabf4 Fixes to built-in tokenizers.
FossilOrigin-Name: b33fe0dd89f3180c209fa1f9e75d0a7acab12b8e
2014-12-29 11:24:46 +00:00
dan
48d7014067 Fix the customization interfaces so that they match the documentation.
FossilOrigin-Name: fba0b5fc7eead07a4853e78e02d788e7c714f6cd
2014-11-15 20:07:31 +00:00