
Commit cdb5c84

Remove parquet dataset feature flags
Why these changes are being introduced:

During development of the parquet dataset architecture for the TIMDEX ETL, feature flags were used in various applications to remain backwards compatible. Now that the parquet dataset architecture is established, the feature flags are no longer needed.

How this addresses that need:

* Removes the 'bulk-index' and 'bulk-delete' CLI commands, preferring only 'bulk-update', which expects a subset of records from the parquet dataset to index and delete (per the 'action' column in the dataset).

Side effects of this change:

* Indexing from a JSON array of TIMDEX Record objects is no longer supported, nor is deleting from a JSON array of objects with a single 'timdex_record_id' field. All indexing and deleting is expected to originate from parquet dataset records.

Relevant ticket(s):

* https://mitlibraries.atlassian.net/browse/TIMX-491
1 parent f63429c commit cdb5c84
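As a point of reference, the 'action' column mentioned above is what now determines whether a dataset record is indexed or deleted. Below is a minimal, illustrative sketch of splitting dataset records by that column using plain pyarrow; the real pipeline reads the dataset through timdex_dataset_api (its TIMDEXDataset.load is patched in the remaining test context below), and the dataset path and the literal 'index'/'delete' values here are assumptions.

```python
# Illustrative sketch only: partition parquet dataset records by the 'action'
# column described in the commit message. Plain pyarrow is used as a stand-in
# for timdex_dataset_api; the path and action values are placeholders.
import pyarrow.dataset as ds

dataset = ds.dataset("path/to/timdex-dataset/", format="parquet")

# Records flagged for indexing vs. deletion, per the 'action' column.
to_index = dataset.to_table(filter=ds.field("action") == "index")
to_delete = dataset.to_table(filter=ds.field("action") == "delete")

print(f"to index: {to_index.num_rows}, to delete: {to_delete.num_rows}")
```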

File tree

3 files changed: +26 −189 lines changed


README.md

Lines changed: 26 additions & 28 deletions
@@ -115,33 +115,31 @@ SENTRY_DSN=### If set to a valid Sentry DSN, enables Sentry exception monitoring
 All CLI commands can be run with `pipenv run`.
 
 ```
-Usage: tim [OPTIONS] COMMAND [ARGS]...
-
- TIM provides commands for interacting with OpenSearch indexes.
- For more details on a specific command, run tim COMMAND -h.
-
-╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────╮
-│ --url     -u  TEXT  The OpenSearch instance endpoint minus the http scheme, e.g.                                     │
-│                     'search-timdex-env-1234567890.us-east-1.es.amazonaws.com'. If not provided, will attempt to get from the │
-│                     TIMDEX_OPENSEARCH_ENDPOINT environment variable. Defaults to 'localhost'.                        │
-│ --verbose -v        Pass to log at debug level instead of info                                                       │
-│ --help    -h        Show this message and exit.                                                                      │
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-╭─ Get cluster-level information ─────────────────────────────────────────────────────────────────────────────────────╮
-│ ping     Ping OpenSearch and display information about the cluster.                                                  │
-│ indexes  Display summary information about all indexes in the cluster.                                               │
-│ aliases  List OpenSearch aliases and their associated indexes.                                                       │
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-╭─ Index management commands ─────────────────────────────────────────────────────────────────────────────────────────╮
-│ create   Create a new index in the cluster.                                                                          │
-│ delete   Delete an index.                                                                                            │
-│ promote  Promote index as the primary alias and add it to any additional provided aliases.                           │
-│ demote   Demote an index from all its associated aliases.                                                            │
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-╭─ Bulk record processing commands ───────────────────────────────────────────────────────────────────────────────────╮
-│ bulk-index   Bulk index records into an index.                                                                       │
-│ bulk-delete  Bulk delete records from an index.                                                                      │
-│ bulk-update  Bulk update records from an index.                                                                      │
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
+Usage: tim [OPTIONS] COMMAND [ARGS]...
+
+ TIM provides commands for interacting with OpenSearch indexes.
+ For more details on a specific command, run tim COMMAND -h.
+
+╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────╮
+│ --url     -u  TEXT  The OpenSearch instance endpoint minus the http scheme, e.g.                              │
+│                     'search-timdex-env-1234567890.us-east-1.es.amazonaws.com'. If not provided, will attempt to get from the │
+│                     TIMDEX_OPENSEARCH_ENDPOINT environment variable. Defaults to 'localhost'.                 │
+│ --verbose -v        Pass to log at debug level instead of info                                                │
+│ --help    -h        Show this message and exit.                                                               │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Get cluster-level information ──────────────────────────────────────────────────────────────────────────────╮
+│ ping     Ping OpenSearch and display information about the cluster.                                           │
+│ indexes  Display summary information about all indexes in the cluster.                                        │
+│ aliases  List OpenSearch aliases and their associated indexes.                                                │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Index management commands ──────────────────────────────────────────────────────────────────────────────────╮
+│ create   Create a new index in the cluster.                                                                   │
+│ delete   Delete an index.                                                                                     │
+│ promote  Promote index as the primary alias and add it to any additional provided aliases.                    │
+│ demote   Demote an index from all its associated aliases.                                                     │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
+╭─ Bulk record processing commands ────────────────────────────────────────────────────────────────────────────╮
+│ bulk-update  Bulk update records for an index.                                                                │
+╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
 ```
 

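As the help text above notes, all CLI commands can be run with `pipenv run`; they can also be invoked programmatically with click's test runner, which is the pattern the test suite below uses. A minimal sketch, assuming the package is importable and an OpenSearch instance is reachable at the default 'localhost' endpoint:

```python
# Minimal sketch: invoke the `ping` command via click's CliRunner, mirroring
# the invocation pattern in tests/test_cli.py. Assumes OpenSearch is reachable
# at the default 'localhost' endpoint.
from click.testing import CliRunner

from tim.cli import main

runner = CliRunner()
result = runner.invoke(main, ["--url", "localhost", "ping"])
print(result.exit_code)
print(result.output)
```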
tests/test_cli.py

Lines changed: 0 additions & 75 deletions
@@ -186,81 +186,6 @@ def test_promote_index(caplog, runner):
 # Test bulk record processing commands
 
 
-@freeze_time("2022-09-01")
-@my_vcr.use_cassette("cli/bulk_index_with_index_name_success.yaml")
-def test_bulk_index_with_index_name_success(caplog, runner):
-    result = runner.invoke(
-        main,
-        [
-            "bulk-index",
-            "--index",
-            "dspace-2022-09-01t00-00-00",
-            "tests/fixtures/sample_records.json",
-        ],
-    )
-    assert result.exit_code == EXIT_CODES["success"]
-    assert (
-        "Bulk indexing records from file 'tests/fixtures/sample_records.json' into "
-        "index 'dspace-2022-09-01t00-00-00'" in caplog.text
-    )
-    assert "Bulk indexing complete!" in caplog.text
-
-
-@freeze_time("2022-09-01")
-@my_vcr.use_cassette("cli/bulk_index_with_source_success.yaml")
-def test_bulk_index_with_source_success(caplog, runner):
-    result = runner.invoke(
-        main,
-        ["bulk-index", "--source", "dspace", "tests/fixtures/sample_records.json"],
-    )
-    assert result.exit_code == EXIT_CODES["success"]
-    assert (
-        "Bulk indexing records from file 'tests/fixtures/sample_records.json' into "
-        "index 'dspace-2022-09-01t00-00-00'" in caplog.text
-    )
-    assert "Bulk indexing complete!" in caplog.text
-
-
-@freeze_time("2022-09-01")
-@my_vcr.use_cassette("cli/bulk_delete_with_index_name_success.yaml")
-def test_bulk_delete_with_index_name_success(caplog, runner):
-    result = runner.invoke(
-        main,
-        [
-            "bulk-delete",
-            "--index",
-            "alma-2022-09-01t00-00-00",
-            "tests/fixtures/sample_deleted_records.txt",
-        ],
-    )
-    assert result.exit_code == EXIT_CODES["success"]
-    assert (
-        "Bulk deleting records in file 'tests/fixtures/sample_deleted_records.txt' "
-        "from index 'alma-2022-09-01t00-00-00'" in caplog.text
-    )
-    assert "Bulk deletion complete!" in caplog.text
-
-
-@freeze_time("2022-09-01")
-@my_vcr.use_cassette("cli/bulk_delete_with_source_success.yaml")
-def test_bulk_delete_with_source_success(caplog, runner):
-    result = runner.invoke(
-        main,
-        [
-            "bulk-delete",
-            "--source",
-            "alma",
-            "tests/fixtures/sample_deleted_records.txt",
-        ],
-    )
-    assert result.exit_code == EXIT_CODES["success"]
-    assert (
-        "Bulk deleting records in file 'tests/fixtures/sample_deleted_records.txt' "
-        "from index 'alma-2022-09-01t00-00-00'" in caplog.text
-    )
-    assert "Bulk deletion complete!" in caplog.text
-
-
 @patch("timdex_dataset_api.dataset.TIMDEXDataset.load")
 @patch("tim.helpers.validate_bulk_cli_options")
 @patch("tim.opensearch.bulk_delete")

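The removed tests above exercised JSON-based fixtures; the remaining bulk-update test instead patches timdex_dataset_api's TIMDEXDataset.load. For illustration only, a hypothetical way to build a tiny parquet fixture carrying the 'timdex_record_id' and 'action' columns named in the commit message — the column set, values, and output path are assumptions, not the project's actual fixture layout:

```python
# Hypothetical fixture sketch: a tiny parquet file with the 'timdex_record_id'
# and 'action' columns referenced in the commit message. Columns, values, and
# the output path are assumptions.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table(
    {
        "timdex_record_id": ["alma:000001", "alma:000002"],
        "action": ["index", "delete"],
    }
)
pq.write_table(table, "tests/fixtures/sample_dataset.parquet")
```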
tim/cli.py

Lines changed: 0 additions & 86 deletions
@@ -255,92 +255,6 @@ def promote(ctx: click.Context, index: str, alias: list[str]) -> None:
 # Bulk record processing commands
 
 
-# NOTE: FEATURE FLAG: 'bulk_index' supports ETL v1
-@main.command()
-@click.option("-i", "--index", help="Name of the index to bulk index records into.")
-@click.option(
-    "-s",
-    "--source",
-    type=click.Choice(VALID_SOURCES),
-    help="Source whose primary-aliased index to bulk index records into.",
-)
-@click.argument("filepath", type=click.Path())
-@click.pass_context
-def bulk_index(ctx: click.Context, index: str, source: str, filepath: str) -> None:
-    """Bulk index records into an index.
-
-    Must provide either the name of an existing index in the cluster or a valid source.
-    If source is provided, will index records into the primary-aliased index for the
-    source.
-
-    Logs an error and aborts if the provided index doesn't exist in the cluster.
-
-    FILEPATH: path to transformed records file, use format "s3://bucketname/objectname"
-    for s3.
-    """
-    client = ctx.obj["CLIENT"]
-    index = helpers.validate_bulk_cli_options(index, source, client)
-
-    logger.info("Bulk indexing records from file '%s' into index '%s'", filepath, index)
-    record_iterator = helpers.parse_records(filepath)
-    results = tim_os.bulk_index(client, index, record_iterator)
-    logger.info(
-        "Bulk indexing complete!\n"
-        "   Errors: %d%s\n"
-        "  Created: %d\n"
-        "  Updated: %d\n"
-        "  --------\n"
-        "    Total: %d",
-        results["errors"],
-        " (see logs for details)" if results["errors"] else "",
-        results["created"],
-        results["updated"],
-        results["total"],
-    )
-
-
-# NOTE: FEATURE FLAG: 'bulk_delete' supports ETL v1
-@main.command()
-@click.option("-i", "--index", help="Name of the index to bulk delete records from.")
-@click.option(
-    "-s",
-    "--source",
-    type=click.Choice(VALID_SOURCES),
-    help="Source whose primary-aliased index to bulk delete records from.",
-)
-@click.argument("filepath", type=click.Path())
-@click.pass_context
-def bulk_delete(ctx: click.Context, index: str, source: str, filepath: str) -> None:
-    """Bulk delete records from an index.
-
-    Must provide either the name of an existing index in the cluster or a valid source.
-    If source is provided, will delete records from the primary-aliased index for the
-    source.
-
-    Logs an error and aborts if the provided index doesn't exist in the cluster.
-
-    FILEPATH: path to deleted records file, use format "s3://bucketname/objectname"
-    for s3.
-    """
-    client = ctx.obj["CLIENT"]
-    index = helpers.validate_bulk_cli_options(index, source, client)
-
-    logger.info("Bulk deleting records in file '%s' from index '%s'", filepath, index)
-    record_iterator = helpers.parse_deleted_records(filepath)
-    results = tim_os.bulk_delete(client, index, record_iterator)
-    logger.info(
-        "Bulk deletion complete!\n"
-        "   Errors: %d%s\n"
-        "  Deleted: %d\n"
-        "  --------\n"
-        "    Total: %d",
-        results["errors"],
-        " (see logs for details)" if results["errors"] else "",
-        results["deleted"],
-        results["total"],
-    )
-
-
 @main.command()
 @click.option(
     "-i",

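The removed commands above delegated the actual writes to tim_os.bulk_index and tim_os.bulk_delete. As background, here is a generic sketch of how a bulk delete is typically expressed with opensearch-py's streaming_bulk helper — this is not the project's tim.opensearch implementation, and the index name and record ids are placeholders:

```python
# Generic sketch of an OpenSearch bulk delete using opensearch-py; not the
# project's tim.opensearch implementation. Index name and ids are placeholders.
from opensearchpy import OpenSearch, helpers

client = OpenSearch(hosts=["localhost"])


def delete_actions(index, record_ids):
    # One delete action per record id.
    for record_id in record_ids:
        yield {"_op_type": "delete", "_index": index, "_id": record_id}


results = {"deleted": 0, "errors": 0, "total": 0}
for ok, _item in helpers.streaming_bulk(
    client,
    delete_actions("alma-2022-09-01t00-00-00", ["alma:000001"]),
    raise_on_error=False,
):
    results["total"] += 1
    results["deleted" if ok else "errors"] += 1
```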