Inconsistent Data Querying ElasticSearch #409

@ribeirodba

Here is a test I ran against Elassandra with Python.

I wrote a function that runs a query through the Cassandra Python driver, paging through the results manually:

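from datetime import timedelta
from timeit import default_timer as timer

import pandas as pd
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# `session` is an already-connected cassandra.cluster.Session (created elsewhere).
def process_query_cassandra(query, fetch_size=5000, consistency_level=ConsistencyLevel.LOCAL_ONE):
    start = timer()
    paging_state = None
    rows = []
    while True:
        statement = SimpleStatement(query, fetch_size=fetch_size, consistency_level=consistency_level)
        # Fetch one page, resuming from where the previous page ended.
        results = session.execute(statement, paging_state=paging_state)
        paging_state = results.paging_state
        for row in results.current_rows:
            rows.append(row)
        # A paging state of None means the last page has been read.
        if paging_state is None:
            break
    df = pd.DataFrame(rows)
    end = timer()
    return df, timedelta(seconds=end - start)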

Table f0101 has 872390 rows.

When I query using CQL only, results are OK:

query1 = """
select *
from "dlfinjdep"."f0101"
ALLOW FILTERING
"""

Running Cassandra #1
(22-06-01 12:43) Rows: 872390 seconds: 0:03:17.609349
Running Cassandra #2
(22-06-01 12:46) Rows: 872390 seconds: 0:03:04.289089

However, when I query the Elasticsearch index through CQL, I get fewer rows, and the count changes between runs:

query2 = """
select *
from "dlfinjdep"."f0101"
WHERE es_query='{"query":{"match_all":{}}}'
AND es_options='indices=dlfinjdep-f0101-index'
ALLOW FILTERING
"""

Running Elastic #1
(22-06-01 12:50) Rows: 841350 seconds: 0:03:49.136313
Running Elastic #2
(22-06-01 12:54) Rows: 834372 seconds: 0:03:33.985948
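
To narrow down which rows go missing, the two result sets could be compared by primary key. Below is a minimal sketch of that idea, not something I have run against the real table. It assumes the rows are the driver's default named tuples (collected before the DataFrame conversion), and `id` is only a stand-in for the actual primary key columns of f0101, which are not shown here. `key_of` and `diff_keys` are hypothetical helper names.

```python
from collections import namedtuple


def key_of(row, key_fields):
    """Build a hashable key tuple from a row's key columns."""
    return tuple(getattr(row, field) for field in key_fields)


def diff_keys(cql_rows, es_rows, key_fields):
    """Return (keys present in CQL but absent from ES, number of duplicate ES rows)."""
    cql_keys = {key_of(row, key_fields) for row in cql_rows}
    es_keys = [key_of(row, key_fields) for row in es_rows]
    es_key_set = set(es_keys)
    missing = cql_keys - es_key_set
    duplicates = len(es_keys) - len(es_key_set)
    return missing, duplicates


# Tiny stand-in data; "id" is a hypothetical key column, not the real schema.
Row = namedtuple("Row", ["id", "name"])
cql_rows = [Row(1, "a"), Row(2, "b"), Row(3, "c")]
es_rows = [Row(1, "a"), Row(3, "c"), Row(3, "c")]

missing, duplicates = diff_keys(cql_rows, es_rows, ["id"])
print(sorted(missing), duplicates)
```

Running the same comparison on two consecutive Elastic runs would also show whether the missing keys are stable or differ from run to run.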
