vecsim index performance: hnswlib::L2SqrSIMD4Ext takes 39.35% of on-CPU cycles (_mm_add_ps and _mm_loadu_ps are related to it) #28

@filipecosta90

Sample RDB, loadable via the vecsim search branch:

s3://benchmarks.redislabs/redisearch/vecsim/ann-benchmarks/glove-100-angular/dump.rdb

Sample command (an HSET that adds one vector to the index):

627473461.931705 [0 127.0.0.1:41480] "HSET" "ann_14541" "vector" "\x98Q\xec\xbd\x84\x81\xf7>\xb4<o\xbe*\xe3\xbf>m\xc5\xde\xbe\xf7u\b\xbfK\x02\x14>V\x82%\xbe\x02\x829>W\xb1x<y\x1e$?\xd69\xd6>\xa8Wz?;\xdf\x87\xbf\xa3\x92\xca?=\xb8\xdb\xbe\xcc\xb45?\xdc\x7f\xa4=\x0eJ(?\x97sy?W\xcf\x01?+\xa4\x9c\xbeKY\xa6\xbe\xdb3\xbb>\xee_\xb9>\x9c\xdc\xdf>\xb0\xac4\xbd\x92\a\x82\xbd\xd6s\xb2\xbeU\xde\x1e?E\rN?e\x01\x93<\x1f\xbf7>\xa1\xd6\x14\xbf\x97\xa8\xde\xbe\x1d8\xf7\xbeX9\x84>\xa0\x1a\x97?\x12\xa2\xfc\xbc\xd9|\xac>\x9c\xa7\xba\xbe\xbb\xf2Y>=~\xcf\xbe\x93\xa9\x02>\xa0\xe0\x82\xbe\xffx\xef=u\xc85\xbf\x05\xa2\xa7\xbd\xb1\x16_\xbeJ\x98\t?\xe6t\x19\xbe\xef\xac\xdd=\xf4\xa6:?\xf6E\x02\xbe\x9f\xc8\x0b?ms\x83\xbe\xc8\xef\x05\xbfi5\x04\xbe\xe2X\a?O\xcc\x9a>*o\x17?_{.?8g\x94?\xee\xb1\x84\xbe\xdf\xa6W\xbf>\"\xe6\xbdk\x9a7\xbd\xfb\\M>\x1f\x11\xd3\xbe\\ y?\xcb\xdb1>\"\xc3\x12?+\x18\x95\xbe\xf2\xb5\x1f\xbf\xc4%\xb7\xbe\xb5\x1a\x1a?WC\xa2>\xb9\x88o>$\xd6J?\x89_1\xbc_{F>\x89\xb4\x8d\xbdI.\x8f\xbf\xa2b\x9c>\xaa\xb7\xd6\xbe\xbb\x0f ?z\xc2r>\xa1\x10\xc1\xbe\xda \x03\xbf\xbe\xf6\x8c>\xa1\x83\xae\xbc\xbaI\xb4?\x96&\x85\xbd\xc4\xeb\xca\xbe\xee\xce\x1a?\xf47\x81\xbe\x89\b\xbf\xbd\x99\r\xe2>;p\x8e\xbe/\xf8t="

Top on-CPU consumers:

| Flat | Flat% | Sum% | Cum | Cum% | Name | Inlined? |
|---|---|---|---|---|---|---|
| 38718330030 | 17.65% | 17.65% | 38766179198 | 17.67% | `_mm_loadu_ps` | (inline) |
| 18647707936 | 8.50% | 26.15% | 18666112242 | 8.51% | `_mm_add_ps` | (inline) |
| 15234704835 | 6.94% | 33.09% | 86321125827 | 39.35% | `hnswlib::L2SqrSIMD4Ext` | |
| 10545240828 | 4.81% | 37.90% | 10556520348 | 4.81% | `_mm_mul_ps` | (inline) |
| 3086731940 | 1.41% | 39.31% | 3086731940 | 1.41% | `_mm_sub_ps` | (inline) |
| 0 | 0.00% | 39.31% | 86317305756 | 39.34% | `vectorIndexer` | (inline) |
| 0 | 0.00% | 39.31% | 86317305756 | 39.34% | `moduleNotifyKeyspaceEvent` | |
| 0 | 0.00% | 39.31% | 86317305756 | 39.34% | `indexBulkFields` | |
| 0 | 0.00% | 39.31% | 68849073985 | 31.38% | `hnswlib::HierarchicalNSW::searchBaseLayer` | |
| 0 | 0.00% | 39.31% | 12029285653 | 5.48% | `hnswlib::HierarchicalNSW::mutuallyConnectNewElement` | |
| 0 | 0.00% | 39.31% | 86258259186 | 39.32% | `hnswlib::HierarchicalNSW::addPoint` | |
| 0 | 0.00% | 39.31% | 86317305756 | 39.34% | `Indexes_UpdateMatchingWithSchemaRules` | |
| 0 | 0.00% | 39.31% | 86317305756 | 39.34% | `Indexer_Process` | |
| 0 | 0.00% | 39.31% | 86317305756 | 39.34% | `Indexer_Add` | |
| 0 | 0.00% | 39.31% | 86317305756 | 39.34% | `IndexerBulkAdd` | |
| 0 | 0.00% | 39.31% | 86317305756 | 39.34% | `IndexSpec_UpdateDoc` | |
| 0 | 0.00% | 39.31% | 86317305756 | 39.34% | `HashNotificationCallback` | |
| 0 | 0.00% | 39.31% | 86317305756 | 39.34% | `HNSWIndex_AddVector` | |
| 0 | 0.00% | 39.31% | 86317305756 | 39.34% | `Document_AddToIndexes` | |
| 0 | 0.00% | 39.31% | 86317305756 | 39.34% | `AddDocumentCtx_Submit` | |
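
For readers unfamiliar with how the four inlined intrinsics in this profile fit together, here is a minimal sketch of a 4-wide SSE squared-L2 loop. It is not hnswlib's source: the function name `l2_sqr_sse4_sketch` is hypothetical, the code assumes `dim` is a multiple of 4 (which would hold if the vectors are 100-dimensional, as the dataset name suggests), and it falls back to a scalar loop when SSE is unavailable. Each iteration issues two `_mm_loadu_ps`, one `_mm_sub_ps`, one `_mm_mul_ps` and one `_mm_add_ps`, which may explain why the loads rank first in the table above.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#endif

// Hypothetical helper for illustration only (not hnswlib's implementation).
// Returns the squared L2 distance between a and b over `dim` floats.
// Assumption: dim is a multiple of 4.
float l2_sqr_sse4_sketch(const float* a, const float* b, std::size_t dim) {
#ifdef SKETCH_HAS_SSE
    __m128 sum = _mm_setzero_ps();
    for (std::size_t i = 0; i < dim; i += 4) {
        // Two unaligned loads per 4 floats: one from each input vector.
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 diff = _mm_sub_ps(va, vb);
        // Accumulate the squared differences lane by lane.
        sum = _mm_add_ps(sum, _mm_mul_ps(diff, diff));
    }
    // Horizontal reduction of the four partial sums.
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, sum);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#else
    // Scalar fallback for targets without SSE.
    float sum = 0.0f;
    for (std::size_t i = 0; i < dim; ++i) {
        float diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
#endif
}
```

Under these assumptions, one distance computation on a 100-dimensional vector runs 25 loop iterations, so any reduction in the number of distance calls made from `searchBaseLayer` would shrink all four intrinsic rows together.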

Flame chart detail of hnswlib::L2SqrSIMD4Ext CPU cycles:

Link:
https://s3.amazonaws.com/benchmarks.redislabs/redisearch/vecsim/perf-tasks/ann-benchmarks/glove-100-angular/ann-benchmark-indexing.svg
