Document Import and Indexing

nickmccoy edited this page Mar 5, 2018 · 1 revision

Schema for indexed data

  • document - Contains information about each indexed document. Indexing a document inserts one row into this table.

  • term - Contains the unique terms of all documents.

  • gram - Contains the grams for a document, either unigrams or bigrams.

  • idf -

  • tf - Cross-reference table linking a document to its terms/grams.
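
As a rough illustration of how the document, term, and tf tables might relate, here is a minimal sketch using Python's built-in sqlite3 module. Only the table names come from this page. Every column name, the `index_document` helper, and the use of SQLite are assumptions made for the example; the project's actual schema and database engine may differ. The gram and idf tables are left out because the page does not describe their contents in enough detail.

```python
import sqlite3
from collections import Counter

# Hypothetical columns: only the table names (document, term, tf) are from the wiki page.
SCHEMA = """
CREATE TABLE document (
    id    INTEGER PRIMARY KEY,
    title TEXT
);
CREATE TABLE term (
    id    INTEGER PRIMARY KEY,
    value TEXT UNIQUE NOT NULL      -- each distinct term is stored once
);
CREATE TABLE tf (
    document_id INTEGER NOT NULL REFERENCES document(id),
    term_id     INTEGER NOT NULL REFERENCES term(id),
    frequency   INTEGER NOT NULL,   -- occurrences of the term in the document
    PRIMARY KEY (document_id, term_id)
);
"""


def index_document(conn, title, tokens):
    """Insert one document row, any new terms, and one tf row per distinct term."""
    doc_id = conn.execute(
        "INSERT INTO document (title) VALUES (?)", (title,)
    ).lastrowid
    for value, frequency in Counter(tokens).items():
        # Terms are unique across all documents, so skip ones already stored.
        conn.execute("INSERT OR IGNORE INTO term (value) VALUES (?)", (value,))
        term_id = conn.execute(
            "SELECT id FROM term WHERE value = ?", (value,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO tf (document_id, term_id, frequency) VALUES (?, ?, ?)",
            (doc_id, term_id, frequency),
        )
    return doc_id
```

To try it, open an in-memory database with `sqlite3.connect(":memory:")`, run `conn.executescript(SCHEMA)`, and call `index_document(conn, "some title", ["list", "of", "tokens"])`. Tokenization is left to the caller in this sketch.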

Table indexes

SQL Functions

  • insert_terms
  • insert_bigram_df

Table Partitioning

Parallel Processing