This document describes the SQLite export format: a database file containing a collection of dictionary entries together with supporting indexes and tables.
The supporting tables should make it easy to locate articles by searching for words in various forms. The lemma, form and searchtext tables can be deleted without losing information, as this information can also be obtained from the entry.xjtei structure.
All text strings in the database use the UTF-8 encoding.
The entry table holds one row per dictionary entry, that is, an article describing a single "word".
Field | Type | Comment |
---|---|---|
id | int pk | Every entry has a numeric key |
lang | enum('nb', 'nn') | Bokmål or Nynorsk ISO 639-1 code |
pos_id | fk | What kind of word is this (verb, noun,...) |
tei | xml null | The dictionary entry in TEI format |
xjtei | json | The dictionary entry in XJ-TEI format |
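As a minimal sketch of how a row from this table might be read, the following Python uses the standard `sqlite3` and `json` modules. The in-memory schema is an assumption reconstructed from the table above (the real column types may differ), `load_entry` is a hypothetical helper, and the XJ-TEI payload is a placeholder rather than real dictionary data.

```python
import json
import sqlite3


def load_entry(conn, entry_id):
    """Return (lang, pos_id, parsed xjtei) for one entry, or None if missing."""
    row = conn.execute(
        "SELECT lang, pos_id, xjtei FROM entry WHERE id = ?", (entry_id,)
    ).fetchone()
    if row is None:
        return None
    lang, pos_id, xjtei = row
    # xjtei is stored as JSON text, so it has to be decoded before use.
    return lang, pos_id, json.loads(xjtei)


# Stand-in database; with a real export you would open the file instead.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry ("
    "id INTEGER PRIMARY KEY, lang TEXT, pos_id TEXT, tei TEXT, xjtei TEXT)"
)
# Placeholder payload: the actual XJ-TEI structure is not shown here.
conn.execute(
    "INSERT INTO entry VALUES (1, 'nb', 'n', NULL, ?)",
    (json.dumps({"placeholder": True}),),
)

print(load_entry(conn, 1))
```

Since all strings are UTF-8, no extra decoding step should be needed beyond `json.loads`.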
A lemma is the base form of the word described by a dictionary entry (see Wikipedia).
A single article can have multiple lemmas, and the same lemma.orth value can be used by other entries as well.
Field | Type | Comment |
---|---|---|
id | int pk | Each lemma has its own key |
orth | text | The spelling of the word |
entry_id | fk | The corresponding entry |
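Because one spelling can belong to several entries, a lookup by lemma should expect a list of results. The sketch below assumes a schema matching the table above. `entries_for_lemma` is a hypothetical helper and the sample rows are made up for illustration.

```python
import sqlite3


def entries_for_lemma(conn, orth):
    """Return the ids of all entries that have a lemma spelled `orth`."""
    rows = conn.execute(
        "SELECT DISTINCT entry_id FROM lemma WHERE orth = ? ORDER BY entry_id",
        (orth,),
    ).fetchall()
    return [row[0] for row in rows]


# Stand-in database with illustrative rows (not real export data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lemma (id INTEGER PRIMARY KEY, orth TEXT, entry_id INTEGER)"
)
conn.executemany(
    "INSERT INTO lemma VALUES (?, ?, ?)",
    [(1, "bank", 10), (2, "bank", 11), (3, "benk", 11)],
)

print(entries_for_lemma(conn, "bank"))  # two entries share this spelling
```

An index on the orth column would likely help if this lookup is run often.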
POS stands for 'Part of Speech' and is the grammatical class that the word belongs to, such as verb, noun, or adjective.
Field | Type | Comment |
---|---|---|
id | enum('v', 'n',...) | The class of word (v=verb, n=noun,...) |
name | text | 'Verb', 'Substantiv', 'Adjektiv',... |
lang | enum('nb', 'nn') | The language of name |
This expresses the grammatical forms that words of the referenced pos take. For instance, nouns in Norwegian have the following 4 forms:
- "Entall; Ubestemt form"
- "Entall; Bestemt form"
- "Flertall; Ubestemt form"
- "Flertall; Bestemt form"
Field | Type | Comment |
---|---|---|
id | id | Just something unique |
name | text | String like "Entall; Ubestemt form" |
order | int | The natural order for the given pos and lang |
pos_id | fk | The pos this applies to |
lang | enum('nb', 'nn') | The language of name |
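If gram.name contains ";" it denotes an opportunity to join together columns whose names share the same prefix. For instance, the 4 forms above can be presented as two column groups, "Entall" and "Flertall", each with the sub-columns "Ubestemt form" and "Bestemt form".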
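The prefix grouping can be sketched in Python as follows. `group_gram_names` is a hypothetical helper, and it assumes the names are passed in their natural order so that the groups come out in presentation order.

```python
def group_gram_names(names):
    """Group names of the form "prefix; rest" under their shared prefix.

    A name without ";" becomes its own group with an empty sub-column.
    Insertion order is preserved, so groups follow the input order.
    """
    groups = {}
    for name in names:
        prefix, _sep, rest = name.partition(";")
        groups.setdefault(prefix.strip(), []).append(rest.strip())
    return groups


names = [
    "Entall; Ubestemt form",
    "Entall; Bestemt form",
    "Flertall; Ubestemt form",
    "Flertall; Bestemt form",
]
print(group_gram_names(names))
# {'Entall': ['Ubestemt form', 'Bestemt form'], 'Flertall': ['Ubestemt form', 'Bestemt form']}
```

Each key can then be rendered as a header spanning its sub-columns.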
This encodes how a specific lemma of a word is spelled in its various grammatical forms. Several inflection systems can apply to a single word; these are distinguished by the paradigm key. A separate row is filled in for every variation of gram given the word's pos.
Field | Type | Comment |
---|---|---|
lemma_id | fk | Combined key |
gram_id | fk | Combined key |
paradigm | int | Combined key |
orth | text | The spelling of the form |
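To show how these rows might be combined with the grammatical form names, here is a minimal Python sketch that lists the inflections of one lemma per paradigm. The schema is an assumption reconstructed from the tables in this document, with integer ids and the grammatical forms table assumed to be named gram. `inflections` is a hypothetical helper and the sample rows are illustrative only. Note that `order` is a reserved word in SQL, so the column name has to be quoted.

```python
import sqlite3


def inflections(conn, lemma_id):
    """Return (paradigm, form name, spelling) rows in natural order."""
    return conn.execute(
        """
        SELECT form.paradigm, gram.name, form.orth
        FROM form
        JOIN gram ON gram.id = form.gram_id
        WHERE form.lemma_id = ?
        ORDER BY form.paradigm, gram."order"
        """,
        (lemma_id,),
    ).fetchall()


# Stand-in database with illustrative rows (not real export data).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE gram (id INTEGER PRIMARY KEY, name TEXT, "order" INTEGER, '
    "pos_id TEXT, lang TEXT)"
)
conn.execute(
    "CREATE TABLE form (lemma_id INTEGER, gram_id INTEGER, paradigm INTEGER, "
    "orth TEXT, PRIMARY KEY (lemma_id, gram_id, paradigm))"
)
conn.executemany(
    "INSERT INTO gram VALUES (?, ?, ?, 'n', 'nb')",
    [
        (1, "Entall; Ubestemt form", 1),
        (2, "Entall; Bestemt form", 2),
        (3, "Flertall; Ubestemt form", 3),
        (4, "Flertall; Bestemt form", 4),
    ],
)
# Inserted out of order on purpose; the query restores the natural order.
conn.executemany(
    "INSERT INTO form VALUES (1, ?, 1, ?)",
    [(4, "bilene"), (1, "bil"), (3, "biler"), (2, "bilen")],
)

for paradigm, name, orth in inflections(conn, 1):
    print(paradigm, name, orth)
```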
This contains the concatenation of the plain text found in a dictionary entry. It can be used to implement full-text search for dictionary entries that mention a specific word in their description.
Field | Type | Comment |
---|---|---|
entry_id | pk fk | The dictionary entry text is extracted from |
text | text | lemma + forms + etym + defs + cits |
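One simple way to search this table is a substring match with `LIKE`, sketched below in Python. This is only one possible approach, not necessarily how the export is meant to be queried. `search_entries` is a hypothetical helper, the schema is assumed from the table above, and the sample rows are illustrative. The match is on substrings rather than whole words, and SQLite's default `LIKE` is case-insensitive for ASCII characters only.

```python
import sqlite3


def search_entries(conn, word):
    """Return ids of entries whose search text contains `word`."""
    # Escape LIKE wildcards so the input is matched literally.
    escaped = (
        word.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT entry_id FROM searchtext "
        "WHERE text LIKE ? ESCAPE '\\' ORDER BY entry_id",
        (f"%{escaped}%",),
    ).fetchall()
    return [row[0] for row in rows]


# Stand-in database with illustrative rows (not real export data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searchtext (entry_id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO searchtext VALUES (?, ?)",
    [
        (1, "bil bilen biler bilene kjøretøy"),
        (2, "hus huset"),
        (3, "100% sikker"),
    ],
)

print(search_entries(conn, "bil"))
```

For larger databases, a dedicated full-text index would probably scale better than a `LIKE` scan.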