Commit c67b783

simpler way of adding a list of terms
fixes #25
1 parent 38fce72 commit c67b783

3 files changed, with 31 additions and 0 deletions.
README.md

Lines changed: 17 additions & 0 deletions
@@ -98,6 +98,23 @@ And here is an example with a two-level hierarchy:
 
 Note that if the `count` is `1` you can omit it.
 
+Entire lists of tokens can be added for a particular address in one go using `add(address, term_list)`:
+
+```python
+>>> import termdoc
+>>> c = termdoc.HTDM()
+>>> c.add("1.1", ["foo", "bar", "bar", "baz"])
+>>> c.add("1.2", ["foo", "foo"])
+>>> c.get_counts()["bar"]
+2
+>>> c.get_counts()["foo"]
+3
+>>> c.get_counts("1.2")["foo"]
+2
+
+```
+
+
 You can **prune** a HTDM to just `n` levels with the method `prune(n)`.
 
 You can iterate over the document-term counts at the leaves of the HTDM with the method `leaf_entries()` (this returns a generator yielding `(document_address, term, count)` tuples). This is effectively a traditional TDM (the document IDs will still reflect the hierarchy but the aggregate counts aren't present).

termdoc/htdm.py

Lines changed: 4 additions & 0 deletions
@@ -42,6 +42,10 @@ def increment_count(self, address, term, count=1):
             address = self.address_sep.join(address.split(self.address_sep)[:-1])
             first = False
 
+    def add(self, address, term_list):
+        for term in term_list:
+            self.increment_count(address, term)
+
     def load(self, filename, field_sep="\t", address_sep=None, prefix=None):
         address_sep = address_sep or self.address_sep
         with open(filename) as f:
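
Review note: the new `add` method calls `increment_count` once per token, so a repeated term walks up the address hierarchy once per occurrence. The sketch below illustrates that behaviour with `MiniHTDM`, a hypothetical, simplified stand-in written for this note (it is not the real `termdoc.HTDM`; only the address-splitting line and the `add` loop are taken from the diff). The `add_batched` variant is an assumed alternative, not part of this commit, that collapses duplicates with `collections.Counter` before incrementing.

```python
from collections import Counter, defaultdict


class MiniHTDM:
    """Hypothetical, simplified stand-in for termdoc.HTDM (illustration only)."""

    def __init__(self, address_sep="."):
        self.address_sep = address_sep
        # Maps each address ("" is the root) to a Counter of term counts.
        self.counters = defaultdict(Counter)

    def increment_count(self, address, term, count=1):
        # Credit the term to the address and to every ancestor up to the root.
        while True:
            self.counters[address][term] += count
            if not address:
                break
            address = self.address_sep.join(address.split(self.address_sep)[:-1])

    def add(self, address, term_list):
        # Same loop as the method added in this commit: one call per token.
        for term in term_list:
            self.increment_count(address, term)

    def add_batched(self, address, term_list):
        # Assumed alternative (not in the commit): one call per distinct term.
        for term, count in Counter(term_list).items():
            self.increment_count(address, term, count)

    def get_counts(self, prefix=""):
        return self.counters[prefix]


c = MiniHTDM()
c.add("1.1", ["foo", "bar", "bar", "baz"])
c.add("1.2", ["foo", "foo"])
print(c.get_counts()["foo"])       # 3
print(c.get_counts("1.2")["foo"])  # 2
```

Under these assumptions both variants produce the same counts. Batching would only matter for long token lists with many repeats, so the simpler loop in the commit seems a reasonable default.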

tests.py

Lines changed: 10 additions & 0 deletions
@@ -336,6 +336,16 @@ def test_two_arg_increment_count(self):
         self.assertEqual(c.get_counts()["foo"], 3)
         self.assertEqual(c.get_counts()["bar"], 3)
 
+    def test_add(self):
+        import termdoc
+
+        c = termdoc.HTDM()
+        c.add("1", ["foo", "bar", "bar", "baz"])
+        c.add("2", ["foo", "foo", "bar"])
+        self.assertEqual(c.get_counts()["foo"], 3)
+        self.assertEqual(c.get_counts("2")["foo"], 2)
+        self.assertEqual(c.get_counts("1")["bar"], 2)
+
 
 if __name__ == "__main__":
     unittest.main()
