Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New types of tree: ConcurrentPermutermTree, ConcurrentWildcardTree for wildcard queries #3

Open
GoogleCodeExporter opened this issue Jul 3, 2015 · 2 comments

Comments

@GoogleCodeExporter
Copy link

It would be useful to support wildcard queries.

Two approaches to be investigated (both of which will be tracked in this issue):

(1) A permuterm index on top of the ConcurrentRadixTree. This would support 
queries such as "<prefix>*<suffix>" on a single tree. It may be more memory 
efficient than a hash-dictionary approach. See: 
http://nlp.stanford.edu/IR-book/html/htmledition/permuterm-indexes-1.html

(2) A composite of a ConcurrentRadixTree and a ConcurrentReversedRadixTree. One 
tree would support prefix lookup, the other suffix lookup. Query 
"prefix*suffix" may return the intersection of the results from both trees, 
after some post-filtering. This second approach however, is near the territory 
of a query engine on top of multiple indexes, so if implemented would not 
belong in this project, but in http://code.google.com/p/cqengine/

Example usage for (1) would be:

public static void main(String[] args) {
    PermutermTree<Integer> tree = new ConcurrentPermutermTree<Integer>(new DefaultCharArrayNodeFactory());

    tree.put("TEST", 1);
    tree.put("TOAST", 2);
    tree.put("TEAM", 3);

    System.out.println("Keys matching 'T*T': " + Iterables.toString(tree.getKeysMatching("T", "T"))); // prefix, suffix
}


Output would be:
    Keys matching 'T*T': [TOAST, TEST]

Original issue reported on code.google.com by [email protected] on 24 Mar 2013 at 10:19

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant