
Vanishing features on use of flags.useOutDict2 in Gale2007ChineseSegmenterFeatureFactory  #328

Open
@tilneyyang

Description


Lines 490 to 497 of edu.stanford.nlp.wordseg.Gale2007ChineseSegmenterFeatureFactory build features by looking up character n-grams around the current position in the out dictionary (outDict):

     features.add(outDict.getW(charp+charc)+"outdict");             // -1 0
     features.add(outDict.getW(charc+charc2)+"outdict");            // 0 1
     features.add(outDict.getW(charp2+charp)+"outdict");            // -2 -1
     features.add(outDict.getW(charp2+charp+charc)+"outdict");      // -2 -1 0
     features.add(outDict.getW(charp3+charp2+charp)+"outdict");     // -3 -2 -1
     ...

https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/wordseg/Gale2007ChineseSegmenterFeatureFactory.java#L490
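To see what strings these lines actually emit, here is a small standalone sketch that copies the five quoted lines. It is not CoreNLP code: StubDict is a hypothetical stand-in for outDict whose getW returns 1 for a known word and 0 otherwise (matching the 0-or-1 behaviour described below), and the dictionary contents and context characters are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class OutDictFeatureDemo {

    /** Hypothetical stand-in for outDict: getW returns 1 if the string is a known word, else 0. */
    static final class StubDict {
        private final Set<String> words;

        StubDict(Set<String> words) {
            this.words = words;
        }

        int getW(String s) {
            return words.contains(s) ? 1 : 0;
        }
    }

    /** Copies the five quoted feature lines; the elided "..." lines are not reproduced. */
    static List<String> outDictFeatures(StubDict outDict, String charp3, String charp2,
                                        String charp, String charc, String charc2) {
        List<String> features = new ArrayList<>();
        features.add(outDict.getW(charp + charc) + "outdict");            // -1 0
        features.add(outDict.getW(charc + charc2) + "outdict");           // 0 1
        features.add(outDict.getW(charp2 + charp) + "outdict");           // -2 -1
        features.add(outDict.getW(charp2 + charp + charc) + "outdict");   // -2 -1 0
        features.add(outDict.getW(charp3 + charp2 + charp) + "outdict"); // -3 -2 -1
        return features;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary in which only "中国" is a known word.
        StubDict dict = new StubDict(Set.of("中国"));
        // Hypothetical context: charp3="我", charp2="中", charp="国", charc="人", charc2="民".
        // Only the -2 -1 window ("中国") matches, so it prints
        // [0outdict, 0outdict, 1outdict, 0outdict, 0outdict]
        System.out.println(outDictFeatures(dict, "我", "中", "国", "人", "民"));
    }
}
```

Note that the feature name carries only the 0/1 lookup result; nothing in the string records which window (-1 0, 0 1, ...) produced it.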

Because outDict.getW(String) returns only 0 or 1, the features actually produced by the code above might be [0outdict, 0outdict, 1outdict, 0outdict, 0outdict]. The problem is that getCliqueFeatures (https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/wordseg/Gale2007ChineseSegmenterFeatureFactory.java#L83) collects all of these features in a HashSet, so features with the same name collapse into a single entry. The five features above reduce to just {0outdict, 1outdict}, and the information about which character window matched the dictionary is lost.
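The collapse can be reproduced with a plain HashSet. The sketch below takes the five feature strings from the example above and adds them to a HashSet, as the issue reports getCliqueFeatures does. It also shows one possible remedy, which is an assumption and not the project's fix: appending a hypothetical position tag (taken from the comments in the quoted code) to each name so the features stay distinct.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OutDictCollisionDemo {

    // The five feature strings from the example in the issue, in emission order.
    static final List<String> FEATURES =
        Arrays.asList("0outdict", "0outdict", "1outdict", "0outdict", "0outdict");

    // Hypothetical position tags mirroring the comments in the quoted code.
    static final List<String> POSITIONS =
        Arrays.asList("-1,0", "0,1", "-2,-1", "-2,-1,0", "-3,-2,-1");

    /** Collects features into a HashSet; duplicate names are silently dropped. */
    static Set<String> collect(List<String> features) {
        return new HashSet<>(features);
    }

    /** Hypothetical variant: appends each feature's window tag so the names stay distinct. */
    static List<String> withPositions(List<String> features, List<String> positions) {
        List<String> tagged = new ArrayList<>();
        for (int i = 0; i < features.size(); i++) {
            tagged.add(features.get(i) + "@" + positions.get(i));
        }
        return tagged;
    }

    public static void main(String[] args) {
        Set<String> plain = collect(FEATURES);
        Set<String> tagged = collect(withPositions(FEATURES, POSITIONS));
        System.out.println(plain.size() + " distinct untagged features");          // 2
        System.out.println(tagged.size() + " distinct position-tagged features"); // 5
    }
}
```

Keeping the window in the feature name lets the model learn a separate weight for each n-gram position instead of one shared weight for "some window matched" and one for "some window did not".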
