Lines 490 to 497 of edu.stanford.nlp.wordseg.Gale2007ChineseSegmenterFeatureFactory look up character n-grams in outDict and add the results as features:
features.add(outDict.getW(charp+charc)+"outdict"); // -1 0
features.add(outDict.getW(charc+charc2)+"outdict"); // 0 1
features.add(outDict.getW(charp2+charp)+"outdict"); // -2 -1
features.add(outDict.getW(charp2+charp+charc)+"outdict"); // -2 -1 0
features.add(outDict.getW(charp3+charp2+charp)+"outdict"); // -3 -2 -1
...
outDict.getW(String) returns either 0 or 1, so the features actually produced by the sample code above might be [0outdict, 0outdict, 1outdict, 0outdict, 0outdict]. The problem is that in getCliqueFeatures (https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/wordseg/Gale2007ChineseSegmenterFeatureFactory.java#L83), the container for all these features is a HashSet, so features with the same name collapse into a single entry. Only 0outdict and 1outdict can survive, and the information about which character window matched is lost.
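The collapse can be reproduced outside CoreNLP with a minimal sketch. The class OutDictFeatureDemo, its two methods, and the POSITIONS tags are all hypothetical names made up for illustration. The lookup results are hard-coded to match the example list above instead of calling the real outDict.getW. The second method shows one possible way to keep the features distinct, by appending a position tag; this is an assumption, not a fix confirmed by the CoreNLP maintainers.

```java
import java.util.HashSet;
import java.util.Set;

public class OutDictFeatureDemo {
    // Hypothetical position tags, derived from the "// -1 0" style comments
    // in the quoted code; they are not identifiers from CoreNLP.
    static final String[] POSITIONS = {"-1_0", "0_1", "-2_-1", "-2_-1_0", "-3_-2_-1"};

    // Mirrors the quoted code: every lookup result gets the same "outdict" suffix.
    static Set<String> plainFeatures(String[] lookupResults) {
        Set<String> features = new HashSet<>();
        for (String result : lookupResults) {
            features.add(result + "outdict"); // duplicates are silently dropped
        }
        return features;
    }

    // Possible workaround (an assumption): tag each feature with its window.
    static Set<String> positionedFeatures(String[] lookupResults) {
        Set<String> features = new HashSet<>();
        for (int i = 0; i < lookupResults.length; i++) {
            features.add(lookupResults[i] + "outdict" + POSITIONS[i]);
        }
        return features;
    }

    public static void main(String[] args) {
        // Stand-in for the five outDict.getW(...) results from the example above.
        String[] results = {"0", "0", "1", "0", "0"};
        System.out.println(plainFeatures(results).size());      // prints 2
        System.out.println(positionedFeatures(results).size()); // prints 5
    }
}
```

With the shared suffix, five lookups yield only two distinct features, while the position-tagged variant keeps all five.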