Lattice-to-sequence #547

Merged · 43 commits · Nov 19, 2018

Conversation

msperber
Contributor

This is a port of the code from my lattice-to-sequence paper (https://arxiv.org/abs/1704.00559). It includes the following components:

  • preproc.LatticeFromPlfExtractor: creates node-labeled lattices with properly normalized lattice scores from given lattices in .plf format.
  • sent.Lattice / sent.LatticeNode: data structure
  • input_readers.LatticeReader: can read lattices from the output of the extractor
  • LatticeLSTMTransducer: the lattice LSTM
  • attenders.LatticeBiasedMlpAttender: attention that is biased via the lattice confidence scores
  • A unit test with toy data
  • Note: the lattice LSTM currently does not consider lattice scores (unlike the attention), and the attention bias has no peakiness coefficient, since both yielded only minor gains in the paper.
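To make the data structure concrete, here is a minimal, hypothetical sketch of a node-labeled lattice along the lines of what sent.Lattice / sent.LatticeNode represent (the names and fields below are illustrative, not the actual xnmt classes): each node carries a word and a normalized confidence score, edges encode the DAG, and a plain sentence is just the degenerate lattice where node i connects only to node i+1.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class LatticeNode:
    """One node of a node-labeled lattice (illustrative sketch)."""
    word: str
    log_prob: float  # normalized lattice confidence score, log p <= 0
    successors: List[int] = field(default_factory=list)

@dataclass
class Lattice:
    """Nodes are assumed to be listed in topological order."""
    nodes: List[LatticeNode]

    def add_edge(self, src: int, dst: int) -> None:
        self.nodes[src].successors.append(dst)

    def predecessors(self, idx: int) -> List[int]:
        # Inverse of the successor lists; fine for small toy lattices.
        return [i for i, n in enumerate(self.nodes) if idx in n.successors]

# A linear sentence is the special case where node i links only to i+1:
sent = Lattice([LatticeNode(w, 0.0) for w in ["a", "b", "c"]])
for i in range(len(sent.nodes) - 1):
    sent.add_edge(i, i + 1)
```

A lattice LSTM then generalizes the usual recurrence by combining the hidden states of all predecessor nodes (rather than the single previous time step) when computing each node's state.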

There is also a bit of refactoring to make preproc.py more YAML-like, which will not need a review.
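The lattice-biased attention mentioned above (attenders.LatticeBiasedMlpAttender) can be sketched as follows: the per-node lattice log-confidences are added to the attention logits before the softmax, so low-confidence lattice nodes receive proportionally less attention mass. This is an illustrative stand-alone sketch under that assumption, not the actual xnmt implementation (which, per the note above, omits the paper's peakiness coefficient):

```python
import math
from typing import List

def lattice_biased_attention(scores: List[float],
                             log_confidences: List[float]) -> List[float]:
    """Bias attention scores with per-node lattice log-probabilities,
    then normalize with a numerically stable softmax (sketch)."""
    biased = [s + c for s, c in zip(scores, log_confidences)]
    m = max(biased)
    exps = [math.exp(b - m) for b in biased]
    z = sum(exps)
    return [e / z for e in exps]

# Two nodes with equal attention logits but unequal lattice confidence:
# the attention mass ends up proportional to the confidences.
weights = lattice_biased_attention([1.0, 1.0],
                                   [math.log(0.9), math.log(0.1)])
```

With equal logits, the bias makes the attention weights exactly proportional to the node confidences (0.9 vs. 0.1 here); with unequal logits, it interpolates between model attention and lattice confidence in log space.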

@msperber msperber requested a review from neubig November 19, 2018 10:24
@neubig
Contributor

neubig commented Nov 19, 2018

Hi, this basically looks good! I just had a quick clarification that I couldn't resolve immediately by looking at the code, which is how the lattice annotation interacts with batching, etc. Previously we had a somewhat sub-optimal implementation that basically assumed a sentence was a list of word IDs, and downstream operations such as batching were predicated on this. Has this been fixed in this commit, or previously?

I'm mostly asking because @armatthews @cindyxinyiwang and I were also discussing implementing the Eriguchi et al. tree-to-sequence model, and would run into the same problem.

@msperber
Contributor Author

I think this issue should be solved by the introduction of the Batch and Sentence classes, which replace the nested lists of unclear semantics from before. Is that what you were referring to? This PR doesn't change anything about that; it assumes the batch size is set to 1, and once #543 is merged it should be easy to speed things up via auto-batching.

@neubig
Contributor

neubig commented Nov 19, 2018

OK maybe it does, thanks! Anyway, I browsed this briefly and think it's OK to merge, although I haven't checked carefully.

@msperber
Contributor Author

OK, thanks for taking a look!

@msperber msperber merged commit 8ee8fd5 into master Nov 19, 2018