The original paper says that each program is divided into many short segments (len=50) that are fed into the model one at a time, and that the segments interact by reusing the hidden states and memory states of the previous segment as the initial states of the next one. However, two points in this implementation do not fit that description:
- Segments from the same program are scattered across the data list, because the segments are sorted by length during data loading (maybe to reduce EOF padding, I guess?). They may not even end up in the same batch during the forward pass. How can they interact with each other if they are not in the same batch?
- The initial values of hs and hc are always set to their defaults in MixturePointer (ones and None respectively). The previous states are never used as initial values, so there is no interaction between segments of the same program.
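To make the two points concrete, here is a minimal sketch in plain Python. It is not the repository's code: `split_into_segments`, `toy_step`, and both `run_*` functions are hypothetical names, and the "state" is just a running token count standing in for the LSTM's hs and hc. It only illustrates why carrying state requires segments of one program to be processed in order, and what is lost when every segment starts from the default state after sorting by length.

```python
from collections import defaultdict

SEG_LEN = 50  # segment length quoted from the paper


def split_into_segments(program_id, tokens, seg_len=SEG_LEN):
    """Yield (program_id, segment_index, segment_tokens) in program order."""
    for start in range(0, len(tokens), seg_len):
        yield program_id, start // seg_len, tokens[start:start + seg_len]


def toy_step(state, segment):
    """Stand-in for one LSTM pass over a segment (assumption: state = token count)."""
    return state + len(segment)


def run_with_carried_state(segments):
    """Reuse the previous segment's final state as the next segment's initial state."""
    states = defaultdict(int)  # program_id -> state after its latest segment
    for pid, idx, seg in segments:
        init = 0 if idx == 0 else states[pid]  # reset only at a program's first segment
        states[pid] = toy_step(init, seg)
    return dict(states)


def run_with_default_state(segments):
    """Start every segment from the default state, as described in the second point."""
    states = {}
    for pid, _idx, seg in segments:
        states[pid] = toy_step(0, seg)  # nothing from earlier segments survives
    return states


programs = {"a": list(range(120)), "b": list(range(70))}
segments = [s for pid, toks in programs.items() for s in split_into_segments(pid, toks)]

# In program order, the final state reflects the whole program.
carried = run_with_carried_state(segments)

# Sorting by length (as in the first point) breaks the program order.
by_length = sorted(segments, key=lambda s: len(s[2]))
order = [(pid, idx) for pid, idx, _ in by_length]
reset = run_with_default_state(by_length)

print(carried)
print(order)
print(reset)
```

In this toy setup the carried state covers all tokens of each program, while the default-state run only ever sees one segment at a time, whatever order the sorted list puts it in.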
This is only my own question about this implementation. Thanks for any explanations or replies.