In the original paper, each program is divided into many short segments (len=50) that are fed into the model in sequence, and the segments interact by reusing the hidden and memory states from the previous segment as initial states. However, two points in this implementation do not fit that description:
Segments from the same program are scattered through the data list because segments are sorted by length during data loading (perhaps to reduce EOF padding?), so they may not even end up in the same batch during the forward pass. How can they interact with each other if they never appear in the same batch?
The initial values of hs and hc are always set to defaults in MixturePointer (ones and None respectively); previous states are never used as initial values, so there is no interaction between segments of the same program.
These are just my own questions about this implementation. Thanks for any explanations or replies.
Hi!
Thank you for your issue! I didn't find this in the original code or in the paper when I worked on this repo, but the authors mention it: "We divide each program into segments consisting of 50 consecutive AST nodes, with the last segment being padded with EOF if it is not full. The LSTM hidden state and memory state are initialized with h0, c0, which are two trainable vectors. The last hidden and memory states from the previous LSTM segment are fed into the next one as initial states if both segments belong to the same program. Otherwise, the hidden and memory states are reset to h0, c0."
In my implementation, h0 and c0 are always just set to default values (ones, as I remember).
I'm not entirely sure whether this helps improve performance, but you can try to fix this issue. You would need to pay attention to the data preparation and training code.
It would be great if you could make a pull request with a fix.
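On the data-preparation side, one hypothetical approach is to stop sorting segments by length and instead batch programs into fixed lanes, so that batch step t holds segment t of each program in the group; `make_segment_batches` and its arguments below are illustrative names, not this repo's API:

```python
def make_segment_batches(programs, seg_len=50, batch_size=32, eof_id=0):
    """Sketch: keep the segments of each program in order within one
    batch lane, so states can be carried between consecutive batches."""
    # Split every program into consecutive seg_len-node segments,
    # padding the last segment with EOF if it is not full.
    segmented = []
    for prog in programs:
        segs = [prog[i:i + seg_len] for i in range(0, len(prog), seg_len)]
        segs[-1] = segs[-1] + [eof_id] * (seg_len - len(segs[-1]))
        segmented.append(segs)

    # Group programs into lanes of batch_size; batch step t contains
    # segment t of every program in the group (exhausted programs are
    # padded with all-EOF segments to keep the batch rectangular).
    for start in range(0, len(segmented), batch_size):
        group = segmented[start:start + batch_size]
        max_segs = max(len(s) for s in group)
        for t in range(max_segs):
            batch = [s[t] if t < len(s) else [eof_id] * seg_len
                     for s in group]
            # `reset` flags lanes whose states must fall back to h0, c0
            # (first segment of a program, or padding after its end).
            reset = [t == 0 or t >= len(s) for s in group]
            yield batch, reset
```

With batches laid out like this, the training loop can feed the previous batch's final states into the next batch, resetting only the lanes flagged by `reset`.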