
Beam Search 10.Debug beam_search=True


Error

ValueError: Shape must be rank 2 but is rank 1 for 'model_with_buckets/sequence_loss/sequence_loss_by_example/sampled_softmax_loss/MatMul_1' (op: 'MatMul') with input shapes: [?], [?,256].

This comes from:

File "/Users/higepon/Dropbox/tensorflow_seq2seq_chatbot/lib/seq2seq_model.py", line 120, in sampled_loss num_classes=self.target_vocab_size), which is

return tf.cast(
    tf.nn.sampled_softmax_loss(
        weights=local_w_t,
        biases=local_b,
        labels=labels,
        inputs=local_inputs,
        num_sampled=num_samples,
        num_classes=self.target_vocab_size),
    dtype)

which internally (in _compute_sampled_logits) does:

# inputs has shape [batch_size, dim]
# sampled_w has shape [num_sampled, dim]
# sampled_b has shape [num_sampled]
# Apply X*W'+B, which yields [batch_size, num_sampled]
sampled_logits = math_ops.matmul(
    inputs, sampled_w, transpose_b=True) + sampled_b
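
For reference, a minimal shape sketch of what that matmul expects (the sizes below are assumed for illustration, not the model's real tensors): inputs must be rank 2, [batch_size, dim].

import numpy as np

batch_size, dim, num_sampled = 4, 256, 512

inputs = np.zeros((batch_size, dim))      # rank 2: [batch_size, dim], what the matmul expects
sampled_w = np.zeros((num_sampled, dim))  # [num_sampled, dim]
sampled_b = np.zeros(num_sampled)         # [num_sampled]

# Apply X*W'+B, which yields [batch_size, num_sampled]
sampled_logits = inputs @ sampled_w.T + sampled_b
print(sampled_logits.shape)               # (4, 512)

# If inputs were rank 1 (shape [batch_size] instead of [batch_size, dim]),
# TensorFlow's MatMul would reject it with the "Shape must be rank 2 but is
# rank 1" error quoted above.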

Debug

  • See the inputs and sampled_w values in the function above (one way to print them is sketched after this list):
    • inputs: (?,)
    • sampled_w: (?, 256)
    • sampled_b: (?, )
  • Is 256 = num_sampled?
    • No: the debugger shows num_sampled = 512.
  • To check whether the shape comments above are right, run the same code with beam_search=False:
    • inputs: (?, 256)
    • sampled_w: (?, 256)
    • sampled_b: (?, )
  • Fact: the inputs parameter to _compute_sampled_logits differs between beam_search=True and beam_search=False.
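
One way to get the shapes listed above (a minimal sketch, assuming you are stopped inside _compute_sampled_logits with these tensors in scope):

print("inputs:   ", inputs.get_shape())     # (?, 256) with beam_search=False, (?,) with beam_search=True
print("sampled_w:", sampled_w.get_shape())  # (?, 256)
print("sampled_b:", sampled_b.get_shape())  # (?,)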

Debug 2

  • Why is the inputs argument to _compute_sampled_logits different?
    • (?, 256) for beam_search=False
    • (?, ) for beam_search=True
  • inputs enters sampled_loss(labels, logits) in Seq2SeqModel as the logits parameter,
  • which comes from sequence_loss_by_example as its logits parameter,
  • which in turn comes from sequence_loss as its logits parameter.
  • The correct (?, 256) tensor comes from model_with_buckets as outputs[-1] (where len(outputs) == 1).
  • The (?, ) tensor comes from the same place, and the outputs list has the same length, but the element has a different shape.

Debug 4

The outputs parameter above comes from the following call.

bucket_outputs, *_ = seq2seq(encoder_inputs[:bucket[0]], decoder_inputs[:bucket[1]])

  • Confirmed that seq2seq_f is the function being called.
  • Confirmed that decoder_inputs, encoder_inputs, and do_decode are the same in both cases (the input values are identical).

Debug 5

  • The function then calls embedding_attention_seq2seq directly.
  • All inputs to that function are the same in both cases.

Debug 6

  • That function calls embedding_attention_decoder.
  • All inputs to that function are the same in both cases.

Debug 7

  • If beam_search = False:
    • attention_decoder is called
      • loop_function is None
  • Else:
    • beam_attention_decoder is called
      • the loop_function parameter is different (= _extract_beam_search; a sketch of what this kind of loop_function does follows this list)
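
For reference, a hedged sketch of what a loop_function of this kind does (this is not the actual _extract_beam_search code; the structure follows the projection-and-argmax pattern, and the names are illustrative):

import tensorflow as tf

def make_loop_function(embedding, output_projection):
    def loop_function(prev, i):
        # prev: [batch_size, output_size]; project to the vocabulary -> [batch_size, num_symbols]
        prev = tf.nn.xw_plus_b(prev, output_projection[0], output_projection[1])
        # pick the most likely symbol for each batch element -> [batch_size]
        prev_symbol = tf.argmax(prev, 1)
        # embed it so it can be fed back in as the next decoder input
        return tf.nn.embedding_lookup(embedding, prev_symbol)
    return loop_function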

Debug 8

Understand the basic flow of attention_decoder (for beam_search=False), according to the docstring.

The output is: "A list of the same length as decoder_inputs of 2D Tensors of shape [batch_size x output_size]", which totally makes sense, because this is the output!

The essence of how outputs are made:

inp = decoder_inputs[i] # (?, 256)
# Merge input and previous attentions into one vector of the right size.
x = linear([inp] + attns, input_size, True) # (?, 256)
# Run the RNN.
cell_output, state = cell(x, state) # (?, 256)
output = linear([cell_output] + attns, output_size, True) # (?, 256)
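
For reference, a rough sketch of what linear([inp] + attns, output_size, True) boils down to (simplified; the real helper lives inside TensorFlow's RNN cell internals, and the names here are illustrative): concatenate the arguments on the feature axis and apply one learned affine map.

import tensorflow as tf

def linear_like(args, output_size, scope="linear_like"):
    # args: a list of [batch, dim_i] tensors; concatenated to [batch, sum(dim_i)]
    x = tf.concat(args, 1)
    with tf.variable_scope(scope):
        w = tf.get_variable("weights", [x.get_shape()[1].value, output_size])
        b = tf.get_variable("biases", [output_size])
    return tf.nn.xw_plus_b(x, w, b)  # [batch, output_size]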

Debug 9

Understand the basic flow of beam_attention_decoder and compare it with Debug 8.

The essence:

x = linear([inp] + attns, input_size, True) # (?, 256)
cell_output, state = cell(x, state) # (?, 256)
output = linear([cell_output] + attns, output_size, True) # (?, 256)
tf.argmax(nn_ops.xw_plus_b(output, output_projection[0], output_projection[1]), dimension=1) # ??? this is it

questions

  • output_projection value
    • output_projection[0]: (256, 50000)
    • output_projection[1]: (50000, )
    • tf.matmul(output, output_projection[0]): (?, 50000)
    • tf.matmul(output, output_projection[0]) + output_projection[1] : (?, 500000) DOESNT_MAKE_ANY_SENSE
  • Either
    • I am misreading the literal (500000, ) vs. (?, 50000),
    • or
    • output_projection is broken.
  • tf.argmax of the above is (?, ), which is expected, but apparently the wrong return type.
  • I think (?, 50000) doesn't make sense, because it's a totally different format.
  • What is output_projection?
    • Oh wait, 50000 is the number of symbols, so output_projection produces ([per-symbol scores], ...); if we argmax, we get the most likely token_ids.
    • That doesn't conflict with the beam_attention_decoder doc: "outputs: A list of the same length as decoder_inputs of 2D Tensors of shape [batch_size x output_size]. These represent the generated outputs."
  • Maybe attention_decoder and beam_attention_decoder expect different outputs? e.g. the caller should do the projection and argmax.
    • The code comment says "If we use sampled softmax, we need an output projection."
    • num_samples in the Seq2SeqModel constructor defaults to 512, which is less than vocab_size, so both beam_search=True and beam_search=False should be using output_projection.
  • Check whether the latest attention_decoder is output_projection based.
    • No.
  • Check whether the current caller of attention_decoder is using output_projection.
  • (Currently here) attention_decoder has no output_projection param, but beam_attention_decoder has one; what is the design decision behind this?
  • Hypothesis: we should not apply output_projection before we calculate sampled_loss (the shape argument is sketched after this list).
    • How can I confirm this is true?
      • Check existing implementations?
  • History for tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py - tensorflow/tensorflow
  • Who sets output_projection?
  • Where is output_projection set?
  • Where is it coming from?
  • Is this necessary?
  • Read this
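
To summarize the shape argument behind the hypothesis above (a sketch with assumed sizes, not the real graph): output_projection maps the decoder output from output_size to num_symbols, argmax collapses that to rank-1 token ids, and handing such a rank-1 tensor to sampled_softmax_loss is exactly what produces the "rank 2 but is rank 1" MatMul error from the top of the page.

import numpy as np

batch_size, output_size, num_symbols = 4, 256, 50000

output = np.zeros((batch_size, output_size))  # decoder output: (?, 256)
w = np.zeros((output_size, num_symbols))      # output_projection[0]: (256, 50000)
b = np.zeros(num_symbols)                     # output_projection[1]: (50000,)

logits = output @ w + b                       # (?, 50000): one score per symbol
token_ids = logits.argmax(axis=1)             # (?,): rank 1, the most likely token ids

print(logits.shape, token_ids.shape)          # (4, 50000) (4,)

# As observed above, with beam_search=True it is this rank-1 (?,) tensor that ends up
# as the logits handed to sampled_loss, whereas attention_decoder hands over the
# rank-2 (?, 256) output that sampled_softmax_loss actually expects.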
