
Beam Search 10.Debug beam_search=True


Error

ValueError: Shape must be rank 2 but is rank 1 for 'model_with_buckets/sequence_loss/sequence_loss_by_example/sampled_softmax_loss/MatMul_1' (op: 'MatMul') with input shapes: [?], [?,256].

This comes from:

File "/Users/higepon/Dropbox/tensorflow_seq2seq_chatbot/lib/seq2seq_model.py", line 120, in sampled_loss num_classes=self.target_vocab_size), which is

return tf.cast(
    tf.nn.sampled_softmax_loss(
        weights=local_w_t,
        biases=local_b,
        labels=labels,
        inputs=local_inputs,
        num_sampled=num_samples,
        num_classes=self.target_vocab_size),
    dtype)

which internally (in _compute_sampled_logits) does:

# inputs has shape [batch_size, dim]
# sampled_w has shape [num_sampled, dim]
# sampled_b has shape [num_sampled]
# Apply X*W'+B, which yields [batch_size, num_sampled]
sampled_logits = math_ops.matmul(
    inputs, sampled_w, transpose_b=True) + sampled_b
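
For reference, a minimal shape sketch of what that matmul expects (the sizes below are assumed for illustration, not the model's real tensors): inputs must be rank 2, [batch_size, dim].

import numpy as np

batch_size, dim, num_sampled = 4, 256, 512

inputs = np.zeros((batch_size, dim))      # rank 2: [batch_size, dim], what the matmul expects
sampled_w = np.zeros((num_sampled, dim))  # [num_sampled, dim]
sampled_b = np.zeros(num_sampled)         # [num_sampled]

# Apply X*W'+B, which yields [batch_size, num_sampled]
sampled_logits = inputs @ sampled_w.T + sampled_b
print(sampled_logits.shape)               # (4, 512)

# If inputs were rank 1 (shape [batch_size] instead of [batch_size, dim]),
# TensorFlow's MatMul would reject it with the "Shape must be rank 2 but is
# rank 1" error quoted above.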

Debug

  • See the inputs and sampled_w values in the function above (one way to print them is sketched after this list):
    • inputs: (?,)
    • sampled_w: (?, 256)
    • sampled_b: (?, )
  • Is 256 = num_sampled?
    • No: the debugger shows num_sampled = 512.
  • To check whether the shape comments above are right, run the same code with beam_search=False:
    • inputs: (?, 256)
    • sampled_w: (?, 256)
    • sampled_b: (?, )
  • Fact: the inputs parameter to _compute_sampled_logits differs between beam_search=True and beam_search=False.
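
One way to get the shapes listed above (a minimal sketch, assuming you are stopped inside _compute_sampled_logits with these tensors in scope):

print("inputs:   ", inputs.get_shape())     # (?, 256) with beam_search=False, (?,) with beam_search=True
print("sampled_w:", sampled_w.get_shape())  # (?, 256)
print("sampled_b:", sampled_b.get_shape())  # (?,)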

Debug 2

  • Why is the inputs argument to _compute_sampled_logits different?
    • (?, 256) for beam_search=False
    • (?, ) for beam_search=True
  • inputs enters sampled_loss(labels, logits) in Seq2SeqModel as the logits parameter,
  • which comes from sequence_loss_by_example as its logits parameter,
  • which in turn comes from sequence_loss as its logits parameter.
  • The correct (?, 256) tensor comes from model_with_buckets as outputs[-1] (where len(outputs) == 1).
  • The (?, ) tensor comes from the same place, and the outputs list has the same length, but the element has a different shape.

Debug 4

The outputs parameter above comes from the following call.

bucket_outputs, *_ = seq2seq(encoder_inputs[:bucket[0]], decoder_inputs[:bucket[1]])

  • Confirmed that seq2seq_f is the function being called.
  • Confirmed that decoder_inputs, encoder_inputs, and do_decode are the same in both cases (the input values are identical).

Debug 5

  • The function then calls embedding_attention_seq2seq directly.
  • All inputs to that function are the same in both cases.

Debug 6

  • That function calls embedding_attention_decoder.
  • All inputs to that function are the same in both cases.

Debug 7

  • If beam_search = False:
    • attention_decoder is called
      • loop_function is None
  • Else:
    • beam_attention_decoder is called
      • the loop_function parameter is different (= _extract_beam_search; a sketch of what this kind of loop_function does follows this list)
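
For reference, a hedged sketch of what a loop_function of this kind does (this is not the actual _extract_beam_search code; the structure follows the projection-and-argmax pattern, and the names are illustrative):

import tensorflow as tf

def make_loop_function(embedding, output_projection):
    def loop_function(prev, i):
        # prev: [batch_size, output_size]; project to the vocabulary -> [batch_size, num_symbols]
        prev = tf.nn.xw_plus_b(prev, output_projection[0], output_projection[1])
        # pick the most likely symbol for each batch element -> [batch_size]
        prev_symbol = tf.argmax(prev, 1)
        # embed it so it can be fed back in as the next decoder input
        return tf.nn.embedding_lookup(embedding, prev_symbol)
    return loop_function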

Debug 8

Understand the basic flow of attention_decoder (for beam_search=False), according to the docstring.

The output is: "A list of the same length as decoder_inputs of 2D Tensors of shape [batch_size x output_size]", which totally makes sense, because this is the output!

The essence of how outputs are made:

inp = decoder_inputs[i] # (?, 256)
# Merge input and previous attentions into one vector of the right size.
x = linear([inp] + attns, input_size, True) # (?, 256)
# Run the RNN.
cell_output, state = cell(x, state) # (?, 256)
output = linear([cell_output] + attns, output_size, True) # (?, 256)
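
For reference, a rough sketch of what linear([inp] + attns, output_size, True) boils down to (simplified; the real helper lives inside TensorFlow's RNN cell internals, and the names here are illustrative): concatenate the arguments on the feature axis and apply one learned affine map.

import tensorflow as tf

def linear_like(args, output_size, scope="linear_like"):
    # args: a list of [batch, dim_i] tensors; concatenated to [batch, sum(dim_i)]
    x = tf.concat(args, 1)
    with tf.variable_scope(scope):
        w = tf.get_variable("weights", [x.get_shape()[1].value, output_size])
        b = tf.get_variable("biases", [output_size])
    return tf.nn.xw_plus_b(x, w, b)  # [batch, output_size]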

Debug 9

Understand the basic flow of beam_attention_decoder and compare it with Debug 8.

The essence:

x = linear([inp] + attns, input_size, True) # (?, 256)
cell_output, state = cell(x, state) # (?, 256)
output = linear([cell_output] + attns, output_size, True) # (?, 256)
tf.argmax(nn_ops.xw_plus_b(output, output_projection[0], output_projection[1]), dimension=1) # ??? this is it

questions

  • output_projection value
    • output_projection[0]: (256, 50000)
    • output_projection[1]: (50000, )
    • tf.matmul(output, output_projection[0]): (?, 50000)
    • tf.matmul(output, output_projection[0]) + output_projection[1] : (?, 500000) DOESNT_MAKE_ANY_SENSE
  • Either
    • I am misreading the literal (500000, ) vs. (?, 50000),
    • or
    • output_projection is broken.
  • tf.argmax of the above is (?, ), which is expected, but apparently the wrong return type.
  • I think (?, 50000) doesn't make sense, because it's a totally different format.
  • What is output_projection?
    • Oh wait, 50000 is the number of symbols, so output_projection produces ([per-symbol scores], ...); if we argmax, we get the most likely token_ids.
    • That doesn't conflict with the beam_attention_decoder doc: "outputs: A list of the same length as decoder_inputs of 2D Tensors of shape [batch_size x output_size]. These represent the generated outputs."
  • Maybe attention_decoder and beam_attention_decoder expect different outputs? e.g. the caller should do the projection and argmax.
    • The code comment says "If we use sampled softmax, we need an output projection."
    • num_samples in the Seq2SeqModel constructor defaults to 512, which is less than vocab_size, so both beam_search=True and beam_search=False should be using output_projection.
  • Check whether the latest attention_decoder is output_projection based.
    • No.
  • Check whether the current caller of attention_decoder is using output_projection.
  • (Currently here) attention_decoder has no output_projection param, but beam_attention_decoder has one; what is the design decision behind this?
  • Hypothesis: we should not apply output_projection before we calculate sampled_loss (the shape argument is sketched after this list).
    • How can I confirm this is true?
      • Check existing implementations?
  • History for tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py - tensorflow/tensorflow
  • Who sets output_projection?
  • Where is output_projection set?
  • Where is it coming from?
  • Is this necessary?
  • Read this
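
To summarize the shape argument behind the hypothesis above (a sketch with assumed sizes, not the real graph): output_projection maps the decoder output from output_size to num_symbols, argmax collapses that to rank-1 token ids, and handing such a rank-1 tensor to sampled_softmax_loss is exactly what produces the "rank 2 but is rank 1" MatMul error from the top of the page.

import numpy as np

batch_size, output_size, num_symbols = 4, 256, 50000

output = np.zeros((batch_size, output_size))  # decoder output: (?, 256)
w = np.zeros((output_size, num_symbols))      # output_projection[0]: (256, 50000)
b = np.zeros(num_symbols)                     # output_projection[1]: (50000,)

logits = output @ w + b                       # (?, 50000): one score per symbol
token_ids = logits.argmax(axis=1)             # (?,): rank 1, the most likely token ids

print(logits.shape, token_ids.shape)          # (4, 50000) (4,)

# As observed above, with beam_search=True it is this rank-1 (?,) tensor that ends up
# as the logits handed to sampled_loss, whereas attention_decoder hands over the
# rank-2 (?, 256) output that sampled_softmax_loss actually expects.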
