logs5: Consolidate logs
- Log 1: what specific output am I working on right now?
  - The current log system has some issues:
    - Can't see the whole log, and too many log files are generated
    - easy_tf_log and the normal log co-exist
- Log 2: thinking out loud, e.g. hypotheses about the current problem, what to work on next
  - Read the documentation first
  - hostname might be the issue
  - Try all the log files to see whether the most recent one is actually there? Or download them?
- Log 3: record of currently ongoing runs, along with a short reminder of what question each run is supposed to answer
  - Reading TensorBoard: Visualizing Learning | TensorFlow
  - Reading tf.summary.FileWriter | TensorFlow
  - Reading tensorflow/tensorboard: TensorFlow's Visualization Toolkit
    - Important, from the TensorBoard README: "Why does it read the whole directory, rather than an individual file? You might have been using supervisor.py to run your model, in which case if TensorFlow crashes, the supervisor will restart it from a checkpoint. When it restarts, it will start writing to a new events file, and TensorBoard will stitch the various event files together to produce a consistent history of what happened."
  - Reading tensorflow/event_file_writer.py at r1.7 · tensorflow/tensorflow
    - The filename naming rules are written in C++:

      ```c++
      filename_ =
          strings::Printf("%s.out.tfevents.%010lld.%s%s", file_prefix_.c_str(),
                          static_cast<int64>(time_in_seconds),
                          port::Hostname().c_str(), file_suffix_.c_str());
      ```
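
Given that naming scheme, a quick way to check whether the most recent events file is actually present (the question in Log 2 above) is to parse the zero-padded timestamp out of each filename. A minimal sketch, assuming a hypothetical `train_logs` directory; the regex just mirrors the Printf pattern above:

```python
import glob
import os
import re

# The Printf above produces names like
# "events.out.tfevents.1524000000.<hostname>"; the zero-padded field is the
# creation time in seconds. ("train_logs" is a hypothetical directory name.)
TFEVENTS_RE = re.compile(r"\.out\.tfevents\.(\d+)\.")

def newest_event_file(log_dir):
    """Return the events file whose embedded timestamp is the latest."""
    candidates = []
    for path in glob.glob(os.path.join(log_dir, "*tfevents*")):
        match = TFEVENTS_RE.search(os.path.basename(path))
        if match:
            candidates.append((int(match.group(1)), path))
    return max(candidates)[1] if candidates else None

print(newest_event_file("train_logs"))
```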
  - We should check how TensorFlow names the file → see the C++ snippet above.
  - It turns out tflog and the normal log write different event files, and they work fine together with TensorBoard as long as they are in the same directory. See the graphs and the sketch below.
  - (TensorBoard graphs: tflog and the normal log rendered together)
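
For reference, a minimal sketch of the two loggers sharing one directory, assuming TF 1.x (matching the r1.7 source above) and easy_tf_log's `set_dir`/`tflog` helpers; `train_logs` is a placeholder:

```python
import tensorflow as tf  # TF 1.x
import easy_tf_log

LOG_DIR = "train_logs"  # hypothetical; the point is that both loggers share it

# easy_tf_log writes its own events file into LOG_DIR...
easy_tf_log.set_dir(LOG_DIR)
easy_tf_log.tflog("reward", 0.5)

# ...and a plain FileWriter writes a second events file alongside it.
writer = tf.summary.FileWriter(LOG_DIR)
summary = tf.Summary(value=[tf.Summary.Value(tag="loss", simple_value=1.23)])
writer.add_summary(summary, global_step=0)
writer.flush()

# TensorBoard pointed at LOG_DIR stitches both event files into one run.
```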
- Observations:
  - The msec data looks wrong. We should check.
    - Actually it makes sense: the peaks are when saving model files.
    - It's going up because the average length goes up (more computation when doing beam search).
  - avg_len goes up, which may be good.
  - The reward didn't go up much.
  - Okay, I think the loss and reward graphs look very similar, which is not good at all.
    - I'm taking a look at the reward implementation.
    - If I remember correctly, we should normalize the reward. Let's do it.
    - Let's divide by max_len (see the sketch below).
    - It would be great if we could match steps, somehow.
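
A minimal sketch of the divide-by-max_len idea; `MAX_LEN` and the helper name are placeholders, not the project's actual code:

```python
import numpy as np

MAX_LEN = 25  # hypothetical sequence-length cap; use the model's real setting

def normalize_rewards(rewards, max_len=MAX_LEN):
    """Divide raw episode rewards by max_len so the reward scale stays
    comparable as avg_len grows."""
    return np.asarray(rewards, dtype=np.float32) / float(max_len)

print(normalize_rewards([3.0, 7.5, 12.0]))  # -> [0.12 0.3  0.48]
```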
- Log 4: results of runs (TensorBoard graphs, any other significant observations), separated by type of run (e.g. by environment the agent is being trained in)


