Stack trie

The stack trie gives a quick orientation on where all the compilations in a model take place, especially if you are compiling a codebase you are unfamiliar with. It is a tree of stack frames for all stacks that triggered PT2 compilation. If only a single stack is in the tree, you will simply see a plain list of frames (most recent call last). With multiple stacks, at every point where two stacks diverge from a common prefix, we increase the indentation of the list and start a separate sub-list for each sub-tree.

Links to particular compilations are color-coded by status: [Success], [Success with restart (e.g., graph break)], [Empty graph], [Error], [Metrics were missing]
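The layout rule above can be sketched in a few lines of Python. A chain of frames with a single continuation stays at the same indentation, and each point of divergence opens an indented, hyphenated sub-list per sub-tree. This is an illustration under our own assumptions, not the code that generated this report. Frames are plain strings ordered outermost first, the helper names `build_trie` and `render` are hypothetical, and compile-id links and status colors are omitted.

```python
from typing import Dict, List

# A trie node maps a frame string to the sub-trie of frames called from it.
Trie = Dict[str, "Trie"]


def build_trie(stacks: List[List[str]]) -> Trie:
    """Insert each stack (outermost frame first) so shared prefixes are stored once."""
    root: Trie = {}
    for stack in stacks:
        node = root
        for frame in stack:
            node = node.setdefault(frame, {})
    return root


def render(node: Trie, indent: int = 0) -> List[str]:
    """Single-child chains stay flat; at a divergence, each child gets '- ' and deeper indent."""
    lines: List[str] = []
    branching = len(node) > 1
    for frame, child in node.items():
        if branching:
            # Stacks diverge here: one hyphenated sub-list per sub-tree.
            lines.append(" " * indent + "- " + frame)
            lines.extend(render(child, indent + 2))
        else:
            # Common prefix: continue the plain list at the same indentation.
            lines.append(" " * indent + frame)
            lines.extend(render(child, indent))
    return lines


if __name__ == "__main__":
    stacks = [["main", "a", "x"], ["main", "a", "y"], ["main", "b"]]
    print("\n".join(render(build_trie(stacks))))
```

For example, the three stacks in the `__main__` block render as `main`, `- a`, `  - x`, `  - y`, `- b`. The shared prefix `main` is printed once, and the divergences under `main` and under `a` each get their own sub-lists. A single stack renders as a plain list with no hyphens.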

/workspace/tools/eval.py:135 in <module>
/workspace/tools/eval.py:130 in main
/workspace/tools/eval.py:30 in main_worker
/workspace/networks/managers/evaluator.py:469 in evaluating
- /workspace/networks/engines/aotv3_engine.py:645 in add_reference_frame
- /workspace/networks/engines/aotv3_engine.py:206 in add_reference_frame
  - /workspace/networks/engines/aotv3_engine.py:136 in encode_one_img_mask
  - /workspace/networks/models/aotv3.py:163 in encode_image
  - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1552 in _wrapped_call_impl
  - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1561 in _call_impl
  - /opt/conda/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py:432 in _fn
  - [0/0] [0/1] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
- /workspace/networks/engines/aotv3_engine.py:238 in add_reference_frame
  - /workspace/networks/models/aotv3.py:189 in LSTT_forward
  - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1552 in _wrapped_call_impl
  - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1561 in _call_impl
  - /opt/conda/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py:432 in _fn
  - [1/0] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
  - /workspace/networks/layers/transformer.py:582 in forward
    - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1552 in _wrapped_call_impl
    - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1561 in _call_impl
    - [2/0] [2/1] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
    - /workspace/networks/layers/transformer.py:753 in forward
      - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1552 in _wrapped_call_impl
        /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1561 in _call_impl
        /opt/conda/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py:432 in _fn
        [3/0] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
        /workspace/networks/layers/attention.py:314 in forward
        /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1552 in _wrapped_call_impl
        /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1561 in _call_impl
        [4/0] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
        /opt/conda/lib/python3.11/site-packages/spatial_correlation_sampler/spatial_correlation_sampler.py:104 in forward
        /opt/conda/lib/python3.11/site-packages/torch/autograd/function.py:573 in apply
        [5/0] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
        /opt/conda/lib/python3.11/site-packages/spatial_correlation_sampler/spatial_correlation_sampler.py:47 in forward
        [6/0] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
        [7/0] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
      - [8/0] [8/1] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
  - /workspace/networks/layers/transformer.py:603 in forward
    - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/container.py:294 in __getitem__
    - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/container.py:280 in __init__
    - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/container.py:321 in __iadd__
    - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/container.py:398 in extend
    - [9/0] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
  - /workspace/networks/layers/transformer.py:615 in forward
    - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1552 in _wrapped_call_impl
    - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1561 in _call_impl
    - [10/0] [10/2] [10/4] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
  - /workspace/networks/layers/transformer.py:622 in forward
    - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1552 in _wrapped_call_impl
    - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1561 in _call_impl
    - [10/1] [10/3] [10/5] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
  - /workspace/networks/layers/transformer.py:634 in forward
    - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1552 in _wrapped_call_impl
    - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1561 in _call_impl
    - [2/2] [2/3] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
    - /workspace/networks/layers/transformer.py:753 in forward
      - [8/2] [8/3] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
      - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1552 in _wrapped_call_impl
        /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1561 in _call_impl
        /opt/conda/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py:432 in _fn
        [3/1] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
        /workspace/networks/layers/attention.py:314 in forward
        /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1552 in _wrapped_call_impl
        /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1561 in _call_impl
        [4/1] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
        /opt/conda/lib/python3.11/site-packages/spatial_correlation_sampler/spatial_correlation_sampler.py:104 in forward
        /opt/conda/lib/python3.11/site-packages/torch/autograd/function.py:573 in apply
        [5/1] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
        [7/1] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
/workspace/networks/managers/evaluator.py:474 in evaluating
- /workspace/networks/engines/aotv3_engine.py:657 in match_propogate_one_frame
- /workspace/networks/engines/aotv3_engine.py:384 in match_propogate_one_frame
- /workspace/networks/models/aotv3.py:189 in LSTT_forward
- /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1552 in _wrapped_call_impl
- /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1561 in _call_impl
- /opt/conda/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py:432 in _fn
- /workspace/networks/layers/transformer.py:582 in forward
  - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1552 in _wrapped_call_impl
  - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1561 in _call_impl
  - [2/4] [2/5] [2/8] [2/9] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
  - /workspace/networks/layers/transformer.py:753 in forward
    - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1552 in _wrapped_call_impl
      - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1561 in _call_impl
      - /opt/conda/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py:432 in _fn
      - [3/2] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
      - /workspace/networks/layers/attention.py:314 in forward
        [7/2] [7/4] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
    - [8/4] [8/5] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
- /workspace/networks/layers/transformer.py:634 in forward
  - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1552 in _wrapped_call_impl
  - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1561 in _call_impl
  - [2/6] [2/7] [2/10] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
  - /workspace/networks/layers/transformer.py:753 in forward
    - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1552 in _wrapped_call_impl
      - /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:1561 in _call_impl
      - /opt/conda/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py:432 in _fn
      - [3/3] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
      - /workspace/networks/layers/attention.py:314 in forward
        [7/3] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__
    - [8/6] /opt/conda/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py:1115 in __call__

IR dumps

The IR dumps collect intermediate products dumped from various points of the PT2 compilation process. The products are organized by compile id and then sorted in chronological order.

A compile id uniquely identifies a particular compilation inside a PT2 program. It is traditionally written as [x/y]. The frame id x identifies the particular Python frame we are compiling, and the frame compile id y counts how many times we've recompiled this same frame. For example, [0/0] refers to the very first frame compiled by PT2, and [0/1] refers to the first recompilation of that frame. [1/0] refers to a different frame, with a distinct code cache, which we compile next (perhaps because of a graph break). Dynamo treats distinct frames as completely unrelated, but one frame's compilation can still overlap with another's. For example, if you graph break in an inlined function, Dynamo will typically try again to compile the nested function as its own, inner frame. You can identify the hierarchical relationship between frames by looking at the stack trie above.
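As a small illustration of the [x/y] notation, the sketch below extracts compile ids from text such as the stack trie labels. For each frame id x, it reports the highest frame compile id y seen, which is the number of recompilations of that frame. The helper names are hypothetical, and the parser handles only ids without the attempt suffix. It assumes every (re)compilation of a frame appears in the input.

```python
import re
from typing import Dict, List, Tuple

# "[x/y]": x is the frame id, y is the frame compile id (0 = first compilation).
_COMPILE_ID = re.compile(r"\[(\d+)/(\d+)\]")


def parse_compile_ids(text: str) -> List[Tuple[int, int]]:
    """Return every (frame_id, frame_compile_id) pair found in `text`, in order."""
    return [(int(x), int(y)) for x, y in _COMPILE_ID.findall(text)]


def recompile_counts(text: str) -> Dict[int, int]:
    """Map each frame id to the highest frame compile id seen for it.

    [x/0] is the first compilation and [x/1] the first recompilation, so the
    maximum y equals the number of recompilations of frame x, provided every
    compilation of that frame appears in `text`.
    """
    counts: Dict[int, int] = {}
    for frame_id, frame_compile_id in parse_compile_ids(text):
        counts[frame_id] = max(counts.get(frame_id, 0), frame_compile_id)
    return counts


if __name__ == "__main__":
    # Labels copied from the two transformer.py call sites in the stack trie.
    labels = "[10/0] [10/2] [10/4] [10/1] [10/3] [10/5]"
    print(recompile_counts(labels))
```

Applied to the trie labels `[10/0] [10/2] [10/4]` and `[10/1] [10/3] [10/5]`, it reports `{10: 5}`. Frame 10 was compiled six times in total, and those compilations are interleaved across two call sites in transformer.py.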

In some situations, the compile id will have an extra signifier [x/y_z], where z is the attempt number for this particular (re)compilation. Certain conditions cause Dynamo to restart analysis, when it discovers that it needs to undo a decision it previously made. The most common cause of a restart is a graph break in an inlined function call. This forces Dynamo to restart and avoid inlining the function in the first place.
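The sketch below parses both forms of the compile id and lists which (re)compilations had to be restarted. The names are hypothetical, and the code follows the notation described here rather than any library's internals. Treating a missing `_z` suffix as attempt 0 is our assumption. Ordering the dataclass by (x, y, z) also gives a natural way to sort ids.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# "[x/y]" or "[x/y_z]"; the "_z" attempt suffix is optional.
_PATTERN = re.compile(r"\[(\d+)/(\d+)(?:_(\d+))?\]")


@dataclass(frozen=True, order=True)
class CompileId:
    frame_id: int          # x: which Python frame is being compiled
    frame_compile_id: int  # y: how many times this frame was recompiled before
    attempt: int = 0       # z: assumed to be 0 when no suffix is written

    @classmethod
    def parse(cls, text: str) -> "CompileId":
        match = _PATTERN.fullmatch(text.strip())
        if match is None:
            raise ValueError(f"not a compile id: {text!r}")
        x, y, z = match.groups()
        return cls(int(x), int(y), int(z) if z is not None else 0)

    def __str__(self) -> str:
        base = f"{self.frame_id}/{self.frame_compile_id}"
        return f"[{base}_{self.attempt}]" if self.attempt else f"[{base}]"


def restarted(ids: Iterable[CompileId]) -> List[Tuple[int, int]]:
    """Return the (x, y) (re)compilations that needed at least one restart."""
    return sorted({(c.frame_id, c.frame_compile_id) for c in ids if c.attempt > 0})


if __name__ == "__main__":
    ids = [CompileId.parse(s) for s in ["[3/1]", "[2/0_1]", "[2/0]"]]
    print([str(c) for c in sorted(ids)], restarted(ids))
```

For instance, given `[2/0]`, `[2/0_1]` and `[3/1]`, `restarted` returns `[(2, 0)]`. Only compilation [2/0] needed a second attempt.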

Here is a high-level description of PT2's compilation phases and the intermediate products each phase generates:

Optional: If compiled autograd is enabled and we are processing a backward call, compiled autograd will trace the autograd graph from the autograd engine and produce an FX graph, compiled_autograd_graph, which Dynamo will then trace. Otherwise, Dynamo will directly trace the user's bytecode.

Dynamo symbolically evaluates the Python bytecode of a program, producing dynamo_output_graph.

Optional: If optimize_ddp is enabled, the DDPOptimizer will split the Dynamo output graph to improve pipelining of communication. Each split subgraph is an optimize_ddp_split_child_submod, and the high-level graph that plumbs the subgraphs together is optimize_ddp_split_graph. If there are multiple splits, each subsequent build product will be produced multiple times, once for each split.

AOTAutograd traces the (possibly split) Dynamo output graph, producing an aot_joint_graph if backward is enabled. It then partitions the joint graph into aot_forward_graph and aot_backward_graph. If training is not needed, there may only be an aot_forward_graph.

Inductor will apply some post-grad FX passes, producing inductor_post_grad_graph.

Inductor will perform code generation, producing the final inductor_output_code which will be executed at runtime. This output is a valid Python program and can be directly run.
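Taken together, these phases determine which IR dumps to expect for a given compilation. The sketch below encodes a simplified reading of the list above as a hypothetical helper, `expected_products`. It is not how the dumps are actually produced, and real runs may emit more or fewer products. For instance, it does not model how Inductor products divide between forward and backward graphs. The relative order of the DDP split products is also an assumption.

```python
from typing import List, Optional


def expected_products(
    compiled_autograd_backward: bool = False,
    ddp_splits: Optional[int] = None,
    needs_backward: bool = True,
) -> List[str]:
    """Simplified list of IR dump names for one compilation, following the phase list.

    compiled_autograd_backward: compiled autograd is enabled and this is a backward call.
    ddp_splits: number of DDPOptimizer splits, or None if optimize_ddp is off.
    needs_backward: whether AOTAutograd must produce a backward graph.
    """
    products: List[str] = []
    if compiled_autograd_backward:
        # Compiled autograd produces a graph that Dynamo then traces.
        products.append("compiled_autograd_graph")
    products.append("dynamo_output_graph")

    n_graphs = 1
    if ddp_splits is not None:
        products.append("optimize_ddp_split_graph")
        products.extend(["optimize_ddp_split_child_submod"] * ddp_splits)
        n_graphs = ddp_splits

    # Every product after the (optional) DDP split is repeated once per split.
    for _ in range(n_graphs):
        if needs_backward:
            products += ["aot_joint_graph", "aot_forward_graph", "aot_backward_graph"]
        else:
            products.append("aot_forward_graph")
        products += ["inductor_post_grad_graph", "inductor_output_code"]
    return products


if __name__ == "__main__":
    print(expected_products(needs_backward=False))
```

`expected_products(needs_backward=False)` models the inference case and returns `['dynamo_output_graph', 'aot_forward_graph', 'inductor_post_grad_graph', 'inductor_output_code']`. `expected_products(ddp_splits=2)` repeats the AOTAutograd and Inductor products once per split.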

Build products below:
