[WIP] Tileable Routing Resource Graph Builder #2135

Open
wants to merge 549 commits into base: master

Conversation

tangxifan

@tangxifan tangxifan commented Aug 16, 2022

Description

Bring the tileable routing resource graph builder from OpenFPGA to VPR.
Full details about the tileable routing resource graph builder can be found at

X. Tang, E. Giacomin, A. Alacchi and P. Gaillardon, "A Study on Switch Block Patterns for Tileable FPGA Routing Architectures," 2019 International Conference on Field-Programmable Technology (ICFPT), 2019, pp. 247-250, doi: 10.1109/ICFPT47387.2019.00039.

fpt2019_final.pdf

https://ieeexplore.ieee.org/document/8977869

Full details about the VIB routing architecture can be found at

https://ieeexplore.ieee.org/document/10416125

The following documentation describes the add-on syntax for the architecture file:

https://openfpga.readthedocs.io/en/master/manual/arch_lang/addon_vpr_syntax/#

Related Issue

Motivation and Context

The tileable routing resource graph builder is an alternative to the existing routing resource graph builder in VTR.
Being compatible with existing data structures (RRGraphView and RRGraphBuilder), this new feature enables VTR to support FPGA devices created by OpenFPGA.

User Interface
The tileable routing resource graph builder can be enabled through XML syntax in the architecture description language:

<layout tileable="true">
    <auto_layout aspect_ratio="1.000000">
      <!--Perimeter of 'io' blocks with 'EMPTY' blocks at corners-->
      <perimeter type="io" priority="100"/>
      <corners type="EMPTY" priority="101"/>
      <!--Fill with 'clb'-->
      <fill type="clb" priority="10"/>
    </auto_layout>
</layout>
<!-- Switch block with a mix of Subset and Universal patterns -->
<device>
  <switch_block type="subset" fs="3" sub_type="universal" sub_fs="3"/>
</device>

The tileable rr_graph generator also supports mixed switch block patterns: wires that start or end in a switch block follow one switch block pattern, while wires that pass through a switch block can follow another.
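
The mixed-pattern rule can be sketched as a small selection function. This is only an illustration, not the builder's actual code: the struct and function names are hypothetical, and the fallback of `sub_type`/`sub_fs` to `type`/`fs` when they are omitted is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical mirror of the <switch_block> attributes shown above.
struct SwitchBlockSpec {
    std::string type;                     // pattern for wires that start/end in the switch block
    int fs = 3;                           // connectivity for those wires
    std::optional<std::string> sub_type;  // pattern for wires that pass through
    std::optional<int> sub_fs;            // connectivity for pass-through wires
};

struct PatternChoice {
    std::string pattern;
    int fs;
};

// Pick the pattern that applies to one wire at one switch block.
// Assumption: missing sub_* attributes fall back to type/fs.
inline PatternChoice select_pattern(const SwitchBlockSpec& spec, bool wire_passes_through) {
    if (!wire_passes_through) {
        return {spec.type, spec.fs};
    }
    return {spec.sub_type.value_or(spec.type), spec.sub_fs.value_or(spec.fs)};
}
```

With the example architecture above, a wire ending in the switch block would get the subset pattern and a passing wire would get the universal pattern.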

  • Added a new option '--skip_sync_clustering_and_routing_results', so that users can bypass the synchronization of clustering results with routing optimization results. This is made for OpenFPGA's repacker, which has a built-in synchronization that supports more flexible net swapping during routing optimization.
  • Replaced the use of SIGSTKSZ in libcatch2, which is not supported in Ubuntu 21.04+
  • Added CMake option VTR_ENABLE_VERSION (on by default), which allows developers to skip the version build when integrating VTR as a submodule
  • Reworked API is_real_param() in read_blif.cpp (borrowed from another feature branch of Antmicro)

Known Limitations

  • No support on dedicated clock network
  • No support on custom switch block patterns

Checklist

  • Add code changes
  • Update documentation - Clarify limitations on XML syntax
  • Add regression tests

Bugs/Issues found

  • num_class in type_descriptor is not used. It is always set to 0 regardless of the size of the class_inf list. Suggest removing it.
  • Bugs in resize_node(): it may mistakenly reset node_lookup() when called incrementally. Calling reserve_node to pre-allocate memory bypasses these bugs.
  • libcapnproto relies on an absolute path when compiling its dependency. This causes build errors when VTR is used as a submodule.
  • When the option --write_block_usage is enabled, the block usage is only written to std::cout or an external file. As a result, the information does not appear in vpr_stdout.log, because the output does not go through VTR_LOG.

void writeClusteredNetlistStats(std::string block_usage_filename) {
    const auto stats = ClusteredNetlistStats();
    // Print out the human readable version to stdout
    stats.write(ClusteredNetlistStats::OutputFormat::HumanReadable, std::cout);
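
One possible way to get the report into the log is to render it into a string first and then hand the text to the logger. The sketch below is self-contained and uses hypothetical callbacks: `write_report` stands in for a call such as `stats.write(format, os)`, and `log_sink` stands in for the project logger (in VPR this might be a thin wrapper around the printf-style `VTR_LOG`). It is not a proposed patch.

```cpp
#include <cassert>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>

// Render a report once into a buffer, then forward the text to a log sink.
// Both callbacks are hypothetical stand-ins (see the paragraph above).
inline std::string render_and_log(const std::function<void(std::ostream&)>& write_report,
                                  const std::function<void(const std::string&)>& log_sink) {
    std::ostringstream buffer;
    write_report(buffer);          // e.g. stats.write(HumanReadable, buffer)
    const std::string text = buffer.str();
    log_sink(text);                // e.g. forward to the logger so it reaches the log file
    return text;
}
```

Rendering once and forwarding the same text keeps the console output and the log file identical.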

How Has This Been Tested?

Here is a list of regression tests to be added, in order to support existing features/options for customizing routing resource graphs.

  • Basic tileable rr graph - a homogeneous FPGA sizing from a small array to a medium/large array
  • Strong tileable - a heterogeneous FPGA sizing from a small array to a medium/large array
  • Strong wire types: Use multiple wire types and different connectivity patterns in architecture description
  • Strong chan_width_tileable: Support different (x, y) routing channel widths
  • Strong SB pattern: support mixed switch block pattern, e.g., Wilton + Universal

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

github-actions bot added labels: libarchfpga, libvtrutil, VPR (Aug 16, 2022)
github-actions bot added labels: external_libs, build, lang-make (Sep 26, 2022)
github-actions bot added labels: infra, scripts, lang-shell, libpugiutil (Aug 27, 2023)
github-actions bot added labels: docs, lang-cpp, lang-netlist (Apr 12, 2024)
* [vpr][ap] remove redundant print_pb

* Fix styling regressions

* Add reset_bimap helper method to AtomPBBimap

* Remove copying empty bimap from global context to cluster legalizer

* Refactor is_atom_blk_in_pb function to get two t_pb* arguments

* Fix minor styling issues

* [vpr][pack] remove redundant function calls

* [vpr][place] fix estimated_wl var name

* [APPack] Updated How APPack Adheres to Given Placement

The original implementation of APPack was focused on reconstructing a
given flat placement. This can cause issues if the given flat placement
disagrees with the decisions of the packer.

Instead, updated APPack so that it treats the flat placement as a hint
to help guide how it performs clustering.

Added the following new features:
- APPack computes the location of clusters based on the centroid of the
  molecules packed within.
- APPack attenuates the gain terms of candidates based on their distance
  from the cluster.
- APPack drops candidates which are too far from the cluster being
  created.

Removed adding molecules near the position of the cluster. This had
similar effects to unrelated clustering and should be investigated
separately later.

With these changes to APPack, the AP flow now improves WL of circuits by
1-3% at the expense of up to 15% runtime compared to the default VPR
flow.
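
The three features listed above (cluster centroid, distance-attenuated gain, dropping far candidates) can be sketched as follows. This is a minimal illustration under stated assumptions: the names are hypothetical, distance is taken as Manhattan distance, and the linear falloff is an assumed shape, since the commit message does not give the actual attenuation function.

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <vector>

struct Point {
    double x;
    double y;
};

// Centroid of the flat-placement positions of the molecules packed in a cluster.
inline Point cluster_centroid(const std::vector<Point>& molecule_positions) {
    assert(!molecule_positions.empty());
    Point sum{0.0, 0.0};
    for (const Point& p : molecule_positions) {
        sum.x += p.x;
        sum.y += p.y;
    }
    const double n = static_cast<double>(molecule_positions.size());
    return {sum.x / n, sum.y / n};
}

inline double manhattan_distance(const Point& a, const Point& b) {
    return std::abs(a.x - b.x) + std::abs(a.y - b.y);
}

// Returns the attenuated gain, or nullopt when the candidate is too far and is dropped.
// Assumed shape: the scale falls linearly from 1.0 at distance 0 to `floor_scale`
// at `max_distance`.
inline std::optional<double> attenuated_gain(double gain, double distance,
                                             double max_distance, double floor_scale) {
    if (distance > max_distance) {
        return std::nullopt;
    }
    const double t = (max_distance > 0.0) ? distance / max_distance : 0.0;
    return gain * (1.0 - t * (1.0 - floor_scale));
}
```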

* make format

* [vpr][route] remove redundant functions from rr_graph2

* make format

* [vpr][route] remove redundant functions from rr_graph2

* [libs][rr_graph] change rr_node_indices value type to RRNodeId

* fix formatting issues

* make format

* [AP][GlobalPlacement] Improved Partial Legalizer Legality

Updated the partial legalizer to now take into account block types when
spreading blocks.

This will create windows around overfilled bins that are aware of which
block types are overfilled and how large the window needs to be to
accommodate them. It also takes these block types into account when
spreading to only allow blocks to spread into sub-windows that they can
exist in.

This improves quality but was detrimental to performance, so some
performance improvements were needed.

To improve the performance of the partial legalizer, I split the problem
into groups of models which must be spread together. This allows us to
create tighter windows and can make some parts of the legalizer more
efficient. Create a model grouper class which forms the model pack
patterns into a graph and find disconnected sub-graphs to form the model
groups.

Also improved the window generation by pre-clustering the overfilled
bins before creating the windows. This sped up the window generation
code since fewer windows overlap.
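
Forming model groups from disconnected sub-graphs amounts to a connected-components computation. The sketch below uses a small union-find; the function name and the representation of pack patterns as pairs of model indices are assumptions for illustration, not the model grouper class itself.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Assign a group id to every model so that models linked (directly or
// transitively) by a pack pattern share the same id.
// `pattern_links` holds hypothetical (model, model) pairs taken from pack patterns.
inline std::vector<int> group_models(int num_models,
                                     const std::vector<std::pair<int, int>>& pattern_links) {
    std::vector<int> parent(num_models);
    std::iota(parent.begin(), parent.end(), 0);

    // Find the representative of v, halving the path as we go.
    auto find = [&](int v) {
        while (parent[v] != v) {
            parent[v] = parent[parent[v]];
            v = parent[v];
        }
        return v;
    };

    for (const auto& [a, b] : pattern_links) {
        const int root_a = find(a);
        const int root_b = find(b);
        parent[root_a] = root_b;
    }

    std::vector<int> group(num_models);
    for (int m = 0; m < num_models; ++m) {
        group[m] = find(m);
    }
    return group;
}
```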

* [vpr][rr_graph] fix comment

* [AP][Solver] Supporting Unfixed Blocks

When no fixed blocks are provided by the user, the AP flow can still
work. Currently, in the first iteration, the solver will put all blocks
at 0,0 and use the legalized solution in the next iteration as fixed
points. Instead of (0,0), it makes more sense to put the blocks in the
center of the device.

Also added a guess to the solver to help CG converge faster each
iteration.

Added a regression test to ensure that not describing the fixed blocks
is supported.

* [vpr] rename arch_opin_between_dice_switch to arch_inter_die_switch since it is used for both 3d CB and 3d SB

* [arch] fix 3d sb arch delay

* [arch] add ipin_cblock switch

* make format

* Update clang-format version to 18

This is the version that is installed by default on Ubuntu 24.04
which we currently run CI and testing on.

* Fix formatting to be compliant with clang-format-18

* [APPack] Flat-Placement Informed Unrelated Clustering

Used flat placement information provided by APPack to try and select
better unrelated candidates. This searches for candidates as close as
possible to the flat placement position of the cluster.

There are two parameters that control how this is performed:

1) max_unrelated_tile_distance decides how far the algorithm will search
   for unrelated candidates. The algorithm will check for candidates in
   the same tile as the cluster, and then will search farther and
   farther out

2) max_unrelated_clustering_attempts decides how many failing attempts
   the cluster will try unrelated clustering. This matches the option of
   the same name in the candidate selector class; but this was made
   separate since likely it will be different for APPack.

* apply comments

* make format

* [vpr][rr_graph] remove flat router parameter from vpr_create_device

* [vpr][stats] add print_resource_usage

* [vpr][base] move calculate_device_util to stats

* [vpr][pack] include required lib

* add print_device_util to stats

* [vpr][base] print resource usage and device util only if clb netlist is valid

* [vpr][base] remove unused param

* [vpr][base] remove var from doxygen comment

* [vpr][base] check whether instance exists in netlist

* apply comments

* make format

* [vpr][place] add skip anneal option

* [vpr][place] pass skip_anneal to placer

* [vpr][place] update constraint doc

* [vpr][place] minor update to the doc

* [vtr][script] add run dir to parse script

* [script] remove get_latest_run_dir_number out of util

* [script] use run dir name instead of only accepting the run dir num

* [script] rename to set_global_run_dir

* make format-py

* fix formatting issue

* [script] fix when run dir is not found

* make format-py

* fix python lint

* add NestedNetlistRouter and custom thread pool

* fix formatting issues

* [script] add class methods

* fix python lint

* fix pylint

* [place] fix the bug to skip anneal when analytic placer is enabled

* [place] rename skip_anneal to quench_only

* [place] add doc for place_quench_only

* [AP][GlobalPlacement] Added Bound2Bound Solver

The Bound2Bound net model is a method to solve for the linear HPWL
objective by iteratively solving a quadratic objective function.

This method obtains a better-quality flat placement after global
placement, at the expense of being more computationally expensive.

Found that this solver also has numerical stability issues. This may
cause the CG solver to never converge which will hit the iteration limit
of 2 * the number of moveable blocks. This makes this algorithm
quadratic with the number of blocks in the netlist. To resolve this, set
a custom iteration limit. This seems to work well on our benchmarks but
may need to be revisited in the future.
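
For reference, the Bound2Bound weighting in one dimension can be sketched as below. The weight 2 / ((P - 1) * |x_i - x_j|) follows the standard B2B formulation from the literature; the function names and the epsilon clamp on the distance are illustrative assumptions, not the solver's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <vector>

struct B2BConnection {
    std::size_t a;
    std::size_t b;
    double weight;
};

// Build the B2B connections of one net in one dimension: the two bound pins
// are connected to each other and every inner pin is connected to both bounds.
inline std::vector<B2BConnection> b2b_connections(const std::vector<double>& pos, double epsilon) {
    std::vector<B2BConnection> conns;
    const std::size_t num_pins = pos.size();
    if (num_pins < 2) return conns;

    std::size_t lo = static_cast<std::size_t>(
        std::distance(pos.begin(), std::min_element(pos.begin(), pos.end())));
    std::size_t hi = static_cast<std::size_t>(
        std::distance(pos.begin(), std::max_element(pos.begin(), pos.end())));
    if (lo == hi) hi = (lo + 1) % num_pins;  // all pins coincide; pick any second pin

    auto weight = [&](std::size_t i, std::size_t j) {
        // Clamp the distance so coincident pins do not produce an infinite weight.
        const double dist = std::max(std::abs(pos[i] - pos[j]), epsilon);
        return 2.0 / (static_cast<double>(num_pins - 1) * dist);
    };

    conns.push_back({lo, hi, weight(lo, hi)});
    for (std::size_t i = 0; i < num_pins; ++i) {
        if (i == lo || i == hi) continue;
        conns.push_back({i, lo, weight(i, lo)});
        conns.push_back({i, hi, weight(i, hi)});
    }
    return conns;
}

// Quadratic objective 0.5 * sum(w * d^2) over the connections.
inline double quadratic_cost(const std::vector<double>& pos,
                             const std::vector<B2BConnection>& conns) {
    double cost = 0.0;
    for (const B2BConnection& c : conns) {
        const double d = pos[c.a] - pos[c.b];
        cost += 0.5 * c.weight * d * d;
    }
    return cost;
}
```

When the weights are evaluated at the same positions as the cost, and no distance falls below the epsilon, the quadratic cost equals the net's HPWL in that dimension. This is why re-solving with updated weights iteratively approaches the linear objective.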

* [AP][GlobalPlacement] Updated B2B Solver According to Feedback

* [vpr][place] rename get_initial_move_lim to get_place_inner_loop_num_move

* fix a typo

* Bump libs/EXTERNAL/libcatch2 from `914aeec` to `76f70b1`

Bumps [libs/EXTERNAL/libcatch2](https://github.com/catchorg/Catch2) from `914aeec` to `76f70b1`.
- [Release notes](https://github.com/catchorg/Catch2/releases)
- [Commits](catchorg/Catch2@914aeec...76f70b1)

---
updated-dependencies:
- dependency-name: libs/EXTERNAL/libcatch2
  dependency-version: 76f70b1403dbc0781216f49e20e45b71f7eccdd8
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot]

* fix a few typos

* added doxygen comments

* use VTR_LOGV_ERROR instead of if statements

* doxygen comment for load_rr_edge_overrides()

* make format

* only override edge delay and not electrical stuff

* [script] apply comments

* [script] rename get_latest_run_dir to get_active_run_dir

* [AP] Tuned the AP Flow

The AP flow has many tunable knobs which trade-off quality and run time.
Went through each of the knobs to find a good combination.

Updates to the partial legalizer:
- Reversed the order that unplaced large blocks are inserted into partitions.
- Increased the bin cluster gap from 1 to 2
On the largest VTR benchmarks, this decreased the number of overfilled
bins after legalization by 15% and the average overfill of each of those
bins by 40%.
On Titan, the number of overfilled bins decreased by 32% and the average
overfill decreased by 2.5%.

Updates to the analytical solver and global placer:
- Allowed the B2B solver to stop early if it seems to be converging.
- Changed the anchor weights from a linearized term to a quadratic term.
- Decreased the distance epsilon from 0.5 to 0.01.
- Increased the max number of B2B solver iterations from 6 to 24
- Decreased the CG iteration cap from 200 to 150.
- The global placer saves the best legalized placement it has seen and
  returns it as its final result.
On the largest VTR benchmarks, this decreased the post GP HPWL by 22%
and decreased the GP run time by 17%.
On Titan, the post GP HPWL decreased by 25%, and the GP run time
decreased by 19%.

Updates to APPack:
- Decreased the max candidate distance from 0.5 (W + H) to 0.1 (W + H)
  for logical blocks.
- Decreased the max candidate distance for all other blocks to 0.35 (W +
  H)
- Lowered the attenuation distance threshold from 2.0 to 1.75.
- Decreased the attenuation value at the distance threshold to 0.35.
- Increased the max unrelated clustering distance from 1 to 5.
- Increased the max number of unrelated clustering attempts from 2 to
  10.
- Turned off all APPack optimization for RAM blocks.
On the largest VTR benchmarks, this decreased the wirelength by 2% over
the un-tuned AP flow, with a 2.8% decreased pack time.
On Titan, the post FL wirelength decreased by 6% and the post routing
wirelength decreased by 2.6%, with a 0.7% decrease in pack time.

Updates to initial placement:
- Fixed oversight with how the centroid was being calculated.
- Increased the range limit when searching for nearby locations when the
  location a cluster wants is taken from 15 to 60.
This further improved the post routing wirelength of Titan to 4.4%
better than the un-tuned AP flow.

I found that there are a lot of issues with the initial placement which
may be blocking a large amount of gains. Will be investigating the
initial placement code soon.

* [Prepacker] Moved the Prepacker Out of Try Pack

The AP flow makes its own prepacker which it uses throughout. However, a
full legalizer in the AP flow (APPack) uses the try_pack method which
creates its own prepacker. This creates two independent prepacker
objects when only one is needed.

Move the construction of the prepacker object into vpr_api and have it
get passed into the try_pack function.

* [script] fix the bug with get_next_run_dir

* python lint

* [vpr][place] update get_place_inner_loop_num_move comment

* [vpr][place] print number of moves per temp after getting the number

* make format

* add a unit test for reading edge override file

* Add edge_id() method to find an edge that connects given src and sink nodes

* replace for loop with edge_id() method that return an edge connecting given src/sink nodes

* add doxygen comment for edge_id() method

* verify overridden edge attribute in the unit test

* move operator==() and hash function of t_rr_switch_inf to physical_types.cpp

* add test_read_rr_edge_override.txt

* make format

* add InsertNewlineAtEOF: true to .clang-format

* make format to add new line at EOF

* init value of false for load_flat_placement

* [Pack][Timing] Abstracted How Timing is Used in the Packer

Timing was intermixed into the packer. It appears as though the code
originally was designed to recalculate the timing information every so
often in the packer, but the idea was abandoned. This left timing code
in disparate locations around the Packer and the timing was being
recomputed every time clustering was restarted, which was unnecessary.

Collected all of the timing information from the Packer into a single
object called PreClusterTimingManager which abstracts all of the timing
info in the Packer.

The ultimate goal is to bring this Manager class into the AP flow to be
used together with the Global Placer. By sharing this manager class, the
AP flow may be able to update the timing info with flat placement
information to make the timing more accurate.

* [AP][Timing] Added Basic Net Weighting

Added basic timing awareness to the AP flow by weighting nets in the AP
solver by their criticality (the max criticality of all edges through
that net). This makes the solver try to minimize the length of nets that
are more critical more than nets that are less critical (according to
the pre-clustering timing analyzer).

Added a command-line option to tradeoff between timing and wirelength in
the AP flow.
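
The net-weighting idea can be sketched as follows. The criticality of a net is the maximum criticality over its edges, as described above. The linear blend controlled by a tradeoff value in [0, 1] is only an assumed form for illustration; the commit message does not state the actual formula or the option name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Criticality of a net: the max criticality of all edges through that net.
inline double net_criticality(const std::vector<double>& edge_criticalities) {
    double crit = 0.0;
    for (double c : edge_criticalities) {
        crit = std::max(crit, c);
    }
    return crit;
}

// Hypothetical blend: tradeoff = 0 gives every net weight 1 (pure wirelength),
// tradeoff = 1 weights each net purely by its criticality.
inline double net_weight(double criticality, double timing_tradeoff) {
    assert(timing_tradeoff >= 0.0 && timing_tradeoff <= 1.0);
    return (1.0 - timing_tradeoff) + timing_tradeoff * criticality;
}
```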

* [AP][Test] Added Titan Nightly Test of WL-Driven AP Flow

* enum class for graph type

* use std::vector for clb_to_clb_directs

* doxygen comment for t_unified_to_parallel_seg_index

* doxygen comment for get_parallel_segs()

* replace t_seg_details* with std::vector<t_seg_details>

* get_seg_track_counts() returns std::vector<int> + doxygen comment

* move local var declarations from beginning of alloc_and_load_seg_details to where they are used

* pass t_chan_width by reference

* remove get_ordered_seg_track_counts()

* remove t_mux, t_pin_spec, and t_mux_size_distribution structs

* add docs for vtr::thread_pool

* add is_root_location to grid

* remove unnecessary calls to clear()

* [AP][InitialPlacement] Improved Initial Placement

Found that the Initial Placer stage of the AP flow (after APPack, but
before Detailed Placement) was not working as expected. The intention
was that clusters would be placed at their centroid location according to
the flat placement, and if that site was illegal or taken it would take
a nearby point instead (falling back on the original initial placer if
nothing can be found).

To achieve this, I was using a method called find_centroid_neighbor
which I thought would return the nearest legal location to the given
location. This was not correct. This method just creates a bounding-box
and tries to find a random point in that box around the given point.
This was causing our AP flow to move clusters WAY farther than they
wanted, which moved them into places other clusters wanted to go. This
was also not exhaustive, so it was often falling back on the original
approach which was putting clusters in practically random locations. All
of this was causing the post-FL placement from the AP flow to actually
have worse quality than the default AP flow!

To resolve this, I wrote the actual method I was intending. It performs
a BFS-style search from the src location to all legal locations and
returns the closest one. By doing this BFS on the compressed grid, I
found that this is actually quite efficient. With these changes, I found
that the quality of the post-FL placement more than doubled and the
average atom displacement from the GP solution decreased dramatically.
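
A BFS-style nearest-legal-location search can be sketched as below. This is a simplified illustration on a plain 2D grid of booleans rather than VPR's compressed grid, and the function name and data layout are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <queue>
#include <utility>
#include <vector>

using Loc = std::pair<int, int>;  // (x, y)

// Breadth-first search outward from `src`; returns the first legal location
// reached, which is a closest one in Manhattan distance. `legal[y][x]` marks
// locations that are both compatible and free (an assumed representation).
inline std::optional<Loc> nearest_legal_loc(const std::vector<std::vector<bool>>& legal, Loc src) {
    const int height = static_cast<int>(legal.size());
    const int width = height > 0 ? static_cast<int>(legal[0].size()) : 0;
    auto in_bounds = [&](int x, int y) { return x >= 0 && x < width && y >= 0 && y < height; };
    if (!in_bounds(src.first, src.second)) return std::nullopt;

    std::vector<std::vector<bool>> visited(height, std::vector<bool>(width, false));
    std::queue<Loc> frontier;
    frontier.push(src);
    visited[src.second][src.first] = true;

    const std::array<Loc, 4> steps{{{1, 0}, {-1, 0}, {0, 1}, {0, -1}}};
    while (!frontier.empty()) {
        const Loc cur = frontier.front();
        frontier.pop();
        if (legal[cur.second][cur.first]) return cur;
        for (const Loc& s : steps) {
            const int nx = cur.first + s.first;
            const int ny = cur.second + s.second;
            if (in_bounds(nx, ny) && !visited[ny][nx]) {
                visited[ny][nx] = true;
                frontier.push({nx, ny});
            }
        }
    }
    return std::nullopt;  // exhaustive: no legal location exists anywhere
}
```

Unlike sampling a random point inside a bounding box, this search is exhaustive and never returns a location farther away than necessary.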

* move t_seg_details, t_chan_seg_details, and t_chan_details to rr_types.h

* fix compilation error in test_connection router and the warning in rr_graph2.cpp

* move t_sblock_pattern to rr_types.h

* make format

* [vpr][place] remove get_net_wirelength_from_layer_bb_ from netcosthandler class

* [vpr][place] make get_net_wirelength_from_layer_bb_ static function and update its parameters

* [vpr][place] use appropriate wirelength est function

* make format

* [test] add strong 3d

* fix signal 6 in stratix 10 arch strong test

* apply PR comments

* add the requested comments

* update file_formats.rst

* add --read_rr_edge_override to command_line_usage.rst

* remove duplicate text in command_line_usage.rst

* [vpr][place] apply review comments

* make format

* make format

* [vpr][tileable] add include

* remove unused function linear_regression_vector()

* add write_channel_occupancy_to_file()

* write channel coordinate and occupancy percentage to file

* align columns in channel utilization files

* update submodule

* make format

* [libs][arch] return -1 if valid index is not found

* make format

* [libs][arch] comment unused vars

* refactor the code to use the same code for both x and y channels

* [libs][pugiutil] delete pointer

* [libs][pugiutil] format issue

* fix format

* [libs][archfpga] comment parse_pin_name

* [libs][encrypt] break the line to read file

* [vpr][base] call setupvipinf if vib_infs is not empty

* [libs][encrypt] initialize plaintext only if file is open

* [libs][encrypt] use rdbuf to read a file to avoid gcc-13 warning

* [libs][decrypt] read a file in a safe way to prevent gcc13 warning

* [vpr][vib_grid] fix type name if type is nullptr

* [vpr][tileable] resize if segment inf size is not zero

* [vpr][tileable] use empty method instead of checking size

* [vpr][tileable] set the size when defining the vector (gcc warning)

* fix format

* Add Github action to close stale issues
The added workflow will close up to 30 old issues every day.
Issues that have been inactive for more than a year will be
first marked as stale, and if they remain stale after 15 days
they will be automatically closed.

* Add documentation for automatic issue closure

* [test] fix strong constraint

* [lib][arch] check num_interconnect is bigger than zero

* [vpr][route] add a condition to not increment delta_seg if the segment is on the edge

* [vpr][route] fix max seg idx

* fix formatting

* Change some internal packer APIs to not use C-style arrays
This commit changes some functions that used C-style arrays
to use std::vector instead. Previously we used the .data() method
of std::vectors to pass a pointer to these functions.

* pass by reference and typo

* Clean up prepacker

This commit changes two functions in the prepacker to
get the specific element of the array they work with and
not the entire array.

* Change vector variable name to be more inline with the current style

* remove scratch vectors from Move context

* NetCostHandler is the owner of all bb-related data

* remove PlacerMoveContext

* define MoveGenerator::first_rlim

* use #pragma once in move generator header files

* make format

* fix typo

* get_bb_from_scratch_() accepts use_ts as its argument

* [libs][librrgraph] update echo file of rr graph

* [test][strong] update golden result

* [test] update strong tileable golden result

* explain what RR edge override feature is useful for

* [test][tileable] update golden results

* add comment for MoveGenerator::first_rlim

* [STA] Updated SDF File Generation to Include Min Delays

The SDF file generated by the post-implementation netlist writer was
only using the max delays of timing connections in the timing graph. In
the SDF file, it set all values of the rising and falling triples to the
max delay. When using this SDF file for external timing analysis, the
minimum timing (hold) paths were incorrect.

Updated the netlist writer to work with triples instead of bare delays.
This allows (minimum, typical, maximum) delays to be passed through the
different functions and be printed cleanly. For standard delay signals
in the circuit (not setup / hold times) Tatum provides the minimum
delays. These are now being printed in the SDF file and the minimum
timing paths are being found correctly in the external timing analyzer.

Cleaned up some parts of the netlist printing code as well.
1) netlist_writer.cpp declared many functions in the global scope which
   may cause conflicts at link time in VTR. Put all of these methods in
   anonymous namespace to prevent this.
2) The code was casting the delays from seconds to picoseconds in
   strange places. This was tricky to work with since these are both
   stored as doubles. Changed all of the code to only work with delays
   in seconds, and only cast to picoseconds when printing.
3) General cleanup of the header file and the include files.
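
Keeping delays in seconds and converting only at print time can be sketched like this. The struct and function names are hypothetical, and the fixed three-decimal picosecond format is an assumption, not the netlist writer's actual output format.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// A (minimum, typical, maximum) delay triple, stored in seconds throughout.
struct DelayTriple {
    double min_s;
    double typ_s;
    double max_s;
};

// Format a triple in the SDF "(min:typ:max)" style, converting seconds to
// picoseconds only here, at the point of printing.
inline std::string format_sdf_triple(const DelayTriple& d) {
    constexpr double kSecondsToPicoseconds = 1e12;
    std::ostringstream os;
    os << std::fixed << std::setprecision(3) << '('
       << d.min_s * kSecondsToPicoseconds << ':'
       << d.typ_s * kSecondsToPicoseconds << ':'
       << d.max_s * kSecondsToPicoseconds << ')';
    return os.str();
}
```

Because both units would be stored as `double`, confining the conversion to one function removes the risk of scaling a value twice or not at all.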

* [STA] Updated How Un-Initialized Delay Triples are Handled

Thank you to Fred Tombs for pointing out this issue!

* [AP][InitialPlacement] Created Isolated AP Flow

The old Initial Placer used in the AP flow was constructed within the
initial placer of the non-AP flow. This forced the AP flow to try to
place blocks one at a time with minimum displacement. This is non-ideal
since blocks that were placed earlier were getting first pick of
locations, which may displace a future cluster which may be a better fit
for that location.

Separated out the AP initial placement code. For AP, initial placement
is done in passes.

The first pass will try to place clusters exactly at the tile that the
centroid of all atoms within the cluster want to be placed (according to
the global placement). Any clusters that could not be placed are
reserved for the next pass.

The second pass will allow clusters to be placed within 1 tile of their
centroid.

All subsequent passes will allow cluster to be placed exponentially
farther from their centroid.

The initial placement terminates when all clusters have been placed or
if the max displacement is the size of the entire device.

The clusters are sorted based on the size of the macro that contains
them and the variance of the placement of the atoms within the macro.
This allows large macro blocks with low variance to be placed first.
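
The pass structure described above implies a schedule of maximum displacements. The sketch below assumes that "exponentially farther" means doubling each pass and that the final pass is capped at the device size; both are assumptions, and the function name is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Max displacement allowed in each pass: 0 (exact tile), 1, then doubling,
// with a final pass that covers the whole device.
inline std::vector<int> max_displacement_schedule(int device_size) {
    assert(device_size >= 1);
    std::vector<int> schedule{0};
    for (int d = 1; d < device_size; d *= 2) {
        schedule.push_back(d);
    }
    schedule.push_back(device_size);
    return schedule;
}
```

A placer following this schedule would stop early once every cluster has been placed, so later passes only run for clusters that lost out in earlier ones.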

* add doxygen comment for X_coord, Y_coord, and layer_coord

* remove X_coord and Y_coord from feasible_region_move_generator

* add comment explaining ts and permanent data members

* make format

* [AP] General Fixed/Unfixed Blocks Cleanup

Fixed a couple of small known issues around the AP flow related to how
we handle fixed blocks.

Offset the fixed block locations by 0.5 such that they are no longer on
the edge. Previously, fixed blocks were placed at the root location of
tiles. This was a problem since atoms would want to be generally close
to the fixed block and may be biased to the bottom/left tiles to the
fixed-block tile. This does not handle large tiles, but will help in
general.

If no fixed blocks are provided, the AP solver will always produce the
trivial solution (all blocks placed on top of one another anywhere on
the device). We were wasting time running bound2bound to solve this and
the solution was probably being put on the bottom-left corner (0,0)
which is not ideal. Instead of running bound2bound during the first
iteration in this case, just placed all blocks in the center of the
device. This greatly speeds up the first iteration when no fixed blocks
are provided.

* Remove atom_net global context mutation from packer

* [vpr][tileable_rr_graph] fix rr_switch usage

---------

Signed-off-by: dependabot[bot]
Co-authored-by: Amir Poolad
Co-authored-by: AlexandreSinger
Co-authored-by: vaughnbetz
Co-authored-by: Soheil Shahrouz
Co-authored-by: Duck Deux
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: soheilshahrouz
github-actions bot added labels: Odin, lang-python, lang-hdl, liblog, Parmys (Apr 24, 2025)
tangxifan and others added 6 commits April 24, 2025 09:10
* [Bison] Raised Minimum Bison Version from 3.0 to 3.3

Raised the minimum Bison version to 3.3 since deprecation warnings were
showing up in libblifparse and libsdcparse which could not be resolved
unless the Bison version was 3.3.

* more fixes for bitstream generation with flat router

* [Router] Upstream Fine-Grained Parallel Router (FPT'24)

Upstreamed the fine-grained parallel router implementation into the VTR
master. The original branch is https://github.com/verilog-to-routing/vtr-verilog-to-routing/tree/mq-parallel-router.

Modified the MultiQueue (SPAA'24) implementation and integrated it into
the VTR codebase.

* [ParallelRouter] Removed Boost from FG Parallel Router

The original FG parallel router used to use boost. VTR does not install
boost by default. Moved to STL instead.

* [Router] Fix Code Formatting Issues

* [Router] Added ConnectionRouter Abstraction and Reduced Code Duplication

Added a partial abstract class for ConnectionRouter, derived from the
pure abstract ConnectionRouterInterface.

The SerialConnectionRouter and ParallelConnectionRouter classes are now
derived from the ConnectionRouter class, utilizing the common class
members and helper functions to reduce code duplication.

* [Router] Added Code Comments and Documentation for Connection Routers

Added Doxygen-style code comments and documentation for connection
routers, including the ConnectionRouter abstract class, the Parallel-
ConnectionRouter concrete class, and the SerialConnectionRouter concrete
class.

Updated the helper messages for command-line options added for parallel
connection router.

* [Router] Fixed Interface Issues in NestedNetlistRouter and Code Formats

Fixed the interface issues of ConnectionRouter in NestedNetlistRouter.

Fixed code formats.

Fixed typo in read_options.cpp.

* [Router] Updated Command-Line Usage for Parallel Connection Router

Updated the command-line usage for parallel connection router in both
Read the Docs and read_options.cpp.

* [Router] Added Regression Tests for Parallel Connection Router

Added regression tests for parallel connection router by appending extra
sets of configurations to those VTR flow regression tests previously
selected by Fahri for testing coarse-grained parallel router.

Removed VPR connection router test (vpr/test/test_connection_router.cpp),
since it has been out-dated for a very long time and has caused lots of
trouble for running VPR C++ tests locally.

* Fixed Code Formatting Issue

Fixed a weird code formatting issue in libs/librtlnumber/src/include/
internal_bits.hpp. GitHub CI said the file failed dev/check-format.sh,
however, the same script runs perfectly in my local environment. Double
checked the version of clang-format, which seemed to be the same as CI.

Directly copied the file from the GitHub repo to resolve this issue.

* [Router] Fixed `No source in route tree` in ParallelConnectionRouter

The `No source in route tree` bug in ParallelConnectionRouter (since
commit 875b98e) has been fixed. It turns out that putting another member
variable `MultiQueueDAryHeap<HeapImplementation::arg_D> heap_` in the
derived class ParallelConnectionRouter together with the existing
`HeapImplementation heap_` in the base class ConnectionRouter causes the
issue. The solution is to keep `heap_` only in the base class and use
`ConnectionRouter<MultiQueueDAryHeap<HeapImplementation::arg_D>>` rather
than `ConnectionRouter<HeapImplementation>` for deriving the parallel
connection router.
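
The pitfall described above can be reproduced in a few lines. This is a minimal sketch with hypothetical stand-in types (`BaseRouter`, `SerialHeap`, `ParallelHeap`, `ShadowingRouter`, and `FixedRouter` are illustrative names, not the VPR classes): a derived class that redeclares `heap_` hides the base member, so helpers in the base class fill one heap while the derived class reads another.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the two heap types; not the VPR classes.
struct SerialHeap {
    std::vector<int> items;
    void push(int v) { items.push_back(v); }
    bool empty() const { return items.empty(); }
};
struct ParallelHeap {
    std::vector<int> items;
    void push(int v) { items.push_back(v); }
    bool empty() const { return items.empty(); }
};

// The base class owns the heap and the shared helpers that use it.
template<typename Heap>
class BaseRouter {
  public:
    void add_source(int node) { heap_.push(node); } // always fills the base heap_
  protected:
    Heap heap_;
};

// Buggy pattern: a second heap_ in the derived class hides the base member.
class ShadowingRouter : public BaseRouter<SerialHeap> {
  public:
    // Reads the derived heap_, which add_source() never fills.
    bool has_source() const { return !heap_.empty(); }
  private:
    ParallelHeap heap_;
};

// Fixed pattern: instantiate the base with the parallel heap type,
// so only one heap_ exists.
class FixedRouter : public BaseRouter<ParallelHeap> {
  public:
    bool has_source() const { return !heap_.empty(); }
};
```

With these definitions, calling `add_source()` on a `ShadowingRouter` leaves `has_source()` false, while the same call on a `FixedRouter` makes it true.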

Please note that ParallelConnectionRouter still has some bugs (i.e.,
getting stuck in the MultiQueue pop). This commit is not fully working.
Please do not use it for any experiments.

Updated the previously incorrect command-line options for the parallel
connection router in the regression tests.

* [AP][MassLegalizer] Revisited Mass Legalizer

Found that the mass legalizer was not spreading out the blocks well
enough according to the mass.

Revisited the spatial partitioning in the mass legalizer. Before, we
just cut the window in half along the larger dimension. This was fine,
but it may create an imbalanced cut, which can prevent things from
spreading well. Instead, we now search for the best partition by trying
different partition lines and computing how balanced each partition is.
Although this is more expensive than before, creating more balanced
partitions should allow the mass legalizer to converge faster. Time
in the mass legalizer is also dominated by partitioning the blocks, so
increasing the time to choose the partition line should not have that
large of an effect anyway.
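
The partition-line search can be sketched in one dimension as follows. The commit does not state the balance metric, so this example assumes the imbalance is the absolute difference in total mass between the two sides; the `Block` struct and `find_balanced_cut` are hypothetical names, not the mass legalizer's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical block record: a position along the cut dimension and a mass.
struct Block {
    double pos;
    double mass;
};

// Try a cut line between every pair of neighbouring blocks (sorted by
// position) and return the line that minimizes |mass_left - mass_right|.
// The imbalance metric is an assumption made for this sketch.
double find_balanced_cut(std::vector<Block> blocks) {
    assert(blocks.size() >= 2);
    std::sort(blocks.begin(), blocks.end(),
              [](const Block& a, const Block& b) { return a.pos < b.pos; });

    double total = 0.0;
    for (const Block& b : blocks) total += b.mass;

    double best_cut = blocks.front().pos;
    double best_imbalance = std::numeric_limits<double>::max();
    double left = 0.0;
    for (std::size_t i = 0; i + 1 < blocks.size(); ++i) {
        left += blocks[i].mass;
        double imbalance = std::abs(left - (total - left));
        if (imbalance < best_imbalance) {
            best_imbalance = imbalance;
            // Place the candidate line halfway between the two neighbours.
            best_cut = 0.5 * (blocks[i].pos + blocks[i + 1].pos);
        }
    }
    return best_cut;
}
```

The search costs one sort plus one linear pass per window, which is the extra expense the commit message refers to.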

Found an oversight in how blocks were partitioned when one of the
partitions becomes overfilled. Fixed this issue.

* Invert use of macro_can_be_placed argument check_all_legality to align with its meaning

* [vpr][pack] fix merge issues w/ flat sync list

* make format

* [packages] add clang-format

* make format 2

* Invalid C++ fix

* [docker] set ubuntu version to 24.04

* [dockerfile] enable system-wide python package installation for pip

* [dockerfile] add comment

* [package] check whether clang-format-18 package exists

* [package] remove deprecated names-only option

* [package] remove if condition

* [doc] update quick start on installing packages

* make format
enum class e_rr_type
a few remaining t_rr_type vals
CHANY ---> t_rr_type::CHANY
CHANX ---> t_rr_type::CHANX
OPIN ---> t_rr_type::OPIN
IPIN ---> t_rr_type::IPIN
SINK ---> t_rr_type::SINK
SOURCE ---> t_rr_type::SOURCE

* [Router] Finally fixed the weird bug in parallel connection router

Fixed the weird bug in the parallel connection router mentioned in commit
f73212c. The bug occurred because two function parameters, 'num_threads'
and 'num_queues', were misplaced when instantiating the MQ_IO. This
took two weeks to track down.

The VTR benchmark (`vtr_reg_qor_chain` task) has been tested and passes
for three cases: (1) 'serial mode' 1T+2Q (1 thread, 2 queues), (2) 2T+4Q,
and (3) 4T+2Q.

The determinism has also been verified for the VTR benchmark.

* [Router] Fixed Code Review Comments and Cleanup Codebase

Added more explanation to the command-line options messages and code
comments.

Cleaned up ParallelConnectionRouter-related codebase.

* [doc] clarify that clang-format is not required to build VPR

* remove typedef t_rr_type

* doxygen comment for Direction

* add vtr::array class

* make rr_node_typename of type vtr::array to index it only with e_rr_type

* add default constructor to vtr::array

* access rr_node_indices_ with e_rr_type instead of casting to size_t

* add single argument constructor to vtr::array
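
The idea behind indexing with an enum class instead of casting to `size_t` at every call site can be sketched as below. `EnumArray` and `RrType` are hypothetical names for illustration; the actual `vtr::array` interface and the full `e_rr_type` definition may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical enum mirroring the node kinds named in this log.
enum class RrType { SOURCE, SINK, IPIN, OPIN, CHANX, CHANY, NUM };

// A fixed-size array that can only be indexed with its key enum,
// keeping the cast to size_t in one place.
template<typename Key, typename T, std::size_t N>
class EnumArray {
  public:
    T& operator[](Key k) { return data_[static_cast<std::size_t>(k)]; }
    const T& operator[](Key k) const { return data_[static_cast<std::size_t>(k)]; }
    constexpr std::size_t size() const { return N; }

  private:
    std::array<T, N> data_{}; // value-initialized elements
};
```

Because `operator[]` only accepts the key type, an expression such as `counts[3]` fails to compile, which is the property the commits above rely on.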

* [Router] Updated Golden Results for Parallel Connection Router CI Tests

Updated the golden results for CI tests for parallel connection router:
 - `vtr_reg_strong/koios_test`
 - `vtr_reg_strong/strong_flat_router`
 - `vtr_reg_strong/strong_multiclock`
 - `vtr_reg_strong/strong_timing`

* use vtr::array to index some arrays using e_rr_type

* make format

* avoid using e_rr_type and casting it in place_macro

* [vpr][base] fix assigned pb_graph_pin when graph node is not primitive

* [vpr][pack] pass logical type to alloc_and_load_pb_route

* [vpr][pack] update alloc_and_load_pb_route header file

* [vpr][pack] fix pb_graph_pin assignment in load_trace_to_pb_route

* [test] keep 3d sb and cb tests

* [vpr][pack] add intra_lb_pb_pin_lookup_ to cluster legalizer

* [vpr][pack] initialize intra_lb_pb_pin_lookup and pass it to alloc_and_load_pb_route

* [vpr][pack] use intra_lb_pb_pin_lookup to get pb_pin from pin number

* make format

* add vtr::array to docs

* [vpr][pack] remove casting net id

* [vpr][pack] add doxygen comment for alloc_and_load_pb_route

* [vpr][pack] remove redundant parameters

* [vpr][pack] polish load_trace_to_pb_route

* make format

* [vpr][pack] fix parameter shadowing

* [AP][HotFix] Fixed Bug With Solver Putting Blocks Off-Device

After moving fixed blocks to the center of tiles, there is a very small
chance that blocks go off the device due to rounding. This is such a
small effect that it does not show up locally on my machine, but it
shows up on CI. The positions of blocks are now clamped after solving to
be just within the device region.

* Increase the daily stale issue action API call limit

* [vpr][pack] add a method to get root_ipin

* [vpr][pack] remove unused var

* [Router] Added Assert for MQ_IO numQueues and Updated Golden Results

Added an assert that MultiQueueIO numQueues is greater than two.

Updated CI test tasks so that the parallel connection router runs in
Dijkstra mode, which ensures determinism and avoids hanging in CI runs.

* [AP][HotFix] Placed Fixed Blocks First During IP

The cost terms in the AP initial placer were not placing fixed blocks
early enough, causing other blocks to take their place and causing the
initial placer to not return a solution.

Blocks which have region constraints are now placed first based on how
constrained they are. More constrained blocks (can only be placed in a
smaller region) will be placed first.

Also found that macros that contained fixed blocks were not observing
these constraints when calculating the centroid position of the macro.
For constrained macros, projected the centroid position onto the
partition region to get the closest point in the partition region to the
calculated centroid. This new centroid is then used to perform the
placement.
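
For a single axis-aligned rectangular region, projecting a point onto the region reduces to clamping each coordinate. The sketch below assumes that simplified case; `Point`, `Rect`, and `project_onto_region` are hypothetical names, and a real partition region may consist of several rectangles. It requires C++17 for `std::clamp`.

```cpp
#include <algorithm>
#include <cassert>

struct Point {
    double x;
    double y;
};

// Axis-aligned rectangle with inclusive bounds (an assumption of this sketch).
struct Rect {
    double xmin, ymin, xmax, ymax;
};

// Return the point inside r that is closest to p. For an axis-aligned
// rectangle this is obtained by clamping each coordinate independently.
Point project_onto_region(const Point& p, const Rect& r) {
    return {std::clamp(p.x, r.xmin, r.xmax), std::clamp(p.y, r.ymin, r.ymax)};
}
```

A centroid that already lies inside the region is returned unchanged, so the projection only affects macros whose computed centroid falls outside their constraint.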

* [STA] Added Option to Remove Parameters from Post-Implementation Netlist

When performing post-implementation timing analysis using OpenSTA, the
generated netlist cannot use parameters since each module needs to
correspond with a cell in a liberty file.

Added a command-line option which tells the netlist writer to not use
parameters when generating the netlist. If a primitive cannot be
generated without using parameters, it will error out.

* [Tatum][Parse] Fixed Extraneous Warning With get_clocks

The get_clocks command is used in an SDC file to reference a set of
clocks by name using a regex string. The code to do this tries to
produce a warning if get_clocks is used on a regex string and no clocks
could be found. The issue is that the code to do this was mistakenly
producing this warning for each clock in the circuit. For example, if we
had {clk1, clk2, clk3} and ran "get_clocks {clk3}", we would
get two warnings since clk1 and clk2 did not match.

Fixed this by moving the warning out of one loop nest.
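
The before-and-after behaviour can be illustrated with a small sketch. This is not the Tatum parser code: `ClockMatch`, `get_clocks_buggy`, and `get_clocks_fixed` are hypothetical, and warnings are counted rather than printed so the difference is easy to check.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Result of a lookup: the matching clocks and how many warnings were raised.
struct ClockMatch {
    std::vector<std::string> clocks;
    int warnings = 0;
};

// Buggy version: warns once per clock that does not match the pattern.
ClockMatch get_clocks_buggy(const std::vector<std::string>& all_clocks,
                            const std::string& pattern) {
    ClockMatch result;
    const std::regex re(pattern);
    for (const std::string& clk : all_clocks) {
        if (std::regex_match(clk, re)) {
            result.clocks.push_back(clk);
        } else {
            ++result.warnings; // warning raised inside the loop
        }
    }
    return result;
}

// Fixed version: warns once, after the loop, only if nothing matched.
ClockMatch get_clocks_fixed(const std::vector<std::string>& all_clocks,
                            const std::string& pattern) {
    ClockMatch result;
    const std::regex re(pattern);
    for (const std::string& clk : all_clocks) {
        if (std::regex_match(clk, re)) result.clocks.push_back(clk);
    }
    if (result.clocks.empty()) ++result.warnings;
    return result;
}
```

For the three-clock example in the commit message, the buggy version raises two warnings for the pattern `clk3`, while the fixed version raises none.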

* Remove PR staling
This commit sets the number of days before marking
issues or PRs as stale to 100 years. This number is
overridden for issues to be 1 year but stays 100 years
for PRs. This means that PRs effectively do not get
marked as stale.

* [LibArchFPGA] Updating Model Data Structures

The logical models (the technology-mapped logical blocks) for an
architecture were stored using two independent linked lists: one for the
library models (the models that all architectures have, such as luts and
ffs) and one for the user models. These linked lists were hard to traverse
and were injecting pointers all across VPR.

Created a new class to store and manage the logical models. This class
maintains a unique ID for each logical model (similar to the netlist
data structures in VPR). It also contains helper methods to make working
with the logical models easier.
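
As a rough illustration of ID-based storage replacing pointer-linked lists, consider the sketch below. `ModelId` and `ModelStore` are hypothetical names and the interface is an assumption; the class added in this commit is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical strongly typed ID; VPR's real ID types differ.
struct ModelId {
    std::size_t value;
};

// Stores models in a contiguous vector and hands out stable IDs,
// so callers hold an ID rather than a pointer into a linked list.
class ModelStore {
  public:
    // Create a model, or return the existing ID if the name is already known.
    ModelId create_model(const std::string& name) {
        auto it = name_to_id_.find(name);
        if (it != name_to_id_.end()) return it->second;
        ModelId id{names_.size()};
        names_.push_back(name);
        name_to_id_.emplace(name, id);
        return id;
    }

    const std::string& model_name(ModelId id) const { return names_.at(id.value); }
    std::size_t size() const { return names_.size(); }

  private:
    std::vector<std::string> names_;
    std::unordered_map<std::string, ModelId> name_to_id_;
};
```

Traversal then becomes a loop over IDs from 0 to `size() - 1`, with no need to follow `next` pointers.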

* fix comments from alex

* revert prepacker changes

* [vpr][pack] add get_pattern_blocks

* [vpr][pack] add blocks in get_all_connected_primitive_pins if they are a part of the pattern

* make format

* Bump libs/EXTERNAL/libcatch2 from `76f70b1` to `5abfc0a`

Bumps [libs/EXTERNAL/libcatch2](https://github.com/catchorg/Catch2) from `76f70b1` to `5abfc0a`.
- [Release notes](https://github.com/catchorg/Catch2/releases)
- [Commits](catchorg/Catch2@76f70b1...5abfc0a)

---
updated-dependencies:
- dependency-name: libs/EXTERNAL/libcatch2
  dependency-version: 5abfc0aa9c1ef4cb40c9f387495134dab02e1af2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* [vpr][pack] add more comments

* Add helper functions to t_pb_type

* Change t_pb_type users to use helper functions

* Add documentation for t_pb_type::is_root and is_primitive

* Fix formatting in libarchfpga/physical_types.h

* [vpr][pack] change count method to find

* [Router] Updated the Regression Tests and Corresponding Golden Results

Changed `multi_queue_num_threads` and `multi_queue_num_queues` settings
in the CI strong regression tests to avoid QoR failure in the CI runs.

The coverage of the regression tests for parallel connection router
after this change is still fair.

* [vpr][CLI] add generate_net_timing_report

* [vpr][route] remove debugging msg

* [vpr][analysis] add generate_net_timing_report

* [vpr][pack] apply formatting comments

* make format

* [vpr][analysis] add comments

* make format

* [vpr][CLI] remove generate net timing from CLI parameters and generate the report by default

* Unused Packer Options Cleanup (#2976)

* Standardized and renamed the packer alpha and beta variables. They are now referred to as timing_gain_weight and connection_gain_weight, and are used as weights during timing-driven and connection-driven clustering, respectively. Removed global_clocks, use_attraction_groups, pack_num_moves, and pack_move_type from the packer.

* [APPack] Updated Max Candidate Distance Interface

The max candidate distance is used by APPack to decide which molecules
to ignore when packing, based on their distance from the cluster being
formed.

Cleaned up the interface of this by pre-computing the max candidate
distance of all logical blocks ahead of time and reading from these
pre-computed values during packing.

Added a command-line option to allow the user to override some or all of
these max distance thresholds. By default, VPR will select values based
on the type of logical block and the primitives it contains.

Fixed issue with APPack creating too many IO blocks for some circuits
due to the max candidate distance thresholds for IO blocks being too
low.

More tuning should be done on these values once the mass legalizer has
been cleaned up a bit more.

* [vtr][parse] fix pattern for init place wl

* [vpr][analysis] add header for net timing report

* [vpr][analysis] add timing format to comments

* formatting fix

* Revert "[vpr][CLI] remove generate net timing from CLI parameters and generate the report by default"

This reverts commit b8289db.

* make format

* [STA] Generating SDC Commands Post-Implementation

Added an option to have VPR generate an SDC file containing the timing
commands required for an external timing analysis of the post-
implementation netlist to match VPR's timing analysis.

* [STA] Added Tutorial for Post-Implementation Timing Analysis

Created a tutorial demonstrating how OpenSTA can be used after VPR to
perform static timing analysis.

* Add artifact upload to nightly test workflow

* t_det_routing_arch* --> const t_det_routing_arch&

* t_chan_width_dist ---> const t_chan_width_dist&

* make format

* fix compilation error in route_diag by passing det_routing_arch argument by reference instead of pointer

* [task] add generate_net_timing_report to timing report strong test

* [doc] add doc for generate_net_timing_report command line option

* [vpr][timing] update generate_net_timing_report comment

* [vpr][timing] add get_net_bounding_box

* [vpr][timing] add net bounding box to the report

* [test] add test for net timing report

* [doc] update doc with new format to net timing report

* [vpr][analysis] fix net timing report bugs + including layer min/max of bb

* make format

* [vpr][analysis] capture vars by reference in lambda

* [packer] Changing the vector of candidate molecules into LazyPopUniquePriorityQueue.

    The class LazyPopUniquePriorityQueue is a priority queue that allows for lazy deletion of elements.
    It is implemented using a vector and two sets: one set keeps track of the elements in the queue, and the other keeps track of the elements that are pending deletion, so that they can be removed from the queue when they are popped.
    The queue is sorted by the sort-value (SV) of the elements, and the elements are stored in the vector.
    The class definition can be found in vpr/src/util/lazy_pop_unique_priority_queue.h

    Currently, the class supports the following functions:
        LazyPopUniquePriorityQueue::push(): Pushes a key-sort-value (K-SV) pair into the priority queue and adds the key to the tracking set.
        LazyPopUniquePriorityQueue::pop(): Returns the K-SV pair with the highest SV whose key is not pending deletion.
        LazyPopUniquePriorityQueue::remove(): Removes an element from the priority queue immediately.
        LazyPopUniquePriorityQueue::remove_at_pop_time(): Removes an element from the priority queue when it is popped.
        LazyPopUniquePriorityQueue::empty(): Returns whether the queue is empty.
        LazyPopUniquePriorityQueue::clear(): Clears the priority queue vector and the tracking sets.
        LazyPopUniquePriorityQueue::size(): Returns the number of elements in the queue.
        LazyPopUniquePriorityQueue::contains(): Returns true if the key is in the queue, false otherwise.
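
The lazy-deletion idea can be sketched as follows. This is a simplified illustration, not the class in `lazy_pop_unique_priority_queue.h`: the name `LazyPopQueueSketch`, the use of `std::optional`, the rule that a duplicate key is ignored on push, and the way `size()` and `contains()` treat pending deletions are all assumptions made for this example. It requires C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_set>
#include <utility>
#include <vector>

template<typename K, typename SV>
class LazyPopQueueSketch {
  public:
    // Insert a key with its sort value; a key already in the queue is ignored.
    void push(const K& key, const SV& value) {
        if (!content_.insert(key).second) return;
        heap_.emplace_back(key, value);
        std::push_heap(heap_.begin(), heap_.end(), by_value);
    }

    // Mark a key so that it is discarded when it reaches the top of the heap.
    void remove_at_pop_time(const K& key) {
        if (content_.count(key)) pending_removal_.insert(key);
    }

    // Return the live entry with the highest sort value, if any.
    std::optional<std::pair<K, SV>> pop() {
        while (!heap_.empty()) {
            std::pop_heap(heap_.begin(), heap_.end(), by_value);
            std::pair<K, SV> top = heap_.back();
            heap_.pop_back();
            content_.erase(top.first);
            // Entries marked for removal are dropped here instead of returned.
            if (pending_removal_.erase(top.first) == 0) return top;
        }
        return std::nullopt;
    }

    bool contains(const K& key) const {
        return content_.count(key) != 0 && pending_removal_.count(key) == 0;
    }
    std::size_t size() const { return content_.size() - pending_removal_.size(); }
    bool empty() const { return size() == 0; }

  private:
    // Comparator for a max-heap ordered by sort value.
    static bool by_value(const std::pair<K, SV>& a, const std::pair<K, SV>& b) {
        return a.second < b.second;
    }

    std::vector<std::pair<K, SV>> heap_;
    std::unordered_set<K> content_;         // keys currently stored in heap_
    std::unordered_set<K> pending_removal_; // keys to drop at pop time
};
```

Deferring removal avoids searching the vector for an arbitrary key; the cost is paid later, when the marked entry surfaces during `pop()`.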

* [packer] recollected golden results for regression basic, basic_odin, strong, strong_odin

* [packer] recollected golden results for Nightly

* add pointer to VTR9 paper in the readme

* Add documentation to explain which parts of VPR are parallel

* pass t_chan_width by reference

* doxygen comment for alloc_and_load_rr_node_indices

* add doxygen comments for load_block_rr_indices()

* [AP][Solver] Enabled Parallel Eigen

The Eigen solver has the ability to use OpenMP to run the solver
computations in parallel. Made the AP flow use the num_workers option to
set the number of threads that Eigen can use.

VPR did not have the ability to build with OpenMP in its CMake
configuration. Added a CMake option to allow the user to enable OpenMP.

* remove unused is_flat argument from alloc_and_load_rr_node_indices() and load_block_rr_indices()

* use (x, y) convention for CHANX instead of (y, x)

* make format

* cast x/y to size_t

* get rid of warnings in RRSpatialLookup::find_nodes()

* Add references to the main VTR papers in the documentation.

* Add link to the VTR 9 paper in the documentation

* Add link to the VTR 9 paper in the README

* add a closing ) to the text printed by node_coordinate_to_string()

* fix the x/y mismatch for CHANX nodes in rr_nodes and rr_node_indices

* reserve nodes using x/y instead of chan/seg

* fix a typo

* add rr_graph_generation directory

* resize node lookup for CHANX nodes in RR graph serializer

* add rr_node_indices.cpp/.h

* add doxygen comment for load_chan_rr_indices()

* [Infra] Updated Install Packages Script For Backwards Compatibility

The install_apt_packages.sh script is no longer backward compatible with
older versions of Ubuntu due to the dependency on clang-format-18.

Added an if statement that checks whether the distribution supports
clang-format-18 and installs it only if it does.

Added this script to the CI build process so it can always be tested
within the CI to prevent future regression.

* [RegTest] Disabled `strong_multiclock` test for parallel connection router

Temporarily disabled the `strong_multiclock` test in `vtr_reg_strong` CI
regression tests for the parallel connection router, due to some random
failures as mentioned in Issue #3029.

After fixing the problem with the `strong_multiclock` test, this will be
reactivated.

* [doc] update the doc with new report format

* [RegTest] Updated golden results for `strong_multiclock` regression test

Removed the golden results of parallel connection router test cases for
`strong_multiclock` regression test.

* [vpr][analysis] use std::min/max instead of if condition

* Add documentation for include sanitization

* [vpr][analysis] change report_net_timing format to csv

* [vpr][analysis] update comments

* [vpr][analysis] print constant nets in the net timing report

* [vpr][analysis] apply comments

* [vpr][analysis] fix function name

* [doc] add net timing report use case

* fix a typo

* [Infra] Cleaned Up Include Files in VPR Base Directory

Many include files in the base directory contained includes to other
headers which they do not use. This causes many CPP files to include way
more header files than they need, increasing the incremental build time.

This process needs to be done on the entire VTR repo, but I found that
the base directory was one of the biggest culprits of this and the
hardest to untangle.

* [Infra] Cleaned Up Header Files in Pack Folder

Went through the header files in the pack folder and resolved any unused
header files.

* [AP] Removed Old Cluster-Level AP Flow

Prior to the flat AP flow, a cluster-level AP flow existed in VPR which
performed a SimPL-style algorithm on the clusters created during packing
before performing a placement quench.

Although well-written, this flow was not shown to outperform the SA
placer in VPR. It has also become confusing to keep in VPR since
the new flat AP flow supersedes it. It is unclear if a cluster-level AP
flow will work well with the flat AP flow; however, in that case the
cluster-level AP flow would be made using the new AP APIs written.

Removed the old cluster-level AP flow to reduce confusion.

* [Infra] Cleaned Up Header Files in Place Folder

* [lib][rr_graph] replace t_rr_type with e_rr_type

* [vpr][tileable] remove t_rr_type usage

* make is_io_type() a member function of t_physical_tile_type

* replace calls to is_io_type() with t_physical_tile_type::is_io()

* make format

* fix compiler bugs

* make format

* [lib][libutil] fix size_t issue

* inline t_physical_tile_type::is_io()

* add doxygen comments for alloc_and_load_tile_rr_node_indices()

* [libs][vtrutil] use generate instead of fill to avoid getting potential null pointer dereference

* document alloc_and_load_rr_node_indices() arguments

* made a few functions operating on t_pb_type its member functions

* add router_lookahead directory

* [STA] Added Multiclock Incremental STA Consistency Check

The incremental STA consistency coverage was very good, but was just
missing a multiclock circuit with an SDC file.

Added a quick test.

* [libs][rr_graph] don't reverse xy when calling node lookup

* [vpr][util] consider medium node type as inter cluster node

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: AlexandreSinger <[email protected]>
Co-authored-by: Fahrican Kosar <[email protected]>
Co-authored-by: AlexandreSinger <[email protected]>
Co-authored-by: Hang Yan <[email protected]>
Co-authored-by: Fred Tombs <[email protected]>
Co-authored-by: soheilshahrouz <[email protected]>
Co-authored-by: Soheil Shahrouz <[email protected]>
Co-authored-by: Amir Poolad <[email protected]>
Co-authored-by: Amir Poolad <[email protected]>
Co-authored-by: vaughnbetz <[email protected]>
Co-authored-by: Fred Tombs <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: James Yen <[email protected]>
Co-authored-by: Rongbo Zhang <[email protected]>
Co-authored-by: Rongbo Zhang <[email protected]>
Co-authored-by: Mohamed Elgammal <[email protected]>
* [Router] Added ConnectionRouter Abstraction and Reduced Code Duplication

Added a partial abstract class for ConnectionRouter, derived from the
pure abstract ConnectionRouterInterface.

The SerialConnectionRouter and ParallelConnectionRouter classes are now
derived from the ConnectionRouter class, utilizing the common class
members and helper functions to reduce code duplication.

* [Router] Added Code Comments and Documentation for Connection Routers

Added Doxygen-style code comments and documentation for connection
routers, including the ConnectionRouter abstract class, the Parallel-
ConnectionRouter concrete class, and the SerialConnectionRouter concrete
class.

Updated the helper messages for command-line options added for parallel
connection router.

* [Router] Fixed Interface Issues in NestedNetlistRouter and Code Formats

Fixed the interface issues of ConnectionRouter in NestedNetlistRouter.

Fixed code formats.

Fixed typo in read_options.cpp.

* [Router] Updated Command-Line Usage for Parallel Connection Router

Updated the command-line usage for parallel connection router in both
Read the Docs and read_options.cpp.

* [Router] Added Regression Tests for Parallel Connection Router

Added regression tests for parallel connection router by appending extra
sets of configurations to those VTR flow regression tests previously
selected by Fahri for testing coarse-grained parallel router.

Removed VPR connection router test (vpr/test/test_connection_router.cpp),
since it has been out-dated for a very long time and has caused lots of
trouble for running VPR C++ tests locally.

* Fixed Code Formatting Issue

Fixed a weird code formatting issue in libs/librtlnumber/src/include/
internal_bits.hpp. GitHub CI said the file failed dev/check-format.sh,
however, the same script runs perfectly in my local environment. Double
checked the version of clang-format, which seemed to be the same as CI.

Directly copied the file from the GitHub repo to resolve this issue.

* [Router] Fixed `No source in route tree` in ParallelConnectionRouter

The `No source in route tree` bug in ParallelConnectionRouter (since
commit 875b98e) has been fixed. It turns out that putting another member
variable `MultiQueueDAryHeap<HeapImplementation::arg_D> heap_` in the
derived class ParallelConnectionRouter together with the existing
`HeapImplementation heap_` in the base class ConnectionRouter causes the
issue. The solution is to keep `heap_` only in the base class and use
`ConnectionRouter<MultiQueueDAryHeap<HeapImplementation::arg_D>>` rather
than `ConnectionRouter<HeapImplementation>` for deriving the parallel
connection router.

Please note that ParallelConnectionRouter still has some bugs (i.e.,
getting stuck in the MultiQueue pop). This commit is not fully working.
Please do not use it for any experiments.

Updated the previously incorrect command-line options for the parallel
connection router in the regression tests.

* [AP][MassLegalizer] Revistited Mass Legalizer

Found that the mass legalizer was not spreading out the blocks well
enough according to the mass.

Revistied the spatial partitioning in the mass legalizer. Before, we
just cut the window in half in the larger dimension. This was fine,
however it may create an inbalanced cut which can cause things to not
spread well. Instead, we now search for the best partition by trying
different partition lines and computing how balanced the partition is.
Although this is more expensive than before, by creating more balanced
partitions, it should allow the mass legalizer to converge faster. Time
in the mass legalizer is also dominated by partitioning the blocks, so
increasing the time to choose the partition line should not have that
large of an effect anyways.

Found an oversight with how blocks were partitioned when one of the
partitions become overfilled. Fixed this issue.

* Inverse use of macro_can_be_placed argument check_all_legality to align with meaning

* [vpr][pack] fix merge issues w/ flat sync list

* make format

* [packages] add clang-format

* make format 2

* Invalid C++ fix

* [docker] set ubuntu version to 24.04

* [dockerfile] enable system-wide python package installation for pip

* [dockerfile] add comment

* [package] check whehter clang-format-18 package exist

* [package] remove deprecated names-only option

* [package] remove if condition

* [doc] update quick start on installing packages

* make format
enum class e_rr_type
a few remaining t_rr_type vals
CHANY ---> t_rr_type::CHANY
CHANX ---> t_rr_type::CHANX
OPIN ---> t_rr_type::OPIN
IPIN ---> t_rr_type::IPIN
SINK ---> t_rr_type::SINK
SOURCE ---> t_rr_type::SOURCE

* [Router] Finally fixed the weird bug in parallel connection router

Fixed the weird bug in parallel connection router as mentioned in commit
f73212c. The bug occurred because two function parameters 'num_threads'
and 'num_queues' have been misplaced when instantiating the MQ_IO. This
took two weeks to figure out exactly.

The VTR benchmark (`vtr_reg_qor_chain` task) has been tested/passed for
different cases (1) 'serial mode' 1T+2Q (1 thread, 2 queues), (2) 2T+4Q,
and (3) 4T+2Q.

The determinism has also been verified for the VTR benchmark.

* [Router] Fixed Code Review Comments and Cleanup Codebase

Added more explanation to the command-line options messages and code
comments.

Cleaned up ParallelConnectionRouter-related codebase.

* [doc] clarify that clang-format is not required to build VPR

* remove typedef t_rr_type

* doxygen comment for Direction

* add vtr::array class

* make rr_node_typename of type vtr::array to index it only with e_rr_type

* add default constructor to vtr::array

* access rr_node_indices_ with e_rr_type instead of casting to size_t

* add single argument constructor to vtr::array

* [Router] Updated Golden Results for Parallel Connection Router CI Tests

Updated the golden results for CI tests for parallel connection router:
 - `vtr_reg_strong/koios_test`
 - `vtr_reg_strong/strong_flat_router`
 - `vtr_reg_strong/strong_multiclock`
 - `vtr_reg_strong/strong_timing`

* use vtr::array to index some arrays using e_rr_type

* make format

* avoid using e_rr_type and casting it in place_macro

* [vpr][base] fix assigned pb_graph_pin when graph node is not primitive

* [vpr][pack] pass logical type to alloc_and_laod_pb_route

* [vpr][pack] update alloc_and_load_pb_route header file

* [vpr][pack] fix pb_graph_pin assignment in load_trace_to_pb_route

* [test] keep 3d sb and cb tests

* [vpr][pack] add intra_lb_pb_pin_lookup_ to cluster legalizer

* [vpr][pack] initializer intra_lb_pb_pin_lookup and pass it to alloc_and_load_pb_route

* [vpr][pack] use intra_lb_pb_pin_lookup to get pb_pin from pin number

* make format

* add vtr::array to docs

* [vpr][pack] remove casting net id

* [vpr][pack] add doxygen comment for alloc_and_load_pb_route

* [vpr][pack] remove redundant parameters

* [vpr][pack] polish load_trace_to_pb_route

* make format

* [vpr][pack] fix parameter shadowing

* [AP][HotFix] Fixed Bug With Solver Putting Blocks Off-Device

After moving fixed blocks to the center of tiles, there is a very small
chance that blocks go off the device due to rounding. This is such a
small effect that it does not show up locally on my machine, but it
shows up on CI. Clamping the positions of blocks after solving to be
just within the device region.

* Increase the daily stale issue action API call limit

* [vpr][pack] add a method to get root_ipin

* [vpr][pack] remove unused var

* [Router] Added Assert for MQ_IO numQueues and Updated Golden Results

Added assert for MultiQueueIO numQueues to ensure it must be greater
than two.

Updated CI test tasks to ensure the parallel connection router runs in
Dijkstra mode to ensure determinism and avoid hanging in CI runs.

* [AP][HotFix] Placed Fixed Blocks First During IP

The cost terms in the AP initial placer were not placing fixed blocks
early enough, causing other blocks to take their place and causing the
initial placer to not return a solution.

Blocks which have region constraints are now placed first based on how
constrained they are. More constrained blocks (can only be placed in a
smaller region) will be placed first.

Also found that macros that contained fixed blocks were not observing
these constraints when calculating the centroid position of the macro.
For constrained macros, projected the centroid position onto the
partition region to get the closest point in the partition region to the
calculated centroid. This new centroid is used to then perform the
placement.

* [STA] Added Option to Remove Parameters from Post-Implementation Netlist

When performing post-implementation timing analysis using OpenSTA, the
generated netlist cannot use parameters since each module needs to
correspond with a cell in a liberty file.

Added a command-line option which tells the netlist writer to not use
parameters when generating the netlist. If a primitive cannot be
generated without using parameters, it will error out.

* [Tatum][Parse] Fixed Extraneous Warning With get_clocks

The get_clocks command is used in an SDC file to reference a set of
clocks by name using a regex string. The code to do this tries to
produce a warning if get_clocks is used on a regex string and no clocks
could be found. The issue is that the code to do this was mistakenly
producing this warning for each clock in the circuit. For example, if we
had {clk1, clk2, clk3} and we wanted to do "get_clocks {clk3}", we will
get two warnings since clk1 and clk2 did not match.

Fixed this by moving the warning out of one loop nest.

* Remove PR staling
This commit sets the number of days before marking
issues or PRs as stale to 100 years. This number is
overriden for issues to be 1 years but stays 100 years
for PRs. This means that PR effectively do not get
marked as stale.

* [LibArchFPGA] Updating Model Data Structures

The logical models (the technology-mapped logical blocks) for an
architecture were stored using two independent linked lists. One for the
library models (the models that all architectures have, such as luts and
ffs) and one of the user models. This linked lists were hard to traverse
and were injecting pointers all across VPR.

Created a new class to store and manage the logical models. This class
maintains a unique ID for each logical model (similar to the netlist
data structures in VPR). It also contains helper methods to make working
with the logical models easier.

* fix comments from alex

* revert prepacker changes

* [vpr][pack] add get_pattern_blocks

* [vpr][pack] add blocks in get_all_connected_primitive_pins if they are a part of the pattern

* make format

* Bump libs/EXTERNAL/libcatch2 from `76f70b1` to `5abfc0a`

Bumps [libs/EXTERNAL/libcatch2](https://github.com/catchorg/Catch2) from `76f70b1` to `5abfc0a`.
- [Release notes](https://github.com/catchorg/Catch2/releases)
- [Commits](catchorg/Catch2@76f70b1...5abfc0a)

---
updated-dependencies:
- dependency-name: libs/EXTERNAL/libcatch2
  dependency-version: 5abfc0aa9c1ef4cb40c9f387495134dab02e1af2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* [vpr][pack] add more comments

* Add helper functions to t_pb_type

* Change t_pb_type users to use helper functions

* Add documentation for t_pb_type::is_root and is_primitive

* Fix formatting in libarchfpga/physical_types.h

* [vpr][pack] change count method to find

* [Router] Updated the Regression Tests and Corresponding Golden Results

Changed `multi_queue_num_threads` and `multi_queue_num_queues` settings
in the CI strong regression tests to avoid QoR failure in the CI runs.

The coverage of the regression tests for parallel connection router
after this change is still fair.

* [vpr][CLI] add generate_net_timing_report

* [vpr][route] remove debugging msg

* [vpr][analysis] add generate_net_timing_report

* [vpr][pack] apply formatting comments

* make format

* [vpr][analysis] add comments

* make format

* [vpr][CLI] remove generate net timing from CLI parameters and generate the report by default

* Unused Packer Options Cleanup (#2976)

* Standardized and renamed the packer alpha and beta variables. They are now referred to as timing_gain_weight and connection_gain_weight, used as weight parameters during timing-driven and connection-driven clustering respectively. Removed global_clocks, use_attraction_groups, pack_num_moves, pack_move_type from packer.

* [APPack] Updated Max Candidate Distance Interface

The max candidate distance is used by APPack to decide which molecules
to ignore when packing, based on their distance from the cluster being
formed.

Cleaned up the interface of this by pre-computing the max candidate
distance of all logical blocks ahead of time and reading from these
pre-computed values during packing.

Added a command-line option to allow the user to override some or all of
these max distance thresholds. By default, VPR will select values based
on the type of logical block and the primitives it contains.

Fixed issue with APPack creating too many IO blocks for some circuits
due to the max candidate distance thresholds for IO blocks being too
low.

More tuning should be done on these values once the mass legalizer has
been cleaned up a bit more.

* [vtr][parse] fix pattern for init place wl

* [vpr][analysis] add header for net timing report

* [vpr][analysis] add timing format to comments

* formatting fix

* Revert "[vpr][CLI] remove generate net timing from CLI parameters and generate the report by default"

This reverts commit b8289db.

* make format

* [STA] Generating SDC Commands Post-Implementation

Added an option to have VPR generate an SDC file containing the timing
commands required for an external timing analysis of the post-
implementation netlist to match VPR's timing analysis.

* [STA] Added Tutorial for Post-Implementation Timing Analysis

Created a tutorial demonstrating how OpenSTA can be used after VPR to
perform static timing analysis.

* Add artifact upload to nightly test workflow

* t_det_routing_arch* --> const t_det_routing_arch&

* t_chan_width_dist ---> const t_chan_width_dist&

* make format

* fix compilation error in route_diag by passing det_routing_arch argument by reference instead of pointer

* [task] add generate_net_timing_report to timing report strong test

* [doc] add doc for generating _net_timing_report command line option

* [vpr][timing] update generate_net_timing_report comment

* [vpr][timing] add get_net_bounding_box

* [vpr][timing] add net bounding box to the report

* [test] add test for net timing report

* [doc] update doc with new format to net timing report

* [vpr][analysis] fix net timing report bugs + including layer min/max of bb

* make format

* [vpr][analysis] capture vars by reference in lambda

* [packer] Changing the vector of candidate molecules into LazyPopUniquePriorityQueue.

    The class LazyPopUniquePriorityQueue is a priority queue that allows for lazy deletion of elements.
    It is implemented using a vector and 2 sets, one set keeps track of the elements in the queue, and the other set keeps track of the elements that are pending deletion.
    The queue is sorted by the sort-value(SV) of the elements, and the elements are stored in a vector.
    The set is used to keep track of the elements that are pending deletion, so that they can be removed from the queue when they are popped.
    The class definition can be found in vpr/src/util/lazy_pop_unique_priority_queue.h

    Currently, the class supports the following functions:
        LazyPopUniquePriorityQueue::push(): Pushes a key-sort-value (K-SV) pair into the priority queue and adds the key to the tracking set.
        LazyPopUniquePriorityQueue::pop(): Returns the K-SV pair with the highest SV whose key is not pending deletion.
        LazyPopUniquePriorityQueue::remove(): Removes an element from the priority queue immediately.
        LazyPopUniquePriorityQueue::remove_at_pop_time(): Removes an element from the priority queue when it is popped.
        LazyPopUniquePriorityQueue::empty(): Returns whether the queue is empty.
        LazyPopUniquePriorityQueue::clear(): Clears the priority queue vector and the tracking sets.
        LazyPopUniquePriorityQueue::size(): Returns the number of elements in the queue.
        LazyPopUniquePriorityQueue::contains(): Returns true if the key is in the queue, false otherwise.
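
A simplified sketch of the lazy-deletion idea described above, built on a heap-ordered vector and two sets. `LazyQueueSketch` is a hypothetical stand-in, not the class in `lazy_pop_unique_priority_queue.h`. The handling of keys that are pending deletion (they are not counted by `size()` or `contains()`, and cannot be re-pushed until popped) is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

template<typename K, typename SV>
class LazyQueueSketch {
  public:
    // Inserts a key with its sort value; keys already tracked are ignored.
    void push(const K& key, const SV& value) {
        if (!content_.insert(key).second) return;
        heap_.emplace_back(key, value);
        std::push_heap(heap_.begin(), heap_.end(), less_by_value);
    }

    // Returns the pair with the highest sort value whose key is not pending
    // deletion. Returns a default-constructed pair if nothing is left.
    std::pair<K, SV> pop() {
        while (!heap_.empty()) {
            std::pop_heap(heap_.begin(), heap_.end(), less_by_value);
            std::pair<K, SV> top = heap_.back();
            heap_.pop_back();
            content_.erase(top.first);
            // erase() returns 0 when the key was not marked for lazy deletion.
            if (pending_removal_.erase(top.first) == 0) return top;
        }
        return {};
    }

    // Removes the key immediately: linear scan plus heap rebuild.
    void remove(const K& key) {
        if (content_.erase(key) == 0) return;
        pending_removal_.erase(key);
        heap_.erase(std::remove_if(heap_.begin(), heap_.end(),
                                   [&key](const std::pair<K, SV>& e) { return e.first == key; }),
                    heap_.end());
        std::make_heap(heap_.begin(), heap_.end(), less_by_value);
    }

    // Marks the key so that it is dropped when it reaches the top in pop().
    void remove_at_pop_time(const K& key) {
        if (content_.count(key) != 0) pending_removal_.insert(key);
    }

    bool contains(const K& key) const {
        return content_.count(key) != 0 && pending_removal_.count(key) == 0;
    }
    std::size_t size() const { return content_.size() - pending_removal_.size(); }
    bool empty() const { return size() == 0; }
    void clear() {
        heap_.clear();
        content_.clear();
        pending_removal_.clear();
    }

  private:
    static bool less_by_value(const std::pair<K, SV>& a, const std::pair<K, SV>& b) {
        return a.second < b.second;
    }

    std::vector<std::pair<K, SV>> heap_; // max-heap ordered by sort value
    std::set<K> content_;                // every key currently stored in heap_
    std::set<K> pending_removal_;        // subset of content_ awaiting lazy deletion
};
```

The trade-off is that `remove_at_pop_time()` costs only a set insertion, while `remove()` pays for a scan and a heap rebuild up front.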

* [packer] recollected golden results for regression basic, basic_odin, strong, strong_odin

* [packer] recollected golden results for Nightly

* add pointer to VTR9 paper in the readme

* Add documentation to explain which parts of VPR are parallel

* pass t_chan_width by reference

* doxygen comment for alloc_and_load_rr_node_indices

* add doxygen comments for load_block_rr_indices()

* [AP][Solver] Enabled Parallel Eigen

The Eigen solver has the ability to use OpenMP to run the solver
computations in parallel. Made the AP flow use the num_workers option to
set the number of threads that Eigen can use.

VPR did not have the ability to build with OpenMP in its CMAKE. Added an
option to the CMAKE to allow the user to enable OpenMP.

* remove unused is_flat argument from alloc_and_load_rr_node_indices() and load_block_rr_indices()

* use (x, y) convention for CHANX instead of (y, x)

* make format

* cast x/y to size_t

* get rid of warnings in RRSpatialLookup::find_nodes()

* Add references to the main VTR papers in the documentation.

* Add link to the VTR 9 paper in the documentation

* Add link to the VTR 9 paper in the README

* add a closing ) to the text printed by node_coordinate_to_string()

* fix the x/y mismatch for CHANX nodes in rr_nodes and rr_node_indices

* reserve nodes using x/y instead of chan/seg

* fix a typo

* add rr_graph_genearion directory

* resize node lookup for CHANX nodes in RR graph serializer

* add rr_node_indices.cpp/.h

* add doxygen comment for load_chan_rr_indices()

* [Infra] Updated Install Packages Script For Backwards Compatibility

The install_apt_packages.sh script is no longer backward compatible with
older versions of Ubuntu due to the dependency on clang-format-18.

Added an if statement to check if the distribution can support
clang-format-18 and only installing it if it can.

Added this script to the CI build process so it can always be tested
within the CI to prevent future regression.

* [RegTest] Disabled `strong_multiclock` test for parallel connection router

Temporarily disabled the `strong_multiclock` test in `vtr_reg_strong` CI
regression tests for the parallel connection router, due to some random
failures as mentioned in Issue #3029.

After fixing the problem with the `strong_multiclock` test, this will be
reactivated.

* [doc] update the doc with new report format

* [RegTest] Updated golden results for `strong_multiclock` regression test

Removed the golden results of parallel connection router test cases for
`strong_multiclock` regression test.

* [vpr][analysis] use std::min/max instead of if condition

* Add documentation for include sanitization

* [vpr][analysis] change report_net_timing format to csv

* [vpr][analysis] update comments

* [vpr][analysis] print constant nets in  the net timing report

* [vpr][analysis] apply comments

* [vpr][analysis] fix function name

* [doc] add net timing report use case

* fix a typo

* [Infra] Cleaned Up Include Files in VPR Base Directory

Many include files in the base directory contained includes to other
headers which they do not use. This causes many CPP files to include way
more header files than they need, increasing the incremental build time.

This process needs to be done on the entire VTR repo, but I found that
the base directory was one of the biggest culprits of this and the
hardest to untangle.

* [Infra] Cleaned Up Header Files in Pack Folder

Went through the header files in the pack folder and resolved any unused
header files.

* [AP] Removed Old Cluster-Level AP Flow

Prior to the flat AP flow, a cluster-level AP flow existed in VPR which
performed a SimPL-style algorithm on the clusters created during packing
before performing a placement quench.

Although well-written, this flow was not shown to outperform the SA
placer in VPR. It has also been becoming confusing to keep in VPR since
the new flat AP flow supersedes it. It is unclear if a cluster-level AP
flow will work well with the flat AP flow; however in that case the
cluster-level AP flow would be made using the new AP APIs written.

Removed the old cluster-level AP flow to reduce confusion.

* [Infra] Cleaned Up Header Files in Place Folder

* [lib][rr_graph] replace t_rr_type with e_rr_type

* [vpr][tileable] remove t_rr_type usage

* make is_io_type() a member function of t_physical_tile_type

* replace calls to is_io_type() with t_physical_tile_type::is_io()

* make format

* fix compiler bugs

* make format

* [lib][libutil] fix size_t issue

* inline  t_physical_tile_type::is_io()

* add doxygen comments for alloc_and_load_tile_rr_node_indices()

* [libs][vtrutil] use generate instead of fill to avoid getting potential null pointer dereference

* document alloc_and_load_rr_node_indices() arguments

* made a few functions operating on t_pb_type its member functions

* add router_lookahead directory

* [STA] Added Multiclock Incremental STA Consistency Check

The incremental STA consistency coverage was very good, but was just
missing a multiclock circuit with an SDC file.

Added a quick test.

* [libs][rr_graph] don't reverse xy when calling node lookup

* [vpr][util] consider medium node type as inter cluster node

* [Infra] Cleaned Up Header Files in Route Folder

Continued the header file cleanup effort in the route folder.

Some of these files may need to be revisited in more detail, but got
some of the major header include issues.

Found that some definitions were in the wrong place, so moved them to
the correct implementation file.

* [Infra] Updated Header Files Based on Comments

Moved to pragma once semantics and cleaned up some less than ideal
code.

* [vpr][tileable] use is_io in t_physcial_tile

* [vpr][route] update rr node indices to include medium type

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Hang Yan <[email protected]>
Co-authored-by: AlexandreSinger <[email protected]>
Co-authored-by: Fred Tombs <[email protected]>
Co-authored-by: soheilshahrouz <[email protected]>
Co-authored-by: AlexandreSinger <[email protected]>
Co-authored-by: Soheil Shahrouz <[email protected]>
Co-authored-by: Amir Poolad <[email protected]>
Co-authored-by: Amir Poolad <[email protected]>
Co-authored-by: vaughnbetz <[email protected]>
Co-authored-by: Fred Tombs <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: James Yen <[email protected]>
Co-authored-by: Rongbo Zhang <[email protected]>
Co-authored-by: Rongbo Zhang <[email protected]>
Co-authored-by: Mohamed Elgammal <[email protected]>
* pass t_chan_width by reference

* doxygen comment for alloc_and_load_rr_node_indices

* add doxygen comments for load_block_rr_indices()

* [AP][Solver] Enabled Parallel Eigen

The Eigen solver has the ability to use OpenMP to run the solver
computations in parallel. Made the AP flow use the num_workers option to
set the number of threads that Eigen can use.

VPR did not have the ability to build with OpenMP in its CMAKE. Added an
option to the CMAKE to allow the user to enable OpenMP.

* undid a couple changes to fix odin memory leak

* [vpr][place] expand search range if block is io

* Revert "[vpr][place] expand search range if block is io"

This reverts commit af29e9d.

* Revert "make format"

This reverts commit caaf456.

* Revert "[vpr][place] move MIN_BLK_PER_COLUMN_EXPAND into the routine"

This reverts commit 4a6e333.

* Revert "make format"

This reverts commit 96e9cc5.

* Revert "[vpr][place] pass block_constraint parameter to relevant functions in initial placement to prevent search range to be adjusted"

This reverts commit ade994b.

* Revert "[vpr][place] use a constexpr to compare the number of blocks in column"

This reverts commit dae25cc.

* Revert "[vpr][place] remove adjust search range and adjust it inside find_compatible_compressed_loc_in_range"

This reverts commit f9e8517.

* [vpr][place] move adjust_search_range to move_utils.h so it can be accessed by initial placement

* [vpr][place] add adjust search range to find centroid neighbour

* remove unused is_flat argument from alloc_and_load_rr_node_indices() and load_block_rr_indices()

* use (x, y) convention for CHANX instead of (y, x)

* make format

* cast x/y to size_t

* get rid of warnings in RRSpatialLookup::find_nodes()

* Add references to the main VTR papers in the documentation.

* Add link to the VTR 9 paper in the documentation

* Add link to the VTR 9 paper in the README

* [vpr][place] remove a special case in find_compatible_compressed_loc_in_range

* add a closing ) to the text printed by node_coordinate_to_string()

* fix the x/y mismatch for CHANX nodes in rr_nodes and rr_node_indices

* reserve nodes using x/y instead of chan/seg

* fix a typo

* add rr_graph_genearion directory

* resize node lookup for CHANX nodes in RR graph serializer

* add rr_node_indices.cpp/.h

* add doxygen comment for load_chan_rr_indices()

* [Infra] Updated Install Packages Script For Backwards Compatibility

The install_apt_packages.sh script is no longer backward compatible with
older versions of Ubuntu due to the dependency on clang-format-18.

Added an if statement to check if the distribution can support
clang-format-18 and only installing it if it can.

Added this script to the CI build process so it can always be tested
within the CI to prevent future regression.

* [RegTest] Disabled `strong_multiclock` test for parallel connection router

Temporarily disabled the `strong_multiclock` test in `vtr_reg_strong` CI
regression tests for the parallel connection router, due to some random
failures as mentioned in Issue #3029.

After fixing the problem with the `strong_multiclock` test, this will be
reactivated.

* [doc] update the doc with new report format

* [RegTest] Updated golden results for `strong_multiclock` regression test

Removed the golden results of parallel connection router test cases for
`strong_multiclock` regression test.

* [vpr][analysis] use std::min/max instead of if condition

* Add documentation for include sanitization

* [vpr][analysis] change report_net_timing format to csv

* [vpr][analysis] update comments

* [vpr][analysis] print constant nets in  the net timing report

* [vpr][analysis] apply comments

* [vpr][analysis] fix function name

* [doc] add net timing report use case

* fix a typo

* [Infra] Cleaned Up Include Files in VPR Base Directory

Many include files in the base directory contained includes to other
headers which they do not use. This causes many CPP files to include way
more header files than they need, increasing the incremental build time.

This process needs to be done on the entire VTR repo, but I found that
the base directory was one of the biggest culprits of this and the
hardest to untangle.

* [FGParallelRouter] Updated Barrier to C++20 Std Barrier

The fine-grained parallel router was originally built before VTR
upgraded to C++20, so we had to roll our own barrier. We originally had
two barriers: spin barriers (threads spin on a lock while waiting) and a
"mutex" barrier (where threads wait on a condition variable and
potentially go to sleep).

Through experimentation, found that the choice of barrier implementation
did not matter; however, the standard barrier provides slight
performance improvements for very long routes and has a much cleaner
interface.

Moved the FG parallel router to the standard barrier. The old
implementations are left in as classes in case c++20 is not preferred
for some users.

Also added a QoR script to make parsing FG parallel router runs easier.

* [Infra] Cleaned Up Header Files in Pack Folder

Went through the header files in the pack folder and resolved any unused
header files.

* [AP] Removed Old Cluster-Level AP Flow

Prior to the flat AP flow, a cluster-level AP flow existed in VPR which
performed a SimPL-style algorithm on the clusters created during packing
before performing a placement quench.

Although well-written, this flow was not shown to outperform the SA
placer in VPR. It has also been becoming confusing to keep in VPR since
the new flat AP flow supersedes it. It is unclear if a cluster-level AP
flow will work well with the flat AP flow; however in that case the
cluster-level AP flow would be made using the new AP APIs written.

Removed the old cluster-level AP flow to reduce confusion.

* [Infra] Cleaned Up Header Files in Place Folder

* make is_io_type() a member function of t_physical_tile_type

* replace calls to is_io_type() with t_physical_tile_type::is_io()

* make format

* inline  t_physical_tile_type::is_io()

* add doxygen comments for alloc_and_load_tile_rr_node_indices()

* document alloc_and_load_rr_node_indices() arguments

* made a few functions operating on t_pb_type its member functions

* add router_lookahead directory

* [STA] Added Multiclock Incremental STA Consistency Check

The incremental STA consistency coverage was very good, but was just
missing a multiclock circuit with an SDC file.

Added a quick test.

* add show-resource-usage mode

* add --show_resource_usage to command_line_usage.rst

* run 'make format'

* fix drawing contour style in draw_crit_path_elements

* make format

* fixes in VPR Viewer for flat_routing=on

* fix build errors after cherry-pick

* remove inner //hotfix-vpr-flat-routing-viewer mark to separate one hotfix from another

* [Infra] Cleaned Up Header Files in Route Folder

Continued the header file cleanup effort in the route folder.

Some of these files may need to be revisited in more detail, but got
some of the major header include issues.

Found that some definitions were in the wrong place, so moved them to
the correct implementation file.

* [Infra] Updated Header Files Based on Comments

Moved to pragma once semantics and cleaned up some less than ideal
code.

* make format

* [Infra] Cleaned Up Includes in Analysis, Power, and Util Dirs

Continued the cleanup into the analysis, power, and util directories.

Nothing majorly changed.

* [FASM] Fixed Bug With Wire Creation

Found a bug within FASM's wire generation where it uses the index of the
output pin to create the wire instead of the index of the input pin.

This stemmed from some confusing code which both verified that the wire
was being used and returning the first valid pin. It just so happens
that it checked the outputs first and returned the output pin instead.

Cleaned up the code and added more error checking to prevent issues like
this in the future.

* [FASM] Updated Documentation Based on PR Review

* replace keyword `auto` with specific type

* Change GreedySeedSelector to work with molecules instead of atoms

* [Infra] Cleaned Up Includes in Draw Dir

Cleaned up the includes in the draw files. These ones were much messier
than I originally thought. Many of the header files in the draw
directory included way more than they needed which was causing false
dependencies anywhere in VPR which included any draw files.

* replaced t_clock_arch with std::shared_ptr<std::vector<t_clock_network>>

* rename vpr_show_resource_usage to vpr_print_arch_resources to avoid confusion with the existing print_resource_usage(). The new name explains the function flow more clearly.

* remove unnecessary shortcut std::string device_layout_variant = l.name in vpr_print_arch_resources()

* remove "auto num_instances = 0;" in vpr_print_arch_resources

* replace auto with specific type in expression "for (const auto equivalent_tile : type.equivalent_tiles"

* get rid of keyword auto in vpr_print_arch_resources method

* Added EZGL docs under API

* fix 'possibly dangling reference to a temporary'

* refactor server/gateio module. The idea is to include any sockpp header directly in the gateio.cpp translation unit. This helps avoid win32 enum name collisions with enums defined in VTR if gateio.h is included.

* make format

* [CI] Added Serial Execution Engine Test

Since the CI always installs oneTBB and the execution engine is set to
auto, I found that the CI always tested with the tbb execution engine.
Some users may not have oneTBB installed for one reason or another and
we need to ensure that VTR always builds.

Added a CI test which sets the parallel execution engine to serial for
Tatum and VPR.

* [CI] Removed Redundant Warning Test

Prior to the updates to the CI to make all regression tests warning
clean, there was another warning test which was not as comprehensive as
the tests we have now.

Since this test was superseded, removed it from the CI. The CMAKE param
that enabled it was also used and replaced with a more comprehensive
CMAKE option.

* Have readthedocs install base requirements.txt packages

* Replaced prop and value in t_pin_to_pin_annotation with std::pair

* replaced char[level+1] with std::string

* [Infra] The Big VPR Pragma Commit

VPR is moving to a style that uses "#pragma once" instead of header
guards. These are less error prone and may be slightly more performant.

Converted all of the header guards in VPR into pragma once's.

Also moved all pragma onces to the top of all header files to maintain a
consistent style. It is a good idea to have them as the very first line
in all header files.

While going through all header files, cleaned up any extra header
includes which were including things they did not need.

* Added basic information on building with debug information and turning on verbosity to the developer guide.

* [HotFix] Fix Failing Python Formatting Check

A failing python formatting check got into Master. Fixing it.

* [libs][utils] remove redundant helper functions

* remove is_net_unrouted, replace it with more appropriate logic

* [libs][physical_types] add is_io to logical block

* [vpr][place] use logic block is_io

* move is_net_fully_absorbed to route_utils.h

* changed calloc to new

* fixed memory leak

* [vpr][place] update initial placement to limit set search range based on placement constraints

* [vpr][place] adjust search range if number of blocks in the column is less than a certain number

* [vpr][place] pass search_range by value

* make format

* Clean up the usage tracking in grid blocks.

* [CI] Consolidated Build Variation Tests

Different build variations of VTR were being run on different CI runners
which was wasteful.

Consolidated these build variations into a single job which will run on
a single runner.

* revert bool is_net_routed(ParentNetId net_id); since using net_stats is not robust for stages other than routing. add doxygen doc for is_net_routed and is_net_fully_absorbed

* make format

* added default constructors

* Documenting get_usage method in grid blocks

* fixed build issues and improved pair for loops

* Upload vpr.out in nightly_test_manual artifacts

* [vpr][place] replace auto key word with variable type

* Clean up auto types from grid blocks

* [Infra] Cleaned Up Includes in Non-External Libs

Updated the header files in the non-external libraries of VTR such that
they use pragma once instead of ifdefs and removed false include files.

* [Router] Fixed the Segfault Bug in Parallel Connection Router

Fixed #3029.

Switched from detaching helper threads to joining threads in parallel
connection router to ensure that helper threads terminate before main
thread destroys the parallel connection router object.
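
The lifetime issue behind this fix can be sketched as follows: helper threads that touch members of an owning object must be joined before that object is destroyed. `HelperPool` is a hypothetical, heavily simplified stand-in for the router object, not VPR code.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

class HelperPool {
  public:
    explicit HelperPool(std::size_t num_helpers) {
        for (std::size_t i = 0; i < num_helpers; ++i) {
            // Each helper reads and writes members of *this, so it must not
            // outlive the object. A detached thread gives no such guarantee.
            helpers_.emplace_back([this]() {
                while (!stop_.load(std::memory_order_acquire)) {
                    std::this_thread::yield();
                }
                finished_.fetch_add(1, std::memory_order_relaxed);
            });
        }
    }

    // Joining in the destructor ensures every helper has terminated before
    // the members it uses are destroyed.
    ~HelperPool() { shutdown(); }

    // Signals all helpers to stop and waits for them; safe to call twice.
    void shutdown() {
        stop_.store(true, std::memory_order_release);
        for (std::thread& h : helpers_) {
            if (h.joinable()) h.join();
        }
    }

    std::size_t finished() const { return finished_.load(std::memory_order_relaxed); }

  private:
    std::atomic<bool> stop_{false};
    std::atomic<std::size_t> finished_{0};
    std::vector<std::thread> helpers_;
};
```

With `detach()` in place of `join()`, the destructor could return while a helper is still inside its loop, which is the kind of use-after-destruction that can surface as an intermittent segfault.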

* [AP][Legalizer] Added Ability to Generate a Mass Report

While working on the mass abstraction in the partial legalizer of the
global placer, found that I needed a lot more information on the device
to be able to debug the mass calculations.

Added a command-line option which will generate a mass report if
requested. This mass report contains useful information on the device,
the netlist, and the mass / capacities computed in the mass calculator.

* [AP][Legalizer] Updated Mass Report Based on PR Comments

* renamed pairs and replaced .first and .second with ={} notation

* removed unnecessary if statements to check for nullptr

* moved loop variable definitions to loop

* make format

* [AP][Timing] Used Flat Placement Info to Compute Setup Criticalities

When timing analysis was turned on for AP, we originally only used the
pre-cluster timing analyzer which was very high-level and inaccurate. It
practically just counted the number of hops between launch and capture
registers to approximate criticality.

Improved this by using flat placement information provided by AP.

During global placement, the criticality of all edges is recomputed
using the upper bound solution from the prior iteration of GP. The place
delay model from the placement flow was used to get a mostly accurate
delay estimation for distances between tiles. The slacks computed each
GP iteration are used to update the net weights between iterations to
better optimize CPD and sTNS.

This improved estimation of setup slacks is then passed into the full
legalizer, where it is then used by the packer to better pack critical
atoms together.

This change required some changes to the APNetlist. Notably, we need all
atom nets to be located somewhere in the AP netlist such that their
delays can be calculated properly. Instead of removing nets we do not
care about for AP, marked them as ignored.

* [AP][Timing] Updated Comments for Timing

* [AP][Timing] More Updates to Comments

* [Infra] Fixed False Forward Declarations

The Clang builds were warning that there were several forward
declarations of structs which were supposed to be classes and
vice-versa.

This is not necessarily a problem since in C++ classes and structs end
up being basically the same from the compiler's perspective, but it's
still incorrect. Fixed the cases I could see in the Clang builds.
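
As a small illustration of the kind of mismatch described above, a forward declaration should use the same class-key (`struct` or `class`) as the definition. Clang reports mismatches under `-Wmismatched-tags`. The type names below are hypothetical.

```cpp
#include <string>
#include <utility>

// Forward declarations that match the definitions further down.
struct NodeInfo; // defined with 'struct'
class NodeTable; // defined with 'class'

// Needs only the forward declarations because it takes references.
std::string describe(const NodeTable& table, const NodeInfo& info);

struct NodeInfo {
    int id = 0;
};

class NodeTable {
  public:
    explicit NodeTable(std::string name)
        : name_(std::move(name)) {}
    const std::string& name() const { return name_; }

  private:
    std::string name_;
};

std::string describe(const NodeTable& table, const NodeInfo& info) {
    return table.name() + "#" + std::to_string(info.id);
}
```

Writing `class NodeInfo;` in the forward declaration would still compile, but it would no longer agree with the `struct` definition, which is what the warning flags.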

* [test] update golden results

* update golden results

* Bump libs/EXTERNAL/libcatch2 from `5abfc0a` to `74fcff6`

Bumps [libs/EXTERNAL/libcatch2](https://github.com/catchorg/Catch2) from `5abfc0a` to `74fcff6`.
- [Release notes](https://github.com/catchorg/Catch2/releases)
- [Commits](catchorg/Catch2@5abfc0a...74fcff6)

---
updated-dependencies:
- dependency-name: libs/EXTERNAL/libcatch2
  dependency-version: 74fcff6e5b190fb833a231b7f7c1829e3c3ac54d
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* [test] update golden result

* [test][nightly_test_3] change seed from 5 to 3

* [Infra] Converted Pin to Pin Annotations into Vector

Pin to Pin annotations were stored as C-style arrays which creates
confusing pointers around VTR.

Converted to a standard vector.

* Improve makefile build types documentation

* [STA] Fixed Visual Bug in Post-Implementation SDC

While presenting my tutorial on post-implementation timing analysis, I
found that the SDC file generated did not look quite right. It was
functionally correct, but some of the new-line characters were missing.

Added the missing new line characters.

* [STA] Added Tutorial Video to Timing Analysis Tutorial

* [CI] Added Quick Titanium S10 Tests

The titanium benchmarks were not being tested by the CI. Added the
Titanium benchmarks which could be run in about 2 hours or less to
NightlyTest7.

5 circuits in this benchmark set currently fail through VTR. The
failures are mainly in the initial placer, which is struggling to create
an initial placement when logical blocks can be placed into different
physical block types which are constrained resources.

* [libs][libarch] add reverse map for pin_physical_num to pb_pin

* [vpr][util] add get_atom_pin_rr_node_id

* [vpr][utils] add comment for get_atom_pin_rr_node_id

* make format

* [libs][archfpga] use pb_pin_to_pin_num to return pb_pin physical_num

* Revert "[libs][archfpga] use pb_pin_to_pin_num to return pb_pin physical_num"

This reverts commit ffe3c7c.

* Update golden results

* apply review comments

* fixed duplicate items

* [AP] Optimized Primitive Vector Class

The primitive vector class was assumed to be quite sparse, and as such
used an unordered map as its internal data structure.

Found through experimentation that most of the time in the partial
legalizer was being spent in the operations of the primitive vector
class. Also, while improving the mass abstraction, I found a need to
separate logical models from the dimensions of the primitive vector to
allow multiple models to point to the same dimension in the primitive
vector.

This PR kills two birds with one stone by turning the unordered map into
a VTR vector map and creating a new PrimitiveVectorDim which can allow
the models to be separate from the dimensions.

Future PRs will make use of this feature to improve the mass
abstraction.
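
A rough sketch of the two ideas in this commit: a dense, index-addressed vector in place of an unordered map, and a separate model-to-dimension lookup so that several models can share one dimension. All names here (`DenseMassVector`, `total_mass`, `model_to_dim`) are hypothetical and do not reflect the actual `PrimitiveVector` or `PrimitiveVectorDim` interfaces.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical dense vector of per-dimension values. Dimensions are small
// integer indices, so a lookup is a bounds check plus an array access
// instead of a hash-table probe.
class DenseMassVector {
  public:
    void add(std::size_t dim, float val) {
        if (dim >= values_.size()) values_.resize(dim + 1, 0.0f);
        values_[dim] += val;
    }

    // Dimensions that were never written read as zero.
    float get(std::size_t dim) const {
        return dim < values_.size() ? values_[dim] : 0.0f;
    }

  private:
    std::vector<float> values_;
};

// Hypothetical helper: `model_to_dim[m]` is the dimension used by model m.
// Several models may map to the same dimension. Each block contributes one
// unit of mass to the dimension of its model.
inline DenseMassVector total_mass(const std::vector<std::size_t>& model_to_dim,
                                  const std::vector<std::size_t>& block_models) {
    DenseMassVector mass;
    for (std::size_t model : block_models) {
        mass.add(model_to_dim.at(model), 1.0f);
    }
    return mass;
}
```

For example, with `model_to_dim = {0, 0, 1}`, models 0 and 1 accumulate into the same dimension while model 2 gets its own.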

* Added --read_initial_place_file and clarified options that took clustered placement file formats.

* Clarified that --write_initial_place_file is for a clustered placement.

* deleted comment

* Update ezgl to use submodules

* documentation fix

* [lib][libarch] add sstream lib

* fix compile errors

* [CI] Added Test Suite Verification to CI

Found that we were regressing on many features in VTR due to tasks being
added to the appropriate test suite directory, but not being included in
the necessary task list. As such, it appeared as though the tests were
being run, but in reality they were not.

Added a script which will be run by the CI which will verify that all of
the test suites that we care about have all their tasks in the
appropriate task list.

From this tool, found many tasks which were not in the task lists.
Marked these tasks as "ignored" for now. These should be handled in a
separate PR.

* update libcatch2

* [test][ap] fix config file

* remove redundant declarations

* [libs][rr_graph] add rr_graph def to fwd

* [libs][capnp] fix a typo in cmake file

* [libs][arch] remove redundant code

* [vtr_flow][task] fix config file

* [test][strong] fix tileable test

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: soheilshahrouz <[email protected]>
Co-authored-by: AlexandreSinger <[email protected]>
Co-authored-by: AlexandreSinger <[email protected]>
Co-authored-by: SamuelHo10 <[email protected]>
Co-authored-by: mohamedElgammal <[email protected]>
Co-authored-by: vaughnbetz <[email protected]>
Co-authored-by: Hang Yan <[email protected]>
Co-authored-by: Amir Poolad <[email protected]>
Co-authored-by: Soheil Shahrouz <[email protected]>
Co-authored-by: Oleksandr <[email protected]>
Co-authored-by: Amir Poolad <[email protected]>
Co-authored-by: w0lek <[email protected]>
Co-authored-by: Samuel <[email protected]>
Co-authored-by: Jeff Goeders <[email protected]>
Co-authored-by: Vaughn Betz <[email protected]>
Co-authored-by: haydar-c <[email protected]>
Co-authored-by: haydar-c <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Labels
build Build system docs Documentation external_libs infra Project Infrastructure lang-cpp C/C++ code lang-hdl Hardware Description Language (Verilog/VHDL) lang-make CMake/Make code lang-netlist lang-python Python code lang-shell Shell scripts (bash etc.) libarchfpga Library for handling FPGA Architecture descriptions liblog libpugiutil libvtrutil Odin Odin II Logic Synthesis Tool: Unsorted item Parmys scripts Utility & Infrastructure scripts VPR VPR FPGA Placement & Routing Tool