

AlexandreSinger
Contributor

I found that the annealer's current method of estimating the initial temperature produces a temperature that is too high when the initial placement is of very good quality (for example, after AP).

Added a new way of estimating the starting temperature: setting it to an estimate of the equilibrium temperature, which is the temperature at which the change in cost after an annealing iteration would be 0.
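The description defines the target temperature but not how it is found. One possible reading, sketched here under stated assumptions: sample the cost deltas of a batch of trial moves on the initial placement, assume the standard Metropolis acceptance rule, and search for the temperature at which the expected net cost change over those moves is zero. The function names are hypothetical (not VPR's API), and this is not necessarily how the PR computes the estimate.

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <vector>

// Expected net change in cost if each sampled move is proposed once at
// temperature t, using the Metropolis rule (assumed): improving moves
// (delta <= 0) are always accepted; worsening moves are accepted with
// probability exp(-delta / t).
double expected_cost_change(const std::vector<double>& deltas, double t) {
    double sum = 0.0;
    for (double d : deltas) {
        if (d <= 0.0) {
            sum += d;
        } else if (t > 0.0) {
            sum += d * std::exp(-d / t);
        }
    }
    return sum;
}

// Hypothetical helper: find t where the expected change in cost is zero.
// expected_cost_change() is non-decreasing in t and tends to the plain sum of
// the deltas as t grows, so a root exists only if that sum is positive (the
// sampled moves worsen the cost on net, as expected for a good placement).
std::optional<double> estimate_equilibrium_temperature(
        const std::vector<double>& deltas, int iterations = 100) {
    assert(!deltas.empty());
    double total = 0.0;
    for (double d : deltas) total += d;
    if (total <= 0.0) return std::nullopt;  // no finite equilibrium; caller falls back

    // Bracket the root by doubling the upper bound until the change is positive.
    double t_lo = 0.0;
    double t_hi = 1.0;
    for (int i = 0; i < 2000 && expected_cost_change(deltas, t_hi) <= 0.0; ++i) {
        t_hi *= 2.0;
    }

    // Bisection, maintaining f(t_lo) <= 0 < f(t_hi).
    for (int i = 0; i < iterations; ++i) {
        const double t_mid = 0.5 * (t_lo + t_hi);
        if (expected_cost_change(deltas, t_mid) > 0.0) {
            t_hi = t_mid;
        } else {
            t_lo = t_mid;
        }
    }
    return 0.5 * (t_lo + t_hi);
}
```

For example, with deltas {-1, 2} the root solves 2e^(-2/T) = 1, so T = 2 / ln 2 ≈ 2.885. Returning std::nullopt when the deltas sum to zero or less is a choice of this sketch that lets a caller fall back to another estimator.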

The old way (using the variance of the change in cost) is still the default; however, the new method can be turned on from the command line.
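For comparison, a minimal sketch of a variance-based estimate, assuming the classic VPR recipe: take the standard deviation of the cost deltas from a batch of trial moves and multiply it by a fixed factor (20 is the factor usually cited for the original VPR placer). The function name, the factor, and the use of the sample variance are assumptions; the current VPR code may differ.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: starting temperature = scale * standard deviation of
// the sampled cost deltas. The default scale of 20 is an assumption.
double estimate_variance_temperature(const std::vector<double>& deltas,
                                     double scale = 20.0) {
    assert(deltas.size() >= 2);
    double mean = 0.0;
    for (double d : deltas) mean += d;
    mean /= static_cast<double>(deltas.size());

    double var = 0.0;
    for (double d : deltas) var += (d - mean) * (d - mean);
    var /= static_cast<double>(deltas.size() - 1);  // sample variance (assumed)

    return scale * std::sqrt(var);
}
```

The result grows linearly with both the spread of the deltas and the fixed factor, which is the kind of hand-tuned constant the equilibrium estimate does not need.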

github-actions bot added the VPR (VPR FPGA Placement & Routing Tool), docs (Documentation), and lang-cpp (C/C++ code) labels on Sep 10, 2025
@AlexandreSinger
Contributor Author

Results on Titan (titan quick qor). The baseline uses the original cost-variance approach (the current default); the other column uses my new equilibrium option (a single command-line change):

Metric | baseline.txt | equilibrium.txt
vtr_flow_elapsed_time | 1 | 0.91
num_LAB | 1 | 1
num_DSP | 1 | 1
num_M9K | 1 | 1
num_M144K | 1 | 1
max_vpr_mem | 1 | 1.00
num_pre_packed_blocks | 1 | 1
num_post_packed_blocks | 1 | 1
device_grid_tiles | 1 | 1
pack_time | 1 | 0.99
placed_wirelength_est | 1 | 1.01
place_time | 1 | 0.85
placed_CPD_est | 1 | 0.99
routed_wirelength | 1 | 1.01
critical_path_delay | 1 | 0.99
geomean_nonvirtual_intradomain_critical_path_delay | 1 | 1.00
crit_path_route_time | 1 | 1.01

Overall, these changes reduced place time by about 15% (and total flow time by about 9%) and improved CPD by 1%, at the cost of 1% more wirelength. That's a very good tradeoff in my opinion! I expect the results to be even better with AP.
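Each entry above is normalized to the baseline, so 0.85 for place_time reads as a 15% reduction. The comment does not say how per-circuit results are aggregated; assuming a geometric mean of per-circuit ratios (a common way to summarize QoR across a benchmark suite), the computation looks roughly like this. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of the per-circuit ratios candidate[i] / baseline[i].
// Averaging in log space makes a 2x slowdown and a 2x speedup cancel out.
double geomean_ratio(const std::vector<double>& baseline,
                     const std::vector<double>& candidate) {
    assert(!baseline.empty() && baseline.size() == candidate.size());
    double log_sum = 0.0;
    for (std::size_t i = 0; i < baseline.size(); ++i) {
        log_sum += std::log(candidate[i] / baseline[i]);
    }
    return std::exp(log_sum / static_cast<double>(baseline.size()));
}

// Convert a normalized ratio to a percent change (negative means a reduction).
double percent_change(double ratio) { return (ratio - 1.0) * 100.0; }
```

For example, percent_change(0.85) gives -15 (up to floating-point rounding), the place time reduction quoted above.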

Raw results:
comparison_output.xlsx

Looking at directrf:
Cost variance:
[Screenshot from 2025-09-11 16-33-48]

Equilibrium:
[Screenshot from 2025-09-11 16-34-26]

We can see that this new equilibrium approach is achieving its goal of not setting the temperature too high.

@vaughnbetz What do you think? I don't think we should make this the default yet, but this at least demonstrates the value of the approach.

@AmirhosseinPoolad FYI

@vaughnb-cerebras

Definitely looks promising!

@AlexandreSinger
Contributor Author

@soheilshahrouz wondered whether the gains we are seeing come simply from this new estimator always scaling down the initial temperature, in which case we could get the same results by just scaling down the initial temperature.

To address that point, I collected the initial temperature each estimator produced for each circuit:

Circuit | Baseline Temp | Equilibrium Temp (est.) | Ratio (Equil / Baseline)
bitcoin_miner_stratixiv_arch_timing.blif | 7.90E-04 | 3.10E-05 | 0.04
bitonic_mesh_stratixiv_arch_timing.blif | 7.10E-04 | 1.60E-04 | 0.23
cholesky_bdti_stratixiv_arch_timing.blif | 6.10E-04 | 9.90E-05 | 0.16
cholesky_mc_stratixiv_arch_timing.blif | 7.00E-04 | 2.30E-04 | 0.33
dart_stratixiv_arch_timing.blif | 5.20E-04 | 1.10E-04 | 0.21
denoise_stratixiv_arch_timing.blif | 6.00E-04 | 6.80E-05 | 0.11
des90_stratixiv_arch_timing.blif | 6.50E-04 | 2.40E-04 | 0.37
directrf_stratixiv_arch_timing.blif | 9.30E-04 | 2.40E-05 | 0.03
gsm_switch_stratixiv_arch_timing.blif | 7.10E-04 | 5.70E-05 | 0.08
LU230_stratixiv_arch_timing.blif | 7.50E-04 | 7.30E-05 | 0.10
LU_Network_stratixiv_arch_timing.blif | 8.60E-04 | 4.20E-05 | 0.05
mes_noc_stratixiv_arch_timing.blif | 4.90E-04 | 2.80E-05 | 0.06
minres_stratixiv_arch_timing.blif | 8.10E-04 | 1.80E-04 | 0.22
neuron_stratixiv_arch_timing.blif | 6.80E-04 | 4.20E-04 | 0.62
openCV_stratixiv_arch_timing.blif | 7.90E-04 | 2.00E-04 | 0.25
segmentation_stratixiv_arch_timing.blif | 9.30E-04 | 2.40E-04 | 0.26
SLAM_spheric_stratixiv_arch_timing.blif | 4.60E-04 | 1.20E-04 | 0.26
sparcT1_chip2_stratixiv_arch_timing.blif | 7.40E-04 | 3.40E-05 | 0.05
sparcT1_core_stratixiv_arch_timing.blif | 4.60E-04 | 1.50E-04 | 0.33
sparcT2_core_stratixiv_arch_timing.blif | 4.10E-04 | 3.50E-05 | 0.09
stap_qrd_stratixiv_arch_timing.blif | 6.60E-04 | 5.80E-05 | 0.09
stereo_vision_stratixiv_arch_timing.blif | 8.10E-04 | 4.40E-04 | 0.54
AVERAGE (arithmetic mean) | | | 0.20

Although the temperature was reduced by about 5x on average (mean ratio 0.20), the per-circuit ratio varies widely, from 0.03 for directrf (more than 30x lower) to 0.62 for neuron (about 1.6x lower). This demonstrates that the new approach adapts to each circuit's initial placement rather than applying a uniform scale-down.
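The spread claimed above can be checked directly from the table. A small sketch that recomputes each ratio from the raw temperatures (circuit names shortened by dropping the "_stratixiv_arch_timing.blif" suffix) and reports the arithmetic mean and the extremes; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct TempPair {
    std::string circuit;  // Titan circuit, suffix dropped
    double baseline;      // initial temperature from the cost-variance estimator
    double equilibrium;   // initial temperature from the equilibrium estimator
};

// Values copied from the table above.
const std::vector<TempPair> kTitanTemps = {
    {"bitcoin_miner", 7.90e-4, 3.10e-5}, {"bitonic_mesh", 7.10e-4, 1.60e-4},
    {"cholesky_bdti", 6.10e-4, 9.90e-5}, {"cholesky_mc", 7.00e-4, 2.30e-4},
    {"dart", 5.20e-4, 1.10e-4},          {"denoise", 6.00e-4, 6.80e-5},
    {"des90", 6.50e-4, 2.40e-4},         {"directrf", 9.30e-4, 2.40e-5},
    {"gsm_switch", 7.10e-4, 5.70e-5},    {"LU230", 7.50e-4, 7.30e-5},
    {"LU_Network", 8.60e-4, 4.20e-5},    {"mes_noc", 4.90e-4, 2.80e-5},
    {"minres", 8.10e-4, 1.80e-4},        {"neuron", 6.80e-4, 4.20e-4},
    {"openCV", 7.90e-4, 2.00e-4},        {"segmentation", 9.30e-4, 2.40e-4},
    {"SLAM_spheric", 4.60e-4, 1.20e-4},  {"sparcT1_chip2", 7.40e-4, 3.40e-5},
    {"sparcT1_core", 4.60e-4, 1.50e-4},  {"sparcT2_core", 4.10e-4, 3.50e-5},
    {"stap_qrd", 6.60e-4, 5.80e-5},      {"stereo_vision", 8.10e-4, 4.40e-4},
};

struct RatioSummary {
    double mean_ratio;
    std::string min_circuit;
    double min_ratio;
    std::string max_circuit;
    double max_ratio;
};

// Arithmetic mean and extremes of the per-circuit ratio equilibrium / baseline.
RatioSummary summarize_ratios(const std::vector<TempPair>& temps) {
    assert(!temps.empty());
    RatioSummary s{0.0, "", 1e300, "", -1e300};
    for (const TempPair& p : temps) {
        const double r = p.equilibrium / p.baseline;
        s.mean_ratio += r;
        if (r < s.min_ratio) { s.min_ratio = r; s.min_circuit = p.circuit; }
        if (r > s.max_ratio) { s.max_ratio = r; s.max_circuit = p.circuit; }
    }
    s.mean_ratio /= static_cast<double>(temps.size());
    return s;
}
```

With these values the mean ratio comes out at about 0.20, with directrf at the low end and neuron at the high end, matching the rounded ratios in the table.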

One big bonus of this new approach is that it has no magic scaling factors, which makes it more automatic: we will not need to keep retuning it with new scaling factors.

@AlexandreSinger
Contributor Author

For @AmirhosseinPoolad, I ran VTR master and this branch with the default variance estimator (variance.txt) to check whether this change causes any run time degradation:

Metric | vtr_master.txt | variance.txt
vtr_flow_elapsed_time | 1 | 0.965944
num_LAB | 1 | 1
num_DSP | 1 | 1
num_M9K | 1 | 1
num_M144K | 1 | 1
max_vpr_mem | 1 | 0.999895
num_pre_packed_blocks | 1 | 1
num_post_packed_blocks | 1 | 1
device_grid_tiles | 1 | 1
pack_time | 1 | 0.966783
placed_wirelength_est | 1 | 1
place_time | 1 | 0.963469
placed_CPD_est | 1 | 1
routed_wirelength | 1 | 1
critical_path_delay | 1 | 1
geomean_nonvirtual_intradomain_critical_path_delay | 1 | 1
crit_path_route_time | 1 | 0.98467

Machine load seems to add some noise to these results. However, packing is not affected by this change, so its ~3% speedup (0.967) reflects that noise; since place time changed by a similar amount (0.963), this change does not noticeably increase run time.
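The pack-time comparison above amounts to using an unaffected stage as a control for machine load. A minimal sketch of that normalization, assuming packing (which runs before placement) is untouched by this change; the helper name is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: divide out machine-load noise from a run time ratio by
// using the ratio of a stage this change should not affect (here, packing)
// as the noise estimate.
double load_normalized_ratio(double affected_ratio, double control_ratio) {
    assert(control_ratio > 0.0);
    return affected_ratio / control_ratio;
}
```

With the numbers above, load_normalized_ratio(0.963469, 0.966783) is about 0.997, so once the shared speedup is divided out, place time differs from master by well under 1%.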
