Do I dare even ask how long the timing and routing pass took on 1680 cores?

jsgray · on Jan 21, 2017

This build took 11 hours on an Intel Skulltrail NUC w/ 32 GB DRAM, but I believe there are ways to speed this up going forwards (incremental and/or hierarchical builds, out of context synthesis). For example, the XCVI9P is a 3 die (3 "super logic region") device and by setting up a hierarchical design flow I think I can place and route each SLR separately (at the same time) across more (x86) cores on my build box. The inter-SLR interconnect nets are just some quite regular 300b wide Hoplite NOC links and clock and reset.

nickpsecurity · on Jan 21, 2017

Can't the tools do it relatively fast with a geometric method if the individual cores already have area/timing data to use and are homogenous? And a FPGA instead of an ASIC?

My reading the various papers on synthesis as a non-hardware guy made me think this job shouldn't be as hard on that as the SOC's whose components vary considerably in individual attributes.

jsgray · on Jan 21, 2017

I wish it were so. While it is straightforward to do regular placement at the block level or even at the individual LUT/slice level using RPMs (relationally placed macros) or absolute LOC placement of LUTs in the XDC implementation constraints file, most of the implementation time goes into routing and there is not an easy mainstream way to take a routed one tile design and step and repeat it (say) 210 times across the die. In part this is due to non homogeneity across the columns and sometimes rows of the chip.

nickpsecurity · on Jan 21, 2017

That makes sense. Thanks.

jackyinger · on Jan 21, 2017

If you floor plan it yourself you can make the tool's job a lot easier. You can create a grid of regions on the FPGA and assign a core to each.