Skip to content

Commit

Permalink
finished v0.2 of the technical report
Browse files Browse the repository at this point in the history
  • Loading branch information
ugeorge committed Aug 9, 2019
1 parent 65fca3b commit ab3de16
Show file tree
Hide file tree
Showing 16 changed files with 244 additions and 92 deletions.
3 changes: 2 additions & 1 deletion aesa-deep/.gitignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
.stack-work/
aesa-deep.cabal
*~
*~
PC*
38 changes: 21 additions & 17 deletions aesa-deep/src/AESA/PC/R3.lhs
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
## Refinement 3: Deep Language Embedding
## Refinement 3: Deep Language Embedding {#sec:synth-r3}

In this refinement phase we translate the PC'' system defined in the previous phase to
a language more appropriate for synthesis. Up until now we have used the
Expand All @@ -25,8 +25,8 @@ For more on ForSyDe-Deep modeling, refer to the beginner tutorials on the ForSyD
called TemplateHaskell. The function parser is able to recognize a subset of the
Haskell language written between the so-called banana brackets `[d| function |]`,
and needs to _explicitly specify its type signature_. Whatever code is written
within the bananna brackets is parsed as-is and needs to be self-contained,
i.e. information cannot be infeered from global/system-wide definitions. In order to
within the banana brackets is parsed as-is and needs to be self-contained,
i.e. information cannot be inferred from global/system-wide definitions. In order to
instantiate a parsable ForSyDe-Deep function from a Haskell function, it needs to be
wrapped into a `ProcFun` type, using one of the [function
wrappers](http://hackage.haskell.org/package/forsyde-deep-0.2.0/docs/ForSyDe-Deep-Process.html)
Expand All @@ -35,7 +35,7 @@ For more on ForSyDe-Deep modeling, refer to the beginner tutorials on the ForSyD
* processes are instantiated using the ForSyDe-Deep equivalents of the main
ForSyDe-Shallow [SY process
constructors](http://hackage.haskell.org/package/forsyde-deep-0.2.0/docs/ForSyDe-Deep-Process-SynchProc.html). The
main difference is that they need a string identrifier as well as `ProcFun`-wrapped
main difference is that they need a string identifier as well as `ProcFun`-wrapped
functions instead of regular Haskell functions.

* in order to be able to simulate, parse or synthesize a process, it needs to be
Expand Down Expand Up @@ -80,16 +80,15 @@ the package extension built for this case study and contains a couple of deep-em
(synthesizable) skeletons, as presented in [@sec:atom;@sec:refine].

> import ForSyDe.Deep
> import ForSyDe.Deep.Int
> import ForSyDe.Deep.Skeleton
> import ForSyDe.Deep.Skeleton as Sk

We need some additional type libraries. `Data.Param.FSVec` defines fixed-size vectors
as compared to the shallow vectors used until now,
e.g. `ForSyDe.Atom.Skeleton.Vector`. Fixed-size vectors have their length captured in
the type signature as type-level numerals imported from `Data.TypeLevel.Num`. For
example `(FSVec D8 a)` denotes a fixed-size vector of type a, with the size of 8
elements. We also use list functions, thus we can tell their functions apart by
loading the two librarie using different aliases.
loading the two libraries using different aliases.

> import Data.List as L
> import Data.Param.FSVec as FSV
Expand All @@ -105,7 +104,7 @@ For starters, let us define the adder process used in the `fir'` skeleton. Accor
to the crash course in @sec:crash-deep we need to gradually wrap its elements until it
becomes a system definition `SysDef`. First, we need to declare a `ProcFun` element
from a Haskell addition function. As already mentioned, whatever is written between
the bananna brackets needs to be self-contained. This means that the `Num` type
the banana brackets needs to be self-contained. This means that the `Num` type
instance of `Complex` which overloads the `(+)` operator is not of much use
here. Complex number addition needs to be explicitly written so that the parser know
what to synthesize (N.B. type class support is listed among the "todo" items of future
Expand Down Expand Up @@ -183,12 +182,17 @@ constructor":
which we then use as argument for the `app11` skeleton (which is in fact a `farm11`
tailored for partial application only) to instantiate a farm of constant signal
generators, i.e. a vector of constant signals. The coefficients are interpreted into a
`FSVec` from their shallow couterparts defined in the second refinement phase.
`FSVec` from their shallow counterparts defined in the second refinement phase.
> coefsR3 = app11V "coef" constSys coefs
> coefsR3 = Sk.app11 "coef" constSys coefs
> where
> coefs = $(vectorTH (V.fromVector coefsR2 :: [Complex Fixed20]))
**OBS:** ForSyDe-Deep skeletons, as compared to their shallow counterparts, take care
of _creating instances_ (see @fig:deep-syntax) of defined systems as well as
coupling them together. This is why they need components as arguments instead of
simple functions.
Now, for didactic purpose, let us define ForSyDe-Deep equivalent of the `fir'`
skeleton using the base catamorphisms _recur_ (or its specialization _generate_),
_farm_ and _reduce_. We call this process network constructor `deepFIR` and, as you
Expand All @@ -208,14 +212,14 @@ constants.
> -> a -- ^ input signal/structure
> -> a -- ^ output signal/structure
> deepFIR name addSys mulSys dlySys coefs =
> reduceV addName addSys . farm21V mulName mulSys coefs . generateV dlyName n dlySys
> Sk.reduce addName addSys . Sk.farm21 mulName mulSys coefs . Sk.generate dlyName n dlySys
> where n = lengthT coefs
> dlyName = name L.++ "_dly_"
> addName = name L.++ "_add_"
> mulName = name L.++ "_mul_"
We finally can define a component for the deep equivalent of `procPC` using the
`deepFIR` process network constructor, by passing the aboce-defined components as
`deepFIR` process network constructor, by passing the above-defined components as
arguments.
> procPCSys :: SysDef ( Signal (Complex Fixed20)
Expand All @@ -228,7 +232,7 @@ type-level numeral equal to $N_b$
> pc3 :: FSVec D8 (Signal (Complex Fixed20))
> -> FSVec D8 (Signal (Complex Fixed20))
> pc3 = farm11V "pc" procPCSys
> pc3 = Sk.farm11 "pc" procPCSys
with its associate system definition. Unfortunately, at the moment processes of type
`FSVec a (Signal a) -> ...` are not part of the `SysFun` class, thus we cannot create
Expand All @@ -237,11 +241,11 @@ inside a `zipx . pc . unzipx` pattern which merely transposes the vector and sig
domains, and the synthesizer will simply ignore it. (N.B. vector-based components will
be supported in future iterations of ForSyDe).
> sysPC3 = newSysDef (zipxSY "zip" . pc3 . unzipxSY "unzip") "PC" ["i1"] ["o1"]
> sysPC3 = newSysDef (zipxSY "zip" . pc3 . unzipxSY "unzip") "PC3" ["i1"] ["o1"]
### Simulation. Synthesis
Similarly to the prevous refinement stages, we further wrap the PC component in order
Similarly to the previous refinement stages, we further wrap the PC component in order
to co-simulate it with the rest of the AESA system. To simulate a ForSyDe-Deep
`SysDef` component we use the `simulate` function
Expand All @@ -256,7 +260,7 @@ simulator types.
> . L.map (FSV.unsafeVector d8) . L.transpose . L.map SY.fromSignal . V.fromVector
Please refer to the project's `README` file for how to execute the AESA system
alongside the deep `pc3` component. The plotted output for running the system agains
alongside the deep `pc3` component. The plotted output for running the system against
the radar indata generated by the 13 objects is shown in @fig:deep-r3-odata.
![One output cube with radar data](figs/AESA_REFINE_R3.pdf){#fig:deep-r3-odata}
Expand Down Expand Up @@ -296,7 +300,7 @@ specifications in @tbl:deep-spec-r3.
| | |
| ---------------------------------- | ------------- |
| Top-level Entity Name | PC |
| Top-level Entity Name | PC3 |
| Family | Cyclone IV GX |
| Total logic elements | 3,014 |
| Total combinational functions | 3,014 |
Expand Down
128 changes: 79 additions & 49 deletions aesa-deep/src/AESA/PC/R4.lhs
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
## R4: Balancing the FIR Reduction. Minimizing the circuit.
## R4: Balancing the FIR Reduction {#sec:synth-r4}

The final refinement phase consists in performing some simple semantic-preserving
transformations on the previous model in order to optimize the efficiency of the
Expand All @@ -12,16 +12,21 @@ Two (rather evident) optimizations are considered in this phase:
previous addition and applying it to the next one, thus generating a linear ($O(n)$)
reduction tree. However, the base operation performed, i.e. addition, is
commutative, and a well-known catamorphism theorem (see @ungureanu2019 for more
details) states that the reduction can in fact be pefrormed as a balanced
logarithmic (($O(\log n)$)) tree which means, in the case of digital hardware
details) states that the reduction can in fact be performed as a balanced
logarithmic ($O(\log n)$) tree which means, in the case of digital hardware
designs, a shorter combinational critical path.

* having a counter in each delay element, although elegant and modular, is a terible
* having a counter in each delay element, although elegant and modular, is a terrible
waste of silicon when considering that all counters are counting the same thing: how
many samples have passed. A more reasonable design would take advantage of the
identical part in all the `"rDelay"` components (i.e. the counter), and separate
from the non-identical part (i.e. the delay register).
The second optimization is identified and performed automatically by the Quartus
compiler, hence the specifications in @tbl:deep-spec-r3 includes only one counter
circuit for all the delay elements. Therefore we will only focus on reduction tree
balancing and leave the functional decoupling of the reset-counter as an exercise.
### Model
The module for the forth refinement module is the following.
Expand All @@ -31,67 +36,92 @@ The module for the forth refinement module is the following.
We import the same library as before:
> import qualified "forsyde-atom-extensions" ForSyDe.Atom.MoC.SY as SY
> import qualified "forsyde-atom-extensions" ForSyDe.Atom.MoC.SDF as SDF
> import qualified "forsyde-atom-extensions" ForSyDe.Atom.Skeleton.Vector as V
> import ForSyDe.Deep
> import ForSyDe.Deep.Int
> import ForSyDe.Deep.Skeleton
> import ForSyDe.Deep.Skeleton as Sk
> import Data.List as L
> import Data.Param.FSVec as FSV
> import Data.TypeLevel.Num hiding ((+), (-), (==))
>
> import AESA.PC.R3
As of the second bullet point above, we define one reset counter `rCounter`.
> rCounterSys :: SysDef (Signal (Complex Fixed20) -> Signal Bool)
> rCounterSys = newSysDef (mooreSY "rCounterProc" count reset (1024-1))
> "rCounter" ["i1"] ["o1"]
> where
> count = $(newProcFun
> [d|cntf :: Int16 -> Complex Fixed20 -> Int16
> cntf c _ = if c == 0 then 1024-1 else c-1 |])
> reset = $(newProcFun
> [d|rstf :: Int16 -> Bool
> rstf c = if c==0 then True else False |])
and a resettable delay (state) element
> delaySys :: SysDef (Signal Bool -> Signal (Complex Fixed20) -> Signal (Complex Fixed20))
> delaySys = newSysDef (scanld2SY "rDelayProc" reset (0:+0))
> "rDelay" ["rst","i1"] ["o1"]
> where
> reset = $(newProcFun
> [d|rstd :: Complex Fixed20 -> Bool -> Complex Fixed20 -> Complex Fixed20
> rstd _ r p = if r then (0:+0) else p |])
> import AESA.PC.R2 (wrapR2)
> import AESA.PC.R3 (wrapR3, coefsR3, addSys, mulSys, rDelaySys)
To instantiate the logarithmic reduction tree, it is enough to replace the `reduce`
skeleton with the `logReduce` skeleton in the previous `deepFIR` process network
constructor. Furthermore, we decouple the reset counter to be outside the FIR network
thus we pass the reset signal as an argument to the network constructor.
constructor.
> balancedFIR name addSys mulSys dlySys coefs rstSys =
> logReduceV addName addSys . farm21V mulName mulSys coefs . generateV dlyName n (dlySys rstSys)
> balancedFIR name addSys mulSys dlySys coefs =
> Sk.logReduce (name L.++ "_add_") addSys
> . Sk.farm21 (name L.++ "_mul_") mulSys coefs
> . Sk.generate (name L.++ "_dly_") n dlySys
> where n = lengthT coefs
> dlyName = name L.++ "_dly_"
> addName = name L.++ "_add_"
> mulName = name L.++ "_mul_"
> pcFIR' :: Signal (Complex Fixed20)
> -> Signal (Complex Fixed20)
> pcFIR' s = balancedFIR "fir" addSys mulSys delaySys coefsR3 rstSig s
> where
> rstSig = (instantiate "rCount" rCounterSys) s
The new `procPC` system definition is created using the skeleton above.
> procPCSys' :: SysDef ( Signal (Complex Fixed20)
> -> Signal (Complex Fixed20) )
> procPCSys' = newSysDef (balancedFIR "fir" addSys mulSys rDelaySys coefsR3)
> "FIR" ["i1"] ["o1"]
We finally create the refined version of the AESA PC$^{(4)}$ signal processing stage
> pc4 :: FSVec D8 (Signal (Complex Fixed20))
> -> FSVec D8 (Signal (Complex Fixed20))
> pc4 = farm11V "pc" (newSysDef pcFIR' "FIR" ["i1"] ["o1"])
> -> FSVec D8 (Signal (Complex Fixed20))
> pc4 = Sk.farm11 "pc" procPCSys'
which is then declared as a component.
> sysPC4 = newSysDef (zipxSY "zip" . pc4 . unzipxSY "unzip") "PC4" ["i1"] ["o1"]
> sysPC4 = newSysDef (zipxSY "zip" . pc4 . unzipxSY "unzip") "PC" ["i1"] ["o1"]
### Simulation. Synthesis
> wrappedPC4 = wrapR2 (wrapR3 (simulate sysPC3))
Similar to previous refinement phases, we wrap the PC$^{(4)}$ in order to co-simulate
it with the high-level AESA model. As expected, the AESA simulation using this
component gives exactly the same results as the previous simulation, shown in
@fig:deep-r3-odata.
> wrappedPC4 = wrapR2 (wrapR3 (simulate sysPC4))
Dumping the internal structure of PC$^{(4)}$ shows in @fig:deep-gramhml-r4 a more
compact FIR structure, with a shorter combinational depth, as well as one `"rCount"`
component fanning out into all the delay elements.
> graphmlPC4 = writeGraphMLOps (defaultGraphMLOps {yFilesMarkup = True}) sysPC4
The VHDL code can be generated
> vhdlPC4 = writeVHDL sysPC4
simulated
> vhdlSim = writeAndModelsimVHDL Nothing sysPC4
> -- vhdlSim' = writeAndGhdlVHDL Nothing sysPC4 -- alternatively
or synthesized.
> quartusPC4 = writeVHDLOps vhdlOps sysPC4
> where vhdlOps = defaultVHDLOps{execQuartus=Just quartusOps}
> quartusOps = checkSynthesisQuartus
The generated circuit has exactly the same size as the previous one
(@tbl:deep-spec-r4), however, the combinational depth is visibly smaller, as seen in
the RTL plot in @fig:deep-quartus-r4.
![Dumped GraphML structure of the new FIR component](figs/GML_FIR_R4.png){#fig:deep-gramhml-r4}
![Screenshot of the RTL view of FIR](figs/RTLR4.png){#fig:deep-quartus-r4}
| | |
| ---------------------------------- | ------------- |
| Top-level Entity Name | PC4 |
| Family | Cyclone IV GX |
| Total logic elements | 3,014 |
| Total combinational functions | 3,014 |
| Dedicated logic registers | 976 |
| Total registers | 976 |
| Total memory bits | 0 |
| Embedded Multiplier 9-bit elements | 160 |
| Total PLLs | 0 |
Table: Specifications of generated FPGA design {#tbl:deep-spec-r4}
3 changes: 2 additions & 1 deletion aesa-deep/stack.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@ packages:
- .
- ../forsyde-atom-extensions
# - ../forsyde-deep-extensions
# - ../../forsyde/forsyde-deep
- ../aesa-atom

extra-deps:
Expand All @@ -16,4 +17,4 @@ extra-deps:
- git: [email protected]:forsyde/forsyde-atom.git
commit: 76b0c8b07047180cdef2a58c696e5245e7db3ce3
- git: [email protected]:ugeorge/forsyde-deep.git
commit: 39b89c2addaa1eeedee97c4cc2ea8e2fa180f257
commit: 482a972e3f99c627c01181c071bc2d0c91591878
3 changes: 3 additions & 0 deletions aesa-deep/test/Spec.hs
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ import Test.Framework.Providers.QuickCheck2 (testProperty)
import TestR1
import TestR2
import TestR3
import TestR4

tests :: [Test]
tests = [
Expand All @@ -19,6 +20,8 @@ tests = [
(withMaxSuccess 1000 prop_refine2_error)
, testProperty "PC3 is equivalent with PC'' "
(withMaxSuccess 100 prop_refine3_equiv)
, testProperty "PC4 is equivalent with PC3 "
(withMaxSuccess 100 prop_refine4_equiv)
]
]

Expand Down
2 changes: 1 addition & 1 deletion aesa-deep/test/TestR3.lhs
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
### Properties

The main property we want to check now is that the ForSyDe-Deep version of PC$^{(3)}$
is the same as PC'' defined in the second rfinement phase.
is the same as PC'' defined in the second refinement phase.

> {-# LANGUAGE PackageImports #-}
> module TestR3 where
Expand Down
32 changes: 32 additions & 0 deletions aesa-deep/test/TestR4.lhs
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
### Properties

Here we test that the ForSyDe-Deep version of PC$^{(4)}$ is the same
as PC$^{(3)}$ defined in the third phase.

> {-# LANGUAGE PackageImports #-}
> module TestR4 where
>
> import "forsyde-atom-extensions" ForSyDe.Atom.MoC.SY as SY
> import Test.QuickCheck
> import ForSyDe.Deep
>
> import Generators (largeSySigs, cpxFixed20)
> import AESA.PC.R3 as R3
> import AESA.PC.R4 as R4

The property
$$
\forall c \in \mathbb{C} \Rightarrow
\Sigma(\mathtt{procPCSys} (\overline{c})) =
\Sigma(\mathtt{procPCSys'}(\overline{c}))
$$ {#eq:prop_refine4_equiv}

is tested by the QuickCheck program

> prop_refine4_equiv = forAll (largeSySigs cpxFixed20)
> $ \s -> all (\(a,b) -> a == b) $ zip
> (simulate R3.procPCSys $ SY.fromSignal s)
> (simulate R4.procPCSys' $ SY.fromSignal s)

As all tests are passing (check the project's `README` file on how to run tests),
hence we can conclude that PC$^{(4)}$ is a good replacement for the AESA PC stage.
3 changes: 2 additions & 1 deletion docs/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,8 @@ LHS_FILES := 21:../aesa-atom/src/AESA/CubesAtom.lhs \
62a:../aesa-deep/test/TestR2.lhs \
63:../aesa-deep/src/AESA/PC/R3.lhs \
63a:../aesa-deep/test/TestR3.lhs \
64:../aesa-deep/src/AESA/PC/R4.lhs
64:../aesa-deep/src/AESA/PC/R4.lhs \
64a:../aesa-deep/test/TestR4.lhs

BIB_FILE := refs.bib
TX_FILES := $(patsubst $(TXT_DIR)/%.md, $(MD_DIR)/%.md, $(wildcard $(TXT_DIR)/*.md))
Expand Down
Binary file added docs/figs_src/GML_FIR_R4.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/figs_src/RTLR4.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading

0 comments on commit ab3de16

Please sign in to comment.