Integer Register File (IRF) on an FPGA
Hi All. My name is Joel. Several of us students here at the University of California, Santa Cruz are working on a senior design project. We plan to port the OpenSPARC to an FPGA and to create a custom board around it. To read more about us and our project, feel free to visit our website at http://sparcfpga.dforge.cse.ucsc.edu.
One of the main things we are currently working on is simplifying the integer register file (IRF), as it is a significant problem when porting the design to an FPGA. We have come up with several ideas and methods, but are leaning toward one.
The IRF has four read ports (really three, but one is 128 bits wide instead of 64 bits, which effectively makes four) and two write ports. Our idea is to run the IRF at four times the clock frequency of the processor. At the moment we are not concerned with performance, only with making the design fit onto the FPGA. Running the IRF four times faster gives us four rising clock edges in the IRF for every processor clock cycle. With one read and one write per rising edge, we could complete four reads and two writes in one processor clock cycle.
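To make the idea concrete, here is a minimal behavioral sketch in Python (not HDL, and not our actual design). The class and function names are made up for illustration, and it assumes the underlying memory does at most one read and one write per fast-clock edge, with the read returning the value stored before that edge's write.

```python
class OneReadOneWriteRAM:
    """Hypothetical model of a memory with one read port and one write port."""

    def __init__(self, depth):
        self.cells = [0] * depth

    def cycle(self, read_addr=None, write=None):
        """One fast-clock edge: at most one read and one write.

        The read returns the value stored before this edge's write.
        """
        value = self.cells[read_addr] if read_addr is not None else None
        if write is not None:
            addr, data = write
            self.cells[addr] = data
        return value


def core_cycle(ram, read_addrs, writes):
    """Serve up to 4 reads and 2 writes using 4 fast-clock edges."""
    assert len(read_addrs) <= 4 and len(writes) <= 2
    results = []
    for slot in range(4):  # four fast-clock edges per processor clock
        raddr = read_addrs[slot] if slot < len(read_addrs) else None
        wr = writes[slot] if slot < len(writes) else None
        value = ram.cycle(raddr, wr)
        if raddr is not None:
            results.append(value)
    return results


ram = OneReadOneWriteRAM(depth=8)
core_cycle(ram, [], [(1, 11), (2, 22)])            # two writes in one core cycle
print(core_cycle(ram, [1, 2, 1, 2], [(3, 33)]))    # four reads plus one write
```

Note that in this sketch a write issued in an early slot is visible to reads in later slots of the same processor cycle. Whether that ordering is acceptable, or whether the writes need to be deferred or bypassed, is a design decision the sketch does not settle.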
We would like to hear questions, feedback, suggestions, and/or comments anyone has on this idea.
-SparcFPGA [Joel]
# 1
Hello Joel,
Definitely sounds like an interesting project!
With regard to clocking the IRF at 4 times the core frequency
(i.e. 480 MHz), I am a bit skeptical that you can achieve that on a
mid-range FPGA (depending on how you are implementing the register
file on the FPGA, the amount of contiguous block RAM you have
available, etc.).
It would probably be more feasible to implement an IRF with 2 read
ports and one write port (i.e. clocked at twice the speed of the core,
or 240 MHz).
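The arithmetic behind those two figures can be sketched as below. The 120 MHz core clock is an assumption inferred from 480 MHz being four times the core frequency, and the helper name is hypothetical.

```python
import math

CORE_MHZ = 120  # assumed: implied by 480 MHz being 4x the core frequency


def irf_clock_multiplier(logical_reads, logical_writes, phys_reads, phys_writes):
    """Smallest clock multiple that lets the physical ports cover the logical ones."""
    return max(math.ceil(logical_reads / phys_reads),
               math.ceil(logical_writes / phys_writes))


# The IRF needs 4 reads and 2 writes per core cycle.
for phys_reads, phys_writes in [(1, 1), (2, 1)]:
    m = irf_clock_multiplier(4, 2, phys_reads, phys_writes)
    print(f"{phys_reads}R/{phys_writes}W memory: {m}x core clock = {m * CORE_MHZ} MHz")
```

With one physical read port the reads set the multiplier at 4x, while with two read ports and one write port both the reads and the writes need only 2x.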
Have you tried to implement the full OpenSPARC IRF on your FPGA? If
so, how big was it, how did you implement it at the physical level, and
what FPGA did you use? I am inclined to think four read ports and two
write ports on an IRF shouldn't be that complicated or huge. Maybe there
are physical resources that you could take advantage of to
reduce the amount of logic required to implement the IRF.
Sincerely,
-
Taha Amiralli
thamiral [A] uwo [D] ca
thamiral [A] gmail [D] com
MESc Candidate 2007, Computer Engineering
The University Of Western Ontario
BESc, BSc. 2005,
Computer Engineering & Computer Science
The University Of Western Ontario
On 1/7/07, general@opensparc.info <general@opensparc.info> wrote:
--
To unsubscribe, e-mail: general-unsubscribe@opensparc.sunsource.net
For additional commands, e-mail: general-help@opensparc.sunsource.net
# 2
On Jan 8, 2007, at 5:46 AM, Taha Amiralli wrote:
> With regards to the clocking the IRF to run at 4 times the core freq
> (i.e. 480 MHz), I am a bit skeptical that you can achieve that on a
> mid-range FPGA (depending on how you are implementing the register
> file on the FPGA, the amount of contiguous block ram you have
> available etc).
>
> It would probably be more feasible to implement an IRF with 2 read
> ports and one write port (i.e. clocked at twice the speed of the Core
> or 240 MHz).
I agree. Two read ports and one write port sounds like the most reasonable option.
> Have you tried to implement the full openSparc IRF on your fpga? If
> so, how big was it, how did you implement it at the physical level and
> what fpga did you use? I am inclined to think Four read ports and two
> write ports on an IRF should'nt be that complicated/huge. Maybe there
> are physical resources that you could have taken advantage of to
> reduce the amount of logic required to implement the IRF.
The problem is that if you use 2 write ports, you cannot use the
built-in SRAMs (1 write port only).
--
You can't do something you don't know, if you keep on doing what you
do know. by F.M. Alexander
# 3
Hi Taha and Jose,
Thanks for your comments and suggestions.
Your idea of having two reads and one write by clocking the IRF at twice the frequency of the processor sounds like a more reasonable approach, and we are probably going to go with it.
We have synthesized the IRF targeting a Cyclone II using Quartus II and came up with 36K LEs and 11K registers.
-SparcFPGA [Joel]
Message was edited by:
JoelSantos
# 4
Thanks for the info...
That IRF is more complex than I had anticipated... I'll try to
synthesize it onto an FPGA someday just to see what's taking up all those
resources.
Sincerely,
-
Taha Amiralli
On 1/8/07, general@opensparc.info <general@opensparc.info> wrote:
# 5
> That IRF is more complex than I had anticipated... I'll try to
> synthesize onto an FPGA someday just to see whats taking up all those resources.
You might also try looking into the existing OpenSPARC FPGA project (http://fpga.sunsource.net/) to see some of the work that has already been done to optimize the design for FPGAs.
Although the FPGA project page shows that the design is down to 134K LUTs (Xilinx), that page needs updating -- the design (without the FPU) is now actually down to just 38K LUTs (single core, single thread).
# 6
> You might also try looking into the existing
> OpenSPARC FPGA project (http://fpga.sunsource.net/),
> to see some work that's already been done to optimize
> the design for FPGAs.
>
> Although the FPGA project page shows that the design
> is down to 134K LUTs (Xilinx), that page needs
> updating -- the design (w/o FPU) is now actually down
> to just 38K LUTS (single core, single thread).
That could be VERY interesting! We have received a lot of email from people asking how to map a single SPARC core onto an FPGA using as few resources as possible.
The FPGA project page could be updated, but what would be really useful is the set of files and scripts you used to map the single-thread version.
# 7
> It could be VERY interesting! We received a lot of email from people asking how to map a single SPARC core on FPGA using less resources as possible.
>
> The FPGA project page could be updated, but what might be very useful could be the set of files and scripts you used to map the single thread version.
>
Is it possible to lose some of the register windows? The IRF for 4
threads contains 640 64-bit registers (160 per thread, 20 register
windows); perhaps this could come down to 80 per thread or fewer? It
might save a few LUTs. I am not sure precisely how many registers are in
the IRF of the FPGA design at 38K LUTs, so it might have been done already.
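As a rough sizing check, the raw storage behind those register counts can be worked out as follows. This is only back-of-the-envelope arithmetic on the figures quoted above; the helper name is made up, and it says nothing about how many LUTs or block RAMs a given mapping would actually use.

```python
REG_BITS = 64  # width of each integer register


def irf_storage_bits(threads, regs_per_thread, reg_bits=REG_BITS):
    """Raw storage needed for the register file, in bits."""
    return threads * regs_per_thread * reg_bits


full = irf_storage_bits(threads=4, regs_per_thread=160)    # 640 registers
halved = irf_storage_bits(threads=4, regs_per_thread=80)   # 320 registers
single = irf_storage_bits(threads=1, regs_per_thread=160)  # single-thread core

print(f"4 threads x 160 regs: {full} bits ({full // 8} bytes)")
print(f"4 threads x  80 regs: {halved} bits ({halved // 8} bytes)")
print(f"1 thread  x 160 regs: {single} bits ({single // 8} bytes)")
```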
# 8
We have been working on a single-core, single-thread implementation of OpenSPARC T1 for FPGAs. This single thread is functionally identical to a Niagara thread, except that we reduced the number of TLB entries to 8 (a Niagara core shares a 64-entry TLB across four threads) and removed the crypto unit (Stream Processing Unit). We also recoded the register file to have 4 read ports and 1 write port, since 2 write ports are not really needed for a single-thread implementation. These changes, among others, allowed us to map the design onto FPGAs much better. More details on our results can be found here:
http://www.opensparc.net/publications/presentations/
Click on "RAMP Retreat" link.
We are also attempting to port Linux/Solaris to this implementation to demonstrate the robustness of the design.
Going back to RF optimizations, an implementation that runs the RF at twice the frequency of the rest of the core, with half as many ports, would be a very important contribution. It will be key to having a multi-threaded core on FPGAs with reasonable hardware overhead.
Yes, you could save more LUTs by reducing the number of windows, the size of the caches, etc., but these are now mapped very well onto BRAMs, so the LUT savings may not be very significant.
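For readers wondering how a register file with 4 read ports and 1 write port can sit on memories that offer a single write port, one commonly used approach is to keep one copy of the data per read port and send every write to all copies. The sketch below is a behavioral Python model of that replication idea only. It is an assumption for illustration, not a description of how the recoded register file is actually built, and the class name is hypothetical.

```python
class ReplicatedRegFile:
    """Illustrative 4-read, 1-write register file built from 1R/1W banks."""

    def __init__(self, depth, read_ports=4):
        # One bank (copy of the register contents) per read port.
        self.banks = [[0] * depth for _ in range(read_ports)]

    def write(self, addr, data):
        # The single write port updates every copy so they stay identical.
        for bank in self.banks:
            bank[addr] = data

    def read(self, addrs):
        # Each read port reads from its own bank, so reads never conflict.
        assert len(addrs) <= len(self.banks)
        return [self.banks[port][addr] for port, addr in enumerate(addrs)]


rf = ReplicatedRegFile(depth=16)
rf.write(3, 0xABCD)
rf.write(7, 42)
print(rf.read([3, 7, 3, 7]))
```

The cost is storage: four read ports mean four copies. Running the memory at twice the core frequency, as discussed above, would in principle halve the number of copies needed.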
Durgam.
