General HDL questions

Discussion of developmental aspects of the MiSTer Project.
ethern0t
Posts: 40
Joined: Sat Jun 06, 2020 3:42 pm

These aren't MiSTer-specific per se, but I'd be developing my code under the MiSTer framework since it gives me input and output support, and I figure some of the folks around here are further along the journey than I am.

I'm trying to implement a CPU of my own design from scratch (Google "J1 Forth CPU" for a really great example).

At this point it's going to be a simple 16-bit CPU with a RISC-like instruction set.

So my question is... assuming you have a state machine for the processor, how do you know how much you can accomplish in one processor cycle?

I assume if you do too much in one state, that state will be the bottleneck constraining your max processor speed.

So for example, assuming I have a RISC-style "store reg1,reg2(offset)" where reg1, reg2, and offset are all encoded in the instruction, how many states would I want?

Expressed sequentially, it would be:

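Code:

insn = read_memory(ip);            // fetch the instruction
value = read_reg[insn[8:10]];      // register holding the value to store
offset = read_reg[insn[11:13]];    // register holding the base address
offset += insn[0:7];               // add the immediate offset
store_memory(offset, value);       // write the value to memory

So that's five states, roughly speaking, and each state either reads or writes a register or memory.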

Now, I bet the offset computations could be done in a single state, so that gets us down to four states.

But is it reasonable to read both memory and registers in the same state/cycle? What about reading two registers in the same state/cycle?

If you group the states solely by dependencies and can read two different registers at once, it collapses to three states:

Code:

insn = read_memory(ip);
value = read_reg[insn[8:10]]; offset = read_reg[insn[11:13]] + insn[0:7];
store_memory(offset, value);

Thanks,

-Dave
ethern0t

I just realized the offset add probably needs to be in the third state because it depends on the register read.
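
A minimal Verilog sketch of that split, assuming the field layout from the pseudocode above (insn[7:0] as a zero-extended offset, insn[10:8] as the value register, insn[13:11] as the base register). The module name, state names, and the small internal memory are hypothetical. Each state only holds work that has no dependency inside it, so the add lands in the third state, after the register reads. It could instead be folded combinationally into the store state at the cost of a longer path. Whether any one state does too much for a cycle shows up in the timing analyzer's Fmax report after a compile, not in the source.

```verilog
// Hypothetical multi-cycle sketch of "store reg1, reg2(offset)".
// Assumed encoding: insn[7:0] = offset, insn[10:8] = value register,
// insn[13:11] = base register. Not a complete CPU.
module store_fsm (
    input wire clk,
    input wire reset
);
    localparam FETCH = 2'd0, REGREAD = 2'd1, ADDR = 2'd2, STORE = 2'd3;

    reg [1:0]  state;
    reg [15:0] ip, insn, value, base, addr;
    reg [15:0] mem  [0:255];   // deliberately small so the sketch stays cheap
    reg [15:0] regs [0:7];

    always @(posedge clk) begin
        if (reset) begin
            state <= FETCH;
            ip    <= 16'd0;
        end else begin
            case (state)
                FETCH: begin                 // one memory read
                    insn  <= mem[ip[7:0]];
                    state <= REGREAD;
                end
                REGREAD: begin               // two independent register reads
                    value <= regs[insn[10:8]];
                    base  <= regs[insn[13:11]];
                    state <= ADDR;
                end
                ADDR: begin                  // depends on the base register read
                    addr  <= base + {8'd0, insn[7:0]};
                    state <= STORE;
                end
                STORE: begin                 // one memory write
                    mem[addr[7:0]] <= value;
                    ip    <= ip + 16'd1;
                    state <= FETCH;
                end
            endcase
        end
    end
endmodule
```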
ethern0t

I got my 16-bit CPU implemented and passing Verilator lint, but it crashed Quartus any time I tried to compile (I instantiated my CPU in the mycore template MiSTer project). I eventually realized that you can't have a given register driven from more than one place at the same time. That's why register files and RAMs are generally implemented as a separate module that allows one read or write per cycle.
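
As an illustration of the single-driver rule, here is a small sketch in which one register can take its value from two sources but is still assigned in only one always block. The module and signal names (single_driver, load_imm, load_alu) are made up for the example and are not from my CPU.

```verilog
// Hypothetical example: one register, two possible sources, one driver.
module single_driver (
    input  wire        clk,
    input  wire        load_imm,   // assumed control signals
    input  wire        load_alu,
    input  wire [15:0] imm,
    input  wire [15:0] alu_out,
    output reg  [15:0] r0
);
    // Every write to r0 lives in this one always block. Putting the two
    // assignments in separate always blocks would give r0 two drivers.
    always @(posedge clk) begin
        if (load_imm)
            r0 <= imm;          // imm wins if both loads are asserted
        else if (load_alu)
            r0 <= alu_out;
    end
endmodule
```

The if/else chain makes the priority between the sources explicit, which is the decision a second driver would otherwise leave undefined.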

I stripped it back down to almost nothing (just loading immediate values into registers through the ALU) and it just hangs indefinitely during synthesis.

So... I've got a lot to learn still.
ethern0t

Apparently instantiating a 64k RAM (reg [15:0] ram[32768]) really upsets the synthesis software and sends it to the brink of insanity.

I've seen other cores use some sort of wizard to instantiate a large RAM; either I do that, or I need to hook it up to the MiSTer SRAM interface.

-Dave
ethern0t

Bonus failure -- I was trying to keep everything in one file, and Verilator complains about multiple modules in the same file, so I tried using functions and tasks to achieve my result. Somehow this caused fitting to fail. Inlining everything in a master "cpu" module seemed to work.
ethern0t

Turns out using functions and tasks was fine... the real issue was my dodgy RAM instantiation. Using the wizard to instantiate an Altera block RAM solved all my problems.
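
For comparison, an alternative to the wizard is to write the RAM in a form that synthesis tools commonly recognize as block RAM: one clock, a synchronous write, and a registered read. The sketch below is that general pattern, not what the wizard generates. The module name ram_sp and its parameters are assumptions, and whether a given Quartus version actually maps it to block RAM is something to confirm in the fitter report.

```verilog
// Hypothetical single-port RAM written in an inference-friendly style.
// Defaults give 32768 x 16 bits, the size discussed in this thread.
module ram_sp #(
    parameter ADDR_WIDTH = 15,
    parameter DATA_WIDTH = 16
) (
    input  wire                  clk,
    input  wire                  we,
    input  wire [ADDR_WIDTH-1:0] addr,
    input  wire [DATA_WIDTH-1:0] din,
    output reg  [DATA_WIDTH-1:0] dout
);
    reg [DATA_WIDTH-1:0] mem [0:(1 << ADDR_WIDTH) - 1];

    // One clock edge, no reset on the array, one access per cycle.
    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;
        dout <= mem[addr];   // registered read; returns the old word
                             // when writing the same address
    end
endmodule
```

Reads take one cycle with this style, so a state machine built around it has to wait a state between presenting the address and using dout.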

Incidentally, I found that wrapping sys_top.v to instantiate just my own test module was a significant win on iteration time.
ExCyber
Posts: 217
Joined: Sun May 24, 2020 3:33 pm
Has thanked: 11 times
Been thanked: 66 times

ethern0t wrote: Sat Jun 20, 2020 3:58 pm
Apparently instantiating a 64k ram (reg [15:0] ram[32768]) really upsets the synthesis software and sends it to the brink of insanity.

I only have a little familiarity with this problem, but I think what happens is that synthesis doesn't recognize your RAM as being compatible with any of its standard RAM entities (due to e.g. mixed synchronous and asynchronous assignments, or assignments on different clock edges). It then essentially treats the array as a single generic logic function with one input and one output per RAM bit, and allocates a ridiculous amount of memory on the build machine in order to optimize that huge function.

edit: My math on the truth table size was wrong, and the actual expression isn't very readable in plain text unless I missed some simplifications, so I just removed that part.
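
To make the kind of mismatch described above concrete, here is a deliberately tiny sketch of an array written in a way that typically cannot map onto block RAM. It is an illustration of the general idea, not a reconstruction of the original code; the module name and ports are hypothetical, and the array is only 16 words so that it stays harmless to compile.

```verilog
// Hypothetical anti-pattern, kept tiny (16 words) on purpose.
module ram_not_inferred (
    input  wire        clk,
    input  wire        rst,
    input  wire        we,
    input  wire [3:0]  addr,
    input  wire [15:0] din,
    output wire [15:0] dout
);
    reg [15:0] mem [0:15];
    integer i;

    // Asynchronous reset that clears every word at once. Block RAM has
    // no bulk clear, so each bit ends up needing its own flip-flop.
    always @(posedge clk or posedge rst) begin
        if (rst) begin
            for (i = 0; i < 16; i = i + 1)
                mem[i] <= 16'd0;
        end else if (we) begin
            mem[addr] <= din;
        end
    end

    // Asynchronous (combinational) read, another mismatch with
    // synchronous block RAM.
    assign dout = mem[addr];
endmodule
```

At 16 words this just costs a few hundred flip-flops and a mux. Scaled to 32768 words, the same structure is the huge generic function described above.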
ethern0t

ExCyber, thanks for the reply, what you say makes a lot of sense.

I converted both my RAM and my register file into Altera objects and it drastically reduced the number of registers consumed in the design (it also kept me honest so I wasn't doing things like reading and writing the same register in the same machine cycle).

I used a dual-port RAM for the register file so that I can read and write up to two registers per cycle (mostly for effective address calculations).
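
A behavioural sketch of that access pattern, assuming eight 16-bit registers (3-bit register fields) and two ports that can each read or write once per cycle. This is a simulation-level model with made-up names, not the Altera dual-port object itself, and I make no claim that it maps onto the same primitive.

```verilog
// Hypothetical model of a two-port register file: each port can read
// or write one register per clock. Reads are registered (one-cycle latency).
module regfile_dp (
    input  wire        clk,
    input  wire        we_a,
    input  wire        we_b,
    input  wire [2:0]  addr_a,
    input  wire [2:0]  addr_b,
    input  wire [15:0] din_a,
    input  wire [15:0] din_b,
    output reg  [15:0] dout_a,
    output reg  [15:0] dout_b
);
    reg [15:0] regs [0:7];

    // Both ports are handled in one always block so the array has a
    // single driver. In this model, if both ports write the same
    // address in the same cycle, port B wins; real dual-port RAM
    // may not define that case.
    always @(posedge clk) begin
        if (we_a) regs[addr_a] <= din_a;
        if (we_b) regs[addr_b] <= din_b;
        dout_a <= regs[addr_a];   // old contents if written this cycle
        dout_b <= regs[addr_b];
    end
endmodule
```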

Eventually I'll probably convert the 64k of system memory over to dual-port as well so that my sprite engine can share cleanly.
sajattack
Core Developer
Posts: 35
Joined: Sun May 24, 2020 6:50 pm
Location: BC, Canada
Has thanked: 3 times
Been thanked: 17 times

If you see something like "inferred latch" in your synthesis logs, that's how you know you goofed.
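
A common way to end up with one is a combinational always block that doesn't assign its output on every path. A minimal sketch of the usual fix, with a hypothetical ALU-select module (the names and opcodes are made up for the example):

```verilog
// Hypothetical combinational block written so that no latch is inferred.
module alu_sel (
    input  wire [1:0]  op,
    input  wire [15:0] a,
    input  wire [15:0] b,
    output reg  [15:0] y
);
    always @(*) begin
        y = 16'd0;          // default assignment: without it, op == 2'b11
                            // would leave y unassigned and y would have
                            // to remember its old value (a latch)
        case (op)
            2'b00: y = a + b;
            2'b01: y = a - b;
            2'b10: y = a & b;
        endcase
    end
endmodule
```

A default branch in the case statement would do the same job as the default assignment.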