~icefox/riscv-reference

Attempt at making an instruction reference guide for RISC-V

#RISC-V Instruction Reference

An unofficial instruction reference for the RISC-V instruction set, oriented towards assembly programming.

One common nuisance with the RISC-V instruction set manual is the structure of the documents: they are written in a very prose-y style, which is great for reading them as a learning document, but a pain in the butt for actually looking up the details of various things while you are reading or writing assembly code. So I decided to write a simple reference of my own, since it would be useful and maybe not too large a project for one person to take on.

I hate formatting stuff, so all the instruction definitions are in a structured data file (TOML, at the moment) and there is a small Rust program that reads it and outputs whatever text format you want, as long as you want Markdown. It is very crude and not at all pretty, but for now it works. The easy way to get HTML or PDF out of it is to feed the markdown into Pandoc.
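
To make the data-file-to-Markdown idea concrete, here is a minimal sketch of the rendering half. It is not the repository's actual code: the `Instruction` struct and its field names are made up for illustration, and the real TOML schema may look quite different.

```rust
/// Hypothetical shape of one instruction entry. The field names are
/// assumptions for illustration, not the repository's real schema.
struct Instruction {
    name: String,
    format: String,
    description: String,
}

/// Render one entry as a small Markdown section: a heading with the
/// mnemonic, a line naming the encoding format, then the description.
fn to_markdown(inst: &Instruction) -> String {
    format!(
        "## {}\n\n**Format:** {}\n\n{}\n",
        inst.name, inst.format, inst.description
    )
}
```

Concatenating the output of `to_markdown` for every entry would give one Markdown file that can then be handed to Pandoc.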

Contributions are very welcome. This is a big project for one person, but also one where it is very easy to chip in incrementally, a little here and there. To contribute, send patches to the mailing list.

License is Creative Commons Attribution 4.0 International, same as the RISC-V ISA manual.

#Current state

The goal is to cover the entire RV64 user-level instruction set with all ratified extensions, pseudo-instructions, etc. The reference being used is the latest formally published one, dated 20191213, though there will probably be a new one along soon. Another file for the privileged instruction set would be nice but I'm not gonna go there myself yet.

RV32 is currently out of scope, but might be nice in the future. Some instruction encodings differ between RV32 and RV64: the shift-immediate instructions SLLI, SRLI, and SRAI take a 5-bit shift amount on RV32 and a 6-bit one on RV64. So I'm not going to bother worrying about both for now.
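
As a rough illustration of the RV32/RV64 difference, here is a sketch of an SLLI encoder. The `Xlen` enum and `encode_slli` function are hypothetical names. The opcode (`0010011`, OP-IMM) and funct3 (`001`) values come from the spec's instruction listings. On RV32 the shift amount is 5 bits; on RV64 it is 6 bits, so a shift by 32 or more only encodes on RV64.

```rust
/// Register width of the target ISA (hypothetical helper type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Xlen {
    Rv32,
    Rv64,
}

/// Encode `slli rd, rs1, shamt`.
/// Returns `None` if a register number or the shift amount is out of
/// range for the chosen register width.
fn encode_slli(xlen: Xlen, rd: u32, rs1: u32, shamt: u32) -> Option<u32> {
    // RV32 allows shifts of 0..=31, RV64 allows 0..=63.
    let max_shamt = match xlen {
        Xlen::Rv32 => 31,
        Xlen::Rv64 => 63,
    };
    if rd > 31 || rs1 > 31 || shamt > max_shamt {
        return None;
    }
    const OP_IMM: u32 = 0b0010011;
    const FUNCT3_SLLI: u32 = 0b001;
    // The shift amount sits in the low bits of the I-type immediate
    // field (starting at bit 20); the bits above it are zero for SLLI.
    Some((shamt << 20) | (rs1 << 15) | (FUNCT3_SLLI << 12) | (rd << 7) | OP_IMM)
}
```

For small shift amounts the two targets produce the same word; `encode_slli(Xlen::Rv32, 1, 2, 32)` is rejected while the same call with `Xlen::Rv64` is accepted.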

Possible states:

  • Finished -- Finished
  • Rough -- All instructions listed and complete, but not proofread or cleaned up
  • WIP -- Not all instructions listed, or instructions listed but missing info
  • Not started -- Not started

Extension and state:

  • I: Rough, version 2.1
  • M: Rough, version 2.0
  • A: WIP, version 2.1
  • F: WIP, version 2.2
  • D: Not started, version 2.2
  • Q: Not started, version 2.2
  • C: Not started, version 2.0
  • Zicsr: Not started, version 2.0
  • Zifencei: WIP, version 2.0

Extensions to do later:

  • V: Not started
  • Zam: Not started
  • Zfh: Not started
  • Zfhmin: Not started
  • Zdinx: Not started
  • Zfinx: Not started
  • Zhinx: Not started
  • Zhinxmin: Not started
  • Ztso: Not started
  • Probably others

A glance at the 20200427 draft doesn't show too many differences, mainly the half-precision floats and floats-in-integer-registers extensions. I'm really waiting for a version to get released with an up-to-date V extension.

#Things to ponder

Some instructions have multiple encodings, such as compressed/noncompressed versions. It's a bit annoying to figure out how the instruction formats intertwine with each other. I'm sure it all makes perfect sense to some hardware people. But we DO need to document what kind of operands instructions take.

Instruction opcodes in RV are kinda wonky 'cause they're commonly broken up across 1-3 fields depending on the instruction format.
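
To show what "broken up across 1-3 fields" means in practice, here is a small sketch that pulls the three possible opcode parts out of a 32-bit instruction word. The `OpFields` struct and `op_fields` function are hypothetical names; the bit positions are the ones used by the base 32-bit formats.

```rust
/// The up-to-three fields that together select an operation.
/// Which of them are meaningful depends on the instruction format:
/// U and J use only `opcode`, I/S/B add `funct3`, R adds `funct7`.
#[derive(Debug, PartialEq)]
struct OpFields {
    opcode: u32,
    funct3: u32,
    funct7: u32,
}

/// Extract the raw field values from an instruction word without
/// checking whether the format actually uses them.
fn op_fields(word: u32) -> OpFields {
    OpFields {
        opcode: word & 0x7f,         // bits 6:0
        funct3: (word >> 12) & 0x7,  // bits 14:12
        funct7: (word >> 25) & 0x7f, // bits 31:25
    }
}
```

For example, `add x1, x2, x3` (`0x003100b3`) and `sub x1, x2, x3` (`0x403100b3`) share the same opcode and funct3 and differ only in funct7.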

Pseudo-instructions are sometimes described in the description of the base instruction, and are sometimes not. This is, of course, the exact sort of annoyance this document is intended to clean up. Currently I THINK all pseudo-instructions are split out and listed as separate instructions, but I may have missed some. In particular there are pseudo-instructions that only change their operands, such as jal offset which translates to jal x1, offset and stuff like that. Not sure how to handle those yet; I'd expected we could entirely describe an instruction's operands based on its instruction encoding format, buuuuuuut maybe not. Maybe we need to list operands explicitly, so that this is a more useful reference for assembly programmers. Other guilty instructions: fence, jalr, ...
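
One way to think about the operand-only pseudo-instructions is as a tiny rewrite table keyed on the mnemonic and the number of operands. The sketch below is only an illustration of that idea, not how this project handles it. `expand_pseudo` is a hypothetical function, and it covers just three short forms: `jal offset`, `jalr rs`, and a bare `fence`.

```rust
/// Expand a few pseudo-instructions that keep the mnemonic and only
/// fill in default operands. Returns `None` for anything that is not
/// one of the short forms handled here (including the full forms).
fn expand_pseudo(mnemonic: &str, operands: &[&str]) -> Option<String> {
    match (mnemonic, operands) {
        // `jal offset` links through x1 (ra) by default.
        ("jal", [offset]) => Some(format!("jal x1, {}", offset)),
        // `jalr rs` also links through x1, with a zero offset.
        ("jalr", [rs]) => Some(format!("jalr x1, {}, 0", rs)),
        // A bare `fence` orders all I/O and memory accesses.
        ("fence", []) => Some("fence iorw, iorw".to_string()),
        _ => None,
    }
}
```

The point is that the operand count, not the encoding format, decides which form was written, which is an argument for listing operands explicitly per instruction.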

#Instruction encoding notes

Instruction format types

  • R - register, op rd, rs1, rs2. 3 opcode parts: opcode, funct3 and funct7 (the digit is the number of bits in the field).
  • I - immediate, op rd, rs1, imm[12]. 2 opcode parts, opcode and funct3. Except for ECALL and EBREAK, which use the immediate field as another opcode part, called funct12.
  • S - store, op rs2, imm[12](rs1). 2 opcode parts, opcode and funct3.
  • B - Variant of S type, probably means "branch". 2 opcode parts, opcode and funct3.
  • U - upper load, op rd, imm[20]. 1 opcode part, opcode.
  • J - Variant of U type, probably means "jump". 1 opcode part, opcode.
  • pseudo -- Pseudo-instruction
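
The format list above can be sketched as a handful of encoder functions, one per format, where the number of opcode-ish parameters matches the number of opcode parts. The function names are hypothetical and nothing is range-checked, so callers are assumed to pass fields that fit. B and J are left out because they only shuffle the immediate bits of S and U.

```rust
/// R-type: three opcode parts (opcode, funct3, funct7).
fn encode_r(opcode: u32, funct3: u32, funct7: u32, rd: u32, rs1: u32, rs2: u32) -> u32 {
    (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode
}

/// I-type: two opcode parts; the 12-bit signed immediate fills the
/// top twelve bits. ECALL/EBREAK reuse this field as funct12.
fn encode_i(opcode: u32, funct3: u32, rd: u32, rs1: u32, imm: i32) -> u32 {
    // Keep only the low 12 bits of the two's-complement immediate.
    let imm = (imm as u32) & 0xfff;
    (imm << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode
}

/// S-type: two opcode parts; the immediate is split around the
/// register fields, with its low 5 bits sitting where rd would be.
fn encode_s(opcode: u32, funct3: u32, rs1: u32, rs2: u32, imm: i32) -> u32 {
    let imm = (imm as u32) & 0xfff;
    ((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | ((imm & 0x1f) << 7) | opcode
}

/// U-type: one opcode part; a 20-bit immediate in the upper bits.
fn encode_u(opcode: u32, rd: u32, imm20: u32) -> u32 {
    ((imm20 & 0xfffff) << 12) | (rd << 7) | opcode
}
```

For example, `encode_i(0x13, 0, 1, 2, 5)` builds `addi x1, x2, 5`, and `encode_i(0x73, 0, 0, 0, 1)` builds EBREAK by putting 1 in the funct12 position.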

For now, the "opcode" section in the TOML just lists the text definitions of the opcodes the standard provides, least significant bits first. Exact values can be found (in binary) in chapter 24 of the spec, "RV32/64G Instruction Set Listings". I do kinda want to have actual hex values for the opcodes somewhere, because it is (sometimes) useful to be able to eyeball instructions and at least say "that's an arithmetic op" or "this whole chunk is floating point", but the instruction format makes it tricky.
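
For the eyeballing use case, the low 7 bits of a 32-bit instruction are enough to name the broad group it belongs to. Below is a sketch of such a lookup using the major opcode names from the spec's opcode map. The function name `major_opcode_name` is hypothetical, and custom and reserved slots are simply reported as unknown.

```rust
/// Name the major opcode group of a 32-bit instruction word.
/// Returns `None` for compressed (16-bit) encodings and for
/// custom or reserved major opcodes.
fn major_opcode_name(word: u32) -> Option<&'static str> {
    // Standard 32-bit instructions have their two lowest bits set.
    if (word & 0b11) != 0b11 {
        return None;
    }
    Some(match word & 0x7f {
        0x03 => "LOAD",
        0x07 => "LOAD-FP",
        0x0f => "MISC-MEM",
        0x13 => "OP-IMM",
        0x17 => "AUIPC",
        0x1b => "OP-IMM-32",
        0x23 => "STORE",
        0x27 => "STORE-FP",
        0x2f => "AMO",
        0x33 => "OP",
        0x37 => "LUI",
        0x3b => "OP-32",
        0x43 => "MADD",
        0x47 => "MSUB",
        0x4b => "NMSUB",
        0x4f => "NMADD",
        0x53 => "OP-FP",
        0x63 => "BRANCH",
        0x67 => "JALR",
        0x6f => "JAL",
        0x73 => "SYSTEM",
        _ => return None,
    })
}
```

So a word ending in `0x33` or `0x13` is integer arithmetic, and one ending in `0x53` is a floating-point operation, which covers the "this whole chunk is floating point" kind of glance.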

#Comparison

MIPS standard doc:

[Image: MIPS ADDI instruction]

RISC-V standard doc:

[Image: RISC-V ADDI instruction]