---
title: 'Bachelor Thesis Early Summary'
author: Norbert Tremurici (e11907086@student.tuwien.ac.at)
date: 11th August 2022
...
The aim of this document is to provide a motivation, problem statement and preliminary summary of how the problem will be tackled.
The motivation of this bachelor thesis is to extend the pipelined implementation of the RISC-V 32-bit integer base instruction set (rv32i) from the lab course Digital Design and Computer Architecture LU (182.695) so that it can run more sophisticated programs, specifically a basic operating system.
Currently there are a few supplied examples and students can write their own test programs, but because of the limited support for sophisticated library features these programs are invariably limited in scope.
A motivated student longing to see their freshly implemented processor do actual work will quickly realize that there is a mountain of work ahead: they would have to implement all required features of the C standard library or write a lot of assembly code, both of which would be hard to test and debug.
The most inspiring test case for a student's processor implementation in the lab course would arguably be an operating system, as its sophistication is comparable to what modern integrated computer systems run. The goal of this thesis is therefore to investigate whether the processor implementation can be extended such that this opportunity becomes available to future students, perhaps even inspiring them to join the effort and make further extensions.
Any student wondering how to make the jump from their rv32i implementation to a system that does everything an operating system requires will face questions similar to the ones this work is concerned with. It is not that difficult to find an unconstrained set of extensions needed to run an operating system: several RISC-V implementations can run operating systems on bare metal and list in their specifications what they have implemented. It is currently difficult, however, to find out which minimal set of extensions is necessary for any given operating system.
The question of implementing a minimal set of extensions would be relevant to any interested students because after completing the lab course, they will have implemented for themselves only the 32-bit integer base instruction set. Targeting the minimal set of extensions would not only constrain the workload, but also reduce complexity and thereby make reading and understanding these extensions easier.
This thesis will tackle the problem of getting from the pipelined rv32i implementation to one that supports the operation of a chosen basic operating system in two important ways:
Software perspective: Investigate which operating systems are currently available and which configurations exist, and choose what to extend based on this survey. This is a kind of top-down approach in which the high-level requirements are analyzed to choose what to work on. There are different types of operating systems with their own properties and requirements, such as real-time operating systems (for example FreeRTOS) and non-real-time ones. From the RTOS space, the most fitting available RTOS for a 32-bit RISC-V architecture will be chosen. As for the latter type, for the sake of simplicity and achievability, a configuration that is as minimalistic as possible, such as 32-bit RISC-V Linux with uClibc in a no-MMU configuration and with busybox as the userspace, will be examined. The term no-MMU refers to a configuration that expects no memory management unit to be present, a configuration which surfaces every once in a while in the space of embedded systems.
Hardware perspective: Consider the current state of the rv32i implementation and survey which extensions can realistically be implemented without proving too complex. There is a lot of leeway in that an extension can be implemented either with performance in mind or with only basic functionality in mind. For example, the M (multiplication and division) extension can be implemented elaborately or in a more basic fashion, depending on what level of performance is desired. Until a working example is achieved and the goals outlined in the motivation are reached, the basic, functional variant will always be preferred. This is a kind of bottom-up approach in that support for common operating systems is approximated step by step.
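To make the difference between a basic and an elaborate M extension more tangible, the following is a minimal behavioural sketch in Python (not VHDL, and not the implementation this thesis will provide) of an iterative shift-and-add multiplier. It assumes that one multiplier bit is processed per clock cycle and that the loop stops early once no set bits remain. The function name `mul_shift_add` and the cycle accounting are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF  # keep all values within 32 bits, like a 32-bit register


def mul_shift_add(a, b):
    """Model the low 32 bits of a * b, as produced by the RV32M MUL instruction.

    Returns (product, cycles), where cycles is the number of loop
    iterations, assuming one multiplier bit is handled per clock cycle.
    """
    multiplicand = a & MASK32
    multiplier = b & MASK32
    product = 0
    cycles = 0
    while multiplier:
        if multiplier & 1:
            # Add the shifted multiplicand whenever the current bit is set.
            product = (product + multiplicand) & MASK32
        multiplicand = (multiplicand << 1) & MASK32
        multiplier >>= 1
        cycles += 1
    return product, cycles


if __name__ == "__main__":
    for a, b in [(6, 7), (-3, 5), (1, -1)]:
        product, cycles = mul_shift_add(a, b)
        print(f"{a} * {b} -> 0x{product:08x} in {cycles} cycles")
```

The low 32 bits of a product are identical for signed and unsigned operands, so this one loop covers MUL for both. In this model a multiplication takes up to 32 cycles, which is the kind of performance cost a functional-first design accepts in exchange for simplicity.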
In short, the tasks this bachelor thesis will take on are to survey the currently available operating systems and the state in which they are usable (configurable), to survey the extensions to opt for, and finally to implement operating system support.
Such an effort has to be constrained in order to fit the scope of a bachelor thesis. Even if it keeps supplying interesting hurdles to overcome, a bachelor thesis cannot continue forever, so it will be of critical importance to define precisely what is desired and to prioritize reaching a working example. For this reason, even though operating systems might be compared, realizing support for one of them will be prioritized. This choice will be justified on the basis of the survey of the software and hardware landscape.
The target architecture is the one currently in use in the lab course, 32-bit RISC-V. The lab course requires students to produce a working rv32i implementation. This implementation will certainly have to be extended to fit the requirements of the chosen operating system and make it possible to get it running. The extensions are planned to be provided in VHDL as separate modules and auxiliary files that can be dropped into the project structure (or provided initially with the assignment). If this proves to be impossible, or if students have to make small modifications to use this infrastructure, this path will still be preferred as long as the modifications are not too elaborate. Because the implementation targets students who want to learn, all added extensions are meant to be refactored into a readable state with manageable complexity.
The software side is planned to be tackled not only by considering documentation and examples, but also by making software builds of the two operating systems chosen for the comparison and analysing the output. For this analysis, it is planned to either output readable assembly code or disassemble compiled binaries and count the frequency of occurrence of individual instructions. This will give an idea not only about which instructions are necessary, but also about how critical they are. As a follow-up, one could even investigate whether they are actually needed or merely there for convenience, and how hard it would be to remove instructions that will not be supported. This idea leads into an interesting trade-off: is it simpler to implement a commonly used and required feature, or to adapt what is available to less supported or even unsupported configurations to avoid implementing said feature?
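The instruction counting described above can be sketched as follows. This is a minimal Python example that assumes the disassembly comes from GNU `objdump -d`, where each instruction line has tab-separated address, raw encoding, mnemonic and operands. The function names and the hand-written `SAMPLE` listing are hypothetical and only serve as illustration. Note that objdump prints pseudo-instructions by default; for RISC-V targets, passing `-M no-aliases` should make it print the underlying base instructions instead.

```python
from collections import Counter

# The 40 instructions of the RV32I base integer instruction set.
RV32I_BASE = frozenset("""
    lui auipc jal jalr beq bne blt bge bltu bgeu
    lb lh lw lbu lhu sb sh sw
    addi slti sltiu xori ori andi slli srli srai
    add sub sll slt sltu xor srl sra or and
    fence ecall ebreak
""".split())


def count_mnemonics(disassembly):
    """Count how often each mnemonic occurs in objdump-style disassembly text."""
    counts = Counter()
    for line in disassembly.splitlines():
        fields = line.split("\t")
        # Instruction lines look like "<addr>:\t<encoding>\t<mnemonic>\t<operands>".
        # Labels and section headers contain no tabs and are skipped here.
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        mnemonic = fields[2].strip()
        # Skip empty fields and data directives such as ".word".
        if mnemonic and not mnemonic.startswith("."):
            counts[mnemonic] += 1
    return counts


def unsupported(counts):
    """Return the mnemonics (with counts) that are not part of RV32I."""
    return {m: n for m, n in counts.items() if m not in RV32I_BASE}


# Hand-written sample in objdump format, for illustration only.
SAMPLE = "\n".join([
    "80000000 <_start>:",
    "80000000:\t00000297          \tauipc\tt0,0x0",
    "80000004:\t02028593          \taddi\ta1,t0,32",
    "80000008:\t02b50533          \tmul\ta0,a0,a1",
    "8000000c:\t00b50533          \tadd\ta0,a0,a1",
    "80000010:\t30200073          \tmret",
    "80000014:\t02028593          \taddi\ta1,t0,32",
])

if __name__ == "__main__":
    counts = count_mnemonics(SAMPLE)
    for mnemonic, n in counts.most_common():
        marker = "" if mnemonic in RV32I_BASE else "  (not in rv32i)"
        print(f"{mnemonic:8s}{n:6d}{marker}")
```

Sorting by frequency shows how critical an instruction is, while the `unsupported` filter lists what a plain rv32i core would still be missing. A static count like this only reflects how often an instruction appears in the binary, not how often it is executed.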
As for the hardware side, the implementation of some control and status registers (CSRs), as well as basic handling of interrupts, exceptions and traps will be necessary. These features are expected to be challenging to implement for a pipelined RISC-V processor.
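As a rough illustration of what basic trap handling involves, the following Python sketch models the architectural state changes on trap entry and on return via `mret`, following the machine-mode rules of the RISC-V privileged specification. It assumes a machine-mode-only core with `mtvec` in direct mode and ignores the `MPP` field for simplicity. Which CSRs the thesis will actually implement is not yet decided, so the selection here and the function names are assumptions.

```python
# Machine-mode CSR addresses as defined by the RISC-V privileged specification.
MSTATUS, MTVEC, MEPC, MCAUSE = 0x300, 0x305, 0x341, 0x342

MSTATUS_MIE = 1 << 3      # global machine interrupt enable
MSTATUS_MPIE = 1 << 7     # holds the previous MIE value while a trap is handled
INTERRUPT_FLAG = 1 << 31  # top bit of mcause marks an interrupt on RV32


def take_trap(csr, pc, cause, is_interrupt=False):
    """Update the CSR state for a trap and return the handler address."""
    status = csr[MSTATUS]
    # Save the current interrupt enable bit in MPIE, then disable interrupts.
    if status & MSTATUS_MIE:
        status |= MSTATUS_MPIE
    else:
        status &= ~MSTATUS_MPIE
    status &= ~MSTATUS_MIE
    csr[MSTATUS] = status
    csr[MEPC] = pc  # address of the interrupted or faulting instruction
    csr[MCAUSE] = cause | (INTERRUPT_FLAG if is_interrupt else 0)
    # Direct mode (assumed here): all traps jump to the base address in mtvec.
    return csr[MTVEC] & ~0x3


def mret(csr):
    """Restore the interrupt enable state and return the address to resume at."""
    status = csr[MSTATUS]
    if status & MSTATUS_MPIE:
        status |= MSTATUS_MIE
    else:
        status &= ~MSTATUS_MIE
    status |= MSTATUS_MPIE  # MPIE is set to 1 on mret
    csr[MSTATUS] = status
    return csr[MEPC]


if __name__ == "__main__":
    csr = {MSTATUS: MSTATUS_MIE, MTVEC: 0x100, MEPC: 0, MCAUSE: 0}
    handler = take_trap(csr, pc=0x80000010, cause=2)  # 2 = illegal instruction
    print(f"handler at 0x{handler:x}, mepc = 0x{csr[MEPC]:x}")
    print(f"resume at 0x{mret(csr):x}")
```

The model only covers the register updates. In a pipelined processor the trap additionally has to flush the younger instructions already in flight, which this sketch does not capture.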
Because the lab course intends students to pair up to implement the processor, my respected colleague shares the same personal implementation. As he is also working on a bachelor thesis with the same supervisor, we will initially work together to develop an implementation of the M extension and a branch predictor. It looks like some form of M extension support will be necessary, or at least that a lack of M extension support would complicate the effort and incur large performance penalties. A branch predictor, even a very simple one, will likely improve performance considerably relative to the time invested, which might prove necessary to have a working example that boots in an acceptable timeframe. Both of these efforts will presumably be dropped if they prove too complicated or time-intensive.
Without having completed the survey of operating systems or started work on the hardware side, the difficulty and the time required cannot be estimated accurately. As work progresses and matters become clearer, updates on accomplishments and new findings will be provided to steer the work in a sensible direction. This is currently planned and scheduled on a weekly basis.
Some preliminary work has already been done, looking at various operating systems and their requirements, as well as investigating RISC-V extensions. Based on this, the preliminary choice of operating systems to compare is one of two RTOSes and the aforementioned Linux configuration. Either FreeRTOS or Zephyr RTOS is currently under consideration. For Linux, this choice was based on existing work by various developers to support very simple processors (single hart, no-MMU and little extension support) targeting 64-bit and 32-bit RISC-V platforms. Without the possibility of running with uClibc and busybox, Linux would not have been considered at all for complexity reasons. The existing tool buildroot, which is closely related to uClibc and busybox, will be used to make Linux builds, as it greatly simplifies and automates the effort.
A reasonable question to ask before undertaking this effort is why no existing processor or framework can be taken and adapted to fit these needs. Particularly for RISC-V there are many open implementations with impressive capabilities, but none are geared towards the motives of this bachelor thesis: they are complex in design for performance reasons and their modules are often intertwined, they implement many features and end up being confusing, they are more often than not written in another HDL such as Verilog or Chisel, or they target a core architecture too far removed from what is implemented in the lab course. Specifically targeting a minimal set of extensions is not what is usually done.
This undertaking is challenging partly because of the frugality of the rv32i implementation. The lab course intends to convey the principles of pipelining and offers only a simple UART interface as I/O for the system. Besides finding the right operating system (and configuring it appropriately) and finding out which extensions are necessary, more storage support might be needed, the I/O (at least the UART interface) might have to be extended appropriately, and pipelining will have to be considered when implementing these extensions. For instance, the handling of exceptions, interrupts and traps can become complicated in a pipelined implementation. Some assumptions of the current pipeline might no longer hold, such as the duration of a memory access. The target is also an FPGA, which offers limited system resources. There is a lot of research, analysis, adaptation and implementation work to be done.