- vfio-ccw: the basic infrastructure
- ==================================
- Introduction
- ------------
- Here we describe the vfio support for I/O subchannel devices for
- Linux/s390. The motivation for vfio-ccw is to pass subchannels through
- to a virtual machine, with vfio as the means.
- Unlike other hardware architectures, s390 defines a unified I/O access
- method, the so-called Channel I/O. It has its own access patterns:
- - Channel programs run asynchronously on a separate (co)processor.
- - The channel subsystem will access any memory designated by the caller
- in the channel program directly, i.e. there is no iommu involved.
- Thus, when we introduce vfio support for these devices, we realize it
- with a mediated device (mdev) implementation. The vfio mdev is added
- to an iommu group so that it can be managed by the vfio framework. We
- also add read/write callbacks for special vfio I/O regions to pass the
- channel programs from the mdev to its parent device (the real I/O
- subchannel device), which does further address translation and
- performs the I/O instructions.
- This document does not intend to explain the s390 I/O architecture in
- every detail. More information and references can be found here:
- - A good starting point for Channel I/O in general:
- https://en.wikipedia.org/wiki/Channel_I/O
- - s390 architecture:
- s390 Principles of Operation manual (IBM Form. No. SA22-7832)
- - The existing Qemu code which implements a simple emulated channel
- subsystem could also be a good reference. It makes it easier to follow
- the flow.
- qemu/hw/s390x/css.c
- For vfio mediated device framework:
- - Documentation/vfio-mediated-device.txt
- Motivation of vfio-ccw
- ----------------------
- Currently, a guest virtualized via qemu/kvm on s390 only sees
- paravirtualized virtio devices via the "Virtio Over Channel I/O
- (virtio-ccw)" transport. This makes virtio devices discoverable via
- standard operating system algorithms for handling channel devices.
- However, this is not enough. On s390, the majority of devices use the
- standard Channel I/O based mechanism, and we also need to be able to
- pass them through to a Qemu virtual machine.
- This includes devices that don't have a virtio counterpart (e.g. tape
- drives) or that have specific characteristics which guests want to
- exploit.
- For passing a device to a guest, we want to use the same interface as
- everybody else, namely vfio. Thus, we would like to introduce vfio
- support for channel devices. And we would like to name this new vfio
- device "vfio-ccw".
- Access patterns of CCW devices
- ------------------------------
- The s390 architecture implements a so-called channel subsystem that
- provides a unified view of the devices physically attached to the
- system. Although the s390 hardware platform knows about a huge variety
- of peripheral attachments such as disk devices (aka. DASDs), tapes,
- communication controllers, etc., they can all be accessed by a
- well-defined access method, and they all present I/O completion in a
- unified way: I/O interruptions.
- All I/O requires the use of channel command words (CCWs). A CCW is an
- instruction to a specialized I/O channel processor. A channel program is
- a sequence of CCWs which are executed by the I/O channel subsystem. To
- issue a channel program to the channel subsystem, it is required to
- build an operation request block (ORB), which specifies the format of
- the CCWs and other control information to the system. The operating
- system signals the I/O channel subsystem to begin executing the channel
- program with an SSCH (start sub-channel) instruction. The central
- processor is then free to proceed with non-I/O instructions until
- interrupted. The I/O completion result is received by the interrupt
- handler in the form of an interrupt response block (IRB).
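As a rough illustration of the paragraph above, the following user-space sketch models a format-1 CCW (8 bytes: command code, flags, count, data address) and links several CCWs into one channel program using the command-chaining flag. The helper name ccw_chain_link is made up for this example and is not a kernel interface. On a little-endian build host the multi-byte fields would not have the big-endian byte order that s390 hardware expects, so treat this as a layout sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Format-1 CCW, 8 bytes, doubleword aligned. */
struct ccw1 {
    uint8_t  cmd_code;  /* device-specific command code */
    uint8_t  flags;     /* chaining and control flags */
    uint16_t count;     /* byte count of the data area */
    uint32_t cda;       /* data address */
} __attribute__((packed, aligned(8)));

#define CCW_FLAG_CC 0x40 /* command chaining: continue with the next CCW */

/*
 * Hypothetical helper: turn an array of n CCWs into one channel program.
 * Every CCW except the last one gets the command-chaining flag, so the
 * channel subsystem stops after the final CCW.
 * Returns the number of CCWs in the program.
 */
static size_t ccw_chain_link(struct ccw1 *chain, size_t n)
{
    assert(chain != NULL);
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n)
            chain[i].flags |= CCW_FLAG_CC;
        else
            chain[i].flags &= (uint8_t)~CCW_FLAG_CC;
    }
    return n;
}
```

A caller would fill in cmd_code, count and cda for each CCW, call ccw_chain_link(), and then reference the first CCW from the ORB before issuing SSCH.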
- Back to vfio-ccw, in short:
- - ORBs and channel programs are built in guest kernel (with guest
- physical addresses).
- - ORBs and channel programs are passed to the host kernel.
- - Host kernel translates the guest physical addresses to real addresses
- and starts the I/O by issuing a privileged Channel I/O instruction
- (e.g. SSCH).
- - Channel programs run asynchronously on a separate processor.
- - I/O completion is signaled to the host with I/O interruptions. The
- result is copied as an IRB to user space, which passes it back to the
- guest.
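The translation step in the list above can be sketched in user-space C as follows. This is a simplified model and not the vfio-ccw code: translate_chain, gpa_to_hpa_fn and offset_xlate are hypothetical names, and the fixed-offset mapping merely stands in for the real lookup and page pinning. Special cases such as transfer-in-channel CCWs, indirect data addressing, and data areas that cross page boundaries are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Format-1 CCW, 8 bytes, doubleword aligned. */
struct ccw1 {
    uint8_t  cmd_code;
    uint8_t  flags;
    uint16_t count;
    uint32_t cda;       /* data address: guest physical before translation */
} __attribute__((packed, aligned(8)));

/* Hypothetical callback: map a guest physical address to a host address. */
typedef int (*gpa_to_hpa_fn)(uint32_t gpa, uint32_t *hpa);

/*
 * Copy a guest channel program and rewrite every data address.
 * Returns 0 on success, -1 if any address cannot be translated.
 */
static int translate_chain(const struct ccw1 *guest, struct ccw1 *host,
                           size_t n, gpa_to_hpa_fn xlate)
{
    assert(guest != NULL && host != NULL && xlate != NULL);
    for (size_t i = 0; i < n; i++) {
        uint32_t hpa;

        if (xlate(guest[i].cda, &hpa) != 0)
            return -1;
        host[i] = guest[i];     /* keep command, flags and count */
        host[i].cda = hpa;      /* replace the guest address */
    }
    return 0;
}

/* Toy translation for illustration only: guest memory at a fixed offset. */
static int offset_xlate(uint32_t gpa, uint32_t *hpa)
{
    if (gpa >= 0x10000000u)     /* outside the pretend guest memory */
        return -1;
    *hpa = gpa + 0x40000000u;
    return 0;
}
```

Copying the program before rewriting it keeps the guest's own CCWs untouched, so the guest never observes host addresses.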
- Physical vfio ccw device and its child mdev
- -------------------------------------------
- As mentioned above, we realize vfio-ccw with an mdev implementation.
- Channel I/O does not have IOMMU hardware support, so the physical
- vfio-ccw device does not have IOMMU-level translation or isolation.
- Subchannel I/O instructions are all privileged instructions. When
- handling the I/O instruction interception, vfio-ccw polices and
- translates the channel program in software before it is sent to
- hardware.
- Within this implementation, we have two drivers for two types of
- devices:
- - The vfio_ccw driver for the physical subchannel device.
- This is an I/O subchannel driver for the real subchannel device. It
- realizes a group of callbacks and registers with the mdev framework as
- a parent (physical) device. As a consequence, mdev provides vfio_ccw
- with a generic interface (sysfs) to create mdev devices. A vfio mdev
- can then be created by vfio_ccw and added to the mediated bus. It is
- this vfio device that is added to an IOMMU group and a vfio group.
- vfio_ccw also provides an I/O region to accept channel program
- requests from user space and to store I/O interrupt results for user
- space to retrieve. To notify user space of an I/O completion, it
- offers an interface to set up an eventfd for asynchronous signaling.
- - The vfio_mdev driver for the mediated vfio ccw device.
- This is provided by the mdev framework. It is a vfio device driver for
- the mdev that is created by vfio_ccw.
- It realizes a group of vfio device driver callbacks, adds itself to a
- vfio group, and registers itself with the mdev framework as an mdev
- driver.
- It uses a vfio iommu backend that uses the existing map and unmap
- ioctls, but rather than programming them into an IOMMU for a device,
- it simply stores the translations for use by later requests. This
- means that a device programmed in a VM with guest physical addresses
- can have the vfio kernel convert that address to a process virtual
- address, pin the page and program the hardware with the host physical
- address in one step.
- For an mdev, the vfio iommu backend will not pin the pages during the
- VFIO_IOMMU_MAP_DMA ioctl. The mdev framework only maintains a database
- of the iova<->vaddr mappings in this operation. The vfio iommu backend
- exports the vfio_pin_pages and vfio_unpin_pages interfaces so that the
- physical devices can pin and unpin pages on demand.
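The "record now, pin later" behaviour described above can be modelled with a small user-space sketch. All names here (mapping_db, db_map, db_pin) are hypothetical and only loosely mirror the roles of VFIO_IOMMU_MAP_DMA and vfio_pin_pages; no real pages are pinned, a counter stands in for that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPINGS 16

struct dma_mapping {
    uint64_t iova;   /* guest physical (I/O virtual) start address */
    uint64_t vaddr;  /* process virtual start address */
    uint64_t size;   /* length of the range in bytes */
};

struct mapping_db {
    struct dma_mapping map[MAX_MAPPINGS];
    size_t count;    /* number of recorded ranges */
    size_t pinned;   /* number of pin requests served so far */
};

/* Map step: only record the range, nothing is pinned here. */
static int db_map(struct mapping_db *db, uint64_t iova, uint64_t vaddr,
                  uint64_t size)
{
    assert(db != NULL);
    if (db->count == MAX_MAPPINGS)
        return -1;
    db->map[db->count++] = (struct dma_mapping){ iova, vaddr, size };
    return 0;
}

/*
 * Pin-on-demand step: look up the iova in the recorded ranges, return the
 * matching process virtual address and account for one pin.
 * Returns 0 on success, -1 if the iova was never mapped.
 */
static int db_pin(struct mapping_db *db, uint64_t iova, uint64_t *vaddr)
{
    assert(db != NULL && vaddr != NULL);
    for (size_t i = 0; i < db->count; i++) {
        const struct dma_mapping *m = &db->map[i];

        if (iova >= m->iova && iova - m->iova < m->size) {
            *vaddr = m->vaddr + (iova - m->iova);
            db->pinned++;
            return 0;
        }
    }
    return -1;
}
```

Deferring the pin until a channel program actually references an address means only the pages touched by in-flight I/O need to stay pinned.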
- Below is a high-level block diagram.
-  +-------------+
-  |             |
-  | +---------+ | mdev_register_driver() +--------------+
-  | |  Mdev   | +<-----------------------+              |
-  | |  bus    | |                        | vfio_mdev.ko |
-  | | driver  | +----------------------->+              |<-> VFIO user
-  | +---------+ |    probe()/remove()    +--------------+    APIs
-  |             |
-  |  MDEV CORE  |
-  |   MODULE    |
-  |   mdev.ko   |
-  | +---------+ | mdev_register_device() +--------------+
-  | |Physical | +<-----------------------+              |
-  | | device  | |                        |  vfio_ccw.ko |<-> subchannel
-  | |interface| +----------------------->+              |     device
-  | +---------+ |       callback         +--------------+
-  +-------------+
- These components work together as follows:
- 1. vfio_ccw.ko drives the physical I/O subchannel, and registers the
- physical device (with callbacks) with the mdev framework.
- When vfio_ccw probes the subchannel device, it registers the device
- pointer and callbacks with the mdev framework. Mdev-related file nodes
- are then created under the device node in sysfs for the subchannel
- device, namely 'mdev_create', 'mdev_destroy' and
- 'mdev_supported_types'.
- 2. Create a mediated vfio ccw device.
- Using the 'mdev_create' sysfs file, we need to manually create one
- (and only one for our case) mediated device.
- 3. vfio_mdev.ko drives the mediated ccw device.
- vfio_mdev is also the vfio device driver. It will probe the mdev and
- add it to an iommu_group and a vfio_group. Then we can pass the mdev
- through to a guest.
- vfio-ccw I/O region
- -------------------
- An I/O region is used to accept channel program requests from user
- space and to store I/O interrupt results for user space to retrieve.
- The definition of the region is:
- struct ccw_io_region {
- #define ORB_AREA_SIZE 12
-     __u8    orb_area[ORB_AREA_SIZE];
- #define SCSW_AREA_SIZE 12
-     __u8    scsw_area[SCSW_AREA_SIZE];
- #define IRB_AREA_SIZE 96
-     __u8    irb_area[IRB_AREA_SIZE];
-     __u32   ret_code;
- } __packed;
- When starting an I/O request, orb_area should be filled with the
- guest ORB, and scsw_area should be filled with the SCSW of the Virtual
- Subchannel.
- irb_area stores the I/O result.
- ret_code stores a return code for each access of the region.
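For user-space code that cannot include the kernel header, the region can be mirrored with fixed-width types. The sketch below does that and adds a small helper for preparing a request; io_region_prepare is a name invented for this example. With the sizes given above, the packed structure is 12 + 12 + 96 + 4 = 124 bytes, which the static assertion checks at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ORB_AREA_SIZE  12
#define SCSW_AREA_SIZE 12
#define IRB_AREA_SIZE  96

/* User-space mirror of the region layout shown above. */
struct ccw_io_region {
    uint8_t  orb_area[ORB_AREA_SIZE];   /* guest ORB, filled by the caller */
    uint8_t  scsw_area[SCSW_AREA_SIZE]; /* SCSW of the virtual subchannel */
    uint8_t  irb_area[IRB_AREA_SIZE];   /* I/O result */
    uint32_t ret_code;                  /* return code of the region access */
} __attribute__((packed));

_Static_assert(sizeof(struct ccw_io_region) == 124,
               "unexpected ccw_io_region size");

/*
 * Hypothetical helper: prepare a request by copying the guest ORB and the
 * SCSW into the region and clearing the result fields.
 */
static void io_region_prepare(struct ccw_io_region *r,
                              const uint8_t orb[ORB_AREA_SIZE],
                              const uint8_t scsw[SCSW_AREA_SIZE])
{
    assert(r != NULL && orb != NULL && scsw != NULL);
    memset(r, 0, sizeof(*r));
    memcpy(r->orb_area, orb, ORB_AREA_SIZE);
    memcpy(r->scsw_area, scsw, SCSW_AREA_SIZE);
}
```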
- vfio-ccw patches overview
- -------------------------
- For now, our patches are rebased on the latest mdev implementation.
- vfio-ccw follows what vfio-pci did on the s390 platform and uses
- vfio-iommu-type1 as the vfio iommu backend. It's a good start to launch
- the code review for vfio-ccw. Note that the implementation is far from
- complete yet; but we'd like to get feedback for the general
- architecture.
- * CCW translation APIs
- - Description:
- These introduce a group of APIs (prefixed with 'cp_') to do CCW
- translation. The CCWs passed in by a user space program are
- organized with their guest physical memory addresses. These APIs
- will copy the CCWs into the kernel space, and assemble a runnable
- kernel channel program by updating the guest physical addresses with
- their corresponding host physical addresses.
- - Patches:
- vfio: ccw: introduce channel program interfaces
- * vfio_ccw device driver
- - Description:
- The following patches utilize the CCW translation APIs and introduce
- vfio_ccw, which is the driver for the I/O subchannel devices you want
- to pass through.
- vfio_ccw implements the following vfio ioctls:
- VFIO_DEVICE_GET_INFO
- VFIO_DEVICE_GET_IRQ_INFO
- VFIO_DEVICE_GET_REGION_INFO
- VFIO_DEVICE_RESET
- VFIO_DEVICE_SET_IRQS
- This provides an I/O region, so that the user space program can pass a
- channel program to the kernel, which does further CCW translation
- before issuing it to a real device.
- This also provides the VFIO_DEVICE_SET_IRQS ioctl to set up an event
- notifier that notifies the user space program of I/O completion
- asynchronously.
- - Patches:
- vfio: ccw: basic implementation for vfio_ccw driver
- vfio: ccw: introduce ccw_io_region
- vfio: ccw: realize VFIO_DEVICE_GET_REGION_INFO ioctl
- vfio: ccw: realize VFIO_DEVICE_RESET ioctl
- vfio: ccw: realize VFIO_DEVICE_G(S)ET_IRQ_INFO ioctls
- The user of vfio-ccw is not limited to Qemu, but Qemu is a good
- example for understanding how these patches work. Here is a little
- more detail on how an I/O request triggered by the Qemu guest is
- handled (without error handling).
- Explanation:
- Q1-Q7: Qemu side process.
- K1-K6: Kernel side process.
- Q1. Get I/O region info during initialization.
- Q2. Setup event notifier and handler to handle I/O completion.
- ... ...
- Q3. Intercept a ssch instruction.
- Q4. Write the guest channel program and ORB to the I/O region.
- K1. Copy from guest to kernel.
- K2. Translate the guest channel program to a host kernel space
- channel program, which becomes runnable for a real device.
- K3. With the necessary information contained in the orb passed in
- by Qemu, issue the ccwchain to the device.
- K4. Return the ssch CC code.
- Q5. Return the CC code to the guest.
- ... ...
- K5. Interrupt handler gets the I/O result and writes the result to
- the I/O region.
- K6. Signal Qemu to retrieve the result.
- Q6. Get the signal and event handler reads out the result from the I/O
- region.
- Q7. Update the irb for the guest.
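The user-space half of this flow (Q2, Q4 and Q6) can be sketched roughly as below. This is an assumption-laden outline and not Qemu's code: make_notifier, submit_io and wait_and_fetch are invented names, the region offset is expected to come from VFIO_DEVICE_GET_REGION_INFO (Q1), and registering the eventfd through VFIO_DEVICE_SET_IRQS is left out. The region size of 124 bytes follows from the ccw_io_region layout described earlier.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

#define IO_REGION_SIZE 124 /* 12 + 12 + 96 + 4 bytes */

/* Q2 (partially): create the eventfd used as the completion notifier. */
static int make_notifier(void)
{
    return eventfd(0, 0);
}

/* Q4: write the prepared region to the device fd at the region offset. */
static int submit_io(int dev_fd, off_t region_off, const void *region)
{
    assert(region != NULL);
    if (pwrite(dev_fd, region, IO_REGION_SIZE, region_off) != IO_REGION_SIZE) {
        perror("pwrite I/O region");
        return -1;
    }
    return 0;
}

/* Q6: block until the notifier fires, then read the region back. */
static int wait_and_fetch(int event_fd, int dev_fd, off_t region_off,
                          void *region)
{
    uint64_t events = 0;

    assert(region != NULL);
    if (read(event_fd, &events, sizeof(events)) != (ssize_t)sizeof(events)) {
        perror("read eventfd");
        return -1;
    }
    if (pread(dev_fd, region, IO_REGION_SIZE, region_off) != IO_REGION_SIZE) {
        perror("pread I/O region");
        return -1;
    }
    return 0;
}
```

After wait_and_fetch() returns, the caller would copy irb_area from the region into the guest's IRB (Q7). A real event loop would normally poll the eventfd together with its other file descriptors instead of blocking in read().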
- Limitations
- -----------
- The current vfio-ccw implementation focuses on supporting the basic
- commands needed to implement block device functionality (read/write)
- for DASD/ECKD devices only. Some commands may need special handling in
- the future, for example, anything related to path grouping.
- DASD is a kind of storage device, while ECKD is a data recording
- format. More information on DASD and ECKD can be found here:
- https://en.wikipedia.org/wiki/Direct-access_storage_device
- https://en.wikipedia.org/wiki/Count_key_data
- Together with the corresponding work in Qemu, we can now bring the
- passed-through DASD/ECKD device online in a guest and use it as a
- block device.
- Reference
- ---------
- 1. ESA/s390 Principles of Operation manual (IBM Form. No. SA22-7832)
- 2. ESA/390 Common I/O Device Commands manual (IBM Form. No. SA22-7204)
- 3. https://en.wikipedia.org/wiki/Channel_I/O
- 4. Documentation/s390/cds.txt
- 5. Documentation/vfio.txt
- 6. Documentation/vfio-mediated-device.txt