This is eventfd-based synchronization, or 'esync' for short. Turn it on with
WINEESYNC=1; debug it with +esync.

== BUGS AND LIMITATIONS ==

Please let me know if you find any bugs. If you can, also attach a log with
+seh,+pid,+esync,+server,+timestamp.

If you get something like "eventfd: Too many open files" and then things start
crashing, you've probably run out of file descriptors. esync creates one
eventfd descriptor for each synchronization object, and some games may use a
large number of these. Linux by default limits a process to 4096 file
descriptors, which probably was reasonable back in the nineties but isn't
really anymore. (Fortunately Debian and derivatives [Ubuntu, Mint] already
have a reasonable limit.) To raise the limit you'll want to edit
/etc/security/limits.conf and add a line like

    * hard nofile 1048576

then restart your session.

On distributions using systemd, the settings in `/etc/security/limits.conf`
will be overridden by systemd's own settings. If you run `ulimit -Hn` and it
returns a lower number than the one you've previously set, then you can set

    DefaultLimitNOFILE=1048576

in both `/etc/systemd/system.conf` and `/etc/systemd/user.conf`. You can then
execute `sudo systemctl daemon-reexec` and restart your session. Check again
with `ulimit -Hn` that the limit is correct.

Also note that if the wineserver has esync active, all clients also must, and
vice versa. Otherwise things will probably crash quite badly.

== EXPLANATION ==

The aim is to execute all synchronization operations in "user-space", that is,
without going through wineserver. We do this using Linux's eventfd facility.
The main impetus for using eventfd is that we can poll multiple objects at
once; we can't do that with futexes, pthread semaphores, or the like. The only
way I know of to wait on any of multiple objects is to use select/poll/epoll
to wait on multiple fds, and eventfd gives us those fds in a quite usable way.
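
As a minimal illustration of the idea (not Wine code; names and error handling
are simplified), here is how one eventfd per object lets poll() wait on any
number of objects at once:

    #include <sys/eventfd.h>
    #include <poll.h>
    #include <unistd.h>
    #include <stdio.h>

    int main(void)
    {
        /* Two "objects": one initially unsignaled, one already signaled. */
        int fds[2];
        fds[0] = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
        fds[1] = eventfd(1, EFD_CLOEXEC | EFD_NONBLOCK);

        struct pollfd pollfds[2] = {
            { .fd = fds[0], .events = POLLIN },
            { .fd = fds[1], .events = POLLIN },
        };

        /* Wait until at least one object is signaled. */
        if (poll(pollfds, 2, -1) > 0)
        {
            for (int i = 0; i < 2; i++)
                if (pollfds[i].revents & POLLIN)
                    printf("object %d is signaled\n", i);
        }

        close(fds[0]);
        close(fds[1]);
        return 0;
    }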

Whenever a semaphore, event, or mutex is created, we have the server create an
'esync' primitive rather than a traditional server-side event/semaphore/mutex.
These live in esync.c and are very slim objects; in fact, they don't even know
what type of primitive they are. The server is involved at all only because we
still need a way of creating named objects, passing handles to another
process, etc.

The server creates an eventfd file descriptor with the requested parameters
and passes it back to ntdll. ntdll creates an object of the appropriate type,
then caches it in a table. This table is copied almost wholesale from the fd
cache code in server.c.

Specific operations follow quite straightforwardly from eventfd (a sketch of
these helpers follows the list):

* To release an object, or set an event, we simply write() to it.

* An object is signaled if read() succeeds on it. Notably, we create all
  eventfd descriptors with O_NONBLOCK, so that we can atomically check if an
  object is signaled and grab it if it is. This also lets us reset events.

* For objects whose state should not be reset upon waiting—e.g. manual-reset
  events—we simply check for the POLLIN flag instead of reading.

* Semaphores are handled by the EFD_SEMAPHORE flag. This matches up quite well
  (although with some difficulties; see below).

* Mutexes store their owner thread locally. This isn't reliable information if
  a different process's thread owns the mutex, but this doesn't matter—a
  thread should only care whether it owns the mutex, so it knows whether to
  try waiting on it or simply to increase the recursion count.
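
To make the above concrete, here is a hedged sketch of those operations; the
helper names are illustrative, not the actual esync functions, and all fds are
assumed to have been created with EFD_NONBLOCK (semaphores additionally with
EFD_SEMAPHORE):

    #include <poll.h>
    #include <unistd.h>
    #include <stdint.h>

    /* Release an object / set an event: just write() to it. */
    static void signal_obj(int fd)
    {
        uint64_t value = 1;
        write(fd, &value, sizeof(value));
    }

    /* Try to grab an auto-reset object: a successful nonblocking read()
     * atomically checks the signaled state and consumes it. */
    static int try_grab_obj(int fd)
    {
        uint64_t value;
        return read(fd, &value, sizeof(value)) == sizeof(value);
    }

    /* Check a manual-reset event without consuming its state: just ask
     * poll() whether the fd is readable. */
    static int is_signaled(int fd)
    {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        return poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLIN);
    }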

The interesting part about esync is that (almost) all waits happen in ntdll,
including those on server-bound objects. The idea here is that on the server
side, for any waitable object, we create an eventfd file descriptor (not an
esync primitive), and then pass it to ntdll if the program tries to wait on
it. These are cached too, so only the first wait will require a round trip to
the server. Then the server signals the file descriptor as appropriate, and
thereby wakes up the client. So far this is implemented for processes,
threads, message queues (difficult; see below), and device managers (necessary
for drivers to work). All of these are necessarily server-bound, so we
wouldn't really gain anything by signaling on the client side instead. Of
course, except possibly for message queues, it's not likely that any program
(cutting-edge D3D game or not) is going to put a great load on wineserver by
waiting on any of these objects; the motivation was rather to provide a way to
wait on ntdll-bound and server-bound objects at the same time.
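
The caching idea might be sketched like this; everything here is hypothetical
(in particular server_get_esync_fd() stands in for the real server round trip,
and real code would handle collisions and invalid fds properly):

    #define CACHE_SIZE 4096

    static int fd_cache[CACHE_SIZE];  /* handle index -> eventfd; 0 = empty,
                                         glossing over fd 0 being valid */

    extern int server_get_esync_fd(unsigned int handle);  /* hypothetical */

    static int get_waitable_fd(unsigned int handle)
    {
        unsigned int index = handle % CACHE_SIZE;
        if (!fd_cache[index])
            fd_cache[index] = server_get_esync_fd(handle);  /* round trip */
        return fd_cache[index];
    }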

Some cases are still passed to the server, and there's probably no reason not
to keep them that way. Those that I noticed while testing include: async
objects, which are internal to the file APIs and never exposed to userspace;
startup_info objects, which are internal to the loader and signaled when a
process starts; and keyed events, which are exposed through an ntdll API
(although not through kernel32) but can't be mixed with other objects (you
have to use NtWaitForKeyedEvent()). Other cases include named pipes, debug
events, sockets, and timers. It's unlikely we'll want to optimize debug events
or sockets (or any of the other, rather rare, objects), but it is possible
we'll want to optimize named pipes or timers.

There were two sorts of complications when working out the above. The first
was events. The trouble is that (1) the server actually creates some events by
itself and (2) the server sometimes manipulates events passed by the client.
Resolving the first case was easy enough, and merely entailed creating eventfd
descriptors for the events the same way as for processes and threads (note
that we don't really lose anything this way; the events include
"LowMemoryCondition" and the event that signals system processes to shut
down). For the second case I basically had to hook the server-side event
functions to redirect to esync versions if the event was actually an esync
primitive.

The second complication was message queues. The difficulty here is that X11
signals events by writing into a pipe (at least I think it's a pipe?), and as
a result wineserver has to poll on that descriptor. In theory we could just
let wineserver do so and then signal us as appropriate, except that wineserver
only polls on the pipe when the thread is waiting for events (otherwise we'd
get e.g. keyboard input while the thread is doing something else, and spin
forever trying to wake up a thread that doesn't care). The obvious solution is
just to poll on that fd ourselves, and that's what I did—it's just that
getting the fd from wineserver was kind of ugly, and the code for waiting was
also kind of ugly, basically because we have to wait on both X11's fd and the
"normal" process/thread-style wineserver fd that we use to signal sent
messages. The upshot of the whole thing is that races are basically
impossible, since a thread can only wait on its own queue.
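
The waiting part, stripped of the ugliness, boils down to polling both fds at
once; a rough sketch (the fd names are assumed, and obtaining the fds is
omitted):

    #include <poll.h>

    static void wait_for_message(int x11_fd, int queue_fd)
    {
        struct pollfd pollfds[2] = {
            { .fd = x11_fd,   .events = POLLIN },  /* X11 input arrives */
            { .fd = queue_fd, .events = POLLIN },  /* server-sent messages */
        };
        poll(pollfds, 2, -1);
        /* Whichever fd woke us up, we go process the queue; since a thread
         * can only wait on its own queue, no one can race with us here. */
    }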

System APCs already work, since the server will forcibly suspend a thread if
it's not already waiting, and so we just need to check for EINTR from poll().
User APCs and alertable waits are implemented in a similar style to message
queues (well, sort of): whenever someone executes an alertable wait, we add an
additional eventfd to the list, which the server signals when an APC arrives.
If that eventfd gets signaled, we hand it off to the server to take care of,
and return STATUS_USER_APC.
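
A sketch of that alertable-wait shape (STATUS_USER_APC is the real NT status
code; the function and parameter names are illustrative):

    #include <poll.h>

    #define STATUS_USER_APC 0x000000C0

    static int alertable_wait(const int *obj_fds, int count, int apc_fd)
    {
        struct pollfd pollfds[count + 1];
        int i;

        for (i = 0; i < count; i++)
        {
            pollfds[i].fd = obj_fds[i];
            pollfds[i].events = POLLIN;
            pollfds[i].revents = 0;
        }
        /* The extra eventfd, signaled by the server when an APC arrives. */
        pollfds[count].fd = apc_fd;
        pollfds[count].events = POLLIN;
        pollfds[count].revents = 0;

        poll(pollfds, count + 1, -1);

        if (pollfds[count].revents & POLLIN)
            return STATUS_USER_APC;  /* hand the APC off to the server */

        return 0;  /* some object was signaled; caller sorts out which */
    }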

Originally I kept the volatile state of semaphores and mutexes inside a
variable local to the handle, with the knowledge that this would break if
someone tried to open the handle elsewhere or duplicate it. It did, and so now
this state is stored inside shared memory. This is of the POSIX variety; it is
allocated by the server (but never mapped there) and lives under the path
"/wine-esync".

There are a couple of things that this infrastructure can't handle, although
surprisingly there aren't that many. In particular:

* Implementing wait-all, i.e. WaitForMultipleObjects(..., TRUE, ...), is not
  exactly possible the way we'd like it to be possible. In theory that
  function should wait until it knows all objects are available, then grab
  them all at once atomically. The server (like the kernel) can do this
  because the server is single-threaded and can't race with itself. We can't
  do this in ntdll, though. The approach I've taken is laid out in great
  detail in the relevant patch, but for a quick summary (sketched in code
  after this list): we poll on each object until it's signaled (but don't
  grab it), check them all again, and if they're all signaled we try to grab
  them all at once in a tight loop; if we fail on any of them we reset the
  count on whatever we shouldn't have consumed. Such a blip would necessarily
  be very quick.

* The whole patchset only works on Linux, where eventfd is available. However,
  it should be possible to make it work on a Mac, since eventfd is just a
  quicker, easier way to use pipes (i.e. instead of writing 1 to the fd you'd
  write 1 byte; instead of reading a 64-bit value from the fd you'd read as
  many bytes as you can carry, which is admittedly less than 2**64 but can
  probably be something reasonable). It's also possible, although I haven't
  yet looked, to use some different kind of synchronization primitive, but
  pipes would be easiest to tack onto this framework.

* PulseEvent() can't work the way it's supposed to work. Fortunately it's rare
  and deprecated. It's also explicitly mentioned on MSDN that a thread can
  miss the notification for a kernel APC, so in a sense we're not necessarily
  doing anything wrong.
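
A rough sketch of that wait-all grab loop, under the same illustrative
assumptions as the earlier snippets (nonblocking eventfds, auto-reset
semantics, names invented for the example):

    #include <unistd.h>
    #include <stdint.h>

    /* Called once every object has looked signaled: try to consume them
     * all, and re-signal whatever we grabbed if any one of them fails. */
    static int try_grab_all(const int *fds, int count)
    {
        uint64_t value;
        int i, j;

        for (i = 0; i < count; i++)
        {
            /* Nonblocking read: succeeds only if the object is signaled. */
            if (read(fds[i], &value, sizeof(value)) != sizeof(value))
                break;
        }
        if (i == count)
            return 1;  /* grabbed every object */

        /* Failed partway: reset the count on whatever we shouldn't have
         * consumed, then let the caller go back to polling. */
        value = 1;
        for (j = 0; j < i; j++)
            write(fds[j], &value, sizeof(value));
        return 0;
    }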

There are some things that are perfectly implementable but that I just haven't
done yet:

* Other synchronizable server primitives. It's unlikely we'll need any of
  these, except perhaps named pipes (which would honestly be rather difficult)
  and (maybe) timers.

* Access masks. We'd need to store these inside ntdll, and validate them when
  someone tries to execute esync operations.

This patchset was inspired by Daniel Santos' "hybrid synchronization"
patchset. My idea was to create a framework whereby even contended waits could
be executed in userspace, eliminating a lot of the complexity that his
synchronization primitives used. I do, however, owe significant gratitude to
him for setting me on the right path.

I've tried to maximize code separation, both to make any potential rebases
easier and to ensure that esync is only active when configured. All code in
existing source files is guarded with "if (do_esync())", and generally that
condition is followed by "return esync_version_of_this_method(...);", where
the latter lives in esync.c and is declared in esync.h. I've also tried to
make the patchset very clear and readable—to write it as if I were going to
submit it upstream. (Some intermediate patches do break things, which Wine is
generally against, but I think it's for the better in this case.) I have cut
some corners, though; there is some error checking missing, and some implicit
assumptions that the program is behaving correctly.

I've tried to be careful about races. There are a lot of comments whose
purpose is basically to assure me that races are impossible. In most cases we
don't have to worry about races, since all of the low-level synchronization is
done by the kernel.

Anyway, yeah, this is esync. Use it if you like.

--Zebediah Figura