  1. <?xml version="1.0" encoding="UTF-8"?>
  2. <!DOCTYPE article PUBLIC
  3. "-//OASIS//DTD DocBook XML V4.1.2//EN"
  4. "docbook/docbookx.dtd" [
  5. <!ENTITY howto "http://en.tldp.org/HOWTO/">
  6. <!ENTITY mini-howto "http://en.tldp.org/HOWTO/mini/">
  7. <!ENTITY homepage "http://catb.org/~esr/">
  8. ]>
  9. <article>
  10. <title>Where's the Latency? Performance analysis of GPSes and GPSD</title>
  11. <articleinfo>
  12. <author>
  13. <firstname>Eric</firstname>
  14. <othername>Steven</othername>
  15. <surname>Raymond</surname>
  16. <affiliation>
  17. <orgname><ulink url="&homepage;">
  18. Thyrsus Enterprises</ulink></orgname>
  19. <address>
  20. <email>esr@thyrsus.com</email>
  21. </address>
  22. </affiliation>
  23. </author>
  24. <copyright>
  25. <year>2005,2013</year>
  26. <holder role="mailto:esr@thyrsus.com">Eric S. Raymond</holder>
  27. </copyright>
  28. <revhistory>
  29. <revision>
  30. <revnumber>2.3</revnumber>
  31. <date>25 November 2011</date>
  32. <authorinitials>esr</authorinitials>
  33. <revremark>
  34. Typo fixes.
  35. </revremark>
  36. </revision>
  37. <revision>
  38. <revnumber>2.2</revnumber>
  39. <date>30 September 2011</date>
  40. <authorinitials>esr</authorinitials>
  41. <revremark>
  42. Fix errors in some whole-cycle visualizations.
  43. </revremark>
  44. </revision>
  45. <revision>
  46. <revnumber>2.1</revnumber>
  47. <date>29 September 2011</date>
  48. <authorinitials>esr</authorinitials>
  49. <revremark>
  50. Revisions as suggested by Hal Murray.
  51. </revremark>
  52. </revision>
  53. <revision>
  54. <revnumber>2.0</revnumber>
  55. <date>23 September 2011</date>
  56. <authorinitials>esr</authorinitials>
  57. <revremark>
  58. Update to include whole-cycle profiling.
  59. </revremark>
  60. </revision>
  61. <revision>
  62. <revnumber>1.2</revnumber>
  63. <date>27 September 2009</date>
  64. <authorinitials>esr</authorinitials>
  65. <revremark>
  66. Endnote about the big protocol change.
  67. </revremark>
  68. </revision>
  69. <revision>
  70. <revnumber>1.1</revnumber>
  71. <date>4 January 2008</date>
  72. <authorinitials>esr</authorinitials>
  73. <revremark>
  74. Typo fixes and clarifications for issues raised by Bruce Sutherland.
  75. </revremark>
  76. </revision>
  77. <revision>
  78. <revnumber>1.0</revnumber>
  79. <date>21 February 2005</date>
  80. <authorinitials>esr</authorinitials>
  81. <revremark>
  82. Initial draft.
  83. </revremark>
  84. </revision>
  85. </revhistory>
  86. <abstract>
  87. <para>Many GPS manufacturers tout binary protocols more dense than
  88. NMEA and baud rates higher than the NMEA-standard 4800 bps as ways to
  89. increase GPS performance. While working on
  90. <application>gpsd</application>, I became interested in evaluating
  91. these claims, which have some implications for the design of
  92. <application>gpsd</application>. Later, the average and peak latency
  93. became of interest for modeling the performance of GPS as a time
  94. service. This paper discusses the theory and the results of profiling
  95. the code, and reaches some conclusions about system tuning and latency
  96. control.</para>
  97. </abstract>
  98. </articleinfo>
  99. <sect1><title>What Can We Measure and Improve?</title>
  100. <para>The most important characteristic of a GPS, the positional
  101. accuracy of its fixes, is a function of the GPS system and receiver
  102. design; GPSD can do nothing to affect it. However, a GPS fix has a
  103. timestamp, and the transmission path from the GPS to a user
  104. application introduces some latency (that is, delay between the time
  105. of fix and when it is available to the user).</para>
  106. <para>Latency could be a significant source of position error for a GPS in
  107. motion. It may also be a significant issue if the GPS is being used as
  108. a high-precision time source.</para>
  109. <para>This paper describes the results of two rounds of measurement
  110. using different tools. Both yielded interesting results, and can usefully
  111. be juxtaposed with one another.</para>
  112. <para>The first round was performed in 2005 using a version of the
  113. code very near 2.13. This was before the JSON protocol change and before
  114. GPSD developed the capability to do automated cycle detection.
  115. Consequently the statistics extracted were primarily timings of individual
  116. packets coming over the wire from the GPS.</para>
  117. <para>The second round was performed in 2011 using a version of the
  118. code between 3.1 and 3.2. At this time GPSD was being evaluated as a
  119. high-precision time source for use in measuring network latency and
  120. checking the performance of NTP. The profiling tools built into
  121. GPSD had been rebuilt with an emphasis on timing the entire GPS
  122. reporting cycle, rather than individual packets.</para>
  123. <para>Both the 2005 and 2011 rounds used the same consumer-grade
  124. sensor attached to a Linux machine via a USB link, reporting
  125. to <application>gpsd</application>, which was polled via sockets by a
  126. profiling application. In all cases the GPS was stationary at
  127. approximately 40N 75W.</para>
  128. </sect1>
  129. <sect1><title>Per-sentence profiling</title>
  130. <sect2><title>Modeling the reporting chain</title>
  131. <para>Consider the whole transmission path of a TPV
  132. (time/position/velocity) report from a GPS to the user or user
  133. application. It has the following stages:</para>
  134. <orderedlist>
  135. <listitem>
  136. <para>A TPV report is generated in the GPS.</para>
  137. </listitem>
  138. <listitem>
  139. <para>It is encoded (into NMEA or a vendor binary protocol)
  140. and buffered for transmission via serial link.</para>
  141. </listitem>
  142. <listitem>
  143. <para>The encoding is transmitted via serial link to a buffer in <application>gpsd</application>.</para>
  144. </listitem>
  145. <listitem>
  146. <para>The encoding is decoded and translated into a notification in
  147. GPSD's protocol.</para>
  148. </listitem>
  149. <listitem>
  150. <para>The GPSD-protocol notification is polled and read over a
  151. client socket.</para>
  152. </listitem>
  153. <listitem>
  154. <para>The GPSD-protocol notification is decoded by libgps and unpacked
  155. into a session structure available to the user application.</para>
  156. </listitem>
  157. </orderedlist>
  158. <para>It is also relevant that consumer-grade GPSes do not expect to
  159. be polled, but are designed to issue TPV reports on a fixed cycle
  160. time, which we'll call C and which is usually 1
  161. second. <application>gpsd</application> expects this. A few GPSes
  162. (notably SiRF-II-based ones) can be polled, and we might thus be able
  163. to pull TPV reports out of them at a higher rate.
  164. <application>gpsd</application> doesn't do this; one question this
  165. investigation can address is whether there would be any point to
  166. that.</para>
  167. <para>At various times GPS manufacturers have promoted proprietary
  168. binary protocols and transmission speeds higher than the NMEA-standard
  169. 4800bps as ways to improve GPS performance. Obviously these cannot
  170. affect positional accuracy; all they can change is the latency at
  171. stages 2, 3, and 4.</para>
  172. <para>The implementation of <application>gpsd</application> affects
  173. how much latency is introduced at stage 4. The design of the
  174. <application>gpsd</application> protocol (in particular, the average
  175. and worst-case size and complexity of a position/velocity/time report)
  176. affects how much latency is introduced at stages 5 and 6.</para>
  177. <para>At stages 5 and later, the client design and implementation
  178. matter a lot. In particular, it matters how frequently the client
  179. samples the TPV reports that <application>gpsd</application> makes
  180. available.</para>
  181. <para>The list of stages above implies the following formula for
  182. expected latency L, and a set of tactics for reducing it:</para>
  183. <literallayout>
  184. L = C/2 + E1 + T1 + D1 + W + E2 + T2 + D2
  185. </literallayout>
  186. <para>where:</para>
  187. <orderedlist>
  188. <listitem>
  189. <para>C/2 is the expected delay introduced by a cycle time of C
  190. (worst-case delay would just be C). We can decrease this by decreasing
  191. C, but consumer-grade GPSes don't go below 1 second.</para>
  192. </listitem>
  193. <listitem>
  194. <para>E1 is TPV encoding time within the GPS. We can't affect this.</para>
  195. </listitem>
  196. <listitem>
  197. <para>T1 is transmission time over the serial link. We can decrease
  198. this by raising the baud rate or increasing the information density
  199. of the encoding.</para>
  200. </listitem>
  201. <listitem>
  202. <para>D1 is decode time required for <application>gpsd</application>
  203. to update its session structure. We can decrease this, if need be,
  204. by tuning the implementation or using faster hardware.</para>
  205. </listitem>
  206. <listitem>
  207. <para>W is the wait until the application polls
  208. <application>gpsd</application>. This can only be reduced by designing
  209. the application to poll frequently.</para>
  210. </listitem>
  211. <listitem>
  212. <para>E2 is TPV encoding time within the daemon. We can speed this up
  213. with faster hardware or a simpler GPSD format.</para>
  214. </listitem>
  215. <listitem>
  216. <para>T2 is transmission time over the client socket. Faster hardware,
  217. a better TCP/IP stack or a denser encoding can decrease this.</para>
  218. </listitem>
  219. <listitem>
  220. <para>D2 is decoding time required for the client library to update
  221. the session structure visible to the user application. A simpler
  222. GPSD format could decrease this.</para>
  223. </listitem>
  224. </orderedlist>
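<para>The formula above can be sketched in a few lines of Python. This
is only an illustration: the function names and the sample values are
assumptions made for the example, not measurements from this paper.
Only the structure of the sum comes from the formula.</para>
<programlisting language="python"><![CDATA[
```python
def expected_latency(c, e1, t1, d1, w, e2, t2, d2):
    """Expected fix latency in seconds: L = C/2 + E1 + T1 + D1 + W + E2 + T2 + D2."""
    return c / 2.0 + e1 + t1 + d1 + w + e2 + t2 + d2


def worst_case_latency(c, e1, t1, d1, w, e2, t2, d2):
    """Worst case: the cycle term contributes the full cycle time C."""
    return c + e1 + t1 + d1 + w + e2 + t2 + d2


if __name__ == "__main__":
    # Hypothetical values in seconds, chosen only to exercise the formula.
    terms = dict(c=1.0, e1=0.1, t1=0.15, d1=0.001, w=0.0,
                 e2=0.001, t2=0.001, d2=0.001)
    print("expected:   %.3f s" % expected_latency(**terms))
    print("worst case: %.3f s" % worst_case_latency(**terms))
```
]]></programlisting>
<para>Keeping each term as a separate argument makes it easy to see
which stage dominates once measured values are substituted.</para>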
  225. <para>The total figure L is of interest, of course. The first
  226. question to ask is how it compares to C. But to know where
  227. tuning this system is worth the effort and where it isn't, the
  228. relative magnitude of these eight terms is what is important. In
  229. particular, if C or E1 dominates, there is no point in trying to tune
  230. the system at all.</para>
  231. <para>The rule on modern hardware is that computation is cheap,
  232. communication is expensive. By this rule, we expect E1, D1, E2, and D2 to
  233. be small relative to T1 and T2. We can't predict W. Thus there is no
  234. knowing how the sum of the other terms will compare to C, but we know
  235. that E1 + T1 is the other statistic GPS vendors can easily measure. C
  236. &lt; E1 + T1 would be a bad idea, and we can guess that competition among
  237. GPS vendors will probably tend to push C downwards to the point where
  238. it's not much larger than E1 + T1.</para>
  239. <para>C is known from manufacturer specifications.
  240. <application>gpsd</application> and its client libraries can be built
  241. with profiling code that measures all the other timing variables. The
  242. tool
  243. <citerefentry><refentrytitle>gpsprof</refentrytitle><manvolnum>1</manvolnum></citerefentry>
  244. collects this data and generates reports and plots from it. There
  245. are, however, some sources of error to be aware of:</para>
  246. <itemizedlist>
  247. <listitem>
  248. <para>Our way of measuring E1 and T1 is to collect a timestamp on the
  249. first character read of a new NMEA sentence, then on the terminating
  250. newline, and compare those to the GPS timestamp on the sentence.
  251. While this will measure E1+T1 accurately, it will underestimate
  252. the contribution of T1 to the whole because it doesn't measure
  253. RS232 activity taking place before the first character becomes
  254. visible at the receive end.</para>
  255. </listitem>
  256. <listitem>
  257. <para>Because we compare GPS sentence timestamps with local ones,
  258. inaccuracy in the computer's clock fuzzes the measurements. The test
  259. machine updated time from NTP, so the expected inaccuracy from this
  260. source should be not more than about ten milliseconds.</para>
  261. </listitem>
  262. <listitem>
  263. <para>The $ clause that the daemon uses to ship per-sentence profiling info to
  264. the client adds substantial bulk to the traffic. Thus, it will tend
  265. to inflate E2, T2, and D2 somewhat.</para>
  266. </listitem>
  267. <listitem>
  268. <para>The client library used for profiling is written in Python,
  269. which will further inflate D2 relative to the C client library most
  270. applications are likely to use.</para>
  271. </listitem>
  272. <listitem>
  273. <para>The system-call overhead of profiling (seven
  274. <citerefentry><refentrytitle>gettimeofday</refentrytitle><manvolnum>2</manvolnum></citerefentry>
  275. calls per sentence to collect timestamps, several other time-library
  276. calls per sentence to convert ISO 8601 timestamps) will introduce a
  277. small amount of noise into the figures. These are cheap calls that
  278. don't induce disk activity; thus, on modern hardware, we may expect
  279. the overhead per call to be at worst in the microsecond range. The
  280. entire per-sentence system-call overhead should be on the
  281. order of ten microseconds.</para>
  282. </listitem>
  283. </itemizedlist>
  284. </sect2>
  285. <sect2><title>Data and Analysis</title>
  286. <para>I took measurements using a Haicom 204s USB GPS mouse. This
  287. device, using a SiRF-II GPS chipset and PL2303 USB-to-serial chipset, is
  288. very typical of 2005's consumer-grade GPS hardware; the Haicom people
  289. themselves estimated to me in late 2004 that the SiRF-II had about 80%
  290. and rising market share, and the specification sheets I find with
  291. Web searches back this up. Each profile run used 100 samples.</para>
  292. <para>My host system for the measurements was an Opteron 3400 running an
  293. "everything" installation of Fedora Core 3. This was still a
  294. moderately fast machine in early 2005, but average processor
  295. utilization remained low throughout the tests.</para>
  296. <para>The version of the GPSD software I used for the test was
  297. released as 2.13. It was configured with
  298. <option>--enable-profiling</option>. All graphs and figures were generated
  299. with
  300. <citerefentry><refentrytitle>gpsprof</refentrytitle><manvolnum>1</manvolnum></citerefentry>,
  301. a tool built for this purpose and included in the distribution.</para>
  302. <para>One of the effects of building with
  303. <option>--enable-profiling</option> is that a form of the B command that
  304. normally just reports the RS232 parameters can be used to set them (it
  305. ships a SiRF-II control string to the GPS and then changes the line
  306. settings).</para>
  307. <para>Another effect is to enable a Z command to switch on profiling.
  308. When profiling is on, each time
  309. <application>gpsd</application>
  310. reports a fix with timestamp (e.g. on GPGGA, GPRMC and GPGLL
  311. sentences) it also reports timing information from five checkpoints
  312. inside the daemon. The client library adds two more checkpoints.</para>
  313. <para>Our first graph is with profile reporting turned off, to give us
  314. a handle on performance with the system disturbed as little as
  315. possible. This was generated with <command>gpsprof -t "Haicom 204s" -T png -f
  316. uninstrumented -s 4800</command>. We'll compare it to later plots to
  317. see the effect of profiling overhead.</para>
  318. <figure><title>Total latency</title>
  319. <mediaobject>
  320. <imageobject>
  321. <imagedata fileref='graph1.png'/>
  322. </imageobject>
  323. </mediaobject>
  324. </figure>
  325. <para>Uninstrumented total latency is simply the delta from the GPS
  326. timestamp associated with the packet to the arrival time of the end of
  327. the packet at the profiling client. The repeated stairstep effect is
  328. because all packets in a reporting cycle have the same timestamp;
  329. thus, the impulses cumulate time in the reporting cycle so far.</para>
  330. <para>The first thing to notice here is that the fix latency can be
  331. just over a second; you can see the exact figures in the <ulink
  332. url='profile1.txt'>raw data</ulink>. Where is the time going? Our next
  333. graph was generated with <command>gpsprof -T png -t
  334. "Haicom 204s" -f raw -s 4800</command>.</para>
  335. <figure><title>Instrumented latency report</title>
  336. <mediaobject>
  337. <imageobject>
  338. <imagedata fileref='graph2.png'/>
  339. </imageobject>
  340. </mediaobject>
  341. </figure>
  342. <para>As in the previous graph, each group of three lines is a single
  343. GPS reporting cycle. By comparing this graph to the previous one, it
  344. is pretty clear that the profiling reports are not introducing any
  345. measurable latency. But what is more interesting is to notice that D1
  346. + W + E2 + T2 + D2 vanishes &mdash; at this timescale, all we can see
  347. is E1 and T1.</para>
  348. <para>The <ulink url='profile2.txt'>raw data</ulink> bears this out.
  349. All times besides E1 and T1 are so small that they are comparable to
  350. the noise level of the measurements. This may be a bit surprising
  351. unless one knows that a W near 0 is expected in this setup;
  352. <application>gpsprof</application> sets watcher mode. Also, a modern
  353. zero-copy TCP/IP stack like Linux's implements local sockets with very
  354. low overhead. It is also a little surprising that E1 is so large
  355. relative to E1+T1. Recall, however, that this may be measurement
  356. error.</para>
  357. <para>Our third graph (<command>gpsprof -t "Haicom 204s" -T png -f split -s 4800</command>) changes the presentation so we can see
  358. how latency varies with sentence type.</para>
  359. <figure><title>Split latency report</title>
  360. <mediaobject>
  361. <imageobject>
  362. <imagedata fileref='graph3.png'/>
  363. </imageobject>
  364. </mediaobject>
  365. </figure>
  366. <para>The reason for the comb pattern in the previous graphs is now
  367. apparent; latency is constant for any given sentence type. The obvious
  368. correlate would be sentence length &mdash; but looking at the <ulink
  369. url='profile3.txt'>raw data</ulink>, we see that that is not the only
  370. factor. Consider this table:</para>
  371. <informaltable>
  372. <tgroup cols='3'>
  373. <thead>
  374. <row>
  375. <entry>Sentence type</entry>
  376. <entry>Typical length (characters)</entry>
  377. <entry>Typical latency (seconds)</entry>
  378. </row>
  379. </thead>
  380. <tbody>
  381. <row>
  382. <entry>GPRMC</entry>
  383. <entry>70</entry>
  384. <entry>1.01</entry>
  385. </row>
  386. <row>
  387. <entry>GPGGA</entry>
  388. <entry>81</entry>
  389. <entry>0.23</entry>
  390. </row>
  391. <row>
  392. <entry>GPGLL</entry>
  393. <entry>49</entry>
  394. <entry>0.31</entry>
  395. </row>
  396. </tbody>
  397. </tgroup>
  398. </informaltable>
  399. <para>For illustration, here are some sample NMEA sentences logged
  400. while I was conducting these tests:</para>
  401. <literallayout>
  402. $GPRMC,183424.834,A,4002.1033,N,07531.2003,W,0.00,0.00,170205,,*11
  403. $GPGGA,183425.834,4002.1035,N,07531.2004,W,1,05,1.9,134.7,M,-33.8,M,0.0,0000*48
  404. $GPGLL,4002.1035,N,07531.2004,W,183425.834,A*27
  405. </literallayout>
  406. <para>Though GPRMCs are shorter than GPGGAs, they consistently have an
  407. associated latency about four times as long. The graph tells us most of
  408. this is E1. There must be something the GPS is doing that is
  409. computationally very expensive when it generates GPRMCs. It may well
  410. be that it is actually computing the fix at that point in the send cycle
  411. and buffering the results for retransmission in GPGGA and GPGLL forms.
  412. Alternatively, perhaps the speed/track computation is
  413. expensive.</para>
  414. <para>Now let's look at how the picture changes when we double the
  415. baud rate. <command>gpsprof -t "Haicom 204s" -T png -s 9600</command>
  416. gives us this:</para>
  417. <figure><title>Split latency report, 9600bps</title>
  418. <mediaobject>
  419. <imageobject>
  420. <imagedata fileref='graph4.png'/>
  421. </imageobject>
  422. </mediaobject>
  423. </figure>
  424. <para>This graph looks almost identical to the previous one, except
  425. for vertical scale &mdash; latency has been cut neatly in half.
  426. Transmission times for GPRMC go from about 0.15sec to 0.075sec. Oddly,
  427. average E1 is also cut almost in half. I don't know how to explain
  428. this, unless a lot of what looks like E1 is actually RS232
  429. transmission time spent before the first character appears in the
  430. daemon's receive buffers. You can also view the
  431. <ulink url='profile4.txt'>raw data</ulink>.</para>
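<para>The halving of transmission time follows directly from serial
framing arithmetic. As a minimal sketch, assuming ten bit times per
character and the typical sentence lengths from the table above, the
time on the wire can be estimated as follows. The function and constant
names are illustrative.</para>
<programlisting language="python"><![CDATA[
```python
BITS_PER_CHAR = 10  # assumed framing: ten bit times per character on the wire

# Typical sentence lengths in characters, taken from the table above.
TYPICAL_LENGTH = {"GPRMC": 70, "GPGGA": 81, "GPGLL": 49}


def transmission_time(nchars, bps):
    """Seconds needed to clock nchars characters over a serial line at bps."""
    return nchars * BITS_PER_CHAR / float(bps)


if __name__ == "__main__":
    for name, length in sorted(TYPICAL_LENGTH.items()):
        for bps in (4800, 9600):
            print("%s at %dbps: %.3f s"
                  % (name, bps, transmission_time(length, bps)))
```
]]></programlisting>
<para>For a 70-character GPRMC this gives roughly 0.146sec at 4800bps
and 0.073sec at 9600bps, in line with the figures quoted above.</para>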
  432. <para>For comparison, here's the same plot made with a BU303b, a
  433. different USB GPS mouse using the same SiRF-II/PL2303 combination:</para>
  434. <figure><title>Split latency report, BU303b, 9600bps</title>
  435. <mediaobject>
  436. <imageobject>
  437. <imagedata fileref='graph5.png'/>
  438. </imageobject>
  439. </mediaobject>
  440. </figure>
  441. <para>This, and the <ulink url='profile5.txt'>raw data</ulink>, look
  442. very similar to the Haicom numbers. The main difference seems to be
  443. that the BU303b firmware doesn't ship GPGLL by default.</para>
  444. </sect2>
  445. </sect1>
  446. <sect1><title>Per-cycle profiling</title>
  447. <sect2><title>Modeling the reporting chain</title>
  448. <para>When the old GPSD protocol was replaced by an application of JSON
  449. and the daemon developed the capability to perform automatic detection of
  450. the beginning and end of GPS reporting cycles, it became possible to measure
  451. whole-cycle latency. Also, embedding timing statistics in the JSON digest
  452. of an entire cycle rather than as a $ sentence after each GPS packet
  453. significantly reduced the overhead of profiling in the report stream.</para>
  454. <para>The model for these measurements is as follows:</para>
  455. <orderedlist>
  456. <listitem>
  457. <para>A TPV report is generated in the GPS (at time 'T')</para>
  458. </listitem>
  459. <listitem>
  460. <para>It is encoded into a burst of sentences in NMEA or a vendor
  461. binary protocol and buffered for transmission via serial link.</para>
  462. </listitem>
  463. <listitem>
  464. <para>The encoding is transmitted via serial link to a buffer in
  465. <application>gpsd</application>, beginning at a time we shall call
  466. 'S'.</para>
  467. </listitem>
  468. <listitem>
  469. <para>Because it consists of multiple packets, a period combining
  470. serial transmission time with <application>gpsd</application>
  471. processing (packet-sniffing and analysis) time will follow.</para>
  472. </listitem>
  473. <listitem>
  474. <para>At the end of this interval (at a time we shall call 'E'),
  475. <application>gpsd</application> has seen the GPS data it needs and is
  476. ready to produce a report to ship to clients.</para>
  477. </listitem>
  478. <listitem>
  479. <para>Meanwhile, the GPS may still be transmitting data that GPSD does
  480. not use. But when the transmission burst is done, there will be quiet
  481. time on the link (exception: as we noted in 2005, some devices' transmissions
  482. may slightly overflow the 1-second cycle time at 4800bps).</para>
  483. </listitem>
  484. <listitem>
  485. <para>The JSON report is shipped, arriving at the client at a time we
  486. shall call 'R'.</para>
  487. </listitem>
  488. </orderedlist>
  489. <para>We cannot know T directly. The GPS's timestamp on the fix will
  490. tell us when it thinks T was, but because we don't know how our local
  491. clock diverges from GPS's atomic-clock timebase we don't actually know
  492. what T was in system time (call that T'). If we trust NTP, we then
  493. believe that the skew between T and T' is no more than 10ms.</para>
  494. <para>We catch time S by recording system time each time data becomes
  495. available from the device. If adjacent returns of select(2) are separated
  496. by more than 250msec, we have good reason to believe that the second one
  497. follows end-of-cycle quiet time. This guard interval is reliable at 9600bps
  498. and will only be more so at higher speeds.</para>
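<para>The guard-interval test can be sketched as follows. This is a
simplified Python illustration, not the daemon's actual C code; the
function name and the sample read times are assumptions made for the
example.</para>
<programlisting language="python"><![CDATA[
```python
GUARD_INTERVAL = 0.250  # seconds of quiet taken to mark end-of-cycle


def cycle_starts(read_times, guard=GUARD_INTERVAL):
    """Return the read timestamps that begin a new reporting cycle.

    read_times is an ascending list of system times (seconds) at which
    data became available from the device.  A read counts as a cycle
    start (time S) when it follows more than `guard` seconds of silence.
    The very first read is skipped because we cannot tell whether
    monitoring began in the middle of a burst.
    """
    starts = []
    for previous, current in zip(read_times, read_times[1:]):
        if current - previous > guard:
            starts.append(current)
    return starts


if __name__ == "__main__":
    # Hypothetical read times: bursts roughly one second apart.
    times = [0.00, 0.05, 0.10, 0.15, 1.00, 1.05, 1.10, 2.01, 2.06]
    print(cycle_starts(times))
```
]]></programlisting>
<para>With the sample data, the reads at 1.00 and 2.01 are reported as
cycle starts, since each follows a gap much longer than the guard
interval.</para>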
  499. <para>We catch time E just before a JSON report is generated from the
  500. per-device session structures. This is the end of the analysis phase.
  501. If timing is enabled, extra members carrying S, E and the number
  502. of characters transmitted during the cycle (C) are included in the JSON.</para>
  503. <para>We catch time R by noting when the JSON arrives at the client.</para>
  504. <para>We know that the transmission-time portion of [S, E] can be
  505. approximated by the formula (C * 10) / B where B is the
  506. transmission rate in bits per second. (Each character costs 8 data bits
  507. plus one start bit plus one stop bit.)</para>
  508. <para>Knowing this, we can subtract (C * 10) / B from (E-S) to approximate
  509. the internal processing time spent by <application>gpsd</application>.
  510. Due to other UART overheads, this formula will slightly underestimate
  511. transmission time and thus overestimate processing time, but even a rough
  512. comparison of the two is interesting.</para>
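<para>As a rough sketch of this subtraction in Python (the function
names and the sample cycle are hypothetical, chosen only to show the
arithmetic):</para>
<programlisting language="python"><![CDATA[
```python
def rs232_time(chars, bps):
    """Approximate transmission time in seconds: (C * 10) / B."""
    return chars * 10.0 / bps


def processing_time(s, e, chars, bps):
    """Approximate gpsd processing time: (E - S) minus the RS232 estimate."""
    return (e - s) - rs232_time(chars, bps)


if __name__ == "__main__":
    # Hypothetical cycle: 240 characters at 19200bps, with E - S = 140ms.
    s, e = 100.000, 100.140
    print("RS232 time:      %.3f s" % rs232_time(240, 19200))
    print("processing time: %.3f s" % processing_time(s, e, 240, 19200))
```
]]></programlisting>
<para>Because the transmission estimate is a lower bound, the value
returned by <literal>processing_time</literal> should be read as an
upper bound on the time the daemon actually spends.</para>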
  513. </sect2>
  514. <sect2><title>Data and Analysis</title>
  515. <para>With the new profiling tools, one graph (made with
  516. <command>gpsprof -f instrumented -n 100 -T png</command>) tells the
  517. story. This is from the same Haicom 204s used in the 2005 tests. You
  518. can see the exact figures in the <ulink url='profile6.txt'>raw
  519. data</ulink>.</para>
  520. <figure><title>Per-cycle latency report, 19200bps</title>
  521. <mediaobject>
  522. <imageobject>
  523. <imagedata fileref='graph6.png'/>
  524. </imageobject>
  525. </mediaobject>
  526. </figure>
  527. <para>Fix latency (S - T, the purple time segment in each sample) is
  528. consistently about 120msec. Some of this represents on-chip
  529. processing. Some may represent actual skew between NTP time and GPS
  530. time.</para>
  531. <para>RS232 time (the blue segment) is the character transmission time
  532. estimate we computed. It seems relatively steady at around 125ms. This is
  533. probably a bit low, proportionately speaking.</para>
  534. <para>The green segment is (E-S) with RS232 computed time subtracted.
  535. It approximates the time required by <application>gpsd</application>
  536. for itself. It seems steady at around 15ms. This is probably a bit
  537. high, proportionately speaking.</para>
  538. <para>The red dots that are just barely visible at the tops of some
  539. sample bars represent R-E, the client reception delta. Inspection of
  540. the raw data reveals that it is on the close order of 1ms.</para>
  541. <para>Total fix latency is steady at about 310ms. Transmission
  542. time dominates.</para>
  543. <para>It is instructive to compare this with the graph (and the
  544. <ulink url='profile7.txt'>raw data</ulink>) from the same device
  545. at 9600bps.</para>
  546. <figure><title>Per-cycle latency report, 9600bps</title>
  547. <mediaobject>
  548. <imageobject>
  549. <imagedata fileref='graph7.png'/>
  550. </imageobject>
  551. </mediaobject>
  552. </figure>
  553. <para>As we might expect, RS232 time changes drastically and the other
  554. components barely change at all. This gives us reason to be confident that
  555. computed RS232 time is in fact tracking actual transmission time pretty
  556. closely. It also confirms that the most effective way to decrease total
  557. fix latency is simply to bump up the transmission speed.
  558. </para>
  559. <para>It is equally instructive to compare these graphs with graphs
  560. taken from the same GPS, at the same speed, running in NMEA rather than
  561. vendor binary mode. Consider, for example, these:</para>
  562. <figure><title>Per-cycle latency report, NMEA mode, 9600bps</title>
  563. <mediaobject>
  564. <imageobject>
  565. <imagedata fileref='graph8.png'/>
  566. </imageobject>
  567. </mediaobject>
  568. </figure>
  569. <para>(Raw data is <ulink url='profile8.txt'>here</ulink>.)</para>
  570. <figure><title>Per-cycle latency report, NMEA mode, 19200bps</title>
  571. <mediaobject>
  572. <imageobject>
  573. <imagedata fileref='graph9.png'/>
  574. </imageobject>
  575. </mediaobject>
  576. </figure>
  577. <para>(Raw data is <ulink url='profile8.txt'>here</ulink>.)</para>
  578. <para>The comb-shaped pattern in these graphs reflects the additional
  579. transmission time for $GPGSV every 5 cycles. We can see clearly that
  580. the vendor binary protocol does not significantly cut either the latency
  581. or the total bandwidth required.</para>
  582. </sect2>
  583. </sect1>
  584. <sect1><title>Conclusions</title>
  585. <para>All these conclusions apply to the consumer-grade GPS hardware
  586. generally available back in 2005 and today in 2011, i.e. with a cycle time
  587. of one second. As it happens, 2005 was just after the point when
  588. consumer-grade GPS chips stabilized as a technology, and though unit
  589. prices have fallen they have changed relatively little in technology and
  590. performance over the intervening six years. The main improvement has been
  591. in sensitivity, improving operation with a poor skyview but not affecting
  592. the timing characteristics of the output.</para>
  593. <sect2><title>For Application Programmers</title>
  594. <para>For the best tradeoff between minimizing latency and use of
  595. application resources, an argument similar to Nyquist's Theorem tells
  596. us to poll <application>gpsd</application> once every half-cycle
  597. &mdash; that is, on almost all GPSes at time of writing, twice a
  598. second.</para>
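<para>The effect of the polling period on the wait term W can be
illustrated with a small Python sketch. It assumes polls occur at exact
multiples of the polling period and, for the mean, that reports become
available uniformly at random between polls. Both are simplifying
assumptions, and the function names are illustrative.</para>
<programlisting language="python"><![CDATA[
```python
import math


def poll_wait(available_at, poll_period):
    """Wait W in seconds until the next poll, with polls at multiples of poll_period."""
    next_poll = math.ceil(available_at / poll_period) * poll_period
    return next_poll - available_at


def mean_poll_wait(poll_period):
    """Expected W if reports arrive uniformly at random between polls (an assumption)."""
    return poll_period / 2.0


if __name__ == "__main__":
    cycle = 1.0           # typical consumer-grade cycle time C, in seconds
    period = cycle / 2.0  # poll once every half-cycle
    print("wait for a report ready at t=1.25 s: %.2f s" % poll_wait(1.25, period))
    print("mean wait: %.2f s, worst case: %.2f s" % (mean_poll_wait(period), period))
```
]]></programlisting>
<para>Under these assumptions, polling twice per one-second cycle
bounds W at half a second, with a mean of a quarter second.</para>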
  599. <para>With the SiRF chips still used in most consumer-grade GPSes at
  600. time of writing, 9600bps is the optimal line speed. 4800bps is slightly
  601. too low, not guaranteeing updates within the 1-second cycle time.
  602. 9600bps yields updates in about 0.45sec, 19200bps in about 0.26sec.
  603. Higher speeds would probably not be worth the extra computation unless
  604. your sensor is in rapid motion. Even whole-cycle latency, the figure most
  605. sensitive to transmission speed, is cut by less than 200ms by
  606. going to 19200bps. Higher speeds will exhibit diminishing returns.</para>
  607. <para>Comparing the SiRF-II performance at 4800bps and 9600 shows a
  608. drop in E1+T1 that looks about linear, suggesting that for a cycle of
  609. n seconds, the optimal line speed would be about 9600/n. Since future
  610. GPS chips are likely to have faster processors and thus less latency,
  611. this may be considered an upper bound on useful line speed.</para>
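<para>The 9600/n rule of thumb can be turned into a small helper. This
Python sketch rounds the target up to the next entry in a list of
common UART speeds; the list and the function name are assumptions made
for the example.</para>
<programlisting language="python"><![CDATA[
```python
# Common UART line speeds in bits per second (an assumed, non-exhaustive list).
STANDARD_RATES = (4800, 9600, 19200, 38400, 57600, 115200)


def suggested_line_speed(cycle_seconds):
    """Smallest standard rate at or above the 9600/n rule of thumb.

    Returns the highest listed rate if the target exceeds all of them.
    """
    target = 9600.0 / cycle_seconds
    for rate in STANDARD_RATES:
        if rate >= target:
            return rate
    return STANDARD_RATES[-1]


if __name__ == "__main__":
    for cycle in (1.0, 0.5, 0.25):
        print("cycle %.2f s: %d bps" % (cycle, suggested_line_speed(cycle)))
```
]]></programlisting>
<para>For cycle times of 1, 0.5 and 0.25 seconds this suggests 9600,
19200 and 38400bps respectively.</para>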
  612. </sect2>
  613. <sect2><title>For Manufacturer Claims</title>
  614. <para>Because 9600bps is readily available, the transmission- and
  615. decode-time advantages of binary protocols over NMEA are not
  616. significant within a 1-per-second update cycle. Because line speeds
  617. up to 38400 are readily available through standard UARTs, we may
  618. expect this to continue to be the case even with cycle times as
  619. low as 0.25sec.</para>
  620. <para>More generally, binary protocols are largely pointless except as
  621. market-control devices for the manufacturers. The additional
  622. capabilities they support could be just as effectively supported
  623. through NMEA's $P-prefix extension mechanism.</para>
  624. </sect2>
  625. <sect2><title>For GPSD as a Time Service</title>
  626. <para>We have measured a typical intrisic time latency of about 70msec due to
  627. on-GPS processing and the USB polling interval. While this is noticeably
  628. higher than NTP's expected accuracy of &plusmn;10msec, it should be
  629. adequate for most applications other than physics experiments.</para>
  630. </sect2>
  631. <sect2><title>For the Design of GPSD</title>
  632. <para>In 2005, I wrote that <application>gpsd</application> does not
  633. introduce measurable latency into the path from GPS to application. I
  634. said that cycle times would have to decrease by two orders of
  635. magnitude for this to change.</para>
  636. <para>In 2011, with better whole-cycle oriented profiling tools and a
  637. faster test machine, latency incurred by
  638. <application>gpsd</application> can be measured. It is less than
  639. 15ms on a 2.66GHz Intel Core Duo under normal load. How much less depends
  640. on how much the model computations underestimate RS232 transmission time
  641. for the GPS data.</para>
  642. </sect2>
  643. </sect1>
  644. </article>