  1. \input texinfo
  2. @setfilename ../../info/url.info
  3. @settitle URL Programmer's Manual
  4. @include docstyle.texi
  5. @iftex
  6. @c @finalout
  7. @end iftex
  8. @c @setchapternewpage odd
  9. @c @smallbook
  10. @tex
  11. \overfullrule=0pt
  12. %\global\baselineskip 30pt % for printing in double space
  13. @end tex
  14. @dircategory Emacs lisp libraries
  15. @direntry
  16. * URL: (url). URL loading package.
  17. @end direntry
  18. @copying
  19. This is the manual for the @code{url} Emacs Lisp library.
  20. Copyright @copyright{} 1993--1999, 2002, 2004--2015 Free Software
  21. Foundation, Inc.
  22. @quotation
  23. Permission is granted to copy, distribute and/or modify this document
  24. under the terms of the GNU Free Documentation License, Version 1.3 or
  25. any later version published by the Free Software Foundation; with no
  26. Invariant Sections, with the Front-Cover Texts being ``A GNU Manual,''
  27. and with the Back-Cover Texts as in (a) below. A copy of the license
  28. is included in the section entitled ``GNU Free Documentation License''.
  29. (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
  30. modify this GNU manual.''
  31. @end quotation
  32. @end copying
  33. @c
  34. @titlepage
  35. @title URL Programmer's Manual
  36. @subtitle First Edition, URL Version 2.0
  37. @author William M. Perry @email{wmperry@@gnu.org}
  38. @author David Love @email{fx@@gnu.org}
  39. @page
  40. @vskip 0pt plus 1filll
  41. @insertcopying
  42. @end titlepage
  43. @contents
  44. @node Top
  45. @top URL
  46. @ifnottex
  47. @insertcopying
  48. @end ifnottex
  49. @menu
  50. * Introduction:: About the @code{url} library.
  51. * URI Parsing:: Parsing (and unparsing) URIs.
  52. * Retrieving URLs:: How to use this package to retrieve a URL.
  53. * Supported URL Types:: Descriptions of URL types currently supported.
  54. * General Facilities:: URLs can be cached, accessed via a gateway
  55. and tracked in a history list.
  56. * Customization:: Variables you can alter.
  57. * GNU Free Documentation License:: The license for this documentation.
  58. * Function Index::
  59. * Variable Index::
  60. * Concept Index::
  61. @end menu
  62. @node Introduction
  63. @chapter Introduction
  64. @cindex URL
  65. @cindex URI
  66. @cindex uniform resource identifier
  67. @cindex uniform resource locator
  68. A @dfn{Uniform Resource Identifier} (URI) is a specially-formatted
  69. name, such as an Internet address, that identifies some name or
  70. resource. The format of URIs is described in RFC 3986, which updates
  71. and replaces the earlier RFCs 2732, 2396, 1808, and 1738. A
  72. @dfn{Uniform Resource Locator} (URL) is an older but still-common
  73. term, which basically refers to a URI corresponding to a resource that
  74. can be accessed (usually over a network) in a specific way.
  75. Here are some examples of URIs (taken from RFC 3986):
  76. @example
  77. ftp://ftp.is.co.za/rfc/rfc1808.txt
  78. http://www.ietf.org/rfc/rfc2396.txt
  79. ldap://[2001:db8::7]/c=GB?objectClass?one
  80. mailto:John.Doe@@example.com
  81. news:comp.infosystems.www.servers.unix
  82. tel:+1-816-555-1212
  83. telnet://192.0.2.16:80/
  84. urn:oasis:names:specification:docbook:dtd:xml:4.1.2
  85. @end example
  86. This manual describes the @code{url} library, an Emacs Lisp library
  87. for parsing URIs and retrieving the resources to which they refer.
  88. (The library is so-named for historical reasons; nowadays, the ``URI''
  89. terminology is regarded as the more general one, and ``URL'' is
  90. technically obsolete despite its widespread vernacular usage.)
  91. @node URI Parsing
  92. @chapter URI Parsing
  93. A URI consists of several @dfn{components}, each having a different
  94. meaning. For example, the URI
  95. @example
  96. http://www.gnu.org/software/emacs/
  97. @end example
  98. @noindent
  99. specifies the scheme component @samp{http}, the hostname component
  100. @samp{www.gnu.org}, and the path component @samp{/software/emacs/}.
  101. @cindex parsed URIs
  102. The format of URIs is specified by RFC 3986. The @code{url} library
  103. provides the Lisp function @code{url-generic-parse-url}, a (mostly)
  104. standard-compliant URI parser, as well as function
  105. @code{url-recreate-url}, which converts a parsed URI back into a URI
  106. string.
  107. @defun url-generic-parse-url uri-string
  108. This function returns a parsed version of the string @var{uri-string}.
  109. @end defun
  110. @defun url-recreate-url uri-obj
  111. @cindex unparsing URLs
  112. Given a parsed URI, this function returns the corresponding URI string.
  113. @end defun
  114. @cindex parsed URI
  115. The return value of @code{url-generic-parse-url}, and the argument
  116. expected by @code{url-recreate-url}, is a @dfn{parsed URI}: a CL
  117. structure whose slots hold the various components of the URI@.
  118. @xref{Top,the CL Manual,,cl,GNU Emacs Common Lisp Emulation}, for
  119. details about CL structures. Most of the other functions in the
  120. @code{url} library act on parsed URIs.
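As a brief illustration of the round trip described above, the following
sketch parses a URI, reads a few components, and converts the structure
back to a string. The variable name @code{parsed} is only illustrative.
@example
(require 'url-parse)
(let ((parsed (url-generic-parse-url "http://www.gnu.org/software/emacs/")))
  ;; Collect some components, then rebuild the URI string.
  (list (url-type parsed)
        (url-host parsed)
        (url-filename parsed)
        (url-recreate-url parsed)))
@result{} ("http" "www.gnu.org" "/software/emacs/"
    "http://www.gnu.org/software/emacs/")
@end example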
  121. @menu
  122. * Parsed URIs:: Format of parsed URI structures.
  123. * URI Encoding:: Non-@acronym{ASCII} characters in URIs.
  124. @end menu
  125. @node Parsed URIs
  126. @section Parsed URI structures
  127. Each parsed URI structure contains the following slots:
  128. @table @code
  129. @item type
  130. The URI scheme (a string, e.g., @code{http}). @xref{Supported URL
  131. Types}, for a list of schemes that the @code{url} library knows how to
  132. process. This slot can also be @code{nil}, if the URI is not fully
  133. specified.
  134. @item user
  135. The user name (a string), or @code{nil}.
  136. @item password
  137. The user password (a string), or @code{nil}. The use of this URI
  138. component is strongly discouraged; nowadays, passwords are transmitted
  139. by other means, not as part of a URI.
  140. @item host
  141. The host name (a string), or @code{nil}. If present, this is
  142. typically a domain name or IP address.
  143. @item port
  144. The port number (an integer), or @code{nil}. Omitting this component
  145. usually means to use the ``standard'' port associated with the URI
  146. scheme.
  147. @item filename
  148. The combination of the ``path'' and ``query'' components of the URI (a
  149. string), or @code{nil}. If the query component is present, it is the
  150. substring following the first @samp{?} character, and the path
  151. component is the substring before the @samp{?}. The meaning of these
  152. components is scheme-dependent; they do not necessarily refer to a
  153. file on a disk.
  154. @item target
  155. The fragment component (a string), or @code{nil}. The fragment
  156. component specifies a ``secondary resource'', such as a section of a
  157. webpage.
  158. @item fullness
  159. This is @code{t} if the URI is fully specified, i.e., the
  160. hierarchical components of the URI (the hostname and/or username
  161. and/or password) are preceded by @samp{//}.
  162. @end table
  163. @findex url-type
  164. @findex url-user
  165. @findex url-password
  166. @findex url-host
  167. @findex url-port
  168. @findex url-filename
  169. @findex url-target
  170. @findex url-attributes
  171. @findex url-fullness
  172. These slots have accessors named @code{url-@var{part}}, where
  173. @var{part} is the slot name. For example, the accessor for the
  174. @code{host} slot is the function @code{url-host}. The @code{url-port}
  175. accessor returns the default port for the URI scheme if the parsed
  176. URI's @code{port} slot is @code{nil}.
  177. The slots can be set using @code{setf}. For example:
  178. @example
  179. (setf (url-port url) 80)
  180. @end example
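The following sketch shows the accessors at work on a URI that has a
query and a fragment but no explicit port. It assumes that the helper
@code{url-path-and-query}, which is not otherwise described in this
manual, is available to split the @code{filename} slot.
@example
(require 'url)
(let ((url (url-generic-parse-url "http://example.org/search?q=emacs#top")))
  (list (url-port url)             ; no explicit port: scheme default
        (url-filename url)         ; path and query, still joined
        (url-path-and-query url)   ; the same data split at the first "?"
        (url-target url)))         ; the fragment
@result{} (80 "/search?q=emacs" ("/search" . "q=emacs") "top")
@end example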
  181. @node URI Encoding
  182. @section URI Encoding
  183. @cindex percent encoding
  184. The @code{url-generic-parse-url} parser does not obey RFC 3986 in
  185. one respect: it allows non-@acronym{ASCII} characters in URI strings.
  186. Strictly speaking, RFC 3986 compatible URIs may only consist of
  187. @acronym{ASCII} characters; non-@acronym{ASCII} characters are
  188. represented by converting them to UTF-8 byte sequences, and performing
  189. @dfn{percent encoding} on the bytes. For example, the o-umlaut
  190. character is converted to the UTF-8 byte sequence @samp{\xC3\xB6},
  191. then percent encoded to @samp{%C3%B6}. (Certain ``reserved''
  192. @acronym{ASCII} characters must also be percent encoded when they
  193. appear in URI components.)
  194. The function @code{url-encode-url} can be used to convert a URI
  195. string containing arbitrary characters to one that is properly
  196. percent-encoded in accordance with RFC 3986.
  197. @defun url-encode-url url-string
  198. This function returns a properly URI-encoded version of
  199. @var{url-string}. It also performs @dfn{URI normalization},
  200. e.g., converting the scheme component to lowercase if it was
  201. previously uppercase.
  202. @end defun
  203. To convert between a string containing arbitrary characters and a
  204. percent-encoded all-@acronym{ASCII} string, use the functions
  205. @code{url-hexify-string} and @code{url-unhex-string}:
  206. @defun url-hexify-string string &optional allowed-chars
  207. This function performs percent-encoding on @var{string}, and returns
  208. the result.
  209. If @var{string} is multibyte, it is first converted to a UTF-8 byte
  210. string. Each byte corresponding to an allowed character is left
  211. as-is, while all other bytes are converted to a three-character
  212. sequence: @samp{%} followed by two upper-case hex digits.
  213. @vindex url-unreserved-chars
  214. @cindex unreserved characters
  215. The allowed characters are specified by @var{allowed-chars}. If this
  216. argument is @code{nil}, the allowed characters are those specified as
  217. @dfn{unreserved characters} by RFC 3986 (see the variable
  218. @code{url-unreserved-chars}). Otherwise, @var{allowed-chars} should
  219. be a vector whose @var{n}-th element is non-@code{nil} if character
  220. @var{n} is allowed.
  221. @end defun
  222. @defun url-unhex-string string &optional allow-newlines
  223. This function replaces percent-encoding sequences in @var{string} with
  224. their character equivalents, and returns the resulting string.
  225. If @var{allow-newlines} is non-@code{nil}, it allows the decoding of
  226. carriage returns and line feeds, which are normally forbidden in URIs.
  227. @end defun
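A short sketch of both directions follows. The o-umlaut is written
with the escape @samp{\u00f6} to keep the example in @acronym{ASCII}.
The result of @code{url-unhex-string} is treated here as raw bytes, so
the example decodes it as UTF-8 before comparing.
@example
(require 'url-util)
(url-hexify-string "a b&c")
@result{} "a%20b%26c"
(url-hexify-string "\u00f6")
@result{} "%C3%B6"
;; Decode the unhexed bytes as UTF-8 to get the character back.
(string= (decode-coding-string (url-unhex-string "%C3%B6") 'utf-8)
         "\u00f6")
@result{} t
@end example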
  228. @node Retrieving URLs
  229. @chapter Retrieving URLs
  230. The @code{url} library defines the following three functions for
  231. retrieving the data specified by a URL@. The actual retrieval protocol
  232. depends on the URL's URI scheme, and is performed by lower-level
  233. scheme-specific functions. (Those lower-level functions are not
  234. documented here, and generally should not be called directly.)
  235. In each of these functions, the @var{url} argument can be either a
  236. string or a parsed URL structure. If it is a string, that string is
  237. passed through @code{url-encode-url} before using it, to ensure that
  238. it is properly URI-encoded (@pxref{URI Encoding}).
  239. @defun url-retrieve-synchronously url
  240. This function synchronously retrieves the data specified by @var{url},
  241. and returns a buffer containing the data. The return value is
  242. @code{nil} if there is no data associated with the URL (as is the case
  243. for @code{dired}, @code{info}, and @code{mailto} URLs).
  244. @end defun
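One possible way to use the returned buffer is sketched below. The
function name @code{my-url-fetch-body} is hypothetical, and the sketch
assumes that any headers in the buffer end at the first empty line.
@example
(require 'url)
(defun my-url-fetch-body (url)
  "Return the body retrieved from URL as a string, or nil if none."
  (let ((buffer (url-retrieve-synchronously url)))
    (when buffer
      (unwind-protect
          (with-current-buffer buffer
            (goto-char (point-min))
            ;; Assumption: headers end at the first empty line.
            (when (re-search-forward "^\r?\n" nil t)
              (buffer-substring-no-properties (point) (point-max))))
        ;; The retrieval buffer is not reused, so discard it.
        (kill-buffer buffer)))))
@end example
@noindent
For instance, @code{(my-url-fetch-body "http://www.gnu.org/")} would
return the page source as a string when the retrieval succeeds.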
  245. @defun url-retrieve url callback &optional cbargs silent no-cookies
  246. This function retrieves @var{url} asynchronously, calling the function
  247. @var{callback} when the object has been completely retrieved. The
  248. return value is the buffer into which the data will be inserted, or
  249. @code{nil} if the process has already completed.
  250. The callback function is called this way:
  251. @example
  252. (apply @var{callback} @var{status} @var{cbargs})
  253. @end example
  254. @noindent
  255. where @var{status} is a plist representing what happened during the
  256. retrieval, with most recent events first, or an empty list if no
  257. events have occurred. Each pair in the plist is one of:
  258. @table @code
  259. @item (:redirect @var{redirected-to})
  260. This means that the request was redirected to the URL
  261. @var{redirected-to}.
  262. @item (:error (@var{error-symbol} . @var{data}))
  263. This means that an error occurred. If so desired, the error can be
  264. signaled with @code{(signal @var{error-symbol} @var{data})}.
  265. @end table
  266. When the callback function is called, the current buffer is the one
  267. containing the retrieved data (if any). The buffer also contains any
  268. MIME headers associated with the data retrieval.
  269. If the optional argument @var{silent} is non-@code{nil}, progress
  270. messages are suppressed. If the optional argument @var{no-cookies} is
  271. non-@code{nil}, cookies are not stored or sent.
  272. @end defun
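The calling convention above can be sketched as follows. All
@code{my-} names are hypothetical. Because the most recent events come
first, @code{plist-get} returns the latest @code{:error} or
@code{:redirect} entry.
@example
(require 'url)
(defun my-url-status-summary (status)
  "Describe STATUS, a status plist passed to a retrieval callback."
  (let ((err (plist-get status :error))
        (redirect (plist-get status :redirect)))
    (cond (err (format "error: %S" err))
          (redirect (format "redirected to %s" redirect))
          (t "ok"))))
(defun my-url-callback (status label)
  "Report on STATUS for the request named LABEL, then drop the buffer."
  ;; The current buffer holds the headers and the retrieved data.
  (message "%s: %s, buffer size %d"
           label (my-url-status-summary status) (buffer-size))
  (kill-buffer (current-buffer)))
(defun my-url-fetch-demo ()
  "Start an asynchronous, silent retrieval."
  ;; CBARGS is a list; its elements follow STATUS in the callback call.
  (url-retrieve "http://www.gnu.org/" #'my-url-callback '("gnu") t))
@end example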
  273. @defun url-queue-retrieve url callback &optional cbargs silent no-cookies
  274. This function acts like @code{url-retrieve}, but with limits on the
  275. number of concurrently-running network processes. The option
  276. @code{url-queue-parallel-processes} controls the number of concurrent
  277. processes, and the option @code{url-queue-timeout} sets a timeout in
  278. seconds.
  279. To use this function, you must @code{(require 'url-queue)}.
  280. @end defun
  281. @vindex url-queue-parallel-processes
  282. @defopt url-queue-parallel-processes
  283. The value of this option is an integer specifying the maximum number
  284. of concurrent @code{url-queue-retrieve} network processes. If the
  285. number of @code{url-queue-retrieve} calls is larger than this number,
  286. later ones are queued until earlier ones are finished.
  287. @end defopt
  288. @vindex url-queue-timeout
  289. @defopt url-queue-timeout
  290. The value of this option is a number specifying the maximum lifetime
  291. of a @code{url-queue-retrieve} network process, once it is started.
  292. If a process is not finished by then, it is killed and removed from
  293. the queue.
  294. @end defopt
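A minimal sketch of queued retrieval follows. The function name
@code{my-url-fetch-all} is hypothetical. The options are set globally
rather than bound with @code{let}, on the assumption that queued
requests may start after the calling function has returned.
@example
(require 'url-queue)
(setq url-queue-parallel-processes 2
      url-queue-timeout 10)
(defun my-url-fetch-all (urls)
  "Queue a silent retrieval for each element of URLS."
  (dolist (url urls)
    (url-queue-retrieve
     url
     (lambda (status)
       ;; Report any error, then discard the retrieval buffer.
       (message "finished, error: %S" (plist-get status :error))
       (kill-buffer (current-buffer)))
     nil t)))
@end example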
  295. @node Supported URL Types
  296. @chapter Supported URL Types
  297. This chapter describes functions and variables affecting URL retrieval
  298. for specific schemes.
  299. @menu
  300. * http/https:: Hypertext Transfer Protocol.
  301. * file/ftp:: Local files and FTP archives.
  302. * info:: Emacs "Info" pages.
  303. * mailto:: Sending email.
  304. * news/nntp/snews:: Usenet news.
  305. * rlogin/telnet/tn3270:: Remote host connectivity.
  306. * irc:: Internet Relay Chat.
  307. * data:: Embedded data URLs.
  308. * nfs:: Networked File System
  309. * ldap:: Lightweight Directory Access Protocol
  310. * man:: Unix man pages.
  311. @end menu
  312. @node http/https
  313. @section @code{http} and @code{https}
  314. The @code{http} scheme refers to the Hypertext Transfer Protocol. The
  315. @code{url} library supports HTTP version 1.1, specified in RFC 2616.
  316. Its default port is 80.
  317. The @code{https} scheme is a secure version of @code{http}, with
  318. transmission via SSL@. It is defined in RFC 2818, and its default port
  319. is 443. When using @code{https}, the @code{url} library performs SSL
  320. encryption via the @code{ssl} library, by forcing the @code{ssl}
  321. gateway method to be used. @xref{Gateways in general}.
  322. @defopt url-honor-refresh-requests
  323. If this option is non-@code{nil} (the default), the @code{url} library
  324. honors the HTTP @samp{Refresh} header, which is used by servers to
  325. direct clients to reload documents from the same URL or a different
  326. one. If the value is @code{nil}, the @samp{Refresh} header is
  327. ignored; any other value means to ask the user on each request.
  328. @end defopt
  329. @menu
  330. * Cookies::
  331. * HTTP language/coding::
  332. * HTTP URL Options::
  333. * Dealing with HTTP documents::
  334. @end menu
  335. @node Cookies
  336. @subsection Cookies
  337. @findex url-cookie-delete
  338. @defun url-cookie-list
  339. This command creates a @file{*url cookies*} buffer listing the current
  340. cookies, if there are any. You can remove a cookie using the
  341. @kbd{C-k} (@code{url-cookie-delete}) command.
  342. @end defun
  343. @defopt url-cookie-file
  344. The file in which cookies are stored, defaulting to @file{cookies} in
  345. the directory specified by @code{url-configuration-directory}.
  346. @end defopt
  347. @defopt url-cookie-confirmation
  348. Specifies whether confirmation is required to accept cookies.
  349. @end defopt
  350. @defopt url-cookie-multiple-line
  351. Specifies whether to put all cookies for the server on one line in the
  352. HTTP request to satisfy broken servers like
  353. @url{http://www.hotmail.com}.
  354. @end defopt
  355. @defopt url-cookie-trusted-urls
  356. A list of regular expressions matching URLs from which to accept
  357. cookies always.
  358. @end defopt
  359. @defopt url-cookie-untrusted-urls
  360. A list of regular expressions matching URLs from which to reject
  361. cookies always.
  362. @end defopt
  363. @defopt url-cookie-save-interval
  364. The number of seconds between automatic saves of cookies to disk.
  365. Default is one hour.
  366. @end defopt
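As an illustration only, an init file might combine these options as
follows. The host name in the regular expression is hypothetical, and
the values are examples, not recommendations.
@example
(setq url-cookie-confirmation t        ; confirm before accepting
      ;; Always reject cookies from this (hypothetical) host.
      url-cookie-untrusted-urls '("\\`https?://ads\\.example\\.com/")
      ;; Save cookies every ten minutes instead of every hour.
      url-cookie-save-interval 600)
@end example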
  367. @node HTTP language/coding
  368. @subsection Language and Encoding Preferences
  369. HTTP allows clients to express preferences for the language and
  370. encoding of documents which servers may honor. For each of these
  371. variables, the value is a string; it can specify a single choice, or
  372. it can be a comma-separated list.
  373. Normally, this list is ordered by descending preference. However, each
  374. element can be followed by @samp{;q=@var{priority}} to specify its
  375. preference level, a decimal number from 0 to 1; e.g., for
  376. @code{url-mime-language-string}, @w{@code{"de, en-gb;q=0.8,
  377. en;q=0.7"}}. An element that has no @samp{;q} specification has
  378. preference level 1.
  379. @defopt url-mime-charset-string
  380. @cindex character sets
  381. @cindex coding systems
  382. This variable specifies a preference for character sets when documents
  383. can be served in more than one encoding.
  384. HTTP allows specifying a series of MIME charsets which indicate your
  385. preferred character set encodings, e.g., Latin-9 or Big5, and these
  386. can be weighted. The default series is generated automatically from
  387. the associated MIME types of all defined coding systems, sorted by the
  388. coding system priority specified in Emacs. @xref{Recognize Coding, ,
  389. Recognizing Coding Systems, emacs, The GNU Emacs Manual}.
  390. @end defopt
  391. @defopt url-mime-language-string
  392. @cindex language preferences
  393. A string specifying the preferred language when servers can serve
  394. files in several languages. Use RFC 1766 abbreviations, e.g.,
  395. @samp{en} for English, @samp{de} for German.
  396. The string can be @code{"*"} to get the first available language (as
  397. opposed to the default).
  398. @end defopt
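For example, the weighted list shown earlier could be installed
globally, or a different preference could be bound around a single
synchronous request. The function name @code{my-url-fetch-in-french}
is hypothetical.
@example
(require 'url)
;; Prefer German, then British English, then any English.
(setq url-mime-language-string "de, en-gb;q=0.8, en;q=0.7")
(defun my-url-fetch-in-french (url)
  "Retrieve URL synchronously, asking for French for this request only."
  (let ((url-mime-language-string "fr"))
    (url-retrieve-synchronously url)))
@end example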
  399. @node HTTP URL Options
  400. @subsection HTTP URL Options
  401. HTTP supports an @samp{OPTIONS} method describing things supported by
  402. the URL@.
  403. @defun url-http-options url
  404. Returns a property list describing options available for URL@. The
  405. property list members are:
  406. @table @code
  407. @item methods
  408. A list of symbols specifying what HTTP methods the resource
  409. supports.
  410. @item dav
  411. @cindex DAV
  412. A list of numbers specifying what DAV protocol/schema versions are
  413. supported.
  414. @item dasl
  415. @cindex DASL
  416. A list of supported DASL search types (in string form).
  417. @item ranges
  418. A list of the units available for use in partial document fetches.
  419. @item p3p
  420. @cindex P3P
  421. The @dfn{Platform for Privacy Preferences} description for the resource.
  422. Currently this is just the raw header contents.
  423. @end table
  424. @end defun
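A sketch of querying the property list follows. The function name
@code{my-url-supports-method-p} is hypothetical, and the sketch assumes
that method symbols are spelled as the server reports them (for
example, @code{GET}).
@example
(require 'url-http)
(defun my-url-supports-method-p (url method)
  "Return non-nil if the server reports that URL supports METHOD."
  ;; `methods' is one of the property names listed above.
  (memq method (plist-get (url-http-options url) 'methods)))
@end example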
  425. @node Dealing with HTTP documents
  426. @subsection Dealing with HTTP documents
  427. HTTP URLs are retrieved into a buffer containing the HTTP headers
  428. followed by the body. Since the headers are quasi-MIME, they may be
  429. processed using the MIME library. @xref{Top,, Emacs MIME,
  430. emacs-mime, The Emacs MIME Manual}.
  431. @node file/ftp
  432. @section file and ftp
  433. @cindex files
  434. @cindex FTP
  435. @cindex File Transfer Protocol
  436. @cindex compressed files
  437. @cindex dired
  438. The @code{ftp} and @code{file} schemes are defined in RFC 1738. The
  439. @code{url} library treats @samp{ftp:} and @samp{file:} as synonymous.
  440. Such URLs have the form
  441. @example
  442. ftp://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
  443. file://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
  444. @end example
  445. @noindent
  446. If the URL specifies a local file, it is retrieved by reading the file
  447. contents in the usual way. If it specifies a remote file, it is
  448. retrieved using the Ange-FTP package. @xref{Remote Files,,, emacs,
  449. The GNU Emacs Manual}.
  450. When retrieving a compressed file, it is automatically uncompressed
  451. if it has the file suffix @file{.z}, @file{.gz}, @file{.Z},
  452. @file{.bz2}, or @file{.xz}. (The list of supported suffixes is
  453. hard-coded, and cannot be altered by customizing
  454. @code{jka-compr-compression-info-list}.)
  455. @defopt url-directory-index-file
  456. This option specifies the filename to look for when a @code{file} or
  457. @code{ftp} URL specifies a directory. The default is
  458. @file{index.html}. If this file exists and is readable, it is viewed.
  459. Otherwise, Emacs visits the directory using Dired.
  460. @end defopt
  461. @node info
  462. @section info
  463. @cindex Info
  464. @cindex Texinfo
  465. @findex Info-goto-node
  466. The @code{info} scheme is non-standard. Such URLs have the form
  467. @example
  468. info:@var{file}#@var{node}
  469. @end example
  470. @noindent
  471. and are retrieved by invoking @code{Info-goto-node} with argument
  472. @samp{(@var{file})@var{node}}. If @samp{#@var{node}} is omitted, the
  473. @samp{Top} node is opened.
  474. @node mailto
  475. @section mailto
  476. @cindex mailto
  477. @cindex email
  478. A @code{mailto} URL specifies an email message to be sent to a given
  479. email address. For example, @samp{mailto:foo@@bar.com} specifies
  480. sending a message to @samp{foo@@bar.com}. The ``retrieval method''
  481. for such URLs is to open a mail composition buffer in which the
  482. appropriate content (e.g., the recipient address) has been filled in.
  483. As defined in RFC 6068, a @code{mailto} URL can have the form
  484. @example
  485. @samp{mailto:@var{mailbox}[?@var{header}=@var{contents}[&@var{header}=@var{contents}]]}
  486. @end example
  487. @noindent
  488. where an arbitrary number of @var{header}s can be added. If the
  489. @var{header} is @samp{body}, then @var{contents} is put in the message
  490. body; otherwise, a @var{header} header field is created with
  491. @var{contents} as its contents. Note that the @code{url} library does
  492. not perform any checking of @var{header} or @var{contents}, so you
  493. should check them before sending the message.
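Since header contents are not checked, it may help to percent-encode
them when building such a URL@. The following sketch does so with
@code{url-hexify-string}; the function name @code{my-make-mailto} is
hypothetical.
@example
(require 'url-util)
(defun my-make-mailto (mailbox subject body)
  "Return a mailto URL for MAILBOX with SUBJECT and BODY encoded."
  (format "mailto:%s?subject=%s&body=%s"
          mailbox
          (url-hexify-string subject)
          (url-hexify-string body)))
(my-make-mailto "foo@@bar.com" "Hello there" "See you soon")
@result{} "mailto:foo@@bar.com?subject=Hello%20there&body=See%20you%20soon"
@end example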
  494. @defopt url-mail-command
  495. @vindex mail-user-agent
  496. The value of this variable is the function called whenever url needs
  497. to send mail. This should normally be left at its default, which is the
  498. standard mail-composition command @code{compose-mail}. @xref{Sending
  499. Mail,,, emacs, The GNU Emacs Manual}.
  500. @end defopt
  501. If the document containing the @code{mailto} URL itself possessed a
  502. known URL, Emacs automatically inserts an @samp{X-Url-From} header
  503. field into the mail buffer, specifying that URL.
  504. @node news/nntp/snews
  505. @section @code{news}, @code{nntp} and @code{snews}
  506. @cindex news
  507. @cindex network news
  508. @cindex usenet
  509. @cindex NNTP
  510. @cindex snews
  511. The @code{news}, @code{nntp}, and @code{snews} schemes, defined in RFC
  512. 1738, are used for reading Usenet newsgroups. For compatibility with
  513. non-standard-compliant news clients, the @code{url} library allows
  514. host and port fields to be included in @code{news} URLs, even though
  515. they are properly only allowed for @code{nntp} and @code{snews}.
  516. @code{news} and @code{nntp} URLs have the following form:
  517. @table @samp
  518. @item news:@var{newsgroup}
  519. Retrieves a list of messages in @var{newsgroup};
  520. @item news:@var{message-id}
  521. Retrieves the message with the given @var{message-id};
  522. @item news:*
  523. Retrieves a list of all available newsgroups;
  524. @item nntp://@var{host}:@var{port}/@var{newsgroup}
  525. @itemx nntp://@var{host}:@var{port}/@var{message-id}
  526. @itemx nntp://@var{host}:@var{port}/*
  527. Similar to the @samp{news} versions.
  528. @end table
  529. The default port for @code{nntp} (and @code{news}) is 119. The
  530. difference between an @code{nntp} URL and a @code{news} URL is that an
  531. @code{nntp} URL may specify an article by its number. The
  532. @samp{snews} scheme is the same as @samp{nntp}, except that it is
  533. tunneled through SSL and has default port 563.
  534. These URLs are retrieved via the Gnus package.
  535. @cindex environment variable
  536. @vindex NNTPSERVER
  537. @defopt url-news-server
  538. This variable specifies the default news server from which to fetch
  539. news, if no server was specified in the URL@. The default value,
  540. @code{nil}, means to use the server specified by the standard
  541. environment variable @samp{NNTPSERVER}, or @samp{news} if that
  542. environment variable is unset.
  543. @end defopt
  544. @node rlogin/telnet/tn3270
  545. @section rlogin, telnet and tn3270
  546. @cindex rlogin
  547. @cindex telnet
  548. @cindex tn3270
  549. @cindex terminal emulation
  550. @findex terminal-emulator
  551. These URL schemes are defined in RFC 1738, and are used for logging in
  552. via a terminal emulator. They have the form
  553. @example
  554. telnet://@var{user}:@var{password}@@@var{host}:@var{port}
  555. @end example
  556. @noindent
  557. but the @var{password} component is ignored.
  558. To handle rlogin, telnet and tn3270 URLs, a @code{rlogin},
  559. @code{telnet} or @code{tn3270} (the program names and arguments are
  560. hardcoded) session is run in a @code{terminal-emulator} buffer.
  561. Well-known ports are used if the URL does not specify a port.
  562. @node irc
  563. @section irc
  564. @cindex IRC
  565. @cindex Internet Relay Chat
  566. @cindex ZEN IRC
  567. @cindex ERC
  568. @cindex rcirc
  569. The @code{irc} scheme is defined in the Internet Draft at
  570. @url{http://www.w3.org/Addressing/draft-mirashi-url-irc-01.txt} (which
  571. was never approved as an RFC). Such URLs have the form
  572. @example
  573. irc://@var{host}:@var{port}/@var{target},@var{needpass}
  574. @end example
  575. @noindent
  576. and are retrieved by opening an @acronym{IRC} session using the
  577. function specified by @code{url-irc-function}.
  578. @defopt url-irc-function
  579. The value of this option is a function, which is called to open an IRC
  580. connection for @code{irc} URLs. This function must take five
  581. arguments, @var{host}, @var{port}, @var{channel}, @var{user} and
  582. @var{password}. The @var{channel} argument specifies the channel to
  583. join immediately, and may be @code{nil}.
  584. The default is @code{url-irc-rcirc}, which uses the Rcirc package.
  585. Other options are @code{url-irc-erc} (which uses ERC) and
  586. @code{url-irc-zenirc} (which uses ZenIRC).
  587. @end defopt
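A custom handler only needs to accept the five arguments described
above. The following sketch, with the hypothetical name
@code{my-url-irc-describe}, reports the requested connection instead of
opening one.
@example
(defun my-url-irc-describe (host port channel user password)
  "Report the IRC connection that an irc URL asks for."
  (ignore password)                     ; never echo the password
  (message "IRC: %s:%s, channel %s, user %s"
           host port (or channel "(none)") (or user "(none)")))
(setq url-irc-function #'my-url-irc-describe)
@end example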
  588. @node data
  589. @section data
  590. @cindex data URLs
  591. The @code{data} scheme, defined in RFC 2397, contains MIME data in
  592. the URL itself. Such URLs have the form
  593. @example
  594. data:@r{[}@var{media-type}@r{]}@r{[};@var{base64}@r{]},@var{data}
  595. @end example
  596. @noindent
  597. @var{media-type} is a MIME @samp{Content-Type} string, possibly
  598. including parameters. It defaults to
  599. @samp{text/plain;charset=US-ASCII}. The @samp{text/plain} can be
  600. omitted but the charset parameter supplied. If @samp{;base64} is
  601. present, the @var{data} are base64-encoded.
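As an illustration of this form, the following sketch builds a
base64-encoded @code{data} URL from a string. The function name
@code{my-make-data-url} is hypothetical, and the choice of UTF-8 is an
assumption of the example.
@example
(defun my-make-data-url (text)
  "Return a base64 data URL holding TEXT as UTF-8 plain text."
  (concat "data:text/plain;charset=UTF-8;base64,"
          ;; Encode to bytes first; base64 works on unibyte strings.
          (base64-encode-string (encode-coding-string text 'utf-8) t)))
(my-make-data-url "Hello")
@result{} "data:text/plain;charset=UTF-8;base64,SGVsbG8="
@end example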
  602. @node nfs
  603. @section nfs
  604. @cindex NFS
  605. @cindex Network File System
  606. @cindex automounter
  607. The @code{nfs} scheme, defined in RFC 2224, is similar to @code{ftp}
  608. except that it points to a file on a remote host that is handled by an
  609. NFS automounter on the local host. Such URLs have the form
  610. @example
  611. nfs://@var{user}:@var{password}@@@var{host}:@var{port}/@var{file}
  612. @end example
  613. @defvar url-nfs-automounter-directory-spec
  614. A string saying how to invoke the NFS automounter. Certain @samp{%}
  615. sequences are recognized:
  616. @table @samp
  617. @item %h
  618. The hostname of the NFS server;
  619. @item %n
  620. The port number of the NFS server;
  621. @item %u
  622. The username to use to authenticate;
  623. @item %p
  624. The password to use to authenticate;
  625. @item %f
  626. The filename on the remote server;
  627. @item %%
  628. A literal @samp{%}.
  629. @end table
  630. Each can be used any number of times.
  631. @end defvar
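To show how such @samp{%} sequences expand, the sketch below uses the
general-purpose @code{format-spec} function. This is not necessarily
how the @code{url} library performs the substitution, and the spec
string, host, and file name are hypothetical.
@example
(require 'format-spec)
;; Substitute the host for %h and the remote file name for %f.
(format-spec "/net/%h%f"
             '((?h . "server.example.org")
               (?f . "/pub/README")))
@result{} "/net/server.example.org/pub/README"
@end example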
  632. @node ldap
  633. @section ldap
  634. @cindex LDAP
  635. @cindex Lightweight Directory Access Protocol
  636. The LDAP scheme is defined in RFC 2255.
  637. @node man
  638. @section man
  639. @cindex @command{man}
  640. @cindex Unix man pages
  641. @findex man
  642. The @code{man} scheme is a non-standard one. Such URLs have the form
  643. @example
  644. @samp{man:@var{page-spec}}
  645. @end example
  646. @noindent
  647. and are retrieved by passing @var{page-spec} to the Lisp function
  648. @code{man}.
  649. @node General Facilities
  650. @chapter General Facilities
  651. @menu
  652. * Disk Caching::
  653. * Proxies::
  654. * Gateways in general::
  655. * History::
  656. @end menu
  657. @node Disk Caching
  658. @section Disk Caching
  659. @cindex Caching
  660. @cindex Persistent Cache
  661. @cindex Disk Cache
  662. The disk cache stores retrieved documents locally, whence they can be
  663. retrieved more quickly. When requesting a URL that is in the cache,
  664. the library checks to see if the page has changed since it was last
  665. retrieved from the remote machine. If not, the local copy is used,
  666. saving the transmission over the network.
  667. @cindex Cleaning the cache
  668. @cindex Clearing the cache
  669. @cindex Cache cleaning
  670. Currently the cache isn't cleared automatically.
  671. @c Running the @code{clean-cache} shell script
  672. @c first is recommended, to allow for future cleaning of the cache. This
  673. @c shell script will remove all files that have not been accessed since it
  674. @c was last run. To keep the cache pared down, it is recommended that this
  675. @c script be run from @i{at} or @i{cron} (see the manual pages for
  676. @c crontab(5) or at(1) for more information)
  677. @defopt url-automatic-caching
  678. Setting this variable non-@code{nil} causes documents to be cached
  679. automatically.
  680. @end defopt
  681. @defopt url-cache-directory
  682. This variable specifies the
  683. directory to store the cache files. It defaults to sub-directory
  684. @file{cache} of @code{url-configuration-directory}.
  685. @end defopt
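As a minimal sketch, the two options above might be set together in an
init file. The directory name @file{url-cache/} is only an assumption
chosen for illustration.

@example
(require 'url)
(require 'url-cache)

;; Cache every retrieved document automatically.
(setq url-automatic-caching t)

;; Hypothetical location; any writable directory should do.
(setq url-cache-directory
      (expand-file-name "url-cache/" user-emacs-directory))
@end example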
  686. @defopt url-cache-creation-function
  687. The cache relies on a scheme for mapping URLs to files in the cache.
  688. This variable names a function which sets the type of cache to use.
  689. It takes a URL as argument and returns the absolute file name of the
  690. corresponding cache file. The two supplied possibilities are
  691. @code{url-cache-create-filename-using-md5} and
  692. @code{url-cache-create-filename-human-readable}.
  693. @end defopt
  694. @defun url-cache-create-filename-using-md5 url
  695. Creates a cache file name from @var{url} using MD5 hashing.
  696. This creates entries with very few cache collisions and is fast.
  697. @cindex MD5
  698. @smallexample
  699. (url-cache-create-filename-using-md5 "http://www.example.com/foo/bar")
  700. @result{} "/home/fx/.url/cache/fx/http/com/example/www/b8a35774ad20db71c7c3409a5410e74f"
  701. @end smallexample
  702. @end defun
  703. @defun url-cache-create-filename-human-readable url
  704. Creates a cache file name from @var{url} that is more obviously related
  705. to @var{url} than one from @code{url-cache-create-filename-using-md5},
  706. but is more likely to conflict with other files.
  707. @smallexample
  708. (url-cache-create-filename-human-readable "http://www.example.com/foo/bar")
  709. @result{} "/home/fx/.url/cache/fx/http/com/example/www/foo/bar"
  710. @end smallexample
  711. @end defun
  712. @defun url-cache-expired url &optional expire-time
  713. This function returns non-@code{nil} if the cache entry for @var{url}
  714. has expired (or is absent). The optional argument @var{expire-time}
  715. is the expiration delay in seconds (default @code{url-cache-expire-time}).
  716. @end defun
  717. @defopt url-cache-expire-time
  718. This variable is the default number of seconds to use for the
  719. expire-time argument of the function @code{url-cache-expired}.
  720. @end defopt
  721. @defun url-fetch-from-cache url
  722. This function returns a buffer containing the data cached for
  723. @var{url}.
  724. @end defun
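The two functions above can be combined to consult the cache without
touching the network. The following is a sketch only; the helper name
@code{my-url-cached-buffer} is hypothetical and not part of the library.

@example
(require 'url)
(require 'url-cache)

(defun my-url-cached-buffer (url &optional max-age)
  "Return a buffer with the cached copy of URL, or nil.
Return nil if the entry is absent or older than MAX-AGE seconds
\(default `url-cache-expire-time')."
  (unless (url-cache-expired url max-age)
    (url-fetch-from-cache url)))
@end example

A caller would then fall back to a normal retrieval whenever the helper
returns @code{nil}.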
  725. @c Fixme: never actually used currently?
  726. @c @defopt url-standalone-mode
  727. @c @cindex Relying on cache
  728. @c @cindex Cache only mode
  729. @c @cindex Standalone mode
  730. @c If this variable is non-@code{nil}, the library relies solely on the
  731. @c cache for fetching documents and avoids checking if they have changed
  732. @c on remote servers.
  733. @c @end defopt
  734. @c With a large cache of documents on the local disk, it can be very handy
  735. @c when traveling, or any other time the network connection is not active
  736. @c (a laptop with a dial-on-demand PPP connection, etc.). Emacs/W3 can rely
  737. @c solely on its cache, and avoid checking to see if the page has changed
  738. @c on the remote server. In the case of a dial-on-demand PPP connection,
  739. @c this will keep the phone line free as long as possible, only bringing up
  740. @c the PPP connection when asking for a page that is not located in the
  741. @c cache. This is very useful for demonstrations as well.
  742. @node Proxies
  743. @section Proxies and Gatewaying
  744. @c fixme: check/document url-ns stuff
  745. @cindex proxy servers
  746. @cindex proxies
  747. @cindex environment variables
  748. @vindex HTTP_PROXY
  749. Proxy servers are commonly used to provide gateways through firewalls
  750. or as caches serving some more-or-less local network. Each protocol
  751. (HTTP, FTP, etc.)@: can have a different gateway server. By convention,
  752. proxying is configured uniformly across different programs through
  753. environment variables of the form @code{@var{protocol}_proxy}, where
  754. @var{protocol} is one of the supported network protocols (@code{http},
  755. @code{ftp} etc.). The library recognizes such variables in either
  756. upper or lower case. Their values are of one of the forms:
  757. @itemize @bullet
  758. @item @code{@var{host}:@var{port}};
  759. @item a full URL;
  760. @item simply a host name.
  761. @end itemize
  762. @vindex NO_PROXY
  763. The @code{NO_PROXY} environment variable specifies hosts that should be
  764. excluded from proxying, that is, servers that should be contacted directly.
  765. This should be a comma-separated list of hostnames, domain names, or a
  766. mixture of both. Asterisks can be used as wildcards, but other
  767. clients may not support that. Domain names may be indicated by a
  768. leading dot. For example:
  769. @example
  770. NO_PROXY="*.aventail.com,home.com,.seanet.com"
  771. @end example
  772. @noindent says to contact all machines in the @samp{aventail.com} and
  773. @samp{seanet.com} domains directly, as well as the machine named
  774. @samp{home.com}. If @code{NO_PROXY} isn't defined, @code{no_PROXY}
  775. and @code{no_proxy} are also tried, in that order.
  776. Proxies may also be specified directly in Lisp.
  777. @defopt url-proxy-services
  778. This variable is an alist of URL schemes and proxy servers that
  779. gateway them. Each item is of the form @w{@code{(@var{scheme}
  780. . @var{host}:@var{portnumber})}}, which says that the URL @var{scheme} is
  781. gatewayed through @var{portnumber} on the specified @var{host}. An
  782. exception is the pseudo scheme @code{"no_proxy"}, which is paired with
  783. a regexp matching host names not to be proxied. This variable is
  784. initialized from the environment as above.
  785. @example
  786. (setq url-proxy-services
  787. '(("http" . "proxy.aventail.com:80")
  788. ("no_proxy" . "^.*\\(aventail\\|seanet\\)\\.com")))
  789. @end example
  790. @end defopt
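A proxy setting can also be bound temporarily around a single request.
The sketch below builds the @code{"no_proxy"} regexp with
@code{regexp-opt} so that only the listed hosts match exactly. The host
names and the variable name @code{my-no-proxy-regexp} are assumptions
for illustration.

@example
(require 'url)

;; Match exactly these hosts; the names are placeholders.
(defvar my-no-proxy-regexp
  (concat "\\`"
          (regexp-opt '("localhost" "intranet.example.com") t)
          "\\'"))

(let ((url-proxy-services
       (list (cons "http" "proxy.example.com:8080")
             (cons "no_proxy" my-no-proxy-regexp))))
  ;; Retrievals started here use the proxy unless the host matches
  ;; `my-no-proxy-regexp'.
  (assoc "no_proxy" url-proxy-services))
@end example

Anchoring the regexp at both ends avoids accidentally matching a longer
host name that merely contains one of the listed names.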
  791. @node Gateways in general
  792. @section Gateways in General
  793. @cindex gateways
  794. @cindex firewalls
  795. The library provides a general gateway layer through which all
  796. networking passes. It can both control access to the network and
  797. provide access through gateways in firewalls. The layer may make direct
  798. connections in some cases and pass through some sort of gateway in
  799. others.@footnote{Proxies (which only operate over HTTP) are
  800. implemented using this.} The library's basic function responsible for
  801. making connections is @code{url-open-stream}.
  802. @defun url-open-stream name buffer host service
  803. @cindex opening a stream
  804. @cindex stream, opening
  805. Open a stream to @var{host}, possibly via a gateway. The other
  806. arguments are as for @code{open-network-stream}. This will not make a
  807. connection if @code{url-gateway-unplugged} is non-@code{nil}.
  808. @end defun
  809. @defvar url-gateway-local-host-regexp
  810. This is a regular expression that matches local hosts that do not
  811. require the use of a gateway. If @code{nil}, all connections are made
  812. through the gateway.
  813. @end defvar
  814. @defvar url-gateway-method
  815. This variable controls which gateway method is used. It may be useful
  816. to bind it temporarily in some applications. It has values taken from
  817. a list of symbols. Possible values are:
  818. @table @code
  819. @item telnet
  820. @cindex @command{telnet}
  821. Use this method if you must first telnet and log into a gateway host,
  822. and then run telnet from that host to connect to outside machines.
  823. @item rlogin
  824. @cindex @command{rlogin}
  825. This method is identical to @code{telnet}, but uses @command{rlogin}
  826. to log into the remote machine without having to send the username and
  827. password over the wire every time.
  828. @item socks
  829. @cindex @sc{socks}
  830. Use if the firewall has a @sc{socks} gateway running on it. The
  831. @sc{socks} v5 protocol is defined in RFC 1928.
  832. @c @item ssl
  833. @c This probably shouldn't be documented
  834. @c Fixme: why not? -- fx
  835. @item native
  836. This method uses Emacs's built-in networking directly. This is the
  837. default. It can be used only if there is no firewall blocking access.
  838. @end table
  839. @end defvar
  840. The following variables control the gateway methods.
  841. @defopt url-gateway-telnet-host
  842. The gateway host to telnet to. Once logged in there, you then telnet
  843. out to the hosts you want to connect to.
  844. @end defopt
  845. @defopt url-gateway-telnet-parameters
  846. This should be a list of parameters to pass to the @command{telnet} program.
  847. @end defopt
  848. @defopt url-gateway-telnet-password-prompt
  849. This is a regular expression that matches the password prompt when
  850. logging in.
  851. @end defopt
  852. @defopt url-gateway-telnet-login-prompt
  853. This is a regular expression that matches the username prompt when
  854. logging in.
  855. @end defopt
  856. @defopt url-gateway-telnet-user-name
  857. The username to log in with.
  858. @end defopt
  859. @defopt url-gateway-telnet-password
  860. The password to send when logging in.
  861. @end defopt
  862. @defopt url-gateway-prompt-pattern
  863. This is a regular expression that matches the shell prompt.
  864. @end defopt
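For instance, a telnet gateway might be configured with the variables
above as follows. This is a sketch; the host and user names are
placeholders, and the prompt regexps are left untouched.

@example
(require 'url-gw)

(setq url-gateway-method 'telnet
      url-gateway-telnet-host "gateway.example.com"
      url-gateway-telnet-user-name "jdoe")
@end example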
  865. @defopt url-gateway-rlogin-host
  866. Host to @samp{rlogin} to before telnetting out.
  867. @end defopt
  868. @defopt url-gateway-rlogin-parameters
  869. Parameters to pass to @samp{rsh}.
  870. @end defopt
  871. @defopt url-gateway-rlogin-user-name
  872. User name to use when logging in to the gateway.
  873. @end defopt
  877. @defopt socks-server
  878. This specifies the default server; it takes the form
  879. @w{@code{("Default server" @var{server} @var{port} @var{version})}}
  880. where @var{version} can be either 4 or 5.
  881. @end defopt
  882. @defvar socks-password
  883. If this is @code{nil} then you will be asked for the password,
  884. otherwise it will be used as the password for authenticating you to
  885. the @sc{socks} server.
  886. @end defvar
  887. @defvar socks-username
  888. This is the username to use when authenticating yourself to the
  889. @sc{socks} server. By default this is your login name.
  890. @end defvar
  891. @defvar socks-timeout
  892. This controls how long, in seconds, to wait for responses from the
  893. @sc{socks} server; it is 5 by default.
  894. @end defvar
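Putting the @sc{socks} variables together, a configuration might look
like the following sketch. The server name and port are assumptions for
illustration only.

@example
(require 'socks)
(require 'url-gw)

;; Form: ("Default server" SERVER PORT VERSION)
(setq socks-server '("Default server" "socks.example.com" 1080 5))

;; Route the library's connections through the SOCKS gateway.
(setq url-gateway-method 'socks)
@end example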
  895. @c fixme: these have been effectively commented-out in the code
  896. @c @defopt socks-server-aliases
  897. @c This a list of server aliases. It is a list of aliases of the form
  898. @c @var{(alias hostname port version)}.
  899. @c @end defopt
  900. @c @defopt socks-network-aliases
  901. @c This a list of network aliases. Each entry in the list takes the form
  902. @c @var{(alias (network))} where @var{alias} is a string that names the
  903. @c @var{network}. The networks can contain a pair (not a dotted pair) of
  904. @c @sc{ip} addresses which specify a range of @sc{ip} addresses, an @sc{ip}
  905. @c address and a netmask, a domain name or a unique hostname or @sc{ip}
  906. @c address.
  907. @c @end defopt
  908. @c @defopt socks-redirection-rules
  909. @c This a list of redirection rules. Each rule take the form
  910. @c @var{(Destination network Connection type)} where @var{Destination
  911. @c network} is a network alias from @code{socks-network-aliases} and
  912. @c @var{Connection type} can be @code{nil} in which case a direct
  913. @c connection is used, or it can be an alias from
  914. @c @code{socks-server-aliases} in which case that server is used as a
  915. @c proxy.
  916. @c @end defopt
  917. @defopt socks-nslookup-program
  918. @cindex @command{nslookup}
  919. This is the @samp{nslookup} program. It is @code{"nslookup"} by default.
  920. @end defopt
  921. @menu
  922. * Suppressing network connections::
  923. @end menu
  924. @c * Broken hostname resolution::
  925. @node Suppressing network connections
  926. @subsection Suppressing Network Connections
  927. @cindex network connections, suppressing
  928. @cindex suppressing network connections
  929. @cindex bugs, HTML
  930. @cindex HTML ``bugs''
  931. In some circumstances it is desirable to suppress making network
  932. connections. A typical case is when rendering HTML in a mail user
  933. agent, when external URLs should not be activated, particularly to
  934. avoid ``bugs'' which ``call home'' by fetching single-pixel images and the
  935. like. To arrange this, bind the following variable for the duration
  936. of such processing.
  937. @defvar url-gateway-unplugged
  938. If this variable is non-@code{nil}, new network connections are never
  939. opened by the URL library.
  940. @end defvar
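Binding the variable with @code{let} limits the effect to one piece of
processing. In the sketch below, the helper name
@code{my-call-unplugged} is hypothetical; the function passed to it
stands for whatever rendering code should not open connections.

@example
(require 'url)
(require 'url-gw)

(defun my-call-unplugged (function)
  "Call FUNCTION with new URL library connections suppressed."
  (let ((url-gateway-unplugged t))
    (funcall function)))
@end example

Once the helper returns, the variable has its previous value again.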
  941. @c @node Broken hostname resolution
  942. @c @subsection Broken Hostname Resolution
  943. @c @cindex hostname resolver
  944. @c @cindex resolver, hostname
  945. @c Some C libraries do not include the hostname resolver routines in
  946. @c their static libraries. If Emacs was linked statically, and was not
  947. @c linked with the resolver libraries, it will not be able to get to any
  948. @c machines off the local network. This is characterized by being able
  949. @c to reach someplace with a raw ip number, but not its hostname
  950. @c (@url{http://129.79.254.191/} works, but
  951. @c @url{http://www.cs.indiana.edu/} doesn't). This used to happen on
  952. @c SunOS4 and Ultrix, but is probably now rare. If Emacs can't be
  953. @c rebuilt linked against the resolver library, it can use the external
  954. @c @command{nslookup} program instead.
  955. @c @defopt url-gateway-broken-resolution
  956. @c @cindex @code{nslookup} program
  957. @c @cindex program, @code{nslookup}
  958. @c If non-@code{nil}, this variable says to use the program specified by
  959. @c @code{url-gateway-nslookup-program} program to do hostname resolution.
  960. @c @end defopt
  961. @c @defopt url-gateway-nslookup-program
  962. @c The name of the program to do hostname lookup if Emacs can't do it
  963. @c directly. This program should expect a single argument on the command
  964. @c line---the hostname to resolve---and should produce output similar to
  965. @c the standard Unix @command{nslookup} program:
  966. @c @example
  967. @c Name: www.cs.indiana.edu
  968. @c Address: 129.79.254.191
  969. @c @end example
  970. @c @end defopt
  971. @node History
  972. @section History
  973. @findex url-do-setup
  974. The library can maintain a global history list tracking URLs accessed.
  975. URL completion can be done from it. The history mechanism is set up
  976. automatically via @code{url-do-setup} when it is configured to be on.
  977. Note that the size of the history list is currently not limited.
  978. @vindex url-history-hash-table
  979. The history ``list'' is actually a hash table,
  980. @code{url-history-hash-table}. It contains access times keyed by URL
  981. strings. The times are in the format returned by @code{current-time}.
  982. @defun url-history-update-url url time
  983. This function updates the history table with an entry for @var{url}
  984. accessed at the given @var{time}.
  985. @end defun
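As a small illustration, the following sketch records an access and
reads it back from the hash table. The URL is a placeholder.

@example
(require 'url-history)

(let ((now (current-time)))
  (url-history-update-url "http://www.example.com/" now)
  ;; The stored value is the access time that was just recorded.
  (equal (gethash "http://www.example.com/" url-history-hash-table)
         now))
@result{} t
@end example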
  986. @defopt url-history-track
  987. If non-@code{nil}, the library will keep track of all the URLs
  988. accessed. If it is @code{t}, the list is saved to disk at the end of
  989. each Emacs session. The default is @code{nil}.
  990. @end defopt
  991. @defopt url-history-file
  992. The file storing the history list between sessions. It defaults to
  993. @file{history} in @code{url-configuration-directory}.
  994. @end defopt
  995. @defopt url-history-save-interval
  996. @findex url-history-setup-save-timer
  997. The number of seconds between automatic saves of the history list.
  998. Default is one hour. Note that if you change this variable directly,
  999. rather than using Custom, after @code{url-do-setup} has been run, you
  1000. need to run the function @code{url-history-setup-save-timer}.
  1001. @end defopt
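The options above might be combined as in the following sketch. The
interval of 1800 seconds is an arbitrary value chosen for illustration.

@example
(require 'url-history)

(setq url-history-track t              ; track URLs and save at exit
      url-history-save-interval 1800)  ; seconds between saves

;; Only needed if `url-do-setup' has already been run.
(url-history-setup-save-timer)
@end example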
  1002. @defun url-history-parse-history &optional fname
  1003. Parses the history file @var{fname} (default @code{url-history-file})
  1004. and sets up the history list.
  1005. @end defun
  1006. @defun url-history-save-history &optional fname
  1007. Saves the current history to file @var{fname} (default
  1008. @code{url-history-file}).
  1009. @end defun
  1010. @defun url-completion-function string predicate function
  1011. You can use this function to do completion of URLs from the history.
  1012. @end defun
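Because the function takes the arguments of a programmed completion
function, it can presumably be passed to @code{completing-read} as the
collection. The command name @code{my-read-url-from-history} below is
hypothetical.

@example
(require 'url-history)

(defun my-read-url-from-history ()
  "Read a URL in the minibuffer, completing from the URL history."
  (completing-read "URL: " #'url-completion-function))
@end example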
  1013. @node Customization
  1014. @chapter Customization
  1015. @cindex environment variables
  1016. The following environment variables affect the @code{url} library's
  1017. operation at startup.
  1018. @table @code
  1019. @item TMPDIR
  1020. @vindex TMPDIR
  1021. @vindex url-temporary-directory
  1022. If this is defined, @code{url-temporary-directory} is initialized from
  1023. it.
  1024. @end table
  1025. The following user options affect the general operation of
  1026. the @code{url} library.
  1027. @defopt url-configuration-directory
  1028. @cindex configuration files
  1029. The value of this variable specifies the name of the directory where
  1030. the @code{url} library stores its various configuration files, cache
  1031. files, etc.
  1032. The default value specifies a subdirectory named @file{url/} in the
  1033. standard Emacs user data directory specified by the variable
  1034. @code{user-emacs-directory} (normally @file{~/.emacs.d}). However,
  1035. the old default was @file{~/.url}, and this directory is used instead
  1036. if it exists.
  1037. @end defopt
  1038. @defopt url-debug
  1039. @cindex debugging
  1040. Specifies the types of debug messages which are logged to
  1041. the @file{*URL-DEBUG*} buffer.
  1042. @code{t} means log all messages.
  1043. A number means log all messages and show them with @code{message}.
  1044. It may also be a list of the types of messages to be logged.
  1045. @end defopt
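A simple way to try this out is sketched below: enable all messages,
perform a retrieval, and then look at the log buffer.

@example
(require 'url)

;; Log every type of debug message.
(setq url-debug t)

;; After retrieving something, inspect what was logged.
(display-buffer (get-buffer-create "*URL-DEBUG*"))
@end example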
  1046. @defopt url-personal-mail-address
  1047. @end defopt
  1048. @defopt url-privacy-level
  1049. @end defopt
  1050. @defopt url-uncompressor-alist
  1051. @end defopt
  1052. @defopt url-passwd-entry-func
  1053. @end defopt
  1054. @defopt url-standalone-mode
  1055. @end defopt
  1056. @defopt url-bad-port-list
  1057. @end defopt
  1058. @defopt url-max-password-attempts
  1059. @end defopt
  1060. @defopt url-temporary-directory
  1061. @end defopt
  1062. @defopt url-show-status
  1063. @end defopt
  1064. @defopt url-confirmation-func
  1065. The function to use for asking yes or no questions. This is normally
  1066. either @code{y-or-n-p} or @code{yes-or-no-p}, but could be another
  1067. function taking a single argument (the prompt) and returning @code{t}
  1068. only if an affirmative answer is given.
  1069. @end defopt
  1070. @defopt url-gateway-method
  1071. @c fixme: describe gatewaying
  1072. A symbol specifying the type of gateway support to use for connections
  1073. from the local machine. The supported methods are:
  1074. @table @code
  1075. @item telnet
  1076. Run telnet in a subprocess to connect;
  1077. @item rlogin
  1078. Rlogin to another machine to connect;
  1079. @item socks
  1080. Connect through a socks server;
  1081. @item ssl
  1082. Connect with SSL;
  1083. @item native
  1084. Connect directly.
  1085. @end table
  1086. @end defopt
  1087. @node GNU Free Documentation License
  1088. @appendix GNU Free Documentation License
  1089. @include doclicense.texi
  1090. @node Function Index
  1091. @unnumbered Command and Function Index
  1092. @printindex fn
  1093. @node Variable Index
  1094. @unnumbered Variable Index
  1095. @printindex vr
  1096. @node Concept Index
  1097. @unnumbered Concept Index
  1098. @printindex cp
  1099. @bye