- $Id: libcurl-the-guide,v 1.23 2004/01/29 16:17:25 bagder Exp $
- _ _ ____ _
- ___| | | | _ \| |
- / __| | | | |_) | |
- | (__| |_| | _ <| |___
- \___|\___/|_| \_\_____|
- PROGRAMMING WITH LIBCURL
- About this Document
- This document attempts to describe the general principles and some basic
- approaches to consider when programming with libcurl. The text will focus
- mainly on the C interface but might apply fairly well on other interfaces as
- well as they usually follow the C one pretty closely.
- This document will refer to 'the user' as the person writing the source code
- that uses libcurl. That would probably be you or someone in your position.
- What will be generally referred to as 'the program' will be the collected
- source code that you write that is using libcurl for transfers. The program
- is outside libcurl and libcurl is outside of the program.
- To get more details on all options and functions described herein, please
- refer to their respective man pages.
- Building
- There are many different ways to build C programs. This chapter will assume a
- unix-style build process. If you use a different build system, you can still
- read this to get general information that may apply to your environment as
- well.
- Compiling the Program
- Your compiler needs to know where the libcurl headers are
- located. Therefore you must set your compiler's include path to point to
- the directory where you installed them. The 'curl-config'[3] tool can be
- used to get this information:
- $ curl-config --cflags
- Linking the Program with libcurl
- Once you have compiled the program, you need to link your object files to
- create a single executable. For that to succeed, you need to link with
- libcurl and possibly also with other libraries that libcurl itself depends
- on, such as the OpenSSL libraries. Even some standard OS libraries may be
- needed on the command line. To figure out which flags to use, once again
- the 'curl-config' tool comes to the rescue:
- $ curl-config --libs
- SSL or Not
- libcurl can be built and customized in many ways. One of the things that
- varies from different libraries and builds is the support for SSL-based
- transfers, like HTTPS and FTPS. If OpenSSL was detected properly at
- build-time, libcurl will be built with SSL support. To figure out if an
- installed libcurl has been built with SSL support enabled, use
- 'curl-config' like this:
- $ curl-config --feature
- And if SSL is supported, the keyword 'SSL' will be written to stdout,
- possibly together with a few other features that can be on and off on
- different libcurls.
- Portable Code in a Portable World
- The people behind libcurl have put considerable effort into making libcurl
- work on a large number of different operating systems and environments.
- You program libcurl the same way on all platforms that libcurl runs on. There
- are only a very few minor considerations that differ. If you just make sure to
- write your code portably enough, you may very well create yourself a very
- portable program. libcurl shouldn't stop you from that.
- Global Preparation
- The program must initialize some of the libcurl functionality globally. That
- means it should be done exactly once, no matter how many times you intend to
- use the library. Once for your program's entire life time. This is done using
- curl_global_init()
- and it takes one parameter which is a bit pattern that tells libcurl what to
- initialize. Using CURL_GLOBAL_ALL will make it initialize all known internal
- sub modules, and might be a good default option. The current two bits that
- are specified are:
- CURL_GLOBAL_WIN32 which only does anything on Windows machines. When used on
- a Windows machine, it'll make libcurl initialize the win32 socket
- stuff. Without having that initialized properly, your program cannot use
- sockets properly. You should only do this once for each application, so if
- your program already does this or if another library in use does it, you
- should not tell libcurl to do this as well.
- CURL_GLOBAL_SSL which only does anything on libcurls compiled and built
- SSL-enabled. On these systems, this will make libcurl init OpenSSL properly
- for this application. This is only needed to do once for each application so
- if your program or another library already does this, this bit should not be
- needed.
- libcurl has a default protection mechanism that detects if curl_global_init()
- hasn't been called by the time curl_easy_perform() is called and if that is
- the case, libcurl runs the function itself with a guessed bit pattern. Please
- note that depending solely on this is not considered nice nor very good.
- When the program no longer uses libcurl, it should call
- curl_global_cleanup(), which is the opposite of the init call. It will then
- do the reversed operations to cleanup the resources the curl_global_init()
- call initialized.
- Repeated calls to curl_global_init() and curl_global_cleanup() should be
- avoided. They should only be called once each.
- Handle the Easy libcurl
- libcurl version 7 is oriented around the so called easy interface. All
- operations in the easy interface are prefixed with 'curl_easy'.
- Future libcurls will also offer the multi interface. More about that
- interface, what it is targeted for and how to use it is still only debated on
- the libcurl mailing list and developer web pages. Join up to discuss and
- figure out!
- To use the easy interface, you must first create yourself an easy handle. You
- need one handle for each easy session you want to perform. Basically, you
- should use one handle for every thread you plan to use for transferring. You
- must never share the same handle in multiple threads.
- Get an easy handle with
- easyhandle = curl_easy_init();
- It returns an easy handle. Using that you proceed to the next step: setting
- up your preferred actions. A handle is just a logic entity for the upcoming
- transfer or series of transfers.
- You set properties and options for this handle using curl_easy_setopt(). They
- control how the subsequent transfer or transfers will be made. Options remain
- set in the handle until set again to something different. Hence, multiple
- requests using the same handle will use the same options.
- Many of the values you set in libcurl are "strings", pointers to data
- terminated with a zero byte. Keep in mind that when you set strings with
- curl_easy_setopt(), libcurl will not copy the data. It will merely point to
- the data. You MUST make sure that the data remains available for libcurl to
- use until finished or until you use the same option again to point to
- something else.
- One of the most basic properties to set in the handle is the URL. You set
- your preferred URL to transfer with CURLOPT_URL in a manner similar to:
- curl_easy_setopt(easyhandle, CURLOPT_URL, "http://curl.haxx.se/");
- Let's assume for a while that you want to receive data as the URL identifies
- a remote resource you want to get here. Since you write a sort of application
- that needs this transfer, I assume that you would like to get the data passed
- to you directly instead of simply getting it passed to stdout. So, you write
- your own function that matches this prototype:
- size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp);
- You tell libcurl to pass all data to this function by issuing a function
- similar to this:
- curl_easy_setopt(easyhandle, CURLOPT_WRITEFUNCTION, write_data);
- You can control what data your function gets in the fourth argument by setting
- another property:
- curl_easy_setopt(easyhandle, CURLOPT_FILE, &internal_struct);
- Using that property, you can easily pass local data between your application
- and the function that gets invoked by libcurl. libcurl itself won't touch the
- data you pass with CURLOPT_FILE.
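- As an illustration of the callback and custom pointer described above, here
- is a minimal sketch of a write callback that collects everything it receives
- in memory. The 'struct membuf' type is a hypothetical name made up for this
- example and not part of libcurl. Returning a count different from the number
- of bytes passed in is how the callback reports a failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical application struct: a growable, zero-terminated buffer. */
struct membuf {
  char *data;   /* heap memory holding everything received so far */
  size_t len;   /* number of bytes stored, not counting the final zero */
};

/* Matches the write callback prototype. 'userp' is the custom pointer the
   program registered for the callback's fourth argument. */
size_t write_data(void *buffer, size_t size, size_t nmemb, void *userp)
{
  struct membuf *mem = userp;
  size_t bytes = size * nmemb;
  char *grown;

  assert(mem != NULL);

  grown = realloc(mem->data, mem->len + bytes + 1);
  if(grown == NULL)
    return 0; /* out of memory: report that nothing was handled */

  memcpy(grown + mem->len, buffer, bytes);
  mem->data = grown;
  mem->len += bytes;
  mem->data[mem->len] = '\0';

  return bytes; /* we took care of everything we were given */
}
```

- A program using this sketch would start with a struct membuf set to
- { NULL, 0 }, pass its address with CURLOPT_FILE, pass write_data with
- CURLOPT_WRITEFUNCTION, and free() the data member when it is done with it.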
- libcurl offers its own default internal callback that'll take care of the
- data if you don't set the callback with CURLOPT_WRITEFUNCTION. It will then
- simply output the received data to stdout. You can have the default callback
- write the data to a different file handle by passing a 'FILE *' to a file
- opened for writing with the CURLOPT_FILE option.
- Now, we need to take a step back and have a deep breath. Here's one of those
- rare platform-dependent nitpicks. Did you spot it? On some platforms[2],
- libcurl won't be able to operate on files opened by the program. Thus, if you
- use the default callback and pass in an open file with CURLOPT_FILE, it
- will crash. You should therefore avoid this to make your program run fine
- virtually everywhere.
- There are of course many more options you can set, and we'll get back to a
- few of them later. Let's instead continue to the actual transfer:
- success = curl_easy_perform(easyhandle);
- The curl_easy_perform() will connect to the remote site, do the necessary
- commands and receive the transfer. Whenever it receives data, it calls the
- callback function we previously set. The function may get one byte at a time,
- or it may get many kilobytes at once. libcurl delivers as much as possible as
- often as possible. Your callback function should return the number of bytes
- it "took care of". If that is not the exact same amount of bytes that was
- passed to it, libcurl will abort the operation and return with an error code.
- When the transfer is complete, the function returns a return code that
- informs you if it succeeded in its mission or not. If a return code isn't
- enough for you, you can use the CURLOPT_ERRORBUFFER to point libcurl to a
- buffer of yours where it'll store a human readable error message as well.
- If you then want to transfer another file, the handle is ready to be used
- again. Mind you, it is even preferred that you re-use an existing handle if
- you intend to make another transfer. libcurl will then attempt to re-use the
- previous
- Multi-threading issues
- libcurl is completely thread safe, except for two issues: signals and alarm
- handlers. Signals are needed for a SIGPIPE handler, and the alarm() syscall
- is used to catch timeouts (mostly during DNS lookup).
- If you are accessing HTTPS or FTPS URLs in a multi-threaded manner, you are
- then of course using OpenSSL multi-threaded and it has itself a few
- requirements on this. Basically, you need to provide one or two functions to
- allow it to function properly. For all details, see this:
- http://www.openssl.org/docs/crypto/threads.html#DESCRIPTION
- When using multiple threads you should first ignore SIGPIPE in your main
- thread and set the CURLOPT_NOSIGNAL option to TRUE for all handles.
- Everything will work fine except that timeouts are not honored during the DNS
- lookup - which you can work around by building libcurl with ares-support.
- Ares is a library that provides asynchronous name resolves. Unfortunately,
- ares does not yet support IPv6.
- For SIGPIPE info see the UNIX Socket FAQ at
- http://www.unixguide.net/network/socketfaq/2.22.shtml
- Also, note that CURLOPT_DNS_USE_GLOBAL_CACHE is not thread-safe.
- When It Doesn't Work
- There will always be times when the transfer fails for some reason. You might
- have set the wrong libcurl option or misunderstood what the libcurl option
- actually does, or the remote server might return non-standard replies that
- confuse the library which then confuses your program.
- There's one golden rule when these things occur: set the CURLOPT_VERBOSE
- option to TRUE. It'll cause the library to spew out the entire protocol
- details it sends, some internal info and some received protocol data as well
- (especially when using FTP). If you're using HTTP, adding the headers in the
- received output to study is also a clever way to get a better understanding
- of why the server behaves the way it does. Include headers in the normal body
- output with CURLOPT_HEADER set TRUE.
- Of course there are bugs left. We need to get to know about them to be able
- to fix them, so we're quite dependent on your bug reports! When you do report
- suspected bugs in libcurl, please include as many details as you possibly can: a
- protocol dump that CURLOPT_VERBOSE produces, library version, as much as
- possible of your code that uses libcurl, operating system name and version,
- compiler name and version etc.
- If CURLOPT_VERBOSE is not enough, you can increase the level of debug data your
- application receives by using the CURLOPT_DEBUGFUNCTION.
- Getting some in-depth knowledge about the protocols involved is never wrong,
- and if you're trying to do funny things, you might very well understand
- libcurl and how to use it better if you study the appropriate RFC documents
- at least briefly.
- Upload Data to a Remote Site
- libcurl tries to keep a protocol independent approach to most transfers, thus
- uploading to a remote FTP site is very similar to uploading data to a HTTP
- server with a PUT request.
- Of course, first you either create an easy handle or you re-use an existing
- one. Then you set the URL to operate on just like before. This is the remote
- URL that we will now upload to.
- Since we write an application, we most likely want libcurl to get the upload
- data by asking us for it. To make it do that, we set the read callback and
- the custom pointer libcurl will pass to our read callback. The read callback
- should have a prototype similar to:
- size_t function(char *bufptr, size_t size, size_t nitems, void *userp);
- Where bufptr is the pointer to a buffer we fill in with data to upload and
- size*nitems is the size of the buffer and therefore also the maximum amount
- of data we can return to libcurl in this call. The 'userp' pointer is the
- custom pointer we set to point to a struct of ours to pass private data
- between the application and the callback.
- curl_easy_setopt(easyhandle, CURLOPT_READFUNCTION, read_function);
- curl_easy_setopt(easyhandle, CURLOPT_INFILE, &filedata);
- Tell libcurl that we want to upload:
- curl_easy_setopt(easyhandle, CURLOPT_UPLOAD, TRUE);
- A few protocols won't behave properly when uploads are done without any prior
- knowledge of the expected file size. So, set the upload file size using the
- CURLOPT_INFILESIZE_LARGE for all known file sizes like this[1]:
- /* in this example, file_size must be an off_t variable */
- curl_easy_setopt(easyhandle, CURLOPT_INFILESIZE_LARGE, file_size);
- When you call curl_easy_perform() this time, it'll perform all the necessary
- operations and when it has invoked the upload it'll call your supplied
- callback to get the data to upload. The program should return as much data as
- possible in every invoke, as that is likely to make the upload perform as
- fast as possible. The callback should return the number of bytes it wrote in
- the buffer. Returning 0 will signal the end of the upload.
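- The read callback contract described above can be sketched as follows. This
- is only an example under stated assumptions: the data to upload already sits
- in memory, and 'struct upload_state' is a hypothetical type invented for the
- sketch, not something libcurl provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical application struct describing what is left to upload. */
struct upload_state {
  const char *data; /* next byte to hand over */
  size_t left;      /* number of bytes not yet handed over */
};

/* Matches the read callback prototype. Fills 'bufptr' with at most
   size*nitems bytes and returns how many bytes it wrote there. */
size_t read_function(char *bufptr, size_t size, size_t nitems, void *userp)
{
  struct upload_state *up = userp;
  size_t room = size * nitems;
  size_t chunk;

  assert(up != NULL);

  /* hand over as much as fits in this invoke */
  chunk = up->left < room ? up->left : room;
  if(chunk > 0) {
    memcpy(bufptr, up->data, chunk);
    up->data += chunk;
    up->left -= chunk;
  }
  return chunk; /* becomes 0 once everything is sent: end of upload */
}
```

- The address of a filled-in struct upload_state would be passed as the custom
- pointer with CURLOPT_INFILE, next to the CURLOPT_READFUNCTION call.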
- Passwords
- Many protocols use or even require that user name and password are provided
- to be able to download or upload the data of your choice. libcurl offers
- several ways to specify them.
- Most protocols support that you specify the name and password in the URL
- itself. libcurl will detect this and use them accordingly. This is written
- like this:
- protocol://user:password@example.com/path/
- If you need any odd letters in your user name or password, you should enter
- them URL encoded, as %XX where XX is a two-digit hexadecimal number.
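- To make the %XX notation concrete, here is a small sketch of an encoder.
- The function name 'url_encode' and its choice of which characters to leave
- untouched (letters, digits and "-_.~") are assumptions made for this
- example. It is not a description of how libcurl itself encodes anything.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: writes the %XX-encoded form of 'in' to 'out'.
   Letters, digits and "-_.~" are copied as-is, every other byte is encoded.
   Returns 0 on success, -1 if 'out' is too small. */
int url_encode(const char *in, char *out, size_t outsize)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t inlen;
  size_t i;
  size_t o = 0;

  assert(in != NULL && out != NULL);

  inlen = strlen(in);
  for(i = 0; i < inlen; i++) {
    unsigned char c = (unsigned char)in[i];
    int plain = isalnum(c) || strchr("-_.~", c) != NULL;
    size_t need = plain ? 1 : 3;

    if(o + need >= outsize)
      return -1; /* no room for this byte plus the final zero */
    if(plain)
      out[o++] = (char)c;
    else {
      out[o++] = '%';
      out[o++] = hex[c >> 4];  /* high four bits as a hex digit */
      out[o++] = hex[c & 15];  /* low four bits as a hex digit */
    }
  }
  if(o >= outsize)
    return -1;
  out[o] = '\0';
  return 0;
}
```

- With this sketch, a password such as "p@ss:w/rd" would be written in the URL
- as "p%40ss%3Aw%2Frd", so that the '@', ':' and '/' inside it are not mistaken
- for the separators of the URL itself.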
- libcurl also provides options to set various passwords. The user name and
- password as shown embedded in the URL can instead get set with the
- CURLOPT_USERPWD option. The argument passed to libcurl should be a char * to
- a string in the format "user:password". In a manner like this:
- curl_easy_setopt(easyhandle, CURLOPT_USERPWD, "myname:thesecret");
- Another case where name and password might be needed at times, is for those
- users who need to authenticate themselves to a proxy they use. libcurl offers
- another option for this, the CURLOPT_PROXYUSERPWD. It is used quite similarly
- to the CURLOPT_USERPWD option like this:
- curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "myname:thesecret");
-
- There's a long time unix "standard" way of storing ftp user names and
- passwords, namely in the $HOME/.netrc file. The file should be made private
- so that only the user may read it (see also the "Security Considerations"
- chapter), as it might contain the password in plain text. libcurl has the
- ability to use this file to figure out what set of user name and password to
- use for a particular host. As an extension to the normal functionality,
- libcurl also supports this file for non-FTP protocols such as HTTP. To make
- curl use this file, use the CURLOPT_NETRC option:
- curl_easy_setopt(easyhandle, CURLOPT_NETRC, TRUE);
- And a very basic example of what such a .netrc file may look like:
- machine myhost.mydomain.com
- login userlogin
- password secretword
- All these examples have been cases where the password has been optional, or
- at least you could leave it out and have libcurl attempt to do its job
- without it. There are times when the password isn't optional, like when
- you're using an SSL private key for secure transfers.
- To pass the known private key password to libcurl:
- curl_easy_setopt(easyhandle, CURLOPT_SSLKEYPASSWD, "keypassword");
- HTTP Authentication
- The previous chapter showed how to set user name and password for getting
- URLs that require authentication. When using the HTTP protocol, there are
- many different ways a client can provide those credentials to the server and
- you can control what way libcurl will (attempt to) use. The default HTTP
- authentication method is called 'Basic', which is sending the name and
- password in clear-text in the HTTP request, base64-encoded. This is insecure.
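- To see why Basic offers no protection, consider this sketch of the encoding
- step. Base64 is a reversible encoding, not encryption, so anyone who sees
- the request can recover the password. The helper name 'basic_credentials' is
- hypothetical. The code only illustrates the encoding and is not taken from
- libcurl's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: base64-encodes a "user:password" string into 'out',
   which must have room for 4 * ((strlen(userpwd) + 2) / 3) + 1 bytes. */
void basic_credentials(const char *userpwd, char *out)
{
  static const char tbl[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const unsigned char *in = (const unsigned char *)userpwd;
  size_t len;
  size_t i;

  assert(userpwd != NULL && out != NULL);

  len = strlen(userpwd);
  for(i = 0; i + 2 < len; i += 3) { /* full groups of three bytes */
    *out++ = tbl[in[i] >> 2];
    *out++ = tbl[((in[i] & 3) << 4) | (in[i + 1] >> 4)];
    *out++ = tbl[((in[i + 1] & 15) << 2) | (in[i + 2] >> 6)];
    *out++ = tbl[in[i + 2] & 63];
  }
  if(i < len) { /* one or two bytes left: pad the group with '=' */
    *out++ = tbl[in[i] >> 2];
    if(i + 1 < len) {
      *out++ = tbl[((in[i] & 3) << 4) | (in[i + 1] >> 4)];
      *out++ = tbl[(in[i + 1] & 15) << 2];
    }
    else {
      *out++ = tbl[(in[i] & 3) << 4];
      *out++ = '=';
    }
    *out++ = '=';
  }
  *out = '\0';
}
```

- The result is what follows "Authorization: Basic " in the request header,
- and decoding it takes no secret at all.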
- At the time of this writing libcurl can be built to use: Basic, Digest, NTLM,
- Negotiate, GSS-Negotiate and SPNEGO. You can tell libcurl which one to use
- with CURLOPT_HTTPAUTH as in:
- curl_easy_setopt(easyhandle, CURLOPT_HTTPAUTH, CURLAUTH_DIGEST);
- And when you send authentication to a proxy, you can also set authentication
- type the same way but instead with CURLOPT_PROXYAUTH:
- curl_easy_setopt(easyhandle, CURLOPT_PROXYAUTH, CURLAUTH_NTLM);
- Both these options allow you to set multiple types (by ORing them together),
- to make libcurl pick the most secure one out of the types the server/proxy
- claims to support. This method does however add a round-trip since libcurl
- must first ask the server what it supports:
- curl_easy_setopt(easyhandle, CURLOPT_HTTPAUTH,
- CURLAUTH_DIGEST|CURLAUTH_BASIC);
- For convenience, you can use the 'CURLAUTH_ANY' define (instead of a list
- with specific types) which allows libcurl to use whatever method it wants.
- When asking for multiple types, libcurl will pick the available one it
- considers "best" in its own internal order of preference.
- HTTP POSTing
- We get many questions regarding how to issue HTTP POSTs with libcurl the
- proper way. This chapter will thus include examples using both
- versions of HTTP POST that libcurl supports.
- The first version is the simple POST, the most common version, that most HTML
- pages using the <form> tag use. We provide a pointer to the data and tell
- libcurl to post it all to the remote site:
- char *data="name=daniel&project=curl";
- curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDS, data);
- curl_easy_setopt(easyhandle, CURLOPT_URL, "http://posthere.com/");
- curl_easy_perform(easyhandle); /* post away! */
- Simple enough, huh? Since you set the POST options with the
- CURLOPT_POSTFIELDS, this automatically switches the handle to use POST in the
- upcoming request.
- Ok, so what if you want to post binary data that also requires you to set the
- Content-Type: header of the post? Well, binary posts prevent libcurl from
- being able to do strlen() on the data to figure out the size, so therefore we
- must tell libcurl the size of the post data. Setting headers in libcurl
- requests is done in a generic way, by building a list of our own headers and
- then passing that list to libcurl.
- struct curl_slist *headers=NULL;
- headers = curl_slist_append(headers, "Content-Type: text/xml");
- /* post binary data */
- curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDS, binaryptr);
- /* set the size of the postfields data */
- curl_easy_setopt(easyhandle, CURLOPT_POSTFIELDSIZE, 23);
- /* pass our list of custom made headers */
- curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers);
- curl_easy_perform(easyhandle); /* post away! */
- curl_slist_free_all(headers); /* free the header list */
- While the simple examples above cover the majority of all cases where HTTP
- POST operations are required, they don't do multipart formposts. Multipart
- formposts were introduced as a better way to post (possibly large) binary
- data and were first documented in RFC 1867. They're called multipart
- because they're built by a chain of parts, each being a single unit. Each
- part has its own name and contents. You can in fact create and post a
- multipart formpost with the regular libcurl POST support described above, but
- that would require that you build a formpost yourself and provide it to
- libcurl. To make that easier, libcurl provides curl_formadd(). Using this
- function, you add parts to the form. When you're done adding parts, you post
- the whole form.
- The following example sets two simple text parts with plain textual contents,
- and then a file with binary contents and uploads the whole thing.
- struct curl_httppost *post=NULL;
- struct curl_httppost *last=NULL;
- curl_formadd(&post, &last,
- CURLFORM_COPYNAME, "name",
- CURLFORM_COPYCONTENTS, "daniel", CURLFORM_END);
- curl_formadd(&post, &last,
- CURLFORM_COPYNAME, "project",
- CURLFORM_COPYCONTENTS, "curl", CURLFORM_END);
- curl_formadd(&post, &last,
- CURLFORM_COPYNAME, "logotype-image",
- CURLFORM_FILECONTENT, "curl.png", CURLFORM_END);
- /* Set the form info */
- curl_easy_setopt(easyhandle, CURLOPT_HTTPPOST, post);
- curl_easy_perform(easyhandle); /* post away! */
- /* free the post data again */
- curl_formfree(post);
- Multipart formposts are chains of parts using MIME-style separators and
- headers. It means that each one of these separate parts gets a few headers set
- that describe the individual content-type, size etc. To enable your
- application to handicraft this formpost even more, libcurl allows you to
- supply your own set of custom headers to such an individual form part. You
- can of course supply headers to as many parts as you like, but this little
- example will show how you set headers to one specific part when you add that
- to the post handle:
- struct curl_slist *headers=NULL;
- headers = curl_slist_append(headers, "Content-Type: text/xml");
- curl_formadd(&post, &last,
- CURLFORM_COPYNAME, "logotype-image",
- CURLFORM_FILECONTENT, "curl.xml",
- CURLFORM_CONTENTHEADER, headers,
- CURLFORM_END);
- curl_easy_perform(easyhandle); /* post away! */
- curl_formfree(post); /* free post */
- curl_slist_free_all(headers); /* free custom header list */
- Since all options on an easyhandle are "sticky", they remain the same until
- changed even if you do call curl_easy_perform(), you may need to tell curl to
- go back to a plain GET request if you intend to do such a one as your next
- request. You force an easyhandle to go back to GET by using the CURLOPT_HTTPGET
- option:
- curl_easy_setopt(easyhandle, CURLOPT_HTTPGET, TRUE);
- Just setting CURLOPT_POSTFIELDS to "" or NULL will *not* stop libcurl from
- doing a POST. It will just make it POST without any data to send!
- Showing Progress
- For historical and traditional reasons, libcurl has a built-in progress meter
- that can be switched on and then presents a progress meter in your
- terminal.
- Switch on the progress meter by, oddly enough, setting CURLOPT_NOPROGRESS to
- FALSE. This option is set to TRUE by default.
- For most applications however, the built-in progress meter is useless and
- what instead is interesting is the ability to specify a progress
- callback. The function pointer you pass to libcurl will then be called at
- irregular intervals with information about the current transfer.
- Set the progress callback by using CURLOPT_PROGRESSFUNCTION. And pass a
- pointer to a function that matches this prototype:
- int progress_callback(void *clientp,
- double dltotal,
- double dlnow,
- double ultotal,
- double ulnow);
- If any of the input arguments is unknown, a 0 will be passed. The first
- argument, the 'clientp' is the pointer you pass to libcurl with
- CURLOPT_PROGRESSDATA. libcurl won't touch it.
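- A progress callback along those lines might look like the sketch below. The
- 'struct progress_info' type is a hypothetical name for this example. The
- sketch also assumes the program only cares about the download figures and
- wants a percentage rather than raw byte counts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical application struct that receives the latest figure. */
struct progress_info {
  double percent; /* 0..100, or -1 while the total size is unknown */
};

/* Matches the progress callback prototype. 'clientp' is the pointer the
   program set with CURLOPT_PROGRESSDATA. */
int progress_callback(void *clientp,
                      double dltotal,
                      double dlnow,
                      double ultotal,
                      double ulnow)
{
  struct progress_info *info = clientp;

  assert(info != NULL);
  (void)ultotal; /* upload figures are not used in this sketch */
  (void)ulnow;

  if(dltotal > 0)
    info->percent = 100.0 * dlnow / dltotal;
  else
    info->percent = -1.0; /* libcurl passes 0 for unknown values */

  return 0; /* a non-zero return would make libcurl abort the transfer */
}
```

- Remember that the callback is only invoked when CURLOPT_NOPROGRESS has been
- set to FALSE.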
- libcurl with C++
- There's basically only one thing to keep in mind when using C++ instead of C
- when interfacing libcurl:
- "The Callbacks Must Be Plain C"
- So if you want a write callback set in libcurl, you should put it within
- 'extern "C"'. Similar to this:
- extern "C" {
- size_t write_data(void *ptr, size_t size, size_t nmemb,
- void *ourpointer)
- {
- /* do what you want with the data */
- return size * nmemb; /* report every byte as handled */
- }
- }
- This will of course effectively turn the callback code into C. There won't be
- any "this" pointer available etc.
- Proxies
- What "proxy" means according to Merriam-Webster: "a person authorized to act
- for another" but also "the agency, function, or office of a deputy who acts
- as a substitute for another".
- Proxies are exceedingly common these days. Companies often only offer
- internet access to employees through their HTTP proxies. Network clients or
- user-agents ask the proxy for documents, the proxy does the actual request
- and then it returns them.
- libcurl has full support for HTTP proxies, so when a given URL is wanted,
- libcurl will ask the proxy for it instead of trying to connect to the actual
- host identified in the URL.
- The fact that the proxy is a HTTP proxy puts certain restrictions on what can
- actually happen. A requested URL that might not be a HTTP URL will still
- be passed to the HTTP proxy to deliver back to libcurl. This happens
- transparently, and an application may not need to know. I say "may", because
- at times it is very important to understand that all operations over a HTTP
- proxy are using the HTTP protocol. For example, you can't invoke your own
- custom FTP commands or even proper FTP directory listings.
- Proxy Options
- To tell libcurl to use a proxy at a given port number:
- curl_easy_setopt(easyhandle, CURLOPT_PROXY, "proxy-host.com:8080");
- Some proxies require user authentication before allowing a request, and
- you pass that information similar to this:
- curl_easy_setopt(easyhandle, CURLOPT_PROXYUSERPWD, "user:password");
- If you want to, you can specify the host name only in the CURLOPT_PROXY
- option, and set the port number separately with CURLOPT_PROXYPORT.
- Environment Variables
- libcurl automatically checks and uses a set of environment variables to know
- what proxies to use for certain protocols. The names of the variables are
- following an ancient de facto standard and are built up as
- "[protocol]_proxy" (note the lower casing). Which makes the variable
- 'http_proxy' checked for a name of a proxy to use when the input URL is
- HTTP. Following the same rule, the variable named 'ftp_proxy' is checked
- for FTP URLs. Again, the proxies are always HTTP proxies, the different
- names of the variables simply allows different HTTP proxies to be used.
- The proxy environment variable contents should be in the format
- "[protocol://]machine[:port]". Where the protocol:// part is simply
- ignored if present (so http://proxy and bluerk://proxy will do the same)
- and the optional port number specifies on which port the proxy operates on
- the host. If not specified, the internal default port number will be used
- and that is most likely *not* the one you would like it to be.
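- The format above can be taken apart with a few string operations. The
- following is a simplified sketch, not libcurl's own parser: the function
- name 'parse_proxy' is hypothetical, and it makes no attempt to handle user
- names or numerical IPv6 addresses in the string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits "[protocol://]machine[:port]" into a host
   name and a port. Returns 0 on success, -1 if the host is empty or does
   not fit. 'port' is left untouched when the string has no port number. */
int parse_proxy(const char *value, char *host, size_t hostsize, long *port)
{
  const char *start;
  size_t hostlen;

  assert(value != NULL && host != NULL && port != NULL);

  start = strstr(value, "://");
  start = start ? start + 3 : value; /* skip and ignore the protocol part */

  hostlen = strcspn(start, ":/");    /* host ends at a port or a slash */
  if(hostlen == 0 || hostlen >= hostsize)
    return -1;

  memcpy(host, start, hostlen);
  host[hostlen] = '\0';

  if(start[hostlen] == ':')
    *port = strtol(start + hostlen + 1, NULL, 10);
  return 0;
}
```

- A caller would set 'port' to whatever fallback it prefers before the call,
- so that a variable without a port number leaves that fallback in place.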
- There are two special environment variables. 'all_proxy' is what sets
- proxy for any URL in case the protocol specific variable wasn't set, and
- 'no_proxy' defines a list of hosts that should not use a proxy even though
- a variable may say so. If 'no_proxy' is a plain asterisk ("*") it matches
- all hosts.
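- How a 'no_proxy' check could work is sketched below. Two assumptions are
- made here that the text above does not state: the list is separated by
- commas (optionally followed by spaces), and a host must match an entry
- exactly. libcurl's real matching rules may be more lenient, and the function
- name 'bypass_proxy' is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if 'host' should bypass the proxy
   according to the 'no_proxy' list, 0 otherwise.
   Assumption: entries are compared as exact, case-sensitive host names. */
int bypass_proxy(const char *no_proxy, const char *host)
{
  size_t hostlen;

  assert(host != NULL);

  if(no_proxy == NULL)
    return 0; /* variable not set: nothing bypasses the proxy */
  if(strcmp(no_proxy, "*") == 0)
    return 1; /* a plain asterisk matches all hosts */

  hostlen = strlen(host);
  while(*no_proxy) {
    size_t len = strcspn(no_proxy, ", ");
    if(len == hostlen && strncmp(no_proxy, host, len) == 0)
      return 1;
    no_proxy += len;
    no_proxy += strspn(no_proxy, ", "); /* skip the separators */
  }
  return 0;
}
```

- A program would typically feed it the result of getenv("no_proxy") together
- with the host name taken from the URL.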
- SSL and Proxies
- SSL is for secure point-to-point connections. This involves strong
- encryption and similar things, which effectively makes it impossible for a
- proxy to operate as a "man in between" which the proxy's task is, as
- previously discussed. Instead, the only way to have SSL work over a HTTP
- proxy is to ask the proxy to tunnel through everything without being able
- to check or fiddle with the traffic.
- Opening an SSL connection over a HTTP proxy is therefore a matter of asking
- the proxy for a straight connection to the target host on a specified
- port. This is made with the HTTP request CONNECT. ("please mr proxy,
- connect me to that remote host").
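To make the CONNECT request a bit more concrete, here is a sketch of what a client sends to the proxy to ask for a tunnel. build_connect_request() is a hypothetical helper made up for this illustration. The exact request line and headers libcurl sends may differ (for example in the HTTP version used), so treat the output as an example of the request's shape only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the CONNECT request a client sends to an
   HTTP proxy to ask for a tunnel to host:port. Returns the request
   length, or -1 if the host is empty or the buffer is too small. */
int build_connect_request(char *buf, size_t buflen,
                          const char *host, int port)
{
  int n;

  assert(buf && host);
  if(strlen(host) == 0)
    return -1;

  n = snprintf(buf, buflen,
               "CONNECT %s:%d HTTP/1.1\r\n"
               "Host: %s:%d\r\n"
               "\r\n",
               host, port, host, port);

  /* snprintf reports the length it wanted; reject truncated output */
  return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

Once the proxy answers with a success response, everything sent on the connection is passed through to the remote host untouched, which is what lets the SSL handshake take place end to end.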
- Because of the nature of this operation, where the proxy has no idea what
- kind of data is passed in and out through this tunnel, this breaks
- some of the very few advantages that come from using a proxy, such as
- caching. Many organizations prevent this kind of tunneling to other
- destination port numbers than 443 (which is the default HTTPS port
- number).
- Tunneling Through Proxy
- As explained above, tunneling is required for SSL to work and is often
- restricted to the operation intended for SSL: HTTPS.
- This is however not the only time proxy-tunneling might offer benefits to
- you or your application.
- As tunneling opens a direct connection from your application to the remote
- machine, it suddenly also re-introduces the ability to do non-HTTP
- operations over a HTTP proxy. You can in fact use things such as FTP
- upload or FTP custom commands this way.
- Again, this is often prevented by the administrators of proxies and is
- rarely allowed.
- Tell libcurl to use proxy tunneling like this:
- curl_easy_setopt(easyhandle, CURLOPT_HTTPPROXYTUNNEL, TRUE);
- In fact, there might even be times when you want to do plain HTTP
- operations using a tunnel like this, as it then enables you to operate on
- the remote server instead of asking the proxy to do so. libcurl will not
- stand in the way for such innovative actions either!
- Proxy Auto-Config
- Netscape first came up with this. It is basically a web page (usually
- using a .pac extension) with a javascript that, when executed by the
- browser with the requested URL as input, returns information to the
- browser on how to connect to the URL. The returned information might be
- "DIRECT" (which means no proxy should be used), "PROXY host:port" (to tell
- the browser where the proxy for this particular URL is) or "SOCKS
- host:port" (to direct the browser to a SOCKS proxy).
- libcurl has no means to interpret or evaluate javascript and thus it
- doesn't support this. If you get yourself in a position where you face
- this nasty invention, the following advice has been mentioned and used in
- the past:
- - Depending on the javascript complexity, write up a script that
- translates it to another language and execute that.
- - Read the javascript code and rewrite the same logic in another language.
- - Implement a javascript interpreter; people have successfully used the
- Mozilla javascript engine in the past.
- - Ask your admins to stop this, for a static proxy setup or similar.
- Persistence Is The Way to Happiness
- Re-cycling the same easy handle several times when doing multiple requests is
- the way to go.
- After each single curl_easy_perform() operation, libcurl will keep the
- connection alive and open. A subsequent request using the same easy handle to
- the same host might just be able to use the already open connection! This
- reduces network impact a lot.
- Even if the connection is dropped, all subsequent connections involving
- SSL to the same host will benefit from libcurl's session ID cache, which
- drastically reduces re-connection time.
- FTP connections that are kept alive save a lot of time, as the
- command-response roundtrips are skipped, and you also don't risk being
- refused when you try to log in again, as can happen on the many FTP
- servers that only allow N persons to be logged in at the same time.
- libcurl caches DNS name resolving results, to make lookups of a previously
- looked up name a lot faster.
- Other interesting details that improve performance for subsequent requests
- may also be added in the future.
- Each easy handle will attempt to keep the last few connections alive for a
- while in case they are to be used again. You can set the size of this "cache"
- with the CURLOPT_MAXCONNECTS option. Default is 5. There is very seldom any
- point in changing this value, and if you think of changing this it is often
- just a matter of thinking again.
- When the connection cache gets filled, libcurl must close an existing
- connection in order to get room for the new one. To know which connection to
- close, libcurl uses a "close policy" that you can affect with the
- CURLOPT_CLOSEPOLICY option. There are only two policies implemented as of
- this writing (libcurl 7.9.4) and they are:
- CURLCLOSEPOLICY_LEAST_RECENTLY_USED simply closes the one that hasn't been
- used for the longest time. This is the default behavior.
- CURLCLOSEPOLICY_OLDEST closes the oldest connection, the one that was
- created the longest time ago.
- There are, or at least were, plans to support a close policy that would call
- a user-specified callback to let the user decide which connection to dump
- when this is necessary, which is why CURLOPT_CLOSEFUNCTION is still an
- existing option today. Nothing ever uses this though and this will not
- be used within the foreseeable future either.
- To force your upcoming request to not use an already existing connection (it
- will even close one first if there happens to be one alive to the same host
- you're about to operate on), set CURLOPT_FRESH_CONNECT to TRUE. In a
- similar spirit, you can also forbid the connection used by the upcoming
- request from "lying" around and possibly getting re-used after the request
- by setting CURLOPT_FORBID_REUSE to TRUE.
- HTTP Headers Used by libcurl
- When you use libcurl to do HTTP requests, it'll pass along a series of
- headers automatically. It might be good for you to know and understand
- these.
- Host
- This header is required by HTTP 1.1 and even many 1.0 servers and should
- be the name of the server we want to talk to. This includes the port
- number if anything but default.
- Pragma
- "no-cache". Tells a possible proxy to not grab a copy from the cache but
- to fetch a fresh one.
- Accept:
- "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*". Cloned from a
- browser once a hundred years ago.
- Expect:
- When doing multi-part formposts, libcurl will set this header to
- "100-continue" to ask the server for an "OK" message before it proceeds
- with sending the data part of the post.
- Customizing Operations
- There is an ongoing development today where more and more protocols are built
- upon HTTP for transport. This has obvious benefits as HTTP is a tested and
- reliable protocol that is widely deployed and has excellent proxy-support.
- When you use one of these protocols, and even when doing other kinds of
- programming you may need to change the traditional HTTP (or FTP or...)
- manners. You may need to change words, headers or various data.
- libcurl is your friend here too.
- CUSTOMREQUEST
- If just changing the actual HTTP request keyword is what you want, like
- when GET, HEAD or POST is not good enough for you, CURLOPT_CUSTOMREQUEST
- is there for you. It is very simple to use:
- curl_easy_setopt(easyhandle, CURLOPT_CUSTOMREQUEST, "MYOWNREQUEST");
- When using the custom request, you change the request keyword of the
- actual request you are performing. Thus, by default you make a GET request
- but you can also make a POST operation (as described before) and then
- replace the POST keyword if you want to. You're the boss.
- Modify Headers
- HTTP-like protocols pass a series of headers to the server when doing the
- request, and you're free to pass any amount of extra headers that you
- think fit. Adding headers is this easy:
- struct curl_slist *headers=NULL; /* init to NULL is important */
- headers = curl_slist_append(headers, "Hey-server-hey: how are you?");
- headers = curl_slist_append(headers, "X-silly-content: yes");
- /* pass our list of custom made headers */
- curl_easy_setopt(easyhandle, CURLOPT_HTTPHEADER, headers);
- curl_easy_perform(easyhandle); /* transfer http */
- curl_slist_free_all(headers); /* free the header list */
- ... and if you think some of the internally generated headers, such as
- Accept: or Host: don't contain the data you want them to contain, you can
- replace them by simply setting them too:
- headers = curl_slist_append(headers, "Accept: Agent-007");
- headers = curl_slist_append(headers, "Host: munged.host.line");
- Delete Headers
- If you replace an existing header with one with no contents, you will
- prevent the header from being sent. For example, if you want to completely
- prevent the "Accept:" header from being sent, you can disable it with code
- similar to this:
- headers = curl_slist_append(headers, "Accept:");
- Both replacing and cancelling internal headers should be done with careful
- consideration and you should be aware that you may violate the HTTP
- protocol when doing so.
- Enforcing chunked transfer-encoding
- By making sure a request uses the custom header "Transfer-Encoding:
- chunked" when doing a non-GET HTTP operation, libcurl will switch over to
- "chunked" upload, even though the size of the data to upload might be
- known. By default, libcurl usually switches over to chunked upload
- automatically if the upload data size is unknown.
- HTTP Version
- There's only one aspect left in the HTTP requests that we haven't yet
- mentioned how to modify: the version field. All HTTP requests include the
- version number to tell the server which version we support. libcurl speaks
- HTTP 1.1 by default. Some very old servers don't like getting 1.1-requests
- and when dealing with stubborn old things like that, you can tell libcurl
- to use 1.0 instead by doing something like this:
- curl_easy_setopt(easyhandle, CURLOPT_HTTP_VERSION,
- CURLHTTP_VERSION_1_0);
- FTP Custom Commands
- Not all protocols are HTTP-like, and thus the above may not help you when
- you want to make, for example, your FTP transfers behave differently.
- Sending custom commands to a FTP server means that you need to send the
- commands exactly as the FTP server expects them (RFC959 is a good guide
- here), and you can only use commands that work on the control-connection
- alone. All kinds of commands that require data interchange and thus need
- a data-connection must be left to libcurl's own judgement. Also be aware
- that libcurl will do its very best to change directory to the target
- directory before doing any transfer, so if you change directory (with CWD
- or similar) you might confuse libcurl and then it might not attempt to
- transfer the file in the correct remote directory.
- A little example that deletes a given file before an operation:
- headers = curl_slist_append(headers, "DELE file-to-remove");
- /* pass the list of custom commands to the handle */
- curl_easy_setopt(easyhandle, CURLOPT_QUOTE, headers);
- curl_easy_perform(easyhandle); /* transfer ftp data! */
- curl_slist_free_all(headers); /* free the header list */
- If you would instead want this operation (or chain of operations) to
- happen _after_ the data transfer took place the option to
- curl_easy_setopt() would instead be called CURLOPT_POSTQUOTE and used the
- exact same way.
- The custom FTP commands will be issued to the server in the same order they
- are added to the list, and if a command gets an error code returned back
- from the server, no more commands will be issued and libcurl will bail out
- with an error code (CURLE_FTP_QUOTE_ERROR). Note that if you use
- CURLOPT_QUOTE to send commands before a transfer, no transfer will
- actually take place when a quote command has failed.
- If you set the CURLOPT_HEADER to true, you will tell libcurl to get
- information about the target file and output "headers" about it. The
- headers will be in "HTTP-style", looking like they do in HTTP.
- The option to enable headers or to run custom FTP commands may be useful
- to combine with CURLOPT_NOBODY. If this option is set, no actual file
- content transfer will be performed.
- FTP Custom CUSTOMREQUEST
- If you do want to list the contents of a FTP directory using your own defined
- FTP command, CURLOPT_CUSTOMREQUEST will do just that. "NLST" is the
- default one for listing directories but you're free to pass in your idea
- of a good alternative.
- Cookies Without Chocolate Chips
- In the HTTP sense, a cookie is a name with an associated value. A server
- sends the name and value to the client, and expects it to get sent back on
- every subsequent request to the server that matches the particular conditions
- set. The conditions include that the domain name and path match and that the
- cookie hasn't become too old.
- In real-world cases, servers send new cookies to replace existing ones to
- update them. Servers use cookies to "track" users and to keep "sessions".
- Cookies are sent from server to clients with the header Set-Cookie: and
- they're sent from clients to servers with the Cookie: header.
- To just send whatever cookie you want to a server, you can use CURLOPT_COOKIE
- to set a cookie string like this:
- curl_easy_setopt(easyhandle, CURLOPT_COOKIE, "name1=var1; name2=var2;");
- In many cases, that is not enough. You might want to dynamically save
- whatever cookies the remote server passes to you, and make sure those
- cookies are then used accordingly on later requests.
- One way to do this, is to save all headers you receive in a plain file and
- when you make a request, you tell libcurl to read the previous headers to
- figure out which cookies to use. Set the header file to read cookies from
- with CURLOPT_COOKIEFILE.
- The CURLOPT_COOKIEFILE option also automatically enables the cookie parser
- in libcurl. Until the cookie parser is enabled, libcurl will not parse or
- understand incoming cookies and they will just be ignored. However, when the
- parser is enabled the cookies will be understood, kept in memory and used
- properly in subsequent requests when the same handle is used. Many times
- this is enough, and you may not have to save the cookies to disk at all.
- Note that the file you specify to CURLOPT_COOKIEFILE doesn't have to exist
- to enable the parser, so a common way to just enable the parser and not
- read any cookies is to use a file name you know doesn't exist.
- If you would rather use existing cookies that you've previously received
- with your Netscape or Mozilla browsers, you can make libcurl use that
- cookie file as input. The CURLOPT_COOKIEFILE is used for that too, as
- libcurl will automatically find out what kind of file it is and act
- accordingly.
- The perhaps most advanced cookie operation libcurl offers, is saving the
- entire internal cookie state back into a Netscape/Mozilla formatted cookie
- file. We call that the cookie-jar. When you set a file name with
- CURLOPT_COOKIEJAR, that file will be created and all received cookies
- will be stored in it when curl_easy_cleanup() is called. This enables cookies
- to get passed on properly between multiple handles without any information
- getting lost.
- FTP Peculiarities We Need
- FTP transfers use a second TCP/IP connection for the data transfer. This is
- usually a fact you can forget and ignore but at times this fact will come
- back to haunt you. libcurl offers several different ways to customize how
- the second connection is being made.
- libcurl can either connect to the server a second time or tell the server to
- connect back to it. The first option is the default and it is also what works
- best for all the people behind firewalls, NATs or IP-masquerading setups.
- libcurl then tells the server to open up a new port and wait for a second
- connection. This is by default attempted with EPSV first, and if that doesn't
- work it tries PASV instead. (EPSV is an extension to the original FTP spec
- and does not exist nor work on all FTP servers.)
- You can prevent libcurl from first trying the EPSV command by setting
- CURLOPT_FTP_USE_EPSV to FALSE.
- In some cases, you will prefer to have the server connect back to you for the
- second connection. This might be when the server is perhaps behind a firewall
- or something and only allows connections on a single port. libcurl then
- informs the remote server which IP address and port number to connect to.
- This is done with the CURLOPT_FTPPORT option. If you set it to "-", libcurl
- will use your system's "default IP address". If you want to use a particular
- IP, you can set the full IP address, a host name to resolve to an IP address
- or even a local network interface name that libcurl will get the IP address
- from.
- When doing the "PORT" approach, libcurl will attempt to use the EPRT and the
- LPRT before trying PORT, as they work with more protocols. You can disable
- this behavior by setting CURLOPT_FTP_USE_EPRT to FALSE.
- Headers Equal Fun
- Some protocols provide "headers", meta-data separated from the normal
- data. These headers are by default not included in the normal data stream,
- but you can make them appear in the data stream by setting CURLOPT_HEADER to
- TRUE.
- What might be even more useful, is libcurl's ability to separate the headers
- from the data and thus make the callbacks differ. You can for example set a
- different pointer to pass to the ordinary write callback by setting
- CURLOPT_WRITEHEADER.
- Or, you can set an entirely separate function to receive the headers, by
- using CURLOPT_HEADERFUNCTION.
- The headers are passed to the callback function one by one, and you can
- depend on that fact. It makes it easier for you to add custom header parsers
- etc.
- "Headers" for FTP transfers equal all the FTP server responses. They aren't
- actually true headers, but in this case we pretend they are! ;-)
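As a rough sketch of the separate header callback described above, the function below counts header lines and picks out a Content-Length value. The struct header_info type and the has_prefix() helper are made up for this example. The callback uses the same prototype as the ordinary write callback and relies on receiving exactly one header line per call. It would be registered with calls such as curl_easy_setopt(easyhandle, CURLOPT_HEADERFUNCTION, header_callback) and curl_easy_setopt(easyhandle, CURLOPT_WRITEHEADER, &info).

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical state handed to the callback via CURLOPT_WRITEHEADER. */
struct header_info {
  int count;             /* header lines seen so far */
  long content_length;   /* -1 until a Content-Length header shows up */
};

/* Case-insensitive check that 'line' (len bytes) starts with 'name'. */
static int has_prefix(const char *line, size_t len, const char *name)
{
  size_t i, n = strlen(name);
  if(len < n)
    return 0;
  for(i = 0; i < n; i++)
    if(tolower((unsigned char)line[i]) != tolower((unsigned char)name[i]))
      return 0;
  return 1;
}

/* Called with one header line at a time; returns the number of bytes
   handled so that the transfer continues. */
size_t header_callback(void *ptr, size_t size, size_t nmemb, void *userdata)
{
  static const char name[] = "Content-Length:";
  struct header_info *info = userdata;
  const char *line = ptr;
  size_t len = size * nmemb;

  assert(info != NULL);
  info->count++;

  if(has_prefix(line, len, name)) {
    /* the line is not NUL-terminated, so copy the value before parsing */
    char value[32];
    size_t vlen = len - (sizeof(name) - 1);
    if(vlen >= sizeof(value))
      vlen = sizeof(value) - 1;
    memcpy(value, line + sizeof(name) - 1, vlen);
    value[vlen] = '\0';
    info->content_length = strtol(value, NULL, 10);
  }
  return len;
}
```

Because the callback never assumes the data is NUL-terminated, it can also be exercised directly with plain strings, without doing any transfer.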
- Post Transfer Information
- [ curl_easy_getinfo ]
- Security Considerations
- libcurl is in itself not insecure. If used the right way, you can use libcurl
- to transfer data pretty safely.
- There are of course many things to consider that may loosen up this
- situation:
- Command Lines
- If you use a command line tool (such as curl) that uses libcurl, and you
- give options to the tool on the command line, those options can very likely
- get read by other users of your system when they use 'ps' or other tools
- to list currently running processes.
- To avoid this problem, never feed sensitive things to programs using
- command line options.
- .netrc
- .netrc is a pretty handy file/feature that allows you to login quickly and
- automatically to frequently visited sites. The file contains passwords in
- clear text and is a real security risk. In some cases, your .netrc is also
- stored in a home directory that is NFS mounted or used on another network
- based file system, so the clear text password will fly through your
- network every time anyone reads that file!
- To avoid this problem, don't use .netrc files and never store passwords in
- plain text anywhere.
- Clear Text Passwords
- Many of the protocols libcurl supports send name and password unencrypted
- as clear text (HTTP Basic authentication, FTP, TELNET etc). It is very
- easy for anyone on your network or a network nearby yours, to just fire up
- a network analyzer tool and eavesdrop on your passwords. Don't let the fact
- that HTTP uses base64 encoded passwords fool you. They may not look
- readable at a first glance, but they are very easily "deciphered" by anyone
- within seconds.
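To show how little protection base64 offers, the sketch below decodes such a string in a few lines. b64_value() and b64_decode() are hypothetical helpers written for this illustration, not libcurl functions, and the decoder is deliberately minimal: it stops at the first character outside the base64 alphabet (such as the '=' padding) and does no further validation.

```c
#include <assert.h>
#include <string.h>

/* Map one base64 character to its 6-bit value, or -1 if it is not part
   of the alphabet (this includes the '=' padding character). */
static int b64_value(char c)
{
  static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  const char *p = c ? strchr(alphabet, c) : NULL;
  return p ? (int)(p - alphabet) : -1;
}

/* Decode base64 text into 'out' and NUL-terminate it. Returns the number
   of decoded bytes, or -1 if 'out' is too small. */
int b64_decode(const char *in, char *out, size_t outlen)
{
  unsigned int bits = 0;   /* accumulates 6 bits per input character */
  int nbits = 0;           /* number of not yet emitted bits in 'bits' */
  size_t n = 0;

  assert(in && out);
  for(; *in; in++) {
    int v = b64_value(*in);
    if(v < 0)
      break;
    bits = (bits << 6) | (unsigned int)v;
    nbits += 6;
    if(nbits >= 8) {
      nbits -= 8;
      if(n + 1 >= outlen)  /* keep room for the terminating NUL */
        return -1;
      out[n++] = (char)((bits >> nbits) & 0xff);
    }
  }
  if(n >= outlen)
    return -1;
  out[n] = '\0';
  return (int)n;
}
```

Feeding it the value taken from a captured "Authorization: Basic" header gives back the "user:password" pair directly.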
- To avoid this problem, use protocols that don't let snoopers see your
- password: HTTPS, FTPS and FTP-kerberos are a few examples. HTTP Digest
- authentication allows this too, but isn't supported by libcurl as of this
- writing.
- Showing What You Do
- On a related issue, be aware that even in situations like when you have
- problems with libcurl and ask someone for help, everything you reveal in
- order to get best possible help might also pose certain security related
- risks. Host names, user names, paths, operating system specifics etc (not
- to mention passwords of course) may in fact be used by intruders to gain
- additional information of a potential target.
- To avoid this problem, you must of course use your common sense. Often,
- you can just edit out the sensitive data or just search/replace your true
- information with faked data.
- SSL, Certificates and Other Tricks
- [ seeding, passwords, keys, certificates, ENGINE, ca certs ]
- Multiple Transfers Using the multi Interface
- The easy interface as described in detail in this document is a synchronous
- interface that transfers one file at a time and doesn't return until it's
- done.
- The multi interface on the other hand, allows your program to transfer
- multiple files in both directions at the same time, without forcing you to
- use multiple threads.
- [fill in lots of more multi stuff here]
- Future
- [ sharing between handles, mutexes, pipelining ]
- -----
- Footnotes:
- [1] = libcurl 7.10.3 and later have the ability to switch over to chunked
- Transfer-Encoding in cases where HTTP uploads are done with data of an
- unknown size.
- [2] = This happens on Windows machines when libcurl is built and used as a
- DLL. However, you can still do this on Windows if you link with a static
- library.
- [3] = The curl-config tool is generated at build-time (on unix-like systems)
- and should be installed with the 'make install' or similar instruction
- that installs the library, header files, man pages etc.
|