- Updated for curl 7.9.1 on November 2, 2001
-       _   _ ____  _
-   ___| | | |  _ \| |
-  / __| | | | |_) | |
- | (__| |_| |  _ <| |___
-  \___|\___/|_| \_\_____|
- INTERNALS
- The project is split in two: the library and the client. The client part uses
- the library, but the library is designed to allow other applications to use
- it.
- The largest amount of code and complexity is in the library part.
- CVS
- ===
- All changes to the sources are committed to the CVS repository as soon as
- they're somewhat verified to work. Changes shall be committed as independently
- as possible so that individual changes can be spotted and tracked more easily
- afterwards.
- Tagging shall be used extensively, and by the time we release new archives we
- should tag the sources with a name similar to the released version number.
- Windows vs Unix
- ===============
- There are a few differences in how to program curl the unix way compared to
- the Windows way. The four perhaps most notable details are:
- 1. Different function names for socket operations.
- In curl, this is solved with defines and macros, so that the source looks
- the same at all places except for the header file that defines them. The
- macros in use are sclose(), sread() and swrite().
- 2. Windows requires a couple of init calls for the socket stuff.
- Those must be made by the application that uses libcurl, in curl that means
- src/main.c has some code #ifdef'ed to do just that.
- 3. The file descriptors for network communication and file operations are
- not easily interchangeable as in unix.
- We avoid this by not trying any funny tricks on file descriptors.
- 4. When writing data to stdout, Windows makes end-of-lines the DOS way, thus
- destroying binary data, although you do want that conversion if it is
- text coming through... (sigh)
- We set stdout to binary under Windows.
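As a rough sketch of point 1, a single header can map the three macro names onto whatever the platform provides. The mappings chosen below (recv(), send() and close() on unix-like systems, closesocket() on Windows) and the send_and_close() helper are assumptions made for illustration, not a copy of curl's actual header. This is the one place where a platform check is tolerated.

```c
#include <stddef.h>

#ifdef _WIN32
#include <winsock2.h>
/* Windows: sockets are not file descriptors and need their own calls */
#define sclose(fd)           closesocket(fd)
#define sread(fd, buf, len)  recv(fd, (char *)(buf), (int)(len), 0)
#define swrite(fd, buf, len) send(fd, (const char *)(buf), (int)(len), 0)
#else
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
/* unix-like systems: a socket is a plain file descriptor */
#define sclose(fd)           close(fd)
#define sread(fd, buf, len)  recv(fd, buf, len, 0)
#define swrite(fd, buf, len) send(fd, buf, len, 0)
#endif

/* Hypothetical helper: sends a buffer and closes the socket. It is written
   once for all platforms thanks to the macros above. Returns 0 on success. */
static int send_and_close(int fd, const char *data, size_t len)
{
  int ok = (swrite(fd, data, len) == (long)len);
  sclose(fd);
  return ok ? 0 : -1;
}
```

The rest of the source then only ever uses sclose(), sread() and swrite() and never needs to know which branch was compiled.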
- Inside the source code, we make an effort to avoid '#ifdef [Your OS]'. All
- conditionals that deal with features *should* instead be in the format
- '#ifdef HAVE_THAT_WEIRD_FUNCTION'. Since Windows can't run configure scripts,
- we maintain two config-win32.h files (one in lib/ and one in src/) that are
- supposed to look exactly like a config.h file would have looked on a
- Windows machine!
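A minimal sketch of a feature-based conditional, assuming a HAVE_STRDUP define that would come from config.h (or config-win32.h). The dup_string() wrapper is a hypothetical name used only for this example.

```c
#include <stdlib.h>
#include <string.h>

/* HAVE_STRDUP would normally be set by configure. It is left undefined
   here, so the portable fallback branch is the one that gets compiled. */

/* Hypothetical wrapper: duplicates a string with whatever is available. */
static char *dup_string(const char *src)
{
#ifdef HAVE_STRDUP
  return strdup(src);            /* the platform provides it */
#else
  size_t len = strlen(src) + 1;  /* fallback, including the trailing zero */
  char *copy = malloc(len);
  if(copy)
    memcpy(copy, src, len);
  return copy;
#endif
}
```

The condition asks "is this function available?" and not "which operating system is this?", so a new platform only needs a correct config.h.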
- Generally speaking: always remember that this will be compiled on dozens of
- operating systems. Don't walk on the edge.
- Library
- =======
- There are plenty of entry points to the library, namely each publicly defined
- function that libcurl offers to applications. All of those functions are
- rather small and easy-to-follow. All the ones prefixed with 'curl_easy' are
- put in the lib/easy.c file.
- curl_global_init() and curl_global_cleanup() should be called by the
- application to initialize and clean up global stuff in the library. As of
- today, it can handle the global SSL initing if SSL is enabled and it can init
- the socket layer on Windows machines. libcurl itself has no "global" scope.
- All printf()-style functions use the supplied clones in lib/mprintf.c. This
- makes sure we stay absolutely platform independent.
- curl_easy_init() allocates an internal struct and makes some initializations.
- The returned handle does not reveal internals. This is the 'SessionHandle'
- struct which works as an "anchor" struct for all curl_easy functions. All
- connections performed will get connect-specific data allocated that should be
- used for things related to particular connections/requests.
- curl_easy_setopt() takes three arguments: the handle followed by an option
- pair, the parameter-ID and the parameter-value. The list of options is
- documented in the man page. This function mainly sets things in the
- 'SessionHandle' struct.
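The opaque-handle and option-pair pattern described above can be sketched as follows. Every name here (TOY, toy_session, toy_easy_*, TOYOPT_*) is a hypothetical stand-in and the struct is far smaller than the real 'SessionHandle'. It only illustrates the shape of the interface.

```c
#include <stdarg.h>
#include <stdlib.h>

typedef void TOY;                      /* all the application gets to see */
enum { TOYOPT_URL, TOYOPT_VERBOSE };   /* hypothetical option IDs */

struct toy_session {                   /* known only inside the library */
  const char *url;
  long verbose;
};

TOY *toy_easy_init(void)
{
  return calloc(1, sizeof(struct toy_session));
}

/* Three arguments: the handle, the option ID and the option value. The
   type of the value depends on the option, hence the variable arguments. */
int toy_easy_setopt(TOY *handle, int option, ...)
{
  struct toy_session *data = handle;
  va_list ap;
  int rc = 0;

  va_start(ap, option);
  switch(option) {
  case TOYOPT_URL:
    data->url = va_arg(ap, const char *);
    break;
  case TOYOPT_VERBOSE:
    data->verbose = va_arg(ap, long);
    break;
  default:
    rc = 1;                            /* unknown option */
    break;
  }
  va_end(ap);
  return rc;
}

void toy_easy_cleanup(TOY *handle)
{
  free(handle);
}
```

Because the value travels through variable arguments, the caller has to pass exactly the type the option expects (note the 1L for a long in a call such as toy_easy_setopt(h, TOYOPT_VERBOSE, 1L)).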
- curl_easy_perform() does a whole lot of things:
- It starts off in the lib/easy.c file by calling Curl_perform() and the main
- work then continues in lib/url.c. The flow continues with a call to
- Curl_connect() to connect to the remote site.
- o Curl_connect()
- ... analyzes the URL, separates the different components and connects to
- the remote host. This may involve using a proxy and/or using SSL. The
- Curl_gethost() function in lib/hostip.c is used for looking up host names.
- When Curl_connect is done, we are connected to the remote site. Then it is
- time to tell the server to get a document/file. Curl_do() arranges this.
- This function makes sure there's an allocated and initiated 'connectdata'
- struct that is used for this particular connection only (although there may
- be several requests performed on the same connect). A bunch of things are
- inited/inherited from the SessionHandle struct.
- o Curl_do()
- Curl_do() makes sure the proper protocol-specific function is called. The
- functions are named after the protocols they handle. Curl_ftp(),
- Curl_http(), Curl_dict(), etc. They all reside in their respective files
- (ftp.c, http.c and dict.c). HTTPS is handled by Curl_http() and FTPS by
- Curl_ftp().
- The protocol-specific functions of course deal with protocol-specific
- negotiations and setup. They have access to the Curl_sendf() (from
- lib/sendf.c) function to send printf-style formatted data to the remote
- host and when they're ready to make the actual file transfer they call the
- Curl_Transfer() function (in lib/transfer.c) to set up the transfer and
- return.
- Starting in 7.9.1, if this DO function fails and the connection is being
- re-used, libcurl will then close this connection, set up a new connection
- and re-issue the DO request on that. This is because there is no way to be
- perfectly sure that we have discovered a dead connection before the DO
- function and thus we might wrongly be re-using a connection that was closed
- by the remote peer.
- o Transfer()
- Curl_perform() then calls Transfer() in lib/transfer.c that performs
- the entire file transfer.
- During transfer, the progress functions in lib/progress.c are called at a
- frequent interval (or at the user's choice, a specified callback might get
- called). The speedcheck functions in lib/speedcheck.c are also used to
- verify that the transfer is as fast as required.
- o Curl_done()
- Called after a transfer is done. This function takes care of everything
- that has to be done after a transfer. This function attempts to leave
- matters in a state so that Curl_do() should be possible to call again on
- the same connection (in a persistent connection case). The connection might
- also soon be closed with Curl_disconnect().
- o Curl_disconnect()
- When doing normal connections and transfers, no one ever tries to close any
- connections so this is not normally called when curl_easy_perform() is
- used. This function is only used when we are certain that no more transfers
- are going to be made on the connection. A connection can also be closed by
- force, or this function can be called to make sure that libcurl doesn't keep
- too many connections alive at the same time (there's a default amount of 5
- but that can be changed with the CURLOPT_MAXCONNECTS option).
- This function cleans up all resources that are associated with a single
- connection.
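The retry behaviour described for the DO step (a failure on a re-used connection leads to one more attempt on a fresh connection) could look roughly like this. The struct and the toy_* functions are hypothetical. Real connections obviously carry far more state than two flags.

```c
#include <stdbool.h>

struct toy_conn {
  bool reused;   /* picked from the connection cache */
  bool alive;    /* false if the remote peer has already closed it */
};

/* Stand-in for the protocol-specific DO function: fails on a dead peer. */
static int toy_do(struct toy_conn *conn)
{
  return conn->alive ? 0 : 1;
}

/* Stand-in for "close this connection and set up a new one". */
static void toy_reconnect(struct toy_conn *conn)
{
  conn->reused = false;
  conn->alive = true;
}

/* Runs the DO step and retries it once on a fresh connection when a
   re-used connection turns out to be dead. */
static int toy_do_with_retry(struct toy_conn *conn)
{
  int rc = toy_do(conn);
  if(rc && conn->reused) {
    toy_reconnect(conn);
    rc = toy_do(conn);
  }
  return rc;
}
```

A failure on a connection that was not re-used is reported right away, since a reconnect would not tell us anything new.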
- Curl_perform() is the function that does the main "connect - do - transfer -
- done" loop. It loops if there's a Location: to follow.
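A stripped-down model of that loop, assuming a request that simply points at the next request when the server answered with a Location:. The toy_* names, the step counter and the redirect limit are all inventions for this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a request and its possible redirect. */
struct toy_request {
  const char *url;
  const struct toy_request *location;  /* NULL when there is no Location: */
};

static int steps;   /* counts executed steps, for illustration only */

static int toy_connect(const struct toy_request *r)  { (void)r; steps++; return 0; }
static int toy_do(const struct toy_request *r)       { (void)r; steps++; return 0; }
static int toy_transfer(const struct toy_request *r) { (void)r; steps++; return 0; }
static int toy_done(const struct toy_request *r)     { (void)r; steps++; return 0; }

/* The "connect - do - transfer - done" loop. It runs once per URL and
   starts over for as long as there is a Location: to follow. */
static int toy_perform(const struct toy_request *req, int maxredirs,
                       const char **final_url)
{
  int followed = 0;

  for(;;) {
    int rc = toy_connect(req);
    if(!rc)
      rc = toy_do(req);
    if(!rc)
      rc = toy_transfer(req);
    if(!rc)
      rc = toy_done(req);
    if(rc)
      return rc;                 /* any failing step ends the loop */

    if(!req->location)
      break;                     /* nothing more to follow */
    if(followed++ >= maxredirs)
      return 1;                  /* too many redirects */
    req = req->location;
  }
  *final_url = req->url;
  return 0;
}
```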
- When completed, curl_easy_cleanup() should be called to free up used
- resources. It runs Curl_disconnect() on all open connections.
- A quick roundup on internal function sequences (many of these call
- protocol-specific function-pointers):
- curl_connect - connects to a remote site and does initial connect fluff
- This also checks for an existing connection to the requested site and uses
- that one if it is possible.
- curl_do - starts a transfer
- curl_transfer() - transfers data
- curl_done - ends a transfer
- curl_disconnect - disconnects from a remote site. This is called when the
- disconnect is really requested, which doesn't necessarily have to be
- exactly after curl_done in case we want to keep the connection open for
- a while.
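The "protocol-specific function-pointers" idea can be illustrated with a small lookup table. The table layout, the toy_* names and the strings the functions produce are assumptions for this sketch and do not mirror libcurl's structs.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-protocol entry: a scheme and the function handling it. */
struct toy_protocol {
  const char *scheme;
  int (*do_it)(char *out, size_t len);   /* starts a transfer */
};

static int toy_http_do(char *out, size_t len)
{
  snprintf(out, len, "GET");    /* what an HTTP request would start with */
  return 0;
}

static int toy_ftp_do(char *out, size_t len)
{
  snprintf(out, len, "RETR");   /* what an FTP download would send */
  return 0;
}

static const struct toy_protocol protocols[] = {
  { "http",  toy_http_do },
  { "https", toy_http_do },     /* HTTPS shares the HTTP function */
  { "ftp",   toy_ftp_do },
};

/* Generic "do" step: finds the handler for the scheme and calls it. */
static int toy_do(const char *scheme, char *out, size_t len)
{
  size_t i;
  for(i = 0; i < sizeof(protocols) / sizeof(protocols[0]); i++) {
    if(!strcmp(protocols[i].scheme, scheme))
      return protocols[i].do_it(out, len);
  }
  return 1;                     /* unsupported protocol */
}
```

The generic code never branches on the protocol name after the lookup. It only calls through the pointer.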
- HTTP(S)
- HTTP offers a lot and is the protocol in curl that uses the most lines of
- code. There is a special file (lib/formdata.c) that offers all the multipart
- post functions.
- Base64 functions for user+password stuff (and more) are in lib/base64.c and
- all functions for parsing and sending cookies are found in lib/cookie.c.
- HTTPS uses almost exactly the same procedure as HTTP, with only two
- exceptions: the connect procedure is different and the function used to read
- or write from the socket is different, although the latter fact is hidden in
- the source by the use of curl_read() for reading and curl_write() for writing
- data to the remote server.
- http_chunks.c contains functions that understand HTTP 1.1 chunked transfer
- encoding.
- An interesting detail with the HTTP(S) request is the add_buffer() series of
- functions we use. They append data to one single buffer, and when the
- building is done the entire request is sent off in one single write. This is
- done this way to overcome problems with flawed firewalls and lame servers.
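The build-then-send-once idea can be sketched with a small growable buffer. The toy_buffer struct and both functions are hypothetical and are not the add_buffer() functions themselves. A real implementation would also have to deal with a short write.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

struct toy_buffer {
  char *data;    /* zero terminated request built so far */
  size_t used;   /* bytes in use, not counting the terminator */
  size_t size;   /* bytes allocated */
};

/* Appends text to the buffer, growing it when needed. Returns 0 if OK. */
static int toy_add_buffer(struct toy_buffer *b, const char *text)
{
  size_t len = strlen(text);

  if(b->used + len + 1 > b->size) {
    size_t newsize = (b->used + len + 1) * 2;
    char *p = realloc(b->data, newsize);
    if(!p)
      return 1;
    b->data = p;
    b->size = newsize;
  }
  memcpy(b->data + b->used, text, len + 1);  /* copies the terminator too */
  b->used += len;
  return 0;
}

/* Sends the whole request with one single write() call. */
static int toy_send_buffer(int fd, const struct toy_buffer *b)
{
  return write(fd, b->data, b->used) == (ssize_t)b->used ? 0 : 1;
}
```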
- FTP
- The Curl_if2ip() function can be used for getting the IP number of a
- specified network interface, and it resides in lib/if2ip.c.
- Curl_ftpsendf() is used for sending FTP commands to the remote server. It was
- made a separate function to prevent us programmers from forgetting that
- commands must be CRLF terminated. They must also be sent in one single
- write() to make firewalls and similar happy.
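A minimal sketch of such a command sender, assuming a fixed-size command buffer and a plain file descriptor. toy_ftpsendf() is a hypothetical stand-in and not the body of Curl_ftpsendf().

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Formats an FTP command, appends CRLF and sends it in one write(). */
static int toy_ftpsendf(int fd, const char *fmt, ...)
{
  char cmd[256];
  va_list ap;
  int len;

  va_start(ap, fmt);
  len = vsnprintf(cmd, sizeof(cmd) - 2, fmt, ap);  /* keep room for CRLF */
  va_end(ap);
  if(len < 0 || (size_t)len >= sizeof(cmd) - 2)
    return 1;                                      /* command too long */

  memcpy(cmd + len, "\r\n", 2);  /* the caller can never forget this part */
  len += 2;
  return write(fd, cmd, (size_t)len) == (ssize_t)len ? 0 : 1;
}
```

A call such as toy_ftpsendf(fd, "USER %s", name) thus always puts a complete, CRLF-terminated line on the wire.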
- Kerberos
- The kerberos support is mainly in lib/krb4.c and lib/security.c.
- TELNET
- Telnet is implemented in lib/telnet.c.
- FILE
- The file:// protocol is dealt with in lib/file.c.
- LDAP
- Everything LDAP is in lib/ldap.c.
- GENERAL
- URL encoding and decoding, called escaping and unescaping in the source code,
- is found in lib/escape.c.
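For readers unfamiliar with the term, escaping and unescaping boil down to something like the sketch below. Which characters get escaped is an assumption here (everything except ASCII letters and digits, judged by isalnum() in the default "C" locale) and the toy_* names are hypothetical. The code in lib/escape.c may make different choices.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated, URL encoded copy of the input. */
static char *toy_escape(const char *in)
{
  char *out = malloc(strlen(in) * 3 + 1);  /* worst case: all become %XX */
  char *p = out;

  if(!out)
    return NULL;
  for(; *in; in++) {
    unsigned char c = (unsigned char)*in;
    if(isalnum(c))
      *p++ = (char)c;
    else {
      sprintf(p, "%%%02X", c);             /* writes three characters */
      p += 3;
    }
  }
  *p = '\0';
  return out;
}

/* Returns a newly allocated, URL decoded copy of the input. */
static char *toy_unescape(const char *in)
{
  char *out = malloc(strlen(in) + 1);      /* decoding never grows it */
  char *p = out;

  if(!out)
    return NULL;
  while(*in) {
    if(in[0] == '%' && isxdigit((unsigned char)in[1]) &&
       isxdigit((unsigned char)in[2])) {
      char hex[3] = { in[1], in[2], '\0' };
      *p++ = (char)strtol(hex, NULL, 16);
      in += 3;
    }
    else
      *p++ = *in++;
  }
  *p = '\0';
  return out;
}
```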
- While transferring data in Transfer() a few functions might get
- used. curl_getdate() in lib/getdate.c is for HTTP date comparisons (and
- more).
- lib/getenv.c offers curl_getenv() which is for reading environment variables
- in a neat platform independent way. That's used in the client, but also in
- lib/url.c when checking the proxy environment variables. Note that contrary
- to the normal unix getenv(), this returns an allocated buffer that must be
- free()ed after use.
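The ownership rule (the caller gets an allocated copy and must free it) is the important part and can be sketched like this. toy_getenv() is a hypothetical stand-in built on the standard getenv(), not the code in lib/getenv.c.

```c
#include <stdlib.h>
#include <string.h>

/* Returns an allocated copy of the variable's value, or NULL if the
   variable is not set. The caller must free() the returned buffer. */
static char *toy_getenv(const char *name)
{
  const char *value = getenv(name);
  size_t len;
  char *copy;

  if(!value)
    return NULL;
  len = strlen(value) + 1;
  copy = malloc(len);
  if(copy)
    memcpy(copy, value, len);
  return copy;
}
```

Typical use is to fetch a variable, use the copy and then free() it. The standard getenv() result, in contrast, must never be freed.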
- lib/netrc.c holds the .netrc parser.
- lib/timeval.c features replacement functions for systems that don't have
- gettimeofday() and a few support functions for timeval conversions.
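One example of what a timeval support function might look like is a millisecond difference between two points in time. toy_tvdiff() is a hypothetical name and the sketch assumes a system that provides struct timeval in <sys/time.h>.

```c
#include <sys/time.h>

/* Returns the time from 'older' to 'newer' in milliseconds. */
static long toy_tvdiff(struct timeval newer, struct timeval older)
{
  return (long)(newer.tv_sec - older.tv_sec) * 1000 +
         (long)(newer.tv_usec - older.tv_usec) / 1000;
}
```

The microsecond part may be negative when the seconds part has rolled over. The sum still comes out right because both terms are added with their signs intact.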
-
- A function named curl_version() that returns the full curl version string is
- found in lib/version.c.
- If authentication is requested but no password is given, the getpass_r()
- clone in lib/getpass.c is used to ask for one. libcurl offers a custom
- callback that can be used instead of this, but it doesn't change much for us.
- Persistent Connections
- ======================
- The persistent connection support in libcurl requires some considerations on
- how to do things inside of the library.
- o The 'SessionHandle' struct returned in the curl_easy_init() call must never
- hold connection-oriented data. It is meant to hold the root data as well as
- all the options etc that the library-user may choose.
- o The 'SessionHandle' struct holds the "connection cache" (an array of
- pointers to 'connectdata' structs). There's one connectdata struct
- allocated for each connection that libcurl knows about.
- o This also enables the 'curl handle' to be reused on subsequent transfers,
- something that was illegal before libcurl 7.7.
- o When we are about to perform a transfer with curl_easy_perform(), we first
- check for an already existing connection in the cache that we can use,
- otherwise we create a new one and add it to the cache. If the cache is full
- already when we add a new connection, we close one of the present ones. We
- select which one to close depending on the close policy that may have been
- previously set.
- o When the transfer operation is complete, we try to leave the connection
- open. Particular options may tell us not to, and protocols may signal
- closure on connections and then we don't keep it open of course.
- o When curl_easy_cleanup() is called, we close all still opened connections.
- You do realize that the curl handle must be re-used in order for the
- persistent connections to work.
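The cache behaviour in the list above (re-use a match, otherwise add, and close one entry when full) can be modelled in a few lines. Everything here is hypothetical: the struct names, the host-only matching, the tiny cache size and the "least recently used" close policy are assumptions chosen to keep the sketch short.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_CACHE_SIZE 2   /* tiny on purpose */

struct toy_connectdata {
  char host[64];
  long last_used;          /* larger means more recently used */
};

struct toy_handle {
  struct toy_connectdata *cache[TOY_CACHE_SIZE];  /* array of pointers */
  long clock;
  int closed;              /* number of connections closed so far */
};

/* Returns a cached connection to the host or creates a new one. */
static struct toy_connectdata *toy_get_conn(struct toy_handle *h,
                                            const char *host)
{
  struct toy_connectdata *conn;
  int i, slot = -1;

  for(i = 0; i < TOY_CACHE_SIZE; i++) {          /* existing connection? */
    if(h->cache[i] && !strcmp(h->cache[i]->host, host)) {
      h->cache[i]->last_used = ++h->clock;
      return h->cache[i];
    }
  }
  for(i = 0; i < TOY_CACHE_SIZE; i++) {          /* free slot, else oldest */
    if(!h->cache[i]) {
      slot = i;
      break;
    }
    if(slot < 0 || h->cache[i]->last_used < h->cache[slot]->last_used)
      slot = i;
  }
  if(h->cache[slot]) {                           /* cache full: close one */
    free(h->cache[slot]);
    h->closed++;
  }
  conn = calloc(1, sizeof(*conn));
  if(conn) {
    strncpy(conn->host, host, sizeof(conn->host) - 1);
    conn->last_used = ++h->clock;
  }
  h->cache[slot] = conn;
  return conn;
}

/* Closes every connection that is still open, as a cleanup would. */
static void toy_cleanup(struct toy_handle *h)
{
  int i;
  for(i = 0; i < TOY_CACHE_SIZE; i++) {
    if(h->cache[i]) {
      free(h->cache[i]);
      h->cache[i] = NULL;
      h->closed++;
    }
  }
}
```

Note that the cache lives in the handle, which is why a second transfer only finds the connection if the same handle is used again.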
- Library Symbols
- ===============
-
- All symbols used internally in libcurl must use a 'Curl_' prefix if they're
- used in more than a single file. Single-file symbols must be made static.
- Public ("exported") symbols must use a 'curl_' prefix. (There are exceptions,
- but they are to be changed to follow this pattern in future versions.)
- Return Codes and Informationals
- ===============================
- I've made things simple. Almost every function in libcurl returns a CURLcode,
- which must be CURLE_OK if everything is OK or otherwise a suitable error code
- as defined in the curl/curl.h include file. The very spot that detects an
- error must use the Curl_failf() function to set the human-readable error
- description.
- In aiding the user to understand what's happening and to debug curl usage, we
- must supply a fair amount of informational messages by using the Curl_infof()
- function. Those messages are only displayed when the user explicitly asks for
- them. They are best used when revealing information that isn't otherwise
- obvious.
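The convention can be sketched as follows, with a hypothetical code enum, handle struct and toy_* functions standing in for CURLcode, the session struct, Curl_failf() and Curl_infof(). Where the real functions store or print their text is not shown in this document, so the error buffer and the stderr output are assumptions.

```c
#include <stdarg.h>
#include <stdio.h>

typedef enum { TOYE_OK = 0, TOYE_COULDNT_CONNECT } TOYcode;

struct toy_handle {
  char errorbuf[128];   /* human-readable description of the last error */
  int verbose;          /* non-zero if the user asked for information */
};

/* Stores a printf-style error description in the handle. */
static void toy_failf(struct toy_handle *h, const char *fmt, ...)
{
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(h->errorbuf, sizeof(h->errorbuf), fmt, ap);
  va_end(ap);
}

/* Prints a printf-style informational message, but only on request. */
static void toy_infof(struct toy_handle *h, const char *fmt, ...)
{
  if(h->verbose) {
    va_list ap;
    va_start(ap, fmt);
    fputs("* ", stderr);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
  }
}

/* A function following the convention: it returns a code, and the spot
   that detects the error is also the spot that describes it. */
static TOYcode toy_connect(struct toy_handle *h, const char *host,
                           int reachable)
{
  toy_infof(h, "Trying %s\n", host);
  if(!reachable) {
    toy_failf(h, "couldn't connect to %s", host);
    return TOYE_COULDNT_CONNECT;
  }
  return TOYE_OK;
}
```

Callers higher up only pass the code along. They do not overwrite the description that the detecting spot has set.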
- Client
- ======
- main() resides in src/main.c together with most of the client code.
- src/hugehelp.c is automatically generated by the mkhelp.pl perl script to
- display the complete "manual" and the src/urlglob.c file holds the functions
- used for the URL-"globbing" support. Globbing in the sense that the {} and []
- expansion stuff is there.
- The client mostly messes around to set up its 'config' struct properly, then
- it calls the curl_easy_*() functions of the library and when it gets back
- control after the curl_easy_perform() it cleans up the library, checks status
- and exits.
- When the operation is done, the ourWriteOut() function in src/writeout.c may
- be called to report about the operation. That function uses the
- curl_easy_getinfo() function to extract useful information from the curl
- session.
- Recent versions may loop and do all this several times if many URLs were
- specified on the command line or config file.
- Memory Debugging
- ================
- The file lib/memdebug.c contains debug-versions of a few functions: functions
- such as malloc, free, fopen, fclose, etc that somehow deal with resources
- that might give us problems if we "leak" them. The functions in the memdebug
- system do nothing fancy, they do their normal function and then log
- information about what they just did. The logged data can then be analyzed
- after a complete session.
- memanalyze.pl is a perl script present only in CVS (not part of the
- release archives) that analyzes a log file generated by the memdebug
- system. It detects if resources are allocated but never freed and other kinds
- of errors related to resource management.
- Use -DMALLOCDEBUG when compiling to enable memory debugging. This is also
- switched on by running configure with --enable-debug.
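The "do the normal thing, then log it" approach might look like the sketch below. The toy_dbg_* names, the global log file and the log line format are assumptions for illustration. They are not the functions or the format that lib/memdebug.c and memanalyze.pl actually use.

```c
#include <stdio.h>
#include <stdlib.h>

static FILE *toy_logfile;   /* NULL means logging is disabled */

/* malloc() that also logs where it was called from and what it returned. */
static void *toy_dbg_malloc(size_t size, int line, const char *source)
{
  void *mem = malloc(size);
  if(toy_logfile)
    fprintf(toy_logfile, "MEM %s:%d malloc(%lu) = %p\n",
            source, line, (unsigned long)size, mem);
  return mem;
}

/* free() that also logs where it was called from and what it released. */
static void toy_dbg_free(void *ptr, int line, const char *source)
{
  if(toy_logfile)
    fprintf(toy_logfile, "MEM %s:%d free(%p)\n", source, line, ptr);
  free(ptr);
}

#ifdef MALLOCDEBUG
/* From here on, ordinary calls are redirected to the logging versions.
   The wrappers above are defined first and keep using the real ones. */
#define malloc(size) toy_dbg_malloc(size, __LINE__, __FILE__)
#define free(ptr)    toy_dbg_free(ptr, __LINE__, __FILE__)
#endif
```

A script can afterwards pair up every logged allocation with its release. Any address that was allocated but never freed is a leak.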
- Test Suite
- ==========
- Since November 2000, a test suite has evolved. It is placed in its own
- subdirectory directly off the root in the curl archive tree, and it contains
- a bunch of scripts and a lot of test case data.
- The main test script is runtests.pl, which invokes the two servers
- httpserver.pl and ftpserver.pl before all the test cases are performed. The
- test suite currently only runs on unix-like platforms.
- You'll find a complete description of the test case data files in the
- tests/README file.
- The test suite automatically detects if curl was built with the memory
- debugging enabled, and if it was it will detect memory leaks too.
- Building Releases
- =================
- There's no magic to this. When you consider everything stable enough to be
- released, run the 'maketgz' script (using 'make distcheck' will give you a
- pretty good view on the status of the current sources). maketgz prompts for
- the version number of the client and the library before it creates a release
- archive. maketgz uses 'make dist' for the actual archive building, which is
- why you need to list in the Makefile.am files all the files that should be
- included in the release archives.