linuxnotes.txt

Role of the Kernel:
The kernel dictates which program gets which pieces of memory, starts and kills programs, and handles displaying text on a monitor. When an application needs to write to disk, it must ask the operating system to do it. If two applications ask for the same resource, the kernel decides who gets it, and in some cases kills off one of the applications in order to save the rest of the system.
The kernel also handles switching between applications. A computer has a small number of CPUs and a finite amount of memory. The kernel takes care of unloading one task and loading a new task if there are more tasks than CPUs. When the current task has run for a sufficient amount of time, the kernel pauses it so that another may run. This is called pre-emptive multitasking. Multitasking means that the computer is doing several tasks at once, and pre-emptive means that the kernel decides when to switch focus between tasks. With the tasks rapidly switching, it appears that the computer is doing many things at once.
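You can see this imbalance between tasks and CPUs on most Linux systems; the exact counts will vary from machine to machine:
nproc                        # how many CPUs the kernel has to schedule on
ps -e --no-headers | wc -l   # how many tasks it is currently juggling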
Each application may think it has a large block of memory on the system, but it is the kernel that maintains this illusion, remapping smaller blocks of memory, sharing blocks of memory with other applications, or even swapping out to disk blocks that haven't been touched recently.
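The free command gives a rough view of this arrangement, showing physical memory alongside the swap space the kernel can page those untouched blocks out to:
free -h   # human-readable totals for RAM and swap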
When the computer starts up, it loads a small piece of code called a boot loader. The boot loader's job is to load the kernel and get it started. The boot loader loads the Linux kernel and then transfers control. Linux then continues with running the programs necessary to make the computer useful, such as connecting to the network or starting a web server.
Applications:
Applications make requests to the kernel and receive resources, such as memory, CPU, and disk, in return. The kernel also abstracts the complicated details away from the application. The application doesn't know if a block of disk is on a solid-state drive from manufacturer A, a spinning metal hard drive from manufacturer B, or even a network file share. Applications just follow the kernel's Application Programming Interface (API) and in return don't have to worry about the implementation details.
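On Linux you can watch these requests happen with strace, which prints each system call a program makes to the kernel; cat here is just an example target:
strace -e trace=openat,read,write cat /etc/hostname
# each line of output is one request cat made through the kernel's API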
When we, as users, think of applications, we tend to think of word processors, web browsers, and email clients. The kernel doesn't care if it is running something that's user-facing, a network service that talks to a remote computer, or an internal task. So, from this we get an abstraction called a process. A process is just one task that is loaded and tracked by the kernel. An application may even need multiple processes to function, so the kernel takes care of running the processes, starting and stopping them as requested, and handing out system resources.
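The process table the kernel maintains is easy to peek at; for example, ps can show the shell you are typing into:
ps -o pid,ppid,cmd -p $$   # the current shell's process ID, its parent, and its command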
Linux Distributions:
Take Linux and the GNU tools, add some more user-facing applications like an email client, and you have a full Linux system. People started bundling all this software into a distribution almost as soon as Linux became usable. The distribution takes care of setting up the storage, installing the kernel, and installing the rest of the software. The full-featured distributions also include tools to manage the system and a package manager to help you add and remove software after the installation is complete.
Debian is more of a community effort, and as such, also promotes the use of open source software and adherence to standards. Debian came up with its own package management system based on the .deb file format. While Red Hat leaves non-Intel and non-AMD platform support to derivative projects, Debian supports many of these platforms directly.
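A quick feel for Debian-style package management, using the .deb format mentioned above (the package name is just an example):
sudo apt update          # refresh the list of available packages
sudo apt install tmux    # install a .deb package and its dependencies
dpkg -L tmux             # list the files the package placed on disk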
What is a Command?
The simplest answer to the question "What is a command?" is that a command is a software program that, when executed on the command line, performs an action on the computer.
When you consider a command using this definition, you are really considering what happens when you execute a command. When you type in a command, the operating system runs a process that can read input, manipulate data, and produce output. From this perspective, a command runs a process on the operating system, which then causes the computer to perform a job.
However, there is another way of looking at what a command is: look at its source. The source is where the command "comes from", and there are several different sources of commands within the shell of your CLI:
* Commands built-in to the shell itself:
A good example is the cd command as it is part of the bash shell. When a user types the cd command, the bash shell is already executing and knows how to interpret that command, requiring no additional programs to be started.
* Commands that are stored in files that are searched by the shell:
If you type the ls command, the shell searches through the directories listed in the PATH variable to find a file named ls that it can execute. These commands can also be executed by typing the complete path to the command.
* Aliases:
An alias can override a built-in command, a function, or a command that is found in a file. Aliases can be useful for creating new commands built from existing functions and commands.
* Functions:
Functions can also be built using existing commands, either to create new commands or to override commands built-in to the shell or commands stored in files. Aliases and functions are normally loaded from the initialization files when the shell first starts, as discussed later in this section. The type built-in, shown below, reports which of these sources a given command name resolves to.
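For example (the exact output depends on your shell and on any aliases you have defined):
type cd      # reports that cd is a shell builtin
type ls      # reports a path such as /bin/ls, or an alias if one is set
echo $PATH   # the directories searched for file-based commands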
This brief example may be helpful in understanding the concept of commands. An alias is essentially a nickname for another command or series of commands. For example, the cal 2014 command will display the calendar for the year 2014. Suppose you end up running this command often; instead of executing the full command each time, you can create an alias called mycal and run the alias, as demonstrated in the example:
alias mycal="cal 2014"
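A shell function can do the same job while also accepting arguments; here is a minimal sketch (the name mycal and the default year are just illustrations):
mycal() { cal "${1:-2014}"; }   # show the requested year, defaulting to 2014
mycal 2015                      # calendar for 2015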
The most important question to ask when determining the configuration of a machine is "what will this machine do?"
Decision Points:
The first thing you need to decide is the machine's role. Will you be sitting at the console running productivity applications or web browsing? If so, you have a desktop. Will the machine be used as a web server or otherwise sitting in a server rack somewhere? Then you're looking at a server.
Servers usually sit in a rack and share a keyboard and monitor with many other computers, since console access is only used to set up and troubleshoot the server. The server will run in non-graphical mode, which frees up resources for the real purpose of the computer. A desktop will primarily run a GUI.
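On systemd-based distributions, this split shows up in the default boot target:
systemctl get-default                          # typically graphical.target on a desktop
sudo systemctl set-default multi-user.target   # boot to a non-graphical, server-style mode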
Next, determine the functions of the machine. Is there specific software it needs to run, or specific functions it needs to do? Do you need to be able to manage hundreds or thousands of these machines at the same time? What is the skill set of the team managing the computer and software?
You must also determine the lifetime and risk tolerance of the server. Operating systems and software upgrades come on a periodic basis, called the release cycle. Software vendors will only support older versions of software for a certain period of time before not offering any updates; this is called the maintenance cycle (or life cycle). For example, major Fedora Linux releases come out approximately every 6 months. Versions are considered End of Life (EOL) after 2 major versions plus one month, so you have between 7 and 13 months after installing Fedora before you need to upgrade: install on release day and you get the full 6 + 6 + 1 = 13 months, while installing just before the next release leaves only about 7 of those months. Contrast this with the commercial server variant, Red Hat Enterprise Linux, and you can go up to 13 years before needing to upgrade.
The maintenance and release cycles are important because in an enterprise server environment it is time consuming, and therefore rare, to do a major upgrade on a server. Instead, the server itself is replaced when there are major upgrades or replacements of the application that necessitate an operating system upgrade. Similarly, a slow release cycle is important because applications often target the current version of the operating system and you want to avoid the overhead of upgrading servers and operating systems constantly to keep up. There is a fair amount of work involved in upgrading a server, and the server role often has many customizations made that are difficult to port to a new server. This necessitates much more testing than if only the application were upgraded.
If you are doing software development or traditional desktop work, you often want the latest software. Newer software has improvements in both functionality and appearance, which makes the computer more enjoyable to use. A desktop often stores its work on a remote server, so the desktop can be wiped clean and the newer operating system put on with little interruption.
Individual software releases can be characterized as either beta or stable. One of the great things about being an open source developer is that you can release your new software and quickly get feedback from users. If a software release has many new features that have not been rigorously tested, it is typically referred to as beta. After those features have been tested in the field, the software moves to a stable point. If you need the latest features, then you are looking for a distribution that has a quick release cycle and makes it easy to use beta software. On the server side, you want stable software unless those new features are necessary and you don't mind running code that has not been thoroughly tested.
Another loosely related concept is backward compatibility. This refers to the ability of a later operating system to be compatible with software made for earlier versions. This is usually a concern if you need to upgrade your operating system but aren't in a position to upgrade your application software.
Of course, cost is always a factor. Linux itself might be free, but you may need to pay for support, depending on which options you choose. Your chosen operating system might only run on a particular selection of hardware, which further affects the cost.
Server Applications:
Linux excels at running server applications because of its reliability and efficiency. When considering server software, the most important question is "what service am I running?" If you want to serve web pages, you will need web server software, not a mail server!
One of the early uses of Linux was for web servers. A web server hosts content for web pages, which are viewed by a web browser using the Hypertext Transfer Protocol (HTTP) or its encrypted flavor, HTTPS. The web page itself can be static, which means that when the web browser requests the page the web server just sends the file as it appears on disk. The server can also serve dynamic content, meaning that the web server passes the request to an application, which generates the content. WordPress is one popular example: users develop content through their browser in the WordPress application and the software turns it into a fully functional website.
Apache is the dominant web server in use today. Apache was originally a stand-alone project, but the group has since formed the Apache Software Foundation, which maintains over a hundred open source software projects.
Another web server is nginx. It focuses on performance by making use of more modern UNIX kernels and only does a subset of what Apache can do. Over 65% of websites are powered by either nginx or Apache.
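You can often see which of the two a given site runs by requesting only the response headers; treat the answer as a hint, since many sites hide or rewrite it:
curl -sI https://www.example.org | grep -i '^server:'   # print the Server header, if any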
Email has always been a popular use for Linux servers. When discussing email servers, it is always helpful to look at the three different roles required to get email between people:
1. Mail Transfer Agent (MTA) - figures out which server needs to receive the email (a lookup done in DNS, as shown after this list) and uses the Simple Mail Transfer Protocol (SMTP) to move the email to that server. It is not unusual for an email to take several "hops" to get to its final destination, since an organization might have several MTAs.
2. Mail Delivery Agent (MDA, also called the Local Delivery Agent) - takes care of storing the email in the user's mailbox. Usually invoked from the final MTA in the chain.
3. POP/IMAP server - the Post Office Protocol and Internet Message Access Protocol are two communication protocols that let an email client running on your computer talk to a remote server to pick up the email.
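The DNS lookup from step 1 can be reproduced by hand with dig; example.com stands in for any recipient domain:
dig +short MX example.com   # the domain's mail exchangers; the lowest preference number wins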
Sometimes a piece of software will implement multiple components. In the closed source world, Microsoft Exchange implements all the components, so there is no option to make individual selections. In the open source world there are many options. Some POP/IMAP servers implement their own mail database format for performance, and so will also include the MDA if the custom database is desired. People using standard file formats (such as all the emails in one text file) can choose any MDA.
The most well-known MTA is sendmail. Postfix is another popular one and aims to be simpler and more secure than sendmail.
If you're using standard file formats for storing emails, your MTA can also deliver mail. Alternatively, you can use something like procmail, which lets you define custom filters to process and sort incoming mail.
Dovecot is a popular POP/IMAP server owing to its ease of use and low maintenance. Cyrus IMAP is another option.
For file sharing, Samba is the clear winner. Samba allows a Linux machine to look like a Windows machine so that it can share files and participate in a Windows domain. Samba implements the server components, such as making files available for sharing and certain Windows server roles, as well as the client side, so that a Linux machine may consume a Windows file share.
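Samba's client side can be exercised with smbclient; the server name and user below are placeholders:
smbclient -L //fileserver -U alice   # list the shares a Windows-style server offers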
If you have Apple machines on your network, the Netatalk project lets your Linux machine behave as an Apple file server.
The native file sharing protocol for UNIX is called the Network File System (NFS). NFS support is usually part of the kernel, which means that a remote file system can be mounted just like a regular disk, making the access transparent to other applications.
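Because the kernel handles NFS, mounting a remote export looks just like mounting a local disk; the host and paths here are examples:
sudo mkdir -p /mnt/home
sudo mount -t nfs fileserver:/export/home /mnt/home
ls /mnt/home   # applications now see an ordinary directory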
As your computer network gets larger, you will need to implement some kind of directory. The oldest directory is called the Domain Name System (DNS) and is used to convert a name like www.linux.com to an IP address like 192.168.100.100, which is a unique identifier of that computer on the Internet. DNS also holds global information such as the address of the MTA for a given domain name. An organization may want to run its own DNS server to host its public facing names and also to serve as an internal directory of services. The Internet Software Consortium maintains the most popular DNS server, BIND; the process that runs the service is called named.
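The name-to-address conversion itself is easy to try; the answer is whatever the public DNS records currently say:
dig +short www.linux.com A   # the IP address(es) behind the name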
DNS is largely focused on computer names and IP addresses and is not easily searchable. Other directories have sprung up to store other information, such as user accounts and security roles. The Lightweight Directory Access Protocol (LDAP) is the most common; it also powers Microsoft's Active Directory. In LDAP, an object is stored in a tree, and the position of that object in the tree can be used to derive information about the object in addition to what's stored with the object itself. For example, a Linux administrator may be stored in a branch of the tree called "IT department", which is under a branch called "Operations". Thus one can find all the technical staff by searching under the IT department branch. OpenLDAP is the dominant player here.
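A search against such a tree might look like this with OpenLDAP's command-line client; the base DN mirrors the hypothetical branches above:
ldapsearch -x -b "ou=IT department,ou=Operations,dc=example,dc=com"
# -x uses simple authentication; -b sets the branch of the tree to search under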
One final piece of network infrastructure is called the Dynamic Host Configuration Protocol (DHCP). When a computer boots up, it needs an IP address for the local network so it can be uniquely identified. DHCP's job is to listen for requests and to assign a free address from the DHCP pool. The Internet Software Consortium also maintains the ISC DHCP server, which is the most common player here.
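A minimal sketch of such a pool in the ISC server's dhcpd.conf; all the addresses here are examples:
subnet 192.168.1.0 netmask 255.255.255.0 {
    range 192.168.1.100 192.168.1.200;   # the free addresses available for assignment
    option routers 192.168.1.1;          # the default gateway handed to clients
}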
A database stores information and also allows for easy retrieval and querying. The most popular databases here are MySQL and PostgreSQL. You might enter raw sales figures into the database and then use a language called Structured Query Language (SQL) to aggregate sales by product and date in order to produce a report.
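Sticking with the sales example, such a report might be produced like this; the database, table, and column names are all hypothetical:
mysql -u report -p sales_db -e "
  SELECT product, sale_date, SUM(amount) AS total
  FROM sales
  GROUP BY product, sale_date
  ORDER BY sale_date, product;"
# the same SQL works essentially unchanged in PostgreSQL's psql client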