Introduction to Yum

1. Introduction

 

Yum is a tool for automating package maintenance for a network of workstations running any operating system that uses the Red Hat Package Manager (RPM) system for distributing packaged tools and applications. It is derived from yup, an automated package updater originally developed for Yellowdog Linux, hence its name: yum is “Yellowdog Updater, Modified”.

Yup was originally written and maintained by Dan Burcaw, Bryan Stillwell, Stephen Edie, and Troy Bengegerdes of Yellowdog Linux (an RPM-based Linux distribution that runs on Apple Macintoshes of various generations). Yum was originally written by Seth Vidal and Michael Stenner, both at Duke University at the time. Since then both Michael and Seth have moved on, Seth to working for Red Hat, where he remains the dominant force behind yum development and maintenance.

It is important to note that yum is an open source GPL project and that many people have contributed code, ideas, bug fixes and documentation. The AUTHORS list was up to 26 or so as of the time of this HOWTO snapshot; yum is a clear example of the power of open source development!

Yum is a GNU General Public License (GPL) tool; it is freely available and can be used, modified, or redistributed without any fee or royalty provided that the terms of its associated license are followed.

1.1 What Yum Can Do

Yum consists of the yum client itself as well as a suite of tools and numerous plugins that modify the basic default behavior of the yum client. In addition, the createrepo command allows one to create a yum repository on any suitable server. Once a yum repository is prepared any client permitted to access the repository over the network can install, update, or remove one or more rpm-based packages from the repository. Yum can also be used as a more or less drop-in replacement for executing the familiar:

rpm -Uvh whatever.rpm

command, with the benefit that yum will automatically search its connected repositories for dependencies for the RPM at hand and permit them to be automagically installed in one step. This alone is an enormous benefit compared to trying to work one’s way out of “dependency hell” and track down and install by hand all of the dependencies of a typical RPM package.

 

In addition, the yum client encapsulates various informational tools. It can list rpm’s both installed and available for installation, extract and publish information from the rpm headers based on keywords or globs, and find packages that provide particular files. Yum is therefore of great use to users of a workstation, either private or on a LAN; with yum they can look over the list of available packages to see if there is anything “interesting”, search for packages that contain a particular tool or apply to a particular task, and more.
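The “which package provides this file?” query can be pictured as a glob match over the file lists recorded in the rpm headers. What follows is only a sketch in Python (the language yum itself is written in): the package names and file lists are invented for illustration, and this is not how yum actually stores or searches its metadata.

```python
from fnmatch import fnmatchcase

# Hypothetical file lists, standing in for what the rpm headers record.
FILE_LISTS = {
    "editor-2.1-3.noarch": ["/usr/bin/editor", "/usr/share/doc/editor/README"],
    "spell-1.0-1.i386": ["/usr/bin/spell", "/usr/lib/libspell.so.1"],
    "spell-devel-1.0-1.i386": ["/usr/include/spell.h", "/usr/lib/libspell.so"],
}


def provides(pattern, file_lists=FILE_LISTS):
    """Return the sorted names of packages owning a file that matches the glob."""
    return sorted(
        name
        for name, files in file_lists.items()
        if any(fnmatchcase(path, pattern) for path in files)
    )


if __name__ == "__main__":
    for pattern in ("/usr/bin/editor", "*/libspell.so*"):
        print(pattern, "->", provides(pattern))
```

Calling provides("*/libspell.so*") on this toy data names both the library package and its development package, which is exactly the kind of answer one wants when a build complains about a missing file.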

If this isn’t enough, yum is the back end for a number of GUI tools that provide a user with “instant” visual access to the entire (rather enormous) range of packages available for any linux distribution that relies on it. For example, in Fedora 12 (as of the current snapshot of this HOWTO) there are over twenty thousand individual packages listed in the repositories connected to my personal laptop. Not all of these are distinct applications as many applications have distinct packages such as the application itself, a development package, a library package and a documentation package, but it is safe to say that there are many thousands of actual applications, most of them well written and maintained and quite useful. A GUI-based tool is a useful way to browse through this list to see what is available that you might find useful or cool to have on your personal linux/yum-based machine.

Yum is designed to be a client-pull tool, permitting package management to be “centralized” to the extent required to ensure security and interoperability even across a broad, decentralized administrative domain. No root privileges are required on the server by yum clients; yum requires at most anonymous access (restricted or unrestricted) from the clients to a repository server (often one that is maintained by a central, and competent, authority). This makes yum an especially attractive tool for providing “centralized” scalable administration of linux systems in a decentralized network management environment, where a mix of machines maintained by their owners and by a variety of network managers naturally occurs (such as a University or corporation).

One of yum’s most common uses in any LAN environment is to be run from a nightly cron script on each yum-maintained system to update every rpm package on the system safely to the latest versions available on the repository, including all security or operationally patched updates. If e.g. yum-cron is itself installed from an rpm custom-preconfigured to perform this nightly update, an entire campus that installs its systems from a common repository base can achieve near complete consistency with respect to distribution, revision, and security. Security and other updates will typically appear on all net-connected clients no more than 24 hours after an updated rpm is placed on the repository by its (trusted) administrator, who requires no root-level privileges on any of the clients.

Consequently with yum a single trusted administrator can maintain a trusted rpm repository (set) for an entire University campus, an entire corporation, an entire government laboratory or institution. Alternatively, responsibility for different parts of a distribution can be split up safely between several trusted administrators on distinct repositories, or a local administrator can add a local trusted repository to overlay or augment the offerings of the campus level repositories or international distribution repositories. All systems at a common revision level will be consistent and interoperable to the extent that their installed packages (plus any overlays by local administrators) allow. Yum is hence an amazingly powerful tool for creating a customized repository-based package delivery and maintenance system that can scale the work of a single individual to cover thousands of machines.

And it’s free. It just doesn’t get any better than that….

1.2 How Yum Works

To understand how yum works it helps to define a few terms:

  • server: The term server generally refers to the physical system that provides access one or more ways to a repository. However when yum was first developed there was generally only one server per repository and server was often used as more or less of a synonym for repository. We will use it below only in the former sense — as a reference to a particular web, ftp, nfs server that provides access to a repository, not as the repository itself.
  • repository: A repository is a collection of rpms under some sort of filesystem tree. For most purposes associated with yum, the repository will have two more important characteristics. It has had the command createrepo run on the tree, which extracts and encodes all of the metadata that yum relies on in order to function. Also, the tree is made accessible by URL from a server (which means as one or more of http://my.web.server/path, ftp://my.ftp.server/path, file:///full/file/path to the repository tree).
  • serverid: As noted above, there used to be a more or less one to one correspondence between servers and repositories in early versions of yum. However, this correspondence is now many to many. A single repository can be mirrored on many servers, and a single server can hold many repositories. When organizing “robust” access to repositories (which means providing URL’s to the same repository on fallback servers in case the primary server of a repository is down) it is now necessary to label the repository with some sort of unique id that obviously cannot be the server or repository alone. The serverid is thus the unique name used to indicate that all the repositories given under a single baseurl are (presumably) mirrors of one another.
  • RPM: This stands for “Red Hat Package Manager”, the toolset developed by Red Hat for distributing and maintaining “packages” of tools, libraries, binaries, and data for their linux distribution. It is fully open source and is currently the basis for many linux distributions other than Red Hat. When the documentation below speaks of “an rpm” it refers to a single package, usually named packagename-version-release.arch.rpm. To understand how yum functions, it is necessary to understand a bit about the structuring of rpm’s.
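To make the serverid idea concrete: in a yum configuration file the id appears as the section name of a repository stanza, and a single baseurl entry may list several URLs that are treated as mirrors of one another. The sketch below parses such a stanza with Python's standard configparser module. The repository id and the hostnames are made up for illustration, and this is a toy reader, not yum's own configuration parser.

```python
import configparser

# Hypothetical repository stanza; the id "campus-base" and all hostnames
# are invented for this example.
REPO_TEXT = """
[campus-base]
name=Campus base repository (hypothetical)
baseurl=http://mirror1.example.edu/linux/base/
        http://mirror2.example.edu/linux/base/
        ftp://mirror3.example.edu/pub/linux/base/
enabled=1
"""


def mirrors_by_repo_id(text):
    """Map each repository id (section name) to its list of baseurl mirrors."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        section: parser[section]["baseurl"].split()
        for section in parser.sections()
    }


if __name__ == "__main__":
    for repo_id, urls in mirrors_by_repo_id(REPO_TEXT).items():
        print(repo_id)
        for url in urls:
            print("   fallback candidate:", url)
```

The point of the exercise is that the id, not any one server name, identifies the repository; the URLs under it are interchangeable places to fetch the same content.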

 

An rpm consists of basically three parts: a header, a signature, and the (generally compressed) archive itself. The header contains a complete file list, a description of the package, a list of the features and libraries it provides, a list of tools it requires (from other packages) in order to function, what (known) other packages it conflicts with, and more. The basic rpm tool needs information in the header to permit a package to be installed (or uninstalled!) in such a way that:

  • Installing the package breaks none of the already installed packages (recursively, as they may need packages of their own to be installed).
  • All the packages that the package requires for correct operation are also (or already) installed along with the selected package, recursively.
  • A later version of the package does not (accidentally) replace an earlier version of the package.

Note that a similar list applies to uninstallation; removing a package must not break any packages left behind, for example.

 

This process is generically known as “resolving package dependencies” and is one of the most difficult parts of package management. It is quite possible to want to install a packaged tool that requires two or three libraries and a tool. The libraries in turn may require other libraries, the tool other tools. By the time you’re done, installing the package may require that you install six or eight other packages, none of which are permitted to conflict or break any of the packages that are already there or will remain behind.
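The recursive part of this is easy to sketch. Assuming the “requires” information has already been pulled out of the headers into a simple table, the function below walks that table and returns everything that must be installed, dependencies first. The package names are invented, and real rpm dependencies are richer than this (versioned requirements, virtual provides, conflicts), so treat it as a toy model rather than yum's resolver.

```python
def resolve(package, requires, installed):
    """Return the packages to install for `package`, dependencies first.

    requires:  dict mapping a package name to the names it requires
    installed: set of names already present on the system
    """
    to_install = []
    seen = set(installed)

    def visit(name):
        if name in seen:          # already installed or already scheduled
            return
        seen.add(name)
        for dep in requires.get(name, ()):
            visit(dep)            # recurse: dependencies of dependencies
        to_install.append(name)   # schedule after everything it needs

    visit(package)
    return to_install


# Hypothetical metadata: one tool, two libraries, some data.
REQUIRES = {
    "editor": ["libgui", "spell"],
    "libgui": ["libc-extras"],
    "spell": ["dict-data", "libc-extras"],
}

if __name__ == "__main__":
    print(resolve("editor", REQUIRES, installed={"libc-extras"}))
```

Asking for one package here schedules four, and that is with a table of only three entries.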

If you have ever attempted to manage rpm’s by hand, you know that tracking down all of the headers and dependencies and resolving all conflicts is not easy and that it actually becomes more difficult in time as a system manager updates this on one system, that on another, rebuilds a package here, installs something locally into /usr/local there. Eventually (sometimes out of sheer frustration) an rpm is --force installed, and thereafter the rpm database on the system is basically inconsistent and any rpm install is likely to fail and require --force-ing in turn. Entropy creeps into the network, and with it security risks and dysfunction.

Yet not updating packages is also a losing situation. If you leave a distribution based install untouched it remains clean. However, parts of it were likely broken at the time of install — there are always bugs even in the most careful of major distributions. Some of those bugs are security bugs, and as crackers discover them and exploits are developed it rapidly becomes a case of “patch your system or lay out the welcome mat for vermin”. This is a global problem with all operating systems; even Windows-based systems (notorious for their vulnerability to viruses and crackers) can be made reasonably secure if they are rigorously kept up to date. Finally, users come along and demand THIS package or THAT package which are crucial to their work — but not in the original, clean, consistent installation.

In balance, any professional LAN manager (or even humble standalone linux workstation owner) has little choice; they must have some sort of mechanism for updating the packages already installed on their system(s) to the latest, patched, secure, debugged versions and for adding more packages, including ones that may not have been in the distribution they relied upon for their original base install. The only questions are: what mechanism should they use and what will it cost them (in time, hassle, learning curve, and reliability as well as in money). Let us consider the problem:

In a typical repository, there are a lot of distinct packages (currently many thousands). I have over 2500 packages installed on my Fedora 12 based laptop as I type this, corresponding to close to 1000 distinct applications. Each of these packages contains considerable metadata detailing their library requirements and so on, and can easily be as large as hundreds of megabytes in size (for the larger office suites or browsers).

Early automated update tools either required a locally mounted repository directory — for example, the original install CD for the operating system — in order to be able to access all of the headers quickly (local disk access even from a relatively slow CD-ROM drive, being fast enough to deliver the rpm’s in a timely way so that their headers could be extracted and parsed) or required that each linked rpm be sent in its entirety over a network to an updating client from the repository just so it could read the header. The first was locally fast but required a large commitment of local disk resources (in addition to creating a new problem, that of keeping all the local copies of a master repository synchronized). The other was very slow. Both were also network resource intensive.

This is the fundamental problem that yum solves for you. Yum splits off the headers and metadata on the repository side (using createrepo or any of several other repo management tools). The headers themselves are sorted, XML encoded, and compressed, and are then available to be downloaded separately and quickly to the yum client where they are typically cached semi-permanently and periodically updated. Yum clients also cache (space permitting or according to the requirements and invocation schema selected by the system’s administrator) the rpm’s themselves when they are downloaded for an actual install or update, giving a yum client the best of both the options above: a local disk image of (just the relevant part of) the repository that is automatically and transparently managed, and rapid access to just the package metadata.

An actual download of all the headers associated with packages found on your system occurs the first time a yum client is invoked and thereafter it adds to or updates the cached metadata (and downloads and caches the required rpm’s) only if the repository has more recent versions or if the user has deliberately invoked yum’s “clean” command to empty all its caches. All of yum’s dependency resolution then proceeds from these cached header files, and if for any reason the install or update requires an rpm already in the cache to be reinstalled, it is immediately available.
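The caching behavior just described (download once, refresh only when the repository has changed, start over after a “clean”) can be modeled in a few lines. In this sketch the repository is a plain in-memory dictionary and the “revision” is a made-up integer; both are stand-ins chosen for illustration, and the class is a toy model rather than yum's actual cache logic.

```python
class MetadataCache:
    """Toy model of a client-side metadata cache keyed on a repository revision."""

    def __init__(self, fetch_revision, fetch_metadata):
        self._fetch_revision = fetch_revision  # cheap: ask what revision the repo is at
        self._fetch_metadata = fetch_metadata  # expensive: download all the metadata
        self._revision = None
        self._metadata = None
        self.downloads = 0                     # how many expensive downloads happened

    def get(self):
        """Return the metadata, downloading only if the cache is empty or stale."""
        remote = self._fetch_revision()
        if self._metadata is None or remote != self._revision:
            self._metadata = self._fetch_metadata()
            self._revision = remote
            self.downloads += 1
        return self._metadata

    def clean(self):
        """Empty the cache, forcing a full download on the next get()."""
        self._revision = None
        self._metadata = None


# Hypothetical in-memory "repository" used to exercise the cache.
repo = {"revision": 1, "metadata": {"editor": ["libgui"]}}
cache = MetadataCache(lambda: repo["revision"], lambda: dict(repo["metadata"]))

if __name__ == "__main__":
    cache.get()
    cache.get()
    print("downloads after two lookups:", cache.downloads)
```

Only the cheap revision check crosses the network on an ordinary run; the expensive download happens when something has actually changed.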

With the header information (only) handy on high-speed local media, the standard tools used to maintain rpm’s are invoked by yum and can quickly proceed to resolve all dependencies, determine if it is safe to proceed, what additional packages need to be installed, and so forth. Note well that yum was originally designed (by a highly experienced systems administrator, Seth Vidal) with the help of all the other highly experienced systems administrators on the yum list to be safe. It will generally not proceed if it encounters a dependency loop, a package conflict, or a revision number conflict.

If yum finds that everything is good and the package can be safely installed, removed, or updated, it can either be invoked in such a way that it does so automatically with no further prompts so it can run automagically from cron, or (the general default when invoked from a command line) it can issue a user a single prompt indicating what it is about to do and requesting permission to proceed. If it finds that the requested action is in fact not safe, it will exit with as informative an error message as it can generate, permitting the system’s administrator/owner to attempt to resolve the situation by hand before proceeding (which may, for example, involve removing certain conflicting packages from the client system or fixing the repository list).
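The decision flow in that paragraph is simple enough to write down. The function below is a sketch of the logic only: the plan and the list of problems are assumed to come from an earlier dependency-resolution step, the assume_yes flag plays the role of yum's -y option for unattended runs, and the prompt wording and return values are invented for the example.

```python
def run_transaction(plan, problems, assume_yes=False, ask=input):
    """Decide whether a planned transaction goes ahead.

    plan:       list of human-readable actions, e.g. ["install spell-1.0-1"]
    problems:   list of conflicts found while resolving dependencies
    assume_yes: True for unattended (cron) runs, so that no prompt is issued
    ask:        callable used to prompt; replaceable so the logic can be tested
    """
    if problems:
        # Not safe: stop with as informative a message as possible.
        raise RuntimeError("refusing to proceed: " + "; ".join(problems))
    if not plan:
        return "nothing to do"
    if assume_yes:
        return "applied"
    answer = ask("About to: " + ", ".join(plan) + ". Is this ok [y/N]: ")
    return "applied" if answer.strip().lower() in ("y", "yes") else "aborted"


if __name__ == "__main__":
    # Unattended run: no prompt is issued.
    print(run_transaction(["update editor-2.1-4"], [], assume_yes=True))
```

Note that the safety check comes before the prompt: an unsafe transaction is refused whether or not anyone is there to answer.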

From the overview given above, it should be apparent that yum is potentially a powerful tool indeed, using a single clever idea (the splitting off of the rpm headers) to achieve a singular degree of efficiency. One can immediately imagine all sorts of ways to exploit the information now so readily available to a client and wrap them all up in a single interface to eliminate the incredibly arcane and complex commands otherwise required to learn anything about the installed package base on a system and what is still available. The yum developers have been doing just that on the yum list, dreaming up features and literally overnight implementing the most attractive ones in new code, generally in the form of plugins that augment or extend the base functionality of yum itself. At this point the yum suite is very likely the only thing you’ll ever need to manage packages on any rpm based system. Red Hat and Fedora themselves now use yum directly to perform the original install, speeding the install itself up by a factor of perhaps two and skipping the need to do an update immediately after installing to bring the system up to date.

By putting all the extra “optional” functionality into optional plugins with their own documentation, yum has achieved an extraordinary degree of power while still retaining its appealing simplicity. Most of what you need yum to do is done with only a handful of highly intuitive commands. It is also remarkably self-documenting, but it can be a bit intimidating for a user who is not familiar with the command line but who wishes to go beyond the default update behavior their system was installed with or accessible through the various GUIs.

This HOWTO is intended to document yum’s capabilities so even a novice can learn to use it client-side effectively in a very short time, and so that LAN administrators can have guidance in the necessarily more complex tasks associated with building and maintaining the repositories from which the yum clients retrieve metadata and rpm’s.

1.3 Yum, RPM, and Red Hat

Because yum invokes the same tools and python bindings used by e.g. Red Hat to actually resolve dependencies and perform installations, and indeed has been directly integrated into Red Hat and Fedora’s original installation process for several versions now, it has proven remarkably robust over several changes to the rpm toolset that have occurred since its inception, some of them fairly major. At this point it is almost impossible for yum to “break” without Red Hat’s own rpm installation toolset breaking as well. Since Red Hat more or less directly supports yum development, it is most unlikely to go away or break any time soon.

It is important to emphasize, however, that yum is not a tool for administering Red Hat only repositories. Red Hat and Fedora will be prominently mentioned in this HOWTO largely because historically its original development at Duke University proceeded from the Red Hat basis for our campuswide linux distribution; Duke has been a primary (yum-enabled) mirror for Red Hat, CentOS, and Fedora, and of course Duke is literally down the road a few miles from Red Hat itself.

However, Note Well: Yum itself is designed for, and has been successfully used to support, rpm repositories of any operating system or distribution that relies on rpm’s for package management and contains or can be augmented with the requisite rpm-python tools. Yum has been tested on or is in production on just about all the major rpm-based linuces, as well as at least one Solaris repository. Its direct conceptual predecessor (with which it shares many design features and ideas, although very little remaining actual code) is Yellowdog Linux’s updater tool yup, which had nothing whatsoever to do with Red Hat per se. Yum truly is free like the air, and distribution-agnostic by deliberate design. There is nothing to prevent yum from being used to distribute non-software packages or to automate the distribution and maintenance of non-open-source commercial software (where repository access might be e.g. authenticated in some way) and there are excellent reasons for commercial vendors of Linux software, at least, to consider doing so!

1.4 RPM Hell

A moment or two of meditation upon dependency resolution should suffice to convince one that Great Evil is possible in a large rpm repository. You have hundreds, perhaps thousands of rpm packages. Some are commercial, some are from some major distribution(s), others are local homebrew. What if, in all of these packages built at different times and by different people, you ever find that there exist rpm’s such that (e.g.) rpm A requires rpm B, which conflicts with rpm C (already installed)? What if rpm A requires rpm B (revision 1.1.1) but rpm B (revision 1.2.1) is already installed and is required in that revision by rpm C (also already installed)? What if you install an application from source that replaces critical dependencies for installed packages that may not be compatible? What if you install a commercial package that requires an obsolete library to function? It is entirely possible to assemble an “rpm repository from hell” such that nearly any attempt to install a package will break something (or require something that breaks something).

Untangling and avoiding this mess is what earns the major (rpm-based or not) linux distribution providers whatever money they are paid. They provide an entire set of rpm’s (or other packages) “all at once” that are guaranteed to be consistent in the distribution snapshot on the CDs or ISO images or primary website repositories. All rpm’s required by any rpm in the set are in the set. No rpm’s in the provided set conflict with other rpm’s in the set. Consequently any rpm in the set can be selected to be installed on any system built from the distribution with the confidence that, once all the rpm dependencies are resolved, the rpm (along with its missing dependencies) can be successfully installed. The set provided is at least approximately complete, so that one supposedly has little incentive or need to install packages not already in the distribution (except where so doing requires the customer to “buy” a more expensive distribution from the vendor:-).
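The two guarantees stated above (everything required is in the set, and nothing in the set conflicts with anything else in it) are mechanical enough to check with a short script. The sketch below assumes the requires and conflicts lists have already been extracted into a dictionary; the package names are invented, and real rpm metadata also carries version constraints that this toy check ignores.

```python
def check_consistency(packages):
    """Return a list of problems found in a package set (empty if consistent).

    packages: dict mapping a name to {"requires": [...], "conflicts": [...]}
    """
    problems = []
    for name, meta in sorted(packages.items()):
        for dep in meta.get("requires", []):
            if dep not in packages:
                problems.append(f"{name} requires {dep}, which is not in the set")
        for other in meta.get("conflicts", []):
            if other in packages:
                problems.append(f"{name} conflicts with {other}, which is in the set")
    return problems


# Hypothetical distribution snapshot that satisfies both guarantees.
GOOD_SET = {
    "editor": {"requires": ["libgui"]},
    "libgui": {"requires": []},
}

# The same snapshot after a careless local addition.
BAD_SET = dict(GOOD_SET, homebrew={"requires": ["libold"], "conflicts": ["libgui"]})

if __name__ == "__main__":
    print(check_consistency(GOOD_SET))
    for problem in check_consistency(BAD_SET):
        print(problem)
```

One homebrew package is enough to break both guarantees at once, which is how a clean snapshot starts drifting toward the repository from hell.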

In the real world this ideal of consistency and completeness is basically never achieved. All the distributions I’ve ever tried or know about have bugs, often aren’t totally consistent, and certainly are not complete. A “good” distribution can serve as a base for a repository and support e.g. network installs as well as disk or CD local installs, but one must be able to add, delete, update packages new and old to the repository and distribute them to all the systems that rely on the repository for update management both automatically and on demand.

Alas, rpm itself is a terrible tool to use for this purpose, a fact that has driven managers of rpm-based systems to regularly tear their hair for years now. Using rpm directly to manage rpm installs, the most one can do is look one step ahead to try to resolve dependencies. Since dependency loops are not at all uncommon on real-world repositories where things are added and taken away (and far from unknown even in box-set linux distributions that are supposed to be dependency-loop free) one can literally chase rpm’s around in loops or up a tree (so to speak;-) trying to figure out what has to be installed, uninstalled, modified, hacked, or rebuilt before finally succeeding in installing the one lonely application you originally set out to install.

rpm doesn’t permit one to tell it to “install package X and anything else that it needs, after YOU figure out what that might be”, nor does it know where to look for the latter. Yum, of course, does. That’s why it is better!

Even yum, though, can’t “fix” a dependency loop, or cope with all the arcane revision numbering schemes or dependency specifications that appear in all the rpm’s one might find and rebuild or develop locally for inclusion in a central repository. When a dependency loop or other revision problem is encountered, a Real Human has to apply a considerable amount of systems expertise to resolve the problem. This suggests that building rpm’s from sources in such a way that they “play nice” in a distribution repository, while also forming a critical component of said repository for other applications, is not a trivial process. So much so that many rpm developers simply do not succeed.
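Finding a loop, at least, is the mechanical part; deciding how to break it is where the Real Human comes in. As a sketch, a depth-first walk over a table of “requires” relationships will report the first loop it runs into. The table format and package names here are assumptions made for the example, not anything yum or rpm exposes in this form.

```python
def find_loop(requires):
    """Return one dependency loop as a list of names, or None if there is none.

    requires: dict mapping a package name to the names it requires
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []  # packages on the current chain of requirements

    def visit(name):
        state[name] = IN_PROGRESS
        path.append(name)
        for dep in requires.get(name, ()):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # dep is already on the current chain: we have come full circle.
                return path[path.index(dep):] + [dep]
            if dep_state == UNSEEN:
                loop = visit(dep)
                if loop:
                    return loop
        path.pop()
        state[name] = DONE
        return None

    for name in sorted(requires):
        if state.get(name, UNSEEN) == UNSEEN:
            loop = visit(name)
            if loop:
                return loop
    return None


if __name__ == "__main__":
    print(find_loop({"a": ["b"], "b": ["c"], "c": ["a"]}))
```

Running something like this over a repository before publishing it is one cheap way to catch a loop on the server rather than on a few hundred clients.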

Also, yum achieves its greatest degree of scalability and efficiency if only rpm-based installation is permitted on all the systems using yum to keep up to date. Installing locally built software into /usr/local becomes Evil and must be prohibited (or done only by the truly knowledgeable, when truly necessary and subsequently kept up to date and maintained). Commercial packages, too, usually have to have their installation mechanisms circumvented and be repackaged into some sort of rpm for controlled distribution; this is usually a good idea anyway, as those cute little install mechanisms are often enormously stupid as they fail to replicate all of the functionality of yum and lock the system into libraries that rapidly become obsolete or that have unpatched security flaws.

For that reason, systems administrators of organizations will generally need to learn how to build rpms and set up local organization-specific repositories to handle packages that are not already built and available in the base distribution. An entire section of this HOWTO is devoted to a guide for repository maintainers and rpm builders, including some practices which (if followed) would make dependency and revision numbering problems far less common and life consequently good.

In the next few sections we will see where to get yum (if it isn’t already the basis of your operating system environment), how to install its server side support including a small repository, and then how to set up and test a yum client. Following that there will be a few sections on advanced topics and design issues; how to set up a repository in a complex environment, how to build rpm’s that are relatively unlikely to create dependency and revision problems in a joint repository, how to package third party (e.g. site licensed) software so it can be distributed, updated, and maintained via yum (linux software distributors take note!) and more.

1.5 Copyright

Yum HOWTO Copyright (c) 2003, 2010 by Robert G. Brown

Please freely copy and distribute (sell or give away) this document in any format. It’s requested that corrections and/or comments be forwarded to the document maintainer. You may create a derivative work and distribute it provided that you:

 

  • Send your derivative work (in the most suitable format such as sgml) to the LDP (Linux Documentation Project) or the like for posting on the Internet. If not the LDP, then let the LDP know where it is available.
  • License the derivative work with this same license or use GPL. Include a copyright notice and at least a pointer to the license used.
  • Give due credit to previous authors and major contributors.

 

If you’re considering making a derived work other than a translation, it’s requested that you discuss your plans with the current maintainer.

1.6 Disclaimer

Use the information in this document at your own risk. I disavow any potential liability for the contents of this document. Use of the concepts, examples, and/or other content of this document is entirely at your own risk.

All copyrights are owned by their owners, unless specifically noted otherwise. Use of a term in this document should not be regarded as affecting the validity of any trademark or service mark.

Naming of particular products or brands should not be seen as endorsements.

You are strongly recommended to make a backup of your system before major installations or changes, and to backup your system at regular intervals. Yum, like any other installation tool, can break software or even your entire installation, however carefully it has been designed not to do this very thing. Be especially cautious if your system is not yum-based from the beginning, or if you have done a lot of hand-building and installing of software before attempting to install yum, as you may already be in RPM Hell and not know it!

1.7 News

This is the first release of this document, so there isn’t much news.

Eventually we hope that the very latest version of this document can be obtained from a URL like http://yum.baseurl.org (which probably doesn’t work yet).

1.8 Credits/Acknowledgements

Too many to mention, at this point. Still, rgb wishes to lavishly praise Seth Vidal and Michael Stenner and Icon Riabitsev for all of the work they did making and documenting yum over the years. However, the entire AUTHORS list, and many of the members of the yum list, also deserve acknowledgement. Thanks, people!

Any comments or suggestions for yum can be mailed to the yum mailing list. You might visit the yum mailing list website and join this list if you are interested in development. Comments on or suggestions for this HOWTO should be mailed to rgb at phy dot duke dot edu directly.

1.9 Useful Links:

 
