Use cases for hardlinks?

Question

In what situations would one want to use a hard-link rather than a soft-link? I personally have never run across a situation where I'd want to use a hard-link over a soft-link, and the only use-case I've come across when searching the web is deduplicating identical files.

There are good answers below, but consider the (moot) historical context. When Unix was new, disk drives were slow and had limited capacity and buffering. A hard link was simply another direct entry in the file system to the same file. Whether you were accessing ls, or as you liked to call it, list, was irrelevant. If you had made list a soft link, its use would involve finding it in the directory, reading the special file called list, see that you want the file ls, find ls in the directory, and read the actual ls file from disk. A huge difference in performance! — RichF, 18 hours ago
@OrangeDog: Yes, but you only need a link-count field in the inode if you want to support multiple links. (You might need a flag for the in-memory version of inodes to handle the unlinked but still-open case. fsck after a crash without journaling would still have to look for inodes with no links either way.) — Peter Cordes, 5 hours ago
POSIX directory semantics would have to be designed differently: .. is always the same inode as . in the parent directory. Things like find can check that link-count=2 to detect leaf directories, and avoid stating the entries from readdir to look for subdirectories. But that's only a minor feature enabled by support for hardlinks of non-directory files (regular, symlink, device, socket, and named-pipe). (Yes, symlinks have their own inode, and can be hardlinked.) — Peter Cordes, 5 hours ago

Gypsy Spellweaver · Accepted Answer · 2017-01-28 03:58:08Z

Aside from the backup usage mentioned in another comment, which I believe also includes the snapshots on a BTRFS volume, a use-case for hard-links over soft-links is a tag-sorted collection of files. (This is not necessarily the best way to build a collection; a database-driven method is potentially better. But for a simple collection that's reasonably stable, it's not too bad.)

Consider a media collection where all files are stored in one flat directory and are sorted into other directories based on various criteria, e.g. year, subject, artist, or genre. This could be a personal movie collection, or a commercial studio's collective works. Once a file is essentially finished, it is saved, not likely to be modified, and sorted by links, possibly into multiple locations.

Bear in mind that the concepts of "original" and "copy" are not applicable to hard-links: every link to the file is an original, and there is no "copy" in the normal sense. For the description of the use-case, however, the terms mimic the logic of the behavior.

The "original" is saved in the "catalog" directory, and the sorted "copies" are hard-linked to those files. The file attributes on the sorting directories can be set to r/o, preventing any accidental changes to the file-names and sorted structure, while the attributes on the catalog directory can be r/w, allowing it to be modified as needed. (One case for that is music files, where some players attempt to rename and reorganize files based on tags embedded in the media file, user input, or internet retrieval.) Additionally, since the attributes of the "copy" directories can differ from those of the "original" directory, the sorted structure could be made available to the group, or world, with restricted access, while the main "catalog" is only accessible to the principal user, with full access. The files themselves, however, will always have the same attributes on all links to that inode. (ACLs could be explored to enhance that, but that's not my knowledge area.)

If the original is renamed or moved (because the single "catalog" directory becomes too large to manage, for example), the hard-links remain valid, while soft-links are broken. If the "copies" are moved and the soft-links are relative, then the soft-links will, again, be broken, and the hard-links will not be.
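
The behavior described above can be tried out with a small sketch. The directory and file names (catalog, by-genre, by-year, track01.flac) are made up for illustration, and `-ef` (same device and inode) is a common shell extension rather than strict POSIX.

```shell
# Hypothetical layout: catalog/ holds the files, by-genre/ and by-year/
# hold hard links to them, plus one relative symlink for comparison.
build_tag_tree() {
    root=$1
    mkdir -p "$root/catalog" "$root/by-genre/jazz" "$root/by-year/1959"
    printf 'audio data\n' > "$root/catalog/track01.flac"
    ln "$root/catalog/track01.flac" "$root/by-genre/jazz/track01.flac"
    ln "$root/catalog/track01.flac" "$root/by-year/1959/track01.flac"
    ln -s ../../catalog/track01.flac "$root/by-genre/jazz/track01.symlink"
}

root=$(mktemp -d)
build_tag_tree "$root"
mv "$root/catalog" "$root/archive"        # reorganise the "original" location
cat "$root/by-genre/jazz/track01.flac"    # still readable through the hard link
[ -e "$root/by-genre/jazz/track01.symlink" ] || echo "symlink is now dangling"
rm -rf "$root"
```

After the `mv`, both hard links still reach the data, while the relative symlink no longer resolves.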

Note: there seems to be inconsistency in how different tools report disk usage when soft-links are involved. With hard-links, however, it seems consistent. So with 100 files in a catalog sorted into a collection of "tags", there could easily be 500 linked "copies." (For a photograph collection, say: date, photographer, and an average of 3 "subject" tags.) Dolphin, for example, would report that as 100 files for hard-links, and 600 files if soft-links are used. Interestingly, it reports the same disk-space usage either way, so it looks like a large collection of small files for soft-links, and a small collection of large files for hard-links.

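A caveat to this type of use-case: a tool that saves changes by writing a new file and renaming it over the "original" (rather than updating it in place) detaches that name from the other hard-links, while soft-links keep following the path. Filesystem-level COW, as in btrfs, does not by itself break hard-links, since all links share one inode. But, if the intent is to have the master copy, after editing, saved, and sorted, this doesn't enter the scenario.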

FYI: btrfs snapshots aren't hardlinks. They have different behavior (e.g., modifying one copy doesn't modify the other). And stat will show only one link. — derobert, 19 hours ago
@derobert Not sure how snapshots work, little investigation shows interesting things. For unchanged files/directories stat show the same inode number, but different device ID. Must have something to do with the way subvolumes are overlaid on the main, rarely mounted, volume. I suspect that if the main volume was mounted stat would show a link count equal to the number of snapshots that held that version of the file. COW probably takes care of the modifying one not affecting any others. Mere speculation based on mild curiosity, but not curious enough to dig deeper. — Gypsy Spellweaver, 10 hours ago
Each symlink has its own inode, so it uses up an inode entry in the filesystem. Traditional Unix filesystems require you to choose how much space to reserve for inodes at FS creation time, instead of allocating it as-needed like XFS does. So it's actually significant that the symlink version would use up many more inodes (even besides the VFS cache footprint implications). — Peter Cordes, 5 hours ago

Stephen Kitt · Answer 2 · 2017-01-28 00:03:02Z

Hard links are useful for cases where you don't want the existence of one name to depend on the other. Consider this:

touch a
ln -s a b
rm a

Now b is useless. (And these steps may happen quite far apart, be done by different people, etc.)

Whereas with a hard link,

touch a
ln a b
rm a

b is still present and correct.
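
The two sequences above can be combined into one self-checking sketch. The function name and file names are illustrative only; the work happens in a throwaway directory from `mktemp -d`.

```shell
# Compare what survives when the original name is removed.
link_survival_demo() {
    dir=$(mktemp -d) || return 1
    printf 'payload\n' > "$dir/a"
    ln -s a "$dir/soft"        # symlink: stores the path "a"
    ln "$dir/a" "$dir/hard"    # hard link: a second name for the same inode
    rm "$dir/a"

    # -e follows symlinks, so a dangling symlink tests false.
    if [ -e "$dir/soft" ]; then echo "soft: ok"; else echo "soft: dangling"; fi
    if [ -e "$dir/hard" ]; then echo "hard: $(cat "$dir/hard")"; fi
    rm -rf "$dir"
}

link_survival_demo
```

The data is only freed once the last hard link is removed, which is why `hard` still returns the original content.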

But when is that desirable? I've never found myself in a situation where I want that behavior. – Matthew Cline 2 days ago

@MatthewCline You would want this behavior when managing efficient incremental backups. Especially when old backups are deleted, in a soft-link based backup system you would have to check and relink all newer backup files/links to a valid base again, whereas hardlinks do that job "for free" on inode level. timeshift/backintime for example use hardlinks extensively. – orzechow 2 days ago

@orzechow I don't think you want hard link behavior anywhere near your backup system. github.com/bit-team/backintime/wiki/… backintime foolishly assumes that all changes to files will be by a remove-create cycle rather than updating in place. – DepressedDaniel 2 days ago

@DepressedDaniel hard links are fine inside a backup system, you just don't want the backups to be hard linked to the live files. But in any case a backup should never be reachable directly from a live system... – Stephen Kitt yesterday

This is not an answer-- specifically, it's not a use case. It's just a demonstration of hard links behavior. – user394 yesterday

thrig · Answer 3 · 2017-01-28 16:35:56Z

A single program may change its behavior depending on what name it is launched as:

$ ls -li `which pgrep` `which pkill`
208330 -r-xr-xr-x  2 root  bin  19144 Jul 26  2016 /usr/bin/pgrep
208330 -r-xr-xr-x  2 root  bin  19144 Jul 26  2016 /usr/bin/pkill

Which over in the source is decided via something like

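/* __progname holds the name the program was invoked under (BSD libc) */
if (strcmp(__progname, "pgrep") == 0) {
    action = grepact;   /* invoked as pgrep: list matching processes */
    pgrep = 1;
} else {
    action = killact;   /* invoked under any other name (pkill): signal them */
}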

though the exact details will vary depending on the OS and language involved.

This saves compiling (mostly) identical code into two (mostly) identical binaries. Bear in mind that Unix dates from days when disk space was very expensive; according to Stevens in APUE, chapter 4, symlinks were introduced in 4.2BSD (1983) to get around various limitations of hardlinks. A test program to check whether the symlink name is used as the program name might look something like:

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
    printf("called as '%s'\n", *argv);
    exit(0);
}

And tested via:

$ cc -o myname myname.c 
$ ln -s myname alias
$ ./myname
called as './myname'
$ ./alias
called as './alias'
$
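
The same name check works when the second name is a hard link rather than a symlink. Below is a minimal sketch using a shell script instead of a C program; the names greet and farewell and the helper make_multicall are hypothetical. The script is run through `sh` so the sketch does not depend on the temporary directory allowing execution.

```shell
# Create one script with two names; it picks its behaviour from $0.
make_multicall() {
    cat > "$1/greet" <<'EOF'
#!/bin/sh
case "$(basename "$0")" in
    greet)    echo "hello" ;;
    farewell) echo "goodbye" ;;
    *)        echo "unknown name: $0" >&2; exit 1 ;;
esac
EOF
    chmod +x "$1/greet"
    ln "$1/greet" "$1/farewell"   # hard link: same inode, second name
}

dir=$(mktemp -d)
make_multicall "$dir"
sh "$dir/greet"       # prints: hello
sh "$dir/farewell"    # prints: goodbye
rm -rf "$dir"
```

Both names refer to a single file on disk, so there is only one copy of the code to update.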

But isn't that usually handled with softlinks? – Matthew Cline 2 days ago

@MatthewCline it could be today, but symlinks did not exist prior to 4.2BSD (1983) according to Stevens in APUE. – thrig 2 days ago

@thrig, the question specifically asks for use cases that cannot be accomplished by symlinks or are, at least, preferable to using symlinks. Your answer applies to both HLs and SLs. – Marcelo yesterday

BusyBox takes this to the maximum. – Max Ried yesterday

Martin Schröder · Answer 4 · 2017-01-29 00:59:50Z

Filesystems are a simple yet efficient way to organize and classify files (this is their primary reason for existence). Hardlinks allow a higher degree of flexibility in this matter.

As mentioned, there is no concept of original and copies when dealing with hardlinks. All directory entries (hardlinks) are simply references to the file (they point to its inode) with no precedence among them, hence there are also no broken hardlinks.

Here are some of the use cases that hardlinks serve but softlinks don't:

1. Imagine you have a collection of movies, music, or other media and want to apply several classification criteria at once: songs classified by artist in one branch (each artist has its own sub-directory), by genre in another branch (each genre in its own sub-directory), and so on. You don't want to duplicate the files, and you don't want to decide where to put the "original", so you keep the freedom to reclassify without having to "manage" and re-link files when moving them in order to avoid broken links.
2. Another reason is to avoid wasting the storage space that multiple copies of the same file would require, while still letting a chroot environment use a subset of the files in the "master" filesystem root (symbolic links can never reference files outside the chroot sandbox, even if they use relative paths).
3. Another very important but rarely mentioned reason for hardlinks to exist is the .. entry in every directory. The .. entries are (in most Unix filesystem implementations) hardlinks to the parent directory. Without hardlinks this would have to be implemented in a completely different way, while the existence of hardlinks makes it very easy to implement.
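
The last point can be checked from the shell. This sketch only shows that `..` and `.` resolve to the same inode as the directories they name; it says nothing about how a particular filesystem stores them. The directory names are made up, and `-ef` is a widely supported test extension (bash, dash, ksh) rather than strict POSIX.

```shell
# Show that child/.. and parent/. are the parent directory itself.
parent_link_demo() {
    dir=$(mktemp -d) || return 1
    mkdir -p "$dir/parent/child"
    # -ef is true when both paths resolve to the same device and inode.
    if [ "$dir/parent/child/.." -ef "$dir/parent" ]; then
        echo "child/.. is parent"
    fi
    if [ "$dir/parent/." -ef "$dir/parent" ]; then
        echo "parent/. is parent"
    fi
    rm -rf "$dir"
}

parent_link_demo
```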

For point 1, using uuids as the 'canonical' name for files, and making all the human-readable names symbolic links to the uuids, is an alternate solution. — R.., yesterday

phk · Answer 5 · 2017-01-28 12:22:00Z

I recently had a use case for a somewhat safe update procedure for U-Boot based systems, where uImage is a soft-link pointing to the image to boot. The idea was that a power outage should pose no problems, no matter at which point in the process it happens (assuming the file system plays along):

ln image.bin backup_image.bin
ln -sf backup_image.bin uImage

# replace image.bin here

ln -sf image.bin uImage
rm backup_image.bin

Without hardlinks it wouldn't be that simple.

+1 because this is conceptually a very nice reason, but unfortunately ln -sf is not atomic. It deletes the old symlink and makes a new one. To fix this you need to make a new symlink with a temporary name and rename(2) (mv) it to the name of the one you want to replace. – R.. yesterday

@R.. You're right! 😲 stat("uImage", {st_mode=S_IFREG|0777, st_size=0, ...}) unlink("uImage"), symlink("backup_image.bin", "uImage") – phk 20 hours ago

BTW, see here for my version of install.sh that solves the problem: git.musl-libc.org/cgit/musl/tree/tools/install.sh – R.. 19 hours ago
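
The temporary-name-plus-rename approach from the comments might be sketched as follows. This is not the linked install.sh; the function name atomic_symlink and the temporary suffix are assumptions made for illustration.

```shell
# Point the symlink $2 at target $1 by creating a temporary symlink
# and renaming it over the old one.
atomic_symlink() {
    target=$1
    link=$2
    tmp="$link.tmp.$$"     # hypothetical temporary name in the same directory
    ln -s "$target" "$tmp" || return 1
    # mv uses rename(2), which replaces the old symlink in one step.
    # Plain mv is only safe here because the old link does not point at a
    # directory; if it did, mv would move $tmp into that directory
    # (GNU mv -T avoids that).
    mv -f "$tmp" "$link"
}

dir=$(mktemp -d)
: > "$dir/image.bin"
ln "$dir/image.bin" "$dir/backup_image.bin"
atomic_symlink image.bin "$dir/uImage"
atomic_symlink backup_image.bin "$dir/uImage"
readlink "$dir/uImage"    # prints: backup_image.bin
rm -rf "$dir"
```

At every instant uImage either points at the old target or at the new one; there is no window in which it is missing.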

Kamil Maciorowski · Answer 6 · 2017-01-29 14:29:58Z

When my P2P software finishes downloading a certain file, the file is placed in a specific directory. Downloaded files hardly ever need to be edited. The common case is I make a hardlink in a different directory where I need the file to be.

Advantages:

I still share the file in P2P network as I should even if I 'rm' or mv the "copy".
The file is also at the path where I need it; most of such locations are not shared.
I can rm the "original" to stop sharing the file; this operation doesn't affect the "copy" in desired place.
My diskspace is used just once.

The main point: if I knew in advance which file I would rm first, I might go with symlink. But I never know.

Paul Draper · Answer 7 · 2017-01-29 23:09:43Z

Very common, real-world example that needs hardlinks:

git clone --local <repository>

This clones from a local Git repo with nearly zero copying. Instead of copying the object files (immutable files used by Git for its "database"), it simply hardlinks them.

Any repo can remove an object, but the inode stays valid for the rest of the repos. And if an object is removed from all repos, it's deleted from disk. Hard links make for a beautifully robust and fast solution. Very common in CI servers.

There is a non-hard-link version: git clone --shared <repository>. This, however, has a lot of caveats, since every clone depends on the same object directory.

Dan Pritts · Answer 8 · 2017-01-30 04:00:17Z

BackupPC is a backup system that uses hard links on the servers to provide file-level deduplication.

Hard links are superior to soft links here because they provide automatic reference counting. Files are first stored in a "pool" directory tree based on their md5 hash. Any backup that makes use of that file makes a hard link to the pool file. As backups expire or are deleted, their hard links are removed from the filesystem.

A cron job periodically deletes any files in the pool directory that don't have more than one link.
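
The cleanup step can be approximated with `find`, whose `-links` test matches on link count. This is a sketch of the idea only, not BackupPC's actual job; the function name prune_pool and the directory layout are invented.

```shell
# Remove pool files that no backup links to any more (link count 1).
prune_pool() {
    find "$1" -type f -links 1 -exec rm -f {} +
}

dir=$(mktemp -d)
mkdir "$dir/pool" "$dir/backup1"
echo kept   > "$dir/pool/aaa"
echo orphan > "$dir/pool/bbb"
ln "$dir/pool/aaa" "$dir/backup1/report.txt"   # a backup references aaa
prune_pool "$dir/pool"
ls "$dir/pool"    # only aaa remains
rm -rf "$dir"
```

The filesystem's link count does the reference counting, so no separate index of which backups use which pool file is needed.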

This method has some disadvantages (principally, that it is difficult to use filesystem-based tools to replicate the backup store), but it's proven to be quite robust in practice.

Another use case: the Tomcat Java web application server treats file names as metadata (a Java "war" file must be named based on its path on the web server).

e.g.: foo.war is the Java code that serves the URL /foo

Unfortunately, it resolves symlinks before making this decision.

So, say you want to deploy an application build, and give it a descriptive file name (e.g., with a release number or date). You can't make a symlink to the file with the "real" name - you have to make a hardlink.

foo.war symlinked to foo-20170129.war doesn't work

foo.war hardlinked to foo-20170129.war does work.

I don't like this Tomcat behavior, but hardlinks give me a way around it.

Thomas Padron-McCarthy · Answer 9 · 2017-01-30 09:17:54Z

One use that I have had for hard links is when downloading or uncompressing a broken file. The program that does the downloading or uncompressing (such as unzip or unrar) will often automatically remove the incomplete file when it encounters an error, and there is usually no option to keep it. If I want to keep the file, I can make a hard link to it.
