Find files which belong to a user or Unix group

If you need to find all the files owned by a certain Unix user or group in a given directory, it’s easy to do with the find command.

How to find files owned by a user

If you know the username, this is the command you might use to locate all the files which belong to it:

ubuntu$ find /var -user www-data
/var/cache/apache2/mod_disk_cache
/var/lib/awstats
...

In this case, we’re looking for all the files and directories under /var owned by www-data, the user the web server runs as.

Find files owned by a Unix group

If you’re doing a broader search, you may be interested in identifying all the files owned by a Unix group. Here’s how you can do it:

ubuntu$ find /usr -group staff
/usr/local/lib/site_ruby
/usr/local/lib/site_ruby/1.8
/usr/local/lib/site_ruby/1.8/x86_64-linux
/usr/local/lib/python2.5
/usr/local/lib/python2.5/site-packages
/usr/local/lib/python2.4
/usr/local/lib/python2.4/site-packages
/usr/local/share/fonts

In this example, we’re using find to locate all the files in /usr owned by the staff group.

Locate files by UID or GID

If you’re more comfortable dealing with Unix user IDs (UIDs) and group IDs (GIDs), you can use them with the find command as well. In this example, I’m looking for the temporary files I created myself (my UID on that system is 1000):

ubuntu$ find /var/tmp -uid 1000
/var/tmp/photos.rar
/var/tmp/mysql.tar.gz
...
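If you don’t want to hard-code the UID, you can ask the id command for it. The following is only a sketch: the scratch directory and file names are made up for illustration, and -uid is an extension offered by GNU and BSD find rather than something every Unix guarantees.

```shell
#!/bin/sh
# Hypothetical scratch directory, so the example does not depend on /var/tmp.
workdir=$(mktemp -d)
touch "$workdir/photos.rar" "$workdir/mysql.tar.gz"

# id -u prints the numeric UID of the current user.
my_uid=$(id -u)

# -type f limits the results to regular files, so the directory itself is skipped.
owned=$(find "$workdir" -type f -uid "$my_uid")
echo "$owned"

rm -rf "$workdir"
```

Swapping -uid for -gid together with id -g should give the same kind of search by primary group.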

I presume this post answers your questions on the topic, but do let me know if there’s anything else you’d like me to explain!




How To Compare Directories in Unix

Certain situations require you to quickly confirm which files differ between two directories. While your particular requirements may suggest writing a script for this task, I want to make sure you’re familiar with the basics first: the majority of directory comparisons can be done using the diff command (yes, that’s right, the same one used for comparing files).

Why compare directories?

First of all, let’s agree on why you may need to compare directories. There’s a few possible reasons:

  • comparing the amount of space consumed by two directories – this is the very first and the fastest way to compare directories because it gives you an idea of how close the directories are in terms of space usage. For example, if you’re comparing two daily backups of the same piece of software, you normally don’t expect them to be vastly different.
  • identifying if some files are missing from one of the directories – can be useful when you want to make sure two directories with configuration files for a certain package are identical – files can be different, but the same files are present in the same locations for both directories
  • confirming if files in two directories are the same – a typical task when comparing your actual data against a backup copy. When something goes wrong, this is one of the first things you do to make sure all the important files are not only present, but are actually the same as they have been when you took the last backup copy
  • highlighting textual differences between files in directories – this is a useful exercise when you’re looking at two similar directories and expect only minor changes between the files – version numbers, different file or directory names hardcoded in various scripts, etc.

Comparing the size of two directories

I’m going to show you this trick before getting into the details of using the diff command. For size comparison we should use the du command, and it’s really easy.

The options used for the du command in the example below are: -s for summary (calculate the directory size based on the sizes of all the possible subdirectories it may have) and -k for kilobytes, so /usr/lib is roughly 400 MB in size as per the output below.

ubuntu$ du -sk /usr/lib /usr/lib64
404196  /usr/lib
0       /usr/lib64

This sample output tells you that the directories are vastly different. That may save you time, because you may choose not to compare anything file by file if one of the directories looks empty or is far off in terms of space consumption.
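If you want to make this check from a script, the size can be cut out of the du output. This is a rough sketch only: dir_kb is a hypothetical helper name, and the two scratch directories stand in for whatever you actually want to compare.

```shell
#!/bin/sh
# dir_kb is a hypothetical helper: print a directory's size in kilobytes.
dir_kb() {
    # du separates the size and the path with a tab, which is cut's default delimiter.
    du -sk "$1" | cut -f1
}

a=$(mktemp -d)
b=$(mktemp -d)
printf '%20000s' '' > "$a/data"    # 20000 bytes of spaces in the first directory only

size_a=$(dir_kb "$a")
size_b=$(dir_kb "$b")
echo "first: ${size_a}K, second: ${size_b}K"

rm -rf "$a" "$b"
```

Keep in mind that du reports disk usage rather than the sum of file lengths, so the exact numbers depend on the filesystem.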

Test setup for diff comparison exercises

For today’s post, I’ve created a set of directories and files to show how you can compare them. Here is the setup:

ubuntu$ find /tmp/dir1 /tmp/dir2
/tmp/dir1
/tmp/dir1/file1
/tmp/dir1/file2
/tmp/dir1/dir11
/tmp/dir1/dir11/file11
/tmp/dir1/dir11/file12
/tmp/dir2
/tmp/dir2/file1
/tmp/dir2/dir11
/tmp/dir2/dir11/file11
/tmp/dir2/dir11/file12
/tmp/dir2/file3

As you can see, I’ve got two directories: /tmp/dir1 and /tmp/dir2, with a dir11 subdirectory in each of them. There’s also a few files here and there, some of them missing from one of the directories specifically to be highlighted by our comparison exercises.
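If you’d like to follow along, a similar layout can be recreated with a few commands. Treat this as a sketch under assumptions: the file contents are invented, and BASE is a scratch directory made by mktemp instead of /tmp itself, so nothing you already have gets overwritten.

```shell
#!/bin/sh
# BASE is a hypothetical scratch location; the post uses /tmp directly.
BASE=$(mktemp -d)
mkdir -p "$BASE/dir1/dir11" "$BASE/dir2/dir11"

# file1 and dir11/file12 exist on both sides with different (assumed) contents.
echo "version 1" > "$BASE/dir1/file1"
echo "version 2" > "$BASE/dir2/file1"
echo "old" > "$BASE/dir1/dir11/file12"
echo "new" > "$BASE/dir2/dir11/file12"

# dir11/file11 is identical on both sides.
echo "same" > "$BASE/dir1/dir11/file11"
echo "same" > "$BASE/dir2/dir11/file11"

# file2 and file3 each exist on one side only.
echo "only in dir1" > "$BASE/dir1/file2"
echo "only in dir2" > "$BASE/dir2/file3"

find "$BASE/dir1" "$BASE/dir2" | sort
```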

Basic diff usage for comparing directories

The easiest way to get started is to simply invoke diff command and specify two directories as command line parameters. Here’s what you will probably see:

ubuntu$ diff /tmp/dir1 /tmp/dir2
Common subdirectories: /tmp/dir1/dir11 and /tmp/dir2/dir11
diff /tmp/dir1/file1 /tmp/dir2/file1
Only in /tmp/dir1: file2
Only in /tmp/dir2: file3

This output confirms that /tmp/dir1 and /tmp/dir2 both contain a dir11 directory, and also shows that /tmp/dir1/file1 and /tmp/dir2/file1 are actually different files even though they have the same name. By default, diff compares such files and you can see the result of each comparison in the output. Also included are pointers to the files which are present only in one of the compared directories: you can see that file2 can only be found in /tmp/dir1 and file3 was present only in /tmp/dir2.

Find which files are missing in one of the directories

From the example above, it is easy to deduce that the command line for identifying files missing in one of the directories will be this one:

ubuntu$ diff /tmp/dir1 /tmp/dir2 | grep Only
Only in /tmp/dir1: file2
Only in /tmp/dir2: file3

Highlight the different files, not the differences

If you’re only interested in files which exist in both directory structures, but are different – you can use a special command line option. It will simply point the files out, without getting into any further details. You’ll probably notice how this output is very similar to the default one:

ubuntu$ diff --brief /tmp/dir1 /tmp/dir2
Common subdirectories: /tmp/dir1/dir11 and /tmp/dir2/
Files /tmp/dir1/file1 and /tmp/dir2/file1 differ
Only in /tmp/dir1: file2
Only in /tmp/dir2: file3

Note how instead of showing the difference between file1 in /tmp/dir1 and /tmp/dir2, this time you only get told that these two files are different.

How to recursively compare directories

If you’re dealing with a complex directory structure, you’ll be glad to know that the --recursive parameter for the diff command compares not only the immediate directories pointed to from the command line, but also walks through the full tree of subdirectories:

ubuntu$ diff --recursive --brief /tmp/dir1 /tmp/dir2
Files /tmp/dir1/dir11/file12 and /tmp/dir2/dir11/file12 differ
Files /tmp/dir1/file1 and /tmp/dir2/file1 differ
Only in /tmp/dir1: file2
Only in /tmp/dir2: file3
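In a script you often don’t need the list at all, only a yes or no answer. The diff command reports that through its exit status: 0 when nothing differs, 1 when differences were found, and 2 when something went wrong. Here is a small sketch using two made-up scratch directories and the short forms -r and -q of --recursive and --brief:

```shell
#!/bin/sh
# Two hypothetical scratch directories with one differing file.
a=$(mktemp -d)
b=$(mktemp -d)
echo "one" > "$a/file1"
echo "two" > "$b/file1"

# The output is discarded; only the exit status matters here.
diff -r -q "$a" "$b" > /dev/null
status=$?

case $status in
    0) result="identical" ;;
    1) result="different" ;;
    *) result="error" ;;
esac
echo "$result"

rm -rf "$a" "$b"
```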

Feeling better now? Many directory comparison tasks can be accomplished using the diff command, but if you’re stuck with a particular problem which can’t be solved using my examples, please leave a comment and I’ll come up with a solution.





Using variables in Unix shell scripts

Any Unix shell script longer than a line will most likely involve using variables. Variables store temporary values to simplify using them in the various Unix commands of your script. The beauty of variables is that they can be evaluated and set once, and then reused as many times as you like without your shell interpreter having to re-evaluate them.

Defining a variable in Unix shell

To define a variable, you need to decide on its name, which can be any combination of English letters, digits and underscores that doesn’t start with a digit, and then specify the value.

In Bourne shell (sh), Bourne Again Shell (bash) and Korn Shell (ksh), here’s how you would define a new variable and assign it a value:

CONFIG_FILE="/etc/myfile"

In C shell (csh), setenv defines an environment variable (set would define a shell-local one instead):

setenv CONFIG_FILE "/etc/myfile"

Basic variables usage

To access the value of a variable, you need to use the same variable name, but with a dollar sign in front of it, like this:

$CONFIG_FILE

Important: to set a new value to the variable, you use just the variable name like CONFIG_FILE, but to access this value later you need to use the $CONFIG_FILE form.

The most basic way to use variables is to assign them constant values at the beginning of your script. This is a good way to define locations of standard files or directories your script will work with, etc:

#!/bin/sh
#
CONFIG_FILE=/etc/myfile
MY_DIR=/etc
echo $CONFIG_FILE
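Two habits are worth picking up early: quoting variables when you use them, and wrapping the name in braces when other text follows it directly. A minimal sketch, where BACKUP_FILE and TITLE are names invented for this example:

```shell
#!/bin/sh
# No spaces are allowed around = in an assignment.
MY_DIR=/etc
CONFIG_FILE="$MY_DIR/myfile"          # build one variable from another

# Braces mark where the name ends; without them the shell
# would look for a variable called CONFIG_FILE_backup.
BACKUP_FILE="${CONFIG_FILE}_backup"

# Double quotes keep a value with spaces together as one word.
TITLE="my config"
echo "$TITLE: $CONFIG_FILE -> $BACKUP_FILE"
```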

Using output of Unix commands to set variables

One of the best things about shell scripting is that it’s very easy to use any Unix command to generate the output and use it to set the variable.

In this example, I’m running a date command and saving its output as values for my variables:

ubuntu$ cat /tmp/1.sh
#!/bin/sh
#
STARTED=`date`
sleep 5
FINISHED=`date`
#
echo "Script start time: $STARTED"
echo "Script finish time: $FINISHED"

If I run this simple script, I see the following:

ubuntu$ /tmp/1.sh
Script start time: Wed May 7 04:56:51 CDT 2008
Script finish time: Wed May 7 04:56:56 CDT 2008

The same approach can be used for practically any scenario.

Here’s an example of using uname command to extract some useful information about our system:

#!/bin/sh
#
STARTED=`date`
NODE=`uname -n`
OS=`uname -o`
CPU=`uname -p`
FINISHED=`date`
#
echo "Nodename: $NODE"
echo "OS type: $OS"
echo "Processor: $CPU"
echo "Script start time: $STARTED"
echo "Script finish time: $FINISHED"

And this is how it works:

ubuntu$ /tmp/1.sh
Nodename: ubuntu
OS type: GNU/Linux
Processor: unknown
Script start time: Wed May  7 05:05:31 CDT 2008
Script finish time: Wed May  7 05:05:31 CDT 2008
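The same thing can be written with the $(...) form, which does the same job as backticks and is easier to read and to nest. The sketch below also works out how long the script ran. It assumes your date command understands the +%s format (seconds since the epoch), which is common but worth checking on older systems.

```shell
#!/bin/sh
# $(...) captures command output just like backticks do.
NODE=$(uname -n)

START=$(date +%s)      # seconds since the epoch (assumes date supports %s)
sleep 1
END=$(date +%s)

# Shell arithmetic: subtract the two timestamps.
ELAPSED=$((END - START))
echo "Ran on $NODE for $ELAPSED second(s)"
```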

That’s it for today! Let me know what next topic about Unix shell scripting you’d like to see covered, and I’ll do my best to explain it in the coming posts.





How To Find a Location of a Directory in Unix

Very quick tip for you today. I see that many visitors of this blog are curious how they can find a directory in Unix, so here’s a command to help you do just that.

Finding directories in Unix

There’s nothing better than to employ the find command. As you might remember, among many things, this wonderful tool allows you to search files by their type. Since nearly everything in Unix is a file, this means you can find directories.

Let’s take an example: if you want to find out everything about your MySQL installation, you can search across your filesystems to find all the directories called mysql.

Here is how you would find every directory called mysql, starting from the root directory (/):

ubuntu# find / -name mysql -type d
/var/log/mysql
/var/lib/mysql
/var/lib/mysql/mysql
/etc/mysql
/usr/lib/perl5/DBD/mysql
/usr/lib/perl5/auto/DBD/mysql
/usr/share/mysql

As you can see, there are quite a few directories which belong to MySQL, and you can see from the list that MySQL configuration is most likely to be in /etc/mysql directory.

Narrowing down directory search in Unix

If you search across all your filesystems, it may take too much time. That’s why it makes sense to narrow your search using common sense.

For example, if you’re looking for a configuration file of some standard package of software, most likely it will be under /etc directory, so you can specify it and greatly reduce the searching time.

In this example, we’re narrowing directory search to only those directories that are part of /etc:

ubuntu# find /etc -name mysql -type d
/etc/mysql
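When you run such a search as a regular user, find tends to print “Permission denied” errors for directories you can’t read. Sending the error stream to /dev/null keeps the output clean. This is a sketch against a made-up directory tree, with a file named mysql.conf thrown in to show that -type d really does skip files:

```shell
#!/bin/sh
# Hypothetical directory tree standing in for a real filesystem.
root=$(mktemp -d)
mkdir -p "$root/etc/mysql" "$root/var/lib/mysql"
touch "$root/etc/mysql.conf"     # a regular file whose name also starts with mysql

# 'mysql*' matches both the directory and the file; -type d keeps directories only.
# 2>/dev/null hides error messages such as "Permission denied".
found=$(find "$root/etc" -type d -name 'mysql*' 2>/dev/null)
echo "$found"
```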





How To Find Your UID From Bash

I see this question a lot in the search engine requests which lead to this blog. And if you’re interested in how this is done, I’m happy to explain.

Standard shell environment variables pointing to user id

Not only in bash, but in any other shell on your system, there’s quite a few standard environment variables set by default when you log in. Among them there are ones which contain your username, and so you can use them to find out the uid as well.

The variables I’m talking about are USER and USERNAME. Both of them should contain the same user name, in my example it’s greys:

ubuntu:~$ echo $USER
greys
ubuntu:~$ echo $USERNAME
greys

Knowing the username, it’s very easy to use the id command to confirm the user id:

greys@ubuntu:~$ id -u greys
1000

Bash environment variables

In Bash specifically, there are also a few variables automatically set for your convenience so that you don’t have to figure out the UID based on the username.

In fact, there are three variables which you will find useful:

  • UID – your user ID
  • EUID – your effective user ID
  • GROUPS – array of all the Unix groups you belong to

Here is how to show the values of these variables:

ubuntu$ echo $UID
1000
ubuntu$ echo $EUID
1000
ubuntu$ echo $GROUPS
113
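Note that echo $GROUPS prints only the first element of the array; in Bash, "${GROUPS[@]}" expands to all of them. If your script may run under a shell that doesn’t set these variables (dash, for example, has no UID variable), the id command gives you the same information. A small sketch, with my_uid and my_gids as names picked for the example:

```shell
#!/bin/sh
# id -u prints the effective user ID (id -ur would print the real one).
my_uid=$(id -u)

# id -G prints the IDs of all the groups the user belongs to, separated by spaces.
my_gids=$(id -G)

echo "uid=$my_uid groups=$my_gids"
```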



How To Find What Symlink Points To

To some this may seem like a trivial task, but I see great interest from Unix/Linux beginners arriving at this blog: how exactly does one confirm what a symlink points to?

First of all, if you haven’t already done so – read my Unix Symlink Example post to learn what a symlink is and to refresh your mind about creating symlinks.

If you’re still reading, perhaps a bit more explanation is needed.

Since a symlink is nothing but a special Unix file, you can use all the standard Unix commands to work with it, for example to list it or remove it.

Listing symlinks

For listing files, the most obvious choice is a Unix ls command, and the way you use it to list symlink is the same way you’d list any other file:

ubuntu$ ls ubuntu-release 
ubuntu-release

Now we come to exactly the reason why some people don’t find working with symlinks obvious: when we list the symlink, we expect to see the file it points to. More precisely, we expect to see the name of this file.

Showing what a symlink points to

To show what a symlink points to, you need to use a long format of the ls command:

ubuntu$ ls -l ubuntu-release 
lrwxrwxrwx 1 greys greys 10 2008-03-23 16:53 ubuntu-release -> /etc/issue

As you can see, the ubuntu-release symlink points to the /etc/issue file.
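If you need only the target, for example to store it in a variable inside a script, the readlink command prints it without the rest of the ls output. A minimal sketch, using a scratch directory and a stand-in file instead of the real /etc/issue:

```shell
#!/bin/sh
# Hypothetical scratch directory with a stand-in target file.
dir=$(mktemp -d)
echo "Ubuntu" > "$dir/issue"
ln -s "$dir/issue" "$dir/ubuntu-release"

# readlink prints the path stored inside the symlink.
target=$(readlink "$dir/ubuntu-release")
echo "$target"
```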




How To Find the Largest Files in your Unix system

I see that my Finding Large Files and Directories post is quite popular, yet there are a few more ways to simplify your search for the largest disk space consumers in your Unix system.

Make find command show file sizes

If you remember, the default way a find command reports results includes only the fully qualified (that means including the full path) filenames.

Now, if you look at the task of identifying the largest files, it’s great if you can get a list of all the files bigger than some figure you specify, but what would be even better is to include the exact size of each file right in the output of the find command.

Here’s how you do it: it’s possible to specify which information about each file you’d like to see. Check out the find command man page for all the possibilities, but in today’s example I’m using two parameters: %s means the size of a file in bytes and %p means the filename including its full path.

Let’s say I want to get a list of all the files under /usr directory which are larger than 15Mb each, and show the exact size of each file. Here’s how it can be done:

ubuntu$ find /usr -size +15M -printf "%s - %p\n"
39859372 - /usr/lib/vmware/webAccess/java/jre1.5.0_07/lib/rt.jar
35487120 - /usr/lib/vmware/bin/vmware-hostd
16351166 - /usr/lib/vmware/bin/vmplayer
38353296 - /usr/lib/vmware/hostd/libtypes.so
54366585 - /usr/lib/vmware/hostd/docroot/client/VMware-viclient.exe
92143616 - /usr/lib/vmware/isoimages/linux.iso
23494656 - /usr/lib/vmware/isoimages/windows.iso
47070920 - /usr/lib/libgcj.so.81.0.0
20890468 - /usr/share/fonts/truetype/arphic/uming.ttf
17733780 - /usr/share/icons/crystalsvg/icon-theme.cache
18597793 - /usr/share/myspell/dicts/th_en_US_v2.dat
45345879 - /usr/src/linux-source-2.6.22.tar.bz2

Just to help you refresh your mind, here’s the explanation of all the parameters in the command line:

  • /usr is the directory where we’d like to find the files of interest
  • -size +15M narrows our interest to only the files larger than 15Mb
  • -printf "%s - %p\n" is the magic which shows the nice list of files along with their sizes.

Sort the list of files by filesize

The next really useful thing we could do is to sort this list, just so that we could see a nice ordered representation of how big each file is. It’s very easily done by piping the output of the find command to a sort command:

ubuntu$ find /usr -size +15M -printf "%s - %p\n" | sort -n
16351166 - /usr/lib/vmware/bin/vmplayer
17733780 - /usr/share/icons/crystalsvg/icon-theme.cache
18597793 - /usr/share/myspell/dicts/th_en_US_v2.dat
20890468 - /usr/share/fonts/truetype/arphic/uming.ttf
23494656 - /usr/lib/vmware/isoimages/windows.iso
35487120 - /usr/lib/vmware/bin/vmware-hostd
38353296 - /usr/lib/vmware/hostd/libtypes.so
39859372 - /usr/lib/vmware/webAccess/java/jre1.5.0_07/lib/rt.jar
45345879 - /usr/src/linux-source-2.6.22.tar.bz2
47070920 - /usr/lib/libgcj.so.81.0.0
54366585 - /usr/lib/vmware/hostd/docroot/client/VMware-viclient.exe
92143616 - /usr/lib/vmware/isoimages/linux.iso

As you can see, the smallest files (just above 15Mb) are at the top of the list, and the largest ones are at the bottom.

Limit the number of files returned by find

The last trick I’ll show you today is going to make your task even easier: why look at pages of find command output if you’re after only the largest files? After all, your list can be much longer than the one shown above. To solve this little problem we’ll pipe the output of all the commands to yet another Unix command, tail.

The tail command shows only the last lines of its standard input or of any Unix text file you point it to. By default, it shows the last 10 lines, which can be enough for your purposes.

Here’s how you can get a list of the 10 largest files under /usr:

ubuntu$ find /usr -size +15M -printf "%s - %p\n" | sort -n | tail
18597793 - /usr/share/myspell/dicts/th_en_US_v2.dat
20890468 - /usr/share/fonts/truetype/arphic/uming.ttf
23494656 - /usr/lib/vmware/isoimages/windows.iso
35487120 - /usr/lib/vmware/bin/vmware-hostd
38353296 - /usr/lib/vmware/hostd/libtypes.so
39859372 - /usr/lib/vmware/webAccess/java/jre1.5.0_07/lib/rt.jar
45345879 - /usr/src/linux-source-2.6.22.tar.bz2
47070920 - /usr/lib/libgcj.so.81.0.0
54366585 - /usr/lib/vmware/hostd/docroot/client/VMware-viclient.exe
92143616 - /usr/lib/vmware/isoimages/linux.iso
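If you’d rather see the biggest file first, reverse the sort and take the top of the list with head instead. The sketch below uses three tiny made-up files so it runs anywhere; like the commands above, it relies on -printf, which is a GNU find feature.

```shell
#!/bin/sh
# Hypothetical scratch directory with files of 100, 300 and 200 bytes.
dir=$(mktemp -d)
printf '%100s' '' > "$dir/small"
printf '%300s' '' > "$dir/large"
printf '%200s' '' > "$dir/medium"

# -type f skips directories; sort -rn puts the largest size first;
# head -n 2 keeps only the top two lines.
top=$(find "$dir" -type f -printf "%s - %p\n" | sort -rn | head -n 2)
echo "$top"
```

For a real search you would put the -size +15M filter back in and use head -n 10.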

Show the largest 10 files in your Unix system

Now that you know all the most useful tricks, you can easily identify and show the list of the 10 largest files in your whole system. Bear in mind, that you should probably run this command with root privileges, as files in your system belong to various users, and a single standard user account will most likely have insufficient privileges to even list such files.

If you’re trying to locate your largest files in Ubuntu, use the sudo command (assuming you have the sudo privileges to become root):

ubuntu$ sudo find / -size +15M -printf "%s - %p\n" | sort -n | tail

Alternatively, just become root by doing something like this (you obviously need to know the root password to do that):

$ su - root 

and then run the find command itself. Here’s how the output looks on my Ubuntu desktop:

ubuntu$ find / -size +15M -printf "%s - %p\n" | sort -n | tail
39859372 - /usr/lib/vmware/webAccess/java/jre1.5.0_07/lib/rt.jar
45345879 - /usr/src/linux-source-2.6.22.tar.bz2
45356784 - /var/cache/apt/archives/linux-source-2.6.22_2.6.22-14.52_all.deb
45424028 - /var/cache/apt/archives/kde-icons-oxygen_4%3a4.0.2-0ubuntu1~gutsy1~ppa1_all.deb
47070920 - /usr/lib/libgcj.so.81.0.0
54366585 - /export/dist/vmware/server2b2/vmware-server-distrib/lib/hostd/docroot/client/VMware-viclient.exe
54366585 - /usr/lib/vmware/hostd/docroot/client/VMware-viclient.exe
92143616 - /export/dist/vmware/server2b2/vmware-server-distrib/lib/isoimages/linux.iso
92143616 - /usr/lib/vmware/isoimages/linux.iso
340199772 - /export/dist/vmware/server2b2/VMware-server-e.x.p-63231.x86_64.tar.gz

That’s it for today, hope this helps! Please bookmark this post if you liked it, and leave comments if there are any questions!




How to Find the Owner of a File in Unix

Surprisingly, I see quite a few questions around file ownership asked all the time. And one of the first questions asked concerns the Unix user who owns a particular file.

It’s very easy to confirm who the owner of a file is, and you can do it using the ls command.

Find the owner of a file

Using the -l command line option, you make ls return its output in a long format. Two fields in each line of the output show the username and the group name which the file belongs to.

In this example, we can see that /etc/passwd belongs to a user root and a unix group called root:

ubuntu$ ls -l /etc/passwd
-rw-r--r-- 1 root root 1443 Jan 30 16:49 /etc/passwd

And if I look at one of my own files, you can see that it belongs to me (greys) and to my primary unix group (admin):

ubuntu$ ls -l /home/greys/myfile.txt
-rw-r--r-- 1 greys admin 0 Mar 20 05:13 /home/greys/myfile.txt
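To pick the owner and group out of that line in a script, you can let awk print the third and fourth fields. This is just a sketch on a temporary file; the variable names are made up for the example.

```shell
#!/bin/sh
# Hypothetical temporary file owned by whoever runs the script.
file=$(mktemp)

# Fields 3 and 4 of the long listing are the owner and the group.
owner=$(ls -ld "$file" | awk '{print $3}')
group=$(ls -ld "$file" | awk '{print $4}')

# With -n, ls prints the numeric UID and GID instead of the names.
owner_uid=$(ls -ldn "$file" | awk '{print $3}')

echo "$file: owner=$owner group=$group uid=$owner_uid"
```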

That’s it. As you can see, it’s not rocket science. Do ask questions in the comments if you’re still not sure about the details!




Unix filesystem basics: symlink example

I can see some of you have arrived at my Unix file types post looking for an example of using symlinks in Unix. Today I would like to give you a quick introduction to Unix symlinks.

What is symlink?

Symlink, short for symbolic link (sometimes also referred to as a soft link), is a special type of file in Unix which references another file or directory. A symlink contains the name of another file and no actual data. To most commands, a symlink looks like a regular file, but all the operations (like reading from a file) are redirected to the file the symlink points to.

How to create a Unix symlink

Just to give you an example, here’s how a typical symlink can be created and verified.

First, we create a new text file called /tmp/file.1:

greys@ubuntu:~$ echo "Hello from file 1" > /tmp/file.1
greys@ubuntu:~$ cat /tmp/file.1
Hello from file 1

Now, we create a symlink called /tmp/file.2, which points to the original file of ours. We use the standard Unix ln command, first specifying the target file (the real file we want our symlink to point to), then specifying the name of our symbolic link:

greys@ubuntu:~$ ln -s /tmp/file.1 /tmp/file.2

If we look at both files, here’s what we see:

greys@ubuntu:~$ ls -al /tmp/file*
-rw-r--r-- 1 greys greys 18 2008-02-07 22:22 /tmp/file.1
lrwxrwxrwx 1 greys greys 11 2008-02-07 22:23 /tmp/file.2 -> /tmp/file.1

If you notice, the /tmp/file.2 has an “l” in the long-format output of the ls command, which confirms it’s a symbolic link. You also can see right away where this symlink points to.

Just to confirm the typical behaviour of a symlink, here’s what happens when we try to show the contents of the /tmp/file.2: we see the contents of the file it points to, /tmp/file.1:

greys@ubuntu:~$ cat /tmp/file.2
Hello from file 1

How to remove a symlink

Guess what happens when you remove a symlink? Actually, not much. You only remove the symlink file itself, not the data file it refers to. Here’s what I mean:

greys@ubuntu:~$ rm /tmp/file.2
greys@ubuntu:~$ ls -al /tmp/file*
-rw-r--r-- 1 greys greys 18 2008-02-07 22:22 /tmp/file.1

While we’re at it, I would also like to explain what an orphan symlink is: it’s a symbolic link which points nowhere, because the original target file it used to point to doesn’t exist anymore.

Here is how an orphan symlink looks. First off, we recreate the symlink and verify it points to /tmp/file.1 once again:

greys@ubuntu:~$ ln -s /tmp/file.1 /tmp/file.2
greys@ubuntu:~$ ls -al /tmp/file*
-rw-r--r-- 1 greys greys 18 2008-02-07 22:22 /tmp/file.1
lrwxrwxrwx 1 greys greys 11 2008-02-07 22:38 /tmp/file.2 -> /tmp/file.1

Now, we simply rename the /tmp/file.1 file to /tmp/file.3:

greys@ubuntu:~$ mv /tmp/file.1 /tmp/file.3

This, naturally, makes /tmp/file.2 an orphan symlink which points to the old /tmp/file.1, but there isn’t a file like this anymore. An attempt to show the contents of /tmp/file.2 will thus fail:

greys@ubuntu:~$ ls -al /tmp/file*
lrwxrwxrwx 1 greys greys 11 2008-02-07 22:38 /tmp/file.2 -> /tmp/file.1
-rw-r--r-- 1 greys greys 18 2008-02-07 22:22 /tmp/file.3
greys@ubuntu:~$ cat /tmp/file.2
cat: /tmp/file.2: No such file or directory
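Orphan symlinks can also be hunted down with find. The sketch below repeats the steps above in a scratch directory and then asks find for every symlink whose target no longer exists; it assumes test is available as a standalone program, as it is on typical Linux systems.

```shell
#!/bin/sh
# Hypothetical scratch directory; the file names follow the example above.
dir=$(mktemp -d)
echo "Hello from file 1" > "$dir/file.1"
ln -s "$dir/file.1" "$dir/file.2"
mv "$dir/file.1" "$dir/file.3"       # file.2 now points to nothing

# -type l selects symlinks; "test -e" follows the link and fails
# when the target is gone, so "!" keeps only the orphans.
orphans=$(find "$dir" -type l ! -exec test -e {} \; -print)
echo "$orphans"
```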





How To Find Out Which Group a Unix User Belongs To

If you know the name of a particular user on your Unix system and just want to confirm the primary Unix group (gid) of this individual, just use the id command:

$ id -g greys
115

If the Unix group id (gid) for the user doesn’t help you much and you’re interested in the Unix group name, use this:

$ id -gn greys
admins

If you want to get all the information about the user id and all the groups the user belongs to, it’s easier to use the default id command:

$ id greys
uid=1000(greys) gid=115(admins) groups=35(testgroup),115(admins)
