Unix Sockets Tutorial

I’ve noticed that many people find other pages of this blog while looking for more information about Unix sockets, so I thought it’s about time we shed some light on this seemingly mysterious, but really simple concept.

What is a Unix socket?

A Unix socket (the technically correct name for it is Unix domain socket, UDS) is a method of inter-process communication (IPC) in Unix. Like almost everything in Unix, a Unix socket shows up as a file: a special file, to be precise. Unix processes that want to communicate with each other use a special set of functions (the socket API: socket(), bind(), listen(), accept(), connect() and friends) to access the special file of a Unix socket, and can then exchange data in both directions.

In very simple terms, a Unix socket is a channel for transferring data between processes running locally on the same Unix system; in its most common (stream) form, it behaves like a two-way byte stream.
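To make this concrete, here is a minimal sketch in Python (my choice for illustration; the same calls exist in the C socket API). One thread plays the server: it binds a stream socket to a path, which creates the special file, and waits for a connection. The client connects to the same path, sends some bytes and reads the reply, so data flows in both directions. The file name demo.sock and the upper-casing "service" are hypothetical.

```python
import os
import socket
import tempfile
import threading


def serve_once(path: str, ready: threading.Event) -> None:
    """Server: accept one client, read until it stops sending, reply in upper case."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
        server.bind(path)        # creates the special socket file at `path`
        server.listen(1)
        ready.set()              # the client may connect from now on
        conn, _ = server.accept()
        with conn:
            chunks = []
            while True:
                data = conn.recv(1024)
                if not data:     # empty read: the client shut down its sending side
                    break
                chunks.append(data)
            conn.sendall(b"".join(chunks).upper())


def talk(message: bytes) -> bytes:
    """Client: send `message` over a Unix domain socket and return the reply."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")   # hypothetical socket path
        ready = threading.Event()
        server = threading.Thread(target=serve_once, args=(path, ready))
        server.start()
        if not ready.wait(timeout=5):
            raise RuntimeError("server did not start")
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(path)
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)     # tell the server we are done sending
            reply = b""
            while True:
                chunk = client.recv(1024)
                if not chunk:                   # the server closed the connection
                    break
                reply += chunk
        server.join()
        os.unlink(path)   # the socket file is not removed when the socket closes
        return reply


print(talk(b"hello over a unix socket"))   # prints b'HELLO OVER A UNIX SOCKET'
```

Note that the socket file stays in the filesystem after the server closes its socket, which is why the sketch removes it with os.unlink(); binding again to a leftover path fails with "Address already in use".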

Examples of Unix sockets

The best-known examples of Unix sockets are probably those serving the graphics system on your Unix box: the X11 server socket and the optional sockets of programs that manage it, like GDM (GNOME Display Manager).

GDM socket

bash-2.05b$ ls -al /tmp/.gdm_socket
srw-rw-rw-    1 root     root            0 Oct 25 17:49 /tmp/.gdm_socket
bash-2.05b$ file /tmp/.gdm_socket
/tmp/.gdm_socket: socket

syslog socket

The message logging daemon, syslogd, uses /dev/log on most systems to accept new messages to be written to log files. Here’s how this file looks in Red Hat:

bash-2.05b$ ls -al /dev/log
srw-rw-rw-    1 root     root            0 Oct 25 17:49 /dev/log
bash-2.05b$ file /dev/log
/dev/log: socket
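For a feel of how a program hands a message to syslogd through /dev/log, here is a minimal Python sketch that sends one message in the traditional syslog format "<PRI>tag: message", where PRI is facility * 8 + severity. It assumes /dev/log is a datagram socket, as it typically is on Linux; the tag demo and the choice of the user facility with info severity are illustrative. In real code, Python’s standard logging.handlers.SysLogHandler does this job for you.

```python
import socket

FACILITY_USER = 1   # "user-level messages" facility
SEVERITY_INFO = 6   # "informational" severity


def syslog_packet(message: str, tag: str = "demo") -> bytes:
    """Build a traditional syslog line: <PRI>tag: message, PRI = facility * 8 + severity."""
    pri = FACILITY_USER * 8 + SEVERITY_INFO
    return f"<{pri}>{tag}: {message}".encode()


def send_to_syslog(message: str, tag: str = "demo", path: str = "/dev/log") -> None:
    """Send one message to the syslog daemon listening on the Unix socket at `path`."""
    # Assumes a datagram socket at `path`; some systems use a stream socket instead
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(syslog_packet(message, tag), path)   # one datagram = one log message


# send_to_syslog("hello from a Unix socket")   # then look for it in your system log
```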

Types of sockets in Unix

Here is another thing which can be quite confusing about sockets in Unix – the classification.

At the highest level, sockets are divided by their domain (address family) into two types: Unix domain sockets for IPC (AF_UNIX) and Unix network sockets using the Internet family of protocols (AF_INET), most commonly referred to as Unix Internet sockets.

These two types essentially define the set of communication protocols supported by a socket. Unix domain sockets are for IPC (interprocess communication) only, which means they can only be used by processes running locally on the same Unix system and communicating with each other. Internet sockets support protocols that allow you to connect processes on different Unix systems: the IP protocol for network communication, and the TCP/UDP protocols for transport.
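The practical difference between the two families shows up in how a socket is addressed. In this Python sketch (the file name ipc.sock is hypothetical), an AF_UNIX socket is bound to a filesystem path, while an AF_INET socket is bound to an IP address and port on the loopback interface, with port 0 asking the kernel for any free port.

```python
import os
import socket
import tempfile


def bound_address(family: int, address):
    """Bind a stream socket in the given family and return the address it ended up with."""
    with socket.socket(family, socket.SOCK_STREAM) as sock:
        sock.bind(address)
        return sock.getsockname()


with tempfile.TemporaryDirectory() as tmp:
    # AF_UNIX: the address is a filesystem path; binding creates a socket file there
    unix_addr = bound_address(socket.AF_UNIX, os.path.join(tmp, "ipc.sock"))
    # AF_INET: the address is an (IP, port) pair; port 0 lets the kernel pick a free port
    inet_addr = bound_address(socket.AF_INET, ("127.0.0.1", 0))

print(unix_addr)   # a path ending in ipc.sock
print(inet_addr)   # ('127.0.0.1', <some free port>)
```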

Connection-oriented and connectionless sockets

Both Unix domain sockets and Unix Internet sockets can be either connection-oriented (reliable, guaranteed delivery) or connectionless (best effort). With Unix domain sockets the difference matters less, as all the communication stays local to your Unix system (on Linux, for instance, Unix domain datagram sockets are reliable and don’t reorder messages), but with Internet sockets it’s a different story.

Connection-oriented sockets are called stream sockets (SOCK_STREAM) and are the reliable, guaranteed way to communicate. For Internet sockets, the TCP protocol is used to confirm that your data is delivered to the destination point and that your data packets are received in the same order they were sent.

Such sockets are called stream sockets because your Unix system creates and maintains a stream, an active connection between the source and the destination points, using TCP to manage this connection. All packets are automatically acknowledged, and sequencing is maintained so that data is received in the same order it was sent.

Connectionless sockets are called datagram sockets (SOCK_DGRAM). For Internet sockets they use UDP, which means that the packets of data you send may or may not be received on the other end. Because there is no control over the transfer (no acknowledgments of delivery, no guaranteed ordering of packets), there is no need to maintain a connection. Hence the name: connectionless. You simply send a message out, and it’s then up to a higher-level protocol to make sure the data is transferred successfully.
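One visible consequence of the stream/datagram split is how message boundaries are treated. The Python sketch below sends two messages through a connected pair of Unix domain sockets (created with socketpair(), so no socket file is involved) and reads once: a datagram socket returns exactly one message, while a stream socket has no boundaries and may return both messages run together.

```python
import socket


def first_read_after_two_sends(sock_type: int) -> bytes:
    """Send two messages over a connected AF_UNIX pair, then do a single recv()."""
    left, right = socket.socketpair(socket.AF_UNIX, sock_type)
    with left, right:
        left.send(b"first")
        left.send(b"second")
        return right.recv(1024)


# SOCK_DGRAM: each send() is a separate datagram, so one recv() returns b'first'
print(first_read_after_two_sends(socket.SOCK_DGRAM))
# SOCK_STREAM: a byte stream with no boundaries; typically prints b'firstsecond'
print(first_read_after_two_sends(socket.SOCK_STREAM))
```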

That’s it: this should be enough for a brief introduction to Unix sockets. Do ask your questions in the comments to this post, and I’ll be sure to answer them in future posts on Unix sockets, as there’s still plenty to show and explain!




Find Compiler Version in Unix

Finding the compiler version on your Unix system should be the first step before you attempt to compile any package from its source code. In fact, if you’re familiar with the common compilation routine, the configure script you run to generate the Makefile before compiling anything does exactly that: it finds out which compilers (if any) are installed on your system, and confirms their versions and capabilities.

If you want to find the compiler version yourself, here’s what you do:

1) Confirm which compiler you’re looking at

Most likely it will be gcc, but since GCC now stands for GNU Compiler Collection rather than GNU C Compiler, the compiler you need can be for any of the programming languages in this suite. Here they are, just so that you know (each line starts with the name of the compiler’s binary):

gcc – GNU project C and C++ compiler
g++ – GNU project C and C++ compiler
gcj – GNU Compiler for Java
g77 – GNU project Fortran 77 compiler
gnat – GNU Ada Translator

2) Find where your compiler is installed

You have a pretty good chance of finding gcc binaries right in the /usr/bin directory on most modern systems. Some older Unix distros don’t have GCC installed by default, and others, like recent versions of Solaris and OpenSolaris, keep gcc in a different location: in Solaris 10, gcc binaries are in the /usr/sfw/bin directory.

If the gcc binary isn’t found, consider looking for the file using the find command, or query your software repository to confirm whether it’s installed or not.
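If you’d rather script this check, a small Python sketch like the one below searches the directories in your PATH (via shutil.which) and then a list of extra locations. The extra list containing /usr/sfw/bin mirrors the Solaris 10 example above; it is an assumption you should extend for your own systems.

```python
import os
import shutil
from typing import Optional, Sequence

# Extra places to look beyond $PATH (assumed list; /usr/sfw/bin is where Solaris 10 keeps gcc)
EXTRA_DIRS = ("/usr/sfw/bin",)


def find_compiler(name: str, extra_dirs: Sequence[str] = EXTRA_DIRS) -> Optional[str]:
    """Return the full path of compiler `name`, or None if it isn't found."""
    found = shutil.which(name)          # searches the directories listed in $PATH
    if found:
        return found
    for directory in extra_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print(find_compiler("gcc"))   # a path such as /usr/bin/gcc, or None
```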

3) Run the compiler to find out its version

Most compilers will give you their version if the -v option is specified:

$ gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --enable-checking=release x86_64-linux-gnu
Thread model: posix
gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)
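Note that gcc prints this information to standard error, not standard output. If you need the version in a script, a minimal Python sketch like this one runs the compiler with -v and picks the number out of the "gcc version" line. It assumes gcc’s output format shown above; other compilers (clang, for example) word this line differently.

```python
import re
import subprocess
from typing import Optional


def parse_gcc_version(output: str) -> Optional[str]:
    """Extract the version number from the 'gcc version X.Y.Z ...' line of `gcc -v` output."""
    match = re.search(r"^gcc version (\S+)", output, re.MULTILINE)
    return match.group(1) if match else None


def installed_gcc_version(compiler: str = "gcc") -> Optional[str]:
    """Run `<compiler> -v` and parse its version; gcc writes this output to stderr."""
    result = subprocess.run([compiler, "-v"], capture_output=True, text=True)
    return parse_gcc_version(result.stderr)


# installed_gcc_version() would return "4.1.2" on the Ubuntu system shown above
```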




How To Take A Screenshot in Unix (xwd)

Quite often you need to take a screenshot of your Unix desktop, and as always there are a number of ways to do it. Today I’m going to cover the command line approach to taking screenshots.

Taking a Screenshot with xwd

Most modern Unix desktop systems come with the GNOME desktop environment by default and use Xorg as their X11 server. This means you are likely to have the xwd tool in your OS, which dumps an image of an X window and so allows you to take screenshots.

Furthermore, many Unix distros come with hundreds of command-line tools bundled with the default OS install. ImageMagick is one such bundled toolkit, and you can use its convert tool to convert the xwd-generated screenshot into any graphics format you prefer.

Here’s how you use these two commands together:

bash-3.0$ xwd -root | convert - /tmp/screenshot.png

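In this line, the xwd command with the -root option takes a screenshot of the root window, that is, your whole desktop. Its output is then piped to the convert tool: the - argument tells convert to read the image from standard input, and the file name specifies where to save the result (the .png extension selects the output format).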
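The same pipe can be set up from a script. Here is a Python sketch that connects the two commands like the shell line above, using subprocess.Popen with the first process’s standard output feeding the second one’s standard input. The command lists are parameters only so the function can be tried without an X display; the defaults are the xwd and convert commands from above.

```python
import subprocess
from typing import Sequence

CAPTURE = ("xwd", "-root")   # dump the root window, i.e. the whole desktop, in XWD format
CONVERT = ("convert", "-")   # ImageMagick convert, reading the image from standard input


def take_screenshot(output: str,
                    capture: Sequence[str] = CAPTURE,
                    convert: Sequence[str] = CONVERT) -> None:
    """Run the equivalent of: xwd -root | convert - OUTPUT"""
    producer = subprocess.Popen(list(capture), stdout=subprocess.PIPE)
    consumer = subprocess.Popen(list(convert) + [output],
                                stdin=producer.stdout, stdout=subprocess.DEVNULL)
    producer.stdout.close()   # lets the producer get SIGPIPE if the consumer exits early
    consumer.wait()
    producer.wait()
    if producer.returncode != 0 or consumer.returncode != 0:
        raise RuntimeError("screenshot pipeline failed")


# take_screenshot("/tmp/screenshot.png")
```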




Perl: Searching Through Directory Trees

I had to scan a huge directory tree today, identifying the users and Unix groups owning all the files. The problem I faced was very long usernames and group names, which meant that the

find /directory -ls

command I normally use for such tasks wasn’t terribly useful, because there was no space delimiter between a username and a group. The results of such a scan of the directory tree will later have to be parsed by other tools, which is why properly splitting the output into separate fields is so important.

This issue motivated me to refresh my Perl skills and sketch the following script (based entirely on the Never Run Unix Find Again article).

It’s a very simple piece of code which takes a directory to scan as a parameter.

How this works

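As you can see, we’re using the standard File::Find module. Its find function takes two parameters: a reference to the wanted function, which is called for every file and directory found and is where you put the conditions for your search, and the directory (or list of directories) to scan.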

Within this function, you call lstat to obtain all the necessary information about each directory entry, and then output the necessary fields.

Perl code

#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# The directory to scan is the first command-line argument
my $dir = $ARGV[0];
if (!defined $dir || $dir eq "") {
        print STDERR "Please specify a directory!\n";
        exit 1;
}

# Pass a reference to wanted (\&wanted); plain &wanted would call it right away
find(\&wanted, $dir);

sub wanted {
  # File::Find changes into each directory and sets $_ to the current entry's
  # name; $File::Find::name holds its full path. lstat, unlike stat, reports
  # on a symlink itself instead of the file it points to.
  my ($dev,        # the file system device number
      $ino,        # inode number
      $mode,       # mode of file (type and permissions)
      $nlink,      # counts number of links to file
      $uid,        # the ID of the file's owner
      $gid,        # the group ID of the file's owner
      $rdev,       # the device identifier
      $size,       # file size in bytes
      $atime,      # last access time
      $mtime,      # last modification time
      $ctime,      # last inode change time (e.g. mode or owner change)
      $blksize,    # preferred I/O block size
      $blocks)     # number of blocks allocated to the file
      = lstat($_);
  return unless defined $dev;    # skip entries that could not be read

  # Translate numeric IDs into names, falling back to the number if unknown
  my $user  = getpwuid($uid) // $uid;    # username
  my $group = getgrgid($gid) // $gid;    # unix group name

  print $File::Find::name . ":$mode:$size:$user:$group:$ctime:$mtime\n";
}

Hope you find this useful. Good luck with finding all your files! 🙂

For further reading, please consult the Perldoc section on File::Find.




Unix File Types

In Unix systems, there are 6 file types. Below I will give a very short description of each.

How to find out the type of file in Unix

The first and most obvious way to confirm the type of a particular file is to use the long-format output of the ls command, invoked with the -l option:

$ ls -l *
-rw-r--r-- 1 greys greys       1024 Mar 29 06:31 text

The very first field of this output is the file type and access permissions field; I’ll cover it in a separate post in the future. For now, just concentrate on the first character in this field. In this particular case it’s “-”, which means it’s a regular file. For other file types, this character will be different.
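If you want to check the type from a program rather than by eye, the file type lives in the st_mode field returned by lstat(). This Python sketch maps it to the same character ls prints; it uses lstat() so that a symbolic link is reported as a link rather than as the file it points to. (Python’s stat.filemode() builds the whole permissions string, including this first character.)

```python
import os
import stat

# Each stat type test paired with the character ls -l shows for that type
TYPE_CHARS = [
    (stat.S_ISREG, "-"),   # regular file
    (stat.S_ISDIR, "d"),   # directory
    (stat.S_ISCHR, "c"),   # character special device
    (stat.S_ISBLK, "b"),   # block special device
    (stat.S_ISFIFO, "p"),  # named pipe
    (stat.S_ISLNK, "l"),   # symbolic link
    (stat.S_ISSOCK, "s"),  # socket
]


def file_type_char(path: str) -> str:
    """Return the ls-style type character for `path` (symlinks are not followed)."""
    mode = os.lstat(path).st_mode
    for is_type, char in TYPE_CHARS:
        if is_type(mode):
            return char
    return "?"
```

For example, file_type_char("/") returns "d", and on most systems file_type_char("/dev/null") returns "c".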

Regular file

This is the most common type of file in Unix: a plain collection of bytes containing arbitrary data. There’s nothing mysterious about this type. Most of the files you will ever work with are regular files.

In long-format output of ls, this type of file is specified by the “-” symbol.

Directory

This is a special type of file in Unix which only contains a list of other files (the contents of a directory). You don’t work with directories directly; instead, you manage them with the standard commands provided with your OS. The whole directory structure of your Unix system is made of such special files, each holding the contents of one directory.

In long-format output of ls, this type of file is specified by the “d” symbol:

$ ls -ld *
-rw-r--r-- 1 greys greys 1024 Mar 29 06:31 text
drwxr-xr-x 2 greys greys 4096 Aug 21 11:00 mydir

Special Device File

This type of file in Unix allows access to the various devices known to your system. Almost every device has a special file associated with it. This simplifies the way Unix interacts with different devices: to the OS and most commands, each device is still a file, so it can be read from and written to using various commands. Most special device files are owned by root, and regular users cannot create them.

Depending on the way each device is accessed, its special device file can be either a character (shown as “c” in ls output) or a block (shown as “b”) device. One device can have more than one device file associated with it, and it’s perfectly normal to have both character and block device files for the same device.

Most special device files are character ones, and the devices they refer to are called raw devices. The simple reason behind this name is that by accessing the device via its character special file, you access the raw data on the device in the form the device is ready to operate with. For terminal devices, that means one character at a time. For disk devices, though, raw access means reading or writing whole chunks of data, the blocks native to your disk. The most important thing to remember about raw devices is that all read/write operations on them are direct, immediate and not cached.

A block device file provides similar access to the same device, only this time the interaction is buffered by the kernel of your Unix OS. Grouping data into logical blocks and caching such blocks in memory allows the kernel to process most I/O requests much more efficiently: it no longer has to physically access the disk every time a request happens. A data block is read once, all subsequent operations on it happen in the cached copy, and the data is synced to the actual device at regular intervals by a special process running in your OS.

Here’s how the different types of special device files look in your ls output:

$ ls -al /dev/loop0 /dev/ttys0
brw-rw---- 1 root disk 7,  0 Sep  7 05:03 /dev/loop0
crw-rw-rw- 1 root tty  3, 48 Sep  7 05:04 /dev/ttys0
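The two numbers shown where a regular file would show its size (7, 0 for /dev/loop0 and 3, 48 for /dev/ttys0) are the device’s major and minor numbers: the major number identifies the driver, and the minor number tells the driver which of its devices is meant. A Python sketch that reads them together with the device type:

```python
import os
import stat


def describe_device(path: str) -> str:
    """Describe a device file the way ls -l shows it: type, then major and minor numbers."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "character"
    elif stat.S_ISBLK(st.st_mode):
        kind = "block"
    else:
        raise ValueError(f"{path} is not a device file")
    # st_rdev identifies the device the special file refers to
    return f"{kind} {os.major(st.st_rdev)}, {os.minor(st.st_rdev)}"


print(describe_device("/dev/null"))   # e.g. "character 1, 3" on Linux
```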

Named Pipe

Pipes represent one of the simpler forms of Unix interprocess communication. Their purpose is to connect the I/O of two Unix processes accessing the pipe: one of the processes uses the named pipe file for output of data, while another process uses the very same file for input.

In long-format output of ls, named pipes are marked by the “p” symbol:

$ ls -al /dev/xconsole
prw-r----- 1 root adm 0 Sep 25 08:58 /dev/xconsole
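Named pipes are created with the mkfifo command or the mkfifo() call. As a minimal Python sketch, the function below creates a named pipe in a temporary directory and passes a message through it; two threads stand in for the two processes, and the name demo.fifo is hypothetical. Opening either end blocks until the other end is opened too, which is why the writer runs in its own thread.

```python
import os
import tempfile
import threading


def pass_through_fifo(message: str) -> str:
    """Write `message` into a named pipe from one thread and read it back in another."""
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "demo.fifo")   # hypothetical pipe name
        os.mkfifo(fifo)                          # shows up with "p" in ls -l

        def writer() -> None:
            # Opening for writing blocks until a reader opens the other end
            with open(fifo, "w") as pipe:
                pipe.write(message)

        thread = threading.Thread(target=writer)
        thread.start()
        with open(fifo) as pipe:                 # blocks until the writer has opened it
            received = pipe.read()               # reads until the writer closes its end
        thread.join()
        return received


print(pass_through_fifo("hello through a named pipe"))
```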

Symbolic Link

This is yet another file type in Unix, used for referencing some other file in the filesystem. A symbolic link contains the path to the file it references, in text form. To an end user, a symlink (short for symbolic link) appears to have its own name, but when you try reading or writing data to this file, these operations are redirected to the file it points to.

In long-format output of ls, symlinks are marked by the “l” symbol (that’s a lower case L). ls also shows the path to the referenced file:

$ ls -al hosts
lrwxrwxrwx 1 greys www-data 10 Sep 25 09:06 hosts -> /etc/hosts

In this example, a symlink called hosts points to the /etc/hosts file.
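The size column for a symlink is the length of the path it stores, which is why ls shows 10 for hosts -> /etc/hosts: “/etc/hosts” is exactly 10 characters long. This Python sketch (with hypothetical file names in a temporary directory) shows the difference between looking at the link itself with lstat() and following it with stat() or open().

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "hosts.txt")   # hypothetical target file
    link = os.path.join(tmp, "hosts")         # the symlink pointing to it
    with open(target, "w") as f:
        f.write("127.0.0.1 localhost\n")
    os.symlink(target, link)

    link_target = os.readlink(link)                    # the path stored in the link
    is_link = stat.S_ISLNK(os.lstat(link).st_mode)     # lstat() examines the link itself
    is_regular = stat.S_ISREG(os.stat(link).st_mode)   # stat() follows it to the target
    link_size = os.lstat(link).st_size                 # length of the stored path in bytes
    with open(link) as f:                              # reads are redirected to the target
        content = f.read()

print(link_target, is_link, is_regular, link_size == len(os.fsencode(link_target)))
```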

Socket

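A Unix socket (sometimes also called an IPC socket, or inter-process communication socket) is a special file which allows for advanced inter-process communication. In essence, it is a stream of data very similar to a network stream (and network sockets), but all the transactions are local to your system: the socket file only serves as the address processes use to find each other, and the data itself is passed through the kernel rather than stored in the file.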

In long-format output of ls, Unix sockets are marked by the “s” symbol:

$ ls -al /dev/log
srw-rw-rw- 1 root root 0 Sep  7 05:04 /dev/log

That’s it. I hope this gave you a better idea of the file types you can find when working on your Unix system. I’ll expand on the relevant topics in the future. Let me know if there’s anything in particular you’d like me to concentrate on!




URL file-access is disabled in the server configuration

I’ve recently upgraded Apache and PHP on my VPS, and one of the unpleasant surprises was that some scripts which included pages from remote sites (I know, not the most secure approach, but there were reasons for that) broke.

allow_url_fopen

Traditionally, most of the websites Google finds for this error suggest double-checking that your php.ini config has allow_url_fopen enabled:

allow_url_fopen = On

Well, in my case it was enabled, but the scripts were still broken. The really weird thing was that the upgrade procedure didn’t change php.ini in any way, so everything had been working before and I expected it to keep working.

allow_url_include

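After some quick research, I found out that PHP 5.2 introduced a new security option to accompany allow_url_fopen. This option, allow_url_include, is disabled by default and controls whether include and require can load remote files, and it was exactly the option which broke my scripts: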

allow_url_include = On

There you have it, hope it helps you next time you come across this problem!