A Unix Cook-Booklet: Finding information

1. Finding information

1.1 Moving around and listing files (cd,ls)

Typing ls lists the files in the current directory. Useful variants are ls -a (also list files whose names start with a dot, which are normally hidden), ls -l (list in long format) and ls -sF (show file sizes and mark file types).

The command pwd prints the current directory. The directory is changed with cd (change directory); for example, cd /usr/bin changes to the directory /usr/bin. Typing cd alone returns to the user's home directory. To find out what the home directory is, type cd followed by pwd, or type echo $HOME. In the C-shell (and also in ksh and bash) the tilde character denotes the home directory as well.

Filenames that start with / are absolute filenames, such as /bin/sh. Other names are relative to the current directory. A single dot denotes the current directory and two dots the parent directory (the directory one level up). For example, to go one level up and then down into the directory tex, type cd ../tex.
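
The path rules above can be tried out safely in a scratch directory. The following is a minimal sketch; the directory names src and tex and the use of mktemp -d to create the scratch directory are assumptions made for the example.

```shell
# Build a small scratch tree: $top/src and $top/tex (names are illustrative).
top=$(mktemp -d)
mkdir "$top/src" "$top/tex"

cd "$top/src"          # absolute path: it starts with /
cd ../tex              # relative path: one level up, then down into tex
here=$(pwd)
echo "$here"           # ends in /tex

cd .                   # the dot is the current directory, so nothing changes
same=$(pwd)
```

After the last command, pwd still reports the tex directory, since cd . stays where it is.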

1.2 Locating files by name

The command locate (or glocate) finds files with a given name, or part of a name. The command relies on a database and therefore works quickly. Very recent files are not found, since the database is typically updated every night.

If locate is not available, or its database may be out of date, one can use find, which actually walks through the directory structure:


find . -type f -name '*myfile*' -print

The first argument is the directory to go through, in this case the current directory (the dot). The option -type f limits the search to regular files (usually not necessary). The filename pattern can contain wildcards such as * and ?. Quoting is necessary to prevent expansion by the shell. On some older systems -print is required, otherwise nothing is shown; current versions of find print the matches by default when no other action is given.
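
To see the effect of -type f and of the quoted pattern, one can run find on a small test tree. This is only a sketch: all file and directory names below are invented for the example, and sort is added just to make the output order predictable.

```shell
# Scratch tree with hypothetical file names.
top=$(mktemp -d)
mkdir -p "$top/a/b"
touch "$top/a/myfile.txt" "$top/a/b/old_myfile_2" "$top/a/other.txt"
mkdir "$top/a/myfile_dir"      # a directory, so -type f leaves it out

cd "$top"
# The quotes keep the shell from expanding * before find sees the pattern.
found=$(find . -type f -name '*myfile*' -print | sort)
echo "$found"
```

Only the two regular files whose names contain myfile are listed; other.txt does not match the pattern and myfile_dir is not a regular file.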

On some systems commands analogous to locate exist and are named whereis or rfind.

1.3 Looking inside files

To search for a string in a single file, or a group of files that can be named by wildcards, one uses grep, for example


grep Janhunen *.tex
grep -i janhunen *.tex
grep -in '^c.*mrqmin' *.f *.h

In the first case all lines containing Janhunen are printed in all TeX/LaTeX source files in the current directory. In the next one, lower and uppercase letters are treated the same (ignore case, -i). The last example prints all Fortran comment lines (starting with c) that contain mrqmin, ignoring case and printing out line numbers (-n). The hat denotes the beginning of a line and .* stands for any number (*) of any characters (.) in between.
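
The third example can be checked on a small made-up Fortran fragment. The file contents below are invented for illustration; only the grep command itself comes from the text above.

```shell
f=$(mktemp)
cat > "$f" <<'EOF'
c     Calls MRQMIN to fit the model
      call mrqmin(x, y, sig)
C     no match here
c     see also mrqmin documentation
EOF

# ^c anchors the match to a leading c (or C, because of -i);
# .* allows anything between that c and the word mrqmin.
hits=$(grep -in '^c.*mrqmin' "$f")
echo "$hits"
rm -f "$f"
```

Lines 1 and 4 are printed, each prefixed with its line number. Line 2 contains mrqmin but starts with a blank, and line 3 is a comment that does not mention mrqmin.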

Grep uses regular expressions, so characters such as ., *, [ and ] and some others have a special meaning. Fgrep is a variant that searches for a plain string instead of a regular expression (the same as grep -F). Use fgrep if you don't need regular expressions.
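
The difference between a regular expression and a plain string shows up as soon as the search string contains a dot. A small sketch, with sample data made up for the purpose:

```shell
# Sample file; the contents are made up for the example.
f=$(mktemp)
printf '%s\n' 'a.c' 'abc' 'a-c' 'xyz' > "$f"

# As a regular expression, the dot matches any single character.
regex_hits=$(grep -c 'a.c' "$f")       # a.c, abc and a-c all match
# As a fixed string, the dot matches only a literal dot.
fixed_hits=$(grep -c -F 'a.c' "$f")    # only a.c matches
# Escaping the dot gives the same result with plain grep.
escaped_hits=$(grep -c 'a\.c' "$f")

echo "$regex_hits $fixed_hits $escaped_hits"   # prints: 3 1 1
rm -f "$f"
```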

The GNU/Linux version of grep recognizes options such as -3 (equivalently -C 3), which also prints the 3 lines before and after each match. This is often very useful.

To search in multiple directories one can do e.g.


grep -in janhunen *.tex */*.tex */*/*.tex
find . -type f -name '*.tex' -exec grep -in janhunen {} ";" -print

After -exec, find takes arguments to the executed command (grep), until a semicolon appears. The semicolon must be quoted to prevent interpretation by the shell. The {} passes the current filename to grep.
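
The find and grep combination can be tried on a small test tree. The sketch below uses invented file names and contents. It also shows a variant that is not in the text above: ending -exec with + instead of a semicolon, which passes many filenames to a single grep, combined with grep -l to list only the names of the matching files.

```shell
top=$(mktemp -d)
mkdir -p "$top/paper/sections"
printf '%s\n' 'Author: P. Janhunen' > "$top/paper/main.tex"
printf '%s\n' 'No author here'      > "$top/paper/sections/intro.tex"
printf '%s\n' 'janhunen, notes'     > "$top/paper/notes.txt"

cd "$top"
# One grep per file; -print runs only when grep exits successfully (a match).
per_file=$(find . -type f -name '*.tex' -exec grep -in janhunen {} ";" -print)
echo "$per_file"

# Variant: "+" hands many filenames to one grep; -l lists the matching files.
batched=$(find . -type f -name '*.tex' -exec grep -il janhunen {} +)
echo "$batched"
```

In the first form the matching line appears first and the filename after it, because -print is evaluated only after grep has finished with that file. The file notes.txt is never searched, since it does not match '*.tex'.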

Often one would like to see or say something about the contents of an unknown file. If the file is a text file, one can use more myfile or less myfile to view it page by page, head -20 myfile to display the first 20 lines, tail -20 myfile to show the last 20 lines, or cat myfile to show it all, if the file is short. (To find the size, use ls -l myfile.)
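
As a quick check of what head and tail select, one can generate a file of numbered lines. This is a sketch only; the file is a temporary one, and head -n 20 is used as the POSIX spelling of head -20.

```shell
f=$(mktemp)
# Write 100 numbered lines: "line 1" ... "line 100".
i=1
while [ "$i" -le 100 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$f"

first=$(head -n 20 "$f" | tail -n 1)   # last of the first 20 lines
last=$(tail -n 20 "$f" | head -n 1)    # first of the last 20 lines
echo "$first / $last"                  # prints: line 20 / line 81
rm -f "$f"
```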

A binary file is more difficult, but the command strings prints out all printable substrings found in the file's byte stream. This is very useful, for example, for finding out which binary program produces a given error message.

The command file tries to guess the 'type' of a binary file the best it can. For example, it can be used to find out whether a byte stream read from a tape is in tar format or perhaps gzipped tar format.

Finally, one can open the file in the GNU emacs editor for viewing. One exits from emacs by typing Control-x Control-c.

1.4 The man-pages

The system manual pages are read by man, for example man man or man ls. man -k keyword lists those man-pages whose header contains the given keyword. The man-pages are grouped in numbered sections, for example all user commands are in section 1 and C-callable system calls in section 2. If the same name appears in multiple sections, man may show only the first one. In such a case one has to give man the section number also, for example, man 3 printf shows the C-function printf, while (on my Linux system) man printf would show the GNU command line version of printf. apropos is the same as man -k and xman is a version with graphical user interface.

To turn man-pages into normal text files (printable form) pipe the output to col -b. For example,


man awk | col -b | grep -i -3 aho

searches for the substring 'aho' in the awk man page, also printing the 3 lines before and after each match (requires the GNU version of grep).

1.5 Which command is executed

The command which, available on most modern systems, prints out which command is actually executed. For example, which ls might print /bin/ls (the related Bourne-shell builtin type ls prints ls is /bin/ls). Commands are searched for in the directories listed in the PATH environment variable. To find its value, do echo $PATH. To add your own directories to PATH, do


setenv PATH ${PATH}:$HOME/bin          # in C-shell (csh,tcsh)
PATH=${PATH}:$HOME/bin; export PATH    # in Bourne-shell (sh,ksh,bash)
export PATH=${PATH}:$HOME/bin          # simpler, works in ksh and bash

To find out which shell you are using, do echo $SHELL. You can put such definitions in your startup files. Csh and tcsh read ~/.login once when you log in and ~/.cshrc every time a new shell is started. Environment variable settings should in general be gathered in .login, while aliases and shell variables (the set command) should rather be put in .cshrc. This is a rough guideline only and depends on personal taste. The Bourne-shell derivatives (sh,ksh,bash) read ~/.profile at login (bash looks for ~/.bash_profile first).
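
For the Bourne-shell family, the PATH setting can be made safe to run more than once (as happens when a startup file is read again) by adding the directory only if it is not already listed. The following is a sketch under assumptions: the temporary directory stands in for $HOME/bin, and the script name hello-cookbook is hypothetical.

```shell
# Hypothetical personal bin directory holding one tiny script.
bindir=$(mktemp -d)
printf '%s\n' '#!/bin/sh' 'echo hello' > "$bindir/hello-cookbook"
chmod +x "$bindir/hello-cookbook"

# Append the directory only if PATH does not already contain it.
case ":$PATH:" in
    *":$bindir:"*) ;;                          # already present, do nothing
    *) PATH=${PATH}:$bindir; export PATH ;;
esac

# command -v is the POSIX way to ask where a command would be found.
found=$(command -v hello-cookbook)
echo "$found"
```

Wrapping PATH in colons before the comparison lets the pattern match the directory whether it is the first, last or a middle entry.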
