Shell beginners guide

What is a shell and how to use it

Introduction to the shell

Table of contents

  1. Overview
  2. Shell commands
  3. Words and shell expansion
  4. Standard streams, piping and redirections
  5. Common shell utilities
  6. Running commands in the background

Overview

  • Shell core principles

    • THE "STEERING WHEEL" OF THE SYSTEM : it exposes all the operating system's features to the user.
    • THE AIM IS TO BE LAZY : automate tasks and have the computer perform them on your behalf.
    • NO ABSTRACTIONS OF ANY KIND : for instance, the shell does not know what an "object" is.
    • NO DATA TYPES : shell variables are not typed at all (no "string", "number" etc ...).
    • REGEX IS SUPPORTED : utilities such as grep and sed implement POSIX regular expressions, which makes text processing from the shell very powerful.
  • Directories relevant to the shell

    directory   contents
    /etc        Config files for the system
    /var/log    Log files for various system programs
    /bin        Commonly used programs
    /usr/bin    Another location for programs on the system
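
The "no data types" principle listed above can be illustrated with a minimal sketch. The variable names (value, text, sum) are hypothetical and only serve the example ; the point is that the same variable is treated as text or as a number depending on where it is used :

```shell
value=42                 # stored as the characters "42", no type is declared
text="$value apples"     # used as text : plain concatenation
sum=$((value + 8))       # used as a number : arithmetic expansion reads the same characters

value="forty-two"        # the variable can hold text afterwards, nothing to redeclare

echo "$text"
echo "$sum"
echo "$value"
```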

Shell commands

  • Typing a command in the shell :

    • Linux is case sensitive.
    • Prefixing a character with a backslash \ escapes it : the shell no longer interprets it as a special character.
    • Long-form command line options begin with two dashes --option and short-form options begin with a single dash -o.
    • With a single dash, several options can be combined by placing all the letters together after the dash -lAh.
    • Options requiring arguments generally have to be placed separately along with their corresponding argument.
    • "The Linux command line does not have an undo feature. Perform destructive actions carefully."
    • You can stop/exit from a running shell command with Ctrl+c
    • "Whenever you get in trouble you can generally press Ctrl+c to get yourself out of trouble."
  • Shell wildcards

    • Wildcards are a set of building blocks used in patterns that the shell will translate to a set of files or directories.
    • Any path given on the command line can contain wildcards, in which case it expands to a set of files or directories.
    • The wildcards are translated by the shell, not by the command itself (ls in the example below).
    • The shell replaces each wildcard pattern with every matching path and passes the resulting set of paths to the command.
# list all files and directories starting with a b
ls -la b*
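
To see that the shell, and not the command, performs the expansion, here is a minimal sketch. It assumes a scratch directory created with mktemp and three hypothetical file names (bar.txt, baz.txt, foo.txt) that are not part of this guide :

```shell
# create a scratch directory holding a few hypothetical files
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch bar.txt baz.txt foo.txt

# the shell expands b* before echo runs : echo receives two arguments
expanded=$(echo b*)
echo "$expanded"

# quoting disables the expansion : echo receives the literal pattern
literal=$(echo 'b*')
echo "$literal"

# ? matches exactly one character, [...] matches one character from a set
one_char=$(echo ba?.txt)
from_set=$(echo [bf]*.txt)
echo "$one_char"
echo "$from_set"

# clean up the scratch directory
cd "$start_dir" || exit 1
rm -r "$demo_dir"
```

Since echo knows nothing about wildcards, whatever it prints is exactly what the shell handed to it.
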
  • Executing a command in the shell

    • When a command is executed, the shell looks for the invoked program (or executable) through a preset series of directories.
    • That series of directories is stored in the environment variable $PATH and no other directories will be considered during the search.
    • $PATH directories are searched sequentially, and the shell will execute the first executable it finds that matches the command.
    • $PATH is an environment variable and as such can be modified by individual users to fit their needs (usually through .bashrc).
    • For instance, users can manage different installations of the same program, create program wrappers, access custom scripts from anywhere, etc ...
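
The sequential search through $PATH can be sketched as follows. The wrapper directory and the wrapper script are hypothetical and created only for the demonstration ; the sketch assumes the temporary directory allows executing files :

```shell
# hypothetical scratch directory holding a wrapper script named "ls"
wrapper_dir=$(mktemp -d)
printf '#!/bin/sh\necho "wrapper called"\n' > "$wrapper_dir/ls"
chmod +x "$wrapper_dir/ls"

# before : ls resolves to the system executable
command -v ls

# prepend the directory so that it is searched first
old_path=$PATH
PATH="$wrapper_dir:$PATH"
hash -r   # forget command locations the shell may have cached

# after : the first match in $PATH wins, so the wrapper is executed
resolved=$(command -v ls)
output=$(ls)
echo "$resolved"
echo "$output"

# restore the original search path and clean up
PATH=$old_path
hash -r
rm -r "$wrapper_dir"
```

Making such a change permanent is usually done by adding the PATH assignment to .bashrc, as mentioned above.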

Words and shell expansion

  • When the shell receives a command (either from the command line or from a script) it breaks it up into words.
  • A word is a non-empty sequence of characters delimited by whitespace.
  • After this happens, the shell performs seven operations on the words.
  • These seven operations can change how the words are interpreted and are collectively known as shell expansion.
  • Enclosing single or multiple words in double quotes results in :
    • The delimited sequence of words will be considered a single word.
    • Word splitting and filename expansion are suppressed ; variable substitution, command substitution and arithmetic expansion are still performed.
# word splitting happens, command creates two directories
mkdir some directory

# word splitting prevented, command creates one directory
mkdir "some directory"

Note : a variable expanded inside double quotes keeps its line feeds, whereas an unquoted expansion has them collapsed by word splitting.
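
The effect of quoting on word splitting and expansion can be checked with a minimal sketch. The helper count_args is a hypothetical function written for this example, it simply reports how many arguments it received :

```shell
name="some directory"

# hypothetical helper : prints the number of arguments it receives
count_args() { echo "$#"; }

unquoted=$(count_args $name)     # word splitting happens : 2 arguments
quoted=$(count_args "$name")     # a single word : 1 argument

# double quotes still allow variable and arithmetic expansion
double=$(echo "$name has $((1 + 1)) words")
# single quotes suppress every expansion
single=$(echo '$name has $((1 + 1)) words')

# line feeds survive only when the expansion is quoted
lines=$(printf 'a\nb')
kept=$(echo "$lines" | wc -l)
lost=$(echo $lines | wc -l)

echo "$unquoted $quoted"
echo "$double"
echo "$single"
echo "$kept $lost"
```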


Standard streams, piping and redirections

  • Every process started from the command line has three data streams automatically attached to it :

    • standard input (data fed into the program).
    • standard output (data printed by the program, defaults to the terminal).
    • standard error (for error messages, also defaults to the terminal).
  • Seen from inside the process : it reads data from its standard input and writes data to its standard output and standard error.

  • Using those different streams to read and write data is made possible by devices that are mapped to file descriptors :

    stream  name    device       file descriptor
    0       STDIN   /dev/stdin   /proc/<processID>/fd/0
    1       STDOUT  /dev/stdout  /proc/<processID>/fd/1
    2       STDERR  /dev/stderr  /proc/<processID>/fd/2
  • For convenience, "self" can be used instead of the process ID : /proc/self/fd/1.

  • This mechanism allows communication between processes and files through the use of piping and redirection operators.

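  • Any process on the left side of a pipe | or of an output redirection operator > has to produce output.

  • That output is then written to whatever is on the right side of the operator :

    • A file in the case of the redirection operator >, which truncates the file before writing to it.
    • The standard input of another process in the case of the piping operator |.
    • The >> operator writes to the file in append mode instead of truncating it.
    • The < operator works in the other direction : the file on its right is fed to the standard input of the command on its left.
    • The << operator does not involve a file : it starts a here-document, which feeds the lines that follow (up to a chosen delimiter) to the command's standard input.
  • Prefixing a redirection operator with a number redirects the stream with that number : 1> for standard output, 2> for standard error.

  • Prefixing a stream number with an ampersand makes that stream the redirection target : 2>&1 sends standard error to wherever standard output currently points.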

  • Examples :

# pipe first command to second command, which reads the first command's output from its stdin
# note : $$ expands to the PID of the invoking shell, so both commands print the same value
echo -e "first command pid : $$\n" | echo -e "second command pid : $$\nstdin is [$(cat /dev/stdin)]"

# redirect file contents to process, then pipe process output to another process (wc -c counts bytes)
wc -c < "$some_file" | echo "total number of bytes in $some_file is $(cat /dev/stdin)"

# redirect stdout to a file in append mode, then redirect stderr to stdout (2>&1 has to come after the file redirection)
cat "$some_file" "$other_file" 1>> "$both_files_contents" 2>&1

Note : command arguments should be favored over reading stdin from inside the process or script whenever possible.
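
The redirection operators described in this section can be compared side by side with a minimal sketch. The function emit and every file name below are hypothetical and exist only for the demonstration :

```shell
work_dir=$(mktemp -d)
out_file="$work_dir/out.log"   # hypothetical file names
err_file="$work_dir/err.log"

# hypothetical helper writing one line to stdout and one line to stderr
emit() { echo "to stdout"; echo "to stderr" >&2; }

# > truncates the target file, 2> redirects stderr separately
emit > "$out_file" 2> "$err_file"

# >> appends to the target file instead of truncating it
emit >> "$out_file" 2> /dev/null

# 2>&1 placed after the stdout redirection sends both streams to the same file
emit > "$work_dir/both.log" 2>&1

# < feeds a file to the standard input of the command on its left
wc -l < "$out_file"

# << feeds the lines of a here-document to the standard input
tr 'a-z' 'A-Z' << EOF > "$work_dir/upper.txt"
hello from a here-document
EOF

cat "$work_dir/both.log" "$work_dir/upper.txt"
```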


Common shell utilities

  • Type man <utility> to display the manual page of a utility, including its available options.
  • head prints the first lines of a file (10 by default, -n sets the count) :
# extract first 10 lines from 2 files, sort results, number lines and add separator
head -qn 10 "$some_file" "$other_file" | sort -fdi | nl -s ' --> ' -w 10

# extract first 10 lines from 2 files, display column 2 only on lines containing separator
head -qn 10 "$some_file" "$other_file" | cut -s -d '.' -f 2
  • sed is a powerful stream editor for filtering and transforming text :
# execute a regexp match and replace on streamed file ($some_file), print newline
# sed -r (GNU sed, -E is the portable spelling) enables POSIX extended regexps, see https://remram44.github.io/regex-cheatsheet/regex.html
sed -r 's/\s([a-z]+)\s([0-9]{1})$/ eats : \2 \1 /g' "$some_file" && echo

# does the same by piping cat stdout to sed stdin (the trailing dash names stdin explicitly and can be omitted)
cat "$some_file" | sed -r 's/\s([a-z]+)\s([0-9]{1})$/ eats : \2 \1 /g' - && echo
  • grep searches a given set of data and prints every line matching a given pattern :
# grep -E activates POSIX extended regexp for search patterns
grep -E -n '\s[oap]{1}[a-z]+\s[0-9]{1,2}$' "$some_file"

# number lines (-n), print byte offset (-b) and file name (-H), ignore binaries (-I)
# print one line of context before and after each match (-C1)
grep -E --color=always -nbHIC1 '\s[oap]{1}[a-z]+\s[0-9]{1,2}$' "$some_file"

# same on 2 files, opening the regexp to get matches from both
grep -E --color=always -nbHIC1 '\s[oapw]{1}[a-z]+\s[0-9]{1,2}$' "$some_file" "$other_file"
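
Since the contents of $some_file are not shown, here is a self-contained sketch of the same sed and grep techniques. The sample data ("name item count" lines) is an assumption made for the example, and the patterns are simplified variants of the ones above that use a literal space instead of \s so they stay portable :

```shell
# hypothetical sample data : "<name> <item> <count>"
sample=$(printf 'bob apple 3\nalice pear 12\ncarol kiwi 7\n')

# sed -E : swap the last two fields on lines ending with a single digit
swapped=$(printf '%s\n' "$sample" | sed -E 's/ ([a-z]+) ([0-9])$/ eats : \2 \1/')

# grep -E -n : keep lines whose item starts with o, a or p, prefixed with their line number
matched=$(printf '%s\n' "$sample" | grep -E -n ' [oap][a-z]+ [0-9]{1,2}$')

echo "$swapped"
echo "$matched"
```

The second line is left untouched by sed because its count has two digits, and the third line is dropped by grep because kiwi does not start with o, a or p.
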
  • xargs builds and executes command lines from standard input :
# xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines
# it then executes the command (default is /bin/echo) one or more times with any initial arguments followed by the items read from standard input
# blank lines on the standard input are ignored. xargs can be a nice alternative to for loops in bash scripts if seeking performance
# -I already runs one command per input line, so -n1 is dropped ; the output file is kept outside of $some_directory so it is not read as an input
ls "$some_directory" | xargs -t -I '%1' cat "$some_directory/%1" >> "$all_file_contents"
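
Parsing the output of ls breaks on unusual file names, so a common alternative is to let find produce NUL-delimited names. The sketch below assumes a find and an xargs that support -print0 and -0 (GNU and BSD versions do) ; the directory, file names and contents are hypothetical :

```shell
src_dir=$(mktemp -d)       # hypothetical input directory
merged_file=$(mktemp)      # output kept outside of the input directory
printf 'one\n' > "$src_dir/a.txt"
printf 'two\n' > "$src_dir/b with space.txt"

# NUL-delimited names survive blanks, quotes and newlines in file names
# without -I, xargs passes as many names as possible to a single cat invocation
find "$src_dir" -type f -print0 | xargs -0 cat >> "$merged_file"

cat "$merged_file"
```

The order in which find returns the files is not guaranteed, so the merged lines may appear in either order.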

Running commands in the background

  • Adding an ampersand & at the end of a command makes the shell run the command in the background and create a job.
  • Jobs can be moved between foreground and background : Ctrl+z suspends the running foreground job, and bg then resumes it in the background.
  • Conversely, fg brings a background or suspended job to the foreground.
# start background command
sleep 60 && echo \ && echo "dear $USER, job has completed" &

# output: job id
[1] 2864

# view jobs
jobs

# output: running jobs
[1]+  Running   sleep 60 && echo \  && echo "dear $USER, job has completed" &

# bring job 1 to foreground
fg 1

# process is foreground again
sleep 60 && echo \ && echo "dear $USER, job has completed"

# command returns ...
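
jobs, fg and bg rely on job control, which is normally only active in interactive shells. In a script, the usual counterpart is to keep the PID of the background command and wait for it. A minimal sketch, assuming a hypothetical result file created with mktemp :

```shell
result_file=$(mktemp)   # hypothetical file collecting the job's output

# start a short job in the background ; $! holds the PID of the last background command
( sleep 1 && echo "job has completed" > "$result_file" ) &
job_pid=$!

echo "started background job with pid $job_pid"

# the script is free to do other work here ; wait blocks until the job ends
wait "$job_pid"
job_status=$?

echo "job exited with status $job_status"
cat "$result_file"
```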