Shell beginner's guide

What is a shell and how to use it


✔️ Shell core principles

  • THE "STEERING WHEEL" OF THE SYSTEM : it exposes all the operating system's features to the user.
  • THE AIM IS TO BE LAZY : automate tasks and have the computer perform them on your behalf.
  • NO ABSTRACTIONS OF ANY KIND : for instance, the shell does not know what an "object" is.
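  • NO DATA TYPES : shell variables are essentially untyped, their values are plain strings (there is no separate "string" or "number" type).
  • REGEX IS SUPPORTED : utilities such as grep and sed (and bash's [[ =~ ]] test) understand POSIX regular expressions, which unleashes the full power of the shell.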
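A minimal sketch of what "no data types" means in practice, assuming bash (the syntax is also valid POSIX shell): the variables below only hold text, and arithmetic only happens when it is explicitly requested with $(( ... )).

```shell
#!/usr/bin/env bash
# Minimal sketch: shell variables only store text.
a=2
b=3

concat="$a+$b"          # plain string concatenation: "2+3"
sum=$(( a + b ))        # arithmetic expansion interprets the text as numbers: 5

big=12345
length=${#big}          # a "number" is still a string, so it has a length: 5

echo "concat=$concat sum=$sum length=$length"   # prints: concat=2+3 sum=5 length=5
```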

✔️ Directories relevant to the shell

  • /etc stores config files for the system.
  • /var/log stores log files for various system programs (permissions may be restricted).
  • /bin stores several commonly used programs (some of which we will learn about in the rest of this tutorial).
  • /usr/bin is another location for programs on the system.

✔️ Typing a command in the shell

  • Linux is case sensitive : ls and LS are different commands, file.txt and File.txt are different files.
  • Prefixing a character with a backslash escapes it, which means that the shell no longer interprets it as a special character (for instance \( is passed on as a literal parenthesis).
  • Long-form command line options begin with two dashes --option and short options begin with a single dash -o.
  • When using a single dash, several options can be combined by placing all the letters together after the dash : -lAh is the same as -l -A -h.
  • Options requiring arguments generally have to be placed separately along with their corresponding argument.
  • The Linux command line does not have an undo feature. Perform destructive actions carefully.
  • You can stop a running shell command with <Ctrl> + c.
  • "Whenever you get in trouble you can generally press <Ctrl> + c to get yourself out of trouble."
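The escaping and option-grouping rules above can be tried safely in a throwaway directory. This is a sketch assuming bash and an ls that supports -A and -h (GNU coreutils on Linux); the file names are invented for the example.

```shell
#!/usr/bin/env bash
# Sketch: escaping and grouped short options, run in a throwaway directory.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

touch my\ file          # escaped space: ONE file named "my file"
touch other file        # unescaped space: TWO files, "other" and "file"
count=$(ls | wc -l)     # 3 entries in total

literal=$(echo \$HOME)  # escaped $: echo receives the text $HOME, no substitution

# Grouped short options are equivalent to the same options given separately.
grouped=$(ls -lAh)
separate=$(ls -l -A -h)

cd - > /dev/null || exit 1
rm -rf "$tmp"
echo "entries: $count, literal: $literal"
```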

✔️ Shell wildcards

  • Wildcards are a set of building blocks that are used in patterns that the shell will translate to a set of files or directories.
  • Files and directories are referred to on the command line by their path, and a path can contain wildcards that the shell turns into a set of matching files or directories.
# list all files and directories whose name starts with a b (-d lists matching directories themselves, not their contents)
$ ls -lad b*
  • The wildcards are expanded by the shell, not by the command itself (ls in this case).
  • The shell replaces the wildcard pattern with every matching path and passes the resulting list of paths to the command as separate arguments.
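To see that the expansion really happens before the command runs, the sketch below (bash assumed, temporary directory, invented file names bar, baz and foo) counts the arguments the shell produces from b*, and shows that quoting the pattern disables it.

```shell
#!/usr/bin/env bash
# Sketch: the shell expands wildcards before the command ever runs.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch bar baz foo

set -- b*                         # expand the pattern into positional parameters
argc=$#                           # 2: "bar" and "baz", two separate arguments
matches=$(printf '%s\n' b*)       # printf just prints what the shell handed it

literal=$(printf '%s\n' 'b*')     # quoted pattern: no expansion, printf gets "b*"

cd - > /dev/null || exit 1
rm -rf "$tmp"
printf 'argc=%s\n%s\n' "$argc" "$matches"
```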

✔️ Executing a command in the shell

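  • When a command is executed, the shell looks for the invoked program (or executable) through a preset series of directories, unless the command is an alias, a function or a shell builtin, or its name contains a slash (as in ./script.sh)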
  • That series of directories is stored in the environment variable $PATH and no other directories will be considered during the search
  • $PATH directories are searched sequentially, and the shell will execute the first executable it finds that matches the command
  • $PATH is an environment variable and as such can be modified by individual users to fit their needs (usually through .bashrc)
  • For example, users can manage different installations of the same program, create program wrappers, access custom scripts from anywhere, etc.
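A sketch of the $PATH search, assuming bash: command -v shows which file a name resolves to, and prepending a directory makes its programs callable from anywhere. The directory and the hello-script program are hypothetical, created in a temporary directory for the demo; in a real setup you would typically add a line such as export PATH="$HOME/bin:$PATH" to .bashrc instead.

```shell
#!/usr/bin/env bash
# Sketch of the $PATH search (bash assumed).
ls_path=$(command -v ls)               # full path the $PATH search resolves "ls" to
echo "ls resolves to: $ls_path"

# Print the $PATH directories in the order they are searched.
printf '%s\n' "$PATH" | tr ':' '\n'

# Hypothetical personal bin directory holding a hypothetical "hello-script".
bindir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from my script"\n' > "$bindir/hello-script"
chmod +x "$bindir/hello-script"

PATH="$bindir:$PATH"                   # prepended, so it is searched first
greeting=$(hello-script)               # found through $PATH: no ./ or full path needed
echo "$greeting"

rm -rf "$bindir"
```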

✔️ Words and shell expansion

  • When the shell receives a command (either from the command line or from a script) it breaks it up into words.
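  • A word is a non-zero-length sequence of characters delimited by unquoted whitespace (spaces, tabs or newlines).
  • After this happens, the shell performs seven kinds of operations on the words : brace expansion, tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, word splitting and filename (wildcard) expansion.
  • These seven operations can change how the words are interpreted and are collectively known as shell expansion.
  • Enclosing single or multiple words in double quotes results in :
    • The delimited sequence of words will be considered a single word
    • Only variable substitution, command substitution $(...) and arithmetic expansion $((...)) are performed; brace expansion, tilde expansion, word splitting and wildcard expansion are not

Note : a quoted variable substitution ("$var") preserves line feeds, whereas an unquoted one ($var) goes through word splitting, so echo $var prints its lines joined by spaces.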
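The effect of double quotes on word splitting, line feeds and wildcards can be checked with a tiny helper. This is a sketch assuming bash; count_words is a hypothetical function written for the example that only prints how many arguments it received.

```shell
#!/usr/bin/env bash
# Sketch: how double quotes change the words the shell produces.
# count_words is a hypothetical helper: it prints how many arguments it received.
count_words() { echo "$#"; }

var="one two   three"
unquoted=$(count_words $var)      # word splitting: 3 words
quoted=$(count_words "$var")      # double quotes: 1 word, spaces preserved

lines=$(printf 'first\nsecond')
flat=$(echo $lines)               # unquoted: the line feed is split away, words joined by a space
kept=$(echo "$lines")             # quoted: the line feed is preserved

star=$(echo "*")                  # quoted: no wildcard expansion, just the character *

echo "$unquoted $quoted"          # prints: 3 1
echo "$flat"                      # prints: first second
echo "$kept"                      # prints first and second on two lines
```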

✔️ Standard streams, piping and redirections

  • Every process started from the command line has three data streams automatically attached to it :

    • standard input (data fed into the program)
    • standard output (data printed by the program, defaults to the terminal)
    • standard error (for error messages, also defaults to the terminal)
  • Inside the process it is the opposite : the process reads data from its standard input and writes data to its standard output and standard error

  • Using those different streams to read and write data is made possible by devices that are mapped to file descriptors :

    stream  name    device       file descriptor
    0       STDIN   /dev/stdin   /proc/<processID>/fd/0
    1       STDOUT  /dev/stdout  /proc/<processID>/fd/1
    2       STDERR  /dev/stderr  /proc/<processID>/fd/2
  • For convenience, "self" can be used instead of the process ID : /proc/self/fd/1

  • This mechanism allows communication between processes and files through the use of piping and redirection operators

  • With the >, >> and | operators, the process on the left side of the operator provides the output

  • That output is then written to whatever is on the right side of the operator :

    • A file in the case of the redirection operator > (the file is created or overwritten)
    • A process input in the case of the piping operator | (the output is written to its standard input)
    • The >> operator writes to a file in append mode instead of replace mode (<< is not its input counterpart : it introduces a here-document)
    • The < operator works the other way around : the contents of the file on its right are fed to the standard input of the process on its left
  • Prefixing a redirection operator with a stream number redirects that stream : 1> (standard output, the default for >) or 2> (standard error)

  • Prefixing the target stream number with an ampersand redirects to that stream instead of to a file named after the number : 2>&1 sends standard error wherever standard output currently goes

  • Example :

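# pipe the first process into the second one : the first writes its pid to /dev/stdout (here, the pipe)
# and the second reads it back from /dev/stdin ($BASHPID is bash-specific; $$ would show the parent shell's pid in both)
$ echo "first command pid : $BASHPID" > /dev/stdout | echo -e "second command pid : $BASHPID\nstdin is [$(cat /dev/stdin)]"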

# redirect file contents to a process, then pipe that process's output to another process (wc -c counts bytes; wc -m counts characters)
$ wc -c < somefile | echo "total number of bytes is $(cat /dev/stdin)"

# append stdout to the file blabla, then send stderr to the same place (2>&1 must come after 1>> so that stderr follows stdout into the file)
$ cat datafile doesntexist 1>> blabla 2>&1

Note : command arguments should be favored over reading stdin from inside the process or script whenever possible
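Redirections are applied from left to right, which is why 2>&1 has to come after the stdout redirection in the last example. This sketch (bash assumed, temporary files, made-up messages) compares the possible orders.

```shell
#!/usr/bin/env bash
# Sketch: redirections are processed left to right, so their order matters.
tmp=$(mktemp -d)

# 1) stdout goes to the file first, then stderr is pointed at the same place:
#    both messages end up in both.log.
{ echo "to stdout"; echo "to stderr" >&2; } > "$tmp/both.log" 2>&1

# 2) stderr is pointed at the CURRENT stdout (the terminal) first, then only
#    stdout is moved to the file: out.log gets "to stdout" alone.
{ echo "to stdout"; echo "to stderr" >&2; } 2>&1 > "$tmp/out.log"

# 3) Keep the streams apart: one file per stream.
{ echo "to stdout"; echo "to stderr" >&2; } > "$tmp/out2.log" 2> "$tmp/err2.log"

both_lines=$(wc -l < "$tmp/both.log")
out_content=$(cat "$tmp/out.log")
err_content=$(cat "$tmp/err2.log")
rm -rf "$tmp"
echo "both.log: $both_lines lines, out.log: $out_content, err2.log: $err_content"
```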

✔️ Common shell utilities

  • head prints the first lines of a file (10 by default, -n sets how many)
# extract the first 10 lines of datafile and fishyeah (-q : no file name headers), sort them ignoring case, then number the lines with a ' --> ' separator
$ head -qn 10 datafile fishyeah | sort -fdi | nl -s ' --> ' -w 10

# extract the first 10 lines of datafile and fishyeah, display field 2 ('.' is the delimiter) only on lines containing the delimiter
$ head -qn 10 datafile fishyeah | cut -s -d '.' -f 2
  • sed is a stream editor for filtering and transforming text
# execute a regexp match and replace on streamed file datafile, print newline 
# sed -r enables POSIX extended regular expressions (GNU sed; -E is the portable spelling, and \s is a GNU extension)
$ sed -r 's/\s([a-z]+)\s([0-9]{1})$/ eats : \2 \1 /g' datafile && echo 

# does the same by piping cat stdout to sed stdin (the - argument tells sed to read its standard input)
$ cat datafile | sed -r 's/\s([a-z]+)\s([0-9]{1})$/ eats : \2 \1 /g' - && echo
  • grep searches a given set of data and prints every line matching a given pattern
# grep -E activates POSIX extended regexp for search patterns
$ grep -E -n '\s[oap]{1}[a-z]+\s[0-9]{1,2}$' datafile

# ~ expands to your home directory (absolute path); -n numbers lines, -b prints byte offsets, -H prints the file name, -I ignores binary files, -C1 prints one line of context before and after each match
$ grep -E --color=always -nbHIC1 '\s[oap]{1}[a-z]+\s[0-9]{1,2}$' ~/.test_the_shell/datafile

# same on 2 files, widening the character class so that both files produce matches
$ grep -E --color=always -nbHIC1 '\s[oapw]{1}[a-z]+\s[0-9]{1,2}$' datafile fishyeah
  • xargs builds and executes command lines from standard input
# xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines
# it then executes the command (default is /bin/echo) one or more times with any initial-arguments followed by items read from standard input
# Blank lines on the standard input are ignored. xargs can be a nice alternative to for loops in bash scripts if seeking performance
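# -I runs one command per input line, so -n1 is not needed (GNU xargs warns that -n and -I are mutually exclusive); -t echoes each command to stderr
$ ls ./dirtextfiles | xargs -t -I '%1' cat ./dirtextfiles/%1 >> ./dirtextfiles/.hiddentest3.txt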
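Since datafile and fishyeah are not shown in this guide, the sketch below runs the same utilities on a small invented file (bash assumed). It uses sed -E and literal spaces instead of sed -r and \s so that it also works outside GNU sed; the file contents and names are assumptions made for the demo.

```shell
#!/usr/bin/env bash
# Sketch: head, cut, sed, grep and xargs on an invented sample file.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '%s\n' 'cat.tom fish 3' 'dog.rex bone 12' 'bird.tweety seed 1' > datafile

# head + cut: first 2 lines, then field 2 with "." as the delimiter
fields=$(head -n 2 datafile | cut -s -d '.' -f 2)

# sed -E: swap the last two words and insert "eats :" (capture groups \1 and \2)
swapped=$(sed -E 's/ ([a-z]+) ([0-9]+)$/ eats : \2 \1/' datafile)

# grep -E -c: count the lines ending with a two-digit number
two_digits=$(grep -E -c ' [0-9]{2}$' datafile)

# xargs -I: one cat per file name read from stdin, all output appended to all.txt
printf 'a\n' > a.txt
printf 'b\n' > b.txt
printf '%s\n' a.txt b.txt | xargs -I '{}' cat '{}' >> all.txt
combined=$(cat all.txt)

cd - > /dev/null || exit 1
rm -rf "$tmp"
printf '%s\n---\n%s\n---\n%s\n' "$fields" "$swapped" "$two_digits"
```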

✔️ Running commands in the background

  • Adding an ampersand & at the end of a command will make the shell run the command in the background and create a job.
  • Jobs can be moved between the foreground and background : Ctrl + z suspends (pauses) the running foreground process, and bg resumes it in the background.
  • Conversely, fg brings a background or suspended job back to the foreground.
# start background command
$ sleep 60 && echo \ && echo "dear $USER, job has completed" & 
[1] 2864 
# view jobs
$ jobs 
[1]+  Running   sleep 60 && echo \  && echo "dear $USER, job has completed" &
# bring job 1 to foreground
$ fg 1 
sleep 60 && echo \ && echo "dear $USER, job has completed"
# command returns ...
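Outside an interactive terminal (in a script, where job control is normally disabled) jobs and fg are of little use, but background commands still work. A sketch assuming bash: $! holds the pid of the last background command and wait returns that job's exit status.

```shell
#!/usr/bin/env bash
# Sketch: background jobs in a script, where job control is normally off.
sleep 1 &                       # first background job
first=$!                        # $! = pid of the most recent background command
sh -c 'exit 3' &                # second background job, fails on purpose with status 3
second=$!

echo "started $first and $second, the shell keeps running meanwhile"

wait "$first"
first_status=$?                 # 0: sleep succeeded
wait "$second"
second_status=$?                # 3: wait returns the job's own exit status

echo "statuses: $first_status $second_status"   # prints: statuses: 0 3
```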