Shell beginner's guide

✔️ Shell core principles

THE "STEERING WHEEL" OF THE SYSTEM : it exposes all the operating system's features to the user.
THE AIM IS TO BE LAZY : automate tasks and have the computer perform them on your behalf.
NO ABSTRACTIONS OF ANY KIND : for instance, the shell does not know what an "object" is.
NO DATA TYPES : shell variables are not typed at all (no "string", "number" etc ...).
REGEX IS SUPPORTED : the POSIX regex implementation unleashes the full power of the shell.

✔️ Directories relevant to the shell

/etc stores config files for the system.
/var/log stores log files for various system programs (permissions may be restricted).
/bin stores several commonly used programs (some of which we will learn about in the rest of this tutorial).
/usr/bin is another location for programs on the system.

✔️ Typing a command in the shell

Linux is case sensitive.
Prefixing a character with a backslash \ escapes it, which means the shell treats it literally instead of interpreting it as a special character.
Long hand command line options begin with two dashes --option and short hand options begin with a single dash -o.
When using a single dash, several options can be invoked by placing all the letters together after the dash -lAh.
Options that require an argument generally have to be given separately (or last in a group), immediately followed by their argument : -n 10.
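The option and escaping rules above can be sketched as follows. This is a minimal illustration, assuming a POSIX-like shell with `mktemp` available; the scratch directory and file names are hypothetical and exist only for the demonstration.

```shell
# Hypothetical scratch directory, created only for this illustration.
demo_dir=$(mktemp -d)
touch "$demo_dir/visible.txt" "$demo_dir/.hidden.txt"

# Short options grouped after a single dash ...
grouped=$(ls -lAh "$demo_dir")
# ... behave like the same options given one by one.
separate=$(ls -l -A -h "$demo_dir")

# An option that takes an argument is followed by that argument.
first_line=$(printf 'one\ntwo\n' | head -n 1)

# A backslash removes the special meaning of the next character:
# the space becomes part of the file name instead of separating two words.
touch "$demo_dir"/my\ file.txt
file_count=$(ls -A "$demo_dir" | wc -l)

echo "$grouped"
echo "first line: $first_line"
echo "files in directory: $file_count"
rm -r "$demo_dir"
```

Without the backslash, `touch` would receive two arguments and create two files, `my` and `file.txt`.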
The Linux command line does not have an undo feature. Perform destructive actions carefully.
You can stop/exit from a running shell command with <Ctrl> + c.
"Whenever you get in trouble you can generally press <Ctrl> + c to get yourself out of trouble."

✔️ Shell wildcards

Wildcards are a set of building blocks that are used in patterns that the shell will translate to a set of files or directories.
Referring to a file or directory on the command line means referring to a path, and that path can use wildcards so it is turned into a set of files or directories.

# list all files and directories starting with a b
ls -la b*

The wildcards are translated by the shell, not by the command itself (ls in this case).
The shell replaces the wildcard pattern with every matching path and passes the resulting set of paths to the command.
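To see that the expansion happens before the command runs, the sketch below counts the arguments a command actually receives. The scratch directory, the file names and the `count_args` helper are hypothetical and only serve the illustration.

```shell
# Hypothetical scratch directory with three files, two of which start with "b".
demo_dir=$(mktemp -d)
touch "$demo_dir/bar" "$demo_dir/baz" "$demo_dir/foo"

# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args "$demo_dir"/b*)    # the shell expands b* before calling count_args
quoted=$(count_args "$demo_dir/b*")      # quoting disables expansion: one literal argument

echo "unquoted pattern: $unquoted argument(s)"
echo "quoted pattern: $quoted argument(s)"
rm -r "$demo_dir"
```

The helper never sees the pattern itself in the unquoted case, only the paths the shell substituted for it.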

✔️ Executing a command in the shell

When a command is executed, the shell looks for the invoked program (or executable) through a preset series of directories
That series of directories is stored in the environment variable $PATH and no other directories will be considered during the search
$PATH directories are searched sequentially, and the shell will execute the first executable it finds that matches the command
$PATH is an environment variable and as such can be modified by individual users to fit their needs (usually through .bashrc)
For example, users can manage different installations of the same program, create program wrappers, access custom scripts from anywhere, etc.
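The lookup order can be observed with a small experiment. This is a sketch under assumptions: the `greet` script and its directory are hypothetical, and the temporary directory is assumed to allow executing files.

```shell
# Hypothetical scratch directory holding a custom script named "greet".
demo_bin=$(mktemp -d)
printf '#!/bin/sh\necho "hello from custom greet"\n' > "$demo_bin/greet"
chmod +x "$demo_bin/greet"

# Prepend the directory so it is searched before the rest of $PATH.
# To make the change permanent, a similar line could go in ~/.bashrc.
old_path=$PATH
PATH="$demo_bin:$PATH"

resolved=$(command -v greet)   # path of the executable the shell would run
output=$(greet)

echo "greet resolves to: $resolved"
echo "$output"

# Restore the original search path and clean up.
PATH=$old_path
rm -r "$demo_bin"
```

Because the directory comes first in `$PATH`, a script placed there would also shadow any program of the same name installed further down the list, which is how wrappers are usually set up.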

✔️ Words and shell expansion

When the shell receives a command (either from the command line or from a script) it breaks it up into words.
A word is a non-empty sequence of characters delimited by whitespace.
After this happens, the shell performs seven operations on the words.
These seven operations can change how the words are interpreted and are collectively known as shell expansion.
Enclosing one or more words in double quotes results in :
- The quoted sequence of words is treated as a single word
- The only operations the shell still performs inside the quotes are variable substitution, command substitution and arithmetic expansion

Note : when a variable is substituted inside double quotes, the line feeds it contains are preserved.
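The effect of quoting on word splitting can be checked directly. The sketch below assumes a POSIX-like shell with the default field separators; the `count_words` helper and the variable names are hypothetical.

```shell
# Hypothetical helper: prints how many words (arguments) it received.
count_words() { echo "$#"; }

message="three short words"

unquoted=$(count_words $message)     # split on whitespace into separate words
quoted=$(count_words "$message")     # kept together as a single word
literal=$(echo '$message')           # single quotes: no substitution at all

multiline=$(printf 'line one\nline two')
flattened=$(echo $multiline)         # the line feed acts as a word separator and is lost
preserved=$(echo "$multiline")       # the line feed is kept

echo "unquoted: $unquoted, quoted: $quoted, literal: $literal"
echo "flattened: $flattened"
echo "$preserved"
```

Quoting variable references by default is a common habit for this reason: it keeps values containing spaces or line feeds intact.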

✔️ Standard streams, piping and redirections

Every process started from the command line has three data streams automatically attached to it :
- standard input (data fed into the program)
- standard output (data printed by the program, defaults to the terminal)
- standard error (for error messages, also defaults to the terminal)
Seen from inside the process : the process reads data from its standard input and writes data to its standard output and standard error.
Using those different streams to read and write data is made possible by devices that are mapped to file descriptors :

stream   name     device        file descriptor
0        STDIN    /dev/stdin    /proc/<processID>/fd/0
1        STDOUT   /dev/stdout   /proc/<processID>/fd/1
2        STDERR   /dev/stderr   /proc/<processID>/fd/2

For convenience, "self" can be used instead of the process ID : /proc/self/fd/1
This mechanism allows communication between processes and files through the use of piping and redirection operators.
The piping and redirection operators connect the process on their left side to whatever is on their right side :
- The > operator writes the standard output of the process to a file, replacing its contents
- The >> operator does the same in append mode instead of replace mode
- The < operator works in the other direction : the file on the right is fed to the standard input of the process on the left
- The | operator writes the standard output of the process on the left to the standard input of the process on the right
- The << operator is not an append operator : it starts a here-document, which feeds inline text to the standard input of the process
Prefixing an output redirection operator with a number redirects the corresponding stream : 1> for standard output, 2> for standard error
Prefixing a stream number with an ampersand makes that stream the redirection target : 2>&1 sends standard error to wherever standard output currently goes
Example :

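# pipe first process to second process : the first writes its pid to stdout and the second reads it from stdin
# ($BASHPID is bash specific ; $$ would expand to the pid of the parent shell on both sides of the pipe)
$ echo -e "first command pid : $BASHPID\n" > /dev/stdout | echo -e "second command pid : $BASHPID\nstdin is [$(cat /dev/stdin)]"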

# feed file contents to the stdin of a process, then pipe the process output to another process (wc -c counts bytes)
$ wc -c < somefile | echo "total number of bytes is $(cat /dev/stdin)"

# redirect stdout to a file in append mode, then redirect stderr to stdout (2>&1 has to come after the file redirection)
$ cat datafile doesntexist 1>> blabla 2>&1

Note : command arguments should be favored over reading stdin from inside the process or script whenever possible
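The redirection rules of this section can be tried out without depending on existing files. The following is a minimal sketch; the `talk` helper and the temporary files are hypothetical and stand in for any command that writes to both output streams.

```shell
# Hypothetical helper writing one line to stdout and one line to stderr.
talk() {
  echo "regular output"
  echo "error output" >&2
}

out_file=$(mktemp)
err_file=$(mktemp)
both_file=$(mktemp)

talk > "$out_file" 2> "$err_file"    # each stream goes to its own file
talk > "$both_file" 2>&1             # stderr follows stdout into the same file
talk >> "$both_file" 2>&1            # >> appends instead of replacing

# Only stdout travels through the pipe; stderr is discarded here.
piped=$(talk 2> /dev/null | tr 'a-z' 'A-Z')

out_content=$(cat "$out_file")
err_content=$(cat "$err_file")
both_lines=$(wc -l < "$both_file")

echo "stdout file: $out_content"
echo "stderr file: $err_content"
echo "lines in combined file: $both_lines"
echo "piped: $piped"
rm "$out_file" "$err_file" "$both_file"
```

Swapping the order to `2>&1 > file` would send stderr to the terminal, because at that point stdout has not been redirected to the file yet.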

✔️ Common shell utilities

head prints the first x lines from a file

# extract first 10 lines of datafile and fishyeah, sort, number lines and add separator 
$ head -qn 10 datafile fishyeah | sort -fdi | nl -s ' --> ' -w 10

# extract first 10 lines of datafile and fishyeah, print field 2 only, skipping lines that do not contain the '.' delimiter
$ head -qn 10 datafile fishyeah | cut -s -d '.' -f 2

sed is a stream editor for filtering and transforming text

# execute a regexp match and replace on each line of datafile, then print a newline
# sed -r enables extended regexps (\s is a GNU extension, not POSIX), see https://remram44.github.io/regex-cheatsheet/regex.html
$ sed -r 's/\s([a-z]+)\s([0-9]{1})$/ eats : \2 \1 /g' datafile && echo 

# does the same by piping cat stdout to sed stdin (the trailing dash tells sed to read from stdin)
$ cat datafile | sed -r 's/\s([a-z]+)\s([0-9]{1})$/ eats : \2 \1 /g' - && echo

grep searches a given set of data and prints every line matching a given pattern

# grep -E activates POSIX extended regexp for search patterns
$ grep -E -n '\s[oap]{1}[a-z]+\s[0-9]{1,2}$' datafile

# use a path under the home directory, number lines, print byte offsets and file name, ignore binaries, print one line of context before and after each match
$ grep -E --color=always -nbHIC1 '\s[oap]{1}[a-z]+\s[0-9]{1,2}$' ~/.test_the_shell/datafile

# same on 2 files, widening the character class to get matches from both
$ grep -E --color=always -nbHIC1 '\s[oapw]{1}[a-z]+\s[0-9]{1,2}$' datafile fishyeah

xargs builds and executes command lines from standard input

# xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines
# it then executes the command (default is /bin/echo) one or more times with any initial-arguments followed by items read from standard input
# Blank lines on the standard input are ignored. xargs can be a nice alternative to for loops in bash scripts if seeking performance
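# -I replaces %1 with each input line and already runs one command per line, so -n1 is not needed (GNU xargs ignores it with a warning)
$ ls ./dirtextfiles | xargs -t -I '%1' cat ./dirtextfiles/%1 >> ./dirtextfiles/.hiddentest3.txt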
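Since the contents of datafile and fishyeah are not shown here, the utilities above can also be tried on known data. The sketch below uses a hypothetical sample file with one "animal food quantity" record per line, and sticks to POSIX character classes and `sed -E` so that it should behave the same with GNU and BSD tools.

```shell
# Hypothetical sample data: one "<animal> <food> <quantity>" record per line.
sample=$(mktemp)
printf '%s\n' 'cat fish 3' 'dog bone 12' 'owl mouse 7' 'cow grass 250' > "$sample"

# head: keep the first 2 lines.
first_two=$(head -n 2 "$sample")

# cut + sort: extract field 2, sort it, keep the first entry.
first_food=$(cut -d ' ' -f 2 "$sample" | sort | head -n 1)

# grep -E -c: count the lines ending with a number of 1 or 2 digits.
small_quantities=$(grep -E -c '[[:space:]][0-9]{1,2}$' "$sample")

# sed -E: reorder the three fields of each line using capture groups.
first_sentence=$(sed -E 's/^([a-z]+) ([a-z]+) ([0-9]+)$/\1 eats \3 \2/' "$sample" | head -n 1)

# xargs -n 1: run echo once per input item (items are split on blanks and newlines).
xargs_runs=$(printf 'a b\nc\n' | xargs -n 1 echo | wc -l)

echo "$first_two"
echo "first food: $first_food"
echo "small quantities: $small_quantities"
echo "$first_sentence"
echo "xargs ran echo $xargs_runs times"
rm "$sample"
```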

✔️ Running commands in the background

Adding an ampersand & at the end of a command will make the shell run the command in the background and create a job.
Jobs can be moved between the foreground and background : Ctrl + z suspends the running foreground process and turns it into a stopped job, which bg can then resume in the background.
Conversely, fg can be used to bring background jobs to the foreground.

# start background command
$ sleep 60 && echo \ && echo "dear $USER, job has completed" & 
[1] 2864 
# view jobs
$ jobs 
[1]+  Running   sleep 60 && echo \  && echo "dear $USER, job has completed" &
# bring job 1 to foreground
$ fg 1 
sleep 60 && echo \ && echo "dear $USER, job has completed"
# command returns ...
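Inside a script there is no interactive job control, so a background command is usually tracked through its process ID instead. The sketch below relies on two standard shell features not covered above, the `$!` variable and the `wait` builtin; the temporary file and variable names are hypothetical.

```shell
result_file=$(mktemp)

# Start a short background command; the script continues immediately.
( sleep 1; echo "job has completed" > "$result_file" ) &
job_pid=$!                     # $! holds the PID of the last background command

echo "started background job with PID $job_pid"

wait "$job_pid"                # block until the background command exits
job_status=$?                  # exit status of the background command
job_output=$(cat "$result_file")

echo "exit status: $job_status, output: $job_output"
rm "$result_file"
```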