EGLUG Systems Development Course

Tue, 29/06/2010 - 8:20pm — bahaa2008


You can find the course details here

Here is the outline for the course.

EGLUG Systems Development Course (ESDC)
 Amr Ali
 062810

Copyright Notice

	Copyright (C) 2010 by Amr Ali
	amr-ali.co.cc

	All rights reserved.

	This work is licensed under the Creative Commons
	Attribution-Noncommercial-Share Alike 3.0 Unported License.
	To view a copy of this license visit

	or send a letter to Creative Commons, 171 Second Street,
	Suite 300, San Francisco, California, 94105, USA.

Abstract

	This course is only a primer to the system development field
	and should not be treated as a reference, nor should a study of
	sorts be based upon it. It only teaches the principles of system
	design and development to developers who have invested quite an
	effort into programming and are willing to go further into the
	realm and valleys of the hacker.

Quotes

	"Software Engineering might be science; but that's not what I do.
	I'm a hacker, not an engineer." - Jamie Zawinski

	"I decry the current tendency to seek patents on algorithms.
	There are better ways to earn a living than to prevent others
	from making use of one's contributions to computer science."
	- Donald E. Knuth

	"Science is what we understand well enough to explain to a computer.
	Art is everything else we do." - Donald E. Knuth

	"Always program as if the person who will be maintaining your program
	is a violent psychopath that knows where you live." - Martin Golding

	"Coding styles are like assholes, everyone has one and
	no one likes anyone else's." - Eric Warmenhoven

Jokes

	"Computers are like air conditioners: they stop working when you open
	WINDOWS."

	"unzip; strip; touch; finger; mount; fsck; more; yes; unmount; sleep"
	- My daily UNIX command list.

	"The Internet: where men are men, women are men,
	and children are FBI agents."

	"One of the main causes of the fall of the Roman Empire was that,
	lacking zero, they had no way to indicate successful termination
	of their C programs." - Robert Firth

	"UNIX is user-friendly. It's just very selective about
	who its friends are."

	"Microsoft is not the answer,
	Microsoft is the question, NO is the answer." - Erik Naggum

	"I would love to change the world, but they won't give me
	the source code" - Amr Ali? :-)

My Personal Favorite

	"There are 10 kinds of people in this world, those that understand
	trinary, those that don't, and those that confuse it with binary."
	- Whoever understands that joke will do good in this course :-)

Table Of Contents

	1. Introduction
	1.1. Summary
	1.2. Prerequisites
	1.2.1. Programming Experience
	1.2.2. Field Of Experience
	1.2.3. Programming Languages
	1.2.4. Depth Of System Knowledge
	1.2.5. CPU Designs And Architectures
	1.2.6. Mentality
	2. UNIX Based Systems Communications
	2.1. User Space To User Space
	2.2. User Space To Kernel Space
	2.3. Kernel Space To Kernel Space
	2.4. InterProcess Communication
	2.4.1. Types Of Communication
	2.4.1.1. Signals
	2.4.1.2. Pipes
	2.4.1.3. Sockets
	2.4.1.4. Message Queues
	2.4.1.5. Semaphores
	2.4.1.6. SpinLocks
	2.4.1.7. Mutexes
	2.4.1.8. Shared Memory
	2.4.2. Synchronization
	2.4.3. Common Problems
	3. System Application Design
	3.1. Strategy
	3.2. Daemons
	3.3. Logs
	3.4. Storage
	3.5. Debugging
	4. Case Studies
	4.1. Case 1
	4.2. Case 2
	4.3. Case 3
	4.4. Final Project
	5. Author
	5.1. Background
	5.2. Contact Information
	6. Thanks

1. Introduction

1.1. Summary

	ESDC is for whoever wants to develop system level applications and
	solutions that are specifically designed for UNIX based systems.
	This is an open ended course, which means the above TOC may be
	expanded at any time without prior notice to students.

1.2. Prerequisites

1.2.1. Programming Experience

	Whoever applies to this course should have quite some experience with
	different programming languages, enough to have the mature mentality
	required for this course. Basically it is a "MUST" that a student has
	written at least one thousand (1000) lines of code in any language.

1.2.2. Field Of Experience

	It is not required that students have done any system development
	prior to this course, but it is rather preferred that they have at
	least done some reading on the topic or have a general idea of what
	the hell we are talking about.

1.2.3. Programming Languages

	It is a "MUST" that a student is very fluent in C as a language,
	but not necessarily in its libraries; however, general knowledge
	of them is preferred, and knowing how to use the man pages is
	a must, along with knowing the meaning of "RTFM".

	ASM is not required at all, but knowing the different syntaxes
	(ex. AT&T and Intel) is preferred, and maybe a little about how
	ASM works as a language.

	BASH: I know I'm stating the obvious, but I don't want to be
	confronted with questions about administering UNIX or how to build
	a Makefile either, so BASH/M4 are best known beforehand.

1.2.4. Depth Of System Knowledge

	This course is built around UNIX based systems, mainly Linux,
	so it is a "MUST" to know your way around that system, and no,
	this is not a "how to make Windows drivers" course.

1.2.5. CPU Designs And Architectures

	It is strongly preferred that a student knows about the different
	CPU architectures and designs and how they affect system
	development; that's why some ASM would be preferred in general.

	Just know this: SMP is trouble. Lock your code well, or keep debugging
	day and night until you bleed out of your eye sockets.

1.2.6. Mentality

	You should have the mentality of a hunter: you never quit, and
	you never surrender to failure; you keep trying and trying,
	always alert to the tiniest details and ready to adopt new
	tactics and techniques quickly, which requires dedication
	and effort.

2. UNIX Based Systems Communications

2.1. User Space To User Space

	First let me introduce you to what user space is. User space is
	where everything made by mere humans runs, whether in the foreground
	or in the background (ex. a background process, or daemon), and of
	course it cannot communicate directly with the physical layer of
	your computer; it has to go through the kernel.

	User space communication mainly deals with the area of IPC
	(InterProcess Communication), like shared memory segments,
	message queues, and named/unnamed pipes.

	Forget the idea of having a certain file that some process writes
	to and another reads from; I had these ideas when I was 12. You are
	an adult, use IPC.

	The main purpose of IPC is to let two processes that run totally
	independently of each other communicate efficiently, passing
	certain information back and forth.

2.2. User Space To Kernel Space

	"Kernel: is the central component of most computer operating
	systems; it is a bridge between applications and the actual
	data processing done at the hardware level." - Wikipedia

	So let me put this in simpler words: the kernel is basically
	the guy that facilitates the usage of your hardware. Bluntly,
	without a kernel, if you wanted to print out "hello." to your screen
	you would have to write the code that communicates with your PCI bus
	and your video card, with all the pedantic op codes and flags
	necessary to make this 6-character word appear on your dead cold
	black screen.

	But if we have our kernel, how do we communicate with it? Can we just
	include a couple of C header files and get all of the above for free?
	The answer is of course YES. That's the main purpose of the standard
	C library (`libc', ex. glibc on Linux; `stdlib.h' is just one of its
	headers), which is simply an abstraction over all the assembly (yes
	ASM, you can't enter the kernel directly from plain C) required to ask
	the kernel to talk back and forth with your virtual/real hardware
	on your behalf.

	However, we still want to communicate directly with our kernel; aren't
	there any other ways except for ASM? Of course there are, you silly
	sally, and they are still another abstraction over the ASM interfaces
	the kernel provides, like IOCTL, ProcFS (Linux only), NetLink, and
	System Calls (SysCalls). Some call all of these IPC, but I don't like
	calling them that, so I'll call them ISC (InterSpace Communication).
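	To make these layers concrete, here is a minimal sketch that writes the same "hello." three ways. It assumes Linux with glibc. The first way goes through buffered stdio, the second through the write(2) wrapper, and the third through glibc's generic syscall(2) helper using the SYS_write number. The function names are made up for illustration; only the library calls are real.

```c
#define _GNU_SOURCE              /* makes glibc declare syscall() */
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>         /* SYS_write */
#include <sys/types.h>
#include <unistd.h>              /* write(), syscall() */

static const char msg[] = "hello.\n";

/* Layer 1: buffered stdio; libc decides when to issue write(2). */
int hello_stdio(void)
{
    if (fputs(msg, stdout) == EOF || fflush(stdout) == EOF)
        return -1;
    return (int)strlen(msg);
}

/* Layer 2: the thin libc wrapper around the write system call. */
ssize_t hello_write(int fd)
{
    return write(fd, msg, strlen(msg));
}

/* Layer 3: the same system call requested by number; syscall() does little
 * more than load the arguments into registers, trap into the kernel and
 * set errno on failure. */
long hello_raw_syscall(int fd)
{
    return syscall(SYS_write, fd, msg, strlen(msg));
}
```

	Running such a program under `strace' on Linux should show all three ending up as the same write(1, "hello.\n", 7) call.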

2.3. Kernel Space To Kernel Space

	Let's imagine that you got so elite that you created two kernel
	modules, and you would like to exchange information between them.
	The thing you must understand about the Linux kernel is that, once
	your modules are loaded, everything runs in one shared address space;
	there is no process boundary between the parts of the kernel. When you
	export a function or variable (with EXPORT_SYMBOL() or
	EXPORT_SYMBOL_GPL()), it is added to what I'll call the KST (Kernel
	Symbols Table), which contains all the functions and variables that
	have been exported, so other parts of the kernel, including other
	modules, can call them directly.

	As a side note, since I want to impose the kernel space image upon
	you: there is (practically) no floating point arithmetic in kernel
	space, because saving and restoring the FPU state for kernel code is
	simply not worth it. That tells you how pedantic the process of
	selecting code for the main kernel repo is, so don't expect any of
	the functions/libraries you are used to in user space (libc
	included) to exist in kernel space.

2.4. InterProcess Communication

2.4.1. Types Of Communication

	There are mainly 8 types of communication, three of which are locking
	mechanisms: Signals, Pipes, Sockets, Message Queues, Semaphores,
	SpinLocks, Mutexes, and Shared Memory. Each will be described
	in the following subsections, but the general idea is that they
	are "not" all used equally; each has its own purpose.

2.4.1.1. Signals

	These are one of the oldest methods to build interrupt based
	applications, which means that a signal can be sent to a process
	(or a whole process group) in case of an event, or the arrival of
	certain new data. The interrupt based style is heavily used inside
	the kernel, so you should understand this type of communication in
	great depth if you are planning to dive into kernel space. Also note
	that this type of communication is asynchronous: a signal can arrive
	at any point in the receiving process's execution, interrupting
	whatever it was doing. It is also one way only (the sender gets no
	reply, and almost no data travels beyond the signal number itself),
	or "simplex", as our friends at the electrical engineering department
	would love to call it.

2.4.1.2. Pipes

	This is a unidirectional byte stream way of communication, which
	typically connects the standard output of one process to the standard
	input of another process (that is what the shell's `|' does). Of
	course that bridge is made using files, but not normal, usual files:
	they are files on a VFS inode which itself points to physical pages
	in memory. Note that these are unnamed pipes; there are also named
	pipes, which create real entries in the filesystem (the data still
	never touches the disk) so that unrelated processes can open them by
	path, and you can set permissions on them. They are also called FIFOs
	(because they work on the principle of First In First Out, so what
	you write first will be read first on the other end), and they can be
	created simply with the command `mkfifo' (RTFM). Don't expect all the
	magic of unnamed pipes, though: with several writers on one FIFO, it
	is up to you to keep each message atomic (writes of up to PIPE_BUF
	bytes are) or to add your own locking.
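	Here is a minimal sketch of an unnamed pipe between a parent and its forked child, assuming POSIX. The child writes into fds[1] and the parent reads from fds[0]. Each side closes the end it does not use, so the reader sees end-of-file once the writer is done. pipe_roundtrip() is a hypothetical name.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes `msg` into the pipe; the parent reads it into `out`
 * (NUL-terminated). Returns the number of bytes read, or -1 on error. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outsz)
{
    int fds[2];                     /* fds[0] = read end, fds[1] = write end */
    if (outsz == 0 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                 /* child: the writer */
        close(fds[0]);
        size_t len = strlen(msg);
        ssize_t w = write(fds[1], msg, len);
        close(fds[1]);
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    close(fds[1]);                  /* parent: without this, read() never sees EOF */
    size_t total = 0;
    ssize_t n = 0;
    while (total < outsz - 1 &&
           (n = read(fds[0], out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;
    close(fds[0]);
    out[total] = '\0';

    int status;
    if (waitpid(pid, &status, 0) == -1 || n == -1 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (ssize_t)total;
}
```

	For the named variant, mkfifo(3), the library counterpart of the `mkfifo' command, creates the FIFO. Unrelated processes then open() it by path, one for reading and one for writing.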

2.4.1.3. Sockets

	I'm sure most of you have stumbled upon this concept before, read
	about it somewhere, or even used it. It's simply what makes today's
	networks and even the Internet work; they all operate on the concept
	of sockets. Figured it out already? Well, they are simply endpoints
	identified by IP/host addresses/names and ports, but they are not
	necessarily used only for communication over the network; they can
	also be used as IPC. Heard of UNIX sockets before? And yes, they are
	different from TCP/IP sockets: they are addressed by a path in the
	filesystem rather than by an IP and a port. I'm not going into much
	detail over this here, as there is literally tons of information on
	the Internet, which you sir/ma'am can google out on your own.
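	Here is a minimal sketch of a UNIX socket used as IPC, assuming Linux or another POSIX system. It uses socketpair(2) to get two already-connected AF_UNIX sockets. A real server would instead bind(2) an AF_UNIX socket to a filesystem path (or an AF_INET one to an IP/port) and then listen(2)/accept(2) on it; socketpair() just skips that setup. The function name is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `ping` from one end of a connected UNIX-domain socket pair, answer
 * "ack" from the other end, and store the answer in `reply`.
 * Short messages only: a stream socket may in general return fewer bytes
 * per read(). Returns 0 on success, -1 on error. */
int unix_pair_exchange(const char *ping, char *reply, size_t replysz)
{
    int sv[2];
    char buf[128];
    ssize_t n;
    int rc = -1;
    size_t len = strlen(ping);

    if (replysz == 0 || len == 0 || len >= sizeof buf)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* Direction 1: sv[0] -> sv[1] */
    if (write(sv[0], ping, len) != (ssize_t)len)
        goto out;
    n = read(sv[1], buf, sizeof buf - 1);
    if (n != (ssize_t)len || memcmp(buf, ping, len) != 0)
        goto out;

    /* Direction 2 on the same pair: unlike a pipe, a socket is bidirectional */
    if (write(sv[1], "ack", 3) != 3)
        goto out;
    n = read(sv[0], reply, replysz - 1);
    if (n < 0)
        goto out;
    reply[n] = '\0';
    rc = 0;
out:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```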

2.4.1.4. Message Queues

	You can think of this type of communication as a many-to-many
	relationship: many processes can send messages to, and receive
	messages from, the same queue, although each message is read by only
	one receiver. The only difference between it and mailboxes is that
	it has a restriction on the size of each message; it shares the
	same synchronization as mailboxes, which is asynchronous, meaning
	that the sender and the receiver do not need to interact with the
	message queue at the same time. It's also similar in some ways to
	pipes, except that it keeps message boundaries, every message
	carries a type, and the queue is identified by a key/ID instead of
	a file.
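	Here is a minimal sketch using a System V message queue, assuming Linux or another UNIX with SysV IPC. For brevity it sends and receives in one process. In real use, the sender and receiver are different processes that agree on a key, for example one obtained from ftok(3). POSIX queues (mq_open(3)) are the alternative API; on older glibc they need -lrt. The struct layout after the mandatory mtype and the function name are hypothetical.

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

/* System V requires each message to start with a long `mtype` (> 0);
 * the payload layout after it is up to us. */
struct demo_msg {
    long mtype;
    char mtext[64];
};

/* Create a private queue, send `text` as a message of `type`, receive it
 * back into `out`, then remove the queue. Returns 0 on success, -1 on error. */
int msgq_roundtrip(long type, const char *text, char *out, size_t outsz)
{
    struct demo_msg m, r;
    size_t len = strlen(text);
    size_t copy;
    ssize_t n;
    int rc = -1;

    if (type <= 0 || outsz == 0 || len >= sizeof m.mtext)
        return -1;

    int id = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    m.mtype = type;
    memcpy(m.mtext, text, len + 1);
    /* the size argument counts only the payload, not the mtype field */
    if (msgsnd(id, &m, len + 1, 0) == -1)
        goto out;

    /* blocks until a message of this type is queued; the sender never waited for us */
    n = msgrcv(id, &r, sizeof r.mtext, type, 0);
    if (n <= 0)
        goto out;

    copy = (size_t)n < outsz - 1 ? (size_t)n : outsz - 1;
    memcpy(out, r.mtext, copy);
    out[copy] = '\0';
    rc = 0;
out:
    msgctl(id, IPC_RMID, NULL);     /* queues outlive processes unless removed */
    return rc;
}
```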

2.4.1.5. Semaphores

	These are more locks than a way to communicate, and are mostly used
	over some shared resource that resides either in memory or on disk.
	Simply put, they are a location in memory whose value can be tested
	and set by more than one process; the test/set operations are atomic,
	or uninterruptible, which from a process's point of view means that
	once started nothing can stop them.

	You can think of one as a counter: a process or thread that wants to
	jump into the critical region to modify some critical resource (ex.
	memory page, file, etc.) decrements it, and sleeps if it is already
	zero; once finished with the region in question, it increments the
	value again, waking up whoever is waiting, both in an atomic fashion.

	Semaphores are not the best solution out there: System V semaphores
	are quite expensive to lock and unlock, since every operation is a
	system call, which can cost hundreds to thousands of CPU cycles. But
	they have their uses. For example, if your critical region is
	supposed to be just setting an integer value to a variable, then it
	is a very bad idea to use semaphores. But if you have an operation
	like writing to some file, then it's worth it to put a process to
	sleep and wake it up after you finish, which is what semaphores do.
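	Here is a minimal sketch of the counter semantics with a System V semaphore, assuming Linux. Linux requires the caller to define union semun. P ("wait") decrements and sleeps while the value is zero, and V ("post") increments. IPC_NOWAIT is used here only to show that a second P would otherwise block. The helper names are hypothetical.

```c
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* On Linux the caller must define union semun for semctl(). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a private set holding one semaphore with value `initial`. */
int sysv_sem_create(int initial)
{
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id == -1)
        return -1;
    union semun arg;
    arg.val = initial;
    if (semctl(id, 0, SETVAL, arg) == -1) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

/* P ("wait"/lock): decrement; if the value is 0, sleep until someone posts.
 * With nowait set, fail with errno == EAGAIN instead of sleeping. */
int sysv_sem_p(int id, int nowait)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1,
                         .sem_flg = nowait ? IPC_NOWAIT : 0 };
    return semop(id, &op, 1);       /* every call is a system call */
}

/* V ("post"/unlock): increment, waking a sleeper if there is one. */
int sysv_sem_v(int id)
{
    struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = 0 };
    return semop(id, &op, 1);
}

void sysv_sem_remove(int id)
{
    semctl(id, 0, IPC_RMID);
}
```

	Passing SEM_UNDO in sem_flg makes the kernel undo a process's operations if it dies while holding the semaphore. This matters for the dead-lock-holder problem described under Common Problems.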

2.4.1.6. SpinLocks

	SpinLocks, on the other hand, can be the fastest out there, because
	they are built directly on the CPU's atomic instructions (ex.
	test-and-set, compare-and-swap) and never put anyone to sleep. Note
	that if you are working on a single core/processor system, SpinLocks
	are useless unless you have a preemptive kernel, or preemption is
	compiled into your kernel. However, take extra care about when exactly
	you use SpinLocks: as their name says, they do a busy spin, which
	means that they keep spinning in a while loop, saving the time of
	putting the process to sleep and waking it up again, so if the
	critical section takes longer than a thread quantum, SpinLocks are
	a very bad idea.
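	Here is a minimal user-space sketch of a spinlock built on C11's atomic_flag. Kernel code would use the kernel's own spin_lock()/spin_unlock(), and POSIX also offers pthread_spin_lock(). The type and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* One atomic flag: set means held. */
typedef struct {
    atomic_flag held;
} tiny_spinlock;

#define TINY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* test-and-set typically compiles to a single atomic instruction: it sets
 * the flag and returns the old value, so we spin while it was already set. */
static inline void tiny_spin_lock(tiny_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* busy-wait: the thread burns CPU instead of sleeping */
}

/* Take the lock only if it is free; never spins. */
static inline bool tiny_spin_trylock(tiny_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static inline void tiny_spin_unlock(tiny_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

	The acquire/release orderings keep the protected reads and writes from being moved outside the locked region.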

2.4.1.7. Mutexes

	You can think of Mutexes as hybrids of SpinLocks and Semaphores, which
	explains why they are the most used in user space applications. On
	Linux, pthread mutexes are built on futexes: an uncontended lock or
	unlock is just an atomic instruction in user space, and the kernel is
	only asked for help (an expensive system call) when a thread actually
	has to sleep or be woken up. Basically, if you don't know what you are
	doing, your best bet is to use Mutexes; they are widely known and
	understood, so you won't be bothered with all the technicalities. But
	then again, if you don't want to be bothered, this course is totally
	not for you sir/ma'am, go have some Windows lecture instead.
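	Here is a minimal sketch of the usual lock/unlock pattern with a POSIX threads mutex. Two threads each add to one shared counter. It assumes pthreads, which need -pthread when linking on older glibc. The names are hypothetical.

```c
#include <pthread.h>

struct counter {
    pthread_mutex_t lock;
    long value;        /* the shared resource */
    long iterations;   /* read-only after start, so no lock needed */
};

static void *worker(void *arg)
{
    struct counter *c = arg;
    for (long i = 0; i < c->iterations; i++) {
        pthread_mutex_lock(&c->lock);
        c->value++;                    /* critical region */
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Two threads each add `iterations` to one counter; returns the total
 * (2 * iterations when the locking is right), or -1 on error. */
long count_with_mutex(long iterations)
{
    struct counter c = { .value = 0, .iterations = iterations };
    pthread_t a, b;

    if (pthread_mutex_init(&c.lock, NULL) != 0)
        return -1;
    if (pthread_create(&a, NULL, worker, &c) != 0) {
        pthread_mutex_destroy(&c.lock);
        return -1;
    }
    if (pthread_create(&b, NULL, worker, &c) != 0) {
        pthread_join(a, NULL);
        pthread_mutex_destroy(&c.lock);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_mutex_destroy(&c.lock);
    return c.value;
}
```

	For a critical region this small, the earlier point about semaphores applies to mutexes too: a C11 atomic_fetch_add() would do the job without any lock. The mutex is there only to show the pattern.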

2.4.1.8. Shared Memory

	Shared pages of memory are what they are; I can't really think of a
	better way to describe them, except that they have an ID just like
	MQueues (Message Queues, duh Sherlock) and simply share memory pages,
	not necessarily mapped at the same address in each process accessing
	them, but referencing the same physical pages. The mechanics of this
	part are complex and deep, so I'll leave them for later in the
	course.
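	Here is a minimal sketch with a System V shared memory segment, assuming Linux. The child writes into the attached page and the parent reads the same page after the child exits. Waiting for the child is the only synchronization, so no lock is needed. POSIX shm_open(3) plus mmap(2) is the alternative API. The segment size and the function name are hypothetical.

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SEG_SIZE 4096               /* one page on most systems */

/* The child writes `text` into a shared segment; after it exits the parent
 * reads the same page back into `out`. Returns 0 on success, -1 on error. */
int shm_handoff(const char *text, char *out, size_t outsz)
{
    size_t len = strlen(text);
    if (outsz == 0 || len + 1 > SEG_SIZE)
        return -1;

    int id = shmget(IPC_PRIVATE, SEG_SIZE, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    char *mem = shmat(id, NULL, 0); /* the kernel picks the address */
    /* Mark for removal now; it is freed once the last process detaches. */
    shmctl(id, IPC_RMID, NULL);
    if (mem == (char *)-1)
        return -1;

    pid_t pid = fork();             /* the child inherits the attachment */
    if (pid == -1) {
        shmdt(mem);
        return -1;
    }
    if (pid == 0) {
        memcpy(mem, text, len + 1); /* lands in the page the parent sees */
        shmdt(mem);
        _exit(0);
    }

    /* waitpid() is our only synchronization: the writer is done before we read */
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0) {
        shmdt(mem);
        return -1;
    }
    size_t n = len < outsz - 1 ? len : outsz - 1;
    memcpy(out, mem, n);
    out[n] = '\0';
    shmdt(mem);
    return 0;
}
```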

2.4.2. Synchronization

	We have seen two terms so far, asynchronous and synchronous
	operations; the two differ by only one character, but in meaning
	they differ a lot. Asynchronous operations are operations that
	do not necessarily expect answers right away. An example would
	be your email: you send an email to a friend, but do you expect
	him to answer instantly? No. Synchronous operations, on the other
	hand, are operations that do expect answers instantly; basically
	they block on answers. An example would be talking with your
	friend on the phone: when you talk, he listens and responds in
	the same conversation context.
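	In code, the same contrast often shows up as blocking vs non-blocking I/O. Here is a minimal sketch, assuming POSIX. A plain read(2) on an empty pipe blocks, like the phone call. With O_NONBLOCK set, it returns at once with EAGAIN and the caller can carry on, which is closer to the email. Non-blocking I/O is only one ingredient of asynchronous designs; poll(2) or signals usually tell you when to come back. The helper names are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put `fd` in non-blocking mode: read() then returns at once instead of waiting. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Returns 1 if a byte was read, 0 if nothing was ready yet, -1 on error or EOF. */
int try_read_byte(int fd, char *c)
{
    ssize_t n = read(fd, c, 1);
    if (n == 1)
        return 1;
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;               /* "no answer yet": the caller carries on */
    return -1;                  /* real error, or end-of-file (n == 0) */
}
```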

2.4.3. Common Problems

	Most IPC methods are known to share one big common problem, which
	is synchronization; in the context of computers and applications it
	should be addressed in terms of managing access between
	processes/threads, especially on a system that has more than one
	CPU (either virtual or physical).

	One of these problems even touches on security: race conditions.
	Race conditions happen when two threads or processes race for a
	certain operation, like setting or reading a value, so that the
	result depends on which one gets there first. When that happens, it
	can be exploited to corrupt the system memory, or even to gain
	unauthorized access to the system itself, so take extra caution when
	setting up locks.

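	Another problem is deadlocks, which happen when processes or threads
	end up waiting on each other forever. The classic case is thread A
	holding lock 1 while waiting for lock 2, while thread B holds lock 2
	and waits for lock 1. A close cousin is a process or thread that
	locks and dies before it unlocks, which keeps all other
	processes/threads either waiting on the lock or spinning on it
	forever, which ultimately results in the application hanging until
	someone kills it.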
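	The textbook cure for the lock-ordering deadlock is to make every thread acquire locks in one global order. Here is a minimal sketch with POSIX threads (link with -pthread on older glibc). The bank-account framing and all names are hypothetical. For the case where a lock holder dies, POSIX robust mutexes (pthread_mutexattr_setrobust()) let the next locker find out, but that is beyond this sketch.

```c
#include <pthread.h>
#include <stdint.h>

struct account {
    pthread_mutex_t lock;
    long balance;
};

/* Take both locks in one global order (by address), so two opposite
 * transfers can never each hold one lock while waiting for the other. */
void transfer(struct account *from, struct account *to, long amount)
{
    if (from == to)
        return;
    struct account *first  = (uintptr_t)from < (uintptr_t)to ? from : to;
    struct account *second = (first == from) ? to : from;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->balance -= amount;
    to->balance   += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}

struct job {
    struct account *from, *to;
    long rounds;
};

static void *mover(void *arg)
{
    struct job *j = arg;
    for (long i = 0; i < j->rounds; i++)
        transfer(j->from, j->to, 1);
    return NULL;
}

/* Two threads move money in opposite directions at the same time. Locking
 * `from` then `to` naively could hang here; with ordering it finishes.
 * Returns the combined balance, which must still be 200, or -1 on error. */
long run_opposite_transfers(long rounds)
{
    struct account a = { PTHREAD_MUTEX_INITIALIZER, 100 };
    struct account b = { PTHREAD_MUTEX_INITIALIZER, 100 };
    struct job ab = { &a, &b, rounds }, ba = { &b, &a, rounds };
    pthread_t t1, t2;

    if (pthread_create(&t1, NULL, mover, &ab) != 0)
        return -1;
    if (pthread_create(&t2, NULL, mover, &ba) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return a.balance + b.balance;
}
```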

	These kinds of problems are very hard to debug, so I felt like
	mentioning them in a separate section to show how important they are;
	otherwise you will end up producing very bad code, and some guy in a
	tie and a suit who knows next to nothing about computers will be
	yelling at you real loud, asking for the name of the person who
	taught you this stuff so he can murder him with a chainsaw.

3. System Application Design

3.1. Strategy

	If you grew up to be the strategy nut I am, I'm sure you will be very
	good at this stuff, simply because on the fly planning always works
	and pre-planning always fails. You need to have a vision into things,
	along with adapting and inventing different strategies of your own,
	to be able to see bugs and errors at a glance, mitigate them right
	away, and know exactly what to modify and what to leave as-is. It's
	really a very good skill to have, and like every other skill it comes
	with effort and training. The only method I found very effective to
	train that kind of skill is to look at other people's code: see and
	learn how they did things in their own way and style, try to
	understand it from the little tiny pieces, put the pieces together to
	form the mental image they had when they first developed that
	application, and see if it can be improved in any way.

	Bluntly, as I always say, try to be more of a hacker who investigates
	every single detail and tries to understand the whole of everything.
	It's not a shame to fail thousands of times, but it is a shame to be
	ignorant even about the little tiny things that everybody else
	discards, and always remember that knowledge is power.

3.2. Daemons

	So what are these little evil daemons, huh? They are simply what
	Windows people call "background processes" or "services", and if you
	have never developed one before, it's time to build one. The main
	difference between a daemon and a process like "top &" is that the
	former detaches itself from the terminal: it forks, lets the parent
	exit, starts a new session (setsid()), closes (or redirects to
	/dev/null) all the standard descriptors (stdin, stdout, stderr), and
	then stays in a conditional loop until the condition that keeps the
	loop running is gone (ex. waiting on a SIGTERM signal). Logging is
	also a major trait of daemons: most have logs of their own, and those
	that don't make use of other pre-installed logging systems.

3.3. Logs

	Any daemon should have one or more ways to communicate with the system
	administrator, and if it's not logs, what would it be? You can't
	communicate through standard output, because a daemon is never
	attached to any tty or pts, so it's gotta be logging. There are
	several mechanisms for logging. Either you design your own logging,
	but then you will have to rotate your logs so you won't end up with
	one multi-gigabyte file (rotating logs isn't hard, you can make use
	of `logrotate'), or you can save yourself all the trouble and make
	use of `syslog', a logging system that provides a very simple
	interface.
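	Here is a minimal sketch of that interface, using openlog(3), syslog(3) and closelog(3). Where the messages end up depends on the local syslog daemon's configuration; /var/log/syslog and /var/log/messages are common places. The wrapper names and the ident string are hypothetical.

```c
#include <syslog.h>

/* Connect to the system logger once at start-up. LOG_PID adds our PID to
 * every line and LOG_DAEMON is the facility meant for system daemons.
 * openlog() keeps the pointer, so `ident` must stay valid (a literal is fine). */
void daemon_log_open(const char *ident)
{
    openlog(ident, LOG_PID | LOG_NDELAY, LOG_DAEMON);
}

/* syslog() takes a priority plus a printf-style format. */
void daemon_log_workers(int workers)
{
    syslog(LOG_INFO, "started with %d worker threads", workers);
}

void daemon_log_close(void)
{
    closelog();
}
```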

3.4. Storage

	Some daemons need some way of organized storage, and there are several
	solutions to fulfill your storage needs. One is to use `SQLite', an
	embedded database library used by many applications (Firefox, just to
	name one), but `SQLite' is only for not so sophisticated storage
	schemes, and you should only use it in cases that do not require huge
	data sets to be stored. If your application requires some heavy duty
	DB system, I strongly recommend `MySQL'; it's a very good and well
	known DBMS that comes along with a very well done API.

3.5. Debugging

	Debugging daemons is especially hard, because they detach from your
	terminal and often involve threads, and you want to know exactly where
	a certain bug is. However, you first need to learn how to follow the
	fork being done at the start that spawns the background process.
	`set follow-fork-mode child': this command forces GDB to follow the
	child of a fork, meaning that once your daemon forks a background
	process, GDB starts debugging the new child, and the parent is simply
	detached from and left to run on (in a daemon it just exits).

	To know how many threads are running and which one is currently in
	context, you issue `info threads', which will display a numbered
	list. But what if you wanted to switch to one of these threads? Easy,
	you just issue `thread [threadno]', where [threadno] is the thread
	number you got from `info threads'. But I strongly recommend that you
	learn GDB from the ground up, as it is an essential tool for
	development under UNIX based systems.

4. Case Studies

4.1. Case 1

	Develop an application that creates exactly two threads to calculate
	the Fibonacci sequence in parallel, starting at any given point in the
	sequence.

	Example of an input file:
	04 08
	89 21
	55 89
	13 21

	The first column is where the Fibonacci sequence begins and the second
	column is where it ends. The results should be written to standard
	output so that each line in the input file corresponds to one line
	of standard output.

	Also note that the columns, where the Fibonacci sequence begins and
	ends, are not sorted, so sometimes you might find that the first
	column is the beginning of the sequence and other times the second
	column is where it begins.

4.2. Case 2

	Create another application that calculates the Collatz conjecture
	series based on the previous application's output, except that it
	communicates with the previous application over shared memory pages.
	For each line of Fibonacci output, a Collatz series has to be
	generated for each number in that line and written to standard
	output in the form of a table where each column begins with the
	original number from the first application and ends with one (1)
	(read about the Collatz conjecture on Wikipedia).

4.3. Case 3

	Create a daemon that forks to the background, closes all standard
	descriptors, and creates a few threads to pre-calculate a very large
	number of Fibonacci and Collatz series, plus a client application that
	communicates with the daemon over shared memory pages to get some of
	these results and display them on standard output.

	Note that the number of results to be requested from the daemon must
	come from the user, not be hard coded, so your application must ask
	for the number of results before getting any.

4.4. Final Project

	With a team, develop a server that listens on a specific IP/port and
	can handle simultaneous connections using either forking or threading.
	If you choose forking, you will have to implement IPC between the
	forked processes and the parent process, as the data in each process
	has to be shared with all other processes in a central fashion: the
	parent holds all the data in a shared memory page, and all forked
	processes access it from there. If you instead decide to go with
	threading, you have the advantage of not having to implement IPC in
	your system (you will still need locks around the shared data), but
	you would have to design it around a listener thread and a thread
	pool: the listener thread waits on connections, and once a connection
	request is presented, a thread from the pool is assigned to that
	connection. Once the connection ends, the thread is released from the
	connection and goes back to the thread pool as an available thread
	again.

	This server's purpose is to broadcast messages to each and every
	client connected to it, but also to store the last hour of messages in
	a queue for offline clients, so that when they log in to the server
	they receive all the messages exchanged during the last hour, meaning
	all messages that have been sent back and forth between clients. This
	also means that you will have to develop a client that connects to the
	server and is able to send messages to and receive messages from the
	server.

	If you want to get fancy, develop a configuration file parser, and put
	the listening IP/port into a file to be read by the server when it
	starts. This is not required; it's just my way of saying that if you
	want to get creative, it's absolutely encouraged.

	Good luck :-)

5. Author

5.1. Background

	I started coding at the early age of 10, and once I started writing
	my first few lines on an MSX-170/MSX-350, I never looked back;
	programming and being able to have full control over a machine has
	been an addiction of mine for many years gone and many to come. I
	started to dwell in the security field by writing my first symmetric
	encryption algorithm at the age of 14, which got me even more
	interested in programming, but at a totally different level. All I
	ever wanted since is to be able to code at the most intimate level
	of the machine, and so I have done: I'm now able to code some of the
	BIOS, and I've learned Verilog and am able to design and write FPGA
	solutions.

	As for security, let's just say it became second nature to me and
	a passion. I see vulnerabilities in humans, let alone code; I can
	manipulate just about everything, from a group of processes to a
	group of people. It all comes down to this: once you discover this
	security 7th sense, it becomes just like your sense of vision, and it
	changes each and every aspect of your life.

5.2. Contact Information

	Please visit http://amr-ali.co.cc

6. Thanks

	I'd like to thank my mother for all the support, encouragement, and
	love she always gave me in that direction. (Love ya mommy :-P)

	I would also like to thank the people who effectively changed my
	life for the better and were patient all along ...

	Gerald M O'Steen - For being the awesome mentor he is and for teaching
		me everything he could and being a very very good friend.

	Mark LaDoux - For beating me like a dead cow till I matured and learned
		the ways of pursuing knowledge and being a good friend.

	Love you all guys <31337