Binary & Hex
This guide will help you understand both packet and file analysis. This understanding is an essential tool for becoming a successful analyst.
The Binary Language
A computer is made up of millions of transistors. Each one of them can either be on or off, depending on whether electricity flows through it or not. These two states are represented by 2 numbers: 0 (off) and 1 (on). This is called the binary language and it's the only thing your computer understands. This language is said to be base 2 because it has only 2 possible digits: 0 and 1. Here is a little example:
0 0 1 0 0 1 1 0 1 1 0 1 1 1 0 0 1 1 0...
In binary language a digit is called a bit. Binary being a bit too hard for humans to read, we decided to group bits in packets of 8 and to call such a packet a byte. The 8 bits of a byte are used and read the same way we do in the decimal system. Here are the different values a byte can have:
| 1 byte = 8 bits | corresponding decimal value |
|---|---|
| 0 0 0 0 0 0 0 0 | 0 |
| 0 0 0 0 0 0 0 1 | 1 |
| 0 0 0 0 0 0 1 0 | 2 |
| 0 0 0 0 0 0 1 1 | 3 |
| 0 0 0 0 0 1 0 0 | 4 |
| ... | ... |
| 1 1 1 1 1 1 1 1 | 255 |
To calculate a byte's value in decimal (the base 10 we commonly use), each bit is assigned a power of 2 according to its position: bit N, counting from the right and starting at 0, is worth 2^N.
```
_   _   _   _   _   _   _   _
7   6   5   4   3   2   1   0
```
Example: 0 0 0 0 1 1 0 1 = 2^3 + 2^2 + 2^0 = 13
To add it up more easily, you can precalculate each 2^N value:
```
  _    _    _    _    _   _   _   _
128   64   32   16    8   4   2   1
```
Then just add the values wherever the bit is 1. Example: 0 1 0 1 0 1 1 1 = 64 + 16 + 4 + 2 + 1 = 87
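To make that addition concrete, here is a minimal C++ sketch of the same calculation (the helper name binaryToDecimal is made up for this example):

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical helper: converts a string of '0'/'1' characters to its
// decimal value by adding 2^N for every bit set to 1
// (N = position counted from the right, starting at 0).
unsigned int binaryToDecimal(const std::string& bits)
{
    unsigned int value = 0;
    for (std::size_t i = 0; i < bits.size(); ++i)
    {
        if (bits[bits.size() - 1 - i] == '1')
            value += 1u << i;   // 1u << i is 2^i
    }
    return value;
}

int main()
{
    std::cout << binaryToDecimal("01010111") << std::endl; // prints 87
    return 0;
}
```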
The Hexadecimal Language
The hexadecimal language is a base 16 system. It means each digit has 16 possible values: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F (0 to 15 in decimal). Because 2^4 = 16, it makes binary far easier to read. A hexadecimal digit consists of 4 consecutive bits, which means you can write any byte value using 2 hexadecimal digits as follows:
| original bits | 4-bit groups | hexadecimal |
|---|---|---|
| 0 1 0 1 0 1 1 1 | 0101 0111 | 0x57 |
As you can see, we added a "0x" prefix to our hexadecimal number. This is the standard convention used to let readers know that the following digits are hexadecimal: 57 is decimal, while 0x57 is hexadecimal (87 in decimal).
Another example:
1 1 0 0 1 0 1 1 = 128 + 64 + 8 + 2 + 1 = 203

| binary | 1100 | 1011 |
|---|---|---|
| decimal | 12 | 11 |
| hexadecimal | C | B |
So 203 = 0xCB in hexadecimal.
TIP: You can use the Windows calculator to do these conversions quickly. Just change its mode to scientific.
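If you'd rather do the conversions in code, the C++ standard library can print the same value in all three bases (a minimal sketch, nothing SWG-specific):

```cpp
#include <bitset>
#include <iostream>

int main()
{
    int value = 0xCB;  // a hexadecimal literal: 203 in decimal

    std::cout << std::dec << value << std::endl;      // prints 203
    std::cout << std::hex << value << std::endl;      // prints cb
    std::cout << std::bitset<8>(value) << std::endl;  // prints 11001011
    return 0;
}
```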
For the sake of simplicity, signed numbers (positive and negative) are not covered in this article, nor are floating point numbers.
You can Google the following terms for further research:
- Signed numbers: "Two's complement"
- Floats: "IEEE Floating Point"
TIP: Also, if these guides do not suffice or are confusing, use Google for better explanations; Wikipedia has A LOT more information on all of the subjects in this document.
The C++/C# language data types
Programming languages provide various predefined data types. This table shows what types can be expected from input data:
| size (bytes) | type | sign | values | C++ corresponding types | C# corresponding types |
|---|---|---|---|---|---|
| 1 | boolean | - | 0 = false, anything else = true | bool | bool |
| 1 | integer | signed | -128 to +127 (ASCII characters) | char | sbyte |
| 1 | integer | unsigned | 0 to +255 (ANSI characters) | unsigned char | byte |
| 2 | integer | signed | -32768 to +32767 | short | short |
| 2 | integer | unsigned | 0 to +65535 | unsigned short | ushort |
| 2 | char | - | Unicode character | wchar_t | char |
| 4 | integer | signed | -2147483648 to +2147483647 | int, long | int |
| 4 | integer | unsigned | 0 to +4294967295 | unsigned int, unsigned long, void* | uint |
| 4 | real | signed | approx. -3.4*10^38 to +3.4*10^38 | float | float |
| 8 | integer | signed | -9223372036854775808 to +9223372036854775807 | long long | long |
| 8 | integer | unsigned | 0 to +18446744073709551615 | unsigned long long | ulong |
| 8 | real | signed | approx. -1.7*10^308 to +1.7*10^308 | double | double |
| n | char | - | ANSI/ASCII characters | string, char[] | string |
| 2n | char | - | Unicode characters | wstring, wchar_t[] | string |
NOTE: In C++, a 64-bit integer is commonly written as "long long", while "long" by itself is often only a 32-bit (4 byte) integer (it is on Microsoft's compilers, and on GCC when targeting 32-bit platforms). For the sake of brevity, however, we will use "LONG" to mean a 64-bit (8 byte) integer.
NOTE: the size of the long type varies depending on the operating system and compiler data model. On most Unix-like systems it has the same size as pointers (void*, int*, ...): if the environment has 32-bit RAM indexation (4 GB max) then long is 4 bytes, and if the environment has 64-bit RAM indexation, long is 8 bytes. On 64-bit Windows, however, long stays 4 bytes.
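Because of these platform differences, it's worth checking sizes with sizeof, and preferring the fixed-width types from &lt;cstdint&gt; when parsing files or packets. A minimal sketch:

```cpp
#include <cstdint>
#include <iostream>

int main()
{
    // Sizes of the plain C++ types depend on compiler and platform...
    std::cout << "short:     " << sizeof(short)     << " bytes\n";
    std::cout << "int:       " << sizeof(int)       << " bytes\n";
    std::cout << "long:      " << sizeof(long)      << " bytes\n"; // 4 or 8!
    std::cout << "long long: " << sizeof(long long) << " bytes\n";
    std::cout << "void*:     " << sizeof(void*)     << " bytes\n";

    // ...so when parsing files or packets, prefer the fixed-width
    // types from <cstdint>, whose sizes never change:
    std::cout << "int32_t:   " << sizeof(std::int32_t)  << " bytes\n"; // always 4
    std::cout << "uint64_t:  " << sizeof(std::uint64_t) << " bytes\n"; // always 8
    return 0;
}
```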
Big Endian vs Little Endian (byte orders)
The little story: The term "endian" can be traced to Jonathan Swift's novel "Gulliver's Travels." In one of Gulliver's adventures, he encounters an island whose inhabitants bitterly argue over the correct way to open soft-boiled eggs - the little end or the big end. Little endians and big endians are each convinced that their method is the only correct method for opening an egg.
Big Endian
The BIG ENDIAN byte order is the natural reading order: the highest-weight digits come first (on the left), and the lowest-weight digits come last (on the right).
Example: let's write 16909060 in hexadecimal (0x?? = 1 byte = 8 bits):

```
0x01 0x02 0x03 0x04   (byte sequence in memory)
0x01020304            (reconstituted hexadecimal value)
16909060              (corresponding decimal value)
```
Note: also known as NETWORK BYTE ORDER, it's the standard byte order for IP communications. That does not mean all application protocols respect the standard, though! It's used by: Sun Solaris (SPARC), HP-UX, Motorola 68000, IBM System/370, ...
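You rarely build network byte order by hand: the classic socket helpers htonl/ntohl do the swap for you. A minimal sketch, assuming a Unix-like environment (on Windows the same functions come from winsock2.h):

```cpp
#include <arpa/inet.h>  // htonl/ntohl; on Windows use <winsock2.h> instead
#include <cstdint>
#include <cstdio>

int main()
{
    std::uint32_t host = 16909060;  // 0x01020304 in host byte order

    // htonl ("host to network long") returns the value in big endian,
    // whatever the machine's native order is.
    std::uint32_t net = htonl(host);

    // ntohl ("network to host long") converts it back.
    std::printf("0x%08X\n", static_cast<unsigned>(ntohl(net)));  // 0x01020304
    return 0;
}
```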
Little Endian
The LITTLE ENDIAN byte order is the opposite of BIG ENDIAN: the lowest-weight digits come first (on the left), and the highest-weight digits come last (on the right).
Example: let's write 16909060 in hexadecimal again (0x?? = 1 byte = 8 bits):

```
0x04 0x03 0x02 0x01   (byte sequence in memory)
0x01020304            (reconstituted hexadecimal value)
16909060              (corresponding decimal value)
```
Why the fuck does LITTLE ENDIAN exist?
Let's look at an example: we have an INT with a value of 8. Let's store it in LITTLE ENDIAN and look at each byte's memory address:
```
0    1    2    3      (memory address)
0x08 0x00 0x00 0x00   (byte sequence in memory)
0x00000008            (reconstituted hexadecimal value)
8                     (corresponding decimal value)
```
Now imagine we want to convert this int variable into a short or a byte. Because it's stored in reverse, we don't need to change the starting address of our variable (which means conversions are faster in LITTLE ENDIAN):
```
0    1    2    3      (memory address)
0x08 0x00 0x00 0x00   (int)
0x08 0x00             (short)
0x08                  (byte)
```
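You can observe this layout directly in C++ by inspecting the bytes of an int through an unsigned char pointer. A minimal sketch; the commented output assumes a little endian machine such as x86:

```cpp
#include <cstdio>
#include <cstring>

int main()
{
    int value = 8;

    // Look at the int's individual bytes through an unsigned char
    // pointer: this reveals the storage order in memory.
    const unsigned char* bytes =
        reinterpret_cast<const unsigned char*>(&value);
    for (unsigned i = 0; i < sizeof(value); ++i)
        std::printf("address +%u : 0x%02X\n", i,
                    static_cast<unsigned>(bytes[i]));
    // On a little endian machine (x86): 0x08 0x00 0x00 0x00

    // Truncating to a short just means keeping the first 2 bytes,
    // starting at the same address:
    short asShort = 0;
    std::memcpy(&asShort, &value, sizeof(asShort));
    std::printf("short: %d\n", asShort);  // prints 8 on little endian
    return 0;
}
```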
As you can guess, little endian was adopted in the first place for compatibility reasons, as systems moved from 8 bits to 16 bits to 32 bits to 64 bits...
Note: also known as HOST BYTE ORDER (because most common host CPUs are little endian). It's used by: SCO Unix, DEC Unix (Digital), Microsoft Windows, ...
SWG and byte orders
- Most (if not all) SWG files use the LITTLE ENDIAN format to store values.
- Network transmissions also use LITTLE ENDIAN for the most part.
The packet breakdown documentation will specify which order is used in each case (a sketch of how to read such values by hand is shown below).
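In practice, parsing an SWG file or packet buffer means rebuilding each multi-byte value from its little endian bytes. A minimal sketch (the helper name and buffer contents are made up for the example):

```cpp
#include <cstdint>
#include <cstdio>

// Rebuild a 32-bit unsigned integer from 4 little endian bytes:
// the first byte in the buffer is the lowest-weight one.
std::uint32_t readUInt32LE(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

int main()
{
    // Made-up buffer: the value 0x01020304 stored little endian.
    const unsigned char buffer[] = { 0x04, 0x03, 0x02, 0x01 };

    std::printf("0x%08X\n",
                static_cast<unsigned>(readUInt32LE(buffer)));  // 0x01020304
    return 0;
}
```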
This is just a summary of the common information that is assumed to be understood throughout the rest of our documentation. If you have any specific questions, ask someone, or use your best friend Google (or its friend Wikipedia).