Binary & Hex

From SWGANH Wiki
Latest revision as of 10:18, 28 August 2008


This guide will help you understand both packet and file analysis. That understanding is an essential tool for becoming a successful analyzer.

=The Binary Language=

A computer is made up of millions of transistors. Each of them can be either '''on''' or '''off''', depending on whether electricity flows through it. These two states are represented by the digits 0 (off) and 1 (on). This is called the binary language, and it's the only thing your computer understands. This language is said to be '''base 2''' because it has only 2 possible digits: 0 and 1. Here is a little example:

<pre>
0 0 1 0 0 1 1 0 1 1 0 1 1 1 0 0 1 1 0...
</pre>

In binary, a digit is called a '''bit'''. Because raw binary is a bit too hard for humans to read, bits are grouped into packets of 8, and each such packet is called a '''byte'''. The 8 bits of a byte are used and read the same way we read digits in the decimal system. Here are the different values a byte can have:

<pre>
1 byte = 8 bits  -> corresponding decimal value
0 0 0 0 0 0 0 0               0
0 0 0 0 0 0 0 1               1
0 0 0 0 0 0 1 0               2
0 0 0 0 0 0 1 1               3
0 0 0 0 0 1 0 0               4
...
1 1 1 1 1 1 1 1             255
</pre>

To convert a binary number to decimal (the base 10 we commonly use), we treat each 1 bit as a power of 2 and add those powers together. Each bit position is worth 2^N, where N is the position of the bit, counting from 0 at the right:

<pre>
_  _  _  _  _  _  _  _
7  6  5  4  3  2  1  0
</pre>

Example: 0 0 0 0 1 1 0 1 = 2^3 + 2^2 + 2^0 = 13

To more easily add it up, you can precalculate each 2^N value:

<pre>
  _  _  _  _  _ _ _ _
128 64 32 16 8 4 2 1
</pre>

Then just add the values wherever the bit is 1:

Example: 0 1 0 1 0 1 1 1 = 64 + 16 + 4 + 2 + 1 = 87
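The positional method above is easy to check in code. Here is a minimal sketch in Python (used purely as an illustration language; it is not part of the original article):

```python
# Convert a bit string to decimal by summing 2^N for every 1 bit,
# where N is the bit's position counted from the right.
def bits_to_decimal(bits: str) -> int:
    total = 0
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total

print(bits_to_decimal("00001101"))  # 2^3 + 2^2 + 2^0 = 13
print(bits_to_decimal("01010111"))  # 64 + 16 + 4 + 2 + 1 = 87
```

Python's built-in `int(bits, 2)` does the same thing in one call; the loop just makes the power-of-2 reasoning explicit.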

=The Hexadecimal Language=

The hexadecimal language is a base 16 system: each digit has 16 possible values: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F (0 to 15 in decimal). Because 2^4 = 16, it makes binary much easier to read. A hexadecimal digit corresponds to 4 consecutive bits, so you can write any byte value using 2 hexadecimal digits, as follows:

<pre>
original bits  -> 4-bit groups -> hexadecimal
0 1 0 1 0 1 1 1      0101 0111         0x57
</pre>

As you can see, we added a "0x" prefix to our hexadecimal number. This is the standard way to let readers know that the digits that follow are hexadecimal. 57 is decimal; 0x57 is hexadecimal (87 in decimal).
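The nibble-grouping trick can be sketched directly in Python (an illustration, not part of the original article):

```python
def byte_to_hex(bits: str) -> str:
    # Split the 8 bits into two 4-bit groups (nibbles);
    # each nibble maps directly to one hexadecimal digit.
    high, low = bits[:4], bits[4:]
    digits = "0123456789ABCDEF"
    return "0x" + digits[int(high, 2)] + digits[int(low, 2)]

print(byte_to_hex("01010111"))  # 0x57 (87 in decimal)
print(byte_to_hex("11001011"))  # 0xCB (203 in decimal)
```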

Another example:

<pre>
1 1 0 0 1 0 1 1  = 128 + 64 + 8 + 2 + 1 = 203

1100 1011 binary
 12   11  decimal
  C    B  hexadecimal
</pre>

So 203 = 0xCB in hexadecimal.

'''TIP''': <font color=orange>''You can use the Windows calculator to do these conversions quickly. Just change its mode to scientific.''</font>

For simplicity, this article does not cover signed (positive and negative) numbers or floating-point numbers in detail. More info here:

*[http://en.wikipedia.org/wiki/Two's_complement Signed numbers: Two's complement]
*[http://en.wikipedia.org/wiki/IEEE_754 Floats: IEEE Floating Point]

=The data types=

Programming languages provide various predefined data types. This table shows the types you can expect to find in input data:

{| class="wikitable"
!bytes amount
!type
!sign
!values
!C++ corresponding types
!C# corresponding types
!SWGANH Alias
|-
|1||integer|| ||0=false, else=true||bool||bool||{{bool}}
|-
|1||integer||signed|| -128 to +127 (ASCII characters)||char||sbyte (NB: not CLS-compliant)||{{sbyte}}
|-
|1||integer||unsigned||0 to +255 (ANSI characters)||unsigned char||byte||{{byte}}
|-
|2||integer||signed|| -32768 to +32767||short||short||{{short}}
|-
|2||integer||unsigned||0 to +65535||unsigned short||ushort||{{ushort}}
|-
|2||char|| ||Unicode character||wchar_t||char||
|-
|4||integer||signed|| -2147483648 to +2147483647||int, long||int||{{int}}
|-
|4||integer||unsigned||0 to +4294967295||unsigned int, unsigned long, void*||uint||{{uint}}
|-
|4||real||signed|| -3.4*10^38 to +3.4*10^38 (about 7 significant digits)||float||float||{{float}}
|-
|8||integer||signed|| -9223372036854775808 to +9223372036854775807||long long||long||{{long}}
|-
|8||integer||unsigned||0 to +18446744073709551615||unsigned long long||ulong||{{ulong}}
|-
|8||real||signed|| -1.7*10^308 to +1.7*10^308 (about 15 significant digits)||double||double||{{double}}
|-
|n||char|| ||ANSI/ASCII characters||string, char[]|| ||{{a_string}}
|-
|2n||char|| ||Unicode characters||wstring, wchar_t[]||string||{{u_string}}
|}

'''NOTE''': <font color=red>''In C++, the size of the long type varies with operating system and compiler. On many platforms it matches the size of pointers (void*, int*, ...): in a 32-bit environment (4 GB of addressable RAM) long is 4 bytes, while in many 64-bit environments it is 8 bytes.''</font>
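The sizes in the table can be checked with Python's `struct` module, whose standard-size format codes mirror the C types above (again a Python illustration, not part of the original article):

```python
import struct

# struct's standard-size format codes mirror the C types in the table:
# b = char (1 byte), h = short (2), i = int (4), q = long long (8),
# f = float (4), d = double (8). The '<' prefix forces these standard
# sizes regardless of the host platform.
for code, name in [("b", "char"), ("h", "short"), ("i", "int"),
                   ("q", "long long"), ("f", "float"), ("d", "double")]:
    print(f"{name:9s} -> {struct.calcsize('<' + code)} byte(s)")
```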

=Big Endian vs Little Endian (byte orders)=

A little story: the term "endian" can be traced to Jonathan Swift's novel "Gulliver's Travels." In one of Gulliver's adventures, he encounters an island whose inhabitants bitterly argue over the correct way to open soft-boiled eggs - the little end or the big end. Little endians and big endians are each convinced that their method is the only correct method for opening an egg.

==Big Endian==

The BIG ENDIAN byte order is the natural reading order: the highest-weight digits are on the left and the lowest-weight digits on the right.

Example: let's write 16909060 in hexadecimal (0x?? = 1 byte = 8 bits):

<pre>
0x01 0x02 0x03 0x04 (byte sequence in memory)
0x01020304          (reconstituted hexadecimal value)
16909060            (corresponding decimal value)
</pre>

Note: big endian is also known as NETWORK BYTE ORDER; it is the standard order for IP communications, although that does not mean every protocol respects the standard! It is used by: Sun Solaris (SPARC), HP-UX, Motorola 68000, IBM System/370, ...
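The big-endian layout above can be reproduced with Python's `struct` module, where `'>'` selects big-endian order (a sketch for illustration, not part of the original article):

```python
import struct

# Pack 16909060 as a 4-byte big-endian unsigned int ('>' = big endian).
data = struct.pack(">I", 16909060)
print(data.hex())                    # 01020304: highest-weight byte first
print(struct.unpack(">I", data)[0])  # back to 16909060
```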

==Little Endian==

The LITTLE ENDIAN byte order is the opposite of BIG ENDIAN: the lowest-weight digits are on the left and the highest-weight digits on the right.

Example: let's write 16909060 in hexadecimal again (0x?? = 1 byte = 8 bits):

<pre>
0x04 0x03 0x02 0x01 (byte sequence in memory)
0x01020304          (reconstituted hexadecimal value)
16909060            (corresponding decimal value)
</pre>
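The same value packed little endian (with `'<'` in Python's `struct`, again as an illustration) comes out with its bytes reversed:

```python
import struct

# The same value packed little endian ('<'): bytes come out reversed.
little = struct.pack("<I", 16909060)
print(little.hex())  # 04030201: lowest-weight byte first
# Unpacking with the matching byte order recovers the original value.
print(struct.unpack("<I", little)[0])  # 16909060
```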

Why does LITTLE ENDIAN exist?

Let's see an example: we have an INT with the value 8. Let's store it in LITTLE ENDIAN and look at the memory address of each byte:

<pre>
   0    1    2    3 (memory address)
0x08 0x00 0x00 0x00 (byte sequence in memory)
0x00000008          (reconstituted hexadecimal value)
8                   (corresponding decimal value)
</pre>

Imagine now that we want to convert this int variable into a short or a byte. Because it is stored in reverse, we don't need to change the starting address of the variable (which means narrowing conversions are faster in LITTLE ENDIAN):

<pre>
   0    1    2    3 (memory address)
0x08 0x00 0x00 0x00 (int)
0x08 0x00           (short)
0x08                (byte)
</pre>
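This prefix property is easy to demonstrate with `struct` (a Python sketch for illustration, not part of the original article):

```python
import struct

# In little endian, a smaller type is simply a prefix of the larger one,
# so narrowing a small value does not move its starting address.
as_int = struct.pack("<i", 8)    # 08 00 00 00
as_short = struct.pack("<h", 8)  # 08 00
as_byte = struct.pack("<b", 8)   # 08
print(as_int.hex(), as_short.hex(), as_byte.hex())
print(as_int[:2] == as_short, as_int[:1] == as_byte)  # True True
```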

As you can guess, little endian was originally adopted for compatibility reasons as systems moved from 8-bit to 16-bit, 32-bit, and 64-bit architectures...

Note: little endian is also commonly the HOST BYTE ORDER, since most desktop processors use it. It is used by: SCO Unix, DEC Unix (Digital), Microsoft Windows, ...

==SWG and byte orders==

*Most (if not all) SWG files use the LITTLE ENDIAN format to store values.
*Network transmissions also use LITTLE ENDIAN for the most part.

The packet breakdown documentation will specify which order is used in each case.
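As a sketch of how this applies to packet analysis, here is how a few little-endian fields could be pulled out of a raw byte buffer. The field layout below (2-byte opcode, 4-byte count, 1-byte flag) is purely hypothetical and is not an actual SWG packet:

```python
import struct

# Hypothetical little-endian packet layout, for illustration only:
# 2-byte opcode, 4-byte count, 1-byte flag.
packet = bytes([0x23, 0x00, 0x04, 0x03, 0x02, 0x01, 0x01])
opcode, count, flag = struct.unpack("<HIB", packet)
print(hex(opcode), count, flag)  # 0x23 16909060 1
```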

This is just a summary of the common information that is assumed to be understood throughout the rest of our documentation. If you have any specific questions, ask someone, or use your best friend Google (or his friend Wikipedia).