Binary & Hex

This guide will help you understand both packet and file analysis. This understanding is an essential tool for becoming a successful analyzer.

The Binary Language

A computer is made up of millions of transistors. Each one of them can be either on or off, depending on whether electricity flows through it or not. These states are represented by 2 numbers: 0 (off) and 1 (on). This is called the binary language and it's the only thing your computer understands. This language is said to be base 2 because it has only 2 possible digits: 0 and 1. Here is a little example:

0 0 1 0 0 1 1 0 1 1 0 1 1 1 0 0 1 1 0...

In binary language a digit is called a bit. Since binary is a bit too hard for humans to read, we decided to group bits in packets of 8 and to call such a packet a byte. The 8 bits of a byte are used and read the same way we do in the decimal system. Here are the different values a byte can have:

1 byte = 8 bits  -> corresponding decimal value
0 0 0 0 0 0 0 0               0
0 0 0 0 0 0 0 1               1
0 0 0 0 0 0 1 0               2
0 0 0 0 0 0 1 1               3
0 0 0 0 0 1 0 0               4
...
1 1 1 1 1 1 1 1             255

To convert a binary number to an integer in decimal (the base 10 we commonly use), each bit position stands for a power of 2: the bit at position N is worth 2^N, with positions counted from the right starting at 0.

_  _  _  _  _  _  _  _
7  6  5  4  3  2  1  0

Example: 0 0 0 0 1 1 0 1 = 2^3 + 2^2 + 2^0 = 13

To add it up more easily, you can precalculate each 2^N value:

 _  _  _  _  _ _ _ _
128 64 32 16 8 4 2 1 

And just add the numbers where the bit is 1, so:

Example: 0 1 0 1 0 1 1 1 = 64 + 16 + 4 + 2 + 1 = 87
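
A minimal sketch of this conversion in C++ (bits_to_decimal is a name made up for this example):

#include <cstdint>
#include <cstdio>

// Convert a string of bits ("01010111") to its decimal value by
// doubling and adding the current bit, left to right.
uint32_t bits_to_decimal(const char* bits)
{
    uint32_t value = 0;
    for (; *bits; ++bits)
        value = value * 2 + (*bits == '1' ? 1 : 0);
    return value;
}

int main()
{
    printf("%u\n", bits_to_decimal("01010111"));  // prints 87, as in the example above
    return 0;
}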

The Hexadecimal Language

The hexadecimal language is a base 16 system. It means each digit has 16 possible values: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F (0 to 15 in decimal). Because 2^4=16, it makes binary far easier to read. A hexadecimal digit consists of 4 consecutive bits. It means you can write any byte value using 2 hexadecimal digits, as follows:

 original bits  -> 4 bits groups -> hexadecimal
0 1 0 1 0 1 1 1      0101 0111         0x57

As you can see, we added a "0x" prefix to our hexadecimal number. This is the standard used to let readers know that the following digits are hexadecimal. 57 is decimal; 0x57 is hexadecimal (87 in decimal).

Another example:

1 1 0 0 1 0 1 1  = 128 + 64 + 8 + 2 + 1 = 203

1100 1011 binary
 12   11  decimal
  C    B  hexadecimal

So 203 = 0xCB in hexadecimal.

TIP: You can use the Windows calculator to do these conversions quickly. Just switch it to scientific mode.
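
You can also check these conversions in code; a minimal C++ sketch:

#include <cstdio>

int main()
{
    int n = 203;
    printf("%d = 0x%X\n", n, n);  // prints: 203 = 0xCB
    printf("%d\n", 0xCB);         // hexadecimal literals work directly: prints 203
    return 0;
}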

To keep this article simple, signed (positive and negative) numbers are not covered, nor are floating-point numbers. More info here:

  • Signed numbers: Two's complement (http://en.wikipedia.org/wiki/Two's_complement)
  • Floats: IEEE Floating Point (http://en.wikipedia.org/wiki/IEEE_754)

TIP: If these guides do not suffice or are confusing, use Google for better explanations; Wikipedia has a lot more information on all of the subjects in this document.

The data types

Programming languages provide various predefined data types. This table shows which types to expect in input data:

bytes  type     sign      values                                        C++ types                           C# types                       SWGANH Alias
1      integer            0=false, else=true                            bool                                bool                           BOOL
1      integer  signed    -128 to +127 (ascii characters)               char                                sbyte (NB: not CLS compliant)  SBYTE
1      integer  unsigned  0 to +255 (ansi characters)                   unsigned char                       byte                           BYTE
2      integer  signed    -32768 to +32767                              short                               short                          SHORT
2      integer  unsigned  0 to +65535                                   unsigned short                      ushort                         USHORT
2      char               unicode character                             wchar_t                             char
4      integer  signed    -2147483648 to +2147483647                    int, long                           int                            INT
4      integer  unsigned  0 to +4294967295                              unsigned int, unsigned long, void*  uint                           UINT
4      real     signed    -3.4*10^38 to +3.4*10^38 (7 digits)           float                               float                          FLOAT
8      integer  signed    -9223372036854775808 to +9223372036854775807  long long                           long                           LONG
8      integer  unsigned  0 to +18446744073709551615                    unsigned long long                  ulong                          ULONG
8      real     signed    -1.7*10^308 to +1.7*10^308 (15 digits)        double                              double                         DOUBLE
n      char               ansi/ascii characters                         string, char[]                                                     A_STRING
2n     char               unicode characters                            wstring, wchar_t[]                  string                         U_STRING

NOTE: In C++, the size of the long types varies with the operating system and processor: on most platforms, long has the same size as pointers (void*, int*, ...). In an environment with 32-bit memory addressing (4 GB max), longs are 4 bytes; in an environment with 64-bit addressing, longs are 8 bytes.
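
Because of that variation, analysis code usually relies on the fixed-width aliases from <cstdint>; a minimal sketch (the static_asserts mirror the table above):

#include <cstdint>
#include <cstdio>

// The <cstdint> aliases give you the exact sizes from the table above,
// regardless of operating system or processor.
static_assert(sizeof(int8_t)   == 1, "SBYTE");
static_assert(sizeof(uint8_t)  == 1, "BYTE");
static_assert(sizeof(int16_t)  == 2, "SHORT");
static_assert(sizeof(uint16_t) == 2, "USHORT");
static_assert(sizeof(int32_t)  == 4, "INT");
static_assert(sizeof(uint32_t) == 4, "UINT");
static_assert(sizeof(int64_t)  == 8, "LONG");
static_assert(sizeof(uint64_t) == 8, "ULONG");

int main()
{
    // sizeof(long) is the one that varies: typically 4 on 32-bit systems
    // (and on 64-bit Windows), 8 on 64-bit Unix-like systems.
    printf("sizeof(long) = %zu\n", sizeof(long));
    return 0;
}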

Big Endian vs Little Endian (byte orders)

A little story: the term "endian" can be traced to Jonathan Swift's novel "Gulliver's Travels". In one of Gulliver's adventures, he encounters an island whose inhabitants bitterly argue over the correct way to open soft-boiled eggs: the little end or the big end. Little endians and big endians are each convinced that their method is the only correct method for opening an egg.

Big Endian

The BIG ENDIAN byte order is the natural reading order: highest weight digits on the left, lowest weight digits on the right.

Example: let's write 16909060 in hexadecimal (0x?? = 1byte = 8bits):

0x01 0x02 0x03 0x04 (bytes sequence in memory)
0x01020304          (reconstituted associated hexadecimal value)
16909060            (corresponding decimal value)

Note: also known as the NETWORK BYTE ORDER, it's the standard used for IP communications. That doesn't mean all protocols respect the standard, though! It's used by: Sun Solaris (SPARC), HP-UX, Motorola 68000, IBM System/370, ...
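
As a sketch of how code handles this, POSIX systems expose htonl/ntohl ("host to network long" and back) to convert between the host's own order and network byte order:

#include <arpa/inet.h>  // htonl/ntohl (POSIX; on Windows they live in winsock2.h)
#include <cstdint>
#include <cstdio>

int main()
{
    uint32_t host = 16909060;      // 0x01020304 in the host's own byte order
    uint32_t wire = htonl(host);   // bytes reordered to BIG ENDIAN for the wire
    uint32_t back = ntohl(wire);   // and restored on the receiving side
    printf("0x%08X -> 0x%08X\n", (unsigned)host, (unsigned)back);  // 0x01020304 -> 0x01020304
    return 0;
}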

Little Endian

The LITTLE ENDIAN byte order is the opposite of BIG ENDIAN: lowest weight digits on the left, highest weight digits on the right.

Example: let's write again 16909060 in hexadecimal (0x?? = 1byte = 8bits):

0x04 0x03 0x02 0x01 (bytes sequence in memory)
0x01020304          (reconstituted associated hexadecimal value)
16909060            (corresponding decimal value)

Why does LITTLE ENDIAN exist?

Let's look at an example: we have an INT value of 8. Let's store it in LITTLE ENDIAN and look at each byte's memory address:

   0    1    2    3 (memory address)
0x08 0x00 0x00 0x00 (bytes sequence in memory)
0x00000008          (reconstituted associated hexadecimal value)
8                   (corresponding decimal value)

Imagine now that we want to convert this int variable into a short or a byte. Because it's stored in reverse, we don't need to change the starting address of our variable (which means conversions are faster in LITTLE ENDIAN):

   0    1    2    3 (memory address)
0x08 0x00 0x00 0x00 (int)
0x08 0x00           (short)
0x08                (byte)
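
A minimal sketch of that property (assuming a little-endian machine; memcpy reads the leading bytes safely):

#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    // On a LITTLE ENDIAN machine the int 8 is stored as 08 00 00 00,
    // so the short and byte views start at the same address.
    uint32_t value = 8;
    uint16_t as_short;
    uint8_t  as_byte;
    std::memcpy(&as_short, &value, sizeof(as_short));  // first 2 bytes: 08 00
    std::memcpy(&as_byte,  &value, sizeof(as_byte));   // first byte:    08
    printf("%u %u %u\n", value, (unsigned)as_short, (unsigned)as_byte);  // 8 8 8 on little endian
    return 0;
}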

As you can guess, little endian was adopted in the first place for compatibility reasons, as systems moved from 8 bits to 16, 32, and 64 bits...

Note: also known as HOST BYTE ORDER. It's used by: SCO Unix, DEC Unix (Digital), Microsoft Windows, ...
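
If you need to know which order your own machine uses, a common sketch is to look at the byte at the lowest address of a known integer:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    uint32_t probe = 0x01020304;
    uint8_t first;                   // the byte at the lowest address
    std::memcpy(&first, &probe, 1);
    if (first == 0x04)
        printf("LITTLE ENDIAN\n");   // lowest weight byte stored first
    else
        printf("BIG ENDIAN\n");      // highest weight byte stored first
    return 0;
}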

SWG and byte orders

  • Most (if not all) SWG files use the LITTLE ENDIAN format to store values.
  • Network transmissions also use LITTLE ENDIAN for the most part.

Which order is used will be specified in the packet breakdown documentation.
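
As a practical sketch for packet analysis, here is how a 4-byte LITTLE ENDIAN value is reassembled from a raw byte buffer, independent of the machine's own byte order (the buffer contents are made up for illustration):

#include <cstdint>
#include <cstdio>

// Reassemble a 4-byte LITTLE ENDIAN integer from a raw buffer.
uint32_t read_le_uint32(const uint8_t* p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

int main()
{
    const uint8_t packet[] = { 0x04, 0x03, 0x02, 0x01 };  // hypothetical packet bytes
    printf("%u\n", read_le_uint32(packet));  // prints 16909060 (0x01020304)
    return 0;
}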

This is just a summary of the common information that is assumed to be understood throughout the rest of our documentation. If you have any specific questions, ask someone, or use your best friend Google (or his friend Wikipedia).