Difference between revisions of "Binary & Hex"

From SWGANH Wiki
Revision as of 04:04, 15 March 2007


This page summarizes the binary and hexadecimal number systems as background for packet analysis. These skills are essential for anyone who wants to analyze packets successfully.

BINARY NUMBERS

Binary is a base 2 number system whose only digits are 0 and 1, which usually represent the absence or presence of voltage in a computer system.

0 0 0 0 0 0 0 0

Each digit is commonly called a "bit". 8 bits make up a single byte, which is the most common unit of data. All data is made up of bytes.

To convert a binary number to decimal (base 10, what we commonly use), add up the powers of 2 for every bit that is set to 1. Each position is worth 2 ^ N, where N is the position number counted from the right, starting at 0.

_  _  _  _  _  _  _  _
7  6  5  4  3  2  1  0

so 0 0 0 0 1 1 0 1 is equal to 2^0 + 2^2 + 2^3 = 13

to more easily add it up, think of:

 _  _  _  _  _ _ _ _
128 64 32 16 8 4 2 1 

and just add the numbers when the bit is 1, so..

0 1 0 1 0 1 1 1 is.. 64 + 16 + 4 + 2 + 1 = 87
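
The addition described above can be sketched in C++. This is only an illustration: the function name binaryToDecimal is made up for this page, and the sketch assumes the input holds at most 64 bits.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: convert a string of '0' and '1' characters to its value.
// Any other character (for example the spaces used on this page) is skipped.
std::uint64_t binaryToDecimal(const std::string& bits) {
    std::uint64_t value = 0;
    for (char c : bits) {
        if (c != '0' && c != '1') continue;
        // Moving one place to the left doubles the value; then add the new bit.
        value = value * 2 + (c == '1' ? 1u : 0u);
    }
    return value;
}
```

Calling binaryToDecimal("0 1 0 1 0 1 1 1") walks the bits from left to right and yields the same 87 as the manual addition.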

HEXADECIMAL NUMBERS

Because binary numbers can get VERY long, hexadecimal, a base 16 number system, is used instead. Because 16 is a power of 2 (2^4), each hexadecimal digit maps to exactly 4 bits. Instead of 0-9 like decimal uses, hexadecimal uses 0-F, which is just 0-9 with the addition of A B C D E F as digits for the values 10 to 15. Most data viewed on computers is shown in hexadecimal. To convert from binary to hexadecimal and vice versa, use the following method:

Each hexadecimal digit represents 1/2 of a byte (4 bits), so 2 hexadecimal digits are equal to 1 byte. So divide the 8 bits of a byte into two groups of 4. Let's use the example from above.

0 1 0 1     0 1 1 1

Now, add them up separately:

 5             7

And there you go, your hexadecimal representation is 0x57. 0x is a common prefix that lets readers know the number is base 16. 0x57 is not equal to 57; it is equal to 87. So where do the letters come in?

Well, let's do another example:

1 1 0 0 1 0 1 1  = 128 + 64 + 8 + 2 + 1 = 203

1 1 0 0   1 0 1 1  BINARY
 12        11      DECIMAL
 C         B       HEX

so 203 is equal to 0xCB in hexadecimal.
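
The split into two groups of 4 bits can be sketched in C++ as follows. The function name byteToHex is hypothetical and chosen only for this illustration.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: format one byte as "0x" followed by two hex digits.
std::string byteToHex(std::uint8_t byte) {
    const char digits[] = "0123456789ABCDEF";
    std::string out = "0x";
    out += digits[byte >> 4];    // left group of 4 bits (high half of the byte)
    out += digits[byte & 0x0F];  // right group of 4 bits (low half of the byte)
    return out;
}
```

With this sketch, byteToHex(87) gives "0x57" and byteToHex(203) gives "0xCB", matching the two worked examples.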

TIP: You can use the Windows calculator to do these conversions quickly. Just change the mode to scientific.

To keep this article simple, signed numbers (positive and negative) and floating point numbers are not covered.

You can google the following terms for further research:

  • Signed numbers: "Two's complement"
  • Floats: "IEEE Floating Point"

TIP: Also, if these guides do not suffice or are confusing, use Google for better explanations. Wikipedia has A LOT more information on all of the subjects in this document.

DATA TYPES

C++ uses the following data types. The table lists the number of bytes each one takes to store on common platforms, and these names are used throughout the rest of our documentation. The sizes are the same for the signed and unsigned variants of each integer type.

CHAR		1 Byte		(8 bit integer)
SHORT		2 Bytes		(16 bit integer)
INT		4 Bytes		(32 bit integer)
LONG		8 Bytes		(64 bit integer)
FLOAT		4 Bytes		(32 bit floating point)

NOTE: in C++ a 64 bit integer is usually written "long long" (or int64_t from <cstdint>), because "long" by itself is not always 64 bits. For brevity, however, we refer to "LONG" as a 64 bit (8 byte) integer.

NOTE: the size of the plain long type varies with the operating system, processor and compiler. On 32 bit environments it is usually 4 bytes. On 64 bit Linux and most Unix systems it is 8 bytes, the same size as a pointer (void*, int*, ...), but on 64 bit Windows it stays 4 bytes even though pointers are 8 bytes.
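
A minimal sketch of how the sizes above can be checked at compile time, using the fixed-width types from the standard <cstdint> header. The two function names are hypothetical, and the float check is an assumption that holds on mainstream platforms but is not guaranteed by the C++ standard.

```cpp
#include <cstddef>
#include <cstdint>

// Fixed-width types have the same size on every platform that provides them.
static_assert(sizeof(std::int8_t) == 1, "CHAR is 1 byte");
static_assert(sizeof(std::int16_t) == 2, "SHORT is 2 bytes");
static_assert(sizeof(std::int32_t) == 4, "INT is 4 bytes");
static_assert(sizeof(std::int64_t) == 8, "LONG (as used on this page) is 8 bytes");

// Assumption: float is a 32 bit IEEE value, as on mainstream platforms.
static_assert(sizeof(float) == 4, "FLOAT is 4 bytes");

// Hypothetical helpers: the plain long and pointer sizes depend on the platform.
std::size_t sizeOfPlainLong() { return sizeof(long); }
std::size_t sizeOfPointer() { return sizeof(void*); }
```

When a value must be exactly 8 bytes wide, std::int64_t or std::uint64_t avoids relying on what long means on a given system.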

TIP: For the maximum value each data type can store as signed or unsigned, use Google; the information is readily available on the internet.
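
The limits can also be read directly in C++ through std::numeric_limits from the standard <limits> header. The wrapper names maxOf and minOf below are made up for this sketch, and it covers integer types only.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical wrappers around std::numeric_limits for integer types.
template <typename T>
constexpr T maxOf() { return std::numeric_limits<T>::max(); }

template <typename T>
constexpr T minOf() { return std::numeric_limits<T>::min(); }
```

For example, maxOf<std::uint8_t>() is 255 for an unsigned CHAR, while a signed CHAR runs from minOf<std::int8_t>() (-128) to maxOf<std::int8_t>() (127).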

BIG ENDIAN vs LITTLE ENDIAN (byte order)

The little story: The term "endian" can be traced to Jonathan Swift's novel "Gulliver's Travels." In one of Gulliver's adventures, he encounters an island whose inhabitants bitterly argue over the correct way to open soft-boiled eggs - the little end or the big end. Little endians and big endians are each convinced that their method is the only correct method for opening an egg.

BIG ENDIAN

The BIG ENDIAN byte order is the natural read order: the highest weight (most significant) byte comes first, on the left, and the lowest weight (least significant) byte comes last, on the right.

Example: let's write 16909060 in hexadecimal (0x?? = 1 byte = 8 bits):

0x01 0x02 0x03 0x04 (bytes sequence in memory)
0x01020304          (reconstructed hexadecimal value)
16909060            (corresponding decimal value)

Note: also known as NETWORK BYTE ORDER, it's the byte order used for the header fields of the Internet protocols (IP, TCP, UDP). That does not mean every protocol built on top of them follows the same order! Big endian platforms include: SPARC (Sun Solaris), PA-RISC (HP-UX), Motorola 68000, IBM System/370, ...
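
The big endian example above can be reproduced with shifts, which work the same way whatever byte order the machine itself uses. This is a sketch only; toBigEndian and fromBigEndian are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: split a 32 bit value into bytes, highest weight first.
std::array<std::uint8_t, 4> toBigEndian(std::uint32_t value) {
    return {{
        static_cast<std::uint8_t>(value >> 24),  // highest weight byte
        static_cast<std::uint8_t>(value >> 16),
        static_cast<std::uint8_t>(value >> 8),
        static_cast<std::uint8_t>(value)         // lowest weight byte
    }};
}

// Hypothetical helper: rebuild the value from a big endian byte sequence.
std::uint32_t fromBigEndian(const std::array<std::uint8_t, 4>& b) {
    return (static_cast<std::uint32_t>(b[0]) << 24) |
           (static_cast<std::uint32_t>(b[1]) << 16) |
           (static_cast<std::uint32_t>(b[2]) << 8) |
            static_cast<std::uint32_t>(b[3]);
}
```

toBigEndian(16909060) produces the byte sequence 0x01 0x02 0x03 0x04, and fromBigEndian turns that sequence back into 16909060.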

LITTLE ENDIAN

The LITTLE ENDIAN byte order is the opposite of BIG ENDIAN: the lowest weight (least significant) byte comes first, on the left, and the highest weight (most significant) byte comes last, on the right.

Example: let's write 16909060 again in hexadecimal (0x?? = 1 byte = 8 bits):

0x04 0x03 0x02 0x01 (bytes sequence in memory)
0x01020304          (reconstructed hexadecimal value)
16909060            (corresponding decimal value)
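
The little endian layout of the same value can be sketched the same way, with the shifts applied in the opposite order. The names toLittleEndian and fromLittleEndian are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: split a 32 bit value into bytes, lowest weight first.
std::array<std::uint8_t, 4> toLittleEndian(std::uint32_t value) {
    return {{
        static_cast<std::uint8_t>(value),        // lowest weight byte
        static_cast<std::uint8_t>(value >> 8),
        static_cast<std::uint8_t>(value >> 16),
        static_cast<std::uint8_t>(value >> 24)   // highest weight byte
    }};
}

// Hypothetical helper: rebuild the value from a little endian byte sequence.
std::uint32_t fromLittleEndian(const std::array<std::uint8_t, 4>& b) {
    return  static_cast<std::uint32_t>(b[0]) |
           (static_cast<std::uint32_t>(b[1]) << 8) |
           (static_cast<std::uint32_t>(b[2]) << 16) |
           (static_cast<std::uint32_t>(b[3]) << 24);
}
```

toLittleEndian(16909060) produces 0x04 0x03 0x02 0x01, and fromLittleEndian rebuilds 16909060 from it.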

Why does LITTLE ENDIAN exist?

Let's look at an example: we have an INT value of 8. Let's store it in LITTLE ENDIAN and look at the memory address of each byte:

   0    1    2    3 (memory address)
0x08 0x00 0x00 0x00 (bytes sequence in memory)
0x00000008          (reconstructed hexadecimal value)
8                   (corresponding decimal value)

Imagine now we want to convert this int variable into a short or a byte. Because the lowest weight byte is stored first, we don't need to change the starting address of our variable, so this kind of narrowing conversion is cheap in LITTLE ENDIAN:

   0    1    2    3 (memory address)
0x08 0x00 0x00 0x00 (int)
0x08 0x00           (short)
0x08                (byte)
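
The table above can be checked with a small sketch that decodes 1, 2 or 4 bytes starting from the same address. The function name readLittleEndian is hypothetical, and the sketch assumes at most 8 bytes are requested.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decode the first `count` bytes at `bytes` as a
// little endian unsigned value. At most 8 bytes are used.
std::uint64_t readLittleEndian(const std::uint8_t* bytes, std::size_t count) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < count && i < 8; ++i) {
        // The byte at offset i has weight 256^i, i.e. a shift of 8 * i bits.
        value |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    }
    return value;
}
```

Given the bytes 0x08 0x00 0x00 0x00, reading 4 bytes (int), 2 bytes (short) or 1 byte all start at address 0 and all return 8.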

As you can guess, little endian was used in the first place for compatibility reasons when systems went from 8 bits to 16 bits, to 32 bits, to 64 bits...

Note: on these machines LITTLE ENDIAN is also the HOST BYTE ORDER, a term that simply means whatever order the local machine uses. It's used by x86 processors and by: SCO Unix, DEC Unix (Digital), Microsoft Windows, ...
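
A minimal sketch of how a program can find out its own host byte order and convert a value to network byte order (big endian). Both function names are hypothetical, and the sketch assumes the host is either plain little endian or plain big endian. POSIX systems provide the standard htonl function in <arpa/inet.h> for the same conversion.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the lowest weight byte is stored first.
bool isLittleEndianHost() {
    const std::uint32_t probe = 0x01020304;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);  // copy the byte at the lowest address
    return first == 0x04;
}

// Hypothetical helper: return a value whose bytes in memory are in
// network byte order (big endian).
std::uint32_t hostToNetwork32(std::uint32_t value) {
    if (!isLittleEndianHost()) return value;  // already big endian
    // Little endian host: reverse the four bytes.
    return (value >> 24) | ((value >> 8) & 0x0000FF00u) |
           ((value << 8) & 0x00FF0000u) | (value << 24);
}
```

After hostToNetwork32(0x01020304), the four bytes of the result sit in memory as 0x01 0x02 0x03 0x04 on either kind of host.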

SWG and byte order

  • Most (if not all) SWG files use the LITTLE ENDIAN format to store values.
  • Most network transmissions also use LITTLE ENDIAN.

The packet breakdown documentation specifies which order is used.

This is just a summary of the common information that is assumed to be understood throughout the rest of our documentation. If you have any specific questions, ask someone, or use your best friend Google (or its friend Wikipedia).