Endianness. Little Endian vs Big Endian

Endianness. Little Endian vs Big Endian

Endianness describes the byte order in which bytes of large values are stored, transferred and accessed. Based on order in which bytes are stored, transmitted and accessed there are two main types of endianness called little-endian and big-endian .

Before we understand Little Endian and Big Endian we need to understand how memory and it’s addressing work.

Computer Memory Address

Computer memory is generally abstracted as an ordered series of bytes. Memory address is a number which is reference to a specific memory location in which data(by bytes) is stored . Where computer divides memory into bytes, each byte is assigned a unique number which is a memory address so called byte addressable. Whole memory (like RAM) is addressed by bytes regardless of whether it is being used or not for storage. So that CPU can track where data and instructions are stored in RAM by using memory address.

Memory Addressing. Flat Memory Model
Memory Addressing. Flat Memory Model

First memory location(byte) is assigned lowest possible memory address and further locations are addressed with increasing memory address. Large values are divided into bytes and placed in each memory location.

Question: How large values which require more than one byte to store or transmit are arranged ?

Order in which bytes of large values are stored or transmitted over network is described by Endianness or Byte Order .

To better understand byte order. Have a look at terminology Least Significant Byte, Most Significant Byte, Left Most Byte and Right Most Byte.

In any positional number system including binary we count position from right to left. As we write numbers(from left to write) in binary representation of decimal number we call,

  • Byte which is having least positional value is called Least Significant Byte
  • Least Significant Byte can also be called as Right Most Byte since it comes right most as we write from left to right
  • Byte which is having greatest positional value is called Most Significant Byte
  • Most Significant Byte can also be called as Left Most Byte since it comes left most as write from left to right

Following Image illustrates terms Least Significant Byte and Most Significant Byte. By taking number 0x1A2B3C4D which is of size 32 bits.

Bit Numbering. Terms - Least Significant Byte, Most Significant Byte
Bit Numbering. Terms – Least Significant Byte and Most Significant Byte

Least significant byte can also be referred as LSB or LSByte
Most significant byte can also be referred as MSB or MSByte

Endianness

The Origin Of Words Of Endianness

Words Little Endian and Big Endian are originated from the 1726 novel Gulliver’s Travels by Jonathan Swift. In which civil war breaks out over whether the big end or the little end of the egg is proper to crack open. Words Little Endian and Big Endian are used to name two groups who fight over little end and big end of the egg is proper way to break egg receptively .

The origin of words of Endianness. Little Endian and Big Endian
The origin of words of Endianness. Little Endian and Big Endian


The novel further describes an intra-Lilliputian quarrel over the practice of breaking eggs. Traditionally, Lilliputians broke boiled eggs on the larger end; a few generations ago, an Emperor of Lilliput, the Present Emperor’s great-grandfather, had decreed that all eggs be broken on the smaller end after his son cut himself breaking the egg on the larger end. The differences between Big-Endians (those who broke their eggs at the larger end) and Little-Endians had given rise to “six rebellions… wherein one Emperor lost his life, and another his crown”. The Lilliputian religion says an egg should be broken on the convenient end, which is now interpreted by the Lilliputians as the smaller end. The Big-Endians gained favour in Blefuscu.

Excerpt of story from wiki Lilliput and Blefuscu

Endianness. The Byte Order

Endianness is also called as Byte Order which describes order in which bytes of large values are stored in memory or transmitted over network in a transmission protocol or a stream (ex. an audio, video streams).

There are two main endianness formats based on whether Most Significant Byte (MSB) or Least Significant Byte (LSB) comes first, when storing data or transmitting streams. Which are Little-Endian and Big-Endian

  • If Least Significant Byte (LSB) is stored or transmitted first, it is said to be Little-Endian
  • If Most Significant Byte (MSB) is stored or transmitted first, it is said to be Big-Endian

Using words Little-Endian and Big-Endian for byte order is analogous to cracking egg on little end and big end respectively

Endianness - The Byte Order - Little Endian and Big Endian
Endianness – The Byte Order – Little Endian and Big Endian. Cracking Egg Analogy

It is interesting topic in computer science, since these two formats are conflicting. Using wrong byte order may potentially read wrong data or corrupt when writing.

Most Computers prefer to handle all of it’s data in single byte order. Systems native byte order is default

You may not need to worry about endianness and endianness conversion unless you work with assembly, drivers, middle level languages like c, embedded systems and low level networking transports. High level programming languages will take care of it for you.

Endianness - Little Endian vs Big Endian
Endianness – Little Endian vs Big Endian

Little Endian

Like Little-Endians crack egg on little end, least significant byte is stored or transmitted over network first in Little-Endian format. Little endian format is also called as host byte order.

Little-Endian is also known as the “Intel convention”. As Intel used little-endian format from it’s earlier processors.

  • Least significant byte is stored or transmitted first
  • When bytes of large values are stored in memory “Least Significant Byte” is placed first in memory at lowest memory address
  • Numeric significance or positional value is increased with increasing memory address
  • Easier mathematical computation. Mathematical operations mostly work from LSB

Little Endian
Little-Endian

Big Endian

Like Big-Endians crack egg on big end, most significant byte is stored or transmitted over network first in Big-Endian format. Big endian format is also called as network byte order.

Over decades most of processors took transition to little endian at present many processors(laptops, desktops and servers) come in little endian. Yet, big endian is widely used for network transmission. So it is called “network byte order” .

Big-Endian is also known as the “Motorola convention”.Back then as it was widely used by Motorola processors.

  • Most significant byte is stored or transmitted first
  • When bytes of large values are stored in memory “Most Significant Byte” is placed first in memory at lowest memory address
  • Numeric significance or positional value is decreased with increasing memory address
  • Quick sign checking. Since MSB stores first.
  • Better human readability as it is one to one mapping. Bytes are in the same order as we write.
Big Endian
Big-Endian

Check Native Endianness (Native Byte Order)

Check native “byte order” of system using linux command line (terminal)

You can use the command “lscpu” to get CPU information including it’s byte order

# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               158
Model name:          Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
Stepping:            9
CPU MHz:             2808.002
BogoMIPS:            5616.00
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            6144K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase avx2 invpcid rdseed clflushopt

As you can see endianness(byte order) in the above output “Byte Order: Little Endian“.

Check native byte order using python programming language

Python sys module has attribute called “byteorder” where it’s value describes native byte order of the system .

  • sys.byteorder will be “big” if it’s big-endian (most significant byte first) platform
  • sys.byteorder will be “little” if it is little-endian (least significant byte first) platform
>>> import sys
>>> sys.byteorder
little

Check native byte order of system using java

In java “java.nio.ByteOrder” provides flags “BIG_ENDIAN” and “LITTLE_ENDIAN” to denote Big-Endian and Little-Endian respectively. It has the static method “nativeOrder()” to get system’s native byte order.

java.nio.ByteOrder.nativeOrder()

Check endianness of system using C program

#include <stdio.h>
int main()
{
unsigned int i = 1;
char *c = (char*)&i;
if (*c)
printf("Little Endian System");
else
printf("Big Endian System");
getchar();
return 0;
}

Above c program prints Endianess of the system. In the above program we have used character pointer “c” to dereference integer “i“. Since size of character is 1 byte it will contain only first byte of integer.

  • *c will return 1 which is Least Significant Byte on Little-Endian machine.
  • *c will return 0 which is Most Significant Byte on Big-Endian machine

Following “c” program prints memory representation of given 32-bit integer. You can see byte order in action, order in which given 32-bit integer is stored in memory either by Little-Endian or Big-Endian.

Above program will print ” 4d 3c 2b 1a” if machine is Little-Endian (Least Significant Byte Fist) or “1a 2b 3c 4d” if machien is Big-Endian (Most Significant Byte Fist)

Convert Endianness

It is required sometimes to convert endianness. You don’t have to worry about converting endiannes if you are working application level. For data exchange between machines common format is used. Yet if machine is little endian often data sent over network is big endian. Big-Endian is standard byte order for network communication. In such a case we have to convert data that we receive in Big-Endian to host fomat little endian and when we send data back to network need to convert Little-Endian to Big-Endian. There are many utility function to perform this conversion.

Following function are available in c to help endianness conversion

htons() – Host to Network Short
htonl() – Host to Network Long
ntohs() – Network to Host Short
ntohl() – Network to Host Long

Python library NumPy provides way to convert endianness. NumPy library uses c types and low level data structures behind.

How Systems Of Different Endianness Interact

Files that are created on little-endian machine won’t be read as they were created on big-endian machines without conversion. This is the reason, some programs binary file formats which are endianness independent. Where files are endianness dependent byte order marks are used in headers of file to hint a program endianness of the file.

Example: TIFF files use MM and II in header (begning of file) for big-endian and little-endian respectively. Where MM and II are palindromes these are endianness independent, interpreted same on both little-endian and big-endian machines.

Unicode uses byte order mark in the beginning of the stream to indicated endianness of following unicode bytes.

Endianness problems. Who needs to worry about byte order

You don’t need to worry much if you work with application layer protocols or high level programming languages, but it is good to know.

Think what’s the out put of following program

#include <stdio.h>

void foo(void *d){
printf("Data = %d\n", *(char *)d);
}

int main(void){
unsigned int i = 1;
foo(&i);
}

Summary

  • Words Little-Endian and Big-Endian came from 1726 novel Gulliver’s Travels by Jonathan Swift
  • System is said to be Little-Endian if it stores or transmits Least Significant Byte First
  • System is said to be Big-Endian if it stores or transmits Most Significant Byte First
  • Command “lscpu” on linux will print CPU information including Byte Order (Endianness)
  • Currently m
  • ost of processors are little endian
  • Big Endian is mostly used in network protocols so called network byte order .
  • You don’t have to worry about byte order (endianness) compatibility, system will take care of it behind the senses. Unless you work with assembly, c/c++ language, drivers, embedded system and low level network transports.
  • Choose One endianness and stick to it to avoid obscured bugs.