Monday, December 25, 2006

Exchangeable image file format for Digital Still Cameras: Exif

After some Google searches I concluded that there is no Python library that lets me manipulate the data in a JPEG's Exif section. I need this because I'm involved in a project called Syncropated, and this software needs to embed a thumbnail in JPEG files. As you can see in Exif_2-1_V1.PDF, section 2.5.5, this is possible.

But if I am going to write code to embed this thumbnail, why not also add code to parse the data, and more?

Below you can read how Exif works; in another post I will talk more about the Python Exif library that I'm coding.

First we want to know whether the file is a JPEG or not, so we can simply check the first two bytes of the file. If byte[0] equals ‘FF’ and the second one is ‘D8’ (Figure 1), the file can be considered a JPEG candidate. To be a JPEG, it must also follow the other rules shown below. JPEG files that carry Exif must have the word `Exif` in the header. To be more specific, this word is supposed to start at byte offset 6 of the file (counting from 0), forming a sequence like the green one that you can see in Figure 1.
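
A minimal sketch of this check in Python. The function name `looks_like_exif_jpeg` and the sample bytes are made up for illustration, and the sketch assumes that the Exif segment (marker FF E1) comes immediately after the first two bytes, which is what puts `Exif` at offset 6.

```python
def looks_like_exif_jpeg(data: bytes) -> bool:
    """Return True if data starts like a JPEG that carries an Exif header."""
    # The first two bytes of a JPEG candidate must be FF D8.
    if data[:2] != b"\xff\xd8":
        return False
    # Assumption: the Exif segment (FF E1) directly follows, so after the
    # 2-byte marker and the 2-byte segment length, "Exif" sits at offset 6.
    return data[2:4] == b"\xff\xe1" and data[6:10] == b"Exif"


# Hypothetical header bytes, only for demonstration.
sample = b"\xff\xd8\xff\xe1\x00\x10Exif\x00\x00II*\x00\x08\x00\x00\x00"
print(looks_like_exif_jpeg(sample))
```

A real file can place other segments before the Exif one, so a more robust parser would walk the segments instead of relying on a fixed offset.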

The 49 49 (Figure 1) represents the byte order of the data: Big-endian or Little-endian. Little-endian is represented by `II` (Intel format, bytes 49 49) and Big-endian by `MM` (Motorola format, bytes 4D 4D).
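
In Python the byte order mark maps naturally onto the prefix character of the `struct` module. The sketch below is only an illustration: `byte_order_prefix` is a name I made up, and the check against the number 42 comes from the TIFF header layout, not from anything stated in this post.

```python
import struct


def byte_order_prefix(tiff_header: bytes) -> str:
    """Map the 2-byte order mark to a struct format prefix."""
    mark = tiff_header[:2]
    if mark == b"II":  # 49 49, Intel format, little-endian
        return "<"
    if mark == b"MM":  # 4D 4D, Motorola format, big-endian
        return ">"
    raise ValueError(f"unknown byte order mark: {mark!r}")


prefix = byte_order_prefix(b"II*\x00")
# The two bytes after the mark hold the TIFF magic number 42,
# so decoding them is a quick sanity check of the chosen order.
(magic,) = struct.unpack(prefix + "H", b"*\x00")
print(prefix, magic)
```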


Figure 1. JPEG File raw bytes.


Another important thing to know about Exif is how the data is organized. Exif is divided into:

  • JPEG HEADER
  • 0th IFD
  • 0th IFD Values
  • 1st IFD
  • 1st IFD Values
  • 1st Thumbnail – Image Data
  • 0th (Primary) – Image Data
An IFD (Image File Directory) is used to store tags, with their values and data types. IFDs work like a linked list: IFD0 points to IFD1, and so on.

All IFDs have the same structure: the first two bytes give the number of tags in the directory, and every tag in an IFD is stored in the same way:

  • Tag (Bytes 0 – 1)
  • Type (Bytes 2 – 3)
  • Count (Bytes 4 – 7)
  • Value Offset (Bytes 8 – 11)
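
The 12-byte layout above can be unpacked with `struct`. This is a rough sketch, not the final library code: `IfdEntry`, `parse_entry` and the sample bytes are hypothetical names and data chosen for the example.

```python
import struct
from typing import NamedTuple


class IfdEntry(NamedTuple):
    tag: int             # bytes 0-1
    type_id: int         # bytes 2-3
    count: int           # bytes 4-7
    value_offset: bytes  # bytes 8-11, kept raw: inline value or offset


def parse_entry(raw: bytes, prefix: str = "<") -> IfdEntry:
    """Split one 12-byte IFD entry; prefix is '<' for II and '>' for MM."""
    tag, type_id, count = struct.unpack(prefix + "HHI", raw[:8])
    return IfdEntry(tag, type_id, count, raw[8:12])


# Hypothetical little-endian entry: tag 0x0201, type 4, count 1.
raw_entry = b"\x01\x02\x04\x00\x01\x00\x00\x00\x2c\x01\x00\x00"
entry = parse_entry(raw_entry)
print(hex(entry.tag), entry.type_id, entry.count, entry.value_offset)
```

The last four bytes are kept raw on purpose, because how to read them depends on the type and the count.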

If the value fits in 4 bytes or fewer, it is saved in the Value Offset field itself; otherwise the field holds an offset that points to the data, which is stored in the values block of that IFD (for example, 0th IFD Values).
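
A sketch of that decision in Python. The type sizes in `TYPE_SIZES` are the ones I know from the Exif/TIFF type table, not something listed in this post, and the sketch assumes offsets are counted from the start of the TIFF header (the `II`/`MM` mark). `value_bytes` and the sample data are hypothetical.

```python
import struct

# Size in bytes of one element of each Exif type (taken from the spec):
# 1 BYTE, 2 ASCII, 3 SHORT, 4 LONG, 5 RATIONAL, 7 UNDEFINED, 9 SLONG, 10 SRATIONAL
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 7: 1, 9: 4, 10: 8}


def value_bytes(tiff: bytes, type_id: int, count: int,
                field: bytes, prefix: str = "<") -> bytes:
    """Return the raw value bytes of a tag, inline or via its offset."""
    size = TYPE_SIZES[type_id] * count
    if size <= 4:
        # Small values live directly in the 4-byte field, left-justified.
        return field[:size]
    # Otherwise the field is an offset, assumed relative to the TIFF header.
    (offset,) = struct.unpack(prefix + "I", field)
    return tiff[offset:offset + size]


# Hypothetical TIFF block: 8-byte header followed by an ASCII value.
tiff = b"II*\x00\x08\x00\x00\x00" + b"Canon\x00"
print(value_bytes(tiff, 2, 6, struct.pack("<I", 8)))  # 6 bytes: via offset
print(value_bytes(tiff, 3, 1, b"\x06\x00\x00\x00"))   # 2 bytes: inline
```
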
The pointer to the next IFD sits at the end of the IFD block, before the IFD values block. If you take the number of tags (the first 2 bytes), multiply it by 12 (the size of one IFD tag) and add the 2 bytes of the count itself, you get the position, counted from the start of the IFD, of the 4-byte offset to the next IFD.
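
Walking the chain could look like the sketch below. Two things here are assumptions that come from the TIFF header layout rather than from this post: bytes 4 to 7 of the header hold the offset of IFD0, and a next-IFD offset of 0 ends the chain. The name `ifd_offsets` and the sample bytes are invented for the example.

```python
import struct


def ifd_offsets(tiff: bytes, prefix: str = "<") -> list:
    """Follow the IFD chain and return the offset of every IFD found."""
    # Assumption: bytes 4-7 of the TIFF header point to IFD0.
    (offset,) = struct.unpack(prefix + "I", tiff[4:8])
    offsets = []
    # Stop at offset 0 (end of chain) or on a loop in a corrupt file.
    while offset != 0 and offset not in offsets:
        offsets.append(offset)
        (n_tags,) = struct.unpack(prefix + "H", tiff[offset:offset + 2])
        # Skip the 2-byte tag count and n_tags entries of 12 bytes each.
        next_ptr = offset + 2 + n_tags * 12
        (offset,) = struct.unpack(prefix + "I", tiff[next_ptr:next_ptr + 4])
    return offsets


# Hypothetical data: IFD0 at offset 8 with one empty entry, IFD1 at 26.
sample_tiff = (
    b"II*\x00" + struct.pack("<I", 8)
    + struct.pack("<H", 1) + bytes(12) + struct.pack("<I", 26)
    + struct.pack("<H", 0) + struct.pack("<I", 0)
)
print(ifd_offsets(sample_tiff))
```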

I have a particular interest in the thumbnail, so as you can see in the explanation above (Exif organization), I need both an IFD0 and an IFD1 to have a thumbnail. IFD1 has two special tags that point me to the thumbnail. The first one is JPEGInterchangeFormat (0x0201) and the other one is JPEGInterchangeFormatLength (0x0202).

The JPEGInterchangeFormat value is an offset to the beginning of the thumbnail, and JPEGInterchangeFormatLength contains the thumbnail size in bytes. With these two pieces of information we are able to get any thumbnail embedded in JPEG Exif files ;P
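
To make this concrete, here is a small sketch of the extraction. It is not the library I am writing, just an illustration: `read_long_tags`, `extract_thumbnail` and the sample bytes are hypothetical, both tags are assumed to be stored as type 4 (LONG) with count 1, and the thumbnail offset is assumed to be counted from the start of the TIFF header.

```python
import struct

JPEG_INTERCHANGE_FORMAT = 0x0201
JPEG_INTERCHANGE_FORMAT_LENGTH = 0x0202


def read_long_tags(tiff: bytes, ifd_offset: int, prefix: str = "<") -> dict:
    """Map tag -> value for entries of type 4 (LONG) with count 1."""
    (n_tags,) = struct.unpack(prefix + "H", tiff[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(n_tags):
        start = ifd_offset + 2 + i * 12
        tag, type_id, count, value = struct.unpack(
            prefix + "HHII", tiff[start:start + 12])
        if type_id == 4 and count == 1:
            tags[tag] = value
    return tags


def extract_thumbnail(tiff: bytes, ifd1_offset: int, prefix: str = "<") -> bytes:
    """Slice the thumbnail out using the two IFD1 tags."""
    tags = read_long_tags(tiff, ifd1_offset, prefix)
    start = tags[JPEG_INTERCHANGE_FORMAT]
    length = tags[JPEG_INTERCHANGE_FORMAT_LENGTH]
    return tiff[start:start + length]


# Hypothetical data: for brevity the IFD at offset 8 plays the role of IFD1,
# and the "thumbnail" is just the 4 bytes FF D8 FF D9 at offset 38.
sample_exif = (
    b"II*\x00" + struct.pack("<I", 8)
    + struct.pack("<H", 2)
    + struct.pack("<HHII", JPEG_INTERCHANGE_FORMAT, 4, 1, 38)
    + struct.pack("<HHII", JPEG_INTERCHANGE_FORMAT_LENGTH, 4, 1, 4)
    + struct.pack("<I", 0)
    + b"\xff\xd8\xff\xd9"
)
thumbnail = extract_thumbnail(sample_exif, 8)
print(thumbnail)
```

If either tag is missing, `extract_thumbnail` raises a `KeyError`, which is a reasonable signal that the file has no embedded thumbnail.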