Formats&Protocols

Endianness

Numeric format

$$ e^{i\pi} + 1^{{\tt NaN}\cdot\inf} = 0 $$

(source).

Character encoding

Notation

Caret notation is a way of writing the unprintable control characters of the ASCII encoding. It consists of a caret (^) followed by a capital letter or one of the symbols @ [ \ ] ^ _ ?, for example ^C for code 3 and ^[ for ESC (code 27).
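The mapping can be sketched in a few lines of Python. The function name `caret` is made up for this illustration; the sketch relies on the usual convention that the printed character is the control code with bit 6 flipped (XOR with 0x40).

```python
def caret(code: int) -> str:
    """Return the caret notation for an ASCII control code (0-31 or 127)."""
    if not (0 <= code < 32 or code == 127):
        raise ValueError("not an ASCII control character")
    # Flipping bit 6 maps 0x00-0x1F onto '@'..'_' and 0x7F (DEL) onto '?'.
    return "^" + chr(code ^ 0x40)


if __name__ == "__main__":
    for code in (0x00, 0x03, 0x0A, 0x1B, 0x7F):
        print(f"0x{code:02X} {caret(code)}")
```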

ASCII

ASCII stands for American Standard Code for Information Interchange; it is a 7-bit code.

ASCII reserves the first 32 codes (numbers 0–31 decimal) for control characters: codes originally intended not to represent printable information, but rather to control devices (such as printers) that make use of ASCII, or to provide meta-information about data streams such as those stored on magnetic tape.

Unicode

Unicode is not an encoding but a catalogue of characters: each character is identified by a number (its code point), but this number is not necessarily what is stored in memory or on disk. An encoding defines the actual byte representation; UTF-8 is one such encoding, in which the ASCII set is kept unchanged for compatibility.
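As a small illustration of the difference between a code point and its encoded bytes, here is a Python sketch. The helper name `describe` is hypothetical; it only uses the built-in `ord` and `str.encode`.

```python
def describe(ch: str) -> str:
    """Show a character's code point next to its UTF-8 bytes."""
    code_point = ord(ch)                      # the Unicode number
    utf8 = ch.encode("utf-8")                 # the bytes actually stored
    hex_bytes = " ".join(f"{b:02X}" for b in utf8)
    return f"U+{code_point:04X} -> {hex_bytes}"


if __name__ == "__main__":
    for ch in ("I", "ñ", "“"):
        print(ch, describe(ch))
```

An ASCII character such as I keeps its single byte 49, while ñ (U+00F1) becomes the two bytes C3 B1.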

  • http://www.cl.cam.ac.uk/~mgk25/unicode.html
  • http://canonical.org/~kragen/strlen-utf8.html
  • http://nedbatchelder.com/text/unipain/unipain.html
  • http://farmdev.com/talks/unicode/
  • http://www.joelonsoftware.com/articles/Unicode.html
  • http://www.2ality.com/2013/09/javascript-unicode.html
  • http://the-pastry-box-project.net/oli-studholme/2013-october-8/
  • http://www.utf8everywhere.org/
  • https://speakerdeck.com/mathiasbynens/hacking-with-unicode
  • http://agiliq.com/blog/2014/11/character-encoding-and-unicode/
  • http://eev.ee/blog/2015/09/12/dark-corners-of-unicode/
  • http://reedbeta.com/blog/programmers-intro-to-unicode/
  • Hacking GitHub with Unicode's dotless 'i'

For testing purposes, use the following string:

“Iñtërnâtiônàlizætiøn” looks like E2 80 9C 49 C3 B1 74 C3 AB 72 6E C3 A2 74 69 C3 B4 6E C3 A0 6C 69 7A C3 A6 74 69 C3 B8 6E E2 80 9D in UTF-8 in hex.

An implementation of strlen() that counts UTF-8 code points instead of bytes is the following (32-bit x86, AT&T syntax):

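# original from <http://canonical.org/~kragen/strlen-utf8.html>
.global strlen_utf8
strlen_utf8:
        push %esi
        cld                      # make lodsb walk forward
        mov 8(%esp), %esi        # string pointer (first argument)
        xor %ecx, %ecx           # %ecx = 0
loopa:  dec %ecx                 # %ecx = -(count + 1)
loopb:  lodsb                    # %al = next byte
        shl $1, %al              # CF = bit 7, SF = bit 6 of the byte
        js loopa         # x1xxxxxx: ASCII or lead byte, count it
        jc loopb         # 10xxxxxx: continuation byte, skip it
        jnz loopa        # 00xxxxxx, nonzero: ASCII, count it
        mov %ecx, %eax           # NUL terminator reached
        not %eax                 # ~(-(count + 1)) = count
        pop %esi
        ret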
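The same idea can be written as a Python sketch: count every byte that is not a continuation byte (10xxxxxx) and stop at the first NUL. The function mirrors the assembly routine above but is an illustrative rewrite, not a translation taken from the original source.

```python
def strlen_utf8(data: bytes) -> int:
    """Count code points in a NUL-terminated (or plain) UTF-8 byte string."""
    count = 0
    for byte in data:
        if byte == 0:                 # C-style terminator
            break
        if (byte & 0xC0) != 0x80:     # skip continuation bytes 10xxxxxx
            count += 1
    return count


if __name__ == "__main__":
    sample = "“Iñtërnâtiônàlizætiøn”".encode("utf-8")
    print(len(sample), "bytes,", strlen_utf8(sample), "code points")
```

Like the assembly version, this assumes the input is well-formed UTF-8; it does not validate sequences.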

Punycode

Punycode is a representation of Unicode strings that uses only the limited ASCII character subset allowed in Internet hostnames (letters, digits and the hyphen).
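Python ships `punycode` and `idna` codecs in its standard library, which allows a quick experiment. The sample label and the `.example` domain are chosen only for illustration; note that the built-in `idna` codec follows the older IDNA 2003 rules.

```python
def to_punycode(label: str) -> bytes:
    """Encode a single label with the raw Punycode algorithm."""
    return label.encode("punycode")


def to_hostname(name: str) -> bytes:
    """Encode a whole hostname; non-ASCII labels get the xn-- prefix."""
    return name.encode("idna")


if __name__ == "__main__":
    print(to_punycode("bücher"))          # raw Punycode of one label
    print(to_hostname("bücher.example"))  # hostname form
    print(b"bcher-kva".decode("punycode"))
```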

Regex

Compression

The most common compression formats all use the DEFLATE algorithm defined in RFC 1951, in particular ZLIB (defined in RFC 1950) and GZip (defined in RFC 1952); the Zip format is instead defined in PKWARE's APPNOTE.TXT specification.
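The relationship between the three RFCs can be explored with Python's `zlib` module, whose `wbits` parameter selects the wrapper: negative for a raw DEFLATE stream, 15 for the ZLIB wrapper, 31 for the GZip wrapper. The sketch below assumes the minimal headers the module emits by default (2 bytes for ZLIB, 10 bytes for GZip); the helper name `deflate` is made up.

```python
import zlib


def deflate(data: bytes, wbits: int) -> bytes:
    """Compress data with the wrapper selected by wbits."""
    compressor = zlib.compressobj(level=6, wbits=wbits)
    return compressor.compress(data) + compressor.flush()


if __name__ == "__main__":
    data = b"hello hello hello hello"
    zlib_stream = deflate(data, 15)   # RFC 1950: 2-byte header, Adler-32 trailer
    gzip_stream = deflate(data, 31)   # RFC 1952: 10-byte header, CRC-32 + size trailer

    # Stripping header and trailer leaves a raw RFC 1951 stream in both cases.
    print(zlib.decompress(zlib_stream[2:-4], wbits=-15))
    print(zlib.decompress(gzip_stream[10:-8], wbits=-15))
    print(gzip_stream[:2].hex())      # gzip magic bytes
```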

JPEG

PNG

PDF

  • Reference
  • Let's write a PDF file speakerdeck
  • PDF file format basic structure
  • https://www.aldeid.com/wiki/Analysis-of-a-malicious-pdf
  • http://esec-lab.sogeti.com/posts/2009/06/26/at-least-4-ways-to-die-opening-a-pdf.html
  • https://www.osdefsec.com/analyzing-malicious-pdf/
  • http://eternal-todo.com/tools/peepdf-pdf-analysis-tool

H264

JSON

ELF

An ELF file is identified by the four magic bytes \x7FELF. It has a header that gives general information about the file: its type, which can be

  • Relocatable file
  • Executable
  • Shared object/library

together with the target architecture and the entry point.

It defines a series of segments and sections (listed in the program header table and the section header table), respectively describing the execution view and the linking view of the file.

The kernel maps into memory only the segments of type PT_LOAD; if an interpreter is defined (in the PT_INTERP segment), the kernel also loads it and hands control to it to do its job (i.e. process the dynamic section).

This is a prime number that is also an ELF file (shown in hexadecimal):

7f454c46010101000000000000000000020003000100000054800408340000000000000000000000340020000100000000000000010000000000000000800408008004085b0000005b0000000500000000100000b32a31c040cd80597ec9b11d
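The bytes above are small enough to decode by hand. The following Python sketch unpacks the ELF header and the program headers with `struct`; it handles only 32-bit little-endian files, and the function name `parse_elf32` and the dictionary keys are invented for this example.

```python
import struct

PRIME_ELF_HEX = (
    "7f454c46010101000000000000000000"   # e_ident: magic, class, data, version
    "020003000100000054800408"           # e_type, e_machine, e_version, e_entry
    "340000000000000000000000"           # e_phoff, e_shoff, e_flags
    "340020000100000000000000"           # header/entry sizes and counts
    "01000000000000000080040800800408"   # program header: type, offset, vaddr, paddr
    "5b0000005b0000000500000000100000"   # filesz, memsz, flags, align
    "b32a31c040cd80597ec9b11d"           # code and trailing bytes
)


def parse_elf32(data: bytes) -> dict:
    """Decode the ELF header and program headers of a 32-bit LE file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1 or data[5] != 1:
        raise ValueError("this sketch handles only 32-bit little-endian ELF")
    (e_type, e_machine, _version, e_entry, e_phoff, _shoff, _flags,
     _ehsize, e_phentsize, e_phnum, _shentsize, e_shnum, _shstrndx) = \
        struct.unpack_from("<HHIIIIIHHHHHH", data, 16)
    segments = []
    for i in range(e_phnum):
        (p_type, p_offset, p_vaddr, _paddr,
         p_filesz, p_memsz, p_flags, _align) = \
            struct.unpack_from("<8I", data, e_phoff + i * e_phentsize)
        segments.append({"type": p_type, "offset": p_offset, "vaddr": p_vaddr,
                         "filesz": p_filesz, "memsz": p_memsz, "flags": p_flags})
    return {"type": e_type, "machine": e_machine, "entry": e_entry,
            "sections": e_shnum, "segments": segments}


if __name__ == "__main__":
    info = parse_elf32(bytes.fromhex(PRIME_ELF_HEX))
    print(f"type={info['type']} machine={info['machine']} "
          f"entry={info['entry']:#x} sections={info['sections']}")
    for seg in info["segments"]:
        print(f"segment type={seg['type']} vaddr={seg['vaddr']:#x} "
              f"filesz={seg['filesz']} flags={seg['flags']}")
```

Here type 2 is an executable, machine 3 is Intel 80386, and the single segment has type 1 (PT_LOAD); the file declares no sections at all, which shows that only the program headers matter for execution.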

Relocation

Relocations are the entries used by the linking process, static or dynamic, to patch references whose final addresses are not known until link time or load time.

Core dumps

TLS

Dwarf

  • https://www.ibm.com/developerworks/aix/library/au-dwarf-debug-format/index.html
  • http://www.dwarfstd.org/doc/Debugging%20using%20DWARF-2012.pdf
  • https://stackoverflow.com/questions/5954140/dumping-c-structure-sizes-from-elf-object-file
  • http://wiki.dwarfstd.org/index.php?title=DWARF_FAQ
  • https://landley.net/kdocs/ols/2007/ols2007v2-pages-35-44.pdf
  • https://maskray.me/blog/2020-11-08-stack-unwinding

MS-DOS&PE

Mach-O

QRcode

Compact disc

UART

USB

SD Card

Miscellanea

Polyglot