# Formats & Protocols

• Linux Foundation Referenced Specifications
• CyberChef: a simple, intuitive web app for analysing and decoding data without having to deal with complex tools or programming languages. CyberChef encourages both technical and non-technical people to explore data formats, encryption and compression.

## Numeric format

$$e^{i\pi} + 1^{{\tt NaN}\cdot\inf} = 0$$

(source).

## Character encoding

### Notation

Caret notation is a notation for unprintable control characters in ASCII encoding. The notation consists of a caret (^) followed by a capital letter or one of the symbols @ [ \ ] ^ _ ?; for example, ^C denotes the control character with code 3.
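As a minimal sketch of how the notation is usually derived: the character after the caret is the control code with bit 6 flipped (XOR with 0x40). The helper name `caret` is hypothetical and only used for illustration.

```python
def caret(code: int) -> str:
    """Return the caret notation for an ASCII control code (0-31 or 127)."""
    if not (0 <= code < 32 or code == 127):
        raise ValueError("not an ASCII control character")
    # Flipping bit 6 maps 0..31 onto '@'..'_' and 127 (DEL) onto '?'.
    return "^" + chr(code ^ 0x40)


print(caret(0x03))  # ^C
print(caret(0x1B))  # ^[
print(caret(0x7F))  # ^?
```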

### ASCII

ASCII stands for American Standard Code for Information Interchange; it is a 7-bit code.

ASCII reserves the first 32 codes (numbers 0–31 decimal) for control characters: codes originally intended not to represent printable information, but rather to control devices (such as printers) that make use of ASCII, or to provide meta-information about data streams such as those stored on magnetic tape.
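The two properties above (7 bits, first 32 codes reserved for control characters) can be checked with a short sketch. It leans on Python's `unicodedata` module, whose "Cc" category marks control characters; `is_ascii` is a hypothetical helper name.

```python
import unicodedata

# All 128 ASCII codes fit in 7 bits, so this decodes without errors.
all_ascii = bytes(range(128)).decode("ascii")

# Codes classified as control characters ("Cc") within the ASCII range.
control_codes = [c for c in range(128) if unicodedata.category(chr(c)) == "Cc"]


def is_ascii(data: bytes) -> bool:
    """True when every byte has its top (eighth) bit clear."""
    return all(b < 0x80 for b in data)


print(control_codes[:32] == list(range(32)))
print(is_ascii(b"hello"), is_ascii("é".encode("utf-8")))
```

Besides the first 32 codes, code 127 (DEL) is also classified as a control character.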

### Unicode

Unicode is not an encoding but a classification of characters: each character is identified by a number (its code point), but this number is not used directly in its representation. UTF-8 is one particular representation in which, for compatibility, the ASCII set is kept as is.
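A small sketch of the distinction between a code point and its encoded bytes, using only Python's built-in `str.encode`; the sample character is an arbitrary choice.

```python
ch = "ñ"

# The code point is an abstract number, independent of any encoding.
print(hex(ord(ch)))            # 0xf1

# The same code point has different byte representations.
print(ch.encode("utf-8"))      # b'\xc3\xb1'
print(ch.encode("utf-16-le"))  # b'\xf1\x00'

# For ASCII characters, UTF-8 and ASCII produce identical bytes.
print("A".encode("utf-8") == "A".encode("ascii"))  # True
```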

• http://www.cl.cam.ac.uk/~mgk25/unicode.html
• http://canonical.org/~kragen/strlen-utf8.html
• http://nedbatchelder.com/text/unipain/unipain.html
• http://farmdev.com/talks/unicode/
• http://www.joelonsoftware.com/articles/Unicode.html
• http://www.2ality.com/2013/09/javascript-unicode.html
• http://the-pastry-box-project.net/oli-studholme/2013-october-8/
• http://www.utf8everywhere.org/
• https://speakerdeck.com/mathiasbynens/hacking-with-unicode
• http://agiliq.com/blog/2014/11/character-encoding-and-unicode/
• http://eev.ee/blog/2015/09/12/dark-corners-of-unicode/
• http://reedbeta.com/blog/programmers-intro-to-unicode/
• Hacking GitHub with Unicode's dotless 'i'

For testing purposes, use

“Iñtërnâtiônàlizætiøn” looks like E2 80 9C 49 C3 B1 74 C3 AB 72 6E C3 A2 74 69 C3 B4 6E C3 A0 6C 69 7A C3 A6 74 69 C3 B8 6E E2 80 9D in UTF-8 in hex.
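The byte sequence above can be reproduced with a short sketch (it assumes Python 3.8 or later for the separator argument of `bytes.hex`):

```python
text = "“Iñtërnâtiônàlizætiøn”"
encoded = text.encode("utf-8")

# Hex dump of the UTF-8 bytes, one space between bytes.
print(encoded.hex(" ").upper())

# Number of code points versus number of bytes.
print(len(text), len(encoded))
```

The two lengths differ because every non-ASCII character in the string takes two or three bytes in UTF-8.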


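An implementation of strlen() for UTF-8 strings, in x86 assembly (AT&T syntax), is the following:

```asm
# original from <http://canonical.org/~kragen/strlen-utf8.html>
.global strlen_utf8
strlen_utf8:
        push %esi
        cld                     # make lodsb walk forward
        mov 8(%esp), %esi       # %esi = pointer to the string (first argument)
        xor %ecx, %ecx          # %ecx will hold -(count + 1)
loopa:  dec %ecx                # count one character
loopb:  lodsb                   # %al = next byte
        shl $1, %al             # CF = old bit 7, SF = old bit 6
        js loopa                # x1xxxxxx: ASCII or lead byte, count it
        jc loopb                # 10xxxxxx: continuation byte, skip it
        jnz loopa               # 00xxxxxx and non-zero: count it
        mov %ecx, %eax          # zero byte reached: end of string
        not %eax                # ~(-(count + 1)) = count
        pop %esi
        ret
```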
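# The same counting rule (every byte that is not a 10xxxxxx continuation
# byte starts a character) can be sketched in Python. This is an
# illustrative equivalent of the routine above, not part of the original
# source, and it assumes a NUL-terminated, valid UTF-8 input.

```python
def strlen_utf8(data: bytes) -> int:
    """Count UTF-8 characters up to the first NUL byte (or end of data)."""
    count = 0
    for byte in data:
        if byte == 0:
            break                 # C-style string terminator
        if byte & 0xC0 != 0x80:   # skip 10xxxxxx continuation bytes
            count += 1
    return count


sample = "Iñtërnâtiônàlizætiøn".encode("utf-8") + b"\x00"
print(strlen_utf8(sample), len(sample) - 1)
```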


### Punycode

Punycode is a representation of Unicode with the limited ASCII character subset used for Internet hostnames.
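As an illustration, Python ships a `punycode` codec for the raw transformation and an `idna` codec that applies it per hostname label and adds the `xn--` prefix. Note that the built-in `idna` codec follows the older IDNA 2003 rules; the hostname used here is only an example.

```python
label = "bücher"

# Raw Punycode: ASCII characters first, then the encoded non-ASCII part.
print(label.encode("punycode"))               # b'bcher-kva'

# Hostname form: each non-ASCII label is Punycode-encoded and prefixed.
host = "bücher.example"
ascii_host = host.encode("idna")
print(ascii_host)                             # b'xn--bcher-kva.example'

# The transformation is reversible.
print(ascii_host.decode("idna"))
```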

## Regex

## Compression

The most common compression formats all use the DEFLATE algorithm defined in RFC 1951, in particular ZLIB (defined in RFC 1950) and GZip (defined in RFC 1952); the Zip format is instead defined here.
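A minimal sketch of how the three RFCs relate, using Python's `zlib` module: the `wbits` parameter selects whether the DEFLATE stream is emitted bare, inside a zlib wrapper, or inside a gzip wrapper. The helper `deflate` and the sample payload are made up for the example.

```python
import zlib

data = b"hello hello hello hello " * 8


def deflate(payload: bytes, wbits: int) -> bytes:
    """Compress payload with the container selected by wbits."""
    comp = zlib.compressobj(level=9, wbits=wbits)
    return comp.compress(payload) + comp.flush()


raw = deflate(data, -15)  # bare DEFLATE stream (RFC 1951)
zl = deflate(data, 15)    # zlib wrapper: 2-byte header + Adler-32 (RFC 1950)
gz = deflate(data, 31)    # gzip wrapper: header + CRC-32 and size (RFC 1952)

# The wrappers are recognisable from their first bytes.
print(zl[:1].hex(), gz[:2].hex())

# Each stream round-trips when decoded with the matching wbits.
for blob, wbits in ((raw, -15), (zl, 15), (gz, 31)):
    assert zlib.decompress(blob, wbits=wbits) == data

# Stripping the zlib header and checksum leaves a bare DEFLATE stream.
print(zlib.decompress(zl[2:-4], wbits=-15) == data)
```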

## PDF

• Reference
• Let's write a PDF file speakerdeck
• PDF file format basic structure
• https://www.aldeid.com/wiki/Analysis-of-a-malicious-pdf
• http://esec-lab.sogeti.com/posts/2009/06/26/at-least-4-ways-to-die-opening-a-pdf.html
• https://www.osdefsec.com/analyzing-malicious-pdf/
• http://eternal-todo.com/tools/peepdf-pdf-analysis-tool

## ELF

An ELF file is identified by the four magic bytes \x7FELF. It has a header that gives general information: the architecture, the entry point and the type of ELF file, which can be

• Relocatable file
• Executable
• Shared object/library

It defines a series of segments and sections (listed in the program header table and the section header table), describing respectively the execution and the linking of the file.

The kernel loads into memory only the segments of type PT_LOAD and, if an interpreter is defined (in the PT_INTERP segment), calls the interpreter to do its job (i.e. resolve the dynamic sections).

This is a prime that is also an ELF:

7f454c46010101000000000000000000020003000100000054800408340000000000000000000000340020000100000000000000010000000000000000800408008004085b0000005b0000000500000000100000b32a31c040cd80597ec9b11d
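As a rough sketch, the header fields described above can be read from these bytes with Python's `struct` module. The layouts follow the 32-bit `Elf32_Ehdr` and `Elf32_Phdr` structures; little-endian byte order is assumed here rather than read from `e_ident`, and the variable names are only illustrative.

```python
import struct

data = bytes.fromhex(
    "7f454c46010101000000000000000000020003000100000054800408340000000000000000000000340020000100000000000000010000000000000000800408008004085b0000005b0000000500000000100000b32a31c040cd80597ec9b11d"
)

# Elf32_Ehdr: 16-byte e_ident followed by 13 little-endian fields (52 bytes).
(e_ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
 e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx) = \
    struct.unpack_from("<16sHHIIIIIHHHHHH", data, 0)

if e_ident[:4] != b"\x7fELF":
    raise ValueError("not an ELF file")

ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object"}
print(ELF_TYPES.get(e_type, e_type), "machine:", e_machine, "entry:", hex(e_entry))

# Elf32_Phdr: p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz,
# p_flags, p_align (eight 32-bit words per entry).
PT_LOAD, PT_INTERP = 1, 3
segments = [
    struct.unpack_from("<8I", data, e_phoff + i * e_phentsize)
    for i in range(e_phnum)
]
for p_type, p_offset, p_vaddr, _, p_filesz, p_memsz, p_flags, _ in segments:
    kind = {PT_LOAD: "PT_LOAD", PT_INTERP: "PT_INTERP"}.get(p_type, p_type)
    print(kind, "offset:", hex(p_offset), "vaddr:", hex(p_vaddr), "flags:", p_flags)
```

This file has no section headers and no PT_INTERP entry, so a single PT_LOAD segment is all the kernel needs to run it.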


### Relocation

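Relocations are the entries used by the linking process, dynamic or not: each one tells the linker which location in the code or data must be patched once the final address of a symbol is known.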

### DWARF

• https://www.ibm.com/developerworks/aix/library/au-dwarf-debug-format/index.html
• http://www.dwarfstd.org/doc/Debugging%20using%20DWARF-2012.pdf
• https://stackoverflow.com/questions/5954140/dumping-c-structure-sizes-from-elf-object-file
• http://wiki.dwarfstd.org/index.php?title=DWARF_FAQ
• https://landley.net/kdocs/ols/2007/ols2007v2-pages-35-44.pdf