Encoding
Links
- Dark corners of Unicode
- Let’s Stop Ascribing Meaning to Code Points
- UTF-8 – “The most elegant hack” (2013)
- UTF-8 Everywhere (HN)
- Not everything is UTF-8 (2020) (Lobsters)
- Archie Markup Language (ArchieML) - Structured text format optimized for human writability.
- Unidecode - Lossy ASCII transliterations of Unicode text.
- BARE Message Encoding - Simple binary representation for structured application data. (Lobsters)
- Explaining text representation in computers (2020)
- Text Encoding: The Thing You Were Trying to Avoid (2020)
- Amazon Ion - Richly-typed, self-describing, hierarchical data serialization format offering interchangeable binary and text representations. (HN) (HN)
- Unicode In Five Minutes (2013)
- Unicode support. What does that actually mean? (2020)
- Fast UTF-8 validation (2020) (HN) (Code)
- UTF-8 Illustrator
- Coded Character Sets, History and Development (1980)
- Awesome Code Points - Curated list of Unicode characters that have interesting (and maybe not widely known) features or are awesome in some other way.
- Text rendering tests - Unicode’s test suite for text rendering engines.
- Executable PNGs (2020) (Lobsters) (HN)
- Unicode Proposal to Encode Subscripts/Superscripts for Mathematical Programming
- Emoji Under the Hood (HN)
- The history of UTF-8 as told by Rob Pike (Lobsters) (HN)
- Concise Encoding - Friendly data format for human and machine. Ad-hoc, secure, with 1:1 compatible twin binary and text formats and rich type support. (Code)
- Practical Reed-Solomon for Programmers (2021)
- Apache Avro - Data serialization system. (Web)
- Unicode sorting is hard & why browsers added special emoji matching to regexp (2021)
- Any Encoding, Ever (2021) (HN)
- casync - Content Addressable Data Synchronizer.
- ANSI Escape Codes (HN)
- Entropy coding in Oodle Data: Huffman coding (2021)
- Fun with Morse Code
- Latinendian vs Arabendian (2020)
- ICU - International Components for Unicode (Code)
- ruststep - STEP toolkit for Rust.
- Overview of Serialization Technologies (2018)
- Substrait - Cross-Language Serialization for Relational Algebra. (Code) (substrait-rs)
- OpenH264 - Open Source H.264 Codec.
- Unicode Normalization Forms: When ö ≠ ö (2021) (HN)
- Planus - Alternative flatbuffer implementation.
- TinyCBOR - Concise Binary Object Representation (CBOR) Library.
- Cheat sheets for the Portable Document Format
- Understanding UUIDs, ULIDs and String Representations (Lobsters) (HN)
- How UTF-8 works (2022) (HN) (Lobsters)
- What Every Programmer Absolutely, Positively Needs To Know About Encodings (2011) (HN)
- Why I invented “dash encoding”, a new encoding scheme for URL paths (2022) (Tweet)
- You Don't Know GIF – An analysis of a GIF file and some weird GIF features (2022)
- DeGauss - Avro schema compatibility checker.
- Hex: A Strategy Guide (HN)
- trrs - CLI tool to transform data between different encodings.
- Ask HN: Is there a tool to generate binary protocol figures out of a spec? (2022)
- Plain Text - Dylan Beattie (2021)
- Low Complexity Communication Codec (LC3)
- ltp - High performance, readable, and maintainable, in-place encoding format.
- Identity Crisis: Sequence v. UUID as Primary Key (Lobsters)
- Concise Encoding - Secure data format for a modern world. (HN)
- BIPF (Binary In-Place Format) Spec - Binary format designed for in-place reads (without parsing), with schemaless JSON-like semantics.
- New UUID Formats (2022) (HN)
- UTF8.XYZ - Quick web app for fetching Unicode characters without extra fluff. (Code)
- How to estimate disk space
- muon - Compact and simple binary object notation. (Lobsters) (Doc)
- Character encoding and UTF-8 (2022) (HN)
- Type of Barcodes and Their Usage (HN)
- Unicode Character Search - Search for Unicode Characters by name, codepoint or text. (Code)
- Understanding Big Data File Formats (2022)
- Free Lossless Audio Codec (FLAC)
- How QR codes work (2022) (HN)
- Lyra V2 – a better, faster, and more versatile speech codec (2022) (HN)
- ULIDs are a great replacement for UUIDs
- LXMF - Lightweight Extensible Message Format.
- VRS - File format optimized to record and play back streams of sensor data, such as images, audio samples, and any other discrete sensors (IMU, temperature, etc.), stored in per-device streams of timestamped records.
- "AVIF: Creating a new image format in the open" by Jon Bauman (2022)
- Awesome Unique ID
- Google explains why it's removing JPEG XL from the Chromium code base (2022) (Lobsters) (HN)
- Elements Of a Great Markup Language (2022) (HN) (Lobsters)
- The essence of Reed-Solomon coding (2022) (HN) (Lobsters)
- msgpack-tools - Command-line tools for converting between MessagePack and JSON.
- ADBC: Arrow Database Connectivity
- hext - Binary File Markup Language.
- MP4 file encoding explained visually (HN)
- GraphAr - Open source, standard data file format for graph data storage and retrieval.
- Quite OK Image is now my favorite asset format (2022) (HN)
- PA - Native storage format based on Arrow.
- rsbkb - Command line tools to encode/decode things.
- Hello, PNG (2023) (HN)
- Schemaboi - Serialization / deserialization format designed to fill a similar niche as Protobuf or JSON.
- Unicode Arrows
- A Safer High Performance AV1 Decoder (2023) (Lobsters)
- Image Codec Comparison (JXL vs. AVIF vs. WebP vs. JPG) (2023)
- ZebraPack - Data description language and serialization format.
- capnp-ts - Cap'n Proto serialization/RPC system for TypeScript & JavaScript.
- Ron - Rusty Object Notation. (HN)
- multicodec - Compact self-describing codecs. Save space by using predefined multicodec tables.
- Jesth - Next-level human-readable data serialization format.
- Packet Description Language (PDL) - Domain specific language for writing the definition of binary protocol packets.
- Descript Audio Codec - State-of-the-art audio codec with 90x compression factor. Supports 44.1 kHz mono/stereo audio.
- Computational complexity of texture encoding (2023)
- Unicode is harder than you think (2023) (HN)
- Cap'n Proto 1.0 (2023) (HN)
- Overwrite/Insert Difference Format
- BinaryPack - Binary JSON-like serialization with binary types.
- Remarshal - Convert between CBOR, JSON, MessagePack, TOML, and YAML.
- You Don't Need UUID (2021) (HN)