Character encoding

November 24, 2012

ASCII

  • A 7-bit encoding, covering the values 0 -> 127.
  • 32 -> 126 represent printable characters.
  • 0 -> 31 (and 127, DEL) represent control characters.
  • 128 -> 255 lie outside ASCII and were called OEM characters; many companies had their own ideas about how to use these values.
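The ranges above can be sketched as a small classifier. This is only an illustration; the function name `classify_byte` and its return labels are made up for this example.

```python
def classify_byte(value: int) -> str:
    """Label a single byte value according to the ASCII ranges."""
    if not 0 <= value <= 255:
        raise ValueError("a byte must be in the range 0..255")
    if value < 32 or value == 127:
        return "control"      # e.g. 10 is line feed, 127 is DEL
    if value < 127:
        return "printable"    # e.g. 65 is "A", 32 is space
    return "non-ascii"        # 128..255 needs the eighth bit


print(classify_byte(ord("A")))  # printable
print(classify_byte(10))        # control
print(classify_byte(200))       # non-ascii
```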

ANSI:

  • The lower 128 values (0 -> 127) are the same as ASCII.
  • The upper 128 values (128 -> 255) were divided into different “code pages”, each mapping those values to a different set of characters.
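To make the code page idea concrete, here is a small sketch that decodes the same byte under three code pages. The three codecs (`cp1252`, `cp437`, `cp1251`) ship with Python; the choice of the byte 0xE9 is just for illustration.

```python
raw = bytes([0xE9])  # one byte from the upper half (128..255)

# The same byte means a different character in each code page.
western = raw.decode("cp1252")   # Windows Western European
dos = raw.decode("cp437")        # original IBM PC (OEM) code page
cyrillic = raw.decode("cp1251")  # Windows Cyrillic

for name, char in [("cp1252", western), ("cp437", dos), ("cp1251", cyrillic)]:
    print(f"{name}: U+{ord(char):04X}")

# The lower half is identical everywhere, because it is plain ASCII.
ascii_same = b"A".decode("cp1252") == b"A".decode("cp437") == "A"
```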

Unicode:

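  • Code point: In Unicode, a letter maps to a code point, an abstract number (written like U+0041 for “A”) that says nothing yet about how it is stored in memory or on disk.
  • Encoding: how code points are turned into bytes. For multi-byte encodings such as UTF-16, the Unicode Byte Order Mark (U+FEFF) at the start of the data indicates whether the byte order is big-endian or little-endian.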
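A minimal sketch of the difference between a code point and its encoded bytes, using UTF-16 and the BOM constants from Python's standard `codecs` module. The variable names are only for illustration.

```python
import codecs

text = "A"
code_point = ord(text)  # the abstract number, U+0041

# The same code point, stored in the two possible byte orders.
big_endian = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
little_endian = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(f"U+{code_point:04X}")
print(big_endian.hex(" "))     # fe ff 00 41
print(little_endian.hex(" "))  # ff fe 41 00

# The generic "utf-16" decoder reads the BOM to pick the byte order.
decoded_big = big_endian.decode("utf-16")
decoded_little = little_endian.decode("utf-16")
```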

UTF-8: Every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, or 4 bytes. (The original design allowed sequences of up to 6 bytes, but RFC 3629 restricts UTF-8 to 4.)
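As a quick check of the variable-length behaviour, the sketch below encodes four sample characters and counts the bytes. The sample characters are arbitrary picks, one from each length class.

```python
samples = {
    "A": "\u0041",           # Latin capital A
    "e-acute": "\u00e9",     # Latin small e with acute
    "euro": "\u20ac",        # euro sign
    "emoji": "\U0001F600",   # grinning face
}

# Number of bytes each character needs in UTF-8.
lengths = {name: len(char.encode("utf-8")) for name, char in samples.items()}

for name, size in lengths.items():
    print(f"{name}: {size} byte(s)")

# Plain ASCII text is byte-for-byte identical in ASCII and UTF-8.
same_as_ascii = "hello".encode("utf-8") == "hello".encode("ascii")
```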

The most important thing:

  It does not make sense to have a string without knowing what encoding it uses. 
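A short sketch of what goes wrong when the encoding is guessed. The bytes are the same in every case; only the assumed encoding changes.

```python
original = "caf\u00e9"           # "café"
data = original.encode("utf-8")  # the bytes that get stored or sent

# Correct guess: the text round-trips.
right = data.decode("utf-8")

# Wrong guess: no error, just silently garbled text (mojibake).
wrong = data.decode("cp1252")

# Assuming plain ASCII fails outright on the bytes above 127.
try:
    data.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(repr(right), repr(wrong), ascii_failed)
```

The wrong guess is the more dangerous case, because nothing signals that the text has been corrupted.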
