Encoding notes I put together a while back.
■ UTF-8
- 1 to 4 bytes per character
Code range (hex)  Codes    Scalar value (binary)       UTF-8 (binary)                       Notes
000000-00007F     128      0zzzzzzz                    0zzzzzzz                             ASCII equivalence range; byte begins with 0
000080-0007FF     1920     00000yyy yyzzzzzz           110yyyyy 10zzzzzz                    First byte begins with 110, the following byte begins with 10
000800-00FFFF     63488    xxxxyyyy yyzzzzzz           1110xxxx 10yyyyyy 10zzzzzz           First byte begins with 1110, the following bytes begin with 10
010000-10FFFF     1048576  000wwwxx xxxxyyyy yyzzzzzz  11110www 10xxxxxx 10yyyyyy 10zzzzzz  First byte begins with 11110, the following bytes begin with 10
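The bit layout in the table can be followed by hand. Below is a minimal Python sketch, not a production encoder: the function name `utf8_encode` is made up for illustration, and the result is cross-checked against Python's built-in `str.encode("utf-8")`.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value by hand, following the table's bit layout."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        # Surrogate code points are not scalar values and have no UTF-8 form.
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:        # 0zzzzzzz
        return bytes([cp])
    if cp <= 0x7FF:       # 110yyyyy 10zzzzzz
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:      # 1110xxxx 10yyyyyy 10zzzzzz
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110www 10xxxxxx 10yyyyyy 10zzzzzz
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# One sample character per row of the table (1, 2, 3 and 4 bytes).
for ch in ("z", "\u00e9", "\u6c34", "\U0001D11E"):
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", " ".join(f"{b:02X}" for b in encoded))
```

Each branch corresponds to one row of the table: the leading byte carries the length marker, and every following byte carries six payload bits behind a `10` prefix.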
■ UTF-16/UCS-2
- Variable-length encoding for Unicode
- 16-bit code units (words)
- Surrogate pairs
- Used to encode characters above U+FFFF.
"水z𝄞" (water, z, G clef), UTF-16 encoded
Labeled encoding  Byte order               Byte sequence
UTF-16LE          little-endian            34 6C, 7A 00, 34 D8 1E DD
UTF-16BE          big-endian               6C 34, 00 7A, D8 34 DD 1E
UTF-16            little-endian, with BOM  FF FE, 34 6C, 7A 00, 34 D8 1E DD
UTF-16            big-endian, with BOM     FE FF, 6C 34, 00 7A, D8 34 DD 1E
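The byte sequences in the table can be reproduced step by step. The sketch below is only an illustration under the assumption of well-formed input (no lone surrogates in the string); the helper names `utf16_code_units` and `serialize` are hypothetical. It splits code points above U+FFFF into a surrogate pair, then writes the 16-bit units in either byte order.

```python
import struct
from typing import List


def utf16_code_units(text: str) -> List[int]:
    """Turn a string into UTF-16 code units, using surrogate pairs above U+FFFF."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            units.append(cp)                       # fits in one 16-bit unit
        else:
            offset = cp - 0x10000                  # 20-bit value
            units.append(0xD800 | (offset >> 10))  # high surrogate: top 10 bits
            units.append(0xDC00 | (offset & 0x3FF))  # low surrogate: low 10 bits
    return units


def serialize(units: List[int], little_endian: bool) -> bytes:
    """Write each 16-bit unit in the requested byte order (no BOM)."""
    fmt = ("<" if little_endian else ">") + "H" * len(units)
    return struct.pack(fmt, *units)


text = "\u6c34z\U0001D11E"  # water, z, G clef
units = utf16_code_units(text)
print([f"{u:04X}" for u in units])

# Cross-check against Python's own codecs.
assert serialize(units, little_endian=True) == text.encode("utf-16-le")
assert serialize(units, little_endian=False) == text.encode("utf-16-be")
```

The G clef (U+1D11E) is the only character here that needs two code units, which is why the example string is three characters long but eight bytes wide.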
■ UCS-2
- Nearly identical to UTF-16, but
- it has no surrogate pairs, so characters above U+FFFF are not supported
- Therefore every character is a fixed 16 bits
- The BOM (Byte Order Mark) tells
little endian from big endian.
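Both points can be checked in a few lines. This is a rough sketch with hypothetical helper names (`fits_in_ucs2`, `split_bom`). Python ships no dedicated UCS-2 codec as far as I know, so the UCS-2 check is done on code points directly.

```python
import codecs
from typing import Tuple


def fits_in_ucs2(text: str) -> bool:
    """True if every character fits in a single 16-bit unit (at most U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def split_bom(data: bytes) -> Tuple[str, bytes]:
    """Report the byte order announced by a leading BOM and return the rest.

    Caveat: FF FE is also the start of the UTF-32LE BOM (FF FE 00 00),
    so this check assumes the data is known to be 16-bit encoded.
    """
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "little-endian", data[2:]
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "big-endian", data[2:]
    return "unknown", data


print(fits_in_ucs2("\u6c34z"))        # only BMP characters
print(fits_in_ucs2("\U0001D11E"))     # G clef needs a surrogate pair in UTF-16

sample = codecs.BOM_UTF16_BE + "\u6c34z".encode("utf-16-be")
order, payload = split_bom(sample)
print(order, " ".join(f"{b:02X}" for b in payload))
```

Without a BOM the byte order has to come from somewhere else, such as an explicit label like UTF-16LE or UTF-16BE.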