  • Encoding
    computing 2013. 5. 24. 11:40



    Notes on encodings that I put together a while ago.


    ■ UTF-8

    - 1 to 4 bytes per character


    Code range (hex)   Codes     Scalar value (binary)         UTF-8 (binary)

    000000-00007F      128       0zzzzzzz                      0zzzzzzz
    000080-0007FF      1920      00000yyy yyzzzzzz             110yyyyy 10zzzzzz
    000800-00FFFF      63488     xxxxyyyy yyzzzzzz             1110xxxx 10yyyyyy 10zzzzzz
    010000-10FFFF      1048576   000wwwxx xxxxyyyy yyzzzzzz    11110www 10xxxxxx 10yyyyyy 10zzzzzz

    Notes

    - 000000-00007F: ASCII equivalence range; the byte begins with 0.
      Scalar: seven z. UTF-8: seven z.

    - 000080-0007FF: the first byte begins with 110, the following byte begins with 10.
      Scalar: three y; two y, six z. UTF-8: five y; six z.

    - 000800-00FFFF: the first byte begins with 1110, the following bytes begin with 10.
      Scalar: four x, four y; two y, six z. UTF-8: four x; six y; six z.

    - 010000-10FFFF: the first byte begins with 11110, the following bytes begin with 10.
      Scalar: three w, two x; four x, four y; two y, six z. UTF-8: three w; six x; six y; six z.
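
    The bit layout in the table can be turned into code fairly directly. The following is a minimal Python sketch, not a reference implementation: the function name utf8_encode is made up for this note, and rejecting the surrogate range D800-DFFF is my own assumption (the code counts in the table do not exclude it).

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point using the bit patterns from the table (hypothetical helper)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= cp <= 0xDFFF:
        # Assumption: surrogates are not valid scalar values, so refuse them.
        raise ValueError("surrogate code points cannot be encoded")
    if cp <= 0x7F:
        # 0zzzzzzz
        return bytes([cp])
    if cp <= 0x7FF:
        # 110yyyyy 10zzzzzz
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110xxxx 10yyyyyy 10zzzzzz
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110www 10xxxxxx 10yyyyyy 10zzzzzz
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# Compare against Python's built-in encoder for one character per row of the table.
for ch in ["z", "\u00e9", "\u6c34", "\U0001d11e"]:
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")

print(utf8_encode(0x6C34).hex())  # e6b0b4
```

    The loop checks one sample character from each of the four ranges against str.encode, which is an easy way to confirm that the shifts and masks match the table.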




    ■ UTF-16/UCS-2

    - Variable-length encoding for Unicode

    - 16-bit words (code units)

    - Surrogate pairs

       - Used to encode characters above U+FFFF.
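
    How a character above U+FFFF is split into a surrogate pair can be sketched in a few lines of Python. This is only an illustration under the usual UTF-16 rules (subtract 0x10000, then split the remaining 20 bits into two 10-bit halves); the function name to_surrogate_pair is hypothetical, and U+1D11E (G clef) is just a sample code point.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into (high, low) surrogates (hypothetical helper)."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 | (offset >> 10)     # top 10 bits    -> D800..DBFF
    low = 0xDC00 | (offset & 0x3FF)    # bottom 10 bits -> DC00..DFFF
    return high, low


high, low = to_surrogate_pair(0x1D11E)
print(f"{high:04X} {low:04X}")  # D834 DD1E
```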


    "水z𝄞" (water, z, G clef), UTF-16 encoded

    Labeled encoding   Byte order                 Byte sequence

    UTF-16LE           little-endian              34 6C, 7A 00, 34 D8 1E DD
    UTF-16BE           big-endian                 6C 34, 00 7A, D8 34 DD 1E
    UTF-16             little-endian, with BOM    FF FE, 34 6C, 7A 00, 34 D8 1E DD
    UTF-16             big-endian, with BOM       FE FF, 6C 34, 00 7A, D8 34 DD 1E
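
    The byte sequences above can be reproduced with Python's built-in codecs. This is a small check rather than anything authoritative; the BOM variants are built by hand with the constants from the codecs module, because the plain "utf-16" codec picks the machine's native byte order when encoding.

```python
import codecs

text = "\u6c34z\U0001d11e"  # water, z, G clef

le = text.encode("utf-16-le")  # no BOM, least significant byte first
be = text.encode("utf-16-be")  # no BOM, most significant byte first

assert le == bytes.fromhex("34 6C 7A 00 34 D8 1E DD")
assert be == bytes.fromhex("6C 34 00 7A D8 34 DD 1E")

# With a BOM in front, the generic "utf-16" decoder can work out the byte order itself.
le_with_bom = codecs.BOM_UTF16_LE + le  # FF FE ...
be_with_bom = codecs.BOM_UTF16_BE + be  # FE FF ...

assert le_with_bom.decode("utf-16") == text
assert be_with_bom.decode("utf-16") == text
```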




    ■ UCS-2

    - Almost identical to UTF-16, but

    - it has no surrogate pairs, so it cannot represent characters above U+FFFF

    - Therefore every character is a fixed 16 bits

    - The BOM (Byte Order Mark) distinguishes

      little-endian from big-endian.
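
    As a rough illustration of the two points above (BOM-based byte order and the fixed 16-bit limit), here is a short Python sketch. Both function names are made up for this note, and the BOM check assumes the data is already known to be 16-bit text, since FF FE is also how a UTF-32 little-endian BOM begins.

```python
import codecs
from typing import Optional


def detect_byte_order(data: bytes) -> Optional[str]:
    """Guess the byte order from a leading BOM (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "little-endian"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "big-endian"
    return None  # no BOM, so the byte order has to come from elsewhere


def ucs2_encode_be(text: str) -> bytes:
    """Encode as big-endian UCS-2: exactly two bytes per character, no surrogates."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # UCS-2 has no surrogate pairs, so this character cannot be represented.
            raise ValueError(f"U+{cp:04X} is not representable in UCS-2")
        out += cp.to_bytes(2, "big")
    return bytes(out)


print(detect_byte_order(bytes.fromhex("FF FE 34 6C")))  # little-endian
print(ucs2_encode_be("\u6c34z").hex())                  # 6c34007a
```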


