Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 7 of 7

Thread: which charset?

  1. #1
    Senior Coder low tech's Avatar
    Join Date
    Dec 2009
    Posts
    1,058
    Thanks
    195
    Thanked 130 Times in 130 Posts

    which charset?

    Hi all


    I've always used utf8 as my default charset since I use a lot of Chinese characters on my site, but a while back I read an article that we now have utf8mb4.


    Is utf8mb4 now the standard choice for proper UTF-8 support?


    Also there is unicode_ci and general_ci, does it matter which one I choose? I don't know the difference.


    As i'm redoing my site i'd like to think i'm starting right.

    LT
    Last edited by low tech; 07-17-2017 at 07:20 AM.
    0000

  2. #2
    Regular Coder Vege's Avatar
    Join Date
    Jan 2008
    Posts
    831
    Thanks
    21
    Thanked 113 Times in 112 Posts
    mysql utf8 charsets supports the first 3 bytes of utf8. Also called BMP (Basic Multilingual Plane).
    Most common characters for western people outside the BMP are emoji (��).
    But if you need chinese characters that are rarely used you might need the fourth byte.
    utf8mb4 is the correct charset for this.

    With innodb you might have problem with max index length for text field (197 iirc)

    Collation depends on the nationality of users and IMHO should not be set in the database level.
    It determines the correct ordering of results.
    In finnish (and swedish) the alphabets are abcdefghijklmnopqrstuvwxyz
    If database uses other collation than swedish then the query
    select name from user order by name would be returned in order:
    abcdefghijklmnopqrstuvwxyz
    But this is only for scandinavian countries. There are specific ones for all other countries
    In general the preferred collation is unicode_520_ci
    With country selection you could drop the collate definition and do it while selecting:
    SELECT name
    FROM user
    ORDER BY name COLLATE utf8mb4_swedish_ci;

    //TODO: add some editing

  3. Users who have thanked Vege for this post:

    low tech (07-18-2017)

  4. #3
    Senior Coder low tech's Avatar
    Join Date
    Dec 2009
    Posts
    1,058
    Thanks
    195
    Thanked 130 Times in 130 Posts
    Hi Vege

    Thank you.

    My site is mainly Chinese with some English, but the audience is Chinese.

    So im going with utf8mb4 for future proofing.


    My site is really just data (text with some html tags) stored in database.

    And data retrieval to build the page. At this time no text searching or input from users.



    But I have an issue with choosing length values for data types.

    For example, to store a long article (eg 2000 characters), I would choose TEXT as the datatype

    but what length should I enter?? What unit of measurement is it in??


    What happens to the space allocated if article is less than chosen length. Is it
    flexible like a varchar datatype?


    Appreciate any info on that.


    LT
    0000

  5. #4
    Regular Coder Vege's Avatar
    Join Date
    Jan 2008
    Posts
    831
    Thanks
    21
    Thanked 113 Times in 112 Posts
    Quote Originally Posted by low tech View Post
    Hi Vege

    Thank you.

    My site is mainly Chinese with some English, but the audience is Chinese.

    So im going with utf8mb4 for future proofing.


    My site is really just data (text with some html tags) stored in database.

    And data retrieval to build the page. At this time no text searching or input from users.



    But I have an issue with choosing length values for data types.

    For example, to store a long article (eg 2000 characters), I would choose TEXT as the datatype

    but what length should I enter?? What unit of measurement is it in??


    What happens to the space allocated if article is less than chosen length. Is it
    flexible like a varchar datatype?


    Appreciate any info on that.


    LT
    You can't restrict text fields with an extra (number) parameter. Varchar excluded.
    MySQL supports four TEXT field types (TINYTEXT, TEXT, MEDIUMTEXT and LONGTEXT)
    TINYTEXT is a string data type that can store up to to 255 characters.
    TEXT is a string data type that can store up to 65,535 characters. ...
    LONGTEXT is a string data type with a maximum length of 4,294,967,295 characters.
    Same applies for numbers. INT(2) has the same max value than INT(10).
    Text fields data requirements work like varchars.
    Keep in mind that storing multiple byte characters will lessen the amount of characters you can store in the table.
    For example varchar(255) allows only 64 four byte utf8 characters. Same applies to all text fields.

  6. #5
    Senior Coder low tech's Avatar
    Join Date
    Dec 2009
    Posts
    1,058
    Thanks
    195
    Thanked 130 Times in 130 Posts
    Quote Originally Posted by Vege View Post
    You can't restrict text fields with an extra (number) parameter. Varchar excluded.

    Same applies for numbers. INT(2) has the same max value than INT(10).
    Text fields data requirements work like varchars.
    Keep in mind that storing multiple byte characters will lessen the amount of characters you can store in the table.
    For example varchar(255) allows only 64 four byte utf8 characters. Same applies to all text fields.

    Hi Vege

    I hate setting up tables arrgh

    So are you saying I don't need to add a length value If I choose TEXT as a datatype?? (using xampp phpmyadmin btw)

    LT
    0000

  7. #6
    Regular Coder Vege's Avatar
    Join Date
    Jan 2008
    Posts
    831
    Thanks
    21
    Thanked 113 Times in 112 Posts
    Quote Originally Posted by low tech View Post
    Hi Vege

    I hate setting up tables arrgh

    So are you saying I don't need to add a length value If I choose TEXT as a datatype?? (using xampp phpmyadmin btw)

    LT
    Yes. Typically only with varchar you need to specify length.
    You can give them length like TEXT(20) but the 20 has no real effect. It's basically a helper for some archaic mysql viewers that would show that many characters when previewing a data row.

  8. Users who have thanked Vege for this post:

    low tech (07-18-2017)

  9. #7
    Senior Coder low tech's Avatar
    Join Date
    Dec 2009
    Posts
    1,058
    Thanks
    195
    Thanked 130 Times in 130 Posts
    Quote Originally Posted by Vege View Post
    Yes. Typically only with varchar you need to specify length.
    You can give them length like TEXT(20) but the 20 has no real effect. It's basically a helper for some archaic mysql viewers that would show that many characters when previewing a data row.

    @Vege

    Ah that makes sense now. Great.

    For all my effort reading on database design, I am totally crap at it. I simply just do not play with database enough to be confident. I've read loads, but for the most part its all theory.

    LT
    0000


 

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •