2. What is mbstring for?
3. Supports many character encodings, including Unicode
4. Supports some different national languages *
5. Character encoding conversion
6. Some Japanese specific functions / settings
7. Mbstring is NOT...
8. How to get mbstring
9. On most PHP servers it's already there so...
10. ...just switch it on!
11. Present and switched on out-of-the-box in Zend Server (CE and upwards)
12. If it isn't present then download it, but you shouldn't need to compile anything
13. Some key directives for mbstring
14. mbstring.language
15. See http://php.net/manual/en/mbstring.configuration.php
16. Easy peasy in Zend Server
17. Enough now, let's rock and roll!
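As a rough sketch of "switching it on", the script below checks that the extension is loaded and then sets the language and internal encoding at runtime. The choice of 'Japanese' and 'UTF-8' is only an assumption for illustration; the same settings can also be made through the directives on the configuration page linked above.

```php
<?php
// Sketch only: assumes the mbstring extension is available on this server.
if (!extension_loaded('mbstring')) {
    exit("mbstring is not loaded; enable it in php.ini\n");
}

// Runtime counterparts of the mbstring.language and
// mbstring.internal_encoding directives (values chosen for illustration).
mb_language('Japanese');
mb_internal_encoding('UTF-8');

// Called without arguments, the function reports the current setting.
$encoding = mb_internal_encoding();

echo "internal encoding: $encoding\n";
```

Setting the encoding once at the top of a script means later mb_*() calls can omit their encoding argument.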
18. For example, we all know strlen()
19. So let's have a look at mb_strlen()
20. mb_strlen()
21. More mb_strlen()
22. Even more mb_strlen()
23. Still rocking and rolling...
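The difference between strlen() and mb_strlen() can be sketched as follows. The sample string is an assumption for illustration, and the script file is assumed to be saved as UTF-8.

```php
<?php
// Assumes this file is saved as UTF-8 and mbstring is loaded.
$word = "日本語"; // three characters, nine bytes in UTF-8

$byteCount = strlen($word);             // counts bytes
$charCount = mb_strlen($word, 'UTF-8'); // counts characters

echo "strlen: $byteCount\n";    // strlen: 9
echo "mb_strlen: $charCount\n"; // mb_strlen: 3
```

Passing the encoding explicitly keeps the result independent of whatever the internal encoding happens to be.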
24. So let's have a look at mb_strpos()
25. mb_strpos()
26. More mb_strpos()
27. Wrapping up and moving on
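In the same spirit, here is a minimal sketch of strpos() versus mb_strpos(). The sample text is made up for illustration and the file is assumed to be UTF-8.

```php
<?php
// Assumes this file is saved as UTF-8 and mbstring is loaded.
$text   = "日本語のテキスト";
$needle = "テキスト";

$bytePos = strpos($text, $needle);                // offset in bytes
$charPos = mb_strpos($text, $needle, 0, 'UTF-8'); // offset in characters

// A character offset is what mb_substr() expects.
$tail = mb_substr($text, $charPos, null, 'UTF-8');

echo "strpos: $bytePos\n";    // strpos: 12
echo "mb_strpos: $charPos\n"; // mb_strpos: 4
echo "tail: $tail\n";         // tail: テキスト
```

Mixing the two families is a common source of bugs: a byte offset from strpos() passed to mb_substr() would cut the string in the wrong place.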
28. BE CAREFUL: the mbstring.func_overload directive can make calls to strlen() (and similar functions) automatically call mb_strlen() and friends. Note that this directive was deprecated in PHP 7.2 and removed in PHP 8.0.
29. Mbstring specific functions
30. mb_convert_encoding()
31. LOTS of supported encodings
32. ( http://php.net/manual/en/mbstring.supported-encodings.php )
33. The mbstring.detect_order directive comes into play here
34. mb_detect_encoding()
35. mb_detect_order()
36. More mb_detect_order()
37. Mbstring specific functions
38. mb_convert_encoding()
39. LOTS of supported encodings
40. ( http://php.net/manual/en/mbstring.supported-encodings.php )
41. The mbstring.detect_order directive comes into play here
42. mb_convert_encoding()
43. More mb_convert_encoding()
44. Regular expressions on multibyte strings
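A small sketch of conversion and detection working together. The sample string and the candidate list passed to mb_detect_encoding() are assumptions for illustration; detection is a best guess, so treat its result as a hint rather than a guarantee.

```php
<?php
// Assumes this file is saved as UTF-8 and mbstring is loaded.
$utf8 = "日本語";

// Convert UTF-8 to Shift_JIS: the same text now takes two bytes per character.
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');
$sjisBytes = strlen($sjis);

// Ask mbstring to guess the encoding from an explicit candidate list
// (strict mode). Without the list, the detect order setting is used.
$guess = mb_detect_encoding($sjis, ['ASCII', 'UTF-8', 'SJIS', 'EUC-JP'], true);

// Convert back again; these characters survive the round trip unchanged.
$roundTrip = mb_convert_encoding($sjis, 'UTF-8', 'SJIS');

echo "SJIS bytes: $sjisBytes\n";
echo "detected: " . var_export($guess, true) . "\n";
echo "round trip ok: " . var_export($roundTrip === $utf8, true) . "\n";
```

When the source encoding is known, passing it explicitly as the third argument to mb_convert_encoding() is safer than relying on detection.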
45. mb_ereg()
46. mb_ereg_match()
47. mb_ereg_replace()
48. and many more!
49. Note: PHP's regular preg_*() functions can also do UTF-8 with the /u pattern modifier!
50. mb_ereg()
51. More mb_ereg()
52. Summary of mbstring functions
53. Multibyte versions of regular string functions
54. Regex functions
55. Encoding detection / conversion
56. Japanese specific functions / settings
57. Other misc stuff
58. Putting it all together
59. BUT...
60. Don't forget your:
61. PHP script files (best to save each file in the same encoding as mbstring.internal_encoding)
62. Database
63. Output (i.e. probably HTML)
64. Input (i.e. form submissions etc.)
65. Multibyting your database
66. PostgreSQL: I'm no expert, but IIRC Postgres automagically understands and converts input / output character encodings
67. MySQL: you can choose a collation for the server, each schema, each table, each column!
68. A MySQL collation means charset + sort order (for example, CS means a case-sensitive sort order)
69. More multibyting your database
70. You'll need to do an SQL query of:
71. SET NAMES utf8 and / or SET CHARACTER SET utf8
72. After connecting and before reading / writing
73. (otherwise characters will become garbled)
74. Multibyting your output HTML
75. Content-Type: "text/html; charset=UTF-8"
76. i.e. header("Content-Type: text/html; charset=UTF-8");
77. Possible, but less desirable, to output it as a meta tag in the HTML:
78. <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
79. (or simply <meta charset="UTF-8"> for HTML5)
80. Don't forget lang=xy or xml:lang=xy where needed
81. Multibyting your input
82. Out-of-the-box, form data on a SJIS host page comes in as SJIS, form data on an EUC-JP host page comes in as EUC-JP, and so on
83. Or have I just been very lucky?
84. Look at the mbstring.http_input directive if you're struggling
85. That's all folks!
86. Previous examples of preg_match() failing will probably work with the /u pattern modifier (to enable UTF-8)
87. No mb version of trim() or preg_match_all()
88. Mbstring in action: http://twitter.com/japxlate http://mapanese.info
89. Questions welcome at [email protected]