Here is the underlying reason for the encoding parameter.

English speakers live in an easy world where the number of characters needed to write any kind of text or computer code is small enough to fit in an 8-bit byte (even in 7 bits, btw, but that's not the point). Therefore, 1 character = 1 byte, and everybody agrees on the meaning of each of those values (the 128 ASCII codes).

Many other languages, even those that use the same Latin alphabet, need all kinds of accented letters and special characters that do not exist in English. In addition, the special characters of all those languages together don't fit into 256 different byte values. Historically, every language community decided on its own encoding for the byte values above 127. latin-1, aka iso-8859-1, is one of those encodings, but as you may guess, not the only one. This doesn't scale well, of course, and won't work for languages that don't use the Latin alphabet and need far more than 256 different values.

In all modern languages, a character and a byte are two different things. (Read this sentence twice or more, and commit it to permanent memory.) The computer has no reliable way to "guess" the encoding of a byte stream (like a csv file) that you feed it for processing as text (= strings of characters).

Therefore, a function that reads files (I didn't watch the video, but the name of the function is explicit enough to understand its purpose) has to convert bytes (on the disk) into characters (in memory, using whatever internal representation the language happens to use). Conversely, when you have to write something to the disk or the network, which can only accept 8-bit bytes, you have to convert your characters back into bytes. Those conversions are performed using the particular encoding your file/byte stream/network protocol is using.

As a side note, you should consider getting rid of the 8859-* encodings and using Unicode, with the utf-8 encoding, as much as possible in new developments.
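To make the byte/character distinction concrete, here is a minimal Python sketch using only the standard library. The sample word, the file name `sample.csv`, and the choice of iso-8859-5 as the "other" encoding are illustrative assumptions, not something taken from the original question.

```python
import os
import tempfile

text = "caf\u00e9"  # the 4-character string 'café'

# Characters -> bytes: the result depends on the encoding you choose.
latin1_bytes = text.encode("latin-1")  # 1 byte per character here
utf8_bytes = text.encode("utf-8")      # the accented letter takes 2 bytes

# Bytes -> characters: the same bytes read differently under another encoding.
as_latin1 = latin1_bytes.decode("latin-1")
as_cyrillic = latin1_bytes.decode("iso-8859-5")  # valid, but a different last letter

# Decoding with the wrong encoding can also fail outright.
try:
    latin1_bytes.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Reading and writing files: pass the encoding explicitly.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")  # hypothetical file name
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write("name\n" + text + "\n")
    with open(path, "rb") as f:
        on_disk = f.read()    # raw bytes, no decoding applied
    with open(path, encoding="latin-1") as f:
        in_memory = f.read()  # characters, decoded from the bytes
```

Functions that read text files typically take an encoding argument for the same reason: they cannot know which of these interpretations you intend unless you tell them.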