Before jumping into links, read the text. This is still in development. Let's call it a playground.
First of all, Twitch has an API. The API client can do some stuff, but not enough. The class used for this is `TwitchApiClient(client_id, client_secret)`. See `main.py`, which I think should be `__main__.py`. Pls check.
The idea is to find a channel and subscribe to message events. The API client only needs `clientId` and `clientSecret`.
Then we need to send a message. This is where I started. Then I found out they use IRC chat.
Twitch is great, because they use IRC chat. For IRC chat we need an OAuth token. The IRC client should log in; that is all it can do for now. The IRC client needs a token with permission for stuff.
Starting with:
- the `chat:edit` permission
- run the `request_oauth_token.py` script. After the redirect to localhost, copy the token from the URL
- start telnet with the command `telnet`
- before connecting, you may want to enable seeing your input characters too
- then connect:
  - `open irc.chat.twitch.tv 6697` (you must be a genius to use this one via telnet)
  - `open irc.chat.twitch.tv 6667` (use this for telnet only)
  - `telnet url port`
Now you have an open TCP connection.
Every character you type is sent and decoded as `UTF-8` (I will write some about that below) on the server side at the moment you type it. So when you hit the delete key, that character is transmitted too. It will not delete any text, and the server will not do anything. I mean, sometimes it was saying something like `hey you!!!!`, which is nice, because you see there is a connection. Thanks for responding to bullshit.
- run `request_oauth_token.py` to get a token for IRC login
- load the `.env` file: `source .env`
- `chmod u+x request_oauth_token.py && python3 request_oauth_token.py`
- `PASS oauth:<yourToken>` then `\n`
- `NICK <username>` then `\n` via Enter
- `JOIN <channelname>`
- `PRIVMSG`
If you use the IRC client, you have TLS support: there is a wrapper for the TCP socket.
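A minimal sketch of that idea, not the code from this repo's `irc` package: wrap a plain TCP socket in TLS with Python's `ssl` module and send the login lines. The names `build_login_lines` and `login` are made up for illustration. Two assumptions: lines end with `\r\n` (the usual IRC line ending, while the telnet steps use `\n`), and the channel name gets a leading `#`.

```python
import socket
import ssl

HOST = "irc.chat.twitch.tv"  # host from the telnet steps
TLS_PORT = 6697              # the port that is hard to use via telnet


def build_login_lines(token: str, nick: str, channel: str) -> bytes:
    """Return the PASS/NICK/JOIN lines as UTF-8 bytes, one command per line."""
    if not channel.startswith("#"):
        channel = "#" + channel  # assumption: IRC channel names start with '#'
    lines = [f"PASS oauth:{token}", f"NICK {nick}", f"JOIN {channel}"]
    # assumption: "\r\n" as line ending, as usual for IRC
    return "".join(line + "\r\n" for line in lines).encode("utf-8")


def login(token: str, nick: str, channel: str) -> ssl.SSLSocket:
    """Open a TCP connection, wrap it in TLS and send the login lines."""
    context = ssl.create_default_context()
    raw = socket.create_connection((HOST, TLS_PORT), timeout=10)
    tls = context.wrap_socket(raw, server_hostname=HOST)
    tls.sendall(build_login_lines(token, nick, channel))
    return tls
```

Only `login` touches the network; `build_login_lines` can be checked offline, for example `build_login_lines("abc", "bot", "chan")`.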
The next thing to be added to the code is sending a message. That should be `PRIVMSG`.
Read up on the commands and on `PRIVMSG` with its result.
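A small sketch of what building such a message could look like. It is not in the repo yet, and `format_privmsg` is a made-up helper name. It assumes the common IRC form `PRIVMSG <target> :<text>` with a `\r\n` line ending.

```python
def format_privmsg(channel: str, text: str) -> bytes:
    """Build one PRIVMSG line as UTF-8 bytes (hypothetical helper)."""
    if not channel.startswith("#"):
        channel = "#" + channel  # assumption: channel names start with '#'
    # A line break inside the text would end the command early, so flatten it.
    text = text.replace("\r", " ").replace("\n", " ")
    return f"PRIVMSG {channel} :{text}\r\n".encode("utf-8")


line = format_privmsg("somechannel", "hello chat")
print(line)
```

The resulting bytes could be passed to `sendall` on the connected socket.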
Useful links:
3.3 Sending messages

UTF8 and encoding

Let's start with some bytes we have received. They are just random 0s and 1s. Encoding means which value maps to which letter. For example, `0100 0001` matches the letter `A` in `UTF-8`.

That is why you cannot use a bytes object as a string. You need to decode it, so you must know the format or have some crazy detection code.
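In Python this looks roughly like the sketch below: the byte `0100 0001` is not text until it is decoded, and mixing it with a `str` fails.

```python
# One received byte: 0100 0001
raw = bytes([0b01000001])

# A bytes object is not text yet; mixing it with str raises TypeError.
try:
    greeting = "letter: " + raw
except TypeError:
    greeting = None

# Decoding with the right encoding turns the byte into a character.
letter = raw.decode("utf-8")
print(raw, letter)
```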
`UTF-8` is a multibyte encoding. A normal character (`char`) would be the size of one byte (`0000 0000`, 8 bits). So the maximum number of possible combinations is 2^8 = 256. `0` is a number too (`0000 0000` is also a combination), so the highest value is 255 (but one-byte characters start with a `0` bit, so it is really 2^7; continue reading).
Anyway.
A normal character that uses only one byte will always start with a `0` bit. So `0101 1101` is a one-byte character (it is `]`).
The highest value for a one-byte char is therefore 2^7-1 = 127, because the first bit is used to say it is a normal char. `UTF-8` is also compatible (or whatever you want to call it) with ASCII: all ASCII chars map to the same byte in `UTF-8`, because ASCII always starts with a `0` bit.
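A quick check of the one-byte range, as a sketch: it looks up the example byte, compares ASCII and `UTF-8` bytes for a sample string, and shows where the second byte becomes necessary.

```python
value = 0b01011101                      # the example byte from the text
char = bytes([value]).decode("utf-8")   # which char is that?
print(value, char)

highest_one_byte = 2**7 - 1             # first bit is reserved, 7 bits remain

# ASCII text gives the same bytes in both encodings.
sample = "Hello"
same_bytes = sample.encode("ascii") == sample.encode("utf-8")

# 127 still fits in one byte, 128 already needs two.
len_127 = len(chr(127).encode("utf-8"))
len_128 = len(chr(128).encode("utf-8"))
print(highest_one_byte, same_bytes, len_127, len_128)
```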
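They simply use the first two bits for something else. They say:
- `0`: single byte char
- `10` → a following byte of a multibyte char (the remaining 6 bits carry value)
- `11` → marker for the start of a multibyte char (the number of leading `1` bits gives the length in bytes; the bits after the first `0` carry value)

So the logic is:
- `0` → normal
- `11` → multibyte start
  - `10` → read the remaining 6 bits
  - `0` or `11` → this is a new char

Have an example: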
mkdir /tmp/test && cd /tmp/test
echo "🙏" > example.txt
xxd example.txt
Output: 00000000: f09f 998f 0a .....
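The prefix rules from above can be sketched in Python and applied to the same five bytes. `classify` is a made-up helper; it only looks at the leading bits and does not validate the bytes.

```python
def classify(byte: int) -> str:
    """Name the role of one UTF-8 byte from its leading bits (no validation)."""
    if byte >> 7 == 0b0:
        return "single byte char"
    if byte >> 6 == 0b10:
        return "continuation byte"
    return "multibyte start"            # leading bits are 11


# "\U0001F64F" is the emoji from the echo command, "\n" is added by echo.
data = "\U0001F64F\n".encode("utf-8")
roles = [(f"{b:08b}", classify(b)) for b in data]
for bits, role in roles:
    print(bits, role)
```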
So how do we read bytes? This is the hex representation. Hex has base 16, but let's not discuss number systems. Hex uses the chars 0-9 and A-F, so after 9 you just continue counting with A.
With more digits:
[16^4][16^3][16^2][16^1][16^0]
[2^4][2^3][2^2][2^1][2^0]
An easy way to read this: `gnome-calculator` in programming mode.
What I wanted to say is that a byte is represented as 2 hex chars, because `F` = `1111` covers only 4 bits, and we need two of those for 8 bits.
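The same thing without a calculator, as a sketch: format one byte as two hex chars, split it into its two 4-bit halves, and turn the `xxd` bytes into binary.

```python
byte = 0b11110000
hex_digits = f"{byte:02x}"              # two hex chars per byte
high, low = byte >> 4, byte & 0x0F      # each hex char covers 4 bits

dump = "f09f 998f 0a"                   # hex part of the xxd output
data = bytes.fromhex(dump)              # whitespace between bytes is skipped
binary = " ".join(f"{b:08b}" for b in data)
print(hex_digits, high, low)
print(binary)
```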
So the output in binary is:
1111 0000 1001 1111 1001 1001 1000 1111 0000 1010
Translates to:
- `1111 0000` multibyte start
- `1001 1111` multibyte read next
- `1001 1001` multibyte read next
- `1000 1111` multibyte read next
- `0000 1010` new simple char

We have two chars. First one (`X` = marker bits, skipped when reading the value):
1111 0000 1001 1111 1001 1001 1000 1111
XXXX X000 XX01 1111 XX01 1001 XX00 1111
= 0 0001 1111 0110 0100 1111 = 1F64F

And `0000 1010` = 10 (dec) = LF (`\n`). Use `echo -n` to leave it out.
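The manual decoding can be sketched with a few bit operations. This only handles this one 4-byte case and assumes valid input; Python's own `decode` is used just to compare.

```python
data = bytes.fromhex("f09f998f")        # the four bytes of the first char

lead, *rest = data
code_point = lead & 0b00000111          # 11110xxx: keep the 3 value bits
for byte in rest:
    assert byte >> 6 == 0b10            # every following byte starts with 10
    code_point = (code_point << 6) | (byte & 0b00111111)

print(f"{code_point:X}")
print(chr(code_point) == data.decode("utf-8"))
```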
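My first thought: they could have used the first byte for value too. In fact they do: after the `11110` marker, the last three bits of the first byte belong to the value; they just happen to be `000` here.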
The value is then looked up in the font in use. Fonts can come in different formats, because you can save the shape of a letter or emoji in different ways. The font needs to have the character. But in general the encoding says how to understand that value. For example, the character `\r` says: move the cursor back to the start of the line. It will not be represented in a font.
So what is unicode:
The Unicode Standard is the universal character encoding standard for written characters and text.
And
Unicode characters are represented in one of three encoding forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8). The 8-bit, byte-oriented form, UTF-8, has been designed for ease of use with existing ASCII-based systems.
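To see the three forms side by side, here is a small sketch that encodes the same two characters with Python's built-in codecs. The `-be` variants are chosen here only to get a fixed byte order without a byte order mark.

```python
emoji = "\U0001F64F"                    # the emoji from the xxd example
forms = ("utf-8", "utf-16-be", "utf-32-be")

# Same character, three different byte sequences.
encoded = {name: emoji.encode(name).hex() for name in forms}
for name, hex_bytes in encoded.items():
    print(name, hex_bytes)

# A plain ASCII letter shows the size difference between the forms.
sizes = {name: len("A".encode(name)) for name in forms}
print(sizes)
```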
I think an encoding that by definition uses several encodings (and just looks things up in fonts) should not be called an encoding. More like:
Unicode defines shapes that represent a value encoded in utf-8, utf-16 or utf-32. Those shapes are defined by the used font.
References: