lun-4 (imagine gpt-3 but gayer)

a robot girl does writing and shit. reach at email [email protected]

Before actually starting this I want to say that programming language discourse is 99% of the time based on subjective experience. The writing here is my subjective experience dealing with Python. Python may be good for you, or bad for you, or whatever. Replacements beware.

Time changes, people change

I started learning Python back in approximately 2013, after attempting to learn C and failing. But I would only start writing Python for internet-facing production in 2017.

This rant has hindsight bias, for sure. But a good point for Python is that I didn't quite know how to program before learning it. Sure, I did write massive if chains in C, but nothing more complex, because as soon as I started reading the pointers chapter I would get confused.

Nowadays I'm not that confused by C pointers, or by the difference between the stack and the heap (thanks Zig), and I'm a different person, with different goals and needs for what I want my software to be, or what its principles are. But even though I'm a different person, I'm still locked in to the choices younger me made with Python, and now that I can put “5 years of experience writing Python” on my CV, I don't think this will go away any time soon.

A non-exhaustive list of things that keep me sad while writing Python

I could go on and enumerate things I'm not happy with and that I ultimately don't have the power to fix, so here we go:

  • Typing is sad: it's a mix of giving you static type checking without actually breaking the language. I mean, yes, typing is good, but in the end you can declare a function that takes an int, give it a string, and you'll only know something is wrong at execution time unless you have mypy. So you go install mypy, and then it doesn't find some library, or a library doesn't have typehints, or its maintainers decide to never want typehints, etc, etc, etc.
    • A subproblem of typing is using things like typing.Protocol. Which I tried to use, once. It type passed afterwards, but I feel dirty after writing something like that. I wish I could elaborate further on why, other than that general description of “something's wrong” while looking at it.
  • Packaging. God. Packaging. After going through requirements.txt, setup.py, poetry, dephell, and pipenv, I just gave up and settled on setup.py, since it's the only thing guaranteed to maybe work.
  • Not being happy with the “large” frameworks, AKA Django, means I have to go to things like Flask. But then there's asyncio, so I go to Sanic, then Sanic turns out to be its own hellish thing, and I'm now comfortable on Quart. And by using a niche of a niche of a niche web framework, I'm having to create my own patterns for, say, how to hold a database connection, by just putting app.db = ... inside @app.before_serving. You can be sure mypy does not catch anything about that at all. Love it.
  • A bug in a completely separate part of my software was caused by bumping dependencies without bumping the Python version on CI. I kept banging my head for hours until I gave up and updated from 3.7 to 3.9.
  • Endless language constructs seem to be the norm now. I don't like that we now have four different ways to merge dictionaries together, because the existing ones didn't have “a good syntax”. That feels like a really low bar for “new” language constructs.
    • This one is new in 3.9, so it's likely the last two entries are more rants than critiques, idk.
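To make the typing point concrete, here's a minimal sketch: the annotation below is valid, the call is wrong, and plain CPython will happily run it anyway; only mypy (if it cooperates) would flag it.

```python
def double(x: int) -> int:
    # annotations are not enforced at runtime
    return x * 2

# mypy would reject this call, but CPython runs it just fine:
# str * int is string repetition, so you silently get "haha"
print(double("ha"))
```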
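And for the dictionary-merging point, these are the four ways I'm talking about (the last one is the 3.9 addition, PEP 584):

```python
a = {"x": 1}
b = {"y": 2}

# 1. the original way: copy, then update in place
merged = a.copy()
merged.update(b)

# 2. keyword-argument splat (only works when keys are strings)
merged = dict(a, **b)

# 3. double-star unpacking (3.5+)
merged = {**a, **b}

# 4. the union operator (3.9+, the one with "a good syntax")
merged = a | b
```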

The future?

I don't know. None of those things look like they will ever be fixed. I'm seeing if I can write webshit with Zig, but that's all experimental. It might take much, much longer to “ship”, but I feel that by not having the ecosystem or an interpreter slowing me down I'd have a better development experience.

I can't deny that I now have years-long experience with Python and have to put something in my curriculum, though.

The tech industry makes me sad.

Or, “Monolith First, Complicated Thing Later, Also Plan Before Switching, And How Those Things Have Been Known For Ages, HELP”

I want to start this with Gall's law, from General Systemantics:

A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with a working simple system.

When I see large scale distributed systems being written on Kubernetes or whatever, I get the gut feeling Gall's law wasn't followed, though my sample size is very small: 1.5 systems. The 0.5 comes in because the other system I'm managing isn't on Kubernetes (yet; I'm sure it'll become that after a while).

I consider the jump from a monolith line of thought to a microservice line of thought, for any team or system, a very large one that must be scrutinized before actually being made. Microservices are no silver bullet, mostly because developers are taught to write monoliths their whole life, and then you push microservices on them, which come with a lot of specific tooling, and as years pass even more tooling is created, to the point where enumerating it all can be absolute sensory overload. Really.

My Bad Kubernetes Experience

In one of the systems I was developing, I wasn't actually messing with Kubernetes directly, but I had a very poor experience integrating with it regardless. The monolith app I developed lived inside a VPS, and had to push data to MongoDB and Redis nodes which were living inside the Kubernetes cluster. The person who managed the cluster told us to use kubectl port-forward, and sure, we tested it out, and it worked, at least at the start. But we all know how it goes from here.

While the system was operating at spike load, the cluster restarted itself, and kubectl port-forward lost connectivity with it. In my mental model it would reconnect to the cluster: it has authentication, and it doesn't need to hold any state. Yet it didn't, and our app remained unable to connect to Redis (with a very cryptic ECONNRESET in the app logs) for a long time while we attempted to debug it. A restart of the systemd unit that did the port-forward “fixed” it, after we asked the person operating the cluster how Redis was doing and they told us it was up.

After talking to a friend about that situation (a long time later), they told me port-forward is “a broken ssh tunnel that might not even work”, and “it's not meant to be used for anything other than testing”. So I could have used some other Kubernetes solution which wouldn't have made me write three paragraphs.
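The mental model I had — reconnect when the connection drops, since no state needs to be carried over — is at least something you can approximate on the application side. A minimal sketch with plain sockets (the function name and parameters are mine, not from any library):

```python
import socket
import time

def connect_with_retry(host, port, attempts=5, delay=0.5):
    # keep trying to establish a fresh TCP connection; this is the
    # stateless-reconnect behavior I (wrongly) expected from port-forward
    last_error = None
    for _ in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=2)
        except OSError as e:
            last_error = e
            time.sleep(delay)
    raise ConnectionError(f"could not reach {host}:{port}") from last_error
```

Of course, in the actual incident the tunnel itself was dead, so retries on our side would only have helped once someone restarted the systemd unit anyway.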

My Bad Microservices Experience

I think I can say this one was caused by bad management wanting to put “novel” things on teams that are very new, don't know what a container actually is, or don't know what distributed system design actually entails, down to the failure modes of such a system.

When you jump into microservices, and design with them in mind, you have to know that you're designing a distributed system, and while the units composing that system are small and simple and understandable, the whole can be much, much more complicated (see: natural systems in the whole world, including human-made ones!). You must also know the “fallacies of distributed computing”, which can sound somewhat cliché but are actually quite true. At this rate, if I'm developing a distributed system, I should print out the list and put it next to me and every other developer. I've seen them happen, even though the fallacies have been enumerated since... 1997? It's weird how we're still having issues that come from those fallacies in 2020. We should do better.

I think this boils down to “people should learn a little bit before jumping on the microservice hype”, and in most cases, Gall's law still applies. I believe things should be written first as a monolith, so the developers learn the business logic in an environment where function calls aren't remote everywhere. And if scaling is required, then proceed to investigate a distributed architecture, if you can actually scale there, or do something else.

Offloading to the database

Another data point from overall system design is that you shouldn't offload everything to the database, as in, make every microservice need it for every operation, because once things like autoscaling jump in and you have 10 times the number of microservices, your database will suffer 10 times the load. That might catch fire.

Though if the database does catch on fire AND is a managed database solution, you can blame the cloud company providing it instead of yourself (hrrrrm, Discord loved to do this in their postmortems, but nowadays they don't even give any postmortems. It's sad).

Disclaimer: I do not work for Discord, or with Discool. If I wanted to look cool I would call myself an “independent security researcher”, but in the end I just stare at Chromium's DevTools while Discord is open for an hour.

What are Intents?

Discord unveiled Gateway Intents a while ago. A PR on the Discord API Docs repository was created to help library developers start adding support for Intents, and it provides some answers to questions developers raised.

The general idea of Intents is that instead of every event being thrown at your gateway connection, only the events you actually want are sent, which gives huge wins in general network usage. Most bots get a ton of typing events while being unable to do anything about them, for example. There was a stop-gap measure for this via the “guild_subscriptions” field, but the general idea is that Intents are the way to go, as they're more generic.
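Mechanically, intents are a bitfield you send in the gateway IDENTIFY payload. The bit values below are what I remember from the docs around the time of writing, so treat them as an assumption and double-check against the API documentation:

```python
# gateway intent bits (values as documented circa 2020; verify before use)
GUILDS = 1 << 0
GUILD_MEMBERS = 1 << 1        # privileged
GUILD_PRESENCES = 1 << 8      # privileged
GUILD_MESSAGES = 1 << 9
GUILD_MESSAGE_TYPING = 1 << 11

# a bot that wants guild and message events, but no typing spam:
intents = GUILDS | GUILD_MESSAGES
```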

From all intents, two of them are considered “Privileged”: members and presence, providing the member list and user status (online/offline/what game they're playing) respectively.

I'm writing this on October 3rd, 2020 (with many drafts afterward). I did have a lot of time on my hands to study Intents, but I'm only doing it now and sharing the results, since I had to port my personal bot to them.

What's the actual deal/reasoning with Privileged Intents?

From the original blogpost:

We believe that whitelisting access to certain information at scale, as well as requiring verification to reach that scale, will be a big positive step towards combating bad actors and continuing to uphold the privacy and safety of Discord users.

To use Privileged Intents, your bot:

  • must be in fewer than 100 guilds, or
  • if it is in more than 100 guilds, you must be a “verified developer” with a “verified bot”. If you attempt to bring your bot back down to fewer than 100 guilds after that, you can't have the intents back.

The verification process involves filling out a form describing your bot's functionality, and giving a government-issued identification document via Stripe Identity. Depending on your country, your passport may be the only available option. You can check that information in the Stripe Identity documentation.

There is a grace period before bots are forced to be verified: October 7th, 2020. Bots beyond 100 guilds past that date will not be able to join any more guilds.

From the blogpost, and the timing surrounding it, I can safely say the creation of Privileged Intents is an answer to what I dub the “discool discourse”. They haven't said that publicly, and I can only base this on feeling alone.

Discool was a service that stored a lot of Discord user metadata. You could put a User ID in, and get the list of guilds they were in (of course, not all guilds, but a lot of public guilds had their data scraped).

The way they did it was by creating burner user accounts that joined those guilds, scraping all the data already available to users (the Discord client needs the member list, after all), and storing it in a database. Given enough time and effort, you have a pretty big privacy scandal regarding user data and Discord.

From my perspective, Privileged Intents were made so that developers who create a large-scale privacy scandal are legally bound to Discord, and can be taken to court, and etc, etc, etc.

Are the Privileged Intents a wrong way to protect data?

I think so, or at least, it's very flawed. As I said, Discool used user accounts; they never used bots in the first place, so any Intent business would never reach those bad actors. Of course, Discord has been implementing various kinds of extra user verification with “machine learning”, but the point is that anyone with a browser can start up Selenium, point it at Discord, and start scraping. Hell, given enough time and effort, someone could gcore the browser process to scrape data while still being classified as having “human sentient behavior”.

Even in the case of an actual privacy scandal on Discord, the damage still exists: yes, you're limited to 100 guilds, and it's technically fewer users, but it's the same kind of data, and you can still fuck up the lives of some people. I think Discord only cares about the numbers being smaller.

When giving data out to anyone, you should assume the worst, because you don't control their intent (pun unintended). Sure, requiring ID, and thereby offering the existing legal system as protection against misuse of user data, works with more than 0% efficacy, but I don't think it's a good solution.

In regards to giving out my own data, I'm not sure I can trust Discord's handling of it. I lost my trust after they explained that their data collection (the famous /api/science route, formerly /api/tracking, which got blocked by extensions to the point they had to rename it) still continues even if turned off in the client (you can prove this with devtools), and in their FAQ post they just said the events are dropped on the server:

The nitty gritty: when you turn the flag off, the events are sent, but we tell our servers to not store these events. They're dropped immediately — they're not stored or processed at all. The reason that we chose to do it this way is so that when you turn it off on your desktop app it also turns off automatically on your phone – and vice-versa. This allows us to keep things the same across all of our apps and clients, across upgrades.

Believe me when I say: since they already sync user configuration across clients, it would be easy to make the app not send tracking events at all upon seeing the already existing configuration option to disable tracking.

Learning to bypass limitations imposed by Privileged Intents

For context: the library I'm using for my bot, discord.py, ships a full command framework, discord.ext.commands, in which you can declare a command like so:

from discord.ext import commands

@commands.command()
async def some_command(ctx, argument1, argument2):
    ...

One specific feature of the framework is the ability to convert raw strings into useful objects. For example, if I wanted to receive a user, I could write

async def some_command(ctx, user: discord.User):
    ...

and discord.py will convert a mention, an ID, or a username back into a User object. This is possible because Discord gives all member data to the bot upon the bot's startup, which is fully dependent on the members privileged intent.

Without the members intent, the member lists are basically empty, and the feature doesn't work anymore.

Bringing the feature back

Discord's message events contain full user objects (which have the ID, name, avatar, discriminator, etc) and member objects (which contain roles, nickname, etc), since the client must be able to draw the UI containing that user.

Even though you don't have full member lists on startup, you can use that fact to start passively collecting data to re-create the feature. It works like this: on any message, store the user in persistent storage; when needed, look them up from that storage, and maybe put them in a cache for performance reasons.
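A minimal sketch of that passive collection, with a plain dict standing in for real persistent storage (the function names here are mine, not discord.py's):

```python
# (guild_id, user_id) -> user info; a real bot would persist this to disk
seen_members = {}

def on_message(guild_id, author_id, author_name):
    # any message proves its author is a member of that guild,
    # so record them for later lookups
    seen_members[(guild_id, author_id)] = {"name": author_name}

def find_member(guild_id, user_id):
    # the lookup the converter feature needs, rebuilt from passive data
    return seen_members.get((guild_id, user_id))
```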

Considering message events also contain the guild's ID, any message that has both an author and a guild ID tells you that author is a member of that guild, so you can still do the same thing Discool did. It is technically less data than before, since you're only collecting data about the people actually sending messages in the guild.

Keep in mind that the system messages created upon a user's join contain the same user object. They're messages like any other, and will reach any bot in the guild. You get the same effect from bots that create welcome messages for new users, as they usually put a mention to the user (“welcome @user to $guild!”), and since mentions are internally represented by putting the user ID in the message, bots can still scrape it up.

You can also use that same method to extract relationships from typing events, as they contain a guild ID and a user ID.

Future work

New Discord features

Recently I've seen work-in-progress inside the Discord client on a way for bots to declare their own commands (you can also see the Figma prototypes in the “The Future of Bots” blogpost), removing the need for bots to have every single message sent to them (currently bots check whether the message starts with a given prefix, etc).

It is definitely a better idea than the current state of things, but I'm not sure the boundary of “hey, your user data is now in the hands of the bot developer” is very well defined. Many users would still use the bot, even once, and have their user info stored in a database. I'm not using it with malicious intent, but if we're talking about the technical purpose of Privileged Intents, we must assume every bot is malicious, and design around that.

discord.py using Request Guild Members for the User and Member converters

See this issue. It would sidestep the problem of having to extract info out of messages, which would help my use case immensely.

I don't know if it'll be implemented, and I'm too anxious to open an issue on discord.py to use it, but it is a nice idea.

Closing thoughts

this music is fucking awesome

I tried, as an experiment/curious endeavour, to run osu!lazer (2020.717.0) on a NetBSD 9.0 QEMU/KVM guest on my machine. Here are the results:

  • Following the Linux Emulation NetBSD guides and installing the linux compatibility packages (suse_*) was good enough to continue.
  • FUSE is not included by default, but thankfully, the AppImage gives instructions to extracting itself into a folder.
  • An AppRun file exists in the folder. Trying to execute it would give “Failed to initialize CoreCLR, HRESULT: 0x80070008”. This is related to ulimit -n: the default is way lower than 2048.
  • Raise the kernel file descriptor limit using sysctl kern.maxfiles, then bring up the hard limit (ulimit -Hn), then the soft limit (ulimit -Sn). If I recall correctly, I bumped it up to 8000.
  • Some other failure happened while running AppRun, upon further inspection via ktrace/kdump, a sched_setaffinity syscall failed because of permissions.
  • Running it as root fixed that.
  • A traceback appeared regarding ICU libraries not being found, even though I have them installed.
  • You can edit osu!.runtimeconfig.json to disable that requirement. More here
  • osuTK kept talking about unsupported platforms. Upon further investigation, it was because no valid GPUs were found. A very cryptic error, for sure.
  • In the end, after some more discussion, we blamed the fact that I was running NetBSD as a KVM guest. virt-manager only provides QXL, VGA, and virtio, and virtio GPU drivers aren't in NetBSD 9.0.

Maybe I'll run NetBSD as a host on one of my machines and keep this experiment running. Until then, that's what I got.

To be continued..?

awawawaw

beep boop

welcome to writing moment

  • lunlunlunlunluna