So, it turns out, Techbro Rasputin’s FOSDEM talk isn’t happening because he is a donor; the organisers actually thought it was a good idea.
I guess I’m not sad anymore that I’ll miss the event this year.
@soatok I treasure my ignorance of cryptography because I believe that there are two safe amounts of crypto-knowledge: none at all, and enough to really know what you are doing.
Between them is a dangerous middle ground of thinking you know enough to do something sensible. Hypothetically, there may be a safe middle ground where you don't know much but do know how little you know. I'm not confident of being able to do that.
Your post made me worry, because even I know why at least some of those things are a bad idea.
@dansup Might be worth checking with the EU regulators. I would be pretty shocked if this did not violate the Digital Markets Act, and that has some fairly beefy financial penalties.
Someone should point out to Trump that if each Canadian province became a US state, there would be enough blue electoral college votes to ensure that his team never got in again, but if he sold the blue coastal states to Canada, he would have enough votes to pass a constitutional amendment making him dictator for life.
@ryanc I was going for -INF, but -0 is probably better.
@skinnylatte @Rycaut Germany and Israel also have such a scheme in their constitutions. Unfortunately, the reason I qualify is the same reason that there is no documentation that I qualify.
@ryanc @sophieschmieg @Lookatableflip I guess it’s not pure software, but anything running on a real computer has a hardware component. The randomness bit is pure software, using whatever it can from the environment as entropy sources, but none of the entropy sources alone (without a hardware random number generator) has enough entropy to be useful, and interrupt timings can sometimes be under attacker control (some fun attacks from the ‘90s involved sending packets at specific timing to influence the entropy collection).
@ryanc @Lookatableflip @sophieschmieg That depends a lot on the system. It will use all of the entropy sources available to the kernel. On modern systems, that typically includes at least one hardware entropy source. These are often a set of free-running ring oscillators, which then feed into some cryptographic hash function for whitening.
Without these, it will use much weaker things. The contents of the password file, the hash of the kernel binary, the cycle count at the time interrupts fire or devices are attached, and so on.
There have been some high-profile vulnerabilities from embedded devices that did things like generating private keys on first boot, with deterministic device attach time, and ended up with a handful of different private keys across the entire device fleet.
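To illustrate the whitening idea mentioned above, here is a toy sketch: several individually weak sources (the specific ones below are hypothetical stand-ins, not what any real kernel uses) get mixed through a cryptographic hash, so the output looks uniform even though it is only as unpredictable as its inputs.

```python
import hashlib
import os
import time

def gather_entropy_pool() -> bytes:
    """Toy sketch: mix several weak sources into a pool by hashing.

    None of these sources is strong on its own, which is exactly the
    problem on systems without a hardware entropy source.
    """
    sources = [
        time.perf_counter_ns().to_bytes(8, "little"),  # high-resolution timer
        os.getpid().to_bytes(4, "little"),             # process id
        repr(os.times()).encode(),                     # process timing info
    ]
    pool = hashlib.sha256()
    for s in sources:
        pool.update(s)  # the hash acts as the 'whitening' step
    return pool.digest()

digest = gather_entropy_pool()
print(len(digest))  # 32 bytes out, regardless of how little entropy went in
```

The hash makes the output *look* random even when the inputs are guessable, which is why whitening alone cannot rescue a system whose underlying sources are deterministic or attacker-influenced.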
@cesarb @tthbaltazar @mjg59 Don’t confuse on-package TPMs and fTPMs. fTPMs (which run on the main core in a privileged mode) are often vulnerable to side channels. Several of the recent transient execution attacks could leak fTPM secrets. I think most of these were patched by doing some aggressive state flushing on TPM events, but people keep finding new side channels. On-package TPMs, where the TPM is a separate component either in the same package or on the same die, are typically not vulnerable to these attacks. On the MS Surface laptops, there’s a Pluton subsystem on die, which runs the TPM stack. Pluton is one of the few Microsoft security products I have a lot of faith in (I worked with that team, they’re great): it stood up to over a decade of attacks from people with physical access and a strong financial incentive to break it.
A lot of the current hype around LLMs revolves around one core idea, which I blame on Star Trek:
Wouldn't it be cool if we could use natural language to control things?
The problem is that this is, at the fundamental level, a terrible idea.
There's a reason that mathematics doesn't use English. There's a reason that every professional field comes with its own flavour of jargon. There's a reason that contracts are written in legalese, not plain natural language. Natural language is really bad at being unambiguous.
When I was a small child, I thought that a mature civilisation would evolve two languages. A language of poetry, that was rich in metaphor and delighted in ambiguity, and a language of science that required more detail and actively avoided ambiguity. The latter would have no homophones, no homonyms, unambiguous grammar, and so on.
Programming languages, including the ad-hoc programming languages that we refer to as 'user interfaces', are all attempts to build languages like the latter. They allow the user to unambiguously express intent so that it can be carried out. Natural languages are not designed and end up being examples of the former.
When I interact with a tool, I want it to do what I tell it. If I am willing to restrict my use of natural language to a clear and unambiguous subset, I have defined a language that is easy for deterministic parsers to understand with a fraction of the energy requirement of a language model. If I am not, then I am expressing myself ambiguously and no amount of processing can possibly remove the ambiguity that is intrinsic in the source, except a complete, fully synchronised, model of my own mind that knows what I meant (and not what some other person saying the same thing at the same time might have meant).
The hard part of programming is not writing things in some language's syntax, it's expressing the problem in a way that lacks ambiguity. LLMs don't help here, they pick an arbitrary, nondeterministic, option for the ambiguous cases. In C, compilers do this for undefined behaviour and it is widely regarded as a disaster. LLMs are built entirely out of undefined behaviour.
There are use cases where getting it wrong is fine. Choosing a radio station or album to listen to while driving, for example. It is far better to sometimes listen to the wrong thing than to take your attention away from the road and interact with a richer UI for ten seconds. In situations where your hands are unavailable (for example, controlling non-critical equipment while performing surgery, or cooking), a natural-language interface is better than no interface. It's rarely, if ever, the best.
@chris It’s going to take 8 hours, which means overnight is about the only option. At the speed of the Shinkansen, it would take a little over three hours which, if you include time waiting at the airports and getting to and from out-of-town airports, would be much faster than flying.
I don’t think either continent ‘gets it’ when it comes to trains, but one is doing much worse.
For reference, the Shinkansen is now over half a century old. This is not new exciting technology, this is something that has been deployed, at scale, for decades and just needs political will and money. The 320 km/h variants have been running for around 40 years. The speed record of 443 km/h is a bit newer, but was set by trains of a type that ran commercial routes. The maglev versions, which could do this trip in under two hours, are over twenty years old but are only test tracks, not something that’s a ‘just buy it’ option.
@zebratale @carbontwelve Calculators do make mistakes. Most pocket calculators do arithmetic in binary and so propagate errors when converting decimal to binary floating point, for example not being able to represent 0.1 accurately. They use floating point to approximate rationals, so they collect rounding errors for things like 1/3.
The difference is that you can create a mental model of how they fail and make sure that the inaccuracies are acceptable within your problem domain. You cannot do this with LLMs. They will fail in exciting and surprising ways. And those failure modes will change significantly across minor revisions.
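Both failure modes described above are easy to demonstrate in any language that uses IEEE 754 binary floating point; a short Python sketch:

```python
# 0.1 has no finite binary representation, so the stored value is an
# approximation and repeated addition drifts away from the decimal answer.
print(0.1 + 0.1 + 0.1 == 0.3)   # False
print(f"{0.1:.20f}")            # 0.10000000000000000555

# 1/3 is rounded to the nearest representable double, so all later
# arithmetic works with an approximation of the rational value.
third = 1.0 / 3.0
print(f"{third:.20f}")          # 0.33333333333333331483
```

This is the sense in which the failures are *predictable*: the rounding rules are specified, so you can bound the error for your problem domain, which is exactly what you cannot do with an LLM.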
@carbontwelve I used machine learning in my PhD. The use case there was data prefetching. This was an ideal task for ML, because the benefits of a correct answer were high and the cost of an incorrect answer was low. In the worst case, your prefetching evicts something from cache that you need later, but a 60% accuracy in predictions is a big overall improvement.
Programming is the opposite. The benefits of being able to generate correct code faster 80% of the time are small but the costs of generating incorrect code even 1% of the time are high. The entire shift-left movement is about finding and preventing bugs earlier.
I finally turned off GitHub Copilot yesterday. I’ve been using it for about a year on the ‘free for open-source maintainers’ tier. I was skeptical but didn’t want to dismiss it without a fair trial.
It has cost me more time than it has saved. It lets me type faster, which has been useful when writing tests where I’m testing a variety of permutations of an API to check error handling for all of the conditions.
I can recall three places where it has introduced bugs that took me more time to debug than the total time saving:
The first was something that initially impressed me. I pasted the prose description of how to communicate with an Ethernet MAC into a comment and then wrote some method prototypes. It autocompleted the bodies. All very plausible looking. Only it managed to flip a bit in the MDIO read and write register commands. MDIO is basically a multiplexing system. You have two device registers exposed, one sets the command (read or write a specific internal register) and the other is the value. It got the read and write the wrong way around, so when I thought I was writing a value, I was actually reading. When I thought I was reading, I was actually seeing the value in the last register I thought I had written. It took two of us over a day to debug this. The fix was simple, but the bug was in the middle of correct-looking code. If I’d manually transcribed the command from the data sheet, I would not have got this wrong because I’d have triple checked it.
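To show the shape of the bug, here is a sketch of how an MDIO command word might be packed. The field positions are made up for illustration (real MACs document their own layouts), and the opcode values are loosely based on Clause 22 framing; the point is that swapping the two opcode constants still produces valid-looking commands.

```python
# Hypothetical MDIO command-register layout (illustrative only):
# bits 0-4 = register number, bits 5-9 = PHY address, bits 10-11 = opcode.
PHY_ADDR_SHIFT = 5
OP_SHIFT = 10
OP_READ = 0b10   # in Clause 22 framing, opcode 10 is a read...
OP_WRITE = 0b01  # ...and 01 is a write; swap these two constants and
                 # every 'write' silently becomes a read, as described above.

def mdio_command(op: int, phy_addr: int, reg: int) -> int:
    """Pack an MDIO command word (hypothetical layout)."""
    return (op << OP_SHIFT) | (phy_addr << PHY_ADDR_SHIFT) | reg

read_cmd = mdio_command(OP_READ, phy_addr=1, reg=2)
write_cmd = mdio_command(OP_WRITE, phy_addr=1, reg=2)
# The two words differ only in the opcode bits, so code with the
# constants swapped looks completely plausible when read back.
print(read_cmd, write_cmd)
```

Nothing in the surrounding code looks wrong when the two constants are exchanged, which is why transcribing such values straight from the data sheet (and triple-checking them) beats plausible-looking autocompletion.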
In another case, it had inverted the condition in an if statement inside an error-handling path. The error handling was a rare case and was asymmetric. Hitting the if case when you wanted the else case was okay but the converse was not. Lots of debugging. I learned from this to read the generated code more carefully, but that increased cognitive load and eliminated most of the benefit. Typing code is not the bottleneck, and if I have to think about what I want and then read carefully to check it really is what I want, I am slower.
Most recently, I was writing a simple binary search and insertion-deletion operations for a sorted array. I assumed that this was something that had hundreds of examples in the training data and so would be fine. It had all sorts of corner-case bugs. I eventually gave up fixing them and rewrote the code from scratch.
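For reference, a minimal version of those routines (this is the standard lower-bound formulation, not the code the autocomplete produced), with the stdlib's `bisect` used for insertion:

```python
import bisect  # the stdlib already gets the corner cases right

def binary_search(arr: list, key) -> int:
    """Return the index of key in sorted arr, or -1 if absent."""
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(arr) and arr[lo] == key else -1

def sorted_insert(arr: list, key) -> None:
    """Insert key into sorted arr, keeping it sorted."""
    arr.insert(bisect.bisect_left(arr, key), key)

def sorted_delete(arr: list, key) -> None:
    """Remove one occurrence of key from sorted arr, if present."""
    i = binary_search(arr, key)
    if i >= 0:
        del arr[i]

xs = [1, 3, 5, 7]
sorted_insert(xs, 4)
print(xs)                    # [1, 3, 4, 5, 7]
print(binary_search(xs, 5))  # 3
sorted_delete(xs, 3)
print(xs)                    # [1, 4, 5, 7]
```

The corner cases that bite (empty array, key smaller or larger than everything, duplicates) all come down to the `lo < hi` / half-open-interval invariant, which is exactly the kind of detail that looks right in generated code until it isn't.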
Last week I did some work on a remote machine where I hadn’t set up Copilot and I felt much more productive. Autocomplete was either correct or not present, so I was spending more time thinking about what to write. I don’t entirely trust this kind of subjective judgement, but it was a data point. Around the same time I wrote some code without clangd set up and that really hurt. It turns out I really rely on AST-aware completion to explore APIs. I had to look up more things in the documentation. Copilot was never good for this because it would just bullshit APIs, so something showing up in autocomplete didn’t mean it was real. This would be improved by using a feedback system to require autocomplete outputs to type check, but then they would take much longer to create (probably at least a 10x increase in LLM compute time) and wouldn’t complete fragments, so I don’t see a good path to being able to do this without tight coupling to the LSP server and possibly not even then.
Yesterday I was writing bits of the CHERIoT Programmers’ Guide and it kept autocompleting text in a different writing style, some of which was obviously plagiarised (when I’m describing precisely how to implement a specific, and not very common, lock type with a futex and the autocomplete is a paragraph of text with a lot of detail, I’m confident you don’t have more than one or two examples of that in the training set). It was distracting and annoying. I wrote much faster after turning it off.
So, after giving it a fair try, I have concluded that it is both a net decrease in productivity and probably an increase in legal liability.
Discussions I am not interested in having:
The one place Copilot was vaguely useful was hinting at missing abstractions (if it can autocomplete big chunks then my APIs required too much boilerplate and needed better abstractions). The place I thought it might be useful was spotting inconsistent API names and parameter orders but it was actually very bad at this (presumably because of the way it tokenises identifiers?). With a load of examples with consistent names, it would suggest things that didn't match the convention. After using three APIs that all passed the same parameters in the same order, it would suggest flipping the order for the fourth.
That's all I'm trying to say with the original post: I'm trying to get across to a mostly non-technical audience why a network being on many servers is preferable to putting all your eggs in one basket.
But that's my point. To a non-technical audience, it doesn't matter. To a non-technical audience, their instance going down for a federated service, or their availability zone going down for a centralised service are indistinguishable. And 'Twitter is down' is an easier thing to explain than 'your bit of the Fediverse is down, but the rest of it is fine'.
There are a lot of benefits from federation. Being able to have a second source if one provider has terms you don't like, for example (though the fact that ActivityPub requires cooperation from the original instance to transfer your followers is a potential problem).
Federation and fault tolerance are orthogonal problems. You can build a fault-tolerant distributed system. You can build a federated system with a load of single points of failure. There are enough good reasons to prefer a federated system without claiming this one, which is a real stretch.
That there can still be problems (you're 100% right: there can) doesn't mean creating a situation where one affects everyone is better.
It isn’t better or worse. For most users, the thing that matters is the amount of downtime. If power goes out in each town in a country for a different week, is that better or worse than if the power goes out in the entire country for ten minutes and then comes back? Aside from the impossibility of doing a cold start of most electricity grids, I would expect most people to be happier in the latter case.
The case with federation is more complex because two people can’t communicate if either of their endpoints is broken, so the failure modes are difficult to reason about individually.
Even though the aggregate network may never go down, in the same way that email is never down for everyone, that doesn’t really matter to most users. The thing that matters is how much of the downtime affects them.
Being able to use a backup instance is not really a solution, because most Fediverse software doesn’t allow you to synchronise feeds across instances, so you can’t just fail over. Email is slightly better in that you can often send emails from an alternative server if your primary is down but you generally can’t receive them.
It is possible to build replication, transparent fail-over, and so on in a federated system, but ActivityPub doesn’t try to do any of this.
Psychologically there’s also a difference. When something is down for everyone, there’s shared commiseration; when it’s down just for you, that can be harder, especially when you’re the one using the unusual thing. When a thing that ‘everyone’ uses is working and the weird thing you use is down, people are less sympathetic than when the thing both of you use is down.
@FediTips If the instance that I’m using goes down, it doesn’t really matter to me that only the bit of the Fediverse that I use is down and not the rest. The surprising thing is that small instances run by volunteers seem to have less downtime than centralised systems run by teams of well-paid SREs.
@n8chz @macberg @MartyFouts @DJGummikuh @GossiTheDog The first post is self promotional but boosts are just advertising. I think the main difference currently is that there’s no money (or other tangible incentive) changing hands.
That’s not guaranteed though. If someone collects a hundred thousand followers on Mastodon, companies are likely to start asking them to boost posts in exchange for enticements. The only differences between that and something like LinkedIn are that the reach is limited to people who follow the big account and the money goes to the person who did the boost (who is, at least, explicitly linking their reputation to it) and not the platform.
Nothing in the Fediverse is intrinsically immune to advertising, though being able to flag accounts as spam and have them blocked (and possibly their instances if they fill up with marketing drones) may help.