**Session Date/Time:** 18 Mar 2026 06:00

This transcript is for the Technology Deep Dive on Secure Routing session.

### **Chair Slides**

[View Presentation](https://datatracker.ietf.org/meeting/125/materials/slides-125-tdd-sessa-chair-slides-00)

**Warren Kumari:** Hello everybody. Is this mic working? Hello? Mic? The mic works for me, but I don't think it’s working in the room.

**Remote Speaker:** Oh, hello, the mic is working. I'm assuming you're remote because it's not working in the room. Could somebody remote ask to say something and I will check if it is just in the room? Sure. We can’t hear them. Can you hear us?

**Remote Speaker:** Hello! Can you hear remote?

**Warren Kumari:** Yes.

**Remote Speaker:** Yeah, that’s... Warren, can you hear remote in room?

**Warren Kumari:** I can hear you in the room.

**Remote Speaker:** Testing, testing, testing, testing. Testing one, two, three. I don't think you have audio in the room. Warren? I'm gonna reconnect to... Yep. No audio in the room, I think. Yeah.

**Warren Kumari:** Testing, testing, testing. One, two, three. Actually, for me, echo. The mics, I think, are on, but the speaker set is off. Do, re, mi, fa, so, la, ti... Do, re, mi... No, no audio. Would somebody who's remote mind asking to try and talk just so we can see if we can hear from the remote people? Oh, there we go! It's all fixed now. Yay, technology! Woo!

**Remote Speaker:** Hey, Warren, can you hear me?

**Warren Kumari:** Yep, I can hear you now. I think that the speakers were off for the... Testing... Oh, this one is... Yep, it's quite... Alrighty. So, hello everybody. This is going to be the Technology Deep Dive on Secure Routing. Hopefully, by now, everybody has seen the IETF Note Well. I encourage you to note this well. I am not going to try and interpret it for you because I am not a lawyer, and I am certainly not your lawyer. This also has the mention of the Code of Conduct stuff, which largely is, in the words of Bill S.
Preston, Esquire, "Be excellent to each other." Please read the rest of it, though, because as I said, I am not going to interpret this for you. Probably by now everybody knows how to use MeetEcho and the meeting stuff. A quick note: if you are trying to do anything like present, which I don't think anyone is going to be doing from here, if you turn off your VPN, that would probably help, or at least make sure that your VPN is not going all the way far away and then coming back. This is a quick overview of what we're going to be talking about today. This is a Technology Deep Dive session, which in general is a way for people to learn about sort of cross-area stuff, often about technologies that people will use as sort of base technologies to build on in the future. This one's going to be on Routing Security, and the agenda looks like this. First up, we have Mr. Geoff. Oh, Geoff sounds surprised. There we go. Share slides. Should be the background... Background. That's you. And I will add the slide clicker.

---

### **Background to Routing Security and Why It is So Hard**

[View Presentation](https://datatracker.ietf.org/meeting/125/materials/slides-125-tdd-sessa-background-to-routing-security-and-why-it-is-so-hard-00)

**Geoff Huston:** Hi, I'm Geoff Huston. Normally, I would sit there and say listening to me for half an hour would be, you know, anyone's definition of deadly boring. And realistically, because this is going to be highly opinionated, I would encourage feedback, shouting, and anything else you want to throw. I still encourage feedback, shouting, and anything else you want to throw, but you've got to go to a microphone to do it. Neh. But if you feel so inclined, jump up and scream, don't wait till the end, because, you know, don't wait till the end. I also should at least say why I got involved in this, because I'm here by accident. When I started down this journey, I came out of an operation in an ISP, and oddly enough, routing wasn't the problem.
The problem was that a whole bunch of customers would come to us with IP address prefixes saying, "They're mine, truly, ruly. Please route it for me." And you sit there and go, "Really? How do I know they're yours?" "Oh, well, you can go off to APNIC or somewhere else and get some ASCII bullshit, and that ASCII will say it’s mine. So obviously it’s me." And you kind of go, "Anyone can fake that stuff. Why is it your address?" And the answer was, "I don't know, it's just mine." And as a registry, we kind of felt uncomfortable about that, and rightly so, because how do I know it's your address? And the ISP side, these poor ISPs were sitting there going, "Two people are saying it’s their address. Why should we be the judge? Can’t, it’s not our job." We expected more. And the pressure was kind of put at that point, going, "It would be really good if we used, I don't know, some digital signature stuff, and you demonstrated that it was yours because of something you had, a private key, that you signed something that only you could sign it. I know that it was you, and you can't deny it." This is not a routing statement. It has nothing to do with routing. It's actually a statement about the addressing infrastructure and the fact that on the internet, addresses didn't come from providers per se, like in the old telephone network. They came from customers, they came from campuses, they came from enterprises. Folk had addresses and they brought them to the factory that routed them, the ISP, and that connection was really quite abusable and got abused a lot. And we were searching for a better answer. So I want to go through how we got to routing, which is a bit of a leap from what was originally, at least in my head, a problem with address registries and not really connected to routing at all. So, back into this deep dive. Some problems are hard, some problems are easy and relatively solvable. Routing is a hard problem. And the best in the world stuff up consistently. 
So every routing discussion starts with the same old BGP porn clips. And this is the classic. Some day in years gone by, Pakistan Telecom went and did some more specifics on YouTube and took out YouTube almost globally. Whoop-de-do, it was years ago. But we always repeat it. More and more, there’s websites and all this stuff, you know. Google went down after mishaps through here. Who’s that? That’s out of focus for me. BGP leak in Brazil? Brazil. Brazil, yeah, Brazil. Look, everyone has these problems. Is routing hard, or is configuring routers hard? It’s configuring routers. Why? Because the folk you put on the operations desk are probably your cheapest people in the company, and they're just following recipes, and sometimes they just don't look. And, you know, stuff happens consistently. Our tools get better. This is BGPmon with a movie of how some hijack happened. It's disgustingly common. And, you know, everyone does it. Sometimes if you're really lucky, you can bring down huge amounts of the internet. Other times you only bring down an island or two. This is Telstra's little mishap that basically took Australia out for a while. It happens. And the issue is, we're not getting any smarter, we're not getting any better. The tools aren't any better. It's the same thing 30 years later. And you sit there and go, "Are we not getting any cleverer? Are we no smarter than we were 30 years ago, or is something else going on?" And then you kind of think, well, surely after this much experience, we would learn. We would understand why. We would be able to understand the interactions between technology and the market. We would understand the economics and motivations behind the uptake of technology, and we would have a better view of trying to develop technology to answer specific problems and deploy it. We'd be good at this. We could demonstrate how good we are with IPv6. That was a joke.
And it really does show that some problems are intractably hard for very, very perverse and different reasons. We never had to think about V4. We really didn't. Folk wanted it so badly, they put up with anything, and they were snatching it out of the IETF faster than we could write RFCs. In fact, the time through the 1990s was a weird time where it was just a runaway success. So when we developed V6, we naturally assumed that everything we touched was golden and as soon as we developed this, instantly folk wanted it. No. With V6, we've found almost the complete opposite. You kind of go, "Well, the changes are so slight, what's your problem, dudes? Why aren't we all running V6 30 years later?" And the answer is, oh, it's way more complex than that. It's way more complex. And quite frankly, there's a bunch of folk saying, "That doesn't look so bad for my customers. They don't seem to mind. I don't seem to feel any pressure. Why should I spend money on something that customers aren't going to pay me for?" They're not going to move. And of course, the internet is not a command economy. It's an economy of peer autonomous networks: each of you is running a separate business, each of you makes your own decisions. And because of that, each of you needs to feel that motivational pressure. And sometimes you just don't. Other times you do. India, V6 success. US, meh, some do, some don't. Nothing bad about it, it’s just reality. So, let's go back to routing. Those disaster porn slides. Did Pakistan Telecom really have a problem with YouTube globally? Unlikely. They were just implementing a local instruction. It was along the lines of "let's not allow local people to see YouTube," and it escaped. Generally, most of those slides are all about unintended consequences. So, most of these routing incidents are not because the evil people are so much better at being bad than the rest of us are good at defending it. Not at all. Because most of the time we just stuff it up.
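The "more specifics" in the Pakistan Telecom story work because routers forward on the longest matching prefix. The following is a minimal sketch of that selection rule, assuming a toy routing table; the function name `best_route`, the documentation prefixes (RFC 5737) and the documentation AS numbers (RFC 5398) are illustrative assumptions, not data from the incident.

```python
import ipaddress


def best_route(routes, destination):
    """Return the (prefix, origin) entry with the longest prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    candidates = [(net, origin) for net, origin in routes if addr in net]
    if not candidates:
        return None
    # Longest-prefix match: the most specific covering prefix wins.
    return max(candidates, key=lambda item: item[0].prefixlen)


# Hypothetical table: one legitimately originated /24.
routes = [
    (ipaddress.ip_network("203.0.113.0/24"), "AS64500 (address holder)"),
]
before = best_route(routes, "203.0.113.10")

# A more specific /25 escapes from another network into the same table.
routes.append(
    (ipaddress.ip_network("203.0.113.0/25"), "AS64511 (leaked more specific)")
)
after = best_route(routes, "203.0.113.10")

print("before leak:", before[1])
print("after leak: ", after[1])
```

Nothing in the selection rule asks who is entitled to announce the /25, which is why a local filtering instruction that leaks can pull in traffic from everywhere that hears it.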
And I think the industry accepts that because we're not terribly malicious when there is an accident. We're forgiving. "Yeah, right, it could have been me." It was me once. I too am guilty for a large-scale stuff up for a day or two. All of us have probably been in it or near that kind of stuff. So there's not as many consequences for having a bad day. It's kind of, "Yeah, that happens. We'll get over it. It's sort of alright." And so when you get headlines like, you know, "Telstra does a routing stuff up and steers a whole bunch of traffic," the real answer is, "Yeah, routing is really hard." And it wasn't anything world domination plan by an Australian service provider, it’s just routing is hard. But there are some new players out there, and they're called regulators. And those regulators have a very, very different view of this infrastructure. Very different. There is no such thing as a mistake. Everything that happens is deliberate and there will be consequences. So I'll go this too close to home, but when an Australian ISP managed to exceed the max prefix limit in their routers and a whole bunch of the network just went down, which also means a whole bunch of triple-zero emergency calls just went down, the issue was there really were consequences. And oddly enough, even into areas of fatality. They couldn't reach the emergency services, bad things happened. Folk got fired, stuff got changed, enormous fines were levied. So that forgiving attitude—could have been me—we're getting much harder on ourselves. And the regulatory interest is, failure for whatever reason is not an answer. You can't make mistakes, they say. The answer is, "I don't know how not to make mistakes." And then, of course, I'm sorry, that's the world we're in. So, I want to sort of talk about hard problems and easy problems for a second, because routing is in the hard problem space and I've mentioned IPv4 was a runaway. 
Oddly enough, despite the IETF's high-standing moral view, network address translation was a runaway success. Totally. So standards doesn't necessarily mean they're a prerequisite for a technology success. Sometimes the absence of a standard is what motivates its deployment. TCP, which I actually think was the true piece of intellectual contribution into the entire internet architecture, that transport protocol is brilliant. And its successor, QUIC, is equally, I think, one of the more fundamental parts of technology in today's world. These days, I think the world is all names and I think the DNS is a success. No one could ever have deliberately designed an artifact at the scale and speed the DNS runs at. If we had, it wouldn't have. The fact that we accidentally stumbled backwards on a dark night into the DNS is amazing. The fact that it works is a true miracle, and you know, don't touch it. Leave it alone because it works and that's great. We also stumbled backwards into content distribution systems. Not deliberate. But none of your packets go around the world anymore. And in some ways, you could argue that because around 70 to 80 percent of all the content that most users get delivered to their devices come from within a small number of kilometers of where they are, routing doesn't matter. The world is not dragging stuff from long distances to you anymore. We replicate and bring it really close so that most of the commercial internet is just the last-mile single-hop non-routed. That's where we're heading. At the moment, it's a bit of a hybrid. And what's driving that? Well, in volume terms and in money terms, it's all streaming. So these are successes. Don't think we planned any of them. They just happened. They really did happen. They were surprising to everyone. So, what can I say about them? We didn't all need to do it at once. The folk who did it first made money. And everyone else said, "Whoa, I want to do that too." 
And so early adopters saw reward, tangible advantages came along. As you get big, you get more efficient at the job, the price that you can produce the service starts to drop. This is always the bugbear of the decentralized internet group, because you sit there and go, "If I'm ten times bigger than you, I can produce a service for, let's say, a hundredth or a thousandth of the unit cost." How does a competitor compete if they're not equally of that size? And the answer is, they don't. So volume economics, oddly enough, is a success factor. Big gets efficient, efficient gets bigger, bigger... you know, it's a circular thing. And the other thing about this, realistically, is common benefit. You get cheaper services, the provider basically gets benefit from it as well, that alignment produces deployment. The whole reason why V4 was a runaway success was that the telephone network was the precise opposite: large, inefficient, expensive, difficult to change, and a huge pressure from consumers to go, "I want more." And the telephone companies are going, "We've given you faxes. What's your problem?" Right? So that's the kind of success factors. Where have we failed? Spam. Oh my god, we've failed. In fact, the whole thing about attack, understanding the difference between productive use and hostile use, we have no defense. Everyone feels vulnerable to various points. Even getting into the basic issue of forwarding and bad behavior, source address filtering, BCP 38. Oh yes, it's a wonderful thing. Do I do it? No, no, no, I'll do it next week. Secure end systems—pah! Is that a tautology? There's no such thing. No one understands the complexity of the billions of lines of code that sit inside these devices. And you go, "Well, is it secure?" And the answer is, "I don't know." It's full of about 500 open-source libraries. I had no control over them. Weird stuff happens. It's not secure. It's just, you know, you didn't pay much for it, it's what you get. Secure networks? 
Only if you cut all the wires, and even then, I'm not even sure that it works. And, of course, secure routing. Massive failure. Why? If we all have to do the same thing at the same time, forget it. Just doesn't work. We're not like that. Why? There is no command and control. There are 220-odd different countries and regimes. There's a whole bunch of providers big and small. There's a whole bunch of different factors. There's a whole bunch of national GDP. Some consumers are rich, some are poor. You know, there's a whole bunch of these things that says everybody is living a different life. So if you wish to apply market pressure in a deregulated communications network environment, otherwise known as the internet, you're kind of dreaming. It ain't ever going to happen. So if you think you need a technology that only works when everyone does it, never going to happen. It's never going to happen because no one's going to move in lockstep with everyone else because there's no factors enforcing that. And when you say, "Well, if we all did it, the world would be wonderful," but none of you feel the benefit until then, you're kind of dooming it, bit like V6 adoption. "It'll be great when everyone does it. When I can go V6 only and it's just perfect." And the answer is, "Yeah, probably would." But this path from no V6 to V6 everywhere and only V6, that path is not one year, it's not two years, it's three decades, and it looks like stretching out for at least another three, possibly more, because NATs are brilliant. So no one feels the pressure. And the folk who did V6 early, Japan was a really early adopter. Really early. Year 2000, big push. Where's Japan now? Same as everyone else. About half the users have V6, half the ISPs do V6, and the rest? "Nah, we'll get around to it." So if there's no clear advantage for early adopters, if there's common benefit but no individual benefit, everything takes time. So, why is this hard in routing? Um, it's technically hard. 
Routing is difficult. This is not a street map. Let me blindfold you and take you to a different city than this one and drop you at a random street corner where you don't know the language and say, "Get to this hotel. Get to this railway station." And what's your answer? "Where's the map?" and I'm going, "No, you're on your own. Go figure." Routing is like that, because routing is a network of peers trying to construct the map on which each routing speaker sits. It's that hard. And the solutions are very few and, quite frankly, I think there are only two that we've already glommed onto. The original Bellman-Ford distance vector algorithm, 1956, '57, and Dijkstra's shortest path first. Someone can jump up and contradict me, but I'm going to make a stab and go 1962. And none of you were born then, so I'm probably safe, right? Round then. Um, so technically it's difficult. Um, economically misaligned. Routing is everyone. You've all got to do it, but we've said there's big, there's small, there's all kinds of motivations, and to think that there's an economic motivation for all of us to move and change, you're kidding yourself. And the last: risk. How many people live in California? 33 million. One of them is just here. Every hundred years there's a bloody big earthquake, and not just a little one, a bloody big one. The last ripper was 1906, 120 years ago. Why would you live there? Risk mitigation. You sit there and go, "Nah, it won't happen." And you go, "No, no, no, it doesn't affect me. I live in Seattle." Whoa! The earthquake that's going to come there, and it comes every 600 years regular as clockwork, is just one of those earth-shaking massive ones, because it's lateral, not vertical. If you live in the Northwest and it's due, you're dead. When was the last one? 700 years ago or more. Oops! We're disgusting at risk. So if you say, "You should do this to mitigate risk," the answer is, "What hasn't happened to me this week? I haven't got a problem."
It's a lousy way to sell something, because humans don't get motivated by risk. They really don't. They undervalue future risk. If I really understood future risk, all I would eat was salads. Same problem. Okay, so let's relate that now to securing routing. No one's in charge. Each of you is running your own autonomous system, and they're called autonomous for a reason. Your own rules, your network, your rules. I might lie, because I might not be an honest person, you can't tell. There's no ground truth to assess whether I'm lying. You have to accept what everyone says as going, "Meh, that's what they said." It's very hard to contradict, so you can't audit this environment for correctness. You just can't. It's like having a road system that says, "You can drive on the left, you can drive on the right, you can drive in the middle, your choice." And it's kind of, "Christ, what happens then? Disaster." Well, that's kind of what we've got here, because there's no ground truth for everyone to set their standards against. So when you get conflicting information in routing, what do you do? Your network, your rules, go figure. There's no external set of rules to arbitrate. Routing is a rumor. I'm not in touch with every router in the world. My router just hears stuff from its neighbors, who heard stuff from their neighbors, who heard, who heard, who heard. Yeah? What did they hear originally? And my router goes, "I don't know." Can I check that? No. There's no credentials that let me look to say, "Did the message get translated badly?" And the answer is, "Well, no." There is no original message. So, BGP is rumor propagation. And now you say, "Let's make rumor propagation secure." And in some ways, you kind of think, that's not just hard, that's a contradiction in terms. So, you kind of go, "Well, why do we want routing security if we don't know what it is? We kind of think it's impossible. It's this really, really hard dream." So it's sort of... meh. Why should we worry? 
Well, we should worry because lots of people have routers, and lots of people inject routes into the routing system, and it's really easy to be really, really bad in all kinds of ways. And if you really want to try hard and not get caught, routing is a fine place to play. Um, what's the risk? I send traffic towards a bad place and then send it on its way. The bad place tries to break encryption, break everything, and look inside what's going on. No one sees it because it's still getting to the endpoint, but other folk are having a look. Even knowing that you're talking to you and you're talking to him at this time of the day is actually good information. And that kind of silent redirection of traffic is a massive issue for some folk, and quite rightly so. Um, there are a million ways to do a denial-of-service attack. You could write an encyclopedia of this, and, you know, on page 300, you'll find "launch a routing attack." That'll do it. The answer is, "Yeah, it will." Well, if we stop routing from doing it, will we stop denial of service? No, no, no, don't get enthusiastic. Take a deep breath. There are many ways to stuff up services, and this is one of them. Not the only one. But you can divert into a black hole. Um, you can actually start the fake business. You know, why would you fake stuff out? We've got TLS. TLS protects everything, doesn't it? I got my certificate from somewhere that gave me a TLS certificate to make all this work. How did I do that? From the DNS. Does the DNS run over TLS? Well, some of it does, most of it doesn't. So if I manage to confuse someone who issues a certificate, what happens then? And my answer to you is, go try it. You might have a fine day with this. It might just work. So it's easy to sort of come in at the vulnerabilities and start to undermine the stuff that we really rely on. And the world relies on TLS. And TLS is so badly protected, it's not funny. So, what's the risk?
I attack routing, divert traffic, and start using different credentials and mimic your genuine server with my fake. And we've tried it in crude ways, we've tried it in really, really sophisticated ways, and we've tried it all ways in between. A few get caught, most don't. So, here's the HTTPS vector. That's a recipe on how to do it. Go figure, it's not hard. Because if you can attack routing... sorry, not on. Uh, if you can attack routing, you can do this stuff relatively easily because routing is the weak point that lets stuff in. Um, yeah, too many words, I'm not going to do it. So, what do I want if I say let's secure routing? Well, that's my wishlist, and I'm going to whiz through that wishlist: whether the address is real, you know, whether they've actually got permission to route it, who's given permission to actually inject it into the routing system, is the rumor propagation mill actually correct or is it lying, and is what I'm hearing deliberate routing lies, or is someone just withholding information? Six items. This is an old problem. It dates back to the original formation of the outgrowth of the internet from the early ARPANET project into the project that included the National Science Foundation, NASA Science Internet, the high-energy physics network, and a whole bunch of NSF-funded mid-levels and a couple of internationals thrown in. A hard problem, because routing was kind of "Give me a call on the telephone and we'll try and sort this out." So one of the very early initiatives by the National Science Foundation was to fund the routing arbiter database. Sue worked at Merit at the time, and I think you were involved, were you not? She's nodding a yes. Um, a very early attempt to try to put some order and structure into the routing environment. And what you did in this database is you described the addresses and routes.
But what equally it was was an audit list, because if I read that database and then listened to the routes that I got told, I could check them for consistency. I don't know if what you said you were going to do is a lie or not, but I can check that what I'm hearing matches what you said you would do. And that's not a bad first step. Um, so I can use it to filter routes, I can do all kinds of really, really good things. The answer is, well, great. Um, yeah, great. I'm waiting for the right slide that said what's wrong with this. We got better, because, you know, you guys are obsessive-compulsives. Did you know that? You really are. You can't let a simple problem get out the door, you have to add complication. I was sitting there in Happy Eyeballs this morning, watching the Happy Eyeballs folk, dual-stack protocol configuration, talk about the application of Happy Eyeballs in the V6-only world. I'm sitting there going, "What are you talking about?" Same kind of problem here. That instead of simply saying addresses and routes, the routing policy specification language came along. And oh my god, formal language systems that only three people on the planet understood it, and a set of tools like the IRR toolset, RtConfig, RPS tool, and so on, that harvested this data and actually tried to make filters that you could apply in your router. The problem was that while you could check that my routes conform to what I said I was going to do, you can't check if I'm lying or not. And I can say anything. And as long as I did no more than what I said I was going to do, we're good, right? No, you're not. What if I say I'll route your networks, Andy? And it's kind of, "You're not meant to do that," but I just said I could. How do you check the authority model? How do you check that what I said I was going to do is within the bounds of what I'm capable of saying? And of course, in this world, one is a really, really difficult number. If you can do it, I can do it too. 
And you, and you, and you. Too many routing registries was an instant sort of convolution of this, and whenever you've got two or more, the problem is: who's right? If there's information there but not there, it's kind of, "I'm sorry, I'm getting very confused. I don't understand." And that was kind of the issue with the routing registries: there was no source of truth, there were many versions of some approximation of truth. And of course, not every route was in a routing registry. The data was incomplete. Scaling issues. And so the folk who fixed it were the worst possible folk to want to fix it: the obsessive-compulsives started to make it more complex and more difficult to use, and it's kind of, "Oh my god." And so it kind of died a death because none of us understood it anymore. Generally used solutions need to be simple. If they're not, they become their own problem. So, we need a robust model. We need a way to pin this down, and I'm back to the original APNIC problem: how do I know it's my address? And the classic answer is digital signatures. You have a private key (don't tell anyone because my god, that really, really matters), sign it with your private key, tell everyone your public key. If you can prove that I signed it, I can't disprove you. It's real, I signed it, it's current, it's got a date. That authority model is actually really cool. It kind of works, and to be perfectly frank, the entire infrastructure of the internet is based on exactly that and nothing more. Nothing more. So, let's put it into routing. Easy, yeah right. Um, the original work (this is work done by BBN, Dr. Steve Kent, I think, was largely involved in the development of this in the late 1990s as I recall) was SBGP. He took a different view, and it was an interesting view: I don't care about what I see, I care about the operation of the protocol. Can I ensure that the protocol has not been subverted?
That a BGP speaker faithfully gets input, applies policy, and announces to its neighbors stuff without inventing. And SBGP was actually an application of security through protocol correctness, not payload correctness. And that's a really big distinction. Um, everyone needed certificates, including your routers. Oops! Everyone needed to do a lot of processing, including the routers. Cisco were making cheap processing engines. That's a lot of processing. Um, it had its issues. But it wasn't a bad model. It was quite a complete model if that's the way you wanted it. Um, but that routing processing problem, it didn't take long for SOBGP to appear in the internet drafts that said, "That's a lot of processing. We're not sure people will afford it or want to do it, and we're not sure the solution is valuable enough to justify it. Here's a cheaper version, SOBGP Lite." And it kind of goes a little bit differently: it reduces the signing load and the validation load. So, SOBGP was sort of better than nothing and could be better than the routing registries, but had its problems. So the IETF was faced with the conundrum. These days, we call a dispatch group into it, the vogue at the time because the IETF is just fashion and nothing else—it’s very narrow, not a very deep-thinking body—so we spun up a requirements working group, which sat there and went, "Geez, that's a really hard problem to decide which is better. Um, let's not decide." Which was about as helpful as, I don't know, as helpful as you could get, right? They didn't get around to really giving us answers. So, we worked on the bits we could. And this is where I'm going to head towards Keyur, we walked into authority injection. We walked into taking those private and public key signatures and using them in the full horror of X.509. Why? We looked hard at the DNS, which avoided it, and found that what gave the DNS a leg in was the implicit structure in DNS names. 
If I'm Jeff.com, I got it from .com, .com got it from the IANA, you know, you could see that and that implicit hierarchy in the name gave you a way of chaining signatures together. Addresses don't have that structure. So you have to impose a structure. And X.509, to really think about it, is a way of imposing an artificial and administrative structure on an otherwise unordered space. We had an old RFC that no one ever used, 3779—you could sign addresses. Whoa! So if I get a key from my registry that says that this is my address, I can sign things, and we got into the world we're in today that everyone needs to sign routing origination authorities. We haven't touched BGP yet. But what we have said is, "That's my address. It's a real address. And I give somebody permission to launch a route about that address into the routing system." And I'm seeing a whole lot of discussion today about how many ROAs we've signed. I think Virginia Tech and Georgia in America are busy counting ROA publication rates. It's a national race, and you sit there and go... how good is this? So I'm back to my pony list. If we all used ROAs, the first three of these things are probably okay, but the ones that really matter, the ones that are all about malicious behavior, won't help. So is three out of six good enough? Don't get your pony. Do not deserve a pony yet, because quite frankly, we need to go a lot, lot, lot further. And that's what the rest of the afternoon is going to be about. You can look at these slides later, I'm going to sit down and shut up, but I'll leave you with that slide: we really, really, really wanted the full pony. Thanks. **Warren Kumari:** We're on to the next. Let me just figure out how to get Keyur's slides up and how I get these ones to turn off. Keyur, I'm assuming you will be presenting your slides? 
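The point about X.509 imposing a hierarchy on addresses can be sketched in code. The following is a minimal illustration, assuming a toy certificate record in the spirit of RFC 3779: each certificate lists the prefixes it claims, and a claim only holds if every prefix is covered by the issuer's certificate, all the way up to a trust anchor. The class `ResourceCert`, the function `chain_is_consistent` and all names and prefixes are hypothetical, and signature verification is deliberately left out.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceCert:
    subject: str
    issuer: Optional[str]  # None marks a self-signed trust anchor
    prefixes: List  # ipaddress network objects this certificate claims


def covered(prefix, parent_prefixes):
    """True if prefix sits inside at least one of the issuer's prefixes."""
    return any(
        prefix.version == p.version and prefix.subnet_of(p)
        for p in parent_prefixes
    )


def chain_is_consistent(certs, subject):
    """Walk from subject up to a trust anchor, checking resource containment.

    Assumption: signatures are NOT checked here; this only shows the
    address hierarchy that the certificates impose.
    """
    seen = set()
    cert = certs[subject]
    while cert.issuer is not None:
        if cert.subject in seen or cert.issuer not in certs:
            return False  # loop in the chain, or unknown issuer
        seen.add(cert.subject)
        parent = certs[cert.issuer]
        if not all(covered(p, parent.prefixes) for p in cert.prefixes):
            return False  # claims more than the issuer holds
        cert = parent
    return True


net = ipaddress.ip_network
certs = {
    "registry": ResourceCert("registry", None,
                             [net("2001:db8::/32"), net("192.0.2.0/24")]),
    "isp": ResourceCert("isp", "registry", [net("2001:db8:1000::/36")]),
    "customer": ResourceCert("customer", "isp", [net("2001:db8:1234::/48")]),
    "rogue": ResourceCert("rogue", "isp", [net("2001:db8:ffff::/48")]),
}

print(chain_is_consistent(certs, "customer"))
print(chain_is_consistent(certs, "rogue"))
```

The design choice this illustrates is that the hierarchy lives in the certificates rather than in the addresses themselves, which is the contrast with DNS names drawn in the talk.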
--- ### **The SIDR Approach and Alternatives** [View Presentation](https://datatracker.ietf.org/meeting/125/materials/slides-125-tdd-sessa-the-sidr-approach-and-alternatives-02) **Keyur Patel:** Sure. Either way. I'll do it. You just tell me when to click next. Okay, sweet. So picking up from where Geoff left off. Can you hear me? Am I audible? **Warren Kumari:** Yep. **Keyur Patel:** Awesome. The two solutions that Geoff talked about, SBGP and SOBGP, fundamentally represented different tradeoffs between processing load and levels of assurance within BGP, right? And back in the day, around 2002, RPSEC was a new working group whose whole job was to list the set of functional requirements that a secure routing framework would need to address. That working group reached an agreement on using signed credentials for route origin validation and said, "Hey, if we could do that, that would be awesome." They also wanted to go slow on path validation. Then, in 2005, a new working group called Secure Inter-Domain Routing was created. And the whole thesis back then was: hey, BGP work was done in IDR, and IDR was pretty busy, had lots of BGP extensions to work on, and these extensions were pretty complicated and time-consuming. And since BGP runs pretty much the whole of the internet, that working group also mandated that any extension coming out of IDR has to have multiple implementations deployed, to show that the extension is robust and mature. As a result, a new working group called SIDR was created, with the hope that, since a bunch of designs and prototypes for early RPKI and origin validation were already under way, it would be easier to get the work done in the newer working group. Next slide, please. So inside SIDR, we started looking at a couple of solutions in particular: RPKI, route origin validation, and BGPsec.
And they had something very much in common, in the sense that separate data outside BGP was used to validate the in-band BGP data. It was reasonably quick to deploy and to get some deployment experience. And most importantly, it allowed this work to be developed and deployed inside SIDR, outside IDR to be very specific. Next slide, please. So looking at the solutions in a bit more detail: RPKI ROV, as Geoff said, was based on a separate Resource Public Key Infrastructure, called RPKI. And the whole premise there was that this infrastructure would be managed by IANA and the RIRs, who would act as trust anchors and certificate authorities for it. And this would be predominantly used to validate the origin AS, the first AS in the AS Path of a BGP prefix. It was completely backwards compatible, incremental deployment was reasonably easy, and it was quick to deploy. Next slide, please. Then came BGPsec, which performs path validation using cryptographic signatures. It provided stronger path validation and very strong origin validation on that first AS in the AS Path, so as to say that the path BGP was taking was somewhat secured and guaranteed. However, the solution required replacing the standard AS Path, a BGP attribute that carries a set of ASes, with the BGPsec Path attribute, which would carry the path-validation-related information. Again, it relied on RPKI to manage the keys and validate signatures. It was reasonably secure but computationally pretty expensive when it came to validating the BGPsec Path attribute. Next slide, please. Then came a solution called ASPA, or Autonomous System Provider Authorization, which basically validates the authenticity of the AS Path information within BGP. It differs from BGPsec in the sense that the check in ASPA goes after path integrity based on AS relationships rather than cryptographically validating the AS Path.
And this is done by leveraging the RPKI infrastructure again. So RPKI, in a sense, came around to support origin validation, then BGPsec, and now ASPA as well. Next slide, please. And these were the solutions that were developed inside SIDR. The previous alternatives involved, again, the IRR, which was discussed earlier: a set of publicly distributed databases that maintained operators' routing policies as well as IP address ownership. The policy was stored in RPSL, again created in a separate working group, I believe it was the RPS working group, and operators could query and get this data using WHOIS. It was then used to automatically generate filters that would be applied on the routers in various forms of policy, with the hope that it would prevent route hijacking. This was a somewhat old-school way, and it is still carried out by a few folks. Next slide, please. And then you had what they call Peer Lock, wherein tier-one ASes use filtering policy to reject any routes involving tier-one ASNs that arrive from customers. This is a manual mechanism that was put in place with out-of-band coordination. And then you had ARIN's Origin AS, where you would again allow IP address owners to list their authorized ASNs, and this would then be implemented as filtering policies. So quite a lot of these filtering policies are now being actively looked at in the context of the work that was done inside SIDR, with the hope of automating them and providing a much more robust solution using RPKI. This is my last slide, where I transition it off to Job.
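As an illustration of the Peer Lock idea described above (this was not shown in the talk), the filter can be thought of as a membership test on the AS path of a customer-learned route. The sketch below is a minimal model under stated assumptions: the ASNs are placeholders from the documentation range, and the function name `accept_from_customer` is hypothetical rather than any router vendor's syntax.

```python
# Placeholder ASNs from the documentation range (64496-64511); a real
# deployment would list the actual tier-one networks agreed out of band.
PROTECTED_ASNS = {64496, 64497, 64498}


def accept_from_customer(as_path, protected_asns=PROTECTED_ASNS):
    """Return True if a route learned from a customer may be accepted.

    The route is rejected when any ASN in its AS path belongs to the
    protected (tier-one) set, since a customer is not expected to be a
    legitimate path towards those networks.
    """
    return not any(asn in protected_asns for asn in as_path)


if __name__ == "__main__":
    print(accept_from_customer([64510, 64511]))         # ordinary customer route
    print(accept_from_customer([64510, 64496, 64511]))  # path crosses a protected ASN
```

In practice this logic would be expressed as an AS-path filter in router policy; the point of the sketch is only that the decision needs no cryptography, just a list that has to be maintained by hand.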
--- ### **Internet routing security, a distributed database problem** [View Presentation](https://datatracker.ietf.org/materials/slides-125-tdd-sessa-internet-routing-security-a-distributed-database-problem-01) **Warren Kumari:** And while I'm bringing up slides, keep in mind we should have time at the end of this for a lot of questions. So start thinking up your questions. And I will add the slide clicker as a participant so the clicker should work. Take it away, Job. **Job Snijders:** Hi everyone. My name is Job Snijders. This is my first time in China. It feels like a great honor to be here and discuss the complicated problem of routing security with all of you. I hope that some of my insights will inspire you to participate in solving the challenge of routing security. One of the very first problems is that it's not entirely clear to everybody what routing security even is. And I don't mean this in a snarky way. But when you use the word security, it is a very broad word. There are some that will say, "Well, I would like to be safe from three-letter agencies." There are some that would say, "I want to be safe from criminal organizations that are after my digital assets." Then there are others that say, "Well, I need to be safe from this other department in my organization, who also log in to the routers and whose configurations might conflict with mine." And differentiating malicious harm from accidents is of course very hard as an external observer. I don't know if you stepped on my toe by accident or on purpose. But what we do know is that the internet is quite important for the functioning of society as we know it, and that disruptions to internet routing have severe consequences, for better or for worse. Let's dive into a time when I was much, much younger than I am now: my first experience with routing security. As I was making these slides, the memory suddenly popped into my head.
It was about 20 years ago. It was my first job at an internet service provider in the Netherlands. And my first interaction with routing security was a German person yelling at me on the phone. "Job! Why are you hijacking my IP space?" And I was like, "Uh, what is happening?" What had happened is the following: my boss told me, "Job, you must provision a new prefix on a new VLAN. Take a free block, configure it, and let me know when that's done." So I went to my laptop and I typed in numbers and I picked what I thought was the first free available block. And I configured it on the interface and propagated this to the internet. And then I went to lunch. And then about 10 minutes later, my boss comes with a phone, red hot, and says, "Somebody on the phone wants to talk to you." And what had happened: in all my innocence, I had swapped two digits. I had typed 48 instead of 84, or some variant thereof. And in my defense, the keys of course are very, very close to each other. So that was my first experience with routing security. And I feel that this pattern has repeated over and over again. Virtually every network operator has at some point made a typo that negatively affected other entities on the internet. And I think this is a key to routing security: the internet is a substrate that we share, a spectrum so to speak. And if you accidentally start blasting on someone else's frequency, you disrupt their operation. That's one of the problems. And keys on the keyboard, IP addresses, autonomous systems, it's all the same. There's no difference. We have globally unique identifiers; that is the core foundation of how the internet works. And the moment there is accidental duplication in the use of those purportedly globally unique identifiers, that's where trouble starts. So what is it we can do about this? How can you make a weird phenomenon like the internet grow in a way that we don't step on each other's toes?
That we don't accidentally use each other's identifiers or press the wrong keys with severe consequences? There are a few solutions. And here, I think a beautiful photo from the 1900s, this is one type of solution: the security mechanism at play is isolation. You create a wall, and the wall protects you. On one side of the wall, there is chaos, and on the other side of the wall is your operation. And this is a means to secure yourselves. But there are downsides to this type of security device. And one of the downsides in the context of internet routing is that the shape of the resources we want to isolate from each other, so that I cannot step on your IP space and you cannot step on my IP space, is constantly changing. Internet service providers, their shape on the internet is morphing. You build new connections, you gain address space, you sell address space; there is constant change. So the analogy to a wall is somewhat impractical, because the wall is a static configuration and it's not easy to change the position of a wall. And not only do the resources change, but the interconnectedness between the resources is also constantly changing. People build new cables, cables get severed, people build new relationships, new fiber, fiber gets destroyed, et cetera. Rinse and repeat. So we have an ever-changing shaped device that is connected to all kinds of other devices, and it all has to happen at extremely high speed. It's not easy. So this brings us to another core problem: not only are the keys close to each other, but it's not immediately clear who operates what keys. Who is behind what identifiers? And I don't care about your name or your identity, but about the abstract entity: who has this block of IP space? Because if that answer is somehow recognized and usable in an automated fashion, then maybe we can build other security systems.
Now, the approach I was given (and I will disavow some responsibility for what is to come, because in comparison to some others I am a youngster, and I was parachuted into this situation): it was already a chosen design, so my involvement was trying to make it work better. I did not pick the core of this design. The approach that was taken by the community at large was one that we know as hierarchical delegation of authority, because this neatly mapped to the mental image that the community at large had: there is all of the address space, which was subdivided into multiple regions roughly aligning with geographic regions, then further subdivided to ISPs, who further subdivided to the end users. And this hierarchy was the footprint for how a routing security solution should be. So the approach was: we need a PKI. And Geoff articulated that digital signatures were recognized as the means to our ends. And the PKI that we now use for IP addresses and autonomous systems is the RPKI, the Resource Public Key Infrastructure, which is collaboratively developed by multiple entities, amongst them the regional internet registries, and the IETF is the forum in which we conduct business and come to agreement on how exactly it works. Really cool stuff. So the plan to secure the internet was: we'll make a globally distributed database that contains the delegations of authority, which we then use to isolate and interconnect the resources. And this seems counter-intuitive, but we need both: we need the isolation property to safely interconnect with each other. And RPKI is, in my opinion, quite a success story. The Pakistan Telecom YouTube example from the early 2000s is a famous example of how a fairly innocent mistake (you know, the letters are very close to each other), a configuration error, had severe consequences that were not really intended.
The exact same situation happened in 2024 with my former employer Fastly. It was a large telecom operator that wanted to sever reachability of a certain IP, and accidentally their attempt to apply a form of censorship leaked out into a wider scope. But this time around, nothing happened. And this was because RPKI route origin validation hampered the propagation of the bad data. And it's always weird to try and write a story out of "the accident did not happen", but if you download the slides, the link is clickable and you can read my full report on how RPKI ensured that the business continued as usual and we were not globally down. So with this in mind, we can clearly see the future... I think this is Shenzhen 20 years from now. I have never been in such a modern city, but okay, sidetrack. But this of course is a dream. This is not the daily reality for PKI engineers like myself. So, a little bit of a timeline perspective on where we are in the maturity cycle, if that even is a word, of RPKI. From 2005 to 2012, the IETF worked hard to develop a set of interoperable standards, which is now known as the RPKI. And this was published. Then in the subsequent years, despite the efforts of quite a few fine folk, the operational community at large was either not aware of RPKI, or not convinced that RPKI would help them, or they saw other obstacles like, "Well, my equipment does not support route origin validation", and so RPKI as a real-world technology lay dormant for a few years. But some momentum was created in 2018, 2019. I myself flew around a lot and I was like, "Hey folks, remember that you get woken up in the middle of the night because a customer is complaining about downtime? Those annoying phone calls? There now is a solution and we call it RPKI route origin validation. Let's turn it on."
And I happened to be in a unique position where I could somewhat manipulate some of the deployment strategies. I worked at a very large telecom operator at the time, and I went to my competitors and I said, "We're gonna do this RPKI thing and it is a unique selling point." It was not, but they believed me because I was their competitor. So they put on their roadmaps, "Alright, shit, Job is gonna do RPKI, we must be first." And then I went to my management and I said, "Rumor on the street says that our competitors are gonna do RPKI." And my management was like, "Job, you dirty, dirty player." Anyway, long story short, in 2020, the largest intercontinental carriers, the companies that would historically carry the bad route announcements from continent to continent, all came to an agreement that they would use RPKI. And this was very useful, because it meant that a routing incident, say a small nationwide problem, would no longer immediately pivot into a global phenomenon that hit the news in every country, as happened with the famous Pakistan Telecom YouTube incident. The RPKI route origin validation approach helped isolate, or reduce the scope of, incidents. The incidents still happen, and I believe that the number of typographic errors that people make has not gone down over the years, but the impact they now have on the global system is far, far less than it used to be. So in 2020, a cabal of operators turns on route origin validation. And oh boy, that was shocking. Because it turned out that many of the code paths had never really been exercised in real-world environments, and it turned out that there were discrepancies, or ambiguities, or inconsistencies, or let's call them bugs, in the RFCs. And I would say that in the period from 2020 to 2024, a lot of energy in our engineering community was spent on making RPKI work as it was intended to work.
And this happens with most protocols: you have to get some operational experience. I don't know what we could have done differently, but we spent a few years working on a technology that was old in terms of years but still very, very new in terms of operational experience. But I think we got most of the sharp edges covered or softened up. And in the last few years we have been working more on optimizing. There is some breathing room, the foundation of the system works, the system is helping operators and it reduces the negative impact of mistakes and routing incidents, so now it's time to look at: okay, how do we make this thing not just fly, but fly fast? So where we are today, 2024, is that most of the global internet routing system, the routing tables, is covered by RPKI ROAs. And according to, I will say, an American company, so there's a bit of bias in these numbers, most of the internet traffic measured by volume flows towards destinations that are covered by RPKI ROAs. Lots and lots of networks are not using RPKI data to filter out BGP announcements, and this is because lots and lots of networks have outsourced a lot of the operational work to their vendors, to their transit providers. So what you see in the application of RPKI data towards routing is that filtering happens at core junctures like the transit-free providers or route servers at internet exchanges, and those were also the spots where the blast radius was biggest. So it's hard to measure what exactly is success, but I do think that we see fewer and fewer hijacks that have global impact nowadays than, say, 10 years ago, prior to the 2020 push for RPKI deployment. Confusingly, in the global routing system, which is roughly a million routes, there are roughly 5,000 or so RPKI invalid routes.
And it is always challenging to explain that there is a sort of background noise, that there is statistical noise, that there are lots and lots of people that have IP space, are not using that IP space, and somehow misconfigured their ROA, and then you end up with an invalid route that has limited visibility in the route collection systems. So maybe there's an analogy to email spam. You can have great spam filters, but there will always be a bit of spam trying to come in. And that's okay, that's part of a system like this. People can misconfigure their ROAs. Small point: routing security is more than RPKI. There are also basic hygiene practices, and RFC 9234 is a very, very strong example of a technology that increases the robustness of the system but does not rely on cryptography. And in a similar spirit, GTSM, the Generalized TTL Security Mechanism, which uses the TTL in IP packets, is a very interesting safety device that does not rely on cryptography or a pre-shared key, but it is very, very useful because it guards against certain types of mishaps. So in routing security, RPKI plays a very important role, but it's not the be-all and end-all. Now, RPKI is here, it's widely used, people rely on it, it's preventing accidents. But what is RPKI really? RPKI is not a routing protocol. And this is somewhat confusing, because we're telling routing protocol engineers and operators "use this thing" and they're like, "Okay, where does this fit? In the OSPF closet, in the BGP closet?" And you're like, "Uh, no, it's actually an entirely new thing. It is a PKI." It's not a routing protocol. It is a distributed database. And that means we have distributed database challenges. And I think this was somewhat underestimated in the global deployment of RPKI: distributed databases come with many, many sharp edges.
So here's a graph. I run lots of monitoring instances that monitor the RPKI from all kinds of vantage points. I have relying party instances in multiple countries, I have relying party instances that use only V4 or only V6, or only rsync or only RRDP, or that only hide behind anonymizing VPNs. I try to do all the permutations to gather information about the RPKI and to understand the shape of this distributed database. Distributed databases are very hard even when they are managed by just one corporation and it is your distributed database, where you control the application code that interacts with it, but with the RPKI we have a distributed database that is managed by many, many entities, and that poses challenges. So in summary, we have the horsemen of the apocalypse. Challenges that we have in distributed database systems are, for instance, data consistency: is what this node has the same as what that other node has? And if they are different, why are they different? This is where, for instance, partitioning of the network plays a role: can all the nodes reach their source of information? If they can reach their source of information, how long does it take? And the RPKI is a globally distributed database. As an example, the CNNIC RPKI servers are located in China. That's fantastic. And the Japanese RPKI server is located in Japan. Alright, seems reasonable. But I am located in Amsterdam, and the latency from me to this side of the world is significant. And then my friends are in Australia and they have their own RPKI server that is hosted in Australia. Like, whoa! Alright, so there are a lot of networking paths between all the participants, with variance in latency and loss. And this negatively impacts replication. How do we efficiently replicate the database? Because you need the whole database, or as much as you can get from the database, in order to make good routing decisions. Who can scribble into the database? How?
Security is a huge issue, because the RPKI is used in automated pipelines to automatically decide whether this BGP route announcement is good or bad. And if you can crack the RPKI in some way, if you can pervert it into doing something that was not the intention behind it, then you hack the routing. So the RPKI that is supposed to be the protection mechanism also is a new pivot point, a new entry point. And the RPKI itself has had security issues in the past. We have had to do revisions of RFCs to address DoS attacks that were not possible before but were introduced by the very RPKI itself. And how do we make this scale to a planet-wide system? To give you a bit of an overview of how big the distributed database is: every second, somewhere on this planet, two new RPKI objects come into existence, and the median size of these RPKI objects is 2 kilobytes. Every two minutes, somebody somewhere deletes or adds a route origin authorization. So you can see there's a discrepancy between the automatic maintenance of the database itself, which is mostly the reissuing of CRLs and of an RPKI-specific object called a manifest, and the application of the database, which is ROAs. Every 26 minutes, somebody changes an ASPA somewhere on the planet. So that churn is a lot different than the ROAs. This is good and bad news, because route origin authorizations fit neatly into a Patricia tree. You can use a radix tree to express the internet routing table, and your ROAs sort of slot into the radix tree. But ASPAs interact with the graph of how the autonomous systems are interconnected with each other. So the churn in ASPA has very, very different properties when this is used on a planetary scale, because the creation or deletion of an ASPA might affect hundreds or thousands of routes, depending on your topological position.
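To make the ROA side of that comparison concrete (an illustration, not material from the talk), here is a minimal sketch of route origin validation in the style of RFC 6811: a route is valid if some covering ROA entry matches its origin AS and allows its prefix length, invalid if it is covered but nothing matches, and not-found otherwise. The prefixes and ASNs are documentation placeholders, the names `VRPS` and `validate_origin` are hypothetical, and a plain linear scan stands in for the radix tree a real validator or router would use.

```python
from ipaddress import ip_network

# Validated ROA payloads as (prefix, max_length, origin_asn).
# All values are made-up documentation prefixes and ASNs.
VRPS = [
    (ip_network("192.0.2.0/24"), 24, 64496),
    (ip_network("2001:db8::/32"), 48, 64497),
]


def validate_origin(prefix, origin_asn, vrps=VRPS):
    """Classify a route as 'valid', 'invalid', or 'not-found'."""
    route = ip_network(prefix)
    covered = False
    for vrp_prefix, max_length, vrp_asn in vrps:
        # IPv4 routes are only compared with IPv4 entries, IPv6 with IPv6.
        if route.version != vrp_prefix.version:
            continue
        # The entry covers the route if the route lies inside its prefix.
        if not route.subnet_of(vrp_prefix):
            continue
        covered = True
        # A match needs the right origin and a prefix no longer than max_length.
        if route.prefixlen <= max_length and origin_asn == vrp_asn:
            return "valid"
    return "invalid" if covered else "not-found"


if __name__ == "__main__":
    for prefix, asn in [("192.0.2.0/24", 64496), ("192.0.2.0/25", 64496),
                        ("192.0.2.0/24", 64511), ("198.51.100.0/24", 64496)]:
        print(prefix, asn, validate_origin(prefix, asn))
```

Because each lookup only touches entries that cover one prefix, a change to a single ROA affects a bounded part of the table, which is the property that makes ROAs slot neatly into a prefix tree.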
So, even though these ASPA objects are really, really tiny, and we only see one change every half hour, the impact of that tiny thing can be very, very big. And we don't know how this will work out in practice. We did our best in creating implementations, and we've got to see how this will play out, because ASPA is still very new technology. In short, the total database is essentially 500,000 objects, or files, or entries, whatever you want to call it, and it's growing. And out of the 500,000, every day about 180,000 change. So that's a lot of churn in a lot of tiny files, or tiny objects, that need to be distributed across the planet. The size of it all is a gigabyte. But that's, of course, in the perfect shape, after you have gathered everything and validated everything. The total churn in the system, with the retries, the attempts to synchronize, and data that you ignore but is transferred anyway, is larger. So I would say this is not a small database, given how many participants, how many writers and readers are concurrently interacting with this database, it being a globally distributed database. So, yeah, it's fun engineering in this field. How do I know the size of the RPKI? Well, I took a tape recorder and I started recording all the data that came into my relying party instances, because I recognized that with BGP, there is an infinite amount of data. Let's assume there are a few hundred thousand eBGP routers that together form the internet, or I don't know, maybe it's half a million, maybe a little bit more. And every one of those routers has its own perspective on the global internet routing table. And this is a consequence of BGP best path selection and certain changes not propagating as far as they... yeah, the propagation characteristics of BGP make it so that everybody has their own unique perspective on the routing table.
But with RPKI, it's not a routing protocol, it's a distributed database. So it is possible, in theory, to capture every aspect, every issuance, every movement in the RPKI and store it. And that's what I've been doing. This is an image from a project I run, rpki-views.org. And what you see in this graph (you see the same if you go to rpki-views.org, where you can also download all the datasets if you're interested) is the current size of the database, 500,000; the solid bar is the churn in that database in a given day; and the purple line is the number of unique measurements, or snapshots, or photographs, that I was able to take on a given day. And I collect all the data, and every day I normalize, deduplicate, and compress yesterday's data. Because I had to make a compromise between fast delivery of this data and the cost of the storage. And I recognized that if I want to be cheap on the storage side and host this data for years or decades, then I cannot be real-time. So there's tension: if I want to be real-time, it is very, very expensive on the storage side. If I create a 24-hour buffer and use that to segment and compress the data, I end up with something I can afford. So what is it I can afford? The RPKI change rate, every day, I can compress into one gigabyte of data. So that means, with 365 days in the year, every year generates 365 gigabytes of data in compressed form. How compressed is this data, really? I am approaching a 98 to 99 percent compression ratio. So this means that these small blobs of data are almost indistinguishable from gzip bombs. So be aware when you download this: when you start unpacking the data, it [whoosh], it inflates by, yeah, a factor of almost a hundred. And why is this necessary? In order to optimize and mature the RPKI ecosystem, we need to understand the RPKI and be able to replay it.
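The figures quoted in this stretch of the talk can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers as spoken; the one interpretive assumption is that the "98, 99" compression figure means the percentage of space saved, and the helper name `inflation_factor` is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# "Every second ... two new RPKI objects come into existence."
new_objects_per_day = 2 * SECONDS_PER_DAY

# "Out of the 500,000, every day about 180,000 change."
churn_fraction = 180_000 / 500_000

# "Every day, I can compress it into one gigabyte of data."
compressed_gb_per_year = 1 * 365


def inflation_factor(percent_saved):
    """Expansion factor on unpacking, assuming `percent_saved` percent of the
    original size was removed by compression (an interpretation of the talk)."""
    return 100 / (100 - percent_saved)


if __name__ == "__main__":
    print(f"new objects per day:   {new_objects_per_day}")
    print(f"daily churn fraction:  {churn_fraction:.0%}")
    print(f"compressed GB/year:    {compressed_gb_per_year}")
    print(f"inflation at 98%, 99%: {inflation_factor(98):.0f}x, {inflation_factor(99):.0f}x")
```

Two objects per second works out to roughly the quoted daily churn, and a 99 percent saving corresponds to the hundredfold inflation mentioned when unpacking the archives.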
And I've come to learn that the RIRs and other key players in the ecosystem are themselves not logging or storing or archiving the data that they produce. And this could be simply because of an organizational oversight, but it also could be because it's kind of tricky and expensive to capture the change data and store it in accessible ways. So maturing the RPKI system, having all that data, has allowed us to make informed design decisions on, for instance, the deprecation of fields that turned out not to be used. And this is great, because the simpler the RPKI profiles are, the less room there is for error in implementation, and the better it is for everybody. So having a rich history, being able to rely on that historic data being complete, allows protocol developers to decide: this is functionality we can safely deprecate, and this is functionality that is used in the field and cannot be deprecated. The RPKI is a foundation on top of which we build new applications. And it was shown in the previous presentations that route origin authorizations, and the mirror component, route origin validation on BGP routers, were just the first application. It was the lowest-hanging fruit. It was the simplest minimum viable product that could be imagined. And getting to the level of adoption that we have seen in the global internet with just that simple thing was already a tremendous challenge, because we learned as a community that we had to build lots and lots of infrastructure to support this simple application. But now that the RPKI infrastructure is in place, and we have the pipelines, the validators are deployed, and there are automated pipelines to the BGP routers, we can build on top of that foundation and create new applications. And this is where ASPA comes in, as a new application leveraging the pre-existing work that went into ROAs. RSC is another application of the RPKI.
These links are clickable, so if you download the slides, you can inspect them in more detail. But auditing and debugging the RPKI has become recognized as something that is a challenge and that we must improve on. In other advanced ecosystems like the DNS, there is dnstap to capture and replicate what is actually happening in the DNS application layer. In BGP, there's the MRT format to capture BGP data and store it in a way that it can later be used to inspect what transpired, what parameters influenced this event. And there's also BMP, the BGP Monitoring Protocol. But we didn't have something like that for the RPKI. So I invented something; it's called CCR, Canonical Cache Representation, and it is a compact binary format to capture the inner state of an RPKI validator so that you can exactly replay: what is the input data into the validation process and what was the output data? So I take photos of the inner workings of the validator, and that allows me to reconstruct inputs and outputs. And this is a tool that is important because we have multiple transports in the RPKI; there's rsync and RRDP. And they both can run over V4 and V6, and you can anycast them. So is the database consistent from a multi-perspective viewpoint? It's hard to measure, because the application of route origin validation uses a protocol called RTR, and that's a TCP connection between your RPKI cache and the BGP router. And that protocol no longer has any of the metadata that is in the to-be-validated cryptographically signed objects. So we were flying blind for multiple years. It's almost a miracle we got as far as we did. And performance. It was under-appreciated what the impact of performance is.
Because if you have inconsistent performance, where inconsistent means that sometimes a ROA propagates in, say, 10 minutes, and sometimes it propagates in two hours, if you have that type of variance, it means that if you as an operator make a typographic error in the creation of your ROA, you type in the wrong origin ASN, and your mistake propagates in 10 minutes, you're like, "Oops, I knocked myself offline or I disrupted the customer service." If your remedy then takes two hours to propagate, that is very, very annoying. So that is one aspect of it. In order for it to be a more usable tool, propagation of RPKI data needs to be more predictable, and it needs to be cheaper. Cheaper usually means it works better over lossy links, or high-latency links, or in difficult circumstances. Like for me, I have great internet access at my home in Amsterdam. But if I'm on a satellite dish, on a ship, or I don't know, somewhere, and I need to download gigabytes of data because the synchronization protocols are inefficient and there's like retransmission of a lot of data to arrive at the desired state, I'm not going to be happy about that. So we need protocols that work well in challenging circumstances. Distributing a database on high-speed links with blazing fast processors, easy. Distributing a database reliably without causing thundering herd effects or without causing needless transmissions, retransmissions, yes, that's more challenging. But we have to explore whether we can improve the performance, because the performance is not for vanity reasons, it is for operational reasons. The data must propagate as fast as we can make it propagate. And Geoff would say, "Ah, Job, just put the RPKI data in the BGP." Thank you! And like, "But Geoff, that world doesn't exist. I am stuck with this system and I must make this system work well." So here we are. Okay, tomorrow. Other aspects of maturing the RPKI are the creation of open datasets.
Because there are lots and lots of brilliant students that come along who could have a positive impact on the RPKI protocols or our understanding of the RPKI. But if they come fresh out of university, they're not going to have the time to wait a bunch of years to collect all the data and then come up with an analysis. So we need to help them hit the ground running. And in my opinion, one of the ways to do that is that I collect all the raw data, and I give it to the world in hopes that other people can do something useful with it. And I have to do it in a way that is cheap enough for me to be able to afford to give it away for free to the rest of the world. So this is where RPKI-Spool comes in, which is, yeah, really an exercise in how I can make storing the RPKI data as cheap as possible, because I need to be able to host it on a puny server in my electricity closet at home. No joke. So we have open datasets. We now have almost six years' worth of data that is essentially a capture of all the RPKI movements of this distributed database. We now have tooling like CCR to capture the states and to reason about differences in inputs and outputs between different implementations or implementations using different transports. So I'm carefully becoming more and more optimistic that this unwieldy distributed database is something that we as a community can manage and maintain, because we are now creating the tools in an iterative fashion to better understand it and then incrementally improve it. Some lessons, and this is my last slide, I think. Routing security is really a multi-decade journey. It is so strange that this type of problem is a multi-generational problem, that of the people that helped start the foundation of the RPKI, not all of them are alive. Like, the timespan, 20, 30 years, it's enormous. And you have new participants that join the ecosystem and older participants that unfortunately no longer participate in the system.
This is not a project that anybody could solve in a short amount of time. This means we need to reason about, okay, what does it mean that we have a project that is so big, we might not be able to finish it within the lifespan of a single human? That is bonkers! But here we are. It means patience, perseverance, compassion (that is an important one), and also that we have to trust each other. I didn't design the RPKI, but I trust its design. I read as much as I could on the topic, and I'm convinced, I'm like, "Okay, this seems reasonable. This seems like something that is complicated, but it's doable." It's a nice challenge. I like puzzles. That's a separate problem. Another aspect that at times has been under-appreciated is that there are multiple groups of stakeholders. And out of the groups, there are three I want to highlight: you have the people that write code, the people that are great at abstract or theoretical thinking, and the people that are stuck with the bill at the end of the night: the operators. And the best standards in the RPKI ecosystem have been collaborations between all three factions where there is an iterative feedback loop where the theory and practice are as closely aligned as possible. And SIDROPS, thankfully, recently adopted in its charter a requirement for implementations. So this means that there will no longer be RFC publications that were a pipe dream and later on turned out to be hard to implement or impossible to implement, so that we have to do a bis document and argue about the bis document. We've streamlined the process by making the barrier to publication a little bit higher. And I will repeat in all working groups, running code must be a requirement before you proceed to present your work to the world as an RFC. And if you work with only two out of three factions, you invariably end up in a situation where the party that was left out is not happy with the results.
You need the buy-in from all participants in this ecosystem. And then finally, it is my interpretation (again, I was not there at the start of the RPKI journey), but compromises were made. And the development of the RPKI in and of itself, with just that simple application, ROAs and route origin validation, already was so much work that shortcuts (it's not really the right word), or rather off-the-shelf components, were used to make the project somewhat feasible. So a good example of this is, instead of inventing a synchronization protocol, Rsync was used. Rsync was a readily available command-line utility that would run on all operating systems. And, you know, people were like, "Well, we have a bunch of files we need to synchronize from A to B, so just start Rsync." And I think that was a perfectly appropriate decision to make at the time. But as time went by, it was recognized that Rsync did not have the scaling properties that were desirable for the RPKI; for instance, the transfer in Rsync is very efficient, but figuring out what to transfer is very, very costly. So, yeah, Rsync was not a good fit. And it needed to be replaced, and it was replaced with RRDP. And replacing an existing component in a running system is a massive challenge. It's like you're flying the airplane and then halfway through the journey, the pilot is like, "Well, uh, we got a newer model for our left engine, and, well, strap in your seat belts, we're going to replace the left engine without landing." Like, "Okay. Thanks." And then, you know, RRDP turned out to be not as efficient as we hoped it to be. So RRDP now is under consideration for replacement itself.
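To picture the remark that "figuring out what to transfer" is the costly part, here is a minimal Python sketch. It is an illustration only, not how Rsync or RRDP work internally: it assumes the publisher hands out a listing of file names and SHA-256 hashes (similar in spirit to the file list carried in an RPKI manifest), so that a client can work out locally which objects it still needs. The function names `files_to_fetch` and `stale_files` and the sample file names are hypothetical.

```python
import hashlib
from typing import Dict, List


def files_to_fetch(listing: Dict[str, str], local_cache: Dict[str, bytes]) -> List[str]:
    """Return the names in `listing` that are missing or changed locally.

    `listing` maps file name -> expected SHA-256 hex digest (hypothetical
    publisher-provided listing). `local_cache` maps file name -> file bytes.
    """
    needed = []
    for name, expected_digest in sorted(listing.items()):
        data = local_cache.get(name)
        # Fetch when we have no copy, or when our copy's hash differs.
        if data is None or hashlib.sha256(data).hexdigest() != expected_digest:
            needed.append(name)
    return needed


def stale_files(listing: Dict[str, str], local_cache: Dict[str, bytes]) -> List[str]:
    """Return cached names that the listing no longer mentions."""
    return sorted(set(local_cache) - set(listing))


if __name__ == "__main__":
    cache = {"a.roa": b"one", "b.roa": b"two", "old.roa": b"x"}
    listing = {
        "a.roa": hashlib.sha256(b"one").hexdigest(),      # unchanged
        "b.roa": hashlib.sha256(b"changed").hexdigest(),  # content changed
        "c.roa": hashlib.sha256(b"new").hexdigest(),      # newly published
    }
    print(files_to_fetch(listing, cache))
    print(stale_files(listing, cache))
```

In this toy setup the comparison happens entirely on the client, which is one way to see why a trusted listing of hashes can make the "what do I need?" step cheap for the server.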
And the lesson here is not "do not use off-the-shelf components," because if the RPKI community as it was, you know, 10, 15 years ago, had tried to create a PKI and a protocol towards the BGP routers and the CA signing infrastructure, you know, in many organizations on many continents, if it had tried to boil the ocean, it would not have happened at all. But it is good to recognize and document a priori that we are using an off-the-shelf component that is not fully optimized for our use case, so we need to think about whether it is possible to replace this component later on or whether we are painting ourselves into some kind of corner. And this is not always easy to recognize. And in that sense, the advice is a truism; it's, you know, hindsight is 20/20. But, yeah, you cannot know everything up-front. Like the exact shape of the RPKI data was not fully understood 15 years ago. There was not a strong consensus in the RPKI community on the exact role of manifests, why manifests existed. So some people said manifests are a critical component, and a different faction was like, "Well, you say it's important, but I don't really see the function." And then as the years went by, people started to recognize, "Oh, manifests are a critical component. They are not optional." And them being not optional actually opened a pathway to more efficient synchronization that would not have been possible in the previous interpretation of manifests. So it takes time to learn how the clock is ticking in your system, and then you can optimize with confidence, and that is fun. **Warren Kumari:** All right, questions. And if you could use the queue, but go ahead. **Linga Jia:** Hey, thank you to all three presenters. I think in IETF meetings, draft updates are sometimes so boring for me as a student. I'm a student at Tsinghua University, I'm Linga Jia. I love the tutorial today. So I have three questions for Job Snijders.
As we know, RPKI is a decentralized database, but only RIRs can publish a ROA or something. So it is centralized in some aspects. Oh, [Job shakes head] oh, he disagrees with me. So that's my question and you can respond to me. So I wonder whether the RPKI system is secure from political attack. I mean, like Russia being kicked out of the SWIFT payment system when a war is happening. So is the RPKI secure from such an attack? **Job Snijders:** That is a... shall I take the first? **Geoff Huston:** I'll do the first one. **Job Snijders:** I'll do the first one. I want to say one thing. I argue the RPKI is a distributed database. I didn't say decentralized. So that is a big difference in English words. It just means we sprinkle the data over lots of places, but I think what you mean by decentralized is a different concept. **Linga Jia:** So it is centralized. **Geoff Huston:** No it's not. It's also not. Listen to what happened there economically. You have a block of addresses. You can become a delegated CA and publish your stuff wherever you want. You're going to pay for it. You're going to have to run the machinery, and everyone else who wants to work across that distributed database has to visit you every two minutes these days, which is insane. It's really bad design. You're lazy. You're cheap. Everyone else is too, so nothing personal. The RIRs say, "You can do that. Knock yourself out, or for free, we'll publish your crap over here with everyone else for free." What do you do? "I'll take the free option, thanks." Now, interestingly, you're not paying for it. A bunch of people are doing work for free (never a good idea in security), but at the same time, all these other folk trying to pick up that database don't have to visit 70,000 different sites. And so that's probably a good thing. So there are pros and cons, there are engineering tradeoffs in this, but there's nothing stopping anyone from running their own delegated repository.
And other commercial players might say, "Well, yes, this sounds like a fine proposition, I will run one commercially." Go for it. Knock yourself out. There's nothing limiting in the architecture, it's a straight-up commercial issue. **Job Snijders:** And another weird thing to realize is we built RPKI as an open technology with open standards, so if you want to start a trust anchor, you have all the tools available to you to be like APNIC or ARIN or RIPE. The thing that the IETF standards cannot give you is popularity or reputation. So it is perfectly possible to start competing PKIs using the RPKI technology, but you'd have to somehow convince a large constituent following to use your trust anchor. And in this sense, the RPKI is centralized, but there is an option for alternatives. So if one RIR somehow comes into bad weather (and there has been lots of concern with the wellbeing of the AFRINIC trust anchor), then it is possible for operators to simply remove an RIR's trust anchor from their validators and, yeah, then it no longer exists. And if a new entity comes along that takes over the operation of the RIR that had problems, they can distribute their new public key. So the trust is centralized, but you can choose who you want to trust. The distribution of the data is, well, it's distributed globally. So the data is distributed; I wouldn't use the word decentralized. It is not blockchain, it consumes far less energy, the transaction rate is way better. And I don't think we need a consensus-based protocol. PKI is an appropriate approach for this problem space. **Linga Jia:** Oh, thank you. And the second question: I know RPKI is built for routing security, but what it really does is it can tell the world that an AS wants to say something; it can store an object in the RPKI system. Oh, [Job shakes his head again] oh, he shakes his head again. **Job Snijders:** Yeah, you've got it the wrong way around.
**Geoff Huston:** The AS that is being listed in a route origin authorization did not publish that ROA and doesn't necessarily even know about it. It's the permission on the part of the owner of the address block to say, "If that autonomous system over there originates a route for this address prefix, it is doing so with my permission as a prefix holder." Whether the network even knows that it's been given that permission is not the question and it's not being answered here. It's not a handshake, it's a unilateral permission. That's all it is. **Linga Jia:** So it can be a proof from the owner of an IP address; the owner says something about its IP address. **Geoff Huston:** Correct, because they have the key, they're the controller of that address. It doesn't say anything about the autonomous system other than saying, "I grant permission." **Linga Jia:** Yeah, thank you. And as we see recently, some other drafts propose that the owner of IP addresses may want to say something else about its IP address. **Warren Kumari:** My turn, Geoff. **Geoff Huston:** What would you like to say? That it's Monday? That it's time for a cookie? Like, what else do you want to say as the owner of a prefix? **Job Snijders:** So the good news is, yeah, there is an RFC that allows the RPKI to be used for non-routing purposes. And this is RPKI Signed Checklists (RSC). Yeah, it has become an RFC. And this is an application that leverages the RPKI infrastructure without imposing a cost on the infrastructure itself, and it's unrelated to routing. Now, I have yet to see what exactly the community at large will do with this technology, but there is a standardized means to do RPKI signatures over arbitrary digital objects and, yeah, we'll see. So, yeah, RPKI can be used for non-routing purposes and it's to be determined how useful that is and how it's used in practice because it's relatively new.
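Geoff's description of a ROA as a unilateral permission from the prefix holder can be made concrete with a small Python sketch of route origin validation. This is a simplified illustration following the commonly described Valid / Invalid / NotFound outcomes, not code from any validator or router; the `VRP` class, the `validate_origin` function, and the documentation prefixes and AS numbers are assumptions chosen for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import Iterable


@dataclass(frozen=True)
class VRP:
    """A validated ROA payload: the prefix holder permits `origin_asn`
    to originate `prefix`, down to prefixes of length `max_length`."""
    prefix: str
    max_length: int
    origin_asn: int


def validate_origin(route_prefix: str, origin_asn: int, vrps: Iterable[VRP]) -> str:
    """Classify a BGP route as 'valid', 'invalid', or 'not-found'."""
    route = ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ip_network(vrp.prefix)
        # A VRP only applies if it covers the route's prefix (same IP version).
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        # Permission matches: same origin AS and the route is not too specific.
        if vrp.origin_asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "valid"
    # Covered by some ROA but no permission matched: invalid.
    # No covering ROA at all: the prefix holder has said nothing.
    return "invalid" if covered else "not-found"


if __name__ == "__main__":
    vrps = [VRP("192.0.2.0/24", 24, 64496)]
    print(validate_origin("192.0.2.0/24", 64496, vrps))     # permitted origin
    print(validate_origin("192.0.2.0/24", 64497, vrps))     # wrong origin AS
    print(validate_origin("198.51.100.0/24", 64496, vrps))  # no covering ROA
```

Note that nothing in the check involves the listed AS agreeing to anything: the only input is what the prefix holder published, which is the "not a handshake" point made above.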
**Keyur Patel:** Can I make a quick comment here? It's to Job's point: it's not just cost to the infrastructure; if it gets passed to the routers, it's cost to the routers too. And the router's memory and I/O as well as processors aren't free and cheap. **Job Snijders:** That is new information for me, Keyur. Memory is no longer cheap. **Linga Jia:** Oh, thank you. And the last but not least important one is: I know RPKI is a big system and I think running it is costly, so who pays for it? A ROA publisher, or the CA owner, or the IANA? **Geoff Huston:** IANA doesn't pay for this. If you choose to use the publication services provided through your regional internet registry, the costs of that regional internet registry operating that infrastructure are borne by the membership fees levied by that regional internet registry. So it's without further cost to you. That's an option. But that money, that membership fee, is not infinite. There are limitations in the degree of, if you will, money and resources that can be spent in operating it. You might want a better job done. You might want 7-by-24 instant expertise coverage. If you want more than that, or you feel you want more control over that publication, you're more than welcome to use a delegated structure and run your own publication point. And other commercial players might say, "Well, yes, this sounds like a fine proposition, I will run one commercially." Go for it. Knock yourself out. There's nothing limiting in the architecture, it's a straight-up commercial issue. **Job Snijders:** Yeah, the architecture allows the cost to be distributed in some way. But I think a different answer is: I think the likes of governments need to help chip in for infrastructure like this. It relies on open-source software. Half the world is using my software for internet and I am unemployed.
It is a ridiculous proposition that we have so much of society depending on open-source software and the community at large struggles to fund the research and implementation effort. So it's very dystopian how there is a gap between who benefits from the RPKI, unknowingly because I mean the average internet consumer has no idea that the RPKI exists, and who builds the RPKI and who is paying for that. It's not figured out. Like, in theory it should work by paying RIRs and so on, but in practice the RPKI is a loss leader essentially. **Linga Jia:** Okay, thank you very much. I took too much time. **Warren Kumari:** I love your haircut. Both the old one and new one. **Job Snijders:** Oh, thank you. It's inspired by my Kung Fu teacher. Sue? **Sue Hares:** Sue Hares, Huawei. Job, it sounds like the real challenge is to get 100% coverage. And Geoff, you want stub ASes to put all their data in, right? Am I missing something? You're looking at me like I gave a miss... **Geoff Huston:** Here's where opinions start to differ all over the place. Without any cohesive mechanism of path propagation protection, even the limited, vaguely weird structure of ASPA, which I would contend is broken design. Without even that, just simply flooding the world with ROAs is a really expensive way of stopping route leaks. You'd be better off actually doing some kind of audit on routing configurations and routers, and that would be a cheaper solution than getting all of us to deploy this stuff. That is horrendously expensive for a really limited outcome. So part of this issue about routing security, it's the whole "I want a pony." If you've only got half a pony, is it worth riding? Is it worth the cost? Is it really gonna happen? Because I've seen a lot of folk, and Georgia Tech has a really good sort of "let's get behind this, let's everyone build ROAs," and you sit there and go, "When you've built it, what have you got?" Have I got routing security? No you haven't.
You're not even remotely close to it. Any determined attacker that wants to do some kind of path synthesis can drive a truck through it. So it's not security, it's the pantomime of security, it's just simply a fine way of detecting certain forms of route leaks. You want to pay that cost? Fine. Does everyone want to pay that cost? I don't know, I doubt it. But I would certainly not advocate "Oh yes, let's all do that, that'd be a really good thing." I think it's a really expensive thing with a really limited benefit. You know, it's a much bigger problem, and you want an expensive pony. Don't settle for half. Don't think half is going to help you. Do the lot or don't bother is my view. Because the whole issue about routing security is all about going all the way. Not only do I want to stop accidental route leaks, I want to make sure that it's hard for someone to synthesize the wrong information. If I can't do that, I want to make sure that if they lie, I could have heard it anyway for real. And that's where the soBGP thing comes in. Is plausibility good enough? Well, it could have propagated through that path. Even if you fake it, it could have happened anyway. Is that an attack? So one of the ways of protecting the routing system is to actually say, "Look, I don't care that it got propagated this way. What I care about is to make sure that anyone who tries to fake it can only fake it in ways that could have happened anyway, in which case it doesn't matter." And the whole routing security requirements, that RPSEC thing, got hung up on that argument. The purists from the security world, bless their little obsessive-compulsive hearts, said, "No, no, no, plausibility is not good enough." Even though it's backwards, because propagation is backwards to forwarding, even though it's backwards, we insist that the path is real. And I would actually characterize it as a bunch of operators in the room going, "I don't care. I really don't care how it got propagated.
I want to know that the route that I'm hearing, the path that is being represented, is close enough that even if it's a lie, it could have happened anyway. It's okay, I've constrained your lies, or your ability to create lies, to the point where it really doesn't matter." And that was the hang-up of the argument: pragmatism versus purism in a security world. Security folks go, "No, we can't be pragmatic, we need to be absolute," and the operators were going, "God, I don't see... the cost of doing that is so high, we don't see what the additional benefit was." And that's a fine argument to have, because everybody's got to pay the bills, right? So, the IETF found that really hard to delineate, because the IETF doesn't honestly get into those kinds of problems and get asked to make those kinds of decisions. Typically what we do is do standards and say, "Knock yourself out, go and pick one." But when we're really asked to figure out between A and B, we tend to hash it up pretty badly because some folk want A and some folk want B. You know, that's where we are. **Warren Kumari:** We are over time, so. **Job Snijders:** Yeah, I think that the crux of it, the pragmatist versus purist perspectives, is very interesting. There's a Dutch saying that the operation succeeded, but the patient died. But yeah, I think, you know, for me personally, I don't need all ASNs to participate, but I do want this type of phone call to stop. And RPKI ROV has done miracles in that regard. So I'm excited to see what else we can do with this system. And I agree with Geoff, it is an expensive system. But the previous system also was expensive and didn't give us similar results, so. **Warren Kumari:** So I'd like to thank all of the presenters for doing such a great job and, you know, coordinating and things. I'd just like to thank everyone who showed up.
Now, I have a request of you all: please let us know what you would like for future technology deep dives, and let us know how you think this went, and how you would like them different, and what sort of topics you would like, etc. You can email, I guess, Andy or myself. You're officially in charge. Apparently it's your project now. You can email either one of us, and I believe there will also be questions in the survey asking, you know, what you thought of this session and others. But if you might be willing to present on something that is of wide interest to the community, especially some fundamental technology that people use as a building block, please let us know. Thanks all.