Session Date/Time: 18 Mar 2026 01:00
Tommy Pauly: Can you hear me if I talk from behind the camera?
Jen Linkova: I see you.
Tommy Pauly: Loud and clear. Good.
Jen Linkova: If you're talking, Jen, we can't hear you. Never mind, my room is muted.
Tommy Pauly: Remote room 2.
Jen Linkova: Yes, so I... Okay, and this should be the webcam mic. Is that different, better, worse? We could experiment, right? I just used...
Tommy Pauly: Can you hear me okay from here?
Jen Linkova: We can now.
Tommy Pauly: That's good.
Jen Linkova: Cool. And you can hear me as well?
Tommy Pauly: Yep. Oh, sorry, Jen.
Jen Linkova: Yes.
Tommy Pauly: Eric, the problem is, I think at least a couple of times...
Jen Linkova: She's muted again. Hmm. I think Jen just pressed a button, though, so she chose to.
Tommy Pauly: So Eric, are you in the same room as remote room 2, just a different camera view? I'm confused.
Eric Kinnear: Yes, so we have the remote room and a camera for me. So yes, this is the same physical space.
Tommy Pauly: Ah, right. Okay.
Eric Kinnear: Just a little more fun than me pivoting the camera here the whole time. Okay. Give people another second to join and then we will start.
Tommy Pauly: All right, let's do it. Welcome to the HAPPY session at IETF 125. This session is being recorded. Make sure that you've joined correctly via the in-room tool or remotely, so that you can join the queue. This is the IETF Note Well; these are the terms under which we participate in the IETF. Please note that we also operate under a Code of Conduct. If you see any violations, you can report those to your chairs, your responsible AD, or the ombudsteam. Also, read this closely and take note of your responsibilities under the IPR guidelines from the moment you become aware of any intellectual property that might pertain to the work we're doing. We've got some useful links here. And this is our agenda for today: we'll go through some updates on the main Happy Eyeballs document, with some discussion there. Then Max will give us some exciting updates about what's been happening in Firefox land, Brian will talk about a deeply confusing Happy and Sad topic, and then we'll wrap things up. So with that, let's start with our main document. Do you want me to share slides? All right, and your mic is live.
Nidhi: Hi everyone. Together with Tommy, I'll be presenting the updates to the main Happy Eyeballs v3 algorithm document. Can you hear me now? Or how do I...
Tommy Pauly: Oh, you're using the laptop mic, right?
Nidhi: No, this mic.
Tommy Pauly: The camera mic, but there you go.
Nidhi: Okay. Next slide. So we'll be going over some of the updates in the latest version of the draft, and Tommy will go through some of the open issues and pull requests. Next slide. Here's a general overview of the changes that we've landed since the last IETF. Thank you. Next slide. I'll walk through the non-editorial changes. The first one is a topic we discussed at the last meeting: we had text saying that clients "may" choose to wait for the TLS handshake, and we changed that to a "should" based on agreement in the room. Next, there was an issue raised about how we should deal with connection attempts that use 0-RTT or session resumption, and the fact that they might be unfairly advantaged in the race. But that is in fact the behavior we want, so we added text to clarify that the connection attempt delay should not change for those resumed connections. Next, the draft contains a number of recommendations for the delay values, so we added text to clarify that clients are free to evolve the delay values and do what they like, as long as they follow the normative requirements outlined in the document. The draft also touched on MTU issues, but an issue was raised about what those MTU issues exactly are, so we've added more text to clarify them.
For connections that just use TCP, this can happen if larger packets are sent but a small MTU is configured, even after the handshake packets succeed. This is less of an issue with TLS and QUIC, because the handshake itself needs larger packets to get through, so a too-small MTU surfaces during the handshake. And lastly, there was an issue raised about how flushing the historical data on network changes might make the history virtually useless on mobile devices. So we added text to clarify that we don't want to reuse historical data across different network interfaces, but if a client is able to reliably identify that it's back on the same network, it can choose to use the historical data again on reconnection. I think that's it. Do we have any comments or questions?
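The resumption point above, that the connection attempt delay stays fixed even when an attempt can use 0-RTT or session resumption, can be sketched roughly as follows. This is a minimal illustration, not the draft's algorithm; all names are assumptions, and the 250 ms value is the Happy Eyeballs v2 (RFC 8305) default Connection Attempt Delay.

```python
import asyncio

# Default stagger between attempt starts (250 ms, per RFC 8305).
CONNECTION_ATTEMPT_DELAY = 0.25

async def race(candidates, connect):
    """Start one attempt per candidate, one every CONNECTION_ATTEMPT_DELAY
    seconds, and return the first attempt to complete. The delay is
    deliberately the same for every candidate: attempts that can resume a
    session get no special head start, they simply tend to finish their
    handshake sooner and therefore tend to win the race."""
    pending = set()
    done = set()
    for addr in candidates:
        pending.add(asyncio.ensure_future(connect(addr)))
        done, pending = await asyncio.wait(
            pending, timeout=CONNECTION_ATTEMPT_DELAY,
            return_when=asyncio.FIRST_COMPLETED)
        if done:
            break
    if not done:  # all attempts started; wait for the first to finish
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:  # cancel the losers
        task.cancel()
    return done.pop().result()
```

A candidate that resumes quickly simply completes within the stagger window and wins, with no adjustment to the delay itself.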
Eric Kinnear: Looks like we're in good shape.
Nidhi: Okay.
Eric Kinnear: I think Tommy was going to take us through some of the open issues.
Tommy Pauly: That's right. Hello, everyone. So, going through the various open issues and pull requests: we're not going to hit everything, but we'll hit the most interesting ones. Next slide. For the open PRs, we have three big ones. The first is a very long-standing pull request from Andre about rewriting the IPv6-only section. This is something that at the last meeting we all agreed we needed to talk more about. The good news is we did talk more about it yesterday, so we will cover that, and I'll ask some of the other folks who were in that conversation to help summarize. We have not updated, but still have hanging around, the text about optimistic DNS. That raised a lot of conversation last time, and some people had concerns about that type of approach. I think the direction we want to take here is to describe it separately, so we'll have some discussion about what form that takes: whether it's something we try to submit as a document to HAPPY or something else. And then we also have a PR from Ben, which we'll talk about later in this deck, about restructuring the document broadly, and we can have a nice discussion there. So, moving ahead to the first one, the v6-only section: we have brought this up before. Generally this is just about refining and updating the section that talks about v6-only networks. This is something we've had since Happy Eyeballs v2; the landscape has changed, we have more experience, more types of deployments, we have Pref64, etc. Jen, did you want to get up and talk us through the summary here?
Jen Linkova: Yeah. Okay, so just to remind everyone what the problem was. Unlike a traditional network, where you have either just v4 or both v4 and v6 addresses and you choose between them, you might now get a situation where you have DNS64-synthesized v6 addresses, and there was a question: are they good? Shall we treat them as normal v6, or do we prefer native v4 if it's available? And when we talk about native v4, do we need to treat CLAT as native v4, or do we need to differentiate? That created a very complex set of scenarios. And then Andre discovered an interesting corner case: you resolve an IPv4 destination that should be reachable through a VPN, but if you resolve it on a v6-only network, you might get a synthesized v6 address, which will be sent to the default route on the v6-only network and will not reach the destination. Or worse, it might reach the destination, but the destination might not be happy about you not coming through the VPN; in that case you actually get a connection established, but you get error messages. So we'd like to cover those cases and make sure that if you have VPNs and split tunneling, it also works. Next slide, please. So, the proposal. First of all, most of those issues arise when you have more than one interface. While no sane implementation would use WiFi to resolve but send traffic over mobile, or vice versa, with VPNs that's not the case: split tunneling is common, so you might do resolution through the split tunnel but the traffic might end up on WiFi, or vice versa. So for the purposes of this discussion, we loosely defined a PVD as the set of interfaces your application might send traffic to after everything is done. So now: when do we ask for AAAA, and when do we ask for A? The proposal is: if you have at least one v6 address of global scope in that PVD, then you ask for AAAA, because you definitely have some kind of v6 connectivity.
For v4, there are two cases when your Happy Eyeballs should ask for A: either you have a non-loopback, non-link-local v4 address on that PVD, or you do not have v4 but you have somehow discovered a NAT64 prefix. How you discover it should be out of scope, because there is a v6ops RFC that tells you how to discover the NAT64 prefix; we all know how to do this. So what do we do with the answers? A v4 answer is used as is and added to the candidate list. For a v6 answer, if we know the Pref64, it gets de-synthesized: we do not treat synthesized addresses as v6, we treat them as v4. Non-synthesized v6 addresses are just normal v6. They might actually include a DNS64 answer if for some reason you did not discover the Pref64, but there's nothing we can do about that. The benefit here, I think, is that we do not need to explicitly think about whether it's a v6-only network or a CLAT device. We do not care, right? We only care about the DNS resolution results, and that's it. Because there was a discussion: can we detect that it's CLAT and not normal IPv4, do we care, and so on. Next slide. So basically, what's going to happen here? Let's say you get three DNS responses for a given name, and the application has detected a NAT64 prefix. In this case, the synthesized v6 address is translated back to v4, and your final candidate list is just normal v6 and v4. So this would actually cover the VPN scenario, right? There are obviously other possible combinations, but it looks like this should work in most cases. The obvious side effect is that a dual-stack host on a network with DNS64 will send traffic over v4 rather than to the NAT64 gateway, but I think that's not a huge price to pay for making the split-tunneling VPN case work. And that's basically it. Comments from Ben?
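The record-selection rule described above can be sketched roughly as follows. This is an illustrative reading of the proposal, not text from the draft: the `PVD` record, the function name, and the use of "not link-local and not loopback" as an approximation of "global scope" are all assumptions.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class PVD:
    addresses: list       # configured addresses on the PVD, as strings
    pref64: object = None # discovered NAT64 prefix (an IPv6Network), if any

def queries_to_send(pvd):
    """Return the set of DNS record types to request for this PVD."""
    addrs = [ipaddress.ip_address(a) for a in pvd.addresses]
    wanted = set()
    # Ask for AAAA if there is at least one v6 address of global scope
    # (approximated here as neither link-local nor loopback).
    if any(a.version == 6 and not a.is_link_local and not a.is_loopback
           for a in addrs):
        wanted.add("AAAA")
    # Ask for A if there is usable v4 (non-loopback, non-link-local), or
    # if there is no v4 but a NAT64 prefix has been discovered.
    has_v4 = any(a.version == 4 and not a.is_loopback and not a.is_link_local
                 for a in addrs)
    if has_v4 or pvd.pref64 is not None:
        wanted.add("A")
    return wanted

# A v6-only PVD with a discovered Pref64 asks for both AAAA and A.
v6only = PVD(["2001:db8::5", "fe80::1"],
             ipaddress.IPv6Network("64:ff9b::/96"))
# queries_to_send(v6only) -> {"AAAA", "A"}
```

Note that this makes the decision purely from the PVD's addresses and the presence of a Pref64, which is the point of the proposal: no explicit "is this a v6-only network?" or "is there a CLAT?" check is needed.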
Ben Schwartz: Hi, Ben Schwartz. So, I'm not going to pretend to understand everything you just said, but my understanding is that a VPN, even a split-tunnel VPN, is modeled as a separate network interface. So there are necessarily multiple interfaces in play here: there's the VPN interface, the virtual interface, and there's some other, perhaps physical, interface, and each of these interfaces might be single-stack or dual-stack in various combinations. My understanding is that the Happy Eyeballs draft is scoped to a single interface. Your starting point is: I am trying to connect to this domain name, or really this URL, on this interface. Those are inputs to the algorithm, and from that point, Happy Eyeballs runs. So I think this question, in my view, is very important, but it is out of scope for Happy Eyeballs. It's a decision you have to make before you start Happy Eyeballs, and then Happy Eyeballs is independent of that decision.
Tommy Pauly: Ben, the last time I read the whole draft, I don't think it said anywhere that it's scoped to a single interface. Am I wrong here?
Jen Linkova: So there's a bit of charter text that says the algorithm may apply to scenarios with multiple network links or pools of multiple connections, but specific mechanisms for such scenarios are out of scope. That's mainly about complex multipath racing: what you do when you want to resolve, simultaneously or staggered, on WiFi and cell and race between those. That's not in scope for Happy Eyeballs itself; that's another layer up. I think there's a bit of a semantics game one can play here. Jen, what you said back on the previous slide was kind of setting up that this scenario is only applicable when you have the split VPN, and obviously the VPN resolver is not literally part of your WiFi config, but it has become part of your provisioning domain. You are set up such that on this network, some DNS resolution goes here, some goes there, some addresses go here, some go there. So in some ways you could view it as a potentially multi-interface but single-PVD configuration; it's not trying to explicitly point to different ones. But also, I think a benefit of this proposal is that while it solves the VPN case, it can be agnostic to it: you can run this algorithm on just a single interface. There are other reasons, other motivations beyond the VPN problem, for wanting to request the A record. There are things that have been around since Happy Eyeballs v2, like the last-resort timer: if you had done AAAA only but the v6 server is broken, you may not have gotten the synthesized address, and then you need to query later. That is a single-interface Happy Eyeballs problem.

Which is also solved by: if you have a prefix, always ask for A. And even beyond that, back when we wrote Happy Eyeballs v2, DNS64 was mainly how we did things, but now we have Pref64; we have better ways to get the prefix and do synthesis without DNS64. And Jen has told me, "Hey, you need to make sure you support v6-only networks that don't do DNS64," which is great, and we had to fix things in our Happy Eyeballs implementation to make sure that we send the A query. So I think broadly this proposal holds up, and I agree with you that the VPN case is a weird one and is more about stress-testing the algorithm. What do you think?
Jen Linkova: I actually think, because yes, I've been asking whether this all applies to a single interface, that the concept of a PVD is really more applicable here. Even without the v6-only case, you can have a scenario where you do resolution through one interface and send traffic through another; the most common scenario is a VPN, but I guess there might be others. So I think the scope of all of this should probably be this PVD thing: not necessarily a single interface, but one or more interfaces that are treated together for resolution and traffic sending.
Ben Schwartz: Yeah, so doing DNS resolution on one interface and then trying to use the answer on a different interface is a red flag already. Like...
Tommy Pauly: That is unfortunately a thing that's often done for VPN cases, which I agree is bad, and I don't think we should specify anything about it, but there are many deployments that require it.
Ben Schwartz: So yeah, anyway, I can appreciate that there may be some situations where you end up having to basically paper over bad configurations. But ideally, I would like to not have to specify them as part of the Happy Eyeballs algorithm, which is complicated enough when dealing with a single interface. And within Happy Eyeballs, I would prefer to assume that the DNS answers are from the same interface, and then maybe we can talk about this as a situation where you're starting from an IP literal and an interface, and that IP literal was resolved by DNS outside the scope of Happy Eyeballs.
Tommy Pauly: Right, I think that's the right way to think about it. So, before we move on, because we do have a queue, though we also have time: I hear you, and I also don't want to have weird text saying this funky VPN configuration could exist. I do think this proposal addresses that, but it can also be written entirely in terms of just one interface. Do you have any concerns with the steps here on the screen?
Ben Schwartz: No, I have no understanding of what is being proposed here so I have no problem.
Tommy Pauly: Okay, cool. So there's no objection to saying we should ask for A when we're in that situation. Okay, great. Lorenzo? Jen, did you have more?
Jen Linkova: I was saying, first of all, the document already kind of implies a single interface but does not say it, so I guess we can continue doing that, right? We can continue saying "if there is IPv6 of global scope" without saying "on the interface" or "in the PVD" and leave it to the implementation; that might be an option. No? Whatever. But as for calling it a bad configuration: it's not a bad configuration. Most corporate enterprise devices are configured that way, so I just do not think we can call them bad examples; there are way too many of them.
Tommy Pauly: Yeah, Ben, maybe I can help clarify. Even without HE of any version, a split-tunnel VPN will always require something like this, right? You're going to have to do a lookup on one of the interfaces, and even in the single-protocol case, without v4 or v6 in the picture, if you find an IP address that falls outside the VPN, you have to send the packet on a different interface. So it's unrelated to HE or multiprotocol or dual stack; it's just how it works. I do think we need to clarify what is and isn't in scope. Tommy, I agree with your intuition that this type of multi-interface racing is out of scope and this type is in scope; I think tightening that up would be helpful. And I think saying PVD is maybe the best way forward here, because it's a concept that we've documented; it's described, and it is broader than an interface, just as these use cases are broader than a single interface. So having a good specification of what's in scope and what's out of scope would be good.
Tommy Pauly: And just on that scoping bit: looking at the exact text in the charter, I'm pretty happy with how it phrases it, at least in the sense that the algorithm can apply to different things, but specific mechanisms that exist just for doing multi-interface or multi-PVD stuff are out of scope. I think we go into bad territory once we start having a whole block of text like "And here's how you race against this other set of answers for this other interface." But as long as you can describe it in a way that works for one, it's okay if it applies to more. Yeah.
Lorenzo Colitti: And I would strongly caution against saying "any address anywhere." That's basically impossible to get right. I think it's important to say: no, if you have another interface on some other unrelated network, do not use, say, the IMS address on the WiFi network. It will just never work; don't ever do it. Actually, the thing I got in the queue to say is: can you go to the example? I hope that either I am confused or the example is wrong, because I hope the text is right. So here I think what we are saying is...
Tommy Pauly: But I think in the VPN case that we discussed yesterday, the desired behavior is that we connect over IPv4, right? Because in this case, we would...
Jen Linkova: Yeah, but in this case you also get a global v6; it's not a v4-only example.
Tommy Pauly: It's not the exact VPN example, yeah. It's an example where you get two answers, one of them from DNS64.
Lorenzo Colitti: Right, the VPN case would have given you only the NAT64, the Pref64-encapsulated address, and a native IPv4. It would never give you native v6, because native v6 doesn't exist there.
Tommy Pauly: Right, so what are we going to do here?
Jen Linkova: We're going to get the two addresses, v6 and v4, and we're going to apply Happy Eyeballs as normal.
Lorenzo Colitti: We just deconstruct the synthesized address.
Tommy Pauly: But it's not an instructive example, because even if the NAT64 prefix doesn't exist, we do the same thing.
Jen Linkova: Yeah, but if there were no global v6 address, just v4 and a synthesized address, you would get just v4 in this example. Forget about...
Lorenzo Colitti: Yeah, but it doesn't matter what the v4 is; even if there was no v4 address, we'd still end up using 2001:db8::1. I'm just saying the example is probably not very helpful. It's not wrong.
Tommy Pauly: Right, right. And to be clear, unless I've missed it, I don't think an updated PR is around yet. So what we're talking about here is essentially the proposal, and that exact text still needs to be finished.
Jen Linkova: So yeah, like I said, if the example is simply not illustrative enough, I think that's fine. We should ignore it.
Tommy Pauly: No, the example is saying that we do not want to prefer synthesized addresses over v4. What we were initially discussing is: if we have those three addresses, in which order? What's the preference? Do we want the NAT64 address or the v4? That's why I put this example here. It's more generic than the VPN case; the VPN case would be a subset of this without the first v6 address.
Lorenzo Colitti: I mean, this gets to another point. You're the author of the CLAT draft, right?
Jen Linkova: Yeah.
Lorenzo Colitti: Do we have any text anywhere in 6724bis that says, if you have a CLAT address, don't try to use it to connect to a native v6 destination? Because CLAT is documented in your draft to use a dedicated IPv6 address.
Jen Linkova: v4 address.
Lorenzo Colitti: Yes, a dedicated v4 address, but also mapped directly to a single dedicated v6 address.
Jen Linkova: Yes.
Lorenzo Colitti: Do we say anywhere that that dedicated v6 address should never be used to connect to a native destination?
Jen Linkova: Why?
Lorenzo Colitti: You don't think it ought not to be? Because if we do use it, the packet will basically get dropped on entry: since the v6 address is dedicated to the CLAT, it will be translated and it won't work. So, off topic, out of scope...
Jen Linkova: Yeah, I think we...
Lorenzo Colitti: I think we do need to fix that, though. Okay, sorry.
Jen Linkova: Next up we have...
Nidhi: So, the example is not instructive for the VPN case, but it also shows the other case I was concerned about: if you have CLAT and DNS64, you probably get every v4 address twice, one time through the A record and one time through DNS64. With the algorithm we agreed on yesterday, we are now able to filter these out and treat them as a single v4 address, which in the end it is. And I think this solves a lot of problems.
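The de-duplication described here can be sketched as follows: with a known Pref64, a synthesized AAAA answer is stripped back to the IPv4 address embedded in it and folded into the v4 candidates. This is a hypothetical illustration, not the draft's text; the function name is invented, and it assumes a /96 prefix with the IPv4 address in the low 32 bits (the RFC 6052 well-known layout).

```python
import ipaddress

def build_candidates(aaaa_answers, a_answers, pref64):
    """Split DNS answers into (v6, v4) candidate lists, de-synthesizing
    any AAAA answer that falls inside the known NAT64 prefix."""
    v4 = [ipaddress.IPv4Address(a) for a in a_answers]
    v6 = []
    for raw in aaaa_answers:
        addr = ipaddress.IPv6Address(raw)
        if pref64 is not None and addr in pref64:
            # Synthesized: recover the embedded IPv4 address (the low
            # 32 bits, assuming a /96 prefix).
            embedded = ipaddress.IPv4Address(int(addr) & 0xFFFF_FFFF)
            if embedded not in v4:   # drop the CLAT/DNS64 duplicate
                v4.append(embedded)
        else:
            v6.append(addr)          # non-synthesized answers stay IPv6
    return v6, v4

pref64 = ipaddress.IPv6Network("64:ff9b::/96")
v6, v4 = build_candidates(
    ["2001:db8::1", "64:ff9b::192.0.2.1"],  # native v6 + synthesized AAAA
    ["192.0.2.1"],                          # the same host's A answer
    pref64)
# The synthesized answer collapses into the existing v4 candidate:
# v6 == [2001:db8::1] and v4 == [192.0.2.1]
```

With both CLAT and DNS64 present, the A answer and the synthesized AAAA answer carry the same IPv4 address, and this step reduces them to one candidate, as discussed above.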
Tommy Pauly: Great point. Ben.
Ben Schwartz: Hey. So, I have no objection to some IP preference-order rule, or a rule that says you cross off certain IP addresses when you have others that are known to be equivalent but better, like in some sort of NAT64 situation. I think that's fine. This multi-interface stuff, though: the more I think about it, the more complicated it gets. I really think that should go to a separate draft. A quick example: imagine you have two interfaces, you do DNS resolution on one of them, and you get back an IP address that by policy would cause you to use the other interface. In my view, you should flip to that other interface and re-resolve, because...
Tommy Pauly: Absolutely.
Ben Schwartz: Except for VPNs, because VPNs are weird.
Tommy Pauly: Even on VPNs, you should re-resolve through the VPN and get a properly optimized IP address through the VPN. But that IP address could take you back out, causing you to go through... So there's unlimited complexity here. Let's talk about it somewhere else.
Tommy Pauly: Yes. All right, I totally think that is out of scope for this document and this group. But I can tell you from experience that there are cases where, and it's baffling, something will only resolve externally on a non-VPN resolver but will give you an address that only works inside the VPN. It relies on doing exactly this, and if you try to re-resolve it, or route it the other way, it's not going to work. There are lots of bad configurations out there. Yeah, I agree we shouldn't try to address them; implementations may have to deal with them, but that's separate.
Eric Kinnear: So I hopped in the queue to chat a little bit about the scope question. Giving our charter a close read, I don't think we're talking here about the thing we declared out of scope, which is racing different multipath stuff and trying to figure out which path is faster. What we're talking about here is that partway through the connection establishment process, your path may have changed, because you may have discovered that you got routed somewhere else, but you are still very much only using one path at a time. The fact that a few milliseconds ago you used a different one is unfortunate and sad, but I think we're still in okay shape to try to help people cope with it. And I do appreciate that the proposal here tries to be as general as possible and cover some of the weirder edge cases without spending a ton of time leading people in that direction. So, at least from my perspective, it's not unreasonable to have some of that guidance in this document. I will reiterate, to Ben's point, that I don't think anybody is excited about, or would like anyone to ever configure, something that resolves in one place and sends traffic in another; that is generally well known to be a bad plan. So what we're talking about here is: is there something we can do that reduces weird brokenness when people configure stuff strangely? And that is in fact the entire point of Happy Eyeballs: to make it so that the user doesn't see weird brokenness when somebody configured something strangely and perhaps should go fix it. So I think for right now, having something like this in the document makes sense.
If we want to take more generic guidance about this and pull it out of the document, either into another document in this working group or in 6man or in some other working group, I think we should do that in parallel. But let's see what we can do to give a minimal set of guidance that doesn't lose everybody down the rabbit hole while they read this, but such that if you generally follow it, you should be mostly not screwed up.
Lorenzo Colitti: Yeah, I think the error was to write the word "interface" anywhere, even in the charter. Going back to the first RFC that defines PVDs (I don't know what the number is; it's in the 6000s or 7000s, I think), a PVD is a consistent set of configuration information, and the PVD is the level at which a configuration error can occur. An interface is most of the time a PVD; there's sort of an implicit PVD concept attached to an interface. But the way VPNs really work, especially split tunnels, is that the VPN interface is sort of the main thing in the PVD, but it implicitly pulls in all the routes outside the VPN from whatever other underlying default network there is on the system. That's how it works, that's how it's always worked, that's how all implementations behave and how operators expect them to behave; it's just how it's supposed to work. So I think using the word PVD is just better. I'm not saying we should amend the charter, but I think we owe it to implementers here to say: let's try to use the word PVD instead of the word interface in this document, because then it will become a lot clearer, and there's a good definition of it in the MIF RFC (again, I forget the number). Because if we don't do that, if we try to tiptoe around this stuff, this is the stuff people are going to get wrong. I mean, our implementation does some stuff wrong in this case, and I don't know how to fix it. If you don't at least tell people, look, these are the things you need to look out for and here's a good way to do it, it's really just not going to be done right.
So yeah, let's see if we can use PVD as much as we can, and basically try to dispel the word interface from anywhere in the document, because it's not going to help. And then, Ben, I think the things that you think are stupid and out of scope and not supported will not be; I think we'll see pretty clearly, and it will be clear that when operators configure something with a given intent, the system will actually behave the way they expect. So I think it's going to be better.
Tommy Pauly: All right, thank you. I'm going to file an issue about referencing PVDs. Okay, I think that drains our queue. So the next step on this one is that we need some actual text incorporating what we discussed. I think the broader changes to talk about PVDs throughout the document should be separate from that, but I think we have a way forward here. I appreciate the continued help from the folks who spoke here in crafting and refining that text. Okay. Moving on, the last large chunk I wanted to talk about is the large pull request from Ben. I recently went in and made sure we have rendered versions and rendered diffs, to make it easier to consume as a document. This is a very large restructuring proposal, and here I mainly want to talk about what I see it as trying to achieve, and then ask some broader questions about what we want for the structure and flow. This is purely my perspective on it, but Ben, I will absolutely want you to give your perspective and insight as well, since we have not spoken about it in detail yet. Cool.
Ben Schwartz: All right. So, the truth of this, I think, is that we had a discussion about this during chartering. The charter said that we would specify requirements for algorithms and provide an example algorithm. Somewhere along the way, after I complained about this, maybe last time, that text was removed from the charter, so the charter does not actually say that. But the draft does, twice actually: in the abstract and in the introduction, it says that the document will provide requirements for algorithms and an example algorithm. And then it does not really do that. It does, I would say, something in between. It lays out, in normative language, requirements for behaviors, some of which look like requirements about what you have to achieve, and some of which look like requirements about what you're specifically supposed to do to achieve it. Maybe the clearest example is the grouping and sorting logic, where there's a pretty clear distinction between what we're trying to achieve, which is to produce an order of connection attempts that meets certain criteria, and the specific approach you should take, like breaking the addresses into groups in certain ways and sorting them. Sorting, strictly speaking, is not required at all to ultimately produce the same behavior. But it's not just about the minutiae of implementation; it's also about clarifying the amount of wiggle room. One aspect of this, and this is a personal opinion: I think there is basically one implementation of Happy Eyeballs v2. In fact, there was a research paper last year, I think, that analyzed all of the Happy Eyeballs behaviors and compared them against the Happy Eyeballs v2 specification. There is one that actually lines up with the specification in all their tests; everybody else has their own variations.
And I think that that's fine. That doesn't bother me at all. But we should acknowledge that clients are going to have some amount of diversity. They're not all going to implement the same algorithm. And we should instead write down, normatively, what we actually care about: what those clients need to do, or really should do, in order for the internet to work well. And we can also provide an example algorithm that documents the wisdom of one particular implementation that did it a particular way, but I don't think we should expect everybody to do it exactly the same way.
Tommy Pauly: Yeah, thanks for the summary. So you're absolutely right that with v2 there has primarily been one implementation, although later we're going to be hearing from Max about other experience, and I know the Google team is working on stuff. So I think one of the hopes and goals for v3, and the work of this working group, is to get more implementation consistency in terms of how things work. Now, you're absolutely right that there is a distinction between what is required for interoperability purposes, like "this is what you need to make the internet work," and what is really up to the implementation. But I would come down on the side of saying that almost all of Happy Eyeballs, intentionally and correctly, is just a single-sided algorithm; it doesn't really have much to do with interoperability. I view the relationship more as: even though this is standards track, it's a little bit more like a BCP, best practices on how to establish connections in a way that will have the effect of making the user's eyeballs happy, and avoiding broken network deployments or server deployment realities, while doing so in a way that does not violate any normative requirements you have, really, from other documents. You've got to make sure you stay within your bounds and don't screw up the protocols in what you're doing, but I think the driving thrust, the goal of why you're looking at the algorithm, is to learn how to do this. So here I'll jump through the slides and then we can just have a broad discussion. Right, like it says here, this is the particular piece of text. This is a holdover from v2. I'd have to talk to David about why exactly we phrased it this way; I don't remember. And you're right that it doesn't really reflect, I think, even how v2 exactly works.
But I think the broader question of this topic is: how do we want to draw the line between requirements, best practice recommendations, and examples? So, just looking at the structure here: the thing on the left is some truncated bits of the table of contents from the current document. The part on the right is essentially the equivalent text as you are rearranging it. Currently, the text is very much based around the three overall stages that we've talked about in this group time and time again: you do your resolution, you do your grouping and sorting, you do your connecting. And you're right, within those there's a mix: there are a couple of notes, overall not really that many, on the normative things you have to do, but more of it is recommendations for how you go about running an algorithm. And then at the end, there's the summary of "and here are the tunable things," as more of an implementation guide. So then in Ben's PR, there's essentially the same three stages, but trying to distill them down to just the normative things, without talking about how you would implement it. And then you have something that's labeled as an example, with the tunable things, again going through those three stages about how you'd go about doing it.
Ben Schwartz: Notably, I didn't have to add a lot of text for this, because we already have effectively duplicated most of these things.
Tommy Pauly: Right, so a lot of this I think is editorial: how do we present it to people and communicate clearly. So I think it's a great point in that regard. Now, my subjective comments. First, on the motivating thing here, the "example algorithm" line: I suggest we should just ditch that particular text, because it doesn't align with what the charter has, and it doesn't align with what the document actually does. I would characterize the split here as being more between the normative requirements and the recommended algorithm approach. I think that's a little bit different than just an example, because it is something where we're saying, "This is the thing we're recommending you do as part of the document." You don't have to follow it; you're still a good internet citizen if you don't follow it; but if you're trying to get the benefits of it, this is what we recommend. So, as I mentioned before, very little of this is actually normative. It's best practice. Then, when we talk about the structure here, my main concern personally with the split you had is that it's more repetitive, and I think it's potentially less clear to the reader going through, because I go through this section up here, and then a lot of the commentary about how I'd build it is way down in a whole different section of the document. But I feel like there are ways we could achieve what you're talking about, potentially differently. So either we could keep the structure more like what it is on the left here, in terms of the narrative of the discussion saying, "Hey, there are these three stages, here's how they work."
And we could pull out the relatively few normative things, potentially near the end, like where it has the tunables here, and say, "Hey, by the way, when you're doing implementation, here's the checklist of all the things in other RFCs that you need to not violate." So we could essentially have this normative requirements checklist; that's alternative one, the first arrow. Or the other way would be, for each of the sections, to very clearly denote "here's the normative bits, here's the recommended approach," but group them interior to the three sections. So we'd essentially have a section four about resolution, with the parts like "here's the recommended approach and here's the normative bits." And then the grouping and sorting, or what you call filtering and prioritizing, is all together, with the "this is what you have to do, and here's an approach that we recommend to do it." We'd intermingle those parts but be very clear, in each of the three steps, about what is required for interoperability and non-violation of other protocols, and what is the algorithm as implementation guidance.
Ben Schwartz: Yeah, so I'm totally open to whatever structure. The point I was trying to get at here is that the question of normative force is very hazy and kind of muddled to me in the current draft. I'd like that to be a lot clearer. And I think that separating them in this way was very helpful for me to see that the algorithm strikes me as very underspecified in some places, and the requirements also strike me as underspecified; separating them made it a lot clearer which parts were missing and where we were sort of dodging, by specifying requirements but not how to meet them, or specifying what to do but not what is actually mandatory about it. So I think anything that pushes the text toward more clarity there is a good step. I'll give you one example. You can see from that outline that there's really a fourth stage in the algorithm, in which the requirements require you to take the inputs from the system, like which routes exist on which interface or whatever, and apply some rules, which are specified, in order to derive the configuration parameters. Like, whether to send A and AAAA queries depends on some rules about the existence of a default route of each family, and also something called a preferred address family, which can change the priority. So there's logic about what to do with that. I would say that's a stage of the algorithm.
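[Editor's note: the "fourth stage" described above can be sketched as a small pure function. This is purely illustrative; the function and field names are hypothetical and not from the draft. The idea is that system inputs, such as per-family default routes and a configured preferred address family, are mapped to the parameters the resolution stage consumes.]

```python
def query_plan(has_v4_default_route, has_v6_default_route,
               preferred_family=None):
    """Derive resolution parameters from system inputs.

    Hypothetical sketch: whether to send A and AAAA queries depends
    on the existence of a default route for each address family. The
    preferred address family does not add or remove queries; it only
    biases which family leads the attempt ordering.
    """
    send_a = has_v4_default_route
    send_aaaa = has_v6_default_route
    if preferred_family is not None:
        first_family = preferred_family
    else:
        # Default bias toward IPv6 whenever it is routable at all.
        first_family = 6 if send_aaaa else 4
    return {"send_a": send_a, "send_aaaa": send_aaaa,
            "first_family": first_family}
```

A real client would feed the output of such a function into its resolver; the point here is only that this derivation is a distinct, testable transformation.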
Tommy Pauly: Right, which I guess I've always viewed as part of what's here the first stage of resolution because that's where we describe it today.
Ben Schwartz: So yeah, for me it's just about clarity. And to me, the requirements are more interesting than the implementation. I don't think that my team is likely to really implement anything that resembles this algorithm as specified. But I think we're very interested in gaining the wisdom of what behaviors it's trying to achieve, seeing which of those could make sense in our context, and matching or meeting those requirements in some way. Lorenzo.
Lorenzo Colitti: I can see how there'd be lots of objections to ripping the draft apart and refactoring it like this. But I also see that when we say very little of this is normative, which is true, and then we say this is standards track but there are no interoperability consequences if you violate any of it, that's sort of unusual, right? You mentioned the word BCP, right? If we said, look, this is a BCP, and implementations are free to deviate... I know there are things in this draft where we make strong statements that we think are important for security and other properties, right? But the truth is, a bunch of the implementation variance is going to be there anyway, because if the interoperability requirements aren't there, implementations are basically going to do what they want. So I think one way to address Ben's concern would be to tweak it a little, without refactoring and separating it into example and reference and requirements, and just say: look, this encodes our collective best judgment about how to do this, and just call it a BCP. Because then really none of this is normative. And then you could say: look, you must do this, because if you don't, here are the consequences; and, as you say, there's very little of that. Right.
Tommy Pauly: And I haven't had the time to do the exercise of pulling out all the actual normative things, but when I scan through it, I think almost all of them, if not all of them, are essentially normative requirements that come from somewhere else, right? Like, you must sort the things according to the destination sorting rules that come from over here. And you must partition your historical data and cookies, because that comes from somewhere else too. So I think a lot of those normative requirements really come down to: here is the bag of things that, when you are implementing the client connection algorithm, you can bump up against, and you've got to make sure you don't violate them. They're not new claims on the world from this document; it's a bunch of stuff we're gathering to help make sure you don't violate them.
Lorenzo Colitti: I hate to say it then: informational. I'm just kidding. I don't really care too much about the status, but I do view it as slightly different from informational, because it is a bit opinionated: it says this algorithm, if you follow it, will get you specific results around brokenness. And you don't have to follow it. If you want to be broken on a network that has busted v6 connectivity, or if you don't want to have a good user experience, you don't have to do it. But the thing we're trying to enforce is more between the implementation and the users, less between implementations on a network.
Eric Kinnear: And that's the case for every standard, right? We are not the people who can force you to follow anything that we publish, regardless of what status it is. So I think we talked about that a bit earlier. If we think there's a different reason, based on how things are showing up, that we'd want to change the status, we could certainly do that. But let's not use an opportunity to restructure the document for clarity as a reason to reopen that. That would be more of a "hey, we looked at it all together and this is what we're seeing." Anyway, if it's supposed to be informational, you're not supposed to use "must" as normative text in informational documents. But if it's the case that all of this language comes from other documents, then I don't think that would apply here. Anyway, I don't have strong feelings about this. The other thing is: if somebody comes up with something better, and it's technically non-compliant and violates a "should" or "must" here, I don't think we would want that to be invalid, right?
Tommy Pauly: Correct. And I think we should be very careful to make sure... and I think it is already the case, that there's no "should" or "must" about how you have to write your code around it. We should be careful not to add "must"s for things that are just a choice about how to implement it. Ben.
Ben Schwartz: Yeah, so you described these two categories of things: things that you have to do to comply with other standards, and things where, hey, you can break it, but then you're going to get broken behaviors; that's on you. I think there's a third category in between: things that the text, normatively or not, tells you to do, where the consequences of doing or not doing it in exactly that way are not that clear. Like, we were having a discussion about timing your retries to the retransmission timeout, or there's this business of, well, you try v6, v4, v6, v4, or you could try v6, v6, v4. The exact ordering of how you do that doesn't matter.
Tommy Pauly: Right, and that ordering one is specifically an instance where v2 has said, "This is a tunable; you can essentially have a preference for how many you interleave." Actually, another thing to pull out, about racing overall and, I think, something you were pointing out about doing the grouping: the document talks about it as you create these groups, then you kick off the groups and let them cascade with their timers. You could do the exact same thing, and it is isomorphic, by creating an ordered list of preference across priorities, where you interleave between priorities. I think normatively we should not care about the difference between them. The reason it describes it the way it does, I think, is that it's easier to write the text that way, but I think it would be totally fine to say, "If you have some other way of achieving this same end, cool, that's fine."
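[Editor's note: the equivalence described here, grouped candidates with cascading timers versus a single interleaved preference list, can be sketched as follows. This is a hypothetical illustration, not text from the draft; the "first family count" knob is in the spirit of HE v2's interleaving tunable and controls how many preferred-family addresses lead before the families alternate.]

```python
def interleave(v6_addrs, v4_addrs, first_family_count=1):
    """Flatten two per-family candidate lists into one ordered attempt list.

    The first first_family_count addresses of the preferred family
    (IPv6 here) are tried first; after that, the two families
    alternate. The flat ordered output is isomorphic to grouped
    candidates fired off with cascading timers.
    """
    v6 = list(v6_addrs)
    order = v6[:first_family_count]
    remaining = [v6[first_family_count:], list(v4_addrs)]
    turn = 1  # after the leading run, the other family goes next
    while remaining[0] or remaining[1]:
        if remaining[turn]:
            order.append(remaining[turn].pop(0))
        turn ^= 1
    return order
```

With two addresses per family and the default count of 1, this yields the v6, v4, v6, v4 pattern mentioned above; raising the count front-loads more of the preferred family.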
Ben Schwartz: But then you're no longer describing an algorithm. That's a requirement. And that's the division that I want. An algorithm says you do it this way; algorithms have run time, right? An algorithm is order N squared or whatever. But requirements are abstract.
Tommy Pauly: Right. But I personally think it is okay to have this be a description of "here is an algorithm that gets you this." If you want to tweak the algorithm, or have a different way that comes up with the same effect on the wire, and if you want to structure your code differently to get a different O of something, that's fine.
Ben Schwartz: I've spent enough time on this already, so I'll say one more thing: I would actually like the algorithm to be more algorithmic. Ideally, each of these sections, in my view, would have a pseudo-code function signature at the top: here are the inputs and the types of the inputs to this stage; here is the output and the structure and type of the output of this stage. I think that vocabulary would be very valuable for me as I'm trying to pull out the pieces of this that might be useful, and to have vocabulary about the transformations of the steps of connection.
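[Editor's note: to make this suggestion concrete, here is what typed stage boundaries might look like. All names and fields here are hypothetical vocabulary, not drawn from the draft; the point is only that each stage declares its inputs and outputs, so implementers can swap internals while keeping the boundaries.]

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """One attemptable endpoint. Fields are illustrative only."""
    address: str
    family: int        # 4 or 6
    protocol: str      # e.g. "tcp" or "quic"
    priority: int = 0

def resolve(hostname, send_a, send_aaaa):
    """Stage 1: issue DNS queries, return unordered Candidates."""
    raise NotImplementedError

def order_candidates(candidates, first_family):
    """Stage 2: filter and sort Candidates into attempt order."""
    raise NotImplementedError

def establish(ordered, attempt_delay_ms):
    """Stage 3: race staggered attempts, return the winning Candidate."""
    raise NotImplementedError
```

The bodies are deliberately unimplemented; only the shape of the data flowing between the three stages is being named.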
Tommy Pauly: That's an interesting idea, yeah. Okay, I think we've drained the queue. That is all the content we had for the main doc today. Thank you very much.
Lucas Pardue: There was one little bit of discussion in the chat about why HE v2 was not widely implemented. Is there any comment on that? Anything we can learn from that? Dave and Miriam had asked those questions.
Dave Plonka: Uh, this is Dave Plonka with Net... oh, sorry, I didn't realize you were there. Go ahead.
Lucas Pardue: Yeah, I'm yeah I'm here.
Dave Plonka: I can talk a little bit about what happened in the comments. Someone brought up the TU Munich paper that we talked about in Madrid, I believe, or Montreal. Basically, the one takeaway is that only Safari implemented HE v2. And for those of us who are measurement researchers: Miriam was asking why people would not implement HE v3, and I said, well, why don't we look at why they didn't implement HE v2, because we have many years of data there. And I'm only guessing, but my suspicion is they said, "This seems to mostly work with HE v1." The world changed in the last five years in how often v6 works well, so maybe that's why they were able to get away with HE v1; people weren't really noticing. But my guess is they thought it was complicated. So the cool thing, and why I have enthusiasm about v3, is that this is the opportunity, as you all know, to make HE v3 at least seem step-wise simple to implement. And I think that's everyone's goal. I don't know how we can best do that, but I'm willing to try to help. The TU Munich people are continuing; if they're interested, they could keep measuring what HE v3 does. So between the communities of engineers and researchers, we should be able to figure out why those implementations aren't there. Some of the people that didn't implement HE v2 are involved, I'm sure, in HE v3, so they can tell us why they didn't do it, and they can help us guard against writing a document that is as hard or much harder than HE v2 was. Maybe that's not the case, but that's one possibility of what's going on.
Lucas Pardue: And of course the scope of v3 is rather greater.
Lorenzo Colitti: It is very complicated, for sure. I think probably most folks looked at it and said: for the v6 brokenness problem, you can get around it, but it's rare enough that you don't really need to build something complicated. The other advantage of HE v2 is that it basically tries harder: it performs better when SYNs get lost, right? It drives those connection attempts a little bit harder, and so it results in better latency to connection establishment on slightly more lossy networks. So the question is: do you even see that in your metrics, and is it big enough? Is it worth the effort? I know that HE v3 is even harder to implement, so I don't know how successful that's going to be. Maybe bits of it are going to be cherry-picked, perhaps. The other thing to say is that HE v2 was, I think, mostly written after the fact, based on one implementation that was already doing a lot of that stuff, and other implementations may be structured somewhat differently. In particular, a cross-platform implementation needs to deal with getaddrinfo, and a lot of platforms can't do HE v2; a lot of platforms didn't or don't have an async DNS API (I think now they do), so that's more difficult. I don't know if this is useful, but I think the difficulty has a lot to do with it, and I don't know if there's harm in writing something sophisticated that people don't implement. Maybe there isn't. Technically it is better; how much incremental gain the complexity is going to bring, I don't know. It might be worth looking again at the numbers from the implementation that does have HE v2 and seeing if they could even measure the difference between that and HE v1.
I suppose the answer is no they can't because they'd have to write an HE v1 implementation. So yeah, I think I don't know what to say about that but...
Tommy Pauly: Right, I'll go very quickly with some thoughts on this. For the implementations, I don't have exact references, but I do recall that there are other HE v2 implementations. If you're measuring browsers, though, you're right: there's just one browser. And I think there's a distinction between "do you have any library that does it" versus "is this in the major browsers that you're measuring." And I think there are a lot of ways in which, speculating, this is maybe a consequence of the different layers in how the different browser stacks work, a consequence of how the teams are set up at the companies working on those. Because at Apple, the team working on Happy Eyeballs and connection establishment was the same one dealing with deploying IPv6-only networks and doing the asynchronous DNS library, so it was easy to tie that together. That's not the case for a lot of the browser implementations, who are having to use getaddrinfo on a system they don't have control over. And, Lorenzo, you work on the v6 stack, but at that time it was distinct from the team working on the cross-platform browser, so it's a bit of a separation of concerns too.
Stuart Cheshire: I'm Stuart Cheshire from Apple. Listening to the previous discussion, I was struck by an interesting philosophical point about normative text, because traditionally in the IETF, when we say "must," what that means is: if you don't do this, it won't work, or it'll cause unreasonable harm to the network. And here we're in a situation where the point of Happy Eyeballs is to get a better user experience. Doing things the way it used to be done, sequentially, waiting for each connection to time out, technically does work, if you're willing to wait 30 minutes for a web page to start loading; but practically, that's not a good user experience, and people give up before that. So it does shine an interesting light: "must" is no longer the clear-cut line of "does it work, does it not work"; it becomes "does it work well enough for the customer to be happy and not give up." I thought that was an interesting nuance. On the adoption question: there was a comment that Happy Eyeballs v2 was only done in Safari, and in one sense that's true, but for people who don't know, I wanted to elaborate, and we have Lorenzo present, and Eric and Tommy, who can correct me if I'm getting any of this wrong. The way Apple did Happy Eyeballs v2 was in the network stack. So Safari gets the benefit of that, but so does every other app running on iOS that's using the recommended network APIs. To present this as "narrowly, only Safari does this" I think is incorrect. I think it's better to say every application that you run on your iPhone gets this for free, or at least the ones using the recommended APIs. And my third comment: the way I view this is that we've done a lot of work at Apple, going back 15 years, continually trying to make the user experience better. We did Happy Eyeballs v2 because it made things better, and we're working on v3 because it makes things even better.
If other people don't want to do that, I'm sure there are certain people in our management who would love for iOS to have the only good user experience for networking. As engineers, we don't take that view. We benefit when everything works better. So we want to share our thinking about this. It's a two-way street: we share this publicly with other people who want to do the same thing, and we also benefit from feedback and criticism from other people looking at it, who may have spotted problems we hadn't thought of. One example I ran into a couple of years ago was with the Matter home automation protocol: that's not web browsing, but it is software that has to communicate over the network with other devices that may have v4, may have v6, may have multiple paths. And we ran into an issue where there were problems on Android, because if it gets the wrong v6 candidate address and waits for a timeout, you're waiting 15 minutes for your lights to turn on. Again, technically, per the "must," it does work; but who wants to wait 15 minutes for the lights to turn on? Practically speaking, that's not working. And the unfortunate situation was that software written on iOS that uses the networking APIs gets Happy Eyeballs for free, while the way it was implemented on Android at that time, it was built into the browser, not into the network stack for all applications. And we got a lot of pushback saying, "Oh, we can't do concurrent connections, that's too hard." And for somebody working on open standards, that's really sad to me. So I would love to see these techniques built into the base networking APIs, so every application gets it. It's not a web browser feature; it's a feature for every application that needs to use the network, needs to do it reliably, and needs to do it quickly, before the user gives up and throws their phone on the ground in disgust.
Nidhi: I just want to speak from the perspective of Chrome. As Tommy said, there are some aspects of differences in team structures and how that works. But there is also the aspect that this is a complicated algorithm. And yes, we don't implement v2, but in Chrome we do implement what I guess we can call v1++, where we implement some aspects of v2, but not all the way to v2. But now that v3 has much more consideration of QUIC and HTTPS resource records, it makes it more worth the effort to get those benefits for users, and that's why we've been investing in that effort more now.
Eric Kinnear: Do we have a few seconds? Yeah. Stuart, before you... sorry, go ahead. I wanted to chime in about a thing you said earlier, actually: some of the benefit people see from v2, and I very much agree with you, is literally just that when SYNs are lost and things like that, it tries more aggressively. And that comes back to something Stuart said a minute ago: some of the "must"s are, yeah, sure, you must not wait 30 minutes for a connection to come up. But there are also, I think, a couple of limits in the other direction that say, "Hey, your minimum time between attempts is such-and-such," and some of those are things that we would probably claim are present in the document in an attempt to protect the network from that kind of excessive load being generated.
Tommy Pauly: Like don't open a connection every two milliseconds, yeah.
Eric Kinnear: Yeah, exactly. So v2 does help with some of that aggressiveness, and I think some of the stats that folks have shared over the years show that it is able to get a user a connection sooner, which is really the goal: you want a good connection, sooner. But some of the normative language in there is specifically trying to protect things. Okay.
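[Editor's note: the "limits in both directions" point can be sketched with a tiny scheduling helper. The clamp values below are made up for illustration, not taken from the document: the idea is just that a tunable attempt delay is bounded below, so the client doesn't flood the network with back-to-back SYNs, and bounded above, so the user isn't left waiting.]

```python
def attempt_start_times(num_attempts, attempt_delay_ms,
                        min_delay_ms=100, max_delay_ms=2000):
    """Return staggered start times (in ms) for connection attempts.

    The delay between attempts is clamped: the lower bound protects
    the network from aggressive retry bursts, the upper bound keeps
    connection establishment responsive. Bound values here are
    illustrative only.
    """
    delay = min(max(attempt_delay_ms, min_delay_ms), max_delay_ms)
    return [i * delay for i in range(num_attempts)]
```

A client would cancel any still-pending attempts as soon as one of them succeeds; this helper only shows the clamped stagger.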
Lorenzo Colitti: Yeah, I just wanted to say, Stuart, from an engineering perspective, I totally agree; saying "well, I wish my competitors didn't implement this," I do get that. I think one issue is that v3, by its very nature, requires glomming together DNS and TLS. And in a layered architecture that's sort of inherited from Linux, that's just really hard. If Chrome wanted to do it the way, let's say, iOS does, we would have to have one library that does DNS and TLS and QUIC in the OS, and Chrome would have to use that library. And in terms of Chrome being a multi-platform thing, that's just organizationally really, really difficult, I think. And it's a reality in many of these software projects, right? Because very few parties own enough of the stack that they can actually write this all rolled into one algorithm and benefit from these cross-layer optimizations. Anyway, that's an architectural concern. Yes, your competitors can't implement this for these reasons, and that's better for you; that's good.
Eric Kinnear: And next up on that line. Go ahead, Jordi.
Jordi Palet: Jordi Palet. Hearing the last few speakers, I just realized something I am not sure is actually covered in the draft. Happy Eyeballs version 3 can be implemented either at the operating system level, let's say the network stack or whatever, but also by specific applications like browsers. Would it not be much better to state that the implementation should be done at the operating system level instead of in specific applications? That would have the advantage of avoiding unwanted interactions between specific applications and the stack if both implement Happy Eyeballs version 3. I am not sure if that could happen, but it's a possibility. It would also have the advantage that all applications would actually take advantage of Happy Eyeballs version 3, right? Just an open question. I am not sure whether it is covered somehow, at least as a recommendation, or a strong recommendation, or even a "must," in the actual document.
Eric Kinnear: Thank you for the comment. Yeah, to echo some of the chat, I'm not sure that we can necessarily require who is going to implement the spec. There are certainly a lot of places where we wish we could make people implement specs, but I don't think we necessarily want to go to folks and say, "Hey, please don't, this is out of scope for you," if they can provide the benefits. Keeping an eye on the time: I think next up we have Max, who's going to give us a quick set of slides about some of the exciting things happening in Firefox land. Do you want to request slides, or do you want me to put them up and pass control to you?
Max Resing: If you don't mind, that would be kind. Thanks.
Eric Kinnear: Perfect. All right, here's these. And here is slides control for you. You should be able to just hit the arrow key.
Max Resing: Wonderful. Okay. Thank you very much. Yeah, as said, I'm Max, working at Mozilla, more specifically on the networking stack, focusing on HTTP and QUIC, but in this case Happy Eyeballs. What I want to show is how we implemented Happy Eyeballs v3, as in the draft, in Firefox, and what we see so far. If you would like to try out what I'm discussing today, you can download Firefox Nightly. So far our implementation is only in Firefox Nightly, and it is disabled by default. So you would have to go into your installed Firefox Nightly, go to about:config, and flip the setting to true. From there on, without even restarting, any website you visit will use the new Happy Eyeballs implementation in Firefox. One example for testing this out is the happy-eyeballs.net tester. This is related to the TU Munich group and the research paper floated around in the chat and discussed here, the research paper around Happy Eyeballs v2; they have built really good tooling now for Happy Eyeballs v3, so I can highly recommend that website. But any website will really work if you want to try it out. Happy Eyeballs is quite dependent on, or becomes really interesting once you have, HTTPS DNS resource records, and Firefox is not able to get HTTPS RRs on all platforms. So consider enabling DoH, DNS over HTTPS, so that Firefox will have access to HTTPS records on all platforms. In terms of status, as I said, it is in Firefox Nightly, disabled by default for now. We don't have any major deviations from the draft. Everything that came up we discussed on GitHub. So very well done, thanks for the excellent draft. We are very likely to tune the resolution delay and the connection establishment latency; I'll go into a little detail on that based on metrics we're about to collect.
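The two delays Max mentions tuning are the staggering parameters of the Happy Eyeballs algorithm. A minimal sketch of how they drive attempt scheduling follows; the values are the RFC 8305 recommendations (the v3 draft and Firefox's tuned values may differ), and the function name is illustrative.

```python
# Sketch of Happy Eyeballs attempt staggering. Delay values are the
# RFC 8305 recommendations; implementations tune these from telemetry.

RESOLUTION_DELAY_MS = 50           # how long to wait for AAAA if A arrives first
CONNECTION_ATTEMPT_DELAY_MS = 250  # stagger between successive connection attempts
MIN_ATTEMPT_DELAY_MS = 100         # clamp range for the attempt delay
MAX_ATTEMPT_DELAY_MS = 2000

def attempt_start_times(num_candidates: int,
                        attempt_delay_ms: int = CONNECTION_ATTEMPT_DELAY_MS) -> list:
    """Relative start time (ms) for each candidate in the sorted list."""
    delay = max(MIN_ATTEMPT_DELAY_MS, min(attempt_delay_ms, MAX_ATTEMPT_DELAY_MS))
    return [i * delay for i in range(num_candidates)]
```

With the default delay, the second attempt starts 250 ms after the first; lowering the delay trades more network load for a faster fallback, which is exactly the tradeoff the collected metrics are meant to inform.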
Then the major big change, though I'd much rather phrase it as an addition, is that we'll likely make our Happy Eyeballs implementation proxy aware. Firefox, like I think all other browsers, is able to proxy traffic through, for example, an HTTP CONNECT proxy or MASQUE. That adds a lot of complexity to connection establishment, as you suddenly have this cardinality explosion of: how do you connect to the proxy? What proxy protocol do you use to the proxy? And then what endpoints do you use to the target through the proxy, over the proxy protocol that you previously chose? So we'll likely add something like that to our implementation, but I'm mentioning it here not to add it to the draft, which I assume is out of scope, but rather as an addition that we're thinking about. There are multiple things in our effort that I think are interesting for the working group. One: obviously, to try it out if you're interested. Another is that all Firefox telemetry, or most of it actually, is a public data set, so it can be accessed by anyone, even outside of Mozilla. We already collect various metrics related to Happy Eyeballs; I'll go into this a little. But we are planning to add a lot of Happy Eyeballs specific metrics that will then be measured across a large portion of the Firefox population. That includes, for example, the resolution delay by record type, the number of connection attempts, and the connection outcome by the various dimensions that we have. As long as these metrics are in line with the Mozilla privacy standards, I'm happy to take suggestions from the working group, for example to improve the draft with real-world data. After all, that is a data set of many million devices. And yeah, I'm happy to add that to Firefox Nightly and then have it ride the trains to Firefox release eventually. Okay, so feedback and input welcome on that one.
From here, just to mention a couple: these are already accessible. For all the metrics I show here, on the bottom right you have a link to our public telemetry page, and the slides are shared. For example, we would optimize our DNS lookup delay against the 75th or 99th percentile of Firefox users out there. Then we have similar things on connection establishment latency. We have metrics, I'm just posting a snapshot here, and here we would, for example, tune the connection attempt delay based on these. So I mentioned how this is helpful to the working group in terms of you simply being able to test it out, and us being able to provide telemetry; but in addition to that, our implementation of the core algorithm is actually an independent library. It's a Rust library on GitHub, at github.com/mozilla/happy-eyeballs. It is fully deterministic, without side effects, abstract over IO and time. As in: you can easily integrate it into your stack, and it has no dependencies at all on Firefox. It's published under permissive licenses, MIT and Apache. So if you would like to use this, play around with it, or contribute to it, all of that is more than welcome. Happy to discuss anything related, for example on GitHub. Being deterministic, side-effect free, and abstract over IO and time is also relevant for the test suite. The library at this point has quite an extensive test suite, taking all the text from the draft and encoding it, formalizing it, into the kind of tests you see on the right. Given that the library is abstract over IO and time, that is very easy to do; it's very easy to formalize the Happy Eyeballs algorithm. And it has been discussed here in the working group in the past whether we should have a shared set of test cases that many can test against.
And I'm more than happy if this could, for example, be used to formalize or discuss various properties of the algorithm in the future. So yeah, this is also on GitHub; happy for input, happy for folks to use it. Next steps for us: if you go to the GitHub repository, you'll find a lot of issues, mostly smaller ones that we still have to tackle in the library itself. And then we have a bunch of pending work in Firefox, which we track on Bugzilla; that's the integration work into Firefox as a larger application. We plan to enable Happy Eyeballs v3 on Firefox Nightly in the coming weeks; as I said, currently it's simply disabled but still shipped. And then we plan to ship it in Firefox release in the coming months. With that we would actually have telemetry data of around 100,000 users on Nightly, and on release that would be a significantly larger population. I'm more than happy to share the telemetry at the IETF at the next meeting, or via the mailing list. As I said, most of this is already public anyway. That is all from my side. If there are any questions or proposals, happy to answer here. Otherwise I'm reachable via email or Matrix, anywhere else. Thank you very much.
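The "deterministic, no side effects, abstract over IO and time" design Max describes is often called a sans-IO core: the algorithm takes events and the current time as inputs and returns actions, never touching sockets or clocks itself, which is what makes the draft's text easy to encode as tests. A rough sketch of that shape, using hypothetical names rather than the actual mozilla/happy-eyeballs API:

```python
# Sketch of a sans-IO Happy Eyeballs core: the caller owns the clock and
# the sockets; the core only decides *what* to do given the time. All
# names and the simplified logic here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Core:
    candidates: list            # endpoints, already sorted by preference
    attempt_delay_ms: int = 250 # stagger between attempts
    next_index: int = 0
    next_due_ms: int = 0
    winner: object = None

    def poll(self, now_ms: int) -> list:
        """Return the connection attempts to start at time `now_ms`."""
        actions = []
        if (self.winner is None
                and self.next_index < len(self.candidates)
                and now_ms >= self.next_due_ms):
            actions.append(("connect", self.candidates[self.next_index]))
            self.next_index += 1
            self.next_due_ms = now_ms + self.attempt_delay_ms
        return actions

    def on_connected(self, endpoint) -> None:
        """Report a successful attempt; first success wins."""
        if self.winner is None:
            self.winner = endpoint
```

Because time is just a parameter, a test can "advance the clock" by calling `poll` with larger values and assert exactly which attempts start when, with no real sockets or timers involved.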
Ben Schwartz: Uh, hey. So are you measuring this against what you were doing before?
Max Resing: We will be measuring this against what we did before on high-level metrics. We will not re-implement all the very Happy Eyeballs specific metrics in the old stack; for example, we will be able to provide time to first byte, connection establishment latency, DNS resolution, and so on. We will not, I think, be able to provide, on the old stack, the number of connection attempts we tried before the final successful connection. On the new stack, we will have those kinds of metrics.
Ben Schwartz: Sure. But you also said you'll have high-level metrics like page load success or, you know, largest contentful paint or whatever.
Max Resing: Yes. The one we plan to optimize most of this around, for this particular algorithm, not in general in Firefox of course, is time to first byte on a request; but otherwise, yeah, all the other ones are also available. And we will most likely run an experiment as well, so that we actually have A/B testing of the old and the new.
Ben Schwartz: Thanks.
Nidhi: Yeah. I just wanted to say this is great. Thank you so much, Max. I think it would be really great if you could share some of the telemetry, maybe next meeting or whenever you've run the experiment, and see if there are any updates to the recommendations we have in the draft based on that data. But yeah, thank you so much.
Max Resing: Thank you.
Eric Kinnear: I would second that. This is fantastic work and a super great update. Also excited to see some more results at IETF 126. We have definitely said in this working group that we want to guide what we're doing with data from telemetry, so what you've got there seems like a really good start. I know it's easy to suggest that somebody else add all kinds of metrics, but I would be interested in hearing from the rest of the working group: what are the things we think we really need to measure? What would be useful to do there? And I think Max had a good set of notes on things that need to be considered before Firefox could potentially add any of those. So, not signing you up to necessarily add them, but it's good to have a list of things that folks would be interested in. Even some of the graphs you showed here are really interesting to dig into, and we can do some of that offline. But this is awesome work.
Lorenzo Colitti: Yes, thank you. I would like to third that. It's great to see data; I'm really looking forward to seeing the A/B data from whatever realistic user sample you can disclose. One question I had, going back to the earlier discussion: a lot of the benefit in Happy Eyeballs is just hammering the network fast enough to get those connections out there. And one thing that's interesting to me is: if we're measuring time to first byte, and you want to optimize for time to first byte, your optimal strategy is to open a million connections simultaneously, try to get the request out, get the first byte, and then, cool, your metric goes up, but you've basically hit a bunch of bufferbloat, which means your page loads maybe get worse. I don't know. So I'm just wondering, and this is kind of unhelpful, what other type of metric we could use to characterize user experience. Maybe actual median page load times, I don't know. One thing that would be interesting is, if we jacked the timers all down and literally drove the implementation really, really hard, we could start looking for self-inflicted damage. Because the implementation is very oriented toward: let's turn the crank and see how fast we can get these connections to succeed. Anyway, that's just musing. I guess maybe just having a full page load metric might be interesting. My guess would be that we'll probably see better improvements on time to first byte than for page load.
Max Resing: If I could mention something we're planning around this. So yeah, time to first byte is how well the algorithm works for us in that particular regard. Obviously we could just remove the delays entirely. I expect that the number of connection attempts that I'm listing here is our check metric, and I expect, though this is really only intuition right now, that the number of connection attempts will be one or two in most cases. So that would be my check metric for now. But also agreed on page load: total page load will definitely not be impacted as much as time to first byte.
Lucas Pardue: I think it's me. So the talk of web performance metrics has just triggered me, in a good way: we do have a whole bunch of APIs in the web space to retrieve metrics and beacon them back, and there's a whole industry of real user measurement stuff that exists, maybe not familiar to people in the IETF, but it does exist. This kind of goes back to some of the earlier discussion we had around reporting and other things about failures. Is there any way that the result of the Happy Eyeballs choice can be reflected... some of this is captured already in, like, the next hop protocol, but I don't know if that's got enough granularity. What I'm thinking, and I don't need an answer right now, is: rather than put the burden on the browser vendors to build their own metrics, put them in their own telemetry, and host it for people to look at, find a way to enable people to experiment, opt in for their websites, and then tell them what was picked, and then they can do the performance analysis they already do. And maybe we can work together and get some stuff. Just a thought. I think the answer is typically no, because of privacy or because it's too hard, but it's something we could maybe stick on a W3C agenda.
Eric Kinnear: I think that's also somewhat on our agenda. Also, apologies, Ben: were you in the queue freshly or from before? Sorry if we skipped you.
Ben Schwartz: I just wanted to respond to Lorenzo: I don't actually think that's true. If you just tried to blast out the maximum number of parallel attempts, maybe with TCP that would have been a winning strategy, but with QUIC in the mix, I don't think it is. We've done some tests to see where the balance point is, and it turns out that having long delays has actually been performing better than trying to make them very small. That could be because QUIC ClientHellos actually take up a non-trivial amount of network time. It could be because of CPU use. It could be because if you don't delay TCP, then you're more likely to use TCP instead of QUIC, and if QUIC would have performed better, then that's actually worse.
Lorenzo Colitti: Looking forward to the numbers. I had forgotten to ask another question, Max. You said that there are some platforms where you can't get HTTPS records. I wanted to know if Android was one such platform, because there is an API to do this on Android, and if you just didn't call it, that's okay; if you called it and found that it did something horrible for you and it didn't work, then let us know about it. Or if you didn't even notice it exists, well, it does exist, it's called android.net.DnsResolver, so look that up. And it does work on, I think, most of the fleet these days; it was introduced in Android Q, so it's been a while.
Max Resing: As I mentioned here, I don't think we're blocked on Android; I believe we can use the native APIs there, and we are using them. I think we're having trouble on macOS and on Windows 10, though there might be a solution for the macOS issue.
Eric Kinnear: I see I hopped back into the queue to say a couple of things. Obviously, for the macOS one we can certainly help with that. You mentioned that you generally follow the shape of the document and what it says; if there were any points that were super confusing, or places where it was challenging, it'd be good to fold those in. But I think we've been chatting about that already, so nothing new, just a note to the rest of the gang. The only other thought I had was about metrics. We were just saying it'd be kind of nice to know how the race went and what happened there. We've had some reasonable success on the iOS side looking at things like: this was the nth attempt in the sorted list that ended up getting connected, and stuff like that. And I think we see exactly what your intuition expects, which is that by and large, the vast majority of the time, it's really the first attempt that works, and occasionally the second one. And I think that's the goal here, right? We're not trying to blast the network with tons and tons of traffic all at once. We're trying to say that in the common case, you connect to the thing that is the quote-unquote best and it works, and the rest of this is how you handle the realities of the world not always being the common case. Okay.
Max Resing: Yeah. To paraphrase, you're suggesting not just to record the winning connection, but the position in the queue of the winning connection. Yeah, that's a good idea. Thanks.
Eric Kinnear: Like, I had this many v6 and this many QUIC attempts, and, you know, all the different dimensions, and this was the nth in the resulting list. That's kind of an interesting drop-off to see. But yeah, fantastic work. To Lucas's point earlier, since it seems we've drained the queue, our next topic is to talk a little bit about some of that metric reporting and strategies for doing it. So thank you very much, Max. Very excited to see some of the metrics. If folks have ideas about other metrics that they think of 10 minutes after the meeting is over, they can probably email the list and/or send you issues on GitHub.
Max Resing: Yes. Anywhere is fine. Thank you very much.
Eric Kinnear: Wonderful. Thank you so much. And next up I believe we have Brian.
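The attempt-rank metric Eric and Max just agreed on can be computed very simply: for each connection, record where in the sorted candidate list the winning attempt sat, then histogram the ranks to see the drop-off. A minimal sketch, with illustrative data shapes rather than any real telemetry schema:

```python
# Sketch of the "nth attempt in the sorted list won" metric discussed
# above. Input/field shapes are illustrative, not a real telemetry API.
from collections import Counter

def winner_rank_histogram(connections) -> dict:
    """connections: iterable of (sorted_candidates, winning_endpoint).

    Returns {rank: count}, rank 1 being the most-preferred candidate.
    """
    hist = Counter()
    for candidates, winner in connections:
        hist[candidates.index(winner) + 1] += 1
    return dict(hist)
```

The expectation voiced in the session is that almost all mass lands on rank 1, a little on rank 2, and anything beyond that points at a misconfiguration worth investigating.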
Brian Trammell: Good morning, everyone. There we go. I might as well present my own slides. Here we go. Good. I could PowerPoint-karaoke someone else's. So, good morning, everyone. I'm Brian Trammell. I am talking about a proposal that is complementary and opposite to Max's talk: it's complementary in that it uses a different approach to do roughly the same thing; it's opposite in that it's probably not very practical and has no running code behind it. But it's an interesting thought experiment. The problem, which I think Lucas stated slightly better than I do here, is: we have all of this information at the browser side, or at the operating system stack side, about the state of Happy Eyeballs, but the people deploying the services that the browsers are looking at, where a lot of the problems are going to be, are actually on the server side. So how do we get information from one to the other? One way is qlog-like things, telemetry-like things; the other would be path exposure. So what would that look like? We have a couple of paths racing, and we choose one of them. It'd be really interesting if we could say, "Hey, I tried connecting to you using this path, but I ended up not using it, and here is why. Maybe go have a look," right? This is trying to expose slightly more information toward network operators that they have a misconfiguration leading to a path not being selected. And because it's not necessarily always at the server side, we'd like this signal to be path observable as well. Nobody's going to be looking at this in the core, but on the access networks on either side, you'd be exposing essentially the misconfigurations there. So, this was an idea that came up after a couple of side discussions in Montreal, sort of late in the meeting, and it might show. The basic idea is: well, it's 2025, let's extend ICMP.
Why not? So here's what this would look like. You have a type and code for a "slower alternative not selected" message. This is a message that goes along the path that was not selected, to say, "Hey, you were not selected." What else do we have here? We have some exposure of the signal being sent: which hash algorithm is used for the normalized DNS answer; we'll get into why we're doing that in a moment. And a sample rate parameter that essentially says, "Of 100 unselected paths, I'm only going to send about three," a 3% sample rate, so that we're not just creating a source of ICMP spam. Then the 5-tuple: the source and destination addresses of the IPv4 or IPv6 header containing the ICMP or ICMPv6 message, and the protocol and source port within the ICMP message, match the 5-tuple of the non-selected alternative. So it's going to get routed, at least at the network layer, the same way, mumble mumble hand-wave equal-cost multipath, and it's going to carry information at the destination that matches the destination port. There are also some NAT things you would have to do to get this to work properly. For observability at the server side, where the server is going to want to know which DNS answer led the client to think the path exists, it contains that hash of a normalized DNS answer. This allows linkability back to the domain name and the answer for that domain name, but it is hashed, so you're not then exposing information that would be in SNI or ECH. So why do this? Again, one of the failures of the success of Happy Eyeballs is that it converts availability risk into a performance penalty. And if that performance penalty is acceptable, then it is not going to lead to anybody on the server side or the network side fixing it.
The client always knows when a path is not selected. All the server is going to see from this is traffic mixes: "I'm getting this much v4, I'm getting this much v6. Ah, that's close enough. We're not going to look into any persistent misconfigurations that we don't know about." Why ICMP? This is what ICMP is for, right? Exposing diagnostic information to the network path. Another reason is that, yes, in 2025 or 2026, adding a type and code to ICMP is a weird thing, but maybe a bunch of new ICMP messages hitting firewall logs will make this new feature discoverable to people who are not reading the output of this working group. And then the hashed DNS answer keeps us from leaking name-level debug information to entities that don't already have it, and sampling keeps this from turning into a spam source. Advertising the sample rate makes the post-measurement statistical analysis a lot easier to do without guessing what that sample rate is; and you might want the sample rate to be adaptive. So this is a minimal conceptual approach to this problem: what would happen if you glued ICMP to this problem? What's the minimum thing you could do? There's more you could do that might be interesting to dig into. The client usually has more information about the failure. There are some discussions to have about the cost-benefit analysis; for example, what we were talking about in the telemetry discussion just now, the ability to say, "Okay, this was ranked 2, this was ranked 3, this was ranked 4." You could expose that, but there's always going to be a risk-benefit tradeoff with respect to sending that along too. The client also knows the path that succeeded, and in some cases the path failure might be a failure of routability.
The basic assumption of being able to send this ICMP or ICMPv6 message along the same path as the non-selected alternative is that that path is at least routable. But it might be that the misconfiguration is so bad that it isn't even routable: I'm talking to you over here, I'm not talking to you over there, because that's not a real usable path. Say I gave you a AAAA answer with a 2001:db8:: documentation-prefix IPv6 address in it, for example; that exists. We could also send an "alternative failed" message when the alternative is unreachable. There are a lot more path linkability concerns about that, so we thought about it in the early conceptualization but decided not to do it. So this is more of a thought experiment, and the idea is to start a discussion. There is a question as to whether this draft might be a basis to consider talking about this further in the working group. With that, I see we have a queue, and I will stop talking.
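To make the proposal concrete, here is a rough on-the-wire sketch of the hypothetical "slower alternative not selected" ICMP message Brian outlines. Only the ingredients (type, code, hash algorithm identifier, sample rate, hash of the normalized DNS answer) come from the talk; the type value, field widths, layout, and the choice of SHA-256 are all made up for illustration.

```python
# Hypothetical packing of the proposed ICMP message body. Field layout,
# type/code values, and SHA-256 are illustrative assumptions, not a spec.
import hashlib
import struct

ICMP_TYPE_ALT_NOT_SELECTED = 44  # hypothetical type value
CODE_SLOWER_ALTERNATIVE = 1      # hypothetical code; rest of code space reserved
HASH_ALG_SHA256 = 1              # hypothetical hash-algorithm registry value

def build_message(name: str, sample_rate_pct: int) -> bytes:
    """Pack type, code, hash alg, sample rate, and hashed DNS name."""
    # Normalize the DNS name (lowercase, no trailing dot) before hashing.
    digest = hashlib.sha256(name.lower().rstrip(".").encode()).digest()
    # type(1) code(1) checksum(2, left zero here) hash_alg(1) rate(1) pad(2)
    header = struct.pack("!BBHBBH",
                         ICMP_TYPE_ALT_NOT_SELECTED,
                         CODE_SLOWER_ALTERNATIVE, 0,
                         HASH_ALG_SHA256, sample_rate_pct, 0)
    return header + digest
```

The outer IPv4/IPv6 header, which is what actually mirrors the non-selected alternative's 5-tuple for path routing, is omitted here.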
Nidhi: Yeah, Brian. I really liked seeing this, because I really like signaling when Happy Eyeballs fails and why it fails. My basic question would be: would you only send the signal to, let's say, the top-ordered connection attempts? So if you have, for example, 12 candidates and you picked the second, would you only send this path signal to the first one, or also to the ones that were not considered anyway because they were sorted too low in the rank?
Brian Trammell: That's an interesting question. Off the top of my head, I would say that the different rank-ordered alternatives might get different sampling rates: you probably want more information about the ones that were closer to being selected, so you'd sample those at a higher rate. But that would be a detail to work out later, I would think. Thanks for the question.
Ben Schwartz: Hey, Ben Schwartz. I want to push back against the hash. Maybe I'm misunderstanding, but it does not seem to accomplish the privacy goal here. It seems like anybody who is interested in knowing what this is could test your hash against the possible destinations in order to confirm that you were in fact attempting to connect to something.
Brian Trammell: So if you know all the names that are likely to map to an address, yeah, that's pretty easy to disambiguate. The larger the set of...
Ben Schwartz: Yeah, so that would totally break ECH. That's very specifically the Encrypted Client Hello threat model.
Brian Trammell: So there's a... I mean, yes, this is not a necessary feature of the proposal. You can take this out and still get most of the same benefits. That's part of the cost-benefit discussion I want to have. Huh?
Ben Schwartz: I think this can be fixed with some public key crypto. If you're willing to put HPKE on it, then this can be fixed, I think.
Brian Trammell: Okay. Cool. But the other thing I wanted to... okay, a couple of other points. HE v3 exists because there's more to the path than Layer 3. And so one of the most valuable things to be able to signal would be, "Hey, I am using your path, but I'm using TCP because QUIC appears to be broken for me on this path." That's a super common problem that I would love a way to signal to the path. I don't think we can encode that yet in this framework.
Brian Trammell: Well, you'll notice that I've got type 44, code 1, and the rest of the code space is reserved. That's exactly why the rest of the code space is reserved. So for the transport, it's the same thing as with the telemetry: right now you can infer that just from looking at the protocol of the non-selected alternative, looking at the next header. But yeah, I think it makes a lot of sense to add a code for a not-selected Layer 4 alternative, and we've got 8 bits there, so we can say a lot. The last thing I would say is: if you are using SVCB HTTPS records, then you have some powerful stuff you can do here, like publishing an HPKE public key. This also closely resembles some discussions that have been bubbling since the beginning of Alt-Svc, where for the successful connection people want to know: why did you choose this endpoint? You can do that in-band inside HTTP, for example with a request header, and so there's the idea of Alt-Used, which was defined but not very widely implemented. Or we could try to define a similar "SVCB-Used" thing. That would be pretty valuable as a server operator; it'd be great to have that kind of information. It has never managed to overcome privacy concerns, but this looks a lot like that, so if we can overcome the privacy concerns, maybe we could do both.
Brian Trammell: Yeah, I will dig into the history of that, actually. Thanks a lot for the pointer. But I think the general principle is that things that are only usable or useful at the server should not be exposed via ICMP. The point of this is: what is the subset of information that is also useful one or two hops away, on either the client side or the server side?
Lars Eggert: Lars Eggert, Mozilla. Plus one on the privacy aspect. It seems kind of interesting that we are trying very hard to privacy-protect users, and at the same time sending ICMP into the network revealing whom the users are talking to. That we certainly need to work out. The other question I actually got in line for: it's 2026 and we're changing ICMP. What's the feasibility of generating ICMP from userspace on consumer operating systems these days?
Brian Trammell: Not great. Yeah, I did say this was a complement in every way to the Mozilla telemetry approach; that's the reason for the snark in the slide. I don't think we have a way to do path exposure that would actually meet all the other goals, like discoverability, that isn't ICMP. But yeah, that's the biggest feasibility issue here.
Lars Eggert: If you can't send it, it's not going to help you, right? That's my point. If, as an application running on iOS or Android, or even Linux, macOS, and Windows, I can't send the ICMP as an app, then we're kind of dead in the water.
Brian Trammell: Right. Thanks.
Lucas Pardue: So, thanks, Brian, for bringing this topic up. I think Lars and Ben have probably addressed a lot of the points I was going to make. The one thing I was unclear on in this proposal is: who is the information for? As a server operator, we anecdotally see loads and loads of connections that never make any HTTP request: they're successful, they go through a whole TLS or QUIC secure handshake, and then we don't see anything on them. So to echo some of the comments: if that's the target, if we want to pass the information to the server that the client abandoned, that sounds useful, and it could be achieved through something like an H2 frame, or just the close code, or a TLS alert, close_notify, or whatever; I think there are ways we could maybe do some of the signaling while addressing the security-considerations side of things. The other thing Ben mentioned was DNS. I mentioned NEL, Network Error Logging; there is an active discussion that's been open for about two years, and I just posted a link in the chat, about distributing NEL policies in the DNS. That would allow a client, not you, to do this reporting out of band, but still in a way that lets the server describe where it would like the telemetry to go. However, setting those aside, there are people on our network team who would probably like to have this kind of information available to them, to go do their networky things and say, "Oh, there are all these TCP connections, and yet they're all being given up because of this thing, so I don't need to worry about them." I'm just pessimistic about the chances of this being feasible. Cool, thanks.
Miya Kulivind: Yeah, thanks for starting this work. I think this is one way to think about it, or one piece of the problem, because what we really want to understand is: what was the problem, and also where was the problem? And then we want to communicate out to that entity, to tell that entity more about it. In my head, what we need is maybe this, or some other set of signaling mechanisms, which can just say, "There is a problem, please reach out to me," and then maybe you can reach back and get further information if you have some kind of trust relationship. So that might be one piece of the puzzle. I would also really like to learn much more about what we already know about what usually breaks. There might be very common cases, as was brought up, like QUIC not working; but even then, in a lot of cases we still don't know where it actually breaks. Looking at the data, understanding what we have, and then maybe defining a set of error codes that cover the most common cases: that would be very useful.
Brian Trammell: Yeah, so there was a revision of this document that went much farther down the road of guessing what those error codes might be. I pulled them out for rev 00, because they would have required a lot more privacy trade-off analysis than I wanted to do on a Friday in Montreal.
Miya Kulivind: Yes, definitely privacy, but I think we also really need the data about what we already know about these errors, rather than guessing them.
Brian Trammell: Absolutely, absolutely.
Miya Kulivind: And the other thing I wanted to check, and sorry if I missed it or I'm stupid or whatever: what do you need the DNS hash for? What do you want to detect with that one?
Brian Trammell: So the DNS hash would be for when you have a frontend that's serving a whole bunch of different names; it would allow you to disambiguate and classify by which name you're having the problem with. Although, honestly, as I think about it in this discussion, that probably needs to be inside the envelope instead.
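[Editor's note: the disambiguation Brian describes could be as simple as a truncated hash of the server name carried in the report. This is an illustrative sketch only; the hash function, truncation length, and normalization are arbitrary choices, not anything specified in the draft.]

```python
import hashlib

def name_digest(dns_name: str, length: int = 8) -> str:
    """Truncated SHA-256 of a server name, letting a multi-tenant frontend
    classify failure reports per hostname without carrying the name in
    cleartext. Note that for a small, known set of candidate names this is
    trivially reversible by brute force, which is part of the privacy
    trade-off discussed in the session."""
    return hashlib.sha256(dns_name.lower().encode("ascii")).hexdigest()[:length]

tag = name_digest("example.com")
print(tag)  # a stable 8-hex-character tag for this name
```

Because the digest is deterministic, the frontend can precompute tags for the names it serves and bucket incoming reports by tag.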
Miya Kulivind: Yeah, and I also think the way we should think about it is really not as a whole Happy Eyeballs approach, but more like: I tried this on this path, and this is the failure. That's really independent of the question of what else you tried, right? It's just telling you, I tried this and it didn't work.
Brian Trammell: Yep.
Jordi Palet: I still think that the party that really needs to know when Happy Eyeballs is falling back is the ISP, okay? So I don't think there are privacy concerns, because they already know all that information, or they can sniff it, and normally this is not considered a privacy issue. What I fail to see is the practical usability of this for the ISP: how will they actually be able to use it, and what new things will they need to implement in the network? Because in my experience, ISPs are not happy to implement anything new. So there is a balance between how much effort needs to be done in the client that is running Happy Eyeballs and how much effort needs to be done in the ISP that needs to receive this report. That's what I'm failing to see, and I think we need to have this discussion in terms of different choices, not just this one, to figure out which point on that balance is more feasible to actually deploy. A protocol that is very nice but that nobody will use doesn't make sense to work on.
Brian Trammell: Yeah, absolutely. So, again, thanks: the point of this was to start that discussion. I think there is a lot of discussion to be had about the balance of who needs which information, where, and what they can do about it; putting un-actionable signals on things doesn't help anyone. I did want to point out one quick thing from the chat. Ben Schwartz says, "SCONE seems to think that UDP is sufficient for client-to-path signals." Yes, I had also noticed that. I did not necessarily want to put on my SCONE chair hat and say, "You should totally use SCONE for this." But there is a lot of additional space in the QUIC version number space that could be used for alternate signaling here. So if that's a thing that makes sense, and it shifts the balance on whether non-operating-system entities can send this signal, that's definitely something we should talk about. Sorry, I'm over time; I see the chairs getting restless to close out.
Eric Kinnear: We're always excitable. Philip, you joined the queue after it was closed; do you have literally six words you'd like to say?
Philip: I think we should think about different path signals, or signals for different parties; an enterprise operator will certainly need a different signal. But I think we should take this to the mailing list.
Brian Trammell: Yep.
Eric Kinnear: Wonderful. Thank you, Brian, for writing much of this down and bringing it here. Especially as we're starting to collect different approaches to solving this, that really helps guide our discussion, as a working group and with, you know, maybe Sage and other privacy experts, about where we think we can offer value here. What would actually meet the needs? What even are those needs? All that kind of stuff. So, very much appreciated. Wrapping up: you know where to send requests to Max for different telemetry and such for Firefox. On the main document, I think Jen and Andre were going to take a look at some of the PRs for some of the v6 bits, and Tommy and Nidhi and the other authors were going to work with Ben on some of the proposed restructuring and keep an eye on that. So I think we have people assigned to each of our major next steps. I do want to give another thank you to, I believe, Dave, who was taking notes for us; you rock, appreciate you. Also, the Happy Eyeballs web tester folks have released a bunch of updates since the previous meeting, which are worth a quick look. They were not able to be here during this time slot to present, so we'll probably have them come talk more about it during 126, but just a note that there are a bunch of cool updates there, so go take a look at those as well.
Jen Linkova: No, I was just going to say, and we can do it offline: this PvD stuff, replacing interface with PvD. You want it as a separate PR, I assume, but...
Eric Kinnear: Yes, please. But yes, and we can sort out all the rest of that offline. Awesome. Thank you so much. Please send comments, notes, thoughts, feelings, and everything else to the list, and we will see you next time.
Tommy Pauly: Great, thanks everyone. Thanks Eric.