
Session Date/Time: 07 May 2026 13:00

Alvaro Retana: Good morning, everyone, or good afternoon. If anyone wants to check your mic, this would be a good time.

Ignacio Castro: Hello, hello.

Alvaro Retana: Hello, everyone.

Marcelo Santos: Hey, everyone.

Priyanka Sinha: Hello, everyone.

Alvaro Retana: I hear you both fine.

(Silence)

Alvaro Retana: Hi, Sue. Good morning. No, no one speaking yet. We're… there's a couple of people that had confirmed coming, so we're giving them a couple of minutes. I hope you heard that, though. Yeah, thanks.

(Silence)

Ignacio Castro: Hi everyone. Hope everybody is fine. While we wait for other people to join, maybe I can start going over the Note Well. So, as you probably know, the IRTF follows the IETF's intellectual property rights policy and its corresponding disclosure rules. By participating in the IRTF, you agree to follow the IRTF processes and procedures, which you can see in the slides. We are recording this conversation as per the Note Well. And we follow the privacy policy and the code of conduct. And you can find the relevant links and the relevant RFCs in the slides of this meeting, the chair slides.

The goal of the IRTF is to focus on long-term research objectives that are related to the IETF. The IRTF conducts research and it is not a standards development organization. And all this information can be found in RFC 7418. This is RASPRG, and the chairs, Alvaro and myself, are coordinating it. The goal of RASPRG is understanding the standardization process via evidence-based reproducible work. The outputs are joint reports, papers, tools, data, open-source software. The goal is not to do hierarchical comparisons between SDOs or directly influence IETF operations, and all this information is in the charter.

If people can help with taking notes of the meeting, that would be really appreciated. And just very quickly to go over the agenda. At the end of the last IETF, there was some discussion, and a few people said that it would be helpful to have an interim meeting. I think Jean and Marcelo and Alvaro were physically present there. I think Jean is not here. The agenda follows from that conversation. The idea, and this is, by the way, more of a tentative agenda than a strict one, was to discuss existing work. As part of that, I have added a discussion of the draft that Colin has submitted, of which, for clarity, I am also a co-author. Then to discuss the gaps in what could be done, the scope of the things that we want to do, and how to move forward on them. And who would like to participate, be involved, as well as the decisions and next steps. I don't know, Alvaro, if I'm missing anything. You were there at the end of the session.

Alvaro Retana: No, so you covered most of it. The realization at the last meeting in China was that there is work that keeps coming up related to common themes around participation, for example, or participants, or geography, or things like that. And of course, a lot of the data, even if it comes from the same source, is not treated the same way. There's some inference done around the location of people, around where they're from, or where they work, or things along those lines, and it would be nice if we had a common way of deriving that so that we can compare some of the results, so that we can see change over time, so that other people can repeat the analysis or the research. And there is the need, or the willingness, to have common definitions and common sources and common work. So that's how we sort of got here. As Ignacio said, yes, this is a guide to what we think we should talk about. It is probably more a conversation than anything else. We have slides from Colin, and Stephen proposed some slides earlier, about five minutes ago, which I uploaded, Ignacio, on the page there. So I think that's it, right? We start talking about what's going on, where are the gaps, what do we want to do, and go from there.

Ignacio Castro: All right. Very well. So I can see a few people that I know have been working on this space. If there is no particular volunteer to start, maybe I can propose either Stephen or Colin, who I've seen have proposed slides.

Stephen McQuistin: Sure, so I could go first. My slides are a sort of general introduction to a project called sodestream that we've been working on, which Ignacio is part of, just for full disclosure I suppose, and from which, I guess, the draft that Colin is going to talk about sort of falls out.

So the slides are essentially picking on a strand of work that this sodestream project has looked at, which is measuring the sort of combined internet standards community that exists when you combine the IETF and the W3C. So how do we look at that sort of shared community? As I say, that is the result of a UK-funded project called sodestream, or Streamlining Social Decision-Making for Improved Internet Standards. And the sort of basis of that project really is that designing standards, developing those standards, involves a sort of complex dynamic process of decision-making, right? We've got interaction and communication between lots of people. We're all familiar with how these processes play out. People come to that process with a sort of different set of interests and priorities and how they want the work to proceed. And ultimately, they arrive at a set of decisions, right? A draft gets moved forward or a bit of text gets added. Some decision is made that moves the work forward. And the idea behind the sodestream project was how do we go about sort of streamlining that process? Making it better for some definition of better. Be that faster, be that more representative of the community, be that arriving at a set of standards that is better in some way.

Of course, all of the data, or a lot of the data that we would want in order to try and make these decisions is publicly available. And so we're sort of looking at the data that's available from the Datatracker and more recently, the data that's available from the W3C. If you can go to the next slide, Ignacio. (Slide: sodestream: Measuring the IETF and W3C) I don't know if I can control them.

I'll not sort of labor the point too much. I think we're all here because we know why this sort of measurement work matters, right? We know that the internet is important as critical infrastructure. We need to move to thinking about the way in which these standards are arrived at rather than just the sort of technical measures that we might have of the standards themselves. How do we go about thinking about the resilience and the efficiency of the process that actually makes those standards? And also thinking about things like influence, right? How do we measure the influence that people within the community have on the decisions that are made and ultimately the standards that get produced? As I say, the view, the longer-term view here is can we identify bottlenecks in the process, both the social process and the technical process, that might hold up standards development work? And can we then improve that process, perhaps by introducing some tools that go about mitigating some of those bottlenecks?

So next slide. You might have seen previous presentations of ours that have looked solely at the IETF. As I say, the novelty of this talk is that we've expanded that work to include the W3C with a view to really sort of trying to study the wider internet standards development community, and in particular thinking about the sort of intersection and the overlap between those two communities in terms of things like resilience, right? Are we dependent on a few people sort of holding the community together, the broader W3C-IETF community together, or is that a more resilient organization, a more resilient community than it appears?

So we, and I won't present all of these, but we're thinking about things like how participation in that shared community has developed over time, how authorship within those shared communities has changed, again thinking about that overlap, who exists within both of those communities, what does that sort of knowledge sharing look like? Do we have enough people that exist in both the IETF and W3C communities to make sure that that's a healthy structure, that we're not duplicating work, for example? And then relatedly, can we think about how much dependency there is on a small set of individuals? Are there a small set of sort of influential participants that are sort of driving a lot of the internet standards development?

If we go to the next slide. Yep. So as I say, we've got a lot of data. Again, I will not go into too much here because I think it's covered by Colin's discussion in a minute. But we're looking at all of the documents that are produced by the IETF and the W3C. So that's about 10,000 RFCs, give or take. About 400 W3C recommendations. We cover roundabout 130,000 participants across both organizations. And then in terms of the communication that participants have, we're looking at roundabout 4.3 million emails and roundabout 1,800 GitHub repositories where this work is actually done.

We can then combine all of that and the processes that we use are described in the publications that we've got on our website. We're essentially constructing a sort of social graph where we link together participants based on the communication that they have between them. And then we can go about sort of analyzing and inspecting how people participate, how they author new work, and sort of look at these questions of participation and authorship.
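As a rough illustration of the kind of social graph described here, the following minimal sketch links participants by counting the interactions between them. It is not the sodestream implementation: the `(sender, recipient)` pair format, the function names, and the sample names are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical input format: one (sender, recipient) pair per interaction,
# e.g. an email reply or a GitHub comment answering another participant.
Interaction = Tuple[str, str]


def build_social_graph(interactions: Iterable[Interaction]) -> Dict[str, Dict[str, int]]:
    """Build an undirected graph whose edge weight is the number of interactions."""
    graph: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for a, b in interactions:
        if a == b:
            continue  # a reply to oneself does not link two participants
        graph[a][b] += 1
        graph[b][a] += 1
    # Convert nested defaultdicts to plain dicts so lookups cannot add nodes.
    return {person: dict(neighbours) for person, neighbours in graph.items()}


def degree(graph: Dict[str, Dict[str, int]], person: str) -> int:
    """Number of distinct participants this person has interacted with."""
    return len(graph.get(person, {}))


sample = [("alice", "bob"), ("bob", "alice"), ("carol", "alice"), ("carol", "carol")]
graph = build_social_graph(sample)
print(graph["alice"])          # {'bob': 2, 'carol': 1}
print(degree(graph, "carol"))  # 1
```

Questions about dependency on a small set of individuals could then be explored with measures over this graph, such as how many participants have a high degree.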

Broadly speaking, what we can see over time is that the IETF isn't growing in terms of the overall volume of participation. So if we look at how many people are participating, there was a sort of spike in the mid to late 2000s and it's sort of leveled off at roundabout 4,000 participants if you look at the total participation over mailing lists and GitHub that are active per year. We can see that the W3C by contrast is seeing some growth to the extent that we can see in the slide there. What's driving that growth might be its use of GitHub. It's a far more extensive user of GitHub than the IETF is. And that does allow for participation that perhaps is wider than the IETF is seeing.

If we go to a couple of slides forward. Yep, so just to sort of drive that point, we can see that the W3C has almost wholesale switched from using email and mailing lists to using GitHub. And you can see that sort of crossover point happening in the mid 2010s where GitHub use sort of took over from email in terms of the sort of volume of participation. Now what we've not started to look at is the sort of quality of that engagement. What does it mean to participate via email versus via GitHub? We've not sort of dug into that. But we are seeing GitHub being used more frequently. And obviously there has been a sort of push in the IETF to use it, although it's a much less significant proportion of all of the use that we see.

I don't want to take up too much time, so the last sort of plot that I'll present, if we go a couple forward to the topic modeling. So this plot here is looking at what is being discussed across both the IETF and the W3C and whether or not a topic is being discussed in one organization or the other or within the intersection of them both. And I'll finish on this plot just to drive home the point of this wider analysis and the idea that we should look beyond just a single organization, because what we're seeing here is that these organizations don't exist in isolation, right? There is collaborative effort across both the IETF and W3C, and so considering how participation varies across those organizations and the extent to which there is an overlap is fairly important. What we don't want is a scenario where there's duplicate effort or where one organization is considering a topic that perhaps belongs in another organization.

So, as I say, I don't want to take up too much time here, but it was just to highlight some of the work that we've been doing exploring that gap, both in terms of the volume of participation and then here looking at what's actually being discussed. You'll find much more analysis in the publications that we've got on the website. That's sodestream.github.io. And there's more in the rest of the slides. But I'm conscious that I don't want to take up too much of the agenda time. So I'll leave it there.

Ignacio Castro: Thank you, Stephen. That's great. There was a question from Priyanka whether the numbers are raw or cleaned, such as spam, duplicates removed. I think they are clean, right?

Stephen McQuistin: Yeah, so all this data is cleaned. And I think Colin's going to touch on this question of how we go about actually processing the data in a more general way. But yes, one of the challenges is that the data that we get is in various stages of being processed. So in the IETF's case, a lot of the spam has previously been removed from the mailing list archives, so they're maintained in a way that's relatively clean. We do further cleaning on top of that just for consistency's sake. But that varies across the different organizations. Some of the mail archives haven't been cleaned in that way and haven't been processed, but we do that cleaning for the data that I presented here.
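One cleaning step of the kind mentioned here, removing duplicate messages, could be sketched as follows. This is only an assumed approach using the Python standard library, not the project's actual pipeline: it deduplicates on the `Message-ID` header, normalises the sender address, and does not attempt spam filtering. The function name and sample messages are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr
from typing import Dict, Iterable, List


def clean_archive(raw_messages: Iterable[str]) -> List[Dict[str, str]]:
    """Drop duplicate messages (same Message-ID) and normalise sender addresses."""
    seen_ids = set()
    cleaned = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = (msg.get("Message-ID") or "").strip()
        # Design choice for this sketch: messages without an ID are skipped,
        # because they cannot be deduplicated reliably.
        if not msg_id or msg_id in seen_ids:
            continue
        seen_ids.add(msg_id)
        _, address = parseaddr(msg.get("From", ""))
        cleaned.append({
            "id": msg_id,
            "sender": address.lower(),
            "subject": msg.get("Subject", ""),
        })
    return cleaned


first = "From: Alice <Alice@Example.org>\nMessage-ID: <1@example.org>\nSubject: Hello\n\nBody\n"
other = "From: bob@example.org\nMessage-ID: <2@example.org>\nSubject: Re: Hello\n\nBody\n"
messages = clean_archive([first, first, other])
print([m["sender"] for m in messages])  # ['alice@example.org', 'bob@example.org']
```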

Ignacio Castro: Great. Thank you. I don't know if there is any other brief question.

Stephen McQuistin: Right, in that case, thank you very much.

Ignacio Castro: Thank you, Stephen. I don't know if Marcelo or Jie, who have also been working on this, and of course anyone else, wants to discuss their ongoing or past efforts. By the way, no slides are needed if you don't have them. It's absolutely fine to just discuss briefly what you're working on.

Marcelo Santos: Hi everybody. Can you hear me?

Ignacio Castro: Very well.

Marcelo Santos: Yes. We have similar work, but with a slightly different focus. I don't have slides, so I'm sorry. But we are working on a presentation for the next meeting in Vienna. It's just a small talk, five minutes, just to discuss some ideas so maybe we can collaborate. I'd appreciate that if it's possible. Because we would like to analyze the data. Actually, we are analyzing Brazilian and Latin American participation. We are trying to cross-reference data from drafts, RFCs and mailing lists, and to see how we can measure impact in this kind of collaboration, in mailing lists, drafts and so on. And how we can predict whether a draft will become an RFC or not, and what features are important during this process. So we are working on something like that.

And we would like to give a better presentation in Vienna about that. And we really would like to show a beta version of a piece of software to help people contribute to the IETF. Let me explain a little bit better. The idea is that we have a lot of researchers spread all over the world, of course. And how do they contribute to the IETF when they have an initial idea, mainly research groups? We saw at the last meeting in China a lot of AI sessions, and we don't have a consensus or, I mean, an organized discussion about AI, for instance. What we are trying to do is create an interface where we can put in some ideas and maybe some areas of expertise. We send this small prompt, a small text, through the interface and receive suggestions for where we can contribute: research groups, working groups. For instance, I would like to work with software-defined networks or, I don't know, BGP, it doesn't matter. We put in our expertise and it can summarize the drafts that have some discussion about it, or point to RFCs that need improving, or to some mailing list discussion. So it's an initial door for beginners.

That's the idea. So I don't know if it's linked with the RASP research group, but we would like to discuss these two things in depth at the next in-person meeting: this kind of software, maybe with an LLM to interact with the user and give some guidelines on how to contribute; and this data, a little better prepared to show and to have some discussions. And one initial thing that we are working on right now, which I think is very interesting to discuss as well, is this: okay, we frequently look at drafts that became an RFC. But let's try to do the opposite. Look at the drafts that don't become an RFC. Why? So it's another view. Pay attention to the drafts that expired, and why did they expire? Maybe there are not so many interactions on the mailing list, in the research groups or working groups, it doesn't matter. But why do a lot of drafts not become an RFC? So it's another kind of analysis, because generally we focus on why drafts become an RFC, but why don't they? So we are creating a database about that. And we still need to analyze the data carefully to get some insights from it. So in summary, that's it. Sorry that I don't have slides organized to show everything with more precision.

Ignacio Castro: Thank you, Marcelo. That's great. I think that we all get a high dose of slides and I think that a presentation without slides is actually maybe even refreshing. I don't know if there are any quick questions or comments on Marcelo's talk. Well, just from my side, it sounds very interesting and it does sound like it has a very good fit in RASPRG. We did some work predicting whether a draft will be adopted or published. Maybe there are some insights from there that can be useful. And what you were describing as a presentation for Vienna sounds like a good idea. There is going to be a related one from Jaime, looking at the likelihood of a draft being AI-heavy in terms of the text, which might be relevant to some of the things that you were saying. I don't know, Jie, if you want to discuss your work.

Jie Bian: Oh yes, I'm trying to turn on the camera. Sorry, I don't know how to turn on the camera, but I'll just briefly talk about my research, and sorry, I didn't have time to prepare slides for today. So our research project is basically focused on finding the connection between the IETF mail archive and the internet drafts or RFCs. So usually, in the working group, you have some discussions, conversations happening in the email archive or on GitHub, and afterward you edit the internet draft. So this research project is mainly trying to build a bond, build a bridge, between specific details of a draft, not the whole draft, and the relevant discussions. And we have several papers on that, and this year we have two papers, one in Elrak and the other one in the Web Conference. And right now I'm building an AI agent. It's more like a chatbot, so if you input a question or some specific description from an internet draft or RFC, this agent will search the email archive and then try to answer your question about those specific details and generate a reply. So right now I'm working on that, and hopefully I can have a nice prototype that I can send to the mailing list. You can play around and see if the agent is really explaining things or whether it's hallucinating in its generated text. So that's basically what our project's about. Yeah, any questions?

Ignacio Castro: Okay. Thank you very much. I don't know if there are any questions for Jie. Okay, very well. I don't know if there is anyone else who would like to discuss briefly past, future… Ah, Sue, please go ahead.

Sue Hares: Thank you. I am looking at the back end of processing. You know that I have done that over the years. What I'm focusing on right now is trying to take the data that I've gathered and make it available for the rest of the community. In case you've not seen past presentations, my study is on what happens when, after the draft passes working group last call in a working group, what's the process look like between when they complete it and how it turns into an RFC. Part of that process is the IESG's decisions on it. And I did a fairly lengthy longitudinal study on the effect of IESG deliberations during 1990 to 2016. I completed that study under my dissertation and review. I'm going from 2016 to 2026 in the same methodology and then trying to make it available under standard scanning tools that are available in the IESG. So it's got two processes. First, making the data that I have for the past available in case anyone does longitudinal studies. And two, going forward trying to make tools that are commonly available, be able to look for the same predictors of IESG performance.

The predictors are behavioral. So this is a behavioral study based in IESG minutes. It's not like most people's studies in that you may not be interested in how I get the data, but you may be interested in the predictive portion of the data. My research area is in group behavior. How does the group behavior of the IESG predict time it will take to pass the working group draft that has been approved all the way through to a published RFC. There are other pieces that are subsequently interesting, which is the fact that the IESG process is effective, but not necessarily as cut and dried as you think it might be when you look at it. There is a lot of flow that's effective in that when an IESG member puts a discuss on a draft, there is an opportunity in most IESGs to go back and cycle privately between the two parties being discussed, either the proposed AD and the discussing or multiple discussing IESG members. That's effective in that it works offline. However, the rate and the approval rate varies per year as an IESG's character is dependent on the participants and the IETF chair. So there's a predictor for past presentations. This was done with a great deal of what you would call in computer science scan technology, what they would call in social science qualitative analysis. It is essentially looking for key indicators in their discussions for predicting how they will approve things over time. It is not an individual draft predictor, but it is a predictor over the time period for an IESG. Now, it is usually a predictor that looks backward, but we shall see what the data brings to us.

This is not… this is the back end of your draft. How do we wait? Because most of you, when you talk about it, have been talking about how do we get it to working group last call, and then you assume a common fixed value for the time for the draft to go from passing working group last call to being published. That's not true. There's float in the IESG process, in the directorate review process, and in the RFC process. Anyway, if you're interested… I'll answer Michael in a moment. So, I hope, if you're interested in the data, some people wanted to use it together with their data that looks at the working group. And Michael, psychohistory and social science that's accredited by scholars have a lot to do with one another. No, I'm not… that's a fun comment. I never put that together with my history. I'm looking… it's a scholarly approach to organizational behavior. Yeah, I've read those and enjoy those documents. It's just my PhD and my teaching would… I'll use that one. That's fun. Any questions?

Ignacio Castro: I think we have a question from Priyanka.

Priyanka Sinha: No, sorry. I just wanted to say that this sounds really amazing, this kind of contribution.

Sue Hares: Well, that's very kind of you. Like Michael says, yeah well, it is something, Michael, the psychohistory is interesting. Lifting off from Michael's psychohistory, current leadership dynamics look at the same sort of thing that Asimov wrote about, but they're much more practical today. And you can think of it fairly simply. Does the cohesion… when you work in a group, if people go the extra mile to get the document out, if they, like Alvaro, will stay up late… Alvaro helped me with a document recently and he wrote pages and pages of comments and they were specific and they really helped us reframe it. And you know, I spent a good 20 hours integrating this into a problematic text and it looks much better now. Now, if someone who's in the IESG or a reviewer will do something like that, it makes a big difference in the process. And IESG members who go the extra mile, who help you or who select directorate members who will go the extra mile, really change the dynamics of how you can get things from a proposal all the way through to a published RFC. And that factor, we look at distribution of people, we look at distribution of contribution, we look at email, we look at GitHub, which my working group, the BGP IDR working group, has switched over to using massively over the last two years. We work at every forum to try to get that, so I know personally, as a working group chair, how that works. But beyond that, there's going the extra mile and that measurement is important. I'm sure that I could find the same things in a working group. It's just the data inside of the IESG is very contained. They write minutes. They put in the Datatracker how things occur and their decisions have direct import. Working groups have some of the same information. I'm hoping after I finish the IESG work and make it easily accessible, that I can go back to working groups. I hope this is interesting. I've given some slides, but I'm hoping to not only have slides but data that you can utilize, at least through 2016, by Vienna.

Ignacio Castro: Sue, fantastic. That's super interesting. Really looking forward to it. I don't know if anyone else has any other questions or brief comments. Okay, very well. I was wondering, Jennifer, could you briefly talk a bit about the Datatracker and stats, since I think that to a great extent we all use your resources, for which we are eternally grateful, even if sometimes we use the API a little bit more intensively than we should. We always apologize. Sorry about that.

Jennifer Richards: Sure. That's all right. That's what it's there for. So hi, I'm Jennifer Richards. I'm with the tools team. I'm a developer there and I work primarily on the Datatracker. I guess just my own background, I spent a long time doing research in astronomy, so I'm interested in a lot of the work that's going on here. It's very different, but I'm seeing some of the same sorts of general considerations, which are different from my current life as a developer. So happy to be here. I guess one thing Alvaro had asked me to mention, I'm going to post a URL here in the chat. There is a statistics section in the Datatracker. It's a little bit minimal at the moment. There had been some statistics, going back a long time before I was involved here, that had sort of fallen into disrepair, so it's empty. But we've been planning for some time to improve that. And Eric Vyncke, who I'm sure many people know, has been helping by contributing some plots that we have there, starting with some country and affiliation statistics related to the individual meetings and historically. He's continuing to work on that. And this is an area where things that are of interest, I would say, to the community at large, and especially if they're relatively straightforward to calculate, we do have a hook where things like that can go in and plots can just be made available. So I think that's just something that people should be aware of.

And I guess the other thing Ignacio mentioned, the API. What we call the V1 API is pretty raw access to the data that are collected in the Datatracker. It's virtually everything, excluding some private things like messages that are sent and other details that shouldn't be leaking out. Basically anything that's in the Datatracker is available in a read-only mode there. It was implemented a while back in a very raw way, so it requires pretty intimate knowledge of how things are modeled within the Datatracker. And we've tried to work, I know I've interacted quite a bit with Colin and with others. I'm not exactly sure how they know what they know about what we do inside the Datatracker, but I'm impressed with what they're able to pull out of it. We're happy to assist where we can. But something to be aware of is we are planning to come out with another API revision. It won't be V2 because that's something else. I'm not sure what its name will be. I can't speak to what our timeline on this is. But an aim is to get away from coupling it very, very tightly to the internals of the Datatracker and try to make it more stable and friendlier to use. That's really something that we're… that's the high-level picture of what we'd like. We're getting some experience with that in work we're doing with the RPC, the RFC Production Center, revamping their tools. We've been experimenting with approaches to doing that API development, building it on OpenAPI schemas and things to try to make code generation more possible and to try to sort of insulate what users see from how it turns out to be convenient for modeling, and really with the aim that your tools won't break without warning every time we do a Datatracker release. So I think that's what I have to say. I'd be happy if anyone has questions, but I'm really happy to see people using the data that we're collecting. So thank you.
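For readers who have not used the read-only V1 API, a minimal sketch of paging through one resource might look like the following. Several things here are assumptions rather than statements from the discussion: the `doc/document` resource path, the `format` and `limit` query parameters, and a response layout with an `objects` list and a `meta.next` link. The draft names in the stub are invented, and the demonstration runs offline against that stub so that nothing is fetched from the live service.

```python
import json
import time
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

BASE_URL = "https://datatracker.ietf.org"


def first_page_url(resource: str, limit: int = 100) -> str:
    """Build the URL of the first page of a resource (assumed URL layout)."""
    query = urlencode({"format": "json", "limit": limit})
    return f"{BASE_URL}/api/v1/{resource}/?{query}"


def http_get_json(url: str) -> Dict:
    """Fetch one page over HTTP and decode it as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def iter_objects(url: str,
                 fetch: Callable[[str], Dict] = http_get_json,
                 pause_seconds: float = 1.0) -> Iterator[Dict]:
    """Yield every object, following the assumed meta.next link page by page."""
    next_url = url
    while next_url:
        page = fetch(next_url)
        yield from page["objects"]
        next_path = page["meta"].get("next")
        next_url = urljoin(BASE_URL, next_path) if next_path else None
        if next_url:
            time.sleep(pause_seconds)  # pause between pages to limit load


# Offline demonstration: a stub stands in for the network (invented data).
start = first_page_url("doc/document", limit=2)
second_path = "/api/v1/doc/document/?format=json&limit=2&offset=2"
fake_pages = {
    start: {"meta": {"next": second_path},
            "objects": [{"name": "draft-a"}, {"name": "draft-b"}]},
    urljoin(BASE_URL, second_path): {"meta": {"next": None},
                                     "objects": [{"name": "draft-c"}]},
}
names = [obj["name"] for obj in
         iter_objects(start, fetch=fake_pages.__getitem__, pause_seconds=0)]
print(names)  # ['draft-a', 'draft-b', 'draft-c']
```

Passing the fetch function as a parameter keeps the paging logic testable without network access, and would make it easier to swap in a different client if a later API revision changes the response layout.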

Ignacio Castro: Thank you very much, Jennifer. Are there any questions or comments? Will the API be published in an internet draft? Sue asks.

Jennifer Richards: I don't know. You know, this is still something we've sort of discussed and have on the roadmap, but we don't have a concrete plan for how that will roll out. Although that's an interesting suggestion. Thanks.

Ignacio Castro: I'm certain that a number of people in the group will have opinions about the API. I'm sure they will.

Alvaro Retana: I mean, to be fair, this is the IETF. People have opinions about most things.

Jennifer Richards: Yes, I have noticed that in my time here, so we'll try to make everyone happy but not too happy.

Ignacio Castro: Well, that would be easy in the IETF. All right, well, thank you very much. Then, if no one else wants to present… I don't know if there is anyone from the group of Jean who wanted to discuss, and I presume present, ongoing efforts, but if there is no one here, maybe Colin, can I ask you to present your slides?

Colin Perkins: If I can figure out how to do that. Yes, absolutely. And there is a bunch of interesting talks. I think people are doing fantastic work. So I look forward to the presentations in Vienna. All right, is that working? Can everyone see the slides and hear me?

Ignacio Castro: Yeah.

Colin Perkins: All right, great. So thank you for the opportunity to speak. This is some work that came out of the sodestream project which Stephen mentioned earlier. (Slide: Analysing Internet Standards Development Organisation Data) And a number of us put together an initial draft to try and, I guess, structure some of our thinking around how we do analysis of the data from the IETF and the various other internet SDOs and how we can try and combine those different data sets, and work with that data effectively. It's a pretty early draft. It talks about the standards development process and the broader sociotechnical system in which it's embedded. It talks about some of the data which is made available by the IETF for analysis, briefly about some of the data from other SDOs and how to combine that with the IETF data. And it talks about some of the challenges with data processing, ethics, data protection, and so on, and then tries to make some initial recommendations. As I say, this is an early draft. The point is very much to help us in the sodestream project structure our thinking and hopefully to help this group as a whole to, I guess, structure the discussion and thinking about how we process and manage and work with this data. It's early. It's got a number of things missing, not least it is completely missing any references to prior literature, and we certainly need to fix that. But I'm sure there's a number of things in the rest of the document which need work and we're very happy to get input on this and I don't think any of the authors are precious about the contents of the document. So please do send your feedback.

The draft starts out by talking about the sociotechnical system in which the standards process is embedded. And I think it's important to think about the broader context when working with the IETF data and when working with data from other internet SDOs. When we're analyzing this data, we're certainly working with the technical artifacts: with the Data Tracker, with the mailing list archives, with the set of internet drafts and RFCs and presentations and GitHub messages and so on. But we're also working with data about a set of people, a wide and varying set of people with different backgrounds and different interests who move between different organizations. And there's a number of organizations, both businesses and academic organizations, governments and civil society and so on, that are involved. And all the different people and the different organizations have their own interests and goals and strengths and weaknesses and so on. And I think understanding that context, understanding who is involved, both people and organizations, and what their motivations are, is one of the critical parts of this. And I think also understanding the governance process, understanding the context in which those people are working and how that differs between the different SDOs and the different parts of the different SDOs. All that feeds into the standards process, and there's a fairly complex feedback loop of people and organizations making proposals, which get discussed and then lead back through revisions of the documents and so on. And that eventually results in some standards and some implementations of those standards. And those then feed back into the standards process and the whole thing continues. And there's a very complex dynamic that goes on.

We broke this process down and identified a set of pieces of this sociotechnical system: the participants, the organizations, the technical groups, working groups, study groups, research groups, and so on. The different artifacts in terms of the documents, people working with the standards documents, internet drafts, and so on. The collaboration infrastructure in terms of mailing lists, chat logs, session recordings, GitHub discussion, and so on. The governance process, the rules of the different standards organizations, the processes they follow, and the final standards and the implementations that are the result of this process. And I think what's important here is to understand that while we can certainly extract a bunch of metrics, and I'll talk more about those in a minute, the metrics we can find provide evidence but only capture, and can only capture, part of the process. And there are some very critical aspects of this process which are extremely hard to observe, where it is extremely hard to infer what's going on by just looking at the data which is captured. It's hard to understand the culture of an organization unless you're embedded within that organization, unless you understand the way the organization works, unless you understand the unwritten rules and processes people follow, not just the written rules of how the organization is supposed to operate. It's difficult to understand the way people and organizations express and have influence on the process. And some of that's formal, you know, someone is in a leadership role, a working group chair, an area director, or someone. But some of that is also just influence gained from having a reputation of knowing what one is talking about. And again, inferring this and inferring who's driving the agenda and who's leading the organization and who's following is extremely difficult to do from just looking at the data. Yet it's critical to understanding the process. 
It's hard to see the informal discussion that happens. We can measure the mailing list discussion, we can measure the GitHub discussion, we can see what happens in a working group. What, of course, is missing from the data we can capture is the discussion which happens over dinner or in the bar or in the hallway. And the informal chat and negotiation which drives a lot of the process. And understanding the way people exercise power, the way people exercise authority, the culture, the agenda setting, the influence, is extremely challenging. And I think also just looking at the metrics and the artifacts and the other data, not only is it necessarily incomplete through no fault of the people collecting the data, that's just the nature of the process, it differs greatly in the accuracy of the data which is collected, the representativeness of that data, and perhaps the relevance of the data. And it's easy to focus on particular things which are easy to measure, which are not necessarily important or don't necessarily give a good view of the organization as a whole.

Looking at the IETF specifically, the draft talks about the type of data which is available. It looks at the Data Tracker and what's made available through the Data Tracker API and tries to summarize the key points of data which are made available there. It talks about the mailing list archives, the RFC Editor data, and a bunch of other things. The IETF is actually a great organization to study because it makes so much data publicly available. There's about 3 million emails in the mail archive, for example, going back to, in some cases, I think, the late 80s. There's five and a bit million records in the Data Tracker. If you just pull the mail archive and the Data Tracker, that's 40 gigabytes of data before you start looking at the RFC text, the internet draft text, the presentations, the session recordings, the agendas, the liaison statements, the IPR declarations, and all of that data. And that's before you start combining it with data from GitHub and all the various other sources of data which you might want to bring in to get a more complete view of the system. So there's an awful lot you can pull out of the IETF. The various other talks before, we've seen a bunch of people doing really interesting things with this and it's fantastic the amount of data the IETF makes available and how useful it is.

Across the internet governance ecosystem more broadly, we see, I think, a mix of availability of data. Some of the SDOs vary significantly in their working model compared to the IETF, with different degrees of openness and different degrees by which you can participate openly or as part of an organization or a government delegation or whatever it is. There's different amounts of data which are made available either publicly or to members or so on. And the amount of data you can get, who has access to that data, how the different organizations work, how their governance is structured, varies tremendously and makes it very difficult to sometimes collect data, but when you can collect data, it makes it difficult to compare data across organizations. There's a lot of people who work across organizations. There's a lot of standards that are developed across organizations. An example from the IETF might be the WebRTC standards, which were done jointly with W3C, but there's a bunch of similar things across different SDOs. And pulling the data from the different organizations, putting it together, and doing the entity resolution to see that this person in this organization is the same as this person over here and understanding how they contribute and how they work across organizations, it is very much a challenge.

In terms of data processing, as Jennifer said, there's an awful lot of data available in the IETF Data Tracker. We've got the mailing list archives, we've got all sorts of other sources of data. Working with this data is a challenge and I think anyone who's tried to work with these types of data sources will be aware of some of the difficulties. Entity resolution is a tremendous challenge: identifying people, knowing that this person you see in this organization in 2025 is the same as the person with a similar name in a different organization 20 years before. Even just identifying people consistently across the IETF is hard, as they phrase their name in different ways and they change their jobs and work on different topics. Identifying organization names is a tremendous challenge. I looked at the Data Tracker and found 282 different variant spellings of Huawei, for example. And I think Huawei is perhaps the organization with the most names, but it's certainly not unique, in that all of the organizations have so many different ways of describing them. Putting all this together and identifying what is the same organization, and in what context, is hard, because if we just look at all these names on the slide, they're clearly all Huawei, but they're different parts of Huawei, and in some cases should be treated as the same and in other cases should perhaps be treated as different organizations. So there's challenges there. There's challenges tracking affiliations, and this is something where the Data Tracker doesn't track changes in affiliation well. There's challenges in reconstructing document lifecycles: who had what role in the IETF, who was a working group chair, who was an area director at what time, which area directors were responsible for which groups, who's managing the discussion for certain documents, for example, is tricky at times. 
Working with the mailing list archive is a tremendous challenge, not because it's badly archived, it's very well archived and it's very easy to access the data, but because it goes back 30 years, 40 years, and there's a lot of malformed or poorly structured messages in the archive and modern tooling just falls over when you access it. And I think anyone who's tried to put the IETF mail archive into the Python standard libraries for email processing will rapidly realize that there's an awful lot of messages which do not process because IETF participants have an interesting choice in mail clients at times. And of course the data we have has changed over time and some of the records are incomplete, especially as you go back past the last 20 years or so. And certainly for the IETF, a lot of the old mailing list archives and a lot of the records of the discussion from the 90s and the early 2000s, for example, are missing or incomplete.
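The mail-archive difficulty described above can be illustrated with a minimal sketch. It is not the speaker's tooling: the function name `parse_archived_message` and the choice of fields are assumptions made for illustration. It uses only the Python standard library and takes a best-effort approach, recording a missing value rather than dropping a message whose headers do not parse.

```python
import email
from email import policy
from email.utils import parseaddr, parsedate_to_datetime


def parse_archived_message(raw: bytes) -> dict:
    """Best-effort extraction of a few headers from one raw archived message.

    Hypothetical helper: field names are illustrative, not a Data Tracker schema.
    """
    # compat32 keeps headers as raw strings and avoids structured header
    # parsing, which tends to be more forgiving of old or malformed mail.
    msg = email.message_from_bytes(raw, policy=policy.compat32)
    record = {
        "from_name": None,
        "from_addr": None,
        "date": None,
        "subject": None,
        # The parser records problems here instead of raising.
        "defects": [type(d).__name__ for d in msg.defects],
    }

    raw_from = msg.get("From")
    if raw_from is not None:
        # parseaddr returns ('', '') when it cannot find an address.
        name, addr = parseaddr(str(raw_from))
        record["from_name"] = name or None
        record["from_addr"] = addr.lower() or None

    raw_date = msg.get("Date")
    if raw_date is not None:
        try:
            record["date"] = parsedate_to_datetime(str(raw_date))
        except (TypeError, ValueError):
            # Keep the message, but leave the date unset for later review.
            record["date"] = None

    subject = msg.get("Subject")
    record["subject"] = str(subject) if subject is not None else None
    return record
```

Keeping the list of parser defects next to each record makes it possible to count how much of an archive parsed cleanly, which is useful when reporting how representative a derived data set is.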

Yeah, I mean, I think I mentioned this briefly earlier. Ethics and data protection are a big challenge when working with this data. The IETF has a bunch of policies around this. It's clearly sensitive data. It's data which has legal restrictions on usage and the way it can be processed, and those restrictions vary in different jurisdictions. And it's easy to forget that the regulations that apply for researchers using that data may differ from those that apply to the IETF, because the data was collected for the purposes of operating the standards process and may not always be usable by people using it for other purposes. And I think it's important to realize that while the data is public, the implications of it are not always well known, and it's important to be careful about what can be extracted and be careful about what you publish. Certainly, working with this data, it's straightforward, or relatively straightforward, to identify people who are very successful or people who are perhaps less successful at different parts of the IETF, you know, different parts of their role in the IETF, and publishing that without care is potentially a concern and could potentially affect those individuals. So we need to be careful about how we use this data. And yeah, people working with this data need to talk to their research ethics committee, their institutional review board, or whatever the equivalent is in their organization. And I think people need to be aware that this is, to at least some extent, critical infrastructure, right? 
The operation of the standards process matters, and if it's broken because of people hammering on the Data Tracker and affecting a meeting, or if it's broken because you're publishing things which disrupt the operation by causing fights in the organization, for example, you know, this is a problem and we need to be careful to proceed with care and not to disrupt people's livelihood or disrupt the operation of the standards process.

And I think to wrap up, the draft tries to make some recommendations. I think some of these are perhaps obvious, some perhaps less so. It tries to draw some conclusions about what the IETF should do in terms of maintaining data quality and potentially how it can safely backfill some of the historic data. And it makes some suggestions for what researchers should do and how you should use the data and to what extent you need to be careful when using the data. And we would certainly appreciate people's input on these recommendations. There's a bunch of things which I'm sure are missing and I'm sure there's a bunch of things which people can discuss and have opinions on and some may be incorrect or certainly incomplete. And yeah, so that's about all I have. Thank you for your time. As I say, there's a bunch of things we can maybe talk about and I'd be happy to talk about the rest of the document as well. Thank you.

Ignacio Castro: Thank you very much, Colin. Do we have any questions or comments for Colin? There was a comment from Sue and Priyanka. I don't know if you want to make it directly.

Sue Hares: Sure. I'll go ahead, Colin. Two comments. I found that another loop that affects the IESG in my study and in my experience is the IT industry economic environment. You know, when you see data centers growing, you'll see a growth in presentations that try to create protocols, either management or transport area or application or even network layer. So that economic loop might be useful to be added to your diagram.

Colin Perkins: Yep. Absolutely.

Sue Hares: In my study, I've found that those forces, a SWOT from those forces (strengths, weaknesses, opportunities, and threats), could explain quite a bit of the focus of a particular IESG, the output of a particular time period, and they were oftentimes the main focus of some of the presentations by IETF chairs. So you might find that useful. IETF chairs oftentimes are like a person driving an old-fashioned covered wagon with six horses trying to handle all the reins. They are amazing people.

Colin Perkins: Yeah, I don't envy them. That's a difficult job.

Sue Hares: A difficult job, but it actually impacts their ability to do that job, and the IESG and the working group chairs affect how the organization moves.

Colin Perkins: Absolutely.

Sue Hares: As to culture, speaking from someone who has a PhD in a discipline which studies organizational culture as well as group culture, there are scholarly techniques for looking at organizational cultures that provide measurement points and ways, frameworks to look at cultures that may be useful in your work. If you need references for that, I'd be glad to help you with the references or help you with the generic models that seem to be the current best of scholarly work.

Colin Perkins: Yeah, yeah. I mean, I think we've come across some of that, but I would certainly be interested in getting your input and getting some references to make sure we've not missed anything. Yeah, absolutely. I mean, we've had presentations by people like Corinne Cath as well in the past that have spoken about things in that space to some extent. So but yeah, the more input on that, I think that's something we are missing certainly. So yeah, I'd certainly appreciate input.

Sue Hares: Both of those features are something that would be helpful. In addition, there are standards for use of data that relates to human beings in the social sciences and in the organizational leadership field. Again, those standards you might simply refer to and give that information. So those are three points I will… I suspect the best way to be specific is to send these in to the RASP group with an analysis of your draft. Am I correct on that?

Colin Perkins: Either send it to the RASP group or raise issues on the GitHub repo for the draft. Yeah, whichever you prefer. If it's something that's specific and easy to incorporate, then raise issues, and if not, then we can talk on the mailing list.

Sue Hares: Yes. I will do that. I will do both. Thank you very much.

Colin Perkins: Thank you.

Ignacio Castro: I think there was a question from Priyanka.

Priyanka Sinha: Hi, Colin. I guess I'm… I think I made a mistake, as in I already assumed all of this is already done. I was too excited about all of this and not having done much, feeling guilty. But I think the main thing is that I was thinking probably once this is all going more into the research group and becoming more standardized, maybe we could consider adding more exogenous data, like, you know, if some organization which is participating in the IETF wants to give their data so that we could improve our understanding, because Colin specifically said that it's as good as the data, right? The data can be exogenous to… may not be within the IETF. If somebody provides that data, whether it is their internal meetings data or their internal email data. For example, Linux kernel mailing list is where a lot of the adoption happens, right? So that data is also there. But I think this is very silly of me because I think I jumped the gun. This can come much later.

Colin Perkins: Yeah, I mean, this is the first draft and, yeah, we're certainly looking for input on what we should include and, yeah, I mean data from outside the IETF, whether that's things like adoption of standards and people talking about what's useful or not is certainly one part of that. I guess other parts of that might be patent data, for example, and, yeah, I guess some market share and company dynamics. But yeah, there's lots of things that could be fed into this and understanding how to combine all the data sources and what insights we can get is, yeah, it's something I don't have a good insight into and it'd be really interesting to look at.

Priyanka Sinha: Yes, yes, Colin. And I fully support adoption of this draft. This looks great as it is. I mean, just feeling guilty for not having worked more.

Colin Perkins: There's plenty of time to contribute.

Ignacio Castro: Any other comments or questions? All right, thank you. Right, so that was the overview of all the ongoing efforts so far, and I presume that we will be seeing more of these probably in Vienna. So I don't know if anyone else wants to raise or discuss any other issues. Looks like there is quite a bit of space for common contribution, common effort. Not sure if anyone wants to discuss or mention anything on that front, or if that's something that would be easier to do in Vienna. I think Jean asked in the past about GitHub. We already have a GitHub for RASPRG. So people are welcome to use it. All right, so I don't know if I'm missing anything. Alvaro, is there anything else that we are missing or have forgotten to say?

Alvaro Retana: No, not at this point. So if you guys go look at the agenda, I think we budgeted 20 minutes for the review of the existing work and we're almost at 80 or so, which is great, I think, because this shows that there's a lot of interest in the topic. And, you know, listening to what everyone's working on, there are a lot of commonalities. You know, a lot of information or a lot of the research output is based on common information: information about participants, information about affiliations, information that needs to be, ideally, treated in similar ways, right, so that we can be talking about the same things. Whether it is participation in Latin America, as Marcelo said, or how do we even treat the different entities, 286 Huaweis, or whatever that is. We need to treat it in a similar way. Looking for example at the statistics in the Data Tracker, one of the graphs that Eric put up there is affiliations, right? How many people from a specific affiliation. So somehow he came up with one Huawei. Ideally, we're all looking at the same set of people that are part of that affiliation. And I think that's where probably we want to sort of go from here, right? You know, what are the things that we want to see be common? How do we, I don't know… and this was part of the discussion in China: how do we maybe clean the data in the same way, right? So that we're looking at the same things as we report things out. So right now, as I said, I think that it's great that there's so much work. We're obviously not going to cover the whole agenda now. So I guess I want to throw it back to you, Ignacio, or to the group in general. We can do several things with the rest of the time, right? We can keep talking about the types of data that we want to have normalized somehow now. We can do that in Vienna. We can do that at another interim. 
I don't know if that's a lot easier for people to get together, I don't know, sometime towards the end of this month or in June before the IETF in Vienna. What do you guys think?

Ignacio Castro: Anybody has any thoughts on this front? Plans for another interim. We can have another interim if there is demand for it. Ah, Jennifer has an approach for normalizing affiliations. Okay, that's great. We did spend quite a bit of time doing that ourselves, and I think this is one of those cases where we have multiple people doing the same effort. So indeed, great to combine it.

Jennifer Richards: If I can just, so that's code that Eric contributed. It's simple and I suspect, you know, works well enough for his goal of showing the top 20 in the last 10 years kind of thing. So that's more, I think, for information as to what we're doing rather than a "please do it this way" claim, just to put that in context.

Colin Perkins: Yeah, we have something which looks like a fancier version of this, which also takes into account the email domain to help with the matching and tries to map identifiers, you know, it matches email addresses and GitHub identifiers and Data Tracker IDs and things like that to try and figure out who's the same person, perhaps with the same organization. But yeah, it's a challenge to do well. I'm slightly wondering whether it might be something where it's useful to have a hackathon or something and have people bash on the data with different approaches to see which works best and try and figure out some metrics.
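One common way to link identifiers of the kind mentioned above is a union-find (disjoint-set) structure. The sketch below is an illustration only, not the matching code described by the speaker: the class name `IdentityLinker`, the `(kind, value)` identifier tuples, and the example values are all hypothetical. Each observed pairing, such as an email address seen alongside a GitHub handle, merges two clusters; every cluster is then a candidate "same person".

```python
from collections import defaultdict


class IdentityLinker:
    """Union-find over identifiers such as ("email", "jane@example.org").

    Hypothetical sketch: it only merges pairs it is told about and makes
    no attempt to decide whether a pairing is trustworthy.
    """

    def __init__(self):
        self._parent = {}

    def _find(self, ident):
        # Register unseen identifiers as their own singleton cluster.
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path at the root.
        while self._parent[ident] != root:
            self._parent[ident], ident = root, self._parent[ident]
        return root

    def link(self, a, b):
        """Record evidence that identifiers a and b belong to one person."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_person(self, a, b):
        return self._find(a) == self._find(b)

    def clusters(self):
        groups = defaultdict(set)
        for ident in list(self._parent):
            groups[self._find(ident)].add(ident)
        return list(groups.values())


linker = IdentityLinker()
linker.link(("email", "jane@example.org"), ("github", "jdoe"))
linker.link(("github", "jdoe"), ("datatracker", "12345"))
```

Because links are transitive, a single wrong pairing merges two people, so in practice the hard part is deciding which pairings to trust, not the data structure itself.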

Ignacio Castro: Yeah, I recall we even looked at mergers and acquisitions as they have happened in the IETF, and of course sometimes people retain the old email address for a while and it just makes things a little bit more complicated, in particular at a time where maybe priorities might be changing. So yeah, that might be a good idea for a hackathon. Jie, you were asking about an interim. If there is demand for it, we definitely could organize it. Did you have something specific in mind?

Jie Bian: No, not in particular. It's just that I remember you mentioning in one of the emails on the list that, as I understood it, it's procedural that after an interim there's a follow-up interim one or two weeks later. That's if I read your email properly; I might have gotten it wrong.

Ignacio Castro: I think that was because you suggested having two interims, and I was happy to, but that was the main reason, to be honest. I think it's great to have one if there is demand; if there is not, it might be better to just have the conversation on the mailing list. We can also decide on the fly: if there is conversation and there is some work that could benefit from putting heads together, then I think it's a great idea.

Jie Bian: Yeah, yeah, it's fine. No problem. Thank you.

Alvaro Retana: So, you know, it seems to me that sort of the next step is: are there, I'm going to call them tools, that we want to have common tools for, like the normalizing of the affiliations, for example, right? Is that the tool that we want to have? Should we work on that, you know, do PRs or whatever to the work that has been done in the Data Tracker? Are there other ways of doing it, which is, you know, to Colin's recommendation of maybe do a hackathon, right, along those lines? If we have those types of discussions before the IETF, then we could have a hackathon in Vienna, right, and organize that. But of course, we need you guys who are doing the research to come up with, you know, what are the common tools, who wants to work on those things. And if that is the type of discussion that we can have before July, before the next IETF, then if we can get people committed to talking about that, right, then it might be a good use of the time before that. And we don't need maybe people to volunteer today, but let's say in the next week or so, if we see interest on the list and people say, "Yes, I want to… these I think are the three tools that we need" or something, and there's maybe some code script, you know, whatever that we can start looking at, maybe there's something that we can organize in the next month or so. Just the proposal, just thinking out loud.

Ignacio Castro: Yeah, that sounds like a great idea. Maybe from a very practical perspective, we could look at things in two different ways. So there is at the moment the draft that Colin has presented and, as he said, we are not precious at all. It's a starting point and maybe that could help to coalesce the more high-level visions that we have on this topic, the different sources, the problems that we encounter, the common approaches to do different things. And that maybe could serve to put it all together. And at the same time, as several of us are doing practical research, producing code, producing output, maybe when we have something that is kind of specific, maybe a little bit more self-contained, we could propose it in the group and maybe discuss whether to include it in the GitHub repository. And we could use maybe that as a way to create tools that we can discuss, we can test and we can share.

Colin Perkins: Yeah, I guess there's also… I mean, one way of doing this might be to share tools. Another way might be to share test cases or data sets. You know, I don't know whether entity resolution, for example, is so complex that we need a single tool or whether it's something where we just need to identify a set of common cases which trip things up so everyone could just go and implement the 20 lines of code or whatever it needs to do it. And I suspect it's more complex than that, but it may not be so complex that everyone needs to standardize on a single way of doing it. I don't know.
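The idea of sharing test cases rather than a single tool could look roughly like the sketch below. Everything in it is hypothetical: the organization "Example Networks" is fictional, the case list is not taken from the Data Tracker, and `normalize_affiliation` is just one small candidate normalizer. The point is that any implementation can be checked against the same agreed list of inputs and expected outputs.

```python
import re

# Trailing legal-form tokens to drop; an illustrative, incomplete set.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "co", "corp", "corporation", "gmbh", "limited"}


def normalize_affiliation(name: str) -> str:
    """One candidate normalizer: lowercase, strip punctuation and legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


# Shared cases: (raw affiliation string, agreed canonical form).
# Fictional examples; a real list would be curated by the group.
SHARED_CASES = [
    ("Example Networks", "example networks"),
    ("Example Networks, Inc.", "example networks"),
    ("EXAMPLE NETWORKS CO., LTD.", "example networks"),
    ("  example   networks ", "example networks"),
]


def failing_cases(normalizer, cases):
    """Return (raw, expected, actual) for every case the normalizer gets wrong."""
    failures = []
    for raw, expected in cases:
        actual = normalizer(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures
```

Running `failing_cases(normalize_affiliation, SHARED_CASES)` against each group's own implementation would show where approaches disagree. Policy questions, such as whether different parts of one company should map to the same canonical name, would then live in the case list rather than in any one tool.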

Ignacio Castro: Yeah, you're right. You may not need a single tool, you may need a single set of criteria, and everyone does their own thing. Yeah. My use of the term "tool" was very, very generic. Okay, very well. I don't know if anyone else has any other comment. Well, if no one else has any other comment, I think we might have recovered 30 minutes for your personal lives or for the next task on the list. Okay, thank you very much, everybody. Great to see you all and looking forward to seeing you in the next one or in Vienna at the latest. Bye, everybody.

Alvaro Retana: Bye.