Session Date/Time: 10 Jun 2026 14:00


CBOR Interim Working Group Meeting Transcript

Paul Hoffman: I'm going to give it one more minute. Rohan just texted me on some different place saying that he's about to come on, and since he's the first presenter... ah, and there he is. Very good.

Okay, good morning, good afternoon, good evening. This is yet another CBOR interim working group meeting. I have filled in the agenda a little bit, or actually, somebody started the agenda for me, thank you. And we do have a participant list there, so if you are participating, please go put your name in the list.

The agenda is pretty simple, which is we're still trying to finish the working group last call on the EDN literals. We've got a presentation from Rohan and a presentation from Carsten. Time permitting, we will go back to talking about serialization, which fortunately is still going on, slowly. But it seems to me from looking at the list that we're a lot closer on serialization than we are on knowing exactly what we want or what we don't want in EDN literals.

So, unless anyone has any objections... if you do have an objection to us just jumping in, please raise your virtual hand. Not seeing any virtual hands, so first up is Rohan, which means, even though it's 7:00 in the morning, I have to remember how to do slides. Upload, manage slides...

Christian Amsüss: Or I can do that, Paul.

Paul Hoffman: Okay, thank you.

Rohan Mahy: Okay. Hi, everybody. I'm Rohan. Okay, so I think us talking a little bit about what we were trying to achieve with EDN is a good reminder.

I put the things that we want to do with EDN into these three buckets. The first one is just kind of informal stuff like putting things on a whiteboard, and in these cases, you don't need to have a formal syntax. I could go and write a map that contains a text string with no double quotes, or I could put the name of an algorithm from some IANA registry, and people would just assume that it's the—that I meant the integer that that corresponds to without having to do like a specific EDN legal syntax.

We have the uses for EDN to CBOR, like test vectors, like configuration files or documents which are generated automatically. Many of these are used in CI.

And formal examples and specifications, where we're really trying to show a specific feature of how we're using CBOR in a very concrete way. So in the case of test vectors, maybe not so much, but certainly in configuration files and automatically generated documents, implementers need to be very careful: people who deploy this need to make sure that the libraries they use are used defensively, so that we don't end up with somebody making some one-character or one-line change that is basically a supply chain attack, or that borks our entire CI or some production system.

Another main use, and where we got the "D" in EDN, is diagnostics: taking CBOR that's going by on the wire, or that's in a file on the disk, and making it something we can read. So in this sense, we can take EDN and turn it into something that doesn't need to have all of the fancy features that are currently in the EDN for them to be more useful than just looking at a hex dump. But we better be able to express all the valid CBOR that might be there, and there might be some things that are hiding that we really need to be able to see: unusual map orders, non-preferred encodings, indefinite length encoding (excuse me), and non-trivial tags. So being able to express those so that we can see them in that direction is quite important. All right.

So I'm trying to make the case that readability is important and that we should use readability as a sort of guiding star for how we develop EDN, largely because of these (I'm just going to go back temporarily), because of the potential for harm when we take an EDN document that has been deliberately attacked in the EDN-to-CBOR case, and for being able to express things clearly coming from CBOR in the diagnostic case.

All right, so it's easier to spot genuine errors and attacks when the EDN is legible when we're going from EDN to CBOR. And when we're going from CBOR to EDN, we need to be able to make sure that unusual CBOR—things that we would not have expected to be there—are noticed.

And if anybody is wondering why we care about information that is expressed in a different way, there are side-channel attacks and exfiltration attacks which use these kind of hidden channels. CBOR is used in WebTokens, and so this is a logical place to be able to exfiltrate a private key, a secret key, or some sort of private information about perhaps an individual that is authenticating.

And then finally, not to belabor the point, but in general, if we have a spec that lends itself to readability, it's going to make it easier for people to adopt using the CBOR ecosystem. If a developer looks at an EDN document in a spec or an example and they're like, "Yeah, okay, I see what's going on there," then they are more likely to use CBOR than they are if they see something that is hard for them to understand. All right.

Okay, so I have a bunch of these topics. We've talked about a lot of these on the list, many even today, and we're just going to dive in with backtick-quoted strings.

So okay, quick: without counting, how many backticks are in that second line?

There's a psychological principle of how our brains work that we cannot—we cannot just look at that and know how many it is. There are two particular concepts: one is called subitizing, about how many we can see at a glance, which is about four. And then there are how many we can distinguish easily in our working memory, which is seven plus or minus one. And this is relevant because if I have a string of 22 backticks and then I have some stuff, and then I have another string, I can't tell if it's 20, 21, 22, 23, or 24 backticks at the end. And if I can't tell, then that is—that is a great way, that's a footgun—it's a great way for me to have a supply chain attack on my EDN that I'm using for my CI.

We have another place that is totally—the current syntax is totally easy for people to screw up, which is leading and trailing backticks in the current syntax. In general, having leading backticks and trailing backticks are—it just makes the problem of finding the beginning and the end of the string a lot harder. And so I've color-coded what the current syntax allows here, and you know, it's not immediately obvious without this coloring what is in the string and what's not in the string.

So this is the point about my code coloring. My code-coloring point is that many editors only allow you to use regular expressions to go and express this to get code coloring. And if you get code coloring and you have the colors in the wrong place, or some backticks are included which were not included, you're going to get—you're going to get a mess, and the humans are not going to understand, and they're going to make mistakes. And those mistakes can be very costly. They have real implications in the real world.

So, um, I presented here when I submitted the slides yesterday what I—you know, I had seen as some solutions. So the first one is we could simply drop backtick-quoted strings. JSON has been living with escaping quotation marks and backslashes for decades, and we've been living with it in EDN for ten years. That's a possibility.

Now, we've had a decent amount of support for some kind of additional quoting mechanism, assuming we are using backticks. And so the concrete proposal that I made was to restrict us to a maximum of eight and not to allow leading or trailing backticks. Now, you could say, "Well, you know, how do you express leading and trailing backticks?" Well, you could concatenate them. You could use a different quoting mechanism. You can have concatenated strings; we've got that. There's no problem doing that. I think it makes things much more obvious what's going on, and someone who's not familiar with the syntax is much less likely to have the wrong impression about what's going on.

And I care much less about the generalization argument of, "Oh well, you should be able to do this with an arbitrary number," because humans cannot tell the difference between 17 and 16 backticks at the end and will not actually be able to detect whether the quote is closed or not.
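As an illustration of the restriction proposed above (not part of the spoken remarks), here is a minimal Python sketch of a checker for a backtick-quoted literal. It assumes, purely for illustration, that a literal is an opening run of backticks, non-empty content, and a closing run of the same length. The function name `parse_backtick_string` and the `max_ticks` parameter are hypothetical and do not come from the EDN draft.

```python
import re


def parse_backtick_string(literal: str, max_ticks: int = 8) -> str:
    """Return the content of a backtick-quoted literal, or raise ValueError.

    Hypothetical rules (assumptions, not the draft's grammar):
    - the delimiter is at most `max_ticks` backticks long;
    - the content is non-empty and neither starts nor ends with a backtick;
    - the content does not contain the delimiter itself.
    """
    opening = re.match(r"`+", literal)
    if opening is None:
        raise ValueError("literal must start with a backtick")
    n = len(opening.group())
    if n == len(literal):
        raise ValueError("no content between delimiters")
    if n > max_ticks:
        raise ValueError(f"delimiter longer than {max_ticks} backticks")
    # Because content may not start or end with a backtick, the full leading
    # and trailing runs must be exactly the delimiters.
    closing = re.search(r"`+$", literal)
    if closing is None or len(closing.group()) != n:
        raise ValueError("closing run must match the opening run exactly")
    content = literal[n:-n]
    if "`" * n in content:
        raise ValueError("content contains the delimiter")
    return content
```

Under these assumed rules, a reader (or a regular-expression-based syntax highlighter) only has to compare two short runs of backticks to find where the string starts and ends.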

Вадим proposed fenced backticks, but this means that there are things that we can't express that have leading and trailing whitespace, and that's hard to deal with in that case. All right, um, I'm going to... I'm going to let there be a brief discussion, I suppose, or should I...

Paul Hoffman: Actually, can you just please continue? This would be a presentation because we're going to have to have the discussion, more discussion on the list. So instead of trying to do each one, remembering that we only have another 45 minutes, I think it's better to have your presentation continue, have Carsten's presentation, and then if there's more discussion time, great. But this is all for the list, not for here.

Rohan Mahy: Got it. Okay. All right, we also have four comment types. I gave some examples of pathological comments. You know, I don't think most people are going to be able to tell at a glance what the quoted string actually is in these examples.

I'm bringing this up because I think that we want to have some style guidelines for how to use comments, and I want to prevent us from using—from proliferating comments inside of quoted strings because I think that, again, this is another huge footgun.

So, I can't think of very many other contexts in which you can have comments in quoted strings. Regular expression comments are the other thing that comes to mind. But for backwards compatibility, we allow comments inside of a hex string and a b64 string. And the, you know, notably the characters that you're allowed to use in these two are different.

So, I'm, you know, we could—we could take out the C-style comments that we've allowed. They actually allow no new functionality for us. But I don't think that that's, you know, that's not something that I really have a strong feeling about, and we did have a consensus call on this, so I don't wish to go and re-litigate that.

However, I have been very adamant that I do not want us to put EDN-style comments into new quoted string extensions. There are, as Carsten says, 1D strings which don't need comments on the inside of the quoted string, and there are 2D quoted strings, and many of those are their own languages; if they're complicated enough to require a comment, they probably already have their own comment syntax, which may or may not be the same as our comment syntax. We don't need to insert another comment syntax into quoted strings.

I also think we want to say, you know, when to use these kinds of comments, and I think that for the single-slash comments, the only time they should appear before a value is right before the name in a map key. All right. Christian, I can't tell what your comment is about.

Christian Amsüss: Yeah, just as a clarifying comment: I don't think it has ever been proposed that anything else would have EDN-style comments.

Rohan Mahy: Carsten has proposed this on multiple occasions. So, it was in the—it was even in the ABNF for a while. Anyway, moving on.

Optional commas. So right now, we have this feature of allowing commas that separate things to be optional as long as there is at least one space or comment. This can result in some pretty ugly and unreadable syntax. Again, the first one isn't even an attempt to be obfuscated; it just happens to be hard to read. And the bottom one is obviously like super easy to make a mistake in.

I think that from a style perspective, we should just say, "Please use a comma." If you have a new line at the end of your value, that's probably perfectly readable, but in every other circumstance, we don't really want to be doing that.

Okay, string concatenation. So right now, we have an explicit plus sign that we can use for concatenation. We do have one issue which has been discussed separately on the list today, which is that this is one of the only features in EDN where encoding indicators don't really make sense without some additional guidance.

So, looking at some of these examples, you know, I could have a null and the word "hello" and a tab and the word "world," and I could set encoding indicators on each of these individual fragments, but what we do on the concatenated string, we need to decide what that is. I made a proposal which is just that you take the highest number of those.
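To make the "take the highest" proposal concrete (this is an illustration, not part of the spoken remarks), here is a minimal Python sketch. It assumes that each fragment carries an optional encoding indicator `n` in the range 0 to 3, meaning a length argument of 2**n bytes as with `_0` to `_3`, and that the concatenated text string is encoded with the largest indicator present. The names `encode_head` and `concat_text` are hypothetical.

```python
from typing import List, Optional, Tuple

# (text fragment, encoding indicator 0..3 or None when no indicator was given)
Fragment = Tuple[str, Optional[int]]


def encode_head(major: int, argument: int, indicator: Optional[int]) -> bytes:
    """Encode a CBOR head; indicator n forces a 2**n-byte argument."""
    if indicator is None:
        if argument < 24:
            return bytes([(major << 5) | argument])
        # Preferred serialization: the shortest argument width that fits.
        for n in range(4):
            if argument < 1 << (8 * (1 << n)):
                indicator = n
                break
        else:
            raise ValueError("argument does not fit in 64 bits")
    size = 1 << indicator
    if argument >= 1 << (8 * size):
        raise ValueError("argument does not fit the requested width")
    return bytes([(major << 5) | (24 + indicator)]) + argument.to_bytes(size, "big")


def concat_text(fragments: List[Fragment]) -> bytes:
    """Concatenate text fragments; the result uses the highest indicator seen."""
    text = "".join(t for t, _ in fragments)
    indicators = [ei for _, ei in fragments if ei is not None]
    combined = max(indicators) if indicators else None
    payload = text.encode("utf-8")
    return encode_head(3, len(payload), combined) + payload  # major type 3: text


# A NUL plus "hello" marked _0 and a tab plus "world" marked _1 give a 12-byte
# string whose length is encoded in two bytes (the _1 width).
encoded = concat_text([("\x00hello", 0), ("\tworld", 1)])
```

Other rules are possible (for example, rejecting mixed indicators); this sketch only shows that the "highest wins" rule is simple to state and to implement.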

And, you know, basically, I think that there's an additional point that I make later on in this deck that might be enough for us. Вадим also made a proposal to have a sort of a concat or concat_ something that concatenates strings. I looked at some examples of this, and I think that while this is kind of a cool idea, I think it results in slightly poorer legibility in complex cases. But welcome more people to take a position on that.

All right. Indefinite length strings. So, we have a bunch of special case stuff for dealing with indefinite length strings. I think having a—instead of using the current syntax, using a sequence extension here instead would be a brilliant way to make this more explicit and clear and easier to use and much less error-prone.

So, you could do something like this: you could have a tstring-i or tstring-indefinite, and you have the individual segments that you want incorporated there. And this allows us to put an encoding indicator inside of our individual parts that are sent individually. It also allows us to get rid of the single quote, single quote underbar and double quote, double quote underbar for an empty tstring or bstring fragment.
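For reference (an illustration, not part of the spoken remarks), this is what an indefinite-length text string looks like on the wire: the byte 0x7f, then each chunk as a definite-length text string, then the "break" byte 0xff. The Python sketch below assumes a hypothetical extension along the lines of the `tstring-i` mentioned above would hand its segments to something like `encode_indefinite_text`; that function name, and the mapping from the extension to it, are assumptions. To stay short, it only handles chunks under 24 bytes.

```python
from typing import Iterable


def encode_text_chunk(chunk: str) -> bytes:
    """Encode one definite-length text string (short chunks only)."""
    payload = chunk.encode("utf-8")
    if len(payload) >= 24:
        raise ValueError("sketch only handles chunks shorter than 24 bytes")
    return bytes([0x60 | len(payload)]) + payload  # major type 3, immediate length


def encode_indefinite_text(chunks: Iterable[str]) -> bytes:
    """Wrap the chunks as an indefinite-length text string: 0x7f ... 0xff."""
    return b"\x7f" + b"".join(encode_text_chunk(c) for c in chunks) + b"\xff"


wire = encode_indefinite_text(["Hello", " world"])
```

An empty chunk needs no special notation in this model: `encode_indefinite_text([""])` simply produces a zero-length chunk between the start byte and the break.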

Carsten, could you please in the chat, could you please explain what you mean by the...

Paul Hoffman: Rohan, Rohan, no, please, let's just keep going here. We really want to have discussion on the mailing list. We're already supposedly done with working group last call, we're trying to find places where there is not consensus, and having a discussion here that does not appear on the list is actually massively unhelpful for finding consensus.

Rohan Mahy: Fair enough. Okay, encoding indicators. So this is a place where we've had some, let's say, some weakness in the implementations that are out there.

But this is extremely important for two cases: for test vectors, and then for receiving CBOR on the wire and turning it into EDN if what we receive is not in preferred encoding.
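The second case can be sketched as follows (an illustration, not part of the spoken remarks). This minimal Python function reads a single CBOR unsigned integer and renders it as EDN, adding a `_0` to `_3` encoding indicator only when the wire encoding is longer than the preferred one. The name `uint_to_edn` is hypothetical, and the sketch handles only major type 0.

```python
def uint_to_edn(data: bytes) -> str:
    """Render one CBOR unsigned integer as EDN, exposing non-preferred widths."""
    if not data:
        raise ValueError("empty input")
    major, info = data[0] >> 5, data[0] & 0x1F
    if major != 0:
        raise ValueError("sketch only handles unsigned integers (major type 0)")
    if info < 24:
        return str(info)  # value carried directly in the initial byte
    if info > 27:
        raise ValueError("not a valid unsigned integer head")
    n = info - 24  # additional information 24..27 corresponds to _0.._3
    size = 1 << n
    if len(data) < 1 + size:
        raise ValueError("truncated input")
    value = int.from_bytes(data[1:1 + size], "big")
    # Work out which width preferred serialization would have used.
    if value < 24:
        preferred = None
    else:
        preferred = next(k for k in range(4) if value < 1 << (8 * (1 << k)))
    return str(value) if preferred == n else f"{value}_{n}"
```

With this, `0x05` and `0x18 0x05` both decode to the value 5, but only the first prints as plain `5`; the second prints as `5_0`, so the unusual encoding stays visible to the reader.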

I already mentioned this problem with concatenated strings a couple of slides ago. So, I think that this might be one of those places where Carsten has expressed a desire to do things like be able to do map ordering with an encoding indicator. I think we have a very clear link between the encoding indicators that were in previous versions of EDN and the underbar-i addition. They basically allow us to express things that are directly tied to how an individual element in CBOR appears on the wire. And if we needed to have 128-bit integers or lengths, we could add the underbar-4 encoding indicator at that point.

We could, you know, we could have two different extensibility points: we could have one for encoding indicators, or we could have one for application extensions, or we could have application extensions be able to represent things that have encoding indicators. And I think that this would make a lot of sense.

All right, so the consequences of two of the things that I mentioned (one is being able to express encoding) have to do with what the meaning is of a return value, sort of, for an application extension. So the spec doesn't actually say one way or the other what we do. There have been basically two mental models: Carsten's is that the output of the extension is effectively morally equivalent to legal EDN, another EDN text. And so, you know, I've given two examples here where you take an application extension, and then what that actually looks like in the new EDN text that replaces it.

Joe and Вадим basically assumed that this was instead that the output of an extension was a sequence of CBOR bytes and a type. And if you do this, it means that we can do this indefinite string as an application extension instead of with this kind of ugly syntax that we have now and a lot of special cases, and we can make it actually quite a lot easier to use. And we could also do things like sorting with this.

Now, Carsten said that, you know, currently we do not have any requirement that a map that you type in in EDN, when converted to CBOR, shows up in the same order. I think this violates the principle of least surprise. I think most people who would type it in would expect that what they see would be what they get. And even worse, coming from CBOR, if somebody had a weird order of a map and I put that in EDN, and I couldn't see that the order was bizarre, that would be a major failure of the CBOR ecosystem if in my diagnostic encoder I can't see what the actual order was of the bytes.
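One way a diagnostic tool could make an unusual map order visible (an illustration, not part of the spoken remarks) is to check the keys against the core deterministic ordering of RFC 8949, which sorts map keys by the bytewise lexicographic order of their encoded forms. The Python sketch below assumes the tool already has each key's encoded bytes in wire order; the function name is hypothetical.

```python
from typing import Sequence


def keys_in_deterministic_order(encoded_keys: Sequence[bytes]) -> bool:
    """True if the encoded keys are strictly increasing in bytewise order.

    Python compares bytes objects lexicographically byte by byte, which is
    the comparison this ordering needs. Strict comparison also means that
    duplicate keys are reported as out of order.
    """
    return all(a < b for a, b in zip(encoded_keys, encoded_keys[1:]))


# Keys 10, 100, -1, "z", "aa" in their CBOR encodings, already sorted.
sorted_keys = [b"\x0a", b"\x18\x64", b"\x20", b"\x61\x7a", b"\x62\x61\x61"]
# The same keys with the first two swapped.
unsorted_keys = [b"\x18\x64", b"\x0a", b"\x20", b"\x61\x7a", b"\x62\x61\x61"]
```

A tool that preserves wire order in its EDN output could use a check like this to add a warning comment when the order is not the deterministic one.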

All right, two more things. Stand-in syntax. The idea behind stand-in syntax is that I can have a document with some EDN that the processor doesn't understand, and it turns it into this tagged CBOR, which then might be later processed by something else. I am aware of nobody who has implemented a system that takes CBOR with stand-in tags and turns it back into the CBOR that was intended by the original author of the EDN.

However, in our actual use cases for EDN-to-CBOR conversion like test vectors and configuration, this is not what we want at all. This is an enormous potential attack vector.

So, I suggest that we remove this, and we could add this as a feature later. Another option is we could make this configurable and off by default.

And so this would give us an opportunity to have some kind of metadata or pragma or processing information, processing instructions, whatever you want to call it. This could be a very handy feature for us to have. This would allow us to say, for example, "Okay, in order for you to process this document, you need to be this tall," or rather, "You need to be able to understand these extensions." You could enable or disable stand-in processing. You could say, "Hey, if you see an ellipsis here, I want you to fail right away," because I wasn't expecting to see any ellipses. You could do things about the kind of map order you expected. You could do this in the other direction: when you were coming from CBOR into EDN, the processor could indicate what settings they used.

All right, next. Last thing. I made a list of a few points here that I consider to be useful for style or default processing rules for things like pretty printing and so on that go above and beyond what's in Section 1.3 of the current document. They're here for reference. That's it for me. Thank you.

Paul Hoffman: Okay, thank you, Rohan. Let's go on to Carsten now, and then I'll talk a little bit at the end of how we want to deal with all of this. As you can tell from my tone of voice, I'm being pretty grumpy right now about how this is not necessarily the way we are doing things in these meetings and on the side is not necessarily helping us get to consensus. So, Carsten, you are up next.

Carsten Bormann: So, do you want me to request slides, or will you show them?

Paul Hoffman: Christian's got it.

Carsten Bormann: Okay, so I think this will be quick because I wanted to mostly give a status report on working group last call comment handling and there are several PRs already that are merged, which should cover the comments from Laurence, Marco, Martin, and Nicolai.

There are a number of PRs which are unmerged. Of course, it takes some time to actually evaluate a PR, so this is not a big problem, but it will take a few more days until we have those. So I think these are quite obvious and done.

There are a few issues that are resolved with PRs, so this is just for your reference. You actually can click on the underlined things on the slides if you download them.

And there are several issues that have brought up editorial things, and this is mostly about moving text between different sections, which of course always creates problems when you merge different pull requests. So I'm trying to concentrate that work on a specific day, probably at the end of this week.

There also has been a proposal to rationalize the ABNF structure a little bit more, and yeah, that's just ugly, risky work. But I agree that it should be done to enhance the readability of the ABNF.

So, um, there are two areas of activity. One, I would just summarize as, yeah, backtick—the great renaming discussion. We, in the previous interim, had some pretty good but not complete consensus for moving from EDN to CDN. Then people repeated the argument that they have a conflict with CDN, and this time with specific anecdotes from their own work. So maybe we have to take this argument a little bit more serious than we did. And there are several proposals, so I, for instance, had a concise practical diagnostic notation. There's also a proposal for CEDN without saying what it means. And, yeah, if you want to have a little laugh, you can look at this after this meeting. This is how I would motivate the name CEDN, which would not be an abbreviation but just the name. And there are a lot of words that start with C-E-D-N that can be used for motivating that. Okay, but I think we can do this on the mailing list. I just wanted to kickstart that again because we had a lot of activity on that, and essentially converged on CDN, and now we haven't really changed that again, but maybe the arguments for changing it again are a little bit more weighty than I thought.

So, the last bucket here is the interesting ones, and I think Rohan already touched on most of these.

So, there is work on (this is an issue) clarifying the relationship between encoding indicators and application extensions and the concatenation mechanism. I think Rohan gave a pretty good overview there. The question really is, does it make sense to treat encoding as an extension point? And if it does, what is to be thought about the argument that we should try to make sure that encoding does not get tangled up with actual semantic activity? So, we are not mixing up encoding indicators with application extensions or with the concatenation mechanism. So, I think that's a discussion that's worth having.

Rohan brought up map order issues. Of course, these have been brought up for the last 13 years, and previously before that with JSON. So, that's a problem that we are not going to fix fundamentally. That's a decision that JSON made that we foolishly imported and that is hurting us every day. Maybe the actual remedy here is to finally complete the map-like data structures document, so we have better ways to do map-like data structures that are ordered, in principle.

And finally, we have code but not yet a pull request for a symmetric version of the raw string delimiters. So, this is implemented and seems to work in my test cases, but of course, we have to convert the implementation into text that goes into the document, and Rohan has just pointed out the document maybe doesn't have to solve all the problems in ABNF, which makes—may make the ABNF simpler, or it may make life harder for people who have good ABNF toolchains. So, maybe we want to just provide both, I don't know. But the symmetry problem is solved, and I think we now have an easier way approaching the question whether raw strings should stay in the document or not.

So, that's my summary of where we are. And of course, I wouldn't mind questions.

Paul Hoffman: Um, I think that's a reasonable summary of where we are, thank you for doing that. Without—and I don't see any other questions, so with that, I would like to do a little bit of a—you can take slide off the screen.

I would like to do a bit of discussion here. Might as well turn on the camera.

Of where we are in the working group with this particular document. So, we had a working group last call, which ended last week. We didn't end it because there were people who had indicated they still wanted to participate.

There's been a huge amount of discussion, some of which—I mean, so the initial discussion said there were a number of people who said, "This is an important document, I agree with most of the parts in it," (which I think is easy because a lot of the parts were actually adopted from what we already had before). "We need to do this." So there is, I believe at this point with the working group last call, a reasonably strong consensus that we should move forward with the document.

There are—there were also a bunch of different places in the document that people said, "I either don't like this being included at all," or "I like the idea, but I want it done differently." We saw some of those today in Rohan's discussion; we've certainly seen a bunch on the mailing list, and no, I'm not going to catch up on the last 7 hours of 35 discussions now.

But I want to separate out those two. So, there are some people who would like to remove a feature completely. That to me is very different, and it will be very different in the output of the working group, from "That's an okay feature, but if we did it this way, it would be safer, it would be easier to read, it would interact with this other feature better," and such like that.

So, we now in the working group have a couple of choices to make. One is: do we just keep grinding on the document as we were doing before the working group last call and, by dash-30, do another working group last call? Do we take dash-26—and we know there's a dash-26 because Carsten just explained that he has a bunch of pull requests already, you know, like ready to go and some that are nearly ready to go—do we take that and start doing onesies, twosies where someone says, "I do not want the thing in section whatever," or "I want the thing in section whatever to be different"?

I don't know the answer to that. Before the last few days, I would have said we should have done it the—we should go the first way, which is: take a locked-down version and start very carefully saying, "I want this out," or "I want this changed." But in the last few days, people have been discussing a lot of things that are sort of more generic to the whole thing. I'm not going to ask the working group to decide this, I can't imagine that that discussion is going to be very good. This is going to be between Christian, Bormann, and I; we will do this over the course of the next few days.

I will ask Carsten: please do a new draft by either this Friday or the following Monday, with the pull requests that you have in. And you said you have a couple of them sort of ready to go. Do whatever you can based on the discussion so far, and let's start whatever we do with dash-26. And again, hopefully by next Monday, Christian, Bormann, and I will have come up with a procedure for how to get to working group consensus (rough consensus, clearly, but working group rough consensus) on everything in this document. Again, it's very clear that we want this document to move forwards, it's very clear that there's energy to do it, but it's also really clear that at this point, there are some things that some people want to take out that other people want to leave in, and there are some things that people say, "Well, if we can't take it out, we should change it this way." And once you go into that space, there's always four things to do. As Carsten pointed out, we can't even agree on the name yet. I hope that that becomes the least interesting thing.

So I will call this part of the meeting closed, so that Laurence has some time to talk about status of the serialization document. Again, let's keep the working group discussion mostly focused on EDN. That doesn't mean at this point status is dead; in fact, it does seem alive and it does seem to be getting some action going. So, Laurence, I'll hand it over to you.

Laurence Lundblade: Okay, I'm, uh, didn't prepare anything today, so I'm just speaking off the top of my head here. Um, there's a few things outstanding for another draft. One is some rewording to the introduction. Another is this PR that Carsten filed. I've been through that and I picked up most of the stuff that I thought was valuable in that. A lot of it fell into sort of two categories: stuff like naming that I think needs to be resolved on the list (I mean, is it a serialization or a serialization constraint? That's a mailing list thing, not a PR thing). Um, a lot of it was also inserting parentheticals and, you know, asides into the introduction, and I think the introduction needs to be very tight and just do a simple job rather than be informative, give details, and give information. The purpose of the introduction is just to say why we're here, and we just need to say enough in the introduction to explain to somebody why we're here. We don't have to do exposition in the introduction. So, that's a PR that I will probably merge soon.

The other thing I'm working on is the sample code to do floating-point encoding. There was some code in the CDE document that I looked at. I also have code in my implementation. They basically do the same thing, but the coding styles are very different—an interesting look at coding. So, I'm planning on having something halfway between the CDE document and my code from QCBOR. The code in the CDE document has got about 10 constants in it that are quite mysterious if you don't know anything about floating-point. Correct and interesting, but very mysterious if you don't know, you know, the bit shifts and how many bits in this and how to mask that.
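To give a sense of what such sample code does (this is an illustration, not the code from the CDE document or from QCBOR), here is a minimal Python sketch of shortest-length floating-point encoding. It uses a round-trip test through the standard `struct` module instead of explicit bit masks and shifts. The function name `encode_float_shortest` is hypothetical, and emitting `0xf97e00` for every NaN, ignoring any payload, is an assumption made to keep the sketch small.

```python
import math
import struct


def encode_float_shortest(x: float) -> bytes:
    """Encode x as the shortest CBOR float (half, single, double) that is exact."""
    if math.isnan(x):
        # Assumption: every NaN becomes the half-precision quiet NaN;
        # NaN payloads are deliberately not handled here.
        return b"\xf9\x7e\x00"
    # Try half precision (0xf9), then single precision (0xfa).
    for fmt, initial in ((">e", 0xF9), (">f", 0xFA)):
        try:
            packed = struct.pack(fmt, x)
        except OverflowError:
            continue  # finite value too large for this width
        if struct.unpack(fmt, packed)[0] == x:
            return bytes([initial]) + packed  # narrowing lost no information
    return b"\xfb" + struct.pack(">d", x)  # fall back to double precision
```

For example, 1.5 fits in half precision, 100000.0 needs single precision, and 1.1 needs the full double. The bit-level approach described above avoids the conversion round trip, at the cost of the constants being harder to read.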

The other thing about that sample code is that because there's no preferred serialization, you know, shortest length float for NaN payloads, I've taken that out. And, I'm sure Carsten is not happy about that, but its purpose is to align with the document—the sample code has to align with what the document says, so that's why it's been framed that way.

So my work right now is still about a bunch of coding work to get that sample code tested and, you know, fluffed up and pretty and get that in. And once that's done, I'll publish a new draft. That's probably about another week. That's all I have.

Paul Hoffman: Great. That's actually plenty. Thank you, and thank you for keeping in on it, even though there isn't active working group discussion right now on it. So, I think that that is good. Please do another draft when you feel ready.

And feel free to flag in the new draft where you think there may be disagreement in the working group, so that people can focus on that. Like you said, something like, "I don't think Carsten will be happy about this," or, you know, there's plenty of people who will be unhappy about lots of things. But I think that for serialization at this point, we do have a good idea of where people have very different ideas of vocabulary and such like that.

Were there any other questions on the status of serialization for Laurence? And if so, put up your hand.

Okay, good. Thank you for the presentations today. Thank you, Carsten, for keeping on getting a new draft ready. And again, if you can do that by... let's give people a weekend without having to be reading stuff. If you can shoot for Monday, and Carsten, since you're in Europe, that'll be before those of us who are not. And, um, I will commit the chairs to figuring out then what we do about the fact that we've had a working group last call, what we do to get the document to have rough consensus on, you know, as many things as we can, and how we will handle it if there are parts where there is not rough consensus.

If there's nothing else, we can end the meeting. Any hands before we do?

Okay, great. Thanks very much for your participation and, again, anything that people said here in the chat that is supposed to be actionable, needs to happen on the list. Thanks very much.