Skip to content

Support TCP for protocol messages#3636

Draft
softins wants to merge 46 commits into
jamulussoftware:mainfrom
softins:tcp-protocol
Draft

Support TCP for protocol messages#3636
softins wants to merge 46 commits into
jamulussoftware:mainfrom
softins:tcp-protocol

Conversation

@softins

@softins softins commented Mar 11, 2026

Copy link
Copy Markdown
Member

Short description of changes

Support fallback to TCP for protocol messages, in order to overcome potential loss of large messages due to UDP fragmentation. Currently an incomplete draft, for comment as development continues.

CHANGELOG: Client/Server: Support TCP fallback for protocol messages.

Context: Fixes an issue?

Discussed in issue #3242.

Does this change need documentation? What needs to be documented and how?

It will need documentation once design and development are complete. Particularly need to explain the firewall requirements for a server or directory.

Status of this Pull Request

Incomplete, still under development. Main server side complete and working. Client side development in progress. Complete and ready for review and testing. Still marked draft as it needs some of the debug messages to be commented out before merging.

What is missing until this pull request can be merged?

A lot of testing of both server and client. Intended for Jamulus 4.0.0.

Checklist

  • I've verified that this Pull Request follows the general code principles
  • I tested my code and it does what I want
  • My code follows the style guide
  • I waited some time after this Pull Request was opened and all GitHub checks completed without errors.
  • I've filled all the content above

@softins softins added this to the Release 4.0.0 milestone Mar 11, 2026
@softins softins self-assigned this Mar 11, 2026
@softins

softins commented Mar 11, 2026

Copy link
Copy Markdown
Member Author

So far, this implements the server side of the design described here and here

@softins softins force-pushed the tcp-protocol branch 4 times, most recently from 5e1a658 to 0ae51e2 Compare March 16, 2026 13:05
@softins softins linked an issue Mar 16, 2026 that may be closed by this pull request
@softins softins added the feature request Feature request label Mar 16, 2026
@softins softins force-pushed the tcp-protocol branch 3 times, most recently from 7ad1d1f to d939e5b Compare March 26, 2026 17:38
@softins

softins commented Mar 28, 2026

Copy link
Copy Markdown
Member Author

So the next stage of implementation has been achieved: client-side support in the Connect dialog.

  1. If the server list has not been received via UDP when the associated message indicating TCP support has arrived, the client will retry fetching the server list over TCP.
  2. If the client list for a server has not been received via UDP when the associated message indicating TCP support has arrived, the client will retry fetching the client list over TCP, and will continue to use TCP for that server while the Connect dialog is open.
  3. A directory or server that does not have TCP support will not send the TCP supported message, and will continue to be handled as in current versions.
  4. If the server list or client list is successfully received over UDP, there is no need for the client to try TCP.

It has been tested by using nft to drop outbound Jamulus UDP messages with a specific message ID, to simulate loss due to fragmentation.

Examples for a directory-enabled server running on port 22120:

  • drop UDP server list: nft add rule inet filter output udp sport 22120 @ih,16,16 0xee03 drop
  • drop UDP client list: nft add rule inet filter output udp sport 22120 @ih,16,16 0xf503 drop
  • drop UDP "TCP supported" msg: nft add rule inet filter output udp sport 22120 @ih,16,16 0xfb03 drop

Note that nft rules require network byte order (big-endian), but Jamulus IDs are little-endian:

  • CLM_SERVER_LIST = 1006 = 0x03ee => 0xee03 (LE byte order)
  • CLM_RED_SERVER_LIST = 1018 = 0x03fa => 0xfa03 (LE byte order)
  • CLM_CONN_CLIENTS_LIST = 1013 = 0x03f5 => 0xf503 (LE byte order)
  • CLM_TCP_SUPPORTED = 1019 = 0x03fb => 0xfb03 (LE byte order)

@softins

softins commented Mar 28, 2026

Copy link
Copy Markdown
Member Author

The next step is to try implementing the connected-mode TCP described here

@ann0see ann0see self-requested a review April 7, 2026 14:51
Comment thread src/tcpserver.h
Comment thread src/main.cpp
bool bUseTranslation = true;
bool bCustomPortNumberGiven = false;
bool bEnableIPv6 = false;
bool bEnableTcp = false;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we'll have a long time for the 4.0 release, I'd enable it by default soon (of course once we've tested that the basics work)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I disagree. It's a server-only option, and most servers operators will not need to enable TCP support. Only those running large directories or large servers will need to, and they also need to understand and configure their firewall requirements.

TCP support in the client will indeed be enabled by default, but will only take effect when talking to a directory or server that has enabled it.

If a server operator enables TCP without having configured their firewall correctly, client users could have problems as the server would advertise TCP support to the client, but the client could be unable to connect.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not give an error message or fallback procedure in case the TCP connection timed out?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm sure we can. I haven't yet tested that scenario.

But it doesn't negate my view that server-side TCP support needs to be an explicit option.

Comment thread src/connectdlg.cpp Outdated
Comment thread src/connectdlg.cpp Outdated
@ann0see ann0see added the bug Something isn't working label Apr 9, 2026
@github-project-automation github-project-automation Bot moved this to Triage in Tracking Apr 9, 2026
@ann0see ann0see moved this from Triage to In Progress in Tracking Apr 9, 2026
@softins

softins commented Apr 9, 2026

Copy link
Copy Markdown
Member Author

Well I've finished implementing everything I intended to, for directory, server and client, so it's ready for reviewing and trying out, as and when time permits (post 3.12.0).

I have a private directory and server built and running with TCP support, at newjam.softins.co.uk on the standard port 22124.

In order to demonstrate the use of TCP in a new client's connect dialog, it will be necessary to use custom firewall filters on the client end to temporarily drop incoming UDP Jamulus protocol messages containing a server list or connected clients list.

There is full forward and backward compatibility between clients and servers built with TCP support and older versions.

@softins softins marked this pull request as ready for review April 9, 2026 22:48
@softins softins marked this pull request as draft April 10, 2026 06:30
@softins

softins commented Apr 10, 2026

Copy link
Copy Markdown
Member Author

Keeping as draft, because it will need quite a few debug messages removed before merging.

@softins

softins commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

Now rebased to latest dual-stack main and tested. I haven't yet studied @pljones review comments from a week or two ago. Will do soon.

The displayed address of 0.0.0.0 was misleading for a dual-stack socket
Comment thread docs/TCP.md
@softins softins marked this pull request as draft June 10, 2026 19:25
Comment thread docs/TCP.md

The basic summary is that TCP need only be used as a fallback when it is determined that a UDP message from a directory or server failed to reach the client, probably due to fragmentation, _and_ that the directory or server explicitly supports TCP.

### Current operation when client opens Connect dialog

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also note that the JSONRPC Client API provides means to request the server list and client list. Both those scenarios are susceptible to the same issue.

Unless the solution materially differs, then I'd avoid - as much as possible and except for clarification - to mention the UI. It's just an example consumer of the response.

The solution needs to work regardless of that consumer.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Most of the document avoids mentioning the GUI at all - it's worth trying to keep it that way throughout.)

Comment thread docs/TCP.md

### Enhancement for TCP support

1. A server (which may also be a directory) can be configured with the command-line option `--enabletcp` to enable TCP operation.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a note as to why this should not ever be made the default, if there is one. Otherwise, clarify why it's defaulting to off. (My view is, it's either innocuous and can be enabled for everyone and it'll either work or do nothing or it will cause some kind of trouble people should have highlighted to them.)

Comment thread docs/TCP.md
b. There is no need for the directory to send `CLM_RED_SERVER_LIST` to the client, since the TCP connection is reliable, so the directory server just sends the `CLM_SERVER_LIST` over the TCP connection.

4. When the client has received the `CLM_SERVER_LIST` over TCP, it closes the TCP connection, populates its list of servers in the connect dialog in the normal way and stops the 2.5 sec re-request timer.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear here -- there is no mention of persisting the information that a Directory has TCP support for CLM_SERVER_LIST. Presumably this is not needed as, each time a CLM_REQ_SERVER_LIST UDP request is sent, the same steps - 2 to 4 - are repeated.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, the information does not need to be persisted at the client end. In the connect dialog implementation, I do persist the TCP support status separately for each listed server, so that once it knows a particular server needs TCP it doesn't drop back to UDP (except I need to make it drop back to UDP if the TCP connection fails). But that's just an optimisation and only for the life of that server list and gets forgotten when changing directories or on closing the dialog.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, probably worth explicitly saying this in the doc -- and commenting the code clearly as to the design behind it. The doc is good but the code needs to have the commentary "right there", as in years to come... no one's going to read the fine manual...

Comment thread docs/TCP.md

8. If the server accepts a TCP connection and receives a `CLM_REQ_CONN_CLIENTS_LIST` over it, it will process the request in the same way as for a UDP request, but will send the reply over the TCP connection.

9. When the client has received the `CLM_CONN_CLIENTS_LIST` over TCP, it closes the TCP connection and updates the list of clients for that server in the GUI. However, it will note for that server that TCP is needed, and if/when the number of connected clients next changes while the connect dialog is still open, it will immediately request the updated list via TCP instead of UDP.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, it will note for that server that TCP is needed, and if/when the number of connected clients next changes while the connect dialog is still open, it will immediately request the updated list via TCP instead of UDP.

Explain why this makes sense here and not for the Directory and why there's no point saving this information if the UDP request works.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The server list is only fetched from the directory once, not repeatedly, after opening the connect dialog or changing directory. So there's nothing to remember there. And persisting the TCP status for each listed server is also just an optimisation and not essential, as I mentioned above.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a Client never gets an updated server list from the Directory, whilst the dialog is open, even if another Server registers?

Hm... We should call that a bug, really. If I open the Connect dialog and the nearest server is 35ms, I'd want to know if one 15ms appeared.

Comment thread docs/TCP.md
Comment thread docs/TCP.md
- `RECORDER_STATE` - current state of the server-based recording. Sent when the state changes?
- `JITT_BUF_SIZE` - the size of the receiving jitter buffer for this connection on the server. Sent in Auto mode when the value changes.
- `CLM_PING_MS` - sent in response to a `CLM_PING_MS` received from the client. For client-side ping time calculation.
- `CONN_CLIENTS_LIST` - list of connected clients. Sent when the list changes due to a client connecting or leaving. This message could be large on a server with many clients.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be fixed by a new protocol message:
CLIENT_LIST_CHANGE <action> [<channel> [<channel details>]
where

  • <action> is "clear", "add", "update" or "remove"
  • <channel> is the server channel number, only present on add/update/remove
  • <channel details> is the player details, only present on add/update

which should never fragment...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except when a client joins a server that already has a large number of connected clients, it will still get the complete list all at once, so I don't think CLIENT_LIST_CHANGE gains us much.

@pljones pljones Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except when a client joins a server that already has a large number of connected clients

Maybe a V4 Client could first send REQ_CLIENT_LIST_CHANGES and, if it got the CLIENT_LIST_CHANGE "clear" back, it wouldn't request the client list and the Server wouldn't send the full list or the Server would at least stop sending them. (That would cut traffic generally, too.)

Pre-V4 support carries on the same way - and hasn't got TCP anyway, so TCP can't fix it.

Comment thread docs/TCP.md

By sending the `CLM_TCP_SUPPORTED` message immediately *after* sending a potentially large list of servers or connected clients, it allows a client easily to determine whether or not it needs to fall back to TCP without the necessity of timeouts or other delays. It will only need to use TCP if it has not already succeeded in receiving the message over UDP.

## CONNECTED MODE

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to get this into a separate PR.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? That just feels like more work for no gain. Temptation can always be resisted!

@pljones pljones Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly because I can see alternative approaches, so I'm more cautious. With the pre-connection, it's not arguable - it needs doing, we've seen it break and it affects a lot of people. The audience for this part is different.

Comment thread docs/TCP.md

## OTHER CONSIDERATIONS

The server should only be configured to offer TCP by specifying `--enabletcp` if the server operator has also configured any firewall to allow the inbound TCP connections.

@pljones pljones Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's upside down: if the Server operator specifies --enabletcp, then they will need to make sure their system allows the traffic. They can specify --enabletcp as much as they like, it just won't get traffic from anywhere if it's blocked... Jamulus would not fail if it can't connect to the Client over TCP: the Client might not have incoming TCP access, so the connection would fail at that point.

Comment thread docs/TCP.md

The server should only be configured to offer TCP by specifying `--enabletcp` if the server operator has also configured any firewall to allow the inbound TCP connections.

If a server were to offer TCP to the client, but the server's firewall didn't allow the incoming TCP connection, the client request for TCP would wait until its request times out.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should handle it like any other dropped request. It'll go around and try UDP.

This is why I'm nervous about "remember I was told to try TCP" until it's actually worked.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm thinking of splitting CFM_TCP into CFM_TCP_REQUEST and CFM_TCP_RESULT, like it is for UDP.

Comment thread docs/TCP.md

If a server were to offer TCP to the client, but the server's firewall didn't allow the incoming TCP connection, the client request for TCP would wait until its request times out.

This has to be the responsibility of the server/directory operator, and is why TCP operation must be controlled by a command-line option, rather than always enabled. The operator should only enable TCP in the Jamulus server if they know their environment has been configured to support it.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to disagree with it being a reason for not always enabling it, as stated. Things should carry on working exactly as they would without it -- except an extra exchange happens that fails. Enabling by default IPv6 does the same.

Comment thread docs/TCP.md

This has to be the responsibility of the server/directory operator, and is why TCP operation must be controlled by a command-line option, rather than always enabled. The operator should only enable TCP in the Jamulus server if they know their environment has been configured to support it.

Most operators of small servers of directories will not need to be concerned with TCP at all. _The only server operators who will need to enable TCP support are those running large directories (e.g. Volker, Peter) or those running a large server designed to support many simultaneous client connections._

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit is the real reason. I think it should be explain like this primarily, rather than on the technicalities.

It's pointless doing it for everyone. It doesn't matter, it's just pointless.

Comment thread docs/TCP.md

The reason for using a TCP connection in an active session is just to provide a reliable path for delivering a list of connected clients that could be large and subject to fragmentation (if it is sent over UDP). So the established TCP connection is only used to deliver client lists, and not other protocol messages.

Therefore, if the server has an active TCP connection from the client, it will use the connectionless `CLM_CONN_CLIENTS_LIST` message to deliver updates for the connected client list. If there is no active TCP connection, updates will be delivered using the connected-mode `CONN_CLIENTS_LIST` over UDP as at present.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth noting at this point (as it's in the flow and notable) why the two messages are different - i.e. why it doesn't bother to use the CONN_CLIENTS_LIST.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working feature request Feature request needs documentation PRs requiring documentation changes or additions

Projects

Status: Waiting on Team

Development

Successfully merging this pull request may close these issues.

Support TCP for protocol messages

3 participants