
Conversation

milancurcic
Member

Currently this PR only adds an example program that does input concatenation in basic Fortran. There is no change to the library code.

@jvdp1 this is almost exactly your example in #211. I am not sure that this is what you're looking for.

Specifically, in the case of 1-d outputs and inputs, it's so trivial that no separate wrapper such as a concatenate layer is needed; we just concatenate the arrays.

More generally, following the Keras concatenate, it's also not clear to me that there needs to be a layer for this. A non-trivial case is concatenating two N-d arrays along some arbitrary axis (the two arrays need to have the same shape along all other axes). In this case, a function would be useful, but I'm not sure that a dedicated layer adds anything.
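
For illustration only (a hedged sketch with made-up variable names, not code from the example program), both cases in plain Fortran:

  program concat_sketch
    implicit none
    real :: out1(3), out2(5), merged1d(8)
    real :: a(4, 3), b(4, 2), c(4, 5)

    out1 = 1.0
    out2 = 2.0
    ! 1-d case: an array constructor does the concatenation.
    merged1d = [out1, out2]

    a = 1.0
    b = 2.0
    ! 2-d case: concatenate along the second axis; the two arrays must
    ! have the same extent along all other axes (here, the first).
    c(:, 1:3) = a
    c(:, 4:5) = b

    print *, size(merged1d), shape(c)
  end program concat_sketch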

And, it's possible that I still don't understand the intent. :) Let me know what you think.

In support of #211

@milancurcic added the enhancement and question labels on Sep 10, 2025
@milancurcic
Member Author

I think I understand better now; the line "Concatenates a list of inputs." from the Keras documentation is what took me in the completely wrong direction. It's not only about concatenating inputs; it's about merging layer parameters.

I will first make a working example with building blocks that we already have, and then we can discuss if and what part should be best abstracted and how.

@milancurcic
Member Author

milancurcic commented Sep 11, 2025

Hi @jvdp1, please see the new example. Like before, no library code was changed.

Merging two networks to feed into one, however, requires some manual operations that are not currently handled by the library code. Specifically, the backward pass from net3 (the downstream branch) to the upstream branches (net1 and net2) needs to bypass the calculation of the gradient in the branch output layers, which in the library is currently done with the loss function.

In a nutshell, the flow in the example is this:

  1. Forward propagate net1 and net2;
  2. Concatenate their outputs, pass the result as input to net3, and forward propagate it;
  3. Backward propagate net3 to compute its gradients;
  4. Manually pass the gradients from net3's first hidden layer to compute the gradients in the output layers of net1 and net2;
  5. Backward propagate the hidden layers of net1 and net2 to compute their gradients;
  6. Run the optimizer (net % update()) on all 3 networks (a rough sketch of the whole flow follows this list).
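
A sketch of that flow in code, assuming the forward, backward, and update methods; the output and gradient extraction steps are only indicated by hypothetical variables (y1, y2, grad1, grad2), and the gradient keyword argument to backward is illustrative, so the actual example code in this PR remains the reference:

  ! Hedged sketch of the merge flow; variable names are illustrative only.
  call net1 % forward(x1)
  call net2 % forward(x2)

  ! y1 and y2 hold the outputs of net1 and net2 (extracted from their
  ! output layers in the example); concatenate them and feed net3.
  call net3 % forward([y1, y2])
  call net3 % backward(y_true)

  ! grad1 and grad2 are the slices of the gradient from net3's first
  ! hidden layer that correspond to net1 and net2; passing them directly
  ! bypasses the loss-based gradient calculation in the branch output layers.
  call net1 % backward(gradient=grad1)   ! hypothetical keyword argument
  call net2 % backward(gradient=grad2)   ! hypothetical keyword argument

  ! Run the optimizer on all three networks.
  call net1 % update()
  call net2 % update()
  call net3 % update()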

The merged network converges on a minimal example; commenting out call net1 % update(), call net2 % update(), or both, results in slower convergence because we are effectively disabling updates of parts of the merged network.

Now, about whether and how to abstract this. It's not clear to me that this could be implemented as a layer type in the existing framework, where a network is assumed to have a 1-d array of layers. However, I can imagine a new network type, say merged_network, that we could invoke like this:

net1 = network(...)
net2 = network(...)
net3 = network(...)

net = merged_network( &
  upstream_networks = [net1, net2], &
  downstream_network = net3 &
)

and we would define the usual forward, backward, and update methods on the merged_network type to encapsulate the logic that is hand-coded in the example.
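
For concreteness, a hedged sketch of what such a derived type might look like (names and components are illustrative, not a settled design):

  type :: merged_network
    type(network), allocatable :: upstream_networks(:)
    type(network) :: downstream_network
  contains
    procedure :: forward   ! forward the upstream nets, concatenate, forward the downstream net
    procedure :: backward  ! backward the downstream net, split the gradient, backward the upstream nets
    procedure :: update    ! update the parameters of all member networks
  end type merged_network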

Let me know what you think. If the usual time works for you tomorrow (Friday, September 12), I could do Zoom.

@jvdp1
Collaborator

jvdp1 commented Sep 11, 2025

Thank you @milancurcic for the new example. I will test it tomorrow on my case.

> Now, about whether and how to abstract this. It's not clear to me that this could be implemented as a layer type in the existing framework, where a network is assumed to have a 1-d array of layers. However, I can imagine a new network type, say merged_network, that we could invoke like this:
>
> net1 = network(...)
> net2 = network(...)
> net3 = network(...)
>
> net = merged_network( &
>   upstream_networks = [net1, net2], &
>   downstream_network = net3 &
> )
>
> and we would define the usual forward, backward, and update methods on the merged_network type to encapsulate the logic that is hand-coded in the example.

This also makes sense to me.

> Let me know what you think. If the usual time works for you tomorrow (Friday, September 12), I could do Zoom.

Tomorrow is fine. I will try to get some results for our meeting.

@milancurcic removed the question label on Sep 25, 2025
@milancurcic changed the title from "Concatenate" to "Toward merge networks" on Sep 25, 2025
@milancurcic
Member Author

Hi @jvdp1, see the latest updates and the simplification of the example.

network % get_output(output) is a new subroutine that returns a 1-d pointer to the output of the network. This removes the need for the explicit select type statements (for getting the output of net1 and net2) in the user code.

The next thing we need to decide, IMO, is whether it's sufficient to always return a 1-d array as network output, or whether 2-d and 3-d array variants are necessary. 2-d or 3-d may have uses in other applications, but here we should decide how we want to do concatenation. 1-d is trivial, so if 1-d concatenation is sufficient, we can call it good enough. However, if we want to be able to pass a 2-d or 3-d array as input to net3 without an explicit reshape() layer in the middle, we will need get_output_[23]d and a separate concatenation function or layer.
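
As a usage sketch of the 1-d case (hedged; y1 and y2 are made-up names), the user code then reduces to something like:

  real, pointer :: y1(:), y2(:)

  call net1 % get_output(y1)   ! 1-d pointer to net1's output
  call net2 % get_output(y2)   ! 1-d pointer to net2's output

  ! 1-d concatenation is just an array constructor:
  call net3 % forward([y1, y2])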

@jvdp1
Collaborator

jvdp1 commented Oct 6, 2025

> Hi @jvdp1, see the latest updates and the simplification of the example.

Thank you! I tested the changes and they work fine. The simplification of the backward step (providing the gradient instead of the output) was very useful, because I had initially made a mistake in all the select type constructs.

> network % get_output(output) is a new subroutine that returns a 1-d pointer to the output of the network. This removes the need for the explicit select type statements (for getting the output of net1 and net2) in the user code.

> The next thing we need to decide, IMO, is whether it's sufficient to always return a 1-d array as network output, or whether 2-d and 3-d array variants are necessary. 2-d or 3-d may have uses in other applications, but here we should decide how we want to do concatenation. 1-d is trivial, so if 1-d concatenation is sufficient, we can call it good enough. However, if we want to be able to pass a 2-d or 3-d array as input to net3 without an explicit reshape() layer in the middle, we will need get_output_[23]d and a separate concatenation function or layer.

For my application, a 1-d array is enough. However, since it returns pointers, could it return a 1-d pointer that points to a 2-d or 3-d array? That way, a reshape() wouldn't be needed.
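
For reference, Fortran 2008 pointer bounds remapping might allow exactly this; here is a minimal, self-contained sketch (made-up names, not library code):

  program remap_sketch
    implicit none
    real, allocatable, target :: output3d(:,:,:)
    real, pointer :: output1d(:)

    allocate(output3d(2, 3, 4))
    output3d = 1.0

    ! Bounds remapping: a 1-d pointer viewing the contiguous 3-d array,
    ! so neither a reshape() nor a temporary copy is needed.
    output1d(1:size(output3d)) => output3d

    print *, size(output1d)   ! prints 24
  end program remap_sketch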
