I am reminded of the Inmos T9000 (fewer than 9000 sold, quite possibly), which had an instruction grouper: it knew what the execution pipeline was capable of and would pack as many operations from the instruction stream as would fit into each packet sent down it. There may be people who still have nightmares about that: there was some kind of grammar describing what the pipeline could do.
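To make the grouping idea concrete, here is a toy sketch of a greedy in-order grouper. Everything in it is an assumption for illustration: the stage names, their capacities, and the `group` function are made up and are not the T9000's actual grammar. The only rule is that an operation joins the current packet if it can use the current stage (while there is room) or any later one; otherwise a new packet starts.

```python
# Hypothetical pipeline description: (stage name, ops accepted per packet).
# This is an invented stand-in, not the real T9000 grouping grammar.
STAGES = [
    ("fetch", 2),
    ("address", 2),
    ("alu", 1),
    ("write", 1),
]


def group(ops):
    """Greedily pack in-order (stage, text) ops into packets."""
    order = {name: i for i, (name, _) in enumerate(STAGES)}
    packets, current = [], []
    stage, used = 0, 0  # stage reached so far in this packet, slots used in it
    for kind, text in ops:
        target = order[kind]
        # The op fits if it needs a later stage, or the same stage with room left.
        fits = target > stage or (target == stage and used < STAGES[stage][1])
        if not fits:
            packets.append(current)
            current, stage, used = [], 0, 0
        if target > stage:
            stage, used = target, 0
        used += 1
        current.append(text)
    if current:
        packets.append(current)
    return packets


stream = [
    ("fetch", "ldl a"), ("fetch", "ldl b"), ("alu", "add"), ("write", "stl c"),
    ("fetch", "ldl d"), ("alu", "adc 1"), ("write", "stl d"),
]
for packet in group(stream):
    print(packet)
```

With this made-up stage table the seven operations collapse into two packets, which is the whole attraction; a real grouper would also have to worry about operand dependencies, which this sketch ignores.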
More popular was the trace cache in the Pentium 4: didn't that store micro-ops?
You do need a nice big, fast decoder to see what's going on and whether there are any branches or jumps. You also need some way to deal with half-finished packets, and with packets where you arrive in the middle. Oh, what joy! This beast had better be built programmatically.
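One way to picture the half-finished and arrive-in-the-middle cases is a cache of packets keyed by entry address. The sketch below is only that, a sketch: `build_packet`, `PacketCache`, the packet width of 4, and the rule that any mnemonic starting with "j" is a jump are all assumptions of mine, not how any real chip does it. A jump cuts a packet short, and a branch target inside an existing packet simply misses and gets its own packet built from that address.

```python
def build_packet(program, start, width=4):
    """Decode from `start` until the packet is full or a jump ends it early."""
    packet = []
    pc = start
    while pc < len(program) and len(packet) < width:
        op = program[pc]
        packet.append(op)
        pc += 1
        if op.startswith("j"):  # toy rule: mnemonics starting with 'j' are jumps
            break
    return packet


class PacketCache:
    """Hypothetical cache of decoded packets, keyed by entry address."""

    def __init__(self, program, width=4):
        self.program = program
        self.width = width
        self.packets = {}  # entry address -> packet
        self.misses = 0

    def fetch(self, pc):
        # Entering mid-packet is just a miss on a different key.
        if pc not in self.packets:
            self.misses += 1
            self.packets[pc] = build_packet(self.program, pc, self.width)
        return self.packets[pc]


program = ["ldl a", "ldl b", "add", "stl c", "ldl d", "j 0", "ldl e"]
cache = PacketCache(program)
print(cache.fetch(0))  # full packet from the top
print(cache.fetch(2))  # entered in the middle of the first packet
print(cache.fetch(4))  # cut short by the jump
```

The cost of doing it this way is visible straight away: the same operations end up stored in several overlapping packets, so the cache holds less distinct code than its size suggests.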