Been there, done that, hombre. I did the math at the last place I was at: our seed files were basically JSON, and if you transcribed our test data line by line into composition notebooks, we'd be tossing 137 or so notebooks' worth through the system every test run.
Can you get devs to care about what valid data looks like? Nigh impossible. Hell, I had a hard enough time keeping my testers authoring new test data in a reasonably spec-compliant way. A proper data lifecycle is the key, but it will almost always be the least popular part of your process because most people just don't want to think about it.
At some point in your process someone has to know what they are doing. There is no machine that knows correct data for you. It's part of what makes testing difficult. Everyone else can live in fantasy land, but you, as a tester, have to bring the hammer of reality crashing down. Won't make you many friends, but it is what it is. Your test data must reflect a reality. Someone has to do the footwork to observe that reality. Only someone who has done so can then do the next step of authoring valid/representative test data.
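For what it's worth, once a human has written down what "valid" means, a machine can at least nag people about it. Here's a rough sketch of that idea, not what we actually ran: the `SPEC` table, its field names, and the `check_record` / `check_seed_text` helpers are all made up for illustration, and it assumes a seed file is a JSON array of flat records.

```python
import json

# Hypothetical spec, written by whoever actually observed the real data:
# field name -> (expected Python type, is the field required?)
SPEC = {
    "id": (int, True),
    "email": (str, True),
    "nickname": (str, False),
}


def check_record(record, spec=SPEC):
    """Return a list of problems for one record; an empty list means it complies."""
    problems = []
    for field, (expected_type, required) in spec.items():
        if field not in record:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        value = record[field]
        # Exact type match, so a JSON true/false is not accepted as an int.
        if type(value) is not expected_type:
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    # Fields nobody wrote into the spec are flagged rather than silently kept.
    for field in record:
        if field not in spec:
            problems.append(f"unknown field: {field}")
    return problems


def check_seed_text(text, spec=SPEC):
    """Check a JSON array of records; return {index: problems} for bad records only."""
    report = {}
    for index, record in enumerate(json.loads(text)):
        problems = check_record(record, spec)
        if problems:
            report[index] = problems
    return report
```

Run something like that over every seed file before a test run and you at least catch the drift early. It only checks shape, though. Whether the values look like anything from the real world is still on whoever did the footwork.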