Optimizing ESM3 datasets