@IEthics
"We were able to decompose #reliability into 12 different dimensions. Evaluating 14 models on two complementary benchmarks, we found that nearly two years of rapid capability progress have produced only modest reliability gains." #ethics #AI #tech #research