There's a field called Interpretability (sometimes "Mechanistic Interpretability") that researches how the weights inside a neural network actually function. From what I can tell, Anthropic has the largest team working on this [0]. OpenAI has a small team inside their Superalignment org working on this. Alphabet has at least one team on it (not sure whether it sits in DeepMind, Google DeepMind, or just Google). There are also a handful of professors, PhD students, and independent researchers working on this (myself included), plus a few small labs.
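To give a rough flavor of what that work looks like in practice, here's a minimal sketch of inspecting a model's internals: capture a layer's activations with a forward hook and see which hidden units respond to an input. The tiny untrained MLP is purely illustrative (real work targets trained transformers), but the mechanics are the same.

    # Toy illustration: capture hidden activations with a forward hook.
    # The model here is hypothetical; only the inspection mechanics matter.
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Linear(4, 8),   # hidden layer whose activations we want to see
        nn.ReLU(),
        nn.Linear(8, 2),
    )

    captured = {}

    def save_activations(module, inputs, output):
        # Store the post-ReLU activations so we can inspect them afterwards.
        captured["hidden"] = output.detach()

    hook = model[1].register_forward_hook(save_activations)  # hook the ReLU

    x = torch.randn(1, 4)
    logits = model(x)
    hook.remove()

    print("hidden activations:", captured["hidden"])
    print("most active unit:", captured["hidden"].argmax().item())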
At least half of this interest overlaps with Effective Altruism's fears that AI could one day cause considerable harm to the human race. Some researchers and labs are funded by EA charities such as Long Term Future Fund and Open Philanthropy.
There is the occasional hackathon on Interpretability [1].
Here's an overview talk by one of the best-known researchers in the field [2].
Some people (namely the EAs) care because they don't want AI to kill us.
Another reason is to understand how our models make important decisions. If we one day use models to help make medical diagnoses or loan decisions, we'd like to know why a given decision was made so we can check it for accuracy and fairness.
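As a sketch of what "why was this decision made?" can look like in code, here's a simple input-times-gradient attribution on a toy, untrained stand-in for a loan classifier. The feature names and model are hypothetical, and real audits use more careful methods, but the basic question of which inputs drove the output looks like this.

    # Minimal input-times-gradient attribution on a hypothetical loan model.
    import torch
    import torch.nn as nn

    features = ["income", "debt", "age", "credit_history"]  # made-up inputs
    model = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())    # stand-in classifier

    x = torch.tensor([[55_000.0, 12_000.0, 34.0, 7.0]], requires_grad=True)
    score = model(x).squeeze()   # "probability of approving the loan"
    score.backward()

    # Which features pushed the score up or down for this applicant?
    attribution = (x.grad * x).detach().squeeze()
    for name, value in zip(features, attribution.tolist()):
        print(f"{name}: {value:+.4f}")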
Others care because understanding models could allow us to build better models.
[0] https://transformer-circuits.pub/2021/framework/index.html
[1] https://alignmentjam.com/jam/interpretability
[2] https://drive.google.com/file/d/1hwjAK3lWnDRBtbk3yLFL2DCK1Dg...