5.9 Ethical and Professional Considerations
Data preparation decisions have downstream consequences that are easy to underestimate. Every choice — which records to keep, how to handle missing values, what transformations to apply — shapes the analysis that follows. Professional practice requires that these choices be deliberate, documented, and defensible.
5.9.1 Documentation and Auditability
A BI professional’s data cleaning decisions must be reproducible and auditable. If a colleague, a manager, or a regulator asks “why were these 47 records excluded?” or “how were missing BMI values handled?”, you need a clear answer — not a vague recollection. This means writing cleaning scripts (not making manual edits to spreadsheets), commenting code to explain why a decision was made (not just what was done), and maintaining a record of the raw data alongside the cleaned version. The project organization principles discussed earlier in this chapter support this practice.
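As a minimal sketch of what an auditable cleaning step can look like, the Python example below records every excluded row together with the rule that removed it, and leaves the raw records untouched. The field names (`id`, `age`, `bmi`), the sample data, and the 0 to 120 age range are hypothetical and chosen only for illustration; they do not come from any dataset in this chapter.

```python
# Hypothetical raw records; field names and values are illustrative only.
RAW_RECORDS = [
    {"id": 1, "age": 34, "bmi": 22.5},
    {"id": 2, "age": -5, "bmi": 27.1},
    {"id": 3, "age": 51, "bmi": None},
    {"id": 4, "age": 47, "bmi": 31.0},
]


def clean_with_audit(records):
    """Return (kept_records, exclusion_log) without modifying the input."""
    kept, log = [], []
    for rec in records:
        # Why: ages outside 0-120 are assumed to be data-entry errors,
        # and the correct value cannot be recovered from this table alone.
        if rec["age"] is None or not (0 <= rec["age"] <= 120):
            log.append({"id": rec["id"], "rule": "age_out_of_range",
                        "value": rec["age"]})
            continue
        # Why: this illustrative analysis needs BMI; rows without it are
        # excluded rather than imputed, and the exclusion is logged.
        if rec["bmi"] is None:
            log.append({"id": rec["id"], "rule": "bmi_missing",
                        "value": None})
            continue
        kept.append(dict(rec))  # copy, so the raw record is never altered
    return kept, log


cleaned, exclusions = clean_with_audit(RAW_RECORDS)

for entry in exclusions:
    print(f"excluded id={entry['id']} rule={entry['rule']}")
```

With a log like `exclusions` saved next to the cleaned output, the question "why were these records excluded?" can be answered by pointing to a rule name and the comment that justifies it.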
5.9.2 Handling Sensitive Data
BI datasets frequently contain personally identifiable information (PII) and other sensitive attributes: employee names, health records, salary data, customer addresses. Professional data handling requires following your organization’s data governance policies: restricting access to sensitive fields, anonymizing or aggregating data when individual-level detail is not needed, and never storing sensitive data in unsecured locations. During data cleaning, be especially careful with combinations of fields and with variables derived from them. Zip code, age, and department may each seem harmless alone, but together they may re-identify individuals even after names are removed.
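One way to check for this kind of re-identification risk is to count how many records share each combination of the fields in question; a combination held by very few people is a warning sign. The sketch below is a simple group-size check in the spirit of k-anonymity, which is a technique introduced here for illustration and not one prescribed by this chapter. The function name `small_groups`, the sample data, and the threshold `k` are assumptions; an appropriate threshold would come from your organization's governance policy.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return field-value combinations shared by fewer than k records."""
    counts = Counter(
        tuple(rec[field] for field in quasi_identifiers) for rec in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical, already name-free employee records.
EMPLOYEES = [
    {"zip": "30301", "age": 29, "department": "Finance"},
    {"zip": "30301", "age": 29, "department": "Finance"},
    {"zip": "30301", "age": 29, "department": "Finance"},
    {"zip": "30309", "age": 61, "department": "Legal"},
]

# k=3 is an arbitrary threshold chosen for this small example.
risky = small_groups(EMPLOYEES, ["zip", "age", "department"], k=3)

for combo, n in risky.items():
    print(f"{combo} appears in only {n} record(s)")
```

Combinations flagged this way are candidates for aggregation, for example replacing exact age with an age band, before the data is shared.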
5.9.3 Honest Reporting of Data Quality
It is tempting to clean away problems silently — dropping rows with missing values, removing outliers that complicate a model, or ignoring inconsistencies that are difficult to resolve. Professional practice requires transparency: report data quality issues to stakeholders rather than hiding them. If 15% of records are missing a key variable, the stakeholder needs to know — that limitation affects the reliability of any analysis built on the data, and concealing it undermines trust.
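A short per-column summary of missing values is one simple way to make such limitations visible before any rows are dropped. The sketch below is illustrative only: the function name `missing_report`, the `income` field, and the sample data are hypothetical, and it treats only `None` and empty strings as missing (other encodings, such as NaN or sentinel codes like -999, would need their own checks).

```python
def missing_report(records, columns):
    """Return {column: (missing_count, missing_fraction)} for each column."""
    total = len(records)
    report = {}
    for col in columns:
        # Only None and "" count as missing in this sketch.
        missing = sum(1 for rec in records if rec.get(col) in (None, ""))
        report[col] = (missing, missing / total if total else 0.0)
    return report


# Hypothetical data: 20 records, 3 of which lack an income value.
RECORDS = [
    {"id": i, "income": None if i < 3 else 50000 + i} for i in range(20)
]

report = missing_report(RECORDS, ["id", "income"])

for col, (count, fraction) in report.items():
    print(f"{col}: {count} of {len(RECORDS)} missing ({fraction:.0%})")
```

Including a table like this in the deliverable lets the stakeholder judge how much weight the results can bear.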