Who should clean the data?

Discussion in 'CROs - General Discussion' started by Anonymous, Dec 15, 2014 at 9:26 PM.

  1. Anonymous

    Anonymous Guest

    I was not formally trained as a biostatistician, and my introduction to the role was at a small medical device company that did not have a well-developed stats department or SOPs. So I'm looking to learn...

    Example 1
    Adverse event "Event Description" is a free form text field. They want a summary table of this field. Should I just list them as-is, or should I group "arthritis pain" and "arthritic pain" and "arthritis" and "arthrritus pain" together?
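
    For illustration only, here is a minimal sketch of how the grouping in Example 1 could be proposed by a script and then confirmed by a person, rather than applied silently. The term list `APPROVED_TERMS`, the function name `suggest_terms`, and the 0.8 cutoff are all assumptions made up for this example; they are not part of any coding standard.

```python
from difflib import get_close_matches

# Hypothetical reviewed vocabulary; in practice this would come from whoever
# owns the clinical coding decisions, not from the statistician.
APPROVED_TERMS = ["arthritis pain", "headache", "nausea"]

def suggest_terms(verbatim, cutoff=0.8):
    """Return candidate approved terms for a verbatim entry.

    Only suggestions are returned; nothing is recoded automatically.
    """
    key = " ".join(verbatim.lower().split())  # lowercase, collapse whitespace
    return get_close_matches(key, APPROVED_TERMS, n=3, cutoff=cutoff)

# Print each verbatim entry next to its suggestions so a reviewer can accept
# or reject them. An empty list means "needs a human decision".
for text in ["arthritic pain", "arthrritus pain", "arthritis", "dizzy"]:
    print(repr(text), "->", suggest_terms(text))
```

    The point of the design is that the verbatim text is kept and the script only produces a review list, so the judgment call stays with a person.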

    Example 2
    Some field has a list of choices, and an "other" and a write-in field for the other. Can I do what I think of as "garbage collection" on that field, and change NA, N/A, ND, UNK, unknown, did not do, ??, blanks, and "dddtt--" all to N/A?
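
    As a rough sketch of the "garbage collection" in Example 2, assuming the recoding rule is written down and the original entry is never overwritten: the token list `MISSING_TOKENS` and the function `normalize_other` are hypothetical names, and which tokens belong on the list is exactly the kind of decision that would need sign-off.

```python
import re

# Hypothetical list of entries treated as "not available". Note that an entry
# like "dddtt--" is deliberately NOT on it, because that is a judgment call.
MISSING_TOKENS = {"na", "n/a", "nd", "unk", "unknown", "did not do", "??", ""}

def normalize_other(raw):
    """Return (cleaned_value, rule_applied); the raw text itself is untouched."""
    key = re.sub(r"\s+", " ", (raw or "").strip().lower())
    if key in MISSING_TOKENS:
        return "N/A", "missing-token"
    return raw, "unchanged"

rows = ["NA", "N/A", " unknown ", "??", "", "dddtt--", "knee brace"]

# Keep raw value, cleaned value, and the rule side by side as an audit trail.
audit = [(raw, *normalize_other(raw)) for raw in rows]
for record in audit:
    print(record)
```

    Entries that come back as "unchanged" but still look like noise could then be sent back as data queries instead of being recoded by the statistician.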

    Example 3
    If it is hard to get detailed requirements or table templates from the customer, but I want to give them informative concise tables, can I take the liberty of summarizing more broadly than in example 1 or 2? Can I take multi-sentence free form text fields, and make my own categories like: arm related, leg related, chest related, other, and then read and interpret and assign each one to one of those categories?

    Example 4
    Free form text fields for medication type and dose. Anyone who has ever done this knows it is a mess. What a huge ugly mess! Please, never again! So anyway, I spent probably 24 solid work hours grouping medications (based on different brand names, different abbreviations, different misspellings, and deducing what the subject meant when they wrote "same", i.e. finding what they wrote on the previous visit, etc.) and doses, based on what they wrote, compared with what Google tells me is a legitimate dose for that medication in cases where what they have written is nonsense (e.g. 1mg when the doses sold are 100mg and 200mg). After doing all this, I felt it was so... subjective? Outside my training? So I hired a pharmacist consultant to look over it and basically say "yep, looks good". Honestly, it is the kind of thing that anyone with patience and internet access could do, but would it hold up to professional / legal standards for a biostatistician to be making these kinds of assessments? Should I have done this? Or should I have refused from the outset and said the data is unusable, please have someone with relevant expertise fix this and then I can analyze it?
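
    One mechanical piece of Example 4, replacing "same" with the subject's previous entry, could be sketched as below. This is only an illustration under assumed names: the tuple layout `(subject_id, visit_no, med_text)` and the function `resolve_same` are made up for the example, and the sketch does nothing about brand names, misspellings, or doses.

```python
def resolve_same(visits):
    """Resolve 'same' entries from each subject's most recent real entry.

    visits: iterable of (subject_id, visit_no, med_text), in any order.
    Returns a list of (subject_id, visit_no, med_text, was_imputed).
    """
    resolved = []
    last_by_subject = {}
    for subject, visit, text in sorted(visits, key=lambda v: (v[0], v[1])):
        is_same = text.strip().lower() == "same"
        if is_same and subject in last_by_subject:
            # Carry the earlier entry forward and flag it as derived.
            resolved.append((subject, visit, last_by_subject[subject], True))
        else:
            # Keep the text as written; a 'same' with no earlier entry is
            # left alone so it can be queried rather than guessed.
            resolved.append((subject, visit, text, False))
            if not is_same:
                last_by_subject[subject] = text
    return resolved

example = [
    ("S01", 2, "same"),
    ("S01", 1, "ibuprofen 200mg"),
    ("S02", 1, "same"),
]
for row in resolve_same(example):
    print(row)
```

    The `was_imputed` flag keeps derived values distinguishable from what the subject actually wrote, which matters if someone later asks who made the call.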

    My inclination used to be to do 1 and 2, but not 3. Number 4 I sort of agreed to do because I had down time and I wanted to try it; but after my gung-ho spirit wore off, I thought maybe it was not appropriate work for me! Sometimes I've been asked to do 3. Lately I have been thinking (and again, I want to learn, so please correct me if I'm wrong) that it is even unprofessional for me to do 2, but maybe it is still OK for me to do 1.

    I think I'd really like not to have to do any of those kinds of things, but it is hard to argue it because it seems so self-serving! But seriously, not trying to be lazy, just wondering, it seems like I could make the case that someone on the clinical side of things should be doing all of these types of changes; I could also make the argument that someone in data management should be handling all this before I even see the data. On the other hand, if someone told me "no, stuff like this will always slip through, and it is your job as the statistician to clean it up," I guess I could buy that too.

    tl;dr Should I clean data where judgment calls are required? Just what is considered a judgment call, and what is considered a self-evident correction?

    Thanks in advance
     

  2. Anonymous

    Anonymous Guest

    What kind of organization do you work for? Is your AE data coded in MedDRA and your conmed data in WHODrug?
     
  3. Anonymous

    Anonymous Guest

    None of the data I have worked with have been coded that way. I don't think those standards are used in medical device clinical trials. Or if they are, the people I have worked with simply haven't used them.