Thursday, April 28, 2011

Need help removing time from a CSV file

I'm trying to process a CSV file to make it easier to sort, and I need to remove the time and the dash from each entry. The file has entries like this:

James,07/20/2009-14:40:11
Steve,08/06/2006-02:34:37
John,11/03/2008-12:12:34

and I want to turn them into this:

James,07/20/2009
Steve,08/06/2006
John,11/03/2008

I'm guessing sed is the right tool for this job?

thanks for your help.

From stackoverflow
  • cut -d '-' -f 1 < /path/to/your/file

    Edit after comment: sed 's/-[0-9][0-9]:[0-9][0-9]:[0-9][0-9]//g' < /path/to/your/file

    nmuntz : what if the name contains a dash?
    Alberto Zaccagni : If the name is something like Al-Ashrad then the output will be Al, which is wrong, thank you for pointing that out. I edited accordingly.
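  • Python (this sorts the rows by the full timestamp, so the time does not have to be removed before sorting)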

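    import csv
    import datetime

    # Read every row; newline="" lets the csv module handle line endings.
    with open("someFile.csv", newline="") as src:
        rows = list( csv.reader( src ) )

    def byDateTime( aRow ):
        # Parse the second column, e.g. "07/20/2009-14:40:11".
        return datetime.datetime.strptime( aRow[1], "%m/%d/%Y-%H:%M:%S" )

    rows.sort( key=byDateTime )

    # Write the sorted rows back out; the with block closes the file.
    with open("sortedFile.csv", "w", newline="") as dst:
        csv.writer( dst ).writerows( rows )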
    
  • just use awk

    awk -F"," '{ split($2,_,"-"); print $1,_[1] }' OFS="," file
    
  • Yes, I think sed is the right tool for the job:

    sed 's/-[:0-9]*$//' file
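
For comparison, here is a minimal Python sketch of the same transformation. It is not from any of the answers above: the helper name `strip_time` and the inline sample data are made up for illustration, and it assumes every row has exactly two fields, with the date part itself containing no dash. Because the split is applied only to the second field, a dash in the name (such as Al-Ashrad) is left alone.

```python
import csv
import io

def strip_time(rows):
    """Yield [name, date] rows with the trailing -HH:MM:SS dropped.

    Hypothetical helper; assumes two fields per row.
    """
    for name, stamp in rows:
        # Split the second field on its first dash and keep the date part.
        # If there is no dash, the field is passed through unchanged.
        yield [name, stamp.split("-", 1)[0]]

# Inline sample data standing in for the real file.
sample = "James,07/20/2009-14:40:11\nAl-Ashrad,08/06/2006-02:34:37\n"

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(strip_time(csv.reader(io.StringIO(sample))))
print(out.getvalue(), end="")
```

To run it against a real file, the two `io.StringIO` objects could be replaced with files opened using `open(path, newline="")`.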
    
