Tuesday, March 1, 2011

What does 'u' mean in a list?

This is the first time I've come across this. I just printed a list, and each element seems to have a u in front of it, e.g.:

[u'hello', u'hi', u'hey']

What does it mean and why would a list have this in front of each element?

As I don't know how common this is, if you'd like to see how I came across it, I'll happily edit the post.

From Stack Overflow:
  • The u just means that the following string is a Unicode string (as opposed to a plain byte string of type str). It has nothing to do with the list that happens to contain the (Unicode) strings.

  • I believe the u'' prefix creates a Unicode string instead of a regular ASCII (byte) string.

  • It indicates a Unicode string, similar to r'' for a raw string.

    >>> type(u'abc')
    <type 'unicode'>
    >>> r'ab\c'
    'ab\\c'
    
    day_trader : Ah, I thought r'' meant something to do with a regular expression?
    Samir Talwar : It's generally used for regular expressions, so we can write things like `r'/[ \t]+/'` instead of `'/[ \\t]+/'` (note the double backslash; in a raw string you don't have to escape backslashes, except before the closing quote).
    SilentGhost : it's often used in regex to avoid all the escaping backslashes
    day_trader : I see. If I iterate through a list of Unicode strings and check whether some string is `in` the list, will it find the string? I'm currently checking each element to see if it matches a certain string, and it keeps escaping every time. Is this because it's Unicode?
    Mike Graham : r and u are a bit different. u indicates the type of the string, whereas r (or ur, if you want raw Unicode literals) makes a normal str (or unicode, if u and r are both used) that is parsed differently at compile time:

        >>> repr(r'foo')
        "'foo'"
        >>> repr(u'foo')
        "u'foo'"

    Notice how the r goes away (it only changes how backslashes are handled) and the u does not (because it creates an object of a different type).
    SilentGhost : If your string is a Unicode string that uses only ASCII characters (as in your example), the `in` operation converts the strings implicitly and you'll get `True`: 'abc' in [u'abc'] results in `True`. If your Unicode string uses characters outside the ASCII character set, you would get `False` in such a test.
  • Unicode.
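The thread above is about Python 2. As a rough sketch for readers on Python 3 (an assumption, since the original question doesn't say which interpreter is used), the u'' prefix is still accepted (since Python 3.3, PEP 414) but changes nothing, because every str is already Unicode text. The old str/unicode split roughly maps to bytes/str, and unlike the Python 2 behaviour SilentGhost describes, Python 3 does no implicit conversion between them. The variable names below are purely illustrative.

```python
# Python 3 sketch (not from the original Python 2 thread).
words = [u'hello', u'hi', u'hey']   # u'' is allowed since Python 3.3 (PEP 414)
plain = ['hello', 'hi', 'hey']

# Every str is Unicode text in Python 3, so the u prefix changes nothing.
print(type(words[0]))    # <class 'str'>
print(words == plain)    # True
print(words)             # ['hello', 'hi', 'hey']  (no u in the repr)

# Python 2's str/unicode split roughly corresponds to bytes/str here.
# There is no implicit conversion: bytes must be decoded before comparing.
raw = 'café'.encode('utf-8')            # b'caf\xc3\xa9'
print(raw in ['café'])                  # False: bytes never equal str
print(raw.decode('utf-8') in ['café'])  # True once decoded
print(b'abc' in ['abc'])                # False, even for pure ASCII
```

So if a membership test on text pulled from a file or socket keeps failing in Python 3, checking whether one side is bytes is a reasonable first step.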
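To illustrate the raw-string comments (Samir Talwar, SilentGhost), here is a small Python 3 sketch using the standard re module. Note that Python's re takes the pattern without the surrounding /.../ delimiters used in Perl-style regexes. The pattern and sample text are made up for the example. The case where r'' matters most is \b: in a normal literal it is a backspace character, while the regex engine reads it as a word boundary.

```python
import re

# A raw string keeps backslashes literally, so the regex engine sees \d and \s.
pattern_raw = r'\d+\s+items'
pattern_escaped = '\\d+\\s+items'   # the same string, written without r''
print(pattern_raw == pattern_escaped)    # True
print(repr(r'ab\c'))                     # 'ab\\c' (repr doubles the backslash)

# '\b' in a normal literal is a backspace; r'\b' is a regex word boundary.
text = 'cat concatenate cat'
print(re.findall(r'\bcat\b', text))      # ['cat', 'cat']
print(re.findall('\bcat\b', text))       # [] (pattern contains backspace chars)
```

The raw form is the usual choice for regex patterns because it avoids doubling every backslash and sidesteps escapes like \b that mean something different to the string parser.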
