Commit 3324591

Fix UTF-8 bug in NSString_RegEx
This class used the location information returned by regex(3) as the range for a substring. However, the offsets regex(3) returns are byte-based, while NSString ranges are character-based. This causes a problem when the string contains multi-byte UTF-8 characters, as the wrong substring is returned. This is fixed by taking the string's UTF-8 byte sequence and extracting the substring from that, rather than using NSString's own substring method.
1 parent 4544816 commit 3324591

1 file changed: +3 -1 lines changed


NSString_RegEx.m

+3 -1
@@ -57,7 +57,9 @@ - (NSArray *) substringsMatchingRegularExpression:(NSString *)pattern count:(int
 				break;

 			NSRange range = NSMakeRange(pmatch[i].rm_so, pmatch[i].rm_eo - pmatch[i].rm_so);
-			NSString * substring = [self substringWithRange:range];
+			NSString * substring = [[[NSString alloc] initWithBytes:[self UTF8String] + range.location
+			                                                  length:range.length
+			                                                encoding:NSUTF8StringEncoding] autorelease];
 			[outMatches addObject:substring];

 			if (ranges)
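
The byte-versus-character mismatch described in the commit message can be reproduced with regex(3) alone. The following is a minimal sketch in plain C (which also compiles as Objective-C), not code from this repository; the helper names `first_match_bytes` and `utf8_chars_in_prefix` are hypothetical and exist only for illustration. It assumes the input is valid UTF-8 and counts code points, which matches NSString's UTF-16 based indexing only for characters in the Basic Multilingual Plane.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the UTF-8 characters (code points) in the
 * first nbytes bytes of s. Every byte that is not a continuation byte
 * (bit pattern 10xxxxxx) starts a new character. */
static size_t utf8_chars_in_prefix(const char *s, size_t nbytes)
{
    size_t count = 0;
    for (size_t i = 0; i < nbytes; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Hypothetical helper: find the first match of pattern in text and store
 * its start and end as BYTE offsets, exactly as regexec() reports them in
 * rm_so and rm_eo. Returns 0 on success, -1 on a bad pattern or no match. */
static int first_match_bytes(const char *pattern, const char *text,
                             size_t *start, size_t *end)
{
    regex_t re;
    regmatch_t m[1];
    int rc;

    assert(pattern != NULL && text != NULL && start != NULL && end != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    *start = (size_t)m[0].rm_so;
    *end = (size_t)m[0].rm_eo;
    return 0;
}
```

For the text `"h\xC3\xA9llo world"` (that is, "héllo world" encoded as UTF-8), matching the pattern `world` yields a start offset of 7 bytes, while `utf8_chars_in_prefix` reports that only 6 characters precede the match. Passing the byte offset 7 to a character-indexed substring call would therefore begin one character too late, which is the kind of bug this commit addresses by slicing the UTF-8 bytes directly.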
