Wikipedia:Reference desk/Archives/Computing/2013 January 12



January 12

core i series

What is meant by Core i3, Core i5, etc.? What are the main differences between them? — Preceding unsigned comment added by 112.134.236.163 (talk) 06:18, 12 January 2013 (UTC)

The technical specifications are given in List of Intel microprocessors, and the history in Intel Core. They are really just brand names and do not directly represent the real physical number of "cores" -- see Multi-core processor. Dbfirs 09:45, 12 January 2013 (UTC)
For i3, i5, and i7, see Intel Core. Bubba73 You talkin' to me? 05:34, 13 January 2013 (UTC)

Parsing an array of strings from a string with a regex

I asked earlier about parsing a two-dimensional array of strings from a string such as this:

{ {a, b}, {c, d}, {e, f} }

I approached this by first deleting the outermost { } braces and then using a regex to find the contents of the inner { } braces. I first used this regex: \{.*\}, to find anything between { } braces. This didn't work, as it found the entire string. Then I changed it to \{[^\{\}]*\} to find anything except { } braces between { } braces. This works, but it will fail if the strings themselves contain { } braces.
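The behaviour of the two patterns described above can be reproduced with a short sketch. It assumes Python's re module, which is only an illustration, since the question does not say which regex library is in use; the variable names are made up for the example.

```python
import re

# Input after the outermost braces have been removed.
simple = "{a, b}, {c, d}, {e, f}"

# Greedy .* runs to the last closing brace, so the whole string is one match.
greedy = re.findall(r"\{.*\}", simple)

# Excluding braces inside the group stops each match at its own closing brace.
inner = re.findall(r"\{[^{}]*\}", simple)

# With escaped braces inside the strings, the same pattern no longer finds
# the groups: it only picks up brace-free fragments.
escaped = r"{a, \{b\}}, {\{c\}, d}"
broken = re.findall(r"\{[^{}]*\}", escaped)

print(greedy)   # one match covering the entire string
print(inner)    # the three inner groups
print(broken)   # fragments, not the intended groups
```

On the escaped input the pattern matches only the pieces `{b\}` and `{c\}`, which is the failure the paragraph above anticipates.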

Now assuming the { } braces in the actual strings are escaped, for example:

{ {a, \{b\}}, {\{c\}, d} }

what kind of regex can I use to get the strings from this? JIP | Talk 11:17, 12 January 2013 (UTC)

It probably depends on the particular type of regex you're using. (While the common varieties have similar overall syntax, there are a number of differences between various styles, especially as you get into the more complex features.) One possibility is to use what's sometimes termed "look-behind" or "look-ahead" to match '{' and '}' characters that aren't preceded by '\' differently than those that are. E.g. in Perl syntax [1], a regex like "(?<!un)m" will match the 'm' in "matched", but won't match the 'm' in "unmatched". - Of course, this won't help in situations where you use things like quotation marks to escape characters. It's important to realize that regexes can't parse everything. (They can only parse something that can be expressed as a "regular language", which is only a small subset of possible grammars.) It's easy to come up with simple-looking expressions that are actually quite difficult to parse. -- 71.35.120.28 (talk) 23:25, 12 January 2013 (UTC)
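A minimal sketch of the look-behind idea, assuming Python's re module (which accepts the same (?<!...) syntax as the Perl example); the names not_after_un and unescaped_brace are invented for the illustration.

```python
import re

# The example from the text: 'm' matches only when not preceded by "un".
not_after_un = re.compile(r"(?<!un)m")
in_matched = not_after_un.search("matched") is not None
in_unmatched = not_after_un.search("unmatched") is not None

# Same idea applied to braces: match { or } only when the character
# immediately before it is not a backslash.
unescaped_brace = re.compile(r"(?<!\\)[{}]")
text = r"{a, \{b\}}"
positions = [m.start() for m in unescaped_brace.finditer(text)]

print(in_matched, in_unmatched)
print(positions)  # indexes of the unescaped braces only
```

One caveat with this approach: the look-behind only inspects a single preceding character, so a brace that follows an escaped backslash (\\ followed by }) would be wrongly treated as escaped.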
Your first example matches too much because .* is greedy: rather than stopping at the first }, it consumes as much as it can and backs off only far enough for the rest of the regex to match, so it ends at the last }. It depends on the regex library you're using, but C# and Perl, for instance, allow a .*? expression, which is the non-greedy version. I think that should match what you're looking for.
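For reference, a small sketch of the non-greedy form, assuming Python's re module (which supports .*? in the same way as Perl and C#). It runs the pattern on both the plain input and the input with escaped braces; the variable names are illustrative only.

```python
import re

simple = "{a, b}, {c, d}, {e, f}"
escaped = r"{a, \{b\}}, {\{c\}, d}"

# .*? is the non-greedy form: it stops at the first closing brace it can reach.
lazy = re.compile(r"\{(.*?)\}")

simple_groups = lazy.findall(simple)
escaped_groups = lazy.findall(escaped)

print(simple_groups)
print(escaped_groups)
```

On the plain input this yields the three groups. On the escaped input each match stops at the first }, even when that brace is preceded by a backslash.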
Yes, I noticed when it first failed that it failed because it was greedy. But merely making it non-greedy won't fix the problem. Given the above example I think I'm going to end up with something like:
a, \{b\
\{c\
when I want to end up with:
a, \{b\}
\{c\}, d
If all else fails I'll probably have to parse it character by character, where {, } and \ are special characters. \ puts the parsing into "escape mode", any succeeding character puts it out of it. { and } mean "start of match" and "end of match" unless we're in "escape mode", in which case they simply mean { and }. What I fear is that this will be significantly slower. JIP | Talk 19:07, 13 January 2013 (UTC)
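The character-by-character approach described above could look roughly like the following Python sketch. The function name extract_groups is hypothetical, and two details are assumptions not stated in the post: the backslashes are kept in the output (as in the desired result shown earlier), and an unescaped { seen while already inside a group simply restarts the group.

```python
def extract_groups(text):
    """Return the contents of each unescaped {...} group in text."""
    groups = []
    current = None    # list of characters while inside a group, else None
    escaped = False   # True right after a backslash ("escape mode")
    for ch in text:
        if escaped:
            # Any character ends escape mode and is taken literally.
            if current is not None:
                current.append(ch)
            escaped = False
        elif ch == "\\":
            # Enter escape mode; keep the backslash in the output (assumption).
            if current is not None:
                current.append(ch)
            escaped = True
        elif ch == "{":
            current = []              # start of match
        elif ch == "}":
            if current is not None:   # end of match
                groups.append("".join(current))
                current = None
        elif current is not None:
            current.append(ch)
    return groups


result = extract_groups(r"{a, \{b\}}, {\{c\}, d}")
print(result)
```

The loop makes a single pass over the input and does a constant amount of work per character, so its cost grows linearly with the length of the string.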
It's tricky to make regex work with structured pragmas, but if you're limited to something relatively simple, see if the non-greedy version works. Shadowjams (talk) 06:39, 13 January 2013 (UTC)
Basic Unix grep, by the way, won't handle non-greedy expressions. I don't know about ngrep; you might want to test it. Shadowjams (talk) 06:40, 13 January 2013 (UTC)
This language can in fact be handled with a regular expression (but note that balanced braces cannot). The way I would construct it is more or less from the "parse it character by character" description, which is easily written as a DFA. The result (in a Perl-like syntax) is {(\\.|[^\}])*}. With many regular expression engines and known-valid input, you don't even need the \ in the second operand of the alternation: a greedy matching algorithm will always use the first operand for a backslash and will never need to use the second. --Tardis (talk) 00:30, 14 January 2013 (UTC)
I think your expression should be \{(\\.|[^\\}])*\}. Perl regexes have never really been regular expressions. In particular, they support recursion, which should allow you to match any context-free language (though not necessarily very efficiently), and nested braces are in that category. Also, I want to emphasize (though you already said it) that if the input might be incorrect you should not leave out the \ from the [] brackets, since it will then succeed on erroneous strings like {\} at the end of the file. And you mustn't leave it off if you're using a real NFA/DFA regular expression engine like re2. -- BenRG (talk) 17:23, 14 January 2013 (UTC)
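A sketch of the expression \{(\\.|[^\\}])*\} in use, assuming Python's re module (a backtracking engine, like Perl's). The inner alternation is written as a non-capturing group with one outer capture so that findall returns each group's contents; that rearrangement, and the names strict and loose, are additions for the illustration.

```python
import re

# An escaped pair (\\.) or any character that is neither a backslash nor }.
strict = re.compile(r"\{((?:\\.|[^\\}])*)\}")

# The variant without the backslash in the character class.
loose = re.compile(r"\{((?:\\.|[^}])*)\}")

escaped = r"{a, \{b\}}, {\{c\}, d}"
strict_groups = strict.findall(escaped)
loose_groups = loose.findall(escaped)

# The erroneous input from the text: an opening brace, then an escaped
# closing brace, then end of input.
bad = "{\\}"
strict_on_bad = strict.search(bad)
loose_on_bad = loose.search(bad)

print(strict_groups)
print(strict_on_bad is None, loose_on_bad is None)
```

On valid input both patterns return the same two groups. On the erroneous input the strict pattern finds nothing, while the loose one backtracks, lets the character class consume the backslash, and accepts the string.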