Few strata of geekery are more obsessive than regular-depression geekery.
-
Daphne Preston-Kendal replied to Tim Bray
@timbray Precedent: the regexp processor of the MOO programming language used (uses!) % instead of \. I think it got this from somewhere else before it, too, but I don’t know where.
Why are you planning to write your own regexp processor instead of using Go’s? It is good and runs in linear time (although its constant factors aren’t great compared with some others). You can even construct an AST from your own regexp syntax and let it do the NFA/DFA conversion and simulation for you.
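A rough sketch of that AST route, for illustration only. The helper names (`literal`, `charRange`, `star`, `concat`, `compileAST`) are made up for this example. As far as I know the `regexp` package has no public way to compile a `syntax.Regexp` tree directly, so this sketch renders the tree back to Go's own syntax with `String()` and hands that to `regexp.Compile`:

```go
package main

import (
	"regexp"
	"regexp/syntax"
)

// literal returns an AST node that matches s exactly (hypothetical helper).
func literal(s string) *syntax.Regexp {
	return &syntax.Regexp{Op: syntax.OpLiteral, Rune: []rune(s)}
}

// charRange returns a node that matches one rune in [lo, hi] (hypothetical helper).
func charRange(lo, hi rune) *syntax.Regexp {
	return &syntax.Regexp{Op: syntax.OpCharClass, Rune: []rune{lo, hi}}
}

// star returns a node that matches zero or more repetitions of sub.
func star(sub *syntax.Regexp) *syntax.Regexp {
	return &syntax.Regexp{Op: syntax.OpStar, Sub: []*syntax.Regexp{sub}}
}

// concat returns a node that matches the given nodes in sequence.
func concat(subs ...*syntax.Regexp) *syntax.Regexp {
	return &syntax.Regexp{Op: syntax.OpConcat, Sub: subs}
}

// compileAST renders the tree in Go's regexp syntax and compiles the result,
// so the standard library does the automaton construction and matching.
func compileAST(ast *syntax.Regexp) (*regexp.Regexp, error) {
	return regexp.Compile(ast.String())
}
```

A parser for a custom syntax would emit these nodes instead of a string. For example, `compileAST(concat(literal("("), star(charRange('a', 'z')), literal(")")))` gives a matcher for a parenthesized run of lowercase letters, with the escaping of the literal parentheses handled by `String()`.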
-
Tim Bray replied to Daphne Preston-Kendal
@dpk As to using Go's regex: My NFA representation is idiosyncratic, hyper-optimized for raw matching speed, and does much less than Go's. The only result I want is a boolean matches/doesn't-match, so the Go machinery has loads and loads of stuff I don't need.
-
@timbray Looking forward to the poll. Great question. Love the suggestions. I know which one I'm gonna vote for.
PS <see attached> (nothing wrong there, just some font “fun” that's always frustrating).
-
@leoncowle You are right. *sighs*
-
@timbray
1) At first I thought your “regular-depression” was a joke. Now I'm thinking it was an auto-correctism.
2) Instead of using a different escape character, wouldn’t it be better to protect the whole regex by sending it as pre-compiled binary? E.g. via a pre-processor mechanism.
-
@timbray The left guillemet is not hard to type on Mac (or iOS) keyboards: Option-\. It's thus even mnemonically tied to backslash.
But semantically I see « and I want to see a corresponding » (Shift-Option-\). So my spitball idea would be using left and right guillemets to "quote" the escaped special character:
«(»[^«n»«r»)]*«)»
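To make the spitball concrete, here is a minimal sketch of a translator from this guillemet notation to ordinary backslash syntax. The function name `guillemetsToBackslash` is invented for the example, and two things are assumptions rather than part of the proposal: each «…» pair holds exactly one character, and a bare backslash in the input is treated as a literal character.

```go
package main

import (
	"fmt"
	"strings"
)

// guillemetsToBackslash rewrites each «x» as \x, so «(» becomes \( and
// «n» becomes \n. Assumption: a bare backslash in the input is literal,
// so it is doubled in the output. It returns an error if a « is not
// followed by exactly one rune and a closing ».
func guillemetsToBackslash(pattern string) (string, error) {
	runes := []rune(pattern)
	var b strings.Builder
	for i := 0; i < len(runes); i++ {
		switch runes[i] {
		case '«':
			if i+2 >= len(runes) || runes[i+2] != '»' {
				return "", fmt.Errorf("malformed escape at rune %d", i)
			}
			b.WriteRune('\\')
			b.WriteRune(runes[i+1])
			i += 2 // skip the escaped rune and the closing »
		case '\\':
			b.WriteString(`\\`)
		default:
			b.WriteRune(runes[i])
		}
	}
	return b.String(), nil
}
```

Fed the pattern above, this should yield `\([^\n\r)]*\)`, which any backslash-based engine can take as is.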
-
Tim Bray replied to Cameron Hayne
@cameronhayne Gack. Fixed, thks.
-
@philsplace @gruber Well, \r is not a vanilla normal character representing itself either. I just want to remove all special meanings from “\”.
-
@gruber I have to say that looks nice. Hmm, using a pair of enclosing markers suggests they could contain more than one character… So you could also have «P{Lu}» rather than «P»{Lu}. Not sure how I feel about that.
-
@timbray I thought of that too, but didn't want to send you down that rabbit hole. But it's intriguing.
The problem I'm thinking about is that sometimes you want to escape a literal character: «(» would mean a literal open paren, but «n» would mean a newline. There aren't many non-literal escapes, though. So, another spitball (could be a truly horrible idea?): what if you keep backslash for non-literal escapes like \n and \r, but use «…» to mean “quote these characters literally”?
-
@timbray So you could type this to get three consecutive literal open parentheses:
«(((»
or
«\\»
to get 2 literal backslashes. Both of those are zillions more legible than `\(\(\(` or the infamous matchsticks of `\\\\`.
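A quick sketch of how that could be layered on top of Go's existing syntax, assuming a preprocessing pass: everything between « and » goes through the standard `regexp.QuoteMeta`, and everything outside is passed along unchanged, so `\n` and `\r` keep working. The name `expandLiteralSpans` is invented for the example, and this version has no way to put a literal » inside a span.

```go
package main

import (
	"errors"
	"regexp"
	"strings"
)

// expandLiteralSpans replaces each «…» span with its regexp.QuoteMeta form
// and leaves everything outside the spans (including \n and \r) untouched.
// It returns an error if a « has no matching ».
func expandLiteralSpans(pattern string) (string, error) {
	var b strings.Builder
	for {
		open := strings.Index(pattern, "«")
		if open < 0 {
			// No more spans: copy the remainder and finish.
			b.WriteString(pattern)
			return b.String(), nil
		}
		b.WriteString(pattern[:open])
		rest := pattern[open+len("«"):]
		end := strings.Index(rest, "»")
		if end < 0 {
			return "", errors.New("unterminated « in pattern")
		}
		b.WriteString(regexp.QuoteMeta(rest[:end]))
		pattern = rest[end+len("»"):]
	}
}
```

With this, `«(((»` expands to the `\(\(\(` form and `«\\»` to the four-matchstick form, so the noise lives in the generated pattern rather than in what you type.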
-
@gruber All this is compelling, but my library’s users are developers not civilians, and being able to just say “put an X wherever you used to put a \” is attractive. Also I'm kinda over inventing container syntax. Having said that, your idea is visually attractive.
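For illustration, the “put an X wherever you used to put a \” rule can be sketched as a one-pass rewrite into backslash syntax. This is not the library's code: `swapEscape` is a made-up name, the escape character is a parameter because none has been chosen in this thread, and it assumes a backslash in the input should match a literal backslash.

```go
package main

import (
	"errors"
	"strings"
)

// swapEscape converts a pattern that uses esc as its escape character into
// backslash syntax. A backslash in the input has no special meaning, so it
// is emitted as an escaped (literal) backslash.
func swapEscape(pattern string, esc rune) (string, error) {
	var b strings.Builder
	escaped := false
	for _, r := range pattern {
		switch {
		case escaped:
			// The previous rune was esc: emit the backslash form.
			b.WriteRune('\\')
			b.WriteRune(r)
			escaped = false
		case r == esc:
			escaped = true
		case r == '\\':
			b.WriteString(`\\`)
		default:
			b.WriteRune(r)
		}
	}
	if escaped {
		return "", errors.New("pattern ends with a dangling escape character")
	}
	return b.String(), nil
}
```

Using `%` as a stand-in (the MOO precedent mentioned earlier), `%(%d+%)` becomes `\(\d+\)`, and a Windows-style `C:\dir` needs no doubling in the input because the backslash is just a character there.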