Regex Question.
-
So I've got a strange issue with my YouTube plugin: it doesn't seem to handle parameters after the YouTube ID very well.
I've got a var that looks like this:
id = $el.data('youtube-id')
Which is parsed via the following regex
var regularUrl = /<a href="(?:https?:\/\/)?(?:www\.)?(?:youtube\.com)\/(?:watch\?v=)([\w\-_]+)?&([\w\-_]+)">.+<\/a>/g;
var shortUrl = /<a href="(?:https?:\/\/)?(?:www\.)?(?:youtu\.be)\/([\w\-_]+)">.+<\/a>/g;
var embedUrl = /<a href="(?:https?:\/\/)?(?:www\.)youtube.com\/embed\/([\w\-_]+)">.+<\/a>/;
Except it doesn't just capture the ID; it also includes any parameters that a user may add afterwards, such as start times. That breaks things, because the ID becomes
tGZlwK2qTCI&t=3m20s
which isn't a valid video ID. Normally this would only be a problem for the thumbnail that I fetch, but I append &autoplay to the URL here:
src="//www.youtube.com/embed/' + id + '?autoplay=1"
I assume something is up with my regex. I would like $1 to be just the video ID (11 characters), and everything after that to become part of $2 so I can add the parameters back in afterwards.
Basically what I'm after is the youtube URL looking like
src="//www.youtube.com/embed/' + id + '?autoplay=1' + parameters + '"
So
$1
would be the 11 character YouTube ID, and
$2
would be all other parameters after that ID.
-
@julian said:
Is this client-side or server-side? Usually I can tell, except with Node, it's all js
Ermmm.
nodebb-plugin-youtube-lite/library.js at master · a5mith/nodebb-plugin-youtube-lite
&
nodebb-plugin-youtube-lite/static/lib/lazyYT.js at master · a5mith/nodebb-plugin-youtube-lite
-
Server-side, then.
Use this module: http://nodejs.org/api/url.html#url_url_parse_urlstr_parsequerystring_slashesdenotehost
It's going to make your life a million times easier than parsing a URL via regex.
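A minimal sketch of that suggestion, for illustration only. It uses the URL class that Node exposes as a global (the newer counterpart to url.parse in the same module). The helper name parseYoutubeUrl and its return shape are hypothetical and not part of the plugin, and the 11-character ID check is an assumption taken from the question.

```javascript
// Hypothetical helper: name and return shape are illustrative, not taken from the plugin.
function parseYoutubeUrl(href) {
  // new URL() needs a scheme, so add one for "//host/..." or "host/..." inputs.
  var normalized = href;
  if (/^\/\//.test(href)) {
    normalized = 'https:' + href;
  } else if (!/^https?:\/\//i.test(href)) {
    normalized = 'https://' + href;
  }

  var parsed;
  try {
    parsed = new URL(normalized);
  } catch (err) {
    return null; // not a parseable URL
  }

  var host = parsed.hostname.replace(/^www\./, '');
  var params = new URLSearchParams(parsed.search);
  var id = null;

  if (host === 'youtu.be') {
    id = parsed.pathname.slice(1); // path is "/<id>"
  } else if (host === 'youtube.com' && parsed.pathname === '/watch') {
    id = params.get('v'); // works wherever v= sits in the query string
    params.delete('v');
  } else if (host === 'youtube.com' && parsed.pathname.indexOf('/embed/') === 0) {
    id = parsed.pathname.slice('/embed/'.length);
  }

  // Assumption: IDs are 11 characters of letters, digits, "_" or "-".
  if (!id || !/^[\w-]{11}$/.test(id)) {
    return null;
  }

  return { id: id, parameters: params.toString() };
}

var result = parseYoutubeUrl('https://www.youtube.com/watch?v=tGZlwK2qTCI&t=3m20s');
var src = '//www.youtube.com/embed/' + result.id + '?autoplay=1' +
  (result.parameters ? '&' + result.parameters : '');

console.log(result.id);         // tGZlwK2qTCI
console.log(result.parameters); // t=3m20s
console.log(src);               // //www.youtube.com/embed/tGZlwK2qTCI?autoplay=1&t=3m20s
```

The leftover parameters are passed through unchanged; whether the embed player honours each of them is not checked here.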
For client-side, use the built-in
Location
object. But that's another topic.
-
@julian said:
Server-side, then.
Use this module: http://nodejs.org/api/url.html#url_url_parse_urlstr_parsequerystring_slashesdenotehost
It's going to make your life a million times easier than parsing a URL via regex.
For client-side, use the built-in
Location
object. But that's another topic.
Remember you're talking to an idiot here? I'll look into that.
-
@Ted sits idly by and stalks the topic, knowing that with a little more time, this will be resolved.
-
@esiao The regex works, but if you use a parameter, the ID becomes the following, which breaks embedding:
{ID}&the parameter
I forked the YouTube plugin that psychobunny made, so I've not really changed much of it.
EDIT: Using that site, I've managed to get what looks right. I'll give it a go and let you know how it goes.
EDIT again: as you can see from http://regexr.com/39m51, the end of the ID is now being captured in $2 if there are no parameters, which also breaks it. Is there a way of getting null if there are no parameters? I'm so close. I think.
-
With
/<a href="(?:https?:\/\/)?(?:www\.)?(?:youtu\.be)\/((?:[\w\-_]+){11})\??([^&]+)?(&?[\w&]+)*">.+<\/a>/g
on
<a href="http://youtu.be/foNkJJWFuI8?t=47s&parameter">something</a>
it creates three groups:
1: id
2: time
3: parameter
Is that what you wanted? If the time is not used, you can make a non-capturing group out of
([^&]+)
-
That wouldn't work on
<a href="http://youtu.be/foNkJJWFuI8?t=47s&parameter=1">something</a>
due to the = sign.
I'm OK with not using the parameters bit, but time would be good to have. As long as I can get the ID without anything else leaking into it, I'm not 100% concerned about parameters etc.
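One way this could look, as a sketch only and covering just the youtu.be form: the ID group is limited to exactly 11 characters, and everything after the ? is captured as one optional group, so values containing = are fine and the group is simply undefined when there is no query string. The pattern and the embedSrc helper are illustrative assumptions, not the plugin's code.

```javascript
// Illustrative pattern, not the plugin's code.
// Group 1: exactly 11 ID characters.
// Group 2: everything after "?" up to the closing quote (optional).
var shortUrl = /<a href="(?:https?:\/\/)?(?:www\.)?youtu\.be\/([\w-]{11})(?:\?([^"]*))?">.+?<\/a>/g;

// Hypothetical replace callback: builds the embed src from the two groups.
function embedSrc(match, id, parameters) {
  // parameters is undefined when the link has no query string.
  return '//www.youtube.com/embed/' + id + '?autoplay=1' +
    (parameters ? '&' + parameters : '');
}

var withParams = '<a href="http://youtu.be/foNkJJWFuI8?t=47s&parameter=1">something</a>';
var withoutParams = '<a href="http://youtu.be/foNkJJWFuI8">something</a>';

console.log(withParams.replace(shortUrl, embedSrc));
// //www.youtube.com/embed/foNkJJWFuI8?autoplay=1&t=47s&parameter=1
console.log(withoutParams.replace(shortUrl, embedSrc));
// //www.youtube.com/embed/foNkJJWFuI8?autoplay=1
```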
-
Hey @esiao, thanks for the code. There's a slight issue though, which appears to be regex based: each pattern only fires once. If I embed the same URL twice, it only embeds one of them and not the other. However, if I change the second embed to one of the other URL variations (replacing watch?v= with /embed/, for example), then it embeds fine. As I can't read regex, is there something in this that is stopping it from firing again afterwards?
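One possible cause, offered as a guess since the patterns in question are not shown here: String.prototype.replace only replaces the first match when the pattern has no g flag, which would fit "one embed per URL variation". The sketch below uses made-up sample markup and a simplified youtu.be pattern to show the difference.

```javascript
// Made-up sample: the same link twice, on separate lines.
var html = '<a href="http://youtu.be/foNkJJWFuI8">one</a>\n' +
           '<a href="http://youtu.be/foNkJJWFuI8">two</a>';

// Same simplified pattern, with and without the g flag.
var once = /<a href="(?:https?:\/\/)?youtu\.be\/([\w-]{11})">.+?<\/a>/;
var every = /<a href="(?:https?:\/\/)?youtu\.be\/([\w-]{11})">.+?<\/a>/g;

// Replaces matches with a marker and counts how many markers were produced.
function countEmbeds(pattern) {
  var out = html.replace(pattern, '[embed $1]');
  return out.split('[embed foNkJJWFuI8]').length - 1;
}

console.log(countEmbeds(once));  // 1
console.log(countEmbeds(every)); // 2
```

The lazy .+? also matters when two links share a line: a greedy .+ would run to the last closing tag and merge both links into a single match.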
-
-
@esiao said:
Unfortunately, that doesn't seem to work either. Even if I put the works of Shakespeare between the two YouTube URLs, it still only displays one.
Also it doesn't seem to match
watch?v=videoID
either. But it's probably a slightly different regex.
-
I'd like to help you out, but I'd need more specific examples of the inputs you want to read the ID and the parameters from.
For now I can just tell you that something like
[\w\-_]
is not a clean regex, since it's equivalent to
[\w-]
and the shorter the pattern, the easier it is to read.
Also, the
[^<a]+
in @esiao's last full regex would stop at the first
a
occurrence, not only at the first
<a
occurrence as it may suggest.
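Both points can be checked quickly. The sketch below uses a made-up sample string; the lookahead variant at the end is just one way to stop at the literal two-character sequence, not something taken from the thread.

```javascript
// 1) [\w\-_] and [\w-] accept exactly the same characters (checked over ASCII only).
var verbose = /^[\w\-_]$/;
var compact = /^[\w-]$/;
var sameClass = true;
for (var code = 0; code < 128; code++) {
  var ch = String.fromCharCode(code);
  if (verbose.test(ch) !== compact.test(ch)) {
    sameClass = false;
  }
}
console.log(sameClass); // true

// 2) [^<a]+ is a character class: it stops at the first "<" OR the first "a",
//    not at the two-character sequence "<a".
var text = 'watch this <a href="#">link</a>';
var classMatch = /[^<a]+/.exec(text)[0];
console.log(classMatch); // w

// Stopping only at the literal "<a" needs a lookahead instead of a class.
var lookaheadMatch = /(?:(?!<a)[^])+/.exec(text)[0];
console.log(lookaheadMatch); // "watch this " (with the trailing space)
```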
So there are a few weak spots in each regex I've seen so far, and you didn't consider users who put the
v=...
parameter after other parameters within the regularUrls. And are you sure that it'll always be like
<a href="...">...</a>
and that the
a
-tag could never get another attribute? (My emoji-extended broke at some version because the
code
-tags got one ^^)
If you want me to help you out with cleaner regex (to the best of my knowledge), I'd gladly help if I get a few example URLs that cover all cases.
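To make those two concerns concrete, here is a rough sketch with made-up sample links. The patterns anchorHref and videoParam and the helper findVideoIds are hypothetical names for illustration: the first tolerates extra attributes around href, the second finds v= wherever it sits in the query string.

```javascript
// Illustrative patterns, not taken from the plugin.
// anchorHref tolerates extra attributes before or after href.
var anchorHref = /<a\b[^>]*\bhref="([^"]*)"[^>]*>/g;
// videoParam finds v= whether it is the first query parameter or a later one.
var videoParam = /[?&]v=([\w-]{11})(?=[&#]|$)/;

// Hypothetical helper: collects the video ID from every matching anchor.
function findVideoIds(html) {
  var ids = [];
  var match;
  while ((match = anchorHref.exec(html)) !== null) {
    var found = videoParam.exec(match[1]);
    if (found) {
      ids.push(found[1]);
    }
  }
  return ids;
}

var sample =
  '<a class="link" href="https://www.youtube.com/watch?feature=share&v=tGZlwK2qTCI" rel="nofollow">a</a> ' +
  '<a href="https://www.youtube.com/watch?v=foNkJJWFuI8&t=47s">b</a>';

console.log(findVideoIds(sample)); // ['tGZlwK2qTCI', 'foNkJJWFuI8']
```

If the markup escapes & as &amp; inside href, the videoParam pattern would need to allow for that as well.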
Also, if you're willing to learn regex syntax, I'd try to explain my results afterwards.
But for now I have to sleep first, good night zzz