Parse Text With Multiple Links Using Regex In Javascript
Solution 1:
How about:
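var text = 'http://www.youtube.com/watch?v=-LiPMxFBLZY testing http://www.youtube.com/watch?v=Q3-l22b_Qg8&feature=related http://yahoo.com';
// Matches any http, https, ftp, or file URL (not only YouTube links); the final class keeps trailing punctuation such as "." or "," out of the match.
var ytre = /(\b(https?|ftp|file):\/\/[\-A-Z0-9+&@#\/%?=~_|!:,.;]*[\-A-Z0-9+&@#\/%=~_|])/ig;
// With the g flag, match() returns an array of every URL found (here, all three), or null if there is none.
var resultArray = text.match(ytre);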
Solution 2:
To parse URLs with regular expressions, start from the RFC that defines them.
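So to find URLs in text, use a variant of the RFC's expression that drops the ^ anchor and makes the protocol (scheme) and authority non-optional, like /\b(([^:\/?#]+):)(\/\/([^\/?#]*))([^?#]*)(\?([^#]*))?(#(.*))?/gi. In free text the negated character classes also match spaces, so add \s to each of them (and replace .* with \S*) if a match should stop at the end of a link.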
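As a minimal sketch of that idea, the variant below requires the scheme and authority and also excludes whitespace from every character class. The whitespace change is an assumption added here, not part of the RFC pattern; without it the path group would run across spaces into the next link. The names urlPattern and findUrls are illustrative only.

```javascript
// RFC 3986 Appendix B pattern with scheme and authority required.
// Assumption: \s is added to each negated class, and .* becomes \S*,
// so that a match stops at the first whitespace character.
const urlPattern = /\b(([^:\/?#\s]+):)(\/\/([^\/?#\s]*))([^?#\s]*)(\?([^#\s]*))?(#(\S*))?/gi;

// Return every URL-like substring found in the given text.
function findUrls(text) {
  return Array.from(text.matchAll(urlPattern), (m) => m[0]);
}

const sample = 'http://www.youtube.com/watch?v=-LiPMxFBLZY testing ' +
  'http://www.youtube.com/watch?v=Q3-l22b_Qg8&feature=related http://yahoo.com';
console.log(findUrls(sample));
```

Unlike the pattern in Solution 1, this one accepts any scheme followed by //, so it may need an extra filter on the captured scheme (group 2) if only http and https links are wanted.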
http://www.ietf.org/rfc/rfc3986.txt says
Appendix B. Parsing a URI Reference with a Regular Expression
As the "first-match-wins" algorithm is identical to the "greedy" disambiguation method used by POSIX regular expressions, it is natural and commonplace to use a regular expression for parsing the potential five components of a URI reference.
The following line is the regular expression for breaking-down a well-formed URI reference into its components.
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9
The numbers in the second line above are only to assist readability; they indicate the reference points for each subexpression (i.e., each paired parenthesis). We refer to the value matched for subexpression <n> as $<n>. For example, matching the above expression to
http://www.ics.uci.edu/pub/ietf/uri/#Related
results in the following subexpression matches:
$1 = http:
$2 = http
$3 = //www.ics.uci.edu
$4 = www.ics.uci.edu
$5 = /pub/ietf/uri/
$6 = <undefined>
$7 = <undefined>
$8 = #Related
$9 = Related
where <undefined> indicates that the component is not present, as is the case for the query component in the above example. Therefore, we can determine the value of the five components as
scheme    = $2
authority = $4
path      = $5
query     = $7
fragment  = $9
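The mapping above can be tried directly in JavaScript. The following is a small sketch, not part of the RFC: it writes the Appendix B expression as a regex literal (only the forward slashes are escaped) and reads the five components from the numbered groups. The names uriPattern and parseUri are illustrative.

```javascript
// Regular expression from RFC 3986, Appendix B, as a JavaScript literal.
// The forward slashes are escaped; nothing else is changed.
const uriPattern = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Split a URI reference into the five components named in the RFC.
// A component that is not present comes back as undefined.
function parseUri(uri) {
  const m = uriPattern.exec(uri);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
}

console.log(parseUri('http://www.ics.uci.edu/pub/ietf/uri/#Related'));
```

Because every group in the expression is optional, it matches any single-line string, so this is a way to split a string already known to be a URI reference, not a way to validate one.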