Qual é a melhor expressão regular para verificar se uma string é um URL válido?

Como posso verificar se uma determinada string é um endereço de URL válido?

Meu conhecimento de expressões regulares é básico e não me permite escolher entre as centenas de expressões regulares que já vi na web.

Eu escrevi meu padrão de URL (na verdade, IRI, internacionalizado) para estar em conformidade com o RFC 3987 ( http://www.faqs.org/rfcs/rfc3987.html ). Estas estão na syntax PCRE.

Para IRIs absolutos (internacionalizados):

/^[az](?:[-a-z0-9\+\.])*:(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=@])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?$/i 

Para também permitir IRIs relativas:

 /^(?:[az](?:[-a-z0-9\+\.])*:(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=@])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?|(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:])*@)?(?:\[(?:(?:(?:[0-9a-f]{1,4}:){6}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|::(?:[0-9a-f]{1,4}:){5}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){4}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:[0-9a-f]{1,4}:[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){3}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,2}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:){2}(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,3}[0-9a-f]{1,4})?::[0-9a-f]{1,4}:(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,4}[0-9a-f]{1,4})?::(?:[0-9a-f]{1,4}:[0-9a-f]{1,4}|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3})|(?:(?:[0-9a-f]{1,4}:){0,5}[0-9a-f]{1,4})?::[0-9a-f]{1,4}|(?:(?:[0-9a-f]{1,4}:){0,6}[0-9a-f]{1,4})?::)|v[0-9a-f]+[-a-z0-9\._~!\$&'\(\)\*\+,;=:]+)\]|(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])(?:\.(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])){3}|(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=@])*)(?::[0-9]*)?(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|\/(?:(?:(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*)?|(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=@])+)(?:\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@]))*)*|(?!(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])))(?:\?(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}\/\?])*)?(?:\#(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}!\$&'\(\)\*\+,;=:@])|[\/\?])*)?)$/i 

Como eles foram compilados (em PHP):

 < ?php /* Regex convenience functions (character class, non-capturing group) */ function cc($str, $suffix = '', $negate = false) { return '[' . ($negate ? '^' : '') . $str . ']' . $suffix; } function ncg($str, $suffix = '') { return '(?:' . $str . ')' . $suffix; } /* Preserved from RFC3986 */ $ALPHA = 'a-z'; $DIGIT = '0-9'; $HEXDIG = $DIGIT . 'a-f'; $sub_delims = '!\\$&\'\\(\\)\\*\\+,;='; $gen_delims = ':\\/\\?\\#\\[\\]@'; $reserved = $gen_delims . $sub_delims; $unreserved = '-' . $ALPHA . $DIGIT . '\\._~'; $pct_encoded = '%' . cc($HEXDIG) . cc($HEXDIG); $dec_octet = ncg(implode('|', array( cc($DIGIT), cc('1-9') . cc($DIGIT), '1' . cc($DIGIT) . cc($DIGIT), '2' . cc('0-4') . cc($DIGIT), '25' . cc('0-5') ))); $IPv4address = $dec_octet . ncg('\\.' . $dec_octet, '{3}'); $h16 = cc($HEXDIG, '{1,4}'); $ls32 = ncg($h16 . ':' . $h16 . '|' . $IPv4address); $IPv6address = ncg(implode('|', array( ncg($h16 . ':', '{6}') . $ls32, '::' . ncg($h16 . ':', '{5}') . $ls32, ncg($h16, '?') . '::' . ncg($h16 . ':', '{4}') . $ls32, ncg($h16 . ':' . $h16, '?') . '::' . ncg($h16 . ':', '{3}') . $ls32, ncg(ncg($h16 . ':', '{0,2}') . $h16, '?') . '::' . ncg($h16 . ':', '{2}') . $ls32, ncg(ncg($h16 . ':', '{0,3}') . $h16, '?') . '::' . $h16 . ':' . $ls32, ncg(ncg($h16 . ':', '{0,4}') . $h16, '?') . '::' . $ls32, ncg(ncg($h16 . ':', '{0,5}') . $h16, '?') . '::' . $h16, ncg(ncg($h16 . ':', '{0,6}') . $h16, '?') . '::', ))); $IPvFuture = 'v' . cc($HEXDIG, '+') . cc($unreserved . $sub_delims . ':', '+'); $IP_literal = '\\[' . ncg(implode('|', array($IPv6address, $IPvFuture))) . '\\]'; $port = cc($DIGIT, '*'); $scheme = cc($ALPHA) . ncg(cc('-' . $ALPHA . $DIGIT . '\\+\\.'), '*'); /* New or changed in RFC3987 */ $iprivate = '\x{E000}-\x{F8FF}\x{F0000}-\x{FFFFD}\x{100000}-\x{10FFFD}'; $ucschar = '\x{A0}-\x{D7FF}\x{F900}-\x{FDCF}\x{FDF0}-\x{FFEF}' . '\x{10000}-\x{1FFFD}\x{20000}-\x{2FFFD}\x{30000}-\x{3FFFD}' . '\x{40000}-\x{4FFFD}\x{50000}-\x{5FFFD}\x{60000}-\x{6FFFD}' . '\x{70000}-\x{7FFFD}\x{80000}-\x{8FFFD}\x{90000}-\x{9FFFD}' . '\x{A0000}-\x{AFFFD}\x{B0000}-\x{BFFFD}\x{C0000}-\x{CFFFD}' . '\x{D0000}-\x{DFFFD}\x{E1000}-\x{EFFFD}'; $iunreserved = '-' . $ALPHA . $DIGIT . '\\._~' . $ucschar; $ipchar = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . ':@')); $ifragment = ncg($ipchar . '|' . cc('\\/\\?'), '*'); $iquery = ncg($ipchar . '|' . cc($iprivate . '\\/\\?'), '*'); $isegment_nz_nc = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . '@'), '+'); $isegment_nz = ncg($ipchar, '+'); $isegment = ncg($ipchar, '*'); $ipath_empty = '(?!' . $ipchar . ')'; $ipath_rootless = ncg($isegment_nz) . ncg('\\/' . $isegment, '*'); $ipath_noscheme = ncg($isegment_nz_nc) . ncg('\\/' . $isegment, '*'); $ipath_absolute = '\\/' . ncg($ipath_rootless, '?'); // Spec says isegment-nz *( "/" isegment ) $ipath_abempty = ncg('\\/' . $isegment, '*'); $ipath = ncg(implode('|', array( $ipath_abempty, $ipath_absolute, $ipath_noscheme, $ipath_rootless, $ipath_empty ))) . ')'; $ireg_name = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . '@'), '*'); $ihost = ncg(implode('|', array($IP_literal, $IPv4address, $ireg_name))); $iuserinfo = ncg($pct_encoded . '|' . cc($iunreserved . $sub_delims . ':'), '*'); $iauthority = ncg($iuserinfo . '@', '?') . $ihost . ncg(':' . $port, '?'); $irelative_part = ncg(implode('|', array( '\\/\\/' . $iauthority . $ipath_abempty . '', '' . $ipath_absolute . '', '' . $ipath_noscheme . '', '' . $ipath_empty . '' ))); $irelative_ref = $irelative_part . ncg('\\?' . $iquery, '?') . ncg('\\#' . $ifragment, '?'); $ihier_part = ncg(implode('|', array( '\\/\\/' . $iauthority . $ipath_abempty . '', '' . $ipath_absolute . '', '' . $ipath_rootless . '', '' . $ipath_empty . '' ))); $absolute_IRI = $scheme . ':' . $ihier_part . ncg('\\?' . $iquery, '?'); $IRI = $scheme . ':' . $ihier_part . ncg('\\?' . $iquery, '?') . ncg('\\#' . $ifragment, '?'); $IRI_reference = ncg($IRI . '|' . $irelative_ref); 

Edit 7 March 2011: Por causa do modo como o PHP manipula barras invertidas em strings entre aspas, elas são inutilizáveis ​​por padrão. Você precisará de barras invertidas de escape duplo, exceto quando a contrabarra tiver um significado especial na regex. Você pode fazer isso assim:

 $escape_backslash = '/(?< !\\)\\(?![\[\]\\\^\$\.\|\*\+\(\)QEnrtaefvdwsDWSbAZzB1-9GX]|x\{[0-9a-f]{1,4}\}|\c[AZ]|)/'; $absolute_IRI = preg_replace($escape_backslash, '\\\\', $absolute_IRI); $IRI = preg_replace($escape_backslash, '\\\\', $IRI); $IRI_reference = preg_replace($escape_backslash, '\\\\', $IRI_reference); 

Acabei de escrever um post para uma ótima solução para reconhecer URLs nos formatos mais usados, como:

  • www.google.com
  • http://www.google.com
  • mailto:somebody@google.com
  • somebody@google.com
  • www.url-with-querystring.com/?url=has-querystring

A expressão regular usada é:

 /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/ 

No entanto, eu recomendo que você vá para http://blog.mattheworiordan.com/post/13174566389/url-regular-expression-for-links-with-or-without-the ver o exemplo de trabalho.

Qual plataforma? Se estiver usando o .NET, use System.Uri.TryCreate , não um regex.

Por exemplo:

 static bool IsValidUrl(string urlString) { Uri uri; return Uri.TryCreate(urlString, UriKind.Absolute, out uri) && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps || uri.Scheme == Uri.UriSchemeFtp || uri.Scheme == Uri.UriSchemeMailto /*...*/); } // In test fixture... [Test] void IsValidUrl_Test() { Assert.True(IsValidUrl("http://www.example.com")); Assert.False(IsValidUrl("javascript:alert('xss')")); Assert.False(IsValidUrl("")); Assert.False(IsValidUrl(null)); } 

(Obrigado a @Yoshi pela dica sobre o javascript: 🙂

Aqui está o que RegexBuddy usa.

 (\b(https?|ftp|file)://)?[-A-Za-z0-9+&@#/%?=~_|!:,.;]+[-A-Za-z0-9+&@#/%=~_|] 

Ele corresponde a estes abaixo (dentro das marcas ** ** ):

 **http://www.regexbuddy.com** **http://www.regexbuddy.com/** **http://www.regexbuddy.com/index.html** **http://www.regexbuddy.com/index.html?source=library** 

Você pode baixar o RegexBuddy em http://www.regexbuddy.com/download.html .

No que diz respeito ao post de resposta de pálpebra que diz “Isto é baseado na minha leitura da especificação URI.”: Obrigado Eyelidness, o seu é a solução perfeita que eu procurei, pois é baseado na especificação URI! Excelente trabalho. 🙂

Eu tive que fazer duas alterações. O primeiro a obter o regexp para coincidir com as URLs de endereço IP corretamente no PHP (v5.2.10) com a function preg_match ().

Eu tive que adicionar mais um par de parênteses à linha acima de “Endereço IP” ao redor dos canais:

 )|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?# 

Não tenho certeza porque.

Eu também reduzi o tamanho mínimo do domínio de nível superior de 3 para 2 letras para suportar .co.uk e similares.

Código final:

 /^(https?|ftp):\/\/(?# protocol )(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+(?# username )(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?(?# password )@)?(?# auth requires @ )((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*(?# domain segments AND )[az][a-z0-9-]*[a-z0-9](?# top level domain OR )|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}(?# )(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])(?# IP address ))(:\d+)?(?# port ))(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*(?# path )(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)(?# query string )?)?)?(?# path and query string optional )(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?(?# fragment )$/i 

Essa versão modificada não foi verificada em relação à especificação de URI, portanto, não posso garantir sua conformidade, ela foi alterada para lidar com URLs em ambientes de rede local e TLDs de dois dígitos, além de outros tipos de URL da Web, e funcionar melhor no PHP configuração eu uso.

Como código PHP :

 define('URL_FORMAT', '/^(https?):\/\/'. // protocol '(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'. // username '(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'. // password '@)?(?#'. // auth requires @ ')((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*'. // domain segments AND '[az][a-z0-9-]*[a-z0-9]'. // top level domain OR '|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}'. '(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'. // IP address ')(:\d+)?'. // port ')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*'. // path '(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)'. // query string '?)?)?'. // path and query string optional '(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?'. // fragment '$/i'); 

Aqui está um programa de teste em PHP que valida uma variedade de URLs usando o regex:

 < ?php define('URL_FORMAT', '/^(https?):\/\/'. // protocol '(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'. // username '(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'. // password '@)?(?#'. // auth requires @ ')((([a-z0-9]\.|[a-z0-9][a-z0-9-]*[a-z0-9]\.)*'. // domain segments AND '[az][a-z0-9-]*[a-z0-9]'. // top level domain OR '|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}'. '(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'. // IP address ')(:\d+)?'. // port ')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)*'. // path '(\?([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)'. // query string '?)?)?'. // path and query string optional '(#([a-z0-9$_\.\+!\*\'\(\),;:@&=-]|%[0-9a-f]{2})*)?'. // fragment '$/i'); /** * Verify the syntax of the given URL. * * @access public * @param $url The URL to verify. * @return boolean */ function is_valid_url($url) { if (str_starts_with(strtolower($url), 'http://localhost')) { return true; } return preg_match(URL_FORMAT, $url); } /** * String starts with something * * This function will return true only if input string starts with * niddle * * @param string $string Input string * @param string $niddle Needle string * @return boolean */ function str_starts_with($string, $niddle) { return substr($string, 0, strlen($niddle)) == $niddle; } /** * Test a URL for validity and count results. * @param url url * @param expected expected result (true or false) */ $numtests = 0; $passed = 0; function test_url($url, $expected) { global $numtests, $passed; $numtests++; $valid = is_valid_url($url); echo "URL Valid?: " . ($valid?"yes":"no") . " for URL: $url. Expected: ".($expected?"yes":"no").". "; if($valid == $expected) { echo "PASS\n"; $passed++; } else { echo "FAIL\n"; } } echo "URL Tests:\n\n"; test_url("http://localserver/projects/public/assets/javascript/widgets/UserBoxMenu/widget.css", true); test_url("http://www.google.com", true); test_url("http://www.google.co.uk/projects/my%20folder/test.php", true); test_url("https://myserver.localdomain", true); test_url("http://192.168.1.120/projects/index.php", true); test_url("http://192.168.1.1/projects/index.php", true); test_url("http://projectpier-server.localdomain/projects/public/assets/javascript/widgets/UserBoxMenu/widget.css", true); test_url("https://2.4.168.19/project-pier?c=test&a=b", true); test_url("https://localhost/a/b/c/test.php?c=controller&arg1=20&arg2=20", true); test_url("http://user:password@localhost/a/b/c/test.php?c=controller&arg1=20&arg2=20", true); echo "\n$passed out of $numtests tests passed.\n\n"; ?> 

Mais uma vez obrigado a eyelidness para o regex!

Mathias Bynens tem um ótimo artigo sobre a melhor comparação de muitas expressões regulares: Em busca da regex de validação de URL perfeita

O melhor postado é um pouco longo, mas combina praticamente com qualquer coisa que você possa lançar nele.

Versão JavaScript

 /^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[az\u00a1-\uffff0-9]-*)*[az\u00a1-\uffff0-9]+)(?:\.(?:[az\u00a1-\uffff0-9]-*)*[az\u00a1-\uffff0-9]+)*(?:\.(?:[az\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$/i 

Versão do PHP

 _^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[az\x{00a1}-\x{ffff}0-9]-*)*[az\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[az\x{00a1}-\x{ffff}0-9]-*)*[az\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[az\x{00a1}-\x{ffff}]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$_iuS 

A postagem Obtendo partes de uma URL (Regex) discute a análise de uma URL para identificar seus vários componentes. Se você quiser verificar se um URL está bem formado, ele deve ser suficiente para as suas necessidades.

Se você precisa verificar se é realmente válido, você eventualmente terá que tentar acessar o que estiver do outro lado.

Em geral, no entanto, provavelmente seria melhor usar uma function fornecida a você por sua estrutura ou outra biblioteca. Muitas plataformas incluem funções que analisam URLs. Por exemplo, há o módulo urlparse do Python, e no .NET você pode usar o construtor da class System.Uri como um meio de validar a URL.

Isso pode não ser um trabalho para regexes, mas para ferramentas existentes em seu idioma de escolha. Você provavelmente quer usar o código existente que já foi escrito, testado e depurado.

No PHP, use a function parse_url .

Perl: módulo URI .

Ruby: módulo URI .

.Net: class ‘Uri’

Regexes não são uma varinha mágica que você acena em todos os problemas que envolvem strings.

Analisador de referência de URI sem validação

Para fins de referência, aqui está a especificação da IETF: ( TXT | HTML ). Em particular, o Apêndice B. Analisando uma Referência de URI com uma Expressão Regular demonstra como analisar uma expressão regular válida . Isso é descrito como

para um exemplo de um analisador de referência de URI sem validação que pegará qualquer string fornecida e extrairá os componentes de URI.

Aqui está o regex que eles fornecem:

  ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))? 

Como alguém disse, é provavelmente melhor deixar isso para um lib / framework que você já esteja usando.

Isso corresponderá a todos os URLs

  • com ou sem http / https
  • com ou sem www

… incluindo subdomínios e as novas extensões de nome de domínio de nível superior, como. museu ,. academia ,. fundação, etc., que pode ter até 63 caracteres (não apenas. com , .net ,. info etc.)

 (([\w]+:)?//)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)? 

Porque hoje o comprimento máximo da extensão de nome de domínio de nível superior disponível é de 13 caracteres, como. internacional , você pode alterar o número 63 em expressão para 13 para evitar que alguém o use de maneira errada.

como javascript

 var urlreg=/(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?/; $('textarea').on('input',function(){ var url = $(this).val(); $(this).toggleClass('invalid', urlreg.test(url) == false) }); $('textarea').trigger('input'); 
 textarea{color:green;} .invalid{color:red;} 
      

A melhor expressão regular para URL para mim seria:

 "(([\w]+:)?//)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?" 
  function validateURL(textval) { var urlregex = new RegExp( "^(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*$"); return urlregex.test(textval); } 

Jogos http://site.com/dir/file.php?var=moo | ftp: // user: pass@site.com: 21 / file / dir

Non-Matches site.com | http://site.com/dir//

 function validateURL(textval) { var urlregex = new RegExp( "^(http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*$"); return urlregex.test(textval); } 

Jogos http://www.asdah.com/~joe | ftp://ftp.asdah.co.uk:2828/asdah%20asdah.gif | https://asdah.gov/asdh-ah.as

Se você realmente procura pela melhor correspondência, provavelmente a encontrará em ” Uma boa expressão regular de Url “.

Mas um regex que realmente corresponda a todos os domínios possíveis e permita que qualquer coisa permitida de acordo com os RFCs seja terrivelmente longa e ilegível, confie em mim 😉

Eu não era capaz de encontrar o regex que eu estava procurando, então eu modifiquei um regex para preencher meus requisitos e, aparentemente, parece funcionar bem agora. Minhas exigências eram:

Aqui o que eu inventei, qualquer sugestão é apreciada:

 @Test public void testWebsiteUrl(){ String regularExpression = "((http|ftp|https):\\/\\/)?[\\w\\-_]+(\\.[\\w\\-_]+)+([\\w\\-\\.,@?^=%&:/~\\+#]*[\\w\\-\\@?^=%&/~\\+#])?"; assertTrue("www.google.com".matches(regularExpression)); assertTrue("www.google.co.uk".matches(regularExpression)); assertTrue("http://www.google.com".matches(regularExpression)); assertTrue("http://www.google.co.uk".matches(regularExpression)); assertTrue("https://www.google.com".matches(regularExpression)); assertTrue("https://www.google.co.uk".matches(regularExpression)); assertTrue("google.com".matches(regularExpression)); assertTrue("google.co.uk".matches(regularExpression)); assertTrue("google.mu".matches(regularExpression)); assertTrue("mes.intnet.mu".matches(regularExpression)); assertTrue("cse.uom.ac.mu".matches(regularExpression)); assertTrue("http://www.google.com/path".matches(regularExpression)); assertTrue("http://subdomain.web-site.com/cgi-bin/perl.cgi?key1=value1&key2=value2e".matches(regularExpression)); assertTrue("http://www.google.com/?queryparam=123".matches(regularExpression)); assertTrue("http://www.google.com/path?queryparam=123".matches(regularExpression)); assertFalse("www..dr.google".matches(regularExpression)); assertFalse("www:google.com".matches(regularExpression)); assertFalse("https://www@.google.com".matches(regularExpression)); assertFalse("https://www.google.com\"".matches(regularExpression)); assertFalse("https://www.google.com'".matches(regularExpression)); assertFalse("http://www.google.com/path'".matches(regularExpression)); assertFalse("http://subdomain.web-site.com/cgi-bin/perl.cgi?key1=value1&key2=value2e'".matches(regularExpression)); assertFalse("http://www.google.com/?queryparam=123'".matches(regularExpression)); assertFalse("http://www.google.com/path?queryparam=12'3".matches(regularExpression)); } 

Eu escrevi uma pequena versão groovy que você pode executar

corresponde aos seguintes urls (o que é bom o suficiente para mim)

 public static void main(args){ String url = "go to http://www.m.abut.ly/abc its awesome" url = url.replaceAll(/https?:\/\/w{0,3}\w*?\.(\w*?\.)?\w{2,3}\S*|www\.(\w*?\.)?\w*?\.\w{2,3}\S*|(\w*?\.)?\w*?\.\w{2,3}[\/\?]\S*/ , { it -> "woof${it}woof" }) println url } 

http://google.com

http://google.com/help.php

http://google.com/help.php?a=5

http://www.google.com

http://www.google.com/help.php

http://www.google.com?a=5

google.com?a=5

google.com/help.php

google.com/help.php?a=5

http://www.m.google.com/help.php?a=5 (e todas as suas permutações)

http://www.m.google.com/help.php?a=5 (e todas as suas permutações)

m.google.com/help.php?a=5 (e todas as suas permutações)

O importante para qualquer URL que não comece com http ou www é que eles devem include um / ou?

Eu aposto que isso pode ser ajustado um pouco mais, mas faz o trabalho muito bom por ser tão curto e compacto … porque você pode dividir em 3:

encontre qualquer coisa que comece com http: https ?: // w {0,3} \ w *?. \ w {2,3} \ S *

encontre algo que comece com www: www. \ w *?. \ w {2,3} \ S *

ou encontrar qualquer coisa que tenha um texto e um ponto, pelo menos duas letras e depois um? ou /: \ w *?. \ w {2,3} [/ \?] \ S *

Eu uso essa regex:

 ((https?:)?//)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,63}(:[\d]+)?(/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)? 

Para suportar ambos:

 http://stackoverflow.com https://stackoverflow.com 

E:

 //stackoverflow.com 

Eu tenho trabalhado em um artigo em profundidade discutindo a validação de URI usando expressões regulares. É baseado no RFC3986.

Validação de URI de Expressão Regular

Embora o artigo ainda não esteja completo, eu criei uma function PHP que faz um bom trabalho de validação de URLs HTTP e FTP. Aqui está a versão atual:

 // function url_valid($url) { Rev:20110423_2000 // // Return associative array of valid URI components, or FALSE if $url is not // RFC-3986 compliant. If the passed URL begins with: "www." or "ftp.", then // "http://" or "ftp://" is prepended and the corrected full-url is stored in // the return array with a key name "url". This value should be used by the caller. // // Return value: FALSE if $url is not valid, otherwise array of URI components: // eg // Given: "http://www.jmrware.com:80/articles?height=10&width=75#fragone" // Array( // [scheme] => http // [authority] => www.jmrware.com:80 // [userinfo] => // [host] => www.jmrware.com // [IP_literal] => // [IPV6address] => // [ls32] => // [IPvFuture] => // [IPv4address] => // [regname] => www.jmrware.com // [port] => 80 // [path_abempty] => /articles // [query] => height=10&width=75 // [fragment] => fragone // [url] => http://www.jmrware.com:80/articles?height=10&width=75#fragone // ) function url_valid($url) { if (strpos($url, 'www.') === 0) $url = 'http://'. $url; if (strpos($url, 'ftp.') === 0) $url = 'ftp://'. $url; if (!preg_match('/# Valid absolute URI having a non-empty, valid DNS host. ^ (?P[A-Za-z][A-Za-z0-9+\-.]*):\/\/ (?P (?:(?P(?:[A-Za-z0-9\-._~!$&\'()*+,;=:]|%[0-9A-Fa-f]{2})*)@)? (?P (?P \[ (?: (?P (?: (?:[0-9A-Fa-f]{1,4}:){6} | ::(?:[0-9A-Fa-f]{1,4}:){5} | (?: [0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){4} | (?:(?:[0-9A-Fa-f]{1,4}:){0,1}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){3} | (?:(?:[0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})?::(?:[0-9A-Fa-f]{1,4}:){2} | (?:(?:[0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})?:: [0-9A-Fa-f]{1,4}: | (?:(?:[0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})?:: ) (?P[0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4} | (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) ) | (?:(?:[0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})?:: [0-9A-Fa-f]{1,4} | (?:(?:[0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4})?:: ) | (?P[Vv][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&\'()*+,;=:]+) ) \] ) | (?P(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3} (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)) | (?P(?:[A-Za-z0-9\-._~!$&\'()*+,;=]|%[0-9A-Fa-f]{2})+) ) (?::(?P[0-9]*))? ) (?P(?:\/(?:[A-Za-z0-9\-._~!$&\'()*+,;=:@]|%[0-9A-Fa-f]{2})*)*) (?:\?(?P (?:[A-Za-z0-9\-._~!$&\'()*+,;=:@\\/?]|%[0-9A-Fa-f]{2})*))? (?:\#(?P (?:[A-Za-z0-9\-._~!$&\'()*+,;=:@\\/?]|%[0-9A-Fa-f]{2})*))? $ /mx', $url, $m)) return FALSE; switch ($m['scheme']) { case 'https': case 'http': if ($m['userinfo']) return FALSE; // HTTP scheme does not allow userinfo. break; case 'ftps': case 'ftp': break; default: return FALSE; // Unrecognized URI scheme. Default to FALSE. } // Validate host name conforms to DNS "dot-separated-parts". if ($m['regname']) { // If host regname specified, check for DNS conformance. if (!preg_match('/# HTTP DNS host name. ^ # Anchor to beginning of string. (?!.{256}) # Overall host length is less than 256 chars. (?: # Group dot separated host part alternatives. [A-Za-z0-9]\. # Either a single alphanum followed by dot | # or... part has more than one char (63 chars max). [A-Za-z0-9] # Part first char is alphanum (no dash). [A-Za-z0-9\-]{0,61} # Internal chars are alphanum plus dash. [A-Za-z0-9] # Part last char is alphanum (no dash). \. # Each part followed by literal dot. )* # Zero or more parts before top level domain. (?: # Explicitly specify top level domains. com|edu|gov|int|mil|net|org|biz| info|name|pro|aero|coop|museum| asia|cat|jobs|mobi|tel|travel| [A-Za-z]{2}) # Country codes are exactly two alpha chars. \.? # Top level domain can end in a dot. $ # Anchor to end of string. /ix', $m['host'])) return FALSE; } $m['url'] = $url; for ($i = 0; isset($m[$i]); ++$i) unset($m[$i]); return $m; // return TRUE == array of useful named $matches plus the valid $url. } 

This function utilizes two regexes; one to match a subset of valid generic URIs (absolute ones having a non-empty host), and a second to validate the DNS “dot-separated-parts” host name. Although this function currently validates only HTTP and FTP schemes, it is structured such that it can be easily extended to handle other schemes.

This one works for me very well. (https?|ftp)://(www\d?|[a-zA-Z0-9]+)?\.[a-zA-Z0-9-]+(\:|\.)([a-zA-Z0-9.]+|(\d+)?)([/?:].*)?

Here’s a ready-to-go Java version from the Android source code. This is the best one I’ve found.

 public static final Matcher WEB = Pattern.compile(new StringBuilder() .append("((?:(http|https|Http|Https|rtsp|Rtsp):") .append("\\/\\/(?:(?:[a-zA-Z0-9\\$\\-\\_\\.\\+\\!\\*\\'\\(\\)") .append("\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,64}(?:\\:(?:[a-zA-Z0-9\\$\\-\\_") .append("\\.\\+\\!\\*\\'\\(\\)\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,25})?\\@)?)?") .append("((?:(?:[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}\\.)+") // named host .append("(?:") // plus top level domain .append("(?:aero|arpa|asia|a[cdefgilmnoqrstuwxz])") .append("|(?:biz|b[abdefghijmnorstvwyz])") .append("|(?:cat|com|coop|c[acdfghiklmnoruvxyz])") .append("|d[ejkmoz]") .append("|(?:edu|e[cegrstu])") .append("|f[ijkmor]") .append("|(?:gov|g[abdefghilmnpqrstuwy])") .append("|h[kmnrtu]") .append("|(?:info|int|i[delmnoqrst])") .append("|(?:jobs|j[emop])") .append("|k[eghimnrwyz]") .append("|l[abcikrstuvy]") .append("|(?:mil|mobi|museum|m[acdghklmnopqrstuvwxyz])") .append("|(?:name|net|n[acefgilopruz])") .append("|(?:org|om)") .append("|(?:pro|p[aefghklmnrstwy])") .append("|qa") .append("|r[eouw]") .append("|s[abcdeghijklmnortuvyz]") .append("|(?:tel|travel|t[cdfghjklmnoprtvwz])") .append("|u[agkmsyz]") .append("|v[aceginu]") .append("|w[fs]") .append("|y[etu]") .append("|z[amw]))") .append("|(?:(?:25[0-5]|2[0-4]") // or ip address .append("[0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\\.(?:25[0-5]|2[0-4][0-9]") .append("|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1]") .append("[0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}") .append("|[1-9][0-9]|[0-9])))") .append("(?:\\:\\d{1,5})?)") // plus option port number .append("(\\/(?:(?:[a-zA-Z0-9\\;\\/\\?\\:\\@\\&\\=\\#\\~") // plus option query params .append("\\-\\.\\+\\!\\*\\'\\(\\)\\,\\_])|(?:\\%[a-fA-F0-9]{2}))*)?") .append("(?:\\b|$)").toString() ).matcher(""); 

I found the following Regex for URLs, tested successfully with 500+ URLs :

/\b(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[az\x{00a1}-\x{ffff}0-9]+-?)*[az\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[az\x{00a1}-\x{ffff}0-9]+-?)*[az\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[az\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?\b/gi

I know it looks ugly, but the good thing is that it works. 🙂

Explanation and demo with 581 random URLs on regex101.

Source: In search of the perfect URL validation regex

I tried to formulate my version of url. My requirement was to capture instances in a String where possible url can be cse.uom.ac.mu – noting that it is not preceded by http nor www

 String regularExpression = "((((ht{2}ps?://)?)((w{3}\\.)?))?)[^.&&[a-zA-Z0-9]][a-zA-Z0-9.-]+[^.&&[a-zA-Z0-9]](\\.[a-zA-Z]{2,3})"; assertTrue("www.google.com".matches(regularExpression)); assertTrue("www.google.co.uk".matches(regularExpression)); assertTrue("http://www.google.com".matches(regularExpression)); assertTrue("http://www.google.co.uk".matches(regularExpression)); assertTrue("https://www.google.com".matches(regularExpression)); assertTrue("https://www.google.co.uk".matches(regularExpression)); assertTrue("google.com".matches(regularExpression)); assertTrue("google.co.uk".matches(regularExpression)); assertTrue("google.mu".matches(regularExpression)); assertTrue("mes.intnet.mu".matches(regularExpression)); assertTrue("cse.uom.ac.mu".matches(regularExpression)); //cannot contain 2 '.' after www assertFalse("www..dr.google".matches(regularExpression)); //cannot contain 2 '.' just before com assertFalse("www.dr.google..com".matches(regularExpression)); // to test case where url www must be followed with a '.' assertFalse("www:google.com".matches(regularExpression)); // to test case where url www must be followed with a '.' //assertFalse("http://wwwe.google.com".matches(regularExpression)); // to test case where www must be preceded with a '.' assertFalse("https://www@.google.com".matches(regularExpression)); 

For Python, this is the actual URL validating regex used in Django 1.5.1:

 import re regex = re.compile( r'^(?:http|ftp)s?://' # http:// or https:// r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[AZ]{2,6}\.?|[A-Z0-9-]{2,}\.?)|' # domain... r'localhost|' # localhost... r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|' # ...or ipv4 r'\[?[A-F0-9]*:[A-F0-9:]+\]?)' # ...or ipv6 r'(?::\d+)?' # optional port r'(?:/?|[/?]\S+)$', re.IGNORECASE) 

This does both ipv4 and ipv6 addresses as well as ports and GET parameters.

Found in the code here , Line 44.

whats wrong with plain and simple FILTER_VALIDATE_URL ?

  $url = "http://www.example.com"; if(!filter_var($url, FILTER_VALIDATE_URL)) { echo "URL is not valid"; } else { echo "URL is valid"; } 

I know its not the question exactly but it did the job for me when I needed to validate urls so thought it might be useful to others who come across this post looking for the same thing

The following RegEx will work:

 "@((((ht)|(f))tp[s]?://)|(www\.))([az][-a-z0-9]+\.)?([az][-a-z0-9]+\.)?[az][-a-z0-9]+\.[az]+[/]?[a-z0-9._\/~#&=;%+?-]*@si" 

For convenience here’s a one-liner regexp for URL’s that will also match localhost where you’re more likely to have ports than .com or similar.

 (http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}(\.[az]{2,6}|:[0-9]{3,4})\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*) 

You don’t specify which language you’re using. If PHP is, there is a native function for that:

 $url = 'http://www.yoururl.co.uk/sub1/sub2/?param=1&param2/'; if ( ! filter_var( $url, FILTER_VALIDATE_URL ) ) { // Wrong } else { // Valid } 

Returns the filtered data, or FALSE if the filter fails.

Check it here >>

Hope it helps.

I hope it’s helpful for you…

 ^(http|https):\/\/+[\www\d]+\.[\w]+(\/[\w\d]+)? 

This is a rather old thread now and the question asks for a regex based URL validator. I ran into the thread whilst looking for precisely the same thing. While it may well be possible to write a really comprehensive regex to validate URLs I eventually settled on another way to do things – by using PHP’s parse_url function.

It returns boolean false if the url cannot be parsed. Otherwise it returns the scheme, the host and other information. This may well not be enough for a comprehensive URL check on its own but can be drilled down into for further analysis. If the intent is to simply catch typos, invalid schemes etc it is perfectly adequate.

To Check URL regex would be:

 ^http(s{0,1})://[a-zA-Z0-9_/\\-\\.]+\\.([A-Za-z/]{2,5})[a-zA-Z0-9_/\\&\\?\\=\\-\\.\\~\\%]*