regex

Wednesday, May 6, 2015

C# regex to extract an inner pattern

I have a string like this:

[[[a]]][[[b]]][[[c]]]

I want to extract these:

a
b
c

So I wrote the following pattern:

@"\[\[\[(.+?)\]\]\]"

It matches, but the result is:

[[[a]]]
[[[b]]]
[[[c]]]

What is wrong? Could you help me?

Thank you
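
A likely cause, assuming the calling code reads the whole match rather than the capture group: the pattern already isolates the inner text in group 1, while group 0 (the full match) still includes the brackets. In C# that would mean reading `match.Groups[1].Value` instead of `match.Value`. The minimal sketch below shows the same distinction in Python; the variable names are illustrative only.

```python
import re

text = "[[[a]]][[[b]]][[[c]]]"
pattern = re.compile(r"\[\[\[(.+?)\]\]\]")

# group(0) is the entire match, brackets included.
whole_matches = [m.group(0) for m in pattern.finditer(text)]

# group(1) is only what the parentheses captured.
inner_values = [m.group(1) for m in pattern.finditer(text)]

print(whole_matches)
print(inner_values)
```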

Take a Unicode character from within a string and decode it

I'm currently working in Python, and I'm pulling a whole bunch of data from the net, including titles of photos. Some of the strings I'm getting contain Unicode escape codes, and I'd like to display each one as its original character.

I know that if I type, for example,

print u'\u00a9'

it will output the right character to the terminal.

However, if I get a string such as:

string = 'Copyright \u00a9 David'

I am not sure how to pull it out.

I managed to pull out the character code with RegEx, but I don't know how to insert it back in without getting an error.

I tried:

char = \u00a9
string = 'Copyright' + u'char' + 'David'

which didn't really work.

I need a way to programmatically pull out the code (which I can do with RegEx) and then re-insert it into the original string with the u' in front of it.
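
One possible approach, sketched under the assumption that the downloaded string contains a literal backslash followed by `u` and four hex digits: let `re.sub` find each escape and replace it with the character for that code point. The sketch targets Python 3; on Python 2, `unichr` would take the place of `chr`. The helper name `decode_unicode_escapes` is made up for this example.

```python
import re

def decode_unicode_escapes(text):
    """Replace each literal \\uXXXX sequence with the character it names."""
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),  # hex digits -> code point -> character
        text,
    )

# The doubled backslash makes this a literal backslash + "u00a9", as scraped data would contain.
raw = "Copyright \\u00a9 David"
decoded = decode_unicode_escapes(raw)
print(decoded)
```

Doing the substitution in place avoids having to split the string and glue it back together by hand.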

Removing an HTML tag with a specific class, but not its content, using a regular expression

In my PHP script, a variable holds the following HTML:

<div>
    first line starting text  <span class='highlight blink'> first line middlte text1 </span> first line end text.
    second line starting text  <span class="target"> second line middlte text2  </span> second line end text
    <div class="highlight blink"> third line text</div>
</div>

I want to remove the tags that have the highlight class, keeping their content, so that the HTML above looks like this (using a regular expression only):

<div>
   first line starting text  first line middlte text1 first line end text.
   second line starting text  <span class="target"> second line middlte text2  </span> second line end text
   third line text
</div>

I tried this, but it fails to replace a tag that has multiple classes (see the third line; the div tag must be removed):

$data = preg_replace('#<(\w+) class=["\']highlight["\']>(.*)<\/\1>#', '\2', $data);

I also tried this, but it replaces every tag that has a class (see the second line; the span tag with the target class should stay untouched):

$data = preg_replace('#<(\w+) class=["\'](\w+)["\']>(.*)<\/\1>#', '\2', $data);

Can anybody help? Thanks in advance; I have been trying for two days.
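
A hedged sketch of one way to do this, shown in Python rather than PHP: allow other class names before or after `highlight` inside the attribute, match the content lazily, and keep only the content. It assumes the highlighted tags carry a single `class` attribute and are not nested inside another tag of the same name; regex-based HTML editing breaks easily outside such simple cases. The same pattern should carry over to `preg_replace` with the `s` modifier and `'\3'` as the replacement, though that is untested here.

```python
import re

html = """<div>
    first line starting text  <span class='highlight blink'> first line middlte text1 </span> first line end text.
    second line starting text  <span class="target"> second line middlte text2  </span> second line end text
    <div class="highlight blink"> third line text</div>
</div>"""

# Group 1: tag name, group 2: quote character, group 3: inner content.
# "highlight" must be a whole class name, optionally surrounded by other classes.
highlight_tag = re.compile(
    r"""<(\w+)\s+class=(["'])(?:[^"']*\s)?highlight(?:\s[^"']*)?\2\s*>(.*?)</\1>""",
    re.DOTALL,
)

# Replace each matching element with its inner content only.
cleaned = highlight_tag.sub(r"\3", html)
print(cleaned)
```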

Remove first octet from IP address with Regex split

I'm trying to remove the first octet, including the dot that follows it, from an IP address. I'm trying to use a regex, but I cannot figure out the proper way to use it. Here is my code:

'47.172.99.12' -split '\.(.*)',""

The result I want is:

172.99.12
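
Two ways to get that result, sketched in Python for illustration: anchor a pattern at the start of the string and delete the leading digits plus the dot, or split on the first dot only and keep the remainder. In PowerShell the rough equivalents would be `-replace '^\d+\.'` and `-split '\.', 2` followed by taking the second element; the variable names below are illustrative.

```python
import re

ip = "47.172.99.12"

# Option 1: delete the leading run of digits and the dot after it.
without_first_octet = re.sub(r"^\d+\.", "", ip)

# Option 2: split on the first dot only and keep everything after it.
tail = re.split(r"\.", ip, maxsplit=1)[1]

print(without_first_octet)
print(tail)
```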

Regex capture groups in any order

I need to capture attribute values from the string like this:

att_name1=value1|att_name2=value2|att_name3=value3

Attributes can be in any order. The number of attributes is about 50.

I'm aware of lookarounds, with which I can match the string. I also wrote a regex that can capture the values in one particular order:

"^att1=(?\w+)\|att2=(?\w+)\|att3=(?\w+)$

Is there a way to handle any attribute order?
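
One order-independent option, offered as a sketch rather than a definitive answer: instead of one large pattern with 50 positional groups, match each `name=value` pair on its own and collect the pairs into a dictionary keyed by attribute name. The sketch assumes names contain no `=` or `|` and values contain no `|`; `parse_attributes` is a hypothetical helper name.

```python
import re

def parse_attributes(line):
    """Return a dict mapping attribute name -> value, whatever the order."""
    # Each match is one name=value pair; the pipes between pairs are skipped.
    return dict(re.findall(r"([^=|]+)=([^|]*)", line))

attrs = parse_attributes("att_name2=value2|att_name3=value3|att_name1=value1")
print(attrs["att_name1"])
```

Looking values up by name afterwards replaces the need for named groups in a fixed order.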

Regex pattern for integer with comma

The pattern

$test =  preg_match('/^(?=.\d)\d(?:\.\d\d)?$/', $_float1);

matches

How can I modify this to accept commas within the integer part, such as 1,253.36?
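
A minimal sketch of a comma-aware pattern, written in Python. It assumes the intent is an integer part that is either plain digits or groups of three separated by commas, followed by an optional two-digit fraction; the pattern quoted above appears to have lost some characters, so this is a guess at the intended rules rather than a patch of it. `AMOUNT` and `is_valid_amount` are illustrative names.

```python
import re

# Either comma-grouped thousands (1,253 / 12,345,678) or plain digits (1253),
# then an optional fraction of exactly two digits.
AMOUNT = re.compile(r"(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d\d)?")

def is_valid_amount(text):
    """True when the whole string is a well-formed amount."""
    return AMOUNT.fullmatch(text) is not None

for sample in ["1,253.36", "1253.36", "12,53.36"]:
    print(sample, is_valid_amount(sample))
```

The same expression, wrapped in `^...$`, should also work with `preg_match`.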

Capture optional second group of digits (non-repeating) using regex

I have a huge dataset from which I am trying to extract a group of 4 digits. The problem is that sometimes there will be a preceding group of 4 digits that I -don't- want. These 2 groups will never be the same as each other.

Example:

String String 7777 Some more string
String 1234 7777 Some more string

In both of these situations, I want to extract ONLY 7777 (or whatever digit combination replaces it). There is no pattern to distinguish which number group will be in which position - any number from 0000 to 9999 can be in either first or second position.

If this were possible, I think it'd do what I want?

\b\d{4}{0,1}\s{0,1}(\d{4})\b

Optional 4 digits, optional space, capture 4 digits. I've tried it and some variations of it, but I can't get it to work!

Lookahead seems like a possible candidate, but I don't understand how to construct the pattern.
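
A sketch of the "optional first group" idea, assuming at most two 4-digit groups appear and that they are separated by a single whitespace character. The stacked quantifier in `\d{4}{0,1}` is a likely culprit; wrapping the first group and its trailing space in a non-capturing group and making that whole group optional expresses the same intent unambiguously. `extract_code` is a hypothetical helper name.

```python
import re

# Optionally consume a first 4-digit group plus one space, then capture the next one.
# The search is leftmost, so when two groups exist the first is skipped, not captured.
CODE = re.compile(r"(?:\b\d{4}\s)?\b(\d{4})\b")

def extract_code(line):
    """Return the wanted 4-digit group, or None when the line has none."""
    match = CODE.search(line)
    return match.group(1) if match else None

print(extract_code("String String 7777 Some more string"))
print(extract_code("String 1234 7777 Some more string"))
```

A negative lookahead such as `\b(\d{4})\b(?!\s\d{4}\b)` would be another way to say "a 4-digit group not followed by another one".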