regex: Active questions tagged regex


Wednesday, May 6, 2015

Error parsing regex pattern in php

I want to split a string such as the following on the divider '~@@' (and only that divider):

to=enquiry@test.com~@@subject=test~@@text=this is body/text~@@date=date/this isn't captured after the slash

into an array containing e.g.:

to => enquiry@test.com
subject => test
text => this is body/text
date => date

I'm using PHP 5 and I've got the following regex, which almost works, but there are a couple of errors, and there must be a way to do it in one go:

        //Split the string in the url of $text at every ~@@
        $regexp = "/(?:|(?<=~@@))(.*?=.*?)(?:~@@|$|\/(?!.*~@@))/";
        preg_match_all($regexp, $text, $a); 
        //$a[1] is an array containing var1=content1 var2=content2 etc;

        //Now create an array in the form [var1] = content, [var2] = content2
        foreach($a[1] as $key => $value) {
            //Get the two groups either side of the equals sign
            $regexp = "/([^\/~@@,= ]+)=([^~@@,= ]+)/";
            preg_match_all($regexp, $value, $r); 

            //Assign to array key = value
            $val[$r[1][0]] = $r[2][0]; //e.g. $val['subject'] = 'hi'
        }

        print_r($val);

My questions are:
1. It doesn't seem to capture more than 3 different sets of parameters.
2. It breaks on the @ symbol, so it doesn't capture email addresses.
3. I am doing several different regex searches where I suspect one would do.

Any help would be really appreciated.

Thanks
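Because the divider is a fixed string, a regex may not be needed at all. Below is a minimal Python sketch (the question itself uses PHP). It splits on the literal '~@@' and then splits each piece at its first '='. This leaves '@' and '/' inside values, such as email addresses, untouched. One point is an assumption left open: the question keeps the slash in `text` but drops everything after it in `date`, so this sketch does not reproduce that rule. In PHP, `explode('~@@', $text)` and `explode('=', $part, 2)` should be the rough counterparts.

```python
def parse_fields(text):
    """Split on the literal divider '~@@', then split each piece at its first '='."""
    fields = {}
    for part in text.split("~@@"):
        key, sep, value = part.partition("=")
        if sep:  # skip fragments without '=' (e.g. produced by a trailing divider)
            fields[key] = value
    return fields

s = ("to=enquiry@test.com~@@subject=test~@@text=this is body/text"
     "~@@date=date/this isn't captured after the slash")
print(parse_fields(s))
```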

Vim: copy a match containing the cursor-position atom into a local variable

I'm searching for a way to copy a match result into a local variable in a Vim script.

The issue is that I want to match text using the cursor-position atom \%#, for example [A-Za-z:]*\%#[A-Za-z:]\+, which matches identifiers such as ::namespace::ParentClass::SubClass under the cursor (so <cword> does not work for me).

I would like to use this later in a script, but the more I dig, the more I wonder whether that's even possible (or whether I should do it differently, by collecting the current line and cursor position and then extracting the identifier under the cursor manually).

If that's not possible from within a Vim script, what is the idea behind the \%# atom? What is it used for?
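Here is a minimal sketch of the manual route the question mentions, written in Python for illustration rather than Vim script. It takes the line text and a 0-based cursor column and returns the `[A-Za-z:]+` run that covers the column. In Vim, the inputs would come from `getline('.')` and `col('.')`; the latter is 1-based, so subtract 1. The function name `identifier_at` is hypothetical.

```python
import re

# Identifier characters, taken from the question's pattern: letters and ':'.
IDENT = re.compile(r"[A-Za-z:]+")

def identifier_at(line, col):
    """Return the [A-Za-z:]+ run covering 0-based column `col`, or None."""
    for m in IDENT.finditer(line):
        if m.start() <= col < m.end():
            return m.group()
    return None

print(identifier_at("x = ::ns::Parent::Sub;", 10))
```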

How to identify a set of dates in a string in Rails

I have the following strings
"sep 04 apr 06"
"29th may 1982"
"may 2006 may 2008"
"since oct 11"

Is there a way to obtain the dates from these strings? I used the gem 'dates_from_string', but it is unable to correctly obtain the dates from the first string.
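Here is a minimal regex sketch in Python (the question targets Rails/Ruby, though the pattern should carry over with little change). It rests on these assumptions:

- Only English three-letter month abbreviations are handled.
- A day number with an ordinal suffix (29th) may precede the month.
- A 2- or 4-digit number right after a month is treated as the year, so "sep 04" is read as September 2004.

It only extracts the pieces; turning a 2-digit year into a full date needs a century rule that is not specified here.

```python
import re

MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"
# Optional "29th " day, then a month abbreviation, then a 2- or 4-digit year.
DATE_RE = re.compile(
    r"(?:\b(\d{1,2})(?:st|nd|rd|th)\s+)?\b(" + MONTHS + r")\s+(\d{4}|\d{2})\b",
    re.IGNORECASE,
)

def find_dates(s):
    # findall returns (day, month, year) tuples; an absent day is ''.
    return [(d, m.lower(), y) for d, m, y in DATE_RE.findall(s)]

for s in ["sep 04 apr 06", "29th may 1982", "may 2006 may 2008", "since oct 11"]:
    print(s, find_dates(s))
```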

Regular expression for a decimal with a fixed total number of digits

Is there a way to write a regular expression that will match strings like

(0|[1-9][0-9]*)\.[0-9]+

but with a specified total number of digits? For example, for 3 digits it should match "0.12" and "12.3" but not "1.234" or "1.2". I know I can write something like

(?<![0-9])(([0-9]{1}\.[0-9]{2})|([1-9][0-9]{1})\.[0-9]{1})(?![0-9])

but that becomes quite tedious for large number of digits.

(I know I don't need {1} but it better explains what I'm doing)
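One common technique is to check the total length up front with a lookahead and keep the original pattern body unchanged. With exactly one dot, n digits means n + 1 characters from `[0-9.]`. Below is a minimal Python sketch for whole-string matching; `fixed_digit_decimal` is a hypothetical helper name. Matching numbers embedded in longer text would need extra boundary lookarounds.

```python
import re

def fixed_digit_decimal(n):
    """Compile a whole-string pattern for decimals with exactly n digits."""
    # The lookahead fixes the length (n digits + one dot); the body is the
    # question's (0|[1-9][0-9]*)\.[0-9]+ and guarantees exactly one dot.
    return re.compile(r"^(?=[0-9.]{%d}$)(?:0|[1-9][0-9]*)\.[0-9]+$" % (n + 1))

p = fixed_digit_decimal(3)
for s in ["0.12", "12.3", "1.234", "1.2"]:
    print(s, bool(p.match(s)))
```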

Regex Multiple Matches PHP

I'm trying to get all numbers from a string that have - or _ before them and, optionally, -, _, a space, or the end of the string after them.

So my regex looks like this: [-_][\d]+[-_ $]?

My problem is that I don't match numbers that directly follow each other. From the string "foo-5234_2123_54-13-20" I only get 5234, 54 and 20.

What I tried is (?:[-_])[\d]+(?:[-_ $])? and [-_]([\d]+)[-_ $]?, which obviously didn't work. I've been searching for hours now and I know it can't be that hard, so I'm hoping someone can help me here.

If that makes a difference, I'm using PHP's preg_match_all.
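The separators are consumed by one match and are then unavailable to the next one. A lookbehind and a lookahead only *inspect* the separators, so one separator can end one number and start the next. Note also that `$` inside a character class such as `[-_ $]` is a literal dollar sign, not end-of-string. Below is a minimal sketch in Python; PCRE supports the same lookarounds, so the pattern should carry over to preg_match_all, though that is untested here.

```python
import re

s = "foo-5234_2123_54-13-20"
# (?<=[-_]) and (?=[-_ ]|$) are zero-width, so separators are not consumed.
numbers = re.findall(r"(?<=[-_])\d+(?=[-_ ]|$)", s)
print(numbers)
```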

How to search for a particular string in a file using a pattern matcher in Java

I have the following paragraph

java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:197) at jcifs.netbios.SessionServicePacket.readPacketType(SessionServicePacket.java:68) at jcifs.netbios.NbtSocket.connect(NbtSocket.java:107) at jcifs.netbios.NbtSocket.(NbtSocket.java:68) at jcifs.smb.SmbTransport.ensureOpen(SmbTransport.java:275) at jcifs.smb.SmbTransport.send(SmbTransport.java:602) at jcifs.smb.SmbTransport.negotiate(SmbTransport.java:847) at jcifs.smb.SmbTree.treeConnect(SmbTree.java:119) at jcifs.smb.SmbFile.connect(SmbFile.java:790) at jcifs.smb.SmbFile.connect0(SmbFile.java:760) at jcifs.smb.SmbFile.queryPath(SmbFile.java:1149) at jcifs.smb.SmbFile.exists(SmbFile.java:1232) at com.ssc.faw.util.SmbFileOperator.copyFile(Unknown Source) at com.ssc.faw.newnav2faw.FundList.checkFiles(Unknown Source) at com.ssc.faw.newnav2faw.FundList.buildList(Unknown Source) at com.ssc.faw.newnav2faw.Process.buildFundList(Unknown Source) at com.ssc.faw.job.NavToFaw.runNav2Faw(Unknown Source) at com.ssc.faw.job.NavToFaw.runNav2Faw(Unknown Source) at com.ssc.faw.job.NavToFaw.runJob(Unknown Source) at com.ssc.faw.job.NavToFaw.main(Unknown Source) [WARN ] 2015-05-05 21:02:26,383 Caught an Exception for file \ac_asd_2.my_web.com\global\nvice\alert\ filename abcd123.xls continuing to process other funds 0 : null jcifs.smb.SmbException: An error occured sending the request.

Now the question is: based on the occurrence of the phrase "Connection reset", I need to find the immediately following .xls filename, "abcd123.xls" (which could be any name). Can we do this with a regex?
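A lazy `.*?` after "Connection reset" stops at the first whitespace-free token ending in ".xls". DOTALL lets the search cross line breaks if the log spans several lines. Below is a sketch in Python on a shortened, hypothetical copy of the log. It assumes the filename contains no spaces. Java's `Pattern` should accept the same pattern with `Pattern.DOTALL` (untested here).

```python
import re

# Hypothetical, shortened version of the log text from the question.
log = ("java.net.SocketException: Connection reset at java.net.SocketInputStream.read"
       "(SocketInputStream.java:197) ... [WARN ] Caught an Exception for file "
       "\\ac_asd_2.my_web.com\\global\\nvice\\alert\\ filename abcd123.xls continuing")

# Lazy .*? finds the nearest following token that ends in ".xls".
m = re.search(r"Connection reset.*?(\S+\.xls)\b", log, re.DOTALL)
print(m.group(1) if m else None)
```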

What is the best regex for parsing the following string? I need to parse the lines highlighted in the string below.

/product/prd-2210444/croft-barrow-denim-jacket-womens.jsp" class="showQuickViewPan image-holder-s javascript:void(0);" title="Croft & Barrow® Denim Jacket - Women's" rel="http://ift.tt/1JqaxcT javascript:void(0);" title="Croft & Barrow® Denim Jacket - Women's" rel="http://ift.tt/1IfQnoP javascript:void(0);" title="Croft & Barrow® Denim Jacket - Women's" rel="http://ift.tt/1JqaxcV javascript:void(0);" title="Croft & Barrow® Denim Jacket - Women's" rel="http://ift.tt/1IfQnoT javascript:void(0);" title="Croft & Barrow® Denim Jacket - Women's" rel="http://ift.tt/1JqaxcZ javascript:void(0)" class="moreViewSwatch-su /product/prd-2210444/croft-barrow-denim-jacket-womens.jsp /product/prd-201610/chaps-wool-blend-2-button-blazer-men.jsp" class="showQuickViewPan image-holder-s javascript:void(0);" title="Chaps Wool-Blend 2-Button Blazer - Men" rel="http://ift.tt/1IfQnp1 javascript:void(0);" title="Chaps Wool-Blend 2-Button Blazer - Men" rel="http://ift.tt/1Jqaxd1 javascript:void(0)" class="moreViewSwatch-su /product/prd-201610/chaps-wool-blend-2-button-blazer-men.jsp /product/prd-1492409/levis-classic-denim-jacket-womens.jsp" class="showQuickViewPan image-holder-s javascript:void(0);" title="Levi's Classic Denim Jacket - Women's" rel="http://ift.tt/1IfQpNs javascript:void(0);" title="Levi's Classic Denim Jacket - Women's" rel="http://ift.tt/1Jqaxd3 javascript:void(0);" title="Levi's Classic Denim Jacket - Women's" rel="http://ift.tt/1IfQpNu javascript:void(0);" title="Levi's Classic Denim Jacket - Women's" rel="http://ift.tt/1JqauxU javascript:void(0)" class="moreViewSwatch-su /product/prd-1492409/levis-classic-denim-jacket-womens.jsp /product/prd-c155964/chaps-classic-fit-gray-wool-blend-stretch-suit-separates-men.jsp" class="showQuickViewPan image-holder-s /product/prd-c155964/chaps-classic-fit-gray-wool-blend-stretch-suit-separates-men.jsp

How to find a '<' character that is not part of a markup tag in an XML string using Java?

The XML below has been converted into a String. I have to find the '<' character that is part of the value of the actionComment element.

<actionTakenTaskCollectionRoot>
  <actionTakenTask actionTakenTaskId="8a8080844cd55b0b014cd5f783ea0692">
    <actionComment>a < b</actionComment>
  </actionTakenTask>
</actionTakenTaskCollectionRoot>
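A heuristic sketch in Python (java.util.regex should accept the same pattern, untested): a '<' that opens markup is normally followed by a name character, '/', '!' or '?'. A negative lookahead therefore flags every other '<'. This is an assumption-laden approximation, not an XML parser; for example, text like `a<b` with no space would not be flagged.

```python
import re

xml = """<actionTakenTaskCollectionRoot>
  <actionTakenTask actionTakenTaskId="8a8080844cd55b0b014cd5f783ea0692">
    <actionComment>a < b</actionComment>
  </actionTakenTask>
</actionTakenTaskCollectionRoot>"""

# '<' NOT followed by something that could start a tag, end tag, comment or PI.
STRAY_LT = re.compile(r"<(?![A-Za-z_/!?])")

positions = [m.start() for m in STRAY_LT.finditer(xml)]
print(positions, [xml[p:p + 3] for p in positions])
```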

How can I use a Vim regex to replace text when division is involved in the expression?

I am using Vim to process text like the following:

0x8000   INDEX1 ....
0x8080   INDEX2 ....
....
0x8800   INDEXn ....

I want to use a regular expression to get the index number of each line, that is:

0x8000 ~ 0
0x8080 ~ 1
....
0x8800 ~ n

The computation should be (hex - 0x8000) / 0x80. I am trying to use a Vim substitution to get the result in place:

%s/^\(\x\+\)/\=printf("%d", submatch(1) - 0x8000)

This will yield

0     INDEX0
128   INDEX1
....
2048  INDEXn

What I want to do is to further change it to

0     INDEX0
1     INDEX1
...
16    INDEXn

That is, I want to further divide the first column by 0x80. This is where I run into a problem.

The original argument is "submatch(1) - 0x8000". When I add "/ 0x80" to it, the command becomes

%s/^\(\x\+\)/\=printf("%d", (submatch(1) - 0x8000)\/0x80)

Now Vim reports the error

Invalid expression: printf("%d", (submatch(1) - 0x8000)\/0x80))

It looks like Vim has a problem processing "/". I also tried a single "/" (without the escape), but it still fails.

Can anyone help me with this?
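On the Vim side, the documentation for sub-replace-expression warns that the separator character must not appear inside the `\=` expression. A different delimiter (for example `%s#...#\=...#`) may therefore avoid the error; this is untested here. The arithmetic itself is sketched below in Python, applied to the sample lines: parse the hex address, subtract 0x8000, and integer-divide by 0x80. For example, 0x8800 maps to 16.

```python
import re

lines = ["0x8000   INDEX1 ....", "0x8080   INDEX2 ....", "0x8800   INDEXn ...."]

def to_index(m):
    # (hex - 0x8000) / 0x80, using integer division.
    return str((int(m.group(1), 16) - 0x8000) // 0x80)

out = [re.sub(r"^(0x[0-9A-Fa-f]+)", to_index, line) for line in lines]
print(out)
```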

How to extract a complex version number using sed?

I use sed on CentOS to extract a version number, and it works fine:

echo "Version 4.2.4 (test version)" | sed -nre 's/^[^0-9]*(([0-9]+\.)*[0-9]+).*/\1/p'

But my problem is that I am not able to extract it when the version is shown like this:

Version 4.2.4-RC1 (test version)

I want to extract the 4.2.4-RC1 if it is present.

Any ideas?

EDIT

The version may also have to be extracted from a path like /var/opt/test/war/test-webapp-4.1.56-RC1.war; the format is not the same each time. For example:

echo "var/opt/test/war/test-webapp-4.1.56-RC1.war" | sed -nre 's/^[^0-9]*(([0-9]+\.)*[0-9]+).*/\1/p'
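A sketch in Python of one possible pattern: a dotted numeric version, optionally followed by a single `-SUFFIX` of letters and digits. It assumes the version is the first number in the string; a path with an earlier number in it would break that assumption. The same idea in the question's sed ERE syntax would be roughly `sed -nre 's/^[^0-9]*([0-9]+(\.[0-9]+)*(-[A-Za-z0-9]+)?).*/\1/p'`, which is untested here.

```python
import re

# Dotted numeric version, optionally followed by one "-SUFFIX" (e.g. -RC1).
VERSION_RE = re.compile(r"\d+(?:\.\d+)*(?:-[A-Za-z0-9]+)?")

def extract_version(s):
    m = VERSION_RE.search(s)
    return m.group() if m else None

for s in ["Version 4.2.4 (test version)",
          "Version 4.2.4-RC1 (test version)",
          "var/opt/test/war/test-webapp-4.1.56-RC1.war"]:
    print(extract_version(s))
```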

Regular Expression for Percentage of marks

I am trying to create a regex that matches percentages of marks.

For example, consider these percentages:

1)100%
2)56.78%
3)56 78.90%
4)34.6789%

The matched percentages should be

100%
56.78%
34.6789%

I have written the expression "\\d.+[\\d]%", but it also matches 56 78.90%, which I don't want.

If anyone knows such an expression, please share it.
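The `.+` in the original expression also matches the space. Matching the whole string against a tight pattern avoids that. Below is a sketch in Python; it makes one extra assumption, that marks never exceed 100, so it accepts 100 (optionally 100.0...) or 0-99 with optional decimals. In Java, `String.matches` already requires the whole string to match.

```python
import re

# 100 (optionally 100.0...) or 0-99 with optional decimals, then '%'.
PERCENT_RE = re.compile(r"(?:100(?:\.0+)?|\d{1,2}(?:\.\d+)?)%")

for s in ["100%", "56.78%", "56 78.90%", "34.6789%"]:
    print(s, bool(PERCENT_RE.fullmatch(s)))
```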

PowerShell regex: convert a line URI to a UK-friendly format

I would like some assistance with Regex within Powershell please. If someone can help I'd be really grateful...

I have downloaded a script from Microsoft which will allow us to take a string and convert it into a friendly format to display on user profiles.

The original string is tel:+441234123456;ext=3456

What I need to do is convert it into a UK-friendly format, so that the converted string is 01234 123456.

The steps I think I need to take are: remove the tel:+44 and replace it with 0, add a space after the first 4 digits, finish with the last 6 digits, and remove the ;ext=3456.

A similar process was suggested for US numbers, but unfortunately, as I don't know regex, it goes slightly over my head!

$tel = $LineURI -replace 'tel:(\+1)([2-9]\d{2})([2-9]\d{2})(\d{4});ext=\d{4}', '$1 ($2) $3-$4'

Any suggestions or help welcome!
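A sketch in Python of the steps listed above. It assumes every number has a 4-digit area code followed by 6 digits, as in the example; other UK number layouts would need different groups. In PowerShell, the analogous call would likely be `$LineURI -replace '^tel:\+44(\d{4})(\d{6});ext=\d+$', '0$1 $2'` (untested).

```python
import re

def uk_friendly(line_uri):
    # tel:+44 AAAA NNNNNN ;ext=...  ->  0AAAA NNNNNN   (assumes a 4+6 digit split)
    return re.sub(r"^tel:\+44(\d{4})(\d{6});ext=\d+$", r"0\1 \2", line_uri)

print(uk_friendly("tel:+441234123456;ext=3456"))
```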

How to re-match a group that did not capture anything?

I'm trying to parse a string in which a certain section can either be enclosed in " or ' or not be enclosed at all. However, I'm struggling to find a syntax that works when there are no quotation marks at all.

See the following (simplified) example:

>>> print re.match(r'\w(?P<quote>(\'|"))?\w', 'f"oo').group('quote')
"

>>> print re.match(r'\w(?P<quote>(\'|"))?\w', 'foo').group('quote')
None

>>> print re.match(r'\w(?P<quote>(\'|"))?\w(?P=quote)', 'f"o"o').group('quote')
"

>>> print re.match(r'\w(?P<quote>(\'|"))?\w(?P=quote)', 'foo').group('quote')
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

The desired result for the last attempt is None, as with the second command in the example.
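One technique is to move the `?` inside the group. The group then always participates in the match, capturing either a quote or the empty string, so the backreference `(?P=quote)` can always match. The capture becomes '' instead of None, and `or None` maps it back if None is really needed. A Python 3 sketch (the question's session uses Python 2 `print`):

```python
import re

# ['"]? inside the group: the group always participates (quote or empty string).
PATTERN = re.compile(r"""\w(?P<quote>['"]?)\w(?P=quote)""")

for s in ['f"o"o', "foo"]:
    m = PATTERN.match(s)
    # Turn the empty capture back into None for the "no quote" case.
    print(repr(s), m.group("quote") or None)
```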

Regular expression to create multiple word fragments based off the same words

Let's say I have the following string:

var str = "I like barbeque at dawn";

I want pairs of all words which are separated by a space. This can be achieved via the following regular expression:

  var regex = /[a-zA-Z]+ [a-zA-Z]+/g;
  str.match(regex);

This results in:

["I like", "barbeque at"]

But what if I want ALL adjacent pairs, including overlapping ones? The regular expression fails, because it only matches any given word once. For example, this is what I want:

["I like", "like barbeque", "barbeque at", "at dawn"]

I know I can use the recursive backtracking pattern to generate permutations. Do regular expressions have the power to create these types of pairs for me?
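A regex can produce overlapping pairs if the capture is placed inside a zero-width lookahead. Nothing is consumed, so the second word of one pair is still available to start the next pair; `\b` restricts the start positions to word starts. A Python sketch is below. JavaScript supports the same lookahead, so the pattern should carry over, though that is untested here.

```python
import re

s = "I like barbeque at dawn"
# The capture sits inside (?=...), so each match is zero-width and pairs overlap.
pairs = re.findall(r"\b(?=([a-zA-Z]+ [a-zA-Z]+))", s)
print(pairs)
```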

C# regex to search for an inner pattern

I have a string like this

[[[a]]][[[b]]][[[c]]]

I want to extract these :

a
b
c

So I wrote the following pattern:

@"\[\[\[(.+?)\]\]\]"

It works, but the result is:

[[[a]]]
[[[b]]]
[[[c]]]

What is the matter? Could you help me?

Thank you
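The pattern already captures the inner text; the difference lies in whether the whole match or capture group 1 is read. In .NET, that corresponds to `match.Value` versus `match.Groups[1].Value`. The Python sketch below shows both side by side.

```python
import re

s = "[[[a]]][[[b]]][[[c]]]"
pattern = re.compile(r"\[\[\[(.+?)\]\]\]")

whole = [m.group(0) for m in pattern.finditer(s)]  # entire match, brackets included
inner = [m.group(1) for m in pattern.finditer(s)]  # capture group 1 only
print(whole, inner)
```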

Take a Unicode character from within a string and decode it

I'm currently working in Python, and I'm pulling a whole bunch of data from the net, including titles of photos. Some of the strings I'm getting contain Unicode escape sequences, and I'd like to display them as the original characters.

I know that if I type, for example,

print u'\u00a9'

that it will output the right character to the terminal.

However, if I get a string such as:

string = 'Copyright \u00a9 David'

I am not sure how to pull it out.

I managed to pull out the character code with RegEx, but I don't know how to insert it back in without getting an error.

I tried:

char = \u00a9
string = 'Copyright' + u'char' + 'David'

which didn't really work.

I need a way to programmatically pull out the code (which I can do with a regex) and then re-insert it into the original string with u' in front of it.
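Instead of re-inserting a `u'...'` literal, each escape can be replaced with the character whose code point it names, using `re.sub` with a function. Below is a Python 3 sketch; in Python 2, `unichr` would replace `chr`. It assumes the text contains a literal backslash, 'u' and four hex digits, as scraped data would, so the sample uses a raw string.

```python
import re

# Raw string: the text really contains "\u00a9" as six characters.
s = r"Copyright \u00a9 David"

def decode_u_escapes(text):
    # Replace each literal \uXXXX with the character for code point XXXX.
    return re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)

print(decode_u_escapes(s))
```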

Removing an HTML tag with a specific class, but not its content, using a regular expression

In my PHP script, a variable holds the following HTML:

<div>
    first line starting text  <span class='highlight blink'> first line middlte text1 </span> first line end text.
    second line starting text  <span class="target"> second line middlte text2  </span> second line end text
    <div class="highlight blink"> third line text</div>
</div>

I want to remove the tags that have the highlight class, but not their content, so that the HTML above looks like this (using a regular expression only):

<div>
   first line starting text  first line middlte text1 first line end text.
   second line starting text  <span class="target"> second line middlte text2  </span> second line end text
   third line text
</div>

I tried the following, but it failed to remove the div tag that has multiple classes (see the third line; that div tag must be removed):

$data = preg_replace('#<(\w+) class=["\']highlight["\']>(.*)<\/\1>#', '\2', $data);

I also tried the following, but it replaces every tag that has a class, whatever the class is (see the second line; the span with the target class should stay untouched):

$data = preg_replace('#<(\w+) class=["\'](\w+)["\']>(.*)<\/\1>#', '\2', $data);

Can anybody help? Thanks in advance; I have been trying for 2 days.
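A sketch in Python that handles both failures above: the class value may contain other classes besides `highlight` (`\bhighlight\b` inside the quotes), and `\2` makes the closing quote match the opening one. The tag name is captured so that `</\1>` closes the right element. Regex is not an HTML parser: nested elements with the same tag name inside a highlighted element would break this. PHP's preg_replace should accept a similar pattern with the `s` modifier and `$3` (untested).

```python
import re

html = """<div>
    first line starting text  <span class='highlight blink'> first line middlte text1 </span> first line end text.
    second line starting text  <span class="target"> second line middlte text2  </span> second line end text
    <div class="highlight blink"> third line text</div>
</div>"""

# <tag class="... highlight ...">content</tag>  ->  content
HIGHLIGHT_TAG = re.compile(
    r"""<(\w+)\s+class=(["'])[^"']*\bhighlight\b[^"']*\2\s*>(.*?)</\1>""",
    re.DOTALL,
)
cleaned = HIGHLIGHT_TAG.sub(r"\3", html)
print(cleaned)
```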

Remove first octet from IP address with Regex split

I'm trying to remove the first octet, including the dot that follows it, from an IP address. I'm trying to use a regex, but I cannot figure out the proper way to use it. Here is my code:

'47.172.99.12' -split '\.(.*)',""

The result I want is

172.99.12
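Rather than splitting, an anchored replacement can delete the leading digits and the dot after them. A Python sketch follows, with a non-regex alternative. The PowerShell counterpart would presumably be `'47.172.99.12' -replace '^\d+\.', ''` (untested).

```python
import re

ip = "47.172.99.12"
# Anchored at the start: remove the first octet and the dot right after it.
print(re.sub(r"^\d+\.", "", ip))
# Non-regex alternative: split once on the first dot and keep the rest.
print(ip.split(".", 1)[1])
```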

Regex capture groups in any order

I need to capture attribute values from a string like this:

att_name1=value1|att_name2=value2|att_name3=value3

Attributes can be in any order. There are about 50 attributes.

I'm aware of lookarounds, with which I can match the string. And I wrote a regex that can capture the values in a particular order:

"^att1=(?\w+)\|att2=(?\w+)\|att3=(?\w+)$

Is there a way to handle any attribute order?
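One lookaround-based technique uses one lookahead per attribute, each anchored at the start of the string and scanning forward. The order of the attributes then no longer matters. The Python sketch below builds the pattern from a list of names, which helps with about 50 of them. It assumes every name is a valid group name and that all listed attributes are present; a missing one makes the whole match fail. `build_pattern` is a hypothetical helper. If no regex constraint applies, `dict(re.findall(r"([^|=]+)=([^|]*)", s))` is a simpler alternative.

```python
import re

def build_pattern(names):
    # One lookahead per attribute; each scans from the start of the string,
    # so the attributes may appear in any order.
    parts = [
        r"(?=.*(?:^|\|)" + re.escape(name) + r"=(?P<" + name + r">\w+))"
        for name in names
    ]
    return re.compile("^" + "".join(parts))

names = ["att_name1", "att_name2", "att_name3"]
m = build_pattern(names).match("att_name3=value3|att_name1=value1|att_name2=value2")
print(m.groupdict() if m else None)
```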

regex pattern for integer with comma

The pattern

$test = preg_match('/^(?=.*\d)\d*(?:\.\d\d)?$/', $_float1);

matches

How can I modify this to accept commas in the integer part, such as 1,253.36?
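One possible pattern, sketched in Python: the integer part is either plain digits, or 1-3 digits followed by one or more `,ddd` groups, with an optional two-digit decimal part. A lookahead still requires at least one digit somewhere. It assumes commas are only valid as thousands separators, so "12,53.36" is rejected. PHP's preg_match should accept the same pattern between delimiters (untested).

```python
import re

# Plain digits, or 1-3 digits followed by ",ddd" groups; optional ".dd".
AMOUNT_RE = re.compile(r"^(?=.*\d)(?:\d{1,3}(?:,\d{3})+|\d*)(?:\.\d\d)?$")

for s in ["1,253.36", "1253.36", "1,234,567", "12,53.36"]:
    print(s, bool(AMOUNT_RE.match(s)))
```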