I would like to extract the element symbols (if any) from a word. For this, I have prepared a regex pattern consisting of all the element symbols in the periodic table:
H|He|Li|Be|B|C|N|O|F|Ne|Na|Mg|Al|Si|P|S|Cl|Ar|K|Ca|Sc|Ti|V|Cr|Mn|Fe|Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|Rb|Sr|Y|Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|Sn|Sb|Te|I|Xe|Cs|Ba|La|Ce|Pr|Nd|Pm|Sm|Eu|Gd|Tb|Dy|Ho|Er|Tm|Yb|Lu|Hf|Ta|W|Re|Os|Ir|Pt|Au|Hg|Tl|Pb|Bi|Po|At|Rn|Fr|Ra|Ac|Th|Pa|U|Np|Pu|Am|Cm|Bk|Cf|Es|Fm|Md|No|Lr|Rf|Db|Sg|Bh|Hs|Mt
Now, for a given word, I would like to extract the elements using the pattern above. The problem is that for a word like
CuIn2Se
I get
C, In, S
as the elements. This is incorrect; I need
Cu, In, Se
from the regex, but I get "C, In, S" instead. I believe this happens because the regex engine tries the alternatives from left to right and keeps the first one that matches, and in my pattern "C" comes before "Cu" and "S" comes before "Se". For this word, the pattern effectively behaves like:
C | In | S | Cu | Se
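A minimal sketch of that left-to-right behaviour, assuming Python's `re` module (the question does not name a language). It runs the five alternatives above in their original order, then the same alternatives with the two-letter symbols moved to the front:

```python
import re

word = "CuIn2Se"

# At each position, Python's re tries the alternatives from left to right and
# keeps the first one that matches, so "C" wins over "Cu" and "S" over "Se".
original_order = re.findall(r"C|In|S|Cu|Se", word)

# Listing the two-letter symbols first lets "Cu" and "Se" be tried
# before their one-letter prefixes.
reordered = re.findall(r"Cu|Se|In|C|S", word)

print(original_order)  # ['C', 'In', 'S']
print(reordered)       # ['Cu', 'In', 'Se']
```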
To solve this, I think I need to make the regex prefer the longest matching symbol at each position, so that it matches as many characters of the word as possible instead of stopping at the first alternative that fits.
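One way to get that behaviour, sketched in Python under the assumption that `re` (or another engine with ordered alternation) is in use: build the alternation from the symbol list sorted longest-first, so every two-letter symbol is tried before any one-letter symbol. The names `SYMBOLS`, `ORIGINAL`, `LONGEST_FIRST` and `extract_symbols` are hypothetical.

```python
import re

# Symbol list copied from the pattern in the question (it stops at Mt).
SYMBOLS = (
    "H|He|Li|Be|B|C|N|O|F|Ne|Na|Mg|Al|Si|P|S|Cl|Ar|K|Ca|Sc|Ti|V|Cr|Mn|Fe|"
    "Co|Ni|Cu|Zn|Ga|Ge|As|Se|Br|Kr|Rb|Sr|Y|Zr|Nb|Mo|Tc|Ru|Rh|Pd|Ag|Cd|In|"
    "Sn|Sb|Te|I|Xe|Cs|Ba|La|Ce|Pr|Nd|Pm|Sm|Eu|Gd|Tb|Dy|Ho|Er|Tm|Yb|Lu|Hf|"
    "Ta|W|Re|Os|Ir|Pt|Au|Hg|Tl|Pb|Bi|Po|At|Rn|Fr|Ra|Ac|Th|Pa|U|Np|Pu|Am|"
    "Cm|Bk|Cf|Es|Fm|Md|No|Lr|Rf|Db|Sg|Bh|Hs|Mt"
).split("|")

# The pattern exactly as written in the question (original order).
ORIGINAL = re.compile("|".join(SYMBOLS))

# Same symbols, longest first: two-letter symbols precede one-letter ones,
# so "Cu" is tried before "C" and "Se" before "S".
LONGEST_FIRST = re.compile("|".join(sorted(SYMBOLS, key=len, reverse=True)))


def extract_symbols(word):
    """Return the element symbols found in word, scanning left to right."""
    return LONGEST_FIRST.findall(word)


print(ORIGINAL.findall("CuIn2Se"))  # ['C', 'In', 'S']
print(extract_symbols("CuIn2Se"))   # ['Cu', 'In', 'Se']
```

This is still a single left-to-right scan that takes the longest symbol available at each position; it does not search for a different way to split the whole word.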