I am iterating through many CSV files of 1,000 to 3,000 lines each, checking for each line whether one of 70,000 keywords is contained in a 140-character text. My problem at the moment is that my code runs extremely slowly, I guess because of the many iterations. I am a relatively new programmer and not sure what the best way to speed it up is. It took 2 hours to check one entire file, and there are still many, many more I need to go through. My logic at the moment is: import the CSV as a list of lists -> for each inner list, take the first element and check for each of the 70,000 keywords whether it is mentioned.
Currently my code looks like the following:
import re
import csv

def findname(lst_names, text):
    # Try every keyword against the text; return the first that matches.
    for name in lst_names:
        name_match = re.search(r'@' + str(name), text)
        if name_match:
            return name

def process_file(file, lst_names):
    lst_successes = []
    with open(file, 'rb') as csvfile:
        filereader = csv.reader(csvfile, delimiter=',')
        content = []
        for row in filereader:
            content.append(row)
    if len(content) > 1:
        for row in content:
            mentioned = findname(lst_names, row[0])  # row[0] is the text of 140 characters
            if mentioned:
                hit = row[1:7]
                hit.append(mentioned)
                lst_successes.append(hit)
    return lst_successes

lst_names = importusr_lst('users.csv')  # my own function to import the 70,000 keywords
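For comparison, here is a minimal sketch of the direction I am considering: instead of running up to 70,000 regex searches per row, extract the @-mentions from the text once and test each against a `set`, whose membership check is O(1) on average. The sample names below are hypothetical stand-ins for the keywords loaded from users.csv.

```python
import re

def find_mentions(name_set, text):
    # Pull every @word out of the text, then check each against the
    # keyword set instead of scanning all keywords per row.
    for candidate in re.findall(r'@(\w+)', text):
        if candidate in name_set:
            return candidate
    return None

# Hypothetical keyword set standing in for the 70,000 entries of users.csv.
name_set = {'alice', 'bob', 'carol'}
print(find_mentions(name_set, 'hello @bob, see @dave'))  # prints: bob
```

This turns the per-row cost from "number of keywords" into "number of mentions in the text", which for 140-character texts is tiny.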
Thanks for any help!