Using BeautifulSoup To Extract A Table In Python 3
Solution 1:
So you already have this:
datasets = [
  (('Tests', '103'), ('Failures', '24'), ('Success Rate', '76.70%'), ('Average Time', '71 ms'), ('Min Time', '0 ms'), ('Max Time', '829 ms')), 
  (('Tests', '109'), ('Failures', '35'), ('Success Rate', '82.01%'), ('Average Time', '12 ms'), ('Min Time', '2 ms'), ('Max Time', '923 ms'))
]
Here's how you can transform it. Assuming every row has the same labels in the same order, you can extract the headers from the first row:
headers_row = [hdr for hdr, data in datasets[0]]
Now, extract the second field of each tuple like ('Tests', '103') in each row:
processed_rows = [
  [data for hdr, data in row]
  for row in datasets
]
# [['103', '24', '76.70%', '71 ms', '0 ms', '829 ms'], ['109', '35', '82.01%', '12 ms', '2 ms', '923 ms']]
Now you have the header row and a list of processed_rows. You can write them to a CSV file with the standard csv module.
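As a minimal sketch of that last step, the snippet below writes the header row and then the data rows with csv.writer. The helper name write_table is hypothetical, the sample data is shortened to three columns, and an in-memory buffer stands in for a real file so the result can be inspected:

```python
import csv
import io

# Shortened sample in the same shape as the `datasets` shown above.
datasets = [
    (('Tests', '103'), ('Failures', '24'), ('Success Rate', '76.70%')),
    (('Tests', '109'), ('Failures', '35'), ('Success Rate', '82.01%')),
]

headers_row = [hdr for hdr, data in datasets[0]]
processed_rows = [[data for hdr, data in row] for row in datasets]


def write_table(csvfile, headers, rows):
    """Write one header line followed by the data rows (hypothetical helper)."""
    writer = csv.writer(csvfile)
    writer.writerow(headers)
    writer.writerows(rows)


# In-memory buffer for demonstration only; for a real file, pass
# open('data.csv', 'w', newline='') instead.
buffer = io.StringIO()
write_table(buffer, headers_row, processed_rows)
csv_text = buffer.getvalue()
```

Passing an open file object rather than a filename keeps the helper usable with both real files and in-memory buffers.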
A better solution may be to keep your original format and use csv.DictWriter.
- Extract the headers into headers_row, as shown above.
- Write the data:
    import csv

    with open('data.csv', 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=headers_row)
        writer.writeheader()
        for row in datasets:  # your original data
            writer.writerow(dict(row))
Here dict(datasets[0]), for example, is:
{'Tests': '103', 'Failures': '24', 'Success Rate': '76.70%', 'Average Time': '71 ms', 'Min Time': '0 ms', 'Max Time': '829 ms'}
Solution 2:
At the end, just convert your zip iterator to a list:
for row in table.find_all("tr")[1:]:
    dataset = zip(headings, (td.get_text() for td in row.find_all("td")))
    datasets.append(list(dataset))  # process iterator to list
print(datasets)
Final Output:
[[('Tests', '103'), 
('Failures', '24'), 
('Success Rate', '76.70%'), 
('Average Time', '71 ms'), 
('Min Time', '0 ms'), 
('Max Time', '829 ms')], 
[('Tests', '109'), 
('Failures', '35'), 
('Success Rate', '82.01%'), 
('Average Time', '12 ms'), 
('Min Time', '2 ms'), 
('Max Time', '923 ms')]]
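The reason the list(...) call matters: in Python 3, zip returns a lazy, single-use iterator rather than a list, so printing it shows only a zip object and it is empty after one pass. A small sketch with made-up two-column data illustrates this:

```python
# Made-up sample values; the real ones come from the parsed table.
headings = ['Tests', 'Failures']
cells = ['103', '24']

lazy_pairs = zip(headings, cells)   # one-shot iterator in Python 3
first_pass = list(lazy_pairs)       # consumes the iterator
second_pass = list(lazy_pairs)      # nothing left to yield

# Converting immediately gives a real list that can be printed and reused.
stored_pairs = list(zip(headings, cells))
```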
If you want to convert the dataset to a csv string, use this code:
# convert to csv string
hdrline = ','.join(e[0] for e in datasets[0]) + "\n"
data = ""
for rw in datasets:
    data += ','.join([e[1] for e in rw]) + "\n"
    
csvstr = hdrline + data
print(csvstr)
Output:
Tests,Failures,Success Rate,Average Time,Min Time,Max Time
103,24,76.70%,71 ms,0 ms,829 ms
109,35,82.01%,12 ms,2 ms,923 ms
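Joining with ',' works for this data, but it would produce a malformed line if a cell ever contained a comma or a quote. One possible alternative, sketched here with a hypothetical to_csv_string helper and a made-up 'Note' column, is to let csv.writer handle the quoting while writing into an io.StringIO buffer:

```python
import csv
import io

# Made-up rows in the same list-of-pairs shape; the 'Note' column is
# hypothetical and exists only to show a value that contains a comma.
datasets = [
    [('Tests', '103'), ('Note', 'slow, flaky')],
    [('Tests', '109'), ('Note', 'ok')],
]


def to_csv_string(rows):
    """Return the rows as CSV text, with labels from the first row as header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([label for label, _ in rows[0]])
    for row in rows:
        writer.writerow([value for _, value in row])
    return buffer.getvalue()


csvstr = to_csv_string(datasets)
```

With the default minimal quoting, only the field containing a comma is wrapped in double quotes; the other fields are written unchanged.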
Solution 3:
If you are using the standard csv module, you don't need to associate values with their labels at all.
You can write the headings and the rows directly with a csv.writer, documented at
https://docs.python.org/3.8/library/csv.html#csv.writer
import csv
...
with open('file.csv', 'w', newline='') as csvfile:
    csvwriter = csv.writer(csvfile) # You may add options here to format your csv file as needed
    headings = [th.get_text() for th in table.find("tr").find_all("th")]
    csvwriter.writerow(headings)
    for row in table.find_all("tr")[1:]:
        data = (td.get_text() for td in row.find_all("td"))
        csvwriter.writerow(data)