<?php
/**
 * Recursively crawls a frontier of URLs, collecting every new link found
 * on each page and dispatching detail extraction to a parallel worker.
 *
 * Relies on project helpers visible elsewhere in this file/project:
 * DomDocumentParser (fetches + parses a page), createLink (resolves a
 * relative href against its page), getDetails (extracts title/keywords/etc.).
 *
 * @param array $urls frontier of absolute URLs to crawl on this pass
 * @return void
 */
function followLinks($urls) {
    global $alreadyCrawled;
    global $crawling;

    foreach ($urls as $page) {
        // The parser must be built INSIDE the loop, per page: the original
        // constructed it before $page existed and reused it for every URL.
        $parser = new DomDocumentParser($page);
        $linkList = $parser->getLinks();

        foreach ($linkList as $link) {
            $href = $link->getAttribute("href");

            // Skip in-page anchors and javascript: pseudo-links.
            if (strpos($href, "#") !== false) {
                continue;
            } else if (substr($href, 0, 11) == "javascript:") {
                continue;
            }

            // Resolve relative hrefs against the page currently being
            // crawled ($page) — the original referenced an undefined $url.
            $href = createLink($href, $page);

            if (!in_array($href, $alreadyCrawled, true)) {
                $alreadyCrawled[] = $href;
                $crawling[] = $href;

                // parallel task closures do NOT inherit outer-scope
                // variables; $href must be passed explicitly as a task
                // argument. The original closure read undefined variables
                // and called getDetails() on the same href count($crawling)
                // times — once is intended.
                $runtime = new \parallel\Runtime();
                $future = $runtime->run(function ($href) {
                    // Output the page title, descriptions, keywords, URL, Image,
                    // Video, etc... This output is piped off to an external file
                    // using the command line.
                    getDetails($href);
                    return "easy";
                }, [$href]);
            }
        }
    }

    // Remove an item from the array after we have crawled it.
    // This prevents infinitely crawling the same page.
    array_shift($crawling);

    // Stop once the frontier is empty — the original recursed
    // unconditionally and never terminated.
    if (!empty($crawling)) {
        followLinks($crawling);
    }
}
// Seed URLs: the crawl begins from these four sites.
$starts = [
    "https://website1.dn",
    "https://website2.dn",
    "https://website3.dn",
    "https://website4.dn",
];

followLinks($starts);